This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Mismatch in output of onnx exported CharTokenizer model #477

@antoniovs1029

The ONNX export test for CharTokenizer is failing in the current tests, so it has been disabled (link). The outputs coming from ML.NET, OnnxRunner, and ORT on that test are different.

Here is a repro script and its output. Notice that the three outputs differ both in values and in dtypes.

NOTE: The DataFrameTool is the one found here in the repository.

Repro

import pandas as pd
import tempfile
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing.text import CharTokenizer
from nimbusml.preprocessing import OnnxRunner
from data_frame_tool import DataFrameTool as DFT

file_path = get_dataset("wiki_detox_train").as_filepath()
dataset = pd.read_csv(file_path, sep='\t')
dataset = dataset.head(10)

estimator = CharTokenizer(columns={'SentimentText_Transform': 'SentimentText'})
estimator.fit(dataset)

print("\n\nML.NET RESULT")
result_expected = estimator.transform(dataset)
print(estimator.model_)
print(result_expected)
print(result_expected.dtypes)

print("\n\nORT RESULT")
onnx_path = "C:\\Users\\anvelazq\\Desktop\\is29chartokenizer\\chartokenizer.onnx"
estimator.export_to_onnx(onnx_path, 'com.microsoft.ml')
onnxrunner = OnnxRunner(model_file=onnx_path)
result_onnx = onnxrunner.fit_transform(dataset)
print(result_onnx)
print(result_onnx.dtypes)

print("\n\nONNX RUNNER RESULT")
df_tool = DFT(onnx_path)
result_ort = df_tool.execute(dataset, [])
print(result_ort)
print(result_ort.dtypes)

Output

ML.NET RESULT
C:\Users\anvelazq\AppData\Local\Temp\tmp4dd2p6jl.model.bin
   Sentiment                                      SentimentText  SentimentText_Transform.000  ...  SentimentText_Transform.419  SentimentText_Transform.420  SentimentText_Transform.421
0          1    ==RUDE== Dude, you are rude upload that carl...                          1.0  ...                          NaN                          NaN                          NaN
1          1    == OK! ==  IM GOING TO VANDALIZE WILD ONES W...                          1.0  ...                          NaN                          NaN                          NaN
2          1     Stop trolling, zapatancas, calling me a lia...                          1.0  ...                          NaN                          NaN                          NaN
3          1    ==You're cool==  You seem like a really cool...                          1.0  ...                          NaN                          NaN                          NaN
4          1   ::::: Why are you threatening me? I'm not bei...                          1.0  ...                          NaN                          NaN                          NaN
5          1    == hey waz up? ==  hey ummm... the fif four ...                          1.0  ...                          NaN                          NaN                          NaN
6          0   ::::::::::I'm not sure either. I think it has...                          1.0  ...                          NaN                          NaN                          NaN
7          0   *::Your POV and propaganda pushing is dully n...                          1.0  ...                         45.0                         31.0                          2.0
8          0    == File:Hildebrandt-Greg and Tim.jpg listed ...                          1.0  ...                          NaN                          NaN                          NaN
9          0    ::::::::This is a gross exaggeration. Nobody...                          1.0  ...                          NaN                          NaN                          NaN

[10 rows x 424 columns]
Sentiment                        int64
SentimentText                   object
SentimentText_Transform.000    float64
SentimentText_Transform.001    float64
SentimentText_Transform.002    float64
                                ...
SentimentText_Transform.417    float64
SentimentText_Transform.418    float64
SentimentText_Transform.419    float64
SentimentText_Transform.420    float64
SentimentText_Transform.421    float64
Length: 424, dtype: object


ORT RESULT
   Sentiment                                      SentimentText  SentimentText_Transform.000  ...  SentimentText_Transform.419  SentimentText_Transform.420  SentimentText_Transform.421
0          1    ==RUDE== Dude, you are rude upload that carl...                          2.0  ...                          NaN                          NaN                          NaN
1          1    == OK! ==  IM GOING TO VANDALIZE WILD ONES W...                          2.0  ...                          NaN                          NaN                          NaN
2          1     Stop trolling, zapatancas, calling me a lia...                          2.0  ...                          NaN                          NaN                          NaN
3          1    ==You're cool==  You seem like a really cool...                          2.0  ...                          NaN                          NaN                          NaN
4          1   ::::: Why are you threatening me? I'm not bei...                          2.0  ...                          NaN                          NaN                          NaN
5          1    == hey waz up? ==  hey ummm... the fif four ...                          2.0  ...                          NaN                          NaN                          NaN
6          0   ::::::::::I'm not sure either. I think it has...                          2.0  ...                          NaN                          NaN                          NaN
7          0   *::Your POV and propaganda pushing is dully n...                          2.0  ...                         46.0                         32.0                          3.0
8          0    == File:Hildebrandt-Greg and Tim.jpg listed ...                          2.0  ...                          NaN                          NaN                          NaN
9          0    ::::::::This is a gross exaggeration. Nobody...                          2.0  ...                          NaN                          NaN                          NaN

[10 rows x 424 columns]
Sentiment                        int64
SentimentText                   object
SentimentText_Transform.000    float32
SentimentText_Transform.001    float32
SentimentText_Transform.002    float32
                                ...
SentimentText_Transform.417    float32
SentimentText_Transform.418    float32
SentimentText_Transform.419    float32
SentimentText_Transform.420    float32
SentimentText_Transform.421    float32
Length: 424, dtype: object


ONNX RUNNER RESULT
   Sentiment.output                               SentimentText.output  SentimentText_Transform.output.0  ...  SentimentText_Transform.output.419  SentimentText_Transform.output.420  SentimentText_Transform.output.421
0                 1    ==RUDE== Dude, you are rude upload that carl...                                 2  ...                               65535                               65535                               65535
1                 1    == OK! ==  IM GOING TO VANDALIZE WILD ONES W...                                 2  ...                               65535                               65535                               65535
2                 1     Stop trolling, zapatancas, calling me a lia...                                 2  ...                               65535                               65535                               65535
3                 1    ==You're cool==  You seem like a really cool...                                 2  ...                               65535                               65535                               65535
4                 1   ::::: Why are you threatening me? I'm not bei...                                 2  ...                               65535                               65535                               65535
5                 1    == hey waz up? ==  hey ummm... the fif four ...                                 2  ...                               65535                               65535                               65535
6                 0   ::::::::::I'm not sure either. I think it has...                                 2  ...                               65535                               65535                               65535
7                 0   *::Your POV and propaganda pushing is dully n...                                 2  ...                                  46                                  32                                   3
8                 0    == File:Hildebrandt-Greg and Tim.jpg listed ...                                 2  ...                               65535                               65535                               65535
9                 0    ::::::::This is a gross exaggeration. Nobody...                                 2  ...                               65535                               65535                               65535

[10 rows x 424 columns]
Sentiment.output                       int64
SentimentText.output                  object
SentimentText_Transform.output.0      uint16
SentimentText_Transform.output.1      uint16
SentimentText_Transform.output.2      uint16
                                       ...
SentimentText_Transform.output.417    uint16
SentimentText_Transform.output.418    uint16
SentimentText_Transform.output.419    uint16
SentimentText_Transform.output.420    uint16
SentimentText_Transform.output.421    uint16
Length: 424, dtype: object
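
To make the differences easier to see, here is a minimal sketch that rebuilds a few cells from the output above (rows 0 and 7; columns 000, 419, 420, 421) and compares them after normalization. It rests on two assumptions that the output suggests but does not confirm: that 65535 (the uint16 maximum) plays the same padding role as NaN in the other two outputs, and that the dtype difference itself is harmless once everything is cast to float64. The helper `normalize` and the short column names are hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

# Assumption: 65535 is a padding sentinel equivalent to NaN.
UINT16_MAX = np.iinfo(np.uint16).max


def normalize(df):
    """Cast every column to float64 and turn the uint16 max value into NaN."""
    out = df.astype("float64")
    return out.mask(out == UINT16_MAX)


# Hypothetical short names for SentimentText_Transform.000/.419/.420/.421
cols = ["t.000", "t.419", "t.420", "t.421"]

# Values copied from rows 0 and 7 of the three printed results.
mlnet_out = pd.DataFrame(
    [[1.0, np.nan, np.nan, np.nan], [1.0, 45.0, 31.0, 2.0]],
    columns=cols, dtype="float64")
# result_onnx in the script (OnnxRunner), printed under "ORT RESULT"
onnxrunner_out = pd.DataFrame(
    [[2.0, np.nan, np.nan, np.nan], [2.0, 46.0, 32.0, 3.0]],
    columns=cols, dtype="float32")
# result_ort in the script (DataFrameTool), printed under "ONNX RUNNER RESULT"
dft_out = pd.DataFrame(
    [[2, 65535, 65535, 65535], [2, 46, 32, 3]],
    columns=cols, dtype="uint16")

# Do the two ONNX-based outputs agree once dtype and padding are normalized?
onnx_outputs_agree = normalize(onnxrunner_out).equals(normalize(dft_out))

# Which distinct non-padding differences exist between ONNX and ML.NET?
diff = (normalize(onnxrunner_out) - normalize(mlnet_out)).to_numpy()
offsets = np.unique(diff[~np.isnan(diff)])

print("ONNX outputs agree after normalization:", onnx_outputs_agree)
print("ONNX minus ML.NET offsets:", offsets)
```

On these sampled cells, the two ONNX-based outputs match after normalization, while both differ from the ML.NET output by a constant offset. Whether that holds for the remaining columns would need to be checked against the full DataFrames from the repro script.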
