
Update default n-gram length for Text Transform to match default text recipe #2870

@daholste


@justinormont and the text team tuned the default n-gram lengths for the default text recipe in the internal repo.

These defaults are:
Word n-grams: bigrams, with unigrams also included
Character n-grams: trigrams only, without unigrams or bigrams
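To make the two settings concrete, here is a minimal Python sketch of the features each one produces. It is only a conceptual illustration, not ML.NET's implementation: the function names `word_ngrams` and `char_ngrams` and the `all_lengths` flag are hypothetical stand-ins for the "with/without shorter n-grams" choice, and the sketch omits the text normalization and other preprocessing that ML.NET's text featurizer performs.

```python
def word_ngrams(text, n, all_lengths):
    """Word n-grams of length n; if all_lengths, also every length from 1 to n-1."""
    tokens = text.split()  # naive whitespace tokenization (assumption)
    lengths = range(1, n + 1) if all_lengths else [n]
    return [" ".join(tokens[i:i + k])
            for k in lengths
            for i in range(len(tokens) - k + 1)]


def char_ngrams(text, n, all_lengths):
    """Character n-grams of length n; if all_lengths, also every length from 1 to n-1."""
    lengths = range(1, n + 1) if all_lengths else [n]
    return [text[i:i + k]
            for k in lengths
            for i in range(len(text) - k + 1)]


if __name__ == "__main__":
    # Current default word setting: unigrams only.
    print(word_ngrams("the cat sat", 1, all_lengths=True))
    # Requested default word setting: bigrams plus unigrams.
    print(word_ngrams("the cat sat", 2, all_lengths=True))
    # Character setting (unchanged): trigrams only.
    print(char_ngrams("cats", 3, all_lengths=False))
```

For "the cat sat", the requested word setting adds the bigrams "the cat" and "cat sat" on top of the three unigrams, while the character setting for "cats" yields only "cat" and "ats".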

One chart from his findings (image not reproduced):

The line with the light-blue call-out represents the current ML.NET defaults (Unigram + Trichar: word unigrams plus character trigrams).
The line with the light-green call-out is the requested change (Bigram + Trichar: word unigrams and bigrams plus character trigrams).
The line with the pink call-out shows that Trigram + Trichar is better in terms of accuracy, but at a cost in training time; for the Averaged Perceptron learner, the accuracy curves cross over at NumIterations > 8.

Metadata


Assignees: No one assigned

Labels: P2 (Priority of the issue for triage purpose: needs to be fixed at some point)
