
Fixes for the documentation of the TextCatalog #3491

@sfilipi

Description

TokenizeIntoCharactersAsKeys

  • The description of TokenizingByCharactersEstimator should be corrected to:
    "Create a TokenizingByCharactersEstimator, which tokenizes words by splitting text into sequences of
    characters using a sliding window."
  • Should the outputColumnName description state that the outputs are Uints rather than keys? Describing them as KeyDataViewTypes might confuse users. Alternatively, should this method be renamed? @artidoro @Ivanidzo4ka @zeahmed
    "Name of the column resulting from the transformation of inputColumnName. This column's data type will be a variable-sized vector of Uint."
  • useMarkerCharacters needs a better description.

RemoveStopWords

  • inputColumnName: the description should state:
    "This estimator operates over a vector of text."

CustomStopWordsRemovingEstimator

  • Output column data type: replace "Unknown-sized vector" with "Variable-sized vector", so that it reads:
    "Variable-sized vector of Text"
  • xref not resolving:
    xref:Microsoft.ML.Transforms.Text.CustomStopWordsRemovingTransformer/

WordHashBagEstimator

  • Output column data type should read:
    "Known-sized vector of Single"
  • Replace metadata with annotations in the documentation references.

NgramHashingEstimator

  • Broken xref:Microsoft.ML.Transforms.Text.NgramHashingTransformer/ link.
  • casing: "in a way that the former takes "

NormalizeText

  • outputColumnName
    "This column's data type is a scalar of text or "

WordEmbeddingEstimator

  • Add links for Glove50D, and state the dimensionality of the embedding model used.
  • Rephrase here and everywhere else: "See the See Also section for links to usage examples."

Labels: documentation (Related to documentation of ML.NET)