Add more Tokenizer functionality - Create without download sync/async ; Trim APIs #7043

@tarekgh

Description

We need to incorporate the following enhancements into the tokenizer:

  • Enable creating tokenizers from a caller-supplied stream, so that vocabulary files do not have to be downloaded on demand.
  • Introduce an API that encodes text up to a specified maximum token count.
  • Introduce an API that encodes text from the end, up to a specified maximum token count.
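
To make the three requests concrete, here is a rough sketch in Python rather than in the library's own language. Everything in it is hypothetical: `ToyTokenizer`, `from_stream`, `encode_trim_suffix`, and `encode_trim_prefix` are illustrative names, and the whitespace splitting stands in for a real tokenization algorithm. It only shows the shape being asked for: build from a stream instead of downloading, and return both the token ids and the text offset where trimming happened.

```python
import io
import re
from typing import BinaryIO, Dict, List, Tuple


class ToyTokenizer:
    """Hypothetical whitespace tokenizer; not the real library API."""

    def __init__(self, vocab: Dict[str, int]) -> None:
        self.vocab = vocab

    @classmethod
    def from_stream(cls, stream: BinaryIO) -> "ToyTokenizer":
        # Build the vocabulary from a caller-supplied stream (one token per
        # line), so nothing is downloaded on demand.
        words = stream.read().decode("utf-8").split()
        return cls({word: index for index, word in enumerate(words)})

    def _spans(self, text: str) -> List[Tuple[str, int, int]]:
        # Each entry is (token text, start offset, end offset).
        return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

    def _ids(self, spans: List[Tuple[str, int, int]]) -> List[int]:
        # Unknown tokens map to -1 in this toy example.
        return [self.vocab.get(word, -1) for word, _, _ in spans]

    def encode_trim_suffix(self, text: str, max_tokens: int) -> Tuple[List[int], int]:
        # Keep at most max_tokens tokens from the start of the text.
        # Returns the ids and the offset where the kept text ends.
        kept = self._spans(text)[:max_tokens]
        end = kept[-1][2] if kept else 0
        return self._ids(kept), end

    def encode_trim_prefix(self, text: str, max_tokens: int) -> Tuple[List[int], int]:
        # Keep at most max_tokens tokens from the end of the text.
        # Returns the ids and the offset where the kept text starts.
        spans = self._spans(text)
        kept = spans[max(0, len(spans) - max_tokens):]
        start = kept[0][1] if kept else len(text)
        return self._ids(kept), start


# The vocabulary comes from an in-memory stream here; a file or an embedded
# resource would work the same way.
tokenizer = ToyTokenizer.from_stream(io.BytesIO(b"the\nquick\nbrown\nfox\n"))
text = "the quick brown fox"

head_ids, head_end = tokenizer.encode_trim_suffix(text, 2)
tail_ids, tail_start = tokenizer.encode_trim_prefix(text, 2)
```

Returning the offset alongside the ids lets the caller slice the original text (`text[:head_end]` or `text[tail_start:]`) without re-encoding, which is the usual reason for wanting trim APIs when fitting a prompt into a token budget.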
