This repository was archived by the owner on Sep 10, 2025. It is now read-only.
Commit 9480258
Add HuggingFace Tokenizer support for Granite Code (#1261)
* feat(tokenizer): Add an abstract base class for additional tokenizer support
Branch: GraniteCodeSupport
Signed-off-by: Gabe Goodhart <[email protected]>
* feat(tokenizers): Add a python impl of the Tokenizer interface using tokenizers
This allows all HF tokenizers to be supported in the Python layer. Offering
similar compatibility at the C++ layer will need significant additional work.
Signed-off-by: Gabe Goodhart <[email protected]>
* feat(builder): Add support for using the TokenizersTokenizer in builder
Branch: GraniteCodeSupport
Signed-off-by: Gabe Goodhart <[email protected]>
* feat(tokenizers): Add and plumb the option to use the "tokenizers" tokenizer
Branch: GraniteCodeSupport
Signed-off-by: Gabe Goodhart <[email protected]>
* fix(tokenizers): Fix how bos/eos tokens are parsed from tokenizers (lib)
Branch: GraniteCodeSupport
Signed-off-by: Gabe Goodhart <[email protected]>
* fix(hf_tokenizer): Rename to HFTokenizer and corresponding flags
#1251
Branch: TokenizersTokenizer-1251
Co-Authored-By: [email protected]
Signed-off-by: Gabe Goodhart <[email protected]>
---------
Signed-off-by: Gabe Goodhart <[email protected]>
1 parent 4510ba0 commit 9480258
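The commit message above describes an abstract tokenizer base class with a Python implementation behind it, plus a fix to how bos/eos tokens are resolved. The diff itself is not reproduced here, so the following is only a minimal sketch of that pattern, not the code from this commit. The names `TokenizerBase`, `ToyWhitespaceTokenizer`, the method signatures, and the toy vocabulary are all assumptions made for illustration. The stand-in backend splits on whitespace so the sketch runs with the standard library alone; an HF-backed implementation would presumably delegate to the `tokenizers` package instead.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class TokenizerBase(ABC):
    """Hypothetical abstract interface shared by all tokenizer backends."""

    @abstractmethod
    def encode(self, text: str, *, bos: bool = False, eos: bool = False) -> List[int]:
        """Convert text to token ids, optionally adding the BOS/EOS ids."""

    @abstractmethod
    def decode(self, ids: List[int]) -> str:
        """Convert token ids back to text."""

    @abstractmethod
    def bos_id(self) -> int:
        """Id of the beginning-of-sequence token."""

    @abstractmethod
    def eos_id(self) -> int:
        """Id of the end-of-sequence token."""


class ToyWhitespaceTokenizer(TokenizerBase):
    """Stand-in backend (assumption): whitespace splitting over a fixed vocab."""

    def __init__(self, vocab: Dict[str, int], bos_token: str, eos_token: str):
        self._vocab = dict(vocab)
        self._inverse = {idx: tok for tok, idx in self._vocab.items()}
        # Resolve the special tokens to ids once, at construction time.
        self._bos_id = self._vocab[bos_token]
        self._eos_id = self._vocab[eos_token]

    def encode(self, text: str, *, bos: bool = False, eos: bool = False) -> List[int]:
        ids = [self._vocab[tok] for tok in text.split()]
        if bos:
            ids.insert(0, self._bos_id)
        if eos:
            ids.append(self._eos_id)
        return ids

    def decode(self, ids: List[int]) -> str:
        # Special tokens are dropped so decode(encode(x)) round-trips.
        special = {self._bos_id, self._eos_id}
        return " ".join(self._inverse[i] for i in ids if i not in special)

    def bos_id(self) -> int:
        return self._bos_id

    def eos_id(self) -> int:
        return self._eos_id


# Usage: callers depend only on the TokenizerBase interface.
vocab = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3}
tok: TokenizerBase = ToyWhitespaceTokenizer(vocab, bos_token="<s>", eos_token="</s>")
ids = tok.encode("hello world", bos=True, eos=True)
text = tok.decode(ids)
```

Keeping the builder and CLI dependent on the abstract interface only is what would let a new backend be selected by a flag without touching the call sites.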
File tree
5 files changed (+169, -7 lines)
- tokenizer
- torchchat
- cli