
Conversation


@parmeet parmeet commented Aug 3, 2022

Updated the benchmark code to run on a pre-defined number of samples and batch size. Running on a larger number of samples gives more robust statistics because 1) we expose the tokenizer to more variable-length samples, and 2) we run a larger number of batches instead of just one, as is currently the case.
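For reference, a minimal sketch of the kind of timing loop described above (not the PR's actual benchmark script): `tokenizer` is assumed to be any callable that accepts either a single string or a list of strings, and the sample list and batch size are placeholders.

```python
import time

def benchmark_non_batched(tokenizer, samples):
    # Tokenize one sample at a time; pays the Python-level call overhead per sample.
    start = time.perf_counter()
    for text in samples:
        tokenizer(text)
    return time.perf_counter() - start

def benchmark_batched(tokenizer, samples, batch_size):
    # Tokenize lists of `batch_size` samples; far fewer Python-level calls per sample.
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        tokenizer(samples[i : i + batch_size])
    return time.perf_counter() - start
```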

Benchmark results

Number of samples: 100000

Non-batched input

TorchText BERT Tokenizer: 1.7653241670000002
HF BERT Tokenizer (slow): 27.455106365
HF BERT Tokenizer (fast): 5.351107693000003

Batched input

Batch-size: 50
TorchText BERT Tokenizer: 1.376252063
HF BERT Tokenizer (fast): 1.5889374279999995

Batch-size: 100
TorchText BERT Tokenizer: 1.3049638119999996
HF BERT Tokenizer (fast): 1.4069846630000002

Batch-size: 200
TorchText BERT Tokenizer: 1.275028583
HF BERT Tokenizer (fast): 1.2769447180000002

Batch-size: 400
TorchText BERT Tokenizer: 1.3523340929999996
HF BERT Tokenizer (fast): 1.2558808729999997

Apparently, for the HF tokenizer the per-operator-call overhead is higher than torchtext's, but the backend implementation is faster. This is why, once the batch size is increased past a certain point, the HF tokenizer (fast) becomes more performant than torchtext's implementation.
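A rough cost model of that trade-off (the overhead and per-sample numbers are hypothetical, not measured): the fixed per-call overhead is paid once per batch, so it is amortized over more samples as the batch size grows.

```python
def estimated_total_time(num_samples, batch_size, per_call_overhead, time_per_sample):
    # Hypothetical cost model (illustrative only): one operator call per batch,
    # plus backend tokenization work proportional to the number of samples.
    num_calls = num_samples / batch_size
    return num_calls * per_call_overhead + num_samples * time_per_sample
```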


@Nayef211 Nayef211 left a comment


LGTM

@parmeet parmeet merged commit 8eb0561 into pytorch:main Aug 3, 2022
@parmeet parmeet deleted the update_bert_bench branch August 3, 2022 18:21