
Conversation


@parmeet parmeet commented Aug 3, 2022

Updated the benchmark code to run on a pre-defined number of samples and batch size. Running on a larger number of samples gives more robust statistics because 1) we expose the tokenizer to more variable-length samples, and 2) we run a larger number of batches instead of just one, as is currently the case.
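For reference, a minimal sketch of the kind of timing loop described above (not the PR's actual benchmark script): `tokenizer` is assumed to be any callable that accepts either a single string or a list of strings, and the sample list and batch size are placeholders.

```python
import time

def benchmark_non_batched(tokenizer, samples):
    # Tokenize one sample at a time; pays the Python-level call overhead per sample.
    start = time.perf_counter()
    for text in samples:
        tokenizer(text)
    return time.perf_counter() - start

def benchmark_batched(tokenizer, samples, batch_size):
    # Tokenize lists of `batch_size` samples; far fewer Python-level calls per sample.
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        tokenizer(samples[i : i + batch_size])
    return time.perf_counter() - start
```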

Benchmark results

Number of samples: 100000

Non-batched input

TorchText BERT Tokenizer: 1.7653241670000002
HF BERT Tokenizer (slow): 27.455106365
HF BERT Tokenizer (fast): 5.351107693000003

Batched input

Batch-size: 50
TorchText BERT Tokenizer: 1.376252063
HF BERT Tokenizer (fast): 1.5889374279999995

Batch-size: 100
TorchText BERT Tokenizer: 1.3049638119999996
HF BERT Tokenizer (fast): 1.4069846630000002

Batch-size: 200
TorchText BERT Tokenizer: 1.275028583
HF BERT Tokenizer (fast): 1.2769447180000002

Batch-size: 400
TorchText BERT Tokenizer: 1.3523340929999996
HF BERT Tokenizer (fast): 1.2558808729999997

Apparently, for the HF tokenizer the per-operator-call overhead is higher than torchtext's, but the backend implementation is faster. This is why, once the batch size is increased past a certain point, the HF tokenizer (fast) becomes more performant than torchtext's implementation.
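A rough cost model of that trade-off (the overhead and per-sample numbers are hypothetical, not measured): the fixed per-call overhead is paid once per batch, so it is amortized over more samples as the batch size grows.

```python
def estimated_total_time(num_samples, batch_size, per_call_overhead, time_per_sample):
    # Hypothetical cost model (illustrative only): one operator call per batch,
    # plus backend tokenization work proportional to the number of samples.
    num_calls = num_samples / batch_size
    return num_calls * per_call_overhead + num_samples * time_per_sample
```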


@Nayef211 Nayef211 left a comment


LGTM

@parmeet parmeet merged commit 8eb0561 into pytorch:main Aug 3, 2022
@parmeet parmeet deleted the update_bert_bench branch August 3, 2022 18:21