#
#
# In this tutorial, we will apply dynamic quantization to a BERT
# model, closely following the BERT model from `the HuggingFace
# Transformers examples <https://github.com/huggingface/transformers>`_.
# With this step-by-step journey, we would like to demonstrate how to
# convert a well-known state-of-the-art model like BERT into a dynamically
# quantized model.
# - BERT, or Bidirectional Encoder Representations from Transformers, is a
#   method of pre-training language representations which
#   achieves state-of-the-art accuracy results on many popular
#   Natural Language Processing (NLP) tasks, such as question answering,
#   text classification, and others. The original paper can be found
#   `here <https://arxiv.org/pdf/1810.04805.pdf>`_.
#
# - Dynamic quantization support in PyTorch converts a float model to a
#   quantized model with static int8 or float16 data types for the
#   weights and dynamic quantization for the activations. The activations
#   are quantized dynamically (per batch) to int8 when the weights are
#   quantized to int8. In PyTorch, we have the `torch.quantization.quantize_dynamic API
#   <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_,
#   which replaces specified modules with dynamic weight-only quantized
#   versions and outputs the quantized model.
#
# - We demonstrate the accuracy and inference performance results on the
#   `Microsoft Research Paraphrase Corpus (MRPC) task <https://www.microsoft.com/en-us/download/details.aspx?id=52398>`_
#   in the GLUE benchmark. MRPC is
#   a corpus of sentence pairs automatically extracted from online news
#   sources, with human annotations of whether the sentences in the pair
#   are semantically equivalent. Because the classes are imbalanced (68%
#   positive, 32% negative), we follow the common practice and report the
#   `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
#   MRPC is a common NLP task for language pair classification, as shown
#   below.
#
# .. figure:: /_static/img/bert.png


######################################################################
# 1. Setup
# --------
#
# Install PyTorch and HuggingFace Transformers
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# To start this tutorial, let’s first follow the installation instructions
# for PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and for the HuggingFace
# Transformers repo `here <https://github.com/huggingface/transformers#installation>`_.
# In addition, we also install the `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ package, as we will reuse its
# built-in F1 score calculation helper function.
#
# .. code:: shell
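#
#    # an illustrative install; the exact commands and pinned versions used by
#    # the full tutorial may differ
#    pip install scikit-learn
#    pip install transformers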


######################################################################
# 2. Import the necessary modules
# -------------------------------
#
# In this step we import the necessary Python modules for the tutorial.
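
# A condensed sketch of the core imports this section relies on is shown
# below (the complete script imports additional helpers as well, such as the
# GLUE processors and metrics from transformers):

import os
import time
import torch
from argparse import Namespace
from transformers import (
    BertConfig,
    BertForSequenceClassification,
    BertTokenizer,
)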


######################################################################
# 3. Download the dataset
# -----------------------
#
# Before running the MRPC task we download the `GLUE data
# <https://gluebenchmark.com/tasks>`_ by running `this script
# <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
# and unpack it to a directory ``glue_data``.
#
#
# .. code:: shell
#
#    python download_glue_data.py --data_dir='glue_data' --tasks='MRPC'
#


######################################################################
# 4. Helper functions
# -------------------
#
# The helper functions are built into the transformers library. We mainly use
# the following helper functions: one for converting the text examples
# into feature vectors; the other for measuring the F1 score of
# the predicted result.
#
# The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_
# function converts the texts into input features (a usage sketch follows the list below):
#
# - Tokenize the input sequences;
# - Insert [CLS] at the beginning;
# - Insert [SEP] between the first sentence and the second sentence, and
#   at the end;
# - Generate token type ids to indicate whether a token belongs to the
#   first sequence or the second sequence.
#
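# A typical call to this helper looks roughly like the sketch below (the
# argument names follow the transformers version this tutorial was written
# against and may differ in newer releases; ``examples`` and ``tokenizer``
# are assumed to already exist):
#
# .. code:: python
#
#    features = glue_convert_examples_to_features(
#        examples,                      # list of InputExample objects for MRPC
#        tokenizer,                     # the BERT tokenizer
#        label_list=["0", "1"],         # MRPC labels: not-paraphrase / paraphrase
#        max_length=128,                # pad/truncate to 128 tokens
#        output_mode="classification",
#    )
#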
# The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
# can be interpreted as a weighted average of the precision and recall,
# where an F1 score reaches its best value at 1 and its worst score at 0. The
# relative contributions of precision and recall to the F1 score are equal.
# The equation for the F1 score is:
#
# - F1 = 2 \* (precision \* recall) / (precision + recall)
#
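# For instance, ``sklearn.metrics.f1_score`` computes exactly this quantity.
# A small illustrative example (not part of the tutorial's pipeline):
#
# .. code:: python
#
#    from sklearn.metrics import f1_score
#
#    y_true = [1, 1, 0, 1, 0]
#    y_pred = [1, 0, 0, 1, 0]
#    # precision = 1.0, recall = 2/3, so F1 = 2 * (1.0 * 2/3) / (1.0 + 2/3) = 0.8
#    print(f1_score(y_true, y_pred))
#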


######################################################################
# 5. Fine-tune the BERT model
# ---------------------------
#
# with the pre-trained BERT model to classify semantically equivalent
# sentence pairs on the MRPC task.
#
# To fine-tune the pre-trained BERT model (the ``bert-base-uncased`` model in
# HuggingFace transformers) for the MRPC task, you can follow the command
# in `examples <https://github.com/huggingface/transformers/tree/master/examples#mrpc>`_:
#
# ::
#
#    export GLUE_DIR=./glue_data
#    export TASK_NAME=MRPC
#    export OUT_DIR=./$TASK_NAME/
#    python ./run_glue.py \
#        --model_type bert \
#        --model_name_or_path bert-base-uncased \
#        --save_steps 100000 \
#        --output_dir $OUT_DIR
#
#
# We provide the fine-tuned BERT model for the MRPC task `here <https://download.pytorch.org/tutorial/MRPC.zip>`_.
# To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.

######################################################################
# 6. Set global configurations
# ----------------------------
#

configs = Namespace()

# The output directory for the fine-tuned model, $OUT_DIR.
configs.output_dir = "./MRPC/"

# The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
configs.data_dir = "./glue_data/MRPC"

# The model name or path for the pre-trained model.
configs.model_name_or_path = "bert-base-uncased"


######################################################################
# 7. Load the fine-tuned BERT model
# ---------------------------------
#
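#
# The core of this step is loading the tokenizer and the fine-tuned
# sequence-classification model from the checkpoint directory, roughly as
# sketched below (a minimal sketch; the full script also wires up the GLUE
# task settings before loading):
#
# .. code:: python
#
#    tokenizer = BertTokenizer.from_pretrained(
#        configs.output_dir, do_lower_case=True)   # assumes an uncased checkpoint
#    model = BertForSequenceClassification.from_pretrained(configs.output_dir)
#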


######################################################################
# 8. Define the tokenize and evaluation function
# ----------------------------------------------
#
# We reuse the tokenize and evaluation function from `HuggingFace <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
#

# coding=utf-8
# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.


######################################################################
# 9. Apply the dynamic quantization
# ---------------------------------
#
# We call ``torch.quantization.quantize_dynamic`` on the model to apply
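

######################################################################
# At its core, the dynamic-quantization call follows the pattern sketched
# below (a minimal sketch; the module set and dtype shown here are the common
# choices, not necessarily the exact arguments used above):
#
# .. code:: python
#
#    quantized_model = torch.quantization.quantize_dynamic(
#        model,                 # the fine-tuned FP32 BertForSequenceClassification
#        {torch.nn.Linear},     # only the torch.nn.Linear modules are replaced
#        dtype=torch.qint8,     # weights are stored as int8
#    )
#
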
######################################################################
# 10. Check the model size
# ------------------------
#
# Let’s first check the model size. We can observe a significant reduction
# in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
#

def print_size_of_model(model):


######################################################################
# The BERT model used in this tutorial (``bert-base-uncased``) has a
# vocabulary size V of 30522. With the embedding size of 768, the total
# size of the word embedding table is ~ 4 (Bytes/FP32) \* 30522 \* 768 =
# 90 MB. So with the help of quantization, the model size of the


######################################################################
# 11. Evaluate the inference accuracy and time
# --------------------------------------------
#
# Next, let’s compare the inference time as well as the evaluation
# accuracy between the original FP32 model and the INT8 model after the
# dynamic quantization.
#

def time_model_evaluation(model, configs, tokenizer):
    eval_start_time = time.time()
    result = evaluate(configs, model, tokenizer, prefix="")
    print(result)
    print("Evaluate total time (seconds): {0:.1f}".format(eval_duration_time))

# Evaluate the original FP32 BERT model
time_model_evaluation(model, configs, tokenizer)

# Evaluate the INT8 BERT model after the dynamic quantization
#
# We have a 0.6% lower F1 score after applying the post-training dynamic
# quantization on the fine-tuned BERT model on the MRPC task. As a
# comparison, a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ [3] (Table 1)
# achieved 0.8788 by
# applying the post-training dynamic quantization and 0.8956 by applying
# the quantization-aware training. The main reason is that we support the
# asymmetric quantization in PyTorch while that paper supports the


######################################################################
# 12. Serialize the quantized model
# ---------------------------------
#
# We can serialize and save the quantized model for future use.
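# One straightforward way is to save the quantized model's ``state_dict`` (a
# minimal sketch; the file name is illustrative, and the model must be
# re-created with ``quantize_dynamic`` before loading the weights back):
#
# .. code:: python
#
#    torch.save(quantized_model.state_dict(), "quantized_bert_mrpc.pt")
#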
# having a limited implication on accuracy.
#
# Thanks for reading! As always, we welcome any feedback, so please create
# an issue `here <https://github.com/pytorch/pytorch/issues>`_ if you have
# any.
#
# References
# -----------
#
# [1] J. Devlin, M. Chang, K. Lee and K. Toutanova, `BERT: Pre-training of
# Deep Bidirectional Transformers for Language Understanding (2018)
# <https://arxiv.org/pdf/1810.04805.pdf>`_.
#
# [2] `HuggingFace Transformers <https://github.com/huggingface/transformers>`_.
#
# [3] O. Zafrir, G. Boudoukh, P. Izsak, and M. Wasserblat (2019). `Q8BERT:
# Quantized 8bit BERT <https://arxiv.org/pdf/1910.06188.pdf>`_.
#