1616#
1717#
1818# In this tutorial, we will apply the dynamic quantization on a BERT
19- # model, closely following the BERT model from the HuggingFace
20- # Transformers examples ( https://github.com/huggingface/transformers) .
19+ # model, closely following the BERT model from `the HuggingFace
20+ # Transformers examples <https://github.com/huggingface/transformers>`_.
2121# With this step-by-step journey, we would like to demonstrate how to
2222# convert a well-known state-of-the-art model like BERT into dynamic
2323# quantized model.
2727# achieves the state-of-the-art accuracy results on many popular
2828# Natural Language Processing (NLP) tasks, such as question answering,
2929# text classification, and others. The original paper can be found
30- # here: https://arxiv.org/pdf/1810.04805.pdf.
30+ # `here <https://arxiv.org/pdf/1810.04805.pdf>`_.
3131#
3232# - Dynamic quantization support in PyTorch converts a float model to a
3333# quantized model with static int8 or float16 data types for the
3636# quantized to int8.
3737#
3838# In PyTorch, we have `torch.quantization.quantize_dynamic API
39- # <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_
40- # , which replaces specified modules with dynamic weight-only quantized
39+ # <https://pytorch.org/docs/stable/quantization.html#torch.quantization.quantize_dynamic>`_,
40+ # which replaces specified modules with dynamic weight-only quantized
4141# versions and outputs the quantized model.
4242#
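# As a quick illustration, here is a minimal sketch of how this API is
# typically invoked, shown on a toy ``nn.Linear`` module rather than the
# tutorial's fine-tuned BERT model:
#
# .. code:: python
#
#    import torch
#
#    # A toy float model with a single Linear layer (illustrative only).
#    float_model = torch.nn.Sequential(torch.nn.Linear(64, 32))
#
#    # Replace Linear modules with dynamic weight-only quantized versions.
#    quantized_model = torch.quantization.quantize_dynamic(
#        float_model, {torch.nn.Linear}, dtype=torch.qint8
#    )
#    print(quantized_model)
#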
4343# - We demonstrate the accuracy and inference performance results on the
4747# a corpus of sentence pairs automatically extracted from online news
4848# sources, with human annotations of whether the sentences in the pair
4949# are semantically equivalent. Because the classes are imbalanced (68%
50- # positive, 32% negative), we follow common practice and report both
51- # accuracy and `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
50+ # positive, 32% negative), we follow the common practice and report
51+ # `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_.
5252# MRPC is a common NLP task for language pair classification, as shown
5353# below.
5454#
6363# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6464#
6565# To start this tutorial, let’s first follow the installation instructions
66- # in PyTorch and HuggingFace Github Repo: -
67- #
68- # * https://github.com/pytorch/pytorch/#installation -
69- #
70- # * https://github.com/huggingface/transformers#installation
71- #
72- # In addition, we also install ``sklearn`` package, as we will reuse its
66+ # for PyTorch `here <https://github.com/pytorch/pytorch/#installation>`_ and for HuggingFace Transformers `here <https://github.com/huggingface/transformers#installation>`_.
67+ # In addition, we also install the `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ package, as we will reuse its
7368# built-in F1 score calculation helper function.
7469#
7570# .. code:: shell
141136# --------------------
142137#
143138# Before running MRPC tasks we download the `GLUE data
144- # <https://gluebenchmark.com/tasks>`_ by running this `script
145- # <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_ followed by
146- # `download_glue_data <https://github.com/nyu-mll/GLUE-baselines/blob/master/download_glue_data.py>`_.
147- # and unpack it to some directory “glue_data/MRPC”.
139+ # <https://gluebenchmark.com/tasks>`_ by running `this script
140+ # <https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e>`_
141+ # and unpack it to a directory ``glue_data``.
148142#
149143#
150144# .. code:: shell
151145#
152- # wget https://gist.githubusercontent.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e/raw/17b8dd0d724281ed7c3b2aeeda662b92809aadd5/download_glue_data.py
153146# python download_glue_data.py --data_dir='glue_data' --tasks='MRPC'
154- # ls glue_data/MRPC
155147#
156148
157149
164156# into the feature vectors; the other one for measuring the F1 score of
165157# the predicted result.
166158#
167- # Convert the texts into features
168- # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
169- #
170- # `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_.
171- # load a data file into a list of ``InputFeatures``.
159+ # The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
172160#
173161# - Tokenize the input sequences;
174162# - Insert [CLS] at the beginning;
175163# - Insert [SEP] between the first sentence and the second sentence, and
176164# at the end;
177165# - Generate token type ids to indicate whether a token belongs to the
178- # first sequence or the second sequence;
179- #
180- # F1 metric
181- # ~~~~~~~~~
166+ # first sequence or the second sequence.
182167#
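# To get a feel for what these features look like, here is a small
# illustrative sketch that applies the HuggingFace tokenizer to a made-up
# sentence pair (it assumes ``encode_plus`` is available in the installed
# ``transformers`` version):
#
# .. code:: python
#
#    from transformers import BertTokenizer
#
#    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
#    encoded = tokenizer.encode_plus("The cat sat.", "A cat was sitting.")
#
#    # Tokens are [CLS] ... [SEP] ... [SEP]; token type ids are 0 for the
#    # first sentence and 1 for the second.
#    print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
#    print(encoded["token_type_ids"])
#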
183168# The `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_
184169# can be interpreted as a weighted average of the precision and recall,
185170# where an F1 score reaches its best value at 1 and worst score at 0. The
186171# relative contribution of precision and recall to the F1 score is equal.
187- # The formula for the F1 score is:
172+ # The equation for the F1 score is:
188173#
189- # F1 = 2 \* (precision \* recall) / (precision + recall)
174+ # - F1 = 2 \* (precision \* recall) / (precision + recall)
190175#
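# As a quick sanity check, the scikit-learn helper we reuse later computes
# exactly this quantity (the labels below are made up for illustration):
#
# .. code:: python
#
#    from sklearn.metrics import f1_score
#
#    y_true = [1, 1, 0, 1, 0, 1]
#    y_pred = [1, 0, 0, 1, 0, 1]
#    # Equivalent to 2 * (precision * recall) / (precision + recall).
#    print(f1_score(y_true, y_pred))
#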
191176
192177
204189# with the pre-trained BERT model to classify semantically equivalent
205190# sentence pairs on MRPC task.
206191#
207- # To fine-tune the pre-trained BERT model (“ bert-base-uncased” model in
192+ # To fine-tune the pre-trained BERT model (``bert-base-uncased`` model in
208193# HuggingFace transformers) for the MRPC task, you can follow the command
209- # in `examples<https://github.com/huggingface/transformers/tree/master/examples>`_"
194+ # in `examples <https://github.com/huggingface/transformers/tree/master/examples#mrpc>`_:
210195#
211196# ::
212197#
213198# export GLUE_DIR=./glue_data
214199# export TASK_NAME=MRPC
215- # export OUT_DIR=/mnt/homedir/jianyuhuang/public/bert /$TASK_NAME/
200+ # export OUT_DIR=./$TASK_NAME/
216201# python ./run_glue.py \
217202# --model_type bert \
218203# --model_name_or_path bert-base-uncased \
229214# --save_steps 100000 \
230215# --output_dir $OUT_DIR
231216#
232- # We provide the fined-tuned BERT model for MRPC task here (We did the
233- # fine-tuning on CPUs with a total train batch size of 8):
234- #
235- # https://drive.google.com/drive/folders/1mGBx0t-YJAWXHbgab2f_IimaMiVHlKh-
236- #
237- # To save time, you can manually copy the fined-tuned BERT model for MRPC
238- # task in your Google Drive (Create the same “BERT_Quant_Tutorial/MRPC”
239- # folder in the Google Drive directory), and then mount your Google Drive
240- # on your runtime using an authorization code, so that we can directly
241- # read and write the models into Google Drive in the following steps.
242- #
243-
244- from google .colab import drive
245- drive .mount ('/content/drive' )
246-
217+ # We provide the fine-tuned BERT model for the MRPC task `here <https://download.pytorch.org/tutorial/MRPC.zip>`_.
218+ # To save time, you can download the model file (~400 MB) directly into your local folder ``$OUT_DIR``.
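#
# For example, one possible way to fetch and unpack the archive from Python
# (this sketch assumes the archive unpacks into an ``MRPC`` folder matching
# ``$OUT_DIR``; ``wget`` and ``unzip`` work just as well):
#
# .. code:: python
#
#    import urllib.request
#    import zipfile
#
#    url = "https://download.pytorch.org/tutorial/MRPC.zip"
#    urllib.request.urlretrieve(url, "MRPC.zip")
#    with zipfile.ZipFile("MRPC.zip") as archive:
#        archive.extractall(".")  # expected to create ./MRPC/
#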
247219
248220######################################################################
249221# Set global configurations
258230
259231configs = Namespace()
260232
261- # The output directory for the fine-tuned model.
262- configs .output_dir = "/content/drive/My Drive/BERT_Quant_Tutorial /MRPC/"
233+ # The output directory for the fine-tuned model, $OUT_DIR.
234+ configs.output_dir = "./MRPC/"
263235
264- # The data directory for the MRPC task in the GLUE benchmark.
265- configs .data_dir = "/content /glue_data/MRPC"
236+ # The data directory for the MRPC task in the GLUE benchmark, $GLUE_DIR/$TASK_NAME.
237+ configs.data_dir = "./glue_data/MRPC"
266238
267239# The model name or path for the pre-trained model.
268240configs.model_name_or_path = "bert-base-uncased"
@@ -315,8 +287,9 @@ def set_seed(seed):
315287# Define the tokenize and evaluation function
316288# -------------------------------------------
317289#
318- # We reuse the tokenize and evaluation function from `huggingface <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
290+ # We reuse the tokenize and evaluation function from `HuggingFace <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
319291#
292+
320293# coding=utf-8
321294# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
322295# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
@@ -478,7 +451,7 @@ def load_and_cache_examples(args, task, tokenizer, evaluate=False):
478451# --------------------
479452#
480453# Let’s first check the model size. We can observe a significant reduction
481- # in model size:
454+ # in model size (FP32 total size: 438 MB; INT8 total size: 181 MB):
482455#
483456
484457def print_size_of_model(model):
@@ -491,7 +464,7 @@ def print_size_of_model(model):
491464
492465
493466######################################################################
494- # The BERT model used in this tutorial (bert-base-uncased) has a
467+ # The BERT model used in this tutorial (``bert-base-uncased``) has a
495468# vocabulary size V of 30522. With the embedding size of 768, the total
496469# size of the word embedding table is ~ 4 (Bytes/FP32) \* 30522 \* 768 =
497470# 90 MB. So with the help of quantization, the model size of the
@@ -509,7 +482,6 @@ def print_size_of_model(model):
509482# dynamic quantization.
510483#
511484
512- # Evaluate the original FP32 BERT model
513485def time_model_evaluation(model, configs, tokenizer):
514486    eval_start_time = time.time()
515487    result = evaluate(configs, model, tokenizer, prefix="")
@@ -518,6 +490,7 @@ def time_model_evaluation(model, configs, tokenizer):
518490    print(result)
519491    print("Evaluate total time (seconds): {0:.1f}".format(eval_duration_time))
520492
493+ # Evaluate the original FP32 BERT model
521494time_model_evaluation(model, configs, tokenizer)
522495
523496# Evaluate the INT8 BERT model after the dynamic quantization
@@ -539,7 +512,8 @@ def time_model_evaluation(model, configs, tokenizer):
539512#
540513# We have a 0.6% lower F1 score after applying the post-training dynamic
541514# quantization on the fine-tuned BERT model on the MRPC task. As a
542- # comparison, in the recent paper [3] (Table 1), it achieved 0.8788 by
515+ # comparison, in a `recent paper <https://arxiv.org/pdf/1910.06188.pdf>`_ (Table 1),
516+ # it achieved 0.8788 by
543517# applying the post-training dynamic quantization and 0.8956 by applying
544518# the quantization-aware training. The main reason is that we support the
545519# asymmetric quantization in PyTorch while that paper supports the
@@ -583,7 +557,7 @@ def time_model_evaluation(model, configs, tokenizer):
583557# having a limited implication on accuracy.
584558#
585559# Thanks for reading! As always, we welcome any feedback, so please create
586- # an issue here ( https://github.com/pytorch/pytorch/issues) if you have
560+ # an issue `here <https://github.com/pytorch/pytorch/issues>`_ if you have
587561# any.
588562#
589563
@@ -592,14 +566,14 @@ def time_model_evaluation(model, configs, tokenizer):
592566# References
593567# -----------
594568#
595- # [1] J.Devlin, M. Chang, K. Lee and K. Toutanova, BERT: Pre-training of
569+ # [1] J. Devlin, M. Chang, K. Lee and K. Toutanova, `BERT: Pre-training of
596570# Deep Bidirectional Transformers for Language Understanding (2018)
571+ # <https://arxiv.org/pdf/1810.04805.pdf>`_.
597572#
598- # [2] HuggingFace Transformers.
599- # https://github.com/huggingface/transformers
573+ # [2] `HuggingFace Transformers <https://github.com/huggingface/transformers>`_.
600574#
601- # [3] O. Zafrir, G. Boudoukh, P. Izsak, & M. Wasserblat (2019). Q8BERT:
602- # Quantized 8bit BERT. arXiv preprint arXiv: 1910.06188.
575+ # [3] O. Zafrir, G. Boudoukh, P. Izsak, and M. Wasserblat (2019). `Q8BERT:
576+ # Quantized 8bit BERT <https://arxiv.org/pdf/1910.06188.pdf>`_.
603577#
604578
605579