7 files changed: +148 −0

**.gitignore**

```
build
**/*.pt
```
**CMakeLists.txt**

```cmake
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(libtorchtext_cpp_example)

set(BUILD_TORCHTEXT_PYTHON_EXTENSION OFF CACHE BOOL "Build Python binding")

find_package(Torch REQUIRED)
message("libtorchtext CMakeLists: ${TORCH_CXX_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_subdirectory(../.. libtorchtext)
add_subdirectory(tokenizer)
```
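The top-level file adds `tokenizer` as a subdirectory, but `tokenizer/CMakeLists.txt` itself does not appear in this view of the diff. A minimal sketch of what such a file could look like, assuming the executable target is named `tokenize` (to match the run command in the tokenizer README below) and that `libtorchtext` exposes a `torchtext` library target; both names are assumptions, not confirmed here:

```cmake
# Hypothetical tokenizer/CMakeLists.txt; the torchtext target name is an assumption.
add_executable(tokenize main.cpp)
target_include_directories(tokenize PRIVATE "${TORCH_INCLUDE_DIRS}")
target_link_libraries(tokenize "${TORCH_LIBRARIES}" torchtext)
set_property(TARGET tokenize PROPERTY CXX_STANDARD 17)
```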
**README.md**

# Libtorchtext Examples

- [Tokenizer](./tokenizer)

## Build

The example applications in this directory depend on `libtorch` and `libtorchtext`. If you have a working `PyTorch` installation, you already have `libtorch`. Please refer to [this tutorial](https://pytorch.org/tutorials/advanced/torch_script_custom_classes.html) for the use of `libtorch` and TorchScript.

`libtorchtext` is the library of torchtext's C++ components, without the Python components. It is currently not distributed, and it is built alongside the applications.

To build `libtorchtext` and the example applications, run the following commands:

```bash
chmod +x build.sh  # give the script execute permission
./build.sh
```

For the usage of each application, refer to the corresponding application directory.
**build.sh**

```bash
#!/usr/bin/env bash

set -eux

this_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
build_dir="${this_dir}/build"

mkdir -p "${build_dir}"
cd "${build_dir}"

# Fetch the third-party submodules that libtorchtext builds from source.
git submodule update

# Configure: locate libtorch via the installed PyTorch, and disable tests
# and shared-library builds in the vendored re2/sentencepiece dependencies.
cmake \
    -DCMAKE_PREFIX_PATH="$(python -c 'import torch;print(torch.utils.cmake_prefix_path)')" \
    -DRE2_BUILD_TESTING:BOOL=OFF \
    -DBUILD_TESTING:BOOL=OFF \
    -DSPM_ENABLE_SHARED=OFF \
    ..
cmake --build .
```
**tokenizer/README.md**

# Tokenizer

This example demonstrates how you can use torchtext's `GPT2BPETokenizer` in a C++ environment.

## Steps

### 1. Download necessary artifacts

First we download the `gpt2_bpe_vocab.bpe` and `gpt2_bpe_encoder.json` artifacts, both of which are needed to construct the `GPT2BPETokenizer` object.

```bash
curl -O https://download.pytorch.org/models/text/gpt2_bpe_vocab.bpe
curl -O https://download.pytorch.org/models/text/gpt2_bpe_encoder.json
```

### 2. Create tokenizer TorchScript file

Next we create our tokenizer object and save it as a TorchScript object. We also print the output of the tokenizer on a sample sentence and verify that the output is the same before and after saving and re-loading the tokenizer. In the next steps we will load and execute the tokenizer in our C++ application. The C++ code is found in [`main.cpp`](./main.cpp).

```bash
tokenizer_file="tokenizer.pt"
python create_tokenizer.py --tokenizer-file "${tokenizer_file}"
```

### 3. Build the application

Please refer to [the top level README.md](../README.md).

### 4. Run the application

Now we run the C++ application `tokenizer` with the TorchScript object we created in Step 2. The tokenizer is run with the following sentence as input, and we verify that the output is the same as that of Step 2.

In [the top level directory](../):

```bash
./build/tokenizer/tokenize "tokenizer/${tokenizer_file}"
```
**tokenizer/create_tokenizer.py**

```python
from argparse import ArgumentParser

import torch
from torchtext import transforms


def main(args):
    tokenizer_file = args.tokenizer_file
    sentence = "The green grasshopper jumped over the fence"

    # create tokenizer object
    encoder_json = "gpt2_bpe_encoder.json"
    bpe_vocab = "gpt2_bpe_vocab.bpe"
    tokenizer = transforms.GPT2BPETokenizer(encoder_json_path=encoder_json, vocab_bpe_path=bpe_vocab)

    # script and save tokenizer
    tokenizer = torch.jit.script(tokenizer)
    print(tokenizer(sentence))
    torch.jit.save(tokenizer, tokenizer_file)

    # load saved tokenizer and verify outputs match
    t = torch.jit.load(tokenizer_file)
    print(t(sentence))


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--tokenizer-file", default="tokenizer.pt", type=str)
    main(parser.parse_args())
```
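If you prefer the Step 2 verification to fail loudly instead of relying on comparing the two printed lists by eye, a small assertion can be appended to `main` above; this is an illustrative addition, not part of the original script:

```python
# Illustrative addition: raise if the re-loaded tokenizer disagrees with
# the in-memory scripted tokenizer on the sample sentence.
assert tokenizer(sentence) == t(sentence), "outputs diverged after save/load"
```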
**tokenizer/main.cpp**

```cpp
#include <torch/nn/functional/activation.h>
#include <torch/script.h>

#include <iostream>
#include <string>
#include <vector>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "Usage: tokenize <path-to-tokenizer.pt>\n";
    return -1;
  }

  std::cout << "Loading model...\n";

  // Load the TorchScript tokenizer saved by create_tokenizer.py.
  torch::jit::script::Module module;
  try {
    module = torch::jit::load(argv[1]);
  } catch (const c10::Error& e) {
    std::cerr << "Error loading the tokenizer module\n";
    return -1;
  }

  torch::NoGradGuard no_grad; // ensures that autograd is off
  torch::jit::IValue tokens_ivalue = module.forward(std::vector<c10::IValue>(
      1, "The green grasshopper jumped over the fence"));
  std::cout << "Result: " << tokens_ivalue << std::endl;

  return 0;
}
```
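The result above prints as a generic `torch::jit::IValue`. If the individual tokens are needed as C++ strings, the value can be unwrapped; a sketch under the assumption that the scripted tokenizer returns `List[str]`, as the output of `create_tokenizer.py` suggests:

```cpp
// Sketch: unwrap the IValue into concrete std::string tokens.
// Assumes the scripted tokenizer returns List[str].
c10::List<c10::IValue> tokens = tokens_ivalue.toList();
for (size_t i = 0; i < tokens.size(); ++i) {
  std::cout << tokens.get(i).toStringRef() << "\n";
}
```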