From b992c36c5721ac54fc6d0434c85f1231dfa58879 Mon Sep 17 00:00:00 2001
From: Nayef Ahmed
Date: Wed, 25 May 2022 18:28:39 -0400
Subject: [PATCH 1/4] Add contributing guidelines for third party libraries and custom C++ operators

---
 CONTRIBUTING.md | 50 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 6e0dba6084..8312172216 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -48,6 +48,56 @@ python run-clang-format.py \
 
 where `$CLANG_FORMAT` denotes the path to the downloaded binary.
 
+## Adding Third Party Libraries
+
+The following steps outline how to add third party libraries to torchtext:
+
+1. Add the git repo you care about as a submodule. Here is a great
+   [tutorial](https://www.atlassian.com/git/tutorials/git-submodule) on working with submodules in git.
+   - Navigate to `third_party/` folder and run `git submodule add `
+   - Verify the newly added module is present in the
+     [`.gitmodules`](https://github.com/pytorch/text/blob/main/.gitmodules) file
+2. Update
+   [`third_party/CMakeLists.txt`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/third_party/CMakeLists.txt#L8)
+   to add the following line: `add_subdirectory( EXCLUDE_FROM_ALL)`
+3. (Optional) If any of the files within the `csrc/` folder make use of the newly added third party library then
+   - Add the new submodule folder to
+     [`LIBTORCHTEXT_INCLUDE_DIRS`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L24)
+     and to
+     [`EXTENSION_INCLUDE_DIRS`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L119)
+   - Add the submodule name to
+     [`LIBTORCHTEXT_LINK_LIBRARIES`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L33)
+4. Verify the torchtext build works by running `python setup.py develop`
+
+## Adding a Custom C++ Operator
+
+Custom C++ operators can be implemented and registered in torchtext for several reasons including to make an existing
+Python component more efficient, and to get around the limitations when working with multithreading in Python (due to
+the Global Interpreter Lock). These custom kernels (or “ops”) can be embedded into a TorchScripted model and can be
+executed both in Python and in their serialized form directly in C++. You can learn more in this
+[tutorial on writing custom C++ operators](https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html)
+
+Steps to register an operator:
+
+1. Add the new custom operator to the [`torchtext/csrc`](https://github.com/pytorch/text/tree/main/torchtext/csrc)
+   folder. This entails writing the header and the source file for the custom op.
+2. Add the new source files to the
+   [`LIBTORCHTEXT_SOURCES`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L11)
+   list.
+3. Register the operators with torchbind and pybind. Torchbind registration happens in the
+   [`register_torchbindings.cpp`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/register_torchbindings.cpp#L14)
+   file. Pybind registration happens in the
+   [`register_pybindings.cpp`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/register_pybindings.cpp#L34)
+   file.
+4. Write a Python wrapper class that is responsible for exposing the torchbind/pybind registered operators via Python.
+   You can find some examples of this in the
+   [`torchtext/transforms.py`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/transforms.py#L274)
+   file.
+5. Write a unit test that tests the functionality of the operator through the Python wrapper class. You can find some
+   examples in the
+   [`test/test_transforms.py`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/test/test_transforms.py#L317)
+   file.
+
 ## Contributor License Agreement ("CLA")
 
 In order to accept your pull request, we need you to submit a CLA. You only need to do this once to work on any of
From a7a84376565f9f0325e1c64b799e17b79831d98a Mon Sep 17 00:00:00 2001
From: Nayef Ahmed
Date: Wed, 25 May 2022 18:40:47 -0400
Subject: [PATCH 2/4] Fix formatting

---
 CONTRIBUTING.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 8312172216..3e257fad7b 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -84,11 +84,13 @@ Steps to register an operator:
 2. Add the new source files to the
    [`LIBTORCHTEXT_SOURCES`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L11)
    list.
-3. Register the operators with torchbind and pybind. Torchbind registration happens in the
-   [`register_torchbindings.cpp`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/register_torchbindings.cpp#L14)
-   file. Pybind registration happens in the
-   [`register_pybindings.cpp`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/register_pybindings.cpp#L34)
-   file.
+3. Register the operators with torchbind and pybind
+   - Torchbind registration happens in the
+     [`register_torchbindings.cpp`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/register_torchbindings.cpp#L14)
+     file
+   - Pybind registration happens in the
+     [`register_pybindings.cpp`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/register_pybindings.cpp#L34)
+     file.
 4. Write a Python wrapper class that is responsible for exposing the torchbind/pybind registered operators via Python.
    You can find some examples of this in the
    [`torchtext/transforms.py`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/transforms.py#L274)
From 2bebc633bb3ed50da87a81d124c0dd1cf17dd527 Mon Sep 17 00:00:00 2001
From: Nayef Ahmed
Date: Thu, 26 May 2022 14:34:41 -0400
Subject: [PATCH 3/4] Fixing PR comments

---
 CONTRIBUTING.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 3e257fad7b..6172f9b182 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -52,7 +52,7 @@ where `$CLANG_FORMAT` denotes the path to the downloaded binary.
 
 The following steps outline how to add third party libraries to torchtext:
 
-1. Add the git repo you care about as a submodule. Here is a great
+1. Add the third party library as a submodule. Here is a great
    [tutorial](https://www.atlassian.com/git/tutorials/git-submodule) on working with submodules in git.
    - Navigate to `third_party/` folder and run `git submodule add `
    - Verify the newly added module is present in the
@@ -65,8 +65,9 @@ The following steps outline how to add third party libraries to torchtext:
      [`LIBTORCHTEXT_INCLUDE_DIRS`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L24)
      and to
      [`EXTENSION_INCLUDE_DIRS`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L119)
-   - Add the submodule name to
+   - Add the "targets" name defined by the third party library's `CMakeLists` file to
      [`LIBTORCHTEXT_LINK_LIBRARIES`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L33)
+   - Note that the third party libraries are linked statically with torchtext
 4. Verify the torchtext build works by running `python setup.py develop`
 
 ## Adding a Custom C++ Operator
From 3310bddb7c55389df11d0b0ffa87bd1a5e96de8c Mon Sep 17 00:00:00 2001
From: Nayef Ahmed
Date: Thu, 26 May 2022 14:39:36 -0400
Subject: [PATCH 4/4] Resolve PR comment

---
 CONTRIBUTING.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 6172f9b182..1e1d0e0502 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -50,7 +50,8 @@ where `$CLANG_FORMAT` denotes the path to the downloaded binary.
 
 ## Adding Third Party Libraries
 
-The following steps outline how to add third party libraries to torchtext:
+The following steps outline how to add third party libraries to torchtext. We assume that the third party library has
+correctly setup their `CMakeLists.txt` file for other libraries to take a dependency on.
 
 1. Add the third party library as a submodule. Here is a great
    [tutorial](https://www.atlassian.com/git/tutorials/git-submodule) on working with submodules in git.
@@ -65,7 +66,7 @@ The following steps outline how to add third party libraries to torchtext:
      [`LIBTORCHTEXT_INCLUDE_DIRS`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L24)
      and to
      [`EXTENSION_INCLUDE_DIRS`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L119)
-   - Add the "targets" name defined by the third party library's `CMakeLists` file to
+   - Add the "targets" name defined by the third party library's `CMakeLists.txt` file to
      [`LIBTORCHTEXT_LINK_LIBRARIES`](https://github.com/pytorch/text/blob/70fc1040ee40faf129604557107cc59fd51c4fe2/torchtext/csrc/CMakeLists.txt#L33)
    - Note that the third party libraries are linked statically with torchtext
 4. Verify the torchtext build works by running `python setup.py develop`
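
Step 1 of the third party library guidelines in this series asks contributors to verify that the new submodule is present in `.gitmodules`. As a rough sketch only, that check could be scripted as below. The helper name `submodule_paths`, the `third_party/examplelib` path, and the `example.com` URL are hypothetical and not part of torchtext; the sketch assumes `.gitmodules` uses the usual `[submodule "..."]` sections with a `path` key.

```python
import configparser
from typing import List


def submodule_paths(gitmodules_text: str) -> List[str]:
    """Return the `path` value of every submodule section in .gitmodules text."""
    # interpolation=None keeps any '%' characters in URLs literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    return [
        parser[section]["path"]
        for section in parser.sections()
        if section.startswith("submodule ") and "path" in parser[section]
    ]


# Hypothetical .gitmodules content; the library name and URL are made up.
EXAMPLE = (
    '[submodule "third_party/examplelib"]\n'
    "\tpath = third_party/examplelib\n"
    "\turl = https://example.com/examplelib.git\n"
)

if __name__ == "__main__":
    # Report whether the hypothetical submodule is registered.
    print("third_party/examplelib" in submodule_paths(EXAMPLE))
```

In a real checkout the text would come from reading the repository's `.gitmodules` file rather than from the inline `EXAMPLE` string.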