
Conversation

gs-olive
Contributor

Description

  • Fix compilation error for GPT-2 model arising from Byte-type inputs fed into TensorRT Engine
  • Update translation dictionary between Torch and TensorRT types to include at::kByte
  • Add field to PartitioningInfo specifying whether to cast Int8 inputs to TensorRT Engines to Int, to avoid the error arising from Int8 inputs being fed into non-quantized engines
  • Add automatic detection of quantized/calibrated models and disable Int8 => Int32 casting in those cases (the type mapping and cast decision are sketched after this list)
  • Fix bug where LoweringInfo target device was not being updated for Python API
  • Allow castNode to force creation of a new node and avoid searching for an existing one to convert
  • Add test to ensure cast is inserted in the Torch engine preceding a TensorRT engine, when the Byte tensor is an output of the Torch engine
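For illustration only, here is a minimal C++ sketch of the idea behind the type-dictionary and Int8-casting bullets above. All identifiers below are assumptions made for the sketch, not the repository's actual names:

```cpp
// Sketch only: assumed names, not Torch-TensorRT's real identifiers.
#include <ATen/ATen.h>
#include <NvInfer.h>
#include <map>

namespace sketch {

// Torch -> TensorRT scalar type dictionary; at::kByte is the new entry,
// mapping unsigned 8-bit Torch tensors onto TensorRT's kINT8.
const std::map<at::ScalarType, nvinfer1::DataType>& torch_to_trt_type() {
  static const std::map<at::ScalarType, nvinfer1::DataType> m = {
      {at::kFloat, nvinfer1::DataType::kFLOAT},
      {at::kHalf, nvinfer1::DataType::kHALF},
      {at::kInt, nvinfer1::DataType::kINT32},
      {at::kChar, nvinfer1::DataType::kINT8},
      {at::kByte, nvinfer1::DataType::kINT8},  // newly added mapping
      {at::kBool, nvinfer1::DataType::kBOOL},
  };
  return m;
}

// Int8/Byte tensors entering an engine without Q/DQ layers trigger the
// dynamic-range error shown below, so they are promoted to Int32 unless the
// model was quantized/calibrated for Int8.
bool should_cast_to_int32(at::ScalarType t, bool engine_is_quantized) {
  return (t == at::kChar || t == at::kByte) && !engine_is_quantized;
}

}  // namespace sketch
```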

Error displayed when passing Int8 inputs to a non-quantized TRT Engine:

ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: input_0: input/output with DataType Int8 in network without Q/DQ layers must have dynamic range set when no calibrator is used.
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [network.cpp::validate::2772] Error Code 4: Internal Error (DataType does not match TensorFormats.)
ERROR: [Torch-TensorRT TorchScript Conversion Context] - 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

With this PR, GPT-2 now compiles and runs inference successfully.
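
For context, a hedged sketch of the end-to-end scenario using the public C++ TorchScript API. The module path, sequence length, and vocabulary size are placeholders; GPT-2 would be traced to TorchScript beforehand:

```cpp
// Sketch only: exercises the fallback path this PR fixes.
#include <torch/script.h>
#include "torch_tensorrt/torch_tensorrt.h"

int main() {
  auto mod = torch::jit::load("gpt2_traced.ts");  // assumed artifact

  auto in = torch_tensorrt::Input({1, 64}, torch_tensorrt::DataType::kInt);
  torch_tensorrt::ts::CompileSpec spec({in});
  spec.truncate_long_and_double = true;
  // Partial compilation: unsupported ops fall back to Torch, so intermediate
  // Byte/Int8 tensors can cross Torch/TensorRT segment boundaries.
  spec.require_full_compilation = false;

  auto trt_mod = torch_tensorrt::ts::compile(mod, spec);
  auto tokens = torch::randint(0, 50257, {1, 64}, torch::kInt);
  auto out = trt_mod.forward({tokens});
  return 0;
}
```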

Fixes #1455

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist:

  • [x] My code follows the style guidelines of this project (you can use the linters)
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas and hacks
  • [x] I have made corresponding changes to the documentation
  • [x] I have added tests to verify my fix or my feature
  • [x] New and existing unit tests pass locally with my changes
  • [x] I have added the relevant labels to my PR so that relevant reviewers are notified

@gs-olive gs-olive requested a review from bowang007 December 14, 2022 04:58
@github-actions github-actions bot added the component: api [Python], component: api [C++], component: core, component: partitioning, and component: tests labels Dec 14, 2022
@gs-olive gs-olive self-assigned this Dec 14, 2022
@gs-olive gs-olive requested a review from peri044 December 21, 2022 04:53
@gs-olive gs-olive changed the title from "fix: Properly cast Int8 inputs to TensorRT Engines in Fallback" to "fix: Properly cast intermediate Int8 tensors to TensorRT Engines in Fallback" Dec 21, 2022
Comment on lines 233 to 245
if (partitioning_info.truncate_long_and_double) {
for (size_t i = 0; i < seg_block.inputs().size(); ++i) {
if (ivalues_maps[seg_block.raw_inputs()[i]].isTensor()) {
auto cur_ivalue = ivalues_maps[seg_block.raw_inputs()[i]];
at::ScalarType t = cur_ivalue.toTensor().scalar_type();
if (t == at::kLong) {
// we add a cast operation to cast the type to Int64
auto cast_node = createCastNode(seg_block, i, true, target_device);
seg_block.g()->prependNode(cast_node);
seg_block.inputs()[i]->replaceAllUsesAfterNodeWith(cast_node, cast_node->outputs()[0]);
}
}
}
Collaborator

Are these just linter formatting changes?

Contributor Author

I manually made the formatting changes to reduce the redundancy of the if statements, but they should be functionally equivalent to the previous version.
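
For readers skimming the diff, here is a rough sketch of what the de-duplicated loop looks like with the conditions folded together. The shape, flag names, and template parameters are assumptions, not the merged code:

```cpp
// Sketch only: a flattened version of the quoted loop. `SegBlock` stands in
// for the real partitioning::SegmentedBlock; the flags mirror the assumed
// PartitioningInfo fields.
#include <torch/csrc/jit/ir/ir.h>
#include <ATen/ATen.h>

template <typename SegBlock, typename IValueMap, typename Device, typename CastNodeFn>
void insert_input_casts(
    SegBlock& seg_block,
    IValueMap& ivalues_maps,
    const Device& target_device,
    bool truncate_long,  // PartitioningInfo::truncate_long_and_double
    bool cast_int8,      // the new flag for Int8/Byte inputs
    CastNodeFn createCastNode) {
  for (size_t i = 0; i < seg_block.inputs().size(); ++i) {
    auto ivalue = ivalues_maps[seg_block.raw_inputs()[i]];
    if (!ivalue.isTensor()) {
      continue;
    }
    at::ScalarType t = ivalue.toTensor().scalar_type();
    // One combined predicate replaces the previously nested if statements.
    bool needs_cast =
        (t == at::kLong && truncate_long) || (t == at::kByte && cast_int8);
    if (!needs_cast) {
      continue;
    }
    auto* cast_node = createCastNode(seg_block, i, /*is_input=*/true, target_device);
    seg_block.g()->prependNode(cast_node);
    seg_block.inputs()[i]->replaceAllUsesAfterNodeWith(cast_node, cast_node->outputs()[0]);
  }
}
```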

- Address review comments
- Improve documentation and logging messages
- Restructure casting function to allow for casting of variable data types (a sketch of the helper follows below)
- Add casting for `at::kByte` segment block inputs as well as segment block outputs
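
As a rough illustration of the restructured helper named above, a minimal sketch of forcing creation of a fresh aten::to node. The function name, signature, and reuse heuristic here are assumptions, not the repository's actual castNode:

```cpp
// Sketch only: an assumed stand-in for the repository's castNode helper,
// showing the new force-create behavior.
#include <torch/csrc/jit/ir/ir.h>

torch::jit::Node* make_cast_node(
    torch::jit::Graph& g,
    torch::jit::Value* in,
    at::ScalarType dtype,
    bool force_create_node) {
  const auto kAtenTo = c10::Symbol::fromQualString("aten::to");
  if (!force_create_node) {
    // Old behavior: reuse an existing aten::to consumer of this value.
    for (const auto& use : in->uses()) {
      if (use.user->kind() == kAtenTo) {
        return use.user;
      }
    }
  }
  // Build aten::to(self, dtype, non_blocking=False, copy=False,
  // memory_format=None) so the cast can be placed exactly where the
  // partition needs it (e.g., via prependNode on the segment graph).
  auto* dtype_c = g.insertConstant(static_cast<int64_t>(dtype));
  auto* non_blocking = g.insertConstant(false);
  auto* copy = g.insertConstant(false);
  auto* mem_format = g.insertConstant(torch::jit::IValue());  // None
  return g.create(kAtenTo, {in, dtype_c, non_blocking, copy, mem_format});
}
```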
@gs-olive gs-olive requested a review from peri044 December 22, 2022 03:32
@peri044 peri044 (Collaborator) left a comment

LGTM

@peri044 peri044 merged commit 544654f into pytorch:master Dec 22, 2022

Labels

cla signed, component: api [C++], component: api [Python], component: core, component: partitioning, component: tests


Development

Successfully merging this pull request may close these issues.

🐛 [Bug] Compilation Error on GPT-2

3 participants