This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

@pmabbo13 pmabbo13 commented Jul 18, 2022

Description

Add pre-trained T5 model weights for the base configuration, along with an API to load them.

Process

  • The T5Conf dataclass was created to store the configuration values of the model
  • The T5Bundle API was created. Its build_model method initializes a T5 model (encoder-only or encoder-decoder) according to the input configuration and, if a checkpoint is provided, loads the pre-trained weights from it. The get_model method calls build_model using the object's config and path attributes (the latter stores a path to the saved pre-trained weights).
  • T5_BASE_ENCODER is a bundler object that is configured to be the encoder-only base model, and whose path attribute points to the t5.base.encoder.pt pre-trained weights in the s3 bucket.
  • T5_BASE is a bundler object that is configured to be the encoder-decoder base model, and whose path attribute points to the t5.base.pt pre-trained weights in the s3 bucket.
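The bundler pattern described above can be sketched in minimal, framework-free Python. The class, method, and attribute names below mirror the PR's description, but the bodies and configuration fields are hypothetical stand-ins for illustration, not the actual torchtext implementation, and the weight paths are placeholders rather than the real s3 URLs:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class T5Conf:
    # Hypothetical subset of configuration fields; the real T5Conf
    # stores the full set of model hyperparameters.
    encoder_only: bool = False
    embedding_dim: int = 768
    num_attention_heads: int = 12


class T5Bundle:
    """Pairs a T5Conf with a path to saved pre-trained weights."""

    def __init__(self, config: T5Conf, path: Optional[str] = None):
        self._config = config
        self._path = path

    @staticmethod
    def build_model(config: T5Conf, checkpoint: Optional[str] = None) -> dict:
        # Stand-in for model construction: a real implementation would
        # instantiate an encoder-only or encoder-decoder T5 module and
        # load the checkpoint's state dict into it.
        return {"encoder_only": config.encoder_only, "weights": checkpoint}

    def get_model(self) -> dict:
        # get_model delegates to build_model, passing the bundle's own
        # config and stored weights path.
        return T5Bundle.build_model(self._config, self._path)


# Bundle objects analogous to T5_BASE_ENCODER and T5_BASE; the path
# strings here are illustrative placeholders only.
T5_BASE_ENCODER = T5Bundle(T5Conf(encoder_only=True), path="t5.base.encoder.pt")
T5_BASE = T5Bundle(T5Conf(encoder_only=False), path="t5.base.pt")
```

A user would then obtain a pre-configured model with a single call such as `T5_BASE_ENCODER.get_model()`, without having to specify the configuration or weight location themselves.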

Testing

Informal testing was done in this notebook to ensure that we can successfully load the pre-trained models using the bundler objects, and that they have the same output as the HuggingFace implementation.

This PR description will be updated once formal unittests and integration tests have been added.

@Nayef211 (Contributor) left a comment

LGTM! Do we plan to add the end-to-end test in a followup PR? For reference, the integration tests can be found here and the unit tests for the existing RobertaBundle can be found here

@pmabbo13 (Contributor, Author) replied

> LGTM! Do we plan to add the end-to-end test in a followup PR? For reference, the integration tests can be found here and the unit tests for the existing RobertaBundle can be found here (https://github.com/pytorch/text/blob/main/test/models/test_models.py)

I intended to add the tests in a separate PR, but I have no opposition to adding it to this PR.

@pmabbo13 pmabbo13 requested review from abhinavarora and parmeet July 19, 2022 14:44
@parmeet (Contributor) left a comment

Overall LGTM! Do we plan to add integration tests in follow-up PR?


parmeet commented Jul 19, 2022

> Overall LGTM! Do we plan to add integration tests in follow-up PR?

Oops, just saw the answer here: #1846 (comment). Feel free to ignore :)

@Nayef211

> I intended to add the tests in a separate PR, but I have no opposition to adding it to this PR.

Gotcha that should be fine. I do agree with @parmeet's comment about not expecting the user to pass in the decoder_input if we already know the padding indices from the input config to the model.


parmeet commented Jul 19, 2022

> > LGTM! Do we plan to add the end-to-end test in a followup PR? For reference, the integration tests can be found here and the unit tests for the existing RobertaBundle can be found here (https://github.com/pytorch/text/blob/main/test/models/test_models.py)
>
> I intended to add the tests in a separate PR, but I have no opposition to adding it to this PR.

Sure, feel free to do it in a separate PR.

@pmabbo13 pmabbo13 merged commit ca2e5a4 into pytorch:main Jul 19, 2022
@pmabbo13 pmabbo13 deleted the feature/t5-bundler branch July 19, 2022 17:30
