Skip to content
This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Conversation

@NicolasHug
Copy link
Member

This PR removes the dependency on torchdata:

  • from the torchtext package (setup.py / requirements.txt, etc.)
  • from the CI: all unittest and build jobs.

The files that still have references to torchdata after this PR are:

(base) ➜  text git:(remove_torchdata) ✗ git grep -i -l torchdata 
CONTRIBUTING_DATASETS.md
README.rst
docs/source/datasets.rst
examples/torcharrow/README.md
examples/tutorials/sst2_classification_non_distributed.py
examples/tutorials/t5_demo.py
test/torchtext_unittest/datasets/common.py
test/torchtext_unittest/datasets/test_agnews.py
test/torchtext_unittest/datasets/test_amazonreviews.py
test/torchtext_unittest/datasets/test_cc100.py
test/torchtext_unittest/datasets/test_cnndm.py
test/torchtext_unittest/datasets/test_cola.py
test/torchtext_unittest/datasets/test_conll2000chunking.py
test/torchtext_unittest/datasets/test_dbpedia.py
test/torchtext_unittest/datasets/test_enwik9.py
test/torchtext_unittest/datasets/test_imdb.py
test/torchtext_unittest/datasets/test_iwslt2016.py
test/torchtext_unittest/datasets/test_iwslt2017.py
test/torchtext_unittest/datasets/test_mnli.py
test/torchtext_unittest/datasets/test_mrpc.py
test/torchtext_unittest/datasets/test_multi30k.py
test/torchtext_unittest/datasets/test_penntreebank.py
test/torchtext_unittest/datasets/test_qnli.py
test/torchtext_unittest/datasets/test_qqp.py
test/torchtext_unittest/datasets/test_rte.py
test/torchtext_unittest/datasets/test_sogounews.py
test/torchtext_unittest/datasets/test_squads.py
test/torchtext_unittest/datasets/test_sst2.py
test/torchtext_unittest/datasets/test_stsb.py
test/torchtext_unittest/datasets/test_udpos.py
test/torchtext_unittest/datasets/test_wikitexts.py
test/torchtext_unittest/datasets/test_wnli.py
test/torchtext_unittest/datasets/test_yahooanswers.py
test/torchtext_unittest/datasets/test_yelpreviews.py
torchtext/datasets/ag_news.py
torchtext/datasets/amazonreviewfull.py
torchtext/datasets/amazonreviewpolarity.py
torchtext/datasets/cc100.py
torchtext/datasets/cnndm.py
torchtext/datasets/cola.py
torchtext/datasets/conll2000chunking.py
torchtext/datasets/dbpedia.py
torchtext/datasets/enwik9.py
torchtext/datasets/imdb.py
torchtext/datasets/iwslt2016.py
torchtext/datasets/iwslt2017.py
torchtext/datasets/mnli.py
torchtext/datasets/mrpc.py
torchtext/datasets/multi30k.py
torchtext/datasets/penntreebank.py
torchtext/datasets/qnli.py
torchtext/datasets/qqp.py
torchtext/datasets/rte.py
torchtext/datasets/sogounews.py
torchtext/datasets/squad1.py
torchtext/datasets/squad2.py
torchtext/datasets/sst2.py
torchtext/datasets/stsb.py
torchtext/datasets/udpos.py
torchtext/datasets/wikitext103.py
torchtext/datasets/wikitext2.py
torchtext/datasets/wnli.py
torchtext/datasets/yahooanswers.py
torchtext/datasets/yelpreviewfull.py
torchtext/datasets/yelpreviewpolarity.py

The torchtext.datasets namespace still relies on torchdata but that's OK: users who still need it can just install torchdata manually. Similarly I did not update/remove the tests/datasets/* files and instead just ignored all the dataset tests in pytest.ini.

I will be submitting follow-up PRs to also address the user-facing docs, raise proper warnings, etc.

@NicolasHug NicolasHug requested a review from atalman March 22, 2024 11:49
@pytorch-bot
Copy link

pytorch-bot bot commented Mar 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/text/2241

Note: Links to docs will display an error until the docs builds have been completed.

❌ 19 New Failures, 5 Unrelated Failures

As of commit 9e13774 with merge base 52c0d85 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

if channel == "nightly":
validateTorchdataVersion()

print("torchtext version is ", torchtext.__version__)
Copy link
Contributor

@atalman atalman Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could include 1-2 basic torchtext functionalities tests, here to see if they work. This can be done in followup pr

@NicolasHug NicolasHug merged commit 51f373d into pytorch:main Mar 22, 2024
@NicolasHug NicolasHug deleted the remove_torchdata branch March 22, 2024 14:34
PaliC added a commit that referenced this pull request Mar 22, 2024
huydhn added a commit that referenced this pull request Mar 23, 2024
* Remove torchdata dependency from package and from CI (#2241)

* Fix torchdata import error (#2242)

* Remove stuff

* stuff

* lint

---------

Co-authored-by: Nicolas Hug <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants