This repository was archived by the owner on Sep 10, 2025. It is now read-only.
Merged
22 changes: 11 additions & 11 deletions .circleci/config.yml
@@ -41,7 +41,7 @@ commands:
steps:
- run:
name: Generate CCI cache key
- command:
+ command: |
Contributor

🧐 Is this cosmetic or was it broken before?

echo "$(date "+%D")" > .cachekey
cat .circleci/cached_datasets_list.txt >> .cachekey
- persist_to_workspace:
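
The cache-key step above writes the current date and the dataset list into `.cachekey`, and the `save_cache` keys later embed a checksum of that file. A minimal Python sketch of the idea follows. It assumes a SHA-256 hex digest as a stand-in for CircleCI's `checksum` template function, and the helper names `build_cachekey` and `cache_key` are hypothetical, not part of this repository.

```python
import hashlib
from datetime import date


def build_cachekey(today: date, dataset_list: str) -> str:
    """Mimic the shell step: the date in `date "+%D"` form (mm/dd/yy), then the dataset list."""
    return today.strftime("%m/%d/%y") + "\n" + dataset_list


def cache_key(prefix: str, cachekey_contents: str) -> str:
    """Hypothetical stand-in for `{{ checksum ".cachekey" }}`: hash the file contents."""
    digest = hashlib.sha256(cachekey_contents.encode("utf-8")).hexdigest()
    return f"{prefix}-{digest}"


# Illustrative contents only, not the real cached_datasets_list.txt.
datasets = "AG_NEWS\nIMDB\n"

day_one = cache_key("v1-linux-dataset", build_cachekey(date(2021, 3, 1), datasets))
day_one_again = cache_key("v1-linux-dataset", build_cachekey(date(2021, 3, 1), datasets))
day_two = cache_key("v1-linux-dataset", build_cachekey(date(2021, 3, 2), datasets))
```

Under these assumptions, two runs on the same day with the same dataset list produce the same key and can reuse the cache, while a new day or an edited dataset list produces a new key and triggers regeneration.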
@@ -380,24 +380,24 @@ jobs:
name: Generate cache
no_output_timeout: 30m
command: |
- if [ ! -f .data/cache_status_file.json ] ; then
+ if [ ! -f /root/.torchtext/cache/cache_status_file.json ] ; then
.circleci/unittest/linux/scripts/setup_env.sh
.circleci/unittest/linux/scripts/install.sh
.circleci/unittest/linux/scripts/generate_cache.sh
fi
- cat .data/cache_status_file.json
+ cat /root/.torchtext/cache/cache_status_file.json
- save_cache:

key: v1-linux-dataset-{{ checksum ".cachekey" }}

paths:
- - .data
+ - /root/.torchtext/cache
- save_cache:

key: v1-linux-cache-index-{{ checksum ".cachekey" }}

paths:
- - .data/cache_status_file.json
+ - /root/.torchtext/cache/cache_status_file.json

unittest_linux:
<<: *binary_common
@@ -432,7 +432,7 @@ jobs:

paths:
- .vector_cache
- - .data
+ - /root/.torchtext/cache
- run:
name: Post process
command: .circleci/unittest/linux/scripts/post_process.sh
@@ -457,24 +457,24 @@ jobs:
name: Generate daily data Cache
no_output_timeout: 30m
command: |
- if [ ! -f .data/cache_status_file.json ] ; then
+ if [ ! -f C:/Users/circleci/.torchtext/cache/cache_status_file.json ] ; then
.circleci/unittest/windows/scripts/setup_env.sh
.circleci/unittest/windows/scripts/install.sh
.circleci/unittest/windows/scripts/generate_cache.sh
fi
- cat .data/cache_status_file.json
+ cat C:/Users/circleci/.torchtext/cache/cache_status_file.json
- save_cache:

key: v1-windows-dataset-{{ checksum ".cachekey" }}

paths:
- - .data
+ - C:/Users/circleci/.torchtext/cache
- save_cache:

key: v1-windows-cache-index-{{ checksum ".cachekey" }}

paths:
- - .data/cache_status_file.json
+ - C:/Users/circleci/.torchtext/cache/cache_status_file.json

unittest_windows:
<<: *binary_common
@@ -509,7 +509,7 @@ jobs:

paths:
- .vector_cache
- - .data
+ - C:/Users/circleci/.torchtext/cache
- run:
name: Post process
command: .circleci/unittest/windows/scripts/post_process.sh
22 changes: 11 additions & 11 deletions .circleci/config.yml.in
@@ -41,7 +41,7 @@ commands:
steps:
- run:
name: Generate CCI cache key
- command:
+ command: |
echo "$(date "+%D")" > .cachekey
cat .circleci/cached_datasets_list.txt >> .cachekey
- persist_to_workspace:
@@ -380,24 +380,24 @@ jobs:
name: Generate cache
no_output_timeout: 30m
command: |
- if [ ! -f .data/cache_status_file.json ] ; then
+ if [ ! -f /root/.torchtext/cache/cache_status_file.json ] ; then
.circleci/unittest/linux/scripts/setup_env.sh
.circleci/unittest/linux/scripts/install.sh
.circleci/unittest/linux/scripts/generate_cache.sh
fi
- cat .data/cache_status_file.json
+ cat /root/.torchtext/cache/cache_status_file.json
- save_cache:
{% raw %}
key: v1-linux-dataset-{{ checksum ".cachekey" }}
{% endraw %}
paths:
- - .data
+ - /root/.torchtext/cache
- save_cache:
{% raw %}
key: v1-linux-cache-index-{{ checksum ".cachekey" }}
{% endraw %}
paths:
- - .data/cache_status_file.json
+ - /root/.torchtext/cache/cache_status_file.json

unittest_linux:
<<: *binary_common
@@ -432,7 +432,7 @@ jobs:
{% endraw %}
paths:
- .vector_cache
- - .data
+ - /root/.torchtext/cache
- run:
name: Post process
command: .circleci/unittest/linux/scripts/post_process.sh
@@ -457,24 +457,24 @@ jobs:
name: Generate daily data Cache
no_output_timeout: 30m
command: |
- if [ ! -f .data/cache_status_file.json ] ; then
+ if [ ! -f C:/Users/circleci/.torchtext/cache/cache_status_file.json ] ; then
.circleci/unittest/windows/scripts/setup_env.sh
.circleci/unittest/windows/scripts/install.sh
.circleci/unittest/windows/scripts/generate_cache.sh
fi
- cat .data/cache_status_file.json
+ cat C:/Users/circleci/.torchtext/cache/cache_status_file.json
- save_cache:
{% raw %}
key: v1-windows-dataset-{{ checksum ".cachekey" }}
{% endraw %}
paths:
- - .data
+ - C:/Users/circleci/.torchtext/cache
- save_cache:
{% raw %}
key: v1-windows-cache-index-{{ checksum ".cachekey" }}
{% endraw %}
paths:
- - .data/cache_status_file.json
+ - C:/Users/circleci/.torchtext/cache/cache_status_file.json

unittest_windows:
<<: *binary_common
@@ -509,7 +509,7 @@ jobs:
{% endraw %}
paths:
- .vector_cache
- - .data
+ - C:/Users/circleci/.torchtext/cache
- run:
name: Post process
command: .circleci/unittest/windows/scripts/post_process.sh
4 changes: 2 additions & 2 deletions test/common/cache_utils.py
@@ -3,11 +3,11 @@
import torchtext
from .parameterized_utils import load_params

- CACHE_STATUS_FILE = '.data/cache_status_file.json'
+ CACHE_STATUS_FILE = os.path.join(os.path.expanduser('~/.torchtext/cache'), 'cache_status_file.json')


def check_cache_status():
- assert os.path.exists(CACHE_STATUS_FILE), "Cache status file does not exists"
+ assert os.path.exists(CACHE_STATUS_FILE), "Cache status file [{}] does not exists".format(CACHE_STATUS_FILE)
with open(CACHE_STATUS_FILE, 'r') as f:
missing_datasets = []
cache_status = json.load(f)
4 changes: 2 additions & 2 deletions torchtext/data/datasets_utils.py
@@ -213,7 +213,7 @@ def _wrap_split_argument_with_fn(fn, splits):
raise ValueError("Internal Error: Given function {} did not adhere to standard signature.".format(fn))

@functools.wraps(fn)
- def new_fn(root='.data', split=splits, **kwargs):
+ def new_fn(root=os.path.expanduser('~/.torchtext/cache'), split=splits, **kwargs):
Contributor

Is this a BC-breaking change?

Contributor Author

It is a good question. This is the default path where datasets will be downloaded. In the worst case, the data would be downloaded again for users relying on the default behavior instead of providing their own root folder. Is there anything else that could potentially make it BC-breaking?

Contributor

I see. I think it's borderline, but it is okay to treat it as backward compatible.
Still, we should probably emphasize it in the upcoming release notes.

result = []
for item in _check_default_set(split, splits, fn.__name__):
result.append(fn(root, item, **kwargs))
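
The review thread on this hunk turns on what the new default means for callers. The sketch below is a simplified, hypothetical stand-in for the decorated dataset functions (`fake_dataset` and `DEFAULT_ROOT` are illustrative names, not torchtext APIs). It shows only how the default `root` resolves and how a caller who wants the previous location could pass `root='.data'` explicitly.

```python
import os

# New default from this PR: a per-user cache directory under the home folder.
DEFAULT_ROOT = os.path.expanduser('~/.torchtext/cache')


def fake_dataset(root=DEFAULT_ROOT, dataset_name='EXAMPLE'):
    """Hypothetical helper: return the directory a dataset would be stored in."""
    return os.path.join(root, dataset_name)


# Relying on the default now points at the home-directory cache.
new_location = fake_dataset()

# Passing the old default explicitly keeps data relative to the working directory.
old_location = fake_dataset(root='.data')
```

Callers who already pass their own `root` are unaffected in this sketch; only code relying on the default sees a different directory, which matches the re-download scenario described in the thread.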
@@ -250,7 +250,7 @@ def decorator(func):
raise ValueError("Internal Error: Given function {} did not adhere to standard signature.".format(fn))

@functools.wraps(func)
- def wrapper(root='.data', *args, **kwargs):
+ def wrapper(root=os.path.expanduser('~/.torchtext/cache'), *args, **kwargs):
new_root = os.path.join(root, dataset_name)
if not os.path.exists(new_root):
os.makedirs(new_root)