Description
.from_pretrained() can work in offline mode by loading from the cache, but we lack a method to explicitly populate that cache.
I'd like something along the lines of .cache_pretrained_for_offline() that fetches all the files necessary for .from_pretrained() but doesn't actually load the weights.
Use cases would include:
- something you do in an installation step or a "prepare for offline use" action to avoid loading delays later in the application, or in anticipation of network access becoming unavailable.
- preparing an environment (archive, container, disk image, etc.) on a low-resource machine that will then be copied over to the high-spec machine for production use.
It should be able to run without a GPU (or other intended target device for the model) or heaps of RAM.
The advantage of populating the huggingface_hub cache, rather than saving a copy of the model to an application-specific local path, is that the cache is shared with other applications, you don't need any extra code to apply updates to your copy, you don't need a switch to change from the default on-demand loading location to your local copy, etc.
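
As a rough sketch of the behaviour I have in mind (not a proposed implementation), something like the following might work, assuming `huggingface_hub.snapshot_download` is an acceptable way to fill the cache. The function name `cache_pretrained_for_offline`, the `download` parameter, and the repo id in the usage comment are hypothetical and only here for illustration. Note that `snapshot_download` fetches every file in the repo unless filtered with `allow_patterns`, which may be more than `.from_pretrained()` actually needs.

```python
from typing import Callable, Optional, Sequence


def cache_pretrained_for_offline(
    repo_id: str,
    revision: Optional[str] = None,
    allow_patterns: Optional[Sequence[str]] = None,
    download: Optional[Callable[..., str]] = None,
) -> str:
    """Hypothetical helper: download a repo's files into the shared hub cache.

    Files are only written to disk; no weights are loaded into memory,
    so this should not need a GPU or a large amount of RAM.
    Returns the local path of the cached snapshot.
    """
    if download is None:
        # Imported lazily so this module can be imported (and tested)
        # without huggingface_hub installed.
        from huggingface_hub import snapshot_download

        download = snapshot_download

    kwargs = {"repo_id": repo_id}
    if revision is not None:
        kwargs["revision"] = revision
    if allow_patterns is not None:
        # Optional glob filters to restrict which files are fetched.
        kwargs["allow_patterns"] = list(allow_patterns)
    return download(**kwargs)


# Usage (placeholder repo id, requires network access):
#   path = cache_pretrained_for_offline("some-org/some-model")
# A later .from_pretrained("some-org/some-model") could then read from the cache.
```

The `download` argument is injectable only so the helper can be exercised without network access; a real method would presumably work out the exact file list the same way `.from_pretrained()` does.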