purge: Add dvc purge command #10880
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #10880 +/- ##
==========================================
+ Coverage 90.68% 90.98% +0.29%
==========================================
Files 504 508 +4
Lines 39795 41025 +1230
Branches 3141 3257 +116
==========================================
+ Hits 36087 37325 +1238
- Misses 3042 3054 +12
+ Partials 666 646 -20
Specific question for reviewers: are there parts of this code that duplicate existing helpers in the codebase that I don't know about? (I haven't done much development in DVC.)
Note: reviewers can get a feel for the tool by running the following (fish shell syntax; be sure to have the DVC version from this PR installed):
# 1. Initialize a Git repo
git init dvc-repo
cd dvc-repo
# 2. Initialize DVC
dvc init
# 3. Create a few 1MB junk files
#    (fish loop; in bash use: for i in $(seq 1 5); do ...; done)
for i in (seq 1 5)
    head -c 1M </dev/urandom > file_$i.bin
end
# 4. Add the files to DVC
dvc add file_*.bin
# 5. Commit changes to Git
git add .
git commit -m "Initialize DVC repo with 1MB junk files"
1. Preview what files would be deleted
$ dvc purge --dry-run
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
ERROR: No default remote configured. Cannot safely purge outputs without verifying remote backup.
Use `--force` to purge anyway.
2. Preview what files would be deleted (with
Hi, thank you for creating the pull request. I am OOO, so please give me a few days to review this (and the problem statement/issue itself).
This PR introduces a new dvc purge command to remove DVC-tracked outputs and their cache, while leaving stage metadata (.dvc files, dvc.yaml) intact. It's intended as a safer/faster alternative to manually deleting files and cache when cleaning up a workspace.
CLI
dvc purge [targets...] [--recursive] [--dry-run] [-f|--force] [-y]
targets...: optional list of specific files/directories to purge. If omitted, the entire repo is considered.
--recursive, -r: recurse into directories.
--dry-run: show what would be removed, without deleting anything.
--force, -f: bypass safety checks (dirty outputs, remote backup).
--yes, -y: skip confirmation prompt.
Behaviour
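As a rough illustration of the flags listed under CLI, the command-line surface could be declared with Python's standard argparse along these lines. This is a standalone sketch, not the PR's code: `build_purge_parser` is a hypothetical name, and DVC's actual command classes are wired up differently.

```python
import argparse


def build_purge_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-alone parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="dvc purge")
    # Zero or more targets; an empty list means "consider the entire repo".
    parser.add_argument("targets", nargs="*", help="files/directories to purge")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="recurse into directories")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be removed, delete nothing")
    parser.add_argument("-f", "--force", action="store_true",
                        help="bypass safety checks (dirty outputs, remote backup)")
    parser.add_argument("-y", "--yes", action="store_true",
                        help="skip confirmation prompt")
    return parser


# Example: preview a recursive purge of one directory.
args = build_purge_parser().parse_args(["data/", "-r", "--dry-run"])
print(args.targets, args.recursive, args.dry_run, args.force, args.yes)
```

All boolean flags default to off, so a bare invocation with no arguments targets the whole repo with every safety check and the confirmation prompt still active.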
Safety Checks
Before purging, DVC performs two safety checks:
Dirty outputs – if an output has been modified in the workspace and differs from cache:
… --force is used.
Remote backup – if a default remote is configured, verify that all outputs are present remotely:
… --force.
… --force.
… --force, purge proceeds but logs a warning that data may be permanently lost.
Example
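The two safety checks can be modelled in Python roughly as follows. This is an illustrative sketch rather than the PR's implementation: `Output`, `PurgeError`, and `check_purge_safety` are hypothetical names, and treating every failed check as a hard error unless `--force` is given is an assumption based on the `--force` description and the dry-run output earlier in the thread.

```python
from dataclasses import dataclass


class PurgeError(Exception):
    """Hypothetical error type for a refused purge."""


@dataclass
class Output:
    # Hypothetical stand-in for a DVC-tracked output.
    path: str
    dirty: bool      # modified in the workspace and differs from cache
    in_remote: bool  # present in the default remote


def check_purge_safety(outputs, has_default_remote, force):
    """Return warnings to log; raise PurgeError if the purge must not proceed."""
    problems = []

    # Check 1: dirty outputs.
    dirty = [o.path for o in outputs if o.dirty]
    if dirty:
        problems.append(f"dirty outputs: {', '.join(dirty)}")

    # Check 2: remote backup (cannot be verified without a default remote).
    if not has_default_remote:
        problems.append("no default remote configured")
    else:
        missing = [o.path for o in outputs if not o.in_remote]
        if missing:
            problems.append(f"not backed up remotely: {', '.join(missing)}")

    if problems and not force:
        raise PurgeError("; ".join(problems) + ". Use `--force` to purge anyway.")

    # With --force the purge proceeds, but each bypassed check becomes a warning.
    return [f"{p} (data may be permanently lost)" for p in problems]


outs = [
    Output("file_1.bin", dirty=False, in_remote=True),
    Output("file_2.bin", dirty=True, in_remote=False),
]
for warning in check_purge_safety(outs, has_default_remote=True, force=True):
    print("WARNING:", warning)
```

Collecting every problem before deciding lets a single error message report all failed checks at once, instead of stopping at the first one.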
Tests
… --force
… --force
Fixes #10874
Docs will be added in iterative/dvc.org#5464