Wheest
Contributor

@Wheest Wheest commented Oct 3, 2025

This PR introduces a new dvc purge command that removes DVC-tracked outputs from the workspace and the local cache, while leaving stage metadata (.dvc files, dvc.yaml) intact. It is intended as a safer, faster alternative to manually deleting files and cache entries when cleaning up a workspace.

CLI

dvc purge [targets...] [-r|--recursive] [--dry-run] [-f|--force] [-y|--yes]
  • targets...: optional list of specific files/directories to purge. If omitted, the entire repo is considered.
  • --recursive, -r: recurse into directories.
  • --dry-run: show what would be removed, without deleting anything.
  • --force, -f: bypass safety checks (dirty outputs, remote backup).
  • --yes, -y: skip confirmation prompt.
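To make the flag semantics concrete, here is a hypothetical bash parser that maps the documented options onto shell variables. It is illustrative only: DVC parses its CLI in Python, the function and variable names are made up, and this sketch does not accept bundled short flags such as `-fy`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map the documented `dvc purge` flags onto variables.
# Not DVC's parser; names (parse_purge_args, TARGETS, ...) are illustrative.
parse_purge_args() {
  RECURSIVE=0 DRY_RUN=0 FORCE=0 YES=0
  TARGETS=()
  while [ $# -gt 0 ]; do
    case $1 in
      -r|--recursive) RECURSIVE=1 ;;
      --dry-run)      DRY_RUN=1 ;;
      -f|--force)     FORCE=1 ;;   # bypass dirty-output and remote checks
      -y|--yes)       YES=1 ;;     # skip the confirmation prompt
      --)             shift; TARGETS+=("$@"); break ;;
      -*)             echo "unknown option: $1" >&2; return 2 ;;
      *)              TARGETS+=("$1") ;;
    esac
    shift
  done
  # An empty TARGETS array means the whole workspace is considered.
  return 0
}
```

For example, `parse_purge_args data -r --dry-run` leaves `TARGETS=(data)`, `RECURSIVE=1` and `DRY_RUN=1`.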

Behaviour

  • Collect outputs (outs) from .dvc files and dvc.yaml.
  • For each output:
    • Remove workspace copies (files/dirs).
    • Remove corresponding objects from the local cache.
  • Stage metadata remains intact.
  • Non-DVC files are never touched.
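The per-output steps above can be sketched in bash for the simplest case. This is a hypothetical illustration, not the PR's implementation (which lives in dvc/repo/purge.py). It assumes single-file outputs recorded in top-level *.dvc files, paths without spaces, and the DVC 3.x local cache layout `.dvc/cache/files/md5/<first 2 hex chars>/<rest>`; directory outputs and dvc.yaml are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the purge loop; NOT the PR's code.
# Assumes DVC 3.x cache layout: .dvc/cache/files/md5/<2 chars>/<rest>.
purge_outputs() {
  local repo=$1 dry_run=${2:-0}
  local dvcfile md5 path obj
  for dvcfile in "$repo"/*.dvc; do
    [ -e "$dvcfile" ] || continue            # glob matched no .dvc files
    # awk prints one "md5 path" pair per output listed in the .dvc file.
    while read -r md5 path; do
      case $md5 in *.dir) continue ;; esac   # directory outputs: out of scope
      obj="$repo/.dvc/cache/files/md5/${md5:0:2}/${md5:2}"
      if [ "$dry_run" = 1 ]; then
        echo "[dry-run] Would remove $path"
      else
        rm -f -- "$repo/$path" "$obj"        # workspace copy + cache object
        echo "Removed $path"
      fi
    done < <(awk '/md5:/ {m = $NF} /path:/ {print m, $NF}' "$dvcfile")
  done
  # The .dvc files are never deleted, and nothing besides the recorded
  # output paths and their cache objects is touched.
}
```

Calling `purge_outputs path/to/repo 1` only prints what would be removed; omitting the second argument performs the removal.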

Safety Checks

Before purging, DVC performs two safety checks:

  1. Dirty outputs – if an output has been modified in the workspace and differs from the cache:
    • Abort with PurgeError unless --force is used.
  2. Remote backup – verify that all outputs are present in the default remote:
    • If any output is missing from the remote -> abort unless --force.
    • If no default remote is configured -> abort unless --force.
    • With --force, purge proceeds but logs a warning that data may be permanently lost.
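The two checks and their --force overrides reduce to a single decision, sketched here as a hypothetical bash function. The counts it takes would come from a dirty-output scan and a remote lookup. The ERROR/WARNING strings are borrowed from the example output in this PR, except the missing-object warning, whose wording is made up.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented safety policy; not DVC's code.
# Args: dirty-output count, default remote configured (1/0),
#       count of outputs missing from that remote, --force given (1/0).
# Returns 0 if the purge may proceed.
purge_allowed() {
  local dirty=$1 has_remote=$2 missing=$3 force=$4
  # Check 1: uncommitted workspace changes would be lost.
  if [ "$dirty" -gt 0 ] && [ "$force" != 1 ]; then
    echo 'ERROR: Some tracked outputs have uncommitted changes. Use `--force` to purge anyway.' >&2
    return 1
  fi
  # Check 2: every output must be backed up in the default remote.
  if [ "$has_remote" != 1 ]; then
    if [ "$force" != 1 ]; then
      echo 'ERROR: No default remote configured. Cannot safely purge outputs without verifying remote backup.' >&2
      return 1
    fi
    echo 'WARNING: No default remote configured. Proceeding with purge due to --force. Outputs may be permanently lost.' >&2
  elif [ "$missing" -gt 0 ]; then
    if [ "$force" != 1 ]; then
      echo 'ERROR: Some outputs are not present in the remote cache and would be permanently lost if purged.' >&2
      return 1
    fi
    echo 'WARNING: Some outputs are missing from the remote; proceeding due to --force (illustrative message).' >&2
  fi
  return 0
}
```

Checking dirty outputs first matches the order of the list above; with --force, both checks become warnings or silent passes rather than errors.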

Example

$ dvc purge --dry-run
WARNING: This will permanently remove local DVC-tracked outputs for the entire workspace.
(dry-run: showing what would be removed, no changes).
ERROR: No default remote configured. Cannot safely purge outputs without verifying remote backup.
Use `--force` to purge anyway.
$ dvc purge --force -y
WARNING: This will permanently remove local DVC-tracked outputs for the entire workspace.
WARNING: No default remote configured. Proceeding with purge due to --force. Outputs may be permanently lost.
Removed 5 outputs (workspace + cache).

Tests

  • ✅ Purge removes both workspace and cache copies and leaves .dvc metadata intact.
  • ✅ Purge with targets removes only matching outs.
  • ✅ Recursive purge works on nested dirs.
  • ✅ Dry-run lists removals without making changes.
  • ✅ Dirty outs raise an error unless --force is used.
  • ✅ A missing remote or missing remote objects raise an error unless --force is used.
  • ✅ CLI tests cover confirmation, -y, and --force behavior.

Fixes #10874
Docs will be added in iterative/dvc.org#5464

@github-project-automation github-project-automation bot moved this to Backlog in DVC Oct 3, 2025

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 94.97908% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.98%. Comparing base (2431ec6) to head (8a218df).
⚠️ Report is 135 commits behind head on main.

File with missing lines    Patch %   Lines
dvc/repo/purge.py          86.66%    6 Missing and 4 partials ⚠️
dvc/commands/purge.py      94.11%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10880      +/-   ##
==========================================
+ Coverage   90.68%   90.98%   +0.29%     
==========================================
  Files         504      508       +4     
  Lines       39795    41025    +1230     
  Branches     3141     3257     +116     
==========================================
+ Hits        36087    37325    +1238     
- Misses       3042     3054      +12     
+ Partials      666      646      -20     


@Wheest
Contributor Author

Wheest commented Oct 3, 2025

Specific question for reviewers: are there existing helpers in the codebase that I should be using for any part of this code? (I haven't done much development on DVC.)

@Wheest
Contributor Author

Wheest commented Oct 3, 2025

Note: reviewers can get a feel for the tool by running:

# 1. Initialize a Git repo
git init dvc-repo
cd dvc-repo

# 2. Initialize DVC
dvc init

# 3. Create five 1 MB junk files
for i in $(seq 1 5); do
    head -c 1M </dev/urandom > file_$i.bin
done

# 4. Add the files to DVC
dvc add file_*.bin

# 5. Commit changes to Git
git add .
git commit -m "Initialize DVC repo with 1MB junk files"

(Be sure to have the DVC version from this PR's branch installed.)

1. Preview what files would be deleted

$ dvc purge --dry-run
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
ERROR: No default remote configured. Cannot safely purge outputs without verifying remote backup.
Use `--force` to purge anyway.

2. Preview what files would be deleted (with --force)

$ dvc purge --dry-run --force
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
WARNING: No default remote configured. Proceeding with purge due to --force. Outputs may be permanently lost.
[dry-run] Would remove file_4.bin
[dry-run] Would remove file_5.bin
[dry-run] Would remove file_1.bin
[dry-run] Would remove file_3.bin
[dry-run] Would remove file_2.bin
Nothing to purge.

3. Try to purge files that aren't backed up

$ dvc purge
WARNING: This will permanently remove local DVC-tracked outputs for the entire workspace.
Are you sure you want to proceed? [y/n]: y
ERROR: Some outputs are not present in the remote cache and would be permanently lost if purged:
  - file_4.bin
  - file_5.bin
  - file_1.bin
  - file_3.bin
  - file_2.bin
Use `--force` to purge anyway.

4. Modify a file and preview the warnings

# append 10 random bytes at the end
$ dd if=/dev/urandom bs=1 count=10 >> file_1.bin

$ dvc purge --dry-run
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
ERROR: Some tracked outputs have uncommitted changes. Use `--force` to purge anyway.
  - file_1.bin

5. Set up a remote and preview what would be removed

$ mkdir -p /tmp/dvc-remote
$ dvc remote add -d local_remote /tmp/dvc-remote
$ dvc push
$ dvc purge --dry-run
WARNING: This will show what local DVC-tracked outputs would be removed for the entire workspace.
(dry-run: showing what would be removed, no changes).
[dry-run] Would remove file_4.bin
[dry-run] Would remove file_5.bin
[dry-run] Would remove file_1.bin
[dry-run] Would remove file_3.bin
[dry-run] Would remove file_2.bin
Nothing to purge.
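Once `dvc push` has run, the remote-backup check amounts to confirming that every recorded hash has a matching object in the remote. A hypothetical bash sketch for a local directory remote such as /tmp/dvc-remote, assuming the DVC 3.x object layout `files/md5/<first 2 hex chars>/<rest>`; cloud remotes would need a different lookup. DVC's built-in way to compare the local cache against a remote is `dvc status --cloud`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print each hash whose object is absent from a
# local directory remote (assumed layout: <remote>/files/md5/ab/cdef...).
missing_in_remote() {
  local remote=$1 md5
  shift
  for md5 in "$@"; do
    [ -e "$remote/files/md5/${md5:0:2}/${md5:2}" ] || echo "$md5"
  done
}
```

An empty result means every listed output is backed up, which corresponds to the state after `dvc push` in this step.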

6. Purge files that are confirmed to be backed up

$ dvc purge -y
WARNING: This will permanently remove local DVC-tracked outputs for the entire workspace.
Removed 5 outputs (workspace + cache).

@skshetry
Collaborator

skshetry commented Oct 4, 2025

Hi, thank you for creating the pull request. I am OOO, so please give me a few days to review this (and the problem statement/issue itself).

Labels
None yet
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

Remove all locally downloaded data