etc/corpus/README.md
Lines changed: 14 additions & 1 deletion
@@ -47,13 +47,26 @@ In the example above, one would manage to fit 68568 repositories into 3.5TB.
Run `head -n 999 repo_metadata.sample.jsonl | ./clone-repos.sh <corpus>` to clone into the given `<corpus>` location, or any other invocation with
your respective `repo_metadata.jsonl` and the computed number of repos to include, as in `head -n <your-count> <your.jsonl>`.
-#### Add one large (100GB+) repository by hand
+#### Add one large (100GB+) repository and one with a lot of commits by hand
Invoke `git clone --bare https://github.com/NagatoDEV/PlayStation-Home-Master-Archive <corpus>/github.com/NagatoDEV/PlayStation-Home-Master-Archive` (after replacing `<corpus>` with your base path)
to obtain one sample of a huge repository with a lot of assets and other binary data whose tree spans more than 440k files.
That way, we also get to see what happens when we have to handle huge binary files in massive trees.
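
To sanity-check such a hand-added clone, one could count the files in its tree and the commits it contains. The following is only a minimal sketch: the helper names `count_tree_files` and `count_commits` are hypothetical and not part of the corpus scripts, and it assumes `HEAD` of the bare clone resolves to the branch of interest.

```shell
# Hypothetical helpers, not part of the corpus tooling.

# Print the number of files (blobs) reachable from HEAD's tree.
# Works in bare repositories because it only reads the object database.
count_tree_files() {
  git -C "$1" ls-tree -r --name-only HEAD | wc -l | tr -d '[:space:]'
}

# Print the number of commits reachable from any ref.
count_commits() {
  git -C "$1" rev-list --count --all
}
```

For example, `count_tree_files "<corpus>/github.com/NagatoDEV/PlayStation-Home-Master-Archive"` (with `<corpus>` replaced by your base path) prints the file count of that clone's `HEAD` tree, which can be compared against the figure mentioned above.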
+Another massive tree, this time with more than 1.3m commits, comes in with this invocation: