Commit 24dcba4

More information about special repositories for the corpus
1 parent f5e9229 commit 24dcba4

1 file changed: +14 -1 lines changed


etc/corpus/README.md

Lines changed: 14 additions & 1 deletion
@@ -47,13 +47,26 @@ In the example above, one would manage to fit 68568 repositories into 3.5TB.
4747
Run `head -n 999 repo_metadata.sample.jsonl | ./clone-repos.sh <corpus>` to clone into the given `<corpus>` location, or any other invocation with
4848
your respective `repo_metadata.jsonl` and the computed amount of repos to include as in `head -n <your-count> <your.jsonl>`.
4949

50-
#### Add one large (100GB+) repository by hand
50+
#### Add one large (100GB+) repository and one repository with a lot of commits by hand
5151

5252
Invoke `git clone --bare https://github.com/NagatoDEV/PlayStation-Home-Master-Archive <corpus>/github.com/NagatoDEV/PlayStation-Home-Master-Archive` (after replacing `<corpus>` with your base path)
5353
to obtain one sample of a huge repository with a lot of assets and other binary data whose tree spans more than 440k files.
5454

5555
That way, we also get to see what happens when we have to handle huge binary files in massive trees.
5656

57+
Another massive tree, this time with more than 1.3 million commits, comes in with this invocation:
58+
59+
`git clone --bare https://github.com/archlinux/svntogit-community <corpus>/github.com/archlinux/svntogit-community`.
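Both clone commands place the repository at `<corpus>/<host>/<owner>/<repo>`. As a minimal sketch of that mapping, the helper below derives the destination from a clone URL. The name `corpus_path` is hypothetical and not part of the repository's scripts.

```shell
# Hypothetical helper (an assumption, not part of this repository):
# map a clone URL to its location below the corpus root by dropping
# the scheme, a trailing slash, and an optional ".git" suffix.
corpus_path() {
  corpus=$1
  url=$2
  rest=${url#*://}   # strip the scheme, e.g. "https://"
  rest=${rest%/}     # strip one trailing slash, if any
  rest=${rest%.git}  # strip an optional ".git" suffix
  printf '%s/%s\n' "$corpus" "$rest"
}

# Possible usage, assuming $corpus holds your base path:
#   url=https://github.com/archlinux/svntogit-community
#   git clone --bare "$url" "$(corpus_path "$corpus" "$url")"
```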
60+
61+
Both repos should be topped off with an index and a commit-graph file:
62+
```shell
63+
cd <corpus>
64+
for d in github.com/archlinux/svntogit-community github.com/NagatoDEV/PlayStation-Home-Master-Archive; do
65+
git -C $d read-tree @
66+
git -C $d commit-graph write --no-progress --reachable
67+
done
68+
```
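If one of the two repositories has not been cloned yet, the loop above fails on it. A more defensive variant could look like the sketch below. It runs the same two `git` commands, but quotes paths, skips missing directories, and stops on the first failure. The function name `top_off` and its argument order are assumptions made for illustration.

```shell
# Sketch only; `top_off` is a hypothetical name, not part of this repository.
# Usage: top_off <corpus> <repo-path-relative-to-corpus>...
top_off() {
  corpus=$1
  shift
  for d in "$@"; do
    repo="$corpus/$d"
    if [ ! -d "$repo" ]; then
      # Not cloned (yet): report it and continue with the next repository.
      echo "skipping $d: not found below $corpus" >&2
      continue
    fi
    # Create an index from the tree of HEAD.
    git -C "$repo" read-tree @ || return 1
    # Write a commit-graph covering all commits reachable from any ref.
    git -C "$repo" commit-graph write --no-progress --reachable || return 1
  done
}

# Possible usage, assuming $corpus holds your base path:
#   top_off "$corpus" github.com/archlinux/svntogit-community \
#     github.com/NagatoDEV/PlayStation-Home-Master-Archive
```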
69+
5770
### Run one-off `gix` commands by hand
5871

5972
Sometimes it's interesting to try a new command against all available repositories to see if it fails:
