From 96c0da68dadae66a803abf738a275bd042ac97ba Mon Sep 17 00:00:00 2001
From: Hu Xu
Date: Wed, 4 Oct 2023 14:17:43 -0700
Subject: [PATCH] update readme and citation.

---
 metaclip/README.md                    | 5 +++--
 CITATION.cff => openclip_CITATION.cff | 0
 2 files changed, 3 insertions(+), 2 deletions(-)
 rename CITATION.cff => openclip_CITATION.cff (100%)

diff --git a/metaclip/README.md b/metaclip/README.md
index 43f809f..6cfebf8 100644
--- a/metaclip/README.md
+++ b/metaclip/README.md
@@ -1,7 +1,7 @@
 # MetaCLIP
 
-This is a minimal demo/skeleton code of CLIP curation, please check Algorithm 1 in MetaCLIP paper.
-**This is not the production pipeline used to collect data for paper**.
+This is a minimal demo/skeleton code of CLIP curation, please check Algorithm 1 in [MetaCLIP paper](https://arxiv.org/pdf/2309.16671.pdf).
+**This is not the pipeline used to collect data in paper**.
 
 ## Part 1 Sub-string matching
 
@@ -37,6 +37,7 @@ Want a distributed system to parse the full CC and download a dataset? consider
 
 ## Part 2 Balancing (expected after image downloading/NSFW/dedup)
 
+
 ```bash
 mkdir -p data/CC/balanced
 python metaclip/balancing.py data/CC/matched data/CC/balanced 20000 # the magic 20k !
diff --git a/CITATION.cff b/openclip_CITATION.cff
similarity index 100%
rename from CITATION.cff
rename to openclip_CITATION.cff