Create ssh_repo_elm.md #37
base: main
I tried to build my first reproducible analysis with DataLad repos for input and output, using data hosted on elm via SSH. And I struggled :) so I put together this tutorial with the help of ChatGPT. I think I got it to work, but it's very possible I made mistakes, and I really struggled with DataLad's docs. Hopefully more knowledgeable people can review and confirm this is in order, so the tutorial can save time for others (and myself) in the future.
datalad/ssh_repo_elm.md
Outdated
| ```bash | ||
| datalad create-sibling \ | ||
| --name elm \ | ||
| --site datalad \ |
Not sure what the `--site` option is; it is not in the docs, so it may be a GPT hallucination.
Indeed, I used ChatGPT to create a summary of the steps, and it got this one wrong. Apologies, I should have double-checked. I believe this is what I actually used, from my bash history: `datalad create-sibling -s elm ssh://elm/data/simexp/pbellec/image10k-zooniverse --existing=skip`
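For reference, the corrected step could be scripted roughly as follows. This is only a sketch: the sibling name and SSH URL come from the comment above, while the `run` dry-run helper and the follow-up `datalad push` and `datalad siblings` calls are additions for illustration, not something the thread confirms was run. It assumes it is executed from the root of an existing dataset.

```bash
#!/usr/bin/env bash
# Sketch: publish a dataset to an SSH sibling named "elm".
# Assumes the current directory is the root of an existing DataLad dataset.
set -eu

SIBLING="elm"
SSHURL="ssh://elm/data/simexp/pbellec/image10k-zooniverse"

# Hypothetical helper: dry-run by default (only prints the commands).
# Set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create the remote repository; --existing=skip leaves an existing one untouched.
run datalad create-sibling -s "$SIBLING" "$SSHURL" --existing=skip
# Push git history and annexed file content to the sibling.
run datalad push --to "$SIBLING"
# List the configured siblings to check that "elm" is registered.
run datalad siblings
```

Running it unchanged only prints the three commands, which makes it easy to review them before touching the server.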
datalad/ssh_repo_elm.md
Outdated
| --name elm \ | ||
| --site datalad \ | ||
| --sshurl ssh://elm/data/simexp/pbellec/image10k-zooniverse \ | ||
| --shared all |
Do we want datasets to be writable by the group and readable by all?
Good point. I was thinking of letting people deal with permissions in their own space. Then, if we want to publish the dataset, we would either upload a version of it to zooniverse (for open data) or create a new sibling on S3. Are you suggesting we would have a single folder for the lab hosting all datalad datasets?
datalad/ssh_repo_elm.md
Outdated
| datalad create-sibling-github courtois-neuromod image10k-zooniverse \ | ||
| --github-organization courtois-neuromod \ | ||
| --access-protocol ssh |
| - datalad create-sibling-github courtois-neuromod image10k-zooniverse \ | |
| -   --github-organization courtois-neuromod \ | |
| -   --access-protocol ssh | |
| + datalad create-sibling-github courtois-neuromod/image10k-zooniverse \ | |
| +   --access-protocol ssh |
Fixes the deprecation.
This requires a personal access token with adequate permissions to create repos for that org/user.
Indeed. I'm going to describe the method where the repo is created manually on GitHub and then added as a sibling.
Also, I tried the syntax you suggested for the org name, but it does not seem to work with the datalad version I get through pip on my machine, so I had to use the soon-obsolete flag `--github-organization`.
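The manual route mentioned in this thread (create the repository by hand on GitHub, then register it as a sibling) might look like the sketch below. The sibling name `github` and the SSH clone URL are assumptions built from the org and repo names used in this thread, and `--publish-depends elm` is an optional addition so that a push to GitHub first pushes to the `elm` sibling; none of this is confirmed by the thread. The `run` helper is hypothetical and only prints the commands unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Sketch: register a manually created GitHub repository as a sibling.
# Assumes the current directory is the dataset root and an "elm" sibling already exists.
set -eu

GITHUB_SIBLING="github"   # assumed sibling name
GITHUB_URL="git@github.com:courtois-neuromod/image10k-zooniverse.git"   # assumed URL

# Hypothetical helper: dry-run by default; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Add the existing GitHub repository as a sibling; pushes to it will
# first be sent to "elm", which holds the annexed file content.
run datalad siblings add -s "$GITHUB_SIBLING" --url "$GITHUB_URL" --publish-depends elm
# Push the git history to GitHub (GitHub does not store annexed content).
run datalad push --to "$GITHUB_SIBLING"
```

This avoids both `create-sibling-github` syntaxes and the need for a personal access token with repository-creation rights.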
| ### ⚠️ Tips & Troubleshooting | ||
| * If `datalad get` fails with `annex-ignore`, you likely cloned from GitHub only. Clone once from `elm` to propagate sibling config. |
Using `--as-common-datasrc NAME` (see above) would fix that. Alternatively, mark the created sibling as auto-enabled afterward: `git-annex configremote elm autoenable=true`.
This is a point I'm still struggling with! I could not get it to work such that installing from GitHub would download from elm. So if I add `--as-common-datasrc` when I create the elm sibling, it should fix it? Or does that configuration stay local?
OK, I experimented a bit and could not get it to work. I tried removing the elm sibling and then adding it back with `datalad siblings add --name elm --url ssh://elm/data/simexp/pbellec/image10k-zooniverse --as-common-datasrc origin`, and got this error:
add-sibling(impossible): . (sibling) [cannot configure as a common data source, URL protocol is not http or https] .: elm(+) [ssh://elm/data/simexp/pbellec/image10k-zooniverse (git)]
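Since the error above indicates that a common data source must be served over http or https, one possible workaround is for each consumer to add the `elm` sibling locally after cloning from GitHub. The sketch below is untested against this setup: the GitHub clone URL and the local directory name are assumptions, and the `run` helper is hypothetical and only prints the commands unless `DRY_RUN=0` is set. It also assumes the user has SSH access to `elm`.

```bash
#!/usr/bin/env bash
# Sketch: clone from GitHub, then fetch file content from the SSH sibling "elm".
set -eu

GITHUB_URL="git@github.com:courtois-neuromod/image10k-zooniverse.git"   # assumed URL
ELM_URL="ssh://elm/data/simexp/pbellec/image10k-zooniverse"
CLONE_DIR="image10k-zooniverse"   # assumed local directory name

# Hypothetical helper: dry-run by default; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Clone the dataset (git history only) from GitHub.
run datalad clone "$GITHUB_URL" "$CLONE_DIR"
run cd "$CLONE_DIR"
# Register the SSH location as a sibling in this local clone.
run datalad siblings add -s elm --url "$ELM_URL"
# Ask for the annexed content explicitly from the "elm" sibling.
run datalad get -s elm .
```

The sibling configuration added this way stays in the local clone, so every new clone needs the same two extra commands.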