From 878303474fb70fdbf36111b8d0547b6b92743ea9 Mon Sep 17 00:00:00 2001
From: CNeuromod Bot
Date: Thu, 13 Feb 2025 11:55:01 -0500
Subject: [PATCH 1/2] wip: data+project management workflow

---
 content/project_and_data_management.md | 87 ++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
 create mode 100644 content/project_and_data_management.md

diff --git a/content/project_and_data_management.md b/content/project_and_data_management.md
new file mode 100644
index 0000000..365bc43
--- /dev/null
+++ b/content/project_and_data_management.md
@@ -0,0 +1,87 @@
+# Project and data management
+
+To improve project workflows, we divide them into different units of work of different types.
+
+## Work unit types
+
+### Datasets
+These are standardized (BIDS*(-ish)*) datasets from data collection instruments:
+e.g. `mario` (short for `mario.bids`), or `mario.crowdsourced_curiosity`.
+These are produced by the data manager for most instruments, but other members can contribute to them:
+- a student collecting data on a crowdsourcing platform: create a new dataset repo
+- adding physio data to a BIDS dataset: branch+PR to the existing repo
+
+#### template(s)
+For BIDS we have a [template](https://github.com/courtois-neuromod/bids_template).
+If we try to BIDS-ify all data, these could be validated against a custom schema with the new validator.
+
+### Standard Derivatives
+These are reusable units of work that anyone in the lab can produce, either for a data release or in the course of their own project.
+They use a standard pipeline and, after sensible parameters have been chosen, should be run once and for all.
+e.g. `mario.fmriprep`, `things.glmsingle`, `floc.rois`, `things.memory_glm`, `mario.rois_timeseries`.
+
+#### template
+
+The [template](https://github.com/courtois-neuromod/derivatives_template) contains placeholder `README` and `dataset_description.json` files and some standard GitHub workflows to be run on the dataset (e.g. deploy test, bids-validator, ...).
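As an illustration of what that `dataset_description.json` placeholder must eventually hold: the BIDS spec requires a derivatives dataset to declare `DatasetType: "derivative"` and a `GeneratedBy` entry. A minimal sketch (dataset name and version strings here are hypothetical):

```shell
# Scaffold the minimal metadata of a hypothetical derivatives dataset.
mkdir -p mario.fmriprep
cat > mario.fmriprep/dataset_description.json <<'EOF'
{
  "Name": "mario.fmriprep",
  "BIDSVersion": "1.9.0",
  "DatasetType": "derivative",
  "GeneratedBy": [{"Name": "fMRIPrep", "Version": "23.2.0"}]
}
EOF
# The repo's CI workflow could then run the bids-validator against the dataset.
```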
+
+### Analysis
+
+This is a unit that tests hypotheses following the scientific method.
+
+e.g. for `mario.training_dependent_rsa`, the README would say:
+> Here we test whether practice of Mario induces some reinforcement of scene-specific RSA patterns.
+> Hypotheses:
+> - 1: the ratio of intra-scene vs. inter-scene RSA distances in regions xxx increases with practice, measured as cumulative duration of gameplay in the study. Only scenes where Mario does not die are used, to avoid an obvious bias.
+> - 2: ...
+
+If possible it should rely on `mario.scenes_rsa`, which would have extracted brain-wise RSA patterns by scene that could be reused by others.
+
+#### template
+Projects should start by forking a project template (to be designed + implemented):
+
+```
++-- .github/workflows/
++-- Dockerfile # optional: builds the environment to run the code+notebooks; can use other files (e.g. requirements.txt for pip...)
++-- README.md
++-- docs/ # if there is a need for more docs than the README; maybe not necessary
++-- sourcedata/ # all raw and derivatives datasets stored here
++-- src
+|   +-- my_module.py
+|   +-- my_module_test.py # encouraged
+|   +-- my_script.py
++-- notebooks # well-formatted notebooks, all cells run in order
++-- playground|sandbox # mess (e.g. dirty notebooks) that is not covered by tests, not reviewed, and not used as final results
+```
+
+From the SIMEXP template we will derive a more specific neuromod template with the super-dataset pre-installed in sourcedata. An analysis bootstrap would be:
+- GitHub: create a repo from the template, choose a good name
+- `datalad install -s url_of_the_new_repo`
+- `cd new_repo_clone && datalad create -f -d .`
+- `datalad get -n sourcedata/cneuromod/{friends,movie10}/{timeseries,annotations}`
+- ... the workflow below ...
+
+#### Workflow
+
+Iterate on a `dev` branch; when ready for review and passing tests, open a PR to `main`, tag @PIs and @others for review, and iterate on the `dev` branch to address the review.
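This branch cycle can be sketched with plain git (repo name, file content, and commit messages below are hypothetical; the PR and review themselves happen on GitHub):

```shell
# Sketch: iterate on dev, then merge into the default branch after review.
mkdir demo_analysis && cd demo_analysis
git init -q
git config user.email "you@example.org"
git config user.name "You"
git commit -q --allow-empty -m "bootstrap from template"   # default branch
git checkout -q -b dev                                     # iterate here
echo "hypothesis 1: intra- vs inter-scene RSA ratio" > notes.txt
git add notes.txt
git commit -q -m "test hypothesis 1"
# once tests pass and the PR is approved, the merge amounts to:
git checkout -q -
git merge -q --no-ff dev -m "merge reviewed analysis"
```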
+Merge the PR when it is approved (i.e. when all or a majority agree this is an interesting and scientifically valid piece of work).
+This cycle is repeated to improve or add analyses.
+
+To follow the scientific method:
+
+- create the repo, add a README stating the hypotheses you want to test and a short method for each -> PR+review
+- write dirty code for hypothesis #1 in the playground, get some interesting results, move/clean it into src/notebooks -> PR+review
+- write code for hypothesis #2 in the playground, move/clean it into src/notebooks -> PR+review
+- add another hypothesis #3 + method in the README -> PR+review (should move fast)
+- write code for hypothesis #3 in the playground, move/clean it into src/notebooks -> PR+review
+- improve the code testing hypothesis #2 -> PR+review
+
+Master level: if you want to work on multiple hypotheses at the same time, create more branches+PRs.
+
+If the analysis differs too much (e.g. you did some RSA, now you want to do encoding), create a new analysis repo.
+
+### Papers
+Likely a MyST article.
+
+e.g. a repo named `smb1_practice_induced_rsa_pattern_stabilization` created from [neurolibre/mystical-article](https://github.com/neurolibre/mystical-article).
+It links the analysis repo as a submodule and uses small 3D brain maps, matrices, TSVs, ... from it to generate figures.

From e1aadf08e5ea8426cd509de0dd20c0651083c985 Mon Sep 17 00:00:00 2001
From: CNeuromod Bot
Date: Thu, 13 Feb 2025 12:06:21 -0500
Subject: [PATCH 2/2] gh project management

---
 content/project_and_data_management.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/content/project_and_data_management.md b/content/project_and_data_management.md
index 365bc43..14ec728 100644
--- a/content/project_and_data_management.md
+++ b/content/project_and_data_management.md
@@ -8,8 +8,9 @@ To improve project workflows, we divide them into different units of work of dif
 These are standardized (BIDS*(-ish)*) datasets from data collection instruments:
 e.g.
`mario` (short for `mario.bids`), or `mario.crowdsourced_curiosity`.
 These are produced by the data manager for most instruments, but other members can contribute to them:
-- a student collecting data on a crowdsourcing platform: create a new dataset repo
-- adding physio data to a BIDS dataset: branch+PR to the existing repo
+- a student collecting data on a crowdsourcing platform: create a new dataset repo from the template.
+- adding physio data to a BIDS dataset: branch+PR to the existing repo.
+- enriching BIDS events with variables extracted from bk2 retro gameplay files.
 
 #### template(s)
 For BIDS we have a [template](https://github.com/courtois-neuromod/bids_template).
 If we try to BIDS-ify all data, these could be validated against a custom schema with the new validator.
@@ -42,6 +43,7 @@ Projects should start by forking a project template (to be designed + implemented)
 ```
 +-- .github/workflows/
++-- .github/ISSUE_TEMPLATE/ # could contain issue templates for project steps if we use GH project management
 +-- Dockerfile # optional: builds the environment to run the code+notebooks; can use other files (e.g. requirements.txt for pip...)
 +-- README.md
 +-- docs/ # if there is a need for more docs than the README; maybe not necessary
@@ -80,6 +82,8 @@ Master level: if you want to work on multiple hypotheses at the same time, creat
 If the analysis differs too much (e.g. you did some RSA, now you want to do encoding), create a new analysis repo.
 
+This repo could use the GH project management tool, creating issues for action items (and sub-issues if necessary); issues could span other repos (e.g. derivatives to be created, raw data to be enriched).
+
 ### Papers
 Likely a MyST article.
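The submodule link between a paper repo and its analysis repo could be bootstrapped roughly as below. All repo names and paths are hypothetical, and a local path stands in for the GitHub URL, which is why the demo needs the `protocol.file.allow` override that recent git requires for file-based submodules:

```shell
# Sketch: a paper repo pulling in an analysis repo as a submodule.
mkdir analysis_repo
git -C analysis_repo init -q
git -C analysis_repo config user.email "you@example.org"
git -C analysis_repo config user.name "You"
git -C analysis_repo commit -q --allow-empty -m "analysis results"

mkdir paper_repo && cd paper_repo
git init -q
git config user.email "you@example.org"
git config user.name "You"
# allow a file-based submodule for this local demo only
git -c protocol.file.allow=always submodule add -q ../analysis_repo sourcedata/analysis
git commit -q -m "link analysis repo; figures are generated from its small outputs"
```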