WiP RF+ENH: further RF to support flexible heuristic to name sequences so they follow BIDS in their naming #32

yarikoptic · 2016-10-10T00:41:33Z

Sits on top of #31

It implements a perspective heuristic which would allow sequences to be named simply by following the BIDS convention. More details are available at https://docs.google.com/document/d/1R54cgOe481oygYVZxI7NHrifDyFUZAjOBwCTu7M7y48/edit?usp=sharing

TODOs

- harmonize location (and maybe naming) of stored "info" (now sticks out too much)
- make it work in multiple sessions within the same study
- ~~move derivatives~~ (atm there are no derivatives)
- make sure original dicoms are tarballed and stored in appropriate subdir of BIDS dataset
- basic support for DataLad ;)
  - annotate dicom tarballs and anatomicals in annex with distribution-restrictions=sensitive
- made DICOM tarballs reproducible
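
The last item, reproducible DICOM tarballs, could look roughly like the sketch below. It is an illustration under assumptions, not the code from this PR: the helper name `make_reproducible_tarball`, the `fixed_mtime` parameter and the timestamp value are hypothetical, and instead of mocking `time.time` it passes a fixed `mtime` to the gzip header and normalizes each tar member.

```python
import gzip
import hashlib
import os
import tarfile
import tempfile


def make_reproducible_tarball(paths, out_path, fixed_mtime):
    """Write a .tar.gz whose bytes depend only on member names and contents."""
    def normalize(tarinfo):
        # Drop everything that varies between machines or runs.
        tarinfo.mtime = fixed_mtime
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = ""
        tarinfo.mode = 0o644
        return tarinfo

    with open(out_path, "wb") as raw:
        # An empty filename and a fixed mtime keep the gzip header constant.
        with gzip.GzipFile(filename="", mode="wb", fileobj=raw,
                           mtime=fixed_mtime) as gz:
            with tarfile.open(fileobj=gz, mode="w",
                              format=tarfile.GNU_FORMAT) as tar:
                for path in sorted(paths):  # stable member order
                    tar.add(path, arcname=os.path.basename(path),
                            filter=normalize)


def demo_reproducibility():
    """Tar the same files twice with different on-disk mtimes; return digests."""
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for name in ("b.dcm", "a.dcm"):
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(name.encode())
            files.append(path)
        digests = []
        for i, stamp in enumerate((1000, 2000)):
            for path in files:
                os.utime(path, (stamp, stamp))
            out = os.path.join(tmp, "out%d.tgz" % i)
            # 1476059673 is an arbitrary placeholder; the commit messages
            # speak of using the DICOM series time here.
            make_reproducible_tarball(files, out, fixed_mtime=1476059673)
            with open(out, "rb") as f:
                digests.append(hashlib.sha256(f.read()).hexdigest())
        return digests
```

Calling `demo_reproducibility()` returns two digests that should be identical, since nothing but the names and contents of the members ends up in the archive.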

one of the test DICOM datasets to try it on is http://datasets.datalad.org/?dir=/dicoms/dartmouth-phantoms/bids_test3

…or getting arb set of files

Decided to split away the 'session' notion, which was pretty much used in the script only whenever multiple tarballs were provided, as a sign of multiple sessions. Otherwise -- the session is given on the cmdline, and thus processing that particular session might be 'incompatible' with future runs for other sessions if we save the entire mapping in a single file which we might load, and which wouldn't have that session information. Also, to allow for consumption of an arbitrary set of dicoms, which might be coming from different studies and sessions, we need to analyze/group before extracting useful session sequence information. So, this was also in preparation for that.
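
The "analyze/group before extracting" step can be sketched as below. This is only an illustration, assuming every DICOM has already been reduced to a `(filename, study_uid)` pair; `group_files_by_study` is a hypothetical helper and not a function from `bin/heudiconv`.

```python
from collections import OrderedDict


def group_files_by_study(file_uid_pairs):
    """Group filenames by study UID, keeping studies in first-seen order."""
    studies = OrderedDict()
    for filename, study_uid in file_uid_pairs:
        studies.setdefault(study_uid, []).append(filename)
    return studies


# Files from two studies arriving interleaved, as in an arbitrary set of dicoms.
pairs = [
    ("a/001.dcm", "1.2.3"),
    ("b/001.dcm", "1.2.4"),
    ("a/002.dcm", "1.2.3"),
]
studies = group_files_by_study(pairs)
```

Each value of `studies` could then be handed to the per-session logic on its own, so no session information has to be guessed from the number of tarballs.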

now that we use named tuple, easy to make it into a key. For file groups though we would need to unpair them, but it all just makes "alignment" easier and allows to group per studyUID easier
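
A minimal sketch of that idea: named tuples are hashable, so they can serve directly as dict keys, and the two aligned lists can be recovered ("unpaired") from the dict when needed. The `SeqInfo` fields below are hypothetical placeholders, not the actual fields used in `bin/heudiconv`.

```python
from collections import OrderedDict, namedtuple

# Hypothetical, reduced set of fields for illustration only.
SeqInfo = namedtuple("SeqInfo", ["series_id", "protocol_name", "study_uid"])

seqinfos = [
    SeqInfo("1-localizer", "localizer", "1.2.3"),
    SeqInfo("2-anat", "anat-T1w", "1.2.3"),
    SeqInfo("1-func", "func_task-rest", "1.2.4"),
]
filegroups = [["loc1.dcm"], ["t1_1.dcm", "t1_2.dcm"], ["bold1.dcm"]]

# Pair each seqinfo with its files; the named tuple itself is the key.
seqinfo_to_files = OrderedDict(zip(seqinfos, filegroups))

# "Unpair" again into two aligned lists when an API needs them separately.
seqinfos_again = list(seqinfo_to_files)
filegroups_again = list(seqinfo_to_files.values())

# Grouping per study becomes a simple filter on the key.
study_123 = OrderedDict(
    (s, f) for s, f in seqinfo_to_files.items() if s.study_uid == "1.2.3"
)
```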

r=9; out=../outputs-all-reran$r; rm -rf $out; HEUDICONV_LOGLEVEL=DEBUG bin/heudiconv --dbg -f heuristics/dbic_bids.py -c dcm2niix -o $out -b ../dartmouth-phantoms/bids_test3

codecov-io · 2016-10-10T00:42:47Z

Codecov Report

Merging #32 into master will increase coverage by 64.24%.
The diff coverage is 82.48%.

@@             Coverage Diff             @@
##           master      #32       +/-   ##
===========================================
+ Coverage   14.11%   78.35%   +64.24%     
===========================================
  Files           4        6        +2     
  Lines         503     1109      +606     
===========================================
+ Hits           71      869      +798     
+ Misses        432      240      -192

| Impacted Files | Coverage Δ | |
|---|---|---|
| tests/test_heuristics.py | `100% <100%> (ø)` | |
| tests/test_main.py | `100% <100%> (ø)` | ⬆️ |
| tests/utils.py | `100% <100%> (ø)` | |
| tests/test_tarballs.py | `100% <100%> (ø)` | |
| bin/heudiconv | `74.02% <78.07%> (+61.68%)` | ⬆️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c14df90...7b6f732. Read the comment docs.

satra · 2016-10-14T01:58:12Z

@yarikoptic - the heuristics has become very complicated now - requires a sufficiently enlightened python programmer. can we have the simpler version, which was relatively easy for people to learn?

also:

- we should not rely on dicomstack but just use pydicom + nibabel to group the dicoms
- we should embed all the dicom metadata into the bids info json file. (currently bids is very limited w.r.t dicom metadata)

yarikoptic · 2016-10-14T02:29:55Z

I didn't change any of heuristics IIRC (at least not yet) ;)
I just allowed for what was not allowed before (e.g. sorting dicoms by study UIDs etc).
custom handling for BIDS files simply didn't do the right thing e.g. for fieldmaps etc

indeed currently it is a heap of changes, but once again -- I am just trying first to tune it so I could do what I need to accomplish, and later would hopefully look to catch back diversions from the original implementation... which is tough as you know since there was not a single test of any kind.

* gh-mvdoc/enh-anonym:
  Reload scans keys if existing, start adding tests
  Clean up requirements and setup.py
  Update dcmstack source to fix missing importsys
  Fix dbic_bids.py compatibility with python 3.x
  Remove forgotten pdb
  Extract function to get row info and add test
  Add randomly generated column to avoid figuring out date from hash
  Name variables that make sense
  Simplify parsing of date
  NF: save scan_keys tsv file
  BF: do not test with bids flag if using convertall
  Add more requirements
  BF: add missing import re in embedder

satra · 2017-07-10T13:20:08Z

@yarikoptic - any chance this can be looked at and @matthew-brett's comments addressed?

yarikoptic

@matthew-brett @satra ha -- for some reason I was looking for your @matthew-brett comments in the PR against my clone, and saw none... then looked here. Hopefully I caught all of them. I am now about to push the changes addressing (some of) them. I also merged in the work we did during brainhack to improve anonymization etc

yarikoptic · 2017-07-10T13:58:42Z

bin/heudiconv

 from collections import namedtuple
+from collections import defaultdict
+from collections import OrderedDict as ordereddict
 from os.path import isdir


could as well be; it is just a matter of habit/style, and also

- easier to prune -- I just need to delete the lines which are grayed out by pycharm due to imported names not being used. Whenever it is all within a single line -- more pain, since I need to go select specific names etc
- I believe that it makes it easier for pycharm to refactor/move pieces of code into other modules, so it finds/moves such import statements as well

yarikoptic · 2017-07-10T14:02:21Z

bin/heudiconv

-converted. This edited file will always overwrite the original file. If
-there is a need to revert to original state, please delete this edit.txt
-file and rerun the conversion
+This script uses DicomStack and mri_convert to convert DICOM directories.


took your version and expanded... most probably will need to resolve conflicts with master by now ;) thanks

yarikoptic · 2017-07-10T14:05:53Z

bin/heudiconv

+)
+
+
+class TempDirs(object):


Valid concern! But primarily for the case whenever such refactoring would (and if, since original idea was to have a single script IIRC) happen -- ATM it is just as fine imho, so just added a TODO comment on that

yarikoptic · 2017-07-10T14:07:02Z

bin/heudiconv

    dcmfilter : callable, optional
      If called on dcm_data and returns True, it is used to set
      series_id
+    grouping : str ('studyUID', 'accession_number') or None, optional


thanks, done

yarikoptic · 2017-07-10T14:14:23Z

bin/heudiconv


    Parameters
    ----------
    fl : list of str


I see no real reason for a type (_list) suffix -- making it files (no one works with open files in here, so no fear of confusing it with file instances) and flfilter into file_filter

yarikoptic · 2017-07-10T14:16:20Z

bin/heudiconv

+    allowed_groupings = ['studyUID', 'accession_number', None]
+    if grouping not in allowed_groupings:
+        raise ValueError('I do not know how to group by {0}'.format(grouping))
+    per_studyUID = grouping == 'studyUID'


could be done either way imho. This centralizes the comparison to the string value into a variable. Should we decide to refactor anything and complicate the logic to decide e.g. on the per_studyUID value -- there will be just a single point to change instead of hunting for all the comparisons to a string value. So I would leave it as is for now
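
A small sketch of the pattern defended above: validate `grouping` once and turn it into booleans in a single place. The wrapper function `resolve_grouping` and the second flag `per_accession_number` are assumptions added for illustration; only the validation and `per_studyUID` appear in the diff.

```python
ALLOWED_GROUPINGS = ("studyUID", "accession_number", None)


def resolve_grouping(grouping):
    """Validate `grouping` and derive the flags used by the rest of the code."""
    if grouping not in ALLOWED_GROUPINGS:
        raise ValueError("I do not know how to group by {0}".format(grouping))
    # The only places where the string values are compared; everything
    # downstream checks the booleans instead.
    per_studyUID = grouping == "studyUID"
    per_accession_number = grouping == "accession_number"
    return per_studyUID, per_accession_number
```

If the rule for deciding `per_studyUID` ever becomes more involved, only this function would need to change.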

yarikoptic · 2017-07-10T14:17:03Z

bin/heudiconv

-            pass
+            studyUID_ = mw.dcm_data.StudyInstanceUID
+        except AttributeError:
+            #import pdb; pdb.set_trace()


thanks -- cleaned up here and elsewhere

yarikoptic · 2017-07-10T14:18:42Z

bin/heudiconv

+        except AttributeError:
+            #import pdb; pdb.set_trace()
+            lgr.info("File %s is missing any StudyInstanceUID" % filename)
+            studyUID_ = None


ok, did to file_studyUID
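
For illustration, the fallback from the hunk above can be isolated into a helper. This is a sketch under assumptions: `get_file_studyUID` is a hypothetical name, and `SimpleNamespace` objects stand in for the parsed DICOM data so the example runs without any DICOM library.

```python
import logging
from types import SimpleNamespace

lgr = logging.getLogger("heudiconv.sketch")  # hypothetical logger name


def get_file_studyUID(dcm_data, filename):
    """Return the StudyInstanceUID of one file, or None if it has none."""
    try:
        return dcm_data.StudyInstanceUID
    except AttributeError:
        # Passing arguments to the logger defers formatting until needed.
        lgr.info("File %s is missing any StudyInstanceUID", filename)
        return None


# Stand-ins for parsed DICOM headers, one with and one without the field.
with_uid = SimpleNamespace(StudyInstanceUID="1.2.3")
without_uid = SimpleNamespace()

file_studyUID = get_file_studyUID(with_uid, "x.dcm")
missing_studyUID = get_file_studyUID(without_uid, "y.dcm")
```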

* origin/master:
  enh: dcm2niix now outputs TotalReadoutTime

Conflicts:
  Dockerfile -- kept our (Satra's) reformatted version and just progressed dcm2niix to v1.0.20170621 (60bab318ee738b644ebb1396bbb8cbe1b006218f)

* gh-mvdoc/enh-anonym:
  Add test for add_rows_to_scans_keys_file
  BF: forgotten nipype

Conflicts:
  setup.py -- both fixed for nipype

satra · 2017-07-13T14:35:17Z

@yarikoptic - LGTM - i'm going to file an issue about cluster/distributed computing support to be added back after this is merged.

add David V. Smith (DVSneuro) as author

yarikoptic added 6 commits October 8, 2016 23:34

ENH: verify that whatever is given from groupping into seqinfo comes …

da386da

…from the same study instance uid

ENH: allow to group using StudyInstanceUID

d88d448

RF: make group_dicom_into_seqinfos return a dict

6bef1ed

now that we use named tuple, easy to make it into a key. For file groups though we would need to unpair them, but it all just makes "alignment" easier and allows to group per studyUID easier

RF: make group_dicom_into_seqinfos return a dict

2727bae

now that we use named tuple, easy to make it into a key. For file groups though we would need to unpair them, but it all just makes "alignment" easier and allows to group per studyUID easier

ENH+RF: first somewhat working dbic bids heuristic

1995cde

r=9; out=../outputs-all-reran$r; rm -rf $out; HEUDICONV_LOGLEVEL=DEBUG bin/heudiconv --dbg -f heuristics/dbic_bids.py -c dcm2niix -o $out -b ../dartmouth-phantoms/bids_test3

yarikoptic mentioned this pull request Oct 10, 2016

NF+ENH: make heuidiconv being able to layout the files deducing subject_id and using other DICOM fields #31

Merged

yarikoptic added 7 commits October 11, 2016 21:34

ENH: dbic-bids heuristics adjustment, dump into sourcedata mimicing h…

4004b44

…ierarchy of BIDS, safeguards against writing over

ENH: deal with multiple files generated for fieldmap by passing seqin…

29ce45f

…fo (BK) seqinfo not necessarily matches info in terms of # of items apparently... so will not work as is

Somewhat working version laying files out for a sample study which in…

e55166b

…cludes fieldmaps also creates template files for dataset descriptor etc

ENH: generate participant file etc

dba4e07

ENH: let's try to use pytest

f886e92

ENH+BF: few basic tests and a little bugfix already ;-)

da8e8b5

TST+BF: testing initialization of datalad dataset there

b16081e

yarikoptic added 13 commits October 13, 2016 22:31

ENH: notes on annex/datalad, try to install datalad via pip on elderl…

6f17ecf

…y precise

ENH: setup git user on travis so git does not complain

77f607d

RF: modality -> seqtype

2f8ed08

Fixups for datalad'ing the datasets, having no spurious _run, and nor…

0a45c0c

…malizing sid

ENH: reproducible dicoms (yet to test) by mocking out time.time and p…

9efd82e

…roviding dicom series time for files mtime

[DATALAD] new dataset

3c45bbf

BF: workaround for datalad issue with create/save dance + removing tr…

a2b03ce

…ailing spaces from json files

Adjusted description a bit (might still not correspond fully) and a v…

4d2c633

…ersion.

NF: added a single sample dicom file (fieldmap only phase from dartmo…

1d548ec

…uth-phantoms/bids_test4-20161014/phantom-1/fmap_acq-3mm

ENH+BF: make dicoms reproducible, simplify API etc

1f8d248

ENH(TST): very basic smoke tests for converall and dbic_bids heuristics

3412225

RF+BF+ENH (BIG:-/): --overwrite (to remove safety protection for over…

9684513

…writes), unified some names, etc

RF: just removed if True block and dedented

dc88b2c

mvdoc and others added 16 commits June 23, 2017 22:19

Add more requirements

25e77d5

BF: do not test with bids flag if using convertall

84c273c

NF: save scan_keys tsv file

16000e8

Simplify parsing of date

ec813e9

Name variables that make sense

c604256

Add randomly generated column to avoid figuring out date from hash

a3beffc

Extract function to get row info and add test

12c00aa

Remove forgotten pdb

4024047

Fix dbic_bids.py compatibility with python 3.x

d7b8df4

Update dcmstack source to fix missing importsys

f48fc9b

Clean up requirements and setup.py

079e63d

Reload scans keys if existing, start adding tests

463d902

ENH: add metadata for sensitive materials

382a815

BF: forgotten nipype

652d53c

Add test for add_rows_to_scans_keys_file

97fbd33

ENH: add .gitattributes only if was missing

a169e0b

yarikoptic commented Jul 10, 2017

yarikoptic added 3 commits July 10, 2017 10:29

ENH: addressing Matthew's comments -- should be no functional changes

25729f4

Merge remote-tracking branch 'origin/master' into enh-dbic2

0393eb4

* origin/master:
  enh: dcm2niix now outputs TotalReadoutTime

Conflicts:
  Dockerfile -- kept our (Satra's) reformatted version and just progressed dcm2niix to v1.0.20170621 (60bab318ee738b644ebb1396bbb8cbe1b006218f)

BF: we need nipype for core and pytest not nose for testing

4cdb33d

yarikoptic assigned satra Jul 10, 2017

yarikoptic added 2 commits July 10, 2017 18:43

BF: fixup for recent RF fl -> files which collided with use of files …

b55b892

…later in the code

Merge remote-tracking branch 'gh-mvdoc/enh-anonym' into enh-dbic2

7b6f732

* gh-mvdoc/enh-anonym:
  Add test for add_rows_to_scans_keys_file
  BF: forgotten nipype

Conflicts:
  setup.py -- both fixed for nipype

This was referenced Jul 13, 2017

Add cluster distributed conversion support #69

Open

Add sequence name to metadata extracted for heuristics #66

Merged

satra merged commit 48bd423 into nipy:master Aug 7, 2017

yarikoptic added a commit that referenced this pull request Jul 6, 2023

Merge pull request #32 from DVSneuro/main

3dd4f1d

add David V. Smith (DVSneuro) as author
