Skip to content

Conversation

@yarikoptic
Copy link
Member

@yarikoptic yarikoptic commented Oct 10, 2016

Sits on top of #31

It implements perspective heuristic which would allow simply to name sequences following BIDS convention. More details available at https://docs.google.com/document/d/1R54cgOe481oygYVZxI7NHrifDyFUZAjOBwCTu7M7y48/edit?usp=sharing

TODOs

  • harmonize location (and may be naming) of stored "info" (now sticks out too much)
  • make it work in multiple sessions within the same study
  • move derivatives (atm there is no derivatives)
  • make sure original dicoms are tarballed and stored in appropriate subdir of BIDS dataset
  • basic support for DataLad ;)
    • annotate dicom tarballs and anatomicals in annex with distribution-restrictions=sensitive
  • made DICOM tarballs reproducible

one of the test DICOM datasets to try it on is http://datasets.datalad.org/?dir=/dicoms/dartmouth-phantoms/bids_test3

…or getting arb set of files

Decided to split away 'session' notion, which was pretty much used in the script
only whenever multiple tarballs are provided, as a sign of multiple sessions.
Otherwise -- session is given on cmdline, and thus processing that particular session
might be 'incompatible' with future runs for other sessions if we save the entire
mapping in a single file which we might load, and which wouldn't have that session
information.

Also, to allow for consumption of arbitrary set of dicoms, which might be coming
from different studies and sessions, we need to analyze/group before extracting
useful session sequence information.  So, that was also in preparation to that
now that we use named tuple, easy to make it into a key.  For file groups
though we would need to unpair them, but it all just makes "alignment" easier
and allows to group per studyUID easier
now that we use named tuple, easy to make it into a key.  For file groups
though we would need to unpair them, but it all just makes "alignment" easier
and allows to group per studyUID easier
r=9; out=../outputs-all-reran$r; rm -rf $out; HEUDICONV_LOGLEVEL=DEBUG bin/heudiconv --dbg -f heuristics/dbic_bids.py -c dcm2niix -o $out -b ../dartmouth-phantoms/bids_test3
@codecov-io
Copy link

codecov-io commented Oct 10, 2016

Codecov Report

Merging #32 into master will increase coverage by 64.24%.
The diff coverage is 82.48%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master      #32       +/-   ##
===========================================
+ Coverage   14.11%   78.35%   +64.24%     
===========================================
  Files           4        6        +2     
  Lines         503     1109      +606     
===========================================
+ Hits           71      869      +798     
+ Misses        432      240      -192
Impacted Files Coverage Δ
tests/test_heuristics.py 100% <100%> (ø)
tests/test_main.py 100% <100%> (ø) ⬆️
tests/utils.py 100% <100%> (ø)
tests/test_tarballs.py 100% <100%> (ø)
bin/heudiconv 74.02% <78.07%> (+61.68%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c14df90...7b6f732. Read the comment docs.

@satra
Copy link
Member

satra commented Oct 14, 2016

@yarikoptic - the heuristics has become very complicated now - requires a sufficiently enlightened python programmer. can we have the simpler version, which was relatively easy for people to learn?

also:

  1. we should not rely on dicomstack but just use pydicom + nibabel to group the dicoms
  2. we should embed all the dicom metadata into the bids info json file. (currently bids is very limited w.r.t dicom metadata)

@yarikoptic
Copy link
Member Author

I didn't change any of heuristics IIRC (at least not yet) ;)
I just allowed for what was not allowed before (e.g. sorting dicoms by study UIDs etc).
custom handling for BIDS files simply didn't do the right thing e.g. for fieldmaps etc

indeed currently it is a heap of changes, but once again -- I am just trying first to tune it so I could do what I need to accomplish, and later would hopefully look to catch back diversions from original implementation... which is tough as you know since there were not a single test of any kind.

mvdoc and others added 16 commits June 23, 2017 22:19
* gh-mvdoc/enh-anonym:
  Reload scans keys if existing, start adding tests
  Clean up requirements and setup.py
  Update dcmstack source to fix missing importsys
  Fix dbic_bids.py compatibility with python 3.x
  Remove forgotten pdb
  Extract function to get row info and add test
  Add randomly generated column to avoid figuring out date from hash
  Name variables that make sense
  Simplify parsing of date
  NF: save scan_keys tsv file
  BF: do not test with bids flag if using convertall
  Add more requirements
  BF: add missing import re in embedder
@satra
Copy link
Member

satra commented Jul 10, 2017

@yarikoptic - any chance this can be looked at and @matthew-brett's comments addressed?

Copy link
Member Author

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthew-brett @satra ha -- for some reason I was looking for your @matthew-brett comments in the PR against my clone, and saw none... then looked here. hopefully I caught all of them. I am now about to push the changes addressing (some) of them. And also merged with the work we did during brainhack -- to improve annonimization etc

from collections import namedtuple
from collections import defaultdict
from collections import OrderedDict as ordereddict
from os.path import isdir
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could as well be
just a mater of habit/style and also

  • easier to prune -- I just need to delete the lines which are grayed out by pycharm due to imported names not being used.
    Whenever it is all within a single line -- more pain, since need to go select specific names etc
  • I believe that it eases for pycharm to refactor/move pieces of code into other modules, so it finds/moves such import statements as well

bin/heudiconv Outdated
converted. This edited file will always overwrite the original file. If
there is a need to revert to original state, please delete this edit.txt
file and rerun the conversion
This script uses DicomStack and mri_convert to convert DICOM directories.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took your version and expanded... most probably will need to resolve conflicts with master by now ;) thanks

)


class TempDirs(object):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid concern! But primarily for the case whenever such refactoring would (and if, since original idea was to have a single script IIRC) happen -- ATM it is just as fine imho, so just added a TODO comment on that

bin/heudiconv Outdated
dcmfilter : callable, optional
If called on dcm_data and returns True, it is used to set
series_id
grouping : str ('studyUID', 'accession_number') or None, optional
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, done

bin/heudiconv Outdated
Parameters
----------
fl : list of str
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason for type (_list) suffix really -- making it files (noone works with open files in here, so no fear of confusing with file instances) and flfilter into file_filter

allowed_groupings = ['studyUID', 'accession_number', None]
if grouping not in allowed_groupings:
raise ValueError('I do not know how to group by {0}'.format(grouping))
per_studyUID = grouping == 'studyUID'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be done imho either way. this centralizes comparison to the string value into a variable. Happen we decide to refactor anything and complicate the logic to decide e.g. on per_studyUID value -- will be just a single point to change instead of hunting for all the comparisons to a string value. So I would leave it as is for now

bin/heudiconv Outdated
pass
studyUID_ = mw.dcm_data.StudyInstanceUID
except AttributeError:
#import pdb; pdb.set_trace()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks -- cleaned up here and elsewheres

bin/heudiconv Outdated
except AttributeError:
#import pdb; pdb.set_trace()
lgr.info("File %s is missing any StudyInstanceUID" % filename)
studyUID_ = None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, did to file_studyUID

* origin/master:
  enh: dcm2niix now outputs TotalReadoutTime

Conflicts:
   Dockerfile -- kept our (Satra's) reformatted version and just progressed dcm2niix to v1.0.20170621 (60bab318ee738b644ebb1396bbb8cbe1b006218f)
* gh-mvdoc/enh-anonym:
  Add test for add_rows_to_scans_keys_file
  BF: forgotten nipype

 Conflicts:
	setup.py -- both fixed for nipype
@satra
Copy link
Member

satra commented Jul 13, 2017

@yarikoptic - LGTM - i'm going to file an issue about cluster/distributed computing support to be added back after this is merged.

@satra satra merged commit 48bd423 into nipy:master Aug 7, 2017
yarikoptic added a commit that referenced this pull request Jul 6, 2023
add David V. Smith (DVSneuro) as author
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants