Skip to content

NF nib-diff #617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 79 commits into from
Aug 2, 2018
Merged
Show file tree
Hide file tree
Changes from 67 commits
Commits
Show all changes
79 commits
Select commit Hold shift + click to select a range
328d3bb
RF: moved all the functionality from nib-ls under nibabel.cmdline
yarikoptic Nov 3, 2017
d293a20
ENH: added a skeleton for nib-diff command
yarikoptic Nov 3, 2017
5491af4
TODO1 attempt 1: processed data type, data shape, and data offset of …
chrispycheng Nov 6, 2017
949762c
tweaked to remove AnalyzeHeader but currently still has problems
chrispycheng Nov 16, 2017
93b5e09
added nib-diff to setup.py
chrispycheng Nov 17, 2017
22804f1
first attempt at nib-diff that doesnt work
chrispycheng Nov 17, 2017
f81a78b
removed incorrectly committed changes
chrispycheng Nov 17, 2017
fe9c052
BK: stab at the test_dict_diff
yarikoptic Nov 17, 2017
5eb4477
first attempt at diff_dicts method and diff testing file
chrispycheng Nov 28, 2017
e2defb0
Merge branch 'nf-nib-diff' of https://github.com/yarikoptic/nibabel i…
chrispycheng Nov 28, 2017
5e3a767
tried something else with header fields
chrispycheng Nov 30, 2017
7febf65
latest attempt: restructured diff_dicts() method, troubleshooting ens…
chrispycheng Nov 30, 2017
23a43ba
corrected misplacement of cmdline files and latest attempt at diff_di…
chrispycheng Dec 1, 2017
fae491d
progress! tweaked bugs, corrected rookie mistakes like cmdline placem…
chrispycheng Dec 1, 2017
3e87d81
got rid of proc file and function works at a basic level
chrispycheng Dec 2, 2017
a3b35d9
tweaked diff_dicts to be compatible with tests
chrispycheng Dec 3, 2017
1491c61
got rid of None, troubleshot tests
chrispycheng Dec 9, 2017
397bc03
introduced hypothesis to use for testing with pretty sexy results
chrispycheng Dec 15, 2017
f192f65
noted hypothesis need for tests, refactored diff_dicts name
chrispycheng Dec 16, 2017
92553a2
attempt at TODO#2: allowing specification of header fields
chrispycheng Dec 18, 2017
7a70d56
now functional for several header files.
chrispycheng Dec 21, 2017
f5e930d
tweak to make hypothesis work with a list, but problem above has not …
chrispycheng Dec 23, 2017
df82a51
tweaked names and code as suggested!
chrispycheng Dec 24, 2017
6d706f5
bug fix
chrispycheng Dec 27, 2017
774ce3b
cosmetic tweaks
chrispycheng Jan 5, 2018
911d781
cleaned up code
chrispycheng Jan 7, 2018
0458694
promoted generic programming and got test to work again
chrispycheng Jan 13, 2018
fed70e9
tried to clean code but couldnt get comprehension going
chrispycheng Jan 13, 2018
0b59dfb
comment and docstring
chrispycheng Jan 15, 2018
2920abf
added options for text, json, yaml but still have to implement
chrispycheng Jan 21, 2018
1e57409
work in progress with all outputs
chrispycheng Feb 3, 2018
fd6c474
simplified code but now bizarrely doesnt run with two files
chrispycheng Mar 14, 2018
497ad2a
beautified text output, next up json and yaml
chrispycheng Mar 16, 2018
df0aa79
modified the tests for the new diff_values
chrispycheng Mar 17, 2018
c23143c
commenting out json and yaml, cosmetic mod to ls.py
chrispycheng Mar 18, 2018
db16d85
BF: all those flags are just boolean flags and no need to repeat thei…
yarikoptic Mar 26, 2018
feca439
progress towards table formatting
chrispycheng Mar 29, 2018
acf667b
table now works and looks p good
chrispycheng Mar 29, 2018
bb3fbf0
slight tweak to table formatting
chrispycheng Mar 29, 2018
8a92010
very inefficient but successfully removed dtype when its identical
chrispycheng Apr 4, 2018
a9a572a
removed extra blank line lol
chrispycheng Apr 5, 2018
3290a66
boosting coverage
chrispycheng Apr 5, 2018
92e4ed0
Merge remote-tracking branch 'nibabel/master' into nf-nib-diff
chrispycheng Apr 5, 2018
06e8dd7
data comparison function added
chrispycheng Apr 12, 2018
8fd6995
implemented test-nib-diff
chrispycheng Apr 24, 2018
df8bc04
added another test
chrispycheng Apr 25, 2018
45d3fbf
changed travis appveyor utils per test errors
chrispycheng Apr 25, 2018
1cbf5b3
test files
chrispycheng Apr 26, 2018
45bdf64
wasnt included in last commit by accident
chrispycheng Apr 26, 2018
c600746
maybe this will work instead
chrispycheng Apr 26, 2018
e26adb5
maybe this will work instead
chrispycheng Apr 26, 2018
3802919
test fixed
chrispycheng May 5, 2018
0ce86df
Merge branch 'nf-nib-diff' of https://github.com/yarikoptic/nibabel i…
chrispycheng May 5, 2018
72bc800
fixed to be more generic
chrispycheng May 6, 2018
0cf2a8c
fixed file name issue in nib diff
chrispycheng May 24, 2018
5db2654
fixed changes as yarik requested
chrispycheng May 24, 2018
50a480e
fixed problems noted by Travis CI
chrispycheng May 31, 2018
9e155df
building in difflib to see whats going on
chrispycheng May 31, 2018
3c0c90c
Merge branch 'master' into nf-nib-diff
effigies Jun 3, 2018
f8c32b8
latest attempt at compliance changes dict to list
chrispycheng Jun 26, 2018
51733b0
Merge branch 'nf-nib-diff' of https://github.com/yarikoptic/nibabel i…
chrispycheng Jun 26, 2018
41caade
Remove unrelated to PR difference
yarikoptic Jun 27, 2018
f1cee5f
RF+BF: report that diff if any header of data differs, return dict of…
yarikoptic Jun 27, 2018
676ac70
combined into one line
chrispycheng Jun 28, 2018
f476c48
new structure, names, tests
chrispycheng Jul 4, 2018
10c2c42
hypothesis: all my problems were due to that one test
chrispycheng Jul 5, 2018
7989563
whoops missed this
chrispycheng Jul 5, 2018
ae74339
i think im going to cry. code cleaned and made more pythonic
chrispycheng Jul 11, 2018
82b1457
added and fixed tests
chrispycheng Jul 11, 2018
45d0edf
fixing up appveyor and travis problems
chrispycheng Jul 11, 2018
c1f553f
fixed a fringe use case
chrispycheng Jul 12, 2018
2f89242
style tweak for travis
chrispycheng Jul 12, 2018
6613522
removed duplication, made things more generic
chrispycheng Jul 16, 2018
a311d7b
moved functionality outside to test and increase coverage
chrispycheng Jul 17, 2018
2cd69b5
boosting coverage by testing main
chrispycheng Jul 24, 2018
414da00
main test corrected for max coverage
chrispycheng Jul 27, 2018
59006b0
imported StringIO from six instead of io
chrispycheng Jul 27, 2018
672661e
added a test for cmdline function
chrispycheng Jul 27, 2018
baf6cdc
changes per Yariks comments!
chrispycheng Jul 28, 2018
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ cache:
- $HOME/.cache/pip
env:
global:
- DEPENDS="six numpy scipy matplotlib h5py pillow pydicom"
- DEPENDS="six numpy scipy matplotlib h5py pillow pydicom hypothesis"
- OPTIONAL_DEPENDS=""
- INSTALL_TYPE="setup"
- EXTRA_WHEELS="https://5cf40426d9f06eb7461d-6fe47d9331aba7cd62fc36c7196769e4.ssl.cf2.rackcdn.com"
Expand Down Expand Up @@ -95,7 +95,7 @@ before_install:
- source venv/bin/activate
- python --version # just to check
- pip install -U pip wheel # needed at one point
- retry pip install nose flake8 mock # always
- retry pip install nose flake8 mock hypothesis # always
- pip install $EXTRA_PIP_FLAGS $DEPENDS $OPTIONAL_DEPENDS
- if [ "${COVERAGE}" == "1" ]; then
pip install coverage;
Expand Down
3 changes: 1 addition & 2 deletions appveyor.yml
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,7 @@ install:
- SET PATH=%PYTHON%;%PYTHON%\Scripts;%PATH%

# Install the dependencies of the project.
- pip install numpy scipy matplotlib nose h5py mock
- pip install pydicom
- pip install numpy scipy matplotlib nose h5py mock hypothesis pydicom
- pip install .
- SET NIBABEL_DATA_DIR=%CD%\nibabel-data

Expand Down
17 changes: 17 additions & 0 deletions bin/nib-diff
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
#!python
# emacs: -*- mode: python-mode; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
#
# See COPYING file distributed along with the NiBabel package for the
# copyright and license terms.
#
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
"""
Quick diff summary for a set of neuroimaging files
"""

from nibabel.cmdline.diff import main

if __name__ == '__main__':
main()
1 change: 1 addition & 0 deletions dev-requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
-r requirements.txt
nose
mock
hypothesis
298 changes: 298 additions & 0 deletions nibabel/cmdline/diff.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,298 @@
#!python
# emacs: -*- mode: python-mode; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
#
# See COPYING file distributed along with the NiBabel package for the
# copyright and license terms.
#
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
"""
Quick summary of the differences among a set of neuroimaging files
"""
from __future__ import division, print_function, absolute_import

import re
import sys
from collections import OrderedDict
from optparse import OptionParser, Option
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't hold this PR up for this, but just FYI optparse has been deprecated, and argparse is the supported argument parser.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah... we should convert all the cmdline tools which still use optparse in some one PR ;)


import numpy as np

import nibabel as nib
import nibabel.cmdline.utils
import hashlib


def get_opt_parser():
# use module docstring for help output
p = OptionParser(
usage="%s [OPTIONS] [FILE ...]\n\n" % sys.argv[0] + __doc__,
version="%prog " + nib.__version__)

p.add_options([
Option("-v", "--verbose", action="count",
dest="verbose", default=0,
help="Make more noise. Could be specified multiple times"),

Option("-H", "--header-fields",
dest="header_fields", default='all',
help="Header fields (comma separated) to be printed as well (if present)"),
])

return p


def diff_values(first_item, second_item):
"""Generically compares two values, returns true if different"""
if np.any(first_item != second_item): # comparing items that are instances of class np.ndarray
return True

elif type(first_item) != type(second_item): # comparing items that differ in data type
return True

else: # all other use cases
return first_item != second_item


def diff_headers(files, fields):
"""Iterates over all header fields of all files to find those that differ

Parameters
----------
files: a given list of files to be compared
fields: the fields to be compared

Returns
-------
list
header fields whose values differ across files
"""

headers = []

for f in range(len(files)): # for each file
for h in fields: # for each header

# each maneuver is encased in a try block after exceptions have previously occurred
# get the particular header field within the particular file

try:
field = files[f][h]

except ValueError:
continue

# filter numpy arrays with a NaN value
try:
if np.all(np.isnan(field)):
continue

except TypeError:
pass

# compare current file with other files
for i in files[f + 1:]:
other_field = i[h]

# sometimes field.item doesn't work
try:
# converting bytes to be compared as strings
if isinstance(field.item(0), bytes):
field = field.item(0).decode("utf-8")

# converting np.ndarray to lists to remove ambiguity
if isinstance(field, np.ndarray):
field = field.tolist()

if isinstance(other_field.item(0), bytes):
other_field = other_field.item(0).decode("utf-8")
if isinstance(other_field, np.ndarray):
other_field = other_field.tolist()

except AttributeError:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why continue? if one misses smth when others don't -- they differ


# if the header values of the two files are different, append
if diff_values(field, other_field):
headers.append(h)

if headers: # return a list of headers for the files whose values differ
return headers


def diff_header_fields(header_field, files):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it is just a single header_field, better rename function to correspond ;)

Also you also have get_headers_diff which is just the one which goes over multiple header fields.
So, please make names and signature consistent between them and more descriptive, also while renaming files into something closer to the current state of affairs, e.g. diff_header_fields(headers, fields) and diff_header_field(headers, field) ?

adjust also docstring to correspond. e.g.

headers: list of dict 
  Header records to be compared
field: str
  Name of the header field to compare

In PyCharm you could easily rename functions using Refactor -> Rename function.

But then I also spotted diff_headers above ;-) so it seems that there is duplicate functionality:

  • diff_headers now just returns header fields which differ
  • then you go again and get the values for those fields

Why not to make get_diff_headers do what diff_headers does now?
what you need is just to go through dicts and see which fields differ (or present in one but absent in another) and return a dict with those fields which differ.

"""Iterates over a single header field of multiple files

Parameters
----------
header_field: a given header field
files: the files to be compared

Returns
-------
list
str for each value corresponding to each file's given header field
"""

keyed_inputs = []

for i in files:

# each maneuver is encased in a try block after exceptions have previously occurred
# get the particular header field within the particular file

try:
field_value = i[header_field]
except ValueError:
continue

# compare different data types, return all values as soon as diff is found
for x in files[1:]:
try:
data_diff = diff_values(str(x[header_field].dtype), str(field_value.dtype))

if data_diff:
break
except ValueError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea ValueError could/should happen here (no comment etc) but I do not think it should be considered as "there is no difference"

continue

# string formatting of responses
try:

# if differences are found among data types
if data_diff:
# accounting for how to arrange arrays
if field_value.ndim < 1:
keyed_inputs.append("{}@{}".format(field_value, field_value.dtype))
elif field_value.ndim == 1:
keyed_inputs.append("{}@{}".format(list(field_value), field_value.dtype))

# if no differences are found among data types
else:
if field_value.ndim < 1:
keyed_inputs.append(field_value)
elif field_value.ndim == 1:
keyed_inputs.append(list(field_value))

except UnboundLocalError:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't look "kosher" -- try to make code explicit to not have some undefined local variables used


for i in range(len(keyed_inputs)):
keyed_inputs[i] = str(keyed_inputs[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and in general in Python try to avoid explicit indexing. Eg. here it is just a list comprehension (or even a map):

keyed_inputs = [str(x) for x in keyed_inputs]

or

# list() because map in PY3 is a generator
keyed_inputs = list(map(str, keyed_inputs)))


return keyed_inputs


def get_headers_diff(file_headers, headers):
"""Get difference between headers

Parameters
----------
file_headers: list of actual headers from files
headers: list of header fields that differ

Returns
-------
dict
str: list for each header field which differs, return list of
values per each file
"""
output = OrderedDict()

# if there are headers that differ
if headers:

# for each header
for header in headers:

# find the values corresponding to the files that differ
val = diff_header_fields(header, file_headers)

# store these values in a dictionary
if val:
output[header] = val

return output


def get_data_md5sums(files):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that you would hate continued review @chrispycheng but, by seeing the function name I got confused why below it returns an empty list if there is only one unique value. So, please

  • rename to e.g. get_data_diff
  • add a docstring describing the output like you nicely did for the above get_header_diff


md5sums = [
hashlib.md5(np.ascontiguousarray(nib.load(f).get_data(), dtype=np.float32)).hexdigest()
for f in files
]

if len(set(md5sums)) == 1:
return []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that might be one contributor to your .000?% coverage miss (do you have codecov extension to the browser installed to see what lines aren't covered?). Apparently there is no test which verifies that you do get empty list in output whenever two files have the same data? you could make a dedicated test for this function and feed it


return md5sums


def main():
"""Getting the show on the road"""

parser = get_opt_parser()
(opts, files) = parser.parse_args()

nibabel.cmdline.utils.verbose_level = opts.verbose

assert len(files) >= 2, "Please enter at least two files"

if nibabel.cmdline.utils.verbose_level < 3:
# suppress nibabel format-compliance warnings
nib.imageglobals.logger.level = 50

file_headers = [nib.load(f).header for f in files]

if opts.header_fields: # will almost always have a header field
# signals "all fields"
if opts.header_fields == 'all':
# TODO: header fields might vary across file types, thus prior sensing would be needed
header_fields = file_headers[0].keys()
else:
header_fields = opts.header_fields.split(',')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently has no test case to test this!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you test an intermediary if/else statement within a function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invoke the command/function with options where you specify your list of fields to be used for comparison

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that would be the command parser itself? I don't think any other NiBabel function has a test for that, maybe that could be a separate PR?

headers = diff_headers(file_headers, header_fields)
diff = get_headers_diff(file_headers, headers)
data_diff = get_data_md5sums(files)

if data_diff:
diff['DATA(md5)'] = data_diff

if diff:
print("These files are different.")
print("{:<11}".format('Field'), end="")

for f in files:
output = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is output simply the file name or something more complicated? If not, using os.path.basename might be easier, readable and generic across OSes to obtain the filename?

i = 0
while i < len(f):
if f[i] == "/" or f[i] == "\\":
output = ""
else:
output += f[i]
i += 1

print("{:<45}".format(output), end="")

print()

for key, value in diff.items():
print("{:<11}".format(key), end="")

for item in value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea that might be interesting to consider is whether layout differences vertically, which can scale better with number of images. Although horizontal layout looks nice for 2/3 images, which can be the default, but for a 10 images, it might be easier print each differing field into a separate block of values from all images.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but most frequent case is actually two files and many fields differing

item_str = str(item)
# Value might start/end with some invisible spacing characters so we
# would "condition" it on both ends a bit
item_str = re.sub('^[ \t]+', '<', item_str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this different from using str.strip()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Place those indicators only if anything was replaced at any end

item_str = re.sub('[ \t]+$', '>', item_str)
# and also replace some other invisible symbols with a question
# mark
item_str = re.sub('[\x00]', '?', item_str)
print("{:<45}".format(item_str), end="")

print()

raise SystemExit(1)
else:
print("These files are identical.")
35 changes: 30 additions & 5 deletions nibabel/cmdline/tests/test_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,15 @@
Test running scripts
"""

from numpy.testing import (assert_almost_equal,
assert_array_equal)

from nose.tools import (assert_true, assert_false, assert_raises,
assert_equal, assert_not_equal)
from nose.tools import assert_equal

import nibabel as nib
from nibabel.cmdline.utils import *
from nibabel.cmdline.diff import diff_header_fields, diff_headers
from os.path import (dirname, join as pjoin, abspath)


DATA_PATH = abspath(pjoin(dirname(__file__), '../../tests/data'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is already one defined, just use

from nibabel.testing import data_path

note that in above '/' within the path might be *nix specific and might (?) not work on Windows. That is why we use all the path.join



def test_table2string():
Expand Down Expand Up @@ -42,3 +44,26 @@ def get_test(self):

assert_equal(safe_get(test, "test"), 2)
assert_equal(safe_get(test, "failtest"), "-")


def test_diff_headers():
fnames = [pjoin(DATA_PATH, f)
for f in ('standard.nii.gz', 'example4d.nii.gz')]
file_headers = [nib.load(f).header for f in fnames]
headers = ['sizeof_hdr', 'data_type', 'db_name', 'extents', 'session_error', 'regular', 'dim_info', 'dim', 'intent_p1',
'intent_p2', 'intent_p3', 'intent_code', 'datatype', 'bitpix', 'slice_start', 'pixdim', 'vox_offset', 'scl_slope',
'scl_inter', 'slice_end', 'slice_code', 'xyzt_units', 'cal_max', 'cal_min', 'slice_duration', 'toffset', 'glmax',
'glmin', 'descrip', 'aux_file', 'qform_code', 'sform_code', 'quatern_b', 'quatern_c', 'quatern_d', 'qoffset_x',
'qoffset_y', 'qoffset_z', 'srow_x', 'srow_y', 'srow_z', 'intent_name', 'magic']

assert_equal(diff_headers(file_headers, headers), ['regular', 'dim_info', 'dim', 'datatype', 'bitpix', 'pixdim',
'slice_end', 'xyzt_units', 'cal_max', 'descrip', 'qform_code',
'sform_code', 'quatern_b', 'quatern_c', 'quatern_d', 'qoffset_x',
'qoffset_y', 'qoffset_z', 'srow_x', 'srow_y', 'srow_z'])


def test_diff_header_fields():
fnames = [pjoin(DATA_PATH, f)
for f in ('standard.nii.gz', 'example4d.nii.gz')]
file_headers = [nib.load(f).header for f in fnames]
assert_equal(diff_header_fields("dim_info", file_headers), ['0', '57'])
Binary file added nibabel/tests/data/spmT_0001.nii.gz
Binary file not shown.
Binary file added nibabel/tests/data/spmT_0001_2.nii.gz
Binary file not shown.
Loading