[Fix] buffer_size on ArraySequence #597


Merged: 7 commits merged into nipy:master from skoudoro:fix-array-sequence on Feb 20, 2018

Conversation

skoudoro (Member):

Hi,

When you initialize ArraySequence with a big generator on Windows, the operation is extremely slow (6 hours for 560,583 streamlines). Unfortunately, setting buffer_size has no effect.

The goal of this PR is to fix this issue. With the fix, the same operation takes me 46 seconds.

@MarcCote, @Garyfallidis, can you look at this PR? Thanks!

seq_with_buffer = ArraySequence(gen_2, buffer_size=256)

# Check buffer size effect
assert_true(seq_with_buffer.data.shape[0] > seq.data.shape[0])
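For context, a self-contained sketch of the comparison above (the toy data, sizes, and buffer_size value are assumptions for illustration; gen_2 and seq in the snippet come from the nibabel test suite):

    import numpy as np
    from nibabel.streamlines.array_sequence import ArraySequence

    # Hypothetical toy input: 1000 short streamlines of 3D float32 points.
    rng = np.random.RandomState(42)
    streamlines = [rng.rand(rng.randint(5, 50), 3).astype(np.float32)
                   for _ in range(1000)]

    seq = ArraySequence(iter(streamlines))                        # default buffer_size
    seq_with_buffer = ArraySequence(iter(streamlines), buffer_size=16)

    # A larger buffer_size leaves a larger backing array behind, which is
    # what the assert above checks (and what the trimming discussion below
    # is about).
    print(seq._data.shape[0], seq_with_buffer._data.shape[0])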
Contributor:

Interesting. Should we make sure that once creation is done we end up with the same ._data.shape[0]?

skoudoro (Member Author):

Sorry, I do not understand what you mean here.

Contributor:

I just meant that depending on the buffer_size you choose, you might find yourself with a lot of "reserved" allocated space. Let's say you have only one streamline in streamlines and do arr_seq = ArraySequence(streamlines, buffer_size=1024); then your arr_seq will be holding on to the full 1024 MB buffer.

I thought we might want to trim arr_seq._data at the end of the extend in order to reduce memory consumption. On the other hand, using the default buffer_size=4 would only consume up to 4 MB, which is rather small.
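A rough sketch of the kind of trimming being suggested, relying on ArraySequence's private _data / _offsets / _lengths attributes, so treat it as illustrative only, not a proposed API:

    import numpy as np

    def trim_unused_buffer(arr_seq):
        # Rows beyond the last referenced offset + length are reserved
        # buffer space; drop them and keep only what the sequence uses.
        if len(arr_seq._lengths) == 0:
            used_rows = 0
        else:
            used_rows = int(np.max(arr_seq._offsets + arr_seq._lengths))
        arr_seq._data = arr_seq._data[:used_rows].copy()
        return arr_seq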

skoudoro (Member Author):

OK, I see; I was surprised too :-). It would certainly be good to trim the data. We should create a new PR for that, I suppose.

@MarcCote (Contributor):

@matthew-brett, can you confirm there is a typo in the logic where we compute the size of the buffer?
In master:
    self.rows_per_buf = bytes_per_row / self.bytes_per_buf
With this PR:
    self.rows_per_buf = self.bytes_per_buf / bytes_per_row
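A back-of-the-envelope illustration of why the typo is so costly, assuming 3D float64 points and the default buffer_size of 4, taken to mean a 4 MB buffer as discussed above:

    bytes_per_row = 3 * 8          # one row of an (N, 3) float64 array
    bytes_per_buf = 4 * 1024 ** 2  # default buffer_size of 4 MB

    # master: ~5.7e-06, i.e. a buffer of effectively zero rows, so the
    # backing array has to be resized almost every time rows are appended
    # (hence hours for ~560k streamlines)
    rows_per_buf_master = bytes_per_row / bytes_per_buf

    # this PR: ~174762 rows fit in each 4 MB buffer
    rows_per_buf_fixed = bytes_per_buf / bytes_per_row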

@matthew-brett (Member):

Oops, yes, that does look like a typo.

@Garyfallidis (Member):

Looks good to me, but I see that Travis is failing, @MarcCote and @matthew-brett?
Also, @matthew-brett, this bug is holding up the DIPY release too; we were hoping to move completely to the new streamlines API. When is the next nibabel release planned? Apologies in advance for the pressure.

@Garyfallidis (Member):

Ah, I see. The errors are from other nibabel functions, not from this PR.

@effigies (Member):

At least the last one seems to be test_array_sequence.test_concatenate, which seems relevant.

I've opened an issue for the DICOM-related failures.

@skoudoro (Member Author):

OK, I will look at this one, @effigies. It seems to be a buffer_size problem.

@@ -37,7 +37,7 @@ def __init__(self, arr_seq, common_shape, dtype):
         self.common_shape = common_shape
         n_in_row = reduce(mul, common_shape, 1)
         bytes_per_row = n_in_row * dtype.itemsize
-        self.rows_per_buf = bytes_per_row / self.bytes_per_buf
+        self.rows_per_buf = self.bytes_per_buf / bytes_per_row
Member:

Should rows_per_buf be an int or a float? If the idea is to load whole rows at a time, then you may want to use // to ensure that you round down.
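For reference, the distinction being raised here (Python 3 semantics):

    4194304 / 24    # 174762.666..., a float: true division
    4194304 // 24   # 174762, an int: floor division rounds down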

@@ -318,4 +328,4 @@ def test_concatenate():
     seqs = [seq[:, [i]] for i in range(seq.common_shape[0])]
     new_seq = concatenate(seqs, axis=0)
     assert_true(len(new_seq), seq.common_shape[0] * len(seq))
-    assert_array_equal(new_seq._data, seq._data.T.reshape((-1, 1)))
+    assert_array_equal(new_seq._data, seq._data.T.reshape((-1, 0)))
Member:

That's an illegal shape. What shape are you shooting for, here?
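A quick check of the shape in question; NumPy raises a ValueError here, since the -1 dimension cannot be resolved next to a 0-sized one:

    import numpy as np

    np.arange(6).reshape((-1, 1))       # fine: shape (6, 1)
    try:
        np.arange(6).reshape((-1, 0))   # the shape questioned above
    except ValueError as err:
        print(err)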

@@ -37,7 +37,7 @@ def __init__(self, arr_seq, common_shape, dtype):
         self.common_shape = common_shape
         n_in_row = reduce(mul, common_shape, 1)
         bytes_per_row = n_in_row * dtype.itemsize
-        self.rows_per_buf = bytes_per_row / self.bytes_per_buf
+        self.rows_per_buf = int(np.ceil(self.bytes_per_buf / bytes_per_row))
Member:

Do you want ceil or floor here?

skoudoro (Member Author):

ceil, because I want the minimum value of rows_per_buf to be 1.

Member:

But I guess you will often end up with too many rows. How about:

self.rows_per_buf = max(1, int(self.bytes_per_buf / bytes_per_row))

skoudoro (Member Author):

Thanks, I think your way is more explicit; I will do that.
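A quick comparison of the two candidates, using the same illustrative 24-byte rows and 4 MB buffer as before:

    import numpy as np

    bytes_per_row = 24
    bytes_per_buf = 4 * 1024 ** 2

    int(np.ceil(bytes_per_buf / bytes_per_row))   # 174763: rounds up, always >= 1
    max(1, int(bytes_per_buf / bytes_per_row))    # 174762: floors, clamped to >= 1

    # Both give 1 when the buffer is smaller than a single row; the max()
    # form just avoids rounding up past the requested buffer size otherwise.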

@Garyfallidis (Member) commented Feb 17, 2018:

Okay, I believe I was wrong: DIPY's release does not need to be delayed by this fix. We can simply check nibabel's version and, if it is not the one with the fix, inherit from ArraySequence in our Streamlines class and overload the specific function that has the issue. So, a bit of patching will do it :)
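A minimal sketch of the workaround being described; the version cutoff, class name, and overridden method here are assumptions for illustration, not the actual DIPY patch:

    import nibabel as nib
    from nibabel.streamlines.array_sequence import ArraySequence

    def _nibabel_has_buffer_fix():
        # Assumed cutoff: the first nibabel release that ships this PR.
        major, minor = (int(p) for p in nib.__version__.split('.')[:2])
        return (major, minor) >= (2, 3)

    class Streamlines(ArraySequence):
        pass

    if not _nibabel_has_buffer_fix():
        def _patched_extend(self, elements):
            # A corrected version of the method that mis-computes
            # rows_per_buf in older nibabel releases would go here.
            return ArraySequence.extend(self, elements)

        Streamlines.extend = _patched_extend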

@codecov-io commented Feb 19, 2018:

Codecov Report

Merging #597 into master will increase coverage by 1.05%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #597      +/-   ##
==========================================
+ Coverage    93.4%   94.46%   +1.05%     
==========================================
  Files         189      177      -12     
  Lines       25820    24950     -870     
  Branches     2752     2661      -91     
==========================================
- Hits        24118    23569     -549     
+ Misses       1205      908     -297     
+ Partials      497      473      -24
Impacted Files                                       Coverage Δ
nibabel/streamlines/array_sequence.py                100% <100%> (ø) ⬆️
nibabel/streamlines/tests/test_array_sequence.py     99.51% <100%> (+0.01%) ⬆️
nibabel/nicom/tests/test_dicomwrappers.py            98.34% <0%> (ø) ⬆️
nibabel/externals/tests/test_netcdf.py
nibabel/externals/netcdf.py
nibabel/benchmarks/bench_arrayproxy_slicing.py
nibabel/externals/__init__.py
nibabel/benchmarks/bench_streamlines.py
nibabel/benchmarks/bench_load_save.py
nibabel/benchmarks/bench_fileslice.py
... and 6 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6eacaf5...ddd7240.

@coveralls commented Feb 19, 2018:

Coverage Status

Coverage increased (+1.03%) to 96.361% when pulling ddd7240 on skoudoro:fix-array-sequence into 6eacaf5 on nipy:master.

@matthew-brett (Member):

@MarcCote - are you OK with this one now? If so, we can merge.

@MarcCote (Contributor):

@matthew-brett @effigies Yes, I'm OK with all the changes. Thanks @skoudoro.

@skoudoro (Member Author):

Thanks all for your review

@effigies merged commit e48b746 into nipy:master on Feb 20, 2018.
@skoudoro deleted the fix-array-sequence branch on February 20, 2018 at 15:12.