[Fix] buffer_size on ArraySequence #597

Conversation
seq_with_buffer = ArraySequence(gen_2, buffer_size=256)

# Check buffer size effect
assert_true(seq_with_buffer.data.shape[0] > seq.data.shape[0])
Interesting, should we make sure that, once creation is done, we have the same ._data.shape[0]?
Sorry, I do not understand what you mean here.
I just meant that, depending on the buffer_size you chose, you might find yourself with a lot of "reserved" allocated space. Let's say you have only one streamline in streamlines and do arr_seq = ArraySequence(streamlines, buffer_size=1024); then your arr_seq will be using 1024 MB. I thought we might want to trim arr_seq._data at the end of the extend in order to reduce memory consumption. On the other hand, using the default buffer_size=4 would only consume up to 4 MB, which is rather small.
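For illustration, a minimal sketch of that concern (this assumes nibabel's ArraySequence and its private _data attribute; the exact backing shape depends on whether the buffer gets trimmed):

    import numpy as np
    from nibabel.streamlines import ArraySequence

    # One short streamline, fed through a generator so the buffered path is used.
    streamlines = [np.ones((10, 3), dtype=np.float32)]
    arr_seq = ArraySequence(iter(streamlines), buffer_size=1024)  # buffer in MB

    print(len(arr_seq))         # 1 sequence of 10 points
    # Without trimming, the backing array can hold far more rows than the
    # 10 points actually stored.
    print(arr_seq._data.shape)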
OK, I see; I was surprised too :-). It would certainly be good to trim the data. We should create a new PR for that, I suppose.
@matthew-brett can you confirm there is a typo in the logic where we compute the size of the buffer?
Oops, yes, that does look like a typo.
Looks good to me, but I see that Travis is failing. @MarcCote and @matthew-brett?
Ah, I see. The errors are from other nibabel functions, not from this PR.
At least the last one seems to be … I've opened an issue for the DICOM-related failures.
OK, I will look at this one, @effigies. It seems to be a …
@@ -37,7 +37,7 @@ def __init__(self, arr_seq, common_shape, dtype):
         self.common_shape = common_shape
         n_in_row = reduce(mul, common_shape, 1)
         bytes_per_row = n_in_row * dtype.itemsize
-        self.rows_per_buf = bytes_per_row / self.bytes_per_buf
+        self.rows_per_buf = self.bytes_per_buf / bytes_per_row
Should rows_per_buf be an int or a float? If the idea is to load whole rows at a time, then you may want to use // to ensure that you round down.
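For reference, the difference in Python 3 (a generic example, not code from this PR):

    7 / 2    # -> 3.5 (true division, returns a float)
    7 // 2   # -> 3   (floor division, rounds down to an int)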
@@ -318,4 +328,4 @@ def test_concatenate():
     seqs = [seq[:, [i]] for i in range(seq.common_shape[0])]
     new_seq = concatenate(seqs, axis=0)
     assert_true(len(new_seq), seq.common_shape[0] * len(seq))
-    assert_array_equal(new_seq._data, seq._data.T.reshape((-1, 1)))
+    assert_array_equal(new_seq._data, seq._data.T.reshape((-1, 0)))
That's an illegal shape. What shape are you shooting for, here?
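For context, NumPy rejects a zero-width reshape of a non-empty array (a generic example, not from the PR):

    import numpy as np

    a = np.arange(6).reshape(3, 2)
    a.T.reshape((-1, 1))   # fine: all 6 values as a single column
    a.T.reshape((-1, 0))   # raises ValueError: 6 elements cannot fill a 0-column shape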
@@ -37,7 +37,7 @@ def __init__(self, arr_seq, common_shape, dtype):
         self.common_shape = common_shape
         n_in_row = reduce(mul, common_shape, 1)
         bytes_per_row = n_in_row * dtype.itemsize
-        self.rows_per_buf = bytes_per_row / self.bytes_per_buf
+        self.rows_per_buf = int(np.ceil(self.bytes_per_buf / bytes_per_row))
Do you want ceil or floor here?
ceil, because I want the minimal value of rows_per_buf to be 1.
But I guess you will often end up with too many rows that way. How about:

    self.rows_per_buf = max(1, int(self.bytes_per_buf / bytes_per_row))
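To see the difference with concrete (assumed) numbers:

    import numpy as np

    bytes_per_buf = 4 * 1024 ** 2   # default 4 MB buffer
    bytes_per_row = 12              # e.g. 3 float32 coordinates per row

    int(np.ceil(bytes_per_buf / bytes_per_row))   # 349526 -> one row over budget
    max(1, int(bytes_per_buf / bytes_per_row))    # 349525 -> stays within the buffer

    # When a single row exceeds the whole buffer, both still yield at least 1:
    bytes_per_row = 8 * 1024 ** 2
    int(np.ceil(bytes_per_buf / bytes_per_row))   # 1
    max(1, int(bytes_per_buf / bytes_per_row))    # 1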
Thanks, I think your way is more explicit; I will do that.
Okay, I believe I was wrong: DIPY's release does not need to be delayed by this fix. We can simply check Nibabel's version and, if it is not the one with the fix, inherit from ArraySequence in our Streamlines class and override the specific function that has the issue. :) So, a bit of patching will do it. :)
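A rough sketch of that workaround (the fixed nibabel version and the patched method name are assumptions for illustration, not from the PR):

    import nibabel as nib
    from distutils.version import LooseVersion
    from nibabel.streamlines import ArraySequence

    # Assumed: the first nibabel release containing this fix.
    NIB_HAS_FIX = LooseVersion(nib.__version__) >= LooseVersion('2.3')

    class Streamlines(ArraySequence):
        pass

    if not NIB_HAS_FIX:
        # Override only the method with the buffer-size bug, reimplementing
        # it with the corrected rows_per_buf computation.
        def _patched_extend(self, elements):
            ...  # corrected buffering logic would go here

        Streamlines.extend = _patched_extend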
Codecov Report

@@            Coverage Diff             @@
##           master     #597      +/-   ##
==========================================
+ Coverage    93.4%   94.46%   +1.05%
==========================================
  Files         189      177      -12
  Lines       25820    24950     -870
  Branches     2752     2661      -91
==========================================
- Hits        24118    23569     -549
+ Misses       1205      908     -297
+ Partials      497      473      -24
==========================================

Continue to review the full report at Codecov.
@MarcCote - are you OK with this one now? If so, we can merge.
@matthew-brett @effigies Yes, I'm OK with all the changes. Thanks @skoudoro. |
Thanks all for your review!
Hi,

When you initialize ArraySequence with a big generator on Windows, the operation is really slow: 6 hours for 560583 streamlines. Unfortunately, setting buffer_size has no effect. The goal of this PR is to fix that issue; with the fix, the same operation takes me 46 seconds.
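For reference, the scenario looks roughly like this (a sketch; the generator and sizes are made up):

    import numpy as np
    from nibabel.streamlines import ArraySequence

    def gen_streamlines(n, rng=np.random.RandomState(42)):
        # Yield n random streamlines of 10-100 points each.
        for _ in range(n):
            yield rng.rand(rng.randint(10, 100), 3).astype(np.float32)

    # Before this fix, buffer_size was effectively ignored and building from a
    # large generator was extremely slow on Windows; with the fix it is fast.
    seq = ArraySequence(gen_streamlines(1000), buffer_size=256)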
@MarcCote, @Garyfallidis, can you look at this PR? Thanks!