Blackrock add summary of automatic data segmentation #1769
Conversation
```python
# read nsx headers
nsx_header_reader = self._nsx_header_reader[spec_version]
self._nsx_basic_header[nsx_nb], self._nsx_ext_header[nsx_nb] = nsx_header_reader(nsx_nb)
```
This is to better document how the sampling frequency of each stream is calculated according to the spec.
```diff
 ("timestamps", "uint64"),
 ("num_data_points", "uint32"),
-("samples", "int16", self._nsx_basic_header[nsx_nb]["channel_count"]),
+("samples", "int16", (self._nsx_basic_header[nsx_nb]["channel_count"],)),
```
This removes a warning from numpy; raw numbers as shapes are deprecated and will be removed:

```
FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  npackets = int((filesize - offset) / np.dtype(ptp_dt).itemsize)
```
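For reference, a minimal sketch of the tuple-shape form discussed here (field names mirror the snippet above; `channel_count` is a stand-in value, not read from a real header):

```python
import numpy as np

channel_count = 4  # stand-in for self._nsx_basic_header[nsx_nb]["channel_count"]

# Using a tuple for the sub-array shape avoids the numpy FutureWarning
# that a bare integer shape triggers.
ptp_dt = np.dtype([
    ("timestamps", "uint64"),
    ("num_data_points", "uint32"),
    ("samples", "int16", (channel_count,)),
])

# The itemsize is what the packet-count calculation divides by:
# 8 bytes (uint64) + 4 bytes (uint32) + channel_count * 2 bytes (int16).
assert ptp_dt["samples"].shape == (channel_count,)
assert ptp_dt.itemsize == 8 + 4 + channel_count * 2
```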
And based on our testing, is this change also backward compatible, or should we test it over a greater range of numpy versions than our current skinny IO tests cover?
zm711 left a comment
A few initial comments/questions.
```python
# E.g. it is 1 for 30_000, 3 for 10_000, etc
nsx_period = self._nsx_basic_header[nsx_nb]["period"]
```
Is this 30_000 true for all versions? I vaguely remember that different versions had different resolutions, but maybe I'm wrong?
And it seems like this was hardcoded the same way below before.
Yes, that's how the spec defines it, and the history of the spec does not mention changes to that value.
Sounds good to me. I just couldn't remember.
Can confirm, the nominal base rate has always been 30 kHz, and this is the rate used by sample groups 5 and 6. Other sample groups are just sub-sampling this base rate (hopefully with anti-aliasing built in).
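The relationship described in this thread can be sketched as follows (a hedged illustration, not the actual reader code; `period` stands for the value read from the nsx basic header, and 30 kHz is the nominal base rate confirmed above):

```python
NOMINAL_BASE_RATE_HZ = 30_000  # per the spec, constant across file versions

def nsx_sampling_frequency(period: int) -> float:
    """Sampling frequency of a stream given its header 'period' field.

    The 'period' is the sub-sampling divisor of the 30 kHz base rate.
    """
    return NOMINAL_BASE_RATE_HZ / period

# period 1 -> 30 kHz, period 3 -> 10 kHz, period 30 -> 1 kHz
assert nsx_sampling_frequency(1) == 30_000
assert nsx_sampling_frequency(3) == 10_000
assert nsx_sampling_frequency(30) == 1_000
```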
```diff
 else:
     t_start = timestamps / ts_res
-    t_stop = max(t_stop, t_start + length / self.sig_sampling_rates[nsx_nb])
+    t_stop = max(t_stop, t_start + length / self._nsx_sampling_frequency[nsx_nb])
```
Can someone explain why the length would lead to a value larger than t_stop itself? Maybe I need to read the whole reader to understand this, but if there is a two-second explanation that would be great!
I think this is overly-defensive programming to say that t_stop should be larger than 0; check the whole block:

```python
t_stop = 0.0
for nsx_nb in self.nsx_to_load:
    spec = self._nsx_spec[nsx_nb]
    if "timestamp_resolution" in self._nsx_basic_header[nsx_nb].dtype.names:
        ts_res = self._nsx_basic_header[nsx_nb]["timestamp_resolution"]
    elif spec == "2.1":
        ts_res = self._nsx_params[spec](nsx_nb)["timestamp_resolution"]
    else:
        ts_res = 30_000
    period = self._nsx_basic_header[nsx_nb]["period"]
    sec_per_samp = period / 30_000  # Maybe 30_000 should be ['sample_resolution']
    length = self.nsx_datas[nsx_nb][data_bl].shape[0]
    if self._nsx_data_header[nsx_nb] is None:
        t_start = 0.0
        t_stop = max(t_stop, length / self._nsx_sampling_frequency[nsx_nb])
    else:
        timestamps = self._nsx_data_header[nsx_nb][data_bl]["timestamp"]
        if hasattr(timestamps, "size") and timestamps.size == length:
            # FileSpec 3.0 with PTP -- use the per-sample timestamps
            t_start = timestamps[0] / ts_res
            t_stop = max(t_stop, timestamps[-1] / ts_res + sec_per_samp)
        else:
            t_start = timestamps / ts_res
            t_stop = max(t_stop, t_start + length / self._nsx_sampling_frequency[nsx_nb])
```

Writing 0 directly would have been more readable than defining the variable, though.
Yeah, I agree, because we define t_stop in all branches, right? So we don't need to guard with an initial t_stop. That was definitely my confusion. We could always update that later; it's not crucial.
Let's do it now; we have both already read and understood it, and we think it could be improved.
Forget about it: the loop above is in fact aligning the t_stops across the nsx segments.
So, yes, let's improve this later.
```python
# We convert these indices to actual timestamps in seconds
raw_timestamps = struct_arr["timestamps"]
timestamps_sampling_rate = self._nsx_basic_header[nsx_nb]["timestamp_resolution"]  # clocks per sec, uint64 or uint32
timestamps_in_seconds = raw_timestamps / timestamps_sampling_rate
```
Could this division lead to any issues? uint / float should just be float, right?
Good point, let me double-check the casting rules to see if nasty surprises could appear.
Yes, I think the unsigned integers will be cast to float before dividing, so I don't expect anything here. Can you think of something?
No, I didn't have anything in mind, but since we've dealt with so many random overflows lately, especially on the Neo side with the changes in numpy 2.0, I'm just scared of a blind spot in array math :/
Good, yes, I think this is safe
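As a standalone sanity check of the casting behavior discussed here (the values are made up): true division of an unsigned-integer array by a Python int always yields float64, so integer overflow cannot occur, though uint64 values above 2**53 would lose precision in the cast.

```python
import numpy as np

raw_timestamps = np.array([0, 30_000, 90_000], dtype=np.uint64)
timestamp_resolution = 30_000  # clocks per second

# True division promotes uint64 to float64; no integer overflow is possible.
timestamps_in_seconds = raw_timestamps / timestamp_resolution
assert timestamps_in_seconds.dtype == np.float64
assert timestamps_in_seconds.tolist() == [0.0, 1.0, 3.0]

# Caveat: float64 has a 53-bit mantissa, so uint64 values above 2**53
# are no longer exactly representable after the cast.
assert float(2**53 + 1) == float(2**53)  # precision loss above 2**53
```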
neo/rawio/blackrockrawio.py (Outdated)

```python
segmentation_report_message += "+-----------------+-----------------------+-----------------------+\n"
warnings.warn(segmentation_report_message)

for seg_index, seg_start_index in enumerate(segment_start_indices):
```
This is a little unclear: why does the name repeat "index"? Could we make this naming a little clearer for the future :)
I agree this could be improved.
Improved.
I think that shapes should always have been specified as tuples, and using a scalar was the odd case that they are now banning. We can check whether that notation works for the smallest version of numpy that we support (which is?) if you want to be extra careful.
It's probably fine. I'm just being overly cautious. We go down to 1.24, but we only test on 1.26 for the IOs. You're probably right that we were doing it incorrectly before, so switching to the correct form shouldn't lead to problems.
Ok.
neo/rawio/blackrockrawio.py (Outdated)

```python
# This is read as a uint32 numpy scalar from the header so we transform it to a python int
header_size = offset or int(self._nsx_basic_header[nsx_nb]["bytes_in_headers"])
```
Didn't we establish in this discussion last time that this fails when an offset of 0 is given :) Or did you change more logic somewhere else?
No, I made the same mistake, dammit.
Changed.
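The pitfall being discussed can be demonstrated in isolation (the helper names here are hypothetical, not the actual patch): Python's `or` treats 0 as falsy, so an explicit offset of 0 silently falls through to the header value, whereas testing against `None` respects it.

```python
def header_size_buggy(offset, bytes_in_headers):
    # Bug: `or` falls through whenever offset is falsy, including 0,
    # not only when it is None.
    return offset or int(bytes_in_headers)

def header_size_fixed(offset, bytes_in_headers):
    # Only fall back to the header value when no offset was given.
    return int(bytes_in_headers) if offset is None else offset

assert header_size_buggy(None, 288) == 288  # intended fallback
assert header_size_buggy(0, 288) == 288     # bug: explicit 0 is ignored
assert header_size_fixed(0, 288) == 0       # explicit 0 is respected
assert header_size_fixed(None, 288) == 288  # fallback still works
```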
@cboulay, if you have a moment, feel free to read through this. I know you're plugged into a couple of issues we are working through with Blackrock, so I figure it's better to keep you in the loop. My plan is to get this merged by tomorrow, as it is mostly helping us debug time-gap issues for additional future work.

Small delay, but this is on my list to do a final review asap.

After tests pass, I will merge.
See #1770
The summary of the gaps looks like this:
