
Conversation

@h-mayorquin (Contributor) commented Sep 8, 2025:

See #1770

The summary of the gaps looks like this:
[image: table summarizing the detected gaps]

@h-mayorquin changed the title from "blackrock improved message" to "Blackrock add summary of automatic data segmentation" on Sep 8, 2025
# read nsx headers
nsx_header_reader = self._nsx_header_reader[spec_version]
self._nsx_basic_header[nsx_nb], self._nsx_ext_header[nsx_nb] = nsx_header_reader(nsx_nb)

@h-mayorquin (Contributor Author), Sep 8, 2025:

This is to better document how the sampling frequency of each stream is calculated according to the spec.

("timestamps", "uint64"),
("num_data_points", "uint32"),
("samples", "int16", self._nsx_basic_header[nsx_nb]["channel_count"]),
("samples", "int16", (self._nsx_basic_header[nsx_nb]["channel_count"],)),
@h-mayorquin (Contributor Author):

This removes a warning from numpy; raw numbers as shapes are deprecated and will be removed:

FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  npackets = int((filesize - offset) / np.dtype(ptp_dt).itemsize)
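A quick sketch of the tuple-shape form that avoids the FutureWarning (the channel count of 4 here is hypothetical; the real value comes from the NSx basic header):

```python
import numpy as np

# Structured dtype for one NSx data packet; the sub-array shape is
# given as a tuple (channel_count,) rather than a deprecated bare integer.
channel_count = 4  # hypothetical; read from the NSx header in practice
ptp_dt = np.dtype([
    ("timestamps", "uint64"),
    ("num_data_points", "uint32"),
    ("samples", "int16", (channel_count,)),
])

packets = np.zeros(3, dtype=ptp_dt)
print(packets["samples"].shape)  # (3, 4)
```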

Contributor:

And based on our testing this change is also backward compatible or should we test this over a greater range of numpy versions than our current skinny IO tests?

@h-mayorquin h-mayorquin marked this pull request as ready for review September 9, 2025 03:00
@zm711 (Contributor) left a comment:

A few initial comments/questions.

Comment on lines +336 to +337
# E.g. it is 1 for 30_000, 3 for 10_000, etc
nsx_period = self._nsx_basic_header[nsx_nb]["period"]
Contributor:
Is this 30_000 true for all versions? I vaguely remember that different versions had different resolutions, but maybe I'm wrong?

And it seems like this was hardcoded the same way below before.

@h-mayorquin (Contributor Author):

Yes, that's how the spec defines it, and the history of the spec does not mention changes to that value.

Contributor:

Sounds good to me. I just couldn't remember.

Contributor:

Can confirm: the nominal base rate has always been 30 kHz, and this is the rate used by sample groups 5 and 6. Other sample groups are just sub-sampling this base rate (hopefully with anti-aliasing built in).
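The relationship discussed above can be sketched as follows (the period values are illustrative examples of common NSx streams, matching the `# E.g. it is 1 for 30_000, 3 for 10_000` comment in the diff):

```python
# The NSx header's 'period' field is expressed in units of 1/30_000 s,
# so each stream's sampling frequency is the 30 kHz base rate divided by it.
BASE_SAMPLE_RATE = 30_000

for period in (1, 3, 30):
    sampling_frequency = BASE_SAMPLE_RATE / period
    print(period, sampling_frequency)  # 1 -> 30000.0, 3 -> 10000.0, 30 -> 1000.0
```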

else:
t_start = timestamps / ts_res
t_stop = max(t_stop, t_start + length / self.sig_sampling_rates[nsx_nb])
t_stop = max(t_stop, t_start + length / self._nsx_sampling_frequency[nsx_nb])
Contributor:

Can you explain to me why the length would lead to being longer than the t_stop itself? Maybe I need to read the whole reader to understand this, but if there is a two-second explanation that would be great!

@h-mayorquin (Contributor Author):

I think this is overly defensive programming to ensure that t_stop is at least 0; check the whole block:

                t_stop = 0.0
                for nsx_nb in self.nsx_to_load:
                    spec = self._nsx_spec[nsx_nb]
                    if "timestamp_resolution" in self._nsx_basic_header[nsx_nb].dtype.names:
                        ts_res = self._nsx_basic_header[nsx_nb]["timestamp_resolution"]
                    elif spec == "2.1":
                        ts_res = self._nsx_params[spec](nsx_nb)["timestamp_resolution"]
                    else:
                        ts_res = 30_000
                    period = self._nsx_basic_header[nsx_nb]["period"]
                    sec_per_samp = period / 30_000  # Maybe 30_000 should be ['sample_resolution']
                    length = self.nsx_datas[nsx_nb][data_bl].shape[0]
                    if self._nsx_data_header[nsx_nb] is None:
                        t_start = 0.0
                        t_stop = max(t_stop, length / self._nsx_sampling_frequency[nsx_nb])
                    else:
                        timestamps = self._nsx_data_header[nsx_nb][data_bl]["timestamp"]
                        if hasattr(timestamps, "size") and timestamps.size == length:
                            # FileSpec 3.0 with PTP -- use the per-sample timestamps
                            t_start = timestamps[0] / ts_res
                            t_stop = max(t_stop, timestamps[-1] / ts_res + sec_per_samp)
                        else:
                            t_start = timestamps / ts_res
                            t_stop = max(t_stop, t_start + length / self._nsx_sampling_frequency[nsx_nb])

Writing 0 directly would have been more readable than defining the variable, though.

Contributor:

Yeah, I agree, because we define t_stop in all branches, right? So we don't need to guard with an initial t_stop. That was definitely my confusion. We could always update that later; it's not crucial.

@h-mayorquin (Contributor Author):

Let's do it now; we both already read and understood it, and we think this could be improved.

@h-mayorquin (Contributor Author):

Forget about it: the loop above is actually aligning the t_stops across the nsx segments.

So, yes, let's improve this later.
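The cross-stream accumulation being discussed can be reduced to a toy sketch (stream names, lengths, and rates here are hypothetical):

```python
# t_stop starts at 0.0 and max() accumulates the latest end time across
# the loaded NSx streams, so the segment's t_stop covers all of them.
streams = {
    "ns2": (1_500, 1_000.0),    # (n_samples, sampling_rate): ends at 1.5 s
    "ns6": (30_000, 30_000.0),  # ends at 1.0 s
}

t_stop = 0.0
for name, (length, sampling_rate) in streams.items():
    t_stop = max(t_stop, length / sampling_rate)

print(t_stop)  # 1.5, set by the longer-running ns2 stream
```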


# We convert these indices to actual timestamps in seconds
raw_timestamps = struct_arr["timestamps"]
timestamps_sampling_rate = self._nsx_basic_header[nsx_nb]["timestamp_resolution"] # clocks per sec uint64 or uint32
timestamps_in_seconds = raw_timestamps / timestamps_sampling_rate
Contributor:

Could this division lead to any issues? uint/float should just be float, right?

@h-mayorquin (Contributor Author):

Good point, let me double-check the casting rules to see if nasty surprises could appear.

@h-mayorquin (Contributor Author):

Yes, I think the unsigned integers will be cast to float before dividing, so I don't expect anything here. Can you think of something?

Contributor:

No, I didn't have anything in mind, but since we've dealt with so many random overflows lately, especially on the Neo side with the changes in numpy 2.0, I'm just scared of a blind spot in array math :/

@h-mayorquin (Contributor Author):

Good, yes, I think this is safe.
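The casting behavior in question can be checked directly; under numpy's promotion rules, true division of integer arrays always produces float64, so no unsigned wraparound can occur in the division itself:

```python
import numpy as np

# Even large uint64 timestamps are cast to float64 by true division.
raw_timestamps = np.array([0, 30_000, 2**63], dtype=np.uint64)
timestamp_resolution = np.uint32(30_000)  # clocks per second

timestamps_in_seconds = raw_timestamps / timestamp_resolution
print(timestamps_in_seconds.dtype)  # float64
print(timestamps_in_seconds[1])     # 1.0
```

Note that float64 only has 53 bits of mantissa, so timestamps above 2**53 lose integer precision after the cast, but that is a precision question rather than an overflow.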

segmentation_report_message += "+-----------------+-----------------------+-----------------------+\n"
warnings.warn(segmentation_report_message)

for seg_index, seg_start_index in enumerate(segment_start_indices):
Contributor:

This is a little unclear: why does the naming repeat "index"? Could we make this naming a little clearer for the future :)

@h-mayorquin (Contributor Author):

I agree this could be improved.

@h-mayorquin (Contributor Author):

Improved.

@h-mayorquin (Contributor Author):

> And based on our testing this change is also backward compatible or should we test this over a greater range of numpy versions than our current skinny IO tests?

I think that shapes should have always been specified as tuples and using a scalar was the odd case that they are now banning. We can check if that notation works for the smallest version of numpy that we support (which is?) if you want to be extra careful.

@zm711 (Contributor) commented Sep 10, 2025:

> And based on our testing this change is also backward compatible or should we test this over a greater range of numpy versions than our current skinny IO tests?
>
> I think that shapes should have always been specified as tuples and using a scalar was the odd case that they are now banning. We can check if that notation works for the smallest version of numpy that we support (which is?) if you want to be extra careful.

It's probably fine. I'm just being overly cautious. We go down to 1.24, but we only test on 1.26 for the IOs. You're probably right that we were doing it incorrectly before. So switching to correct shouldn't lead to problems.

@h-mayorquin (Contributor Author):

> And based on our testing this change is also backward compatible or should we test this over a greater range of numpy versions than our current skinny IO tests?
>
> I think that shapes should have always been specified as tuples and using a scalar was the odd case that they are now banning. We can check if that notation works for the smallest version of numpy that we support (which is?) if you want to be extra careful.
>
> It's probably fine. I'm just being overly cautious. We go down to 1.24, but we only test on 1.26 for the IOs. You're probably right that we were doing it incorrectly before. So switching to correct shouldn't lead to problems.

Ok.

Comment on lines 1057 to 1058
# This is read as an uint32 numpy scalar from the header so we transform it to python int
header_size = offset or int(self._nsx_basic_header[nsx_nb]["bytes_in_headers"])
Contributor:

Didn't we get into this discussion last time, where this fails in the case that an offset of 0 is given? :) Or did you change more logic somewhere else?

@h-mayorquin (Contributor Author):

No, I made the same mistake, dammit.

@h-mayorquin (Contributor Author):

Changed.
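The pitfall under discussion reduces to a small sketch (the helper names and the default of 512 are hypothetical; the real value comes from `bytes_in_headers`):

```python
# `offset or default` falls back to the default whenever offset is falsy,
# which wrongly swallows a legitimate offset of 0.
def header_size_buggy(offset, bytes_in_headers=512):
    return offset or bytes_in_headers

# Testing explicitly against None keeps 0 as a valid offset.
def header_size_fixed(offset, bytes_in_headers=512):
    return offset if offset is not None else bytes_in_headers

print(header_size_buggy(0))    # 512 (wrong: the explicit 0 offset is ignored)
print(header_size_fixed(0))    # 0
print(header_size_fixed(None)) # 512
```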

@zm711 (Contributor) commented Sep 11, 2025:

@cboulay, if you have a moment you can feel free to read through this. I know you're plugged into a couple issues we are working through with blackrock, so I figure it's better to keep you in the loop. My plan is to get this merged by tomorrow as it is mostly helping us debug time gap issues for additional future work.

@zm711 (Contributor) commented Sep 14, 2025:

Small delay, but this is on my list to do final review asap.

@zm711 (Contributor) commented Sep 15, 2025:

After tests I will merge.

@zm711 zm711 merged commit 72ec76a into NeuralEnsemble:master Sep 15, 2025
3 checks passed
@zm711 zm711 added this to the 0.14.3 milestone Sep 15, 2025
@h-mayorquin h-mayorquin deleted the add_informative_message_to_blackrock branch September 15, 2025 18:53