Fix resampling of gapped recordings.#4499
Fix resampling of gapped recordings.#4499grahamfindlay wants to merge 2 commits intoSpikeInterface:mainfrom
Conversation
h-mayorquin
left a comment
There was a problem hiding this comment.
This makes sense to me. I can't still fully wrap my head around exactly how resampling goes bad at the gaps but of course we should account for them. Thanks as wel for taking a look at the neo implmeentation regarding gaps it is great to unify the API.
Some comments on the implementation:
-
Pre-allocate memory instead of concatenating. In
_get_traces_gapped, you know the output size upfront (end_frame - start_frame). Pre-allocating a single array and filling slices would avoid the intermediate allocations from appending topiecesand the finalnp.concatenate. This is the pattern we use elsewhere, e.g. inNoiseGeneratorRecording. -
Unify the code paths. Rather than having
get_tracesdispatch to_get_traces_gapped, could_get_traces_gappedjust become the singleget_traces? A gap-free recording is simply one section spanning the entire trace, so the gapped logic handles it naturally with one iteration of the loop. This would eliminate the duplication between the two methods. -
Reuse
get_chunk_with_margin. The margin clamping, fetching, and reflect-padding logic in_get_traces_gappedreimplements whatget_chunk_with_marginalready does. The difference is that you clamp to section boundaries instead of recording boundaries, but I think this could be handled by passing the section length as the effective boundary. Worth exploring to avoid duplicating that logic. -
Extract gap detection into a shared utility. The gap detection and section boundary computation (~30 lines in
__init__) could live as a standalone function inrecording_tools.py, something likedetect_sections(time_vector, sampling_frequency, gap_tolerance_ms)that returns the(n_sections, 2)boundary array. This would make the__init__easier to read and the utility reusable if other preprocessing steps that use filters (e.g.,FilterRecording) need gap awareness later. What do you think? -
Tests. The coverage is solid. A couple of minor notes: there's no test for integer ratio with gaps on traces (the trace-correctness tests all use non-integer ratio), and
test_resample_traces_across_gapaccesses private attributes (_sec_n_out) to find the split point between sections. You could derive that from the output time vector instead (find the gap in resampled timestamps), which would decouple the test from the internal representation.
All of these are strict improvements and I think the current implementation is good to go as-is. We can track these as a follow-up issue. The only one that I kind of feel should be done here is the pre-allocation (1) and maybe the tests (5) but I will leave that to @alejoe91 and you to decide.
This handles an edge case that I should have thought of in #4429 : When a recording containing gaps is resampled, interpolation of timestamps and data should never span a gap. They should be done on a section-by-section basis, where "section" = "a gapless (contiguously sampled) stretch of data" (I chose the term "section" to avoid confusion with
neo/spikeinterface"blocks" or "segments", which are are not required to be gapless).This PR does that, and introduces a new
gap_tolerance_msparameter toresample()that controls the behavior:I tried to make this analogous to and consistent with the
gap_tolerance_msparameter inneo, discussed [here] (NeuralEnsemble/python-neo#1773) by @h-mayorquin and @samuelgarcia and implemented in this PR. So the behavior for the following values ofgap_tolerance_msis:None(default): If timestamp gaps are detected in the parent timevector, a
ValueErroris raised with a report of the gaps (count, sizes,positions).
<value>(e.g.,1.0): Opt in to section-wise resampling. Gapslarger than the threshold trigger section splitting; smaller gaps are
treated as continuous.
0.0: Strict mode — split on any gap >= 1.5 sample periods.Internally, gap detection identifies contiguous sections of samples.
Each section is resampled independently --
both timestamps and traces -- then results are concatenated. Margins for
edge-artifact reduction are clamped to section boundaries so that FFT
processing never crosses a gap. This is the most obnoxious part. I really wanted to find a way to let
scipy.signalhandle reflect padding in this case, but I could not, so I do it, and that adds a fair bit of complexity to the code.When no gaps are detected (eg if the recording has no time vector), the existing code path executes unchanged.
I tested this on a bunch of my own gapped recordings, and it seems to work. Then I had my buddy Claude ;) write a bunch of tests (included here) based on those, using synthetic gapped recordings.
Because of the antialiasing filter applied during
decimate(), a similar approach should technically be taken there (so the AA filter applied to one section isn't influenced by data from a neighboring section), but the scope of this PR is just limited toresample(). Errors introduced by gaps indecimate()should be small and mostly negligible.