SpykingCircus2 clustering crash at template estimation #3722
I never tested with 4096 channels, so I should have a look. This `estimate_templates()` function preallocates a big matrix of `n_channels` x `n_templates` x `num_time_samples`, so if `n_templates` is large, this could be large. However, assuming you have roughly 5000 templates, it should fit in memory. I'll try some tests, but with 125 GB of RAM it should work smoothly.
Note that `total_memory` will not really help here: it only applies to chunked preprocessing, not to the clustering step. Was this obtained with the main branch? Any idea of the number of templates found? If the recording is not too big, I could have a look; let me know, and meanwhile I'll run some tests with synthetic data.
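To get a feel for the numbers under discussion, the footprint of that preallocated matrix can be sketched with a small helper (the function name and defaults are illustrative, not SpikeInterface API):

```python
def accumulator_bytes(n_workers, n_templates, n_channels, n_samples, itemsize=4):
    """Rough size, in bytes, of the preallocated template accumulator.

    itemsize=4 assumes float32 values; all names here are illustrative.
    """
    return n_workers * n_templates * n_channels * n_samples * itemsize

# ~5000 templates on a 4096-channel probe, 80 samples per template, one worker:
size_gib = accumulator_bytes(1, 5000, 4096, 80) / 2**30  # ~6.1 GiB
```

With a single copy of the array this indeed fits comfortably in 125 GB of RAM; the trouble starts once the array is replicated per worker.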
This is what I had gathered from some issues with computing PCs during quality metrics, with similar memory usage.
Yes, I checked with the latest commits.
With HerdingSpikes I get around 2000 units from that recording.
I have a minute-long recording that's just under 10 GB, would that be OK?
Perfect, please share it with me so that I can troubleshoot.
Here's the link: https://filesender.renater.fr/?s=download&token=25c4ac59-a7f6-45db-8eb5-ca153d154ef2 Then:
The problem, I think, might be the `estimate_templates()` function. Indeed, this function, when used in a parallel context, allocates a giant array of size `(num_workers, num_templates, num_channels, num_samples)`. This might be very large for very high-density arrays. @samuelgarcia @alejoe91, what do you think?
It looks like the giant array is indeed the problem. The sorter found 3768 raw units before cleaning, so that's 3768 clusters x 40 cores x 4096 channels x 80 samples x 4 bytes for float32, which means about 184 GB for that array, on top of what's already loaded in memory. I ran it again on a single core: no more OOM errors, but it took almost 15 hours to process a minute-long recording. I do get a bunch of […]. If there could be some functional equivalent of […]
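The arithmetic above checks out and can be verified directly (pure Python, no SpikeInterface needed):

```python
# 3768 clusters x 40 workers x 4096 channels x 80 samples x 4 bytes (float32)
n_bytes = 3768 * 40 * 4096 * 80 * 4
size_gib = n_bytes / 2**30  # ~184 GiB, far beyond 125 GB of RAM
```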
Yes, I'm working on a patch that tries to bypass this step and thus avoid this memory bottleneck. When you say you ran it on a single core, do you mean the whole software or just this step? Only the template estimation needs to be launched with a reduced number of cores to avoid memory saturation. That could also be an option (use max cores for all steps except here, reduced as a function of memory).
For now I just set the […]
I'm also getting a bunch of `scipy.optimize` warnings.
The `scipy.optimize` warnings should not be a major deal, and I'll make a PR to automatically adjust the number of workers during template estimation, before something more robust and stable.
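In the meantime, if those warnings clutter the logs, they can be silenced locally with the standard library; a minimal sketch (in practice one would filter on `scipy.optimize.OptimizeWarning` around the fitting call rather than ignoring everything):

```python
import warnings

# Suppress warnings only inside the noisy call, not globally.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    warnings.warn("Covariance of the parameters could not be estimated")  # silenced
```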
I've tried a patch in the branch total_duration. See #3721
I've checked, and the patch is working for now, keeping the speed. Please use it while I keep working on a deeper patch.
However, I must say that while it seems to run, it is not at the speed of light. I'll keep your 1 min long file as an example to optimize everything.
I checked it out; I'll let it run overnight to be sure, but it has already passed the point where it used to crash. If shared access to a single array is too tricky, I'd love to see the same system applied to some of the quality-metric calculations, especially the PCA-based ones, as they also tend to preallocate big arrays from what I can tell. On an unrelated note, as I understand it SC2 creates a sorting analyzer in its final steps to do some curation. Would it make sense to be able to save that analyzer to disk without everything else saved by the […]
For concurrent writes to a shared array, we'll have to wait for @samuelgarcia's input. But what I did here is a simple hack: before `estimate_templates`, I adapt the number of jobs as a function of the RAM requested and available. I agree that maybe this should be a generic option and go directly into the `estimate_templates()` function, so that all sub-functions using it benefit from it and avoid crashing (maybe PCA, but not only). We'll discuss that, if there are no better options to make this function robust w.r.t. sizes / number of channels. Yes, SC2 creates an analyzer internally, and indeed I could save it with a special option (could be `debug` or something else). However, the API of `run_sorter` currently returns a sorting, so it will be up to the user to go into the created folder to grab this analyzer. But this is a good idea. The analyzer also won't have waveforms, but again, this can be added if users need it. I'll think about it and make a PR.
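The hack described here amounts to clamping the worker count so the per-worker accumulators still fit in RAM. A hypothetical sketch of the idea (the actual `get_optimal_n_jobs` in the PR may differ):

```python
import math

def optimal_n_jobs(requested_n_jobs, bytes_per_worker, available_bytes):
    # Largest worker count whose accumulators still fit in available RAM,
    # never exceeding what the user asked for, never below 1.
    fitting = max(1, math.floor(available_bytes / bytes_per_worker))
    return min(requested_n_jobs, fitting)

# 40 requested workers, ~4.6 GiB of accumulator per worker, 125 GB of RAM:
optimal_n_jobs(40, 3768 * 4096 * 80 * 4, 125 * 10**9)  # -> 25 workers
```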
Seems like there's another hangup further down the line:
I'll try to trace it down properly, but my guess is maybe […]
Agreed, this is partly what I had mentioned in #3525: it looks like a lot of sorters already compute things that could be useful for downstream analysis, only for them to be discarded after sorting and then recomputed. Having the ability to get an analyzer with some preloaded extensions could save some time.
Yes, I'll keep debugging and making the software work for such a number of channels. At least, in this PR, I've also reduced the memory footprint of the SVD, and now, as you requested, the final analyzer is saved.
I tried again with the newer version of the PR, but I'm having some trouble replicating what I had before: if I pass the number of cores directly to the sorter with […]. If I pass the […]
and still pass the arguments to the sorter, parallelization happens but I still get another crash down the road. As far as I can tell, […]. There might be a situation where the amount of available memory is so low that the patch only selects a single core, but as far as I know not all steps of the sorting pipeline make use of that adjustment.
It looks like […]
I think I found the source of the issue: in […], if those weren't set previously, it reverts to the default value of a single core being used. It would probably need something like: […]
Or […]
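Since the exact snippet didn't survive in the thread, here is a hypothetical sketch of the kind of fallback being suggested: merge the sorter-level `job_kwargs` with the global defaults before reading `n_jobs`, instead of falling back to a hard-coded single worker (SpikeInterface's `fix_job_kwargs` helper plays a similar role; all names below are illustrative):

```python
def resolve_job_kwargs(local_job_kwargs, global_job_kwargs):
    # Start from the global settings, then let explicitly passed
    # local values take precedence over them.
    resolved = dict(global_job_kwargs)
    resolved.update(local_job_kwargs)
    return resolved

resolve_job_kwargs({"n_jobs": 32}, {"n_jobs": 1, "chunk_duration": "1s"})
# -> {'n_jobs': 32, 'chunk_duration': '1s'}
```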
Indeed, sorry for that. But this branch will still not work on your data. I'll try to finish one with a new clustering that avoids the cleaning of the dict and the re-estimation of the templates from the data; this should make it work for your large arrays. I'm on it.
At least, I've brought back `get_optimal_n_jobs` (sorry for the mistake) and your fix for `set_optimal_chunk_size`. Thanks a lot! Let's make the code work (quickly) on 4225 channels!
I'm trying to run SC2 on 3Brain HD-MEA data (4096 channels, 20 kHz) with mostly default parameters:
The only trace I get is:
I've been able to trace the crash back to a call to `estimate_templates` (here), which then seems to call `estimate_templates_with_accumulator`. From what I could gather this looks like an out-of-memory error, but I've never seen anything quite like this with other OOM Python issues.
The GUI monitor shows a modest 17×10⁶ TB being used:
And `dmesg` shows the following:

I'm not sure if this is normal behavior and there are simply too many redundant units, or if there's an actual issue with memory handling.
I've tried passing `total_memory` to both the sorter's `job_kwargs` and SpikeInterface's global `job_kwargs`, but I'm not sure it's taken into account when not dealing with the recording itself.
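For what it's worth, `total_memory` only fixes the size of the chunks used when traversing the recording; a sketch of that idea (not SpikeInterface's exact implementation) shows why it cannot help with arrays preallocated outside the chunked traversal, such as the clustering accumulators:

```python
def chunk_samples(total_memory_bytes, n_jobs, n_channels, itemsize=4):
    # Number of samples per chunk such that n_jobs concurrent chunks
    # stay within the requested memory budget (float32 traces assumed).
    bytes_per_sample = n_jobs * n_channels * itemsize
    return total_memory_bytes // bytes_per_sample

# 8 GB budget, 40 workers, 4096 channels:
chunk_samples(8 * 10**9, 40, 4096)  # -> 12207 samples per chunk
```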