[NEST-3.0] crashes when built with GLIBCXX_ASSERTIONS defined #2101
Comments
Disabling our build flags prevents the crash, so it's a downstream build issue. I'll isolate the flag causing the break and report back. |
GLIBCXX_ASSERTIONS
defined
More information on the flag here: https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_macros.html For the time being, I've removed this flag from the Fedora build. If possible, it'll be good to include this flag in the list of default flags that NEST is built with (at least in CI, if not in the released binaries). |
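To check whether a given build actually has the flag active, a translation unit can test for the macro directly. The following is a minimal sketch, not code from NEST; the function name `libstdcxx_assertions_enabled` is hypothetical. It relies only on the macro `_GLIBCXX_ASSERTIONS` (with a leading underscore) described in the libstdc++ manual linked above.

```cpp
// Hypothetical helper, not part of NEST.
// Reports at compile time whether this translation unit was built with
// libstdc++'s lightweight assertions enabled, i.e. with
// -D_GLIBCXX_ASSERTIONS (note the leading underscore) on the command line.
constexpr bool libstdcxx_assertions_enabled()
{
#ifdef _GLIBCXX_ASSERTIONS
  return true;
#else
  return false;
#endif
}
```

Printing this value once at startup of a test binary would be one way to confirm that a CI build really uses the flag, assuming the build system passes it to every translation unit.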
With the examples nest-simulator/pynest/pynestkernel.pyx Line 623 in 8b6a34c
It fails because the vector is empty, so getting a pointer to the first element is impossible. I assume that because the size of the vector is zero, memcpy immediately returns and therefore avoids a segfault when not compiled with the extra assertions.
However, building with I'll try to look into the problem and see if we can add |
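The failure mode described above can be illustrated in isolation. This is a sketch under assumptions, not the code from pynestkernel.pyx: the element type `long` and the helper names are hypothetical. The point is that `&v[0]` evaluates `v[0]` first, which is the out-of-range access that the extra assertions reject on an empty vector, while `std::vector::data()` never indexes an element and is valid to call on an empty vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers, not part of NEST.

// Returns a pointer to the vector's storage without indexing any element.
// On an empty vector the result may or may not be null and must not be
// dereferenced, but obtaining it does not trigger a bounds assertion.
inline const long* storage_ptr(const std::vector<long>& v)
{
  return v.data();
}

// Number of bytes a copy of the whole vector would transfer;
// zero for an empty vector.
inline std::size_t byte_count(const std::vector<long>& v)
{
  return v.size() * sizeof(long);
}
```

For a non-empty vector, `storage_ptr(v)` and `&v[0]` give the same address, so switching to `data()` would not change behaviour in the non-empty case.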
@hakonsbm Any news on this? @sanjayankur31 Is it really |
@heplesser I'm still planning to look into it. |
There's a leading underscore: These are the default compiler flags that all Fedora Linux packages are built with (for an x86_64 system):
|
Since I also use Fedora, tried with
Resulted in some installcheck errors:
Find attached the installcheck log file: |
The failing tests (unittests, mpitests) point to real problems which we need to investigate (see installcheck.log file uploaded by Nilton). Strangely, the pytests did not run, why test_blockvector is shown in the report above I do not quite understand. The output at the very end of the installcheck is very strange. Nilton, could you post the "test setup summary" that is printed at the beginning of the "make installcheck" run, before the actual tests start? @terhorstd @jougs @hakonsbm I added this to MS 3.2 to not delay 3.1, but it would still be good if we got it fixed asap. |
Just a note: there are container images for Fedora available if someone needs a Fedora like installation to debug this. (One can install necessary packages etc. using |
@heplesser Pytests didn't run because the |
I have now installed pytest-timeout. I have also checked the unittest failures and fixed some of them. I will open a PR shortly. Here is the installcheck summary:
And attached is the installcheck log. |
@niltonlk Thanks for your efforts, I am looking forward to your PR! Interestingly, you have only 498 pynest tests (vs 663 otherwise) and none of the 166 "pynesttest mpi 2". |
Just noticed that I forgot to allow oversubscribe... will redo the installcheck and report back. |
The number of pynest tests seems to be related to the Without
|
The raw Python test output in the installcheck log is quite garbled, since we run tests in parallel. To get a more readable log, change nest-simulator/testsuite/do_tests.sh Line 474 in 98b2fb4
The two-process MPI Python tests seem to not report because the last of them seems to crash. You could try to add nest-simulator/testsuite/do_tests.sh Line 487 in 98b2fb4
but I have not tested if pytest-xdist works properly when run under BTW, you can also add the |
There was an error in the conditional expression in pulsepacket_generator. Fixing it, corrected
|
@niltonlk That is good news, although I am surprised that tests which failed before with the assertions activated, and which are entirely unrelated to the pp-generator, now pass as well. The two tests that still fail do so because of timeouts. I find this a bit curious, since the timeouts are set to 120s and the total runtime for the pynesttest is only 178s. Since the tests run concurrently, this is not entirely implausible. Could you try to run those two tests directly with pytest and see how long they take? |
Looking at the log, both failures seem to be due to
In addition, the above-mentioned examples now work!
|
With numprocesses=1 all tests pass:
|
Nice! It might be that the pytest time-out mechanism and the concurrent testing mechanism do not interact entirely well. What I find most surprising is that some of the cpptests for blockvector failed originally and no longer fail now. Those tests should be completely independent of the fixes you made in #2159. |
My mistake. The last results were without the -D_GLIBCXX_ASSERTIONS flag. I picked the wrong cmake command from history. Sorry for the mess. |
Now I have Surprisingly, the number of reported tests differs if pytest uses 1 process (numprocesses=1) or all available processes (numprocesses=auto):
Another thing is that the pynest examples above (recording_demo.py and balancedneuron.py) fail.
if you call |
@niltonlk Thank you for your detective work! And in a way good to know that the errors did not disappear by magic ;). The varying number of tests may be due
Would you be able to join us for the Open NEST Developer Video Conference this Monday, 13 Sep, 11.30 CEST to discuss how to proceed on this issue? |
--boxed seems to be a backward compatibility alias for pytest-forked --forked I have tried with
Regarding the Open NEST Developer Video Conference: I will participate! |
I found where it was trying to access an empty vector. It was in:
Which I just changed to execute the memcpy only if vector_ptr points to a non-empty vector (i.e. size() > 0). This time I made sure that the correct flag (-D_GLIBCXX_ASSERTIONS) was used. Installcheck output is as follows:
It seems that all non-MPI pynesttests were recovered, with two of them (test_erfc_neuron.py and test_regression_issue-1409.py) failing due to timeout. Both pass when executed independently.
However, none of the MPI 2 pynesttests were recovered. In addition, the two examples (recording_demo.py and balancedneuron.py) that were reported to fail now pass with this last modification. I hope to discuss this at tomorrow's Open NEST Developer meeting before committing the changes. |
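The guard described in this comment (run the memcpy only when the vector is non-empty) can be sketched as follows. This is an illustration under assumptions, not the actual change: the helper name `copy_if_nonempty`, its signature, and the element type `long` are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper, not NEST's API.
// Copies the contents of src into dest and returns the number of bytes
// copied. The caller must make sure dest has room for src.size() elements.
inline std::size_t copy_if_nonempty(long* dest, const std::vector<long>& src)
{
  if (src.empty())
  {
    // Nothing to copy: return before src[0] is ever evaluated, so the
    // bounds check added by -D_GLIBCXX_ASSERTIONS cannot fire.
    return 0;
  }
  const std::size_t n_bytes = src.size() * sizeof(long);
  std::memcpy(dest, &src[0], n_bytes);
  return n_bytes;
}
```

With the early return, an empty source vector leaves the destination untouched, which matches the behaviour previously obtained from a zero-length memcpy in builds without the assertions.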
(Edited for clarity)
Describe the bug
When NEST 3.0 is built with GLIBCXX_ASSERTIONS, it crashes on running. For example, when running the simple examples from the documentation, one gets:
To Reproduce
Steps to reproduce the behavior:
0. Build NEST 3.0 with -D_GLIBCXX_ASSERTIONS included in the compilation flags (enabled by default in Fedora)
1. python recording_demo.py
Expected behavior
Should not crash
Screenshots
NA
Desktop/Environment (please complete the following information):
(in case you compiled the source code manually, provide the
"Installation Summary" that you see at the end of the
cmake step)
Additional context
The stacktrace from a build where the crash can be seen is here:
The complete backtrace is here: https://ankursinha.fedorapeople.org/nest-3/nest-3.0-backtrace.log