[Investigation] The MSYS2 + meson + Python crash issue #17415
Comments
Short summary of bisecting the issue with upstream CPython in GHA:
|
I've reduced the reproducer to no longer include MSYS2 now: https://github.com/lazka/python-crash-test/blob/70349faa496cdf19c4bd5152435542542eb7a14f/.github/workflows/main.yml |
I've filed a bug report, python/cpython#105400. I don't expect much from it, but who knows. I'm also not sure how to proceed on the MSYS2 side of things. Prioritizing a 3.11 update would be one option. |
Yes, this issue does not happen "Locally, outside of CI". |
OK. Is it possible to know the actual CI setup (VMM name/version, the host CPUID, the host OS) ? |
@eli-schwartz, sure, I saw the defined action. Well, this says things about the guest: But no configuration of the lower levels: the VMM/hypervisor, the host OS/hardware. EDIT: Anyway, I do not even understand if it's possible to download/build the exact disk image of the guest OS to try it locally. |
I have no idea how people generally find out information about the host hardware and hypervisor used on Github's servers, and would not assume that someone who ran a workflow could find that out. |
Interesting that it's specifically |
Small update. I've reviewed meson's One important thing I noticed is that the issue happens only with I think the next step would be to remove meson and ninja (if possible) from the reproducer. I tested with a script that only runs But my fear is that since it doesn't happen with Python 3.11 there will not be much interest (if any) in finding the root cause of this. I think during process exit some resources/fds are not in a valid state, which causes the issue. |
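A reproducer without meson and ninja boils down to running one command many times and counting how often it exits with an access violation. The following is only a minimal sketch of such a stress loop, not the script used in this thread. The command passed in (`python -c "pass"`) and the helper names `is_access_violation` and `count_crashes` are placeholders chosen for illustration.

```python
import subprocess
import sys

STATUS_ACCESS_VIOLATION = 0xC0000005  # NTSTATUS value for an access violation


def is_access_violation(returncode: int) -> bool:
    """Return True if a process exit code equals STATUS_ACCESS_VIOLATION.

    Masking to 32 bits covers both the unsigned form (3221225477) and
    the signed form (-1073741819) that different tools print.
    """
    return (returncode & 0xFFFFFFFF) == STATUS_ACCESS_VIOLATION


def count_crashes(cmd: list[str], runs: int) -> int:
    """Run cmd `runs` times and count exits with an access violation."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if is_access_violation(result.returncode):
            crashes += 1
    return crashes


if __name__ == "__main__":
    # Hypothetical command: an interpreter that starts and exits right away.
    crashes = count_crashes([sys.executable, "-c", "pass"], runs=20)
    print(f"{crashes} crash(es) in 20 runs")
```

Swapping in the real command and a much larger run count would be needed in practice, since the failure is intermittent.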
There's a lot tied into that one these days... Windows error reporting, JIT debugging, etc only happen when SEM_NOGPFAULTERRORBOX is not set.
I hit this issue without mingw or llvm-nm being involved (meson running C binaries compiled with msvc, via msbuild, not ninja). SEM_NOGPFAULTERRORBOX was the relevant factor; SEM_FAILCRITICALERRORS did not matter. Unfortunately the failure rates in my case are too low to make for a good test case, and I haven't been able to increase them meaningfully. From writing this up for postgres:
Which to me strongly hints that the issue is somewhere in the Windows runtime. A possible explanation for why the issue is visible in CI but not, so far, locally is that a different version of the Windows runtime is being used. |
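For anyone comparing CI and local environments: both flags are bits of the process error mode, which a child process inherits from its parent unless it is created with CREATE_DEFAULT_ERROR_MODE. Below is a minimal sketch for inspecting the mode from Python, assuming `ctypes` access to `kernel32.GetErrorMode`. The helper names are made up for illustration, and nothing here is taken from meson or CPython.

```python
import ctypes
import sys

# Flag values from the Windows SDK (used with SetErrorMode/GetErrorMode).
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002


def get_error_mode():
    """Return the current process error mode on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    return ctypes.windll.kernel32.GetErrorMode()


def describe_error_mode(mode: int) -> dict:
    """Report which of the two flags discussed in this thread are set."""
    return {
        "SEM_FAILCRITICALERRORS": bool(mode & SEM_FAILCRITICALERRORS),
        "SEM_NOGPFAULTERRORBOX": bool(mode & SEM_NOGPFAULTERRORBOX),
    }


if __name__ == "__main__":
    mode = get_error_mode()
    if mode is None:
        print("not running on Windows")
    else:
        print(describe_error_mode(mode))
```

Running this both as a direct child of the shell and as a child of the build tool would show whether the mode differs between the two.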
Hi! Writing here since that question may also help with this issue. I am experiencing a deadlock in Python when running Regarding this issue, I'd try installing a vectored exception handler which simply calls Sleep() with a super long timeout. Then, when Python hangs due to the exception handler, attach gdb (or lldb) to the process via SSH and gather a stack trace. Vectored exception handlers always run on top of the normal stack, with no unwinding done, so the backtrace would give us all the information we need. I'm not entirely sure whether handlers are also run during process teardown (ExitProcess), but most probably yes. |
Would it be possible to update the Python package to 3.11? It is already at 3.11.4, and 3.10 is in the "security fixes only" stage. Arch also provides 3.11 by default, although of course they also have other versions available in the repo. |
I can confirm that a mingw build of 3.11 also doesn't crash: msys2-contrib/cpython-mingw#139 (comment) |
There is a longstanding bug with random crashes of Python 3.10 on CI. See: python/cpython#105400 msys2/MINGW-packages#11864 msys2/MINGW-packages#17415
This reverts commit e945f35. With MSYS2 updating to Python 3.11, this should no longer be needed. See msys2/MINGW-packages#17415 (comment)
This reverts commit e945f35. With MSYS2 updating to Python 3.11, this should no longer be needed. See msys2/MINGW-packages#17415 (comment) (cherry picked from commit 3752041)
Make sure Python is updated to >=3.11 (fix msys2/MINGW-packages#17415).
It was added in 13fe2e0, but it's now unnecessary as the issue has been fixed. See msys2/MINGW-packages#17415
Should this be closed? |
Let's hope it doesn't come back :) |
Another issue (old one: #11864) to collect some information and what was tried so far.
I've created a small repo for reproducing the issue: https://github.com/lazka/python-crash-test
Any ideas regarding what we could try are welcome.
The issue
meson fails with STATUS_ACCESS_VIOLATION sometimes like this when being called from ninja, and we haven't found a way to reproduce it locally:
This below is now out of date, see the followup answers for the cause
When has it started
Where it failed so far
It happened in:
It happened with:
Where it hasn't failed so far
What makes the error go away
MSYS=winjitdebug
What doesn't make the error go away
MSYSTEM= doesn't help.
Bisecting:
Testing with official CPython builds:
Error started: v3.10.0a3...v3.10.0a4
Fixed: v3.11.0a3...v3.11.0a4
Bisect, see next post.
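Since the crash is intermittent, a bisect step can only label a revision "good" after enough clean runs. As a rough guide, assuming independent runs and a known per-run failure rate (both are assumptions, and the 5% figure below is purely illustrative, not measured), the number of runs needed can be estimated like this:

```python
import math


def runs_needed(failure_rate: float, confidence: float = 0.99) -> int:
    """Clean runs required before calling a revision "good".

    If a bad revision crashes with probability `failure_rate` per run,
    the chance of seeing n clean runs in a row is (1 - failure_rate)**n.
    This returns the smallest n that pushes that chance down to
    1 - confidence or below.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be between 0 and 1 (exclusive)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1 (exclusive)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Illustrative rate only: a bad build that crashes in 5% of runs.
    print(runs_needed(0.05, 0.99))
```

The lower the failure rate, the more runs each bisect step needs, which is why low-rate variants of this bug are hard to turn into a test case.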