ECL test sometimes fails to create maxima directory #26968
Comments
comment:1
It would be nice to give every test process its own |
comment:2
I have also argued in the past that the tests should use a clean |
comment:3
The only reason I could think of is that some weird configurations wouldn't be tested. But I would think that is more of a feature: having doctests depend on the user configuration makes everything much more difficult, and relying on users to pick up issues when testing with their configuration doesn't sound like a particularly good practice either.
comment:4
Replying to @timokau:
Yes, if there are any bugs resulting from user configuration, those should then be reported by users and captured as regression tests: tests should be as isolated as possible and reproducible.
comment:5
Do you happen to know how practical it would be to implement then? As I mentioned earlier, it seems like |
comment:6
I think it would be quite practical; just nobody's been sufficiently motivated before now. Probably somewhere in
comment:7
But that would set |
comment:8
No, not if it were done after fork.
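As a rough illustration of the "after fork" point: a change to `os.environ` made inside a forked child never reaches the parent or its other workers. The sketch below is not the Sage doctest framework's code. `run_isolated` is a hypothetical helper, `MAXIMA_USERDIR` is used only as a stand-in for whichever variable is under discussion, and it assumes a POSIX system where `os.fork` is available.

```python
import os
import tempfile


def run_isolated(task):
    """Fork, give the child a private MAXIMA_USERDIR, and return task()'s string result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: environment changes made here are invisible to the parent.
        os.close(read_fd)
        status = 1
        try:
            with tempfile.TemporaryDirectory() as private_dir:
                os.environ["MAXIMA_USERDIR"] = private_dir
                os.write(write_fd, task().encode())
            status = 0
        finally:
            # Leave without running the parent's cleanup handlers.
            os._exit(status)
    # Parent: collect the child's output, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        output = reader.read()
    os.waitpid(pid, 0)
    return output
```

Calling `run_isolated(lambda: os.environ["MAXIMA_USERDIR"])` returns the child's private directory, while the parent's own environment stays as it was.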
comment:9
Retargeting tickets optimistically to the next milestone. If you are responsible for this ticket (either its reporter or owner) and don't believe you are likely to complete this ticket before the next release (8.7) please retarget this ticket's milestone to sage-pending or sage-wishlist.
comment:10
Replying to @embray:
I now remembered why that wouldn't work. There are other variables that depend on |
comment:11
Although that wouldn't even fix this issue since |
comment:12
I still haven't looked at it that deeply but I suspect you're overthinking it. Though the concern about |
comment:13
Repeating my comment I accidentally posted in #22652 here: Replying to @embray:
Which is admittedly way fewer occurrences than I expected. Still, those would need to be re-evaluated or their usage changed.
#22652 would be amazing. I don't see why it would necessarily fix these issues though. Of course it would make sense to tackle both issues at the same time.
Speaking of narrow fixes: for this particular issue, the following should do it (how do I get that nice patch highlighting on trac?):
I'll probably adopt that for nix. I don't want to fix the same issue on our build server every week. Should I submit that upstream or is that too "hacky"?
comment:14
I ended up with NixOS/nixpkgs#54285.
comment:15
Replying to @timokau:
On the line right after the triple braces |
comment:16
You seem to be focusing too much on the doctest framework here: this is not a bug in the doctest framework, so it shouldn't be fixed in the doctest framework. Ideally, this should be reported to ECL upstream.
comment:17
They seem to assume only one ecl process is running at a time, which may or may not be a bug.
comment:18
Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)
comment:19
As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).
comment:20
I'm getting a similar failure while building doc-html
This bug is still biting, see e.g. https://github.com/void-linux/void-packages/actions/runs/4169062901/jobs/7216578712. Could we work around this, e.g. by creating the directory before ECL runs? Is this reported as a bug in ECL?
I mean something like the following (proof of concept, the computation of

--- a/src/sage/interfaces/maxima_lib.py
+++ b/src/sage/interfaces/maxima_lib.py
@@ -116,6 +116,20 @@ ecl_eval("(in-package :maxima)")
 ecl_eval("(setq $nolabels t))")
 ecl_eval("(defvar *MAXIMA-LANG-SUBDIR* NIL)")
 ecl_eval("(set-locale-subdir)")
+
+# Compute `*maxima-objdir*` as in `(set-pathnames)`.
+# See: src/init-cl.lisp in maxima source code.
+import os
+maxima_objdir = os.path.join(
+    os.environ.get("MAXIMA_USERDIR"), "binary",
+    ecl_eval("(maxima-version1)").python()[1:-1],
+    ecl_eval("*maxima-lispname*").python()[1:-1],
+    ecl_eval("(lisp-implementation-version1)").python()[1:-1]
+    )
+# Create *maxima-objdir* before calling (set-pathnames) to avoid a race
+# in ecl's implementation of ensure-directories-exist. See #26968.
+os.makedirs(maxima_objdir, exist_ok=True)
+
 ecl_eval("(set-pathnames)")
 ecl_eval("(defun add-lineinfo (x) x)")
 ecl_eval('(defun principal nil (cond ($noprincipal (diverg)) ((not pcprntd) (merror "Divergent Integral"))))')

An alternative is to first run
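To make the idea of pre-creating the directory concrete, here is a minimal sketch of why `os.makedirs(..., exist_ok=True)` tolerates a competing process while a check-then-create sequence does not. It assumes the ECL race is of the check-then-create kind, which the thread does not spell out. Both helper names and the `rival` hook (a stand-in for a second process that wins the race) are hypothetical.

```python
import os
import tempfile


def naive_ensure_dir(path, rival=None):
    """Check-then-create: breaks if someone else creates `path` in between."""
    if not os.path.isdir(path):
        if rival is not None:
            rival()  # a competing process creates the directory right here
        os.mkdir(path)  # raises FileExistsError if the rival got there first


def tolerant_ensure_dir(path, rival=None):
    """Create-and-accept: an already existing directory is not an error."""
    if rival is not None:
        rival()
    os.makedirs(path, exist_ok=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "binary")

        def rival():
            os.makedirs(target, exist_ok=True)

        try:
            naive_ensure_dir(target, rival)
        except FileExistsError as exc:
            print("naive version failed:", exc)
        tolerant_ensure_dir(target, rival)
        print("tolerant version succeeded:", os.path.isdir(target))
```

With the directory already in place before `(set-pathnames)` runs, the code path that would have to create it is presumably never taken, which is what the patch above relies on.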
Here's a small script to reproduce the issue:

from sage.all import *
import os, sys, tempfile

def fork_and_try_import():
    # Import maxima_lib in a forked child; exit status 1 signals the race hit.
    pid = os.fork()
    if not pid:
        try:
            import sage.interfaces.maxima_lib
        except RuntimeError as e:
            print("RuntimeError:", e)
            os._exit(1)
        os._exit(0)
    return pid

def run_test(d, num=5):
    failed = 0
    for i in range(num):
        # A fresh, empty MAXIMA_USERDIR for every run.
        with tempfile.TemporaryDirectory(dir=d) as tmpdir:
            print(f"Run #{i} in {tmpdir}")
            os.environ['MAXIMA_USERDIR'] = os.path.abspath(tmpdir)
            pid1 = fork_and_try_import()
            pid2 = fork_and_try_import()
            failed += os.waitpid(pid1, 0)[1] != 0
            failed += os.waitpid(pid2, 0)[1] != 0
    # Each run starts two children, so there are 2 * num imports in total.
    print(f"Failed tests: {failed}/{2 * num}")

if len(sys.argv) > 1:
    d = sys.argv[1]
else:
    d = None
with tempfile.TemporaryDirectory(dir=d) as tmpdir:
    print(f"Running test in {tmpdir}")
    run_test(tmpdir)

This is still a race, so it won't reproduce every time, depending on the speed of your filesystem. However, I have better luck reproducing it on either NFS or SSHFS, since the network latency gives plenty of time for the race to trigger. By default this uses
My home dir here is NFS:
Here's how to do it using sshfs (change
A direct way to reproduce the doctest failures is as follows. Assuming
Note this will doctest the same file twice in parallel. Usually one of the two will fail the first time. After that the maxima directory will be created so no race anymore: you have to
For the same reason, you may be able to reproduce this if your
Now that I can easily reproduce it, I'll PR a workaround.
We use a temporary `MAXIMA_USERDIR` so it's empty, and we try to initialize maxima twice in parallel to provoke the race. This temporary dir is placed within `DOT_SAGE` so it is easy to try different filesystems. The bug triggers more frequently if `DOT_SAGE` is on a high-latency filesystem (e.g. sshfs on a non-local host). The next commit introduces a workaround for the bug.
When maxima is initialized, a bug in ECL's implementation of `ensure-directories-exist` might result in a runtime error. As a workaround, if we get a runtime error we use Python to create the directory and continue with maxima initialization. Note that in normal usage the directory will already exist within the user's `DOT_SAGE`, so this code will almost never run. However, when running doctests on CI the error occasionally triggers.
We run a new instance of sage in a subprocess to ensure maxima is not already initialized. We use a temporary `MAXIMA_USERDIR` so it's empty, and we try to initialize maxima twice in parallel to provoke the race. This temporary dir is placed within `DOT_SAGE` so it is easy to try different filesystems. The bug triggers more frequently if `DOT_SAGE` is on a high-latency filesystem (e.g. sshfs on a non-local host). The next commit introduces a workaround for the bug.
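The test setup described in this commit message could look roughly like the following sketch, which is not the doctest that was actually committed. `run_twice_in_parallel` is a hypothetical helper, and the command to run is left to the caller.

```python
import os
import subprocess
import sys
import tempfile


def run_twice_in_parallel(command, parent_dir=None):
    """Run `command` twice concurrently with a fresh, empty MAXIMA_USERDIR.

    `parent_dir` selects where the temporary directory lives, which makes it
    easy to try a slow or networked filesystem. Returns both exit codes.
    """
    with tempfile.TemporaryDirectory(dir=parent_dir) as userdir:
        env = dict(os.environ, MAXIMA_USERDIR=userdir)
        # Start both processes before waiting on either, so they overlap.
        procs = [subprocess.Popen(command, env=env) for _ in range(2)]
        return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Placeholder command: it only checks that the variable reached the child.
    check = "import os, sys; sys.exit(not os.path.isdir(os.environ['MAXIMA_USERDIR']))"
    print(run_twice_in_parallel([sys.executable, "-c", check]))
```

To exercise the real race one would pass a command that initializes maxima in a fresh Sage process, for instance something along the lines of `["sage", "-c", "import sage.interfaces.maxima_lib"]`. A nonzero exit code from either process would then indicate that the race was hit.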
gh-35195: Workaround for an ecl race in maxima init

### 📚 Description

When maxima is initialized, a bug in ECL's implementation of `ensure-directories-exist` might result in a runtime error. As a workaround, if we get a runtime error we use Python to create the directory and continue with maxima initialization.

Note that in normal usage the directory will already exist within the user's `DOT_SAGE`, so this code will almost never run. However, when running doctests on CI the error occasionally triggers.

#### New doctest

The first commit introduces a doctest to try to catch this race. We run a new instance of sage in a subprocess to ensure maxima is not already initialized. We use a temporary `MAXIMA_USERDIR` so it's empty, and we try to initialize maxima twice in parallel to provoke the race. This temporary dir is placed within `DOT_SAGE` so it is easy to try different filesystems. The bug triggers more frequently if `DOT_SAGE` is on a high-latency filesystem (e.g. sshfs on a non-local host).

Closes #26968.

### 📝 Checklist

- [x] I have made sure that the title is self-explanatory and the description concisely explains the PR.
- [x] I have linked an issue or discussion.
- [x] I have created tests covering the changes.

URL: #35195
Reported by: Gonzalo Tornaría
Reviewer(s): Gonzalo Tornaría, Matthias Köppe
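The "catch the runtime error, create the directory from Python, continue" pattern can be sketched as follows. This is only an illustration of the pattern, not the code merged in the PR. `init_with_workaround` and its `initialize` callable are hypothetical, and retrying the same call once is an assumption about what "continue with maxima initialization" means.

```python
import os


def init_with_workaround(initialize, objdir):
    """Call initialize(); on RuntimeError create `objdir` and try once more.

    `initialize` stands in for the step that may hit the race, and `objdir`
    for the directory that step was trying to create.
    """
    try:
        initialize()
    except RuntimeError:
        # exist_ok=True: whoever won the race may already have created it.
        os.makedirs(objdir, exist_ok=True)
        # A second failure is not caught, so real errors still surface.
        initialize()
```

Because the fallback only runs after a failure, the common case where the directory already exists pays no extra cost.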
Calling initialize-runtime-globals will run set-pathnames and be subject to the issue described in sagemath#26968. Thus the workaround introduced in sagemath#35195 has to be done before anything that may call set-pathnames (e.g. initialize-runtime-globals).
Make sure sagemath#26968 is not unfixed after sagemath#35707.
I just got this test failure (and I'm pretty sure I've seen it before):
This cannot be reliably reproduced. Probably some parallelism issue (tests were run with 4 threads).
CC: @embray @saraedum @jdemeyer
Component: doctest framework
Keywords: random_fail ecl
Issue created by migration from https://trac.sagemath.org/ticket/26968