ECL test sometimes fails to create maxima directory #26968

timokau · 2018-12-29T15:14:04Z

I just got this test failure (and I'm pretty sure I've seen it before):

File "/nix/store/br51f1j23cnxmbf19yn62rn5pp2rmqdh-sage-src-8.5/src/doc/ca/intro/index.rst", line 492, in doc.ca.intro.index
Failed example:
    solve([x + y == 6, x - y == 4], x, y)
Exception raised:
    Traceback (most recent call last):
      File "/nix/store/mfbhcnzdxiyj831pr3hg1wwh556a6lv3-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/nix/store/mfbhcnzdxiyj831pr3hg1wwh556a6lv3-python-2.7.15-env/lib/python2.7/site-packages/sage/doctest/forker.py", line 1086, in compile_and_execute
        exec(compiled, globs)
      File "<doctest doc.ca.intro.index[1]>", line 1, in <module>
        solve([x + y == Integer(6), x - y == Integer(4)], x, y)
      File "/nix/store/mfbhcnzdxiyj831pr3hg1wwh556a6lv3-python-2.7.15-env/lib/python2.7/site-packages/sage/symbolic/relation.py", line 1107, in solve
        m = maxima(f)
      File "sage/misc/lazy_import.pyx", line 354, in sage.misc.lazy_import.LazyImport.__call__ (build/cythonized/sage/misc/lazy_import.c:3682)
        return self.get_object()(*args, **kwds)
      File "sage/misc/lazy_import.pyx", line 189, in sage.misc.lazy_import.LazyImport.get_object (build/cythonized/sage/misc/lazy_import.c:2347)
        return self._get_object()
      File "sage/misc/lazy_import.pyx", line 221, in sage.misc.lazy_import.LazyImport._get_object (build/cythonized/sage/misc/lazy_import.c:2584)
        self._object = getattr(__import__(self._module, {}, {}, [self._name]), self._name)
      File "/nix/store/mfbhcnzdxiyj831pr3hg1wwh556a6lv3-python-2.7.15-env/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py", line 113, in <module>
        ecl_eval("(set-pathnames)")
      File "sage/libs/ecl.pyx", line 1326, in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10224)
        cpdef EclObject ecl_eval(str s):
      File "sage/libs/ecl.pyx", line 1341, in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10163)
        o=ecl_safe_eval(o)
      File "sage/libs/ecl.pyx", line 350, in sage.libs.ecl.ecl_safe_eval (build/cythonized/sage/libs/ecl.c:5221)
        raise RuntimeError("ECL says: {}".format(
    RuntimeError: ECL says: Could not create directory "/build/sage-home/.sage/maxima"
    C library error: "File exists"
**********************************************************************

This cannot be reliably reproduced. Probably some parallelism issue (tests were run with 4 threads).

CC: @embray @saraedum @jdemeyer

Component: doctest framework

Keywords: random_fail ecl

Issue created by migration from https://trac.sagemath.org/ticket/26968

The text was updated successfully, but these errors were encountered:

timokau · 2019-01-04T15:50:52Z

comment:1

It would be nice to give every test process its own DOT_SAGE directory. Would that be possible? I tried doing that, but got confused as to how the doctest processes are actually started and how isolated they are from the "master" process. I tried setting DOT_SAGE in env.py based on DOCTEST_MODE, but that apparently always is False.

embray · 2019-01-04T16:08:14Z

comment:2

I have also argued in the past that the tests should use a clean .sage. It's a problem for some other tests as well. I don't know if there's a good reason not to, other than "nobody's done it yet".

timokau · 2019-01-05T11:50:18Z

comment:3

The only reason I could think of is that some weird configurations wouldn't be tested. But I would think that is more of a feature: Having doctests depend on the user configuration makes everything much more difficult and relying on users to pick up issues when testing with their configuration doesn't sound like a particularly good practice either.

embray · 2019-01-07T14:16:23Z

comment:4

Replying to @timokau:

The only reason I could think of is that some weird configurations wouldn't be tested. But I would think that is more of a feature: Having doctests depend on the user configuration makes everything much more difficult and relying on users to pick up issues when testing with their configuration doesn't sound like a particularly good practice either.

Yes, if there are any bugs resulting from user configuration those should be then reported by users and captured as a regression test: Tests should be isolated as possible and reproducible.

timokau · 2019-01-07T20:01:01Z

comment:5

Do you happen to know how practical it would be to implement then? As I mentioned earlier, it seems like DOCTEST_MODE is only set to true after env.py was evaluated.

embray · 2019-01-08T14:01:03Z

comment:6

I think it would be quite practical; just nobody's been sufficiently motivated enough before now.

Probably somewhere in sage.doctest.forker we would just create a tempdir and set sage.env.DOT_SAGE to that tempdir before running any examples in a test file.

timokau · 2019-01-12T12:37:03Z

comment:7

But that would set sage.env.DOT_SAGE for all processes right?

embray · 2019-01-14T10:36:59Z

comment:8

No, not if it were done after fork.

embray · 2019-01-15T18:15:21Z

comment:9

Retarging tickets optimistically to the next milestone. If you are responsible for this ticket (either its reporter or owner) and don't believe you are likely to complete this ticket before the next release (8.7) please retarget this ticket's milestone to sage-pending or sage-wishlist.

timokau · 2019-01-17T20:31:33Z

comment:10

Replying to @embray:

I think it would be quite practical; just nobody's been sufficiently motivated enough before now.

Probably somewhere in sage.doctest.forker we would just create a tempdir and set sage.env.DOT_SAGE to that tempdir before running any examples in a test file.

I now remembered why that wouldn't work. There are other variables that depend on sage.env.DOT_SAGE which wouldn't be re-evaluated. We'd probably need to either change env.py to use @propertys or spawn completely independent sage instances.

timokau · 2019-01-17T20:34:13Z

comment:11

Although that wouldn't even fix this issue since MAXIMA_USERDIR is set in sage-env, not env.py.

embray · 2019-01-18T12:32:56Z

comment:12

I still haven't looked at it that deeply but I suspect you're overthinking it. Though the concern about MAXIMA_USERDIR is definitely valid. That's the kind of problem I wanted to address in #22652, but every time it's come up it's been easier to just implement a narrow fix rather than a general solution.

timokau · 2019-01-18T17:42:35Z

comment:13

Repeating my comment I accidentally posted in #22652 here:

Replying to @embray:

I still haven't looked at it that deeply but I suspect you're overthinking it.

env.py contains these lines:

_add_variable_or_fallback('PYTHON_EGG_CACHE',   opj('$DOT_SAGE', '.python-eggs'))
_add_variable_or_fallback('SAGE_STARTUP_FILE',  opj('$DOT_SAGE', 'init.sage'))

Which is admittedly way less occurrences than I expected. Still, those would need to be re-evaluated or their usage changed.

Though the concern about MAXIMA_USERDIR is definitely valid. That's the kind of problem I wanted to address in #22652,

#22652 would be amazing. I don't see why it would necessary fix these issues though. Of course it would make sense to tackle both issues at the same time.

but every time it's come up it's been easier to just implement a narrow fix rather than a general solution.

Speaking of narrow fixes, for this specific use-case, for this particular issue that should do it (how do I get that nice patch highlighting on trac?):

diff --git a/src/sage/doctest/forker.py b/src/sage/doctest/forker.py
index 02e18e67e7..115f748727 100644
--- a/src/sage/doctest/forker.py
+++ b/src/sage/doctest/forker.py
@@ -668,7 +668,14 @@ class SageDocTestRunner(doctest.DocTestRunner, object):
 
             try:
                 # Don't blink!  This is where the user's code gets run.
+                # print("MAXIMA_USERDIR is: {}".format(os.environ['MAXIMA_USERDIR']))
+                os.environ['MAXIMA_USERDIR'] = "/tmp/maxima-userdir-{}".format(os.getpid())
                 self.compile_and_execute(example, compiler, test.globs)
             except SystemExit:
                 raise
             except BaseException:

I'll probably adopt that for nix. I don't want to fix the same issue on our build server every week. Should I submit that upstream or is that too "hacky"?

timokau · 2019-01-18T21:03:40Z

comment:14

I ended up with NixOS/nixpkgs#54285.

jhpalmieri · 2019-01-18T21:27:56Z

comment:15

Replying to @timokau:

(how do I get that nice patch highlighting on trac?)

On the line right after the triple braces {{{, add the line #!diff. (See also https://trac.sagemath.org/wiki/WikiProcessors for more options like #!sh, #!html, etc.)

jdemeyer · 2019-01-19T13:28:01Z

comment:16

You seem too much focusing on the doctest framework here: this is not a bug in the doctest framework, so it shouldn't be fixed in the doctest framework.

Ideally, this should be reported to ECL upstream.

timokau · 2019-01-19T13:36:27Z

comment:17

They seem to assume only one ecl process is running at a time, which may or may not be a bug.

embray · 2019-03-25T10:56:15Z

comment:18

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

embray · 2019-06-14T14:54:19Z

comment:19

As the Sage-8.8 release milestone is pending, we should delete the sage-8.8 milestone for tickets that are not actively being worked on or that still require significant work to move forward. If you feel that this ticket should be included in the next Sage release at the soonest please set its milestone to the next release milestone (sage-8.9).

mkoeppe · 2020-04-18T03:16:02Z

comment:20

I'm getting a similar failure while building doc-html

tornaria · 2023-02-14T13:59:50Z

This bug is still biting, see e.g. https://github.com/void-linux/void-packages/actions/runs/4169062901/jobs/7216578712

Could we workaround this? E.g. create the directory before ECL runs ensure-directories-exist. If the directory exists, there's no race. I'm not sure there is a simple robust way to know the name of the directory in advance (e.g. for me it seems to be $DOT_SAGE/maxima/binary/5_45_1/ecl/21_2_1/ but clearly that changes with maxima and ecl versions).

Is this reported as a bug in ECL?

tornaria · 2023-02-16T00:24:08Z

I mean something like the following (proof of concept, the computation of *maxima-objdir* is not 100% correct).

--- a/src/sage/interfaces/maxima_lib.py
+++ b/src/sage/interfaces/maxima_lib.py
@@ -116,6 +116,20 @@ ecl_eval("(in-package :maxima)")
 ecl_eval("(setq $nolabels t))")
 ecl_eval("(defvar *MAXIMA-LANG-SUBDIR* NIL)")
 ecl_eval("(set-locale-subdir)")
+
+# Compute `*maxima-objdir*` as in `(set-pathnames)`.
+# See: src/init-cl.lisp in maxima source code.
+import os
+maxima_objdir = os.path.join(
+        os.environ.get("MAXIMA_USERDIR"), "binary",
+        ecl_eval("(maxima-version1)").python()[1:-1],
+        ecl_eval("*maxima-lispname*").python()[1:-1],
+        ecl_eval("(lisp-implementation-version1)").python()[1:-1]
+        )
+# Create *maxima-objdir* before calling (set-pathnames) to avoid a race
+# in ecl's implementation of ensure-directories-exist.  See #26968.
+os.makedirs(maxima_objdir, exist_ok=True)
+
 ecl_eval("(set-pathnames)")
 ecl_eval("(defun add-lineinfo (x) x)")
 ecl_eval('(defun principal nil (cond ($noprincipal (diverg)) ((not pcprntd) (merror "Divergent Integral"))))')

An alternative is to first run ecl_eval("(set-pathnames)"); catch the RuntimeError and if so use the value of *maxima-objdir* as computed by (set-pathnames). I think it should use os.makedirs() then run (set-pathnames) again.

tornaria · 2023-02-20T19:22:48Z

Here's a small script to reproduce the issue:

from sage.all import *
import os, sys, tempfile

def fork_and_try_import():
    pid = os.fork()
    if not pid:
        try:
            import sage.interfaces.maxima_lib
        except RuntimeError as e:
            print("RuntimeError:", e)
            os._exit(1)
        os._exit(0)
    return pid

def run_test(d, num=5):
    failed = 0
    for i in range(num):
        with tempfile.TemporaryDirectory(dir=d) as tmpdir:
            print(f"Run #{i} in {tmpdir}")
            os.environ['MAXIMA_USERDIR'] = os.path.abspath(tmpdir)
            pid1 = fork_and_try_import()
            pid2 = fork_and_try_import()
            failed += os.waitpid(pid1, 0)[1] != 0
            failed += os.waitpid(pid2, 0)[1] != 0

    print(f"Failed tests: {failed}/{num}")

if len(sys.argv) > 1:
    d = sys.argv[1]
else:
    d = None

with tempfile.TemporaryDirectory(dir=d) as tmpdir:
        print(f"Running test in {tmpdir}")
        run_test(tmpdir)

This is still a race so it won't reproduce every time, depending on the speed of your filesystem. However, I have better luck reproducing this on either on NFS or on SSHFS, since the network latency gives plenty of time for the race to trigger.

By default this uses /tmp and is tipically hard to trigger the race:

$ sage sage-race.py 
Running test in /tmp/tmpqb989z33
Run #0 in /tmp/tmpqb989z33/tmpsg61q0xr
Run #1 in /tmp/tmpqb989z33/tmpfvz6o5_9
Run #2 in /tmp/tmpqb989z33/tmpe3qb7qpu
Run #3 in /tmp/tmpqb989z33/tmp9a5n9pqj
Run #4 in /tmp/tmpqb989z33/tmpfu2qr3aj
Failed tests: 0/5

My home dir here is NFS:

$ sage sage-race.py ~
Running test in /home/tornaria/tmp3q0d_q4q
Run #0 in /home/tornaria/tmp3q0d_q4q/tmp60_u_uuc
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmp60_u_uuc/binary"
C library error: "File exists"
Run #1 in /home/tornaria/tmp3q0d_q4q/tmp03bxyaf9
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmp03bxyaf9/binary"
C library error: "File exists"
Run #2 in /home/tornaria/tmp3q0d_q4q/tmpz37vri7x
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmpz37vri7x/binary"
C library error: "File exists"
Run #3 in /home/tornaria/tmp3q0d_q4q/tmp4m55dxh1
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmp4m55dxh1/binary"
C library error: "File exists"
Run #4 in /home/tornaria/tmp3q0d_q4q/tmpp8bw75kr
RuntimeError: ECL says: Could not create directory "/home/tornaria/tmp3q0d_q4q/tmpp8bw75kr/binary"
C library error: "File exists"
Failed tests: 5/5

Here's how to do it using sshfs (change REMOTE-HOST by the name of a host you can ssh into):

$ mkdir /tmp/mnt
$ sshfs REMOTE-HOST: /tmp/mnt/
$ sage sage-race.py /tmp/mnt/
Running test in /tmp/mnt/tmpi8hw10cb
Run #0 in /tmp/mnt/tmpi8hw10cb/tmp4ehsdrr7
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmp4ehsdrr7/binary"
C library error: "File exists"
Run #1 in /tmp/mnt/tmpi8hw10cb/tmpa_icybzr
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmpa_icybzr/binary"
C library error: "File exists"
Run #2 in /tmp/mnt/tmpi8hw10cb/tmpgqlbi43n
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmpgqlbi43n/binary"
C library error: "File exists"
Run #3 in /tmp/mnt/tmpi8hw10cb/tmp2v6883z7
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmp2v6883z7/binary"
C library error: "File exists"
Run #4 in /tmp/mnt/tmpi8hw10cb/tmprdtc0yme
RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmpi8hw10cb/tmprdtc0yme/binary"
C library error: "File exists"
Failed tests: 5/5
$ fusermount -u /tmp/mnt
$ rmdir /tmp/mnt/

tornaria · 2023-02-21T16:17:54Z

A direct way to reproduce the doctest failures is as follows. Assuming /tmp/mnt is a slow fs (say, sshfs as above)

$ DOT_SAGE=/tmp/mnt/temp-dot-sage ./sage -tp 2 src/sage/interfaces/maxima_lib.py{,}
...
    RuntimeError: ECL says: Could not create directory "/tmp/mnt/tmp-dot-sage/maxima/binary/5_45_1"
    C library error: "File exists"
...

Note this will doctest the same file twice in parallel. Usually one of the two will fail the first time. After that the maxima directory will be created so no race anymore: you have to rm -rf /tmp/mnt/temp-dot-sage/maxima before each try for this to fail.

For the same reason, you may be able to reproduce this if your $HOME is on an NFS-mounted filesystem without messing with DOT_SAGE, but only if you rm -rf ~/.sage/maxima before each try.

Now that I can easily reproduce it, I'll PR a workaround.

We use a temporary `MAXIMA_USERDIR` so its empty, and we try to initialize maxima twice in parallel to entice the race. This temporary dir is placed within `DOT_SAGE` so it is easy to try different filesystems. The bug triggers more frequently if `DOT_SAGE` is in a high latency filesystem (e.g. sshfs on a non-local host). The next commit introduces a workaround for the bug.

When maxima is initialized a bug in ecl implementation of `ensure-directories-exist` might result in a runtime error. As a workaround, in case we get a runtime error we use python to create the directory and continue with maxima initialization. Note that for normal usage the directory will already exist within the user's `DOT_SAGE` so this code will almost never run. However, when running doctests on CI this occasionally triggers.

We run a new instance of sage in a subprocess to ensure maxima is not already initialized. We use a temporary MAXIMA_USERDIR so its empty, and we try to initialize maxima twice in parallel to entice the race. This temporary dir is placed within `DOT_SAGE` so it is easy to try different filesystems. The bug triggers more frequently if `DOT_SAGE` is in a high latency filesystem (e.g. sshfs on a non-local host). The next commit introduces a workaround for the bug.

When maxima is initialized a bug in ecl implementation of `ensure-directories-exist` might result in a runtime error. As a workaround, in case we get a runtime error we use python to create the directory and continue with maxima initialization. Note that for normal usage the directory will already exist within the user's `DOT_SAGE` so this code will almost never run. However, when running doctests on CI this occasionally triggers.

When maxima is initialized a bug in ecl implementation of `ensure-directories-exist` might result in a runtime error. As a workaround, in case we get a runtime error we use python to create the directory and then continue with maxima initialization. Note that for normal usage the directory will already exist within the user's `DOT_SAGE` so this code will almost never run. However, when running doctests on CI this occasionally triggers.

gh-35195: Workaround for an ecl race in maxima init ### 📚 Description When maxima is initialized a bug in ecl implementation of `ensure-directories-exist` might result in a runtime error. As a workaround, in case we get a runtime error we use python to create the directory and continue with maxima initialization. Note that for normal usage the directory will already exist within the user's `DOT_SAGE` so this code will almost never run. However, when running doctests on CI this occasionally triggers. #### New doctest The first commit introduces a doctest to try to catch this race. We run a new instance of sage in a subprocess to ensure maxima is not already initialized. We use a temporary `MAXIMA_USERDIR` so its empty, and we try to initialize maxima twice in parallel to entice the race. This temporary dir is placed within `DOT_SAGE` so it is easy to try different filesystems. The bug triggers more frequently if `DOT_SAGE` is in a high latency filesystem (e.g. sshfs on a non-local host). Closes #26968. ### 📝 Checklist    - [x] I have made sure that the title is self-explanatory and the description concisely explains the PR. - [x] I have linked an issue or discussion. - [x] I have created tests covering the changes. URL: #35195 Reported by: Gonzalo Tornaría Reviewer(s): Gonzalo Tornaría, Matthias Köppe

Calling initialize-runtime-globals will run set-pathnames and be subject to the issue described in sagemath#26968. Thus the workaround introduced in sagemath#35195 has to be done before anything that may call set-pathnames (e.g. initialize-runtime-globals).

Make sure sagemath#26968 is not unfixed after sagemath#35707.

timokau added this to the sage-8.6 milestone Dec 29, 2018

timokau added c: doctest framework labels Dec 29, 2018

embray modified the milestones: sage-8.6, sage-8.7 Jan 15, 2019

embray modified the milestones: sage-8.7, sage-8.8 Mar 25, 2019

embray removed this from the sage-8.8 milestone Jun 14, 2019

mkoeppe added this to the sage-9.1 milestone Apr 18, 2020

mkoeppe modified the milestones: sage-9.3, sage-9.4 Mar 24, 2021

mkoeppe modified the milestones: sage-9.4, sage-9.5 Jul 19, 2021

mkoeppe modified the milestones: sage-9.5, sage-9.6 Dec 18, 2021

mkoeppe modified the milestones: sage-9.6, sage-9.7 May 3, 2022

mkoeppe modified the milestones: sage-9.7, sage-9.8 Sep 19, 2022

mkoeppe removed this from the sage-9.8 milestone Jan 29, 2023

tornaria mentioned this issue Feb 25, 2023

Workaround for an ecl race in maxima init #35195

Merged

3 tasks

tobiasdiez mentioned this issue Feb 25, 2023

Flaky doctests #35196

Open

3 tasks

vbraun closed this as completed in #35195 Mar 26, 2023

This was referenced Jun 25, 2023

Make Sage work with maxima 5.47 #35707

Merged

maxima: update to 5.47.0, patch and rebuilds sagemath for maxima, singular, numpy updates. void-linux/void-packages#44624

Merged

This was referenced Jun 25, 2023

Make sure #26968 is not unfixed after #35707 #35832

Closed

Make sure #26968 is not unfixed after #35707. antonio-rojas/sage#2

Merged

antonio-rojas added a commit to antonio-rojas/sage that referenced this issue Jun 25, 2023

Merge pull request #2 from tornaria/maxima-5.47.0

808ab47

Make sure sagemath#26968 is not unfixed after sagemath#35707.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ECL test sometimes fails to create maxima directory #26968

ECL test sometimes fails to create maxima directory #26968

timokau commented Dec 29, 2018

timokau commented Jan 4, 2019

embray commented Jan 4, 2019

timokau commented Jan 5, 2019

embray commented Jan 7, 2019

timokau commented Jan 7, 2019

embray commented Jan 8, 2019

timokau commented Jan 12, 2019

embray commented Jan 14, 2019

embray commented Jan 15, 2019

timokau commented Jan 17, 2019

timokau commented Jan 17, 2019

embray commented Jan 18, 2019

timokau commented Jan 18, 2019

timokau commented Jan 18, 2019

jhpalmieri commented Jan 18, 2019

jdemeyer commented Jan 19, 2019

timokau commented Jan 19, 2019

embray commented Mar 25, 2019

embray commented Jun 14, 2019

mkoeppe commented Apr 18, 2020

tornaria commented Feb 14, 2023

tornaria commented Feb 16, 2023

tornaria commented Feb 20, 2023

tornaria commented Feb 21, 2023

ECL test sometimes fails to create maxima directory #26968

ECL test sometimes fails to create maxima directory #26968

Comments

timokau commented Dec 29, 2018

timokau commented Jan 4, 2019

embray commented Jan 4, 2019

timokau commented Jan 5, 2019

embray commented Jan 7, 2019

timokau commented Jan 7, 2019

embray commented Jan 8, 2019

timokau commented Jan 12, 2019

embray commented Jan 14, 2019

embray commented Jan 15, 2019

timokau commented Jan 17, 2019

timokau commented Jan 17, 2019

embray commented Jan 18, 2019

timokau commented Jan 18, 2019

timokau commented Jan 18, 2019

jhpalmieri commented Jan 18, 2019

jdemeyer commented Jan 19, 2019

timokau commented Jan 19, 2019

embray commented Mar 25, 2019

embray commented Jun 14, 2019

mkoeppe commented Apr 18, 2020

tornaria commented Feb 14, 2023

tornaria commented Feb 16, 2023

tornaria commented Feb 20, 2023

tornaria commented Feb 21, 2023