Running wsl.exe from within distribution appears to hang forever [WSLInterop fails] #295

Closed
1 task done
vkoukis opened this issue Aug 15, 2022 · 2 comments

vkoukis commented Aug 15, 2022

Hello,

thanks for your efforts with genie, it is very cool!

Windows version (build number):
Version 21H2 (OS Build 19044.1889)

Linux distribution:
Ubuntu 20.04 LTS, installed with wsl --install -d ubuntu-20.04. However, the problem seems to break WSLInterop for every distribution on WSL2; I think this happens because all distributions share the kernel-side binfmt_misc configuration.

Kernel version:
5.4.72-microsoft-standard-WSL2

Genie version:
genie 2.4

Describe the bug
I have installed genie 2.4 on Ubuntu 20.04 using the Debian package from the wsl-transdebian repository.

Before starting the bottle, WSLInterop works just fine, and I can run wsl.exe without any problems, both from the Ubuntu 20.04 distribution, and from a different, Debian distribution:

vangelis@MAGELLAN2:~$ wsl.exe --help
Copyright (c) Microsoft Corporation. All rights reserved.

Usage: wsl.exe [Argument] [Options...] [CommandLine]
[...]

The moment I start the bottle, running wsl.exe appears to hang for quite some time, until it fails, see below:

vangelis@MAGELLAN2:~$ genie -s
Waiting for systemd....!
vangelis@MAGELLAN2:~$ genie -b
inside
vangelis@MAGELLAN2:~$ /mnt/c/Windows/System32/wsl.exe -d Ubuntu-20.04
[...the command appears to hang for a long time, and sometimes fails with...]
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

It seems we fell into some sort of infinite loop and ran out of some resource.

I have also confirmed this by attempting to kill wsl.exe from Windows:

C:\>taskkill /f /im wsl.exe
SUCCESS: The process "wsl.exe" with PID 30828 has been terminated.
SUCCESS: The process "wsl.exe" with PID 30808 has been terminated.
SUCCESS: The process "wsl.exe" with PID 41816 has been terminated.
[... a lot more processes killed...]

I have confirmed that running wsl.exe fails both for Ubuntu 20.04 (inside and outside the bottle) and for Debian: WSLInterop seems to break for all distributions.

Confirm that you are running inside the bottle:
The output of genie -b.

To Reproduce
Steps to reproduce the behavior:
See above for steps to reproduce the issue.

I have confirmed that wsl.exe hangs only after starting the bottle, because systemd changes WSLInterop configuration via file /usr/lib/binfmt.d/WSLInterop.conf, which genie installs.

This configuration is the one before starting the bottle:

$ cat /proc/sys/fs/binfmt_misc/WSLInterop
enabled
interpreter /tools/init
flags: F
offset 0
magic 4d5a

This is the configuration after starting the bottle, as applied from /usr/lib/binfmt.d/WSLInterop.conf:

$ cat /proc/sys/fs/binfmt_misc/WSLInterop
enabled
interpreter /init
flags: PF
offset 0
magic 4d5a

Note the added P flag; the interpreter path also changes from /tools/init to /init.
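
To compare the two dumps above mechanically, a small helper can pull individual fields out of the entry text. This is only a sketch: `binfmt_field` is a hypothetical name, and it assumes the "key value" layout shown in the two listings (with `flags:` carrying a trailing colon). The sample text is copied from the listings so the snippet does not need to touch /proc.

```shell
# Hypothetical helper: print the value of one field from a binfmt_misc entry
# read on stdin. Assumes the layout shown above, e.g. "interpreter /init"
# and "flags: PF".
binfmt_field() {
    awk -v key="$1" '{ k = $1; sub(/:$/, "", k) } k == key { print $2; exit }'
}

# Sample text copied from the two listings above, so this runs anywhere.
before='enabled
interpreter /tools/init
flags: F
offset 0
magic 4d5a'

after='enabled
interpreter /init
flags: PF
offset 0
magic 4d5a'

# Report every field whose value differs between the two snapshots.
for field in interpreter flags offset magic; do
    b=$(printf '%s\n' "$before" | binfmt_field "$field")
    a=$(printf '%s\n' "$after" | binfmt_field "$field")
    if [ "$b" != "$a" ]; then
        printf '%s changed: %s -> %s\n' "$field" "$b" "$a"
    fi
done
```

On a live system the same helper could read the entry directly, for example `binfmt_field flags < /proc/sys/fs/binfmt_misc/WSLInterop`.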

Systemd seems to enable this configuration via /usr/lib/binfmt.d/WSLInterop.conf and this service:

$ systemctl status systemd-binfmt
● systemd-binfmt.service - Set Up Additional Binary Formats
     Loaded: loaded (/lib/systemd/system/systemd-binfmt.service; static; vendor preset: enabled)
     Active: active (exited) since Mon 2022-08-15 11:02:57 EEST; 14min ago

Expected behavior

I provide more context below, but I have confirmed I can solve this problem, and running wsl.exe just works from anywhere by either:

  1. preventing systemd from configuring binfmt_misc: systemctl mask systemd-binfmt, or
  2. removing /usr/lib/binfmt.d/WSLInterop.conf altogether.
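
Before applying either workaround, it can help to see which binfmt.d file actually carries the rule. A minimal sketch, assuming the usual systemd search directories (/etc/binfmt.d, /run/binfmt.d, /usr/lib/binfmt.d); `find_interop_rules` is a hypothetical helper name, and it only reads files.

```shell
# Hypothetical helper: print "file:rule" for every *.conf under the given
# directories that defines a WSLInterop rule. Read-only; missing
# directories are skipped silently.
find_interop_rules() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for conf in "$dir"/*.conf; do
            [ -f "$conf" ] || continue
            # A rule starts with the delimiter followed by the handler name.
            grep -H '^:WSLInterop:' "$conf" || true
        done
    done
}

# Directories assumed from systemd's binfmt.d conventions.
find_interop_rules /etc/binfmt.d /run/binfmt.d /usr/lib/binfmt.d
```

If this prints nothing, no binfmt.d file defines the handler, and whatever WSL registered at startup is left alone.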

Screenshots
[I don't have any screenshots of this]

Additional context

I have tried to understand more, and I am sharing my findings below. Looking forward to your feedback.

I understand that /usr/lib/binfmt.d/WSLInterop.conf became part of genie because of issue #142. Note that the original version of this file registered the binfmt handler with just the F flag:
#142 (comment)

Please note that binfmt created by systemd is not wsl-specific, thus it is not preconfigured to support windows executables like the one we unmounted. But we can easily bring the Windows interoperability back by adding config manually, which is done by creating e.g. /etc/binfmt.d/99-WSLInterop.conf with below contents:
:WSLInterop:M::MZ::/init:F
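
For readers unfamiliar with the rule syntax quoted above: per the kernel's binfmt_misc documentation, a rule has the form `:name:type:offset:magic:mask:interpreter:flags`, where F asks the kernel to open the interpreter at registration time and P asks it to preserve the original argv[0] by passing an extra argument to the interpreter. The sketch below splits the quoted rule into those fields. `rule_field` and `magic_hex` are hypothetical helpers; they assume `:` is the delimiter and that the magic is a literal string without `\x` escapes.

```shell
# Hypothetical helper: print field N of a binfmt_misc rule of the form
#   :name:type:offset:magic:mask:interpreter:flags
# Assumes ':' is the delimiter (the rule's first character).
rule_field() {
    # awk's field 1 is the empty string before the leading ':', hence n + 1.
    printf '%s\n' "$1" | awk -F: -v n="$2" '{ print $(n + 1) }'
}

# Hypothetical helper: hex-encode a literal magic string, to compare it with
# the "magic" line of /proc/sys/fs/binfmt_misc/<name>.
magic_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

rule=':WSLInterop:M::MZ::/init:F'
echo "name:        $(rule_field "$rule" 1)"
echo "type:        $(rule_field "$rule" 2)"
echo "magic (hex): $(magic_hex "$(rule_field "$rule" 4)")"
echo "interpreter: $(rule_field "$rule" 6)"
echo "flags:       $(rule_field "$rule" 7)"
```

The hex form of `MZ` is what the kernel reports as `magic 4d5a` in the /proc dumps in this issue.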

But the assertion by @esgie doesn't seem to hold. WSL configures the WSLInterop handler itself when it first starts the distribution, and no matter how many times we unmount or re-mount the binfmt_misc fs, the kernel configuration remains constant, so we don't really need to touch it at all.

Steps to show this:

  1. Mask the service, or move the offending configuration directory out of the way:

    $ sudo systemctl mask systemd-binfmt
    $ sudo mv /usr/lib/binfmt.d{,.disabled}
    
  2. Confirm configuration before starting the bottle:

    $ cat /proc/sys/fs/binfmt_misc/WSLInterop
    enabled
    interpreter /tools/init
    flags: F
    offset 0
    magic 4d5a
    vangelis@MAGELLAN2:~$ genie -r
    stopped
    
  3. Confirm the same configuration after starting the bottle:

    vangelis@MAGELLAN2:~$ genie -s
    Waiting for systemd....!
    vangelis@MAGELLAN2:~$ genie -b
    inside
    vangelis@MAGELLAN2:~$ cat /proc/sys/fs/binfmt_misc/WSLInterop
    enabled
    interpreter /tools/init
    flags: F
    offset 0
    magic 4d5a
    
  4. Confirm wsl.exe just works:

    vangelis@MAGELLAN2:~$ /mnt/c/Windows/System32/wsl.exe -d Ubuntu-20.04 hostname
    MAGELLAN2
    

I have also confirmed that leaving the file in place but removing the problematic P flag works too, because systemd-binfmt.service then becomes essentially a no-op.

However, commit ebca3e3 by @cerebrate changed the flags to PF, following discussion in #267:
#267 (comment)

The missing P is the problem. Changing F to PF in WSLInterop.conf would solve this issue, but I can't confirm whether other Windows version have the same situation.

This seems strange, because @NyaMisty was apparently seeing the reverse of what I am seeing: flags PF worked and F failed.

I found this discussion which is relevant:
microsoft/WSL#8162
I understand that at some point, recently, @benhillis modified the binfmt interpreter [/init?] to support the P flag, and require it at registration:
microsoft/WSL#8162 (comment)

So, it could be that @NyaMisty is running a more recent WSL2 version than me, and genie needs to support both configurations.

Given this context, and the fact that asking systemd to configure binfmt.d explicitly seems to be unnecessary, I propose that genie not ship /usr/lib/binfmt.d/WSLInterop.conf at all.

I am looking forward to your feedback, and would be happy to follow up with a PR, if you agree with the above conclusions.

Thanks,
Vangelis.

I confirm that I have read the ENTIRE supplied readme file and checked for relevant information on the repository wiki before raising this issue, and that if the solution to this issue is found in either location, it will be closed without further comment:

  • Yes.
vkoukis added the bug (Something isn't working) label Aug 15, 2022

vkoukis commented Aug 15, 2022

I just realized this is the same problem that @ShadowEO describes here: #287
The conclusion in that issue also seems to be that some WSL2 versions need F and some need PF for the binfmt flags.

I think the following would work:

  1. Mask systemd-binfmt.service, so it doesn't attempt to configure binfmt at all, and more importantly, it doesn't destroy its configuration when the bottle is stopped.
  2. Do not ship /usr/lib/binfmt.d/WSLInterop.conf at all, but re-use the version-specific configuration that WSL2 applies at startup, instead.
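
One way to read point 2, sketched under assumptions: capture whatever WSL2 registered at startup and derive a matching rule from it, instead of hard-coding F or PF. `rule_from_entry` is a hypothetical helper, not part of genie; it assumes a magic-type entry with the fields shown in the dumps above and no mask line.

```shell
# Hypothetical helper (not part of genie): read a binfmt_misc entry on stdin
# and print an equivalent rule line for handler name $1.
# Assumes the fields shown above (interpreter, flags, offset, magic), no mask.
rule_from_entry() {
    awk -v name="$1" '
        $1 == "interpreter" { interp = $2 }
        $1 == "flags:"      { flags = $2 }
        $1 == "offset"      { offset = $2 }
        $1 == "magic"       { magic = $2 }
        END {
            # Turn hex such as "4d5a" into the escaped form "\x4d\x5a".
            esc = ""
            for (i = 1; i <= length(magic); i += 2)
                esc = esc "\\x" substr(magic, i, 2)
            printf ":%s:M:%s:%s::%s:%s\n", name, offset, esc, interp, flags
        }'
}

# Sample entry copied from the pre-bottle dump in this issue.
entry='enabled
interpreter /tools/init
flags: F
offset 0
magic 4d5a'

printf '%s\n' "$entry" | rule_from_entry WSLInterop
```

Because the flags and interpreter are copied from the live entry, the same logic would follow either the F or the PF variant, whichever the installed WSL2 version registered.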

cerebrate (Member) commented

Fixed in 2.5, shipping as soon as a psutil issue is resolved.
