-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ctx.actions.run fails to invoke batch scripts when running via RBE on Windows #11636
Comments
Yeah, Windows requires the path of the binary to run has to be a Windows style path. I believe why it succeeded locally is because we preprocess the arg0 (replacing / with ) in WindowsSubprocessFactory: bazel/src/main/java/com/google/devtools/build/lib/windows/WindowsSubprocessFactory.java Lines 86 to 97 in ecac47d
@EricBurnett I assume the is no such preprocessing in the RBE Windows runner? Can we add this support so that no matter how the binary path is passed, we can always run it on RBE? |
It seems better (to me) to fix this on Bazel's side, than work around Bazel's shortcoming in the remote system. In the worst case, we can replace the path in the RemoteSpawnRunner, as long as we know that it's supposed to be run on Windows. Why does this work in all other cases, but not in this one? |
OK, I did some debugging and found another place we made sure the first argument is a valid path to execute. bazel/src/main/java/com/google/devtools/build/lib/exec/local/LocalSpawnRunner.java Lines 367 to 376 in ddc57b2
@ulfjack, from the blame history, you are the author of this, do you think it makes sense to implement the same logic in RemoteSpawnRunner? Another thing I want to point out is that CreateProcessW can only take a single command string, but we pass a list of arguments to remote runner. So the remote systems do need to pay attention to how they parse the arguments. Ideally in the same way as bazel/src/main/java/com/google/devtools/build/lib/windows/WindowsSubprocessFactory.java Lines 41 to 44 in ecac47d
|
@meteorcloudy normally I'd say "bazel and RBE should both do what the RE-API specifies", but it's not totally clear on this point.
Elsewhere in the API (e.g. output_paths), relative paths are specified with '/' as a path specifier. I looked for 'absolute' but looks like the only place that made it into the spec is symlink targets, and it's (imho) wrong there: requiring absolute paths to start with '/'. I think I've stated elsewhere my preference for absolute paths is that they be specified in os-specific terms - anything else is not expressive enough. (Windows alone has multiple valid absolute path formats). I'd suggest we document that here as well. For relative paths, I'm not certain - sticking with '/' separated paths probably works, in which case it's the server responsibility to translate a "logical" relative path into the appropriate system-specific path for execution. That at least makes some sense, in that a relative path is really intended to be processed by the server, which has to at least transform it into the appropriate path for execution. (Noting our mis-specification that it's relative to the input root, but there may be a working_directory specified, in which case the server has to re-compute it based on the directory it's going to get executed in or make it absolute for execution anyways.) I could also see the argument that it should be a relative path using os-specific separators, but given our use elsewhere in the API of '/' intentionally, I don't think that would do any harm here as well? In which case this becomes a request for spec clarification ( @bergsieker FYI), after which point it would become an RBE bug that we're not rewriting the path we execute from "logical" relative path to something system-specific for execution. |
The executable path may not be the first element of the arguments array, e.g., because of test and coverage wrappers, or a |
(or RBE would have to guess or reverse engineer the entire command line, which seems clearly undesirable) |
Nested arguments should definitely be OS-specific, yes. I'm only thinking about the first one, which is special in the sense that the remote server must process and possibly modify it. But I'd also be fine choosing to specify that it must always be os-specific (in which case all arguments are then os-specific, as you'd probably expect) - I don't think it makes the remote server's job any harder, and may actually make it easier to process said relative path in an os-specific way. |
@ulfjack Do you want to help fixing the RemoteSpawnRunner? |
I can help fix the problem. I'm not sure RemoteSpawnRunner is the right place for the fix, though - we need to find out where the path is coming from. Things are a bit hectic right now over here but should slow down this week. |
@ulfjack Hello :) How are things looking on this issue - would you have time to help with a fix? I marked this P3 for now, meaning "we would happily take and review a fix for this". |
FYI, this is simply not true. Any "quoted" path containing forward slashes should be invoked correctly. Only when paths are unquoted does it become ambiguous whether the path name segments are part of the name or additional /options. Using a quoted batch file pathname upon invocation should resolve this issue, resolves spaces in pathnames, and is portable to all platforms. |
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 2+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team ( |
This issue has been automatically closed due to inactivity. If you're still interested in pursuing this, please reach out to the triage team ( |
I'm running into this same problem. It looks like this was discussed in bazelbuild/remote-apis#187, and the consensus there seems to be that it's the client's problem. Does this mean that this is supposed to be fixed in bazel (in which case this issue should be reopened), or does it mean that we should patch the rules_go package to convert the slashes to backslashes on Windows? |
…37902) This eliminates the need for the `grpc_cc_proto_library` bazel BUILD rule introduced in #37863. To make this work, I had to upgrade several bazel dependencies and apply a patch to rules_go to work around bazelbuild/bazel#11636. Closes #37902 PiperOrigin-RevId: 685868647
See bazelbuild/remote-apis#187 (comment), I think that this needs to be both the client's and the server's problem, but the part about |
My Windows experiments concludes that |
`.bat` scripts can only be executed on Windows when their paths use backslashes, not forward slashes. For this reason, local execution carefully replaces forward slashes with backslashes in the executable of a `Spawn` when executing on Windows. This commit adds equivalent logic for the case of remote execution on Windows from any host: * Remote execution replaces forward slashes with backslashes in the first argument when executing on Windows. * Most calls to `PathFragment#getCallablePathString` are replaced with the new execution platform aware `getCallablePathStringForOs`. This affects test actions, in which various wrappers execute different executables (e.g. `--run_under` targets) and thus can have Bazel-controlled executable path strings in locations other than the `argv[0]`. Fixes bazelbuild#11636 Closes bazelbuild#23986. PiperOrigin-RevId: 689357323 Change-Id: Ifb842babc02c6d741d3b45914a5bf5c032204e2b
…n Windows (#24080) `.bat` scripts can only be executed on Windows when their paths use backslashes, not forward slashes. For this reason, local execution carefully replaces forward slashes with backslashes in the executable of a `Spawn` when executing on Windows. This commit adds equivalent logic for the case of remote execution on Windows from any host: * Remote execution replaces forward slashes with backslashes in the first argument when executing on Windows. * Most calls to `PathFragment#getCallablePathString` are replaced with the new execution platform aware `getCallablePathStringForOs`. This affects test actions, in which various wrappers execute different executables (e.g. `--run_under` targets) and thus can have Bazel-controlled executable path strings in locations other than the `argv[0]`. Fixes #11636 Closes #23986. PiperOrigin-RevId: 689357323 Change-Id: Ifb842babc02c6d741d3b45914a5bf5c032204e2b Commit 9c90100 Co-authored-by: Fabian Meumertzheim <[email protected]>
A fix for this issue has been included in Bazel 8.0.0 RC2. Please test out the release candidate and report any issues as soon as possible. |
…rpc#37902) This eliminates the need for the `grpc_cc_proto_library` bazel BUILD rule introduced in grpc#37863. To make this work, I had to upgrade several bazel dependencies and apply a patch to rules_go to work around bazelbuild/bazel#11636. Closes grpc#37902 PiperOrigin-RevId: 685868647
The existing genrule for copying files does not work for Windows when using Bazel 7.X and using RBE. See bazelbuild/bazel#11636 The copy_files macro uses skylib's copy_file rule.
Description of the problem / feature request:
Executing a batch script with
ctx.actions.run
fails when running via RBE on Windows. Running the exact same build locally succeeds without a hitch. It appears Bazel may be passing an Unix style path rather than a Windows path when executing the rule action in RBE. From the linked minimal repro, we see:which appears that
cmd.exe
is interpretingbazel-out
as the command to run and not the full path. This does not occur when running the rule locally. When we purposefully use a Windows path string for theexecutable
field of the rule action, it makes no difference, we get the same result as when we use aFile
type.This issue could possibly be fixed by performing the appropriate quoting/passing the correct flags to
cmd.exe
or ensuring a Windows style path is always used.Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
See: https://github.com/greenhouse-org/bazel-issue-repro/tree/master/ctx_actions_run_rbe
The README in this repo should have details for reproducing the problem and example failure output.
What operating system are you running Bazel on?
Windows
What's the output of
bazel info release
?release 3.1.0
Any other information, logs, or outputs that you want to share?
We ran into this in the Envoy project as a result of bumping to the latest
rules_go
which switched from runninggo tool
commands via bash shell to the nativectx.actions.run
on Windows.Using the
--experimental_remote_grpc_log
flag to generate the grpc log for a failing RBE build and the remote client tool we are able to see the failed and successful actions and representation of the commands that are run. It appears from the output below that Bazel is sending a command to the RBE service that uses an invalid path for a batch script.I believe the invocation ID from this run was 8d806e2e-c3cb-4a69-becb-05571c76f75c, (but I may be mixing up attempts).
One of the failed actions from the issue repro (digest 89a618e0b38a6c9967729316db828b66f4a5e7d7764c514bcd83e87b9b32c8d5/141):
Another failed action looks very similar (digest 27e66ea85ab4978fb639e57e30206b8e5a598bade43746e55c46250e42c6d58d/141).
The successful action from the issue repro (digest 4306f162f55bdc0549d4d986d63ce2985e7a647b3c92e8f9fdef3b4b9ef903ad/139):
Commenting out the
inputs
field of the//:batch_execute_rule_windows_path
rule, running locally the build fails with:It somewhat makes sense, there is not file input to the rule and the path is converted properly, however the fact that Bazel is trying to run a
.bat
file with CreateProcessW is a bit odd. The exact same error is output when use an unaltered path string instead of aFile
type (without replacing/
with\\
) for theexecutable
.Running the same thing remotely, the build fails with the stack trace and error:
It seems something along the way isn’t converting to the executable path to a Windows path when we’re executing remotely, though the actual working directory seems to be set appropriately for the remote container environment.
The text was updated successfully, but these errors were encountered: