Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extra_pip_args do not seem to be passed to whl_library #2239

Closed
swarren12 opened this issue Sep 20, 2024 · 4 comments · Fixed by #2258
Closed

extra_pip_args do not seem to be passed to whl_library #2239

swarren12 opened this issue Sep 20, 2024 · 4 comments · Fixed by #2258

Comments

@swarren12
Copy link

swarren12 commented Sep 20, 2024

🐞 bug report

Affected Rule

I guess the affected rule is whl_library, but from a "dumb end user" point of view, it's pip.parse.

Is this a regression?

Yes.

Description

Expected behaviour: specifying either experimental_index_url or extra_pip_args = [ "--index-url=..." ] allows fetching from a custom index.
Actual behaviour: some parts of the build appear to resolve against the custom URL, but later parts fail.

I think this is related to #2152, but at least for me using the current HEAD of rules_python does not seem to fix the issue.

I'm currently using these rules in an air-gapped CI environment, so it does not have access to pypi.org. On v0.34.0, the following snippet is working fine:

pip.parse(
    hub_name = "lib_python",
    experimental_index_url = "https://internal.repo/repository/pypi_proxy/simple/",
    extra_pip_args = [
        "--index-url=https://internal.repo/repository/pypi_proxy/simple/",
        "--trusted-host", "internal.repo",
    ],
    python_version = "3.11",
    requirements_lock = "//:requirements_lock.txt",
)

However, after upgrading to 0.35.0, this breaks with the following error:

...
ERROR: .../scipy/BUILD.bazel:10:6 ...//scipy:pkg depends on ...scipy//:pkg in repository ..._scipy which failed to fetch. no such package '..._scipy//': 
rules_python:whl_library(..._scipy) FAIL: repo.execute: whl_library.ResolveRequirement(..._scipy, scipy==1.14.0): end: failure:
  command: .../external/rules_python~~python~python_3_11_host/python -m python.private.pypi.whl_installer.wheel_installer --requirement scipy==1.14.0 --isolated --extra_pip_args "{\"arg\":[]}" --pip_data_exclude "{\"arg\":[]}" --environment "{\"arg\":{}}"
...
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f09e0f38cd0>: Failed to establish a new connection: [Errno 111] Connection refused')': /simple/scipy/
ERROR: Could not find a version that satisfies the requirement scipy==1.14.0 (from versions: none)
ERROR: No matching distribution found for scipy==1.14.0
...
subprocess.CalledProcessError: Command '['.../external/rules_python~~python~python_3_11_host/python', '-m', 'pip', '--isolated', 'wheel', '--no-deps', '-r', '/tmp/tmpy1_py_zu']' returned non-zero exit status 1.

The interesting thing here is that the command line being run appears to have completely omitted the extra_pip_args:

--extra_pip_args "{\"arg\":[]}"

As mentioned above, I've tried running with a git_override() pointing to the current HEAD of main, but the error persists:

bazel_dep(name = "rules_python", version = "0.35.0")
git_override(module_name = "rules_python", remote = "https://github.com/bazelbuild/rules_python", commit = "1d7fd51ceab8d09858eca85cce9a16b867b8a3e2")

does not work, but pointing the override to 0.34.0 does:

bazel_dep(name = "rules_python", version = "0.35.0")
git_override(module_name = "rules_python", remote = "https://github.com/bazelbuild/rules_python", commit = "084b877c98b580839ceab2b071b02fc6768f3de6")

🔬 Minimal Reproduction

I can try and produce a minimal reproduction, but it's difficult to prove without having something blocking access to regular pypi.org.

🔥 Exception or Error

I don't think the stack-trace from Python is very helpful, but I include it just for completeness.


Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File ".../external/rules_python~/python/private/pypi/whl_installer/wheel_installer.py", line 205, in 
    main()
  File ".../external/rules_python~/python/private/pypi/whl_installer/wheel_installer.py", line 190, in main
    subprocess.run(pip_args, check=True, env=env)
  File ".../external/rules_python~~python~python_3_11_x86_64-unknown-linux-gnu/lib/python3.11/subprocess.py", line 571, in run

🌍 Your Environment

Operating System: Fedora Linux 40
Output of bazel version:

Build label: 7.1.2
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Wed May 8 20:49:55 2024 (1715201395)
Build timestamp: 1715201395
Build timestamp as int: 1715201395

Rules_python version: v0.35.0

@aignas
Copy link
Collaborator

aignas commented Sep 21, 2024

When you pass experimental_index_url, then pip is not used at all to fetch the wheels. I realized that this may be a bug in case you are building from sdists. In an air-gaped environment it would be best to build the wheels separately and upload them to a local PyPi mirror. You will not have great results relying on building from source distributions because of issues like this.

I am not sure when I'll be able to fix it myself, so if you would like to contribute, feel free to put the following line under a conditional to only drop the extra_pip_args only for whl distributions:

whl_library_args.pop("extra_pip_args", None)

There are similar if statements nearby.

@swarren12
Copy link
Author

swarren12 commented Sep 21, 2024

When you pass experimental_index_url, then pip is not used at all to fetch the wheels.

Should rules_python not still respect the value passed to the experimental_index_url property, though?

I'll admit that I'm not very familiar with the Python package management systems, but my understanding is that, if I specify experimental_index_url on pip.parse(), then it should fetch the whl distributions from the URL specified?

I can see that the whl does exist in the internal repository:

$ curl -I -https://internal.repo/repository/pypi_proxy/packages/scipy/1.14.0/scipy-1.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
HTTP/1.1 200 OK

and so I'm confused about why it isn't just fetching it?

I'll add that I don't really know what an "sdist" is or why one might choose to use it over a whl. I don't think I'm (deliberately) using sdists, and if I am accidentally, I probably shouldn't be. As far as I'm concerned, I'd be quite happy to fetch the regular whl, as long as it's coming from the internal mirror and not pypi.org. Apologies if linking the other ticket caused some confusion in this area - I don't think my use case is the same as that ticket, it just seemed somewhat similar in symptoms.

@aignas
Copy link
Collaborator

aignas commented Sep 22, 2024

Ah, now I think I get what the problem is - we are fetching some extra dependencies from PyPI which we need for extracting the wheels you can configure the bazel downloader like described in this blog post: https://blog.aspect.build/configuring-bazels-downloader and explicitly forbid any downloads pypi.org and rewrite the URLs so that the tools are downloaded from your airgapped mirrors instead.

I think for sdist builds the issue still exists, so I'll leave the ticet for that.

swarren12 pushed a commit to swarren12/rules_python that referenced this issue Sep 26, 2024
@swarren12
Copy link
Author

swarren12 commented Sep 26, 2024

Thanks for the update @aignas - unfortunately, setting up a rewrite rule in the downloader config didn't seem to help.

I've spent a bit more time debugging this and I think I've found the root cause.

In pypi/extension.bzl, there is a call to parse_requirements() as below:

requirements_by_platform = parse_requirements(
        module_ctx,
        requirements_by_platform = requirements_files_by_platform(
            requirements_by_platform = pip_attr.requirements_by_platform,
            requirements_linux = pip_attr.requirements_linux,
            requirements_lock = pip_attr.requirements_lock,
            requirements_osx = pip_attr.requirements_darwin,
            requirements_windows = pip_attr.requirements_windows,
            extra_pip_args = pip_attr.extra_pip_args,
            python_version = major_minor,
            logger = logger,
        ),
        get_index_urls = get_index_urls,
        evaluate_markers =  ...,
        logger = logger,
)

the pip_attr.extra_pip_args is passed into requirements_files_by_platform(), but my reading of this is that it is only used to determine platform arguments.

The parse_requirements() function itself takes an extra_pip_args parameter, but this is not being specified. I believe adding this fixes the problem.

For a little bit more context: I do not think that the problem was that the --index-url argument was missing as I'd originally assumed, I think this was correctly being passed down via the experimental_index_url argument. The actual problem was that the host had a self-signed certificate that was being rejected without the --trusted-host argument (which was being provided by the extra_pip_args).

github-merge-queue bot pushed a commit that referenced this issue Oct 3, 2024
Before this PR we were just dropping the `extra_pip_args` passed to
`pip.parse` and were just using the args passed through the requirements
file. Thanks to @swarren12 for pointing this out.

This PR also passes `extra_pip_args` to `sdist` `whl_library` instances
so that users can build the `sdists` correctly when using
`experimental_index_url` feature.

Summary:
- pass `extra_pip_args` when building sdists in experimental mode
- join `extra_pip_args` from the file and the pip.parse attr
- test: add a test to ensure that the extra args are joined

Fixes #2239
Closes #2254
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants