Add support for httpx as backend #1085

jakkdl · 2024-02-02T15:34:12Z

First step of #749 as described in #749 (comment)

I was tasked with implementing this, but it's been a bit of a struggle not being very familiar with aiohttp, httpx or aiobotocore - and there being ~zero in-line types. But I think I've fixed enough of the major problems that it's probably useful to share my progress.

There's a bunch of random types added. I can split those off into a separate PR or remove if requested. Likewise for from __future__ import annotations.

TODO:

exceptions

retryable_exceptions: mostly just need to go through all httpx exceptions and decide which ones are fine
- I have done a first pass, but it's quite possible there's more of them as I'm unfamiliar with this, so would be great if somebody else took a look through https://pydoc.dev/httpx/latest/classIndex.html

The mapping between httpx exceptions and aiobotocore exceptions can likely be improved.

aiobotocore/aiobotocore/httpsession.py

Lines 478 to 534 in b19bc09

    
           # **previous exception mapping** 
        
           # aiohttp.ClientSSLError -> SSLError 
        
           # aiohttp.ClientProxyConnectiorError 
        
           # aiohttp.ClientHttpProxyError -> ProxyConnectionError 
        
           # aiohttp.ServerDisconnectedError 
        
           # aiohttp.ClientPayloadError 
        
           # aiohttp.http_exceptions.BadStatusLine -> ConnectionClosedError 
        
           # aiohttp.ServerTimeoutError -> ConnectTimeoutError|ReadTimeoutError 
        
           # aiohttp.ClientConnectorError 
        
           # aiohttp.ClientConnectionError 
        
           # socket.gaierror -> EndpointConnectionError 
        
           # asyncio.TimeoutError -> ReadTimeoutError 
        
           # **possible httpx exception mapping** 
        
           # httpx.CookieConflict 
        
           # httpx.HTTPError 
        
           # * httpx.HTTPStatusError 
        
           # * httpx.RequestError 
        
           #   * httpx.DecodingError 
        
           #   * httpx.TooManyRedirects 
        
           # * httpx.TransportError 
        
           #   * httpx.NetworkError 
        
           #     * httpx.CloseError -> ConnectionClosedError 
        
           #     * httpx.ConnectError -> EndpointConnectionError 
        
           #     * httpx.ReadError 
        
           #     * httpx.WriteError 
        
           #   * httpx.ProtocolError 
        
           #     * httpx.LocalProtocolError -> SSLError?? 
        
           #     * httpx.RemoteProtocolError 
        
           #   * httpx.ProxyError -> ProxyConnectionError 
        
           #   * httpx.TimeoutException 
        
           #     * httpx.ConnectTimeout -> ConnectTimeoutError 
        
           #     * httpx.PoolTimeout 
        
           #     * httpx.ReadTimeout -> ReadTimeoutError 
        
           #     * httpx.WriteTimeout 
        
           #   * httpx.UnsupportedProtocol 
        
           # * httpx.InvalidURL 
        
           except httpx.ConnectError as e: 
        
               raise EndpointConnectionError(endpoint_url=request.url, error=e) 
        
           except (socket.gaierror,) as e: 
        
               raise EndpointConnectionError(endpoint_url=request.url, error=e) 
        
           except asyncio.TimeoutError as e: 
        
               raise ReadTimeoutError(endpoint_url=request.url, error=e) 
        
           except httpx.ReadTimeout as e: 
        
               raise ReadTimeoutError(endpoint_url=request.url, error=e) 
        
           except NotImplementedError: 
        
               raise 
        
           except Exception as e: 
        
               message = 'Exception received when sending urllib3 HTTP request' 
        
               logger.debug(message, exc_info=True) 
        
               raise HTTPClientError(error=e)

proxy support
- postponed to later PR
- this was previously handled per-request, ~~but AFAICT you can only configure proxies per-client in httpx. So need to move the logic for it, and cannot use botocore.httpsession.ProxyConfiguration.proxy_[url,headers]_for(request.url)~~
  - see https://www.encode.io/httpcore/proxies/ for how to handle it per-request
- raising of ProxyConnectionError is very ugly atm, and probably not "correct"?
- ~~BOTO_EXPERIMENTAL__ADD_PROXY_HOST_HEADER~~
  - ~~seems not possible to do when configuring proxies per-client?~~
wrap io.IOBase data in a non-sync-iterable async iterable
- converted to bytes for now.
I have added change info to CHANGES.rst

No longer TODOs after changing the scope to implement httpx alongside aiohttp:

test_patches previously cared about aiohttp. That can probably be retired?
replace aiohttp with httpx in tests.mock_server.AIOServer?
The following connector_args now raise NotImplementedError:
- use_dns_cache: did not find any mentions of dns caches on a quick skim of httpx docs
- force_close: same. Can maybe find out more by digging into docs on what this option does in aiohttp.
- resolver: this is an aiohttp.abc.AbstractResolver which is obviously a no-go.
  - raise error for code passing this
  - figure out equivalent functionality for httpx
url's were previously wrapped with yarl.URL(url, encoding=True). httpx does not support yarl. I don't know what this achieved (maybe the non-normalization??), so skipping it for now.

Some extra tests would probably also be good, but not super critical when we're just implementing httpx alongside aiohttp.

aiobotocore/awsrequest.py

thehesiod · 2024-02-02T23:52:33Z

aiobotocore/httpsession.py

+
+            # previously data was wrapped in _IOBaseWrapper
+            # github.com/aio-libs/aiohttp/issues/1907
+            # I haven't researched whether that's relevant with httpx.


ya silly decision of aiohttp, they took over the stream. Most likely httpx does the right thing. I think to get around the sync/async thing we can just make a stream wrapper that hides the relevant methods...I think I did this somewhere...will try to remember

Would the current tests catch if httpx didn't do the right thing?

jakkdl · 2024-02-06T14:06:50Z

I started wondering whether response.StreamingBody should wrap httpx.Response or one of its iterators (aiter_bytes, aiter_text, aiter_lines or aiter_raw), but am now starting to think that maybe it doesn't make sense to have at all and we should just surface the httpx.Response object to the user and let them handle it as they want.

The way that aiohttp.StreamReader behaves is just different enough that providing a translation layer that handles httpx.Response streams the same way becomes quite clunky/inefficient/tricky/very different. StreamingBody.iter_chunks should be done by specifying chunk size when calling httpx.Response.aiter_bytes, and StreamingBody.iter_lines should use httpx.Response.aiter_lines, but the current API does nothing to stop you from reading one chunk, then one byte, but httpx.Response (very reasonably) only lets you initialize the iterators once.
Implementing iter_chunks/iter_lines/etc as reading one byte at a time with await anext() on an aiter_raw sounds awful, since there's no read() method that can return a set number of bytes. That in general makes StreamingBody.read() quite clunky to implement.

codecov · 2024-02-19T12:39:21Z

Codecov Report

Attention: Patch coverage is 71.42857% with 84 lines in your changes missing coverage. Please review.

Project coverage is 89.93%. Comparing base (1515727) to head (4d4cf32).
Report is 1 commits behind head on master.

Files with missing lines	Patch %	Lines
aiobotocore/httpxsession.py	71.31%	35 Missing ⚠️
tests/test_basic_s3.py	54.76%	19 Missing ⚠️
tests/conftest.py	66.66%	17 Missing ⚠️
aiobotocore/awsrequest.py	68.75%	5 Missing ⚠️
aiobotocore/_endpoint_helpers.py	71.42%	2 Missing ⚠️
aiobotocore/endpoint.py	77.77%	2 Missing ⚠️
tests/test_lambda.py	75.00%	2 Missing ⚠️
aiobotocore/response.py	92.30%	1 Missing ⚠️
tests/python3.8/test_eventstreams.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1085      +/-   ##
==========================================
- Coverage   90.76%   89.93%   -0.83%     
==========================================
  Files          68       69       +1     
  Lines        6542     6819     +277     
==========================================
+ Hits         5938     6133     +195     
- Misses        604      686      +82

Flag	Coverage Δ
unittests	`89.93% <71.42%> (-0.83%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

jakkdl · 2024-02-19T14:35:51Z

Whooooo, all tests are passing!!!!
though I did an ugly with test_non_normalized_key_paths - I understand nothing about that test so I currently made the test pass if httpx returns a normalized path.

current TODOs:

I should add a command-line parameter that sets the http backend to be tested, so I can set up a CI environment without httpx installed to make sure that works.
Retryable exceptions.
- Maybe try to write a test for it
figure out the branches in convert_to_response_dict.
- I think they're fine?
~~figure out proxies, or ~~raise NotImplementedError.
- There is at least one test that sorta checks it so if raising I need to work around it.
Maybe add test for http_session_cls
Add documentation - RTD is broken?

codecov is very sad, but most of that is due to me duplicating code that wasn't covered to start with, or extending tests that aren't run in CI. I'll try to make it very slightly less sad, but making it completely unsad is very much out of scope for this PR.

Likewise RTD is failing ... and I think that's unrelated to the PR?

Add no-httpx run to CI on 3.12 Tests can now run without httpx installed. Exclude `if TYPE_CHECKING` blocks from coverage. various code cleanup

…ive errors on connector args not compatible with httpx. Remove proxy code and raise NotImplementedError. fix/add tests

jakkdl · 2024-02-21T09:23:55Z

@thejcannon @aneeshusa if you wanna do a review pass

jakkdl · 2024-03-20T10:41:02Z

Hey @thehesiod what's the feeling on this? It is turning out to be a messier and more disruptive change than initially thought in #749. I can pull out some of the changes to a separate PR to make this a bit smaller at least

thehesiod · 2024-03-20T16:53:24Z

hey sorry been down with a cold, will look asap. I don't mind big PRs

thehesiod · 2024-03-20T16:56:43Z

tests/test_basic_s3.py

+    if httpx and isinstance(aio_session, httpx.AsyncClient):
+        async with aio_session.stream("GET", presigned_url) as resp:
+            data = await resp.aread()


what do you think of making an adapter class so no test changes are necessary. We can expose the raw stream via a new property if needed

no test changes is not going to be possible - see #1085 (comment). But I could try to make one that implements a bunch of the basic functionality which would minimize test changes.

well, translating between await resp.read() and await resp.aread() with an adapter class is trivial.

But having an adapter class turn calls to resp['Body'].close() into await resp['Body'].aclose...
I guess it's possible in theory to hook into the currently running async framework and run .aclose() from a sync .close() in an adapter class... but that feels like a very bad idea. Especially as we're looking to support anyio and structured concurrency.

I suppose I could write a wrapper class that .. only translates read->aread, and gives specific errors for the other ones? It could maybe help transitioning code currently written, but I think perhaps more appropriate is to if/when dropping aiohttp we make the adapter class raise DeprecationError if calling .read() or .close().

I think the issue is you didn't wrap the body in the StreamingBody, that will solve these issues. For the close I think aiohttp did a bad decision, it should have been an async method. I think we should have a sync close method on the StreamingBody that creates a task around aclose and returns that task. That way if you call it sync it will be fine because eventually it will get closed, and if you want you can await it to wait for it to actually get closed.

I think we should have a sync close method on the StreamingBody that creates a task around aclose and returns that task. That way if you call it sync it will be fine because eventually it will get closed, and if you want you can await it to wait for it to actually get closed.

this will not play well with structured concurrency, and even with asyncio I think it's gonna be hard to guarantee it actually executes. But I think most reasonable people will be using the context-manager form anyway, so it's probably fine to have close return an informative error (with httpx) + add aclose to Streamingbody

thehesiod · 2024-03-20T16:57:52Z

tests/test_basic_s3.py

+    if current_http_backend == 'httpx':
+        assert key == 'key./name'
+    else:
+        assert key == 'key./././name'


hmm, ya this needs to be fixed. We can't have the response changing. This should match whatever botocore does, if current way is incorrect this is fine, otherwise needs to be fixed

The current behaviour is indeed how botocore handles it:
https://github.com/boto/botocore/blob/970d577087d404b0927cc27dc57178e01a3371cd/tests/integration/test_s3.py#L599-L606
so I'll have to do some digging

digging success!... but looks like it needs changes in upstream httpx to get fixed.

When we call self._session.build_request in

aiobotocore/aiobotocore/httpsession.py

Lines 451 to 456 in b19bc09

httpx_request = self._session.build_request(

method=request.method,

url=url,

headers=headers,

content=content,

)

httpx turns the string url into a httpx.URL object, which explicitly normalizes the path https://github.com/encode/httpx/blob/392dbe45f086d0877bd288c5d68abf860653b680/httpx/_urlparse.py#L387

We can manually create a httpx.URL object to be passed into httpx.build_request, but there's no parameter that control normalization, so at that point we'd have to create a subclass of httpx.URL or something to customize the behaviour.

I can open an issue for it in the httpx repo, but I'll first try to figure out why botocore seems to explicitly not want to normalize the path.

Okay so the reason these slashes aren't normalized is because having slashes (and periods) in a key name is allowed
But I think if we can replace the slashes in the key name with %2F somewhere along the chain then I think it'll be handled correctly.

the current percent encoding happens deep within the guts of botocore, where they explicitly mark / as safe for some reason: https://github.com/boto/botocore/blob/970d577087d404b0927cc27dc57178e01a3371cd/botocore/serialize.py#L520
Why? no clue. Can it be marked unsafe? I tried and ran unit tests and some functional tests in botocore/, and some tests did start to fail - though unclear if they're bad errors or just defensive asserts.

This has been requested in httpx since forever: encode/httpx#1805
let's see if me offering to write a PR can give it some traction.

ya main thing is downstream consumers may be doing string matching and could break logic

appreciate the digging!

turns out httpx does have a way to do it now! encode/httpx#1805 (reply in thread)

aiobotocore/_endpoint_helpers.py

aiobotocore/httpsession.py

.github/workflows/ci-cd.yml

aiobotocore/endpoint.py

thehesiod · 2025-02-28T00:02:06Z

thank you for the quick update!

aiobotocore/endpoint.py

thehesiod · 2025-02-28T00:25:34Z

aiobotocore/endpoint.py

+        if httpx and isinstance(http_response.raw, httpx.Response):
+            response_dict['body'] = http_response.raw


should keep the StreamingBody as this is our own version so we can adapt as necessary

I went into detail on this in #1085 (comment)

But if we want to get rid of e.g. aread vs read differences maybe it could make sense to create a HttpxStreamingBody that tries to strike a middle ground

aiobotocore/httpsession.py

thehesiod · 2025-02-28T00:35:06Z

tests/python3.8/boto_tests/unit/test_tokens.py

@@ -286,6 +287,10 @@ async def test_sso_token_provider_refresh(test_case):
    cache_key = "d033e22ae348aeb5660fc2140aec35850c4da997"
    token_cache = {}

+    # deepcopy the test case so the test can be parametrized against the same
+    # test case w/ aiohttp & httpx
+    test_case = deepcopy(test_case)


oh man, parametrize should be returning a frozendict. That's crazy of botocore changing global state in a test, we inherited this code from botocore: https://github.com/boto/botocore/blob/develop/tests/unit/test_tokens.py#L26

yeah it sounds like botocore should get a PR

thehesiod · 2025-02-28T00:36:25Z

tests/test_basic_s3.py

+@pytest.fixture
+def skip_httpx(current_http_backend: str) -> None:
+    if current_http_backend == 'httpx':
+        pytest.skip('proxy support not implemented for httpx')


better put this in conftest.py

why did I ever make it a fixture in the first place?? 😅

oh, because it fails in the setup of test_fail_proxy_request so I can't skip in the test body. It's specifically only used by test_fail_proxy_request so I think it should be in this file, but I'll add a comment

we could insert a bunch of logic into the s3_client fixture or somewhere, but I think the logic for skipping should be next to the test it skips. If it was used for skipping several other tests as well I'd agree though

thehesiod · 2025-02-28T00:38:54Z

tests/test_basic_s3.py

-    # TODO: think about better api and make behavior like in aiohttp
-    resp['Body'].close()
+    if httpx and isinstance(resp['Body'], httpx.Response):
+        data = await resp['Body'].aread()


given body is our own StreamResponse wrapper, i prefer we expose a more sane read method to match botocore, their whole thing of read vs aread is very strange design decision. This will also reduce the diff

tests/test_stubber.py

thehesiod · 2025-02-28T00:45:57Z

ok did a first pass, after StreamingBody fix I can do another pass for exceptions

thehesiod · 2025-02-28T00:46:05Z

awesome work btw!

jakkdl · 2025-03-03T12:29:03Z

I'm not sure why I'm getting spurious errors in test_run_lambda. It doesn't seem to be happening outside of this PR, so ...?

Also the 3.13 fail is.. idk? codecov fail?

jakob-keller · 2025-03-03T12:32:53Z

I'm not sure why I'm getting spurious errors in test_run_lambda. It doesn't seem to be happening outside of this PR, so ...?

Also the 3.13 fail is.. idk? codecov fail?

I notice two things:

Python 3.8 on ubuntu-24.04-arm is known to be flaky. Please ignore that specific case. It is marked as experimental and does not block CI.
Any other failure needs to be resolved. In this case, there is a timeout after 5 minutes. I have not seen that before, i.e. it is probably related to this PR.

jakkdl · 2025-03-03T14:26:25Z

I'm not sure why I'm getting spurious errors in test_run_lambda. It doesn't seem to be happening outside of this PR, so ...?
Also the 3.13 fail is.. idk? codecov fail?

I notice two things:

Python 3.8 on ubuntu-24.04-arm is known to be flaky. Please ignore that specific case. It is marked as experimental and does not block CI.

👍

Any other failure needs to be resolved. In this case, there is a timeout after 5 minutes. I have not seen that before, i.e. it is probably related to this PR.

This PR effectively doubles the number of tests, and the 3.13 in particular triples it with the no-httpx run, so it's not a big surprise that it's hitting timeout limits. Is it fine to just increase the 5-minute limit, or should I make no-httpx a separate CI run or something?

(running full pre-commit in every CI job (taking 1 minute) is also kinda overkill, but that's not for this PR)

jakob-keller · 2025-03-03T14:47:01Z

This PR effectively doubles the number of tests, and the 3.13 in particular triples it with the no-httpx run, so it's not a big surprise that it's hitting timeout limits. Is it fine to just increase the 5-minute limit, or should I make no-httpx a separate CI run or something?

Could we leverage the GitHub Action matrix for that purpose, i.e. add a "no-httpx" dimension or something? That would leverage parallelism and scale better IMO.

jakkdl · 2025-03-03T16:23:11Z

ugh, giving up for today. github actions is dark magic, I'll go play around in a separate branch/fork and figure out wth is going on.

But @thehesiod can still review the code, CI should be all green as soon as I get it to run again :)

jakkdl · 2025-03-04T10:45:02Z

gah, I think I figured it out. Because you use include to set matrix.experimental to set continue-on-error, if I add another include that doesn't specify matrix.experimental then it adds an entry to the matrix where matrix.experimental is unset, which makes continue-on-error undefined, and everything breaks (with no error message).
I think the better way of doing stuff is

continue-on-error: ${{ matrix.python-version == '3.8' && endsWith(matrix.os, 'arm') }}

and likewise with the codecov upload on 3.11; since you're not actually using include to add matrix entries.

But I can work around it for now if you prefer it the way it is

jakkdl · 2025-03-04T11:04:58Z

This PR effectively doubles the number of tests, and the 3.13 in particular triples it with the no-httpx run, so it's not a big surprise that it's hitting timeout limits. Is it fine to just increase the 5-minute limit, or should I make no-httpx a separate CI run or something?

Could we leverage the GitHub Action matrix for that purpose, i.e. add a "no-httpx" dimension or something? That would leverage parallelism and scale better IMO.

A minor problem with this is that you only upload code coverage from your 3.11 run, which means the code for specifically handling httpx not being installed does not get coverage. But this mostly just impacts stuff like this

try:
    import httpx
except ImportError:  # pragma: no cover # tested in a different matrix entry
    http = None

Replace aiohttp with httpx

1eb6888

thehesiod reviewed Feb 2, 2024

View reviewed changes

aiobotocore/awsrequest.py Outdated Show resolved Hide resolved

thehesiod reviewed Feb 2, 2024

View reviewed changes

jakkdl mentioned this pull request Feb 6, 2024

support pluggable http clients/coroutine frameworks (anyio/httpx) #749

Open

jakkdl added 5 commits February 9, 2024 15:03

WIP of full replacement of aiohttp with httpx

d10761c

mostly finished WIP of adding httpx support

3608872

fix various test failures

757819f

fix more parametrization issues

4327b58

fix typo current_http_stack -> current_http_backend

e123adc

jakkdl changed the title ~~Replace aiohttp with httpx~~ Add support for httpx as alternate backend Feb 19, 2024

jakkdl changed the title ~~Add support for httpx as alternate backend~~ Add support for httpx as backend Feb 19, 2024

jakkdl added 2 commits February 20, 2024 11:20

Add pytest flag for specifying backend when running tests.

d029aa4

Add no-httpx run to CI on 3.12 Tests can now run without httpx installed. Exclude `if TYPE_CHECKING` blocks from coverage. various code cleanup

add initial retryable exceptions. _validate_connector_args will now g…

8f6bd69

…ive errors on connector args not compatible with httpx. Remove proxy code and raise NotImplementedError. fix/add tests

jakkdl marked this pull request as ready for review February 20, 2024 12:28

yes

bdc680c

jakkdl requested a review from thehesiod March 1, 2024 10:26

jakkdl and others added 2 commits March 1, 2024 17:23

append coverage when running tests twice

48b7310

Merge branch 'master' into httpx

b19bc09

This was referenced Mar 20, 2024

small fixes to coverage, type hints, documentation, and QoL for Makefile #1101

Merged

expose http_session_cls in AioConfig #1102

Merged

thehesiod reviewed Mar 20, 2024

View reviewed changes

aiobotocore/_endpoint_helpers.py Outdated Show resolved Hide resolved

thehesiod reviewed Mar 20, 2024

View reviewed changes

aiobotocore/httpsession.py Outdated Show resolved Hide resolved

jakob-keller reviewed Feb 27, 2025

View reviewed changes

.github/workflows/ci-cd.yml Outdated Show resolved Hide resolved

jakob-keller reviewed Feb 27, 2025

View reviewed changes

aiobotocore/endpoint.py Outdated Show resolved Hide resolved

jakob-keller reviewed Feb 27, 2025

View reviewed changes

aiobotocore/endpoint.py Outdated Show resolved Hide resolved