
requests.get really slow when stream=True #2015

Closed
cournape opened this issue Apr 23, 2014 · 2 comments

Comments

@cournape

I noticed that using stream=True is really slow in some cases. Code that shows the issue:

import requests


url = "https://api.enthought.com/eggs/rh5-64/numpy-1.8.0-1.egg"
target = "numpy-1.8.0-1.egg"

use_streaming = True

if use_streaming:
    resp = requests.get(url, stream=True)
else:
    resp = requests.get(url)

resp.raise_for_status()

with open(target, "wb") as fp:  # avoid shadowing the `target` filename
    if use_streaming:
        for chunk in resp.iter_content():
            fp.write(chunk)
    else:
        fp.write(resp.content)

With use_streaming=True, it takes around 40 sec, but only 2 sec when it is False. Running the script under strace shows that the chunk size appears to be 1 byte:

...
recvfrom(4, "\345", 1, 0, NULL, NULL)   = 1 # I see many lines like this

It looks like the chunk size is ridiculously small for some reason?

I am using requests 2.2.1 with Python 2.7 on Debian.

@cournape
Author

Hm, if I had read the documentation correctly, I would have seen that the default chunk size is 1 byte... so never mind.

Is there a rationale for such a small size?

@Lukasa
Member

Lukasa commented Apr 23, 2014

As you've spotted, iter_content()'s default chunk size is 1, ensuring that it returns as rapidly as possible. This favours responsiveness over throughput (because we endure a large number of syscalls).

Whether this is a good idea or not is unclear. We've got a giant issue that covers this (see #844), and that issue has not been decided emphatically. I'm inclined to increase the size, but wary about the risk of breaking things.

Note that the bug here is not to do with streaming: if you use r.content in either mode it'll be fast. It's simply the use of iter_content at its default chunk size that causes problems.
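The cost described above can be illustrated without any network at all: each 1-byte chunk triggers a separate read call, mirroring the thousands of recvfrom(…, 1, …) lines in the strace output. The CountingStream helper and the byte counts below are purely illustrative, not part of requests.

```python
import io


class CountingStream(io.BytesIO):
    """A BytesIO that counts how many read() calls are made on it."""

    def __init__(self, data):
        super().__init__(data)
        self.reads = 0

    def read(self, size=-1):
        self.reads += 1
        return super().read(size)


def drain(stream, chunk_size):
    """Read the stream to exhaustion in fixed-size chunks, like iter_content()."""
    while stream.read(chunk_size):
        pass
    return stream.reads


payload = b"x" * 65536  # 64 KiB of data

one_byte = drain(CountingStream(payload), 1)       # 65537 read calls
big_chunks = drain(CountingStream(payload), 8192)  # 9 read calls

print(one_byte, big_chunks)
```

In practice the workaround is simply to pass a larger value explicitly, e.g. resp.iter_content(chunk_size=8192), which collapses tens of thousands of tiny reads into a handful of large ones.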

Anyway, the central issue is in #844, so I'll close this to centralise there.

@Lukasa Lukasa closed this as completed Apr 23, 2014
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2021