Back off function for when rate limit is hit #34
Conversation
class BackoffTestClass:
    def __init__(self):
        self.backoff_enabled = True

    @backoff
    def always_fails(self):
        return mock_ratelimited_response

    @backoff
    def returns_non_ratelimit_status_code(self):
        return mock_ok_response

    @backoff
    def succeeds_after_x_calls(self):
        nonlocal call_count
        call_count += 1
        if call_count < 3:
            return mock_ratelimited_response
        else:
            return mock_ok_response

    @backoff
    def function_not_decorated_when_disabled(self):
        return mock_ratelimited_response
In two minds whether or not to move this class BackoffTestClass
outside of the function. It would make the test itself cleaner, but this class should only be used by this test, and having it outside could imply it is meant to be reused by other tests. Keen to hear what people think about this, open to suggestions.
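For illustration only (not something in this PR), one middle ground could be a module-level helper that is underscore-prefixed and documented as private to these tests, keeping the test body clean without advertising the class for reuse:

class _BackoffTestClass:
    """Helper used only by the backoff tests; not intended for reuse elsewhere."""

    def __init__(self):
        # Mirrors the attribute the decorator expects on the wrapped class.
        self.backoff_enabled = True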
def wrapper(self, *args, **kwargs):

    MAX_ATTEMPTS = 100

    if not hasattr(self, "backoff_enabled"):
        raise AttributeError(
            "backoff decorator needs class with the decorated method "
            "to have backoff_enabled attribute."
        )

    if not self.backoff_enabled:
        return f(self, *args, **kwargs)
I chose to go down this route after considering the other options, as I think it is the best choice. The pros of this approach are that it makes use of the @ syntactic sugar, it looks relatively clean, and it is the least impactful change in the sense that the least amount of existing code changes. The main con is the use of hasattr(self, "backoff_enabled") and, more generally, self in the decorator. It makes the decorator only useful for a method on a class, and specifically a class with the backoff_enabled attribute. Usually decorators shouldn't be this specific or have the implementation details of the function they decorate 'bleed' into them, which I disliked about this choice.
One alternative was to drop the decorator's syntactic sugar and wrap the method 'manually' in the Fetcher constructor, something like:
class Fetcher:
    def __init__(self, backoff_enabled):
        self.backoff_enabled = backoff_enabled
        self.request = backoff(self.backoff_enabled)(self.request)
        ...
but this means I would have to force the user to pass backoff_enabled into the Client constructor so that it could be passed on to the Fetcher constructor when it is built, which I thought would change a lot of code for no great reason.
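To illustrate the plumbing that puts me off this option, it would look roughly like the sketch below. Class and parameter names are simplified and only illustrative (the real Client constructor differs), and backoff would have to become a factory that takes the flag rather than a plain decorator:

def backoff(enabled):
    # In this alternative, backoff is a factory taking the flag as an argument.
    def decorate(func):
        def wrapper(*args, **kwargs):
            # Retry logic would live here when enabled.
            return func(*args, **kwargs)
        return wrapper
    return decorate


class Fetcher:
    def __init__(self, backoff_enabled=True):
        self.backoff_enabled = backoff_enabled
        self.request = backoff(self.backoff_enabled)(self.request)

    def request(self, method, url, data=None, headers=None, params=None):
        ...


class Client:
    def __init__(self, user_agent, backoff_enabled=True):
        # The user is forced to pass the flag here purely so it can be
        # handed on to the Fetcher when it is built.
        self._fetcher = Fetcher(backoff_enabled=backoff_enabled)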
I was also toying with the idea of creating a context manager to wrap the request() call, something like:
def request(self, method, url, data, headers, params=None):
    with backoff(self.backoff_enabled) as backoff:
        response = request(method=method, url=url, data=data, headers=headers, params=params)
        return response
but really this wouldn't work well either; I think I would need to pass the request method into the context manager for it to be able to loop until it no longer hit the rate limit.
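For completeness, a hypothetical sketch (not in this PR) of what such a context manager could look like: because it has to receive the request callable in order to retry it, it really ends up being a decorator in disguise.

from contextlib import contextmanager
from random import uniform
from time import sleep


@contextmanager
def backoff_context(request_callable, enabled, max_attempts=100):
    # Hypothetical: yields a callable that retries on rate-limit responses.
    def call(*args, **kwargs):
        if not enabled:
            return request_callable(*args, **kwargs)
        for attempt in range(max_attempts):
            response = request_callable(*args, **kwargs)
            if response.status_code != 429:  # 429 assumed to mean rate limited
                return response
            sleep(uniform(0, 2 ** attempt))
        return response

    yield call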
And lastly, I could just move the decorator code into the Fetcher's request() and do away with the decorator entirely, but again, the decorator seemed a little cleaner.
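Putting the excerpts from this diff together, the decorator approach roughly looks like the sketch below. This is a reconstruction for illustration, not the PR's exact code; in particular the 429 status check and the sleep call are assumptions based on the discussion.

from functools import wraps
from random import uniform
from time import sleep


def jitter(delay: int) -> float:
    # "Full jitter": pick a random duration between 0 and the computed delay.
    return uniform(0, delay)


def get_backoff_duration(exponent: int) -> float:
    # Exponential growth: 1, 2, 4, 8, ... seconds.
    return 2 ** exponent


def backoff(f):
    @wraps(f)
    def wrapper(self, *args, **kwargs):
        MAX_ATTEMPTS = 100

        if not hasattr(self, "backoff_enabled"):
            raise AttributeError(
                "backoff decorator needs class with the decorated method "
                "to have backoff_enabled attribute."
            )

        # When disabled, behave exactly like the undecorated method.
        if not self.backoff_enabled:
            return f(self, *args, **kwargs)

        for attempt in range(MAX_ATTEMPTS):
            response = f(self, *args, **kwargs)
            # 429 is assumed here as the rate-limit status; the real check may differ.
            if response.status_code != 429:
                return response
            sleep(jitter(get_backoff_duration(attempt)))
        return response

    return wrapper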
def jitter(delay: int) -> float:
    return uniform(0, delay)
The jitter idea is taken from this blog post by AWS. There are a few different implementations we could try, but in testing this way worked well.
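For reference, a rough sketch (not code from this PR) of two of the variants described in that post, with illustrative base and cap constants:

from random import uniform

BASE = 1   # illustrative constants, not values from this PR
CAP = 64


def full_jitter(attempt: int) -> float:
    # The variant this PR uses (without the cap): sleep anywhere between
    # 0 and the full exponential delay.
    return uniform(0, min(CAP, BASE * 2 ** attempt))


def equal_jitter(attempt: int) -> float:
    # Another variant from the same post: half the delay is fixed, half random.
    delay = min(CAP, BASE * 2 ** attempt)
    return delay / 2 + uniform(0, delay / 2)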
First of all, and what I forgot to mention in my first comment above: thanks a lot for this PR Alistair! This is amazing! Great job!
I just flew through these ideas and didn't get everything, but a very interesting post!
Because reviewing all this code seemed slightly over my head, I decided to do some kind of smoke test/performance test. I configured DiscoDOS to use this branch and compared backoff with my own internal rate limit handling (called simply rate_limit_slow_downer). I imported my whole collection (all releases and all tracks) into the local sqlite database. These are the stats:
backoff:
Processed releases: 1885. Imported releases to DiscoBASE: 1885.
Database errors (release import): 0.
Processed tracks: 0. Imported tracks to DiscoBASE: 6790.
Database errors (track import): 0. Discogs errors (track import): 0.
Discogs import took 0 hours 52 minutes 17 seconds
rate_limit_slow_downer:
Processed releases: 1885. Imported releases to DiscoBASE: 1885.
Database errors (release import): 0.
Processed tracks: 0. Imported tracks to DiscoBASE: 6790.
Database errors (track import): 0. Discogs errors (track import): 0.
Discogs import took 0 hours 42 minutes 14 seconds
Backoff ran through without problems, constantly staying on the edge of 1 or 2 requests remaining: waiting, doing a request, waiting, and so on.
My version is pretty simple and works like this: when 15 remaining requests are reached, it just waits 3 seconds and then does the next request. So it turned out that it was 10 minutes faster than backoff. I am not sure why, but I would like to experiment with it. I am guessing it would be better if backoff would "leave some space" between 0 requests left and trying again immediately. Is this somehow possible?
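One hypothetical way to "leave some space" (not something this PR does) would be a lower bound on the sleep, or consulting the remaining-request count before firing the next request, e.g.:

from random import uniform
from time import sleep

MIN_DELAY = 1.0           # hypothetical floor so a retry never fires immediately
REMAINING_THRESHOLD = 15  # hypothetical: start slowing down before the budget hits 0


def throttled_sleep(attempt: int, remaining: int) -> None:
    # Sleep with full jitter but never less than MIN_DELAY, and only once the
    # remaining-request budget reported by the API drops below a threshold.
    if remaining > REMAINING_THRESHOLD:
        return
    sleep(max(MIN_DELAY, uniform(0, 2 ** attempt)))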
Also, as a next step, it would be nice to have some of these settings be good defaults but let the user tune them. That is just a side note here, though, and should rather be a separate tuning PR. Let's focus on the stability of this feature first!
Another feature idea (for later as well) could be logging, reporting somehow what backoff is doing. Currently I am just asking the _fetchers.rate_limit_remaining….. headers what the current remaining requests are, to watch what's going on.
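A hypothetical sketch of what such reporting could look like using the standard logging module (logger and function names are illustrative, not part of this PR):

import logging

logger = logging.getLogger("discogs_client.backoff")  # logger name is illustrative


def log_backoff(attempt: int, max_attempts: int, duration: float, remaining) -> None:
    # Could be called from the retry loop each time backoff decides to wait.
    logger.info(
        "Rate limited (attempt %d/%d): sleeping %.1fs, %s requests remaining",
        attempt + 1, max_attempts, duration, remaining,
    )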
> First of all, and what I forgot to mention in my first comment above: thanks a lot for this PR Alistair! This is amazing! Great job!
> I just flew through these ideas and didn't get everything, but a very interesting post!
Haha thanks @JOJ0 😄
> Because reviewing all this code seemed slightly over my head, I decided to do some kind of smoke test/performance test. I configured DiscoDOS to use this branch and compared backoff with my own internal rate limit handling (called simply rate_limit_slow_downer). I imported my whole collection (all releases and all tracks) into the local sqlite database. These are the stats:
Real testing over longer periods is a good idea actually, and that is a very good benchmark! I did a shorter experiment but not much more, so I'll do some further testing in light of this...
My first guess on the difference in time would be that this PR's version is slower because the total time spent sleeping is higher: you're doing a consistent 3 seconds, whereas this one grows exponentially and chooses a random time, but that is only a guess! It would be interesting to display the total time 'slept' for each of the two. I think the main takeaway from the 'full exponential jitter' approach was that it is kindest to the API we're calling: with a constant sleep duration (i.e. without jitter) there is a risk of clients grouping up and all calling at the same time.
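As a hypothetical way to get that number (not part of this PR), all waits could be routed through a small wrapper that accumulates the total time slept:

import time

total_slept = 0.0  # simple module-level accumulator, purely for instrumentation


def tracked_sleep(duration: float) -> None:
    # Drop-in stand-in for time.sleep that records the total time spent waiting,
    # so the two strategies' overall sleep budgets can be compared after a run.
    global total_slept
    total_slept += duration
    time.sleep(duration)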
> I am guessing it would be better if backoff would "leave some space" between 0 requests left and trying again immediately. Is this somehow possible?

This is also an interesting thought, going to do some testing!
def get_backoff_duration(exponent: int) -> float:
    sleep_duration = 2 ** exponent
Setting the constant 2 to something higher is worth experimenting with here. I will give it a try.
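A minimal sketch of making that constant tunable (illustrative only; the real function may do more than shown in this excerpt):

def get_backoff_duration(exponent: int, base: float = 2.0) -> float:
    # A larger base makes the wait grow faster (base=3 gives 1, 3, 9, 27, ...),
    # which is the experiment suggested in the comment above.
    return base ** exponent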
Is there a way I can test these? My collection of over 3800 records, which in my current code with a few 1-second sleeps takes about 7 hours, could serve as a benchmark.
Same options as described in #33, just adjust the branch name:
@JOJ0 pip3 install -e git+https://github.com/alifhughes/discogs_client@backoff_on-ratelimit#egg=python3-discogs-client
Obtaining python3-discogs-client from git+https://github.com/alifhughes/discogs_client@backoff_on-ratelimit#egg=python3-discogs-client
Cloning https://github.com/alifhughes/discogs_client (to revision backoff_on-ratelimit) to ./src/python3-discogs-client
Did not find branch or tag 'backoff_on-ratelimit', assuming revision or ref.
error: pathspec 'backoff_on-ratelimit' did not match any file(s) known to git
Command "git checkout -q backoff_on-ratelimit" failed with error code 1 in /home/dex/pythoncode/Vault2022/src/python3-discogs-client |
Sorry, typo in the branch name. Dash vs underscore ;-)
Actually, I think this is working well enough and, most of all, seems to be stable! Let's get it in and do performance optimization/tuneability another time? What do you think? @alifhughes @hellricer
@alifhughes just the one wording thing in the comment could be improved with the sentence you proposed.
I think that is fair. I'd definitely like to experiment with making this more efficient, but for the sake of getting it out, as it is a highly desired feature, I'd be happy to merge as-is and improve on it later with some more time. If @hellricer is happy with this too, then let's merge! PS: did the doc update @JOJ0, good spot 🙂
What it does:
- jitter function that spreads out sleep duration
- backoff handling added to the Fetcher base class