Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

box_auth_service() randomly returns a 400 (Failed to connect to box.com API) message #166

Open
jrosen48 opened this issue Aug 7, 2020 · 23 comments
Labels

Comments

@jrosen48
Copy link

jrosen48 commented Aug 7, 2020

Thanks once again for a great package. I noticed that box_auth_service() sometimes returns a 400 (Failed to connect to box.com API) message. The frustrating part is that it appears to be mostly random when it does.

For example, below I run the function 10 times, and 4 of those 10 times were associated with 400 messages, three which were retried which succeeded, and one which did not.

I recognize that this appears to be on Box's - not {boxr}'s - side, but do you have any suggestions or ideas about how to fix or understand what is happening here?

> for (i in seq(10)) {
+     try(box_auth_service(token_text = unlist(read_lines('boxr-auth/token.json'))))
+     Sys.sleep(1)
+ }
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Request failed [400]. Retrying in 1 seconds...
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Request failed [400]. Retrying in 1 seconds...
Request failed [400]. Retrying in 2 seconds...
Request failed [400]. Retrying in 1 seconds...
Request failed [400]. Retrying in 3.6 seconds...
Error in test_request() : 
  Bad Request (HTTP 400). Failed to connect to box.com API.
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Request failed [400]. Retrying in 1 seconds...
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Request failed [400]. Retrying in 1.8 seconds...
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Request failed [400]. Retrying in 1.4 seconds...
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
@ijlyttle
Copy link
Member

ijlyttle commented Aug 7, 2020

I don't know, I have seen this happen more-and-more recently myself...

We were very fortunate for #148 so that we can be more resilient, thanks @jameslamb!

@nathancday - do you have any insights here?

@nathancday
Copy link
Member

nathancday commented Aug 7, 2020

I've always noticed similar behavior with other API auth apps, gmailr being the most recent and even boxr before service account support.

I'm not sure what causes it, my gut feel is that it happens more often in R-sessions with multiple auth attempts. So my solve is to restart the R session and retry the auth, which seems to work for me most of the time.

Beyond those feels, I think the automatic retry with backoff behavior added by Chicago R Collab is a reasonable attempt that balances end-user utility with API respect. I'm open to suggestions or demonstrations in other packages.

@jameslamb
Copy link
Contributor

In #148 I kept all the defaults for httr::RETRY()...you might want to try increasing the time that it waits between retries. I wonder if the "when I restart R it seems to work" behavior is just because restarting R can take 5-10 seconds?

I know this is the hardest type of bug to address because it's so hard to reproduce 😬

@jrosen48
Copy link
Author

Thanks so much, all, for engaging with this. Wondering what @ijlyttle and @nathancday you think of the idea of increasing the time that the function waits between retries. I wonder if that could reduce the frequency of errors from roughly 1/10 to something more like 1/20 ... or more. That would make it a lot more feasible to reliably connect via shinyapps.io (what I've been working on doing... per #165). Open to other ideas to address that goal and others here, of course, and happy to try things out and report back what works. Thank you again for creating a fantastic package that made it possible for me to d what I was trying to - and for talking about this.

@ijlyttle
Copy link
Member

One way we could go is to implement an option() by modifying our RETRY() calls along these lines:

httr::RETRY(
  <other args>
  times = getOption("boxr.retry.times", default = 5)
)

If you wanted to modify the number of times, you could:

  • set the option (globally) in your server.R file

    options(boxr.retry.times = 10)
  • scope the option change to a particular function:

    withr::local_options(list(boxr.retry.times = 10))
  • scope it to particular code:

    withr::with_options(
      list(boxr.retry.times = 10),
      box_auth()
    )

@nathancday
Copy link
Member

We have a testable hypothesis: does increasing the back-off reduce the number of 400 errors for authentication?

I'm not in favor of retrying more times, the http protocol says 400 error-ed requests should not be retried by a client without modification. I think we should view this as a back-off only problem, so we remain good API citizens.

@ijlyttle
Copy link
Member

ijlyttle commented Aug 22, 2020

I agree - we want to remain good API citizens, and should with our defaults. At some point, I might like to supply the options, though.

A couple of questions that are bugging me:

  • is it only box_auth_service() where we are seeing this?
  • how on earth does a "bad request" become a "good request" if we are not (deliberately) doing anything different? This is what a colleague calls a "sincerity test" - i.e., we were not sincere enough on the first request...

@jrosen48
Copy link
Author

Hi, all, just thinking about this and wondering about next steps, first and foremost:

Happy to help however I can. My concern - zero percent yours - I am hoping to go "live" with this to collect data via a Shiny app sometime in November, so am for that reason especially motivated to help however I might!

@ijlyttle
Copy link
Member

Hi @jrosen48

For me, this happens intermittently, almost like the occasional "bad hair day" - is it the same for you? Or is it more constant?

Of course, intermittent is still unacceptable if there is a fix at hand.

@jrosen48
Copy link
Author

I think it is approximately 1 times in 10 based on the logs and from using and sharing it with colleagues. About how often I have a bad hair day (okay, maybe a bit less often). It is available from here: https://faast.shinyapps.io/generality-shiny/

I think the issue on my end is that the entire app fails if box has an authentication error. I'm uncertain as to whether that or collecting no data is worse, but, this does make errors very apparent. A very practical work-around is to simply ask people to re-load when it fails.

@ijlyttle
Copy link
Member

I did a bit of googling and came across this.

As a short-term response, we could modify the code so that a 400 error on JWT auth (seems to be the only place I see it, too) will also print the err.response.body. This might suggest a way forward.

@nathancday
Copy link
Member

nathancday commented Sep 30, 2020

I did two little experiments on this yesterday. The goal was to understand how frequently this occurs and if the delay between auth attempts has an effect on frequency. There are two levels 10 seconds and 100 seconds between auth attempts.

box_auth_service_test

Two interesting things popped out to me:

  • Longer intervals b/w auth attemps have an less failures.
  • If an auth attempt will be ultimately successful, there is never more than one retry attempt.

https://gist.github.com/nathancday/e7a038d70c93169afead1d131ce752cc

@ijlyttle
Copy link
Member

ijlyttle commented Oct 10, 2020

I have a first theory as to what is going on - I started getting the failures for a while today. The failures were to get the token (in addition to the test request).

When we make the request, we assert that our claim is valid for a limited time; see the exp parameter here, and our code.

  • we set the expiry 45 seconds into the future
  • you cannot set it for more than 60 seconds into the future

If your expiry claim is invalid, you get a 400 error - I was able to (maybe) reproduce the error by setting the expiry to an invalid value (1 hour in the future). I say maybe because I don't know if this is the problem.

However, if the local computer's clock is sufficiently different from Box' clock, I can see where you might get a problem, and I can see how that could be intermittent.

If that is the problem, is there a way to ask Box what time it is?

@ijlyttle
Copy link
Member

One of the frustrating things about this is that it is not suited to a reprex.

That said, I am having the problem again - I looked at the timestamps in the requests and responses, and compared my clock to the world time API - all good (which is frustrating).

Then, I changed the exp parameter from 55 to 5, and it went from not working to working.

Thinking to write up a small PR that will try different expiry times...

@jrosen48
Copy link
Author

Woohoo!

> box_auth_service(token_text = unlist(read_lines('boxr-auth/token.json')))
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
> 
> for (i in seq(10)) {
+     try(boxr::box_auth_service(token_text = unlist(readLines('boxr-auth/token.json'))))
+     Sys.sleep(1)
+ }
Failed JWT request: validity claim was 30 seconds.
Retrying JWT request with validity claim of 15 seconds.
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Failed JWT request: validity claim was 30 seconds.
Retrying JWT request with validity claim of 15 seconds.
Failed JWT request: validity claim was 15 seconds.
Retrying JWT request with validity claim of 45 seconds.
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Failed JWT request: validity claim was 30 seconds.
Retrying JWT request with validity claim of 15 seconds.
Failed JWT request: validity claim was 15 seconds.
Retrying JWT request with validity claim of 45 seconds.
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Failed JWT request: validity claim was 30 seconds.
Retrying JWT request with validity claim of 15 seconds.
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)
Failed JWT request: validity claim was 30 seconds.
Retrying JWT request with validity claim of 15 seconds.
boxr: Authenticated using OAuth2 (JWT) as faast ([email protected], id: 13460778359)

Thank you @ijlyttle and @nathancday for your incredible persistence in tracking down this very unusual (to me) kind of issue.

@ijlyttle
Copy link
Member

@jrosen48 Thank you for bringing this to the surface and for your continued patience and willingness to test solutions (both working and non-working)!

@nathancday
Copy link
Member

Seems to solve on my testing too, nice work Ian!

Am I reading your code correctly, each level of dt gets up to 4 retries? If that assumption is valid, we are making a lot more API requests:
Rplot02

@ijlyttle
Copy link
Member

ijlyttle commented Oct 11, 2020

Thanks!

Each level of dt gets only one try; the RETRY is short-circuited for 400 using terminate_on = box_terminal_http_codes().

@nathancday
Copy link
Member

Well then that's just perfect!

@ijlyttle
Copy link
Member

fixed in #193

@ijlyttle
Copy link
Member

Now I'm getting this again:

Failed JWT request: time offset was 0 seconds.
Retrying JWT request: time offset now -15 seconds.
Failed JWT request: time offset was -15 seconds.
Retrying JWT request: time offset now 15 seconds.
Failed JWT request: time offset was 15 seconds.
Retrying JWT request: time offset now -30 seconds.
Failed JWT request: time offset was -30 seconds.
Retrying JWT request: time offset now 30 seconds.
Failed JWT request: time offset was 30 seconds.
Error in test_request() : 
  Bad Request (HTTP 400). Failed to connect to box.com API.

In an email, @nathancday thinks that this may have to do with frequent auth attempts - why not check it out?

To implement a back-off, one could initialize a sleep time, then double it and add some noise should we need to repeat the loop.

@ijlyttle ijlyttle reopened this Oct 17, 2020
@jrosen48
Copy link
Author

I also got this:

Failed JWT request: validity claim was 30 seconds.
Retrying JWT request with validity claim of 15 seconds.
Failed JWT request: validity claim was 15 seconds.
Retrying JWT request with validity claim of 45 seconds.
Failed JWT request: validity claim was 45 seconds.
Retrying JWT request with validity claim of 0 seconds.
Failed JWT request: validity claim was 0 seconds.
Retrying JWT request with validity claim of 60 seconds.
Failed JWT request: validity claim was 60 seconds.
Error in value[[3L]](cond) : 
  Bad Request (HTTP 400). Failed to connect to box.com API.
Calls: local ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted

@ijlyttle
Copy link
Member

This is puzzling and frustrating

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants