
[Bug] 0.27.1 introduced a bug when two queries are sent one after another #1210

Closed
MaximillionMai opened this issue May 17, 2020 · 34 comments
Labels: bug (Generally incorrect behavior), networking-stack

@MaximillionMai

MaximillionMai commented May 17, 2020

For this function in ApolloClient:

```swift
@discardableResult public func fetch<Query: GraphQLQuery>(query: Query,
                                                          cachePolicy: CachePolicy = .returnCacheDataElseFetch,
                                                          context: UnsafeMutableRawPointer? = nil,
                                                          queue: DispatchQueue = DispatchQueue.main,
                                                          resultHandler: GraphQLResultHandler<Query.Data>? = nil) -> Cancellable
```

[Bug] 0.27.1 introduced a bug when calling the function above one call after another with different queries: one of the two queries never calls its resultHandler callback. Sometimes the one that does call the callback comes back with this error: dataForRequestNotFound.

This bug does not reproduce in 0.24.1. I will check whether it reproduces in 0.25.1 and report back.
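
For reference, a minimal sketch of the call pattern being described; the query types and endpoint URL here are placeholders, not real generated code:

```swift
import Apollo
import Foundation

// Hypothetical codegen'd query types and a placeholder endpoint.
let client = ApolloClient(url: URL(string: "https://example.com/graphql")!)

// Two fetches issued back to back with different queries.
client.fetch(query: HeroQuery()) { result in
  print("hero: \(result)")        // on 0.27.1, one of these handlers sometimes never fires
}
client.fetch(query: EpisodeListQuery()) { result in
  print("episodes: \(result)")    // or it comes back with .dataForRequestNotFound
}
```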

@designatednerd
Contributor

@MaximillionMai Can you confirm which version this breaks on for you? That will help me do a comparison between that version and the previous version to figure out what might have changed here.

@yuriiik

yuriiik commented May 19, 2020

Hi @designatednerd! We're experiencing the same issue. Testing shows that it reproduces in 0.27.1 and does NOT reproduce in 0.24.1 and 0.25.1.

@sfoulston
Contributor

I have just noticed this too, in v0.27.1. In my case it's when calling watch(...) simultaneously for two different queries. I receive a dataForRequestNotFound error occasionally for one of the queries.

@sneakyness

We've been seeing this occasionally too; we haven't had time to look into it in more depth because we were assuming it was the dogshit wifi we have currently. It's just a series of normal fetches in our case.

@designatednerd
Contributor

Looks like the problem is here - the data object that's been dequeued for the task which has just finished is nil, which is weird, since we create an empty data object for each task before we start it.

This was introduced in 0.27.0 so anything prior to that will not be affected.

Thought: Are any of you calling cancel directly on the Cancellable returned from URLSessionClient?
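
To make the moving parts concrete, here's a simplified sketch of the bookkeeping pattern being described. This is not the actual URLSessionClient source, just the shape of it, with names assumed:

```swift
import Foundation

// Illustrative only: per-task response data keyed by the task's Int identifier.
final class TaskDataStore {
  private var datas: [Int: Data] = [:]
  private let lock = NSLock()

  // An empty Data is stored for every task before it starts.
  func register(_ task: URLSessionTask) {
    lock.lock(); defer { lock.unlock() }
    datas[task.taskIdentifier] = Data()
  }

  // Bytes are appended as the URLSession delegate delivers them.
  func append(_ chunk: Data, for task: URLSessionTask) {
    lock.lock(); defer { lock.unlock() }
    datas[task.taskIdentifier]?.append(chunk)
  }

  // On completion the data is dequeued; getting nil back here is what
  // surfaces as dataForRequestNotFound.
  func dequeueData(for task: URLSessionTask) -> Data? {
    lock.lock(); defer { lock.unlock() }
    return datas.removeValue(forKey: task.taskIdentifier)
  }
}
```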

@sneakyness

Nope! The only place I reliably see it is when I'm calling a series of fetch requests one after another. At one point I kick off a request in parallel for each of the objects received, but the request that fails isn't always one of those.

@designatednerd
Contributor

@sneakyness Got a rough guess on how many requests at once you've got going before you see this issue?

@designatednerd
Contributor

Also, that error has an associated value that should have the originalRequest from the task that we couldn't pull data for - are you able to tell if there's any pattern to what's failing vs what's working?

@sneakyness

sneakyness commented May 19, 2020

2-3 requests completed serially (each one must finish before the next can start), then 1+ fetches for one type while also making a single request of another. The 1+ is commonly 4-10+; making more does not seem to change how sporadic it is. I'll start logging that now, but in my experience it's always one of the last two.

@designatednerd
Contributor

Thanks. Woof, sporadic bugs are my faaaavorite to troubleshoot 😆

@sneakyness

Unfortunately I'm in the middle of an API model refactor; if I'm lucky, I should be able to automate firing this process over and over to try to reproduce it by the end of the week.

@designatednerd
Contributor

Just pushed up a branch where I added a test that checks if 300 identical URL requests hitting the client at the same time (from different threads) create different IDs, and it passes.

Throwing some breakpoints in and doing some po's, it looks like it does create one instance per ID of the data in the self.datas property for URLSessionClient, and none of them are nil. That puts a kibosh on my theory of "Maybe different requests are winding up with the same ID if they're being sent at the same time, causing the data to be deallocated when the first request finishes."

The only place the data for a particular task ID gets removed is either in clearTask or clearAllTasks - clearAllTasks only gets hit when the session invalidates, which would give you a different error. clearTask only gets hit either a) If you cancel via URLSessionClient's cancel(task:) method or b) If the task actually completes.

I've added this level of detail because I'm basically stumped. Clearly, something's causing the data for a particular URLSessionTask identifier key to get nulled out. What the hell is doing that? If anyone's got thoughts, would love to hear them.
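
For context, the test being described is roughly this shape; this is a sketch only that hits URLSession directly, and the URL, count, and names are assumptions rather than the actual test on the branch:

```swift
import XCTest

final class ConcurrentTaskIdentifierTests: XCTestCase {
  func testConcurrentlyCreatedTasksGetDistinctIdentifiers() {
    let session = URLSession(configuration: .default)
    let request = URLRequest(url: URL(string: "https://www.example.com")!)
    let lock = NSLock()
    var identifiers = [Int]()
    let group = DispatchGroup()

    // Create 300 tasks for the same request from different threads at once.
    for _ in 0..<300 {
      group.enter()
      DispatchQueue.global().async {
        let task = session.dataTask(with: request)
        lock.lock()
        identifiers.append(task.taskIdentifier)
        lock.unlock()
        group.leave()
      }
    }
    XCTAssertEqual(group.wait(timeout: .now() + 10), .success)

    // If every task got a unique identifier, these counts match.
    XCTAssertEqual(Set(identifiers).count, identifiers.count)
  }
}
```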

@sneakyness

I'm using DispatchQueue.concurrentPerform; I'm not super sure how different that is from .userInitiated.

I'm not doing this specifically for the network requests, there's a decent amount of objects being initialized and sorted, just happened to be a clean spot to snag an id I needed for the request.

Actually I've just realized that it might get called multiple times per index. Do you think that's causing it? 🤡

@designatednerd
Contributor

> Actually I've just realized that it might get called multiple times per index

I'm not sure what you mean by this.

@sneakyness

If it's parallel can't it rarely call the same index twice? Especially in cases where the number of tasks is >8 (afaik 8 is the max number of threads it'll kick off)

@designatednerd
Contributor

I still am missing something - what is "it" that could be calling the same index multiple times in this context - do you mean the URLSession creating tasks with the same identifier, or something else?

@sneakyness

concurrentPerform, so it might be calling the same fetch multiple times, but that doesn't explain why it would fail with "dataForRequestNotFound" instead of eventually just succeeding

@designatednerd
Contributor

Ah! That makes more sense. I updated the test to use concurrentPerform and yep, multiple tasks with the same ID are getting generated.
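
For anyone following along, a self-contained sketch of what the concurrentPerform variant of the same check looks like; the URL and iteration count are placeholders:

```swift
import Foundation

let session = URLSession(configuration: .default)
let request = URLRequest(url: URL(string: "https://www.example.com")!)
let lock = NSLock()
var identifiers = [Int]()

// concurrentPerform runs the iterations in parallel and blocks until all finish.
DispatchQueue.concurrentPerform(iterations: 300) { _ in
  let task = session.dataTask(with: request)
  lock.lock()
  identifiers.append(task.taskIdentifier)
  lock.unlock()
}

// Any duplicate identifiers show up as a gap between these two counts.
print("created \(identifiers.count) tasks, \(Set(identifiers).count) distinct identifiers")
```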

@designatednerd
Contributor

(basically if multiple tasks with the same ID are getting generated, then my "I will unique this based on the task identifier" strategy falls apart entirely)

@sneakyness

sneakyness commented May 19, 2020

My case may be different from others in that the first request was fine. If we know for sure this is the cause, I can do whatever to make sure it's only called once.

If there were an implementation of fetch/watch/etc., like a fetchOnce, that either returned a request if there's no task ID, or did nothing/emitted something that said this was the case, would that be a good bandaid for the others' cases pending a bigger change?

@designatednerd
Contributor

The first request would probably be fine in what you were describing because it's not trying to make multiple requests at the same time; it's once you start making multiple concurrent requests that the ID incrementer fails. Kind of wish they'd gone for a UUID instead of an auto-incrementing int for that identifier...

I'll think on that - for now I need to look at what to do about the fact that the ID isn't unique if it's hit concurrently repeatedly, because I thought Apple was handling that and they ain't. 🙃

@designatednerd
Contributor

👋 Can I ask a couple of you folks to try out the fix/multifetch branch to see if the changes there make a difference? I've updated the dictionary to key off the URLSessionTask rather than its identifier, so this should stop the issue where something gets ripped out of the dictionary too early. The tests are passing, but I'd love to get confirmation this is working in the real world before I PR it.
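
In the abstract, the change looks roughly like the following; this is a sketch of the approach with assumed names, not the code on the branch:

```swift
import Foundation

// Sketch: key per-task state off the URLSessionTask object itself rather than
// its Int taskIdentifier. URLSessionTask is an NSObject, so distinct task
// objects hash as distinct keys even if they report the same identifier.
final class TaskDataStore {
  private var datas: [URLSessionTask: Data] = [:]
  private let lock = NSLock()

  func register(_ task: URLSessionTask) {
    lock.lock(); defer { lock.unlock() }
    datas[task] = Data()
  }

  func dequeueData(for task: URLSessionTask) -> Data? {
    lock.lock(); defer { lock.unlock() }
    return datas.removeValue(forKey: task)
  }
}
```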

@designatednerd added the bug (Generally incorrect behavior) label on May 21, 2020
@yuriiik

yuriiik commented May 22, 2020

Just tested the fix/multifetch branch. Unfortunately the issue still persists. In our case we just make two identical serial calls to the fetch function on the same queue, and for one of those invocations the resultHandler is never called back. It happens roughly 4 out of 5 times; sometimes it works as expected.

@RolandasRazma
Contributor

This sounds like a completion handler being overwritten somewhere, and a new call being created without cancelling the old one.

@designatednerd
Contributor

designatednerd commented May 22, 2020

@RolandasRazma That's definitely what was happening when using the task identifier as the key, since the task identifier would not get incremented if the session was getting hit quickly enough, and everything for the particular key would get hammered by the next task.

@yuriiik - Are you making other calls at the same time, or just the two identical calls? Also, thank you for testing, it's a bit frustrating that the concurrentPerform test didn't uncover this 😞

@yuriiik

yuriiik commented May 22, 2020

@designatednerd – just two identical calls, one after another.

@sneakyness

sneakyness commented May 22, 2020

The var is different in mine but otherwise they are the same call. Applied branch, saw the bug once, set a breakpoint, never saw it again 👻

The process of triggering the request was streamlined, I was able to run it on repeat long enough to forget about it and still got nothing!

@designatednerd
Contributor

@sneakyness Did it continue to work fine after you removed the breakpoint? I'm thinking it might be a Heisenbug (in that it changes when observed, probably because the breakpoint slows things up enough to stop whatever was happening concurrently before) if it goes back to not working. If it randomly broke once and then worked forevermore, I have no goddamned idea 🙃

@yuriiik Something occurred to me - can you confirm that you're not using Query Watchers? There's a bug in the current version where we're super-aggressively cancelling in-flight fetches if a cache result comes in, which is fixed with #1156 (merged but not yet shipped). If you are using query watchers, do you mind pulling master and letting me know if that fixes the issue?

@sneakyness

That lines up with what I figured was going on earlier: the only time it was happening was either after the backend had been rebuilt and the query did a cold run, or when we were in the "best worst zone" of wifi coverage, which caused a similar amount of delay.

This weekend I've got some nebulous and grand refactoring planned for the backend; I suspect once the schema tidies up, this will disappear. If I manage to cause it again after that, I'll let you know. I'm inclined to say that this error is a "smell" and that ultimately it's best to find a way to get the result you want with fewer requests, even if it shouldn't happen.

@MaximillionMai
Author

@designatednerd Sorry to get back to you so late! It was indeed introduced in version 0.27.1.
For me, it repros when I make two different query calls one right after another.

@yuriiik

yuriiik commented May 24, 2020

@designatednerd we're not using Query Watchers, so I don't think that's the case.

@designatednerd
Contributor

OK, I think I got it. Basically: Concurrency is hard. 🙃

Please see #1227.

@designatednerd added this to the Next Release milestone on May 27, 2020
@designatednerd
Contributor

#1227 was merged and is now available as part of 0.28.0. If you are still experiencing any problems around this, please open a new issue indicating what version you are using.

Thanks for uncovering this!

@yuriiik

yuriiik commented Jun 10, 2020

We can no longer reproduce the issue on 0.28.0. @designatednerd thank you so much!
