LocalCache deadlock in Guava 22 #2863
Comments
All recursive operations on a map are undefined behavior. When the cache was a feature of MapMaker it had this behavior for computations, much as Java 8's ConcurrentHashMap does. So this works as intended, but could be a tad nicer.
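(For illustration, the kind of recursive computation meant here looks roughly like the sketch below, with made-up keys; the behavior is undefined, and depending on the map implementation and JDK version the call may hang or throw.)

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
// Undefined behavior: the mapping function re-enters the same map while it is computing.
map.computeIfAbsent("outer", k -> map.computeIfAbsent("inner", k2 -> 1) + 1);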
I'm not sure I see the recursion here?
Oh shoot, I misread. I've seen the recursion issues raised so many times across hash tables that, on seeing multiple computes on the same key, I jumped to that conclusion. It does appear that the exception thrown from the compute function leaves the entry in a loading state, so the next compute on that key blocks.
Isn't this the behavior referenced in #2799?
I'm also observing some strange behaviour with Guava caches under contention. Not sure if these two issues are related; if they aren't, I'm happy to open a separate issue.

While I was doing some JMH benchmarks I noticed that benchmarks using Guava caches stalled completely and/or had very erratic performance under thread contention. Benchmark iterations very often stalled for seconds, minutes, or sometimes seemingly indefinitely. It turned out that the cache was the culprit in all cases.

The issue is very easy to trigger. I published a sample project at https://github.com/hibox-systems/guava-cache-test that demonstrates the issue. In my environment it works fine with 1, 2 and 3 threads; using 4 threads or more always stalls the benchmark at some point. I have tried the benchmark with all published Guava versions that are compatible with the sample code, i.e. Guava 11.0 and newer. Tested with several JVM versions on Ubuntu 16.04 with an Intel Core i7-3770 CPU.
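(The sample project is not reproduced here, but a contended benchmark of roughly this shape is what triggers the stalls. This is only a sketch under assumed class, key, and parameter names; the real project may differ.)

import java.util.concurrent.ThreadLocalRandom;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class GuavaCacheBenchmark {
    // A plain loading cache; the stalls appear once several threads hit it concurrently.
    private final LoadingCache<Integer, Integer> cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)
            .build(CacheLoader.from((Integer key) -> key * 2));

    @Benchmark
    @Threads(4) // 1-3 threads behave; 4 or more eventually stall in the reporter's environment
    public Integer get() {
        return cache.getUnchecked(ThreadLocalRandom.current().nextInt(100_000));
    }
}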
FWIW, I tried replacing the Guava caches in the benchmarks with Caffeine caches, and all kinds of stalling and erratic performance went away.
It should be unrelated, since this bug deals with new code for Java 8 and your benchmark touches prior code. Note that by default there are 4 hash segments, each with its own queue and counter for replaying reads. The queue is uncapped and draining is amortized across readers; there should be a per-drain threshold to avoid over-penalizing any one thread. In a stress test like yours, one could imagine a memory issue, hang-ups for the penalized callers, and contention adding to the read buffer. Caffeine uses a lossy ring buffer, which avoids all of this. My guess is you are hitting the worst case, which wouldn't be expected in practice, or at least not with the hardware available when the cache was designed.
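(To make the contrast concrete, here is a toy sketch of a lossy read buffer of the kind described; it is illustrative only, not Guava's or Caffeine's actual code. Reads that cannot be recorded are simply dropped, so memory stays bounded and no reader is stuck replaying a long queue.)

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

final class LossyReadBuffer<E> {
    private static final int SIZE = 16; // small power-of-two capacity
    private final AtomicReferenceArray<E> slots = new AtomicReferenceArray<>(SIZE);
    private final AtomicInteger writeIndex = new AtomicInteger();

    /** Records a read; returns false (and drops the record) if the slot is still occupied. */
    boolean offer(E element) {
        int index = writeIndex.getAndIncrement() & (SIZE - 1);
        return slots.compareAndSet(index, null, element);
    }

    /** Replays and clears whatever was captured, e.g. amortized after a write. */
    void drainTo(Consumer<E> consumer) {
        for (int i = 0; i < SIZE; i++) {
            E element = slots.getAndSet(i, null);
            if (element != null) {
                consumer.accept(element);
            }
        }
    }
}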
This deadlock is not the typical deadlock where two threads each hold one resource and wait for the resource held by the other. The thread owning the account cache lock is parked, which tells us that the lock was not released. I could not determine the exact sequence of events leading to this deadlock, making it really hard to report/fix the problem.

While investigating, I realized that there are quite a few reported issues in Guava that could be major for Gerrit:

Our deadlock happening in the account cache https://bugs.chromium.org/p/gerrit/issues/detail?id=7645
Other deadlock google/guava#2976 google/guava#2863
Performance google/guava#2063
Race condition google/guava#2971

Because I could not reproduce the deadlock in a dev environment or in a unit test, making it almost impossible to fix, I considered other options such as replacing Guava with something else. The maintainer of the Caffeine[1] cache claims that Caffeine is a high performance[2], near optimal caching library designed/implemented based on the experience of designing Guava's cache and ConcurrentLinkedHashMap. I also ran some benchmarks that spawn a lot of threads reading/writing values from the caches; across those runs Guava always took at least double the time Caffeine needed to complete all operations.

Migrating to Caffeine is almost a drop-in replacement. Caffeine's interfaces are very similar to Guava's cache interfaces, and there is an adapter that lets you migrate to Caffeine while keeping Guava's interfaces. After migrating to Caffeine, we saw the deadlock occurrence drop from once a day to once every 2 weeks on our production server. The maintainer of Caffeine, Ben Manes, pointed us to the possible cause[3] of this issue, a bug[4] in the kernel and its fix[5]. Our kernel version is supposed to have the fix, but we will try another OS and kernel version.

Replacing Guava caches with Caffeine brings two things: it reduces the chances of hitting the deadlock, which is most likely caused by a kernel bug, and it improves cache performance.

[1] https://github.com/ben-manes/caffeine
[2] https://github.com/ben-manes/caffeine/wiki/Benchmarks
[3] https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64
[4] torvalds/linux@b0c29f7
[5] torvalds/linux@76835b0

Bug: Issue 7645
Change-Id: I8d2b17a94d0e9daf9fa9cdda563316c5768b29ae
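(As a sketch of the "keep using Guava's interfaces" path described above, the migration can look roughly like the snippet below. It assumes the caffeine-guava adapter artifact and its CaffeinatedGuava entry point; the cache key/value types and maximum size are illustrative, so verify the exact signatures against the Caffeine documentation.)

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.guava.CaffeinatedGuava;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

final class AccountCacheFactory {
    // Before (pure Guava):
    //   return CacheBuilder.newBuilder().maximumSize(1024).build(loader);
    // After: Caffeine builds and manages the cache, while callers still see Guava's LoadingCache.
    static LoadingCache<String, String> create(CacheLoader<String, String> loader) {
        return CaffeinatedGuava.build(Caffeine.newBuilder().maximumSize(1024), loader);
    }
}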
Throwing an exception in the compute function of the cache's ConcurrentMap view and then computing the same key again causes a deadlock in LocalCache.java:2387
@Test
public void testCacheLoaderException() {
  ConcurrentMap<Object, Object> cache = CacheBuilder.newBuilder().build().asMap();
  // First compute throws; the next compute on the same key blocks in LocalCache.java:2387.
  try { cache.compute("key", (k, v) -> { throw new RuntimeException(); }); } catch (RuntimeException expected) { }
  cache.compute("key", (k, v) -> "value"); // never returns
}