Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LocalManualCache deadlock in guava 21 #2743

Closed
epavlov1 opened this issue Feb 17, 2017 · 9 comments
Closed

LocalManualCache deadlock in guava 21 #2743

epavlov1 opened this issue Feb 17, 2017 · 9 comments
Assignees
Labels
package=cache type=defect Bug, not working as expected
Milestone

Comments

@epavlov1
Copy link

Unit test testCache works with guava 20, locks up in 21 (LocalCache.java,line 2392):

@Test
public void testCache()
{
    String key = "test";
    int count  = 1000;
    Cache<String, String> cache = CacheBuilder.newBuilder().expireAfterAccess(500000, TimeUnit.MILLISECONDS).maximumSize(1000).build();

    IntStream.range(0, count).parallel().forEach((n) -> {
        String val = cache.asMap().computeIfAbsent(key, k->"testv" + n);
        logger.info("Val=" + val);

    });
}
@Test
public void testConcurrentMap()
{
    String key = "test";
    int count  = 1000;
    Map<String, String> cache = new ConcurrentHashMap<>();

    IntStream.range(0, count).parallel().forEach((n) -> {
        String val = cache.computeIfAbsent(key, k->"testv" + n);
        logger.info("Val:", val );

    });
}
@ben-manes
Copy link
Contributor

Reproduced. This is due to mixing synchronization primitives and non-deterministic ordering for lock acquisition.

lock {
  // note valueReference can be an existing value or even itself another loading value if
  // the value for the key is already being computed.
  loadingValueReference = new LoadingValueReference<K, V>(valueReference);

  if (e == null) {
    // create entry...
  }
  e.setValueReference(loadingValueReference);
}
...
synchronized (e) {
  V newValue = loadingValueReference.compute(key, function);
  ...
}

This creates a computation chain where each computer waits for the preceding one to complete. The order of acquiring the intrinsic lock on e is non-deterministic, so T3 could acquire it before T2. Then T3 blocks on waitForValue(), which is waiting for T2 to notify its completion. However, that does not release the lock on e which T3 owns. This results in a deadlock between T2 and T3.

The assumption was probably that, like wait() and notify(), the intrinsic lock would be released when waiting on the SettableFuture. But that's not the case, as it uses LockSupport.park. The mixing of different locking infrastructure tends to not play well. Ideally the lock on e would be acquired inside the segment lock and released outside of it, allowing for ensuring the order. But that's not possible with Java's language constraints.

(Verified Caffeine passes this test case)

@ben-manes
Copy link
Contributor

ben-manes commented Feb 18, 2017

I tried running this in my ComputeBenchmark and it deadlocks immediately. This is despite being fully populated, because of the execution chain mentioned above.

The execution chain means the locking is pessimistic, which is also true of ConcurrentHashMap. Both Guava and Caffeine use pre-screening on a load to optimistically check if the entry is present. This allows avoiding locking in the common case. Its likely preferable to have computeIfAbsent delegate to get(key, Callable<V>) (or rather the internal variant).

Another area of concern is that the internal compute method does not appear to handle refreshAfterWrite.

@kluever kluever added package=cache type=defect Bug, not working as expected labels Feb 21, 2017
cpovirk pushed a commit that referenced this issue Mar 16, 2017
#2743

we need to hold the lock when calling compute, the rest of the calls
like removeEntry need to hold the lock too.

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=150363160
@cpovirk cpovirk added this to the 22.0 milestone Mar 28, 2017
@cpovirk cpovirk closed this as completed Mar 28, 2017
@tclamb
Copy link

tclamb commented Apr 5, 2017

Are there plans to backport this fix to 21.0 and release a patched version?

@tclamb
Copy link

tclamb commented Apr 26, 2017

Here are tests I wrote to characterize the behavior. With 2 threads, neither thread deadlocks. Every single additional thread deadlocks.

@Test
public void noDeadlocksWithOnlyTwoThreads() throws Exception {
    attemptToDeadlock(2);
}

@Test(expected = TimeoutException.class)
public void allAdditionalThreadsDeadlock() throws Exception {
    attemptToDeadlock(3);
}

private void attemptToDeadlock(final int nThreads) throws Exception {
    final Cache<String, String> cache = CacheBuilder.newBuilder().build();
    final Supplier<String> loadThroughCache = () ->
        cache.asMap().computeIfAbsent("foo", key -> {
            // simulate expensive operation
            Uninterruptibles.sleepUninterruptibly(100, TimeUnit.MILLISECONDS);
            return "bar";
        });
    final ExecutorService threadPool = Executors.newFixedThreadPool(nThreads);
    CompletableFuture.allOf(Stream.generate(() ->
        CompletableFuture.supplyAsync(loadThroughCache, threadPool))
        .limit(nThreads)
        .toArray(CompletableFuture[]::new))
        .get(1, TimeUnit.SECONDS);
}

@kshchepanovskyi
Copy link

I faced with this issue today, it was quite unexpected.

@ben-manes
Copy link
Contributor

Until this is released, you could use Caffeine as a stopgap.

@ben-manes
Copy link
Contributor

ben-manes commented May 24, 2017

When trying to upgrade my test suit, I found that the new computation support will deadlock on recursive invocations. For a load() operation this is detected to fail fast. Note that JDK8's ConcurrentHashMap may livelock in these situations and will (most often) fail fast in JDK9. While recursion should not be supported, since this mistake was observed in practice we had tried to fail gracefully.

An example test that fails with Guava but passes with Caffeine (backed by JDK8's CHM) is,

@Test(dataProvider = "caches", expectedExceptions = StackOverflowError.class)
public void computeIfPresent_recursive(Map<Integer, Integer> map) {
  // As we cannot provide immediate checking without an expensive solution, e.g. ThreadLocal,
  // instead we assert that a stack overflow error will occur to inform the developer (vs
  // a live-lock or deadlock alternative).
  Integer key = 100;
  BiFunction<Integer, Integer, Integer> mappingFunction =
      new BiFunction<Integer, Integer, Integer>() {
        boolean recursed;

        @Override public Integer apply(Integer key, Integer value) {
          if (recursed) {
            throw new StackOverflowError();
          }
          recursed = true;
          return map.computeIfPresent(key, this);
        }
      };
  map.put(key, key);
  map.computeIfPresent(key, mappingFunction);
}

Secondly, the resulting statistics differ between Caffeine and Guava for computations. I haven't narrowed down to a compatible adapter, but some of the differences appear to be...

  • Guava records a loadSuccess when the entry is present in a computeIfAbsent (no load)
  • Guava does not record a miss when the entry was absent and must be loaded
  • Guava does not record a loadException when the computation was null (likely okay, as Caffeine calls this loadFailure to model this use-case)
  • Guava does not record a hit was the entry was present (no loading necessary)

Caffeine's tests against its cache and Guava's to verify compatibility. The adapter currently does not use the compute methods, but ideally with v22 it would migrate over. Due to the stats that is hard to flush out (the deadlocking tests can be disabled using an implementation flag).

@ben-manes
Copy link
Contributor

ben-manes commented May 24, 2017

Here are the differences required for test compatibility.

@ben-manes
Copy link
Contributor

ben-manes commented May 24, 2017

Multithreaded tests fail when computations are intermixed with loads. A compute may return null to indicate removal. A load will then obtain that future, wait for the value, and throw an InvalidCacheLoadException ("CacheLoader returned null for key"). Opening a new ticket and rolling back my migration.

lucamilanesio pushed a commit to GerritCodeReview/gerrit that referenced this issue Nov 2, 2017
The dead lock issue is fixed: [1].

[1] google/guava#2743

Bug: Issue 7645
Change-Id: I77dd930503e6869be207ca4a4f2fd85116719506
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package=cache type=defect Bug, not working as expected
Projects
None yet
Development

No branches or pull requests

7 participants