Switch AVRO InstanceCache to use Caffeine cache #32

kokosing · 2018-07-20T06:03:23Z

Switch AVRO InstanceCache to use Caffeine cache

The InstanceCache was made a time-based cache to work around
google/guava#2408 as part of 1d2f31e. The time-based cache did not
work for all workloads, especially with the hardcoded 1-minute TTL.
In addition, the default concurrencyLevel of 4 was not good enough
for the cache to work out of the box. So switch the InstanceCache to
a size-based Caffeine (https://github.com/ben-manes/caffeine) cache,
which includes the fix for the original Guava bug google/guava#2408.

Switching to Caffeine also helps with concurrencyLevel.
From https://github.com/ben-manes/caffeine/wiki/Benchmarks:
"Caffeine and ConcurrentLinkedHashMap size their internal
structures based on the number of CPUs"
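
To illustrate what "size-based" means here, below is a minimal sketch using only the JDK. It is not Caffeine's implementation and not the InstanceCache code from this PR; the class name `SizeBoundedCacheSketch` and its methods are hypothetical, made up for this example. It assumes a simple least-recently-used policy: entries are evicted only when the cache grows past its maximum size, never because of their age, so an entry that is requested once every few minutes still stays cached as long as there is room.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustration only: a tiny size-bounded LRU cache (hypothetical, not Caffeine). */
public class SizeBoundedCacheSketch<K, V> {
    private final Map<K, V> entries;
    private int loads = 0;

    public SizeBoundedCacheSketch(final int maximumSize) {
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maximumSize;
            }
        };
    }

    /** Returns the cached value, computing and storing it on a miss. */
    public synchronized V get(K key, Function<K, V> loader) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    /** Number of times the loader was invoked (cache misses). */
    public synchronized int loadCount() {
        return loads;
    }

    public synchronized int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        SizeBoundedCacheSketch<String, Integer> cache = new SizeBoundedCacheSketch<>(2);
        cache.get("a", String::length); // miss
        cache.get("b", String::length); // miss
        cache.get("a", String::length); // hit, "a" becomes most recently used
        cache.get("c", String::length); // miss, evicts "b"
        cache.get("b", String::length); // miss again, evicts "a"
        System.out.println("loads = " + cache.loadCount() + ", size = " + cache.size());
    }
}
```

The single `synchronized` lock keeps the sketch short; a production cache such as Caffeine uses far more scalable internals, which is the point of the benchmark quote above.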

kokosing · 2018-07-20T06:03:39Z

CC: @anusudarsan

findepi · 2018-07-20T06:45:34Z

src/main/java/org/apache/hadoop/hive/serde2/avro/InstanceCache.java

-    private final Cache<K, V> cache = CacheBuilder.newBuilder()
-            .expireAfterWrite(1, TimeUnit.MINUTES)
+    private final Cache<K, V> cache = Caffeine.newBuilder()
+            .maximumSize(100_000)


I think it would be better to tune Guava cache behavior, if feasible.

findepi · 2018-07-20T06:45:56Z

cc @wagnermarkd

wagnermarkd · 2018-07-20T20:49:37Z

Can you give details of how the 1 minute TTL was failing? @findepi, IIRC the TTL configuration was the only one that wouldn't have the GC issue from #28. Tuning that TTL is then the only option, and I'd rather switch to a different cache that works for everyone than expose the TTL as a tunable value.

kokosing · 2018-07-21T18:56:34Z

> Can you give details of how the 1 minute TTL was failing?

@anusudarsan Do you remember the exact problem that was solved with this change?

anusudarsan · 2018-07-23T13:53:54Z

We tried to get the information from the client, but did not get many details. We wanted the timeout and concurrencyLevel to be configurable, so that different timeouts could be tried for different use cases; having it hard-coded to 1 minute did not help. So the better solution was to fix the cache itself to handle all workloads.

kokosing requested a review from electrum July 20, 2018 06:03

facebook-github-bot added the CLA Signed label Jul 20, 2018

findepi previously approved these changes Jul 20, 2018

findepi reviewed Jul 20, 2018
