Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Guava caches with Caffeine caches #3198

Merged
merged 1 commit into from
Sep 22, 2022

Conversation

porunov
Copy link
Member

@porunov porunov commented Sep 1, 2022

This is a follow-up commit to the 9a22a1a commit
Suppressing #3197 to be able to run benchmark tests on this PR

Migrates the rest 7 caches from Guava implementation to Caffeine implementation

This commit also removes a previously added Benchmark class with the obsolate GuavaVertexCache.class. The benchmark is provided in the next GitHub PR: #3188 which shows that Caffeine has better performance characteristics in all scenarious.

ConcurencyLevel is not provided for the cache implementation anymore. The reason for this can be found by the following link https://groups.google.com/g/guava-discuss/c/BnoN9q9fDgw/m/6kF6H2aLAgAJ

Current JanusGraph benchmarks: https://gist.github.com/porunov/d07178afea8658e41a4802d1fca14274?permalink_comment_id=4287139#gistcomment-4287139
Current PR didn't show neither improvement nor regression in performance for the executed benchmark tests.

Guava recommends using Caffeine cache in their documentation. The only situation when Guava cache makes sense is when using Java <= 7. As JanusGraph supports Java 8 and higher, we can switch to Caffeine. See Guava documentation tells about Caffeine here.

Related to #3188 and #3185

Partially related to #2033

Fixes #871

Signed-off-by: Oleksandr Porunov [email protected]


Thank you for contributing to JanusGraph!

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there an issue associated with this PR? Is it referenced in the commit message?
  • Does your PR body contain #xyz where xyz is the issue number you are trying to resolve?
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Is your initial contribution a single, squashed commit?

For code changes:

  • Have you written and/or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE.txt file, including the main LICENSE.txt file in the root of this repository?
  • If applicable, have you updated the NOTICE.txt file, including the main NOTICE.txt file found in the root of this repository?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered?

@porunov porunov added this to the Release v1.0.0 milestone Sep 1, 2022
@janusgraph-bot janusgraph-bot added the cla: external Externally-managed CLA label Sep 1, 2022
This is a follow-up commit to the 9a22a1a commit

Migrates the rest 7 caches from Guava implementation to Caffeine implementation

This commit also removes a previously added Benchmark class with the obsolate GuavaVertexCache.class. The benchmark is provided in the next GitHub PR: #3188 which shows that Caffeine has better performance characteristics in all scenarious.

ConcurencyLevel is not provided for the cache implementation anymore. The reason for this can be found by the following link https://groups.google.com/g/guava-discuss/c/BnoN9q9fDgw/m/6kF6H2aLAgAJ

Related to #3188 and #3185

Partially related to #2033

Fixes #871

Signed-off-by: Oleksandr Porunov <[email protected]>
@porunov porunov force-pushed the feature/guava-to-caffeine-transition branch from a6859eb to 1754863 Compare September 1, 2022 00:43
@ben-manes
Copy link

I don’t understand ExpirationKCVSCache, but it seems to do expiration itself with a cleanup thread. Caffeine has richer support than Guava (variable, scheduler) and might simplify this code.

@porunov
Copy link
Member Author

porunov commented Sep 1, 2022

I don’t understand ExpirationKCVSCache, but it seems to do expiration itself with a cleanup thread. Caffeine has richer support than Guava (variable, scheduler) and might simplify this code.

I don't know for sure but I assume cache.invalidate(key) might be a complex operation which is why ExpirationKCVSCache doesn't invalidate data immediately but collects expiration for some time and then triggers probabilistic cleanup thread which invalidates a set of collected expired keys.
I do believe that the code in ExpirationKCVSCache should be checked again and simplified.

For example, I don't understand the reason behind this loop over the whole cache:

Is really more efficient to get the whole cache as map, traverse over all keys of that cache and invalidate them if they were expired? Why didn't we just traverse expired keys and sent them to invalidate method? It seems that the code assumes that in case we try to invalidate a key which isn't in the cache then the invalidation is expensive but for some reason I really doubt it.

Basically, to cleanup the code I guess we need to figure out complexity of invalidate method for keys which are in the cache and keys which are not in the cache.

@porunov
Copy link
Member Author

porunov commented Sep 1, 2022

The benchmarks on GitHub side are unreliable because it looks they are executed differently all the time. Thus, I executed benchmarks in my local environment where no other processes or change in environment could influence the result.
The benchmarks can be found here: https://gist.github.com/porunov/d07178afea8658e41a4802d1fca14274?permalink_comment_id=4287139#gistcomment-4287139
The benchmark was conducted on the code which was fully in Guava (i.e. 1754863) and the code which was fully in Caffeine (i.e. 065a48f).
No any differences in benchmarks were seen. I.e. no improvement and no regression.

That said, the removed benchmark in this PR (https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-benchmark/src/main/java/org/janusgraph/VertexCacheBenchmark.java) shows that Caffeine cache access is much faster then Guava cache access (10 times at least).

Thus, I believe this PR is safe to transition the rest of the caches to Caffeine implementation.

@porunov
Copy link
Member Author

porunov commented Sep 1, 2022

@ben-manes pointed here we may see benefit from the higher hit rate.
Another interesting note is the fact that Guava recommends to use Caffeine in their documentation.
They are literally saying that in bold:

Prefer Caffeine over Guava's caching API

@porunov
Copy link
Member Author

porunov commented Sep 19, 2022

@JanusGraph/committers In case anyone needs time to review this PR, please, let me know. Otherwise, I would like to merge this PR in 48 hours

Copy link
Contributor

@farodin91 farodin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porunov Awesome work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: external Externally-managed CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TinyLFU cache (vs LRU)
4 participants