Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace fa_cache with htable cache #160

Merged
merged 6 commits into from
Sep 25, 2023
Merged

Replace fa_cache with htable cache #160

merged 6 commits into from
Sep 25, 2023

Conversation

var77
Copy link
Collaborator

@var77 var77 commented Sep 19, 2023

Replaced fa_cache with htable cache for nodes when doing inserts and scans.
Renamed Cache to BlockNumberCache and created NodeCache as the entry and key size differ. Maybe there's a way to make some functions generic, you can mention on PR comments
This change should improve performance per benchmarks

HASHCTL hctl;
} NodeCache;

NodeCache node_cache_create();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we refactor these to have one cache_ api?
It seems the only thing different between the two caches in this PR is the key and values size.
So, we can pass those in create call and have the rest of the arguments be void* in the data structure definition.

Maybe we could have a type-specific wrapper around the data structure for each instance (node_cache and bln_cache) to get some compiler help. Not sure if this is necessary or useful enough, though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@github-actions
Copy link

github-actions bot commented Sep 20, 2023

Benchmarks

metric old new pct change
recall (after create) 0.740 0.740 -
recall (after insert) 0.776 0.758 -2.32%
select bulk tps 164.930 240.924 +46.08%
select bulk latency (ms) 47.626 32.179 -32.43%
select bulk latency (stddev ms) 8.045 8.648 +7.50%
create latency (ms) 1566.247 1442.848 -7.88%
insert bulk tps 11.839 15.120 +27.71%
insert bulk latency (ms) 84.455 66.122 -21.71%
insert bulk latency (stddev ms) 2.493 1.722 -30.93%
disk usage (bytes) 6348800.000 6348800.000 -

@var77
Copy link
Collaborator Author

var77 commented Sep 20, 2023

@Ngalstyan4 can you please re-review? I did the requested changes.

@var77 var77 requested a review from Ngalstyan4 September 20, 2023 09:55
{
MemoryContext old_context = MemoryContextSwitchTo(cache->hctl.hcxt);
hash_destroy(cache->htab);
MemoryContextDelete(cache->hctl.hcxt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you delete the memory context you are currently on?
Is that safe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to switch back to old context before deleting, this was not an issue, but I think it may cause undefined behaviour later.


void cache_destroy(HTABCache *cache)
{
MemoryContext old_context = MemoryContextSwitchTo(cache->hctl.hcxt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory context ownership is a little strange here.
It seems the memory context is constructed externally, by the caller, but it is destroyed internally, by this abstract data type. Is there a reason for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that if I create memory context with the same name they will share the same memory, but I looked through the source code and it seems name is mostly used on errors to display on which memory context the error was appeared. I think it is not critical for us to share the same name of MemoryContext among different cache instances, so I brought back memory context creation inside cache_create

} HTABCache;

HTABCache cache_create(char *name, MemoryContext ctx);
bool cache_remove(HTABCache *cache, HTABCacheKey *key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document the assumptions of this ADT in this file?

E.g. currently the caller creates the memory context, passes ownership to this ADT and this ADT destroys it upon cache_destroy.
Also answer questions like:
Who owns *key and *entry pointers passed into these API functions? Does the ADT make assumptions about how long those pointers live? (if I understand it correctly, the ADT assumes that anything passed in lives longer than the cache ADT itself. This invariant must be followed by the user of the ADT so this interface file is a good place to document it)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments here

#include <utils/hsearch.h>
#include <utils/memutils.h>

typedef int HTABCacheKey;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use sized integers always (e.g. int32)
Would be great if we could add a ci/cd check for this kinds of things

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to int32 for now, will take a not for CI checks

@Ngalstyan4 Ngalstyan4 merged commit 39b5b00 into main Sep 25, 2023
@Ngalstyan4 Ngalstyan4 deleted the varik/cache branch September 25, 2023 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants