-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(store): UnorderedMap v2 implementation #584
Conversation
…austin/store/bucket
Note to self, there is a TODO added with the ability to make clearing more efficient. Going to do this in parallel (just optimizes away a clone and adds drain iterator for edit: change started here #592 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a pretty crucial data structure, and changes are pretty involved, so I'm going to wait for at least one more review on this |
/// # Panics | ||
/// | ||
/// Panics if the key does not exist in the map | ||
fn index_mut(&mut self, index: &Q) -> &mut Self::Output { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that std's maps don't implement index_mut
, so, if we are shooting for consistency, we might want to avoid it as well.
Tough, I think the reason for why there are no index-muts is just forward compatibility (basically, folks want that hm[new_key] = new_value
inserted new value, instead of panicking), so there's a reasoning that, as we, unlike std, can in theory release new major version, we might add IndexMut.
Not sure, though I'd go for conservative and "obvious" solution of just following std here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's a good point. I'll leave them out for now, we can decide if we want it to add an entry or panic on not exist later
@@ -10,11 +10,13 @@ pub use vec::Vector; | |||
pub mod lookup_map; | |||
pub use self::lookup_map::LookupMap; | |||
|
|||
pub mod unordered_map; | |||
pub use self::unordered_map::UnorderedMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, still reeealy don't like C++ish unordered_map naming :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I'm not a huge fan either but decided to keep existing naming to have an easier transition from the existing type. Are you suggesting having it be named HashMap
, StorageHashMap
or something like this? I don't have an opinion on naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:) I don't have a strong opinion on this, but we do reference UnorderedMap in a lot of places and the dev community might be used to it
{ | ||
/// Values iterator which contains empty and filled cells. | ||
keys: bucket::Iter<'a, K>, | ||
/// Amount of valid elements left to iterate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc comment feels like it is wrong now? Can't connect Amount
with this being a LookupMap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yes, just forgot to come back to fix this
Op::Clear => { | ||
sv.clear(); | ||
hm.clear(); | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated to the PR, but I'd use first_free
/ next_free
in Bucket, rather than next_vacant
/ next_index
. Also like Slot
terminology for container, as in
https://github.com/DataDog/glommio/blob/master/glommio/src/free_list.rs#L40-L44
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, I'll change now. All private APIs and that change has already come in. Technically it isn't first_free
as this points to the last index removed so I will adjust naming from your suggestion slightly. Do you think the Bucket
type should be changed to something resembling FreeList
? It's private and hasn't been released yet so can easily change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this points to the last index removed
Last index removed is the first free index (in a free list) :) As for namling, I personally have a very strong association that a vec of slots with index-based list of free slots is a free list, but this might be idiosincratic usage. Doesn't really matter in the grand scheme of things, just wanted to share some conventions I developed around this data structure (as I think I've had to code a Rust free list a dozen of times, its a farily frequent thing).
K: BorshSerialize + Ord + Clone + Borrow<Q>, | ||
V: BorshSerialize + BorshDeserialize, | ||
H: CryptoHasher<Digest = [u8; 32]>, | ||
Q: BorshSerialize + ToOwned<Owned = K>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kindof not a huge fan of the amount of bounds here -- would look bad as a public API in rustdoc. Otoh, not that I can suggest a better alternative :)
use super::{CryptoHasher, LookupMap, UnorderedMap, ValueAndIndex, ERR_INCONSISTENT_STATE}; | ||
use crate::{env, store::bucket}; | ||
|
||
impl<'a, K, V, H> IntoIterator for &'a UnorderedMap<K, V, H> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc for UnorderedMap says its not itrable (docs are wrogn)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that in the docs, unless you are looking for something that I'm not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, we have two UnorderedMaps as of 704cb26? The one in store/unordered_map
claims that it is not iterable. But here we implement the iter for that map.
The one in collections/unordered_map does say that it is iterable.
- https://github.com/matklad/near-sdk-rs/blob/704cb26de463bc9756d38f4d3694bc0d288d2933/near-sdk/src/store/unordered_map/iter.rs#L8
- https://github.com/matklad/near-sdk-rs/blob/704cb26de463bc9756d38f4d3694bc0d288d2933/near-sdk/src/store/unordered_map/mod.rs#L21
the two links I am looking at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah that was just a detail missed when copying over the API that I hadn't updated. It's fixed now
- When mixing using `sys` and `env`, reduces chance of collision for using `0` | ||
- store: Implement caching `LookupMap` type. This is the new iteration of the previous version of `near_sdk::collections::LookupMap` that has an updated API, and is located at `near_sdk::store::LookupMap`. [PR 487](https://github.com/near/near-sdk-rs/pull/487). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a second te realize what's the difference between lookup map and unordered map (that is, that um hashes the keys to get the trie key). We might explain this better in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well both do, the only difference is that unordered map keeps track of keys in this vector like structure so that it can be iterated (LookupMap
isn't aware of what elements exist in it). I'll try to make docs more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the docs, I was lazy with re-reading this. My fault. Used LookupMap
as a framework since this is a superset of the functionality, but didn't make a pass through the docs.
H: CryptoHasher<Digest = [u8; 32]>, | ||
{ | ||
keys: Bucket<K>, | ||
values: LookupMap<K, ValueAndIndex<V>, H>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 hm, I'd naively expect that we us the following repr here:
struct UnordereMap {
lm: LookupMap<CryptoHash, (K, V)>
}
ie, that we just use hashes as keys and key-value pairs as values.
Which also makes me think, that maybe lookup map is the wrong abstaction? Ie, the difference between lookup map and unordered map is that the former uses borsh serialize as the trie key, while the latter uses sha hash as the key.
So perhaps we need to introduce some kind of RawSet internal abstraction that just stores byte keys and values and parametrized by how we get thouse bytes out of Rust types.
Than lookup map would be one specialization that uses Borsh for both, and UnorderedMap would use Borsh + CryptoHasher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, they both use the hash as the key for lookup. This does make me think I should refactor the LookupMap
to remove keeping the owned values of the types at all. Probably would be lighter and better to build off of.
I can't remember as I'm responding to this what the specific reason was for this in the first place.
Edit: The reason was that you could avoid hashing a key altogether if something was added then deleted. Also, for looking up values you only need to hash when the element is not in cache. This should probably be benchmarked with actual tracking of gas usage to see which would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess I should make a more careful pass over it, as it seems I got myself confused about what this is doing at all :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #596 which might make things more clear and be a higher leverage piece of code to weigh in on
Closes #580
Also removes some unnecessary bounds in
LookupMap
and replaces theunreachable!()
code with aborts