Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(store): UnorderedMap v2 implementation #584

Merged
merged 38 commits into from
Oct 7, 2021

Conversation

austinabell
Copy link
Contributor

Closes #580

Also removes some unnecessary bounds in LookupMap and replaces the unreachable!() code with aborts

@austinabell
Copy link
Contributor Author

austinabell commented Sep 23, 2021

Note to self, there is a TODO added with the ability to make clearing more efficient. Going to do this in parallel (just optimizes away a clone and adds drain iterator for Vector so it doesn't block this, but keep in mind this will change

edit: change started here #592

Copy link
Member

@ChaoticTempest ChaoticTempest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Base automatically changed from austin/store/bucket to master September 28, 2021 19:17
@austinabell
Copy link
Contributor Author

This is a pretty crucial data structure, and changes are pretty involved, so I'm going to wait for at least one more review on this

/// # Panics
///
/// Panics if the key does not exist in the map
fn index_mut(&mut self, index: &Q) -> &mut Self::Output {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that std's maps don't implement index_mut, so, if we are shooting for consistency, we might want to avoid it as well.

Tough, I think the reason for why there are no index-muts is just forward compatibility (basically, folks want that hm[new_key] = new_value inserted new value, instead of panicking), so there's a reasoning that, as we, unlike std, can in theory release new major version, we might add IndexMut.

Not sure, though I'd go for conservative and "obvious" solution of just following std here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a good point. I'll leave them out for now, we can decide if we want it to add an entry or panic on not exist later

@@ -10,11 +10,13 @@ pub use vec::Vector;
pub mod lookup_map;
pub use self::lookup_map::LookupMap;

pub mod unordered_map;
pub use self::unordered_map::UnorderedMap;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, still reeealy don't like C++ish unordered_map naming :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I'm not a huge fan either but decided to keep existing naming to have an easier transition from the existing type. Are you suggesting having it be named HashMap, StorageHashMap or something like this? I don't have an opinion on naming

cc: @ChaoticTempest @mikedotexe

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) I don't have a strong opinion on this, but we do reference UnorderedMap in a lot of places and the dev community might be used to it

{
/// Values iterator which contains empty and filled cells.
keys: bucket::Iter<'a, K>,
/// Amount of valid elements left to iterate.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc comment feels like it is wrong now? Can't connect Amount with this being a LookupMap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yes, just forgot to come back to fix this

Op::Clear => {
sv.clear();
hm.clear();
}
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to the PR, but I'd use first_free/ next_free in Bucket, rather than next_vacant / next_index. Also like Slot terminology for container, as in

https://github.com/DataDog/glommio/blob/master/glommio/src/free_list.rs#L40-L44

Copy link
Contributor Author

@austinabell austinabell Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, I'll change now. All private APIs and that change has already come in. Technically it isn't first_free as this points to the last index removed so I will adjust naming from your suggestion slightly. Do you think the Bucket type should be changed to something resembling FreeList? It's private and hasn't been released yet so can easily change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as this points to the last index removed

Last index removed is the first free index (in a free list) :) As for namling, I personally have a very strong association that a vec of slots with index-based list of free slots is a free list, but this might be idiosincratic usage. Doesn't really matter in the grand scheme of things, just wanted to share some conventions I developed around this data structure (as I think I've had to code a Rust free list a dozen of times, its a farily frequent thing).

Comment on lines 45 to 48
K: BorshSerialize + Ord + Clone + Borrow<Q>,
V: BorshSerialize + BorshDeserialize,
H: CryptoHasher<Digest = [u8; 32]>,
Q: BorshSerialize + ToOwned<Owned = K>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kindof not a huge fan of the amount of bounds here -- would look bad as a public API in rustdoc. Otoh, not that I can suggest a better alternative :)

use super::{CryptoHasher, LookupMap, UnorderedMap, ValueAndIndex, ERR_INCONSISTENT_STATE};
use crate::{env, store::bucket};

impl<'a, K, V, H> IntoIterator for &'a UnorderedMap<K, V, H>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc for UnorderedMap says its not itrable (docs are wrogn)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that in the docs, unless you are looking for something that I'm not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, we have two UnorderedMaps as of 704cb26? The one in store/unordered_map claims that it is not iterable. But here we implement the iter for that map.

The one in collections/unordered_map does say that it is iterable.

the two links I am looking at

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah that was just a detail missed when copying over the API that I hadn't updated. It's fixed now

- When mixing using `sys` and `env`, reduces chance of collision for using `0`
- store: Implement caching `LookupMap` type. This is the new iteration of the previous version of `near_sdk::collections::LookupMap` that has an updated API, and is located at `near_sdk::store::LookupMap`. [PR 487](https://github.com/near/near-sdk-rs/pull/487).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a second te realize what's the difference between lookup map and unordered map (that is, that um hashes the keys to get the trie key). We might explain this better in the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well both do, the only difference is that unordered map keeps track of keys in this vector like structure so that it can be iterated (LookupMap isn't aware of what elements exist in it). I'll try to make docs more clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the docs, I was lazy with re-reading this. My fault. Used LookupMap as a framework since this is a superset of the functionality, but didn't make a pass through the docs.

H: CryptoHasher<Digest = [u8; 32]>,
{
keys: Bucket<K>,
values: LookupMap<K, ValueAndIndex<V>, H>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 hm, I'd naively expect that we us the following repr here:

struct UnordereMap {
  lm: LookupMap<CryptoHash, (K, V)>
}

ie, that we just use hashes as keys and key-value pairs as values.

Which also makes me think, that maybe lookup map is the wrong abstaction? Ie, the difference between lookup map and unordered map is that the former uses borsh serialize as the trie key, while the latter uses sha hash as the key.

So perhaps we need to introduce some kind of RawSet internal abstraction that just stores byte keys and values and parametrized by how we get thouse bytes out of Rust types.

Than lookup map would be one specialization that uses Borsh for both, and UnorderedMap would use Borsh + CryptoHasher.

Copy link
Contributor Author

@austinabell austinabell Sep 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, they both use the hash as the key for lookup. This does make me think I should refactor the LookupMap to remove keeping the owned values of the types at all. Probably would be lighter and better to build off of.

I can't remember as I'm responding to this what the specific reason was for this in the first place.

Edit: The reason was that you could avoid hashing a key altogether if something was added then deleted. Also, for looking up values you only need to hash when the element is not in cache. This should probably be benchmarked with actual tracking of gas usage to see which would be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess I should make a more careful pass over it, as it seems I got myself confused about what this is doing at all :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #596 which might make things more clear and be a higher leverage piece of code to weigh in on

@austinabell austinabell mentioned this pull request Oct 1, 2021
@austinabell austinabell merged commit 610cdf7 into master Oct 7, 2021
@austinabell austinabell deleted the austin/store/unordered_map branch October 7, 2021 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v2 store UnorderedMap implementation
4 participants