RFC: Addition/update of various dictionary, set variants #6

Merged
merged 1 commit into from
Jan 28, 2014

kmsquire
Member

  • OrderedDict (new)
  • OrderedSet (new)
  • DefaultDict (updated)
  • DefaultOrderedDict (new)

The basic structure is there, but not yet any tests or documentation. Bugs are likely still lurking...

cc: @toivoh

Edit: Here's an account of the evolution of OrderedDicts:

I've tried various approaches to implement ordered dicts, and this was pretty much the only one with reasonable performance (about 10% slower than Dicts).

Background: In Dicts, for efficiency, keys and values are stored in separate arrays, as:

type Dict{K,V} <: Associative{K,V}
    slots::Array{Uint8,1}
    keys::Array{K,1}
    vals::Array{V,1}
...

An easy way to implement OrderedDicts is to use a type or tuple for vals which contains ordering information (generally, a doubly linked list). Unfortunately, this kills performance, because vals is (generally) no longer contiguous. (Note that I originally did this long before @loladiro's tuple updates).

So, the most performant method is to include ordering information as part of the Dict type. HashDict implements this. It is parametrized as Ordered or Unordered, which are aliases for Int and Nothing, respectively.

Nothing arrays take up little or no space, so other than two additional variables in the Dict type, there is no memory difference and no performance difference between HashDict{K,V,Unordered} and Base.Dict{K,V}. (The idea for Nothing came from Base.Set.) Ideally, I'd like to see Base.Dict implemented based on HashDict.

In JuliaLang/julia#4038, I did this:

type Dict{K,V,O<:Union(Ordered,Unordered)} <: Associative{K,V}

Unfortunately, it had its own issues:
a. Constructing plain Dicts using constructor notation always required the Unordered type parameter.
b. (Type1=>Type2)[] produced Dict{Type1,Type2,Unordered}, which required a bit of mucking around in the parser to get things working. It was kinda fun, but it takes a bit of work.
c. OrderedDict could be a typealias or constructor, but not both (see JuliaLang/julia#3427)

Given all of this, I chose to make the HashDict type and implement OrderedDict as a thin wrapper around it (for aesthetics), in the hope that at some point HashDict will become the base for both Dict and OrderedDict.

One other consideration for the OrderedDict/HashDict implementation (for the future):

  • The current implementation of HashDict uses two additional arrays to maintain ordering information:
    • order is an ordering of indices in the keys/vals array
    • idxs is the same size as keys and vals, and points back into the order array, for easy updating
      This could allow direct access to the nth key-value pair (which was the original inspiration).

Instead, these could be implemented as prev and next in a doubly linked list, which might actually allow some simplification. (Should probably try this.)
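The order/idxs bookkeeping described above can be sketched roughly as follows. This is a toy model in Python, not the PR's Julia code: the class name `OrderedSlots`, the `_slot_of` lookup (standing in for hash probing), the `_free` list, and the way deletion compacts `order` are all assumptions made for illustration.

```python
class OrderedSlots:
    """Toy model of the order/idxs bookkeeping; not the PR's HashDict."""

    def __init__(self):
        self.keys = []      # slot -> key (None marks a free slot)
        self.vals = []      # slot -> value
        self.idxs = []      # slot -> position of that slot in self.order
        self.order = []     # insertion position -> slot
        self._slot_of = {}  # key -> slot; hypothetical stand-in for hash probing
        self._free = []     # slots released by deletions

    def __setitem__(self, key, val):
        slot = self._slot_of.get(key)
        if slot is not None:
            # Existing key: the value changes, the insertion order does not.
            self.vals[slot] = val
            return
        if self._free:
            slot = self._free.pop()
        else:
            slot = len(self.keys)
            self.keys.append(None)
            self.vals.append(None)
            self.idxs.append(-1)
        self.keys[slot] = key
        self.vals[slot] = val
        self.idxs[slot] = len(self.order)  # back-pointer into order
        self.order.append(slot)
        self._slot_of[key] = slot

    def __getitem__(self, key):
        return self.vals[self._slot_of[key]]

    def __delitem__(self, key):
        slot = self._slot_of.pop(key)
        pos = self.idxs[slot]              # idxs locates the order entry directly
        del self.order[pos]
        for later in self.order[pos:]:     # assumption: compact and shift back-pointers
            self.idxs[later] -= 1
        self.keys[slot] = None
        self.vals[slot] = None
        self.idxs[slot] = -1
        self._free.append(slot)

    def nth(self, n):
        """Direct access to the nth key-value pair in insertion order."""
        slot = self.order[n]
        return self.keys[slot], self.vals[slot]

    def items(self):
        return [(self.keys[s], self.vals[s]) for s in self.order]
```

In this sketch `nth` is a plain array lookup, which is the direct-access benefit mentioned above, while deletion has to shift the back-pointers of later entries. A prev/next linked list would make deletion constant-time but give up indexed access.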

@lindahua
Contributor

Thanks for the efforts to migrate the ordered containers here.

Just out of curiosity, what approach do you use to implement the ordered containers? (In the C++ STL, a balanced binary search tree of some sort is typically used, but that doesn't seem to be the case here.)

Also note that there are other efforts to implement red-black trees (see #5).

@kmsquire
Member Author

Thanks Dahua. Ordered here is following the convention in Python or Java--in this case, the containers maintain insertion order, not sort order. Both are useful.

The OrderedDict here is like an OrderedDict in Python or a LinkedHashMap in Java, except that I'm using an array to back the order instead of a linked list.
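To make the insertion-order versus sort-order distinction concrete, here is a small Python illustration using the standard library's `collections.OrderedDict`. The sample keys and the variable names are made up for the example.

```python
from collections import OrderedDict

fruit = OrderedDict()
for key, val in [("pear", 3), ("apple", 1), ("fig", 2)]:
    fruit[key] = val

# Insertion order: keys come back in the order they were added.
insertion_order = list(fruit)

# Sort order: what a tree-backed (sorted) container would iterate in.
sort_order = sorted(fruit)

print(insertion_order)  # ['pear', 'apple', 'fig']
print(sort_order)       # ['apple', 'fig', 'pear']
```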

I was just about to comment on the other issue.

@toivoh

toivoh commented Jan 25, 2014

Very nice!

@lindahua
Contributor

@kmsquire Thanks for the clarification. I am generally happy with the PR. Just that we probably need some tests.

@kmsquire
Member Author

Thanks Dahua. I agree, and will try to get to it this weekend.

@kmsquire
Member Author

Okay, this should be in good shape.

I did make some minor updates to the structure, in anticipation of a forthcoming SortedDict:

  • DictBase is a hash table, which is not useful as a base for a tree-backed dictionary. Changed the name to HashDict
  • Added DefaultDictBase, which is wrapped by DefaultDict and DefaultOrderedDict, and added a (commented out) implementation of a DefaultSortedDict, for when SortedDict arrives.

One choice I made is that I wanted each of the above dictionary types to be actual concrete classes (with their own constructors, etc.), which required that I implement them as wrappers. This makes the implementation slightly more verbose than using, e.g., type aliases, but I felt the ease of use made the trade-off worth it.

Also, I should add explicitly that I used a delegate macro provided by @johnmyleswhite in (JuliaLang/julia#3292). It's noted, briefly, but I can make it more obvious if you'd like, John.
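The wrapper-plus-delegation pattern described above can be sketched as follows. This is a Python analogue, not the `@delegate` macro from JuliaLang/julia#3292: the `delegate` decorator and the `DefaultOrderedDictSketch` class are hypothetical names invented for this example, and Python's `collections.OrderedDict` stands in for the wrapped dictionary.

```python
from collections import OrderedDict


def delegate(field, names):
    """Class decorator: forward each method in `names` to the attribute `field`."""
    def decorate(cls):
        for name in names:
            # Bind `name` via a keyword default so each closure keeps its own value.
            def forward(self, *args, _name=name, **kwargs):
                return getattr(getattr(self, field), _name)(*args, **kwargs)
            forward.__name__ = name
            setattr(cls, name, forward)
        return cls
    return decorate


@delegate("_d", ["__len__", "__iter__", "__contains__",
                 "__setitem__", "__delitem__", "keys", "values", "items"])
class DefaultOrderedDictSketch:
    """A distinct type with its own constructor, wrapping an ordered dict."""

    def __init__(self, default_factory):
        self._d = OrderedDict()
        self.default_factory = default_factory

    def __getitem__(self, key):
        # Only lookup differs from the wrapped dict: missing keys get a default.
        if key not in self._d:
            self._d[key] = self.default_factory()
        return self._d[key]


groups = DefaultOrderedDictSketch(list)
groups["x"].append(1)
groups["a"].append(2)
groups["x"].append(3)
print(list(groups.items()))  # [('x', [1, 3]), ('a', [2])]
```

The wrapper only spells out the behaviour that differs; everything else is forwarded, which is where the extra verbosity compared with a type alias comes from.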

Feedback and suggestions are very welcome. If all is good, I can squash and merge sometime tomorrow (Tuesday) (or someone else can).

@lindahua
Contributor

I took a brief skim. I'm curious about HashDict: is it only for internal use (I see OrderedDict depends on it), or can it be useful on its own? How is it different from Base's Dict (which also uses a hash table)?

@johnmyleswhite

If you got any use out of @delegate, I'm happy.

@kmsquire
Member Author

Thanks for reviewing, Dahua. HashDict is a modified version of Dict, quite similar to the version I proposed in JuliaLang/julia#4038.

So that the record is more visible, I updated the pull request description with a long explanation of the organization of OrderedDict/HashDict. That said, it's explanatory, not prescriptive, so if you (or anyone) have suggestions or updates, I'd be quite fine with that.

@lindahua
Contributor

This PR looks good to me.

@kmsquire: please go ahead and merge it if you think it is ready. (Let me know if you don't have commit access.)

* OrderedDict (new)
* OrderedSet (new)
* DefaultDict (updated)
* DefaultOrderedDict (new)

The PR for this commit contains a detailed explanation of
the evolution of OrderedDicts
kmsquire added a commit that referenced this pull request Jan 28, 2014
RFC: Addition/update of various dictionary, set variants
@kmsquire kmsquire merged commit c8115b8 into master Jan 28, 2014
@kmsquire kmsquire deleted the kms/dict_variants branch January 28, 2014 21:40
@kmsquire
Member Author

Squashed and merged!

@toivoh

toivoh commented Jan 29, 2014

Thanks for taking the time to go through with this!

@kmsquire
Member Author

My pleasure!


4 participants