RFC: OrderedDict redux #4038

Closed
wants to merge 7 commits into from

kmsquire
Member

This is yet another redo of OrderedDict.

Unlike previous versions, this version grafts ordering directly onto the current Dict infrastructure, with no impact on the performance of unordered dicts.

julia> a = OrderedDict{String, Int}()
Dict{String,Int64,Int64}()

julia> for i = 1:5 
          a[randstring(3)] = i
       end

julia> a
["0MO"=>1,"91h"=>2,"XtG"=>3,"iYN"=>4,"BH5"=>5]

julia> b = (String=>Int)[]
Dict{String,Int64,Nothing}()

julia> for i = 1:5 
          b[randstring(3)] = i
       end

julia> b
["UiE"=>5,"IWn"=>1,"Un0"=>2,"17V"=>4,"PCQ"=>3]

julia> a = OrderedSet(5,4,3,2,1)
Set{Int64,Int64}(5,4,3,2,1)

julia> a = Set(1,2,3,4,5)
Set{Int64,Nothing}(5,4,2,3,1)

Features:

  • adds two arrays to Dict that together maintain the ordering and its mapping to key/value locations in the hashmap.
  • uses element type Nothing for these arrays for unordered dicts (see 42ce806)
    • Unordered is a typealias for Nothing
    • Ordered is a typealias for Int
  • adds ordered versions of Dict functions, reusing Dict code where possible.
  • type abuse: OrderedDict{K,V}() type constructor returns a Dict{K,V,Ordered}
    • this lets us call OrderedDict{K,V}(), and also have OrderedDict() outer constructors
    • ditto with OrderedSet{K}
  • (K=>V)[] creates a Dict{K,V,Unordered}
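
The feature list above can be illustrated with a rough sketch. It is written in current Julia 1.x syntax rather than the syntax this PR targeted, and it is not the patch itself: `ToyDict`, `slotmark`, and `entries` are hypothetical names, the table has a fixed capacity, and deletion is left out. Only the `Unordered`/`Ordered` aliases come from the description. The point is that a single insertion path serves both variants, with the order array's element type deciding whether anything is actually stored.

```julia
# Alias names as described in the PR; everything else here is hypothetical.
const Unordered = Nothing
const Ordered   = Int

# Toy open-addressing table with a third type parameter O.
struct ToyDict{K,V,O}
    used::Vector{Bool}   # slot occupancy flags
    keys::Vector{K}      # key stored in each slot
    vals::Vector{V}      # value stored in each slot
    order::Vector{O}     # one entry per insertion; holds slot indices only if O == Int
end

function ToyDict{K,V,O}(n::Int = 16) where {K,V,O<:Union{Unordered,Ordered}}
    ToyDict{K,V,O}(fill(false, n), Vector{K}(undef, n), Vector{V}(undef, n), O[])
end

# What gets pushed onto `order` for a newly filled slot.
slotmark(::Type{Unordered}, slot::Int) = nothing
slotmark(::Type{Ordered}, slot::Int) = slot

function Base.setindex!(d::ToyDict{K,V,O}, v, k) where {K,V,O}
    n = length(d.keys)
    length(d.order) < n || error("toy table is full (no resizing in this sketch)")
    i = Int(hash(k) % UInt(n)) + 1          # starting slot, 1-based
    while d.used[i]                         # linear probing
        if isequal(d.keys[i], k)
            d.vals[i] = v                   # existing key: overwrite, keep its position
            return d
        end
        i = i % n + 1
    end
    d.used[i] = true
    d.keys[i] = k
    d.vals[i] = v
    push!(d.order, slotmark(O, i))          # same call for both variants
    return d
end

# Ordered: walk the recorded slot indices. Unordered: walk the slots themselves.
entries(d::ToyDict{K,V,Ordered}) where {K,V} =
    [d.keys[i] => d.vals[i] for i in d.order]
entries(d::ToyDict{K,V,Unordered}) where {K,V} =
    [d.keys[i] => d.vals[i] for i in eachindex(d.used) if d.used[i]]

od = ToyDict{String,Int,Ordered}()
ud = ToyDict{String,Int,Unordered}()
for (i, k) in enumerate(["c", "a", "b"])
    od[k] = i
    ud[k] = i
end
```

Here `entries(od)` yields the pairs in insertion order, while `entries(ud)` yields them in slot order, which depends on the hash values.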

Downsides:

  • Requires specifying Ordered or Unordered when calling the main Dict or Set constructor explicitly

Comments:

  • Type parameterization
    • Passing Unordered as a type parameter to the Dict/Set constructors is annoying. For Dict, (K=>V) is a good alternative; there's nothing like that for Set
    • Alternative: could create an UnorderedDict() or UnorderedSet, which is the same amount of typing.
    • Request: @JeffBezanson is it possible to have defaults for unspecified type parameters?
    • Another remote possibility: deprecate {} for Any[], and reserve {} for Set notation?
  • Type system abuse

I'm actually less concerned about the type abuse than the extra type parameter to Dict/Set.

Any and all thoughts/suggestions/comments welcome.

@kmsquire
Member Author

Performance numbers below, after warmup. Interestingly, Dict is slightly faster in this patch, possibly because of some added specialization, but more likely a fluke. The point is mainly that the performance of Dict didn't change, and the performance of OrderedDict is similar.

Before:

test_ins
========
Dict{K,V}: median=0.28366528200000013

test_del
========
Dict{K,V}: median=0.4370309559999964

test_ins_del
============
Dict{K,V}: median=0.49659235499999754

===============================================================================================================

With this patch:
Nothing = Unordered, 
Int64 = Ordered

test_ins
========
Dict{String,Int,Nothing}: median=0.2622292050000018
Dict{String,Int,Int64}: median=0.31364540300000016

test_del
========
Dict{String,Int,Nothing}: median=0.3849849159999983
Dict{String,Int,Int64}: median=0.4319687069999986

test_ins_del
============
Dict{String,Int,Nothing}: median=0.4332843499999996
Dict{String,Int,Int64}: median=0.47339647700000054

Test code is here: https://gist.github.com/kmsquire/6217215

@StefanKarpinski
Member

Never mind, hadn't yet seen that this was part of a pull-request that explains what you're doing.

@StefanKarpinski
Member

Very interesting. This leverages the fact that an Array{Nothing} doesn't take up any room, which is turning out to be a very useful fact.
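
A quick way to check this property, sketched in current Julia 1.x syntax (an assumption; the thread predates it). The variable names are only illustrative.

```julia
# An array of Nothing has a length but no element storage,
# while an Int array of the same length needs sizeof(Int) bytes per element.
markers = Vector{Nothing}(undef, 1_000_000)
indices = Vector{Int}(undef, 1_000_000)

println("Nothing array: length = ", length(markers), ", element bytes = ", sizeof(markers))
println("Int array:     length = ", length(indices), ", element bytes = ", sizeof(indices))
```

Note that `sizeof` reports only the element data, not the fixed per-array header.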

@kmsquire
Member Author

I've been sitting on this for a while, and wanted to get it out. If it weren't for the extra type parameter, this would probably be a no-brainer--doesn't reduce performance, and adds useful functionality. The Ordered/Unordered type parameter makes it a little clunky, so I'm hoping there are ways to make it nicer (such as default type parameter values).

@StefanKarpinski
Member

It would be nice to get this added somewhere, but I wonder if it doesn't belong in DataStructures rather than Base.

@kmsquire
Member Author

That would be fine.

The main annoyance here is that the trivial implementation of OrderedDicts is not very performant (probably because it duplicates a bunch of information already stored by the main Dict type). This version gets around that by reusing most of the existing machinery, with slight modifications. A package version wouldn't have this luxury, and would either be easy to understand (but not have good performance), or copy much of the existing Dict implementation.

@kmsquire closed this Nov 19, 2013

@StefanKarpinski
Member

That's a fair point. Maybe it should just be in Base then.

@kmsquire
Member Author

The main annoyance of the current patch is that Dict{String,Int}() doesn't work (it requires a third type parameter of Ordered or Unordered). If I can get around that, I'll reopen.

@kmsquire
Member Author

Right, it turns out that I looked at that before. For a type with 3 type parameters, there does not seem to be any way currently to call the type constructor by only specifying two.

@kmsquire
Member Author

More details: If I change the basic type constructor to this:

type Dict{K,V,O} <: Associative{K,V}
    ...
    function Dict()
        if !(O <: Union(Unordered, Ordered))
            return Dict{K,V,Unordered}()
        end
        n = 16
        new(zeros(Uint8,n), Array(K,n), Array(V,n), Array(O,n), Array(O,0), 0, 0, identity)
    end
    ...
end

Then this works:

julia> Dict{String,Int,Any}()
Dict{String,Int64,Nothing}()

But this doesn't:

julia> Dict{String,Int}()
ERROR: type cannot be constructed
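
For comparison, the limitation described above applies to the Julia version this thread was written against. In current Julia 1.x, a constructor can be attached to a partially parameterized type, which gives the effect of a default for the missing parameter. The sketch below assumes 1.x syntax; `Tagged` and `OrderedTagged` are hypothetical stand-ins, not the Dict from the patch.

```julia
# Alias names as described in the PR; the types below are hypothetical.
const Unordered = Nothing
const Ordered   = Int

struct Tagged{K,V,O}
    keys::Vector{K}
    vals::Vector{V}
    order::Vector{O}
    # Inner constructor for the fully specified type.
    Tagged{K,V,O}() where {K,V,O} = new{K,V,O}(K[], V[], O[])
end

# Only two parameters given: supply Unordered as the third.
Tagged{K,V}() where {K,V} = Tagged{K,V,Unordered}()

# A parametric alias plays the role of OrderedDict{K,V}.
const OrderedTagged{K,V} = Tagged{K,V,Ordered}

u = Tagged{String,Int}()
o = OrderedTagged{String,Int}()
println(typeof(u))
println(typeof(o))
```

With this, the two-parameter call constructs the unordered variant, and the alias constructs the ordered one without a separate constructor function.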

2 participants