
"Fix" perturbation confusion, ushering in a new age of glorious dual numbers and prosperity #213

Merged: 26 commits, Apr 13, 2017

Conversation

@jrevels (Member) commented Mar 29, 2017

Resolves #83.

This PR overhauls ForwardDiff to support a tagging system. The bulk of the work is done, but it still has a bunch of TODOs:

  • support tuple of output buffers for new n-ary derivative! Work is done, but will be introduced in a separate PR (see comment below)
  • add some more tests
  • write a macro for doing ternary dual number definitions (ugh) and use it for the old hypot methods
  • rebase after FMA all the way down #207 is merged and port over the FMA stuff
  • implement a new deprecation layer + tests
  • get SIMD tests passing (I haven't run them yet, maybe they already do)
  • update docs

Notable changes:

  • Experimental multithreading support has been removed. See the relevant commit for details.

  • Dual{N,V<:Real} is now Dual{T,V<:Real,N}, where T is the tag parameter, V is the value type, and N is the chunk size. Similar parameterization changes were made to the AbstractConfig types.

  • If the tagging system is enabled and perturbation confusion is about to happen, the culprit operation will dispatch to a descriptive error saying "ForwardDiff won't differentiate this safely" (a sketch of the nested-differentiation scenario this guards against appears after this list).

  • (EDIT: This work was completed, but will be refactored/added in a separate PR.) Now that we have a tagging system, it's safe to implement an n-ary derivative (analogous to ReverseDiff's n-ary API, which allows the user to pass a tuple of arguments instead of a single argument). Now we have low-dimensional, stack-allocated gradients:

julia> using ForwardDiff, BenchmarkTools

julia> a, b, c = 1.0, 2.0, 3.0;

# it works (and is fast/non-allocating)!
julia> @btime ForwardDiff.derivative(*, ($a, $b, $c))
  3.072 ns (0 allocations: 0 bytes)
(6.0, 3.0, 2.0)

# here's the objective time for comparison:
julia> @btime *($a, $b, $c)
  1.649 ns (0 allocations: 0 bytes)
6.0

# ...and here's the time for passing in Duals manually:
julia> begin
           da, db, dc = (Dual(a, one(a), zero(a), zero(a)),
                         Dual(b, zero(b), one(b), zero(b)),
                         Dual(c, zero(c), zero(c), one(c)))
           @btime *($da, $db, $dc)
       end
  4.234 ns (0 allocations: 0 bytes)
Dual{Void}(6.0,6.0,3.0,2.0)

My plan is to implement n-ary versions of the other API functions in future PRs.

  • The AbstractConfig constructors have changed, and our friend Chunk has made a comeback (and brought a new buddy, Tag). For example, GradientConfig{N}(x) is now GradientConfig(f, x, Chunk{N}()); here's the actual code for the constructor:
function GradientConfig{V,N,F,T}(::F,
                                 x::AbstractArray{V},
                                 ::Chunk{N} = Chunk(x),
                                 ::T = Tag(F, V))
    seeds = construct_seeds(Partials{N,V})
    duals = similar(x, Dual{T,V,N})
    return GradientConfig{T,V,N,typeof(duals)}(seeds, duals)
end
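
For illustration, here is a minimal usage sketch of the new constructor (not from the PR; the function, input, and chunk size 4 are arbitrary placeholders), showing how an explicit chunk size is now passed positionally via Chunk:

using ForwardDiff

f = x -> sum(abs2, x)                                            # placeholder target function
x = rand(10)                                                     # placeholder input vector
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{4}())   # explicit chunk size of 4
ForwardDiff.gradient(f, x, cfg)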

As the constructor signature above shows, the user is now expected to pass in the target function for which the GradientConfig will be used to perform differentiation. This function, along with a hash of the input element type, is used to generate a Tag.

Note the implication here: configuration instances are now tied to the function being differentiated. If you try to reuse a configuration instance constructed with one function to differentiate a different function, you'll get an error:

julia> x = rand(3);

julia> cfg = ForwardDiff.GradientConfig(sum, x);

# works fine:
julia> ForwardDiff.gradient(sum, x, cfg)
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

# won't work:
julia> ForwardDiff.gradient(prod, x, cfg)
ERROR: The provided configuration (of type ForwardDiff.GradientConfig{ForwardDiff.Tag{Base.#sum,0},Float64,3,Array{ForwardDiff.Dual{ForwardDiff.Tag{Base.#sum,0},Float64,3},1}}) was constructed for a function other than the current target function. ForwardDiff cannot safely perform differentiation in this context; see the following issue for details: https://github.com/JuliaDiff/ForwardDiff.jl/issues/83. You can resolve this problem by constructing and using a configuration with the appropriate target function, e.g. `ForwardDiff.GradientConfig(prod, x)`
Stacktrace:
 [1] gradient(::Function, ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{Base.#sum,0},Float64,3,Array{ForwardDiff.Dual{ForwardDiff.Tag{Base.#sum,0},Float64,3},1}}) at /Users/jarrettrevels/.julia/v0.6/ForwardDiff/src/gradient.jl:7

Don't worry - if you are currently reusing ForwardDiff configurations between different target functions, you can still do so by passing nothing as the function argument:

julia> cfg = ForwardDiff.GradientConfig(nothing, x);

julia> ForwardDiff.gradient(sum, x, cfg)
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

julia> ForwardDiff.gradient(prod, x, cfg)
3-element Array{Float64,1}:
 0.0202222
 0.829584
 0.0220355

Note that this (halfway) opts out of the tagging system - a tag is still created from a hash of the element type, but does not incorporate the target function's type. Thus, the cost for this convenience is that you're a bit "less safe" w.r.t. perturbation confusion.
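
To make the motivating failure mode concrete, here is a minimal sketch (adapted from the example discussed in #83; not part of this PR's diff) of the kind of nested call the tagging system is designed to keep honest:

using ForwardDiff

# The inner derivative closes over the outer variable x, so without tags the
# inner and outer perturbations can be silently conflated. The mathematically
# correct result is d/dx (x * x) = 2x, i.e. 2.0 at x = 1.0.
ForwardDiff.derivative(x -> x * ForwardDiff.derivative(y -> x * y, 1.0), 1.0)

With the tagging system, the inner and outer levels carry distinct tags, so their perturbations can no longer be mixed up; in cases the system cannot disambiguate, the descriptive error mentioned above is thrown rather than a silently wrong answer being returned.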

@papamarkou (Contributor)

In terms of performance, is the preferred method for instantiating configs to tie the config to the differentiated function, or does it not matter speed-wise if one uses nothing as the function argument to the config?

@jrevels (Member, Author) commented Mar 29, 2017

In terms of performance, is the preferred method for instantiating configs to tie the config to the differentiated function, or does it not matter speed-wise if one uses nothing as the function argument to the config?

The latter - passing in the function vs. passing nothing doesn't affect speed at all. The difference is the degree of "safety" you're given by the tagging system. Passing in the function is "safer" because it makes the tag more unique, and thus allows the system to catch perturbation confusion in more cases.
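
For readers who want to check this claim locally, a minimal benchmarking sketch (not from the thread; prod and the input size are arbitrary choices) comparing the two construction styles:

using ForwardDiff, BenchmarkTools

x = rand(10)
cfg_tagged   = ForwardDiff.GradientConfig(prod, x)     # tag derived from typeof(prod) and the element type
cfg_untagged = ForwardDiff.GradientConfig(nothing, x)  # tag derived from the element type only
@btime ForwardDiff.gradient(prod, $x, $cfg_tagged)
@btime ForwardDiff.gradient(prod, $x, $cfg_untagged)

Both calls should report essentially identical times; the function argument only affects how specific the tag is, not the speed of the differentiation itself.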

@jrevels force-pushed the jr/perturb branch 2 times, most recently from 0634627 to 65b273d on March 30, 2017
@jrevels force-pushed the jr/perturb branch 2 times, most recently from e8310ec to dca95b4 on April 7, 2017
@jrevels (Member, Author) commented Apr 7, 2017

Trying to resolve nested dual-number method ambiguities on Julia v0.5 is turning out to be a pretty hellish endeavor. Julia v0.6 doesn't seem to have these problems, however. It's enough of a struggle that I'm quite tempted to just drop Julia v0.5 support from this PR and, moving forward, maintain a separate release branch for ForwardDiff v0.4.x, which would still support Julia v0.5.

ForwardDiff v0.5.0 (which would include this PR, and require Julia v0.6 and up) can then be tagged once Julia v0.6 is released.

Does anybody have a very good reason for me not to do this? cc @mlubin @tkelman

For the time being, I'm only going to develop this branch against Julia v0.6.

@mlubin (Contributor) commented Apr 7, 2017

Fine with me

jrevels added 17 commits April 11, 2017 12:51
This enables us to throw the proper error(s) in case of perturbation confusion
…moving experimental multithreading functionality

If you have a use case where the input is large enough for multithreading to be useful,
then you should probably be using ReverseDiff instead of ForwardDiff. The exception to
this is jacobian calculation, but ForwardDiff.jacobian never supported multithreading
anyway. Once Julia's multithreading model matures, we can consider implementing
multithreading for ForwardDiff.jacobian, but we probably should never again support
it for gradients.
@jrevels jrevels changed the title [WIP] "Fix" perturbation confusion, ushering in a new age of glorious dual numbers and prosperity "Fix" perturbation confusion, ushering in a new age of glorious dual numbers and prosperity Apr 12, 2017
@jrevels (Member, Author) commented Apr 12, 2017

This is basically finished, assuming Travis passes.

@jrevels (Member, Author) commented Apr 13, 2017

Hmm. Actually, I'm going to change one thing about this PR before merging: instead of the low-dimensional gradients being accessible via ForwardDiff.derivative, I'm going to make them accessible via ForwardDiff.gradient.

@jrevels (Member, Author) commented Apr 13, 2017

Alright, I'm realizing the API for the stack-allocated gradient/Jacobian could make more sense if I just use StaticArrays instead of Tuples. That would be too much for this PR, though. Thus, I'm going to factor out the stack-allocated API work and do it in a separate PR later.
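
For readers unfamiliar with the trade-off being weighed here, a StaticArrays-based result would stay stack-allocated like a Tuple while still behaving like an array. A minimal illustration of the container semantics (this is just the StaticArrays package, not ForwardDiff API):

using StaticArrays

g = SVector(6.0, 3.0, 2.0)  # fixed-size, immutable, stack-allocated
g[2]                        # indexes like an ordinary array
g + g                       # supports vector arithmetic; a Tuple does not define +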
