
"Fix" perturbation confusion, ushering in a new age of glorious dual numbers and prosperity #213

Merged: 26 commits, Apr 13, 2017

Conversation

@jrevels (Member) commented Mar 29, 2017

Resolves #83.

This PR overhauls ForwardDiff to support a tagging system. The bulk of the work is done, but it still has a bunch of TODOs:

  • support tuple of output buffers for new n-ary derivative! Work is done, but will be introduced in a separate PR (see comment below)
  • add some more tests
  • write a macro for doing ternary dual number definitions (ugh) and use it for the old hypot methods
  • rebase after FMA all the way down #207 is merged and port over the FMA stuff
  • implement a new deprecation layer + tests
  • get SIMD tests passing (I haven't run them yet, maybe they already do)
  • update docs

Notable changes:

  • Experimental multithreading support has been removed. See the relevant commit for details.

  • Dual{N,V<:Real} is now Dual{T,V<:Real,N}, where T is the tag parameter, V is the value type, and N is the chunk size. Similar parameterization changes were made to the AbstractConfig types.

  • If the tagging system is enabled and perturbation confusion is about to happen, the culprit operation will dispatch to a descriptive error saying "ForwardDiff won't differentiate this safely" (a sketch of the nested-differentiation scenario this guards against appears after this list).

  • (EDIT: This work was completed, but will be refactored/added in a separate PR.) Now that we have a tagging system, it's safe to implement an n-ary derivative (analogous to ReverseDiff's n-ary API, which allows the user to pass a tuple of arguments instead of a single argument). Now we have low-dimensional, stack-allocated gradients:

julia> using ForwardDiff, BenchmarkTools

julia> a, b, c = 1.0, 2.0, 3.0;

# it works (and is fast/non-allocating)!
julia> @btime ForwardDiff.derivative(*, ($a, $b, $c))
  3.072 ns (0 allocations: 0 bytes)
(6.0, 3.0, 2.0)

# here's the objective time for comparison:
julia> @btime *($a, $b, $c)
  1.649 ns (0 allocations: 0 bytes)
6.0

# ...and here's the time for passing in Duals manually:
julia> begin
           da, db, dc = (Dual(a, one(a), zero(a), zero(a)),
                         Dual(b, zero(b), one(b), zero(b)),
                         Dual(c, zero(c), zero(c), one(c)))
           @btime *($da, $db, $dc)
       end
  4.234 ns (0 allocations: 0 bytes)
Dual{Void}(6.0,6.0,3.0,2.0)

My plan is to implement n-ary versions of the other API functions in future PRs.

  • The AbstractConfig constructors have changed, and our friend Chunk has made a comeback (and brought a new buddy, Tag). For example, GradientConfig{N}(x) is now GradientConfig(f, x, Chunk{N}()); here's the actual code for the constructor:
function GradientConfig{V,N,F,T}(::F,
                                 x::AbstractArray{V},
                                 ::Chunk{N} = Chunk(x),
                                 ::T = Tag(F, V))
    seeds = construct_seeds(Partials{N,V})
    duals = similar(x, Dual{T,V,N})
    return GradientConfig{T,V,N,typeof(duals)}(seeds, duals)
end
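
For illustration, here is a minimal usage sketch of the new constructor (not from the PR; the function, input, and chunk size 4 are arbitrary placeholders), showing how an explicit chunk size is now passed positionally via Chunk:

using ForwardDiff

f = x -> sum(abs2, x)                                            # placeholder target function
x = rand(10)                                                     # placeholder input vector
cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{4}())   # explicit chunk size of 4
ForwardDiff.gradient(f, x, cfg)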

As the constructor signature above shows, the user is now expected to pass in the target function for which the GradientConfig will be used to perform differentiation. This function, along with a hash of the input element type, is used to generate a Tag.

Note the implication here: configuration instances are now tied to the function being differentiated. If you try to reuse a configuration instance constructed with one function to differentiate a different function, you'll get an error:

julia> x = rand(3);

julia> cfg = ForwardDiff.GradientConfig(sum, x);

# works fine:
julia> ForwardDiff.gradient(sum, x, cfg)
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

# won't work:
julia> ForwardDiff.gradient(prod, x, cfg)
ERROR: The provided configuration (of type ForwardDiff.GradientConfig{ForwardDiff.Tag{Base.#sum,0},Float64,3,Array{ForwardDiff.Dual{ForwardDiff.Tag{Base.#sum,0},Float64,3},1}}) was constructed for a function other than the current target function. ForwardDiff cannot safely perform differentiation in this context; see the following issue for details: https://github.com/JuliaDiff/ForwardDiff.jl/issues/83. You can resolve this problem by constructing and using a configuration with the appropriate target function, e.g. `ForwardDiff.GradientConfig(prod, x)`
Stacktrace:
 [1] gradient(::Function, ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{Base.#sum,0},Float64,3,Array{ForwardDiff.Dual{ForwardDiff.Tag{Base.#sum,0},Float64,3},1}}) at /Users/jarrettrevels/.julia/v0.6/ForwardDiff/src/gradient.jl:7

Don't worry - if you are currently reusing ForwardDiff configurations between different target functions, you can still do so by passing nothing as the function argument:

julia> cfg = ForwardDiff.GradientConfig(nothing, x);

julia> ForwardDiff.gradient(sum, x, cfg)
3-element Array{Float64,1}:
 1.0
 1.0
 1.0

julia> ForwardDiff.gradient(prod, x, cfg)
3-element Array{Float64,1}:
 0.0202222
 0.829584
 0.0220355

Note that this (halfway) opts out of the tagging system - a tag is still created from a hash of the element type, but does not incorporate the target function's type. Thus, the cost for this convenience is that you're a bit "less safe" w.r.t. perturbation confusion.
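
To make the motivating failure mode concrete, here is a minimal sketch (adapted from the example discussed in #83; not part of this PR's diff) of the kind of nested call the tagging system is designed to keep honest:

using ForwardDiff

# The inner derivative closes over the outer variable x, so without tags the
# inner and outer perturbations can be silently conflated. The mathematically
# correct result is d/dx (x * x) = 2x, i.e. 2.0 at x = 1.0.
ForwardDiff.derivative(x -> x * ForwardDiff.derivative(y -> x * y, 1.0), 1.0)

With the tagging system, the inner and outer levels carry distinct tags, so their perturbations can no longer be mixed up; in cases the system cannot disambiguate, the descriptive error mentioned above is thrown rather than a silently wrong answer being returned.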

@papamarkou (Contributor)

In terms of performance, is the preferred method for instantiating configs to tie the config to the differentiated function, or does it not matter speed-wise if one uses nothing as the function argument to the config?

@jrevels (Member, Author) commented Mar 29, 2017

In terms of performance, is the preferred method for instantiating configs to tie the config to the differentiated function, or does it not matter speed-wise if one uses nothing as the function argument to the config?

The latter - passing in the function vs. passing nothing doesn't affect speed at all. The difference is the degree of "safety" you're given by the tagging system. Passing in the function is "safer" because it makes the tag more unique, and thus allows the system to catch perturbation confusion in more cases.
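
For readers who want to check this claim locally, a minimal benchmarking sketch (not from the thread; prod and the input size are arbitrary choices) comparing the two construction styles:

using ForwardDiff, BenchmarkTools

x = rand(10)
cfg_tagged   = ForwardDiff.GradientConfig(prod, x)     # tag derived from typeof(prod) and the element type
cfg_untagged = ForwardDiff.GradientConfig(nothing, x)  # tag derived from the element type only
@btime ForwardDiff.gradient(prod, $x, $cfg_tagged)
@btime ForwardDiff.gradient(prod, $x, $cfg_untagged)

Both calls should report essentially identical times; the function argument only affects how specific the tag is, not the speed of the differentiation itself.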

@jrevels force-pushed the jr/perturb branch 2 times, most recently from 0634627 to 65b273d on March 30, 2017
@jrevels force-pushed the jr/perturb branch 2 times, most recently from e8310ec to dca95b4 on April 7, 2017
@jrevels (Member, Author) commented Apr 7, 2017

Trying to resolve nested dual-number method ambiguities on Julia v0.5 is turning out to be a pretty hellish endeavor. Julia v0.6 doesn't seem to have these problems, however. It's enough of a struggle that I'm quite tempted to just drop Julia v0.5 support from this PR and, moving forward, maintain a separate release branch for ForwardDiff v0.4.x, which would still support Julia v0.5.

ForwardDiff v0.5.0 (which would include this PR, and require Julia v0.6 and up) can then be tagged once Julia v0.6 is released.

Does anybody have a very good reason for me not to do this? cc @mlubin @tkelman

For the time being, I'm only going to develop this branch against Julia v0.6.

@mlubin (Contributor) commented Apr 7, 2017

Fine with me

jrevels added 17 commits April 11, 2017 12:51
This enables us to throw the proper error(s) in case of perturbation confusion
…moving experimental multithreading functionality

If you have a use case where the input is large enough for multithreading to be useful,
then you should probably be using ReverseDiff instead of ForwardDiff. The exception to
this is jacobian calculation, but ForwardDiff.jacobian never supported multithreading
anyway. Once Julia's multithreading model matures, we can consider implementing
multithreading for ForwardDiff.jacobian, but we probably should never again support
it for gradients.
@jrevels jrevels changed the title [WIP] "Fix" perturbation confusion, ushering in a new age of glorious dual numbers and prosperity "Fix" perturbation confusion, ushering in a new age of glorious dual numbers and prosperity Apr 12, 2017
@jrevels (Member, Author) commented Apr 12, 2017

This is basically finished, assuming Travis passes.

@jrevels (Member, Author) commented Apr 13, 2017

Hmm. Actually, I'm going to change one thing about this PR before merging: instead of the low-dimensional gradients being accessible via ForwardDiff.derivative, I'm going to make them accessible via ForwardDiff.gradient.

@jrevels (Member, Author) commented Apr 13, 2017

Alright, I'm realizing the API for the stack-allocated gradient/Jacobian could make more sense if I just use StaticArrays instead of Tuples. That would be too much for this PR, though. Thus, I'm going to factor out the stack-allocated API work and do it in a separate PR later.
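
For readers unfamiliar with the trade-off being weighed here, a StaticArrays-based result would stay stack-allocated like a Tuple while still behaving like an array. A minimal illustration of the container semantics (this is just the StaticArrays package, not ForwardDiff API):

using StaticArrays

g = SVector(6.0, 3.0, 2.0)  # fixed-size, immutable, stack-allocated
g[2]                        # indexes like an ordinary array
g + g                       # supports vector arithmetic; a Tuple does not define +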
