
Immutability of distributions #482

Closed
papamarkou opened this issue Apr 8, 2016 · 20 comments

Comments

@papamarkou
Contributor

In many real applications it becomes crucial to be able to tweak a distribution. For example, in various Monte Carlo methods (Gibbs sampling with conditional distributions representing block updates, Metropolis-Hastings proposals), one needs to retain the same type of distribution and only update its parameters. A very concrete example would be updating the mean μ of d = Normal(). It would be more reasonable to retain the same distribution and simply do d.μ = 1.2, instead of doing d = Normal(1.2). The problem with the second solution is that one needs to reallocate the distribution at each iteration - so imagine 100,000 reallocations of a normal if one has 100,000 MCMC iterations.

I thought of two workarounds: we can simply turn the distributions in Distributions from immutable to type or, if you feel that this is not a good idea, I could define a new package with mutable distributions, say MutableDistributions.jl. I would favor the former, in order to avoid populating Julia's package ecosystem with multiple distribution-centric packages, but if you feel that turning the distributions into mutable types is too big a compromise, I am happy to set up a separate package with mutable distributions.

@KristofferC
Member

These reallocations would most likely be optimized away by the compiler. I would guess that you would see a performance loss by changing to a type, because the instances would no longer be stored inline and the compiler could make fewer assumptions.
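The inline-storage point can be illustrated directly (a sketch in current Julia syntax, `struct`/`mutable struct` rather than the `immutable`/`type` keywords used elsewhere in this thread; the type names are made up for illustration):

```julia
struct ImmNormal        # immutable: two Float64 fields, no pointers
    μ::Float64
    σ::Float64
end

mutable struct MutNormal   # mutable: instances have identity and live on the heap
    μ::Float64
    σ::Float64
end

@assert isbitstype(ImmNormal)    # can be stack-allocated and stored inline in arrays
@assert !isbitstype(MutNormal)   # stored by reference; arrays of it hold pointers

# An array of the immutable type stores the 16-byte values themselves:
@assert sizeof(Vector{ImmNormal}(undef, 10)) == 160
```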

@papamarkou
Contributor Author

Interesting. I tried the following test from the REPL (after enclosing the test code in functions), and it turns out that you may well be right. The first run showed that mutating the mean of a mutable normal is faster, but subsequent runs show that reallocating immutable normals is at least as fast, presumably after LLVM's magic kicks in:

julia> using Distributions

julia> type MutableNormal{T <: Real}
         μ::T
         σ::T
       end

julia> function set_immutable_pdf(n::Int)
         for i in 1:n
           d = Normal(1.2)
         end
       end
set_immutable_pdf (generic function with 1 method)

julia> function set_mutable_pdf_mean(n::Int, d)
         for i in 1:n
           d.μ = .2
         end
       end
set_mutable_pdf_mean (generic function with 1 method)

julia> n = 1000000
1000000

julia> d = MutableNormal(0., 1.)
MutableNormal{Float64}(0.0,1.0)

julia> @time set_immutable_pdf(n)
  0.008659 seconds (12.04 k allocations: 512.825 KB)

julia> @time set_mutable_pdf_mean(n, d)
  0.002381 seconds (3.95 k allocations: 156.292 KB)

julia> @time set_immutable_pdf(n)
  0.000002 seconds (4 allocations: 160 bytes)

julia> @time set_mutable_pdf_mean(n, d)
  0.000003 seconds (4 allocations: 160 bytes)

@papamarkou
Contributor Author

Just for the record, I also tried setting the mean to a different value at each iteration, to see if the compiler will remain effective, so the following still worked:

julia> using Distributions

julia> type MutableNormal{T <: Real}
         μ::T
         σ::T
       end

julia> function set_immutable_pdf(v::Vector{Float64}, n::Int)
         for i in 1:n
           d = Normal(v[i])
         end
       end
set_immutable_pdf (generic function with 1 method)

julia> function set_mutable_pdf_mean(v::Vector{Float64}, n::Int, d)
         for i in 1:n
           d.μ = v[i]
         end
       end
set_mutable_pdf_mean (generic function with 1 method)

julia> n = 1000000
1000000

julia> v = randn(n);

julia> d = MutableNormal(0., 1.)
MutableNormal{Float64}(0.0,1.0)

julia> @time set_immutable_pdf(v, n)
  0.008959 seconds (12.16 k allocations: 513.358 KB)

julia> @time set_mutable_pdf_mean(v, n, d)
  0.004503 seconds (4.12 k allocations: 163.585 KB)

julia> n = 1000000
1000000

julia> v = randn(n);

julia> d = MutableNormal(0., 1.)
MutableNormal{Float64}(0.0,1.0)

julia> @time set_immutable_pdf(v, n)
  0.000347 seconds (4 allocations: 160 bytes)

julia> @time set_mutable_pdf_mean(v, n, d)
  0.000831 seconds (4 allocations: 160 bytes)

@KristofferC
Member

julia> @code_llvm set_immutable_pdf(n)

define void @julia_set_immutable_pdf_32458(i64) #0 {
top:
  ret void
}

In this case, LLVM optimized away the whole function because it didn't actually have any effect on anything. The same happens in some of the other functions you are trying to benchmark.

@papamarkou
Contributor Author

One more question. What happens if I use a multivariate instead of a univariate distribution? Won't there be unnecessary repeated allocations/deallocations of the parameters of the distribution? I will try it out this evening, and will post the code here.

@KristofferC
Member

For multivariates you can already update the means and covariances in place since they are just vectors and matrices.
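For instance (a sketch in current Julia/Distributions syntax; note that `d.μ` is an internal field rather than public API, so this relies on an implementation detail):

```julia
using Distributions, LinearAlgebra

d = MvNormal(zeros(2), Matrix{Float64}(I, 2, 2))

# Overwrite the mean vector in place; no new MvNormal is allocated:
copyto!(d.μ, [1.5, -0.5])

mean(d)   # now reflects the updated mean [1.5, -0.5]
```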

@papamarkou
Contributor Author

Good point - you actually answered a long-standing question of mine. I had been wondering whether it is legitimate (both conceptually and in Julian terms) to change a field of type Vector in an immutable type (for example, whether it is ok to change the mean μ of a multivariate normal). I was aware it is possible, since the field is only a reference to the vector; I just had the "philosophical" concern that it is somehow "wrong" to change the data behind a field of an immutable type, because it means we actually mutate a supposedly immutable object...

@papamarkou
Contributor Author

So, I ran the simple simulation I had in mind. My first observation/reminder is that for a multivariate normal it is simply impossible to do

julia> using Distributions

julia> d = MvNormal(ones(2), ones(2))
DiagNormal(
dim: 2
μ: [1.0,1.0]
Σ: 2x2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
)

julia> d.μ = rand(2)
ERROR: type MvNormal is immutable

The fix is to use a loop, but look how bad the performance becomes, even though reallocating an array is avoided:

julia> using Distributions

julia> type MyNormal
         μ::Vector{Float64}
       end

julia> function set_immutable_pdf(v::Vector{Float64}, n::Int, d)
         for i in 1:n
           for j in 1:n
             d.μ[j] = v[j]
           end
         end
       end
set_immutable_pdf (generic function with 1 method)

julia> function set_mutable_pdf_mean(v::Vector{Float64}, n::Int, d)
         for i in 1:n
           d.μ = v
         end
       end
set_mutable_pdf_mean (generic function with 1 method)

julia> n = 10000
10000

julia> v = randn(n);

julia> d = MyNormal(v);

julia> id = MvNormal(v, ones(length(v)));

julia> @time set_immutable_pdf(v, n, id)
  0.097023 seconds (3.67 k allocations: 213.163 KB)

julia> @time set_mutable_pdf_mean(v, n, d)
  0.002242 seconds (2.49 k allocations: 111.813 KB)

julia> v = randn(n);

julia> d = MyNormal(v);

julia> id = MvNormal(v, ones(length(v)));

julia> @time set_immutable_pdf(v, n, id)
  0.093223 seconds (4 allocations: 160 bytes)

julia> @time set_mutable_pdf_mean(v, n, d)
  0.000007 seconds (4 allocations: 160 bytes)

@papamarkou
Contributor Author

Obviously this performance gap comes from the fact that the mutable multivariate normal (which is not really a normal distribution, but that is beside the point) allows one to simply swap the reference stored in μ. In the case of the immutable MvNormal, the values of the vector are copied one by one.

@papamarkou papamarkou reopened this Apr 8, 2016
@KristofferC
Member

You can just create new MvNormals from the vector instead of copying, MvNormal(v, d.Σ).

Switching from immutable to type will lead to no good. The sparse matrix library in Base has been haunted by performance problems because SparseMatrixCSC is a type, which means, for example, that field accesses can't be hoisted. It would mean worse performance all around.

@eford

eford commented Apr 8, 2016

  1. What's the point of the i loop? If it's just for timing purposes, I think that you should use an outer loop.
  2. I think we want it to copy the values of the vector one by one. Otherwise, a subsequent change to the mean vector that the MvNormal points to would also result in the mean of the distribution changing, even after one thought they had set the mean of the distribution.

@papamarkou
Contributor Author

@KristofferC, I had tried what you proposed (i.e. MvNormal(v, d.Σ)), and it is still inefficient, simply because of GC (with an immutable type, about 25% of the simulation is spent on reallocations). I completely agree that there is the more general concern of performance across the board, and I consider it important indeed. My main point is that there are different use cases with different computing needs; problems with large numbers of iterations cannot be handled with an immutable distribution type (or at least that's what I thought - more on this further below).

@eford, yes, the outer loop was used as a test for measuring performance. I agree with you that a reference hides the danger of being misused by mistake, yet when it is used intentionally and carefully it provides better performance than copying elements one by one (and it is indeed possible to use references safely in the case of MCMC, for instance).

Now, here is the solution to achieve the end goal of touching the reference without reallocations or copying of values:

julia> using Distributions

julia> m = [2., 3.]
2-element Array{Float64,1}:
 2.0
 3.0

julia> d = MvNormal(m, ones(2))
DiagNormal(
dim: 2
μ: [2.0,3.0]
Σ: 2x2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
)


julia> m[1] = 10.
10.0

julia> d
DiagNormal(
dim: 2
μ: [10.0,3.0]
Σ: 2x2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
)

So I can simply keep changing m; d.μ is a reference to m so it changes along with m.

@papamarkou
Contributor Author

Having found the above workaround, I can now close the issue.

@papamarkou
Contributor Author

After thinking a bit more about the problem, I realised that the solution I provided above is ok as long as one changes values of m elementwise, which is not what happens in MCMC, simply because in the context of MCMC m would be sampled all at once as a vector.
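One way around this, sketched below in current syntax: sample into a temporary vector and copy it into m element by element with copyto!, which keeps the whole-vector update while preserving the aliasing between m and d.μ (whether the constructor aliases or copies μ is, again, an implementation detail):

```julia
using Distributions, LinearAlgebra

m = [2.0, 3.0]
d = MvNormal(m, Matrix{Float64}(I, 2, 2))   # d.μ aliases m

proposal = rand(d)      # sampled all at once as a vector, as in MCMC
copyto!(m, proposal)    # in-place copy: m is not rebound to a new array,
                        # so d.μ now holds the proposal's values
```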

I agree with your thoughts about performance across the board, @KristofferC; it makes sense to keep the distributions in Distributions immutable, so as not to lose performance in use cases other than the ones I describe above. For this reason I will keep the current issue closed.

The only alternatives I can think of for solving the computational problem I am dealing with are either to copy all the types from Distributions into a new repo called MutableDistributions and try to somehow connect it to Distributions, or to copy the Distributions code into Lora, where I can allow the new distribution types to be mutable (in case the former solution is not so favorable in terms of registering a new package).

@KristofferC
Member

Do you have any real code that you have benchmarked with?

@papamarkou
Contributor Author

Only toy examples @KristofferC, but good point, let me set up a proper Gibbs example involving multivariate distributions to see it in practice rather than speculating via satellite examples of smaller caliber.

@johnmyleswhite
Member

The only alternatives I can think of for solving the computational problem I am dealing with are either to copy all the types from Distributions into a new repo called MutableDistributions and try to somehow connect it to Distributions, or to copy the Distributions code into Lora, where I can allow the new distribution types to be mutable (in case the former solution is not so favorable in terms of registering a new package).

Please don't do this. I think there are two cases where you would hit problems, and both have better solutions than introducing mutable types. Mutable is a terrible name for that kind of type in this context: the important point about those types isn't their mutability, it's the fact that they have identity -- with mutable types you can distinguish a and b in the example below:

a = Normal(0, 1)
b = Normal(0, 1)

Making distributions mutable is in many ways equivalent to making them container types, which I think is more clearly bad than "making them mutable" sounds.
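In current syntax (struct vs. mutable struct, replacing immutable vs. type), the identity distinction is visible through ===; the type names below are made up for illustration:

```julia
struct ImmutablePoint    # immutable: equal fields ⇒ indistinguishable instances
    x::Float64
end

mutable struct MutablePoint   # mutable: every instance has its own identity
    x::Float64
end

@assert ImmutablePoint(1.0) === ImmutablePoint(1.0)   # compared by contents
@assert MutablePoint(1.0) !== MutablePoint(1.0)       # compared by identity
```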

The two issues I see are:

  • The lack of stack allocation for immutables that are not pointer-free can hurt performance, but that will be fixed someday before 1.0.
  • Many of our constructors run lots of checks that we might want to skip. If that's a problem, we should try to find a way to make it possible to skip input validation when constructing distributions.
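Distributions.jl did eventually grow exactly this escape hatch: a check_args keyword on the constructors (a later addition, not available in the version discussed in this thread):

```julia
using Distributions

# With check_args=false the constructor skips its argument checks,
# so e.g. a negative σ is accepted without raising a DomainError:
d = Normal(0.0, -1.0; check_args=false)
params(d)   # (0.0, -1.0)
```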

@andreasnoack
Member

After thinking a bit more about the problem, I realised that the solution I provided above is ok as long as one changes values of m elementwise, which is not what happens in MCMC, simply because in the context of MCMC m would be sampled all at once as a vector.

Wouldn't it be an option just to sample the random variables directly into the storage of the variable, e.g. something like

julia> d = MvNormal(rand(1000), randn(1000,1000) |> t -> t't);

julia> randn!(d.μ);

@simonbyrne
Member

I think the best solution to this would be resolving JuliaLang/julia#11902

@papamarkou
Contributor Author

Yes, @johnmyleswhite, it would be nice to have the option to "turn off" @check_args in the internal constructors of distributions. If that's ok, I will open a standalone issue about this matter.

@simonbyrne thanks for the link! I read it, but remain a bit unclear about what the plan regarding immutability is. I haven't understood whether it will ultimately allow updating the fields of an immutable type in place (if yes, this would solve the Distributions issue associated with Gibbs sampling).

@andreasnoack your solution seems to do what is needed for updating the parameters y of a conditional distribution p(x|y) (as needed for Gibbs)...! At the same time, it sounds like I might have to wait and see how JuliaLang/julia#11902 is resolved...
