
extrapolation fails with MethodError: Cannot convert an object of type Missing #424

Closed
ederag opened this issue May 17, 2021 · 12 comments
@ederag
Contributor

ederag commented May 17, 2021

Extrapolation is not working.

```julia
julia> interpolant = LinearInterpolation([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], extrapolation_bc=missing)
3-element extrapolate(interpolate((::Vector{Float64},), ::Vector{Float64}, Gridded(Linear())), missing) with element type Union{Missing, Float64}:
 1.0
 2.0
 3.0

julia> interpolant(4.0)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:180
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:136
  ...
Stacktrace:
 [1] (::Interpolations.FilledExtrapolation{Union{Missing, Float64}, 1, Interpolations.GriddedInterpolation{Float64, 1, Float64, Gridded{Linear}, Tuple{Vector{Float64}}}, Gridded{Linear}, Missing})(x::Float64)
   @ Interpolations ~/.julia/packages/Interpolations/GIn2o/src/extrapolation/filled.jl:37
 [2] top-level scope
   @ REPL[18]:1
```

This might be related to the "something is not working well there" comment in #404 (comment)?


Long story: I ended up here after a redirection from fonsp/Pluto.jl#1163 (comment),
because using extrapolation_bc=missing gave a "Failed to show" error in Pluto
(not important in itself, just in case someone else searches here for isassigned):

```julia
julia> using Interpolations

julia> interpolant = LinearInterpolation([100, 110, 120], [100, 110, 120], extrapolation_bc=missing)

julia> isassigned(interpolant, 1)
ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type Int64
Closest candidates are:
  convert(::Type{T}, ::Ptr) where T<:Integer at pointer.jl:23
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:250
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:180
  ...
Stacktrace:
 [1] FilledExtrapolation
   @ ~/.julia/packages/Interpolations/GIn2o/src/extrapolation/filled.jl:37 [inlined]
 [2] getindex
   @ ~/.julia/packages/Interpolations/GIn2o/src/Interpolations.jl:449 [inlined]
 [3] isassigned(a::Interpolations.FilledExtrapolation{Union{Missing, Float64}, 1, Interpolations.GriddedInterpolation{Float64, 1, Int64, Gridded{Linear}, Tuple{Vector{Int64}}}, Gridded{Linear}, Missing}, i::Int64)
   @ Base ./abstractarray.jl:513
 [4] top-level scope
   @ REPL[5]:1
```
`isassigned` is used by Pluto, and should return a boolean.

abstractarray.jl:513 is

```julia
function isassigned(a::AbstractArray, i::Integer...)
    try
        a[i...]
```

and indeed interpolant[i] or getindex(interpolant, 1) gives the same error.
getindex is overridden here:

```julia
# getindex is supported only for Integer indices (deliberately)
import Base: getindex
@propagate_inbounds getindex(itp::AbstractInterpolation{T,N}, i::Vararg{Integer,N}) where {T,N} = itp(i...)
```
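For what it's worth, the failure mode can be illustrated without Interpolations.jl at all (a minimal sketch, not the package internals): the fill path effectively converts the boundary value to the data's element type, and `missing` has no conversion to a concrete numeric type unless the eltype is widened.

```julia
# Hedged sketch of the failure mode, independent of Interpolations.jl:
# converting `missing` to a concrete numeric type throws the same
# MethodError seen in the stacktraces above, while converting to a
# widened Union eltype succeeds.
err = try
    convert(Float64, missing)   # throws MethodError
catch e
    e
end
println(err isa MethodError)                                    # true
println(convert(Union{Missing, Float64}, missing) === missing)  # true
```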

@mkitti
Collaborator

mkitti commented May 17, 2021

Try this:

```julia
julia> f64m = Union{Float64, Missing}
Union{Missing, Float64}

julia> interpolant = LinearInterpolation([1.0, 2.0, 3.0], f64m[1.0, 2.0, 3.0], extrapolation_bc=missing)
3-element extrapolate(interpolate((::Vector{Float64},), ::Vector{Union{Missing, Float64}}, Gridded(Linear())), missing) with element type Union{Missing, Float64}:
 1.0
 2.0
 3.0

julia> interpolant(4.0)
missing
```

@ederag
Contributor Author

ederag commented May 17, 2021

Indeed, that works,
although I'd have expected the return type of an interpolant built with LinearInterpolation(x, y; extrapolation_bc=extrap)
to be Union{eltype(y), typeof(extrap)}:

```julia
julia> eltype(y)
Float64

julia> extrap = missing;

julia> Union{eltype(y), typeof(extrap)}
Union{Missing, Float64}

julia> extrap = 0.0;

julia> Union{eltype(y), typeof(extrap)}
Float64
```

Or it should be documented in the extrapolation section.

@mkitti
Collaborator

mkitti commented May 17, 2021

The more type-stable thing to do would be to use NaN as the fill value since NaN is a Float64.

What you describe is the subject of a commit that I reverted, since I needed some justification to override some tests:
27be03d
xr #367, #370

Perhaps I have that justification now.

@ederag
Contributor Author

ederag commented May 17, 2021

Isn't type stability about having an output type entirely determined by the types of the arguments, and not their values?
(Genuine question, I'm new to Julia.)
Then even with Union{eltype(y), typeof(extrap)}, it should be type-stable and fast.

@mkitti
Collaborator

mkitti commented May 17, 2021

In this case, we narrow down the possible return types to Union{Missing, Float64}, but this is still a Union of two distinct types. It would be much easier to optimize if the only return type were just Float64. No branching logic would be needed for downstream code.
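The difference can be checked directly with Base.return_types (a hedged sketch; f and g below are illustrative functions, not anything from Interpolations.jl):

```julia
# Inspecting inferred return types; `f` and `g` are hypothetical examples.
f(x::Float64) = x + 1.0                  # concrete: always returns Float64
g(x::Float64) = x > 0 ? x : missing      # widened: Union{Missing, Float64}

println(Base.return_types(f, (Float64,)))  # Any[Float64]
println(Base.return_types(g, (Float64,)))  # Any[Union{Missing, Float64}]
```

Downstream code consuming g's result has to branch on the actual type at run time, which is what the Union makes harder to optimize.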

@ederag
Contributor Author

ederag commented May 17, 2021

For performance-critical code, using NaN rather than missing does seem better.
Edit: I'm not so sure about that, hold on

But I was using missing in a non-critical part, just to find where to draw 3 vertical lines in a plot.
It would have been natural to exploit skipmissing down the line.
Below are more details of this use case, if you are interested.

machine_caches() returns a named tuple, such as

```julia
(L1 = 32768, L2 = 524288, L3 = 16777216)
```

The memory usage of an ODE kernel:

```julia
# 6 arrays of n Float64 used in time_evolution
memory_usage(n) = 6 * Base.summarysize(zeros(Float64, n))
```

Then the limits of cache filling are evaluated as:

```julia
function N_filling_caches()
    interpolant = LinearInterpolation(memory_usage.(N_points_), N_points_)
    caches = machine_caches()

    min = memory_usage(first(N_points_))
    max = memory_usage(last(N_points_))
    NamedTuple(name => convert(Float64, interpolant(caches[name]))
        for name in keys(caches)
        if min <= caches[name] <= max  # avoid extrapolation
    )
end
```

In an ideal world, I'd have mapped the interpolant over machine_caches() and been done.
But that would need JuliaLang/julia#33788
Or even better: interpolant(machine_caches()).
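The skipmissing workflow mentioned above could look like this (a hedged sketch with hard-coded values standing in for the interpolant's results, since Interpolations.jl is not loaded here):

```julia
# Stand-in for `interpolant.(values(caches))` built with
# extrapolation_bc=missing: out-of-range queries yield `missing`,
# which skipmissing then drops.
vals = Union{Missing, Float64}[677.67, 10917.67, missing]  # L3 out of range
kept = collect(skipmissing(vals))
println(kept)   # [677.67, 10917.67]
```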

@mkitti
Collaborator

mkitti commented May 18, 2021

How about this:

```julia
julia> Np = 1:1000.0:100001.0

julia> interpolant = LinearInterpolation(memory_usage.(Np), Np; extrapolation_bc=NaN);

julia> interpolated = NamedTuple{ keys(caches) }( interpolant.( values(caches) ) )
(L1 = 677.6666666666666, L2 = 10917.666666666668, L3 = NaN)

julia> (; [k => v for (k,v) in pairs(interpolated) if !isnan(v)]...)
(L1 = 677.6666666666666, L2 = 10917.666666666668)
```

@mkitti
Collaborator

mkitti commented May 18, 2021

Perhaps I'm missing something, but is this not sufficient:

```julia
julia> N_points(memory) = ( memory/6 - Base.summarysize(zeros(Float64,0)) ) / sizeof(Float64)
N_points (generic function with 1 method)

julia> caches = machine_caches()
(L1 = 32768, L2 = 524288, L3 = 16777216)

julia> NamedTuple{ keys(caches) }( N_points.( values(caches) ) )
(L1 = 677.6666666666666, L2 = 10917.666666666666, L3 = 349520.3333333333)
```
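A quick round-trip check (a sketch reusing the definitions above) confirms that N_points analytically inverts memory_usage, assuming Base.summarysize of a Vector{Float64} is an array-header constant plus 8 bytes per element:

```julia
# Same definitions as above; the round trip should recover n.
memory_usage(n) = 6 * Base.summarysize(zeros(Float64, n))
N_points(memory) = ( memory/6 - Base.summarysize(zeros(Float64, 0)) ) / sizeof(Float64)

println(N_points(memory_usage(678)) ≈ 678)  # true
```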

@ederag
Contributor Author

ederag commented May 18, 2021

These are all valid, and better.
Interpolation seemed like a no-brainer (at least in Octave it would have been).
Then I had missing in mind, since it looked like the equivalent of NA (Not Available) in Octave.
Please note that I'm just saying how it happened, for information.
If handling missing adds too much complexity, that's fine.

@ederag
Contributor Author

ederag commented May 18, 2021

Actually, it looks like missing is sometimes faster than NaN, at least for some purposes:
https://discourse.julialang.org/t/performance-of-union-missing-float64/16348/2

@mkitti
Collaborator

mkitti commented May 18, 2021

I posted there:

```julia
julia> function sum_non_nan(X::AbstractArray)
           s = zero(eltype(X))
           @inbounds @simd for x in X
               # simplify the branch so it can SIMD.
               s += isnan(x) ? zero(x) : x
           end
           s
       end

julia> function sum_nonmissing(X::AbstractArray)
           s = zero(eltype(X))
           @inbounds @simd for x in X
               s += ismissing(x) ? zero(x) : x
           end
           s
       end

julia> Y1 = rand(10_000_000);

julia> Y2 = Vector{Union{Missing, Float64}}(Y1);

julia> Y3 = ifelse.(rand(length(Y2)) .< 0.9, Y2, missing);

julia> Y3_nan = Array{Float64}(replace(x -> ismissing(x) ? NaN : x, Y3));

julia> @btime sum_nonmissing($Y1)
  9.132 ms (0 allocations: 0 bytes)
4.999213478955774e6

julia> @btime sum_non_nan($Y1)
  10.114 ms (0 allocations: 0 bytes)
4.999213478955774e6

julia> @btime sum_nonmissing($Y2);
  17.643 ms (0 allocations: 0 bytes)

julia> @btime sum_nonmissing($Y3);
  13.534 ms (0 allocations: 0 bytes)

julia> @btime sum_non_nan($Y3_nan);
  10.156 ms (0 allocations: 0 bytes)
```

For me, only sum_nonmissing($Y1) is faster, and that is because the compiler knows nothing is missing in Y1. sum_non_nan($Y1) and sum_non_nan($Y3_nan) take about the same time.

Note that the conditional structure is important here: use the ternary operator with ismissing or isnan.

If you want to see what the compiler is doing, try @code_llvm or @code_native.

@ederag
Contributor Author

ederag commented May 18, 2021

OK, thanks, I'm convinced for performance-critical code.

missing has some advantages for non-critical code, and it would be nice to support it.

But code complexity and maintainability are most important. I have no idea about that.
