Nullable fields don't always need Union{Missing, T} #384
Comments
We recently updated the Yeah, we could potentially check the validitybitmap to see if there are any missings before building the eltype, but it does make me a tad nervous about unrelated side effects it might introduce. I'd say let's go for a PR and then we can take a look at how much work this would actually be.
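To make the idea of checking the validity bitmap before building the eltype concrete, here is a minimal sketch. It is not Arrow.jl code: `sketch_eltype` is a hypothetical name, and a plain `Vector{Bool}` stands in for the bit-packed validity bitmap that Arrow actually stores.

```julia
# Hypothetical helper, not part of Arrow.jl: choose a column's element type
# from its declared type T and a validity mask (true = value present).
# Assumption: a Vector{Bool} stands in for Arrow's bit-packed validity bitmap.
function sketch_eltype(::Type{T}, validity::AbstractVector{Bool}) where {T}
    # Widen to Union{Missing, T} only when at least one slot is actually null.
    return all(validity) ? T : Union{Missing, T}
end

sketch_eltype(Int64, [true, true, true])   # Int64
sketch_eltype(Int64, [true, false, true])  # Union{Missing, Int64}
```

One consequence of this approach is that the element type depends on the data in the column rather than on the schema alone, so two files with the same schema could produce different Julia types.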
I don't think it's fixed:
julia> col1 = Vector{Union{Int64, String}}[
["one", 2],
["one", 2, 3],
["one", 2, 3, 4],
["one", 2, 3, 4, 5]];
julia> df = DataFrame(;col1)
4×1 DataFrame
Row │ col1
│ Array…
─────┼───────────────────────────────────
1 │ Union{Int64, String}["one", 2]
2 │ Union{Int64, String}["one", 2, 3]
3 │ Union{Int64, String}["one", 2, 3…
4 │ Union{Int64, String}["one", 2, 3…
julia> a = tempname()
"/tmp/jl_IngNyJwngp"
julia> Arrow.write(a, df)
"/tmp/jl_IngNyJwngp"
julia> Arrow.Table(a)
Arrow.Table with 4 rows, 1 columns, and schema:
:col1 … SubArray{Union{Missing, Int64, String}, 1, Arrow.DenseUnion{Union{Missing, Int64, String}, Arrow.UnionT{Arrow.Flatbuf.UnionMode.Dense, nothing, Tuple{Union{Missing, Int64}, String}}, Tuple{Arrow.Primitive{Union{Missing, Int64}, Vector{Int64}}, Arrow.List{String, Int32, Vector{UInt8}}}}, Tuple{UnitRange{Int64}}, true}
julia> Arrow.Table(a).col1[1]
2-element view(::Arrow.DenseUnion{Union{Missing, Int64, String}, Arrow.UnionT{Arrow.Flatbuf.UnionMode.Dense, nothing, Tuple{Union{Missing, Int64}, String}}, Tuple{Arrow.Primitive{Union{Missing, Int64}, Vector{Int64}}, Arrow.List{String, Int32, Vector{UInt8}}}}, 1:2) with eltype Union{Missing, Int64, String}:
"one"
2
I'm trying to implement the GeoArrow spec, which gives back coordinates in a deeply nested list of a FixedList (a point). Because these lists are theoretically nullable, in Julia we get a deeply nested list with `Union`s of `Missing`, even though these vectors contain no `missing`s. An example for a column of LineStrings (there are geometry types that require two more levels of nesting):
2-element Arrow.List{Vector{Union{Missing, Vector{Union{Missing, Tuple{Float64, Float64}}}}}
It's pretty hard to convert these elements to a concrete `Vector{Vector{NTuple{2, Float64}}}` without allocating. Is there a way to edit the view to be non-missing? An alternative would be to pass `all(validitybitmap)` in `build` to `juliaeltype`, so we only set `Missing` when there are actual missing values.
I'm happy to make a PR if there's consensus on what to do.
Might be related to #373.
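As an illustration of what a non-allocating, non-missing view could look like on the user side, here is a rough sketch in plain Julia. `NonMissingView` is a hypothetical wrapper, not something Arrow.jl provides, and it only strips `Missing` from one level of nesting; a nested column would need the inner vectors wrapped as well.

```julia
# Hypothetical wrapper, not part of Arrow.jl: presents a vector whose eltype
# is Union{Missing, T} as an AbstractVector{T} without copying the data.
struct NonMissingView{T, A <: AbstractVector} <: AbstractVector{T}
    parent::A
end

# Derive T by stripping Missing from the parent's element type.
NonMissingView(v::AbstractVector) =
    NonMissingView{nonmissingtype(eltype(v)), typeof(v)}(v)

Base.size(v::NonMissingView) = size(v.parent)
Base.IndexStyle(::Type{<:NonMissingView}) = IndexLinear()

# The type assertion throws a TypeError if a missing value is ever encountered.
Base.@propagate_inbounds Base.getindex(v::NonMissingView{T}, i::Int) where {T} =
    v.parent[i]::T

raw = Union{Missing, Tuple{Float64, Float64}}[(0.0, 1.0), (2.0, 3.0)]
pts = NonMissingView(raw)
eltype(pts)  # Tuple{Float64, Float64}
pts[2]       # (2.0, 3.0)
```

The wrapper defers the missing check to each element access instead of validating up front, which keeps construction free of copies but means a stray `missing` only surfaces when it is read.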