Skip to content
This repository was archived by the owner on May 24, 2022. It is now read-only.

Horizontally concatenate two narrow columns, then unstack on the concatenation #10

Open
oxinabox opened this issue Mar 9, 2020 · 3 comments

Comments

@oxinabox
Copy link
Member

oxinabox commented Mar 9, 2020

(for my reference this was #8 on the initial list of features)

Example by Ian Goddarrd

julia> using DataFrames,DataFramesMeta


julia> df = DataFrame(region = repeat(["North","North","South","South"],2),
             fuel_type = repeat(["gas","coal"],4),
             load = rand(8),
             time = [1,1,1,1,2,2,2,2],
             )
8×4 DataFrame
│ Row │ region │ fuel_type │ load     │ time  │
│     │ String │ String    │ Float64  │ Int64 │
├─────┼────────┼───────────┼──────────┼───────┤
│ 1   │ North  │ gas       │ 0.1200391     │
│ 2   │ North  │ coal      │ 0.1080791     │
│ 3   │ South  │ gas       │ 0.3774391     │
│ 4   │ South  │ coal      │ 0.961021     │
│ 5   │ North  │ gas       │ 0.3568012     │
│ 6   │ North  │ coal      │ 0.2599412     │
│ 7   │ South  │ gas       │ 0.3034862     │
│ 8   │ South  │ coal      │ 0.8894032     │

julia> df = @transform(df,region_fuel_type = :region.*"_".*:fuel_type)
8×5 DataFrame
│ Row │ region │ fuel_type │ load     │ time  │ region_fuel_type │
│     │ String │ String    │ Float64  │ Int64 │ String           │
├─────┼────────┼───────────┼──────────┼───────┼──────────────────┤
│ 1   │ North  │ gas       │ 0.1200391     │ North_gas        │
│ 2   │ North  │ coal      │ 0.1080791     │ North_coal       │
│ 3   │ South  │ gas       │ 0.3774391     │ South_gas        │
│ 4   │ South  │ coal      │ 0.961021     │ South_coal       │
│ 5   │ North  │ gas       │ 0.3568012     │ North_gas        │
│ 6   │ North  │ coal      │ 0.2599412     │ North_coal       │
│ 7   │ South  │ gas       │ 0.3034862     │ South_gas        │
│ 8   │ South  │ coal      │ 0.8894032     │ South_coal       │

julia> unstack(df,:time,:region_fuel_type,:load)
┌ Warning: `T` is deprecated, use `nonmissingtype` instead.
│   caller = compacttype(::Type, ::Int64) at show.jl:39
└ @ DataFrames ~/.julia/packages/DataFrames/0Em9Q/src/abstractdataframe/show.jl:39
2×5 DataFrame
│ Row │ time  │ North_coal │ North_gas │ South_coal │ South_gas │
│     │ Int64 │ Float64⍰   │ Float64⍰  │ Float64⍰   │ Float64⍰  │
├─────┼───────┼────────────┼───────────┼────────────┼───────────┤
│ 110.1080790.1200390.961020.377439  │
│ 220.2599410.3568010.8894030.303486

@cbdavis suggests we could extend unstack from DataFrames
(might move this issue there. cc @bkamins)

function DataFrames.unstack(df::DataFrame, 
                            rowkey::Union{Symbol,AbstractVector{Symbol}}, 
                            colkeys::AbstractVector{Symbol}, value::Symbol)
	new_colkey = Symbol(join(string.(colkeys), "_"))
	df[!,new_colkey] = [join(x, "_") for x in eachrow(df[:,colkeys])]
	return unstack(df, rowkey, new_colkey, value)
end

unstack(df,:time,[:region, :fuel_type],:load)
@bkamins
Copy link

bkamins commented Mar 9, 2020

If I understand things correctly you want to unstack on several columns and join their values to form target column names - right? I think it is OK to make a PR/issue to DataFrames.jl with this change if you want this functionality. It is non-breaking so we could add it in 1.x release in the worst case.

@oxinabox
Copy link
Member Author

oxinabox commented Mar 9, 2020

Yes, kind of like how you can join on multiple columns.

@bkamins
Copy link

bkamins commented Mar 9, 2020

see JuliaData/DataFrames.jl#2148

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants