Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot add list column to empty data.table #5738

Closed
ben-schwen opened this issue Nov 13, 2023 · 6 comments · Fixed by #5739
Closed

Cannot add list column to empty data.table #5738

ben-schwen opened this issue Nov 13, 2023 · 6 comments · Fixed by #5739
Assignees
Labels
non-atomic column e.g. list columns, S4 vector columns

Comments

@ben-schwen
Copy link
Member

ben-schwen commented Nov 13, 2023

Both inplace operations currently do not allow adding lists of lengths over 1. The problem arises on both 1.14.8 and current dev.

library(data.table)
dt = data.table()
x = list("a", 1)
dt[, a := x]
#> Error in `[.data.table`(dt, , `:=`(a, x)): Supplied 2 items to be assigned to 1 items of column 'a'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
set(dt, j="a", value=x)
#> Error in set(dt, j = "a", value = x): Supplied 2 items to be assigned to 1 items of column 'a'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

Adding columns of other types works fine

library(data.table)
dt = data.table()
dt[, b := 1:5][]
#>        b
#>    <int>
#> 1:     1
#> 2:     2
#> 3:     3
#> 4:     4
#> 5:     5

dt = data.table()
dt[, c := letters[1:5]][]
#>         c
#>    <char>
#> 1:      a
#> 2:      b
#> 3:      c
#> 4:      d
#> 5:      e
@ben-schwen ben-schwen self-assigned this Nov 13, 2023
@jangorecki jangorecki added the non-atomic column e.g. list columns, S4 vector columns label Nov 13, 2023
@AngelFelizR
Copy link
Contributor

AngelFelizR commented Nov 14, 2023

I think the result makes sense; you just need to grad it into another list as you can see below.

> library(data.table)
> dt = data.table()
> x = list("a", 1)
> dt[, a := .(x)]
> str(dt)

Classes ‘data.table’ and 'data.frame':	2 obs. of  1 variable:
 $ a:List of 2
  ..$ : chr "a"
  ..$ : num 1
 - attr(*, ".internal.selfref")=<externalptr> 
 
 > dt
   a
1: a
2: 1

@ben-schwen
Copy link
Member Author

ben-schwen commented Nov 14, 2023

@AngelFelizR I know how to work around the issue but list(x) != x e.g. on matrix columns...

library(data.table)
dt = data.table()
x = list(diag(2))
dt[, a := x][]
#> Warning in `[.data.table`(dt, , `:=`(a, x)): 2 column matrix RHS of := will be
#> treated as one vector
#>        a
#>    <num>
#> 1:     1
#> 2:     0
#> 3:     0
#> 4:     1

dt = data.table()
x = list(diag(2))
dt[, a := list(x)][]
#>          a
#>     <list>
#> 1: 1,0,0,1

@AngelFelizR
Copy link
Contributor

AngelFelizR commented Nov 14, 2023

You just need to be aware the error make sense in the default use of list that we use with lapply

> dt = data.table()
> x = list("a", 1)
> dt[, c("a","b") := x]
> str(dt)

Classesdata.tableand 'data.frame':	1 obs. of  2 variables:
 $ a: chr "a"
 $ b: num 1
 - attr(*, ".internal.selfref")=<externalptr> 

@jangorecki
Copy link
Member

jangorecki commented Nov 14, 2023

I believe this issue may be caused by a automatic wrapping input on RHS (or value arg) into list call, which we do for user convenience.
There may inherited ambiguity caused by that.

What if we want to add list of lists column? or list of list of lists column?

Atomic columns are automatically enlisted, that's why code below works

x[ c("a","b") := list(1:2, 2:3)]
x[ c("a") := 1:2]

to avoid this inherited ambiguity we would have to ask users to always write list, even for single column

x[ c("a") := list(1:2)]

I am afraid trying to fix this edge case we are just shifting problem to another edge case, but I am not against, as long as we have good unit tests included and revdeps passes.

I may also be wrong, as the issue precisely mentions scalar columns, maybe this part we can fix without shifting problem elsewhere.

@AngelFelizR
Copy link
Contributor

That would be a breaking change and I am not sure if that would improve the general experience

@ben-schwen
Copy link
Member Author

ben-schwen commented Nov 14, 2023

The current situation is the following:
SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP nwecolnames, SEXP values) checks the number of nrow of dt.

Case 1: dt has already some rows, so it simply takes the number of elements in the first column.

Case 2: dt has no rows yet.
a) values is a list, so nrow is set to the length of first element of values
b) values is not a list, so nrow is set to the length of values

I want to extend Case 2a) to better distinguish whether a list column is added or multiple columns (length(values) should be the same length of affected cols)

edit: What is still open are weird cases of trying to add columns to a 0-row data.table

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-atomic column e.g. list columns, S4 vector columns
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants