frankv with .SD fails when the function calls 'set' #4429

Closed
smarches opened this issue May 4, 2020 · 2 comments · Fixed by #4434
@smarches

smarches commented May 4, 2020

I was trying to compute ranks with frankv(..., ties.method = "random") and ran into seemingly inconsistent behavior. Say we'd like to calculate ranks, by reference, using a character vector of column names in .SD. The following all work:

Example

library(data.table)
dt = data.table(
  x = 1:20,
  y = 20:1,
  z = rep(c("A","B","C","D"),5)
)
rank_cols = c('x','y')

dt[,ranks := frankv(.SD,cols = rank_cols,ties.method = 'average'),by = 'z']
dt[,ranks := frankv(.SD,cols = rank_cols,ties.method = 'first'),by = 'z']
dt[,ranks := frankv(.SD,cols = rank_cols,ties.method = 'max'),by = 'z']
dt[,ranks := frankv(.SD,cols = rank_cols,ties.method = 'min'),by = 'z']
dt[,ranks := frankv(.SD,cols = rank_cols,ties.method = 'dense'),by = 'z']
dt[,ranks := frankv(.SD[,rank_cols,with = FALSE],cols = rank_cols,ties.method = 'random'),by = 'z']

but these do not:

dt[,ranks := frankv(.SD,cols = rank_cols,ties.method = 'average',na.last = NA),by = 'z']
dt[,ranks := frankv(.SD,cols = rank_cols,ties.method = 'random'),by = 'z']
# two wrongs don't make a right!
dt[,ranks := frankv(.SD,cols = rank_cols,ties.method = 'random',na.last = NA),by = 'z']

it likewise works again when not explicitly invoking .SD:

dt[,ranks := frankv(get(rank_cols),ties.method = 'random'),by = 'z']

It seems like an interface bug that this call does not work for some (valid) choices of ties.method or na.last ('random' and NA respectively) while working for all others. In both cases it appears that frankv calling set during the function evaluation is what triggers the error. Trying to specify .SDcols also does not work.

In failing cases the error is of the form:

Error in set(x, NULL, "..stats_runif..", v) : 
  .SD is locked. Updating .SD by reference using := or set are reserved for future use. Use := in j directly. Or use copy(.SD) as a (slow) last resort, until shallow() is exported.
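
The error message itself points at copy(.SD) as a last resort. A minimal sketch of that workaround, reusing the example table from above: it assumes a deep copy of each group's .SD is acceptable, which costs extra memory and time on large groups.

```r
library(data.table)

dt = data.table(
  x = 1:20,
  y = 20:1,
  z = rep(c("A", "B", "C", "D"), 5)
)
rank_cols = c("x", "y")

# copy(.SD) hands frankv an unlocked deep copy of the group's columns,
# so frankv's internal set() call no longer hits the lock on .SD.
dt[, ranks := frankv(copy(.SD), cols = rank_cols, ties.method = "random"), by = "z"]

# x has no ties within a group, so the random tie-breaking never
# actually kicks in here and each group is ranked 1..5.
print(dt[order(z, ranks)])
```

This only sidesteps the lock; it does not change what frankv does internally.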

Session Info

R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.6.3 tools_3.6.3   
@jangorecki
Member

Thank you for reporting. I can reproduce on latest devel.

@MichaelChirico
Member

MichaelChirico commented May 7, 2020

@jangorecki we have:

x = .shallow(x, cols)

in frankv which is necessary here because then we do

if (is.na(na.last))
  set(x, j = "..na_prefix..", value = is_na(x, cols))

which we can't do on plain .SD.

However, .shallow doesn't remove the .data.table.locked attribute -- shouldn't it?

NVM, I see unlock argument for .shallow
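
To illustrate the lock being discussed, here is a small sketch that calls set() on .SD directly inside j and captures the resulting error. The table and the column name "tmp" are made up for the illustration, and the exact wording of the message may differ between data.table versions.

```r
library(data.table)

dt = data.table(x = 1:4, z = c("A", "A", "B", "B"))

# .SD is locked inside j, so adding a column to it by reference is
# expected to raise an error; tryCatch turns that error into a string.
res = tryCatch(
  dt[, {
    set(.SD, j = "tmp", value = 0L)  # "tmp" is a hypothetical column name
    .N
  }, by = z],
  error = function(e) conditionMessage(e)
)

print(res)
```

This is the same failure mode frankv runs into when its shallow copy of .SD is still marked as locked.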
