Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider reverting fcase recycled conditions #6352

Closed
MichaelChirico opened this issue Aug 5, 2024 · 10 comments · Fixed by #6363
Closed

Consider reverting fcase recycled conditions #6352

MichaelChirico opened this issue Aug 5, 2024 · 10 comments · Fixed by #6363
Milestone

Comments

@MichaelChirico
Copy link
Member

@DavisVaughan offered a suggestion that we reconsider recycling conditions in fcase(): vectorized default should be enough to solve the user issue:

https://fosstodon.org/@davis/112909151150543355

Earmarking this to think it through more carefully & for others' input.

@MichaelChirico MichaelChirico added this to the 1.16.0 milestone Aug 5, 2024
@DavisVaughan
Copy link
Contributor

DavisVaughan commented Aug 5, 2024

In particular if you look back at #4258, I think every case shown there is solved by the more explicit / easier to understand default = <vectorized>

@tdhock
Copy link
Member

tdhock commented Aug 7, 2024

I don't use fcase very often, probably because it did not recycle, so I found it inconvenient.

@DavisVaughan
Copy link
Contributor

@tdhock can you give a convincing argument for recycling the conditions?

@TysonStanley
Copy link
Member

I use it daily. The default = <vectorized> is what I'd use and would make it far more convenient for me.

@tdhock
Copy link
Member

tdhock commented Aug 7, 2024

actually I probably don't understand the issue, can you please clarify what "recycling conditions" and "vectorized default" mean? perhaps with an example that shows what your proposed work-around would be after reverting?

there were 14 thumbs up #4258 (comment)

the inconvenience that I meant is shown below:

> library(data.table)
data.table 1.15.4 using 3 threads (see ?getDTthreads).  Latest news: r-datatable.com
> data.table(iris)[, mycol := fcase(Petal.Width>2, paste(Species), Petal.Width<1, "small", default="medium")][]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species     mycol
            <num>       <num>        <num>       <num>    <fctr>    <char>
  1:          5.1         3.5          1.4         0.2    setosa     small
  2:          4.9         3.0          1.4         0.2    setosa     small
  3:          4.7         3.2          1.3         0.2    setosa     small
  4:          4.6         3.1          1.5         0.2    setosa     small
  5:          5.0         3.6          1.4         0.2    setosa     small
 ---                                                                      
146:          6.7         3.0          5.2         2.3 virginica virginica
147:          6.3         2.5          5.0         1.9 virginica    medium
148:          6.5         3.0          5.2         2.0 virginica    medium
149:          6.2         3.4          5.4         2.3 virginica virginica
150:          5.9         3.0          5.1         1.8 virginica    medium
> data.table(iris)[, mycol := fcase(Petal.Width>2, "big", Petal.Width<1, "small", default=Species)][]
Error in fcase(Petal.Width > 2, "big", Petal.Width < 1, "small", default = Species) : 
  Length of 'default' must be 1.

above/CRAN I thought it was strange that fcase allowed vectors in ... but not default (the error).
below/master there is no error which I think is good for consistency.

> library(data.table)
data.table 1.15.99 IN DEVELOPMENT built 2024-08-07 15:42:29 UTC using 3 threads (see ?getDTthreads).  Latest news: r-datatable.com
> data.table(iris)[, mycol := fcase(Petal.Width>2, paste(Species), Petal.Width<1, "small", default="medium")][]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species     mycol
            <num>       <num>        <num>       <num>    <fctr>    <char>
  1:          5.1         3.5          1.4         0.2    setosa     small
  2:          4.9         3.0          1.4         0.2    setosa     small
  3:          4.7         3.2          1.3         0.2    setosa     small
  4:          4.6         3.1          1.5         0.2    setosa     small
  5:          5.0         3.6          1.4         0.2    setosa     small
 ---                                                                      
146:          6.7         3.0          5.2         2.3 virginica virginica
147:          6.3         2.5          5.0         1.9 virginica    medium
148:          6.5         3.0          5.2         2.0 virginica    medium
149:          6.2         3.4          5.4         2.3 virginica virginica
150:          5.9         3.0          5.1         1.8 virginica    medium
> data.table(iris)[, mycol := fcase(Petal.Width>2, "big", Petal.Width<1, "small", default=paste(Species))][]
     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species     mycol
            <num>       <num>        <num>       <num>    <fctr>    <char>
  1:          5.1         3.5          1.4         0.2    setosa     small
  2:          4.9         3.0          1.4         0.2    setosa     small
  3:          4.7         3.2          1.3         0.2    setosa     small
  4:          4.6         3.1          1.5         0.2    setosa     small
  5:          5.0         3.6          1.4         0.2    setosa     small
 ---                                                                      
146:          6.7         3.0          5.2         2.3 virginica       big
147:          6.3         2.5          5.0         1.9 virginica virginica
148:          6.5         3.0          5.2         2.0 virginica virginica
149:          6.2         3.4          5.4         2.3 virginica       big
150:          5.9         3.0          5.1         1.8 virginica virginica

@TysonStanley
Copy link
Member

@tdhock this behavior is what I'm referring to with the default (the last example). Not sure what recycling conditions means outside of this for the default.

@DavisVaughan
Copy link
Contributor

I think we are all just misunderstanding each other a bit, this should clear things up

# I think this is bad, and should be an error. Do not recycle `TRUE` to be the same size as the other `condition`s.
# i.e. do not copy the dplyr `case_when()` behavior of `TRUE ~`.
fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  TRUE, as.character(iris$Sepal.Length)
)

# This is fantastic, yay for vectorized `default`
fcase(
  iris$Sepal.Length > 5, ">5",
  iris$Sepal.Length < 4, "<4",
  default = as.character(iris$Sepal.Length)
)

IIUC, the PR that updated fcase() allows both of those to work now, but I think only the 2nd is practically useful, and the 1st just causes confusion

@MichaelChirico
Copy link
Member Author

the goal of the original FR AIUI is to get closer to SQL's behavior in CASE WHEN statements using ELSE, which allow full column expressions in that last clause where previously we only allowed length(default)==1.

@TysonStanley
Copy link
Member

That makes sense about the original intent but the new functionality is definitely an upgrade. If I'm understanding correctly, we have default = <vectorized>? Are there issues with the implementation?

@MichaelChirico
Copy link
Member Author

Going through the examples in #4258

  • Shouldn't fcase() recycle? #4258 (comment)

    # NOW WORKS
    fcase(
      iris$Sepal.Length > 5, ">5",
      iris$Sepal.Length < 4, "<4",
      default = as.character(iris$Sepal.Length)
    )
  • Shouldn't fcase() recycle? #4258 (comment)

    DTmtcars[, rn := fcase(
                       rn %like% "^Merc", sub("^Merc", "Mercedes", rn),
                       rn == "Toyota Corona", paste0(rn, "s"),
                       ...
                       rep(TRUE, length(rn)), rn # else just the original vector
                     )]
    ### NOW
    DTmtcars[, rn := fcase(
                       rn %like% "^Merc", sub("^Merc", "Mercedes", rn),
                       rn == "Toyota Corona", paste0(rn, "s"),
                       ...
                       default = rn # else just the original vector
                     )]
  • Shouldn't fcase() recycle? #4258 (comment)

    data.table::fcase(
      countries == "USA", "United States of America",
      countries == "Britain", "United Kingdom",
      countries == "Russian Federation", "Russia",
      countries == "Trinidad-Tobago", "Trinidad and Tobago",
      countries == "Bahamas", "The Bahamas",
      countries == "Timor-Leste", "East Timor",
      countries == "UAE", "United Arab Emirates",
      countries == "Congo", "Democratic Republic of the Congo",
      countries == "Sao Tome", "Sao Tome and Principe",
      dont_modify, countries
    )
    ### NOW [obviates need to store dont_modify]
    data.table::fcase(
      countries == "USA", "United States of America",
      countries == "Britain", "United Kingdom",
      countries == "Russian Federation", "Russia",
      countries == "Trinidad-Tobago", "Trinidad and Tobago",
      countries == "Bahamas", "The Bahamas",
      countries == "Timor-Leste", "East Timor",
      countries == "UAE", "United Arab Emirates",
      countries == "Congo", "Democratic Republic of the Congo",
      countries == "Sao Tome", "Sao Tome and Principe",
      default = countries
    )
  • Shouldn't fcase() recycle? #4258 (comment)

    ### TRANSLATED FROM dplyr::case_when()
    countries[, name := data.table::fcase(
      is_czech_name(name) & year <= 1938, "Czechoslovak Republic",
      is_czech_name(name) & year %between% c(1939, 1992), "Czechoslovakia",
      is_czech_name(name) & year >= 1993, "Czech Republic",
      name == "USA", "United States of America",
      name == "Britain", "United Kingdom",
      default = name
    )]

As noted, all of those work just fine with vectorized default=, there's no need to rely on TRUE, x working.

So it's clear AFAICT that we don't need to support TRUE, x which is always equivalent to default = x; I am only slightly unsure whether it's possible to block TRUE, x from working while retaining the rest of the vectorized default PR (default= vectorized & default= evaluated lazily).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants