melt with measure.vars=list of length=1 should return integer in variable column #5209

tdhock · 2021-10-11T17:30:15Z

The melt() docs say that the variable column should be an integer when measure.vars is a list:

variable.name: name (default 'variable') of output column containing
          information about which input column(s) were melted. 
...
          If 'measure.vars' is a list of integer/character
          vectors, then each entry of this column contains an integer
          indicating an index/position in each of those vectors.

In most cases that is true, for example

> library(data.table)
> (iris.row <- data.table(iris[1,]))
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1:          5.1         3.5          1.4         0.2  setosa
> melt(iris.row, measure.vars = list(Length=c("Petal.Length","Sepal.Length"), Width=c("Petal.Width","Sepal.Width")))
   Species variable Length Width
1:  setosa        1    1.4   0.2
2:  setosa        2    5.1   3.5
> melt(iris.row, measure.vars = list(Length="Petal.Length",Width="Petal.Width"))
   Sepal.Length Sepal.Width Species variable Length Width
1:          5.1         3.5  setosa        1    1.4   0.2

However, if measure.vars is a list of length 1, the variable column contains the character column name instead:

> melt(iris.row, measure.vars = list(Length="Petal.Length"))
   Sepal.Length Sepal.Width Petal.Width Species     variable Length
1:          5.1         3.5         0.2  setosa Petal.Length    1.4

This is not a big deal (probably few users melt only one column), but it is inconsistent with the documentation, so I will work on a fix.


SamuelAllain · 2023-02-15T09:47:42Z

+1 I have been surprised by this behaviour too

r2evans · 2024-12-04T13:04:07Z

I've seen the docs for that but had not recognized the real meaning of it until looking at your example measure.vars = list(Length="Petal.Length"). Is there an example of where changing the variable's values to integer indices is meaningful? I've internally assumed that if I needed an index instead of a string, I'd set it with match or factor/levels.
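A minimal sketch of the match-based approach mentioned above, assuming the character-vector form of measure.vars. The names `cols` and `index` are illustrative and do not come from the thread:

```r
library(data.table)

iris.row <- data.table(iris[1, ])

# Melt with a plain character vector, so variable holds the column names.
cols <- c("Petal.Length", "Sepal.Length")
long <- melt(iris.row, measure.vars = cols)

# Derive an integer position from the name when an index is needed.
long[, index := match(as.character(variable), cols)]
print(long)
```

Going through `as.character()` first means the result does not depend on whether variable is stored as a factor or as a character column.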

tdhock · 2024-12-04T13:07:17Z

"Is there an example of where changing the variable's values to integer indices is meaningful?"

not sure how to interpret "meaningful" in this context, could you please clarify?

my goal was to increase consistency.

r2evans · 2024-12-04T16:45:02Z

Sure, I apologize for lack of clarity. What is the justification for having this seemingly inconsistent difference? ("Inconsistent" between 1 and 2+ args, not commenting on documentation-vs-execution, that's different.)

To me, having (starkly) different behavior between 1 arg and 2+ args is counter-intuitive and requires caller's use of data.table::melt to guard against it with extra code. Granted, that code is not cosmic.
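One way such guard code might look, as a sketch only: the helper `variable_index` is hypothetical (it is not part of data.table), and it assumes measure.vars is a list of character vectors. It returns integer positions whether melt() reported column names or indices in the variable column:

```r
library(data.table)

# Hypothetical helper: position of each melted column within the first
# element of measure.list, regardless of what melt() put in variable.
variable_index <- function(long, measure.list, variable.name = "variable") {
  v <- as.character(long[[variable.name]])
  first <- measure.list[[1]]
  idx <- match(v, first)                    # variable holds column names
  is.index <- is.na(idx)
  idx[is.index] <- as.integer(v[is.index])  # variable holds "1", "2", ...
  idx
}

iris.row <- data.table(iris[1, ])
one <- list(Length = c("Petal.Length", "Sepal.Length"))
two <- list(Length = c("Petal.Length", "Sepal.Length"),
            Width  = c("Petal.Width", "Sepal.Width"))

# suppressWarnings() because some versions warn for a list of length 1.
long.one <- suppressWarnings(melt(iris.row, measure.vars = one))
long.two <- melt(iris.row, measure.vars = two)

print(variable_index(long.one, one))
print(variable_index(long.two, two))
```

The sketch breaks down if an input column is itself named like an integer (for example "1"), so it is only a starting point.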

To me it seems more intuitive and consistent to always behave the same, whether always-strings or always-integers or always-factors, with arguments (e.g., variable.factor=) that support differing paths.

I'm late to the discussion, in a sense, and I'm not arguing change for change's sake, so I'm "curious" in case you know the history of why this explicit behavior was chosen (or did it just fall into place this way).

tdhock · 2024-12-04T17:05:18Z

OK, so I understand that you are curious about the inconsistency in the output type between two usages.
The first usage below is when measure.vars is a character vector, in which case the output variable column holds the name of the melted column (as a factor):

> melt(iris.row, measure.vars = c("Petal.Length","Sepal.Length"))
   Sepal.Width Petal.Width Species     variable value
         <num>       <num>  <fctr>       <fctr> <num>
1:         3.5         0.2  setosa Petal.Length   1.4
2:         3.5         0.2  setosa Sepal.Length   5.1

The second usage below is when measure.vars is a list, in which case the output variable is an "integer indicating an index/position in each of those vectors." (quote from ?melt)

> melt(iris.row, measure.vars = list(Length=c("Petal.Length","Sepal.Length"),Width=c("Petal.Width","Sepal.Width")))
   Species variable Length Width
    <fctr>   <fctr>  <num> <num>
1:  setosa        1    1.4   0.2
2:  setosa        2    5.1   3.5

You would have to ask @arunsrinivasan about the history of why he made this choice, but I think it makes sense (an integer index is necessary to identify which column was melted).
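A minimal sketch of how that index can be mapped back to the input column names. The names `measure.list`, `idx`, `Length.col`, and `Width.col` are illustrative only:

```r
library(data.table)

iris.row <- data.table(iris[1, ])
measure.list <- list(
  Length = c("Petal.Length", "Sepal.Length"),
  Width  = c("Petal.Width", "Sepal.Width"))
long <- melt(iris.row, measure.vars = measure.list)

# Go through as.character() so this works whether variable is stored as a
# factor with levels "1", "2" or as a plain integer.
idx <- as.integer(as.character(long$variable))

# The same index selects one source column per value column.
long[, `:=`(
  Length.col = measure.list$Length[idx],
  Width.col  = measure.list$Width[idx])]
print(long)
```

Because each row combines values from several input columns, a single column name could not label it, whereas one position per row can.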

Rather than using measure.vars=list(...) (above), more recently I implemented support for measure.vars=measure(...) (see below), which is almost always preferable because it gives more informative "variable" columns (the part column below).

> melt(iris.row, measure.vars = measure(part, value.name, sep="."))
   Species   part Length Width
    <fctr> <char>  <num> <num>
1:  setosa  Sepal    5.1   3.5
2:  setosa  Petal    1.4   0.2

r2evans · 2024-12-04T17:33:46Z

I understand that that's what the docs say, and I see how measure(.) adds a lot of value here. My question is why it's useful to have different behaviors? From a "simple/consistent" approach mindset, I don't think it's wrong to hope that

melt(iris.row, measure.vars = list(Length=c("Petal.Length","Sepal.Length"))) # ignoring the warning
melt(iris.row, measure.vars = list(Length=c("Petal.Length","Sepal.Length"), Width=c("Petal.Width","Sepal.Width")))

would both produce a column named variable that contains a factor with levels derived from the column names. In my use-cases, I can't think of a time when I don't care which column name is paired with each row; I recognize my use-cases are just mine.

I think the point of my question is the history of it (as you suggested, from arunsrinivasan) to know if it "just happened this way" or if there is a specific mindset/use-case where having diverging behavior makes sense. (Backwards compatibility is always valuable, of course.)

Thanks tdhock!

tdhock self-assigned this Oct 11, 2021

tdhock added the reshape dcast melt label Oct 12, 2021

tdhock mentioned this issue Nov 2, 2021

melt(measure.vars=list) returns indices in variable column #5247

Merged

tdhock added the consistency label Nov 2, 2021

tdhock closed this as completed in #5247 Apr 8, 2024

tdhock mentioned this issue Jul 30, 2024

revdep vardpoor example failures after melt change #6071

Closed

tdhock mentioned this issue Nov 29, 2024

move from warning to breaking change in melt/dcast #6629

Open
