Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad dodging when all y values are the same #31

Closed
koncina opened this issue Jul 31, 2017 · 4 comments
Closed

Bad dodging when all y values are the same #31

koncina opened this issue Jul 31, 2017 · 4 comments

Comments

@koncina
Copy link

koncina commented Jul 31, 2017

Thank you for the useful ggbeeswarm package.

I came across a strange geom_beeswarm() behaviour while accidentally plotting a vector composed of identical values: the points are dodged vertically and not horizontally.

library(tidyverse)

tibble(y = c(rep(0, 40))) %>%
  ggplot(aes(x = 1, y = y)) +
  ggbeeswarm::geom_beeswarm()

image

If the vector contains a single different value, the expected behaviour is restored:

tibble(y = c(rep(0, 39), 0.1)) %>%
  ggplot(aes(x = 1, y = y)) +
  ggbeeswarm::geom_beeswarm()

image

When combined with the dodge.width argument this behaviour can produce a misleading representation:

tibble(x = rep(LETTERS[1:4], 10), y = rep(0, 40)) %>%
  ggplot(aes(x = x, y = y)) +
  ggbeeswarm::geom_beeswarm(dodge.width = 1)

image

@sherrillmix
Copy link
Collaborator

Thanks for the bug report. There's a couple nested issues here:

  1. ggbeeswarm tries to guess whether the user wants the x or y variable dodged based on the number of unique entries in x and y. Currently when x and y have the same number of unique values, then the y-axis is dodged.
    I think x should be the default but this could conceivably break people's code so we should discuss.
  2. This "smart" behavior can be turned off by explicitly setting groupOnX=TRUE to always dodge on x (or FALSE to dodge on y). Except that option was being lost in the function chain and having no effect.
    I fixed the groupOnX option in current version.
  3. Looking closer at this, I'm worried that ggbeeswarm is being too smart. The behavior seen in your 3rd example remains (although it could be corrected by specifying groupOnX=TRUE). It seems pretty dangerous that e.g. an autogenerated report could suddenly generate nonsense because it's input was all zeros so I'd lean towards removing it and having people specify if they want offset on y. However, people may be depending on the current "smartness" in their existing code. I guess we'll need to debate what the least bad option is.

So in the short-term, I think all your examples should be fixed by installing the github version (devtools::install_github('eclarke/ggbeeswarm'))
and setting ggbeeswarm::geom_beeswarm(groupOnX=TRUE) and, long-term, we'll have to debate what's best to do with the default behavior.

ggplot(NULL,aes(x = rep(LETTERS[1],40), y = rep(0,40))) +
ggbeeswarm::geom_beeswarm(dodge.width = 1,groupOnX=TRUE)

groupOnX

@koncina
Copy link
Author

koncina commented Aug 1, 2017

Thanks for your commit, the restored groupOnX option works well.
This remains indeed an issue when generating automated reports.

Maybe you could use the is_discrete() functions (in scales) to adjust groupOnX? (That would at least work for variables which are explicitly discrete).

For example replacing the way groupOnX is guessed in position-beeswarm.R by something like:
if(is.null(params$groupOnX)) params$groupOnX <- length(unique(data$y)) > length(unique(data$x)) && !scales$y$is_discrete() || scales$x$is_discrete().

However, I would personally encourage to set groupOnX to TRUE by default (assuming the continuous variable on y) and use coord_flip() to produce a flipped plot if required (but as you mentioned that would eventually break some people's code).

@sherrillmix
Copy link
Collaborator

sherrillmix commented Aug 1, 2017

Yeah I'd also vote for the non-smart default groupOnX=TRUE. It seems very bad to have defaults that can completely flip expectations with reasonable inputs.

I think the least harm option might be to change the auto groupOnX line to something like:

if(is.null(params$groupOnX) && length(unique(data$y) > length(unique(data$x)){
  warning('The default behavior of beeswarm has changed in version XXX. 
    In versions <XXX, this plot would have been dodged on the y-axis. 
    In versions >=XXX, grouponX=FALSE must be explicitly set to group on y-axis. 
    Please set grouponX=TRUE/FALSE to avoid this warning and ensure proper axis choice.')
}

so we throw a warning if the line would have triggered. Stil annoying to be e.g. sending warnings into someone's report but at least nothing is misrepresented and it explicitly states whats going on. I guess we could then remove the warning in a version or two while maintaining a warning in the documentation.

@eclarke
Copy link
Owner

eclarke commented Aug 2, 2017

I agree with throwing a warning for this. Since it's a pretty important bugfix I think we should push this and the warning to CRAN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants