Suggestions for Ch4 - Viz #40

Closed
28 of 49 tasks
joelostblom opened this issue Sep 21, 2022 · 0 comments · Fixed by #77

joelostblom commented Sep 21, 2022

These are all suggestions. Some may read as harsh, but that is only because I wrote them quickly and tried to be succinct.

  • LOs are much shorter than in the R chapter
  • R&Py Soften section about avoiding pie charts and 3D
  • Add more specific comment on Vega colorschemes
  • R&Py Consider moving the checklists from "Refining the visualization" to the end. This is a lot of info to front-load learners with, and most of it is likely not meaningful until they have seen some viz. A few words on the purpose of viz are nice, but this might be too much detail too early.
  • R&Py Fix this sentence to be more correct. Zooming in on small differences is completely OK when the difference is meaningful (e.g. the rise in Earth's temperature): "Don’t adjust the axes to zoom in on small differences. If the difference is small, show that it’s small!"
    • Especially since ggplot zooms in by default, as we can see in figures 4.3 and 4.4
  • "To use the altair package, we need to import the altair package. We will also import pandas in order to support reading and other data related operations." -> "We will also import pandas to use for reading in the data."
  • .info instead of .dtypes
  • Incomplete sentence "while using the altair package, We create a plot object"
  • Remove note about how to make it look like ggplot "Note: We can change the size of the point and color of the plot by specifying mark_point(size=10, color='black')."
  • Why do we set color='black' in the lineplot with no explanation?
  • Dangling end parenthesis would be more in line with the syntax used in the altair docs.
  • The axis is zoomed via scale(zero=False) without any explanation as to why, in both the line and scatter plots.
  • Much of the code uses spaces around = in parameter assignment, which is R style, not Python.
  • Fig 37 uses long lines of code that are hard to read
    • (we should really reset figure numbering per chapter and number chapters/parts)
  • This note is not very encouraging to students and could be expressed in another way "Note: The configure_ function in altair is complex and supports many other functionalities, which can be viewed here" (I usually don't teach configure at all)
  • Scale(zero=False) is only explained here, after it has already been used multiple times
  • The code for fig 38 is really hard to read, use more newlines.
  • Why are we setting the tickcount in fig 38?
  • R&Py instead of zooming in on the figure, why don't we filter the data? Is there a specific advantage of zooming in that we want students to learn here?
  • Fig 39, again the visualization is styled to look like the one in the R textbook instead of going with the default in Altair; this adds unnecessary noise to the code and makes it harder to learn.
  • Do we want to talk about filled=True and mark_circle here?
  • Fig 42 & 43 & 44 code needs more newlines (just go through all figures)
  • Incorrect newline in "ongue language has 19460850 speakers, \nwhile the least common"
  • .assign indentation is used incorrectly
  • The legend is incorrectly described as taking up the plot area. This is a limitation of ggplot but not of altair, which keeps a constant area for the chart and adds the legend outside of it.
  • The section about color palette has improper paragraph delineation.
  • We should mention that the default color palette in altair is appropriate for most situations, since it is the one developed by researchers at Tableau. This is again different from ggplot, which uses colors evenly distributed around a color wheel. That is really inappropriate and should never be used, as it includes red and green to separate categorical values.
  • https://www.color-blindness.com/coblis-color-blindness-simulator/ certificate expired
  • Consider limiting output to 6 rows with a hidden pandas config?
  • .sort_values(by = "size", ascending=False).iloc[:12] -> nlargest(12)
  • "our question ("WHICH are the top "
  • Is it really necessary to use configure axis on each chart? It seems like this takes away from the key concepts we are learning and adds noise.
  • fig50 mark_bar(color='black') and then also use the color encoding?
  • "convert the morley data to a tibble to take " Remove tibble
  • R Histograms can become more visually appealing by setting the color outline to 'white'
  • Why is the section on "histograms" using a bar chart without binning?
  • The vertical line in altair is currently drawn using a separate dataframe instead of just alt.value/alt.datum
  • Need to explain altair data types here since we are starting to use :N
  • I don't think we explained what + does?
  • Weird indentation for .assign again
  • The default number of bins in altair is not 30. It uses scott's or silverman's rules to compute it and then snaps it (down) to evenly align with the d3 axis grid.
  • "helping TO answer our question"
  • Need a more elaborate section on saving files and common issues
  • Zoomed in image from R
  • Add additional resources
  • R In an introductory course like this, it could be beneficial to teach geom_col instead of geom_bar(stat = 'identity'), which simplifies the explanation and reduces potential confusion. geom_bar would then only be used for counts. The drawback is that most people in the R community seem to not be aware of geom_col so you often see the more cryptic geom_bar(stat = 'identity') in the wild (but the tidyverse recommends it https://ggplot2.tidyverse.org/reference/geom_bar). We also use geom_histogram instead of geom_bar with stat_bin or scale_x_bin.
  • R Why not use facet_wrap(~continent, ncol=1) instead of facet_grid(rows = vars(continent))? It seems both more intuitive and common to me, and it is also the recommended approach in the tidyverse docs https://ggplot2.tidyverse.org/reference/facet_grid.html
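
To illustrate the nlargest suggestion in the list above, here is a minimal sketch. The data frame, its column names, and its values are made up for illustration and are not the chapter's actual language data; only sort_values, iloc, and nlargest are real pandas calls. It also follows the Python convention of no spaces around = in keyword arguments.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's data (invented values).
languages = pd.DataFrame(
    {
        "language": ["A", "B", "C", "D", "E"],
        "size": [300, 900, 500, 100, 700],
    }
)

# Current approach in the chapter: sort, then slice the first rows.
top_sorted = languages.sort_values(by="size", ascending=False).iloc[:3]

# Suggested approach: ask for the largest rows directly.
top_nlargest = languages.nlargest(3, "size")

print(top_nlargest)
```

With no ties in the size column, both approaches should select the same rows in the same order; nlargest just states the intent in one call.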
@joelostblom joelostblom changed the title Suggestions for chapter 4 - Viz Suggestions for Ch4 - Viz Sep 27, 2022
@lheagy lheagy mentioned this issue Dec 25, 2022