---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  warning = FALSE
)
```
# meta goals
- learn about ggplot2 extension strategies and extension workflow
- learn about how teaching strategies can inform development
- learn more about robust package practices
- practice collaborating on a package with git and GitHub
- publication with collaborators
# ggols goals (abstract)
Statistical modeling is an important technique for understanding relationships in data. In statistics classes, introductions to modeling often start by modeling one continuous 'response' variable, y, by one continuous 'independent' variable, x. After discussing the statistical features of this simple modeling set-up, students are then often introduced to multivariate thinking, in which a second explanatory variable is used in modeling; most commonly, this second variable is categorical, where conditions are discrete and therefore easier to reason about.
Using the popular and powerful ggplot2 library, visualizing models with one explanatory variable is relatively straightforward via the ggplot2::geom_smooth() function. However, it is much more difficult to use the ggplot2 package to visualize models with a second, categorical predictor variable. geom_smooth() will indicate different predictions for different categories when a categorical variable is included in the aesthetic specification, for example via aes(linetype = my_cat) or aes(color = my_cat). However, the model estimation is performed independently for each category; geom_smooth() is instructed to break the dataset apart and compute group-wise. Compelling, 2-D visuals of the models are possible, but challenging to choreograph with ggplot2.
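For example, the group-wise fitting described above can be seen with the built-in `cars` data (the same data used in the examples below): mapping a discrete variable to an aesthetic makes `geom_smooth()` fit a separate model for each category.

``` r
library(ggplot2)

# geom_smooth() fits one model per category, because the discrete
# aesthetic splits the data into groups before the stat computes.
ggplot(cars) +
  aes(x = speed, y = dist, color = dist > 30) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```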
The ggplot2 extension package {{ggols}} (from OLS) aims to provide new geom_*() functions that allow for easy visualization of models of the form y ~ x + z, where x and y are continuous and z is discrete. The extension strategy is to specify that model computation happens at the panel level (or possibly the layer level), not the group level. In addition to providing functionality like geom_smooth(), which allows for easy visualization of a smear of model predictions and confidence intervals, {{ggols}} also seeks to provide functions that allow for easy visualization of other statistical quantities (residuals, prediction intervals, intercepts, etc.). Many of the examples will be OLS, but we aim to build something that can be used in a more generic way as well.
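To make the panel-level strategy concrete, here is a minimal, hypothetical sketch (not the actual {{ggols}} implementation, and the names `StatLmCatSketch`, `geom_lm_cat_sketch()`, and the `cat` aesthetic are assumptions for illustration): a Stat whose compute_panel() sees all of a panel's rows at once, so a single parallel-slopes model y ~ x + cat can be fit and its fitted values drawn per category.

``` r
library(ggplot2)

# Hypothetical sketch of the panel-level strategy (not ggols's actual code).
# compute_panel() receives the whole panel, so one model is fit with the
# categorical variable as a predictor, rather than one model per group.
StatLmCatSketch <- ggproto("StatLmCatSketch", Stat,
  required_aes = c("x", "y", "cat"),
  compute_panel = function(data, scales) {
    fit <- lm(y ~ x + cat, data = data)  # parallel-slopes model
    data$y <- predict(fit)               # replace y with fitted values
    data
  }
)

# Hypothetical constructor that pairs the Stat with a line geom
geom_lm_cat_sketch <- function(mapping = NULL, data = NULL,
                               position = "identity", show.legend = NA,
                               inherit.aes = TRUE, ...) {
  layer(
    stat = StatLmCatSketch, geom = "line", data = data, mapping = mapping,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(...)
  )
}

# Usage (echoing the examples below):
# ggplot(cars) +
#   aes(x = speed, y = dist, cat = dist > 30, color = dist > 30) +
#   geom_point() +
#   geom_lm_cat_sketch()
```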
<!-- badges: start -->
<!-- badges: end -->
The goal of ggols is to ...
## Installation
You can install the released version of ggols from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("ggols")
```
And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("EvaMaeRey/ggols")
```
## ggxmean gives us all the geom_lm_* functions
```{r example}
library(tidyverse)
library(ggxmean)

ggplot(cars) +
  aes(x = speed, y = dist) +
  geom_point() +
  geom_lm() +
  geom_lm_intercept(color = "blue") +
  geom_lm_intercept_label(hjust = -.2) +
  geom_lm_conf_int() +
  geom_lm_residuals() +
  geom_lm_fitted() +
  geom_lm_formula() +
  ggols:::geom_lm_rsquared() # should be moved to ggxmean
```
# Visualization, adding cat var
```{r cat}
library(ggols)

ggplot(cars) +
  aes(x = speed, y = dist, cat = dist > 30,
      color = dist > 30) +
  geom_point() +
  geom_lm_cat() +
  geom_lm_cat_intercept(color = "blue") +
  geom_lm_cat_intercept_label(hjust = -.2) +
  geom_lm_cat_conf_int() +
  geom_lm_cat_residuals() +
  geom_lm_cat_fitted() +
  geom_lm_cat_formula() +
  geom_lm_cat_rsquared()
```
# Visualization, adding cat interaction
```{r interaction}
ggplot(cars) +
  aes(x = speed, y = dist, cat = dist > 30,
      color = dist > 30) +
  geom_point() +
  geom_lm_interaction() +
  geom_lm_interaction_intercept(color = "blue") +
  geom_lm_interaction_intercept_label(hjust = -.2) +
  geom_lm_interaction_conf_int() +
  geom_lm_interaction_residuals() +
  geom_lm_interaction_fitted() +
  geom_lm_interaction_formula() +
  geom_lm_interaction_rsquared()
```
# Visualization, with two cat variables
```{r int2}
ggplot(palmerpenguins::penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm,
      cat = species, cat2 = sex,
      color = species, shape = sex) +
  geom_point() +
  geom_lm_cat2() +
  geom_lm_cat2_conf_int() +
  geom_lm_cat2_residuals() +
  geom_lm_cat2_fitted() +
  geom_lm_cat2_formula() +
  geom_lm_cat2_rsquared() +
  NULL

last_plot() +
  geom_lm_cat2_intercept(color = "blue") +
  geom_lm_cat2_intercept_label(hjust = -.2)
```
```{r}
ggplot(palmerpenguins::penguins) +
  aes(x = bill_length_mm, y = bill_depth_mm,
      cat = species,
      color = species, shape = sex) +
  geom_point() +
  geom_lm_cat(formula = y ~ I(x^3) + I(x^2) + x + cat)

# The implementation should change; curves expose a problem: the lack of
# smoothness shows that predictions are not made along x ranges that are
# not supported with data in each category.
```