-
Notifications
You must be signed in to change notification settings - Fork 51
/
Copy pathGetting_ready_to_use_R.Rmd
248 lines (182 loc) · 9.42 KB
/
Getting_ready_to_use_R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
```{r,message=FALSE,echo=FALSE}
library("knitcitations")
library("knitr")
cite_options(citation_format = "pandoc", max.names = 3, style = "html", hyperlink = "to.doc")
bib <- read.bibtex("bibtexlib.bib")
opts_chunk$set(tidy = FALSE, message = FALSE, warning = FALSE, cache = TRUE)
# use this to set knitr options:
# http://yihui.name/knitr/options #chunk_options
```
---
title: "Getting ready to use R"
subtitle: "*NJ Grünwald, ZN Kamvar and SE Everhart*"
---
R provides a unique environment for performing population genetic analyses. You
will particularly enjoy not having to switch data formats and operating systems
to execute a series of analyses, as was the case until now. Furthermore, R
provides graphing capabilities that are ready for use in publications, with only
a little bit of extra effort. But first, let's install R, install an integrated
development environment, open R, and load R packages.
<!---
## FOR THE SPRING 2014 WORKSHOP
You must have:
1. The latest version of [R](http://www.r-project.org) **(3.1.0)**
2. [RStudio Desktop](http://www.rstudio.com/ide/). Think of this as kind of a GUI for R.
3. The latest development version of *poppr* (1.1.0.99) (see below)
**You must download the development version of poppr** (1.1.0.99). You may do this
in several ways:
### If you do NOT have a C compiler or do not know what it is:
1. In your R console, run the command to install the dependencies of *poppr*: `install.packages(c("adegenet", "pegas", "vegan", "ggplot2", "phangorn", "ape", "igraph"))`
2. Download the binaries for Windows [[here]](http://grunwaldlab.cgrb.oregonstate.edu/sites/default/files/u5/poppr_1.1.0.99_24.6.14.zip) or OSX [[here]](http://grunwaldlab.cgrb.oregonstate.edu/sites/default/files/u5/poppr_1.1.0.99_24.6.14.tgz) and install using the `Tools > Install Packages...` menu command. You should see a window like this:

You should make sure to choose the **Package Archive File** option and then indicate where you downloaded the package in the menubar. If you have downloaded it on your desktop, you would type `~/Desktop/poppr_1.1.0.99.tgz` (on OSX).
### If you have a working C compiler on your system
You may do install *poppr* via the R package *devtools*:
```{r, devtools, eval=FALSE}
install.packages("devtools")
devtools::install_github("grunwaldlab/poppr", ref = "devel")
```
-->
Installing R
-----
1. Download and install the [R](http://www.r-project.org) statistical computing
and graphing environment. This works cross-platform on Windows, OS X and
Linux operating systems.
2. Download and install the free, open source edition of the [RStudio Desktop](http://www.rstudio.com/ide/) integrated development environment (IDE) that we recommend.
Installing the required packages
----
The following packages are utilized in this primer:
1. [poppr](http://cran.r-project.org/web/packages/poppr/index.html)
2. [adegenet](http://cran.r-project.org/web/packages/adegenet/index.html)
2. [ape](http://cran.r-project.org/web/packages/ape/index.html)
4. [ggplot2](http://cran.r-project.org/web/packages/ggplot2/index.html)
5. [mmod](http://cran.r-project.org/web/packages/mmod/index.html)
6. [magrittr](http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)
7. [dplyr](http://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html)
8. [treemap](http://cran.r-project.org/web/packages/treemap/index.html)
Use the following script to install these packages:
```{r, eval = FALSE}
install.packages(c("poppr", "mmod", "magrittr", "treemap"), repos = "http://cran.rstudio.com", dependencies = TRUE)
```
We wrote and actively maintain *poppr* `r citep(bib["kamvar2014poppr", "kamvar2015novel"])` and it
is heavily relied upon in this primer. *Poppr* is an R package. You can think of
a package as a library of functions written and curated by someone in the R user
community, which you can be loaded into R for use.
Once you've installed *poppr*, you can invoke (i.e., load) it by typing or
cutting and pasting:
```{r, message=TRUE, warning=TRUE, cache=FALSE}
library("poppr")
```
This will load *poppr* and all dependent packages, such as *adegenet* and
*ade4*. You will recognize loading by the prompts written to your screen.
Congratulations. You should now be all set for using R. Loading data and
conducting your first analysis will be the topic of the next chapter. But before
we go there lets provide a few useful resources.
A quick introduction to R using RStudio
----
Next, let's review some of the basic features and functions of R. To start R,
open the RStudio application from your programs folder or start menu. This will
initialize your R session. To exit R, simply close the RStudio application.
> Note that R is a case sensitive language!
Let's get comfortable with R by submitting the following command on the command
line (where R prompts you with a `>` in the lower left RStudio window pane) that
will retrieve the current working directory on your machine:
```{r, eval=FALSE}
getwd() # this command will print the current working directory
```
> Note that the symbol '#' is used to add comments to your code and you just
> type `getwd()` after the ">".
Our primer is heavily based on the *poppr* and *adegenet* packages. To get help
on any of their functions type a question mark before the empty function call as
in:
```{r, eval=FALSE}
?mlg # open the R documentation of the function mlg()
```
To quit R you can either use the <kbd>RStudio > Quit</kbd> pull-down menu
command or execute <kbd>⌘ + Q</kbd> (OS X) or <kbd>ctrl + Q</kbd> (PC).
Using *magrittr*
----
Various chapters throughout this primer will have the symbol `%>%` in the code.
This is called a "pipe" operator and it allows code to be more readable by
stringing together commands from right to left. Here's a [short description of
these "pipes" with cats](http://rforcats.net/#pipes). When reading code, it can
be thought of as equivalent to saying "and then". For example, if you have three
consecutive steps to a process, you would write this in English as:
> Take your data **and then** do step one, **and then** do step two, **and
> then** do step three.
In R code with *magrittr*, assuming that each step is a function, it might be
written as:
```{r, eval = FALSE}
result <- data %>% step_one() %>% step_two() %>% step_three()
```
Below, are two examples of how code can be improved with *magrittr*. [More
details about magrittr can be found in this
link.](http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)
#### Consider a fake Example:
Adapted from
[Hadley Wickham](https://twitter.com/_inundata/status/557980236130689024).
Based on the children's song,
[Little bunny foo foo](http://en.wikipedia.org/wiki/Little_Bunny_Foo_Foo).
```{r, eval = FALSE}
foo_foo <- little_bunny()
bop_on(scoop_up(hop_through(foo_foo, forest), field_mouse), head)
# VS
foo_foo %>%
hop_through(forest) %>%
scoop_up(field_mouse) %>%
bop_on(head)
```
#### Now for a real Example:
We will use the *Phytophthora infestans* microsatellite data from North and South America
`r citep(bib['goss2014irish'])`. Let's calculate allelic diversity per population
after clone-correction. This information can be found in our chapters on
[Population strata and clone correction](Population_Strata.html) and
[Locus based statistics](Locus_Stats.html).
```{r, eval = FALSE}
library("poppr")
library("magrittr")
data(Pinf)
# Compare the traditional R script
allelic_diversity <- lapply(seppop(clonecorrect(Pinf, strata = ~Continent/Country)),
FUN = locus_table, info = FALSE)
# versus the magrittr piping:
allelic_diversity <- Pinf %>%
clonecorrect(strata= ~Continent/Country) %>% # clone censor by continent and country.
seppop() %>% # Separate populations (by continent)
lapply(FUN = locus_table, info = FALSE) # Apply the function locus_table to both populations
```
> To observe the results type `allelic_diversity` into the console after each statement.
The `%>%` operator is thus good if you have to do a lot of small steps in your
analysis. It allows your code to be more readable and reproducible.
Packages and getting help
----
One way that R shines above other languages for analysis is the fact that
R packages in CRAN are all documented. Help files are written in HTML and give
the user a brief overview of:
- the purpose of a function,
- the parameters it takes,
- the output it yields,
- and some examples demonstrating its usage.
To see all of the help topics in a package, you can simply type:
```{r, eval = FALSE}
help(package = "poppr") # Get help for a package.
help(amova) # Get help for the amova function.
?amova # same as above.
??multilocus # Search for functions that have the keyword multilocus.
```
Some packages include vignettes that can have different formats such as being
introductions, tutorials, or reference cards in PDF format. You can look at a
list of vignettes in all packages by typing:
```{r, eval=FALSE}
browseVignettes() # see vignettes from all packages
browseVignettes(package = 'poppr') # see vignettes from a specific package.
```
and to look at a specific vignette you can type:
```{r, eval=FALSE}
vignette('poppr_manual')
```
Next, consider browsing Appendix 3 on "Introduction to R" if you are not yet
familiar with R and RStudio. Otherwise, you are now ready to think about
formatting and loading population genetic data into R.
References
----------