Relevant reading for this problem set: ModernDive Chapter 1: Getting Started with Data in R
The goal this week is to introduce you to R and RStudio, which you’ll be using throughout the course both to review the statistical concepts we discuss and to analyze real data and come to informed conclusions. To clarify which is which: R is the name of the programming language itself, and RStudio is a convenient interface for working with it.
Today we begin with the fundamental building blocks of R and RStudio: the interface, creating and saving files, and basic commands.
Open Smith’s RStudio Server and sign in: RStudio Server
Your credentials are the same as for your email.
Please DO NOT choose Stay signed in.
In RStudio Server you should see a window that looks like the image below.
The panel on the left is where the action happens. It’s called the console. Every time you launch RStudio, it will have the same text at the top of the console telling you the version of R that you’re running.
The panel in the upper right contains your workspace. This shows the variables and objects you define during your R session, and a history of the commands that you enter.
Any plots that you generate will show up in the panel in the lower right corner. This is also where you can browse, upload, and download files, and access help files.
We will start by making a course folder in RStudio Server that you can use to store all your R work for the course on the web. Click on the Files tab in the lower right panel, then click the New Folder button. Enter the folder name introstatsR in the window that opens, and click OK. You should now have a new folder!
Next, go to the course site and download the data set posted for this week’s lab session. Put it in a location on your computer that you will remember! I highly suggest you also make an introstatsR folder on your computer to store all your material for this course. Check in with the instructor if you don’t know how to do this!
To work in RStudio Server with files stored on your computer’s hard drive, you need to upload them from your computer to the server. To upload the data set, click on the RStudio Server introstatsR folder once, and click the upload button, like so:
In the window that opens, browse to where you stored your data set on your computer, click on the data file, then click OK. Open the introstatsR folder again in RStudio Server to make sure the data is in there. You can upload any sort of file like this.
We are not doing anything with this toydata file, except learning how to upload it.
When you want to write a paper, you have to open a Word document to type your ideas into, and save your work in. In R we use a document type called an R Markdown document. R Markdown documents are useful for both running code, and annotating the code with comments. The document can be saved, so you can refer back to your code later, and can be used to create other document types (html, word, pdf, or slides) for presenting the results of your analyses. R Markdown provides a way to generate clear and reproducible statistical analyses.
To open a new file, click on the little green plus on the upper left hand, and select R Markdown, as in the image below. You can leave it untitled.
When you open a new R Markdown file, there is some example code in it that you can get rid of. We will take care of this next.
Let’s make some changes to the R Markdown file you just opened. Using the image below as a guide
Your final result should look like this:
You will complete your lab work in an R Markdown file like this each week, so it is important to learn how to save these files.
Save the file in the introstatsR course folder you just created, and name it PS01_lastname_firstname (fill in your first name and last name).
This is now saved in the introstatsR course folder on the server.
Click the Knit button at the top left side of the screen to “knit” the file, or in other words, produce an output document. An .html file will be generated. It is automatically saved in the same folder that your R Markdown file was saved in.
Note that there is now an R Markdown file (.Rmd) and an html file (.html) in the introstatsR folder.
Inspect the .html file to see how what you typed was formatted. There are lots of tricks for controlling the formatting of the knitted html file. For instance:
Putting ## and a space in front of text makes it into a large header. For example, see how ## This is a header in your R Markdown .Rmd file translates in the resulting .html output.
Putting ### and a space in front of text makes it a smaller header!
The code chunks are where you put R code in an R Markdown file. So far, your “knitted” file (your output document file) doesn’t show anything, because we did not put any content in the code chunks yet!
Using your first code chunk, type the following command to create a new variable called x with the value of 6.
x <- 6
The arrow <- is called an ASSIGNMENT OPERATOR, and tells R to save an object called x that has the value of 6. This is similar to saving a value in a graphing calculator.
Note that the name of the object you are saving must always be to the left of the assignment operator, and the value you want to store in it to the right!
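As a small illustration of how assignment behaves, here is a sketch you could try in a code chunk. The name my_number is made up for this example (it is not part of the problem set), so running it will not change x.

```r
# my_number is a hypothetical name used only for illustration
my_number <- 10

# Typing an object's name on its own line prints the stored value
my_number

# Assigning to the same name again replaces the old value
my_number <- 12
my_number
```

After running this, my_number should appear in your Environment pane alongside x.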
To actually RUN this command in your console, you have a few options:
Press Control-Enter on a PC or Command-Return on a Mac.
Think of “running” code in your console as telling R “do this”.
Note that you now have a new object in your workspace, called x!
So far you have made a numeric variable x. There are many other types of data objects you can make in R.
First, copy, paste and run the following command in a new code chunk to make a character called favorite_movie. Think of characters as text as opposed to numerical values. Note that I told R that this was a character by putting quotation marks around Star_Wars.
favorite_movie <- "Star_Wars"
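If you are curious how R itself distinguishes text from numbers, one way to check is the built-in class() function. The sketch below is optional; the names movie_type, number_type, and quoted_type are made up for illustration.

```r
favorite_movie <- "Star_Wars"

# class() reports what kind of object R thinks something is
movie_type <- class(favorite_movie)
movie_type

# A bare number is numeric ...
number_type <- class(6)
number_type

# ... but the same digit inside quotation marks is treated as text
quoted_type <- class("6")
quoted_type
```

The quotation marks, not the contents, are what make something a character.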
Next, copy, paste and run the following command into a new code chunk.
v <- c(2, 4, 6)
This makes what is called a vector, which we have named v. It is a data object that has multiple elements of the same type. This vector contains three numbers: 2, 4, and 6. The c() function tells R to concatenate the values 2, 4, and 6 into a single vector. Note in the Environment pane that your vector v contains numbers (listed as num).
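To see what “multiple elements of the same type” means in practice, you could try the optional sketch below. It uses the built-in functions length() and class(); the names v_length, v_type, and mixed are made up for illustration.

```r
v <- c(2, 4, 6)

# length() counts how many elements the vector holds
v_length <- length(v)
v_length

# class() shows the shared type of the elements
v_type <- class(v)
v_type

# If you mix numbers and text, R converts everything to text
# so that all elements still share one type
mixed <- c(2, "funk")
class(mixed)
```

This is why a single vector holds either numbers or characters, but not a mixture of both.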
You can do math on a vector that contains numbers! For instance, copy, paste and run the following command into a new code chunk. This tells R to multiply each element of the vector v by 3.
v * 3
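One detail worth noticing is that running v * 3 only prints a result; it does not change v itself. The optional sketch below shows this by storing the result under a new name, tripled, which is made up for illustration.

```r
v <- c(2, 4, 6)

# Store the result of the multiplication in a new object
tripled <- v * 3
tripled

# The original vector is unchanged, because we never assigned to v
v
```

If you wanted to keep the multiplied values, you would need to save them with the assignment operator, as done here with tripled.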
To complete this problem set you will next run through some Exercises, and submit a knitted .html file with answers to all the Exercises. Please make a header for each of these Exercises. If you need to answer an Exercise with text, type the text below the header, on the next line, in the white part. If you need to answer an Exercise with some code, insert a code chunk below the header, and put the code in the greyed out box.
Remember to save your work as you go along! Click the save button in the upper left hand corner of the R Markdown window.
Create a variable called y with the value of 7
Multiply x by y, and store the answer in a variable named z like so: z <- x * y
You should now see favorite_movie, x, v, y, and z all in your Environment pane
Run 6 + 3 in a code chunk
Save 6 + 3 as a variable called a.
Where does the variable a show up? (please answer with text)
Type a into the code chunk and re-run the code chunk. What happens? (please answer with text)
It is a good idea to try knitting your document from time to time as you go along! Go ahead, and make sure your document is knitting, and that your html file includes Exercise headers, text, and code. Note that knitting automatically saves your Rmd file too!
Run a^2. What does the ^ operator do? (please answer with text)
Run sum(a, x, y)
sum is a function. Based on the output, what do you think the sum function does? (please answer with text)
It is a good idea to try knitting your document from time to time as you go along! Go ahead, and make sure your document is knitting, and that your html file includes Exercise headers, text, and code. Note that knitting automatically saves your Rmd file too!
Recall the vector v we created earlier. Copy, paste and run the following in a code chunk. What does this code accomplish? (please answer with text)
v + 2
Copy, paste and run the following code to make a vector called music, that contains music genres. Recall a vector is a data object that has multiple elements of the same type. Here the data type is a character. Look in the Environment pane. How does R tell us that this vector contains characters, not numbers? (please answer with text)
music <- c("bluegrass", "funk", "folk")
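Italicize like this: put a single asterisk on each side of the text, as in *Italicize like this*
Bold like this: put two asterisks on each side of the text, as in **Bold like this**
A superscript: put a caret on each side of the superscript, as in R^2^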
Each week you will submit the html file on Moodle. This involves downloading the html file from the RStudio Server to your personal computer. The steps to do this are as follows:
Find the html file in the introstatsR folder on the server.
Download it and save it in the introstatsR folder on your machine.
If you need more help, there is a video at the top of this page that can help.
There are a lot of stats classes and students using the Server. To keep it as fast as possible, it is best if you sign out when you are done. To do so follow all the same steps for closing an R Markdown document as above: