Chapter 3 Getting Started with R
3.1 Objectives
- Compare and contrast the benefits of running code in the console versus a script.
- Answer a question about a function by looking it up in RStudio.
- Distinguish between a helpful and less helpful Stack Overflow answer.
- Identify which RStudio pane will show you a window of a loaded table.
As stated in the introduction, our overall goal is to work with people, programs and data. In this section, we will focus on programs and data as we learn how to run R code, as well as how data is stored and accessed on a computer.
3.2 Introduction to RStudio
Throughout this book, we’ll be writing programs (or, in verb form, programming) in order to accomplish our goals of working with data on the computer. Programming is one way to make a computer do something for us. Instead of clicking, we’ll mostly be typing; instead of doing what someone else has pre-defined, we’ll have a lot of flexibility to do what we want.
Just like using a web browser to access websites, and a program like MS Word to write documents, it’s helpful to have a program on your computer that is designed to make it easy to write and run code. This kind of program is called an “IDE” or Integrated Development Environment.
The one we’ll be using in this book for writing R code is called RStudio. RStudio (like many IDEs) has many panes (or panels or boxes), each of which has a different purpose.
FIXME: Screenshot or schematic.
We’ll take it slowly and introduce the purpose of each pane one at a time. We’ll start with the pane occupying the lefthand side of your screen.
3.3 Running Code in the Console
Now that we have our space for writing and running code (our environment) open, it’s time to actually run some code.
The first time you open a new installation of RStudio, the pane occupying the entire lefthand side of the screen is called the console. The console is a program that is constantly ready and waiting to accept and run code. The console you see in RStudio is expecting to see R code. Here’s an example you can type or copy in to see how the console works:
print("It was the best of times, it was the worst of times.")
## [1] "It was the best of times, it was the worst of times."
A function is a set of code that can repeatedly perform a specific task.
As you might guess from the name, the R print()
function
takes in text (indicated by the quotes) and then prints the text back out to
the console.
R has many built-in functions, each of which accomplishes different things:
round(3.1415)
## [1] 3
Sys.Date()
## [1] "2021-05-03"
length("hippo")
## [1] 1
Part of learning to program is learning some of the base R functions and what they do; in upcoming chapters, we’ll focus on functions that allow you to read in and manipulate data. At some point it may be helpful to write your own functions, which we’ll also cover in another chapter.
3.4 Running Code via Scripts
Can you imagine a situation where continuing to type code into the console could become tedious or challenging?
Here are some examples:
- Running the same code many times (you can use the up-arrow to see previous commands - try it! - but if you run many commands, you might end up scrolling a long time.
- Organizing long sections of code in a meaningful way, and working on subsections separately.
- Saving the code as a text file that can be stored on the computer’s hard drive and shared with collaborators.
Clearly, we need an approach that will allow us to write and run code while keeping a record of our work and allowing us to run (and re-run) parts.
The answer to these challenges is to add our R code to a text file called a script. You can create a script in RStudio by using the “File” menu option to select “New file” and then “R Script.” Your R script (a blank text file) should appear on the left side of the screen above the console. Try copying in some of the commands we’ve already run:
round(3.1415)
## [1] 3
Sys.Date()
## [1] "2021-05-03"
FIXME: Screenshot or animated gif.
Now these commands are written in the script, but nothing has happened yet.
In order to run the commands from the script (just like we did in the console),
we can choose from a variety of run commands. RStudio includes a “Run” button
in the
script pane. Clicking this button runs the current line of code.
You should see both the code and output appear in the console pane on the
bottom left. Your cursor will also move to the next line of code.
Alternatively, you can use a keyboard shortcut to run code.
Hovering your mouse over the run button will show you the shortcut for
running code on your computer; this shortcut is typically Ctrl+Enter
.
You can also try selecting multiple lines of code with your cursor to execute
multiple lines simultaneously.
Sometimes, when you want to experiment or check something, it can make sense to write and run R code in the console. However, most of the time, you’ll want to write and run your code from a script. Using scripts has the benefit of saving your work while also being able to run the code just like in the console. You and your collaborators can then also easily share and run the code outside of the IDE.
We have now told the computer what to do by using R code, and we have run that code in two different ways within our “workbench” - the RSudio IDE. Let’s see what else we can use in this environment to help us.
3.5 Assigning and Recalling Objects
Besides running functions that do something (as above), we’ll also want
to use R to keep track of information that we’re using throughout
our analysis. We save that information by creating an
[object][object])
using a name we select, special set of symbols known as the
assignment operator (<-
), and then the information to save.
For example:
<- "It is a far, far better thing that I do, than I have ever done"
message <- length("Sydney Carton") name_length
On the top right pane, there’s a tab that says “Environment.” If you click on that, you’ll see the objects you just created listed under “Values.” Objects can represent data of various types - character (collection of letters and numbers encased in quotation marks), numeric (numbers, including decimals) and more.
Referencing the name of an object by itself will print the data you assigned to the console. You can also use objects as input to a function:
message
## [1] "It is a far, far better thing that I do, than I have ever done"
class(message)
## [1] "character"
These examples are objects representing small, fairly simple data. Objects can also be much more complex, representing large datasets. Hopefully you can imagine how assigning objects can allow you to work more easily with complex data, which you’ll have a chance to do in the next section.
3.6 Viewing installed packages
Functions in R are grouped together into collections of related code called packages. There is a tab in the lower right pane in RStudio called “Packages” that lists all of the packages installed in your version of R. Scrolling down the list, you’ll note that some packages have the box on the left checked. This means the package has been loaded and is available for use.
In addition to the built-in packages that come with every installation of R, we have the ability to use packages written by other programmers. One of the most common places to find such code is the Comprehensive R Archive Network, or CRAN. RStudio allows a straightforward way to download and install packages through CRAN. If you click on the “Install” button in the Packages tab, a window will appear allowing you to install from “Repository (CRAN).” In the space below “Packages,” type “tidyverse.” Making sure the box next to “Install dependencies” is checked, click “Install” to download and install this collection of code (which we’ll be using in later sections).
If you’d prefer to use code to install packages, rather than the RStudio pane, you could execute the following line of code in your console:
install.packages("tidyverse")
This accomplishes the same task as searching and installing using the Packages pane. We don’t recommend including this code in your script, since you won’t need to install the package every time you run the script.
Once a package is installed, you’ll need to ensure it is loaded and all functions within it are recognized by R for use. You can load a package by locating it among installed packages in the Packages pane, and then checking the box to the left of its name. Alternatively, you can perform the same task using code:
library("tidyverse")
It is appropriate to include this code in your script, so any functions from loaded packages are recognized by R and run as expected.
3.7 Getting Help in RStudio
There are a number of additional features in RStudio that may be of use, but we’ll discuss them in later sections as the need arises. The remaining tab that will be useful to us now is “Help,” located in the lower right pane.
You can search for more information on functions in R by typing the name of the function, such as “round,” into the search box in the Help pane. The text that appears in the window below can sometimes be extensive. You can search within this particular help page using the second search box that reads “Find in Topic.” In this case, type “decimal” to find information about how the function determines the number of decimal points to print in the output.
Luckily, R help documentation tends to be formatted very consistently. At the very top, you’ll see the name of the function (“Round,” which in this case represents a collection of related functions) followed by the name of its package inside curly brackets (“{base},” this is important because sometimes different packages have functions with the same names!). Below that, a short title indicates the purpose of the function. “Description” is a more extensive explanation that should help you figure out if the function is appropriate for your use. “Usage” provides information on how the function should be applied through code, and “Arguments” is like a legend for the code described in Usage. The last subheadings, “Details” through “References” and “Examples,” should be self-explanatory.
When you’re writing code in your script, you can also access help documentation by prefacing the name of the function with a question mark. Similar to installing packages, searching help documentation is useful when you’re working on writing code, but isn’t particularly helpful to include in your script, so try typing the following code into the console:
?sum
After executing that code, you should see the help page for that function appear in the lower right pane. This search method works on functions that are currently loaded. If you would like to perform a global search (e.g., search among all R packages installed on your computer), you can use two question marks instead (remember to enter this in the console):
??filter
RStudio also provides helpful pop-up windows that attempt to predict the function name you’re typing. You may see windows that appear as you’re coding, which include short versions of the help documentation.
3.8 Getting Help Online
Sometimes you may not be able to find the answers you seek in the documentation available through RStudio. This often happens when you need to do something completely new and you don’t know where to start. In that scenario, there are multiple internet resources that are helpful:
- blogs
- tutorials
- question pages on Stack Overflow, an open forum that allows community members to ask and answer questions
It can be challenging to find the right words to search when you’re troubleshooting something new. Don’t be discouraged, and think about different ways you can apply the terminology you’re learning to make your search terms clearer and more specific.
Even when you find information that seems useful, these resources can be overwhelming and confusing. Some things to look for in a Stack Overflow discussion include:
- Is the question clearly stated and is there example code? If you’re debugging, does the example code look like yours?
- Answers have upvotes and downvotes. Is there one clear answer that has a lot of upvotes in response to a question?
- How complex is the answer? While some questions will necessarily have a complicated answer, for many common programming tasks, there should be a solution that only takes a few lines of code.
- Do you recognize any terms in the solution? If you don’t, are there other terms for which you could search?
Conventions:
One tool in using help pages (and in reading the rest of the book) will be understanding the conventions that writers use when describing many of the terms introduced in this chapter. For example
min()
indicates that something is a function, with parentheses to indicate that it is an action with (potentially) input and output.
Some other conventions used in this book are: *
folder_name/
for a folder *variable_name
for variables *column_name
for columns
3.9 Exercises
Run each of the functions above. Can you explain what each function expects as input, and what kind of output it produces?
What would be the pros and cons of using the console versus a script in each situation?
- Writing a data analysis with multiple steps.
- Opening a new data set and exploring its dimensions.
- Checking the value of a variable.
After googling “import csv file r,” the two following pages appear in the search results: https://stackoverflow.com/questions/3391880/how-to-get-a-csv-file-into-r http://www.sthda.com/english/wiki/reading-data-from-txt-csv-files-r-base-functions
- What is useful about these two pages?
- Do you understand all of the code in the second page?
- The second page provides many possible options for code to import a csv file. Where would you start in attempting to perform this task?
Try reading a csv file:
<- read_csv("measurements.csv") data
Which pane in RStudio will show our new
data
object? What happens when you click on it?
3.10 Key Points
- RStudio is an Integrated Development Environment (IDE) for writing R code.
- You can run R via a script or console.
- For most scenarios, we write and execute code from a script.
- IDEs like RStudio have shortcuts and help pages to facilitate writing code.
- Actions in R are performed by functions and methods.
- Information in R is stored as objects that are labeled with variables.