First Steps on R

Overview

Teaching: 15 min
Exercises: 10 min

Questions

What is R, and why is it important to learn to use it?

What types of data does the R language have?

Objectives

Understand why R is important.

Describe the purpose and use of each panel in the RStudio IDE

Locate buttons and options in the RStudio IDE

Define a variable

Assign data to a variable

It takes courage to sail in uncharted waters -Snoopy

Introduction to R/RStudio

What is R?

“R” names both a programming language and the software that reads and interprets the instructions written in scripts of this language. It is specialized in statistical computing and graphics.

The R environment combines:

effective handling of big data
collection of integrated tools
graphical facilities
simple and effective programming language

Why use R?

R is a powerful environment. It offers a wide range of statistical, general data analysis, and visualization capabilities.

Benefits of learning R

R is interdisciplinary and extensible: There are 10,000+ packages that can be installed to extend its capabilities. R provides a framework that allows you to combine statistical approaches from many scientific disciplines.
R works on data of all shapes and sizes: The skills you learn with R scale easily with the size of your dataset. Whether your dataset has hundreds or thousands of lines, it won’t make much difference. R is designed for data analysis, and it comes with special data structures and data types that make handling missing data convenient.
R can connect to spreadsheets, databases, and many other data formats.
R produces high quality graphics: The plotting functionalities in R are extensive and allow you to adjust any aspect of your graph to convey most effectively the message from your data.
R is free!: Anyone can use it!

Example of R used in the media

“At the BBC data team, we have developed an R package and an R cookbook to make the process of creating publication-ready graphics in our in-house style…” - BBC Visual and Data Journalism cookbook for R graphics

What is RStudio?

RStudio is an Integrated Development Environment (IDE) for working with R. First-time users often confuse the two. At its simplest, R is like a car’s engine while RStudio is like a car’s dashboard, as illustrated in the Figure below.

More precisely, R is a programming language that runs computations, while RStudio provides an interface by adding many convenient features and tools.

RStudio Interface

The RStudio interface has four main panels:

Console: where you can type commands and see output. The console is all you would see if you ran R in the command line without RStudio.
Script editor: where you can type out commands and save to file. You can also submit the commands to run in the console.
Environment/History: environment shows all active objects and history keeps track of all commands run in console
Files/Plots/Packages/Help: a handy browser for your current files. This is also where your plots will appear, where you can view package information, and much more.

The placement of these panes and their content can be customized (see menu, Tools -> Global Options -> Pane Layout).

Getting Set Up

It is good practice to keep a set of related data, analyses, and text self-contained in a single folder, called the working directory. All of the scripts within this folder can then use relative paths to files that indicate where inside the project a file is located (as opposed to absolute paths, which point to where a file is on a specific computer). Working this way makes it a lot easier to move your project around on your computer and share it with others without worrying about whether or not the underlying scripts will still work.

RStudio provides a helpful set of tools to do this through its “Projects” interface, which not only creates a working directory for you, but also remembers its location (allowing you to quickly navigate to it) and optionally preserves custom settings and open files to make it easier to resume work after a break. Go through the steps for creating an “R Project” for this tutorial below.

Let’s create a new project directory for our “Introduction to R” lesson today.

Open RStudio
Go to the File menu and select New Project.
In the New Project window, choose New Directory. Then, choose New Project. Name your new directory Intro-to-R and then “Create the project as subdirectory of:” the root of your VACC home account (~).
Click on Create Project.
When RStudio opens, you will see three panels in the window. We would like to generate the fourth panel.
Go to the File menu and select New File, and select R Script.
Go to the File menu and select Save As..., type Intro-to-R.R and select Save

The RStudio interface should now look like the screenshot below:

Organizing your working directory

Using a consistent folder structure across your projects will help keep things organised, and will also make it easy to find/file things in the future. This can be especially helpful when you have multiple projects. In general, you may create directories (folders) for scripts, data, and documents.

data/ Use this folder to store your raw data and intermediate datasets you may create for the need of a particular analysis. For the sake of transparency and provenance, you should always keep a copy of your raw data accessible and do as much of your data cleanup and preprocessing programmatically (i.e., with scripts, rather than manually) as possible. Separating raw data from processed data is also a good idea.
documents/ This would be a place to keep outlines, drafts, and other text.
scripts/ (or src) This would be the location to keep your R scripts for different analyses or plotting, and potentially a separate folder for your functions (more on that later).

You may want additional directories or subdirectories depending on your project needs, but these should form the backbone of your working directory.

What happens when you create an R Project?

When creating a project in RStudio you associate it with a working directory of your choice (either an existing one, or a new one). A .Rproj file is created within that directory, and the project keeps track of your command history and variables in the environment. The .Rproj file can be used to reopen the project in its current state at a later date.

When a project is (re)opened within RStudio the following actions are taken:

A new R session (process) is started
The .RData file in the project’s main directory is loaded, populating the environment with any objects that were present when the project was closed.
The .Rhistory file in the project’s main directory is loaded into the RStudio History pane (and used for Console Up/Down arrow command history).
The current working directory is set to the project directory.
Previously edited source documents are restored into editor tabs
Other RStudio settings (e.g. active tabs, splitter positions, etc.) are restored to where they were the last time the project was closed.

Information adapted from RStudio Support Site

The working directory

The working directory is an important concept to understand. It is the place where R looks for files and saves them. When you write code for your project, it should refer to files in relation to the root of your working directory and only need files within this structure.

Using RStudio projects makes this easy and ensures that your working directory is set properly. If you need to check it, you can use getwd(). If for some reason your working directory is not what it should be, you can change it in the RStudio interface by navigating in the file browser to where your working directory should be, clicking on the blue gear icon More, and selecting Set As Working Directory. Alternatively, you can use setwd("/path/to/working/directory") to reset your working directory. However, your scripts should not include this line because it will fail on someone else’s computer.

Let’s check to see where our current working directory is located by typing into the console:

getwd() # returns an absolute file path
# this is also our first example of a function

Your working directory should be the Intro-to-R folder constructed when you created the project. The working directory is where RStudio will automatically look for any files you bring in and where it will automatically save any files you create, unless otherwise specified.
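To make the idea of relative paths concrete, here is a minimal sketch of how a relative path is resolved against the working directory. The file name data/example.csv is hypothetical and used only for illustration; the code builds the paths but does not read or create any file.

```r
# Where is R currently looking for files?
wd <- getwd()

# A relative path (hypothetical file name, used only for illustration)
relative_path <- file.path("data", "example.csv")

# The absolute location that this relative path refers to right now
absolute_path <- file.path(wd, relative_path)

print(relative_path)
print(absolute_path)
```

The relative path stays the same on every computer, while the absolute path changes with the location of the project folder. That is why scripts are easier to share when they use relative paths.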

You can visualize your working directory by selecting the Files tab from the Files/Plots/Packages/Help window.

If you wanted to choose a different directory to be your working directory, you could navigate to a different folder in the Files tab, then, click on the More dropdown menu which appears as a Cog and select Set As Working Directory.

Structuring your working directory

To organize your working directory for a particular analysis, you should separate the original data (raw data) from intermediate datasets. For instance, you may want to create a data/ directory within your working directory that stores the raw data, and have a results/ directory for intermediate datasets and a figures/ directory for the plots you will generate.

Let’s create these three directories within your working directory by clicking on New Folder within the Files tab.
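The same three folders could also be created from code rather than by clicking. The sketch below is one possible way, using base R's dir.create(). The project_root variable is a stand-in that points to a temporary location so the example does not touch your real files; inside your own project you could use "." instead.

```r
# Stand-in for the project folder (an assumption for this sketch);
# inside your own project you could use "." instead.
project_root <- file.path(tempdir(), "Intro-to-R")

folders <- c("data", "results", "figures")

for (folder in folders) {
  # recursive = TRUE also creates project_root itself if it is missing
  dir.create(file.path(project_root, folder),
             recursive = TRUE, showWarnings = FALSE)
}

# List what was created
list.files(project_root)
```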

When finished, your working directory should look like:

Soft Wrap Set up

This is more of a housekeeping task. In the future, we may be writing long lines of code in our script editor and want to make sure that the lines “wrap” and you don’t have to scroll back and forth to look at your long line of code.

Click on Code -> Soft Wrap Long Lines (make sure this option is checked)

Interacting with R

Now that we have our interface and directory structure set up, let’s start playing with R! There are two main ways of interacting with R in RStudio: using the console or using the script editor (plain text files that contain your code).

Console window

The console window (in RStudio, the bottom left panel) is the place where R is waiting for you to tell it what to do, and where it will show the results of a command. You can type commands directly into the console, but they will be forgotten when you close the session.

Let’s test it out:

3 + 5

Script editor

Best practice is to enter the commands in the script editor, and save the script. You are encouraged to comment liberally to describe the commands you are running using #. This way, you have a complete record of what you did, you can easily show others how you did it and you can do it again later on if needed.

The RStudio script editor allows you to ‘send’ the current line or the currently highlighted text to the R console by clicking on the Run button in the upper-right hand corner of the script editor.

Now let’s try entering commands to the script editor and using the comments character # to add descriptions and highlighting the text to run:

# Intro to R Lesson
# March 19th, 2025

# Interacting with R

## I am adding 3 and 5. R is fun!
3+5

Alternatively, you can run the code by pressing the Ctrl and Return/Enter keys at the same time as a shortcut.

You should see the command run in the console and output the result.

What happens if we do that same command without the comment symbol #? Re-run the command after removing the # sign in the front:

I am adding 3 and 5. R is fun!
3+5

Now R is trying to run that sentence as a command, and it doesn’t work. The console shows the error Error: unexpected symbol in "I am", which means that the R interpreter did not know what to do with that command.
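For the curious, the difference between the commented and uncommented versions can also be checked from code. This is only a sketch and goes beyond what this lesson requires: it uses base R's parse() and tryCatch() to capture the error message instead of stopping, and the variable names are chosen for illustration.

```r
commented   <- "# I am adding 3 and 5. R is fun!\n3 + 5"
uncommented <- "I am adding 3 and 5. R is fun!\n3 + 5"

# With the comment symbol, R skips the sentence and evaluates 3 + 5
result <- eval(parse(text = commented))
print(result)

# Without it, R cannot parse the sentence; capture the error message
# rather than letting the script stop
message_text <- tryCatch(
  {
    parse(text = uncommented)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(message_text)
```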

Console command prompt

Interpreting the command prompt can help you understand when R is ready to accept commands. The list below describes the different states of the command prompt and how you can exit a command:

Console is ready to accept commands: >.

If R is ready to accept commands, the R console shows a > prompt.

When the console receives a command (typed directly into the console or run from the script editor with Ctrl-Enter), R will try to execute it.

After running, the console will show the results and come back with a new > prompt to wait for new commands.

Console is waiting for you to enter more data: +.

If R is still waiting for you to enter more data because it isn’t complete yet, the console will show a + prompt. It means that you haven’t finished entering a complete command. Often this can be due to you having not ‘closed’ a parenthesis or quotation.
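As a small illustration, the command below is deliberately split over two lines. Typed into the console line by line, the first line ends with a dangling +, so R would show the + prompt until the second line completes the command. The variable name total is chosen only for this example.

```r
# The first line is incomplete, so R keeps reading;
# in the console the second line would be typed at the + prompt.
total <- 3 +
  5

total  # prints 8
```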

Escaping a command and getting a new prompt: esc

If you’re in RStudio and you can’t figure out why your command isn’t running, you can click inside the console window and press esc to escape the command and bring back a new prompt >.

Keyboard shortcuts in RStudio

In addition to some of the shortcuts described earlier in this lesson, we have listed a few more that can be helpful as you work in RStudio.

key	action
`Ctrl`+`Enter`	Run command from script editor in console with Windows or Linux
`Command`+`Enter`	Run command from script editor in console with MacOS
`ESC`	Escape the current command to return to the command prompt
`Ctrl`+`1`	Move cursor from console to script editor
`Ctrl`+`2`	Move cursor from script editor to console
`Tab`	Use this key to complete a file path
`Ctrl`+`Shift`+`C`	Comment the block of highlighted text

Exercise

Try highlighting only 3 + from your script editor and running it. Find a way to bring back the command prompt > in the console.

The R syntax

Now that we know how to talk with R via the script editor or the console, we want to use R for something more than adding numbers. To do this, we need to know more about the R syntax.

The main “parts of speech” in R (syntax) include:

the comments # and how they are used to document functions and their content
variables and functions
the assignment operator <-

NOTE: indentation and consistency in spacing are used to improve clarity and legibility

We will go through each of these “parts of speech” in more detail, starting with the assignment operator.

Assignment operator

To do useful and interesting things in R, we need to assign values to variables using the assignment operator, <-. For example, we can use the assignment operator to assign the value of 3 to x by executing:

x <- 3

The assignment operator (<-) assigns values on the right to variables on the left.

In RStudio, typing Alt + - (push Alt at the same time as the - key; on Mac, press option and the - key) writes ` <- ` in a single keystroke.

Variables

A variable is a symbolic name for (or reference to) information. Variables in computer programming are analogous to “buckets”, where information can be maintained and referenced. On the outside of the bucket is a name. When referring to the bucket, we use the name of the bucket, not the data stored in the bucket.

In the example above, we created a variable or a ‘bucket’ called x. Inside we put a value, 3.

Let’s create another variable called y and give it a value of 5.

y <- 5

When assigning a value to a variable, R does not print anything to the console. You can force R to print the value by wrapping the assignment in parentheses or by typing the variable name.
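The options for printing a value can be compared side by side in a short sketch. The print() call is an addition to what the text describes; it is a base R function that prints a value explicitly.

```r
y <- 5      # assigns silently; nothing is printed
(y <- 5)    # parentheses around the assignment also print the value
y           # typing the variable name prints the value
print(y)    # print() does the same thing explicitly
```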

You can also view information on the variable by looking in your Environment window in the upper right-hand corner of the RStudio interface.

Now we can reference these buckets by name to perform mathematical operations on the values contained within. What do you get in the console for the following operation:

x + y

Try assigning the results of this operation to another variable called number.

number <- x + y

Exercises

Try changing the value of the variable x to 5. What happens to number?

Now try changing the value of variable y to contain the value 10. What do you need to do, to update the variable number?

Tips on variable names

Variables can be given almost any name, such as x, current_temperature, or subject_id. However, there are some rules / suggestions you should keep in mind:

Make your names explicit and not too long.
Avoid names starting with a number (2x is not valid but x2 is)
Avoid names of fundamental functions and reserved words in R (e.g., if, else, for; run ?Reserved in the console for the complete list of reserved words).
Avoid dots (.) within a variable name as in my.dataset. There are many functions in R with dots in their names for historical reasons, but because dots have a special meaning in R (for methods) and other programming languages, it’s best to avoid them.
Use nouns for object names and verbs for function names
Keep in mind that R is case sensitive (e.g., genome_length is different from Genome_length)
Be consistent with the styling of your code (where you put spaces, how you name variables, etc.). In R, two popular style guides are Hadley Wickham’s style guide and Google’s.
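A few of these rules can be tried out directly. In the sketch below the numeric values are hypothetical and chosen only for illustration. make.names() is a base R function that shows how R would repair an invalid name.

```r
# Hypothetical values, chosen only for illustration
genome_length <- 4.6
Genome_length <- 5.2

# R is case sensitive, so these are two separate variables
genome_length == Genome_length

# A name may contain digits, but may not start with one
x2 <- 10

# make.names() shows how R would repair an invalid name such as "2x"
repaired <- make.names("2x")
repaired
```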

Best practices

Before we move on to more complex concepts and getting familiar with the language, we want to point out a few things about best practices when working with R which will help you stay organized in the long run:

Code and workflow are more reproducible if you document everything that you do. Your end goal is not just to “do stuff”, but to do it in a way that anyone can easily and exactly replicate your workflow and results. All code should be written in the script editor and saved to file, rather than working in the console.
The R console should be mainly used to inspect objects, test a function or get help.
Use # signs to comment. Comment liberally in your R scripts. This will help future you and other collaborators know what each line of code (or code block) was meant to do. Anything to the right of a # is ignored by R.
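As a short illustration of the last point, the sketch below shows that text after a # is ignored even when it looks like code. The variable names are chosen only for this example.

```r
# A whole-line comment: R skips this line entirely
total <- 3 + 5        # the text after the # on this line is skipped too
still_eight <- 3 + 5  # + 100 is part of the comment, so it is never added

total
still_eight
```

Both variables hold 8, because the "+ 100" sits to the right of the # and never reaches R.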

Key Points

R is a programming language

RStudio is a useful tool for script writing and data-management.

A variable can temporarily store data.
