SACNAS_RNAseq_Workshop_2023
| Audience | Computational skills required | Duration |
|---|---|---|
| SACNAS Attendees | None | 75 minute workshop |
Description
This repository contains learning materials for a 75-minute, hands-on Introduction to RNA-Seq analysis with R/RStudio workshop. R is a simple programming environment that enables the effective handling of data while providing excellent graphical support. RStudio is a tool that provides a user-friendly environment for working with R.
These materials are intended to provide a general overview of RNA-Seq data analysis, starting from processed counts files.
Learning Objectives
- Best practices: Understand best practices for designing an RNA-seq experiment
- Processing steps: Understand the processing steps from a FASTQ file to a counts file
- R syntax: Understand general R syntax including variables, functions, and arguments
- Using GO terms to explore enriched processes: Determine pathways for differentially expressed genes (DEGs) using Gene Ontology terms
- Exporting data: Generate tabular outputs of DEGs to be further investigated and visualized
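As a small preview of the R syntax objective, the sketch below shows a variable, a function call, a named argument, and a tabular export using base R only. The gene names, counts, and output file name are made up for illustration and are not part of the workshop dataset.

```r
# Variables: store a value under a name with the assignment operator <-
# (made-up counts for three hypothetical genes)
counts <- c(geneA = 120, geneB = 0, geneC = 45)

# Functions: take inputs (arguments) and return a result
total <- sum(counts)

# Arguments: named arguments change how a function behaves
# (base = 2 asks log() for a log2 transformation; +1 avoids log(0))
log_counts <- log(counts + 1, base = 2)

# Exporting: build a small table and write it to a CSV file
results <- data.frame(gene = names(counts), count = unname(counts))
out_file <- file.path(tempdir(), "example_counts.csv")  # hypothetical file name
write.csv(results, file = out_file, row.names = FALSE)
```

The same pattern of assigning a result, calling a function, and writing a table recurs throughout the hands-on portion, only with larger tables.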
Contents
| Time | Topic |
|---|---|
| ~10 mins | Module 1: RNAseq experimental setup and considerations |
| ~10 mins | Module 2: Post sequencing processing steps |
| ~40 mins | Module 3: Hands-on portion of workshop |
| ~15 mins | Questions from attendees |
Workshop Slides
The slides presented in Modules 1 and 2 can be found here
Dataset
Download the R project and data for this workshop here. Decompress and move the folder to the location on your computer where you would like to perform the analysis.
Installation Requirements
Download R and RStudio for your laptop:
Install the required R packages by running the following code in RStudio:
# Install CRAN packages
install.packages(c("BiocManager", "RColorBrewer", "tidyverse", "devtools", "pheatmap"))
# Install Bioconductor packages
BiocManager::install(c("clusterProfiler", "DESeq2", "org.Hs.eg.db", "EnhancedVolcano", "biomaRt", "enrichplot"))
Load the libraries to make sure the packages installed properly:
library(DESeq2)
library(RColorBrewer)
library(pheatmap)
library(ggplot2)
library(EnhancedVolcano)
library(biomaRt)
library(clusterProfiler)
library(org.Hs.eg.db)
library(enrichplot)
library(tidyverse)
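If any `library()` call fails, it can help to list every missing package at once rather than fixing them one error at a time. The following is a minimal sketch using base R's `requireNamespace()`; the variable names `required`, `installed`, and `missing_pkgs` are illustrative and not part of the workshop materials.

```r
# Packages loaded in the block above
required <- c("DESeq2", "RColorBrewer", "pheatmap", "ggplot2",
              "EnhancedVolcano", "biomaRt", "clusterProfiler",
              "org.Hs.eg.db", "enrichplot", "tidyverse")

# requireNamespace() returns FALSE (rather than stopping with an error)
# when a package is not installed
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Report which packages, if any, still need to be installed
missing_pkgs <- required[!installed]
if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be installed again with the matching `install.packages()` or `BiocManager::install()` command from the installation step.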
NOTE: The library used for the annotations associated with genes (here we are using org.Hs.eg.db) will change based on organism (e.g. if studying mouse, you would need to install and load org.Mm.eg.db). The list of different organism packages is given here.
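As a rough illustration of swapping the annotation package, the sketch below picks a package name from a small lookup table. The table and the `organism` variable are assumptions made for this example and cover only a few common organisms; the package names themselves are Bioconductor annotation packages.

```r
# Illustrative lookup of a few organism annotation packages (not exhaustive)
org_packages <- c(
  human     = "org.Hs.eg.db",
  mouse     = "org.Mm.eg.db",
  rat       = "org.Rn.eg.db",
  zebrafish = "org.Dr.eg.db",
  fly       = "org.Dm.eg.db"
)

# Assumption for this example: the samples come from mouse
organism <- "mouse"
org_pkg <- org_packages[[organism]]

# Show the commands to run for the chosen organism
message("Install with: BiocManager::install(\"", org_pkg, "\")")
message("Load with:    library(", org_pkg, ")")
```

The rest of the analysis would then refer to the chosen package wherever the workshop code uses `org.Hs.eg.db`.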
Additional Resources
For an overview of bioinformatics, the tools required for RNA-seq analysis, and high performance computing, see these tutorials (the HPC parts will vary depending on your local cluster):
Bioinformatics Training
High Performance Computing
RNA-seq analysis
Informatics Technology for Cancer Research Training Network Courses
R for Data Science
Need help with Unix?
Unix Cheat Sheet
Vim - command line text editor
Common commands
Need help with R/RStudio?
Multiple RNAseq comparisons/ DESeq2:
Differential Expression Analysis
Overview
Non-model organisms:
Full-length transcriptome assembly from RNA-seq data without a reference genome
FASTQC and multiQC
Introduction to Nextflow and workflow management:
Nextflow video
Nextflow Documentation
Here are some resources for publicly available gene expression data:
- for published datasets available through NCBI: Gene Expression Omnibus (GEO)
- for tissue specific expression and a GUI to interact with the data: GTEx Portal
- for cancer specific datasets (with lots of clinical/phenotypic data): The Cancer Genome Atlas (TCGA) and the GUI for TCGA data, Xena (you can also upload your own data!)
- UCSC genome browser allows you to explore gene expression along the genome: UCSC genome browser