Analyzing RNA-Seq with DESeq2

Getting Started

Code chunk options

The knitr package provides a lot of options for customizing nearly all components of code chunk, including the text outputs, source codes, and plots. Under the text outputs, you can find echo. When echo = TRUE this means to display the source code in the output documents. When warning = FALSE this means that all warnings will be suppressed.

Install the required R libraries

# Install CRAN packages
#install.packages(c("knitr", "RColorBrewer", "pheatmap"))

library(knitr)
library(DESeq2) 
library(RColorBrewer)
library(pheatmap)
library(ggplot2)
library(tidyverse)
library(patchwork)
library(ggplotify)
library(ggpubr)

Libraries loaded

Loading the packages required for subsequent analyses. In the future, we will be using pacman::p_load. Please look to have these packages installed for use by next week.

# loading required packages 
#pacman::p_load(DESeq2, dplyr, tidyr, ggplot2, genekitr, pheatmap, 
               #RColorBrewer, EnhancedVolcano, clusterProfiler, DOSE, 
               #ggnewscale, enrichplot, knitr, cowplot, msigdbr, pathview, 
               #patchwork, ggplotify)

Introduction to DESeq2 with HTSeq-counts files

As a reminder, we have previously assigned all the sequencing reads to a given gene by directly counting reads overlapping with gene loci using HTSeq-count. Now, the next step is to analyze the counts data for the detection of differentially expressed genes (DEGs). The package DESeq2 provides methods to test for differential expression by use of negative binomial generalized linear models.

Overview

To use DESeq2 it requires the following:

counts matrix
table of sample information
design indicates how to model the samples

Why un-normalized counts?

As input, the DESeq2 package expects count data to be in the form of a matrix of integer values. The values in the matrix should be un-normalized counts. The DESeq2 model internally corrects for library size, so transformed or normalized values should not be used as input. To normalize for sequencing depth and RNA composition, DESeq2 uses the median of ratios method. On the user-end there is only one step, but on the back-end there are multiple steps involved, including:

Step 1: creating a pseudo-reference sample (row-wise geometric mean)

Step 2: calculates ratio of each sample to the reference

For every gene in a sample, the ratios (sample/ref) are calculated (as shown below). This is performed for each sample in the dataset. Since the majority of genes are not differentially expressed, the majority of genes in each sample should have similar ratios within the sample.

Step 3: calculate the normalization factor for each sample (size factor)

The median value (column-wise for the above table) of all ratios for a given sample is taken as the normalization factor (size factor) for that sample, as calculated below.

Step 4: calculate the normalized count values using the normalization factor

The DESeqDataSet

The object used by the DESeq2 package to store the (1) read counts, (2) table of sample information , and (3) design is the DESeqDataSet, this is usually represented as dds.

dds = 1 + 2 + 3

dds = readcounts, table of sample, design

There are multiple ways of constructing a DESeqDataSet, depending on what pipeline was used upstream of DESeq2 to generated counts or estimated counts. For example, if you used Salmon to generate read-counts, you would use DESeqDataSetFromTximport()

From transcript abundance files and tximport - DESeqDataSetFromTximport()
From a count matrix - DESeqDataSetFromMatrix
From htseq-count files - DESeqDataSetFromHTSeqCount

We will be discussing how to input files output from htseq-count program.

About the Data

Our end product was the creation of counts files.So what does the count data actually represent?

example-counts-matrix

The count data used for differential expression analysis represents the number of sequence reads that originated from a particular gene. The higher the number of counts, the more reads associated with that gene, and the assumption that there was a higher level of expression of that gene in the sample. Our overall goal is to identify a list of genes that are statistically distinct between the groups being compared.

For today’s lesson we will be working with RNA-seq data from a recent publication in Nature Communications by Shan et al. (2021).

T cell identity is established during thymic development, but how it is maintained in the periphery remains unknown.

T-cell-development

The authors in this paper discover that by ablating Tcf1 and Lef1 transcription factors in mature CD8+ T cells this induces the expression of genes from non-T cell lineages. We will focus on analyzing the Tcf7 dataset. Specifically, Tcf7 fl/fl mice were crossed with hCD2-Cre mice to create mature CD8+ T cells that lacked Tcf7. The protein coded by Tcf7 is TCF1.

htseq-count input

Let’s take a closer look at the contents of the ENSG_counts folder:

img folder contains images
counts folder contains outputs from HTSeq2 in the form of a .count file
The metadata table called GSE164713_tcf1_anno.csv

To input our metadata file:

sampleTable <- read.csv("GSE164713_tcf1_anno.csv", header = T)
head(sampleTable)

##         samplename               filename   condition replicate
## 1      WT_CD8_rep1 SRR13423162.ENSG.count      WT_CD8         1
## 2      WT_CD8_rep2 SRR13423163.ENSG.count      WT_CD8         2
## 3      WT_CD8_rep3 SRR13423164.ENSG.count      WT_CD8         3
## 4 TCF1_KO_CD8_rep1 SRR13423165.ENSG.count TCF1_KO_CD8         1
## 5 TCF1_KO_CD8_rep2 SRR13423166.ENSG.count TCF1_KO_CD8         2
## 6 TCF1_KO_CD8_rep3 SRR13423167.ENSG.count TCF1_KO_CD8         3

The order here actually matters. Let’s check why by turning to the DESeqDataSetFromHTSeqCount manual:

#?DESeqDataSetFromHTSeqCount

#Convert to data frame 
sampleTable <- data.frame(sampleTable)
head(sampleTable)

##         samplename               filename   condition replicate
## 1      WT_CD8_rep1 SRR13423162.ENSG.count      WT_CD8         1
## 2      WT_CD8_rep2 SRR13423163.ENSG.count      WT_CD8         2
## 3      WT_CD8_rep3 SRR13423164.ENSG.count      WT_CD8         3
## 4 TCF1_KO_CD8_rep1 SRR13423165.ENSG.count TCF1_KO_CD8         1
## 5 TCF1_KO_CD8_rep2 SRR13423166.ENSG.count TCF1_KO_CD8         2
## 6 TCF1_KO_CD8_rep3 SRR13423167.ENSG.count TCF1_KO_CD8         3

str(sampleTable)

## 'data.frame':    6 obs. of  4 variables:
##  $ samplename: chr  "WT_CD8_rep1" "WT_CD8_rep2" "WT_CD8_rep3" "TCF1_KO_CD8_rep1" ...
##  $ filename  : chr  "SRR13423162.ENSG.count" "SRR13423163.ENSG.count" "SRR13423164.ENSG.count" "SRR13423165.ENSG.count" ...
##  $ condition : chr  "WT_CD8" "WT_CD8" "WT_CD8" "TCF1_KO_CD8" ...
##  $ replicate : int  1 2 3 1 2 3

Next, we want to specify the condition column as a factor. By doing so, this will allow us to categorize the data and store it as a level.

#specify as factor 
condition <- factor(sampleTable$condition)

Running DESeq2 requires DESeqDataSet

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                  directory = "counts/",
                                  design = ~ condition)

dds

## class: DESeqDataSet 
## dim: 57010 6 
## metadata(1): version
## assays(1): counts
## rownames(57010): ENSMUSG00000000001.5 ENSMUSG00000000003.16 ...
##   ENSMUSG00002076991.1 ENSMUSG00002076992.1
## rowData names(0):
## colnames(6): WT_CD8_rep1 WT_CD8_rep2 ... TCF1_KO_CD8_rep2
##   TCF1_KO_CD8_rep3
## colData names(2): condition replicate

The DESeqDataSet object must have an associated design formula. The design formula expresses the variables which will be used in modeling. The formula should include a tilde (~) followed by the variables with plus signs between them if required.

Note: In order to benefit from the default settings of the package, one should put the variable of interest at the end of the formula and make sure the control level is the first level.

In other words, since we want to measure the effect of the condition between the two treatments this should go at the end and if we wanted to control for potential batch differences due to how these replicates were made we would add this as the first variable(~batch+condition).

Pre-filtering

While it is not necessary to pre-filter low count genes before running the DESeq2 functions, there are two reasons which make pre-filtering useful: by removing rows in which there are very few reads, we reduce the memory size of the dds data object, and we increase the speed of the transformation and testing functions within DESeq2. It can also improve visualizations, as features with no information for differential expression are not plotted.

Below we perform a minimal pre-filtering to keep only rows that have at least 10 reads total.

keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

Note on factor levels

By default, R will choose a reference level for factors based on alphabetical order. Therefore, if you never tell the DESeq2 functions which level you want to compare against (e.g. which level represents the control group), the comparisons will be based on the alphabetical order of the levels.

use relevel to specify the reference level:

dds$condition <- relevel(dds$condition, ref = "WT_CD8")

Library size differences

With DESeq2, size factors are calculated using the estimateSizeFactors() function. The size factors estimated by this function combines an adjustment for differences in library sizes with an adjustment for differences in the RNA composition of the samples. The latter is important due to the compositional nature of RNA-seq data. There is a fixed number of reads to distribute between the genes, and if a single (or a few) very highly expressed gene consume a large part of the reads, all other genes will consequently receive very low counts.

dds <- estimateSizeFactors(dds)

# Plot the size factors against library size
# and look for any patterns by group:

ggplot(data.frame(libSize = colSums(assay(dds)),
                  sizeFactor = sizeFactors(dds),
                  Group = dds$condition),
       aes(x = libSize, y = sizeFactor, col = condition)) + 
  geom_point(size = 5) + theme_bw() + 
  labs(x = "Library size", y = "Size factor")

Running Differential expression analysis

The standard differential expression analysis steps are wrapped into a single function, DESeq(). The estimation steps performed by this function are described in the manual page for ?DESeq.

dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds) : Dispersion estimates in DESeq2 quantify the biological and technical variability in RNA-seq count data.
dds <- nbinomWaldTest(dds) : fit a generalized linear model (GLM) and compute log2 fold changes corresponding to the variables of the design matrix.

dds <- DESeq(dds)

Explore results for specific contrasts

Using the results() function, the user can specify the comparison to build a results table for with the contrast argument.

Details about the comparison are printed to the console, directly above the results table. The text, condition KO vs WT, tells you that the estimates are of the logarithmic fold change log2(KO/WT) i.e. log2(treated/untreated).

res <- results(dds, contrast=c('condition', 'TCF1_KO_CD8', 'WT_CD8')) 
head(res)

## log2 fold change (MLE): condition TCF1_KO_CD8 vs WT_CD8 
## Wald test p-value: condition TCF1 KO CD8 vs WT CD8 
## DataFrame with 6 rows and 6 columns
##                          baseMean log2FoldChange     lfcSE      stat     pvalue
##                         <numeric>      <numeric> <numeric> <numeric>  <numeric>
## ENSMUSG00000000001.5   3904.95911      0.0746508 0.0569362   1.31113 0.18981359
## ENSMUSG00000000028.16    66.29971      0.3183793 0.2498710   1.27417 0.20260163
## ENSMUSG00000000037.18     4.93848     -0.2251786 0.9064074  -0.24843 0.80380188
## ENSMUSG00000000056.8    833.77631     -0.2260724 0.0819141  -2.75987 0.00578239
## ENSMUSG00000000078.8  25622.39656      0.3500977 0.1706392   2.05168 0.04020041
## ENSMUSG00000000085.17   377.59854     -0.1435168 0.1126148  -1.27440 0.20252037
##                            padj
##                       <numeric>
## ENSMUSG00000000001.5  0.3837944
## ENSMUSG00000000028.16 0.4010438
## ENSMUSG00000000037.18 0.9056941
## ENSMUSG00000000056.8  0.0241046
## ENSMUSG00000000078.8  0.1190318
## ENSMUSG00000000085.17 0.4010438

dim(res)

## [1] 19401     6

_baseMean - Mean of the normalized counts for all samples in the comparison, for a given gene
_log2FoldChange - log2 fold change between Tcf1 KO and WT
_pvalue - Wald test P value
_padj - Benjamini-Hochberg adjusted Wald test P value (P-value after multiple test correction)

To explore the output of the results() function we can also use the summary() function.

summary(res)

## 
## out of 19401 with nonzero total read count
## adjusted p-value < 0.1
## LFC > 0 (up)       : 2713, 14%
## LFC < 0 (down)     : 2716, 14%
## outliers [1]       : 6, 0.031%
## low counts [2]     : 2633, 14%
## (mean count < 4)
## [1] see 'cooksCutoff' argument of ?results
## [2] see 'independentFiltering' argument of ?results

Independent Filtering and log-fold shrinkage

A good check is to explore the relationship between log2fold changes, significant DE genes and the genes mean count. DESeq2 provides a useful function to do so, plotMA().

plotMA(res)

We can see that genes with a low mean count tend to have larger log fold changes. This is caused by counts from lowly expressed genes tending to be very noisy. We can shrink the log fold changes of these genes with low mean and high dispersion, as they contain little information.

resLfc <- lfcShrink(dds, coef = "condition_TCF1_KO_CD8_vs_WT_CD8", res = res)
#resultsNames(dds)

plotMA(resLfc)

result <- resLfc[order(resLfc$padj), ]

table(result$padj<0.05)

## 
## FALSE  TRUE 
## 12110  4652

Output Results

We will now output our results out of R to have a stand-alone file. First, we will merge the stats (contained within result object) with the normalized counts (contained within dds object). The final object created is called resdata and now since it is a dataframe we can create output a tabular file with write.csv.

resdata <- merge(as.data.frame(result), 
                 as.data.frame(counts(dds, normalized=TRUE)), 
                 by="row.names", 
                 sort=FALSE)

names(resdata)[1] <- "Gene_id"

head(resdata)

##                 Gene_id baseMean log2FoldChange      lfcSE pvalue padj
## 1 ENSMUSG00000015143.16 6770.207      -2.944392 0.05837777      0    0
## 2  ENSMUSG00000015437.6 1365.097       6.205747 0.15795883      0    0
## 3 ENSMUSG00000021756.13 5340.423      -1.929679 0.04968811      0    0
## 4  ENSMUSG00000025163.7 4445.134       2.450195 0.06249997      0    0
## 5  ENSMUSG00000035042.3 2816.472       6.873397 0.14335657      0    0
## 6 ENSMUSG00000039521.14 1506.650       6.551260 0.16941081      0    0
##   WT_CD8_rep1 WT_CD8_rep2 WT_CD8_rep3 TCF1_KO_CD8_rep1 TCF1_KO_CD8_rep2
## 1 12433.34924 11759.60918 11762.92037         1518.717         1477.525
## 2    30.97347    42.89393    34.62547         2565.046         2960.111
## 3  8406.79850  8492.04492  8490.81374         2237.728         2187.952
## 4  1421.78206  1263.94113  1434.79279         7804.848         7529.308
## 5    48.95806    40.03433    54.10229         5114.678         5337.308
## 6    32.97176    32.40875    29.21524         2819.828         3377.057
##   TCF1_KO_CD8_rep3
## 1         1669.121
## 2         2556.930
## 3         2227.202
## 4         7216.133
## 5         6303.749
## 6         2748.418

#write results
write.csv(resdata, 
          file="output/TCF1KOvsWT_unfiltered_matrix.csv", 
          row.names = FALSE)

The next code chunk makes use of the tidyverse package. We are using this package as a way of getting a list of differentially expressed genes (DEGs) only. We are filtering the 19,000+ rows to only rows which meet the criteria of having an absolute log2FC of 1 and padj of less than 0.05.

In the code below, output the DE results.

Label the file TCF1KOvsWT_log2fc1_fdr05_DEGlist.csv.

#Get DEG up/down lists
resdata <- as.data.frame(result) |>  
  dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05) |> 
  tibble::rownames_to_column("Gene_id")

#write results
write.csv(resdata, 
          "output/TCF1KOvsWT_log2fc1_fdr05_DEGlist.csv", 
          row.names = FALSE)

head(resdata)

##                 Gene_id baseMean log2FoldChange      lfcSE pvalue padj
## 1 ENSMUSG00000015143.16 6770.207      -2.944392 0.05837777      0    0
## 2  ENSMUSG00000015437.6 1365.097       6.205747 0.15795883      0    0
## 3 ENSMUSG00000021756.13 5340.423      -1.929679 0.04968811      0    0
## 4  ENSMUSG00000025163.7 4445.134       2.450195 0.06249997      0    0
## 5  ENSMUSG00000035042.3 2816.472       6.873397 0.14335657      0    0
## 6 ENSMUSG00000039521.14 1506.650       6.551260 0.16941081      0    0

Data transformation and visualization

Principal components analysis (PCA)

In order to test for differential analysis, we used raw counts. These matrices are so large that we need convenient ways to extract the important information from these large data matrices therefore it is useful to transform the data.

In using principal component analysis (PCA):

The data is reduced to smaller matrices so they can more easily be examined, plotted and interpreted.
The most important factors are extracted (principal components).

Below we are using an variance stabilizing transformation (vst) (Tibshirani 1988; Huber et al. 2003; Anders and Huber 2010) or regularized logarithm (rlog) (Love, Huber, and Anders 2014). Both transformations will produce transformed data on the log2 scale which has been normalized with respect to library size or other normalization factors.

Using the transformed data we will generate a few outputs.

PCA plot: The PCA plot will show the samples in a 2D plane spanned by the first two principal components. This type of plot is useful for visualizing the overall effect of experimental covariates (ex. age) and batch effects (ex. day sample was prepared).

vstcounts <- vst(dds, blind = FALSE)

It is also possible to customize the PCA plot using the ggplot function. Below we are adding shapes to represent the different days the samples were created.

pcaData <- plotPCA(vstcounts, intgroup=c("condition"), returnData=TRUE)

percentVar <- round(100 * attr(pcaData, "percentVar"))

ggplot(pcaData, aes(PC1, PC2, color=condition)) +
  geom_point(size=2) +
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed()

Now, we are making sure the x and y axis are similar using ylim argument

#pdf(file = "output/PCA_plot_TCF1_RNAseq.pdf", height = 6, width = 6)

pcaData <- plotPCA(vstcounts, intgroup=c("condition"), returnData=TRUE)

percentVar <- round(100 * attr(pcaData, "percentVar"))

pca <- ggplot(pcaData, aes(PC1, PC2, color=condition, shape = condition)) +
  geom_point(size=4, alpha=0.8) + # adding transparency
  ylim(-10,10) + 
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed() + 
  ggtitle("PCA of TCF1 RNASeq data") + 
  scale_color_brewer(palette = "Set1") + 
  theme_classic(base_size = 10) + 
  theme(legend.position = "right", legend.title = element_blank(),
        plot.title = element_text(hjust=0.5, face = "bold"))

#dev.off()

#saveRDS(pca, file = "pca.RDS")

Now that we got it looking the way we want it, we can take some time to interpret the plot. Each data point represents one sample. The x-axis typically PC1, explains the largest source of variation in the dataset. We can see that PC1 is primarily explained by knocking out the gene Tcf7 (encodes Tcf1) in CD8 T cells. The next largest source of variance within the dataset, or PC2, is explained by one sample created on day1 in the WT group. Overall, the groups are clustering as expected.

Finally, we can generate a correlation heatmap. This is a type of plot used to visualize the strength of relationships between samples.

#pdf(file = "output/correlation_plot_TCF1_RNAseq.pdf", height = 3, width = 4)

vst_mat <- assay(vstcounts)
vst_cor <- cor(vst_mat)
annotation_heatmap <- data.frame(row.names=colnames(vst_mat), condition)

cor_heatmap <- as.ggplot(pheatmap(vst_cor,
                                  annotation_col = annotation_heatmap,
                                  main = "sample correlation", 
                                  show_rownames = F, 
                                  show_colnames = F,
                                  fontsize = 6,  # Decrease overall font size
                                  fontsize_col = 6,  
                                  fontsize_row = 6,
                                  width = 3,
                                  height = 4))  # Decrease legend size

#dev.off()

saveRDS(cor_heatmap, "cor_heatmap.RDS")

library(ggplot2)
library(pheatmap)
library(grid)
library(cowplot)
library(magick)  


# Save heatmap to a PDF with controlled size
pdf("temp_heatmap.pdf", width = 4, height = 2)

pheatmap(vst_cor,
         annotation_col = annotation_heatmap,
         show_rownames = FALSE, 
         show_colnames = FALSE,
         fontsize = 8, 
         fontsize_col = 6, 
         fontsize_row = 6,
         legend_cex = 0.6)
dev.off()

# Convert PDF to image
heatmap_img <- image_read_pdf("temp_heatmap.pdf")

# Convert image to grob
heatmap_grob <- rasterGrob(as.raster(heatmap_img), interpolate = TRUE)

# Combine PCA plot and heatmap, ensuring heatmap is half the height
FigAB <- plot_grid(pca, 
                   heatmap_grob, 
                   labels = c("(A)", "(B)"), 
                   ncol = 2,
                   rel_widths = c(1, 1),    
                   rel_heights = c(1, 0.5), # Heatmap is half the height
                   align = "hv")  

# Save final figure
ggsave("QC_figure.pdf", FigAB, height = 6, width = 8, dpi = 300, units = "in", scale = 1)

Session info

sessionInfo()

## R version 4.4.3 (2025-02-28)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggpubr_0.6.0                ggplotify_0.1.2            
##  [3] patchwork_1.3.0             lubridate_1.9.4            
##  [5] forcats_1.0.0               stringr_1.5.1              
##  [7] dplyr_1.1.4                 purrr_1.0.4                
##  [9] readr_2.1.5                 tidyr_1.3.1                
## [11] tibble_3.2.1                tidyverse_2.0.0            
## [13] ggplot2_3.5.1               pheatmap_1.0.12            
## [15] RColorBrewer_1.1-3          DESeq2_1.46.0              
## [17] SummarizedExperiment_1.36.0 Biobase_2.66.0             
## [19] MatrixGenerics_1.18.1       matrixStats_1.5.0          
## [21] GenomicRanges_1.58.0        GenomeInfoDb_1.42.3        
## [23] IRanges_2.40.1              S4Vectors_0.44.0           
## [25] BiocGenerics_0.52.0         knitr_1.50                 
## 
## loaded via a namespace (and not attached):
##  [1] rlang_1.1.5             magrittr_2.0.3          compiler_4.4.3         
##  [4] vctrs_0.6.5             pkgconfig_2.0.3         crayon_1.5.3           
##  [7] fastmap_1.2.0           backports_1.5.0         XVector_0.46.0         
## [10] rmdformats_1.0.4        labeling_0.4.3          rmarkdown_2.29         
## [13] tzdb_0.5.0              UCSC.utils_1.2.0        xfun_0.51              
## [16] zlibbioc_1.52.0         cachem_1.1.0            jsonlite_2.0.0         
## [19] DelayedArray_0.32.0     BiocParallel_1.40.0     broom_1.0.8            
## [22] parallel_4.4.3          R6_2.6.1                bslib_0.9.0            
## [25] stringi_1.8.7           car_3.1-3               jquerylib_0.1.4        
## [28] numDeriv_2016.8-1.1     Rcpp_1.0.14             bookdown_0.42          
## [31] Matrix_1.7-3            timechange_0.3.0        tidyselect_1.2.1       
## [34] rstudioapi_0.17.1       abind_1.4-8             yaml_2.3.10            
## [37] codetools_0.2-20        lattice_0.22-6          plyr_1.8.9             
## [40] withr_3.0.2             coda_0.19-4.1           evaluate_1.0.3         
## [43] gridGraphics_0.5-1      pillar_1.10.1           carData_3.0-5          
## [46] generics_0.1.3          emdbook_1.3.13          hms_1.1.3              
## [49] munsell_0.5.1           scales_1.3.0            glue_1.8.0             
## [52] tools_4.4.3             apeglm_1.28.0           locfit_1.5-9.12        
## [55] ggsignif_0.6.4          fs_1.6.5                mvtnorm_1.3-3          
## [58] grid_4.4.3              bbmle_1.0.25.1          bdsmatrix_1.3-7        
## [61] colorspace_2.1-1        GenomeInfoDbData_1.2.13 Formula_1.2-5          
## [64] cli_3.6.4               S4Arrays_1.6.0          gtable_0.3.6           
## [67] rstatix_0.7.2           yulab.utils_0.2.0       sass_0.4.9             
## [70] digest_0.6.37           SparseArray_1.6.2       farver_2.1.2           
## [73] htmltools_0.5.8.1       lifecycle_1.0.4         httr_1.4.7             
## [76] MASS_7.3-65

Citation

title: “Analyzing RNA-Seq with DESeq2” author: “Michael I. Love, Simon Anders, and Wolfgang Huber”

Note: if you use DESeq2 in published research, please cite:

Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. 10.1186/s13059-014-0550-8