# Install CRAN packages
#install.packages(c("knitr", "RColorBrewer", "pheatmap"))
#BiocManager::install("apeglm") # Update all/some/none? [a/s/n] - type n
#install.packages("pacman")
# loading required packages
pacman::p_load(DESeq2, dplyr, tidyr, ggplot2, genekitr, pheatmap,
RColorBrewer, EnhancedVolcano, clusterProfiler, DOSE,
ggnewscale, enrichplot, knitr, cowplot, msigdbr, pathview,
patchwork, ggplotify)
For today’s lesson we will be working with RNA-seq data from another recent publication in ImmunoHorizons by Sabikunnahar et al. (2025).
Summary: Innate immune cells rapidly respond to microbial threats, but if unchecked, these responses can harm the host, as seen in septic shock. To explore how genetic variation influences these responses, researchers compared standard lab mice (C57BL/6) with genetically diverse wild-derived mice (PWD).
Experimental Design: The authors isolated bone marrow derived dendritic cells from B6, c11.2, and PWD mice and then either left them unstimulated or stimulated with LPS.
This experiment has:
untreated, LPSB6, c11.2, and
PWDThe goal:
ENSM_counts folder contentsContents of the ENSM_counts folder:
Sabikunnahar.et.al.2025_raw_counts_matrix.csvcountdataWrite a line of R code that reads in the counts matrix using
read.csv(). Redirect output to an R object called
countdata.
header=TRUE and specify the first column as row
names.countdata is in the global environment, convert
it to a matrix using as.matrix().#load data
countdata <- read.csv("Sabikunnahar.et.al.2025_raw_counts_matrix.csv",
header=TRUE,
row.names=1)
countdata <- as.matrix(countdata)
colnames(countdata)
## [1] "Unstim_B6_Rep1" "Unstim_B6_Rep2" "Unstim_B6_Rep3"
## [4] "Unstim_B6_Rep4" "Unstim_c11.2_rep1" "Unstim_c11.2_rep2"
## [7] "Unstim_c11.2_rep3" "Unstim_c11.2_rep4" "Unstim_PWD_rep1"
## [10] "Unstim_PWD_rep2" "Unstim_PWD_rep3" "Unstim_PWD_rep4"
## [13] "LPS_B6_rep1" "LPS_B6_rep2" "LPS_B6_rep3"
## [16] "LPS_B6_rep4" "LPS_c11.2_Rep1" "LPS_c11.2_Rep2"
## [19] "LPS_c11.2_Rep3" "LPS_c11.2_Rep4" "LPS_PWD_Rep1"
## [22] "LPS_PWD_Rep2" "LPS_PWD_Rep3" "LPS_PWD_Rep4"
Now that we’ve loaded our count data, let’s build the metadata that describes each sample.
sample_names <- colnames(countdata)
treatments <- rep(c("Untreated", "LPS"), each = 12)
genotypes <- rep(c("B6", "c11.2", "PWD"), each = 4, times = 2)
Let’s check our work we should have 24 sample names, 24 treatments, and 24 genotypes.
length(sample_names)
## [1] 24
length(treatments)
## [1] 24
length(genotypes)
## [1] 24
Now that we’ve created vectors for our sample names, genotypes, and treatments, we’re ready to organize them into a metadata table — often called a coldata object in RNA-seq workflows.
coldata <- data.frame(
row.names = sample_names,
Genotype = factor(genotypes, levels = c("B6", "c11.2", "PWD")),
Treatment = factor(treatments, levels = c("Untreated", "LPS"))
)
coldata$Group <- factor(paste(coldata$Genotype, coldata$Treatment, sep = "_"))
Here, we’re building a data.frame with three key things:
row.names = sample_names — This makes sure each row is
labeled with the correct sample ID from our count data.
Genotype = factor(...) — We’re telling R that genotype
is a categorical variable with a specific order: B6, c11.2, then PWD.
This helps when plotting or running models.
Treatment = factor(...) — Same idea: treatment is also a
categorical variable, ordered as Untreated, then LPS.
coldata$Group Line creates a new column called Group,
which combines the genotype and treatment into one label.
all(colnames(countdata) == rownames(coldata))
## [1] TRUE
# should return TRUE
DESeqDataSetFromHTSeqCountDESeqDataSetFromMatrixUse the DESeqDataSetFromMatrix() function to create a
dataset object called dds. Set the design to test the
~ Group variable.
dds <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ Group)
keep <- rowSums(counts(dds)) >= 20
dds <- dds[keep,]
Now that you’ve defined the dataset, run the DESeq analysis with
DESeq().
dds <- DESeq(dds)
results() in
DESeq2Once you’ve run your differential expression analysis using
DESeq(), you can extract specific results using the
results() function.
This is where the contrast argument becomes very important — especially if you’re making multiple pairwise comparisons within your experimental design.
What is contrast doing?
The contrast = c(“Group”, “A”, “B”) argument tells DESeq2:
This means DESeq2 will test for genes that are significantly different between condition A and condition B — calculating log2 fold changes, p-values, and adjusted p-values.
# Compares B6 mice treated with LPS to untreated B6 mice
B6_LPS_res <- results(dds, contrast = c("Group", "B6_LPS", "B6_Untreated"))
# How do c11.2 mice respond to LPS, compared to their untreated controls
c11.2_LPS_res <- results(dds, contrast = c("Group", "c11.2_LPS", "c11.2_Untreated"))
# How do PWD mice respond to LPS, compared to their untreated controls
PWD_LPS_res <- results(dds, contrast = c("Group", "PWD_LPS", "PWD_Untreated"))
Below write a line to compare, PWD vs. B6 both under LPS treatment.
Call the object LPS_compare_res.
# Compare LPS response between PWD and B6 (under LPS)
LPS_compare_res <- results(dds, contrast = c("Group", "PWD_LPS", "B6_LPS"))
To explore the output of the results() function we can also use the
head() function.
head(B6_LPS_res)
## log2 fold change (MLE): Group B6_LPS vs B6_Untreated
## Wald test p-value: Group B6_LPS vs B6_Untreated
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue
## <numeric> <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000020826.9 3482.13439 8.836127 0.2132265 41.440103 0.00000e+00
## ENSMUSG00000098104.1 1.07948 0.563411 1.4252563 0.395305 6.92618e-01
## ENSMUSG00000103922.1 9.71912 0.210842 0.7620138 0.276691 7.82018e-01
## ENSMUSG00000033845.13 353.21357 -0.187633 0.0811147 -2.313180 2.07128e-02
## ENSMUSG00000102275.1 7.50138 -0.113853 0.4232646 -0.268988 7.87939e-01
## ENSMUSG00000025903.14 834.78800 0.402286 0.0842057 4.777418 1.77560e-06
## padj
## <numeric>
## ENSMUSG00000020826.9 0.00000e+00
## ENSMUSG00000098104.1 7.96790e-01
## ENSMUSG00000103922.1 8.62258e-01
## ENSMUSG00000033845.13 4.34703e-02
## ENSMUSG00000102275.1 8.66964e-01
## ENSMUSG00000025903.14 6.29707e-06
head(LPS_compare_res)
## log2 fold change (MLE): Group PWD_LPS vs B6_LPS
## Wald test p-value: Group PWD LPS vs B6 LPS
## DataFrame with 6 rows and 6 columns
## baseMean log2FoldChange lfcSE stat
## <numeric> <numeric> <numeric> <numeric>
## ENSMUSG00000020826.9 3482.13439 -3.812370 0.1757376 -21.693534
## ENSMUSG00000098104.1 1.07948 0.378988 1.2991089 0.291729
## ENSMUSG00000103922.1 9.71912 3.712774 0.6034209 6.152876
## ENSMUSG00000033845.13 353.21357 -0.640256 0.0862566 -7.422692
## ENSMUSG00000102275.1 7.50138 -0.681252 0.4570126 -1.490663
## ENSMUSG00000025903.14 834.78800 -0.188839 0.0837277 -2.255400
## pvalue padj
## <numeric> <numeric>
## ENSMUSG00000020826.9 2.36133e-104 9.03706e-103
## ENSMUSG00000098104.1 7.70494e-01 8.23113e-01
## ENSMUSG00000103922.1 7.60901e-10 2.86879e-09
## ENSMUSG00000033845.13 1.14764e-13 5.38103e-13
## ENSMUSG00000102275.1 1.36050e-01 1.95847e-01
## ENSMUSG00000025903.14 2.41082e-02 4.17033e-02
# Principal components analysis
vstcounts <- vst(dds, blind=TRUE)
plotPCA(vstcounts, intgroup=c("Treatment", "Genotype"))
pcaData <- plotPCA(vstcounts,
intgroup = c("Treatment", "Genotype"),
returnData = TRUE)
What is this doing? + plotPCA is the DESeq2 function
that performs PCA on the VST-transformed counts matrix +
intgroup = c("Treatment", "Genotype"): Tells DESeq2 to pull
these columns from the colData(dds) and include them in the
output for grouping + returnData = TRUE: Instead of drawing
the plot directly, this tells the function to return the underlying PCA
coordinates (PC1, PC2) as a data frame.
percentVar <- round(100 * attr(pcaData, "percentVar"))
# returns a numeric vector that is used in axis labeling
#library(ggplot2)
ggplot(pcaData, aes(x = PC1, y = PC2, color = Genotype, shape = Treatment )) +
geom_point(size = 4, alpha = 0.8) +
xlab(paste0("PC1: ", percentVar[1], "% variance")) +
ylab(paste0("PC2: ", percentVar[2], "% variance")) +
labs(title = "PCA of VST-normalized Counts",
subtitle = "Color = Genotype, Shape = Treatment") +
theme_grey(base_size = 14) +
theme(
legend.position = "right",
plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5)
) +
scale_color_brewer(palette = "Set2")
display.brewer.all(type="qual")
display.brewer.all(type="div")
display.brewer.all(type="seq")
Log-fold change (LFC) shrinkage is a statistical technique used to moderate or “shrink” the estimated log2 fold changes, especially for genes with low counts or high variability.
Without shrinkage, some genes might show very large fold changes just due to noise — especially if they’re lowly expressed. Shrinkage pulls these extreme estimates toward zero, giving you more stable and reliable fold-change estimates.
shrunk_LPS_compare <- lfcShrink(dds,
contrast = c("Group", "PWD_LPS", "B6_LPS"),
type = "ashr") # or "apeglm" if available
resultsNames(dds)
## [1] "Intercept" "Group_B6_Untreated_vs_B6_LPS"
## [3] "Group_c11.2_LPS_vs_B6_LPS" "Group_c11.2_Untreated_vs_B6_LPS"
## [5] "Group_PWD_LPS_vs_B6_LPS" "Group_PWD_Untreated_vs_B6_LPS"
plotMA(LPS_compare_res)
plotMA(shrunk_LPS_compare)
result_LPS_compare <- shrunk_LPS_compare[order(shrunk_LPS_compare$padj), ]
table(result_LPS_compare$padj<0.05)
##
## FALSE TRUE
## 8253 11801
Note: Above we are performing log-fold change shrinkage for one comparison. If you’re visualizing results (e.g., volcano plots or heatmaps) and want comparable effect sizes, you will need to do this for all contrasts. Inconsistent usage of shrinkage could lead to misleading interpretations.
We will now output our results out of R to have a stand-alone file.
First, we will merge the stats (contained within result
object) with the normalized counts (contained within dds
object). The final object created is called resdata and now
since it is a data frame we can create output a tabular file with
write.csv.
result_LPS_compare <- merge(as.data.frame(result_LPS_compare),
as.data.frame(counts(dds, normalized=TRUE)),
by="row.names",
sort=FALSE)
names(result_LPS_compare)[1] <- "Gene_id"
str(result_LPS_compare)
## 'data.frame': 20055 obs. of 30 variables:
## $ Gene_id : chr "ENSMUSG00000090272.9" "ENSMUSG00000054203.7" "ENSMUSG00000083563.1" "ENSMUSG00000044103.4" ...
## $ baseMean : num 3535 2431 9917 3421 1584 ...
## $ log2FoldChange : num -6.5 -5.94 9.51 1.85 -2.97 ...
## $ lfcSE : num 0.1331 0.1254 0.1512 0.0439 0.0639 ...
## $ pvalue : num 0 0 0 0 0 0 0 0 0 0 ...
## $ padj : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Unstim_B6_Rep1 : num 1845.4 232.8 26.6 1338.3 580.4 ...
## $ Unstim_B6_Rep2 : num 1437.2 204.8 38.9 1314.9 511.5 ...
## $ Unstim_B6_Rep3 : num 2004.2 255.4 38.9 1401 615.8 ...
## $ Unstim_B6_Rep4 : num 2084.4 304.2 31.1 1238.2 677 ...
## $ Unstim_c11.2_rep1: num 1151.8 106.3 33.4 1235.8 649.2 ...
## $ Unstim_c11.2_rep2: num 1494.4 179.8 56.9 1439.5 648.1 ...
## $ Unstim_c11.2_rep3: num 1170.3 96.7 31.6 1334.8 622.7 ...
## $ Unstim_c11.2_rep4: num 1128.3 112.8 37.3 1396.6 609.4 ...
## $ Unstim_PWD_rep1 : num 16.91 3.22 29593.14 3998.42 262.53 ...
## $ Unstim_PWD_rep2 : num 13.47 3.59 27922.65 3793.23 260.36 ...
## $ Unstim_PWD_rep3 : num 18.57 2.53 24473.99 3738.21 237.23 ...
## $ Unstim_PWD_rep4 : num 19.2 4.8 24191.7 3983.6 274.7 ...
## $ LPS_B6_rep1 : num 11213.7 10906.2 31.5 2481.3 5132.8 ...
## $ LPS_B6_rep2 : num 10948 11126 51 2340 4718 ...
## $ LPS_B6_rep3 : num 11767.5 11143 46.7 2543.2 5332.9 ...
## $ LPS_B6_rep4 : num 11543.7 11151 46.9 2386 5403.3 ...
## $ LPS_c11.2_Rep1 : num 6564.8 2777.7 70.6 2771.6 2280.8 ...
## $ LPS_c11.2_Rep2 : num 7085.5 3316.9 66.6 2700.9 2278.4 ...
## $ LPS_c11.2_Rep3 : num 6615 3000 48 2689 2130 ...
## $ LPS_c11.2_Rep4 : num 6238.4 2716.9 63.4 2857.5 2165.9 ...
## $ LPS_PWD_Rep1 : num 90.9 153.2 35293.4 8966.2 650.6 ...
## $ LPS_PWD_Rep2 : num 164 207 30379 8644 673 ...
## $ LPS_PWD_Rep3 : num 135 183 31117 8691 682 ...
## $ LPS_PWD_Rep4 : num 97 165 34315 8812 620 ...
#Add a column to dataframe to specify up or down regulated
result_LPS_compare$diffexpressed <- "Unchanged"
#if log2FC > 1 and padj < 0.05 set as upregulated
result_LPS_compare$diffexpressed[result_LPS_compare$padj < 0.05 & result_LPS_compare$log2FoldChange > 1] <- "Upregulated"
#if log2FC > -1 and padj < 0.05 set as downregulated
result_LPS_compare$diffexpressed[result_LPS_compare$padj < 0.05 & result_LPS_compare$log2FoldChange < -1] <- "Downregulated"
str(result_LPS_compare)
## 'data.frame': 20055 obs. of 31 variables:
## $ Gene_id : chr "ENSMUSG00000090272.9" "ENSMUSG00000054203.7" "ENSMUSG00000083563.1" "ENSMUSG00000044103.4" ...
## $ baseMean : num 3535 2431 9917 3421 1584 ...
## $ log2FoldChange : num -6.5 -5.94 9.51 1.85 -2.97 ...
## $ lfcSE : num 0.1331 0.1254 0.1512 0.0439 0.0639 ...
## $ pvalue : num 0 0 0 0 0 0 0 0 0 0 ...
## $ padj : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Unstim_B6_Rep1 : num 1845.4 232.8 26.6 1338.3 580.4 ...
## $ Unstim_B6_Rep2 : num 1437.2 204.8 38.9 1314.9 511.5 ...
## $ Unstim_B6_Rep3 : num 2004.2 255.4 38.9 1401 615.8 ...
## $ Unstim_B6_Rep4 : num 2084.4 304.2 31.1 1238.2 677 ...
## $ Unstim_c11.2_rep1: num 1151.8 106.3 33.4 1235.8 649.2 ...
## $ Unstim_c11.2_rep2: num 1494.4 179.8 56.9 1439.5 648.1 ...
## $ Unstim_c11.2_rep3: num 1170.3 96.7 31.6 1334.8 622.7 ...
## $ Unstim_c11.2_rep4: num 1128.3 112.8 37.3 1396.6 609.4 ...
## $ Unstim_PWD_rep1 : num 16.91 3.22 29593.14 3998.42 262.53 ...
## $ Unstim_PWD_rep2 : num 13.47 3.59 27922.65 3793.23 260.36 ...
## $ Unstim_PWD_rep3 : num 18.57 2.53 24473.99 3738.21 237.23 ...
## $ Unstim_PWD_rep4 : num 19.2 4.8 24191.7 3983.6 274.7 ...
## $ LPS_B6_rep1 : num 11213.7 10906.2 31.5 2481.3 5132.8 ...
## $ LPS_B6_rep2 : num 10948 11126 51 2340 4718 ...
## $ LPS_B6_rep3 : num 11767.5 11143 46.7 2543.2 5332.9 ...
## $ LPS_B6_rep4 : num 11543.7 11151 46.9 2386 5403.3 ...
## $ LPS_c11.2_Rep1 : num 6564.8 2777.7 70.6 2771.6 2280.8 ...
## $ LPS_c11.2_Rep2 : num 7085.5 3316.9 66.6 2700.9 2278.4 ...
## $ LPS_c11.2_Rep3 : num 6615 3000 48 2689 2130 ...
## $ LPS_c11.2_Rep4 : num 6238.4 2716.9 63.4 2857.5 2165.9 ...
## $ LPS_PWD_Rep1 : num 90.9 153.2 35293.4 8966.2 650.6 ...
## $ LPS_PWD_Rep2 : num 164 207 30379 8644 673 ...
## $ LPS_PWD_Rep3 : num 135 183 31117 8691 682 ...
## $ LPS_PWD_Rep4 : num 97 165 34315 8812 620 ...
## $ diffexpressed : chr "Downregulated" "Downregulated" "Upregulated" "Upregulated" ...
Our objective now is to convert from one gene id to another.
There are many Gene ID types, below are some of the most common:
# Remove the version from Ensembl gene IDs
result_LPS_compare$Ensembl_id <- sub("\\.\\d+$", "", result_LPS_compare$Gene_id)
# Extract the cleaned Ensembl IDs
ids <- result_LPS_compare$Ensembl_id
# Convert to gene symbols and Entrez IDs
gene_symbol_map <- transId(ids,
transTo = c("symbol", "entrez"),
org = "mouse", # switch to mouse
unique = TRUE) # ensures one-to-one mapping
result_LPS_compare <- merge(result_LPS_compare,
gene_symbol_map,
by.x = "Ensembl_id",
by.y = "input_id")
str(result_LPS_compare)
## 'data.frame': 17835 obs. of 34 variables:
## $ Ensembl_id : chr "ENSMUSG00000000001" "ENSMUSG00000000028" "ENSMUSG00000000037" "ENSMUSG00000000056" ...
## $ Gene_id : chr "ENSMUSG00000000001.4" "ENSMUSG00000000028.15" "ENSMUSG00000000037.17" "ENSMUSG00000000056.7" ...
## $ baseMean : num 5932.73 145.14 8.27 234.74 1506.78 ...
## $ log2FoldChange : num -0.0709 0.4631 2.2703 0.156 -1.2649 ...
## $ lfcSE : num 0.0426 0.1313 0.6674 0.1024 0.085 ...
## $ pvalue : num 9.35e-02 1.73e-04 3.17e-05 1.14e-01 6.48e-51 ...
## $ padj : num 1.41e-01 4.16e-04 8.29e-05 1.68e-01 9.20e-50 ...
## $ Unstim_B6_Rep1 : num 5016.28 120.12 2.13 263.62 1119.34 ...
## $ Unstim_B6_Rep2 : num 5044.72 109.35 2.78 263.17 1237.09 ...
## $ Unstim_B6_Rep3 : num 5146.78 103.24 3.62 220.98 1315 ...
## $ Unstim_B6_Rep4 : num 5307.09 112.56 3.99 234.7 1363.5 ...
## $ Unstim_c11.2_rep1: num 4824.158 224.699 0.858 346.482 1594.331 ...
## $ Unstim_c11.2_rep2: num 4932.14 231.62 3.05 330.16 1659.96 ...
## $ Unstim_c11.2_rep3: num 4922.69 253.95 4.52 338.9 1887.92 ...
## $ Unstim_c11.2_rep4: num 4920.9 237.21 5.33 357.14 1739.5 ...
## $ Unstim_PWD_rep1 : num 4885.9 181.2 20.9 223.1 641 ...
## $ Unstim_PWD_rep2 : num 4964 178.7 19.8 242.4 670.7 ...
## $ Unstim_PWD_rep3 : num 5007.9 172.2 26.2 207.7 526.8 ...
## $ Unstim_PWD_rep4 : num 5281.4 165.2 22.1 221.9 591.7 ...
## $ LPS_B6_rep1 : num 7681.26 82.95 1.43 183.06 2092.29 ...
## $ LPS_B6_rep2 : num 6798.14 89.26 2.55 163.23 2113.02 ...
## $ LPS_B6_rep3 : num 7887.46 75.05 4.25 131.69 2315.26 ...
## $ LPS_B6_rep4 : num 7722.19 72.38 1.34 151.47 2265.32 ...
## $ LPS_c11.2_Rep1 : num 5981.53 142.48 1.22 259.38 2630.31 ...
## $ LPS_c11.2_Rep2 : num 5983.17 168.64 3.55 240.53 2248.24 ...
## $ LPS_c11.2_Rep3 : num 5713.96 128.65 4.36 250.76 2286.24 ...
## $ LPS_c11.2_Rep4 : num 5733.52 184.43 8.07 293.94 2237.39 ...
## $ LPS_PWD_Rep1 : num 6980.5 106.5 14.3 170.1 988.3 ...
## $ LPS_PWD_Rep2 : num 6936.6 112.9 15.3 166.8 1008.5 ...
## $ LPS_PWD_Rep3 : num 7636.31 112.25 9.71 173.77 814.9 ...
## $ LPS_PWD_Rep4 : num 7076.9 117.9 17.1 198.8 816 ...
## $ diffexpressed : chr "Unchanged" "Unchanged" "Upregulated" "Unchanged" ...
## $ symbol : chr "Gnai3" "Cdc45" "Scml2" "Narf" ...
## $ entrezid : chr "14679" "12544" "107815" "67608" ...
#write out results
#treatment groups being compared + full matrix + date
#treatment groups being compared + _log2fc1_fdr05_DEGlist + date
write.csv(result_LPS_compare,
file="PWD-LPSvsB6-LPS_normalized_matrix_040126.csv",
row.names = FALSE)
#Get DEG up/down lists
result_LPS_compare_sig <- as.data.frame(result_LPS_compare) |>
dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)
#write results
write.csv(result_LPS_compare_sig,
"PWD-LPSvsB6-LPS_log2fc1_fdr05_DEGlist.csv",
row.names = FALSE)
Opps, there is an error below. Fill in geom_point to
appropriately show the x and y axis. In addition, indicate
color=diffexpressed.
ggplot(result_LPS_compare) +
geom_point(aes(x=log2FoldChange,
y=-log10(padj),
color=diffexpressed)) +
theme_classic() +
ggtitle("LPS treated (PWD vs B6)") +
xlab("Log2 fold change") +
ylab("-Log10(padj)") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(1.25))) +
scale_color_manual(values = c("black", "grey", "red"),
labels = c("Downregulated", "Not Significant", "Upregulated")) +
scale_x_continuous(limits = c(-15, 15)) +
coord_cartesian(ylim=c(0, 300), clip = "off")
#install.packages("ggrepel") # only if you haven't installed it
library(ggrepel)
genes_to_label <- c("Mucl1", "Apol7c", "Ptx3", "Cfh", "Ifi202b", "Arnt2") # example gene symbols
result_LPS_compare$label <- ifelse(result_LPS_compare$symbol %in% genes_to_label,
result_LPS_compare$symbol,
NA)
#Adds a label column to your result table
ggplot(result_LPS_compare) +
geom_point(aes(x = log2FoldChange, y = -log10(padj), color = diffexpressed, alpha = 0.8)) +
geom_text_repel(aes(x = log2FoldChange,
y = -log10(padj),
label = label),
size = 3,
max.overlaps = Inf) + #display all labels, even if they overlap
theme_classic() +
ggtitle("LPS treated (PWD vs B6)") +
xlab("Log2 fold change") +
ylab("-Log10(padj)") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(1.25))) +
scale_color_manual(values = c("black", "grey", "red"),
labels = c("Downregulated", "Not Significant", "Upregulated")) +
scale_x_continuous(limits = c(-15, 15)) +
coord_cartesian(ylim = c(0, 300), clip = "off")
We are plotting:
What causes the smile?
plotCounts() to generate boxplotsFigure 3 of the paper shows an increase in pro-inflammatory cytokine
production in response to LPS, with a reduced release in PWD mice
compared to B6 mice. To recreate this finding, we’ll use the
plotCounts() function along with ggplot2.
Step1: Find the Ensembl ID for IL-6. We’re interested in visualizing IL-6 expression. However, the dds object stores genes using Ensembl IDs with version numbers (e.g., “ENSMUSG00000054221.11”).
To find the correct ID for IL-6:
Search the result_LPS_compare data frame for the
Ensembl ID (with version) that corresponds to IL-6.
Write the full Ensembl ID (including version) below:
# Type the ENSMUSG ID (with version) for IL-6 here:
Step2: Use the plotCounts() function to
generate a plot of normalized counts for the IL-6 gene. Check the help
page (?plotCounts) to understand the function and its arguments.
Set the grouping variable with:
intgroup = c("Treatment", "Genotype"). This will allow you
to see how IL-6 expression varies across treatment conditions and
genotypes.
Set returnData = TRUE
?plotCounts
plot_IL6 <- plotCounts(dds,
gene="ENSMUSG00000025746.11",
intgroup = c("Treatment", "Genotype"),
returnData = TRUE)
Step3: Now we are ready to output the results to
ggplot() to add custom aesthetics.
# plotting IL-6
IL6 <- plotCounts(dds,
gene="ENSMUSG00000025746.11",
intgroup = c("Treatment", "Genotype"),
returnData = TRUE) |>
ggplot(aes(x = Treatment,
y = count,
fill = Treatment)) +
geom_boxplot(outlier.shape = NA, color = "black") +
geom_jitter(shape = 16, position = position_jitter(0.2)) +
labs(title = "IL6 expression",
y = "Normalized counts",
x = "Treatment") +
theme_grey() +
theme(plot.title = element_text(hjust = 0.5),
axis.text = element_text(size = 8),
axis.title = element_text(size = 14)) +
facet_grid(~Genotype)
IL6
Step4: Color scales
While the default colors may be fine for many applications, they are often not sufficient to highlight the relationships of interest or are not optimal for the intended audience/publication. For our boxplot, the default colors look great but still we may want some freedom with customization.
There are “cheatsheets” available for specifying the base R colors by name or hexadecimal code, and there are these websites here and here for picking colors or palettes of interest and returning the hexadecimal code(s).
To apply any of these colors to our plot, we can individually specify
the colors by providing them within a scale_fill_manual
layer.
Select two colors that speak to you
Add these colors into the code using
scale_fill_manual.
example:
scale_colour_manual(values = c("red", "blue", "green"))
This line can be inserted before
facet_grid()
# plotting IL-6
IL6 <- plotCounts(dds,
gene="ENSMUSG00000025746.11",
intgroup = c("Treatment", "Genotype"),
returnData = TRUE) |>
ggplot(aes(x = Treatment,
y = count,
fill = Treatment)) +
geom_boxplot(outlier.shape = NA, color = "black") +
geom_jitter(shape = 16, position = position_jitter(0.2)) +
labs(title = "IL6 expression",
y = "Normalized counts",
x = "Treatment") +
theme_grey() +
theme(plot.title = element_text(hjust = 0.5),
axis.text = element_text(size = 8),
axis.title = element_text(size = 14)) +
scale_fill_manual(values = c("#f1a340", "#998ec3")) +
facet_grid(~Genotype)
IL6
Step5: Complete Themes
ggplot2 provides complete themes which control all
non-data display. We have been using theme() to tweak the
display of an existing theme theme_grey().
More themes can be found here: https://ggplot2.tidyverse.org/reference/ggtheme.html
# plotting IL-6
IL6 <- plotCounts(dds,
gene="ENSMUSG00000025746.11",
intgroup = c("Treatment", "Genotype"),
returnData = TRUE) |>
ggplot(aes(x = Treatment,
y = count,
fill = Treatment)) +
geom_boxplot(outlier.shape = NA, color = "black") +
geom_jitter(shape = 16, position = position_jitter(0.2)) +
labs(title = "IL6 expression",
y = "Normalized counts",
x = "Treatment") +
theme_linedraw() +
theme(plot.title = element_text(hjust = 0.5),
axis.text = element_text(size = 8),
axis.title = element_text(size = 10)) +
scale_fill_manual(values = c("#f1a340", "#998ec3")) +
facet_grid(~Genotype)
IL6
Scenario: Stacey, an undergraduate in the Kremenstov lab, knows that Nos2 — a gene that encodes inducible nitric oxide synthase (iNOS)— plays a role in regulating the survival and metabolism of BMDCs (bone marrow-derived dendritic cells). She hypothesizes that, like IL-6, Nos2 expression will be upregulated in LPS-stimulated BMDCs compared to untreated cells. However, she wonders whether this expression will be reduced in BMDCs taken from PWD mice. Help Stacey visualize Nos2 expression to test her hypothesis. Output the results as a PDF (height = 2.5, width = 6).
# plotting Nos2
dev.off()?In R, when you create a plot that goes to a file (like a PDF, PNG, or SVG), you first open a graphics device (e.g., pdf(“plot.pdf”)). This tells R to start sending any plotting commands to that file.
dev.off() closes the graphics device, which:
The pdf() function must come before the plotting
function (ex.plot() or plotCounts()). R by
default will send the plot to the file being created with
pdf(). You will get a blank PDF if the file was opened by
R, after the plot was created as the file never received any
information.
sessionInfo()
## R version 4.5.3 (2026-03-11)
## Platform: x86_64-apple-darwin20
## Running under: macOS Tahoe 26.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] fstcore_0.10.0 ggplotify_0.1.3
## [3] patchwork_1.3.2 pathview_1.50.0
## [5] msigdbr_25.1.1 cowplot_1.2.0
## [7] knitr_1.51 enrichplot_1.30.4
## [9] ggnewscale_0.5.2 DOSE_4.4.0
## [11] clusterProfiler_4.18.4 EnhancedVolcano_1.28.2
## [13] ggrepel_0.9.6 RColorBrewer_1.1-3
## [15] pheatmap_1.0.13 genekitr_1.2.8
## [17] ggplot2_4.0.1 tidyr_1.3.2
## [19] dplyr_1.1.4 DESeq2_1.50.2
## [21] SummarizedExperiment_1.40.0 Biobase_2.70.0
## [23] MatrixGenerics_1.22.0 matrixStats_1.5.0
## [25] GenomicRanges_1.62.1 Seqinfo_1.0.0
## [27] IRanges_2.44.0 S4Vectors_0.48.0
## [29] BiocGenerics_0.56.0 generics_0.1.4
##
## loaded via a namespace (and not attached):
## [1] splines_4.5.3 bitops_1.0-9 urltools_1.7.3.1
## [4] tibble_3.3.1 R.oo_1.27.1 triebeard_0.4.1
## [7] polyclip_1.10-7 graph_1.88.1 XML_3.99-0.20
## [10] lifecycle_1.0.5 mixsqp_0.3-54 lattice_0.22-9
## [13] MASS_7.3-65 magrittr_2.0.4 openxlsx_4.2.8.1
## [16] sass_0.4.10 rmarkdown_2.30 jquerylib_0.1.4
## [19] yaml_2.3.12 remotes_2.5.0 otel_0.2.0
## [22] ggtangle_0.1.1 zip_2.3.3 sessioninfo_1.2.3
## [25] pkgbuild_1.4.8 ggvenn_0.1.19 DBI_1.2.3
## [28] abind_1.4-8 pkgload_1.4.1 purrr_1.2.1
## [31] R.utils_2.13.0 RCurl_1.98-1.17 ggraph_2.2.2
## [34] yulab.utils_0.2.3 tweenr_2.0.3 rappdirs_0.3.4
## [37] gdtools_0.4.4 irlba_2.3.5.1 tidytree_0.4.7
## [40] codetools_0.2-20 DelayedArray_0.36.0 xml2_1.5.2
## [43] ggforce_0.5.0 tidyselect_1.2.1 aplot_0.2.9
## [46] farver_2.1.2 viridis_0.6.5 jsonlite_2.0.0
## [49] fst_0.9.8 ellipsis_0.3.2 tidygraph_1.3.1
## [52] systemfonts_1.3.1 tools_4.5.3 progress_1.2.3
## [55] treeio_1.34.0 Rcpp_1.1.1 glue_1.8.0
## [58] gridExtra_2.3 SparseArray_1.10.8 xfun_0.56
## [61] qvalue_2.42.0 usethis_3.2.1 withr_3.0.2
## [64] fastmap_1.2.0 truncnorm_1.0-9 digest_0.6.39
## [67] R6_2.6.1 gridGraphics_0.5-1 GO.db_3.22.0
## [70] RSQLite_2.4.5 R.methodsS3_1.8.2 fontLiberation_0.1.0
## [73] data.table_1.18.0 prettyunits_1.2.0 graphlayouts_1.2.2
## [76] httr_1.4.7 htmlwidgets_1.6.4 S4Arrays_1.10.1
## [79] scatterpie_0.2.6 pkgconfig_2.0.3 gtable_0.3.6
## [82] blob_1.3.0 S7_0.2.1 XVector_0.50.0
## [85] htmltools_0.5.9 fontBitstreamVera_0.1.1 geneset_0.2.7
## [88] fgsea_1.36.2 scales_1.4.0 png_0.1-8
## [91] ashr_2.2-63 ggfun_0.2.0 rstudioapi_0.18.0
## [94] reshape2_1.4.5 nlme_3.1-168 curl_7.0.0
## [97] org.Hs.eg.db_3.22.0 cachem_1.1.0 stringr_1.6.0
## [100] parallel_4.5.3 AnnotationDbi_1.72.0 pillar_1.11.1
## [103] grid_4.5.3 vctrs_0.7.0 tidydr_0.0.6
## [106] cluster_2.1.8.2 Rgraphviz_2.54.0 KEGGgraph_1.70.0
## [109] evaluate_1.0.5 invgamma_1.2 cli_3.6.5
## [112] locfit_1.5-9.12 compiler_4.5.3 rlang_1.1.7
## [115] crayon_1.5.3 SQUAREM_2021.1 labeling_0.4.3
## [118] plyr_1.8.9 fs_1.6.6 ggiraph_0.9.3
## [121] stringi_1.8.7 viridisLite_0.4.2 BiocParallel_1.44.0
## [124] assertthat_0.2.1 babelgene_22.9 Biostrings_2.78.0
## [127] lazyeval_0.2.2 devtools_2.4.6 pacman_0.5.1
## [130] GOSemSim_2.36.0 fontquiver_0.2.1 Matrix_1.7-4
## [133] europepmc_0.4.3 hms_1.1.4 bit64_4.6.0-1
## [136] KEGGREST_1.50.0 igraph_2.2.1 memoise_2.0.1
## [139] bslib_0.9.0 ggtree_4.0.4 fastmatch_1.1-8
## [142] bit_4.6.0 ape_5.8-1 gson_0.1.0
title: “Analyzing RNA-Seq with DESeq2” author: “Michael I. Love, Simon Anders, and Wolfgang Huber”
Note: if you use DESeq2 in published research, please cite:
Love, M.I., Huber, W., Anders, S. (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15:550. 10.1186/s13059-014-0550-8
Sabikunnahar, B., Snyder, J. P., Rodriguez, P. D., Sessions, K. J., Amiel, E., Frietze, S. E., & Krementsov, D. N. (2025). Natural genetic variation in wild-derived mice controls host survival and transcriptional responses during endotoxic shock. ImmunoHorizons, 9(5), vlaf007. https://doi.org/10.1093/immhor/vlaf007