In Part I, differential gene expression of Salmonella enterica isolates was analyzed to determine how these signatures varied between the three serovars tested: Heidelberg, Typhimurium, and monophasic Typhimurium. We hypothesized that the gene expression observed in the monophasic Typhimurium and Typhimurium isolates would be distinct from the Heidelberg isolates, as the former two groups are more closely related. This anaylsis will analyze three isolates that are specifically Salmonella enterica serovar Heidelberg, and evaluate differences in gene expression to better understand what is potentially attributed to these differences. RNA-seq was performed on six outbreak-associated strains from four different outbreaks. The isolate naming scheme used was iso#_serovar_foodsource_replicate#. There were three biological replicates for each isolate.
The wet-lab methods were as follows:
Bacterial cultures were grown.
RNA extracted in stationary phase with the PureLink RNA Mini Kit (Invitrogen).
rRNA depletion with NEBNext rRNA Depletion Kit (Bacteria) from New England Biolabs and again with the Zymo RiboFree Assay kit by the Vermont Integrative Genomics Core.
Sequencing by the Vermont Integrative Genomics Core with the Illumina HiSeq (150 bp paired-end) on two flow cells.
The Vermont Integrative Genomics Core transferred the 72 FASTQ files us via Globus, we uploaded our the Vermont Advanced Computing Center (VACC) home directories. In the VACC, the FASTQ files were organized as follows:
The files, respective of forward and backward read, replicate number, and sample were concatenated to combine reads from the two different flow cells.
Quality assessment was performed with FastQC v.0.11.7 and MultiQC v.1.0.
Illumina adapters and reads with phred scores of less tha 20 were trimmed with Trimgalore v.0.6.4 and Cutadapt v.4.6.
Quality assessment was performed a second time, adapter contamination was gone.
The FASTQ files were aligned to the Salmonella enterica subsp. enterica serovar Heidelberg str. SL476 reference genome ASM2070v1 (GCF_000020705.1).
Counting was performed with HTSeq v.0.11.2.
Finally, we proceeded with the differential expression (DE) analysis below. In the directory with this Rmarkdown script, there are a number of additional files necessary to run this script fully. This includes:
18 *_sorted.counts files
“salmonella_annotation_BlackwellPatch.csv” to specify the sample/serovar that the count files were derived from
First, a DE Seq analysis will be performed for all of the isolates, using isolate 1 as the control to observe preliminary clustering and relations between isolates. Then, three pairwise comparisons will serve as an in-depth analysis.
library(knitr)
library(DESeq2)
library(RColorBrewer)
library(pheatmap)
library(ggplot2)
library(tidyverse)
library(EnhancedVolcano)
library(biomaRt)
library(dplyr)
library(tidyr)
library(cowplot)
library(ggplotify)
First, we will load the annotation file called
salmonella_annotation_BlackwellPatch.csv
below, then filter
to only include the Salmonella Heidelberg.
sampleTable <- read.csv("salmonella_annotation_BlackwellPatch.csv", header = TRUE, sep = ",")
heidelberg_matrix <- sampleTable %>%
filter(serovar == "Heidelberg")
heidelberg_matrix <- data.frame(heidelberg_matrix)
Next, we will specify factors. We will specify the columns “serovar”, “outbreak”, “Source”, and “Heat_Shock” as a factors.
#specify serovar as a factor
serovar <- factor(heidelberg_matrix$serovar)
#specify outbreak as a factor
outbreak <- factor(heidelberg_matrix$outbreak)
#specify source as a factor
Source <- factor(heidelberg_matrix$Source)
#specify heat shock level as a factor
#number of minutes indicates survival time of the isolate experiencing heat stress, so higher num. of minutes indicates more heat tolerant strain
heattolerance <- factor(heidelberg_matrix$Heat_Shock)
Design will be by outbreak. One of the isolates is from the Foster Farms outbreak (1), and two isolates are from the Kosher liver outbreak (4 and 5).
#DE Seq set up
ddsHTSeq_heidelberg <- DESeqDataSetFromHTSeqCount(heidelberg_matrix,
directory = ".",
design= ~ outbreak)
We can filter the rows to only keep those that have at least 30 reads total.
#keep 30 reads and above
keep <- rowSums(counts(ddsHTSeq_heidelberg)) >= 30
ddsHTSeq_heidelberg <- ddsHTSeq_heidelberg[keep,]
We will set the contrast to outbreak.
#run DE Seq and create object for results
ddsHTSeq_heidelberg <- DESeq(ddsHTSeq_heidelberg)
results_heidelberg <- results(ddsHTSeq_heidelberg, contrast = c("outbreak", "foster_farms", "kosher_liver"))
We can now order our results table by the smallest padj value, then filter:
#order results
result <- results_heidelberg[order(results_heidelberg$padj), ]
#filter by p-adj
table(result$padj<0.05)
##
## FALSE TRUE
## 4382 198
Now, we can merge the above output
(i.e. result
) with the normalized counts data
(i.e. ddsHTSeq_heidelberg
).
#merging
resdata_heidelberg <- merge(as.data.frame(result), as.data.frame(counts(ddsHTSeq_heidelberg, normalized=TRUE)), by="row.names", sort=FALSE)
Next, we can relabel the first column Gene_id
using the
names()
function.
#rename first column
names(resdata_heidelberg)[1] <- "Gene_id"
#write results into a csv
write.csv(resdata_heidelberg, file="Heidelberg_matrix.csv")
Now we will make a PCA plot. Each data point represents one sample. The x-axis, PC1, explains the largest source of variation in the dataset, which in this case is the outbreak from which the Salmonella Heidelberg isolates were obtained. We can see that the isolates are clustering by outbreak. Now, we can dive deeper into pairwise comparisons by specific isolate and source.
#PCA
heidelberg_counts <- DESeq2::vst(ddsHTSeq_heidelberg, blind = FALSE)
pca_heidelberg <- plotPCA(heidelberg_counts, intgroup=c("outbreak"), returnData=TRUE)
percentVar <- round(100 * attr(pca_heidelberg, "percentVar"))
ggplot(pca_heidelberg, aes(PC1, PC2, color=outbreak)) +
geom_point(size=3) +
ylim(-10,10) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
coord_fixed()
Figure 1. PCA plot of all nine Heidelberg isolates, clearly clustering by outbreak There is a 94% variance between the three Foster Farms chicken isolates and six Kosher Liver isolates (PC1), and a 5% variance between the groups of isolates themselves (PC2).’
Isolates 1, 4, and 5 are Salmonella Heidelberg; 4 and 5 are from the same outbreak, but they are all from different sources: * Iso 1 = chicken tenderloin * Iso 4 = fresh chicken liver * Iso 5 = broiled chicken liver
We can filter the rows to select only Iso 1 and Iso 4. Remember, Iso 1 consists of 3 replicates from chicken liver samples from the Foster Farms outbreak. Iso 4 consists of 3 replicates of fresh chicken liver from the Kosher Liver outbreak.
The next steps are very similar to above, as we are calling in the sample table to filter and keep only the isolates of interest. We will then re-specify factors before performing DE Seq.
#Filter origional sample table to select only Iso 1 and Iso 4, create a dataframe
iso_1v4_matrix <- dplyr::slice(sampleTable, -(4:9), -(13:18))
iso_1v4_matrix <- data.frame(iso_1v4_matrix)
Next, we will specify factors. We will specify the columns “outbreak”, “Source” and “Heat_Shock” as factors.
#specify outbreak as a factor
outbreak <- factor(iso_1v4_matrix$outbreak)
#specify source as a factor
Source <- factor(iso_1v4_matrix$Source)
#specify heat shock level as a factor, the number of minutes indicates survival time of the isolate experiencing heat stress (so higher value indicates a more heat tolerant strain)
heattolerance <- factor(iso_1v4_matrix$Heat_Shock)
We will select design by outbreak. We expect to see a greater difference by outbreak rather than by source or level of heat shock, and we are hoping to compare our findings to epidemiological and clinical relevance of the outbreak.
#Run DESeq
ddsHTSeq_heidelberg_1v4 <- DESeqDataSetFromHTSeqCount(iso_1v4_matrix,
directory = ".",
design= ~ outbreak)
We can filter the rows to only keep those that have at least 30 reads
total. Right now we have 4,754 rows. Notice that line 1 is creating an
object called keep
that will identify any row greater than
or equal to 30.
keep <- rowSums(counts(ddsHTSeq_heidelberg_1v4)) >= 30
ddsHTSeq_heidelberg_1v4 <- ddsHTSeq_heidelberg_1v4[keep,]
The standard differential expression analysis steps are wrapped into
a single function, DESeq()
. The estimation steps performed
by this function are described in the manual page for
?DESeq
.
ddsHTSeq_heidelberg_1v4 <- DESeq(ddsHTSeq_heidelberg_1v4)
We will use the contrast
argument, and contrast by
outbreak.
#contrast
results_heidelberg_1v4 <- results(ddsHTSeq_heidelberg_1v4, contrast = c("outbreak", "foster_farms", "kosher_liver"))
#in order by p adj value
result_1v4 <- results_heidelberg_1v4[order(results_heidelberg_1v4$padj), ]
table(result_1v4$padj<0.05)
##
## FALSE TRUE
## 4430 148
Now, we can merge the above output
(i.e. result
) with the normalized counts data
(i.e. ddsHTSeq_heidelberg
).
#merging
resdata_heidelberg_1v4 <- merge(as.data.frame(result_1v4), as.data.frame(counts(ddsHTSeq_heidelberg_1v4, normalized=TRUE)), by="row.names", sort=FALSE)
Next, we can relabel the first column Gene_id
using the
names()
function. Then finally, write the results to a new
.csv file called Heidelberg_matrix.csv
#rename column
names(resdata_heidelberg_1v4)[1] <- "Gene_id"
#write results into a csv
write.csv(resdata_heidelberg_1v4, file="resdata_heidelberg_1v4.csv")
Now we will make a PCA plot. Each data point represents one sample. The x-axis, PC1, explains the largest source of variation in the dataset, which in this case is the outbreak from which the Salmonella Heidelberg isolates were obtained. We can see that the isolates are clustering by outbreak. Now, we can dive deeper into pairwise comparisons by specific isolate and source.
#PCA
heidelberg_counts_1v4 <- DESeq2::vst(ddsHTSeq_heidelberg_1v4, blind = FALSE)
#creating an object and changing
pca_heidelberg_1v4 <- plotPCA(heidelberg_counts_1v4, intgroup=c("outbreak"), returnData=TRUE)
percentVar <- round(100 * attr(pca_heidelberg_1v4, "percentVar"))
ggplot(pca_heidelberg_1v4, aes(PC1, PC2, color=outbreak)) +
geom_point(size=3) +
ylim(-10,10) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
coord_fixed() +
ggtitle("Heidelberg Iso 1 (Foster Farms) vs Iso 4 (Kosher Liver)")
ggsave("Heidelberg_1v4_PCA.png")
## Saving 7 x 5 in image
Figure 2. PCA plot shows clustering by outbreak. There is a 95% PC1 variance by outbreak, and a 4% PC2 variance between respective outbreaks’ replicates, indicating that there are variations among replicates within each isolate, but the isolates of different outbreaks are vastly different.
This will visualize the relationship between the samples.
rld <- rlog(ddsHTSeq_heidelberg_1v4, blind=FALSE)
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
pheatmap(rld_cor)
ggsave("Heidelberg_1v4_CorHeatMap.png")
## Saving 7 x 5 in image
Figure 3. Correlation heatmap between Isolate 1 chicken samples (three reps) and Isolate 4 liver samples (three reps) shows distinctive clustering by outbreak.
First, we need to add a column that specifies whether the data is
differentially expressed or not. This threshold will filter based on the
p adjusted value, and will TRUE
will mean that it
has a value of <0.05.
#filter
threshold_padj_1v4 <- resdata_heidelberg_1v4$padj < 0.05
length(which(threshold_padj_1v4))
## [1] 148
#Add vector as a column (threshold) to the resdata
resdata_heidelberg_1v4$threshold <- threshold_padj_1v4
Now we will add columns to specify upregulated or downregulated; if log2FC > 1 and padj < 0.05 set as upregulated, and if log2FC > -1 and padj < 0.05 set as downregulated.
#Add a column to dataframe to specify up or down regulated, then specify
resdata_heidelberg_1v4$diffexpressed <- "Unchanged"
resdata_heidelberg_1v4$diffexpressed[resdata_heidelberg_1v4$padj < 0.05 & resdata_heidelberg_1v4$log2FoldChange > 1] <- "Upregulated"
resdata_heidelberg_1v4$diffexpressed[resdata_heidelberg_1v4$padj < 0.05 & resdata_heidelberg_1v4$log2FoldChange < -1] <- "Downregulated"
Next we can filter out results to include ONLY data with a P-adj of 0.05 or less, and a log 2 fold change of greater than 1.
resdata_sig_heidelberg_1v4 <- as.data.frame(resdata_heidelberg_1v4) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)
# Count the number of rows where diffexpressed is "Upregulated"
upregulated_count_heidelberg_1v4 <- nrow(resdata_sig_heidelberg_1v4[resdata_sig_heidelberg_1v4$diffexpressed == "Upregulated", ])
upregulated_count_heidelberg_1v4
## [1] 122
downregulated_count_heidelberg_1v4 <- nrow(resdata_sig_heidelberg_1v4[resdata_sig_heidelberg_1v4$diffexpressed == "Downregulated", ])
downregulated_count_heidelberg_1v4
## [1] 1
Now we can actually create the volcano plot!
p <- ggplot(resdata_heidelberg_1v4) +
geom_point(aes(x=log2FoldChange,
y=-log10(padj), colour=diffexpressed)) +
theme_classic() +
ggtitle("Heidelberg isolates: Iso 1 vs Iso 4") +
xlab("Log2 fold change") +
ylab("-Log10(padj)") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(1.25))) +
scale_color_manual(values = c("blue", "black", "red"),
labels = c("Downregulated", "Not Significant", "Upregulated")) +
scale_x_continuous(limits = c(-20, 20)) +
coord_cartesian(ylim=c(0, 300), clip = "off")
# Add annotation
p + annotate("text", x = 15, y = 10, label = "122", colour = "red", size = 8, hjust = 0) +
annotate("text", x = -15, y = 10, label = "1", colour = "blue", size = 8, hjust = 1)
ggsave("Heidelberg_1v4_Volcano.png")
## Saving 7 x 5 in image
Figure 4. Volcano plot shows 122 upregulated genes and 1 downregulated gene in isolate 1 (Foster Farms Outbreak) when compared to isolate 4 (Kosher Liver Outbreak). Most of these genes are not annotated, and the one downregulated gene is rRNA. It is suspected that this could possibly encode for heightened stress response, but could also potentially be a result of incomplete rRNA depletion.
An enhanced volcano plot will label some of the DE genes, as well as showing which are are significant by both P-adj and log 2 fold change.
#filter
resdata_sig_heidelberg_1v4 <- as.data.frame(resdata_heidelberg_1v4) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)
#create enhanced volcano
EnhancedVolcano(resdata_heidelberg_1v4,
lab = as.character(resdata_heidelberg_1v4$Gene_id),
x = 'log2FoldChange',
y = 'padj',
title = "Heidelberg: Foster Farms[1] v Kosher Liver[4]",
titleLabSize = 15,
subtitleLabSize = 10,
FCcutoff = 1,
pCutoff = 0.05,
labSize = 4,
axisLabSize = 12,
col=c('black', 'black', 'black', 'red3'),
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
legendPosition = 'right',
legendLabSize = 10,
legendIconSize = 3.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black',
gridlines.major = FALSE,
gridlines.minor = FALSE,
max.overlaps = 20)
ggsave("Heidelberg_1v4_Enhancedvolano.png")
## Saving 7 x 5 in image
Figure 5. Enhanced volcano plot shows upregulation by p-adj and Log2FC. Simiarly to the volcano plot, DE genes are overwhelmingly upregulated in the Foster Farms chicken isolate 1 when compared to the Kosher liver isolate 4. Gene-SEHA_RS20475 (unannotated, encodes for rRNA) was downregulated. Gene-SEHA_RS22775 and gene-SEHA_RS00260 are both unannotated. Gene-SEHA_22820 encodes for tet(B) (tetracycline) resistance in the Foster Farms chicken, isolate 1. tet(B) was found in the genome, and tetracycline resistance was cited as a specific clinical issue in the outbreak.
A heatmap is another tool used to visualize DE genes among the isolates and replicates within isolates.
#convert into data frame
all_matrix_1v4 <- data.frame(resdata_sig_heidelberg_1v4)
#select only iso 1 and iso 4 and gene ID columns
heatmap_matrix_1v4 <- resdata_sig_heidelberg_1v4 %>% dplyr::select(Gene_id, iso1_Heidelberg_chicken_1:iso4_Heidelberg_liver_3)
heatmap_matrix_1v4 <- drop_na(heatmap_matrix_1v4)
heatmap_matrix_1v4 <- data.frame(heatmap_matrix_1v4)
#select sample name and outbreak info from meta file
heatmap_meta_1v4 <- iso_1v4_matrix %>%
dplyr::select(samplename, outbreak)
heatmap_meta_1v4 <- data.frame(heatmap_meta_1v4)
# Assign the sample names to be the row names
rownames(heatmap_meta_1v4) <- heatmap_meta_1v4$samplename
#Specify the outbreak
heatmap_meta_1v4 <- dplyr::select(heatmap_meta_1v4, outbreak)
#Re-level factors by outbreak
heatmap_meta_1v4$outbreak <- factor(heatmap_meta_1v4$outbreak,
levels = c("foster_farms", "kosher_liver"))
#Aesthetics
heatmap_colors <- rev(brewer.pal(10, "RdBu"))
ann_colors <- list(outbreak = c(foster_farms = "#D11313", kosher_liver = "#6DB0F8"))
#grab the Gene IDs
rownames(heatmap_matrix_1v4) <- heatmap_matrix_1v4$Gene_id
genes_for_heatmap <- grep("Gene_id", colnames(heatmap_matrix_1v4))
heatmap_matrix_1v4 <- heatmap_matrix_1v4[,-genes_for_heatmap]
heatmap_matrix_1v4 <- data.frame(heatmap_matrix_1v4)
heatmap_1v4 <- as.ggplot(pheatmap(heatmap_matrix_1v4,
color = heatmap_colors,
annotation_col = heatmap_meta_1v4,
annotation_colors = ann_colors,
show_colnames = FALSE,
show_rownames = FALSE,
scale = "row"))
ggsave("heatmap_figure_1v4.png")
Figure 6. Heatmap shows downregluation in Iso 4 (Kosher Liver) in most genes when comapred to Iso 1 (Foster Farms chicken). Overall, there is less variation among the Kosher liver Iso 4 samples, and more variation between the Iso 1 Foster Farms isolates. Despite uniformity among most DE genes, there is upregulation in one gene in one rep in the Kosher liver isolate. Similarly, in that same rep, there is downregulation among all three Iso 1 reps.
Boxplots will help us to visualize a few selected genes of interest so more closely investigate their differential expression.
We will be using all_matrix_1v4
and
heatmap_meta_1v4
#files
head(all_matrix_1v4)
## Gene_id baseMean log2FoldChange lfcSE stat pvalue
## 1 gene-SEHA_RS22775 12645.345 12.88613 0.4521659 28.49869 1.216049e-178
## 2 gene-SEHA_RS00260 5267.191 12.49762 0.6050720 20.65476 8.845504e-95
## 3 gene-SEHA_RS22820 15126.485 14.60285 0.7300353 20.00294 5.192277e-89
## 4 gene-SEHA_RS00055 45717.286 19.04702 1.1813897 16.12256 1.771243e-58
## 5 gene-SEHA_RS00050 27156.368 16.44527 1.0240578 16.05893 4.950419e-58
## 6 gene-SEHA_RS00315 38356.275 18.79374 1.1822737 15.89627 6.725057e-57
## padj iso1_Heidelberg_chicken_1 iso1_Heidelberg_chicken_2
## 1 5.567073e-175 22675.56 28150.23
## 2 2.024736e-91 10332.36 11820.67
## 3 7.923415e-86 28034.71 32861.80
## 4 2.027188e-55 93682.04 92809.25
## 5 4.532604e-55 55417.62 55113.39
## 6 5.131219e-54 74490.29 82833.78
## iso1_Heidelberg_chicken_3 iso4_Heidelberg_liver_1 iso4_Heidelberg_liver_2
## 1 25036.781 5.802579 1.060267
## 2 9445.141 4.973639 0.000000
## 3 29859.088 3.315759 0.000000
## 4 87812.429 0.000000 0.000000
## 5 52405.264 0.000000 1.060267
## 6 72813.576 0.000000 0.000000
## iso4_Heidelberg_liver_3 threshold diffexpressed
## 1 2.6380856 TRUE Upregulated
## 2 0.0000000 TRUE Upregulated
## 3 0.0000000 TRUE Upregulated
## 4 0.0000000 TRUE Upregulated
## 5 0.8793619 TRUE Upregulated
## 6 0.0000000 TRUE Upregulated
head(heatmap_meta_1v4)
## outbreak
## iso1_Heidelberg_chicken_1 foster_farms
## iso1_Heidelberg_chicken_2 foster_farms
## iso1_Heidelberg_chicken_3 foster_farms
## iso4_Heidelberg_liver_1 kosher_liver
## iso4_Heidelberg_liver_2 kosher_liver
## iso4_Heidelberg_liver_3 kosher_liver
This is to narrow down to a handful of DE genes.
#narrowing down DE genes
order_matrix_1v4_top20 <- arrange(all_matrix_1v4, padj) %>%
dplyr::select(Gene_id, iso1_Heidelberg_chicken_1:iso4_Heidelberg_liver_3) %>%
dplyr::slice_head(n=20)
#expression plot
expression_plot_1v4 <- tidyr::pivot_longer(order_matrix_1v4_top20,
cols = 2:7,
values_to = "normalized_counts")
#Adding outbreak column to pivoted data frame
expression_plot_1v4$outbreak <- rep(c("foster_farms", "foster_farms", "foster_farms", "kosher_liver", "kosher_liver", "kosher_liver"),times=20)
expression_plot_1v4 <- data.frame(expression_plot_1v4)
I will be selecting gene-SEHA_RS22775
,
gene-SEHA_RS00260
, and gene-SEHA_RS22820
;
which were all upregulated in the Foster Farms chicken outbreak when
compared to the Kosher Liver outbreak.
#create expression plots for all
RS22775_exp <- expression_plot_1v4 %>%
filter(Gene_id == "gene-SEHA_RS22775")
RS00260_exp <- expression_plot_1v4 %>%
filter(Gene_id == "gene-SEHA_RS00260")
RS22820_exp <- expression_plot_1v4 %>%
filter(Gene_id == "gene-SEHA_RS22820")
#Re-factor the x-axis variable to be in the correct order
RS22775_exp$outbreak <- factor(RS22775_exp$outbreak, levels = c("foster_farms", "kosher_liver"))
Now we can make the three box plots for each gene.
#RS22775
ggplot(RS22775_exp) +
geom_boxplot(aes(x = outbreak,
y = normalized_counts,
fill = outbreak)) +
theme_bw() +
ggtitle("RS22775_exp") +
xlab("Outbreak") +
ylab("Normalized Counts") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(0.75)),
axis.text = element_text(size = rel(0.75)))
ggsave("Hei_1v4_RS22775.png")
## Saving 7 x 5 in image
Figure 7. Boxplot shows extreme upregulation in Iso 1 Foster Farms when compared to Iso 4 Kosher Liver. Unfortunately, this gene is not annotated, so it cannot be evaluated extensively. It is interesting to note that the normalized overexpression counts is extremely high.
#RS00260
ggplot(RS00260_exp) +
geom_boxplot(aes(x = outbreak,
y = normalized_counts,
fill = outbreak)) +
theme_bw() +
ggtitle("RS00260_exp") +
xlab("Outbreak") +
ylab("Normalized Counts") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(0.75)),
axis.text = element_text(size = rel(0.75)))
ggsave("Hei_1v4_RS00260.png")
## Saving 7 x 5 in image
Figure 8. Boxplot shows extreme upregulation in Iso 1 Foster Farms when compared to Iso 4 Kosher Liver. Similarly to Figure 7, this gene is not annotated.
#RS22820
ggplot(RS22820_exp) +
geom_boxplot(aes(x = outbreak,
y = normalized_counts,
fill = outbreak)) +
theme_bw() +
ggtitle("RS22820_exp") +
xlab("Outbreak") +
ylab("Normalized Counts") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(0.75)),
axis.text = element_text(size = rel(0.75)))
ggsave("Hei_1v4_RS22820.png")
## Saving 7 x 5 in image
Figure 9. Boxplot shows extreme upregulation in Iso 1 Foster Farms when compared to Iso 4 Kosher Liver. This gene is annotated, and encodes for tet(B), tetracycline resistance. As mentioned earlier, tetracycline resistance was cited to be an issue in the outbreaks. In the Foster Farms outbreak, 65% of isolates were drug resistant, and 35% were multidrug resistant. There was an extremely high hospitalization rate of 38%, and invasive illness was seen at a much higher rate (15%) than normal (5%).
We can filter the rows to select only Iso 1 and Iso 5. Remember, Iso 1 consists of 3 replicates from chicken liver samples from the Foster Farms outbreak. Iso 5 consists of 3 replicates of broiled chicken liver from the Kosher Liver outbreak.
#Call in sampleTable
head(sampleTable)
## samplename filename isolate replicate
## 1 iso1_Heidelberg_chicken_1 iso1_rep1_sorted.count 1 1
## 2 iso1_Heidelberg_chicken_2 iso1_rep2_sorted.count 1 2
## 3 iso1_Heidelberg_chicken_3 iso1_rep3_sorted.count 1 3
## 4 iso2_MonoTyph_pork_1 iso2_rep1_sorted.count 2 1
## 5 iso2_MonoTyph_pork_2 iso2_rep2_sorted.count 2 2
## 6 iso2_MonoTyph_pork_3 iso2_rep3_sorted.count 2 3
## serovar outbreak Source Heat_Shock
## 1 Heidelberg foster_farms chicken_tenderloin <NA>
## 2 Heidelberg foster_farms chicken_tenderloin <NA>
## 3 Heidelberg foster_farms chicken_tenderloin <NA>
## 4 I(4,5,12);i- roast_pork swine 60_min
## 5 I(4,5,12);i- roast_pork swine 60_min
## 6 I(4,5,12);i- roast_pork swine 60_min
#Filter to select only Iso 1 and Iso 5, create a dataframe
iso_1v5_matrix <- dplyr::slice(sampleTable, -(4:12), -(16:18))
iso_1v5_matrix <- data.frame(iso_1v5_matrix)
Next, we will specify factors. We will specify the columns “outbreak”, “Source” and “Heat_Shock” as factors.
#specify outbreak as a factor
outbreak <- factor(iso_1v5_matrix$outbreak)
#specify source as a factor
Source <- factor(iso_1v5_matrix$Source)
#specify heat shock level as a factor, the number of minutes indicates survival time of the isolate experiencing heat stress (so higher value indicates a more heat tolerant strain)
heattolerance <- factor(iso_1v5_matrix$Heat_Shock)
We will select design by outbreak. We expect to see a greater difference by outbreak rather than by source or level of heat shock, and we are hoping to compare our findings to epidemiological and clinical relevance of the outbreak.
ddsHTSeq_heidelberg_1v5 <- DESeqDataSetFromHTSeqCount(iso_1v5_matrix,
directory = ".",
design= ~ outbreak)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
We can filter the rows to only keep those that have at least 30 reads
total. Right now we have 4,754 rows. Notice that line 1 is creating an
object called keep
that will identify any row greater than
or equal to 30.
keep <- rowSums(counts(ddsHTSeq_heidelberg_1v5)) >= 30
ddsHTSeq_heidelberg_1v5 <- ddsHTSeq_heidelberg_1v5[keep,]
The standard differential expression analysis steps are wrapped into
a single function, DESeq()
. The estimation steps performed
by this function are described in the manual page for
?DESeq
.
ddsHTSeq_heidelberg_1v5 <- DESeq(ddsHTSeq_heidelberg_1v5)
We will use the contrast
argument
results_heidelberg_1v5 <- results(ddsHTSeq_heidelberg_1v5, contrast = c("outbreak", "foster_farms", "kosher_liver"))
#in order by p adj value
result_1v5 <- results_heidelberg_1v5[order(results_heidelberg_1v5$padj), ]
table(result_1v5$padj<0.05)
##
## FALSE TRUE
## 4415 165
Now, we can merge the above output
(i.e. result
) with the normalized counts data
(i.e. ddsHTSeq_heidelberg
).
resdata_heidelberg_1v5 <- merge(as.data.frame(result_1v5), as.data.frame(counts(ddsHTSeq_heidelberg_1v5, normalized=TRUE)), by="row.names", sort=FALSE)
Next, we can relabel the first column Gene_id
using the
names()
function. Then finally, write the results to a new
.csv file called Heidelberg_matrix.csv
#rename just a column... names(object)[column num] <- "new name"
names(resdata_heidelberg_1v5)[1] <- "Gene_id"
#write results into a csv
write.csv(resdata_heidelberg_1v5, file="resdata_heidelberg_1v5.csv")
Now we will make a PCA plot. Each data point represents one sample. The x-axis, PC1, explains the largest source of variation in the dataset, which in this case is the outbreak from which the Salmonella Heidelberg isolates were obtained. We can see that the isolates are clustering by outbreak. Now, we can dive deeper into pairwise comparisons by specific isolate and source.
heidelberg_counts_1v5 <- DESeq2::vst(ddsHTSeq_heidelberg_1v5, blind = FALSE)
pca_heidelberg_1v5 <- plotPCA(heidelberg_counts_1v5, intgroup=c("outbreak"), returnData=TRUE)
percentVar <- round(100 * attr(pca_heidelberg_1v5, "percentVar"))
ggplot(pca_heidelberg_1v5, aes(PC1, PC2, color=outbreak)) +
geom_point(size=3) +
ylim(-10,10) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
coord_fixed() +
ggtitle("Heidelberg Iso 1 (Foster Farms) vs Iso 5 (Kosher Liver)")
ggsave("Heidelberg_1v5_PCA.png")
## Saving 7 x 5 in image
Figure 10. PCA plot showing isolates/repetitions clustering by outbreak. Iso 1 (Foster Farms chicken) clustering with 4% variation within repetitions (PC2), and 95% variance when compared to Iso 4 (Kosher Liver) (PC1).
This will visualize the relationship between the samples.
library(pheatmap)
rld <- rlog(ddsHTSeq_heidelberg_1v5, blind=FALSE)
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
pheatmap(rld_cor)
ggsave("Heidelberg_1v5_CorHeatMap.png")
## Saving 7 x 5 in image
Figure 11. Correlation heatmap of Iso 1 (Foster Farms chicken) compared to Iso 4 (Kosher Liver).
First, we need to add a column that specifies whether the data is
differentially expressed or not. This threshold will filter based on the
p adjusted value, and will TRUE
will mean that it
has a value of <0.05.
threshold_padj_1v5 <- resdata_heidelberg_1v5$padj < 0.05
length(which(threshold_padj_1v5))
## [1] 165
## Add vector as a column (threshold) to the resdata
resdata_heidelberg_1v5$threshold <- threshold_padj_1v5
#Add a column to dataframe to specify up or down regulated
resdata_heidelberg_1v5$diffexpressed <- "Unchanged"
#if log2FC > 1 and padj < 0.05 set as upregulated
resdata_heidelberg_1v5$diffexpressed[resdata_heidelberg_1v5$padj < 0.05 & resdata_heidelberg_1v5$log2FoldChange > 1] <- "Upregulated"
#if log2FC > -1 and padj < 0.05 set as downregulated
resdata_heidelberg_1v5$diffexpressed[resdata_heidelberg_1v5$padj < 0.05 & resdata_heidelberg_1v5$log2FoldChange < -1] <- "Downregulated"
#filter
resdata_sig_heidelberg_1v5 <- as.data.frame(resdata_heidelberg_1v5) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)
# Count the number of rows where diffexpressed is "Upregulated"
upregulated_count_heidelberg_1v5 <- nrow(resdata_sig_heidelberg_1v5[resdata_sig_heidelberg_1v5$diffexpressed == "Upregulated", ])
downregulated_count_heidelberg_1v5 <- nrow(resdata_sig_heidelberg_1v5[resdata_sig_heidelberg_1v5$diffexpressed == "Downregulated", ])
#create plot
p <- ggplot(resdata_heidelberg_1v5) +
geom_point(aes(x=log2FoldChange,
y=-log10(padj), colour=diffexpressed)) +
theme_classic() +
ggtitle("Heidelberg isolates: Foster Farms Outbreak [1] vs Kosher Liver Outbreak [5]") +
xlab("Log2 fold change") +
ylab("-Log10(padj)") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(1.25))) +
scale_color_manual(values = c("blue", "black", "red"),
labels = c("Downregulated", "Not Significant", "Upregulated")) +
scale_x_continuous(limits = c(-20, 20)) +
coord_cartesian(ylim=c(0, 300), clip = "off")
# Add annotation
p + annotate("text", x = 15, y = 10, label = "124", colour = "red", size = 8, hjust = 0) +
annotate("text", x = -15, y = 10, label = "1", colour = "blue", size = 8, hjust = 1)
ggsave("Heidelberg_1v5_Volcano.png")
## Saving 7 x 5 in image
Figure 12. Volcano plot shows vast upregulation of genes in Iso 1 (Foster Farms chicken) when compared to Iso 4 (Kosher liver). There are 125 DE genes; 124 upregulated and 1 downregulated. The one downregulated gene is the same rRNA seen int he 1v4 comparison, which is not annotated. Between this pairwise comparison and pairwise comparison #1, 120 of the DE genes are identical. This implies that Iso 1 is very distinct from Iso 4 and Iso 5, which are very similar to eachother.
#filter
resdata_sig_heidelberg_1v5 <- as.data.frame(resdata_heidelberg_1v5) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)
#create plot
EnhancedVolcano(resdata_heidelberg_1v5,
lab = as.character(resdata_heidelberg_1v5$Gene_id),
x = 'log2FoldChange',
y = 'padj',
title = "Heidelberg: Foster Farms[1] v Kosher Liver[5]",
titleLabSize = 15,
subtitleLabSize = 10,
FCcutoff = 1,
pCutoff = 0.05,
labSize = 4,
axisLabSize = 12,
col=c('black', 'black', 'black', 'red3'),
labCol = 'black',
labFace = 'bold',
boxedLabels = TRUE,
legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
legendPosition = 'right',
legendLabSize = 10,
legendIconSize = 3.0,
drawConnectors = TRUE,
widthConnectors = 1.0,
colConnectors = 'black',
gridlines.major = FALSE,
gridlines.minor = FALSE,
max.overlaps = 20)
ggsave("Heidelberg_1v5_Enhancedvolano.png")
## Saving 7 x 5 in image
Figure 13. Enhanced volcano plot shows DE gene expression among Iso 1 (Foster Farms chicken) when compared to Iso 5 (Kosher liver). Two genes are labeled, gene-SEHA_RS08070 (APH-6, encodes for streptomycin resistance), and gene-SEHA_RS01895 (unannotated).
First we need to read in the data and set up.
all_matrix_1v5 <- data.frame(resdata_sig_heidelberg_1v5)
#select the data of interest
heatmap_matrix_1v5 <- resdata_sig_heidelberg_1v5 %>% dplyr::select(Gene_id, iso1_Heidelberg_chicken_1:iso5_Heidelberg_liver_3)
heatmap_matrix_1v5 <- drop_na(heatmap_matrix_1v5)
heatmap_matrix_1v5 <- data.frame(heatmap_matrix_1v5)
#select the data of interest from the meta file
heatmap_meta_1v5 <- iso_1v5_matrix %>%
dplyr::select(samplename, outbreak)
heatmap_meta_1v5 <- data.frame(heatmap_meta_1v5)
# Assign the sample names to be the row names
rownames(heatmap_meta_1v5) <- heatmap_meta_1v5$samplename
#Specify the outbreak
heatmap_meta_1v5 <- dplyr::select(heatmap_meta_1v5, outbreak)
#Re-level factors by outbreak
heatmap_meta_1v5$outbreak <- factor(heatmap_meta_1v5$outbreak,
levels = c("foster_farms", "kosher_liver"))
#Aesthetics
heatmap_colors <- rev(brewer.pal(10, "RdBu"))
ann_colors <- list(outbreak = c(foster_farms = "#D11313", kosher_liver = "#6DB0F8"))
#more set up
rownames(heatmap_matrix_1v5) <- heatmap_matrix_1v5$Gene_id
genes_for_heatmap <- grep("Gene_id", colnames(heatmap_matrix_1v5))
heatmap_matrix_1v5 <- heatmap_matrix_1v5[,-genes_for_heatmap]
heatmap_matrix_1v5 <- data.frame(heatmap_matrix_1v5)
#plotting the heatmap
heatmap_1v5 <- as.ggplot(pheatmap(heatmap_matrix_1v5,
color = heatmap_colors,
annotation_col = heatmap_meta_1v5,
annotation_colors = ann_colors,
show_colnames = FALSE,
show_rownames = FALSE,
scale = "row"))
ggsave("heatmap_figure_1v5.png")
## Saving 7 x 5 in image
Figure 14. Heatmap shows downregluation in Iso 5 (Kosher Liver) in most genes when comapred to Iso 1 (Foster Farms chicken). Overall, there is less variation among the Kosher liver Iso 4 samples, and more variation between the Iso 1 Foster Farms isolates.
We will be using all_matrix_1v4
and
heatmap_meta_1v4
This is to narrow down to a handful of DE genes.
#top 20 genes
order_matrix_1v5_top20 <- arrange(all_matrix_1v5, padj) %>%
dplyr::select(Gene_id, iso1_Heidelberg_chicken_1:iso5_Heidelberg_liver_3) %>%
dplyr::slice_head(n=20)
#pivot
expression_plot_1v5 <- tidyr::pivot_longer(order_matrix_1v5_top20,
cols = 2:7,
values_to = "normalized_counts")
#Adding outbreak column to pivoted data frame
expression_plot_1v5$outbreak <- rep(c("foster_farms", "foster_farms", "foster_farms", "kosher_liver", "kosher_liver", "kosher_liver"),times=20)
expression_plot_1v5 <- data.frame(expression_plot_1v5)
I will be selecting RS08070 and RS01895; which were upregulated in the Foster Farms chicken outbreak when compared to the Kosher Liver outbreak.
#create objects for expression values
RS08070_exp <- expression_plot_1v5 %>%
filter(Gene_id == "gene-SEHA_RS08070")
RS01895_exp <- expression_plot_1v5 %>%
filter(Gene_id == "gene-SEHA_RS01895")
#Re-factor the x-axis variable to be in the correct order
RS08070_exp$outbreak <- factor(RS08070_exp$outbreak, levels = c("foster_farms", "kosher_liver"))
Now we can make the three box plots for each gene.
#RS08070
ggplot(RS08070_exp) +
geom_boxplot(aes(x = outbreak,
y = normalized_counts,
fill = outbreak)) +
theme_bw() +
ggtitle("RS08070_exp") +
xlab("Outbreak") +
ylab("Normalized Counts") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(0.75)),
axis.text = element_text(size = rel(0.75)))
ggsave("Hei_1v5_RS08070.png")
## Saving 7 x 5 in image
Figure 15. Boxplot demonstrating upregulation of gene-SEHA_RS08070 in Iso 1 (Foster Farms chicken) when compared to Iso 5 (Kosher Liver). This gene encodes for APH(6), (aminoglycoside O-phosphotransferase). It inactivates a number of aminoglycoside antibiotics by phosphorylation. One of these antibiotics is streptomycin, which was detected in the genome.
#RS01895
ggplot(RS01895_exp) +
geom_boxplot(aes(x = outbreak,
y = normalized_counts,
fill = outbreak)) +
theme_bw() +
ggtitle("RS01895_exp") +
xlab("Outbreak") +
ylab("Normalized Counts") +
theme(legend.position = "right",
plot.title = element_text(size = rel(1.25), hjust = 0.5),
axis.title = element_text(size = rel(0.75)),
axis.text = element_text(size = rel(0.75)))
ggsave("Hei_1v5_RS01895.png")
## Saving 7 x 5 in image
Figure 16. Boxplot demonstrating upregulation of gene-SEHA_RS08070 in Iso 1 (Foster Farms chicken) when compared to Iso 5 (Kosher Liver). Unfortunatley, this gene is unannotated.
We can filter the rows to select only Iso 4 and Iso 5. These isolates are both from the same outbreak (Kosher Liver). However, they are different strains from different products. Furthemore, in wet lab heat shock experiements performed in the Etter Lab by Ariel Martin, it was determined that Iso 4 has moderate heat shock, while Iso 5 has low heat shock. Potential differences in heat tolerance is an area of interest.
#Filter to select only Iso 4 and Iso 5, create a dataframe
iso_4v5_matrix <- dplyr::slice(sampleTable, -(1:9), -(16:18))
iso_4v5_matrix <- data.frame(iso_4v5_matrix)
Next, we will specify factors. We will specify the columns “outbreak”, “Source” and “Heat_Shock” as factors.
#specify outbreak as a factor
outbreak <- factor(iso_4v5_matrix$outbreak)
#specify source as a factor
Source <- factor(iso_4v5_matrix$Source)
#specify heat shock level as a factor, the number of minutes indicates survival time of the isolate experiencing heat stress (so higher value indicates a more heat tolerant strain)
heattolerance <- factor(iso_4v5_matrix$Heat_Shock)
We will select design by outbreak. We expect to see a greater
difference by outbreak rather than by source or level of heat shock, and
we are hoping to compare our findings to epidemiological and clinical
relevance of the outbreak. However, since these are from the same
outbreak, we will design by Source
.
#DESeq
ddsHTSeq_heidelberg_4v5 <- DESeqDataSetFromHTSeqCount(iso_4v5_matrix,
directory = ".",
design= ~ Source)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
We can filter the rows to only keep those that have at least 30 reads
total. Right now we have 4,754 rows. Notice that line 1 is creating an
object called keep
that will identify any row greater than
or equal to 30.
#filter
keep <- rowSums(counts(ddsHTSeq_heidelberg_4v5)) >= 30
ddsHTSeq_heidelberg_4v5 <- ddsHTSeq_heidelberg_4v5[keep,]
The standard differential expression analysis steps are wrapped into
a single function, DESeq()
. The estimation steps performed
by this function are described in the manual page for
?DESeq
.
#DESeq
ddsHTSeq_heidelberg_4v5 <- DESeq(ddsHTSeq_heidelberg_4v5)
We will use the contrast
argument
#contrast
results_heidelberg_4v5 <- results(ddsHTSeq_heidelberg_4v5, contrast = c("Source", "fresh_chicken_liver", "broiled_chicken_liver"))
#in order by p adj value
result_4v5 <- results_heidelberg_4v5[order(results_heidelberg_4v5$padj), ]
#table of results
table(result_4v5$padj<0.05)
##
## FALSE
## 4465
Now, we can merge the above output
(i.e. result
) with the normalized counts data
(i.e. ddsHTSeq_heidelberg
).
#merging
resdata_heidelberg_4v5 <- merge(as.data.frame(result_4v5), as.data.frame(counts(ddsHTSeq_heidelberg_4v5, normalized=TRUE)), by="row.names", sort=FALSE)
Next, we can relabel the first column Gene_id
using the
names()
function. Then finally, write the results to a new
.csv file called Heidelberg_matrix.csv
#rename column
names(resdata_heidelberg_4v5)[1] <- "Gene_id"
#write results into a csv
write.csv(resdata_heidelberg_4v5, file="resdata_heidelberg_4v5.csv")
Now we will make a PCA plot. Each data point represents one sample. The x-axis, PC1, explains the largest source of variation in the dataset, which in this case is the outbreak from which the Salmonella Heidelberg isolates were obtained. We can see that the isolates are clustering by outbreak. Now, we can dive deeper into pairwise comparisons by specific isolate and source.
#PCA
heidelberg_counts_4v5 <- DESeq2::vst(ddsHTSeq_heidelberg_4v5, blind = FALSE)
pca_heidelberg_4v5 <- plotPCA(heidelberg_counts_4v5, intgroup=c("Source"), returnData=TRUE)
percentVar <- round(100 * attr(pca_heidelberg_4v5, "percentVar"))
ggplot(pca_heidelberg_4v5, aes(PC1, PC2, color=Source)) +
geom_point(size=3) +
ylim(-10,10) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance")) +
coord_fixed() +
ggtitle("Heidelberg Kosher Outbreak Iso 4 vs Iso 5")
ggsave("Heidelberg_4v5_PCA.png")
## Saving 7 x 5 in image
Figure 17. PCA Plot of isolate 4 (Kosher liver, fresh) versus isolate 5 (Kosher liver, broiled), showing lack of clustering by source. It can be seen that these isolates are very similar, as they are not clustering. There is an 11% variance between reps shown on the Y-axis (PC2), and there is no distinct clustering to be seen across the X-axis (PC1). This indicates that isolates are extremely similar.
This will visualize the relationship between the samples.
#plot correlation heatmap
rld <- rlog(ddsHTSeq_heidelberg_4v5, blind=FALSE)
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
pheatmap(rld_cor)
ggsave("Heidelberg_4v5_CorHeatMap.png")
## Saving 7 x 5 in image
Figure 18. Correlation heatmap of three replicates of Iso 4 (Kosher liver, fresh) and Iso 5 (Kosher liver, broiled). This heatmap shows that the isolates are very similar, as a value of 1 indicates that 1 is most similar, and 0.993 is least similar. In fact, it seems that isolates of different sources are similar based upon replicate number, potentially moreso than source.
Since there are no DE genes between Iso 4 and Iso 5, I will not proceed with data visualization.
I hypothesized that isolates 4 and 5 would be very different from isolate 1, as they are from different outbreaks. These differences were likely due to genetic variability among the isolates. This was supported by my findings, as there were 120 DE genes in both Iso 4 and Iso 5 when compared to Iso 1. Likewise, there were 0 DE genes between Iso 4 and Iso 5. This emphasizes the point that neither the source (fresh versus broiled liver) nor the heat tolerance (as determined by wet lab experiments) had significant impacts on gene expression. However, it was interesting to see that genes encoding for antimicrobial resistance were upregulated in Iso 1 (tet(B) was the third most upregulated gene when compared to Iso 4; APH(6) was the most upregulated gene when compared to Iso 5; both AMR genes were DE in both pairwise comparisons).
Because Salmonella enterica is a non-model organism, there were limitations inherent in this study, such as the lack of gene annotation, but there were also unanticipated road blocks. Using a serovar-specific alignment was useful in enhancing specificity, but lack of annotation made it difficult to conceptualize the functions of differentially expressed genes. Furthermore, we were unable to convert the gene id’s to the formats necessary for either gene ontology or KEGG analysis. Within this serovar-specific analysis of the Heidelberg isolates, we were still able to see the upregulation of AMR genes, which supports and validates the outbreak and clinical manifestations of Isolate 1, from the Foster Farms chicken outbreak. Manually annotating and analyzing DE genes was time consuming and less effective than other methods, and better pipelines for evaluating DESEq results within specific Salmonella enterica serovars would prove useful.
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] ggplotify_0.1.2 cowplot_1.1.3
## [3] biomaRt_2.54.1 EnhancedVolcano_1.16.0
## [5] ggrepel_0.9.5 lubridate_1.9.3
## [7] forcats_1.0.0 stringr_1.5.1
## [9] dplyr_1.1.4 purrr_1.0.2
## [11] readr_2.1.5 tidyr_1.3.1
## [13] tibble_3.2.1 tidyverse_2.0.0
## [15] ggplot2_3.5.0 pheatmap_1.0.12
## [17] RColorBrewer_1.1-3 DESeq2_1.38.3
## [19] SummarizedExperiment_1.28.0 Biobase_2.58.0
## [21] MatrixGenerics_1.10.0 matrixStats_1.3.0
## [23] GenomicRanges_1.50.2 GenomeInfoDb_1.34.9
## [25] IRanges_2.32.0 S4Vectors_0.36.2
## [27] BiocGenerics_0.44.0 knitr_1.46
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.3 bitops_1.0-7 bit64_4.0.5
## [4] filelock_1.0.3 progress_1.2.3 httr_1.4.7
## [7] tools_4.2.2 bslib_0.7.0 utf8_1.2.4
## [10] R6_2.5.1 DBI_1.2.2 colorspace_2.1-0
## [13] withr_3.0.0 tidyselect_1.2.1 prettyunits_1.2.0
## [16] curl_5.2.1 bit_4.0.5 compiler_4.2.2
## [19] textshaping_0.3.7 cli_3.6.2 xml2_1.3.6
## [22] DelayedArray_0.24.0 labeling_0.4.3 sass_0.4.9
## [25] scales_1.3.0 rappdirs_0.3.3 systemfonts_1.0.6
## [28] yulab.utils_0.1.4 digest_0.6.35 rmarkdown_2.26
## [31] XVector_0.38.0 pkgconfig_2.0.3 htmltools_0.5.8.1
## [34] highr_0.10 dbplyr_2.5.0 fastmap_1.1.1
## [37] rlang_1.1.3 rstudioapi_0.16.0 RSQLite_2.3.6
## [40] farver_2.1.1 gridGraphics_0.5-1 jquerylib_0.1.4
## [43] generics_0.1.3 jsonlite_1.8.8 BiocParallel_1.32.6
## [46] RCurl_1.98-1.14 magrittr_2.0.3 GenomeInfoDbData_1.2.9
## [49] Matrix_1.5-1 Rcpp_1.0.12 munsell_0.5.1
## [52] fansi_1.0.6 lifecycle_1.0.4 stringi_1.8.3
## [55] yaml_2.3.8 zlibbioc_1.44.0 BiocFileCache_2.6.1
## [58] grid_4.2.2 blob_1.2.4 parallel_4.2.2
## [61] crayon_1.5.2 lattice_0.20-45 Biostrings_2.66.0
## [64] annotate_1.76.0 hms_1.1.3 KEGGREST_1.38.0
## [67] locfit_1.5-9.9 pillar_1.9.0 geneplotter_1.76.0
## [70] codetools_0.2-18 XML_3.99-0.16.1 glue_1.7.0
## [73] evaluate_0.23 png_0.1-8 vctrs_0.6.5
## [76] tzdb_0.4.0 gtable_0.3.4 cachem_1.0.8
## [79] xfun_0.43 xtable_1.8-4 ragg_1.3.0
## [82] AnnotationDbi_1.60.2 memoise_2.0.1 timechange_0.3.0