Introduction, Part I

In Part I, differential gene expression of Salmonella enterica isolates was analyzed to determine how these signatures varied between the three serovars tested: Heidelberg, Typhimurium, and monophasic Typhimurium. We hypothesized that the gene expression observed in the monophasic Typhimurium and Typhimurium isolates would be distinct from the Heidelberg isolates, as the former two groups are more closely related. This anaylsis will analyze three isolates that are specifically Salmonella enterica serovar Heidelberg, and evaluate differences in gene expression to better understand what is potentially attributed to these differences. RNA-seq was performed on six outbreak-associated strains from four different outbreaks. The isolate naming scheme used was iso#_serovar_foodsource_replicate#. There were three biological replicates for each isolate.

The wet-lab methods were as follows:

Bacterial cultures were grown.
RNA extracted in stationary phase with the PureLink RNA Mini Kit (Invitrogen).
rRNA depletion with NEBNext rRNA Depletion Kit (Bacteria) from New England Biolabs and again with the Zymo RiboFree Assay kit by the Vermont Integrative Genomics Core.
Sequencing by the Vermont Integrative Genomics Core with the Illumina HiSeq (150 bp paired-end) on two flow cells.

The Vermont Integrative Genomics Core transferred the 72 FASTQ files us via Globus, we uploaded our the Vermont Advanced Computing Center (VACC) home directories. In the VACC, the FASTQ files were organized as follows:

The files, respective of forward and backward read, replicate number, and sample were concatenated to combine reads from the two different flow cells.
Quality assessment was performed with FastQC v.0.11.7 and MultiQC v.1.0.
Illumina adapters and reads with phred scores of less tha 20 were trimmed with Trimgalore v.0.6.4 and Cutadapt v.4.6.
Quality assessment was performed a second time, adapter contamination was gone.
The FASTQ files were aligned to the Salmonella enterica subsp. enterica serovar Heidelberg str. SL476 reference genome ASM2070v1 (GCF_000020705.1).
Counting was performed with HTSeq v.0.11.2.

Finally, we proceeded with the differential expression (DE) analysis below. In the directory with this Rmarkdown script, there are a number of additional files necessary to run this script fully. This includes:

18 *_sorted.counts files
“salmonella_annotation_BlackwellPatch.csv” to specify the sample/serovar that the count files were derived from

Workflow

First, a DE Seq analysis will be performed for all of the isolates, using isolate 1 as the control to observe preliminary clustering and relations between isolates. Then, three pairwise comparisons will serve as an in-depth analysis.

All Heidelberg Isolates (1, 4 and 5)

Load the Required Libraries

library(knitr)
library(DESeq2) 
library(RColorBrewer)
library(pheatmap)
library(ggplot2)
library(tidyverse)
library(EnhancedVolcano)
library(biomaRt)
library(dplyr)
library(tidyr)
library(cowplot)
library(ggplotify)

Input Counts

First, we will load the annotation file called salmonella_annotation_BlackwellPatch.csv below, then filter to only include the Salmonella Heidelberg.

sampleTable <- read.csv("salmonella_annotation_BlackwellPatch.csv", header = TRUE, sep = ",")

heidelberg_matrix <- sampleTable %>%
  filter(serovar == "Heidelberg")
heidelberg_matrix <- data.frame(heidelberg_matrix)

Next, we will specify factors. We will specify the columns “serovar”, “outbreak”, “Source”, and “Heat_Shock” as a factors.

#specify serovar as a factor
serovar <- factor(heidelberg_matrix$serovar)

#specify outbreak as a factor
outbreak <- factor(heidelberg_matrix$outbreak)

#specify source as a factor
Source <- factor(heidelberg_matrix$Source)

#specify heat shock level as a factor
#number of minutes indicates survival time of the isolate experiencing heat stress, so higher num. of minutes indicates more heat tolerant strain
heattolerance <- factor(heidelberg_matrix$Heat_Shock)

Running DESeq2

All Heidelberg

Design will be by outbreak. One of the isolates is from the Foster Farms outbreak (1), and two isolates are from the Kosher liver outbreak (4 and 5).

#DE Seq set up
ddsHTSeq_heidelberg <- DESeqDataSetFromHTSeqCount(heidelberg_matrix,
                                       directory = ".",
                                       design= ~ outbreak)

Pre-Filtering

We can filter the rows to only keep those that have at least 30 reads total.

#keep 30 reads and above
keep <- rowSums(counts(ddsHTSeq_heidelberg)) >= 30 
ddsHTSeq_heidelberg <- ddsHTSeq_heidelberg[keep,]

Running the Differential Expression Analysis

All Heidelberg

We will set the contrast to outbreak.

#run DE Seq and create object for results
ddsHTSeq_heidelberg <- DESeq(ddsHTSeq_heidelberg)
results_heidelberg <- results(ddsHTSeq_heidelberg, contrast = c("outbreak", "foster_farms", "kosher_liver"))

We can now order our results table by the smallest padj value, then filter:

#order results
result <- results_heidelberg[order(results_heidelberg$padj), ]

#filter by p-adj
table(result$padj<0.05)

## 
## FALSE  TRUE 
##  4382   198

Merging the results above with the normalized counts data

Now, we can merge the above output (i.e. result) with the normalized counts data (i.e. ddsHTSeq_heidelberg).

#merging
resdata_heidelberg <- merge(as.data.frame(result), as.data.frame(counts(ddsHTSeq_heidelberg, normalized=TRUE)), by="row.names", sort=FALSE)

Next, we can relabel the first column Gene_id using the names() function.

#rename first column
names(resdata_heidelberg)[1] <- "Gene_id"

#write results into a csv
write.csv(resdata_heidelberg, file="Heidelberg_matrix.csv")

Data transformation and visualization

All Heidelberg

Principal components analysis (PCA)

Now we will make a PCA plot. Each data point represents one sample. The x-axis, PC1, explains the largest source of variation in the dataset, which in this case is the outbreak from which the Salmonella Heidelberg isolates were obtained. We can see that the isolates are clustering by outbreak. Now, we can dive deeper into pairwise comparisons by specific isolate and source.

#PCA
heidelberg_counts <- DESeq2::vst(ddsHTSeq_heidelberg, blind = FALSE)
pca_heidelberg <- plotPCA(heidelberg_counts, intgroup=c("outbreak"), returnData=TRUE)
percentVar <- round(100 * attr(pca_heidelberg, "percentVar"))
ggplot(pca_heidelberg, aes(PC1, PC2, color=outbreak)) +
  geom_point(size=3) +
  ylim(-10,10) + 
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed()

Figure 1. PCA plot of all nine Heidelberg isolates, clearly clustering by outbreak There is a 94% variance between the three Foster Farms chicken isolates and six Kosher Liver isolates (PC1), and a 5% variance between the groups of isolates themselves (PC2).’

Pairwise Comparison Between Heidelberg Isolates

Isolates 1, 4, and 5 are Salmonella Heidelberg; 4 and 5 are from the same outbreak, but they are all from different sources: * Iso 1 = chicken tenderloin * Iso 4 = fresh chicken liver * Iso 5 = broiled chicken liver

Pairwise Comparison 1: Iso 1 and Iso 4

We can filter the rows to select only Iso 1 and Iso 4. Remember, Iso 1 consists of 3 replicates from chicken liver samples from the Foster Farms outbreak. Iso 4 consists of 3 replicates of fresh chicken liver from the Kosher Liver outbreak.

Import and Format Data

The next steps are very similar to above, as we are calling in the sample table to filter and keep only the isolates of interest. We will then re-specify factors before performing DE Seq.

#Filter origional sample table to select only Iso 1 and Iso 4, create a dataframe
iso_1v4_matrix <- dplyr::slice(sampleTable, -(4:9), -(13:18))
iso_1v4_matrix <- data.frame(iso_1v4_matrix)

Next, we will specify factors. We will specify the columns “outbreak”, “Source” and “Heat_Shock” as factors.

#specify outbreak as a factor
outbreak <- factor(iso_1v4_matrix$outbreak)

#specify source as a factor
Source <- factor(iso_1v4_matrix$Source)

#specify heat shock level as a factor, the number of minutes indicates survival time of the isolate experiencing heat stress (so higher value indicates a more heat tolerant strain)
heattolerance <- factor(iso_1v4_matrix$Heat_Shock)

DE Seq

Pairwise 1: Iso 1 and Iso 4

We will select design by outbreak. We expect to see a greater difference by outbreak rather than by source or level of heat shock, and we are hoping to compare our findings to epidemiological and clinical relevance of the outbreak.

#Run DESeq
ddsHTSeq_heidelberg_1v4 <- DESeqDataSetFromHTSeqCount(iso_1v4_matrix,
                                       directory = ".",
                                       design= ~ outbreak)

Pre-Filtering

We can filter the rows to only keep those that have at least 30 reads total. Right now we have 4,754 rows. Notice that line 1 is creating an object called keep that will identify any row greater than or equal to 30.

keep <- rowSums(counts(ddsHTSeq_heidelberg_1v4)) >= 30 
ddsHTSeq_heidelberg_1v4 <- ddsHTSeq_heidelberg_1v4[keep,]

Running the Differential Expression Analysis

Pairwise 1: Iso 1 and Iso 4

The standard differential expression analysis steps are wrapped into a single function, DESeq(). The estimation steps performed by this function are described in the manual page for ?DESeq.

ddsHTSeq_heidelberg_1v4 <- DESeq(ddsHTSeq_heidelberg_1v4)

Extracting a Table to Show the Results

We will use the contrast argument, and contrast by outbreak.

#contrast
results_heidelberg_1v4 <- results(ddsHTSeq_heidelberg_1v4, contrast = c("outbreak", "foster_farms", "kosher_liver"))
 
#in order by p adj value 
result_1v4 <- results_heidelberg_1v4[order(results_heidelberg_1v4$padj), ]

table(result_1v4$padj<0.05)

## 
## FALSE  TRUE 
##  4430   148

Merging the results above with the normalized counts data

Now, we can merge the above output (i.e. result) with the normalized counts data (i.e. ddsHTSeq_heidelberg).

#merging
resdata_heidelberg_1v4 <- merge(as.data.frame(result_1v4), as.data.frame(counts(ddsHTSeq_heidelberg_1v4, normalized=TRUE)), by="row.names", sort=FALSE)

Next, we can relabel the first column Gene_id using the names() function. Then finally, write the results to a new .csv file called Heidelberg_matrix.csv

#rename column
names(resdata_heidelberg_1v4)[1] <- "Gene_id"

#write results into a csv
write.csv(resdata_heidelberg_1v4, file="resdata_heidelberg_1v4.csv")

Data transformation and visualization

Pairwise 1: Iso 1 and Iso 4

Principal components analysis (PCA)

#PCA
heidelberg_counts_1v4 <- DESeq2::vst(ddsHTSeq_heidelberg_1v4, blind = FALSE)

#creating an object and changing 
pca_heidelberg_1v4 <- plotPCA(heidelberg_counts_1v4, intgroup=c("outbreak"), returnData=TRUE)
percentVar <- round(100 * attr(pca_heidelberg_1v4, "percentVar"))
ggplot(pca_heidelberg_1v4, aes(PC1, PC2, color=outbreak)) +
  geom_point(size=3) +
  ylim(-10,10) + 
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed() +
  ggtitle("Heidelberg Iso 1 (Foster Farms) vs Iso 4 (Kosher Liver)")

ggsave("Heidelberg_1v4_PCA.png")

## Saving 7 x 5 in image

Figure 2. PCA plot shows clustering by outbreak. There is a 95% PC1 variance by outbreak, and a 4% PC2 variance between respective outbreaks’ replicates, indicating that there are variations among replicates within each isolate, but the isolates of different outbreaks are vastly different.

Correlation Heatmap

Pairwise 1 : Iso 1 and Iso 4

This will visualize the relationship between the samples.

rld <- rlog(ddsHTSeq_heidelberg_1v4, blind=FALSE)
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
pheatmap(rld_cor)

ggsave("Heidelberg_1v4_CorHeatMap.png")

## Saving 7 x 5 in image

Figure 3. Correlation heatmap between Isolate 1 chicken samples (three reps) and Isolate 4 liver samples (three reps) shows distinctive clustering by outbreak.

Volcano Plot

Pairwise 1 : Iso 1 and Iso 4

First, we need to add a column that specifies whether the data is differentially expressed or not. This threshold will filter based on the p adjusted value, and will TRUE will mean that it has a value of <0.05.

#filter
threshold_padj_1v4 <- resdata_heidelberg_1v4$padj < 0.05 
length(which(threshold_padj_1v4))

## [1] 148

#Add vector as a column (threshold) to the resdata
resdata_heidelberg_1v4$threshold <- threshold_padj_1v4

Now we will add columns to specify upregulated or downregulated; if log2FC > 1 and padj < 0.05 set as upregulated, and if log2FC > -1 and padj < 0.05 set as downregulated.

#Add a column to dataframe to specify up or down regulated, then specify
resdata_heidelberg_1v4$diffexpressed <- "Unchanged"
resdata_heidelberg_1v4$diffexpressed[resdata_heidelberg_1v4$padj < 0.05 & resdata_heidelberg_1v4$log2FoldChange > 1] <- "Upregulated"
resdata_heidelberg_1v4$diffexpressed[resdata_heidelberg_1v4$padj < 0.05 & resdata_heidelberg_1v4$log2FoldChange < -1] <- "Downregulated"

Next we can filter out results to include ONLY data with a P-adj of 0.05 or less, and a log 2 fold change of greater than 1.

resdata_sig_heidelberg_1v4 <- as.data.frame(resdata_heidelberg_1v4) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)

# Count the number of rows where diffexpressed is "Upregulated"
upregulated_count_heidelberg_1v4 <- nrow(resdata_sig_heidelberg_1v4[resdata_sig_heidelberg_1v4$diffexpressed == "Upregulated", ])
upregulated_count_heidelberg_1v4

## [1] 122

downregulated_count_heidelberg_1v4 <- nrow(resdata_sig_heidelberg_1v4[resdata_sig_heidelberg_1v4$diffexpressed == "Downregulated", ])
downregulated_count_heidelberg_1v4

## [1] 1

Now we can actually create the volcano plot!

p <- ggplot(resdata_heidelberg_1v4) +
        geom_point(aes(x=log2FoldChange, 
                       y=-log10(padj), colour=diffexpressed)) + 
  theme_classic() + 
        ggtitle("Heidelberg isolates: Iso 1 vs Iso 4") +
        xlab("Log2 fold change") + 
        ylab("-Log10(padj)") +
        theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(1.25))) +
  scale_color_manual(values = c("blue", "black", "red"),
                     labels = c("Downregulated", "Not Significant", "Upregulated")) +
  scale_x_continuous(limits = c(-20, 20)) + 
  coord_cartesian(ylim=c(0, 300), clip = "off") 

  # Add annotation
p + annotate("text", x = 15, y = 10, label = "122", colour = "red", size = 8, hjust = 0) +
  annotate("text", x = -15, y = 10, label = "1", colour = "blue", size = 8, hjust = 1)

ggsave("Heidelberg_1v4_Volcano.png")

## Saving 7 x 5 in image

Figure 4. Volcano plot shows 122 upregulated genes and 1 downregulated gene in isolate 1 (Foster Farms Outbreak) when compared to isolate 4 (Kosher Liver Outbreak). Most of these genes are not annotated, and the one downregulated gene is rRNA. It is suspected that this could possibly encode for heightened stress response, but could also potentially be a result of incomplete rRNA depletion.

Enhanced Volcano Plot

Pairwise 1: Iso 1 v Iso 4

An enhanced volcano plot will label some of the DE genes, as well as showing which are are significant by both P-adj and log 2 fold change.

#filter
resdata_sig_heidelberg_1v4 <- as.data.frame(resdata_heidelberg_1v4) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)

#create enhanced volcano
EnhancedVolcano(resdata_heidelberg_1v4,
                lab = as.character(resdata_heidelberg_1v4$Gene_id),
                x = 'log2FoldChange',
                y = 'padj', 
                title = "Heidelberg: Foster Farms[1] v Kosher Liver[4]",
                titleLabSize = 15,
                subtitleLabSize = 10,
                FCcutoff = 1,
                pCutoff = 0.05,
                labSize = 4, 
                axisLabSize = 12, 
                col=c('black', 'black', 'black', 'red3'),
                labCol = 'black',
                labFace = 'bold',
                boxedLabels = TRUE,
                legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
                legendPosition = 'right',
                legendLabSize = 10,
                legendIconSize = 3.0,
                drawConnectors = TRUE,
                widthConnectors = 1.0,
                colConnectors = 'black', 
                gridlines.major = FALSE,
                gridlines.minor = FALSE,
                max.overlaps = 20)

ggsave("Heidelberg_1v4_Enhancedvolano.png")

## Saving 7 x 5 in image

Figure 5. Enhanced volcano plot shows upregulation by p-adj and Log2FC. Simiarly to the volcano plot, DE genes are overwhelmingly upregulated in the Foster Farms chicken isolate 1 when compared to the Kosher liver isolate 4. Gene-SEHA_RS20475 (unannotated, encodes for rRNA) was downregulated. Gene-SEHA_RS22775 and gene-SEHA_RS00260 are both unannotated. Gene-SEHA_22820 encodes for tet(B) (tetracycline) resistance in the Foster Farms chicken, isolate 1. tet(B) was found in the genome, and tetracycline resistance was cited as a specific clinical issue in the outbreak.

Heat Map

Pairwise 1 : Iso 1 v Iso 4

A heatmap is another tool used to visualize DE genes among the isolates and replicates within isolates.

#convert into data frame
all_matrix_1v4 <- data.frame(resdata_sig_heidelberg_1v4)

#select only iso 1 and iso 4 and gene ID columns
heatmap_matrix_1v4 <- resdata_sig_heidelberg_1v4 %>% dplyr::select(Gene_id, iso1_Heidelberg_chicken_1:iso4_Heidelberg_liver_3)
heatmap_matrix_1v4 <- drop_na(heatmap_matrix_1v4) 
heatmap_matrix_1v4 <- data.frame(heatmap_matrix_1v4)

#select sample name and outbreak info from meta file
heatmap_meta_1v4 <- iso_1v4_matrix %>% 
                    dplyr::select(samplename, outbreak)
heatmap_meta_1v4 <- data.frame(heatmap_meta_1v4)

Setting Up

# Assign the sample names to be the row names
rownames(heatmap_meta_1v4) <- heatmap_meta_1v4$samplename

#Specify the outbreak
heatmap_meta_1v4 <- dplyr::select(heatmap_meta_1v4, outbreak)


#Re-level factors by outbreak
heatmap_meta_1v4$outbreak <- factor(heatmap_meta_1v4$outbreak,
                                levels = c("foster_farms", "kosher_liver"))

#Aesthetics
heatmap_colors <- rev(brewer.pal(10, "RdBu"))
ann_colors <- list(outbreak = c(foster_farms = "#D11313", kosher_liver = "#6DB0F8"))

Plotting

#grab the Gene IDs
rownames(heatmap_matrix_1v4) <- heatmap_matrix_1v4$Gene_id
genes_for_heatmap <- grep("Gene_id", colnames(heatmap_matrix_1v4))
heatmap_matrix_1v4 <- heatmap_matrix_1v4[,-genes_for_heatmap]
heatmap_matrix_1v4 <- data.frame(heatmap_matrix_1v4)

heatmap_1v4 <- as.ggplot(pheatmap(heatmap_matrix_1v4, 
         color = heatmap_colors,
         annotation_col = heatmap_meta_1v4,
         annotation_colors = ann_colors,
         show_colnames = FALSE, 
         show_rownames = FALSE,
         scale = "row"))

ggsave("heatmap_figure_1v4.png")

Figure 6. Heatmap shows downregluation in Iso 4 (Kosher Liver) in most genes when comapred to Iso 1 (Foster Farms chicken). Overall, there is less variation among the Kosher liver Iso 4 samples, and more variation between the Iso 1 Foster Farms isolates. Despite uniformity among most DE genes, there is upregulation in one gene in one rep in the Kosher liver isolate. Similarly, in that same rep, there is downregulation among all three Iso 1 reps.

Boxplots

Pairwise 1 : Iso 1 v Iso 4

Boxplots will help us to visualize a few selected genes of interest so more closely investigate their differential expression.

Data Input

We will be using all_matrix_1v4 and heatmap_meta_1v4

#files
head(all_matrix_1v4)

##             Gene_id  baseMean log2FoldChange     lfcSE     stat        pvalue
## 1 gene-SEHA_RS22775 12645.345       12.88613 0.4521659 28.49869 1.216049e-178
## 2 gene-SEHA_RS00260  5267.191       12.49762 0.6050720 20.65476  8.845504e-95
## 3 gene-SEHA_RS22820 15126.485       14.60285 0.7300353 20.00294  5.192277e-89
## 4 gene-SEHA_RS00055 45717.286       19.04702 1.1813897 16.12256  1.771243e-58
## 5 gene-SEHA_RS00050 27156.368       16.44527 1.0240578 16.05893  4.950419e-58
## 6 gene-SEHA_RS00315 38356.275       18.79374 1.1822737 15.89627  6.725057e-57
##            padj iso1_Heidelberg_chicken_1 iso1_Heidelberg_chicken_2
## 1 5.567073e-175                  22675.56                  28150.23
## 2  2.024736e-91                  10332.36                  11820.67
## 3  7.923415e-86                  28034.71                  32861.80
## 4  2.027188e-55                  93682.04                  92809.25
## 5  4.532604e-55                  55417.62                  55113.39
## 6  5.131219e-54                  74490.29                  82833.78
##   iso1_Heidelberg_chicken_3 iso4_Heidelberg_liver_1 iso4_Heidelberg_liver_2
## 1                 25036.781                5.802579                1.060267
## 2                  9445.141                4.973639                0.000000
## 3                 29859.088                3.315759                0.000000
## 4                 87812.429                0.000000                0.000000
## 5                 52405.264                0.000000                1.060267
## 6                 72813.576                0.000000                0.000000
##   iso4_Heidelberg_liver_3 threshold diffexpressed
## 1               2.6380856      TRUE   Upregulated
## 2               0.0000000      TRUE   Upregulated
## 3               0.0000000      TRUE   Upregulated
## 4               0.0000000      TRUE   Upregulated
## 5               0.8793619      TRUE   Upregulated
## 6               0.0000000      TRUE   Upregulated

head(heatmap_meta_1v4)

##                               outbreak
## iso1_Heidelberg_chicken_1 foster_farms
## iso1_Heidelberg_chicken_2 foster_farms
## iso1_Heidelberg_chicken_3 foster_farms
## iso4_Heidelberg_liver_1   kosher_liver
## iso4_Heidelberg_liver_2   kosher_liver
## iso4_Heidelberg_liver_3   kosher_liver

Data Manipulation

This is to narrow down to a handful of DE genes.

#narrowing down DE genes
order_matrix_1v4_top20 <- arrange(all_matrix_1v4, padj) %>%
                          dplyr::select(Gene_id, iso1_Heidelberg_chicken_1:iso4_Heidelberg_liver_3) %>%
                          dplyr::slice_head(n=20)

#expression plot
expression_plot_1v4 <- tidyr::pivot_longer(order_matrix_1v4_top20, 
                                cols = 2:7,
                                values_to = "normalized_counts")

#Adding outbreak column to pivoted data frame
expression_plot_1v4$outbreak <- rep(c("foster_farms", "foster_farms", "foster_farms", "kosher_liver", "kosher_liver", "kosher_liver"),times=20)
expression_plot_1v4 <- data.frame(expression_plot_1v4)

Creating Boxplots for Genes of interest

I will be selecting gene-SEHA_RS22775, gene-SEHA_RS00260, and gene-SEHA_RS22820; which were all upregulated in the Foster Farms chicken outbreak when compared to the Kosher Liver outbreak.

#create expression plots for all
RS22775_exp <- expression_plot_1v4 %>%
  filter(Gene_id == "gene-SEHA_RS22775")

RS00260_exp <- expression_plot_1v4 %>%
  filter(Gene_id == "gene-SEHA_RS00260")

RS22820_exp <- expression_plot_1v4 %>%
  filter(Gene_id == "gene-SEHA_RS22820")

#Re-factor the x-axis variable to be in the correct order
RS22775_exp$outbreak <- factor(RS22775_exp$outbreak, levels = c("foster_farms", "kosher_liver"))

Now we can make the three box plots for each gene.

#RS22775
ggplot(RS22775_exp) + 
  geom_boxplot(aes(x = outbreak, 
                   y = normalized_counts,
                   fill = outbreak)) +
  theme_bw() +
  ggtitle("RS22775_exp") +
  xlab("Outbreak") +
  ylab("Normalized Counts") + 
  theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(0.75)),
              axis.text = element_text(size = rel(0.75)))

ggsave("Hei_1v4_RS22775.png")

## Saving 7 x 5 in image

Figure 7. Boxplot shows extreme upregulation in Iso 1 Foster Farms when compared to Iso 4 Kosher Liver. Unfortunately, this gene is not annotated, so it cannot be evaluated extensively. It is interesting to note that the normalized overexpression counts is extremely high.

#RS00260
ggplot(RS00260_exp) + 
  geom_boxplot(aes(x = outbreak, 
                   y = normalized_counts,
                   fill = outbreak)) +
  theme_bw() +
  ggtitle("RS00260_exp") +
  xlab("Outbreak") +
  ylab("Normalized Counts") + 
  theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(0.75)),
              axis.text = element_text(size = rel(0.75)))

ggsave("Hei_1v4_RS00260.png")

## Saving 7 x 5 in image

Figure 8. Boxplot shows extreme upregulation in Iso 1 Foster Farms when compared to Iso 4 Kosher Liver. Similarly to Figure 7, this gene is not annotated.

#RS22820
ggplot(RS22820_exp) + 
  geom_boxplot(aes(x = outbreak, 
                   y = normalized_counts,
                   fill = outbreak)) +
  theme_bw() +
  ggtitle("RS22820_exp") +
  xlab("Outbreak") +
  ylab("Normalized Counts") + 
  theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(0.75)),
              axis.text = element_text(size = rel(0.75)))

ggsave("Hei_1v4_RS22820.png")

## Saving 7 x 5 in image

Figure 9. Boxplot shows extreme upregulation in Iso 1 Foster Farms when compared to Iso 4 Kosher Liver. This gene is annotated, and encodes for tet(B), tetracycline resistance. As mentioned earlier, tetracycline resistance was cited to be an issue in the outbreaks. In the Foster Farms outbreak, 65% of isolates were drug resistant, and 35% were multidrug resistant. There was an extremely high hospitalization rate of 38%, and invasive illness was seen at a much higher rate (15%) than normal (5%).

Pairwise Comparison 2: Iso 1 and Iso 5

We can filter the rows to select only Iso 1 and Iso 5. Remember, Iso 1 consists of 3 replicates from chicken liver samples from the Foster Farms outbreak. Iso 5 consists of 3 replicates of broiled chicken liver from the Kosher Liver outbreak.

Import and Format Data

#Call in sampleTable
head(sampleTable)

##                  samplename               filename isolate replicate
## 1 iso1_Heidelberg_chicken_1 iso1_rep1_sorted.count       1         1
## 2 iso1_Heidelberg_chicken_2 iso1_rep2_sorted.count       1         2
## 3 iso1_Heidelberg_chicken_3 iso1_rep3_sorted.count       1         3
## 4      iso2_MonoTyph_pork_1 iso2_rep1_sorted.count       2         1
## 5      iso2_MonoTyph_pork_2 iso2_rep2_sorted.count       2         2
## 6      iso2_MonoTyph_pork_3 iso2_rep3_sorted.count       2         3
##        serovar     outbreak             Source Heat_Shock
## 1   Heidelberg foster_farms chicken_tenderloin       <NA>
## 2   Heidelberg foster_farms chicken_tenderloin       <NA>
## 3   Heidelberg foster_farms chicken_tenderloin       <NA>
## 4 I(4,5,12);i-   roast_pork              swine     60_min
## 5 I(4,5,12);i-   roast_pork              swine     60_min
## 6 I(4,5,12);i-   roast_pork              swine     60_min

#Filter to select only Iso 1 and Iso 5, create a dataframe
iso_1v5_matrix <- dplyr::slice(sampleTable, -(4:12), -(16:18))
iso_1v5_matrix <- data.frame(iso_1v5_matrix)

Next, we will specify factors. We will specify the columns “outbreak”, “Source” and “Heat_Shock” as factors.

#specify outbreak as a factor
outbreak <- factor(iso_1v5_matrix$outbreak)

#specify source as a factor
Source <- factor(iso_1v5_matrix$Source)

#specify heat shock level as a factor, the number of minutes indicates survival time of the isolate experiencing heat stress (so higher value indicates a more heat tolerant strain)
heattolerance <- factor(iso_1v5_matrix$Heat_Shock)

DE Seq

Pairwise 2: Iso 1 v Iso 5

ddsHTSeq_heidelberg_1v5 <- DESeqDataSetFromHTSeqCount(iso_1v5_matrix,
                                       directory = ".",
                                       design= ~ outbreak)

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

Pre-Filtering

keep <- rowSums(counts(ddsHTSeq_heidelberg_1v5)) >= 30 
ddsHTSeq_heidelberg_1v5 <- ddsHTSeq_heidelberg_1v5[keep,]

Running the Differential Expression Analysis

The standard differential expression analysis steps are wrapped into a single function, DESeq(). The estimation steps performed by this function are described in the manual page for ?DESeq.

ddsHTSeq_heidelberg_1v5 <- DESeq(ddsHTSeq_heidelberg_1v5)

Extracting a Table to Show the Results

We will use the contrast argument

results_heidelberg_1v5 <- results(ddsHTSeq_heidelberg_1v5, contrast = c("outbreak", "foster_farms", "kosher_liver"))
 
#in order by p adj value 
result_1v5 <- results_heidelberg_1v5[order(results_heidelberg_1v5$padj), ]

table(result_1v5$padj<0.05)

## 
## FALSE  TRUE 
##  4415   165

Merging the results above with the normalized counts data

Now, we can merge the above output (i.e. result) with the normalized counts data (i.e. ddsHTSeq_heidelberg).

resdata_heidelberg_1v5 <- merge(as.data.frame(result_1v5), as.data.frame(counts(ddsHTSeq_heidelberg_1v5, normalized=TRUE)), by="row.names", sort=FALSE)

Next, we can relabel the first column Gene_id using the names() function. Then finally, write the results to a new .csv file called Heidelberg_matrix.csv

#rename just a column... names(object)[column num] <- "new name"
names(resdata_heidelberg_1v5)[1] <- "Gene_id"

#write results into a csv
write.csv(resdata_heidelberg_1v5, file="resdata_heidelberg_1v5.csv")

Data transformation and visualization

Principal components analysis (PCA)

Pairwise 2 : Iso 1 and Iso 5

heidelberg_counts_1v5 <- DESeq2::vst(ddsHTSeq_heidelberg_1v5, blind = FALSE)

pca_heidelberg_1v5 <- plotPCA(heidelberg_counts_1v5, intgroup=c("outbreak"), returnData=TRUE)
percentVar <- round(100 * attr(pca_heidelberg_1v5, "percentVar"))
ggplot(pca_heidelberg_1v5, aes(PC1, PC2, color=outbreak)) +
  geom_point(size=3) +
  ylim(-10,10) + 
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed() +
  ggtitle("Heidelberg Iso 1 (Foster Farms) vs Iso 5 (Kosher Liver)")

ggsave("Heidelberg_1v5_PCA.png")

## Saving 7 x 5 in image

Figure 10. PCA plot showing isolates/repetitions clustering by outbreak. Iso 1 (Foster Farms chicken) clustering with 4% variation within repetitions (PC2), and 95% variance when compared to Iso 4 (Kosher Liver) (PC1).

Correlation Heatmap

Pairwise 2 : Iso 1 and Iso 5

This will visualize the relationship between the samples.

library(pheatmap)
rld <- rlog(ddsHTSeq_heidelberg_1v5, blind=FALSE)
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
pheatmap(rld_cor)

ggsave("Heidelberg_1v5_CorHeatMap.png")

## Saving 7 x 5 in image

Figure 11. Correlation heatmap of Iso 1 (Foster Farms chicken) compared to Iso 4 (Kosher Liver).

Volcano Plot

Pairwise 2 : Iso 1 and Iso 5

threshold_padj_1v5 <- resdata_heidelberg_1v5$padj < 0.05 
length(which(threshold_padj_1v5))

## [1] 165

## Add vector as a column (threshold) to the resdata
resdata_heidelberg_1v5$threshold <- threshold_padj_1v5

#Add a column to dataframe to specify up or down regulated
resdata_heidelberg_1v5$diffexpressed <- "Unchanged"

#if log2FC > 1 and padj < 0.05 set as upregulated
resdata_heidelberg_1v5$diffexpressed[resdata_heidelberg_1v5$padj < 0.05 & resdata_heidelberg_1v5$log2FoldChange > 1] <- "Upregulated"

#if log2FC > -1 and padj < 0.05 set as downregulated
resdata_heidelberg_1v5$diffexpressed[resdata_heidelberg_1v5$padj < 0.05 & resdata_heidelberg_1v5$log2FoldChange < -1] <- "Downregulated"

#filter
resdata_sig_heidelberg_1v5 <- as.data.frame(resdata_heidelberg_1v5) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)

# Count the number of rows where diffexpressed is "Upregulated"
upregulated_count_heidelberg_1v5 <- nrow(resdata_sig_heidelberg_1v5[resdata_sig_heidelberg_1v5$diffexpressed == "Upregulated", ])

downregulated_count_heidelberg_1v5 <- nrow(resdata_sig_heidelberg_1v5[resdata_sig_heidelberg_1v5$diffexpressed == "Downregulated", ])

#create plot
p <- ggplot(resdata_heidelberg_1v5) +
        geom_point(aes(x=log2FoldChange, 
                       y=-log10(padj), colour=diffexpressed)) + 
  theme_classic() + 
        ggtitle("Heidelberg isolates: Foster Farms Outbreak [1] vs Kosher Liver Outbreak [5]") +
        xlab("Log2 fold change") + 
        ylab("-Log10(padj)") +
        theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(1.25))) +
  scale_color_manual(values = c("blue", "black", "red"),
                     labels = c("Downregulated", "Not Significant", "Upregulated")) +
  scale_x_continuous(limits = c(-20, 20)) + 
  coord_cartesian(ylim=c(0, 300), clip = "off") 

  # Add annotation
p + annotate("text", x = 15, y = 10, label = "124", colour = "red", size = 8, hjust = 0) +
  annotate("text", x = -15, y = 10, label = "1", colour = "blue", size = 8, hjust = 1)

ggsave("Heidelberg_1v5_Volcano.png")

## Saving 7 x 5 in image

Figure 12. Volcano plot shows vast upregulation of genes in Iso 1 (Foster Farms chicken) when compared to Iso 4 (Kosher liver). There are 125 DE genes; 124 upregulated and 1 downregulated. The one downregulated gene is the same rRNA seen int he 1v4 comparison, which is not annotated. Between this pairwise comparison and pairwise comparison #1, 120 of the DE genes are identical. This implies that Iso 1 is very distinct from Iso 4 and Iso 5, which are very similar to eachother.

Enhanced Volcano Plot

Pairwise 2 : Iso 1 and Iso 5

#filter
resdata_sig_heidelberg_1v5 <- as.data.frame(resdata_heidelberg_1v5) %>% dplyr::filter(abs(log2FoldChange) > 1 & padj < 0.05)

#create plot
EnhancedVolcano(resdata_heidelberg_1v5,
                lab = as.character(resdata_heidelberg_1v5$Gene_id),
                x = 'log2FoldChange',
                y = 'padj', 
                title = "Heidelberg: Foster Farms[1] v Kosher Liver[5]",
                titleLabSize = 15,
                subtitleLabSize = 10,
                FCcutoff = 1,
                pCutoff = 0.05,
                labSize = 4, 
                axisLabSize = 12, 
                col=c('black', 'black', 'black', 'red3'),
                labCol = 'black',
                labFace = 'bold',
                boxedLabels = TRUE,
                legendLabels=c('Not sig.','Log2FC','padj', 'padj & Log2FC'),
                legendPosition = 'right',
                legendLabSize = 10,
                legendIconSize = 3.0,
                drawConnectors = TRUE,
                widthConnectors = 1.0,
                colConnectors = 'black', 
                gridlines.major = FALSE,
                gridlines.minor = FALSE,
                max.overlaps = 20)

ggsave("Heidelberg_1v5_Enhancedvolano.png")

## Saving 7 x 5 in image

Figure 13. Enhanced volcano plot shows DE gene expression among Iso 1 (Foster Farms chicken) when compared to Iso 5 (Kosher liver). Two genes are labeled, gene-SEHA_RS08070 (APH-6, encodes for streptomycin resistance), and gene-SEHA_RS01895 (unannotated).

Heat Map

Pairwise 2 : Iso 1 and Iso 5

First we need to read in the data and set up.

all_matrix_1v5 <- data.frame(resdata_sig_heidelberg_1v5)

#select the data of interest
heatmap_matrix_1v5 <- resdata_sig_heidelberg_1v5 %>% dplyr::select(Gene_id, iso1_Heidelberg_chicken_1:iso5_Heidelberg_liver_3)
heatmap_matrix_1v5 <- drop_na(heatmap_matrix_1v5) 
heatmap_matrix_1v5 <- data.frame(heatmap_matrix_1v5)

#select the data of interest from the meta file
heatmap_meta_1v5 <- iso_1v5_matrix %>% 
                    dplyr::select(samplename, outbreak)

heatmap_meta_1v5 <- data.frame(heatmap_meta_1v5)

Setting Up

# Assign the sample names to be the row names
rownames(heatmap_meta_1v5) <- heatmap_meta_1v5$samplename

#Specify the outbreak
heatmap_meta_1v5 <- dplyr::select(heatmap_meta_1v5, outbreak)


#Re-level factors by outbreak
heatmap_meta_1v5$outbreak <- factor(heatmap_meta_1v5$outbreak,
                                levels = c("foster_farms", "kosher_liver"))

#Aesthetics
heatmap_colors <- rev(brewer.pal(10, "RdBu"))
ann_colors <- list(outbreak = c(foster_farms = "#D11313", kosher_liver = "#6DB0F8"))

Plotting

#more set up
rownames(heatmap_matrix_1v5) <- heatmap_matrix_1v5$Gene_id
genes_for_heatmap <- grep("Gene_id", colnames(heatmap_matrix_1v5))
heatmap_matrix_1v5 <- heatmap_matrix_1v5[,-genes_for_heatmap]
heatmap_matrix_1v5 <- data.frame(heatmap_matrix_1v5)

#plotting the heatmap
heatmap_1v5 <- as.ggplot(pheatmap(heatmap_matrix_1v5, 
         color = heatmap_colors,
         annotation_col = heatmap_meta_1v5,
         annotation_colors = ann_colors,
         show_colnames = FALSE, 
         show_rownames = FALSE,
         scale = "row"))

ggsave("heatmap_figure_1v5.png")

## Saving 7 x 5 in image

Figure 14. Heatmap shows downregluation in Iso 5 (Kosher Liver) in most genes when comapred to Iso 1 (Foster Farms chicken). Overall, there is less variation among the Kosher liver Iso 4 samples, and more variation between the Iso 1 Foster Farms isolates.

Boxplots

Pairwise 2 : Iso 1 vs Iso 5

Data Input

We will be using all_matrix_1v4 and heatmap_meta_1v4

Data Manipulation

This is to narrow down to a handful of DE genes.

#top 20 genes
order_matrix_1v5_top20 <- arrange(all_matrix_1v5, padj) %>%
                          dplyr::select(Gene_id, iso1_Heidelberg_chicken_1:iso5_Heidelberg_liver_3) %>%
                          dplyr::slice_head(n=20)

#pivot
expression_plot_1v5 <- tidyr::pivot_longer(order_matrix_1v5_top20, 
                                cols = 2:7,
                                values_to = "normalized_counts")

#Adding outbreak column to pivoted data frame
expression_plot_1v5$outbreak <- rep(c("foster_farms", "foster_farms", "foster_farms", "kosher_liver", "kosher_liver", "kosher_liver"),times=20)
expression_plot_1v5 <- data.frame(expression_plot_1v5)

Creating Boxplots for Genes of interest

I will be selecting RS08070 and RS01895; which were upregulated in the Foster Farms chicken outbreak when compared to the Kosher Liver outbreak.

#create objects for expression values
RS08070_exp <- expression_plot_1v5 %>%
  filter(Gene_id == "gene-SEHA_RS08070")

RS01895_exp <- expression_plot_1v5 %>%
  filter(Gene_id == "gene-SEHA_RS01895")

#Re-factor the x-axis variable to be in the correct order
RS08070_exp$outbreak <- factor(RS08070_exp$outbreak, levels = c("foster_farms", "kosher_liver"))

Now we can make the three box plots for each gene.

#RS08070
ggplot(RS08070_exp) + 
  geom_boxplot(aes(x = outbreak, 
                   y = normalized_counts,
                   fill = outbreak)) +
  theme_bw() +
  ggtitle("RS08070_exp") +
  xlab("Outbreak") +
  ylab("Normalized Counts") + 
  theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(0.75)),
              axis.text = element_text(size = rel(0.75)))

ggsave("Hei_1v5_RS08070.png")

## Saving 7 x 5 in image

Figure 15. Boxplot demonstrating upregulation of gene-SEHA_RS08070 in Iso 1 (Foster Farms chicken) when compared to Iso 5 (Kosher Liver). This gene encodes for APH(6), (aminoglycoside O-phosphotransferase). It inactivates a number of aminoglycoside antibiotics by phosphorylation. One of these antibiotics is streptomycin, which was detected in the genome.

#RS01895
ggplot(RS01895_exp) + 
  geom_boxplot(aes(x = outbreak, 
                   y = normalized_counts,
                   fill = outbreak)) +
  theme_bw() +
  ggtitle("RS01895_exp") +
  xlab("Outbreak") +
  ylab("Normalized Counts") + 
  theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(0.75)),
              axis.text = element_text(size = rel(0.75)))

ggsave("Hei_1v5_RS01895.png")

## Saving 7 x 5 in image

Figure 16. Boxplot demonstrating upregulation of gene-SEHA_RS08070 in Iso 1 (Foster Farms chicken) when compared to Iso 5 (Kosher Liver). Unfortunatley, this gene is unannotated.

Pairwise Comparison 1: Iso 4 and Iso 5

We can filter the rows to select only Iso 4 and Iso 5. These isolates are both from the same outbreak (Kosher Liver). However, they are different strains from different products. Furthemore, in wet lab heat shock experiements performed in the Etter Lab by Ariel Martin, it was determined that Iso 4 has moderate heat shock, while Iso 5 has low heat shock. Potential differences in heat tolerance is an area of interest.

Filter Data

#Filter to select only Iso 4 and Iso 5, create a dataframe
iso_4v5_matrix <- dplyr::slice(sampleTable, -(1:9), -(16:18))
iso_4v5_matrix <- data.frame(iso_4v5_matrix)

Next, we will specify factors. We will specify the columns “outbreak”, “Source” and “Heat_Shock” as factors.

#specify outbreak as a factor
outbreak <- factor(iso_4v5_matrix$outbreak)

#specify source as a factor
Source <- factor(iso_4v5_matrix$Source)

#specify heat shock level as a factor, the number of minutes indicates survival time of the isolate experiencing heat stress (so higher value indicates a more heat tolerant strain)
heattolerance <- factor(iso_4v5_matrix$Heat_Shock)

DE Seq

Pairwise 3 : Iso 4 vs Iso 5

#DESeq
ddsHTSeq_heidelberg_4v5 <- DESeqDataSetFromHTSeqCount(iso_4v5_matrix,
                                       directory = ".",
                                       design= ~ Source)

## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors

Pre-Filtering

#filter
keep <- rowSums(counts(ddsHTSeq_heidelberg_4v5)) >= 30 
ddsHTSeq_heidelberg_4v5 <- ddsHTSeq_heidelberg_4v5[keep,]

Running the Differential Expression Analysis

The standard differential expression analysis steps are wrapped into a single function, DESeq(). The estimation steps performed by this function are described in the manual page for ?DESeq.

#DESeq
ddsHTSeq_heidelberg_4v5 <- DESeq(ddsHTSeq_heidelberg_4v5)

Extracting a Table to Show the Results

We will use the contrast argument

#contrast
results_heidelberg_4v5 <- results(ddsHTSeq_heidelberg_4v5, contrast = c("Source", "fresh_chicken_liver", "broiled_chicken_liver"))
 
#in order by p adj value 
result_4v5 <- results_heidelberg_4v5[order(results_heidelberg_4v5$padj), ]

#table of results
table(result_4v5$padj<0.05)

## 
## FALSE 
##  4465

Merging the results above with the normalized counts data

Now, we can merge the above output (i.e. result) with the normalized counts data (i.e. ddsHTSeq_heidelberg).

#merging
resdata_heidelberg_4v5 <- merge(as.data.frame(result_4v5), as.data.frame(counts(ddsHTSeq_heidelberg_4v5, normalized=TRUE)), by="row.names", sort=FALSE)

Next, we can relabel the first column Gene_id using the names() function. Then finally, write the results to a new .csv file called Heidelberg_matrix.csv

#rename column
names(resdata_heidelberg_4v5)[1] <- "Gene_id"

#write results into a csv
write.csv(resdata_heidelberg_4v5, file="resdata_heidelberg_4v5.csv")

Data transformation and visualization

Principal components analysis (PCA)

Pairwise 3 : Iso 4 vs Iso 5

#PCA
heidelberg_counts_4v5 <- DESeq2::vst(ddsHTSeq_heidelberg_4v5, blind = FALSE)

pca_heidelberg_4v5 <- plotPCA(heidelberg_counts_4v5, intgroup=c("Source"), returnData=TRUE)
percentVar <- round(100 * attr(pca_heidelberg_4v5, "percentVar"))
ggplot(pca_heidelberg_4v5, aes(PC1, PC2, color=Source)) +
  geom_point(size=3) +
  ylim(-10,10) + 
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed() +
  ggtitle("Heidelberg Kosher Outbreak Iso 4 vs Iso 5")

ggsave("Heidelberg_4v5_PCA.png")

## Saving 7 x 5 in image

Figure 17. PCA Plot of isolate 4 (Kosher liver, fresh) versus isolate 5 (Kosher liver, broiled), showing lack of clustering by source. It can be seen that these isolates are very similar, as they are not clustering. There is an 11% variance between reps shown on the Y-axis (PC2), and there is no distinct clustering to be seen across the X-axis (PC1). This indicates that isolates are extremely similar.

Correlation Heatmap

Pairwise 3 : Iso 4 v Iso 5

This will visualize the relationship between the samples.

#plot correlation heatmap
rld <- rlog(ddsHTSeq_heidelberg_4v5, blind=FALSE)
rld_mat <- assay(rld)
rld_cor <- cor(rld_mat)
pheatmap(rld_cor)

ggsave("Heidelberg_4v5_CorHeatMap.png")

## Saving 7 x 5 in image

Figure 18. Correlation heatmap of three replicates of Iso 4 (Kosher liver, fresh) and Iso 5 (Kosher liver, broiled). This heatmap shows that the isolates are very similar, as a value of 1 indicates that 1 is most similar, and 0.993 is least similar. In fact, it seems that isolates of different sources are similar based upon replicate number, potentially moreso than source.

Since there are no DE genes between Iso 4 and Iso 5, I will not proceed with data visualization.

Conclusion

I hypothesized that isolates 4 and 5 would be very different from isolate 1, as they are from different outbreaks. These differences were likely due to genetic variability among the isolates. This was supported by my findings, as there were 120 DE genes in both Iso 4 and Iso 5 when compared to Iso 1. Likewise, there were 0 DE genes between Iso 4 and Iso 5. This emphasizes the point that neither the source (fresh versus broiled liver) nor the heat tolerance (as determined by wet lab experiments) had significant impacts on gene expression. However, it was interesting to see that genes encoding for antimicrobial resistance were upregulated in Iso 1 (tet(B) was the third most upregulated gene when compared to Iso 4; APH(6) was the most upregulated gene when compared to Iso 5; both AMR genes were DE in both pairwise comparisons).

Because Salmonella enterica is a non-model organism, there were limitations inherent in this study, such as the lack of gene annotation, but there were also unanticipated road blocks. Using a serovar-specific alignment was useful in enhancing specificity, but lack of annotation made it difficult to conceptualize the functions of differentially expressed genes. Furthermore, we were unable to convert the gene id’s to the formats necessary for either gene ontology or KEGG analysis. Within this serovar-specific analysis of the Heidelberg isolates, we were still able to see the upregulation of AMR genes, which supports and validates the outbreak and clinical manifestations of Isolate 1, from the Foster Farms chicken outbreak. Manually annotating and analyzing DE genes was time consuming and less effective than other methods, and better pipelines for evaluating DESEq results within specific Salmonella enterica serovars would prove useful.

sessionInfo()

## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ggplotify_0.1.2             cowplot_1.1.3              
##  [3] biomaRt_2.54.1              EnhancedVolcano_1.16.0     
##  [5] ggrepel_0.9.5               lubridate_1.9.3            
##  [7] forcats_1.0.0               stringr_1.5.1              
##  [9] dplyr_1.1.4                 purrr_1.0.2                
## [11] readr_2.1.5                 tidyr_1.3.1                
## [13] tibble_3.2.1                tidyverse_2.0.0            
## [15] ggplot2_3.5.0               pheatmap_1.0.12            
## [17] RColorBrewer_1.1-3          DESeq2_1.38.3              
## [19] SummarizedExperiment_1.28.0 Biobase_2.58.0             
## [21] MatrixGenerics_1.10.0       matrixStats_1.3.0          
## [23] GenomicRanges_1.50.2        GenomeInfoDb_1.34.9        
## [25] IRanges_2.32.0              S4Vectors_0.36.2           
## [27] BiocGenerics_0.44.0         knitr_1.46                 
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.6.3               bitops_1.0-7           bit64_4.0.5           
##  [4] filelock_1.0.3         progress_1.2.3         httr_1.4.7            
##  [7] tools_4.2.2            bslib_0.7.0            utf8_1.2.4            
## [10] R6_2.5.1               DBI_1.2.2              colorspace_2.1-0      
## [13] withr_3.0.0            tidyselect_1.2.1       prettyunits_1.2.0     
## [16] curl_5.2.1             bit_4.0.5              compiler_4.2.2        
## [19] textshaping_0.3.7      cli_3.6.2              xml2_1.3.6            
## [22] DelayedArray_0.24.0    labeling_0.4.3         sass_0.4.9            
## [25] scales_1.3.0           rappdirs_0.3.3         systemfonts_1.0.6     
## [28] yulab.utils_0.1.4      digest_0.6.35          rmarkdown_2.26        
## [31] XVector_0.38.0         pkgconfig_2.0.3        htmltools_0.5.8.1     
## [34] highr_0.10             dbplyr_2.5.0           fastmap_1.1.1         
## [37] rlang_1.1.3            rstudioapi_0.16.0      RSQLite_2.3.6         
## [40] farver_2.1.1           gridGraphics_0.5-1     jquerylib_0.1.4       
## [43] generics_0.1.3         jsonlite_1.8.8         BiocParallel_1.32.6   
## [46] RCurl_1.98-1.14        magrittr_2.0.3         GenomeInfoDbData_1.2.9
## [49] Matrix_1.5-1           Rcpp_1.0.12            munsell_0.5.1         
## [52] fansi_1.0.6            lifecycle_1.0.4        stringi_1.8.3         
## [55] yaml_2.3.8             zlibbioc_1.44.0        BiocFileCache_2.6.1   
## [58] grid_4.2.2             blob_1.2.4             parallel_4.2.2        
## [61] crayon_1.5.2           lattice_0.20-45        Biostrings_2.66.0     
## [64] annotate_1.76.0        hms_1.1.3              KEGGREST_1.38.0       
## [67] locfit_1.5-9.9         pillar_1.9.0           geneplotter_1.76.0    
## [70] codetools_0.2-18       XML_3.99-0.16.1        glue_1.7.0            
## [73] evaluate_0.23          png_0.1-8              vctrs_0.6.5           
## [76] tzdb_0.4.0             gtable_0.3.4           cachem_1.0.8          
## [79] xfun_0.43              xtable_1.8-4           ragg_1.3.0            
## [82] AnnotationDbi_1.60.2   memoise_2.0.1          timechange_0.3.0

Part II: Serovar Specific DESeq: Salmonella Heidelberg

Hannah Blackwell and Chelsey Patch

2024-04-03

Introduction, Part I

Workflow

All Heidelberg Isolates (1, 4 and 5)

Load the Required Libraries

Input Counts

Running DESeq2

All Heidelberg

Pre-Filtering

Running the Differential Expression Analysis

All Heidelberg

Merging the results above with the normalized counts data

Data transformation and visualization

All Heidelberg

Principal components analysis (PCA)

Pairwise Comparison Between Heidelberg Isolates

Pairwise Comparison 1: Iso 1 and Iso 4

Import and Format Data

DE Seq

Pairwise 1: Iso 1 and Iso 4

Pre-Filtering

Running the Differential Expression Analysis

Pairwise 1: Iso 1 and Iso 4

Extracting a Table to Show the Results

Merging the results above with the normalized counts data

Data transformation and visualization

Pairwise 1: Iso 1 and Iso 4

Principal components analysis (PCA)

Correlation Heatmap

Pairwise 1 : Iso 1 and Iso 4

Volcano Plot

Pairwise 1 : Iso 1 and Iso 4

Enhanced Volcano Plot

Pairwise 1: Iso 1 v Iso 4

Heat Map

Pairwise 1 : Iso 1 v Iso 4

Setting Up

Plotting

Boxplots

Pairwise 1 : Iso 1 v Iso 4

Data Input

Data Manipulation

Creating Boxplots for Genes of interest

Pairwise Comparison 2: Iso 1 and Iso 5

Import and Format Data

DE Seq

Pairwise 2: Iso 1 v Iso 5

Pre-Filtering

Running the Differential Expression Analysis

Extracting a Table to Show the Results

Merging the results above with the normalized counts data

Data transformation and visualization

Principal components analysis (PCA)

Pairwise 2 : Iso 1 and Iso 5

Correlation Heatmap

Pairwise 2 : Iso 1 and Iso 5

Volcano Plot

Pairwise 2 : Iso 1 and Iso 5

Enhanced Volcano Plot

Pairwise 2 : Iso 1 and Iso 5

Heat Map

Pairwise 2 : Iso 1 and Iso 5

Setting Up

Plotting

Boxplots

Pairwise 2 : Iso 1 vs Iso 5

Data Input

Data Manipulation

Creating Boxplots for Genes of interest

Pairwise Comparison 1: Iso 4 and Iso 5

Filter Data

DE Seq

Pairwise 3 : Iso 4 vs Iso 5

Pre-Filtering

Running the Differential Expression Analysis

Extracting a Table to Show the Results

Merging the results above with the normalized counts data

Data transformation and visualization