MMG5320 Final Project - Analyzing NGS Dataset Using Salmon and STAR

About the Dataset

Thyroid hormones and their receptors are important to regulating cell differentiation and metabolism in various tissues, with the receptor mainly acting as transcription factors to control different gene network in the presence of the thyroid hormone T3 (Triiodothyronine). With new evidence emerging that suggests a more complex model of transcriptional regulation of thyroid hormone receptors, Gillis et. al aim to further explore this complex model through various Next Generation Sequencing approach, carried out on the immortalized normal human thyroid epithelial cell line (Nthy-ORI). Gene expression analysis was done following RNA-sequencing, to compare the effect of T3 against vehicle control on Nthy-ORI cells.

The Nthy-ORI cells were cultured in RPMI-1640 media supplemented with growth factors and hormones, then hormone-deprived using phenol-red free and charcoal-stripped fetal bovine serum RPMI-1640 media for 24 h, and then treated with 10 nM T3 or 1 N NaOH vehicle for 6 or 24 h. Total RNA was extracted and purifed using RNeasy Plus Kit, for a total of two biological replicates per condition. Stranded Illumina cDNA libraries were prepared using the KAPA Stranded RNA-Seq library preparation kit, and sequenced on the Illumina HiSeq 1500 platform.

Citation: Gillis, N. E., Boyd, J. R., Tomczak, J. A., Frietze, S., & Carr, F. (2022). Thyroid hormone dependent transcriptional programming by TR² requires SWI/SNF chromatin remodelers. Nucleic Acids Research, 50(3), 13821395. https://doi.org/10.1093/nar/gkab1287

NCBI Gene Expression Omnibus accession code: GSE168954

Install and load the required libraries

#tximport is used to import the quantification data obtained from salmon output (quant.sf).
#ensembldb and EnsDb.Hsapiens.v86 is used to create tx2gene to import salmon output for DESeq2. 
library("tximport")
library("ensembldb")
library("EnsDb.Hsapiens.v86")
library("readr")
library("DESeq2")
library("dplyr")
library("ggplot2")
library("patchwork")

#volcano
library("EnhancedVolcano")
library("dplyr")
library("tidyr")
library("ggplot2")
library("genekitr")

#heatmap
library("pheatmap")
library("RColorBrewer")
library("ggplot2")
library("cowplot")
library("tidyr")
library("dplyr")
library("patchwork")
library("ggplotify")

Set working directory to folder containing annotation and count files.

setwd("~/scratch/bioinformatics/noelle")
directory <- "/gpfs2/scratch/ihastomo/bioinformatics/noelle"

Salmon: Load annotation files, set factors, and create file path containing quant.sf files from Salmon output

sampleTable <- data.frame(read.csv("NORI_annotation_salmon.csv", header = TRUE))

#set variables as factor
sampleTable$time <- factor(sampleTable$time)
sampleTable$treatment <- factor(sampleTable$treatment)
sampleTable$replicate <- factor(sampleTable$replicate)

rownames(sampleTable) <- sampleTable$samplename

files <- file.path(directory, sampleTable$filename, "quant.sf")
names(files) <- sampleTable$samplename

Salmon: Using tximport to load transcript abundance quantification

edb <- EnsDb.Hsapiens.v86 #load annotation database generated from Ensembl
Tx <- transcripts(edb, return.type = "DataFrame") #extract transcript info from database
tx2gene <- Tx[, c(1,7)] #extract transcript id and gene id


txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreTxVersion = TRUE) 
#ignoreTxVersion to remove version info to easily match transcript id with gene id.
tail(txi.salmon$counts)

##                 nori_24HR_T3_rep1 nori_24HR_T3_rep2 nori_24HR_vehicle_rep1
## ENSG00000283694             0.000             0.000                  0.000
## ENSG00000283695             0.000             0.000                  0.000
## ENSG00000283696             0.000             0.000                  1.000
## ENSG00000283697             0.000             0.000                  0.000
## ENSG00000283698            14.259             9.966                 15.129
## ENSG00000283699             0.000             0.000                  0.000
##                 nori_24HR_vehicle_rep2 nori_6HR_T3_rep1 nori_6HR_T3_rep2
## ENSG00000283694                  0.000            0.000            0.000
## ENSG00000283695                  0.000            0.000            0.000
## ENSG00000283696                  1.000            2.000            4.197
## ENSG00000283697                  0.000            0.000            0.000
## ENSG00000283698                  3.252            2.332            3.479
## ENSG00000283699                  0.000            0.000            0.000
##                 nori_6HR_vehicle_rep1 nori_6HR_vehicle_rep2
## ENSG00000283694                 0.000                 0.000
## ENSG00000283695                 0.000                 0.000
## ENSG00000283696                 1.000                 2.000
## ENSG00000283697                 0.000                 0.000
## ENSG00000283698                 8.263                11.553
## ENSG00000283699                 0.000                 0.000

Salmon: DESeq2 Analysis

salmon_dds <- DESeqDataSetFromTximport(txi.salmon,
                                       colData = sampleTable,
                                       design = ~treatment + time)

keep <- rowSums(counts(salmon_dds)) >= 10 
salmon_dds <- salmon_dds[keep,] #remove low counts
salmon_dds$treatment <- relevel(salmon_dds$treatment, ref = "vehicle")
salmon_dds <- DESeq(salmon_dds)
salmon_res_vehiclevsT3 <- results(salmon_dds, contrast = c('treatment', 'T3', 'vehicle'))
head(salmon_res_vehiclevsT3)

## log2 fold change (MLE): treatment T3 vs vehicle 
## Wald test p-value: treatment T3 vs vehicle 
## DataFrame with 6 rows and 6 columns
##                   baseMean log2FoldChange     lfcSE      stat    pvalue
##                  <numeric>      <numeric> <numeric> <numeric> <numeric>
## ENSG00000000003 247.383473      0.1647337  0.226486  0.727347  0.467013
## ENSG00000000419 426.846321      0.0291147  0.219182  0.132834  0.894325
## ENSG00000000457  89.361790      0.4255695  0.282939  1.504102  0.132555
## ENSG00000000460 227.464499     -0.2121521  0.239450 -0.885996  0.375619
## ENSG00000000938   0.918696      2.2082718  3.118336  0.708157  0.478848
## ENSG00000000971   2.088495     -0.7828573  1.488357 -0.525988  0.598897
##                      padj
##                 <numeric>
## ENSG00000000003  0.950111
## ENSG00000000419  0.996035
## ENSG00000000457  0.803926
## ENSG00000000460  0.924073
## ENSG00000000938        NA
## ENSG00000000971        NA

salmon_padj_res_vehiclevsT3 <- salmon_res_vehiclevsT3[order(salmon_res_vehiclevsT3$padj), ] #order based on padj value

salmon_resdata_vehiclevsT3 <- merge(as.data.frame(salmon_padj_res_vehiclevsT3), 
                             as.data.frame(counts(salmon_dds, normalized=TRUE)), 
                             by="row.names", 
                             sort=FALSE)
names(salmon_resdata_vehiclevsT3)[1] <- "Gene_id"

write.csv(salmon_resdata_vehiclevsT3, 
          file="salmon_nori_vehiclevsT3_unfiltered_matrix.csv",
          row.names = FALSE)

salmon_resdata_vehiclevsT3 <- as.data.frame(salmon_padj_res_vehiclevsT3) %>% 
  dplyr::filter(abs(log2FoldChange) > 0.5 & padj < 0.05) %>% #cutoff value was chosen based on the NGS dataset paper
  tibble::rownames_to_column("Gene_id")

write.csv(salmon_resdata_vehiclevsT3, 
          "salmon_nori_vehiclevsT3_log2fc1_fdr05_DEGlist.csv", 
          row.names = FALSE)

STAR: Load annotation files, set factors, and load count file containing gene id and counts of all samples

coldata <- read.csv("NORI_annotation_STAR.csv", header = TRUE)
coldata <- data.frame(coldata)

# Set the variables as factor
coldata$time <- factor(coldata$time)
coldata$treatment <- factor(coldata$treatment)
coldata$replicate <- factor(coldata$replicate)

count_matrix <- as.matrix(read.csv("all_counts.txt",
                                     header=TRUE,
                                     sep=""))

STAR: DESeq2 Analysis

STAR_dds <- DESeqDataSetFromMatrix(countData = count_matrix,
                              colData = coldata,
                              design = ~treatment + time)

keep <- rowSums(counts(STAR_dds)) >= 10 #remove low counts
STAR_dds <- STAR_dds[keep,]
STAR_dds$treatment <- relevel(STAR_dds$treatment, ref = "vehicle")
STAR_dds <- DESeq(STAR_dds)
STAR_res_vehiclevsT3 <- results(STAR_dds, contrast = c('treatment', 'T3', 'vehicle'))
head(STAR_res_vehiclevsT3)

## log2 fold change (MLE): treatment T3 vs vehicle 
## Wald test p-value: treatment T3 vs vehicle 
## DataFrame with 6 rows and 6 columns
##                     baseMean log2FoldChange     lfcSE      stat    pvalue
##                    <numeric>      <numeric> <numeric> <numeric> <numeric>
## ENSG00000241860.7    1.86719       0.275374  1.574065  0.174945  0.861123
## ENSG00000279457.4    5.84251      -1.078011  0.938354 -1.148833  0.250625
## ENSG00000290385.1    2.85677       0.621789  1.132516  0.549033  0.582983
## ENSG00000230021.10   4.05194      -1.108843  1.143393 -0.969783  0.332155
## ENSG00000225972.1    2.76696       2.159290  1.355773  1.592664  0.111236
## ENSG00000225630.1  494.60968       0.479414  0.932520  0.514106  0.607178
##                         padj
##                    <numeric>
## ENSG00000241860.7         NA
## ENSG00000279457.4         NA
## ENSG00000290385.1         NA
## ENSG00000230021.10        NA
## ENSG00000225972.1         NA
## ENSG00000225630.1   0.808182

STAR_padj_res_vehiclevsT3 <- STAR_res_vehiclevsT3[order(STAR_res_vehiclevsT3$padj), ] #order based on padj value

STAR_resdata_vehiclevsT3 <- merge(as.data.frame(STAR_padj_res_vehiclevsT3), 
                             as.data.frame(counts(STAR_dds, normalized=TRUE)), 
                             by="row.names", 
                             sort=FALSE)

names(STAR_resdata_vehiclevsT3)[1] <- "Gene_id"

write.csv(STAR_resdata_vehiclevsT3, 
          file="STAR_nori_vehiclevsT3_unfiltered_matrix.csv",
          row.names = FALSE)

STAR_resdata_vehiclevsT3 <- as.data.frame(STAR_padj_res_vehiclevsT3) %>%  
  dplyr::filter(abs(log2FoldChange) > 0.5 & padj < 0.05) %>%  #cutoff value was chosen based on the NGS dataset paper
  tibble::rownames_to_column("Gene_id")

write.csv(STAR_resdata_vehiclevsT3, 
          "STAR_nori_vehiclevsT3_log2fc1_fdr05_DEGlist.csv", 
          row.names = FALSE)

Principle Components Analysis (PCA) Plot

#Salmon PCA Plot
salmon_vstcounts <- vst(salmon_dds, blind = FALSE)
salmon_PCA <- plotPCA(salmon_vstcounts, intgroup=c("treatment", "time"), returnData=TRUE)
percentVar <- round(100 * attr(salmon_PCA, "percentVar"))
salmon_PCA_plot <- ggplot(data = salmon_PCA, aes(PC1, PC2, color=treatment, shape=time)) +
  geom_point(size=3) +
  xlim(-25,25) +
  ylim(-25,25) + 
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed() +
  theme_classic() + 
        ggtitle("Salmon PCA Plot") +
        theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(1.25)))

#STAR PCA Plot
STAR_vstcounts <- vst(STAR_dds, blind = FALSE)
STAR_PCA <- plotPCA(STAR_vstcounts, intgroup=c("treatment", "time"), returnData=TRUE)
percentVar <- round(100 * attr(STAR_PCA, "percentVar"))
STAR_PCA_plot <- ggplot(data = STAR_PCA, aes(PC1, PC2, color=treatment, shape=time)) +
  geom_point(size=3) +
  xlim(-25,25) +
  ylim(-25,25) + 
  xlab(paste0("PC1: ",percentVar[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar[2],"% variance")) + 
  coord_fixed() +
  theme_classic() + 
        ggtitle("STAR PCA Plot") +
        theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(1.25)))

#save PCA plots as png files
ggsave(filename = "salmon_PCA_plot.png",
       plot = salmon_PCA_plot,
       units = "in", 
       width = 5, 
       height = 6, 
       dpi = 300)

ggsave(filename = "STAR_PCA_plot.png",
       plot = STAR_PCA_plot,
       units = "in", 
       width = 5, 
       height = 6, 
       dpi = 300)  

salmon_PCA_plot + STAR_PCA_plot

Differential analysis from DESeq2 uses raw counts. PCA extracts the most important factors and allows the data to be more easily examined, using variance stabilizing transformation (vst). The plot visualized the overall effect of treatment and time, where the PC1 accounts for the largest variation, and PC2 accounts for the second largest variation. Ideally samples are clustered based on their conditions. In this case, Salmon and STAR produced a similar pattern (simply inverted), where the largest variance is time (left side are 6 HR), and the second largest variance is the treatment (presence or absence of T3). All sample replicates are clustered based on their condition, except for one of the 24 HR vehicle sample. These plots also suggest that there is a larger variation due to time when using STAR, and a larger variation due to treatment when using Salmon.

Salmon: Generate differentially expressed gene files containing additional column of gene symbol and differentially expressed status

salmon_resdata_vehiclevsT3 <- read.csv("salmon_nori_vehiclevsT3_unfiltered_matrix.csv",
                                       header = TRUE)
salmon_resdata_vehiclevsT3$threshold <- salmon_resdata_vehiclevsT3$padj < 0.05


salmon_resdata_vehiclevsT3$Gene_id <- sub("\\.\\d+", "", salmon_resdata_vehiclevsT3$Gene_id)
ids <- salmon_resdata_vehiclevsT3$Gene_id
gene_symbol_map <- transId(id = ids, transTo = "symbol")
names(salmon_resdata_vehiclevsT3)[1] = "input_id"
salmon_resdata_gene <- merge(salmon_resdata_vehiclevsT3, gene_symbol_map, by.x = "input_id", by.y = "input_id")

#Add a column to data frame to specify up or down regulated
salmon_resdata_gene$diffexpressed <- "Unchanged"

#if log2FC > 0.5 and padj < 0.05 set as upregulated
salmon_resdata_gene$diffexpressed[salmon_resdata_gene$padj < 0.05 & salmon_resdata_gene$log2FoldChange > 0.5] <- "Upregulated"

#if log2FC < -0.5 and padj < 0.05 set as downregulated
salmon_resdata_gene$diffexpressed[salmon_resdata_gene$padj < 0.05 & salmon_resdata_gene$log2FoldChange < -0.5] <- "Downregulated"

salmon_resdata_sig <- as.data.frame(salmon_resdata_gene) %>% 
  dplyr::filter(abs(log2FoldChange) > 0.5 & padj < 0.05)

write.csv(salmon_resdata_sig, file = "nori_salmon_vehiclevsT3_DEG.csv", row.names = FALSE)

salmon_upregulated_count <- nrow(salmon_resdata_sig[salmon_resdata_sig$diffexpressed == "Upregulated", ])

salmon_downregulated_count <- nrow(salmon_resdata_sig[salmon_resdata_sig$diffexpressed == "Downregulated", ])

salmon_upregulated_count

## [1] 15

salmon_downregulated_count

## [1] 250

STAR: Generate differentially expressed gene files containing additional column of gene symbol and differentially expressed status

STAR_resdata_vehiclevsT3 <- read.csv("STAR_nori_vehiclevsT3_unfiltered_matrix.csv",
                                     header = TRUE)

STAR_resdata_vehiclevsT3$threshold <- STAR_resdata_vehiclevsT3$padj < 0.05

STAR_resdata_vehiclevsT3$Gene_id <- sub("\\.\\d+", "", STAR_resdata_vehiclevsT3$Gene_id)
ids <- STAR_resdata_vehiclevsT3$Gene_id
gene_symbol_map <- transId(id = ids, transTo = "symbol")
names(STAR_resdata_vehiclevsT3)[1] = "input_id"
STAR_resdata_gene <- merge(STAR_resdata_vehiclevsT3, gene_symbol_map, by.x = "input_id", by.y = "input_id")

#Add a column to dataframe to specify up or down regulated
STAR_resdata_gene$diffexpressed <- "Unchanged"

#if log2FC > 0.5 and padj < 0.05 set as upregulated
STAR_resdata_gene$diffexpressed[STAR_resdata_gene$padj < 0.05 & STAR_resdata_gene$log2FoldChange > 0.5] <- "Upregulated"

#if log2FC < -0.5 and padj < 0.05 set as downregulated
STAR_resdata_gene$diffexpressed[STAR_resdata_gene$padj < 0.05 & STAR_resdata_gene$log2FoldChange < -0.5] <- "Downregulated"

STAR_resdata_sig <- as.data.frame(STAR_resdata_gene) %>% 
  dplyr::filter(abs(log2FoldChange) > 0.5 & padj < 0.05)

write.csv(STAR_resdata_sig, file = "nori_STAR_vehiclevsT3_DEG.csv", row.names = FALSE)


STAR_upregulated_count <- nrow(STAR_resdata_sig[STAR_resdata_sig$diffexpressed == "Upregulated", ])

STAR_downregulated_count <- nrow(STAR_resdata_sig[STAR_resdata_sig$diffexpressed == "Downregulated", ])

STAR_upregulated_count

## [1] 57

STAR_downregulated_count

## [1] 104

Volcano Plot

#Salmon
salmon_volcano <- 
        ggplot(salmon_resdata_gene) +
        geom_point(aes(x=log2FoldChange, 
                       y=-log10(padj), colour=diffexpressed)) + 
        theme_classic() + 
        ggtitle("Salmon Vehicle versus T3") +
        xlab("Log2 fold change") + 
        ylab("-Log10(padj)") +
        theme(legend.position = "right",
              plot.title = element_text(size = rel(1.25), hjust = 0.5), 
              axis.title = element_text(size = rel(1.25))) +
        scale_color_manual(values = c("blue", "black", "red"),
                           labels = c("Downregulated", "Not Significant", "Upregulated")) +
        scale_x_continuous(limits = c(-20, 20)) + 
        coord_cartesian(ylim=c(0, 10), clip = "off") 

  # Add annotation
  salmon_volcano <- salmon_volcano + 
  annotate("text", x = 15, y = 10, label = "15", colour = "red", size = 4, hjust = 0) +
  annotate("text", x = -10, y = 10, label = "250", colour = "blue", size = 4, hjust = 1) 
  
  
  
STAR_volcano <- 
  ggplot(STAR_resdata_gene) +
  geom_point(aes(x=log2FoldChange, 
                 y=-log10(padj), colour=diffexpressed)) + 
  theme_classic() + 
  ggtitle("STAR Vehicle vs T3") +
  xlab("Log2 fold change") + 
  ylab("-Log10(padj)") +
  theme(legend.position = "right",
        plot.title = element_text(size = rel(1.25), hjust = 0.5), 
        axis.title = element_text(size = rel(1.25))) +
  scale_color_manual(values = c("blue", "black", "red"),
                     labels = c("Downregulated", "Not Significant", "Upregulated")) +
  scale_x_continuous(limits = c(-20, 20)) + 
  coord_cartesian(ylim=c(0, 10), clip = "off") 

  # Add annotation
  STAR_volcano <- STAR_volcano + 
  annotate("text", x = 15, y = 10, label = "57", colour = "red", size = 4, hjust = 0) +
  annotate("text", x = -10, y = 10, label = "104", colour = "blue", size = 4, hjust = 1) 
  
  #save as PNG files
  ggsave(filename = "salmon_volcano.png",
       plot = salmon_volcano,
       units = "in", 
       width = 5, 
       height = 6, 
       dpi = 300)  
  
  ggsave(filename = "STAR_volcano.png",
       plot = STAR_volcano,
       units = "in", 
       width = 5, 
       height = 6, 
       dpi = 300)  
  
  salmon_volcano / STAR_volcano + plot_layout(nrow = 2,
                                              widths = c(4, 4),
                                              heights = c(5, 5))

Salmon quantifies transcript from RNA-seq data, by mapping it against transcriptome index, while STAR counts the number of reads per gene while mapping to the whole genome. Volcano plot visualizes every gene and plots their -log10(padj) against the log2(FoldChange). Genes that have a padj of < 0.05 and a log2(FoldChange) > 0.5 or < -0.5 are labeled as upregulated or downregulated respectively. All the other genes are considered not significant. This highlights the effect of the treatment on gene regulation. The volcano plots differ in that Salmon has a higher downregulated genes and lower upregulated genes compared to STAR.

Salmon: Generate matrix for heatmap

salmon_all_matrix <- data.frame(read.csv("nori_salmon_vehiclevsT3_DEG.csv", header = TRUE))

salmon_hm_matrix <- data.frame(dplyr::select(salmon_all_matrix, symbol, 8:15), row.names = 1)
salmon_hm_matrix <- salmon_hm_matrix[-c(53:264),] #remove ribosomal genes
                  
salmon_heatmap_meta <- sampleTable
rownames(salmon_heatmap_meta) <- salmon_heatmap_meta$samplename
salmon_heatmap_meta <-  dplyr::select(salmon_heatmap_meta, treatment, time)
salmon_heatmap_meta$treatment <- factor(salmon_heatmap_meta$treatment, levels = c("T3", "vehicle"))
salmon_heatmap_meta$time <- factor(salmon_heatmap_meta$time, levels = c("6HR", "24HR"))
salmon_heatmap_colors <- rev(brewer.pal(10, "RdBu"))
salmon_ann_colors <- list(treatment = c("T3" = "#D11313", "vehicle" = "#6DB0F8"),
                          time = c("6HR" = "#fff2cc", "24HR" = "#93c47d"))

STAR: Generate matrix for heatmap

STAR_all_matrix <- data.frame(read.csv("nori_STAR_vehiclevsT3_DEG.csv", header = TRUE))
STAR_all_matrix <- STAR_all_matrix[-c(150,155), ] #remove duplicate row names

STAR_hm_matrix <- data.frame(dplyr::select(STAR_all_matrix, symbol, 8:15), row.names = 1)

STAR_heatmap_meta <- coldata
rownames(STAR_heatmap_meta) <- STAR_heatmap_meta$samplename
STAR_heatmap_meta <-  dplyr::select(STAR_heatmap_meta, treatment, time)
STAR_heatmap_meta$treatment <- factor(STAR_heatmap_meta$treatment, levels = c("T3", "vehicle"))
STAR_heatmap_meta$time <- factor(STAR_heatmap_meta$time, levels = c("6HR", "24HR"))
STAR_heatmap_colors <- rev(brewer.pal(10, "RdBu"))
STAR_ann_colors <- list(treatment = c("T3" = "#D11313", "vehicle" = "#6DB0F8"),
                          time = c("6HR" = "#fff2cc", "24HR" = "#93c47d"))

Heatmap

#salmon
salmon_heatmap <- as.ggplot(pheatmap(salmon_hm_matrix, 
         color = salmon_heatmap_colors,
         annotation_col = salmon_heatmap_meta,
         annotation_colors = salmon_ann_colors,
         show_colnames = F, 
         show_rownames = F,
         scale = "row",
         annotation_names_col = F,
         main = "Salmon Heatmap"))

ggsave(filename = "salmon_heatmap.png",
       plot = salmon_heatmap,
       units = "in", 
       width = 5, 
       height = 6, 
       dpi = 300)  

#STAR
STAR_heatmap <- as.ggplot(pheatmap(STAR_hm_matrix, 
         color = STAR_heatmap_colors,
         annotation_col = STAR_heatmap_meta,
         annotation_colors = STAR_ann_colors,
         show_colnames = F, 
         show_rownames = F,
         scale = "row",
         annotation_names_col = F,
         main = "STAR Heatmap"))

ggsave(filename = "STAR_heatmap.png",
       plot = STAR_heatmap,
       units = "in", 
       width = 5, 
       height = 6, 
       dpi = 300)  

salmon_heatmap + STAR_heatmap

Heatmap plots the treatment and time as column, and every genes as row. They are plotted according to their Z-scores, by substracting the mean from each gene, and dividing them by the standard deviation. Heatmap visualizes gene expression clustering based on their conditions, ideally with little differences between replicates. Based on the Salmon heatmap, there is a clear difference in gene expression profile between treatments (T3 and vehicle), while the STAR heatmap suggests a difference in profile between the two time points and treatments only at 24 HR.

Conclusion

Although both PCA plots indicate similarities in variation from the different sample conditions, subsequent downstream analysis showed that there are differences in the analysis result depending on whether salmon or STAR was used. These differences would likely impact lowly expressed genes the most, resulting in different numbers of differentially expressed genes, both upregulated and downregulated. These differences arise from the nature of each tools, with STAR being an aligner that aligns to the whole genome, while Salmon is more of a pseudo-aligner that maps to the transcriptome. Both of them have their specific purposes and output, along with their own advantages and disadvantages, such as Salmon generally working faster than STAR.