Functional analysis

Getting Started

Install required packages

#BiocManager::install("org.Mm.eg.db") #update all/some/none? type n

#install.packages("ggupset")
#remotes::install_github("YuLab-SMU/clusterProfiler") 
#remotes::install_github("eliocamp/ggnewscale")
#install.packages("remotes)

Load required packages

library(clusterProfiler) 
library(org.Mm.eg.db) 
library(DOSE) 
library(ggnewscale) 
library(enrichplot) 
library(knitr)
library(dplyr)
library(ggrepel)
library(ggplot2)
library(cowplot)

#library(pathview) #if this library is not installed, its okay install later, takes a few minutes 

#library(GO.db) #takes a while to install as well

Functional analysis

The goal of functional analysis is to provide biological insight, so it’s important to interpret our results in the context of our experimental hypothesis: Genetically diverse, wild-derived mice (PWD) are expected to have an enhanced ability to suppress inflammation-related cell death compared to standard laboratory mice (B6). Therefore, we anticipate a downregulation of processes or pathways related to cell death.

GO Ontologies

The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the function of genes. GO terms are organized into three independent controlled vocabularies (ontologies) in a species-independent manner:

Biological process: refers to the biological role involving the gene or gene product, and could include “transcription”, “signal transduction”, and “apoptosis”. A biological process generally involves a chemical or physical change of the starting material or input.
Molecular function: represents the biochemical activity of the gene product, such activities could include “ligand”, “GTPase”, and “transporter”.
Cellular component: refers to the location in the cell of the gene product. Cellular components could include “nucleus”, “lysosome”, and “plasma membrane”.

Each GO term has a term name (e.g. DNA repair) and a unique term accession number (GO:0005125), and a single gene product can be associated with many GO terms.

clusterProfiler

We will be using clusterProfiler to perform over-representation analysis on GO terms associated with our list of significant genes.

Over-representation (or enrichment) analysis is a statistical method that determines whether genes from pre-defined sets (ex: those beloging to a specific GO term or KEGG pathway) are present more than would be expected (over-represented) in a subset of your data. In this case, the subset is your set of under or over expressed genes.

Clusterprofiler will take as input a significant gene list and a background gene list and performs statistical enrichment analysis using hypergeometric testing. The basic arguments allow the user to select the appropriate organism and GO ontology (BP, CC, MF) to test.

Prepping the data

PWDvB6_LPS_data <- read.csv("PWD-LPSvsB6-LPS_normalized_matrix.csv", header = T)

head(PWDvB6_LPS_data)

##           Ensembl_id               Gene_id     baseMean log2FoldChange
## 1 ENSMUSG00000000001  ENSMUSG00000000001.4  5932.726707    -0.07093008
## 2 ENSMUSG00000000028 ENSMUSG00000000028.15   145.143927     0.46312330
## 3 ENSMUSG00000000037 ENSMUSG00000000037.17     8.268551     2.27033900
## 4 ENSMUSG00000000056  ENSMUSG00000000056.7   234.738253     0.15596467
## 5 ENSMUSG00000000058  ENSMUSG00000000058.6  1506.777887    -1.26487297
## 6 ENSMUSG00000000078  ENSMUSG00000000078.7 10025.556928    -0.92963543
##        lfcSE       pvalue         padj Unstim_B6_Rep1 Unstim_B6_Rep2
## 1 0.04256565 9.350730e-02 1.411407e-01    5016.280801    5044.716516
## 2 0.13131617 1.732092e-04 4.163415e-04     120.118612     109.345435
## 3 0.66736121 3.171750e-05 8.285304e-05       2.125993       2.779969
## 4 0.10241008 1.139347e-01 1.675967e-01     263.623149     263.170369
## 5 0.08496567 6.480031e-51 9.196782e-50    1119.335385    1237.086067
## 6 0.04415007 4.385270e-99 1.532094e-97    8222.278448    8505.777536
##   Unstim_B6_Rep3 Unstim_B6_Rep4 Unstim_c11.2_rep1 Unstim_c11.2_rep2
## 1    5146.778634    5307.094517       4824.157778       4932.135450
## 2     103.243492     112.560218        224.698549        231.622427
## 3       3.622579       3.991497          0.857628          3.047664
## 4     220.977298     234.700028        346.481732        330.163547
## 5    1314.996054    1363.495402       1594.330544       1659.960726
## 6    8847.242738    8900.240188       7385.892762       8012.307373
##   Unstim_c11.2_rep3 Unstim_c11.2_rep4 Unstim_PWD_rep1 Unstim_PWD_rep2
## 1       4922.688289       4920.896484      4885.88407      4963.96770
## 2        253.951792        237.205156       181.19728       178.66333
## 3          4.518715          5.330453        20.93835        19.75173
## 4        338.903637        357.140348       223.07399       242.40754
## 5       1887.919191       1739.504480       641.03572       670.66086
## 6       7999.029565       7403.110742      6473.17227      6218.20225
##   Unstim_PWD_rep3 Unstim_PWD_rep4  LPS_B6_rep1  LPS_B6_rep2  LPS_B6_rep3
## 1      5007.92324      5281.35992  7681.263762  6798.135779  7887.455511
## 2       172.22123       165.22261    82.947924    89.264585    75.051192
## 3        26.17087        22.09372     1.430137     2.550417     4.248181
## 4       207.67854       221.89781   183.057487   163.226670   131.693602
## 5       526.79435       591.72748  2092.289868  2113.020256  2315.258485
## 6      5956.82845      6003.72853 17274.620178 15964.333486 17801.293219
##    LPS_B6_rep4 LPS_c11.2_Rep1 LPS_c11.2_Rep2 LPS_c11.2_Rep3 LPS_c11.2_Rep4
## 1  7722.190804    5981.528448    5983.173773     5713.96041    5733.518874
## 2    72.382972     142.475332     168.640115      128.64860     184.431649
## 3     1.340425       1.217738       3.550318        4.36097       8.068885
## 4   151.468072     259.378168     240.534059      250.75575     293.937940
## 5  2265.318948    2630.313813    2248.239010     2286.23831    2237.386436
## 6 16992.572959   11585.558155   12460.729358    11675.40584   11282.606099
##   LPS_PWD_Rep1 LPS_PWD_Rep2 LPS_PWD_Rep3 LPS_PWD_Rep4 diffexpressed symbol
## 1   6980.49968   6936.64081  7636.313135   7076.87660     Unchanged  Gnai3
## 2    106.49320    112.88533   112.251105    117.93209     Unchanged  Cdc45
## 3     14.28567     15.33011     9.714038     17.11917   Upregulated  Scml2
## 4    170.12939    166.77297   173.773345    198.77264     Unchanged   Narf
## 5    988.30888   1008.53517   814.899847    816.01399 Downregulated   Cav2
## 6   8983.09140   9050.33728  9125.798948   8489.20851     Unchanged   Klf6
##   entrezid
## 1    14679
## 2    12544
## 3   107815
## 4    67608
## 5    12390
## 6    23849

PWDvB6_LPS_background <- select(PWDvB6_LPS_data, 
                                Ensembl_id,
                                log2FoldChange,
                                padj,
                                symbol,
                                entrezid)

Create Gene lists

To perform the over-representation analysis, we need a list of background genes and a list of significant genes. For our background dataset we will use all genes and log2FC from the PWDvB6_LPS_background results table.

For the significant gene list we will use only genes with p-adjusted values less than 0.05 and absolute log2FC of 1.

Background gene set

# we want the log2 fold change
original_gene_list <- PWDvB6_LPS_background$log2FoldChange
head(original_gene_list)

## [1] -0.07093008  0.46312330  2.27033900  0.15596467 -1.26487297 -0.92963543

# name the vector
names(original_gene_list) <- PWDvB6_LPS_background$Ensembl_id
head(original_gene_list)

## ENSMUSG00000000001 ENSMUSG00000000028 ENSMUSG00000000037 ENSMUSG00000000056 
##        -0.07093008         0.46312330         2.27033900         0.15596467 
## ENSMUSG00000000058 ENSMUSG00000000078 
##        -1.26487297        -0.92963543

#original_gene_list

# omit any NA values
background_gene_list<-na.omit(original_gene_list)

# sort the list in decreasing order (required for clusterProfiler)
background_gene_list = sort(background_gene_list, decreasing = TRUE)

head(background_gene_list)

## ENSMUSG00000096449 ENSMUSG00000082062 ENSMUSG00000082179 ENSMUSG00000101431 
##          11.118480          10.172383           9.956474           9.854910 
## ENSMUSG00000091383 ENSMUSG00000006586 
##           9.650647           9.615596

Significant gene set

# Extract significant results (padj < 0.05) using subset()
sig_genes_df = subset(PWDvB6_LPS_background, padj < 0.05)

# From significant results, we want to filter on log2fold change
sig_genes <- sig_genes_df$log2FoldChange

# Name the vector 
names(sig_genes) <- sig_genes_df$Ensembl_id

# omit NA values
sig_genes <- na.omit(sig_genes)

# filter on min log2fold change (log2FoldChange > 1)
sig_genes <- names(sig_genes)[abs(sig_genes) > 1]

head(sig_genes)

## [1] "ENSMUSG00000000037" "ENSMUSG00000000058" "ENSMUSG00000000127"
## [4] "ENSMUSG00000000142" "ENSMUSG00000000149" "ENSMUSG00000000157"

Function: `enrichGO()`

From the clusterProfiler package, enrichGO() performs Gene Ontology (GO) enrichment analysis to identify overrepresented biological terms among a list of genes.

There are ontology options: [“BP”, “MF”, “CC”]. The ont argument can accept either “BP” (Biological Process), “MF” (Molecular Function), and “CC” (Cellular Component) subontologies, or “ALL” for all three.
There are keytypes options: This is the source of the annotation (gene ids). The options vary for each annotation. In the example of org.Mm.eg.db, the options are:

keytypes(org.Mm.eg.db)

##  [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
##  [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
## [11] "GENETYPE"     "GO"           "GOALL"        "IPI"          "MGI"         
## [16] "ONTOLOGY"     "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"        
## [21] "PROSITE"      "REFSEQ"       "SYMBOL"       "UNIPROT"

The different organisms with annotation databases available to use with for the OrgDb argument can be found below here: https://bioconductor.org/packages/release/BiocViews.html#___Organism

#go_enrich <- enrichGO(gene = sig genes,
                      #universe = background genes,
                      #OrgDb = organism Db, 
                      #keyType = 'ENSEMBL',
                      #readable = T,
                      #ont = "BP",
                      #pvalueCutoff = 0.05, 
                      #qvalueCutoff = 0.05)

Together, we will run this next part. It will take a few minutes to run.

What is it doing?

It tests whether GO Biological Process (BP) terms are significantly enriched in your significant genes (sig_genes) compared to a background of all tested genes (background_gene_list), using mouse ENSEMBL gene IDs.

#saveRDS(go_enrich, file = "enrichGO_PWD.rds")

#go_enrich <- readRDS("enrichGO_PWD.rds")

Output results:

go_enrich will be an object containing a table of significantly enriched GO terms, along with stats like:

p-value
q-value (adjusted p-value/FDR)
number of genes in each term
gene symbols

## Output results from GO analysis to a table
cluster_summary <- data.frame(go_enrich)

write.csv(cluster_summary, "clusterProfiler_PWDvB6_LPS.csv")

head(cluster_summary)

##                    ID                                   Description GeneRatio
## GO:0042742 GO:0042742                 defense response to bacterium  124/3962
## GO:0002683 GO:0002683  negative regulation of immune system process  218/3962
## GO:0001906 GO:0001906                                  cell killing  110/3962
## GO:0001910 GO:0001910 regulation of leukocyte mediated cytotoxicity   66/3962
## GO:0002250 GO:0002250                      adaptive immune response  188/3962
## GO:0002449 GO:0002449                  lymphocyte mediated immunity  145/3962
##              BgRatio RichFactor FoldEnrichment   zScore       pvalue
## GO:0042742 228/14310  0.5438596       1.964319 9.082170 1.143244e-17
## GO:0002683 489/14310  0.4458078       1.610174 8.495169 3.269225e-16
## GO:0001906 202/14310  0.5445545       1.966828 8.563002 7.042149e-16
## GO:0001910 101/14310  0.6534653       2.360194 8.488159 2.975103e-15
## GO:0002250 419/14310  0.4486874       1.620574 7.977518 1.800604e-14
## GO:0002449 305/14310  0.4754098       1.717091 7.832787 7.364507e-14
##                p.adjust       qvalue
## GO:0042742 6.665110e-14 5.391296e-14
## GO:0002683 9.529792e-13 7.708490e-13
## GO:0001906 1.368524e-12 1.106976e-12
## GO:0001910 4.336213e-12 3.507490e-12
## GO:0002250 2.099505e-11 1.698254e-11
## GO:0002449 6.360617e-11 5.144996e-11
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                geneID
## GO:0042742                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            Wfdc18/Lgals9/Oas1c/Acp5/Il7r/Ikbkg/Il12b/Aqp1/Il27ra/Fcgr1/H2-M3/Il10/Lyst/Tbk1/Havcr2/Nos2/Arg2/Naip1/Rnase10/Rnase6/Gsdmd/Adamts5/Slc15a2/Cd4/Tfeb/Trem2/Tslp/Lta/Peli3/Casp7/Il23a/Casp1/Mr1/Optn/Prg2/Rbck1/Pld1/Tlr2/Gbp3/Gbp2/Tnfsf8/Unc13b/Emilin1/Gbp9/Plac8/Cxcl5/P2rx7/Npy/Klrk1/Adm/Pycard/Cd209d/Oas3/Oas2/Casp4/Gbp8/Adgrb1/Ssc5d/Isg15/Colec12/Scd1/Serpine1/Mavs/Camp/Rbpj/Il18/Znfx1/Gbp7/Gbp2b/Aicda/Rps19/Ripk2/Irf8/Trem3/Il17f/Pglyrp4/Trem1/Tnfrsf14/Pla2g6/Tlr9/Serpinb9/Irgm1/Rnf31/Rnase2a/Ifnb1/Nlrp10/Hamp/Epx/Oas1a/Slamf8/Lgals4/Iigp1/Hamp2/Krt6a/Rpl30/B2m/H2-K1/Ear6/Muc5b/Oas1g/Jchain/H2-T23/Lyz1/Wfdc17/Nlrp1a/Irgm2/Rnf213/Naip5/Ang/Ear2/Ear1/Wfdc3/Ighg2b/Ighg3/Ighm/Igtp/Naip6/Naip2/Gbp4/Pglyrp2/Igha/Gpr15lg/Gbp6/Gbp5
## GO:0002683 Fer/Clec2g/Pparg/Sox9/Loxl3/Il4/Lgals9/Oas1c/Acp5/Nr1h3/Axl/Il7r/Stat5a/Il12b/Il27ra/Tnfrsf13b/Rnf170/Mertk/Dll1/Slamf1/Gnrh1/Sqstm1/H2-M3/Pdcd1lg2/Il10/Sdc4/Ada/Dhx58/Irf1/Alox15/Smpdl3a/Arg1/Igf1/Vsir/Fstl3/Havcr2/Sec14l1/Nme2/Aurkb/Arg2/Gpr137b/Irf4/Serpinb9b/Dlg5/Tsc22d1/Olfm4/Myc/Bcl6/Mefv/Apod/Cblb/Cd200r1/Cd86/Parp3/Tmem176a/Trem2/Pim1/Dusp1/Tmem178/H2-Oa/Tap2/Btnl2/Cd74/Gal/Gpam/Ccr1/Smad7/Col3a1/Il1rl1/Pdcd1/Cd55/Il2ra/Dab2ip/Nmi/Prg2/Mdk/Tyro3/Zbtb46/Samhd1/Ptpn22/Adar/Rnf115/Svep1/Ppt1/Padi2/Cptp/Tnfrsf4/Emilin1/Stap1/Pf4/Dtx1/Tmem176b/Npy/Usp18/Lag3/Klrk1/Cd69/Clec2d/Klrd1/Clec2i/Siglecg/Cd22/Trim30a/Tsc22d3/Sfrp1/Ido1/Slit2/Cx3cl1/Thy1/Ldlr/Oas3/Rnf125/Atg9a/Tmem131l/Parp14/Cebpa/Dpp4/Ccl12/Isg15/Twist1/Gramd4/Cd276/Ripor2/C1qc/Tap1/Ccn3/Mavs/Ifi206/Bank1/Inpp4b/Socs1/Tapbpl/Serpinb9f/Susd4/Lyplal1/Prdm16/Fgl2/Ptger4/Dtx4/Ifi203/Stat2/Thbs1/Dgkz/Appl1/Rps19/Nbl1/Pla2g5/Pik3r1/H2-Ob/Pglyrp4/Tnfrsf14/Ubash3a/Trafd1/Ifi209/Gimap5/Cdkn2a/Serpinb9/Bst2/Irgm1/Cd24a/Sfn/Gpr68/Fcrl5/Ifnb1/Bmyc/Ccr2/Nlrc3/Lgr4/Klre1/Lgals3/Plcb1/Tlr6/Lax1/Btla/Epx/Cx3cr1/Oas1a/Slamf8/Tarm1/Tafa3/Serpinb9g/Adtrp/Arrb2/Erbb2/Serpinb9e/Cnr2/Ifi208/Pilrb1/Oas1g/H2-T23/Cst7/Irgm2/Ifi214/Fcrlb/Runx3/Serpinb9h/Tigit/Ifi207/Ifi213/Sh2d1b2/Il4i1/Nlrc5/Ceacam1/Pira11/Pira6/Mafb/Ccl28/Grem1/Cd80/Slfn1/Igtp/Klrb1b/Pglyrp2/Mndal/Ptgs2os/Pvrig/Lilrb4b/Lilrb4a
## GO:0001906                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      Il4/Lgals9/Il7r/Stat5a/Il12b/Fcgr1/H2-M3/H2-M2/Lyst/Arg1/Il13/Havcr2/Nos2/Serpinb9b/Cxcl14/Emp2/C3/Tap2/Dnase1l3/Il23a/Il18rap/Ccl20/Cfh/Cd55/Mr1/Rasgrp1/Il12a/Cd2/Cd1d1/Gbp3/Gbp2/Prdx1/Rnf19b/Stap1/Cxcl5/Ppbp/Cxcl9/P2rx7/Lag3/Klrk1/Klrb1f/Clec2d/Klrd1/Klrb1c/Klrb1a/Itgam/Ccl22/Ccl17/Cadm1/Cd226/Cxcl10/Ccl2/H2-Q4/Tap1/Ninj1/Serpinb9f/Camp/Il18/Fgl2/Gbp7/Gbp2b/Pvr/Rps19/Pik3r1/Trem3/Pglyrp4/Trem1/Gimap5/Klri2/Serpinb9/Klre1/Lgals3/Hamp/Cx3cr1/Klrc2/Raet1e/H2-T22/Hamp2/Gapdh/Serpinb9g/Rpl30/Fcgr4/Arrb2/H2-Q7/B2m/H2-K1/Serpinb9e/H2-T23/H2-Q10/Klri1/Lyz1/Irgm2/Serpinb9h/H2-T26/H2-Bl/H2-Q6/H2-D1/Sh2d1b2/Ccl27a/Ceacam1/Ccl28/H60b/Igtp/Klrb1b/H2-T27/H2-Q1/Ulbp1/H2-Q2/Gbp5/Pnp
## GO:0001910                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Lgals9/Il7r/Stat5a/Il12b/H2-M3/H2-M2/Arg1/Havcr2/Serpinb9b/Tap2/Dnase1l3/Il23a/Il18rap/Mr1/Rasgrp1/Il12a/Cd1d1/Stap1/Cxcl5/P2rx7/Lag3/Klrk1/Klrb1f/Clec2d/Klrd1/Klrb1c/Klrb1a/Itgam/Cadm1/Cd226/Ccl2/H2-Q4/Tap1/Serpinb9f/Pvr/Gimap5/Klri2/Serpinb9/Klre1/Cx3cr1/Klrc2/Raet1e/H2-T22/Serpinb9g/Arrb2/H2-Q7/B2m/H2-K1/Serpinb9e/H2-T23/H2-Q10/Klri1/Serpinb9h/H2-T26/H2-Bl/H2-Q6/H2-D1/Sh2d1b2/Ceacam1/H60b/Klrb1b/H2-T27/H2-Q1/Ulbp1/H2-Q2/Pnp
## GO:0002250                                                                                                                                                                                                                            Loxl3/Il12rb1/Il4/Il17ra/Cd79a/Il7r/Stat3/Il12b/Ly9/Cd244a/Il27ra/Mef2c/Cd247/Pou2f2/Tnfrsf13b/Slamf1/Gata3/Fcgr1/H2-M3/H2-M2/Pdcd1lg2/Cd40/Ada/Irf1/Alox15/Cd70/Lyst/Traf3ip2/Arg1/Vsir/Havcr2/Myo1g/Dbnl/Rsad2/Arg2/Irf4/Serpinb9b/Erap1/Emp2/Bcl6/Cd86/Parp3/Cd4/Clec4n/Vegfa/Tfeb/Trem2/C3/Tap2/Lta/Cd74/Jak2/Cd7/Il23a/Irf7/6030468B19Rik/Il6/Smad7/Il18rap/Il1rl1/Il18r1/Il1r1/Pdcd1/Cfh/Cd55/Mr1/Prkcq/Il12a/Sema4a/Cd1d1/Svep1/Rnf19b/Masp2/Pf4/P2rx7/Ung/Lag3/Klrk1/Cd69/Klrd1/Siglecg/Cd19/Lat/Il21r/Pycard/Hpx/Swap70/Tnfsf13b/Nedd4/Rnf125/Zbtb1/Cd226/Dpp4/H2-Q4/Fut7/C1qa/C1qc/C1qb/Tap1/Arid5a/H2-DMa/Ccr7/Camk4/Slamf7/C1s1/C1rl/Susd4/Il18/Fgl2/Bach2/Pvr/C3ar1/Cd79b/Aicda/Lat2/Ccr6/Ripk2/Mfsd6/Serpina3g/Il17f/Trem1/Tnfrsf14/Fcmr/Klhl6/Gimap5/Il27/Serpinb9/Gapt/Cd24a/Ifnb1/Il23r/Ccr2/Nlrp10/Il31ra/Gpr183/Lax1/Btla/Cx3cr1/Raet1e/Tarm1/Cd8a/C1ra/H2-T22/Dennd1b/Pirb/Fcgr4/H2-Q7/B2m/H2-K1/Stat4/Jchain/H2-T23/H2-Q10/Tnfrsf13c/Il2rb/Il18bp/H2-T26/H2-Bl/H2-Q6/H2-D1/C4b/Sh2d1b2/Il4i1/Ceacam1/Cd80/H60b/Igkc/Ighg2b/Ighg3/Ighm/Iglc2/Slfn1/H2-T27/H2-Q1/H2-DMb1/Ulbp1/Tnfsf13/H2-Q2/Ighv1-53/Igha/Ighv9-3/C1rb/Ighd/Iglc3/Iglc1/Lilrb4b/Lilrb4a/Pnp
## GO:0002449                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Il4/Lgals9/Il7r/Stat5a/Il12b/Il27ra/Pou2f2/Slamf1/Gata3/Fcgr1/H2-M3/H2-M2/Cd40/Cd70/Lyst/Traf3ip2/Arg1/Vsir/Havcr2/Myo1g/Rsad2/Serpinb9b/Emp2/Bcl6/Parp3/Trem2/C3/Tap2/Lta/Cd74/Il23a/Irf7/6030468B19Rik/Il6/Smad7/Il18rap/Il18r1/Il1r1/Pdcd1/Cfh/Cd55/Mr1/Rasgrp1/Il12a/Cd2/Cd1d1/Svep1/Prdx1/Rnf19b/Masp2/P2rx7/Ung/Lag3/Klrk1/Klrb1f/Clec2d/Klrd1/Klrb1c/Klrb1a/Cd19/Il21r/Hpx/Swap70/Cadm1/Zbtb1/Cd226/Dpp4/H2-Q4/Fut7/C1qa/C1qc/C1qb/Tap1/Arid5a/H2-DMa/Serpinb9f/C1s1/C1rl/Susd4/Il18/Fgl2/Pvr/Aicda/Ccr6/Pik3r1/Fcmr/Gimap5/Klri2/Serpinb9/Gapt/Calhm6/Cd24a/Ifnb1/Ccr2/Klre1/Il31ra/Klrc2/Raet1e/Cd8a/C1ra/H2-T22/Dennd1b/Serpinb9g/Pirb/Fcgr4/Arrb2/H2-Q7/B2m/H2-K1/Serpinb9e/H2-T23/H2-Q10/Klri1/Il2rb/Serpinb9h/H2-T26/H2-Bl/H2-Q6/H2-D1/C4b/Sh2d1b2/Il4i1/Ceacam1/Cd80/H60b/Ighg2b/Ighg3/Ighm/Iglc2/Klrb1b/H2-T27/H2-Q1/Ulbp1/Tnfsf13/H2-Q2/Ighv1-53/Igha/Ighv9-3/C1rb/Ighd/Iglc3/Iglc1/Lilrb4b/Lilrb4a/Pnp
##            Count
## GO:0042742   124
## GO:0002683   218
## GO:0001906   110
## GO:0001910    66
## GO:0002250   188
## GO:0002449   145

Visualizing clusterProfiler results

clusterProfiler has a variety of options for viewing the over-represented GO terms.

The Bar plot is the most widely used method to visualize enriched terms. It depicts the enrichment scores (e.g. p values) and gene count or ratio as bar height and color. Users can specify the number of terms (most significant) or selected terms to display via the showCategory parameter.

barplot <- barplot(go_enrich, 
                   showCategory = 10,
                   title = "GO Biological Pathways",
                   font.size = 8)

barplot

dot <- dotplot(go_enrich, 
               showCategory = 10, 
               font.size = 8)
dot

We can also visualize enriched GO terms as a directed acyclic graph.

Gene ontologies (GO) can be structured as a directed acyclic graph with GO terms as nodes and their relationships as edges. The most commonly used relationships in GO are:

is a
part of
has part
regulates
negatively regulates
positively regulates

#install.packages("ggarchery")
#library(Go.db)
#acyclic_pathway <- goplot(go_enrich, showCategory = 10)

#acyclic_pathway

Gene Concept Network

In order to consider if a gene may belong to multiple “pathway” categories, we can use the cnetplot() function. The cnetplot() depicts the linkages of genes and biological concepts (e.g. GO terms or KEGG pathways) as a network.

cnetplot(go_enrich, 
         showCategory = 3, 
         categorySize="pvalue", 
         foldChange=background_gene_list, 
         vertex.label.font=6, 
         cex_label_gene = 0.5, 
         cex_label_category= 0.8)

Select which labels to be displayed. Options include ‘category’, ‘gene’, ‘all’(the default) and ‘none’.

cnetplot(go_enrich, 
         showCategory = 3, 
         categorySize="pvalue", 
         foldChange=background_gene_list, 
         vertex.label.font=6, 
         cex_label_category = 0.5,
         node_label="category")

#library(ggplot2)
plot_color <- cnetplot(go_enrich, 
                       showCategory = 3, 
                       categorySize="pvalue", 
                       foldChange=background_gene_list, 
                       vertex.label.font=6, 
                       cex_label_gene = 0.5,
                       cex_label_category= 0.8,
                       node_label="category") +
  scale_color_gradient2(name='fold change', 
                        low='darkgreen', 
                        high='firebrick')

plot_color

Function `enrichKEGG`

From the clusterProfiler package, enrichKEGG() performs KEGG pathway enrichment analysis — identifying biological pathways that are overrepresented in a list of genes.

# Create vector of log2FoldChange values
kegg_gene_list <- PWDvB6_LPS_background$log2FoldChange

# Assign ENTREZ IDs as names
names(kegg_gene_list) <- PWDvB6_LPS_background$entrezid

# Remove NA values
kegg_gene_list <- na.omit(kegg_gene_list)

# Remove duplicates in ENTREZ IDs (keep the first occurrence)
kegg_gene_list <- kegg_gene_list[!duplicated(names(kegg_gene_list))]

# sort the list in decreasing order (required for clusterProfiler)
kegg_gene_list = sort(kegg_gene_list, decreasing = TRUE)

head(kegg_gene_list)

## 100042862     14337 100502983    666036    667728     12395 
## 11.118480 10.172383  9.956474  9.854910  9.650647  9.615596

We just created the background list, similar to what we did above. Now, we will move on to create the significant list.

# Extract significant results 
kegg_sig_genes_df = subset(PWDvB6_LPS_background, padj < 0.05)

# From significant results, we want to filter on log2fold change
kegg_sig_genes <- kegg_sig_genes_df$log2FoldChange

# Name the vector with the ENTREZID
names(kegg_sig_genes) <- kegg_sig_genes_df$entrezid

# omit NA values
kegg_sig_genes <- na.omit(kegg_sig_genes)

# filter on log2fold change 
kegg_sig_genes <- names(kegg_sig_genes)[abs(kegg_sig_genes) > 1]

head(kegg_sig_genes)

## [1] "107815" "12390"  "14158"  "12006"  "14673"  "16415"

BTW, the clusterProfiler package provides search_kegg_organism() function to help searching supported organisms.

sal_ent <- search_kegg_organism('Salmonella enterica',
                                by='scientific_name')

dim(sal_ent)

## [1] 50  3

#sal_ent

What’s the enrichKEGG function doing?

It tests whether KEGG pathways are significantly enriched in your significant genes (kegg_sig_genes) compared to a background (kegg_gene_list), using NCBI gene IDs for mouse (“mmu”).

organism KEGG Organism Code: The full list is here: https://www.genome.jp/kegg/catalog/org_list.html (need the 3 letter code).
keyType one of ‘kegg’, ‘ncbi-geneid’, ‘ncib-proteinid’ or ‘uniprot’.
- “kegg”: Internal KEGG gene IDs (e.g., “mmu:12345”) — rarely used directly.
- “ncbi-geneid”: NCBI Entrez Gene IDs (e.g., 12345) — most commonly used.
- “ncbi-proteinid”: NCBI protein IDs (e.g., NP_032824.1) — used if you’re analyzing proteins.
- “uniprot”: UniProt accessions (e.g., P12345) — useful for proteomics or cross-species.

head(kk)

##                                      category
## mmu04060 Environmental Information Processing
## mmu04514 Environmental Information Processing
## mmu05330                       Human Diseases
## mmu04940                       Human Diseases
## mmu05320                       Human Diseases
## mmu04061 Environmental Information Processing
##                                  subcategory       ID
## mmu04060 Signaling molecules and interaction mmu04060
## mmu04514 Signaling molecules and interaction mmu04514
## mmu05330                      Immune disease mmu05330
## mmu04940     Endocrine and metabolic disease mmu04940
## mmu05320                      Immune disease mmu05320
## mmu04061 Signaling molecules and interaction mmu04061
##                                                            Description
## mmu04060                        Cytokine-cytokine receptor interaction
## mmu04514                                       Cell adhesion molecules
## mmu05330                                           Allograft rejection
## mmu04940                                      Type I diabetes mellitus
## mmu05320                                    Autoimmune thyroid disease
## mmu04061 Viral protein interaction with cytokine and cytokine receptor
##          GeneRatio  BgRatio RichFactor FoldEnrichment   zScore       pvalue
## mmu04060  102/1789 197/6237  0.5177665       1.805092 7.281754 3.708189e-12
## mmu04514   67/1789 122/6237  0.5491803       1.914610 6.469841 8.066352e-10
## mmu05330   30/1789  40/6237  0.7500000       2.614729 6.497027 1.553279e-09
## mmu04940   30/1789  43/6237  0.6976744       2.432306 5.976684 2.571369e-08
## mmu05320   28/1789  39/6237  0.7179487       2.502988 5.970875 2.795788e-08
## mmu04061   43/1789  72/6237  0.5972222       2.082099 5.856559 3.358709e-08
##              p.adjust       qvalue
## mmu04060 1.268201e-09 1.046100e-09
## mmu04514 1.379346e-07 1.137780e-07
## mmu05330 1.770738e-07 1.460627e-07
## mmu04940 1.912319e-06 1.577413e-06
## mmu05320 1.912319e-06 1.577413e-06
## mmu04061 1.914464e-06 1.579183e-06
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 geneID
## mmu04060 11482/16161/16189/16170/16172/16197/16160/56221/50931/50930/57916/16153/16164/21939/12981/66102/21948/16163/57266/12166/21943/21933/18414/16180/16169/12504/53603/16994/16992/83430/16193/12768/16174/17082/16182/16177/16178/20297/12765/16184/269275/54450/54448/12156/16159/18049/21949/21941/21942/22163/20311/57349/56744/17329/12769/12154/14562/60504/16186/24099/16168/20312/20299/20295/21813/12163/15945/20304/20293/20306/20296/12771/16324/12775/16173/22035/12458/257630/21936/230979/225392/215257/12778/246779/12145/227288/15977/209590/12772/218624/13051/16880/16847/56066/23832/242700/72049/16185/20301/56838/69583/21944
## mmu04514                                                                                                                                                                                                           16415/16421/94332/14991/14990/58205/20971/21939/74048/319504/18613/13003/58187/17968/13052/12524/12504/15001/12511/20970/54167/18566/241226/12481/22329/12483/16409/16408/12555/54725/18007/19274/18190/225825/171171/102657/15015/15000/14998/16456/104099/17967/52118/15002/270152/216856/67374/12525/71908/15039/15018/14969/14972/15040/15007/100043314/667977/69717/14963/110557/14964/14961/12519/100529082/15006/14999/15013
## mmu05330                                                                                                                                                                                                                                                                                                                                                                                                                                                     16189/16160/14991/14990/16153/21939/12524/15001/16159/15015/15000/14998/15002/15039/15018/14969/14972/15040/15007/667977/69717/14963/110557/14964/14961/12519/100529082/15006/14999/15013
## mmu04940                                                                                                                                                                                                                                                                                                                                                                                                                                                     16160/14991/14990/12524/15001/16992/16159/15015/15000/14998/12876/15002/15039/15018/14969/14972/15893/15040/15007/667977/69717/14963/110557/14964/14961/12519/100529082/15006/14999/15013
## mmu05320                                                                                                                                                                                                                                                                                                                                                                                                                                                                 16189/14991/14990/16153/21939/12524/15001/15015/15000/14998/15002/15039/15018/14969/14972/15040/15007/667977/69717/14963/110557/14964/14961/12519/100529082/15006/14999/15013
## mmu04061                                                                                                                                                                                                                                                                                                                                                                           56221/50930/16153/57266/21933/16992/16193/12768/16174/16182/20297/12765/16184/20311/57349/56744/17329/12769/16186/20312/20299/20295/15945/20304/20293/20306/20296/12771/12775/16173/22035/12458/230979/12778/12145/227288/12772/13051/56066/23832/16185/20301/56838
##          Count
## mmu04060   102
## mmu04514    67
## mmu05330    30
## mmu04940    30
## mmu05320    28
## mmu04061    43

saveRDS(kk, file = "enrichKEGG_PWD.rds")

#kk <- readRDS("enrichKEGG_PWD.rds")

Visualize specific KEGG pathways

library("pathview")
mmu04217 <- pathview(gene.data  = kegg_gene_list,
                     pathway.id = "mmu04217",
                     species    = "mmu",
                     limit      = list(gene=max(abs(kegg_gene_list)), cpd=1))

knitr::include_graphics("mmu04217.pathview.png")

Putting it all together

cowplot::plot_grid(dot, plot_color, labels=LETTERS[1:2])

Before you go…. knit the document!!

Session info

sessionInfo()

## R version 4.4.3 (2025-02-28)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] pathview_1.46.0        cowplot_1.1.3          ggrepel_0.9.6         
##  [4] ggplot2_3.5.1          dplyr_1.1.4            knitr_1.50            
##  [7] enrichplot_1.26.6      ggnewscale_0.5.1       DOSE_4.0.1            
## [10] org.Mm.eg.db_3.20.0    AnnotationDbi_1.68.0   IRanges_2.40.1        
## [13] S4Vectors_0.44.0       Biobase_2.66.0         BiocGenerics_0.52.0   
## [16] clusterProfiler_4.14.6
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-9            DBI_1.2.3               gson_0.1.0             
##  [4] rlang_1.1.5             magrittr_2.0.3          compiler_4.4.3         
##  [7] RSQLite_2.3.9           png_0.1-8               vctrs_0.6.5            
## [10] reshape2_1.4.4          stringr_1.5.1           pkgconfig_2.0.3        
## [13] crayon_1.5.3            fastmap_1.2.0           XVector_0.46.0         
## [16] labeling_0.4.3          rmdformats_1.0.4        rmarkdown_2.29         
## [19] graph_1.84.1            KEGGgraph_1.66.0        UCSC.utils_1.2.0       
## [22] purrr_1.0.4             bit_4.6.0               xfun_0.51              
## [25] zlibbioc_1.52.0         cachem_1.1.0            aplot_0.2.5            
## [28] GenomeInfoDb_1.42.3     jsonlite_2.0.0          blob_1.2.4             
## [31] BiocParallel_1.40.0     parallel_4.4.3          R6_2.6.1               
## [34] bslib_0.9.0             stringi_1.8.7           RColorBrewer_1.1-3     
## [37] jquerylib_0.1.4         GOSemSim_2.32.0         Rcpp_1.0.14            
## [40] bookdown_0.42           ggtangle_0.0.6          R.utils_2.13.0         
## [43] Matrix_1.7-3            splines_4.4.3           igraph_2.1.4           
## [46] tidyselect_1.2.1        qvalue_2.38.0           rstudioapi_0.17.1      
## [49] yaml_2.3.10             codetools_0.2-20        lattice_0.22-6         
## [52] tibble_3.2.1            plyr_1.8.9              withr_3.0.2            
## [55] treeio_1.30.0           KEGGREST_1.46.0         evaluate_1.0.3         
## [58] gridGraphics_0.5-1      Biostrings_2.74.1       pillar_1.10.1          
## [61] ggtree_3.14.0           ggfun_0.1.8             generics_0.1.3         
## [64] RCurl_1.98-1.17         munsell_0.5.1           scales_1.3.0           
## [67] tidytree_0.4.6          glue_1.8.0              lazyeval_0.2.2         
## [70] tools_4.4.3             data.table_1.17.0       fgsea_1.32.0           
## [73] XML_3.99-0.18           fs_1.6.5                fastmatch_1.1-6        
## [76] grid_4.4.3              tidyr_1.3.1             ape_5.8-1              
## [79] colorspace_2.1-1        nlme_3.1-168            GenomeInfoDbData_1.2.13
## [82] patchwork_1.3.0         cli_3.6.4               Rgraphviz_2.50.0       
## [85] gtable_0.3.6            R.methodsS3_1.8.2       yulab.utils_0.2.0      
## [88] sass_0.4.9              digest_0.6.37           ggplotify_0.1.2        
## [91] org.Hs.eg.db_3.20.0     farver_2.1.2            memoise_2.0.1          
## [94] htmltools_0.5.8.1       R.oo_1.27.0             lifecycle_1.0.4        
## [97] httr_1.4.7              GO.db_3.20.0            bit64_4.6.0-1

Functional analysis

Functional analysis

Getting Started

Install required packages

Load required packages

Functional analysis

GO Ontologies

clusterProfiler

Prepping the data

Create Gene lists

Background gene set

Significant gene set

Function: enrichGO()

Visualizing clusterProfiler results

Gene Concept Network

Function enrichKEGG

Visualize specific KEGG pathways

Putting it all together

Session info

Function: `enrichGO()`

Function `enrichKEGG`