Functional analysis
Getting Started
Install required packages
Load required packages
library(clusterProfiler)
library(org.Mm.eg.db)
library(DOSE)
library(ggnewscale)
library(enrichplot)
library(knitr)
library(dplyr)
library(ggrepel)
library(ggplot2)
library(cowplot)
#library(pathview) #if this library is not installed, its okay install later, takes a few minutes
#library(GO.db) #takes a while to install as well
Functional analysis
The goal of functional analysis is to provide biological insight, so it’s important to interpret our results in the context of our experimental hypothesis: Genetically diverse, wild-derived mice (PWD) are expected to have an enhanced ability to suppress inflammation-related cell death compared to standard laboratory mice (B6). Therefore, we anticipate a downregulation of processes or pathways related to cell death.
GO Ontologies
The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the function of genes. GO terms are organized into three independent controlled vocabularies (ontologies) in a species-independent manner:
Biological process: refers to the biological role involving the gene or gene product, and could include “transcription”, “signal transduction”, and “apoptosis”. A biological process generally involves a chemical or physical change of the starting material or input.
Molecular function: represents the biochemical activity of the gene product, such activities could include “ligand”, “GTPase”, and “transporter”.
Cellular component: refers to the location in the cell of the gene product. Cellular components could include “nucleus”, “lysosome”, and “plasma membrane”.
Each GO term has a term name (e.g. DNA repair) and a unique term accession number (GO:0005125), and a single gene product can be associated with many GO terms.
clusterProfiler
We will be using clusterProfiler to perform over-representation analysis on GO terms associated with our list of significant genes.
Over-representation (or enrichment) analysis is a statistical method that determines whether genes from pre-defined sets (ex: those beloging to a specific GO term or KEGG pathway) are present more than would be expected (over-represented) in a subset of your data. In this case, the subset is your set of under or over expressed genes.
Clusterprofiler will take as input a significant gene list and a background gene list and performs statistical enrichment analysis using hypergeometric testing. The basic arguments allow the user to select the appropriate organism and GO ontology (BP, CC, MF) to test.
Prepping the data
PWDvB6_LPS_data <- read.csv("PWD-LPSvsB6-LPS_normalized_matrix.csv", header = T)
head(PWDvB6_LPS_data)
## Ensembl_id Gene_id baseMean log2FoldChange
## 1 ENSMUSG00000000001 ENSMUSG00000000001.4 5932.726707 -0.07093008
## 2 ENSMUSG00000000028 ENSMUSG00000000028.15 145.143927 0.46312330
## 3 ENSMUSG00000000037 ENSMUSG00000000037.17 8.268551 2.27033900
## 4 ENSMUSG00000000056 ENSMUSG00000000056.7 234.738253 0.15596467
## 5 ENSMUSG00000000058 ENSMUSG00000000058.6 1506.777887 -1.26487297
## 6 ENSMUSG00000000078 ENSMUSG00000000078.7 10025.556928 -0.92963543
## lfcSE pvalue padj Unstim_B6_Rep1 Unstim_B6_Rep2
## 1 0.04256565 9.350730e-02 1.411407e-01 5016.280801 5044.716516
## 2 0.13131617 1.732092e-04 4.163415e-04 120.118612 109.345435
## 3 0.66736121 3.171750e-05 8.285304e-05 2.125993 2.779969
## 4 0.10241008 1.139347e-01 1.675967e-01 263.623149 263.170369
## 5 0.08496567 6.480031e-51 9.196782e-50 1119.335385 1237.086067
## 6 0.04415007 4.385270e-99 1.532094e-97 8222.278448 8505.777536
## Unstim_B6_Rep3 Unstim_B6_Rep4 Unstim_c11.2_rep1 Unstim_c11.2_rep2
## 1 5146.778634 5307.094517 4824.157778 4932.135450
## 2 103.243492 112.560218 224.698549 231.622427
## 3 3.622579 3.991497 0.857628 3.047664
## 4 220.977298 234.700028 346.481732 330.163547
## 5 1314.996054 1363.495402 1594.330544 1659.960726
## 6 8847.242738 8900.240188 7385.892762 8012.307373
## Unstim_c11.2_rep3 Unstim_c11.2_rep4 Unstim_PWD_rep1 Unstim_PWD_rep2
## 1 4922.688289 4920.896484 4885.88407 4963.96770
## 2 253.951792 237.205156 181.19728 178.66333
## 3 4.518715 5.330453 20.93835 19.75173
## 4 338.903637 357.140348 223.07399 242.40754
## 5 1887.919191 1739.504480 641.03572 670.66086
## 6 7999.029565 7403.110742 6473.17227 6218.20225
## Unstim_PWD_rep3 Unstim_PWD_rep4 LPS_B6_rep1 LPS_B6_rep2 LPS_B6_rep3
## 1 5007.92324 5281.35992 7681.263762 6798.135779 7887.455511
## 2 172.22123 165.22261 82.947924 89.264585 75.051192
## 3 26.17087 22.09372 1.430137 2.550417 4.248181
## 4 207.67854 221.89781 183.057487 163.226670 131.693602
## 5 526.79435 591.72748 2092.289868 2113.020256 2315.258485
## 6 5956.82845 6003.72853 17274.620178 15964.333486 17801.293219
## LPS_B6_rep4 LPS_c11.2_Rep1 LPS_c11.2_Rep2 LPS_c11.2_Rep3 LPS_c11.2_Rep4
## 1 7722.190804 5981.528448 5983.173773 5713.96041 5733.518874
## 2 72.382972 142.475332 168.640115 128.64860 184.431649
## 3 1.340425 1.217738 3.550318 4.36097 8.068885
## 4 151.468072 259.378168 240.534059 250.75575 293.937940
## 5 2265.318948 2630.313813 2248.239010 2286.23831 2237.386436
## 6 16992.572959 11585.558155 12460.729358 11675.40584 11282.606099
## LPS_PWD_Rep1 LPS_PWD_Rep2 LPS_PWD_Rep3 LPS_PWD_Rep4 diffexpressed symbol
## 1 6980.49968 6936.64081 7636.313135 7076.87660 Unchanged Gnai3
## 2 106.49320 112.88533 112.251105 117.93209 Unchanged Cdc45
## 3 14.28567 15.33011 9.714038 17.11917 Upregulated Scml2
## 4 170.12939 166.77297 173.773345 198.77264 Unchanged Narf
## 5 988.30888 1008.53517 814.899847 816.01399 Downregulated Cav2
## 6 8983.09140 9050.33728 9125.798948 8489.20851 Unchanged Klf6
## entrezid
## 1 14679
## 2 12544
## 3 107815
## 4 67608
## 5 12390
## 6 23849
PWDvB6_LPS_background <- select(PWDvB6_LPS_data,
Ensembl_id,
log2FoldChange,
padj,
symbol,
entrezid)
Create Gene lists
To perform the over-representation analysis, we need a list of
background genes and a list of significant genes. For our background
dataset we will use all genes and log2FC from the
PWDvB6_LPS_background
results table.
For the significant gene list we will use only genes with p-adjusted values less than 0.05 and absolute log2FC of 1.
Background gene set
# we want the log2 fold change
original_gene_list <- PWDvB6_LPS_background$log2FoldChange
head(original_gene_list)
## [1] -0.07093008 0.46312330 2.27033900 0.15596467 -1.26487297 -0.92963543
# name the vector
names(original_gene_list) <- PWDvB6_LPS_background$Ensembl_id
head(original_gene_list)
## ENSMUSG00000000001 ENSMUSG00000000028 ENSMUSG00000000037 ENSMUSG00000000056
## -0.07093008 0.46312330 2.27033900 0.15596467
## ENSMUSG00000000058 ENSMUSG00000000078
## -1.26487297 -0.92963543
# sort the list in decreasing order (required for clusterProfiler)
background_gene_list = sort(background_gene_list, decreasing = TRUE)
## ENSMUSG00000096449 ENSMUSG00000082062 ENSMUSG00000082179 ENSMUSG00000101431
## 11.118480 10.172383 9.956474 9.854910
## ENSMUSG00000091383 ENSMUSG00000006586
## 9.650647 9.615596
Significant gene set
# Extract significant results (padj < 0.05) using subset()
sig_genes_df = subset(PWDvB6_LPS_background, padj < 0.05)
# From significant results, we want to filter on log2fold change
sig_genes <- sig_genes_df$log2FoldChange
# Name the vector
names(sig_genes) <- sig_genes_df$Ensembl_id
# omit NA values
sig_genes <- na.omit(sig_genes)
# filter on min log2fold change (log2FoldChange > 1)
sig_genes <- names(sig_genes)[abs(sig_genes) > 1]
## [1] "ENSMUSG00000000037" "ENSMUSG00000000058" "ENSMUSG00000000127"
## [4] "ENSMUSG00000000142" "ENSMUSG00000000149" "ENSMUSG00000000157"
Function: enrichGO()
From the clusterProfiler package, enrichGO()
performs
Gene Ontology (GO) enrichment analysis to identify overrepresented
biological terms among a list of genes.
There are ontology options: [“BP”, “MF”, “CC”]. The
ont
argument can accept either “BP” (Biological Process), “MF” (Molecular Function), and “CC” (Cellular Component) subontologies, or “ALL” for all three.There are keytypes options: This is the source of the annotation (gene ids). The options vary for each annotation. In the example of org.Mm.eg.db, the options are:
## [1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
## [6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
## [11] "GENETYPE" "GO" "GOALL" "IPI" "MGI"
## [16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
## [21] "PROSITE" "REFSEQ" "SYMBOL" "UNIPROT"
- The different organisms with annotation databases available to use
with for the
OrgDb
argument can be found below here: https://bioconductor.org/packages/release/BiocViews.html#___Organism
#go_enrich <- enrichGO(gene = sig genes,
#universe = background genes,
#OrgDb = organism Db,
#keyType = 'ENSEMBL',
#readable = T,
#ont = "BP",
#pvalueCutoff = 0.05,
#qvalueCutoff = 0.05)
Together, we will run this next part. It will take a few minutes to run.
What is it doing?
It tests whether GO Biological Process (BP) terms are significantly enriched in your significant genes (sig_genes) compared to a background of all tested genes (background_gene_list), using mouse ENSEMBL gene IDs.
Output results:
go_enrich
will be an object containing a table of
significantly enriched GO terms, along with stats like:
- p-value
- q-value (adjusted p-value/FDR)
- number of genes in each term
- gene symbols
## Output results from GO analysis to a table
cluster_summary <- data.frame(go_enrich)
write.csv(cluster_summary, "clusterProfiler_PWDvB6_LPS.csv")
head(cluster_summary)
## ID Description GeneRatio
## GO:0042742 GO:0042742 defense response to bacterium 124/3962
## GO:0002683 GO:0002683 negative regulation of immune system process 218/3962
## GO:0001906 GO:0001906 cell killing 110/3962
## GO:0001910 GO:0001910 regulation of leukocyte mediated cytotoxicity 66/3962
## GO:0002250 GO:0002250 adaptive immune response 188/3962
## GO:0002449 GO:0002449 lymphocyte mediated immunity 145/3962
## BgRatio RichFactor FoldEnrichment zScore pvalue
## GO:0042742 228/14310 0.5438596 1.964319 9.082170 1.143244e-17
## GO:0002683 489/14310 0.4458078 1.610174 8.495169 3.269225e-16
## GO:0001906 202/14310 0.5445545 1.966828 8.563002 7.042149e-16
## GO:0001910 101/14310 0.6534653 2.360194 8.488159 2.975103e-15
## GO:0002250 419/14310 0.4486874 1.620574 7.977518 1.800604e-14
## GO:0002449 305/14310 0.4754098 1.717091 7.832787 7.364507e-14
## p.adjust qvalue
## GO:0042742 6.665110e-14 5.391296e-14
## GO:0002683 9.529792e-13 7.708490e-13
## GO:0001906 1.368524e-12 1.106976e-12
## GO:0001910 4.336213e-12 3.507490e-12
## GO:0002250 2.099505e-11 1.698254e-11
## GO:0002449 6.360617e-11 5.144996e-11
## geneID
## GO:0042742 Wfdc18/Lgals9/Oas1c/Acp5/Il7r/Ikbkg/Il12b/Aqp1/Il27ra/Fcgr1/H2-M3/Il10/Lyst/Tbk1/Havcr2/Nos2/Arg2/Naip1/Rnase10/Rnase6/Gsdmd/Adamts5/Slc15a2/Cd4/Tfeb/Trem2/Tslp/Lta/Peli3/Casp7/Il23a/Casp1/Mr1/Optn/Prg2/Rbck1/Pld1/Tlr2/Gbp3/Gbp2/Tnfsf8/Unc13b/Emilin1/Gbp9/Plac8/Cxcl5/P2rx7/Npy/Klrk1/Adm/Pycard/Cd209d/Oas3/Oas2/Casp4/Gbp8/Adgrb1/Ssc5d/Isg15/Colec12/Scd1/Serpine1/Mavs/Camp/Rbpj/Il18/Znfx1/Gbp7/Gbp2b/Aicda/Rps19/Ripk2/Irf8/Trem3/Il17f/Pglyrp4/Trem1/Tnfrsf14/Pla2g6/Tlr9/Serpinb9/Irgm1/Rnf31/Rnase2a/Ifnb1/Nlrp10/Hamp/Epx/Oas1a/Slamf8/Lgals4/Iigp1/Hamp2/Krt6a/Rpl30/B2m/H2-K1/Ear6/Muc5b/Oas1g/Jchain/H2-T23/Lyz1/Wfdc17/Nlrp1a/Irgm2/Rnf213/Naip5/Ang/Ear2/Ear1/Wfdc3/Ighg2b/Ighg3/Ighm/Igtp/Naip6/Naip2/Gbp4/Pglyrp2/Igha/Gpr15lg/Gbp6/Gbp5
## GO:0002683 Fer/Clec2g/Pparg/Sox9/Loxl3/Il4/Lgals9/Oas1c/Acp5/Nr1h3/Axl/Il7r/Stat5a/Il12b/Il27ra/Tnfrsf13b/Rnf170/Mertk/Dll1/Slamf1/Gnrh1/Sqstm1/H2-M3/Pdcd1lg2/Il10/Sdc4/Ada/Dhx58/Irf1/Alox15/Smpdl3a/Arg1/Igf1/Vsir/Fstl3/Havcr2/Sec14l1/Nme2/Aurkb/Arg2/Gpr137b/Irf4/Serpinb9b/Dlg5/Tsc22d1/Olfm4/Myc/Bcl6/Mefv/Apod/Cblb/Cd200r1/Cd86/Parp3/Tmem176a/Trem2/Pim1/Dusp1/Tmem178/H2-Oa/Tap2/Btnl2/Cd74/Gal/Gpam/Ccr1/Smad7/Col3a1/Il1rl1/Pdcd1/Cd55/Il2ra/Dab2ip/Nmi/Prg2/Mdk/Tyro3/Zbtb46/Samhd1/Ptpn22/Adar/Rnf115/Svep1/Ppt1/Padi2/Cptp/Tnfrsf4/Emilin1/Stap1/Pf4/Dtx1/Tmem176b/Npy/Usp18/Lag3/Klrk1/Cd69/Clec2d/Klrd1/Clec2i/Siglecg/Cd22/Trim30a/Tsc22d3/Sfrp1/Ido1/Slit2/Cx3cl1/Thy1/Ldlr/Oas3/Rnf125/Atg9a/Tmem131l/Parp14/Cebpa/Dpp4/Ccl12/Isg15/Twist1/Gramd4/Cd276/Ripor2/C1qc/Tap1/Ccn3/Mavs/Ifi206/Bank1/Inpp4b/Socs1/Tapbpl/Serpinb9f/Susd4/Lyplal1/Prdm16/Fgl2/Ptger4/Dtx4/Ifi203/Stat2/Thbs1/Dgkz/Appl1/Rps19/Nbl1/Pla2g5/Pik3r1/H2-Ob/Pglyrp4/Tnfrsf14/Ubash3a/Trafd1/Ifi209/Gimap5/Cdkn2a/Serpinb9/Bst2/Irgm1/Cd24a/Sfn/Gpr68/Fcrl5/Ifnb1/Bmyc/Ccr2/Nlrc3/Lgr4/Klre1/Lgals3/Plcb1/Tlr6/Lax1/Btla/Epx/Cx3cr1/Oas1a/Slamf8/Tarm1/Tafa3/Serpinb9g/Adtrp/Arrb2/Erbb2/Serpinb9e/Cnr2/Ifi208/Pilrb1/Oas1g/H2-T23/Cst7/Irgm2/Ifi214/Fcrlb/Runx3/Serpinb9h/Tigit/Ifi207/Ifi213/Sh2d1b2/Il4i1/Nlrc5/Ceacam1/Pira11/Pira6/Mafb/Ccl28/Grem1/Cd80/Slfn1/Igtp/Klrb1b/Pglyrp2/Mndal/Ptgs2os/Pvrig/Lilrb4b/Lilrb4a
## GO:0001906 Il4/Lgals9/Il7r/Stat5a/Il12b/Fcgr1/H2-M3/H2-M2/Lyst/Arg1/Il13/Havcr2/Nos2/Serpinb9b/Cxcl14/Emp2/C3/Tap2/Dnase1l3/Il23a/Il18rap/Ccl20/Cfh/Cd55/Mr1/Rasgrp1/Il12a/Cd2/Cd1d1/Gbp3/Gbp2/Prdx1/Rnf19b/Stap1/Cxcl5/Ppbp/Cxcl9/P2rx7/Lag3/Klrk1/Klrb1f/Clec2d/Klrd1/Klrb1c/Klrb1a/Itgam/Ccl22/Ccl17/Cadm1/Cd226/Cxcl10/Ccl2/H2-Q4/Tap1/Ninj1/Serpinb9f/Camp/Il18/Fgl2/Gbp7/Gbp2b/Pvr/Rps19/Pik3r1/Trem3/Pglyrp4/Trem1/Gimap5/Klri2/Serpinb9/Klre1/Lgals3/Hamp/Cx3cr1/Klrc2/Raet1e/H2-T22/Hamp2/Gapdh/Serpinb9g/Rpl30/Fcgr4/Arrb2/H2-Q7/B2m/H2-K1/Serpinb9e/H2-T23/H2-Q10/Klri1/Lyz1/Irgm2/Serpinb9h/H2-T26/H2-Bl/H2-Q6/H2-D1/Sh2d1b2/Ccl27a/Ceacam1/Ccl28/H60b/Igtp/Klrb1b/H2-T27/H2-Q1/Ulbp1/H2-Q2/Gbp5/Pnp
## GO:0001910 Lgals9/Il7r/Stat5a/Il12b/H2-M3/H2-M2/Arg1/Havcr2/Serpinb9b/Tap2/Dnase1l3/Il23a/Il18rap/Mr1/Rasgrp1/Il12a/Cd1d1/Stap1/Cxcl5/P2rx7/Lag3/Klrk1/Klrb1f/Clec2d/Klrd1/Klrb1c/Klrb1a/Itgam/Cadm1/Cd226/Ccl2/H2-Q4/Tap1/Serpinb9f/Pvr/Gimap5/Klri2/Serpinb9/Klre1/Cx3cr1/Klrc2/Raet1e/H2-T22/Serpinb9g/Arrb2/H2-Q7/B2m/H2-K1/Serpinb9e/H2-T23/H2-Q10/Klri1/Serpinb9h/H2-T26/H2-Bl/H2-Q6/H2-D1/Sh2d1b2/Ceacam1/H60b/Klrb1b/H2-T27/H2-Q1/Ulbp1/H2-Q2/Pnp
## GO:0002250 Loxl3/Il12rb1/Il4/Il17ra/Cd79a/Il7r/Stat3/Il12b/Ly9/Cd244a/Il27ra/Mef2c/Cd247/Pou2f2/Tnfrsf13b/Slamf1/Gata3/Fcgr1/H2-M3/H2-M2/Pdcd1lg2/Cd40/Ada/Irf1/Alox15/Cd70/Lyst/Traf3ip2/Arg1/Vsir/Havcr2/Myo1g/Dbnl/Rsad2/Arg2/Irf4/Serpinb9b/Erap1/Emp2/Bcl6/Cd86/Parp3/Cd4/Clec4n/Vegfa/Tfeb/Trem2/C3/Tap2/Lta/Cd74/Jak2/Cd7/Il23a/Irf7/6030468B19Rik/Il6/Smad7/Il18rap/Il1rl1/Il18r1/Il1r1/Pdcd1/Cfh/Cd55/Mr1/Prkcq/Il12a/Sema4a/Cd1d1/Svep1/Rnf19b/Masp2/Pf4/P2rx7/Ung/Lag3/Klrk1/Cd69/Klrd1/Siglecg/Cd19/Lat/Il21r/Pycard/Hpx/Swap70/Tnfsf13b/Nedd4/Rnf125/Zbtb1/Cd226/Dpp4/H2-Q4/Fut7/C1qa/C1qc/C1qb/Tap1/Arid5a/H2-DMa/Ccr7/Camk4/Slamf7/C1s1/C1rl/Susd4/Il18/Fgl2/Bach2/Pvr/C3ar1/Cd79b/Aicda/Lat2/Ccr6/Ripk2/Mfsd6/Serpina3g/Il17f/Trem1/Tnfrsf14/Fcmr/Klhl6/Gimap5/Il27/Serpinb9/Gapt/Cd24a/Ifnb1/Il23r/Ccr2/Nlrp10/Il31ra/Gpr183/Lax1/Btla/Cx3cr1/Raet1e/Tarm1/Cd8a/C1ra/H2-T22/Dennd1b/Pirb/Fcgr4/H2-Q7/B2m/H2-K1/Stat4/Jchain/H2-T23/H2-Q10/Tnfrsf13c/Il2rb/Il18bp/H2-T26/H2-Bl/H2-Q6/H2-D1/C4b/Sh2d1b2/Il4i1/Ceacam1/Cd80/H60b/Igkc/Ighg2b/Ighg3/Ighm/Iglc2/Slfn1/H2-T27/H2-Q1/H2-DMb1/Ulbp1/Tnfsf13/H2-Q2/Ighv1-53/Igha/Ighv9-3/C1rb/Ighd/Iglc3/Iglc1/Lilrb4b/Lilrb4a/Pnp
## GO:0002449 Il4/Lgals9/Il7r/Stat5a/Il12b/Il27ra/Pou2f2/Slamf1/Gata3/Fcgr1/H2-M3/H2-M2/Cd40/Cd70/Lyst/Traf3ip2/Arg1/Vsir/Havcr2/Myo1g/Rsad2/Serpinb9b/Emp2/Bcl6/Parp3/Trem2/C3/Tap2/Lta/Cd74/Il23a/Irf7/6030468B19Rik/Il6/Smad7/Il18rap/Il18r1/Il1r1/Pdcd1/Cfh/Cd55/Mr1/Rasgrp1/Il12a/Cd2/Cd1d1/Svep1/Prdx1/Rnf19b/Masp2/P2rx7/Ung/Lag3/Klrk1/Klrb1f/Clec2d/Klrd1/Klrb1c/Klrb1a/Cd19/Il21r/Hpx/Swap70/Cadm1/Zbtb1/Cd226/Dpp4/H2-Q4/Fut7/C1qa/C1qc/C1qb/Tap1/Arid5a/H2-DMa/Serpinb9f/C1s1/C1rl/Susd4/Il18/Fgl2/Pvr/Aicda/Ccr6/Pik3r1/Fcmr/Gimap5/Klri2/Serpinb9/Gapt/Calhm6/Cd24a/Ifnb1/Ccr2/Klre1/Il31ra/Klrc2/Raet1e/Cd8a/C1ra/H2-T22/Dennd1b/Serpinb9g/Pirb/Fcgr4/Arrb2/H2-Q7/B2m/H2-K1/Serpinb9e/H2-T23/H2-Q10/Klri1/Il2rb/Serpinb9h/H2-T26/H2-Bl/H2-Q6/H2-D1/C4b/Sh2d1b2/Il4i1/Ceacam1/Cd80/H60b/Ighg2b/Ighg3/Ighm/Iglc2/Klrb1b/H2-T27/H2-Q1/Ulbp1/Tnfsf13/H2-Q2/Ighv1-53/Igha/Ighv9-3/C1rb/Ighd/Iglc3/Iglc1/Lilrb4b/Lilrb4a/Pnp
## Count
## GO:0042742 124
## GO:0002683 218
## GO:0001906 110
## GO:0001910 66
## GO:0002250 188
## GO:0002449 145
Visualizing clusterProfiler results
clusterProfiler
has a variety of options for viewing the
over-represented GO terms.
The Bar plot is the most widely used method to visualize enriched terms. It depicts the enrichment scores (e.g. p values) and gene count or ratio as bar height and color. Users can specify the number of terms (most significant) or selected terms to display via the showCategory parameter.
barplot <- barplot(go_enrich,
showCategory = 10,
title = "GO Biological Pathways",
font.size = 8)
barplot
We can also visualize enriched GO terms as a directed acyclic graph.
Gene ontologies (GO) can be structured as a directed acyclic graph with GO terms as nodes and their relationships as edges. The most commonly used relationships in GO are:
- is a
- part of
- has part
- regulates
- negatively regulates
- positively regulates
Gene Concept Network
In order to consider if a gene may belong to multiple “pathway”
categories, we can use the cnetplot()
function. The
cnetplot()
depicts the linkages of genes and biological
concepts (e.g. GO terms or KEGG pathways) as a network.
cnetplot(go_enrich,
showCategory = 3,
categorySize="pvalue",
foldChange=background_gene_list,
vertex.label.font=6,
cex_label_gene = 0.5,
cex_label_category= 0.8)
Select which labels to be displayed. Options include ‘category’, ‘gene’, ‘all’(the default) and ‘none’.
cnetplot(go_enrich,
showCategory = 3,
categorySize="pvalue",
foldChange=background_gene_list,
vertex.label.font=6,
cex_label_category = 0.5,
node_label="category")
#library(ggplot2)
plot_color <- cnetplot(go_enrich,
showCategory = 3,
categorySize="pvalue",
foldChange=background_gene_list,
vertex.label.font=6,
cex_label_gene = 0.5,
cex_label_category= 0.8,
node_label="category") +
scale_color_gradient2(name='fold change',
low='darkgreen',
high='firebrick')
plot_color
Function enrichKEGG
From the clusterProfiler package, enrichKEGG()
performs
KEGG pathway enrichment analysis — identifying biological pathways that
are overrepresented in a list of genes.
# Create vector of log2FoldChange values
kegg_gene_list <- PWDvB6_LPS_background$log2FoldChange
# Assign ENTREZ IDs as names
names(kegg_gene_list) <- PWDvB6_LPS_background$entrezid
# Remove NA values
kegg_gene_list <- na.omit(kegg_gene_list)
# Remove duplicates in ENTREZ IDs (keep the first occurrence)
kegg_gene_list <- kegg_gene_list[!duplicated(names(kegg_gene_list))]
# sort the list in decreasing order (required for clusterProfiler)
kegg_gene_list = sort(kegg_gene_list, decreasing = TRUE)
## 100042862 14337 100502983 666036 667728 12395
## 11.118480 10.172383 9.956474 9.854910 9.650647 9.615596
We just created the background list, similar to what we did above. Now, we will move on to create the significant list.
# Extract significant results
kegg_sig_genes_df = subset(PWDvB6_LPS_background, padj < 0.05)
# From significant results, we want to filter on log2fold change
kegg_sig_genes <- kegg_sig_genes_df$log2FoldChange
# Name the vector with the ENTREZID
names(kegg_sig_genes) <- kegg_sig_genes_df$entrezid
# omit NA values
kegg_sig_genes <- na.omit(kegg_sig_genes)
# filter on log2fold change
kegg_sig_genes <- names(kegg_sig_genes)[abs(kegg_sig_genes) > 1]
## [1] "107815" "12390" "14158" "12006" "14673" "16415"
BTW, the clusterProfiler package provides
search_kegg_organism()
function to help searching supported
organisms.
## [1] 50 3
What’s the enrichKEGG
function
doing?
It tests whether KEGG pathways are significantly enriched in your significant genes (kegg_sig_genes) compared to a background (kegg_gene_list), using NCBI gene IDs for mouse (“mmu”).
organism KEGG Organism Code: The full list is here: https://www.genome.jp/kegg/catalog/org_list.html (need the 3 letter code).
keyType one of ‘kegg’, ‘ncbi-geneid’, ‘ncib-proteinid’ or ‘uniprot’.
- “kegg”: Internal KEGG gene IDs (e.g., “mmu:12345”) — rarely used directly.
- “ncbi-geneid”: NCBI Entrez Gene IDs (e.g., 12345) — most commonly used.
- “ncbi-proteinid”: NCBI protein IDs (e.g., NP_032824.1) — used if you’re analyzing proteins.
- “uniprot”: UniProt accessions (e.g., P12345) — useful for proteomics or cross-species.
## category
## mmu04060 Environmental Information Processing
## mmu04514 Environmental Information Processing
## mmu05330 Human Diseases
## mmu04940 Human Diseases
## mmu05320 Human Diseases
## mmu04061 Environmental Information Processing
## subcategory ID
## mmu04060 Signaling molecules and interaction mmu04060
## mmu04514 Signaling molecules and interaction mmu04514
## mmu05330 Immune disease mmu05330
## mmu04940 Endocrine and metabolic disease mmu04940
## mmu05320 Immune disease mmu05320
## mmu04061 Signaling molecules and interaction mmu04061
## Description
## mmu04060 Cytokine-cytokine receptor interaction
## mmu04514 Cell adhesion molecules
## mmu05330 Allograft rejection
## mmu04940 Type I diabetes mellitus
## mmu05320 Autoimmune thyroid disease
## mmu04061 Viral protein interaction with cytokine and cytokine receptor
## GeneRatio BgRatio RichFactor FoldEnrichment zScore pvalue
## mmu04060 102/1789 197/6237 0.5177665 1.805092 7.281754 3.708189e-12
## mmu04514 67/1789 122/6237 0.5491803 1.914610 6.469841 8.066352e-10
## mmu05330 30/1789 40/6237 0.7500000 2.614729 6.497027 1.553279e-09
## mmu04940 30/1789 43/6237 0.6976744 2.432306 5.976684 2.571369e-08
## mmu05320 28/1789 39/6237 0.7179487 2.502988 5.970875 2.795788e-08
## mmu04061 43/1789 72/6237 0.5972222 2.082099 5.856559 3.358709e-08
## p.adjust qvalue
## mmu04060 1.268201e-09 1.046100e-09
## mmu04514 1.379346e-07 1.137780e-07
## mmu05330 1.770738e-07 1.460627e-07
## mmu04940 1.912319e-06 1.577413e-06
## mmu05320 1.912319e-06 1.577413e-06
## mmu04061 1.914464e-06 1.579183e-06
## geneID
## mmu04060 11482/16161/16189/16170/16172/16197/16160/56221/50931/50930/57916/16153/16164/21939/12981/66102/21948/16163/57266/12166/21943/21933/18414/16180/16169/12504/53603/16994/16992/83430/16193/12768/16174/17082/16182/16177/16178/20297/12765/16184/269275/54450/54448/12156/16159/18049/21949/21941/21942/22163/20311/57349/56744/17329/12769/12154/14562/60504/16186/24099/16168/20312/20299/20295/21813/12163/15945/20304/20293/20306/20296/12771/16324/12775/16173/22035/12458/257630/21936/230979/225392/215257/12778/246779/12145/227288/15977/209590/12772/218624/13051/16880/16847/56066/23832/242700/72049/16185/20301/56838/69583/21944
## mmu04514 16415/16421/94332/14991/14990/58205/20971/21939/74048/319504/18613/13003/58187/17968/13052/12524/12504/15001/12511/20970/54167/18566/241226/12481/22329/12483/16409/16408/12555/54725/18007/19274/18190/225825/171171/102657/15015/15000/14998/16456/104099/17967/52118/15002/270152/216856/67374/12525/71908/15039/15018/14969/14972/15040/15007/100043314/667977/69717/14963/110557/14964/14961/12519/100529082/15006/14999/15013
## mmu05330 16189/16160/14991/14990/16153/21939/12524/15001/16159/15015/15000/14998/15002/15039/15018/14969/14972/15040/15007/667977/69717/14963/110557/14964/14961/12519/100529082/15006/14999/15013
## mmu04940 16160/14991/14990/12524/15001/16992/16159/15015/15000/14998/12876/15002/15039/15018/14969/14972/15893/15040/15007/667977/69717/14963/110557/14964/14961/12519/100529082/15006/14999/15013
## mmu05320 16189/14991/14990/16153/21939/12524/15001/15015/15000/14998/15002/15039/15018/14969/14972/15040/15007/667977/69717/14963/110557/14964/14961/12519/100529082/15006/14999/15013
## mmu04061 56221/50930/16153/57266/21933/16992/16193/12768/16174/16182/20297/12765/16184/20311/57349/56744/17329/12769/16186/20312/20299/20295/15945/20304/20293/20306/20296/12771/12775/16173/22035/12458/230979/12778/12145/227288/12772/13051/56066/23832/16185/20301/56838
## Count
## mmu04060 102
## mmu04514 67
## mmu05330 30
## mmu04940 30
## mmu05320 28
## mmu04061 43
Visualize specific KEGG pathways
library("pathview")
mmu04217 <- pathview(gene.data = kegg_gene_list,
pathway.id = "mmu04217",
species = "mmu",
limit = list(gene=max(abs(kegg_gene_list)), cpd=1))
knitr::include_graphics("mmu04217.pathview.png")
Putting it all together
Before you go…. knit the document!!
Session info
## R version 4.4.3 (2025-02-28)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sequoia 15.3
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] pathview_1.46.0 cowplot_1.1.3 ggrepel_0.9.6
## [4] ggplot2_3.5.1 dplyr_1.1.4 knitr_1.50
## [7] enrichplot_1.26.6 ggnewscale_0.5.1 DOSE_4.0.1
## [10] org.Mm.eg.db_3.20.0 AnnotationDbi_1.68.0 IRanges_2.40.1
## [13] S4Vectors_0.44.0 Biobase_2.66.0 BiocGenerics_0.52.0
## [16] clusterProfiler_4.14.6
##
## loaded via a namespace (and not attached):
## [1] bitops_1.0-9 DBI_1.2.3 gson_0.1.0
## [4] rlang_1.1.5 magrittr_2.0.3 compiler_4.4.3
## [7] RSQLite_2.3.9 png_0.1-8 vctrs_0.6.5
## [10] reshape2_1.4.4 stringr_1.5.1 pkgconfig_2.0.3
## [13] crayon_1.5.3 fastmap_1.2.0 XVector_0.46.0
## [16] labeling_0.4.3 rmdformats_1.0.4 rmarkdown_2.29
## [19] graph_1.84.1 KEGGgraph_1.66.0 UCSC.utils_1.2.0
## [22] purrr_1.0.4 bit_4.6.0 xfun_0.51
## [25] zlibbioc_1.52.0 cachem_1.1.0 aplot_0.2.5
## [28] GenomeInfoDb_1.42.3 jsonlite_2.0.0 blob_1.2.4
## [31] BiocParallel_1.40.0 parallel_4.4.3 R6_2.6.1
## [34] bslib_0.9.0 stringi_1.8.7 RColorBrewer_1.1-3
## [37] jquerylib_0.1.4 GOSemSim_2.32.0 Rcpp_1.0.14
## [40] bookdown_0.42 ggtangle_0.0.6 R.utils_2.13.0
## [43] Matrix_1.7-3 splines_4.4.3 igraph_2.1.4
## [46] tidyselect_1.2.1 qvalue_2.38.0 rstudioapi_0.17.1
## [49] yaml_2.3.10 codetools_0.2-20 lattice_0.22-6
## [52] tibble_3.2.1 plyr_1.8.9 withr_3.0.2
## [55] treeio_1.30.0 KEGGREST_1.46.0 evaluate_1.0.3
## [58] gridGraphics_0.5-1 Biostrings_2.74.1 pillar_1.10.1
## [61] ggtree_3.14.0 ggfun_0.1.8 generics_0.1.3
## [64] RCurl_1.98-1.17 munsell_0.5.1 scales_1.3.0
## [67] tidytree_0.4.6 glue_1.8.0 lazyeval_0.2.2
## [70] tools_4.4.3 data.table_1.17.0 fgsea_1.32.0
## [73] XML_3.99-0.18 fs_1.6.5 fastmatch_1.1-6
## [76] grid_4.4.3 tidyr_1.3.1 ape_5.8-1
## [79] colorspace_2.1-1 nlme_3.1-168 GenomeInfoDbData_1.2.13
## [82] patchwork_1.3.0 cli_3.6.4 Rgraphviz_2.50.0
## [85] gtable_0.3.6 R.methodsS3_1.8.2 yulab.utils_0.2.0
## [88] sass_0.4.9 digest_0.6.37 ggplotify_0.1.2
## [91] org.Hs.eg.db_3.20.0 farver_2.1.2 memoise_2.0.1
## [94] htmltools_0.5.8.1 R.oo_1.27.0 lifecycle_1.0.4
## [97] httr_1.4.7 GO.db_3.20.0 bit64_4.6.0-1