🧬 Mastering Bulk RNA-seq Analysis in R – Post 16: GSEA - When Gene Lists Aren’t Enough! Link to heading
📈 The Evolution Beyond Gene Lists Link to heading
After mastering Over-Representation Analysis (ORA) with GO, KEGG, and Reactome, you’ve learned to ask: “Are my significant genes enriched in specific pathways?”
But what if the most important biological insights are hidden in genes that didn’t make your significance cutoff? What if subtle but coordinated changes across entire pathways are more meaningful than a few dramatically changing genes?
Welcome to Gene Set Enrichment Analysis (GSEA)—the method that uses your complete dataset to detect coordinated biological patterns that ORA completely misses.
🎯 ORA vs GSEA: The Paradigm Shift Link to heading
🔬 ORA: Testing Specific Genes One by One Link to heading
- Takes only genes with padj < 0.05 (maybe 5% of your data)
- Asks: “Are my significant genes enriched in this pathway?”
- Ignores 95% of your expression data
📈 GSEA: Watching Entire Pathways Coordinate Together Link to heading
- Uses ALL genes ranked by fold change
- Asks: “Is this entire pathway trending up or down across my complete dataset?”
- Detects coordinated patterns regardless of individual significance
The power: Even genes with modest changes contribute if they’re part of a coordinated pathway response.
🚀 Setting Up GSEA Analysis Link to heading
Required Packages Link to heading
# Core packages
library(clusterProfiler) # Main GSEA functions
library(enrichplot) # GSEA visualizations
library(org.Hs.eg.db) # Human gene annotations
library(ReactomePA) # Reactome pathways
Key Resources: - clusterProfiler: Bioconductor | GitHub - ReactomePA: Bioconductor
Preparing Your Ranked Gene List Link to heading
# Starting with DESeq2 results
# Method 1: Rank by log2FoldChange (simple)
gene_list <- results$log2FoldChange
names(gene_list) <- rownames(results)
# Method 2: Rank by signed p-value (more robust)
gene_list <- sign(results$log2FoldChange) * (-log10(results$pvalue))
names(gene_list) <- rownames(results)
# Clean and sort
gene_list <- gene_list[!is.na(gene_list)]
gene_list <- sort(gene_list, decreasing = TRUE)
# Check your list
head(gene_list, 5) # Top upregulated
tail(gene_list, 5) # Top downregulated
🔬 Running GSEA with Three Databases Link to heading
GO Biological Process Link to heading
# Convert to ENTREZ IDs
gene_list_entrez <- bitr(names(gene_list),
fromType = "SYMBOL",
toType = "ENTREZID",
OrgDb = org.Hs.eg.db)
# Map to ranked list
gene_list_entrez_ranked <- gene_list[gene_list_entrez$SYMBOL]
names(gene_list_entrez_ranked) <- gene_list_entrez$ENTREZID
# Run GO GSEA
gsea_go <- gseGO(geneList = gene_list_entrez_ranked,
OrgDb = org.Hs.eg.db,
ont = "BP",
minGSSize = 15,
maxGSSize = 500,
pvalueCutoff = 0.05)
KEGG Pathways Link to heading
gsea_kegg <- gseKEGG(geneList = gene_list_entrez_ranked,
organism = "hsa",
minGSSize = 15,
maxGSSize = 500,
pvalueCutoff = 0.05)
Reactome Pathways Link to heading
gsea_reactome <- gsePathway(geneList = gene_list_entrez_ranked,
organism = "human",
minGSSize = 15,
maxGSSize = 500,
pvalueCutoff = 0.05)
📊 Understanding GSEA Results Link to heading
Key Metrics Link to heading
# View top results
head(gsea_go@result, 5)
# Key columns:
# - NES: Normalized Enrichment Score (effect size and direction)
# - p.adjust: FDR corrected p-value
# - core_enrichment: Leading edge genes driving enrichment
🎯 Interpreting NES (Normalized Enrichment Score) Link to heading
- Positive NES: Pathway enriched in upregulated genes
- Negative NES: Pathway enriched in downregulated genes
- |NES| > 2.0: Very strong enrichment
- |NES| > 1.5: Strong enrichment
🎨 Essential GSEA Visualizations Link to heading
Single Pathway Enrichment Plot Link to heading
# Classic GSEA plot for top pathway
gseaplot2(gsea_go,
geneSetID = 1,
title = gsea_go$Description[1],
color = "green")
Multi-Database Comparison Link to heading
# Compare results across databases
p1 <- dotplot(gsea_go, showCategory = 10, title = "GO GSEA")
p2 <- dotplot(gsea_kegg, showCategory = 10, title = "KEGG GSEA")
p3 <- dotplot(gsea_reactome, showCategory = 10, title = "Reactome GSEA")
library(cowplot)
plot_grid(p1, p2, p3, ncol = 3)
🔥 Real-World Example: My KO vs WT Analysis Link to heading
Key Findings Across Databases Link to heading
🟨 Upregulated in KO (Yellow bars in plots):
-
KEGG: Glucose metabolism, glycolysis pathways strongly upregulated
-
Reactome: Gluconeogenesis and energy production coordinated upregulation
-
GO: Ion transport processes enhanced
🟦 Downregulated in KO (Blue bars in plots):
-
KEGG: Hippo signaling pathway suppressed
-
Reactome: Translation machinery heavily downregulated
-
GO: Wound healing and transcription processes coordinated downregulation
Cross-Database Validation Link to heading
# Quick summary
cat("GO significant pathways:", sum(gsea_go@result$p.adjust < 0.05), "\n")
cat("KEGG significant pathways:", sum(gsea_kegg@result$p.adjust < 0.05), "\n")
cat("Reactome significant pathways:", sum(gsea_reactome@result$p.adjust < 0.05), "\n")
Notice: Different databases reveal complementary insights into the same underlying biology—metabolism coordinately upregulated, translation coordinately downregulated.
⚠️ GSEA Red Flags to Watch Link to heading
🚩 All pathways show weak enrichment (|NES| < 1.0)
🚩 Contradictory results across databases
🚩 Only very large (>500) or small (<15) gene sets significant
🚩 No clear biological themes in top pathways
💡 GSEA vs ORA: When to Use What? Link to heading
Use ORA when:
-
You have clear significant gene lists
-
Simple pathway membership questions
-
Preliminary exploration
Use GSEA when:
-
Subtle coordinated changes expected
-
Want to use complete expression data
-
Studying regulatory networks or metabolic coordination
-
ORA shows few/no significant pathways
🔥 Bottom Line Link to heading
ORA tells you: “Which pathways contain your significant genes” GSEA tells you: “Which pathways are coordinately changing across your entire experiment”
My results show perfect examples: metabolism coordinately upregulated, translation coordinately downregulated—patterns that ORA’s arbitrary cutoffs would completely miss! 🎯
The subtle but coordinated biological responses detected by GSEA often represent the most important regulatory changes in your system.
🧪 What’s Next? Link to heading
Post 17: Advanced GSEA with Custom Gene Sets will teach you how to create and analyze your own pathway collections, use tissue-specific gene sets, and integrate external databases for specialized GSEA analyses tailored to your research questions.
Ready to customize GSEA for your unique research needs?
💬 Share Your Thoughts! Link to heading
Have you discovered coordinated pathway changes with GSEA that ORA missed? What subtle biological patterns has GSEA revealed in your data?
#RNAseq #GSEA #PathwayAnalysis #ClusterProfiler #EnrichmentAnalysis #SystemsBiology #DESeq2 #GO #KEGG #Reactome #Bioinformatics #RStats