🧬 Mastering Bulk RNA-seq Analysis in R – Post 13: From Gene Lists to Biology with GO Enrichment! Link to heading
🛤️ The Million-Dollar Question: What Do My Genes Actually DO? Link to heading
Picture this: You’ve just completed your DESeq2 analysis and proudly generated a spreadsheet with 500 differentially expressed genes. Your PI walks over, glances at your screen full of cryptic gene names like ENSG00000139618
and ENSG00000141510
, and asks the question that makes every bioinformatician’s heart skip a beat:
“So… what do these genes actually DO biologically?”
Suddenly, your triumph feels hollow. You’ve got the what (which genes changed), but you’re missing the why (what biological processes are affected). This is where Gene Ontology (GO) enrichment analysis swoops in like a superhero, transforming your mysterious gene list into a coherent biological story.
🎯 The Fishing Expedition Analogy Link to heading
Think of GO enrichment analysis like the world’s most sophisticated fishing expedition:
The Setup Link to heading
- The Lake: Your entire genome (~20,000 genes)
- Your Catch: Your differentially expressed genes (let’s say 200)
- The Question: “Did I catch more salmon than expected by random chance?”
The Magic Moment Link to heading
Let’s say salmon represent genes involved in “immune response”:
- Total salmon in lake: 500 genes
- Expected salmon in random catch: ~5 genes
- Actual salmon you caught: 50 genes
- The revelation: You caught 10x more immune genes than expected!
This isn’t luck—your experimental treatment specifically activated immune-related processes.
💡 Gene Ontology: Your Biological Dictionary Link to heading
GO organizes the chaos of biological knowledge into three neat categories:
🔬 Biological Process Link to heading
What cellular events the gene participates in
Examples: “cell division”, “immune response”, “DNA repair”
🧬 Molecular Function Link to heading
What the gene product does biochemically
Examples: “DNA binding”, “kinase activity”, “transcription factor”
📍 Cellular Component Link to heading
Where in the cell the gene product hangs out
Examples: “nucleus”, “mitochondria”, “cell membrane”
Pro Tip: Start with Biological Process—it usually gives the most interpretable results for understanding your experiment’s biological impact!
🚀 Hands-On: Running GO Analysis Like a Pro Link to heading
Step 1: Setup Your Arsenal Link to heading
# Load the heavy hitters
library(clusterProfiler)
library(org.Hs.eg.db) # Human gene annotations
library(ggplot2)
library(enrichplot)
Step 2: Prepare Your Gene Lists (The Critical Step!) Link to heading
# ALWAYS separate up and down-regulated genes!
upregulated_genes <- rownames(results[results$log2FoldChange > 1 &
results$padj < 0.05, ])
downregulated_genes <- rownames(results[results$log2FoldChange < -1 &
results$padj < 0.05, ])
# Check your lists
cat("Upregulated genes:", length(upregulated_genes), "\n")
cat("Downregulated genes:", length(downregulated_genes), "\n")
Step 3: Run the Analysis Link to heading
# GO enrichment for upregulated genes
go_up <- enrichGO(gene = upregulated_genes,
OrgDb = org.Hs.eg.db,
ont = "BP", # Biological Process
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
readable = TRUE)
# Quick peek at results
head(go_up@result, 10)
Step 4: Create Stunning Visualizations Link to heading
# The publication-ready dotplot
dotplot(go_up, showCategory = 20) +
ggtitle("Upregulated Genes: GO Enrichment") +
theme(axis.text.y = element_text(size = 10))
# Clean barplot for presentations
barplot(go_up, showCategory = 15) +
theme(axis.text = element_text(size = 12))
📊 Decoding Your Results: The Detective’s Guide Link to heading
The Essential Columns Link to heading
Column | What It Means | What You Want |
---|---|---|
Description | The biological process name | Specific, interpretable terms |
p.adjust | Corrected p-value | < 0.05 (this is what matters!) |
Count | Your genes in this category | 5-50 genes (sweet spot) |
GeneRatio | Fraction of your genes | Shows enrichment strength |
🔥 Real Results Interpretation Link to heading
# Example results that make you smile
# Description p.adjust Count GeneRatio
# 1 immune response 1.2e-08 45 45/200
# 2 T cell activation 3.4e-06 23 23/200
# 3 cell cycle regulation 8.9e-05 18 18/200
Translation into English: “Your treatment triggered a massive immune response, specifically activating T cells, while also affecting cell proliferation. This looks like a classic inflammatory response with cellular activation!”
⚠️ Red Flags and Rookie Mistakes Link to heading
🚩 Warning Signs in Your Results Link to heading
Super-generic terms dominating
- “metabolic process” or “cellular process”
- These are too broad to be useful—dig deeper!
Thousands of “significant” terms
- Your cutoffs are too lenient
- Tighten your p-value or fold-change thresholds
No significant enrichment at all
- Weak biological signal or overly strict cutoffs
- Check if your gene IDs are converting properly
Results that contradict known biology
- Time to double-check your experimental setup
- Verify your gene lists and sample labels
😅 Common Mistakes That Kill Analyses Link to heading
Mixing up and down-regulated genes
# DON'T do this
all_de_genes <- c(upregulated_genes, downregulated_genes)
# DO this instead
go_up <- enrichGO(upregulated_genes, ...)
go_down <- enrichGO(downregulated_genes, ...)
Ignoring fold-change magnitude
- Don’t just use p-value cutoffs
- Include meaningful effect sizes (|log2FC| > 1)
Trusting gene ID conversion blindly
# Always check conversion success
converted <- bitr(your_genes, fromType = "ENSEMBL",
toType = "ENTREZID", OrgDb = org.Hs.eg.db)
conversion_rate <- nrow(converted) / length(your_genes)
cat("Conversion success:", round(conversion_rate * 100, 1), "%\n")
🎨 Visualization Showdown: Which Plot When? Link to heading
🏆 For Publications: Dotplot Link to heading
dotplot(go_results, showCategory = 15, font.size = 12) +
scale_color_gradient(low = "blue", high = "red") +
theme_minimal()
Why it wins: Shows both significance (color) and gene count (dot size) elegantly
📊 For Presentations: Barplot Link to heading
barplot(go_results, showCategory = 10) +
coord_flip() +
theme(axis.text = element_text(size = 12))
Why it works: Clean, readable from the back of the room
🕸️ For Exploration: Network Plot Link to heading
cnetplot(go_results, showCategory = 10, circular = TRUE)
Why it’s cool: Shows which genes belong to which processes
🧪 Quality Control: The Sanity Checks Link to heading
✅ The Four Essential Tests Link to heading
1. Biological Sense Test
- Do enriched processes match your experimental hypothesis?
- Can you explain the results to your grandmother?
2. Specificity Test
- Are terms specific enough to generate testable hypotheses?
- “T cell activation” > “immune response” > “biological process”
3. Literature Test
- Do similar experiments show related processes?
- Time for a quick PubMed search!
4. Gene-Level Validation
# Check specific genes in enriched pathways
go_genes <- go_results@result$geneID[1] # Top pathway
cat("Genes in top pathway:", go_genes)
🎯 The Ultimate Reality Check Link to heading
Your GO enrichment should transform your story from:
Before: The Gene List Dump Link to heading
“We identified 347 differentially expressed genes, including ENSG00000139618, ENSG00000141510, ENSG00000157764, and 344 others (Supplementary Table 1)…”
After: The Biological Narrative Link to heading
“Our treatment specifically activated immune response pathways, particularly T cell activation and inflammatory signaling (adjusted p < 1e-6), while simultaneously upregulating cell cycle processes, suggesting a coordinated cellular response involving both immune activation and proliferative signaling.”
That’s the difference between data and discovery!
💪 Advanced Pro Tips Link to heading
Separate Analysis Strategy Link to heading
# Get more nuanced insights
go_up_specific <- enrichGO(upregulated_genes, ont = "BP",
pvalueCutoff = 0.01) # Stricter
go_down_specific <- enrichGO(downregulated_genes, ont = "BP",
pvalueCutoff = 0.01)
# Compare and contrast
compareCluster(list(Up = upregulated_genes,
Down = downregulated_genes),
fun = "enrichGO", OrgDb = org.Hs.eg.db)
Custom Visualization Tricks Link to heading
# Make it publication-pretty
dotplot(go_up, showCategory = 20) +
scale_size_continuous(range = c(2, 10)) +
scale_color_gradient(low = "#3182bd", high = "#de2d26") +
theme_minimal() +
theme(
axis.text.y = element_text(size = 11),
plot.title = element_text(size = 14, face = "bold")
) +
labs(title = "GO Enrichment: Upregulated Genes",
x = "Gene Ratio",
y = "GO Terms")
🧪 What’s Next? Link to heading
Post 14: KEGG Pathway Analysis will take your functional analysis to the next level! We’ll explore how to map your genes onto specific biochemical pathways and understand the molecular mechanisms driving your biological processes.
Ready to dive deeper into metabolic and signaling pathways? Let’s connect the dots between genes and mechanisms!
💬 Share Your Thoughts! Link to heading
What’s the most surprising biological process you’ve discovered through GO enrichment? Any horror stories about misinterpreting results? Drop your experiences below! 👇
#RNAseq #GeneOntology #ClusterProfiler #EnrichmentAnalysis #ORA #FunctionalAnnotation #DESeq2 #Bioinformatics #RStats #SystemsBiology