Advanced scRNA-seq for Biologists — Post 6: Choosing Methods and Troubleshooting! 🛠️ Link to heading

Over the last six posts, we’ve built your trajectory analysis toolkit: the biological foundations, Monocle3 implementation, pseudotime interpretation, RNA velocity, and PAGA for exploratory analysis! Now it’s time for the practical wisdom that usually takes years to accumulate: how do you actually make decisions when analyzing real-world data? 🧠

This post is about the messy reality of trajectory analysis — choosing between methods, fixing things when they break, and knowing when to walk away! 💪

The method selection framework 🗺️ Link to heading

There’s no single “best” trajectory method! The right choice depends on your data, your question, and your computational comfort zone! Here’s how to think through it: 🤔

Question 1: Is your trajectory linear or branching?

If you expect a single, linear progression (one starting population differentiating into one endpoint), most methods will work! If you expect branching (a progenitor that can become multiple different cell types), you need a method that explicitly handles this! 🌳

  • Linear trajectories: Monocle3, Slingshot, PAGA, or even principal curve methods ✅
  • Branching trajectories: Monocle3 (excellent branching support), Slingshot (handles multiple lineages), PAGA (graph-based, flexible) ✅

Question 2: Do you have prior knowledge about starting cells?

Root selection matters enormously for pseudotime! If you know which cells are your progenitors (based on markers, experimental design, or prior literature), you can select the root confidently! 🎯

If you don’t know where to start:

  • RNA velocity can suggest directionality 🏹
  • Multiple root selections can test robustness 🔄
  • Marker gene validation becomes even more critical 🔬

Question 3: How many cells do you have?

Trajectory methods need cells at each stage of the process to accurately reconstruct it! 📊

  • Under 1,000 cells: May struggle if transitions are poorly sampled! Consider whether you have enough cells in intermediate states! ⚠️
  • 1,000–5,000 cells: Most methods work well with reasonable coverage! ✅
  • Over 10,000 cells: Computational considerations become relevant; some methods scale better than others! 💻

Question 4: R or Python?

This sounds trivial but matters for reproducibility and integration with your existing workflow! 🔧

  • R ecosystem: Monocle3, Slingshot, integrate naturally with Seurat 📦
  • Python ecosystem: PAGA (Scanpy), scVelo 🐍

You can transfer objects between ecosystems, but it adds complexity! 🤷

Method quick reference 📋 Link to heading

Monocle3 (R) 🧭

  • Handles branching trajectories well
  • Integrates with Seurat via SeuratWrappers
  • Interactive root selection
  • graph_test() for trajectory-dependent genes
  • Best for: General-purpose trajectory analysis in R

Slingshot (R) 🛤️

  • Elegant handling of multiple lineages
  • Returns smooth curves, not just graphs
  • Good for: Well-separated lineages, when you want lineage-specific pseudotime

PAGA (Python/Scanpy) 🕸️

  • Graph abstraction approach — summarizes relationships between clusters
  • Good for exploratory analysis before committing to detailed trajectories
  • Best for: Initial exploration, complex datasets with unclear structure

scVelo (Python) 🌊

  • RNA velocity analysis
  • Adds directionality to any embedding
  • Requires spliced/unspliced quantification
  • Best for: Confirming trajectory direction, dynamic populations

Common problems and solutions 🔧 Link to heading

Problem: Trajectory connects populations that shouldn’t be connected

Your UMAP shows T cells and B cells, and Monocle3 draws a trajectory through both!

Diagnosis: Trajectory methods assume continuity! If you have discrete populations, they’ll try to connect them anyway!

Solution: Subset your data to only include biologically related cells before trajectory analysis! If you’re studying T cell differentiation, remove the B cells first! ✂️

Problem: Pseudotime correlates strongly with batch 🚩

Cells from batch 1 all have low pseudotime; cells from batch 2 all have high pseudotime!

Diagnosis: This is an upstream integration problem! Your trajectory is capturing technical variation, not biology!

Solution: Go back and fix your batch correction! Harmony, Seurat integration, or other methods should remove batch effects before trajectory analysis! If batch effects persist in your UMAP, they’ll persist in your trajectory! 🔧

Problem: No clear trajectory structure emerges 🤷

Your UMAP shows distinct clusters, but learn_graph() produces a messy graph that doesn’t make biological sense!

Diagnosis: Maybe there isn’t a trajectory! If your cells represent genuinely discrete states with no transitional populations, forcing a trajectory won’t help!

Solution: Ask yourself: do I actually expect continuous transitions in this data? If not, standard clustering analysis may be more appropriate! Not every dataset needs trajectory analysis! 📊

Problem: Too many or too few branch points 🌳

The learned graph has branches everywhere (or no branches when you expect them)!

Diagnosis: Graph complexity is influenced by data density and algorithm parameters!

Solutions:

  • Adjust learn_graph() parameters (e.g., minimal_branch_len) 🔧
  • Check cell density in putative branch regions 📊
  • Consider whether your biological expectation matches the data 🤔

Problem: Pseudotime gradient is reversed 🔄

Your most differentiated cells have the lowest pseudotime!

Diagnosis: Wrong root selection! This is common and easy to fix! 😅

Solution: Re-run order_cells() and select a different root! Validate with known markers! ✅

Problem: Trajectory works but dynamic genes don’t make sense 🤔

graph_test() returns genes that don’t match expected differentiation markers!

Diagnosis: Either the trajectory doesn’t reflect real biology, or your expectations need updating!

Solutions:

  • Check that top genes aren’t technical artifacts (ribosomal, mitochondrial) 🔬
  • Validate trajectory with known markers first ✅
  • Consider whether you’re discovering something new 💡

The troubleshooting mindset 🧠 Link to heading

When trajectory analysis produces unexpected results, systematically consider three possibilities:

1. Data quality issue 🔧

Is the problem upstream? Bad integration, ambient RNA, doublets, low-quality cells — all of these can create false trajectories or mask real ones! Most trajectory problems are actually data problems!

2. Method choice issue 🛠️

Is this method appropriate for your data? Linear methods fail on branching data! Methods that assume specific noise models may not fit your library preparation! Sometimes switching methods resolves the issue!

3. Biological reality 🧬

Sometimes the “problem” is your biology! Cells may not follow the trajectory you expected! This is actually a discovery, not an error! 💡

Start with data quality, then method choice, then consider that your results might be correct! 🎯

Validation checklist ✅ Link to heading

Before trusting any trajectory analysis:

  • Known early markers peak at low pseudotime
  • Known late markers peak at high pseudotime
  • Pseudotime doesn’t correlate with batch more than biology
  • Top trajectory genes are biologically plausible
  • Branch points (if any) represent interpretable fate decisions
  • Results are robust to different root selections within the same biological starting point
  • RNA velocity (if used) supports the pseudotime direction
  • PAGA cluster connectivity agrees with detailed trajectory structure

If most boxes aren’t checked, go back and troubleshoot! 🔧

When to walk away 🚶 Link to heading

Not every dataset benefits from trajectory analysis! Walk away when:

  • Your cells represent genuinely discrete types with no expected transitions ❌
  • Technical artifacts dominate and can’t be corrected 🚩
  • You don’t have enough cells to resolve transitions 📉
  • The results don’t validate despite multiple attempts ⚠️
  • You’re forcing a trajectory because the method exists, not because the biology demands it 🤷

There’s no shame in concluding that clustering analysis is sufficient for your data! Trajectory analysis is a tool, not an obligation! 🎯

Bringing it all together 🏗️ Link to heading

This series has given you the foundations for trajectory analysis: the biological reasoning (Post 1), the Monocle3 workflow (Post 2), pseudotime interpretation (Post 3), RNA velocity (Post 4), PAGA for exploratory analysis (Post 5), and now practical decision-making (Post 6)! 📚

The technical skills matter, but the real expertise is judgment — knowing when trajectory analysis fits your data, choosing appropriate methods, validating results against known biology, and troubleshooting when things don’t work! 🧠

That judgment develops with practice! Run these analyses on your own data! Make mistakes! Learn what works for your system! 💪

What’s next 🔮 Link to heading

Part A of the Advanced scRNA-seq series is complete! You can now analyze trajectories, interpret pseudotime, add directionality with RNA velocity, explore cluster connectivity with PAGA, and make informed method choices! 🎉

Part B is coming: Cell-Cell Communication! We’ll explore how cells talk to each other through ligand-receptor interactions, using tools like CellChat and NicheNet! If trajectories tell you how cells change over time, communication analysis tells you how cells influence each other in space! 📡

Different question, complementary insights! 🧬

See you there! 🙏