Advanced scRNA-seq for Biologists — Post 4: RNA Velocity with scVelo! 🌊 Link to heading

Pseudotime tells you where cells are on a trajectory! But it doesn’t tell you which direction they’re moving! A cell positioned between two clusters could be differentiating forward, dedifferentiating backward, or sitting in a stable intermediate state! 🤔

RNA velocity adds the missing piece: directionality! It predicts where each cell is heading based on its current transcriptional dynamics! Think of it as adding arrows to your trajectory map! 🏹

The biological insight behind velocity 🧬 Link to heading

The core idea is elegant: transcription leaves a trail! 💡

When a gene is actively being transcribed, newly made RNA molecules still contain introns — the non-coding sequences that will be spliced out before the mRNA is used! These unspliced transcripts are a signature of recent transcription! 🔬

When transcription slows or stops, the unspliced pool depletes (existing transcripts get spliced and eventually degraded), and you’re left with mostly spliced transcripts! 📉

By comparing the ratio of unspliced to spliced RNA for each gene, we can infer:

Gene is ramping up: More unspliced RNA than expected — recent burst of transcription — expression is increasing ⬆️
Gene is ramping down: Less unspliced RNA than expected — transcription slowed — expression is decreasing ⬇️
Gene is at steady state: Unspliced and spliced are in equilibrium — stable expression ⚖️

Now here’s the key: aggregate this information across hundreds of genes, and you get a prediction of the cell’s future state! If most genes are ramping toward the expression profile of a mature cell type, that’s where the cell is heading! 🎯

The technical requirements 🛠️ Link to heading

RNA velocity requires information that standard scRNA-seq analysis discards: the unspliced transcripts! ⚠️

Your standard count matrix (what comes out of Cell Ranger) only contains spliced counts! The unspliced reads are filtered out during alignment to exons! To do velocity analysis, you need to go back to the raw data! 🔧

What you need:

Raw FASTQ files — the original sequencing reads, not just the count matrix 📁
A tool to quantify spliced and unspliced counts separately: 🧪
- Velocyto — the original tool, runs on BAM files
- kb-count (kallisto|bustools) — faster, runs directly on FASTQs
scVelo — the Python package for velocity analysis (or velocyto.R for R users, though scVelo is more actively developed) 🐍

This means velocity isn’t something you can add retrospectively to most existing analyses — you need the raw sequencing data! 📊

The scVelo workflow ⚡ Link to heading

Assuming you’ve generated spliced/unspliced counts with velocyto or kb-count, here’s the analysis flow:

import scvelo as scv

# Load your data (with spliced/unspliced layers)
adata = scv.read("your_data.h5ad")

# Basic preprocessing
scv.pp.filter_and_normalize(adata)
scv.pp.moments(adata)

# Estimate velocity
scv.tl.velocity(adata, mode="stochastic")  # or "dynamical" for more accuracy
scv.tl.velocity_graph(adata)

# Visualize
scv.pl.velocity_embedding_stream(adata, basis="umap")

The output is a UMAP (or other embedding) with streamlines showing the predicted direction of cell movement! Arrows flowing from progenitors to mature cells confirm your differentiation trajectory! Arrows pointing in unexpected directions raise questions worth investigating! 🎯

What good results look like ✅ Link to heading

When velocity analysis works well:

Consistent directionality! Arrows across a population generally point the same direction! Random, chaotic arrows suggest noise rather than signal! 🧭

Biological sense! Velocity arrows point from known progenitors toward known mature cells! The directions match your pseudotime ordering! 🧬

Reproducibility! Different velocity modes (stochastic vs dynamical) give similar overall patterns! 🔄

Gene-level support! When you examine individual genes known to change during differentiation, their velocity (positive or negative) matches expectation! 📊

Check specific genes:

scv.pl.velocity(adata, var_names=["Stem_marker", "Mature_marker"])

You should see the stem marker with negative velocity (turning off) and the mature marker with positive velocity (turning on) in differentiating cells! 🎯

When velocity works well 💪 Link to heading

Velocity analysis shines when:

Active transcriptional changes are occurring! Differentiation, activation, cell cycle — processes where genes are being turned on and off! 🔬

You have good sequencing depth! Unspliced reads are rarer than spliced reads! Low-depth libraries may not capture enough unspliced counts for reliable inference! 📊

The biology is transcriptionally driven! Velocity measures transcriptional dynamics! If your biological process is driven primarily by post-transcriptional regulation (protein degradation, translational control), velocity may miss it! 🧠

You’re looking at dynamic populations! Quiescent cells in steady state won’t show strong velocity signals — there’s nothing to predict if nothing is changing! ⚡

When to be cautious ⚠️ Link to heading

Velocity analysis can fail or mislead when:

Cells are transcriptionally stable! Mature, quiescent cells have low velocity magnitude! That’s biologically correct but doesn’t tell you much! 🤷

Sequencing depth is low! Insufficient unspliced counts lead to noisy estimates! The spliced/unspliced ratio becomes unreliable! 📉

The steady-state assumption is violated! Standard velocity models assume genes reach a balance between transcription and degradation! In rapidly changing systems or cells with unusual RNA kinetics, this may not hold! 🔬

Post-transcriptional regulation dominates! If gene expression is primarily controlled by mRNA stability or translation rather than transcription, velocity (which measures transcription rate) may not capture the relevant biology! 🧬

Technical artifacts create spurious signal! Ambient RNA contamination, doublets, or batch effects can create velocity patterns that don’t reflect real biology! 🚩

Dynamical vs stochastic mode 🔧 Link to heading

scVelo offers two main velocity estimation modes:

Stochastic mode (mode="stochastic"): Faster, simpler, uses a steady-state assumption! Good for initial exploration! ⚡

Dynamical mode (mode="dynamical"): More accurate, learns kinetic parameters for each gene, handles transient dynamics better! Use for final analysis and publication figures! 🎯

# More accurate but slower
scv.tl.recover_dynamics(adata)
scv.tl.velocity(adata, mode="dynamical")

If stochastic and dynamical modes give very different results, investigate further — the biology or data quality may be more complex than assumed! 🤔

Integration with pseudotime 🔗 Link to heading

Velocity and pseudotime complement each other:

Pseudotime orders cells along a trajectory but doesn’t specify direction ⏱️
Velocity predicts direction but doesn’t provide a numerical ordering 🏹

In combination:

Use pseudotime to define the trajectory structure 📊
Use velocity to confirm directionality 🧭
Check that arrows flow from low to high pseudotime ✅

If velocity arrows point against your pseudotime gradient, something needs investigation! Either your root selection was wrong, or the biology is more complex (dedifferentiation, cycling, etc.)! 🤔

Common pitfalls 🚩 Link to heading

Over-interpreting low-magnitude velocity! Cells with very low velocity confidence are essentially unpredictable! Don’t draw strong conclusions from weak signals! ⚠️

Ignoring gene-level validation! The pretty streamlines are aggregate summaries! Check that individual genes make sense before trusting the big picture! 🔬

Assuming velocity = fate! Velocity predicts short-term direction based on current transcriptional state! Cells can change course! It’s a weather forecast, not a destiny! 🌤️

Forgetting about technical variation! If your velocity patterns correlate with batch, sequencing depth, or other technical factors, be skeptical! 🤷

Bottom line 🎯 Link to heading

RNA velocity adds directionality to trajectory analysis — it tells you where cells are going, not just where they are! When it works, it’s powerful validation that your trajectory reflects real differentiation dynamics! 💪

But it requires additional preprocessing (spliced/unspliced quantification), works best on actively changing populations, and can mislead when technical or biological assumptions are violated! ⚠️

Use it as a complementary tool, not a replacement for biological reasoning! 🧠

In Post 5, we’ll explore PAGA — a method that gives you a bird’s eye view of cluster connectivity before diving into detailed trajectory analysis! 🕸️

See you in Post 5! 🙏