Advanced scRNA-seq for Biologists — Post 3: Interpreting Pseudotime! 📊 Link to heading

You’ve run Monocle3! You’ve got pseudotime values assigned to every cell! Your UMAP shows a beautiful gradient from purple to yellow! Now comes the hard part: what do those numbers actually mean? 🤔

This is where many analyses go wrong! Not because the code failed, but because the interpretation did! 🎯

Pseudotime is NOT time ⏱️ Link to heading

Let’s get the most important point out of the way: pseudotime is not a measurement of actual time! It’s not hours! It’s not cell divisions! It’s not even a consistent rate of change! 🚫

Pseudotime is ordinal, not absolute! It tells you that cell A is further along the trajectory than cell B! It doesn’t tell you how much further, or how long it took to get there! 📊

Think of mile markers on a highway! If you see mile marker 50 and I see mile marker 30, we know you’re ahead of me! But we don’t know if you’re driving faster, if you started earlier, or if the road between us is straight or winding! The markers tell us relative position, nothing more! 🛣️

This has practical implications for how you interpret and present your results! 🎯

What pseudotime actually represents 🧠 Link to heading

Pseudotime measures distance along the learned trajectory graph, starting from the root you selected! That’s it! 📏

When you called order_cells() and clicked on a starting point, you defined pseudotime = 0! Every other cell gets a value based on how far it is from that root, following the trajectory structure! 🏁

This means:

Cells with similar pseudotime are at similar stages — they’re close to each other on the trajectory ✅
Cells with very different pseudotime are at different stages — one is “ahead” of the other ✅
The specific numbers are arbitrary — pseudotime 5 vs pseudotime 10 doesn’t mean “twice as far along” in any biologically meaningful sense ⚠️

The trajectory captures topology (what connects to what) and ordering (what comes before what)! It doesn’t capture kinetics (how fast things happen)! 🔬

Validation strategy 1: Known marker genes 🧬 Link to heading

The most straightforward validation is checking whether known biology aligns with your pseudotime ordering! 🎯

If you’re studying hematopoiesis, you know that stem cell markers (like CD34 in humans, or Kit in mice) should be expressed early! Mature lineage markers should come later! Plot these genes against pseudotime:

# Plot known early and late markers
plot_genes_in_pseudotime(cds[c("CD34", "CD38", "MPO", "CD14"), ],
                         color_cells_by = "cluster")

What you should see:

Early markers peak at low pseudotime — high expression near your root, decreasing as cells progress ✅
Late markers peak at high pseudotime — low expression early, increasing toward terminal states ✅
Intermediate markers peak in the middle — transient expression during transition phases ✅

If your known early markers peak at high pseudotime, you probably selected the wrong root! Go back to order_cells() and try again! 🔧

Validation strategy 2: Gene expression logic 🔬 Link to heading

Beyond individual markers, check whether the sequence of gene expression makes biological sense! 🧠

For differentiation, there’s usually a logical order: transcription factors that drive lineage commitment should turn on before the downstream structural genes they regulate! Signaling receptors should appear before their target genes are activated! ⏱️

Make a list of genes with known temporal relationships in your system! Plot them:

# Order matters: TF should come before its target
genes_in_order <- c("early_TF", "intermediate_gene", "late_effector")
plot_genes_in_pseudotime(cds[genes_in_order, ])

Do the curves respect the expected order? If a known downstream gene peaks before its upstream regulator, something is wrong — either with your trajectory or with the assumed regulatory relationship! 🤔

Validation strategy 3: Correlation with real time 📅 Link to heading

If your experimental design includes actual timepoints — say, cells harvested at day 0, day 3, day 7, and day 14 of differentiation — you have ground truth to compare against! 🎯

Pseudotime should correlate with real time! Not perfectly (biological variation means cells at the same timepoint are at different stages), but the trend should be clear:

# Check correlation between pseudotime and harvest day
cor.test(pseudotime(cds), colData(cds)$harvest_day)

# Visualize
plot(colData(cds)$harvest_day, pseudotime(cds),
     xlab = "Harvest Day", ylab = "Pseudotime")

What you should see:

Positive correlation — later timepoints have higher average pseudotime ✅
Overlap between timepoints — cells from day 3 and day 7 may have similar pseudotime (asynchronous differentiation is normal) ✅
No complete separation — if day 7 cells all have higher pseudotime than any day 3 cell, either your differentiation is unusually synchronous or something else is driving the trajectory 🤔

What’s concerning:

No correlation — pseudotime varies randomly with respect to real time ⚠️
Batch effects — if timepoints were processed in different batches and pseudotime correlates with batch more than harvest day, you have a technical artifact 🚩

Validation strategy 4: Independent experimental validation 🔬 Link to heading

The gold standard is validating trajectory predictions with independent experiments! 💪

If your trajectory predicts that Gene X turns on at a specific stage, can you confirm that with:

qPCR on sorted populations — sort cells at different stages (by surface markers) and check expression 🧪
Flow cytometry — if Gene X has a surface protein product, stain for it across your differentiation timecourse 📈
Immunofluorescence — visualize protein in tissue sections at different developmental stages 🔬
Perturbation experiments — if you knock out an “early” transcription factor, do “late” genes fail to turn on? ⚡

Computational predictions are hypotheses! Experiments test them! 🎯

Red flags: When to doubt your trajectory 🚩 Link to heading

Some warning signs that your pseudotime may not reflect real biology:

Batch correlates with pseudotime! If cells from batch 1 cluster at low pseudotime and batch 2 at high pseudotime, you’re probably looking at a technical artifact! Check with:

# Pseudotime by batch
boxplot(pseudotime(cds) ~ colData(cds)$batch)

Housekeeping genes appear as trajectory markers! When you run graph_test(), the top hits should be genes known to change during your biological process! If you see ribosomal genes, mitochondrial genes, or other housekeeping factors dominating, the trajectory may be capturing cell quality variation rather than biology! ⚠️

Known late genes peak early! This usually means wrong root selection! Easy to fix — just re-run order_cells()! 🔧

Pseudotime varies wildly within a single experimental condition! If all your cells came from one timepoint and one condition, but pseudotime spans the entire range, ask yourself: is there really that much biological variation, or is the algorithm finding structure in noise? 🤔

The trajectory connects populations that shouldn’t be connected! If your UMAP shows T cells and B cells, and the trajectory runs through both, that’s not a differentiation process — that’s an artifact of forcing trajectory analysis on discrete populations! ❌

Presenting results to collaborators 📊 Link to heading

When it’s time to show your trajectory analysis to your PI or in a paper, don’t lead with raw pseudotime values! Nobody cares that cell X has pseudotime 7.3! 😅

Instead, show:

Gene expression trends! Line plots of key marker genes across pseudotime! This tells a biological story: “As cells differentiate, Gene A turns off while Gene B turns on!” 📈

Validation evidence! Show that known markers behave as expected! This builds confidence that the trajectory reflects reality! ✅

Branch point interpretation! If you have branching trajectories, identify what’s different about each branch! Which genes are expressed in branch A but not B? What biological process does each branch represent? 🌳

Integration with other data! How does pseudotime relate to your experimental conditions? Treatment effects? Genotype differences? 🔬

The goal is to translate computational output into biological insight! Pseudotime is the scaffolding; the biology is the building! 🏗️

Practical checklist before moving forward ✅ Link to heading

Before you trust your pseudotime results:

Known early markers peak at low pseudotime
Known late markers peak at high pseudotime
Gene expression order matches expected biology
Pseudotime correlates with real timepoints (if available)
Batch effects are not driving the trajectory
Top trajectory genes are biologically sensible
Branch points (if any) have interpretable differences

If you can check most of these boxes, your trajectory is probably capturing real biology! If you can’t, go back and troubleshoot before building further analysis on top! 🔧

Bottom line 🎯 Link to heading

Pseudotime is a hypothesis about how your cells are ordered along a biological process! The algorithm doesn’t know your biology — it finds structure in the data! Your job is to test whether that structure matches reality! 🧠

Validation isn’t optional! It’s what separates meaningful trajectory analysis from computational pattern-matching! 💪

In Post 4, we add another dimension: RNA velocity! Where pseudotime tells you where cells are, velocity tells you where they’re going! It’s like adding arrows to your roadmap! 🏹

See you there! 🙏