🧬 Foundations of Genomic Data Handling in R – Post 6: GRangesList
🚀 Why GRangesList?
After mastering GRanges to represent single sets of genomic intervals with biological context, you often need to handle groups of such ranges: exons grouped by transcript, peaks grouped by sample, or variants grouped by chromosome. Enter GRangesList, the list-like container that preserves both the structure and metadata of each GRanges element.
A GRangesList holds multiple GRanges objects, each usually identified by a name (e.g., transcript ID, sample name). This enables modular, grouped analyses at scale.
🔧 Creating a GRangesList Object
library(GenomicRanges)
# First, create individual GRanges objects
gr1 <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(1, 20), end = c(10, 30)),
  strand = "+"
)
gr2 <- GRanges(
  seqnames = "chr2",
  ranges = IRanges(start = c(100, 150), end = c(120, 170)),
  strand = "-"
)
# Combine into a GRangesList, naming each element
grl <- GRangesList(
  transcript1 = gr1,
  transcript2 = gr2
)
# Inspect
grl
Output:
GRangesList object of length 2:
$transcript1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1      1-10      +
  [2]     chr1     20-30      +
$transcript2
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2   100-120      -
  [2]     chr2   150-170      -
🔍 Accessor Functions
length(grl) # Number of GRanges elements (here, 2)
names(grl) # Element names: "transcript1", "transcript2"
grl[[1]] # Extract the first GRanges
unlist(grl) # Flatten to a single GRanges; names record each range's source element
- length(grl) returns how many groups you have.
- names(grl) returns the group identifiers.
- [[ ]] indexing accesses individual GRanges elements.
- unlist() flattens the list into one big GRanges. It does not add a group metadata column; instead, each range is named after the list element it came from, so names(unlist(grl)), or an Rle built from those names, recovers the grouping.
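Flattening is most useful when you can get back to the grouped form afterwards. Here is a minimal sketch of that round trip, assuming the toy coordinates from the creation example; relist() uses the original list as a skeleton to restore the grouping.

```r
library(GenomicRanges)

# Toy data mirroring the creation example (coordinates are illustrative)
grl <- GRangesList(
  transcript1 = GRanges("chr1", IRanges(start = c(1, 20), end = c(10, 30)), strand = "+"),
  transcript2 = GRanges("chr2", IRanges(start = c(100, 150), end = c(120, 170)), strand = "-")
)

# Flatten without names, so the rebuilt elements carry no leftover range names
flat <- unlist(grl, use.names = FALSE)

# Work on 'flat' here (any operation that keeps length and order), then
# restore the grouping by using the original list as a skeleton
regrouped <- relist(flat, grl)

length(flat)              # total number of ranges across all elements
elementNROWS(regrouped)   # ranges per element, matching elementNROWS(grl)
```

This pattern only holds if the step between unlist() and relist() keeps the number and order of ranges unchanged.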
🔗 Integration: IRanges + Rle + GRangesList
- IRanges powers interval logic inside each GRanges.
- Rle compresses repeated values, such as the seqnames and strand of a GRanges, or the group labels you recover after unlisting.
- GRangesList organizes multiple GRanges into a coherent, nested structure.
# After unlisting, recover the grouping from the names
# (unlist() does not create mcols(flat)$group)
flat <- unlist(grl)
group <- Rle(names(flat)) # An Rle of length(flat) identifying each range's source element
runLength(group) # Shows run lengths per group
This combination yields efficient storage and fast operations on millions of intervals across groups.
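If a plain table of group membership is handier than an Rle, coercing the list to a data frame is one option. The sketch below assumes the coercion behaves as in current Bioconductor releases, where the result carries `group` (integer index) and `group_name` (element name) columns alongside the range coordinates.

```r
library(GenomicRanges)

# Toy data mirroring the creation example (coordinates are illustrative)
grl <- GRangesList(
  transcript1 = GRanges("chr1", IRanges(start = c(1, 20), end = c(10, 30)), strand = "+"),
  transcript2 = GRanges("chr2", IRanges(start = c(100, 150), end = c(120, 170)), strand = "-")
)

# One row per range; group and group_name record the source element
df <- as.data.frame(grl)
colnames(df)

# Count ranges per group from the table
table(df$group_name)
```

This is convenient for quick inspection or export, but the data frame gives up the compact storage of the GRangesList, so it is best kept for small subsets.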
🛠 Core Uses of GRangesList
- Exons by transcript: Extract exon ranges per transcript and calculate transcript coverage.
- Peaks by sample: Store ChIP-Seq or ATAC-Seq peak sets for each sample in one object.
- Variants by chromosome: Group variant calls by chromosome or sample.
- Temporal or condition grouping: Organize time-series or treatment groups for multi-sample analyses.
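The "peaks by sample" case can be sketched by splitting one flat GRanges on a metadata column. The peak coordinates and the `sample` column below are hypothetical, made up purely for illustration.

```r
library(GenomicRanges)

# Hypothetical peak calls from two samples, stored in one flat GRanges
peaks <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr2", "chr3"),
  ranges = IRanges(start = c(100, 500, 200, 900, 50), width = 200),
  sample = c("sampleA", "sampleB", "sampleA", "sampleA", "sampleB")
)

# split() groups the ranges by the 'sample' metadata column
peaks_by_sample <- split(peaks, peaks$sample)

names(peaks_by_sample)          # one element per sample
elementNROWS(peaks_by_sample)   # number of peaks in each sample
```

The same call works for any grouping factor, so splitting by chromosome or by condition follows the same shape.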
# Example: compute per-transcript coverage
cov_list <- lapply(grl, coverage)
# cov_list is now a list with one RleList per transcript (one Rle per sequence level)
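Many group-wise summaries do not need lapply() at all, because common range operations are vectorized over the list elements. A small sketch, again assuming the toy coordinates from the creation example:

```r
library(GenomicRanges)

# Toy data mirroring the creation example (coordinates are illustrative)
grl <- GRangesList(
  transcript1 = GRanges("chr1", IRanges(start = c(1, 20), end = c(10, 30)), strand = "+"),
  transcript2 = GRanges("chr2", IRanges(start = c(100, 150), end = c(120, 170)), strand = "-")
)

n_ranges    <- elementNROWS(grl)   # number of ranges in each element
total_width <- sum(width(grl))     # summed range widths per element
span        <- range(grl)          # overall span of each element
merged      <- reduce(grl)         # merge overlapping ranges within each element
```

Each call returns one result per list element with the element names kept, which tends to be faster than an explicit loop on large lists.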
📈 Real-World Example: Alternative Splicing Analysis
- Extract exons grouped by transcript from a TxDb object:
library(GenomicFeatures)
# Note: in recent Bioconductor releases, makeTxDbFromGFF() is provided by the txdbmaker package
txdb <- makeTxDbFromGFF("annotation.gtf")
exons_by_tx <- exonsBy(txdb, by = "tx") # returns a GRangesList
- Filter transcripts based on exon counts or lengths:
exon_counts <- elementNROWS(exons_by_tx)
filtered <- exons_by_tx[exon_counts > 3]
- Compute total exon length per transcript:
tx_lengths <- sum(width(filtered)) # one value per transcript, named by transcript
This workflow uses GRangesList to maintain transcript-level structure, enabling targeted isoform analyses.
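To try the filter-and-summarise steps without an annotation file, here is a self-contained sketch on made-up data. The exon coordinates and the transcript IDs `tx1` to `tx3` are hypothetical, and split() stands in for exonsBy().

```r
library(GenomicRanges)

# Hypothetical exons for three transcripts (tx1: 2 exons, tx2: 4, tx3: 5),
# each exon 50 bp wide
exons <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(
    start = c(1, 101, 1, 101, 201, 301, 1001, 1101, 1201, 1301, 1401),
    width = 50
  ),
  strand = "+"
)
tx_id <- rep(c("tx1", "tx2", "tx3"), times = c(2, 4, 5))

# Stand-in for exonsBy(txdb, by = "tx"): group exons by transcript ID
exons_by_tx <- split(exons, tx_id)

# Keep transcripts with more than 3 exons
exon_counts <- elementNROWS(exons_by_tx)
filtered <- exons_by_tx[exon_counts > 3]

# Total exon length per remaining transcript
tx_lengths <- sum(width(filtered))
tx_lengths
```

Here `tx1` is dropped because it has only two exons, and the remaining lengths are simply the exon count times 50.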
🚀 Why GRangesList Matters
- Preserves biological grouping in your data model.
- Enables group-wise operations without losing context.
- Integrates with SummarizedExperiment: a GRangesList can serve as the rowRanges of a RangedSummarizedExperiment, tying each assay row to a group of ranges.
- Supports large-scale and nested genomic workflows.
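The SummarizedExperiment point can be sketched as follows, assuming the SummarizedExperiment package is installed. The count matrix and sample names are hypothetical; the only requirement is one matrix row per list element.

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy data mirroring the creation example (coordinates are illustrative)
grl <- GRangesList(
  transcript1 = GRanges("chr1", IRanges(start = c(1, 20), end = c(10, 30)), strand = "+"),
  transcript2 = GRanges("chr2", IRanges(start = c(100, 150), end = c(120, 170)), strand = "-")
)

# Hypothetical count matrix: one row per list element, one column per sample
counts <- matrix(
  c(10L, 5L, 3L, 8L),
  nrow = 2,
  dimnames = list(names(grl), c("sampleA", "sampleB"))
)

# The GRangesList becomes the row annotation of the experiment
se <- SummarizedExperiment(assays = list(counts = counts), rowRanges = grl)

rowRanges(se)                  # the GRangesList, one element per row
sub <- se["transcript2", ]     # subsetting rows keeps counts and ranges in sync
```

Subsetting the experiment by row subsets the ranges and the counts together, which is the main reason to keep them in one object.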
Mastering GRangesList equips you to handle complex genomic groupings—essential for transcriptomics, epigenomics, and beyond.
🧬 What’s Next?
Coming up: TxDb & GenomicFeatures — the bridge between annotation files and GRanges/GRangesList objects, empowering automated gene model extraction and feature annotation. 🎯
💬 Share Your Thoughts!
How are you leveraging GRangesList in your pipelines? Any tips or tricks to share? Drop a comment below! 👇
#Bioinformatics #RStats #GRangesList #GenomicData #Bioconductor #IRanges #Rle #Transcriptomics #ComputationalBiology #AlternativeSplicing #NGS #Genomics #GroupedData #Isoforms #TxDb