🧬 Foundations of Genomic Data Handling in R – Post 6: GRangesList Link to heading

🚀 Why GRangesList? Link to heading

After mastering GRanges to represent single sets of genomic intervals with biological context, you often need to handle groups of such ranges—in exons grouped by transcript, peaks grouped by sample, or variants grouped by chromosome. Enter GRangesList, the list-like container that preserves both the structure and metadata of each GRanges element.

A GRangesList holds multiple GRanges objects, each identified by a name (e.g., transcript ID, sample name). This enables modular, grouped analyses at scale.


🔧 Creating a GRangesList Object Link to heading

library(GenomicRanges)

# First, create individual GRanges objects
gr1 <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = c(1, 20), end = c(10, 30)),
  strand   = "+"
)
gr2 <- GRanges(
  seqnames = "chr2",
  ranges   = IRanges(start = c(100, 150), end = c(120, 170)),
  strand   = "-"
)

# Combine into a GRangesList, naming each element
grl <- GRangesList(
  transcript1 = gr1,
  transcript2 = gr2
)

# Inspect
grl

Output:

GRangesList object of length 2:
$transcript1
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1       1-10      +
  [2]     chr1      20-30      +

$transcript2
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr2     100-120      -
  [2]     chr2     150-170      -

🔍 Accessor Functions Link to heading

length(grl)     # Number of GRanges elements (here, 2)
names(grl)      # Element names: "transcript1", "transcript2"
grlist[[1]]    # Extract the first GRanges
unlist(grl)     # Flatten to a single GRanges with grouping info
  • length(grl) returns how many groups you have.
  • names(grl) returns the group identifiers.
  • [[ ]] indexing accesses individual GRanges elements.
  • unlist() flattens the list into one big GRanges, embedding a group metadata column via an Rle under the hood.

🔗 Integration: IRanges + Rle + GRangesList Link to heading

  • IRanges powers interval logic inside each GRanges.
  • Rle compresses repeated metadata (e.g., group names) when you unlist.
  • GRangesList organizes multiple GRanges into a coherent, nested structure.
# After unlisting, check the group Rle
flat <- unlist(grl)
mcols(flat)$group  # An Rle vector of length(flat) identifying each original GRanges
runLength(mcols(flat)$group)  # Shows run lengths per group

This combination yields efficient storage and fast operations on millions of intervals across groups.


🛠 Core Uses of GRangesList Link to heading

  • Exons by transcript: Extract exon ranges per transcript and calculate transcript coverage.
  • Peaks by sample: Store ChIP-Seq or ATAC-Seq peak sets for each sample in one object.
  • Variants by chromosome: Group variant calls by chromosome or sample.
  • Temporal or condition grouping: Organize time-series or treatment groups for multi-sample analyses.
# Example: compute per-transcript coverage
cov_list <- lapply(grl, function(x) coverage(x))
# cov_list is now a list of Rle coverage vectors per transcript

📈 Real-World Example: Alternative Splicing Analysis Link to heading

  1. Extract exons grouped by transcript from a TxDb object:

    library(GenomicFeatures)
    txdb <- makeTxDbFromGFF("annotation.gtf")
    exons_by_tx <- exonsBy(txdb, by = "tx")  # returns GRangesList
    
  2. Filter transcripts based on exon counts or lengths:

    exon_counts <- elementNROWS(exons_by_tx)
    filtered <- exons_by_tx[exon_counts > 3]
    
  3. Compute coverage per transcript:

    cov_tx <- lapply(filtered, function(tx) sum(width(tx)))  # total exon length per transcript
    

This workflow uses GRangesList to maintain transcript-level structure, enabling targeted isoform analyses.


🚀 Why GRangesList Matters Link to heading

  • Preserves biological grouping in your data model.
  • Enables group-wise operations without losing context.
  • Integrates seamlessly into SummarizedExperiment for multi-assay experiments.
  • Supports large-scale and nested genomic workflows.

Mastering GRangesList equips you to handle complex genomic groupings—essential for transcriptomics, epigenomics, and beyond.


🧬 What’s Next? Link to heading

Coming up: TxDb & GenomicFeatures — the bridge between annotation files and GRanges/GRangesList objects, empowering automated gene model extraction and feature annotation. 🎯


💬 Share Your Thoughts! Link to heading

How are you leveraging GRangesList in your pipelines? Any tips or tricks to share? Drop a comment below! 👇

#Bioinformatics #RStats #GRangesList #GenomicData #Bioconductor #IRanges #Rle #Transcriptomics #ComputationalBiology #AlternativeSplicing #NGS #Genomics #GroupedData #Isoforms #TxDb