🔬 Tidyverse Series – Post 3: Reshaping & Cleaning Data with `tidyr`

🛠 Why `tidyr`?

Data often arrives in messy, inconsistent, or poorly structured formats. tidyr is designed to reshape, clean, and structure data into a tidy format, where each variable is a column and each observation is a row, which is easy to analyze and visualize. Whether you need to pivot, separate, unite, or handle missing values, tidyr provides a dedicated function for each task.

✔️ Why Use `tidyr`?

Transforms messy data into structured formats
Works perfectly with dplyr for smooth data wrangling
Simplifies complex reshaping tasks

Let’s explore the key functions in tidyr, with detailed explanations, code examples, and expected outputs!

📚 Essential `tidyr` Functions

➡️ `pivot_longer()`: Convert Wide Data to Long Format

In many datasets, values are stored in wide format, with one column per sample or time point, which makes them awkward to filter or summarize. pivot_longer() reshapes wide data into long format, with one row per observation, so the values are easier to filter, summarize, and visualize.

🔹 Example: Reshaping Gene Expression Data

Before (`wide format`)

Gene	Sample_1	Sample_2	Sample_3
TP53	12.3	10.5	14.2
BRCA1	8.9	9.2	10.1

Using `pivot_longer()`

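library(tidyr)
library(dplyr)

# Wide table from the "Before" example
df <- data.frame(Gene = c("TP53", "BRCA1"),
                 Sample_1 = c(12.3, 8.9),
                 Sample_2 = c(10.5, 9.2),
                 Sample_3 = c(14.2, 10.1))

# Stack every Sample_* column into name/value pairs
df_long <- df %>%
  pivot_longer(cols = starts_with("Sample"),
               names_to = "Sample",
               values_to = "Expression")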

After (`long format`)

Gene	Sample	Expression
TP53	Sample_1	12.3
TP53	Sample_2	10.5
TP53	Sample_3	14.2
BRCA1	Sample_1	8.9
BRCA1	Sample_2	9.2
BRCA1	Sample_3	10.1

✅ Now, this structure allows for easy filtering and statistical analysis!
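To make "easy filtering and statistical analysis" concrete, here is a minimal sketch that rebuilds the example table and then filters and summarizes the long data with dplyr. The object names `tp53_only` and `gene_means`, and the column name `Mean_Expression`, are illustrative choices, not part of the original post.

```r
library(tidyr)
library(dplyr)

# Same toy values as the wide "Before" table
df <- data.frame(
  Gene = c("TP53", "BRCA1"),
  Sample_1 = c(12.3, 8.9),
  Sample_2 = c(10.5, 9.2),
  Sample_3 = c(14.2, 10.1)
)

df_long <- df %>%
  pivot_longer(cols = starts_with("Sample"),
               names_to = "Sample",
               values_to = "Expression")

# Filtering works on ordinary columns once the data is long
tp53_only <- df_long %>%
  filter(Gene == "TP53")

# One mean per gene, computed across all samples
# (hypothetical output column name: Mean_Expression)
gene_means <- df_long %>%
  group_by(Gene) %>%
  summarise(Mean_Expression = mean(Expression), .groups = "drop")
```

In the wide layout the same mean would require naming every sample column by hand; in the long layout the code stays the same no matter how many samples are added.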

➡️ `pivot_wider()`: Convert Long Data to Wide Format

Sometimes, data stored in long format needs to be expanded back into wide format.

Example: Converting Long Format Back to Wide

df_wide <- df_long %>%
  pivot_wider(names_from = "Sample",
              values_from = "Expression")

📌 This recreates the original wide layout, reversing the pivot_longer() operation. The result is a tibble with one column per distinct value of Sample.
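A quick way to convince yourself that the two pivots are inverses is to run the round trip and compare the result with the starting table. The sketch below assumes the toy gene expression values from the pivot_longer() example; `round_trip_ok` is an illustrative name.

```r
library(tidyr)
library(dplyr)

# Toy wide table from the pivot_longer() example
df <- data.frame(
  Gene = c("TP53", "BRCA1"),
  Sample_1 = c(12.3, 8.9),
  Sample_2 = c(10.5, 9.2),
  Sample_3 = c(14.2, 10.1)
)

# Wide -> long
df_long <- df %>%
  pivot_longer(cols = starts_with("Sample"),
               names_to = "Sample",
               values_to = "Expression")

# Long -> wide again
df_wide <- df_long %>%
  pivot_wider(names_from = "Sample",
              values_from = "Expression")

# pivot_wider() returns a tibble, so convert before comparing
round_trip_ok <- isTRUE(all.equal(as.data.frame(df_wide), df))
```

The round trip is only this clean when each Gene/Sample pair appears once in the long data; duplicated pairs would need to be aggregated or disambiguated first.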

➡️ `separate()`: Splitting One Column into Multiple Columns

Often, a single column contains multiple pieces of information that should be split into separate columns.

Example: Splitting Sample Names into Condition & Replicate

# Assumes Sample holds values such as "Control_1" (see the Before table)
df_separated <- df_long %>%
  separate(Sample, into = c("Condition", "Replicate"), sep = "_")

Before

Gene	Sample	Expression
TP53	Control_1	12.3
TP53	Control_2	10.5

After

Gene	Condition	Replicate	Expression
TP53	Control	1	12.3
TP53	Control	2	10.5

✅ Now, Condition and Replicate are separate columns, making analysis easier.
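One detail worth knowing: by default separate() leaves the new columns as character, so Replicate holds "1" and "2" rather than numbers. The sketch below, built on a small hand-made data frame named `samples` (an illustrative name), uses the `convert = TRUE` argument to let tidyr guess a better type. Recent tidyr releases also mark separate() as superseded in favor of separate_wider_delim(); the older function still works and is kept here to match the post.

```r
library(tidyr)

# Hand-made long data matching the "Before" table
samples <- data.frame(
  Gene = c("TP53", "TP53"),
  Sample = c("Control_1", "Control_2"),
  Expression = c(12.3, 10.5)
)

# Default: both new columns are character
sep_default <- separate(samples, Sample,
                        into = c("Condition", "Replicate"),
                        sep = "_")

# convert = TRUE: Replicate becomes an integer column
sep_converted <- separate(samples, Sample,
                          into = c("Condition", "Replicate"),
                          sep = "_",
                          convert = TRUE)

class(sep_default$Replicate)
class(sep_converted$Replicate)
```

A typed Replicate column matters as soon as you sort or filter on it, since "10" sorts before "2" as text.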

➡️ `unite()`: Combining Multiple Columns into One

unite() is the opposite of separate(). It merges multiple columns into a single column, with a specified separator.

Example: Creating a Unique Identifier from Multiple Columns

df_united <- df_separated %>%
  unite("Sample_ID", Condition, Replicate, sep = "_")

Before

Gene	Condition	Replicate	Expression
TP53	Control	1	12.3
TP53	Control	2	10.5

After

Gene	Sample_ID	Expression
TP53	Control_1	12.3
TP53	Control_2	10.5

✅ Now, the Condition and Replicate columns are combined into a single Sample_ID column.
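By default unite() drops the columns it merges. If you still need Condition and Replicate for grouping, the `remove = FALSE` argument keeps them next to the new identifier. A minimal sketch on a hand-made version of the separated table:

```r
library(tidyr)

# Hand-made table matching the "Before" example
df_separated <- data.frame(
  Gene = c("TP53", "TP53"),
  Condition = c("Control", "Control"),
  Replicate = c("1", "2"),
  Expression = c(12.3, 10.5)
)

# Default behavior: Condition and Replicate are replaced by Sample_ID
df_united <- unite(df_separated, "Sample_ID", Condition, Replicate,
                   sep = "_")

# remove = FALSE: Sample_ID is added and the source columns stay
df_united_keep <- unite(df_separated, "Sample_ID", Condition, Replicate,
                        sep = "_", remove = FALSE)

names(df_united)
names(df_united_keep)
```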

➡️ `drop_na()`: Removing Missing Values

Handling missing values is essential to ensure clean data.

Example: Removing Rows with Missing Values

df_clean <- df %>%
  drop_na()

✅ This removes every row that contains a missing (NA) value in any column.
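Dropping on every column can discard more rows than intended. drop_na() also accepts column names, so you can restrict the check to the columns that matter. The sketch below uses a made-up data frame `df_na`; the `Batch` column and its values are purely illustrative.

```r
library(tidyr)

# Made-up data: one NA in Expression, one NA in Batch
df_na <- data.frame(
  Gene = c("TP53", "BRCA1", "EGFR"),
  Expression = c(12.3, NA, 7.8),
  Batch = c("A", "B", NA)
)

# No columns named: a row is dropped if ANY column is NA
all_complete <- drop_na(df_na)

# Only Expression is checked: the EGFR row survives despite its NA Batch
has_expression <- drop_na(df_na, Expression)

nrow(all_complete)
nrow(has_expression)
```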

➡️ `replace_na()`: Replacing Missing Values

Instead of removing missing values, you might want to replace them with a default value.

Example: Replacing Missing Values with Zero

# Use the long table: it is the one that has an Expression column
df_filled <- df_long %>%
  replace_na(list(Expression = 0))

✅ This replaces all NA values in the Expression column with 0.
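replace_na() works both on a data frame (with a named list) and on a single vector, which is handy inside mutate(). The sketch below shows both forms on a small made-up table; filling with the column median is only one possible choice and is shown as an illustration, not a recommendation for expression data.

```r
library(tidyr)
library(dplyr)

# Made-up long table with one missing measurement
df_long <- data.frame(
  Gene = c("TP53", "BRCA1", "EGFR"),
  Expression = c(12.3, NA, 7.8)
)

# Data frame form: named list of column = replacement value
df_filled <- df_long %>%
  replace_na(list(Expression = 0))

# Vector form inside mutate(): fill with the median of the observed values
df_median <- df_long %>%
  mutate(Expression = replace_na(Expression,
                                 median(Expression, na.rm = TRUE)))
```

The replacement must have the same type as the column, which is why the numeric Expression column is filled with a number rather than a label such as "missing".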

📊 Complete Workflow: Cleaning & Reshaping Data

Let’s go through a complete example, from messy data to clean, structured data.

library(tidyr)
library(dplyr)

# Sample messy dataset
df <- data.frame(
  Gene = c("TP53", "BRCA1", "EGFR"),
  Control_1 = c(12.3, NA, 7.8),
  Control_2 = c(10.5, 9.2, 8.9)
)

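# Reshape & clean
df_cleaned <- df %>%
  # One row per gene/sample pair
  pivot_longer(cols = starts_with("Control"),
               names_to = "Sample",
               values_to = "Expression") %>%
  # "Control_1" -> Condition = "Control", Replicate = "1"
  separate(Sample, into = c("Condition", "Replicate"), sep = "_") %>%
  # Drop the BRCA1 / Control_1 row, whose Expression is NA
  drop_na()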

✅ This pipeline reshapes, cleans, and structures the dataset, leaving five complete rows with the columns Gene, Condition, Replicate, and Expression.
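As a possible variation, pivot_longer() can do the split and the NA removal in the same call through its `names_sep` and `values_drop_na` arguments. This is a sketch of an alternative, not the workflow from the post; `df_cleaned_alt` is an illustrative name.

```r
library(tidyr)
library(dplyr)

# Same messy dataset as in the workflow
df <- data.frame(
  Gene = c("TP53", "BRCA1", "EGFR"),
  Control_1 = c(12.3, NA, 7.8),
  Control_2 = c(10.5, 9.2, 8.9)
)

df_cleaned_alt <- df %>%
  pivot_longer(cols = starts_with("Control"),
               # Split "Control_1" into two name columns at the underscore
               names_to = c("Condition", "Replicate"),
               names_sep = "_",
               values_to = "Expression",
               # Skip cells whose value is NA while pivoting
               values_drop_na = TRUE)

nrow(df_cleaned_alt)
```

The two approaches are not fully interchangeable: `values_drop_na` only looks at the pivoted values, whereas a trailing drop_na() also removes rows with an NA in any other column.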

📈 Key Takeaways

✅ tidyr is essential for reshaping and cleaning data.
✅ pivot_longer() and pivot_wider() make restructuring seamless.
✅ separate() and unite() allow flexible column manipulation.
✅ Handling missing values is easy with drop_na() and replace_na().
✅ Works perfectly alongside dplyr for efficient data workflows.

📌 Next up: Combining Data Efficiently – Joins & Merging with dplyr! Stay tuned! 🚀

👇 How often do you reshape data in your analysis? Let’s discuss!

#Tidyverse #tidyr #RStats #DataScience #Bioinformatics #OpenScience #ComputationalBiology
