Tidyverse in R: Data Science Made Easy
The Tidyverse is a collection of R packages designed for data science. It provides a consistent and intuitive framework for data manipulation, visualization, and modeling.
1. What is the Tidyverse?
A cohesive set of packages for data science workflows:
# Install once, then load the core tidyverse in each session
install.packages("tidyverse")
library(tidyverse)
# Core packages include:
# - ggplot2 (visualization)
# - dplyr (data manipulation)
# - tidyr (data tidying)
# - readr (data import)
# - purrr (functional programming)
# - tibble (modern data frames)
# - stringr (string manipulation)
# - forcats (factor handling)
# - lubridate (dates and times, core since tidyverse 2.0.0)
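To see which packages `library(tidyverse)` actually attaches, a quick check like the following may help. It is a minimal sketch that assumes the tidyverse is already installed. `readxl` is used only as an illustrative example of a package that ships with the tidyverse but is not attached by default.

```r
library(tidyverse)

# All packages that belong to the tidyverse (core and non-core)
all_pkgs <- tidyverse_packages()

# Core packages are attached to the search path by library(tidyverse)
dplyr_attached <- "package:dplyr" %in% search()

# readxl is installed with the tidyverse but has to be loaded explicitly
readxl_in_tidyverse <- "readxl" %in% all_pkgs

print(dplyr_attached)
print(readxl_in_tidyverse)
```

Non-core packages such as `readxl` need their own `library()` call before use.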
Tidyverse Basics Quiz
Which package is NOT part of the core tidyverse?
2. Data Import (readr)
Fast and user-friendly data import functions:
# Read CSV files
students <- read_csv("students.csv")
# Read TSV files
survey <- read_tsv("survey_results.tsv")
# Column specification
sales <- read_csv(
  "sales_data.csv",
  col_types = cols(
    date = col_date("%Y-%m-%d"),
    region = col_factor(),
    revenue = col_double()
  )
)
# Write data back to files
write_csv(students, "students_clean.csv")
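The file names above refer to files that may not exist on your machine. The sketch below shows the same column specification on a self-contained round trip: it writes a small, made-up table to a temporary file and reads it back. The object names `sales_demo`, `tmp`, and `sales_back`, and the data values, are illustrative assumptions.

```r
library(readr)
library(tibble)

# Made-up example data, stored as text so the parsers have work to do
sales_demo <- tibble(
  date = c("2023-01-15", "2023-02-20"),
  region = c("North", "South"),
  revenue = c(1200.5, 980)
)

# Write to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
write_csv(sales_demo, tmp)

# Read it back with an explicit column specification
sales_back <- read_csv(
  tmp,
  col_types = cols(
    date = col_date("%Y-%m-%d"),
    region = col_factor(),
    revenue = col_double()
  )
)

print(sales_back)
```

Supplying `col_types` avoids relying on readr's type guessing and makes the import fail loudly if a column does not match the expected format.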
Data Import Quiz
Which function reads comma-separated values?
3. Data Wrangling (dplyr)
The grammar of data manipulation:
# Create a tibble (modern data frame)
tib <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  score = c(88, 92, 85, 79, 95)
)
# Key verbs:
# Filter rows
tib %>% filter(score > 85)
# Select columns
tib %>% select(name, score)
# Create new columns
tib %>% mutate(grade = ifelse(score >= 90, "A", "B"))
# Summarize data
tib %>% summarize(avg_score = mean(score))
# Grouped operations
tib %>%
  mutate(grade = ifelse(score >= 90, "A", "B")) %>%
  group_by(grade) %>%
  summarize(count = n(), avg = mean(score))
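The verbs above can be chained into a single pipeline. The following sketch reuses the same `tib` data and, as an assumption beyond the original example, swaps `ifelse()` for `case_when()` to assign three grades and adds `arrange()` to sort the result. The grade cut-offs are illustrative.

```r
library(dplyr)
library(tibble)

tib <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  score = c(88, 92, 85, 79, 95)
)

grade_summary <- tib %>%
  # case_when() checks conditions in order and uses the first match
  mutate(grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE ~ "C"
  )) %>%
  group_by(grade) %>%
  summarize(count = n(), avg = mean(score)) %>%
  # Sort groups from highest to lowest average score
  arrange(desc(avg))

print(grade_summary)
```

Each step receives the output of the previous one, so the pipeline reads top to bottom in the order the operations happen.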
dplyr Quiz
Which dplyr verb creates new columns?
4. Data Tidying (tidyr)
Getting your data into a tidy format:
# Pivoting data
sales <- tibble(
  product = c("A", "B", "C"),
  Q1 = c(120, 150, 90),
  Q2 = c(140, 110, 95)
)
# Wide to long
sales_long <- sales %>%
  pivot_longer(cols = c(Q1, Q2),
               names_to = "quarter",
               values_to = "revenue")
# Long to wide
sales_wide <- sales_long %>%
  pivot_wider(names_from = quarter,
              values_from = revenue)
# Handling missing values
df <- tibble(x = c(1, NA, 3), y = c("a", NA, "c"))
df %>% drop_na() # Remove rows with NAs
df %>% replace_na(list(x = 0, y = "unknown")) # Replace NAs
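One reason to prefer the long format is that grouped summaries become straightforward. The sketch below, using the same `sales` data, totals revenue per quarter and then checks that pivoting back to wide recovers the original table. The names `quarter_totals` and `round_trip_ok` are introduced here for illustration.

```r
library(tidyr)
library(dplyr)
library(tibble)

sales <- tibble(
  product = c("A", "B", "C"),
  Q1 = c(120, 150, 90),
  Q2 = c(140, 110, 95)
)

# Wide to long: one row per product and quarter
sales_long <- sales %>%
  pivot_longer(cols = c(Q1, Q2),
               names_to = "quarter",
               values_to = "revenue")

# In long format, quarter is an ordinary column we can group by
quarter_totals <- sales_long %>%
  group_by(quarter) %>%
  summarize(total = sum(revenue))

# Long to wide again, then compare with the original table
sales_wide <- sales_long %>%
  pivot_wider(names_from = quarter, values_from = revenue)

round_trip_ok <- isTRUE(all.equal(as.data.frame(sales),
                                  as.data.frame(sales_wide)))

print(quarter_totals)
print(round_trip_ok)
```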
tidyr Quiz
Which function converts data from wide to long format?
5. Data Visualization (ggplot2)
Create publication-quality graphics:
# Basic scatterplot
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
# With aesthetics
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(title = "Engine Size vs Highway MPG",
       x = "Engine Displacement (L)",
       y = "Highway MPG")
# Faceting
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~class)
# Saving plots (ggsave() saves the most recently displayed plot by default)
ggsave("mpg_plot.png", width = 8, height = 6, dpi = 300)
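Because `ggsave()` defaults to the last plot displayed, it can be safer to store the plot in a variable and pass it explicitly. The sketch below does that and writes to a temporary directory. The names `p` and `out_file` are illustrative, and the example assumes a PNG graphics device is available on your system.

```r
library(ggplot2)

# Store the plot in an object instead of only printing it
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(title = "Engine Size vs Highway MPG",
       x = "Engine Displacement (L)",
       y = "Highway MPG")

# Save that specific plot, regardless of what was displayed last
out_file <- file.path(tempdir(), "mpg_plot.png")
ggsave(out_file, plot = p, width = 8, height = 6, dpi = 300)

print(file.exists(out_file))
```

Width and height are in inches by default, so these settings produce a 2400 x 1800 pixel image at 300 dpi.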
ggplot2 Quiz
Which function adds points to a ggplot?
6. Functional Programming (purrr)
Elegant alternatives to loops:
# Map functions
numbers <- list(1:5, 6:10, 11:15)
# Apply function to each element
map(numbers, mean) # Returns list
map_dbl(numbers, mean) # Returns numeric vector
# Anonymous functions (formula shorthand)
map(numbers, ~ .x * 2) # Double every value in each vector
# Working with data frames
mtcars %>%
  split(.$cyl) %>% # Split by cylinder
  map(~ lm(mpg ~ wt, data = .x)) %>% # Model for each group
  map(summary) # Get summaries
# Safely handle errors
safe_log <- safely(log)
results <- map(list(1, 10, "a"), safe_log)
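Each element returned by a `safely()`-wrapped function is a list with two components, `result` and `error`, exactly one of which is `NULL`. The sketch below shows one way to separate successes from failures. The names `ok`, `values`, and `messages` are introduced here for illustration.

```r
library(purrr)

safe_log <- safely(log)
results <- map(list(1, 10, "a"), safe_log)

# TRUE where the call succeeded (no error was captured)
ok <- map_lgl(results, ~ is.null(.x$error))

# Pull the numeric results out of the successful calls only
values <- map_dbl(results[ok], "result")

# Collect the error messages from the failed calls
messages <- map_chr(results[!ok], ~ conditionMessage(.x$error))

print(ok)
print(values)
print(messages)
```

This pattern lets a long `map()` run to completion even when some inputs are bad, so failures can be inspected afterwards.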
purrr Quiz
Which map variant returns a numeric vector?
7. String Manipulation (stringr)
Consistent functions for working with strings:
# Basic string operations
text <- c("Data Science", "Machine Learning", "R Programming")
str_length(text) # Count characters
str_to_upper(text) # Convert to uppercase
str_sub(text, 1, 4) # Extract substrings
# Pattern matching
str_detect(text, "Science") # TRUE/FALSE
str_subset(text, "ing") # Return matches
str_extract(text, "[A-Za-z]+") # Extract the first match (here, the first word)
# String splitting
dates <- c("2023-01-15", "2022-11-30")
str_split(dates, "-", simplify = TRUE)
# String replacement
str_replace(text, " ", "_") # Replace first space
str_replace_all(text, " ", "_") # Replace all spaces
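Every string in `text` contains only one space, so `str_replace()` and `str_replace_all()` give the same result there. The sketch below uses a made-up string with two spaces to show the difference, and then turns the split dates into typed columns. The names `title` and `date_parts` are illustrative.

```r
library(stringr)
library(tibble)

# A string with more than one space makes the difference visible
title <- "Exploratory Data Analysis"
first_only <- str_replace(title, " ", "_")     # replaces the first space only
all_spaces <- str_replace_all(title, " ", "_") # replaces every space

print(first_only)
print(all_spaces)

# str_split(..., simplify = TRUE) returns a character matrix,
# one row per input string and one column per piece
dates <- c("2023-01-15", "2022-11-30")
parts <- str_split(dates, "-", simplify = TRUE)

# Convert the pieces to integers and give them names
date_parts <- tibble(
  year = as.integer(parts[, 1]),
  month = as.integer(parts[, 2]),
  day = as.integer(parts[, 3])
)

print(date_parts)
```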
stringr Quiz
Which function detects if a pattern exists?
8. Factor Handling (forcats)
Tools for working with categorical variables:
# Create factors
survey <- tibble(
  response = factor(c("Agree", "Neutral", "Disagree", "Agree"))
)
# Reorder factors
survey %>%
  mutate(response = fct_relevel(response, "Disagree", "Neutral", "Agree"))
# Count observations per factor level
fct_count(survey$response)
# Lump small categories
countries <- factor(c("USA", "UK", "France", "USA", "Germany", "UK"))
fct_lump_n(countries, n = 2) # Keep the 2 most frequent levels, lump the rest into "Other"
# Reorder by frequency
survey %>%
  mutate(response = fct_infreq(response))
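Reordering changes the level order, not the data, so the effect is easiest to see by inspecting `levels()`. The sketch below uses a made-up `responses` vector in which every level has a different count, which keeps the frequency ordering unambiguous. All object names are illustrative.

```r
library(forcats)

# Made-up responses: Neutral x3, Agree x2, Disagree x1
responses <- factor(c("Neutral", "Agree", "Neutral",
                      "Disagree", "Neutral", "Agree"))

# By default, factor() sorts levels alphabetically
default_levels <- levels(responses)

# fct_relevel() sets the order by hand
manual_levels <- levels(
  fct_relevel(responses, "Disagree", "Neutral", "Agree")
)

# fct_infreq() orders levels from most to least frequent
freq_levels <- levels(fct_infreq(responses))

print(default_levels)
print(manual_levels)
print(freq_levels)
```

Level order matters mainly for plotting and modeling, where it controls axis order and the reference category.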
forcats Quiz
Which function reorders factors by frequency?