R Data Manipulation with dplyr

The dplyr package provides a grammar of data manipulation: a small set of verb functions (filter, select, mutate, arrange, summarize) that each do one job and combine into readable data-wrangling pipelines.

1. dplyr Basics

Core dplyr functions for data manipulation:

# Load dplyr
library(dplyr)

# Create a sample data frame
df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age = c(25, 30, 35, 40, 45),
  score = c(80, 85, 90, 95, 100)
)

# View the data
glimpse(df)

# Basic verbs:
# - filter(): subset rows
# - select(): select columns
# - mutate(): create new columns
# - arrange(): sort rows
# - summarize(): aggregate data
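
As a minimal sketch of the five verbs listed above, the block below calls each one once on the same sample data. The result names (`older`, `two_cols`, `with_ratio`, `by_score`, `overall`) and the `ratio` column are illustrative choices, not part of the tutorial.

```r
library(dplyr)

# Same sample data as above
df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age = c(25, 30, 35, 40, 45),
  score = c(80, 85, 90, 95, 100)
)

# One call per verb; each returns a new tibble and leaves df unchanged
older      <- filter(df, age >= 35)                   # rows for Charlie, David, Eve
two_cols   <- select(df, name, score)                 # only the name and score columns
with_ratio <- mutate(df, ratio = score / age)         # adds a fifth column
by_score   <- arrange(df, desc(score))                # highest score first
overall    <- summarize(df, avg_score = mean(score))  # a single summary row
```

Because no verb modifies its input, a result has to be assigned (as here) if later code needs it.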

dplyr Basics Quiz

Which dplyr function is used to select columns?

filter()
select()
mutate()

2. Filtering and Selecting

Subset your data by rows and columns:

# Filter rows (like WHERE in SQL)
df %>% filter(age > 30)

# Multiple conditions
df %>% filter(age > 30, score >= 90)

# Select columns
df %>% select(name, score)

# Select helpers:
df %>% select(starts_with("s"))  # score
df %>% select(ends_with("e"))    # name, age, score
df %>% select(contains("a"))     # name, age

# Rename columns
df %>% rename(student_name = name)
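
A few more filtering and selecting patterns, sketched on the same sample data. The result names (`either`, `picked`, `no_id`, `renamed`) and the new column name `student` are assumptions made for illustration.

```r
library(dplyr)

df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age = c(25, 30, 35, 40, 45),
  score = c(80, 85, 90, 95, 100)
)

# Comma-separated conditions are combined with AND; use | for OR
either <- df %>% filter(age < 30 | score == 100)

# %in% keeps rows whose value appears in a set
picked <- df %>% filter(name %in% c("Bob", "David"))

# A leading minus drops a column instead of keeping it
no_id <- df %>% select(-id)

# select() can rename while selecting: new_name = old_name
renamed <- df %>% select(student = name, score)
```

Unlike rename(), which keeps every column, select() with a new name keeps only the columns listed.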

Filtering Quiz

How do you select columns whose names start with "date"?

select(contains("date"))
select(starts_with("date"))
filter(colnames == "date")

3. Creating and Modifying Columns

Transform and create new variables with mutate:

# Create new column
df %>% mutate(score_per_age = score / age)

# Modify existing column
df %>% mutate(age = age + 1)

# Multiple mutations
df %>% mutate(
  age_group = ifelse(age < 35, "Young", "Mature"),
  score_centered = score - mean(score)
)

# Conditional mutation with case_when
df %>% mutate(
  grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE ~ "C"
  )
)
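
case_when() checks its conditions from top to bottom and uses the first one that is true, so the order of the conditions matters. The sketch below contrasts the ordering used above with a reversed one; the column name `grade_wrong_order` is hypothetical and exists only to show the effect.

```r
library(dplyr)

df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age = c(25, 30, 35, 40, 45),
  score = c(80, 85, 90, 95, 100)
)

grades <- df %>% mutate(
  # Most specific condition first: scores of 90 and above get "A"
  grade = case_when(
    score >= 90 ~ "A",
    score >= 80 ~ "B",
    TRUE ~ "C"
  ),
  # Broad condition first: every score here is >= 80, so "A" is never reached
  grade_wrong_order = case_when(
    score >= 80 ~ "B",
    score >= 90 ~ "A",
    TRUE ~ "C"
  )
)
```

The result is assigned to `grades` so that the new columns remain available; a mutate() call on its own only prints the modified copy.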

Mutation Quiz

Which function is best for complex conditional column creation?

ifelse()
case_when()
when()

4. Summarizing and Grouping

Aggregate data with group_by and summarize:

# Basic summary
df %>% summarize(
  avg_score = mean(score),
  max_age = max(age)
)

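# Grouped operations
# (the mutate() results above were not saved, so age_group is created here first)
df %>% 
  mutate(age_group = ifelse(age < 35, "Young", "Mature")) %>% 
  group_by(age_group) %>% 
  summarize(
    count = n(),
    mean_score = mean(score),
    sd_score = sd(score)
  )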

# Count rows for each distinct value
df %>% count(name)

# Window functions
df %>% 
  mutate(rank = dense_rank(desc(score)))
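
The difference between summarizing and mutating on grouped data can be sketched as follows. The names `df_groups`, `per_group`, `within_group`, `score_vs_group`, and `n_groups` are illustrative assumptions; the grouping rule is the `age_group` definition from the mutate section.

```r
library(dplyr)

df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age = c(25, 30, 35, 40, 45),
  score = c(80, 85, 90, 95, 100)
)

# Save the grouping column so later steps can use it
df_groups <- df %>%
  mutate(age_group = ifelse(age < 35, "Young", "Mature"))

# summarize() collapses each group to a single row
per_group <- df_groups %>%
  group_by(age_group) %>%
  summarize(count = n(), mean_score = mean(score))

# mutate() after group_by() keeps every row and computes within each group
within_group <- df_groups %>%
  group_by(age_group) %>%
  mutate(score_vs_group = score - mean(score)) %>%
  ungroup()

# n_distinct() counts unique values, whereas count() tallies rows per value
n_groups <- n_distinct(df_groups$age_group)
```

Calling ungroup() at the end avoids later verbs silently operating per group.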

Summarizing Quiz

What does the n() function return?

The number of columns
The number of rows/observations
The number of NA values

5. Joining Data

Combine data from multiple sources:

# Create another data frame
df2 <- tibble(
  id = 3:6,
  department = c("Math", "Science", "Arts", "History")
)

# Inner join (keep only matching rows)
df %>% inner_join(df2, by = "id")

# Left join (keep all left rows)
df %>% left_join(df2, by = "id")

# Full join (keep all rows from both tables)
df %>% full_join(df2, by = "id")

# Anti join (keep rows of df that have no match in df2)
df %>% anti_join(df2, by = "id")

# Binding rows/columns
bind_rows(df, df)  # stack vertically
bind_cols(df, df)  # combine horizontally (duplicate column names are made unique)
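
To make the differences between the joins concrete, this sketch runs each one on the two sample tables and keeps the results for comparison. The result names (`inner`, `left`, `full`, `anti`) are assumptions made for illustration.

```r
library(dplyr)

df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age = c(25, 30, 35, 40, 45),
  score = c(80, 85, 90, 95, 100)
)
df2 <- tibble(
  id = 3:6,
  department = c("Math", "Science", "Arts", "History")
)

# ids 3, 4, 5 appear in both tables; 1 and 2 only in df; 6 only in df2
inner <- inner_join(df, df2, by = "id")  # matching ids only
left  <- left_join(df, df2, by = "id")   # all df rows; department is NA for ids 1 and 2
full  <- full_join(df, df2, by = "id")   # every id from either table
anti  <- anti_join(df, df2, by = "id")   # df rows without a match in df2

# Compare row counts across the four joins
c(inner = nrow(inner), left = nrow(left), full = nrow(full), anti = nrow(anti))
```

Checking row counts like this after a join is a quick way to confirm that no rows were dropped or duplicated unexpectedly.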

Joining Quiz

Which join keeps all rows from the first table?

inner_join()
left_join()
full_join()

6. Piping with %>%

The pipe operator for readable data workflows:

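# The examples below group by age_group; the earlier mutate() results
# were not saved, so add the column to df first
df <- df %>% mutate(age_group = ifelse(age < 35, "Young", "Mature"))

# Without pipes (nested)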
arrange(
  summarize(
    group_by(
      filter(df, age > 30),
      age_group
    ),
    avg_score = mean(score)
  ),
  avg_score
)

# With pipes (sequential)
df %>%
  filter(age > 30) %>%
  group_by(age_group) %>%
  summarize(avg_score = mean(score)) %>%
  arrange(avg_score)

# Native pipe |> (R 4.1+)
df |>
  filter(age > 30) |>
  group_by(age_group) |>
  summarize(avg_score = mean(score)) |>
  arrange(avg_score)
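
The pipe rewrites `x %>% f(y)` as `f(x, y)`: the left-hand result becomes the first argument of the next call. A minimal sketch of that equivalence follows, together with the magrittr `.` placeholder for cases where the value belongs in a different argument position. The variable names are illustrative.

```r
library(dplyr)  # dplyr re-exports %>% from magrittr

df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  age = c(25, 30, 35, 40, 45),
  score = c(80, 85, 90, 95, 100)
)

# x %>% f() is the same as f(x)
piped_sqrt  <- c(1, 4, 9) %>% sqrt()
direct_sqrt <- sqrt(c(1, 4, 9))

# x %>% f(y) is the same as f(x, y)
piped_rows  <- df %>% filter(age > 30)
direct_rows <- filter(df, age > 30)

# With %>%, a dot marks where the left-hand value should go
# when it is not the first argument
placed <- "b" %>% paste("a", .)
```

The dot placeholder belongs to `%>%`; the native `|>` pipe has its own placeholder rules, so the two are not always interchangeable.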

Piping Quiz

What does %>% pass to the next function?

Only the first argument
The result of the previous operation
The entire data frame
