Machine Learning with R: A Complete Guide
This tutorial covers machine learning in R from foundational concepts to advanced techniques, with practical examples and clear explanations.
1. Introduction to Machine Learning
What it is: Algorithms that learn patterns from data to make predictions or decisions without explicit programming.
Key Concepts:
- Supervised Learning: Predict outcomes (classification/regression)
- Unsupervised Learning: Find patterns (clustering/dimensionality reduction)
- Model Evaluation: Metrics to assess performance
# Essential packages
install.packages(c("caret", "randomForest", "e1071", "xgboost", "keras",
                   "gbm", "ranger", "kernlab", "Rtsne", "pROC",
                   "caretEnsemble", "recipes", "plumber"))
library(caret) # Unified ML interface
2. Data Preprocessing
Why it matters: Better data means better models. Data preparation typically consumes the majority of an ML project's time (the oft-cited figure is 80%).
Key Steps:
- Handling missing values
- Feature scaling
- Categorical encoding
# Using caret's preProcess() (knnImpute fills missing values; iris has none, so it is a no-op here)
data(iris)
preproc <- preProcess(iris[,1:4],
method = c("center", "scale", "knnImpute"))
processed_data <- predict(preproc, iris[,1:4])
# One-hot encoding
dummy_vars <- dummyVars(~ Species, data = iris)
encoded_data <- predict(dummy_vars, iris)
3. Supervised Learning
3.1 Classification
Goal: Predict categorical outcomes (e.g., spam/not spam)
# Train/test split
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train <- iris[trainIndex, ]
test <- iris[-trainIndex, ]
# Random Forest
model_rf <- train(Species ~ ., data = train, method = "rf")
predictions <- predict(model_rf, test)
confusionMatrix(predictions, test$Species)
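Beyond hard class labels, caret models can also return class probabilities, which the ROC analysis in Section 5 builds on:
# Predicted class probabilities (one column per class)
head(predict(model_rf, test, type = "prob"))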
3.2 Regression
Goal: Predict continuous values (e.g., house prices)
# Linear regression
model_lm <- train(mpg ~ ., data = mtcars, method = "lm")
summary(model_lm)
# Gradient Boosting
model_gbm <- train(mpg ~ ., data = mtcars, method = "gbm", verbose = FALSE)
predict(model_gbm, newdata = mtcars[1:3, ])
4. Unsupervised Learning
4.1 Clustering
Goal: Group similar data points (e.g., customer segmentation)
# K-means clustering (nstart > 1 restarts guard against poor local optima)
set.seed(123)
kmeans_result <- kmeans(iris[,1:4], centers = 3, nstart = 25)
table(iris$Species, kmeans_result$cluster)
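In practice the number of clusters is rarely known in advance. One common heuristic, sketched below, is to plot the total within-cluster sum of squares against k and look for an "elbow":
# Elbow heuristic for choosing k
set.seed(123)
wss <- sapply(1:10, function(k) {
  kmeans(iris[,1:4], centers = k, nstart = 25)$tot.withinss
})
plot(1:10, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS")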
# Hierarchical clustering
dist_matrix <- dist(iris[,1:4])
hclust_result <- hclust(dist_matrix, method = "ward.D2")
plot(hclust_result)
4.2 Dimensionality Reduction
Goal: Reduce features while preserving information
# PCA
pca_result <- prcomp(iris[,1:4], scale. = TRUE)
summary(pca_result)
biplot(pca_result)
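The scores in pca_result$x are the reduced features; the first two components can stand in for all four measurements:
# Plot observations in the space of the first two principal components
plot(pca_result$x[,1:2], col = iris$Species, pch = 19,
     xlab = "PC1", ylab = "PC2")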
# t-SNE (for visualization; iris contains duplicate rows, which Rtsne rejects by default)
library(Rtsne)
tsne_result <- Rtsne(iris[,1:4], perplexity = 30, check_duplicates = FALSE)
plot(tsne_result$Y, col = iris$Species)
5. Model Evaluation
Key Metrics:
- Classification: Accuracy, Precision, Recall, F1, ROC-AUC
- Regression: RMSE, R², MAE (see the postResample() example below)
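The regression metrics can all be computed in one call with caret's postResample(); a minimal sketch, assuming the model_lm fit from Section 3.2:
# RMSE, R-squared, and MAE in one call
preds <- predict(model_lm, mtcars)
postResample(pred = preds, obs = mtcars$mpg)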
# Cross-validation
ctrl <- trainControl(method = "cv", number = 10)
model <- train(Species ~ ., data = iris, method = "rf", trControl = ctrl)
model
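Cross-validated fits can also be compared with caret's resamples(); for strictly comparable folds, set the same seed before each train() call (or pass shared indices via trainControl's index argument):
# Compare two models on their CV performance
set.seed(123)
model_svm <- train(Species ~ ., data = iris, method = "svmRadial",
                   trControl = ctrl)
summary(resamples(list(rf = model, svm = model_svm)))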
# ROC curve (pROC needs a two-class outcome, so score one class against the rest)
library(pROC)
probs <- predict(model_rf, test, type = "prob")
roc_curve <- roc(response = test$Species == "virginica",
                 predictor = probs$virginica)
plot(roc_curve)
6. Advanced Techniques
6.1 Ensemble Methods
Why: Combine models to improve performance
# XGBoost (Gradient Boosting)
library(xgboost)
model_xgb <- train(Species ~ ., data = iris, method = "xgbTree")
varImp(model_xgb)
# Stacking models (classic caretEnsemble stacks with a binomial GLM,
# so recode iris as a two-class problem; "glm" cannot fit 3 classes)
library(caretEnsemble)
iris_bin <- data.frame(iris[,1:4],
                       is_virginica = factor(ifelse(iris$Species == "virginica",
                                                    "yes", "no")))
models <- caretList(is_virginica ~ ., data = iris_bin,
                    trControl = trainControl(method = "cv",
                                             classProbs = TRUE,
                                             savePredictions = "final"),
                    methodList = c("rf", "glm", "svmRadial"))
ensemble <- caretEnsemble(models)
summary(ensemble)
6.2 Deep Learning
When: Complex patterns in unstructured data (images, text)
# Neural Networks with Keras
library(keras)
model <- keras_model_sequential() %>%
layer_dense(units = 16, activation = 'relu', input_shape = c(4)) %>%
layer_dense(units = 3, activation = 'softmax')
model %>% compile(
optimizer = 'adam',
loss = 'categorical_crossentropy',
metrics = c('accuracy')
)
history <- model %>% fit(
x = as.matrix(iris[,1:4]),
y = to_categorical(as.numeric(iris$Species)-1),
epochs = 50,
batch_size = 5
)
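After training, evaluate() reports the loss and accuracy; here it is run on the training data for brevity, though a held-out set is preferable:
# Evaluate loss and accuracy
model %>% evaluate(
  as.matrix(iris[,1:4]),
  to_categorical(as.numeric(iris$Species) - 1)
)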
7. Hyperparameter Optimization
Goal: Find the best model settings
# Grid search
tuneGrid <- expand.grid(
mtry = c(2, 3, 4),
splitrule = c("gini", "extratrees"),
min.node.size = c(1, 5, 10)
)
model <- train(Species ~ .,
data = iris,
method = "ranger",
tuneGrid = tuneGrid,
trControl = trainControl(method = "cv", number = 5))
plot(model)
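When the grid grows large, random search samples hyperparameter combinations instead of enumerating them all; a sketch using the same ranger model:
# Random search: tuneLength sets how many random combinations to try
ctrl_rand <- trainControl(method = "cv", number = 5, search = "random")
model_rand <- train(Species ~ ., data = iris, method = "ranger",
                    tuneLength = 10, trControl = ctrl_rand)
model_rand$bestTune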
8. Building ML Pipelines
Why: Automate preprocessing + modeling workflows
# ML Pipeline with recipes
library(recipes)
rec <- recipe(Species ~ ., data = iris) %>%
  step_center(all_numeric_predictors()) %>%
  step_scale(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = 2)
model <- train(rec,
               data = iris,
               method = "rf",
               trControl = trainControl(method = "cv"))
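To inspect what the recipe actually produces, prep() estimates each step from the training data and bake() applies them:
# Inspect the transformed training set
prepped <- prep(rec, training = iris)
head(bake(prepped, new_data = iris))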
9. Model Deployment
Options:
- R Shiny apps
- Plumber APIs
- Export to PMML
# Save/load models
saveRDS(model_rf, "rf_model.rds")
loaded_model <- readRDS("rf_model.rds")
# Simple prediction API with plumber
# plumber.R:
loaded_model <- readRDS("rf_model.rds")  # load the model once at startup
#* @post /predict
#* @param Sepal.Length numeric
#* @param Sepal.Width numeric
#* @param Petal.Length numeric
#* @param Petal.Width numeric
function(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) {
new_data <- data.frame(
Sepal.Length = as.numeric(Sepal.Length),
Sepal.Width = as.numeric(Sepal.Width),
Petal.Length = as.numeric(Petal.Length),
Petal.Width = as.numeric(Petal.Width)
)
predict(loaded_model, new_data)
}
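To serve the API locally, a minimal sketch assuming the code above is saved as plumber.R in the working directory:
# Launch the API on port 8000
library(plumber)
pr("plumber.R") %>% pr_run(port = 8000)
# Then: curl -X POST "http://localhost:8000/predict?Sepal.Length=5.1&Sepal.Width=3.5&Petal.Length=1.4&Petal.Width=0.2"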