Brantes pedrobrantes
@pedrobrantes
pedrobrantes / Full_semi_and_anti_joins.R
Created May 8, 2025 21:53
This Gist contains R code exercises demonstrating advanced data joining techniques (full_join, semi_join, anti_join) using dplyr and tidyr with LEGO datasets. It covers combining information from multiple tables, finding common and unique rows between datasets, identifying missing data, aggregating and transforming data (counting parts, calculat…
# Exercises
# Load datasets and libraries
library(dplyr)
library(tidyr)
inventories <- read.csv(file = "../../../datasets/inventories.csv")
inventory_parts <- read.csv(file = "../../../datasets/inventory_parts.csv")
# sets is loaded before the joins that use it
sets <- read.csv(file = "../../../datasets/sets.csv")
# Join inventories to their parts, largest quantities first
inventory_parts_joined <- inventories %>%
  inner_join(inventory_parts, by = c("id" = "inventory_id")) %>%
  arrange(desc(quantity)) %>%
  select(-id, -version)
# Add set and theme details (themes must already be loaded; its read.csv call is not shown in this preview)
inventory_parts_themes <- inventory_parts_joined %>%
  inner_join(sets, by = "set_num") %>%
  inner_join(themes, by = c("theme_id" = "id"), suffix = c("_set", "_theme"))
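The description above names full_join, semi_join, and anti_join. A minimal sketch of how the three differ, using two invented miniature tables (the names `set_a`, `set_b` and all rows are assumptions for illustration, not part of the LEGO data):

```r
library(dplyr)

# Hypothetical tables: parts used by two made-up sets.
set_a <- tibble(part_num = c("3001", "3002", "3003"), quantity = c(4, 2, 1))
set_b <- tibble(part_num = c("3002", "3003", "3004"), quantity = c(1, 5, 3))

# semi_join keeps the rows of set_a that have a match in set_b.
# No columns from set_b are added.
in_both <- semi_join(set_a, set_b, by = "part_num")

# anti_join keeps the rows of set_a that have no match in set_b.
only_in_a <- anti_join(set_a, set_b, by = "part_num")

# full_join keeps every part from either table; a quantity that is
# missing on one side becomes NA.
all_parts <- full_join(set_a, set_b, by = "part_num", suffix = c("_a", "_b"))
```

Because semi_join and anti_join only filter the left table, they are a convenient way to ask "which rows are shared?" and "which rows are missing?" without duplicating or widening the data.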
pedrobrantes / Left_and_right_joins.R
Created May 5, 2025 21:02
This Gist contains R code exercises focusing on practicing different types of data joins (inner_join, left_join, right_join) using the dplyr and tidyr packages with several LEGO-related datasets. It demonstrates combining tables, handling missing values from joins, performing self-joins to explore hierarchical relationships (like LEGO themes), a…
# Exercises
# Load datasets
inventory_parts <- read.csv(file = "../../../datasets/inventory_parts.csv")
inventories <- read.csv(file = "../../../datasets/inventories.csv")
sets <- read.csv(file = "../../../datasets/sets.csv")
parts <- read.csv(file = "../../../datasets/parts.csv")
part_categories <- read.csv(file = "../../../datasets/part_categories.csv")
themes <- read.csv(file = "../../../datasets/themes.csv")
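The description above mentions self-joins on hierarchical themes and handling missing values produced by joins. A rough sketch of both ideas, assuming a themes table with `id`, `name`, and `parent_id` columns; the table `themes_demo` and its rows are invented for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the themes table.
themes_demo <- tibble(
  id = c(1, 2, 3),
  name = c("Technic", "Arctic Technic", "City"),
  parent_id = c(NA, 1, NA)
)

# Self-join: match each theme's id against other themes' parent_id
# to list parent/child pairs.
parent_child <- themes_demo %>%
  inner_join(themes_demo, by = c("id" = "parent_id"),
             suffix = c("_parent", "_child"))

# Count children per parent, then left_join so every theme is kept.
# Themes without children get NA, which replace_na() turns into 0.
child_counts <- themes_demo %>%
  filter(!is.na(parent_id)) %>%
  count(parent_id, name = "n_children")

themes_with_counts <- themes_demo %>%
  left_join(child_counts, by = c("id" = "parent_id")) %>%
  replace_na(list(n_children = 0))
```

A right_join with the two tables swapped would return the same rows as the left_join here; which one to write is mostly a matter of which table reads more naturally as the starting point.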
pedrobrantes / Joining_tables.R
Created April 29, 2025 17:51
This Gist contains R code exercises demonstrating how to join multiple related dataframes using dplyr, specifically focusing on inner_join. The exercises use datasets related to parts, inventories, sets, and colors, showing how to combine information across these tables, specify joining columns, handle naming conflicts with suffixes, chain multi…
# Exercises
# Load datasets
parts <- read.csv(file = "../../../datasets/parts.csv")
part_categories <- read.csv(file = "../../../datasets/part_categories.csv")
inventory_parts <- read.csv(file = "../../../datasets/inventory_parts.csv")
inventories <- read.csv(file = "../../../datasets/inventories.csv")
sets <- read.csv(file = "../../../datasets/sets.csv")
colors <- read.csv(file = "../../../datasets/colors.csv")
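The description above refers to specifying join columns and resolving naming conflicts with suffixes. One way this might look, using invented stand-ins for two of the tables (`parts_demo`, `categories_demo`, and their rows are hypothetical):

```r
library(dplyr)

# Both tables have a `name` column, so the join would otherwise
# produce name.x and name.y.
parts_demo <- tibble(
  part_num = c("3001", "3020"),
  name = c("Brick 2 x 4", "Plate 2 x 4"),
  part_cat_id = c(11, 14)
)
categories_demo <- tibble(
  id = c(11, 14),
  name = c("Bricks", "Plates")
)

# `by` maps the differently named key columns onto each other;
# `suffix` labels the two conflicting `name` columns.
parts_named <- parts_demo %>%
  inner_join(categories_demo, by = c("part_cat_id" = "id"),
             suffix = c("_part", "_category"))
```

Further tables can be chained onto `parts_named` with additional `inner_join()` calls in the same pipeline.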
pedrobrantes / Case_study_the_babynames_dataset.R
Created April 25, 2025 20:56
This Gist contains R code exercises that analyze and visualize baby name popularity trends using the babynames dataset with the dplyr and ggplot2 packages. Exercises cover filtering, sorting, grouping, calculating name frequencies and proportions over time, finding peak popularity years, and plotting trends and distributions of name usage.
# Exercises
library(dplyr)
babynames <- read.csv(file = "Data_Manipulation_with_dplyr/datasets/babynames.csv")
# Filter for the year 1990
# Sort the number column in descending order
babynames %>%
  filter(year == 1990) %>%
  arrange(desc(n))
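The description above also covers proportions over time and peak popularity years. A small sketch of that pattern, assuming columns `year`, `name`, and `n` as in the code above; the table `babynames_demo` and its counts are invented:

```r
library(dplyr)

# Hypothetical sample: two names observed in two years.
babynames_demo <- tibble(
  year = c(1990, 1990, 2000, 2000),
  name = c("Ana", "Leo", "Ana", "Leo"),
  n = c(30, 70, 60, 40)
)

name_peaks <- babynames_demo %>%
  group_by(year) %>%
  mutate(fraction = n / sum(n)) %>%  # share of each name within its year
  group_by(name) %>%
  slice_max(fraction, n = 1) %>%     # the year in which each name peaked
  ungroup()
```

Regrouping by `name` before `slice_max()` is what switches the question from "which name led each year?" to "which year was best for each name?".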
pedrobrantes / Selecting_and_transforming_data.R
Created April 19, 2025 17:06
This Gist provides a comprehensive set of R code exercises using the dplyr package for data manipulation on a US counties dataset. It covers fundamental operations like selecting, filtering, arranging, and mutating data, as well as more advanced techniques such as grouping, summarizing, counting, relocating columns, renaming, using across for mu…
# Exercises
library(dplyr)
counties <- read.csv(file = "Data_Manipulation_with_dplyr/datasets/counties.csv")
counties <- counties %>%
mutate(
census_id = as.character(census_id),state = as.character(state),county = as.character(county),region = as.character(region),metro = as.character(metro),population = as.numeric(population),men = as.numeric(men),women = as.numeric(women),hispanic = as.numeric(hispanic),white = as.numeric(white),black = as.numeric(black),native = as.numeric(native),asian = as.numeric(asian),pacific = as.numeric(pacific),citizens = as.numeric(citizens),income = as.numeric(income),income_err = as.numeric(income_err),income_per_cap = as.numeric(income_per_cap),income_per_cap_err = as.numeric(income_per_cap_err),poverty = as.numeric(poverty),child_poverty = as.numeric(child_poverty),professional = as.numeric(professional),service = as.numeric(service),office = as.numeric(office),construction = as.numeric(construction),production = as.numeric(production),drive = as.numeric(drive),carpool
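The listing above is cut off in this preview. As a separate, self-contained sketch of three operations the description names (across, rename, relocate), here is a toy example; `counties_demo` and its values are invented, and only a few of the real column names are reused:

```r
library(dplyr)

# Hypothetical rows; numeric columns arrive as text, as they might
# after a loosely typed import.
counties_demo <- tibble(
  state = c("Texas", "Texas", "Ohio"),
  county = c("A", "B", "C"),
  population = c("1000", "2500", "4000"),
  income = c("50000", "42000", "61000")
)

counties_typed <- counties_demo %>%
  # across() applies one conversion to several columns at once.
  mutate(across(c(population, income), as.numeric)) %>%
  # rename() uses new_name = old_name.
  rename(median_income = income) %>%
  # relocate() moves population to the end of the table.
  relocate(population, .after = last_col())
```

With many columns of the same type, `across()` can stand in for a long list of individual `as.numeric()` or `as.character()` conversions.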
pedrobrantes / Easter_functions_and_graph_analysis.R
Created April 12, 2025 19:29
This Gist contains an R script that calculates and visualizes the date of Easter over a range of years (2000-2025). It uses the timeDate package to determine Easter dates, dplyr for data manipulation, ggplot2 for creating trend lines, frequency bar charts, and box plots, lubridate for date manipulation, and forecast for analyzing the autocorrela…
# Functions to calculate easter day
library(timeDate)
calculate_easter <- function(year) {
  easter_date <- as.Date(Easter(year))
  return(easter_date)
}
desired_year <- 2025
easter_date_year <- calculate_easter(desired_year)
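The description above mentions a range of years (2000-2025) and frequency charts. A minimal sketch of how the dates for the whole range could be computed and tabulated by month, calling `Easter()` directly rather than through the helper above; the variable names are illustrative:

```r
library(timeDate)

# Easter() accepts a vector of years, so the whole range can be
# computed in one call.
years <- 2000:2025
easter_dates <- as.Date(Easter(years))

# Tabulate how often Easter fell in March ("03") versus April ("04").
easter_months <- table(format(easter_dates, "%m"))
```

A table like `easter_months` is the kind of summary a frequency bar chart would be drawn from.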
pedrobrantes / Aggregating_data.R
Created April 11, 2025 23:01
This Gist presents R code exercises demonstrating advanced data manipulation with the dplyr package on a US counties dataset. It covers using count() for frequency analysis, summarize() for aggregate statistics, group_by() for grouped operations, and slice_max()/slice_min() to identify top/bottom entries within groups. The exercises explore coun…
# Exercises
library(dplyr)
counties <- read.csv(file = "Data_Manipulation_with_dplyr/datasets/counties.csv")
counties <- counties %>%
mutate(
census_id = as.character(census_id),state = as.character(state),county = as.character(county),region = as.character(region),metro = as.character(metro),population = as.numeric(population),men = as.numeric(men),women = as.numeric(women),hispanic = as.numeric(hispanic),white = as.numeric(white),black = as.numeric(black),native = as.numeric(native),asian = as.numeric(asian),pacific = as.numeric(pacific),citizens = as.numeric(citizens),income = as.numeric(income),income_err = as.numeric(income_err),income_per_cap = as.numeric(income_per_cap),income_per_cap_err = as.numeric(income_per_cap_err),poverty = as.numeric(poverty),child_poverty = as.numeric(child_poverty),professional = as.numeric(professional),service = as.numeric(service),office = as.numeric(office),construction = as.numeric(construction),production = as.numeric(production),drive = as.numeric(drive),carpool
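The listing above is cut off in this preview. As a separate, self-contained sketch of the count() and slice_max() usage the description mentions, here is a toy example; `counties_demo` and its figures are invented:

```r
library(dplyr)

# Hypothetical counties in two states.
counties_demo <- tibble(
  state = c("Texas", "Texas", "Ohio", "Ohio"),
  county = c("A", "B", "C", "D"),
  population = c(1000, 2500, 4000, 500)
)

# count() with wt sums population per state instead of counting rows;
# sort = TRUE puts the largest total first.
state_population <- counties_demo %>%
  count(state, wt = population, sort = TRUE)

# slice_max() after group_by() keeps the most populous county
# of each state.
largest_county <- counties_demo %>%
  group_by(state) %>%
  slice_max(population, n = 1) %>%
  ungroup()
```

Swapping `slice_max()` for `slice_min()` returns the least populous county per state instead.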
pedrobrantes / Transforming_data_with_dplyr.R
Created April 7, 2025 22:22
This Gist contains R code exercises demonstrating data manipulation with the dplyr package using a dataset of US counties. It includes examples of selecting columns, filtering rows based on various criteria (population, state, poverty, unemployment, etc.), sorting data, and creating new columns through calculations and conditional logic.
# Exercises
library(dplyr)
counties <- read.csv(file = "Data_Manipulation_with_dplyr/datasets/counties.csv")
counties <- counties %>%
mutate(
census_id = as.character(census_id),state = as.character(state),county = as.character(county),region = as.character(region),metro = as.character(metro),population = as.numeric(population),men = as.numeric(men),women = as.numeric(women),hispanic = as.numeric(hispanic),white = as.numeric(white),black = as.numeric(black),native = as.numeric(native),asian = as.numeric(asian),pacific = as.numeric(pacific),citizens = as.numeric(citizens),income = as.numeric(income),income_err = as.numeric(income_err),income_per_cap = as.numeric(income_per_cap),income_per_cap_err = as.numeric(income_per_cap_err),poverty = as.numeric(poverty),child_poverty = as.numeric(child_poverty),professional = as.numeric(professional),service = as.numeric(service),office = as.numeric(office),construction = as.numeric(construction),production = as.numeric(production),drive = as.numeric(drive),carpool
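The listing above is cut off in this preview. As a separate, self-contained sketch of the select, filter, arrange, and mutate steps the description lists, including one piece of conditional logic, here is a toy example; `counties_demo`, its values, and the thresholds are assumptions chosen for illustration:

```r
library(dplyr)

# Hypothetical rows; unemployment is a percentage.
counties_demo <- tibble(
  state = c("Texas", "Texas", "Ohio"),
  county = c("A", "B", "C"),
  population = c(1000, 2500000, 4000),
  unemployment = c(4.5, 6.1, 9.0)
)

large_or_struggling <- counties_demo %>%
  select(state, county, population, unemployment) %>%
  # Keep counties that are very large or have high unemployment.
  filter(population > 1000000 | unemployment > 8) %>%
  mutate(
    # New column from a calculation.
    unemployed_population = population * unemployment / 100,
    # New column from conditional logic.
    size = if_else(population > 1000000, "large", "small")
  ) %>%
  arrange(desc(unemployed_population))
```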
pedrobrantes / Types_of_visualizations.R
Created April 6, 2025 14:55
This Gist contains R code exercises demonstrating data analysis and visualization using dplyr and ggplot2 with the gapminder dataset. It covers summarizing data (calculating medians and averages), creating various plot types (line, bar, scatter, histogram, box, violin) to explore trends in GDP per capita, life expectancy, and population over tim…
# Exercises
library(gapminder)
library(dplyr)
library(ggplot2)
# Summarize the median gdpPercap by year, then save it as by_year
by_year <- gapminder %>%
  group_by(year) %>%
  summarize(medianGdpPercap = median(gdpPercap))
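The description above mentions line plots of trends over time. A sketch of how a summary like `by_year` could be turned into one; the summary is rebuilt here so the block runs on its own, and the plot object name `median_gdp_plot` is illustrative:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# One row per year with the median GDP per capita.
by_year <- gapminder %>%
  group_by(year) %>%
  summarize(medianGdpPercap = median(gdpPercap))

# Line plot of the trend; expand_limits() makes the y-axis start at 0
# so the change is not visually exaggerated.
median_gdp_plot <- ggplot(by_year, aes(x = year, y = medianGdpPercap)) +
  geom_line() +
  expand_limits(y = 0)
```

Replacing `geom_line()` with `geom_col()` gives a bar chart of the same summary, and `print(median_gdp_plot)` draws the figure.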
pedrobrantes / Grouping_and_summarizing.R
Created March 29, 2025 23:05
This Gist presents R code exercises demonstrating the use of dplyr's summarise function to calculate median and maximum values from the gapminder dataset, grouped by year and continent. It also includes examples of visualizing these summarized trends over time using ggplot2.
# Exercises
library(gapminder)
library(dplyr)
# Summarize to find the median life expectancy
gapminder %>%
  summarise(
    medianLifeExp = median(lifeExp)
  )
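The description above also mentions grouping by year and continent, maximum values, and plotting the result. A sketch of that extension; the object names `by_year_continent` and `life_exp_plot` are illustrative:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# One row per year/continent pair with two summary statistics.
by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarise(
    medianLifeExp = median(lifeExp),
    maxGdpPercap = max(gdpPercap),
    .groups = "drop"  # return an ungrouped table
  )

# One line per continent showing median life expectancy over time.
life_exp_plot <- ggplot(by_year_continent,
                        aes(x = year, y = medianLifeExp, color = continent)) +
  geom_line()
```

Setting `.groups = "drop"` avoids carrying a leftover grouping by `year` into later steps.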