pedrobrantes’s gists

pedrobrantes / Full_semi_and_anti_joins.R

Created May 8, 2025 21:53

This Gist contains R code exercises demonstrating advanced data joining techniques (full_join, semi_join, anti_join) using dplyr and tidyr with LEGO datasets. It covers combining information from multiple tables, finding common and unique rows between datasets, identifying missing data, aggregating and transforming data (counting parts, calculat…

	# Exercises

	# Load libraries and datasets
	library(dplyr)
	library(tidyr)
	inventories <- read.csv(file = "../../../datasets/inventories.csv")
	inventory_parts <- read.csv(file = "../../../datasets/inventory_parts.csv")
	# sets and themes are loaded before the joins that use them (themes path assumed to match the other files)
	sets <- read.csv(file = "../../../datasets/sets.csv")
	themes <- read.csv(file = "../../../datasets/themes.csv")
	inventory_parts_joined <- inventories %>%
	  inner_join(inventory_parts, by = c("id" = "inventory_id")) %>%
	  arrange(desc(quantity)) %>%
	  select(-id, -version)
	inventory_parts_themes <- inventory_parts_joined %>%
	  inner_join(sets, by = "set_num") %>%
	  inner_join(themes, by = c("theme_id" = "id"), suffix = c("_set", "_theme"))
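
The three join types named in the description can be sketched on toy data. This is a minimal illustration, not the gist's own exercise: the `set_a` and `set_b` tables and their values are hypothetical stand-ins for the parts of two LEGO sets.

```r
library(dplyr)

# Hypothetical toy tables: the parts used by two different sets
set_a <- data.frame(part_num = c("3001", "3002", "3003"),
                    quantity = c(4, 2, 1))
set_b <- data.frame(part_num = c("3002", "3003", "3004"),
                    quantity = c(5, 1, 3))

# full_join keeps every part from both tables; the unmatched side becomes NA
both <- set_a %>%
  full_join(set_b, by = "part_num", suffix = c("_a", "_b"))

# semi_join keeps rows of set_a that have a match in set_b (no columns added)
shared <- set_a %>%
  semi_join(set_b, by = "part_num")

# anti_join keeps rows of set_a that have no match in set_b
only_in_a <- set_a %>%
  anti_join(set_b, by = "part_num")
```

Unlike `full_join`, the two filtering joins never add columns from the second table; they only decide which rows of the first table survive.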

pedrobrantes / Left_and_right_joins.R

Created May 5, 2025 21:02

This Gist contains R code exercises focusing on practicing different types of data joins (inner_join, left_join, right_join) using the dplyr and tidyr packages with several LEGO-related datasets. It demonstrates combining tables, handling missing values from joins, performing self-joins to explore hierarchical relationships (like LEGO themes), a…

	# Exercises

	# Load datasets
	inventory_parts <- read.csv(file = "../../../datasets/inventory_parts.csv")
	inventories <- read.csv(file = "../../../datasets/inventories.csv")
	sets <- read.csv(file = "../../../datasets/sets.csv")
	parts <- read.csv(file = "../../../datasets/parts.csv")
	part_categories <- read.csv(file = "../../../datasets/part_categories.csv")
	themes <- read.csv(file = "../../../datasets/themes.csv")
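
A self-join over a parent/child hierarchy, followed by a `left_join` with missing values filled in, might look like the sketch below. The `themes_toy` and `set_counts` tables are hypothetical; the real `themes.csv` is only assumed to have a similar `id` / `parent_id` layout.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature themes table: parent_id points at another row's id
themes_toy <- data.frame(
  id = c(1, 2, 3),
  name = c("Technic", "Arctic Technic", "Town"),
  parent_id = c(NA, 1, NA)
)

# Self-join: match each theme's id against the parent_id of other rows
parent_child <- themes_toy %>%
  inner_join(themes_toy, by = c("id" = "parent_id"),
             suffix = c("_parent", "_child"))

# left_join keeps every theme; replace_na fills the count for themes with no sets
set_counts <- data.frame(theme_id = c(1, 2), n_sets = c(10, 3))
themes_with_counts <- themes_toy %>%
  left_join(set_counts, by = c("id" = "theme_id")) %>%
  replace_na(list(n_sets = 0))
```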

pedrobrantes / Joining_tables.R

Created April 29, 2025 17:51

This Gist contains R code exercises demonstrating how to join multiple related dataframes using dplyr, specifically focusing on inner_join. The exercises use datasets related to parts, inventories, sets, and colors, showing how to combine information across these tables, specify joining columns, handle naming conflicts with suffixes, chain multi…

	# Exercises

	# Load datasets
	parts <- read.csv(file = "../../../datasets/parts.csv")
	part_categories <- read.csv(file = "../../../datasets/part_categories.csv")
	inventory_parts <- read.csv(file = "../../../datasets/inventory_parts.csv")
	inventories <- read.csv(file = "../../../datasets/inventories.csv")
	sets <- read.csv(file = "../../../datasets/sets.csv")
	colors <- read.csv(file = "../../../datasets/colors.csv")
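
Chaining several `inner_join` calls, with a suffix to resolve a shared column name, can be sketched as follows. All four `*_toy` tables and their values are hypothetical; only the column names are modeled on the datasets loaded above.

```r
library(dplyr)

# Hypothetical toy versions of the LEGO tables
sets_toy <- data.frame(set_num = c("001-1", "002-1"),
                       name = c("Castle", "Ship"))
inventories_toy <- data.frame(id = c(1, 2),
                              set_num = c("001-1", "002-1"))
inventory_parts_toy <- data.frame(inventory_id = c(1, 1, 2, 3),
                                  part_num = c("3001", "3002", "3001", "3003"),
                                  color_id = c(4, 1, 4, 1),
                                  quantity = c(2, 5, 1, 7))
colors_toy <- data.frame(id = c(1, 4),
                         name = c("Blue", "Red"))

# Each join names its key columns explicitly; the last one disambiguates `name`
set_part_colors <- sets_toy %>%
  inner_join(inventories_toy, by = "set_num") %>%
  inner_join(inventory_parts_toy, by = c("id" = "inventory_id")) %>%
  inner_join(colors_toy, by = c("color_id" = "id"),
             suffix = c("_set", "_color"))
```

The row for inventory 3 is dropped because no inventory with that `id` exists, which is the defining behavior of an inner join.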

pedrobrantes / Case_study_the_babynames_dataset.R

Created April 25, 2025 20:56

This Gist contains R code exercises that analyze and visualize baby name popularity trends using the babynames dataset with the dplyr and ggplot2 packages. Exercises cover filtering, sorting, grouping, calculating name frequencies and proportions over time, finding peak popularity years, and plotting trends and distributions of name usage.

	# Exercises

	library(dplyr)
	babynames <- read.csv(file = "Data_Manipulation_with_dplyr/datasets/babynames.csv")

	# Filter for the year 1990
	# Sort the number column in descending order
	babynames %>%
	  filter(year == 1990) %>%
	  arrange(desc(n))
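
The proportion and peak-year calculations mentioned in the description could be approached as in this sketch. The `babynames_toy` table is hypothetical, and its `n` column is only assumed to mirror the count column used in the code above.

```r
library(dplyr)

# Hypothetical toy data: counts of two names in two years
babynames_toy <- data.frame(
  year = c(1990, 1990, 2000, 2000),
  name = c("Ana", "Luis", "Ana", "Luis"),
  n = c(30, 70, 60, 40)
)

# Share of each name within its year
fractions <- babynames_toy %>%
  group_by(year) %>%
  mutate(year_total = sum(n),
         fraction = n / year_total) %>%
  ungroup()

# Year in which each name reached its highest share
peaks <- fractions %>%
  group_by(name) %>%
  filter(fraction == max(fraction)) %>%
  ungroup()
```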

pedrobrantes / Selecting_and_transforming_data.R

Created April 19, 2025 17:06

This Gist provides a comprehensive set of R code exercises using the dplyr package for data manipulation on a US counties dataset. It covers fundamental operations like selecting, filtering, arranging, and mutating data, as well as more advanced techniques such as grouping, summarizing, counting, relocating columns, renaming, using across for mu…

	# Exercises

	library(dplyr)
	counties <- read.csv(file = "Data_Manipulation_with_dplyr/datasets/counties.csv")
	counties <- counties %>%
	  mutate(
	    census_id = as.character(census_id), state = as.character(state), county = as.character(county), region = as.character(region), metro = as.character(metro),
	    population = as.numeric(population), men = as.numeric(men), women = as.numeric(women),
	    hispanic = as.numeric(hispanic), white = as.numeric(white), black = as.numeric(black), native = as.numeric(native), asian = as.numeric(asian), pacific = as.numeric(pacific),
	    citizens = as.numeric(citizens), income = as.numeric(income), income_err = as.numeric(income_err), income_per_cap = as.numeric(income_per_cap), income_per_cap_err = as.numeric(income_per_cap_err),
	    poverty = as.numeric(poverty), child_poverty = as.numeric(child_poverty), professional = as.numeric(professional), service = as.numeric(service), office = as.numeric(office),
	    construction = as.numeric(construction), production = as.numeric(production), drive = as.numeric(drive), carpool

pedrobrantes / Easter_functions_and_graph_analysis.R

Created April 12, 2025 19:29

This Gist contains an R script that calculates and visualizes the date of Easter over a range of years (2000-2025). It uses the timeDate package to determine Easter dates, dplyr for data manipulation, ggplot2 for creating trend lines, frequency bar charts, and box plots, lubridate for date manipulation, and forecast for analyzing the autocorrela…

	# Functions to calculate easter day
	library(timeDate)

	calculate_easter <- function(year) {
	  easter_date <- as.Date(Easter(year))
	  return(easter_date)
	}

	desired_year <- 2025
	easter_date_year <- calculate_easter(desired_year)

pedrobrantes / Aggregating_data.R

Created April 11, 2025 23:01

This Gist presents R code exercises demonstrating advanced data manipulation with the dplyr package on a US counties dataset. It covers using count() for frequency analysis, summarize() for aggregate statistics, group_by() for grouped operations, and slice_max()/slice_min() to identify top/bottom entries within groups. The exercises explore coun…

	# Exercises

	library(dplyr)
	counties <- read.csv(file = "Data_Manipulation_with_dplyr/datasets/counties.csv")
	counties <- counties %>%
	  mutate(
	    census_id = as.character(census_id), state = as.character(state), county = as.character(county), region = as.character(region), metro = as.character(metro),
	    population = as.numeric(population), men = as.numeric(men), women = as.numeric(women),
	    hispanic = as.numeric(hispanic), white = as.numeric(white), black = as.numeric(black), native = as.numeric(native), asian = as.numeric(asian), pacific = as.numeric(pacific),
	    citizens = as.numeric(citizens), income = as.numeric(income), income_err = as.numeric(income_err), income_per_cap = as.numeric(income_per_cap), income_per_cap_err = as.numeric(income_per_cap_err),
	    poverty = as.numeric(poverty), child_poverty = as.numeric(child_poverty), professional = as.numeric(professional), service = as.numeric(service), office = as.numeric(office),
	    construction = as.numeric(construction), production = as.numeric(production), drive = as.numeric(drive), carpool
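
The `count()` and `slice_max()` steps named in the description can be sketched independently of the full dataset. The `counties_toy` table and its figures are hypothetical; only the column names follow the counties data.

```r
library(dplyr)

# Hypothetical toy data: a few counties with made-up populations
counties_toy <- data.frame(
  state = c("Texas", "Texas", "Ohio", "Ohio", "Ohio"),
  county = c("Harris", "Travis", "Franklin", "Cuyahoga", "Athens"),
  population = c(4500, 1200, 1300, 1250, 65)
)

# count() with wt sums population per state instead of counting rows
state_pop <- counties_toy %>%
  count(state, wt = population, sort = TRUE)

# slice_max() within groups keeps the most populous county of each state
largest <- counties_toy %>%
  group_by(state) %>%
  slice_max(population, n = 1) %>%
  ungroup()
```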

pedrobrantes / Transforming_data_with_dplyr.R

Created April 7, 2025 22:22

This Gist contains R code exercises demonstrating data manipulation with the dplyr package using a dataset of US counties. It includes examples of selecting columns, filtering rows based on various criteria (population, state, poverty, unemployment, etc.), sorting data, and creating new columns through calculations and conditional logic.

	# Exercises

	library(dplyr)
	counties <- read.csv(file = "Data_Manipulation_with_dplyr/datasets/counties.csv")
	counties <- counties %>%
	  mutate(
	    census_id = as.character(census_id), state = as.character(state), county = as.character(county), region = as.character(region), metro = as.character(metro),
	    population = as.numeric(population), men = as.numeric(men), women = as.numeric(women),
	    hispanic = as.numeric(hispanic), white = as.numeric(white), black = as.numeric(black), native = as.numeric(native), asian = as.numeric(asian), pacific = as.numeric(pacific),
	    citizens = as.numeric(citizens), income = as.numeric(income), income_err = as.numeric(income_err), income_per_cap = as.numeric(income_per_cap), income_per_cap_err = as.numeric(income_per_cap_err),
	    poverty = as.numeric(poverty), child_poverty = as.numeric(child_poverty), professional = as.numeric(professional), service = as.numeric(service), office = as.numeric(office),
	    construction = as.numeric(construction), production = as.numeric(production), drive = as.numeric(drive), carpool

pedrobrantes / Types_of_visualizations.R

Created April 6, 2025 14:55

This Gist contains R code exercises demonstrating data analysis and visualization using dplyr and ggplot2 with the gapminder dataset. It covers summarizing data (calculating medians and averages), creating various plot types (line, bar, scatter, histogram, box, violin) to explore trends in GDP per capita, life expectancy, and population over tim…

	# Exercises

	library(gapminder)
	library(dplyr)
	library(ggplot2)

	# Summarize the median gdpPercap by year, then save it as by_year
	by_year <- gapminder %>%
	  group_by(year) %>%
	  summarize(medianGdpPercap = median(gdpPercap))

pedrobrantes / Grouping_and_summarizing.R

Created March 29, 2025 23:05

This Gist presents R code exercises demonstrating the use of dplyr's summarise function to calculate median and maximum values from the gapminder dataset, grouped by year and continent. It also includes examples of visualizing these summarized trends over time using ggplot2.

	# Exercises

	library(gapminder)
	library(dplyr)

	# Summarize to find the median life expectancy
	gapminder %>%
	  summarise(
	    medianLifeExp = median(lifeExp)
	  )
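
Grouping by year and continent before summarising, as the description mentions, might look like the sketch below. The `gap_toy` table is a hypothetical stand-in with gapminder-style column names, used so the example runs without the gapminder package.

```r
library(dplyr)

# Hypothetical toy data with gapminder-style columns
gap_toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe", "Asia", "Europe"),
  year = c(1952, 1952, 1952, 1952, 2007, 2007),
  lifeExp = c(40, 50, 65, 70, 72, 80),
  gdpPercap = c(800, 1200, 5000, 7000, 4000, 30000)
)

# One row per year/continent pair, with a median and a maximum
by_year_continent <- gap_toy %>%
  group_by(year, continent) %>%
  summarise(medianLifeExp = median(lifeExp),
            maxGdpPercap = max(gdpPercap),
            .groups = "drop")
```

Passing `.groups = "drop"` returns an ungrouped result, so later steps do not silently inherit the grouping.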

Brantes pedrobrantes