Martin Chan martinctc

🔥

Packaging up my workflows

Data Science @ Microsoft Viva

martinctc / Postcodes to location.R

Created April 10, 2025 14:37

[Convert postcodes to location] #R

	library(tidyverse)
	library(PostcodesioR)

	# Customize with your own path
	df_with_postcodes <- read_csv(
	  "path/data/postcodes.csv"
	)

	# Update with column name containing postcode
	postcode_column <- df_with_postcodes[["postcode"]]

martinctc / apply_noise.R

Last active March 12, 2025 15:25

[Apply Noise to Specified Columns in a Data Frame] #R

	#' @title Apply Noise to Specified Columns in a Data Frame
	#'
	#' @description This function applies normally distributed noise to
	#' specified columns in a data frame, grouped by a specified variable. The
	#' noise is scaled to a range of -0.2 to 0.2.
	#'
	#' @param df Data frame containing the columns to add noise to.
	#' @param group_var String specifying the grouping variable.
	#' @param cols Vector of column names to apply the noise to.
	#' @param scale_from Numeric value specifying the lower bound of the scaling range.

martinctc / simulate_and_modify_by_rnorm.R

Last active March 12, 2025 15:14

[Simulate dataset, duplicate, and modify with a distribution] #R

	# This script simulates a dataset, duplicates it over time, and modifies it to
	# create a bell curve-like distribution.

	# Set up
	library(tidyverse)
	library(uuid)

	# Simulate dataset
	temp_df <-
	tibble(

martinctc / run-stats-tests.R

Last active February 6, 2025 15:28

Run any statistical test for two metrics

	#' @title Perform a Statistical Test
	#'
	#' @description This function performs a statistical test (e.g., chi-squared, t-test) given a data frame, variable names, and any other parameters needed.
	#'
	#' @details Insert more detailed information here about what the function does, the assumptions it makes, and how it should be used.
	#'
	#' @param data A data frame containing the variables of interest.
	#' @param var1 A string or symbol specifying the first variable.
	#' @param var2 A string or symbol specifying the second variable (if applicable).
	#' @param ... Additional arguments passed to the underlying test function.

martinctc / approx_num.R

Created March 5, 2024 14:31

[Convert numeric value to natural language approximation] #R

	#' @title Convert a numeric value into a natural language approximation string
	#'
	#' @description
	#' This function takes a numeric value and returns a string that approximates the value in natural language.
	#'
	#' @param x A numeric value.
	#'
	#' @examples
	#' approx_num(0.5)
	#' # [1] "increased by a half"

martinctc / test-python-rf-runtime.py

Last active January 15, 2024 14:20

Test run speeds for RF model in Python including simulation

martinctc / get-pypi-stats.py

Created November 8, 2023 15:48

[Get PyPI statistics] #python

	import requests
	import pandas as pd

	package_name = "vivainsights"
	api_endpoint = f"https://pypistats.org/api/packages/{package_name}/overall"

	response = requests.get(api_endpoint)

	if response.status_code == 200:
	    data = response.json()
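
A minimal sketch of what could follow the status check, assuming the `overall` endpoint returns a JSON object whose `data` key holds a list of records with `category`, `date`, and `downloads` fields. The helper name `total_downloads_by_category` is hypothetical, and the sample payload uses made-up numbers so the example can be tried without a network call.

```python
from collections import defaultdict


def total_downloads_by_category(payload):
    """Sum the `downloads` field per `category` in a pypistats-style payload."""
    totals = defaultdict(int)
    for record in payload.get("data", []):
        # Treat a missing or null download count as zero
        totals[record["category"]] += record.get("downloads") or 0
    return dict(totals)


# Hypothetical payload with made-up numbers, shaped like the assumed response
sample_payload = {
    "package": "vivainsights",
    "type": "overall_downloads",
    "data": [
        {"category": "with_mirrors", "date": "2023-11-01", "downloads": 120},
        {"category": "without_mirrors", "date": "2023-11-01", "downloads": 80},
        {"category": "with_mirrors", "date": "2023-11-02", "downloads": 150},
        {"category": "without_mirrors", "date": "2023-11-02", "downloads": 90},
    ],
}

totals = total_downloads_by_category(sample_payload)
print(totals)
```

In the gist's own flow, `data` from `response.json()` would take the place of `sample_payload`, provided the response has the assumed shape.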

martinctc / power-analysis.R

Created January 9, 2023 15:18

[Power analysis and sample size estimation with R] #R

	# See <https://rpubs.com/mbounthavong/sample_size_power_analysis_R>

	library(pwr)

	# Sample size estimations for two proportions
	# `pwr::ES.h()` computes effect size for two proportions
	# `n` in the result is the required sample size per group

	p0 <- pwr.2p.test(h = ES.h(p1 = 0.60, p2 = 0.50), sig.level = 0.05, power = .80)
	plot(p0)
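
The calculation described in the comments above can also be sketched by hand. This is a Python illustration using only the standard library, assuming a two-sided normal approximation is acceptable. The function names `cohen_h` and `n_per_group` are hypothetical, and this is not the `pwr` implementation itself.

```python
from math import asin, sqrt
from statistics import NormalDist


def cohen_h(p1, p2):
    """Effect size for two proportions via the arcsine transformation."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def n_per_group(h, sig_level=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided test.

    Ignores the negligible probability of rejecting in the wrong tail.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - sig_level / 2)  # critical value for the test
    z_power = z.inv_cdf(power)              # quantile for the target power
    return 2 * ((z_alpha + z_power) / abs(h)) ** 2


h = cohen_h(0.60, 0.50)
n = n_per_group(h)
print(round(h, 4), round(n, 1))
```

Because the required sample size scales with `1 / h**2`, halving the effect size roughly quadruples the number of observations needed in each group.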

martinctc / power-analysis.py

Last active January 6, 2023 15:42

[Power analysis with python] #python

	# estimate sample size via power analysis
	from statsmodels.stats.power import TTestIndPower

	# parameters for power analysis
	effect = 0.8
	alpha = 0.05
	power = 0.8

	# perform power analysis
	analysis = TTestIndPower()

martinctc / rank_by_group.R

Last active November 1, 2021 23:57

[Rank a data frame with a grouping variable using entirely base R] #R

	#' @title
	#' Rank a data frame by grouping variable using base R
	#'
	#' @description
	#' This function ranks a specified column in a data frame by group using entirely base R functions.
	#' The underlying function is `rank()`, where additional arguments can be passed with `...`.
	#' The grouping variable is specified as a string using the argument `group_var`, and the variable to rank is
	#' specified using the argument `rank_var`. The operation is analogous to using `group_by()` followed by
	#' `mutate()` in {dplyr}.
	#' See example below using the base dataset `iris`.
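
For readers who want to see the grouped-ranking idea outside R, here is a minimal Python sketch using only the standard library. It assumes ties receive the average of the positions they span, which is the default behaviour of R's `rank()`. The function below and its sample rows are illustrative only and are not a port of the author's implementation.

```python
from collections import defaultdict


def rank_by_group(rows, group_key, rank_key):
    """Return copies of `rows` with an added 'rank' computed within each group.

    Ties get the average of the 1-based positions they span.
    """
    # Collect row indices per group so the original row order is preserved
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[group_key]].append(i)

    ranks = [None] * len(rows)
    for indices in groups.values():
        ordered = sorted(indices, key=lambda i: rows[i][rank_key])
        pos = 0
        while pos < len(ordered):
            # Find the run of tied values starting at `pos`
            end = pos
            while (end + 1 < len(ordered)
                   and rows[ordered[end + 1]][rank_key] == rows[ordered[pos]][rank_key]):
                end += 1
            average_rank = (pos + 1 + end + 1) / 2
            for i in ordered[pos:end + 1]:
                ranks[i] = average_rank
            pos = end + 1

    return [{**row, "rank": r} for row, r in zip(rows, ranks)]


# Made-up rows for illustration
rows = [
    {"species": "a", "length": 5.1},
    {"species": "a", "length": 4.9},
    {"species": "a", "length": 5.1},
    {"species": "b", "length": 7.0},
    {"species": "b", "length": 6.4},
]
ranked = rank_by_group(rows, "species", "length")
print([r["rank"] for r in ranked])
```

The rows come back in their original order, which mirrors how a grouped `mutate()` adds a column without reordering the data.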