@yeiichi
yeiichi / infer_csv_schema.py
Created September 11, 2025 00:03
Analyze and infer SQL types for columns in CSV files
#!/usr/bin/env python3
"""
This module provides a class and utility functions to analyze and infer SQL
types for columns in CSV files via statistical and datatype examination.
It handles various datatypes like integers, floats, booleans, dates, and
timestamps while accommodating dialect-specific SQL type mappings. Additional
functionality includes updating statistics across data chunks, robustness
against missing or invalid data, and generation of NULL/NOT NULL constraints.
"""
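The type-inference idea described in the docstring above can be sketched as follows. This is a minimal illustration, not the gist's actual code: the function name `infer_sql_type_sketch`, the order in which candidate types are tried, and the generic type names (`BIGINT`, `DOUBLE PRECISION`, and so on) are all assumptions, and the sketch ignores chunked statistics and dialect-specific mappings.

```python
from datetime import date, datetime


def _all_parse(values, parse):
    """Return True if every value is accepted by the given parser."""
    try:
        for value in values:
            parse(value)
    except ValueError:
        return False
    return True


def infer_sql_type_sketch(values):
    """Guess a generic SQL type and nullability for one column of CSV strings."""
    # Treat None and blank strings as missing values.
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    nullable = len(present) < len(values)

    if not present:
        sql_type = "TEXT"  # nothing to examine, so fall back to text
    elif all(v.lower() in {"true", "false"} for v in present):
        sql_type = "BOOLEAN"
    elif _all_parse(present, int):
        sql_type = "BIGINT"
    elif _all_parse(present, float):
        sql_type = "DOUBLE PRECISION"
    elif _all_parse(present, date.fromisoformat):
        sql_type = "DATE"
    elif _all_parse(present, datetime.fromisoformat):
        sql_type = "TIMESTAMP"
    else:
        sql_type = "TEXT"

    return f"{sql_type} {'NULL' if nullable else 'NOT NULL'}"
```

Trying the narrowest types first means a column is only widened to `TEXT` when no stricter parser accepts every non-missing value.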
@yeiichi
yeiichi / csv_classifier.py
Created September 7, 2025 22:05
Classify CSV files by strict header match and move them to mapped destinations
#!/usr/bin/env python3
"""
A tool for classifying and moving CSV files based on column headers.
This script processes CSV files by matching their headers against predefined
layouts. Depending on the matched layout, the files are moved to their
corresponding destination directories. The script supports both file-based
and directory-based classification and offers options for dry-run execution,
recursion, and verbose output.
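A strict header match of the kind described above might look like the sketch below. The `LAYOUTS` table, the destination paths, and the function names are hypothetical and chosen only for illustration; the preview does not show how the gist defines its layouts, and recursion and verbose output are left out.

```python
import csv
import shutil
from pathlib import Path

# Hypothetical layouts: exact header tuple -> destination directory.
LAYOUTS = {
    ("id", "name", "email"): Path("sorted/customers"),
    ("order_id", "sku", "qty"): Path("sorted/orders"),
}


def read_header(csv_path):
    """Return the first row of a CSV file as a tuple (empty if the file is empty)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return tuple(next(csv.reader(f), ()))


def classify_header(header, layouts=LAYOUTS):
    """Return the destination for an exact, order-sensitive header match, else None."""
    return layouts.get(tuple(header))


def classify_and_move(csv_path, layouts=LAYOUTS, dry_run=True):
    """Move a CSV file to its mapped destination; with dry_run, only report the target."""
    csv_path = Path(csv_path)
    dest_dir = classify_header(read_header(csv_path), layouts)
    if dest_dir is None:
        return None  # no layout matched; leave the file in place
    target = dest_dir / csv_path.name
    if not dry_run:
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(csv_path), str(target))
    return target
```

Keying the lookup on the full header tuple makes the match strict: a reordered, missing, or extra column yields no destination.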
@yeiichi
yeiichi / excel_concatenator.py
Created August 24, 2025 03:36
Combine all `.xlsx` files present in a specified directory
#!/usr/bin/env python3
"""
A program to concatenate Excel files in a directory.
This script combines all `.xlsx` files present in a specified directory, ensuring
they each have identical and consistent columns (order-sensitive). The combined
data is written into a single output Excel file. Additionally, duplicate rows are
dropped from the final result, and the index is reset. The output file name can
be provided via the command-line arguments or auto-generated from
the current timestamp.
@yeiichi
yeiichi / excel_table_inserter.py
Created August 23, 2025 06:14
Insert an Excel Table covering the contiguous data range in the worksheet
#!/usr/bin/env python3
from __future__ import annotations
import argparse
from pathlib import Path
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.workbook.workbook import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
@yeiichi
yeiichi / cron-safe-runner.bash
Last active August 10, 2025 13:27
Bash wrapper to run a Python script using a virtual environment.
#!/usr/bin/env bash
# shellcheck shell=bash
# Bash wrapper to run Python scripts using a virtual environment.
set -Eeuo pipefail
# Constants
readonly HOME_DIR="${HOME}"
readonly VENV_DIR="${HOME_DIR}/path/to/venv"
readonly VENV_BIN="${VENV_DIR}/bin"
@yeiichi
yeiichi / normalize_missing.py
Created July 26, 2025 12:50
Normalize missing-like values to Python None for PostgreSQL or general use
#!/usr/bin/env python3
import pandas as pd
import numpy as np
def normalize_missing(
    df: pd.DataFrame,
    targets=None,
    columns=None,
    output_dtype="object"
) -> pd.DataFrame:
@yeiichi
yeiichi / datetime-now.zsh
Created July 16, 2025 23:48
Return ISO 8601 datetime.now() in the given time zone
#!/bin/zsh
# Return ISO 8601 datetime.now() in the given time zone
# Ask user for timezone abbreviation
read "abbr?Enter timezone (e.g. jst, utc, est, pst): "
abbr="${abbr:l}" # normalize to lowercase
# Map common abbreviations to IANA timezone names
case "$abbr" in
    jst) tz="Asia/Tokyo" ;;
@yeiichi
yeiichi / filename_sanitizer.py
Created July 6, 2025 01:40
Sanitize a string to make it safe for use as a filename
import re
# macOS/Windows compatibility
INVALID_FILENAME_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1F]|[\s.]$')
def sanitize_filename(filename: str) -> str:
    """
    Sanitizes a string to make it safe for use as a filename.
    Replaces problematic characters with a hyphen (-).
    """
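The preview cuts off before the function body. One plausible body, assuming the function simply substitutes a hyphen for every match of the pattern shown above, is sketched here under the hypothetical name `sanitize_filename_sketch`.

```python
import re

# Pattern copied from the preview: reserved characters, control characters,
# or a single trailing whitespace character or dot.
INVALID_FILENAME_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1F]|[\s.]$')


def sanitize_filename_sketch(filename: str) -> str:
    """Replace each problematic character with a hyphen (assumed behavior)."""
    return INVALID_FILENAME_CHARS.sub('-', filename)
```

Note that the `[\s.]$` branch only inspects the final character, so in a name ending in several dots or spaces only the last one would be replaced by this sketch.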
@yeiichi
yeiichi / parent_url_extractor.py
Created July 4, 2025 01:43
Extract the parent directory path from the given file path
#!/usr/bin/env python3
from urllib.parse import urlparse, urlunparse, ParseResult
# Constants
ROOT_PATH = '/'
def extract_parent_path(path: str) -> str:
    """
    Extracts the parent directory path from the given file path.
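Since the preview ends inside the docstring, the following is only a guess at how parent-path extraction could work. It assumes POSIX-style URL paths, uses `posixpath.dirname` from the standard library, and adds a hypothetical `parent_url_sketch` helper that drops the query and fragment; none of these choices are confirmed by the gist.

```python
import posixpath
from urllib.parse import urlparse, urlunparse

ROOT_PATH = '/'


def parent_path_sketch(path: str) -> str:
    """Return the parent directory of a URL path, falling back to the root."""
    # Drop a trailing slash so '/a/b/' is treated like '/a/b'.
    stripped = path.rstrip('/')
    parent = posixpath.dirname(stripped)
    return parent or ROOT_PATH


def parent_url_sketch(url: str) -> str:
    """Return the URL of the parent directory, without query or fragment."""
    parts = urlparse(url)
    parent = parts._replace(
        path=parent_path_sketch(parts.path), params='', query='', fragment=''
    )
    return urlunparse(parent)
```

For example, `parent_url_sketch('https://example.com/a/b/c.html?x=1')` yields `'https://example.com/a/b'`.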
@yeiichi
yeiichi / week_number_solver.py
Last active July 2, 2025 00:49
Generate a list of ISO week date ranges (Monday to Sunday) for a given year
#!/usr/bin/env python3
import argparse
from collections import namedtuple
from datetime import date
Week = namedtuple("Week", ["week_number", "start_date", "end_date"])
def list_iso_week_dates(year):
    """
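The body of this function is not visible in the preview. A minimal sketch of generating Monday-to-Sunday ISO week ranges, assuming Python 3.8+ for `date.fromisocalendar` and using the hypothetical name `list_iso_week_dates_sketch`, could be:

```python
from collections import namedtuple
from datetime import date, timedelta

Week = namedtuple("Week", ["week_number", "start_date", "end_date"])


def list_iso_week_dates_sketch(year):
    """Return a Week tuple for every ISO week of the given ISO year."""
    # December 28 always falls in the last ISO week of its year.
    last_week = date(year, 12, 28).isocalendar()[1]
    weeks = []
    for number in range(1, last_week + 1):
        monday = date.fromisocalendar(year, number, 1)
        weeks.append(Week(number, monday, monday + timedelta(days=6)))
    return weeks
```

Because ISO weeks are anchored to Mondays, the first week can start in the previous calendar year and the last week can end in the next one.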