This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env python3
"""
This module provides a class and utility functions to analyze and infer SQL
types for columns in CSV files via statistical and datatype examination.
It handles various datatypes like integers, floats, booleans, dates, and
timestamps while accommodating dialect-specific SQL type mappings. Additional
functionality includes updating statistics across data chunks, robustness
against missing or invalid data, and generation of NULL/NOT NULL constraints.
"""
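The preview stops at the module docstring, so the actual class is not visible here. As a rough illustration of the idea it describes, the following sketch infers a generic SQL type and a NULL/NOT NULL constraint for one column of string values. The function name `infer_sql_type`, the helper `_all_parse`, the `MISSING` and `BOOLEANS` sets, and the chosen type names are all assumptions made for this example. It works on a single in-memory list and does not model chunked statistics or dialect-specific mappings.

```python
from datetime import date, datetime

# Assumed markers for missing data and booleans (not taken from the original module).
MISSING = {"", "null", "none", "na", "n/a"}
BOOLEANS = {"true", "false", "t", "f", "yes", "no"}


def _all_parse(values, parse):
    """Return True if every value is accepted by the given parser."""
    try:
        for value in values:
            parse(value)
    except ValueError:
        return False
    return True


def infer_sql_type(values):
    """Infer (sql_type, nullability) for a list of raw CSV strings."""
    present = [v.strip() for v in values if v.strip().lower() not in MISSING]
    nullable = len(present) < len(values)

    if not present:
        sql_type = "TEXT"  # nothing to examine, fall back to the widest type
    elif all(v.lower() in BOOLEANS for v in present):
        sql_type = "BOOLEAN"
    elif _all_parse(present, int):
        sql_type = "INTEGER"
    elif _all_parse(present, float):
        sql_type = "DOUBLE PRECISION"
    elif _all_parse(present, date.fromisoformat):
        sql_type = "DATE"
    elif _all_parse(present, datetime.fromisoformat):
        sql_type = "TIMESTAMP"
    else:
        sql_type = "TEXT"

    return sql_type, ("NULL" if nullable else "NOT NULL")


if __name__ == "__main__":
    print(infer_sql_type(["1", "2", ""]))
    print(infer_sql_type(["2024-01-01", "2024-02-29"]))
```

The checks run from the narrowest type to the widest, so a column is only widened to `TEXT` when no stricter parser accepts every non-missing value.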
#!/usr/bin/env python3
"""
A tool for classifying and moving CSV files based on column headers.

This script processes CSV files by matching their headers against predefined
layouts. Depending on the matched layout, the files are moved to their
corresponding destination directories. The script supports both file-based
and directory-based classification and offers options for dry-run execution,
recursion, and verbose output.
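The preview is cut off before any code appears. A minimal sketch of the header-matching idea described above might look like the following. The `LAYOUTS` mapping, the directory names, and the functions `read_header`, `classify`, and `move_by_layout` are hypothetical names chosen for this example. It covers only single files and a dry-run flag, not directory recursion or verbose output.

```python
import csv
import shutil
from pathlib import Path

# Hypothetical layouts: normalized header tuple -> destination directory name.
LAYOUTS = {
    ("id", "name", "email"): "contacts",
    ("date", "amount", "account"): "transactions",
}


def read_header(csv_path):
    """Return the first row of the CSV as a tuple of lowercased, stripped names."""
    # utf-8-sig drops a leading byte-order mark if one is present.
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        first_row = next(csv.reader(fh), [])
    return tuple(cell.strip().lower() for cell in first_row)


def classify(csv_path, layouts=LAYOUTS):
    """Return the destination name for a matching layout, or None."""
    return layouts.get(read_header(csv_path))


def move_by_layout(csv_path, dest_root, dry_run=False, layouts=LAYOUTS):
    """Move the file under dest_root/<layout>/ and return the target path.

    With dry_run=True the target is computed but nothing is moved.
    Returns None when no layout matches.
    """
    csv_path = Path(csv_path)
    layout = classify(csv_path, layouts)
    if layout is None:
        return None
    target = Path(dest_root) / layout / csv_path.name
    if not dry_run:
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(csv_path), str(target))
    return target
```

Matching on an ordered tuple makes the comparison order-sensitive. Whether the original script treats column order as significant is not visible in the preview.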
#!/usr/bin/env python3
"""
A program to concatenate Excel files in a directory.

This script combines all `.xlsx` files present in a specified directory, ensuring
they each have identical and consistent columns (order-sensitive). The combined
data is written into a single output Excel file. Additionally, duplicate rows are
dropped from the final result, and the index is reset. An output file name can
either be provided via the command-line arguments or auto-generated based on
the current timestamp.
#!/usr/bin/env python3
from __future__ import annotations

import argparse
from pathlib import Path

from openpyxl import load_workbook
from openpyxl.utils import get_column_letter
from openpyxl.workbook.workbook import Workbook
from openpyxl.worksheet.table import Table, TableStyleInfo
#!/usr/bin/env bash
# shellcheck shell=bash
# Bash wrapper to run Python scripts using a virtual environment.
set -Eeuo pipefail

# Constants
readonly HOME_DIR="${HOME}"
readonly VENV_DIR="${HOME_DIR}/path/to/venv"
readonly VENV_BIN="${VENV_DIR}/bin"
#!/usr/bin/env python3
import pandas as pd
import numpy as np


def normalize_missing(
    df: pd.DataFrame,
    targets=None,
    columns=None,
    output_dtype="object"
) -> pd.DataFrame:
#!/bin/zsh
# Return ISO 8601 datetime.now() in the given time zone

# Ask user for timezone abbreviation
read "abbr?Enter timezone (e.g. jst, utc, est, pst): "
abbr="${abbr:l}"  # normalize to lowercase

# Map common abbreviations to IANA timezone names
case "$abbr" in
  jst) tz="Asia/Tokyo" ;;
import re

# macOS/Windows compatibility
INVALID_FILENAME_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1F]|[\s.]$')


def sanitize_filename(filename: str) -> str:
    """
    Sanitizes a string to make it safe for use as a filename.
    Replaces problematic characters with a hyphen (-).
    """
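The function body is not shown in the preview. One plausible body, assuming it simply applies the compiled pattern with `re.sub`, is sketched below. The pattern is copied from the snippet above, but the body itself is a guess rather than the author's code.

```python
import re

# Pattern copied from the snippet above: reserved characters and control
# characters anywhere, plus a single trailing whitespace character or dot.
INVALID_FILENAME_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1F]|[\s.]$')


def sanitize_filename(filename: str) -> str:
    """Replace every match of INVALID_FILENAME_CHARS with a hyphen (assumed body)."""
    return INVALID_FILENAME_CHARS.sub("-", filename)


if __name__ == "__main__":
    print(sanitize_filename('report: Q1/Q2?.txt'))  # report- Q1-Q2-.txt
    print(sanitize_filename('notes.'))              # notes-
```

Because `[\s.]$` matches only the final character, a name that ends in several dots or spaces, such as `name. `, still keeps the earlier ones and becomes `name.-`. Stripping the whole trailing run would need a pattern like `[\s.]+$`, at the cost of collapsing that run into one hyphen.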
#!/usr/bin/env python3
from urllib.parse import urlparse, urlunparse, ParseResult

# Constants
ROOT_PATH = '/'


def extract_parent_path(path: str) -> str:
    """
    Extracts the parent directory path from the given file path.
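The preview ends inside the docstring, so the implementation is not visible. The sketch below shows one way such a helper could behave, assuming slash-separated URL paths and a fallback to `ROOT_PATH` when no parent exists. The `parent_url` wrapper is a hypothetical addition suggested only by the `urlparse` and `urlunparse` imports. Dropping the query and fragment is likewise a choice made for this example.

```python
from urllib.parse import urlparse, urlunparse

ROOT_PATH = "/"


def extract_parent_path(path: str) -> str:
    """Return the parent directory of a slash-separated path (assumed behavior)."""
    trimmed = path.rstrip("/")  # ignore trailing slashes
    parent = trimmed.rsplit("/", 1)[0] if "/" in trimmed else ""
    return parent or ROOT_PATH  # fall back to the root when nothing is left


def parent_url(url: str) -> str:
    """Hypothetical wrapper: rebuild the URL with its path replaced by the parent."""
    parts = urlparse(url)
    parent = extract_parent_path(parts.path)
    return urlunparse(parts._replace(path=parent, query="", fragment=""))


if __name__ == "__main__":
    print(extract_parent_path("/a/b/c.txt"))
    print(parent_url("https://example.com/docs/guide/index.html?x=1#top"))
```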
#!/usr/bin/env python3
import argparse
from collections import namedtuple
from datetime import date

Week = namedtuple("Week", ["week_number", "start_date", "end_date"])


def list_iso_week_dates(year):
    """
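Only the signature and the `Week` tuple are visible in the preview. A possible implementation, assuming the function returns one `Week` per ISO week with Monday as the start date and Sunday as the end date, is sketched here. It uses `date.fromisocalendar` (Python 3.8+) and additionally imports `timedelta`, which the original import line does not show.

```python
from collections import namedtuple
from datetime import date, timedelta

Week = namedtuple("Week", ["week_number", "start_date", "end_date"])


def list_iso_week_dates(year):
    """Return a Week for every ISO 8601 week of the given ISO year (assumed behavior)."""
    # December 28 always falls in the last ISO week of its year,
    # so its week number is the number of weeks (52 or 53).
    last_week = date(year, 12, 28).isocalendar()[1]

    weeks = []
    for week_number in range(1, last_week + 1):
        start = date.fromisocalendar(year, week_number, 1)  # Monday
        weeks.append(Week(week_number, start, start + timedelta(days=6)))  # Sunday
    return weeks


if __name__ == "__main__":
    for week in list_iso_week_dates(2020)[:2]:
        print(week.week_number, week.start_date, week.end_date)
```

ISO weeks can cross calendar-year boundaries. Week 1 of 2020, for example, starts on 2019-12-30, so the first and last entries may contain dates outside the requested year.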