Austin teleprint-me

@teleprint-me
teleprint-me / bpe.py
Created August 7, 2025 04:46
Byte-pair Encoding completely from scratch in Python
"""
Copyright © 2025 Austin Berrio
@file unicode.model.py
@license cc-by-sa-nc-4.0
@ref https://aclanthology.org/P16-1162/
@ref https://aclanthology.org/2025.coling-main.400/
@ref https://huggingface.co/blog/catherinearnett/dangers-of-tokenizer-recycling
"""
import argparse
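
The bpe.py preview stops at its imports. The following is a minimal sketch of the merge-learning loop from the first reference (Sennrich et al., P16-1162): count adjacent symbol pairs weighted by word frequency, merge the most frequent pair everywhere, and repeat. It is not the gist's code. The helper names (`get_pair_counts`, `merge_pair`, `learn_bpe`) and the `</w>` end-of-word marker are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical helper names; a sketch of BPE merge learning, not the gist's implementation.


def get_pair_counts(vocab: dict) -> Counter:
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs: Counter = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair: tuple, vocab: dict) -> dict:
    """Replace every non-overlapping occurrence of `pair` with its concatenation."""
    merged: dict = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus: list, num_merges: int) -> list:
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    word_freqs = Counter(word for line in corpus for word in line.split())
    # Each word starts as its characters plus an (assumed) end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first-counted pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges
```

On the toy corpus `["low lower lowest", "low low"]`, three merges come out as `('l', 'o')`, `('lo', 'w')`, then `('low', '</w>')`. The ordered merge list is the learned model: encoding new text replays the same merges in the same order.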
teleprint-me / programming-as-theory-building.md
Created October 3, 2024 04:34 — forked from onlurking/programming-as-theory-building.md
Programming as Theory Building - Peter Naur

Programming as Theory Building

Peter Naur

Peter Naur's classic 1985 essay "Programming as Theory Building" argues that a program is not its source code. A program is a shared mental construct (he uses the word theory) that lives in the minds of the people who work on it. If you lose the people, you lose the program. The code is merely a written representation of the program, and it's lossy, so you can't reconstruct a program from its code.

teleprint-me / lambda.c
Created September 9, 2024 16:33 — forked from 7etsuo/lambda.c
lambdas in C
/** Lambdas in C. Compile with GCC!
* ███ ▄████████ ███ ▄████████ ███ █▄ ▄██████▄
*▀█████████▄ ███ ███ ▀█████████▄ ███ ███ ███ ███ ███ ███
* ▀███▀▀██ ███ █▀ ▀███▀▀██ ███ █▀ ███ ███ ███ ███
* ███ ▀ ▄███▄▄▄ ███ ▀ ███ ███ ███ ███ ███
* ███ ▀▀███▀▀▀ ███ ▀███████████ ███ ███ ███ ███
* ███ ███ █▄ ███ ███ ███ ███ ███ ███
* ███ ███ ███ ███ ▄█ ███ ███ ███ ███ ███
* ▄████▀ ██████████ ▄████▀ ▄████████▀ ████████▀ ▀██████▀
*
teleprint-me / gpt-2-encode.py
Last active May 6, 2024 16:04
gpt-2-encode.py
import json
import os
import re
from functools import lru_cache
from typing import IO
class Encoder:
    def __init__(
        self,
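
The gist-2-encode.py preview shows only its imports and the start of an `Encoder` class. Those imports (`json`, `re`, `lru_cache`) closely resemble OpenAI's GPT-2 `encoder.py`, whose byte-level BPE first maps every byte to a printable Unicode character so that merges never operate on raw control bytes or spaces. The sketch below follows the `bytes_to_unicode` table published in that reference; whether this gist reproduces it is an assumption, and `to_byte_symbols` / `from_byte_symbols` are hypothetical helper names.

```python
from functools import lru_cache


@lru_cache()
def bytes_to_unicode() -> dict:
    """Map each of the 256 byte values to a printable Unicode character.

    Printable Latin-1 bytes map to themselves; the other 68 bytes (control
    characters, space, DEL, soft hyphen, ...) are shifted to code points 256+.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(2**8):
        if b not in bs:
            bs.append(b)
            cs.append(2**8 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def to_byte_symbols(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then spell each byte with its mapped char."""
    table = bytes_to_unicode()
    return "".join(table[b] for b in text.encode("utf-8"))


def from_byte_symbols(symbols: str) -> str:
    """Hypothetical helper: invert to_byte_symbols and decode the bytes as UTF-8."""
    inverse = {c: b for b, c in bytes_to_unicode().items()}
    return bytes(inverse[c] for c in symbols).decode("utf-8", errors="replace")
```

With this table a space (byte 32) becomes `Ġ` (U+0120), which is why GPT-2 vocabularies contain tokens such as `Ġthe`.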
teleprint-me / convert.py
Created March 24, 2024 04:36 — forked from chu-tianxiang/convert.py
Convert grok-1 weight to torch
import numpy as np
import torch
import jax
from tqdm import tqdm
from model import LanguageModelConfig, TransformerConfig, QuantizedWeight8bit as QW8Bit
from runners import InferenceRunner, ModelRunner, sample_from_model
CKPT_PATH = "./checkpoints"
teleprint-me / timestamp.py
Created November 28, 2022 01:46
Simplify timestamps in Python
from datetime import date, datetime
from dateutil import parser
def to_datetime(value: str) -> datetime:
    return parser.parse(value)
def to_iso(value: datetime) -> str:
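
The timestamp.py preview breaks off at `to_iso`. The following is a standard-library-only sketch of the same two helpers. It swaps the third-party `dateutil.parser.parse` for `datetime.fromisoformat`, which accepts far fewer input formats (broad ISO 8601 support only arrived in Python 3.11). It also assumes naive timestamps should be treated as UTC. Neither choice comes from the gist, and the `to_iso` body is a guess because the original is cut off.

```python
from datetime import date, datetime, timezone


def to_datetime(value: str) -> datetime:
    """Parse an ISO 8601 string; naive results are assumed (here) to be UTC."""
    dt = datetime.fromisoformat(value)
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def to_iso(value: date) -> str:
    """Render a date or datetime (a date subclass) as an ISO 8601 string."""
    return value.isoformat()
```

Attaching a time zone at parse time keeps round trips stable. For example, `to_iso(to_datetime("2022-11-28T01:46:00"))` yields `"2022-11-28T01:46:00+00:00"`, and comparisons between parsed values never mix naive and aware datetimes.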
teleprint-me / security.py
Created November 28, 2022 01:37
Home-brewed scrypt and JSON Web Token implementation in Python
import base64
import hmac
import json
import os
import secrets
import string
import time
import uuid
import scrypt
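
The security.py preview lists only imports, including the third-party `scrypt` package. The block below sketches the two pieces the description names. It hashes passwords with the standard library's `hashlib.scrypt`, swapped in for the `scrypt` package, and it signs HS256 JSON Web Tokens with `hmac` using the compact `header.payload.signature` form from RFC 7519/7515. The cost parameters (`N=2**14, r=8, p=1`), the 16-byte salt, and all function names are assumptions, not the gist's choices.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

# Assumed scrypt cost parameters (not from the gist); about 16 MiB per hash (128 * r * N).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple:
    """Derive a 64-byte scrypt key; a fresh 16-byte salt is drawn if none is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    key = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=64
    )
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected)


def _b64url(data: bytes) -> str:
    """Base64url without '=' padding, as the JWT compact form requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_jwt(payload: dict, key: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256 (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def decode_jwt(token: str, key: bytes) -> dict:
    """Check signature, algorithm, and optional 'exp' claim; return the payload."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError unless 3 parts
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if "exp" in payload and payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload
```

Both the password check and the signature check use `hmac.compare_digest`, so the comparison time does not reveal how many leading bytes matched. The signature is verified before any header field is trusted.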