Peter Naur's classic 1985 essay "Programming as Theory Building" argues that a program is not its source code. A program is a shared mental construct (Naur calls it a "theory") that lives in the minds of the people who work on it. If you lose the people, you lose the program. The code is merely a written representation of the program, and it's lossy, so you can't reconstruct
"""
Copyright © 2025 Austin Berrio
@file unicode.model.py
@license cc-by-sa-nc-4.0
@ref https://aclanthology.org/P16-1162/
@ref https://aclanthology.org/2025.coling-main.400/
@ref https://huggingface.co/blog/catherinearnett/dangers-of-tokenizer-recycling
"""
import argparse
/** Lambdas in C. Compile with GCC!
 * ███ ▄████████ ███ ▄████████ ███ █▄ ▄██████▄
 *▀█████████▄ ███ ███ ▀█████████▄ ███ ███ ███ ███ ███ ███
 * ▀███▀▀██ ███ █▀ ▀███▀▀██ ███ █▀ ███ ███ ███ ███
 * ███ ▀ ▄███▄▄▄ ███ ▀ ███ ███ ███ ███ ███
 * ███ ▀▀███▀▀▀ ███ ▀███████████ ███ ███ ███ ███
 * ███ ███ █▄ ███ ███ ███ ███ ███ ███
 * ███ ███ ███ ███ ▄█ ███ ███ ███ ███ ███
 * ▄████▀ ██████████ ▄████▀ ▄████████▀ ████████▀ ▀██████▀
 *
import json
import os
import re
from functools import lru_cache
from typing import IO

class Encoder:
    def __init__(
        self,
import numpy as np
import torch
import jax
from tqdm import tqdm
from model import LanguageModelConfig, TransformerConfig, QuantizedWeight8bit as QW8Bit
from runners import InferenceRunner, ModelRunner, sample_from_model

CKPT_PATH = "./checkpoints"
from datetime import date, datetime
from dateutil import parser

def to_datetime(value: str) -> datetime:
    return parser.parse(value)

def to_iso(value: datetime) -> str:
import base64
import hmac
import json
import os
import secrets
import string
import time
import uuid
import scrypt