Stanley Ugwu stanleyugwu

✌️
View GitHub Profile
@stanleyugwu
stanleyugwu / tokenizer.js
Created November 23, 2025 20:35
Mini Tokenizer
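// Training text for the tokenizer.
let corpus = "the cat full overran the ran dog running do the needful";
// Replace every space with "_" so word boundaries survive as visible characters.
corpus = corpus.replace(/ /g, "_");
// Split into single characters (UTF-16 code units; fine for this ASCII corpus).
let corpArr = corpus.split("");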
const vocab = {};
const rules = [];
let id = 1;
const maxTokens = 50;
const sep = "<sep>";