@NTT123
NTT123 / llm-play-chess.html
Created August 5, 2025 16:33
llm-play-chess.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>AI Chess Arena - Gemini API Chess Battle</title>
<style>
body {
font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
margin: 0 auto;
Caffeine, a methylxanthine alkaloid, stands as the most widely consumed psychoactive substance on the planet. Its presence is woven into the daily rituals of billions, found in coffee, tea, chocolate, and an ever-expanding universe of energy drinks, sodas, and dietary supplements. Consumers turn to it for its well-documented ability to promote wakefulness, enhance cognition, and boost physical performance. For many, it is an indispensable tool for navigating the demands of modern life. Yet, for a significant portion of these users, the benefits of caffeine come with a familiar list of drawbacks: anxiety, jitters, digestive upset, and disrupted sleep.
Into this landscape has emerged paraxanthine, a compound that is fascinating yet unfamiliar to most people. Scientifically known as 1,7-dimethylxanthine, paraxanthine is not an obscure molecule from a remote plant but is, in fact, the principal metabolite produced by the human body after caffeine is consumed. When an individual drinks a cup of coffee, their liver rapidly
@NTT123
NTT123 / Paraxanthine.html
Last active July 3, 2025 16:17
Paraxanthine vs. Caffeine
<!DOCTYPE html>
<html lang="en" class="scroll-smooth">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Paraxanthine Versus Caffeine: An Evidence-Based Evaluation</title>
<script src="https://cdn.tailwindcss.com"></script>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Source+Serif+4:opsz,[email protected],400;8..60,600;8..60,700&family=Inter:wght@400;500;600&display=swap" rel="stylesheet">
@NTT123
NTT123 / llama3_model.py
Created April 22, 2025 01:25
Llama3 model from scratch
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple, Union
import torch
import torch.nn.functional as F
from torch import nn
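The preview above shows only the imports of the from-scratch Llama 3 implementation. As a rough sketch of where such an implementation typically starts, the dataclass below illustrates the kind of configuration those imports suggest; the field names and default values are assumptions, not taken from the gist.

from dataclasses import dataclass
from typing import Optional


# Hypothetical config dataclass; fields and defaults are illustrative,
# chosen to match commonly published Llama 3 8B settings.
@dataclass
class ModelArgs:
    dim: int = 4096                 # hidden size
    n_layers: int = 32              # number of transformer blocks
    n_heads: int = 32               # attention heads
    n_kv_heads: Optional[int] = 8   # grouped-query attention KV heads
    vocab_size: int = 128256        # Llama 3 tokenizer vocabulary size
    rope_theta: float = 500000.0    # RoPE base frequency used by Llama 3
    norm_eps: float = 1e-5          # RMSNorm epsilon
    max_seq_len: int = 8192         # maximum context length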
@NTT123
NTT123 / memory_efficient_adamw.py
Last active April 18, 2025 11:14
Memory Efficient AdamW optimizer that offloads optimizer states to CPU memory
import math
import torch
from torch.optim import AdamW
class MemoryEfficientAdamW(AdamW):
"""
Memory Efficient AdamW optimizer that keeps parameters and gradients on GPU
but stores optimizer states in CPU memory when enabled.
"""
This script fetches download statistics for major LLM provider packages (OpenAI, Anthropic, Claude) from the PyPI Stats API
and generates an HTML visualization showing the relative market share across different operating systems.
The visualization consists of three pie charts displaying the percentage of downloads for each package on:
- Windows
- macOS (Darwin)
- Linux
Each chart shows:
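The description above is cut off mid-list, but the data-gathering step it refers to can be sketched as follows, assuming the public pypistats.org "system" endpoint and an illustrative package list; the gist's actual package names and chart-rendering code are not shown in the preview.

import requests

# Package names are illustrative; the gist's exact list is not visible above.
PACKAGES = ["openai", "anthropic"]
SYSTEMS = ["Windows", "Darwin", "Linux"]


def downloads_by_system(package: str) -> dict:
    """Sum recent daily downloads per operating system from pypistats.org.

    Assumes the public endpoint /api/packages/<name>/system, which returns
    rows like {"category": "Darwin", "date": ..., "downloads": ...}.
    """
    url = f"https://pypistats.org/api/packages/{package}/system"
    rows = requests.get(url, timeout=30).json()["data"]
    totals = {s: 0 for s in SYSTEMS}
    for row in rows:
        if row["category"] in totals:
            totals[row["category"]] += row["downloads"]
    return totals


if __name__ == "__main__":
    per_package = {pkg: downloads_by_system(pkg) for pkg in PACKAGES}
    for system in SYSTEMS:
        total = sum(per_package[pkg][system] for pkg in PACKAGES) or 1
        shares = {pkg: 100 * per_package[pkg][system] / total for pkg in PACKAGES}
        print(system, {pkg: f"{share:.1f}%" for pkg, share in shares.items()})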
@NTT123
NTT123 / gemini-google-search-retrieval.py
Created November 14, 2024 04:06
Gemini appends search results at the end of the response for grounded generation.
import os
import google.generativeai as genai
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
# Create the model
generation_config = {
"temperature": 0.0,
"max_output_tokens": 8192,
"response_mime_type": "text/plain",
@NTT123
NTT123 / inplace_rope.py
Created September 13, 2024 13:56
Inplace RoPE inference kernel
"""
RoPE triton kernel
"""
import triton
import triton.language as tl
@triton.jit
def _rope_kernel(
x_ptr, x_row_stride, x_head_stride,
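The kernel preview cuts off at the pointer arguments. For reference, the computation an in-place RoPE inference kernel performs can be written in plain PyTorch as below; this is a readable reference using the half-split (rotate-half) pairing convention, not the gist's Triton kernel, and the function name and shapes are illustrative.

import torch


def rope_reference(x: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    """Rotary position embedding reference in plain PyTorch.

    x: (seq_len, n_heads, head_dim) with even head_dim. This shows what a RoPE
    kernel computes; the Triton version would overwrite x in place instead of
    returning a new tensor.
    """
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    # Per-dimension inverse frequencies and per-position rotation angles.
    inv_freq = 1.0 / (theta ** (torch.arange(0, half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)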
@NTT123
NTT123 / in-place-rms-norm-triton-kernel.md
Last active September 12, 2024 05:25
Inplace RMSNorm Implementation

This is an optimized implementation of an RMSNorm inference kernel using Triton, a Python-based GPU programming library. It is a modified version of the excellent RMSNorm kernel from the Unsloth project.

It has two improvements:

  • int64 for pointer offsets: We use int64 instead of the default int32 to compute pointer offset values. This prevents overflow at large sequence lengths, where the offset can exceed the maximum int32 value (roughly 2.1 billion).
  • In-place computation: Our kernel writes the result back to the input buffer, eliminating the need for additional memory allocation. This approach halves the memory usage compared to traditional implementations that use a separate output buffer.
import torch
import triton
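The two improvements listed above can be sketched in a small Triton row kernel: cast the row index to int64 before computing the row offset, and store the normalized result back into the input buffer. The kernel below is an illustrative sketch of those two ideas for a contiguous 2-D input, not the Unsloth-derived kernel from the gist.

import torch
import triton
import triton.language as tl


@triton.jit
def _rmsnorm_inplace_kernel(x_ptr, w_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # One program per row. Cast the row index to int64 before multiplying by the
    # row length so the element offset cannot overflow int32 for long sequences.
    row = tl.program_id(0).to(tl.int64)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    row_ptr = x_ptr + row * n_cols
    x = tl.load(row_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    rstd = 1.0 / tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    # In-place: write the normalized result back over the input row.
    tl.store(row_ptr + cols, (x * rstd * w).to(x_ptr.dtype.element_ty), mask=mask)


def rmsnorm_(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Illustrative in-place RMSNorm for a contiguous (n_rows, n_cols) CUDA tensor."""
    assert x.is_cuda and x.is_contiguous()
    n_rows, n_cols = x.shape
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    _rmsnorm_inplace_kernel[(n_rows,)](x, weight, n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return x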
@NTT123
NTT123 / convert_hf_to_llama3.py
Last active September 19, 2024 12:43
This script converts a Hugging Face LLaMA3 model checkpoint to the original LLaMA3 checkpoint format.
"""
This script converts a Hugging Face LLaMA3 model checkpoint to the original LLaMA3 checkpoint format.
Usage example:
python convert_hf_to_llama3.py --hf_model_path "path/to/hf/model" --output_path "path/to/output"
"""
import torch
from transformers import LlamaForCausalLM
import os
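Such a conversion is largely a parameter-renaming exercise from Hugging Face names to the original Meta checkpoint names. The mapping below sketches that step; the gist's exact mapping, sharding, and weight-permutation handling are not shown in the preview, so treat the names and the convert helper as illustrative.

import torch
from transformers import LlamaForCausalLM

# Illustrative key mapping from Hugging Face parameter names to the original
# Meta checkpoint names.
HF_TO_META = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.norm.weight": "norm.weight",
    "lm_head.weight": "output.weight",
}
LAYER_MAP = {
    "self_attn.q_proj": "attention.wq",
    "self_attn.k_proj": "attention.wk",
    "self_attn.v_proj": "attention.wv",
    "self_attn.o_proj": "attention.wo",
    "mlp.gate_proj": "feed_forward.w1",
    "mlp.down_proj": "feed_forward.w2",
    "mlp.up_proj": "feed_forward.w3",
    "input_layernorm": "attention_norm",
    "post_attention_layernorm": "ffn_norm",
}


def convert(hf_model_path: str) -> dict:
    model = LlamaForCausalLM.from_pretrained(hf_model_path, torch_dtype=torch.bfloat16)
    out = {}
    for name, tensor in model.state_dict().items():
        if "rotary_emb" in name:
            continue  # RoPE buffers are recomputed, not stored, in the original format
        if name in HF_TO_META:
            out[HF_TO_META[name]] = tensor
            continue
        # e.g. model.layers.3.self_attn.q_proj.weight -> layers.3.attention.wq.weight
        parts = name.split(".")
        layer_idx, suffix = parts[2], ".".join(parts[3:-1])
        out[f"layers.{layer_idx}.{LAYER_MAP[suffix]}.{parts[-1]}"] = tensor
    # Note: a faithful converter also has to undo the q/k weight permutation that the
    # HF conversion applies for its RoPE layout; that step is omitted in this sketch.
    return out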