Peter pszemraj
@pszemraj
pszemraj / lfm_1b6.py
Created September 8, 2025 07:59
LFM2-VL inference with recommended params
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image
# Load model and processor
model_id = "LiquidAI/LFM2-VL-1.6B"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype="bfloat16", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
pszemraj / alacritty.toml
Created August 29, 2025 03:00
config for alacritty+0xProto
# Font configuration - all font settings in ONE section
[font]
size = 15.0
builtin_box_drawing = false # 0xProto has its own box drawing chars
[font.normal]
family = "0xProto"
style = "Regular"
[font.bold]
%%writefile emoji_search.py
#!/usr/bin/env python3
"""
Emoji Semantic Search CLI
reqs:
pip install fire sentence-transformers pandas numpy
Usage:
python emoji_search.py "that is flames"
pszemraj / llm-foundry-config-reference.md
Last active July 3, 2025 21:02
config reference for mosaicml/llm-foundry by opus-4
pszemraj / test_gemma3n.py
Created June 29, 2025 21:29
test inference with gemma-3n-e2b-it
# -*- coding: utf-8 -*-
"""gemma-3n-test
pip install -U -q git+https://github.com/huggingface/transformers.git
pip install -U -q git+https://github.com/huggingface/pytorch-image-models.git
"""
from transformers import pipeline
import torch
pszemraj / slice_image.py
Created June 28, 2025 19:53
Slice a tall image into chunks.
#!/usr/bin/env python3
"""
Slice a (possibly very tall) image into fixed-height chunks.
Creates a sibling directory called <image stem>_slices/
and writes slice_000.png, slice_001.png, … inside it.
"""
import argparse
from pathlib import Path
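The slicing scheme described in the docstring above can be sketched without any imaging library. The following is a minimal sketch that only plans the work: it computes the output path and crop box for each fixed-height chunk. The name `plan_slices` and its parameters are hypothetical and not taken from the gist; the boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` expects, on the assumption that an imaging library then does the actual cropping and saving.

```python
from pathlib import Path
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower)


def plan_slices(
    image_path: str, width: int, height: int, chunk_height: int
) -> List[Tuple[Path, Box]]:
    """Return (output_path, crop_box) pairs for fixed-height slices.

    Hypothetical helper: output goes to a sibling directory named
    <image stem>_slices/ as slice_000.png, slice_001.png, ...
    """
    if chunk_height <= 0:
        raise ValueError("chunk_height must be positive")

    src = Path(image_path)
    out_dir = src.parent / f"{src.stem}_slices"

    plan = []
    for index, top in enumerate(range(0, height, chunk_height)):
        # The last slice may be shorter than chunk_height.
        bottom = min(top + chunk_height, height)
        plan.append((out_dir / f"slice_{index:03d}.png", (0, top, width, bottom)))
    return plan
```

Planning the boxes separately from the image I/O keeps the arithmetic easy to check, for example that the final chunk is clipped to the image height rather than padded.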
pszemraj / push_dataset_from_text.py
Last active June 27, 2025 02:56
aggregate and push an hf dataset from text files
"""
Create & save an hf dataset with train/test/val splits from dir w/ text files
Ideal structure:
root / section_name_1 / file 1
root / section_name_1 / file 2
root / section_name_1 / file YYY
root / section_name_2 / file 1
root / section_name_2 / file ZZZ
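The directory layout above maps naturally onto one record per file, tagged with its section name, followed by a shuffled split. This is a minimal standard-library sketch of that idea, not the gist's implementation: the function names, the record keys, and the 10% test and validation fractions are all assumptions, and the step of converting the splits into a Hugging Face dataset and pushing it is left out.

```python
import random
from pathlib import Path
from typing import Dict, List


def collect_records(root: Path) -> List[Dict[str, str]]:
    """Read every file found at root/<section>/<file> into a record (hypothetical helper)."""
    records = []
    for path in sorted(root.glob("*/*")):
        if path.is_file():
            records.append(
                {
                    "section": path.parent.name,
                    "filename": path.name,
                    "text": path.read_text(encoding="utf-8"),
                }
            )
    return records


def split_records(
    records: List[Dict[str, str]],
    test_frac: float = 0.1,  # assumed fraction, not from the gist
    val_frac: float = 0.1,  # assumed fraction, not from the gist
    seed: int = 42,
) -> Dict[str, List[Dict[str, str]]]:
    """Shuffle deterministically, then cut into train/validation/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    return {
        "test": shuffled[:n_test],
        "validation": shuffled[n_test : n_test + n_val],
        "train": shuffled[n_test + n_val :],
    }
```

Fixing the seed keeps the split reproducible across runs, which matters if the dataset is rebuilt after adding files.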
pszemraj / run_ocr_nanonets.py
Last active June 18, 2025 01:52
Standalone Asynchronous Nanonets-OCR-s Inference Script using vLLM and PyMuPDF.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Standalone Asynchronous Nanonets-OCR-s Inference Script using vLLM and PyMuPDF.
This script processes PDF files from an input directory using the
nanonets/Nanonets-OCR-s model served locally by vLLM via its OpenAI-compatible API.
It renders each page, sends API requests concurrently for OCR, extracts the
structured markdown/HTML text, and saves the combined text for each PDF into a
corresponding .txt file in the specified output directory.
pszemraj / model_summary.py
Last active June 19, 2025 18:22
Prints an accurate summary of a pytorch model
from dataclasses import dataclass
from typing import List, Optional, Tuple
import torch
import torch.nn as nn
@dataclass
class _LayerSummary:
    """A dataclass to hold summary information for a single layer."""
pszemraj / modeling_wavenetwork.py
Last active May 8, 2025 04:01
PyTorch impl of a pretraining-free (fine-tune directly) WaveNet, a tiny transformer for classification
"""
WaveNet: An Ultra-Small Language Model (PyTorch Implementation)
Based on the paper: https://arxiv.org/abs/2411.02674
Hugging Face Transformers compatible implementation.
"""
import math
from typing import Dict, Optional, Tuple, Union
import torch