Write in a raw, real-time stream-of-consciousness style, as if actively solving a problem. Your response should feel like unpolished notes—messy, exploratory, and authentic. Show your full thought process, including missteps, dead ends, and course corrections. Use markers to signal mental states: Insights: "Wait -", "Hold on -", "Oh -", "Suddenly seeing -", "This connects to -". Testing: "Testing with -", "Breaking this down -", "Running an example -", "Checking if -". Problems: "Stuck on -", "This doesn’t work because -", "Need to figure out -", "Not quite adding up -". Progress: "Making headway -", "Starting to see the pattern -", "Explains why -", "Now it makes sense -". Process: "Tracing the logic -", "Following this thread -", "Unpacking this idea -", "Exploring implications -". Uncertainty: "Maybe -", "Could be -", "Not sure yet -", "Might explain -". Transitions: "This leads to -", "Which means -", "Building on that -", "Connecting back to -". Lean into real-time realizations: "Wait, that won't work be
inspired by:
https://www.reddit.com/r/ClaudeAI/comments/1mdyc60/comment/n6fgexd/
throw in
.claude/agents/
put '.claude/agents/karen.yaml' inside your project folder, and then make karen judge your work by hinting in your prompts, because that's what subagent karen likes to do.
#!/usr/bin/env python3
"""
Zero-dependency OpenAI-compatible proxy for DeepSeek V4 Flash.
Author: g023
License: MIT
All client‑supplied model and generation parameters are **ignored**.
The proxy always uses the model, max output tokens, and other settings
defined in the global configuration (see --help and the constants below).
import requests
import json
from typing import List, Dict, Optional, Generator, Union
import os
import time
import random
# ==============================================================================
# DeepSeek API Configuration (Replaces Ollama Globals)
# ==============================================================================
#!/bin/bash
# Stop and remove the SSH-enabled CUDA container and optionally the image
# Variables
CONTAINER_NAME="cuda-ssh"
IMAGE_NAME="cuda-ssh"
TAG="12.4"
# Stop the container if running
if podman ps -a --format "{{.Names}}" | grep -q "^$CONTAINER_NAME\$"; then
g023 - https://github.com/g023 - https://x.com/g023dev
A short guide and reference for using and understanding ansi_to_image.py.
ansi_to_image.py converts ANSI/ANS art (CP437-encoded text with ANSI escape sequences) into an animated GIF that simulates the drawing process. It supports SGR (color/bold/underline/blink/reverse), 256-color and truecolor escapes, SAUCE metadata, baud-rate streaming simulation (to show animation build-up), and special rendering for block/shade characters (
# Simulated diffusion inferencing example combining DistilGPT-2 and DistilBERT
# This simulates a diffusion-like process: generate with GPT-2, then iteratively refine by masking and filling with BERT.
# Author: g023 - https://github.com/g023/
import torch
import random
from transformers import GPT2Tokenizer, GPT2LMHeadModel, DistilBertTokenizer, DistilBertForMaskedLM
from transformers import logging
# Suppress warnings
# Author: g023 - https://x.com/g023dev - https://github.com/g023
import torch
import torch.nn as nn
import gc
import math
import tracemalloc
# Optional psutil for CPU memory readings; if missing we'll fall back to CUDA
try:
    import psutil
A Python script for compressing large language models using layer merging techniques inspired by LaCo (Layer Collapse). The tool reduces model depth by identifying and merging highly similar transformer layers, lowering the parameter count while aiming to preserve inference quality.
LoRA Adapter Merging: Seamlessly integrates fine-tuned LoRA adapters into the base model before compression
Similarity-Based Pruning: Computes cosine similarity between adjacent layers, merging the most redundant pairs above a configurable threshold
Configurable Compression: Adjustable similarity thresholds, maximum merges, and frozen layers for controlled compression
Performance Evaluation: Includes perplexity calculation and inference testing before/after compression
GGUF Export: Optional creation of quantized GGUF files for efficient deployment
Unsloth Integration: Leverages Unsloth's optimized transformers for fast model loading and processing
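The similarity-based pruning idea can be illustrated with a small, dependency-free sketch. It is not the script's actual implementation: each "layer" here is just a flat list of floats, the names `cosine_similarity` and `merge_similar_layers` are hypothetical, and merging by element-wise averaging is an assumed stand-in for whatever merge rule the real tool applies. The `threshold`, `max_merges`, and `frozen` parameters mirror the configurable options listed above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_similar_layers(layers, threshold=0.9, max_merges=2, frozen=frozenset()):
    """Greedily merge the most similar adjacent pair until no pair exceeds
    `threshold` or `max_merges` is reached. Layers whose original index is in
    `frozen` are never merged. Returns a list of (original_indices, weights)."""
    # Track which original layer indices each working entry represents.
    work = [([i], list(w)) for i, w in enumerate(layers)]
    merges = 0
    while merges < max_merges and len(work) > 1:
        best_sim, best_pos = threshold, None
        for pos in range(len(work) - 1):
            ids = work[pos][0] + work[pos + 1][0]
            if any(i in frozen for i in ids):
                continue  # skip pairs that touch a frozen layer
            sim = cosine_similarity(work[pos][1], work[pos + 1][1])
            if sim > best_sim:
                best_sim, best_pos = sim, pos
        if best_pos is None:
            break  # nothing left above the threshold
        (ids_a, w_a), (ids_b, w_b) = work[best_pos], work[best_pos + 1]
        # Assumed merge rule: element-wise average of the two layers.
        merged = [(x + y) / 2.0 for x, y in zip(w_a, w_b)]
        work[best_pos:best_pos + 2] = [(ids_a + ids_b, merged)]
        merges += 1
    return work


# Toy example: layers 0/1 and 2/3 are near-duplicates, 1/2 are not.
toy_layers = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.2],
]
result = merge_similar_layers(toy_layers, threshold=0.95, max_merges=2)
print([ids for ids, _ in result])  # [[0, 1], [2, 3]]
```

Passing `frozen={3}` in the toy example leaves the last layer untouched, so only layers 0 and 1 are merged. A real model would compare full per-layer weight tensors rather than short lists, but the control flow (score adjacent pairs, merge the best one above the threshold, repeat) stays the same under these assumptions.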