g023

Write in a raw, real-time stream-of-consciousness style, as if actively solving a problem. Your response should feel like unpolished notes—messy, exploratory, and authentic. Show your full thought process, including missteps, dead ends, and course corrections. Use markers to signal mental states: Insights: "Wait -", "Hold on -", "Oh -", "Suddenly seeing -", "This connects to -". Testing: "Testing with -", "Breaking this down -", "Running an example -", "Checking if -". Problems: "Stuck on -", "This doesn’t work because -", "Need to figure out -", "Not quite adding up -". Progress: "Making headway -", "Starting to see the pattern -", "Explains why -", "Now it makes sense -". Process: "Tracing the logic -", "Following this thread -", "Unpacking this idea -", "Exploring implications -". Uncertainty: "Maybe -", "Could be -", "Not sure yet -", "Might explain -". Transitions: "This leads to -", "Which means -", "Building on that -", "Connecting back to -". Lean into real-time realizations: "Wait, that won't work be

Universal Prompt Templates For Various Prompt Strategies Using Large Language AI Models

Author: g023 (github.com/g023)

License: MIT

**Zero-shot Prompting**

ansi_to_image.py — ANSI to Animated GIF Converter

Author

g023 - https://github.com/g023 - https://x.com/g023dev

A short guide and reference for using and understanding ansi_to_image.py.

Summary

ansi_to_image.py converts ANSI/ANS art (CP437-encoded text with ANSI escape sequences) into an animated GIF that simulates the drawing process. It supports SGR (color/bold/underline/blink/reverse), 256-color and truecolor escapes, SAUCE metadata, baud-rate streaming simulation (to show animation build-up), and special rendering for block/shade characters (
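To make the SGR handling concrete, here is a minimal sketch of folding one SGR parameter string into a text-attribute state. It covers the 256-color (`38;5;n`) and truecolor (`38;2;r;g;b`) forms. `Attr` and `apply_sgr` are hypothetical names for illustration, not ansi_to_image.py's internals. Indexed colors are kept as palette indices and would still need mapping to RGB before drawing.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, Union

# A color is either a palette index (0-255) or an (r, g, b) truecolor triple.
Color = Union[int, Tuple[int, int, int]]

@dataclass(frozen=True)
class Attr:
    """Hypothetical text-attribute state tracked while rendering."""
    fg: Optional[Color] = None  # None means "default foreground"
    bg: Optional[Color] = None  # None means "default background"
    bold: bool = False
    underline: bool = False
    blink: bool = False
    reverse: bool = False

# Simple on/off codes mapped to (attribute name, value).
_FLAGS = {1: ("bold", True), 4: ("underline", True), 5: ("blink", True),
          7: ("reverse", True), 22: ("bold", False), 24: ("underline", False),
          25: ("blink", False), 27: ("reverse", False)}

def apply_sgr(attr: Attr, params: str) -> Attr:
    """Apply the parameters of one SGR sequence (ESC [ params m)."""
    # An empty parameter (or an empty string) means 0, i.e. reset.
    codes = [int(p) if p else 0 for p in params.split(";")] if params else [0]
    i = 0
    while i < len(codes):
        c = codes[i]
        if c == 0:
            attr = Attr()
        elif c in _FLAGS:
            name, value = _FLAGS[c]
            attr = replace(attr, **{name: value})
        elif 30 <= c <= 37:
            attr = replace(attr, fg=c - 30)
        elif 40 <= c <= 47:
            attr = replace(attr, bg=c - 40)
        elif c == 39:
            attr = replace(attr, fg=None)
        elif c == 49:
            attr = replace(attr, bg=None)
        elif 90 <= c <= 97:
            attr = replace(attr, fg=c - 90 + 8)    # bright foreground
        elif 100 <= c <= 107:
            attr = replace(attr, bg=c - 100 + 8)   # bright background
        elif c in (38, 48) and i + 1 < len(codes):
            field = "fg" if c == 38 else "bg"
            mode = codes[i + 1]
            if mode == 5 and i + 2 < len(codes):      # 256-color: 38;5;n
                attr = replace(attr, **{field: codes[i + 2]})
                i += 2
            elif mode == 2 and i + 4 < len(codes):    # truecolor: 38;2;r;g;b
                attr = replace(attr, **{field: tuple(codes[i + 2:i + 5])})
                i += 4
        i += 1
    return attr
```

Each `ESC [ ... m` sequence found in the input would be passed through `apply_sgr` before the following characters are drawn with the resulting attributes.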

Model Compression with Layer Collapse

A Python script for compressing large language models using layer-merging techniques inspired by LaCo (Layer Collapse). It reduces model depth by identifying highly similar adjacent transformer layers and merging them, cutting the parameter count while aiming to preserve inference quality.
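Here is a minimal sketch of the identify-then-merge step in plain Python. It assumes each layer's weights have been flattened into one list of floats. Pair selection follows the Key Features list: cosine similarity between adjacent layers, a threshold, a cap on merges, and frozen layers. The merge itself is a plain element-wise average used as a stand-in. LaCo's own merge rule (RDSC) is difference-based, and the script's exact rule isn't stated here. All function names are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Set

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two flattened weight vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pick_merges(layers: List[List[float]], threshold: float,
                max_merges: int, frozen: Set[int]) -> List[int]:
    """Return indices i whose neighbour i+1 should be folded into layer i.

    Adjacent pairs are visited from most to least similar; a pair is skipped
    if it touches a frozen layer or a layer already used by another merge.
    """
    scored = [(cosine(layers[i], layers[i + 1]), i) for i in range(len(layers) - 1)]
    chosen: List[int] = []
    used: Set[int] = set()
    for sim, i in sorted(scored, reverse=True):
        if len(chosen) >= max_merges or sim < threshold:
            break
        if {i, i + 1} & (frozen | used):
            continue
        chosen.append(i)
        used.update((i, i + 1))
    return sorted(chosen)

def average_merge(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Stand-in merge rule: element-wise mean of two layers' weights."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def collapse(layers: List[List[float]], threshold: float = 0.9,
             max_merges: int = 1, frozen: Iterable[int] = ()) -> List[List[float]]:
    """Merge the selected pairs and drop the absorbed layers (depth shrinks)."""
    absorbed: Set[int] = set()
    out = [list(layer) for layer in layers]
    for i in pick_merges(layers, threshold, max_merges, set(frozen)):
        out[i] = average_merge(layers[i], layers[i + 1])
        absorbed.add(i + 1)
    return [layer for k, layer in enumerate(out) if k not in absorbed]
```

The description doesn't say whether the script compares raw weights or layer outputs (hidden states). The selection logic is the same either way; only the vectors fed to `cosine` change.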

Key Features

- LoRA Adapter Merging: Seamlessly integrates fine-tuned LoRA adapters into the base model before compression.
- Similarity-Based Pruning: Computes cosine similarity between adjacent layers, merging the most redundant pairs above a configurable threshold.
- Configurable Compression: Adjustable similarity thresholds, maximum merges, and frozen layers for controlled compression.
- Performance Evaluation: Includes perplexity calculation and inference testing before/after compression.
- GGUF Export: Optional creation of quantized GGUF files for efficient deployment.
- Unsloth Integration: Leverages Unsloth's optimized transformers for fast model loading and processing.
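For the perplexity part of the evaluation: perplexity is the exponential of the mean per-token negative log-likelihood. This is a minimal sketch that assumes natural-log token probabilities have already been collected from the model. `perplexity` is a hypothetical helper, not the script's own function.

```python
import math
from typing import Sequence

def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over the scored tokens."""
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)
```

With a Hugging Face causal LM, passing `labels=input_ids` makes the returned `loss` the mean token cross-entropy. So `math.exp(loss.item())` gives the same quantity for a single batch, and comparing it before and after compression shows how much quality was lost.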

	inspired by:

	https://www.reddit.com/r/ClaudeAI/comments/1mdyc60/comment/n6fgexd/

	throw in

	.claude/agents/

	Put '.claude/agents/karen.yaml' inside your project folder, then have Karen judge your work by hinting at it in your prompts; that's what subagent Karen likes to do.

	#!/usr/bin/env python3
	"""
	Zero-dependency OpenAI-compatible proxy for DeepSeek V4 Flash.

	Author: g023
	License: MIT

	All client-supplied model and generation parameters are ignored.
	The proxy always uses the model, max output tokens, and other settings
	defined in the global configuration (see --help and the constants below).
	"""

	import requests
	import json
	from typing import List, Dict, Optional, Generator, Union
	import os
	import time
	import random

	# ==============================================================================
	# DeepSeek API Configuration (Replaces Ollama Globals)
	# ==============================================================================
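The proxy docstring's rule, that client-supplied model and generation parameters are ignored, can be sketched as a pure function. It rebuilds the upstream request body from the client's messages plus fixed server-side settings. The constant names and values (`FORCED_MODEL = "example-model"` and so on) are placeholders, not the proxy's real configuration. Passing the client's `stream` flag through is also an assumption. The sketch uses only the standard library, in keeping with the zero-dependency goal.

```python
import json
from typing import Any, Dict

# Placeholder server-side settings; the real proxy defines its own constants.
FORCED_MODEL = "example-model"
FORCED_MAX_TOKENS = 4096
FORCED_TEMPERATURE = 0.7

def build_upstream_body(client_body: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the conversation from the client request; every model and
    generation parameter comes from the server-side config instead."""
    return {
        "model": FORCED_MODEL,
        "messages": client_body.get("messages", []),
        "max_tokens": FORCED_MAX_TOKENS,
        "temperature": FORCED_TEMPERATURE,
        # Assumption: streaming is a transport choice, so the client keeps it.
        "stream": bool(client_body.get("stream", False)),
    }

def rewrite_request(raw: bytes) -> bytes:
    """Parse a client's JSON chat-completions body and return the forced version."""
    return json.dumps(build_upstream_body(json.loads(raw))).encode("utf-8")
```

Building a fresh dict, rather than overwriting keys on the client's dict, means unknown client fields such as `top_p` are dropped instead of leaking through.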

	#!/bin/bash
	# Stop and remove the SSH-enabled CUDA container and optionally the image

	# Variables
	CONTAINER_NAME="cuda-ssh"
	IMAGE_NAME="cuda-ssh"
	TAG="12.4"

	# Stop the container if running
	if podman ps -a --format "{{.Names}}" | grep -q "^$CONTAINER_NAME\$"; then

	# Simulated diffusion inferencing example combining DistilGPT-2 and DistilBERT
	# This simulates a diffusion-like process: generate with GPT-2, then iteratively refine by masking and filling with BERT.
	# Author: g023 - https://github.com/g023/

	import torch
	import random
	from transformers import GPT2Tokenizer, GPT2LMHeadModel, DistilBertTokenizer, DistilBertForMaskedLM
	from transformers import logging

	# Suppress warnings
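The generate-then-refine loop described in the comment can be sketched without loading any models. Here `fill` is a hypothetical stand-in for a DistilBERT masked-LM prediction at one position. The step count, masking fraction, and filling of all masked slots from one shared masked context are assumptions, not the script's settings.

```python
import random
from typing import Callable, List, Optional

MASK = "[MASK]"

def refine(tokens: List[str],
           fill: Callable[[List[str], int], str],
           steps: int = 3,
           mask_frac: float = 0.25,
           rng: Optional[random.Random] = None) -> List[str]:
    """Diffusion-like refinement: each step masks a random subset of positions
    and asks `fill` to predict a token for every masked slot."""
    rng = rng or random.Random(0)
    tokens = list(tokens)  # work on a copy; the caller's list is untouched
    n_mask = max(1, int(len(tokens) * mask_frac)) if tokens else 0
    for _ in range(steps):
        positions = rng.sample(range(len(tokens)), n_mask)
        masked = list(tokens)
        for p in positions:
            masked[p] = MASK
        # Every slot is filled from the same masked context, as a single
        # masked-LM forward pass would do.
        for p in positions:
            tokens[p] = fill(masked, p)
    return tokens
```

A real `fill` would take the top-scoring (or sampled) token from the `DistilBertForMaskedLM` logits at that position. GPT-2 and DistilBERT use different tokenizers, so the GPT-2 draft would be re-tokenized with the DistilBERT tokenizer before masking.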

	# Author g023 - https://x.com/g023dev - https://github.com/g023
	import torch
	import torch.nn as nn
	import gc
	import math
	import tracemalloc

	# Optional psutil for CPU memory readings; if missing we'll fall back to CUDA
	try:
		import psutil
	except ImportError:
		psutil = None  # no psutil: fall back to CUDA memory readings