Drew the AI Guy awdemos

awdemos / zai-opencode-mapping.md
Created May 10, 2026 11:31 — forked from apnea/zai-opencode-mapping.md
Z.AI Coding Plan — OpenCode agent-to-model mapping

Z.AI Coding Plan — OpenCode Agent Mapping

Quota Cost per Model

Peak hours: 14:00–18:00 UTC+8. Off-peak: all other times. Monthly quota is equivalent to ~15–30× the subscription fee, converted at API pricing rates.

Model Quota (Peak: 14:00-18:00 UTC+8) Quota (Off-Peak) Temporary (thru June)
GLM-5.1 1× off-peak
GLM-5-Turbo 1× off-peak
awdemos / gist:2afc60416a62eae2b92ef7eb145dd0f7
Last active May 2, 2026 23:43
Disable Fedora Age Verification
Fedora Atomic desktops (e.g., Silverblue, Kinoite) use rpm-ostree to keep /usr immutable, so a traditional `systemctl mask` on host services such as systemd-userdbd requires layering or overrides, while user-level masking persists across rebases. These steps target age verification through the systemd-userdb birthDate field and its AccountsService integration. Reboot afterwards and check `rpm-ostree status`; test in a toolbox first for safety. [discussion.fedoraproject](https://discussion.fedoraproject.org/t/a-practical-architectural-solution-to-os-level-age-verification-laws/183387)
## Core Removal
Layer tools to override packages or drop-ins for persistence post-rebase.
- **User-level mask (quickest, survives updates)**: `systemctl --user mask --now systemd-userdbd.service systemd-userdbd.socket xdg-desktop-portal.service`—blocks per-user D-Bus queries. [github](https://github.com/systemd/systemd/issues/15175)
- **Host mask systemd-userdbd**: `rpm-ostree override remove systemd-userdbd` then `rpm-ostree install systemd` (or layer `systemd-libs`); rebas
awdemos / grpo_demo.py
Created August 6, 2025 03:31 — forked from NickyDark1/grpo_demo.py
A GRPO modification for DeepSpeed in multi-GPU setups, from https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb
# train_grpo.py
from typing import *
import re
import torch
from datasets import load_dataset, Dataset, load_from_disk
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer, TrlParser
from dataclasses import dataclass, field
awdemos / grpo_demo.py
Created March 6, 2025 13:48 — forked from willccbb/grpo_demo.py
GRPO Llama-1B
# train_grpo.py
#
# See https://github.com/willccbb/verifiers for ongoing developments
#
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
awdemos / prompt.txt
Created March 2, 2025 19:17 — forked from dakshvar22/prompt.txt
Deepseek R1 for self-improvement of CALM's prompts
I am building a prompt for an LLM (gpt-4o) while building a conversational assistant. The LLM is expected to predict one of the available commands based on the instructions given. Here is the current prompt -
```
Your task is to analyze the current conversation context and generate a list of actions to start new business processes that we call flows, to extract slots, or respond to small talk and knowledge requests.
These are the flows that can be started, with their description and slots:
transfer_money: send money to friends and family
slot: transfer_money_recipient (the name of a person)
slot: transfer_money_amount_of_money (the amount of money without any currency designation)
awdemos / Deepseek Model Architecture and Training.md
Created February 2, 2025 02:20
Deepseek Model Architecture and Training

Model Architecture and Training

  • DeepSeek used GRPO (Group Relative Policy Optimization), a variant of PPO (Proximal Policy Optimization), for training

  • GRPO differs from PPO in several ways:

    • Samples multiple completions (G of them) instead of just one
    • Doesn't use a separate value network; instead, the group of completions serves as a Monte Carlo estimate of the baseline
    • Uses a reference policy (SFT model) with KL divergence to prevent drift
  • The model is based on a distilled version of Qwen 7B (7.62 billion parameters)

  • They used a custom 8-bit floating-point format (FP8 E4M3) for lower-precision training
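
As a rough illustration of the "group as baseline" idea in the list above, the sketch below scores each of the G completions for one prompt against the group's own mean reward instead of a learned value network. The function name `group_relative_advantages` is hypothetical, and dividing by the group's standard deviation is an assumption taken from the commonly cited formulation; the notes above do not state it. The KL term against the reference policy is not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return one advantage per completion in a group sampled for one prompt.

    Assumption: advantages are standardized with the group's mean and
    (population) standard deviation; eps avoids division by zero when
    every completion receives the same reward.
    """
    baseline = mean(rewards)   # group mean replaces the value network
    spread = pstdev(rewards)   # group standard deviation
    return [(r - baseline) / (spread + eps) for r in rewards]


# G = 4 completions for a single prompt, with made-up rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)  # above-average completions are positive, the rest negative
```

Completions that beat the group average get a positive advantage and the others a negative one, so no separate critic has to be trained.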

awdemos / How many GPUs needed for GPT-3.md
Last active February 1, 2025 18:53
How many GPUs needed for GPT-3?

GPT-3's computational requirements depend heavily on model size, precision, and hardware optimization. For the full 175B-parameter model, running inference requires at least 5 NVIDIA A100 GPUs (80GB each) to meet the 350GB memory requirement when using FP16 precision[4][9][22]. However, practical implementations often use 8 GPUs for improved parallelism and throughput[3][22].

Key Technical Requirements:

  • Memory:

    • 175B parameters require 350GB of VRAM at FP16 precision (2 bytes/parameter)[4][9][14].
    • Consumer GPUs like RTX 3090s (24GB VRAM) can technically work in multi-GPU setups (e.g., 8x24GB = 192GB), but require aggressive memory optimizations like 8-bit quantization[2][28].
  • Hardware Recommendations:

    • Data center GPUs: 5–8 NVIDIA A100/A800 (80GB) GPUs for stable deployment[3][4][13].
    • Consumer GPUs: 8x RTX 3090/4090. Both cards use PCIe 4.0, and only the RTX 3090 supports NVLink (the RTX 4090 does not), so inter-GPU communication is a likely bottleneck[2][28].
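
The memory figures above follow from simple arithmetic, sketched here. The helper names are hypothetical, and the estimate covers model weights only; activations, KV cache, and framework overhead are deliberately left out, so real deployments need headroom beyond these numbers.

```python
import math


def weights_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(required_gb, gpu_memory_gb):
    """Smallest number of GPUs whose combined memory covers required_gb."""
    return math.ceil(required_gb / gpu_memory_gb)


fp16_gb = weights_memory_gb(175e9, 2)   # FP16: 2 bytes per parameter
print(fp16_gb, min_gpus(fp16_gb, 80))   # 350.0 5

int8_gb = weights_memory_gb(175e9, 1)   # 8-bit: 1 byte per parameter
print(int8_gb, int8_gb <= 8 * 24)       # 175.0 True (fits in 8 x 24GB = 192GB)
```

The 8-bit case leaves only about 17GB of slack across eight 24GB cards, which is why the consumer setup is described as needing aggressive optimization.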

awdemos / The Evolution of Large Language Models: From Transformers to DeepSeek-R1.md
Last active January 31, 2025 00:57
The Evolution of Large Language Models: From Transformers to DeepSeek-R1

The Evolution of Large Language Models: From Transformers to DeepSeek-R1

The field of artificial intelligence has seen remarkable progress in recent years, particularly in the domain of large language models (LLMs). This article explores the journey from the foundational Transformer architecture to the cutting-edge DeepSeek-R1 model, highlighting key developments and breakthroughs along the way.

Transformer Architecture: The Foundation of Modern LLMs

The Transformer architecture, introduced in 2017, revolutionized natural language processing. Its attention mechanism allowed for more efficient processing of sequential data, paving the way for larger and more capable language models[1].

awdemos / Tutorial: Running Deepseek R1 Locally on Your Computer.md
Created January 29, 2025 13:34
Tutorial: Running Deepseek R1 Locally on Your Computer

Tutorial: Running Deepseek R1 Locally on Your Computer

Downloading and Running Deepseek R1 Model

Step 1: Download and Install Ollama

Ollama is an engine that allows you to run large language models (LLMs) such as Deepseek R1 on your local machine.

  1. Download Ollama from the Ollama Download Page.
  2. Follow the installation instructions specific to your operating system.
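
Once Ollama is installed and running, a model can also be queried from a script. The sketch below is a minimal example under several assumptions not stated in the tutorial: that the Ollama server is listening on its default local address (`http://localhost:11434`), that a model tagged `deepseek-r1` has already been pulled, and that the helper names `build_request` and `generate` are purely illustrative.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="deepseek-r1"):
    """Build a POST request asking Ollama for a single, non-streamed reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="deepseek-r1"):
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Why is the sky blue?")` should return the model's answer as a string, provided the server is up and the model tag matches one that is installed locally.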
awdemos / Run DeepSeek R1 Dynamic 1.58-bit with Unsloth.md
Last active January 28, 2025 16:47
Run DeepSeek R1 Dynamic 1.58-bit with Unsloth

Introducing DeepSeek R1: A New Era in Open-Source AI

Version 1.58-bit Dynamic
Released: January 27, 2025
Authors: Andrew

DeepSeek R1 is making headlines by challenging OpenAI's o1 reasoning model while being completely open-source. We've worked on making it more accessible for local users by reducing the model's size from 720GB to 131GB, a reduction of roughly 80%, without compromising its functionality.

By analyzing DeepSeek R1's structure, the Unsloth team was able to selectively quantize certain layers to higher bits (like 4-bit) while keeping most MoE layers at 1.5-bit. This approach prevents errors such as endless loops or nonsensical outputs, which occur if all layers are naively quantized.
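
The size claim and the mixed-precision idea can be checked with a little arithmetic. In the sketch below, the 720GB and 131GB figures come from the text above, while the 90/10 split of parameters between MoE and other layers is a made-up illustration, not a measured property of DeepSeek R1; both helper names are hypothetical.

```python
def size_reduction(original_gb, quantized_gb):
    """Fraction of the original size that quantization removed."""
    return 1.0 - quantized_gb / original_gb


def average_bits(layers):
    """Parameter-weighted average bit width.

    layers: list of (parameter_count, bits_per_parameter) pairs.
    """
    total_params = sum(count for count, _ in layers)
    total_bits = sum(count * bits for count, bits in layers)
    return total_bits / total_params


print(f"{size_reduction(720, 131):.1%}")  # 81.8%

# Hypothetical split: 90% of parameters in MoE layers at 1.5 bits,
# 10% in the remaining layers at 4 bits.
print(average_bits([(90, 1.5), (10, 4.0)]))  # 1.75
```

Because the MoE layers hold most of the parameters, keeping a small share of layers at 4-bit raises the average bit width only slightly.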