Here we explore prompt chaining with local reasoning models in combination with base models. With shockingly powerful local models like QwQ and Qwen, we can build prompt chains that tap into their capabilities in an immediately useful, local, private, AND free way.
Explore the idea of building prompt chains where the first step is a powerful reasoning model that generates a response, and a base model then extracts the final answer from it. Play with the prompts and models to see what works best for your use cases. Run the o1 series to see how QwQ compares.
- Bun (to run `bun run chain.ts ...`)
- uv (to run `uv run python chain.py ...`)
- Ollama (for qwen and qwq)
- LLM (for o1-mini)

Pull the local models with Ollama:

```sh
ollama pull qwen2.5-coder:14b
ollama pull qwen2.5-coder:32b
ollama pull qwq:32b
```

Verify everything is installed:

```sh
ollama run qwen2.5-coder:14b "ping"
ollama run qwq:32b "ping"
llm -m o1-mini "ping"
```
Problem: QwQ includes its chain-of-thought reasoning in its output. Solution: use a 'Reasoner' + 'Extraction' prompt chain: the reasoning model generates the full response, then a base model extracts just the final answer.
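The pattern boils down to piping one model's output into the next prompt. Here is a minimal sketch, not the repo's actual `chain.py`: the `run_chain` helper, the `{{input}}` placeholder convention, and the `call_model` callback are all illustrative assumptions. A stub stands in for the models so the flow is visible without network calls.

```python
from typing import Callable

def run_chain(start_value: str, prompts: list[str],
              call_model: Callable[[int, str], str]) -> str:
    """Pipe start_value through each prompt template in order.

    Each template contains an '{{input}}' placeholder (an assumed
    convention, not necessarily what chain.py uses). call_model(step,
    prompt) invokes whichever model backs that step of the chain.
    """
    output = start_value
    for step, template in enumerate(prompts):
        prompt = template.replace("{{input}}", output)
        output = call_model(step, prompt)
    return output

# Stubbed two-step 'Reasoner' + 'Extraction' chain: step 0 plays the
# reasoning model (verbose chain of thought), step 1 plays the base
# model that keeps only the final answer.
def fake_model(step: int, prompt: str) -> str:
    if step == 0:
        return "Thinking... step by step... ANSWER: 42"
    # crude extraction: keep only what follows 'ANSWER:'
    return prompt.split("ANSWER:")[-1].strip()

result = run_chain(
    "What is 6 * 7?",
    ["Reason carefully about: {{input}}",
     "Extract only the final answer from: {{input}}"],
    fake_model,
)
print(result)  # -> 42
```

In the real scripts, `call_model` would shell out to `ollama run` or `llm -m` depending on the `ollama:` / `llm:` prefix of each model spec.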
```sh
bun run chain.ts "QwQ CANT BE: Local Prompt Chaining with Alibaba's QwQ reasoning model (bun + ollama)" "prompt_title_generation_reasoner.txt,prompt_title_generation_extraction.txt" "ollama:qwq:32b,ollama:qwen2.5-coder:32b"
```
```sh
bun run chain.ts "<starting value>" "<prompt-file-1.txt,prompt-file-2.txt,...>" "<ollama:model-1,llm:model-2,...>"
```

or

```sh
uv run python chain.py "<starting value>" "<prompt-file-1.txt,prompt-file-2.txt,...>" "<ollama:model-1,llm:model-2,...>"
```
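Each model argument uses a `provider:model` prefix, and Ollama model names themselves contain colons (e.g. `qwq:32b`), so only the first colon separates provider from model. A small sketch of how those arguments might be parsed; `parse_model_spec` and `parse_args` are hypothetical helpers, not functions from the repo:

```python
def parse_model_spec(spec: str) -> tuple[str, str]:
    """Split 'provider:model', where the model name may itself
    contain colons (e.g. 'ollama:qwq:32b' -> ('ollama', 'qwq:32b'))."""
    provider, _, model = spec.partition(":")
    return provider, model

def parse_args(start: str, prompt_csv: str, model_csv: str):
    """Turn the three CLI strings into (start, prompt files, models)."""
    prompts = prompt_csv.split(",")
    models = [parse_model_spec(s) for s in model_csv.split(",")]
    if len(prompts) != len(models):
        raise ValueError("need exactly one model per prompt file")
    return start, prompts, models

print(parse_model_spec("ollama:qwq:32b"))  # -> ('ollama', 'qwq:32b')
print(parse_model_spec("llm:o1-mini"))     # -> ('llm', 'o1-mini')
```

Using `str.partition` instead of `str.split(":")` is what keeps the `qwq:32b` tag intact.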
```sh
bun run chain.ts "start-value-with-bun" "prompt_simple.txt" "ollama:qwen2.5-coder:14b"
```

or

```sh
uv run python chain.py "start-value-with-uv-python" "prompt_simple.txt" "ollama:qwen2.5-coder:14b"
```
```sh
bun run chain.ts "start-value" "prompt_simple.txt,prompt_simple.txt" "ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b"
```

or

```sh
uv run python chain.py "start-value" "prompt_simple.txt,prompt_simple.txt" "ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b"
```
```sh
bun run chain.ts "start-value" "prompt_simple.txt,prompt_simple.txt,prompt_simple.txt" "ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b"
```

```sh
bun run chain.ts "start-value" "prompt_simple.txt,prompt_simple.txt,prompt_simple.txt,prompt_simple.txt,prompt_simple.txt,prompt_simple.txt,prompt_simple.txt,prompt_simple.txt,prompt_simple.txt,prompt_simple.txt" "ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b,ollama:qwen2.5-coder:14b"
```
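The ten-step example is just the same prompt and model repeated, so the argument strings can be generated rather than typed by hand. A small sketch (the variable names here are illustrative):

```python
# Build the comma-separated prompt and model arguments for an
# n-step chain that reuses the same prompt file and model each step.
n = 10
prompts = ",".join(["prompt_simple.txt"] * n)
models = ",".join(["ollama:qwen2.5-coder:14b"] * n)
cmd = f'bun run chain.ts "start-value" "{prompts}" "{models}"'
print(cmd)
```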
Mermaid Chart Generation
```sh
bun run chain.ts "Create a mermaid chart of the software development lifecycle with ai tooling in 5 steps." "prompt_mermaid_reasoner.txt" "ollama:qwq:32b" > output/pc1_mermaid_qwq_32b.txt
bun run chain.ts "Create a mermaid chart of the software development lifecycle with ai tooling in 5 steps." "prompt_mermaid_reasoner.txt" "llm:o1-mini" > output/pc1_mermaid_o1_mini.txt
```
Content Title Generation
```sh
bun run chain.ts "QwQ CANT BE: Local Prompt Chaining with Alibaba's QwQ reasoning model (bun + ollama)" "prompt_title_generation_reasoner.txt,prompt_title_generation_extraction.txt" "ollama:qwq:32b,ollama:qwen2.5-coder:32b" > output/pc2_title_generation_qwq_32b.txt
bun run chain.ts "QwQ CANT BE: Local Prompt Chaining with Alibaba's QwQ reasoning model (bun + ollama)" "prompt_title_generation_reasoner.txt,prompt_title_generation_extraction.txt" "llm:o1-mini,llm:gpt-4o-mini" > output/pc2_title_generation_o1_mini.txt
```
Architecture Design
```sh
bun run chain.ts "Design a low-latency event betting/gambling platform" "prompt_architecture_design_reasoner.txt,prompt_architecture_design_extraction.txt,prompt_architecture_design_simplifier.txt" "ollama:qwq:32b,ollama:qwen2.5-coder:32b,ollama:qwen2.5-coder:32b" > output/pc3_architecture_design_qwq_32b.txt
bun run chain.ts "Design a low-latency event betting/gambling platform" "prompt_architecture_design_reasoner.txt,prompt_architecture_design_extraction.txt,prompt_architecture_design_simplifier.txt" "llm:o1-mini,llm:gpt-4o-mini,llm:gpt-4o-mini" > output/pc3_architecture_design_o1_mini.txt
bun run chain.ts "Design a low-latency event betting/gambling platform" "prompt_architecture_design_reasoner.txt,prompt_architecture_design_extraction.txt,prompt_architecture_design_simplifier.txt" "llm:o1-preview,llm:gpt-4o,llm:gpt-4o" > output/pc3_architecture_design_o1_preview.txt
```