Last active: August 4, 2025 19:49
A LiteLLM proxy solution for using Claude Code with models from the Weights & Biases inference service. You need LiteLLM installed, or you can use the Docker container. The easiest way is to install it with `uv tool install "litellm[proxy]"`. Don't worry about the fallback warnings. Either LiteLLM, W&B, or the combo of both are not handling streaming respon…
```shell
#!/bin/bash
export WANDB_API_KEY=<your key>
export WANDB_PROJECT=<org/project>
litellm --port 4000 --debug --config cc-proxy.yaml
```
```yaml
litellm_settings:
  drop_params: true
  cache: True
  cache_params:
    type: local
  enable_preview_features: True

model_list:
  - model_name: anthropic/claude-sonnet-*
    litellm_params:
      model: openai/Qwen/Qwen3-Coder-480B-A35B-Instruct
      api_key: "os.environ/WANDB_API_KEY"
      api_base: https://api.inference.wandb.ai/v1
      headers:
        OpenAI-Project: "os.environ/WANDB_PROJECT"
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8
    model_info:
      input_cost_per_token: 0.000001
      output_cost_per_token: 0.0000015
  - model_name: anthropic/claude-opus-*
    litellm_params:
      model: openai/Qwen/Qwen3-235B-A22B-Thinking-2507
      api_key: "os.environ/WANDB_API_KEY"
      api_base: https://api.inference.wandb.ai/v1
      headers:
        OpenAI-Project: "os.environ/WANDB_PROJECT"
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.6
      top_k: 40
      top_p: 0.95
    model_info:
      input_cost_per_token: 0.0000001
      output_cost_per_token: 0.0000001
  - model_name: anthropic/claude-3-5-haiku-*
    litellm_params:
      model: openai/Qwen/Qwen3-235B-A22B-Instruct-2507
      api_key: "os.environ/WANDB_API_KEY"
      api_base: https://api.inference.wandb.ai/v1
      headers:
        OpenAI-Project: "os.environ/WANDB_PROJECT"
      max_tokens: 65536
      repetition_penalty: 1.05
      temperature: 0.7
      top_k: 20
      top_p: 0.8
    model_info:
      input_cost_per_token: 0.0000001
      output_cost_per_token: 0.0000001
```
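As a quick sanity check of the wildcard routing above, the glob-style `model_name` patterns can be mimicked with a shell `case` statement. This is only a sketch of the mapping, not LiteLLM's actual matcher, and the dated model IDs below are assumed examples of what Claude Code might request:

```shell
#!/bin/bash
# Sketch: which W&B-hosted Qwen model a requested Claude model ID would
# route to, given the wildcard model_name entries in cc-proxy.yaml above.
route() {
  case "$1" in
    anthropic/claude-sonnet-*)    echo "openai/Qwen/Qwen3-Coder-480B-A35B-Instruct" ;;
    anthropic/claude-opus-*)      echo "openai/Qwen/Qwen3-235B-A22B-Thinking-2507" ;;
    anthropic/claude-3-5-haiku-*) echo "openai/Qwen/Qwen3-235B-A22B-Instruct-2507" ;;
    *)                            echo "no-match" ;;
  esac
}

# A hypothetical Sonnet model ID, as Claude Code might send it:
route "anthropic/claude-sonnet-4-20250514"
```

Any request whose model falls outside the three patterns would not match an entry and fail, so the three wildcards are chosen to cover the sonnet, opus, and haiku families Claude Code uses.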
```shell
#!/bin/bash
export ANTHROPIC_AUTH_TOKEN=sk-1234
export ANTHROPIC_BASE_URL=http://localhost:4000
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
# Starting VS Code, but we could also run claude here
code &
```
hello @olafgeibig
do you have a video tutorial on how to set this up?