update README: working local model setup with LOCAL_MODEL_BASE_URL

2026-06-30 11:36:57 +10:00 · 2026-03-31 20:12:32 -04:00 · 2026-03-31 20:12:32 -04:00 · 38089ceaf0
commit 38089ceaf0
parent 8605b3e54a
1 changed file with 24 additions and 20 deletions
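
Review note: the README text added in this commit asks for a model with 128K+ context, while the litellm config it adds sets `num_ctx: 65536`. The sketch below is one way to compare the two numbers. It assumes "128K" means 128 × 1024 tokens and that `num_ctx` appears as a plain `key: value` line, as it does in this commit. The temp directory and the variable names are illustrative and are not part of the patch.

```shell
#!/usr/bin/env bash
set -eu

# Recreate the config from this commit in a throwaway directory (illustrative path).
workdir=$(mktemp -d)
cat > "$workdir/litellm-config.yaml" << 'CONF'
model_list:
  - model_name: "claude-sonnet-4-20250514"
    litellm_params:
      model: "ollama/qwen3-coder:30b"
      num_ctx: 65536
litellm_settings:
  drop_params: true
CONF

# Read the first num_ctx value; awk is enough for this flat key, no YAML parser needed.
num_ctx=$(awk '$1 == "num_ctx:" { print $2; exit }' "$workdir/litellm-config.yaml")

# Assumption: 128K tokens with K = 1024.
required=$((128 * 1024))

if [ "$num_ctx" -ge "$required" ]; then
  status="ok"
else
  status="below-128k"
fi

echo "num_ctx=$num_ctx required=$required status=$status"
rm -rf "$workdir"
```

With the values from this commit the script reports `status=below-128k`, since 65536 is half of 131072. Whether the README wording or the config value should change is a question for the author.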
--- a/README.md
+++ b/README.md
@@ -49,32 +49,36 @@ export ANTHROPIC_API_KEY=your-key
 node cli.js
 ```

-### With Local Models (Ollama, LM Studio)
+### With Local Models (Ollama + Qwen3-Coder)

-Claude Code uses the Anthropic Messages API format. To use local models, run [litellm](https://github.com/BerriAI/litellm) as a translation proxy:
+We patched the source to add `LOCAL_MODEL_BASE_URL`, which routes only model API calls to your local proxy. Auth and startup still go to Anthropic's servers as usual.
+
+**Requirements:** [Ollama](https://ollama.com) + [litellm](https://github.com/BerriAI/litellm) + a Claude subscription (for auth)

 ```bash
-# Terminal 1: Start litellm proxy
-pip install litellm
-litellm --model ollama/llama3.1:8b --port 8080
+# Step 1: Pull a model with 128K+ context (required for Claude Code's system prompt)
+ollama pull qwen3-coder:30b

-# Terminal 2: Point Claude Code at the proxy
-export ANTHROPIC_BASE_URL=http://localhost:8080
-export ANTHROPIC_API_KEY=not-needed
-node cli.js
+# Step 2: Create litellm config that maps Claude's model name to your local model
+cat > litellm-config.yaml << 'CONF'
+model_list:
+  - model_name: "claude-sonnet-4-20250514"
+    litellm_params:
+      model: "ollama/qwen3-coder:30b"
+      num_ctx: 65536
+litellm_settings:
+  drop_params: true
+CONF
+
+# Step 3: Start litellm proxy (needs Python 3.10+)
+pip install 'litellm[proxy]'
+litellm --config litellm-config.yaml --port 8080
+
+# Step 4: Run Claude Code (in another terminal)
+LOCAL_MODEL_BASE_URL=http://localhost:8080 node cli.js
 ```

-Works with any model Ollama supports — llama3.1, codellama, deepseek-coder, mistral, etc.
-
-### With OpenAI / GPT Models
-
-```bash
-# Via litellm proxy
-litellm --model openai/gpt-4o --port 8080
-
-# Or any OpenAI-compatible endpoint (Codex, GPT-5.4, etc.)
-litellm --model openai/o3 --port 8080
-```
+Claude Code authenticates with Anthropic normally (you need a subscription), but all model inference runs locally on Qwen3-Coder via Ollama. This works with any model that has 128K+ context, such as qwen3-coder, deepseek-r1, or llama4.

 ---