From 38089ceaf0cf806fd898c2cab3d1fe549e5aca4a Mon Sep 17 00:00:00 2001
From: Anton Abyzov <anton.abyzov@easychamp.com>
Date: Tue, 31 Mar 2026 20:12:32 -0400
Subject: [PATCH] update README: working local model setup with
 LOCAL_MODEL_BASE_URL

---
 README.md | 44 ++++++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 95e0212..d48d101 100644
--- a/README.md
+++ b/README.md
@@ -49,32 +49,36 @@ export ANTHROPIC_API_KEY=your-key
 node cli.js
 ```
 
-### With Local Models (Ollama, LM Studio)
+### With Local Models (Ollama + Qwen3-Coder)
 
-Claude Code uses the Anthropic Messages API format. To use local models, run [litellm](https://github.com/BerriAI/litellm) as a translation proxy:
+We patched the source to add `LOCAL_MODEL_BASE_URL` — routes only model API calls to your local proxy while letting auth/startup use Anthropic's servers normally.
+
+**Requirements:** [Ollama](https://ollama.com) + [litellm](https://github.com/BerriAI/litellm) + a Claude subscription (for auth)
 
 ```bash
-# Terminal 1: Start litellm proxy
-pip install litellm
-litellm --model ollama/llama3.1:8b --port 8080
+# Step 1: Pull a model with 128K+ context (required for Claude Code's system prompt)
+ollama pull qwen3-coder:30b
 
-# Terminal 2: Point Claude Code at the proxy
-export ANTHROPIC_BASE_URL=http://localhost:8080
-export ANTHROPIC_API_KEY=not-needed
-node cli.js
+# Step 2: Create litellm config that maps Claude's model name to your local model
+cat > litellm-config.yaml << 'CONF'
+model_list:
+  - model_name: "claude-sonnet-4-20250514"
+    litellm_params:
+      model: "ollama/qwen3-coder:30b"
+      num_ctx: 65536
+litellm_settings:
+  drop_params: true
+CONF
+
+# Step 3: Start litellm proxy (needs Python 3.10+)
+pip install 'litellm[proxy]'
+litellm --config litellm-config.yaml --port 8080
+
+# Step 4: Run Claude Code (in another terminal)
+LOCAL_MODEL_BASE_URL=http://localhost:8080 node cli.js
 ```
 
-Works with any model Ollama supports — llama3.1, codellama, deepseek-coder, mistral, etc.
-
-### With OpenAI / GPT Models
-
-```bash
-# Via litellm proxy
-litellm --model openai/gpt-4o --port 8080
-
-# Or any OpenAI-compatible endpoint (Codex, GPT-5.4, etc.)
-litellm --model openai/o3 --port 8080
-```
+Claude Code authenticates with Anthropic normally (you need a subscription), but all model inference runs locally on Qwen3-Coder via Ollama. Works with any model that has 128K+ context — qwen3-coder, deepseek-r1, llama4, etc.
 
 ---