update README: working local model setup with LOCAL_MODEL_BASE_URL

2026-06-30 11:36:57 +10:00 · 2026-03-31 20:12:32 -04:00 · 2026-03-31 20:12:32 -04:00 · 38089ceaf0
commit 38089ceaf0
parent 8605b3e54a
1 changed files with 24 additions and 20 deletions
--- a/README.md
+++ b/README.md
@ -49,32 +49,36 @@ export ANTHROPIC_API_KEY=your-key
 node cli.js
 ```
-### With Local Models (Ollama, LM Studio)
+### With Local Models (Ollama + Qwen3-Coder)
-Claude Code uses the Anthropic Messages API format. To use local models, run [litellm](https://github.com/BerriAI/litellm) as a translation proxy:
+We patched the source to add `LOCAL_MODEL_BASE_URL` — routes only model API calls to your local proxy while letting auth/startup use Anthropic's servers normally.
 **Requirements:** [Ollama](https://ollama.com) + [litellm](https://github.com/BerriAI/litellm) + a Claude subscription (for auth)
 ```bash
-# Terminal 1: Start litellm proxy
+# Step 1: Pull a model with 128K+ context (required for Claude Code's system prompt)
-pip install litellm
+ollama pull qwen3-coder:30b
 litellm --model ollama/llama3.1:8b --port 8080
-# Terminal 2: Point Claude Code at the proxy
+# Step 2: Create litellm config that maps Claude's model name to your local model
-export ANTHROPIC_BASE_URL=http://localhost:8080
+cat > litellm-config.yaml << 'CONF'
-export ANTHROPIC_API_KEY=not-needed
+model_list:
-node cli.js
+  - model_name: "claude-sonnet-4-20250514"
    litellm_params:
      model: "ollama/qwen3-coder:30b"
      num_ctx: 65536
 litellm_settings:
  drop_params: true
 CONF
 # Step 3: Start litellm proxy (needs Python 3.10+)
 pip install 'litellm[proxy]'
 litellm --config litellm-config.yaml --port 8080
 # Step 4: Run Claude Code (in another terminal)
 LOCAL_MODEL_BASE_URL=http://localhost:8080 node cli.js
 ```
-Works with any model Ollama supports — llama3.1, codellama, deepseek-coder, mistral, etc.
+Claude Code authenticates with Anthropic normally (you need a subscription), but all model inference runs locally on Qwen3-Coder via Ollama. Works with any model that has 128K+ context — qwen3-coder, deepseek-r1, llama4, etc.
 ### With OpenAI / GPT Models
 ```bash
 # Via litellm proxy
 litellm --model openai/gpt-4o --port 8080
 # Or any OpenAI-compatible endpoint (Codex, GPT-5.4, etc.)
 litellm --model openai/o3 --port 8080
 ```
 ---