update README: working local model setup with LOCAL_MODEL_BASE_URL

Anton Abyzov 2026-03-31 20:12:32 -04:00
parent 8605b3e54a
commit 38089ceaf0


@@ -49,32 +49,36 @@ export ANTHROPIC_API_KEY=your-key
 node cli.js
 ```
-### With Local Models (Ollama, LM Studio)
+### With Local Models (Ollama + Qwen3-Coder)
-Claude Code uses the Anthropic Messages API format. To use local models, run [litellm](https://github.com/BerriAI/litellm) as a translation proxy:
+We patched the source to add `LOCAL_MODEL_BASE_URL`, which routes only model API calls to your local proxy while auth and startup still go to Anthropic's servers.
+**Requirements:** [Ollama](https://ollama.com) + [litellm](https://github.com/BerriAI/litellm) + a Claude subscription (for auth)
 ```bash
-# Terminal 1: Start litellm proxy
-pip install litellm
-litellm --model ollama/llama3.1:8b --port 8080
+# Step 1: Pull a model with 128K+ context (required for Claude Code's system prompt)
+ollama pull qwen3-coder:30b
-# Terminal 2: Point Claude Code at the proxy
-export ANTHROPIC_BASE_URL=http://localhost:8080
-export ANTHROPIC_API_KEY=not-needed
-node cli.js
+# Step 2: Create a litellm config that maps Claude's model name to your local model
+cat > litellm-config.yaml << 'CONF'
+model_list:
+  - model_name: "claude-sonnet-4-20250514"
+    litellm_params:
+      model: "ollama/qwen3-coder:30b"
+      num_ctx: 65536
+litellm_settings:
+  drop_params: true
+CONF
+# Step 3: Start litellm proxy (needs Python 3.10+)
+pip install 'litellm[proxy]'
+litellm --config litellm-config.yaml --port 8080
+# Step 4: Run Claude Code (in another terminal)
+LOCAL_MODEL_BASE_URL=http://localhost:8080 node cli.js
 ```
-Works with any model Ollama supports — llama3.1, codellama, deepseek-coder, mistral, etc.
-### With OpenAI / GPT Models
-```bash
-# Via litellm proxy
-litellm --model openai/gpt-4o --port 8080
-# Or any OpenAI-compatible endpoint (Codex, GPT-5.4, etc.)
-litellm --model openai/o3 --port 8080
-```
+Claude Code authenticates with Anthropic normally (you need a subscription), but all model inference runs locally on Qwen3-Coder via Ollama. It works with any model that has a 128K+ context window, such as qwen3-coder, deepseek-r1, or llama4.
 ---
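The local-model setup in this commit assumes Ollama, Python 3.10+ with pip, and Node are already installed. A small pre-flight check can catch a missing prerequisite before the litellm proxy fails to start. This is a sketch, not part of the repo: `version_ge`, `python_version`, and `preflight` are hypothetical names, and `sort -V` assumes GNU coreutils or another `sort` that supports version sorting.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of the repo) for the
# Ollama + litellm setup described in the README steps.

# version_ge A B: succeed (exit status 0) when version A >= version B.
# Sorts both versions with `sort -V`; B sorting first (or equal) means A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# python_version: print python3's "MAJOR.MINOR", e.g. "3.11".
python_version() {
  python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# preflight: report missing prerequisites on stderr; non-zero status if any.
preflight() {
  local missing=0 cmd pyver
  for cmd in ollama python3 pip node; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  if command -v python3 >/dev/null 2>&1; then
    pyver="$(python_version)"
    if ! version_ge "$pyver" 3.10; then
      echo "python3 is $pyver, but Step 3 needs Python 3.10+" >&2
      missing=1
    fi
  fi
  return "$missing"
}
```

Running `preflight && echo "prerequisites look OK"` before Step 1 prints one line per missing tool. On systems where pip is only installed as `pip3`, adjust the command list accordingly.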
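Step 1 (`ollama pull qwen3-coder:30b`) only needs to run once per machine. The hypothetical helpers below skip the pull when the model already appears in `ollama list`. They assume that command prints a header row followed by one model per line, with the full `name:tag` (for example `qwen3-coder:30b`) in the first column; a model pulled without a tag is listed as `name:latest`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repo) for Step 1.

# has_ollama_model NAME: succeed if NAME (including its tag) is listed.
# Skips the header row and matches the first column exactly.
has_ollama_model() {
  ollama list | awk 'NR > 1 { print $1 }' | grep -Fxq -- "$1"
}

# ensure_ollama_model NAME: pull NAME only if it is not already listed.
ensure_ollama_model() {
  if has_ollama_model "$1"; then
    echo "already pulled: $1"
  else
    ollama pull "$1"
  fi
}
```

With these defined, `ensure_ollama_model qwen3-coder:30b` can replace the plain `ollama pull` in Step 1 when the script is re-run.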
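The Step 2 config maps a single Claude model name, `claude-sonnet-4-20250514`, to the local model. If requests fail because the CLI asks for a model name that is not in `model_list` (an assumption about your build; the litellm proxy's error output should show the name it received), each extra name needs its own entry pointing at the same Ollama model. The hypothetical `write_litellm_config` below generates the same YAML layout as the Step 2 heredoc for any number of names.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): write a litellm config in the
# same shape as Step 2, mapping every given Claude model name to one Ollama model.
# Usage: write_litellm_config OUTFILE OLLAMA_MODEL NUM_CTX CLAUDE_MODEL_NAME...
write_litellm_config() {
  if [ "$#" -lt 4 ]; then
    echo "usage: write_litellm_config OUTFILE OLLAMA_MODEL NUM_CTX CLAUDE_MODEL_NAME..." >&2
    return 2
  fi
  local out="$1" ollama_model="$2" num_ctx="$3" name
  shift 3
  {
    echo "model_list:"
    # One model_list entry per requested Claude model name.
    for name in "$@"; do
      printf '  - model_name: "%s"\n' "$name"
      printf '    litellm_params:\n'
      printf '      model: "ollama/%s"\n' "$ollama_model"
      printf '      num_ctx: %s\n' "$num_ctx"
    done
    # Drop request params the Ollama backend does not support.
    echo "litellm_settings:"
    echo "  drop_params: true"
  } > "$out"
}
```

`write_litellm_config litellm-config.yaml qwen3-coder:30b 65536 claude-sonnet-4-20250514` should produce the same file as the Step 2 heredoc; extra names after the first are mapped to the same local model.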
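Steps 3 and 4 run in separate terminals, and if the CLI starts before litellm is listening on port 8080, its first model call will likely fail. A minimal wait loop can bridge the two. This sketch assumes bash (it relies on bash's `/dev/tcp` redirection, which plain `sh` lacks), and `wait_for_port` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): wait until something accepts
# TCP connections on HOST:PORT, giving up after TIMEOUT seconds (default 30).
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" waited=0
  # Each attempt opens a TCP connection in a subshell; the connection is
  # closed when the subshell exits. Connection errors are silenced.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "nothing is listening on $host:$port after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For a single-terminal run, something like `litellm --config litellm-config.yaml --port 8080 & wait_for_port 127.0.0.1 8080 60 && LOCAL_MODEL_BASE_URL=http://localhost:8080 node cli.js` should work. An open port only shows that the proxy is accepting connections, not that Ollama has loaded the model, so the first request can still be slow.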