From 38089ceaf0cf806fd898c2cab3d1fe549e5aca4a Mon Sep 17 00:00:00 2001
From: Anton Abyzov
Date: Tue, 31 Mar 2026 20:12:32 -0400
Subject: [PATCH] update README: working local model setup with LOCAL_MODEL_BASE_URL

---
 README.md | 44 ++++++++++++++++++++++++--------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/README.md b/README.md
index 95e0212..d48d101 100644
--- a/README.md
+++ b/README.md
@@ -49,32 +49,36 @@ export ANTHROPIC_API_KEY=your-key
 node cli.js
 ```
 
-### With Local Models (Ollama, LM Studio)
+### With Local Models (Ollama + Qwen3-Coder)
 
-Claude Code uses the Anthropic Messages API format. To use local models, run [litellm](https://github.com/BerriAI/litellm) as a translation proxy:
+We patched the source to add `LOCAL_MODEL_BASE_URL` — routes only model API calls to your local proxy while letting auth/startup use Anthropic's servers normally.
+
+**Requirements:** [Ollama](https://ollama.com) + [litellm](https://github.com/BerriAI/litellm) + a Claude subscription (for auth)
 
 ```bash
-# Terminal 1: Start litellm proxy
-pip install litellm
-litellm --model ollama/llama3.1:8b --port 8080
+# Step 1: Pull a model with 128K+ context (required for Claude Code's system prompt)
+ollama pull qwen3-coder:30b
 
-# Terminal 2: Point Claude Code at the proxy
-export ANTHROPIC_BASE_URL=http://localhost:8080
-export ANTHROPIC_API_KEY=not-needed
-node cli.js
+# Step 2: Create litellm config that maps Claude's model name to your local model
+cat > litellm-config.yaml << 'CONF'
+model_list:
+  - model_name: "claude-sonnet-4-20250514"
+    litellm_params:
+      model: "ollama/qwen3-coder:30b"
+      num_ctx: 65536
+litellm_settings:
+  drop_params: true
+CONF
+
+# Step 3: Start litellm proxy (needs Python 3.10+)
+pip install 'litellm[proxy]'
+litellm --config litellm-config.yaml --port 8080
+
+# Step 4: Run Claude Code (in another terminal)
+LOCAL_MODEL_BASE_URL=http://localhost:8080 node cli.js
 ```
 
-Works with any model Ollama supports — llama3.1, codellama, deepseek-coder, mistral, etc.
-
-### With OpenAI / GPT Models
-
-```bash
-# Via litellm proxy
-litellm --model openai/gpt-4o --port 8080
-
-# Or any OpenAI-compatible endpoint (Codex, GPT-5.4, etc.)
-litellm --model openai/o3 --port 8080
-```
+Claude Code authenticates with Anthropic normally (you need a subscription), but all model inference runs locally on Qwen3-Coder via Ollama. Works with any model that has 128K+ context — qwen3-coder, deepseek-r1, llama4, etc.
 
 ---