LOCAL VS API

Local LLM vs API: which to run on your VPS

read · 5 min

Both approaches run on the same VPS, but they trade off differently. Local models give you fixed costs and full privacy; API models give you frontier capability with no hardware. Here is how to decide.

Key points

Cost model

A local model costs a fixed monthly VPS fee no matter how much you use it. An API charges per token — cheap for light use, but it scales with volume. Heavy, predictable workloads favour local; bursty or low-volume usage favours API.

Privacy

With a local model your data never leaves the server — ideal for sensitive code or regulated data. With an API, prompts travel to the provider; choose EU endpoints and minimise personal data if that matters.

Capability

Frontier API models (Claude Opus, GPT, Gemini) still lead on hard reasoning and large context. Open-weights models (Llama, DeepSeek, Qwen, Mistral) are excellent and improving fast, and are often more than enough for focused tasks.

Hardware

Small open models (up to ~8B) run on CPU on a normal VPS. Larger models want a GPU. API models need no special hardware at all — just network access.

When to choose which

Pick local for privacy, fixed budgets and offline control. Pick API for maximum capability and zero hardware management. Many setups use both: a cheap local model for routine calls, a frontier API for the hard ones.

Frequently asked

Can I run both on one VPS? +

Yes. A common pattern is routing simple requests to a small local model and hard ones to a frontier API, keeping costs down without losing capability.

Do small local models need a GPU? +

No — models up to around 8B parameters run acceptably on CPU. A GPU only becomes necessary for larger models or high throughput.

Which is cheaper? +

It depends on volume. High, steady usage is usually cheaper on a fixed-fee local model; light or unpredictable usage is usually cheaper on a per-token API.

Related guides

RAG

Self-hosting a RAG vector database (ChromaDB & pgvector)

CLAUDE CODE

How to host Claude Code on a persistent VPS

Run either — or both

A VPS with room for local models and the bandwidth for API calls.

See VPS plans →

Guides ←