Local LLM vs API: which to run on your VPS
Both approaches run on the same VPS, but they trade off differently. Local models give you fixed costs and full privacy; API models give you frontier capability with no hardware. Here is how to decide.
Key points
Cost model
A local model costs a fixed monthly VPS fee no matter how much you use it. An API charges per token: cheap for light use, but the bill grows with volume. Heavy, predictable workloads favour local; bursty or low-volume usage favours API.
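The trade-off above comes down to a break-even volume: the number of tokens per month at which API spend equals the fixed VPS fee. A minimal sketch follows, assuming a single blended per-token price (real APIs usually price input and output tokens differently) and ignoring the throughput ceiling of the local hardware. The function names and the figures in the usage note are illustrative, not real prices.

```python
def breakeven_tokens_per_month(vps_fee: float, api_price_per_million: float) -> float:
    """Monthly token volume at which API spend equals the fixed VPS fee.

    Both arguments must be in the same currency. The API price is a
    blended figure per one million tokens (an assumption of this sketch).
    """
    if api_price_per_million <= 0:
        raise ValueError("API price must be positive")
    return vps_fee / api_price_per_million * 1_000_000


def cheaper_option(tokens_per_month: float, vps_fee: float,
                   api_price_per_million: float) -> str:
    """Return 'local' or 'api', whichever costs less at the given volume."""
    api_cost = tokens_per_month / 1_000_000 * api_price_per_million
    return "local" if api_cost > vps_fee else "api"
```

With a hypothetical fee of 40 per month and a hypothetical API price of 5 per million tokens, `breakeven_tokens_per_month(40.0, 5.0)` gives 8,000,000 tokens. Below that volume the API is cheaper; above it the fixed fee wins.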
Privacy
With a local model, prompts and outputs stay on the server, which suits sensitive code or regulated data. With an API, prompts travel to the provider; if that matters, choose EU endpoints and minimise the personal data you send.
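One way to minimise personal data before a prompt leaves the server is to mask obvious identifiers first. The sketch below only handles email addresses with a deliberately simple pattern; it is an illustration of the idea, not a complete redaction tool, and the `redact_emails` name is hypothetical.

```python
import re

# Simple pattern for typical email addresses. It will miss unusual
# formats and is not a substitute for a proper PII scrubber.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_emails(text: str, placeholder: str = "[EMAIL]") -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub(placeholder, text)
```

For example, `redact_emails("Contact jane.doe@example.com today")` returns `"Contact [EMAIL] today"`. Names, phone numbers and account IDs would need their own handling.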
Capability
Frontier API models (Claude Opus, GPT, Gemini) still lead on hard reasoning and large context. Open-weights models (Llama, DeepSeek, Qwen, Mistral) are excellent and improving fast, and are often more than enough for focused tasks.
Hardware
Small open models (up to around 8B parameters) run on CPU on a normal VPS, provided it has enough RAM to hold the weights. Larger models want a GPU. API models need no special hardware at all, just network access.
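Whether a model fits on a given VPS is mostly a question of RAM. As a rough guide, the weights occupy about (parameter count × bits per weight ÷ 8) bytes, plus working memory for the runtime and context. The sketch below assumes a flat 20% overhead, which is a guess for illustration; real overhead depends on the runtime, the context length and the exact quantisation format.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_ram(params_billion: float, bits_per_weight: int, ram_gb: float,
                overhead_fraction: float = 0.2) -> bool:
    """Check whether weights plus an assumed overhead fit in the given RAM.

    overhead_fraction is a hypothetical allowance for runtime and context
    memory, not a measured value.
    """
    needed = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= ram_gb
```

By this estimate an 8B model needs about 16 GB at 16 bits per weight but only about 4 GB at 4 bits, which is why small quantised models are practical on an ordinary VPS.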
When to choose which
Pick local for privacy, fixed budgets and offline control. Pick API for maximum capability and zero hardware management. Many setups use both: a cheap local model for routine calls, a frontier API for the hard ones.
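The hybrid setup can be sketched as a small router that decides per request which backend to call. Everything here is an assumption for illustration: the length threshold, the keyword list and the two stand-in backends are hypothetical, and a real deployment would replace them with calls to its own local runtime and API client.

```python
from typing import Callable

# A backend is any function that takes a prompt and returns a reply.
Backend = Callable[[str], str]

# Hypothetical hints that a request needs the stronger model.
HARD_HINTS = ("prove", "refactor", "architecture", "multi-step")


def is_hard(prompt: str, max_local_chars: int = 2000) -> bool:
    """Crude heuristic: long prompts or certain keywords count as hard."""
    lowered = prompt.lower()
    return len(prompt) > max_local_chars or any(h in lowered for h in HARD_HINTS)


def route(prompt: str, local: Backend, api: Backend) -> str:
    """Send hard prompts to the API backend and everything else to local."""
    return api(prompt) if is_hard(prompt) else local(prompt)


# Stand-in backends so the sketch runs without any model installed.
def fake_local(prompt: str) -> str:
    return f"[local] {prompt[:20]}"


def fake_api(prompt: str) -> str:
    return f"[api] {prompt[:20]}"
```

Calling `route("Summarise this log line", fake_local, fake_api)` goes to the local stand-in, while a prompt containing "refactor" goes to the API stand-in. The heuristic is the part worth tuning, since every request it misclassifies either costs money or loses quality.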
Frequently asked
Can I run both on one VPS?
Yes. A common pattern is routing simple requests to a small local model and hard ones to a frontier API, keeping costs down without losing capability.
Do small local models need a GPU?
No. Models up to around 8B parameters run acceptably on CPU. A GPU only becomes necessary for larger models or high throughput.
Which is cheaper?
It depends on volume. High, steady usage is usually cheaper on a fixed-fee local model; light or unpredictable usage is usually cheaper on a per-token API.
Run either — or both
A VPS with room for local models and the bandwidth for API calls.
See VPS plans →