Local-first inference
Tanvrit AI runs language and vision models on your own hardware via Ollama. No API key required, no source code uploaded, no quotas. When local won't cut it, the Plus tier lets you bring an Anthropic / OpenAI / Gemini key for cloud fallback.
Privacy
Source code never leaves your machine. The free tier doesn't even talk to our servers — your index, embeddings, and inference stay on disk and on-CPU/GPU.
Cost
Once a model is on disk, calls are free. No per-token billing, no surprise overage at the end of the month, no "AI credits" to manage.
Latency
No round-trip to a hosted API. For small models on Apple Silicon you typically see first-token latency under 200 ms.
No quotas
Throw a million-line repo at it overnight. The only limits are your disk and the model's context window.
Install Ollama, pull a model, and you're done
# 1. Install Ollama (one-line install)
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull the recommended models
ollama pull qwen2.5-coder
ollama pull nomic-embed-text
ollama pull qwen2.5-vl
# 3. Open Tanvrit AI
Tanvrit AI auto-detects Ollama on its default port (localhost:11434). Switch models from Settings → Models; the dashboard's health pill shows which one is currently active.
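If the app does not pick up a model, it can help to confirm that all three pulls succeeded. The sketch below is not part of Tanvrit AI; `has_model` and `check_models` are hypothetical helper names. It assumes the usual `ollama list` layout (one header row, model name with an optional `:tag` in the first column) and uses the model names exactly as written in the steps above.

```shell
# Sketch only: helper names are hypothetical, not part of Tanvrit AI or Ollama.

# has_model NAME
# Reads `ollama list`-style output on stdin; succeeds if NAME appears in the
# first column (ignoring any ":tag" suffix). The header row is skipped.
has_model() {
  awk -v want="$1" '
    NR > 1 { split($1, parts, ":"); if (parts[1] == want) found = 1 }
    END    { exit !found }
  '
}

# check_models
# Reads `ollama list` output on stdin once and prints one status line per
# model named in the install steps.
check_models() {
  listing=$(cat)
  for m in qwen2.5-coder nomic-embed-text qwen2.5-vl; do
    if printf '%s\n' "$listing" | has_model "$m"; then
      printf '%-8s %s\n' ok "$m"
    else
      printf '%-8s %s\n' missing "$m"
    fi
  done
}

# Usage (requires Ollama to be installed and running):
#   ollama list | check_models
```

If a model shows as missing even though the pull appeared to work, compare the name printed by `ollama list` with the one used here; the tag in your local library may differ.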
Recommended models
| Model | Role | Sizes | Notes |
|---|---|---|---|
| qwen2.5-coder | Planning / chat | 1.5B / 7B / 14B | Solid coder model with strong instruction following. 7B is the sweet spot on 16 GB RAM. |
| nomic-embed-text | Embeddings | 137M | Fast, strong on code-flavoured text. Tanvrit uses it for the vector index by default. |
| qwen2.5-vl | Vision (screenshots, diagrams) | 3B / 7B | Used when the agent needs to read a screenshot. Falls back to Tesseract OCR if not available. |
Hardware reality check
We don't want to lie to you about local inference. Here is what each RAM tier actually delivers.
| Tier | RAM | What runs well |
|---|---|---|
| Minimum | 8 GB | 1.5B–3B models. Indexing is fine; generation is usable for short prompts. |
| Recommended | 16 GB | 7B coder + 137M embeddings comfortably. Most users land here. |
| Strong | 32 GB+ | 14B coder, 7B vision, simultaneous embeddings. Apple Silicon shines. |
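To turn the table into a concrete pull command, here is a rough sketch that maps installed RAM to a coder model tag. The function names are hypothetical and the thresholds simply mirror the tiers above. The size tags (`:1.5b`, `:7b`, `:14b`) follow Ollama's usual naming for this model family, but treat them as an assumption and check your local library.

```shell
# Sketch only: thresholds mirror the RAM tiers in the table above.

# coder_tag_for_ram GB
# Prints the qwen2.5-coder tag suggested for a machine with GB of RAM.
# GB must be a whole number.
coder_tag_for_ram() {
  ram_gb=$1
  if [ "$ram_gb" -ge 32 ]; then
    echo "qwen2.5-coder:14b"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "qwen2.5-coder:7b"
  else
    # Minimum tier (8 GB): stay with the smallest coder model.
    echo "qwen2.5-coder:1.5b"
  fi
}

# detect_ram_gb
# Best-effort RAM detection on macOS and Linux; prints 0 elsewhere.
detect_ram_gb() {
  case "$(uname -s)" in
    Darwin) echo $(( $(sysctl -n hw.memsize) / 1073741824 )) ;;
    # MemTotal is in kB and usually a little under the nominal size, so round up.
    Linux)  awk '/^MemTotal:/ { printf "%d\n", ($2 + 1048575) / 1048576 }' /proc/meminfo ;;
    *)      echo 0 ;;
  esac
}

# Usage:
#   ollama pull "$(coder_tag_for_ram "$(detect_ram_gb)")"
```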
Vision: Qwen2.5-VL with Tesseract fallback
When the agent needs to read a screenshot or diagram, Tanvrit tries qwen2.5-vl via Ollama first. If you don't have the vision model pulled, it falls back to Tesseract OCR for text extraction — degraded, but functional. We tell the agent which mode it's in so it doesn't over-reach.
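The fallback described above can be approximated from the command line. This is a sketch under assumptions, not Tanvrit's actual implementation: `vision_mode` and `read_screenshot` are hypothetical names, the model name is used as written in this document, and the `ollama run` call relies on the CLI picking up an image path included in the prompt.

```shell
# Sketch only: hypothetical helpers illustrating "vision model first, OCR second".

# vision_mode
# Reads `ollama list` output on stdin. Prints "vlm" if qwen2.5-vl is pulled,
# otherwise "ocr".
vision_mode() {
  if awk 'NR > 1 { split($1, p, ":"); if (p[1] == "qwen2.5-vl") ok = 1 } END { exit !ok }'; then
    echo "vlm"
  else
    echo "ocr"
  fi
}

# read_screenshot IMAGE MODE
# Extracts text from IMAGE using the chosen mode.
read_screenshot() {
  case "$2" in
    # Assumption: the Ollama CLI attaches an image when its path is in the prompt.
    vlm) ollama run qwen2.5-vl "Transcribe the text in this image: $1" ;;
    # Tesseract prints recognized text to standard output when the output base is "stdout".
    ocr) tesseract "$1" stdout ;;
    *)   echo "unknown mode: $2" >&2; return 1 ;;
  esac
}

# Usage (requires Ollama and, for the fallback, Tesseract):
#   mode=$(ollama list | vision_mode)
#   read_screenshot ./screenshot.png "$mode"
```

Keeping the mode as an explicit value makes it easy to pass along to whatever consumes the text, in the same spirit as telling the agent which mode it is in.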
Honest: Wasm portal limitations
The Wasm version of the app at /app/ silently degrades engine-dependent features: local inference, file watching, the Swift inference engine, and direct file-system access are not available in a browser. For real work, install the desktop build. Tracked in /docs/wasm-limitations (phase 2).
When you do want the cloud
Some prompts are bigger than your RAM. Plus ($9 / month) lets you bring an Anthropic / OpenAI / Gemini key and route only the requests that need it to the cloud.
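As an illustration of what "route only the requests that need it" could look like, here is a minimal sketch. The rule is an assumption for the sake of example, not Tanvrit's documented routing logic: it estimates tokens at roughly four characters each (a common rule of thumb, not an exact count) and sends a request to the cloud only when the estimate exceeds the local context window and a key is configured. `route_request` is a hypothetical name.

```shell
# Sketch only: a hypothetical routing rule, not Tanvrit's actual behavior.

# route_request PROMPT_CHARS LOCAL_CONTEXT_TOKENS HAS_CLOUD_KEY
# Prints "local", "cloud", or "reject".
route_request() {
  # Rough estimate: about 4 characters per token, rounded up.
  est_tokens=$(( ($1 + 3) / 4 ))
  if [ "$est_tokens" -le "$2" ]; then
    echo "local"      # fits the local model's context window
  elif [ "$3" = "yes" ]; then
    echo "cloud"      # too large locally, and a cloud key is available
  else
    echo "reject"     # too large locally, and no cloud fallback configured
  fi
}

# Usage:
#   route_request "$(wc -c < prompt.txt)" 8192 yes
```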
See pricing →