Features

Local-first inference

Tanvrit AI runs language and vision models on your own hardware via Ollama. No API key required, no source code uploaded, no quotas. When a local model isn't enough, the Plus tier lets you bring your own Anthropic, OpenAI, or Gemini key for cloud fallback.

Privacy

With local inference, source code never leaves your machine. The free tier doesn't even talk to our servers: your index and embeddings stay on your disk, and inference runs on your own CPU/GPU.

Cost

Once a model is on disk, calls are free. No per-token billing, no surprise overage at the end of the month, no "AI credits" to manage.

Latency

No round-trip to a hosted API. For small models on Apple Silicon you typically see first-token latency under 200 ms.

No quotas

Throw a million-line repo at it overnight. The only limits are your disk and the model's context window.

Install Ollama, pull a model, you are done

# 1. Install Ollama (one-line install)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull the recommended models
ollama pull qwen2.5-coder
ollama pull nomic-embed-text
ollama pull qwen2.5vl   # Ollama's library name for Qwen2.5-VL has no hyphen

# 3. Open Tanvrit AI — it auto-detects Ollama on localhost:11434.

Tanvrit AI auto-detects Ollama on its default port. Switch models from Settings → Models; the dashboard's health pill shows which one is currently active.
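If you want to check by hand what auto-detection would find, here is a minimal sketch that asks Ollama which models are pulled. It uses Ollama's GET /api/tags endpoint on the default port. The function names are ours for illustration; this is not Tanvrit's actual detection code.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address


def parse_model_names(tags_json: str) -> List[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    payload = json.loads(tags_json)
    return [m["name"] for m in payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL,
                      timeout: float = 2.0) -> Optional[List[str]]:
    """Return the pulled model names, or None if Ollama is not reachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None


if __name__ == "__main__":
    models = list_local_models()
    if models is None:
        print("Ollama not reachable on", OLLAMA_URL)
    else:
        print("Ollama is up; models:", ", ".join(models) or "(none pulled)")
```

If this prints "not reachable", start Ollama before opening Tanvrit AI.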

Recommended models

qwen2.5-coder · Planning / chat · 1.5B / 7B / 14B

Solid coder model with strong instruction following. 7B is the sweet spot on 16 GB RAM.

nomic-embed-text · Embeddings · 137M

Fast, strong on code-flavoured text. Tanvrit uses it for the vector index by default.

qwen2.5-vl · Vision (screenshots, diagrams) · 3B / 7B

Used when the agent needs to read a screenshot. If the model isn't pulled, Tanvrit falls back to Tesseract OCR.
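To make the embeddings role concrete, here is a rough sketch of how a vector index lookup can work with nomic-embed-text. It calls Ollama's POST /api/embed endpoint and ranks stored vectors by cosine similarity. The in-memory dict index and the function names are assumptions for illustration; Tanvrit's real index format is not described here.

```python
import json
import math
import urllib.request
from typing import Dict, List


def embed(texts: List[str], model: str = "nomic-embed-text",
          base_url: str = "http://localhost:11434") -> List[List[float]]:
    """Embed a batch of texts through Ollama's /api/embed endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["embeddings"]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Return the keys of the k stored vectors most similar to query_vec."""
    scored = sorted(((cosine(query_vec, vec), key) for key, vec in index.items()),
                    reverse=True)
    return [key for _, key in scored[:k]]
```

With Ollama running, `embed(["def parse(): ..."])` gives you the vectors to store. At query time, embed the question the same way and pass the result to `top_k`.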

Hardware reality check

We don't want to lie to you about local inference. Here is what each RAM tier actually delivers.

Tier        | RAM    | What runs well
Minimum     | 8 GB   | 1.5B–3B models. Indexing is fine; generation is usable for short prompts.
Recommended | 16 GB  | 7B coder + 137M embeddings comfortably. Most users land here.
Strong      | 32 GB+ | 14B coder, 7B vision, simultaneous embeddings. Apple Silicon shines.
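The table can be read as a simple lookup. The sketch below maps installed RAM to a tier and a coder model size. It treats each RAM figure as a lower bound and picks one size per tier; both are our assumptions, not rules Tanvrit enforces.

```python
def ram_tier(ram_gb: float) -> str:
    """Map installed RAM (in GB) to a tier name from the table above."""
    if ram_gb >= 32:
        return "Strong"
    if ram_gb >= 16:
        return "Recommended"
    if ram_gb >= 8:
        return "Minimum"
    return "Below minimum"


# One coder model size per tier, taken from the "What runs well" column.
# The 1.5b choice for the Minimum tier is an assumption (the table says 1.5B-3B).
CODER_MODEL_BY_TIER = {
    "Minimum": "qwen2.5-coder:1.5b",
    "Recommended": "qwen2.5-coder:7b",
    "Strong": "qwen2.5-coder:14b",
}


def suggested_coder_model(ram_gb: float):
    """Return a coder model tag for this machine, or None below 8 GB."""
    return CODER_MODEL_BY_TIER.get(ram_tier(ram_gb))
```

For example, `suggested_coder_model(16)` returns `"qwen2.5-coder:7b"`, which you can pass straight to `ollama pull`.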

Vision: Qwen2.5-VL with Tesseract fallback

When the agent needs to read a screenshot or diagram, Tanvrit tries qwen2.5-vl via Ollama first. If you don't have the vision model pulled, it falls back to Tesseract OCR for text extraction — degraded, but functional. We tell the agent which mode it's in so it doesn't over-reach.
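The fallback decision described above can be sketched as a small pure function. The mode names, the note wording, and both function names are hypothetical. The vision model name is passed in, so use whichever tag you actually pulled.

```python
from typing import Iterable


def pick_vision_mode(installed_models: Iterable[str], vision_model: str) -> str:
    """Return "vision-model" if the vision model is pulled, else "ocr-fallback".

    Names reported by Ollama carry a tag suffix such as ":7b", so only the
    part before the colon is compared.
    """
    for name in installed_models:
        if name.split(":", 1)[0] == vision_model:
            return "vision-model"
    return "ocr-fallback"


def agent_note(mode: str) -> str:
    """Hypothetical hint given to the agent so it knows what it can rely on."""
    if mode == "vision-model":
        return "Vision model available: layout and diagrams can be interpreted."
    return "OCR only: extracted text is available, layout and diagrams are not."


if __name__ == "__main__":
    mode = pick_vision_mode(["qwen2.5-coder:7b", "nomic-embed-text:latest"], "qwen2.5-vl")
    print(mode, "-", agent_note(mode))
```

Telling the agent the mode up front is the important part: in OCR mode it should not claim to understand a diagram it only received as loose text.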

Honest: Wasm portal limitations

The Wasm version of the app at /app/ silently degrades engine-dependent features: local inference, file watching, the Swift inference engine, and direct file-system access are not available in a browser. For real work, install the desktop build. Tracked in /docs/wasm-limitations (phase 2).

When you do want the cloud

Some prompts are bigger than your RAM. Plus ($9 / month) lets you bring an Anthropic / OpenAI / Gemini key and route only the requests that need it to the cloud.
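One way to picture "route only the requests that need it" is a size check before each call. This is a sketch under loud assumptions: the four-characters-per-token estimate, the context-window threshold, and every name below are hypothetical and do not describe Tanvrit's actual routing logic.

```python
from dataclasses import dataclass


@dataclass
class RouteDecision:
    target: str  # "local" or "cloud"
    reason: str


def estimate_tokens(text: str) -> int:
    """Crude size estimate (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def route(prompt: str, local_context_tokens: int, cloud_key_configured: bool) -> RouteDecision:
    """Keep the request local unless it cannot fit and a cloud key is set."""
    needed = estimate_tokens(prompt)
    if needed <= local_context_tokens:
        return RouteDecision("local", "fits the local model's context window")
    if cloud_key_configured:
        return RouteDecision("cloud", "too large for the local context window")
    return RouteDecision("local", "too large, but no cloud key is configured")


if __name__ == "__main__":
    print(route("x" * 8000, local_context_tokens=1000, cloud_key_configured=True))
```

The default is local on every branch where that is possible, so nothing goes to the cloud unless you configured a key and the request genuinely needs it.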

See pricing →