A self-hosted AI inference station built on the NVIDIA Jetson Orin Nano. Local models run on-device via Ollama; LiteLLM provides a unified OpenAI-compatible API with Claude, Gemini, and Grok as cloud fallbacks.
Jetson AI is a personal edge inference platform — a dedicated machine that runs AI workloads locally, on your own hardware, without sending data to a cloud provider by default. It keeps latency low, costs predictable, and sensitive queries on-device.
When a task exceeds local capacity, LiteLLM automatically routes to the best available cloud model — Claude, Gemini, or Grok — through a single unified API endpoint that looks identical to any caller.
Purpose-built for sustained AI inference at the edge — fast storage for models, dedicated NPU for acceleration.
Runs open-weight models directly on the Jetson's GPU and NPU. Handles model management, quantization, and serving — no internet required for local inference.
Exposes a single OpenAI-compatible endpoint. Routes requests to local Ollama models first; falls back to cloud providers automatically when needed. Any app that speaks OpenAI protocol works without changes.