All Projects
Edge AI Active

Jetson AI

A self-hosted AI inference station built on the NVIDIA Jetson Orin Nano. Local models run on-device via Ollama; LiteLLM provides a unified OpenAI-compatible API with Claude, Gemini, and Grok as cloud fallbacks.

Overview

What is Jetson AI?

Jetson AI is a personal edge inference platform — a dedicated machine that runs AI workloads locally, on your own hardware, without sending data to a cloud provider by default. It keeps latency low, costs predictable, and sensitive queries on-device.

When a task exceeds local capacity, LiteLLM automatically routes to the best available cloud model — Claude, Gemini, or Grok — through a single unified API endpoint that looks identical to any caller.

caller ──▶ LiteLLM (:4000/v1)
┌────────┴──────────────────┐
local first cloud fallback
Ollama (Jetson Orin Nano) Claude · Gemini · Grok
└─────────────────────────────┘
Hardware

The Node

Purpose-built for sustained AI inference at the edge — fast storage for models, dedicated NPU for acceleration.

Compute
NVIDIA Jetson Orin Nano
40 TOPS AI performance · 8-core Arm Cortex-A78AE · 1024-core Ampere GPU
Primary Storage
Samsung 990 Pro 2TB
NVMe M.2 SSD · up to 7,450 MB/s read · model library and workspace
Secondary Storage
SanDisk Extreme 512GB
OS, swap, and fast-access cache
Software Stack

How It Runs

🤖

Ollama — Local Inference

Runs open-weight models directly on the Jetson's GPU and NPU. Handles model management, quantization, and serving — no internet required for local inference.

LiteLLM — Unified API Gateway

Exposes a single OpenAI-compatible endpoint. Routes requests to local Ollama models first; falls back to cloud providers automatically when needed. Any app that speaks OpenAI protocol works without changes.

Available Models

Local & Cloud

Local (Ollama) Claude Gemini Grok