Edge AI Active

Jetson AI

A self-hosted AI inference station built on the NVIDIA Jetson Orin Nano. Local models run on-device via Ollama; LiteLLM provides a unified OpenAI-compatible API with Claude, Gemini, and Grok as cloud fallbacks.

Overview

What is Jetson AI?

Jetson AI is a personal edge inference platform — a dedicated machine that runs AI workloads locally, on your own hardware, without sending data to a cloud provider by default. It keeps latency low, costs predictable, and sensitive queries on-device.

When a task exceeds local capacity, LiteLLM automatically routes to the best available cloud model — Claude, Gemini, or Grok — through a single unified API endpoint that looks identical to any caller.

caller ──▶ LiteLLM (:4000/v1)

│

┌────────┴──────────────────┐

│ local first │ cloud fallback

│ Ollama (Jetson Orin Nano) │ Claude · Gemini · Grok

└─────────────────────────────┘

Hardware

The Node

Purpose-built for sustained AI inference at the edge — fast storage for models, dedicated NPU for acceleration.

Compute

NVIDIA Jetson Orin Nano

40 TOPS AI performance · 8-core Arm Cortex-A78AE · 1024-core Ampere GPU

Primary Storage

Samsung 990 Pro 2TB

NVMe M.2 SSD · up to 7,450 MB/s read · model library and workspace

Secondary Storage

SanDisk Extreme 512GB

OS, swap, and fast-access cache

Software Stack

How It Runs

🤖

Ollama — Local Inference

Runs open-weight models directly on the Jetson's GPU and NPU. Handles model management, quantization, and serving — no internet required for local inference.

⚡

LiteLLM — Unified API Gateway

Exposes a single OpenAI-compatible endpoint. Routes requests to local Ollama models first; falls back to cloud providers automatically when needed. Any app that speaks OpenAI protocol works without changes.

Available Models

Local & Cloud

Local (Ollama) Claude Gemini Grok