
Chat Engine

Ghost includes a full chat engine powered by local LLMs — no cloud APIs, no subscriptions.

  1. Hardware Detection: Ghost scans your CPU, RAM, and GPU at startup
  2. Model Selection: Ghost automatically picks the largest model that fits comfortably in available memory
  3. Background Download: The model downloads from the Hugging Face Hub in the background
  4. Native Inference: Inference runs via Candle with GGUF weights (desktop), with Ollama as a fallback

| Tier   | Model                        | Size    | RAM Required |
|--------|------------------------------|---------|--------------|
| Tiny   | Qwen2.5-0.5B-Instruct-Q4_K_M | ~400 MB | 2 GB         |
| Small  | Qwen2.5-1.5B-Instruct-Q4_K_M | ~1.1 GB | 4 GB         |
| Medium | Qwen2.5-3B-Instruct-Q4_K_M   | ~2.0 GB | 8 GB         |
| Large  | Qwen2.5-7B-Instruct-Q4_K_M   | ~4.3 GB | 16 GB        |

For the ReAct agent loop with tool calling, Ghost uses Qwen3 via Ollama:

| Tier   | Model      | RAM Required |
|--------|------------|--------------|
| Micro  | Qwen3-0.6B | 2 GB         |
| Tiny   | Qwen3-1.7B | 4 GB         |
| Small  | Qwen3-4B   | 6 GB         |
| Medium | Qwen3-8B   | 10 GB        |
| Large  | Qwen3-14B  | 18 GB        |
| XL     | Qwen3-32B  | 36 GB        |
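
A ReAct-style agent loop alternates model reasoning with tool calls until the model emits a final answer. The skeleton below stands in for that loop with a stubbed model and a toy tool; Ghost's real loop talks to Qwen3 through Ollama, and every name here is illustrative.

```python
# Minimal ReAct-style loop sketch: at each step the model either requests
# a tool call or returns a final answer. The stub model and calculator
# tool are illustrative, not Ghost's actual implementation.
def react_loop(model, tools, question, max_steps=5):
    history = [("question", question)]
    for _ in range(max_steps):
        action = model(history)             # "reason" over the transcript
        if action["type"] == "final":
            return action["answer"]
        tool = tools[action["tool"]]        # dispatch the requested tool
        observation = tool(action["input"])
        history.append(("observation", observation))
    return None  # gave up after max_steps

# Stub model: asks for the calculator once, then answers with the result.
def stub_model(history):
    if history[-1][0] == "question":
        return {"type": "tool", "tool": "calc", "input": "6*7"}
    return {"type": "final", "answer": history[-1][1]}

tools = {"calc": lambda expr: eval(expr)}  # toy tool; never eval untrusted input
```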

  • Unified Omnibox: Type naturally; Ghost auto-detects chat intent
  • Streaming responses: Token-by-token output via AG-UI events
  • Conversation memory: Persisted in SQLite with FTS5 search across past conversations
  • Debug panel: See reasoning, tool calls, and timing with Ctrl+D
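
The conversation-memory feature can be illustrated with SQLite's FTS5 extension, which most Python builds ship by default. The table name and schema below are assumptions for the sketch, not Ghost's actual schema.

```python
import sqlite3

# Sketch of FTS5-backed conversation search. The schema is illustrative;
# Ghost's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(role, content)")
conn.executemany(
    "INSERT INTO messages (role, content) VALUES (?, ?)",
    [
        ("user", "How do I stream tokens over AG-UI events?"),
        ("assistant", "Subscribe to the event stream and render each token."),
        ("user", "Where are conversations persisted?"),
    ],
)
# MATCH runs a full-text query across all past messages.
rows = conn.execute(
    "SELECT content FROM messages WHERE messages MATCH ?", ("persisted",)
).fetchall()
```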

All chat settings are configurable via Settings (Ctrl+,):

  • Model: Auto-select or manual choice
  • Temperature: 0.0 (deterministic) to 2.0 (creative)
  • Max tokens: Response length limit
  • Device: CPU (default), CUDA, or Metal