
Hybrid Search

Ghost’s search engine combines two complementary approaches, keyword search and semantic (vector) search, and merges their results into one ranking.

Keyword search uses SQLite’s FTS5 (Full-Text Search version 5) extension with Porter stemming and Unicode tokenization:

SELECT rowid, rank FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY rank;

Best for: exact filenames, code symbols, specific phrases.
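Passing raw user input straight into `MATCH` can fail, because characters such as `.`, `:` and `-` are not valid in bare FTS5 terms, and filenames and code symbols are full of them. The sketch below shows one common workaround, assuming queries are built in Rust: quote each term as an FTS5 string (doubling any embedded `"`). FTS5 then treats each quoted term as a phrase and implicitly ANDs the phrases together. The helper `to_fts5_query` is hypothetical; this page does not say how Ghost prepares its queries.

```rust
/// Hypothetical helper: converts free-form user input into an FTS5 MATCH
/// expression by quoting every whitespace-separated term as an FTS5 string.
/// Blank input yields an empty string, which callers should treat as
/// "no keyword query".
fn to_fts5_query(input: &str) -> String {
    input
        .split_whitespace()
        // Inside an FTS5 string, a literal double quote is written as "".
        .map(|term| format!("\"{}\"", term.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        // Whitespace-separated phrases are combined with an implicit AND.
        .join(" ")
}

fn main() {
    // As a bare FTS5 term, `config.rs` would be a syntax error because of the dot.
    let q = to_fts5_query("config.rs parse_args");
    println!("{q}"); // "config.rs" "parse_args"
}
```

Quoted text still passes through the tokenizer, so Porter stemming and Unicode tokenization still apply. The trade-off is that operators such as `OR`, `NOT` and the prefix `*` are treated as literal text inside the quotes.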

Semantic search uses sqlite-vec for k-nearest-neighbor (KNN) search over embedding vectors:

SELECT chunk_id, distance FROM chunks_vec
WHERE embedding MATCH ? ORDER BY distance LIMIT 20;

Best for: conceptual queries, “find files about X”, natural language.
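The vector query asks sqlite-vec for the 20 stored embeddings closest to the query embedding. Conceptually, that is the exhaustive search sketched below, assuming the table uses sqlite-vec's default Euclidean (L2) distance. `Chunk`, `l2_distance` and `knn` are illustrative names, not Ghost's code.

```rust
/// Illustrative stand-in for a row of the `chunks_vec` table.
struct Chunk {
    id: i64,
    embedding: Vec<f32>,
}

/// Euclidean (L2) distance between two vectors of equal length.
fn l2_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
}

/// Exhaustive k-nearest-neighbor search: score every chunk, sort by
/// ascending distance and keep the k closest (like ORDER BY distance LIMIT k).
fn knn(query: &[f32], chunks: &[Chunk], k: usize) -> Vec<(i64, f32)> {
    let mut scored: Vec<(i64, f32)> = chunks
        .iter()
        .map(|c| (c.id, l2_distance(query, &c.embedding)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

fn main() {
    // Toy 2-dimensional embeddings; real ones have 384 or 768 dimensions.
    let chunks = vec![
        Chunk { id: 1, embedding: vec![0.0, 0.0] },
        Chunk { id: 2, embedding: vec![3.0, 4.0] },
        Chunk { id: 3, embedding: vec![1.0, 0.0] },
    ];
    for (id, distance) in knn(&[0.0, 0.0], &chunks, 2) {
        println!("chunk {}: distance {}", id, distance);
    }
}
```

If the embeddings are unit-normalized, ordering by L2 distance gives the same ranking as ordering by cosine similarity, because ‖a − b‖² = 2 − 2·cos(a, b) for unit vectors.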

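The two result sets are merged with Reciprocal Rank Fusion (RRF), which scores each result by its position in each ranking:

RRF_score(d) = Σ_i 1 / (k + rank_i(d))

Here the sum runs over the ranking systems (keyword and semantic) that returned result d, rank_i(d) is d's 1-based position in system i's results, and k = 60 is the conventional constant from the original RRF paper. A result contributes nothing for a list it does not appear in, so results found by both searches collect two terms. If both lists are capped at 20 results, as the vector query is, a result found by both scores at least 2/80 = 0.025. That beats the maximum of 1/61 ≈ 0.0164 for a result found by only one search, so results that appear in both keyword and semantic search rank highest.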
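Here is a minimal sketch of the fusion step in Rust, assuming each search returns chunk ids already in rank order (best first). The function name `rrf` and the sample ids are illustrative.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over any number of ranked lists of chunk ids.
/// Ranks are 1-based: the first item of each list has rank 1.
fn rrf(rankings: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in rankings {
        for (i, &id) in list.iter().enumerate() {
            // Each list a result appears in adds 1 / (k + rank).
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let keyword = vec![10, 20, 30];  // ids from the FTS5 query, best first
    let semantic = vec![40, 30, 50]; // ids from the vector query, best first
    for (id, score) in rrf(&[keyword, semantic], 60.0) {
        println!("chunk {}: {:.4}", id, score);
    }
}
```

Chunk 30 is only third in the keyword list and second in the semantic list. It still ranks first because it collects two terms: 1/63 + 1/62 ≈ 0.0320, against 1/61 ≈ 0.0164 for the best single-list result.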

Ghost uses a fallback chain for embeddings:

| Priority | Engine | Dimensions | Size | Speed |
|---|---|---|---|---|
| 1 (Primary) | Native Candle (all-MiniLM-L6-v2) | 384 | ~23 MB | Instant (in-process) |
| 2 (Fallback) | Ollama (nomic-embed-text) | 768 | ~274 MB | HTTP call |
| 3 (Degraded) | FTS5 only (no vectors) | N/A | 0 | <5 ms (keyword only) |

The native engine runs in-process with no external dependencies: it needs no Ollama, no GPU, and no internet connection once the model has been downloaded for the first time.
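The selection logic implied by the table can be sketched as an ordered check. `EmbeddingEngine`, `select_engine` and the two boolean availability flags are hypothetical. This page does not describe how Ghost detects whether each engine is usable.

```rust
/// Hypothetical model of the embedding fallback chain.
#[derive(Debug, PartialEq, Clone, Copy)]
enum EmbeddingEngine {
    NativeCandle, // all-MiniLM-L6-v2, in-process
    Ollama,       // nomic-embed-text, over HTTP
    FtsOnly,      // no vectors: keyword search only
}

impl EmbeddingEngine {
    /// Vector width produced by each engine (None when there are no vectors).
    fn dimensions(self) -> Option<usize> {
        match self {
            EmbeddingEngine::NativeCandle => Some(384),
            EmbeddingEngine::Ollama => Some(768),
            EmbeddingEngine::FtsOnly => None,
        }
    }
}

/// Picks the first available engine in priority order.
/// The flags stand in for real checks (model loaded, Ollama reachable).
fn select_engine(native_ok: bool, ollama_ok: bool) -> EmbeddingEngine {
    if native_ok {
        EmbeddingEngine::NativeCandle
    } else if ollama_ok {
        EmbeddingEngine::Ollama
    } else {
        EmbeddingEngine::FtsOnly
    }
}

fn main() {
    let engine = select_engine(false, true);
    println!("{:?} -> {:?}", engine, engine.dimensions());
}
```

The two engines produce vectors of different lengths (384 vs 768), so vectors from one engine cannot be compared with vectors from the other. Switching engines presumably means re-embedding the index, though this page does not say how Ghost handles that.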

Ghost extracts text from:

  • Documents: PDF, DOCX, XLSX, TXT, Markdown
  • Code: 50+ extensions (.rs, .py, .js, .ts, .go, .java, .cpp, .c, .rb, .php, etc.)
  • Data: JSON, YAML, TOML, XML, CSV
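Routing a file to the right extractor can be as simple as matching on its extension. Below is a minimal sketch that covers only a subset of the listed types. `Category`, `categorize`, and the inclusion of `md` and `yml` as extensions are assumptions, not Ghost's actual mapping.

```rust
use std::path::Path;

/// Hypothetical extractor categories matching the three groups above.
#[derive(Debug, PartialEq)]
enum Category {
    Document,
    Code,
    Data,
}

/// Maps a path to a category by its (case-insensitive) extension.
/// Returns None for files without an extension or with an unlisted one.
fn categorize(path: &Path) -> Option<Category> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" | "docx" | "xlsx" | "txt" | "md" => Some(Category::Document),
        "rs" | "py" | "js" | "ts" | "go" | "java" | "cpp" | "c" | "rb" | "php" => {
            Some(Category::Code)
        }
        "json" | "yaml" | "yml" | "toml" | "xml" | "csv" => Some(Category::Data),
        _ => None,
    }
}

fn main() {
    for name in ["src/main.rs", "Report.PDF", "data/users.csv", "LICENSE"] {
        println!("{}: {:?}", name, categorize(Path::new(name)));
    }
}
```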

Performance targets and measured results:

| Metric | Target | Actual (typical) |
|---|---|---|
| FTS5 keyword search | <5 ms | ✅ <3 ms |
| Semantic vector search | <500 ms | ✅ <200 ms |
| File indexing (per file) | <100 ms | ✅ ~50 ms |
| Background CPU usage | <10% | ✅ ~5% during indexing |