Local Semantic Media Search
Overview
A high-performance, privacy-focused semantic search engine for images and videos that runs entirely on your local machine. No cloud APIs, no data leakage.
Core Features
- Semantic Media Intelligence: Powered by `Qwen3-VL-Embedding-2B` for deep visual understanding of both images and video frames.
- Video Support: Automatically extracts representative frames from videos for indexing and provides an integrated player for playback.
- Incremental Indexing: Smart “memory” via `mtime` tracking, so files are re-indexed only if they have been modified since the last scan.
- Fast Local Search: ChromaDB provides sub-100ms vector lookups for your collection.
- Privacy First: All processing happens locally. Your media never leaves your device.
- Optimized UI: Pre-computed thumbnails and a dedicated file-serving endpoint for high-resolution viewing and smooth video streaming.
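
The incremental indexing feature above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the function name `files_needing_index`, the `indexed_mtimes` dictionary (file path to the `mtime` recorded at the last scan), and the extension list are all hypothetical. The diagram in this README suggests the real indexer keeps `mtime` in the ChromaDB metadata instead of a plain dictionary.

```python
from pathlib import Path

# Hypothetical extension list; the project's actual list is not stated.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mp4", ".mov"}


def files_needing_index(root, indexed_mtimes):
    """Return media files under root that are new or changed since the last scan.

    indexed_mtimes maps a file path (str) to the mtime recorded when that
    file was last indexed.
    """
    stale = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in MEDIA_EXTENSIONS:
            continue
        current = path.stat().st_mtime
        recorded = indexed_mtimes.get(str(path))
        # Re-index when the file was never seen or was modified afterwards.
        if recorded is None or current > recorded:
            stale.append(str(path))
    return stale
```

Files whose recorded `mtime` is still current are skipped, so a second scan over an unchanged `vault/` would return an empty list and no embeddings would need to be recomputed.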
How It Works
System Architecture & Data Flow
This diagram illustrates how media files are processed, indexed into a vector database, and subsequently retrieved via natural language queries.
    graph TD
        subgraph "Indexing Phase (Offline)"
            A[vault/ Media Store] --> B[Indexer Script]
            B --> C{Frame Extraction}
            C --> D[Qwen3-VL-Embedding-2B]
            D --> E[2048-dim Vector]
            E --> F[(ChromaDB)]
            F --- G[Metadata: Paths, Thumbnails, mtime]
        end
        subgraph "Search Phase (Online)"
            H[User Text Query] --> I[API Server]
            I --> J[Qwen3-VL-Embedding-2B]
            J --> K[Query Vector]
            K --> L{Vector Search}
            L --> F
            F --> M[Cosine Similarity Ranking]
            M --> N{Filter Threshold}
            N -->|Pass| O[Search Results]
            N -->|Fail| P[Filtered Out]
        end

Reading This Diagram
- Indexing Phase: The indexer scans the local `vault/` directory, extracts representative frames from videos, and generates high-dimensional embeddings using the vision-language model. These are stored in ChromaDB along with relevant metadata for fast retrieval and file serving.
- Search Phase: When a user enters a natural language query, it is converted into a vector using the same model. The system then performs a cosine similarity search against the indexed media, filtering out results that do not meet the quality threshold.
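
This README does not say how "representative" frames are chosen. One common approach is evenly spaced sampling, sketched below as an assumption and not as the project's method. The function name `representative_frame_indices` and the default of four samples are hypothetical. The sketch only computes which frame indices to read; decoding those frames would need a video library, which is not shown.

```python
def representative_frame_indices(total_frames, num_samples=4):
    """Pick up to num_samples evenly spaced frame indices from a video.

    Each index is the midpoint of one of num_samples equal segments, so the
    very first and very last frames are skipped on longer videos.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    # Never ask for more frames than the video contains.
    num_samples = min(num_samples, total_frames)
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]
```

For a 100-frame clip and four samples this selects frames 12, 37, 62, and 87. Each selected frame would then be embedded like a still image.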
- Indexer scans the `vault/` directory for images and videos.
- For each media file, a 2048-dimensional embedding is generated via `Qwen3-VL-Embedding-2B`.
- Embeddings are stored in `chromadb` with metadata (paths, thumbnails, timestamps).
- The API server accepts natural language queries, embeds them using the same model, and retrieves nearest neighbors using Cosine Similarity.
- Results below a similarity threshold are filtered out to ensure quality.
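
The ranking and filtering steps above can be illustrated in plain Python. This is a sketch under stated assumptions: in the project, ChromaDB performs the nearest-neighbor lookup, whereas here a dictionary stands in for the index. The names `cosine_similarity` and `search`, the `0.25` threshold, and the toy 3-dimensional vectors are hypothetical (the real embeddings are 2048-dimensional).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, index, threshold=0.25, top_k=10):
    """Rank indexed items by similarity and drop those below the threshold.

    index maps a file path to its embedding. The threshold value is an
    illustrative assumption, not the project's setting.
    """
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in index.items()]
    kept = [(path, score) for path, score in scored if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Toy index: the query points in the same direction as cat.jpg.
toy_index = {
    "vault/cat.jpg": [1.0, 0.0, 0.0],
    "vault/dog.jpg": [0.6, 0.8, 0.0],
    "vault/car.mp4": [0.0, 0.0, 1.0],
}
results = search([1.0, 0.0, 0.0], toy_index)
```

Here `results` lists `vault/cat.jpg` first and `vault/dog.jpg` second, while `vault/car.mp4` scores 0 and is filtered out. Raising the threshold trades recall for precision: fewer results are returned, but weakly related matches are less likely to appear.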
Quick Start
- Install dependencies: `pip install -r requirements.txt`
- Index media: `python indexer.py vault`
- Start API: `python api.py`
- Open `http://127.0.0.1:8000` in your browser.
Project Components
| Component | Description | Source File |
|---|---|---|
| Indexer | Media ingestion, frame extraction, and vectorization script | indexer.py |
| API Server | FastAPI search server and file streamer | api.py |
| Embedding Utils | Specialized Qwen3-VL embedding logic | embedding_utils.py |
Directory Structure
See Project Directory Structure for the full breakdown.
Last Updated: 2026-06-17