Local Semantic Media Search

Overview

A high-performance, privacy-focused semantic search engine for images and videos that runs entirely on your local machine. No cloud APIs, no data leakage.

Core Features

  • Semantic Media Intelligence: Powered by Qwen3-VL-Embedding-2B for deep visual understanding of both images and video frames.
  • Video Support: Automatically extracts representative frames from videos for indexing and provides an integrated player for playback.
  • Incremental Indexing: Tracks each file's mtime (last-modified time) and re-indexes only files that have changed since the last scan.
  • Fast Local Search: ChromaDB provides sub-100ms vector lookups for your collection.
  • Privacy First: All processing happens locally. Your media never leaves your device.
  • Optimized UI: Pre-computed thumbnails and a dedicated file-serving endpoint for high-resolution viewing and smooth video streaming.
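
The incremental indexing feature above can be illustrated with a minimal sketch. It assumes the last indexed mtime of each file is kept in a plain dictionary; the names needs_reindex, scan, and indexed_mtimes are hypothetical and are not taken from indexer.py, which, per the diagram below, stores mtime alongside the other metadata in ChromaDB.

```python
import os


def needs_reindex(path, indexed_mtimes):
    """Return True if `path` is new or was modified after it was last indexed."""
    current = os.path.getmtime(path)
    previous = indexed_mtimes.get(path)
    return previous is None or current > previous


def scan(paths, indexed_mtimes):
    """Return the files that need (re)indexing and record their current mtimes.

    `indexed_mtimes` is a hypothetical stand-in for the stored mtime metadata.
    """
    stale = [p for p in paths if needs_reindex(p, indexed_mtimes)]
    for p in stale:
        # In a real indexer the embedding step would run here before recording.
        indexed_mtimes[p] = os.path.getmtime(p)
    return stale
```

Calling scan twice on an unchanged set of files returns every file the first time and an empty list the second time, which is what keeps repeated indexing runs cheap.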

How It Works

System Architecture & Data Flow

This diagram illustrates how media files are processed, indexed into a vector database, and subsequently retrieved via natural language queries.

graph TD
    subgraph "Indexing Phase (Offline)"
        A[vault/ Media Store] --> B[Indexer Script]
        B --> C{Frame Extraction}
        C --> D[Qwen3-VL-Embedding-2B]
        D --> E[2048-dim Vector]
        E --> F[(ChromaDB)]
        F --- G[Metadata: Paths, Thumbnails, mtime]
    end

    subgraph "Search Phase (Online)"
        H[User Text Query] --> I[API Server]
        I --> J[Qwen3-VL-Embedding-2B]
        J --> K[Query Vector]
        K --> L{Vector Search}
        L --> F
        F --> M[Cosine Similarity Ranking]
        M --> N{Filter Threshold}
        N -->|Pass| O[Search Results]
        N -->|Fail| P[Filtered Out]
    end

Reading This Diagram

  • Indexing Phase: The indexer scans the local vault/ directory, extracts representative frames from videos, and generates high-dimensional embeddings using the vision-language model. These are stored in ChromaDB along with relevant metadata for fast retrieval and file serving.
  • Search Phase: When a user enters a natural language query, it is converted into a vector using the same model. The system then performs a cosine similarity search against the indexed media and filters out results that fall below the similarity threshold.
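
The source does not say how "representative" frames are chosen. As one possible approach, the sketch below picks evenly spaced frame indices (the midpoint of each equal-length segment). The function name and the default of four frames are assumptions made for illustration, not the behavior of indexer.py.

```python
def representative_frame_indices(total_frames, num_frames=4):
    """Pick up to `num_frames` evenly spaced frame indices.

    The video is split into equal segments and the midpoint frame of each
    segment is selected. This sampling strategy is an assumption.
    """
    if total_frames <= 0 or num_frames <= 0:
        return []
    n = min(num_frames, total_frames)
    step = total_frames / n
    return [int(step * i + step / 2) for i in range(n)]


print(representative_frame_indices(100, 4))  # [12, 37, 62, 87]
```

The selected indices would then be decoded by a video library and passed to the embedding model, one vector per frame or one pooled vector per video, depending on the design.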


  1. Indexer scans the vault/ directory for images and videos.
  2. For each media file, a 2048-dimensional embedding is generated via Qwen3-VL-Embedding-2B.
  3. Embeddings are stored in ChromaDB with metadata (paths, thumbnails, timestamps).
  4. The API server accepts natural language queries, embeds them using the same model, and retrieves nearest neighbors by cosine similarity.
  5. Results below a similarity threshold are filtered out to ensure quality.
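
Steps 4 and 5 can be sketched in plain Python. This is a brute-force illustration of cosine similarity ranking followed by a threshold filter; in the project the lookup is delegated to ChromaDB. The names search and index, the threshold of 0.3, and the three-dimensional toy vectors are all assumptions for the example (real embeddings are 2048-dimensional).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, index, threshold=0.3, top_k=5):
    """Rank indexed items by similarity and drop those below `threshold`.

    `index` maps a file path to its embedding (a hypothetical in-memory
    stand-in for the vector store).
    """
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in index.items()]
    kept = [(path, score) for path, score in scored if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Toy index: "car.jpg" is orthogonal to the query and gets filtered out.
index = {
    "cat.jpg": [1.0, 0.0, 0.0],
    "dog.mp4": [0.6, 0.8, 0.0],
    "car.jpg": [0.0, 0.0, 1.0],
}
results = search([1.0, 0.0, 0.0], index)
```

Here results contains "cat.jpg" first and "dog.mp4" second. A higher threshold trades recall for precision, so the right value depends on the collection and is best tuned empirically.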

Quick Start

  1. Install dependencies: pip install -r requirements.txt
  2. Index media: python indexer.py vault
  3. Start API: python api.py
  4. Open http://127.0.0.1:8000 in your browser.

Project Components

| Component | Description | Source File |
| --- | --- | --- |
| Indexer | Media ingestion, frame extraction, and vectorization script | indexer.py |
| API Server | FastAPI search server and file streamer | api.py |
| Embedding Utils | Specialized Qwen3-VL embedding logic | embedding_utils.py |

Directory Structure

See Project Directory Structure for the full breakdown.


Last Updated: 2026-06-17
