Local Semantic Media Search

Overview

A high-performance, privacy-focused semantic search engine for images and videos that runs entirely on your local machine. No cloud APIs, no data leakage.

Core Features

Semantic Media Intelligence: Powered by Qwen3-VL-Embedding-2B for deep visual understanding of both images and video frames.
Video Support: Automatically extracts representative frames from videos for indexing and provides an integrated player for playback.
Incremental Indexing: Tracks each file's modification time (mtime) and re-indexes only files that have changed since the last scan.
Fast Local Search: ChromaDB provides sub-100ms vector lookups for your collection.
Privacy First: All processing happens locally. Your media never leaves your device.
Optimized UI: Pre-computed thumbnails and a dedicated file-serving endpoint for high-resolution viewing and smooth video streaming.
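
The mtime check behind incremental indexing can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names and the `indexed_mtimes` dictionary (path string to last indexed mtime) are hypothetical, and the real indexer may store this information in ChromaDB metadata instead.

```python
from pathlib import Path


def needs_reindex(path: Path, indexed_mtimes: dict[str, float]) -> bool:
    """Return True if the file is new or was modified after it was last indexed.

    `indexed_mtimes` is a hypothetical mapping of path string -> mtime
    recorded at the previous scan.
    """
    current = path.stat().st_mtime
    previous = indexed_mtimes.get(str(path))
    return previous is None or current > previous


def select_stale_files(paths, indexed_mtimes: dict[str, float]) -> list[Path]:
    """Keep only the paths that need to be indexed or re-indexed."""
    return [p for p in paths if needs_reindex(p, indexed_mtimes)]
```

Files whose stored mtime matches the value on disk are skipped, so a second scan of an unchanged collection does no embedding work.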

How It Works

System Architecture & Data Flow

This diagram illustrates how media files are processed, indexed into a vector database, and subsequently retrieved via natural language queries.
graph TD
    subgraph "Indexing Phase (Offline)"
        A[vault/ Media Store] --> B[Indexer Script]
        B --> C{Frame Extraction}
        C --> D[Qwen3-VL-Embedding-2B]
        D --> E[2048-dim Vector]
        E --> F[(ChromaDB)]
        F --- G[Metadata: Paths, Thumbnails, mtime]
    end

    subgraph "Search Phase (Online)"
        H[User Text Query] --> I[API Server]
        I --> J[Qwen3-VL-Embedding-2B]
        J --> K[Query Vector]
        K --> L{Vector Search}
        L --> F
        F --> M[Cosine Similarity Ranking]
        M --> N{Filter Threshold}
        N -->|Pass| O[Search Results]
        N -->|Fail| P[Filtered Out]
    end
Reading This Diagram

Indexing Phase: The indexer scans the local vault/ directory, extracts representative frames from videos, and generates high-dimensional embeddings using the vision-language model. These are stored in ChromaDB along with relevant metadata for fast retrieval and file serving.
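
One plausible way to pick representative frames is uniform sampling across the video. The source does not say how frames are selected, so the strategy, the function name, and the default of four samples below are all assumptions made for illustration.

```python
def representative_frame_indices(total_frames: int, num_samples: int = 4) -> list[int]:
    """Pick evenly spaced frame indices (assumed strategy, not confirmed by the project).

    The video is split into `num_samples` equal segments and the frame at the
    midpoint of each segment is chosen, which avoids the very first and very
    last frames (often black or title cards).
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    segment = total_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]
```

For a 100-frame clip with four samples this yields frames 12, 37, 62, and 87. Each selected frame would then be decoded and passed to the embedding model like a still image.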

Search Phase: When a user enters a natural language query, it is converted into a vector using the same model. The system then performs a cosine similarity search against the indexed media, filtering out results that do not meet the quality threshold.
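
The ranking and filtering step can be illustrated with a brute-force sketch. In the project ChromaDB performs the vector lookup; the code below only shows the underlying idea in plain Python. The `search` function, the `indexed` dictionary, and the default threshold of 0.25 are hypothetical and not taken from the source.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, indexed, threshold: float = 0.25, top_k: int = 10):
    """Rank indexed items by similarity and drop those below the threshold.

    `indexed` is a hypothetical mapping of media path -> embedding vector.
    Returns a list of (path, score) pairs, best match first.
    """
    scored = [(path, cosine_similarity(query_vec, vec)) for path, vec in indexed.items()]
    kept = [(path, score) for path, score in scored if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]
```

A threshold trades recall for precision: raising it hides weak matches, while lowering it returns more results of mixed relevance.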

The indexer scans the vault/ directory for images and videos.
For each media file, a 2048-dimensional embedding is generated via Qwen3-VL-Embedding-2B.
Embeddings are stored in ChromaDB with metadata (paths, thumbnails, timestamps).
The API server accepts natural language queries, embeds them using the same model, and retrieves nearest neighbors by cosine similarity.
Results below a similarity threshold are filtered out to ensure quality.
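
The first step, scanning vault/ and separating images from videos, might look like the sketch below. The extension sets and the `scan_vault` name are assumptions; the source does not list which file types the indexer supports.

```python
from pathlib import Path

# Assumed extension lists; the actual indexer may support a different set.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".mkv"}


def scan_vault(root):
    """Walk `root` recursively and yield (path, kind) for each supported media file."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        ext = path.suffix.lower()
        if ext in IMAGE_EXTS:
            yield path, "image"
        elif ext in VIDEO_EXTS:
            yield path, "video"
        # Any other file type is ignored.
```

Images can go straight to the embedding model, while videos first pass through frame extraction.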

Quick Start

Install dependencies: `pip install -r requirements.txt`
Index media: `python indexer.py vault`
Start API: `python api.py`
Open http://127.0.0.1:8000 in your browser.

Project Components

Component	Description	Source File
Indexer	Media ingestion, frame extraction, and vectorization script	`indexer.py`
API Server	FastAPI search server and file streamer	`api.py`
Embedding Utils	Specialized Qwen3-VL embedding logic	`embedding_utils.py`

Directory Structure

See Project Directory Structure for the full breakdown.

Last Updated: 2026-06-17
