Qwen3-TTS GPU Suite

Overview

This suite converts Markdown files to audio using Alibaba’s Qwen3-TTS and is optimized for NVIDIA GPUs. All model weights are stored locally, so inference works offline and your text stays on your machine.

GitHub Repository: https://github.com/prathmeshnik/qwen3-tts

This suite provides two main approaches:

  1. Voice Cloning - Clone any voice from a reference audio sample
  2. Preset Voices - Use high-quality built-in voices without reference files

Key Features

  • GPU-accelerated inference using PyTorch SDPA (Scaled Dot Product Attention)
  • Adaptive token budgeting for faster short-text processing
  • Voice cloning capabilities with reference audio
  • Multiple preset voice options (Serena, Vivian, Uncle_Fu, Ryan, Aiden, Ono_Anna, Sohee, Eric, Dylan)
  • Batch processing for improved throughput
  • Optimized for consumer GPUs such as the RTX 3050
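
To illustrate the adaptive token budgeting and batch processing features listed above, here is a minimal sketch of the general idea: cap the generation length in proportion to the input length, and group chunks of similar length so each batch can share a tight cap. The constants and the function names token_budget and make_batches are hypothetical and are not taken from the repository; the actual scripts may compute their budgets differently.

```python
# Hypothetical constants for illustration only; not taken from the repository.
ASSUMED_TOKENS_PER_CHAR = 2   # assumed number of generated tokens per input character
MIN_TOKENS = 128              # assumed floor so very short inputs are not cut off
MAX_TOKENS = 2048             # assumed ceiling for long inputs


def token_budget(text: str) -> int:
    """Return a generation cap that scales with the length of the input text."""
    estimate = len(text.strip()) * ASSUMED_TOKENS_PER_CHAR
    return max(MIN_TOKENS, min(MAX_TOKENS, estimate))


def make_batches(chunks: list[str], batch_size: int = 4) -> list[list[str]]:
    """Sort chunks by length, then split them into batches of at most batch_size."""
    ordered = sorted(chunks, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

With a scheme like this, a one-line heading gets the minimum budget instead of the full ceiling, which is where the speed-up for short text would come from.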

Project Structure

See 08-directory-structure for the complete file organization.

Quick Start

  1. Install dependencies: pip install -r requirements.txt
  2. Choose your model approach:
    • Voice Cloning: Use md_to_audio_base.py with reference audio
    • Preset Voices: Use md_to_audio_custom.py with built-in speakers
  3. Generate audio from your Markdown files
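
Before Markdown can be spoken, its markup usually has to be reduced to plain sentences. The sketch below shows one way that preprocessing step could look, using only the standard library. The function name markdown_to_text and the set of rules are assumptions made for illustration; md_to_audio_base.py and md_to_audio_custom.py may handle Markdown differently.

```python
import re


def markdown_to_text(markdown: str) -> str:
    """Reduce Markdown to plain text suitable for passing to a TTS model."""
    # Drop fenced code blocks entirely; they rarely read well aloud.
    text = re.sub(r"```.*?```", "", markdown, flags=re.DOTALL)
    # Unwrap inline code, keeping its content.
    text = re.sub(r"`([^`]*)`", r"\1", text)
    # Drop images, then keep only the visible text of links.
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Remove heading markers and list markers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)
    # Remove asterisk emphasis (bold and italic), keeping the emphasized text.
    text = re.sub(r"(\*\*|\*)(?=\S)(.+?)(?<=\S)\1", r"\2", text)
    # Collapse runs of blank lines left behind by the removals.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Underscore emphasis is deliberately left alone so that identifiers such as md_to_audio_base.py are not mangled.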