Qwen3-TTS GPU Suite
Overview
This suite converts Markdown files to audio using Alibaba’s Qwen3-TTS and is optimized for NVIDIA GPUs. All model weights are stored locally, so the suite works offline and keeps your text private.
GitHub Repository: https://github.com/prathmeshnik/qwen3-tts
This suite provides two main approaches:
- Voice Cloning - Clone any voice from a reference audio sample
- Preset Voices - Use high-quality built-in voices without reference files
Key Features
- GPU-accelerated inference using PyTorch SDPA (Scaled Dot Product Attention)
- Adaptive token budgeting for faster short-text processing
- Voice cloning capabilities with reference audio
- Multiple preset voice options (Serena, Vivian, Uncle_Fu, Ryan, Aiden, Ono_Anna, Sohee, Eric, Dylan)
- Batch processing for improved throughput
- Optimized for consumer GPUs such as the RTX 3050
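The adaptive token budgeting and batch processing features listed above can be illustrated with a minimal sketch. This is not the suite's actual code: the function names `token_budget` and `make_batches`, the tokens-per-character ratio, and the floor and ceiling values are all hypothetical and chosen only to show the idea. Short inputs get a small generation cap so the model stops early, and chunks are grouped so several can be sent to the GPU together.

```python
import math


def token_budget(text: str, tokens_per_char: float = 0.5,
                 floor: int = 64, ceiling: int = 2048) -> int:
    """Estimate a cap on generated tokens from the input length.

    All default values are illustrative assumptions, not values taken
    from the Qwen3-TTS suite.
    """
    estimate = math.ceil(len(text.strip()) * tokens_per_char)
    # Clamp so very short text still gets a usable budget and very
    # long text cannot request an unbounded one.
    return max(floor, min(ceiling, estimate))


def make_batches(chunks: list[str], batch_size: int = 4) -> list[list[str]]:
    """Group text chunks into fixed-size batches; the last may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
```

A smaller `batch_size` is the usual lever when GPU memory is limited, at the cost of throughput.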
Project Structure
See 08-directory-structure for complete file organization.
Quick Start
- Install dependencies:
  pip install -r requirements.txt
- Choose your model approach:
  - Voice Cloning: Use md_to_audio_base.py with reference audio
  - Preset Voices: Use md_to_audio_custom.py with built-in speakers
- Generate audio from your Markdown files
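Before Markdown can be spoken, its formatting syntax generally has to be reduced to plain text. The sketch below shows one simple way to do that with regular expressions. It is an assumption about what such a preprocessing step might look like, not the logic used by md_to_audio_base.py or md_to_audio_custom.py, and the function name `markdown_to_plain_text` is hypothetical.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker


def markdown_to_plain_text(markdown: str) -> str:
    """Strip common Markdown syntax so only speakable text remains."""
    # Drop fenced code blocks entirely; code rarely reads well aloud.
    text = re.sub(FENCE + r".*?" + FENCE, "", markdown, flags=re.DOTALL)
    # Keep the content of inline code spans, drop the backticks.
    text = re.sub(r"`([^`]*)`", r"\1", text)
    # Replace images with their alt text and links with their label.
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Remove heading markers and list markers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)
    # Remove asterisk emphasis. Underscore emphasis is left alone so
    # identifiers such as md_to_audio_base.py are not mangled.
    text = re.sub(r"(\*\*|\*)(.+?)\1", r"\2", text)
    # Collapse runs of blank lines left behind by the removals.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

A regex pass like this covers only common syntax; tables, HTML, and nested constructs would need a real Markdown parser.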
Related Documentation
- 02-setup-guide - Detailed installation and hardware requirements
- 03-markdown-to-audio-pipeline - How the conversion process works
- 04-performance-optimization - GPU utilization tips for RTX 3050
- 05-cli-reference - Command-line interface options
- 06-models - Information about the Qwen3-TTS models used
- 07-integration-guide - Steps for integrating with other applications
- 09-references - External resources and model sources