Setup Guide
System Requirements
- OS: Windows 10/11 (WSL2 also supported but native Windows recommended for GPU access)
- GPU: NVIDIA RTX 30xx series (tested on RTX 3050) or newer with CUDA support
- VRAM: Minimum 6 GB (RTX 3050 6GB variant works)
- RAM: 16 GB recommended
- Python: 3.8+ (tested with Python 3.10)
Installation Steps
1. Clone the Repository
git clone https://github.com/yourusername/qwen3-tts-gpu-suite.git
cd qwen3-tts-gpu-suite
2. Create Virtual Environment (Recommended)
python -m venv tts
.\tts\Scripts\activate
3. Install Dependencies
pip install -r requirements.txt
4. Verify GPU Installation
Run the diagnostics command to confirm PyTorch can access your GPU:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"No GPU found\"}')"
Expected output:
PyTorch version: 2.3.0
CUDA available: True
GPU: NVIDIA GeForce RTX 3050
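The same check can be kept as a small script, which is easier to rerun than the one-liner. The following is a minimal sketch, not part of the repository: the file name check_gpu.py, the helper names, and the 0.25 GB tolerance are assumptions made for illustration (cards often report slightly less memory than their nominal size). The 6 GB threshold comes from the System Requirements list.

```python
"""Hypothetical check_gpu.py: a longer form of the one-line diagnostic."""
import sys

MIN_VRAM_GB = 6.0          # from the System Requirements list
VRAM_TOLERANCE_GB = 0.25   # assumption: cards may report slightly under nominal


def vram_ok(total_bytes, min_gb=MIN_VRAM_GB, tolerance_gb=VRAM_TOLERANCE_GB):
    """Return True if the reported VRAM meets the minimum, within a tolerance."""
    total_gb = total_bytes / 1024**3
    return total_gb >= min_gb - tolerance_gb


def report():
    """Collect the diagnostic lines as a list of strings."""
    lines = [f"Python version: {sys.version_info.major}.{sys.version_info.minor}"]
    try:
        import torch
    except ImportError:
        lines.append("PyTorch is not installed in this environment")
        return lines
    lines.append(f"PyTorch version: {torch.__version__}")
    lines.append(f"CUDA available: {torch.cuda.is_available()}")
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        lines.append(f"GPU: {props.name}")
        lines.append(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
        lines.append(f"Meets {MIN_VRAM_GB:.0f} GB minimum: {vram_ok(props.total_memory)}")
    else:
        lines.append("GPU: No GPU found")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Run it inside the activated virtual environment so that it sees the same PyTorch install as the project scripts.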
5. Download Model Weights
The model weights are not included in the repository because of their size, so download them separately:
Option A: Voice Cloning Model (Base)
- Download Qwen3-TTS-12Hz-0.6B-Base from Hugging Face
- Place the folder in the project root:
./Qwen3-TTS-12Hz-0.6B-Base/
Option B: Preset Voices Model (CustomVoice)
- Download Qwen3-TTS-12Hz-0.6B-CustomVoice from Hugging Face
- Place the folder in the project root:
./Qwen3-TTS-12Hz-0.6B-CustomVoice/
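After downloading, it can help to confirm that the folders ended up where the scripts expect them. The sketch below only checks that each folder exists in the project root and is not empty. The helper name find_models and the labels are illustrative assumptions, and the check does not validate the files inside the folders.

```python
from pathlib import Path

# Folder names taken from Options A and B above.
MODEL_DIRS = {
    "Base (voice cloning)": "Qwen3-TTS-12Hz-0.6B-Base",
    "CustomVoice (preset voices)": "Qwen3-TTS-12Hz-0.6B-CustomVoice",
}


def find_models(project_root="."):
    """Map each model label to True if its folder exists and is non-empty."""
    root = Path(project_root)
    found = {}
    for label, name in MODEL_DIRS.items():
        folder = root / name
        found[label] = folder.is_dir() and any(folder.iterdir())
    return found


if __name__ == "__main__":
    for label, present in find_models().items():
        print(f"{label}: {'found' if present else 'missing'}")
```

Only the model you plan to use needs to be present, so one "missing" line is expected if you downloaded a single option.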
Troubleshooting
“CUDA not available” Error
- Ensure you have the latest NVIDIA drivers installed
- Verify CUDA toolkit is compatible with your PyTorch version
- Try reinstalling PyTorch with CUDA:
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
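Before reinstalling, it may be worth finding out which of the causes above applies. A CPU-only PyTorch build reports torch.version.cuda as None, whereas a CUDA build that cannot see the GPU usually points to a driver problem. The sketch below encodes that distinction. The function name diagnose and its messages are illustrative assumptions, not output from the project.

```python
def diagnose(cuda_build, cuda_available):
    """Suggest a likely cause from two facts about the PyTorch install.

    cuda_build: value of torch.version.cuda (None for CPU-only builds).
    cuda_available: value of torch.cuda.is_available().
    """
    if cuda_available:
        return "OK: PyTorch can see a CUDA device."
    if cuda_build is None:
        return "CPU-only PyTorch build: reinstall from a CUDA wheel index."
    return (f"PyTorch was built for CUDA {cuda_build} but no device is visible: "
            "check the NVIDIA driver.")


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        print(diagnose(torch.version.cuda, torch.cuda.is_available()))
```

If the result is the CPU-only case, the pip command above is the relevant fix. In the other case, updating the driver is more likely to help than reinstalling PyTorch.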
“Out of Memory” Error
- Reduce the batch size, down to --batch-size 1 (the default for custom voice)
- Decrease max tokens by lowering --max-chars (default 150 for custom voice)
- Close other GPU-intensive applications
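To see why a lower --max-chars value reduces memory use, consider how a character limit could bound the amount of text sent to the model at once. The sketch below is one plausible way to apply such a limit, splitting at word boundaries. It is not the project's actual implementation, and split_text is a hypothetical helper.

```python
def split_text(text, max_chars=150):
    """Split text into chunks of at most max_chars, breaking at spaces.

    Hypothetical helper; a single word longer than max_chars is hard-split.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks, current = [], ""
    for word in text.split():
        while len(word) > max_chars:       # hard-split oversized words
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Shorter chunks mean fewer tokens per generation call, which in turn lowers peak VRAM use at the cost of more calls.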
Device-Side Assert Errors
- Often caused by special characters in text that crash the tokenizer
- The custom script includes ASCII encoding to strip problematic characters
- Ensure your input text doesn’t contain unsupported unicode sequences
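If you want to pre-clean input text yourself, ASCII stripping of the kind described above can be sketched in a few lines. This is an assumption about the general technique, not a copy of what the custom script does. The helper name to_ascii is hypothetical, and the NFKD normalization step is an addition that keeps the base letter of accented characters.

```python
import unicodedata


def to_ascii(text):
    """Return text with non-ASCII characters folded or dropped."""
    # NFKD splits accented letters into a base letter plus combining marks,
    # so "é" keeps its "e" when the marks are dropped below.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Note that characters with no ASCII equivalent, such as curly apostrophes or emoji, are removed entirely, so "don’t" becomes "dont". Replace them with ASCII equivalents beforehand if the pronunciation matters.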
Verification
Try the example commands from the README to confirm everything works:
python md_to_audio_custom.py sample.md --speaker "serena" --instruct "Excited tone"
See 08-directory-structure for complete file organization.