Setup Guide

System Requirements

OS: Windows 10/11 (WSL2 also supported but native Windows recommended for GPU access)
GPU: NVIDIA RTX 30xx series (tested on RTX 3050) or newer with CUDA support
VRAM: Minimum 6 GB (RTX 3050 6GB variant works)
RAM: 16 GB recommended
Python: 3.8+ (tested with Python 3.10)

Installation Steps

1. Clone the Repository

git clone https://github.com/yourusername/qwen3-tts-gpu-suite.git
cd qwen3-tts-gpu-suite

2. Create Virtual Environment (Recommended)

python -m venv tts
.\tts\Scripts\activate
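
To confirm that the environment is active and the interpreter meets the Python 3.8+ requirement, a small check like the one below can help. This is only a sketch: the helper names python_ok and in_virtualenv are illustrative and not part of the repository.

```python
import sys

# Minimum version taken from the System Requirements section.
MIN_PYTHON = (3, 8)


def python_ok(version=sys.version_info) -> bool:
    """Return True if the given version tuple is at least MIN_PYTHON."""
    return tuple(version[:2]) >= MIN_PYTHON


def in_virtualenv() -> bool:
    """Return True when running inside a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base interpreter.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK: {python_ok()}")
    print(f"Virtual environment active: {in_virtualenv()}")
```

If the second line prints False, the activate script has probably not been run in the current shell.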

3. Install Dependencies

pip install -r requirements.txt

4. Verify GPU Installation

Run the diagnostics command to confirm PyTorch can access your GPU:

python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"No GPU found\"}')"

Expected output (the version string and GPU name will vary with your installation):

PyTorch version: 2.3.0
CUDA available: True
GPU: NVIDIA GeForce RTX 3050
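
The same check can be written as a short script that also reports total VRAM against the 6 GB minimum listed above. This is a sketch under assumptions: the function name gpu_report and the returned dictionary layout are illustrative, and the VRAM figure is rounded, so treat the comparison as approximate.

```python
def gpu_report(min_vram_gb: float = 6.0) -> dict:
    """Collect basic PyTorch/CUDA facts without failing if torch is missing."""
    report = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_available": False,
        "gpu_name": None,
        "vram_gb": None,
        "meets_minimum": False,
    }
    try:
        import torch  # imported lazily so the script still runs without it
    except ImportError:
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        report["cuda_available"] = True
        report["gpu_name"] = props.name
        # total_memory is in bytes; convert to GiB and round to one decimal.
        report["vram_gb"] = round(props.total_memory / 1024**3, 1)
        report["meets_minimum"] = report["vram_gb"] >= min_vram_gb
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False even though an NVIDIA GPU is installed, see the Troubleshooting section.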

5. Download Model Weights

The model weights are not included in the repository due to size. You need to download them separately:

Option A: Voice Cloning Model (Base)

Download Qwen3-TTS-12Hz-0.6B-Base from Hugging Face
Place the folder in the project root: ./Qwen3-TTS-12Hz-0.6B-Base/

Option B: Preset Voices Model (CustomVoice)

Download Qwen3-TTS-12Hz-0.6B-CustomVoice from Hugging Face
Place the folder in the project root: ./Qwen3-TTS-12Hz-0.6B-CustomVoice/
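
If you prefer to script the download, the huggingface_hub package can fetch a model folder straight into the project root. The sketch below makes several assumptions: that huggingface_hub is installed (pip install huggingface_hub), that the models are published under the "Qwen" organization (confirm the exact repository IDs on Hugging Face), and the helper names target_dir and download are illustrative only.

```python
from pathlib import Path

# Assumption: the weights are hosted under this Hugging Face organization.
HF_ORG = "Qwen"

# Folder names as given in Option A and Option B above.
MODELS = {
    "base": "Qwen3-TTS-12Hz-0.6B-Base",
    "custom": "Qwen3-TTS-12Hz-0.6B-CustomVoice",
}


def target_dir(variant: str, project_root: str = ".") -> Path:
    """Return the folder inside the project root where the model should live."""
    return Path(project_root) / MODELS[variant]


def download(variant: str, project_root: str = ".") -> Path:
    """Download one model variant into the project root (requires network)."""
    from huggingface_hub import snapshot_download  # lazy import

    dest = target_dir(variant, project_root)
    snapshot_download(repo_id=f"{HF_ORG}/{MODELS[variant]}", local_dir=dest)
    return dest
```

Calling download("base") or download("custom") from the project root should leave the folder in the location the steps above describe.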

Troubleshooting

"CUDA not available" Error

Ensure you have the latest NVIDIA drivers installed
Verify CUDA toolkit is compatible with your PyTorch version
Try reinstalling PyTorch with CUDA support (the cu118 index provides CUDA 11.8 builds): pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

"Out of Memory" Error

Reduce the batch size with --batch-size; 1 is the smallest value and is already the default for custom voice
Lower --max-chars (default 150 for custom voice) so that fewer tokens are generated at a time
Close other GPU-intensive applications
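
To see why a lower --max-chars reduces memory pressure, it helps to picture the input being cut into pieces that are each synthesized separately. The sketch below is a hypothetical illustration of that idea, not the repository's actual chunking logic; the function name split_into_chunks and the sentence-boundary rule are assumptions.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 150) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than max_chars is cut at fixed width.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("One. Two. Three.", max_chars=10):
        print(repr(chunk))
```

Smaller chunks mean shorter generated sequences per call, which is the usual reason peak VRAM use drops.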

Device-Side Assert Errors

Often caused by special characters in text that crash the tokenizer
The custom script includes ASCII encoding to strip problematic characters
Ensure your input text doesn’t contain unsupported Unicode sequences
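
If you want to pre-clean text yourself before passing it to the script, one common approach is to map typographic punctuation to plain equivalents and then drop anything that still is not ASCII. The sketch below is an assumption about how such stripping could be done, not the repository's implementation; the function name to_ascii and the replacement table are illustrative.

```python
import unicodedata

# Illustrative mapping of common typographic characters to ASCII.
_REPLACEMENTS = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2026": "...", # ellipsis
}


def to_ascii(text: str) -> str:
    """Return an ASCII-only version of text."""
    for src, dst in _REPLACEMENTS.items():
        text = text.replace(src, dst)
    # NFKD splits accented letters into base letter + combining mark,
    # so the base letter survives when non-ASCII bytes are ignored.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("caf\u00e9 \u201chi\u201d \u2014 ok\u2026"))
```

Note that characters with no ASCII equivalent, such as emoji, are removed entirely, so review the cleaned text if the meaning matters.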

Verification

Try the example commands from the README to confirm everything works:

python md_to_audio_custom.py sample.md --speaker "serena" --instruct "Excited tone"

See 08-directory-structure for complete file organization.
