Setup Guide

System Requirements

  • OS: Windows 10/11 (WSL2 also supported but native Windows recommended for GPU access)
  • GPU: NVIDIA RTX 30xx series (tested on RTX 3050) or newer with CUDA support
  • VRAM: Minimum 6 GB (RTX 3050 6GB variant works)
  • RAM: 16 GB recommended
  • Python: 3.8+ (tested with Python 3.10)

Installation Steps

1. Clone the Repository

git clone https://github.com/yourusername/qwen3-tts-gpu-suite.git
cd qwen3-tts-gpu-suite

2. Create and Activate a Virtual Environment

python -m venv tts
.\tts\Scripts\activate
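
Before installing dependencies, it can help to confirm that the virtual environment is actually active and that the interpreter meets the Python 3.8+ requirement. The following is a minimal sketch, not a script shipped with the repository; the function names are hypothetical.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version from the System Requirements list


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True when a (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= tuple(minimum)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor} meets minimum: {python_ok()}")
    print(f"Virtual environment active: {in_virtualenv()}")
```

If the second line prints False, run the activate command again in the same terminal before continuing.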

3. Install Dependencies

pip install -r requirements.txt

4. Verify GPU Installation

Run the diagnostics command to confirm PyTorch can access your GPU:

python -c "import torch; print('PyTorch version:', torch.__version__); print('CUDA available:', torch.cuda.is_available()); print('GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU found')"

Expected output (your version string and GPU name may differ):

PyTorch version: 2.3.0
CUDA available: True
GPU: NVIDIA GeForce RTX 3050
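
To also compare the card's memory against the 6 GB minimum from the System Requirements, a slightly longer check could look like the sketch below. It is an illustration only: the helper names and the small tolerance are assumptions, added because a card may report slightly less memory than its nominal size.

```python
MIN_VRAM_GIB = 6.0    # minimum from the System Requirements list
TOLERANCE_GIB = 0.25  # assumption: allow for cards reporting under nominal size


def bytes_to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 1024**3


def vram_ok(total_bytes, min_gib=MIN_VRAM_GIB, tolerance_gib=TOLERANCE_GIB):
    """Return True when the reported memory meets the minimum, within tolerance."""
    return bytes_to_gib(total_bytes) >= min_gib - tolerance_gib


def report():
    """Print the name and VRAM of GPU 0, if PyTorch can see one."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available; see Troubleshooting")
        return
    props = torch.cuda.get_device_properties(0)
    status = "OK" if vram_ok(props.total_memory) else "below the 6 GB minimum"
    print(f"{props.name}: {bytes_to_gib(props.total_memory):.1f} GiB VRAM ({status})")


if __name__ == "__main__":
    report()
```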

5. Download Model Weights

The model weights are not included in the repository due to size. You need to download them separately:

Option A: Voice Cloning Model (Base)

  • Download Qwen3-TTS-12Hz-0.6B-Base from Hugging Face
  • Place the folder in the project root: ./Qwen3-TTS-12Hz-0.6B-Base/

Option B: Preset Voices Model (CustomVoice)

  • Download Qwen3-TTS-12Hz-0.6B-CustomVoice from Hugging Face
  • Place the folder in the project root: ./Qwen3-TTS-12Hz-0.6B-CustomVoice/
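
Once the download finishes, a quick check that the folders sit where the scripts expect them can save a confusing error later. The sketch below assumes only the two folder names listed above; the function name and the "non-empty folder" test are hypothetical conveniences, not part of the repository.

```python
from pathlib import Path

# Folder names taken from Options A and B above.
MODEL_DIRS = {
    "base": "Qwen3-TTS-12Hz-0.6B-Base",
    "customvoice": "Qwen3-TTS-12Hz-0.6B-CustomVoice",
}


def find_models(project_root="."):
    """Map each model key to True if its folder exists and is non-empty."""
    root = Path(project_root)
    found = {}
    for key, name in MODEL_DIRS.items():
        folder = root / name
        found[key] = folder.is_dir() and any(folder.iterdir())
    return found


if __name__ == "__main__":
    for key, present in find_models().items():
        state = "found" if present else "missing"
        print(f"{MODEL_DIRS[key]}: {state}")
```

Run it from the project root; only the model you intend to use needs to be reported as found.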

Troubleshooting

"CUDA not available" Error

  1. Ensure you have the latest NVIDIA drivers installed
  2. Verify CUDA toolkit is compatible with your PyTorch version
  3. Try reinstalling PyTorch with CUDA: pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
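
To narrow down which of the steps above applies, it helps to know whether the installed PyTorch is a CPU-only build. The sketch below relies on torch.version.cuda, which is None for CPU-only builds; the diagnose helper and its messages are hypothetical.

```python
def diagnose(cuda_build, cuda_available):
    """Suggest a next step from the build's CUDA version and GPU visibility."""
    if cuda_available:
        return "ok: PyTorch can see the GPU"
    if cuda_build is None:
        # A CPU-only wheel was installed, so no driver change will help.
        return "cpu-only build: reinstall PyTorch from a CUDA wheel index"
    return f"CUDA {cuda_build} build but no usable GPU: check the NVIDIA driver"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(diagnose(torch.version.cuda, torch.cuda.is_available()))
```

A "cpu-only build" result points to step 3; a CUDA build that still sees no GPU points to steps 1 and 2.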

"Out of Memory" Error

  • Reduce the batch size to its minimum: --batch-size 1 (already the default for custom voice)
  • Lower --max-chars (default 150 for custom voice) so that each chunk produces fewer tokens
  • Close other GPU-intensive applications
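
To see why a lower --max-chars reduces memory pressure, consider how a character limit splits the input into smaller pieces that are synthesized one at a time. The sketch below is an assumption about the general technique, not the repository's actual chunking code; chunk_text is a hypothetical name.

```python
def chunk_text(text, max_chars=150):
    """Split text on whitespace into chunks of at most max_chars characters."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks = []
    current = ""
    for word in text.split():
        # Hard-split any single word that is longer than the limit.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog. " * 10
    for limit in (150, 75):
        sizes = [len(c) for c in chunk_text(sample, limit)]
        print(f"max_chars={limit}: {len(sizes)} chunks, longest {max(sizes)}")
```

Halving the limit roughly doubles the number of chunks while capping the length of each one, which is what bounds the peak memory per generation call.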

Device-Side Assert Errors

  • Often caused by special characters in text that crash the tokenizer
  • The custom script includes ASCII encoding to strip problematic characters
  • Ensure your input text doesn’t contain unsupported Unicode sequences
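
If you pre-process text yourself, an ASCII clean-up along the following lines is one way to remove such characters before they reach the tokenizer. This is a sketch of the general approach, not the code used by the custom script; to_ascii and the punctuation table are assumptions.

```python
import unicodedata

# Assumed replacements for common typographic punctuation; extend as needed.
PUNCTUATION = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # ellipsis
})


def to_ascii(text):
    """Return text with non-ASCII characters replaced or dropped."""
    text = text.translate(PUNCTUATION)
    # NFKD splits accented letters into a base letter plus combining marks,
    # so the base letter survives the ASCII encode below.
    text = unicodedata.normalize("NFKD", text)
    return text.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("caf\u00e9"))  # cafe
```

Characters with no ASCII equivalent, such as emoji, are dropped silently, so review the cleaned text if exact wording matters.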

Verification

Run an example command from the README to confirm everything works:

python md_to_audio_custom.py sample.md --speaker "serena" --instruct "Excited tone"

See 08-directory-structure for complete file organization.