External Resources
Core Dependencies
PyTorch
- Official Documentation: https://pytorch.org/docs/stable/
- GitHub Repository: https://github.com/pytorch/pytorch
- Installation Guide: https://pytorch.org/get-started/locally/
- Previous Versions (CUDA-specific builds): https://pytorch.org/get-started/previous-versions/
Qwen3-TTS Models
- Base Model (Voice Cloning): https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-Base
- CustomVoice Model (Preset Voices): https://huggingface.co/Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice
- Qwen Series Overview: https://huggingface.co/Qwen
Essential Libraries
- transformers: https://huggingface.co/docs/transformers/index
- torchaudio: https://pytorch.org/audio/stable/index.html
- soundfile: https://pysoundfile.readthedocs.io/
- numpy: https://numpy.org/doc/
- tqdm: https://tqdm.github.io/
- accelerate: https://huggingface.co/docs/accelerate/index
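A quick way to confirm that the libraries listed above are installed is to probe for them without importing them. The following is a minimal sketch, not part of the project itself: the `REQUIRED` list and the `missing_packages` helper are hypothetical names introduced here for illustration, and the list assumes each library is imported under its usual package name (`torch` for PyTorch).

```python
from importlib.util import find_spec

# Import names for PyTorch and the essential libraries listed above.
# This list is an assumption for illustration; adjust it to the project's needs.
REQUIRED = [
    "torch",
    "transformers",
    "torchaudio",
    "soundfile",
    "numpy",
    "tqdm",
    "accelerate",
]


def missing_packages(names):
    """Return the names that cannot be located on the current Python path."""
    # find_spec looks a top-level module up without importing it,
    # and returns None when the module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Because `find_spec` only locates a module, the check stays fast even for heavy packages such as PyTorch. It does not verify versions or CUDA availability.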
Technical References
Attention Mechanisms
- Scaled Dot Product Attention (SDPA): https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html
- Flash Attention: https://github.com/Dao-AILab/flash-attention
- TensorFloat-32 (TF32): https://developer.nvidia.com/blog/tensorfloat-32-precision-format/
Model Architecture
- Qwen Technical Report: https://arxiv.org/abs/2309.16609
- Text-to-Speech Survey: https://arxiv.org/abs/2106.06163
Optimization Resources
GPU Optimization
- NVIDIA CUDA Documentation: https://docs.nvidia.com/cuda/
- PyTorch Performance Tuning: https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html
- Ampere Architecture Whitepaper: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/ampere-architecture-whitepaper.pdf
Voice Cloning
- YourTTS Approach: https://arxiv.org/abs/2112.02418
- Voice Conversion Techniques: https://www.isca-speech.org/archive/Interspeech_2020/pdfs/3057.pdf
Deployment & Hosting
Firebase Hosting
- Documentation: https://firebase.google.com/docs/hosting
- CLI Reference: https://firebase.google.com/docs/cli
- Hosting Guides: https://firebase.google.com/docs/hosting/quickstart
Web Audio API
- MDN Web Docs: https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
- HTML5 Audio Element: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio
Community & Tutorials
TTS Communities
- r/MachineLearning TTS Discussions: https://www.reddit.com/r/MachineLearning/
- Hugging Face TTS Spaces: https://huggingface.co/spaces?sort=likes&search=tts
Example Projects
- Coqui TTS: https://github.com/coqui-ai/TTS
- Bark (Suno): https://github.com/suno-ai/bark
- Tortoise-TTS: https://github.com/neonbjb/tortoise-tts