Integration Guide
Overview
This guide explains how to integrate the Qwen3-TTS GPU Suite into web platforms, specifically addressing your use case of adding voice summaries to https://project-breakdown101.web.app/.
Recommended Architecture
For web applications, we recommend a pre-generation approach rather than real-time audio generation due to:
- Latency Concerns: Even when optimized, TTS takes 15-30 seconds per chunk on an RTX 3050
- User Experience: Users shouldn’t wait for audio generation
- Cost Efficiency: Generate once, serve many times
- Reliability: Avoid dependency on user’s GPU capabilities
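To make the latency point concrete, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions: the 150-character chunk size comes from the `--max-chars 150` setting used in this guide's generation script, the 15-30 second range is the per-chunk figure quoted above, and the function name `estimate_generation_seconds` is hypothetical. Real chunking may split at sentence boundaries and produce more chunks than this.

```python
import math


def estimate_generation_seconds(num_chars, max_chars=150, sec_per_chunk=(15, 30)):
    """Return a (low, high) estimate in seconds for synthesizing num_chars of text.

    Assumes every chunk is filled to max_chars, which is the best case.
    """
    chunks = math.ceil(num_chars / max_chars)
    low, high = sec_per_chunk
    return chunks * low, chunks * high


# A 3,000-character page is 20 chunks, so roughly 5 to 10 minutes of generation.
low, high = estimate_generation_seconds(3000)
print(f"{low / 60:.0f}-{high / 60:.0f} minutes")
```

Even a short page lands in the range of minutes, which is why generation belongs in a build step rather than in the request path.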
Recommended Flow
Content Creation → Local Audio Generation → Upload to Firebase/CDN → Web App Consumption
Step-by-Step Integration for Firebase/Web Apps
1. Local Generation Script
Create a batch script to generate audio for all your documentation files:
# generate-audio.ps1
$docsDir = "./docs"
$audioDir = "./public/audio"
$modelPath = "./Qwen3-TTS-12Hz-0.6B-CustomVoice"
# Ensure audio directory exists
New-Item -ItemType Directory -Force -Path $audioDir | Out-Null
# Resolve the docs root to an absolute path so it can be stripped from each file's full path
$docsRoot = (Resolve-Path $docsDir).Path
# Process all markdown files
Get-ChildItem -Path $docsRoot -Filter *.md -Recurse | ForEach-Object {
    $mdFile = $_.FullName
    # Path relative to the docs root, e.g. "guides\advanced-usage.md"
    $relativePath = $mdFile.Substring($docsRoot.Length).TrimStart('\', '/')
    # Flatten nested paths into hyphenated names, e.g. "guides-advanced-usage.wav"
    $audioName = ($relativePath -replace '[\\/]', '-') -replace '\.md$', '.wav'
    $audioFile = Join-Path $audioDir $audioName
    # Generate audio with optimized settings for RTX 3050
    Write-Host "Generating audio for $mdFile -> $audioFile"
    python md_to_audio_custom.py $mdFile `
        --output $audioFile `
        --speaker "Serena" `
        --instruct "Speak clearly and professionally for technical documentation" `
        --batch-size 1 `
        --max-chars 150 `
        --dtype "bfloat16"
    # Check the exit code as well as the file, so a stale file from an earlier run does not count as success
    if ($LASTEXITCODE -eq 0 -and (Test-Path $audioFile)) {
        Write-Host "✓ Successfully generated: $audioFile"
    } else {
        Write-Host "✗ Failed to generate audio for $mdFile"
    }
}
2. Firebase Hosting Setup
Place generated audio files in your Firebase public directory:
firebase-project/
├── public/
│ ├── index.html
│ ├── audio/
│ │ ├── getting-started.wav
│ │ ├── api-reference.wav
│ │ └── ...
│ ├── assets/
│ └── 404.html
├── firebase.json
└── .firebaserc
3. Web Integration Example
Add audio playback controls to your documentation pages:
<!-- In your documentation template -->
<audio id="doc-audio" preload="none">
    <source src="/audio/current-page.wav" type="audio/wav">
    Your browser does not support the audio element.
</audio>
<button onclick="toggleAudio()" id="audio-btn">
    🔊 Listen to Summary
</button>
<script>
    let audioPlayer = null;
    function toggleAudio() {
        const btn = document.getElementById('audio-btn');
        // Map the page URL to its audio file, e.g. /docs/api/reference.html -> /audio/api-reference.wav.
        // Assumes pages are served under /docs/; adjust the prefix to match your site.
        const slug = window.location.pathname
            .replace(/^\/(docs\/)?/, '')
            .replace(/\.html$/, '')
            .replace(/\//g, '-');
        const audioSrc = `/audio/${slug}.wav`;
        if (!audioPlayer) {
            audioPlayer = new Audio(audioSrc);
            audioPlayer.onended = () => {
                btn.textContent = '🔊 Listen to Summary';
                btn.disabled = false;
            };
        }
        if (audioPlayer.paused) {
            // Disable the button while the audio loads and playback starts
            btn.textContent = '⏳ Loading...';
            btn.disabled = true;
            audioPlayer.play().then(() => {
                btn.textContent = '⏸️ Pause Audio';
                btn.disabled = false;
            }).catch(e => {
                console.error('Audio play failed:', e);
                btn.textContent = '🔊 Listen to Summary';
                btn.disabled = false;
            });
        } else {
            audioPlayer.pause();
            btn.textContent = '🔊 Listen to Summary';
            btn.disabled = false;
        }
    }
</script>
4. Build Process Integration
Add audio generation to your static site build process:
For GitHub Actions:
name: Build and Deploy
on:
  push:
    branches: [ main ]
jobs:
  build:
    # Standard GitHub-hosted runners have no GPU. For GPU generation,
    # point this at a self-hosted runner on a GPU machine.
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install Dependencies
        run: |
          pip install -r requirements.txt
      - name: Download Models
        run: |
          # Download models using git lfs or huggingface_hub
          pip install huggingface_hub
          python -c "from huggingface_hub import snapshot_download; snapshot_download('Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice', local_dir='./Qwen3-TTS-12Hz-0.6B-CustomVoice')"
          python -c "from huggingface_hub import snapshot_download; snapshot_download('Qwen/Qwen3-TTS-12Hz-0.6B-Base', local_dir='./Qwen3-TTS-12Hz-0.6B-Base')"
      - name: Generate Audio
        run: |
          .\generate-audio.ps1
      - name: Build Website
        run: |
          # Your existing build commands (e.g., npm run build for Next.js)
          # or just copy files if using plain HTML
      # Note: Docker container actions such as this one require a Linux runner.
      # On a Windows runner, deploy from a separate Linux job or call the Firebase CLI directly.
      - name: Deploy to Firebase
        uses: w9jds/firebase-action@v8
        with:
          args: deploy --only hosting
5. Alternative: Cloud Function Approach
If you prefer on-demand generation (not recommended for RTX 3050 due to latency):
- Deploy to a GPU-enabled cloud service (AWS G4/G5 instances, Azure NC/ND series, Google Cloud A2/A3)
- Create an API endpoint that accepts text and returns audio
- Your web app calls this API when users request audio
However, for your specific use case with an RTX 3050 laptop, the pre-generation approach is strongly recommended.
Optimization for Your Documentation Site
Audio File Naming Convention
Use a consistent mapping between documentation pages and audio files:
- /docs/getting-started.md → /audio/getting-started.wav
- /docs/api/reference.md → /audio/api-reference.wav
- Handle nested paths: /docs/guides/advanced-usage.md → /audio/guides-advanced-usage.wav
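The mapping above can be expressed as a small helper, which is useful for keeping the build script and the web page in agreement. This is a minimal sketch: the function name `audio_url_for` and its default roots are illustrative assumptions, not part of the Qwen3-TTS GPU Suite.

```python
from pathlib import PurePosixPath


def audio_url_for(doc_path, docs_root="/docs", audio_root="/audio"):
    """Map a documentation path to its flattened audio URL.

    Nested directories are joined with hyphens, matching the convention above.
    """
    rel = PurePosixPath(doc_path).relative_to(docs_root).with_suffix("")
    return f"{audio_root}/{'-'.join(rel.parts)}.wav"


print(audio_url_for("/docs/getting-started.md"))
print(audio_url_for("/docs/api/reference.md"))
print(audio_url_for("/docs/guides/advanced-usage.md"))
```

One caveat of flattening: `/docs/api/reference.md` and `/docs/api-reference.md` would map to the same audio file, so avoid having both.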
Cache Control
Set appropriate Firebase hosting headers for audio files:
{
  "hosting": {
    "public": "public",
    "headers": [
      {
        "source": "/audio/**",
        "headers": [
          {
            "key": "Cache-Control",
            "value": "public, max-age=31536000, immutable"
          }
        ]
      }
    ]
  }
}
Audio Quality Settings
For documentation voice-overs, these settings are a good starting point:
- Speaker: Serena or Vivian (clear, professional female voices)
- Instruct: “Speak clearly and professionally for technical documentation”
- Sample Rate: 24kHz (native output quality)
- Format: WAV (lossless) or convert to Opus/Ogg for web efficiency
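To judge whether converting to a compressed format is worthwhile, it helps to estimate how large the WAV files will be. The sketch below assumes 16-bit mono PCM at the 24kHz sample rate listed above; the bit depth and channel count are assumptions, so check the actual output of your generation run. The function name `wav_size_bytes` is illustrative.

```python
def wav_size_bytes(duration_s, sample_rate=24_000, channels=1, bytes_per_sample=2):
    """Estimate the size of an uncompressed PCM WAV file.

    Assumes 16-bit mono by default and a canonical 44-byte header.
    """
    header = 44
    return header + int(duration_s * sample_rate * channels * bytes_per_sample)


# One minute of speech is about 2.9 MB under these assumptions.
print(f"{wav_size_bytes(60) / 1_000_000:.1f} MB per minute")
```

At roughly 2.9 MB per minute, a few minutes of narration per page adds up quickly across a documentation site, which is the main argument for a compressed web format.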
Fallback Strategy
Always provide a fallback for browsers that don’t support audio playback:
<div class="audio-container">
    <audio controls preload="metadata">
        <source src="/audio/page.wav" type="audio/wav">
        <a href="/audio/page.wav" download>Download audio summary (WAV, 5MB)</a>
    </audio>
    <p class="audio-hint">Click play to listen to a spoken summary of this section</p>
</div>
Maintenance Considerations
Regenerating Audio
When documentation changes:
- Edit the markdown file
- Run the generation script again
- Redeploy to Firebase
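- Make sure the audio URL changes when the content does (for example, by putting a content hash in the filename). With the one-year immutable Cache-Control header recommended in this guide, browsers will not refetch a file whose URL is unchanged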
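The generation script in this guide does not produce content-hashed filenames on its own. One way to derive them is sketched below; the function name `hashed_audio_name` and the 8-character digest length are illustrative choices, not part of the suite.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_audio_name(filename, audio_bytes, digest_len=8):
    """Insert a short SHA-256 digest of the audio bytes before the extension.

    The name changes whenever the audio content changes, so a long-lived
    cache entry for the old name is simply never requested again.
    """
    digest = hashlib.sha256(audio_bytes).hexdigest()[:digest_len]
    name = PurePosixPath(filename)
    return f"{name.stem}.{digest}{name.suffix}"


print(hashed_audio_name("getting-started.wav", b"example audio bytes"))
```

With hashed names, the page can no longer guess the audio URL from its own path, so the build would also need to emit a small lookup file (page to audio name) for the player to read.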
Monitoring
Track:
- Audio generation success rate in your build logs
- Storage usage of audio files in Firebase
- Download bandwidth if you have many audio downloads
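The first two items can be checked with a short script run at the end of the build. This is a sketch that assumes the flattened naming convention described in this guide (nested paths joined with hyphens); the function name `audio_report` is hypothetical.

```python
from pathlib import Path


def audio_report(docs_dir, audio_dir):
    """Return (missing, total_bytes) for the generated audio.

    missing lists markdown files (relative to docs_dir) with no matching WAV;
    total_bytes is the combined size of the WAV files that were found.
    """
    docs_dir, audio_dir = Path(docs_dir), Path(audio_dir)
    missing, total_bytes = [], 0
    for md in sorted(docs_dir.rglob("*.md")):
        rel = md.relative_to(docs_dir)
        # Assumed convention: guides/advanced-usage.md -> guides-advanced-usage.wav
        wav = audio_dir / ("-".join(rel.with_suffix("").parts) + ".wav")
        if wav.is_file():
            total_bytes += wav.stat().st_size
        else:
            missing.append(rel.as_posix())
    return missing, total_bytes
```

For example, `audio_report("./docs", "./public/audio")` returns the pages that still lack audio and the total storage used, which can be printed to the build log or used to fail the build.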
Scaling Tips
If you expand beyond your RTX 3050 laptop:
- Multiple Machines: Distribute markdown files across several GPU-equipped machines
- Cloud Bursting: Use spot instances on AWS/Azure/GCP for large batches
- Incremental Generation: Only regenerate audio for files that have changed (based on git diff)
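Incremental generation can be sketched as follows: ask git which files changed, then keep only the markdown files under the docs directory. The function names and the `HEAD~1` default are illustrative assumptions; the git flags used (`--name-only`, `--diff-filter=ACMR`) are standard and exclude deleted files.

```python
import subprocess


def changed_markdown(diff_output, docs_prefix="docs/"):
    """Filter `git diff --name-only` output down to markdown files under docs_prefix."""
    paths = (line.strip() for line in diff_output.splitlines())
    return [p for p in paths if p.startswith(docs_prefix) and p.endswith(".md")]


def git_changed_files(base="HEAD~1"):
    """Return the raw list of added, copied, modified, or renamed files since base."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `changed_markdown(git_changed_files())` yields the files to pass to the generation script. In CI, note that `actions/checkout` fetches only the latest commit by default, so `HEAD~1` is available only if the checkout step requests more history.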
Troubleshooting Integration Issues
Audio Not Playing
- Check network tab in dev tools for 404 errors on audio files
- Verify file paths match between <source src=""> and actual file location
- Ensure Firebase hosting has deployed the audio files
- Check audio file isn’t corrupted (try playing locally)
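The corruption check in the last item can be automated with Python's standard `wave` module. This sketch assumes the generated files are PCM WAV, which is what `wave` reads; other encodings (such as 32-bit float) may be rejected even when the file is valid. The function name `check_wav` is illustrative.

```python
import wave


def check_wav(path):
    """Return the duration in seconds of a PCM WAV file.

    Raises wave.Error or EOFError if the header cannot be parsed,
    and ValueError if the file contains no audio frames.
    """
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    if frames == 0 or rate == 0:
        raise ValueError(f"{path} contains no audio")
    return frames / rate
```

Running this over every file in the audio directory before deploying catches truncated or empty output early, for instance after an interrupted generation run.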
Slow Initial Load
- Audio files might be large - consider compressing to Opus format for web
- Use preload="none" to avoid blocking page load
- Consider lazy-loading audio players when they enter viewport
Browser Compatibility
- WAV is widely supported, but consider providing MP3/Ogg fallbacks
- Test on Chrome, Firefox, Safari, and Edge
- Mobile browsers may have autoplay restrictions (user interaction required)
By following this pre-generation approach, you can provide high-quality voice summaries for your documentation site without making users wait for audio generation, while still leveraging your RTX 3050’s capabilities for the actual TTS processing.