Integration Guide

Overview

This guide explains how to integrate the Qwen3-TTS GPU Suite into web platforms, specifically addressing your use case of adding voice summaries to https://project-breakdown101.web.app/.

For web applications, we recommend a pre-generation approach rather than real-time audio generation due to:

  1. Latency Concerns: Even when optimized, TTS takes 15-30 seconds per chunk on an RTX 3050
  2. User Experience: Users shouldn’t wait for audio generation
  3. Cost Efficiency: Generate once, serve many times
  4. Reliability: Avoid dependency on the user’s GPU capabilities

The recommended workflow is:

Content Creation → Local Audio Generation → Upload to Firebase/CDN → Web App Consumption
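
To make the latency point concrete, here is a rough sketch that turns the 15-30 seconds-per-chunk figure into a per-document estimate. The function name is hypothetical, the 150-character chunk size mirrors the --max-chars value used by the generation script in this guide, and splitting purely by character count is an assumption; the real splitter may behave differently.

```python
import math

SECONDS_PER_CHUNK = (15, 30)  # range quoted in this guide for an RTX 3050


def estimate_generation_seconds(char_count, max_chars=150):
    """Return (low, high) estimated seconds to synthesize char_count characters.

    Assumes the text is split into chunks of at most max_chars characters
    (a simplification; the real splitter may break on sentence boundaries).
    """
    chunks = math.ceil(char_count / max_chars)
    low, high = SECONDS_PER_CHUNK
    return chunks * low, chunks * high


if __name__ == "__main__":
    low, high = estimate_generation_seconds(3000)  # a 3,000-character summary
    print(f"Estimated generation time: {low / 60:.1f} to {high / 60:.1f} minutes")
```

Under these assumptions even a short summary takes minutes to synthesize, which is why generating at build time is the better fit.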

Step-by-Step Integration for Firebase/Web Apps

1. Local Generation Script

Create a batch script to generate audio for all your documentation files:

# generate-audio.ps1
$docsDir = "./docs"
$audioDir = "./public/audio"
$modelPath = "./Qwen3-TTS-12Hz-0.6B-CustomVoice"
 
# Ensure audio directory exists
New-Item -ItemType Directory -Force -Path $audioDir | Out-Null
 
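# Resolve the docs root to an absolute path so it can be stripped from each file's FullName
$docsRoot = (Resolve-Path $docsDir).Path

# Process all markdown files
Get-ChildItem -Path $docsRoot -Filter *.md -Recurse | ForEach-Object {
    $mdFile = $_.FullName
    $relativePath = $_.FullName.Substring($docsRoot.Length).TrimStart('\', '/')
    # Flatten nested paths to match the naming convention (guides\advanced-usage.md -> guides-advanced-usage.wav)
    $audioName = ($relativePath -replace '[\\/]', '-') -replace '\.md$', '.wav'
    $audioFile = Join-Path $audioDir $audioName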
    
    # Ensure subdirectories exist
    $audioDirPath = Split-Path $audioFile -Parent
    if (-not (Test-Path $audioDirPath)) {
        New-Item -ItemType Directory -Force -Path $audioDirPath | Out-Null
    }
    
    # Generate audio with optimized settings for RTX 3050
    Write-Host "Generating audio for $mdFile -> $audioFile"
    python md_to_audio_custom.py $mdFile `
        --output $audioFile `
        --speaker "Serena" `
        --instruct "Speak clearly and professionally for technical documentation" `
        --batch-size 1 `
        --max-chars 150 `
        --dtype "bfloat16"
    
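    # Check the exit code as well, so a stale file from an earlier run is not reported as success
    if ($LASTEXITCODE -eq 0 -and (Test-Path $audioFile)) {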
        Write-Host "✓ Successfully generated: $audioFile"
    } else {
        Write-Host "✗ Failed to generate audio for $mdFile"
    }
}

2. Firebase Hosting Setup

Place generated audio files in your Firebase public directory:

firebase-project/
├── public/
│   ├── index.html
│   ├── audio/
│   │   ├── getting-started.wav
│   │   ├── api-reference.wav
│   │   └── ...
│   ├── assets/
│   └── 404.html
├── firebase.json
└── .firebaserc

3. Web Integration Example

Add audio playback controls to your documentation pages:

<!-- In your documentation template -->
<audio id="doc-audio" preload="none">
    <source src="/audio/current-page.wav" type="audio/wav">
    Your browser does not support the audio element.
</audio>
 
<button onclick="toggleAudio()" id="audio-btn">
    🔊 Listen to Summary
</button>
 
<script>
let audioPlayer = null;
 
function toggleAudio() {
    const btn = document.getElementById('audio-btn');
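    // Map the page URL to the flat audio name, e.g. /docs/api/reference.html -> /audio/api-reference.wav
    // (assumes pages are served under /docs/; adjust the prefix to match your routes)
    const slug = window.location.pathname
        .replace(/^\/(docs\/)?/, '')
        .replace(/\.html$/, '')
        .replace(/\//g, '-');
    const audioSrc = `/audio/${slug}.wav`;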
    
    if (!audioPlayer) {
        audioPlayer = new Audio(audioSrc);
        audioPlayer.onended = () => {
            btn.textContent = '🔊 Listen to Summary';
            btn.disabled = false;
        };
    }
    
    if (audioPlayer.paused) {
        btn.textContent = '⏳ Loading...';
        btn.disabled = true;
        audioPlayer.play().then(() => {
            btn.textContent = '⏸️ Pause Audio';
            btn.disabled = false;
        }).catch(e => {
            console.error('Audio play failed:', e);
            btn.textContent = '🔊 Listen to Summary';
            btn.disabled = false;
        });
    } else {
        audioPlayer.pause();
        btn.textContent = '🔊 Listen to Summary';
        btn.disabled = false;
    }
}
</script>

4. Build Process Integration

Add audio generation to your static site build process:

For GitHub Actions:

name: Build and Deploy
 
on:
  push:
    branches: [ main ]
 
jobs:
  build:
    runs-on: [self-hosted, windows]  # Standard GitHub-hosted runners (e.g. windows-latest) have no GPU; use a self-hosted runner on the GPU machine
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.10'
    
    - name: Install Dependencies
      run: |
        pip install -r requirements.txt
        
    - name: Download Models
      run: |
        # Download models using git lfs or huggingface_hub
        pip install huggingface_hub
        python -c "from huggingface_hub import snapshot_download; snapshot_download('Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice', local_dir='./Qwen3-TTS-12Hz-0.6B-CustomVoice')"
        python -c "from huggingface_hub import snapshot_download; snapshot_download('Qwen/Qwen3-TTS-12Hz-0.6B-Base', local_dir='./Qwen3-TTS-12Hz-0.6B-Base')"
    
    - name: Generate Audio
      run: |
        .\generate-audio.ps1
    
    - name: Build Website
      run: |
        # Your existing build commands (e.g., npm run build for Next.js)
        # or just copy files if using plain HTML
    
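    - name: Deploy to Firebase
      # Docker-based actions such as w9jds/firebase-action only run on Linux runners,
      # so call the Firebase CLI directly (requires Node.js on the runner).
      # FIREBASE_TOKEN is an assumed secret name; create it in your repository settings.
      run: npx firebase-tools deploy --only hosting
      env:
        FIREBASE_TOKEN: ${{ secrets.FIREBASE_TOKEN }}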

5. Alternative: Cloud Function Approach

If you prefer on-demand generation (not recommended for RTX 3050 due to latency):

  1. Deploy to a GPU-enabled cloud service (AWS G4/G5 instances, Azure NC/ND series, Google Cloud A2/A3)
  2. Create an API endpoint that accepts text and returns audio
  3. Your web app calls this API when users request audio

However, for your specific use case with an RTX 3050 laptop, the pre-generation approach is strongly recommended.

Optimization for Your Documentation Site

Audio File Naming Convention

Use a consistent mapping between documentation pages and audio files:

  • /docs/getting-started.md → /audio/getting-started.wav
  • /docs/api/reference.md → /audio/api-reference.wav
  • Handle nested paths: /docs/guides/advanced-usage.md → /audio/guides-advanced-usage.wav
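
The mapping above can be expressed as a small helper so that build scripts and templates agree on names. This is a minimal sketch: the function name and its default roots are illustrative, not part of the suite.

```python
from pathlib import PurePosixPath


def audio_path_for(doc_path, docs_root="/docs", audio_root="/audio"):
    """Map a documentation path to its flattened audio path.

    Directory separators below docs_root become hyphens, and the
    .md extension is replaced with .wav.
    """
    relative = PurePosixPath(doc_path).relative_to(docs_root)
    slug = "-".join(relative.with_suffix("").parts)
    return f"{audio_root}/{slug}.wav"


if __name__ == "__main__":
    for doc in ("/docs/getting-started.md", "/docs/guides/advanced-usage.md"):
        print(doc, "->", audio_path_for(doc))
```

Note that flattening can collide (for example, /docs/api/reference.md and /docs/api-reference.md map to the same file), so avoid such pairs in your docs tree.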

Cache Control

Set appropriate Firebase hosting headers for audio files:

{
  "hosting": {
    "public": "public",
    "headers": [
      {
        "source": "/audio/**",
        "headers": [
          {
            "key": "Cache-Control",
            "value": "public, max-age=31536000, immutable"
          }
        ]
      }
    ]
  }
}

Audio Quality Settings

For documentation voice-overs, consider these optimal settings:

  • Speaker: Serena or Vivian (clear, professional female voices)
  • Instruct: “Speak clearly and professionally for technical documentation”
  • Sample Rate: 24kHz (native output quality)
  • Format: WAV (lossless) or convert to Opus/Ogg for web efficiency
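
To judge whether WAV is acceptable for your pages, it helps to estimate file sizes up front. The sketch below assumes 16-bit mono PCM at the 24kHz sample rate listed above; the bit depth and channel count are assumptions, so check what your generated files actually contain.

```python
def wav_size_bytes(duration_seconds, sample_rate=24_000, bits_per_sample=16, channels=1):
    """Approximate size of an uncompressed PCM WAV file, including a 44-byte header."""
    bytes_per_second = sample_rate * (bits_per_sample // 8) * channels
    return 44 + int(duration_seconds * bytes_per_second)


if __name__ == "__main__":
    for minutes in (1, 2, 5):
        size_mb = wav_size_bytes(minutes * 60) / 1_000_000
        print(f"{minutes} min of 24 kHz audio is about {size_mb:.1f} MB as WAV")
```

At these assumed settings one minute of speech is roughly 2.9 MB, which is the main argument for converting to Opus before upload.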

Fallback Strategy

Always provide a fallback for browsers that don’t support audio playback:

<div class="audio-container">
    <audio controls preload="metadata">
        <source src="/audio/page.wav" type="audio/wav">
        <a href="/audio/page.wav" download>Download audio summary (WAV, 5MB)</a>
    </audio>
    <p class="audio-hint">Click play to listen to a spoken summary of this section</p>
</div>

Maintenance Considerations

Regenerating Audio

When documentation changes:

  1. Edit the markdown file
  2. Run the generation script again
  3. Redeploy to Firebase
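  4. Returning visitors receive the new audio only if its URL changes: the Cache-Control header in this guide marks /audio/** as immutable for a year, so either publish updated audio under a content-hashed filename or lower max-age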
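
One way to get content-hashed filenames is to derive a short digest from the audio bytes at build time. This is a sketch only: the helper name and the name.hash.wav pattern are assumptions, and your page template would also need to look up the hashed name (for example, from a manifest written by the build).

```python
import hashlib
from pathlib import Path


def hashed_audio_name(audio_file, digest_chars=8):
    """Return a name like 'getting-started.<digest>.wav' derived from the file's bytes.

    The digest changes whenever the audio content changes, so browsers
    treat the regenerated file as a new URL.
    """
    path = Path(audio_file)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_chars]
    return f"{path.stem}.{digest}{path.suffix}"


if __name__ == "__main__":
    audio_dir = Path("./public/audio")
    if audio_dir.is_dir():
        for wav in sorted(audio_dir.glob("*.wav")):
            print(wav.name, "->", hashed_audio_name(wav))
```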

Monitoring

Track:

  • Audio generation success rate in your build logs
  • Storage usage of audio files in Firebase
  • Download bandwidth if you have many audio downloads
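
For the storage figure, a short script run after generation can log totals alongside your build output. A minimal sketch, assuming audio lives under ./public/audio as in the generation script; the function name is illustrative.

```python
from pathlib import Path


def audio_storage_report(audio_dir, pattern="*.wav"):
    """Return (file_count, total_bytes) for audio files found under audio_dir."""
    files = [p for p in Path(audio_dir).rglob(pattern) if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    audio_dir = Path("./public/audio")
    if audio_dir.is_dir():
        count, total = audio_storage_report(audio_dir)
        print(f"{count} audio files, {total / 1_000_000:.1f} MB total")
    else:
        print(f"{audio_dir} not found; run the generation script first")
```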

Scaling Tips

If you expand beyond your RTX 3050 laptop:

  1. Multiple Machines: Distribute markdown files across several GPU-equipped machines
  2. Cloud Bursting: Use spot instances on AWS/Azure/GCP for large batches
  3. Incremental Generation: Only regenerate audio for files that have changed (based on git diff)
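
Incremental generation can be sketched as follows. Note that this example compares content hashes against a saved manifest rather than using git diff, so it also works outside a git checkout; the function names and the manifest file are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def changed_docs(docs_dir, manifest_path):
    """Return (changed_files, current_hashes) for markdown files under docs_dir.

    A file counts as changed when its SHA-256 differs from the value stored
    in the manifest, or when it has no manifest entry yet.
    """
    manifest_file = Path(manifest_path)
    previous = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    current = {}
    changed = []
    for md in sorted(Path(docs_dir).rglob("*.md")):
        key = md.relative_to(docs_dir).as_posix()
        current[key] = hashlib.sha256(md.read_bytes()).hexdigest()
        if previous.get(key) != current[key]:
            changed.append(md)
    return changed, current


def save_manifest(manifest_path, hashes):
    """Persist the hashes so the next run can detect changes."""
    Path(manifest_path).write_text(json.dumps(hashes, indent=2, sort_keys=True))


if __name__ == "__main__":
    if Path("./docs").is_dir():
        changed, hashes = changed_docs("./docs", "./audio-manifest.json")
        for md in changed:
            print("Needs regeneration:", md)
        # Save only after audio for the changed files has been generated successfully.
        save_manifest("./audio-manifest.json", hashes)
```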

Troubleshooting Integration Issues

Audio Not Playing

  1. Check network tab in dev tools for 404 errors on audio files
  2. Verify file paths match between <source src=""> and actual file location
  3. Ensure Firebase hosting has deployed the audio files
  4. Check audio file isn’t corrupted (try playing locally)
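
The corruption check can be automated before deployment with Python's standard wave module. This is a sketch under assumptions: the function name is illustrative, the expected 24kHz rate comes from the quality settings in this guide, and the wave module reads only uncompressed PCM, so a float-encoded WAV would be reported as unreadable even if browsers can play it.

```python
import wave


def check_wav(path, expected_rate=24_000):
    """Return a list of problems found in a WAV file (an empty list means it looks fine)."""
    problems = []
    try:
        with wave.open(str(path), "rb") as wav:
            if wav.getnframes() == 0:
                problems.append("file contains no audio frames")
            if wav.getframerate() != expected_rate:
                problems.append(
                    f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz"
                )
    except (wave.Error, EOFError, OSError) as exc:
        # wave.Error: not a PCM WAV; EOFError: truncated or empty; OSError: missing or unreadable
        problems.append(f"could not read file: {exc}")
    return problems


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        issues = check_wav(name)
        print(name, "OK" if not issues else "; ".join(issues))
```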

Slow Initial Load

  1. Audio files might be large - consider compressing to Opus format for web
  2. Use preload="none" to avoid blocking page load
  3. Consider lazy-loading audio players when they enter viewport

Browser Compatibility

  1. WAV is widely supported, but consider providing MP3/Ogg fallbacks
  2. Test on Chrome, Firefox, Safari, and Edge
  3. Mobile browsers may have autoplay restrictions (user interaction required)

By following this pre-generation approach, you can provide high-quality voice summaries for your documentation site without making users wait for audio generation, while still leveraging your RTX 3050’s capabilities for the actual TTS processing.