CLI Reference

Common Arguments

Both scripts share several core arguments for specifying input/output and model location.

Input/Output

| Argument | Description | Default |
|---|---|---|
| input_md | Path to the input Markdown file | (Required) |
| --output | Path to save the output WAV audio file | output.wav (base) / output_custom.wav (custom) |
| --model_path | Path to the local model weights folder | ./Qwen3-TTS-12Hz-0.6B-Base (base) / ./Qwen3-TTS-12Hz-0.6B-CustomVoice (custom) |

Processing Parameters

| Argument | Description | Default |
|---|---|---|
| --chunk_size (base) / --max-chars (custom) | Maximum characters per chunk (see Chunk Size Behavior) | 400 (base) / 150 (custom) |
| --batch_size | Number of chunks to process in parallel on GPU | 4 (base) / 1 (custom) |
| --lang | Language for generation (affects tokenizer and prosody) | English |
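
As a rough illustration of how the shared arguments fit together, the sketch below declares them with argparse using the base script's documented defaults. The function name build_common_parser and the help strings are hypothetical; the real scripts may define their arguments differently.

```python
import argparse


def build_common_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented base-script arguments."""
    parser = argparse.ArgumentParser(description="Convert a Markdown file to speech.")
    parser.add_argument("input_md", help="Path to the input Markdown file")
    parser.add_argument("--output", default="output.wav",
                        help="Path to save the output WAV audio file")
    parser.add_argument("--model_path", default="./Qwen3-TTS-12Hz-0.6B-Base",
                        help="Path to the local model weights folder")
    parser.add_argument("--chunk_size", type=int, default=400,
                        help="Maximum characters per chunk")
    parser.add_argument("--batch_size", type=int, default=4,
                        help="Number of chunks to process in parallel on GPU")
    parser.add_argument("--lang", default="English",
                        help="Language for generation")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_common_parser().parse_args(["document.md", "--batch_size", "2"])
print(args.input_md, args.output, args.chunk_size, args.batch_size, args.lang)
```

Arguments that are not passed keep their documented defaults, so only --batch_size differs from the table above in this run.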

Voice Cloning Script (md_to_audio_base.py)

Unique Arguments

| Argument | Description | Default |
|---|---|---|
| --ref_audio | Path to reference audio file for voice cloning | (Required for cloning) |
| --ref_text | Transcript of the reference audio (must match audio content) | (Required for cloning) |

Usage Example

python md_to_audio_base.py document.md --ref_audio "my_voice.wav" --ref_text "This is my voice." --output "cloned_audio.wav"

Preset Voices Script (md_to_audio_custom.py)

Unique Arguments

| Argument | Description | Default |
|---|---|---|
| --speaker | Speaker name (case-insensitive) | Aiden |
| --instruct | Style instruction for the model | Speak naturally. |
| --dtype | Model data type (bfloat16, float16, float32) | bfloat16 |

Available Speakers

  • Serena
  • Vivian
  • Uncle_Fu
  • Ryan
  • Aiden
  • Ono_Anna
  • Sohee
  • Eric
  • Dylan
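
Because --speaker is case-insensitive, a lookup along the following lines would map any capitalization to the canonical name. This is a minimal sketch: resolve_speaker is a hypothetical helper, and the actual script may normalize names in another way.

```python
AVAILABLE_SPEAKERS = [
    "Serena", "Vivian", "Uncle_Fu", "Ryan", "Aiden",
    "Ono_Anna", "Sohee", "Eric", "Dylan",
]


def resolve_speaker(name: str) -> str:
    """Return the canonical speaker name for a case-insensitive match."""
    lookup = {speaker.lower(): speaker for speaker in AVAILABLE_SPEAKERS}
    key = name.strip().lower()
    if key not in lookup:
        raise ValueError(
            f"Unknown speaker {name!r}; choose from: {', '.join(AVAILABLE_SPEAKERS)}"
        )
    return lookup[key]


print(resolve_speaker("uncle_fu"))
```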

Usage Examples

# Basic usage with default settings
python md_to_audio_custom.py document.md
# Specify speaker and style
python md_to_audio_custom.py document.md --speaker "Ryan" --instruct "Excited tone"
# Use float16 instead of bfloat16 (if experiencing numerical issues)
python md_to_audio_custom.py document.md --dtype "float16"
# Increase batch size if you have more than 6GB VRAM
python md_to_audio_custom.py document.md --batch_size 2

Chunk Size Behavior

Base Script (md_to_audio_base.py)

  • The --chunk_size argument sets the target chunk size, but when it is not set explicitly the script may reduce the effective chunk size to 200 characters.
  • This adaptive behavior helps prevent VRAM overflow on long sentences.

Custom Script (md_to_audio_custom.py)

  • The --max-chars argument is used as-is with no fallback reduction.
  • The value of 150 was empirically determined to balance speed and quality on 6GB VRAM cards.
  • Lower values increase overhead; higher values risk OOM errors.
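
To make the effect of a character limit concrete, here is one way chunking by a maximum character count could work: sentences are packed greedily into chunks, and a sentence longer than the limit is cut into fixed-size pieces. The function split_into_chunks and its splitting rules are assumptions for illustration, not the scripts' actual chunking logic.

```python
import re


def split_into_chunks(text: str, max_chars: int = 150) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A sentence longer than the limit is cut into fixed-size pieces
        # (this may break mid-word; a real implementation might be smarter).
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:].lstrip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for chunk in split_into_chunks("One. Two. Three.", max_chars=10):
    print(repr(chunk))
```

A smaller limit yields more chunks, and therefore more generation calls, for the same text, which is the overhead trade-off described above.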

Performance Notes

  • For maximum throughput on GPUs with >8GB VRAM, increase --batch_size in the base script.
  • For latency-sensitive applications (like real-time preview), keep --batch_size at 1 and tune --max-chars.
  • The custom script’s --dtype parameter allows trading speed for numerical stability:
    • bfloat16: Fastest on Ampere (RTX 30xx) and newer
    • float16: Good balance, slightly slower than bfloat16
    • float32: Slowest but most numerically stable (not recommended unless necessary)

Error Handling

The two scripts handle batch failures differently:

  • If a GPU batch fails, the base script falls back to sequential processing
  • The custom script will stop on batch failure and report the error (often due to special characters or VRAM issues)
  • Common errors and solutions are documented in troubleshooting
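
The base script's fallback can be pictured roughly as follows. In this sketch, generate_with_fallback and the generate_batch callable are hypothetical stand-ins for the model call, and it is assumed that a failed GPU batch surfaces as a RuntimeError.

```python
from typing import Callable, List, Sequence


def generate_with_fallback(
    chunks: Sequence[str],
    generate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 4,
) -> List[str]:
    """Process chunks in batches, retrying a failed batch one chunk at a time."""
    results: List[str] = []
    for start in range(0, len(chunks), batch_size):
        batch = list(chunks[start:start + batch_size])
        try:
            results.extend(generate_batch(batch))
        except RuntimeError as exc:
            # Assumption: batch failures (for example out-of-memory) raise RuntimeError.
            print(f"Batch failed ({exc}); falling back to sequential processing")
            for chunk in batch:
                results.extend(generate_batch([chunk]))
    return results
```

If a single chunk also fails during the sequential retry, the exception propagates to the caller, which resembles the custom script's stop-and-report behavior.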