Our custom SNAC redistribution used the wrong layer mapping (positions 1,2 instead of 1,4 for layer 2) and sliced the audio incorrectly. Switched to importing convert_to_audio directly from orpheus_tts.decoder, which handles the sliding window, layer redistribution, and 2048:4096 audio slice correctly.

Audio now sounds clean, with only a subtle boundary artifact on the first token group (inherent to SNAC streaming, not our code).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
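For reference, a minimal sketch of the mapping this commit relies on. The helper names `redistribute_frames` and `slice_audio` are hypothetical and exist only for illustration; the real logic lives in orpheus_tts.decoder.convert_to_audio. The sketch assumes 7 tokens per frame, position 0 feeding layer 1, and the remaining positions (2, 3, 5, 6) feeding layer 3. Only the layer 2 positions (1, 4) and the 2048:4096 slice come from the message above.

```python
# Illustrative sketch only, not the orpheus_tts implementation.
# Assumptions (not stated in the commit): 7 tokens per frame,
# position 0 -> layer 1, positions 2, 3, 5, 6 -> layer 3.

def redistribute_frames(tokens):
    """Split a flat token list into (layer1, layer2, layer3) code lists."""
    num_frames = len(tokens) // 7  # a trailing partial frame is dropped
    layer1, layer2, layer3 = [], [], []
    for j in range(num_frames):
        i = 7 * j
        layer1.append(tokens[i])                       # assumed: position 0
        layer2.extend([tokens[i + 1], tokens[i + 4]])  # positions 1 and 4 (the fix)
        layer3.extend([tokens[i + 2], tokens[i + 3],
                       tokens[i + 5], tokens[i + 6]])  # assumed: remaining positions
    return layer1, layer2, layer3


def slice_audio(samples):
    """Keep only the 2048:4096 window of the decoded samples."""
    return samples[2048:4096]
```

The old custom code put positions 1 and 2 into layer 2, so every frame fed the decoder one misplaced code in layer 2 and a correspondingly wrong set in layer 3.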