# Voice MCP

MCP server for text-to-speech generation using the Bark TTS service on a Jetson AGX Orin.

## Overview

This MCP server provides tools to generate and play speech from text using the Bark TTS service. It follows a similar pattern to dreamtail-mcp, but for voice generation instead of images.

## Features

- **voice_generate()** - Convert text to speech using Bark TTS
- **voice_play()** - Play generated audio files on macOS
- **voice_get_last()** - Get info about the last generated voice
- French voice preset (`v2/fr_speaker_1`) used by default, overridable via `DEFAULT_VOICE`
- Automatic download and caching of audio files

## Requirements

- Python 3.8+
- macOS (for audio playback with `afplay`)
- Bark TTS service running on bigorin.local:8766

## Installation

```bash
cd ~/Projects/voice-mcp

# Install dependencies
pip install -r requirements.txt

# Test the server
python3 voice_mcp.py
```

## Configuration

Add to your Claude Desktop configuration file:

```json
{
  "mcpServers": {
    "voice": {
      "command": "python3",
      "args": [
        "/Users/yourname/Projects/voice-mcp/voice_mcp.py"
      ],
      "env": {
        "BARK_BASE_URL": "http://bigorin.local:8766",
        "VOICE_DOWNLOAD_DIR": "/Users/yourname/voice_audio",
        "DEFAULT_VOICE": "v2/fr_speaker_1"
      }
    }
  }
}
```

**Environment Variables:**

- `BARK_BASE_URL` - Bark TTS service URL (default: http://bigorin.local:8766)
- `VOICE_DOWNLOAD_DIR` - Where to save audio files (default: ~/voice_audio)
- `DEFAULT_VOICE` - Voice preset to use (default: v2/fr_speaker_1)

## Usage

### Generate Speech

```python
# In Claude Desktop with MCP enabled
voice_generate("Bonjour, comment allez-vous?")
# Returns: "abc123-def456.wav"
```

The tool will:

1. Submit the text to the Bark TTS service
2. Poll for completion (up to 120 seconds)
3. Download the WAV file to `~/voice_audio/` (or the directory set in `VOICE_DOWNLOAD_DIR`)
4. Return the filename

### Play Audio

```python
voice_play("abc123-def456.wav")
# Returns: "Playing audio: abc123-def456.wav"
```

Plays the audio file using macOS's built-in `afplay` command.

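
For readers who want to see what playback through `afplay` might look like, here is a minimal sketch. It is not the code in `voice_mcp.py`: the names `DOWNLOAD_DIR`, `build_play_command`, and `play` are hypothetical, and the sketch assumes audio files live in the documented default directory `~/voice_audio`.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: mirrors the documented default for VOICE_DOWNLOAD_DIR.
DOWNLOAD_DIR = Path.home() / "voice_audio"


def build_play_command(filename: str, download_dir: Path = DOWNLOAD_DIR) -> list:
    """Resolve a filename inside the download directory and return the afplay argv."""
    path = download_dir / filename
    if not path.is_file():
        # Matches the documented FileNotFoundError for missing audio files.
        raise FileNotFoundError(f"Audio file not found: {path}")
    return ["afplay", str(path)]


def play(filename: str, download_dir: Path = DOWNLOAD_DIR) -> str:
    """Hypothetical stand-in for voice_play(): run afplay and report the result."""
    command = build_play_command(filename, download_dir)
    if shutil.which("afplay") is None:
        raise RuntimeError("afplay not found; this sketch only supports macOS")
    # subprocess.run blocks until afplay exits, so this returns after playback ends.
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Playback failed: {result.stderr.strip()}")
    return f"Playing audio: {filename}"
```

Keeping the path check separate from the `afplay` call makes the file lookup testable on any platform, while the playback step itself remains macOS-only.
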
### Get Last Generation Info

```python
voice_get_last()
# Returns: {"job_id": "abc123", "filename": "abc123.wav", "text": "Bonjour..."}
```

## API Reference

### voice_generate(text: str) → str

Generate speech from text using Bark TTS.

**Args:**

- `text` (str): Text to convert to speech

**Returns:**

- Filename of the generated WAV file

**Raises:**

- `RuntimeError`: If generation fails or times out

**Example:**

```python
filename = voice_generate("Bonjour le monde!")
```

### voice_play(filename: str) → str

Play a WAV audio file on macOS.

**Args:**

- `filename` (str): Name of the WAV file to play

**Returns:**

- Confirmation message

**Raises:**

- `FileNotFoundError`: If audio file doesn't exist
- `RuntimeError`: If playback fails

**Example:**

```python
voice_play("abc123-def456.wav")
```

### voice_get_last() → dict

Get information about the last generated voice.

**Returns:**

- Dictionary with job_id, filename, and text

**Example:**

```python
info = voice_get_last()
```

## File Structure

```
voice-mcp/
├── voice_mcp.py                        # MCP server implementation
├── requirements.txt                    # Python dependencies
├── README.md                           # This file
└── claude_desktop_config.example.json  # Example config
```

Downloaded audio files are stored in `~/voice_audio/` by default.

## How It Works

1. **Submit Job**: `voice_generate()` sends text to Bark TTS service
2. **Poll Status**: Checks generation progress every 3 seconds
3. **Download Audio**: When complete, downloads WAV file
4. **Return Filename**: Returns filename for later playback
5. **Play Audio**: `voice_play()` uses macOS `afplay` to play the file

## Troubleshooting

### Connection Refused

If you get connection errors:

```bash
# Check if Bark TTS service is running
curl http://bigorin.local:8766/health
```

### Audio File Not Found

Make sure you're using the exact filename returned by `voice_generate()`:

```python
filename = voice_generate("Test")
voice_play(filename)  # Use the returned filename
```

### afplay Not Found

The `afplay` command is macOS-only. If you're on Linux/Windows, you'll need to modify `voice_play()` to use a different audio player.

## Voice Preset

The default voice is `v2/fr_speaker_1` (French speaker #1). To use a different voice:

1. Edit `.env` or set environment variable:

   ```bash
   export DEFAULT_VOICE=v2/en_speaker_6
   ```

2. See `VOICES.md` in bark-tts project for all available voices (130 total)

## Performance

- **Generation time**: 45-70 seconds for short text (on Jetson AGX Orin 64GB)
- **Cached results**: Instant if same text was generated before
- **Timeout**: 120 seconds (configurable in code)
- **Poll interval**: 3 seconds

## Related Projects

- **bark-tts** - The Bark TTS service this MCP connects to
- **dreamtail-mcp** - Similar MCP for image generation with DreamTail

## License

MIT