Voice MCP
MCP server for text-to-speech generation using the Bark TTS service running on a Jetson AGX Orin.
Overview
This MCP server provides tools to generate speech from text with the Bark TTS service and to play it back. It follows the same pattern as dreamtail-mcp, but generates voice instead of images.
Features
- voice_generate() - Convert text to speech using Bark TTS
- voice_play() - Play generated audio files on macOS
- voice_get_last() - Get info about last generated voice
- French voice preset (v2/fr_speaker_1) used by default
- Automatic download and caching of audio files
Requirements
- Python 3.8+
- macOS (for audio playback with afplay)
- Bark TTS service running on bigorin.local:8766
Installation
cd ~/Projects/voice-mcp
# Install dependencies
pip install -r requirements.txt
# Run the server directly to check that it starts (it then waits for an MCP client; stop it with Ctrl+C)
python3 voice_mcp.py
Configuration
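Add the following to your Claude Desktop configuration file (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json), replacing /Users/yourname with your own home directory: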
{
  "mcpServers": {
    "voice": {
      "command": "python3",
      "args": [
        "/Users/yourname/Projects/voice-mcp/voice_mcp.py"
      ],
      "env": {
        "BARK_BASE_URL": "http://bigorin.local:8766",
        "VOICE_DOWNLOAD_DIR": "/Users/yourname/voice_audio",
        "DEFAULT_VOICE": "v2/fr_speaker_1"
      }
    }
  }
}
Environment Variables:
- BARK_BASE_URL - Bark TTS service URL (default: http://bigorin.local:8766)
- VOICE_DOWNLOAD_DIR - Where to save audio files (default: ~/voice_audio)
- DEFAULT_VOICE - Voice preset to use (default: v2/fr_speaker_1)
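A minimal sketch of how these three variables might be read with their documented defaults. The `load_config` helper is a hypothetical name for illustration; voice_mcp.py may organize this differently.

```python
import os
from pathlib import Path
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the documented settings, falling back to the README defaults."""
    return {
        # Strip a trailing slash so endpoint paths can be appended safely.
        "bark_base_url": env.get("BARK_BASE_URL", "http://bigorin.local:8766").rstrip("/"),
        # expanduser() turns "~/voice_audio" into an absolute path under $HOME.
        "download_dir": Path(env.get("VOICE_DOWNLOAD_DIR", "~/voice_audio")).expanduser(),
        "default_voice": env.get("DEFAULT_VOICE", "v2/fr_speaker_1"),
    }
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check without touching the real environment.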
Usage
Generate Speech
# In Claude Desktop with MCP enabled
voice_generate("Bonjour, comment allez-vous?")
# Returns: "abc123-def456.wav"
The tool will:
- Submit text to Bark TTS service
- Poll for completion (up to 120 seconds)
- Download the WAV file to ~/voice_audio/
- Return the filename
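The download step could look like the following minimal sketch, which streams a finished WAV into the download directory. The URL it fetches is an assumption (this README does not document the bark-tts download endpoint), and `download_wav` is a hypothetical helper name.

```python
import shutil
import urllib.request
from pathlib import Path


def download_wav(url: str, download_dir: Path, filename: str, timeout: float = 30.0) -> str:
    """Stream the WAV at `url` into download_dir/filename and return the filename.

    `url` stands for whatever the Bark TTS service exposes for a completed
    job; its exact path is assumed, not documented here.
    """
    download_dir.mkdir(parents=True, exist_ok=True)
    target = download_dir / filename
    # Write to a temporary ".part" name first, then rename on success.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return filename
```

Renaming the `.part` file only after the copy succeeds means an interrupted download never leaves a half-written WAV that looks playable.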
Play Audio
voice_play("abc123-def456.wav")
# Returns: "Playing audio: abc123-def456.wav"
Plays the audio file using macOS's built-in afplay command.
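A minimal sketch of playback that follows the documented contract (FileNotFoundError for a missing file, RuntimeError when playback fails). The `play_wav` name and the `player` parameter are hypothetical; `player` is there so the command can be swapped out.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def play_wav(filename: str, download_dir: Path, player: Sequence[str] = ("afplay",)) -> str:
    """Play a downloaded WAV file with an external command (afplay by default)."""
    path = download_dir / filename
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    try:
        # subprocess.run waits for the player to exit before returning.
        result = subprocess.run([*player, str(path)], capture_output=True, text=True)
    except OSError as exc:  # e.g. afplay is not installed (non-macOS systems)
        raise RuntimeError(f"Could not start {player[0]}: {exc}") from exc
    if result.returncode != 0:
        raise RuntimeError(f"Playback failed: {result.stderr.strip()}")
    return f"Playing audio: {filename}"
```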
Get Last Generation Info
voice_get_last()
# Returns: {"job_id": "abc123", "filename": "abc123.wav", "text": "Bonjour..."}
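One simple way to back voice_get_last() is a module-level record updated after each successful generation. This is a sketch under that assumption; `record_generation` and `get_last` are hypothetical names, and returning an empty dict before any generation is an assumed behavior the README does not specify.

```python
from typing import Optional, TypedDict


class LastVoice(TypedDict):
    job_id: str
    filename: str
    text: str


# Most recent successful generation (hypothetical in-memory storage).
_last: Optional[LastVoice] = None


def record_generation(job_id: str, filename: str, text: str) -> None:
    global _last
    _last = {"job_id": job_id, "filename": filename, "text": text}


def get_last() -> dict:
    # Assumption: an empty dict means nothing has been generated yet.
    # Return a copy so callers cannot mutate the stored record.
    return dict(_last) if _last is not None else {}
```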
API Reference
voice_generate(text: str) → str
Generate speech from text using Bark TTS.
Args:
text(str): Text to convert to speech
Returns:
- Filename of the generated WAV file
Raises:
RuntimeError: If generation fails or times out
Example:
filename = voice_generate("Bonjour le monde!")
voice_play(filename: str) → str
Play a WAV audio file on macOS.
Args:
filename(str): Name of the WAV file to play
Returns:
- Confirmation message
Raises:
FileNotFoundError: If the audio file doesn't exist
RuntimeError: If playback fails
Example:
voice_play("abc123-def456.wav")
voice_get_last() → dict
Get information about the last generated voice.
Returns:
- Dictionary with job_id, filename, and text
Example:
info = voice_get_last()
File Structure
voice-mcp/
├── voice_mcp.py # MCP server implementation
├── requirements.txt # Python dependencies
├── README.md # This file
└── claude_desktop_config.example.json # Example config
Downloaded audio files are stored in ~/voice_audio/ by default.
How It Works
- Submit Job: voice_generate() sends text to the Bark TTS service
- Poll Status: Checks generation progress every 3 seconds
- Download Audio: When complete, downloads the WAV file
- Return Filename: Returns the filename for later playback
- Play Audio: voice_play() uses macOS afplay to play the file
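The polling step can be sketched as a loop with a deadline, using the README's 120-second timeout and 3-second interval. `check_status` is a hypothetical callable standing in for the status request to bark-tts, and the status strings it returns are assumptions; `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_job(
    check_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until check_status() returns "completed"; raise RuntimeError otherwise.

    Assumed statuses: "pending", "running", "completed", "failed".
    """
    deadline = clock() + timeout
    while True:
        status = check_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("Bark TTS generation failed")
        # Give up rather than sleep past the deadline.
        if clock() + interval > deadline:
            raise RuntimeError(f"Bark TTS generation timed out after {timeout:.0f} s")
        sleep(interval)
```

Using `time.monotonic` rather than wall-clock time keeps the timeout correct even if the system clock is adjusted mid-generation.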
Troubleshooting
Connection Refused
If you get connection errors:
# Check if Bark TTS service is running
curl http://bigorin.local:8766/health
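The same check can be scripted with the Python standard library. This sketch only assumes that /health answers HTTP 200 when the service is up, as the curl command above implies; `bark_is_up` is a hypothetical name.

```python
import urllib.error
import urllib.request


def bark_is_up(base_url: str = "http://bigorin.local:8766", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, bigorin.local not resolving, HTTP errors, timeouts.
        return False
```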
Audio File Not Found
Make sure you're using the exact filename returned by voice_generate():
filename = voice_generate("Test")
voice_play(filename) # Use the returned filename
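If the returned filename has been lost, a small helper can list what is actually in the download directory, newest first. `list_voice_files` is a hypothetical helper, not part of the server; it assumes the default ~/voice_audio location unless given another directory.

```python
from pathlib import Path
from typing import List, Optional


def list_voice_files(download_dir: Optional[Path] = None) -> List[str]:
    """Return the WAV filenames in the download directory, most recent first."""
    directory = download_dir or Path("~/voice_audio").expanduser()
    if not directory.is_dir():
        return []
    files = [p for p in directory.glob("*.wav") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in files]
```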
afplay Not Found
The afplay command is macOS-only. If you're on Linux/Windows, you'll need to modify voice_play() to use a different audio player.
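One way to adapt playback is to pick a command per platform and use the first one that is installed. The mapping below is an assumption for illustration: aplay (alsa-utils), paplay (PulseAudio) and ffplay (FFmpeg) are common players but may not be present, and `find_player` is a hypothetical helper.

```python
import platform
import shutil
from typing import Callable, List, Optional

# Candidate commands per platform.system() value, tried in order.
PLAYERS = {
    "Darwin": [["afplay"]],
    "Linux": [["aplay", "-q"], ["paplay"], ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet"]],
    "Windows": [["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet"]],
}


def find_player(system: Optional[str] = None,
                which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the first available player command for this platform."""
    system = system or platform.system()
    for cmd in PLAYERS.get(system, []):
        if which(cmd[0]):
            return list(cmd)  # copy so callers can append the file path safely
    raise RuntimeError(f"No supported audio player found for {system!r}")
```

The returned list can be used as the command prefix, with the WAV path appended as the last argument.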
Voice Preset
The default voice is v2/fr_speaker_1 (French speaker #1). To use a different voice:
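- Edit .env or set the environment variable: export DEFAULT_VOICE=v2/en_speaker_6 (when Claude Desktop launches the server, change DEFAULT_VOICE in the "env" block of the configuration instead)
- See VOICES.md in the bark-tts project for all available voices (130 total)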
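A quick sanity check for a preset name before setting DEFAULT_VOICE. This sketch assumes the v2/<lang>_speaker_<0-9> pattern seen in the two examples here and Bark's 13 supported languages, which yields the 130 voices mentioned; VOICES.md remains the authoritative list.

```python
import re
from typing import List

# Language codes Bark provides v2 speaker presets for (assumed from Bark's docs).
BARK_LANGS = ("en", "de", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh")
_PRESET_RE = re.compile(r"v2/([a-z]{2})_speaker_([0-9])")


def all_presets() -> List[str]:
    """Every v2 preset name under the assumed pattern."""
    return [f"v2/{lang}_speaker_{n}" for lang in BARK_LANGS for n in range(10)]


def is_valid_preset(name: str) -> bool:
    match = _PRESET_RE.fullmatch(name)
    return bool(match) and match.group(1) in BARK_LANGS
```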
Performance
- Generation time: 45-70 seconds for short text (on Jetson AGX Orin 64GB)
- Cached results: Instant if the same text was generated before
- Timeout: 120 seconds (configurable in code)
- Poll interval: 3 seconds
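The README does not say whether caching happens in the bark-tts service or in this client. As an illustration only, a client-side cache could key each WAV on a hash of the voice preset and text, so an identical request skips the slow generation path. `cache_key`, `generate_cached` and the `generate` callback are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice: str) -> str:
    """Stable filename for a (voice, text) pair."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()
    return f"{digest[:16]}.wav"


def generate_cached(text: str, voice: str, download_dir: Path,
                    generate: Callable[[str, str, Path], None]) -> str:
    """Return the cached filename, calling `generate` only on a cache miss."""
    filename = cache_key(text, voice)
    if not (download_dir / filename).is_file():
        generate(text, voice, download_dir / filename)  # the slow (45-70 s) path
    return filename
```

Including the voice in the key matters: the same sentence rendered with two presets must produce two different files.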
Related Projects
- bark-tts - The Bark TTS service this MCP connects to
- dreamtail-mcp - Similar MCP for image generation with DreamTail
License
MIT