Voice MCP

MCP server for text-to-speech generation using the Bark TTS service running on a Jetson AGX Orin.

Overview

This MCP server provides tools to generate and play speech from text using the Bark TTS service. It follows the same pattern as dreamtail-mcp, but generates speech instead of images.

Features

  • voice_generate() - Convert text to speech using Bark TTS
  • voice_play() - Play generated audio files on macOS
  • voice_get_last() - Get info about the last generated voice
  • French voice preset (v2/fr_speaker_1) by default, overridable with DEFAULT_VOICE
  • Automatic download and caching of audio files

Requirements

  • Python 3.8+
  • macOS (for audio playback with afplay)
  • Bark TTS service running on bigorin.local:8766

Installation

cd ~/Projects/voice-mcp

# Install dependencies
pip install -r requirements.txt

# Test the server
python3 voice_mcp.py

Configuration

Add to your Claude Desktop configuration file:

{
  "mcpServers": {
    "voice": {
      "command": "python3",
      "args": [
        "/Users/yourname/Projects/voice-mcp/voice_mcp.py"
      ],
      "env": {
        "BARK_BASE_URL": "http://bigorin.local:8766",
        "VOICE_DOWNLOAD_DIR": "/Users/yourname/voice_audio",
        "DEFAULT_VOICE": "v2/fr_speaker_1"
      }
    }
  }
}
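To add the entry programmatically instead of editing the file by hand, the following minimal sketch merges the "voice" server into an existing Claude Desktop config without touching other servers. It assumes the usual macOS config location (~/Library/Application Support/Claude/claude_desktop_config.json). add_voice_server and update_config_file are hypothetical helpers and are not part of this repository.

```python
import json
from pathlib import Path

# Usual Claude Desktop config location on macOS (assumption; verify on your machine).
CONFIG_PATH = Path("~/Library/Application Support/Claude/claude_desktop_config.json").expanduser()


def add_voice_server(config: dict, script_path: str, env: dict) -> dict:
    """Insert or replace the "voice" entry under mcpServers, keeping other servers."""
    servers = config.setdefault("mcpServers", {})
    servers["voice"] = {"command": "python3", "args": [script_path], "env": dict(env)}
    return config


def update_config_file(path: Path, script_path: str, env: dict) -> None:
    """Read the JSON config (if present), merge the voice server, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text()) if path.exists() else {}
    add_voice_server(config, script_path, env)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
```

For example, update_config_file(CONFIG_PATH, "/Users/yourname/Projects/voice-mcp/voice_mcp.py", {"BARK_BASE_URL": "http://bigorin.local:8766"}). Restart Claude Desktop afterwards so it reloads the config.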

Environment Variables:

  • BARK_BASE_URL - Bark TTS service URL (default: http://bigorin.local:8766)
  • VOICE_DOWNLOAD_DIR - Where to save audio files (default: ~/voice_audio)
  • DEFAULT_VOICE - Voice preset to use (default: v2/fr_speaker_1)
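The three variables resolve with their defaults as in the following minimal sketch. It assumes the server reads them once at startup and expands "~" in the download directory. VoiceConfig and load_config are hypothetical names.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class VoiceConfig:
    base_url: str
    download_dir: Path
    default_voice: str


def load_config(env=None) -> VoiceConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # Strip a trailing slash so paths can be appended as f"{base_url}/..."
    base_url = env.get("BARK_BASE_URL", "http://bigorin.local:8766").rstrip("/")
    download_dir = Path(env.get("VOICE_DOWNLOAD_DIR", "~/voice_audio")).expanduser()
    default_voice = env.get("DEFAULT_VOICE", "v2/fr_speaker_1")
    return VoiceConfig(base_url, download_dir, default_voice)
```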

Usage

Generate Speech

# In Claude Desktop with MCP enabled
voice_generate("Bonjour, comment allez-vous?")
# Returns: "abc123-def456.wav"

The tool will:

  1. Submit text to Bark TTS service
  2. Poll for completion (up to 120 seconds)
  3. Download the WAV file to VOICE_DOWNLOAD_DIR (default: ~/voice_audio/)
  4. Return the filename
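The core of steps 1 to 4 is the polling loop. The following minimal sketch assumes a hypothetical get_status callable that wraps whatever status request the real server sends to Bark and returns "completed", "failed", or any other string while the job is still running. The 120 s timeout and 3 s interval match the Performance figures. sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll get_status() until the job completes; raise RuntimeError on failure or timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("Bark TTS generation failed")
        if clock() >= deadline:
            raise RuntimeError(f"Bark TTS generation timed out after {timeout:.0f} s")
        sleep(interval)  # still queued/running: wait before the next poll
```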

Play Audio

voice_play("abc123-def456.wav")
# Returns: "Playing audio: abc123-def456.wav"

Plays the audio file using macOS's built-in afplay command.
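A minimal playback sketch follows, assuming files live directly in the download directory and mirroring the documented exceptions: FileNotFoundError for a missing audio file and RuntimeError for playback problems, including afplay itself being absent. play_wav is a hypothetical name. The sketch waits until playback finishes. Replacing subprocess.run with subprocess.Popen would return immediately instead.

```python
import subprocess
from pathlib import Path


def play_wav(filename: str, download_dir: Path) -> str:
    """Play a downloaded WAV file with macOS afplay and return a confirmation message."""
    path = Path(download_dir).expanduser() / filename
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    try:
        # Blocks until afplay exits; check=True turns a non-zero exit into an error.
        subprocess.run(["afplay", str(path)], check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # OSError covers a missing afplay binary (e.g. on Linux).
        raise RuntimeError(f"Playback failed: {exc}") from exc
    return f"Playing audio: {filename}"
```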

Get Last Generation Info

voice_get_last()
# Returns: {"job_id": "abc123-def456", "filename": "abc123-def456.wav", "text": "Bonjour..."}

API Reference

voice_generate(text: str) → str

Generate speech from text using Bark TTS.

Args:

  • text (str): Text to convert to speech

Returns:

  • Filename of the generated WAV file

Raises:

  • RuntimeError: If generation fails or times out

Example:

filename = voice_generate("Bonjour le monde!")

voice_play(filename: str) → str

Play a WAV audio file on macOS.

Args:

  • filename (str): Name of the WAV file to play

Returns:

  • Confirmation message

Raises:

  • FileNotFoundError: If audio file doesn't exist
  • RuntimeError: If playback fails

Example:

voice_play("abc123-def456.wav")

voice_get_last() → dict

Get information about the last generated voice.

Returns:

  • Dictionary with job_id, filename, and text

Example:

info = voice_get_last()
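The last-generation info can be kept as plain in-memory state. The following minimal sketch assumes a single module-level record that is overwritten after each successful generation and lost when the server restarts. remember_generation and get_last_generation are hypothetical names, and returning an empty dict before any generation is also an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Generation:
    job_id: str
    filename: str
    text: str


_last: Optional[Generation] = None  # most recent successful generation, if any


def remember_generation(job_id: str, filename: str, text: str) -> None:
    """Record a finished generation, replacing the previous one."""
    global _last
    _last = Generation(job_id, filename, text)


def get_last_generation() -> dict:
    """Return job_id, filename, and text of the last generation, or {} if none yet."""
    return asdict(_last) if _last is not None else {}
```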

File Structure

voice-mcp/
├── voice_mcp.py                        # MCP server implementation
├── requirements.txt                    # Python dependencies
├── README.md                           # This file
└── claude_desktop_config.example.json  # Example config

Downloaded audio files are stored in ~/voice_audio/ by default.

How It Works

  1. Submit Job: voice_generate() sends text to Bark TTS service
  2. Poll Status: Checks generation progress every 3 seconds
  3. Download Audio: When complete, downloads WAV file
  4. Return Filename: Returns filename for later playback
  5. Play Audio: voice_play() uses macOS afplay to play the file
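Step 3 (Download Audio) can be sketched with the standard library. The following sketch assumes the server is given a URL for the finished WAV file; the actual Bark endpoint is not documented here. It also assumes local caching means skipping files that already exist. Data is written to a ".part" file first so an interrupted download never leaves a truncated WAV under the final name. download_audio is a hypothetical name.

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(url: str, download_dir: Path, filename: str, timeout: float = 30.0) -> Path:
    """Stream url into download_dir/filename and return the local path."""
    download_dir = Path(download_dir).expanduser()
    download_dir.mkdir(parents=True, exist_ok=True)
    target = download_dir / filename
    if target.exists():
        return target  # already downloaded: reuse the cached file
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks instead of loading into memory
    tmp.replace(target)  # rename into place only after the copy succeeded
    return target
```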

Troubleshooting

Connection Refused

If you get connection errors:

# Check if Bark TTS service is running
curl http://bigorin.local:8766/health
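The same health check can be run from Python, for example from inside the server before submitting a job. The following minimal sketch treats an HTTP 200 from /health as healthy and anything else as down: refused connections, timeouts, HTTP errors, or malformed responses. bark_is_up is a hypothetical name.

```python
import http.client
import urllib.request


def bark_is_up(base_url: str = "http://bigorin.local:8766", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
```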

Audio File Not Found

Make sure you're using the exact filename returned by voice_generate():

filename = voice_generate("Test")
voice_play(filename)  # Use the returned filename

afplay Not Found

The afplay command is macOS-only. If you're on Linux/Windows, you'll need to modify voice_play() to use a different audio player.
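One way to adapt voice_play() is to choose a player per platform. The following minimal sketch assumes one of the listed Linux players is installed (aplay from ALSA, paplay from PulseAudio, or ffplay from FFmpeg). player_command is a hypothetical helper. On Windows, the standard library's winsound.PlaySound(path, winsound.SND_FILENAME) can play WAV files without an external command.

```python
import shutil
import sys
from typing import Callable, List, Optional

# Candidate command-line players per sys.platform value, tried in order.
_PLAYERS = {
    "darwin": [["afplay"]],
    "linux": [["aplay", "-q"], ["paplay"], ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet"]],
}


def player_command(
    path: str,
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the argv to play path with the first available player for this platform."""
    for candidate in _PLAYERS.get(platform, []):
        if which(candidate[0]):
            return [*candidate, path]
    raise RuntimeError(f"No supported audio player found for platform {platform!r}")
```

The resulting list can be passed directly to subprocess.run.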

Voice Preset

The default voice is v2/fr_speaker_1 (French speaker #1). To use a different voice:

  1. Set DEFAULT_VOICE in .env, in the env block of your Claude Desktop configuration, or as a shell environment variable:

    export DEFAULT_VOICE=v2/en_speaker_6
    
  2. See VOICES.md in the bark-tts project for all available voices (130 in total)
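The total of 130 corresponds to 13 languages with 10 speakers each. The following sketch enumerates the presets and validates a DEFAULT_VOICE value before it is sent to the service. It assumes VOICES.md follows Bark's v2 speaker-prompt naming, v2/<lang>_speaker_<0-9>, with Bark's 13 language codes. v2_presets and is_valid_preset are hypothetical helpers.

```python
from typing import List

# Bark's 13 language codes (assumed to match VOICES.md): 13 x 10 speakers = 130 presets.
BARK_V2_LANGS = ("de", "en", "es", "fr", "hi", "it", "ja", "ko", "pl", "pt", "ru", "tr", "zh")


def v2_presets() -> List[str]:
    """List every v2 preset name, e.g. "v2/fr_speaker_1"."""
    return [f"v2/{lang}_speaker_{n}" for lang in BARK_V2_LANGS for n in range(10)]


def is_valid_preset(name: str) -> bool:
    """Check a DEFAULT_VOICE value against the enumerated presets."""
    return name in set(v2_presets())
```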

Performance

  • Generation time: 45-70 seconds for short text (on Jetson AGX Orin 64GB)
  • Cached results: Instant if same text was generated before
  • Timeout: 120 seconds (configurable in code)
  • Poll interval: 3 seconds

Related Projects

  • bark-tts - The Bark TTS service this MCP connects to
  • dreamtail-mcp - Similar MCP for image generation with DreamTail

License

MIT

Description
MCP client for VoiceTail (Bark TTS on Jetson Orin)