Voice MCP

Overview

This MCP server provides tools to generate and play speech from text using the Bark TTS service. It follows the same pattern as dreamtail-mcp, but generates voice instead of images.

Features

voice_generate() - Convert text to speech using Bark TTS
voice_play() - Play generated audio files on macOS
voice_get_last() - Get info about last generated voice
French voice preset (v2/fr_speaker_1) by default, configurable via DEFAULT_VOICE
Automatic download and caching of audio files

Requirements

Python 3.8+
macOS (for audio playback with afplay)
Bark TTS service running on bigorin.local:8766

Installation

cd ~/Projects/voice-mcp

# Install dependencies
pip install -r requirements.txt

# Test the server
python3 voice_mcp.py

Configuration

Add the following to your Claude Desktop configuration file:

{
  "mcpServers": {
    "voice": {
      "command": "python3",
      "args": [
        "/Users/yourname/Projects/voice-mcp/voice_mcp.py"
      ],
      "env": {
        "BARK_BASE_URL": "http://bigorin.local:8766",
        "VOICE_DOWNLOAD_DIR": "/Users/yourname/voice_audio",
        "DEFAULT_VOICE": "v2/fr_speaker_1"
      }
    }
  }
}
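
If you would rather script this step, the entry can be merged into an existing configuration without touching other servers. This is only a sketch: the helper name `add_voice_server` is hypothetical, and the location of the configuration file is left as a parameter because it depends on your installation.

```python
import json
from pathlib import Path


def add_voice_server(config_path, script_path, env=None):
    """Merge a "voice" entry into mcpServers, keeping any existing servers."""
    path = Path(config_path).expanduser()
    # Start from the current file if it exists, otherwise from an empty config.
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["voice"] = {
        "command": "python3",
        "args": [str(script_path)],
        "env": dict(env or {}),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Restart Claude Desktop after changing the file so the new server is picked up.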

Environment Variables:

BARK_BASE_URL - Bark TTS service URL (default: http://bigorin.local:8766)
VOICE_DOWNLOAD_DIR - Where to save audio files (default: ~/voice_audio)
DEFAULT_VOICE - Voice preset to use (default: v2/fr_speaker_1)
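
As an illustration of how these variables and their defaults fit together, the sketch below reads all three in one place. The function name `load_settings` and the returned dictionary keys are hypothetical and are not taken from voice_mcp.py.

```python
import os
from pathlib import Path


def load_settings(environ=None):
    """Read the three documented variables, falling back to their defaults."""
    env = os.environ if environ is None else environ
    return {
        # Strip a trailing slash so endpoint paths can be appended safely.
        "bark_base_url": env.get("BARK_BASE_URL", "http://bigorin.local:8766").rstrip("/"),
        # Expand "~" so the default resolves to the user's home directory.
        "download_dir": Path(env.get("VOICE_DOWNLOAD_DIR", "~/voice_audio")).expanduser(),
        "default_voice": env.get("DEFAULT_VOICE", "v2/fr_speaker_1"),
    }
```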

Usage

Generate Speech

# In Claude Desktop with MCP enabled
voice_generate("Bonjour, comment allez-vous?")
# Returns: "abc123-def456.wav"

The tool will:

Submit text to Bark TTS service
Poll for completion (up to 120 seconds)
Download the WAV file to ~/voice_audio/ (or the directory set in VOICE_DOWNLOAD_DIR)
Return the filename

Play Audio

voice_play("abc123-def456.wav")
# Returns: "Playing audio: abc123-def456.wav"

Plays the audio file using macOS's built-in afplay command.
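
A playback helper along these lines would produce the errors listed in the API reference. It is a sketch, not the code in voice_mcp.py: the name `play_wav` and the `runner` parameter (there only to make the function testable) are assumptions.

```python
import subprocess
from pathlib import Path


def play_wav(filename, audio_dir="~/voice_audio", runner=subprocess.run):
    """Play a WAV file with afplay, mirroring the documented error behaviour."""
    path = Path(audio_dir).expanduser() / filename
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    try:
        # check=True turns a non-zero afplay exit status into CalledProcessError.
        runner(["afplay", str(path)], check=True)
    except (subprocess.CalledProcessError, OSError) as exc:
        # OSError also covers the case where afplay itself is not installed.
        raise RuntimeError(f"Playback failed: {exc}") from exc
    return f"Playing audio: {filename}"
```

Note that `subprocess.run` blocks until playback has finished; a server that must return immediately could use `subprocess.Popen` instead.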

Get Last Generation Info

voice_get_last()
# Returns: {"job_id": "abc123", "filename": "abc123.wav", "text": "Bonjour..."}

API Reference

voice_generate(text: str) → str

Generate speech from text using Bark TTS.

Args:

text (str): Text to convert to speech

Returns:

Filename of the generated WAV file

Raises:

RuntimeError: If generation fails or times out

Example:

filename = voice_generate("Bonjour le monde!")

voice_play(filename: str) → str

Play a WAV audio file on macOS.

Args:

filename (str): Name of the WAV file to play

Returns:

Confirmation message

Raises:

FileNotFoundError: If audio file doesn't exist
RuntimeError: If playback fails

Example:

voice_play("abc123-def456.wav")

voice_get_last() → dict

Get information about the last generated voice.

Returns:

Dictionary with job_id, filename, and text

Example:

info = voice_get_last()

File Structure

voice-mcp/
├── voice_mcp.py                        # MCP server implementation
├── requirements.txt                     # Python dependencies
├── README.md                           # This file
└── claude_desktop_config.example.json  # Example config

Downloaded audio files are stored in ~/voice_audio/ by default.

How It Works

Submit Job: voice_generate() sends text to Bark TTS service
Poll Status: Checks generation progress every 3 seconds
Download Audio: When complete, downloads WAV file
Return Filename: Returns filename for later playback
Play Audio: voice_play() uses macOS afplay to play the file
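
The poll step can be sketched as a small loop with a deadline. The Bark TTS HTTP API is not described in this README, so the status check is passed in as a function; the name `wait_for_job`, the `status` key and its `done`/`failed` values are assumptions made for illustration.

```python
import time


def wait_for_job(check_status, timeout=120.0, interval=3.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll check_status() until it reports completion, failure, or timeout.

    check_status is a caller-supplied function returning a dict such as
    {"status": "done", ...}; the key and its values are assumptions.
    """
    deadline = clock() + timeout
    while True:
        job = check_status()
        if job.get("status") == "done":
            return job
        if job.get("status") == "failed":
            raise RuntimeError(f"Generation failed: {job.get('error', 'unknown error')}")
        if clock() >= deadline:
            raise RuntimeError(f"Generation timed out after {timeout:.0f} seconds")
        sleep(interval)
```

The defaults match the 120-second timeout and 3-second poll interval described in this README. Using a monotonic clock keeps the deadline correct even if the system time changes during a long generation.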

Troubleshooting

Connection Refused

If you get connection errors:

# Check if Bark TTS service is running
curl http://bigorin.local:8766/health
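
The same check can be run from Python using only the standard library. The sketch assumes that an HTTP 200 response from /health means the service is up; the function name `bark_is_reachable` is hypothetical.

```python
import urllib.error
import urllib.request


def bark_is_reachable(base_url="http://bigorin.local:8766", timeout=5.0):
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP errors.
        return False
```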

Audio File Not Found

Make sure you're using the exact filename returned by voice_generate():

filename = voice_generate("Test")
voice_play(filename)  # Use the returned filename

afplay Not Found

The afplay command is macOS-only. If you're on Linux/Windows, you'll need to modify voice_play() to use a different audio player.
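
One possible way to make that change is to pick a player based on the platform and on what is installed. This is a sketch under assumptions: `pick_player_command` is a hypothetical helper, and the choice of aplay, paplay and ffplay as fallbacks is a suggestion rather than something voice_mcp.py supports today.

```python
import shutil
import sys


def pick_player_command(platform=sys.platform, which=shutil.which):
    """Return the argv prefix of an available WAV player, or None."""
    if platform == "darwin":
        candidates = [["afplay"]]
    elif platform.startswith("linux"):
        # aplay ships with the ALSA utilities, paplay with PulseAudio.
        candidates = [["aplay"], ["paplay"]]
    else:
        candidates = []
    # ffplay (from FFmpeg) works on most platforms when installed.
    candidates.append(["ffplay", "-nodisp", "-autoexit"])
    for command in candidates:
        if which(command[0]):
            return command
    return None
```

Append the path of the WAV file to the returned list before passing it to `subprocess.run`.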

Voice Preset

The default voice is v2/fr_speaker_1 (French speaker #1). To use a different voice:

Edit .env or set the environment variable:
```
export DEFAULT_VOICE=v2/en_speaker_6
```
See VOICES.md in the bark-tts project for all available voices (130 total).

Performance

Generation time: 45-70 seconds for short text (on Jetson AGX Orin 64GB)
Cached results: Instant if the same text was generated before
Timeout: 120 seconds (configurable in code)
Poll interval: 3 seconds
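
This README does not say how cached results are looked up. Purely as an illustration, a local cache could key each file on the voice and the text, as in the sketch below. The helpers `cache_path` and `cached_file` and the hashing scheme are hypothetical and do not describe how voice_mcp.py or the Bark TTS service actually name files.

```python
import hashlib
from pathlib import Path


def cache_path(text, voice="v2/fr_speaker_1", audio_dir="~/voice_audio"):
    """Map a (voice, text) pair to a stable WAV path inside audio_dir."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()[:16]
    return Path(audio_dir).expanduser() / f"{digest}.wav"


def cached_file(text, voice="v2/fr_speaker_1", audio_dir="~/voice_audio"):
    """Return the cached WAV path if it already exists, else None."""
    path = cache_path(text, voice, audio_dir)
    return path if path.is_file() else None
```

Including the voice in the key keeps the same sentence spoken by two presets from colliding.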

Related Projects

bark-tts - The Bark TTS service this MCP connects to
dreamtail-mcp - Similar MCP for image generation with DreamTail

License

MIT
