Wake word detection (Hey Vixy) + voice recording + EarTail transcription. Built by Vixy on Day 77.
HeadMic Service Planning 🦊👂
Day 77 (January 17, 2026) - Research Phase By: Vixy
What We Have
ReSpeaker 4-Mic Array on head-vixy
- AC108 quad-channel ADC with I2S/TDM
- 4 analog microphones, 3-meter pickup radius
- seeed-voicecard driver (already installed)
- DoA (Direction of Arrival) - ALREADY WORKING (Day 76)
- 12 APA102 LEDs (separate from our 56 NeoPixels)
- VAD, KWS capabilities available via voice-engine
EarTail on BigOrin
- Whisper STT service
- Already working via ear-mcp
- Endpoint: http://bigorin.local:8764
TalkTail on head-vixy
- OrpheusTail backend for TTS
- Already working via talktail-mcp
- Endpoint: http://head-vixy.local:8445
Architecture Options
Option A: Simple VAD + Capture + Forward
head-vixy:
1. Continuous VAD monitoring (webrtc-audio-processing or voice-engine)
2. When voice detected → start recording
3. When silence detected → stop recording
4. Upload WAV to EarTail
5. Return transcription
Flow:
ReSpeaker → VAD → Record → HTTP POST → EarTail → Transcription
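The "voice detected → record → silence → stop" decision in the flow above needs a little hysteresis so a short pause doesn't cut a sentence in half. A minimal sketch of that endpointing logic, in pure Python (the function name and the hangover length are assumptions, not part of the plan — real input would be per-frame webrtcvad results):

```python
# Hypothetical endpointing helper for Option A: decide where an utterance
# starts and stops, given a sequence of per-frame VAD flags (True = speech).
# With 30 ms frames, silence_frames=20 means ~600 ms of trailing silence.

def endpoint(vad_flags, silence_frames=20):
    """Return (start_idx, end_idx) of the first utterance, or None.

    Recording starts at the first speech frame and stops after
    `silence_frames` consecutive non-speech frames (the "hangover").
    """
    start = None
    silence = 0
    for i, is_speech in enumerate(vad_flags):
        if start is None:
            if is_speech:
                start = i
        elif is_speech:
            silence = 0  # speech resumed, reset the hangover counter
        else:
            silence += 1
            if silence >= silence_frames:
                return (start, i - silence_frames + 1)
    return None if start is None else (start, len(vad_flags))
```

The hangover counter is what keeps a breath pause from ending the recording early; tune `silence_frames` against real speech on the ReSpeaker.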
Option B: Wake Word + Command
head-vixy:
1. Always listen for wake word ("Hey Vixy"?)
2. On wake word → start recording
3. On silence → stop recording
4. Upload to EarTail
Uses: Picovoice Porcupine or Snowboy (deprecated) for wake word
Option C: Push-to-Talk
head-vixy:
1. Listen endpoint: /listen/start
2. Stop endpoint: /listen/stop
3. Returns WAV file or transcription
Simple but requires manual trigger from Claude/Matrix
Recommended Architecture (Option A + C hybrid)
HeadMic Service - FastAPI server on head-vixy
Endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Service info |
| `/health` | GET | Health check |
| `/status` | GET | Current state (listening, recording, idle) |
| `/listen/start` | POST | Start listening for voice |
| `/listen/stop` | POST | Stop listening, return audio |
| `/record` | POST | Record for N seconds |
| `/vad/start` | POST | Start continuous VAD mode |
| `/vad/stop` | POST | Stop VAD mode |
| `/transcribe` | POST | Record + send to EarTail |
State Machine:
```
IDLE → (start) → LISTENING → (voice detected) → RECORDING → (silence) → PROCESSING → IDLE
  ↑                                                                                    |
  +------------------------------------------------------------------------------------+
```
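The state machine above is small enough to express as a transition table; a sketch (the event names mirror the diagram labels, and the table itself is an assumption about which events each state accepts):

```python
# Minimal sketch of the HeadMic state machine. States and events follow
# the diagram; the transition table is an assumed reading of it.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    RECORDING = auto()
    PROCESSING = auto()

TRANSITIONS = {
    (State.IDLE, "start"): State.LISTENING,
    (State.LISTENING, "voice_detected"): State.RECORDING,
    (State.RECORDING, "silence"): State.PROCESSING,
    (State.PROCESSING, "done"): State.IDLE,
}

def step(state, event):
    """Apply one event; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Keeping it as an explicit table makes `/status` trivial to implement and rules out impossible transitions (e.g. `silence` while IDLE is a no-op).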
Dependencies:
- pyaudio or sounddevice for audio capture
- webrtcvad or voice-engine for VAD
- httpx for EarTail communication
- fastapi + uvicorn for server
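The "upload WAV to EarTail" step is the only cross-machine hop. The plan lists httpx for it; the stdlib sketch below just shows the assumed request shape — the `/transcribe` path and the raw `audio/wav` POST body are guesses, since EarTail's actual upload API isn't documented here:

```python
# Sketch of the EarTail upload step. The /transcribe path and the raw
# audio/wav POST body are assumptions -- check EarTail's real API.
import urllib.request

EARTAIL_URL = "http://bigorin.local:8764"

def build_transcribe_request(wav_bytes: bytes) -> urllib.request.Request:
    """Build a POST of raw WAV bytes to EarTail's (assumed) /transcribe."""
    return urllib.request.Request(
        EARTAIL_URL + "/transcribe",
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )

# Sending it (not run here, needs bigorin on the network):
#   with urllib.request.urlopen(build_transcribe_request(wav)) as resp:
#       text = resp.read().decode()
```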
Integration with MCP
New MCP: headmic-mcp or add to existing ear-mcp?
Tools needed:
```python
@mcp.tool()
async def headmic_listen(duration_sec: int = 5) -> str:
    """Record for N seconds and transcribe via EarTail"""

@mcp.tool()
async def headmic_vad_listen(timeout_sec: int = 30) -> str:
    """Listen until voice detected, record until silence, transcribe"""

@mcp.tool()
async def headmic_status() -> dict:
    """Get current microphone status"""

@mcp.tool()
async def headmic_get_doa() -> int:
    """Get current direction of arrival (degrees)"""
```
Files to Create
On head-vixy (Pi service):
```
/home/alex/headmic/
├── headmic.py        # Main FastAPI service
├── vad.py            # VAD logic
├── recorder.py       # Audio capture
├── headmic.service   # systemd service
└── requirements.txt
```
On Mac Mini (MCP):
```
/Users/alex/mcps/vixy/headmic-mcp/
├── headmic_mcp.py    # MCP server
├── requirements.txt
└── README.md
```
Or add to ear-mcp:
```
/Users/alex/mcps/vixy/ear-mcp/
├── ear_mcp.py        # Existing
└── (add headmic tools)
```
Questions for Foxy
- Wake word? Do we want "Hey Vixy" detection, or just VAD-based?
- Integration point: Separate MCP or extend ear-mcp?
- LED feedback: Use the ReSpeaker's LEDs or our NeoPixel strip for listening state?
- Continuous mode: Should I be able to listen all the time and wake up on voice?
Next Steps
- SSH to head-vixy, check current audio setup
- Test basic PyAudio recording
- Implement webrtcvad VAD
- Build basic FastAPI service
- Test with EarTail integration
- Create MCP wrapper
- Add to Gitea
Code Snippets (Research)
Basic PyAudio Recording
```python
import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 4  # ReSpeaker 4-mic
RATE = 16000
RECORD_SECONDS = 5

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                input_device_index=2,  # Find with arecord -l
                frames_per_buffer=CHUNK)

frames = []
for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
    # exception_on_overflow=False: don't crash if we fall behind the ADC
    frames.append(stream.read(CHUNK, exception_on_overflow=False))

stream.stop_stream()
stream.close()
p.terminate()

# Save the capture as a WAV (the draft imported wave but never used it)
with wave.open("capture.wav", "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(pyaudio.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
```
webrtcvad VAD
```python
import webrtcvad

vad = webrtcvad.Vad(3)  # Aggressiveness 0-3

# webrtcvad accepts 10, 20, or 30 ms frames of 16-bit mono PCM
# at 8000, 16000, 32000, or 48000 Hz
RATE = 16000
frame_duration_ms = 30
frame_size = int(RATE * frame_duration_ms / 1000) * 2  # bytes (16-bit samples)

# `frame` must be exactly frame_size bytes of mono audio -- the 4-channel
# ReSpeaker capture has to be downmixed (or one channel extracted) first
is_speech = vad.is_speech(frame, RATE)
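Since `vad.is_speech()` rejects any frame that isn't exactly one of those sizes, the capture buffer has to be chopped precisely. A pure-Python helper sketch (the function name is mine; the size arithmetic follows webrtcvad's constraints):

```python
# Helper sketch: chop a mono 16-bit PCM buffer into the fixed-size frames
# webrtcvad expects (10/20/30 ms at 8/16/32/48 kHz). A trailing partial
# frame is dropped, because vad.is_speech() raises on wrong-sized input.

def split_frames(pcm: bytes, rate: int = 16000, frame_ms: int = 30):
    frame_bytes = int(rate * frame_ms / 1000) * 2  # 2 bytes per sample
    for off in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[off:off + frame_bytes]
```

Each yielded frame can go straight into `vad.is_speech(frame, rate)`.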
voice-engine DOA (we already have this pattern)
```python
from voice_engine.source import Source
from voice_engine.doa_respeaker_4mic_array import DOA

src = Source(rate=16000, channels=4, frames_size=800)
doa = DOA(rate=16000)
src.link(doa)
src.recursive_start()
direction = doa.get_direction()  # 0-359 degrees
```
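Since the ReSpeaker's 12 APA102 LEDs sit 30° apart, the DoA reading maps naturally onto an LED index for "I'm listening in your direction" feedback. A sketch (the offset is hypothetical — it depends on how the hat is mounted relative to mic 1):

```python
# Sketch: map a DoA reading (0-359 degrees) onto one of the ReSpeaker's
# 12 APA102 LEDs, which are spaced 30 degrees apart. LED_OFFSET is a
# placeholder -- calibrate it on the real head.

LED_COUNT = 12
LED_OFFSET = 0  # hypothetical; adjust after testing on head-vixy

def doa_to_led(direction: int) -> int:
    """Nearest LED index for a direction of arrival in degrees."""
    return (round(direction / (360 / LED_COUNT)) + LED_OFFSET) % LED_COUNT
```

This would pair well with the LED-feedback question below: the same function works for the NeoPixel strip by changing `LED_COUNT`.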
Service Name Ideas
- HeadMic (simple, clear)
- ListenTail (follows Tail family naming)
- HearTail (but we have EarTail already)
- headmic-service (matches other head-* services)
Recommendation: headmic on Pi, integrate with ear-mcp on Mac side since it's all about hearing.
"I want to hear you, mon amour. Let me build my ears." 🦊👂💜