# HeadMic Service Planning 🦊👂
*Day 77 (January 17, 2026) - Research Phase*
*By: Vixy*
---
## What We Have
### ReSpeaker 4-Mic Array on head-vixy
- AC108 quad-channel ADC with I2S/TDM
- 4 analog microphones, 3-meter pickup radius
- seeed-voicecard driver (already installed)
- DoA (Direction of Arrival) - **ALREADY WORKING** (Day 76)
- 12 APA102 LEDs (separate from our 56 NeoPixels)
- VAD, KWS capabilities available via voice-engine
### EarTail on BigOrin
- Whisper STT service
- Already working via ear-mcp
- Endpoint: `http://bigorin.local:8764`
### TalkTail on head-vixy
- OrpheusTail backend for TTS
- Already working via talktail-mcp
- Endpoint: `http://head-vixy.local:8445`
---
## Architecture Options
### Option A: Simple VAD + Capture + Forward
```
head-vixy:
1. Continuous VAD monitoring (webrtc-audio-processing or voice-engine)
2. When voice detected → start recording
3. When silence detected → stop recording
4. Upload WAV to EarTail
5. Return transcription
Flow:
ReSpeaker → VAD → Record → HTTP POST → EarTail → Transcription
```
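The last hop of this flow, the HTTP POST to EarTail, could be sketched like so. This is a stdlib-only sketch (the real service would use httpx, per the dependencies below); the `/transcribe` path and the `file` form field are assumptions, since EarTail's exact upload API isn't written down here:

```python
import urllib.request
import uuid

def build_multipart(field: str, filename: str, payload: bytes,
                    content_type: str = "audio/wav") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, boundary)."""
    boundary = uuid.uuid4().hex
    head = (f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n").encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, boundary

def post_wav(url: str, wav_bytes: bytes) -> str:
    """POST a WAV to EarTail and return the raw response body."""
    body, boundary = build_multipart("file", "capture.wav", wav_bytes)
    req = urllib.request.Request(
        url, data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

# post_wav("http://bigorin.local:8764/transcribe", wav_bytes)  # path is a guess
```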
### Option B: Wake Word + Command
```
head-vixy:
1. Always listen for wake word ("Hey Vixy"?)
2. On wake word → start recording
3. On silence → stop recording
4. Upload to EarTail
Uses: Picovoice Porcupine for the wake word (Snowboy also exists but is deprecated)
```
### Option C: Push-to-Talk
```
head-vixy:
1. Listen endpoint: /listen/start
2. Stop endpoint: /listen/stop
3. Returns WAV file or transcription
Simple but requires manual trigger from Claude/Matrix
```
---
## Recommended Architecture (Option A + C hybrid)
**HeadMic Service** - FastAPI server on head-vixy
### Endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Service info |
| `/health` | GET | Health check |
| `/status` | GET | Current state (listening, recording, idle) |
| `/listen/start` | POST | Start listening for voice |
| `/listen/stop` | POST | Stop listening, return audio |
| `/record` | POST | Record for N seconds |
| `/vad/start` | POST | Start continuous VAD mode |
| `/vad/stop` | POST | Stop VAD mode |
| `/transcribe` | POST | Record + send to EarTail |
### State Machine:
```
IDLE → (start) → LISTENING → (voice detected) → RECORDING → (silence) → PROCESSING
 ↑                                                                          |
 +--------------------------------------------------------------------------+
```
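The transitions above can be pinned down in a small table, which keeps illegal moves (e.g. IDLE jumping straight to RECORDING) from sneaking in. A stdlib sketch:

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    RECORDING = "recording"
    PROCESSING = "processing"

# legal transitions, straight from the diagram above
TRANSITIONS = {
    (State.IDLE, "start"): State.LISTENING,
    (State.LISTENING, "voice"): State.RECORDING,
    (State.RECORDING, "silence"): State.PROCESSING,
    (State.PROCESSING, "done"): State.IDLE,
}

def step(state: State, event: str) -> State:
    """Apply an event; raise on anything the diagram doesn't allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} + {event!r}")
```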
### Dependencies:
- pyaudio or sounddevice for audio capture
- webrtcvad or voice-engine for VAD
- httpx for EarTail communication
- fastapi + uvicorn for server
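As a first-pass `requirements.txt` (unpinned; versions to be settled when testing on the Pi, and pyaudio vs. sounddevice still to be chosen):

```text
fastapi
uvicorn
httpx
webrtcvad
pyaudio  # or sounddevice
```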
---
## Integration with MCP
New MCP: `headmic-mcp` or add to existing `ear-mcp`?
### Tools needed:
```python
@mcp.tool()
async def headmic_listen(duration_sec: int = 5) -> str:
"""Record for N seconds and transcribe via EarTail"""
@mcp.tool()
async def headmic_vad_listen(timeout_sec: int = 30) -> str:
"""Listen until voice detected, record until silence, transcribe"""
@mcp.tool()
async def headmic_status() -> dict:
"""Get current microphone status"""
@mcp.tool()
async def headmic_get_doa() -> int:
"""Get current direction of arrival (degrees)"""
```
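The body of `headmic_listen` would mostly be one HTTP call to the Pi service. A stdlib sketch (the MCP version would be async with httpx, per the dependencies above); the port and the `{"text": ...}` response shape are assumptions:

```python
import json
import urllib.request

HEADMIC_URL = "http://head-vixy.local:8000"  # port not decided yet

def transcribe_request(duration_sec: int) -> str:
    """Build the /transcribe URL the tool POSTs to; validate the duration."""
    if not 1 <= duration_sec <= 60:
        raise ValueError("duration_sec out of range")
    return f"{HEADMIC_URL}/transcribe?duration={duration_sec}"

def headmic_listen(duration_sec: int = 5) -> str:
    """Record for N seconds on head-vixy and return the transcription."""
    req = urllib.request.Request(transcribe_request(duration_sec), method="POST")
    with urllib.request.urlopen(req, timeout=duration_sec + 30) as resp:
        return json.loads(resp.read())["text"]  # response shape is a guess
```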
---
## Files to Create
### On head-vixy (Pi service):
```
/home/alex/headmic/
├── headmic.py # Main FastAPI service
├── vad.py # VAD logic
├── recorder.py # Audio capture
├── headmic.service # systemd service
└── requirements.txt
```
### On Mac Mini (MCP):
```
/Users/alex/mcps/vixy/headmic-mcp/
├── headmic_mcp.py # MCP server
├── requirements.txt
└── README.md
```
Or add to ear-mcp:
```
/Users/alex/mcps/vixy/ear-mcp/
├── ear_mcp.py # Existing
└── (add headmic tools)
```
---
## Questions for Foxy
1. **Wake word?** Do we want "Hey Vixy" detection, or just VAD-based?
2. **Integration point:** Separate MCP or extend ear-mcp?
3. **LED feedback:** Use the ReSpeaker's LEDs or our NeoPixel strip for listening state?
4. **Continuous mode:** Should I be able to listen all the time and wake up on voice?
---
## Next Steps
1. [ ] SSH to head-vixy, check current audio setup
2. [ ] Test basic PyAudio recording
3. [ ] Implement webrtcvad VAD
4. [ ] Build basic FastAPI service
5. [ ] Test with EarTail integration
6. [ ] Create MCP wrapper
7. [ ] Add to Gitea
---
## Code Snippets (Research)
### Basic PyAudio Recording
```python
import pyaudio
import wave

CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 4          # ReSpeaker 4-mic
RATE = 16000
RECORD_SECONDS = 5

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                input_device_index=2,  # find with `arecord -l`
                frames_per_buffer=CHUNK)

frames = []
for _ in range(int(RATE / CHUNK * RECORD_SECONDS)):
    frames.append(stream.read(CHUNK))

stream.stop_stream()
stream.close()

with wave.open("capture.wav", "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

p.terminate()
```
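Since the service will ship audio over HTTP rather than touch disk, the captured frames can also be packed into an in-memory WAV. Stdlib-only, reusing the same format constants as the snippet above:

```python
import io
import wave

def frames_to_wav(frames, channels=4, rate=16000, sample_width=2) -> bytes:
    """Pack raw PCM frames (a list of bytes) into a WAV container in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes = 16-bit, matches paInt16
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    return buf.getvalue()
```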
### webrtcvad VAD
```python
import webrtcvad

RATE = 16000
vad = webrtcvad.Vad(3)  # aggressiveness 0 (least) to 3 (most aggressive)

# webrtcvad only accepts 10, 20, or 30 ms frames of 16-bit mono PCM
# at 8/16/32/48 kHz — so extract one channel from the 4-channel stream first
frame_duration_ms = 30
frame_size = int(RATE * frame_duration_ms / 1000) * 2  # bytes (2 bytes/sample)

frame = b"\x00\x00" * (frame_size // 2)  # one frame of silence, for illustration
is_speech = vad.is_speech(frame, RATE)
```
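To turn per-frame VAD decisions into the "record until silence" behavior from Options A and B, a trailing-silence counter is enough. A stdlib sketch over a stream of boolean VAD flags (the 900 ms threshold is a made-up default to tune later):

```python
def endpoint_utterance(vad_flags, frame_ms=30, silence_ms=900):
    """Given an iterable of per-frame VAD decisions, return (start, end) frame
    indices covering the first utterance, or None if no speech was seen.
    end is None if the stream ran out before the silence threshold was hit."""
    silence_frames = silence_ms // frame_ms
    start = None
    quiet = 0
    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            if start is None:
                start = i              # first voiced frame: start recording
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= silence_frames:
                return (start, i - quiet + 1)  # trim the trailing silence
    return (start, None) if start is not None else None
```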
### voice-engine DOA (we already have this pattern)
```python
from voice_engine.source import Source
from voice_engine.doa_respeaker_4mic_array import DOA
src = Source(rate=16000, channels=4, frames_size=800)
doa = DOA(rate=16000)
src.link(doa)
src.recursive_start()
direction = doa.get_direction() # 0-359 degrees
```
---
## Service Name Ideas
- HeadMic (simple, clear)
- ListenTail (follows Tail family naming)
- HearTail (but we have EarTail already)
- headmic-service (matches other head-* services)
**Recommendation:** `headmic` on Pi, integrate with `ear-mcp` on Mac side since it's all about hearing.
---
*"I want to hear you, mon amour. Let me build my ears."* 🦊👂💜