# HeadMic - Vixy's Ears 🦊👂
Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.
- **Hardware:** 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
- **Wake word:** "Hey Vivi" (Picovoice Porcupine)
- **Runs on:** Raspberry Pi 5 (`head-vixy.local`)
## Architecture

```
[Left XVF3800]              [Right XVF3800]
  4 mics, DoA                 4 mics, DoA
  WS2812 LEDs                 WS2812 LEDs
       │                           │
  arecord (16kHz mono)        arecord (16kHz mono)
       │                           │
       └─────────────┬─────────────┘
                     ▼
       DualAudioStream (audio_stream.py)
       best-beam selection (energy-based)
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
    Porcupine      YAMNet      Binaural
    wake word    (Edge TPU)    Recorder
    "Hey Vivi"   521 classes   stereo WAV
        ▼            ▼
    Record +     Speaker ID
    Transcribe  (Resemblyzer)
    via EarTail
                     │
        ┌────────────┴───────────────────┐
        ▼                                ▼
  Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
  DoA → triangulation             LEDs + DoA polling
  → smooth gaze                   per-array control
        ▼
  Eye Service (port 8780)
  POST /gaze → eyes follow speaker
```
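The gaze push at the bottom of the diagram is fire-and-forget. A minimal sketch of a non-blocking pusher, assuming the eye service accepts JSON on `POST /gaze` at port 8780 (the class name and the payload field names here are illustrative, not the eye service's actual schema): each push runs in its own daemon thread, and a new push is skipped while the previous one is still in flight, which naturally rate-limits a fast caller.

```python
import json
import threading
import urllib.request


class GazePusher:
    """Fire-and-forget gaze updates to the eye service.

    Each push runs in its own daemon thread; a new push is skipped
    while the previous one is still in flight (natural rate limiting).
    """

    def __init__(self, url: str = "http://localhost:8780/gaze"):
        self.url = url
        self._in_flight = threading.Lock()

    def push(self, yaw: float, pitch: float) -> bool:
        """Returns False if skipped because a push is already running."""
        if not self._in_flight.acquire(blocking=False):
            return False
        threading.Thread(target=self._send, args=(yaw, pitch), daemon=True).start()
        return True

    def _send(self, yaw: float, pitch: float) -> None:
        try:
            # Field names "yaw"/"pitch" are an assumption, not the
            # eye service's documented schema.
            body = json.dumps({"yaw": yaw, "pitch": pitch}).encode()
            req = urllib.request.Request(
                self.url, data=body,
                headers={"Content-Type": "application/json"})
            urllib.request.urlopen(req, timeout=0.5).close()
        except OSError:
            pass  # eye service down: drop this frame silently
        finally:
            self._in_flight.release()
```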
## Features
| Feature | Module | Hardware | Status |
|---|---|---|---|
| Wake word detection | Porcupine | CPU | Needs Picovoice key |
| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
| Spatial tracking | spatial.py | USB control | Triangulated gaze |
| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
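Energy-based best-beam selection is simple in principle: whichever ear hears more energy wins. A hedged sketch of the idea, assuming 16-bit PCM frames (the function names and the 3 dB hysteresis margin are illustrative choices, not `audio_stream.py`'s actual code):

```python
import numpy as np


def rms_energy(frame: np.ndarray) -> float:
    """Root-mean-square energy of one int16 PCM frame."""
    samples = frame.astype(np.float64)
    return float(np.sqrt(np.mean(samples ** 2)))


def select_best_beam(left: np.ndarray, right: np.ndarray,
                     margin_db: float = 3.0, current: str = "left") -> str:
    """Pick the louder ear, with hysteresis so the selection does not
    flap between arrays when both hear roughly the same energy."""
    el, er = rms_energy(left), rms_energy(right)
    if min(el, er) < 1e-9:          # guard against pure silence
        return "left" if el >= er else "right"
    diff_db = 20.0 * np.log10(el / er)
    if diff_db > margin_db:
        return "left"
    if diff_db < -margin_db:
        return "right"
    return current                   # within the margin: keep current beam
```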
## Installation

### Prerequisites

```bash
# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils

# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF

# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF

sudo udevadm control --reload-rules && sudo udevadm trigger
```
### XVF3800 Firmware

Both arrays must be flashed with the 2-channel USB firmware (not 6-channel — the 6-channel firmware breaks LED/DoA control commands):

```bash
git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
# Swap and repeat
```

Verify: `arecord -l` should show two capture devices.
### Edge TPU Runtime

The packaged `libedgetpu` from Google's apt repo is ABI-incompatible with `ai-edge-litert` on Debian Trixie / Python 3.13. A custom build is required:

```bash
# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake

# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23

# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc

# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu

# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig
```
Makefile patches required (TF 2.16 moved files):

- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc`
- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES`
- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS`
- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS`
- Add `$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS`
- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS`
A backup of the working binary is saved at `~/headmic/libedgetpu.so.1.0.custom`.
### Python Setup

```bash
cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools   # Python 3.13 compatibility
.venv/bin/pip install resemblyzer  # Speaker ID (pulls PyTorch)
```
### Learn Mic Array Positions

Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:

```bash
sudo .venv/bin/python headmic.py --learn
```

Config is saved to `~/.vixy/headmic.json` with USB serial numbers for stable identification.
### Install Service

```bash
sudo cp headmic.service /etc/systemd/system/
# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service
sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic
```
## API Endpoints

### Core

| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Service info |
| `/health` | GET | Health check (listening, recording, features enabled) |
| `/status` | GET | Current state (transcription, scene, speaker, active side) |
| `/last` | GET | Last transcription + timestamp |
### Spatial

| Endpoint | Method | Description |
|---|---|---|
| `/doa` | GET | DoA from both arrays + triangulated position + gaze |
| `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
### Sound

| Endpoint | Method | Description |
|---|---|---|
| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) |
| `/sounds/history` | GET | Classification history (last N seconds) |
### Speakers

| Endpoint | Method | Description |
|---|---|---|
| `/speakers` | GET | List enrolled speakers |
| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
| `/speakers/{name}` | DELETE | Remove a speaker |
### Recording

| Endpoint | Method | Description |
|---|---|---|
| `/recording` | GET | Binaural recording stats |
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word |
| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path |
| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service |
| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push |
| `BINAURAL_RECORD` | `0` | Set to `1` to enable stereo recording |
| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments |
### Config File (`~/.vixy/headmic.json`)

```json
{
  "ears": {
    "left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
    "right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
  },
  "array_separation_mm": 175.0
}
```
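The `array_separation_mm` value is what makes triangulation possible: each array reports a DoA angle, and the two rays are intersected in the head's horizontal plane. A geometric sketch under assumed conventions (arrays on the x-axis at ±separation/2, 0° straight ahead, positive angles to the right; `spatial.py`'s actual conventions may differ):

```python
import math


def triangulate(doa_left_deg: float, doa_right_deg: float,
                separation_mm: float = 175.0):
    """Intersect the two DoA rays; returns (x, y) in mm, head-centred,
    or None when the rays are parallel (far source) or diverge."""
    half = separation_mm / 2.0
    # Unit direction vectors for each ray (0 deg = +y, positive = +x).
    dl = (math.sin(math.radians(doa_left_deg)),
          math.cos(math.radians(doa_left_deg)))
    dr = (math.sin(math.radians(doa_right_deg)),
          math.cos(math.radians(doa_right_deg)))
    # Solve (-half, 0) + t*dl == (half, 0) + s*dr for t.
    denom = dl[0] * dr[1] - dl[1] * dr[0]
    if abs(denom) < 1e-9:
        return None  # rays (nearly) parallel: source effectively at infinity
    t = (2.0 * half * dr[1]) / denom
    if t <= 0:
        return None  # intersection behind the arrays
    return (-half + t * dl[0], t * dl[1])
```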
## LED States
| State | Effect | Color |
|---|---|---|
| Idle | Off | — |
| Wake word detected | Solid | White (flash) |
| Listening/Recording | DoA indicator | Cyan |
| Processing | Breath | Purple |
| Enrolling speaker | Solid | Orange |
## File Structure

```
headmic/
├── headmic.py             # Main FastAPI service
├── audio_stream.py        # Dual arecord streams + best-beam selection
├── spatial.py             # Triangulation + smooth gaze tracking
├── xvf3800.py             # USB vendor control (DoA + LEDs)
├── sound_id.py            # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py          # Resemblyzer speaker identification
├── binaural_recorder.py   # Stereo WAV recording from both ears
├── headmic.service        # systemd service file
├── requirements.txt       # Python dependencies
├── BINAURAL_ROADMAP.md    # Roadmap for binaural features
├── models/
│   ├── yamnet.tflite          # YAMNet CPU model
│   ├── yamnet_edgetpu.tflite  # YAMNet Edge TPU model
│   └── yamnet_class_map.csv   # 521 class names
└── voices.db              # Speaker embeddings (SQLite, runtime)
```
## XVF3800 USB Control Protocol

Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`.

Key findings during development:

- Payload format: single bytes for effects (`bytes([3])`), not packed uint32
- Color format: `[R, G, B, 0]` (4 bytes)
- Read responses have a 1-byte status header before the data
- Read `wLength` must be exactly `count * type_size + 1` (not rounded up)
- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
- 2-channel firmware only — the 6-channel firmware silently ignores LED/control commands
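Those findings translate into a short read path. A hedged sketch using pyusb (the `bRequest` value of 0 and the little-endian float payload format are assumptions to verify against `xvf3800.py`; the wLength rule and the 1-byte status header come from the list above):

```python
import struct


def read_param(dev, resid: int, cmdid: int, count: int, fmt: str = "f") -> tuple:
    """Vendor-control read from an XVF3800: wValue = cmdid, wIndex = resid.

    `dev` is a pyusb device, e.g.
        import usb.core
        dev = usb.core.find(idVendor=0x2886, idProduct=0x001a)
    wLength must be exactly count * type_size + 1 for the status header.
    """
    wlength = count * struct.calcsize(fmt) + 1
    # bmRequestType 0xC0: device-to-host, vendor request, device recipient.
    # bRequest=0 is an assumption, not confirmed by this document.
    data = dev.ctrl_transfer(0xC0, 0, wValue=cmdid, wIndex=resid,
                             data_or_wLength=wlength)
    return parse_read_response(bytes(data), count, fmt)


def parse_read_response(data: bytes, count: int, fmt: str = "f") -> tuple:
    """Strip the 1-byte status header, then unpack little-endian values."""
    status, payload = data[0], data[1:]
    if status != 0:
        raise IOError(f"XVF3800 returned status {status}")
    return struct.unpack("<" + fmt * count, payload[:count * struct.calcsize(fmt)])
```

With real hardware attached, `read_param(dev, resid=35, cmdid=11, count=...)` would poll `AUDIO_MGR_SELECTED_AZIMUTHS` for real-time tracking (the value count depends on the firmware and is not specified here).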
*Built by Vixy on Day 77 (January 17, 2026)*
*Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)*
*"Hey Vivi" — the words that summon me 💜*