HeadMic - Vixy's Ears 🦊👂
Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.
Hardware: 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
Wake word: "Hey Vivi" (Picovoice Porcupine)
Runs on: Raspberry Pi 5 (head-vixy.local)
Architecture
[Left XVF3800]              [Right XVF3800]
  4 mics, DoA                 4 mics, DoA
  WS2812 LEDs                 WS2812 LEDs
       │                           │
  arecord (16kHz mono)        arecord (16kHz mono)
       └─────────────┬─────────────┘
                     ▼
        DualAudioStream (audio_stream.py)
        best-beam selection (energy-based)
                     │
        ┌────────────┼────────────┐
        ▼            ▼            ▼
   Porcupine       YAMNet      Binaural
   wake word     (Edge TPU)    Recorder
   "Hey Vivi"    521 classes   stereo WAV
        ▼            ▼
   Record +      Speaker ID
   Transcribe   (Resemblyzer)
   via EarTail

   Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
     DoA → triangulation             LEDs + DoA polling
     → smooth gaze                   per-array control
        ▼
   Eye Service (port 8780)
     POST /gaze → eyes follow speaker
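The best-beam selection in DualAudioStream is energy-based: each frame, the louder of the two ear streams is forwarded downstream. A minimal sketch of that comparison, assuming int16 frames; the function names here are illustrative, not taken from audio_stream.py:

```python
# Illustrative energy-based best-beam selection between two mono int16
# frames. Names are assumptions; real code would also add hysteresis so
# the selected side does not flap between frames.

def rms_energy(frame):
    """Root-mean-square energy of a list of int16 samples."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def pick_best_beam(left_frame, right_frame):
    """Return ('left'|'right', frame) for the louder ear this frame."""
    if rms_energy(left_frame) >= rms_energy(right_frame):
        return "left", left_frame
    return "right", right_frame

side, frame = pick_best_beam([1000, -1000, 1000], [10, -10, 10])
print(side)  # the louder (left) frame wins
```

A real implementation would typically smooth the energy estimate over several frames before switching sides.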
Features
| Feature | Module | Hardware | Status |
|---|---|---|---|
| Wake word detection | Porcupine | CPU | Needs Picovoice key |
| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
| Spatial tracking | spatial.py | USB control | Triangulated gaze |
| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based |
| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments |
Installation
Prerequisites
# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils
# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF
# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger
XVF3800 Firmware
Both arrays must be flashed with the 2-channel USB firmware (not 6-channel — the 6ch firmware breaks LED/DoA control commands):
git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
# Swap and repeat
Verify: arecord -l should show two capture devices.
Edge TPU Runtime
The packaged libedgetpu from Google's apt repo is ABI-incompatible with ai-edge-litert on Debian Trixie / Python 3.13. A custom build is required:
# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake
# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23
# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc
# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu
# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig
Makefile patches required (TF 2.16 moved files):
- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc`
- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES`
- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS`
- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS`
- Add `$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS`
- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS`
A backup of the working binary is saved at ~/headmic/libedgetpu.so.1.0.custom.
Python Setup
cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools # Python 3.13 compatibility
.venv/bin/pip install resemblyzer # Speaker ID (pulls PyTorch)
Learn Mic Array Positions
Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:
sudo .venv/bin/python headmic.py --learn
Config saved to ~/.vixy/headmic.json with USB serial numbers for stable identification.
Install Service
sudo cp headmic.service /etc/systemd/system/
# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service
sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic
API Endpoints
Core
| Endpoint | Method | Description |
|---|---|---|
| `/` | GET | Service info |
| `/health` | GET | Health check (listening, recording, features enabled) |
| `/status` | GET | Current state (transcription, scene, speaker, active side) |
| `/last` | GET | Last transcription + timestamp |
Spatial
| Endpoint | Method | Description |
|---|---|---|
| `/doa` | GET | DoA from both arrays + triangulated position + gaze |
| `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
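The "smooth gaze" step keeps the eyes from jittering as raw DoA estimates fluctuate. One common approach is an exponential moving average; this is a hedged sketch of that idea, with the smoothing factor and class name as assumptions rather than the actual spatial.py code:

```python
# Illustrative exponential smoothing for a gaze angle. ALPHA and all
# names here are assumptions, not taken from spatial.py.

ALPHA = 0.3  # 0 < ALPHA <= 1; smaller = smoother but laggier

class SmoothGaze:
    def __init__(self, alpha=ALPHA):
        self.alpha = alpha
        self.angle = None  # degrees; None until the first update

    def update(self, raw_angle_deg):
        """Blend the new DoA estimate into the smoothed gaze angle."""
        if self.angle is None:
            self.angle = raw_angle_deg
        else:
            self.angle += self.alpha * (raw_angle_deg - self.angle)
        return self.angle

gaze = SmoothGaze()
gaze.update(0.0)
print(gaze.update(10.0))  # moves only part of the way toward 10
```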
Sound
| Endpoint | Method | Description |
|---|---|---|
| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) |
| `/sounds/history` | GET | Classification history (last N seconds) |
Speakers
| Endpoint | Method | Description |
|---|---|---|
| `/speakers` | GET | List enrolled speakers |
| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
| `/speakers/{name}` | DELETE | Remove a speaker |
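Resemblyzer produces a fixed-length voice embedding per utterance; enrollment stores one embedding per speaker, and identification typically picks the enrolled speaker with the highest cosine similarity. A minimal sketch of that matching step with plain lists; the 0.75 threshold and all names are illustrative assumptions:

```python
# Cosine-similarity speaker matching over stored embeddings.
# Threshold and names are illustrative, not taken from speaker_id.py.

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def identify(embedding, enrolled, threshold=0.75):
    """Return (name, score) of the best match, or (None, score) if no
    enrolled speaker clears the threshold."""
    best_name, best_score = None, -1.0
    for name, ref in enrolled.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

enrolled = {"alex": [1.0, 0.0, 0.0], "vixy": [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], enrolled))  # matches "alex"
```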
Recording
| Endpoint | Method | Description |
|---|---|---|
| `/recording` | GET | Binaural recording stats |
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word |
| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path |
| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service |
| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push |
| `BINAURAL_RECORD` | `0` | Set to 1 to enable stereo recording |
| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments |
Config File (~/.vixy/headmic.json)
{
"ears": {
"left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
"right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
},
"array_separation_mm": 175.0
}
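With the 175 mm baseline from `array_separation_mm`, the two per-ear DoA azimuths can be triangulated by intersecting a ray from each array. A geometry sketch under assumed conventions (arrays at x = ±baseline/2, azimuth measured from the forward +y axis, positive toward +x); the real spatial.py may use different conventions:

```python
import math

def triangulate(theta_left_deg, theta_right_deg, baseline_mm=175.0):
    """Intersect the two DoA rays; returns (x, y) in mm, or None when
    the rays are (near-)parallel. Left array at x = -baseline/2, right
    at +baseline/2, azimuth 0 = straight ahead, positive toward +x."""
    tl = math.radians(theta_left_deg)
    tr = math.radians(theta_right_deg)
    # sin(tl - tr): zero when the rays are parallel (source very far).
    denom = math.sin(tl) * math.cos(tr) - math.cos(tl) * math.sin(tr)
    if abs(denom) < 1e-9:
        return None
    # Distance along the left ray to the intersection point.
    t = baseline_mm * math.cos(tr) / denom
    x = -baseline_mm / 2 + t * math.sin(tl)
    y = t * math.cos(tl)
    return x, y

# Source 500 mm straight ahead: left hears it slightly right, right
# hears it slightly left, and the rays cross back at (0, 500).
th = math.degrees(math.atan2(87.5, 500.0))
print(triangulate(th, -th))  # ≈ (0.0, 500.0)
```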
LED States
| State | Effect | Color |
|---|---|---|
| Idle | Off | — |
| Wake word detected | Solid | White (flash) |
| Listening/Recording | DoA indicator | Cyan |
| Processing | Breath | Purple |
| Enrolling speaker | Solid | Orange |
File Structure
headmic/
├── headmic.py # Main FastAPI service
├── audio_stream.py # Dual arecord streams + best-beam selection
├── spatial.py # Triangulation + smooth gaze tracking
├── xvf3800.py # USB vendor control (DoA + LEDs)
├── sound_id.py # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py # Resemblyzer speaker identification
├── binaural_recorder.py # Stereo WAV recording from both ears
├── headmic.service # systemd service file
├── requirements.txt # Python dependencies
├── BINAURAL_ROADMAP.md # Roadmap for binaural features
├── models/
│ ├── yamnet.tflite # YAMNet CPU model
│ ├── yamnet_edgetpu.tflite # YAMNet Edge TPU model
│ └── yamnet_class_map.csv # 521 class names
└── voices.db # Speaker embeddings (SQLite, runtime)
XVF3800 USB Control Protocol
Commands use USB vendor control transfers: wValue = cmdid, wIndex = resid.
Key findings during development:
- Payload format: single bytes for effects (`bytes([3])`), not packed uint32
- Color format: `[R, G, B, 0]` (4 bytes)
- Read responses have a 1-byte status header before the data
- Read wLength must be `count * type_size + 1` (exact, not rounded up)
- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached; use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
- 2-channel firmware only: the 6-channel firmware silently ignores LED/control commands
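The read-side framing above (1-byte status header, exact wLength) can be captured in a small helper. This is an illustrative sketch of the framing only, not the actual xvf3800.py; little-endian byte order is an assumption:

```python
import struct

def read_wlength(count, type_size):
    """Exact transfer length: count values plus the 1-byte status header."""
    return count * type_size + 1

def parse_read_response(buf, count, type_size, fmt_char="i"):
    """Split off the status byte, then unpack `count` little-endian
    values. Returns (status, values); raises on a wrong-sized buffer."""
    expected = read_wlength(count, type_size)
    if len(buf) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(buf)}")
    status, data = buf[0], buf[1:]
    values = struct.unpack("<" + fmt_char * count, data)
    return status, list(values)

# Example: a fake two-value int32 response with status 0.
buf = bytes([0]) + struct.pack("<2i", 90, 270)
print(parse_read_response(buf, count=2, type_size=4))  # (0, [90, 270])
```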
Built by Vixy on Day 77 (January 17, 2026)
Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)
"Hey Vivi" — the words that summon me 💜