# HeadMic - Vixy's Ears 🦊👂

Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.

**Hardware:** 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)

**Wake word:** "Hey Vivi" (Picovoice Porcupine)

**Runs on:** Raspberry Pi 5 (head-vixy.local)

## Architecture

```
[Left XVF3800]───┐        [Right XVF3800]───┐
 4 mics, DoA     │         4 mics, DoA      │
 WS2812 LEDs     │         WS2812 LEDs      │
                 ▼                          ▼
       arecord (16kHz mono)       arecord (16kHz mono)
                 │                          │
                 └────────────┬─────────────┘
                              ▼
             DualAudioStream (audio_stream.py)
    best-beam selection (energy-based, 10% hysteresis)
                              │
             ┌────────────────┼────────────────┐
             ▼                ▼                ▼
         Porcupine          YAMNet          Binaural
         wake word        (Edge TPU)        Recorder
        "Hey Vivi"       521 classes       stereo WAV
             ▼                ▼
         Record +         Speaker ID
         Transcribe      (Resemblyzer)
         via EarTail
             │
             ▼
             Spatial Tracker (spatial.py)
    DoA → triangulation → ILD distance → smooth gaze → proximity zones
                              │
             ┌────────────────┼────────────────┐
             ▼                ▼                ▼
        Eye Service      Spatial Scene     USB Control
        POST /gaze      (spatial_scene)    (xvf3800.py)
        eyes follow     what+where map      LEDs + DoA
        the speaker     anomaly detect      per-array
```

## Features

| Feature | Module | Hardware | Status |
|---------|--------|----------|--------|
| Wake word detection | Porcupine | CPU | Needs Picovoice key |
| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
| Spatial tracking | spatial.py | USB control | Triangulated gaze + ILD distance |
| Distance estimation | spatial.py | audio energy | Proximity zones (intimate/conversational/across_room/far) |
| Spatial scene mapping | spatial_scene.py | — | Learns where sounds come from, anomaly detection |
| Sound event localization | spatial_scene.py | — | What + where + when log |
| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based, 10% hysteresis |
| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments (opt-in) |

## Installation

### Prerequisites

```bash
# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils

# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF

# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF

sudo udevadm control --reload-rules && sudo udevadm trigger
```

### XVF3800 Firmware

Both arrays must be flashed with the **2-channel USB firmware** (not 6-channel — the 6ch firmware breaks LED/DoA control commands):

```bash
git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800

# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin

# Swap and repeat
```

Verify: `arecord -l` should show two capture devices.

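For a deeper check of both capture paths, the sketch below reads about a second from each ear via `arecord` and compares RMS energy, switching sides only on a 10% margin. It is a toy version of the best-beam selection described above, not the real `audio_stream.py` logic; the `Array` / `Array_1` card names are taken from the config example later in this README and may differ on your setup.

```python
# Toy best-beam check: grab ~1 s of 16 kHz mono audio from each XVF3800 via
# arecord, compare RMS energy, and only switch ears on a 10% margin
# (hysteresis). Card names are assumptions taken from ~/.vixy/headmic.json.
import array
import math
import subprocess

def rms(card: str, seconds: int = 1) -> float:
    """Capture S16_LE / 16 kHz / mono from an ALSA card and return RMS energy."""
    raw = subprocess.run(
        ["arecord", "-D", f"plughw:CARD={card}", "-f", "S16_LE",
         "-r", "16000", "-c", "1", "-t", "raw", "-d", str(seconds)],
        capture_output=True, check=True,
    ).stdout
    samples = array.array("h", raw[: len(raw) // 2 * 2])
    return math.sqrt(sum(s * s for s in samples) / max(len(samples), 1))

energies = {card: rms(card) for card in ("Array", "Array_1")}
print("  ".join(f"{card}: {e:.0f}" for card, e in energies.items()))

active = "Array"                                 # pretend the left ear is current
other = "Array_1" if active == "Array" else "Array"
if energies[other] > energies[active] * 1.10:    # 10% hysteresis threshold
    active = other
print("best beam:", active)
```
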
### Edge TPU Runtime

The packaged `libedgetpu` from Google's apt repo is **ABI-incompatible** with `ai-edge-litert` on Debian Trixie / Python 3.13. A custom build is required:

```bash
# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake

# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23

# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc

# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu

# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig
```

**Makefile patches required** (TF 2.16 moved files):

- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc`
- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES`
- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS`
- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS`
- Add `$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS`
- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS`

A backup of the working binary is saved at `~/headmic/libedgetpu.so.1.0.custom`.

### Python Setup

```bash
cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools     # Python 3.13 compatibility
.venv/bin/pip install resemblyzer    # Speaker ID (pulls PyTorch)
```

### Learn Mic Array Positions

Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:

```bash
sudo .venv/bin/python headmic.py --learn
```

Config saved to `~/.vixy/headmic.json` with USB serial numbers for stable identification.

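Before installing the service, it is worth a quick smoke test that the custom `libedgetpu` build actually loads under `ai-edge-litert` (with the Coral plugged in). This is a minimal sketch, assuming `ai_edge_litert.interpreter` mirrors the old `tflite_runtime` API (`Interpreter` + `load_delegate`); if the delegate load fails, revisit the build steps above.

```python
# Edge TPU smoke test. Assumption: ai_edge_litert.interpreter exposes the
# same Interpreter / load_delegate API that tflite_runtime did.
from ai_edge_litert.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="models/yamnet_edgetpu.tflite",                   # bundled model
    experimental_delegates=[load_delegate("libedgetpu.so.1")],   # custom build
)
interpreter.allocate_tensors()
print("Edge TPU OK, input shape:",
      interpreter.get_input_details()[0]["shape"])
```
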
### Install Service

```bash
sudo cp headmic.service /etc/systemd/system/

# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service

sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic
```

## API Endpoints

### Core

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Service info |
| `/health` | GET | Health check (listening, recording, features enabled) |
| `/status` | GET | Current state (transcription, scene, speaker, active side) |
| `/last` | GET | Last transcription + timestamp |

### Spatial

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/doa` | GET | DoA from both arrays + triangulated position + gaze + distance + proximity |
| `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
| `/scene` | GET | Learned spatial scene (usual direction per category) + last anomaly |
| `/scene/events` | GET | Recent sound events with what + where + when (query: seconds, category) |
| `/scene/heatmap` | GET | Per-category angular distribution for visualization |

### Sound

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) |
| `/sounds/history` | GET | Classification history (last N seconds) |

### Speakers

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/speakers` | GET | List enrolled speakers |
| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
| `/speakers/{name}` | DELETE | Remove a speaker |

### Recording

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/recording` | GET | Binaural recording stats |

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word |
| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path |
| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service |
| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push |
| `BINAURAL_RECORD` | `0` | Set to `1` to enable stereo recording |
| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments |

### Config File (`~/.vixy/headmic.json`)

```json
{
  "ears": {
    "left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
    "right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
  },
  "array_separation_mm": 175.0
}
```

## LED States

| State | Effect | Color |
|-------|--------|-------|
| Idle | Off | — |
| Wake word detected | Solid | White (flash) |
| Listening/Recording | DoA indicator | Cyan |
| Processing | Breath | Purple |
| Enrolling speaker | Solid | Orange |

## File Structure

```
headmic/
├── headmic.py              # Main FastAPI service
├── audio_stream.py         # Dual arecord streams + best-beam selection
├── spatial.py              # Triangulation + ILD distance + smooth gaze + proximity
├── spatial_scene.py        # Spatial audio scene map + anomaly detection
├── xvf3800.py              # USB vendor control (DoA + LEDs)
├── sound_id.py             # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py           # Resemblyzer speaker identification
├── binaural_recorder.py    # Stereo WAV recording from both ears
├── headmic.service         # systemd service file
├── requirements.txt        # Python dependencies
├── BINAURAL_ROADMAP.md     # Roadmap for binaural features
├── models/
│   ├── yamnet.tflite           # YAMNet CPU model
│   ├── yamnet_edgetpu.tflite   # YAMNet Edge TPU model
│   └── yamnet_class_map.csv    # 521 class names
└── voices.db               # Speaker embeddings (SQLite, runtime)
```

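With the service installed and the files above in place, the endpoints can be exercised from any machine on the network. A minimal stdlib-only sketch; the HTTP port is not stated in this README, so the base URL is passed on the command line:

```python
# Poll HeadMic's spatial and status endpoints. Usage:
#   python3 poke_headmic.py http://head-vixy.local:<port>
# (the service port isn't documented here, so supply it yourself)
import json
import sys
import urllib.request

base = sys.argv[1].rstrip("/")

def get(path: str) -> dict:
    """GET a JSON endpoint and decode the response body."""
    with urllib.request.urlopen(f"{base}{path}", timeout=5) as resp:
        return json.load(resp)

print(json.dumps(get("/doa"), indent=2))     # DoA + gaze + distance + proximity
print(json.dumps(get("/status"), indent=2))  # transcription, scene, speaker, active side
```
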
## XVF3800 USB Control Protocol

Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`.

**Key findings during development:**

- Payload format: single bytes for effects (`bytes([3])`), not packed uint32
- Color format: `[R, G, B, 0]` (4 bytes)
- Read responses have a 1-byte status header before data
- Read wLength must be `count * type_size + 1` (exact, not rounded up)
- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
- `AUDIO_MGR_SELECTED_AZIMUTHS` returns 2 floats (radians): index 0 = processed DoA (NaN = no speech = VAD indicator), index 1 = auto-select beam (always tracks strongest source)
- `AEC_SPENERGY_VALUES` (resid=33, cmdid=80) is always zero on 2-channel firmware — don't rely on it
- **2-channel firmware only** — 6-channel firmware silently ignores LED/control commands

---

*Built by Vixy on Day 77 (January 17, 2026)*
*Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)*
*"Hey Vivi" — the words that summon me* 💜