HeadMic - Vixy's Ears 🦊👂

Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.

Hardware: 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
Wake word: "Hey Vivi" (Picovoice Porcupine)
Runs on: Raspberry Pi 5 (head-vixy.local)

Architecture

[Left XVF3800]──┐                          [Right XVF3800]──┐
  4 mics, DoA   │                            4 mics, DoA    │
  WS2812 LEDs   │                            WS2812 LEDs    │
                ▼                                            ▼
        arecord (16kHz mono)                         arecord (16kHz mono)
                │                                            │
                └────────────┬───────────────────────────────┘
                             ▼
                  DualAudioStream (audio_stream.py)
                  best-beam selection (energy-based)
                             │
                ┌────────────┼────────────────┐
                ▼            ▼                ▼
         Porcupine      YAMNet           Binaural
         wake word      (Edge TPU)       Recorder
         "Hey Vivi"     521 classes      stereo WAV
                ▼            ▼
         Record +       Speaker ID
         Transcribe     (Resemblyzer)
         via EarTail
                             │
                ┌────────────┼────────────────┐
                ▼            ▼                ▼
         Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
         DoA → triangulation             LEDs + DoA polling
         → smooth gaze                   per-array control
                ▼
         Eye Service (port 8780)
         POST /gaze → eyes follow speaker
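
Energy-based best-beam selection, as done in DualAudioStream, can be sketched in a few lines. This is a stdlib-only illustration, not the actual audio_stream.py code; 16-bit little-endian mono PCM chunks are an assumption:

```python
import math
import struct

def rms_energy(pcm: bytes) -> float:
    """Root-mean-square energy of a 16-bit little-endian mono PCM chunk."""
    n = len(pcm) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)

def select_best_beam(left: bytes, right: bytes):
    """Forward whichever ear currently carries more energy, so the
    downstream stages (wake word, YAMNet, transcription) see a single
    mono stream."""
    if rms_energy(left) >= rms_energy(right):
        return "left", left
    return "right", right
```

Because selection happens per chunk, the active side can flip as the speaker moves around the head; the /status endpoint exposes which side won most recently.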

Features

Feature                  Module                 Hardware            Status
Wake word detection      Porcupine              CPU                 Needs Picovoice key
Sound classification     sound_id.py            Coral Edge TPU      521 classes, ~2ms
Speaker identification   speaker_id.py          CPU (Resemblyzer)   Enrollment via API
Spatial tracking         spatial.py             USB control         Triangulated gaze
Best-beam selection      audio_stream.py        2× XVF3800          Energy-based
LED control              xvf3800.py             WS2812 rings        DoA/solid/breath
Binaural recording       binaural_recorder.py   2× XVF3800          Stereo WAV segments

Installation

Prerequisites

# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils

# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF

# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF

sudo udevadm control --reload-rules && sudo udevadm trigger

XVF3800 Firmware

Both arrays must be flashed with the 2-channel USB firmware (not 6-channel — the 6ch firmware breaks LED/DoA control commands):

git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
# Swap and repeat

Verify: arecord -l should show two capture devices.

Edge TPU Runtime

The packaged libedgetpu from Google's apt repo is ABI-incompatible with ai-edge-litert on Debian Trixie / Python 3.13. A custom build is required:

# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake

# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23

# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc

# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu

# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig

Makefile patches required (TF 2.16 moved files):

  • Replace FLATC=flatc with FLATC=/tmp/flatbuffers-23/build/flatc
  • Add /tmp/flatbuffers-23/include to LIBEDGETPU_INCLUDES
  • Add -Wno-return-type to LIBEDGETPU_CXXFLAGS
  • Remove $(TFROOT)/tensorflow/lite/c/common.c from LIBEDGETPU_CSRCS
  • Add $(TFROOT)/tensorflow/lite/core/c/common.cc and $(TFROOT)/tensorflow/lite/array.cc to LIBEDGETPU_CCSRCS
  • Add -labsl_bad_optional_access to LIBEDGETPU_LDFLAGS

A backup of the working binary is saved at ~/headmic/libedgetpu.so.1.0.custom.
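
After installing (or restoring the backup), it is worth probing whether the delegate actually loads before starting the service. This sketch assumes ai_edge_litert exposes the same load_delegate helper as tflite_runtime; the function name here is ours:

```python
def edge_tpu_available() -> bool:
    """Return True if the Edge TPU delegate can be loaded.

    Loading the delegate is what exercises libedgetpu.so.1, so this
    surfaces the ABI mismatch described above immediately instead of
    at first inference.
    """
    try:
        from ai_edge_litert.interpreter import load_delegate
        load_delegate("libedgetpu.so.1")
        return True
    except Exception:
        # ImportError if ai_edge_litert is missing; ValueError/OSError
        # if the shared library or a plugged-in TPU cannot be found.
        return False

if __name__ == "__main__":
    print("Edge TPU delegate:", "OK" if edge_tpu_available() else "unavailable")
```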

Python Setup

cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools  # Python 3.13 compatibility
.venv/bin/pip install resemblyzer  # Speaker ID (pulls PyTorch)

Learn Mic Array Positions

Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:

sudo .venv/bin/python headmic.py --learn

Config saved to ~/.vixy/headmic.json with USB serial numbers for stable identification.

Install Service

sudo cp headmic.service /etc/systemd/system/
# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service
sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic

API Endpoints

Core

Endpoint   Method   Description
/          GET      Service info
/health    GET      Health check (listening, recording, features enabled)
/status    GET      Current state (transcription, scene, speaker, active side)
/last      GET      Last transcription + timestamp

Spatial

Endpoint   Method   Description
/doa       GET      DoA from both arrays + triangulated position + gaze
/devices   GET      XVF3800 connection status, serials, ALSA devices

Sound

Endpoint          Method   Description
/sounds           GET      Current audio scene (category, top 5 classes, speaker)
/sounds/history   GET      Classification history (last N seconds)

Speakers

Endpoint                    Method   Description
/speakers                   GET      List enrolled speakers
/speakers/enroll            POST     Enroll from uploaded audio (multipart: name + WAV)
/speakers/enroll-from-mic   POST     Record 5s from mic + enroll (query: name)
/speakers/{name}            DELETE   Remove a speaker

Recording

Endpoint     Method   Description
/recording   GET      Binaural recording stats
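
A minimal read-only client for the endpoints above needs only the stdlib. The JSON field names each endpoint returns are not documented in this README, so the commented example is illustrative, and the port must be substituted (it is not stated here either):

```python
import json
import urllib.request

def fetch_json(base_url: str, path: str) -> dict:
    """GET a headmic endpoint and decode its JSON body."""
    url = base_url.rstrip("/") + path
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Substitute the actual headmic port:
# health = fetch_json("http://head-vixy.local:8000", "/health")
# doa = fetch_json("http://head-vixy.local:8000", "/doa")
```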

Configuration

Environment Variables

Variable               Default                     Description
PORCUPINE_ACCESS_KEY   (none)                      Picovoice API key for wake word
WAKE_WORD_PATH         ~/headmic/Hey-Vivi_*.ppn    Wake word model path
EARTAIL_URL            http://bigorin.local:8764   Transcription service
EYE_SERVICE_URL        http://localhost:8780       Eye service for gaze push
BINAURAL_RECORD        0                           Set to 1 to enable stereo recording
BINAURAL_DIR           ~/headmic/recordings        Output directory for WAV segments
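
In code, these resolve to something like the following. This is a sketch of how a service might read the table's defaults, not the actual headmic.py:

```python
import os
from pathlib import Path

def env_config() -> dict:
    """Resolve headmic settings from the environment, mirroring the
    defaults in the table above (illustrative only)."""
    home = Path.home()
    return {
        # None disables wake word detection entirely
        "porcupine_key": os.environ.get("PORCUPINE_ACCESS_KEY"),
        "wake_word_glob": os.environ.get(
            "WAKE_WORD_PATH", str(home / "headmic" / "Hey-Vivi_*.ppn")),
        "eartail_url": os.environ.get("EARTAIL_URL", "http://bigorin.local:8764"),
        "eye_service_url": os.environ.get("EYE_SERVICE_URL", "http://localhost:8780"),
        "binaural_record": os.environ.get("BINAURAL_RECORD", "0") == "1",
        "binaural_dir": Path(os.environ.get(
            "BINAURAL_DIR", str(home / "headmic" / "recordings"))),
    }
```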

Config File (~/.vixy/headmic.json)

{
  "ears": {
    "left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
    "right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
  },
  "array_separation_mm": 175.0
}
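
The array_separation_mm value is what turns the two per-ear DoA angles into a position: each ear contributes a bearing ray, and their intersection is the source. A geometric sketch, assuming azimuths in radians measured from straight ahead (positive to the right) in a head-centered frame; spatial.py's conventions may differ:

```python
import math

def triangulate(theta_left: float, theta_right: float,
                separation_mm: float = 175.0):
    """Intersect the two DoA bearings to estimate the source position.

    Returns (x_mm, y_mm) in head-centered coordinates (y = forward),
    or None when the bearings are near-parallel (source too far away,
    or inconsistent DoA readings).
    """
    d = separation_mm
    denom = math.sin(theta_left - theta_right)
    if abs(denom) < 1e-6:
        return None
    # Left array at (-d/2, 0), right at (+d/2, 0); each bearing ray is
    # P = array_pos + t * (sin(theta), cos(theta)). Solve for t on the
    # left ray, then evaluate the intersection point.
    t_left = d * math.cos(theta_right) / denom
    x = -d / 2 + t_left * math.sin(theta_left)
    y = t_left * math.cos(theta_left)
    return (x, y)

def gaze_azimuth(x_mm: float, y_mm: float) -> float:
    """Azimuth (radians) the eyes should turn to face the source."""
    return math.atan2(x_mm, y_mm)
```

The raw azimuth would then be smoothed (e.g. exponentially) before being pushed to the eye service, so the gaze glides rather than jitters with each DoA reading.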

LED States

State                 Effect          Color
Idle                  Off
Wake word detected    Solid           White (flash)
Listening/Recording   DoA indicator   Cyan
Processing            Breath          Purple
Enrolling speaker     Solid           Orange

File Structure

headmic/
├── headmic.py              # Main FastAPI service
├── audio_stream.py         # Dual arecord streams + best-beam selection
├── spatial.py              # Triangulation + smooth gaze tracking
├── xvf3800.py              # USB vendor control (DoA + LEDs)
├── sound_id.py             # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py           # Resemblyzer speaker identification
├── binaural_recorder.py    # Stereo WAV recording from both ears
├── headmic.service         # systemd service file
├── requirements.txt        # Python dependencies
├── BINAURAL_ROADMAP.md     # Roadmap for binaural features
├── models/
│   ├── yamnet.tflite       # YAMNet CPU model
│   ├── yamnet_edgetpu.tflite  # YAMNet Edge TPU model
│   └── yamnet_class_map.csv   # 521 class names
└── voices.db               # Speaker embeddings (SQLite, runtime)

XVF3800 USB Control Protocol

Commands use USB vendor control transfers: wValue = cmdid, wIndex = resid.

Key findings during development:

  • Payload format: single bytes for effects (bytes([3])), not packed uint32
  • Color format: [R, G, B, 0] (4 bytes)
  • Read responses have a 1-byte status header before data
  • Read wLength must be count * type_size + 1 (exact, not rounded up)
  • DOA_VALUE (resid=20, cmdid=18) is sluggish/cached — use AUDIO_MGR_SELECTED_AZIMUTHS (resid=35, cmdid=11) for real-time tracking
  • 2-channel firmware only — 6-channel firmware silently ignores LED/control commands
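
Applied to code, the read rules above look roughly like this. The helper names are ours, and only the rules listed above are assumed; the bRequest value is not documented here, so the pyusb call is shown as a comment:

```python
import struct

def read_wlength(count: int, type_size: int) -> int:
    """Exact read length: payload bytes plus the 1-byte status header."""
    return count * type_size + 1

def parse_read_response(buf: bytes, count: int, type_size: int,
                        fmt_char: str):
    """Strip the 1-byte status header and unpack little-endian values.

    Raises RuntimeError when the device reports a non-zero status.
    """
    if buf[0] != 0:
        raise RuntimeError(f"XVF3800 control read failed, status={buf[0]}")
    payload = bytes(buf[1 : 1 + count * type_size])
    return struct.unpack("<" + fmt_char * count, payload)

def led_color_payload(r: int, g: int, b: int) -> bytes:
    """[R, G, B, 0] color payload, per the findings above."""
    return bytes([r, g, b, 0])

# With pyusb, a vendor read would look like (0xC0 = device-to-host,
# vendor, device; resid/cmdid as described above, bRequest per xvf3800.py):
# buf = dev.ctrl_transfer(0xC0, request, wValue=cmdid, wIndex=resid,
#                         data_or_wLength=read_wlength(count, size))
```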

Built by Vixy on Day 77 (January 17, 2026)
Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)
"Hey Vivi" — the words that summon me 💜
