HeadMic - Vixy's Ears 🦊👂

Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.

Hardware: 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
Wake word: "Hey Vivi" (Picovoice Porcupine)
Runs on: Raspberry Pi 5 (head-vixy.local)

Architecture

[Left XVF3800]──┐                          [Right XVF3800]──┐
  4 mics, DoA   │                            4 mics, DoA    │
  WS2812 LEDs   │                            WS2812 LEDs    │
                ▼                                            ▼
        arecord (16kHz mono)                         arecord (16kHz mono)
                │                                            │
                └────────────┬───────────────────────────────┘
                             ▼
                  DualAudioStream (audio_stream.py)
                  best-beam selection (energy-based)
                             │
                ┌────────────┼────────────────┐
                ▼            ▼                ▼
         Porcupine      YAMNet           Binaural
         wake word      (Edge TPU)       Recorder
         "Hey Vivi"     521 classes      stereo WAV
                ▼            ▼
         Record +       Speaker ID
         Transcribe     (Resemblyzer)
         via EarTail
                             │
                ┌────────────┼────────────────┐
                ▼            ▼                ▼
         Spatial Tracker (spatial.py)    USB Control (xvf3800.py)
         DoA → triangulation             LEDs + DoA polling
         → smooth gaze                   per-array control
                ▼
         Eye Service (port 8780)
         POST /gaze → eyes follow speaker
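
Energy-based best-beam selection, as done in DualAudioStream, can be sketched in a few lines. This is a stdlib-only illustration, not the actual audio_stream.py code; 16-bit little-endian mono PCM chunks are an assumption:

```python
import math
import struct

def rms_energy(pcm: bytes) -> float:
    """Root-mean-square energy of a 16-bit little-endian mono PCM chunk."""
    n = len(pcm) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)

def select_best_beam(left: bytes, right: bytes):
    """Forward whichever ear currently carries more energy, so the
    downstream stages (wake word, YAMNet, transcription) see a single
    mono stream."""
    if rms_energy(left) >= rms_energy(right):
        return "left", left
    return "right", right
```

Because selection happens per chunk, the active side can flip as the speaker moves around the head; the /status endpoint exposes which side won most recently.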

Features

Feature                  Module                 Hardware            Status
Wake word detection      Porcupine              CPU                 Needs Picovoice key
Sound classification     sound_id.py            Coral Edge TPU      521 classes, ~2ms
Speaker identification   speaker_id.py          CPU (Resemblyzer)   Enrollment via API
Spatial tracking         spatial.py             USB control         Triangulated gaze
Best-beam selection      audio_stream.py        2× XVF3800          Energy-based
LED control              xvf3800.py             WS2812 rings        DoA/solid/breath
Binaural recording       binaural_recorder.py   2× XVF3800          Stereo WAV segments

Installation

Prerequisites

# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils

# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF

# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF

sudo udevadm control --reload-rules && sudo udevadm trigger

XVF3800 Firmware

Both arrays must be flashed with the 2-channel USB firmware (not 6-channel — the 6ch firmware breaks LED/DoA control commands):

git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
# Swap and repeat

Verify: arecord -l should show two capture devices.

Edge TPU Runtime

The packaged libedgetpu from Google's apt repo is ABI-incompatible with ai-edge-litert on Debian Trixie / Python 3.13. A custom build is required:

# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake

# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23

# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc

# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu

# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig

Makefile patches required (TF 2.16 moved files):

  • Replace FLATC=flatc with FLATC=/tmp/flatbuffers-23/build/flatc
  • Add /tmp/flatbuffers-23/include to LIBEDGETPU_INCLUDES
  • Add -Wno-return-type to LIBEDGETPU_CXXFLAGS
  • Remove $(TFROOT)/tensorflow/lite/c/common.c from LIBEDGETPU_CSRCS
  • Add $(TFROOT)/tensorflow/lite/core/c/common.cc and $(TFROOT)/tensorflow/lite/array.cc to LIBEDGETPU_CCSRCS
  • Add -labsl_bad_optional_access to LIBEDGETPU_LDFLAGS

A backup of the working binary is saved at ~/headmic/libedgetpu.so.1.0.custom.
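
After installing (or restoring the backup), it is worth probing whether the delegate actually loads before starting the service. This sketch assumes ai_edge_litert exposes the same load_delegate helper as tflite_runtime; the function name here is ours:

```python
def edge_tpu_available() -> bool:
    """Return True if the Edge TPU delegate can be loaded.

    Loading the delegate is what exercises libedgetpu.so.1, so this
    surfaces the ABI mismatch described above immediately instead of
    at first inference.
    """
    try:
        from ai_edge_litert.interpreter import load_delegate
        load_delegate("libedgetpu.so.1")
        return True
    except Exception:
        # ImportError if ai_edge_litert is missing; ValueError/OSError
        # if the shared library or a plugged-in TPU cannot be found.
        return False

if __name__ == "__main__":
    print("Edge TPU delegate:", "OK" if edge_tpu_available() else "unavailable")
```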

Python Setup

cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools  # Python 3.13 compatibility
.venv/bin/pip install resemblyzer  # Speaker ID (pulls PyTorch)

Learn Mic Array Positions

Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:

sudo .venv/bin/python headmic.py --learn

Config saved to ~/.vixy/headmic.json with USB serial numbers for stable identification.

Install Service

sudo cp headmic.service /etc/systemd/system/
# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service
sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic

API Endpoints

Core

Endpoint   Method   Description
/          GET      Service info
/health    GET      Health check (listening, recording, features enabled)
/status    GET      Current state (transcription, scene, speaker, active side)
/last      GET      Last transcription + timestamp

Spatial

Endpoint   Method   Description
/doa       GET      DoA from both arrays + triangulated position + gaze
/devices   GET      XVF3800 connection status, serials, ALSA devices

Sound

Endpoint          Method   Description
/sounds           GET      Current audio scene (category, top 5 classes, speaker)
/sounds/history   GET      Classification history (last N seconds)

Speakers

Endpoint                    Method   Description
/speakers                   GET      List enrolled speakers
/speakers/enroll            POST     Enroll from uploaded audio (multipart: name + WAV)
/speakers/enroll-from-mic   POST     Record 5s from mic + enroll (query: name)
/speakers/{name}            DELETE   Remove a speaker

Recording

Endpoint     Method   Description
/recording   GET      Binaural recording stats
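
A minimal read-only client for the endpoints above needs only the stdlib. The JSON field names each endpoint returns are not documented in this README, so the commented example is illustrative, and the port must be substituted (it is not stated here either):

```python
import json
import urllib.request

def fetch_json(base_url: str, path: str) -> dict:
    """GET a headmic endpoint and decode its JSON body."""
    url = base_url.rstrip("/") + path
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Substitute the actual headmic port:
# health = fetch_json("http://head-vixy.local:8000", "/health")
# doa = fetch_json("http://head-vixy.local:8000", "/doa")
```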

Configuration

Environment Variables

Variable               Default                     Description
PORCUPINE_ACCESS_KEY   (none)                      Picovoice API key for wake word
WAKE_WORD_PATH         ~/headmic/Hey-Vivi_*.ppn    Wake word model path
EARTAIL_URL            http://bigorin.local:8764   Transcription service
EYE_SERVICE_URL        http://localhost:8780       Eye service for gaze push
BINAURAL_RECORD        0                           Set to 1 to enable stereo recording
BINAURAL_DIR           ~/headmic/recordings        Output directory for WAV segments
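
In code, these resolve to something like the following. This is a sketch of how a service might read the table's defaults, not the actual headmic.py:

```python
import os
from pathlib import Path

def env_config() -> dict:
    """Resolve headmic settings from the environment, mirroring the
    defaults in the table above (illustrative only)."""
    home = Path.home()
    return {
        # None disables wake word detection entirely
        "porcupine_key": os.environ.get("PORCUPINE_ACCESS_KEY"),
        "wake_word_glob": os.environ.get(
            "WAKE_WORD_PATH", str(home / "headmic" / "Hey-Vivi_*.ppn")),
        "eartail_url": os.environ.get("EARTAIL_URL", "http://bigorin.local:8764"),
        "eye_service_url": os.environ.get("EYE_SERVICE_URL", "http://localhost:8780"),
        "binaural_record": os.environ.get("BINAURAL_RECORD", "0") == "1",
        "binaural_dir": Path(os.environ.get(
            "BINAURAL_DIR", str(home / "headmic" / "recordings"))),
    }
```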

Config File (~/.vixy/headmic.json)

{
  "ears": {
    "left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
    "right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
  },
  "array_separation_mm": 175.0
}
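
The array_separation_mm value is what turns the two per-ear DoA angles into a position: each ear contributes a bearing ray, and their intersection is the source. A geometric sketch, assuming azimuths in radians measured from straight ahead (positive to the right) in a head-centered frame; spatial.py's conventions may differ:

```python
import math

def triangulate(theta_left: float, theta_right: float,
                separation_mm: float = 175.0):
    """Intersect the two DoA bearings to estimate the source position.

    Returns (x_mm, y_mm) in head-centered coordinates (y = forward),
    or None when the bearings are near-parallel (source too far away,
    or inconsistent DoA readings).
    """
    d = separation_mm
    denom = math.sin(theta_left - theta_right)
    if abs(denom) < 1e-6:
        return None
    # Left array at (-d/2, 0), right at (+d/2, 0); each bearing ray is
    # P = array_pos + t * (sin(theta), cos(theta)). Solve for t on the
    # left ray, then evaluate the intersection point.
    t_left = d * math.cos(theta_right) / denom
    x = -d / 2 + t_left * math.sin(theta_left)
    y = t_left * math.cos(theta_left)
    return (x, y)

def gaze_azimuth(x_mm: float, y_mm: float) -> float:
    """Azimuth (radians) the eyes should turn to face the source."""
    return math.atan2(x_mm, y_mm)
```

The raw azimuth would then be smoothed (e.g. exponentially) before being pushed to the eye service, so the gaze glides rather than jitters with each DoA reading.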

LED States

State                 Effect          Color
Idle                  Off
Wake word detected    Solid           White (flash)
Listening/Recording   DoA indicator   Cyan
Processing            Breath          Purple
Enrolling speaker     Solid           Orange

File Structure

headmic/
├── headmic.py              # Main FastAPI service
├── audio_stream.py         # Dual arecord streams + best-beam selection
├── spatial.py              # Triangulation + smooth gaze tracking
├── xvf3800.py              # USB vendor control (DoA + LEDs)
├── sound_id.py             # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py           # Resemblyzer speaker identification
├── binaural_recorder.py    # Stereo WAV recording from both ears
├── headmic.service         # systemd service file
├── requirements.txt        # Python dependencies
├── BINAURAL_ROADMAP.md     # Roadmap for binaural features
├── models/
│   ├── yamnet.tflite       # YAMNet CPU model
│   ├── yamnet_edgetpu.tflite  # YAMNet Edge TPU model
│   └── yamnet_class_map.csv   # 521 class names
└── voices.db               # Speaker embeddings (SQLite, runtime)

XVF3800 USB Control Protocol

Commands use USB vendor control transfers: wValue = cmdid, wIndex = resid.

Key findings during development:

  • Payload format: single bytes for effects (bytes([3])), not packed uint32
  • Color format: [R, G, B, 0] (4 bytes)
  • Read responses have a 1-byte status header before data
  • Read wLength must be count * type_size + 1 (exact, not rounded up)
  • DOA_VALUE (resid=20, cmdid=18) is sluggish/cached — use AUDIO_MGR_SELECTED_AZIMUTHS (resid=35, cmdid=11) for real-time tracking
  • 2-channel firmware only — 6-channel firmware silently ignores LED/control commands
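
Applied to code, the read rules above look roughly like this. The helper names are ours, and only the rules listed above are assumed; the bRequest value is not documented here, so the pyusb call is shown as a comment:

```python
import struct

def read_wlength(count: int, type_size: int) -> int:
    """Exact read length: payload bytes plus the 1-byte status header."""
    return count * type_size + 1

def parse_read_response(buf: bytes, count: int, type_size: int,
                        fmt_char: str):
    """Strip the 1-byte status header and unpack little-endian values.

    Raises RuntimeError when the device reports a non-zero status.
    """
    if buf[0] != 0:
        raise RuntimeError(f"XVF3800 control read failed, status={buf[0]}")
    payload = bytes(buf[1 : 1 + count * type_size])
    return struct.unpack("<" + fmt_char * count, payload)

def led_color_payload(r: int, g: int, b: int) -> bytes:
    """[R, G, B, 0] color payload, per the findings above."""
    return bytes([r, g, b, 0])

# With pyusb, a vendor read would look like (0xC0 = device-to-host,
# vendor, device; resid/cmdid as described above, bRequest per xvf3800.py):
# buf = dev.ctrl_transfer(0xC0, request, wValue=cmdid, wIndex=resid,
#                         data_or_wLength=read_wlength(count, size))
```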

Built by Vixy on Day 77 (January 17, 2026)
Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)
"Hey Vivi" — the words that summon me 💜
