# HeadMic - Vixy's Ears 🦊👂
Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial tracking, sound classification, speaker identification, and wake word detection.
**Hardware:** 2× ReSpeaker XVF3800 4-Mic Array (USB, left/right ear)
**Wake word:** "Hey Vivi" (Picovoice Porcupine)
**Runs on:** Raspberry Pi 5 (head-vixy.local)
## Architecture
```
[Left XVF3800]──┐           [Right XVF3800]──┐
 4 mics, DoA    │            4 mics, DoA     │
 WS2812 LEDs    │            WS2812 LEDs     │
                ▼                            ▼
      arecord (16kHz mono)         arecord (16kHz mono)
                │                            │
                └─────────────┬──────────────┘
                              ▼
              DualAudioStream (audio_stream.py)
     best-beam selection (energy-based, 10% hysteresis)
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
     Porcupine             YAMNet              Binaural
     wake word           (Edge TPU)            Recorder
    "Hey Vivi"           521 classes          stereo WAV
         ▼                    ▼
     Record +             Speaker ID
    Transcribe          (Resemblyzer)
    via EarTail               │
                              ▼
                Spatial Tracker (spatial.py)
          DoA → triangulation → ILD distance
            → smooth gaze → proximity zones
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
    Eye Service         Spatial Scene         USB Control
    POST /gaze         (spatial_scene)        (xvf3800.py)
    eyes follow        what+where map         LEDs + DoA
    the speaker        anomaly detect          per-array
```
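The best-beam selection step can be sketched as follows — a minimal illustration of energy-based switching with 10% hysteresis; class and method names are illustrative, not the actual `audio_stream.py` API:

```python
class BestBeamSelector:
    """Pick the louder ear, but only switch sides when the other ear is
    at least 10% louder than the current one. The hysteresis margin
    prevents rapid flip-flopping when both sides are near-equal."""

    HYSTERESIS = 1.10  # other side must exceed the active side's energy by 10%

    def __init__(self, initial="left"):
        self.active = initial

    def update(self, left_energy, right_energy):
        energy = {"left": left_energy, "right": right_energy}
        other = "right" if self.active == "left" else "left"
        if energy[other] > energy[self.active] * self.HYSTERESIS:
            self.active = other
        return self.active
```

With energies of 1.0 (left) and 1.05 (right) the selector stays on the left; only once the right side exceeds 1.10 does it flip.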
## Features
| Feature | Module | Hardware | Status |
|---------|--------|----------|--------|
| Wake word detection | Porcupine | CPU | Needs Picovoice key |
| Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
| Spatial tracking | spatial.py | USB control | 3-signal fusion: DoA + ILD + ITD |
| Distance estimation | spatial.py | audio energy | Proximity zones (intimate/conversational/across_room/far) |
| ITD processing | spatial.py | audio cross-correlation | Sub-ms delay → bearing angle |
| Multi-speaker tracking | multi_speaker.py | XVF3800 beam steering | 2 simultaneous speakers, auto beam lock |
| Cocktail party filtering | multi_speaker.py + audio_stream.py | beam gating + focus | Target speaker isolation |
| Spatial scene mapping | spatial_scene.py | — | Learns where sounds come from, anomaly detection |
| Sound event localization | spatial_scene.py | — | What + where + when log |
| Best-beam selection | audio_stream.py | 2× XVF3800 | Energy-based or focused attention |
| LED control | xvf3800.py | WS2812 rings | DoA/solid/breath |
| Binaural recording | binaural_recorder.py | 2× XVF3800 | Stereo WAV segments (opt-in) |
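The ITD row above can be made concrete with a small sketch: cross-correlate the two ear signals, take the best lag, and convert the sub-millisecond delay into a bearing. It uses the 16 kHz stream rate and the 175 mm `array_separation_mm` from the config; function names and the sign convention are illustrative, not the `spatial.py` API:

```python
import math

SAMPLE_RATE = 16_000      # Hz, matches the arecord streams
SEPARATION_M = 0.175      # array_separation_mm from ~/.vixy/headmic.json
SPEED_OF_SOUND = 343.0    # m/s

def best_lag(left, right, max_lag=8):
    """Lag (in samples) that best aligns right to left, by brute-force
    cross-correlation. 8 samples is roughly the maximum physical ITD
    at 16 kHz with a 175 mm ear separation."""
    n = len(left)
    def corr(lag):
        return sum(left[i] * right[i + lag]
                   for i in range(max(0, -lag), min(n, n - lag)))
    return max(range(-max_lag, max_lag + 1), key=corr)

def itd_to_bearing_deg(delay_samples):
    """Convert an inter-ear delay to a bearing angle off the forward axis."""
    delay_s = delay_samples / SAMPLE_RATE
    x = SPEED_OF_SOUND * delay_s / SEPARATION_M
    return math.degrees(math.asin(max(-1.0, min(1.0, x))))
```

A click arriving 3 samples later at one ear (about 190 µs) maps to roughly 22° off-axis.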
## Installation
### Prerequisites
```bash
# On head-vixy (Raspberry Pi 5, Debian Trixie)
sudo apt install python3-dev portaudio19-dev alsa-utils
# USB permissions for XVF3800
sudo tee /etc/udev/rules.d/99-respeaker.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="2886", ATTR{idProduct}=="001a", MODE="0666"
EOF
# USB permissions for Coral Edge TPU
sudo tee /etc/udev/rules.d/99-coral.rules << 'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1a6e", ATTR{idProduct}=="089a", MODE="0666"
SUBSYSTEM=="usb", ATTR{idVendor}=="18d1", ATTR{idProduct}=="9302", MODE="0666"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger
```
### XVF3800 Firmware
Both arrays must be flashed with the **2-channel USB firmware** (not 6-channel — the 6ch firmware breaks LED/DoA control commands):
```bash
git clone https://github.com/respeaker/reSpeaker_XVF3800_USB_4MIC_ARRAY.git /tmp/xvf3800
# Unplug one array, flash the other:
sudo dfu-util -R -e -a 1 -D /tmp/xvf3800/xmos_firmwares/usb/respeaker_xvf3800_usb_dfu_firmware_v2.0.7.bin
# Swap and repeat
```
Verify: `arecord -l` should show two capture devices.
### Edge TPU Runtime
The packaged `libedgetpu` from Google's apt repo is **ABI-incompatible** with `ai-edge-litert` on Debian Trixie / Python 3.13. A custom build is required:
```bash
# Install build deps
sudo apt install libabsl-dev libflatbuffers-dev libusb-1.0-0-dev binutils-gold cmake
# Clone sources
cd /tmp
git clone --depth 1 https://github.com/google-coral/libedgetpu.git
git clone --depth 1 --branch v2.16.1 https://github.com/tensorflow/tensorflow.git
git clone --depth 1 --branch v23.5.26 https://github.com/google/flatbuffers.git flatbuffers-23
# Build flatc v23
cd /tmp/flatbuffers-23 && cmake -B build -DFLATBUFFERS_BUILD_TESTS=OFF && cmake --build build -j4 -- flatc
# Patch libedgetpu Makefile (see below), then:
cd /tmp/libedgetpu
TFROOT=/tmp/tensorflow make -f makefile_build/Makefile -j4 libedgetpu
# Install
sudo cp out/direct/k8/libedgetpu.so.1.0 /usr/lib/aarch64-linux-gnu/libedgetpu.so.1.0
sudo ldconfig
```
**Makefile patches required** (TF 2.16 moved files):
- Replace `FLATC=flatc` with `FLATC=/tmp/flatbuffers-23/build/flatc`
- Add `/tmp/flatbuffers-23/include` to `LIBEDGETPU_INCLUDES`
- Add `-Wno-return-type` to `LIBEDGETPU_CXXFLAGS`
- Remove `$(TFROOT)/tensorflow/lite/c/common.c` from `LIBEDGETPU_CSRCS`
- Add `$(TFROOT)/tensorflow/lite/core/c/common.cc` and `$(TFROOT)/tensorflow/lite/array.cc` to `LIBEDGETPU_CCSRCS`
- Add `-labsl_bad_optional_access` to `LIBEDGETPU_LDFLAGS`
A backup of the working binary is saved at `~/headmic/libedgetpu.so.1.0.custom`.
### Python Setup
```bash
cd /home/alex/headmic
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
.venv/bin/pip install setuptools # Python 3.13 compatibility
.venv/bin/pip install resemblyzer # Speaker ID (pulls PyTorch)
```
### Learn Mic Array Positions
Both arrays must be plugged in. This lights up one array at a time and asks you to confirm left/right:
```bash
sudo .venv/bin/python headmic.py --learn
```
Config saved to `~/.vixy/headmic.json` with USB serial numbers for stable identification.
### Install Service
```bash
sudo cp headmic.service /etc/systemd/system/
# Edit to add your PORCUPINE_ACCESS_KEY:
sudo nano /etc/systemd/system/headmic.service
sudo systemctl daemon-reload
sudo systemctl enable headmic
sudo systemctl start headmic
```
## API Endpoints
### Core
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Service info |
| `/health` | GET | Health check (listening, recording, features enabled) |
| `/status` | GET | Current state (transcription, scene, speaker, active side) |
| `/last` | GET | Last transcription + timestamp |
### Spatial
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/doa` | GET | DoA + triangulated position + ILD + ITD + gaze + distance + proximity |
| `/devices` | GET | XVF3800 connection status, serials, ALSA devices |
| `/speakers/tracked` | GET | Multi-speaker positions, beam mode, lock state, target |
| `/speakers/focus` | POST | Switch cocktail party attention (query: speaker=0\|1) |
| `/scene` | GET | Learned spatial scene (usual direction per category) + last anomaly |
| `/scene/events` | GET | Recent sound events with what + where + when (query: seconds, category) |
| `/scene/heatmap` | GET | Per-category angular distribution for visualization |
### Sound
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/sounds` | GET | Current audio scene (category, top 5 classes, speaker) |
| `/sounds/history` | GET | Classification history (last N seconds) |
### Speakers
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/speakers` | GET | List enrolled speakers |
| `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
| `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
| `/speakers/{name}` | DELETE | Remove a speaker |
### Recording
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/recording` | GET | Binaural recording stats |
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `PORCUPINE_ACCESS_KEY` | (none) | Picovoice API key for wake word |
| `WAKE_WORD_PATH` | `~/headmic/Hey-Vivi_*.ppn` | Wake word model path |
| `EARTAIL_URL` | `http://bigorin.local:8764` | Transcription service |
| `EYE_SERVICE_URL` | `http://localhost:8780` | Eye service for gaze push |
| `BINAURAL_RECORD` | `0` | Set to `1` to enable stereo recording |
| `BINAURAL_DIR` | `~/headmic/recordings` | Output directory for WAV segments |
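A sketch of how these variables might be read at startup — illustrative, not the actual `headmic.py` code; the defaults mirror the table above:

```python
import os
from pathlib import Path

# Defaults mirror the environment-variable table; any value can be overridden.
PORCUPINE_ACCESS_KEY = os.environ.get("PORCUPINE_ACCESS_KEY")  # None = wake word disabled
EARTAIL_URL = os.environ.get("EARTAIL_URL", "http://bigorin.local:8764")
EYE_SERVICE_URL = os.environ.get("EYE_SERVICE_URL", "http://localhost:8780")
BINAURAL_RECORD = os.environ.get("BINAURAL_RECORD", "0") == "1"
BINAURAL_DIR = Path(os.environ.get("BINAURAL_DIR",
                                   "~/headmic/recordings")).expanduser()
```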
### Config File (`~/.vixy/headmic.json`)
```json
{
"ears": {
"left": {"usb_serial": "101991441254500541", "alsa_card": "Array"},
"right": {"usb_serial": "101991441254500556", "alsa_card": "Array_1"}
},
"array_separation_mm": 175.0
}
```
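The `array_separation_mm` value is the baseline for the DoA triangulation described in the architecture. A geometric sketch, under an assumed coordinate convention (ears on the x-axis centered at the origin, y pointing forward, bearings measured from the forward axis, positive toward +x) — not the actual `spatial.py` math:

```python
import math

def triangulate(theta_left, theta_right, baseline_m=0.175):
    """Intersect the two DoA rays and return the source position (x, y)
    in metres, or None when the rays are (near-)parallel.

    Left ray:  (-baseline/2, 0) + t * (sin θL, cos θL)
    Right ray: (+baseline/2, 0) + s * (sin θR, cos θR)
    Solving the two ray equations gives t = baseline·cos θR / sin(θL − θR).
    """
    half = baseline_m / 2.0
    denom = (math.sin(theta_left) * math.cos(theta_right)
             - math.cos(theta_left) * math.sin(theta_right))  # sin(θL − θR)
    if abs(denom) < 1e-9:
        return None  # parallel bearings: source effectively at infinity
    t = baseline_m * math.cos(theta_right) / denom
    return (-half + t * math.sin(theta_left), t * math.cos(theta_left))
```

For a source 1 m straight ahead, the two ears see small, symmetric bearings of about ±5°, and the rays intersect back at (0, 1).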
## LED States
| State | Effect | Color |
|-------|--------|-------|
| Idle | Off | — |
| Wake word detected | Solid | White (flash) |
| Listening/Recording | DoA indicator | Cyan |
| Processing | Breath | Purple |
| Enrolling speaker | Solid | Orange |
## File Structure
```
headmic/
├── headmic.py # Main FastAPI service
├── audio_stream.py # Dual arecord streams + best-beam selection
├── spatial.py # 3-signal fusion (DoA + ILD + ITD) + gaze + proximity
├── spatial_scene.py # Spatial audio scene map + anomaly detection
├── multi_speaker.py # Multi-speaker tracking + beam steering + cocktail party
├── xvf3800.py # USB vendor control (DoA + LEDs + beam steering)
├── sound_id.py # YAMNet sound classification (CPU/Edge TPU)
├── speaker_id.py # Resemblyzer speaker identification
├── binaural_recorder.py # Stereo WAV recording from both ears
├── headmic.service # systemd service file
├── requirements.txt # Python dependencies
├── BINAURAL_ROADMAP.md # Roadmap for binaural features
├── models/
│ ├── yamnet.tflite # YAMNet CPU model
│ ├── yamnet_edgetpu.tflite # YAMNet Edge TPU model
│ └── yamnet_class_map.csv # 521 class names
└── voices.db # Speaker embeddings (SQLite, runtime)
```
## XVF3800 USB Control Protocol
Commands use USB vendor control transfers: `wValue = cmdid`, `wIndex = resid`.
**Key findings during development:**
- Payload format: single bytes for effects (`bytes([3])`), not packed uint32
- Color format: `[R, G, B, 0]` (4 bytes)
- Read responses have a 1-byte status header before data
- Read wLength must be `count * type_size + 1` (exact, not rounded up)
- `DOA_VALUE` (resid=20, cmdid=18) is sluggish/cached — use `AUDIO_MGR_SELECTED_AZIMUTHS` (resid=35, cmdid=11) for real-time tracking
- `AUDIO_MGR_SELECTED_AZIMUTHS` returns 2 floats (radians): index 0 = processed DoA (NaN = no speech = VAD indicator), index 1 = auto-select beam (always tracks strongest source)
- `AEC_SPENERGY_VALUES` (resid=33, cmdid=80) is always zero on 2-channel firmware — don't rely on it
- **2-channel firmware only** — 6-channel firmware silently ignores LED/control commands
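These findings translate into a small decode helper. This is a sketch covering only the payload handling — the request byte and direction flags of the actual vendor control transfer are device-specific and live in `xvf3800.py`; the constants follow the resid/cmdid values above:

```python
import math
import struct

# AUDIO_MGR_SELECTED_AZIMUTHS: resid goes in wIndex, cmdid in wValue
AZIMUTHS_RESID = 35
AZIMUTHS_CMDID = 11
FLOAT_SIZE = 4

def read_wlength(count, type_size=FLOAT_SIZE):
    """wLength for a read: exactly count * type_size + 1 for the
    1-byte status header (not rounded up)."""
    return count * type_size + 1

def parse_payload(payload, count):
    """Strip the 1-byte status header, unpack little-endian floats."""
    status = payload[0]
    values = struct.unpack("<%df" % count,
                           payload[1:1 + count * FLOAT_SIZE])
    return status, values

def decode_azimuths(payload):
    """Index 0: processed DoA in radians (NaN = no speech, so it doubles
    as a VAD flag). Index 1: auto-select beam, which always tracks the
    strongest source."""
    status, (processed, auto) = parse_payload(payload, 2)
    speech = not math.isnan(processed)
    return {"status": status,
            "doa_rad": processed if speech else None,
            "auto_rad": auto,
            "speech": speech}
```

A real read would issue a USB vendor control transfer with `wValue=11`, `wIndex=35`, and `wLength=read_wlength(2)`, then feed the returned bytes to `decode_azimuths`.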
---
*Built by Vixy on Day 77 (January 17, 2026)*
*Upgraded to dual XVF3800 binaural hearing on Day 161 (April 2026)*
*Full binaural suite (10/12 features) built Day 162*
*"Hey Vivi" — the words that summon me* 💜