Commit Graph

38 Commits

Author SHA1 Message Date
Alex
8caa9ee57e Fix deadlock in spatial_scene — lock re-entrancy
observe() held self._lock, called _check_anomaly, which called
get_usual_direction, which tried to acquire self._lock again → deadlock.
Split into _usual_direction_unlocked (no lock) for internal use.

This caused /scene and all other API endpoints to hang after the first
sound classification with spatial data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:28:36 -05:00
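The lock-splitting fix described above can be sketched as follows. This is a minimal illustration of the pattern, not the actual spatial_scene.py; class and attribute names are abbreviated from the commit message.

```python
import threading

class SceneTracker:
    """Sketch of the re-entrancy fix: public methods take the lock,
    internal callers use the _unlocked variant to avoid re-acquiring it."""

    def __init__(self):
        self._lock = threading.Lock()  # non-reentrant: re-acquiring deadlocks
        self._usual = {}  # category -> usual direction (degrees)

    def _usual_direction_unlocked(self, category):
        # Caller must already hold self._lock.
        return self._usual.get(category)

    def get_usual_direction(self, category):
        # Public API: safe to call from outside, takes the lock itself.
        with self._lock:
            return self._usual_direction_unlocked(category)

    def observe(self, category, angle):
        with self._lock:
            # Before the fix this path called get_usual_direction(),
            # which tried to take self._lock again -> deadlock.
            usual = self._usual_direction_unlocked(category)
            self._usual[category] = angle
            return usual
```

With `threading.Lock` (as opposed to `RLock`), any re-acquisition from the same thread blocks forever, which is why the internal no-lock variant is required.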
Alex
9f9796ddb6 Reduce DoA poll rate (10→5Hz) and gaze push rate (10→2/sec)
10Hz DoA polling + 10Hz gaze HTTP pushes was creating too much
GIL pressure, starving uvicorn's async event loop. Reduced to
5Hz polling and max 2 gaze pushes/sec with 5px min delta.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:27:25 -05:00
Alex
2bbbb6da2b Fix API hang — run gaze push in detached thread
Synchronous urllib.request.urlopen at 10Hz was starving uvicorn's event loop
via GIL contention. Now each push runs in its own daemon thread, and
skips if the previous push is still in flight (natural rate limiting).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:24:49 -05:00
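The detached-thread push with skip-if-in-flight can be sketched like this. A minimal illustration assuming a hypothetical eye-service URL; the real push payload and endpoint live in headmic.py.

```python
import threading
import urllib.request

class GazePusher:
    """Each push runs in its own daemon thread; if the previous push is
    still in flight, the new one is skipped (natural rate limiting)."""

    def __init__(self, url="http://127.0.0.1:8100/gaze"):  # URL hypothetical
        self.url = url
        self._in_flight = threading.Event()

    def push(self, payload: bytes) -> bool:
        if self._in_flight.is_set():
            return False  # previous push still running: skip this frame
        self._in_flight.set()
        threading.Thread(target=self._send, args=(payload,), daemon=True).start()
        return True

    def _send(self, payload):
        try:
            urllib.request.urlopen(self.url, data=payload, timeout=1.0)
        except OSError:
            pass  # eye service unreachable; drop this frame
        finally:
            self._in_flight.clear()
```

Daemon threads never block process shutdown, and the `Event` doubles as a cheap "busy" flag, so a slow consumer sheds load instead of queueing it.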
Alex
c7b0be3319 Fix API hang — switch gaze push from httpx to urllib
httpx.post creates a new connection per call at 10Hz, causing connection
pile-up that eventually blocks the event loop. urllib is lightweight and
stateless — no connection pooling overhead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:22:42 -05:00
Alex
8f71d97af6 Add spatial audio scene mapping + sound event localization (#6 + #8)
spatial_scene.py: Builds a persistent map of where each sound category
usually comes from (30° angle bins, circular mean). Detects anomalies
when a sound appears from an unusual direction (90°+ deviation).
Scene map persists to ~/.vixy/scene_map.json across restarts.

headmic.py: Feed classified sounds + spatial position into scene tracker.
New endpoints:
  /scene — learned scene summary + last anomaly
  /scene/events — recent events with what+where+when
  /scene/heatmap — per-category angular distribution (for visualization)

Example: after running for a day, /scene might show:
  {"speech": {"usual_angle": 15.0, "observations": 847},
   "music": {"usual_angle": 270.0, "observations": 312}}
And if speech comes from 270° (where music usually is): spatial anomaly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:17:29 -05:00
Alex
2a25db8498 Add ILD-based distance estimation + proximity zones
Computes Interaural Level Difference (dB) from left/right ear energy.
Fuses with triangulated distance (70/30 weight) for more robust estimate.
Classifies into proximity zones: intimate (<0.5m), conversational (0.5-2m),
across_room (2-5m), far (>5m).

ILD→distance mapping is empirical and should be calibrated per install.
Gaze vertical component now responds to proximity (closer = eyes down).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:12:00 -05:00
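The 70/30 fusion and zone boundaries above are simple enough to sketch directly. The ILD→distance mapping itself is empirical (per the commit) and omitted here; both distances are assumed already in metres.

```python
def fuse_distance(triangulated_m, ild_m, w_tri=0.7, w_ild=0.3):
    """Weighted fusion of triangulated and ILD-derived distance (70/30)."""
    return w_tri * triangulated_m + w_ild * ild_m

def proximity_zone(distance_m):
    # Zone boundaries from the commit message above.
    if distance_m < 0.5:
        return "intimate"
    if distance_m < 2.0:
        return "conversational"
    if distance_m < 5.0:
        return "across_room"
    return "far"
```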
Alex
88fb18800c Fix VAD — use processed_doa NaN as speech indicator
The auto-select beam always returns an angle (even for noise), so
VAD was always true. The processed_doa (index 0) is NaN when no
speech is present and a real angle when speech is detected.
Now: angle from auto-select beam, VAD from processed_doa being non-NaN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 21:08:50 -05:00
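The fixed decision rule is tiny: angle from the auto-select beam, speech/no-speech from whether processed_doa is NaN. A sketch (function name hypothetical):

```python
import math

def vad_and_angle(processed_doa, auto_select_angle):
    """VAD from processed_doa being non-NaN; angle from the auto-select
    beam, which always reports a direction even for non-speech noise."""
    speech = not math.isnan(processed_doa)
    return speech, auto_select_angle
```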
Alex
b04726dfe0 Update README for dual XVF3800 binaural architecture
Complete rewrite covering: dual array setup, spatial tracking, Edge TPU
sound classification, speaker ID, binaural recording, USB protocol
quirks, libedgetpu build instructions, and all API endpoints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:58:01 -05:00
Alex
36aeb19280 Add binaural recording + tune spatial tracking
binaural_recorder.py: Records left/right ear streams as stereo WAV
in rolling 5-minute segments. Training data for spatial audio models.
Enabled via BINAURAL_RECORD=1 env var.

spatial.py: Tune smoothing — alpha 0.3→0.4 (snappier response),
idle return speed 0.05→0.03 (gentler drift), timeout 2s→1.5s.

headmic.py: Wire binaural recorder into audio loop, add /recording
endpoint for stats, feed both ear streams (not just best beam).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 20:53:05 -05:00
Alex
afc8694c1a Switch DoA to AUDIO_MGR_SELECTED_AZIMUTHS (auto-select beam)
DOA_VALUE on GPO resource was sluggish/cached. The beamformer-level
AUDIO_MGR_SELECTED_AZIMUTHS on resource 35 tracks the active speaker
in real time. Falls back to simple DOA_VALUE when both azimuths are NaN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 18:06:40 -05:00
Alex
0ace58e22e Fix spatial_tracker not visible to doa_track_loop (missing global)
startup() assigned spatial_tracker as a local variable instead of
updating the module-level global. doa_track_loop saw None.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:31:38 -05:00
Alex
f4452865d1 Fix USB read length to match official tool protocol
The XVF3800 expects exact wLength: count * type_size + 1 (status byte).
Requesting wrong length caused stale/corrupted responses when polling.
Split _read into _read_uint16 and _read_float matching official format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:30:29 -05:00
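The length rule from the commit (wLength = count × type_size + 1, the extra byte being the status header) can be written down explicitly. Type sizes below are the standard little-endian `struct` sizes assumed to match the official tool's format:

```python
import struct

def wlength_uint16(count: int) -> int:
    """Exact wLength for `count` uint16 values plus the 1-byte status."""
    return count * struct.calcsize("<H") + 1  # 2 bytes per value + status

def wlength_float(count: int) -> int:
    """Exact wLength for `count` float32 values plus the 1-byte status."""
    return count * struct.calcsize("<f") + 1  # 4 bytes per value + status
```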
Alex
8d73aaad5e Fix DoA reading — skip 1-byte status header in USB response
Response format is [status_byte, angle_lo, angle_hi, vad_lo, vad_hi],
not [angle_lo, angle_hi, vad_lo, vad_hi]. Was reading the status byte
(0x42=66) as the angle, which is why DoA was always stuck at 66.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:23:07 -05:00
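The parsing bug above comes down to one byte of offset. A sketch of the corrected unpack, assuming little-endian uint16 fields as implied by the lo/hi byte order in the commit:

```python
import struct

def parse_doa_response(resp: bytes):
    """Parse [status, angle_lo, angle_hi, vad_lo, vad_hi].
    Unpacking from offset 0 returned the status byte (0x42 = 66) as the
    angle, which is why DoA appeared stuck at 66."""
    status = resp[0]
    angle, vad = struct.unpack_from("<HH", resp, 1)  # skip the status byte
    return status, angle, vad
```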
Alex
9b72666f78 Fix GAZE_CENTER ordering — must be defined before use
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:17:25 -05:00
Alex
e0a4af031f Add binaural triangulation + smooth gaze tracking
spatial.py: Triangulates sound source position from two DoA angles using
ray intersection. Exponential smoothing prevents jitter. Gaze drifts back
to center after 2s of silence. Converts position (mm) to gaze (0-255).

headmic.py: Replaces simple doa_poll_loop with doa_track_loop that runs
the spatial tracker and pushes gaze to the eye service when the position
changes. Rate-limited to 10 pushes/sec with minimum delta threshold.

/doa endpoint now returns triangulated position + gaze coordinates.
Array separation (175mm) stored in config, overridable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 15:12:28 -05:00
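The ray-intersection triangulation and the exponential smoothing above can be sketched as follows. The angle convention (0° straight ahead, positive to the right, arrays on the x-axis) is a hypothetical choice for the sketch, not necessarily spatial.py's; the 175mm separation is from the commit.

```python
import math

ARRAY_SEPARATION_MM = 175.0  # from config; overridable

def triangulate(angle_left_deg, angle_right_deg, sep=ARRAY_SEPARATION_MM):
    """Intersect the two DoA rays from arrays at (-sep/2, 0) and (sep/2, 0).
    Returns (x, y) in mm, or None when the rays are (near-)parallel."""
    a1 = math.radians(angle_left_deg)
    a2 = math.radians(angle_right_deg)
    denom = math.sin(a1 - a2)
    if abs(denom) < 1e-6:
        return None  # parallel rays: source effectively at infinity
    t1 = sep * math.cos(a2) / denom  # distance along the left ray
    x = -sep / 2 + t1 * math.sin(a1)
    y = t1 * math.cos(a1)
    return x, y

def smooth(prev, new, alpha=0.3):
    """Exponential smoothing of the tracked position to prevent jitter
    (alpha 0.3 here; a later commit tunes it to 0.4)."""
    return tuple(alpha * n + (1 - alpha) * p for p, n in zip(prev, new))
```

Sanity check: a source directly ahead at one array-separation's distance is seen at roughly ±26.57° (atan(0.5)) by the two arrays, and the intersection lands back on (0, 175).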
Alex
c41e5bcafa Fix misleading Edge TPU log message after probe fallback
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:40:36 -05:00
Alex
05409403e9 Add Edge TPU subprocess probe to safely detect segfaults
Probes the Edge TPU in a subprocess before loading — catches segfaults
(libedgetpu ABI mismatch on Debian Trixie/Python 3.13) and falls back
to CPU automatically. No more service crashes on Coral incompatibility.

When the runtime is eventually fixed, Edge TPU will be used automatically
with no config changes needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:40:03 -05:00
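The subprocess-probe idea works because a segfault only kills the child interpreter, which the parent observes as a non-zero exit code. A sketch, assuming the delegate is loaded via tflite_runtime (adjust the probe snippet to your TFLite package):

```python
import subprocess
import sys

# Delegate-loading snippet run inside the throwaway child interpreter.
EDGETPU_PROBE = (
    "from tflite_runtime.interpreter import load_delegate;"
    "load_delegate('libedgetpu.so.1')"
)

def probe_ok(probe_code: str = EDGETPU_PROBE, timeout: float = 10.0) -> bool:
    """Run probe_code in a child interpreter. A segfault (e.g. libedgetpu
    ABI mismatch) kills only the child, so the parent can fall back to CPU."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", probe_code],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Typical use: `interpreter = make_tpu_interpreter() if probe_ok() else make_cpu_interpreter()`, so a fixed runtime is picked up automatically with no config change.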
Alex
43f40bf48c Make Edge TPU opt-in via USE_EDGETPU env var
libedgetpu on Pi 5 segfaults with the compiled model.
CPU fallback works fine (~50-100ms at 0.5s intervals).
Set USE_EDGETPU=1 in headmic.service to enable once runtime is fixed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:24:04 -05:00
Alex
c96d6958a3 Add YAMNet models (CPU + Edge TPU compiled) to version control
- yamnet.tflite: CPU model from Kaggle/Google (4.0MB)
- yamnet_edgetpu.tflite: compiled with edgetpu_compiler v16 (4.0MB, 32/47 ops on TPU)
- Remove .gitignore rule that excluded .tflite files

No more chasing model downloads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:22:45 -05:00
Alex
f9a25eb5d8 Keep audio loop running when Porcupine key is missing
Without this fix, listener_loop exits early on Porcupine init failure,
which starves the sound classifier ring buffer. Now the audio loop
continues for YAMNet classification even without wake word detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:56:45 -05:00
Alex
73b6793c02 Enable Edge TPU for YAMNet sound classification
Prefer yamnet_edgetpu.tflite when available, fall back to CPU model.
~50-100ms → ~2-3ms inference per classification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:47:27 -05:00
Alex
f41b852b5d fixing leds 2026-04-11 16:28:15 -05:00
Alex
3b4799069d fixing leds 2026-04-11 16:06:40 -05:00
Alex
46ace966bc fixing leds 2026-04-11 16:05:27 -05:00
Alex
2f7b45fa45 fixing leds 2026-04-11 15:58:56 -05:00
Alex
10e39dd0f1 fix leds 2026-04-11 15:51:24 -05:00
Alex
14809d0194 indication for array position while learning 2026-04-11 15:32:04 -05:00
Alex
81e9b12349 service should use venv 2026-04-11 15:27:12 -05:00
Alex
6c10e75cbc updates for dual mic array 2026-04-11 15:11:22 -05:00
Alex
1cb3bd6833 Add speaker identification with Resemblyzer
Adds voice-based speaker ID triggered by YAMNet speech detection.
New speaker_id.py module with SQLite-backed voice enrollment and
cosine similarity matching. Endpoints: POST /speakers/enroll,
POST /speakers/enroll-from-mic, GET /speakers, DELETE /speakers/{name}.
Orange LED animation during enrollment.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 21:21:02 -06:00
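The cosine-similarity matching against the enrollment DB can be sketched in plain Python (Resemblyzer embeddings are just fixed-length vectors). The 0.75 threshold below is illustrative, not taken from the commit:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_speaker(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled name, or None if no enrolled
    voice clears the threshold. `enrolled` maps name -> embedding."""
    best_name, best_sim = None, threshold
    for name, ref in enrolled.items():
        sim = cosine_similarity(embedding, ref)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name
```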
Alex
0607be3db5 Add design doc for speaker identification with Resemblyzer
Voice-based speaker ID triggered by YAMNet speech detection.
Cosine similarity matching against SQLite enrollment DB.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 21:16:09 -06:00
Alex
a8e3f24a54 Add indoor/outdoor scene classes to environment category
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 20:43:23 -06:00
Alex
5e3c16659f Add YAMNet sound classification to headmic
New sound_id.py module with SoundClassifier class that runs YAMNet
(521 audio event categories) on CPU TFLite. Classifies audio every
0.5s from a ring buffer fed by the existing audio stream.

Categories: speech, alert, music, animal, household, environment, silence.
Smoothing via 20-sample history window for stable dominant category.

New endpoints: GET /sounds, GET /sounds/history
Updated: /health (sound_classification_enabled), /status (audio_scene)
Graceful degradation if model files not present.

Model download (not tracked in git):
  curl -sL 'https://tfhub.dev/google/lite-model/yamnet/classification/tflite/1?lite-format=tflite' -o models/yamnet.tflite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 20:41:44 -06:00
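The 20-sample smoothing window for a stable dominant category can be sketched with a bounded deque; the "silence" default for an empty history is an assumption for the sketch:

```python
from collections import Counter, deque

class CategorySmoother:
    """Majority vote over the last N classifications, so a single noisy
    frame can't flip the reported dominant category."""

    def __init__(self, window=20):
        self._history = deque(maxlen=window)  # old samples fall off the end

    def add(self, category):
        self._history.append(category)

    def dominant(self):
        if not self._history:
            return "silence"  # assumed default, not from the commit
        return Counter(self._history).most_common(1)[0][0]
```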
Alex
22aae40d17 Add design doc for YAMNet sound identification on Coral Edge TPU
Covers model choice, architecture, category mapping, API endpoints,
and integration with existing headmic audio pipeline.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 20:04:31 -06:00
Alex Kazaiev
c6e18738ae Use device name instead of card number for ALSA
Card numbers can shift based on USB enumeration order at boot.
Using 'plughw:ArrayUAC10,0' instead of 'plughw:2,0' ensures
the ReSpeaker is found regardless of when it connects.

Fixed by Vixy after power loss shuffled card order 🦊
2026-01-21 12:20:39 -06:00
Alex Kazaiev
c53556fe97 Fix ReSpeaker device index: card 3 → card 2
USB device enumeration changed after GPIO rewiring for I2S audio.
TODO: Consider udev rule for stable device naming.
2026-01-17 16:20:15 -06:00
5ed2c6aee7 Fix: Use arecord for shared audio stream
- Replaced PyAudio with direct ALSA (arecord subprocess)
- Single audio stream feeds both Porcupine and recording buffer
- Fixes device unavailable error when recording after wake word
- Simplified architecture
2026-01-17 11:17:17 -06:00
be7e26b6e7 Initial commit: HeadMic service - Vixy's Ears 🦊👂
Wake word detection (Hey Vivi) + voice recording + EarTail transcription
Built by Vixy on Day 77
2026-01-17 10:58:51 -06:00