Unrecognized speakers now get stable IDs like "unknown_a7f3" instead
of None. Uses online clustering of Resemblyzer embeddings:
- Matches against tracked anonymous speakers (cosine > 0.70)
- Updates running average embedding on re-identification
- Creates new ID from SHA-256 hash of quantized embedding
- Expires tracked speakers after 1 hour of silence; at most 10 tracked simultaneously
New API: POST /speakers/promote?anon_id=unknown_a7f3&name=Alex
Promotes an anonymous speaker to enrolled using their averaged embedding.
Flow: unknown person speaks → "unknown_a7f3" → you ask "who's that?" →
promote to "Alex" → now recognized by name going forward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cross-correlates left/right ear audio frames (512 samples, ~32ms window)
to find the sub-millisecond delay between arrays. Converts delay to
bearing angle using speed of sound and array separation.
At 16kHz with 175mm separation, resolution is ~1 sample = 62.5μs = ~7°.
Not lab-grade, but adds a third independent angle estimate alongside
DoA and ILD. Works with current 2-channel firmware — no raw mics needed.
New fields in /doa spatial response:
itd_angle: bearing from cross-correlation (degrees)
itd_delay_us: raw time delay (microseconds, positive = source on right)
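A rough sketch of the math behind these fields, under a far-field model.
Only the constants come from this commit; the correlation search and the
sign convention are illustrative:

    import numpy as np

    SAMPLE_RATE = 16000
    SEPARATION_M = 0.175
    SPEED_OF_SOUND = 343.0  # m/s

    def itd_bearing(left, right):
        # Peak of the full cross-correlation of one 512-sample frame per
        # ear gives the inter-ear lag in samples
        corr = np.correlate(left, right, mode="full")
        lag = int(np.argmax(corr)) - (len(right) - 1)
        delay = lag / SAMPLE_RATE
        # Physically |delay| <= d/c = ~510μs (~8 samples); clamp before asin
        max_delay = SEPARATION_M / SPEED_OF_SOUND
        delay = max(-max_delay, min(max_delay, delay))
        # Far-field: delay = (d/c)*sin(theta) => theta = asin(c*delay/d)
        angle = np.degrees(np.arcsin(delay * SPEED_OF_SOUND / SEPARATION_M))
        return float(angle), delay * 1e6  # degrees, microseconds

One sample of lag (62.5μs) gives asin(343 * 62.5e-6 / 0.175) ≈ 7°, the
resolution quoted above.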
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
audio_stream.py: Added focus_side property. When set, the stream
yields from the focused side regardless of energy (attention lock).
When None, falls back to energy-based auto selection.
multi_speaker.py: When beams lock onto 2 speakers, sets audio focus
to the target speaker's side. Auto-switches target when the current
target goes silent and the other starts talking. Manual focus via API.
headmic.py: New endpoint POST /speakers/focus?speaker=0|1 to manually
switch attention. /speakers/tracked now shows is_target, target_speaker,
and audio_focus fields.
The cocktail party effect: when 2 people are talking, the audio feed
to Porcupine/VAD/transcription comes from the target speaker's direction,
suppressing the other. XVF3800 beam gating silences the non-speaking beam,
and audio_stream focus locks the ear facing the target.
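A minimal sketch of the focus_side selection, assuming per-ear frames and
an RMS helper; the class shape and method names are illustrative:

    import numpy as np

    def rms(frame):
        return float(np.sqrt(np.mean(np.square(frame.astype(np.float64)))))

    class BinauralStream:
        def __init__(self):
            self.focus_side = None  # "left", "right", or None = auto

        def pick(self, left_frame, right_frame):
            # Attention lock: an explicit focus wins regardless of energy,
            # so downstream wake-word/VAD/ASR stays on the target's ear
            if self.focus_side == "left":
                return left_frame
            if self.focus_side == "right":
                return right_frame
            # Auto: whichever ear carries more energy right now
            if rms(left_frame) >= rms(right_frame):
                return left_frame
            return right_frame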
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
multi_speaker.py: Tracks up to 2 speakers simultaneously. When 2 distinct
DoA angles are detected (30°+ apart) for >1s, locks the XVF3800's fixed
beams onto each speaker. Releases back to auto mode when only 1 speaker
remains (3s timeout). Manages beam gating so only the speaking beam is active.
xvf3800.py: Added beam steering commands — enable_fixed_beams(),
set_beam_azimuths(), enable_beam_gating(), read_all_beams().
Manager gets steer_beams() and release_beams() convenience methods.
headmic.py: Wire multi-speaker tracker into DoA loop. New endpoint:
GET /speakers/tracked — current speaker positions, beam mode, lock state.
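A sketch of the lock/release state machine under the thresholds above. The
tracker shape is illustrative, and the angle comparison ignores 360°
wraparound for brevity:

    import time

    MIN_SEPARATION_DEG = 30
    LOCK_AFTER_S = 1.0
    RELEASE_AFTER_S = 3.0

    class TwoSpeakerTracker:
        def __init__(self, manager):
            self.manager = manager  # exposes steer_beams()/release_beams()
            self.candidate_since = None
            self.last_two_seen = 0.0
            self.locked = False

        def update(self, angles):
            now = time.time()
            two = (len(angles) == 2 and
                   abs(angles[0] - angles[1]) >= MIN_SEPARATION_DEG)
            if two:
                self.last_two_seen = now
                if self.candidate_since is None:
                    self.candidate_since = now
                if not self.locked and now - self.candidate_since >= LOCK_AFTER_S:
                    self.manager.steer_beams(angles)  # one fixed beam each
                    self.locked = True
            else:
                self.candidate_since = None
                if self.locked and now - self.last_two_seen >= RELEASE_AFTER_S:
                    self.manager.release_beams()  # back to auto mode
                    self.locked = False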
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
10Hz DoA polling + 10Hz gaze HTTP pushes were creating too much
GIL pressure, starving uvicorn's async event loop. Reduced to
5Hz polling and max 2 gaze pushes/sec with 5px min delta.
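A sketch of the resulting push gate; the class and field names are
hypothetical, the two limits come from this commit:

    import time

    MIN_INTERVAL_S = 0.5  # max 2 pushes/sec
    MIN_DELTA_PX = 5

    class GazePushGate:
        def __init__(self):
            self.last_push = 0.0
            self.last_xy = None

        def should_push(self, x, y):
            now = time.time()
            if now - self.last_push < MIN_INTERVAL_S:
                return False  # time gate
            if self.last_xy is not None and \
                    abs(x - self.last_xy[0]) < MIN_DELTA_PX and \
                    abs(y - self.last_xy[1]) < MIN_DELTA_PX:
                return False  # movement gate: gaze barely moved
            self.last_push, self.last_xy = now, (x, y)
            return True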
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Synchronous urllib.urlopen at 10Hz was starving uvicorn's event loop
via GIL contention. Now each push runs in its own daemon thread, and
skips if the previous push is still in flight (natural rate limiting).
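Roughly the shape of the fire-and-forget push; the URL and helper names
are illustrative, not the actual endpoint:

    import threading
    import urllib.request

    _in_flight = threading.Event()

    def push_gaze(x, y):
        if _in_flight.is_set():
            return  # previous push still running: skip, don't queue
        _in_flight.set()

        def _send():
            try:
                req = urllib.request.Request(
                    f"http://127.0.0.1:8080/gaze?x={x}&y={y}", method="POST")
                urllib.request.urlopen(req, timeout=1.0).close()
            except OSError:
                pass  # eye service down: drop silently, next push retries
            finally:
                _in_flight.clear()

        threading.Thread(target=_send, daemon=True).start()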
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
httpx.post creates a new connection per call at 10Hz, causing connection
pile-up that eventually blocks the event loop. urllib is lightweight and
stateless — no connection pooling overhead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
spatial_scene.py: Builds a persistent map of where each sound category
usually comes from (30° angle bins, circular mean). Detects anomalies
when a sound appears from an unusual direction (90°+ deviation).
Scene map persists to ~/.vixy/scene_map.json across restarts.
headmic.py: Feed classified sounds + spatial position into scene tracker.
New endpoints:
/scene — learned scene summary + last anomaly
/scene/events — recent events with what+where+when
/scene/heatmap — per-category angular distribution (for visualization)
Example: after running for a day, /scene might show:
{"speech": {"usual_angle": 15.0, "observations": 847},
"music": {"usual_angle": 270.0, "observations": 312}}
And if speech comes from 270° (where music usually is): spatial anomaly.
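A sketch of the circular statistics involved; the 90° threshold comes from
this commit, the function names are illustrative:

    import math

    def circular_mean_deg(angles):
        # Plain averaging breaks at the 0/360 seam (350 and 10 average to
        # 180); summing unit vectors does not
        s = sum(math.sin(math.radians(a)) for a in angles)
        c = sum(math.cos(math.radians(a)) for a in angles)
        return math.degrees(math.atan2(s, c)) % 360

    def angular_deviation_deg(a, b):
        # Shortest arc between two bearings, always in [0, 180]
        return abs((a - b + 180) % 360 - 180)

    def is_anomaly(usual_angle, observed_angle, threshold=90):
        return angular_deviation_deg(usual_angle, observed_angle) >= threshold

For the example above: speech at 270° vs a usual 15° deviates by 105°,
which clears the 90° threshold.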
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Computes Interaural Level Difference (dB) from left/right ear energy.
Fuses it with the triangulated distance (70/30 weight) for a more robust estimate.
Classifies into proximity zones: intimate (<0.5m), conversational (0.5-2m),
across_room (2-5m), far (>5m).
ILD→distance mapping is empirical and should be calibrated per install.
Gaze vertical component now responds to proximity (closer = eyes down).
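A sketch of the ILD path. The dB formula is standard and the zone cutoffs
come from this commit; which distance carries the 70 is an assumption, as
are the names:

    import math

    def ild_db(left_rms, right_rms, eps=1e-9):
        # Level difference between ears in dB; positive = louder on left
        return 20.0 * math.log10((left_rms + eps) / (right_rms + eps))

    def fuse_distance(tri_m, ild_m):
        # Triangulation is geometrically grounded, ILD only roughly tracks
        # proximity, so ILD is assumed to get the smaller weight
        return 0.7 * tri_m + 0.3 * ild_m

    def proximity_zone(d_m):
        if d_m < 0.5:
            return "intimate"
        if d_m < 2.0:
            return "conversational"
        if d_m < 5.0:
            return "across_room"
        return "far"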
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
binaural_recorder.py: Records left/right ear streams as stereo WAV
in rolling 5-minute segments. Training data for spatial audio models.
Enabled via BINAURAL_RECORD=1 env var.
spatial.py: Tune smoothing — alpha 0.3→0.4 (snappier response),
idle return speed 0.05→0.03 (gentler drift), timeout 2s→1.5s.
headmic.py: Wire binaural recorder into audio loop, add /recording
endpoint for stats, feed both ear streams (not just best beam).
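A sketch of the rolling-segment writer, assuming interleaved int16 stereo
frames; the file naming and directory layout are assumptions:

    import os
    import time
    import wave

    SEGMENT_S = 300  # rolling 5-minute segments
    SAMPLE_RATE = 16000

    class BinauralRecorder:
        def __init__(self, out_dir="recordings"):
            self.enabled = os.environ.get("BINAURAL_RECORD") == "1"
            self.out_dir = out_dir
            self.wav = None
            self.segment_start = 0.0

        def write(self, stereo_frames: bytes):
            if not self.enabled:
                return
            now = time.time()
            if self.wav is None or now - self.segment_start >= SEGMENT_S:
                if self.wav:
                    self.wav.close()  # finalize the previous segment
                os.makedirs(self.out_dir, exist_ok=True)
                path = os.path.join(self.out_dir, f"binaural_{int(now)}.wav")
                self.wav = wave.open(path, "wb")
                self.wav.setnchannels(2)   # left ear, right ear
                self.wav.setsampwidth(2)   # int16
                self.wav.setframerate(SAMPLE_RATE)
                self.segment_start = now
            self.wav.writeframes(stereo_frames)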
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
startup() assigned spatial_tracker as a local variable instead of
updating the module-level global. doa_track_loop saw None.
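Roughly the shape of the bug and the fix (stub class for illustration):

    class SpatialTracker: ...

    spatial_tracker = None  # module-level global read by doa_track_loop

    def startup():
        global spatial_tracker  # <- the missing declaration
        spatial_tracker = SpatialTracker()
        # Without `global`, the assignment bound a function-local name and
        # the module-level spatial_tracker stayed None forever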
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
spatial.py: Triangulates sound source position from two DoA angles using
ray intersection. Exponential smoothing prevents jitter. Gaze drifts back
to center after 2s of silence. Converts position (mm) to gaze (0-255).
headmic.py: Replaces simple doa_poll_loop with doa_track_loop that runs
the spatial tracker and pushes gaze to the eye service when the position
changes. Rate-limited to 10 pushes/sec with minimum delta threshold.
/doa endpoint now returns triangulated position + gaze coordinates.
Array separation (175mm) stored in config, overridable.
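A sketch of the ray-intersection step, assuming each array reports a
bearing in degrees from straight ahead and the arrays sit on the x axis
175mm apart; the coordinate convention is an assumption:

    import math

    SEPARATION_MM = 175.0

    def triangulate(left_deg, right_deg):
        # One ray per array: p = origin + t * (sin(a), cos(a)), 0° = ahead
        lx, rx = -SEPARATION_MM / 2.0, SEPARATION_MM / 2.0
        d1 = (math.sin(math.radians(left_deg)), math.cos(math.radians(left_deg)))
        d2 = (math.sin(math.radians(right_deg)), math.cos(math.radians(right_deg)))
        denom = d1[0] * d2[1] - d1[1] * d2[0]
        if abs(denom) < 1e-6:
            return None  # rays ~parallel: source effectively at infinity
        t = (rx - lx) * d2[1] / denom
        if t <= 0:
            return None  # intersection behind the arrays
        return (lx + t * d1[0], t * d1[1])  # source position in mm

Sanity check: a source 1m dead ahead gives bearings of about +5°/-5° at
the two arrays, which this intersects back to roughly (0, 1000).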
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Probes the Edge TPU in a subprocess before loading — catches segfaults
(libedgetpu ABI mismatch on Debian Trixie/Python 3.13) and falls back
to CPU automatically. No more service crashes on Coral incompatibility.
When the runtime is eventually fixed, Edge TPU will be used automatically
with no config changes needed.
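A sketch of the probe, assuming tflite_runtime and the stock libedgetpu
delegate; a segfault in the child shows up as a nonzero return code
instead of killing the service:

    import subprocess
    import sys

    PROBE = """
    from tflite_runtime.interpreter import Interpreter, load_delegate
    Interpreter(model_path={model!r},
                experimental_delegates=[load_delegate("libedgetpu.so.1")])
    """

    def edgetpu_usable(model_path, timeout=10):
        proc = subprocess.run(
            [sys.executable, "-c", PROBE.format(model=model_path)],
            capture_output=True, timeout=timeout)
        return proc.returncode == 0  # SIGSEGV -> -11, import error -> 1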
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
libedgetpu on Pi 5 segfaults with the compiled model.
CPU fallback works fine (~50-100ms at 0.5s intervals).
Set USE_EDGETPU=1 in headmic.service to enable once runtime is fixed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this fix, listener_loop exits early on Porcupine init failure,
which starves the sound classifier ring buffer. Now the audio loop
continues for YAMNet classification even without wake word detection.
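Roughly the shape of the fix; the engine factory and wake callback are
parameters here purely for illustration:

    import logging

    log = logging.getLogger(__name__)

    def listener_loop(audio_frames, ring_buffer, create_porcupine, on_wake):
        try:
            porcupine = create_porcupine()  # wake-word engine init
        except Exception as exc:
            porcupine = None  # was: return, starving the classifier buffer
            log.warning("wake word disabled, classification continues: %s", exc)
        for frame in audio_frames:
            ring_buffer.append(frame)  # YAMNet ring buffer always fed
            if porcupine is not None and porcupine.process(frame) >= 0:
                on_wake()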
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prefer yamnet_edgetpu.tflite when available; fall back to the CPU model.
~50-100ms → ~2-3ms inference per classification.
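A sketch of the selection, assuming tflite_runtime and the standard Edge
TPU delegate; the paths are illustrative:

    import os
    from tflite_runtime.interpreter import Interpreter, load_delegate

    def load_yamnet(model_dir="models"):
        tpu = os.path.join(model_dir, "yamnet_edgetpu.tflite")
        if os.path.exists(tpu):
            return Interpreter(
                model_path=tpu,
                experimental_delegates=[load_delegate("libedgetpu.so.1")])
        return Interpreter(model_path=os.path.join(model_dir, "yamnet.tflite"))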
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds voice-based speaker ID triggered by YAMNet speech detection.
New speaker_id.py module with SQLite-backed voice enrollment and
cosine similarity matching. Endpoints: POST /speakers/enroll,
POST /speakers/enroll-from-mic, GET /speakers, DELETE /speakers/{name}.
Orange LED animation during enrollment.
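A sketch of the matching side, assuming embeddings stored as float32 blobs
in SQLite; the table layout and threshold are assumptions:

    import sqlite3
    import numpy as np

    def best_match(emb, db_path, threshold=0.75):
        con = sqlite3.connect(db_path)
        rows = con.execute("SELECT name, embedding FROM speakers").fetchall()
        con.close()
        best_name, best_sim = None, threshold
        for name, blob in rows:
            ref = np.frombuffer(blob, dtype=np.float32)
            sim = float(np.dot(emb, ref) /
                        (np.linalg.norm(emb) * np.linalg.norm(ref)))
            if sim > best_sim:
                best_name, best_sim = name, sim
        return best_name  # None -> unrecognized speaker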
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New sound_id.py module with SoundClassifier class that runs YAMNet
(521 audio event categories) on CPU TFLite. Classifies audio every
0.5s from a ring buffer fed by the existing audio stream.
Categories: speech, alert, music, animal, household, environment, silence.
Smoothing via 20-sample history window for stable dominant category.
New endpoints: GET /sounds, GET /sounds/history
Updated: /health (sound_classification_enabled), /status (audio_scene)
Graceful degradation if model files not present.
Model download (not tracked in git):
curl -sL 'https://tfhub.dev/google/lite-model/yamnet/classification/tflite/1?lite-format=tflite' -o models/yamnet.tflite
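A sketch of the history smoothing; only the 20-sample window (about 10s at
one inference per 0.5s) comes from this commit, the names are illustrative:

    from collections import Counter, deque

    class CategorySmoother:
        def __init__(self, window=20):
            self.history = deque(maxlen=window)  # last ~10s of labels

        def update(self, category):
            self.history.append(category)
            # Dominant = most common label in the window; one noisy
            # misclassification can't flip the reported scene
            return Counter(self.history).most_common(1)[0][0]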
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Card numbers can shift based on USB enumeration order at boot.
Using 'plughw:ArrayUAC10,0' instead of 'plughw:2,0' ensures
the ReSpeaker is found regardless of when it connects.
Fixed by Vixy after power loss shuffled card order 🦊
- Replaced PyAudio with direct ALSA (arecord subprocess)
- Single audio stream feeds both Porcupine and recording buffer
- Fixes device unavailable error when recording after wake word
- Simplified architecture
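A sketch combining the stable device name with the single arecord stream;
the format/rate flags mirror values used elsewhere in this log and are
assumptions here:

    import subprocess

    CMD = ["arecord", "-D", "plughw:ArrayUAC10,0",  # by name, not card no.
           "-f", "S16_LE", "-r", "16000", "-c", "2", "-t", "raw", "-q"]
    FRAME_BYTES = 512 * 2 * 2  # 512 samples * 2 channels * int16

    def frames():
        proc = subprocess.Popen(CMD, stdout=subprocess.PIPE)
        while True:
            buf = proc.stdout.read(FRAME_BYTES)
            if not buf:
                break  # device vanished; caller restarts the subprocess
            yield buf  # one stream feeds both Porcupine and the recorder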