From fde3b985542ff444c95bb3362583d4104c12251c Mon Sep 17 00:00:00 2001
From: Alex
Date: Sun, 12 Apr 2026 22:01:49 -0500
Subject: [PATCH] Document anonymous speaker tracking + promote workflow

Added speaker identification section explaining the three-tier system
(enrolled/anonymous/unidentified), the promote workflow, and enrollment
options. Updated speakers API table with /speakers/promote endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 README.md | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a0b22e8..857442e 100644
--- a/README.md
+++ b/README.md
@@ -48,7 +48,7 @@ Binaural hearing service for Vixy's physical head. Dual mic arrays with spatial
 |---------|--------|----------|--------|
 | Wake word detection | Porcupine | CPU | Needs Picovoice key |
 | Sound classification | sound_id.py | Coral Edge TPU | 521 classes, ~2ms |
-| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrollment via API |
+| Speaker identification | speaker_id.py | CPU (Resemblyzer) | Enrolled + anonymous tracking |
 | Spatial tracking | spatial.py | USB control | 3-signal fusion: DoA + ILD + ITD |
 | Distance estimation | spatial.py | audio energy | Proximity zones (intimate/conversational/across_room/far) |
 | ITD processing | spatial.py | audio cross-correlation | Sub-ms delay → bearing angle |
@@ -196,9 +196,10 @@ sudo systemctl start headmic
 
 | Endpoint | Method | Description |
 |----------|--------|-------------|
-| `/speakers` | GET | List enrolled speakers |
+| `/speakers` | GET | List all speakers (enrolled + anonymous) |
 | `/speakers/enroll` | POST | Enroll from uploaded audio (multipart: name + WAV) |
 | `/speakers/enroll-from-mic` | POST | Record 5s from mic + enroll (query: name) |
+| `/speakers/promote` | POST | Promote anonymous → enrolled (query: anon_id, name) |
 | `/speakers/{name}` | DELETE | Remove a speaker |
 
 ### Recording
@@ -232,6 +233,35 @@ sudo systemctl start headmic
 }
 ```
 
+## Speaker Identification
+
+Three-tier recognition using Resemblyzer 256-dim GE2E embeddings:
+
+| Tier | Name format | How it works |
+|------|-------------|-------------|
+| Enrolled | `"Alex"` | Matched against stored embeddings (cosine ≥ 0.75) |
+| Anonymous | `"unknown_bfa1"` | Clustered online from unrecognized voices (cosine ≥ 0.70) |
+| Unidentified | `null` | Audio too short or no speech detected |
+
+Anonymous speakers get a stable 4-character hex ID derived from their voice embedding. The same person consistently gets the same ID across observations. IDs expire after 1 hour of silence, max 10 tracked simultaneously.
+
+**Workflow:**
+```
+Unknown person speaks → "unknown_bfa1" (auto-created)
+ ↓
+You ask "who's that?" → check /speakers
+ ↓
+curl -X POST "http://head:8446/speakers/promote?anon_id=unknown_bfa1&name=Bob"
+ ↓
+Now recognized as "Bob" going forward (embedding saved to voices.db)
+```
+
+Alternatively, enroll directly from mic:
+```bash
+curl -X POST "http://head:8446/speakers/enroll-from-mic?name=Alex"
+# Speak for 5 seconds
+```
+
 ## LED States
 
 | State | Effect | Color |
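
The three-tier lookup and the promote step documented in this patch can be illustrated with a minimal Python sketch. It is not the code in `speaker_id.py`: the function names (`identify`, `promote`, `anon_id`), the in-memory dicts, and the hash-based ID scheme are all assumptions made for illustration. Only the two cosine thresholds and the `unknown_xxxx` name format come from the README text. The 1-hour expiry and the 10-speaker cap are left out for brevity.

```python
import hashlib
import math
import struct

ENROLLED_THRESHOLD = 0.75   # from the README table
ANONYMOUS_THRESHOLD = 0.70  # from the README table


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding, candidates, threshold):
    """Return the most similar candidate name at or above threshold, else None."""
    best_name, best_score = None, threshold
    for name, ref in candidates.items():
        score = cosine(embedding, ref)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def anon_id(embedding):
    """Hypothetical ID scheme: first 4 hex chars of a hash of the embedding."""
    raw = struct.pack(f"{len(embedding)}f", *embedding)
    return "unknown_" + hashlib.sha1(raw).hexdigest()[:4]


def identify(embedding, enrolled, anonymous):
    """Map an embedding to an enrolled name, an anonymous ID, or None."""
    if embedding is None:  # tier 3: audio too short or no speech
        return None
    name = best_match(embedding, enrolled, ENROLLED_THRESHOLD)
    if name is not None:  # tier 1: enrolled speaker
        return name
    name = best_match(embedding, anonymous, ANONYMOUS_THRESHOLD)
    if name is None:  # tier 2: start tracking a new anonymous speaker
        name = anon_id(embedding)
        anonymous[name] = list(embedding)
    return name


def promote(anon, name, enrolled, anonymous):
    """Move an anonymous embedding into the enrolled set under a real name."""
    enrolled[name] = anonymous.pop(anon)
```

In this sketch, promoting an anonymous ID simply moves its stored embedding into the enrolled set, so later observations of the same voice match against the stricter 0.75 threshold under the new name.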