Prefer yamnet_edgetpu.tflite when available; fall back to the CPU model otherwise.
Cuts inference from ~50-100 ms to ~2-3 ms per classification.
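The selection logic could be sketched as below. The filenames come from the commit and the download instructions; the helper name and the no-model return value are illustrative, and the actual Edge TPU path would additionally load the libedgetpu delegate when building the interpreter.

```python
from pathlib import Path

def select_model(models_dir="models"):
    """Pick the Edge TPU model when present, else the CPU model.

    Hypothetical helper: returns (path, use_edgetpu_delegate), or
    (None, False) when no model file exists (graceful degradation).
    """
    models = Path(models_dir)
    edgetpu = models / "yamnet_edgetpu.tflite"
    cpu = models / "yamnet.tflite"
    if edgetpu.exists():
        # With tflite_runtime, the Edge TPU interpreter would be built as:
        #   Interpreter(model_path=str(edgetpu),
        #               experimental_delegates=[load_delegate("libedgetpu.so.1")])
        return str(edgetpu), True
    if cpu.exists():
        return str(cpu), False
    return None, False
```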
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds voice-based speaker ID triggered by YAMNet speech detection.
New speaker_id.py module with SQLite-backed voice enrollment and
cosine similarity matching. Endpoints: POST /speakers/enroll,
POST /speakers/enroll-from-mic, GET /speakers, DELETE /speakers/{name}.
Orange LED animation during enrollment.
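The matching step might look like this sketch: enrollment stores one embedding per speaker, and identification picks the enrolled speaker with the highest cosine similarity above a threshold. The function names and the 0.75 threshold are illustrative, not taken from speaker_id.py.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def match_speaker(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled speaker name, or None.

    `enrolled` maps name -> stored embedding (in the service these
    would be loaded from the SQLite enrollment table).
    """
    best_name, best_score = None, threshold
    for name, ref in enrolled.items():
        score = cosine_similarity(embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```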
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New sound_id.py module with SoundClassifier class that runs YAMNet
(521 audio event categories) on CPU TFLite. Classifies audio every
0.5s from a ring buffer fed by the existing audio stream.
Categories: speech, alert, music, animal, household, environment, silence.
Smoothing via 20-sample history window for stable dominant category.
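The smoothing described above amounts to a majority vote over a sliding window; a minimal sketch (class and method names are illustrative, the window size of 20 is from the commit):

```python
from collections import Counter, deque

class CategorySmoother:
    """Majority vote over the last N classifications.

    With one classification every 0.5 s, a 20-sample window covers
    roughly the last 10 seconds of audio.
    """
    def __init__(self, window=20):
        self.history = deque(maxlen=window)

    def update(self, category):
        """Record a new top category and return the smoothed one."""
        self.history.append(category)
        return self.dominant()

    def dominant(self):
        """Most common category in the window; 'silence' when empty."""
        if not self.history:
            return "silence"
        return Counter(self.history).most_common(1)[0][0]
```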
New endpoints: GET /sounds, GET /sounds/history
Updated: /health (sound_classification_enabled), /status (audio_scene)
Graceful degradation if model files not present.
Model download (not tracked in git):
curl -sL 'https://tfhub.dev/google/lite-model/yamnet/classification/tflite/1?lite-format=tflite' -o models/yamnet.tflite
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Card numbers can shift based on USB enumeration order at boot.
Using 'plughw:ArrayUAC10,0' instead of 'plughw:2,0' ensures
the ReSpeaker is found regardless of when it connects.
Fixed by Vixy after power loss shuffled card order 🦊
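The fix itself just hands ALSA the card id ('plughw:ArrayUAC10,0') and lets ALSA resolve it. To illustrate why the id is stable while the number is not, here is a hypothetical helper that maps a card id to its index by parsing text in the format of /proc/asound/cards:

```python
import re

def card_index(cards_text, name):
    """Find the ALSA card index for a given card id.

    Purely illustrative: the same id can sit at a different index
    on each boot, which is exactly why the service addresses the
    device by name instead.
    """
    for line in cards_text.splitlines():
        m = re.match(r"\s*(\d+)\s+\[(\S+)\s*\]", line)
        if m and m.group(2) == name:
            return int(m.group(1))
    return None
```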
- Replaced PyAudio with direct ALSA (arecord subprocess)
- Single audio stream feeds both Porcupine and recording buffer
- Fixes the "device unavailable" error when recording after the wake word
- Simplified architecture
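The single-stream fan-out could be sketched as follows. The frame size, sample format, and buffer handling are assumptions (Porcupine typically consumes fixed-length 16-bit mono frames); a BytesIO stands in for arecord's stdout so the sketch is self-contained.

```python
import struct

FRAME_SAMPLES = 512   # assumed wake-word frame length
SAMPLE_WIDTH = 2      # S16_LE

def stream_frames(raw, ring):
    """Read fixed-size PCM frames from one stream and fan them out.

    Each frame's raw bytes go into `ring` (the recording buffer)
    and the unpacked samples are yielded for the wake-word engine,
    so both consumers share a single capture stream.
    """
    frame_bytes = FRAME_SAMPLES * SAMPLE_WIDTH
    while True:
        chunk = raw.read(frame_bytes)
        if len(chunk) < frame_bytes:
            break
        ring.append(chunk)                            # recording buffer
        yield struct.unpack(f"<{FRAME_SAMPLES}h", chunk)  # wake-word input

# In the service, `raw` would be arecord's stdout, e.g. (flags assumed):
#   proc = subprocess.Popen(
#       ["arecord", "-D", "plughw:ArrayUAC10,0", "-f", "S16_LE",
#        "-r", "16000", "-t", "raw"],
#       stdout=subprocess.PIPE)
#   for pcm in stream_frames(proc.stdout, ring): ...
```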