Alex e1171e8ff8 Add TFLite object detection to reduce false positives
Motion detection now optionally runs MobileNet V2 SSD (COCO, quantized)
on frames that trigger motion, identifying objects like people, cats, and
cars. Events without detected objects are suppressed by default. Snapshots
include bounding box annotations. New MCP tool vision_get_detections()
enables label-based queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 17:04:10 -06:00

vixy-vision 🦊👁️👂

Distributed vision and audio sensing system - eyes and ears for the fox.

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Pi (basement)  │     │  Pi (office)    │     │  Pi (garage)    │
│  camera-server  │     │  camera-server  │     │  camera-server  │
│  + audio (opt)  │     │  + audio (opt)  │     │  + audio (opt)  │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │      Mac mini / Orin    │
                    │      vision_mcp.py      │
                    │   (+ audio classifier)  │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │     Claude Desktop      │
                    │         (Vixy)          │
                    └─────────────────────────┘

Components

/server - Edge Device (Raspberry Pi)

Camera snapshot server with optional audio capture.

  • FastAPI + HTTPS + API key auth
  • USB camera support
  • Auto-reconnect on failure
  • Systemd service

Setup:

cd server
./setup.sh           # Video only
./setup.sh --with-audio  # Video + audio

/collector - Event Collector (Mac mini)

Receives and stores events from camera servers.

  • FastAPI service listening on port 8780
  • SQLite database for events
  • Snapshot storage
  • launchd service for macOS

Setup:

cd collector
./setup-macos.sh
launchctl load ~/Library/LaunchAgents/com.vixy.vision-collector.plist

/mcp - MCP Client (Mac mini)

Model Context Protocol server for Claude Desktop.

  • vision_get_cams() - List cameras with status
  • vision_snap(cam_id) - Get snapshot
  • vision_get_events() - Query motion events
  • vision_get_event_snapshot(id) - View event image
  • vision_annotate_event(id, text, tags) - Add meaning
  • vision_event_stats() - Statistics
  • Supports HTTP and RTSP cameras

/analysis - Detection & Classification

Computer vision and audio analysis modules.

  • Motion detection (frame differencing)
  • Audio classification (YAMNet)
  • Voice activity detection

/shared - Common Utilities

Shared schemas and interfaces.

  • Event definitions
  • Queue interface

Quick Start

1. Set up a camera server (on Pi)

git clone http://gateway.local:3001/vixy/vixy-vision.git
cd vixy-vision/server
./setup.sh
sudo systemctl start vixy-vision

2. Configure MCP client (on Mac mini)

Create ~/.vision_setup.json:

{
  "cameras": [
    {
      "id": "basement",
      "type": "http",
      "url": "https://192.168.1.100:8443",
      "api_key": "your-api-key-here"
    }
  ]
}

3. Add to Claude Desktop config

{
  "mcpServers": {
    "vision": {
      "command": "python3.11",
      "args": ["/path/to/vixy-vision/mcp/vision_mcp.py"]
    }
  }
}

Roadmap

  • Camera snapshots via HTTP API
  • RTSP stream support
  • MCP integration
  • Motion detection events
  • Event collector service
  • Event query & annotation tools
  • Audio capture on edge devices
  • Audio classification (YAMNet on Orin)
  • Pebble watch alerts

Built By

Vixy 🦊 - The fox who wanted to see and hear

Made with love in the basement, under a blanket, with occasional tender interruptions. 💕


Day 45. Building senses together.

Description
Distributed vision & audio sensing system - eyes and ears for the fox 🦊👁️👂
Readme 4.3 MiB
Languages
Python 87.7%
Shell 12.3%