The Problem
A manufacturing firm was losing an estimated $3M annually to unplanned equipment downtime across its 2 production facilities. Their existing monitoring approach was purely reactive — sensors reported failures after they occurred.
The company had invested in a cloud-based monitoring platform, but the 200–800ms round-trip latency to the cloud meant anomalies were detected too late to prevent failure. Network connectivity in factory environments was also intermittent, causing data gaps.
Key Constraints
Inference must happen on-device with < 50ms latency — no cloud dependency
Edge hardware limited to Raspberry Pi 4 and NVIDIA Jetson Nano
Models must update without taking the device offline (OTA updates)
Connectivity is intermittent — local decisions must be made autonomously
The Solution
The architecture was designed as a three-tier edge intelligence system. Sensor data is collected at the device level, processed by a quantized TensorFlow Lite classification model running on the device, and anomaly events are queued locally and synced to the cloud when connectivity is available.
Models are trained centrally using historical sensor data labeled by maintenance engineers. Every 30 days, updated model weights are pushed OTA to all devices via MQTT using a blue/green deployment strategy, with automatic rollback if prediction confidence drops.
Technical Architecture
TFLite Inference Engine
INT8 quantized LSTM models running locally. Models compressed from 45MB to 2.3MB with < 3% accuracy loss.
MQTT Edge Broker
Mosquitto MQTT broker on each device for sensor aggregation. QoS 2 delivery with local queue during connectivity loss.
OTA Model Updates
AWS IoT Greengrass for secure OTA model delivery. Blue/green deployment with automated accuracy-based rollback.
Fleet Observability
AWS IoT Device Defender with custom dashboards in Grafana. Per-device accuracy drift detection triggers retraining.
"Within the first quarter, 11 significant equipment issues were identified and addressed that the previous system did not detect. Measurable returns were observed within 60 days of deployment."

