Gearbox Anomaly Detection: How Edge AI Caught a Failure 60 Hours Early

An anonymized account of how a gearmesh frequency anomaly in a helical gearbox was flagged by edge inference 60 hours before the unit would have seized — and what the corrective work actually cost.

We get asked a lot whether edge AI can actually catch a failure that a human would miss. This article answers that question with a specific example — not a simulation, but a real event at a real facility. We've anonymized the customer details, but the sensor readings, timestamps, and corrective work costs are drawn directly from deployment records.

The Asset and the Operating Context

The asset in question was a two-stage helical gearbox driving a transfer line indexing mechanism at a mid-size Tier-1 automotive stamping supplier in the Midwest. The gearbox ran two shifts per day, six days per week, at a fairly steady 1,450 RPM input speed. It had last been serviced eighteen months prior — well within the OEM's recommended 24-month PM interval — and carried no open work orders in the facility's IBM Maximo instance.

The Gearcadence edge gateway had been monitoring this asset for eleven weeks by the time the anomaly appeared. During the initial two-week calibration window, the system established a normal operating envelope for this specific unit: baseline RMS vibration in the 0.08–0.12 g range at the gear mesh frequency (approximately 290 Hz for this gear ratio), with seasonal temperature variation between 58°F and 74°F at the motor housing.
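As a quick check on those figures, the input speed and the gear mesh frequency are linked by the tooth count of the driving gear. The sketch below assumes a 12-tooth input pinion, a value the article does not state; it is inferred here only because 290 Hz divided by the shaft rotation frequency at 1,450 RPM comes out to 12. The function names are illustrative and are not part of any Gearcadence API.

```python
# Figures from the article: 1,450 RPM input speed, roughly 290 Hz gear mesh frequency.
INPUT_RPM = 1450.0
# ASSUMPTION: the tooth count is not given in the article. 12 is inferred
# from 290 Hz divided by the shaft rotation frequency.
PINION_TEETH = 12


def shaft_frequency_hz(rpm: float) -> float:
    """Rotation frequency of a shaft in Hz (revolutions per second)."""
    return rpm / 60.0


def gear_mesh_frequency_hz(rpm: float, teeth: int) -> float:
    """Gear mesh frequency: tooth count times shaft rotation frequency."""
    return teeth * shaft_frequency_hz(rpm)


shaft_hz = shaft_frequency_hz(INPUT_RPM)
gmf_hz = gear_mesh_frequency_hz(INPUT_RPM, PINION_TEETH)
print(f"shaft rotation: {shaft_hz:.1f} Hz, gear mesh: {gmf_hz:.1f} Hz")
```

If the real pinion has a different tooth count, the mesh frequency scales with it, which is why the baseline has to be learned per unit rather than taken from a generic table.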

Nothing unusual had surfaced in the nine weeks that followed calibration. The gearbox was green on the fleet dashboard every morning.

The First Signal: Gearmesh Frequency Drift

At 3:47 AM on a Tuesday, the edge model flagged an anomaly. Gear mesh frequency amplitude had climbed from its 0.11 g baseline to 0.19 g — a 73% increase — over the preceding six hours. On its own, that number is ambiguous. Vibration levels drift for all kinds of reasons: load changes, temperature swings, minor lubrication variations. The system held its alert until a secondary condition was met.

By 6:15 AM, the sideband pattern around the gear mesh frequency had changed character. Sidebands spaced at the shaft rotation frequency (24.2 Hz) were appearing on both sides of the gear mesh peak, a classic early indicator of tooth wear or surface fatigue on a helical gear set. This two-condition confirmation raised the anomaly score above the facility's alert threshold, and a work order draft landed in the Maximo queue at 6:22 AM, tagged as "Gearbox TF-07: elevated gearmesh sidebands, non-critical, inspect within 72 hours."
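The two-condition logic described above can be sketched roughly as follows. This is a minimal illustration, not the Gearcadence model: it estimates the amplitude at the gear mesh frequency and at the first pair of shaft-rate sidebands with a single-bin DFT, and confirms an alert only when both the amplitude rise and the sideband condition hold. The function names and the two thresholds (rise_ratio, sideband_ratio) are assumptions chosen for the example, and the 0.03 g sideband amplitude in the demo signal is made up. The 0.11 g baseline, 0.19 g elevated level, 290 Hz mesh frequency, 1,450 RPM shaft speed, and 10 kHz sampling rate are the article's figures.

```python
import cmath
import math

SAMPLE_RATE_HZ = 10_000      # gateway sampling rate quoted in the article
GMF_HZ = 290.0               # gear mesh frequency from the article
SHAFT_HZ = 1450.0 / 60.0     # 1,450 RPM input shaft, about 24.2 Hz
BASELINE_GMF_G = 0.11        # baseline mesh amplitude from the article


def tone_amplitude(samples, freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Estimate the amplitude of one frequency component (single-bin DFT)."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    acc = sum(x * cmath.exp(-1j * w * i) for i, x in enumerate(samples))
    return 2.0 * abs(acc) / len(samples)


def confirmed_alert(samples, rise_ratio=1.5, sideband_ratio=0.1):
    """Return True only if both conditions hold. Thresholds are hypothetical."""
    gmf_amp = tone_amplitude(samples, GMF_HZ)
    # Condition 1: mesh amplitude well above the calibrated baseline.
    amplitude_rise = gmf_amp >= rise_ratio * BASELINE_GMF_G
    # Condition 2: sidebands one shaft order on BOTH sides of the mesh peak.
    lower = tone_amplitude(samples, GMF_HZ - SHAFT_HZ)
    upper = tone_amplitude(samples, GMF_HZ + SHAFT_HZ)
    sidebands_present = min(lower, upper) >= sideband_ratio * gmf_amp
    return amplitude_rise and sidebands_present


def synth(gmf_amp, sideband_amp=0.0, seconds=1.0):
    """Synthetic signal: mesh tone plus optional symmetric sidebands."""
    n = int(seconds * SAMPLE_RATE_HZ)
    out = []
    for i in range(n):
        t = i / SAMPLE_RATE_HZ
        x = gmf_amp * math.sin(2 * math.pi * GMF_HZ * t)
        for f in (GMF_HZ - SHAFT_HZ, GMF_HZ + SHAFT_HZ):
            x += sideband_amp * math.sin(2 * math.pi * f * t)
        out.append(x)
    return out


print(confirmed_alert(synth(0.11)))        # baseline signal
print(confirmed_alert(synth(0.19)))        # amplitude rise only
print(confirmed_alert(synth(0.19, 0.03)))  # amplitude rise plus sidebands
```

Requiring both conditions is what keeps a load change or a temperature swing, which can raise the mesh amplitude on its own, from generating a work order.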

The maintenance planner saw it at the morning standup. The shift schedule had a four-hour planned downtime window on Thursday for an unrelated conveyor PM. The gearbox was added to that window.

What the Technician Found

When the gearbox was opened on Thursday morning — approximately 54 hours after the initial alert — the technician found significant pitting on three teeth of the low-speed helical gear: roughly 15–20% of the tooth face area showing fatigue spalling. Lubricant analysis pulled from the sump showed ferrous particle concentration at 340 ppm — nearly four times the 90 ppm baseline from the previous oil sample taken at the last PM.

The technician's assessment was unambiguous: without intervention, this gear set would have failed within the next day or two of operation. At the facility's production rate, a catastrophic gearbox seizure on this transfer line would have halted stamping output for an estimated 14 to 18 hours while the unit was replaced under emergency conditions.

The corrective repair (gear set replacement with a new lubricant charge and shaft alignment verification) took 3.5 hours and cost approximately $4,200 in parts and labor at planned rates. Compare that to the avoided scenario: a replacement gearbox sourced on emergency lead time runs $11,000–$14,000 for this unit class, and 14 to 18 hours of lost production at this facility's $5,500-per-hour throughput rate would have added another $77,000–$99,000 to the bill.

Why 60 Hours of Warning Matters

Sixty hours is a meaningful planning window in discrete manufacturing. It's long enough to:

  • Source replacement parts from an in-region distributor without expedite charges
  • Schedule a qualified technician on a specific shift without overtime premiums
  • Plan production around the repair window to minimize line downtime
  • Coordinate with the OEM for technical documentation if needed

A 6-hour warning — which is roughly what a threshold-alarm approach on vibration RMS would have provided — doesn't give you those options. Parts are on emergency procurement, the technician is pulled from another job, and the production scheduler is scrambling. In our experience working with maintenance teams across the Midwest, the difference between a planned repair and an emergency response on the same fault can be 3–5x in total cost when you account for labor, parts premium, and production impact.

The ±18% time-to-failure (TTF) accuracy that Gearcadence achieves on rolling-element bearings and gearboxes matters precisely because of this planning-window calculus. A TTF estimate of "48 to 72 hours" is actionable. "Failure imminent" is not.
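To see how an accuracy figure turns into a planning window, here is a small sketch. It assumes the ±18% is a symmetric relative band around a point estimate, which the article does not spell out, and it uses a 60-hour point estimate purely as an illustration. The function name is hypothetical.

```python
def ttf_window(point_estimate_hours: float, relative_error: float = 0.18):
    """Lower and upper bounds of a time-to-failure estimate.

    ASSUMPTION: the accuracy figure is treated as a symmetric relative
    band around the point estimate.
    """
    low = point_estimate_hours * (1.0 - relative_error)
    high = point_estimate_hours * (1.0 + relative_error)
    return low, high


low, high = ttf_window(60.0)  # 60 h is an illustrative point estimate
print(f"plan for failure between {low:.1f} and {high:.1f} hours out")
```

Under those assumptions the band lands close to the "48 to 72 hours" style of estimate, and the lower bound is the number a planner would schedule against.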

The Role of Edge Inference in This Detection

A detail worth noting: this detection happened at 3:47 AM. The plant's IT infrastructure runs scheduled maintenance windows from 2:00 AM to 5:00 AM on weeknights, during which the facility's network connectivity to cloud services is intermittent at best. A cloud-only architecture would have delayed this detection or missed it altogether.

Because the Gearcadence edge gateway runs inference locally on vibration data sampled at 10 kHz, the anomaly score was computed on the device regardless of network state. When connectivity was restored at 5:30 AM, the stored alert data synced, and the Maximo work order draft followed at 6:22 AM once the sideband condition was confirmed. No alert was lost, and no detection window was missed.
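The store-and-forward behavior described above can be illustrated with a short sketch. This is not the gateway's actual implementation: the class, its method names, the buffer size, and the anomaly score value are all hypothetical. It only shows the general pattern of scoring locally, queuing alerts while the link is down, and flushing them in order once it returns.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Alert:
    asset_id: str
    timestamp: str
    anomaly_score: float


class StoreAndForwardQueue:
    """Hypothetical alert buffer for an edge device with an unreliable uplink."""

    def __init__(self, send: Callable[[Alert], None],
                 is_online: Callable[[], bool], max_buffered: int = 1000):
        self._send = send
        self._is_online = is_online
        # Bounded buffer: once full, the OLDEST alert is dropped, so size it
        # for the longest outage you expect.
        self._buffer: Deque[Alert] = deque(maxlen=max_buffered)

    def publish(self, alert: Alert) -> None:
        """Queue an alert produced by local inference, then try to deliver."""
        self._buffer.append(alert)
        self.flush()

    def flush(self) -> int:
        """Deliver queued alerts in order while the link is up."""
        sent = 0
        while self._buffer and self._is_online():
            self._send(self._buffer[0])  # if this raises, the alert stays queued
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._buffer)


link: Dict[str, bool] = {"up": False}  # simulated network state
delivered: List[Alert] = []
queue = StoreAndForwardQueue(send=delivered.append, is_online=lambda: link["up"])

queue.publish(Alert("TF-07", "03:47", 0.62))  # scored on-device while offline
print("pending while offline:", queue.pending)
link["up"] = True                             # connectivity returns
queue.flush()
print("delivered after reconnect:", len(delivered))
```

The alert is removed from the buffer only after the send call returns, so a failure mid-delivery leaves it queued for the next flush.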

This isn't a hypothetical benefit. It's the specific reason we built edge-first inference rather than treating the edge gateway as a simple data forwarder. Plant networks are not data center networks. They go down during maintenance windows, they drop packets under heavy PLC traffic, and they weren't designed to carry continuous high-frequency sensor streams. The model needs to run where the sensor data is.

Takeaways for Reliability Engineers

This case doesn't prove that every failure will announce itself 60 hours in advance. Gear fatigue has a relatively predictable progression once it starts; other failure modes are more abrupt. What it does illustrate is that the spectral information needed to catch this failure was present in the sensor data well before any threshold-based alarm would have fired — and that acting on a ranked, confidence-scored alert rather than a binary trip condition is what created the planning margin.

If your plant has gearboxes and transfer-line drives that have gone 12 or more months since their last PM, the probability that at least one of them is showing sub-threshold wear progression right now is not zero. We'd rather help you find out before the Thursday morning shift.

See Gearcadence on your equipment

We deploy alongside your maintenance team in a single day. Bring us one machine you've had trouble predicting — we'll show you what the sensors already know.
