Predictive maintenance - Electric motors are the workhorses of industrial automation. Unplanned motor failures cause costly downtime, sometimes hundreds of thousands of dollars per hour in manufacturing. Predictive maintenance uses vibration sensors to detect faults early, before they cause catastrophic failure, allowing maintenance to be scheduled during planned shutdowns.
Why vibration analysis? - A healthy motor produces smooth, periodic vibrations at its rotation frequency. Each fault type introduces a characteristic signature: imbalance adds harmonics, bearing defects create periodic impulses, and misalignment modulates the amplitude. These patterns are detectable in the time domain and are well-suited for sequence classification with recurrent neural networks.
Industry 4.0 and edge inference - In a smart factory, triaxial accelerometers mounted on motor housings stream vibration data continuously. An RNN running on an edge device (MCU or small SoC) classifies windows of vibration data in real-time, flagging anomalies for the maintenance team. The model must be small enough to run on constrained hardware, making compact LSTM/GRU architectures ideal.
Why RNN? - Vibration signals are inherently sequential. An RNN processes the signal timestep by timestep, building up an internal representation of the temporal pattern. LSTM and GRU cells have gating mechanisms that allow them to remember relevant features over many timesteps while ignoring noise, making them well-suited for this task.
Triaxial accelerometer - Each sample consists of three channels (X, Y, Z) captured simultaneously from a 3-axis accelerometer mounted on the motor housing. The X-axis is typically aligned with the motor shaft, Y is radial horizontal, and Z is radial vertical. Different fault types produce different patterns across the three axes.
Four fault types - The classifier distinguishes four conditions: (0) Normal - clean sinusoidal vibration at rotation frequency with minimal noise; (1) Imbalance - mass imbalance on the rotor adds a 2nd harmonic component; (2) Bearing fault - damaged bearing races produce periodic high-amplitude impulses; (3) Misalignment - shaft misalignment causes amplitude modulation of the base vibration.
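The four signatures can be sketched with a synthetic single-axis generator (a hypothetical helper for illustration only; the amplitudes, impulse rate, and noise level are assumptions, not values from any real dataset, and a real triaxial sample would have three such channels):

```typescript
// Hypothetical single-axis generator for the four fault signatures above.
// 1 kHz sample rate and 50 Hz rotation match the text; other constants are assumed.
const SAMPLE_RATE = 1000;
const ROTATION_HZ = 50;

function generateSample(label: 0 | 1 | 2 | 3, length: number): number[] {
  const out: number[] = [];
  for (let t = 0; t < length; t++) {
    const phase = (2 * Math.PI * ROTATION_HZ * t) / SAMPLE_RATE;
    let v = Math.sin(phase); // base vibration at rotation frequency
    if (label === 1) v += 0.5 * Math.sin(2 * phase); // imbalance: 2nd harmonic
    if (label === 2 && t % 20 === 0) v += 1.2; // bearing fault: periodic impulse
    if (label === 3) v *= 1 + 0.4 * Math.sin(0.25 * phase); // misalignment: amplitude modulation
    out.push(v + 0.05 * (Math.random() - 0.5)); // small noise floor
  }
  return out;
}
```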
Sliding windows - Continuous vibration data is segmented into fixed-size windows (32 or 64 timesteps). Each window becomes one training/test sample. The window size must be large enough to capture at least one full period of the fault signature. At 1 kHz sampling and 50 Hz rotation, 64 timesteps captures about 3 full rotations.
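Segmentation can be sketched as follows (non-overlapping windows for simplicity; an overlapping variant would use a stride smaller than the window size):

```typescript
// Segment a continuous 1-D signal into fixed-size, non-overlapping windows.
// Any trailing samples that do not fill a complete window are dropped.
function slidingWindows(signal: number[], size: number): number[][] {
  const windows: number[][] = [];
  for (let start = 0; start + size <= signal.length; start += size) {
    windows.push(signal.slice(start, start + size));
  }
  return windows;
}
```

At 1 kHz sampling, a 64-sample window spans 64 ms, i.e. about 3 full rotations at 50 Hz.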
Normalization - Raw vibration values (typically in the range [-1.5, 1.5]) are normalized to [0, 1] using the transform (signal + 1.5) / 3.0, clamped to the unit interval. This ensures all inputs to the RNN are in a consistent range, which improves training stability.
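The transform is a one-liner:

```typescript
// Map a raw accelerometer value from roughly [-1.5, 1.5] to [0, 1],
// clamping outliers to the unit interval.
function normalize(value: number): number {
  return Math.min(1, Math.max(0, (value + 1.5) / 3.0));
}
```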
LSTM vs GRU - LSTM (Long Short-Term Memory) uses 4 gates (forget, input, candidate, output) and maintains a separate cell state for long-term memory. GRU (Gated Recurrent Unit) uses 3 gates (reset, update, candidate) and merges the cell state into the hidden state, making it simpler and faster. For short sequences like vibration windows, GRU often performs comparably to LSTM with fewer parameters.
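The parameter difference is easy to quantify. Counting each gate as an H×I input weight matrix, an H×H recurrent matrix, and an H bias vector (a sketch that covers only the recurrent cell, not the output layer, and follows the 4-gate/3-gate counting used above):

```typescript
// Per-gate parameters: H*I (input weights) + H*H (recurrent weights) + H (bias).
function cellParams(gates: number, inputSize: number, hiddenSize: number): number {
  return gates * (hiddenSize * inputSize + hiddenSize * hiddenSize + hiddenSize);
}

// For the 3-input, 16-unit cell used in the example below:
const lstmParams = cellParams(4, 3, 16); // forget, input, candidate, output
const gruParams = cellParams(3, 3, 16);  // reset, update, candidate
```

For this configuration the GRU cell carries 25% fewer parameters than the LSTM cell (960 vs 1280), which also illustrates why training time grows quadratically with hidden size: the H×H recurrent matrix dominates the count.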
Hidden size - The number of recurrent units in the hidden layer. More units provide greater capacity to learn complex patterns but increase training time quadratically (due to hidden-to-hidden connections). For 4-class vibration classification, 16 units is often sufficient. Use 32 if accuracy is too low, or 8 for faster experiments.
Learning rate - Controls the step size during gradient updates. RNNs are sensitive to learning rate: too high causes exploding gradients and unstable training (loss jumps around), too low causes slow convergence. The default of 0.003 works well with Adam optimizer for this task. If training is unstable, try 0.001.
Window size - The number of timesteps in each vibration sample. Longer windows capture more temporal context but increase training time linearly (the RNN must process more steps per sample). 64 timesteps at 1 kHz sampling covers about 3 full motor rotations, which is enough for all four fault types.
Accuracy - The percentage of test samples correctly classified. For a 4-class problem, random guessing gives 25%. A well-trained model should reach 70-90% or higher, depending on hidden size and training epochs. If accuracy stays near 25%, the model has not learned - try more epochs or a higher learning rate.
Confusion matrix - Shows the distribution of predictions for each actual fault type. The diagonal cells (green) are correct predictions; off-diagonal cells (red) are misclassifications. Common confusion patterns: Normal and Misalignment may be confused because misalignment with weak modulation looks similar to normal vibration. Bearing faults are usually the easiest to classify due to their distinctive impulse pattern.
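Both metrics fall out of one tally over the test predictions (a hypothetical helper, not part of `@spiky-panda/core`):

```typescript
// Build a numClasses x numClasses confusion matrix and overall accuracy from
// parallel arrays of actual and predicted class labels.
// matrix[actual][predicted] counts each outcome; diagonal entries are correct.
function evaluate(
  actual: number[],
  predicted: number[],
  numClasses = 4
): { matrix: number[][]; accuracy: number } {
  const matrix = Array.from({ length: numClasses }, () =>
    new Array<number>(numClasses).fill(0)
  );
  let correct = 0;
  for (let i = 0; i < actual.length; i++) {
    matrix[actual[i]][predicted[i]]++;
    if (actual[i] === predicted[i]) correct++;
  }
  return { matrix, accuracy: correct / actual.length };
}
```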
Loss curve - Shows the average training loss per epoch. A healthy curve decreases rapidly in the first few epochs and then gradually levels off. If the loss oscillates wildly, reduce the learning rate. If it plateaus high, the model may need more hidden units or more training samples.
Inference time - Total time to classify all test samples. This includes resetting state and running the full sequence through the RNN for each sample. Divide by the number of test samples to get per-sample latency, which is the relevant metric for real-time edge deployment.
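The per-sample figure can be derived with a simple timing wrapper (a sketch; `classify` stands in for whatever function resets state and runs one window through the RNN, and `performance.now()` is the standard high-resolution clock in both browsers and Node):

```typescript
// Time a batch of classifications and report total and per-sample latency.
// samples: one number[][] (timesteps x channels) per test window.
function classifyAllTimed(
  samples: number[][][],
  classify: (sample: number[][]) => number
): { predictions: number[]; totalMs: number; perSampleMs: number } {
  const t0 = performance.now();
  const predictions = samples.map(classify);
  const totalMs = performance.now() - t0;
  return { predictions, totalMs, perSampleMs: totalMs / samples.length };
}
```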
Build and train the RNN classifier using @spiky-panda/core:
import {
  RnnBuilder,
  RnnCellType,
  RnnInferenceRuntime,
  RnnTrainingRuntime,
  ActivationFunctions,
  LossFunctions,
  Optimizers,
} from "@spiky-panda/core";

// Small helpers used below
const oneHot = (label: number, classes: number): number[] =>
  Array.from({ length: classes }, (_, i) => (i === label ? 1 : 0));
const argmax = (values: number[]): number =>
  values.indexOf(Math.max(...values));

// 1. Build the RNN
const graph = new RnnBuilder()
  .withInputSize(3)   // triaxial: X, Y, Z
  .withHiddenSize(16) // 16 recurrent units
  .withOutputSize(4)  // 4 fault classes
  .withCellType(RnnCellType.LSTM)
  .withOutputActivation(ActivationFunctions.sigmoid)
  .build();

// 2. Create runtime and trainer
const runtime = new RnnInferenceRuntime(graph);
const trainer = new RnnTrainingRuntime(
  graph,
  runtime,
  LossFunctions.MSE,
  0.003, // learning rate
  Optimizers.Adam()
);

// 3. Training loop
// trainData: { sequence: number[][]; label: number }[] - normalized windows
for (let epoch = 0; epoch < 25; epoch++) {
  let totalLoss = 0;
  for (const sample of trainData) {
    runtime.resetState(); // each window is an independent sequence
    // Repeat the one-hot target for every timestep of the window
    const targets = sample.sequence.map(() => oneHot(sample.label, 4));
    totalLoss += trainer.trainStep(sample.sequence, targets);
  }
  console.log(
    `Epoch ${epoch + 1} - Loss: ${(totalLoss / trainData.length).toFixed(6)}`
  );
}

// 4. Inference: run one window and take the prediction at the last timestep
runtime.resetState();
const outputs = runtime.run(testSample.sequence);
const predicted = argmax(outputs[outputs.length - 1]);
console.log("Predicted fault:", predicted);