Low-cost gas detection - Commercial metal oxide semiconductor (MOS) sensors are cheap, robust, and widely deployed for indoor air quality, industrial safety and environmental monitoring. Their main limitation is selectivity: a single sensor cannot tell two reducing gases apart, because very different molecules can produce nearly identical resistance responses.
Electronic nose array - The classical answer is to build an e-nose: run several MOS sensors in parallel, each with a slightly different sensing layer, and let a model learn the multi-dimensional response signature. This demo reproduces that idea on real data from the DotVision experimental chamber, fusing the 5 most diagnostic sensors into a single input stream.
The Pumba device - The long term goal of the project is Pumba, a self-contained microelectronic device that continuously detects and discriminates volatile organic compounds in real time. It needs a classifier small enough to run on a constrained MCU, which is why this demo focuses on compact, single-layer recurrent models rather than deep stacks.
This sample reproduces the experiment described in Dirani et al., Discrimination of different COV using MOS sensors (DotVision, 2024), which reaches 89.7% accuracy with a 5-layer LSTM and teacher/student knowledge distillation. Here we replace that whole stack with a single-layer SpikyPanda GRU of 37 neurons and 592 synapses, fed by a 5-sensor e-nose fusion with engineered features, 2-second temporal windows, cross-entropy loss and best-weights checkpointing. Measured accuracy: 95.5% on the held-out test set, exceeding the published result by 5.8 points while running inference in 0.20 ms per window on the JavaScript runtime.
Experimental chamber - A sealed chamber equipped with 13 commercial MICS 6814 / MICS 4514 metal oxide sensors, each with its own integrated heater. Controlled pumps inject acetone or hydrogen chloride (HCl) while a centralized control system cycles the ambient temperature between 0°C and 200°C and maintains a fixed humidity. Raw resistance is sampled at about 50 Hz.
Why 5 sensors, not 13 - The original paper identifies two distinct sensitivity clusters: MOX 9 and MOX 13 respond strongly to acetone, while MOX 5, MOX 21 and MOX 25 are the most reactive to HCl. Fusing these 5 sensors gives the classifier one "specialist" per gas family plus cross-checks between sensors, which is exactly what an e-nose array is supposed to offer.
6 raw columns per sensor - For each timestep the device records temperature, RH, heat (heater setpoint), R (sensing layer resistance), Ratio and Difference (two precomputed baseline-normalized quantities). Temperature, humidity and heater state are shared across sensors. R, Ratio and Difference are per-sensor.
Per-sensor normalization - Each MOS sensor has its own baseline: R0 (median resistance observed in clean air). The preprocessing script computes Rn = R / R0 per sensor, which is the standard e-nose feature because it is unit-free and comparable across sensors regardless of their individual sensitivity. Rn is approximately 1 in clean air and moves up or down depending on the reaction chemistry.
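The baseline normalization above can be sketched in a few lines. This is a minimal illustration, not the demo's actual preprocessing code; the function names (`median`, `normalizeResistance`) are hypothetical.

```javascript
// R0 = median resistance over clean-air rows; Rn = R / R0 per sensor.
function median(values) {
  const s = [...values].sort((a, b) => a - b);
  const m = Math.floor(s.length / 2);
  return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
}

function normalizeResistance(resistances, cleanAirResistances) {
  const r0 = median(cleanAirResistances); // per-sensor baseline
  return resistances.map((r) => r / r0);  // ~1.0 in clean air
}
```

Using the median rather than the mean makes R0 robust to the occasional spike in the clean-air segment.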
Engineered derivative - The demo also computes the centered derivative dRn/dt. Gas sensors have a characteristic transient profile at injection and purge; the slope of the response is often more diagnostic than its absolute value, especially during temperature cycling. Including dRn/dt is what separates the Enhanced GRU preset from the baseline.
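A centered finite difference is one straightforward way to compute dRn/dt at 10 Hz (dt = 0.1 s), with one-sided differences at the window edges. This is an illustrative sketch, not necessarily the demo's exact implementation.

```javascript
// Centered derivative of an Rn series; dt is the sample interval in seconds.
function centeredDerivative(rn, dt = 0.1) {
  const n = rn.length;
  const d = new Array(n);
  for (let i = 0; i < n; i++) {
    if (i === 0) d[i] = (rn[1] - rn[0]) / dt;                 // forward edge
    else if (i === n - 1) d[i] = (rn[n - 1] - rn[n - 2]) / dt; // backward edge
    else d[i] = (rn[i + 1] - rn[i - 1]) / (2 * dt);            // centered
  }
  return d;
}
```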
Temporal decimation to 10 Hz - The raw CSVs are sampled at 50 Hz, but MOS sensor chemistry operates on a second-scale response time. Native-rate 20-timestep windows would only cover 400 ms, which is shorter than a single adsorption/desorption cycle. The preprocessing therefore averages every 5 consecutive rows into one, reducing the effective rate to 10 Hz and lowering measurement noise at the same time. Each network input is a 2-second window of 20 timesteps, which actually covers a full gas response transient. Moving from 400 ms to 2 s windows was the single biggest accuracy lever, adding about 10 points on its own.
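The decimation step amounts to block-averaging: every 5 consecutive 50 Hz samples collapse into one 10 Hz sample. A minimal sketch (trailing partial groups dropped; function name is illustrative):

```javascript
// Average each run of `factor` consecutive samples: 50 Hz -> 10 Hz for factor 5.
function decimate(samples, factor = 5) {
  const out = [];
  for (let i = 0; i + factor <= samples.length; i += factor) {
    let sum = 0;
    for (let j = 0; j < factor; j++) sum += samples[i + j];
    out.push(sum / factor);
  }
  return out;
}
```

Averaging (rather than simply dropping 4 of every 5 rows) is what provides the noise reduction mentioned above.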
Sliding windows - The decimated 10 Hz stream is sliced into fixed windows of 20 timesteps (2 s each) with stride 5. Windows that cross a gas boundary are discarded. The train split keeps up to 200 windows per class (Air / Acetone / HCl), the test split up to 66 per class.
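The windowing logic above can be sketched as follows, assuming one feature vector and one gas label per decimated timestep (names are illustrative, not the demo's actual code):

```javascript
// Slice a labeled stream into fixed windows, skipping any window whose
// timesteps carry more than one gas label (i.e. it crosses a boundary).
function slidingWindows(features, labels, size = 20, stride = 5) {
  const windows = [];
  for (let start = 0; start + size <= features.length; start += stride) {
    const label = labels[start];
    let clean = true;
    for (let i = start; i < start + size; i++) {
      if (labels[i] !== label) { clean = false; break; }
    }
    if (clean) {
      windows.push({ sequence: features.slice(start, start + size), label });
    }
  }
  return windows;
}
```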
23 feature channels - Each timestep of each window is a 23-dimensional vector: 3 shared channels (temperature, RH, heat), 3 per-sensor base channels times 5 sensors (Rn, Ratio-1, Difference/500), and 1 per-sensor engineered channel times 5 sensors (dRn/dt). The baseline presets use the first 18 channels only; the enhanced preset uses all 23.
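The channel layout can be made concrete with a small assembly function. Field names (`rn`, `ratio`, `difference`, `dRn`) are hypothetical placeholders for the per-sensor quantities described above.

```javascript
// 3 shared channels, then 3 base channels per sensor, then (optionally)
// 1 derivative channel per sensor: 18 channels baseline, 23 enhanced.
function buildFeatureVector(shared, sensors, withDerivative = true) {
  const v = [shared.temperature, shared.rh, shared.heat];
  for (const s of sensors) v.push(s.rn, s.ratio - 1, s.difference / 500);
  if (withDerivative) for (const s of sensors) v.push(s.dRn);
  return v;
}
```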
Architecture presets - Three one-click presets reflect three different research tradeoffs. Paper LSTM keeps the spirit of the published work: a single LSTM cell with hidden=32 and the 18 base channels. Compact GRU is the Pumba candidate and the champion: a GRU with hidden=16 and the same 18 base channels, measured at 95.5% accuracy with best-weights checkpointing, and therefore the default. Enhanced GRU feeds the additional dRn/dt derivative channels into a slightly larger GRU (hidden=24, 23 inputs); the derivatives turn out to be largely redundant with what the GRU gating already extracts internally.
LSTM vs GRU - LSTM has 4 gates and a separate cell state; GRU has 3 gates and merges cell and hidden state. On these short 2-second windows the extra LSTM capacity does not pay off: the Compact GRU outperforms the larger Paper LSTM by a few points while running at 0.20 ms per window vs 0.51 ms. This is consistent with the literature observation that GRU matches or beats LSTM on short sequences.
Hidden size - Number of recurrent units in the cell layer. Larger means more capacity but a quadratic cost in hidden-to-hidden synapses. For this 3-class task hidden=16 already reaches 95.5%; going higher mostly inflates inference cost.
Epochs and learning rate - Adam at a constant learning rate of 0.002 for 60 epochs is the sweet spot. The loss landscape here is bumpy, with many shallow local minima, so the network benefits from continuing high-LR exploration rather than settling early.
Best-weights checkpointing - The training loop tracks the minimum per-epoch loss after a 5-epoch warmup and snapshots every learnable parameter of the graph at that moment. Before the test phase the snapshot is restored, so evaluation runs on the actual best state the optimizer visited rather than on whatever weights happen to be in memory at the final epoch. Epochs that improve the best-so-far loss are marked with * in the log. With the constant schedule the best epoch is usually the final one, so the checkpoint is effectively a no-op; on the other schedules it systematically rescues about 1 to 3 points of accuracy.
LR schedule - Five options are exposed, and the default constant is the best on this dataset. The other four are included because the path to that verdict is more interesting than the verdict itself.
cosine decay drops the LR smoothly to 10% of its initial value over the full training run and is the classic anti-overfit schedule, but here it actively hurts accuracy by 2 to 8 points: the late low-LR phase traps the model in the first mediocre basin it finds, and the loss often climbs again in the last third of training instead of fine-tuning.
cyclic sin oscillates between 30% and 100% of the base LR with two full sine waves, an intermediate option that keeps some exploration energy throughout but cannot recover what cosine loses.
cosine + adaptive kick is the reactive answer to the cosine failure mode. It follows the cosine curve normally, watches the per-epoch loss, and whenever the loss rises two epochs in a row the scheduler bumps the LR up to three times the current scheduled value (capped at the initial base LR) and holds that elevated rate for three consecutive epochs, then falls back to the cosine curve and imposes a cooldown before another kick can fire. Every kick is logged in the training panel. This variant recovers most of what plain cosine loses (measured at about +6 points on Compact GRU, 86.4% to 92.4%) but still underperforms constant by 1 to 2 points.
constant + adaptive kick applies the same reactive idea on top of a constant base. When the loss rises two epochs in a row the scheduler doubles the LR (from 0.002 to 0.004) and holds it there for three epochs before falling back. This is the natural stress test: can detecting a drift in real time give anything beyond what constantly-high exploration already delivers on its own? Try it and compare with plain constant.
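The two schedule families above can be sketched as plain functions. This is a simplified model of the behavior described in the text (decay to 10% of base; kick to at most 3x the scheduled LR after two consecutive loss rises, held for 3 epochs, then a cooldown); the demo's real implementation may differ in the exact constants and bookkeeping.

```javascript
// Cosine decay from `base` down to 10% of `base` over `total` epochs.
function cosineLR(epoch, total, base) {
  const floor = 0.1 * base;
  const t = epoch / Math.max(1, total - 1);
  return floor + (base - floor) * 0.5 * (1 + Math.cos(Math.PI * t));
}

// Reactive "kick" wrapper: watches the per-epoch loss, and after two rises
// in a row bumps the LR to min(factor * scheduled, base), holds the bump
// for `holdEpochs`, then enforces a cooldown before the next kick.
class AdaptiveKick {
  constructor(base, factor = 3, holdEpochs = 3, cooldown = 5) {
    this.base = base; this.factor = factor;
    this.holdEpochs = holdEpochs; this.cooldown = cooldown;
    this.rises = 0; this.hold = 0; this.cool = 0; this.prevLoss = Infinity;
  }
  next(scheduledLR, loss) {
    this.rises = loss > this.prevLoss ? this.rises + 1 : 0;
    this.prevLoss = loss;
    const kicked = Math.min(this.factor * scheduledLR, this.base);
    if (this.hold > 0) { this.hold--; return kicked; }       // still kicking
    if (this.cool > 0) { this.cool--; return scheduledLR; }  // cooldown
    if (this.rises >= 2) {                                   // fire a kick
      this.rises = 0; this.hold = this.holdEpochs - 1; this.cool = this.cooldown;
      return kicked;
    }
    return scheduledLR;
  }
}
```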
What all this tells us - The MOx loss landscape is unusually rugged for such a small model. Standard schedulers that assume a clean exploration-then-exploitation progression backfire because there is no clean exploitation regime to settle into: every time the LR drops the model drifts again. On tasks with this kind of geometry, the simplest strategy (keep pushing at the initial LR) tends to win, and adaptive mechanisms mostly recover what fancier schedules give up. This is a genuine negative result worth keeping in the demo.
Samples - Number of training windows actually used each run. Lower values are good for quick iteration, higher values give steadier results but take longer. The balanced train set maxes out at 600 (200 per class).
Measured results - On a held-out test set of 198 balanced windows (66 per class), using 60 epochs of Adam at a constant learning rate of 0.002 on 300 balanced train samples with best-weights checkpointing:
| Preset | Accuracy | Air | Acetone | HCl | Synapses | Latency |
|---|---|---|---|---|---|---|
| Paper LSTM | 91.9% | 66/66 | 57/66 | 59/66 | 1696 | 0.51 ms |
| Compact GRU | 95.5% | 66/66 | 61/66 | 62/66 | 592 | 0.20 ms |
| Enhanced GRU | 90.4% | 66/66 | 58/66 | 55/66 | 1200 | 0.48 ms |
Compared to the paper (89.7% with 5-layer LSTM + teacher/student distillation on a single sensor): the Compact GRU exceeds it by 5.8 points with a fraction of the parameter count, no distillation step, and best-weights checkpointing as the only training trick on top of plain Adam.
Air is perfectly classified (66/66) in every preset. The network never misses a gas alarm. All residual errors are between Acetone and HCl, which is the intrinsic difficulty of the task: both are reducing gases that pull the sensor resistance in the same direction. The paper reports the exact same failure mode.
Why constant LR wins - On this noisy loss landscape the MOx task does not offer a clean exploration-then-exploitation progression. Every schedule that eventually lowers the LR (cosine, cyclic, cosine+kick) locks the network into a mediocre basin and then drifts. The adaptive kick variants actually reach a lower training loss than constant (0.31 vs 0.32), but at the price of memorizing noise rather than learning generalizable features, which is why they lose about 4 points on the test set. Plain constant LR with checkpointing wins because it keeps exploring until the end and captures the state where the train loss is lowest on a smooth trajectory.
Confusion matrix - Rows are true labels, columns are predicted labels. Green diagonal cells are correct; red off-diagonal cells are errors. Expect a green Air row with no leakage, and a few red entries clustered on the Acetone / HCl boundary.
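Building the matrix from predictions is a one-pass count. A minimal sketch with the same row/column convention as the panel (rows = true, columns = predicted); the function name is illustrative.

```javascript
// Count (true, predicted) pairs into an NxN matrix.
function confusionMatrix(truths, preds, numClasses = 3) {
  const m = Array.from({ length: numClasses }, () => new Array(numClasses).fill(0));
  truths.forEach((t, i) => { m[t][preds[i]]++; });
  return m;
}
```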
Signal visualization - The plot shows Rn(t) for the 5 sensors over one test window, color coded. A clean Air window is nearly flat around 1.0, an Acetone window drifts below 1 and stays low, and an HCl window shows a sharp transient followed by partial recovery. This panel is useful for sanity-checking what the model is actually seeing.
Inference time - Total time to classify every test window on the current machine. Dividing by the number of test samples gives per-window latency, which is the metric that matters for real-time edge deployment on a constrained MCU.
Build the SpikyPanda RNN that powers this demo using @spiky-panda/core:
```js
import {
  RnnBuilder,
  RnnCellType,
  RnnInferenceRuntime,
  RnnTrainingRuntime,
  ActivationFunctions,
  LossFunctions,
  Optimizers,
} from "@spiky-panda/core";

// Champion preset: Compact GRU on the 5-sensor e-nose fusion.
// 18 input channels = 3 shared (T, RH, heat) + 3 per-sensor x 5 sensors
// (Rn = R/R0 centered, Ratio-1, Difference/500). 37 neurons, 592 synapses.
// Measured at 95.5% accuracy on the held-out test set.
const graph = new RnnBuilder()
  .withInputSize(18)
  .withHiddenSize(16)
  .withOutputSize(3) // Air, Acetone, HCl
  .withCellType(RnnCellType.GRU)
  .withOutputActivation(ActivationFunctions.sigmoid)
  .build();

const runtime = new RnnInferenceRuntime(graph);

// CrossEntropy produces sharp gradients when the model is confidently wrong,
// which is exactly what sigmoid + MSE lacks on a 3-class classification task.
const trainer = new RnnTrainingRuntime(
  graph,
  runtime,
  LossFunctions.CrossEntropy,
  0.002,
  Optimizers.Adam()
);

// Train 60 epochs with a constant learning rate and best-weights checkpointing.
// On this noisy loss landscape cosine decay actively hurts (it freezes the
// model in a mediocre basin) and adaptive kicks overfit. Constant LR keeps
// exploring until the end; the checkpoint captures the best epoch observed.
let bestLoss = Infinity;
let bestSnapshot = null;
for (let epoch = 0; epoch < 60; epoch++) {
  let total = 0;
  for (const sample of trainData) {
    runtime.resetState();
    // Inject the one-hot target on every timestep. On 2-second windows
    // the network has enough context to commit to a class from the start.
    const oneHot = new Array(3).fill(0);
    oneHot[sample.label] = 1;
    const targets = sample.sequence.map(() => oneHot.slice());
    total += trainer.trainStep(sample.sequence, targets);
  }
  const avgLoss = total / trainData.length;
  if (epoch >= 5 && avgLoss < bestLoss) {
    bestLoss = avgLoss;
    bestSnapshot = snapshotWeights(graph); // save learnable params
  }
  console.log(`Epoch ${epoch + 1} - Loss: ${avgLoss.toFixed(6)}`);
}
if (bestSnapshot) restoreWeights(graph, bestSnapshot);

// Inference: read out the last timestep's output and argmax.
runtime.resetState();
const outputs = runtime.run(testSample.sequence);
const last = outputs[outputs.length - 1];
let best = 0;
for (let c = 1; c < last.length; c++) {
  if (last[c] > last[best]) best = c;
}
console.log("Predicted gas:", ["Air", "Acetone", "HCl"][best]);
```
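The `snapshotWeights` / `restoreWeights` helpers used above are left to the reader. One minimal way to implement them, assuming the graph's learnable parameters can be gathered into an array of plain numeric arrays (the actual SpikyPanda graph layout may expose them differently; adapt the accessor to the real API):

```javascript
// Deep-copy each parameter array so later training steps cannot mutate the
// snapshot; restore writes the saved values back in place.
function snapshotWeights(params) {
  // params: array of numeric arrays (one per weight matrix / bias vector)
  return params.map((p) => p.slice());
}

function restoreWeights(params, snapshot) {
  snapshot.forEach((saved, i) => {
    for (let j = 0; j < saved.length; j++) params[i][j] = saved[j];
  });
}
```

Copying (`slice`) rather than keeping references is the whole point: a reference snapshot would silently track the weights as they keep training.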