Optimizing Savitzky-Golay Filters for Arduino

Abstract

Savitzky-Golay filters are widely used for signal smoothing and derivative computation in embedded systems, particularly for sensor data processing. However, their standard implementation relies heavily on floating-point arithmetic, which is expensive on resource-constrained microcontrollers such as the ATmega-based Arduino boards. This technical note presents an optimized implementation that replaces float coefficients with int8_t values while keeping the numerical error small. The proposed method reduces coefficient storage by 75% (11 bytes instead of 44 for an 11-point filter) and performs all per-tap multiplications in integer arithmetic, leaving a single floating-point scaling step per output sample. Monte Carlo simulation shows median relative errors below 1.5% at every filter position.

Problem Statement

Embedded systems frequently require real-time signal processing for sensor data such as temperature readings, accelerometer outputs, or pressure measurements. Savitzky-Golay filters excel at computing smoothed derivatives while preserving signal features, making them ideal for detecting clean trends in sensor data.

However, the standard implementation poses significant challenges for microcontrollers:

memory overhead: float coefficients require 4 bytes each (44 bytes for an 11-point filter);
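computational cost: an 11-point filter needs 11 floating-point multiplications and additions per output sample, and ATmega microcontrollers have no floating-point unit, so each of these operations is emulated in software;
limited precision: on AVR-based Arduino boards both float and double are 32-bit IEEE 754 single-precision types with a 23-bit stored mantissa (about 7 significant decimal digits), so switching to double gives no extra accuracy.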
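For comparison, the float baseline can be sketched as below. The sketch assumes an 11-point window, a quadratic least-squares fit, unit sample spacing and evaluation at the window centre, for which the first-derivative Savitzky-Golay weights reduce to k/110 for k = -5, ..., 5 (110 is the sum of k² over the window). The names SG_DERIV_FLOAT and sg_derivative_float are hypothetical. The coefficient table occupies 44 bytes, and every output sample costs 11 float multiplications and additions.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical float baseline: 11-point first-derivative Savitzky-Golay
// row for a quadratic fit evaluated at the window centre.
// The weight for offset k (k = -5..5) is k / 110, where 110 = sum of k^2.
const int SG_WINDOW = 11;

const float SG_DERIV_FLOAT[SG_WINDOW] = {
    -5.0f / 110, -4.0f / 110, -3.0f / 110, -2.0f / 110, -1.0f / 110, 0.0f,
     1.0f / 110,  2.0f / 110,  3.0f / 110,  4.0f / 110,  5.0f / 110};

// Returns the smoothed first derivative (units per sample) of the window
// samples[0..10], ordered oldest to newest.
float sg_derivative_float(const uint8_t samples[SG_WINDOW]) {
    assert(samples != nullptr);
    float acc = 0.0f;
    for (int i = 0; i < SG_WINDOW; ++i) {
        // One float multiply and one float add per tap (software-emulated on AVR).
        acc += SG_DERIV_FLOAT[i] * samples[i];
    }
    return acc;
}
```

A linear ramp such as 7, 10, 13, ... should return a slope of about 3 units per sample, which is a quick sanity check for any derivative row.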

These limitations become critical when processing multiple sensor channels or implementing real-time control loops on memory-constrained devices.

Solution Overview

This optimization strategy turns the Savitzky-Golay computation from a float-intensive operation into an integer-dominant one. The key idea is to represent the coefficients as scaled integers and to keep the resulting rounding error under control through error analysis. The core approach uses mixed arithmetic: all multiplications and additions are performed in the integer domain, and the scale factor is applied once at the end. Coefficient storage drops from 44 bytes to 11 bytes per 11-point filter (75% savings); including the 4-byte float scale factor, a filter row needs 15 bytes, about 66% less than the float version.
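In symbols, as an illustration rather than the note's own notation, let c_i be the float coefficients of an N-point row, x_i the uint8_t input samples, and s > 0 a per-row scale factor chosen so that every rounded coefficient fits in int8_t. The sum of q_i x_i is computed exactly in integers, and the only floating-point step is the final multiplication by 1/s. The last line bounds the rounding error, using |s c_i - q_i| ≤ 1/2 and x_i ≥ 0.

```latex
\begin{aligned}
q_i &= \operatorname{round}(s\,c_i) \in [-128,\,127], \qquad i = 1,\dots,N \\
y' &= \sum_{i=1}^{N} c_i\,x_i \qquad \text{(float reference)} \\
\hat{y}' &= \frac{1}{s}\sum_{i=1}^{N} q_i\,x_i \qquad \text{(integer sum, one float multiply by } 1/s\text{)} \\
\left|y' - \hat{y}'\right| &= \frac{1}{s}\left|\sum_{i=1}^{N}\left(s\,c_i - q_i\right)x_i\right|
  \le \frac{1}{2s}\sum_{i=1}^{N} x_i \le \frac{255\,N}{2s}
\end{aligned}
```

For N = 11 and s ≈ 2794 (the scale that maps 5/110 to 127 in the centred derivative row), the worst-case bound is 11 × 255 / 5588 ≈ 0.50 units per sample; typical errors are much smaller because the rounding residues s c_i - q_i have mixed signs and partly cancel.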

R Implementation Analysis

The R implementation demonstrates the complete workflow for generating Savitzky-Golay coefficients suitable for embedded systems. The core quantize_to_int8() function uses a per-row scaling strategy: it identifies the coefficient with the largest absolute value in each filter row and scales the row so that this coefficient reaches the limit of the int8_t range [-128, +127], applying asymmetric quantization to use the full range. The Monte Carlo validation framework approximates Arduino conditions by generating random uint8_t sensor data and comparing float reference calculations against the quantized implementation.
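The R code itself is not reproduced in this section. As a rough C++ analogue, the sketch below quantizes one coefficient row and estimates the relative error by Monte Carlo. It assumes one particular reading of "asymmetric": the scale is the largest value for which the most positive coefficient still maps to at most +127 and the most negative to at least -128, so a row whose largest magnitude is negative can use -128. The names quantize_row, monte_carlo_error, ErrorStats and centre_row, the fixed seed, and the choice of the centred k/110 derivative row are all hypothetical; this is not a transcription of quantize_to_int8().

```cpp
#include <algorithm>
#include <assert.h>
#include <cmath>
#include <cstddef>
#include <random>
#include <stdint.h>
#include <vector>

// Hypothetical C++ analogue of a per-row int8_t quantizer (not the R code).
// s is the largest scale for which the most positive coefficient maps to at
// most +127 and the most negative one to at least -128.
double quantize_row(const std::vector<double>& c, std::vector<int8_t>& q) {
    double max_pos = 0.0;  // largest positive coefficient
    double max_neg = 0.0;  // magnitude of the most negative coefficient
    for (double v : c) {
        if (v > 0.0) max_pos = std::max(max_pos, v);
        if (v < 0.0) max_neg = std::max(max_neg, -v);
    }
    assert(max_pos > 0.0 || max_neg > 0.0);  // an all-zero row has no scale
    double s = HUGE_VAL;
    if (max_pos > 0.0) s = std::min(s, 127.0 / max_pos);
    if (max_neg > 0.0) s = std::min(s, 128.0 / max_neg);
    q.resize(c.size());
    for (std::size_t i = 0; i < c.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(s * c[i]));
    return s;  // firmware would store this (or 1/s) as its one float
}

struct ErrorStats { double q25, median, q75; };

// Monte Carlo comparison of the float row against its int8_t version on
// random uint8_t windows. Percentiles use simple nearest-rank indexing.
ErrorStats monte_carlo_error(const std::vector<double>& c, int trials, unsigned seed) {
    std::vector<int8_t> q;
    const double s = quantize_row(c, q);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<int> x(c.size());
    std::vector<double> rel;
    for (int t = 0; t < trials; ++t) {
        for (int& v : x) v = dist(rng);
        double ref = 0.0;  // float reference: sum of c_i * x_i
        int32_t acc = 0;   // integer path: sum of q_i * x_i
        for (std::size_t i = 0; i < c.size(); ++i) {
            ref += c[i] * x[i];
            acc += static_cast<int32_t>(q[i]) * x[i];
        }
        if (std::fabs(ref) < 1e-9) continue;  // relative error undefined at zero
        rel.push_back(std::fabs(acc / s - ref) / std::fabs(ref));
    }
    assert(!rel.empty());
    std::sort(rel.begin(), rel.end());
    const std::size_t n = rel.size();
    return {rel[n / 4], rel[n / 2], rel[(3 * n) / 4]};
}

// Centred 11-point first-derivative row for a quadratic fit: c_k = k / 110.
std::vector<double> centre_row() {
    std::vector<double> c;
    for (int k = -5; k <= 5; ++k) c.push_back(k / 110.0);
    return c;
}
```

Other filter positions can be checked the same way by passing their float coefficient rows in place of centre_row(); samples whose reference derivative is exactly zero are skipped because a relative error is undefined there.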

The error analysis shows that the optimization maintains good accuracy across all filter positions, with median relative errors below 1.5% at every position. The median is a suitable summary here because the relative error grows without bound whenever the true derivative is close to zero. The errors form a roughly symmetric pattern across positions, lowest at the centre and highest at the edges, which reflects the symmetry of the Savitzky-Golay kernel: edge positions exhibit slightly higher quantization errors because of their larger coefficient magnitudes. These results indicate that int8_t coefficients are accurate enough for typical sensor applications.
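To see why the edge rows carry larger coefficients, the rows can be generated directly from the least-squares definition. The sketch below is a hypothetical helper, not taken from the R code. It computes first-derivative weights for an 11-point window and polynomial order p, evaluated at offset t from the centre (t = -5 for the first position, 0 for the centre, +5 for the last), assuming unit sample spacing. It solves the normal equations (A^T A) z = g, where A is the Vandermonde matrix of offsets and g_j = j t^(j-1) is the derivative of x^j at t, and returns w = A z.

```cpp
#include <assert.h>
#include <math.h>

const int N_POINTS = 11;             // window length (assumed)
const int HALF = N_POINTS / 2;       // offsets run from -HALF to +HALF
const int MAX_ORDER = 4;             // largest polynomial order this sketch supports

// First-derivative Savitzky-Golay weights for a polynomial of order p fitted
// over offsets -HALF..HALF and evaluated at offset t. The smoothed derivative
// of a window y[0..N_POINTS-1] is then sum_i w[i] * y[i].
void sg_derivative_weights(int p, int t, double w[N_POINTS]) {
    assert(p >= 1 && p <= MAX_ORDER);
    const int m = p + 1;
    double M[MAX_ORDER + 1][MAX_ORDER + 2];  // augmented matrix [A^T A | g]
    for (int r = 0; r < m; ++r) {
        for (int c = 0; c < m; ++c) {
            double sum = 0.0;
            for (int x = -HALF; x <= HALF; ++x) sum += pow(static_cast<double>(x), r + c);
            M[r][c] = sum;
        }
        M[r][m] = (r == 0) ? 0.0 : r * pow(static_cast<double>(t), r - 1);  // g_r
    }
    for (int col = 0; col < m; ++col) {  // Gaussian elimination, partial pivoting
        int piv = col;
        for (int r = col + 1; r < m; ++r)
            if (fabs(M[r][col]) > fabs(M[piv][col])) piv = r;
        for (int c = 0; c <= m; ++c) {
            double tmp = M[col][c]; M[col][c] = M[piv][c]; M[piv][c] = tmp;
        }
        for (int r = col + 1; r < m; ++r) {
            double f = M[r][col] / M[col][col];
            for (int c = col; c <= m; ++c) M[r][c] -= f * M[col][c];
        }
    }
    double z[MAX_ORDER + 1];
    for (int r = m - 1; r >= 0; --r) {   // back substitution
        double sum = M[r][m];
        for (int c = r + 1; c < m; ++c) sum -= M[r][c] * z[c];
        z[r] = sum / M[r][r];
    }
    for (int i = 0; i < N_POINTS; ++i) { // w_i = sum_j x_i^j * z_j
        double x = i - HALF, xp = 1.0, acc = 0.0;
        for (int j = 0; j < m; ++j) { acc += xp * z[j]; xp *= x; }
        w[i] = acc;
    }
}
```

For p = 2 the centre row reduces to k/110, with a largest weight of 5/110 ≈ 0.045, while the row for t = +5 reaches 63/286 ≈ 0.22, nearly five times larger.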

Relative error analysis for int8_t quantized Savitzky-Golay coefficients across all 11 filter positions. Bars represent median relative error (%) based on 10,000 Monte Carlo simulations with random uint8_t input data (0-255 range). Error bars indicate the interquartile range (IQR, 25th-75th percentile). All positions maintain median errors below 1.5%, and the center position (P6) shows the lowest median error, approximately 0.3%.

Arduino-Optimized Filter Implementation

The Arduino implementation shows the practical benefits of the int8_t optimization on the target hardware. The core savitzky_golay_derivative() function illustrates the efficiency gains: it performs only integer multiplications in a tight loop, accumulating the results in a 32-bit signed integer to prevent overflow, followed by a single floating-point operation for the final scaling. The 32-bit accumulator is necessary because, although each int8_t × uint8_t product fits in 16 bits (at most 128 × 255 = 32,640 in magnitude), the sum of 11 such products can reach 359,040, far beyond the int16_t range. This design minimizes computational overhead while preserving numerical precision. The implementation provides pre-computed coefficients for selected filter positions, allowing developers to choose the row that best suits their application.
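A minimal sketch of such a routine follows. The signature, the coefficient values and the stored inverse scale are assumptions: the row shown is the centred k/110 derivative row quantized with s = 127 / (5/110) = 2794, and the stored float is 1/s so that the final step is a multiplication rather than a division. Each product is widened to int32_t before it is added, which keeps the accumulation 32-bit on AVR, where int is only 16 bits.

```cpp
#include <assert.h>
#include <stdint.h>

const uint8_t SG_N = 11;

// Compile-time check that 11 products of magnitude <= 128 * 255 fit in int32_t.
static_assert(SG_N * 128L * 255L <= 2147483647L, "accumulator cannot overflow");

// Hypothetical centred first-derivative row (quadratic fit), quantized as
// round(2794 * k / 110) for k = -5..5.
const int8_t SG_COEFFS[SG_N] = {-127, -102, -76, -51, -25, 0, 25, 51, 76, 102, 127};

// The single float stored per row: 1/s with s = 2794 (assumed).
const float SG_INV_SCALE = 1.0f / 2794.0f;

// samples[] is ordered oldest to newest; returns the derivative in units per sample.
float savitzky_golay_derivative(const uint8_t samples[SG_N],
                                const int8_t coeffs[SG_N],
                                float inv_scale) {
    assert(samples != nullptr && coeffs != nullptr);
    int32_t acc = 0;
    for (uint8_t i = 0; i < SG_N; ++i) {
        // Widen before multiplying so the sum is accumulated in 32 bits.
        acc += (int32_t)coeffs[i] * samples[i];
    }
    return acc * inv_scale;  // the only floating-point step: convert and scale once
}
```

A window whose older half reads 0 and newer half reads 255 already accumulates 381 × 255 = 97,155, which would overflow an int16_t accumulator; with the 32-bit accumulator it yields about 34.77, matching the float result 15 × 255 / 110.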

The circular buffer management in update_and_compute_derivative() addresses the practical problem of processing a continuous sensor stream: it handles buffer wraparound automatically and presents the samples to the filter in the correct order. Coefficient memory drops from 44 bytes for the float implementation to 15 bytes (11 int8_t coefficients plus one 4-byte float scale factor), which makes the approach practical on small 8-bit microcontrollers. The example temperature monitoring application illustrates practical deployment, including ADC reading, sensor value mapping, and derivative-based trend detection, which are common requirements in embedded sensor systems.
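The buffer handling and trend detection described above can be sketched as follows. Everything here is an assumption for illustration: the SgRing struct, the signature chosen for update_and_compute_derivative(), the hypothetical helpers adc_to_sample() and classify_trend(), the conversion of a 10-bit ADC reading to uint8_t by discarding its two least significant bits, and the trend threshold. Instead of copying the ring into a linear array, the loop reads it in oldest-to-newest order through an index offset, which avoids both the copy and a second 11-byte buffer.

```cpp
#include <assert.h>
#include <stdint.h>

const uint8_t SG_N = 11;
// Hypothetical centred derivative row, quantized as round(2794 * k / 110).
const int8_t SG_COEFFS[SG_N] = {-127, -102, -76, -51, -25, 0, 25, 51, 76, 102, 127};
const float SG_INV_SCALE = 1.0f / 2794.0f;

struct SgRing {
    uint8_t buf[SG_N];
    uint8_t head;   // next slot to overwrite; holds the oldest sample once full
    uint8_t count;  // number of valid samples, saturates at SG_N
};

// Pushes one sample. Once the window is full, writes the derivative
// (units per sample) to *out and returns true; returns false while filling.
bool update_and_compute_derivative(SgRing& r, uint8_t sample, float* out) {
    r.buf[r.head] = sample;        // overwrite the oldest slot
    r.head = (r.head + 1) % SG_N;  // head now points at the new oldest sample
    if (r.count < SG_N) ++r.count;
    if (r.count < SG_N) return false;
    int32_t acc = 0;
    for (uint8_t i = 0; i < SG_N; ++i) {
        uint8_t idx = (r.head + i) % SG_N;  // unwrap: oldest to newest
        acc += (int32_t)SG_COEFFS[i] * r.buf[idx];
    }
    *out = acc * SG_INV_SCALE;
    return true;
}

// Hypothetical mapping of a 10-bit ADC reading (0..1023) to 0..255.
uint8_t adc_to_sample(uint16_t raw) {
    assert(raw <= 1023);
    return (uint8_t)(raw >> 2);
}

// Hypothetical trend classifier: +1 rising, -1 falling, 0 steady.
int8_t classify_trend(float derivative, float threshold) {
    if (derivative > threshold) return 1;
    if (derivative < -threshold) return -1;
    return 0;
}
```

On a board, a loop would read the ADC (for example with analogRead()), pass the result through adc_to_sample(), and act on classify_trend() whenever update_and_compute_derivative() returns true. Note that a centred row estimates the slope at the middle of the window, about five samples in the past.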