Interpolating Missing GPS Points with Kalman Filters
Interpolating missing GPS points with Kalman filters replaces naive straight-line assumptions with a recursive Bayesian estimator that models vehicle kinematics and sensor uncertainty. Unlike linear or cubic splines, a Kalman filter predicts the next spatial state (position and velocity), then corrects it when a new measurement arrives. This predict-update cycle runs continuously: during signal dropouts, only the prediction step executes, advancing the trajectory while explicitly inflating positional covariance. When a valid ping returns, the update step fuses prediction and observation, weighted by the relative confidence in the motion model versus receiver accuracy. The result is a statistically optimal, temporally continuous trajectory ready for downstream mobility analytics.
Why Kalman Filters Outperform Geometric Interpolation
Raw telemetry streams in urban mobility pipelines rarely arrive at fixed intervals. Multipath reflections, tunnel outages, and aggressive device power-saving modes create irregular gaps that break standard Temporal Aggregation & Window Mapping routines. Addressing these discontinuities through robust Gap Filling in Sparse Trajectories requires a state-space model that respects movement physics rather than geometric convenience.
| Method | Assumption | Handles Noise? | Propagates Uncertainty? | Best For |
|---|---|---|---|---|
| Linear / Cubic Spline | Smooth, continuous path | ❌ No | ❌ No | Static mapping, visualization |
| Dead Reckoning | Constant heading/speed | ️ Partial | ❌ No | Short-term fallback |
| Kalman Filter | Noisy linear dynamics | ✅ Yes | ✅ Yes | Analytics, routing, ETA |
State-Space Configuration for 2D Mobility
A 2D constant-velocity Kalman filter represents the system using five core components:
- State vector (
x):[lat, lon, v_lat, v_lon]ᵀ - State transition matrix (
F): Propagates position using velocity over the elapsed timeΔt - Observation matrix (
H): Maps the full state to the measured dimensions (position only) - Process noise covariance (
Q): Models unobserved dynamics like acceleration, braking, or turning - Measurement noise covariance (
R): Represents GPS receiver variance (typically 4–25 m² for consumer-grade GNSS)
The filter executes two steps per timestamp. When data is present, it runs both prediction and correction. When a timestamp lacks a reading, it skips the correction step entirely. This asymmetry is what enables seamless interpolation without fabricating false confidence.
Recursive Update Cycle & Dropout Handling
The recursive update follows standard linear-Gaussian assumptions:
x̂ₖ⁻ = F x̂ₖ₋₁
Pₖ⁻ = F Pₖ₋₁ Fᵀ + Q
(If measurement zₖ exists)
Kₖ = Pₖ⁻ Hᵀ (H Pₖ⁻ Hᵀ + R)⁻¹
x̂ₖ = x̂ₖ⁻ + Kₖ (zₖ − H x̂ₖ⁻)
Pₖ = (I − Kₖ H) Pₖ⁻
During outages, zₖ is undefined. The algorithm bypasses the Kalman gain (Kₖ) and update equations, propagating only x̂ₖ⁻ and Pₖ⁻. Covariance P grows monotonically with Q, providing a mathematically sound uncertainty envelope that downstream routing or ETA models can consume.
Production-Ready Implementation
The following Python implementation uses filterpy for clarity while remaining fully compatible with standard scientific stacks. It handles irregular timestamps, skips missing measurements, and returns a dense trajectory aligned to the original index with explicit uncertainty bounds.
import numpy as np
from filterpy.kalman import KalmanFilter
from datetime import datetime
def interpolate_gps_kalman(
timestamps: list[datetime],
positions: list[tuple[float, float] | None],
process_noise: float = 0.5,
measurement_noise: float = 15.0
) -> dict[str, np.ndarray]:
"""
Interpolates missing GPS points using a 2D constant-velocity Kalman filter.
Handles irregular sampling and returns dense trajectories with uncertainty bounds.
"""
n = len(timestamps)
if n == 0:
return {"lat": np.array([]), "lon": np.array([]), "uncertainty": np.array([])}
# Initialize filter
kf = KalmanFilter(dim_x=4, dim_z=2)
kf.x = np.array([positions[0][0], positions[0][1], 0.0, 0.0]) # lat, lon, v_lat, v_lon
kf.P *= 10.0 # Initial state uncertainty
kf.R *= measurement_noise
kf.Q *= process_noise
kf.H = np.array([[1, 0, 0, 0],
[0, 1, 0, 0]]) # Map full state to position only
out_lat = np.empty(n)
out_lon = np.empty(n)
out_unc = np.empty(n)
for i in range(n):
if i > 0:
dt = (timestamps[i] - timestamps[i-1]).total_seconds()
dt = max(dt, 0.1) # Prevent degenerate matrices
# Update transition matrix for variable dt
kf.F = np.array([
[1, 0, dt, 0],
[0, 1, 0, dt],
[0, 0, 1, 0],
[0, 0, 0, 1]
])
# Prediction step (always runs)
kf.predict()
# Update step (only if measurement exists)
if positions[i] is not None:
kf.update(np.array([positions[i][0], positions[i][1]]))
# Store results
out_lat[i] = kf.x[0]
out_lon[i] = kf.x[1]
out_unc[i] = np.sqrt(kf.P[0, 0] + kf.P[1, 1]) # Combined positional std dev
return {"lat": out_lat, "lon": out_lon, "uncertainty": out_unc}
Tuning, Validation & Pipeline Integration
Tuning Q and R dictates the filter’s responsiveness. A low Q trusts the motion model heavily, smoothing out noise but lagging during sharp turns. A high Q tracks rapid maneuvers but amplifies measurement jitter. For urban mobility workloads, start with R derived from the receiver’s reported HDOP or CEP50 values, and scale Q empirically against known ground-truth routes.
When processing high-frequency logistics telemetry, consider upgrading from constant-velocity to a Constant Turn Rate and Velocity (CTRV) model. CTRV explicitly captures angular dynamics, reducing interpolation drift during highway merges or roundabouts. For a rigorous mathematical foundation on state estimation and noise covariance tuning, refer to the An Introduction to the Kalman Filter by Welch & Bishop, which remains the standard reference for linear-Gaussian systems.
Validate interpolated segments using root-mean-square error (RMSE) against held-out pings or by checking if uncertainty bounds remain within acceptable thresholds for your use case. In production pipelines, wrap the filter in a vectorized batch processor or compile it via numba to handle millions of device trajectories without Python loop overhead. The output uncertainty column should be fed directly into downstream confidence scoring, ensuring that analytics never treat a 30-second tunnel gap with the same weight as a 1-second high-accuracy ping.