Why does covariance grow during GPS dropout periods?

During dropouts only the prediction step runs, so P accumulates process noise Q each tick without any measurement correction. This is correct behaviour — the growing uncertainty reflects the filter's honest acknowledgement that it is dead-reckoning.
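
A minimal sketch of that growth, assuming a 1D constant-velocity state [position, velocity] and illustrative noise values. `predict_covariance` is a hypothetical helper written for this example, not a library function:

```python
import math

def predict_covariance(p_pos, p_cross, p_vel, dt, q_accel):
    """One predict-only step of P = F P F^T + Q for a 1D state [position, velocity].

    F = [[1, dt], [0, 1]] and Q is the discrete white-noise acceleration model.
    Returns the new (p_pos, p_cross, p_vel) covariance entries.
    """
    # F P F^T expanded by hand for the 2x2 case
    new_pos = p_pos + 2.0 * dt * p_cross + dt * dt * p_vel
    new_cross = p_cross + dt * p_vel
    new_vel = p_vel
    # Add process noise scaled by the acceleration variance q_accel (m^2/s^4)
    new_pos += q_accel * dt**4 / 4.0
    new_cross += q_accel * dt**3 / 2.0
    new_vel += q_accel * dt**2
    return new_pos, new_cross, new_vel

# Assumed starting point: 5 m position std-dev, 1 m/s velocity std-dev
p = (25.0, 0.0, 1.0)
position_std = [math.sqrt(p[0])]
for _ in range(5):  # five 1-second ticks with no GPS fix
    p = predict_covariance(*p, dt=1.0, q_accel=0.5)
    position_std.append(math.sqrt(p[0]))

print([round(s, 2) for s in position_std])
```

Each predict-only tick widens the position standard deviation, and the widening speeds up because velocity uncertainty feeds into position.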

How do I choose Q and R for urban fleet telemetry?

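Derive R from the receiver's reported CEP50 or HDOP, for example R ≈ (HDOP × 5 m)², where 5 m is an assumed baseline range error for consumer GNSS. For Q, start at 0.1–1.0 m²/s⁴ and tune empirically against held-out pings on known routes. Urban stop-and-go traffic accelerates and brakes more often than highway cruise, so it usually needs a higher Q.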
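
As a rough illustration of the HDOP rule of thumb, the sketch below computes a per-fix measurement variance. The function name and the 5 m default are assumptions for this example; substitute the range error your receiver actually documents:

```python
def measurement_variance_from_hdop(hdop, range_error_m=5.0):
    """Approximate horizontal measurement variance R (m^2) from a reported HDOP.

    range_error_m is an assumed baseline range error; 5 m is only a starting
    point for consumer GNSS, not a measured value.
    """
    if hdop <= 0:
        raise ValueError("HDOP must be positive.")
    return (hdop * range_error_m) ** 2

# A clean open-sky fix versus a degraded street-canyon fix
print(measurement_variance_from_hdop(1.0))
print(measurement_variance_from_hdop(3.0))
```

If the receiver reports HDOP with every ping, a per-fix R like this lets the filter trust poor-geometry fixes less instead of using one constant for the whole trip.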

Can I use the Kalman filter on raw WGS84 coordinates?

Technically yes, but Q and R units become meaningless degrees² rather than metres². Always project to a metric CRS before filtering, then reproject the output back to WGS84 for storage.
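
One way to pick a metric CRS is to derive the UTM zone from the data itself. The sketch below maps a longitude and latitude to the matching WGS84 / UTM EPSG code. `utm_epsg_for` is a hypothetical helper, and it deliberately ignores the special-case zones around Norway and Svalbard:

```python
def utm_epsg_for(lon, lat):
    """Return the EPSG code of the WGS84 / UTM zone containing (lon, lat).

    Uses the standard 6-degree zone grid: EPSG:326xx for the northern
    hemisphere and EPSG:327xx for the southern hemisphere.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError("Longitude must be within [-180, 180].")
    if not -80.0 <= lat <= 84.0:
        raise ValueError("UTM is only defined between 80°S and 84°N.")
    zone = min(int((lon + 180.0) // 6.0) + 1, 60)  # lon == 180 folds into zone 60
    base = 32600 if lat >= 0 else 32700
    return base + zone

print(utm_epsg_for(-0.1276, 51.5072))   # central London
print(utm_epsg_for(151.2093, -33.8688)) # Sydney
```

The resulting code can then be passed as the target CRS to the projection library, for example as `f"EPSG:{code}"`.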

When should I upgrade from constant-velocity to CTRV?

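When your trajectories include frequent sharp turns such as roundabouts, motorway on-ramps, or multi-storey car parks. The Constant Turn Rate and Velocity (CTRV) model adds a yaw-rate state that reduces drift during cornering. Because the CTRV motion model is nonlinear, it needs an extended or unscented Kalman filter rather than a linear one.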
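
For orientation, here is a sketch of the CTRV motion model's prediction step alone, assuming the common state layout [x, y, speed, yaw, yaw rate]. It shows only the state propagation; a full filter would also need the Jacobian (EKF) or sigma points (UKF). The function name is illustrative:

```python
import math

def ctrv_predict(x, y, v, yaw, yaw_rate, dt):
    """Propagate a CTRV state by dt seconds.

    x, y in metres, v in m/s, yaw in radians, yaw_rate in rad/s.
    Speed and yaw rate are held constant over the step.
    """
    new_yaw = yaw + yaw_rate * dt
    if abs(yaw_rate) > 1e-6:
        # Vehicle moves along a circular arc of radius v / yaw_rate
        radius = v / yaw_rate
        x_new = x + radius * (math.sin(new_yaw) - math.sin(yaw))
        y_new = y + radius * (math.cos(yaw) - math.cos(new_yaw))
    else:
        # Near-zero turn rate: fall back to straight-line motion
        x_new = x + v * math.cos(yaw) * dt
        y_new = y + v * math.sin(yaw) * dt
    return x_new, y_new, v, new_yaw, yaw_rate

# Quarter turn to the left at 10 m/s over one second
print(ctrv_predict(0.0, 0.0, 10.0, 0.0, math.pi / 2, 1.0))
```

The straight-line fallback avoids dividing by a yaw rate close to zero, which is the usual numerical hazard in this model.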

Interpolating Missing GPS Points with Kalman Filters

A Kalman filter fills GPS dropouts by replacing naive straight-line assumptions with a recursive Bayesian estimator that models vehicle kinematics and sensor uncertainty. During signal loss only the prediction step runs, advancing the trajectory while inflating positional covariance; when a valid ping returns, the update step fuses prediction and measurement weighted by their relative confidence. The result is a temporally continuous trace with an explicit uncertainty column that downstream analytics can consume directly. It is optimal in the minimum-variance sense only when the linear motion model and Gaussian noise assumptions hold.
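
The "weighted by their relative confidence" step is easiest to see in one dimension. The sketch below is a scalar version of the update equations with made-up variances; `scalar_update` is a hypothetical helper for illustration only:

```python
def scalar_update(x_pred, p_pred, z, r):
    """Fuse a predicted value with a measurement for a single scalar state.

    x_pred, p_pred: predicted state and its variance
    z, r: measurement and its variance
    Returns (x_new, p_new, gain).
    """
    gain = p_pred / (p_pred + r)          # 0 = trust prediction, 1 = trust measurement
    x_new = x_pred + gain * (z - x_pred)  # move toward the measurement by `gain`
    p_new = (1.0 - gain) * p_pred         # fused variance is below both inputs
    return x_new, p_new, gain

# After a long dropout the prediction is uncertain, so the fix dominates
print(scalar_update(x_pred=0.0, p_pred=100.0, z=10.0, r=25.0))

# Right after a correction the prediction is confident, so the fix barely moves it
print(scalar_update(x_pred=0.0, p_pred=4.0, z=10.0, r=25.0))
```

This is why the first ping after a tunnel pulls the track sharply back toward the measurement, while pings during steady reception only nudge it.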

Why This Happens

Raw telemetry streams in urban mobility pipelines rarely arrive at fixed intervals. Multipath reflections in dense street canyons, tunnel outages, and aggressive device power-saving modes create irregular gaps that break standard resampling routines. These dropouts are the core problem addressed by Gap Filling in Sparse Trajectories, which sits at the foundation of Temporal Aggregation & Window Mapping.

Geometric methods such as linear interpolation and cubic splines know nothing about momentum. They connect the last known point to the next observed point along a fixed line or smooth curve, with no indication of how trustworthy that path is, and produce unrealistic artefacts across even 10-second gaps. A state-space model respects physics: the filter's motion model encodes that position changes with velocity and that velocity changes slowly, so predictions carry the last estimated velocity forward and report growing uncertainty instead of presenting a guessed path as fact.

The diagram below shows the predict-update cycle across a 3-ping dropout.

Core Mitigation Pipeline

Project to a metric CRS: reproject WGS84 coordinates to a local UTM zone (or to EPSG:3857 at low latitudes, since Web Mercator stretches distances by roughly 1/cos(latitude)) so that state velocities carry metre-per-second units rather than meaningless decimal-degree deltas.
Build the constant-velocity state-space model: define the state vector, transition matrix parameterised by elapsed Δt, observation matrix, and noise covariances Q and R.
Run the predict-update loop: for every timestamp, always execute the prediction step; execute the update step only when a non-null measurement is present.
Attach uncertainty metadata and reproject: store the combined positional standard deviation per row, then reproject output back to WGS84 for storage or downstream joins.
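
The choice of CRS in the first step affects the units of every later step. As a rough check, the sketch below estimates how much EPSG:3857 stretches ground distances at a given latitude, using the spherical Mercator scale factor 1/cos(latitude). The helper name is illustrative, and the formula is the spherical approximation rather than an exact ellipsoidal value:

```python
import math

def web_mercator_scale(lat_deg):
    """Approximate ratio of projected distance to ground distance in EPSG:3857."""
    if not -85.0 <= lat_deg <= 85.0:
        raise ValueError("Web Mercator is only used between roughly 85°S and 85°N.")
    return 1.0 / math.cos(math.radians(lat_deg))

for city, lat in [("Singapore", 1.35), ("Madrid", 40.42), ("Helsinki", 60.17)]:
    scale = web_mercator_scale(lat)
    # A true ground speed of 10 m/s appears as 10 * scale in projected units
    print(f"{city}: scale {scale:.2f}, 10 m/s appears as {10 * scale:.1f} m/s")
```

If the factor is noticeably above 1 for your fleet's operating area, Q, R, and the reported uncertainty are all inflated by it (variances by its square), which is the main argument for a local UTM zone.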

Production-Ready Python Implementation

The implementation below uses filterpy and projects through pyproj so that state units are metres and m/s. It handles irregular timestamps, skips the update step for missing measurements, returns one output row per input row (sorted by time, with the index reset), and raises informative errors on degenerate inputs.

PYTHON

import numpy as np
import pandas as pd
from filterpy.kalman import KalmanFilter
from pyproj import Transformer

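# NOTE: All distance/velocity arithmetic is performed in EPSG:3857 (nominal metres).
# Web Mercator stretches distances by roughly 1/cos(latitude), so substitute a
# local UTM zone away from the equator. Raw WGS84 degrees are never fed into the
# filter: doing so makes Q and R dimensionally meaningless and silently degrades
# accuracy.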

def interpolate_gps_kalman(
    df: pd.DataFrame,
    lat_col: str = "lat",
    lon_col: str = "lon",
    time_col: str = "timestamp",
    process_noise: float = 0.5,    # Q scale: m²/s⁴ — tune per transport mode
    measurement_noise: float = 25.0,  # R: m² — ~5 m CEP50 for consumer GNSS
) -> pd.DataFrame:
    """
    Fill missing GPS fixes in *df* using a 2D constant-velocity Kalman filter.

    Parameters
    ----------
    df : pd.DataFrame
        Must have columns *lat_col*, *lon_col*, *time_col*.
        Rows where lat/lon are NaN are treated as dropout ticks.
        Rows are sorted by *time_col* internally. At least two non-null
        fixes are required for initialisation.
    process_noise : float
        Scale factor for the process noise covariance Q (m²/s⁴).
        Higher values allow the filter to track sharper manoeuvres.
    measurement_noise : float
        GPS measurement variance in m² (R diagonal).
        Derive from receiver CEP50: R ≈ (CEP50)².

    Returns
    -------
    pd.DataFrame
        Time-sorted copy of *df* (index reset) in which *lat_col* and *lon_col*
        hold the filtered estimate on every row, plus two new columns:
        'is_interpolated' (bool) and 'kf_uncertainty_m' (float, metres std-dev).
    """
    if df.empty:
        raise ValueError("Input DataFrame is empty.")

    required = {lat_col, lon_col, time_col}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    df = df.copy().sort_values(time_col).reset_index(drop=True)

    # Identify the first non-null fix for initialisation
    valid_mask = df[lat_col].notna() & df[lon_col].notna()
    if valid_mask.sum() < 2:
        raise ValueError("Need at least 2 non-null GPS fixes to initialise the filter.")

    first_valid = df.loc[valid_mask].iloc[0]

    # --- Project to metric CRS (EPSG:3857) ---
    to_metric = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
    to_wgs84  = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)

    init_x, init_y = to_metric.transform(first_valid[lon_col], first_valid[lat_col])

    # --- Initialise 2D constant-velocity filter ---
    # State: [x, y, vx, vy]  (metres, m/s in EPSG:3857)
    kf = KalmanFilter(dim_x=4, dim_z=2)
    kf.x = np.array([init_x, init_y, 0.0, 0.0], dtype=float)
    kf.P = np.eye(4) * 50.0          # Initial variance: 50 m² (position), 50 (m/s)² (velocity)
    kf.H = np.array([[1, 0, 0, 0],   # Observe x position
                     [0, 1, 0, 0]])  # Observe y position
    kf.R = np.eye(2) * measurement_noise
    kf.Q = np.eye(4) * process_noise
    # F is set per-tick to handle variable Δt

    out_lats = df[lat_col].values.copy().astype(float)
    out_lons = df[lon_col].values.copy().astype(float)
    out_unc  = np.full(len(df), np.nan)
    is_interp = np.zeros(len(df), dtype=bool)

    timestamps = pd.to_datetime(df[time_col])

    for i in range(len(df)):
        if i == 0:
            # No prediction at the very first tick; just record init state
            out_x, out_y = kf.x[0], kf.x[1]
            out_unc[i] = float(np.sqrt(kf.P[0, 0] + kf.P[1, 1]))
            # If first row has no fix, mark interpolated
            if not valid_mask.iloc[i]:
                is_interp[i] = True
            lon_out, lat_out = to_wgs84.transform(out_x, out_y)
            out_lats[i], out_lons[i] = lat_out, lon_out
            continue

        dt = (timestamps.iloc[i] - timestamps.iloc[i - 1]).total_seconds()
        dt = max(dt, 0.1)  # Guard against duplicate timestamps

        # Time-varying transition matrix
        kf.F = np.array([
            [1, 0, dt,  0],
            [0, 1,  0, dt],
            [0, 0,  1,  0],
            [0, 0,  0,  1],
        ], dtype=float)

        # Process noise must grow with the gap length: discrete white-noise
        # acceleration model for the state order [x, y, vx, vy]. This replaces
        # the constant placeholder Q assigned at initialisation.
        q_pos, q_cross, q_vel = dt**4 / 4.0, dt**3 / 2.0, dt**2
        kf.Q = process_noise * np.array([
            [q_pos,   0.0,     q_cross, 0.0],
            [0.0,     q_pos,   0.0,     q_cross],
            [q_cross, 0.0,     q_vel,   0.0],
            [0.0,     q_cross, 0.0,     q_vel],
        ])

        # Prediction step — always runs
        kf.predict()

        # Update step — only when a valid fix exists
        if valid_mask.iloc[i]:
            meas_x, meas_y = to_metric.transform(
                df[lon_col].iloc[i], df[lat_col].iloc[i]
            )
            kf.update(np.array([meas_x, meas_y]))
        else:
            is_interp[i] = True

        out_unc[i] = float(np.sqrt(kf.P[0, 0] + kf.P[1, 1]))
        lon_out, lat_out = to_wgs84.transform(kf.x[0], kf.x[1])
        out_lats[i], out_lons[i] = lat_out, lon_out

    df[lat_col]           = out_lats
    df[lon_col]           = out_lons
    df["is_interpolated"] = is_interp
    df["kf_uncertainty_m"] = out_unc

    return df

Validation Block

After running interpolate_gps_kalman, confirm the output before passing it downstream: