Cloud-Gap Filling in Satellite Imagery: How Temporal Compositing Works

Cloud cover is not a minor inconvenience for satellite-based crop monitoring — it is the fundamental operational challenge of the entire data pipeline. The US Midwest averages 60-70% cloud cover during June and July, which maps almost perfectly onto the vegetative and early reproductive period when crop stress signals matter most. If your yield monitoring architecture depends on cloud-free optical imagery, you have a product that fails exactly when it's most needed. Temporal compositing across multi-day satellite passes is the standard approach to solving this problem — here is how it actually works, where it performs well, and where it still leaves gaps.

Why Cloud Masking Alone Isn't Enough

The first step in any satellite imagery processing pipeline for agriculture is cloud masking — identifying and excluding cloud-contaminated pixels from analysis. Sentinel-2 products include a Scene Classification Layer (SCL) that flags cloud cover, cloud shadows, water, and other non-surface conditions. Landsat's CFMask algorithm serves a similar function. For clear days, these masks work well. The problem is what happens after masking.

Mask out cloudy pixels from a July 15th acquisition and you're left with either a partial image (cloud-free areas only), or no image at all for heavily overcast days. If you're trying to deliver a field-level NDVI estimate for a specific field polygon on July 15th — a date that falls within a 5-day update window — and the last three acquisitions for that location were 100% cloud-covered, you have a data gap that cloud masking cannot resolve on its own. This is where temporal compositing comes in.
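The masking step, and the gap it leaves behind, can be sketched in a few lines of plain Python. The SCL class codes are the standard Sentinel-2 values; which classes to exclude, the 50% clear-pixel threshold, and the six-pixel field are assumptions made for illustration only.

```python
# Sentinel-2 SCL class codes treated here as unusable for vegetation analysis.
# Which classes to exclude is an assumption; pipelines differ on this choice.
SCL_EXCLUDE = {
    0,   # no data
    1,   # saturated or defective
    3,   # cloud shadow
    8,   # cloud, medium probability
    9,   # cloud, high probability
    10,  # thin cirrus
    11,  # snow or ice
}

def clear_fraction(scl):
    """Fraction of pixels in the field polygon that survive masking."""
    clear = sum(1 for c in scl if c not in SCL_EXCLUDE)
    return clear / len(scl) if scl else 0.0

def field_mean_ndvi(ndvi, scl, min_clear_fraction=0.5):
    """Field-level mean NDVI, or None when too few clear pixels remain."""
    if clear_fraction(scl) < min_clear_fraction:
        return None  # the data gap that masking alone cannot resolve
    kept = [v for v, c in zip(ndvi, scl) if c not in SCL_EXCLUDE]
    return sum(kept) / len(kept)

# Hypothetical six-pixel field on a partly cloudy date and an overcast date.
partly_cloudy = field_mean_ndvi(
    [0.80, 0.82, 0.10, 0.78, 0.05, 0.81], [4, 4, 9, 4, 8, 4]
)
overcast = field_mean_ndvi(
    [0.10, 0.05, 0.08, 0.12, 0.07, 0.09], [9, 9, 9, 8, 9, 10]
)
print(partly_cloudy, overcast)
```

The partly cloudy date still yields a field mean from the four clear pixels, while the overcast date returns nothing at all.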

The Basic Logic of Temporal Compositing

Temporal compositing uses multiple satellite acquisitions across a defined time window, typically 5-20 days, to build a synthetic clear-sky observation by selecting or blending the best available pixel values across that window. The simplest form is maximum-value compositing: for each pixel position, take the highest NDVI value observed across all acquisitions in the window. This works because clouds, cloud shadows, and atmospheric haze all tend to depress NDVI, so the maximum value over a short window is more likely to represent the actual vegetation state than a contaminated lower value.
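As a minimal sketch of maximum-value compositing, the function below takes a stack of acquisitions over the window and keeps the highest NDVI per pixel. The list-of-lists layout, the use of None for masked pixels, and the sample values are illustrative assumptions, not a description of any particular product.

```python
def max_value_composite(stack):
    """Per-pixel maximum NDVI across a stack of acquisitions.

    stack: list of acquisitions, each a list of per-pixel NDVI values,
    with None marking pixels removed by the cloud mask.
    """
    n_pixels = len(stack[0])
    composite = []
    for p in range(n_pixels):
        values = [acq[p] for acq in stack if acq[p] is not None]
        # A pixel masked on every date stays a gap in the composite.
        composite.append(max(values) if values else None)
    return composite

# Three hypothetical acquisitions over four pixels. Low values such as 0.12
# stand in for thin cloud or shadow that the mask failed to flag.
stack = [
    [0.78, 0.12, None, None],
    [0.80, 0.79, 0.15, None],
    [0.20, 0.81, 0.77, None],
]
print(max_value_composite(stack))
```

The fourth pixel shows the limit of the method: if no acquisition in the window is clear, the composite has nothing to select.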

Maximum-value compositing has a well-known flaw: it selects anomalously high values, not necessarily the most representative ones. A specular reflection artifact or atmospheric forward-scatter event that briefly inflates NDVI for a single pixel on a single date will persist in the maximum composite. In agricultural applications where a 0.05 NDVI difference can represent meaningful stress, this introduces systematic positive bias in the composite.

More sophisticated approaches use weighted compositing, which assigns per-pixel weights based on observation quality, time distance from the center of the compositing window, and angular geometry, to produce a temporally smoothed estimate that is less sensitive to outliers. Savitzky-Golay filtering, along with the other curve-fitting methods in the TIMESAT software package, is widely used in agricultural remote sensing to smooth NDVI time-series and reconstruct cloud-affected periods. These methods fit a smoothing function to the available clear-sky observations, interpolating across cloud-gap periods based on the expected seasonal trajectory of vegetation development.
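The smoothing idea can be illustrated with a deliberately small sketch: fill cloud gaps by linear interpolation, then apply a 5-point quadratic Savitzky-Golay filter using its standard convolution coefficients. It assumes an evenly spaced series (for example one value per 5-day step) and is far simpler than what a package such as TIMESAT does; the sample series is hypothetical.

```python
# Standard coefficients for a 5-point, quadratic Savitzky-Golay smoother.
SG_COEFFS = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)

def fill_gaps_linear(series):
    """Linearly interpolate None entries between the nearest clear values."""
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        raise ValueError("no clear-sky observations to interpolate from")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = series[right]   # leading gap: hold first clear value
        elif right is None:
            filled[i] = series[left]    # trailing gap: hold last clear value
        else:
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled

def savgol_5pt(series):
    """Smooth interior samples; the two samples at each end are left as-is."""
    out = list(series)
    for i in range(2, len(series) - 2):
        out[i] = sum(c * series[i + k - 2] for k, c in enumerate(SG_COEFFS))
    return out

# Hypothetical NDVI series at a regular time step, with a two-step cloud gap.
ndvi = [0.30, 0.42, None, None, 0.71, 0.78, 0.80]
print(savgol_5pt(fill_gaps_linear(ndvi)))
```

Because the filter reproduces any locally quadratic trend exactly, it removes noise without flattening the green-up curve the way a plain moving average would.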

Multi-Day Pass Stacking: How the Compositing Window Is Designed

For a 5-day update cadence, the compositing window typically extends backward from the update date by 10-15 days to accumulate enough clear-sky observations to produce a reliable composite. A Sentinel-2 acquisition every 5 days (combined A+B constellation) over a 15-day window gives a maximum of 3 possible acquisitions per pixel. In practice, after cloud masking, 1.2-1.8 clear observations per pixel per 15-day window is a realistic expectation for central Illinois in July.
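The arithmetic behind these numbers is short enough to write out. The sketch below assumes that each acquisition is clear independently with the same probability, which is a simplification: real cloud cover is correlated from one pass to the next, so actual gaps cluster more than this suggests.

```python
def max_acquisitions(window_days, revisit_days):
    """Upper bound on acquisitions per pixel inside the window."""
    return window_days // revisit_days

def expected_clear(n_acquisitions, p_clear):
    """Expected number of clear observations per pixel."""
    return n_acquisitions * p_clear

def prob_no_clear(n_acquisitions, p_clear):
    """Chance that every acquisition in the window is cloudy,
    assuming independence between passes (a simplifying assumption)."""
    return (1 - p_clear) ** n_acquisitions

n = max_acquisitions(window_days=15, revisit_days=5)
# Per-pass clear probabilities implied by 1.2-1.8 clear observations out of 3.
for p in (0.4, 0.6):
    print(n, expected_clear(n, p), round(prob_no_clear(n, p), 3))
```

Under these assumptions a three-pass window comes up completely empty somewhere between roughly 6% and 22% of the time, which is why the window has to be long enough to catch more than one pass.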

The compositing weight function needs to account for temporal distance: an observation from 12 days ago should contribute less to a "current state" estimate than an observation from 2 days ago, even if both are cloud-free. A linear or Gaussian decay function for the temporal weight, combined with quality weights from the cloud mask confidence scores, produces composites that prioritize recent clear observations while using older observations to fill gaps.
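A minimal sketch of such a weight function follows. The function names, the 8-day half-weight point, the 15-day linear cutoff, and the sample observations are all assumptions chosen for illustration; an operational pipeline would tune them per sensor and region.

```python
def gaussian_weight(age_days, half_weight_days=8.0):
    """Gaussian decay: 1.0 at age 0, 0.5 at half_weight_days."""
    return 0.5 ** ((age_days / half_weight_days) ** 2)

def linear_weight(age_days, max_age_days=15.0):
    """Linear decay: 1.0 at age 0, 0.0 at max_age_days and beyond."""
    return max(0.0, 1.0 - age_days / max_age_days)

def weighted_composite(observations, temporal_weight=gaussian_weight):
    """Quality- and recency-weighted mean NDVI for one pixel.

    observations: list of (ndvi, age_days, quality), where quality in [0, 1]
    comes from the cloud mask confidence.
    Returns (composite_ndvi, total_weight); composite is None with no input.
    """
    num = den = 0.0
    for ndvi, age_days, quality in observations:
        w = temporal_weight(age_days) * quality
        num += w * ndvi
        den += w
    return (num / den, den) if den > 0 else (None, 0.0)

# Two clear observations: one 2 days old, one 12 days old.
obs = [(0.80, 2, 1.0), (0.60, 12, 0.9)]
print(weighted_composite(obs))
print(weighted_composite(obs, temporal_weight=linear_weight))
```

Returning the total weight alongside the composite is a design choice: it gives downstream code a simple measure of how much evidence sits behind each estimate.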

For a concrete example: a field in Story County, Iowa during a cloudy July 2024 had cloud-free acquisitions on July 2nd and July 19th, with every acquisition in between lost to cloud. A compositing window centered on July 10th that reaches at least 8 days back can use the July 2nd observation (8 days from center, weight ~0.5) as the primary input. The resulting NDVI estimate for July 10th is less certain than it would be with a same-day clear observation, but it is substantially better than treating the period as a complete data gap. The confidence interval around the composite estimate is propagated forward into the yield model uncertainty.
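The Story County numbers can be reproduced with a short sketch. The Gaussian decay with an 8-day half-weight point is an assumption picked so that the weight matches the ~0.5 quoted above, and the uncertainty inflation rule (dividing a baseline sigma by the square root of the total weight) and the 0.03 baseline are purely hypothetical stand-ins for whatever the yield model actually propagates.

```python
from datetime import date

def offset_days(observation, target):
    """Absolute distance in days between an observation and the target date."""
    return abs((observation - target).days)

def gaussian_weight(days, half_weight_days=8.0):
    """Gaussian decay: 1.0 at 0 days, 0.5 at half_weight_days (assumed)."""
    return 0.5 ** ((days / half_weight_days) ** 2)

def inflated_sigma(base_sigma, total_weight):
    """Hypothetical rule: uncertainty grows as the evidence weight shrinks."""
    return base_sigma / total_weight ** 0.5

target = date(2024, 7, 10)
july_2 = date(2024, 7, 2)

days = offset_days(july_2, target)
weight = gaussian_weight(days)
# base_sigma is an illustrative NDVI uncertainty for a same-day clear view.
sigma = inflated_sigma(base_sigma=0.03, total_weight=weight)
print(days, weight, round(sigma, 4))
```

The July 19th acquisition is left out because it is not yet available when the July 10th estimate is produced; it could only contribute in a later reprocessing run.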

Multi-Sensor Fusion: When Optical Compositing Still Isn't Enough

Temporal compositing across optical sensors addresses intermittent cloud cover — periods of 5-15 days with gaps in clear-sky observations. It does not address persistent cloud cover — the kind that blankets the Midwest for 3-4 consecutive weeks during La Niña-influenced summers. When cloud gaps exceed 20-25 days, optical compositing produces estimates with uncertainty too large for operational yield monitoring.
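One way to make that distinction operational is to measure the longest gap between clear observations and switch strategy past a threshold. This is a sketch under stated assumptions: the function names and strategy labels are hypothetical, and the 20-day default is simply the lower end of the 20-25 day range given above.

```python
def longest_gap_days(clear_days):
    """Longest run of days between consecutive clear observations.

    clear_days: day-of-year integers with a clear-sky observation.
    """
    days = sorted(clear_days)
    return max((b - a for a, b in zip(days, days[1:])), default=0)

def choose_strategy(gap_days, optical_limit_days=20):
    """Pick a gap-filling strategy from the gap length (threshold assumed)."""
    if gap_days <= optical_limit_days:
        return "optical_composite"
    return "sar_fusion"

# Hypothetical seasons: one with intermittent cloud, one with a long overcast.
intermittent = [172, 182, 187, 197, 202]
persistent = [172, 177, 205, 210]
for label, season in (("intermittent", intermittent), ("persistent", persistent)):
    gap = longest_gap_days(season)
    print(label, gap, choose_strategy(gap))
```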

The solution for extended cloud cover is sensor fusion with synthetic aperture radar (SAR). Sentinel-1 C-band SAR acquires imagery in all weather conditions, day and night, because radar signals penetrate cloud cover. SAR backscatter from agricultural canopies responds to canopy structure, biomass, and moisture — different physical properties than optical NDVI, but correlated with crop development stage in ways that can be exploited to constrain optical yield model uncertainty during cloud-gap periods.

In practical terms, SAR fusion works by training a joint model that learns the relationship between SAR backscatter time-series and optically-derived NDVI and crop development stage across clear-sky training periods. When optical data is unavailable due to persistent cloud cover, the SAR signal provides a weaker but non-zero constraint on crop state that narrows the uncertainty distribution in the yield model. It's not a substitute for optical NDVI — the predictive relationship is noisier and crop-type specific — but it is meaningfully better than discarding the time period entirely.
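A heavily simplified sketch of the idea: fit a straight line from a SAR feature to NDVI over clear-sky dates, then use the fit and its residual spread as a loose estimate when optical data is missing. Real fusion models are richer (time-series inputs, crop-type specific, often nonlinear); the single-feature linear fit, the feature choice, and every number below are assumptions for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, residual_std). Needs at least 3 points.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_std = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return slope, intercept, residual_std

def predict_ndvi(sar_value, model):
    """NDVI estimate and a one-sigma spread taken from the training residuals."""
    slope, intercept, residual_std = model
    return intercept + slope * sar_value, residual_std

# Hypothetical clear-sky training pairs for one crop type:
# a SAR feature in dB (for example a VH/VV cross-ratio) against optical NDVI.
sar_db = [-12.0, -10.0, -9.0, -8.0, -7.0]
ndvi = [0.35, 0.52, 0.60, 0.71, 0.78]

model = fit_line(sar_db, ndvi)
estimate, spread = predict_ndvi(-8.5, model)  # SAR reading during a cloud gap
print(round(estimate, 3), round(spread, 3))
```

The spread returned with each prediction is what lets the yield model treat a SAR-inferred value as a weaker constraint than a direct optical observation.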

We're not saying temporal compositing alone is sufficient for year-round crop monitoring across all geographies. It performs well for mid-latitude continental interiors with typical cloud frequency, and it fails during extended overcast periods that occur in some years. The multi-sensor fusion approach with SAR adds operational complexity and requires additional calibration, but it is the architecture needed for a data product that commits to 5-day updates regardless of weather conditions.

Communicating Compositing Quality to End Users

One operational aspect that data product design teams often underinvest in: communicating the quality of composited observations to downstream users. An actuary or commodity trader consuming a yield estimate from the API should know whether that estimate is backed by a clear-sky Sentinel-2 acquisition from yesterday or a temporal composite inferred from a two-week-old observation through a SAR-constrained model. The uncertainty is different, and the decisions informed by that estimate should reflect it.

The appropriate design is to include per-observation quality metadata in the API response: number of clear-sky acquisitions contributing to the composite, maximum temporal gap between observations in the compositing window, and the resulting confidence interval on the NDVI and yield estimate. A response that returns a P50 yield estimate alongside a ±4% confidence interval (recent clear observation) versus a ±14% confidence interval (extended cloud gap, SAR-inferred) gives the end user the information needed to make appropriate use of the data. It also demonstrates the kind of epistemic honesty about data quality that enterprise buyers in insurance and finance require.
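A response carrying that metadata might look like the sketch below. Every field name, the source labels, and the sample values (including the yield figures and units) are hypothetical; only the ±4% and ±14% intervals come from the text above.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CompositeEstimate:
    """Hypothetical per-field API payload with quality metadata."""
    field_id: str
    clear_acquisitions: int   # clear-sky scenes contributing to the composite
    max_gap_days: int         # longest gap between observations in the window
    source: str               # e.g. "optical" or "sar_inferred" (assumed labels)
    ndvi: float
    ndvi_ci_low: float
    ndvi_ci_high: float
    yield_p50: float          # illustrative units, e.g. bushels per acre
    yield_ci_pct: float       # half-width of the confidence interval, percent

def to_json(estimate):
    """Serialize the estimate so quality fields travel with the value."""
    return json.dumps(asdict(estimate), indent=2)

recent_clear = CompositeEstimate(
    field_id="demo-001", clear_acquisitions=2, max_gap_days=5,
    source="optical", ndvi=0.81, ndvi_ci_low=0.79, ndvi_ci_high=0.83,
    yield_p50=200.0, yield_ci_pct=4.0,
)
extended_gap = CompositeEstimate(
    field_id="demo-001", clear_acquisitions=0, max_gap_days=24,
    source="sar_inferred", ndvi=0.74, ndvi_ci_low=0.62, ndvi_ci_high=0.86,
    yield_p50=195.0, yield_ci_pct=14.0,
)
print(to_json(recent_clear))
print(to_json(extended_gap))
```

Keeping the interval and its provenance in the same object as the estimate means a consumer cannot read the P50 without also receiving the information needed to discount it.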