C02
The model was trained on data that doesn't represent the conditions where it matters most — and nobody checked whose experience was missing from the training set.
Data-Constrained Predictive Models Failing at Extremes
26 problems across 15 domains · v3: 10 → v4: 18 → v5: 22 → v6: 25
Shared Structural DNA
These ~25 models — predicting ocean storms, building energy use, tornado damage, permafrost thaw, and bioreactor behavior — all fail at the same point: they were trained on data that does not represent the conditions they are deployed under. Five sub-patterns produce the same failure through different mechanisms: events too rare to observe, variables too hidden to measure, databases too fragmented to combine, data too abundant to process, and training data that systematically excludes the populations the model claims to serve. The last sub-pattern, representativeness, reframes 'data-constrained' from a technical limitation to a structural one: the gap is not only missing measurements but missing people.
Models trained on common conditions fail on rare/extreme events — compound hazards, tornadoes, wildfires
Critical variables never measured in training data — carbon decomposition, organism adaptation, biological carbon dynamics
Needed data exists but is siloed across organizations, disciplines, or ecosystems
Data exists but volume overwhelms processing capacity (HL-LHC collisions)
Training data systematically excludes populations or conditions the model claims to serve — dialect bias, equity blindness, climate mismatch
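The first sub-pattern above — models trained on common conditions failing on rare or extreme events — can be made concrete with a minimal out-of-support check: flag deployment inputs that fall outside the range the training data ever covered. This is an illustrative sketch with synthetic numbers, not a method from the member problems; a real check would use multivariate density estimation or conformal prediction rather than a per-variable min/max.

```python
def out_of_support(train_values, deploy_values, margin=0.0):
    """Return deployment values outside the training min/max (+/- margin).

    Any prediction for a flagged value is an extrapolation: the model
    has never seen conditions like it and its error is unquantified.
    """
    lo, hi = min(train_values), max(train_values)
    return [v for v in deploy_values if v < lo - margin or v > hi + margin]

# Hypothetical example: training covers only common wind speeds (5-40 m/s),
# while deployment includes one extreme event the model never observed.
train = [5, 12, 18, 25, 33, 40]
deploy = [10, 22, 38, 67]

print(out_of_support(train, deploy))  # → [67]
```

A check like this catches only sub-pattern 1 (rare extremes); the other four failure modes — unmeasured variables, siloed data, overwhelming volume, and excluded populations — are invisible to it precisely because the relevant data never enters the pipeline at all.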
Member Problems
Domain Spread