Astronomy's Next Survey Will Generate 10 Million Alerts Per Night and No System Can Classify Them in Time
The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will begin scanning the entire visible southern sky every three nights starting in 2025-2026, generating approximately 10 million transient alerts per night — objects that have changed in brightness or position, or appeared for the first time. Each alert must be classified (supernova, asteroid, variable star, instrumental artifact, etc.) and distributed to follow-up telescopes within 60 seconds of image acquisition. Current alert broker systems (ANTARES, Fink, ALeRCE, Lasair) have been tested on precursor surveys generating ~10,000-100,000 alerts/night and face three unsolved problems at LSST scale: (1) classification accuracy degrades when alert volume exceeds training-set diversity, (2) real-time cross-matching against catalogs with billions of objects introduces latency that exceeds the 60-second requirement, and (3) the science community lacks infrastructure to filter 10 million alerts/night down to the ~1,000 that warrant immediate follow-up for any given science case.
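The scale of the 60-second requirement can be made concrete with back-of-the-envelope arithmetic. The night length and per-visit alert count below are illustrative assumptions, not LSST specifications:

```python
import math

# Illustrative latency-budget arithmetic for the 60-second requirement.
ALERTS_PER_NIGHT = 10_000_000
NIGHT_SECONDS = 10 * 3600               # assumed ~10-hour observing night
sustained = ALERTS_PER_NIGHT / NIGHT_SECONDS   # average arrival rate, ~278 alerts/s

def workers_needed(alerts_per_visit: int, per_alert_s: float, budget_s: float = 60.0) -> int:
    """Parallel workers needed to clear one visit's alert burst within
    the latency budget, given end-to-end per-alert processing time."""
    return math.ceil(alerts_per_visit * per_alert_s / budget_s)

print(round(sustained))                # 278 alerts/s sustained
print(workers_needed(10_000, 0.05))    # 50 ms/alert  -> 9 parallel workers
print(workers_needed(10_000, 1.0))     # 1 s/alert    -> 167 workers
```

The point of the sketch is that the budget is dominated by per-alert processing cost: a pipeline spending one second per alert on classification plus cross-matching needs two orders of magnitude more parallelism than one spending tens of milliseconds.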
Time-domain astronomy — the study of objects that change on timescales from seconds to years — is among the fastest-growing fields in astrophysics. Kilonovae (neutron star mergers producing heavy elements), gravitational-wave electromagnetic counterparts, tidal disruption events, and interstellar objects all require rapid identification and spectroscopic follow-up within hours to days. A delay of even one night can mean missing the critical early evolution of a supernova or losing a near-Earth asteroid. LSST will discover more transient objects in its first year than all previous surveys combined, but this scientific bounty is only realized if the alert stream can be processed, classified, and prioritized in real time.
The Zwicky Transient Facility (ZTF), operating since 2018, generates ~100,000-300,000 alerts/night and serves as the primary testbed for LSST alert infrastructure. ZTF brokers use random forest and neural network classifiers trained on spectroscopically confirmed objects, but the training sets are biased toward bright, nearby, common transient types — they systematically misclassify rare events, which are often the most scientifically interesting. Cross-matching ZTF alerts against Gaia, Pan-STARRS, and 2MASS catalogs takes milliseconds per alert, but LSST alerts must be cross-matched against the LSST source catalog itself (~37 billion detections), and the indexing and query infrastructure for sub-second cross-matching at this scale is unproven. Alert filtering (deciding which alerts a given science team should see) currently relies on simple cuts (sky region, brightness range, color), which cannot express the complex, multi-parameter selection criteria needed for scientifically motivated filtering. Community alert filtering services like the LSST Science Platform's alert filtering system are being designed but have not been tested at full volume.
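The cross-matching bottleneck comes down to spatial indexing. Production brokers use HEALPix or database-native spherical indexes; the toy sketch below (hypothetical data, a crude equal-angle grid rather than HEALPix, flat-sky small-angle distances) only illustrates the bin-then-probe pattern that makes per-alert lookup cost independent of catalog size:

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # grid cell size; must exceed the match radius

def cell(ra: float, dec: float) -> tuple[int, int]:
    """Map sky coordinates (degrees) to a coarse equal-angle grid cell."""
    return (math.floor(ra / CELL_DEG), math.floor(dec / CELL_DEG))

def build_index(catalog):
    """Bucket catalog sources by cell once (O(N)); probes are then O(1)."""
    index = defaultdict(list)
    for src_id, ra, dec in catalog:
        index[cell(ra, dec)].append((src_id, ra, dec))
    return index

def crossmatch(index, ra, dec, radius_deg=0.0005):  # ~2 arcsec
    """Return catalog sources within radius of an alert position,
    checking only the 3x3 block of cells around the alert."""
    cx, cy = cell(ra, dec)
    matches = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for src_id, sra, sdec in index.get((cx + dx, cy + dy), ()):
                # small-angle, flat-sky approximation near the alert
                d = math.hypot((sra - ra) * math.cos(math.radians(dec)), sdec - dec)
                if d <= radius_deg:
                    matches.append(src_id)
    return matches

# tiny hypothetical catalog
catalog = [("star_a", 150.0001, -30.0001), ("star_b", 150.2, -30.2)]
idx = build_index(catalog)
print(crossmatch(idx, 150.0, -30.0))  # ['star_a']
```

A real system replaces the grid with an equal-area HEALPix tessellation and the in-memory dict with a distributed key-value or columnar store, but the access pattern — hash the position, probe a bounded neighborhood — is the same.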
- Scalable, GPU-accelerated classification pipelines that can process 10 million alerts through multi-class probabilistic classifiers in under 60 seconds.
- Self-supervised or few-shot learning approaches that can identify anomalous/rare transients without requiring large labeled training sets for every class.
- Distributed database architectures with sub-millisecond spatial cross-matching against billion-row catalogs.
- Community-facing alert filtering interfaces that allow scientists to define complex selection functions (combining photometric, astrometric, and contextual features) without requiring database programming expertise.
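One way to let scientists express multi-parameter selection functions without writing database code is a small composable-predicate layer that compiles down to broker-side queries. The sketch below uses hypothetical field names and is not a real broker API; it only shows the composition idea:

```python
from dataclasses import dataclass
from typing import Callable

# An alert filter is a predicate over an alert dict; filters compose with
# & and | so science teams can build complex cuts declaratively.
@dataclass(frozen=True)
class Filter:
    test: Callable[[dict], bool]

    def __and__(self, other: "Filter") -> "Filter":
        return Filter(lambda a: self.test(a) and other.test(a))

    def __or__(self, other: "Filter") -> "Filter":
        return Filter(lambda a: self.test(a) or other.test(a))

def field_between(name: str, lo: float, hi: float) -> Filter:
    """Photometric/astrometric range cut; missing fields fail the cut."""
    return Filter(lambda a: lo <= a.get(name, float("nan")) <= hi)

def prob_above(cls: str, p: float) -> Filter:
    """Cut on a classifier's per-class probability."""
    return Filter(lambda a: a.get("probs", {}).get(cls, 0.0) >= p)

# Hypothetical kilonova-candidate cut: bright, red, high classifier score.
kn_filter = (field_between("mag", 17.0, 21.5)
             & field_between("g_r_color", 0.5, 3.0)
             & prob_above("kilonova", 0.8))

alert = {"mag": 20.1, "g_r_color": 1.2, "probs": {"kilonova": 0.9}}
print(kn_filter.test(alert))  # True
```

The design choice worth noting is that filters stay data, not free-form code: a broker can inspect, optimize, and rate-limit a predicate tree in a way it cannot with arbitrary user SQL.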
A student team could build an alert classification pipeline using public ZTF alert streams (available via the ZTF Alert Archive), implementing and benchmarking different ML classifiers (random forest, CNN, transformer) on the task of early-time transient classification (classification from the first 1-3 detections only, when follow-up decisions must be made). Alternatively, a team could develop an anomaly detection system designed to flag alerts that don't fit any known class, using autoencoders or isolation forests on ZTF photometric features. Relevant disciplines: machine learning, database engineering, distributed systems, astronomy.
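For the anomaly-detection project, a useful baseline before autoencoders or isolation forests is a k-nearest-neighbor distance score: alerts far from every known transient in feature space score high. This is a deliberately simpler stand-in for the methods named above, shown here on made-up photometric features:

```python
import numpy as np

def knn_anomaly_scores(train: np.ndarray, query: np.ndarray, k: int = 3) -> np.ndarray:
    """Score each query row by its mean distance to its k nearest
    training rows; large scores flag alerts unlike any known class."""
    # pairwise Euclidean distances, shape (n_query, n_train)
    d = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)

rng = np.random.default_rng(0)
# made-up "known transient" feature vectors (e.g. rise time, color, amplitude)
train = rng.normal(0.0, 1.0, size=(500, 3))
# two queries: one typical, one far outside the training distribution
query = np.array([[0.1, -0.2, 0.0], [8.0, 8.0, 8.0]])
scores = knn_anomaly_scores(train, query)
print(scores[1] > scores[0])  # the outlier scores higher -> True
```

On real ZTF data the feature vectors would come from light-curve statistics, and the brute-force distance matrix would be replaced by an approximate-nearest-neighbor index, but the scoring logic is the same.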
- The `temporal:window` tag reflects that LSST first light is imminent (2025-2026). Alert infrastructure must be operational before the survey begins; retroactive processing misses the time-critical science.
- The `failure:unrepresentative-data` tag captures the core ML challenge: training sets from smaller surveys are biased toward bright/common transients and cannot represent the diversity of objects LSST will discover.
- The `failure:not-attempted` tag reflects that no system has ever processed 10 million astronomical alerts/night. The scale is qualitatively different from precursor surveys.
- Cross-domain connection: shares the data-scaling structure with digital-hllhc-collision-data-processing (both involve processing pipelines built for current-generation instruments that cannot scale to next-generation data rates) and the unrepresentative-training-data structure with ocean-dl-extreme-event-failure (models that work on common cases but fail on rare/extreme events).
- Public ZTF alert data, ZTF light curves, and Transient Name Server classifications provide excellent real-world datasets for student projects.
"Pathways to Discovery in Astronomy and Astrophysics for the 2020s" (Astro2020 Decadal Survey), National Academies of Sciences, Engineering, and Medicine, 2021. https://doi.org/10.17226/26141, accessed 2026-02-16. Also: Vera C. Rubin Observatory LSST Science Book; LSST Alert Distribution white papers; Ivezić et al., ApJ 2019.