The High-Luminosity LHC Will Produce 50× More Collision Data Than Current Computing Can Handle
The High-Luminosity Large Hadron Collider (HL-LHC), scheduled to begin operations at CERN in 2029, will increase the LHC's collision rate by a factor of 5-7, producing approximately 200 simultaneous proton-proton collisions per beam crossing (vs. ~40 today) and generating raw data at ~5 TB/second. After triggering and reconstruction, the experiments (ATLAS, CMS) will each accumulate approximately 1 exabyte of data over the HL-LHC's lifetime, 50-100× more than current LHC operations. The Worldwide LHC Computing Grid (WLCG), which currently processes LHC data on ~1 million CPU cores across 170 sites in 42 countries, cannot scale to HL-LHC requirements through incremental hardware upgrades alone. Moore's Law improvements will supply only a ~3-5× gain by 2029, leaving a ~10-20× computing gap that must be closed through algorithmic innovation, heterogeneous computing (GPUs, FPGAs), and fundamentally new approaches to event reconstruction and analysis.
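The arithmetic behind the quoted gap can be sketched in a few lines; the inputs below are the rough order-of-magnitude figures from the paragraph above, not official WLCG projections:

```python
# Back-of-envelope estimate of the HL-LHC computing gap. The inputs
# are the order-of-magnitude figures quoted in the text, not official
# WLCG projections.

data_growth = 50       # HL-LHC data volume vs. current operations (50-100x)
hardware_gain = 4      # expected hardware improvement by 2029 (~3-5x)

# Demand grows roughly with data volume; supply with hardware gains.
gap = data_growth / hardware_gain
print(f"gap to close via algorithms and architecture: ~{gap:.0f}x")
# With 50x demand and 4x hardware gain, ~12x remains -- inside the
# ~10-20x range quoted above.
```

Varying the inputs across their quoted ranges (50-100× demand, 3-5× hardware) spans roughly 10-33×, which is why the gap is usually quoted as an order of magnitude rather than a point estimate.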
The HL-LHC is a $5+ billion international investment (including detector upgrades) designed to collect 10× more data than the LHC has accumulated to date, enabling measurements of the Higgs boson's self-coupling, searches for new particles beyond the Standard Model, and precision tests of fundamental symmetries. The 2023 P5 (Particle Physics Project Prioritization Panel) report identified HL-LHC completion and exploitation as its highest-priority near-term recommendation. If the computing challenge is not solved, the experiments will be forced either to discard data (reducing physics reach) or to delay analysis (missing discovery opportunities). The computing gap is not a future problem: detector upgrade designs are being finalized now, and their data volumes are fixed by physics and engineering choices. The computing model must be ready when the detectors turn on.
The current WLCG model distributes computing across a tiered hierarchy of data centers, with CERN as Tier-0 and national laboratories as Tier-1. This model was designed for LHC Run 1 (2010-2013) and has been incrementally scaled. Event reconstruction algorithms (track finding, calorimeter clustering, particle identification) use iterative combinatorial methods whose computational cost scales superlinearly with the number of simultaneous collisions — the 5× increase in pile-up produces a 10-20× increase in reconstruction time per event. Machine learning has been applied to specific subtasks (jet tagging, b-tagging, anomaly detection) with significant speedups, but full ML-based reconstruction pipelines remain research prototypes. GPU porting of reconstruction code has shown 10-100× speedups for specific algorithms, but the full reconstruction chain includes hundreds of algorithms written over 20+ years in C++, and converting this codebase to heterogeneous computing architectures is a massive software engineering effort. Commercial cloud computing could supplement the Grid, but network bandwidth costs for transferring petabytes of data and the unpredictable pricing of cloud resources create economic uncertainty.
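The superlinear pile-up scaling described above has a simple combinatorial root: track seeds are built from pairs (or triplets) of hits, so candidate counts grow roughly quadratically with hit multiplicity. A toy count illustrates this (the hits-per-collision figure is an assumption for illustration, not a detector parameter):

```python
# Toy illustration of why combinatorial track seeding scales
# superlinearly with pile-up: a naive seeder tests every pair of hits,
# so a 5x increase in simultaneous collisions gives ~25x more pairs.
# HITS_PER_COLLISION is an illustrative assumption, not a detector number.

HITS_PER_COLLISION = 10  # assumed average hits in one detector layer pair

def seed_pairs(pileup):
    """Hit pairs a naive combinatorial seeder must test."""
    n = pileup * HITS_PER_COLLISION
    return n * (n - 1) // 2

for pileup in (40, 200):
    print(f"pile-up {pileup}: {seed_pairs(pileup):,} candidate pairs")

print(f"growth factor: ~{seed_pairs(200) / seed_pairs(40):.0f}x")
```

Triplet-based seeding grows even faster (roughly cubically), which is why pile-up, not raw data rate, dominates the reconstruction cost.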
- End-to-end event reconstruction pipelines that run natively on GPUs or other accelerators, rather than porting individual algorithms piecemeal.
- Graph neural networks for track reconstruction that scale linearly (rather than combinatorially) with pile-up, reducing the dominant computational bottleneck.
- "Analysis facilities" that co-locate computing with data (columnar data formats, query-based analysis) rather than distributing copies of event data to analyst laptops.
- Reduced-precision computing (using float16 or integer arithmetic where full float64 is unnecessary) for specific reconstruction steps.
- Physics-informed data reduction at the trigger level: ML-based triggers that make more sophisticated real-time decisions about which events to keep, reducing the data volume that must be reconstructed offline.
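Of these approaches, reduced precision is the easiest to demonstrate directly: Python's standard-library `struct` module supports IEEE-754 half precision (format character `'e'`), so one can measure what a float16 round-trip costs in accuracy. The energy value below is a made-up example, not detector data:

```python
# Minimal sketch of the reduced-precision idea: round-trip a value
# through IEEE-754 half precision (struct format 'e') and measure the
# relative error. Half floats carry ~3 decimal digits, often enough for
# quantities dominated by calibration uncertainty. The energy value is
# a made-up example, not detector data.

import struct

def to_float16_and_back(x):
    """Store x as a 2-byte half float, then read it back as a double."""
    return struct.unpack('e', struct.pack('e', x))[0]

energy_gev = 13.7                      # hypothetical calorimeter cell energy
stored = to_float16_and_back(energy_gev)
rel_err = abs(stored - energy_gev) / energy_gev
print(f"stored as {stored:.6f}, relative error {rel_err:.1e}")
# Half precision costs ~2e-4 relative error here while using a quarter
# of the memory (and memory bandwidth) of float64.
```

Since half precision caps relative rounding error at about 5e-4, it is plausible for storage and intermediate arithmetic wherever calibration uncertainties are percent-level, but not for accumulations that sum many small terms.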
A student team could implement and benchmark a graph neural network for particle track reconstruction using public CMS or ATLAS simulated event samples (available via CERN Open Data), comparing computational cost and physics performance against traditional combinatorial algorithms. Alternatively, a team could prototype a GPU-accelerated implementation of a specific reconstruction algorithm (vertex finding, jet clustering, or calorimeter energy calibration) and measure the speedup achievable on consumer GPU hardware. Relevant disciplines: computer science, high-performance computing, machine learning, particle physics.
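A minimal scaffold for the benchmarking half of such a project might look like the sketch below, using synthetic one-dimensional "hits" as a stand-in for real event samples. The 0.01 matching window, k=3, and all function names are placeholder assumptions; a real study would substitute CERN Open Data samples and detector geometry:

```python
# Hypothetical benchmarking scaffold: compare a naive all-pairs edge
# builder (the combinatorial baseline) against a sorted k-nearest-
# neighbour builder (the near-linear graph construction used as input
# to GNN track finders). The 1-D hits, 0.01 window, and k=3 are
# placeholders, not physics choices.

import random
import timeit

random.seed(1)

def all_pairs_edges(hits, window=0.01):
    """Quadratic baseline: test every hit pair against a window cut."""
    return [(i, j)
            for i in range(len(hits))
            for j in range(i + 1, len(hits))
            if abs(hits[i] - hits[j]) < window]

def knn_edges(hits, k=3):
    """Near-linear alternative: sort once, link each hit to k neighbours."""
    order = sorted(range(len(hits)), key=hits.__getitem__)
    return [(order[i], order[i + d])
            for i in range(len(order))
            for d in range(1, k + 1)
            if i + d < len(order)]

hits = [random.random() for _ in range(2000)]  # ~200 pile-up x 10 hits/collision

for builder in (all_pairs_edges, knn_edges):
    seconds = timeit.timeit(lambda: builder(hits), number=3)
    print(f"{builder.__name__}: {len(builder(hits))} edges, {seconds:.3f}s for 3 runs")
```

The physics-performance half of the comparison (track-finding efficiency and fake rate against simulation truth) is the harder part of the project; this scaffold only covers the computational-cost axis.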
- The P5 report identified HL-LHC completion and exploitation as the highest-priority near-term recommendation, explicitly noting computing challenges as a risk to the physics program.
- The `failure:not-attempted` tag reflects that no experiment has operated at HL-LHC pile-up levels (200 simultaneous collisions). The reconstruction algorithms and computing model for this regime are untested.
- The `failure:ignored-context` tag captures the "technical debt" problem: reconstruction software written over 20+ years was designed for lower pile-up and sequential CPU execution. Adapting it to heterogeneous architectures is a larger challenge than the original algorithm development.
- The `temporal:window` tag reflects the fixed HL-LHC schedule: detector upgrades are being installed during Long Shutdown 3 (2026-2029), and the computing model must be ready when data-taking begins.
- The `domain:space` secondary tag reflects that HEP computing challenges are closely related to astrophysical data processing (SKA, Rubin Observatory); the techniques developed for HL-LHC directly benefit adjacent fields.
- Cross-domain connection: shares the data-rate-exceeds-processing-capacity structure with digital-astronomical-transient-alert-processing (next-generation instruments that overwhelm current data pipelines) and the legacy-software-adaptation challenge with infrastructure-scada-legacy-ai-detection.
- Public data: CERN Open Data (opendata.cern.ch) provides real and simulated LHC data for student projects.
"Exploring the Quantum Universe: Pathways to Innovation and Discovery in Particle Physics" (P5 Report), Particle Physics Project Prioritization Panel, 2023. https://doi.org/10.2172/2368847, accessed 2026-02-16. Also: CERN Yellow Reports on HL-LHC computing; CMS and ATLAS Phase-2 Technical Design Reports.