EPT-2 vs ECMWF HRES: AI Forecast Accuracy in Europe

Written by: Olivier Lam, Physical AI Team, Jua.ai AG

Key takeaways for European energy traders

  • EPT-2 outperforms ECMWF HRES on 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation across 0–240 h lead times on StationBench.

  • EPT-2e beats the 50-member ECMWF ENS mean on RMSE and CRPS at virtually every lead time while running at a fraction of the compute cost.

  • ECMWF AIFS underperforms raw IFS ENS on probabilistic metrics and lacks SSRD output, which limits its value for energy trading.

  • Jua for Energy runs alongside existing ECMWF subscriptions and replaces manual pipelines and spreadsheets rather than the ECMWF feed itself.

  • Run a live benchmark of EPT-2 against your current provider.

EPT-2 accuracy vs ECMWF and Aurora on Europe

As of June 2026, EPT-2 ranks among the leading deterministic models for Europe on energy-critical variables and lead times, evaluated on StationBench against real ground stations. ECMWF HRES remains the forty-year benchmark and the universal reference point for the energy industry. EPT-2 outperforms HRES on the tested variables, while Jua for Energy runs alongside the ECMWF subscription rather than replacing it.

The table below compares the four models most relevant to European energy trading on dimensions that drive operational value. RMSE rankings come from arXiv:2507.09703 and arXiv:2410.15076. Update frequency and inference cost figures are sourced from the AIFS Single v1.1.0 operational paper and Jua’s published specifications.

| Model | Deterministic accuracy vs HRES (10 m wind, 100 m wind, 2 m temp, SSRD, 0–240 h) | Update frequency | Inference cost per simulation |
| --- | --- | --- | --- |
| EPT-2 (Jua) | Outperforms HRES on the four variables over 0–240 h | Up to 24×/day (EPT-2 RR); 4×/day flagship | ~0.25 kWh, ~$0.20–$15 on a single GPU |
| ECMWF HRES | The benchmark itself, with 40 years of NWP leadership | 2–4×/day | ~8,400 kWh, €1,000–€20,000 on HPC |
| ECMWF AIFS Single v1.1.0 | Operational AI model; raw AIFS underperforms raw IFS ENS on CRPS and MAE across all lead times | 4×/day | ~2.5 min on a single A100 GPU |
| Microsoft Aurora | Loses to EPT-2 on 10 m wind, 100 m wind, and 2 m temperature across 0–240 h; no SSRD output | Typically 4×/day research cadence; no productised operational schedule | Similar order of magnitude to EPT-2 for inference, with EPT-2 ~25% faster |

A 1 GW wind portfolio that gains four percentage points of forecast accuracy saves approximately €1.5 M per year under typical European hedging and imbalance-penalty structures. A 1 GW solar portfolio at the same accuracy gain saves about €3 M per year. Multi-GW portfolios scale these economics linearly.
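The linear scaling described above can be sketched in a few lines of Python. The per-GW figures are the ones quoted in this section for a four-percentage-point accuracy gain. The function name, and the assumption that wind and solar savings simply add up, are illustrative and not part of any Jua tooling.

```python
# Annual savings per GW for a four-percentage-point accuracy gain,
# as quoted in the text (EUR per year).
SAVINGS_PER_GW_EUR = {"wind": 1.5e6, "solar": 3.0e6}


def estimated_annual_savings(wind_gw: float = 0.0, solar_gw: float = 0.0) -> float:
    """Scale the quoted per-GW savings linearly with installed capacity.

    Assumption (illustrative): wind and solar savings are additive and both
    portfolios see the same four-percentage-point accuracy gain.
    """
    if wind_gw < 0 or solar_gw < 0:
        raise ValueError("capacities must be non-negative")
    return (wind_gw * SAVINGS_PER_GW_EUR["wind"]
            + solar_gw * SAVINGS_PER_GW_EUR["solar"])


if __name__ == "__main__":
    # Hypothetical portfolio: 2 GW of wind plus 1 GW of solar.
    total = estimated_annual_savings(wind_gw=2, solar_gw=1)
    print(f"Estimated savings: EUR {total:,.0f} per year")
```

A real estimate would depend on the hedging and imbalance-penalty terms of the specific market, so this is best read as a first-order sizing aid.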

Given EPT-2’s performance advantage over HRES, the next question is how ECMWF itself uses AI to compete.

How ECMWF uses AI alongside IFS

ECMWF AIFS Single v1 became operational on 25 February 2025, with version 1.1.0 released on 27 August 2025. Rather than replacing the physics-based IFS, AIFS runs alongside it as a complementary system that produces forecasts with substantially reduced compute time relative to the full IFS cycle.

AIFS runs four times daily at 00/06/12/18 UTC with 6-hourly output steps out to 360 hours. Its grid spacing is approximately 28 km, compared with 9 km for HRES. AIFS Single and AIFS ENS CF both underestimated peak 10 m wind speeds during Storm Amy (October 2025) and Storm Éowyn (January 2025), smoothing sharp frontal gradients and local wind maxima.

That structural limitation matters for energy traders who need accurate wind-ramp prediction. ECMWF itself states that traditional NWP remains essential for resolving fine-scale processes such as compact vorticity maxima, sharp fronts, and local jets responsible for the most damaging winds of severe mid-latitude storms.

AIFS is available on the Jua platform alongside EPT-2, ECMWF HRES, ECMWF ENS, and 22 other models, all under a single schema and a single API. Customers do not need to choose between ECMWF’s AI model and Jua’s; both run in the same workspace. That said, understanding the technical differences between the two models helps clarify when each is most valuable.

EPT-2 differs from AIFS in three operationally significant ways. First, EPT-2 forecasts at native any-Δt, so it predicts at arbitrary lead times rather than rolling forward in fixed 6-hour increments. This architecture matters because AIFS’s 6-hour rollforward compounds error at longer lead times, which degrades accuracy when traders need multi-day visibility. Second, EPT-2 natively covers surface solar radiation (SSRD), a variable AIFS does not output, which creates a critical gap for solar portfolio operators. Third, EPT-2 RR updates up to 24 times per day, compared with AIFS’s four daily runs, which enables intraday reforecasting as conditions evolve.
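To make the rollforward point concrete, the toy count below shows how many sequential model applications a fixed 6-hour rollout needs to reach a given lead time. It is not a description of either model's internals: the function name is hypothetical, and the comparison assumes that an any-Δt model needs a single forward pass per requested lead time, which the text implies but does not spell out.

```python
import math


def rollout_steps(lead_time_h: float, step_h: float = 6.0) -> int:
    """Sequential model applications a fixed-step rollout needs to reach a lead time.

    Each application feeds on the previous output, so any error made early
    is carried into every later step.
    """
    if lead_time_h <= 0 or step_h <= 0:
        raise ValueError("lead time and step must be positive")
    return math.ceil(lead_time_h / step_h)


if __name__ == "__main__":
    for lead in (24, 120, 240):
        # Assumption: a direct any-Δt prediction is one forward pass.
        print(f"{lead:>3} h lead: {rollout_steps(lead)} rollout steps vs 1 direct pass")
```

A lead time that is not a multiple of the step, such as 9 hours, is also not reachable natively by a 6-hour rollout; the count rounds up to the next available step.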

Compare EPT-2, AIFS, and HRES head-to-head on your region and variable.

How AIFS ensemble skill compares to EPT-2e

The most rigorous independent evaluation of ECMWF AIFS ensemble performance in 2026 is the Kocsis and Baran study, which assessed operational 50-member ensemble forecasts for 10 m wind speed from July to November 2025 at 9,246 global SYNOP stations. The findings are unambiguous. Raw IFS ensemble forecasts substantially outperformed raw AIFS forecasts across all lead times up to 15 days on CRPS, quantile score, coverage, and ensemble-median MAE. Relative to IFS, AIFS showed a CRPS skill score (CRPSS) of around −4% and an MAE skill score (MAES) of around −3%.
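For readers less familiar with these metrics, the sketch below shows how an ensemble CRPS and a skill score are commonly computed. It uses the standard empirical CRPS estimator and assumes the usual skill-score convention, 1 − (model score / reference score), under which a negative value means the model scores worse than the reference. The function names and sample numbers are illustrative; this is not the evaluation code used in the study.

```python
from typing import Sequence


def crps_ensemble(members: Sequence[float], observation: float) -> float:
    """Empirical CRPS of one ensemble forecast against one observation.

    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|; lower is better.
    """
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must have at least one member")
    mean_abs_error = sum(abs(x - observation) for x in members) / m
    mean_abs_spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return mean_abs_error - 0.5 * mean_abs_spread


def skill_score(model_score: float, reference_score: float) -> float:
    """Skill relative to a reference for negatively oriented scores (CRPS, MAE)."""
    return 1.0 - model_score / reference_score


if __name__ == "__main__":
    # Made-up 10 m wind speeds in m/s, for illustration only.
    print(crps_ensemble([6.0, 7.5, 8.0, 9.0], observation=8.2))
    # A model whose CRPS is 4% higher than the reference has a CRPSS of about -0.04.
    print(skill_score(model_score=1.04, reference_score=1.00))
```

For a single-member ensemble the estimator reduces to the absolute error, which is why CRPS is often described as a probabilistic generalisation of MAE.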

After post-processing with EMOS or quantile regression, the skill gap between IFS and AIFS narrowed substantially, and IFS retained a small but statistically significant advantage primarily at short lead times up to day 4. Raw AIFS, without post-processing, does not act as a drop-in replacement for IFS ENS on probabilistic skill metrics.

EPT-2e, Jua’s ensemble variant, operates on a different benchmark baseline. The table below summarises ensemble skill on the three variables most relevant to European energy trading, drawn from arXiv:2507.09703 and the Kocsis and Baran 2026 study.

| Variable | EPT-2e vs ECMWF ENS mean (RMSE and CRPS, 0–240 h) | Raw AIFS ENS vs raw IFS ENS (CRPS, 0–360 h) |
| --- | --- | --- |
| 100 m wind speed | EPT-2e beats ENS mean at virtually every lead time | Raw AIFS underperforms raw IFS across all lead times |
| Surface solar radiation (SSRD) | EPT-2e beats ENS mean at virtually every lead time; AIFS has no SSRD output | Not applicable, because AIFS does not output SSRD |
| 2 m temperature | EPT-2e beats ENS mean at virtually every lead time | Raw AIFS underperforms raw IFS across all lead times |

EPT-2e runs 4 times per day with 30 members and a 60-day ensemble horizon. ECMWF ENS runs 50 members. EPT-2e beats the 50-member ENS mean on both RMSE and CRPS at virtually every lead time, with fewer members and at a fraction of the compute cost.

How AI weather models stack up against ECMWF

Model performance depends on the AI system, the variable, and the event type. For routine conditions across energy-critical variables, the evidence in 2026 favours EPT-2 over ECMWF HRES. For record-breaking extremes, the picture is more nuanced.

A peer-reviewed study in Science Advances found that GraphCast, Pangu-Weather, and Fuxi systematically underestimated the frequency and intensity of record-breaking heat, cold, and wind events, with ECMWF HRES producing lower RMSE than all tested AI models on those out-of-distribution extremes.

EPT-2 is a physics-constrained foundation model whose outputs respect conservation laws for mass, momentum, and energy, which addresses the structural extrapolation limitation that affects purely data-driven models on extremes.

The operational comparison between EPT-2e and ECMWF ENS is concrete. EPT-2e updates 4 times per day with a 60-day ensemble horizon, matching ECMWF’s cadence while running at a fraction of the cost. That cost asymmetry, with inference roughly four orders of magnitude cheaper than traditional NWP, is what makes 24 daily updates from EPT-2 RR economically viable, whereas traditional NWP is capped at two to four runs per day by HPC infrastructure costs.
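As a rough check on the order-of-magnitude claim, the per-simulation energy figures quoted in the comparison table earlier in this article (about 8,400 kWh for HRES and about 0.25 kWh for EPT-2) can be compared directly. Using energy as a proxy for cost is an assumption made here for illustration; the variable names are likewise illustrative.

```python
import math

HRES_KWH_PER_RUN = 8400.0  # "~8,400 kWh" quoted for ECMWF HRES
EPT2_KWH_PER_RUN = 0.25    # "~0.25 kWh" quoted for EPT-2

# How many EPT-2 simulations fit into the energy budget of one HRES run.
ratio = HRES_KWH_PER_RUN / EPT2_KWH_PER_RUN

# Express the same ratio as a power of ten.
orders_of_magnitude = math.log10(ratio)

if __name__ == "__main__":
    print(f"Energy ratio: {ratio:,.0f}x")
    print(f"Orders of magnitude: {orders_of_magnitude:.1f}")
```

On these quoted figures the ratio lands between four and five orders of magnitude, consistent with the rough figure in the text.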

EPT-2 also operates at a native resolution down to 5 km over Europe, compared with 9 km for ECMWF HRES and approximately 28 km for AIFS. The Jua platform delivers products at up to 1 km resolution. For wind-ramp prediction at specific turbine sites or solar irradiance at plant level, that resolution difference translates directly into forecast accuracy at the asset.

Microsoft Aurora, the closest AI peer on deterministic accuracy, loses to EPT-2 on 10 m wind, 100 m wind, and 2 m temperature across the full 0–240 hour range. Aurora has no SSRD output. Aurora shares AIFS’s rollforward limitation, while EPT-2’s native any-Δt architecture avoids the compounding error discussed earlier. EPT-2 inference is approximately 25% faster than Aurora.

For European energy traders, EPT-2 is a leading model on the variables that drive P&L, including wind at hub height, surface solar radiation, and 2 m temperature, across the lead times that matter for day-ahead and intraday trading. ECMWF HRES remains the essential reference signal. Jua for Energy delivers both in the same workspace.

See results in under 5 minutes and run live benchmarks on your own region and variables, head-to-head across 25+ models.

Frequently asked questions from trading desks

We already have ECMWF, so why add Jua for Energy?

Jua for Energy does not replace ECMWF. Most customers keep their ECMWF subscription and run Jua for Energy alongside it. ECMWF AIFS, ECMWF’s own AI model, runs on the Jua platform. Jua for Energy replaces everything around the ECMWF feed, including the in-house GRIB pipeline, the spreadsheet stitching, the manual benchmarking, and the morning-briefing routine.

The 7–9 a.m. prep routine compresses into a single workspace, refreshed up to 24 times per day, where ECMWF HRES, ECMWF ENS, AIFS, Aurora, and EPT-2 appear on the same screen under one schema and one API. EPT-2 also outperforms HRES on energy-critical variables and lead times, so the customer gains both workflow consolidation and a more accurate primary signal.

AI models hallucinate, so can EPT-2 outputs be trusted?

LLMs hallucinate because they are unconstrained on the symbolic surface, so token sequences that look plausible can be physically nonsensical. EPT is a physics foundation model, not a language model. It is trained on observational physics and its outputs are constrained by the conservation laws, including mass, momentum, and energy, that govern the real atmosphere.

The architecture cannot produce outputs that violate those laws in the way a generic transformer applied naively to physics would. In short, an LLM is unconstrained on the symbolic surface, while a physics model is constrained at the representation level. Validation is external and reproducible: EPT-2 is benchmarked against real ground stations on StationBench, and the results are published in technical reports on arXiv (2507.09703 for EPT-2 and 2410.15076 for EPT-1.5).

How fast can we prove this in our own environment?

Proof of value takes about 5 minutes for the first benchmark. The live benchmark on the Jua platform is the standard proof-of-value step. A prospect selects a region and a variable that matters to their book, typically 100 m wind over a wind-rich region of their home market or SSRD over a solar portfolio, selects their current provider alongside EPT-2, and the platform returns a head-to-head accuracy comparison on the spot.
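Conceptually, a head-to-head comparison of this kind comes down to scoring each forecast against the same station observations. The sketch below shows that idea with RMSE. It is not the Jua platform API: the function, the forecast labels, and every number are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between a forecast series and observations."""
    if len(forecast) != len(observed) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up 100 m wind speeds (m/s) at one station for four valid times.
observed = [8.0, 9.5, 11.0, 7.5]
forecasts = {
    "forecast_a": [7.0, 10.5, 12.0, 6.5],  # hypothetical provider A
    "forecast_b": [8.5, 9.0, 11.5, 7.0],   # hypothetical provider B
}

# Score every forecast against the same observations; lower RMSE is better.
scores = {name: rmse(values, observed) for name, values in forecasts.items()}
best = min(scores, key=scores.get)

if __name__ == "__main__":
    for name, score in scores.items():
        print(f"{name}: RMSE = {score:.2f} m/s")
    print(f"Lowest RMSE: {best}")
```

In practice the same calculation would be repeated per lead time and per station, so the comparison shows where along the forecast horizon each provider is stronger.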

Backtests against years of historical forecasts run in approximately 5 minutes via Athena, Jua’s AI agent. The objection shifts from “is this real?” to “how fast can we sign?” Customers operating multi-GW portfolios have closed procurement in as little as two weeks from first benchmark to contract.

Conclusion: what the 2026 StationBench results mean for P&L

The 2026 StationBench results are clear. EPT-2 beats ECMWF HRES on the lead times and variables that drive a European energy P&L, including 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation. EPT-2e outperforms the 50-member ECMWF ENS mean on both deterministic and probabilistic metrics. ECMWF AIFS, the incumbent’s own AI model, underperforms raw IFS ENS on probabilistic skill metrics before post-processing and lacks SSRD output entirely.

Jua is a foundation model and agent company. EPT is a general physics foundation model and Athena is an AI agent. Jua for Energy is the first applied product, the platform that delivers EPT-2 and EPT-2e forecasts up to 24 times per day at approximately 0.25 kWh per simulation, alongside ECMWF HRES, ECMWF ENS, AIFS, Aurora, GraphCast, and 19 other models, all under a single schema. The manual GRIB pipeline, the spreadsheet stitching, and the morning-briefing routine are displaced. The ECMWF subscription stays.

The architecture learns physics and the domain is a variable. The atmosphere is the first physical system EPT has been fine-tuned for. Energy trading is the first market Athena has been instrumented for. Both will expand.

See the numbers for yourself and compare EPT-2 head-to-head against your current forecast provider on your region, your variables, and your lead times.

Want to talk to the team behind the writing?

Book a demo to see EPT-2 and Athena in production, or read the open papers behind the work.