
Europe AI Weather Model Performance: EPT-2 vs ECMWF


Olivier Lam · June 28, 2026


Written by: Olivier Lam, Physical AI Team, Jua.ai AG

Why EPT-2 Matters for European Energy Trading

EPT-2 outperforms or matches ECMWF HRES on 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation across key lead times.
EPT-2e delivers strong ensemble performance on both RMSE and CRPS, using raw model output without post-processing or station fine-tuning.
The model’s native any-Δt inference and physics-constrained architecture limit error build-up during extreme events such as winter storms and heatwaves.
At up to 5 km resolution with frequent daily updates, EPT-2 provides higher surface detail and earlier market access than traditional NWP and competing AI models.
Benchmark EPT-2 against your current provider and quantify the P&L impact for your European energy portfolio by scheduling a live comparison with the Jua team.

Europe AI Weather Accuracy Benchmarks for Energy Variables

StationBench compares leading AI and NWP models on energy-critical variables, scoring raw model output against ground observations from more than 10,000 weather stations worldwide. EPT-2 and EPT-2e rank at or near the top on RMSE and CRPS across 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation, the variables that directly affect trading performance.

Model	Overall RMSE Rank	CRPS Rank	Source
EPT-2 (Jua)	—	—	StationBench
EPT-2e (Jua ensemble)	—	—	StationBench
ECMWF HRES	—	—	StationBench
ECMWF ENS mean	—	—	StationBench
Microsoft Aurora	—	—	StationBench
GFS GraphCast (DeepMind)	—	—	StationBench

ECMWF AIFS is available on the Jua platform as a comparison model and sits below some peers on the primary energy variables. Aurora has no SSRD output, so it does not appear in solar radiation comparisons. GraphCast and Aurora both use fixed 6-hour roll-forward inference, which can compound error at longer lead times, while EPT-2 forecasts at native any-Δt without rolling.
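CRPS is what lets the table score ensembles such as EPT-2e as full distributions rather than single values; for a single deterministic forecast it reduces to the absolute error. StationBench's own scoring code is not reproduced here. As a minimal sketch, assuming one station observation and a list of ensemble member values, the standard empirical estimator CRPS = E|X − y| − ½·E|X − X′| can be computed as follows. The function name and the sample values are illustrative only.

```python
def crps_ensemble(members: list[float], obs: float) -> float:
    """Empirical CRPS of an ensemble forecast against one observation (lower is better).

    Energy form: CRPS = E|X - y| - 0.5 * E|X - X'|, with X and X' drawn
    independently from the ensemble members. For one member it equals |x - y|.
    """
    n = len(members)
    if n == 0:
        raise ValueError("need at least one ensemble member")
    # Average distance from each member to the observation.
    mean_abs_error = sum(abs(x - obs) for x in members) / n
    # Average distance between all ordered pairs of members (ensemble spread).
    mean_pair_spread = sum(abs(a - b) for a in members for b in members) / (n * n)
    return mean_abs_error - 0.5 * mean_pair_spread


# Hypothetical 3-member 100 m wind ensemble (m/s) against an observed 8.0 m/s.
print(round(crps_ensemble([7.0, 8.0, 9.0], 8.0), 4))  # 0.2222
```

Some benchmarks use a "fair" variant that divides the spread term by n(n − 1) instead of n²; which form StationBench applies is not stated in this article.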

Medium-Range Skill on Wind, Temperature, and Solar

Medium-range skill on wind, temperature, and solar drives European energy P&L. StationBench evaluates these variables across 0–240 hours against ground observations from more than 10,000 weather stations.

EPT-2 leads on 10 m and 100 m wind RMSE across all three lead-time bands (0–48 h, 48–120 h, 120–240 h). Performance remains competitive with ECMWF HRES throughout the forecast horizon, which supports both day-ahead and multi-day trading decisions.
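Scoring by lead-time band means grouping station errors by how far ahead each forecast was issued before computing RMSE. A minimal sketch, assuming station-level records of (lead time, forecast, observation) for one variable and pooling all stations together; the function names, record layout, and half-open band boundaries are hypothetical and not taken from StationBench.

```python
import math
from collections import defaultdict
from typing import Iterable, Optional

# The three lead-time bands named in the text. Half-open boundaries, with
# 240 h folded into the last band, are an assumption of this sketch.
BANDS = [("0-48 h", 0, 48), ("48-120 h", 48, 120), ("120-240 h", 120, 240)]


def band_for(lead_h: float) -> Optional[str]:
    """Return the band label for a lead time in hours, or None if out of range."""
    for i, (label, lo, hi) in enumerate(BANDS):
        is_last = i == len(BANDS) - 1
        if lo <= lead_h < hi or (is_last and lead_h == hi):
            return label
    return None


def rmse_by_band(records: Iterable[tuple[float, float, float]]) -> dict[str, float]:
    """RMSE per band from (lead_h, forecast, observation) records for one variable."""
    squared_errors: dict[str, list[float]] = defaultdict(list)
    for lead_h, forecast, observation in records:
        label = band_for(lead_h)
        if label is not None:
            squared_errors[label].append((forecast - observation) ** 2)
    return {label: math.sqrt(sum(errs) / len(errs)) for label, errs in squared_errors.items()}


# Hypothetical station records: (lead time h, forecast, observed), e.g. 10 m wind in m/s.
records = [(24, 6.0, 5.0), (36, 7.0, 7.0), (72, 9.0, 7.0), (200, 4.0, 1.0)]
print({label: round(v, 3) for label, v in rmse_by_band(records).items()})
# {'0-48 h': 0.707, '48-120 h': 2.0, '120-240 h': 3.0}
```

Pooling every station into one band score is the simplest choice; averaging per-station RMSE instead would weight sparse and dense station regions differently.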

EPT-2 also leads on 2 m temperature RMSE across the full 0–240 h range. Aurora shows competitive temperature performance at shorter lead times before degrading at longer horizons. For surface solar radiation, EPT-2 is the only AI model in this comparison set with SSRD output, since Aurora provides none.

Across 10 m wind, 100 m wind, and 2 m temperature, EPT-2 performs well against Microsoft Aurora, and across all four variables, including SSRD, it remains competitive with ECMWF HRES, which many desks still treat as the reference benchmark.

Extreme-Event Performance for European Winter Storms and Heatwaves

StationBench results include European winter storm sequences and summer heatwave episodes, not only benign periods. EPT-2 maintains strong performance during these high-impact windows using raw output, with no post-processing or station fine-tuning applied.

That consistency under extreme conditions stems from EPT’s physics-constrained architecture. EPT is a spatiotemporal transformer foundation model trained on observational physics, and its latent representation respects the conservation laws of mass, momentum, and energy that govern real atmospheric dynamics. Models that roll forward in fixed 6-hour increments feed each step’s output into the next, so representational error accumulates during rapid-onset events. EPT-2’s native any-Δt inference predicts at arbitrary time steps without rolling forward, so it does not compound error across steps in the same way.
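To make the roll-forward point concrete, the number of chained model calls grows with lead time. The sketch below only counts steps; it does not model how much error each step adds, and the single-call case for a model that forecasts without rolling is an illustrative assumption, not a description of EPT-2’s internals.

```python
import math


def rollout_steps(lead_h: float, step_h: float) -> int:
    """Sequential model calls needed to reach lead_h when each call advances
    the state by a fixed step_h hours, feeding its output into the next call."""
    if lead_h <= 0 or step_h <= 0:
        raise ValueError("lead_h and step_h must be positive")
    return math.ceil(lead_h / step_h)


for lead_h in (24, 120, 240):
    fixed = rollout_steps(lead_h, step_h=6)        # fixed 6-hour roll-forward
    direct = rollout_steps(lead_h, step_h=lead_h)  # assumed single direct call
    print(f"{lead_h:>3} h: {fixed:>2} chained steps vs {direct} direct call")
```

At 240 h a fixed 6-hour model has chained 40 predictions, each conditioned on the previous one’s output.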

For energy traders, extreme-event accuracy is where forecast error becomes most expensive. A missed wind ramp during a winter storm, or an unflagged solar dip during a heatwave-driven demand spike, translates directly into imbalance costs. A 1 GW wind portfolio that gains four percentage points of forecast accuracy saves approximately €1.5 M per year, and a 1 GW solar portfolio saves approximately €3 M per year under typical European hedging and imbalance-penalty structures, with both figures scaling linearly with portfolio size.
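Because both figures scale linearly with portfolio size, a desk can size the estimate for its own book with simple multiplication. A minimal sketch, assuming the per-GW figures quoted above and a hypothetical portfolio mix; it applies only at the stated four-point accuracy gain and does not extrapolate to other gains.

```python
# Per-GW annual savings quoted in the text, valid at a four-percentage-point accuracy gain.
SAVINGS_EUR_PER_GW = {"wind": 1.5e6, "solar": 3.0e6}


def annual_savings_eur(portfolio_gw: dict[str, float]) -> float:
    """Scale the quoted per-GW savings linearly with installed capacity (GW).

    Does not adjust for accuracy gains other than four percentage points.
    """
    unknown = set(portfolio_gw) - set(SAVINGS_EUR_PER_GW)
    if unknown:
        raise KeyError(f"no savings figure for: {sorted(unknown)}")
    return sum(SAVINGS_EUR_PER_GW[tech] * gw for tech, gw in portfolio_gw.items())


# Hypothetical book: 2 GW of wind and 1 GW of solar.
total = annual_savings_eur({"wind": 2.0, "solar": 1.0})
print(f"€{total:,.0f} per year")  # €6,000,000 per year
```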

Run a live benchmark on your highest-stakes European region and variable to see how EPT-2 performs during your most expensive forecast errors.

That accuracy during high-impact periods depends partly on EPT-2’s spatial resolution and update cadence, which shape how well the model captures local effects.

Resolution and Surface Detail Over Europe

EPT-2 HRRR delivers forecasts at up to 5 km native resolution over Europe, compared to 9 km for ECMWF HRES and approximately 25 km for Aurora and GraphCast at their published resolutions. EPT-2 RR updates multiple times per day, and EPT-2e also refreshes several times per day, while ECMWF HRES and AI peers typically deliver 2–4 global runs per 24-hour cycle.
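Grid spacing matters because the number of grid cells over a given area grows with the inverse square of the spacing. A rough sketch, assuming a hypothetical 100 km × 100 km box (for example a cluster of wind farms) and treating each model’s grid as uniform square cells, which is only an approximation for global latitude-longitude grids:

```python
import math


def cells_over_box(side_km: float, grid_km: float) -> int:
    """Grid cells needed to tile a square box of side side_km at spacing grid_km.
    Treats the grid as uniform square cells (an approximation)."""
    return math.ceil(side_km / grid_km) ** 2


# Native grid spacings quoted in the text (km).
spacings = {"EPT-2 (up to)": 5, "ECMWF HRES": 9, "Aurora / GraphCast (approx.)": 25}
for name, dx in spacings.items():
    print(f"{name}: {cells_over_box(100, dx)} cells over a 100 km x 100 km box")
```

At 5 km the box holds 400 cells, versus 144 at 9 km and 16 at 25 km, so surface features are sampled on a considerably finer mesh.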

EPT-2 covers wind at 11 height levels from 10 m to 200 m, spanning the full range of commercial wind-turbine hub heights. It also provides SSRD, 2 m temperature, precipitation, cloud cover, and pressure in a single model. Every one of these variables is produced by the same native any-Δt architecture, which trains EPT-2 to predict at arbitrary time steps rather than rolling forward in fixed increments the way Aurora and most peer models do.

The result is both broader variable coverage and faster delivery. A typical Jua run completes approximately 2.5 hours ahead of competing operational runs at the same cycle, giving Jua for Energy customers access to the updated forecast before the market re-prices on it.

Computational Efficiency Versus Accuracy

A single EPT-2 inference runs on one GPU in minutes, consuming approximately 0.25 kWh at a cost of $0.20–$15 per simulation. A single traditional NWP simulation runs for 1–2 hours on HPC infrastructure, consuming approximately 8,400 kWh at a cost of €1,000–€20,000. The energy difference alone is roughly four orders of magnitude (8,400 kWh versus 0.25 kWh, about 33,600 times).
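The energy comparison can be checked with one division. The cost ranges are quoted in different currencies ($ and €), so this sketch compares only the per-run energy figures quoted above.

```python
import math

EPT2_KWH_PER_RUN = 0.25  # quoted above: one EPT-2 inference on a single GPU
NWP_KWH_PER_RUN = 8_400  # quoted above: one traditional NWP simulation on HPC

ratio = NWP_KWH_PER_RUN / EPT2_KWH_PER_RUN
print(f"{ratio:,.0f}x less energy per run (about 10^{math.log10(ratio):.1f})")
# 33,600x less energy per run (about 10^4.5)
```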

The efficiency extends to training. EPT-2 was trained on 8 × H100 GPUs over 10 days, while Microsoft Aurora required 32 × A100 GPUs over 18 days: four times as many GPUs and a training cycle 1.8 times as long. At inference time, the efficiency advantage translates directly into update frequency, so EPT-2 RR delivers more runs per day than traditional NWP, without an HPC cluster and without compromising the accuracy documented in StationBench.
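Multiplying GPU count by wall-clock days gives GPU-days, a rough proxy for training effort. A minimal sketch using only the figures quoted above; because H100 and A100 GPUs differ in throughput, GPU-days are not a like-for-like compute comparison, and the `TrainingRun` type is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Training configuration as quoted in the text."""
    name: str
    gpus: int
    gpu_type: str
    days: float

    @property
    def gpu_days(self) -> float:
        """GPU count times wall-clock days: a rough proxy, not a FLOP count."""
        return self.gpus * self.days


ept2 = TrainingRun("EPT-2", gpus=8, gpu_type="H100", days=10.0)
aurora = TrainingRun("Aurora", gpus=32, gpu_type="A100", days=18.0)

print(ept2.gpu_days, aurora.gpu_days)                    # 80.0 576.0
print(aurora.gpus / ept2.gpus, aurora.days / ept2.days)  # 4.0 1.8
```

On this crude measure, Aurora’s run used 7.2 times the GPU-days of EPT-2’s, before accounting for the H100’s higher per-GPU throughput.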

The accuracy gain has a direct market-sizing translation. Those savings figures of approximately €1.5 M per GW wind and €3 M per GW solar at four percentage points of improvement scale linearly with portfolio size.

European energy traders are already using AI tools to forecast shifts in the ECMWF two-week outlook, which serves as the reference point for repricing risk around heating demand, renewable output, and system tightness. That P&L sensitivity explains why traders look for models like EPT-2 that perform well against that reference point.

How to Run Your Own Benchmark on Jua

The Jua platform at athena.jua.ai puts more than 25 models on a single surface, including 10 proprietary EPT-family AI models and 15 third-party NWP and AI models such as ECMWF HRES, ECMWF ENS, ECMWF AIFS, Aurora, and GraphCast. A head-to-head benchmark on any region and variable returns results in under 30 seconds. Backtests against years of historical forecasts run in approximately 5 minutes via Athena, the AI agent instrumented with the Jua for Energy tool surface.
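A head-to-head comparison reduces to scoring each model against the same observations. As a minimal sketch, assuming paired forecasts and observations for one station and variable, the function below ranks models by RMSE and reports an RMSE skill score relative to a chosen reference model (for example ECMWF HRES): 1 − RMSE_model / RMSE_reference, where positive values mean the model beats the reference. The function names and toy data are hypothetical and do not represent the Jua SDK or REST API.

```python
import math


def rmse(forecast: list[float], obs: list[float]) -> float:
    """Root-mean-square error between paired forecast and observation values."""
    if len(forecast) != len(obs) or not obs:
        raise ValueError("forecast and obs must be non-empty and the same length")
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, obs)) / len(obs))


def head_to_head(forecasts: dict[str, list[float]], obs: list[float], reference: str):
    """Rank models by RMSE (best first) with skill vs the reference model.

    Skill = 1 - RMSE_model / RMSE_reference: 0 for the reference itself,
    positive when a model beats it.
    """
    scores = {name: rmse(fc, obs) for name, fc in forecasts.items()}
    ref = scores[reference]
    if ref == 0:
        raise ValueError("reference RMSE is zero; skill score is undefined")
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [(name, score, 1 - score / ref) for name, score in ranked]


# Hypothetical 100 m wind series (m/s) at one station; not real benchmark data.
obs = [5.0, 6.0, 7.0, 8.0]
forecasts = {
    "reference": [7.0, 8.0, 9.0, 10.0],  # off by 2 m/s every step -> RMSE 2.0
    "candidate": [5.0, 6.0, 7.0, 10.0],  # one 2 m/s miss -> RMSE 1.0
}
for name, score, skill in head_to_head(forecasts, obs, reference="reference"):
    print(f"{name}: RMSE {score:.2f}, skill {skill:+.2f}")
```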

For quant developers and engineering teams, pip install jua installs the Python SDK from PyPI. The REST API exposes all models through a single schema with Apache Arrow support for large payloads. Hindcast data is available across multiple Jua and third-party models for backtesting, and ENTSO-E grid data integrates directly for European power-market context.

Run benchmarks on your own region and variables on the Jua platform. See your forecasts head-to-head against 25+ models at athena.jua.ai, or walk through a live benchmark with the Jua team.

Frequently Asked Questions

Can EPT-2 results be trusted, given that many AI weather models lack peer-reviewed validation?

EPT-2 is benchmarked on the open-source StationBench, which includes pre-processed ground-truth data from 10,000+ weather stations around the world, using raw model output without post-processing or station fine-tuning. The methodology and results appear in technical reports. The architecture is physics-constrained by design: EPT is a spatiotemporal transformer foundation model whose latent representation respects the conservation laws of mass, momentum, and energy, so its outputs cannot violate those laws the way a generic transformer applied naively to physics might. The validation is external, reproducible, and anchored to real ground observations rather than vendor-provided graphics.

Does Jua for Energy replace an ECMWF subscription?

Jua for Energy complements, rather than replaces, an ECMWF subscription. Most serious customers keep their ECMWF access and run Jua for Energy alongside it. ECMWF AIFS, ECMWF’s own AI model, runs natively on the Jua platform in the same workspace as EPT-2.

Jua for Energy displaces the plumbing around the ECMWF feed, including the in-house GRIB pipeline, manual benchmarking, morning-briefing analyst work, and spreadsheet stitching. The 7–9 a.m. manual prep routine compresses into a single workspace, refreshed up to 24 times per day, where every model, including ECMWF HRES, ENS, AIFS, Aurora, and EPT-2, runs under one schema and one API.

How is EPT-2 different from Microsoft Aurora or DeepMind GraphCast?

The first difference is categorical. Aurora and GraphCast are research outputs from large AI labs. Jua is a foundation model and agent company, and Jua for Energy is its productised platform, built on EPT and Athena, on which Aurora and GraphCast run as comparison models.

Five concrete differences stand out. First, EPT-2’s native any-Δt inference, described earlier, avoids the error compounding that fixed-increment models experience. Second, EPT-2e is a productised ensemble that shows strong performance on RMSE and CRPS, and no AI peer has shipped an equivalent. Third, EPT-2 RR delivers frequent updates, while AI peers are typically updated less often. Fourth, Athena, the AI agent instrumented with the Jua for Energy tool surface, turns natural-language questions into briefings, benchmarks, backtests, and custom widgets in approximately 90 seconds, and no AI weather peer has an equivalent. Fifth, Aurora has no SSRD output at all, which makes it incomplete for solar-portfolio applications.

How quickly can a quant team or trading desk integrate Jua for Energy into an existing pipeline?

Integration that takes a quarter to build with raw AI-weather research subscriptions stands up in days with Jua for Energy. pip install jua installs the Python SDK from PyPI. The REST API exposes more than 25 models through a single schema with Apache Arrow support for large payloads. Hindcast data is available across multiple Jua and third-party models for backtesting, and ENTSO-E grid data integrates directly for European power-market context.

A live benchmark on the prospect’s own region and variable returns results in under 30 seconds. A full backtest via Athena runs in approximately 5 minutes. Quant funds describe the integration as the feature that closes the evaluation, because the pipeline that elsewhere requires a quarter of engineering time is operational before the end of the proof-of-value period.

Conclusion

The StationBench results, based on pre-processed ground-truth data from 10,000+ weather stations around the world, show that EPT-2 performs well in European accuracy benchmarks, with strong results for 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation across a range of lead times. EPT-2e adds strong ensemble performance on both RMSE and CRPS. The accuracy gain translates to approximately €1.5 M annual savings per GW wind and €3 M per GW solar at four percentage points of improvement, with figures that scale linearly with portfolio size.

Jua is a foundation model and agent company. Jua for Energy is the first applied product, delivering EPT-2 and Athena operationally to utilities, trading houses, and quant funds across five continents, including Axpo, TotalEnergies, Statkraft, EnBW, EDF, and Hydro-Québec. The architecture learns physics, and the domain is a variable. Energy trading is the first market, and further domains will follow.

Developers can pip install jua and read the documentation at docs.jua.ai. Everyone else can see the numbers for their own region and variable by scheduling a walkthrough with the Jua team.
