Written by: Olivier Lam, Physical AI Team, Jua.ai AG
Key Takeaways for Energy and Weather Teams
- AI weather models like EPT-2 outperform traditional NWP, such as ECMWF HRES, on key variables including wind speed, temperature, and solar radiation.
- WeatherBench metrics (RMSE, CRPS) against 14,000+ stations show EPT-2 and EPT-2e leading competitors like Aurora and GraphCast in deterministic and ensemble accuracy.
- Jua’s EPT models deliver stronger operational specs with 24x daily updates, 5km resolution, and 2-3 hour faster dissemination versus ECMWF’s 2-4 cycles and 9km grid.
- AI cuts computational load dramatically, with EPT-2 inference at 0.25 kWh vs. NWP’s 8,400 kWh, while physics-grounded designs like EPT improve extreme weather performance.
- Energy traders gain €1.5-3M/GW/year from Jua’s platform; see how EPT-2 performs in your region to quantify your potential gains.
How This Article Measures AI Weather Accuracy
Weather model accuracy relies on standardized metrics evaluated against ground truth observations. Root Mean Square Error (RMSE) measures deterministic forecast skill through L2 norm calculations, while Continuous Ranked Probability Score (CRPS) evaluates probabilistic ensemble performance.
WeatherBench provides the industry standard, testing models against 14,000+ real weather stations globally without post-processing or station-specific fine-tuning.
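To make the two metrics concrete, here is a minimal sketch of how RMSE and an empirical ensemble CRPS can be computed against station observations. It is an illustration only, not WeatherBench's or Jua's implementation: the function names and all sample values are hypothetical, and the CRPS uses the standard empirical estimator E|X - y| - 0.5 * E|X - X'|.

```python
import math
from typing import Sequence


def rmse(forecasts: Sequence[float], observations: Sequence[float]) -> float:
    """Root mean square error over paired forecast/observation values."""
    if len(forecasts) != len(observations) or len(forecasts) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))


def crps_ensemble(members: Sequence[float], observation: float) -> float:
    """Empirical CRPS for one observation: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must have at least one member")
    # Mean absolute error of the members against the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Half the mean absolute difference between all member pairs.
    spread = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return accuracy - spread


# Hypothetical 10 m wind speeds (m/s) at three stations.
station_obs = [4.0, 6.5, 9.0]
deterministic = [4.5, 6.0, 10.0]
print(f"RMSE: {rmse(deterministic, station_obs):.3f} m/s")

# Hypothetical 10-member ensemble for one station and one lead time.
ensemble = [5.8, 6.1, 6.4, 6.6, 6.9, 7.0, 7.3, 7.5, 7.8, 8.2]
print(f"CRPS: {crps_ensemble(ensemble, 6.5):.3f} m/s")
```

Lower is better for both scores. For a single-member "ensemble", this CRPS reduces to the absolute error, which is why it is often read as a probabilistic generalization of that error.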
Three critical operational dimensions together define whether a model is production-ready. Temporal resolution determines forecast granularity, and EPT-2 produces native any-Δt forecasts at arbitrary time steps rather than rolling forward in fixed 6-hour increments.
This flexibility becomes powerful when paired with frequent updates. Model run frequency controls data freshness, and EPT2-RR updates 24 times daily versus traditional NWP’s 2-4 cycles.
Update frequency only creates value when forecasts arrive in time to act, so dissemination time measures speed to market, and EPT-2 Early delivers forecasts 2-3 hours faster than competing operational runs.
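The relationship between run frequency, dissemination time, and data freshness comes down to simple arithmetic. The sketch below assumes evenly spaced runs and a constant dissemination delay. The function names and the 2-hour delay are hypothetical, since the article only states a relative 2-3 hour advantage and no absolute latencies.

```python
def hours_between_runs(runs_per_day: int) -> float:
    """Spacing between consecutive model runs, assuming even spacing."""
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    return 24.0 / runs_per_day


def worst_case_forecast_age(runs_per_day: int, dissemination_delay_h: float) -> float:
    """Age of the newest available run just before the next one arrives.

    A run is already `dissemination_delay_h` old when it becomes available,
    and it stays the newest run for one full interval after that.
    """
    return hours_between_runs(runs_per_day) + dissemination_delay_h


# Hypothetical dissemination delay, used only for illustration.
ASSUMED_DELAY_H = 2.0

for runs in (2, 4, 24):
    interval = hours_between_runs(runs)
    age = worst_case_forecast_age(runs, ASSUMED_DELAY_H)
    print(f"{runs:>2} runs/day: new run every {interval:.0f} h, "
          f"worst-case age {age:.0f} h")
```

Under these assumptions, moving from 4 to 24 runs per day shortens the interval between runs from 6 hours to 1 hour, and any reduction in dissemination delay lowers the worst-case age by the same amount.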
The table below summarizes both accuracy metrics and operational specifications that together define production-ready model performance.
| Metric | Definition | Use Case | Source |
|---|---|---|---|
| RMSE | Root Mean Square Error | Deterministic skill | WeatherBench |
| CRPS | Continuous Ranked Probability Score | Ensemble skill | WeatherBench |
| Any-Δt | Arbitrary time step forecasting | Operational flexibility | EPT Architecture |
| Dissemination | Time from run to availability | Market advantage | Operational specs |
Top AI Models Ranked by Accuracy in 2026
With these measurement standards established, leading AI models can be compared directly with traditional NWP. WeatherBench evaluations show that EPT-2 maintains superior (lower) RMSE on 10m wind, 100m wind, 2m temperature, and surface solar radiation throughout the 0-240 hour forecast range.
The ensemble variant, EPT-2e, achieves this with remarkable efficiency, using only 10 members versus ECMWF ENS’s 50 while delivering superior RMSE and CRPS at virtually every lead time. Aurora and GraphCast trail significantly in these head-to-head comparisons.
Jua’s models also deliver superior spatial resolution at 5km over Europe compared to HRES’s 9km, while extending forecast horizons to 20-60 days. NVIDIA’s Earth-2 Medium Range enables high-accuracy predictions up to 15 days ahead across 70+ weather variables, and ECMWF AIFS demonstrates better accuracy than traditional physics-based models for large-scale patterns.
The following table quantifies EPT-2’s performance advantage across key variables and shows how it compares to Aurora, GraphCast, and the ECMWF HRES benchmark.
| Model | 10m Wind RMSE (12-240h) | 2m Temp CRPS | Source |
|---|---|---|---|
| EPT-2 | Beats HRES | Superior performance | |
| Aurora | Loses to EPT-2 | Trails across range | |
| GraphCast | Below EPT-2/Aurora | Limited ensemble | WeatherBench |
| ECMWF HRES | Benchmark reference | 40-year standard | ECMWF |
AI vs Traditional NWP on Cost, Speed, and Skill
AI weather models can cut the computational cost of a forecast by roughly four orders of magnitude compared to traditional numerical weather prediction. This efficiency gain comes from both speed and energy advantages.
NOAA’s AIGFS generates a 16-day forecast in approximately 40 minutes, faster than the operational GFS. EPT-2 inference runs at ~0.25 kWh versus NWP’s ~8,400 kWh per simulation, so AI models are both faster and dramatically less power-hungry per forecast.
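As a quick check on the energy comparison, the sketch below computes the ratio of the two per-run figures quoted above and expresses it in orders of magnitude. The constant and function names are illustrative, and the calculation covers energy per forecast run only, not training cost.

```python
import math

# Approximate per-run energy figures quoted in the article.
EPT2_INFERENCE_KWH = 0.25
NWP_SIMULATION_KWH = 8400.0


def energy_ratio(baseline_kwh: float, candidate_kwh: float) -> float:
    """How many times more energy the baseline uses than the candidate."""
    if baseline_kwh <= 0 or candidate_kwh <= 0:
        raise ValueError("energy values must be positive")
    return baseline_kwh / candidate_kwh


ratio = energy_ratio(NWP_SIMULATION_KWH, EPT2_INFERENCE_KWH)
orders = math.log10(ratio)
# prints: 33,600x less energy per run, about 4.5 orders of magnitude
print(f"{ratio:,.0f}x less energy per run, about {orders:.1f} orders of magnitude")
```

The quoted figures therefore imply a factor of about 33,600 per run, which sits between four and five orders of magnitude.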
ECMWF IFS maintains an accuracy advantage over many competitors, yet ECMWF AIFS can achieve competitive performance on upper-air variables while enabling rapid ensemble generation. These developments highlight how AI approaches now match or exceed traditional systems on both skill and efficiency.
The table below summarizes lead time performance across key variables, illustrating how EPT-2’s accuracy advantage persists from short range through longer horizons.
| Lead Time | 100m Wind RMSE | SSRD Performance | Source |
|---|---|---|---|
| 12h | EPT-2 > HRES | EPT-2 superior | |
| 3d | EPT-2 advantage | Consistent lead | |
| 5d | EPT-2 maintains | Extended horizon | |
| 10d | EPT-2 leads | Long-range skill | |
Operational Accuracy for Energy Trading Desks
The Jua Platform converts model accuracy into trading value through continuous, high-frequency updates. It runs 25+ models benchmarked live every 5 minutes, with EPT2-RR updating 24 times daily and power forecasts refreshing every 15 minutes. Customers, including Axpo, TotalEnergies, and Statkraft, achieve €1.5-3M/GW/year efficiency gains through superior forecast accuracy and faster dissemination.
The table below compares operational specs across leading systems so trading and grid teams can see how update frequency, resolution, and dissemination differ in practice.
| Model | Update Frequency | Resolution | Dissemination |
|---|---|---|---|
| EPT-2 | up to 24x/day | up to 5km | 2-3h faster |
| ECMWF HRES | 2-4x/day | 9km global | Standard |
| NOAA GFS | 4x/day | Standard | |
| Aurora | Research mode | 25km global | Variable |
Limitations and Extreme Weather Performance
Despite these operational advantages and accuracy gains in routine forecasting, AI weather models face significant challenges with extreme weather events.
ECMWF HRES consistently outperforms GraphCast, Pangu-Weather, and Fuxi in RMSE for record-breaking temperature and wind events across nearly all lead times. AI models systematically underpredict both frequency and intensity of record-breaking events, which results in low recall and high false negatives.
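The terms "low recall" and "high false negatives" can be illustrated with a simple event-detection tally. The sketch below is a simplification, not the method of the cited evaluation: it treats an extreme event as exceeding a fixed threshold rather than breaking a historical record, and the function name, threshold, and temperatures are all hypothetical.

```python
from typing import Dict, Sequence


def extreme_event_scores(
    forecasts: Sequence[float],
    observations: Sequence[float],
    threshold: float,
) -> Dict[str, float]:
    """Count hits, misses, and false alarms for threshold exceedance."""
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must have equal length")
    hits = misses = false_alarms = 0
    for forecast, observed in zip(forecasts, observations):
        predicted = forecast >= threshold
        occurred = observed >= threshold
        if occurred and predicted:
            hits += 1
        elif occurred:
            misses += 1  # false negative: event happened but was not forecast
        elif predicted:
            false_alarms += 1
    events = hits + misses
    recall = hits / events if events else float("nan")
    return {"hits": hits, "misses": misses,
            "false_alarms": false_alarms, "recall": recall}


# Hypothetical 2 m temperatures (deg C) with a 38.0 deg C extreme threshold.
observed = [39.1, 36.0, 40.2, 38.5, 35.0]
forecast = [37.2, 35.5, 38.4, 37.9, 38.1]
result = extreme_event_scores(forecast, observed, threshold=38.0)
print(result)
```

In this made-up sample the forecast catches only one of three observed extremes, giving a recall of one third. A model that systematically underpredicts intensity shifts forecasts below the threshold and produces exactly this pattern of misses.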
EPT’s spatiotemporal transformer architecture addresses these limitations through physics constraints. EPT learns conservation laws for mass, momentum, and energy directly from observational data, enabling superior performance on both routine and extreme conditions.
This physics-grounded approach contrasts with pure machine learning models. Rice University’s 2026 study found that Pangu-Weather and Aurora excel at predicting storm tracks but struggle to produce realistic physical structures in tropical cyclone windfields, which is precisely the kind of physical realism that EPT’s conservation law constraints are designed to preserve.
Teams can test EPT-2’s performance on extreme events in their own regions. Request access to run live comparisons against your current forecasting stack and see how physics constraints affect tail-risk scenarios.
Conclusion: What EPT-2 Means for Production Forecasting
EPT-2 establishes the 2026 benchmark for AI weather model accuracy, outperforming traditional NWP and competing AI systems across routine forecasting while addressing extreme weather limitations through physics-grounded architecture. The Jua Platform turns this technological advantage into operational value through 24x daily updates, ensemble forecasting, and agentic workflow automation.
For production energy trading, EPT-2’s combination of superior accuracy, operational frequency, and physics constraints delivers measurable economic value. Run benchmarks on your specific region and variables to experience the accuracy difference firsthand and quantify the impact on your portfolio.
FAQ
Does EPT-2 outperform Aurora on wind forecasting?
EPT-2 outperforms Aurora on both 10m and 100m wind speed across the full 0-240 hour forecast range. It also beats Aurora on 2m temperature up to approximately 130 hours. EPT-2 wins by default on surface solar radiation since Aurora produces no SSRD output. These results come from head-to-head WeatherBench evaluations against 14,000+ ground stations without post-processing.
How do AI weather models handle extreme weather events?
Most AI weather models struggle with extreme events and systematically underpredict both frequency and intensity of record-breaking conditions. Physics-grounded models like EPT perform better than pure machine learning approaches. EPT learns conservation laws for mass, momentum, and energy directly from observational data, which constrains outputs to physically realistic scenarios. This architecture supports stronger performance on both routine forecasts and extreme weather events that fall outside typical training distributions.
What is the Jua Platform’s update frequency compared to traditional NWP?
The Jua Platform updates up to 24 times per day through EPT2-RR, compared to traditional NWP’s 2-4 daily cycles. Power forecasts refresh every 15 minutes for actual generation data. EPT-2 Early provides 2-3 hour faster dissemination than competing operational runs at the same cycle. This frequency advantage enables traders to act on weather-driven market opportunities before competitors receive updated traditional forecasts.
What are the computational requirements for training EPT-2?
EPT-2 was trained on 8 H100 GPUs over 10 days, using 5+ petabytes of weather data from 120+ sources including over 100,000 weather stations. This represents significantly lower computational requirements than competitors such as Microsoft Aurora, which required 32 A100 GPUs over 18 days. For inference, EPT-2 runs on a single GPU in minutes at the ~0.25 kWh efficiency mentioned earlier, compared to traditional NWP’s HPC cluster requirements.
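The training figures above can be put on a rough common scale by converting them to GPU-days. The sketch below is a back-of-the-envelope comparison with an illustrative function name. It counts GPU-days only, and because an H100 delivers more throughput than an A100, the two totals are not a like-for-like measure of compute.

```python
def gpu_days(gpu_count: int, days: float) -> float:
    """Total GPU-days for a training run of `days` on `gpu_count` GPUs."""
    if gpu_count <= 0 or days <= 0:
        raise ValueError("gpu_count and days must be positive")
    return gpu_count * days


# Figures quoted in the article.
ept2 = gpu_days(8, 10)     # 8 H100 GPUs for 10 days
aurora = gpu_days(32, 18)  # 32 A100 GPUs for 18 days
ratio = aurora / ept2

# prints: EPT-2: 80 GPU-days, Aurora: 576 GPU-days, ratio 7.2x
print(f"EPT-2: {ept2} GPU-days, Aurora: {aurora} GPU-days, ratio {ratio:.1f}x")
```

On this count, the quoted Aurora run used about 7.2 times as many GPU-days as the quoted EPT-2 run, before accounting for the difference in GPU generation.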
How does EPT-2e ensemble forecasting compare to ECMWF ENS?
EPT-2e uses only 10 ensemble members compared to ECMWF ENS’s 50 members, yet beats the ENS mean on both RMSE and CRPS at virtually every lead time. EPT-2e extends forecast horizons to 60 days versus ENS’s 15-day range. The ensemble provides superior probabilistic skill for uncertainty quantification while requiring substantially fewer computational resources than traditional ensemble systems, which enables more frequent ensemble updates and larger ensemble sizes when needed.