Best NWP Model Accuracy 2026: EPT-2 Tops Every Lead Time

Olivier Lam·May 19, 2026

Best NWP Model Accuracy 2026: AI vs Traditional Forecasts

Written by: Olivier Lam, Physical AI Team, Jua.ai AG | Last updated: July 3, 2026

Why EPT-2 Sets the New Accuracy Standard

EPT-2, Jua’s physics foundation model, beats ECMWF HRES on every lead time and every energy-critical variable from 0–240 hours.
EPT-2e outperforms the 50-member ECMWF ENS mean on both RMSE and CRPS at virtually every lead time, so traders get stronger probabilistic signals.
Short-range (0–24 h) and medium-range (1–10 day) gains convert directly into lower imbalance costs and sharper trading decisions.
EPT-2 runs on a single GPU at about 0.25 kWh and updates up to 24 times per day, at roughly four orders of magnitude lower cost than traditional NWP.
You can see these accuracy and cost advantages on your own regions and variables in a live Jua for Energy session.

EPT-2’s Lead Across Every Variable and Horizon

The comparison below ranks EPT-2 against ECMWF HRES and Aurora on the four variables that drive energy P&L (10 m and 100 m wind share a column) across three forecast windows. EPT-2 holds the top rank in every cell. All figures are sourced from the peer-reviewed technical report arXiv:2507.09703 and the StationBench open-source evaluation framework. Ranks are by RMSE, where lower is better.

Lead-time window	10 m / 100 m wind	2 m temperature	Surface solar radiation (SSRD)
0–24 h	EPT-2 #1, outperforms ECMWF HRES and Aurora	EPT-2 #1, outperforms ECMWF HRES and Aurora	EPT-2 #1, ECMWF HRES #2, Aurora: no output
1–5 days	EPT-2 #1, ECMWF HRES #2, Aurora #3	EPT-2 #1, ECMWF HRES #2, Aurora #3	EPT-2 #1, ECMWF HRES #2, Aurora: no output
5–10 days	EPT-2 #1, ECMWF HRES #2, Aurora #3	EPT-2 #1, ECMWF HRES #2, Aurora #3 (up to ~130 h)	EPT-2 #1, ECMWF HRES #2, Aurora: no output

EPT-2 occupies the top position in every cell. Aurora produces no SSRD output at any lead time, so it drops out of the solar radiation comparison entirely. GraphCast and ECMWF AIFS are also available on the Jua for Energy platform as comparison models, and their rankings trail EPT-2 and ECMWF HRES on wind and temperature across the full range documented in arXiv:2507.09703.
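The ranking logic behind a table like this can be sketched in a few lines. The snippet below is a minimal illustration and not StationBench itself: the model names are placeholders, the numbers are made up, and `rmse` and `rank_models` are hypothetical helpers written for this example.

```python
import math

def rmse(forecasts, observations):
    """Root-mean-square error between paired forecast and observed values."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecasts, observations)]
    return math.sqrt(sum(squared) / len(squared))

def rank_models(forecasts_by_model, observations):
    """Return (model, rmse) pairs sorted best-first; models with no output are skipped."""
    scores = {
        name: rmse(values, observations)
        for name, values in forecasts_by_model.items()
        if values  # a model that produces no output for this variable drops out
    }
    return sorted(scores.items(), key=lambda item: item[1])

# Illustrative numbers only (10 m wind speed in m/s at four stations).
observed = [5.0, 7.0, 6.0, 8.0]
forecasts = {
    "model_a": [5.5, 6.5, 6.0, 8.5],
    "model_b": [6.0, 8.0, 5.0, 9.0],
    "model_c": [],  # no output for this variable
}
ranking = rank_models(forecasts, observed)
```

Here `ranking` lists `model_a` ahead of `model_b`, and `model_c` is absent, which mirrors how a model with no SSRD output drops out of the solar column.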

Short-Range Convective Performance for Intraday Trading

Short-range accuracy in the 0–24 hour window drives intraday power market outcomes. Wind ramps, convective cells, and solar dips inside this window move imbalance costs before a traditional NWP run can flag them. EPT-2 leads on 10 m and 100 m wind RMSE at every sub-24-hour lead time evaluated in arXiv:2507.09703. EPT-2 HRRR, the high-resolution rapid-refresh variant, delivers forecasts at up to 5 km native resolution over Europe and updates up to 24 times per day, compared with the two to four daily runs that define traditional NWP cadence.

For North American coverage, EPT-2 RR provides the same hourly update cadence at continental scale. This frequent refresh rate is only practical because of EPT-2’s inference cost structure. A single EPT-2 simulation runs on a single GPU in minutes at approximately 0.25 kWh and $0.20–$15, compared to about 8,400 kWh and €1,000–€20,000 for an equivalent traditional NWP run on HPC infrastructure. The cost gap of roughly four orders of magnitude is what makes up to 24 daily updates economically rational.
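The size of that gap can be checked directly from the figures quoted above. This is a back-of-the-envelope sketch: treating euros and dollars at parity is a simplifying assumption made only for this example, and `orders_of_magnitude` is a hypothetical helper.

```python
import math

def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the ratio larger / smaller."""
    return math.log10(larger / smaller)

# Energy per run, as quoted in the text (kWh).
NWP_KWH = 8_400
EPT2_KWH = 0.25

energy_ratio = NWP_KWH / EPT2_KWH
energy_orders = orders_of_magnitude(NWP_KWH, EPT2_KWH)

# Cost ranges as quoted; EUR and USD are treated at parity (assumption).
NWP_COST = (1_000, 20_000)
EPT2_COST = (0.20, 15)
narrowest_gap = NWP_COST[0] / EPT2_COST[1]  # cheapest NWP run vs. priciest EPT-2 run
widest_gap = NWP_COST[1] / EPT2_COST[0]     # priciest NWP run vs. cheapest EPT-2 run
```

On energy the ratio works out to 33,600, or about 4.5 orders of magnitude. On the quoted cost ranges the gap depends on which ends are compared, from roughly 67 times at the narrow end to roughly 100,000 times at the wide end.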

Medium-Range Global Skill for Day-Ahead and Multi-Day Positions

The 1–10 day window anchors day-ahead and multi-day energy positioning. arXiv:2507.09703 documents EPT-2 outperforming ECMWF HRES, Microsoft Aurora, Google DeepMind GraphCast, and ECMWF AIFS at every lead time from 1 to 10 days for which those models produce output. Against HRES specifically, EPT-2 leads on all four energy-critical variables (10 m wind, 100 m wind, 2 m temperature, and surface solar radiation) at every lead time in this window.

Architecture explains part of this performance gap in a way that matters to practitioners. EPT-2 is trained to forecast directly at arbitrary lead times (native any-Δt) instead of rolling forward in fixed 6-hour increments. Aurora and most AI peers use a fixed 6-hour step and roll forward iteratively, feeding each output back in as the next input, which compounds error at longer lead times. Because EPT-2 does not roll forward, the compounding error that degrades Aurora’s medium-range skill is absent from EPT-2’s design.
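A toy calculation shows why the number of chained steps matters. This is not the EPT-2 or Aurora architecture: the per-step growth factor is an assumed number chosen only to show the shape of the effect, and both functions are hypothetical helpers.

```python
import math

def rollout_steps(lead_time_h, step_h=6):
    """Number of sequential model calls an autoregressive rollout needs."""
    return math.ceil(lead_time_h / step_h)

def compounded_error(initial_error, growth_per_step, steps):
    """Toy model: each rollout step multiplies the inherited error by a fixed factor."""
    return initial_error * growth_per_step ** steps

# A fixed 6-hour rollout needs 40 chained calls to reach 240 h;
# a model that predicts any lead time directly needs one.
steps_to_day_10 = rollout_steps(240)
direct_calls = 1

# Assumed 2% error growth per step, purely illustrative.
error_after_rollout = compounded_error(
    initial_error=1.0, growth_per_step=1.02, steps=steps_to_day_10
)
```

Under that assumed 2% growth per step, the error more than doubles over 40 steps. A direct forecast still has its own error at 240 h; the sketch only shows that it has no chain of intermediate steps through which error can compound.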

EPT-2 inference also runs about 25% faster than Aurora. EPT-2 was trained on 8 × H100 GPUs over 10 days, while Aurora used 32 × A100 GPUs over 18 days, so EPT-2 needed a quarter of the GPUs and a substantially shorter training cycle.
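The training figures can be combined into a single GPU-days number. The sketch below only multiplies the counts quoted above; H100 and A100 are different accelerators, so the result is a rough indication and not a like-for-like compute comparison.

```python
def gpu_days(gpu_count, days):
    """Total accelerator-days for a training run."""
    return gpu_count * days

ept2_gpu_days = gpu_days(8, 10)
aurora_gpu_days = gpu_days(32, 18)

gpu_count_ratio = 32 / 8
gpu_days_ratio = aurora_gpu_days / ept2_gpu_days
```

That gives 80 GPU-days for EPT-2 against 576 for Aurora, a ratio of 7.2 on top of the fourfold difference in GPU count.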

Ensemble Skill for Probabilistic Energy Forecasting

Probabilistic forecasting is now standard for risk-aware energy trading because it quantifies uncertainty instead of returning a single trace. EPT-2e, the ensemble variant of EPT-2, beats the 50-member ECMWF ENS mean on both RMSE and CRPS at virtually every lead time, as documented in arXiv:2507.09703. EPT-2e updates four times per day.
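For readers less familiar with CRPS, the sketch below computes it for one forecast using the standard empirical ensemble estimator, alongside the absolute error of the ensemble mean. The member values are made up, and this is not the evaluation code behind the report.

```python
def ensemble_crps(members, observation):
    """Empirical CRPS for one forecast: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must have at least one member")
    # Average distance of the members from the observation.
    accuracy = sum(abs(x - observation) for x in members) / m
    # Average distance between all pairs of members (ensemble spread).
    spread = sum(abs(a - b) for a in members for b in members) / (m * m)
    return accuracy - 0.5 * spread

def ensemble_mean(members):
    """Arithmetic mean of the ensemble members."""
    return sum(members) / len(members)

# Illustrative three-member ensemble and a single observation.
members = [4.0, 5.0, 6.0]
observed = 5.0
crps = ensemble_crps(members, observed)
mean_abs_error = abs(ensemble_mean(members) - observed)
```

Lower CRPS is better. With a single member the estimator reduces to plain absolute error, which is why CRPS is often read as a probabilistic generalisation of it. In this example the ensemble mean hits the observation exactly, yet CRPS is still positive because the members are spread around it.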

Aurora, GraphCast, and ECMWF AIFS have no productised ensemble equivalent available for operational energy trading. They ship as research outputs consumed as raw deterministic files. A quant team that wants ensemble depth from those models must build the ensemble logic themselves. EPT-2e arrives as a productised ensemble on the Jua for Energy platform, accessible via the Python SDK with pip install jua or through the REST API with Apache Arrow payload support.

Regional Accuracy for European and North American Markets

The StationBench evaluation framework benchmarks against more than 10,000 real ground stations globally, with no post-processing or station fine-tuning. Regional breakdowns from this methodology confirm EPT-2’s lead on wind and temperature across Europe and North America, the two geographies that account for most liquid power and gas trading.

For European wind markets such as Germany, Great Britain, France, the Netherlands, and Belgium, EPT-2 HRRR provides up to 5 km native resolution coverage with up to 24 daily updates. Wind at 11 height levels from 10 m to 200 m is available, which covers the full range of onshore and offshore turbine hub heights. For Nordic hydro and wind markets, EPT-2’s global deterministic output and the EPT-2e ensemble cover the full Scandinavian geography at the same lead-time accuracy documented in arXiv:2507.09703. Jua serves major utilities across four continents, including some of Europe’s largest energy companies, as well as commodity traders and hedge funds.

Inference Cost and Update Frequency for Traders

Traditional NWP economics set a hard ceiling on forecast frequency. A single ECMWF HRES simulation consumes approximately 8,400 kWh and costs €1,000–€20,000 to run on HPC infrastructure, taking one to two hours per cycle. These economics force a strict tradeoff: ECMWF runs its full forecast cycle twice a day, with supplementary runs bringing the total to roughly four global forecasts per 24 hours. Between those four runs, every number on every screen is stale.

EPT-2 inference runs on a single GPU in minutes at approximately 0.25 kWh and $0.20–$15 per simulation, a cost structure roughly four orders of magnitude lower than traditional NWP. EPT-2 RR updates up to 24 times per day, and actual-generation power forecasts on the Jua for Energy platform refresh every 15 minutes. A trader who runs Jua for Energy alongside an existing ECMWF subscription sees the next forecast hours before the next traditional run lands. EPT-2 delivers hourly global weather updates and outperforms leading AI weather models and traditional numerical baselines across all forecast horizons on RMSE.
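The cadence figures translate into gaps between updates as follows. The sketch assumes runs are evenly spaced across the day, which is a simplification.

```python
def hours_between_runs(runs_per_day):
    """Average gap between consecutive forecast runs, in hours."""
    return 24 / runs_per_day

traditional_gap_h = hours_between_runs(4)   # roughly four global runs per day
hourly_gap_h = hours_between_runs(24)       # up to 24 updates per day
power_refreshes_per_day = 24 * 60 // 15     # one refresh every 15 minutes
```

Four runs a day leave a six-hour gap between forecasts, 24 updates a day cut that to one hour, and a 15-minute cadence gives 96 power-forecast refreshes per day.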

Run a live benchmark on your region and variable in under five minutes.

How Jua for Energy Turns Accuracy into P&L

Jua is a foundation model and agent company building horizontal AI infrastructure. Jua for Energy is the first vertical application of this broader platform, similar to how Anthropic positions Claude Code. EPT and Athena are domain-agnostic by architecture, so the atmosphere is the first physical system EPT has been fine-tuned for, and energy trading is the first market Athena has been instrumented for.

Inside Jua for Energy, EPT-2’s accuracy advantage becomes usable through four product surfaces. The live benchmarking surface puts more than 25 models, including 10 proprietary EPT-family models and 15 third-party NWP and AI models such as ECMWF HRES, ECMWF ENS, ECMWF AIFS, NOAA GFS, Aurora, and GraphCast, on a single platform. Any region, any variable, any time window can be compared head-to-head in seconds. Athena, Jua’s agentic intelligence layer, turns raw physics predictions from EPT-2 into actionable analysis by reading market context. Briefings, benchmarks, backtests, and custom widgets resolve in about 90 seconds.

Day-Ahead and Intraday briefings auto-refresh on every new model run, which replaces the manual 7–9 a.m. GRIB-file routine. Divergence alerts fire the moment two models disagree, and correction alerts fire the moment a model revises its own output. The trade window opens with a notification instead of a missed move.
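The idea behind the two alert types can be sketched with a simple threshold check. This is an illustration of the concept only, not how the platform implements its alerts: the function names, the threshold, and the data are all hypothetical.

```python
def divergence_alert(forecast_a, forecast_b, threshold):
    """Return the valid hours at which two forecasts differ by more than threshold."""
    shared = sorted(set(forecast_a) & set(forecast_b))
    return [
        hour for hour in shared
        if abs(forecast_a[hour] - forecast_b[hour]) > threshold
    ]

def correction_alert(previous_run, latest_run, threshold):
    """Same check, applied to two consecutive runs of one model."""
    return divergence_alert(previous_run, latest_run, threshold)

# Hypothetical 100 m wind speeds (m/s), keyed by valid hour.
model_a = {6: 8.0, 12: 9.5, 18: 11.0}
model_b = {6: 8.2, 12: 7.0, 18: 10.6}
diverging_hours = divergence_alert(model_a, model_b, threshold=2.0)
```

With a 2 m/s threshold only hour 12 is flagged, where the two models differ by 2.5 m/s. In practice the threshold would depend on the variable and on how much a given deviation moves the position.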

A 1 GW wind portfolio that gains four percentage points of forecast accuracy saves about €1.5 million per year under typical hedging and imbalance structures. A 1 GW solar portfolio at the same accuracy gain saves about €3 million per year. Jua’s forecasts carry an estimated $1.5 million P&L impact per gigawatt annually in European energy markets. Customers operating multi-GW portfolios such as Axpo, TotalEnergies, Statkraft, EnBW, EDF, and Hydro-Québec scale these economics roughly linearly.
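The linear scaling can be written out explicitly. The per-gigawatt figures are the euro values quoted above for a four-percentage-point accuracy gain; the portfolio mix is hypothetical, and strict linearity is the simplifying assumption the text itself makes.

```python
# Annual savings per GW quoted in the text for a four-percentage-point accuracy gain.
SAVINGS_EUR_PER_GW = {"wind": 1_500_000, "solar": 3_000_000}

def estimated_annual_savings(capacity_gw):
    """Scale the quoted per-GW figures linearly across a mixed portfolio."""
    return sum(SAVINGS_EUR_PER_GW[tech] * gw for tech, gw in capacity_gw.items())

# Hypothetical portfolio: 3 GW of wind and 1.5 GW of solar.
portfolio_savings = estimated_annual_savings({"wind": 3.0, "solar": 1.5})
```

For this example portfolio the estimate comes to €9 million per year, split evenly between the wind and solar books.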

Live Benchmark: See EPT-2 Against Your Current Model

The live benchmark moment usually triggers the buying decision for Jua for Energy customers. A meteorologist or quant developer selects a high-stakes region and variable, picks the current provider alongside EPT-2, and the platform returns a head-to-head accuracy comparison on the spot. The objection shifts from “is this real?” to “how fast can we procure?” because the numbers speak clearly. You can run the comparison yourself on the Jua for Energy platform at athena.jua.ai, or schedule a guided walkthrough with the Jua team.

Frequently Asked Questions

Is EPT-2 verified against real observations, or only against model analyses?

EPT-2 is benchmarked against more than 10,000 real ground stations using StationBench, Jua’s open-source evaluation framework, with no post-processing or station fine-tuning applied. This approach differs from model-to-model comparisons that use reanalysis fields as ground truth, a methodology that can flatter models trained on the same reanalysis data. The results are published in the peer-reviewed technical report arXiv:2507.09703. Under this station-based methodology, EPT-2 outperforms ECMWF HRES on 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation at every lead time from 0 to 240 hours.

How does EPT-2e compare to ECMWF ENS for probabilistic energy forecasting?

As documented above, EPT-2e outperforms the 50-member ECMWF ENS mean on both the deterministic metric (RMSE) and the probabilistic metric (CRPS) at virtually every lead time. EPT-2e updates four times per day and is available as a productised ensemble on the Jua for Energy platform via the Python SDK and REST API. No AI weather peer, including Aurora, GraphCast, or ECMWF AIFS, ships a productised ensemble equivalent for operational energy trading. For traders who require probabilistic outputs to size positions and manage imbalance risk, EPT-2e is the only AI ensemble that has been verified to exceed the ECMWF ENS mean across these lead times.

Can quant developers access EPT-2 hindcasts for backtesting systematic strategies?

Yes. The Jua for Energy platform exposes hindcast data across multiple Jua and third-party models through the REST API and the Python SDK, installed via pip install jua. The API uses Apache Arrow for large-payload queries, which handles the data volumes required for continental, multi-variable, multi-model backtests without choking. Athena can run a backtest in about five minutes from a natural-language query, and the SDK provides programmatic access for teams that prefer to build their own pipeline. The more than 25 models on the platform, including ECMWF HRES, ECMWF ENS, Aurora, GraphCast, and the full EPT family, share a unified schema, so switching or comparing models does not require re-engineering the ingestion pipeline.

Does Jua for Energy replace an existing ECMWF subscription?

No. Jua for Energy runs alongside the incumbent feed. ECMWF AIFS, ECMWF’s own AI model, runs natively on the Jua for Energy platform. Jua for Energy replaces the plumbing around the ECMWF subscription instead of the subscription itself. The in-house GRIB pipeline, the manual benchmarking, the morning-briefing analyst, and the dashboard stitching collapse into a single workspace, refreshed up to 24 times a day, where every model sits on the same screen with one schema and one API. Serious customers keep their ECMWF subscription and use Jua for Energy to replace everything built around it.

Developer Access and Forward Outlook

Quant developers and engineering teams can access all 25+ models on the Jua for Energy platform, including EPT-2, EPT-2e, EPT-2 RR, EPT-2 HRRR, ECMWF HRES, Aurora, and GraphCast, through a single REST API schema with Apache Arrow support. Hindcast data is available for backtesting. The integration that takes a quarter to build from raw AI-weather research subscriptions stands up in days on Jua for Energy.

Teams can pipe Jua forecasts into their own models by running pip install jua, or by reading the API documentation at docs.jua.ai. For a guided walkthrough of the benchmarking surface and EPT-2 accuracy results on your region and variable, see EPT-2 on your highest-stakes forecasts.

Energy is the first market Jua has entered, but the EPT architecture is domain-agnostic. The same foundation model that learns atmospheric dynamics already predicts plasma behaviour inside a tokamak. The roadmap extends to plasma fusion, aerospace, materials, fluids, and beyond, each shipped as a new vertical product on the same horizontal platform. Customers buying Jua for Energy today are buying the first surface of a foundation-model and general-agent platform that will expand outward from there. The physical economy is larger than the digital economy, and LLMs cannot touch it because physics is not language. Jua is building the models and the agent that can.
