Best AI Weather Models 2026: EPT-2 Beats ECMWF & GraphCast

Written by: Olivier Lam, Physical AI Team, Jua.ai AG

Key Takeaways for Energy Traders

  • Jua’s EPT-2 beats ECMWF HRES, Microsoft Aurora, and DeepMind GraphCast across all lead times on energy-critical variables like wind and temperature.

  • EPT-2 delivers four global forecast cycles per day at 0.25 kWh per run, compared with traditional models that require about 8,400 kWh.

  • Physics foundation models like EPT learn conservation laws from observations, which prevents forecasts that break basic physical rules.

  • The Athena agent turns EPT-2 forecasts into natural-language briefings, cutting trader prep from hours to minutes.

  • Upgrade your energy trading forecasts with Jua’s EPT-2 and Athena and see how your current provider compares in a live benchmark.

Top AI Weather Models 2026 for Energy Use

Benchmarks against more than 10,000 ground stations show that EPT-2 is the only model that beats ECMWF HRES across all energy-relevant variables while also providing four daily updates. This combination of accuracy and refresh frequency is unique in the current landscape. The table below compares leading AI weather models on the metrics that matter most for energy trading.

Model              | RMSE vs HRES (wind/temp/SSRD) | Ensemble CRPS          | Resolution / Update Frequency | Forecast Horizon
-------------------|-------------------------------|------------------------|-------------------------------|-----------------
Jua EPT-2          | Beats HRES all lead times     | EPT-2e beats ENS       | high-res / 4x daily           | 20 days
Microsoft Aurora   | Loses on wind/temp            | No ensemble            | ~25 km / 4x daily             | 10 days
DeepMind GraphCast | Trails across range           | No ensemble            | 0.25° / 4x daily              | 10 days
NOAA AIGFS         | Improved over GFS             | AIGEFS +18-24h vs GEFS | 0.25° lat-lon / 4x daily      | 16 days

EPT-2 leads current AI weather models in deterministic accuracy, and EPT-2e extends that lead in ensemble forecasting. EPT-2e uses 30 members that beat the 50-member ECMWF ENS mean on both RMSE and CRPS metrics, based on comprehensive benchmarking against ground truth observations.
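
For readers less familiar with the two ensemble metrics named above, here is a minimal sketch of how they are commonly computed. It uses the standard empirical CRPS estimator and the RMSE of the ensemble mean. The function names and the toy numbers are illustrative assumptions, not Jua's benchmarking code.

```python
import math
from itertools import product


def ensemble_crps(members, observation):
    """Empirical CRPS for one observation: E|X - y| - 0.5 * E|X - X'|.

    Lower is better. With a single member it reduces to absolute error.
    """
    n = len(members)
    term_obs = sum(abs(x - observation) for x in members) / n
    term_spread = sum(abs(a - b) for a, b in product(members, repeat=2)) / (n * n)
    return term_obs - 0.5 * term_spread


def ensemble_mean_rmse(forecasts, observations):
    """RMSE of the ensemble mean; forecasts is a list of member lists, one per case."""
    squared_errors = [
        (sum(members) / len(members) - obs) ** 2
        for members, obs in zip(forecasts, observations)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy example: three hypothetical members of a 100 m wind forecast (m/s).
print(ensemble_crps([1.0, 2.0, 3.0], 2.0))
print(ensemble_mean_rmse([[1.0, 3.0], [2.0, 4.0]], [2.0, 5.0]))
```

CRPS rewards ensembles that are both close to the observation and appropriately spread, which is why it is reported alongside the RMSE of the ensemble mean.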

EPT-2 is also the only model purpose-built for energy trading workflows. It supports wind, solar, and load forecasting across Germany, Great Britain, France, the Netherlands, and Belgium. Aurora and GraphCast provide raw meteorological outputs without trading-specific features, while NOAA AIGFS focuses on cyclone tracking rather than power markets.

#1: Jua EPT-2 as Global State-of-the-Art

The Earth Physics Transformer (EPT) family sets a new standard for physics foundation models. Rather than treating the atmosphere as a sequence of tokens, EPT learns the governing physics of continuous systems directly from observational data. Benchmarks show that EPT-2 maintains its accuracy advantage across all variables and lead times that matter for energy trading.

EPT-2 runs native any-Δt forecasting, which means it predicts at arbitrary time intervals instead of rolling forward in fixed 6-hour steps like Aurora and most competitors. Competitors must chain together many 6-hour predictions to reach longer horizons, so small errors compound with every step. EPT-2 jumps directly to any target time and avoids this compounding, which explains its accuracy edge at short lead times that drive intraday trading.
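
The step-chaining argument can be made concrete with a toy calculation. The sketch below counts how many chained calls a fixed 6-hour model needs to reach a given lead time. It then applies a deliberately simple assumption, that per-step errors are independent and equal in size, so they add in quadrature. Real forecast error growth is more complex, and a direct-jump model also has lead-dependent error, so treat this as an illustration of the compounding idea only. All names are hypothetical.

```python
import math


def rollout_steps(lead_hours, step_hours=6):
    """Number of chained model calls a fixed-step model needs to reach lead_hours."""
    return math.ceil(lead_hours / step_hours)


def accumulated_error_std(n_steps, per_step_std):
    """Toy assumption: independent, equal-variance step errors add in quadrature."""
    return per_step_std * math.sqrt(n_steps)


for lead in (6, 24, 48, 240):
    steps = rollout_steps(lead)
    growth = accumulated_error_std(steps, per_step_std=1.0)
    print(f"lead {lead:>3} h: {steps:>2} chained steps, relative error growth {growth:.2f}")
```

Under this toy assumption, a 48-hour forecast built from 6-hour steps chains 8 calls, while a model that predicts the target time directly makes one.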

The Athena agent turns EPT-2’s raw forecasting power into trading intelligence. Traders ask natural-language questions such as “what is the wind forecast spread across models for northern Germany tonight?” and receive a comprehensive briefing in about 90 seconds. This pairing of foundation model accuracy with agent-driven workflow support has made Jua for Energy the platform of choice for utilities like Axpo and TotalEnergies.

Jua operates as a foundation model and agent company, with Jua for Energy as its first applied product. The setup mirrors the relationship between Anthropic and Claude Code, where a horizontal AI platform supports flagship vertical applications. Request access to test EPT-2’s accuracy on your own regions and variables through live comparisons.

Runners-Up: Aurora, GraphCast, NOAA AIGFS, NVIDIA Earth-2

EPT-2 leads in accuracy and operational readiness, and understanding where competing models fall short highlights the value of a physics foundation model. Microsoft Aurora delivers strong speed gains over traditional NWP but omits surface solar radiation output, which limits its use for solar forecasting. Aurora more closely matches ERA5 intensity distributions than Pangu-Weather for tropical cyclones, so it suits hurricane tracking but not full energy portfolios.

DeepMind GraphCast uses graph neural networks to capture spatial dependencies but still runs on fixed 6-hour time steps that compound forecast errors. GraphCast outperforms ECMWF’s HRES on 90% of 1,380 verification targets, which makes it a strong research model. It still lacks ensemble capabilities and an operational refresh schedule tuned for systematic energy trading.

NOAA’s AIGEFS extends forecast skill by 18-24 hours over traditional GEFS while using only 9% of the computing resources. Its 0.25-degree lat-lon grid and four daily updates provide solid global coverage. EPT-2 still offers higher effective resolution and a workflow tailored to energy trading.

NVIDIA Earth-2 delivers high-resolution forecasts down to 2 km and achieves a 90% reduction in compute time at 2.5 km resolution compared with classic NWP on CPU clusters, according to the Israel Meteorological Service. It remains a modeling platform rather than a trading solution and requires separate tools for briefings, benchmarking, and workflow integration.

AI Weather Models in Daily Energy Trading

Energy traders face a persistent workflow gap between forecast delivery and trading decisions. Traditional NWP models update only two to four times per day, which leaves traders with stale data during volatile periods. The typical morning routine of downloading GRIB files, running in-house pipelines, and waiting for meteorologist briefings can consume two to three hours while markets move.

EPT-2’s frequent updates, combined with Athena’s natural-language briefings, compress this entire workflow into a single workspace. Power forecasts refresh every 15 minutes for actual generation, and divergence alerts flag moments when models disagree. Traders see potential opportunities before competitors react. DeepMind’s machine learning algorithms boosted the value of Google’s 700 megawatts of wind power capacity in the central United States by about 20% through better bidding accuracy, which illustrates the financial impact of improved forecasting.
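
A divergence alert of the kind described above can be sketched as a simple spread check across models. This is not Jua's implementation. The max-minus-min spread metric, the threshold, the function name, and the sample values are all assumptions chosen for illustration.

```python
def divergence_alerts(forecasts_by_model, threshold):
    """Return (time_index, spread) pairs where models disagree by more than threshold.

    forecasts_by_model maps a model name to a list of forecast values, one per
    time step. Spread is the max-minus-min range across models at each step.
    """
    series = list(forecasts_by_model.values())
    n_steps = min(len(s) for s in series)
    alerts = []
    for t in range(n_steps):
        values = [s[t] for s in series]
        spread = max(values) - min(values)
        if spread > threshold:
            alerts.append((t, spread))
    return alerts


# Hypothetical wind speed forecasts (m/s) from two models over three time steps.
sample = {"model_a": [5.0, 6.0, 7.0], "model_b": [5.5, 9.0, 7.2]}
print(divergence_alerts(sample, threshold=2.0))
```

A production system would likely use a more robust statistic than the raw range, such as a standard deviation across models or a threshold scaled by recent forecast error.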

Why Physics Foundation Models Like EPT Win

Physics differs from language in ways that matter for forecasting accuracy. Large language models work on discrete, symbolic tokens, so they can produce sequences that sound plausible but break physical rules. A token-based model might output temperature patterns that read well yet violate thermodynamics.

Atmospheric systems behave as continuous, multi-scale fields governed by conservation laws for mass, momentum, and energy. These laws restrict what can actually happen in the real world, which makes token-based approaches a poor fit for high-stakes forecasting. EPT learns these conservation laws directly from more than 5 petabytes of observational data and keeps outputs within physical bounds.

EPT’s physics-informed design avoids the kind of hallucinations that unconstrained AI models can produce. Its accuracy is validated against ground truth, and its forecasts respect atmospheric dynamics. This design helps explain why AI models slightly outperform traditional models, since they combine neural network speed with the physical consistency of numerical weather prediction.

Conclusion: Turning Forecast Accuracy into P&L

The 2026 benchmark results confirm EPT-2’s position as the global state-of-the-art in atmospheric prediction. It delivers leading accuracy across energy-relevant variables and lead times, and Athena extends that edge into day-to-day trading workflows.

Energy traders, meteorologists, and quant developers who want forecast accuracy that feeds directly into P&L now have a clear option. EPT-2’s physics-informed architecture and frequent updates provide a durable edge in volatile power markets. Start your evaluation and run live benchmarks on your own regions and variables.

Frequently Asked Questions

What makes EPT-2 different from other AI weather models?

EPT-2 is a physics foundation model that learns conservation laws directly from observational data, which avoids physically impossible outputs that language-style models can produce. It also supports native any-Δt forecasting, so it predicts at arbitrary time intervals instead of rolling forward in fixed steps like Aurora and GraphCast. This design reduces compounding errors and improves accuracy at the horizons traders care about. Frequent updates then keep those accurate forecasts aligned with fast-moving markets.

How does EPT-2 compare to ECMWF HRES in practical terms?

EPT-2 outperforms ECMWF HRES across lead times from 0 to 240 hours on key energy variables such as 10-meter wind, 100-meter wind, 2-meter temperature, and surface solar radiation. ECMWF HRES consumes about 8,400 kWh per simulation and costs roughly €1,000 to €20,000 per run. EPT-2 completes forecasts at about 0.25 kWh and $0.20 to $15 per simulation. This efficiency supports more frequent updates than HRES, which gives traders fresher information during the trading day.
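
The energy figures quoted above imply a large ratio, which a few lines of arithmetic make explicit. The per-run numbers and the four daily cycles come from this article. The variable names are illustrative. Costs are left out because the two quoted ranges use different currencies.

```python
# Per-run energy figures as quoted in this article.
HRES_KWH_PER_RUN = 8400.0
EPT2_KWH_PER_RUN = 0.25
EPT2_CYCLES_PER_DAY = 4

# How many EPT-2 runs fit into the energy budget of one HRES run.
energy_ratio = HRES_KWH_PER_RUN / EPT2_KWH_PER_RUN

# Daily EPT-2 energy use at four global forecast cycles per day.
ept2_kwh_per_day = EPT2_KWH_PER_RUN * EPT2_CYCLES_PER_DAY

print(f"One HRES run uses the energy of {energy_ratio:,.0f} EPT-2 runs")
print(f"Four daily EPT-2 cycles use {ept2_kwh_per_day} kWh")
```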

Can AI weather models be trusted for energy trading decisions?

AI weather models can support trading decisions when they embed physics constraints and undergo rigorous validation. EPT-2 trains on observational physics and learns the conservation laws that govern atmospheric behavior, which prevents the hallucinations seen in unconstrained AI systems. Its accuracy is validated against more than 10,000 ground stations without post-processing, and results appear in peer-reviewed technical reports. This physics-informed approach combines reliability with the speed and efficiency of modern AI.
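
Station-based validation of this kind usually comes down to a root-mean-square error between raw forecasts and observations. The sketch below is a minimal version that skips missing station readings. The function name, the data layout, and the sample values are assumptions, not the benchmark code behind the results above.

```python
import math


def station_rmse(forecasts, observations):
    """RMSE between forecasts and station observations, skipping missing readings.

    Both inputs are equal-length lists. A missing observation is given as None.
    """
    pairs = [(f, o) for f, o in zip(forecasts, observations) if o is not None]
    if not pairs:
        raise ValueError("no valid forecast/observation pairs")
    return math.sqrt(sum((f - o) ** 2 for f, o in pairs) / len(pairs))


# Hypothetical 2 m temperature values (degrees C) at three stations, one not reporting.
observed = [1.0, 4.0, None]
print(station_rmse([1.0, 2.0, 3.0], observed))
print(station_rmse([1.0, 3.0, 3.0], observed))
```

Comparing two models then means computing the same score for each on identical stations and times, with the lower RMSE indicating the closer match to observations.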

What specific benefits do energy traders see from better weather forecasting?

Small gains in forecast accuracy can translate into large financial savings. A 4 percentage point improvement can save about €1.5 million per year for each GW of wind capacity and €3 million per GW of solar capacity through better hedging and lower imbalance penalties. Traders using EPT-2 gain frequent updates that fill the gaps between traditional model runs, divergence alerts that highlight opportunities when models disagree, and natural-language briefings that shrink the morning prep window from hours to minutes. Together, these benefits support faster decisions and stronger positioning ahead of market moves.
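
To see what the per-GW figures mean for a portfolio, the quoted savings can be scaled by installed capacity. The sketch below applies only to the 4 percentage point improvement cited above and does not extrapolate to other improvement sizes. The portfolio in the example and the function name are hypothetical.

```python
# Annual savings per GW at a 4 percentage point accuracy improvement, as quoted above.
WIND_SAVINGS_EUR_PER_GW = 1.5e6
SOLAR_SAVINGS_EUR_PER_GW = 3.0e6


def annual_savings_eur(wind_gw, solar_gw):
    """Estimated yearly savings for a portfolio, scaling the quoted per-GW figures."""
    return wind_gw * WIND_SAVINGS_EUR_PER_GW + solar_gw * SOLAR_SAVINGS_EUR_PER_GW


# Hypothetical portfolio: 2 GW of wind and 1 GW of solar.
print(f"EUR {annual_savings_eur(2.0, 1.0):,.0f} per year")
```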

How does the Athena agent enhance weather forecasting for energy applications?

Athena converts raw forecast data into trading-ready insights through natural-language interaction. Traders ask questions such as “what is the wind forecast spread across models for northern Germany tonight?” and receive a structured briefing in about 90 seconds. Athena also builds custom backtests, assembles personalized dashboards, and sends automatic morning briefings that summarize model consensus, disagreements, and price implications. This automation removes manual data processing and lets traders focus on risk, strategy, and execution.

Want to talk to the team behind the writing?

Book a demo to see EPT-2 and Athena in production, or read the open papers behind the work.