Written by: Olivier Lam, Physical AI Team, Jua.ai AG
Key Takeaways for Energy Traders
- A production-grade AI weather forecast API built on a physics foundation model learns atmospheric conservation laws directly from data and delivers deterministic, ensemble, and hindcast outputs through a single interface.
- Traditional NWP systems are limited to 2–4 daily runs due to extreme compute costs, which leaves traders with stale data between cycles and forces complex in-house pipelines.
- EPT-2 outperforms ECMWF HRES on every lead time for wind, temperature, and solar radiation variables, while its ensemble variant beats the 50-member ECMWF ENS mean on RMSE and CRPS.
- EPT-2 runs at ~0.25 kWh per simulation, roughly four orders of magnitude cheaper than NWP, which enables up to 24 daily updates and rapid ensemble generation for probabilistic trading.
- Energy traders can book a demo with Jua to run live head-to-head benchmarks against their current provider in under five minutes.
Why Traditional Weather APIs Leave Traders With Stale Data
Numerical weather prediction (NWP), the method that decomposes the atmosphere into three-dimensional grid cells and solves differential equations inside each one, has led operational forecasting for over forty years. The method works, but the economics do not scale. A single NWP simulation consumes approximately 8,400 kWh of compute and costs €1,000–€20,000 to run on high-performance computing (HPC) infrastructure. That cost ceiling limits the European Centre for Medium-Range Weather Forecasts (ECMWF) supercomputer to two full runs per day. Counting smaller supplementary runs, the energy industry receives roughly four global forecasts per 24-hour period.
Between those runs, traders work with stale numbers, and the standard workflow compounds the problem. Teams download raw GRIB files, process them through brittle in-house pipelines, cross-reference an internal meteorology team or a consultancy, and stitch together a view of the day from a dozen sources before the market opens. Consumer APIs such as OpenWeatherMap and Visual Crossing sit on top of this same NWP stack. They simplify the interface but do not change the refresh rate, the ensemble availability, or the lead-time accuracy. These structural limits have pushed energy desks toward AI-based forecasting that can refresh more often and support probabilistic trading.
How AI Weather Forecasting Works in Practice
AI weather forecasting only works reliably when the architecture respects physics. A standard large language model applied naively to atmospheric data can produce outputs that violate conservation of mass, momentum, and energy, constraints the real atmosphere cannot break. That behaviour mirrors hallucination in text models, except that it appears as physically impossible weather. A 2024 Science Advances study found that unconstrained AI weather models, including GraphCast, Pangu-Weather, and FuXi, systematically underestimated the frequency and intensity of record-breaking heat, cold, and wind events, with forecast bias growing nearly linearly with record exceedance, a direct consequence of lacking explicit physical constraints.
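One simple way to screen a forecast for this kind of violation is a mass-budget diagnostic. Total atmospheric mass equals the area integral of surface pressure divided by g, so on a regular latitude-longitude grid the cos(latitude)-weighted global mean of surface pressure should stay nearly constant from step to step (it changes only through moisture sources and sinks). The sketch below is a simplified illustration under those assumptions; it is not how EPT or the cited study evaluates physical consistency, and the function names are hypothetical.

```python
import math
from typing import Sequence

Field = Sequence[Sequence[float]]  # rows = latitudes, columns = longitudes


def area_weighted_mean(field: Field, lats_deg: Sequence[float]) -> float:
    """Global mean of a regular lat-lon field, weighting each row by cos(latitude)."""
    num = 0.0
    den = 0.0
    for row, lat in zip(field, lats_deg):
        w = math.cos(math.radians(lat))  # grid-cell area shrinks toward the poles
        num += w * sum(row)
        den += w * len(row)
    if den == 0.0:
        raise ValueError("grid has no area (all rows at the poles?)")
    return num / den


def mass_drift_pa(steps: Sequence[Field], lats_deg: Sequence[float]) -> list[float]:
    """Drift of global-mean surface pressure (Pa) at each forecast step vs step 0.

    Large or steadily growing drift suggests the model is creating or
    destroying atmospheric mass (simplified: ignores moisture budgets).
    """
    ref = area_weighted_mean(steps[0], lats_deg)
    return [area_weighted_mean(s, lats_deg) - ref for s in steps]


# Hypothetical usage: flag a rollout whose mass drifts by more than 50 Pa
# drifts = mass_drift_pa(surface_pressure_steps, latitudes)
# suspicious = max(abs(d) for d in drifts) > 50.0
```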
EPT (Earth Physics Transformer) uses a different design. EPT is a general spatiotemporal transformer foundation model that learns the governing physics of complex systems directly from observational data in a latent representation that is integrated forward in time. Outputs are physically constrained by construction. The architecture is domain-agnostic. The same EPT model that learns atmospheric dynamics already predicts plasma behaviour inside a tokamak. Data and fine-tuning change from one physical system to the next. The relationship between Jua and Jua for Energy mirrors the relationship between Anthropic and Claude Code, a horizontal AI platform with a flagship vertical product.
EPT-2 (arXiv:2507.09703) is benchmarked against more than 10,000 real ground stations on open-source StationBench, with no post-processing or station fine-tuning. The results are reproducible and externally verifiable.
EPT-2 Accuracy on Energy-Relevant Variables (arXiv:2507.09703)
The EPT-2 technical report evaluates performance on the four variables that drive energy P&L: 10 m wind speed, 100 m wind speed (critical for wind-turbine hub heights), 2 m temperature, and surface solar radiation (SSRD). RMSE (root mean square error, the average magnitude of forecast error) and CRPS (continuous ranked probability score, a proper scoring rule for probabilistic forecasts) are the primary metrics, evaluated across the full 0–240 hour lead-time range. Lead time is the number of hours between forecast issuance and the valid time being predicted.
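Both metrics are simple to compute. The sketch below uses the standard ensemble ("energy") form of CRPS, a common estimator but not necessarily the exact one used in the EPT-2 report; function names and example values are illustrative.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between paired forecasts and observations."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def crps_ensemble(members: Sequence[float], observed: float) -> float:
    """CRPS of one ensemble forecast against one observation (energy form).

    CRPS = mean|X - y| - 0.5 * mean|X - X'| over all member pairs.
    Lower is better; for a single member it reduces to the absolute error.
    """
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must contain at least one member")
    accuracy = sum(abs(x - observed) for x in members) / m
    spread = sum(abs(x - xp) for x in members for xp in members) / (2 * m * m)
    return accuracy - spread


# Illustrative values: 100 m wind speed (m/s) at three valid times, one ensemble
print(rmse([7.0, 8.5, 9.0], [7.5, 8.0, 10.0]))
print(crps_ensemble([6.8, 7.4, 8.1, 7.9], observed=7.6))
```

Because a one-member ensemble reduces CRPS to absolute error, the same score can place deterministic and probabilistic forecasts on one scale. The pairwise term here averages over all m² pairs; a "fair" variant divides by m(m − 1) instead, which matters for small ensembles.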
EPT-2 outperforms ECMWF HRES on every lead time and on all four energy-relevant variables across the 0–240 hour range. EPT-2 also beats Microsoft Aurora on 10 m wind, 100 m wind, and 2 m temperature across the full range. On surface solar radiation, EPT-2 wins by default because Aurora publishes no SSRD output. EPT-1.5 (arXiv:2410.15076), the previous-generation model, had already outperformed GraphCast, FuXi, Pangu-Weather, and ECMWF HRES on European wind and temperature; EPT-2 continues that trajectory.
EPT-2 was trained on 8 × H100 GPUs over 10 days. Microsoft Aurora required 32 × A100 GPUs over 18 days. EPT-2 inference runs on a single GPU in minutes at approximately 0.25 kWh and $0.20–$15 per simulation, roughly four orders of magnitude cheaper than a comparable NWP run. EPT-2 is also approximately 25% faster than Aurora at inference time.
Ensemble Forecasting and Rapid-Refresh Cadence
An ensemble forecast is a set of multiple model runs initialised with slightly different starting conditions, which quantifies forecast uncertainty and generates probability distributions over outcomes. A hindcast is a forecast run over a historical period using the same model configuration as the operational system, which supports backtesting trading strategies against years of past data.
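The ensemble definition can be seen in miniature with a toy chaotic system. The sketch below uses the three-variable Lorenz-63 equations as a stand-in for the atmosphere (it is not a weather model, and the member count, perturbation size, and seed are arbitrary choices). Each member starts from the same state plus a tiny random perturbation; after integration, the members have diverged, and their mean, spread, and exceedance fraction form the kind of probabilistic output an ensemble provides.

```python
import random
import statistics

State = tuple[float, float, float]


def lorenz63_rhs(s: State, sigma: float = 10.0, rho: float = 28.0,
                 beta: float = 8.0 / 3.0) -> State:
    """Time derivatives of the Lorenz-63 system."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(s: State, dt: float = 0.01) -> State:
    """One fourth-order Runge-Kutta step."""
    def shift(base: State, k: State, h: float) -> State:
        return tuple(b + h * ki for b, ki in zip(base, k))
    k1 = lorenz63_rhs(s)
    k2 = lorenz63_rhs(shift(s, k1, dt / 2))
    k3 = lorenz63_rhs(shift(s, k2, dt / 2))
    k4 = lorenz63_rhs(shift(s, k3, dt))
    return tuple(v + dt / 6 * (a + 2 * b + 2 * c + d)
                 for v, a, b, c, d in zip(s, k1, k2, k3, k4))


def run_ensemble(initial: State, n_members: int = 20, perturbation: float = 1e-3,
                 n_steps: int = 1000, seed: int = 0) -> list[float]:
    """Integrate copies of the model from slightly perturbed initial states.

    Returns the final x value of each member (the toy "forecast quantity").
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(n_members):
        state = tuple(v + rng.gauss(0.0, perturbation) for v in initial)
        for _ in range(n_steps):
            state = rk4_step(state)
        finals.append(state[0])
    return finals


members = run_ensemble((1.0, 1.0, 1.0))
ens_mean = statistics.fmean(members)
ens_spread = statistics.stdev(members)                     # forecast uncertainty
prob_positive = sum(x > 0 for x in members) / len(members)  # exceedance probability
```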
EPT-2e, the ensemble variant of EPT-2, beats the 50-member ECMWF ENS mean on both RMSE and CRPS at virtually every lead time. EPT-2e updates 4 times per day and extends to a 60-day horizon. No AI weather peer such as Aurora, GraphCast, or ECMWF AIFS ships a productised ensemble equivalent that matches this probabilistic skill.
EPT-2 RR (rapid refresh) updates up to 24 times per day, compared with the 2–4 daily runs available from traditional NWP providers. EPT-2 HRRR delivers the same high-cadence refresh at up to 5 km spatial resolution over Europe. Customers running Jua for Energy alongside their existing ECMWF subscription see the next forecast hours before the next traditional run lands. That timing difference creates a structural edge in intraday and day-ahead power markets.
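The staleness gap follows from simple arithmetic: with evenly spaced runs, the newest available forecast is on average half a run interval old, plus whatever delivery latency applies. A minimal sketch, assuming evenly spaced runs and the same latency for every provider (a simplification; real dissemination delays differ):

```python
def forecast_age_hours(runs_per_day: int, latency_h: float = 0.0) -> tuple[float, float]:
    """Return (mean, worst-case) age in hours of the newest available forecast.

    Assumes runs are evenly spaced through the day and each one becomes
    available latency_h hours after its initialisation time.
    """
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    interval = 24.0 / runs_per_day
    return latency_h + interval / 2.0, latency_h + interval


for runs in (2, 4, 24):
    mean_age, worst_age = forecast_age_hours(runs)
    print(f"{runs:>2} runs/day: mean age {mean_age:.1f} h, worst case {worst_age:.1f} h")
```

Under these assumptions, moving from 4 to 24 runs per day cuts the mean age of the latest forecast from 3 hours to 30 minutes.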
How Jua for Energy API and SDK Fit Into Your Stack
Jua for Energy exposes all EPT-family models and 15 third-party NWP and AI models through a unified REST API (POST /v1/forecast/data and related endpoints) with Apache Arrow support for large payloads. The official Python SDK installs in a single command:
pip install jua
The SDK provides forecast access, hindcast and backtesting, and weather-parameter standardisation across all 25+ models on the platform, which removes the need for custom ingestion and schema work. Full documentation and the developer dashboard are available at docs.jua.ai and developer.jua.ai respectively. Quant teams that currently build ingestion pipelines, ensemble logic, and benchmarking harnesses from scratch against raw GraphCast or Aurora outputs report that the equivalent Jua for Energy integration stands up in days rather than a quarter.
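The article does not document how the SDK standardises parameters internally, but the conversions involved are familiar to anyone who has built such a pipeline. The sketch below is a hypothetical illustration: the `StandardRecord` schema and `standardise` helper are invented for this example, and the input keys assume cfgrib-style names. The conversions themselves are standard: wind speed from u/v components, Kelvin to Celsius, and de-accumulating SSRD, which ECMWF distributes as an accumulation in J m⁻².

```python
import math
from dataclasses import dataclass


@dataclass
class StandardRecord:
    """Hypothetical common schema: fixed variable names and units."""
    wind_speed_10m_ms: float
    temperature_2m_c: float
    ssrd_mean_wm2: float


def wind_speed(u: float, v: float) -> float:
    """Wind speed (m/s) from eastward u and northward v components."""
    return math.hypot(u, v)


def kelvin_to_celsius(t_k: float) -> float:
    return t_k - 273.15


def accumulated_to_mean_flux(acc_start_jm2: float, acc_end_jm2: float,
                             seconds: float) -> float:
    """Mean flux (W/m^2) over an interval from an accumulated field (J/m^2)."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (acc_end_jm2 - acc_start_jm2) / seconds


def standardise(raw: dict, prev_ssrd_jm2: float, step_seconds: float) -> StandardRecord:
    """Map one vendor-style record (assumed cfgrib-like keys) to the common schema."""
    return StandardRecord(
        wind_speed_10m_ms=wind_speed(raw["u10"], raw["v10"]),
        temperature_2m_c=kelvin_to_celsius(raw["t2m"]),
        ssrd_mean_wm2=accumulated_to_mean_flux(prev_ssrd_jm2, raw["ssrd"], step_seconds),
    )
```

Multiplying this small mapping across 25+ models, each with its own names, units, and accumulation conventions, is the ingestion work the SDK is meant to remove.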
Athena, Jua’s AI agent currently instrumented with the Jua for Energy tool surface, accepts natural-language queries and resolves them to deliverables in approximately 90 seconds. A representative query is “What is the 100 m wind forecast spread across models for northern Germany tonight?” Athena plans, calls the relevant forecast and benchmarking tools, and returns the answer with the underlying widget. Backtests resolve in approximately 5 minutes.
Book a demo to run a live benchmark on your own region and variable in under 5 minutes.
Compute Economics: 0.25 kWh vs 8,400 kWh
The compute gap between AI weather inference and NWP is structural, not marginal. A single traditional NWP simulation consumes approximately 8,400 kWh, costs €1,000–€20,000 on HPC infrastructure, and takes 1–2 hours to complete. A single EPT-2 inference uses approximately 0.25 kWh and costs $0.20–$15 on a single GPU, completing in minutes. An energy gap of roughly four orders of magnitude reshapes what is viable in production.
That asymmetry makes 24× daily refresh economically viable for EPT-2 RR and impractical for global NWP. It also makes ensemble generation tractable: EPT-2e generates probabilistic forecasts at a fraction of the cost of running the 50-member ECMWF ENS. For quant funds and trading houses that need to run backtests across years of historical data, the inference cost difference translates directly into research throughput.
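The energy figures quoted in this section can be checked directly. The sketch compares energy only, because the cost ranges are quoted in different currencies (€ for NWP, $ for EPT-2), and it counts only the two full NWP runs per day, ignoring the smaller supplementary runs.

```python
import math

# Per-simulation energy figures quoted in this article (approximate).
NWP_KWH_PER_RUN = 8_400.0
EPT2_KWH_PER_RUN = 0.25


def energy_ratio(a_kwh: float, b_kwh: float) -> tuple[float, float]:
    """Return (ratio a/b, orders of magnitude log10(a/b))."""
    ratio = a_kwh / b_kwh
    return ratio, math.log10(ratio)


def daily_energy_kwh(kwh_per_run: float, runs_per_day: int) -> float:
    return kwh_per_run * runs_per_day


ratio, oom = energy_ratio(NWP_KWH_PER_RUN, EPT2_KWH_PER_RUN)
nwp_daily = daily_energy_kwh(NWP_KWH_PER_RUN, 2)    # two full NWP runs per day
ept_daily = daily_energy_kwh(EPT2_KWH_PER_RUN, 24)  # 24 rapid-refresh runs per day

print(f"{ratio:,.0f}x per run (~{oom:.1f} orders of magnitude)")
# prints: 33,600x per run (~4.5 orders of magnitude)
print(f"{nwp_daily:,.0f} kWh/day vs {ept_daily:,.0f} kWh/day")
# prints: 16,800 kWh/day vs 6 kWh/day
```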
When Jua for Energy Beats OpenWeatherMap or Raw ECMWF
The table below compares the primary options a quant developer or energy trader evaluates when selecting a production AI weather forecast API. Every data point is drawn from the sources cited inline.
| Capability | EPT-2 / EPT-2e (Jua for Energy) | ECMWF HRES / ENS | Aurora / GraphCast | OpenWeatherMap / Visual Crossing |
|---|---|---|---|---|
| Deterministic accuracy vs HRES (0–240 h, 10 m wind, 100 m wind, 2 m temp, SSRD) | EPT-2 outperforms HRES at every lead time on all four variables (arXiv:2507.09703) | The 40-year benchmark | Aurora loses to EPT-2 on 10 m wind, 100 m wind, 2 m temp across the full range; no SSRD output | Resells processed NWP; no independent benchmark published |
| Ensemble (probabilistic) forecasting | EPT-2e beats 50-member ECMWF ENS mean on RMSE and CRPS at virtually every lead time | ENS: 50 members, gold standard for probabilistic NWP | No productised ensemble equivalent | No ensemble output |
| Update frequency | EPT-2 RR up to 24×/day; EPT-2e 4×/day | 2–4×/day | Typically 4×/day; no productised operational schedule | Dependent on underlying NWP; typically 4×/day |
| Hindcast access for backtesting | Available across multiple Jua and third-party models via SDK | Available to ECMWF members via MARS | Research code; limited or no productised hindcast access | Historical observations available; model hindcasts not standard |
| Inference cost per simulation | ~0.25 kWh; $0.20–$15 on a single GPU | ~8,400 kWh; €1,000–€20,000 on HPC | Similar order of magnitude to Jua for inference | API pricing; underlying NWP cost not exposed |
| Python SDK and API quality | pip install jua; REST + Apache Arrow; 25+ models under one schema; hindcast and backtesting built in | GRIB files via MARS; member access; no unified SDK | Research code / limited API; no unified schema | REST API; no ensemble, hindcast, or multi-model schema |
Frequently Asked Questions
What is the difference between an AI weather forecast API and a traditional weather API?
A traditional weather API wraps the output of a numerical weather prediction system, a physics-based simulation run on supercomputing infrastructure, behind a REST endpoint. The underlying model runs 2–4 times per day, and the API delivers those outputs with minimal transformation. An AI weather forecast API built on a physics foundation model generates forecasts using a neural architecture trained directly on observational data at a fraction of the compute cost, which enables refresh rates up to 24 times per day.
The critical distinction for production use is not the interface but the model underneath. A physics-constrained foundation model like EPT-2 respects conservation laws and produces physically valid outputs, while an unconstrained model offers no such guarantee. For energy trading, the practical consequences are accuracy on energy-relevant variables such as wind at hub height and surface solar radiation, ensemble availability for probabilistic positioning, and hindcast access for strategy backtesting. Consumer weather APIs do not provide this combination.
How does EPT-2 compare to GraphCast and Aurora for energy trading use cases?
EPT-2 outperforms Microsoft Aurora on 10 m wind, 100 m wind, and 2 m temperature across the full 0–240 hour lead-time range, as documented in arXiv:2507.09703. Aurora publishes no surface solar radiation output, which is a primary variable for solar generation forecasting. EPT-1.5, the previous generation, already outperformed GraphCast, FuXi, and Pangu-Weather on European wind and temperature.
Operational differences matter even more for a trading workflow. EPT-2 RR updates up to 24 times per day versus the typical 4-times-per-day cadence of Aurora and GraphCast research outputs. EPT-2e is a productised ensemble that beats the 50-member ECMWF ENS mean on RMSE and CRPS at virtually every lead time, while neither Aurora nor GraphCast ships a productised ensemble equivalent. EPT-2 forecasts at arbitrary lead times natively, while Aurora rolls forward in fixed 6-hour steps that compound error. Aurora and GraphCast run as guest models on the Jua for Energy platform, so head-to-head comparison is built into the product surface.
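The rollout point can be illustrated with a toy model. This is not a measurement of Aurora or any other model. An autoregressive model reaches a lead time by chaining fixed steps, so the step count grows with lead time, and if each step inflates the error by a constant factor, that growth compounds geometrically. The 6-hour step comes from the text; the per-step growth rate is an arbitrary illustrative parameter.

```python
import math


def rollout_steps(lead_time_h: float, step_h: float = 6.0) -> int:
    """Autoregressive steps needed to reach (or pass) a given lead time."""
    if step_h <= 0:
        raise ValueError("step_h must be positive")
    return math.ceil(lead_time_h / step_h)


def compounded_error(initial_error: float, growth_per_step: float, steps: int) -> float:
    """Toy model: error is multiplied by (1 + growth_per_step) at every step."""
    return initial_error * (1.0 + growth_per_step) ** steps


# A 240 h forecast needs 40 chained 6 h steps; a 9 h lead time needs 2 steps,
# overshooting to 12 h unless the output is interpolated.
steps_240 = rollout_steps(240)
toy_error_240 = compounded_error(1.0, 0.05, steps_240)  # illustrative growth rate only
```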
Can I backtest a trading strategy using Jua for Energy’s hindcast data?
Hindcast data, historical forecast runs generated with the same model configuration as the operational system, is available across multiple Jua and third-party models through the Python SDK and REST API. Backtests run in approximately 5 minutes via Athena, Jua’s AI agent, or directly through the SDK for teams that prefer programmatic access. The platform also integrates ERA5 reanalysis data from 1990 onward at 0.25° resolution, used as the historical training base for the EPT family and as the reference for long-horizon backtests.
For quant funds running systematic weather-signal strategies, this combination replaces roughly a quarter's worth of pipeline-engineering work: hindcast access, ensemble depth, and a clean schema all sit under one pip install jua.
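At its core, a hindcast backtest replays historical forecasts against what actually happened. The sketch below is a minimal version that assumes hindcast forecasts and realised generation are already aligned hourly series in MW and that imbalance is settled at a single symmetric price (real imbalance prices are asymmetric and time-varying). The helpers are hypothetical and not part of the jua SDK; model names and numbers are synthetic.

```python
from typing import Mapping, Sequence


def imbalance_cost_eur(forecast_mw: Sequence[float], actual_mw: Sequence[float],
                       price_eur_per_mwh: float, hours_per_step: float = 1.0) -> float:
    """Cost of settling the gap between a forecast-based position and actual output."""
    if len(forecast_mw) != len(actual_mw):
        raise ValueError("forecast and actual series must be aligned")
    imbalance_mwh = sum(abs(a - f) for f, a in zip(forecast_mw, actual_mw)) * hours_per_step
    return imbalance_mwh * price_eur_per_mwh


def compare_sources(hindcasts: Mapping[str, Sequence[float]], actual_mw: Sequence[float],
                    price_eur_per_mwh: float) -> list[tuple[str, float]]:
    """Rank forecast sources by backtested imbalance cost, cheapest first."""
    costs = {name: imbalance_cost_eur(fc, actual_mw, price_eur_per_mwh)
             for name, fc in hindcasts.items()}
    return sorted(costs.items(), key=lambda kv: kv[1])


# Synthetic example: realised wind output and two hypothetical forecast sources
actual = [100.0, 120.0, 90.0]
hindcasts = {"model_a": [98.0, 125.0, 90.0], "model_b": [110.0, 100.0, 95.0]}
ranking = compare_sources(hindcasts, actual, price_eur_per_mwh=50.0)
```

Run over years of hindcasts instead of three hours, the same comparison becomes a strategy-level answer to which forecast source would have cost less to trade on.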
Is Jua for Energy a replacement for an ECMWF subscription?
Jua for Energy is designed to run alongside an ECMWF subscription, not replace it. ECMWF HRES and ENS remain the universal benchmarks for operational NWP, and ECMWF AIFS, ECMWF’s own AI model, runs natively on the Jua for Energy platform. Jua for Energy displaces the plumbing around the ECMWF feed. That includes the in-house GRIB pipeline, the manual benchmarking harness, the morning-briefing analyst, and the dashboard stitching across a dozen vendor contracts.
The 7–9 a.m. manual prep routine compresses into a single workspace, refreshed up to 24 times per day, where ECMWF, GFS, AIFS, Aurora, and EPT-2 all appear under one schema and one API. A 1 GW wind portfolio that gains four percentage points of forecast accuracy saves approximately €1.5 M per year in hedging and imbalance costs. A 1 GW solar portfolio at the same accuracy gain saves approximately €3 M per year.
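The two portfolio figures imply a rate per gigawatt per percentage point of accuracy gain: €375k for wind and €750k for solar. The sketch below scales them to other portfolios under the strong assumption that savings scale linearly with both capacity and accuracy gain, which the article does not claim; the function is a hypothetical helper.

```python
# Annual savings per 1 GW for a 4 percentage-point accuracy gain (figures quoted above).
SAVINGS_PER_GW_EUR = {"wind": 1.5e6, "solar": 3.0e6}
REFERENCE_GAIN_PP = 4.0


def estimated_savings_eur(technology: str, capacity_gw: float, gain_pp: float) -> float:
    """Linear extrapolation of the quoted savings (assumption, not a model)."""
    per_gw_per_pp = SAVINGS_PER_GW_EUR[technology] / REFERENCE_GAIN_PP
    return per_gw_per_pp * capacity_gw * gain_pp


# e.g. a 2 GW wind portfolio with the same 4 pp gain
wind_2gw = estimated_savings_eur("wind", 2.0, 4.0)
```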
Conclusion: Benchmark EPT-2 on Your Own Region
The benchmarks documented in arXiv:2507.09703 establish EPT-2 as the current state of the art in production AI weather forecasting for energy-relevant variables. The numbers are externally verifiable against more than 10,000 ground stations on open-source StationBench, with no post-processing. The operational specifications, including up to 24× daily refresh, the EPT-2e ensemble beating the 50-member ECMWF ENS mean, ~0.25 kWh inference cost, and the pip install jua SDK, are live in production and used by Axpo, TotalEnergies, Statkraft, EnBW, EDF, and Hydro-Québec across five continents.
The deal trigger for every serious evaluation remains the same. Teams run the benchmark on their own region and their own variable. The Jua for Energy platform returns a head-to-head accuracy comparison against the current provider in seconds, which shifts the objection from “is this real?” to “how fast can we sign?”