Written by: Olivier Lam, Physical AI Team, Jua.ai AG
Key Takeaways for European Power and Gas Desks
- Legacy NWP models refresh only 2–4 times daily, so traders work from stale forecasts and often learn of silent mid-cycle revisions only after the market has moved.
- Physics-foundation-model-plus-agent platforms such as Jua for Energy, built on EPT-2, deliver up to 24 daily refreshes at ~0.25 kWh per run, cutting compute energy by roughly four orders of magnitude.
- EPT-2 outperforms ECMWF HRES on every lead time for 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation, validated against 10,000+ ground stations on open-source StationBench.
- Athena, Jua’s natural-language agent, compresses manual morning briefings and cross-model benchmarking into queries that return in about 90 seconds, runs full backtests in about 5 minutes, and auto-generates widgets and dashboards.
- See how Jua’s physics-foundation-model-plus-agent platform replaces fragmented grib pipelines and scattered tools with a single, live workspace in a demo.
The Problem: Legacy Forecasts Slow Intraday Energy Decisions
The European Centre for Medium-Range Weather Forecasts’ two-week outlook is the definitive reference point for traders repricing risk around heating demand, renewable output, and system tightness. That reference refreshes only two to four times per day. Between runs, traders operate on stale numbers. When ECMWF or GFS revises an output mid-cycle, the revision is silent, and the trader often discovers it only after someone else has already traded on it.
The workflow built on top of those forecasts compounds the problem. A typical morning begins at 6 a.m. The desk downloads overnight ECMWF and GFS runs as raw grib files (binary gridded format used by meteorological agencies), processes them through an in-house pipeline, cross-references an internal meteorology team or a consultancy, and stitches together spreadsheets, terminal screens, and vendor dashboards. By the time a coherent view of the day exists, the market has often moved. A single traditional NWP simulation consumes approximately 8,400 kWh of compute and costs €1,000–€20,000 to run on high-performance computing infrastructure. That hard economic ceiling caps update frequency at two to four runs per day and has constrained the energy industry for forty years.
Benchmarking across models is equally fragmented. Raw AI weather outputs from research labs arrive without ensembles (probabilistic multi-member forecast sets), hindcasts (historical forecast archives for backtesting), or productised tooling. Quant teams that subscribe to these outputs must build the ingestion pipeline, the ensemble logic, and the benchmarking harness themselves. Engineering capacity that should go into alpha research instead disappears into plumbing.
See how Jua for Energy replaces this fragmented stack with a single workspace in a live demo.
Physics-Foundation-Model-Plus-Agent Platforms for Energy Trading
The physics-foundation-model-plus-agent category combines two horizontal components: a foundation model trained on observational physics and an AI agent that turns natural-language objectives into deliverables. Jua is a foundation model and agent company. Jua for Energy is its first applied product, in the same way Anthropic runs Claude Code as a flagship vertical on a horizontal AI platform.
Jua’s Earth Physics Transformer (EPT) family is a general spatiotemporal transformer foundation model that learns the governing physics of complex systems, such as mass, momentum, and energy conservation, directly from observational data. Athena is Jua’s AI agent, instrumented with the Jua for Energy tool surface: forecast queries, model benchmarks, backtests, and widget generation. The architecture is domain-agnostic. Data and fine-tuning change from one physical system to the next.
A concrete trader workflow shows the difference. At 6 a.m., instead of downloading grib files, the trader opens the Jua platform. A Day-Ahead briefing covering model consensus across 25+ models, model delta since the previous run, convergence tracking, and price implications is already written and refreshed. The trader types into Athena: “What is the 100 m wind forecast spread across models for northern Germany tonight?” Athena returns the answer, with the underlying widget, in approximately 90 seconds. A divergence alert fires at 8:15 a.m. when EPT-2 and ECMWF ENS disagree on a wind ramp over the North Sea. The trader acts before the market reprices.
Validation Evidence: EPT-2 Accuracy Against ECMWF HRES
EPT-2, documented in the peer-reviewed technical report arXiv:2507.09703, outperforms ECMWF HRES on every lead time across the full 0–240 hour range on 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation. The evaluation methodology uses open-source StationBench, benchmarked against more than 10,000 real ground stations with no post-processing or station fine-tuning. That setup provides an external, auditable standard. EPT-2e, the ensemble variant, beats the 50-member ECMWF ENS mean on both RMSE (root mean square error) and CRPS (continuous ranked probability score, a probabilistic accuracy metric) at virtually every lead time.
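For readers less familiar with the two scores, here is a minimal sketch of their textbook definitions. It is not StationBench’s implementation. The CRPS function uses the common empirical ensemble estimator E|X − y| − ½E|X − X′|, and the sample wind values are made up for illustration.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error between paired forecasts and observations."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need equal-length, non-empty sequences")
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(forecast))


def crps_ensemble(members: Sequence[float], observed: float) -> float:
    """Empirical CRPS of one ensemble forecast against one observation.

    Energy form: mean |member - obs| minus half the mean |member_i - member_j|
    over all ordered member pairs. Lower is better.
    """
    m = len(members)
    if m == 0:
        raise ValueError("ensemble must have at least one member")
    to_obs = sum(abs(x - observed) for x in members) / m
    pairwise = sum(abs(xi - xj) for xi in members for xj in members) / (m * m)
    return to_obs - 0.5 * pairwise


# Illustrative 100 m wind speeds (m/s) at one station and one valid time.
ensemble = [7.0, 8.5, 9.0]
print(round(crps_ensemble(ensemble, 8.0), 3))  # 0.389
print(rmse([7.5, 9.0, 10.2], [8.0, 8.4, 10.0]))
```

A useful property: for a one-member "ensemble", CRPS reduces to the absolute error, which is why deterministic and probabilistic forecasts can be compared on the same scale.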
EPT-2 was trained on 8 × H100 GPUs over 10 days, while Microsoft Aurora required 32 × A100 GPUs over 18 days. A single EPT-2 inference runs on a single GPU in minutes at approximately 0.25 kWh and $0.20–$15. On energy alone, that is roughly four orders of magnitude below the approximately 8,400 kWh of an equivalent NWP run.
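The "four orders of magnitude" figure can be checked against the article’s own per-run energy numbers. This sketch only compares energy; the euro and dollar cost ranges are quoted in different currencies and are not compared here.

```python
import math


def orders_of_magnitude(baseline: float, candidate: float) -> float:
    """Return log10(baseline / candidate): how many powers of ten smaller candidate is."""
    if baseline <= 0 or candidate <= 0:
        raise ValueError("both values must be positive")
    return math.log10(baseline / candidate)


NWP_KWH_PER_RUN = 8_400.0  # figure quoted for one traditional NWP simulation
EPT2_KWH_PER_RUN = 0.25    # figure quoted for one EPT-2 inference

ratio = NWP_KWH_PER_RUN / EPT2_KWH_PER_RUN
print(f"energy ratio: {ratio:,.0f}x")  # energy ratio: 33,600x
print(f"orders of magnitude: {orders_of_magnitude(NWP_KWH_PER_RUN, EPT2_KWH_PER_RUN):.2f}")  # orders of magnitude: 4.53
```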
Closing the Staleness Gap with EPT-2 Rapid Refresh
EPT-2 RR, Jua’s rapid-refresh model variant, answers the staleness problem directly: it updates up to 24 times per day. EPT-2 HRRR delivers the same high-cadence refresh at up to 5 km spatial resolution over Europe. EPT-2 outperforms Aurora and IFS HRES across the 0–240 h horizon for energy-relevant variables. Customers running Jua for Energy alongside their existing NWP subscriptions see an updated forecast hours before the next traditional run lands, and a typical Jua run completes approximately 2.5 hours ahead of competing operational runs at the same cycle.
ECMWF HRES remains the universal benchmark, and Jua for Energy does not replace it. EPT-2 RR instead fills the gap of six hours or more between traditional runs, during which traders carry unpriced weather risk.
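The size of that gap follows from simple arithmetic. The sketch below assumes evenly spaced runs and a fixed delivery latency; the latency parameter is a hypothetical input, not a published figure for any model.

```python
from typing import Tuple


def forecast_age_hours(runs_per_day: int, delivery_latency_h: float = 0.0) -> Tuple[float, float]:
    """Worst-case and average age (hours) of the newest available forecast.

    Assumes runs are evenly spaced and each becomes available
    `delivery_latency_h` hours after its nominal cycle time.
    """
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    gap = 24.0 / runs_per_day
    worst = gap + delivery_latency_h      # just before the next run lands
    average = gap / 2 + delivery_latency_h
    return worst, average


for runs in (2, 4, 24):
    print(runs, forecast_age_hours(runs))
```

With zero latency, four runs per day leave the newest forecast up to 6 hours old (3 hours on average), while 24 runs per day cap it at 1 hour.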
Athena Natural Language Energy Analytics: From Questions to Trades
Fragmented tooling and manual benchmarking create real workflow costs, not just minor inconveniences. Internal meteorology teams spend a disproportionate share of their time producing daily morning briefings by hand and answering ad-hoc forecast questions for the trading floor. That work is high-quality and irreplaceable, yet slow and impossible to scale. Point-solution SaaS vendors sell processed NWP outputs without ensembles, benchmarking, or workflow tooling. Meteorology consultancies produce analyst reports the morning after the trade window has closed.
Athena addresses this directly. A briefing query resolves in approximately 90 seconds, and a full backtest such as “backtest a wind-ramp strategy on EPT-2e over the last two winters” completes in approximately 5 minutes. Athena auto-creates personalised widgets and dashboards on request, which removes the manual assembly step. By reading market context and modelling participant behaviour, Athena turns raw physics predictions from EPT-2 into trading intelligence. Trading houses and quant desks describe Athena as “another headcount, for free.”
Watch Athena answer a live query on your region and variables in a demo.
Rapid Refresh Forecasts for Intraday Trading
The compute ceiling on traditional NWP, approximately 8,400 kWh and €1,000–€20,000 per simulation, is structural rather than temporary. The economics of HPC infrastructure cap update frequency at two to four runs per day, and that ceiling has held for forty years. When a model revises its output mid-cycle, the revision is silent, and the trader often realises only after the market has moved.
EPT-2 RR runs on a single GPU in minutes at approximately 0.25 kWh and $0.20–$15 per simulation, which enables up to 24 refreshes per day without an HPC cluster. Divergence alerts fire the moment two or more models disagree on a key variable, so a trading opportunity appears as a notification instead of a missed move. Correction alerts fire the moment a model revises its own output between runs. Actual-generation power forecasts refresh every 15 minutes. The trade window opens with a notification instead of a manual refresh.
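The two alert types reduce to threshold checks: divergence compares different models at the same moment, and correction compares one model against its own previous run. The sketch below is a hypothetical illustration of that logic, not Jua’s alerting code. The model labels, thresholds, and wind values are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    kind: str    # "divergence" or "correction"
    detail: str  # human-readable summary for the notification


def divergence_alerts(latest: Dict[str, float], threshold: float) -> List[Alert]:
    """Flag when the spread (max - min) across models exceeds `threshold`.

    `latest` maps a model name to its newest value for one variable,
    location, and valid time, e.g. 100 m wind speed in m/s.
    """
    if len(latest) < 2:
        return []
    hi_model = max(latest, key=latest.__getitem__)
    lo_model = min(latest, key=latest.__getitem__)
    spread = latest[hi_model] - latest[lo_model]
    if spread <= threshold:
        return []
    return [Alert("divergence", f"{hi_model} vs {lo_model}: spread {spread:.1f} exceeds {threshold:.1f}")]


def correction_alerts(previous: Dict[str, float], latest: Dict[str, float], threshold: float) -> List[Alert]:
    """Flag when a model's newest run moves its own value by more than `threshold`."""
    alerts = []
    for model, value in latest.items():
        if model in previous and abs(value - previous[model]) > threshold:
            alerts.append(Alert("correction", f"{model} revised by {value - previous[model]:+.1f}"))
    return alerts


# Illustrative 100 m wind (m/s) over the North Sea for one valid time.
previous = {"EPT-2": 9.0, "ECMWF ENS mean": 8.8, "GFS": 9.4}
latest = {"EPT-2": 12.0, "ECMWF ENS mean": 8.9, "GFS": 9.6}

for alert in divergence_alerts(latest, 2.0) + correction_alerts(previous, latest, 1.5):
    print(alert.kind, alert.detail)
```

In practice the thresholds would differ by variable and region; a single fixed number per check is a simplification.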
ENTSO-E Integration for Power Traders
The gap between raw weather model output and action-ready power market analysis is where most of the manual work sits. Raw AI weather subscriptions arrive as model files, and the quant team then builds the ingestion pipeline, the ensemble logic, and the benchmarking harness. Grid data from ENTSO-E (European Network of Transmission System Operators for Electricity) requires a separate integration. Power forecasts for specific bidding zones require capacity-weighting and PSR (Power System Resource) classification on top of the weather signal.
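Capacity-weighting means converting a weather-driven capacity factor at each site into megawatts and summing by bidding zone and production type. The sketch below shows that step under stated assumptions. The per-site capacity factors are hypothetical inputs, assumed to come from applying a power curve to the wind or solar forecast. The production-type codes are the ENTSO-E Transparency Platform psrType codes for solar (B16), wind offshore (B18), and wind onshore (B19).

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple

# ENTSO-E psrType codes for the renewable production types used in this sketch.
PSR_TYPES = {"B16": "Solar", "B18": "Wind Offshore", "B19": "Wind Onshore"}


class Site(NamedTuple):
    zone: str               # bidding zone label, e.g. "DE-LU"
    psr_type: str           # ENTSO-E psrType code
    capacity_mw: float      # installed capacity
    capacity_factor: float  # forecast output fraction, assumed to come from a power curve


def zone_forecast_mw(sites: Iterable[Site]) -> Dict[Tuple[str, str], float]:
    """Capacity-weighted generation forecast in MW per (zone, production type)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for s in sites:
        if s.psr_type not in PSR_TYPES:
            continue  # production types outside this sketch are ignored
        cf = min(max(s.capacity_factor, 0.0), 1.0)  # clamp to the physical range
        totals[(s.zone, PSR_TYPES[s.psr_type])] += s.capacity_mw * cf
    return dict(totals)


# Illustrative sites with made-up capacities and capacity factors.
sites = [
    Site("DE-LU", "B19", 1200.0, 0.35),
    Site("DE-LU", "B19", 800.0, 0.50),
    Site("DE-LU", "B18", 1000.0, 0.60),
    Site("DE-LU", "B16", 1500.0, 0.10),
    Site("DE-LU", "B14", 4000.0, 0.90),  # nuclear: skipped by this sketch
]
for key, mw in sorted(zone_forecast_mw(sites).items()):
    print(key, round(mw, 1))
```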
Jua for Energy integrates ENTSO-E grid data directly, covering actual generation, capacity, and PSR classifications across the European power markets the platform serves. Power Forecast covers solar, wind onshore, wind offshore, total wind, total renewables, load, and residual load, live in Germany, Great Britain, France, the Netherlands, and Belgium. A Fundamental Model combines the EPT weather forecast with installed-capacity data and runs out to 20 days. An Actual Generation Model refreshes every 15 minutes with a 48-hour horizon. A 1 GW wind portfolio that gains four percentage points of forecast accuracy saves approximately €1.5 million per year under typical hedging and imbalance-cost structures, and the saving scales with installed capacity across larger portfolios.
Hindcast Backtesting for European Power Markets
Systematic strategies require years of historical forecast data to backtest. Most providers cannot deliver hindcasts at the depth, schema consistency, and model coverage required to run a continental, multi-variable, multi-model backtest. Quant teams that subscribe to AI weather research outputs receive raw model files without hindcast access and must build the historical archive themselves.
The Jua platform exposes hindcast data across multiple Jua and third-party models through a single REST API with Apache Arrow (a columnar payload format for large data queries) support. pip install jua installs the Python SDK. Backtests run in approximately 5 minutes via Athena or directly through the SDK for programmatic access. ERA5 reanalysis data is available from 1990 onward at 0.25° resolution as the historical training base and long-horizon backtest reference. The integration that takes a quant team a quarter to build elsewhere stands up in days.
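Once hindcast forecasts and actuals are aligned, the first step of a wind-ramp backtest is scoring how well forecast ramps match observed ones. The sketch below does that with a simple contingency table. It does not use the Jua SDK; the ramp definition (a rise above a threshold over a fixed window), the parameters, and the hourly values are hypothetical. A full strategy backtest would add prices and positions on top.

```python
from typing import Dict, Sequence, Set


def ramp_events(series: Sequence[float], window: int, threshold: float) -> Set[int]:
    """Indices t where the value rises by more than `threshold` over `window` steps."""
    return {t for t in range(window, len(series)) if series[t] - series[t - window] > threshold}


def score_ramps(forecast: Sequence[float], actual: Sequence[float],
                window: int, threshold: float) -> Dict[str, float]:
    """Contingency-table scores for up-ramp detection over a hindcast period."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must be aligned")
    predicted = ramp_events(forecast, window, threshold)
    observed = ramp_events(actual, window, threshold)
    hits = len(predicted & observed)
    misses = len(observed - predicted)
    false_alarms = len(predicted - observed)
    return {
        "hits": hits,
        "misses": misses,
        "false_alarms": false_alarms,
        "hit_rate": hits / (hits + misses) if hits + misses else float("nan"),
        "false_alarm_ratio": false_alarms / (hits + false_alarms) if hits + false_alarms else float("nan"),
    }


# Illustrative hourly wind generation (GW): hindcast forecast vs actual.
forecast = [10, 11, 14, 15, 15, 12, 12, 16]
actual = [10, 10, 13, 15, 14, 12, 13, 13]
print(score_ramps(forecast, actual, window=2, threshold=2.5))
```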
Head-to-Head Comparison: Jua for Energy vs NWP Incumbents and AI Labs
| Capability | Jua for Energy (EPT family + Athena) | ECMWF HRES / ENS | Aurora / GraphCast |
|---|---|---|---|
| Deterministic accuracy (0–240 h, 10 m wind, 100 m wind, 2 m temp, SSRD) | EPT-2 beats ECMWF HRES at every lead time on all four variables (StationBench, 10,000+ stations) | The 40-year benchmark, universal reference | Aurora loses to EPT-2 on 10 m and 100 m wind across the full range; no SSRD output |
| Ensemble (probabilistic) forecasting | EPT-2e beats the 50-member ECMWF ENS mean on RMSE and CRPS at virtually every lead time | ENS: 50 members, gold standard for probabilistic NWP | No productised ensemble equivalent |
| Spatial resolution | Up to 5 km native (EPT-2 HRRR, Europe) | 9 km (HRES) | ~25 km at published resolution |
| Update frequency | Up to 24×/day (EPT-2 RR) | 2–4×/day | Typically 4×/day research cadence, no productised operational schedule |
| Inference cost per simulation | ~0.25 kWh, ~$0.20–$15, minutes on a single GPU | ~8,400 kWh, €1,000–€20,000, 1–2 hours on HPC | Similar order of magnitude to Jua for inference |
| Natural-language agent | Athena: briefings, benchmarks, backtests, widget generation (~90 s per briefing query, ~5 min per backtest) | None | None |
| Live cross-model benchmarking | 25+ models on one platform, any region, any variable, result in seconds | Available to members, no productised cross-vendor benchmarking | No productised benchmarking surface |
| Power forecasts | Solar, wind on/offshore, load, residual load in 5 countries; 15-min refresh (Actual Generation Model), up to 20-day horizon (Fundamental Model) | Not a native product | Not a native product |
| API / SDK | REST + Apache Arrow, Python SDK (pip install jua) | Grib files via MARS, member access | Research code or limited API |
Risks and Due Diligence for Energy Trading Analytics in Europe
Physics models that hallucinate or violate conservation laws are unsafe to trade on. Any evaluation of an AI weather model for energy trading should require three things, and each one addresses a different failure mode. First, physics-constraint verification ensures the model does not produce physically impossible outputs that break conservation laws. The model architecture must enforce conservation of mass, momentum, and energy at the representation level, not as a post-processing correction. EPT is a spatiotemporal transformer trained on observational physics, and its outputs are physically constrained by construction.
Second, peer-reviewed benchmark reports prevent cherry-picked accuracy claims. Performance must be anchored to external, auditable evaluation against real ground-truth observations. EPT-2 is benchmarked against more than 10,000 real ground stations on open-source StationBench, with results published in arXiv:2507.09703. EPT-1.5 results are published in arXiv:2410.15076. Third, a live benchmark requirement separates platforms from static reports. Any vendor that cannot run a head-to-head accuracy comparison on the prospect’s own region and variable in real time is selling a report, not a platform. The Jua platform returns a live benchmark result in seconds.
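The first requirement, physics-constraint verification, can be partially spot-checked by a buyer. One common diagnostic treats area-weighted global-mean surface pressure as a proxy for total atmospheric mass and watches for drift across lead times. The sketch below is a hypothetical buyer-side check, not how EPT enforces conservation. It uses cos(latitude) weights on a regular latitude-longitude grid, a made-up tolerance, and toy fields; passing it is necessary but not sufficient evidence of conservation.

```python
import math
from typing import List, Sequence

Field = Sequence[Sequence[float]]  # rows = latitudes, columns = longitudes


def area_weighted_mean(field: Field, lats_deg: Sequence[float]) -> float:
    """Mean of a regular lat-lon field, weighting each row by cos(latitude)."""
    if len(field) != len(lats_deg):
        raise ValueError("one latitude per row is required")
    num = den = 0.0
    for row, lat in zip(field, lats_deg):
        w = math.cos(math.radians(lat))
        num += w * sum(row)
        den += w * len(row)
    return num / den


def mass_drift(steps: Sequence[Field], lats_deg: Sequence[float]) -> List[float]:
    """Relative drift of global-mean surface pressure versus the initial step."""
    means = [area_weighted_mean(f, lats_deg) for f in steps]
    return [(m - means[0]) / means[0] for m in means]


TOLERANCE = 1e-3  # hypothetical acceptance threshold on relative drift
lats = [-60.0, 0.0, 60.0]
steps = [
    [[1000, 1000], [1010, 1010], [1000, 1000]],  # initial state (hPa)
    [[1002, 998], [1010, 1010], [1000, 1000]],   # pressure redistributed, same mass
    [[1000, 1000], [1020, 1020], [1000, 1000]],  # mass added: should be flagged
]
drift = mass_drift(steps, lats)
flagged = [i for i, d in enumerate(drift) if abs(d) > TOLERANCE]
print(flagged)
```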
Frequently Asked Questions
Is Jua a weather AI company?
No. Jua is a foundation model and agent company. EPT is a general physics foundation model, and Athena is an AI agent. Jua for Energy is the first applied product built on both, mirroring the relationship Anthropic has to Claude Code. The atmosphere is the first physical system EPT has been fine-tuned for, and energy trading is the first market Athena has been instrumented for. Both will expand to other physical-economy domains including plasma fusion, aerospace, and materials.
We already have an ECMWF subscription. Why add Jua for Energy?
Jua for Energy does not replace ECMWF. Most serious customers keep their ECMWF subscription and run Jua for Energy alongside it, and ECMWF AIFS even runs natively on the Jua platform. Jua for Energy instead replaces the plumbing around the ECMWF feed: the in-house grib pipeline, the spreadsheet stitching, the manual benchmarking, and the morning-briefing routine. The 7–9 a.m. manual prep compresses into a single workspace, refreshed up to 24 times a day, where every model, including ECMWF, GFS, AIFS, Aurora, and EPT, sits on the same screen with one schema and one API.
How is EPT-2 different from Microsoft Aurora or DeepMind GraphCast?
Aurora and GraphCast are research outputs from large companies’ AI labs. Jua for Energy is a productised platform built on EPT and Athena, where Aurora and GraphCast run as guests on the comparison surface. Five concrete differences matter. EPT-2 forecasts at arbitrary lead times natively, while Aurora rolls forward in fixed 6-hour steps and compounds error. EPT-2e is a productised ensemble that beats the 50-member ECMWF ENS mean on RMSE and CRPS at virtually every lead time, with no AI peer shipping an equivalent. EPT-2 RR refreshes up to 24 times per day against the peers’ typical 4-times-per-day research cadence. Athena provides a natural-language agent layer with no peer equivalent. The 25-model benchmarking platform includes Aurora and GraphCast as guests, so the comparison is built in.
How quickly can we integrate Jua for Energy into our existing pipelines?
pip install jua installs the Python SDK. The REST API exposes 25+ models through a single schema with Apache Arrow support for large payloads. ENTSO-E grid data integrates directly. Hindcast data is available across multiple Jua and third-party models for backtesting. Quant teams that have spent a quarter building equivalent integrations elsewhere typically stand up the Jua for Energy integration in days. A live benchmark proof-of-value runs in approximately 5 minutes on the prospect’s own region and variable.
What is the financial case for switching?
The economics scale linearly across portfolios. The €1.5 million annual savings per GW of wind capacity documented earlier applies to each additional gigawatt in a portfolio, and a 1 GW solar portfolio at the same accuracy gain saves approximately €3 million per year. Customers operating multi-GW portfolios extend these economics across their full book. The live benchmark is the deal trigger, because meteorologists and quant developers who run the head-to-head comparison on their own region and variable shift the internal conversation from “is this real?” to “how fast can we sign?”
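The linear-scaling claim can be written out directly. This sketch uses only the two per-GW figures quoted in this article for a four-percentage-point accuracy gain. The linearity is the article’s stated assumption; actual savings depend on imbalance prices, hedging structure, and portfolio mix.

```python
from typing import Dict

# Annual savings per GW quoted in this article for a four-percentage-point accuracy gain.
SAVINGS_EUR_PER_GW_YEAR = {"wind": 1.5e6, "solar": 3.0e6}


def annual_savings_eur(portfolio_gw: Dict[str, float]) -> float:
    """Linearly scale the quoted per-GW savings across a mixed portfolio."""
    total = 0.0
    for tech, gw in portfolio_gw.items():
        if tech not in SAVINGS_EUR_PER_GW_YEAR:
            raise ValueError(f"no quoted savings figure for {tech!r}")
        total += SAVINGS_EUR_PER_GW_YEAR[tech] * gw
    return total


print(annual_savings_eur({"wind": 4.0, "solar": 2.0}))
```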
Conclusion
The European energy trading analytics stack has a structural problem. NWP models refresh two to four times per day, manual grib pipelines consume the first two hours of every trading morning, and fragmented tooling prevents live cross-model benchmarking. The cost is measurable: four percentage points of forecast accuracy are worth approximately €1.5 million per year per gigawatt of wind capacity and approximately €3 million per gigawatt of solar. The solution category is physics-foundation-model-plus-agent platforms.
Jua for Energy, built on EPT-2 and Athena, delivers live 25-model benchmarking, up to 24 refreshes per day via EPT-2 RR, 90-second natural-language queries via Athena, ENTSO-E-integrated power forecasts refreshing every 15 minutes across five European countries, and hindcast backtesting through a single Python SDK. EPT-2’s lead over ECMWF HRES on 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation is validated against more than 10,000 ground stations on open-source StationBench and published in peer-reviewed reports on arXiv. The benchmark is live, and the numbers can be checked on your own region and variables.
See EPT-2 head-to-head against your current forecast provider in a live benchmark.