Written by: Olivier Lam, Physical AI Team, Jua.ai AG
Key Takeaways for Trading Teams
- Day-ahead solar forecast accuracy directly shapes imbalance penalties and trading spreads, so update frequency, probabilistic skill, and transparent benchmarking matter.
- Legacy NWP models typically run only 2–4 times per day because of high compute costs, which leaves traders working with stale forecasts between cycles.
- Jua’s EPT-2 foundation model delivers rapid-refresh forecasts up to 24 times daily at a fraction of NWP cost while outperforming ECMWF HRES on surface solar radiation across all lead times.
- The Jua platform unifies 25+ models under a single schema and AI agent (Athena), turning fragmented tools into one workspace that refreshes automatically.
- Schedule a consultation with Jua to benchmark day-ahead solar forecasts on your own regions and variables in minutes.
Problem: Stale Forecasts Between NWP Runs
Global numerical weather prediction (NWP) underpins ECMWF HRES, NOAA GFS, and DWD ICON by decomposing the atmosphere into three-dimensional grid cells and solving differential equations inside each one. A single simulation consumes approximately 8,400 kWh and costs €1,000–€20,000 on high-performance computing infrastructure. That compute ceiling limits operational runs to two to four per 24-hour period. Between runs, the solar forecast a trader sees can be six hours old or more, which means the market often moves on information the desk does not yet have. Research on intraday electricity markets shows that solar forecast errors can increase traded intraday volume and that wind forecast errors can widen the day-ahead versus intraday price spread, a direct measure of the cost of stale data.
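The staleness described above follows from simple arithmetic on the run schedule. The Python sketch below computes the worst-case age of the newest available forecast, assuming runs are evenly spaced over 24 hours. The `delivery_delay_hours` parameter is a hypothetical addition for illustration and does not reflect any provider's published dissemination schedule.

```python
def worst_case_age_hours(runs_per_day: int, delivery_delay_hours: float = 0.0) -> float:
    """Worst-case age, in hours, of the newest forecast a desk can see.

    Assumes evenly spaced runs. `delivery_delay_hours` is a hypothetical
    lag between a run's initialisation time and its availability.
    """
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    interval_hours = 24.0 / runs_per_day
    # Just before the next run arrives, the newest forecast is one full
    # interval old, plus whatever delay it already carried at delivery.
    return interval_hours + delivery_delay_hours


for runs in (2, 4, 24):
    print(f"{runs} runs/day -> up to {worst_case_age_hours(runs):.1f} h old")
```

With four runs per day and no delivery delay, the newest forecast can be up to six hours old, which matches the figure quoted above. Any real dissemination lag adds to that.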
Solution: Rapid Refresh and Native Any-Δt Forecasting with EPT
Jua is a foundation model and agent company, and Jua for Energy is its first applied product. The product is built on the Earth Physics Transformer (EPT) family, a general spatiotemporal transformer foundation model that learns the governing physics of complex systems directly from observational data, and on Athena, an AI agent. EPT-2, the deterministic flagship, produces forecasts at native any-Δt and is trained to predict at arbitrary lead times rather than rolling forward in fixed 6-hour increments. This architecture matters because Aurora and most AI peers roll forward in 6-hour steps, compounding error at each step, while EPT-2 avoids that error cascade entirely. The EPT family then splits into operational variants: EPT-2 RR focuses on rapid refresh and updates up to 24 times per day, and EPT-2e focuses on probabilistic outlooks and updates 4 times per day. Rapid-refresh variants cover shorter horizons than the 20-day deterministic flagship, and users who need extended probabilistic outlooks use EPT-2e, which runs to 60 days.
Problem: Fragmented Tools and Manual Workflow Integration
The standard energy-trading workflow for solar forecasting often involves downloading raw GRIB files from ECMWF or GFS, processing them through in-house pipelines, cross-referencing vendor dashboards, and manually assembling a desk view before market open. Each step introduces a potential failure point and a delay. Pipeline maintenance consumes engineering capacity that could support strategy development. Vendor dashboards rarely share a schema, which complicates cross-model comparison. When a model revises mid-cycle, the trader frequently discovers the change only after the market has already re-priced.
Solution: Agent-Assisted Single Workspace for Trading Desks
Athena, Jua’s AI agent instrumented with the Jua for Energy tool surface, accepts a natural-language objective and returns a briefing, benchmark, backtest, or custom widget. Typical queries resolve in approximately 90 seconds. The Jua platform exposes 25+ models, including 10 proprietary EPT-family variants and 15 third-party NWP and AI models (ECMWF HRES, ECMWF ENS, ECMWF AIFS, NOAA GFS, DWD ICON, Microsoft Aurora, GFS GraphCast, and others), through a single unified schema and a single REST API with Apache Arrow support. The 7–9 a.m. manual prep routine compresses into one workspace that refreshes on the cadence of the underlying physics rather than on manual downloads.
Problem: Difficulty Benchmarking Provider Accuracy
Most day-ahead solar forecast providers do not publish region-specific skill scores against independent ground-truth observations. Vendor-provided graphics do not replace head-to-head accuracy comparisons on the variables and geographies that drive a specific book. Empirical studies show that accuracy varies substantially by model and regime, and this spread remains invisible without transparent benchmarking.
Solution: Live Cross-Model Benchmarking Surface on Jua
The Jua platform’s benchmarking surface places 25+ models on a single interface. A user selects any region, any variable, and any time window, and the platform returns a head-to-head accuracy comparison in seconds. Benchmarks are evaluated against more than 10,000 real ground stations using the open-source StationBench methodology, with no post-processing or station fine-tuning. EPT-2 outperforms ECMWF HRES on surface solar radiation (SSRD) across the full 0–240 hour lead-time range, as documented in the EPT-2 technical report (arXiv:2507.09703). Microsoft Aurora has no SSRD output at all, so EPT-2 is currently the only AI model in the comparison set with a published solar radiation benchmark.
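A head-to-head comparison of this kind reduces to scoring each model's forecasts against the same station observations, grouped by lead time. The sketch below illustrates the idea with RMSE in plain Python. The model names, lead times, and irradiance values are hypothetical, and the code is not the StationBench implementation or the Jua API.

```python
import math
from collections import defaultdict

# Hypothetical records: (model, lead_hours, forecast, station_observation),
# with irradiance in W/m^2. These are illustrative values, not real benchmark data.
records = [
    ("model_a", 24, 510.0, 500.0),
    ("model_a", 24, 300.0, 320.0),
    ("model_a", 48, 450.0, 500.0),
    ("model_b", 24, 540.0, 500.0),
    ("model_b", 24, 290.0, 320.0),
    ("model_b", 48, 520.0, 500.0),
]


def rmse_by_model_and_lead(rows):
    """Return {(model, lead_hours): RMSE} over all matching records."""
    squared_errors = defaultdict(list)
    for model, lead_hours, forecast, observation in rows:
        squared_errors[(model, lead_hours)].append((forecast - observation) ** 2)
    return {
        key: math.sqrt(sum(values) / len(values))
        for key, values in squared_errors.items()
    }


scores = rmse_by_model_and_lead(records)
for (model, lead_hours), score in sorted(scores.items()):
    print(f"{model} +{lead_hours}h RMSE = {score:.1f} W/m^2")
```

Grouping by lead time matters because a model that leads at 24 hours can trail at 48 hours, as the toy data here shows.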
Problem: High Compute Cost Limiting Update Frequency
The economics of HPC infrastructure create a hard constraint on NWP refresh frequency. At approximately 8,400 kWh and €1,000–€20,000 per simulation, running a full global NWP more than four times per day is not economically viable for any operational center. The energy industry has operated under this constraint for forty years, which has normalized infrequent updates even when markets move faster.
Solution: GPU-Efficient Foundation Model Inference
A single EPT-2 inference runs on a single GPU in minutes at approximately 0.25 kWh and $0.20–$15 per simulation. Measured against the roughly 8,400 kWh of an NWP run, that is about four orders of magnitude less energy, and the per-run cost is lower by two to five orders of magnitude depending on where each run falls in its quoted range. EPT-2 was trained on 8 × H100 GPUs over 10 days, while Microsoft Aurora required 32 × A100 GPUs over 18 days. The cost asymmetry at inference enables up to 24 refreshes per day for EPT-2 RR without an HPC cluster and without compromising forecast quality. Frequent updates, however, only solve part of the trading problem, because traders still need probabilistic information to size bids and hedges.
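The size of the gap can be checked directly from the figures quoted in this article. The sketch below expresses the energy and cost ratios as orders of magnitude. It treats the euro and the dollar as roughly at par, which is an assumption made only to keep the illustration simple.

```python
import math

# Figures quoted in the text.
NWP_ENERGY_KWH = 8_400.0
EPT2_ENERGY_KWH = 0.25
NWP_COST_RANGE = (1_000.0, 20_000.0)  # EUR per simulation
EPT2_COST_RANGE = (0.20, 15.0)        # USD per simulation


def orders_of_magnitude(ratio: float) -> float:
    """Base-10 logarithm of a ratio, i.e. how many powers of ten it spans."""
    return math.log10(ratio)


energy_ratio = NWP_ENERGY_KWH / EPT2_ENERGY_KWH

# Assumption: EUR and USD treated as at par, for illustration only.
# Narrowest gap: cheapest NWP run against the most expensive EPT-2 run.
cost_ratio_low = NWP_COST_RANGE[0] / EPT2_COST_RANGE[1]
# Widest gap: most expensive NWP run against the cheapest EPT-2 run.
cost_ratio_high = NWP_COST_RANGE[1] / EPT2_COST_RANGE[0]

print(f"energy: {energy_ratio:,.0f}x ({orders_of_magnitude(energy_ratio):.1f} orders)")
print(f"cost:   {cost_ratio_low:,.0f}x to {cost_ratio_high:,.0f}x "
      f"({orders_of_magnitude(cost_ratio_low):.1f} to "
      f"{orders_of_magnitude(cost_ratio_high):.1f} orders)")
```

The energy ratio is fixed by the two quoted values, while the cost ratio depends heavily on which end of each quoted range applies.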
Problem: Gap Between Raw Output and Action-Ready Analysis
Point forecasts, which provide a single expected generation value per hour, offer no information about the distribution of possible outcomes. An independent power producer bidding into the day-ahead market on a deterministic 80 MW forecast, when a probabilistic forecast shows a 40% chance of output falling below 70 MW, is systematically exposed to imbalance penalties that a quantile-aware bid would have hedged. Point forecasts create costly errors in market positions and settlement exposure for utility-scale operators, and probabilistic P10/P90 envelopes are operationally required for nomination construction in day-ahead bidding across European exchanges.
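The 80 MW example can be made concrete with a small ensemble. The sketch below uses ten hypothetical members, chosen so that the mean is 80 MW and four of the ten fall below 70 MW, and derives the exceedance probability and a P10/P50/P90 envelope. The quantile method (linear interpolation between order statistics) is one common convention, not necessarily the one any exchange or vendor uses.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    if not sorted_values:
        raise ValueError("need at least one value")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def prob_below(values, threshold):
    """Share of ensemble members strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


# Hypothetical ensemble of hourly output in MW (illustrative values only).
members_mw = [60.0, 64.0, 67.0, 69.0, 80.0, 84.0, 88.0, 92.0, 96.0, 100.0]

ordered = sorted(members_mw)
mean_mw = sum(members_mw) / len(members_mw)
p10, p50, p90 = (quantile(ordered, q) for q in (0.1, 0.5, 0.9))

print(f"mean = {mean_mw:.1f} MW, P(output < 70 MW) = {prob_below(members_mw, 70.0):.0%}")
print(f"P10 = {p10:.1f} MW, P50 = {p50:.1f} MW, P90 = {p90:.1f} MW")
```

A desk that sees only the 80 MW mean has no way to know that two in five scenarios sit more than 10 MW below it.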
Solution: Probabilistic Outputs and Automated Trading Briefings
EPT-2e, the ensemble variant of the EPT family, beats the 50-member ECMWF ENS mean on both RMSE and CRPS (Continuous Ranked Probability Score, a proper scoring rule that rewards calibrated probabilistic forecasts) at virtually every lead time, as documented in arXiv:2410.15076 and confirmed in arXiv:2507.09703. No AI weather peer currently ships a productised ensemble equivalent. Day-ahead briefings on the Jua platform auto-refresh on every new model run and cover model consensus across 25+ models, model delta since the previous run, convergence tracking, and price implications. Traders receive a written summary before the market opens rather than assembling it manually.
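The briefing components named above (consensus, delta since the previous run, and convergence) are simple statistics over per-model forecasts. The sketch below shows one way to compute them. The model names and values are hypothetical, and this is not a description of how the Jua platform builds its briefings.

```python
from statistics import mean, pstdev

# Hypothetical forecasts of the same target hour from two successive runs
# (illustrative values, e.g. irradiance in W/m^2).
previous_run = {"model_a": 410.0, "model_b": 395.0, "model_c": 430.0}
latest_run = {"model_a": 380.0, "model_b": 390.0, "model_c": 400.0}


def consensus(run):
    """Cross-model mean for one run."""
    return mean(list(run.values()))


def spread(run):
    """Cross-model population standard deviation for one run."""
    return pstdev(list(run.values()))


def deltas(previous, latest):
    """Per-model change since the previous run, for models present in both."""
    return {name: latest[name] - previous[name] for name in latest if name in previous}


run_deltas = deltas(previous_run, latest_run)
# A shrinking spread means the models are converging on a common answer.
converging = spread(latest_run) < spread(previous_run)

print(f"consensus: {consensus(previous_run):.1f} -> {consensus(latest_run):.1f}")
print(f"deltas: {run_deltas}")
print(f"converging: {converging}")
```

In this toy case every model revised downward and the cross-model spread narrowed, which is the kind of signal a trader would want surfaced before the market re-prices.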
How Accurate Are Day-Ahead Solar Forecasts for Trading?
Concrete benchmarks anchor evaluation for trading use. EPT-2 outperforms ECMWF HRES on surface solar radiation across the full 0–240 hour lead-time range, evaluated against more than 10,000 ground stations on open-source StationBench (arXiv:2507.09703). The same benchmarking work shows that EPT-2e maintains an advantage over ECMWF ENS on RMSE and CRPS at almost every lead time. Aurora produces no SSRD output, which removes it from the solar-specific comparison entirely.
The operational implication is direct. A 1 GW solar portfolio that gains four percentage points of forecast accuracy saves approximately €3 M per year under typical hedging and imbalance penalty structures. Scaled linearly by capacity, a hypothetical 500 MW portfolio would save approximately €1.5 M per year. The Jua platform supports up to 1 km resolution for product-level power forecasts, which enables asset-level granularity for large portfolios. Studies on the ERCOT grid demonstrate the level of day-ahead solar forecast accuracy that foundation models can achieve at the 24-hour horizon, a reference point for the accuracy regime that systematic trading strategies require.
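The portfolio scaling above is plain proportionality, sketched below. It assumes the saving scales linearly with installed capacity for the same four-point accuracy gain. That is a simplification, since real penalty structures vary by market and portfolio.

```python
REFERENCE_CAPACITY_MW = 1_000.0      # the 1 GW reference portfolio from the text
REFERENCE_SAVING_EUR = 3_000_000.0   # approximate annual saving quoted for it


def scaled_annual_saving_eur(capacity_mw: float) -> float:
    """Annual saving for a portfolio of the given size.

    Assumption: linear scaling with capacity, for the same four-point
    accuracy gain as the reference case.
    """
    if capacity_mw < 0:
        raise ValueError("capacity_mw must be non-negative")
    return REFERENCE_SAVING_EUR * capacity_mw / REFERENCE_CAPACITY_MW


print(f"500 MW -> EUR {scaled_annual_saving_eur(500.0):,.0f} per year")
```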
Neutral Comparison of Day-Ahead Solar Forecast Providers
The following comparison highlights how Jua’s approach differs from legacy NWP and AI peers on three trading-relevant dimensions: update frequency, probabilistic skill for solar radiation, and cost per run, which together shape how often you can refresh, how confidently you can size bids, and whether rapid refresh is economically viable.
| Provider | Update Frequency | Probabilistic Skill (SSRD) | Cost per Run |
|---|---|---|---|
| Jua for Energy (EPT-2 / EPT-2e) | Up to 24×/day (EPT-2 RR); EPT-2e 4×/day | EPT-2e beats 50-member ECMWF ENS mean on RMSE and CRPS at virtually every lead time (arXiv:2507.09703); EPT-2 beats ECMWF HRES on SSRD 0–240 h | ~$0.20–$15 per simulation, ~0.25 kWh, single GPU |
| ECMWF HRES / ENS (NWP incumbent) | 2–4×/day | ENS: 50-member gold standard for probabilistic NWP; HRES: deterministic benchmark for 40 years | ~€1,000–€20,000 per simulation, ~8,400 kWh, HPC cluster |
| Microsoft Aurora (AI peer) | Typically 4×/day (research cadence; no productised operational schedule) | No productised ensemble; no SSRD output published | Similar inference order of magnitude to Jua; no productised cost schedule |
| GFS GraphCast / ECMWF AIFS (AI peers) | Typically 4×/day (research cadence) | No productised ensemble equivalent; raw outputs without probabilistic tooling | Similar inference order of magnitude; no productised cost schedule |
Point-solution SaaS vendors and meteorology consultancies are not included in the table because they do not own a forecasting model and cannot provide like-for-like inference cost or probabilistic skill figures. They resell processed NWP outputs or produce analyst reports, so the comparison is categorical rather than metric.
Risks and Due Diligence When Evaluating Day-Ahead Solar Forecast Providers
Four due-diligence criteria apply when evaluating any day-ahead solar forecast provider for trading use.
Peer-reviewed benchmarks. First, demand published accuracy results evaluated against independent ground-truth observations rather than relying on vendor graphics. EPT-2 and EPT-2e results are documented in peer-reviewed technical reports at arXiv:2507.09703 and arXiv:2410.15076. Providers unable to point to equivalent external validation should be treated with caution.
Hindcast availability. Benchmarks alone, however, do not enable systematic evaluation, which requires hindcast availability. Backtesting a trading strategy needs years of historical forecast data at the same cadence and schema as the live feed. Providers that cannot supply hindcasts cannot be evaluated systematically. The Jua platform provides hindcast data across multiple Jua and third-party models, accessible via pip install jua and the REST API.
Integration requirements. Once you have confirmed both accuracy and historical data access, the next criterion is integration. A forecast that cannot be ingested into existing dispatch, risk, or trading systems without a multi-quarter engineering project is not operationally viable. Evaluate schema stability, API documentation quality, large-payload support (with Apache Arrow as the relevant standard for continental multi-model queries), and ENTSO-E grid-data integration for European power markets.
StationBench methodology. Finally, examine the evaluation methodology itself. The open-source StationBench evaluation framework benchmarks forecast models against more than 10,000 real ground stations with no post-processing or station fine-tuning. It is the methodology Jua uses for EPT-2 and EPT-2e evaluation and is available for independent replication. Providers evaluated only on gridded reanalysis comparisons, rather than against real station observations, may report inflated skill scores.
Frequently Asked Questions
What is a day-ahead solar forecast and why does it matter for energy trading?
A day-ahead solar forecast is a quantitative prediction of surface solar irradiance or photovoltaic generation output for each hour of the following trading day, produced before day-ahead market gate closure. Energy traders, utility dispatchers, and balancing-responsible parties use it to set generation bids, schedule reserve capacity, and manage imbalance exposure. Errors in the forecast translate directly into imbalance penalties, which are charges levied when actual generation deviates from the scheduled position, and into missed intraday spreads when a model revision moves the price after the trade window has closed. The €3 M annual saving for a 1 GW portfolio discussed earlier illustrates how even modest accuracy gains change P&L.
What inputs and data sources go into a day-ahead solar forecast?
Operational day-ahead solar forecasts combine multiple data streams. NWP models provide the atmospheric state, including cloud cover, surface solar radiation, temperature, and humidity at grid resolution. Satellite imagery, both geostationary and polar-orbiting, provides near-real-time cloud observations used for nowcasting and model initialization. Ground station networks (SYNOP, METAR, proprietary feeds) provide surface irradiance measurements for calibration and validation. Machine learning post-processing layers correct systematic NWP biases and downscale outputs to asset-level resolution. The EPT family is trained on more than 5 petabytes of weather and climate data from 120+ distinct sources, including all of the above plus ocean buoys, national radar networks, and ERA5 reanalysis, and covers more than 10,000 proprietary stations in EPT-2.
How should I evaluate probabilistic skill in a day-ahead solar forecast?
Two metrics are standard when evaluating an ensemble forecast, and only one of them measures probabilistic skill. RMSE (Root Mean Square Error) measures the deterministic accuracy of a point forecast such as the ensemble mean, and lower values indicate better performance. CRPS (Continuous Ranked Probability Score) is a proper scoring rule for probabilistic forecasts that rewards calibrated uncertainty estimates and penalizes forecasts that assign high probability to outcomes that do not occur. Lower CRPS is also better. The benchmarking results detailed earlier, including EPT-2e’s advantage over ECMWF ENS on both RMSE and CRPS, are evaluated on open-source StationBench against more than 10,000 ground stations. For trading applications, CRPS is often the more operationally relevant metric because it directly measures the quality of the probability distribution used to construct quantile bids (P10/P50/P90) and size imbalance hedges.
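For a finite ensemble, CRPS is commonly estimated as the mean absolute error of the members against the observation, minus half the mean absolute difference between all member pairs. The sketch below implements that standard estimator in plain Python. The ensemble values are hypothetical, and this is not the StationBench code.

```python
def crps_ensemble(members, observation):
    """Empirical CRPS of an ensemble against a single observation.

    Uses the estimator  mean|x_i - y| - 0.5 * mean|x_i - x_j|,
    where the second mean runs over all ordered member pairs.
    """
    if not members:
        raise ValueError("need at least one ensemble member")
    m = len(members)
    accuracy_term = sum(abs(x - observation) for x in members) / m
    spread_term = sum(abs(xi - xj) for xi in members for xj in members) / (2 * m * m)
    return accuracy_term - spread_term


# Hypothetical ensembles with the same mean (80 MW) but different sharpness.
observed_mw = 80.0
sharp_ensemble = [78.0, 80.0, 82.0]
wide_ensemble = [60.0, 80.0, 100.0]

print(f"sharp: CRPS = {crps_ensemble(sharp_ensemble, observed_mw):.2f} MW")
print(f"wide:  CRPS = {crps_ensemble(wide_ensemble, observed_mw):.2f} MW")
```

Both ensembles have the same mean and therefore the same ensemble-mean error for this observation, yet the sharper one scores a lower CRPS. With a single member the estimator reduces to the absolute error, which is why CRPS is often described as a probabilistic generalisation of MAE.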
Can I integrate Jua for Energy forecasts into my existing trading pipeline?
Jua for Energy exposes all 25+ models through a REST API with Apache Arrow support for large payloads and a Python SDK installable via pip install jua. Hindcast data is available across multiple Jua and third-party models for backtesting. ENTSO-E grid data is integrated directly for European power markets. The unified schema means that switching between EPT-2, ECMWF HRES, Aurora, or any other model on the platform does not require re-engineering the ingestion pipeline. Quant teams at capital-markets funds pipe Jua forecasts directly into systematic models, and utilities and trading houses connect to existing dispatch and risk systems, so integration that might take a quarter elsewhere typically stands up in days.
What are the limitations of AI-based day-ahead solar forecasts compared to NWP?
AI weather models trained without physics constraints can produce outputs that violate conservation laws such as mass, momentum, and energy, which makes them unsafe to trade on without validation. EPT addresses this by design and uses a spatiotemporal transformer foundation model that learns the governing physics of complex systems directly from observational data in a latent representation constrained by those conservation laws. The validation is external and reproducible, because EPT-2 is benchmarked against more than 10,000 real ground stations on open-source StationBench, with results published in peer-reviewed technical reports. The remaining limitation shared by all forecast models is irreducible atmospheric uncertainty at longer lead times, which is why probabilistic outputs from EPT-2e are more operationally useful than point forecasts for day-ahead trading decisions.
Conclusion: Applying Evaluation Criteria to Day-Ahead Solar Forecast Selection
Legacy NWP day-ahead solar forecasts leave traders exposed to imbalance penalties and missed spreads through four structural constraints: update frequency capped at two to four runs per day by HPC economics, point-only outputs that provide no probabilistic context for bid sizing, fragmented workflows that require manual GRIB processing and dashboard stitching, and the absence of transparent, region-specific benchmarking surfaces. Physics foundation models with agent-assisted platforms address all four constraints at once and align the forecast surface with trading needs.
The five evaluation criteria that follow from this analysis are clear. Desks should focus on update frequency and any-Δt capability, probabilistic skill measured by CRPS against independent ground stations, workflow integration via a unified API and SDK, hindcast availability for systematic backtesting, and benchmarking transparency using a methodology like StationBench that is reproducible and independent of the vendor. The benchmarking results detailed earlier, including EPT-2’s advantage over HRES and EPT-2e’s advantage over ENS, are documented in peer-reviewed technical reports at arXiv:2507.09703 and arXiv:2410.15076.