Written by: Olivier Lam, Physical AI Team, Jua.ai AG
Key takeaways for evaluating Atmo AI vs EPT-2
- Atmo AI has not published peer-reviewed benchmarks on RMSE, CRPS, or ensemble skill, so independent verification is not possible.
- EPT-2 outperforms ECMWF HRES on RMSE at every lead time from 0 to 240 h for 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation.
- EPT-2e beats the 50-member ECMWF ENS on both ensemble-mean RMSE and CRPS at virtually every lead time, and EPT-2 RR updates up to 24× per day.
- A forecast accuracy gain of four percentage points can save approximately €1.5 M per year on a 1 GW wind portfolio and €3 M per year on a 1 GW solar portfolio.
- Run EPT-2 against your current provider on the Jua platform and see the gap on your own data.
How Atmo AI presents its weather forecasting claims
Atmo presents itself as an AI-native weather forecasting provider for commercial and operational use. Its public materials describe high accuracy and fast inference. As of May 2026, no peer-reviewed technical report, arXiv preprint, or independent benchmark documents Atmo’s RMSE, CRPS, ensemble spread-skill ratio, or verification methodology against ground-truth observations.
That absence has direct operational consequences. Meteorologists at regulated utilities and physical trading houses evaluate forecast systems against real station data, not vendor graphics. Research on Pangu-Weather shows that even well-resourced AI weather models trained exclusively on reanalysis data rather than real observational inputs require careful benchmark validation before anyone can rely on them operationally. Without a published methodology, teams cannot run that evaluation for Atmo AI.
EPT-2, the deterministic flagship inside Jua for Energy, follows a different path. EPT (Earth Physics Transformer) is a general spatiotemporal transformer foundation model that learns the governing physics of complex systems, including conservation of mass, momentum, and energy, directly from observational data. The architecture is domain-agnostic. Atmospheric prediction is the first physical system it has been fine-tuned for. EPT-2 and its ensemble variant EPT-2e are documented in peer-reviewed technical reports on arXiv:2507.09703 and arXiv:2410.15076, benchmarked against more than 10,000 real ground stations using open-source StationBench, with no post-processing or station fine-tuning. That documentation makes a direct comparison possible.
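A claim such as "outperforms at every lead time" comes down to a per-lead-time RMSE comparison against station observations. The sketch below shows that computation under stated assumptions: forecasts and observations are already matched to the same stations and valid times in arrays shaped (initialisation, lead, station), and all data is synthetic. It is not StationBench's code or API, and the function names are hypothetical.

```python
import numpy as np

def rmse_by_lead(forecast: np.ndarray, obs: np.ndarray) -> np.ndarray:
    """RMSE per lead time.

    forecast, obs: shape (n_init, n_lead, n_station); obs[i, l, s] is the
    station observation valid at the same time as forecast[i, l, s].
    NaNs in obs (missing station reports) are ignored.
    """
    sq_err = (forecast - obs) ** 2
    # Average over initialisation times and stations, keep the lead-time axis
    return np.sqrt(np.nanmean(sq_err, axis=(0, 2)))

def wins_at_every_lead(model: np.ndarray, baseline: np.ndarray, obs: np.ndarray) -> bool:
    """True if `model` has strictly lower RMSE than `baseline` at every lead time."""
    return bool(np.all(rmse_by_lead(model, obs) < rmse_by_lead(baseline, obs)))

# Synthetic example: 30 initialisations, 41 leads (0-240 h every 6 h), 200 stations
rng = np.random.default_rng(0)
obs = rng.normal(size=(30, 41, 200))
lead_growth = np.linspace(0.5, 2.0, 41)[None, :, None]  # error grows with lead time
model = obs + rng.normal(size=obs.shape) * 0.8 * lead_growth
baseline = obs + rng.normal(size=obs.shape) * 1.0 * lead_growth

print(wins_at_every_lead(model, baseline, obs))
```

Averaging over stations and initialisations, but never over lead times, is what makes a statement about "every lead time" checkable.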
Atmo AI vs EPT-2 benchmarks for energy trading variables
The table below compares published benchmark performance across the four variables most relevant to energy trading. EPT-2 figures come from arXiv:2507.09703. Atmo AI figures appear as not published because no peer-reviewed source exists as of the date of this article. ECMWF HRES serves as the reference NWP baseline, and Microsoft Aurora is included as a second AI model for comparison.
| Model | 10 m Wind / 100 m Wind (RMSE vs HRES, 0–240 h) | 2 m Temperature (RMSE vs HRES, 0–240 h) | Surface Solar Radiation (RMSE vs HRES, 0–240 h) |
|---|---|---|---|
| EPT-2 | Outperforms ECMWF HRES on every lead time across full 0–240 h range | Outperforms ECMWF HRES on every lead time across full 0–240 h range | Outperforms ECMWF HRES on every lead time across full 0–240 h range |
| ECMWF HRES | Benchmark, 40 years of NWP leadership | Benchmark, 40 years of NWP leadership | Benchmark, 40 years of NWP leadership |
| Microsoft Aurora | Loses to EPT-2 on 10 m and 100 m wind across full 0–240 h range | Loses to EPT-2 up to ~130 h lead time | No SSRD output published |
| Atmo AI | Not published, no peer-reviewed benchmark available | Not published, no peer-reviewed benchmark available | Not published, no peer-reviewed benchmark available |
EPT-2e, the ensemble variant, beats the 50-member ECMWF ENS on both ensemble-mean RMSE and CRPS at virtually every lead time, as documented in arXiv:2507.09703. No equivalent ensemble benchmark exists for Atmo AI. ECMWF researchers show that ensemble-mean forecasts reduce overall RMSE by lowering noise error through smoothing while retaining the same information error as individual perturbed members. A low ensemble-mean RMSE can therefore come from smoothing alone and says nothing about whether the ensemble's spread is calibrated. CRPS scores the full forecast distribution, so ensemble CRPS skill, not just mean RMSE, is the correct metric for probabilistic forecast evaluation. EPT-2e is documented on both metrics.
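The relationship between ensemble-mean RMSE and CRPS can be illustrated on synthetic data. This minimal sketch assumes a toy setup (a predictable signal plus independent Gaussian noise for the observation and for each of 50 members) and uses the standard empirical-distribution form of the ensemble CRPS, CRPS = E|X − y| − ½E|X − X′|. It is not the verification code behind the cited reports.

```python
import numpy as np

def rmse(forecast: np.ndarray, obs: np.ndarray) -> float:
    """Root-mean-square error over all samples."""
    return float(np.sqrt(np.mean((forecast - obs) ** 2)))

def crps_ensemble(members: np.ndarray, obs: np.ndarray) -> float:
    """Mean CRPS of an ensemble's empirical distribution.

    members: shape (n_members, n_samples); obs: shape (n_samples,).
    """
    # Accuracy term: average absolute error of the members against the observation
    accuracy = np.mean(np.abs(members - obs[None, :]), axis=0)
    # Spread term: average absolute difference over all pairs of members
    spread = np.mean(np.abs(members[:, None, :] - members[None, :, :]), axis=(0, 1))
    return float(np.mean(accuracy - 0.5 * spread))

# Toy setup: predictable signal plus unpredictable noise (illustrative only)
rng = np.random.default_rng(42)
n_members, n_samples = 50, 500
signal = rng.normal(size=n_samples)
obs = signal + rng.normal(scale=0.5, size=n_samples)
members = signal[None, :] + rng.normal(scale=0.5, size=(n_members, n_samples))
ens_mean = members.mean(axis=0)

print("mean member RMSE:  ", np.mean([rmse(m, obs) for m in members]))
print("ensemble-mean RMSE:", rmse(ens_mean, obs))
print("ensemble CRPS:     ", crps_ensemble(members, obs))
print("ens-mean as point forecast, CRPS:", crps_ensemble(ens_mean[None, :], obs))
```

Averaging the members lowers RMSE because member noise partly cancels, which is the smoothing effect described above. Scored as a single point forecast, the ensemble mean's CRPS collapses to its mean absolute error and discards the spread information, so in this toy setup the full ensemble scores a lower (better) CRPS.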
Atmo AI accuracy and the economics of energy trading
Forecast accuracy flows straight into trading economics. A 1 GW wind portfolio that gains four percentage points of forecast accuracy saves approximately €1.5 M per year under typical hedging and imbalance penalty structures. A 1 GW solar portfolio at the same accuracy gain saves approximately €3 M per year. Multi-GW portfolios scale these figures linearly. Accurate short-term weather forecasts let traders anticipate renewable supply conditions and market volatility, avoid penalties for under- or over-performance, and time storage for maximum economic benefit.
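The linear scaling described above can be written as a small calculator. This sketch hard-codes only the two per-GW figures quoted in this article, assumes they apply at the stated four-point accuracy gain, and does not model other accuracy gains or market structures. The function name and the portfolio mix in the example are hypothetical.

```python
# Annual savings per GW at a four-percentage-point accuracy gain, as quoted above
SAVINGS_PER_GW_EUR = {"wind": 1.5e6, "solar": 3.0e6}

def annual_savings_eur(capacity_gw: dict[str, float]) -> float:
    """Estimated annual savings for a portfolio, scaling linearly with capacity.

    Only valid for the four-point accuracy gain quoted in the text.
    """
    return sum(SAVINGS_PER_GW_EUR[tech] * gw for tech, gw in capacity_gw.items())

# Hypothetical portfolio: 2.5 GW wind and 1 GW solar
print(annual_savings_eur({"wind": 2.5, "solar": 1.0}))  # -> 6750000.0
```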
Without published RMSE or CRPS figures, no one can place Atmo AI on this economic scale. A vendor that claims accuracy without a verification methodology cannot be evaluated against a 1 GW wind book or a day-ahead imbalance position. The evaluation requires numbers, which is precisely why Jua built a platform that makes those numbers immediately accessible for any model that publishes them.
Jua for Energy provides those numbers through a live benchmarking surface that puts 25+ models, including 10 proprietary AI models from the EPT family plus 15 third-party NWP and AI models such as ECMWF HRES, ECMWF ENS, Microsoft Aurora, and GFS GraphCast, on a single platform. This unified access means a meteorologist or quant developer can select any region, any variable, and any time window and receive a head-to-head accuracy comparison in seconds, without data wrangling. For users who prefer natural language over manual selection, Athena, Jua’s AI agent instrumented with the Jua for Energy tool surface, resolves a natural-language benchmark query in approximately 90 seconds and can run a full multi-year backtest in approximately 5 minutes.
Atmo AI update frequency and trading windows
Update frequency sets a hard boundary for how often traders can refresh their view of the atmosphere. A single traditional NWP simulation consumes approximately 8,400 kWh of electricity and costs €1,000–€20,000 to run on HPC infrastructure. That cost profile caps ECMWF HRES at two to four global runs per day, a ceiling the energy industry has operated under for forty years. Between runs, traders work from stale numbers.
A single EPT-2 inference runs on a single GPU in minutes at approximately 0.25 kWh and $0.20–$15. On energy alone, that is a factor of roughly 33,600 (8,400 kWh vs 0.25 kWh), more than four orders of magnitude, and the asymmetry makes high-frequency operational refresh economically viable. Atmo AI has not published its inference cost, update schedule, or ensemble availability in any peer-reviewed source. The table below shows how this cost advantage translates into operational capabilities that matter for trading: update frequency, ensemble availability, and workflow integration.
| System | Update Frequency | Ensemble Availability | Workflow Tooling |
|---|---|---|---|
| EPT-2 / EPT-2 RR (Jua for Energy) | Up to 24×/day (EPT-2 RR); EPT-2e 4×/day; native resolution up to 5 km | EPT-2e: beats 50-member ECMWF ENS on ensemble-mean RMSE and CRPS at virtually every lead time | REST API, Python SDK (pip install jua), Athena agent, 25+ model benchmarking surface, divergence and correction alerts |
| ECMWF HRES / ENS | 2–4×/day | ENS: 50 members, gold standard for probabilistic NWP | GRIB files via MARS, individual member access, no productised cross-vendor benchmarking |
| Microsoft Aurora / GFS GraphCast | Typically 4×/day, no productised operational schedule | No productised ensemble equivalent published | Research code or limited API, no agent layer, no benchmarking surface |
| Atmo AI | Not published | Not published | Not published in peer-reviewed sources |
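The energy figures quoted above can be checked with a few lines of arithmetic. This sketch compares energy only, because the cost ranges are quoted in different currencies, and it assumes, for illustration, that each of 24 daily refreshes uses the same 0.25 kWh as a single EPT-2 inference.

```python
import math

NWP_KWH_PER_RUN = 8_400   # traditional NWP simulation, as quoted in this article
EPT2_KWH_PER_RUN = 0.25   # single-GPU EPT-2 inference, as quoted in this article

ratio = NWP_KWH_PER_RUN / EPT2_KWH_PER_RUN
print(f"Per run: {ratio:,.0f}x less energy ({math.log10(ratio):.2f} orders of magnitude)")

# Daily energy: upper end of 2-4 NWP runs vs 24 refreshes
# (assumes each refresh costs the same as one EPT-2 inference)
daily_nwp_kwh = 4 * NWP_KWH_PER_RUN
daily_ai_kwh = 24 * EPT2_KWH_PER_RUN
print(f"Per day: {daily_nwp_kwh:,} kWh for 4 NWP runs vs {daily_ai_kwh} kWh for 24 refreshes")
```

Even under this assumption, 24 daily refreshes use a small fraction of the energy of a single traditional NWP run.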
EPT-2 RR updates up to 24 times per day, which gives traders up to 24 chances a day to catch market-moving weather shifts. For users who need both high frequency and high spatial detail, EPT-2 HRRR delivers the same high-cadence refresh at up to 5 km native resolution over Europe. The platform then converts these atmospheric forecasts into actual-generation power forecasts that refresh every 15 minutes, and the product surface supports up to 1 km resolution for site-specific analysis. Because inference completes in minutes on a single GPU, a typical Jua run completes approximately 2.5 hours ahead of competing operational runs from the same cycle.
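The practical effect of update frequency can be expressed as the worst-case age of the freshest forecast a trader holds. This sketch assumes evenly spaced runs and a fixed delivery latency; the latency parameter is illustrative, and the function name is hypothetical.

```python
def max_forecast_age_hours(runs_per_day: int, latency_hours: float = 0.0) -> float:
    """Worst-case age, in hours since initialisation, of the freshest delivered forecast.

    Assumes runs are evenly spaced through the day and each run is delivered
    `latency_hours` after its initialisation time.
    """
    if runs_per_day <= 0:
        raise ValueError("runs_per_day must be positive")
    # Just before run k+1 is delivered, the freshest forecast is run k,
    # initialised one run interval plus one latency ago.
    return 24 / runs_per_day + latency_hours

for runs in (2, 4, 24):
    print(f"{runs:>2} runs/day -> forecast up to {max_forecast_age_hours(runs):.1f} h old")
```

Any delivery latency adds directly to every figure, which is why finishing a run earlier within the same cycle matters alongside running more often.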
See how 24×/day updates change your trading windows by testing EPT-2 RR on your region.
Scepticism in the meteorology community about AI weather models
Meteorologists at regulated utilities are the most technically rigorous evaluators in the energy forecasting market. They do not respond to vendor graphics; they run benchmarks. Their scepticism of AI weather models has a strong basis. Published evaluations find that AI-based methods produce smoother forecasts than traditional NWP because training regresses toward average values, which increases the risk of underestimating the magnitude of extreme weather events. The same body of research notes that models trained exclusively on reanalysis data, rather than real observational inputs, need further investigation before their real-world performance can be relied on.
A core challenge for large AI weather models is limited physical consistency and interpretability, because purely data-driven training on historical datasets does not explicitly solve atmospheric physics equations. This challenge has prompted the research community to investigate hybrid approaches that embed physical constraints. EPT addresses this at the architecture level. It is a physics foundation model that learns conservation laws directly from observational data in a latent representation that is integrated forward in time. Outputs are physically constrained by construction.
The validation record for EPT-2 is concrete and external, which directly addresses that scepticism. The StationBench methodology described earlier benchmarks EPT-2 against more than 10,000 real ground stations with no post-processing or station fine-tuning, and the results appear in the arXiv reports already cited. Atmo AI has no equivalent published record. For a meteorologist who must defend a forecast to internal risk and regulatory stakeholders, that asymmetry becomes the evaluation.
Even GraphCast’s published evaluation focuses on standard verification metrics and benchmark comparisons against systems such as ECMWF HRES, rather than guaranteed performance across all operational edge cases. Because that evaluation is public, GraphCast still ranks among the more transparent AI weather models in the field. Atmo AI has not yet reached that bar.
Conclusion: how to evaluate Atmo AI weather forecasting for trading
Five criteria determine whether an AI weather forecasting system is fit for energy trading operations: accuracy against ground-truth observations, update frequency relative to trade horizons, ensemble skill on RMSE and CRPS, transparency through peer-reviewed benchmarks, and workflow fit through productised tooling. On all five, the picture is clear.
EPT-2 delivers the HRES performance advantage documented earlier across all four energy-critical variables and the full forecast horizon. EPT-2e delivers the ensemble performance advantage over ECMWF ENS documented in the benchmark section. EPT-2 RR updates up to 24 times per day. The Jua platform benchmarks 25+ models on any region and variable in seconds. Athena resolves natural-language queries in approximately 90 seconds. Atmo AI weather forecasting claims cannot be placed on any of these axes because no peer-reviewed benchmark has been published.
Jua operates as a foundation model and agent company. Jua for Energy is the first applied product, built on EPT, a general physics foundation model, and Athena, an AI agent. The architecture learns physics and treats the domain as a variable. The atmosphere is the first physical system EPT has been fine-tuned for. Energy trading is the first market Athena has been instrumented for. The published numbers provide the basis for evaluation.
See the benchmark on your own data and quantify the trading impact.
Frequently asked questions
What makes EPT-2 different from Atmo AI and other AI weather models?
EPT-2 is the atmospheric application of EPT, a general physics foundation model built by Jua. The architecture learns conservation laws, including mass, momentum, and energy, directly from observational data in a latent representation that is integrated forward in time. Outputs are physically constrained by construction. EPT-2 is benchmarked against more than 10,000 real ground stations using open-source StationBench, with results published in peer-reviewed technical reports on arXiv. Atmo AI has published no equivalent benchmark. Microsoft Aurora and GFS GraphCast are research-grade outputs without productised ensembles, operational refresh schedules, or agent layers. EPT-2’s performance advantage over HRES, detailed in the benchmark section above, covers the wind, temperature, and solar variables that matter most for energy trading across the full 0–240 h range. Aurora loses to EPT-2 on 10 m and 100 m wind across the same range and has no published surface solar radiation output.
How often does Jua for Energy update its forecasts, and why does that matter for trading?
EPT-2 RR updates up to 24 times per day. EPT-2e updates 4 times per day. Actual-generation power forecasts inside Jua for Energy refresh every 15 minutes. Traditional NWP systems are capped at two to four global runs per day by the economics of HPC infrastructure. A single NWP simulation consumes approximately 8,400 kWh and costs €1,000–€20,000. A single EPT-2 inference runs on a single GPU in minutes at approximately 0.25 kWh and $0.20–$15, which is roughly four orders of magnitude cheaper. For energy traders, the gap between runs is the gap between a current position and a stale one. Divergence alerts on the Jua platform fire the moment two models disagree on a key variable, and correction alerts fire the moment a model revises its own output. Both alert types define trade windows, and the trader who sees the revision first can act before the market reprices.
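The two alert types described above reduce to simple comparisons. This is a hypothetical sketch, not the Jua platform's alerting logic or API: it assumes forecasts already aligned on the same valid times, a single fixed threshold per variable, and made-up 100 m wind values.

```python
import numpy as np

def divergence_leads(model_a: np.ndarray, model_b: np.ndarray, threshold: float) -> np.ndarray:
    """Indices of valid times where two models' forecasts differ by more than `threshold`."""
    return np.flatnonzero(np.abs(model_a - model_b) > threshold)

def correction_leads(previous_run: np.ndarray, latest_run: np.ndarray, threshold: float) -> np.ndarray:
    """Indices of valid times where a model's latest run revises its previous run
    by more than `threshold`. Both arrays must cover the same valid times."""
    return np.flatnonzero(np.abs(latest_run - previous_run) > threshold)

# Hypothetical 100 m wind speeds (m/s) for six aligned valid times
model_a = np.array([8.0, 8.5, 9.0, 10.0, 11.0, 11.5])
model_b = np.array([8.1, 8.4, 9.2, 12.5, 13.0, 11.4])
model_a_new_run = np.array([8.0, 8.6, 9.0, 10.0, 12.8, 11.5])

print(divergence_leads(model_a, model_b, threshold=1.5))          # -> [3 4]
print(correction_leads(model_a, model_a_new_run, threshold=1.5))  # -> [4]
```

A production system would likely use variable- and lead-dependent thresholds; a fixed value is used here only to keep the comparison visible.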
Can I benchmark Atmo AI against EPT-2 on my own region and variables?
Atmo AI does not appear on the Jua platform’s benchmarking surface because it has not published the data required to run a like-for-like comparison, including RMSE series, CRPS figures, and a verification methodology against ground-truth observations. The Jua platform benchmarks 25+ models that do publish that data, including ECMWF HRES, ECMWF ENS, Microsoft Aurora, GFS GraphCast, ECMWF AIFS, NOAA GFS, and DWD ICON, alongside the full EPT family. A meteorologist or quant developer selects a region, a variable, and a time window and receives a head-to-head accuracy comparison in seconds. Athena can run a full backtest against years of historical forecasts in approximately 5 minutes. If Atmo AI publishes peer-reviewed benchmarks in a format compatible with the platform’s verification methodology, a direct comparison becomes possible. Until that point, the evaluation remains asymmetric by Atmo’s own choice.
What is the economic case for switching from a current forecast provider to Jua for Energy?
The economic case follows directly from accuracy gains. A 1 GW wind portfolio that gains four percentage points of forecast accuracy saves approximately €1.5 M per year under typical hedging and imbalance penalty structures. A 1 GW solar portfolio at the same accuracy gain saves approximately €3 M per year. Multi-GW portfolios scale linearly. The live benchmark on the Jua platform acts as the proof-of-value mechanism. A prospect selects their highest-stakes region and variable, runs EPT-2 against their current provider, and sees the accuracy comparison on screen in seconds. Customers including Axpo, TotalEnergies, Statkraft, EnBW, EDF, and Hydro-Québec use Jua for Energy for daily trading decisions across five continents.
Is Jua a weather AI company?
No. Jua operates as a foundation model and agent company. EPT is a general physics foundation model, and Athena is an AI agent. Jua for Energy is the first applied product where Jua deploys both. The relationship mirrors the one between Anthropic and Claude Code, a horizontal AI platform with a flagship vertical product. The atmosphere is the first physical system EPT has been fine-tuned for, and energy trading is the first market Athena has been instrumented for. The roadmap extends to other physical-economy domains, including plasma fusion, aerospace, materials, and fluids, each shipped as a new vertical product on the same horizontal platform.