{"id":581,"date":"2026-06-15T05:00:12","date_gmt":"2026-06-15T05:00:12","guid":{"rendered":"https:\/\/jua.ai\/articles\/ecmwf-vs-ai-weather-europe\/"},"modified":"2026-06-15T05:00:12","modified_gmt":"2026-06-15T05:00:12","slug":"ecmwf-vs-ai-weather-europe","status":"publish","type":"post","link":"https:\/\/jua.ai\/articles\/ecmwf-vs-ai-weather-europe\/","title":{"rendered":"EPT-2 vs ECMWF HRES: AI Forecast Accuracy in Europe"},"content":{"rendered":"<p><em>Written by: Olivier Lam, Physical AI Team, Jua.ai AG<\/em><\/p>\n<h2>Key takeaways for European energy traders<\/h2>\n<ul>\n<li>\n<p>EPT-2 outperforms ECMWF HRES on 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation across 0\u2013240 h lead times on StationBench.<\/p>\n<\/li>\n<li>\n<p>EPT-2e beats the 50-member ECMWF ENS mean on RMSE and CRPS at virtually every lead time while running at a fraction of the compute cost.<\/p>\n<\/li>\n<li>\n<p>ECMWF AIFS underperforms raw IFS ENS on probabilistic metrics and lacks SSRD output, which limits its value for energy trading.<\/p>\n<\/li>\n<li>\n<p>Jua for Energy runs alongside existing ECMWF subscriptions and replaces manual pipelines and spreadsheets rather than the ECMWF feed itself.<\/p>\n<\/li>\n<li>\n<p><a target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"https:\/\/meetings-eu1.hubspot.com\/guett\/energy-trading?uuid=d780665f-ff71-439c-addf-c80e49af0627\">Run a live benchmark<\/a> of EPT-2 against your current provider.<\/p>\n<\/li>\n<\/ul>\n<h2>EPT-2 accuracy vs ECMWF and Aurora on Europe<\/h2>\n<p>As of June 2026, EPT-2 ranks among the leading deterministic models for Europe on energy-critical variables and lead times, evaluated on <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2507.09703\">StationBench against real ground stations<\/a>. ECMWF HRES remains the forty-year benchmark and the universal reference point for the energy industry. 
EPT-2 outperforms HRES on the tested variables, while Jua for Energy runs alongside the ECMWF subscription rather than replacing it.<\/p>\n<p>The table below compares the four models most relevant to European energy trading on dimensions that drive operational value. RMSE rankings come from <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2507.09703\">arXiv:2507.09703<\/a> and <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2410.15076\">arXiv:2410.15076<\/a>. Update frequency and inference cost figures are sourced from the AIFS Single v1.1.0 operational paper and Jua&#8217;s published specifications.<\/p>\n<table style=\"min-width: 100px\">\n<colgroup>\n<col style=\"min-width: 25px\">\n<col style=\"min-width: 25px\">\n<col style=\"min-width: 25px\">\n<col style=\"min-width: 25px\"><\/colgroup>\n<tbody>\n<tr>\n<th colspan=\"1\" rowspan=\"1\">\n<p>Model<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\">\n<p>Deterministic accuracy vs HRES (10 m wind, 100 m wind, 2 m temp, SSRD, 0\u2013240 h)<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\">\n<p>Update frequency<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\">\n<p>Inference cost per simulation<\/p>\n<\/th>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2507.09703\">EPT-2 (Jua)<\/a><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>Outperforms HRES on the four variables over 0\u2013240 h<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>Up to 24\u00d7\/day (EPT-2 RR); 4\u00d7\/day flagship<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"https:\/\/nebius.com\/customer-stories\/jua\">~0.25 kWh, ~$0.20\u2013$15 on a single GPU<\/a><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noindex nofollow\" 
href=\"https:\/\/en.wikipedia.org\/wiki\/European_Centre_for_Medium-Range_Weather_Forecasts\">ECMWF HRES<\/a><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>The benchmark itself, with 40 years of NWP leadership<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>2\u20134\u00d7\/day<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>~8,400 kWh, \u20ac1,000\u2013\u20ac20,000 on HPC<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\">\n<p>ECMWF AIFS Single v1.1.0<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>Operational AI model; <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/html\/2606.02508v1\">raw AIFS ENS underperforms raw IFS ENS on 10 m wind CRPS and MAE across all lead times<\/a><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>4\u00d7\/day<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>~2.5 min on a single A100 GPU<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2410.15076\">Microsoft Aurora<\/a><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>Loses to EPT-2 on 10 m wind, 100 m wind, and 2 m temperature across 0\u2013240 h; no SSRD output<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>Typically 4\u00d7\/day research cadence; no productised operational schedule<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>Similar order of magnitude to EPT-2 for inference, with EPT-2 ~25% faster<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>A 1 GW wind portfolio that gains four percentage points of forecast accuracy saves approximately <a target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"https:\/\/nebius.com\/customer-stories\/jua\">\u20ac1.5 M per year<\/a> under typical European hedging and imbalance-penalty structures. A 1 GW solar portfolio at the same accuracy gain saves about \u20ac3 M per year. 
Multi-GW portfolios scale these economics linearly.<\/p>\n<p>Given EPT-2&#8217;s performance advantage over HRES, the next question is how ECMWF itself uses AI to compete.<\/p>\n<h2>How ECMWF uses AI alongside IFS<\/h2>\n<p>ECMWF AIFS Single v1 became operational on 25 February 2025, with version 1.1.0 released on 27 August 2025. Rather than replacing the physics-based IFS, AIFS runs alongside it as a complementary system that produces forecasts with substantially reduced compute time relative to the full IFS cycle.<\/p>\n<p>AIFS runs four times daily at 00\/06\/12\/18 UTC with 6-hourly output steps out to 360 hours. Its grid spacing is approximately 28 km, compared with 9 km for HRES. Both AIFS Single and the AIFS ENS control forecast (CF) underestimated peak 10 m wind speeds during Storm \u00c9owyn (January 2025) and Storm Amy (October 2025), smoothing sharp frontal gradients and local wind maxima.<\/p>\n<p>That structural limitation matters for energy traders who need accurate wind-ramp prediction. ECMWF itself states that traditional NWP remains essential for resolving fine-scale processes such as compact vorticity maxima, sharp fronts, and local jets responsible for the most damaging winds of severe mid-latitude storms.<\/p>\n<p>AIFS is available on the Jua platform alongside EPT-2, ECMWF HRES, ECMWF ENS, and 22 other models, all under a single schema and a single API. Customers do not need to choose between ECMWF&#8217;s AI model and Jua&#8217;s; both run in the same workspace. That said, understanding the technical differences between the two models helps clarify when each is most valuable.<\/p>\n<p>EPT-2 differs from AIFS in three operationally significant ways. First, EPT-2 is natively any-\u0394t: it predicts directly at arbitrary lead times rather than rolling forward in fixed 6-hour increments. This matters because each 6-hour AIFS step takes the previous step&#8217;s output as its input, so errors compound at longer lead times and degrade accuracy when traders need multi-day visibility. 
Second, EPT-2 natively covers surface solar radiation (SSRD), a variable AIFS does not output, leaving a critical gap for solar portfolio operators who rely on AIFS. Third, EPT-2 RR updates up to 24 times per day, compared with AIFS&#8217;s four daily runs, which enables intraday reforecasting as conditions evolve.<\/p>\n<p><a target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"https:\/\/meetings-eu1.hubspot.com\/guett\/energy-trading?uuid=d780665f-ff71-439c-addf-c80e49af0627\">Compare EPT-2, AIFS, and HRES<\/a> head-to-head on your region and variable.<\/p>\n<h2>How AIFS ensemble skill compares to EPT-2e<\/h2>\n<p>A key independent evaluation of ECMWF AIFS ensemble performance in 2026 is the <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/html\/2606.02508v1\">Kocsis and Baran study<\/a>, which assessed operational 50-member ensemble forecasts for 10 m wind speed from July to November 2025 at 9,246 global SYNOP stations. The findings are unambiguous. <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/html\/2606.02508v1\">Raw IFS ensemble forecasts substantially outperformed raw AIFS forecasts across all lead times up to 15 days on CRPS, quantile score, coverage, and ensemble-median MAE, with AIFS showing CRPSS around \u22124% and MAES around \u22123% relative to IFS.<\/a><\/p>\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/html\/2606.02508v1\">After post-processing with EMOS or quantile regression, the skill gap between IFS and AIFS narrowed substantially, and IFS retained a small but statistically significant advantage primarily at short lead times up to day 4.<\/a> Without post-processing, raw AIFS ENS is therefore not a drop-in replacement for IFS ENS on probabilistic skill metrics.<\/p>\n<p>EPT-2e, Jua&#8217;s ensemble variant, is evaluated on a different benchmark (StationBench) against a different baseline (the ECMWF ENS mean), so its results are not directly comparable with the Kocsis and Baran figures. 
The table below summarises ensemble skill on the three variables most relevant to European energy trading, drawn from <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2507.09703\">arXiv:2507.09703<\/a> and the <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/html\/2606.02508v1\">Kocsis and Baran 2026 study<\/a>.<\/p>\n<table style=\"min-width: 75px\">\n<colgroup>\n<col style=\"min-width: 25px\">\n<col style=\"min-width: 25px\">\n<col style=\"min-width: 25px\"><\/colgroup>\n<tbody>\n<tr>\n<th colspan=\"1\" rowspan=\"1\">\n<p>Variable<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\">\n<p>EPT-2e vs ECMWF ENS mean (RMSE and CRPS, 0\u2013240 h)<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\">\n<p>Raw AIFS ENS vs raw IFS ENS (CRPS, 0\u2013360 h)<\/p>\n<\/th>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2507.09703\">100 m wind speed<\/a><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>EPT-2e beats ENS mean at virtually every lead time<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/html\/2606.02508v1\">Not covered; the study evaluated 10 m wind only, where raw AIFS underperforms raw IFS across all lead times<\/a><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2507.09703\">Surface solar radiation (SSRD)<\/a><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>EPT-2e beats ENS mean at virtually every lead time<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>Not applicable, because AIFS does not output SSRD<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2507.09703\">2 m temperature<\/a><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p>EPT-2e beats ENS mean at virtually every lead 
time<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\">\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/html\/2606.02508v1\">Not covered; the study evaluated 10 m wind only<\/a><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>EPT-2e runs 4 times per day with 30 members and a 60-day ensemble horizon. ECMWF ENS runs 50 members. EPT-2e beats the 50-member ENS mean on both RMSE and CRPS at virtually every lead time, with fewer members and at a fraction of the compute cost.<\/p>\n<h2>How AI weather models stack up against ECMWF<\/h2>\n<p>Model performance depends on the AI system, the variable, and the event type. For routine conditions across energy-critical variables, the evidence in 2026 favors EPT-2 over ECMWF HRES. For record-breaking extremes, the picture is more nuanced.<\/p>\n<p><a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/science.org\/doi\/10.1126\/sciadv.aec1433\">A peer-reviewed study in Science Advances found that GraphCast, Pangu-Weather, and FuXi systematically underestimated the frequency and intensity of record-breaking heat, cold, and wind events, with ECMWF HRES producing lower RMSE than all tested AI models on those out-of-distribution extremes.<\/a><\/p>\n<p>EPT-2 was not among the models tested in that study. It is a physics-constrained foundation model whose outputs respect conservation laws for mass, momentum, and energy, which is designed to address the structural extrapolation limitation that affects purely data-driven models on extremes.<\/p>\n<p>Operationally, EPT-2e matches the 4-runs-per-day cadence of ECMWF ENS while running at a fraction of the cost. 
That cost gap, with EPT-2 inference roughly four orders of magnitude cheaper than traditional NWP (~0.25 kWh versus ~8,400 kWh per simulation), is what makes 24 daily updates from EPT-2 RR economically viable where traditional NWP is capped at two to four runs per day by HPC infrastructure costs.<\/p>\n<p>EPT-2 also operates at a native resolution down to 5 km over Europe, compared with 9 km for ECMWF HRES and approximately 28 km for AIFS. The Jua platform delivers products at up to 1 km resolution. For wind-ramp prediction at specific turbine sites or solar irradiance at plant level, that resolution difference translates directly into forecast accuracy at the asset.<\/p>\n<p>Microsoft Aurora, the closest AI peer on deterministic accuracy, <a target=\"_blank\" rel=\"noindex nofollow\" href=\"https:\/\/arxiv.org\/abs\/2507.09703\">loses to EPT-2 on 10 m wind, 100 m wind, and 2 m temperature across the full 0\u2013240 hour range<\/a>. Aurora has no SSRD output and shares AIFS&#8217;s rollforward limitation, whereas EPT-2&#8217;s native any-\u0394t architecture avoids the compounding error discussed earlier. EPT-2 inference is approximately 25% faster than Aurora.<\/p>\n<p>For European energy traders, EPT-2 is a leading model on the variables that drive P&amp;L, including wind at hub height, surface solar radiation, and 2 m temperature, across the lead times that matter for day-ahead and intraday trading. ECMWF HRES remains the essential reference signal. Jua for Energy delivers both in the same workspace.<\/p>\n<p><a target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"https:\/\/meetings-eu1.hubspot.com\/guett\/energy-trading?uuid=d780665f-ff71-439c-addf-c80e49af0627\">See results in under 5 minutes<\/a> and run live benchmarks on your own region and variables, head-to-head across 25+ models.<\/p>\n<h2>Frequently asked questions from trading desks<\/h2>\n<h3>We already have ECMWF, so why add Jua for Energy?<\/h3>\n<p>Jua for Energy does not replace ECMWF. 
Most customers keep their ECMWF subscription and run Jua for Energy alongside it. ECMWF AIFS, ECMWF&#8217;s own AI model, runs on the Jua platform. Jua for Energy replaces everything around the ECMWF feed, including the in-house GRIB pipeline, the spreadsheet stitching, the manual benchmarking, and the morning-briefing routine.<\/p>\n<p>The 7\u20139 a.m. prep routine compresses into a single workspace, refreshed up to 24 times per day, where ECMWF HRES, ECMWF ENS, AIFS, Aurora, and EPT-2 appear on the same screen under one schema and one API. EPT-2 also outperforms HRES on energy-critical variables and lead times, so the customer gains both workflow consolidation and a more accurate primary signal.<\/p>\n<h3>AI models hallucinate, so can EPT-2 outputs be trusted?<\/h3>\n<p>LLMs hallucinate because they are unconstrained on the symbolic surface, so token sequences that look plausible can be physically nonsensical. EPT is a physics foundation model, not a language model. It is trained on observational data, and its outputs are constrained by the conservation laws of mass, momentum, and energy that govern the real atmosphere.<\/p>\n<p>The architecture cannot produce outputs that violate those laws in the way a generic transformer applied naively to physics would. Validation is external and reproducible. EPT-2 is benchmarked against real ground stations on StationBench, and the results are published in technical reports on arXiv (2507.09703 for EPT-2 and 2410.15076 for EPT-1.5). An LLM is unconstrained on the symbolic surface, while a physics model is constrained at the representation.<\/p>\n<h3>How fast can we prove this in our own environment?<\/h3>\n<p>The live benchmark on the Jua platform is the standard proof-of-value step, and the first benchmark takes about 5 minutes. 
A prospect selects a region and a variable that matters to their book (typically 100 m wind over a wind-rich region of their home market, or SSRD over a solar portfolio) and their current provider alongside EPT-2, and the platform returns a head-to-head accuracy comparison on the spot.<\/p>\n<p>Backtests against years of historical forecasts run in approximately 5 minutes via Athena, Jua&#8217;s AI agent. The objection shifts from &#8220;is this real?&#8221; to &#8220;how fast can we sign?&#8221; Customers operating multi-GW portfolios have closed procurement in as little as two weeks from first benchmark to contract.<\/p>\n<h2>Conclusion: what the 2026 StationBench results mean for P&amp;L<\/h2>\n<p>The 2026 StationBench results are clear. EPT-2 beats ECMWF HRES across the lead times and variables that drive a European energy P&amp;L, including 10 m wind, 100 m wind, 2 m temperature, and surface solar radiation. EPT-2e outperforms the 50-member ECMWF ENS mean on both deterministic and probabilistic metrics. ECMWF AIFS, the incumbent&#8217;s own AI model, underperforms raw IFS ENS on probabilistic skill metrics before post-processing and lacks SSRD output entirely.<\/p>\n<p>Jua is a foundation model and agent company. EPT is a general physics foundation model and Athena is an AI agent. Jua for Energy, the first applied product, is the platform that delivers EPT-2 forecasts up to 24 times per day (via EPT-2 RR) and EPT-2e ensembles 4 times per day, at approximately 0.25 kWh per simulation, alongside ECMWF HRES, ECMWF ENS, AIFS, Aurora, GraphCast, and 19 other models, all under a single schema. The manual GRIB pipeline, the spreadsheet stitching, and the morning-briefing routine are displaced. The ECMWF subscription stays.<\/p>\n<p>The architecture learns physics and the domain is a variable. The atmosphere is the first physical system EPT has been fine-tuned for. Energy trading is the first market Athena has been instrumented for. 
Both will expand.<\/p>\n<p><a target=\"_blank\" rel=\"noopener noreferrer nofollow\" href=\"https:\/\/meetings-eu1.hubspot.com\/guett\/energy-trading?uuid=d780665f-ff71-439c-addf-c80e49af0627\">See the numbers for yourself<\/a> and compare EPT-2 head-to-head against your current forecast provider on your region, your variables, and your lead times.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Jua&#8217;s EPT-2 outperforms ECMWF HRES &amp; ENS on wind, temperature &amp; solar across Europe. See the 2026 StationBench results and run a live benchmark.<\/p>\n","protected":false},"author":103,"featured_media":580,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[11],"tags":[],"class_list":["post-581","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-weather-forecasting"],"_links":{"self":[{"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/posts\/581","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/users\/103"}],"replies":[{"embeddable":true,"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/comments?post=581"}],"version-history":[{"count":0,"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/posts\/581\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/media\/580"}],"wp:attachment":[{"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/media?parent=581"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/categories?post=581"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jua.ai\/articles\/wp-json\/wp\/v2\/tags?post=581"}],"curies":[{"name":"wp","href":"https:\/\/ap
i.w.org\/{rel}","templated":true}]}}