Over the past year, AI weather models have set new benchmarks in short-term forecasting. With EPT-2 and EPT-2 HRRR, we pushed the state of the art for 1-10 day predictions, delivering higher accuracy and faster updates than traditional numerical weather prediction. The short-term problem is increasingly solved.
Now we are turning to the next frontier: subseasonal forecasting. Today, we are publicly releasing the extended-range capabilities of EPT-2e. Our ensemble model delivers skillful forecasts out to 60 days, with internal versions already running at 180 days for select customers.
The Subseasonal Challenge
The subseasonal range, roughly 2 to 8 weeks ahead, sits in a difficult gap. Initial condition information from weather observations fades, but seasonal climate signals have not yet taken over. Traditional subseasonal systems like ECMWF Seasonal and NOAA GEFS provide forecasts at these lead times, but their skill degrades rapidly. For key variables like temperature, ECMWF Seasonal skill drops to zero around day 35 and goes negative beyond that, meaning the forecasts become worse than simply using historical averages.
This matters for real decisions. Energy traders need to plan capacity weeks ahead. Utilities managing hydropower reservoirs must anticipate inflows over monthly timescales. Agricultural operations depend on extended outlooks. Emergency managers need early signals of extreme events. All of these users need skill at lead times where traditional models struggle.
What We Found
We evaluated EPT-2e against three baselines: ECMWF Seasonal (SEAS5), NOAA GEFS, and monthly climatology (1991-2020). Forecasts were verified against ERA5 reanalysis over Europe for the past two years. We are currently adding ECMWF ENS Extended Range (46-day) to these comparisons and will update this article when the results are available. The results show a fundamental difference in how these models perform at extended ranges.
The Skill Score
The skill score measures forecast quality relative to climatology. A positive skill score means the model adds value over historical averages. A zero skill score means the model performs the same as climatology. A negative skill score means the model performs worse than climatology.
This last point is important. A negative skill score means you would be better off using historical averages than using the model's forecast. The model is not just unhelpful; it is counterproductive.
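The article does not state which error metric underlies its skill score. A common convention for error metrics where lower is better (such as RMSE or CRPS) is SS = 1 - model_error / climatology_error, and that form is assumed in the minimal sketch below. Under this convention, "10-20% better than climatology" corresponds to a skill score of 0.1 to 0.2, and "20% worse" to -0.2.

```python
def skill_score(model_error: float, climatology_error: float) -> float:
    """Skill score relative to climatology for an error metric where lower is better.

    Assumed form (a common convention, not confirmed by the article):
        SS = 1 - model_error / climatology_error
    SS > 0  -> the model beats climatology
    SS == 0 -> the model matches climatology
    SS < 0  -> the model is worse than climatology (counterproductive)
    """
    if climatology_error <= 0:
        raise ValueError("climatology_error must be positive")
    return 1.0 - model_error / climatology_error


# An error 10% below climatology gives a skill score of about +0.1;
# an error 20% above climatology gives about -0.2.
print(round(skill_score(0.9, 1.0), 3))
print(round(skill_score(1.2, 1.0), 3))
```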
Temperature at 2m
Temperature forecasting shows our strongest results. EPT-2e maintains positive skill throughout the 60-day forecast horizon. ECMWF Seasonal does not.

Looking at the skill score panel (right), EPT-2e maintains 10-20% improvement over climatology out to day 60. ECMWF Seasonal crosses zero around day 35 and becomes 10-20% worse than climatology by day 40-60. GEFS hovers around zero from day 25 onward, adding no value. Note that GEFS only extends to 35 days lead time.
Beyond day 35, using ECMWF Seasonal forecasts for temperature actually makes your predictions worse than using historical averages. EPT-2e continues to add value where ECMWF Seasonal becomes counterproductive.
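To locate where a forecast becomes counterproductive, one can scan the per-lead-day skill curve. The sketch below is illustrative only: it assumes skill is available as a NumPy array indexed by lead day, and the helper names are hypothetical. Because a noisy curve can dip below zero briefly before recovering, it reports both the first negative lead day and the first lead day from which skill stays negative for the rest of the horizon.

```python
from typing import Optional

import numpy as np


def first_negative_lead(lead_days: np.ndarray, skill: np.ndarray) -> Optional[int]:
    """First lead day with skill below zero, or None if skill never goes negative."""
    below = np.flatnonzero(skill < 0)
    return int(lead_days[below[0]]) if below.size else None


def first_sustained_negative_lead(lead_days: np.ndarray, skill: np.ndarray) -> Optional[int]:
    """First lead day from which skill remains below zero through the end of the horizon."""
    negative = skill < 0
    if not negative[-1]:
        return None  # skill is non-negative at the final lead day
    non_negative = np.flatnonzero(~negative)
    # Everything after the last non-negative lead day is negative.
    start = non_negative[-1] + 1 if non_negative.size else 0
    return int(lead_days[start])
```

On a hypothetical curve that briefly dips at day 20 and then turns negative for good at day 35, the first function returns 20 and the second returns 35.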
Dew Point Temperature at 2m
Dew point temperature, critical for energy demand forecasting and agricultural applications, shows the same pattern.

EPT-2e maintains positive skill throughout 60 days. ECMWF Seasonal drops below zero after day 35. The gap is even larger than for air temperature: at day 40-60, EPT-2e still outperforms climatology by 10-15%, while ECMWF Seasonal underperforms climatology by 15-20%.
Wind Speed at 10m
Wind speed forecasting is essential for renewable energy operations. At extended ranges, all models converge toward climatology skill levels, but EPT-2e remains competitive.

Beyond day 10, skill scores for all models approach zero. EPT-2e maintains slight positive skill at day 60 while ECMWF Seasonal fluctuates around zero or slightly negative. Wind is inherently harder to predict at extended ranges due to its higher variability, but EPT-2e does not degrade below climatology.
Air Pressure at Mean Sea Level
Pressure forecasting is important for large-scale weather pattern prediction.

All models show similar behavior for pressure. By day 40-60, skill scores for both EPT-2e and ECMWF Seasonal hover around zero (GEFS ends at day 35). The differences between models are small at extended ranges. Pressure is more predictable than precipitation but less predictable than temperature at subseasonal timescales.
Precipitation
Precipitation is the most challenging variable for extended-range forecasting.

By day 10, skill scores for all models converge to near zero and stay there. None of the models, including EPT-2e, show meaningful skill for precipitation beyond the medium range. This reflects fundamental limits in precipitation predictability at subseasonal timescales. All models perform similarly; further research is needed to improve this.
Spatial Error Comparison
We also examined spatial error patterns across Europe. The comparison below shows 2m temperature forecasts from EPT-2e (50 ensemble members), NOAA GEFS (31 members), and EC Seasonal (51 members), with their respective differences from ERA5 reanalysis.

EPT-2e achieves RMSE of 0.92°C and CRPS of 0.55°C. GEFS shows RMSE of 1.31°C (42% higher) and CRPS of 0.74°C (35% higher). EC Seasonal shows RMSE of 1.54°C (67% higher) and CRPS of 0.87°C (58% higher).
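The article does not say which CRPS estimator it uses or whether RMSE is computed on the ensemble mean. The sketch below assumes the standard ensemble CRPS estimator, CRPS = mean|x_i - y| - 0.5 · mean|x_i - x_j|, and RMSE of the ensemble mean. Both scores carry the units of the variable, which is why they are reported in °C. The last lines recompute the relative differences quoted above from the reported scores.

```python
import numpy as np


def ensemble_crps(members, obs: float) -> float:
    """CRPS of an ensemble against a single observation (assumed standard estimator).

    CRPS = mean_i |x_i - y| - 0.5 * mean_{i,j} |x_i - x_j|
    """
    x = np.asarray(members, dtype=float)
    error_term = np.mean(np.abs(x - obs))
    spread_term = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(error_term - spread_term)


def ensemble_mean_rmse(members, obs) -> float:
    """RMSE of the ensemble mean. members: (n_cases, n_members); obs: (n_cases,)."""
    ens_mean = np.asarray(members, dtype=float).mean(axis=1)
    return float(np.sqrt(np.mean((ens_mean - np.asarray(obs, dtype=float)) ** 2)))


def pct_higher(value: float, reference: float) -> float:
    """How much higher `value` is than `reference`, in percent."""
    return (value / reference - 1.0) * 100.0


# Relative differences versus EPT-2e (RMSE 0.92 °C, CRPS 0.55 °C), from the reported scores.
for name, rmse, crps in [("GEFS", 1.31, 0.74), ("EC Seasonal", 1.54, 0.87)]:
    print(name, round(pct_higher(rmse, 0.92)), round(pct_higher(crps, 0.55)))
```

For a one-member ensemble the spread term vanishes and CRPS reduces to the absolute error, which is a quick sanity check on the estimator.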
The error maps show different bias patterns. GEFS shows widespread warm biases across Eastern Europe. EC Seasonal shows a mix of warm and cold biases with larger spatial structure. EPT-2e errors are smaller in magnitude overall.
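The article does not describe how its error maps are built. A minimal sketch, assuming forecasts and ERA5 are already on a common grid (regridding is not shown) and that "bias" means the time-averaged difference between the ensemble mean and ERA5, looks like this. Positive cells correspond to warm biases and negative cells to cold biases.

```python
import numpy as np


def mean_bias_map(forecast: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Time-mean bias per grid cell (assumed definition).

    forecast:  (time, member, lat, lon) ensemble forecasts
    reference: (time, lat, lon) verifying analysis (e.g. ERA5) on the same grid
    Returns a (lat, lon) array: positive = warm bias, negative = cold bias.
    """
    ens_mean = forecast.mean(axis=1)            # average over members -> (time, lat, lon)
    return (ens_mean - reference).mean(axis=0)  # average over time -> (lat, lon)
```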
Case Study: August 2025 Western Germany Heatwave
To illustrate real-world forecast behavior, we analyzed the August 12-14, 2025 heatwave that affected Western Germany. Temperatures exceeded 35°C, creating significant stress on power grids and public health systems. We will be adding additional case studies in future updates.
We compared forecasts issued at two lead times: 42 days before the event (July 1) and 11 days before the event (August 1). At the 42-day range we compare against ECMWF Seasonal, the appropriate reference at that lead time. At 11 days we compare against ECMWF IFS ENS, the operational ensemble system designed for the medium range.

At 42 days (July 1 forecast)
Both models underestimated the event. Their ensemble means hovered around 21-22°C for daily maximum temperature during the heatwave period, well below the observed values above 35°C.
- EPT-2e heatwave forecast accuracy: 8%
- ECMWF Seasonal heatwave forecast accuracy: 38%
At this extended range, ECMWF Seasonal showed higher accuracy for the extreme event. Its wider ensemble spread assigned more probability to extreme temperatures, even though its ensemble mean was no closer to reality than EPT-2e's.
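The article does not define "heatwave forecast accuracy", so the sketch below does not reproduce those numbers. It only illustrates the mechanism described above: two ensembles with the same mean can assign very different probability to an extreme if their spreads differ. It counts the fraction of members above a 35°C threshold for two toy 51-member ensembles (made-up values, not the forecasts in this case study).

```python
import numpy as np


def exceedance_probability(members, threshold: float) -> float:
    """Fraction of ensemble members above a threshold (e.g. 35 °C daily maximum)."""
    return float(np.mean(np.asarray(members, dtype=float) > threshold))


# Two toy 51-member ensembles with the same mean (21.5 °C) but different spread.
narrow = 21.5 + np.linspace(-3.0, 3.0, 51)
wide = 21.5 + np.linspace(-15.0, 15.0, 51)
print(exceedance_probability(narrow, 35.0))  # no member reaches 35 °C
print(exceedance_probability(wide, 35.0))    # a few members in the upper tail do
```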
At 11 days (August 1 forecast)
At 11 days lead time, both forecasts improved substantially, but EPT-2e provided a stronger signal.
- EPT-2e heatwave forecast accuracy: 77%
- ECMWF IFS ENS heatwave forecast accuracy: 56%
EPT-2e's ensemble mean shifted upward to capture much of the heat signal, with its uncertainty band covering the observed temperatures. ECMWF IFS ENS also improved over the Seasonal baseline, correctly shifting its ensemble mean higher, but its overall accuracy remained lower than EPT-2e's.
Forecast Sharpening
The key difference is how EPT-2e sharpened its forecast as the event approached.
EPT-2e runs daily and incorporated 31 days of new observations between July 1 and August 1. It correctly revised its forecast upward, with accuracy jumping from 8% to 77%. The model's daily updates allowed it to progressively lock onto the emerging heat signal.
ECMWF Seasonal runs monthly and cannot incorporate new observations between initialization dates. At 11 days, the proper comparison shifts to ECMWF IFS ENS, which does benefit from recent observations and reached 56% accuracy. However, EPT-2e's 77% represents a significantly more decisive signal for decision-makers.
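The effect of initialization cadence can be expressed as simple date arithmetic: how many days of observations the most recent available run has not seen. The sketch below uses a hypothetical schedule that assumes a monthly system initializes on the 1st of each month and ignores publication delays. On July 31, the latest monthly run is 30 days old, while a daily system's latest run has seen every observation up to that day.

```python
from datetime import date


def latest_init(as_of: date, cadence: str) -> date:
    """Most recent initialization date available on `as_of` (hypothetical schedule).

    'daily'   -> a new run every day
    'monthly' -> one run on the 1st of each month (simplification; ignores release delays)
    """
    if cadence == "daily":
        return as_of
    if cadence == "monthly":
        return as_of.replace(day=1)
    raise ValueError(f"unknown cadence: {cadence!r}")


def observation_gap_days(as_of: date, cadence: str) -> int:
    """Days of observations that the latest available run could not have used."""
    return (as_of - latest_init(as_of, cadence)).days


as_of = date(2025, 7, 31)
print(observation_gap_days(as_of, "daily"))    # 0
print(observation_gap_days(as_of, "monthly"))  # 30
```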
Implications for Decision-Makers
At 11 days before the heatwave, ECMWF IFS ENS scored 56% heatwave forecast accuracy, a signal that the event was possible but far from certain. EPT-2e scored 77%, a strong signal that the event was likely.
The actual event exceeded 35°C.
For utilities, grid operators, and emergency managers making decisions at the 11-day mark, both models provided some signal, but EPT-2e's stronger and more decisive forecast gave decision-makers higher confidence to act. In high-stakes scenarios, the difference between 56% and 77% can determine whether preemptive measures are taken.
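One standard way to see how a threshold decision can flip between 56% and 77% is the cost-loss model: take protective action when the event probability exceeds the ratio of the cost of acting to the loss avoided. The sketch below reads the two scores as if they were event probabilities, which is an assumption (the article reports them as "heatwave forecast accuracy"), and uses a hypothetical cost/loss ratio of 0.6.

```python
def should_act(event_probability: float, cost: float, loss: float) -> bool:
    """Cost-loss decision rule: act when p > cost / loss.

    cost: price of acting (e.g. pre-positioning reserves), paid whether or not the event occurs
    loss: damage avoided by acting, incurred only if the event occurs and no action is taken
    """
    if not 0 < cost < loss:
        raise ValueError("expected 0 < cost < loss")
    return event_probability > cost / loss


# Hypothetical operator with a cost/loss ratio of 0.6.
for p in (0.56, 0.77):
    print(p, should_act(p, cost=0.6, loss=1.0))
```

For this hypothetical operator, a 0.56 signal stays below the action threshold while 0.77 crosses it; an operator with a different cost/loss ratio would draw the line elsewhere.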
Operational Advantages
Beyond accuracy, EPT-2e offers practical advantages for operational use.
Daily initialization: ECMWF Seasonal runs once per month. GEFS provides extended-range forecasts out to 35 days, but Week 3-4 outlooks are issued weekly. EPT-2e runs daily with full 60-day (and for select customers, 180-day) horizons, incorporating the latest atmospheric observations each day.
Computational efficiency: EPT-2e generates 50-member ensembles significantly faster than traditional numerical models, enabling higher update frequency without proportional infrastructure costs.
Consistency with short-range: EPT-2e uses the same underlying model architecture for both short-range and extended-range forecasting, providing a smooth transition from deterministic short-range to probabilistic extended-range predictions.
| Feature | EPT-2e | ECMWF Seasonal | GEFS |
|---|---|---|---|
| Update frequency | Daily | Monthly | Weekly (extended range) |
| Incorporates recent observations | Yes (daily) | Only at monthly initialization | Yes |
| Public horizon | 60 days | 7 months | 35 days |
| Ensemble members | 50 | 51 | 31 |
| Temperature skill, days 35-60 | Positive | Negative | N/A (35-day horizon) |
Current Deployment
EPT-2e extended-range forecasts are already operational. Several large utility customers in hydropower are using 180-day forecasts for reservoir management and inflow planning. The public platform currently provides 60-day forecasts, with plans to extend to 180 days.
Getting Started:
- Graphical Platform: Access EPT-2e through our web interface on the Earth Intelligence Platform
- API Access: Integrate EPT-2e into your applications using our Python SDK:
pip install jua
Summary
EPT-2e now delivers skillful forecasts in the subseasonal range where traditional models struggle. For temperature and dew point, EPT-2e maintains positive skill out to 60 days. ECMWF Seasonal degrades below climatology by day 35, meaning it becomes counterproductive to use. GEFS extends only to 35 days and hovers around zero skill, adding no value over historical averages.
The August 2025 heatwave case study demonstrates the operational difference. At 42 days, both models underestimated the event, with ECMWF Seasonal's wider ensemble spread giving it 38% accuracy versus EPT-2e's 8%. But at 11 days, EPT-2e's daily updates delivered 77% heatwave forecast accuracy, significantly outperforming ECMWF IFS ENS at 56%.
Daily updates, computational efficiency, and demonstrated skill make EPT-2e a practical tool for energy traders, utilities, and operational forecasters who need reliable extended-range guidance.
We will continue extending the public forecast horizon toward 180 days and improving skill in challenging variables like precipitation. The subseasonal gap is closing.