Can AI weather models predict out-of-distribution gray swan tropical cyclones?
by Y. Qiang Sun, Pedram Hassanzadeh, Mohsen Zand, Ashesh Chattopadhyay, Jonathan Weare, and Dorian S. Abbot

Inability to Extrapolate to Gray Swans Globally: AI weather models like FourCastNet struggle to predict "gray swan" tropical cyclones (TCs), which are rare, strong, and absent from training data. When Category 3-5 TCs are entirely removed from the global training dataset, the model cannot extrapolate from weaker storms (Category 1-2) to accurately forecast these stronger, unseen events, often leading to dangerous "false negative" predictions. This limitation persists even if the training data includes strong extratropical cyclones, as their dynamics differ from TCs.

Limited Generalization Across Basins for Dynamically Similar Events: Despite the global extrapolation challenge, FourCastNet can demonstrate some ability to generalize learning across tropical basins for dynamically similar strong storms. This means that if the model has seen strong TCs in one ocean basin, it can apply that learned knowledge to forecast similar strong TCs in another basin, even if those specific events were excluded from the training data for that particular region.

Lack of Physical Consistency and Masked Performance: Current AI weather models, including FourCastNet, fail to reproduce key physical balances like the gradient-wind balance that TCs obey in real-world data, regardless of whether they were trained on full or reduced datasets. Furthermore, common evaluation metrics (e.g., anomaly correlation coefficient or root-mean-square error) can obscure these critical shortcomings by showing similar overall performance for general weather or less extreme events, highlighting the need for specialized tests for gray swans.

Implications and Future Directions: This research suggests that current AI weather models may provide unreliable early warnings for unprecedented extreme weather events, potentially leading to serious societal risks. It also indicates that AI climate emulators might mischaracterize extreme weather statistics for gray swans. The study emphasizes the urgent need for novel learning strategies (such as incorporating physics-based synthetic data or rare-event sampling algorithms) and rigorous testing methodologies to improve and reliably validate AI models for these high-impact, out-of-distribution events.
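
The gradient-wind balance mentioned under "Lack of Physical Consistency" lends itself to a simple, storm-focused diagnostic of the kind the summary calls for. For an axisymmetric vortex on a pressure surface, one common form of the balance is v^2/r + f*v = dPhi/dr, where v is the tangential wind, r the distance from the storm center, f the Coriolis parameter, and Phi the geopotential. The Python sketch below is an illustration only and is not the authors' diagnostic: it assumes azimuthally averaged radial profiles are already available, and the function names, the idealized geopotential profile, and all numerical values are hypothetical choices made for the example.

```python
import numpy as np

OMEGA = 7.2921e-5  # Earth's rotation rate (rad/s)


def coriolis_parameter(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude), in 1/s."""
    return 2.0 * OMEGA * np.sin(np.deg2rad(lat_deg))


def balanced_wind(r, dphi_dr, f):
    """Tangential wind satisfying v**2/r + f*v = dPhi/dr (cyclonic root)."""
    half_fr = 0.5 * f * r
    return -half_fr + np.sqrt(half_fr**2 + r * dphi_dr)


def balance_residual(r, v_tan, dphi_dr, f):
    """Residual v**2/r + f*v - dPhi/dr in m/s^2; zero means exact balance."""
    return v_tan**2 / r + f * v_tan - dphi_dr


# Hypothetical idealized storm: an exponential geopotential deficit.
# The numbers are illustrative, not taken from the paper or from reanalysis.
r = np.linspace(5e3, 300e3, 296)           # radius from center (m), 1 km spacing
phi = -3000.0 * np.exp(-r / 50e3)          # geopotential anomaly (m^2/s^2)
dphi_dr = np.gradient(phi, r, edge_order=2)
f = coriolis_parameter(20.0)               # storm assumed at 20 degrees N

# A wind field in gradient-wind balance with this geopotential field.
v_balanced = balanced_wind(r, dphi_dr, f)
res_balanced = balance_residual(r, v_balanced, dphi_dr, f)

# Stand-in for a forecast whose winds are too weak for its mass field.
v_weak = 0.7 * v_balanced
res_weak = balance_residual(r, v_weak, dphi_dr, f)

# Normalize by the largest pressure-gradient term so the score is unitless.
scale = np.max(np.abs(dphi_dr))
print("balanced, normalized max residual:", np.max(np.abs(res_balanced)) / scale)
print("weak wind, normalized max residual:", np.max(np.abs(res_weak)) / scale)
```

A residual near zero indicates that the wind and mass fields are mutually consistent, while a persistently negative residual means the winds are too weak for the geopotential gradient. Unlike a domain-wide anomaly correlation coefficient or root-mean-square error, this check is evaluated only around the storm, so it is not diluted by the much larger area of ordinary weather.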