Robust feature selection is vital for creating reliable and interpretable machine-learning (ML) models. When designing statistical prediction models in cases where domain knowledge is limited and underlying interactions are unknown, choosing the optimal set of features is often difficult. To mitigate this issue, we introduce a multidata (M) causal feature selection approach that simultaneously processes an ensemble of time series datasets and produces a single set of causal drivers. This approach uses the causal discovery algorithms PC$_1$ or PCMCI that are implemented in the Tigramite Python package. These algorithms utilize conditional independence tests to infer parts of the causal graph. Our causal feature selection approach filters out causally spurious links before passing the remaining causal features as inputs to ML models (multiple linear regression and random forest) that predict the targets. We apply our framework to the statistical intensity prediction of Western Pacific tropical cyclones (TCs), for which it is often difficult to accurately choose drivers and their dimensionality reduction (time lags, vertical levels, and area-averaging). Using more stringent significance thresholds in the conditional independence tests helps eliminate spurious causal relationships, thus helping the ML model generalize better to unseen TC cases. M-PC$_1$ with a reduced number of features outperforms M-PCMCI, noncausal ML, and other feature selection methods (lagged correlation and random), even slightly outperforming feature selection based on explainable artificial intelligence. The optimal causal drivers obtained from our causal feature selection help improve our understanding of underlying relationships and suggest new potential drivers of TC intensification.
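The idea of filtering spurious links with a conditional independence test can be illustrated with a minimal, standard-library sketch. It is not the Tigramite API and not the PC$_1$ or PCMCI algorithm: it assumes a Fisher z partial-correlation test with a single conditioning variable, synthetic data, and hypothetical names (`simulate`, `lagged_samples`, `x_lag1`, `z_lag1`). Samples from several independent time series are pooled, as one simple way of processing an ensemble of datasets together.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly removing one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def p_value(r, n, n_cond):
    """Two-sided p-value from the Fisher z approximation."""
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(rng, length=200):
    """Synthetic series of (x, z, y): z drives x instantly and y one step later."""
    z_prev = rng.gauss(0, 1)
    series = []
    for _ in range(length):
        z = rng.gauss(0, 1)
        x = 0.8 * z + rng.gauss(0, 0.5)
        y = 0.8 * z_prev + rng.gauss(0, 0.5)
        series.append((x, z, y))
        z_prev = z
    return series


def lagged_samples(series):
    """Pair the lag-1 candidates (x[t-1], z[t-1]) with the target y[t]."""
    return [(series[t - 1][0], series[t - 1][1], series[t][2])
            for t in range(1, len(series))]


rng = random.Random(0)
ensemble = [simulate(rng) for _ in range(5)]  # five independent datasets

# Pool samples across datasets without crossing dataset boundaries.
pooled = [s for series in ensemble for s in lagged_samples(series)]
x_lag1, z_lag1, y = (list(col) for col in zip(*pooled))
n = len(y)

ALPHA = 0.001  # a stringent threshold, chosen here only for illustration
candidates = {"x_lag1": x_lag1, "z_lag1": z_lag1}

# Plain lagged correlation: both candidates look related to the target.
p_marginal = {name: p_value(pearson(v, y), n, 0)
              for name, v in candidates.items()}

# Conditional test: condition each candidate on the other one.
p_conditional = {}
for name, v in candidates.items():
    (other,) = [c for k, c in candidates.items() if k != name]
    p_conditional[name] = p_value(partial_corr(v, y, other), n, 1)

selected = [name for name, p in p_conditional.items() if p < ALPHA]
print("marginal p-values:   ", p_marginal)
print("conditional p-values:", p_conditional)
print("selected features:   ", selected)
```

In this toy setup `x_lag1` is related to the target only through the common driver `z_lag1`, so a lagged-correlation filter would keep it while the conditional test is designed to discard it. The selected names would then be passed on as inputs to a regression or random forest model.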
<p>Tropical storms that develop over the North Indian Ocean basin pose a major threat to the extensive peninsular coastlines, which are lined with densely populated cities and vast areas of low-lying farmland. Each year, economic and property losses from storm-induced gales, landslides and flash floods along these coastlines become more frequent. Reliable subseasonal prediction of tropical cyclogenesis over the landlocked North Indian Ocean basin is in high demand and requires an accurate representation of the crucial parameters that influence storm development. While several genesis potential indices are used for climatological monitoring and prediction of cyclogenesis globally, their skill in subseasonal prediction of individual storm development is limited, especially near coastlines. This study reviews an improved genesis potential parameter, namely IGPP, that can detect cyclogenesis, evolution and storm tracks from post-processed multi-model ensemble outputs. The IGPP is a revised version of the Kotal Genesis Potential Parameter (KGPP) introduced by the India Meteorological Department for short- and medium-range operational cyclogenesis prediction over the North Indian Ocean. We analyzed and compared the cyclogenesis prediction systems in cases where multiple storm systems of different intensities develop simultaneously. Results show that the false alarms and overestimated values present in KGPP are markedly reduced by using IGPP in all cases. Moreover, IGPP outperforms KGPP in distinguishing between developing and non-developing storms by accurately representing the cyclogenesis and intensity variations. The mean IGPP correlates better with the maximum wind speeds of selected storms, an improvement of almost 34 % over KGPP, which we attribute to the changes in the thermodynamic and shear terms.
The thermodynamic term is modified to be the mean equivalent potential temperature of the surface and middle troposphere, so as to include the effects of a warm sea surface and tropospheric latent heat release, whereas the vertical wind shear between the 850 and 200 hPa levels is averaged over an annular region between 100 and 200 km radii from the storm centres and rescaled. IGPP has replaced KGPP operationally and has been successfully implemented as one of the indices for the extended range probabilistic prediction of cyclogenesis by the India Meteorological Department. Probabilistic predictions using IGPP have been instrumental in providing early guidance on storm formation, and weekly forecasts are available at https://www.tropmet.res.in/erpas/.</p><p><img src="https://contentmanager.copernicus.org/fileStorageProxy.php?f=gepj.d262ff6e6ed165127691461/sdaolpUECMynit/22UGE&app=m&a=0&c=ec9360a38864fff9c82f2561f677aeab&ct=x&pn=gepj.elif&d=1" alt=""></p>
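The annular averaging of the shear term described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the operational IGPP code: shear is taken as the magnitude of the 200 hPa minus 850 hPa wind vector difference, grid points are averaged without area weighting, the rescaling step is omitted because the text does not specify it, and the function names and the synthetic grid are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def annular_mean_shear(points, centre_lat, centre_lon,
                       r_in=100.0, r_out=200.0):
    """Mean 850-200 hPa shear magnitude over the r_in..r_out km annulus.

    points: iterable of (lat, lon, u850, v850, u200, v200),
    with positions in degrees and winds in m/s.
    """
    shears = []
    for lat, lon, u850, v850, u200, v200 in points:
        d = great_circle_km(centre_lat, centre_lon, lat, lon)
        if r_in <= d <= r_out:
            # Assumed definition: magnitude of the vector wind difference.
            shears.append(math.hypot(u200 - u850, v200 - v850))
    if not shears:
        raise ValueError("no grid points fall inside the annulus")
    return sum(shears) / len(shears)


# Synthetic 0.5-degree grid with uniform winds around a hypothetical
# storm centre at 15N, 88E.
grid = [(12.0 + 0.5 * i, 85.0 + 0.5 * j, 5.0, 0.0, 15.0, 0.0)
        for i in range(13) for j in range(13)]
mean_shear = annular_mean_shear(grid, 15.0, 88.0)
print(f"annular mean shear: {mean_shear:.1f} m/s")
```

Excluding the inner 100 km keeps the storm's own circulation near the centre out of the estimate, so the average reflects the surrounding environment.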