Climate change is a phenomenon that is sometimes denied or trivialized. However, in recent years, we have faced extreme phenomena such as fires, floods, and excessive temperatures, which affect our physical and mental health and the environment and often cause significant material damage. To understand these problems and highlight the meteorological and phenomenological changes observed in the last decade, we web-scraped and analyzed time series from several open data sources, including weather news broadcast in Romania, air quality, temperature, and others. The extraction and organization of the data recorded between 2009 and 2023 are formulated as a framework that can be reproduced and replicated to continue the monitoring. The exploratory analysis of the categorical and numerical data highlights intricate patterns and correlations in meteorological conditions across regions and seasons. From temperature trends to air quality fluctuations, the study underscores the dynamic interplay of weather phenomena, paving the way for informed forecasting and deeper climate research. The data processing also includes Latent Dirichlet Allocation, K-prototypes clustering, and K-means clustering with dimensionality reduction techniques, all of which are employed to further reveal the extreme phenomena reported in the news and the regions where they occur more often. In this paper, we therefore propose a data processing and analytics framework for multiple datasets that extracts valuable information on climate change and identifies the regions exposed to extreme phenomena.
INDEX TERMS climate change; news; web scraping; NLP; data analysis; data clustering