According to Science Daily, 90 percent of all the data in the world has been generated in the last two years, yet less than 1 percent of that data has been analyzed so far. With advances in high-performance computing, deep learning methods are now readily applied to large-scale, high-dimensional datasets, where they train and run inference efficiently and produce considerably more accurate predictions. Clustering is an unsupervised machine learning method that identifies similar data points and groups them into the same cluster. It plays a fundamental role in data mining and machine learning by organizing data into structures in which similar data points are assigned to the same group. To process such large volumes of high-dimensional data, deep learning has become a key technique for learning feature representations of the data in a latent space for many real-world applications. In this paper, we propose deep clustering with robust autoencoder (DCRA), which jointly uses a robust autoencoder and deep clustering to perform feature representation and cluster assignment simultaneously. Multiple experiments on open public datasets were conducted to evaluate the model's performance. Our results show that DCRA generates high-quality clusters, with clustering accuracy above 90% on high-dimensional datasets. Training and test loss that decrease as the number of epochs increases further support these results.
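The abstract reports clustering accuracy without saying how it is computed. A common convention, assumed here and not confirmed by the source, is to score the best one-to-one mapping between cluster ids and ground-truth labels, since cluster ids are arbitrary. The sketch below is a minimal illustration of that convention; the function name `clustering_accuracy` is hypothetical, and the brute-force search only suits a small number of clusters.

```python
from itertools import permutations

_UNMATCHED = object()  # placeholder target for surplus clusters


def clustering_accuracy(true_labels, cluster_ids):
    """Best-match accuracy between cluster ids and ground-truth labels.

    Assumption: accuracy is taken over the best one-to-one mapping from
    cluster ids to labels. Every mapping is tried by brute force, which is
    only practical for a handful of clusters.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must not be empty")

    labels = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # If there are more clusters than labels, the extra clusters map to a
    # placeholder that never matches any true label.
    targets = labels + [_UNMATCHED] * max(0, len(clusters) - len(labels))

    best_hits = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for t, c in zip(true_labels, cluster_ids) if mapping[c] == t)
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 0, 0, 0, 0]  # same grouping idea, different id names
    print(clustering_accuracy(truth, found))
```

For many clusters, the same matching is usually solved with the Hungarian algorithm rather than by enumerating permutations.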
The western U.S. has been experiencing a mega-scale drought since 2000. By killing trees and drying out forests, the drought triggers widespread wildfire activity. In the 2020 California fire season alone, more than 10.3 million acres of land were burned and over 10,000 structures were damaged, at an estimated cost of over $12 billion. Drought also devastates agriculture and drains the social and emotional well-being of affected communities. This work aims to predict the occurrence and severity of drought and thereby help mitigate drought-related adversities. A machine learning based framework was developed that covers time series data collection, model training, forecasting, and visualization. The data come from the National Drought Monitor center and are keyed by FIPS (Federal Information Processing Standards) geographic identification codes. For model training and forecasting, a Bayesian structural time series (BSTS) based statistical model was employed to forecast drought spatially and temporally. In the model, a time-series component captures the general trend and seasonal patterns in the data, and a regression component captures the impact of drought-related measurements such as drought severity and temperature. Mean Absolute Percentage Error (MAPE) was used as the model accuracy metric. The last 10 years of drought data, up to 2020-09-01, were used for model training and validation, and back-testing was implemented to validate the model. Afterwards, a drought forecast was generated for the upcoming 3 weeks for the United States at the county level. 2-D heat maps were also integrated for visual reference.
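The evaluation described above, MAPE scored through back-testing, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the BSTS model is replaced by a naive last-value forecaster, the names `mape`, `naive_forecast`, and `backtest` are hypothetical, the fold layout (rolling origin with a 3-step horizon) is assumed, and zero-valued observations are skipped because MAPE is undefined for them.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Assumption: points whose actual value is zero are skipped, since the
    percentage error is undefined there.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


def naive_forecast(history, horizon):
    """Stand-in for the real model: repeat the last observed value."""
    return [history[-1]] * horizon


def backtest(series, horizon=3, n_folds=4, forecaster=naive_forecast):
    """Rolling-origin back-test.

    For each fold, train on everything before a cut-off, forecast the next
    `horizon` points, and score the forecast against the held-out values.
    """
    scores = []
    for fold in range(n_folds):
        cutoff = len(series) - horizon * (n_folds - fold)
        if cutoff < 1:
            raise ValueError("series too short for the requested folds")
        train = series[:cutoff]
        test = series[cutoff:cutoff + horizon]
        scores.append(mape(test, forecaster(train, horizon)))
    return scores


if __name__ == "__main__":
    # Hypothetical weekly drought-severity values for one county.
    weekly = [12.0, 14.0, 15.0, 15.0, 18.0, 21.0, 20.0,
              22.0, 25.0, 24.0, 26.0, 27.0, 30.0]
    for i, score in enumerate(backtest(weekly), start=1):
        print(f"fold {i}: MAPE = {score:.1f}%")
```

Any forecaster with the same `(history, horizon)` signature can be passed to `backtest`, so a fitted structural time series model could be swapped in for the naive baseline.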