This work uses individuals' social media messages to identify latent infectious diseases, without prior information, quicker than when the disease(s) is formalized by national public health institutes. In particular, the unsupervised machine learning model using user, textual, and temporal information in social media data, along with sentiment analysis, identifies latent infectious diseases in a given location.
Intrusion detection systems (IDSs) are intrinsically linked to a comprehensive solution of cyberattacks prevention instruments. To achieve a higher detection rate, the ability to design an improved detection framework is sought after, particularly when utilizing ensemble learners. Designing an ensemble often lies in two main challenges such as the choice of available base classifiers and combiner methods. This paper performs an overview of how ensemble learners are exploited in IDSs by means of systematic mapping study. We collected and analyzed 124 prominent publications from the existing literature. The selected publications were then mapped into several categories such as years of publications, publication venues, datasets used, ensemble methods, and IDS techniques. Furthermore, this study reports and analyzes an empirical investigation of a new classifier ensemble approach, called stack of ensemble (SoE) for anomaly-based IDS. The SoE is an ensemble classifier that adopts parallel architecture to combine three individual ensemble learners such as random forest, gradient boosting machine, and extreme gradient boosting machine in a homogeneous manner. The performance significance among classification algorithms is statistically examined in terms of their Matthews correlation coefficients, accuracies, false positive rates, and area under ROC curve metrics. Our study fills the gap in current literature concerning an up-to-date systematic mapping study, not to mention an extensive empirical evaluation of the recent advances of ensemble learning techniques applied to IDSs.
The authors of this work propose an algorithm that determines optimal search keyword combinations for querying online product data sources in order to minimize identification errors during the product feature extraction process. Data-driven product design methodologies based on acquiring and mining online product-feature-related data are presented with two fundamental challenges: (1) determining optimal search keywords that result in relevant product related data being returned and (2) determining how many search keywords are sufficient to minimize identification errors during the product feature extraction process. These challenges exist because online data, which is primarily textual in nature, may violate several statistical assumptions relating to the independence and identical distribution of samples relating to a query. Existing design methodologies have predetermined search terms that are used to acquire textual data online, which makes the resulting data acquired, a function of the quality of the search term(s) themselves. Furthermore, the lack of independence and identical distribution of text data from online sources impacts the quality of the acquired data. For example, a designer may search for a product feature using the term “screen,” which may return relevant results such as “the screen size is just perfect,” but may also contain irrelevant noise such as “researchers should really screen for this type of error.” A text mining algorithm is introduced to determine the optimal terms without labeled training data that would maximize the veracity of the data acquired to make a valid conclusion. A case study involving real-world smartphones is used to validate the proposed methodology.
Vibration measurement and monitoring are essential in a wide variety of applications. Vibration measurements are critical for diagnosing industrial machinery malfunctions because they provide information about the condition of the rotating equipment. Vibration analysis is considered the most effective method for predictive maintenance because it is used to troubleshoot instantaneous faults as well as periodic maintenance. Numerous studies conducted in this vein have been published in a variety of outlets. This review documents data-driven and recently published deep learning techniques for vibration-based condition monitoring. Numerous studies were obtained from two reputable indexing databases, Web of Science and Scopus. Following a thorough review, 59 studies were selected for synthesis. The selected studies are then systematically discussed to provide researchers with an in-depth view of deep learning-based fault diagnosis methods based on vibration signals. Additionally, a few remarks regarding future research directions are made, including graph-based neural networks, physics-informed ML, and a transformer convolutional network-based fault diagnosis method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.