Spoken Language Identification (LID) is the process of determining and classifying the natural language of a given piece of content or dataset. Typically, the data must be processed to extract useful features before LID can be performed. According to the literature, feature extraction for LID is a mature process: the standard features have been developed using Mel-Frequency Cepstral Coefficients (MFCC), Shifted Delta Cepstral (SDC) and the Gaussian Mixture Model (GMM), culminating in the i-vector based framework. However, the learning process that operates on the extracted features still needs to be improved (i.e. optimised) to capture all the knowledge embedded in those features. The Extreme Learning Machine (ELM) is an effective learning model for classification and regression analysis and is particularly useful for training a single-hidden-layer neural network. Nevertheless, the learning process of this model is not entirely effective (i.e. optimised) because the weights between the input and hidden layers are selected randomly. In this study, the ELM is selected as the learning model for LID based on standard feature extraction. One of the optimisation approaches for ELM, the Self-Adjusting Extreme Learning Machine (SA-ELM), is selected as the benchmark and improved by altering the selection phase of the optimisation process. The selection process incorporates both the Split-Ratio and K-Tournament methods, and the improved SA-ELM is named the Enhanced Self-Adjusting Extreme Learning Machine (ESA-ELM). The results are generated from LID experiments on datasets created from eight different languages. They show that ESA-ELM LID outperforms SA-ELM LID, achieving an accuracy of 96.25% compared with 95.00% for SA-ELM LID.
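The basic ELM idea described above (random, untrained input-to-hidden weights and a least-squares output layer) can be sketched as follows. This is a minimal illustration of a plain ELM only, not the SA-ELM or ESA-ELM optimisation from the study. The function names, the tanh activation, the hidden-layer size and the toy two-class data are all assumptions made for the example.

```python
import numpy as np

def train_elm(X, Y, n_hidden=40, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output layer."""
    rng = np.random.default_rng(seed)
    # Input-to-hidden weights and biases are drawn at random and never trained.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y      # output weights via the pseudo-inverse
    return W, b, beta

def predict_elm(X, model):
    """Return the predicted class index for each row of X."""
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

# Toy data (made up): two well-separated 2-D classes standing in for
# extracted feature vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
Y = np.eye(2)[labels]                 # one-hot targets

model = train_elm(X, Y)
accuracy = float(np.mean(predict_elm(X, model) == labels))
```

Because the hidden weights depend only on the random seed, different seeds can give different accuracies, which is the sensitivity that optimisation approaches for ELM try to reduce.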
Ground-level ozone is a common pollution problem that has a negative influence on human health. The key challenge in analysing ozone levels lies in the complex representation of the data, which takes the form of time series. Clustering is one of the common techniques used for meteorological and environmental time series data. The way a clustering technique groups similar sequences relies on a distance or similarity criterion. Several distance measures have been integrated with various types of clustering techniques. However, identifying an appropriate distance measure for a particular field is a challenging task. Since hierarchical clustering has been considered the state of the art for meteorological and climate change data, this paper proposes agglomerative hierarchical clustering for ozone level analysis in Putrajaya, Malaysia, using three distance measures: Euclidean, Minkowski and Dynamic Time Warping. The results show that Dynamic Time Warping outperformed the other two distance measures.
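As a rough illustration of how the choice of distance measure plugs into agglomerative clustering, the sketch below accepts any two-argument distance function. The average-linkage rule, the Minkowski order p=3, the function names and the toy series are assumptions for the example; the abstract does not state which linkage or parameters were used.

```python
def minkowski(a, b, p=3):
    """Minkowski distance of order p between two equal-length series."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski(a, b, p=2)

def agglomerative(series, distance, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of series indices."""
    clusters = [[i] for i in range(len(series))]
    # Pairwise distances are computed once and reused at every merge step.
    d = {(i, j): distance(series[i], series[j])
         for i in range(len(series)) for j in range(i + 1, len(series))}

    def linkage(c1, c2):
        total = sum(d[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)      # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

# Toy series (made up): two low-valued and two high-valued sequences.
series = [[1, 2, 3, 2], [1, 2, 4, 2], [8, 9, 8, 9], [9, 9, 8, 8]]
by_euclidean = agglomerative(series, euclidean, n_clusters=2)
by_minkowski = agglomerative(series, minkowski, n_clusters=2)
```

A Dynamic Time Warping function with the same two-argument signature could be passed in the same way, which is what makes a side-by-side comparison of distance measures straightforward.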
One of the significant threats facing the web today is DNS tunneling, an attack that exploits the Domain Name System (DNS) protocol in order to bypass security gateways. It can lead to the loss of critical information, which is disastrous for many organizations. Recently, researchers have paid more attention to machine learning techniques for addressing DNS tunneling. The performance of machine learning is significantly affected by the features used. However, because no standard benchmark dataset exists for DNS tunneling, researchers have captured its features using different techniques. This paper presents a review of the features used for DNS tunneling.
Time series clustering is the process of grouping similar sequences into clusters. The key to clustering time series data lies in the similarity/distance function used to identify sequential matches. Dynamic Time Warping (DTW) is one of the common distance measures that has demonstrated competitive results compared with other functions. DTW aims to find the shortest path when identifying sequential matches, and it relies on dynamic programming to obtain that path by selecting the smaller distance at each step. However, when distances are equal, DTW selects the path randomly. Such random selection can be misguided and can significantly affect the matching quality, because it may lead to a longer path that drifts away from the optimum. This paper proposes a modified DTW that aims to enhance the dynamic selection of the shortest path when handling equal distances. Experiments were conducted using twenty UCR benchmark datasets. The proposed modified DTW was compared, in terms of precision, recall and F-measure, with competitive state-of-the-art distance measures, including the original DTW, the Minkowski distance and the Euclidean distance. The results showed that the proposed modified DTW outperforms the standard DTW as well as the Minkowski and Euclidean distance measures. This demonstrates the effectiveness of the proposed modification, in which optimizing the shortest path enhanced clustering performance. The proposed modified DTW can be used to obtain good clustering of any time series data.
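To make the role of equal distances concrete, here is a minimal sketch of standard DTW with dynamic programming and path backtracking. When two or more predecessor cells have the same cost, this sketch resolves the tie with a fixed order (diagonal, then up, then left). That tie-break is an assumption chosen for illustration and is not the modification proposed in the paper, whose details the abstract does not give.

```python
def dtw(a, b):
    """Return (total cost, warping path) for two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j - 1],
                                    cost[i - 1][j],
                                    cost[i][j - 1])

    # Backtrack from the last cell to the first. min() returns the first
    # minimal entry, so ties go to the diagonal move, then up, then left.
    path, i, j = [(n - 1, m - 1)], n, m
    while (i, j) != (1, 1):
        moves = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(moves, key=lambda t: t[0])
        path.append((i - 1, j - 1))
    return cost[n][m], path[::-1]

# The second sequence repeats the value 2, so one element of the first
# sequence is matched twice.
distance, path = dtw([1, 2, 3], [1, 2, 2, 3])
```

Preferring the diagonal move on ties tends to keep the warping path short, since a diagonal step advances both sequences at once.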
Ozone analysis is the process of identifying meaningful patterns that facilitate the prediction of future trends. Clustering is one of the common techniques used for ozone analysis; it is a popular method that contributes significant knowledge to time series data mining by aggregating similar data into groups. However, identifying significant patterns in ground-level ozone is quite challenging, especially after the clustering task has been applied. This paper presents pattern discovery for ground-level ozone using a proposed method, Agglomerative Hierarchical Clustering with Dynamic Time Warping (DTW) as the distance measure, after which the patterns are extracted using the Apriori Association Rules (AAR) algorithm. The experiment is conducted on a Malaysian ozone dataset collected from Putrajaya for the year 2006. The experimental results show 20 patterns that influence high ozone with high confidence (1.00). These can be grouped into four meaningful patterns: higher temperature with low nitrogen oxide; high nitrogen oxide and nitrogen dioxide; high nitrogen oxide with carbon oxide; and high carbon oxide. These patterns support decision making on how much carbon oxide and nitrogen oxide should be reduced in order to avoid high ground-level ozone.
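The confidence value quoted above is the standard association-rule measure: the support of the whole rule divided by the support of its antecedent. The sketch below computes only these two measures and omits Apriori's candidate generation. The item labels and the four observations are made up for illustration and are not taken from the Putrajaya dataset.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical discretised observations (one set of labels per time window).
transactions = [
    {"temp=high", "no=low", "ozone=high"},
    {"temp=high", "no=low", "ozone=high"},
    {"temp=low", "no=high", "ozone=low"},
    {"temp=high", "no=high", "ozone=low"},
]

rule_supp = support(transactions, {"temp=high", "no=low", "ozone=high"})
rule_conf = confidence(transactions, {"temp=high", "no=low"}, {"ozone=high"})
```

In this toy data, every observation with high temperature and low nitrogen oxide also has high ozone, so the rule's confidence is 1.0 while its support is 0.5.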