This paper proposes an improved adaptive density-based spatial clustering of applications with noise (DBSCAN) algorithm, built on a genetic algorithm and the MapReduce parallel programming framework, to address the poor clustering quality and low efficiency of DBSCAN that result from choosing its parameters empirically. The density threshold minPts and the scan radius Eps are tuned by iterative genetic-algorithm optimization, and the data then undergo a second reduction step that exploits the similarity and variability of the dataset together with the computing power of a Hadoop cluster. The data can thus be serialized reasonably, and efficient adaptive parallel clustering is ultimately achieved. Experimental results show that the proposed algorithm achieves higher clustering accuracy and execution efficiency than the comparison baselines, and that this advantage grows as the dataset volume increases. The improved algorithm provides a more accurate method for setting the DBSCAN thresholds and specifies the concrete calculation process, which provides practical support for implementing DBSCAN.
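The idea of tuning Eps and minPts with a genetic algorithm can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not state the fitness function, the genetic operators, or the MapReduce decomposition, so the silhouette-style `fitness`, the truncation selection, the crossover and mutation rules, and all default parameters below are assumptions chosen for illustration.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:      # not a core point: noise for now
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:              # earlier noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points expand the cluster
                queue.extend(neighbors[j])
    return labels


def fitness(points, labels):
    """Assumed fitness: silhouette summed over clustered points, divided by all
    points, so noise contributes 0 and heavy noise is penalized."""
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0                          # silhouette needs at least 2 clusters

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        if not others:
            return 0.0
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for lab, members in clusters.items():
        for i in members:
            a = mean_dist(i, members)
            b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
            if len(members) > 1 and max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def evolve(points, generations=15, pop_size=12, seed=0):
    """Search (eps, min_pts) with a small genetic algorithm (illustrative operators)."""
    rng = random.Random(seed)
    span = max(math.dist(p, q) for p in points for q in points)

    def score(ind):
        return fitness(points, dbscan(points, ind[0], ind[1]))

    population = [(rng.uniform(0.01 * span, 0.5 * span), rng.randint(2, 10))
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            (e1, m1), (e2, m2) = rng.sample(parents, 2)
            eps = (e1 + e2) / 2               # arithmetic crossover on eps
            min_pts = rng.choice((m1, m2))    # uniform crossover on min_pts
            if rng.random() < 0.3:            # mutation
                eps = max(1e-9, eps * rng.uniform(0.8, 1.25))
                min_pts = max(2, min_pts + rng.choice((-1, 1)))
            children.append((eps, min_pts))
        population = parents + children       # parents survive, so the best is kept
    return max(population, key=score)
```

Calling `evolve(points)` on a list of coordinate tuples returns one `(eps, min_pts)` pair, which can then be passed to `dbscan`. Because every candidate is scored independently, the fitness evaluation is the natural part to distribute across workers, although how the paper maps this onto MapReduce is not described in the abstract.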
Keywords: cluster storage, HDFS, consistent hashing, copy optimization
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
Introduction
Distributed cluster storage is a key technology for managing large-scale data storage. Owing to its high transmission rate and high fault tolerance, HDFS has become an effective solution for big data storage applications [1][2][3][4]. However, random data placement strategies can cause uneven data distribution and degrade overall system performance, and research schemes have been proposed to address this problem from several angles [5][6][7][8]. Wang proposed a minimum-service-cost policy that dynamically adjusts the number and location of copies, which saves storage space and improves the reliability and stability of the system [9]. Zhai proposed a P2P cache capacity design method that trades off storage cost against bandwidth cost, formulating the optimal cache capacity design as an integer programming problem [10]. Pamies-Juarez derived an optimal data placement strategy that reduces the use of redundancy [11,12]. However, these works did not take balanced execution performance and system-level issues into account.
To solve these problems, Li proposed an energy-efficient virtual disk layout scheduling method in which the dynamic work area is divided into a workspace and a ready region, which allocate resources to users and effectively alleviate prolonged response times [13]. Jiang presented an approach to the name node performance bottleneck in storage that reduces access latency and improves access efficiency [14]. Wang implemented a parallel transmission strategy for reading and writing files in HDFS and improved the automatic copy strategy, raising read and write efficiency, reducing latency, and providing efficient and reliable service for cloud storage users [15].
On this basis, a method based on the consistent hashing algorithm is proposed for storing copies of data according to a dispersal principle. Combined with an improved consistent hashing algorithm [16,17], virtual data storage nodes and equally divided areas are introduced [18][19][20]. The method aims to distribute data evenly while adaptively and quickly locating stored data, thereby improving system performance.
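Consistent hashing with virtual nodes, the technique this introduction builds on, can be sketched as follows. This is a generic textbook version under stated assumptions, not the improved algorithm of the paper: the class name `HashRing`, the use of MD5, the `vnodes` count, and the node names are all hypothetical, and the equally divided areas mentioned above are not modeled.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring in which each physical node owns many virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._nodes = set()
        self._ring = []                      # sorted list of (point, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place `vnodes` virtual points for a physical node on the ring."""
        self._nodes.add(node)
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        """Drop every virtual point that belongs to a physical node."""
        self._nodes.discard(node)
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def locate(self, key, replicas=1):
        """Return up to `replicas` distinct nodes, walking clockwise from the key."""
        if not self._ring:
            raise ValueError("ring is empty")
        i = bisect.bisect_left(self._ring, (_hash(key), ""))
        wanted = min(replicas, len(self._nodes))
        owners = []
        while len(owners) < wanted:
            node = self._ring[i % len(self._ring)][1]
            if node not in owners:           # skip virtual points of chosen nodes
                owners.append(node)
            i += 1
        return owners
```

For example, `HashRing(["dn1", "dn2", "dn3"]).locate("block-42", replicas=2)` returns two distinct node names for the copies of that block. Spreading each physical node over many virtual points is what evens out the distribution, and adding a node moves only the keys that fall just before its new points.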
Traffic flow prediction is one of the fundamental components of Intelligent Transportation Systems (ITS). Many traffic flow prediction models have been developed, but they are sensitive to noise, which results in poor generalization. Fused Lasso, also known as total variation denoising, penalizes the L1-norm of the model coefficients and of the pairwise differences between neighboring coefficients, and has been widely used to analyze highly correlated features with a natural order, as is the case with traffic flow. It denoises data by encouraging sparsity in both the coefficients and their differences, and it estimates the coefficients of highly correlated variables to be equal to each other. For traffic data, however, identical coefficients lead to overexpression of features and to loss of the time-series trend of the traffic flow. In this work, we propose a Fused Ridge multi-task learning (FR-MTL) model for multi-road traffic flow prediction. It introduces Fused Ridge for traffic data denoising, which imposes a penalty on the L2-norm of the coefficients and of their differences. The L2-norm penalty shrinks the coefficients proportionally and generates smooth, non-sparse coefficient vectors, so the model both preserves the trend and denoises the data. In addition, we use multi-task learning (MTL) to train on the spatiotemporal information shared among roads. Experiments on real traffic data show the advantages of the proposed model over four other regularized baseline models, and on traffic data with Gaussian noise and missing values the FR-MTL model demonstrates promising capability with satisfactory accuracy and effectiveness.
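The contrast between the two penalties can be written out for a single task. The notation below is an assumption, since the abstract gives no formulas: $X$ is the design matrix, $y$ the target, $w \in \mathbb{R}^p$ the ordered coefficient vector, $\lambda_1, \lambda_2 \ge 0$ the regularization weights, and $D$ the $(p-1) \times p$ first-difference matrix. A squared-error loss is also assumed, and the multi-task coupling of FR-MTL is omitted.

```latex
% Fused Lasso: L1 penalties on coefficients and on neighboring differences
\hat{w}_{\mathrm{FL}} = \arg\min_{w}\; \|y - Xw\|_2^2
  + \lambda_1 \sum_{j=1}^{p} |w_j|
  + \lambda_2 \sum_{j=2}^{p} |w_j - w_{j-1}|

% Fused Ridge: the same two terms with squared (L2) penalties
\hat{w}_{\mathrm{FR}} = \arg\min_{w}\; \|y - Xw\|_2^2
  + \lambda_1 \sum_{j=1}^{p} w_j^2
  + \lambda_2 \sum_{j=2}^{p} (w_j - w_{j-1})^2

% With (Dw)_j = w_{j+1} - w_j, the ridge form is quadratic in w and,
% when the matrix below is invertible, has the closed-form solution
\hat{w}_{\mathrm{FR}} = \left( X^{\top}X + \lambda_1 I + \lambda_2 D^{\top}D \right)^{-1} X^{\top} y
```

Under these assumptions the difference in behavior described above follows directly: the absolute-value terms can drive coefficients and differences exactly to zero, whereas the squared terms only shrink them, which yields smooth, non-sparse coefficient vectors.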
In this work, we propose a multi-channel semantic fusion convolutional neural network (SFCNN) to address the emotional ambiguity caused by changes in contextual order in the sentiment classification task. First, emotional tendency weights are computed on the text word vectors through an improved emotional tendency attention mechanism. Second, a multi-channel semantic fusion layer combines the deep semantic fusion of sentences with contextual order to generate deep semantic vectors, from which a CNN extracts high-level semantic features. Finally, an improved adaptive learning rate gradient descent algorithm optimizes the model parameters and completes the sentiment classification task. Three datasets are used to evaluate the effectiveness of the proposed algorithm. The experimental results show that the SFCNN model has high steady-state precision and good generalization performance.