Random forests are known to be good for data mining of classification tasks, because random forests are robust for datasets having insufficient information possibly with some errors. But applying random forests blindly may not produce good results, and a dataset in the domain of rotogravure printing is one of such datasets. Hence, in this paper, some best classification accuracy based on clever application of random forests to predict the occurrence of cylinder bands in rotogravure printing is investigated. Since random forests could generate good results with an appropriate combination of parameters like the number of randomly selected attributes for each split and the number of trees in the forests, an effective data mining procedure considering the property of the target dataset by way of trial random forests is investigated. The effectiveness of the suggested procedure is shown by experiments with very good results.
Recent world events in go games between human and artificial intelligence called AlphaGo showed the big advancement in machine learning technologies. While AlphaGo was trained using real world data, AlphaGo Zero was trained using massive random data, and the fact that AlphaGo Zero won AlphaGo completely revealed that diversity and size in training data is important for better performance for the machine learning algorithms, especially in deep learning algorithms of neural networks. On the other hand, artificial neural networks and decision trees are widely accepted machine learning algorithms because of their robustness in errors and comprehensibility respectively. In this paper in order to prove that diversity and size in data are important factors for better performance of machine learning algorithms empirically, the two representative algorithms are used for experiment. A real world data set called breast tissue was chosen, because the data set consists of real numbers that is very good property for artificial random data generation. The result of the experiment proved the fact that the diversity and size of data are very important factors for better performance.
In order to deal with data inconsistency problems in high degree relations, a new method based on rough set theory is presented. The inconsistent data that exist under the attribute sets in the relations having possible functional dependencies can be found effectively by applying the suggested rough set based consistency checking method.
The estimation of missing sensor values is an important problem in sensor network applications, but the existing approaches have some limitations, such as the limitations of application scope and estimation accuracy. Therefore, in this paper, we propose a new estimation model based on a spatial-temporal correlation analysis (STCAM). STCAM can make full use of spatial and temporal correlations and can recognize whether the sensor parameters have a spatial correlation or a temporal correlation, and whether the missing sensor data are continuous. According to the recognition results, STCAM can choose one of the most suitable algorithms from among linear interpolation algorithm of temporal correlation analysis (TCA-LI), multiple regression algorithm of temporal correlation analysis (TCA-MR), spatial correlation analysis (SCA), spatial-temporal correlation analysis (STCA) to estimate the missing sensor data. STCAM was evaluated over Intel lab dataset and a traffic dataset, and the simulation experiment results show that STCAM has good estimation accuracy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.