In order to further improve the technical level of data cleaning and data mining and better avoid the defects of uncertain knowledge expression in traditional Bayesian networks, a Bayesian network algorithm based on combined data cleaning and mining technology is proposed, and a manual functional data cleaning architecture based on Hadoop is constructed. The results show that the traditional neighbor sorting algorithm with window size of 5 takes the least time to process the same amount of data. The nearest neighbor sorting algorithm with window size 7 is always the longest. The time consumption of the nonfixed window nearest neighbor sorting algorithm is similar to that of the traditional nearest neighbor sorting algorithm with a window size of 5. However, with the increase of data volume, the consumption time increases rapidly until it approaches the consumption time of the traditional sorting nearest neighbor algorithm with window size of 7. Therefore, the algorithm can improve the precision of data cleaning at the expense of cleaning speed, which proves that the artificial intelligence architecture based on combined data significantly improves the efficiency of the algorithm and can effectively analyze and process large data sets.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.