Abstract-Rapid progress in digital data acquisition techniques has led to a huge volume of data. More than 80 percent of today's data consists of unstructured or semi-structured data. Discovering appropriate patterns and trends in text documents within this massive volume of data is a major challenge. Text mining is the process of extracting interesting and nontrivial patterns from large collections of text documents. Different techniques and tools exist to mine text and discover valuable information for future prediction and decision making. Selecting the right text mining technique increases speed and reduces the time and effort required to extract valuable information. This paper briefly discusses and analyzes text mining techniques and their applications in diverse fields. Moreover, it identifies the issues in the field of text mining that affect the accuracy and relevance of results.
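The abstract does not name a specific technique. As one illustration of how a text mining step can surface "interesting" terms, the Python sketch below computes TF-IDF weights: a term scores highly when it is frequent in one document but rare across the collection. The helper names `tokenize`, `tf_idf`, and `top_terms` and the three-sentence corpus are hypothetical, and the plain (unsmoothed) IDF used here is only one of several common variants.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} map per document (hypothetical helper).

    tf  = term count / number of tokens in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights: dict[str, float], k: int = 3) -> list[str]:
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    corpus = [
        "Text mining extracts patterns from text documents.",
        "Data warehouses store historical and current data.",
        "Mining rules from data supports decision making.",
    ]
    for doc_weights in tf_idf(corpus):
        print(top_terms(doc_weights))
```

With this toy corpus, terms shared by two documents (such as "mining" and "data") are down-weighted relative to terms that occur in only one document. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF by default, so their weights will differ from this sketch.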
The success of rules learned through data mining depends heavily on their actionability: how useful they are for taking suitable actions in a real business environment. To improve rule actionability, researchers initially proposed various Data Mining (DM) frameworks that focused on factors drawn only from the business domain dataset. Later, Domain-Driven Data Mining (D3M) frameworks were introduced that focus on domain knowledge factors from the context of the overall business environment. Although these frameworks consider dataset factors and domain knowledge factors in different phases, the learned rules still lacked actionability. The objective of our research is to improve the actionability of learned rules. To this end, we analyzed: (1) what actions or tasks are performed in the overall business process, (2) in what sequence these tasks are performed, (3) under what conditions they are performed, (4) by whom they are performed, and (5) what data is provided and produced in performing them. We observed that including rule-learning factors only from the dataset or only from domain knowledge is not sufficient. Our Process-based Domain-Driven Data Mining-Actionable Knowledge Discovery (PD3M-AKD) framework defines phases for considering and including additional factors from these five perspectives of the business process. The PD3M-AKD framework is also aligned with the existing phases of current DM and D3M frameworks for considering and including dataset and domain knowledge factors. Finally, we evaluated and validated our case study results using real-life scenarios from the education, engineering, and business process domains.
INDEX TERMS Actionable knowledge, business process, data mining, data mining framework, domain-driven data mining framework, data privacy.
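One way to picture the five perspectives is as attributes attached to each task of a process model. The sketch below is not the PD3M-AKD framework itself. It is a minimal illustration, assuming a hypothetical `ProcessTask` record and hypothetical `process_features` and `enrich` helpers, of how process-perspective attributes might be appended to a dataset record before rule learning, alongside the dataset and domain knowledge factors that DM and D3M frameworks already use.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessTask:
    """One task in a business process, described from five perspectives (hypothetical model)."""
    name: str                                          # (1) what task is performed
    position: int                                      # (2) where it falls in the task sequence
    condition: str                                     # (3) condition under which it is performed
    performer: str                                     # (4) role that performs it
    inputs: list[str] = field(default_factory=list)    # (5) data provided to the task
    outputs: list[str] = field(default_factory=list)   # (5) data produced by the task


def process_features(task: ProcessTask, tasks: list[ProcessTask]) -> dict[str, object]:
    """Derive process-perspective attributes for one task of the process."""
    ordered = sorted(tasks, key=lambda t: t.position)
    idx = ordered.index(task)
    previous = ordered[idx - 1].name if idx > 0 else None  # sequence perspective
    return {
        "task": task.name,
        "previous_task": previous,
        "condition": task.condition,
        "performer": task.performer,
        "data_in": ",".join(task.inputs),
        "data_out": ",".join(task.outputs),
    }


def enrich(record: dict[str, object], task: ProcessTask,
           tasks: list[ProcessTask]) -> dict[str, object]:
    """Combine dataset attributes with process attributes before rule learning."""
    return {**record, **process_features(task, tasks)}


if __name__ == "__main__":
    # Toy education-style process; every task and value here is invented for illustration.
    process = [
        ProcessTask("register_course", 1, "prerequisites met", "student", ["transcript"], ["enrollment"]),
        ProcessTask("submit_assignment", 2, "before deadline", "student", ["enrollment"], ["submission"]),
        ProcessTask("grade_assignment", 3, "submission received", "instructor", ["submission"], ["grade"]),
    ]
    row = {"student_id": 17, "attendance": 0.82}
    print(enrich(row, process[2], process))
```

The enriched record carries the original dataset attributes plus who performed the task, under what condition, after which task, and with what data, so a rule learner could in principle produce rules that refer to those process factors.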
Abstract-The rapid growth in the size of data sets makes it challenging to extract and analyze information in a timely manner for better prediction and decision making. A data warehouse is a solution for strategic decision making: it serves as a repository that stores historical and current data. The Extraction, Transformation, and Loading (ETL) process gathers data from different sources and integrates it into the data warehouse. This paper proposes a multi-agent framework that enhances the efficiency of the ETL process. Each agent performs the specific task assigned to it, which makes it easy to identify errors at different stages of the ETL process; this was difficult and time-consuming in the traditional ETL process. The agents in the framework identify data sources and extract, integrate, transform, and load data into the data warehouse. A monitoring agent remains active throughout this process and generates alerts if an issue occurs at any stage.
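The agent-per-task idea can be sketched as follows. This is a minimal, sequential Python illustration, not the paper's framework: the classes `MonitoringAgent` and `StageAgent`, the `run_etl` function, the in-memory SQLite table `sales`, and the toy sources are all hypothetical, and real agents would typically run concurrently. Each stage agent performs one task, and any exception is reported to the monitoring agent together with the stage name, which is what makes the failing stage easy to locate.

```python
import sqlite3
from typing import Callable, Optional

Rows = list[dict]


class MonitoringAgent:
    """Stays active for the whole run and records an alert for any failing stage."""

    def __init__(self) -> None:
        self.alerts: list[str] = []

    def alert(self, stage: str, error: Exception) -> None:
        self.alerts.append(f"{stage}: {type(error).__name__}: {error}")


class StageAgent:
    """Performs one assigned ETL task and reports failures to the monitoring agent."""

    def __init__(self, stage: str, task: Callable[[Rows], Rows], monitor: MonitoringAgent) -> None:
        self.stage, self.task, self.monitor = stage, task, monitor

    def run(self, rows: Rows) -> Optional[Rows]:
        try:
            return self.task(rows)
        except Exception as exc:  # any error is attributed to this agent's stage
            self.monitor.alert(self.stage, exc)
            return None


def run_etl(sources: list[Callable[[], Rows]], conn: sqlite3.Connection,
            monitor: MonitoringAgent) -> bool:
    """Run extract -> integrate -> transform -> load; return False at the first failing stage."""

    def extract(_: Rows) -> Rows:
        return [row for source in sources for row in source()]

    def integrate(rows: Rows) -> Rows:
        merged: dict[int, dict] = {}
        for row in rows:
            merged.setdefault(row["id"], row)  # keep the first record seen for each id
        return list(merged.values())

    def transform(rows: Rows) -> Rows:
        return [{"id": r["id"], "customer": r["customer"].strip().title(),
                 "amount": float(r["amount"])} for r in rows]

    def load(rows: Rows) -> Rows:
        conn.executemany(
            "INSERT INTO sales (id, customer, amount) VALUES (:id, :customer, :amount)", rows)
        conn.commit()
        return rows

    rows: Rows = []
    stages = (("extract", extract), ("integrate", integrate),
              ("transform", transform), ("load", load))
    for stage, task in stages:
        result = StageAgent(stage, task, monitor).run(rows)
        if result is None:
            return False
        rows = result
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    crm = lambda: [{"id": 1, "customer": " alice ", "amount": "10.5"}]
    shop = lambda: [{"id": 2, "customer": "bob", "amount": "n/a"}]
    monitor = MonitoringAgent()
    print(run_etl([crm, shop], conn, monitor), monitor.alerts)
```

In the usage example, the malformed amount `"n/a"` makes the transform agent fail, so the monitoring agent records an alert naming the `transform` stage and nothing is loaded. Stopping at the first failing stage is one design choice; a framework could instead quarantine bad rows and continue.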