Text mining is a method for extracting knowledge from structured and unstructured data. The extracted knowledge can then be used for further analysis and discovery. This paper presents a method for extracting information from unstructured text data, explains the importance of Association Rules Mining (ARM) for Korean-language text, and describes the NLP (Natural Language Processing) tools involved. With some modifications, ARM can also be used to mine associations between itemsets in unstructured data, which in turn helps to generate a statistical thesaurus, mine grammatical rules, and search large data collections efficiently. Although various association rules mining techniques have been used successfully for market basket analysis, very few have been applied to Korean text. The proposed Korean language mining method calculates and extracts meaningful patterns (association rules) between words and presents the hidden knowledge. First, it cleans and integrates the data, selects the relevant data, and transforms it into a transactional database. Data mining techniques are then applied to this data source to extract hidden patterns. These patterns are evaluated against specific rules until a valid and satisfactory result is obtained. We tested the method on a Korean news corpus; the results show that it worked well and were adequate to justify further research. These processes are repeated until we get the required result. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. We used the same technique to extract meaningful association rules from Korean text. The first step is to pre-process the data (removing unnecessary data); the processed data is then transformed into a transactional database. By grouping this transactional database with other hand-built databases, we mined some interesting and useful patterns in the given Korean text collection (corpus).

The rest of the paper is structured as follows: Section 2 covers related work; Section 3 describes association rules mining for the Korean language; Section 4 presents the experiments and results; and Section 5 gives the conclusion and future work.
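The pipeline described above (pre-process the text, transform it into a transactional database, then mine association rules between words) can be illustrated with a minimal sketch. This is not the authors' implementation: the stopword list, the function names `to_transactions` and `mine_pair_rules`, the thresholds, and the restriction to word pairs are all assumptions made for illustration. The input is assumed to be already tokenized, for example by a Korean morphological analyzer.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stopword list (common Korean particles); a real system
# would rely on a morphological analyzer and a curated dictionary.
STOPWORDS = {"은", "는", "이", "가", "을", "를"}


def to_transactions(tokenized_sentences):
    """Turn each tokenized sentence into a transaction (a set of content words)."""
    transactions = []
    for tokens in tokenized_sentences:
        items = frozenset(t for t in tokens if t not in STOPWORDS)
        if items:  # skip sentences that contained only stopwords
            transactions.append(items)
    return transactions


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for word pairs above both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both words
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Illustrative data only, not taken from the paper's corpus.
sentences = [
    ["경제", "는", "성장", "을"],
    ["경제", "성장", "정책"],
    ["정책", "은", "경제"],
    ["날씨", "가"],
]
for lhs, rhs, support, confidence in mine_pair_rules(to_transactions(sentences)):
    print(f"{lhs} => {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Treating each sentence as one transaction is only one possible choice; a paragraph or a whole news article could serve as the transaction unit instead, which would change the support values.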
Abstract. This paper presents an efficient text mining method that focuses on extracting and updating unknown words (unknown foreign words) to improve data classification and POS tagging. The proposed method uses simple but efficient techniques. First, it converts the data into a structured form using data preprocessing techniques. In this phase the data passes through several stages, such as cleaning, integration, and selection of important data, and is then organized into databases for further analysis and processing. These databases consist of different kinds of dictionaries, on which our system relies heavily. The proposed method for discovering and updating unknown foreign words first discovers the foreign word using morphological analysis with the help of automatically and manually created dictionaries, followed by suffix trimming and word segmentation. Next, our algorithm uses the dictionaries to check for the word's different written forms according to its spelling and for its synonym in the native language (Korean), and it also updates the POS tags.
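The suffix trimming and dictionary lookup steps mentioned in this abstract might look roughly like the sketch below. It is an assumption-laden illustration, not the paper's algorithm: the suffix list, the `VARIANTS` dictionary and its entries, the `NNG` tag (borrowed from the Sejong tag set as an example), and the function names are all hypothetical, and morphological analysis and word segmentation are left out.

```python
# Hypothetical particle suffixes, with the longer ones listed first so
# they are tried before the single-syllable ones.
SUFFIXES = ("에서", "으로", "는", "은", "를", "을", "가", "이")

# Hypothetical dictionary mapping spelling variants of a foreign word to
# one canonical entry with a native synonym and a POS tag.
VARIANTS = {
    "컴퓨터": {"canonical": "컴퓨터", "synonym": "전산기", "pos": "NNG"},
    "콤퓨터": {"canonical": "컴퓨터", "synonym": "전산기", "pos": "NNG"},
}


def trim_suffix(token):
    """Strip one trailing particle, if any, leaving a non-empty stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def resolve_foreign_word(token):
    """Look up a token, first as written and then with its suffix trimmed."""
    # The untrimmed token is tried first because a final syllable that
    # looks like a particle may actually belong to the stem.
    for candidate in (token, trim_suffix(token)):
        entry = VARIANTS.get(candidate)
        if entry is not None:
            return entry
    return None  # unknown word: a candidate for a dictionary update


print(resolve_foreign_word("콤퓨터를"))
```

Words that return `None` here are the ones such a system would flag as unknown and consider adding to its dictionaries.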
Abstract. This paper explains the importance of Association Rules Mining for Korean-language text. With some modifications, association rules mining can also be used to mine association rules from textual data, which in turn helps to generate a statistical thesaurus, mine grammatical rules, and search large data collections efficiently. Although various association rules mining techniques have been used successfully for market basket analysis, very few have been applied to Korean text. The proposed Korean language mining model calculates and extracts meaningful patterns (association rules) between words and presents the hidden knowledge. First, it cleans and integrates the data, selects the relevant data, and transforms it into a transactional database. Data mining techniques are then applied to this data source to extract hidden patterns. These patterns are evaluated against specific rules until a valid and satisfactory result is obtained. We tested the model on a Korean news corpus; the results show that it worked well and were adequate to justify further research.
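The abstract does not say which measures are used to evaluate the extracted patterns. As an assumption, the standard association rule measures are given here for a rule $X \Rightarrow Y$ over a transactional database $D$, where $X$ and $Y$ are disjoint sets of words and each transaction $t$ is the set of words in one text unit:

```latex
\begin{align}
\operatorname{supp}(X) &= \frac{|\{\, t \in D : X \subseteq t \,\}|}{|D|} \\
\operatorname{supp}(X \Rightarrow Y) &= \operatorname{supp}(X \cup Y) \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
\end{align}
```

As a hypothetical example, if $|D| = 1000$, the words in $X$ occur in 100 transactions, and the words in $X \cup Y$ occur together in 80, then $\operatorname{supp}(X \Rightarrow Y) = 0.08$ and $\operatorname{conf}(X \Rightarrow Y) = 0.8$. A rule is typically kept only if both values exceed user-chosen minimum thresholds.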