In this paper, we propose a system that visualizes the relationships among large quantities of Internet news using two-dimensional self-organizing maps (SOMs), instead of the conventional approach of listing news items. In the proposed method, morphological analysis is applied to the news texts to generate input vectors whose elements are keywords. Because such vectors are characteristically sparse for Internet news, their dimensionality can be reduced, and self-organizing mapping can be accelerated by restricting the search region during learning. Evaluation experiments on 80 news items show that the proposed system reduces computation time by 75% to 99% and creates a more efficient SOM than a generally available SOM.
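The idea of sparse keyword vectors combined with a restricted search region can be illustrated with a minimal sketch. The abstract does not specify how the search region is restricted, so the code below assumes one plausible reading: after a first full search, the best-matching unit (BMU) for each news item is sought only within a small window around its previous BMU. The function names, the `window` and `radius` parameters, and the linear learning-rate schedule are all hypothetical choices made for illustration, and tokenized input stands in for the output of morphological analysis.

```python
import random

def build_vectors(docs):
    """Map each tokenized document to a sparse {keyword_index: count} dict."""
    vocab = {}
    vectors = []
    for tokens in docs:
        vec = {}
        for tok in tokens:
            idx = vocab.setdefault(tok, len(vocab))
            vec[idx] = vec.get(idx, 0) + 1
        vectors.append(vec)
    return vectors, vocab

def sq_dist(weight, vec):
    """Squared Euclidean distance between a dense weight and a sparse input."""
    return sum((w - vec.get(i, 0)) ** 2 for i, w in enumerate(weight))

def train_som(vectors, dim, rows, cols, epochs=20, radius=1, window=2, seed=0):
    """Train a small SOM; BMU search is local after the first pass (assumption)."""
    rng = random.Random(seed)
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    last_bmu = [None] * len(vectors)
    for epoch in range(epochs):
        lr = 0.5 * (1 - epoch / epochs)  # hypothetical linear decay
        for n, vec in enumerate(vectors):
            if last_bmu[n] is None:
                candidates = list(weights)  # first pass: search the whole map
            else:
                r0, c0 = last_bmu[n]        # later passes: window around old BMU
                candidates = [(r, c) for (r, c) in weights
                              if abs(r - r0) <= window and abs(c - c0) <= window]
            bmu = min(candidates, key=lambda node: sq_dist(weights[node], vec))
            last_bmu[n] = bmu
            # Pull the BMU and its square neighborhood toward the input.
            for (r, c), w in weights.items():
                if abs(r - bmu[0]) <= radius and abs(c - bmu[1]) <= radius:
                    for i in range(dim):
                        w[i] += lr * (vec.get(i, 0) - w[i])
    return weights, last_bmu
```

With a 4 x 4 map and `window=2`, each later search inspects at most 25 candidate positions clipped to the map, rather than every node. The saving grows with map size, which is one way a restricted search could reduce computation time.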
The purpose of this paper is to provide a method for extracting appropriate keywords to identify meaningful learning content on the Web. Identifying documents that contain learning content raises two issues. First, the documents need to be identified according to the learning area of the student's school year. Second, they need to be identified according to the learning area that the student is studying now or has already studied. In this paper, we present a method of extracting keywords for mining meaningful learning content using Wikipedia. First, we select Wikipedia articles that match an arbitrary input keyword for a learning item. Then, using Wikipedia's links and categories, we select other articles related to those selected in the first step. Next, we calculate the degree of association between the articles and the keywords using PF-IBF and attach that degree to each keyword. Finally, we screen the keywords against the student's curriculum guideline so that they match the learning area of the student's school year. As a next step, we plan to develop a method of screening keywords according to each student's ability, so that more appropriate keywords can be selected for each student.
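The four steps above (select, expand, score, screen) can be sketched on a toy link graph. This is only an illustration under stated assumptions: the abstract does not define PF-IBF, so `association` below is a simple stand-in score (links from seed articles, damped by overall inbound link count) and is not PF-IBF. The article names, the `links` and `categories` dictionaries, and the curriculum term set are all hypothetical.

```python
import math

def related_articles(links, categories, seeds):
    """Articles linked from a seed, or sharing a category with a seed."""
    related = set()
    for s in seeds:
        related.update(links.get(s, ()))
        seed_cats = categories.get(s, set())
        for other, cats in categories.items():
            if seed_cats & cats:
                related.add(other)
    return related - set(seeds)

def association(links, seeds, candidate):
    """Stand-in association score (NOT PF-IBF; an assumption for this sketch)."""
    inbound = sum(1 for targets in links.values() if candidate in targets)
    from_seeds = sum(1 for s in seeds if candidate in links.get(s, ()))
    if inbound == 0:
        return 0.0
    # Fewer inbound links overall means the seed link is more specific.
    return from_seeds * math.log(1 + len(links) / inbound)

def screen(scored, curriculum_terms):
    """Keep only keywords that appear in the curriculum guideline terms."""
    return {k: v for k, v in scored.items() if k in curriculum_terms}

# Toy data (hypothetical).
links = {
    "Photosynthesis": {"Chlorophyll", "Plant"},
    "Plant": {"Chlorophyll"},
    "Chlorophyll": set(),
    "Calvin cycle": {"Photosynthesis"},
}
categories = {
    "Photosynthesis": {"Plant physiology"},
    "Calvin cycle": {"Plant physiology"},
    "Plant": {"Botany"},
    "Chlorophyll": {"Pigments"},
}
seeds = ["Photosynthesis"]

related = related_articles(links, categories, seeds)
scored = {a: association(links, seeds, a) for a in related}
kept = screen(scored, curriculum_terms={"Plant", "Chlorophyll"})
```

Here "Calvin cycle" is reached through a shared category but is dropped at the screening step because it is not in the hypothetical curriculum term set.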
When users use search engines to acquire knowledge about subjects in unfamiliar domains, they often refer to related search keywords, which are generated from how frequently they are used as search keywords. However, searching by reference to such related keywords is not always useful for expanding knowledge of the subject being researched. We therefore propose a new method that generates related search keywords by means of Wikipedia. The proposed method first searches for the Wikipedia page whose title matches the user's query and extracts that page's category information. Next, it obtains the set of pages belonging to each category and extracts a group of related pages from the pages contained in the intersections of two or more of those sets. Then, it calculates pointwise mutual information or tf-idf for the keywords extracted from each page and takes the keywords with higher values as related search keywords. We have confirmed the effectiveness of the proposed method through comparison with the related search keywords generated by Google and through subjective evaluation experiments.
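The two scoring options named above can be sketched as follows. The abstract does not give the exact formulas used, so this assumes the common textbook definitions: pointwise mutual information estimated from page-level co-occurrence, and tf-idf with an unsmoothed logarithmic idf. Each page is represented as a list of tokens, and the function names are hypothetical.

```python
import math

def pmi(pages, query, keyword):
    """PMI of query and keyword, estimated from page-level co-occurrence."""
    n = len(pages)
    n_q = sum(1 for p in pages if query in p)
    n_k = sum(1 for p in pages if keyword in p)
    n_qk = sum(1 for p in pages if query in p and keyword in p)
    if n_qk == 0:
        return float("-inf")  # the terms never co-occur
    # log( p(q, k) / (p(q) * p(k)) ) with probabilities as page fractions
    return math.log((n_qk * n) / (n_q * n_k))

def tf_idf(pages, page_index, keyword):
    """tf-idf of a keyword in one page (unsmoothed idf, an assumption)."""
    tokens = pages[page_index]
    df = sum(1 for p in pages if keyword in p)
    if not tokens or df == 0:
        return 0.0
    tf = tokens.count(keyword) / len(tokens)
    return tf * math.log(len(pages) / df)

def top_keywords(pages, query, k=3):
    """Rank candidate keywords by PMI with the query, highest first."""
    candidates = {t for p in pages for t in p} - {query}
    ranked = sorted(candidates, key=lambda t: pmi(pages, query, t), reverse=True)
    return ranked[:k]
```

Keywords that appear mostly on pages that also contain the query receive a high PMI, whereas tf-idf favors keywords that are frequent within a page but rare across the page group, so the two scores can rank the same candidates differently.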