MEDLINE and PubMed are rich repositories of medical literature. Once documents are retrieved from PubMed, they need further analysis. This paper describes a new model for text classification based on estimating term weights and shows how this method improves classification accuracy. The method uses global relevant weight as its term weighting scheme. Experiments with different weighting schemes show that the new global relevant weighting method outperforms traditional term weighting approaches.
Information on the internet grows rapidly from day to day, and web usage grows with it, so there is a need to discover interesting patterns from the web. The process of extracting and mining useful information from web documents using data mining techniques is called Web Mining. Web Mining is broadly classified into three types: Web Content Mining, Web Structure Mining and Web Usage Mining. In this paper our focus is mainly on Web Usage Mining, where we apply data mining techniques to analyse and discover interesting knowledge from web usage data. User activity is captured and stored at different levels (server level, proxy level and user level), and this record is called Web Usage Data. The usage data stored at the server side is the Web Server Log, which records the browsing behavior of users and their requests based on user clicks. The Web Server Log is a primary source for Web Usage Mining. This paper also discusses various existing techniques for pre-processing and analysing web log files, and how clustering is applied to group users by their browsing behavior and the content they are interested in.
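As a rough illustration of the pre-processing step described above, the following sketch parses server log lines, drops failed requests and requests for embedded resources, and groups the remaining page requests by client host. It assumes the log uses the Common Log Format; the sample lines, the list of ignored file suffixes, and the function name `pages_by_host` are hypothetical and are not taken from the paper. Treating each host as one user is a simplification.

```python
import re
from collections import defaultdict

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+'
)

# Assumed list of embedded resources that do not reflect a user's interest.
IGNORED_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js", ".ico")


def pages_by_host(lines):
    """Clean raw log lines and group the requested pages by client host."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        if not match["status"].startswith("2"):
            continue  # skip failed or redirected requests
        path = match["path"]
        if path.lower().endswith(IGNORED_SUFFIXES):
            continue  # skip images, stylesheets and scripts
        visits[match["host"]].append(path)
    return dict(visits)


# Hypothetical sample data for illustration only.
sample_log = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:13:56:01 +0000] "GET /news.html HTTP/1.1" 200 1045',
    '10.0.0.1 - - [10/Oct/2023:13:57:12 +0000] "GET /missing.html HTTP/1.1" 404 209',
    '10.0.0.2 - - [10/Oct/2023:13:58:40 +0000] "GET /sports.html HTTP/1.1" 200 3120',
]
visits = pages_by_host(sample_log)
```

The per-host page lists produced this way could then serve as input to a clustering step that groups users with similar browsing behavior.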
The exponential growth of online repositories in medical science has led to the development of various text mining tools. These tools assist users in analyzing text data stored in online repositories such as PubMed and MEDLINE. The PubMed repository is growing at a rate of 500,000 articles per year. Classification of MEDLINE documents becomes very complex because of the high dimensionality of the feature space. In this study we discuss how dimensionality is reduced, and we study and compare various dimensionality reduction techniques at the preprocessing stage. We introduce a novel feature weighting scheme, "GRW", and show that this scheme improves classification accuracy. Our experimental results on a medical data set indicate that existing feature weighting methods achieve lower accuracy than the GRW scheme.
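The GRW scheme itself is not defined in this text, so it is not reproduced here. For context, the sketch below shows TF-IDF, a widely used traditional feature weighting scheme of the kind such methods are typically compared against; whether TF-IDF was one of the baselines in this study is an assumption. The function name `tf_idf` and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (term count / document length) * log(N / document frequency)
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus for illustration only.
docs = [
    "gene expression in tumor cells".split(),
    "tumor growth and gene therapy".split(),
    "web usage mining of server logs".split(),
]
weights = tf_idf(docs)
```

In this toy corpus, a term that occurs in a single document (such as "expression") receives a higher weight than one shared across documents (such as "gene"), which is the basic behavior a term weighting scheme is meant to capture.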