Most previous research on document classification assigned only one or two category tags to each document. Moreover, tagged documents are rarely added back to their topic groups for subsequent classification work, although doing so could conceivably improve classification efficiency. Using the modularity method, this research incrementally adds classified documents to their topic groups after the recognition process and examines the resulting changes in grouping quality. The results show that social network analysis has great potential for automatic document classification, especially for identifying the citation networks embedded in research papers and their reference lists. A modified TF-IDF technique calculates the weight of each keyword in the topic groups. All the papers under study were collected from three journals in the IEEE Computer Society collection published from 1979 to 2011.
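The abstract does not say how grouping quality is scored. One common reading of "the modularity method" is Newman's modularity Q, which compares the share of links inside each group with the share expected by chance. The following is a minimal sketch under that assumption, for an undirected, unweighted citation graph with non-overlapping groups; the function name `modularity` and the toy data are hypothetical and are not taken from the paper.

```python
def modularity(edges, groups):
    """Newman modularity Q of a partition (assumed quality measure).

    edges  -- list of (u, v) pairs, undirected, each link listed once
    groups -- list of sets of nodes, assumed not to overlap
    """
    m = len(edges)
    if m == 0:
        return 0.0

    # Degree of every node in the graph.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Map each node to the index of its group.
    label = {node: i for i, group in enumerate(groups) for node in group}

    q = 0.0
    for i, group in enumerate(groups):
        # Links with both endpoints inside group i.
        internal = sum(
            1 for u, v in edges if label.get(u) == i and label.get(v) == i
        )
        degree_sum = sum(degree.get(node, 0) for node in group)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Toy example: two triangles joined by one link, then a newly classified
# paper "g" is added to the second group together with its two links.
edges_before = [("a", "b"), ("b", "c"), ("a", "c"),
                ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
groups_before = [{"a", "b", "c"}, {"d", "e", "f"}]

edges_after = edges_before + [("g", "d"), ("g", "e")]
groups_after = [{"a", "b", "c"}, {"d", "e", "f", "g"}]

q_before = modularity(edges_before, groups_before)
q_after = modularity(edges_after, groups_after)
```

Comparing `q_before` with `q_after` is one way to track how the grouping quality changes as each classified document is added to its topic group.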
The enormous popularity of Web 2.0 social network services has led to much research on social network analysis (SNA). These studies focus on analyzing the complex interactions between users in virtual networks. SNA has shown great potential in automatic document classification, especially in identifying citation networks of research papers and the references among them. This research adopts the Clique Percolation Method (CPM) to identify all overlapping subgroups in a citation network. In the grouping process, research papers with similar topics are placed in the same topic group. Two papers are regarded as related when the common citation rate between them is higher than a threshold. A modified TF-IDF calculates the weight of each keyword in the topic groups. The keyword-weight vector represents the main features of each group, and the category of a newly arriving document is determined by a novel similarity function. All the papers under study were collected from the journal IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) published from 1979 to 2011.
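The grouping step described above can be sketched as follows. The abstract does not define the common citation rate, so this sketch assumes it is the Jaccard overlap of two papers' reference lists. The clique percolation routine follows the standard definition (k-cliques that share k-1 nodes belong to the same group) in a brute-force form suited only to small graphs. All function names, the value of k, and the toy data are hypothetical.

```python
from itertools import combinations


def common_citation_rate(refs_a, refs_b):
    # Assumption: shared references divided by all distinct references.
    union = refs_a | refs_b
    return len(refs_a & refs_b) / len(union) if union else 0.0


def build_graph(references, threshold):
    """Link two papers when their common citation rate exceeds threshold.

    references -- dict mapping paper id to the set of works it cites
    """
    adj = {paper: set() for paper in references}
    for a, b in combinations(references, 2):
        if common_citation_rate(references[a], references[b]) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def clique_percolation(adj, k=3):
    """Return overlapping groups as unions of adjacent k-cliques."""
    # Enumerate every k-clique by brute force.
    cliques = [
        frozenset(c)
        for c in combinations(list(adj), k)
        if all(v in adj[u] for u, v in combinations(c, 2))
    ]

    # Union-find over cliques: merge two cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


# Toy graph: triangles {1,2,3} and {2,3,4} share two nodes and merge,
# while triangle {4,5,6} shares only node 4 and stays separate.
toy_adj = {
    1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4},
    4: {2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
topic_groups = clique_percolation(toy_adj, k=3)
```

On the toy graph, paper 4 ends up in both groups, which is the overlapping membership that distinguishes CPM from a strict partition.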