Topic (FICSIT) corpus is controlled precisely for cross-topic samples. The corpus was compiled from data dumps provided by StackExchange. The StackExchange network hosts a large collection of question-answer forums, spanning 176 sites with over three million users. From all topics available on the network, cross-topic data was extracted for users who contributed to two or more topics. This requirement was satisfied by 293,415 users, who were then further restricted to those with at least 70 samples each. The resulting cross-topic corpus contains 308 topics and 188,077 text samples from 1,237 distinct authors. No other pre-processing steps were applied to the collected data. A summary of the corpus statistics for FICSIT is given in Table 1.
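The two selection criteria described above can be illustrated with a minimal sketch. It is not the authors' code: the `(author, topic, text)` tuple layout, the function name `filter_cross_topic`, and the toy data are assumptions made for illustration, and the sketch assumes the 70-sample threshold applies to an author's total across all topics, which the text does not state explicitly.

```python
from collections import defaultdict


def filter_cross_topic(samples, min_topics=2, min_samples=70):
    """Keep samples from authors who wrote in at least `min_topics` topics
    and contributed at least `min_samples` samples in total.

    `samples` is a list of (author, topic, text) tuples (hypothetical layout).
    """
    topics_by_author = defaultdict(set)
    count_by_author = defaultdict(int)
    for author, topic, _text in samples:
        topics_by_author[author].add(topic)
        count_by_author[author] += 1

    # Authors who satisfy both the cross-topic and the sample-count criterion.
    keep = {
        author
        for author in topics_by_author
        if len(topics_by_author[author]) >= min_topics
        and count_by_author[author] >= min_samples
    }
    return [sample for sample in samples if sample[0] in keep]


# Toy data with a lowered threshold so the effect is visible.
toy = [
    ("alice", "cooking", "a"),
    ("alice", "physics", "b"),
    ("alice", "physics", "c"),
    ("bob", "cooking", "d"),
    ("bob", "cooking", "e"),
    ("bob", "cooking", "f"),
    ("carol", "chess", "g"),
    ("carol", "physics", "h"),
]
kept = filter_cross_topic(toy, min_topics=2, min_samples=3)
```

In the toy data, only `alice` both spans two topics and reaches the lowered three-sample threshold; `bob` writes in a single topic and `carol` has too few samples. With the defaults (`min_topics=2`, `min_samples=70`) the function mirrors the thresholds quoted in the text.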