We investigate the similarity of pairs of articles that are co-cited at different co-citation levels: journal, article, section, paragraph, sentence and bracket. Our results indicate that textual similarity, intellectual overlap (shared references), author overlap (shared authors) and proximity in publication time all rise monotonically as the co-citation level becomes finer (from journal to bracket). While the main gain in similarity occurs when moving from journal to article co-citation, every move to a finer level entails an increase in similarity, especially from section to paragraph and from paragraph to sentence/bracket. We compare results from four journals over the years 2010-2015: Cell, the European Journal of Operational Research, Physics Letters B and Research Policy, with consistent general outcomes and some interesting differences. Our findings motivate the use of granular co-citation information as defined by meaningful units of text, with implications for, among others, the elaboration of maps of science and the retrieval of scholarly literature.
Introduction
The co-citation relation is used extensively in bibliometrics and has recently received some attention in information retrieval. Applications include the identification of topically related publications for search engines and the clustering of publications to understand the structure of science. If two or more publications are co-cited by a third one, they are generally assumed to be related to some extent from the viewpoint of the citing authors (Small 1973). This assumption is normally considered valid even at a relatively coarse co-citation level, most often the publication (e.g. article) level. In addition, recent work suggests that the relatedness of co-cited publications might increase with the proximity of the two citations within the full text of the citing publication (e.g. Gipp & Beel 2009), and that improvements in maps of science or document retrieval can be obtained by taking textual proximity into account (e.g. Boyack et al. 2013). Indeed, it makes sense to assume that two publications co-cited within the same sentence or bracket will typically be more closely related than two publications co-cited only at the more general section or publication level. Yet open questions remain. We know little about the ways in which related, co-cited publications are similar along different dimensions. Furthermore, to what extent do different notions of similarity, such as textual and intellectual similarity, depend on the level of the co-citation? This study was designed to provide answers to these questions.
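To make the notion of nested co-citation levels concrete, the following is a minimal sketch in Python, not the procedure used in this study. It assumes that each in-text citation of a single citing article has already been annotated with hypothetical position indices (section, paragraph, sentence, bracket), and it assigns to every co-cited pair the finest level at which the pair occurs together. The journal level is left out because it spans several citing articles. The function names and the data layout are illustrative assumptions, and the Jaccard index is used here only as one possible way to quantify overlap in references or authors.

```python
from itertools import combinations

# Levels within one citing article, ordered from coarse to fine.
LEVELS = ["article", "section", "paragraph", "sentence", "bracket"]


def finest_cocitation_level(citations):
    """Map each co-cited pair to the finest level at which it is co-cited.

    `citations` is a list of (cited_id, section, paragraph, sentence, bracket)
    tuples for ONE citing article. The position indices are hypothetical and
    nested: a bracket index is only meaningful within its sentence, and so on.
    """
    best = {}
    for (a, *pos_a), (b, *pos_b) in combinations(citations, 2):
        if a == b:
            continue  # the same publication cited twice is not a pair
        # Count how many leading position components agree.
        depth = 0
        for x, y in zip(pos_a, pos_b):
            if x != y:
                break
            depth += 1
        pair = tuple(sorted((a, b)))
        best[pair] = max(best.get(pair, 0), depth)
    return {pair: LEVELS[depth] for pair, depth in best.items()}


def jaccard(xs, ys):
    """Jaccard overlap of two collections, e.g. reference lists or author lists."""
    xs, ys = set(xs), set(ys)
    union = xs | ys
    return len(xs & ys) / len(union) if union else 0.0


# Illustrative data: five in-text citations in one citing article.
citations = [
    ("A", 1, 1, 1, 1),
    ("B", 1, 1, 1, 1),  # same bracket as A
    ("C", 1, 1, 2, 1),  # same paragraph as A and B, different sentence
    ("D", 2, 1, 1, 1),
    ("A", 2, 1, 1, 2),  # A cited again, same sentence as D
]
levels = finest_cocitation_level(citations)
for pair in sorted(levels):
    print(pair, levels[pair])
```

In this toy example the pair (A, B) is co-cited at the bracket level, (A, D) at the sentence level, (A, C) and (B, C) at the paragraph level, and the remaining pairs only at the article level. A pair co-cited at a fine level is by construction also co-cited at every coarser level, which is why only the finest level is stored.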