Co-occurrence matrices, such as co-citation, co-word, and co-link matrices, have been used widely in the information sciences. However, confusion and controversy have hindered the proper statistical analysis of these data. The underlying problem, in our opinion, involves understanding the nature of the various types of matrices. This paper discusses the difference between a symmetrical co-citation matrix and an asymmetrical citation matrix, as well as the statistical techniques appropriate to each. Similarity measures (like the Pearson correlation coefficient or the cosine) should not be applied to the symmetrical co-citation matrix, but can be applied to the asymmetrical citation matrix to derive the proximity matrix. The argument is illustrated with examples. The study then extends the application of co-occurrence matrices to the Web environment, where the nature of the available data, and thus the data collection methods, differ from those of traditional databases such as the Science Citation Index. A set of data collected with the Google Scholar search engine is analyzed using both the traditional methods of multivariate analysis and the new visualization software Pajek, which is based on social network analysis and graph theory.
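The distinction between the two matrices can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's own procedure or data: the toy `citation_matrix` and the helper names `cocitation`, `cosine`, and `cosine_proximity` are hypothetical. Rows stand for citing documents and columns for cited authors; the symmetrical co-citation matrix is obtained by counting shared citing documents, while the cosine is computed on the columns of the asymmetrical matrix to give a proximity matrix.

```python
import math

# Hypothetical toy data: rows are citing documents, columns are cited authors.
# A 1 means that the citing document cites that author.
citation_matrix = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]


def column(matrix, j):
    """Return column j of a list-of-rows matrix."""
    return [row[j] for row in matrix]


def cocitation(matrix):
    """Symmetrical co-citation matrix: entry (i, j) counts the citing
    documents that cite both author i and author j."""
    n = len(matrix[0])
    cols = [column(matrix, j) for j in range(n)]
    return [
        [sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
        for i in range(n)
    ]


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_proximity(matrix):
    """Proximity matrix: cosine between the columns (citation profiles)
    of the asymmetrical citation matrix."""
    n = len(matrix[0])
    cols = [column(matrix, j) for j in range(n)]
    return [[cosine(cols[i], cols[j]) for j in range(n)] for i in range(n)]


co = cocitation(citation_matrix)
prox = cosine_proximity(citation_matrix)
print(co)
print([[round(x, 3) for x in row] for row in prox])
```

In this sketch the similarity measure is applied once, to the asymmetrical matrix. The co-citation counts in `co` are themselves already a form of proximity data, which is one way to read the abstract's warning against applying the Pearson correlation or the cosine to that matrix as well.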
Commercial search engines play an increasingly important role in Web information dissemination and access. Of particular interest to businesses and national governments is whether the big engines have coverage biased towards the U.S. or other countries. In our study we tested for national biases in three major search engines and found significant differences in their coverage of commercial Web sites. The U.S. sites were much better covered than the others in the study: sites from China, Taiwan and Singapore. We then examined the possible technical causes of the differences and found that the language of a site does not affect its coverage by search engines. However, the visibility of a site, measured by the number of links to it, affects its chance of being covered by search engines. We conclude that the coverage bias does exist, but that it results not from deliberate choices by the search engines but from the cumulative advantage that U.S. sites enjoy on the Web. Nevertheless, the bias remains a cause for international concern.
Web citations have been proposed as comparable to, or even replacements for, bibliographic citations, notably in assessing the academic impact of work in promotion and tenure decisions. We compared bibliographic and Web citations to articles in 46 journals in library and information science. For most journals (57%), Web citations correlated significantly with both bibliographic citations listed in the Social Sciences Citation Index and the ISI's Journal Impact Factor. Many of the Web citations represented intellectual impact, coming from other papers posted on the Web (30%) or from class reading lists (12%). Web citation counts were typically higher than bibliographic citation counts for the same article. Journals with more Web citations tended to have Web sites that provided tables of contents on the Web, while less cited journals did not have such publicity. The number of Web citations to journal articles increased from 1992 to 1997.