This article discusses three issues concerning content analysis methodology and ends with a list of best practices for conducting and reporting content analysis projects. The issues addressed are the use of searches and databases for sampling, the differences between content analysis and algorithmic text analysis, and which reliability coefficients should be calculated and reported. The "Best Practices" section provides steps to produce reliable and valid content analysis data, along with the appropriate reporting of those steps so a project can be properly evaluated and replicated.
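As a hedged illustration of why the choice of coefficient matters, the Python sketch below contrasts simple percent agreement with a chance-corrected coefficient (Cohen's kappa) for two coders assigning nominal categories; the coders, categories, and coding decisions are invented for the example and are not drawn from the article.

```python
# Illustrative sketch (not from the article): percent agreement vs. a
# chance-corrected reliability coefficient (Cohen's kappa) for two coders.
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of units on which the two coders assigned the same category."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement: (observed - expected) / (1 - expected)."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Expected agreement if both coders assigned categories independently
    # at their observed marginal rates.
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in set(coder_a) | set(coder_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical coding decisions for 10 content units.
coder_1 = ["conflict", "conflict", "human", "human", "economic",
           "conflict", "human", "economic", "conflict", "human"]
coder_2 = ["conflict", "human", "human", "human", "economic",
           "conflict", "human", "conflict", "conflict", "human"]

print(f"Percent agreement: {percent_agreement(coder_1, coder_2):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(coder_1, coder_2):.2f}")
```

Percent agreement alone can look high simply because one category dominates the coding; the chance-corrected figure discounts the agreement that would be expected by chance alone.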
This study compares 20 sets each of samples of four different sizes (n = 7, 14, 21, and 28) drawn from newspaper content using simple random, constructed week, and consecutive day sampling. Comparisons of sample efficiency, based on the percentage of sample means in each set of 20 that fall within one or two standard errors of the population mean, show the superiority of constructed week sampling.
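The study's efficiency criterion can be made concrete with a small simulation. The sketch below uses synthetic daily story counts with a pronounced day-of-week effect (not the study's newspaper data) and checks how often simple random and constructed week sample means of the same size fall within one standard error of the population mean.

```python
# Hedged simulation sketch (synthetic data, not the study's newspaper sample):
# how often do simple random vs. constructed week sample means land within
# one standard error of the population mean?
import random
import statistics

random.seed(42)

# Synthetic year: 52 weeks x 7 days with a day-of-week effect
# (e.g., larger Sunday editions) plus noise. Values are hypothetical.
DAY_MEANS = [20, 22, 21, 23, 25, 18, 35]          # Mon..Sun
population = [random.gauss(DAY_MEANS[d], 5)
              for _ in range(52) for d in range(7)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

def simple_random_sample(n):
    """n days drawn at random from the whole year."""
    return random.sample(population, n)

def constructed_week_sample(weeks):
    """For each constructed week, pick one Monday, one Tuesday, ..., one Sunday at random."""
    return [population[random.randrange(52) * 7 + day]
            for _ in range(weeks) for day in range(7)]

def efficiency(draw, n, sets=20):
    """Share of 20 sample means falling within one standard error of the population mean."""
    se = pop_sd / n ** 0.5
    means = [statistics.mean(draw()) for _ in range(sets)]
    return sum(abs(m - pop_mean) <= se for m in means) / sets

for weeks in (1, 2, 3, 4):                         # n = 7, 14, 21, 28 as in the study design
    n = weeks * 7
    print(f"n={n:2d}  simple random: {efficiency(lambda: simple_random_sample(n), n):.2f}  "
          f"constructed week: {efficiency(lambda: constructed_week_sample(weeks), n):.2f}")
```

Because a constructed week forces every day of the week into the sample, it neutralizes the cyclic day-of-week variation that a simple random sample of the same size can miss, which is the intuition behind the study's finding.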
This study views intercoder reliability as a sampling problem. It develops a formula for generating sample sizes needed to have valid reliability estimates. It also suggests steps for reporting reliability. The resulting sample sizes will permit a known degree of confidence that the agreement in a sample of items is representative of the pattern that would occur if all content items were coded by all coders.

Every researcher who conducts a content analysis faces the same question: How large a sample of content units should be used to assess the level of reliability? To an extent, sample size depends on the number of content units in the population and the homogeneity of the population with respect to variable coding complexity. Content can be categorized easily for some variables, but not for other variables. How does a researcher ensure that variations in degree of difficulty are included in the reliability assessment? As in most applications involving representativeness, the answer is probability sampling, assuring that each unit in the reliability check is selected randomly. Calculating sampling error for reliability tests is possible with probability sampling, but few content analyses address this point.

This study views intercoder reliability as a sampling problem, requiring clarification of the term "population." Content analysis typically refers to a study's "population" as all potentially codable content from which a sample is drawn and analyzed. However, this sample itself becomes a "population" of content units from which a sample of test units is randomly drawn to check reliability. This article suggests content samples need to have reliability estimates representing the population. The resulting sample sizes will permit a known degree of confidence that the agreement in a sample of test units is representative of the pattern that would occur if all study units were coded by all coders.

Reproducibility reliability is the extent to which coding decisions can be replicated by different researchers. In principle, the use of multiple independent coders applying the same rules in the same way assures that categorized content does not represent the bias of one coder. Research methods texts discuss reliability in terms of measurement error resulting from problems in coding instructions, failure of coders to achieve a common frame of reference, and coder mistakes. Few texts or
Background

Stephen Lacy is a professor in the Michigan State University School of Journalism, and Daniel Riffe is a professor in the E. W. Scripps School of Journalism at Ohio University. The authors thank Fred Fico for his comments and suggestions.
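The sample-size logic described above can be sketched with the standard finite-population-corrected formula for the standard error of a proportion, solved for n. The article derives its own sample-size values, so treat the parameters below (an assumed 85% population agreement, a 5-point tolerance, one-tailed 95% confidence, and a hypothetical study of 1,000 units) purely as an illustration of the approach, not as figures from the paper.

```python
# Illustrative sketch: sizing a reliability check by solving the
# finite-population-corrected standard error of a proportion for n.
# All parameter values are assumptions for the example.
import math

def reliability_sample_size(N, assumed_agreement=0.85, tolerance=0.05, z=1.645):
    """
    N                  -- number of study units (the 'population' for the reliability check)
    assumed_agreement  -- expected level of coder agreement across all study units
    tolerance          -- acceptable gap between test-sample agreement and population agreement
    z                  -- z-score for the desired confidence level (1.645 ~ one-tailed 95%)
    """
    P, Q = assumed_agreement, 1 - assumed_agreement
    se = tolerance / z                              # largest tolerable standard error
    n = (N * P * Q) / ((N - 1) * se ** 2 + P * Q)   # SE with finite population correction, solved for n
    return math.ceil(n)

# Hypothetical study of 1,000 content units
print(reliability_sample_size(1000))   # roughly 120-130 test units under these assumptions
```

The required test sample grows with lower assumed agreement and a tighter tolerance, and shrinks as the study population gets smaller, which is the sense in which reliability checking is itself a sampling problem.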