Reliability of observational data was measured simultaneously by two assessors under two experimental conditions. During overt assessment, observers were told that reliability would be measured by one of the two assessors, thus permitting computation of reliability with an identified and an unidentified assessor. During covert assessment, observers were not informed that reliability was being measured. Throughout the study, each of the assessors employed a unique version of a standard observational code. In the overt assessment condition, reliability of observers with the identified assessor was consistently higher than reliability with the unidentified assessor, indicating that observers modified their observational criteria to approximate those of the identified assessor. In the covert assessment condition, reliability with the two assessors was substantially lower than during overt assessment. Further, observers consistently recorded lower frequencies of disruptive behavior than the two assessors during covert assessment.
Systematic biases of observational recordings of behavior as a function of experimental hypotheses were investigated. Predictions of decrease and of no change in level of recorded behavior as a function of "treatment" were given, respectively, to two groups of five pairs of observers. Both groups viewed the same videotapes, selected to show no change from "baseline" to "treatment." Global evaluations of treatment effects were significantly affected by predicted results, but behavioral recordings were not. Reliability of observational recordings was increased by observers' knowledge that reliability was being assessed, by computation of reliability within (versus between) observer pairs, and by computation of reliability by the observers (versus the experimenter). Implications of these findings for studies utilizing observational data are discussed.
Simultaneous observational recordings were made in vivo, via an observation mirror, and via closed circuit television. Three of nine observers had extensive experience recording behavior in vivo; three had extensive experience recording behavior via mirror; and three had extensive experience recording via television. Observers recorded nine categories of disruptive behavior for children in a special class setting. Frequencies of behavior recorded in vivo, via mirror, and via television differed significantly for only one category, vocalization. There were no significant main effects or interactions involving the observers' previous experience. Occurrence reliability coefficients computed within and between media demonstrated the similarity of observer agreement in all three media. Data collection procedures using an observation mirror or closed circuit television appear to be reasonable alternatives to in vivo observation in circumstances similar to those in the present study.