SUMMARY Interrater variability of sleep stage scorings has an essential impact not only on the reading of polysomnographic sleep studies (PSGs) for clinical trials but also on the evaluation of patients' sleep. With the introduction of a new standard for sleep stage scoring (the AASM standard), there is a need for studies on interrater reliability (IRR). The SIESTA database, resulting from an EU-funded project, provides a large number of studies (n = 72; 56 healthy controls and 16 subjects with different sleep disorders; mean age ± SD: 57.7 ± 18.7; 34 females) for which scorings according to both standards (AASM and R&K) were done. Differences in IRR were analysed at two levels: (1) based on quantitative sleep parameters by means of intraclass correlations; and (2) based on an epoch-by-epoch comparison by means of Cohen's kappa and Fleiss' kappa. The overall agreement was 82.0% for the AASM standard (Cohen's kappa = 0.76) and 80.6% for the R&K standard (Cohen's kappa = 0.68). Agreement increased from R&K to AASM for all sleep stages except N2. The results of this study underline that the modification of the scoring rules improves IRR on the one hand, as a result of the integration of occipital, central and frontal leads, but reduces IRR on the other hand, specifically for N2, due to the new rule that cortical arousals with or without a concurrent increase in submental electromyogram are critical events for the end of N2.
Keywords: AASM scoring standard, interrater reliability, Rechtschaffen and Kales, SIESTA project, sleep stage scoring
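To illustrate the epoch-by-epoch statistic for more than two scorers, the following is a minimal sketch of Fleiss' kappa in plain Python. It is not taken from the study: the function name `fleiss_kappa`, the input layout (one list of stage labels per epoch, one label per scorer) and the toy data are assumptions made for illustration only, and the sketch assumes every epoch was scored by the same number of raters.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of epochs, each a list of labels (one per rater).

    Assumes every epoch has the same number of raters (at least two).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    m = len(ratings[0])  # raters per epoch
    if m < 2 or any(len(epoch) != m for epoch in ratings):
        raise ValueError("every epoch needs the same number (>= 2) of ratings")

    n_epochs = len(ratings)
    totals = Counter()  # how often each stage was assigned overall
    p_bar = 0.0         # mean per-epoch agreement
    for epoch in ratings:
        counts = Counter(epoch)
        totals.update(counts)
        # Share of agreeing rater pairs within this epoch.
        p_bar += (sum(c * c for c in counts.values()) - m) / (m * (m - 1))
    p_bar /= n_epochs

    # Agreement expected by chance from the overall stage proportions.
    p_e = sum((c / (n_epochs * m)) ** 2 for c in totals.values())
    if p_e == 1.0:
        return 1.0  # only one stage was ever used, so agreement is trivial
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical toy data: four epochs, three scorers each.
toy = [
    ["W", "W", "W"],
    ["N1", "N2", "N2"],
    ["N2", "N2", "N2"],
    ["REM", "REM", "W"],
]
print(round(fleiss_kappa(toy), 4))
```

For the toy data the mean per-epoch agreement is 2/3 and the chance agreement is 46/144, which gives a kappa of 25/49.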
SUMMARY Interrater variability of sleep stage scorings is a well-known phenomenon. The SIESTA project offered the opportunity to analyse interrater reliability (IRR) between experienced scorers from eight European sleep laboratories within a large sample of patients with different (sleep) disorders: depression, generalized anxiety disorder with and without non-organic insomnia, Parkinson's disease, periodic limb movements in sleep and sleep apnoea. The results were based on 196 recordings from 98 patients (73 males: 52.3 ± 12.1 years and 25 females: 49.5 ± 11.9 years) for which two independent expert scorings from two different laboratories were available. Cohen's κ was used to evaluate the IRR on the basis of epochs, and intraclass correlation was used to analyse the agreement on quantitative sleep parameters. The overall level of agreement when five different stages were distinguished was κ = 0.6816 (76.8%), which in terms of κ reflects a 'substantial' agreement (Landis and Koch, 1977). For different groups of patients, κ values varied from 0.6138 (Parkinson's disease) to 0.8176 (generalized anxiety disorder). With regard to (sleep) stages, the IRR was highest for rapid eye movement (REM), followed by Wake, slow-wave sleep (SWS), non-rapid eye movement 2 (NREM2) and NREM1. The results of regression analysis showed that age and sex had a statistically significant effect on κ only when the (sleep) stages were considered separately. For NREM2 and SWS, a statistically significant decrease of IRR with age was observed, and the IRR for SWS was lower for males than for females. These variations of IRR most probably reflect changes of the sleep electroencephalography (EEG) with age and gender.
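The two-scorer comparison described above can be sketched as follows: Cohen's kappa computed from two hypnograms of equal length, with the result mapped to the Landis and Koch (1977) descriptive bands. This is an illustrative sketch, not the authors' code; the function names `cohen_kappa` and `landis_koch` and the toy hypnograms are assumptions.

```python
from collections import Counter


def cohen_kappa(scorer_a, scorer_b):
    """Cohen's kappa for two equally long sequences of stage labels."""
    if len(scorer_a) != len(scorer_b) or not scorer_a:
        raise ValueError("both scorings must be non-empty and of equal length")
    n = len(scorer_a)
    # Observed agreement: share of epochs with identical labels.
    p_o = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n
    # Chance agreement from each scorer's marginal stage frequencies.
    count_a, count_b = Counter(scorer_a), Counter(scorer_b)
    p_e = sum(count_a[stage] * count_b[stage] for stage in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both scorers used a single identical stage throughout
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(kappa):
    """Descriptive label for kappa following Landis and Koch (1977)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical toy hypnograms: ten epochs, five stages.
a = ["W", "W", "N1", "N2", "N2", "N2", "SWS", "SWS", "REM", "REM"]
b = ["W", "N1", "N1", "N2", "N2", "SWS", "SWS", "SWS", "REM", "REM"]
k = cohen_kappa(a, b)
print(round(k, 2), landis_koch(k))
```

In the toy data the scorers agree on 8 of 10 epochs and the chance agreement is 0.2, so kappa is 0.75. Applied to the values reported above, `landis_koch(0.6816)` gives "substantial" and `landis_koch(0.8176)` gives "almost perfect".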