Study Objective: Sleep is reflected not only in the electroencephalogram but also in heart rhythms and breathing patterns. We therefore hypothesize that it is possible to accurately stage sleep based on the electrocardiogram (ECG) and respiratory signals.

Methods: Using a dataset of 8,682 polysomnographs, we develop deep neural networks to stage sleep from ECG and respiratory signals. Five deep neural networks consisting of convolutional networks and long short-term memory networks are trained to stage sleep from heart and breathing signals: the timing of R peaks from the ECG, abdominal and chest respiratory effort, and combinations of these signals.

Results: ECG in combination with abdominal respiratory effort achieves the best performance for staging all five sleep stages, with a Cohen's kappa of 0.600 (95% confidence interval 0.599–0.602), and 0.762 (0.760–0.763) for discriminating awake vs. rapid eye movement vs. non-rapid eye movement sleep. The performance is better for young participants and for those with a low apnea-hypopnea index, while it is robust to commonly used outpatient medications.
Conclusions: Our results validate that ECG and respiratory effort provide substantial information about sleep stages in a large population. This opens new possibilities in sleep research and in applications where electroencephalography is not readily available or may be infeasible, such as in critically ill patients.
Deep Network Architecture

We trained five deep neural networks based on the following input signals and their combinations: 1) ECG; 2) CHEST (chest respiratory effort); 3) ABD (abdominal respiratory effort); 4) ECG+CHEST; and 5) ECG+ABD. Each deep neural network contained a feed-forward convolutional neural network (CNN), which learned features pertaining to each epoch, and a recurrent neural network (RNN), in this case a long short-term memory (LSTM) network, which learned temporal patterns among consecutive epochs.

The CNN part of the network is similar to that in Hannun et al.20. As shown in Figure 1A and Figure 1B, the network for a single type of input signal, i.e. ECG, CHEST, or ABD, consists of a convolutional layer, several residual blocks, and a final output block. For a network with both ECG and CHEST/ABD as input signals (Figure 1C), we first fixed the weights of the layers up to the 9th residual block (gray) of the ECG network and, similarly, up to the 5th residual block (gray) of the CHEST/ABD network; we then concatenated the outputs and fed the concatenation into a subnetwork containing five residual blocks and a final output block. The numbers of fixed layers were chosen so that the outputs of the layers from the different modalities have the same shape (after zero-padding) and can therefore be concatenated.

The LSTM part of the network has the same structure for the different input signals. It is a bi-directional LSTM, in which the context cells from the forward and backward directions are concatenated. For the network
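As a concrete illustration of this architecture, below is a minimal PyTorch sketch of a single-signal network (Figure 1A/1B). It is not the authors' implementation: the channel width, kernel size, number of residual blocks, and the pooling-based output block are illustrative assumptions; only the overall layout, a convolutional stem, a stack of residual blocks, an output block, and a bi-directional LSTM over consecutive epochs, follows the text.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """1-D convolutional residual block; kernel size and channel width
    are illustrative assumptions, not the paper's hyperparameters."""
    def __init__(self, channels, kernel_size=15):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)  # skip connection

class EpochEncoder(nn.Module):
    """CNN mapping one epoch of a single signal to a feature vector:
    convolutional stem, residual blocks, then an output block
    (here assumed to be global pooling plus a linear projection)."""
    def __init__(self, in_channels=1, channels=64, n_blocks=9, n_features=128):
        super().__init__()
        self.stem = nn.Conv1d(in_channels, channels, kernel_size=15, padding=7)
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.proj = nn.Linear(channels, n_features)

    def forward(self, x):                     # x: (batch, 1, samples_per_epoch)
        h = self.blocks(self.stem(x))
        return self.proj(self.pool(h).squeeze(-1))

class SleepStager(nn.Module):
    """Per-epoch CNN followed by a bi-directional LSTM over consecutive
    epochs; forward and backward states are concatenated, as in the text."""
    def __init__(self, n_features=128, hidden=128, n_stages=5):
        super().__init__()
        self.encoder = EpochEncoder(n_features=n_features)
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_stages)

    def forward(self, x):                     # x: (batch, n_epochs, samples_per_epoch)
        b, t, s = x.shape
        feats = self.encoder(x.reshape(b * t, 1, s)).reshape(b, t, -1)
        out, _ = self.lstm(feats)             # (batch, n_epochs, 2 * hidden)
        return self.head(out)                 # per-epoch logits over sleep stages
```

A quick shape check of the sketch:

```python
model = SleepStager()
x = torch.randn(2, 40, 3000)   # 2 recordings, 40 epochs, e.g. 30 s at 100 Hz
logits = model(x)              # (2, 40, 5): per-epoch logits for 5 sleep stages
```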
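The two-modality networks (Figure 1C) can be sketched along the same lines, continuing the code above. The frozen-trunk depths (9 ECG residual blocks, 5 respiratory residual blocks) and the five trainable fusion blocks follow the text; the channel counts, the channel-wise concatenation axis, and the assumption that both trunks emit feature maps of equal length (the paper zero-pads to match shapes) are assumptions.

```python
class FusionEncoder(nn.Module):
    """Per-epoch encoder for ECG plus one respiratory-effort channel.
    Reuses ResidualBlock and EpochEncoder from the sketch above; assumes
    the two frozen trunks emit feature maps of the same length."""
    def __init__(self, ecg_encoder, resp_encoder, channels=64, n_features=128):
        super().__init__()
        # Freeze the pretrained trunks: up to the 9th ECG residual block
        # and the 5th respiratory residual block, per the text.
        self.ecg_trunk = nn.Sequential(ecg_encoder.stem, *ecg_encoder.blocks[:9])
        self.resp_trunk = nn.Sequential(resp_encoder.stem, *resp_encoder.blocks[:5])
        for p in self.ecg_trunk.parameters():
            p.requires_grad = False
        for p in self.resp_trunk.parameters():
            p.requires_grad = False
        # Trainable fusion subnetwork: five residual blocks + output block.
        self.fusion = nn.Sequential(*[ResidualBlock(2 * channels) for _ in range(5)])
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.proj = nn.Linear(2 * channels, n_features)

    def forward(self, ecg, resp):             # each: (batch, 1, samples_per_epoch)
        # Concatenate the frozen per-modality feature maps along channels.
        h = torch.cat([self.ecg_trunk(ecg), self.resp_trunk(resp)], dim=1)
        h = self.fusion(h)                    # five trainable residual blocks
        # Per-epoch feature vector; in the full model this would feed the same
        # bi-directional LSTM over consecutive epochs as in the single-signal case.
        return self.proj(self.pool(h).squeeze(-1))
```

For example, with two untrained encoders standing in for the pretrained single-signal trunks:

```python
ecg_enc, abd_enc = EpochEncoder(), EpochEncoder()   # pretrained in practice
fused = FusionEncoder(ecg_enc, abd_enc)
feats = fused(torch.randn(2, 1, 3000), torch.randn(2, 1, 3000))  # (2, 128)
```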