Epilepsy is one of the most common neurological disorders, affecting over 65 million people worldwide. Despite continuing advances in anti-epileptic treatments, one third of epilepsy patients live with drug-resistant seizures. In addition, the mortality rate among epileptic patients is two to three times higher than in a matched group of the general population. Wearable devices offer a promising solution for detecting seizures in real time and alerting family and caregivers so they can provide immediate assistance to the patient. However, for the detection system to be reliable, a considerable amount of labeled data is needed to train it. Labeling epilepsy data is a costly and time-consuming process that requires manual inspection and annotation of electroencephalogram (EEG) recordings by medical experts. In this paper, we present a self-learning methodology for epileptic seizure detection without medical supervision. We propose a minimally supervised algorithm that labels seizures automatically in order to generate personalized training data. We demonstrate that the median deviation of these labels from the ground truth is only 10.1 seconds or, equivalently, less than 1% of the signal length. Moreover, we show that training a real-time detection algorithm with data labeled by our algorithm degrades performance by less than 2.5% compared to training it with data labeled by medical experts. Finally, we evaluated our methodology on a wearable platform and achieved a battery lifetime of 2.59 days on a single charge.
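A minimal sketch of the label-quality evaluation described above, under assumed data structures: given automatically generated seizure onset/offset labels and matched expert ground-truth labels (both in seconds), it computes the median absolute deviation of the label boundaries and expresses it as a fraction of the total signal length. The variable names, one-to-one event pairing, and example values are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def median_label_deviation(auto_labels, expert_labels, signal_length_s):
    """auto_labels / expert_labels: lists of (onset_s, offset_s) pairs,
    assumed to be matched one-to-one and given in the same order."""
    deviations = []
    for (a_on, a_off), (e_on, e_off) in zip(auto_labels, expert_labels):
        deviations.append(abs(a_on - e_on))    # onset deviation
        deviations.append(abs(a_off - e_off))  # offset deviation
    median_dev = float(np.median(deviations))
    return median_dev, median_dev / signal_length_s

# Example: one hypothetical one-hour recording with two annotated seizures.
auto = [(120.0, 185.0), (2400.5, 2460.0)]
expert = [(112.0, 180.0), (2395.0, 2465.5)]
print(median_label_deviation(auto, expert, signal_length_s=3600.0))
```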
We take a deep look into the behaviour of self-attention heads in the transformer architecture. In light of recent work discouraging the use of attention distributions to explain a model's behaviour, we show that attention distributions can nevertheless provide insights into the local behaviour of attention heads. On this basis, we propose a distinction between local patterns revealed by attention and global patterns that refer back to the input, and analyze BERT from both angles. We use gradient attribution to analyze how the output of an attention head depends on the input tokens, effectively extending the local attention-based analysis to account for the mixing of information throughout the transformer layers. We find a significant mismatch between attention and attribution distributions, caused by the mixing of context inside the model. We quantify this discrepancy and observe that, interestingly, some patterns persist across all layers despite the mixing.
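A minimal sketch of gradient-based attribution with respect to input tokens, assuming a HuggingFace bert-base-uncased checkpoint: it compares the gradient-derived importance of each input token for one output position against the raw attention distribution of a single head at that position. The attribution target (the norm of one output token's representation), the chosen layer and head, and the normalization are illustrative choices, not necessarily the paper's exact procedure.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

enc = tokenizer("The cat sat on the mat.", return_tensors="pt")

# Embed the tokens explicitly so gradients w.r.t. the input embeddings exist.
embeds = model.embeddings.word_embeddings(enc["input_ids"]).detach()
embeds.requires_grad_(True)

outputs = model(inputs_embeds=embeds, attention_mask=enc["attention_mask"])
hidden = outputs.last_hidden_state            # (1, seq_len, hidden_size)
attentions = outputs.attentions               # tuple of (1, heads, seq, seq) per layer

# Attribution: gradient magnitude of one output position w.r.t. every input token.
t = 3                                          # output position to explain
hidden[0, t].norm().backward()
attribution = embeds.grad[0].norm(dim=-1)      # (seq_len,)
attribution = attribution / attribution.sum()

# Raw attention distribution of one head at the same position, for comparison.
layer, head = 5, 7
attention_dist = attentions[layer][0, head, t] # (seq_len,)
print(attribution.tolist())
print(attention_dist.tolist())
```

Comparing the two vectors side by side is one way to visualize the attention-versus-attribution mismatch discussed in the abstract.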
Automatic ICD coding is the task of assigning codes from the International Classification of Diseases (ICD) to medical notes. These codes describe the state of the patient and have multiple applications, e.g., computer-assisted diagnosis or epidemiological studies. ICD coding is a challenging task due to the complexity and length of medical notes. Unlike the general trend in language processing, no transformer model has been reported to reach high performance on this task. Here, we investigate ICD coding in detail using PubMedBERT, a state-of-the-art transformer model for biomedical language understanding. We find that the difficulty of fine-tuning the model on long pieces of text is the main limitation for BERT-based models on ICD coding. We run extensive experiments and show that, despite the gap with the current state of the art, pretrained transformers can reach competitive performance using relatively small portions of text. We point to better methods for aggregating information from long texts as the main need for improving BERT-based ICD coding.
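A minimal sketch of one way to handle long medical notes with a BERT-style encoder, in the spirit of the aggregation problem the abstract points to: split the note into 512-token chunks, encode each chunk, and max-pool the chunk [CLS] vectors before a multi-label sigmoid classifier over ICD codes. The PubMedBERT checkpoint name, the non-overlapping chunking, and the max-pooling aggregation are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

CHECKPOINT = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

class ChunkedICDClassifier(nn.Module):
    def __init__(self, num_codes: int, max_len: int = 512):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
        self.encoder = AutoModel.from_pretrained(CHECKPOINT)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_codes)
        self.max_len = max_len

    def forward(self, note: str) -> torch.Tensor:
        # Tokenize the full note, then slice it into non-overlapping chunks.
        enc = self.tokenizer(note, return_tensors="pt",
                             truncation=False, add_special_tokens=False)
        ids = enc["input_ids"][0]
        chunk_size = self.max_len - 2   # leave room for [CLS] and [SEP]
        chunks = [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]

        cls_vectors = []
        for chunk in chunks:
            chunk_ids = torch.cat([
                torch.tensor([self.tokenizer.cls_token_id]),
                chunk,
                torch.tensor([self.tokenizer.sep_token_id]),
            ]).unsqueeze(0)
            out = self.encoder(input_ids=chunk_ids)
            cls_vectors.append(out.last_hidden_state[:, 0])   # chunk [CLS] vector

        # Aggregate chunk representations (element-wise max) and classify.
        pooled = torch.stack(cls_vectors, dim=0).max(dim=0).values
        return torch.sigmoid(self.classifier(pooled))          # per-code probabilities
```

Max-pooling is only one of several possible aggregation strategies; mean-pooling or an attention layer over chunk vectors are common alternatives when information is spread across the note.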
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.