Wearable technology is expected to enable prolonged electroencephalography (EEG) monitoring of patients with epilepsy in their home environment. Neurologists visually analyse the EEG and annotate all seizures, which patients often under-report. Visual analysis of a 24-h EEG recording typically takes one to two hours. Reliable automated seizure detection algorithms will be crucial to reduce this workload. We investigated such algorithms on a dataset of behind-the-ear EEG measurements. Our first aim was to develop a methodology in which part of the data is deferred to a human expert, assumed to perform perfectly, in order to obtain an (almost) perfect detection sensitivity (DS). Prediction confidences are derived from temperature-scaled outputs of the classification model and from trust scores. A DS of approximately 90% can be achieved when around 10% of the data is deferred, and approximately 99% when around 40% is deferred. A perfect DS can be achieved by deferring 50% of the data. Our second contribution shows that a common modelling strategy, in which predictions from several short EEG segments are combined into a final prediction, can be improved by filtering out untrustworthy segments, i.e. those with low trust scores. The false detection rate decreases by between 21% and 43% (relative), while the DS increases or decreases only slightly.
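The deferral and segment-filtering ideas can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions: the temperature `T` is taken as already fitted on validation data; deferral here ranks samples by temperature-scaled confidence alone (how the two confidence measures are combined is not specified above); the trust score is a simplified nearest-neighbour version (distance to the nearest training example of another class divided by distance to the nearest example of the predicted class, without the density filtering of the original trust-score method), computed in some feature space `train_x` that is also an assumption; and segments are combined by averaging their seizure probabilities. All function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def temperature_scaled_confidence(logits, T):
    """Confidence = largest class probability after dividing logits by T.
    T > 1 softens over-confident outputs; T is assumed to be fitted already."""
    return softmax(logits / T).max(axis=-1)


def simple_trust_score(x, pred, train_x, train_y):
    """Simplified trust score (hypothetical, no density filtering): distance to
    the nearest training point of another class divided by the distance to the
    nearest training point of the predicted class. Larger = more trustworthy."""
    d = np.linalg.norm(train_x - x, axis=1)
    d_pred = d[train_y == pred].min()
    d_other = d[train_y != pred].min()
    return d_other / (d_pred + 1e-12)


def defer_least_confident(confidences, fraction):
    """Boolean mask selecting the `fraction` least-confident samples,
    which are handed to the (assumed perfect) human expert."""
    n_defer = int(round(fraction * len(confidences)))
    order = np.argsort(confidences)  # ascending: least confident first
    mask = np.zeros(len(confidences), dtype=bool)
    mask[order[:n_defer]] = True
    return mask


def detection_sensitivity(y_true, y_pred, deferred):
    """Fraction of seizure samples (label 1) that end up detected. Deferred
    samples count as correctly labelled because the expert is assumed perfect."""
    seizures = y_true == 1
    detected = (y_pred == 1) | deferred
    return (detected & seizures).sum() / seizures.sum()


def aggregate_trusted_segments(segment_probs, trust, threshold):
    """Average the seizure probability over segments whose trust score reaches
    `threshold`; fall back to all segments if none qualify."""
    keep = trust >= threshold
    if not keep.any():
        keep = np.ones_like(keep)
    return segment_probs[keep].mean()


if __name__ == "__main__":
    # Toy binary logits (column 1 = seizure) and ground-truth labels.
    logits = np.array([[2.0, 0.0], [0.0, 3.0], [0.1, 0.0], [1.0, 1.0]])
    y_true = np.array([0, 1, 1, 1])
    y_pred = logits.argmax(axis=1)

    conf = temperature_scaled_confidence(logits, T=1.5)
    deferred = defer_least_confident(conf, fraction=0.5)
    print("DS without deferral:", detection_sensitivity(y_true, y_pred, np.zeros(4, dtype=bool)))
    print("DS with 50% deferred:", detection_sensitivity(y_true, y_pred, deferred))

    # Combine three segment predictions, ignoring the low-trust segment.
    probs = np.array([0.9, 0.8, 0.1])
    trust = np.array([2.0, 1.5, 0.2])
    print("Filtered window probability:", aggregate_trusted_segments(probs, trust, threshold=1.0))
```

In this toy example the two least-confident predictions are both missed seizures, so deferring them raises the DS from one third to one; in the segment example, dropping the low-trust segment shifts the averaged probability from 0.6 to 0.85. Averaging is only one way to combine segments; majority voting over the trusted segments would be an equally plausible variant.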