Objective: The fusion of multiple noisy labels for biomedical data (such as ECG annotations, which may be obtained from human experts or from automated systems) into a single robust annotation has many applications in physiologic monitoring. Directly modelling the difficulty of the task has the potential to improve the fusion of such labels. This paper proposes a way to incorporate task difficulty, as quantified by ‘signal quality’, into the fusion process. Approach: We propose a Bayesian fusion model that infers a consensus by aggregating labels provided by multiple imperfect automated algorithms (or ‘annotators’). Our model incorporates the signal quality of the underlying recording when fusing labels. We compare our proposed model with previously published approaches. Two publicly available datasets were used to demonstrate the feasibility of our proposed model: one focused on QT interval estimation in the ECG and the other focused on respiratory rate (RR) estimation from the photoplethysmogram (PPG). We inferred the hyperparameters of our model using maximum a posteriori (MAP) inference and Gibbs sampling. Main results: For the QT dataset, our model significantly outperformed the previously published models (root-mean-square error of ms for our model versus ms from the best existing model) when fusing labels from only three annotators. For the RR dataset, no improvement was observed compared to the same model without signal quality modelling, although our model still outperformed existing models (mean-absolute error of bpm for our model versus bpm from the best existing model). We conclude that our approach demonstrates the feasibility of using a signal quality metric as a confidence measure to improve label fusion. Significance: Our Bayesian learning model extends existing work by incorporating signal quality as a confidence measure, improving the reliability of label fusion for biomedical datasets.
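The core idea of treating signal quality as a confidence measure can be illustrated with a deliberately simplified Gaussian fusion sketch. This is not the model described in the paper. It assumes each annotator's noise precision is already known (the paper instead infers hyperparameters with MAP inference and Gibbs sampling). It also assumes a hypothetical rule in which a per-recording quality score `quality` in (0, 1] scales every annotator's effective precision, so that poor-quality recordings pull the consensus toward the prior. The names `fuse_labels` and `Posterior` are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Posterior:
    mean: float
    variance: float


def fuse_labels(
    labels: Sequence[float],
    annotator_precisions: Sequence[float],
    quality: float,
    prior_mean: float,
    prior_precision: float,
) -> Posterior:
    """Fuse annotator labels for one recording into a Gaussian posterior.

    Hypothetical assumption: signal quality multiplies each annotator's
    precision, so low-quality recordings down-weight all annotators equally.
    """
    if len(labels) != len(annotator_precisions):
        raise ValueError("one precision is required per label")
    if not 0.0 < quality <= 1.0:
        raise ValueError("quality must lie in (0, 1]")

    # Effective precision of each annotator after quality scaling.
    effective = [quality * lam for lam in annotator_precisions]

    # Conjugate Gaussian update: precisions add, means are precision-weighted.
    post_precision = prior_precision + sum(effective)
    weighted_sum = prior_precision * prior_mean + sum(
        w * y for w, y in zip(effective, labels)
    )
    return Posterior(mean=weighted_sum / post_precision,
                     variance=1.0 / post_precision)


# Example: three QT estimates (ms), each annotator with a 10 ms noise SD,
# and a prior centred on 380 ms with a 20 ms SD.
labels = [400.0, 410.0, 420.0]
precisions = [1 / 100] * 3
high = fuse_labels(labels, precisions, quality=1.0,
                   prior_mean=380.0, prior_precision=1 / 400)
low = fuse_labels(labels, precisions, quality=0.2,
                  prior_mean=380.0, prior_precision=1 / 400)
print(round(high.mean, 2), round(low.mean, 2))
```

With a high-quality recording the consensus stays close to the annotators' average. With a low-quality recording it shifts toward the prior and its posterior variance grows, which is the qualitative behaviour one would expect from a quality-aware fusion scheme.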