The aim of this study was to assess the feasibility and test-retest reliability of the Welfare Quality® Animal Welfare Assessment Protocol for Growing Pigs. Twenty-three German pig farms were visited repeatedly by the same trained observers, with each farm visited six times during two fattening periods. The entire protocol assessment was carried out during each farm visit, ie a Qualitative Behaviour Assessment (QBA), behavioural observations (BO), a Human-Animal Relationship test (HAR) and different individual parameters (IPs), eg bursitis and tail-biting. Test-retest reliability was evaluated by a Wilcoxon signed rank test (W) and by calculation of the Smallest Detectable Change (SDC) and Limits of Agreement (LoA). The QBA showed unsatisfactory agreement between farm visits. In contrast, agreement for the BO was generally good. For the HAR, no reliability could be detected. Most IPs showed acceptable agreement, with the exception of bursitis and manure on the body. Bursitis showed great differences, which can be explained by difficulties in the assessment when the animals moved around or their legs were dirty. The disagreement in the parameter manure on the body can be explained by seasonal effects. Disagreement was also found for the parameters coughing, sneezing, pleuritis, pneumonia and milkspots. Feasibility was good: both observers could be trained well to carry out the protocol, and the time needed for an assessment did not exceed 6 h. The parts of the protocol that proved to be insufficiently reliable need to be addressed in the future in order to improve the objective measurement of animal welfare.
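The abstract does not state the formulas behind the SDC and LoA. As a rough illustration only, the sketch below uses one common textbook convention: Bland-Altman limits of agreement (mean difference ± 1.96 × SD of the paired differences) and SDC = 1.96 × √2 × SEM, with SEM taken as SD of the differences divided by √2. These definitions, the function names and the sample prevalences are assumptions made for illustration, not the authors' method or data.

```python
import math
import statistics


def paired_differences(first_visit, second_visit):
    """Differences between two visits scored on the same farms."""
    if len(first_visit) != len(second_visit):
        raise ValueError("both visits need one value per farm")
    return [a - b for a, b in zip(first_visit, second_visit)]


def limits_of_agreement(first_visit, second_visit):
    """Bland-Altman 95% limits of agreement: mean difference +/- 1.96 * SD."""
    diffs = paired_differences(first_visit, second_visit)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


def smallest_detectable_change(first_visit, second_visit):
    """SDC = 1.96 * sqrt(2) * SEM, assuming SEM = SD of differences / sqrt(2)."""
    diffs = paired_differences(first_visit, second_visit)
    sem = statistics.stdev(diffs) / math.sqrt(2)
    return 1.96 * math.sqrt(2) * sem


# Hypothetical prevalences of one parameter on four farms at two visits.
visit_1 = [0.10, 0.20, 0.30, 0.40]
visit_2 = [0.12, 0.18, 0.33, 0.37]

lower, upper = limits_of_agreement(visit_1, visit_2)
sdc = smallest_detectable_change(visit_1, visit_2)
print(f"LoA: {lower:.3f} to {upper:.3f}, SDC: {sdc:.3f}")
```

Under this convention, a narrow LoA interval centred near zero and a small SDC would indicate that repeated visits give similar values. Other SEM definitions, for example ones based on variance components, give different SDC values.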
Animal welfare has become an important subject of public and political debate, leading to the necessity of an objective evaluation system for on-farm use. As welfare is a multi-dimensional concept, it makes sense to use a multi-criteria aggregation system to obtain an overall welfare score. Such an aggregation system is provided by the Welfare Quality® Network. The present paper focusses on the assessment of the multi-criteria evaluation model that the Welfare Quality® protocol for growing pigs uses to aggregate the animal-based indicators first to criteria, then to principles and finally to an overall welfare score. Specifically, the importance of the indicators for the overall assessment of growing pig farms is analysed in a given population, which consisted of a total of 198 protocol assessments carried out on a sample of 24 farms in Germany. By means of partial least squares modelling, the influence of measures in the calculation procedure is estimated by calculation and interpretation of Variable Importance for Projection (VIP) scores. The VIP scores revealed some meaningful influences that were unexpected, given that the multi-criteria evaluation model of Welfare Quality® aims to avoid interferences and double-counting. Some of these influences led to the assumption that some measures might have potential as iceberg indicators, whereas others showed lesser importance. Thus, feasibility could be improved by deleting indicators or weighting them according to their importance. Altogether, the study is an essential contribution to the further development of the Welfare Quality® protocols as well as the application of multi-criteria decision systems in the field of animal welfare science in general.
The present paper focuses on evaluating the interobserver reliability of the ‘Welfare Quality® Animal Welfare Assessment Protocol for Growing Pigs’. The protocol for growing pigs mainly consists of a Qualitative Behaviour Assessment (QBA), direct behaviour observations (BO) carried out by instantaneous scan sampling and checks for different individual parameters (IP), e.g. presence of tail biting, wounds and bursitis. Three trained observers collected the data by performing 29 combined assessments, which were done at the same time and on the same animals but carried out completely independently of each other. The findings were compared by the calculation of Spearman Rank Correlation Coefficients (RS), Intraclass Correlation Coefficients (ICC), Smallest Detectable Changes (SDC) and Limits of Agreement (LoA). No agreement was found concerning the adjectives belonging to the QBA (e.g. active: RS: 0.50, ICC: 0.30, SDC: 0.38, LoA: −0.05 to 0.45; fearful: RS: 0.06, ICC: 0.0, SDC: 0.26, LoA: −0.20 to 0.30). In contrast, the BO showed good agreement (e.g. social behaviour: RS: 0.45, ICC: 0.50, SDC: 0.09, LoA: −0.09 to 0.03; use of enrichment material: RS: 0.75, ICC: 0.68, SDC: 0.06, LoA: −0.03 to 0.03). Overall, observers agreed well in the IP, e.g. tail biting (RS: 0.52, ICC: 0.88, SDC: 0.05, LoA: −0.01 to 0.02) and wounds (RS: 0.43, ICC: 0.59, SDC: 0.10, LoA: −0.09 to 0.10). The parameter bursitis showed great differences (RS: 0.10, ICC: 0.0, SDC: 0.35, LoA: −0.37 to 0.40), which can be explained by difficulties in the assessment when the animals moved around quickly or their legs were soiled. In conclusion, the interobserver reliability was good in the BO and most IP, but not for the parameter bursitis and the QBA.
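To make the RS statistic concrete, the following is a minimal sketch of a Spearman rank correlation between two observers: the Pearson correlation of the two rank vectors, with tied values given their average rank. The function names and the example scores are hypothetical, and the abstract does not say which software or tie handling the authors used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("both observers need one score per assessment")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores from two observers for the same five pens.
observer_a = [0.10, 0.25, 0.25, 0.40, 0.55]
observer_b = [0.12, 0.20, 0.30, 0.35, 0.60]
print(f"RS = {spearman(observer_a, observer_b):.2f}")
```

A rank correlation only shows whether observers order the pens similarly. It does not capture systematic offsets between observers, which is one reason agreement measures such as SDC and LoA are reported alongside it.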
This paper focuses on the reliability of the multi-criteria evaluation model included in the Welfare Quality® protocol for growing pigs to aggregate the animal-based indicators, first to criteria, then to principle level and finally to an overall welfare score. This assessment was carried out in a practical application study on a sample of 24 farms in Germany. Altogether, 102 protocol assessments were carried out in repeated visits to these farms in order to evaluate the inter-observer and test-retest repeatability of the overall scores calculated by the multi-criteria evaluation system. Reliability was then assessed by the calculation of different reliability and agreement parameters: Spearman Rank Correlation Coefficients (RS), Intraclass Correlation Coefficients (ICC), Smallest Detectable Changes (SDC) and Limits of Agreement (LoA). Inter-observer repeatability was insufficient for the criteria comfort around resting, absence of injuries, expression of social behaviours, expression of other behaviours, good human-animal relationship and positive emotional state as well as for the principles good housing and appropriate behaviour. This is probably due mainly to the insufficient repeatability of the underlying indicators, which has been revealed in previous studies. Test-retest repeatability was predominantly insufficient. Overall, the present results highlight the importance of absolutely reliable indicators at the baseline level. Furthermore, it could be shown that the calculation procedure is partly incorrect and consequently needs correction. Therefore, this study is an important contribution to the future progression of the Welfare Quality® protocols and animal welfare assessment tools in general.