Model ensembling is widely used in deep learning because it can balance the variance and bias of complex models. Mainstream ensemble methods can be divided into “implicit” and “explicit” approaches. The “implicit” approach obtains different models by randomly deactivating internal parameters within the complex structure of a deep learning model, and these models are integrated through parameter sharing. However, such methods lack flexibility because they can only ensemble homogeneous models with similar structures. The “explicit” approach, in contrast, can fuse entirely different heterogeneous model structures, which significantly enhances the flexibility of model selection and makes it possible to integrate models with entirely different perspectives. However, explicit ensembling faces the challenge that naively averaging the outputs can lead to a chaotic result. To this end, researchers have proposed knowledge distillation and adversarial learning techniques to combine multiple heterogeneous models nonlinearly and obtain better ensemble performance; however, these require significant modifications to the training or testing procedure and are computationally expensive compared to simple averaging. In this paper, based on a linear combination assumption, we propose an interpretable ensemble method for averaging model outputs that is simple to implement, and we conduct experiments on representation learning tasks in Computer Vision (CV) and Natural Language Processing (NLP). The results show that our method outperforms direct averaging while retaining its practicality.
In the representation learning domain, mainstream model ensemble methods include “implicit” approaches, such as dropout, and “explicit” approaches, such as voting or weighted averaging over multiple model outputs. Compared to implicit techniques, explicit ensemble methods allow more flexibility in combining models with different structures to obtain different perspectives on representations. However, the representations produced by different models are not guaranteed to lie in a linearly compatible space, so simply combining multiple model outputs linearly may degrade performance. Meanwhile, nonlinear fusion mechanisms such as distillation and meta-learning can be uninterpretable and time-consuming. To this end, we propose a linear-fusion hypothesis for the output representations of deep learning models and design an interpretable linear fusion method based on it. The method applies a transform layer that maps the output representations of different models to the same classification center. Experimental results demonstrate that, compared to directly averaging the representations, our method achieves better performance. Additionally, it retains the convenience of direct averaging while offering better time and computational efficiency than nonlinear fusion. Finally, we verify the applicability of our method on both computer vision and natural language processing representation tasks, under supervised and semi-supervised settings.
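To make the transform-layer idea concrete, the following is a minimal PyTorch-style sketch under our own assumptions; the class name, backbones, and dimensions are hypothetical and the paper's exact architecture and training procedure are not reproduced here. Each heterogeneous model's output representation passes through its own learnable linear layer into a shared space before the aligned representations are averaged.

```python
# Hypothetical sketch of linear fusion with per-model transform layers,
# not the paper's reference implementation.
import torch
import torch.nn as nn


class LinearFusionEnsemble(nn.Module):
    """Aligns each model's representation with a per-model transform layer,
    then averages the aligned representations (simple linear combination)."""

    def __init__(self, models, rep_dims, fused_dim):
        super().__init__()
        self.models = nn.ModuleList(models)
        # One learnable transform per heterogeneous model, mapping its output
        # representation into a common space / shared classification center.
        self.transforms = nn.ModuleList(
            [nn.Linear(d, fused_dim) for d in rep_dims]
        )

    def forward(self, x):
        aligned = [t(m(x)) for m, t in zip(self.models, self.transforms)]
        # Uniform averaging of the aligned representations.
        return torch.stack(aligned, dim=0).mean(dim=0)


# Usage sketch: two toy backbones producing 512- and 768-dim representations.
backbone_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512))
backbone_b = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 768))
ensemble = LinearFusionEnsemble([backbone_a, backbone_b], [512, 768], fused_dim=256)
fused = ensemble(torch.randn(4, 3, 32, 32))  # -> shape (4, 256)
```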