A residual-networks family with hundreds or even thousands of layers
dominates major image recognition tasks, but building a network by simply
stacking residual blocks inevitably limits its optimization ability. This paper
proposes a novel residual-network architecture, Residual networks of Residual
networks (RoR), to better exploit the optimization ability of residual networks.
Instead of optimizing the original residual mapping, RoR optimizes a residual
mapping of residual mappings. In particular, RoR adds level-wise shortcut
connections on top of the original residual networks to improve their learning
capability. More importantly, RoR can be applied to various kinds of
residual networks (ResNets, Pre-ResNets and WRN) and significantly boost their
performance. Our experiments demonstrate the effectiveness and versatility of
RoR, which achieves the best performance among all residual-network-like
structures. Our RoR-3-WRN58-4+SD models achieve new state-of-the-art results on
CIFAR-10, CIFAR-100 and SVHN, with test errors of 3.77%, 19.73% and 1.59%,
respectively. RoR-3 models also achieve state-of-the-art results compared to
ResNets on the ImageNet data set.
Comment: IEEE Transactions on Circuits and Systems for Video Technology 201
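
The idea of level-wise shortcut connections described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a three-level arrangement (one root shortcut over all groups, one shortcut per group of blocks, and the usual shortcut inside each block), identity shortcuts with constant feature size, and plain Python lists in place of tensors. The names `with_shortcut`, `chain`, `build_plain_resnet` and `build_ror` are hypothetical helpers introduced only for this illustration.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, standing in for the addition at a shortcut junction."""
    return [x + y for x, y in zip(a, b)]


def with_shortcut(inner: Layer) -> Layer:
    """Wrap a sub-network with an identity shortcut: y = x + inner(x)."""
    return lambda x: add(x, inner(x))


def chain(layers: List[Layer]) -> Layer:
    """Apply the given layers one after another."""
    def run(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x
    return run


def build_plain_resnet(groups: List[List[Layer]]) -> Layer:
    """Baseline: residual blocks simply stacked, one shortcut per block."""
    return chain([with_shortcut(f) for group in groups for f in group])


def build_ror(groups: List[List[Layer]]) -> Layer:
    """Assumed three-level layout: block, group, and root shortcuts."""
    # Lowest level: every residual function gets its own shortcut.
    # Middle level: each group of blocks is wrapped by one more shortcut.
    middle = [
        with_shortcut(chain([with_shortcut(f) for f in group]))
        for group in groups
    ]
    # Root level: a single shortcut spans all groups.
    return with_shortcut(chain(middle))


if __name__ == "__main__":
    # Residual functions that output zero make the shortcut paths visible.
    zero: Layer = lambda x: [0.0 for _ in x]
    groups = [[zero, zero], [zero, zero]]
    print(build_plain_resnet(groups)([1.0, 2.0]))  # [1.0, 2.0]
    print(build_ror(groups)([1.0, 2.0]))           # [5.0, 10.0]
```

With all residual functions set to zero, the plain stack reduces to the identity, while the sketch above still passes the input through the extra group-level and root-level shortcuts, so each added level contributes its own path from input to output. A real network would use convolutional blocks and would need projection shortcuts wherever the feature dimensions change.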