Deep learning is a hot research topic in the field of machine learning. Real-valued neural networks (Real NNs), especially deep real networks (DRNs), have been widely used in many research fields. In recent years, deep complex networks (DCNs) and deep quaternion networks (DQNs) have attracted increasing attention. The octonion algebra, which is an extension of the complex and quaternion algebras, can provide a more efficient and compact representation. This paper constructs a general framework of deep octonion networks (DONs) and provides their main building blocks, such as octonion convolution, octonion batch normalization, and octonion weight initialization; DONs are then applied to image classification tasks on the CIFAR-10 and CIFAR-100 datasets. Compared with DRNs, DCNs, and DQNs, the proposed DONs show better convergence and higher classification accuracy. The success of DONs is also explained from the perspective of multi-task learning.
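For context, an octonion is a hypercomplex number with one real and seven imaginary components; the following is the standard textbook definition, not a formula quoted from this paper:

\[
o = r_0 + \sum_{i=1}^{7} r_i e_i, \qquad r_i \in \mathbb{R}, \quad e_i^2 = -1, \quad e_i e_j = -e_j e_i \;\; (i \neq j).
\]

Restricting the basis to \(\{1, e_1\}\) recovers the complex numbers, and restricting it to \(\{1, e_1, e_2, e_3\}\) recovers the quaternions, which is the sense in which the octonion algebra extends both.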
Introduction

Real-valued neural networks (Real NNs) [1-12] have attracted the attention of many researchers and have recently made major breakthroughs in many areas, such as signal processing, image processing, and natural language processing.

Many Real NN models have been constructed in the literature. These models can generally be divided into two kinds: non-deep models and deep models. Non-deep models are mainly built from multilayer perceptrons [13] and are hard to train with the real-valued back-propagation (BP) algorithm alone [14] when they have more than four layers. Deep models can be roughly constructed by two strategies: multilayer perceptron models assisted by unsupervised pre-training methods (for example, deep belief nets [15] and deep auto-encoders [16]), and real-valued convolutional neural networks (Real CNNs), including LeNet-5 [17], AlexNet [18], Inception [19-22], VGGNet [23], HighwayNet [24], ResNet [25], ResNeXt [26], DenseNet [27], FractalNet [28], PolyNet [29], SENet [30], CliqueNet [31], BinaryNet [32], SqueezeNet [33], MobileNet [34], etc.

Although Real CNNs have achieved great success in various applications, the correlations between convolution kernels are generally not taken into consideration; that is, no connections or special relationships between convolution kernels are modeled. In contrast, real-valued recurrent neural networks (Real RNNs) [35-38] capture such correlations by adding connections between convolution kernels and then learning the weights of these connections, which, however, significantly increases the training difficulty and makes convergence problems more likely. This raises the first question: can we capture the correlations between convolution kernels through special relationships that do not need to be learned, instead of adding connections between convolution kernels?

Many researchers have found that performance can be improved when the relationships between convolution kernels are modeled by complex algebra, quaternion algebra, etc.
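To make the idea of fixed, algebra-induced relationships concrete, here is a minimal sketch, assuming NumPy, of octonion multiplication via the Cayley-Dickson construction, in which an octonion is treated as a pair of quaternions; the helper names quat_mul, quat_conj, and oct_mul are illustrative, not from the paper. Every output component is a fixed signed sum of products of the input components, so in a hypercomplex convolution the coupling between kernel groups comes from the algebra itself rather than from learned connections:

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of quaternions given as arrays [w, x, y, z]."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def quat_conj(q):
    """Quaternion conjugate: negate the three imaginary parts."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def oct_mul(o1, o2):
    """Octonion product via Cayley-Dickson doubling of the quaternions:
    with o = (a, b) a pair of quaternions,
    (a, b)(c, d) = (a c - conj(d) b,  d a + b conj(c))."""
    a, b = o1[:4], o1[4:]
    c, d = o2[:4], o2[4:]
    real_part = quat_mul(a, c) - quat_mul(quat_conj(d), b)
    imag_part = quat_mul(d, a) + quat_mul(b, quat_conj(c))
    return np.concatenate([real_part, imag_part])

# Sanity check: in this basis ordering, e1 * e2 = e3.
e1, e2 = np.eye(8)[1], np.eye(8)[2]
print(oct_mul(e1, e2))  # [0. 0. 0. 1. 0. 0. 0. 0.]
```

An octonion convolution layer would apply this product between eight-component inputs and kernels, so the 64 signed cross-terms act as fixed, parameter-free relationships among the eight kernel groups.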