2017
DOI: 10.1016/j.jvcir.2017.09.007
A novel recurrent hybrid network for feature fusion in action recognition

Cited by 23 publications (11 citation statements)
References 36 publications
“…On the other hand, feature concatenation often integrates features before classification by directly concatenating the features extracted from individual modalities. Yu et al. concatenated semantic features, long-term temporal features, and short-term temporal features of a video [43]. Ji et al. concatenated object features, motion features, and scene features from videos for linear classification [44].…”
Section: Fusion Methods (mentioning)
confidence: 99%
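Concatenation fusion as described above can be sketched in a few lines: per-modality feature vectors are stacked into a single vector that a linear classifier consumes downstream. The feature names and dimensions here are illustrative assumptions, not taken from the cited papers.

```python
import numpy as np

# Hypothetical per-modality feature vectors for one video
# (names and dimensions are illustrative only).
semantic_feat = np.random.rand(128)    # semantic features
long_term_feat = np.random.rand(256)   # long-term temporal features
short_term_feat = np.random.rand(256)  # short-term temporal features

# Concatenation fusion: join modality features into one vector
# before classification (e.g. by a linear classifier).
fused = np.concatenate([semantic_feat, long_term_feat, short_term_feat])
print(fused.shape)  # (640,)
```

The fused vector's dimensionality is simply the sum of the modality dimensions, which is why concatenation is typically followed by a linear or shallow classifier rather than another deep network.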
“…To effectively learn spatiotemporal features, they apply a residual connection from the spatial stream to the temporal stream. Meanwhile, inspired by the success of recurrent neural networks in sequential modeling [73]–[76], many researchers [42], [44], [45], [48], [77], [78] have proposed LSTM models for action recognition. Ng et al. [44] and Donahue et al. [45] extracted frame-level features of videos using CNN models and trained an LSTM on the frame-level features for direct video-level prediction.…”
Section: B. Deep Learning For Action Recognition (mentioning)
confidence: 99%
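The frame-features-to-LSTM pipeline described above can be illustrated with a minimal NumPy LSTM: per-frame CNN features are consumed step by step, and the final hidden state serves as the video-level representation. All dimensions and weight initializations below are illustrative assumptions, not the cited architectures.

```python
import numpy as np

def lstm_video_encoding(frames, W, U, b):
    """Run a minimal LSTM over per-frame features and return the final
    hidden state as a video-level representation.
    frames: (T, d) frame features; W: (4h, d); U: (4h, h); b: (4h,).
    """
    h_dim = U.shape[1]
    h = np.zeros(h_dim)
    c = np.zeros(h_dim)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    for x in frames:
        z = W @ x + U @ h + b          # all four gate pre-activations
        i, f, o, g = np.split(z, 4)    # input, forget, output, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)     # update cell state
        h = o * np.tanh(c)             # update hidden state
    return h

rng = np.random.default_rng(0)
T, d, hdim = 8, 32, 16                 # illustrative sizes
frames = rng.standard_normal((T, d))   # stand-in for CNN frame features
W = rng.standard_normal((4 * hdim, d)) * 0.1
U = rng.standard_normal((4 * hdim, hdim)) * 0.1
b = np.zeros(4 * hdim)
video_repr = lstm_video_encoding(frames, W, U, b)
print(video_repr.shape)  # (16,)
```

In practice the final hidden state (or an average over all hidden states) is fed to a softmax classifier for the video-level prediction.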
“…Srivastava et al. [48] proposed an approach for learning sequence information in an unsupervised setting using an LSTM architecture. To mitigate the overfitting problem, Yu et al. [42] proposed a single-layer LSTM framework for learning long-term motion features. To learn spatiotemporal information, Zhang et al. [79] proposed multi-level recurrent residual networks to produce complementary representations for action recognition.…”
Section: B. Deep Learning For Action Recognition (mentioning)
confidence: 99%