Shoplifting has become a serious concern because of a steep rise in such cases. Shoplifters take items from stores unnoticed, either by putting them in bags or by hiding them inside their clothes. CCTV cameras are generally installed at such sites, but evidence suggests that they are not very effective unless the video feeds are monitored constantly. We therefore aim to build an automated, intelligent surveillance system that catches shoplifters by identifying their stealing actions. This article proposes a deep neural network-based solution for identifying shoplifting activities. The proposed model is a dual-stream, fusion-based network that combines appearance and motion dynamics in the temporal domain to identify shoplifting actions. The deep Inception V3 model extracts activity-specific body posture features from video streams through two deep neural network pipelines, one for appearance information and one for motion information. A recurrent neural network, namely a Long Short-Term Memory (LSTM) network, then builds temporal relations between the features extracted from consecutive frames in order to distinguish human stealing actions accurately. In addition, this article introduces a shoplifting dataset synthesized in our lab, which contains normal human actions and object-stealing actions. Experimental results are encouraging: the proposed method achieves an accuracy of up to 91.48%, outperforming other existing methods.
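The data flow described above (two per-frame feature streams, fusion, then an LSTM over the frame sequence) can be illustrated with a small sketch. This is not the authors' implementation. Random vectors stand in for the per-frame Inception V3 features of the appearance and motion streams, the streams are fused by simple per-frame concatenation (an assumption, since the fusion scheme is not specified here), and the LSTM and output weights are untrained. The names `TinyLSTM`, `fuse_streams`, and `classify_clip`, as well as all dimensions, are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTM:
    """Single-layer LSTM with untrained random weights (illustration only)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x_t ; h_{t-1}].
        self.W = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}

    def step(self, x, h, c):
        z = x + h  # list concatenation: [x_t ; h_{t-1}]
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        # Standard LSTM update: c_t = f * c_{t-1} + i * g, h_t = o * tanh(c_t).
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Process a sequence of per-frame vectors and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


def fuse_streams(appearance_feats, motion_feats):
    """Fuse the two streams frame by frame via concatenation (assumed fusion scheme)."""
    if len(appearance_feats) != len(motion_feats):
        raise ValueError("both streams must cover the same number of frames")
    return [a + m for a, m in zip(appearance_feats, motion_feats)]


def classify_clip(appearance_feats, motion_feats, lstm, w_out, b_out=0.0):
    """Return a score in (0, 1); with trained weights it would indicate shoplifting."""
    fused = fuse_streams(appearance_feats, motion_feats)
    h_final = lstm.run(fused)
    return sigmoid(sum(w * x for w, x in zip(w_out, h_final)) + b_out)


# Toy dimensions; real Inception V3 pooled features are 2048-dimensional.
rng = random.Random(0)
FEAT_DIM, HIDDEN, FRAMES = 8, 4, 16
appearance = [[rng.random() for _ in range(FEAT_DIM)] for _ in range(FRAMES)]
motion = [[rng.random() for _ in range(FEAT_DIM)] for _ in range(FRAMES)]
lstm = TinyLSTM(2 * FEAT_DIM, HIDDEN, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)]
score = classify_clip(appearance, motion, lstm, w_out)
print(f"untrained clip score: {score:.3f}")
```

Because the weights are random, the printed score carries no meaning; the sketch only shows how per-frame features from the two streams could be combined and summarized over time before a binary decision. In practice the feature extractor, LSTM, and classifier would be trained on labeled clips.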