From the past few decades, Human activity recognition (HAR) is one of the vital research areas in computer vision in which much research is ongoing. The researcher's focus is shifting towards this area due to its vast range of real-life applications to assist in daily living. Therefore, it is necessary to validate its performance on standard benchmark datasets and state-of-the-art systems before applying it in real-life applications. The primary objective of this Systematic Literature Review (SLR) is to collect existing research on video-based human activity recognition, summarize, and analyze the state-of-the-art deep learning architectures regarding various methodologies, challenges, and issues. The top five scientific databases (such as ACM, IEEE, ScienceDirect, SpringerLink, and Taylor & Francis) are accessed to accompany this systematic study by summarizing 70 different research articles on human activity recognition after critical review. Human activity recognition in videos is a challenging problem due to its diverse and complex nature. For accurate video classification, extraction of both spatial and temporal features from video sequences is essential. Therefore, this SLR focuses on reviewing the recent advancements in stratified self-deriving feature-based deep learning architectures. Furthermore, it explores various deep learning techniques available for HAR, challenges researchers to face to build a robust model, and state-of-theart datasets used for evaluation. This SLR intends to provide a baseline for video-based human activity recognition research while emphasizing several challenges regarding human activity recognition accuracy in video sequences using deep neural architectures.