Human activity recognition from video is an important problem because of its potential applications in remote surveillance, content-based video retrieval, and humanoid robots. Most visual activity recognition research in the last decade has focused on recognizing basic human actions in well-constrained laboratory videos and depends on fully annotated video datasets for model training. Activity recognition from unconstrained videos, by contrast, remains very challenging because of large variations in object appearance and pose, occlusion, and inter- and intra-class variation. Preparing a large-scale realistic video activity dataset with detailed annotations of the humans, objects, and their mutual interactions in each frame is extremely laborious. Although it is intuitive to model contextual relationships from a fully annotated dataset, it is unknown how reliably multilevel contextual features can be extracted when annotations are absent. To mitigate these challenges, we propose a weakly supervised approach for complex human activity recognition from realistic videos. The proposed approach requires only an activity label for each video to train the model. We also introduce novel multilevel contextual features and a procedure for estimating context from the un-annotated dataset. A restricted Boltzmann machine is used to systematically integrate the multilevel contextual features. We evaluate the proposed approach on benchmark realistic surveillance video datasets for human-human and human-object interaction activity recognition. The experimental results show improved accuracy on the benchmark datasets without using fully annotated datasets.

INDEX TERMS Activity recognition, multilevel context model, multilevel contextual features, weakly supervised.

MUHAMMAD AJMAL is currently pursuing the Ph.D. degree in computer science under the Faculty Development Program.
He is currently a Lecturer with the Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Pakistan. He has published numerous articles in well-reputed national and international journals and conferences over the past decade. His research interests include computer vision, machine learning, vision-based human activity recognition and summarization, the theory and applications of formal methods, and the verification and simulation of systems.