Human activity recognition has been an active research topic in computer vision for decades, owing to its numerous applications in domains such as medicine, surveillance, entertainment, and human-computer interaction. We propose an intuitive, effective, quickly trainable, and customizable system for recognizing human activities, designed with an automated machine learning method based on Neural Architecture Search. Each channel of a 3D video (RGB data, depth data, skeleton, and context objects) is passed independently through a 2D convolutional neural network. The outputs of all networks are then combined into a single array of class scores using fusion mechanisms that are computationally inexpensive yet preserve the meaningful information in a video. The proposed system is tested on three public datasets and on a new dataset, PRECIS HAR, that was created in our laboratory. In all our experiments, the system achieves high accuracy: 98.43% on MSRDailyActivity3D, 91.41% on UTD-MHAD, 90.95% on NTU RGB+D, and 94.38% on our dataset.

INDEX TERMS Automated machine learning, context, convolutional neural networks, data fusion, human activity recognition, RGB-D data, skeleton.
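
As an illustration of the score-level fusion described above, the following minimal sketch combines per-stream class scores into one summarizing array. It is not the implementation used in this work: the function name `fuse_scores`, the two fusion modes (weighted sum and element-wise maximum), the uniform default weights, and the example score values are all assumptions chosen for clarity.

```python
import numpy as np


def fuse_scores(stream_scores, weights=None, mode="sum"):
    """Fuse per-stream class scores into one array of class scores.

    stream_scores: list of 1-D arrays, one per stream (e.g. RGB, depth,
                   skeleton, context), each of length num_classes.
    weights:       optional per-stream weights for mode="sum"
                   (hypothetical; uniform if omitted).
    mode:          "sum" for a normalized weighted sum,
                   "max" for an element-wise maximum.
    """
    # Shape: (num_streams, num_classes)
    scores = np.stack([np.asarray(s, dtype=float) for s in stream_scores])

    if mode == "sum":
        if weights is None:
            weights = np.ones(scores.shape[0])
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()  # normalize so weights sum to 1
        return weights @ scores            # shape: (num_classes,)
    if mode == "max":
        return scores.max(axis=0)
    raise ValueError(f"unknown fusion mode: {mode!r}")


# Illustrative (made-up) softmax outputs of four streams for three classes.
example_streams = [
    [0.6, 0.3, 0.1],  # RGB
    [0.2, 0.5, 0.3],  # depth
    [0.5, 0.4, 0.1],  # skeleton
    [0.7, 0.2, 0.1],  # context objects
]

fused = fuse_scores(example_streams)
predicted_class = int(np.argmax(fused))
```

Both modes cost only a few arithmetic operations per class, which is consistent with the requirement that fusion not be computationally intensive; the predicted activity is simply the index of the largest fused score.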