Modern smartphones and smartwatches are equipped with inertial sensors (accelerometer, gyroscope, and magnetometer) that can be used for Human Activity Recognition (HAR) to infer tasks such as daily activities, transportation modes, and gestures. HAR requires collecting raw inertial sensor values and training a machine learning model on the collected data. The challenge in this approach is that the models are trained for specific devices and device configurations, whereas in reality the set of devices carried by a person may vary over time. Ideally, one would like activity inferencing to be robust to this variation and to provide accurate predictions by making opportunistic use of information from available devices. Moreover, the devices may be located at different parts of the body (e.g., pocket, left or right wrist), may have different sets of sensors (e.g., a smartwatch may lack a gyroscope that a smartphone has), and may differ in sampling frequencies. In this paper, we provide a solution that makes use of the information from available devices while being robust to their variations. Instead of training an end-to-end model for every permutation of device combinations and configurations, we propose a scalable deep-learning-based solution in which each device learns its own sensor fusion model that maps raw sensor values to a shared low-dimensional latent space, which we call 'SenseHAR', a virtual activity sensor. The virtual sensor has the same format and similar behavior regardless of the subset of devices, sensor availability, sampling rate, or device location. This allows machine learning engineers to develop their application-specific models (e.g., from gesture recognition to activities of daily life) in a hardware-agnostic manner based on this virtual activity sensor. Our evaluations show that an application model trained on SenseHAR achieves state-of-the-art accuracies of 95.32%, 74.22%, and 93.13% on PAMAP2, Opportunity (gestures), and our collected dataset, respectively.
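
To make the architecture concrete, the following is a minimal PyTorch sketch of the SenseHAR idea, not the authors' exact implementation: each device trains its own fusion encoder that maps its raw inertial readings into a shared low-dimensional latent space (the virtual activity sensor), and a single hardware-agnostic application model is trained on that latent representation. The layer sizes, window lengths, sampling rates, and 16-dimensional latent space below are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the SenseHAR structure (illustrative, not the paper's exact model):
# per-device encoders map raw sensor windows to a shared latent space; one
# application model consumes that latent representation regardless of device.
import torch
import torch.nn as nn


class DeviceEncoder(nn.Module):
    """Per-device sensor fusion model: raw sensor window -> shared latent vector."""

    def __init__(self, num_channels: int, latent_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(num_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # tolerates different window lengths / sampling rates
            nn.Flatten(),
            nn.Linear(32, latent_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_channels, window_len) raw accelerometer/gyro/magnetometer samples
        return self.net(x)


class ApplicationModel(nn.Module):
    """Hardware-agnostic application head trained only on the shared latent space."""

    def __init__(self, latent_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(latent_dim, num_classes)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.classifier(z)


if __name__ == "__main__":
    latent_dim = 16
    # Devices with different channel counts (e.g. a watch without a gyroscope)
    # and different sampling rates each get their own encoder.
    phone_encoder = DeviceEncoder(num_channels=9, latent_dim=latent_dim)
    watch_encoder = DeviceEncoder(num_channels=6, latent_dim=latent_dim)
    app_model = ApplicationModel(latent_dim, num_classes=12)

    phone_window = torch.randn(8, 9, 100)   # assumed: 2 s windows at 50 Hz
    watch_window = torch.randn(8, 6, 50)    # assumed: 2 s windows at 25 Hz

    # Both devices emit latent vectors with the same format, so the same
    # application model can consume either one (or an opportunistic fusion of both).
    z_phone = phone_encoder(phone_window)
    z_watch = watch_encoder(watch_window)
    logits = app_model((z_phone + z_watch) / 2)
    print(logits.shape)  # torch.Size([8, 12])
```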