In an organization, individuals form various formal and informal groups to interact with one another. Ubiquitous identification of such groups, and an understanding of their dynamics, is therefore important for monitoring the activities, behaviours and well-being of individuals. In this paper, we develop a lightweight yet reasonably accurate methodology, called MeetSense, that identifies interacting groups through collective sensing on users' smartphones. Group detection from sensor signals is not straightforward, because users in proximity do not always belong to the same group. We therefore use acoustic context extracted from audio signals to infer interaction patterns among the subjects in proximity. Taking cues from network science, we have developed an unsupervised, lightweight mechanism for user group detection that measures the cohesiveness of the detected groups in terms of modularity. By taking modularity into consideration, MeetSense can efficiently eliminate incorrect groups and adapt the mechanism to the roles that proximity and acoustic context play in a specific scenario. The proposed method has been implemented and tested in many real-life scenarios in an academic institute environment, and we observe that MeetSense identifies user groups with close to 90% accuracy even in a noisy environment.
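To illustrate how modularity can score the cohesiveness of a candidate grouping, the sketch below computes the standard Newman modularity Q = Σ_c [L_c/m − (d_c/2m)²] for a partition of a weighted, undirected graph. It is a minimal illustration, not the MeetSense implementation: the abstract does not specify how the user graph is built, so the function name `modularity`, the node labels, and the unit edge weights (standing in for some proximity or acoustic similarity score) are all assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); each undirected edge listed once,
           no self-loops.
    community_of: dict mapping every node to its community label.
    """
    total_weight = 0.0                 # m: sum of all edge weights
    degree = defaultdict(float)        # k_i: weighted degree of each node
    internal = defaultdict(float)      # L_c: edge weight inside community c

    for u, v, w in edges:
        total_weight += w
        degree[u] += w
        degree[v] += w
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += w

    if total_weight == 0:
        return 0.0                     # no edges: modularity is taken as 0

    community_degree = defaultdict(float)   # d_c: total degree of community c
    for node, k in degree.items():
        community_degree[community_of[node]] += k

    two_m = 2.0 * total_weight
    return sum(
        internal[c] / total_weight - (community_degree[c] / two_m) ** 2
        for c in community_degree
    )


# Hypothetical example: six users forming two tight triangles joined by a
# single link (e.g. two people who are close by but in different meetings).
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}   # two groups
merged = {node: 0 for node in split}                        # one big group

q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

In this toy graph the two-group partition scores higher than the single merged group, which is the kind of signal a modularity-based method could use to reject a grouping that lumps together users who are merely near one another.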