Many human activities and states are related to the actions of the facial muscles: from the expression of emotions, stress, and non-verbal communication, through health-related actions such as coughing and sneezing, to eating and drinking. In this work, we describe in detail the design and evaluation of a wearable system for facial muscle activity monitoring based on a re-configurable differential array of stethoscope-microphones. In our system, six stethoscopes are placed at locations that could easily be integrated into the frame of smart glasses. The paper describes the hardware design as well as the selection and adaptation of appropriate signal processing and machine learning methods. For the evaluation, we asked eight participants to imitate a set of facial actions: expressions such as happiness, anger, surprise, sadness, upset, and disgust, and gestures such as kissing, winking, sticking the tongue out, and taking a pill. The evaluation was performed on a complete data set of 2640 events, split into 66% for training and 33% for testing. Although we encountered high variability in the volunteers' expressions, our approach achieves a recall of 55%, a precision of 56%, and an f1-score of 54% in the user-independent scenario (9% chance level). In the user-dependent scenario, the f1-score ranges from 60% for the worst result to 89% for the best. A recall ≥ 60% is obtained for happiness, anger, kissing, sticking the tongue out, and neutral (the null class).
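As a reading aid for the reported figures, the following minimal sketch shows how per-class precision, recall, and f1-score can be computed and averaged, and how a uniform chance level follows from the number of classes. It is an illustration under stated assumptions, not the evaluation code used in this work: the function names are hypothetical, macro-averaging is assumed because the averaging scheme is not stated here, and the count of 11 classes is inferred from the ten listed actions plus the neutral class, which is consistent with the reported 9% chance level.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of (precision, recall, f1) over classes (assumed scheme)."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


def chance_level(n_classes):
    """Expected accuracy of uniform random guessing over n_classes."""
    return 1.0 / n_classes


if __name__ == "__main__":
    # Assumption: 10 listed actions + neutral = 11 classes.
    print(f"chance level: {chance_level(11):.1%}")
    # Toy labels, purely illustrative (not data from the study).
    truth = ["happy", "happy", "kiss", "neutral"]
    preds = ["happy", "kiss", "kiss", "neutral"]
    print(macro_average(per_class_metrics(truth, preds)))
```

Note that with macro-averaging the averaged f1-score need not lie between the averaged precision and recall, since it is the mean of per-class f1-scores rather than the harmonic mean of the averaged values.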