Non-intrusive monitoring of fine-grained activities of daily living (ADL) enables various smart healthcare applications. For example, ADL pattern analysis for at-risk older adults can be used to assess their loss of safety or independence. Prior work on ADL recognition has focused on coarse-grained recognition at the context level (e.g., cooking, cleaning, sleeping) and/or on activity duration segmentation (hourly or by the minute). It also typically relies on a high-density deployment of a variety of sensors. In this work, we target finer-grained ADL recognition at the action level to provide more detailed ADL information, which is crucial for assessing patients' activity patterns and potential changes in behavior. To achieve this fine-grained ADL monitoring, we present a heterogeneous multi-modal cyber-physical system that uses (1) distributed vibration sensors to capture action-induced structural vibrations and their spatial characteristics for information aggregation, and (2) a single-point electrical sensor to capture appliance usage with high temporal resolution. To evaluate our system, we conducted real-world experiments with multiple human subjects to demonstrate that these two sensing modalities provide complementary information. Our system achieved an average accuracy of 90% in recognizing activities, which is up to 2.6× higher than baseline systems that use each state-of-the-art sensing modality separately.