With the increasing spread of smartphones and wearable devices equipped with various sensors, it has become possible to recognize human activities, biometric information, and surrounding situations. Human activity recognition requires a model trained in advance on annotated sensor data, i.e., sensor data labeled with the ground-truth activity. A large and diverse set of annotated data is therefore required to improve and evaluate model performance. Because a user's situation is difficult to judge even after inspecting the acceleration data, the collected acceleration data must be annotated. In this paper, we propose a method that estimates the user and device situations from the user's response to a notification generated by a device, e.g., a smartphone. The situations are estimated from the user's response time to the notification and the device's acceleration values, and only estimates with high confidence are attached to the sensor data as annotations. By increasing the frequency of notifications, the user's responses to them can be used as a sensor. We assume that when a notification is handled immediately after it is generated, the acceleration values reflect the user and device situation. To make the estimates precise enough to serve as annotations, the system selects the input acceleration data based on the user's interaction with the notification. In an evaluation experiment with seven annotation classes, the method achieved average precisions of 0.769 in user-independent experiments and 0.963 in user-dependent experiments. We also tested the proposed method in a natural environment: of 45 responses to notifications, 25 received correct annotations, 19 received no annotation, and only one received an incorrect annotation.
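The core idea of attaching an annotation only when the estimate is confident enough can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the feature set (response time plus mean and standard deviation of acceleration magnitude), the nearest-centroid classifier with softmax confidence, the class labels, and the 0.8 threshold are all hypothetical stand-ins.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, Optional, Sequence, Tuple

Features = Tuple[float, float, float]


@dataclass
class NotificationResponse:
    """One user response to a notification (hypothetical record layout)."""
    response_time_s: float             # delay from notification to response, in seconds
    accel_magnitudes: Sequence[float]  # acceleration magnitudes (m/s^2) near the response


def extract_features(resp: NotificationResponse) -> Features:
    """Assumed features: response time, mean and std. dev. of acceleration magnitude."""
    return (resp.response_time_s,
            mean(resp.accel_magnitudes),
            pstdev(resp.accel_magnitudes))


def class_probabilities(x: Features, centroids: Dict[str, Features]) -> Dict[str, float]:
    """Hypothetical stand-in classifier: softmax over negative distances to class centroids."""
    scores = {label: -math.dist(x, c) for label, c in centroids.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def annotate(resp: NotificationResponse,
             centroids: Dict[str, Features],
             threshold: float = 0.8) -> Optional[str]:
    """Return a label only if its confidence reaches the threshold, else None (no annotation)."""
    probs = class_probabilities(extract_features(resp), centroids)
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else None


# Hypothetical class centroids; the paper's seven classes are not reproduced here.
CENTROIDS: Dict[str, Features] = {
    "walking_in_hand": (1.5, 11.0, 2.5),
    "on_desk": (4.0, 9.8, 0.05),
}

if __name__ == "__main__":
    still = NotificationResponse(4.0, [9.8, 9.8, 9.81, 9.79])
    print(annotate(still, CENTROIDS))
```

Returning `None` for low-confidence estimates mirrors the trade-off reported above: skipping uncertain responses lowers the number of annotations but keeps the ones that are given precise.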