This paper proposes a system for recognizing gestures and actions in smart homes. The proposed method combines MobileNetV2 feature extraction with a single shot detector (SSD) network. The system recognizes eleven types of gestures and actions: walking, sitting down, falling back, wearing shoes, waving hands, falling down, smoking, baby crawling, standing up, reading, and typing. The input frames are captured by the cameras of mobile devices and passed to the detector, and each detected object is marked on the frame with a bounding box. The results show that the system meets the requirements, with an accuracy of over 90%, which makes it suitable for real applications.
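As an illustration of the detection output described above, the following minimal sketch shows how raw SSD-style outputs (class indices, confidence scores, and normalized boxes) might be turned into labeled bounding boxes on a frame. It is not the paper's implementation: the label order, the 0.5 score threshold, the normalized `(y_min, x_min, y_max, x_max)` box layout, and the names `Detection` and `decode_detections` are all assumptions made for the example.

```python
from dataclasses import dataclass

# The eleven classes named in the abstract. The index order is an assumption.
GESTURE_LABELS = [
    "walking", "sitting down", "falling back", "wearing shoes",
    "waving hands", "falling down", "smoking", "baby crawling",
    "standing up", "reading", "typing",
]


@dataclass
class Detection:
    label: str
    score: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels


def decode_detections(class_ids, scores, boxes, frame_width, frame_height,
                      score_threshold=0.5):
    """Keep confident detections and convert their boxes to pixel coordinates.

    Assumes each box is (y_min, x_min, y_max, x_max), normalized to [0, 1].
    The threshold value is hypothetical, not taken from the paper.
    """
    detections = []
    for class_id, score, box in zip(class_ids, scores, boxes):
        if score < score_threshold:
            continue  # discard low-confidence detections
        y_min, x_min, y_max, x_max = box
        detections.append(Detection(
            label=GESTURE_LABELS[class_id],
            score=score,
            box=(round(x_min * frame_width), round(y_min * frame_height),
                 round(x_max * frame_width), round(y_max * frame_height)),
        ))
    return detections
```

For a 640x480 frame, calling `decode_detections([5, 0], [0.93, 0.30], [(0.1, 0.2, 0.5, 0.6), (0.0, 0.0, 1.0, 1.0)], 640, 480)` would keep only the first detection, labeled "falling down", and drop the second because its score is below the assumed threshold.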