Demand for animal and dairy products is rising steadily in emerging economies. However, maintaining the health and welfare of a growing dairy cattle population is both critical and challenging, especially for dairy calves (up to 20% mortality in China). Animal behavior carries considerable information and is used to assess animal health and welfare. In recent years, machine vision-based methods have been applied worldwide to monitor animal behavior. Images or video recordings of animal behavior can be analyzed with computer programs to estimate welfare or health indicators. In this study, a new deep learning method (i.e., an integration of background subtraction and inter-frame difference) was developed to automatically recognize dairy calf scene-interactive behaviors (e.g., entering or leaving the resting area, and stationary and turning behaviors at the inlet and outlet of the resting area) using computer vision. Results show that the recognition success rates for the calf’s scene-interactive behaviors of pen entering, pen leaving, staying (standing or lying still), and turning were 94.38%, 92.86%, 96.85%, and 93.51%, respectively. The recognition success rates for feeding and drinking were 79.69% and 81.73%, respectively. This newly developed method provides a basis for developing evaluation tools to monitor calves’ health and welfare on dairy farms.
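To illustrate how background subtraction and inter-frame difference can complement each other, the following minimal Python sketch combines the two cues on toy grayscale frames. It is not the method developed in this study: the function names (`abs_diff_mask`, `classify_frame`, `make_frame`), the thresholds, the minimum area, and the three output labels are all hypothetical choices made for illustration, and frames are plain nested lists to keep the example dependency-free.

```python
from typing import List, Optional

Frame = List[List[int]]   # grayscale image as rows of 0-255 intensities
Mask = List[List[bool]]


def abs_diff_mask(a: Frame, b: Frame, threshold: int) -> Mask:
    """Mark pixels whose absolute intensity difference exceeds the threshold."""
    return [
        [abs(pa - pb) > threshold for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def classify_frame(
    background: Frame,
    previous: Frame,
    current: Frame,
    bg_threshold: int = 30,     # assumed value, not from the study
    frame_threshold: int = 15,  # assumed value, not from the study
    min_area: int = 4,          # assumed minimum object size in pixels
) -> str:
    """Return 'empty', 'stationary', or 'moving' for the current frame."""
    # Background subtraction: pixels that differ from the empty scene.
    foreground = abs_diff_mask(current, background, bg_threshold)
    # Inter-frame difference: pixels that changed since the previous frame.
    changed = abs_diff_mask(current, previous, frame_threshold)

    foreground_area = sum(sum(row) for row in foreground)
    if foreground_area < min_area:
        return "empty"

    # Count changed pixels that also belong to the foreground object.
    moving_area = sum(
        f and c
        for f_row, c_row in zip(foreground, changed)
        for f, c in zip(f_row, c_row)
    )
    return "moving" if moving_area >= min_area else "stationary"


def make_frame(size: int, block_col: Optional[int] = None) -> Frame:
    """Build a toy frame: dark floor with an optional bright 2x2 'calf' block."""
    frame = [[10] * size for _ in range(size)]
    if block_col is not None:
        for r in (1, 2):
            for c in (block_col, block_col + 1):
                frame[r][c] = 200
    return frame


if __name__ == "__main__":
    background = make_frame(6)
    calf_left = make_frame(6, block_col=1)
    calf_right = make_frame(6, block_col=3)
    print(classify_frame(background, calf_left, calf_left))
    print(classify_frame(background, calf_left, calf_right))
    print(classify_frame(background, calf_left, background))
```

In this sketch, background subtraction answers whether an object is present in the region at all, while the inter-frame difference answers whether that object changed between consecutive frames. A real pipeline would typically operate on camera frames held in array form and would need a background model that tolerates lighting changes.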