Detecting depression on social media has received significant attention. A depression detection model can help screen for depressed individuals who may need proper treatment. While prior work has mainly focused on developing depression detection models from social media posts, including text and images, little attention has been paid to how videos on social media can be used to detect depression. To this end, we propose a depression detection model that utilizes both audio and visual features extracted from vlogs (video logs) on YouTube. We first collect vlogs from YouTube and annotate each as a depression or non-depression vlog. We then analyze the statistical differences between depression and non-depression vlogs. Based on the lessons learned, we build a depression detection model that learns both audio and visual features, achieving high accuracy. We believe our model can help detect depressed individuals on social media at an early stage so that those who may need appropriate treatment can get help.