With the sheer volume of online video available, finding relevant content is difficult. Recommendation systems were created to address this by matching relevant content to interested users. Most recommendation systems rely on algorithms driven by implicit feedback. These methods are effective only when a video already has such feedback; without it, the algorithms mostly fail to surface relevant content. This is known as the cold-start problem, and it affects newly uploaded videos, which start without any interaction data or user comments. Another problem users face every day is finding the content they want, because search depends on videos having labels or many views: the search engine's mechanism uses the tags and keywords attached to a video rather than the actual content in it. In this paper, a content-based recommendation system is proposed. The system detects the objects and sounds inside a video, and it also lets users search with an uploaded scene or filter scenes by an input keyword. Experiments were conducted under various scenarios to demonstrate the effectiveness of the proposed system for content-based video recommendation.
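As a rough illustration of why content labels sidestep the cold-start problem, the following minimal sketch ranks videos by the overlap of their detected labels and filters them by keyword. It is not the paper's implementation: the label sets, the video identifiers, the function names `jaccard`, `recommend`, and `filter_by_keyword`, and the choice of Jaccard similarity are all assumptions made for the example, and the object and sound detectors are assumed to have already produced the labels.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two label sets, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(query_id: str, catalog: Dict[str, Set[str]], top_k: int = 3) -> List[str]:
    """Rank other videos by label overlap with the query video.

    No views, ratings, or comments are used, so a newly uploaded
    video can be matched as soon as its labels are available.
    """
    query = catalog[query_id]
    scored = [
        (vid, jaccard(query, labels))
        for vid, labels in catalog.items()
        if vid != query_id
    ]
    # Drop videos with no shared labels, then sort by score (ties by id).
    scored = [(vid, s) for vid, s in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [vid for vid, _ in scored[:top_k]]


def filter_by_keyword(keyword: str, catalog: Dict[str, Set[str]]) -> List[str]:
    """Return ids of videos whose detected labels contain the keyword."""
    kw = keyword.strip().lower()
    return sorted(vid for vid, labels in catalog.items() if kw in labels)


# Hypothetical detector output: objects and sounds found in each video.
catalog = {
    "new_upload": {"dog", "ball", "barking"},
    "park_clip": {"dog", "ball", "grass"},
    "concert": {"guitar", "singing", "crowd"},
    "pet_show": {"dog", "cat", "barking", "crowd"},
}

print(recommend("new_upload", catalog))
print(filter_by_keyword("crowd", catalog))
```

Here `new_upload` has no viewing history at all, yet it can still be matched to `park_clip` and `pet_show` because the comparison uses only what was detected inside the videos.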