Web images constitute unstructured data, which often makes it difficult for users to locate specific images through text-based searches on the web. These difficulties stem from several factors, e.g., redundant image storage, irrelevant metadata tags, and incorrect associations. To overcome this issue, we propose a semantic model based on an ontology language that enables users to find images that exactly match their queries. The proposed technique employs a simple procedure in which image captions are generated by constructing an ontology for each image in the repository. To fit the existing ontology domains, ontology generation relies on information gathered from each image's visual and textual elements, including low-level features such as color and shape as well as the image name. The constructed ontology then establishes accurate relationships with existing ontology concepts through the "is a" and "is a part of" relations. The resulting text, with embedded ontology information, yields accurate results and enables straightforward retrieval through semantic keyword searches. Our framework relies on two main ontology domains, namely animals and vehicles. In this study, we used a dataset of MAT files comprising images, their content, and associated information to study the ontology of animals (e.g., wolves, foxes, and dogs) as well as the ontology of vehicles. A comparative evaluation of the proposed framework was performed under various conditions to obtain valuable insights.
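
To make the described procedure concrete, the sketch below shows how such a caption-and-retrieve pipeline could be expressed as RDF triples using Python's rdflib. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the ex: namespace, the property names (isAPartOf, color, shape, name), and the sample image are hypothetical placeholders. The "is a" relation is modeled with rdfs:subClassOf, and a semantic keyword search for "Animal" retrieves images typed with any subclass via a SPARQL property path.

    # Minimal sketch of ontology-based image captioning and semantic retrieval.
    # Assumptions: the ex: namespace, property names, and sample image URI are
    # illustrative, not the authors' actual schema.
    from rdflib import Graph, Literal, Namespace, RDF, RDFS, URIRef

    EX = Namespace("http://example.org/imageont#")
    g = Graph()
    g.bind("ex", EX)

    # "is a" relationships within the animal domain (rdfs:subClassOf).
    g.add((EX.Wolf, RDFS.subClassOf, EX.Animal))
    g.add((EX.Fox, RDFS.subClassOf, EX.Animal))
    g.add((EX.Dog, RDFS.subClassOf, EX.Animal))

    # "is a part of" relationship, modeled here as a plain object property.
    g.add((EX.Wheel, EX.isAPartOf, EX.Vehicle))

    # Caption an image: type it against an ontology concept and attach
    # low-level features (color, shape) and its name as literals.
    img = URIRef("http://example.org/images/img_001")
    g.add((img, RDF.type, EX.Wolf))
    g.add((img, EX.color, Literal("grey")))
    g.add((img, EX.shape, Literal("quadruped")))
    g.add((img, EX.name, Literal("grey wolf in snow")))

    # Semantic keyword search: a query for "Animal" also returns images
    # typed with any subclass (Wolf, Fox, Dog) via the subClassOf* path.
    results = g.query(
        """
        SELECT ?img WHERE {
            ?img a ?cls .
            ?cls rdfs:subClassOf* ex:Animal .
        }
        """,
        initNs={"ex": EX, "rdfs": RDFS},
    )
    for row in results:
        print(row.img)  # http://example.org/images/img_001

Because retrieval follows the subclass path rather than matching raw text, a query on the broader concept finds images captioned only with narrower concepts, which is the behavior the abstract attributes to the embedded ontology information.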