As demand for artificial intelligence robots and cognitive agents grows, it becomes essential for these agents to comprehend previous encounters and answer questions based on their past experiences. In essence, they need to maintain episodic memories. This paper presents a novel approach that addresses this need by using the real-life experiences of robots to enrich their knowledge base. To this end, we employ diverse artificial intelligence techniques, including computer vision, multimodal cross embeddings, speech processing, and generative AI. These methods are used to build a knowledge base that serves as the agent's memory, enabling it to retain memories in a manner akin to a human. To ensure comprehensive memory retention, the agent is exposed to diverse scenarios such as interacting with individuals, observing conversations, and visiting various locations. To maintain a robust visual and linguistic knowledge base covering these experiences, we employ techniques such as scene graphs, along with the aforementioned AI methods. Existing approaches to language and vision understanding, developed for tasks such as video question answering, dialogue understanding, or world understanding, often overlook the temporal order in which the agent observes events, or are restricted to the set of characters or the world on which they were trained. They struggle to retrieve memories and generate meaningful answers that depend on this chronological context, or in cases where several past experiences must be combined to explain a later event. In this study, we therefore build a robust knowledge base and a mechanism that allows an agent to remember and link events, much as people do. In conclusion, our work aims to make AI more human-like by helping agents remember and understand events better.
This could lead to smarter AI systems that adapt well to different situations in the real world.