This paper aims to build a hybrid Word Sense Disambiguation (WSD) technique focused specifically on text associated with visual content. Natural language processing helps establish context among the data elements that together convey a particular meaning. Analyzing the transcripts of visuals as they are uploaded in real time saves the time and resources required to sort content by genre or emotion. The training data lays the foundation for rating the polarity of elements, and the dictionary built on it expands as and when new content is supplied to the system. Third-party intelligence is combined with the dictionary so that it continues to grow even when consumer usage is idle. These components are tightly integrated to maximize utility and output.