Large volumes of audio and video recordings are uploaded every day to public social networks such as Facebook and YouTube, generating enormous amounts of data on the Internet. Mining this social media data to obtain relevant information is expected to remain a difficult task for decades to come. To tackle this challenge, a novel text-audio-video consistency-driven multimodal sentiment analysis method is proposed in this paper. The proposed system examines the correlation among text, audio, and video, followed by multimodal sentiment analysis. A diverse set of features is extracted, and the extracted features are then selected optimally by a new hybrid grass bee optimization algorithm (HGBEE), yielding an optimal feature set that improves precision and reduces computation time. Next, an utterance-level multimodal fusion of the text, audio, and video features is developed. Finally, the proposed multikernel extreme learning machine (MKELM) classifier is employed for sentiment classification. The proposed system is evaluated on three multimodal datasets in terms of precision, accuracy, recall, and F-measure. The simulation results show that the proposed system achieves a maximum classification accuracy of 98.3% with minimum computation time. Our proposed approach is implemented on the MATLAB platform.
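To make the pipeline concrete, the following MATLAB sketch illustrates the two final stages described above, utterance-level feature fusion followed by a multikernel ELM classifier, on toy data. It is only a minimal illustration under stated assumptions: the kernel mixture (an equally weighted RBF plus linear kernel), the regularization constant C, the kernel weights, and all variable names are hypothetical and are not taken from the paper; the HGBEE feature selection step is omitted.

% Minimal sketch: utterance-level fusion + multikernel ELM (MKELM).
% Kernel choices, weights, and names below are illustrative assumptions.
rng(1);
nTrain = 80; nTest = 20;
textF  = rand(nTrain + nTest, 10);   % toy textual features (one row per utterance)
audioF = rand(nTrain + nTest, 8);    % toy acoustic features
videoF = rand(nTrain + nTest, 12);   % toy visual features
labels = double(rand(nTrain + nTest, 1) > 0.5) * 2 - 1;   % labels in {-1, +1}

% Feature-level fusion: concatenate the per-utterance modality vectors.
X   = [textF, audioF, videoF];
Xtr = X(1:nTrain, :);      ytr = labels(1:nTrain);
Xte = X(nTrain+1:end, :);  yte = labels(nTrain+1:end);

% Multikernel: convex combination of an RBF kernel and a linear kernel.
gamma = 0.1; w = [0.5, 0.5];                          % assumed kernel weights
sqd = @(A,B) sum(A.^2,2) + sum(B.^2,2)' - 2*(A*B');   % pairwise squared distances
mk  = @(A,B) w(1)*exp(-gamma*sqd(A,B)) + w(2)*(A*B');

% Kernel ELM solution: beta = (K + I/C) \ y.
C    = 10;                                            % regularization constant (assumed)
Ktr  = mk(Xtr, Xtr);
beta = (Ktr + eye(nTrain)/C) \ ytr;

% Predict: sign of the kernel expansion evaluated on test utterances.
pred = sign(mk(Xte, Xtr) * beta);
fprintf('Toy accuracy: %.2f\n', mean(pred == yte));

In this sketch the fusion is a simple feature-level concatenation; the closed-form kernel ELM solve avoids iterative training, which is consistent with the low computation time the abstract emphasizes, but the exact kernel design used in the paper may differ.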
KEYWORDS: accuracy and computation time, feature level fusion, grass bee, kernel extreme learning classifier, multimodal sentiment analysis

1 INTRODUCTION

Recently, public networks have made it easy to gather information by posting and sharing various types of multimodal content on social media, without requiring much knowledge of the network topology and architecture [1]. Individuals now widely use Internet-based platforms such as YouTube, Facebook, blogs, and microblogs to express their opinions [2]. Here, sentiment analysis acts as a key for unlocking this huge volume of data, since