An enormous volume of data is created around the world at every instant through social platforms such as Facebook, WhatsApp, and Twitter. This data includes negotiations between sellers and customers during online purchases, casual chats between friends using various emojis, online discussions between educators and students, and uploads of pictures and videos. People from all spheres of society express their views and sentiments through these worldwide networks. At the same time, several powerful machine learning and deep learning tools have emerged in recent years to analyse this bulk of data and to visualize the social, economic, political, and psychological scenarios of different communities, states, and nations. In this article, the authors adopt a deep learning-based multimodal approach to investigate and compare the efficiency of different deep learning architectures, namely Multi-Layer Perceptron (MLP), Recurrent Neural Network (RNN), Bi-directional RNN, Long Short-Term Memory (LSTM), and Bi-directional Long Short-Term Memory (Bi-LSTM), in carrying out the important task of sentiment analysis. The notable results, conclusions, and applications that emerged from the study are discussed systematically. Relevant references consulted during the literature survey are presented afterward. Finally, the authors acknowledge the help they received during the completion of this article.