Sign language is the most effective means of communication for deaf or hard-of-hearing people. Understanding sign language requires specialized training, so the hearing people around them often cannot communicate with them effectively. The main objective of this study is to develop a streamlined deep learning model for sign language recognition covering the 30 most prevalent words in everyday life. The dataset comprises custom-processed video sequences of 30 ASL (American Sign Language) words, recorded from 5 subjects with 50 sample videos per class. A CNN is applied to the video frames to extract spatial features. Using the features extracted by the CNN, an LSTM model then predicts the action being performed in the video. We present and evaluate results on two separate datasets: the Pose dataset and the Raw video dataset. Models for both datasets were trained with the Long-term Recurrent Convolutional Network (LRCN) approach. A test accuracy of 92.66% was reached on the raw dataset and 93.66% on the pose dataset.
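To make the CNN-then-LSTM data flow concrete, the following is a minimal, untrained sketch of an LRCN-style forward pass in plain Python. It is illustrative only and is not the architecture used in this study: the frame size, sequence length, filter count, hidden size, and all function names are assumptions, the weights are random, and bias terms are omitted for brevity. Only the number of output classes (30) comes from the text.

```python
import math
import random

NUM_CLASSES = 30   # from the abstract: 30 ASL words
SEQ_LEN = 8        # assumption: frames sampled per video
FRAME_SIZE = 6     # assumption: toy frame side length (real frames are far larger)
NUM_FILTERS = 4    # assumption: number of 3x3 convolution filters
HIDDEN = 5         # assumption: LSTM hidden size

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def conv_features(frame, kernels):
    """Per-frame spatial features: 3x3 valid convolution, ReLU, global average pooling."""
    n = len(frame) - 2
    feats = []
    for k in kernels:
        total = 0.0
        for i in range(n):
            for j in range(n):
                s = sum(frame[i + a][j + b] * k[a][b]
                        for a in range(3) for b in range(3))
                total += max(s, 0.0)  # ReLU
        feats.append(total / (n * n))
    return feats

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def lstm_step(x, h, c, W):
    """One LSTM step; W maps each gate name to a weight matrix over [x, h]."""
    z = x + h  # list concatenation of input features and previous hidden state
    i = [sigmoid(a) for a in matvec(W["i"], z)]    # input gate
    f = [sigmoid(a) for a in matvec(W["f"], z)]    # forget gate
    o = [sigmoid(a) for a in matvec(W["o"], z)]    # output gate
    g = [math.tanh(a) for a in matvec(W["g"], z)]  # candidate cell state
    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def lrcn_forward(video, params):
    """CNN features per frame, LSTM over the frame sequence, softmax over classes."""
    h = [0.0] * HIDDEN
    c = [0.0] * HIDDEN
    for frame in video:
        x = conv_features(frame, params["kernels"])
        h, c = lstm_step(x, h, c, params["lstm"])
    return softmax(matvec(params["out"], h))

params = {
    "kernels": [rand_matrix(3, 3) for _ in range(NUM_FILTERS)],
    "lstm": {gate: rand_matrix(HIDDEN, NUM_FILTERS + HIDDEN) for gate in "ifog"},
    "out": rand_matrix(NUM_CLASSES, HIDDEN),
}

# A synthetic grayscale "video": SEQ_LEN frames of FRAME_SIZE x FRAME_SIZE pixels.
video = [[[rng.random() for _ in range(FRAME_SIZE)] for _ in range(FRAME_SIZE)]
         for _ in range(SEQ_LEN)]

probs = lrcn_forward(video, params)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

Because the weights are random, `predicted_class` is meaningless here; the point is the shape of the computation, in which each frame is reduced to a feature vector before the recurrent layer sees it, and only the final hidden state is classified. In practice the same structure would be built with a deep learning framework and trained end to end.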