In an era dominated by digital communication, the vast amounts of data generated by social media and financial markets present both opportunities and challenges for forecasting stock prices. This paper proposes an approach that combines social media sentiment analysis with stock market data to predict stock prices, addressing two central challenges in this domain. The first is the uneven distribution of data across sentiment categories: traditional models struggle to identify less common sentiments (the minority class) because more common sentiments (the majority class) dominate the training data. To tackle this, we introduce an Off-policy Proximal Policy Optimization (PPO) algorithm designed to handle class imbalance by adjusting the reward mechanism during training so that correct classification of minority-class instances is favored. The second challenge is integrating the temporal dynamics of stock prices with the sentiment analysis results. Our solution is a Transductive Long Short-Term Memory (TLSTM) model that combines sentiment analysis findings with historical stock data. The model captures temporal patterns and gives precedence to data points that are temporally closer to the prediction point, which improves prediction accuracy. Ablation studies confirm that both the Off-policy PPO and TLSTM components contribute to overall model performance. The proposed approach not only advances financial analytics by providing a more nuanced understanding of market dynamics but also offers actionable insights for investors and policymakers seeking to navigate the stock market with greater precision and confidence.
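To make the two mechanisms concrete, the following is a minimal illustrative sketch rather than the implementation used in this paper. It assumes one simple way to scale rewards by class frequency (the rarest class receives the largest reward magnitude) and one simple way to favor recent observations (exponentially decaying weights). The function names `imbalance_rewards`, `classification_reward`, and `temporal_weights`, as well as the `decay` parameter, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def imbalance_rewards(labels):
    """Reward magnitude per class (assumed scheme): the rarest class gets 1.0,
    more frequent classes get proportionally smaller magnitudes."""
    counts = Counter(labels)
    rarest = min(counts.values())
    return {cls: rarest / n for cls, n in counts.items()}


def classification_reward(true_label, predicted_label, magnitudes):
    """Positive reward for a correct prediction, negative for an error,
    scaled by the magnitude assigned to the true class."""
    scale = magnitudes[true_label]
    return scale if predicted_label == true_label else -scale


def temporal_weights(num_steps, decay=0.1):
    """Normalized weights for num_steps observations ordered oldest to newest.
    Weights decay exponentially with distance from the prediction point
    (an assumed weighting, not necessarily the one used by TLSTM)."""
    raw = [math.exp(-decay * (num_steps - 1 - t)) for t in range(num_steps)]
    total = sum(raw)
    return [w / total for w in raw]


# Toy data: 8 majority-class and 2 minority-class sentiment labels.
labels = ["neutral"] * 8 + ["negative"] * 2
magnitudes = imbalance_rewards(labels)
weights = temporal_weights(5)
```

In this toy setup, misclassifying a minority-class instance costs more than misclassifying a majority-class one, and the most recent observation carries the largest weight.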