Depression is a serious mental health condition that affects a person's emotional state, thought processes, and ability to carry out everyday tasks. It is characterized by persistent sadness, loss of interest in previously enjoyed activities, changes in appetite, sleep disturbances, low energy, and difficulty concentrating. Its impact extends beyond the individual to society at large through reduced productivity and higher healthcare costs. On social media, users often express their thoughts and emotions in posts, which can provide useful data for identifying patterns of depression. This research aims to detect depression early by analyzing user-generated social media content with machine learning techniques. We build machine learning models on a benchmark depression dataset containing 20,000 labeled tweets from user profiles identified as depressed or non-depressed. We introduce a novel BERT-RF feature engineering method that extracts contextualized embeddings and probabilistic features from textual input. The Bidirectional Encoder Representations from Transformers (BERT) model, which is based on the Transformer architecture, extracts the contextualized embedding features. These features are then fed into a random forest model to generate class probability features, which improve the identification of depression from social media text. To classify tweets using the features produced by the BERT-RF feature engineering step, we apply five popular classifiers: Random Forest (RF), Multilayer Perceptron (MLP), K-Neighbors Classifier (KNC), Logistic Regression (LR), and Long Short-Term Memory (LSTM). Evaluation experiments show that, with BERT-RF feature engineering, the Logistic Regression model outperforms state-of-the-art methods with an accuracy of 99%.
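The two-stage feature engineering idea can be illustrated with a minimal sketch. It is not the authors' implementation. To stay runnable without downloading a model, it uses synthetic vectors as a stand-in for BERT embeddings. The function names (`stand_in_embeddings`, `add_rf_probabilities`, `run_sketch`), the concatenation of embeddings with class probabilities, the use of out-of-fold probabilities for the training rows, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split


def stand_in_embeddings(n_samples=400, dim=32, seed=0):
    """Synthetic stand-in for BERT sentence embeddings (assumption, not real BERT)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    # Shift class-1 rows by +1 in every dimension so the classes are separable.
    X = rng.normal(size=(n_samples, dim)) + y[:, None]
    return X, y


def add_rf_probabilities(X_train, y_train, X_test, seed=0):
    """Append random forest class probabilities to the embedding features."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    # Out-of-fold probabilities for training rows, so the downstream
    # classifier never sees probabilities fitted on the same rows (assumption).
    train_proba = cross_val_predict(
        rf, X_train, y_train, cv=5, method="predict_proba"
    )
    rf.fit(X_train, y_train)
    test_proba = rf.predict_proba(X_test)
    return np.hstack([X_train, train_proba]), np.hstack([X_test, test_proba])


def run_sketch(seed=0):
    """Embeddings -> RF probability features -> logistic regression."""
    X, y = stand_in_embeddings(seed=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    X_tr_aug, X_te_aug = add_rf_probabilities(X_tr, y_tr, X_te, seed=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr_aug, y_tr)
    return X_tr_aug.shape, X_te_aug.shape, clf.score(X_te_aug, y_te)


if __name__ == "__main__":
    train_shape, test_shape, accuracy = run_sketch()
    print(train_shape, test_shape, round(accuracy, 3))
```

In a real setting, `stand_in_embeddings` would be replaced by sentence-level embeddings from a BERT encoder applied to the tweets. The accuracy on synthetic data says nothing about the results reported here.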
We validated the results through k-fold cross-validation and statistical t-tests, achieving 99% k-fold accuracy for the proposed approach. This research contributes to computational linguistics and mental health analytics by providing a robust approach to the early detection of user depression from social media content.
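The validation procedure can be sketched in plain Python, under stated assumptions. The text does not say which t-test variant was used, so the paired t-test over per-fold scores shown here is an assumption. The helper names `k_fold_indices` and `paired_t_statistic` are hypothetical, and the scores in the usage example are illustrative placeholders, not results from this study.

```python
import math
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at position i.
    return [indices[i::k] for i in range(k)]


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for per-fold scores of two models (df = n - 1)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff / (sd_diff / math.sqrt(n))


if __name__ == "__main__":
    folds = k_fold_indices(n_samples=10, k=3)
    print([len(fold) for fold in folds])

    # Illustrative placeholder scores only, not measurements.
    model_a = [3.0, 5.0, 7.0]
    model_b = [1.0, 2.0, 3.0]
    print(paired_t_statistic(model_a, model_b))
```

Each fold serves once as the held-out set while the remaining folds are used for training. The resulting per-fold scores of two models are then compared with the t statistic, which is checked against a t distribution with n - 1 degrees of freedom.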