Large wildfires (>125 hectares) in the United States account for over 95% of the area burned each year. Predicting large wildfires is imperative; however, current wildfire predictive models are region-specific and computationally intensive. Using a scalable model based on readily available environmental and atmospheric data, this research aims to accurately predict whether large wildfires will develop across the United States. The data used in this study comprise 2109 wildfires over 20 years, representing 14 million hectares burned. Remote sensing environmental data (Normalized Difference Vegetation Index, NDVI; Enhanced Vegetation Index, EVI; Leaf Area Index, LAI; Fraction of Photosynthetically Active Radiation, FPAR; Land Surface Temperature during the Day, LST Day; and Land Surface Temperature during the Night, LST Night) consisting of 1.3 billion satellite observations were used. Atmospheric reanalysis data (u component of wind, v component of wind, relative humidity, temperature, and geopotential) at four pressure levels (300, 500, 700, and 850 hPa) were also included. Six machine learning classification models (Logistic Regression, Decision Tree, Random Forest, eXtreme Gradient Boosting, K-Nearest Neighbors, and Support Vector Machine) were trained and tested on the resulting dataset to determine their accuracy in predicting large wildfires. Model validation tests and a variable importance analysis were performed. The eXtreme Gradient Boosting (XGBoost) classification model performed best, predicting large wildfires with 90.44% accuracy, a true positive rate of 0.92, and a true negative rate of 0.88. Furthermore, in support of environmental justice, an analysis was performed to identify socioeconomically disadvantaged communities that are also vulnerable to wildfires. Wildfire safety organizations can use this model to predict large wildfires with high accuracy and to prioritize resource allocation and protective safeguards for impacted socioeconomically disadvantaged communities.
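The reported metrics (accuracy, true positive rate, true negative rate) all derive from the classifier's confusion matrix. The sketch below illustrates those definitions; the confusion-matrix counts are hypothetical round numbers chosen for illustration, not the study's actual test-set results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, TPR, and TNR from confusion-matrix counts.

    tp: large wildfires correctly predicted as large
    fp: non-large fires incorrectly predicted as large
    tn: non-large fires correctly predicted as non-large
    fn: large wildfires incorrectly predicted as non-large
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # overall fraction correct
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    tnr = tn / (tn + fp)  # true negative rate (specificity)
    return accuracy, tpr, tnr


# Hypothetical counts on a balanced 200-fire test set
acc, tpr, tnr = classification_metrics(tp=92, fp=12, tn=88, fn=8)
print(f"accuracy={acc:.4f} TPR={tpr:.2f} TNR={tnr:.2f}")
```

Reporting TPR and TNR alongside accuracy matters here because a model that simply predicted "no large wildfire" for every event could still score well on accuracy if large fires were rare in the sample.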