Malignant cerebral edema occurs when brain swelling displaces and compresses vital midline structures within the first week after a large middle cerebral artery stroke. Early interventions such as hyperosmolar therapy or surgical decompression may reverse secondary injury but must be administered judiciously. To optimize treatment and reduce secondary damage, clinicians need strategies to assess the trajectory of edema frequently and quantitatively as new information becomes available. However, existing risk assessment tools are limited by the absence of structured records capturing the evolution of edema. They also typically estimate risk at a single time point early in the admission and therefore fail to account for changes in variables over the following hours or days. To address this, we developed and validated dynamic machine learning models that accurately predict, in real time, the severity of midline structure displacement (midline shift), an established indicator of malignant edema. Our models can update their estimates as often as every hour, using structured time-varying patient records, radiology report text, and human-curated neurological characteristics. This work produced two novel multi-class classification models, collectively named Hybrid Ensemble Learning Models for Edema Trajectory (HELMET), which predict the progression of midline shift over 8-hour (HELMET-8) and 24-hour (HELMET-24) windows. HELMET combines transformer-based large language models with supervised ensemble learning, demonstrating the value of merging human expertise and multimodal health records in developing clinical risk scores. Both models were trained on a retrospective cohort of 15,696 observations from 623 patients hospitalized with large middle cerebral artery ischemic stroke and were externally validated on 3,713 observations from 60 patients at a separate hospital system.
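The models treat each hourly observation as a prediction point whose label reflects how midline shift evolves over the following 8 or 24 hours. The sketch below shows one possible way to build such window labels; it is an illustration under stated assumptions, not the authors' pipeline. The severity thresholds (`SHIFT_THRESHOLDS_MM`), the rule of taking the worst class measured inside the window, and the helper names `shift_class` and `window_labels` are all hypothetical.

```python
# Hypothetical severity cutoffs in millimetres (NOT taken from the source):
# class 0: shift < 3 mm, class 1: 3 mm <= shift < 6 mm, class 2: shift >= 6 mm
SHIFT_THRESHOLDS_MM = (3.0, 6.0)


def shift_class(shift_mm: float) -> int:
    """Map one midline shift measurement (mm) to an ordinal severity class."""
    for cls, cutoff in enumerate(SHIFT_THRESHOLDS_MM):
        if shift_mm < cutoff:
            return cls
    return len(SHIFT_THRESHOLDS_MM)


def window_labels(hours, measurements, horizon_h):
    """Label each hourly observation time t with the worst shift class
    measured in the window (t, t + horizon_h].

    hours:        observation times, in hours since admission
    measurements: list of (time_h, shift_mm) imaging measurements
    horizon_h:    prediction window length, e.g. 8 or 24 (assumed semantics)

    Returns a dict {t: label}; the label is None when no measurement falls
    inside the window (a hypothetical choice for handling missing imaging).
    """
    labels = {}
    for t in hours:
        future = [shift_class(s) for (m, s) in measurements if t < m <= t + horizon_h]
        labels[t] = max(future) if future else None
    return labels
```

With measurements of 1.0 mm at hour 5, 4.5 mm at hour 20, and 7.0 mm at hour 28, this sketch gives an 8-hour label of class 1 at hour 12 and class 2 at hour 20, while the 24-hour label is already class 1 at hour 0. How the authors handled observations with no imaging in the window is not stated; returning `None` here is only a placeholder.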
Our HELMET models are accurate and generalize effectively to diverse populations, achieving a cross-validated mean area under the receiver operating characteristic curve (AUROC) of 96.6% in the derivation cohort and 92.5% in the external validation cohort. Moreover, our approach provides a framework for developing hybrid risk prediction models that integrate both human-extracted and algorithm-derived multimodal inputs. Our work enables accurate, real-time estimation of complex, dynamic, and highly specific clinical targets such as midline shift, even when relevant structured information is limited in electronic health record databases.
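The reported discrimination is a mean AUROC for a multi-class target, but the abstract does not say how per-class scores were combined. As an illustration only, the sketch below assumes unweighted one-vs-rest (macro) averaging, the same convention scikit-learn's `roc_auc_score` applies with `multi_class="ovr"` and `average="macro"`. Each binary AUROC is computed from its probabilistic meaning: the chance that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The function names `binary_auroc` and `macro_ovr_auroc` are hypothetical.

```python
from statistics import mean


def binary_auroc(scores, labels):
    """AUROC as P(score of random positive > score of random negative),
    counting ties as 0.5 (equivalent to the Mann-Whitney U statistic)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auroc(prob_rows, y_true, n_classes):
    """Unweighted mean of one-vs-rest AUROCs (assumed averaging scheme).

    prob_rows: per-observation lists of predicted class probabilities
    y_true:    integer class labels in range(n_classes)
    """
    per_class = []
    for k in range(n_classes):
        scores = [row[k] for row in prob_rows]       # probability of class k
        labels = [1 if y == k else 0 for y in y_true]  # class k vs. the rest
        per_class.append(binary_auroc(scores, labels))
    return mean(per_class)
```

A cross-validated mean would then be the average of `macro_ovr_auroc` over the held-out folds. The pairwise loop is quadratic in the number of observations, which is fine for a sketch; a rank-based computation scales better to cohorts of thousands of observations.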