Predicting change from multivariate time series has applications in fields ranging from medicine to engineering. Multisensory stimulation therapy in patients with dementia aims to change the patient's behavioral state. For example, patients who exhibit agitation at baseline may be stimulated so that their behavioral state shifts to relaxed. This study aims to predict changes in behavioral state by analyzing physiological and neurovegetative parameters, in order to support the therapist during the stimulation session. To extract valuable indicators for predicting changes, both handcrafted and learned features were evaluated and compared. The handcrafted features were derived from the CATCH22 feature collection, while the learned features were extracted using a Temporal Convolutional Network, and the behavioral state was predicted with a Bidirectional Long Short-Term Memory Auto-Encoder, with the two networks operating jointly. Compared with the state of the art, the learned-feature approach performs better, reaching accuracy of up to 99.42% with a time window of 70 seconds and up to 98.44% with a time window of 10 seconds.
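
The time windows mentioned above imply that the multivariate recording is cut into fixed-length segments before feature extraction. The following is a minimal sketch of such a segmentation step, not the procedure used in the study: the function name `segment_windows`, the 1 Hz sampling rate, the 10-second step between windows, and the four-channel synthetic signal are all assumptions made for illustration.

```python
import numpy as np


def segment_windows(series, fs_hz, window_s, step_s):
    """Cut a (n_samples, n_channels) array into fixed-length windows.

    Hypothetical helper: fs_hz, window_s and step_s are illustrative
    parameters, not values taken from the study.
    Returns an array of shape (n_windows, window_len, n_channels).
    """
    win = int(round(window_s * fs_hz))
    step = int(round(step_s * fs_hz))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")

    n_samples, n_channels = series.shape
    if n_samples < win:
        # Recording shorter than one window: return an empty batch.
        return np.empty((0, win, n_channels), dtype=series.dtype)

    # Start indices of every window that fits entirely in the recording.
    starts = range(0, n_samples - win + 1, step)
    return np.stack([series[s:s + win] for s in starts])


# Synthetic stand-in for a recording: 300 s, 4 channels, sampled at an
# assumed 1 Hz.
rng = np.random.default_rng(0)
recording = rng.normal(size=(300, 4))

long_windows = segment_windows(recording, fs_hz=1.0, window_s=70, step_s=10)
short_windows = segment_windows(recording, fs_hz=1.0, window_s=10, step_s=10)
print(long_windows.shape, short_windows.shape)
```

Each resulting window could then be passed either to a handcrafted feature extractor or to a learned encoder. A longer window gives each prediction more context but yields fewer segments from the same recording.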