Background: A smartphone is a promising tool for daily cardiovascular measurement and mental stress monitoring. Photoplethysmography (PPG) and low-cost thermography can be used to create cheap, convenient and mobile systems. However, to achieve robustness, a person has to remain still for several minutes while a measurement is being taken. This is cumbersome and limits use in applications such as producing instant measurements of stress.

Objective: We propose to use smartphone-based mobile PPG and thermal imaging to provide a fast binary measure of stress responses to an event, using dynamic physiological changes that occur within 20 seconds of the event finishing.

Methods: We propose a system that uses a smartphone and its physiological sensors to reliably and continuously measure, over a short window of time, a person's blood volume pulse, the time interval between heartbeats (R-R interval), and the 1D thermal signature of the nose tip. Seventeen healthy participants took part in a series of stress-inducing mental activities and measured their physiological response to stress in the 20-second window immediately following each activity. They self-reported their level of mental stress on a 10-cm Visual Analogue Scale. As the main labeling strategy, we used normalized K-means clustering to better account for interpersonal differences in ratings. Taking an array of R-R intervals and thermal directionality as low-level feature input, we mainly used an artificial neural network for automatic feature learning and machine learning inference. To compare against this automated inference, we also extracted widely used high-level features from heart rate variability (HRV; e.g., LF/HF ratio) and from the thermal signature, and fed them to a k-nearest neighbor classifier to infer perceived stress levels.

Results: First, we tested the physiological measurement reliability.
The measured cardiac signals were considered highly reliable (signal goodness probability: Mean=0.9584, SD=0.0151). The proposed 1D thermal signal processing algorithm effectively minimized the effect of respiratory cycles on detecting the apparent temperature of the nose tip (respiratory signal goodness probability reduced from Mean=0.8998 to Mean=0). Second, we tested the performance of 20-second instant perceived stress inference. The best results were obtained with automatic feature learning and classification using artificial neural networks rather than with pre-crafted features. The combination of both modalities produced higher accuracy on the