This article addresses the problem of how a robot can infer what a person has done recently, with a focus on checking oral medicine intake in dementia patients. We present PastVision+, an approach showing how thermovisual cues in objects and humans can be leveraged to infer recent unobserved human-object interactions. Our expectation is that this approach can provide enhanced speed and robustness compared to existing methods, because our approach can draw inferences from single images without needing to wait to observe ongoing actions and can deal with short-lasting occlusions; when combined, we expect a potential improvement in accuracy due to the extra information from knowing what a person has recently done. To evaluate our approach, we obtained some data in which an experimenter touched medicine packages and a glass of water to simulate intake of oral medicine, for a challenging scenario in which some touches were conducted in front of a warm background. Results were promising, with a detection accuracy of touched objects of 50% at the 15 s mark and 0% at the 60 s mark, and a detection accuracy of cooled lips of about 100 and 60% at the 15 s mark for cold and tepid water, respectively. Furthermore, we conducted a follow-up check for another challenging scenario in which some participants pretended to take medicine or otherwise touched a medicine package: accuracies of inferring object touches, mouth touches, and actions were 72.2, 80.3, and 58.3% initially, and 50.0, 81.7, and 50.0% at the 15 s mark, with a rate of 89.0% for person identification. The results suggested some areas in which further improvements would be possible, toward facilitating robot inference of human actions, in the context of medicine intake monitoring.