2020
DOI: 10.1109/tip.2019.2947204

RhythmNet: End-to-End Heart Rate Estimation From Face via Spatial-Temporal Representation

Abstract: Heart rate (HR) is an important physiological signal that reflects the physical and emotional status of a person. Traditional HR measurements usually rely on contact monitors, which may cause inconvenience and discomfort. Recently, some methods have been proposed for remote HR estimation from face videos; however, most of them focus on well-controlled scenarios, and their generalization ability to less-constrained scenarios (e.g., with head movement and bad illumination) is not known. At the same time, lacking…

Citations: cited by 258 publications (189 citation statements)
References: 41 publications (128 reference statements)
“…A straightforward way of employing heart rate (HR) signals for DeepFake detection is to use existing HR representations designed for remote HR estimation. For example, we can use the spatial-temporal representation (STR) proposed by Niu et al. [48] to represent HR signals and feed it to a classifier for DeepFake detection. However, it is hard to achieve high fake-detection accuracy with the STR directly, since the differences between real and fake videos are not highlighted, i.e., the STR's discriminative power for DeepFake detection is limited.…”
Section: Motion-Magnified Spatial-Temporal Representation
confidence: 99%
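The pipeline this excerpt describes (build a spatial-temporal map from a face video, then hand it to a classifier) can be sketched in a few lines. The following is a minimal, hypothetical illustration rather than the authors' code: the 5×5 block grid, RGB input, and row-wise min-max normalisation are common choices for ST-map-style representations but are assumptions, not details taken from the paper.

```python
import numpy as np

def st_map(face_frames, grid=(5, 5)):
    """Illustrative spatial-temporal map: face_frames is (T, H, W, 3) of
    aligned face crops; returns an array of shape (grid_h * grid_w, T, 3)."""
    T, H, W, C = face_frames.shape
    gh, gw = grid
    bh, bw = H // gh, W // gw
    m = np.empty((gh * gw, T, C), dtype=np.float32)
    for i in range(gh):
        for j in range(gw):
            block = face_frames[:, i * bh:(i + 1) * bh, j * bw:(j + 1) * bw, :]
            m[i * gw + j] = block.mean(axis=(1, 2))  # spatial average per frame
    # min-max normalise each block/channel time series to [0, 255]
    lo, hi = m.min(axis=1, keepdims=True), m.max(axis=1, keepdims=True)
    return (m - lo) / (hi - lo + 1e-8) * 255.0
```

A map built this way can be resized and passed to an off-the-shelf image classifier for a real/fake decision; as the citing authors point out, without motion magnification such a representation carries little DeepFake-specific signal.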
“…As shown in Table 3, our MMSTR significantly improves DR-st's accuracy, e.g., an improvement of 0.328 on ALL. The ST map from [48] has little discriminative power for DeepFake detection, since DR-st achieves only about 0.5 accuracy on every testing dataset, meaning it essentially guesses at random whether a video is real or fake. With our MMSTR, DR-mmst achieves an average accuracy increase of 0.217 over the DFD, DF, F2F, FS, and ALL datasets.…”
Section: Ablation Study on Accuracy
confidence: 99%
“…Therefore, a new publicly available dataset, directly related to rPPG-suitable practical applications, is vital. The currently available datasets include MAHNOB-HCI [33], DEAP [34], MMSE-HR [35], PURE [36], OBF [37], and VIPL-HR [38], among others, and their specifications are listed in Table 3. At present, rPPG is only tentatively applied in Intensive Care Units (ICUs), because there the subjects are still and a frontal face video can be collected continuously.…”
Section: Face Detection, Effect, Performance Description, Advantage, Disadvantage
confidence: 99%
“…To handle noise from non-rigid motions, they divided the signal into smaller parts of equal length, computed the standard deviation of each subpart, and removed subparts whose standard deviations differed too much from the others. Finally, more recent studies are learning-based; they attempt to learn the mapping from facial videos to heart-rate-related signals from large datasets using deep neural networks [34], [35].…”
Section: Related Work
confidence: 99%
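The subpart-filtering step described in this excerpt lends itself to a compact sketch. This is a hedged illustration of the idea, not code from the cited works: the segment length and the z-score rejection threshold are assumptions chosen for readability.

```python
import numpy as np

def filter_noisy_segments(signal, seg_len=64, z_thresh=2.0):
    """Split a 1-D HR-related signal into equal-length segments, then drop
    segments whose standard deviation differs too much from the others."""
    signal = np.asarray(signal, dtype=np.float64)
    n_seg = len(signal) // seg_len
    segments = signal[:n_seg * seg_len].reshape(n_seg, seg_len)
    stds = segments.std(axis=1)
    z = (stds - stds.mean()) / (stds.std() + 1e-8)  # how atypical each segment is
    keep = np.abs(z) < z_thresh
    return segments[keep].reshape(-1)
```

The surviving segments could then be concatenated (as here) or re-weighted before spectral analysis; the cited works motivate this step as a way to suppress bursts of non-rigid facial motion.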