2020
DOI: 10.3390/app10196857
|View full text |Cite
|
Sign up to set email alerts
|

Audio-Visual Tensor Fusion Network for Piano Player Posture Classification

Abstract: Playing the piano in the correct position is important because the correct position helps to produce good sound and prevents injuries. Many studies have been conducted in the field of piano playing posture recognition that combines various techniques. Most of these techniques are based on analyzing visual information. However, in the piano education field, it is essential to utilize audio information in addition to visual information due to the deep relationship between posture and sound. In this paper, we pro… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
12
0

Year Published

2021
2021
2023
2023

Publication Types

Select...
4
1

Relationship

2
3

Authors

Journals

citations
Cited by 6 publications
(12 citation statements)
references
References 13 publications
0
12
0
Order By: Relevance
“…The authors proposed methods that apply various operations such as a Kronecker product and a Hadamard product, respectively, to implicitly represent spatiotemporal information and in the end fuse each feature value obtained through the operations. Finally, the previous methods have a problem [11][12][13][14][15][16] in that they do not provide evidence or sufficient explanation for the results derived by the model. This problem is resolved through a method of representing the RGB image, which preserves the pre-fusion data and the Conv-layer filters as analyzable feature maps (Figure 1e).…”
Section: Spaito-temporal Data Representationmentioning
confidence: 99%
See 4 more Smart Citations
“…The authors proposed methods that apply various operations such as a Kronecker product and a Hadamard product, respectively, to implicitly represent spatiotemporal information and in the end fuse each feature value obtained through the operations. Finally, the previous methods have a problem [11][12][13][14][15][16] in that they do not provide evidence or sufficient explanation for the results derived by the model. This problem is resolved through a method of representing the RGB image, which preserves the pre-fusion data and the Conv-layer filters as analyzable feature maps (Figure 1e).…”
Section: Spaito-temporal Data Representationmentioning
confidence: 99%
“…This problem is resolved through a method of representing the RGB image, which preserves the pre-fusion data and the Conv-layer filters as analyzable feature maps (Figure 1e). [11], (b) late fusion [12,13], (c) AV-TFN [14], (d) MTLN [16], and (e) the proposed method (TRT-Net).…”
Section: Spaito-temporal Data Representationmentioning
confidence: 99%
See 3 more Smart Citations