In this paper, we propose a novel area of research referred to as singing information processing. To shape the concept of this area, we first introduce singing understanding systems for synchronizing the vocal melody with the corresponding lyrics, identifying the singer's name, evaluating singing skills, creating hyperlinks between phrases in the lyrics of songs, and detecting breath sounds. We then introduce music information retrieval systems based on similarity of vocal melody timbre and vocal percussion, as well as singing synthesis systems. We also discuss common signal processing techniques for modeling singing voices that are used in these systems, such as techniques for extracting the vocal melody from polyphonic music recordings and for modeling the lyrics by using phoneme HMMs for singing voices.

Index Terms: Music, singing information processing, singing voice modeling, vocal melody
INTRODUCTION

As research on music information processing [1, 2, 3], including research on music information retrieval [4], has continued to expand rapidly, research activities related to singing have also become more vigorous. Such activities are attracting attention not only from a scientific point of view, but also from the standpoint of industrial applications. Singing-related research is highly diverse, ranging from basic research on the features unique to singing to applied research such as that on the synthesis of singing voices, lyrics recognition, singer identification, retrieval of singing voices, and singing-skill evaluation. In this paper, we refer to this broad range of singing-related studies as singing information processing and introduce examples of these studies with a focus on signal processing techniques for modeling singing voices.

Singing possesses aspects of both speech and music, and there are many unsolved research problems from the viewpoint of either field. For example, singing voices generally fluctuate more than speaking voices, and musical accompaniment, which is closely interlinked with singing, is usually included at a relatively high volume. Because of these characteristics, the automatic recognition of singing is the most difficult class of speech recognition from a technical point of view. In fact, the automatic recognition of lyrics in vocals has not yet been fully achieved. Furthermore, from the viewpoint of music recognition and understanding, the large fluctuations and variations in singing make analysis more difficult than the comparable analysis of musical instruments. Technically speaking, there are many difficult and deeply interesting problems in this regard.
Similarly, many problems remain in research on singing synthesis because, in addition to conveying linguistic content as speech does, singing synthesis requires dynamic, complex, and expressive changes in the voice pitch, intensity, and timbre of singing. In this way, the study of singing information processing is a genuine frontier of science.

Moreover, while music is an important type of content from the viewpoints o...