For the purpose of constructing a naturalistic emotional speech database, a novel paradigm for collecting naturalistic emotional speech during spontaneous Japanese dialog was proposed. The proposed paradigm was assessed by investigating whether the collected speech contains and conveys rich emotions, both psychologically and acoustically. To encourage speakers to experience and express natural, vivid emotions, a Massively Multiplayer Online Role-Playing Game (MMORPG) was adopted as the speakers' task. Speakers were asked to play the MMORPG together while discussing, through a voice chat system, strategies for accomplishing their tasks. Each speaker was recorded for one hour, giving a total recording time of approximately 14 hours. The results of emotional labeling of the collected speech supported the validity of the paradigm, showing interlabeler agreement higher than chance level. In addition, the paradigm was shown to yield a greater quantity of emotional speech than other paradigms: the rate of labeling instances was significantly higher for our speech material (73%; χ²(2) = 27659.87, p < 0.001) than for other speech materials. Finally, an acoustic analysis supported the validity of the paradigm, revealing a significant difference between nonemotional and emotional utterances (p < 0.05).
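The reported statistic χ²(2) is consistent with a Pearson chi-square test of independence with 2 degrees of freedom, for example a table of labeled versus unlabeled utterances across three speech materials, since (2 − 1) × (3 − 1) = 2. The abstract does not state how the counts were tabulated, so the sketch below is only an illustration under that assumption; the function names and the counts in the usage example are hypothetical and are not data from this study.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Hypothetical helper: rows could be 'labeled' / 'not labeled' utterances,
    columns could be different speech materials.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 2 df.

    With 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    """
    return math.exp(-stat / 2)


if __name__ == "__main__":
    # Hypothetical counts (NOT from the study): labeled vs. unlabeled
    # utterances for three speech materials.
    counts = [
        [730, 400, 300],  # labeled
        [270, 600, 700],  # not labeled
    ]
    stat, df = chi_square_independence(counts)
    print(f"chi2({df}) = {stat:.2f}, p = {p_value_df2(stat):.3g}")
```

A 2 × 3 table gives df = 2, so the closed-form p-value helper applies; for other table shapes a general chi-square distribution function would be needed instead.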