2011
DOI: 10.1007/978-3-642-23538-2_47

Zanzibar OpenIVR: An Open-Source Framework for Development of Spoken Dialog Systems

Abstract: The maturity of standards and the availability of open-source components for all levels of the MRCP stack provide us with new opportunities for the development of spoken dialog technology. In this paper, a standards-based, modular architecture for interactive voice response (IVR) systems is presented together with its implementation, Zanzibar OpenIVR. The architecture, described in terms of components and standards, is compared to other existing frameworks. The usage of our framework is discussed re…
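The abstract places an MRCP stack between the voice browser and the speech engines. As an illustration of what travels on that link, the following is a minimal Python sketch that frames an MRCP request, assuming MRCP version 2 framing as in RFC 6787 (start line `MRCP/2.0 message-length method-name request-id`, followed by `Channel-Identifier`, `Content-Type`, and `Content-Length` headers). It is not taken from Zanzibar or Cairo, and the function name `build_mrcp_request` is hypothetical. The one subtle point is that message-length counts the whole message, including its own digits, so the sketch iterates until that value stops changing.

```python
from typing import Optional


def build_mrcp_request(method: str, request_id: int, channel_id: str,
                       body: str = "", content_type: Optional[str] = None) -> bytes:
    """Hypothetical helper: frame an MRCPv2-style request (RFC 6787 layout assumed)."""
    body_bytes = body.encode("utf-8")
    headers = [f"Channel-Identifier: {channel_id}"]
    if body_bytes:
        if content_type is None:
            raise ValueError("a message body needs a Content-Type")
        headers.append(f"Content-Type: {content_type}")
        headers.append(f"Content-Length: {len(body_bytes)}")
    # Each header line ends in CRLF; an empty line separates headers from the body.
    header_block = "".join(h + "\r\n" for h in headers) + "\r\n"
    tail = f" {method} {request_id}\r\n{header_block}".encode("utf-8") + body_bytes
    prefix = b"MRCP/2.0 "
    # message-length covers the entire message, including its own decimal digits,
    # so grow the estimate until the digit count stabilises.
    length = len(prefix) + len(tail)
    while True:
        candidate = len(prefix) + len(str(length)) + len(tail)
        if candidate == length:
            return prefix + str(length).encode("ascii") + tail
        length = candidate
```

For example, `build_mrcp_request("RECOGNIZE", 543257, "32AECB23433801@speechrecog", body="<grammar/>", content_type="application/srgs+xml")` yields a request whose declared length equals its actual byte length. A real client would also handle the server's responses and events, which this sketch leaves out.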

Cited by 10 publications (7 citation statements)
References 4 publications
“…Please see Figure for a schematic overview of this framework. Because the HALEF architecture and components have been described in detail in prior publications (Ramanarayanan, Suendermann‐Oeft, Ivanov, & Evanini, ; Suendermann‐Oeft, Ramanarayanan, Teckenbrock, Neutatz, & Schmidt, ), we only briefly mention the various modules of the system here: telephony servers Asterisk (van Meggelen, Smith, & Madsen, ) and FreeSWITCH (Minessale, Schreiber, Collins, & Chandler, ), which are compatible with Session Initiation Protocol (SIP), Public Switched Telephone Network (PSTN), and Web Real‐Time Communications (WebRTC) standards and include support for voice and video; a voice browser, JVoiceXML (Schnelle‐Walka, Radomski, & Mühlhäuser, ), which is compatible with VoiceXML 2.1, can process SIP traffic, and supports multiple grammar standards, such as Java Speech Grammar Format (JSGF), Advanced Research Projects Agency (ARPA), and Weighted Finite‐State Transducer (WFST); a Media Resource Control Protocol (MRCP) speech server, Cairo (Prylipko, Schnelle‐Walka, Lord, & Wendemuth, ), which allows the voice browser to initiate SIP or Real‐Time Transport Protocol (RTP) connections from/to the telephony server and incorporates two speech recognizers (Sphinx and Kaldi; see respectively Lamere et al, ; Povey et al, ) and two synthesizers (Mary and Festival; see respectively Schröder & Trouvain, ; Taylor, Black, & Caley, );
an Apache Tomcat‐based web server, which can host dynamic VoiceXML pages, web services, and media libraries containing grammars and audio files; OpenVXML, a VoiceXML‐based voice application authoring suite that generates dynamic web applications that can be housed on the web server; a MySQL database server for storing call logs; and a speech transcription, annotation, and rating portal that allows one to listen to and transcribe full‐call recordings, rate them on a variety of dimensions such as caller experience and latency, and perform various semantic annotation tasks required to train ASR and SLU modules …”
Section: The HALEF Dialog Ecosystem (mentioning; confidence: 99%)
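The statement above describes a web server that hosts dynamic VoiceXML pages which the voice browser fetches, with grammars served as separate media files. A minimal Python sketch of such a dynamic page, built with the standard library's `xml.etree.ElementTree`, might look as follows. The form layout, the names `build_vxml_page`, `answer`, and `main`, and the use of `application/x-jsgf` as the JSGF grammar MIME type are illustrative assumptions, not details from HALEF or Zanzibar.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
# Serialise VoiceXML elements in the default namespace (no "ns0:" prefixes).
ET.register_namespace("", VXML_NS)


def _q(tag: str) -> str:
    """Qualify a tag name with the VoiceXML namespace."""
    return f"{{{VXML_NS}}}{tag}"


def build_vxml_page(prompt_text: str, grammar_url: str, submit_url: str) -> str:
    """Hypothetical generator for a one-field VoiceXML 2.1 form."""
    vxml = ET.Element(_q("vxml"), {"version": "2.1"})
    form = ET.SubElement(vxml, _q("form"), {"id": "main"})
    field = ET.SubElement(form, _q("field"), {"name": "answer"})
    prompt = ET.SubElement(field, _q("prompt"))
    prompt.text = prompt_text
    # External grammar fetched from the web server; the MIME type is an assumption.
    ET.SubElement(field, _q("grammar"),
                  {"src": grammar_url, "type": "application/x-jsgf"})
    # Once the field is filled, post its value back to the application server.
    filled = ET.SubElement(field, _q("filled"))
    ET.SubElement(filled, _q("submit"), {"next": submit_url, "namelist": "answer"})
    return ET.tostring(vxml, encoding="unicode")
```

A servlet or other handler on the web server could return the string from `build_vxml_page("Do you want to continue?", "grammars/yesno.gram", "/app/answer")` to the voice browser. Generating the page per request is what makes it "dynamic": the prompt and grammar can depend on the caller's state.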
“…The HALEF (Help Assistant-Language-Enabled and Free) framework leverages different open-source components to form an SDS framework that is modular and industry-standard-compliant: Asterisk, a SIP- (Session Initiation Protocol) and PSTN- (Public Switched Telephone Network) compatible telephony server (van Meggelen et al 2009); JVoiceXML, an open-source voice browser that can process SIP traffic (Schnelle-Walka et al 2013) via a voice browser interface called Zanzibar (Prylipko et al 2011); Cairo, an MRCP (Media Resource Control Protocol) speech server, which allows the voice browser to initiate SIP or RTP (Real-time Transport Protocol) connections from/to the telephony server (Prylipko et al 2011); the Sphinx automatic speech recognizer (Lamere et al 2003); the Festival (Taylor et al 1998) and Mary (Schröder and Trouvain 2003) text-to-speech synthesis engines; and an Apache Tomcat-based web server that can host dynamic VoiceXML (VXML) pages and serve media files such as grammars and audio files to the voice browser. Figure 5.1 schematically depicts the main components of the HALEF system.…”
Section: HALEF System Description (mentioning; confidence: 99%)
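Among the media files the web server serves to the voice browser are grammars, and JSGF is one of the formats the voice browser is said to support. The following minimal Python sketch generates a JSGF grammar whose single public rule accepts a fixed list of phrases. The helper name `jsgf_from_choices` is hypothetical, and the validation is deliberately conservative: it rejects dotted (package-qualified) grammar names and any phrase containing JSGF operator characters rather than escaping them.

```python
import re
from typing import List

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Letters, digits, spaces and apostrophes only; no JSGF operators such as | ; < > ( ) [ ] *
_PHRASE = re.compile(r"[A-Za-z0-9' ]+")


def jsgf_from_choices(grammar_name: str, rule_name: str, choices: List[str]) -> str:
    """Hypothetical helper: one public JSGF rule listing the allowed phrases."""
    for ident in (grammar_name, rule_name):
        if not _IDENT.fullmatch(ident):
            raise ValueError(f"unsupported identifier: {ident!r}")
    phrases = [c.strip() for c in choices]
    if not phrases or not all(p and _PHRASE.fullmatch(p) for p in phrases):
        raise ValueError("choices must be non-empty plain phrases")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n\n"
        f"grammar {grammar_name};\n\n"
        f"public <{rule_name}> = {alternatives};\n"
    )
```

For instance, `jsgf_from_choices("yesno", "answer", ["yes", "no", "yes please"])` produces a grammar containing `public <answer> = yes | no | yes please;`, which a web server could store as a `.gram` file for the voice browser to fetch.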
“…Prylipko et al [4] explained the architecture of Zanzibar OpenIVR, which includes a speech application server used with a VoiceXML interpreter (JVoiceXML), a speech recognition engine (CMU Sphinx 4), and a text-to-speech engine (FreeTTS). The prototype used dialogue management with a mixed-initiative dialogue strategy.…”
Section: Introduction (mentioning; confidence: 99%)
“…The mixed strategy allows the system to ask the user for missing information while still following the form items in VoiceXML. Prylipko et al [4] also tabulated the characteristics of available platforms for spoken dialogue systems. The comparison focuses on the use of modular components, VoiceXML dialogue, and being open source.…”
Section: Introduction (mentioning; confidence: 99%)
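The mixed strategy described above can be illustrated with a much simplified model of VoiceXML form filling. The system prompts for the first unfilled form item in document order, roughly the "select" step of the VoiceXML Form Interpretation Algorithm. Because the initiative is mixed, a single reply may fill several items, including ones the system did not ask for. The following Python sketch is an illustration under those assumptions, not Zanzibar's dialogue manager. The slot names and the keyword-based `toy_interpret` function are hypothetical stand-ins for real grammar-based interpretation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Slots = Dict[str, str]


def select_form_item(form_items: List[str], filled: Slots) -> Optional[str]:
    """Simplified FIA select step: first item, in document order, still unfilled."""
    for item in form_items:
        if item not in filled:
            return item
    return None


def run_form(form_items: List[str], utterances: List[str],
             interpret: Callable[[str], Slots]) -> Tuple[Slots, List[str]]:
    filled: Slots = {}
    asked: List[str] = []
    for utterance in utterances:
        item = select_form_item(form_items, filled)
        if item is None:
            break  # every form item is filled
        asked.append(item)  # the system prompts for this item
        # Mixed initiative: the reply may fill any item, not only the prompted one.
        filled.update(interpret(utterance))
    return filled, asked


def toy_interpret(utterance: str) -> Slots:
    """Hypothetical keyword spotter standing in for grammar-based interpretation."""
    words = utterance.lower().split()
    slots: Slots = {}
    for i, word in enumerate(words):
        if word == "from" and i + 1 < len(words):
            slots["origin"] = words[i + 1]
        elif word == "to" and i + 1 < len(words):
            slots["destination"] = words[i + 1]
        elif word in ("today", "tomorrow"):
            slots["date"] = word
    return slots
```

With form items `["origin", "destination", "date"]` and replies `"from berlin to paris"` then `"tomorrow"`, the system asks only for `origin` and `date`. The first reply already supplied the destination, so the system skips that prompt while still walking the form items in order.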