Mobile technologies promote computer-assisted language learning (CALL), while mobile applications, being learner-oriented by design, provide a powerful foundation for building individual self-paced environments for language study. Mobile CALL (MALL) tools are able to offer new educational contexts and fix, at least partially, the problems of previous generations of CALL software. Nonetheless, mobile technologies alone cannot respond to CALL challenges without cooperation and interaction with language theory and pedagogy. To facilitate and formalize this interaction, several criteria sets for CALL software have been worked out in recent years. An approach based on mobile devices is thus a natural way to shift the learning process from the teaching-centered classroom to a process oriented toward individual learners and groups of learners, with greater emphasis on supporting individual learning styles, user collaboration and different teaching strategies. Pronunciation teaching technology is one of the areas where automated speech processing algorithms and the corresponding software meet the problems of practical phonology. Computer-assisted prosody teaching (CAPT), a sub-domain of CALL, is a relatively new topic of interest for computer scientists and software developers. The present-day advancement of mobile CAPT tools is supported by evolutionary processes in the theory of language learning and teaching. This paper explores language–technology relations using the case of StudyIntonation, a cross-platform multi-functional mobile CAPT tool developed by the authors and based on a digital processing core for speech processing, visualization and estimation. We particularly address the problems of developing CAPT evaluation frameworks.
To identify the problematic points of the project and understand the directions for future work, we discuss an approach to formalized evaluation using a set of CAPT-specific criteria, drawing attention to such evaluation factors as general descriptive information, instructional purposes, functionality, usability, and presentation.