The ability of children to combine syllables represents an important developmental milestone. This ability is often delayed or impaired in a variety of clinical groups, including children with autism spectrum disorders (ASD) and speech delays (SPD). Prior work has demonstrated the successful use of computer-based voice visualizations to facilitate speech production and vocalization in children with and without ASD/SPD. While prior work has focused on increasing the frequency of speech-like vocalizations or the accuracy of speech sound production, we believe there is a potential new direction for research: exploring real-time visualizations to shape multisyllabic speech. Over two years we developed VocSyl, a real-time voice visualization system. Rather than building visualizations based on what adult clinicians and software designers may think is needed, we developed VocSyl using the Task-Centered User Interface Design (TCUID) methodology throughout the design process. Children with ASD and SPD, the target users of the software, were directly involved in development, allowing us to focus on what these children demonstrate they need. This paper presents the results of our TCUID design cycle for VocSyl, as well as design guidelines for future work with children with ASD and SPD.