Music has been shown to improve social interaction and attention to verbal stimuli in autism. We report a feasibility randomised controlled trial of an online intervention using music-assisted programmes, compared with best-practice treatment (Social Communication Intervention for Pre-schoolers–Intensive), for language learning in preschool autistic children with minimal verbal language. Randomisation by minimisation ensured comparability of groups before intervention. Ninety-one people expressed interest in taking part; 27 met eligibility criteria and were randomised to receive either music-assisted programmes or Social Communication Intervention for Pre-schoolers–Intensive. Children and their parents received two 45-min sessions weekly, over 18 weeks, coached online by a speech and language therapist. A smartphone app was developed to support home-based practice between sessions. Over the study period, 20 of the 27 randomised participants (74%) completed the intervention and assessments of outcome measures. At follow-up 3 months post-intervention, social responsiveness, understanding of words and phrases, number of words spoken and parent–child interaction had improved more in the music-assisted programmes group than in the Social Communication Intervention for Pre-schoolers–Intensive group. The results demonstrate the feasibility of recruiting this population into a randomised controlled trial, and parent interviews indicated that the music-assisted programmes were perceived as highly acceptable. A full clinical trial to establish the effectiveness of music-assisted programmes in improving early vocabulary learning in autistic children is warranted.

Lay abstract

Research has shown that autistic individuals often have unusually good musical skills and that combining words and music helps autistic individuals to focus on spoken words. This study tests the idea that music will help with early language learning of preschool autistic children.
The results show that when caregivers sing words to autistic children, the children pay more attention to the caregiver than when the words are spoken and that they learn word combinations more easily.