We present work in progress on a multimodal dialog system for English language assessment, built on a modular cloud‐based architecture that adheres to open industry standards. Several of the modules under development rely heavily on machine learning techniques, including speech recognition, spoken language proficiency rating, speaker recognition, and the scoring of behaviors in multimodal data streams.