We develop an interface that bridges crowd simulation and natural language processing techniques so that casual users can produce crowd animation from text input. The interface uses a parser and a tagger to analyze simple English sentences and convert them into intermediate data structures that encapsulate the essential elements of crowds. The pipeline has five stages: preprocessing, parsing input sentences, crowd generation, animation adjustment, and crowd animation. Our system supports basic behaviors including standing, walking, running, escaping, being attracted, and queuing. We conducted a user experience study to evaluate the interface. The results show that the interface is user-friendly and enables casual users to produce crowd animation.

KEYWORDS
crowd animation, interactive interface, natural language processing

2 RELATED WORK

2.1 Natural language processing

Various types of models have been proposed for sequence tagging tasks (e.g., labeling each word of a sentence), such as linear statistical models,13,14 convolutional network-based models,15 and recurrent neural network models.16 Our