Despite many efforts, the behavior of a crowd is not fully understood. The advent of modern communication media has made the problem even more challenging, as crowd dynamics can be driven by both human-to-human and human-technology interactions. Here, we study the dynamics of a crowd-controlled game (Twitch Plays Pokémon), in which nearly a million players participated over more than two weeks. We dissect the temporal evolution of the system dynamics across the two distinct phases that characterized the game. First, we find that players who do not follow the crowd's average behavior are key to succeeding in the game. This finding is well explained by an n-th order Markov model that reproduces the observed behavior. Second, we analyze a phase of the game in which players could choose between two different modes of play, mimicking a voting system. Our results suggest that, under some conditions, the collective dynamics are better regarded as swarm-like behavior than as that of a crowd. Finally, we discuss our findings in light of social identity theory, which appears to describe the observed dynamics well.