The monitoring of emotional state is important in the prevention and management of mental health problems and is increasingly being used to support affective computing. As such, researchers are exploring various modalities from which emotion can be inferred, such as through facial images or via electroencephalography (EEG) signals. Current research commonly investigates the performance of machine-learning-based emotion recognition systems by exposing users to stimuli that are assumed to elicit a single unchanging emotional response. Moreover, in order to demonstrate better results, many models are tested in evaluation frameworks that do not reflect realistic real-world implementations. Consequently, in this paper, we explore the design of EEG-based emotion recognition systems using longer, variable stimuli using the publicly available AMIGOS dataset. Feature engineering and selection results are evaluated across four different cross-validation frameworks, including versions of leave-one-movie-out (testing with a known user, but a previously unseen movie), leave-one-person-out (testing with a known movie, but a previously unseen person), and leave-one-person-and-movie-out (testing on both a new user and new movie). Results of feature selection lead to a 13% absolute improvement over comparable previously reported studies, and demonstrate the importance of evaluation framework on the design and performance of EEG-based emotion recognition systems.