Affective state estimation is a research field that has gained increased attention from the research community in the last decade. Two of the main catalysts for this are the advancement in the data analysis using artificial intelligence and the availability of high-quality video. Unfortunately, benchmarks and public datasets are limited, thus making the development of new methodologies and the implementation of comparative studies essential. The current work presents the eSEE-d database, which is a resource to be used for emotional State Estimation based on Eye-tracking data. Eye movements of 48 participants were recorded as they watched 10 emotion-evoking videos, each of them followed by a neutral video. Participants rated four emotions (tenderness, anger, disgust, sadness) on a scale from 0 to 10, which was later translated in terms of emotional arousal and valence levels. Furthermore, each participant filled three self-assessment questionnaires. An extensive analysis of the participants’ answers to the questionnaires’ self-assessment scores as well as their ratings during the experiments is presented. Moreover, eye and gaze features were extracted from the low-level eye-recorded metrics, and their correlations with the participants’ ratings are investigated. Finally, we take on the challenge to classify arousal and valence levels based solely on eye and gaze features, leading to promising results. In particular, the Deep Multilayer Perceptron (DMLP) network we developed achieved an accuracy of 92% in distinguishing positive valence from non-positive and 81% in distinguishing low arousal from medium arousal. The dataset is made publicly available.