An advantage of using eye tracking for diagnosis is that it is non-invasive and can be performed on individuals of different functional levels and ages. Computer-aided diagnosis based on eye tracking data commonly relies on eye fixation points within regions of interest (ROIs) in an image. However, besides requiring the demarcation of every ROI in each image or video frame used in the experiment, the diversity of visual features contained in each ROI may compromise the characterization of visual attention in each group (case or control) and, consequently, the diagnosis accuracy. Although some approaches use eye tracking signals to aid diagnosis, it remains a challenge to identify frames of interest when videos are used as stimuli and to select relevant features extracted from those videos. This is mainly observed in applications for autism spectrum disorder (ASD) diagnosis. To address these issues, the present paper proposes: (1) a computational method that integrates concepts from visual attention models, image processing, and artificial intelligence to learn a model for each group (case and control) from eye tracking data, and (2) a supervised classifier that uses the learned models to perform the diagnosis. Although the approach is not disorder-specific, it was tested in the context of ASD diagnosis, yielding an average precision, recall, and specificity of 90%, 69%, and 93%, respectively.
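
To make the two-stage pipeline concrete, the sketch below illustrates one possible realization under assumptions not stated in the abstract: fixation data are summarized into synthetic feature vectors, a simple per-group model (here, a group mean) stands in for the learned visual attention models, and a scikit-learn classifier produces the diagnosis; every feature, model form, and library choice is illustrative only, not the authors' actual method.

```python
# Illustrative sketch only -- the abstract does not specify features, models,
# or libraries; the synthetic features, the per-group mean "models", and the
# scikit-learn classifier below are assumptions made for exposition.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for eye-tracking-derived features (e.g., hypothetical
# per-frame fixation statistics); label 1 = case (ASD), 0 = control.
X = rng.normal(size=(200, 8))
y = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1 (assumed form): one "model" per group -- here simply the mean
# feature vector of each group -- used to derive distance-to-group features.
mu_case = X_tr[y_tr == 1].mean(axis=0)
mu_ctrl = X_tr[y_tr == 0].mean(axis=0)

def group_distance_features(X):
    # Distance of each sample to each learned group model.
    d_case = np.linalg.norm(X - mu_case, axis=1, keepdims=True)
    d_ctrl = np.linalg.norm(X - mu_ctrl, axis=1, keepdims=True)
    return np.hstack([d_case, d_ctrl])

# Stage 2: a supervised classifier over the model-derived features.
clf = LogisticRegression().fit(group_distance_features(X_tr), y_tr)
y_pred = clf.predict(group_distance_features(X_te))

# Report the same metrics as the abstract: precision, recall (sensitivity),
# and specificity (recall of the negative class).
precision = precision_score(y_te, y_pred, zero_division=0)
recall = recall_score(y_te, y_pred, zero_division=0)
specificity = recall_score(y_te, y_pred, pos_label=0, zero_division=0)
print(f"precision={precision:.2f} recall={recall:.2f} specificity={specificity:.2f}")
```

The separation into a per-group modeling stage and a supervised classification stage mirrors the structure described in the abstract; on real data, the per-group models would be learned from the eye tracking signals and video content rather than from synthetic features.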