The localization of sound sources in the horizontal plane relies mainly on perceived interaural level and time differences, whereas elevation perception depends on cues that remain challenging to identify in Head-Related Transfer Functions (HRTFs). Spectral cues play a key role in localizing sources in elevation and are highly individual, as they result from anatomical features specific to each person, such as the shape of the pinnae, head, and torso. In a previous study, we proposed a simple 1D convolutional neural network (CNN) trained to classify HRTFs into elevation sectors, and applied explainability techniques to the trained model to identify spectral elevation cues. Although the model achieved promising results, it was trained and validated only on the CIPIC database. In this work, we focus on developing a model that generalizes across multiple HRTF datasets, achieving good classification performance over diverse subjects and measurement setups. Since each dataset is acquired under different conditions (e.g., source signal, distance between emitters and receivers, spatial resolution, calibration), data preprocessing can significantly affect inter-dataset model performance. We explore different preprocessing techniques and evaluate their impact on the classification task in order to select meaningful standardization strategies for working with multiple HRTF datasets.
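
To illustrate the kind of standardization this involves, the sketch below shows a minimal cross-dataset preprocessing pipeline: resampling head-related impulse responses (HRIRs) to a common sample rate, padding or truncating them to a fixed length, and converting them to normalized log-magnitude spectra. The function names, target parameters, and normalization choice are hypothetical assumptions for illustration, not the specific strategies evaluated in this work.

```python
import numpy as np
from scipy.signal import resample_poly

# Hypothetical target format; the actual values studied may differ.
TARGET_FS = 44100      # common sample rate (Hz)
TARGET_LEN = 256       # common HRIR length (samples)
N_FFT = 512            # FFT size for the magnitude spectrum

def standardize_hrir(hrir, fs):
    """Resample a single HRIR to TARGET_FS and pad/truncate it to TARGET_LEN."""
    if fs != TARGET_FS:
        # Rational resampling, e.g. 48 kHz -> 44.1 kHz becomes up=147, down=160.
        g = np.gcd(int(TARGET_FS), int(fs))
        hrir = resample_poly(hrir, int(TARGET_FS) // g, int(fs) // g)
    if len(hrir) < TARGET_LEN:
        hrir = np.pad(hrir, (0, TARGET_LEN - len(hrir)))
    return hrir[:TARGET_LEN]

def to_log_magnitude(hrir):
    """Log-magnitude spectrum of one HRIR (input feature for the classifier)."""
    mag = np.abs(np.fft.rfft(hrir, n=N_FFT))
    return 20.0 * np.log10(mag + 1e-12)

def normalize_per_subject(spectra):
    """Zero-mean/unit-variance normalization over all of a subject's spectra,
    one of several plausible strategies for removing per-dataset level offsets."""
    return (spectra - spectra.mean()) / (spectra.std() + 1e-12)

# Example: a batch of HRIRs measured at 48 kHz with 200-sample responses.
hrirs = np.random.randn(50, 200)                 # placeholder measurement data
feats = np.stack([to_log_magnitude(standardize_hrir(h, 48000)) for h in hrirs])
feats = normalize_per_subject(feats)
print(feats.shape)                               # (50, N_FFT // 2 + 1)
```

A per-subject normalization such as the one above is only one option; per-dataset or per-frequency-bin normalization are equally plausible alternatives, and choosing among them is exactly the kind of question the preprocessing comparison addresses.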
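For context, a classifier of the kind described above could look like the following minimal sketch: a small 1D CNN mapping a log-magnitude spectrum to one of a few elevation sectors. The layer sizes, number of sectors, and use of PyTorch are assumptions made for illustration and do not reproduce the architecture from the previous study.

```python
import torch
import torch.nn as nn

N_BINS = 257        # spectrum length from the preprocessing sketch above
N_SECTORS = 4       # hypothetical number of elevation sectors

class ElevationCNN(nn.Module):
    """Minimal 1D CNN that classifies a single-channel spectrum into sectors."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (N_BINS // 4), N_SECTORS),  # logits, one per sector
        )

    def forward(self, x):          # x: (batch, 1, N_BINS)
        return self.classifier(self.features(x))

model = ElevationCNN()
logits = model(torch.randn(8, 1, N_BINS))
print(logits.shape)                # torch.Size([8, 4])
```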