Travel-time prediction is important in Intelligent Transportation Systems (ITS), providing essential information for tasks such as accident detection and congestion control. While data-driven methods are commonly used for travel-time prediction, their accuracy relies heavily on the selection of appropriate features. This study introduces a two-stage methodology for travel-time prediction, comprising a novel feature selection method called OA2DD with two layers of optimization and a layer of data-driven predictive methods. In the first stage (offline process), the optimal set of features and the architecture of the machine learning model are selected using interconnected optimization algorithms. In the second stage (real-time process), travel time is predicted using new data from unseen parts of the dataset. The method is applied to a case study of the M50 motorway in Dublin. Several wrapper feature selection methods are also employed to assess and validate its performance. Results show that the proposed method has a better convergence curve and reduces the number of selected features by up to half, which reduces the computational cost of the prediction process by up to 56%. Moreover, using the features selected by OA2DD reduces prediction error by up to 29% compared to the full set of features and the other feature selection methods.
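For readers unfamiliar with wrapper feature selection, the general idea can be illustrated with a minimal sketch. This is not the OA2DD method, whose optimization layers are not detailed here; it is a plain greedy forward search that scores each candidate feature subset by the hold-out error of a predictive model. The k-nearest-neighbour regressor, the synthetic data, and all function names (`knn_predict`, `holdout_mae`, `forward_select`) are assumptions made for illustration only.

```python
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), target)
        for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)


def holdout_mae(X, y, features, split=0.7):
    """Mean absolute error on the held-out tail, using only `features`."""
    cut = int(len(X) * split)
    train_X = [[row[j] for j in features] for row in X[:cut]]
    test_X = [[row[j] for j in features] for row in X[cut:]]
    errors = [
        abs(knn_predict(train_X, y[:cut], x) - target)
        for x, target in zip(test_X, y[cut:])
    ]
    return sum(errors) / len(errors)


def forward_select(X, y):
    """Greedy wrapper: add the feature that lowers hold-out error the most."""
    n_features = len(X[0])
    selected, best_err = [], float("inf")
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        err, j = min((holdout_mae(X, y, selected + [j]), j) for j in candidates)
        if err >= best_err:  # no candidate improves the model: stop
            break
        selected.append(j)
        best_err = err
    return selected, best_err


# Synthetic data (an assumption, not the M50 dataset): the target depends
# on columns 0 and 2 only; columns 1 and 3 are pure noise.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(120)]
y = [10 * row[0] + 5 * row[2] + rng.gauss(0, 0.1) for row in X]

selected, err = forward_select(X, y)
print("selected features:", selected, "hold-out MAE:", round(err, 3))
```

The key property of a wrapper, as opposed to a filter, is that the predictive model itself sits inside the search loop, so each candidate subset is judged by the error it actually produces. That also makes wrappers expensive, which is why reducing the number of evaluated or retained features affects computational cost.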