With the rapid development of telecommunication networks, the predictability of network traffic is of significant interest for network analysis and optimization, bandwidth allocation, and load balancing. Consequently, forecasting telecommunication network traffic has received considerable research attention in recent years. Telecommunication traffic forecasting can be treated as a time-series problem in which periodic historical data are fed as input to a model. Time-series forecasting approaches are broadly categorized as statistical methods, machine learning (ML) methods, and their combinations. Statistical approaches model the linear characteristics of time-series data but cannot capture nonlinear and complex patterns, whereas ML-based approaches can model nonlinear characteristics. In recent years, hybrid approaches combining statistical and ML-based methods have been widely used to model both linear and nonlinear data characteristics. However, the performance of these approaches depends heavily on the feature selection technique and on the hyperparameter tuning of the ML methods. To address this problem, a novel hybrid method based on feature selection and hyperparameter optimization is proposed for short-term traffic forecasting. It combines statistical and ML methods to model the linear and nonlinear components of the data. First, a novel feature selection technique, modified mutual information based on a linear combination of targets, is proposed to find the candidate input variables. Next, a combination of vector autoregressive moving average (VARMA), long short-term memory (LSTM), and multilayer perceptron (MLP) models, called the VARMA-LSTM-MLP forecaster, is suggested to forecast short-term traffic. A hybrid metaheuristic algorithm, composed of the firefly and bat algorithms, is employed to find the optimal set of hyperparameter values. The proposed method is assessed on a real-world dataset containing daily telecommunication data for the city of Tehran, Iran.
The evaluation results demonstrate that the proposed method outperforms the existing methods in terms of mean squared error and mean absolute error.
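The general idea of a hybrid forecaster, a linear model followed by a nonlinear model fitted to its residuals, can be illustrated with a minimal sketch. This is not the proposed VARMA-LSTM-MLP forecaster: as simplifying assumptions, an ordinary least-squares autoregression stands in for VARMA, a k-nearest-neighbour regressor stands in for the LSTM and MLP components, and the series is synthetic. All function names and parameter values are hypothetical.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Return a matrix of n_lags past values per row and the next-step target."""
    n = len(series)
    X = np.column_stack([series[i:n - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

def fit_linear(X, y):
    """Least-squares autoregression with intercept (stand-in for VARMA)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def knn_predict(X_train, r_train, X_query, k=5):
    """Average the residuals of the k nearest training rows (stand-in for LSTM/MLP)."""
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return r_train[nearest].mean(axis=1)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))

# Synthetic daily series with a weekly cycle and a nonlinear term (assumed data).
rng = np.random.default_rng(0)
t = np.arange(400)
series = (10 + 3 * np.sin(2 * np.pi * t / 7)
          + 0.5 * np.sin(2 * np.pi * t / 30) ** 3
          + 0.2 * rng.standard_normal(400))

X, y = make_lagged(series, n_lags=7)
split = 300
X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]

# Step 1: the linear model captures the linear component.
coef = fit_linear(X_tr, y_tr)
resid_tr = y_tr - predict_linear(coef, X_tr)

# Step 2: the nonlinear model is fitted to what the linear model missed.
linear_te = predict_linear(coef, X_te)
hybrid_te = linear_te + knn_predict(X_tr, resid_tr, X_te, k=5)

print("linear  MSE/MAE:", mse(y_te, linear_te), mae(y_te, linear_te))
print("hybrid  MSE/MAE:", mse(y_te, hybrid_te), mae(y_te, hybrid_te))
```

Whether the residual stage helps depends on the data and on the chosen lags and hyperparameters, which is the dependence that the feature selection and hyperparameter optimization steps are intended to address.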