Feature selection is the process of reducing the number of features in a dataset by removing redundant, irrelevant, and randomly class-correlated features. Applying feature selection to large, high-dimensional datasets removes such features, which reduces the complexity of the data and shortens training time. The objective of this paper was to design an optimizer that combines a well-known population-based metaheuristic, the grey wolf algorithm, with the gradient descent algorithm, and to test it on feature selection problems. The proposed algorithm was first compared against the original grey wolf algorithm on 23 continuous test functions. The proposed optimizer was then adapted for feature selection, and 3 binary implementations were developed; the final implementation was compared against two implementations of the binary grey wolf optimizer and the binary grey wolf particle swarm optimizer on 6 medical datasets from the UCI machine learning repository, on metrics such as accuracy, size of feature subsets,
F-measure, precision, and sensitivity. The proposed optimizer outperformed the three other optimizers on 3 of the 6 datasets in average metrics. The proposed optimizer showed promise in its ability to balance the two objectives of feature selection, maximizing classification accuracy and minimizing the number of selected features, and could be further enhanced.
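
The trade-off between classification accuracy and subset size is often folded into a single scalar in wrapper-based feature selection. The following is a minimal sketch under that assumption and is not taken from this paper: the function name `subset_fitness`, the weight `alpha`, and its default value are illustrative choices, and the error rate is assumed to come from a classifier evaluated on the selected features.

```python
from typing import Sequence


def subset_fitness(error_rate: float, mask: Sequence[int], alpha: float = 0.99) -> float:
    """Weighted sum of classification error and fraction of features kept.

    Lower values are better. `mask` is a binary vector in which 1 marks a
    selected feature. `alpha` (an assumed, illustrative weight) sets how much
    the error term dominates the subset-size term.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(mask) == 0:
        raise ValueError("mask must not be empty")

    selected = sum(1 for bit in mask if bit)
    if selected == 0:
        # An empty subset cannot train a classifier, so treat it as infeasible.
        return float("inf")

    size_ratio = selected / len(mask)
    return alpha * error_rate + (1.0 - alpha) * size_ratio


# Two hypothetical subsets with the same error rate: the smaller one scores lower.
small = subset_fitness(0.10, [1, 0, 0, 0])
full = subset_fitness(0.10, [1, 1, 1, 1])
print(small < full)
```

With a weight close to 1, accuracy dominates and subset size acts mainly as a tie-breaker; lowering `alpha` pushes the search toward smaller subsets at some cost in accuracy.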