Random forests of axis-parallel decision trees still show competitive accuracy in various tasks; however, they have drawbacks that limit their applicability. In particular, they perform poorly on high-dimensional sparse data, which is common input in many text and image processing tasks. A straightforward solution is to build forests of decision trees with oblique splits; however, most training approaches are either computationally slow or hard to tune. Moreover, such ensembles tend to be unstable and prone to overfitting, so they must be regularized to find a trade-off between complexity and generalization. This paper proposes an algorithm for training kernel decision trees and random forests. At the top level, the algorithm follows the common greedy procedure for training binary decision trees; at the decision stump level, however, it produces quasi-optimal linear and kernel splits. At each stump, the algorithm finds a quasi-optimal assignment of classes to subtrees and trains the stump by optimizing an SVM-like loss function with a margin re-scaling approach, which jointly optimizes the margin between subtree data and an arbitrary impurity criterion. We also derive uniform-stability-based generalization bounds for such forests and apply them to select a regularization technique. The obtained bounds explicitly account for the primary hyperparameters of forests, trees, and decision stumps. They also reveal a relationship between outliers in the decision tree structure, stability, and generalization, and show how forests smooth these outliers. We performed an experimental evaluation on several tasks, including analyzing the reactions of social media users, image recognition, and bank scoring. The experimental results show that the proposed algorithm trains ensembles that outperform other oblique or kernel forests on many commonly recognized datasets. In particular, the proposed algorithm achieves 99.1% accuracy on MNIST and 58.1% on CIFAR-10. The selected regularization technique is confirmed to reduce overfitting on several datasets. Therefore, the proposed algorithm may be considered a small step toward customized and regularized ensembles of oblique or kernel trees that maintain reasonable training performance on large datasets. We believe the proposed algorithm can serve as a building block for approaches to training context-aware models on complex problems with integer inputs and outputs, such as text mining and image recognition.

INDEX TERMS Generalization bounds, kernel splits, kernel forests, random forests, random uniform stability, regularization, slack re-scaling.