Many engineering applications in the automotive, aeronautic, rubber, mechanics, and manufacturing industries collect multiple datasets measuring physical relations between input variables and performances for modeling purposes. The challenge is that such data are often high-dimensional, non-linear, and contain mixed variables, i.e., numerical and categorical features, requiring specific algorithms and encoding schemes to perform regression tasks efficiently. Moreover, defining an appropriate similarity criterion for mixed-type data is a non-trivial task, especially when it is meant to be used in regression problems. This paper discusses the use of different machine learning algorithms for regression problems involving mixed-type variables across multiple datasets. We use tire-related datasets as a case study to perform a rigorous, statistically founded comparison of different machine learning algorithms and encoding schemes for handling mixed variables in the prediction of tire performances. Friedman's statistic and Nemenyi post-hoc tests are used to test the significance of performance differences between techniques and encoding strategies. Our contributions come as a series of recommendations for efficiently handling mixed-type variables while achieving high performance on regression tasks over multiple datasets. Furthermore, we provide a flexible and efficient similarity function between tires that is useful for tire comparison, prediction, and retrieval tasks.

INDEX TERMS mixed-type variables, categorical encoding, Friedman Nemenyi, regression algorithms

I. INTRODUCTION

Machine learning (ML) in engineering applications has grown in popularity during the last decades [1]-[5]. Many industrial applications use ML tools to build regression models for product design, performance optimization, variable design, fault detection, quality assessment, and others.
For instance, in the rubber industry, [6] uses non-linear least squares to estimate the tire-road friction coefficient for tire design. In automotive design, [7] employs support vector regression in structural optimization for vehicle crashworthiness design. More recently, [8] performed thermodynamic compressor performance modeling for engine design with neural networks and non-linear support vector regression.

A common characteristic of engineering data is its tabular-like structure, where rows represent data examples, which are themselves described by a mixture of numerical and categorical, i.e., mixed-type, variables. For example, in car crashworthiness design [9], vehicle structures must be designed to absorb as much crash energy as possible through structural deformation and to attenuate the impact force when an impact occurs. The design variables are thickness related (measured in mm) and steel hardness types. The thicknesses of the B-pillar inner, the reinforcement, the floor side, and the door beltline are all numerical variables. However, material design variables associated with the steel hardness, i.e., mild, medium, or high str...