<p>Machine learning (ML) is an active area of computer science, and cloud platforms such as Amazon AWS, Google Cloud, and Microsoft Azure provide ML services. Analysts can readily analyze data using these online services, with most of the model processing carried out in the cloud to provide fast and reliable results. Because many of these services offer similar functionality, the challenge is selecting the best service for a given dataset. Traditional approaches to service selection depend only on a general understanding of service functionality. A better way is to consider both functional and non-functional requirements. Functional requirements determine the overall behaviour of a service. Non-functional requirements establish how relevant a service is to the user’s query and refer to Quality of Service (QoS) attributes. QoS-based service selection has been studied in the service computing community for some time. However, the characteristics of the input dataset are not usually considered in the selection process, even though they may affect the QoS values of the service. In this dissertation, we investigate the impact of adding dataset features and other side information on the performance of QoS prediction and service recommendation. We focus on ML services because their QoS values are potentially highly dependent on the dataset.</p>
<p>We propose two approaches for ML service recommendation and compare their performance. The first approach uses factorization for web service recommendation: we identify latent features of the datasets and the services and then recommend services by exploiting these latent variables. The second approach uses neural networks to identify the latent features. In both approaches, we also integrate two sets of side information (dataset features and service features) and study their effect. In our experiments, we test the system using real QoS data from 24 classification models run on 390 datasets downloaded from OpenML. In both implementations, models with side information outperform the basic model. Adding side information is necessary for our model to reach its best performance, increasing predictive accuracy by 5% to 25%. We therefore recommend integrating side information into recommender systems, and specifically including dataset features when recommending ML services.</p>
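<p>The factorization idea can be illustrated with a minimal sketch. It assumes a plain matrix factorization of the dataset-by-service QoS matrix, trained by stochastic gradient descent, with side information entering as additive linear terms. That formulation, the function names <code>fit_mf</code> and <code>rmse</code>, all hyperparameters, and the synthetic data are illustrative assumptions, not the model or the OpenML data used in this work.</p>

```python
import numpy as np

def fit_mf(R, mask, X=None, Z=None, k=4, lr=0.01, reg=0.02, epochs=100, seed=0):
    """Fit R[i, j] ~ mu + P[i].Q[j] (+ X[i].w) (+ Z[j].v) on entries where mask is True.

    R: QoS matrix (datasets x services). X, Z: optional side information for
    datasets and services. The linear side-information terms are an assumed
    formulation chosen for illustration. Returns the full predicted matrix.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    P = 0.1 * rng.standard_normal((n, k))  # latent dataset factors
    Q = 0.1 * rng.standard_normal((m, k))  # latent service factors
    w = np.zeros(X.shape[1]) if X is not None else None
    v = np.zeros(Z.shape[1]) if Z is not None else None
    mu = R[mask].mean()                    # global mean of observed QoS values
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for idx in rng.permutation(len(rows)):
            i, j = rows[idx], cols[idx]
            pred = mu + P[i] @ Q[j]
            if w is not None:
                pred += X[i] @ w
            if v is not None:
                pred += Z[j] @ v
            err = R[i, j] - pred
            p_old = P[i].copy()            # use the pre-update value for Q's step
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
            if w is not None:
                w += lr * (err * X[i] - reg * w)
            if v is not None:
                v += lr * (err * Z[j] - reg * v)
    out = mu + P @ Q.T
    if w is not None:
        out = out + (X @ w)[:, None]
    if v is not None:
        out = out + (Z @ v)[None, :]
    return out

def rmse(R, pred, mask):
    """Root-mean-square error over the entries selected by mask."""
    return float(np.sqrt(np.mean((R[mask] - pred[mask]) ** 2)))

# Synthetic stand-in data (hypothetical): 30 datasets, 8 services.
rng = np.random.default_rng(1)
n, m = 30, 8
X = rng.standard_normal((n, 3))            # dataset features
Z = rng.standard_normal((m, 2))            # service features
R = (0.7
     + 0.1 * (X @ np.array([1.0, -1.0, 0.5]))[:, None]
     + 0.1 * (Z @ np.array([0.5, 1.0]))[None, :]
     + 0.02 * rng.standard_normal((n, m)))
train = rng.choice([True, False], size=R.shape, p=[0.7, 0.3])  # observed entries

pred_base = fit_mf(R, train)               # latent factors only
pred_side = fit_mf(R, train, X=X, Z=Z)     # latent factors plus side information
print("held-out RMSE, basic model:     ", rmse(R, pred_base, ~train))
print("held-out RMSE, with side info:  ", rmse(R, pred_side, ~train))
```

<p>Given a predicted matrix, a service could be recommended for dataset <code>i</code> by ranking row <code>i</code> of the prediction, for example with <code>np.argsort(-pred_side[i])</code> when a higher QoS value is better.</p>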