Materials science, the study of materials and their properties, involves many activities, including performing experiments to measure physical properties. Scientists seek to use the collected experimental data to make predictions at new points where the studied property is unknown. Making these predictions with a computer model, whether through a machine learning or a mathematical approach, is the desirable option, since physical experiments have proven to be very costly and time-consuming. We therefore aim to use the vast quantity of data already collected in the literature to build models for making future predictions. We already know that Gaussian process regression, an interpolation technique, gives accurate predictions for some physical properties. However, it is also the slowest of the machine learning algorithms and is not suitable for on-line applications, where making quick and accurate predictions is essential. In this research we propose a novel strategy, including batch query processing and co-clustering, to achieve a scalable and efficient Gaussian process regression. This new approach, called the scalable Gaussian process (SGP), allows the use of large databases and is suitable for on-line applications. The proposed strategy is applied to a real application involving the prediction of materials properties. Results demonstrate the high accuracy and efficiency of our approach. We test and compare SGP with five different machine learning models on materials properties databases and make recommendations accordingly, also demonstrating that prior knowledge of the problem is essential when choosing a machine learning model.

As one would expect, databases consisting of experimental data are noisy, both because they rely on human measurements and because they are an amalgamation of various independent sources (research papers).
Therefore, some conflicting information can be found between the various sources. In our research we also introduce a novel truth discovery approach to reduce the amount of noise and filter out the incorrect, conflicting information hidden in scientific databases. Our method ranks the multiple data sources by considering the relationships between them, i.e., the amount of conflicting information and the amount of agreement, and also eliminates the conflicting information. Our previously introduced technique, SGP, is then applied to the cleaned dataset to make predictions. We compare the prediction accuracy before and after pruning the databases. With our new approach, we are able to substantially improve the accuracy of SGP predictions and provide a more reliable database. Our results also demonstrate the strong robustness of SGP, as we show that a relatively high amount of noise is handled very well by this technique.
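
To make the idea of ranking sources by agreement and conflict concrete, the following is a minimal illustrative sketch, not the truth discovery method developed in this work. It assumes each record is a hypothetical `(source, key, value)` triple, that two values "agree" when they differ by at most a relative tolerance `tol`, and that a source's score is simply its fraction of agreeing pairwise comparisons. The names `rank_sources` and `prune` and the tie-breaking rule are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def rank_sources(records, tol=0.05):
    """Score each source by its share of agreeing pairwise comparisons.

    records: iterable of (source, key, value) triples (hypothetical layout).
    Returns a dict {source: score in [0, 1]}.
    """
    by_key = defaultdict(list)
    for source, key, value in records:
        by_key[key].append((source, value))

    agree = defaultdict(int)
    total = defaultdict(int)
    for entries in by_key.values():
        for (s1, v1), (s2, v2) in combinations(entries, 2):
            if s1 == s2:
                continue  # a source is not compared with itself
            # Assumed agreement rule: relative difference within tol.
            ok = abs(v1 - v2) <= tol * max(abs(v1), abs(v2), 1e-12)
            for s in (s1, s2):
                total[s] += 1
                agree[s] += int(ok)

    sources = {s for s, _, _ in records}
    # Sources that were never compared receive a neutral score of 0.5.
    return {s: (agree[s] / total[s] if total[s] else 0.5) for s in sources}


def prune(records, tol=0.05):
    """For each key, keep only the value reported by the best-scored source."""
    scores = rank_sources(records, tol)
    by_key = defaultdict(list)
    for source, key, value in records:
        by_key[key].append((source, value))
    cleaned = {
        key: max(entries, key=lambda e: scores[e[0]])[1]
        for key, entries in by_key.items()
    }
    return scores, cleaned


if __name__ == "__main__":
    # Toy data: source C disagrees with A and B on two of three materials.
    records = [
        ("A", "m1", 100.0), ("B", "m1", 101.0), ("C", "m1", 150.0),
        ("A", "m2", 200.0), ("B", "m2", 199.0), ("C", "m2", 260.0),
        ("A", "m3", 50.0), ("C", "m3", 50.5),
    ]
    scores, cleaned = prune(records)
    print(sorted(scores.items(), key=lambda kv: -kv[1]))
    print(cleaned)
```

In this toy data, source C receives the lowest score because most of its values conflict with the others, so its entries are discarded wherever a better-scored source reports the same material. A regression model such as SGP would then be trained on the cleaned values rather than on the raw, conflicting ones.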