Data is useless without data science. This rapidly developing field aims to extract knowledge and understanding from any kind of data. It gives meaning to the bits and bytes in this world. In an age of abundant data, data science techniques have produced many success stories. Understanding what the data tells us is extremely useful for gaining insights and making predictions. Many data scientists focus on predictive methods for practical applications. Classification and regression techniques can learn automatically from data in order to make predictions about unseen data. In this dissertation, we strive to solve general problems that are a fundamental yet underexposed part of data science. We question commonly used techniques and develop better alternatives. Our research gives practitioners the means to gain accurate insights and draw meaningful conclusions.
This dissertation covers topics in the fields of artificial intelligence, machine learning, statistics, and data analysis. The first part of the dissertation is about face generators and active learning. We evaluate a face generator with a humanlike approach and conduct a pioneering study on improving the labeling of pairwise distance datasets, which can be used to advance face recognition and likeness methods. The second part is about benchmarking binary classification methods, where we introduce a new baseline approach. This baseline can even be derived theoretically for most common measures. Furthermore, we prove that it is the best baseline that does not use any feature values. The third part covers two important subjects in data analysis and statistics. Accurately quantifying how dependent one variable is on another is a fundamental part of many studies. Additionally, determining how important a feature is for predicting a target variable is crucial for understanding the data.