Joris Pries scite author profile

Joris Pries

5 Publications

10 Citation Statements Received

111 Citation Statements Given

Affiliations

Centrum Wiskunde & Informatica

Publications

The BP Dependency Function: a Generic Measure of Dependence between Random Variables

Berkelmans, Pries, Bhulai et al. 2022

Preprint

Measuring and quantifying dependencies between random variables (RVs) can give critical insights into a dataset. Typical questions are: 'Do underlying relationships exist?', 'Are some variables redundant?', and 'Is some target variable Y highly or weakly dependent on variable X?' Interestingly, despite the evident need for a general-purpose measure of dependency between RVs, common practice of data analysis is that most data analysts use the Pearson correlation coefficient (PCC) to quantify dependence between RVs, while it is well-recognized that the PCC is essentially a measure for linear dependency only. Although many attempts have been made to define more generic dependency measures, there is no consensus yet on a standard, general-purpose dependency function. In fact, several ideal properties of a dependency function have been proposed, but without much argumentation. Motivated by this, in this paper we will discuss and revise the list of desired properties and propose a new dependency function that meets all these requirements. This general-purpose dependency function provides data analysts with a powerful means to quantify the level of dependence between variables. To this end, we also provide Python code to determine the dependency function for use in practice.
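
The abstract's remark that the Pearson correlation coefficient essentially captures linear dependency only can be illustrated with a small sketch. The helper `pearson` and the toy data below are assumptions made for illustration; this is not the authors' Python code and it does not implement their dependency function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): X is symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]

# Y is completely determined by X in both cases.
ys_quadratic = [x * x for x in xs]      # nonlinear relationship
ys_linear = [2 * x + 1 for x in xs]     # linear relationship

pcc_quadratic = pearson(xs, ys_quadratic)
pcc_linear = pearson(xs, ys_linear)

print("PCC for Y = X^2:   ", pcc_quadratic)
print("PCC for Y = 2X + 1:", pcc_linear)
```

In this toy case Y = X^2 is fully determined by X, yet the coefficient is zero, while the linear case yields a coefficient of one.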

The Optimal Input-Independent Baseline for Binary Classification: The Dutch Draw

Pries, Etienne, Klein et al. 2023

Preprint

Before any binary classification model is taken into practice, it is important to validate its performance on a proper test set. Without a frame of reference given by a baseline method, it is impossible to determine if a score is 'good' or 'bad'. The goal of this paper is to examine all baseline methods that are independent of feature values and determine which model is the 'best' and why. By identifying which baseline models are optimal, a crucial selection decision in the evaluation process is simplified. We prove that the recently proposed Dutch Draw baseline is the best input-independent classifier (independent of feature values) for all positional-invariant measures (independent of sequence order) assuming that the samples are randomly shuffled. This means that the Dutch Draw baseline is the optimal baseline under these intuitive requirements and should therefore be used in practice.

The optimal input‐independent baseline for binary classification: The Dutch Draw

Pries, Etienne, Klein et al. 2023

Statistica Neerlandica

Before any binary classification model is taken into practice, it is important to validate its performance on a proper test set. Without a frame of reference given by a baseline method, it is impossible to determine if a score is “good” or “bad.” The goal of this paper is to examine all baseline methods that are independent of feature values and determine which model is the “best” and why. By identifying which baseline models are optimal, a crucial selection decision in the evaluation process is simplified. We prove that the recently proposed Dutch Draw baseline is the best input‐independent classifier (independent of feature values) for all order‐invariant measures (independent of sequence order) assuming that the samples are randomly shuffled. This means that the Dutch Draw baseline is the optimal baseline under these intuitive requirements and should therefore be used in practice.
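
To make the idea of an input-independent baseline concrete, the sketch below assumes a hypothetical baseline that labels a fixed number of samples positive, chosen uniformly at random without looking at any feature values, and scores it by expected accuracy. The function names, the choice of accuracy as the measure, and the toy numbers are all assumptions for illustration; the paper's exact construction of the Dutch Draw may differ.

```python
def expected_accuracy(num_samples, num_positives, num_predicted_positive):
    """Expected accuracy when `num_predicted_positive` samples, drawn
    uniformly at random and independent of features, are labeled positive."""
    m, p, k = num_samples, num_positives, num_predicted_positive
    # Each randomly drawn sample is truly positive with probability p / m.
    expected_tp = k * p / m
    # Each sample left as negative is truly negative with probability (m - p) / m.
    expected_tn = (m - k) * (m - p) / m
    return (expected_tp + expected_tn) / m


def best_random_baseline(num_samples, num_positives):
    """Pick the number of random positive labels with the highest expected accuracy."""
    scores = {
        k: expected_accuracy(num_samples, num_positives, k)
        for k in range(num_samples + 1)
    }
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k]


# Toy test set (hypothetical): 10 samples, 3 of them positive.
best_k, best_score = best_random_baseline(10, 3)
print("best number of random positives:", best_k)
print("expected accuracy of that baseline:", best_score)
```

Under these assumptions, a model evaluated on the same toy test set would need to exceed this reference score before its accuracy could be called "good".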

The Berkelmans-Pries Feature Importance Method: A Generic Measure of Informativeness of Features

Pries, Berkelmans, Bhulai et al. 2023

Preprint

Over the past few years, the use of machine learning models has emerged as a generic and powerful means for prediction purposes. At the same time, there is a growing demand for interpretability of prediction models. To determine which features of a dataset are important to predict a target variable Y, a Feature Importance (FI) method can be used. By quantifying how important each feature is for predicting Y, irrelevant features can be identified and removed, which could increase the speed and accuracy of a model, and moreover, important features can be discovered, which could lead to valuable insights. A major problem with evaluating FI methods is that the ground truth FI is often unknown. As a consequence, existing FI methods do not give the exact correct FI values. This is one of the many reasons why it can be hard to properly interpret the results of an FI method. Motivated by this, we introduce a new global approach named the Berkelmans-Pries FI method, which is based on a combination of Shapley values and the Berkelmans-Pries dependency function. We prove that our method has many useful properties, and accurately predicts the correct FI values for several cases where the ground truth FI can be derived in an exact manner. We experimentally show for a large collection of FI methods (468) that existing methods do not have the same useful properties. This shows that the Berkelmans-Pries FI method is a highly valuable tool for analyzing datasets with complex interdependencies.

The Berkelmans–Pries dependency function: A generic measure of dependence between random variables

et al. 2023

Measuring and quantifying dependencies between random variables (RVs) can give critical insights into a dataset. Typical questions are: ‘Do underlying relationships exist?’, ‘Are some variables redundant?’, and ‘Is some target variable Y highly or weakly dependent on variable X?’ Interestingly, despite the evident need for a general-purpose measure of dependency between RVs, common practice is that most data analysts use the Pearson correlation coefficient to quantify dependence between RVs, while it is recognized that the correlation coefficient is essentially a measure for linear dependency only. Although many attempts have been made to define more generic dependency measures, there is no consensus yet on a standard, general-purpose dependency function. In fact, several ideal properties of a dependency function have been proposed, but without much argumentation. Motivated by this, we discuss and revise the list of desired properties and propose a new dependency function that meets all these requirements. This general-purpose dependency function provides data analysts with a powerful means to quantify the level of dependence between variables. To this end, we also provide Python code to determine the dependency function for use in practice.
