Cement-stabilized rammed earth (CSRE) is a sustainable construction material that can also reduce the cost of a structure. Both properties stem from the fact that the soil used for the rammed mixture is usually excavated close to the construction site and therefore has variable characteristics. This is why there are no widely accepted mix prescriptions for CSRE that would guarantee sufficient compressive strength, and assessing which components of CSRE have the greatest impact on its compressive strength becomes an important issue. Three machine learning regression tools, namely artificial neural networks, decision trees, and random forests, are used to predict the compressive strength from the relative content of the CSRE constituents (clay, silt, sand, gravel, cement, and water). The database consisted of 434 CSRE samples, which were prepared and crushed for testing purposes. The relatively low prediction errors of these models allowed explainable artificial intelligence tools (drop-out loss, mean squared error reduction, accumulated local effects) to be used to rank the influence of the ingredients on the dependent variable, the compressive strength. The consistent results from all of these methods are discussed and compared with a statistical analysis of selected features. This innovative approach, helpful in designing the construction material, provides a solid basis for reliable conclusions.
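The abstract gives no code; the following is a minimal sketch, in Python with scikit-learn, of the kind of pipeline it describes: a random forest regressor trained on composition features, followed by a permutation-based importance ranking, which is analogous in spirit to the drop-out-loss measure. The file name and column names are assumptions for illustration, not the authors' data.

```python
# Minimal sketch (not the authors' code): random forest regression of CSRE
# compressive strength on mixture composition, plus a permutation-based
# feature-importance ranking similar in spirit to drop-out loss.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; the real database contains 434 crushed samples.
df = pd.read_csv("csre_samples.csv")
features = ["clay", "silt", "sand", "gravel", "cement", "water"]
X, y = df[features], df["compressive_strength"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_train, y_train)

# Permutation importance: drop in test-set score when one feature is shuffled.
imp = permutation_importance(model, X_test, y_test, n_repeats=30, random_state=0)
for name, mean in sorted(zip(features, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {mean:.3f}")
```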
Many methods have been developed to understand complex predictive models, and high expectations are placed on post-hoc model explainability. It turns out that such explanations are neither robust nor trustworthy, and they can be fooled. This paper presents techniques for attacking Partial Dependence (plots, profiles, PDP), which is among the most popular methods of explaining any predictive model trained on tabular data. We showcase that PD can be manipulated in an adversarial manner, which is alarming, especially in financial or medical applications where auditability has become a must-have trait supporting black-box models. The fooling is performed by poisoning the data so as to bend and shift explanations in the desired direction using genetic and gradient algorithms. To the best of our knowledge, this is the first work performing attacks on variable dependence explanations. The novel approach of using a genetic algorithm for this purpose is highly transferable, as it generalizes both ways: in a model-agnostic and an explanation-agnostic manner.
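For reference, the partial dependence profile of a single feature is simply the model's average prediction as that feature is swept over a grid while the other columns keep their observed values. Below is a minimal, self-contained Python sketch of that computation (not the attack itself), assuming only a fitted model with a predict method; the poisoning attack described in the abstract perturbs the background data so that this curve bends toward an attacker-chosen shape.

```python
# Minimal sketch of a one-dimensional partial dependence profile.
# Poisoning the background data X changes the averages below, and hence
# the shape of the curve, without touching the model itself.
import numpy as np

def partial_dependence_1d(model, X, feature_idx, grid_size=20):
    """Average model prediction as one feature is swept over a grid."""
    grid = np.linspace(X[:, feature_idx].min(), X[:, feature_idx].max(), grid_size)
    profile = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value                 # fix the explained feature
        profile.append(model.predict(X_mod).mean())   # average over the data
    return grid, np.array(profile)
```

Calling the same function on a poisoned copy of X returns the manipulated profile, which is how the effect of such an attack can be compared against the original explanation.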
Finding optimal hyperparameters for a machine learning algorithm can often significantly improve its performance. But how can they be chosen in a time-efficient way? In this paper we present a protocol for generating benchmark data describing the performance of different ML algorithms under different hyperparameter configurations. Data collected in this way are used to study the factors influencing algorithm performance. This collection was prepared for the purposes of the study presented in the EPP paper by Gosiewska et al. [2020]. We tested algorithm performance on a dense grid of hyperparameters. The tested datasets and hyperparameters were chosen before any algorithm was run and were not changed afterwards. This differs from the approach usually used in hyperparameter tuning, where the selection of candidate hyperparameters depends on previously obtained results; however, a fixed grid allows a systematic analysis of how performance depends on individual hyperparameters. The result is a comprehensive benchmark dataset that we would like to share, and we hope the computed and collected results may be helpful to other researchers. This paper describes how the data were collected. It provides benchmarks of 7 popular machine learning algorithms on 39 OpenML datasets. The detailed data forming this benchmark are available at: https://www.kaggle.com/mi2datalab/mementoml.
Related datasets
Kühn et al. [2018] introduced a benchmark of algorithms created for the OpenML repository. This dataset contains data about 6 algorithms written in R: glmnet, rpart, kknn, svm, ranger, xgboost. It also allows some additional computations to be run and further results to be obtained in a similar way. Smith et al. [2014] provide a MongoDB database with data at the instance level: it contains predictions made for every single instance in the considered datasets, together with information about the algorithms and their hyperparameters. This benchmark can also be extended. The mlpack benchmark of Edel et al. [2014] contains data about the performance of different algorithms in popular machine learning frameworks/libraries, and also provides comprehensive scripts for further evaluation.
Algorithms, datasets and hyperparameters used
We used several popular machine learning algorithms: gradient boosting on decision trees (catboost Prokhorenkova et al. [2017], gbm, xgboost), generalized linear models (glmnet Friedman et al. [2010]), k nearest neighbours (kknn), random forests (randomforest Liaw and Wiener [2002], ranger
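To make the fixed-grid protocol concrete, here is a minimal Python sketch using scikit-learn: a pre-chosen hyperparameter grid is enumerated exhaustively for one OpenML dataset, and the cross-validated score of every configuration is recorded rather than searched adaptively. The dataset id, algorithm, and grid values are illustrative assumptions and do not reproduce the paper's actual grids or R implementations.

```python
# Minimal sketch (not the authors' protocol code): evaluate one algorithm on a
# fixed, pre-chosen hyperparameter grid for one OpenML dataset, recording the
# score of every configuration instead of searching adaptively.
from sklearn.datasets import fetch_openml
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, cross_val_score

# Illustrative dataset: OpenML id 61 ("iris"); the benchmark uses 39 datasets.
X, y = fetch_openml(data_id=61, as_frame=False, return_X_y=True)

# The grid is fixed up front and never adapted to intermediate results.
grid = ParameterGrid({
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 6, None],
    "min_samples_leaf": [1, 5, 20],
})

results = []
for params in grid:
    model = RandomForestClassifier(random_state=0, **params)
    score = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    results.append({**params, "accuracy": score})
    print(params, round(score, 4))
```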