Abstract: Social discrimination (e.g., against females) arising from data mining techniques is a growing concern worldwide. In recent years, several methods have been proposed for making classifiers learned over discriminatory data discrimination-aware. However, these methods suffer from two major shortcomings: (1) they require either modifying the discriminatory data or tweaking a specific classification algorithm, and (2) they are not flexible w.r.t. discrimination control and multiple sensitive attribute handling. In this paper, we present two solutions for discrimination-aware classification that require neither data modification nor classifier tweaking. Our first and second solutions exploit, respectively, the reject option of probabilistic classifier(s) and the disagreement region of general classifier ensembles to reduce discrimination. We relate both solutions to decision theory for a better understanding of the process. Our experiments using real-world datasets demonstrate that our solutions outperform existing state-of-the-art methods, especially at low discrimination, which is a significant advantage. The superior performance, coupled with flexible control over discrimination and easy applicability to multiple sensitive attributes, makes our solutions an important step forward in practical discrimination-aware classification.
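The reject-option idea in the abstract above can be sketched as follows. This is a minimal illustration, assuming a binary task in which a probabilistic classifier already supplies the posterior probability of the favorable class. The names `reject_option_label`, `discrimination`, `theta`, and `deprived` are hypothetical and do not come from the paper. Instances whose posterior is close to 0.5 (the low-confidence, or reject, region) are labeled in favor of the deprived group, while confident predictions are left unchanged.

```python
def reject_option_label(p_favorable, deprived, theta=0.7):
    """Return 1 (favorable) or 0 (unfavorable) for a single instance.

    p_favorable: posterior probability of the favorable class (assumed given).
    deprived:    True if the instance belongs to the deprived group.
    theta:       confidence threshold in (0.5, 1); the default is hypothetical.
    """
    if not 0.5 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0.5 and 1")
    confidence = max(p_favorable, 1.0 - p_favorable)
    if confidence <= theta:
        # Low-confidence region: decide by group membership, not the posterior.
        return 1 if deprived else 0
    # Confident prediction: keep the classifier's own decision.
    return 1 if p_favorable > 0.5 else 0


def discrimination(labels, deprived_flags):
    """Favorable-outcome rate of the favored group minus that of the deprived group."""
    favored = [y for y, d in zip(labels, deprived_flags) if not d]
    deprived = [y for y, d in zip(labels, deprived_flags) if d]
    return sum(favored) / len(favored) - sum(deprived) / len(deprived)
```

In this sketch, raising `theta` widens the low-confidence region, which is one way the "flexible control over discrimination" mentioned in the abstract could be exercised without touching the training data or the classifier.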
Abstract: In data mining we often have to learn from biased data because, for instance, the data comes from different batches or there was a gender or racial bias in the collection of social data. In some applications it may be necessary to explicitly control this bias in the models we learn from the data. This paper is the first to study learning linear regression models under constraints that control the biasing effect of a given attribute such as gender or batch number. We show how propensity modeling can be used for factoring out the part of the bias that can be justified by externally provided explanatory attributes. Then we analytically derive linear models that minimize squared error while controlling the bias by imposing constraints on the mean outcome or residuals of the models. Experiments with discrimination-aware crime prediction and batch effect normalization tasks show that the proposed techniques are successful in controlling attribute effects in linear regression models.
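One way to make the mean-outcome constraint concrete is the textbook equality-constrained least-squares problem below. The symbols are assumptions introduced for illustration: X is the design matrix, y the target, d a group-contrast vector, and c = Xᵀd. The paper's own derivation may differ, for example in how explanatory attributes enter through propensity modeling. The sketch also assumes XᵀX is invertible and c is nonzero.

```latex
\begin{aligned}
&\min_{w}\ \lVert y - Xw \rVert_2^2
\quad \text{subject to} \quad c^{\top} w = 0,
\qquad c = X^{\top} d, \\
&d_i =
\begin{cases}
 \phantom{-}1/n_1 & \text{if instance } i \text{ belongs to group 1}, \\
 -1/n_0 & \text{if instance } i \text{ belongs to group 0},
\end{cases} \\
&\hat{w}_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}y, \\
&\hat{w} = \hat{w}_{\mathrm{OLS}}
 - (X^{\top}X)^{-1} c\,\bigl(c^{\top}(X^{\top}X)^{-1}c\bigr)^{-1} c^{\top}\hat{w}_{\mathrm{OLS}}.
\end{aligned}
```

Here cᵀw equals the mean prediction for group 1 minus the mean prediction for group 0, so the constraint forces the two group means of the predicted outcome to coincide. The last line follows from setting the gradient of the Lagrangian to zero and solving for the multiplier.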
Traffic incidents are nonrecurrent and pseudorandom events that disrupt the normal flow of traffic and create a bottleneck in the road network. The probability of incidents is higher during peak flow rates when the systemwide effect of incidents is most severe. Model-based solutions to the incident detection problem have not produced practical, useful results primarily because the complexity of the problem does not lend itself to accurate mathematical and knowledge-based representations. A new multiparadigm intelligent system approach is presented for the solution of the problem, employing advanced signal processing, pattern recognition, and classification techniques. The methodology effectively integrates fuzzy, wavelet, and neural computing techniques to improve reliability and robustness. A wavelet-based denoising technique is employed to eliminate undesirable fluctuations in observed data from traffic sensors. Fuzzy c-mean clustering is used to extract significant information from the observed data and to reduce its dimensionality. A radial basis function neural network (RBFNN) is developed to classify the denoised and clustered observed data. The new model produced excellent incident detection rates with no false alarms when tested using both real and simulated data.
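The clustering step named in the abstract above can be illustrated with a small fuzzy c-means routine. This is a sketch under stated assumptions, not the authors' implementation: it clusters scalar readings only, uses a deterministic evenly spaced initialization, and the function names and the fuzzifier default `m=2.0` are hypothetical. The wavelet denoising and RBFNN stages are not shown.

```python
def memberships(points, centers, m):
    """Standard fuzzy c-means membership update for 1-D points."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        zero = [k for k, dist in enumerate(dists) if dist == 0.0]
        if zero:
            # Point coincides with one or more centers: share membership among them.
            row = [1.0 / len(zero) if k in zero else 0.0
                   for k in range(len(centers))]
        else:
            row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                   for dj in dists]
        rows.append(row)
    return rows


def update_centers(points, u, centers, m):
    """Each center becomes the mean of the points weighted by membership**m."""
    new_centers = []
    for j, old in enumerate(centers):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        if total == 0.0:
            new_centers.append(old)  # no support for this cluster: keep old center
        else:
            new_centers.append(
                sum(w * x for w, x in zip(weights, points)) / total)
    return new_centers


def fuzzy_c_means_1d(points, n_clusters=2, m=2.0, n_iter=50):
    """Return (centers, membership rows) after a fixed number of iterations."""
    lo, hi = min(points), max(points)
    # Deterministic initialization: centers evenly spaced over the data range.
    centers = [lo + (hi - lo) * (j + 0.5) / n_clusters
               for j in range(n_clusters)]
    for _ in range(n_iter):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, centers, m)
    return centers, memberships(points, centers, m)
```

Each membership row sums to 1, so every reading is shared across clusters rather than assigned to exactly one. That soft assignment is what distinguishes fuzzy c-means from hard clustering such as k-means.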
A multiparadigm general methodology is advanced for development of reliable, efficient, and practical freeway incident detection algorithms. The performance of the new fuzzy-wavelet radial basis function neural network (RBFNN) freeway incident detection model of Adeli and Karim is evaluated and compared with the benchmark California algorithm #8 using both real and simulated data. The evaluation is based on three quantitative measures of detection rate, false alarm rate, and detection time, and the qualitative measure of algorithm portability. The new algorithm outperformed the California algorithm consistently under various scenarios. False alarms are a major hindrance to the widespread implementation of automatic freeway incident detection algorithms. The false alarm rate ranges from 0 to 0.07% for the new algorithm and from 0.53 to 3.82% for the California algorithm. The new fuzzy-wavelet RBFNN freeway incident detection model is a single-station pattern-based algorithm that is computationally efficient and requires no recalibration. The new model can be readily transferred without retraining and without any performance deterioration.
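The three quantitative measures named above are not defined in the abstract. The sketch below uses definitions that are common in the incident detection literature, stated here as assumptions: detection rate as the share of incidents detected, false alarm rate as the share of all algorithm decisions that were false alarms, and detection time as the mean delay from incident onset to the first alarm. The function names are hypothetical, and other denominators for the false alarm rate are also in use.

```python
def detection_rate(n_detected, n_incidents):
    """Percentage of actual incidents that the algorithm detected."""
    return 100.0 * n_detected / n_incidents


def false_alarm_rate(n_false_alarms, n_decisions):
    """Percentage of all algorithm decisions that were false alarms (assumed definition)."""
    return 100.0 * n_false_alarms / n_decisions


def mean_detection_time(onsets, detections):
    """Mean delay between onset and detection, ignoring missed incidents.

    onsets:     incident start times, in seconds.
    detections: matching alarm times in seconds, or None if the incident was missed.
    Returns None when no incident was detected.
    """
    delays = [d - o for o, d in zip(onsets, detections) if d is not None]
    return sum(delays) / len(delays) if delays else None
```

Under these assumed definitions, a false alarm rate of 0.07% corresponds to 7 false alarms in 10,000 decisions, which helps explain why small percentage differences matter when an algorithm makes a decision at every sensor reading interval.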