With the vast growth of information volume and variety in recent years, many organizations have turned to big data platforms and technologies [6]. Training machine learning algorithms on big data requires a distributed framework such as MAPREDUCE, which can induce multiple models in parallel from small subsets of a massive training set that cannot fit into the memory of a single machine. Here, we limit our discussion to the model-combining phase of distributed data processing, known as REDUCE, and focus specifically on the induction of decision tree models.
When running data-mining algorithms on big data platforms, a parallel, distributed framework such as MAPREDUCE may be used. In a parallel framework, however, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. To induce a single consistent model, ensemble algorithms such as majority voting aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in 43.75% of the experiments it is significantly lower; in one case the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
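As a rough illustration of the selection step, the sketch below trains decision trees on disjoint partitions of a dataset and picks the tree with the highest total similarity to the other local trees. The Jaccard overlap of split features used here is a deliberately simplified stand-in for the paper's syntactic similarity measure, not the actual SySM metric.

```python
# Minimal sketch of SySM-style representative-model selection.
# Assumption: the similarity below (Jaccard overlap of split-feature sets)
# is only an illustrative substitute for the paper's syntactic metric.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

def split_features(tree):
    """Set of feature indices used at internal (non-leaf) nodes."""
    t = tree.tree_
    return {t.feature[i] for i in range(t.node_count) if t.children_left[i] != -1}

def similarity(a, b):
    """Jaccard similarity of the two trees' split-feature sets."""
    sa, sb = split_features(a), split_features(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

X, y = load_breast_cancer(return_X_y=True)
parts = np.array_split(np.random.permutation(len(X)), 4)  # simulate 4 compute nodes
trees = [DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]) for idx in parts]

# Choose the tree with the highest total pairwise similarity to the others.
scores = [sum(similarity(t, u) for u in trees if u is not t) for t in trees]
representative = trees[int(np.argmax(scores))]
```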
This paper presents the use of two popular explainability tools, Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), to explain the predictions made by a trained deep neural network. The deep neural network used in this work is trained on the UCI Breast Cancer Wisconsin dataset and classifies the masses found in patients as benign or malignant based on 30 features describing each mass. LIME and SHAP are then used to explain the individual predictions made by the trained model. The explanations provide further insight into the relationship between the input features and the predictions, and the SHAP methodology additionally provides a more holistic view of the effect of the inputs on the output predictions. The results also highlight the commonalities between the insights gained using LIME and SHAP. Although this paper focuses on a deep neural network trained on the UCI Breast Cancer Wisconsin dataset, the methodology can be applied to other neural network architectures and other applications. The network trained in this work achieves a high level of accuracy, and analyzing it with LIME and SHAP adds the much-desired benefit of providing explanations for the recommendations made by the trained model.
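A minimal sketch of applying both tools, assuming the lime and shap Python packages; an sklearn MLP classifier stands in for the paper's deep network, whose exact architecture and training setup are not specified here.

```python
# Hedged sketch: LIME and SHAP explanations on the UCI Breast Cancer
# Wisconsin data. The MLP below is an illustrative stand-in model.
import shap
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(64, 32),
                                    max_iter=500, random_state=0)).fit(X_tr, y_tr)

# LIME: a local surrogate explanation for a single prediction.
lime_explainer = LimeTabularExplainer(X_tr,
                                      feature_names=list(data.feature_names),
                                      class_names=list(data.target_names))
explanation = lime_explainer.explain_instance(X_te[0], model.predict_proba)
print(explanation.as_list())  # feature contributions for this instance

# SHAP: model-agnostic KernelExplainer over a background sample.
background = shap.sample(X_tr, 50)
shap_explainer = shap.KernelExplainer(model.predict_proba, background)
shap_values = shap_explainer.shap_values(X_te[:5])
```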
Automated machine learning (AutoML) frameworks have become important tools in the data scientist's arsenal, as they dramatically reduce the manual work devoted to constructing ML pipelines. Such frameworks intelligently search among millions of possible ML pipelines - typically containing feature engineering, model selection, and hyperparameter tuning steps - and finally output the pipeline that is optimal in terms of predictive accuracy. However, when the dataset is large, each individual configuration takes longer to execute, and the overall AutoML running time grows accordingly. To this end, we present SubStrat, an AutoML optimization strategy that tackles the data size rather than the configuration space. It wraps existing AutoML tools; instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small yet representative data subset that preserves a particular characteristic of the full data. It then employs the AutoML tool on the small subset and finally refines the resulting pipeline by executing a restricted, much shorter AutoML process on the full dataset. Our experimental results on three popular AutoML frameworks, Auto-Sklearn, TPOT, and H2O, show that SubStrat reduces their running times by 76.3% on average, with only a 4.15% average decrease in the accuracy of the resulting ML pipeline.
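The sketch below illustrates the overall SubStrat-style flow under loose assumptions: a toy genetic search that matches per-feature means stands in for the actual characteristic-preserving fitness criterion, and the AutoML calls at the end are hypothetical placeholders rather than any framework's real API.

```python
# Illustrative sketch of the SubStrat-style flow: evolve a small subset that
# preserves a summary statistic of the full data, run AutoML on the subset,
# then refine briefly on the full dataset. Fitness here is a simplified
# stand-in for the paper's preserved data characteristic.
import numpy as np

def fitness(X, rows):
    """Negative L2 gap between subset and full-data feature means."""
    return -np.linalg.norm(X[rows].mean(axis=0) - X.mean(axis=0))

def genetic_subset(X, size, pop=20, gens=50, rng=np.random.default_rng(0)):
    population = [rng.choice(len(X), size, replace=False) for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda r: fitness(X, r), reverse=True)
        survivors = population[: pop // 2]          # keep the fitter half
        children = []
        for parent in survivors:
            child = parent.copy()
            child[rng.integers(size)] = rng.integers(len(X))  # point mutation
            children.append(child)
        population = survivors + children
    return max(population, key=lambda r: fitness(X, r))

# rows = genetic_subset(X, size=len(X) // 20)
# automl.fit(X[rows], y[rows])     # full AutoML search on the small subset
# automl.refit_short(X, y)         # hypothetical short refinement pass
```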
We demonstrate improved performance in the classification of bioelectric data for use in systems such as robotic prosthesis control, through data fusion using low-cost electromyography (EMG) and electroencephalography (EEG) devices. Prosthetic limbs are typically controlled through EMG, and whilst there is a wealth of research into the use of EEG as part of a brain-computer interface (BCI), the cost of EEG equipment commonly prevents this approach from being adopted outside the lab. This study demonstrates, as a proof of concept, that highly accurate multimodal classification can be achieved by using low-cost EMG and EEG devices in tandem with statistical decision-level fusion. We present multiple fusion methods, including some based on Jensen-Shannon divergence that had not previously been applied to this problem. We report accuracies of up to 99% when merging both signal modalities, improving on the best-case single-modality classification. We hence demonstrate the strengths of combining EMG and EEG in a multimodal classification system that could in future serve as an alternative control mechanism for robotic prostheses.
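A minimal sketch of one plausible decision-level fusion rule, assuming per-class probability vectors from separately trained EMG and EEG classifiers; weighting each modality's vote by its Jensen-Shannon distance from the uniform distribution is an illustrative confidence heuristic, not necessarily the paper's exact formulation.

```python
# Hedged sketch of decision-level fusion of EMG and EEG classifier outputs.
# A probability vector far from uniform is treated as a more confident vote.
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_confidence_fusion(p_emg, p_eeg):
    uniform = np.full_like(p_emg, 1.0 / len(p_emg))
    # Small epsilon guards the degenerate case of two uniform (zero-weight) votes.
    w_emg = jensenshannon(p_emg, uniform) + 1e-9
    w_eeg = jensenshannon(p_eeg, uniform) + 1e-9
    fused = w_emg * p_emg + w_eeg * p_eeg
    return fused / fused.sum()

p_emg = np.array([0.70, 0.20, 0.10])  # per-class probabilities from the EMG model
p_eeg = np.array([0.40, 0.45, 0.15])  # per-class probabilities from the EEG model
print(js_confidence_fusion(p_emg, p_eeg).argmax())  # fused class decision
```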