Accelerating a Random Forest Classifier: Multi-Core, GP-GPU, or FPGA?

Essen, Brian Van; Macaraeg, Chris; Gokhale, Maya; Prenger, Ryan

doi:10.1109/fccm.2012.47

Cited by 142 publications

(83 citation statements)

References 7 publications


“…To date, several non-chemogenomic studies have shown that the number of trees in a random forest could be reduced to a certain degree without losing model performance [64][65][66][67][68], and often rule-of-thumb suggestions are made and accepted within the different computational communities, which allow model training without previous parameter optimization. To probe whether parameter optimization might be an advantageous step instead of accepting rule-of-thumb guidelines, we investigated how chemogenomic active learning performance changed when reducing the number of trees.…”

Section: Discussion (mentioning)

confidence: 99%

Small Random Forest Models for Effective Chemogenomic Active Learning

Rakers

Reker

Brown

2017

J. Comput. Aided Chem.


The identification of new compound-protein interactions has long been the fundamental quest in the field of medicinal chemistry. With increasing amounts of biochemical data, advanced machine learning techniques such as active learning have proven beneficial for building high-performance prediction models upon subsets of such complex data. In a recently published paper, chemogenomic active learning was applied to the interaction spaces of kinases and G protein-coupled receptors featuring over 150,000 compound-protein interactions. Prediction models were actively trained based on random forest classification using 500 decision trees per experiment. In a new direction for chemogenomic active learning, we address the question of how forest size influences model evolution and performance. In addition to the original chemogenomic active learning findings that highly predictive models could be constructed from a small fraction of the available data, we find here that model complexity as viewed by forest size can be reduced to one-fourth or one-fifth of the previously investigated forest size while still maintaining reliable prediction performance. Thus, chemogenomic active learning can yield predictive models with reduced complexity based on only a fraction of the data available for model construction.


“…Our implementation of the random forest classifier uses the version provided by the Weka machine learning library 5 [16], which is a collection of algorithms for machine learning and data mining. We chose the random forest approach, because it is fast and achieves good results [49]. It is important to point out that for this step, another classification algorithm can also be used.…”

Section: Multi-class Global-feature-based Eir (mentioning)

confidence: 99%

Efficient disease detection in gastrointestinal videos – global features versus neural networks

Pogorelov

Riegler

Eskeland

et al. 2017

Multimed Tools Appl


Analysis of medical videos from the human gastrointestinal (GI) tract for detection and localization of abnormalities like lesions and diseases requires both high precision and recall. Additionally, it is important to support efficient, real-time processing for live feedback during (i) standard colonoscopies and (ii) scalability for massive population-based screening, which we conjecture can be done using a wireless video capsule endoscope (camera-pill). Existing related work in this field provides the necessary combination of accuracy and performance neither for detecting multiple classes of abnormalities simultaneously nor for particular disease localization tasks. In this paper, a complete end-to-end multimedia system is presented where the aim is to tackle automatic analysis of GI tract videos. The system includes an entire pipeline ranging from data collection, processing and analysis, to visualization. The system combines deep learning neural networks, information retrieval, and analysis of global and local image features in order to implement multi-class classification, detection and localization. Furthermore, it is built in a modular way, so that it can be easily extended to deal with other types of abnormalities. Simultaneously, the system is developed for efficient processing in order to provide real-time feedback to the doctors and for scalability reasons when potentially applied for massive population-based algorithmic screenings in the future. Initial experiments show that our system has multiclass detection accuracy and polyp localization precision at least as good as state-of-the-art systems, and provides additional novelty in terms of real-time performance, low resource consumption and ability to extend with support for new classes of diseases.


“…It is notable that in most systems this design will be I/O … It is notable that all tested software implementations were single threaded, but decision tree ensembles can be effectively implemented as multithreaded programs running on multicore processors [17].…”

Section: Resource Usage (mentioning)

confidence: 99%

“…Parallel implementations of random forests for URL classification on multicore CPU, GPGPU and FPGA are presented in [17]. Publicly available dataset is used by the au- …”

Section: Related Work (mentioning)

confidence: 99%

“…In recent years, many parallel implementations of both random forest learning and prediction were presented, targeted at multicore CPUs, GPGPUs and FPGAs [11,13,14,15,17]. Each sample from input data set is classified independently of others, making it possible to exploit data-level parallelism by processing different samples by the same classification/regression tree simultaneously.…”

Section: Introduction (mentioning)

confidence: 99%


FPGA Implementation of Decision Trees and Tree Ensembles for Character Recognition in Vivado HLS

Kułaga

Gorgon

2014

Image Processing & Communications


Abstract. Decision trees and decision tree ensembles are popular machine learning methods, used for classification and regression. In this paper, an FPGA implementation of decision trees and tree ensembles for letter and digit recognition in Vivado High-Level Synthesis is presented. Two publicly available datasets were used at both training and testing stages. Different optimizations for tree code and tree node layout in memory are considered. Classification accuracy, throughput and resource usage for different training algorithms, tree depths and ensemble sizes are discussed. The correctness of the module's operation was verified using C/RTL cosimulation and on a Zynq-7000 SoC device, using Xillybus IP core for data transfer between the processing system and the programmable logic.
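The citation statements above note that each sample is classified independently of the others, and the abstract mentions laying out tree nodes in memory. The following is a minimal sketch of both ideas, not the implementation from any of the cited papers. The flat node tuple `(feature, threshold, left, right, label)`, the three toy trees, the class labels, and the function names are all hypothetical and chosen only for illustration. Python threads are used here to show the structure of the per-sample parallelism; because of the interpreter lock they would not be expected to give a real speedup for this pure-Python workload.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical flat node layout: (feature, threshold, left, right, label).
# A leaf is marked by feature == -1 and carries a class label.
LEAF = -1
TREE_A = [
    (0, 0.5, 1, 2, None),
    (LEAF, 0.0, -1, -1, "benign"),
    (LEAF, 0.0, -1, -1, "malicious"),
]
TREE_B = [
    (1, 2.0, 1, 2, None),
    (LEAF, 0.0, -1, -1, "benign"),
    (LEAF, 0.0, -1, -1, "malicious"),
]
TREE_C = [
    (0, 0.8, 1, 2, None),
    (LEAF, 0.0, -1, -1, "benign"),
    (1, 1.0, 3, 4, None),
    (LEAF, 0.0, -1, -1, "benign"),
    (LEAF, 0.0, -1, -1, "malicious"),
]
FOREST = [TREE_A, TREE_B, TREE_C]


def predict_tree(tree, sample):
    """Walk one tree from the root (index 0) down to a leaf."""
    i = 0
    while tree[i][0] != LEAF:
        feature, threshold, left, right, _ = tree[i]
        i = left if sample[feature] <= threshold else right
    return tree[i][4]


def predict_forest(forest, sample):
    """Majority vote over the per-tree predictions for one sample."""
    votes = Counter(predict_tree(tree, sample) for tree in forest)
    return votes.most_common(1)[0][0]


def predict_batch(forest, samples, workers=4):
    """Classify samples independently; each task handles one sample."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: predict_forest(forest, s), samples))


SAMPLES = [[0.2, 1.0], [0.9, 3.0], [0.6, 0.5]]
PREDICTIONS = predict_batch(FOREST, SAMPLES)
```

Because no sample depends on another, the same decomposition carries over to process pools, GPU threads, or parallel hardware pipelines; only the mapping of samples to workers changes.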
