Composable Multi-Threading and Multi-Processing for Numeric Libraries

Malakhov, Anton; Liu, David; Gorshkov, Anton; Wilmarth, Terry

doi:10.25080/majora-4af1f417-003

Cited by 7 publications

(5 citation statements)

References 1 publication


“…[38] To tune the hyperparameters of our models, we used the TPE approach implemented in the Optuna framework. [39] We have used Intel Distribution for Python and Python API for Intel Data Analytics Acceleration Library (Intel DAAL) - named PyDAAL [40] - to boost ML and data analytics performance. Using the advantage of optimized Scikit-learn (Scikit-learn with Intel DAAL) that comes with it, we were able to achieve faster training time and accurate results for the prediction problem.…”

Section: Model Building (mentioning)

confidence: 99%

Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake

2021


Summary: DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We have deployed a host of machine learning algorithms, including the popular state-of-the-art LightGBM (a gradient boosting model), for building prediction models. We used the nested cross-validation strategy to address the issues of “overfitting” and selection bias. This simultaneously provides an unbiased estimate of the generalization performance of a machine learning algorithm and allows us to tune the hyperparameters optimally. Furthermore, we built a secondary model based on SHAP (SHapley Additive exPlanations) that offers crucial insight into model interpretability. Our detailed model-building strategy and robust statistical validation protocols tackle the formidable challenge of working on small datasets, which is often the case in biological and medical data.


“…We have used Intel Distribution for Python and Python API for Intel Data Analytics Acceleration Library (Intel DAAL) - named PyDAAL [30] - to boost machine-learning and data analytics performance. Using the advantage of optimised scikit-learn (Scikit-learn with Intel DAAL) that comes with it, we were able to achieve faster training time and accurate results for the prediction problem.…”

Section: (D) Splitting the Train And Test Data (mentioning)

confidence: 99%

Accurate Prediction of B-form/A-form DNA Conformation Propensity from Primary Sequence: A Machine Learning and Free energy Handshake

Gupta, Kulkarni, Mukherjee

2020

Preprint


DNA carries the genetic code of life. Different conformations of DNA are associated with various biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. Although a few efforts were made in this regard, currently there exists no method that can accurately predict the conformation of right-handed DNA solely from the sequence. In this study, we present a novel approach based on machine learning that predicts A-DNA and B-DNA conformational propensities of a sequence with high accuracy (~95%). In addition, we show that the impact of the dinucleotide steps in determining the conformation agrees qualitatively with the free energy cost for A-DNA formation in water. This method enables us to examine the genomic sequence to understand the prospective biological roles played by the A-form of DNA.


“…The application of parallel and multiprocessor algorithms can break down significant numerical problems into smaller subtasks, reducing the total computation time on multiprocessor computers and resulting in better performance [23]. In dealing with this parallel computing problem, the concept of a processing "pool" is used: "tasks" (data) are forwarded in bulk to the pool, and the pool handles the distribution of tasks to a number of available worker processes [27]-[29].…”

Section: Introduction (mentioning)

confidence: 99%

Evaluating Web Scraping Performance Using XPath, CSS Selector, Regular Expression, and HTML DOM With Multiprocessing Technical Applications

Darmawan, Maulana, Gunawan et al.

2022

JOIV : Int. J. Inform. Visualization


Data collection has become a necessity today, especially since many sources of data on the internet can be used for various needs. The main activity in data collection is gathering quality information that can be analyzed and used to support decisions or provide evidence. The process of retrieving data from the internet is known as web scraping, and several methods are in common use. Because of the amount of data scattered across the internet, large-scale web scraping is time-consuming. Applying a parallel, multi-processing approach can help complete the work. This study aimed to determine the performance of web scraping methods when multi-processing is applied. Testing was done by scraping data from a predetermined target website. Four web scraping methods (CSS Selector, HTML DOM, Regex, and XPath) were selected for the experiment and measured on CPU usage, memory usage, execution time, and bandwidth usage. Based on the experimental data, the Regex method has the lowest CPU and memory usage, XPath requires the least time, and the CSS Selector method uses the least bandwidth. Applying multi-processing to each web scraping method was shown to save memory, reduce execution time, and reduce bandwidth usage compared with single processing.
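The processing "pool" described in the citation statement for this entry can be illustrated with Python's standard `multiprocessing` module. This is a minimal sketch, not code from either paper: the helper names `word_count` and `run_in_pool`, and the word-counting task itself, are hypothetical stand-ins chosen for illustration.

```python
import multiprocessing as mp


def word_count(text):
    """Hypothetical stand-in 'task': count whitespace-separated words."""
    return len(text.split())


def run_in_pool(func, tasks, workers=2, start_method=None):
    """Forward all tasks to a pool in bulk and collect the results.

    start_method=None selects the platform's default start method;
    "fork" and "spawn" are the usual explicit choices.
    """
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=workers) as pool:
        # map() splits the task list into chunks, hands the chunks to the
        # worker processes, and returns results in the original task order.
        return pool.map(func, tasks)
```

On platforms that start workers with "spawn", `run_in_pool` should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module without re-running the call.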
