2013
DOI: 10.48550/arxiv.1309.0238
Preprint

API design for machine learning software: experiences from the scikit-learn project

Abstract: scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and processing units in the library and then discuss its advantages in terms of composition and reusability. The paper als…

Citations: cited by 230 publications (234 citation statements)
References: 11 publications
“…Parallelization for multi-core execution is also available for a set of algorithms using joblib. Inspired by scikit-learn's API design (Buitinck et al, 2013), all implemented outlier detection algorithms inherit from a base class with the same interface: (i) fit processes the train data and computes the necessary statistics; (ii) decision_function generates raw outlier scores for unseen data after the model is fitted; (iii) predict returns a binary class label corresponding to each input sample instead of the raw outlier score and (iv) predict_proba offers the result as a probability using either normalization or Unification (Kriegel et al, 2011). Within this framework, new models are easy to implement by taking advantage of inheritance and polymorphism.…”
Section: Library Design and Implementation (mentioning)
confidence: 99%
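The four-method interface described in this statement can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the cited library's code: the names `BaseDetector`, `ZScoreDetector`, `_set_threshold`, and the `contamination` parameter are hypothetical, the data is one-dimensional for brevity, and `predict_proba` uses simple min-max normalization only.

```python
import math
from abc import ABC, abstractmethod


class BaseDetector(ABC):
    """Hypothetical base class: subclasses supply fit and decision_function."""

    def __init__(self, contamination=0.1):
        # Assumed hyperparameter: expected fraction of outliers in the training data.
        self.contamination = contamination

    @abstractmethod
    def fit(self, X):
        """Compute the statistics the detector needs; return self."""

    @abstractmethod
    def decision_function(self, X):
        """Return one raw outlier score per sample (higher = more anomalous)."""

    def _set_threshold(self, train_scores):
        # Flag roughly the top `contamination` fraction of training scores.
        ordered = sorted(train_scores)
        n = len(ordered)
        k = min(n, max(1, round(self.contamination * n)))
        self.threshold_ = ordered[n - k]
        self.min_score_ = ordered[0]
        self.max_score_ = ordered[-1]

    def predict(self, X):
        # Binary label per sample: 1 = outlier, 0 = inlier.
        return [1 if s >= self.threshold_ else 0 for s in self.decision_function(X)]

    def predict_proba(self, X):
        # Min-max normalization of raw scores, clipped to [0, 1].
        span = (self.max_score_ - self.min_score_) or 1.0
        return [
            min(1.0, max(0.0, (s - self.min_score_) / span))
            for s in self.decision_function(X)
        ]


class ZScoreDetector(BaseDetector):
    """Toy detector: the score is the absolute z-score of each value."""

    def fit(self, X):
        n = len(X)
        self.mean_ = sum(X) / n
        variance = sum((x - self.mean_) ** 2 for x in X) / n
        self.std_ = math.sqrt(variance) or 1.0
        self._set_threshold(self.decision_function(X))
        return self

    def decision_function(self, X):
        return [abs(x - self.mean_) / self.std_ for x in X]


train = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 10.0]
detector = ZScoreDetector(contamination=0.1).fit(train)
labels = detector.predict([1.0, 12.0])
proba = detector.predict_proba([1.0, 12.0])
```

Because `predict` and `predict_proba` live in the base class, a new detector in this sketch only has to implement `fit` and `decision_function`, which is the inheritance benefit the statement points to.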
“…The pipelines in the motivating examples are depicted in Figure 1, which follows the representation provided by Yang et al [71]. In this paper, we adapted the canonical definition of pipeline from Scikit-Learn pipeline specification [14,63], which is aligned with the ML models studied in the literature for fair classification tasks [3,8,10,26,27,71]. We are interested in investigating the fairness of the data preprocessing stages in the pipeline, which is depicted with grey boxes in Figure 1.…”
Section: Pipeline (mentioning)
confidence: 99%
“…A data transformer is a well-known algorithm or method to perform a specific operation such as variable encoding, feature selection, feature extraction, dimensionality reduction, etc. on the data [14]. For example, in the second motivating example, two transformers (PCA and SelectKBest) have been used.…”
Section: Pipeline (mentioning)
confidence: 99%
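The idea of a pipeline as a chain of data transformers followed by a model can be sketched without any library. The classes below are hypothetical stand-ins written for illustration: `MeanCenterer` and `TopVarianceSelector` are not PCA or SelectKBest, and `SimplePipeline` only imitates the fit/transform chaining convention rather than reproducing scikit-learn's `Pipeline`.

```python
class MeanCenterer:
    """Transformer: subtract the per-column mean learned in fit."""

    def fit(self, X, y=None):
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class TopVarianceSelector:
    """Transformer: keep the k columns with the largest variance."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y=None):
        n = len(X)
        variances = []
        for col in zip(*X):
            mean = sum(col) / n
            variances.append(sum((v - mean) ** 2 for v in col) / n)
        ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
        self.selected_ = sorted(ranked[: self.k])
        return self

    def transform(self, X):
        return [[row[j] for j in self.selected_] for row in X]


class NearestCentroidClassifier:
    """Final estimator: predict the label of the closest class centroid."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]


class SimplePipeline:
    """Chain (name, transformer) steps and finish with a predictor."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        # Each transformer is fitted on the output of the previous one.
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # New data passes through the already-fitted transformers only.
        for _, transformer in self.steps[:-1]:
            X = transformer.transform(X)
        return self.steps[-1][1].predict(X)


X_train = [[0.0, 5.0], [0.2, 5.1], [4.0, 5.0], [4.2, 4.9]]
y_train = [0, 0, 1, 1]
pipe = SimplePipeline([
    ("center", MeanCenterer()),
    ("select", TopVarianceSelector(k=1)),
    ("clf", NearestCentroidClassifier()),
])
pipe.fit(X_train, y_train)
predictions = pipe.predict([[0.1, 5.0], [3.9, 5.0]])
```

Each preprocessing stage here is a separate object with its own fitted state, which is what makes it possible to inspect or swap a single stage, as the cited work does when it examines the preprocessing steps individually.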
“…The API closely follows that of scikit-learn [20] to make the package accessible to those with even basic knowledge of machine learning in Python [21]. The main object type in mvlearn is the estimator object, which is modeled after scikit-learn's estimator.…”
Section: API Design (mentioning)
confidence: 99%
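To make "modeled after scikit-learn's estimator" concrete, here is a small sketch of the conventions such an object usually follows: hyperparameters are stored unchanged in the constructor, `get_params` and `set_params` expose them, `fit` returns `self`, and learned attributes end in an underscore. The class `ViewwiseCenterer` is hypothetical, and taking a list of data views as input is an assumption made for illustration, not a description of mvlearn's actual classes.

```python
class ViewwiseCenterer:
    """Hypothetical estimator that centers each data view independently."""

    def __init__(self, with_mean=True):
        # The constructor only stores hyperparameters; no computation happens here.
        self.with_mean = with_mean

    def get_params(self, deep=True):
        return {"with_mean": self.with_mean}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, Xs, y=None):
        # Xs is assumed to be a list of views, each a list of rows.
        self.means_ = [
            [sum(col) / len(view) for col in zip(*view)] for view in Xs
        ]
        return self

    def transform(self, Xs):
        if not self.with_mean:
            return [[list(row) for row in view] for view in Xs]
        return [
            [[v - m for v, m in zip(row, means)] for row in view]
            for view, means in zip(Xs, self.means_)
        ]


views = [
    [[1.0, 2.0], [3.0, 4.0]],  # view 1: two samples, two features
    [[10.0], [20.0]],          # view 2: the same two samples, one feature
]
centerer = ViewwiseCenterer().fit(views)
centered = centerer.transform(views)
```

Keeping hyperparameters readable and writable through `get_params` and `set_params` is what lets generic tooling clone or reconfigure an estimator without knowing its concrete type.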