2009
DOI: 10.1007/978-0-387-84858-7

The Elements of Statistical Learning

Abstract: This study demonstrates the importance of obtaining statistically stable results when using machine learning methods to predict the activity of antimicrobial peptides in cases where, due to the cost and complexity of the chemical processes involved, datasets are particularly small (fewer than a few hundred instances). As in other fields with similar problems, this results in large variability in the performance of predictive models, hindering any attempt to transfer them to lab practice. Rather than targeting …

Cited by 20,750 publications (10,812 citation statements)
References 8 publications
“…To avoid overfitting the data, extensive validation is needed (Hastie et al 2009). For example, test set validation divides a bucket table into two parts, one serving as model set and the other as an independent test set.…”
Section: Results (mentioning)
confidence: 99%
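The test set validation described in the statement above can be sketched as follows. This is a minimal illustration, not the citing authors' procedure: the function name `split_model_and_test`, the 30% test fraction, the fixed seed, and the integer rows are all assumptions made for the example.

```python
import random


def split_model_and_test(rows, test_fraction=0.3, seed=0):
    """Shuffle rows and split them into a model set and an independent test set.

    The name, the default fraction, and the seed are hypothetical choices
    for this sketch; the cited statement does not specify them.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, leave the input untouched
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows form the test set, the rest the model set.
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for the rows of a data table
model_set, test_set = split_model_and_test(rows)
```

The model is fitted on `model_set` only, and its error is then measured on `test_set`, which it has never seen. That separation is what makes the test error a usable check against overfitting.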
“…To account for the uncertainty in the relationship between the ICF categories and the GH score due to the convenience sample of patients and missing values in the independent variables, especially in the ICF categories, the data set was imputed 1,000 times, and then a bootstrap sample (i.e., a random sample with replacement and of equal size to the original data set) (38) was drawn from each imputed data set and group lasso was applied to each of the 1,000 resulting data sets. Based on this procedure, the mean of the regression coefficients over the 1,000 trials and their pointwise 90% (percentile) confidence intervals were obtained.…”
Section: Data Collection (mentioning)
confidence: 99%
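The bootstrap step in the statement above (a random sample with replacement, equal in size to the original data set, followed by a mean and a pointwise 90% percentile interval over 1,000 trials) can be sketched roughly as below. Several things are assumptions made for illustration: the data are synthetic, the multiple-imputation step is omitted, and the sample mean stands in for the group lasso coefficients that the citing authors actually estimated. The helper names are hypothetical.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw a random sample with replacement, the same size as the original data."""
    return [rng.choice(data) for _ in data]


def percentile_interval(values, level=0.90):
    """Approximate percentile interval using simple rank-based indices.

    Cuts roughly (1 - level) / 2 of the sorted values from each tail.
    """
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))
    hi_idx = int(round((1.0 - tail) * (len(ordered) - 1)))
    return ordered[lo_idx], ordered[hi_idx]


rng = random.Random(0)
# Synthetic data, assumed for the sketch only.
data = [rng.gauss(10.0, 2.0) for _ in range(200)]

# One estimate per bootstrap trial; the sample mean replaces group lasso here.
estimates = [statistics.mean(bootstrap_sample(data, rng)) for _ in range(1000)]

mean_estimate = statistics.mean(estimates)
lower, upper = percentile_interval(estimates)
```

In the quoted procedure each of the 1,000 trials would start from a separately imputed data set, so the resulting interval reflects both sampling variability and uncertainty from the missing values.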
“…Contributions made by both individual researchers and teams can thus extend the data infrastructure, adding components that can then be used by later researchers. For example, many of the ideas being developed elsewhere in the frontier data world (Hastie et al, 2009), such as scraping and mining CVs; scraping sites such as Linked in and department or company websites could be applied here, as could many of the techniques being developed in econometrics (Varian, 2014). …”
Section: Using the Data: Building A Community Of Practice (mentioning)
confidence: 99%