2019
DOI: 10.1126/sciadv.aau6792

Machine learning in a data-limited regime: Augmenting experiments with synthetic data uncovers order in crumpled sheets

Abstract: Machine learning has gained widespread attention as a powerful tool to identify structure in complex, high-dimensional data. However, these techniques are ostensibly inapplicable for experimental systems where data are scarce or expensive to obtain. Here, we introduce a strategy to resolve this impasse by augmenting the experimental dataset with synthetically generated data of a much simpler sister system. Specifically, we study spontaneously emerging local order in crease networks of crumpled thin sheets, a p…

Cited by 63 publications (48 citation statements)
References 45 publications
“…However, the majority of machine-learning methods require data sets that are orders of magnitude larger than those gathered in "omics" studies with limited number of patients. This is why we decided to augment our data set with computer-generated and artificially noised data to train different deep learning algorithms [48,51]. Using this approach with bile lipidomic data we selected two sets of features, lipid species, that when analyzed with NN allowed a very good separation between control patients and those with CCA or PDAC-related strictures.…”
Section: Discussion (mentioning)
confidence: 99%
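The noise-based augmentation mentioned in this statement can be illustrated with a minimal sketch. It is not the citing authors' pipeline: the function name `augment_with_noise`, the `copies` count, and the `noise_scale` of 0.05 are all assumptions chosen for illustration.

```python
import random
import statistics


def augment_with_noise(samples, copies=3, noise_scale=0.05, seed=0):
    """Return the original samples plus `copies` noisy variants of each.

    Hypothetical helper: each feature receives Gaussian noise whose standard
    deviation is `noise_scale` times that feature's spread in the input.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])
    # Per-feature population standard deviation, so noise tracks each feature's scale.
    spreads = [
        statistics.pstdev([row[j] for row in samples]) for j in range(n_features)
    ]
    # Keep the measured samples first, then append the synthetic ones.
    augmented = [list(row) for row in samples]
    for row in samples:
        for _ in range(copies):
            augmented.append(
                [x + rng.gauss(0.0, noise_scale * s) for x, s in zip(row, spreads)]
            )
    return augmented


data = [[1.0, 10.0, 5.0], [2.0, 20.0, 5.0], [3.0, 30.0, 5.0]]
augmented = augment_with_noise(data, copies=2)
```

Because the noise is drawn independently for each feature, correlations between variables are only approximately preserved, and only while `noise_scale` stays small.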
“…On the other hand, the data structure has to be maintained. Biological data is full of correlated variables and it is important to maintain that relationship [51]. Once the synthetic data was generated as described in Materials and Methods, we applied three different reduction approaches for feature selection: DAPC, random forest (RF) and AUC analyses.…”
Section: Application Of Machine-learning Methods To Metabolomic Data (mentioning)
confidence: 99%
“…This result stems from the random training order of a randomly selected small dataset, e.g., 60 or 300 examples, consisting of a balanced appearance for each label. Around equalized trained label appearances, there are more temporal fluctuations for a dataset involving 300 examples with 30 appearances for conclusions Based on increased η with coherent consecutive gradients, the brain-inspired accelerated-learning mechanism outperforms existing common ML strategies for small sets of training examples 20 . Consistent results occur across various cost functions, e.g., square cost-function, however, with a relatively diminished performance (Fig.…”
mentioning
confidence: 95%
“…The importance of the descriptor selection from physical considerations has been observed for a diverse set of materials science applications; [1][2][3][4][5][16][17][18][19][20][21][22][23][24][25][26] however, it is not always possible to find the relevant physical descriptors for the desired application. Furthermore, even if physical descriptors have been identified, they are not always easily accessible.…”
Section: Article scitation.org/journal/jcp (mentioning)
confidence: 99%