2017
DOI: 10.1109/tkde.2017.2720168

Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data

Abstract: Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by …

Cited by 1,011 publications (651 citation statements)
References 76 publications
“…, Turner and Carpenter); and the melding of data‐driven models with theory‐based ecological models that facilitate both data mining and interpretation (Karpatne et al.). These computing approaches, which all represent significant breakthroughs in the realm of computer science, are beginning to be used by ecologists, but are not yet well‐integrated into ecological research (Porter et al.…”
Section: Increasing Collaboration Between Computer Scientists and Eco
Citation type: mentioning
confidence: 99%
“…In particular, major challenges that remain include transmitting ecological data from sensors to discoverable and accessible repositories, converting raw data to meaningful ecological variables (e.g., translating electrical voltages from fluorescent sensors into phytoplankton biomass concentrations; Roesler et al 2017), extracting usable information from complex and diverse data sources, and connecting patterns in the data to ecological processes (Lee et al 2018). These challenges create many opportunities for collaboration between ecologists and computer scientists, including the development of cyberinfrastructure (e.g., virtual private network software and cloud computing) that is adaptable to a variety of data streams from a diversity of environmental observatories and facilitates findable, accessible, interoperable, and re-usable data (FAIR data; Wilkinson et al 2016); the adoption of software techniques and technologies within the ecological research community (e.g., modeling in R); the development of computing power that scales with the size of the data and the demands of the models (Subratie et al 2017, Turner and Carpenter 2017); and the melding of data-driven models with theory-based ecological models that facilitate both data mining and interpretation (Karpatne et al 2017). These computing approaches, which all represent significant breakthroughs in the realm of computer science, are beginning to be used by ecologists, but are not yet well-integrated into ecological research (Porter et al 2005, Seidl 2017).…”
Section: Benefits To Ecology
Citation type: mentioning
confidence: 99%
“…This is not meant to be a comprehensive survey, but rather a sample of pioneering work on artificial intelligence for scientific discovery for readers unfamiliar with this literature. For more detailed overviews see [16,18,21,23,36]. The goal here is to highlight two important things.…”
Section: Artificial Intelligence In Science
Citation type: mentioning
confidence: 99%
“…Standard approaches can be categorized into two major classes: the Theory-driven approach, and the Data-driven approach [1]. The Theory-driven approach utilizes background knowledge accumulated through prior research to establish future knowledge, while the data-driven approach relies solely upon the data being analyzed to generate new scientific knowledge.…”
Section: Introduction
Citation type: mentioning
confidence: 99%