Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
DOI: 10.1145/2463676.2463702

Towards high-throughput Gibbs sampling at scale

Abstract: Factor graphs and Gibbs sampling are a popular combination for Bayesian statistical methods that are used to solve diverse problems including insurance risk models, pricing models, and information extraction. Given a fixed sampling method and a fixed amount of time, an implementation of a sampler that achieves a higher throughput of samples will produce higher-quality results than a lower-throughput sampler. We study how (and whether) traditional data processing choices about materialization, page layout, and buffer…
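As a point of reference for the abstract above, here is a minimal sketch (not from the paper; the variables, factors, and weights are made up) of Gibbs sampling over a small log-linear factor graph. One sweep resamples every variable conditioned on the factors touching it, and throughput is simply sweeps per second.

```python
import math
import random

# Toy log-linear factor graph (hypothetical variables, factors, and weights).
# Each factor is (variable_ids, weight, feature_fn over the listed variables).
variables = {0: 1, 1: 0, 2: 1}   # current 0/1 assignment of each variable
factors = [
    ([0, 1], 1.5, lambda a: 1.0 if a[0] == a[1] else 0.0),   # "agreement" factor
    ([1, 2], 0.8, lambda a: float(a[0] and a[1])),           # "and" factor
    ([2],    2.0, lambda a: float(a[0])),                    # prior on variable 2
]

# Edges of the factor graph: variable id -> factors that touch it.
touching = {v: [f for f in factors if v in f[0]] for v in variables}

def prob_true(v):
    """P(v = 1 | all other variables), computed from the factors touching v."""
    energy = []
    for value in (0, 1):
        e = 0.0
        for vids, w, feat in touching[v]:
            assignment = [value if u == v else variables[u] for u in vids]
            e += w * feat(assignment)
        energy.append(e)
    return math.exp(energy[1]) / (math.exp(energy[0]) + math.exp(energy[1]))

def gibbs_sweep():
    """One sweep resamples every variable once; throughput = sweeps per second."""
    for v in variables:
        variables[v] = 1 if random.random() < prob_true(v) else 0

for _ in range(1000):
    gibbs_sweep()
print(variables)
```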

Cited by 42 publications (36 citation statements)
References 37 publications
“…DimmWitted [55], the statistical inference and learning engine in DeepDive, is built upon our research of how to design a high-performance statistical inference and learning engine on a single machine [29,41,54,55]. DimmWitted models Gibbs sampling as a “column-to-row access” operation: each row corresponds to one factor, each column to one variable, and the non-zero elements in the matrix correspond to edges in the factor graph.…”
Section: System Infrastructure
confidence: 99%
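The “column-to-row access” pattern quoted above can be illustrated with a small sketch; the data layout below is hypothetical and stands in for whatever storage DimmWitted actually uses. Resampling a variable first reads its column to find the incident factors, then reads those rows to evaluate them.

```python
from collections import defaultdict

# Sparse factor-graph matrix kept under two access paths (illustrative only):
# row_index:    factor id   -> variable ids in that factor (a row)
# column_index: variable id -> factor ids it appears in    (a column)
row_index = {0: [0, 1], 1: [1, 2], 2: [2]}
column_index = defaultdict(list)
for fid, vids in row_index.items():
    for vid in vids:
        column_index[vid].append(fid)

def column_to_row_access(vid):
    """To resample variable `vid`: read its column to get the incident factors,
    then read each of those rows to fetch the factor's full variable set."""
    incident_factors = column_index[vid]                        # column access
    return [(fid, row_index[fid]) for fid in incident_factors]  # row accesses

print(column_to_row_access(1))  # -> [(0, [0, 1]), (1, [1, 2])]
```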
“…DeepDive's model of KBC is motivated by recent attempts to use machine learning-based techniques for KBC [3,4,24,38,46,52,56] and by the line of research that aims to improve the quality of a specific component of a KBC system [7,12,15,21,26,27,31–33,35,39,42,47,48,51,53,54]. When designing DeepDive, we used these systems as test cases to justify the generality of our framework.…”
Section: Related Work
confidence: 99%
“…Each tuple (I1, I2, I3, w) represents a weighted ground rule I1 ← I2, I3. I1 is the head; I2 and I3 are the body and are allowed to be NULL for factors of size 1 or 2. This representation can be input to probabilistic inference engines, e.g., [29,56]. Moreover, since it records the causal relationships among facts, it contains the entire lineage and can be queried [52].…”
Section: Factor Graphs
confidence: 99%
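The (I1, I2, I3, w) tuples quoted above suggest a simple flat encoding of weighted ground rules. The sketch below is illustrative only: the field names follow the quoted description, while the example facts and weights are hypothetical.

```python
from typing import NamedTuple, Optional

class GroundRule(NamedTuple):
    """One weighted ground rule I1 <- I2, I3, stored as a flat tuple.
    I2 and/or I3 are None (NULL) for factors of size 1 or 2."""
    i1: str            # head fact
    i2: Optional[str]  # first body fact, or None
    i3: Optional[str]  # second body fact, or None
    w: float           # rule weight

rules = [
    GroundRule("Smokes(alice)", None, None, 1.2),                          # size-1 factor (prior)
    GroundRule("Cancer(alice)", "Smokes(alice)", None, 0.9),               # size-2 factor
    GroundRule("Smokes(bob)", "Smokes(alice)", "Friends(alice,bob)", 0.5), # size-3 factor
]

def factor_size(r: GroundRule) -> int:
    return 1 + sum(x is not None for x in (r.i2, r.i3))

for r in rules:
    print(factor_size(r), r)
```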
“…During grounding, the database optimizes and executes the stored procedures and generates a factor graph in relational format. Existing inference engines, e.g., Gibbs [56], GraphLab [29], can then be used to perform probabilistic inference over the resulting factor graph.…”
Section: Introduction
confidence: 99%
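The grounding-then-inference hand-off described in this quote can be sketched with a toy relational example. The table names, rule, and weight below are hypothetical; the point is only that grounding is a query whose output is a factor graph in relational form, ready to hand to a sampler.

```python
import sqlite3

# Hypothetical schema: ground a rule "Cancer(p) <- Smokes(p)" by joining base
# tables and writing a relational factor table an inference engine could read.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE smokes(person TEXT, var_id INTEGER);
    CREATE TABLE cancer(person TEXT, var_id INTEGER);
    INSERT INTO smokes VALUES ('alice', 0), ('bob', 1);
    INSERT INTO cancer VALUES ('alice', 2), ('bob', 3);
    CREATE TABLE factors(head_var INTEGER, body_var INTEGER, weight REAL);
    -- Grounding: one factor per person linking Smokes(p) to Cancer(p).
    INSERT INTO factors
    SELECT c.var_id, s.var_id, 0.9
    FROM smokes s JOIN cancer c ON s.person = c.person;
""")
factor_graph = conn.execute("SELECT head_var, body_var, weight FROM factors").fetchall()
print(factor_graph)  # [(2, 0, 0.9), (3, 1, 0.9)] -- relational factor graph
```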
“…Essentially, every tuple in the database or result of a query is a random variable (node) in this factor graph. The inference phase takes the factor graph from grounding and performs statistical inference using standard techniques, e.g., Gibbs sampling [42,44]. The output of inference is the marginal probability of every tuple in the database.…”
Section: Introduction
confidence: 99%
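Finally, the marginal probabilities mentioned in this quote are obtained from Gibbs samples as empirical frequencies after a burn-in period. The sketch below assumes a hypothetical sampler interface that yields one full assignment per sweep.

```python
from collections import Counter

def estimate_marginals(sample_stream, num_vars, burn_in=100):
    """Empirical marginals P(X_v = 1) from a stream of Gibbs samples.
    `sample_stream` yields dicts {var_id: 0/1}; the first `burn_in` are discarded."""
    counts = Counter()
    kept = 0
    for i, sample in enumerate(sample_stream):
        if i < burn_in:
            continue
        kept += 1
        for v, value in sample.items():
            counts[v] += value
    return {v: counts[v] / kept for v in range(num_vars)}

# Usage with a toy stream of pre-drawn samples (normally these would come from
# repeated sweeps of a sampler, e.g. the gibbs_sweep sketch earlier on this page):
toy_samples = [{0: 1, 1: 0}, {0: 1, 1: 1}, {0: 0, 1: 1}, {0: 1, 1: 1}]
print(estimate_marginals(iter(toy_samples), num_vars=2, burn_in=0))
# -> {0: 0.75, 1: 0.75}
```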