A typical index at the end of a textbook contains a manually provided vocabulary of terms related to the content of the textbook. In this paper, we extend our previous work on the extraction of knowledge models from digital textbooks. We take a more critical look at the content of a textbook index and present a mechanism for classifying index terms according to their domain specificity: a core domain concept, an in-domain concept, a concept from a related domain, and a concept from a foreign domain. We link the extracted models to DBpedia and leverage the aggregated linguistic and structural information from textbooks and DBpedia to construct and prune the domain-specific knowledge graphs. The evaluation experiments demonstrate (1) the ability of the approach to identify, with high accuracy, different levels of domain specificity for automatically extracted concepts, (2) its cross-domain robustness, and (3) the added value of the domain specificity information. These results clearly indicate the improved quality of the refined knowledge graphs and widen their potential applicability.
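The four specificity levels named in the abstract above can be pictured as a small data structure. The following is only a minimal sketch: the `Specificity` enum, the `prune_terms` helper, the rule of keeping core and in-domain terms, and the hand-assigned example labels are all assumptions made for illustration, not the classifier or the pruning procedure used in the paper.

```python
from enum import Enum


class Specificity(Enum):
    """The four domain-specificity levels listed in the abstract."""
    CORE_DOMAIN = "core domain concept"
    IN_DOMAIN = "in-domain concept"
    RELATED_DOMAIN = "concept from a related domain"
    FOREIGN_DOMAIN = "concept from a foreign domain"


def prune_terms(labeled_terms, keep=(Specificity.CORE_DOMAIN, Specificity.IN_DOMAIN)):
    """Keep only the index terms whose level is in `keep`.

    Assumption: pruning is modeled here as a plain filter on the label;
    the paper's actual pruning criteria are not given in the abstract.
    """
    return {term: level for term, level in labeled_terms.items() if level in keep}


# Hypothetical, hand-assigned labels for index terms of a statistics textbook.
labeled = {
    "standard deviation": Specificity.CORE_DOMAIN,
    "histogram": Specificity.IN_DOMAIN,
    "matrix": Specificity.RELATED_DOMAIN,
    "thermometer": Specificity.FOREIGN_DOMAIN,
}

pruned = prune_terms(labeled)
print(sorted(pruned))
```

Passing a different `keep` tuple (for example, one that also includes `Specificity.RELATED_DOMAIN`) would model a more permissive graph.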
Textbooks are educational documents created, structured and formatted by domain experts with the primary purpose of explaining the knowledge in the domain to a novice. Authors use their understanding of the domain when structuring and formatting the content of a textbook to facilitate this explanation. As a result, the formatting and structural elements of textbooks carry elements of domain knowledge implicitly encoded by their authors. Our paper presents an extensible approach towards automated extraction of knowledge models from textbooks and enrichment of their content with additional links (both internal and external). The textbooks themselves essentially become hypertext documents where individual pages are annotated with important concepts in the domain. The evaluation experiments examine several aspects and stages of the approach, including the accuracy of model extraction, the pragmatic quality of extracted models in one of their possible applications (semantic linking of textbooks in the same domain), the accuracy of linking models to external knowledge sources, and the effect of integrating multiple textbooks from the same domain. The results indicate high accuracy of model extraction on symbolic, syntactic and structural levels across textbooks and domains, and demonstrate the added value of the extracted models on the semantic level.
This paper evaluates an automatically extracted domain model from textbooks and applies learning curve analysis to assess its ability to represent students' knowledge and learning. Results show that extracted concepts are meaningful knowledge components with varying granularity, depending on textbook authors' perspectives. The evaluation demonstrates the acceptable quality of the extracted domain model in knowledge modeling.
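The abstract above does not say which learning curve model was used. One common choice in learning curve analysis is the power law of practice, where the error rate on a knowledge component falls as `a * n**(-b)` over practice opportunities `n`. The sketch below fits that form by least squares in log-log space; the function name `fit_power_law` and the synthetic error rates are illustrative assumptions, not data or code from the paper.

```python
import math


def fit_power_law(error_rates):
    """Fit error = a * opportunity**(-b) by least squares in log-log space.

    `error_rates[i]` is the observed error rate at practice opportunity i + 1.
    All rates must be strictly positive, because the fit takes logarithms.
    Returns the pair (a, b).
    """
    xs = [math.log(n) for n in range(1, len(error_rates) + 1)]
    ys = [math.log(e) for e in error_rates]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Ordinary least-squares slope and intercept of log(error) on log(opportunity).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic error rates generated from a = 0.5, b = 0.4 (not real student data).
rates = [0.5 * n ** -0.4 for n in range(1, 9)]
a, b = fit_power_law(rates)
print(round(a, 3), round(b, 3))
```

Under this model, a clearly positive `b` for a concept suggests that students improve with practice on it, which is the sense in which a concept behaves as a meaningful knowledge component.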