From a computational point of view, the semantic annotation of large-scale data collections is an extremely expensive task. One possible way of dealing with this drawback is to distribute the execution of the annotation algorithm across several computing environments. In this paper, we show how the problem of semantically annotating a large-scale collection of learning objects has been addressed. The terms related to each learning object were processed, and the output was an RDF graph computed from the DBpedia database. According to an initial study, a sequential implementation of the annotation algorithm would require more than 1600 CPU-years to process the whole set of learning objects (about 15 million). For this reason, a framework able to integrate a set of heterogeneous computing infrastructures was used to execute a new parallel version of the algorithm. As a result, the problem was solved in 178 days. (ISEM '13, September)
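A back-of-envelope check makes the scale of these figures concrete. The sketch below uses only the numbers quoted in the abstract (15 million objects, roughly 1600 CPU-years sequential, 178 days of wall-clock time); the derived per-object cost and effective degree of parallelism are our own estimates, not figures reported by the authors.

```python
# Back-of-envelope estimate from the figures quoted in the abstract.
# The per-object cost and the effective parallelism are derived values
# (assumptions), not numbers reported in the paper.

SECONDS_PER_YEAR = 365.25 * 24 * 3600

objects = 15_000_000   # learning objects in the Universia repository
cpu_years = 1600       # estimated sequential cost (depth-3 graphs)
wallclock_days = 178   # reported time on the distributed framework

total_cpu_seconds = cpu_years * SECONDS_PER_YEAR
per_object_seconds = total_cpu_seconds / objects
effective_parallelism = total_cpu_seconds / (wallclock_days * 24 * 3600)

print(f"~{per_object_seconds / 60:.0f} CPU-minutes per learning object")
print(f"~{effective_parallelism:.0f} CPUs busy on average over {wallclock_days} days")
```

Running this gives roughly 56 CPU-minutes per learning object and an average of about 3300 CPUs kept busy over the 178 days, which is consistent with a heterogeneous pool of grid, cluster, and cloud resources.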
The Universia repository is composed of more than 15 million educational resources. The lack of metadata describing these resources complicates their classification, search, and retrieval. To overcome this drawback, it was decided to semantically annotate the available educational resources using the ADEGA algorithm. For this purpose, we selected DBpedia, a cross-domain linked data set comprising more than 3.77 million 'things' and 400 million 'facts', to ensure that the wide range of Universia topics is covered by the ontology. However, this kind of process is extremely expensive from a computational point of view: more than 1600 years of CPU time were estimated to be needed. In this paper, parallel programming techniques and distributed computing paradigms are combined in order to achieve this semantic annotation in a reasonable time. The cornerstone of this proposal is a resource management and execution framework able to integrate the heterogeneous computing resources at our disposal (grid, cluster, and cloud resources). As a result, the problem was solved in less than 180 days, demonstrating that it is perfectly feasible to exploit the advantages of these computing models in the field of linked data.

From a semantic point of view, the use of the ADEGA algorithm was very promising in terms of precision and recall. Nevertheless, a preliminary estimation concluded that more than 1640 years of CPU time would be needed to create graphs of depth 3, and 25,000 years for graphs of depth 4 (the semantic richness of a graph is proportional to its depth). Obviously, with these computational requirements, a single computing resource (a personal computer, for instance) or a limited set of resources cannot carry out the semantic annotation process.

Distributed computing paradigms (grid or cloud computing, for instance) can make the semantic annotation process viable from a computational point of view. This approach is not entirely novel, and some interesting practical experiences can be found in the scientific literature [5][6][7]. These experiences share two features that must be emphasized. On the one hand, they present software applications or tools for semantically annotating Web documents or pages. These applications were programmed to be executed on a concrete set of computing resources and are therefore tightly coupled to their corresponding execution environments and technologies. On the other hand, the described annotation processes are fast in terms of execution time. This is because small- and medium-sized collections of data were annotated and, additionally, each data item was annotated by means of a single instance of the selected ontology. Here, we take advantage of our experience in the field of scientific computing and, unlike previous proposals, provide a solution that is independent of the underlying computing environment and able to handle large-scale annotation processes. In the Universia problem, the complexity of the annotation process is cau...
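The paper itself does not include code, but the overall scheme it describes, looking up each learning object's terms in DBpedia, building an RDF graph of bounded depth, and farming the objects out over many workers, can be sketched as follows. This is a minimal, hypothetical illustration: `annotate` merely stands in for the ADEGA algorithm (whose graph-expansion step is omitted), the SPARQL query is illustrative, and the local process pool is a single-machine stand-in for the grid/cluster/cloud framework described above.

```python
# Hypothetical sketch of the annotation pipeline: each worker resolves an
# object's terms against DBpedia. `annotate` stands in for ADEGA; the real
# system expands these seeds into an RDF graph of a given depth and spreads
# the work over grid, cluster, and cloud resources, not a local pool.
from multiprocessing import Pool
from SPARQLWrapper import SPARQLWrapper, JSON

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

def lookup(term):
    """Illustrative DBpedia query: resources whose English label matches a term."""
    sparql = SPARQLWrapper(DBPEDIA_ENDPOINT)
    sparql.setQuery(f"""
        SELECT ?s WHERE {{
            ?s rdfs:label "{term}"@en .
        }} LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [row["s"]["value"] for row in rows]

def annotate(learning_object):
    """Stand-in for the ADEGA algorithm: map each term to DBpedia seed resources.

    The real algorithm expands these seeds into an RDF graph of depth 3 (or 4);
    that expansion is what dominates the CPU cost and is omitted here."""
    return {term: lookup(term) for term in learning_object["terms"]}

if __name__ == "__main__":
    objects = [{"id": 1, "terms": ["Linked data"]}]   # toy input
    with Pool(processes=8) as pool:                   # local stand-in for the framework
        for result in pool.imap_unordered(annotate, objects):
            print(result)
```

In the distributed setting described in the paper, the role of the pool is played by the resource management and execution framework, which dispatches independent per-object annotation tasks to whichever grid, cluster, or cloud resource is available.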