Solving the Interoperability Problem by Means of a Bus

Fabra, Javier; Hernández, Sergio; Ezpeleta, J.; Álvarez, Pedro

doi:10.1007/s10723-013-9276-1

Cited by 6 publications

References 25 publications


Integration of grid, cluster and cloud resources to semantically annotate a large‐sized repository of learning objects

Fabra, Hernández, Otero, et al. 2014

Concurrency and Computation

Self Cite


The Universia repository is composed of more than 15 million educational resources. The lack of metadata describing these resources complicates their classification, search and retrieval. To overcome this drawback, it was decided to semantically annotate the available educational resources using the ADEGA algorithm. For this purpose, we selected DBpedia, a cross-domain linked data set composed of more than 3.77 million 'things' with 400 million 'facts', to make sure that the wide range of Universia topics is covered by the ontology. However, this kind of process is extremely expensive from a computational point of view: it was estimated to require more than 1600 years of CPU time. In this paper, parallel programming techniques and distributed computing paradigms are combined in order to complete this semantic annotation in a reasonable time. The cornerstone of this proposal is a resource management and execution framework able to integrate the heterogeneous computing resources at our disposal (grid, cluster and cloud resources). As a result, the problem was solved in less than 180 days, demonstrating that it is perfectly feasible to exploit the advantages of these computing models in the field of linked data.

... ontology. From a semantic point of view, the use of the ADEGA algorithm was very promising in terms of precision and recall. Nevertheless, a preliminary estimation concluded that more than 1640 years of CPU time would be needed to create graphs of depth 3 and 25,000 years for graphs of depth 4 (the semantic richness of a graph is proportional to its depth). Obviously, with these computation requirements, a single computing resource (a personal computer, for instance) or a limited set of resources cannot complete the semantic annotation process. Distributed computing paradigms (grid or cloud computing, for instance) can help us to make the semantic annotation process viable from a computational point of view.
This approach is not totally novel and, therefore, some interesting practical experiences can be found in the scientific literature [5][6][7]. These experiences share two features that must be emphasized. On the one hand, they present software applications or tools to semantically annotate Web documents or pages. These applications were programmed to be executed on a concrete set of computing resources and, therefore, they are tightly coupled to their corresponding execution environments and technologies. On the other hand, the described annotation processes are fast in terms of execution time. This is because only small and medium-sized collections of data were semantically annotated and, additionally, each data item was annotated by means of a single instance of the selected ontology. Now, we try to take advantage of our knowledge in the field of scientific computing and, unlike previous proposals, provide a solution independent of the underlying computing environment and able to solve large-scale annotation processes. In the Universia problem, the complexity of the annotation process is cau...
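
The CPU-time figures quoted in the abstract and excerpt above imply a lower bound on how many cores must have been busy on average to finish within 180 days. The following is a minimal back-of-the-envelope sketch, not the authors' method: it assumes perfect parallel scaling with no scheduling or data-transfer overhead and a 365.25-day year, and the function name `min_average_parallelism` is hypothetical.

```python
# Assumption: one year is taken as 365.25 days (Julian year).
DAYS_PER_YEAR = 365.25


def min_average_parallelism(cpu_years: float, wall_clock_days: float) -> float:
    """Return the minimum average number of concurrently busy cores.

    Assumes ideal scaling: total CPU time divides evenly across cores,
    with no overhead. Real deployments would need more cores than this.
    """
    if cpu_years < 0:
        raise ValueError("cpu_years must be non-negative")
    if wall_clock_days <= 0:
        raise ValueError("wall_clock_days must be positive")
    # Convert CPU-years to CPU-days, then divide by the available wall-clock days.
    return cpu_years * DAYS_PER_YEAR / wall_clock_days


if __name__ == "__main__":
    # 1600 and 1640 CPU-years are the two estimates quoted in the text;
    # 180 days is the reported upper bound on the elapsed time.
    for cpu_years in (1600, 1640):
        cores = min_average_parallelism(cpu_years, 180)
        print(f"{cpu_years} CPU-years in 180 days: "
              f"at least {cores:.0f} cores busy on average")
```

Under these assumptions, both estimates correspond to an average of more than three thousand concurrently busy cores, which is consistent with the text's point that a single machine or a small set of resources cannot complete the annotation.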
