Extracting information from large quantities of semi-structured documents is an important topic that is receiving considerable attention. Methods that accurately define where the data can be found are therefore pivotal to constructing a robust solution, one that tolerates imperfections and structural changes in the source material. In this paper we investigate a wrapper induction method that revolves around aligning XPath elements (steps), allowing a user to generalise over the training examples they give to the data extraction system. The alignment is based on a modification of the well-known Levenshtein edit distance. Once the training-example XPaths have been aligned with each other, they are merged into the path that generalises the examples as precisely as possible, so that it can be used to accurately fetch the required data from the given source material.
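To make the idea concrete, the following is a minimal sketch of the two ingredients the abstract describes: a Levenshtein-style alignment of two XPath step sequences, followed by a merge of the aligned steps into a generalised path. It is an illustration only, not the paper's actual algorithm; the unit cost scheme, the '*' wildcard used for generalisation, and the example paths are assumptions.

```python
def align_steps(a, b):
    """Levenshtein-style dynamic-programming alignment of two lists of XPath steps."""
    n, m = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete a step from a
                           dp[i][j - 1] + 1,         # insert a step from b
                           dp[i - 1][j - 1] + cost)  # match or substitute
    # Trace back to recover the aligned step pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            pairs.append((a[i - 1], b[j - 1])); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((a[i - 1], None)); i -= 1
        else:
            pairs.append((None, b[j - 1])); j -= 1
    return list(reversed(pairs))

def merge(pairs):
    """Merge aligned steps: identical steps are kept, differences generalise to a wildcard."""
    return "/" + "/".join(x if x == y else "*" for x, y in pairs)

if __name__ == "__main__":
    # Two hypothetical training-example XPaths pointing at cells of a table.
    p1 = "html/body/div[1]/table/tr[2]/td[1]".split("/")
    p2 = "html/body/div[2]/table/tr[3]/td[1]".split("/")
    print(merge(align_steps(p1, p2)))  # /html/body/*/table/*/td[1]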
A recently released voxel model quantifying aggregate resources of the Belgian part of the North Sea includes lithological properties of all Quaternary sediments and modelling-related uncertainty. As the underlying borehole data come from various sources and cover a long time span, data-related uncertainties should be accounted for as well. A tiered data-uncertainty assessment was applied to a composite lithology dataset with uniform, standardised lithological descriptions and rigorously completed metadata fields, and uncertainties were qualified and quantified for positioning, sampling and vintage. The uncertainty on horizontal positioning combines navigational errors, on-board and off-deck offsets, and underwater drift. Sampling-gear uncertainty evaluates the suitability of each instrument in terms of its efficiency of sediment yield per lithological class. Vintage uncertainty provides a likelihood of temporal change since the moment of sampling, using the mobility of fine-scale bedforms as an indicator. For each uncertainty component, quality flags from 1 (very uncertain) to 5 (very certain) were defined and converted into corresponding uncertainty percentages meeting the input requirements of the voxel model. Obviously, an uncertainty-based data selection procedure, aimed at improving the confidence of data products, reduces data density. Whether or not this density reduction is detrimental to the spatial coverage of data products will depend on their intended use. At the very least, demonstrable reductions in spatial coverage will help to highlight the need for future data acquisition and to optimise survey plans. By opening up our subsurface model with associated data uncertainties in a public decision-support application, policy makers and other end users are better able to visualise overall confidence and identify areas whose coverage is insufficient for their needs. Engineering geologists and geospatial analysts in particular, who have to work with a borehole dataset that becomes increasingly limited with depth below the seabed, will profit from a better visualisation of data-related uncertainty.

Thematic collection: This article is part of the Mapping the Geology and Topography of the European Seas (EMODnet) collection available at: https://www.lyellcollection.org/cc/EMODnet
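As a rough illustration of how per-component quality flags could be converted to uncertainty percentages and used for data selection, the sketch below assigns flags for positioning, sampling and vintage, converts them with a linear mapping, and filters boreholes against a threshold. The flag-to-percentage mapping, the worst-component combination rule and the threshold are hypothetical assumptions, not the values used in the published voxel model.

```python
# Hypothetical linear conversion: flag 1 (very uncertain) .. 5 (very certain).
FLAG_TO_UNCERTAINTY = {1: 100.0, 2: 75.0, 3: 50.0, 4: 25.0, 5: 0.0}

def borehole_uncertainty(position_flag, sampling_flag, vintage_flag):
    """Combine the three component flags into one uncertainty percentage,
    here (as an assumption) by taking the worst component."""
    return max(FLAG_TO_UNCERTAINTY[f] for f in (position_flag, sampling_flag, vintage_flag))

def select(boreholes, max_uncertainty=50.0):
    """Uncertainty-based data selection: keep boreholes within a chosen
    threshold, which reduces data density but raises overall confidence."""
    return [b for b in boreholes if borehole_uncertainty(*b["flags"]) <= max_uncertainty]

if __name__ == "__main__":
    data = [
        {"id": "BH-001", "flags": (5, 4, 3)},  # recent, well-positioned sample
        {"id": "BH-002", "flags": (2, 3, 1)},  # old vintage, poor positioning
    ]
    print([b["id"] for b in select(data)])  # ['BH-001'] under these assumptions
```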