2014
DOI: 10.1007/978-3-319-12823-8_1

Content Profiling for Preservation: Improving Scale, Depth and Quality

Cited by 4 publications
(8 citation statements)
References 7 publications
“…We adopted a very defensive approach to rule definition to avoid distorting errors, so numerous conflicting patterns remain for which no resolution is specified. Still, similar to the findings in Kulmukhametov & Becker (2014), a small number of rules is highly effective in improving data quality. The effect on the ratio of conflicted values shows that residual conflicts are small.…”
Section: Table 4 Exemplary Conflicts In the Data Set And Their Resolu…
Citation type: supporting
confidence: 64%
“…This is essential for subsequent analysis since some segments contain large shares of conflicted items. The implemented mechanisms go beyond the data cleansing performed by Jackson (2012) and build on earlier work (Kulmukhametov & Becker, 2014). An analysis of conflict patterns supported the declarative formulation of a relatively small rule set in the market spreadsheet so that conflict resolution is extensible and fully integrated.…”
Section: Input Preparation and Configuration
Citation type: mentioning
confidence: 99%
“…To diversify aspects such as the number of text snippets and their order and implementation possibilities, a set of model transformations is defined that operate on the PIM and PSM level. To achieve a realistic dataset in terms of the feature distributions, we use the content profiling tool C3PO [14] to sample initial real world distributions from the Govdocs data set. Where feature distribution data is not available, the transformation falls back to its default distribution.…”
Section: Instantiating the Data Generation Framework For This Benchmark
Citation type: mentioning
confidence: 99%
“…However, the accuracy and correctness of the tools vary [19,20], and they frequently disagree on such questions as ‘which format is this object encoded in?’ As a consequence, they produce contradicting outputs [23,24]. Another source of contradiction is evolving metadata [25] when information and metadata standards change over time.…”
Section: Introduction
Citation type: mentioning
confidence: 99%