Introduction:
The need for efficient search engines has grown with ever-increasing technological
advancement and the rapidly growing demand for data on the web.
Method:
Automated duplicate detection over query results identifies the records from multiple web databases that
refer to the same real-world entity and returns only the non-matching (non-duplicate) records to end users. The algorithm
proposed in this paper uses an unsupervised approach with classifiers over heterogeneous web databases to return more
accurate results, with high precision, recall, and F-measure. Several assessments are also carried out to analyze the efficacy
of the proposed algorithm in identifying duplicates.
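
To make the idea of duplicate detection over query results concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in this paper, nor standard UDD: it replaces the unsupervised classifiers with a fixed similarity threshold, and the field names, weights, threshold, and sample records are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted average of per-field similarities over the weighted fields."""
    total = sum(weights.values())
    score = sum(
        w * field_similarity(r1.get(field, ""), r2.get(field, ""))
        for field, w in weights.items()
    )
    return score / total


def deduplicate(records: list, weights: dict, threshold: float = 0.85) -> list:
    """Keep a record only if it is below the threshold against every kept record."""
    kept = []
    for rec in records:
        if all(record_similarity(rec, k, weights) < threshold for k in kept):
            kept.append(rec)
    return kept


# Hypothetical query results merged from two web databases (made-up data).
results = [
    {"title": "Introduction to Algorithms", "author": "Thomas H. Cormen"},
    {"title": "Introduction to algorithms", "author": "Thomas H Cormen"},
    {"title": "Database System Concepts", "author": "Abraham Silberschatz"},
]
WEIGHTS = {"title": 0.6, "author": 0.4}  # assumed field weights

unique = deduplicate(results, WEIGHTS)
for rec in unique:
    print(rec["title"], "|", rec["author"])
```

In this sketch the first two records differ only in letter case and punctuation, so the second is treated as a duplicate and only two records are returned. An unsupervised approach would learn the field weights and the decision boundary from the query results themselves instead of fixing them by hand.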
Result:
Results show that the proposed algorithm achieves higher precision and F-measure than standard UDD,
with the same recall.
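
As an illustration of how these three measures relate, the sketch below computes precision, recall, and F-measure over sets of record pairs flagged as duplicates. The pair sets are made-up examples, not results from this paper; the names `detector_a` and `detector_b` are hypothetical. It shows that removing a false positive raises precision and F-measure while leaving recall unchanged.

```python
def precision_recall_f1(predicted: set, actual: set) -> tuple:
    """Precision, recall, and F-measure for predicted vs. true duplicate pairs."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up ground truth: pairs of record ids that are true duplicates.
actual = {(1, 2), (3, 4), (5, 6)}

# Two hypothetical detectors that find the same true pairs,
# but detector_a reports one more false positive than detector_b.
detector_a = {(1, 2), (3, 4), (7, 8), (9, 10)}
detector_b = {(1, 2), (3, 4), (7, 8)}

p_a, r_a, f_a = precision_recall_f1(detector_a, actual)
p_b, r_b, f_b = precision_recall_f1(detector_b, actual)
print(f"A: precision={p_a:.3f} recall={r_a:.3f} F={f_a:.3f}")
print(f"B: precision={p_b:.3f} recall={r_b:.3f} F={f_b:.3f}")
```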
Conclusion:
This paper concludes that the proposed algorithm outperforms standard UDD.
Discussion:
This paper aims to introduce an algorithm that automates duplicate detection for lexically
heterogeneous web databases.