2016 IEEE International Conference on Big Data (Big Data) 2016
DOI: 10.1109/bigdata.2016.7840645

Sampling labelled profile data for identity resolution

Abstract: Identity resolution capability for social networking profiles is important for a range of purposes, from open-source intelligence applications to forming semantic web connections. Yet replication of research in this area is hampered by the lack of access to ground-truth data linking the identities of profiles from different networks. Almost all data sources previously used by researchers are no longer available, and historic datasets are both of decreasing relevance to the modern social networking landscape an…

Cited by 7 publications
(7 citation statements)
References 16 publications
“…However, finding an appropriate similarity function that combines the similarities of attributes and decides on whether to link or not the entities is often difficult. Several works use a training set to learn a classifier [7,8,17], others base the decision on a threshold derived through experiments [9,18]. Other approaches decide to include the uncertainty of a match into the decision [19].…”
Section: Related Work (mentioning)
confidence: 99%
“…The problem that we need to solve is which s_i, s_j pairs indicate a strong similarity to be considered for a match. The related work solutions propose using a classifier [7,8,33] or experimenting with different thresholds [9,18,33]. We propose a more relaxed technique that uses Pareto optimality [34] for filtering the positive class.…”
Section: Ranking the Pairs (mentioning)
confidence: 99%
“…In fact it is possible to identify a single trend largely definitional both for the technical developments in the area of IM systems and social and ethical concerns associated with this field-the identity resolution problem. This problem has emerged as a rather innocuous and purely technical issue in the data base management and statistics as a problem of classification task whereby two or more entities (collections of attributes)-often from different databases-are matched together based on the similarity of their features (Edwards et al, 2016). This problem has also motivated the development of novel identity resolution techniques and tools assisted by the advancements in artificial intelligence.…”
Section: Identity Management Solutions (mentioning)
confidence: 99%
“…However, finding an appropriate similarity function that combines the similarities of attributes and decides on whether to link or not the entities is often difficult. Several works use a training set to learn a classifier [9][10][11], others base the decision on a threshold derived through experiments [12,13]. Other approaches that deal with uncertainty are described in the survey of Magnani and Montesi [7], including probabilistic, rule-based probabilistic, fuzzy, and preference-based relationships between records as well as aggregation of multi-matches.…”
Section: Related Work (mentioning)
confidence: 99%
“…This is an old problem in the entity linkage community. A classifier [9,11,31] can learn the behavior of the matches and detect the positive class. However, it is difficult to obtain labeled data, especially across different sources.…”
Section: Ranking the Pairs (mentioning)
confidence: 99%