2019
DOI: 10.3390/mti3030058

Unsupervised Keyphrase Extraction for Web Pages

Abstract: Keyphrase extraction is an important part of natural language processing (NLP) research, although little research has been done in the domain of web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet it remains largely untouched in scientific research. Current research is often applied only to clean corpora such as abstracts and articles from academic journals or sets of scraped texts from a single domain. However, textual data from web pages differ from …
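The abstract's point that web-page text differs from clean corpora can be illustrated with a minimal sketch (not the paper's pipeline; the class name and sample page are illustrative): raw HTML must first be stripped of markup, scripts, and styles before any keyphrase candidates can be formed, and boilerplate such as navigation text still survives the stripping. Python standard library only.

# Minimal sketch, assuming nothing about the paper's actual preprocessing.
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text that appears outside <script>/<style>/<noscript> tags."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

    def text(self):
        return " ".join(self.parts)

page = ("<html><head><style>p{color:red}</style></head><body>"
        "<nav>Home | About | Login</nav>"
        "<p>Keyphrase extraction for web pages.</p>"
        "<script>var x = 1;</script></body></html>")
extractor = VisibleTextExtractor()
extractor.feed(page)
print(extractor.text())  # navigation noise ("Home | About | Login") survives, unlike in clean corpora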

Cited by 6 publications (5 citation statements)
References 22 publications
“…As an extension to the index navigation application, a further goal is to develop and implement 'see also' functionality, so that when a user selects a particular file, the application suggests and provides links to related files in the indexed corpus. This kind of functionality, now widespread and familiar, is one outcome of the half-century of fundamental research into information retrieval to which an extensive literature attests (e.g., in relation to the present investigations, refs [1][2][3][4][5][6][7][8][9]). Increasingly, such functionality is delivered by AI-based approaches such as those developed by, for example, UNSILO (https://unsilo.ai) and Yewno (https://www.yewno.com).…”
Section: Aims and Context
confidence: 88%
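A minimal illustrative sketch of the 'see also' lookup described in the citation above, assuming a precomputed table of pairwise similarity scores between files (the function name and data are hypothetical, not the cited system):

# Hypothetical 'see also' helper: given precomputed pairwise similarity scores,
# suggest the k most related files for the file a user has selected.
def see_also(selected_file, similarity_scores, k=3):
    # similarity_scores: {(file_a, file_b): score, ...}, each unordered pair stored once
    related = []
    for (a, b), score in similarity_scores.items():
        if a == selected_file:
            related.append((b, score))
        elif b == selected_file:
            related.append((a, score))
    related.sort(key=lambda pair: pair[1], reverse=True)
    return related[:k]

scores = {("intro.txt", "methods.txt"): 0.82,
          ("intro.txt", "appendix.txt"): 0.15,
          ("methods.txt", "results.txt"): 0.64}
print(see_also("intro.txt", scores))  # [('methods.txt', 0.82), ('appendix.txt', 0.15)]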
“…As an extension to the index navigation application, a further goal is to develop and implement 'see also' functionality, so that when a user selects a particular file, the application suggests and provides links to related files in the indexed corpus. This kind of functionality, which is now widespread and familiar, represents the fruition of half a century of fundamental research into information retrieval, to which an extensive literature attests (e.g., in relation to the present investigations, refs [1][2][3][4][5][6][7][8][9]). Increasingly, such functionality is delivered by AI-based technologies such as those developed by, for example, UNSILO (https://unsilo.ai) and Yewno (https://www.yewno.com).…”
Section: Aims and Context
confidence: 96%
“…
def find_intersection(dictA, dictB):
    # Return the elements that are common to dictA and dictB
    # Inputs are now dictionaries of the form {phr_id: signif, ... }
    intersect = []
    sum_A_sigs = 0.0
    sum_B_sigs = 0.0
    for key in dictA:                # key is a phrase ID
        sig1 = dictA[key]            # sig1: significance in file A of phrase with ID value = key
        if key in dictB:
            sig2 = dictB[key]        # sig2: significance in file B of phrase with ID value = key
            intersect.append([key])  # add to intersection if phrase is common to A and B
            sum_A_sigs = sum_A_sigs + sig1
            sum_B_sigs = sum_B_sigs + sig2
    # Compute the significance-based weighting by which to multiply the A-B link strength
    # - the product of the average significance values for A and B:
    intersect_size = len(intersect)
    av_A_sig = sum_A_sigs / (1 + intersect_size)
    av_B_sig = sum_B_sigs / (1 + intersect_size)
    tot_sig = av_A_sig * av_B_sig
    # Return the intersection size and combined significance:
    return (intersect_size, tot_sig)

(vi) The overall similarity between files A and B is calculated as the product of an adjusted version of the total significance, taking into account phrase frequency statistics and the number of phrase components as outlined above, and the Overlap Multiplication (OM) term already discussed:

AB_score = (math.sqrt(tot_sig) * 100) * (intersect**2 / ((1 + A_only) * (1 + B_only)))
…”
Section: Topical Relationships
confidence: 99%
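A short usage sketch of the quoted fragment, with hypothetical phrase-significance dictionaries. A_only and B_only are assumed here to be the counts of phrases unique to each file, which the excerpt itself does not define:

import math

# Hypothetical inputs: phrase ID -> significance for two files (illustrative values only).
dictA = {101: 0.9, 102: 0.4, 103: 0.7, 104: 0.2}
dictB = {102: 0.5, 103: 0.6, 200: 0.8}

intersect, tot_sig = find_intersection(dictA, dictB)

# Assumed interpretation of A_only / B_only: phrases found in only one of the two files.
A_only = len(set(dictA) - set(dictB))  # 2
B_only = len(set(dictB) - set(dictA))  # 1

AB_score = (math.sqrt(tot_sig) * 100) * (intersect**2 / ((1 + A_only) * (1 + B_only)))
print(intersect, round(tot_sig, 4), round(AB_score, 2))  # 2 0.1344 24.44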
“…Various keyphrase extraction methods have been developed to support the aforementioned applications [8], [9], [7], [10], [11], [12]. Domain-specific strategies [9], for example, need knowledge of the application domain, whereas linguistic approaches [9] demand language proficiency.…”
Section: Introduction
confidence: 99%
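None of the cited extraction methods is reproduced here; purely as an illustration of what a minimal unsupervised, frequency-based keyphrase extractor looks like (RAKE-style candidate selection with a small assumed stop-word list, not one of the approaches cited above):

import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "for", "to", "on", "with", "from", "such", "as"}

def extract_keyphrases(text, max_len=3, top_k=5):
    # Split at punctuation and stop words to form candidate phrases, then score
    # candidates by raw frequency (kept deliberately minimal for illustration).
    candidates = []
    for chunk in re.split(r"[.,;:!?()\[\]\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", chunk):
            if word in STOPWORDS:
                if current:
                    candidates.append(" ".join(current[:max_len]))
                    current = []
            else:
                current.append(word)
        if current:
            candidates.append(" ".join(current[:max_len]))
    return Counter(candidates).most_common(top_k)

sample = ("Keyphrase extraction is an important part of natural language processing. "
          "Unsupervised keyphrase extraction for web pages differs from keyphrase "
          "extraction on clean corpora such as academic abstracts.")
print(extract_keyphrases(sample))  # top candidate: ('keyphrase extraction', 2)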