2020
DOI: 10.1155/2020/4750871

Sentence Embedding Based Semantic Clustering Approach for Discussion Thread Summarization

Abstract: A large share of the data on the web comes from discussion forums, which contain millions of threads. Discussion threads are a valuable source of knowledge for Internet users, as they hold information about numerous topics. A discussion thread on a single topic comprises a huge number of reply posts, which makes it hard for forum users to scan all the replies and determine which are most relevant. At the same time, it is also hard for forum users to manually summarize the bulk of reply posts in …

Cited by 9 publications (5 citation statements)
References 38 publications
“…Tokenization is the procedure for segmenting sentences into words, the results of which are called tokens [21]. Sentences are truncated based on whitespaces such as tabs and blanks, and punctuation symbols such as commas, semicolons, periods, and colons [22]. This stage eliminates unnecessary punctuation, spaces, and characters.…”
Section: Tokenization | Citation type: mentioning
confidence: 99%
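
The tokenization step described in the statement above can be illustrated with a minimal Python sketch. It assumes a plain regular-expression split on the delimiters the statement names (whitespace, commas, semicolons, periods, and colons); the function name `tokenize` and the sample sentence are hypothetical, and this is not the cited paper's implementation.

```python
import re

# Delimiters named in the quoted statement: whitespace (tabs, blanks) and the
# punctuation symbols comma, semicolon, period, and colon. Assumption: any
# run of these characters counts as a single boundary.
_DELIMITERS = re.compile(r"[\s,;.:]+")


def tokenize(sentence):
    """Split a sentence into word tokens, dropping delimiters and empty strings."""
    return [tok for tok in _DELIMITERS.split(sentence) if tok]


if __name__ == "__main__":
    # Hypothetical sample sentence mixing blanks, a tab, and punctuation.
    print(tokenize("Discussion threads, as a rule:\tlong; noisy."))
```

A split this simple also breaks decimals and abbreviations such as "0.78" or "e.g." at the period, so a production pipeline would normally rely on a dedicated tokenizer instead.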
“…In the appendix we provide a detailed description of how we implemented this additional filter, along with a sample of text ranked from least to most related to economic concepts (Table 12). A number of recent works suggest using a similar approach where a final computational step is taken to classify text data, see Bao et al (2020), Wu et al (2020), Wan et al (2020) and Khan et al (2020). A small scale evaluation of the classification strategy (100 pictures and associated text) results in an estimated precision of 0.78 and estimated recall of 0.80.…”
Section: Selection Of Images | Citation type: mentioning
confidence: 99%
“…The domain of summarization is diverse in the scenario of different applications. One of the earliest approach focuses on summarizing a single document [18] then extended to summarize multiple documents [19], email thread [20], recognize the specific online arguments and dialogues [21], [22], and timeline summarization [23]. Comments summarization in social media concentrates on determining which posts are most relevant to a particular topic.…”
Section: Introduction | Citation type: mentioning
confidence: 99%