2019
DOI: 10.14569/ijacsa.2019.0100742

Vectorization of Text Documents for Identifying Unifiable News Articles

Abstract: Vectorization is imperative for processing textual data in natural language processing applications. Vectorization enables machines to understand textual content by converting it into meaningful numerical representations. The proposed work aims to identify unifiable news articles for multi-document summarization. A framework is introduced for identification of news articles related to top trending topics/hashtags and multi-document summarization of unifiable news articles based on t…
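
As a rough illustration of the idea in the abstract, the sketch below turns short texts into bag-of-words count vectors and compares them with cosine similarity, so that articles on the same story score higher than unrelated ones. This is not the paper's framework, whose details are truncated above; the function names `bag_of_words` and `cosine_similarity` and the three sample headlines are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens, and count occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical headlines: the first two describe the same event.
flood_a = bag_of_words("Storm floods coastal city as rivers overflow")
flood_b = bag_of_words("Coastal city floods after storm; rivers overflow banks")
finance = bag_of_words("Central bank raises interest rates again")

related = cosine_similarity(flood_a, flood_b)
unrelated = cosine_similarity(flood_a, finance)
```

Here `related` comes out well above `unrelated`, which is the kind of signal a system could threshold to decide whether two articles are candidates for merging. Raw counts ignore word order and synonyms, so a real pipeline would likely use a richer representation.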

Cited by 85 publications (32 citation statements)
References: 7 publications
“…The process of converting textual content into meaningful numerical representation is known as vectorization [39]. Many feature engineering techniques exist in literature to vectorize texts, such as Bag-of-Words [40], Word2Vec [41], FastText [42], Glove [43] and Term Frequency-Inverse Document Frequency (TF-IDF) [44].…”
Section: Feature Extraction (mentioning)
confidence: 99%
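
Of the techniques named in the statement above, TF-IDF is simple enough to sketch in a few lines. The version below is a minimal, assumed variant (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency); libraries often apply different smoothing and normalization, so the exact weights will differ. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = count of term in document / number of tokens in document
    idf = log(number of documents / number of documents containing term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical mini-corpus for illustration.
docs = [
    "the storm floods the city",
    "the bank raises rates",
    "the city council meets",
]
vectors = tf_idf(docs)
```

With this corpus, "the" appears in every document and so receives weight 0, while "storm", which appears in only one document, outweighs "city", which appears in two.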
“…They did not observe an advantage with the utilization of ELMo, compared to the commonly used, like GloVe or the random indexing approach. Singh et al [6] propose a vectorization approach based on word targets, to identify unifiable news articles. They define a framework for identifying news related to trending topics/hashtags.…”
Section: Literature Review (mentioning)
confidence: 99%
“…However, our approach uses non-labeled datasets: patents, journals and learning resources. In addition, it is the only one that uses learning resources and patents, and only another one uses a scientific publication dataset in its analysis [6]. In regard to the used techniques, our work is interested in feature extraction methods to transform text documents into a list of features that can be easily used and understood, like BM25 and TF-IDF, and methods of document vectorization to create numerical features using statistical analysis, like LDA, LSA, and Doc2Vec.…”
Section: Comparison With Previous Work (mentioning)
confidence: 99%
“…Text Vectorization: In this step, the text data is converted into numerical form [14]. Word Embedding algorithm is a process in which the input text is converted to the equivalent number representation.…”
mentioning
confidence: 99%