2018
DOI: 10.3390/bdcc2040033

Topological Signature of 19th Century Novelists: Persistent Homology in Text Mining

Abstract: Topological Data Analysis (TDA) refers to a collection of methods that find the structure of shapes in data. Although TDA methods have recently been used in many areas of data mining, they have not been widely applied to text mining tasks. In most text processing algorithms, the order in which different entities appear or co-appear is lost. Assuming this lost ordering is an informative feature of the data, TDA may play a significant role in addressing the resulting gap in the text processing state of the art. Once provided,…

Cited by 21 publications (16 citation statements)
References: 26 publications
“…We show that topological information, extracted from the relationships between sentences, can be used in inference, namely it can be applied to the very difficult legal entailment problem given in the COLIEE 2018 data set. Previous results of Doshi and Zadrozny (2018) and Gholizadeh et al (2018) show that topological features are useful for classification. The applications of computational topology to entailment are novel, and in our view provide a new set of tools for discourse semantics: computational topology can perhaps provide a bridge between the brittleness of logic and the regression of neural networks.…”
mentioning
confidence: 85%
“…Gholizadeh et al (2018) applied a different method for computing homological persistence to the task of authorship attribution, which is also a classification task, showing that the patterns of how authors introduce characters in novels can be captured to a large extent using topological descriptors. Interestingly, neither of these works uses topological features to augment the usual tf/idf representations of documents: Doshi and Zadrozny (2018) use counts of words (from previously identified vocabularies) to form a matrix which is the only input to topological persistence, and then they make a rule-based decision based only on the presence of barcodes; and Gholizadeh et al (2018) use time series. To use topological data analysis (TDA), Zhu (2013) assumes that text is implicitly coherent (SIFTS method), and so do Doshi and Zadrozny (2018).…”
Section: Introduction
mentioning
confidence: 99%
“…Some visualization tools such as "Persistent Diagram", "Barcode" and "Persistent Landscape" are invented to indicate the main topological features of data. Persistent homology has been previously used in brain [6], image analysis [4] and data mining [5].…”
Section: Introduction
mentioning
confidence: 99%
“…Gholizadeh et al (2018) applied a different method for computing homological persistence to the task of authorship attribution, which is also a classification task, showing that the patterns of how authors introduce characters in novels can be captured to a large extent using topological descriptors. Interestingly, neither of these works uses topological features to augment the usual tf/idf representations of documents: Doshi and Zadrozny (2018) use counts of words (from previously identified vocabularies) to form a matrix which is the only input to topological persistence, and then they make a rule-based decision based only on the presence of barcodes; and Gholizadeh et al (2018) use time series. To use topological data analysis (TDA), Zhu (2013) assumes that text is implicitly coherent (SIFTS method), and so do Doshi and Zadrozny (2018).…”
Section: Topological Data Analysis For Discourse Semantics? 1 Introdu…
mentioning
confidence: 99%