Using Rich Document Representation in XML Information Retrieval

Raja, Fahimeh; Keikha, Mostafa; Rahgozar, Maseud; Oroumchian, Farhad

doi:10.1007/978-3-540-73888-6_29

Cited by 6 publications

(3 citation statements)

References 4 publications


“…." suggests a relationship between two noun phrases "operating systems" and "personal computers" [14]. Then, these relations are represented with a format similar to that of multivalued logic as used in the theory of human plausible reasoning, i.e., operating_system(personal_computers) [2].…”

Section: Rich Document Representation (RDR)

mentioning

confidence: 99%

Rich document representation and classification: An analysis

Keikha

Khonsari

Oroumchian

2009

Knowledge-Based Systems

Self Cite
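The predicate format quoted above, operating_system(personal_computers), can be illustrated with a small data structure. This is only a sketch: the `Relation` class and the `normalize` and `make_relation` helpers are hypothetical names, and the step that detects a relationship between two noun phrases is not described in the quoted text, so it is assumed to have already happened. No stemming is applied, so the plural "systems" is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A logical predicate of the form descriptor(argument)."""
    descriptor: str
    argument: str

    def __str__(self) -> str:
        return f"{self.descriptor}({self.argument})"


def normalize(noun_phrase: str) -> str:
    # Lowercase and join the words with underscores (assumed convention).
    return "_".join(noun_phrase.lower().split())


def make_relation(descriptor_phrase: str, argument_phrase: str) -> Relation:
    # Assumes the two noun phrases were already found to be related.
    return Relation(normalize(descriptor_phrase), normalize(argument_phrase))


rel = make_relation("operating systems", "personal computers")
print(rel)  # operating_systems(personal_computers)
```

Because `Relation` is frozen and hashable, a document could then be held as a set of such predicates, which matches the "set of logical predicates" wording used elsewhere on this page.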


“…Extensible Markup Language (XML) IR, by contrast to traditional IR, deals with documents that contain structural markups which can be used as hints to assess the relevancy of individual elements instead of the whole document. Reference [16] presents how the DST can be used in the weighting of elements in the document. It is also used to express uncertainty and to combine evidences derived from different inferences, providing relevancy values of all elements of the XML document.…”

Section: A. The Dempster-Shafer Theory in Information Retrieval

mentioning

confidence: 99%

New Metrics between Bodies of Evidences

Djiknavorian

Grenier

Valin

2012

JETWI
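The citation statement above says the DST is used to combine evidence from different inferences about an element's relevance. A minimal sketch of Dempster's rule of combination follows. The frame {relevant, not_relevant}, the two evidence sources, and all mass values are made-up assumptions for illustration; they are not taken from the cited work.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: combine two basic probability assignments.

    Each assignment maps a frozenset (a focal element) to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass that falls on the empty set counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


R = frozenset({"relevant"})
N = frozenset({"not_relevant"})
U = R | N  # full frame: mass here expresses uncertainty

# Hypothetical evidence about one XML element.
m_content = {R: 0.6, U: 0.4}
m_structure = {R: 0.5, N: 0.2, U: 0.3}

belief = combine(m_content, m_structure)
for focal, mass in belief.items():
    print(sorted(focal), round(mass, 4))
```

With these toy masses the conflict is 0.12, and the combined mass on "relevant" is 0.68 / 0.88, roughly 0.77.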


We address the computational difficulties caused by the heavy processing load that the Dempster-Shafer Theory (DST) requires in Information Retrieval. Specifically, we focus on the performance measure known as the Jousselme distance between two basic probability assignments (or bodies of evidence). We first discuss the extension of the Jousselme distance from the DST to the Dezert-Smarandache Theory, a generalization of the DST. This is followed by an introduction to two new metrics we have developed: a Hamming-inspired metric for evidences, and a metric based on the degree of shared uncertainty. The performances of these metrics are compared with one another
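For orientation, the Jousselme distance between two basic probability assignments is commonly defined as d(m1, m2) = sqrt(0.5 · (m1 − m2)ᵀ D (m1 − m2)), where D(A, B) = |A ∩ B| / |A ∪ B|. The sketch below implements that standard DST form only; the abstract does not spell out the formula, and neither the Dezert-Smarandache extension nor the two new metrics it announces are covered here. The function names are illustrative.

```python
from math import sqrt


def jaccard(a, b):
    """Similarity |A ∩ B| / |A ∪ B| between two focal elements."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jousselme_distance(m1, m2):
    """Distance between two BPAs given as {frozenset: mass} dicts."""
    focal = list(set(m1) | set(m2))
    diff = [m1.get(s, 0.0) - m2.get(s, 0.0) for s in focal]
    # Quadratic form diff^T D diff, computed over all pairs of focal elements.
    quad = sum(
        diff[i] * diff[j] * jaccard(focal[i], focal[j])
        for i in range(len(focal))
        for j in range(len(focal))
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return sqrt(max(0.5 * quad, 0.0))


A = frozenset({"a"})
B = frozenset({"b"})
print(jousselme_distance({A: 1.0}, {A: 1.0}))  # 0.0
print(jousselme_distance({A: 1.0}, {B: 1.0}))  # 1.0
```

The double loop is quadratic in the number of focal elements, which can itself grow exponentially with the size of the frame; this is the kind of processing load the abstract refers to.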


“…." suggests a relationship between two noun phrases "operating systems" and "personal computers" [RKOR06]. Then, these relations are represented with a format similar to that of multivalued logic as used in the theory of human plausible reasoning; that is, operating system(personal computers) [CM89].…”

Section: Rich Document Representation (RDR)

mentioning

confidence: 99%

Document Representation and Quality of Text: An Analysis

Keikha

Razavian

Oroumchian

et al.

2008

Survey of Text Mining II

Self Cite


Overview

There are three factors involved in text classification: the classification model, the similarity measure, and the document representation. In this chapter, we will focus on document representation and demonstrate that the choice of document representation has a profound impact on the quality of the classification. We will also show that the text quality affects the choice of document representation. In our experiments we have used centroid-based classification, which is a simple and robust text classification scheme. We will compare four different types of document representation: N-grams, single terms, phrases, and a logic-based document representation called RDR. The N-gram representation is a string-based representation with no linguistic processing. The single-term approach is based on words with minimum linguistic processing. The phrase approach is based on linguistically formed phrases and single words. The RDR is based on linguistic processing and represents documents as a set of logical predicates. Our experiments on many text collections yielded similar results. Here, we base our arguments on experiments conducted on Reuters-21578 and contest (ASRS) collection (see Appendix). We show that RDR, the more complex representation, produces more effective classification on Reuters-21578, followed by the phrase approach. However, on the ASRS collection, which contains many syntactic errors (noise), the 5-gram approach outperforms all other methods by 13%. That is because the 5-gram approach is a robust method in the presence of noise. The more complex models produce better classification results, but since they are dependent on natural language processing (NLP) techniques, they are vulnerable to noise.
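The combination of centroid-based classification with a character 5-gram representation can be sketched as follows. The abstract does not give the authors' weighting scheme or similarity measure, so this sketch assumes raw 5-gram counts, length normalization, and cosine similarity. The function names and the four training sentences are invented for illustration and are not drawn from Reuters-21578 or ASRS.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=5):
    """Count overlapping character n-grams; no linguistic processing."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def normalize(vec):
    """Scale a sparse vector to unit length (empty if the norm is zero)."""
    norm = sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(k, 0.0) for k, w in u.items())


def train(labelled_docs, n=5):
    """Build one centroid per class from (text, label) pairs."""
    sums = {}
    for text, label in labelled_docs:
        acc = sums.setdefault(label, Counter())
        for k, w in normalize(char_ngrams(text, n)).items():
            acc[k] += w
    return {label: normalize(acc) for label, acc in sums.items()}


def classify(text, centroids, n=5):
    """Assign the class whose centroid is most similar to the document."""
    vec = normalize(char_ngrams(text, n))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Invented toy training data.
docs = [
    ("crude oil prices rose sharply", "oil"),
    ("oil output cut by producers", "oil"),
    ("wheat harvest and grain exports", "grain"),
    ("grain shipments of wheat fell", "grain"),
]
centroids = train(docs)
print(classify("wheat exports", centroids))  # grain
print(classify("oil prices", centroids))     # oil
```

Because the features are raw character windows, a misspelled word still shares most of its 5-grams with the correct form, which is consistent with the robustness to noise that the abstract reports for the 5-gram approach.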
