Narjes Sharif Razavian scite author profile

Narjes Sharif Razavian

3 Publications

7 Citation Statements Received

33 Citation Statements Given


Affiliations

Carnegie Mellon University, University of Tehran

Publications


Document Representation and Quality of Text: An Analysis

Keikha, Razavian, Oroumchian, et al. 2008


Overview: There are three factors involved in text classification: the classification model, the similarity measure, and the document representation. In this chapter, we focus on document representation and demonstrate that the choice of document representation has a profound impact on the quality of the classification. We also show that text quality affects the choice of document representation. In our experiments we used centroid-based classification, which is a simple and robust text classification scheme. We compare four types of document representation: N-grams, single terms, phrases, and a logic-based document representation called RDR. The N-gram representation is a string-based representation with no linguistic processing. The single-term approach is based on words with minimal linguistic processing. The phrase approach is based on linguistically formed phrases and single words. RDR is based on linguistic processing and represents documents as a set of logical predicates.

Our experiments on many text collections yielded similar results. Here, we base our arguments on experiments conducted on the Reuters-21578 and contest (ASRS) collections (see Appendix). We show that RDR, the more complex representation, produces more effective classification on Reuters-21578, followed by the phrase approach. However, on the ASRS collection, which contains many syntactic errors (noise), the 5-gram approach outperforms all other methods by 13%, because the 5-gram approach is robust in the presence of noise. The more complex models produce better classification results, but since they depend on natural language processing (NLP) techniques, they are vulnerable to noise.
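As a rough illustration of the idea in this abstract, the following minimal Python sketch pairs a character 5-gram representation with a centroid-based classifier. It is not the authors' implementation: the function names, the toy training sentences, and the choice of cosine similarity with length-normalized centroids are all assumptions made for the example, since the abstract does not specify them.

```python
from collections import Counter
import math


def char_ngrams(text, n=5):
    """Bag of overlapping character n-grams; no linguistic processing."""
    text = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def normalize(vec):
    """Scale a sparse vector to unit length (empty if the norm is zero)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (their cosine similarity)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def train_centroids(labeled_docs, n=5):
    """Sum the unit-length document vectors of each class, then renormalize."""
    sums = {}
    for text, label in labeled_docs:
        acc = sums.setdefault(label, Counter())
        for k, v in normalize(char_ngrams(text, n)).items():
            acc[k] += v
    return {label: normalize(acc) for label, acc in sums.items()}


def classify(text, centroids, n=5):
    """Assign the class whose centroid is most similar to the document."""
    doc = normalize(char_ngrams(text, n))
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


# Hypothetical toy data, not taken from Reuters-21578 or ASRS.
train = [
    ("crude oil prices rose sharply", "oil"),
    ("oil production cut by opec", "oil"),
    ("wheat harvest exceeds forecast", "grain"),
    ("grain exports of wheat and corn", "grain"),
]
centroids = train_centroids(train)
print(classify("opec oil prodction", centroids))  # query contains a typo
```

Because the query is split into overlapping character windows, a misspelled word such as "prodction" still shares several 5-grams with the training text, which is one way to picture why a string-based representation tolerates noisy input better than one that depends on correct tokens or parses.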


The web as a platform to build machine translation resources

Razavian, Vogel 2009


In the last few years, the World Wide Web has changed tremendously. Now accessible to millions of users from hundreds of countries, it has started to show new online behaviors. Following these new patterns, we now see many multilingual activities taking place on a large scale. In this paper, we analyze how these emerging usage patterns can affect the Machine Translation community. We identify the main motivations behind these activity patterns. Using examples, we compare the traditional approaches to resource collection with new online-based approaches. We then present experimental results from an online community designed to collect parallel corpora.


Embedding a Corporate Blogging System in the CRM Solutions

Razavian, Taghiyareh 2008

