Texts (books, novels, papers, short messages) are sequences of sentences, words, or symbols. Each author has a unique writing style, which can be characterized by a collection of attributes extracted from the author's texts. Text verification is a case of authorship verification in which we analyze whether all parts of a given text were written by the same (known or unknown) author. This paper analyzes and compares the results of two methods developed for text verification: one based on n-grams of symbols and one based on local histograms of words. Both methods search for dissimilarities among the parts of each text, and their results are analyzed and evaluated. The dissimilarities found indicate whether the text deserves closer attention, that is, whether its parts may not all have been written by the same author. Whether a text is flagged depends on parameters selected in experiments. The results illustrate the usability of both methods for finding dissimilarities among text parts.
Every written text in any language has one or more authors, and each author has an individual sublanguage. When the authors are unknown, a text can be analyzed using methods of data analysis, data mining, and structural analysis. This paper describes two methods for anomaly detection: an n-gram method and a system of Self-Organizing Maps working on sequences built from a text. We analyze and compare the results of methods for discrepancy detection based on character n-gram profiles (the set of normalized character n-gram frequencies of a text) for Arabic texts. The Arabic texts were analyzed with respect to many statistical characteristics, and we applied several heuristics to measure dissimilarities between text parts. We evaluate several Arabic texts and show which of their parts contain discrepancies and need further analysis for anomaly detection. The analysis depends on parameters selected in experiments. The Self-Organizing Map system is trained on input sequences, after which it identifies text parts with anomalies using a cumulative error and winner analysis in the networks. Both methods have been tested on Arabic texts and are a promising contribution to text analysis.
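The idea of a character n-gram profile can be illustrated with a minimal sketch. The abstract does not specify the dissimilarity heuristics, part length, or thresholds, so everything below is an assumption made for illustration: the function names (`ngram_profile`, `dissimilarity`, `flag_parts`), the use of half the L1 distance between profiles as the dissimilarity measure, and the default values of `n`, `part_len`, and `threshold`. It is not the authors' implementation.

```python
from collections import Counter


def ngram_profile(text, n=3):
    """Return the normalized frequencies of character n-grams in text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def dissimilarity(p, q):
    """Half the L1 distance between two profiles (assumed measure).

    Yields 0 for identical profiles and 1 for profiles with no shared n-gram.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def flag_parts(text, part_len=200, n=3, threshold=0.5):
    """Split text into fixed-length parts and flag those whose profile
    differs from the whole-text profile by more than threshold.

    part_len, n, and threshold are hypothetical parameters; in practice
    they would be tuned experimentally.
    """
    whole = ngram_profile(text, n)
    flagged = []
    for start in range(0, len(text), part_len):
        profile = ngram_profile(text[start:start + part_len], n)
        if not profile:  # part shorter than n characters
            continue
        d = dissimilarity(profile, whole)
        if d > threshold:
            flagged.append((start, d))
    return flagged
```

Because Python strings are sequences of Unicode code points, the same sketch applies to Arabic text without modification. A flagged part is only a candidate for closer inspection, not proof of a different author.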