The growing abundance of text articles in internet requires automated tagging using key phrases. The automated key phrase generation of resources helps in the information retrieval. To generate the key phrases for texts from all possible domains, the need is an automated approach that would extract the key ideas directly from the text itself. In this paper, we have suggested a methodology that uses the noun words and phrases, their occurrence and co-occurrences to generate the keywords. The method, employing both the statistical and linguistic features, has been successful in extracting the keywords and phrases to tag a text that best summarizes its content.
Image processing is a rapidly evolving field with immense significance in science and engineering. One of the latest applications of Image processing is in Intelligent Character Recognition (ICR), that is the computer translation of handwritten text into machine-readable and machine-editable characters. ICR is an advanced version of Optical Character Recognition system that allows fonts and different styles of handwriting to be recognized during processing with high accuracy and speed. ICR, in combination with OCR and OMR (Optical Mark Recognition), is used in forms processing. Forms processing is a process by which one can capture information entered into different data fields filled in forms and convert it to an editable text. Forms processing systems can range from the processing of small application forms to large scale survey forms with multiple pages. The Recognition Engine, designed using Image Processing and Convolution Networks helps save time, labor and money in addition to the increase of accuracy.
Large amount of insights can be drawn from the articles that are published online. Instead of manually reading all the articles and assigning relevant tags to them satisfying the content, it will be highly efficient if there exists an automated process for performing the task. In this paper, an unsupervised approach for the automated tagging of articles in Chinese language has been implemented. The input is an article and output is the tags to that article. The major challenge is the segmentation of the Chinese characters, which do not make use of separators unlike the English characters. To overcome this, different approaches are combined together in order to get accurate results. Efficient tagging of articles is required, which can be used for many applications in the analysis, one of which is in Recommendation Engine. The tagging process should consider all the aspects of the article and assign the most relevant tags accordingly. The proposed algorithm was implemented for a Chinese Publication House and relevant tags were assigned to its articles of different categories. At the end of the project, the results were manually checked for, in a corpus of 10000 Chinese articles, which reflected the attainment of overall accuracy of around 85%, greater than that obtained through different traditional methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.