In this study, a cloud-based web application is developed that uses Tesseract-based optical character recognition (OCR) to extract information such as names, phone numbers, e-mail addresses and job titles from physical business cards prepared in Turkish.
Figure A. System architecture of the proposed cloud-based business card reader application
Purpose: In the literature, various applications based on Mobile Vision, OpenCV and Tesseract have been developed to digitize and extract the information on paper-based business cards in English, Vietnamese, Japanese and Chinese. This study aims to develop a high-accuracy, cloud-based business card recognition software (TRCardScan) that supports Turkish and uses Tesseract-based OCR to extract the information from paper-based Turkish business cards.
Theory and Methods: The system architecture of the proposed paper-based Turkish business card reader application is given in Figure A. First, photos of business cards are taken with the camera or selected from the image gallery as input to the application. These photos are then processed with the Tesseract-based OCR method to read the characters. The recognized text, containing fields such as name, surname, mobile phone, e-mail address and contact address, is parsed by algorithms tailored to the characteristics of the part of the card in which each field is located. In the last stage, the parsed data, now structured into meaningful information, is sent to the web service and written to the corresponding fields in the database.
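The parsing stage can be illustrated with a minimal sketch. The abstract does not publish the TRCardScan rules, so everything below is an assumption for illustration: the OCR text is taken as already produced by Tesseract (for example through pytesseract with the `tur` language pack), and the regular expressions, keyword lists, the hypothetical helpers `parse_card`, `normalize_phone` and `build_request`, and the example endpoint URL are not the authors' method. Each OCR line is assigned to at most one field, and the result is packed into a JSON POST request of the kind that could be sent to a web service.

```python
import json
import re
import urllib.request

# Hypothetical patterns and keyword lists, not the TRCardScan rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Turkish numbers: optional +90 or 0 prefix, then 10 digits grouped 3-3-2-2.
PHONE_RE = re.compile(r"(?:\+90|0)?\s*\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{2}[\s-]?\d{2}")
TITLE_KEYWORDS = ("müdür", "mühendis", "uzman", "direktör", "koordinatör", "yönetici")
ADDRESS_MARKERS = ("mah.", "cad.", "sok.", "no:", "bulvar")


def normalize_phone(raw: str) -> str:
    """Keep the last 10 digits and prefix the +90 country code."""
    digits = re.sub(r"\D", "", raw)
    return "+90" + digits[-10:]


def parse_card(ocr_text: str) -> dict:
    """Assign each non-empty OCR line to at most one field (hypothetical heuristic)."""
    fields = {"name": None, "title": None, "phones": [], "email": None, "address": None}
    for line in (raw.strip() for raw in ocr_text.splitlines()):
        if not line:
            continue
        # casefold() handles Ü/Ö/Ç/Ş, but not the Turkish dotted/dotless I mapping.
        folded = line.casefold()
        email = EMAIL_RE.search(line)
        if email:
            fields["email"] = fields["email"] or email.group(0)
            continue
        phones = PHONE_RE.findall(line)
        if phones:
            fields["phones"].extend(normalize_phone(p) for p in phones)
            continue
        if any(k in folded for k in TITLE_KEYWORDS):
            fields["title"] = fields["title"] or line
            continue
        if any(m in folded for m in ADDRESS_MARKERS):
            fields["address"] = fields["address"] or line
            continue
        words = line.split()
        # Naive name rule: first line of 2-3 purely alphabetic words.
        if fields["name"] is None and 2 <= len(words) <= 3 and all(w.isalpha() for w in words):
            fields["name"] = line
    return fields


def build_request(fields: dict, url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a hypothetical endpoint."""
    body = json.dumps(fields, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```

For example, `parse_card("Ayşe Yılmaz\nYazılım Mühendisi\n+90 532 123 45 67\nayse.yilmaz@ornek.com.tr")` yields the name, title, the normalized phone `+905321234567` and the e-mail address. The line-by-line, first-match design keeps each field's rule independent, which mirrors the abstract's idea of field-specific parsing, but a rule such as "first two alphabetic words" will mislabel a company name printed above the person's name; a production parser would need layout or positional cues as well.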
Results: In analyses of 15 paper-based Turkish business cards with different features, the proposed TRCardScan software extracted the information from the physical business cards with 84.76% accuracy, 96.05% precision, 84.88% recall and a 90.12% F1 score. The average extraction time per business card was 1.6 seconds.
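The reported metrics follow the standard definitions, precision = TP / (TP + FP), recall = TP / (TP + FN) and F1 = 2PR / (P + R), where TP, FP and FN are true positives, false positives and false negatives. The abstract does not give the underlying counts, or how accuracy was computed per field, so the sketch below only checks that the reported F1 score is consistent with the reported precision and recall; the helper names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of extracted fields that are correct."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of fields on the card that were extracted correctly."""
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Reported values from the abstract.
reported_p, reported_r = 0.9605, 0.8488
print(f"{f1(reported_p, reported_r):.4f}")  # 0.9012
```

The gap between precision (96.05%) and recall (84.88%) means that the fields the system does extract are almost always correct, while a noticeable share of the fields present on the cards are missed.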
Conclusion: The proposed TRCardScan can read and parse data from physical business cards with an average extraction time of 1.6 seconds and an accuracy of around 85%. Considering both time and performance criteria, these results show that the parsing algorithm designed for Turkish in the proposed web-based application is successful. Finally, compared with similar software, TRCardScan is considered to perform well in terms of both accuracy and extraction time.