“…This has a direct consequence on a proper rendering of the paper in digital form. Consequently, this impacts on the adequate interpretation of the language in the text using Natural Language Processing techniques such as word and sentence tokenisation, part-of-speech tagging ( Lopresti, 2009 , Tanner et al, 2009 , Kettunen and Pääkkönen, 2016 ) in correctly interpreting text. Lately, the use of Machine Learning techniques has been gaining traction in building models that can be trained to recognise information entities from textual sources.…”