Handwritten Information Extraction from Historical Census Documents

Nion, Thibauld; Menasri, F.; Louradour, Jérôme; Sibade, Cédric; Retornaz, Thomas; Metaireau, Pierre-Yves; Kermorvant, Christopher

doi:10.1109/icdar.2013.168

Cited by 16 publications

(6 citation statements)

References 9 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…LSTM are carefully designed recurrent neurons which gave superior performance in a wide range of sequence modeling problems. In fact, RNNs enhanced by LSTM cells [8] won several important contests [9], [10], [11] and currently hold the best known results in handwriting recognition.…”

Section: Introductionmentioning

confidence: 99%

Dropout Improves Recurrent Neural Networks for Handwriting Recognition

Pham

Bluche

Kermorvant³

et al. 2014

2014 14th International Conference on Frontiers in Handwriting Recognition

Self Cite

462

258

View full text Add to dashboard Cite

Recurrent neural networks (RNNs) with Long Short-Term memory cells currently hold the best known results in unconstrained handwriting recognition. We show that their performance can be greatly improved using dropout -a recently proposed regularization method for deep architectures. While previous works showed that dropout gave superior performance in the context of convolutional networks, it had never been applied to RNNs. In our approach, dropout is carefully used in the network so that it does not affect the recurrent connections, hence the power of RNNs in modeling sequences is preserved. Extensive experiments on a broad range of handwritten databases confirm the effectiveness of dropout on deep architectures even when the network mainly consists of recurrent and shared connections.

show abstract

Section: Introductionmentioning

confidence: 99%

Dropout Improves Recurrent Neural Networks for Handwriting Recognition

Pham

Bluche

Kermorvant³

et al. 2014

2014 14th International Conference on Frontiers in Handwriting Recognition

Self Cite

462

258

View full text Add to dashboard Cite

show abstract

“…Under augmentation, the date model achieves an SA 0 of 90.5% while the age model has an SA 0 of 97.2%. In the context of US censuses, Nion et al (2013) transcribed age at a sequence accuracy of approximately 85% using convolutional neural networks. In ongoing (unpublished) work, the BYU Record Linking Lab is using a CTCbased approach to correct mistakes in US censuses, but we do not yet have performance metrics to compare against.…”

Section: Resultsmentioning

confidence: 99%

Applications of Machine Learning in Document Digitisation

Dahl¹,

Johansen²,

Sørensen³

et al. 2021

Preprint

View full text Add to dashboard Cite

Data acquisition forms the primary step in all empirical research. The availability of data directly impacts the quality and extent of conclusions and insights. In particular, larger and more detailed datasets provide convincing answers even to complex research questions. The main problem is that "large and detailed" usually implies "costly and difficult", especially when the data medium is paper and books. Human operators and manual transcription have been the traditional approach for collecting historical data.We instead advocate the use of modern machine learning techniques to automate the digitisation process. We give an overview of the potential for applying machine digitisation for data collection through two illustrative applications. The first demonstrates that unsupervised layout classification applied to raw scans of nurse journals can be used to construct a treatment indicator. Moreover, it allows an assessment of assignment compliance. The second application uses attention-based neural networks for handwritten text recognition in order to transcribe age and birth and death dates from a large collection of Danish death certificates. We describe each step in the digitisation pipeline and provide implementation insights. * Acknowledgements: We thank Peter Sandholdt Jensen, Joseph Price, and Michael Rosholm for useful comments. We also thank Søren Poder for contributing his expertise on digitisation of historical documents. We gratefully acknowledge support from Rigsarkivet (Danish National Archive) and Aarhus Stadsarkiv (Aarhus City Archive) who have supplied large amounts of scanned source material. We also gratefully acknowledge support from DFF who has funded the research project "Inside the black box of welfare state expansion: Early-life health policies, parental investments and socio-economic and health trajectories" (grant 8106-00003B) with PI Miriam Wüst.

show abstract

“…[27] combined BLSTM-CTC with a probabilistic language model and by this developed a system capable of directly transcribing raw online handwriting data. In a real-world use case this system showed a very high automation rate with an error rate comparable to a human on this kind of task ( [57]). In another approach [35] combined BLSTM-CTC with multidimensional LSTM and applied it to an offline handwriting recognition task, as well outperforming classifiers based on Hidden-Markov models.…”

Section: Handwriting Recognitionmentioning

confidence: 97%

Understanding LSTM -- a tutorial into Long Short-Term Memory Recurrent Neural Networks

Staudemeyer¹,

Morris²

2019

Preprint

174

100

View full text Add to dashboard Cite

Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are one of the most powerful dynamic classifiers publicly known. The network itself and the related learning algorithms are reasonably well documented to get an idea how it works. This paper will shed more light into understanding how LSTM-RNNs evolved and why they work impressively well, focusing on the early, ground-breaking publications. We significantly improved documentation and fixed a number of errors and inconsistencies that accumulated in previous publications. To support understanding we as well revised and unified the notation used.

show abstract

Handwritten Information Extraction from Historical Census Documents

Cited by 16 publications

References 9 publications

Dropout Improves Recurrent Neural Networks for Handwriting Recognition

Dropout Improves Recurrent Neural Networks for Handwriting Recognition

Applications of Machine Learning in Document Digitisation

Understanding LSTM -- a tutorial into Long Short-Term Memory Recurrent Neural Networks

Contact Info

Product

Resources

About