Florian Laws scite author profile

Florian Laws

5 Publications

218 Citation Statements Received

80 Citation Statements Given


Affiliations

University of Stuttgart

Publications


Estimation of conditional probabilities with decision trees and an application to fine-grained POS tagging

Schmid, Laws (2008)

We present an HMM part-of-speech tagging method which is particularly suited for POS tagsets with a large number of fine-grained tags. It is based on three ideas: (1) splitting of the POS tags into attribute vectors and decomposition of the contextual POS probabilities of the HMM into a product of attribute probabilities, (2) estimation of the contextual probabilities with decision trees, and (3) use of high-order HMMs. In experiments on German and Czech data, our tagger outperformed state-of-the-art POS taggers.
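The decomposition in idea (1) can be illustrated with the chain rule of probability. This is only a sketch under assumed notation: a tag t is written as an attribute vector (a_1, ..., a_m), and C stands for the HMM context (the preceding tags). These symbols are illustrative and do not come from the abstract.

```latex
P(t \mid C) \;=\; P(a_1, \dots, a_m \mid C) \;=\; \prod_{i=1}^{m} P(a_i \mid a_1, \dots, a_{i-1}, C)
```

Each factor on the right is a conditional probability over a single attribute, which is the kind of quantity that idea (2) proposes to estimate with decision trees.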


CloudScan - A Configuration-Free Invoice Analysis System Using Recurrent Neural Networks

Palm, Winther, Laws (2017)

We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores respectively on seen invoice layouts. For the harder task of unseen invoice layouts, the recurrent neural network model outperforms the baseline with 0.840 average F1 compared to 0.788.


Stopping criteria for active learning of named entity recognition

Laws, Schütze (2008)

Active learning is a proven method for reducing the cost of creating the training sets that are necessary for statistical NLP. However, there has been little work on stopping criteria for active learning. An operational stopping criterion is necessary to be able to use active learning in NLP applications. We investigate three different stopping criteria for active learning of named entity recognition (NER) and show that one of them, gradient-based stopping, (i) reliably stops active learning, (ii) achieves near-optimal NER performance, and (iii) needs only about 20% as much training data as exhaustive labeling.


Attend, Copy, Parse - End-to-end Information Extraction from Documents

Palm, Laws, Winther (2019)

Document information extraction tasks performed by humans create data consisting of a PDF or document image input, and extracted string outputs. This end-to-end data is naturally consumed and produced when performing the task because it is valuable in and of itself. It is naturally available, at no additional cost. Unfortunately, state-of-the-art word classification methods for information extraction cannot use this data, instead requiring word-level labels which are expensive to create and consequently not available for many real life tasks. In this paper we propose the Attend, Copy, Parse architecture, a deep neural network model that can be trained directly on end-to-end data, bypassing the need for word-level labels. We evaluate the proposed architecture on a large diverse set of invoices, and outperform a state-of-the-art production system based on word classification. We believe our proposed architecture can be used on many real life information extraction tasks where word classification cannot be used due to a lack of the required word-level labels.


A graph-theoretic algorithm for automatic extension of translation lexicons

Dorow, Laws, Michelbacher, et al. (2009)

This paper presents a graph-theoretic approach to the identification of yet-unknown word translations. The proposed algorithm is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words. We also present a formulation of SimRank in matrix form and extensions for edge weights, edge labels and multiple graphs.
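For readers unfamiliar with SimRank, the following is a minimal sketch of the commonly used matrix-form iteration on a single unweighted directed graph. It is not the paper's implementation: the function names, the decay value of 0.8, the iteration count, and the toy graph are all assumptions made for illustration, and the paper's extensions for edge weights, edge labels, and multiple graphs are not covered.

```python
# Sketch of plain SimRank in matrix form (assumed formulation, not the paper's code).
# Update rule: S <- decay * W^T S W, then reset the diagonal to 1,
# where W is the adjacency matrix with each column divided by its in-degree.

def column_normalize(adj):
    """Divide each column by its sum; columns with no in-edges stay zero."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    return [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def simrank(adj, decay=0.8, iterations=10):
    """adj[i][j] == 1 means there is an edge from node i to node j."""
    n = len(adj)
    w = column_normalize(adj)
    wt = transpose(w)
    # Start from the identity: every node is fully similar to itself only.
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        prod = matmul(matmul(wt, sim), w)
        sim = [
            [1.0 if i == j else decay * prod[i][j] for j in range(n)]
            for i in range(n)
        ]
    return sim

# Hypothetical toy graph: node 0 points to nodes 1 and 2.
toy = [
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
]
scores = simrank(toy)
print(scores[1][2])  # 0.8: nodes 1 and 2 share their only in-neighbor
```

In the toy graph, nodes 1 and 2 are judged similar because they are pointed to by the same node, which mirrors the intuition in the abstract that two words are similar if they stand in similar relationships to similar other words.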
