2010
DOI: 10.1007/978-3-642-12837-0_3

Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents

Abstract: Information extraction from legal documents is an important and open problem. A mixed approach, using linguistic information and machine learning techniques, is described in this paper. In this approach, top-level legal concepts are identified and used for document classification using Support Vector Machines. Named entities, such as locations, organizations, dates, and document references, are identified using semantic information from the output of a natural language parser. This information, lega…

Cited by 22 publications (12 citation statements)
References 25 publications
“…The consensus verdict from Table II has been subsequently used to compute the system's search and retrieval performance using the system-based evaluation metrics [45]- [48] given in Equations…”
Section: System Evaluation and Results (mentioning)
confidence: 99%
“…Although these approaches give better results compared to other approaches, the drawback of using handcrafted rules remains the same. Another approach by Quaresma and Gonçalves combined linguistic and ML techniques for identifying entities from a corpus of legal documents. Here, NEs such as location names, organization names, dates, references to other documents, and document articles are extracted using the semantic information from the output of a natural language parser.…”
Section: Related Work on NER Approaches (mentioning)
confidence: 99%
“…Srihari et al used a combination of the maximum entropy model (MaxEnt), hidden Markov model (HMM), and handcrafted patterns for extracting NEs from natural language text. Quaresma and Gonçalves proposed a mixed approach, using linguistic and ML techniques to identify entities from a corpus of legal documents. Rule‐based approaches have several advantages: first, rule‐based techniques can work without depending on the availability of annotated training data, which can be expensive to obtain.…”
Section: Introduction (mentioning)
confidence: 99%
“…They report high precision for identifying judges (98%) and high recall for capturing jurisdictions (87%). Quaresma and Gonçalves use linguistic information to identify named entities in a corpus of legal documents from the International Agreements/External Relations section of the EUR-Lex portal 1 and classify documents based on some of the named entities discovered [8]. Unlike Dozier et al, Quaresma and Gonçalves found only very few person names in their corpus (presumably due to nature of international agreements).…”
Section: Named Entity and Date Extraction in Legal Text (mentioning)
confidence: 99%