2014
DOI: 10.12705/632.11

Detailed mark‐up of semi‐monographic legacy taxonomic works using FlorML

Abstract: We present FlorML, an XML schema, specifically designed for the detailed mark‐up of highly complicated semi‐monographic legacy taxonomic works, such as large Floras and Faunas. We discuss the prerequisites for developing a suitable XML schema, and the limitations presented by the legacy taxonomic works, requirements by stakeholders and the desired output format. Furthermore, we explain how FlorML was deployed to mark up two legacy taxonomic works, Flora Malesiana and Flore du Gabon, how that deployment was imp…

citations
Cited by 15 publications
(8 citation statements)
references
References 21 publications
“…Taxonomic descriptions often describe a broader range of character traits, including both qualitative and quantitative traits that provide a summary of the variation observed within a taxon (e.g., length of leaf: 6–10 cm; shape of leaf: ovate to obovate). Consequently, recent research has focused on developing the infrastructure, including software, glossaries, and ontologies, to automate the large‐scale extraction of phenotypic data from taxonomic descriptions (Jaiswal et al., ; Cui, ; Burleigh et al., ; Hamman et al., ; Garnier et al., ; Hoendorf et al., ; Endara et al., ). We describe a natural language processing (NLP) pipeline that leverages this new infrastructure to build character‐by‐taxon phenotypic trait matrices that are usable for evolutionary inference from formal taxonomic descriptions written in English. The NLP pipeline uses a non‐supervised learning strategy that analyzes the full length of the body of a description.…”
mentioning
confidence: 99%
“…TaxonX (Capatano 2010) as a flexible and lightweight XML schema, facilitates such communication step by offering developers an agree-upon taxon treatment model into which they may package the extracted text ("encoding"). TaxonX aims at modelling taxon treatments and their individual elements so that they can be re-used for data mining and data extraction and is especially suitable to markup legacy literature (Penev et al 2011). The Darwin Core Archive format is used to export treatments including the observation records to external users such as GBIF, EOL or the EU BON taxonomic backbone.…”
Section: Sharing Data: TaxonX Schema Darwin Core Archive and RDF
mentioning
confidence: 99%
“…However, much additional information on the Malesian flora already exists in a large but fragmented body of detailed species-level data in regional floras and the scientific literature. To mobilise and present this data, pilot projects in the FM framework have explored the use of database backbones dynamically linked to website output, mark-up procedures to populate these databases mobilising data from both digitalised legacy literature (earlier FM volumes) and digitally-born (more recent Flora volumes and journal articles) data sources (Flora Malesiana Working Group, 2011 onwards; De Wilde, 2014; Hamann et al, 2014; Penev, 2014), and platforms allowing remote online collaboration and data sharing for Flora contributors (Hovenkamp & Cicuzza, 2013; Thomas et al, 2013). Roos (2003) emphasised the fact that progress on the Flora Malesiana project has heavily relied on institutional commitment in the Netherlands, and, to a lesser extent, from other European institutions, while contributions coordinated by researchers from Malesian botanical institutions have been sporadic (but see Soepadmo, 1973; Keng, 1978).…”
Section: Flora Malesiana
mentioning
confidence: 99%