Alan Burk scite author profile

Purpose: The purpose of this paper is to develop automated methods for creating metadata for documents in an institutional repository.

Design/methodology/approach: Two methods are examined for automatically building metadata in an institutional repository context. Text mining techniques are employed to discover relationships among documents with similar content, from which are inferred possible values for missing or incomplete metadata elements. Machine learning techniques are used to identify and extract specific metadata element values from document content.

Findings: Text mining techniques can be used to cluster documents with similar content. This allows values for metadata elements, like keyword, to be projected from documents with established metadata to related documents. Machine learning techniques are found to be reasonably accurate for extracting from documents values for metadata elements such as title, author, and abstract. Results show sufficient promise to support the next phase of the project: the development of assistive tools for use by metadata specialists to create or edit document metadata.

Originality/value: This paper focuses on the use of automated metadata extraction techniques to assist metadata creation, lessening the time and effort required to add documents to institutional repositories.
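The keyword projection described under Findings can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name an algorithm, so the sketch swaps in a simple nearest-neighbour comparison (bag-of-words cosine similarity) in place of clustering. The function names, the `k` and `min_sim` parameters, and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def project_keywords(target_text, labelled_docs, k=2, min_sim=0.1):
    """Suggest keywords for a document that has none.

    labelled_docs is a list of (text, keywords) pairs that already have
    metadata. The k most similar documents vote for their keywords,
    weighted by similarity; documents below min_sim are ignored.
    (k and min_sim are illustrative assumptions, not values from the paper.)
    """
    target = Counter(tokenize(target_text))
    scored = sorted(
        ((cosine(target, Counter(tokenize(text))), keywords)
         for text, keywords in labelled_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, keywords in scored[:k]:
        if similarity >= min_sim:
            for keyword in keywords:
                votes[keyword] += similarity
    return [keyword for keyword, _ in votes.most_common()]


# Hypothetical sample data, invented for illustration only.
labelled = [
    ("metadata extraction for institutional repository documents",
     ["metadata", "repositories"]),
    ("canadian poetry of the nineteenth century", ["poetry"]),
]
suggested = project_keywords(
    "automated metadata creation in an institutional repository", labelled, k=1
)
```

In keeping with the assistive-tool goal stated in the abstract, output like `suggested` would be offered to a metadata specialist as candidate values to accept or edit, not written to the record automatically.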


The Canadian Poetry Collection:

Charlong¹, Burk²

The Canadian Arts and Humanities Computing Centre:

Burk¹, Butler², Gerrity³, et al.


Alan Burk

Practice and Preservation – Format Issues

New possibilities for metadata creation in an institutional repository context
