We critically assess mainstream accounting and finance research applying methods from computational linguistics (CL) to study financial discourse. We also review common themes and innovations in the literature and assess the incremental contributions of studies applying CL methods over manual content analysis. Key conclusions emerging from our analysis are: (a) accounting and finance research is behind the curve in terms of CL methods generally and word sense disambiguation in particular; (b) implementation issues mean the proposed benefits of CL are often less pronounced than proponents suggest; (c) structural issues limit practical relevance; and (d) CL methods and high-quality manual analysis represent complementary approaches to analyzing financial discourse. We describe four CL tools that have yet to gain traction in mainstream AF research but which we believe offer promising ways to enhance the study of meaning in financial discourse. The four tools are named entity recognition (NER), summarization, semantics and corpus linguistics.

KEYWORDS: 10-K, annual reports, computational linguistics, conference calls, corpus linguistics, earnings announcements, machine learning, NLP, semantics

1. Information is the lifeblood of financial markets and the amount of data available to decision-makers is increasing exponentially. The Bank of England (2015) estimates that 90% of global information has been created during the last decade. […]

[…] (MD&A), whereas practitioners, standard setters and regulators are often interested in more granular issues such as the format and content of specific disclosures, the placement of content within the overall reporting package, and limits on the use of jargon concerning particular topics. Second, it is not immediately obvious how commonly employed empirical proxies for discourse quality such as readability (the Fog index), tone (word-frequency counts) and text re-use (cosine similarity) map onto the practical properties of effective communication identified by financial market regulators (a minimal sketch of these three proxies follows this abstract).

With these caveats in mind, we proceed to review common themes and innovations in the literature and assess the incremental contributions of work applying CL methods over manual content analysis. The median AF study examines 10-K filings using basic content analysis methods such as readability algorithms and keyword counts. The degree of clustering is consistent with the initial phase of the research lifecycle, with agendas shaped as much by ease of data access and implementation as by research priorities. Nevertheless, closer inspection reveals how relatively basic word-level methods have been used to provide richer insights into the properties and effects of financial discourse. Refinements to standard readability metrics, the development of domain-specific wordlists, and the use of weighting schemes and text filtering to improve word-sense disambiguation represent welcome advances on naïve unigram word counts. We also acknowledge a move towards the use of more NLP technology in the form of machine learning and topic...
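To make these proxies concrete, here is a minimal Python sketch of the three measures named above: readability via the Gunning Fog index, tone via dictionary word counts, and text re-use via cosine similarity of term-frequency vectors. The tiny wordlists, tokenisation and net-tone formula are illustrative assumptions rather than the implementation used in any particular study; published work typically relies on domain-specific dictionaries such as Loughran and McDonald's.

```python
# Minimal sketches of three common text proxies in AF research.
# Wordlists and formulas are illustrative assumptions only.
import math
import re
from collections import Counter

def count_syllables(word: str) -> int:
    """Rough syllable count: contiguous vowel groups (heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fog_index(text: str) -> float:
    """Gunning Fog: 0.4 * (words per sentence + 100 * complex/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))

# Stand-in sentiment lists; real studies use domain-specific
# dictionaries such as Loughran and McDonald's.
POSITIVE = {"gain", "growth", "improve", "strong"}
NEGATIVE = {"loss", "decline", "impair", "weak"}

def tone(text: str) -> float:
    """Net tone: (pos - neg) / (pos + neg), one common formulation."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

if __name__ == "__main__":
    mdna = "Revenue growth was strong. We expect further gains next year."
    prior = "Revenue growth was strong. We expect gains to continue."
    print(f"Fog index:  {fog_index(mdna):.2f}")
    print(f"Tone:       {tone(mdna):+.2f}")
    print(f"Similarity: {cosine_similarity(mdna, prior):.2f}")
```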
We provide a methodological contribution by developing, describing and evaluating a method for automatically retrieving and analysing text from digital PDF annual report files published by firms listed on the London Stock Exchange (LSE). The retrieval method retains information on document structure, enabling clear delineation between the narrative and financial statement components of reports, and between individual sections within the narrative component. Retrieval accuracy exceeds 95% in manual validations on a random sample of 586 reports. Large-sample statistical validations using a comprehensive sample of reports published by non-financial LSE firms confirm that report length, narrative tone and (to a lesser degree) readability vary predictably with economic and regulatory factors. We demonstrate how the method adapts to non-English-language documents and different regulatory regimes using a case study of Portuguese reports. We use the procedure to construct new research resources, including corpora for commonly occurring annual report sections and a dataset of text properties for over 26,000 U.K. annual reports.
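The structure-aware retrieval described above can be approximated at small scale as follows. This sketch assumes a hypothetical file annual_report.pdf and a hand-picked heading list; it uses the open-source pypdf package and a naive regex split, so it illustrates the general approach rather than the authors' published pipeline, which handles heading variation and validation far more carefully.

```python
# A simplified sketch of splitting a UK annual report PDF into
# named narrative sections; headings and filename are assumptions.
import re
from pypdf import PdfReader  # pip install pypdf

# Headings that commonly open narrative sections in UK annual
# reports; a hand-picked, incomplete list used for illustration.
SECTION_HEADINGS = [
    "Chairman's Statement",
    "Chief Executive's Review",
    "Strategic Report",
    "Independent Auditor's Report",
]

def extract_sections(path: str) -> dict:
    """Concatenate page text, then split it at known headings."""
    text = "\n".join(page.extract_text() or ""
                     for page in PdfReader(path).pages)
    pattern = "|".join(re.escape(h) for h in SECTION_HEADINGS)
    # The capture group keeps each matched heading in the output,
    # so it can label the text that follows it.
    parts = re.split(f"({pattern})", text)
    sections, current = {}, "front matter"
    for part in parts:
        if part in SECTION_HEADINGS:
            current = part
        else:
            sections[current] = sections.get(current, "") + part
    return sections

if __name__ == "__main__":
    for name, body in extract_sections("annual_report.pdf").items():
        print(f"{name}: {len(body.split())} words")
```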
Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of higher quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.

The current paper describes and extends the resource creation activities and evaluations that underpinned experiments and findings that have previously appeared as an LREC workshop paper (El-Haj et al. 2010), a student conference paper (El-Haj et al. 2011b), and a description of a multilingual summarisation pilot (El-Haj et al. 2011c).
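Gold-standard summarisation resources like those described above are typically used to score system summaries against human references with metrics such as ROUGE. Below is a minimal sketch using the open-source rouge-score package; the example strings and the best-F1 multi-reference convention are assumptions for illustration, and stemming is disabled because the package's stemmer is English-only (relevant when scoring a language such as Arabic).

```python
# A minimal sketch of evaluating a system summary against multiple
# human reference summaries with ROUGE (pip install rouge-score).
from rouge_score import rouge_scorer

def evaluate(candidate: str, references: list) -> dict:
    """Score against each reference and keep the best F1 per metric,
    one common multi-reference convention (an assumption here)."""
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"],
                                      use_stemmer=False)
    best = {}
    for ref in references:
        scores = scorer.score(ref, candidate)  # (target, prediction)
        for metric, s in scores.items():
            best[metric] = max(best.get(metric, 0.0), s.fmeasure)
    return best

if __name__ == "__main__":
    system = "the bank raised rates to curb inflation"
    gold = ["the central bank raised interest rates to fight inflation",
            "rates were raised by the bank to curb rising inflation"]
    print(evaluate(system, gold))
```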
Doubts have been raised about the rigour and objectivity of sell-side analysts' research due to institutional structures that promote pro-management behaviour. However, research in psychology stresses the importance of controlling for biases in individuals' inherent cognitive processing behaviour when drawing conclusions about their propensity to undertake careful scientific analysis. Using social cognition theory, we predict that the rigour and objectivity evident in analyst research is more pronounced following unexpected news in general and unexpected bad news in particular. We evaluate this prediction against the null hypothesis that analyst research consistently lacks rigour and objectivity to maintain good relations with management. Using U.S. firm earnings surprises as our conditioning event, we examine the content of analysts' conference call questions and research notes to assess the properties of their research. We find that analysts' notes and conference call questions display material levels of rigour and objectivity when earnings news is unexpectedly positive, and that these characteristics are more pronounced in response to unexpectedly poor earnings news. Results are consistent with analysts' innate cognitive processing response counteracting institutional considerations when attributional search incentives are strong. Exploratory analysis suggests that studying verbal and written outputs provides a more complete picture of analysts' work.