Analysing Narrative Aspects of UK Preliminary Earnings Announcements and Annual Reports: Tools and Insights for Researchers and Regulators

Lead Research Organisation: Lancaster University
Department Name: Accounting & Finance

Abstract

The quality of information provided to investors by corporate management in publicly traded companies is a matter of central importance to financial market participants. Narrative commentaries represent an increasingly significant component of financial communications. While financial narratives in the UK are shaped in part by prevailing regulations, senior management enjoys significant discretion over the content, structure and presentation of these disclosures. The informativeness of financial narrative disclosures and the way management apply their reporting discretion are key questions for academics and policymakers.

Partnering with the UK body responsible for promoting high quality corporate governance and financial reporting - the Financial Reporting Council (FRC) - this interdisciplinary project will combine expertise from accounting with state-of-the-art methods from computational linguistics to examine two key elements of financial disclosure. The first aspect is preliminary earnings announcements (PEAs), which arguably represent the most important disclosure in UK firms' annual reporting calendar. The second aspect is the annual report to shareholders, which forms the largest single recurring disclosure commitment for management.

Two opposing perspectives exist on corporate narrative disclosures. On the one hand, proponents argue that narratives provide information beyond that contained in financial data. On the other hand, opponents claim that management exploit the discretion embedded in narrative reporting to obfuscate or present a biased representation of actual performance. While extant work on UK annual report and PEA narrative disclosures provides evidence consistent with both perspectives, the scope of the research and the generalizability of findings are both compromised because conclusions rely on manual coding methods applied to small samples.

This project will develop and use state-of-the-art computerized textual analysis methods to study the properties and usefulness of financial narratives for a comprehensive sample of UK disclosures published between 2003 and 2016. While researchers are already using these methods to study disclosures made by US companies, problems accessing digital PEAs and annual reports, coupled with inconsistent document structure, have hindered computerized analysis of UK financial narratives and skewed research agendas away from studying UK reporting outcomes. This project will shine much needed light on two key aspects of UK narrative reporting. The work will provide the first large-sample analysis of PEA narratives.

The project will also examine a set of contemporary policy-relevant themes relating to the content and structure of UK annual reports. Software tools and datasets from the project will also create new opportunities for the research community.

Policymakers are facing pressure to adopt evidence-based approaches to regulation. While the FRC is committed to conducting impact and evaluation analyses, it is reliant on a relatively small team of research staff to undertake such work, much of which involves manual collection and analysis of unstructured data. The labour-intensive nature of the work inevitably yields results that are hard to generalize and constrains the scope of the FRC's work. As well as examining novel and policy-relevant research questions, this project will embed computerized text analytics methods in the FRC's formal policymaking processes. The methods will complement existing approaches by facilitating lower cost and more comprehensive assessments of regulatory changes and emerging issues in narrative reporting.

Planned Impact

Who will benefit from the work?
The project will deliver economic and societal benefits as well as contributing to academic research.

The work involves co-funded and co-produced research with the UK financial reporting regulator, the Financial Reporting Council (FRC). The work seeks to enhance policymaking in corporate governance and financial reporting by: reviewing a key unregulated aspect of corporate reporting in the form of preliminary earnings announcements (PEAs) to determine the need or otherwise for regulatory guidance; evaluating the impact of recent developments in annual report narratives; and embedding large-sample textual analysis methods in the FRC's policymaking toolkit.

Other bodies with links to financial reporting are also expected to benefit from project outputs, including the UK Investor Relations Society (UK IRS), the Institute of Chartered Accountants in England and Wales (ICAEW), the European Financial Reporting Advisory Group (EFRAG), and the International Integrated Reporting Council (IIRC).

The academic community will also benefit from the project. Large-sample empirical research on corporate narratives is skewed heavily toward the US due in part to the ease with which financial narratives can be accessed and processed automatically in that market. This project will create new resources, insights, and agendas for researchers generally and UK researchers in particular.

What form will the benefits take?
The research will enhance policymaking through two ex ante impact assessments of prevailing financial reporting practice. First, we will undertake the first systematic analysis of the properties and economic impact of PEA commentaries as a basis for evaluating the need or otherwise for the FRC to issue regulatory guidance. (PEAs are largely unregulated in the UK, creating variation in practice and scope for both informative reporting and obfuscation.) Second, we will provide large-sample evidence on emerging trends in unregulated aspects of annual report narratives as a basis for identifying both best practice and areas where regulatory guidance may be required. We also expect these findings to be of interest to other bodies involved in financial reporting including UK IRS, ICAEW, EFRAG and IIRC.

The project will also contribute to FRC policymaking activities by providing comprehensive post-implementation reviews of recent developments in annual reporting. (The FRC is currently restricted to conducting small sample manual post-implementation reviews that are costly to produce and hard to generalise.)

Coincident with this instrumental impact, the project will also deliver capacity-building impact to policymaker and academic communities. For the policymaker community, the work will embed large sample textual analysis and big data methods in the FRC's policy toolkit, empowering it to conduct comprehensive, timely, and low cost analyses of UK firms' narrative reporting practices as part of its surveillance and post-implementation review activities (where only small sample manual work is currently possible). Training and documentation to support software and methods will enable FRC colleagues to harness the potential of these resources and ensure significant legacy benefits. Datasets of financial narratives will also enhance contemporaneous and future evidence-based policymaking activities.

For the academic community, the project will build sustainable UK-focused research capacity by: developing software resources that facilitate automatic retrieval and analysis of corporate financial narratives; providing new training opportunities in textual analysis for researchers; generating datasets summarizing the properties of narrative commentaries; and stimulating UK-focused research agendas in hitherto unexplored areas such as document structure, content integration, and data presentation.

Publications

 
Description Commissioned research project
Amount £31,200 (GBP)
Organisation Financial Conduct Authority (FCA) 
Sector Public
Country United Kingdom
Start 04/2018 
End 08/2018
 
Title Annual Reports Key Sections Corpora 2003 to 2017 
Description UK Annual Reports Key Sections Plain text content extracted from an initial sample of 31,464 annual reports published between January 2002 and December 2017 by firms listed on the London Stock Exchange (LSE). Annual reports provided as PDF files are processed using the CFIE-FRSE tool downloadable from https://github.com/drelhaj/CFIE-FRSE and described in the companion paper available at http://ssrn.com/abstract=2803275. The tool processed 26,284 reports from the initial sample (83.5%). The final sample includes reports published by financial and non-financial firms listed on either the LSE Main Market or the Alternative Investment Market (AIM). The document table of contents (TOC) forms the basis of extraction for 15,883 reports (approximately 60%); pre-existing document bookmarks are used to process the remaining 10,401 reports. The CFIE-FRSE tool partitions annual reports into the "front-end" narratives component and the "back-end" financials component (including the auditor's report, mandatory financial statements and associated footnotes, and miscellaneous disclosures). We further partition the narratives component into a set of commonly occurring annual report sections that feature prominently in prior research. These narrative subsections (together with the auditor's report) are numbered 1-12 and described in more detail in the following table. Text extracts are provided by report calendar year in separate files of one-million words for each core section 1-12. All extracted content is provided for the pooled set of reports processed using TOC (N = 15,883) to ensure classification consistency across reports. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact None to date 
 
Description Detecting and Disrupting Misleading Statements 
Organisation Financial Conduct Authority (FCA)
Country United Kingdom 
Sector Public 
PI Contribution Confidential
Collaborator Contribution Confidential
Impact None to date
Start Year 2017
 
Title CFIE Final Report Structure Extractor 
Description The tool extracts text from UK annual reports published as PDF files by firms listed on the London Stock Exchange. The current version (2.0) of the tool is an update of a beta version previously available at https://drelhaj.github.io/CFIE-FRSE/. The tool retains the structure of the disclosures provided in the PDF annual report. The tool also classifies sections into generic categories to facilitate temporal and cross-sectional comparisons.
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact The previous version of the tool is being used widely by academic researchers. The new version of the tool has been used by the research team to support collaborative research with the Financial Reporting Council to explore disclosure of alternative performance measures. 
URL https://github.com/drelhaj/CFIE-FRSE-2019-Runnable
 
Description COLLABORATIONS BETWEEN LINGUISTICS AND THE PROFESSIONS 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact ESRC-funded event at Lancaster University (organised by the Centre for Corpus Approaches in Social Science). A free event exploring interactions between linguists and private-sector organisations held on 4-6 March 2019. A series of invited speakers from academia and business discussed experiences, challenges and opportunities in areas including publishing, IT, forensic analysis, organisational culture, marketing, financial reporting, and language teaching, learning and assessment. My talk reviewed ongoing work in the area of financial discourse and collaborations with financial market partners including the Financial Conduct Authority and the Financial Reporting Council.
Year(s) Of Engagement Activity 2019
URL http://cass.lancs.ac.uk/mycalendar-events/?event_id1=65
 
Description Center for Financial Reporting and Auditing Workshop "Natural Language Processing in Financial Markets" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact 1-day workshop organised by ESMT Berlin with the aim of bringing practitioners and academics together to discuss the role of natural language processing (NLP) in financial reporting research and practice. The audience comprised a mix of academics and financial market practitioners. The total number of attendees was approximately 60. The workshop comprised 5 sessions plus a panel discussion. The keynote academic presentation was delivered by Steven Young (Lancaster). The keynote practitioner presentation was delivered by Ryan Lafond (Deputy Chief Investment Officer at Algert Global LLC). Presentations and discussions focused on how accounting research and financial market practice can make better use of NLP technology. Follow-up discussions with KPMG have taken place with a view to identifying opportunities for collaboration in the area of sustainable reporting.
Year(s) Of Engagement Activity 2018
URL https://www.esmt.org/faculty-research/centers-chairs-and-institutes/center-financial-reporting-and-a...
 
Description Financial Accounting Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The aim of my session was to review ways that academic accounting research can inform regulatory activities. I identified opportunities for researchers and discussed barriers to progress (and how they might be overcome). The talk was part of a one-day workshop organised by Bristol University. The event attracted approximately 30 participants including academic faculty, PhD students, and representatives from the accounting profession (International Accounting Standards Board). My session involved a presentation followed by discussion. Participants explored a range of issues regarding engagement activities including collaborative research opportunities, contracting, and the tension between impact and publications in the context of academic progression.
Year(s) Of Engagement Activity 2018
 
Description Presentation to Senior Management at FCA 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Update on results of ongoing research collaboration involving development of methods to predict financial reporting violations. The aim of the presentation was to update senior FCA managers on results achieved to date and the next steps in the model building process.
Year(s) Of Engagement Activity 2018
 
Description Talk to London Text Analytics 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Part of a one-day workshop hosted at Accenture's London office. The event was organised by Tony Russell-Rose (UXLabs), Dyaa Albakour (Signal), and Udo Kruschwitz (University of Essex). Dr El-Haj (Senior RA on project) delivered one of the two invited presentations. The talk highlighted the importance of automatic extraction and textual analysis of financial disclosures for their contribution to corporate success. Textual disclosures help to clarify issues obscured by complex accounting methods, contextualise financial results and summarise key elements of business activity that are difficult to quantify. The extraction and textual analysis of these disclosures combine with natural language processing and corpus linguistics to form the field of Financial Narrative Processing (FNP). The audience comprised 47 text analysis practitioners, academics, and PhD students. Several follow-up requests were received from text analysis practitioners regarding software tools developed by the research team.
Year(s) Of Engagement Activity 2018
URL https://www.meetup.com/en-AU/textanalytics/events/252152599/
 
Description Workshop on textual analysis 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop on Textual Analysis Methods in Accounting and Finance
Lancaster University Management School
Programme
Day 1: 12 September
11.00-11.30 Introduction and welcome
11.30-12.30 Session 1 Overview of textual analysis literature in accounting and finance
The aim of this session is to provide participants with an overview of extant research on textual analysis in the accounting and finance literature. We will focus on the proposed benefits of automated analysis of text and evaluate extant research against these perceived advantages. A key conclusion that will emerge from the review is that prior research is limited in scope and fails to deliver many of the suggested benefits. A critical theme informing the remainder of the workshop is that automated analysis is not a "quick fix" replacement for close manual reading by domain experts: most advanced applications of computational methods rely on significant manual reading for training and validation.
Mention preprocessing and signpost coverage to Session 7 (but this is something that needs to be done for all text applications)
12.30-13.30
13.30-15.00 Session 2 Text extraction: Methods and pitfalls
Automated text retrieval is the starting point for most large-sample applications of textual analysis in accounting and finance. This session will provide general guidelines on the text retrieval process, as well as hands-on experience with retrieving: 10-K annual report text (including harvesting documents from EDGAR) using Python and R scripts; U.K. annual report narratives published as PDF files using the CFIE's Java-based annual report tool; and U.K. earnings announcement narratives using the CFIE's Java-based PEA tool.
15.00-15.15
15.15-17.15 Session 3 Readability and tone: Methods and critique
Readability and tone (sentiment) are the two most commonly analysed features of financial market text. This session will review and critique methods used in the extant literature to measure readability and tone. We will demonstrate the problems of relying on standard readability metrics such as Fog to capture sophisticated narrative features such as complexity and understandability. We will also review the various approaches for measuring tone, ranging from simple wordlists to more advanced machine learning methods. A key conclusion that will emerge from this review is that simple measures of readability and tone provide limited scope for generating significant new insights in the literature.
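To make the limits of a metric such as Fog concrete, the following is a minimal Python sketch of a simplified Fog calculation. It is not taken from the workshop materials. The function names `count_syllables` and `fog_index` are hypothetical, syllables are estimated with a crude vowel-group heuristic, and the usual Fog exclusions (proper nouns, compound words, common suffixes) are ignored.

```python
import re


def count_syllables(word):
    """Estimate syllables by counting vowel groups (a crude heuristic)."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    # Treat a trailing silent "e" as non-syllabic, except in "-le" endings.
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def fog_index(text):
    """Simplified Fog: 0.4 * (words per sentence + % of complex words).

    A "complex" word here is any word with three or more estimated
    syllables; this simplification is an assumption of the sketch.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    n_complex = sum(1 for w in words if count_syllables(w) >= 3)
    avg_sentence_length = len(words) / len(sentences)
    pct_complex = 100.0 * n_complex / len(words)
    return 0.4 * (avg_sentence_length + pct_complex)


if __name__ == "__main__":
    plain = "The cat sat. The dog ran."
    technical = "Amortisation of intangible liabilities increased."
    print(fog_index(plain), fog_index(technical))
```

The technical sentence scores far higher simply because accounting vocabulary is polysyllabic, which may say little about whether a financially literate reader finds it hard to understand.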
18.00-19.30 Dinner & research presentation: Measuring Tone and Attribution
A buffet dinner followed by a discussion of ongoing research assessing the relative accuracy of wordlists and machine learning for measuring the tone of performance sentences and the presence of managerial self-attribution bias in earnings announcements.

Day 2: 13 September
09.00-10.30 Session 4 Constructing and using wordlists
Wordlists are the most common approach to analysing financial text in the accounting and finance literature. This session discusses the advantages and weaknesses of using a wordlist approach to study financial text, reviews the most common wordlists employed in the literature, and considers some of the methods used in conjunction with wordlists to improve their classification performance. The session will also explain the different approaches to constructing wordlists, together with the strengths and weaknesses of each approach.
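A wordlist approach to tone, combined with a simple negation rule of the kind used alongside wordlists, might be sketched in Python as follows. The word sets, the `tone` function, and the three-word negation window are illustrative assumptions rather than the lists or settings used in the session.

```python
import re

# Tiny hypothetical wordlists for illustration only; published
# wordlists are far larger and domain-specific.
POSITIVE = {"growth", "improved", "strong", "profit"}
NEGATIVE = {"loss", "decline", "weak", "impairment"}
NEGATORS = {"not", "no", "never"}


def tone(text, window=3):
    """Return net tone in [-1, 1]: (positive - negative) / (positive + negative).

    A positive word preceded by a negator within `window` tokens is
    counted as negative. Negated negative words are left as negative,
    a deliberately conservative choice in this sketch.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = neg = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            preceding = tokens[max(0, i - window):i]
            if any(t in NEGATORS for t in preceding):
                neg += 1
            else:
                pos += 1
        elif tok in NEGATIVE:
            neg += 1
    total = pos + neg
    return (pos - neg) / total if total else 0.0


if __name__ == "__main__":
    print(tone("Strong growth in profit"))
    print(tone("Profit did not show growth"))
```

The second example shows why context matters: without the negation rule, both sentences would be scored as wholly positive.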
10.30-11.00
11.00-12.30 Session 5 Introduction to machine learning
While machine learning forms the basis for a large proportion of research in the field of natural language processing, its uptake in accounting and finance is more limited. This session provides a broad introduction to the field of machine learning methods, including both supervised and unsupervised approaches. Different aspects of machine learning and their relation will be explained including classification, named entity recognition, summarization, and topic modelling.
12.30-13.30
13.30-15.00 Session 6 Machine learning applications: Classification
This session provides a hands-on introduction to classification using machine learning methods. Participants will use the Weka toolkit (https://www.cs.waikato.ac.nz/~ml/weka/downloading.html) to construct and evaluate a model for identifying fraudulent financial reporting using 10-K filings. Results and insights from the analysis will be used to highlight weaknesses in the extant literature and identify opportunities for future research.
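The session itself uses Weka. As a language-neutral illustration of what a supervised text classifier does, the sketch below implements a small multinomial Naive Bayes model with add-one smoothing in plain Python. It is a swapped-in technique, not the workshop's model. The class name `NaiveBayes`, the labels, and the four training sentences are invented for illustration and are not drawn from 10-K filings.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; a real application needs many labelled filings.
TRAIN = [
    ("restatement of revenue following investigation", "flagged"),
    ("irregularities identified in revenue recognition", "flagged"),
    ("steady growth in revenue and dividends", "clean"),
    ("dividends increased with strong cash flow", "clean"),
]

if __name__ == "__main__":
    model = NaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
    print(model.predict("investigation into irregularities"))
```

With so few examples the model is only memorising vocabulary. Evaluating on held-out documents, as the session does, is what reveals whether a classifier generalises.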
15.00-15.15
15.15-17.15 Session 7 Machine learning applications: Topic modelling
Several recent papers in the accounting literature have employed topic modelling methods such as Latent Dirichlet Allocation (LDA) to identify topics in financial text (e.g., Dyer et al. 2017). This session provides a hands-on introduction to topic modelling. Participants will use MALLET (http://mallet.cs.umass.edu/index.php) to extract topics from an annual report corpus. In addition to walking participants through the practicalities of the modelling process, the session will highlight the many problems associated with topic modelling and discuss alternative approaches to the content analysis problem.
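For readers who want to see the mechanics behind LDA, the following is a toy collapsed Gibbs sampler in plain Python. It is a teaching sketch and not MALLET's implementation. The function names `lda_gibbs` and `top_words`, the hyperparameter values, and the example documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (doc_topic, topic_word):
    per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, k, n=3):
    """Most frequent words currently assigned to topic k."""
    return [w for w, _ in topic_word[k].most_common(n)]


if __name__ == "__main__":
    docs = [
        "pension deficit pension liability".split(),
        "brexit uncertainty brexit risk".split(),
        "pension liability deficit".split(),
        "brexit risk uncertainty".split(),
    ]
    doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
    for k in range(2):
        print(k, top_words(topic_word, k))
```

Results depend on the random seed, the number of topics, and the hyperparameters, which is one source of the practical problems the session discusses.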
18.00-19.30 Dinner & research presentation: Characteristics of Award Winning Annual Reports
A buffet dinner followed by a discussion of ongoing research that employs corpus methods to isolate the distinguishing features of high quality annual reports as proxied by reports shortlisted for a narrative reporting award in the U.K.

Day 3: 14 September
09.00-10.30 Session 8 Introduction to corpus linguistics
This session provides an introduction to the theory and core methods underpinning the systematic analysis of a large body of text (i.e., a corpus). The session will cover the following themes: introduction to basic corpus linguistic concepts; presentation of different corpora types and examples; methodology for corpus design, compilation, and processing; corpus annotation, and examples of annotated data; basic resources and corpus analysis tools; examples from the literature of using corpus methods to analyse financial discourse.
10.30-11.00
11.00-13.00 Session 9 Applied corpus methods: Tools and techniques
This session provides hands-on experience of corpus analysis. The session will consist of two parts. Part 1 will introduce the corpus that will form the basis of our analysis (Brexit narratives in annual reports of UK financial firms), along with the AntConc software (Anthony 2014) for corpus analysis. In Part 2, participants will use the AntConc concordancer to analyse a small dataset and perform corpus tasks including: extracting word lists; finding collocates; and searching for n-grams and keywords. The session will conclude with a discussion of the insights gained from analysing the corpus.
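The corpus tasks listed above (word lists, collocates, n-grams, keywords) can also be approximated in a few lines of Python. The sketch below is not AntConc and makes no claim to reproduce its output. The function names are hypothetical, the sample sentence is invented, and keyness is scored with the standard two-corpus log-likelihood statistic as one common choice.

```python
import math
from collections import Counter


def word_list(tokens):
    """Frequency-ranked word list."""
    return Counter(tokens).most_common()


def ngrams(tokens, n):
    """Counts of contiguous n-token sequences."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def collocates(tokens, node, span=2):
    """Counts of words occurring within `span` tokens of each `node` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Two-corpus log-likelihood keyness score for one word."""
    combined = freq_target + freq_ref
    expected_target = size_target * combined / (size_target + size_ref)
    expected_ref = size_ref * combined / (size_target + size_ref)
    ll = 0.0
    if freq_target:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


if __name__ == "__main__":
    tokens = "brexit risk and brexit uncertainty affect lending".split()
    print(word_list(tokens)[:3])
    print(ngrams(tokens, 2).most_common(3))
    print(collocates(tokens, "brexit", span=1))
    # Hypothetical counts: a word seen 30 times in a 1,000-word target
    # corpus and 5 times in a 1,000-word reference corpus.
    print(log_likelihood(30, 5, 1000, 1000))
```

A word with the same relative frequency in both corpora scores zero, so high scores flag vocabulary that is distinctive of the target corpus.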
13.00-14.00 and workshop ends
14.00-15.30 Optional surgery session for PhD students seeking feedback on research proposals and ongoing work involving analysis of text
Year(s) Of Engagement Activity 2018