Analysing Narrative Aspects of UK Preliminary Earnings Announcements and Annual Reports: Tools and Insights for Researchers and Regulators

Lead Research Organisation: Lancaster University
Department Name: Accounting & Finance

Abstract

The quality of information provided to investors by corporate management in publicly traded companies is a matter of central importance to financial market participants. Narrative commentaries represent an increasingly significant component of financial communications. While financial narratives in the UK are shaped in part by prevailing regulations, senior management enjoys significant discretion over the content, structure and presentation of these disclosures. The informativeness of financial narrative disclosures and the way management apply their reporting discretion are key questions for academics and policymakers.

Partnering with the UK body responsible for promoting high quality corporate governance and financial reporting - the Financial Reporting Council (FRC) - this interdisciplinary project will combine expertise from accounting with state-of-the-art methods from computational linguistics to examine two key elements of financial disclosure. The first aspect is preliminary earnings announcements (PEAs), which arguably represent the most important disclosure in UK firms' annual reporting calendar. The second aspect is the annual report to shareholders, which forms the largest single recurring disclosure commitment for management.

Two opposing perspectives exist on corporate narrative disclosures. On the one hand, proponents argue that narratives provide information beyond that contained in financial data. On the other hand, opponents claim that management exploit the discretion embedded in narrative reporting to obfuscate or present a biased representation of actual performance. While extant work on UK annual report and PEA narrative disclosures provides evidence consistent with both perspectives, both the scope of the research and the generalizability of findings are compromised because conclusions rely on manual coding methods applied to small samples.

This project will develop and use state-of-the-art computerized textual analysis methods to study the properties and usefulness of financial narratives for a comprehensive sample of UK disclosures published between 2003 and 2016. While researchers are already using these methods to study disclosures made by US companies, problems accessing digital PEAs and annual reports, coupled with inconsistent document structure, have hindered computerized analysis of UK financial narratives and skewed research agendas away from studying UK reporting outcomes. This project will shine much-needed light on two key aspects of UK narrative reporting. The work will provide the first large sample analysis of PEA narratives.

The project will also examine a set of contemporary policy-relevant themes relating to the content and structure of UK annual reports. Software tools and datasets from the project will also create new opportunities for the research community.

Policymakers are facing pressure to adopt evidence-based approaches to regulation. While the FRC is committed to conducting impact and evaluation analyses, it is reliant on a relatively small team of research staff to undertake such work, much of which involves manual collection and analysis of unstructured data. The labour-intensive nature of the work inevitably yields results that are hard to generalize and constrains the scope of the FRC's work. As well as examining novel and policy-relevant research questions, this project will embed computerized text analytics methods in the FRC's formal policymaking processes. The methods will complement existing approaches by facilitating lower cost and more comprehensive assessments of regulatory changes and emerging issues in narrative reporting.

Planned Impact

Who will benefit from the work?
The project will deliver economic and societal benefits as well as contributing to academic research.

The work involves co-funded and co-produced research with the UK financial reporting regulator, the Financial Reporting Council (FRC). The work seeks to enhance policymaking in corporate governance and financial reporting by: reviewing a key unregulated aspect of corporate reporting in the form of preliminary earnings announcements (PEAs) to determine the need or otherwise for regulatory guidance; evaluating the impact of recent developments in annual report narratives; and embedding large-sample textual analysis methods in the FRC's policymaking toolkit.

Other bodies with links to financial reporting are also expected to benefit from project outputs, including the UK Investor Relations Society (UK IRS), the Institute of Chartered Accountants in England and Wales (ICAEW), the European Financial Reporting Advisory Group (EFRAG), and the International Integrated Reporting Council (IIRC).

The academic community will also benefit from the project. Large-sample empirical research on corporate narratives is skewed heavily toward the US due in part to the ease with which financial narratives can be accessed and processed automatically in that market. This project will create new resources, insights, and agendas for researchers generally and UK researchers in particular.

What form will the benefits take?
The research will enhance policymaking through two ex ante impact assessments of prevailing financial reporting practice. First, we will undertake the first systematic analysis of the properties and economic impact of PEA commentaries as a basis for evaluating the need or otherwise for the FRC to issue regulatory guidance. (PEAs are largely unregulated in the UK, creating variation in practice and scope for both informative reporting and obfuscation.) Second, we will provide large-sample evidence on emerging trends in unregulated aspects of annual report narratives as a basis for identifying both best practice and areas where regulatory guidance may be required. We also expect these findings to be of interest to other bodies involved in financial reporting including UK IRS, ICAEW, EFRAG and IIRC.

The project will also contribute to FRC policymaking activities by providing comprehensive post-implementation reviews of recent developments in annual reporting. (The FRC is currently restricted to conducting small sample manual post-implementation reviews that are costly to produce and hard to generalise.)

Coincident with this instrumental impact, the project will also deliver capacity-building impact to policymaker and academic communities. For the policymaker community, the work will embed large sample textual analysis and big data methods in the FRC's policy toolkit, empowering it to conduct comprehensive, timely, and low cost analyses of UK firms' narrative reporting practices as part of its surveillance and post-implementation review activities (where only small sample manual work is currently possible). Training and documentation to support software and methods will enable FRC colleagues to harness the potential of these resources and ensure significant legacy benefits. Datasets of financial narratives will also enhance contemporaneous and future evidence-based policymaking activities.

For the academic community, the project will build sustainable UK-focused research capacity by: developing software resources that facilitate automatic retrieval and analysis of corporate financial narratives; providing new training opportunities in textual analysis for researchers; generating datasets summarizing the properties of narrative commentaries; and stimulating UK-focused research agendas in hitherto unexplored areas such as document structure, content integration, and data presentation.

Publications

 
Description A large fraction of information published by companies to inform stakeholders about economic and social impact takes the form of qualitative (narrative) information. Researchers (and to a large degree financial market participants) have overlooked such disclosures because they are hard to process other than via intensive manual reading. The volume of narrative information that large companies now disclose makes sole reliance on manual analysis unfeasible.

This project aims to provide insights into the properties and economic consequences of companies' qualitative disclosures using computerized methods to read and process the information. A growing body of work has begun to explore research questions using qualitative disclosures made by companies listed on US stock markets. Access to data and ease of automated processing make the US a natural venue to undertake such research.

Comparable research in other (non-US) markets is largely non-existent. The lack of evidence for markets such as the UK is important because reporting structures and regulations differ substantially from those prevailing in the US, meaning that it is an open question whether extant insights are transferrable to a UK setting. Further, the nature of corporate reporting in the UK provides opportunities to examine novel questions and extend our broad understanding of qualitative financial reporting.

Significant technical barriers prevent researchers from conducting work on UK financial narratives: corporate disclosures are hard to collect on a large sample basis and the file type (often PDF) limits the scope for harvesting and processing text in a structured manner. The same barriers prevent regulators and industry professionals from analysing such data on a large scale.

Our project develops resources to facilitate analysis of UK corporate narrative disclosures on a large scale. We develop new software resources and datasets to support academic research in the area. We also work closely with policymakers and financial market participants to assist with the analysis of qualitative disclosures and provide relevant insights on the properties and usefulness of financial narratives. Non-academic users with whom we have worked during the project include the Financial Reporting Council (UK financial reporting regulator), the Financial Conduct Authority (UK financial market regulator), the Investor Relations Society (industry body representing IR professionals and promoting high quality corporate communication), RPMI Railpen (pension fund manager), and INQUIRE UK (body representing quantitative investment analysts working in the UK).

The primary findings and contributions (F&C) of our research are summarised as follows:
F&C1. We analyse the narrative component of UK firms' preliminary earnings announcements (PEAs) using manual scoring methods and confirm that disclosures are characterised by the presence of significant self-attribution bias: management are more likely to take credit for positive results while attributing poor performance to external factors beyond their control.

F&C2. We test whether these insights resulting from careful manual analysis are reproducible using computerized scoring methods. We compare a range of automated scoring techniques from simple wordlists used in prior research to more sophisticated machine learning classifiers. We find that exclusive reliance on automated scoring methods cannot replicate our manual analysis. While automated methods do a reasonable job of detecting performance tone and identifying the type of attribution (internal versus external), none of the approaches we use are able to detect the presence of an attribution. Our result highlights both the opportunities and limitations associated with automated text scoring. We provide the first evidence of which we are aware that illustrates the scale of the measurement error problem resulting from the application of automated scoring methods. We conclude that in certain situations, manual content analysis remains an important research tool despite the growth in large sample automated methods. We view manual and computerised methods as complements rather than substitutes, and we propose a new strategy for analysis of qualitative disclosures that combines elements of manual scoring (to ensure precision) and automated scoring (to reduce research costs).
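
To make the simplest of the automated approaches concrete, the following is a minimal sketch of wordlist-based tone scoring. The word lists, the function name `tone_score`, and the scoring formula are illustrative assumptions, not the lists or code used in the project.

```python
import re

# Hypothetical mini word lists, invented for illustration only.
POSITIVE = {"growth", "strong", "improved", "record", "increase"}
NEGATIVE = {"decline", "weak", "loss", "fell", "challenging"}


def tone_score(sentence):
    """Return (pos - neg) / (pos + neg), or 0.0 if no list word appears."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


# Two positive words and one negative word give a net tone of 1/3.
print(tone_score("Revenue growth was strong despite a challenging market."))

# The score reflects tone only: it is -1.0 here, but nothing in the word
# counts records that the sentence contains an attribution ("because of").
print(tone_score("Sales fell because of extreme weather."))
```

A score of this kind captures performance tone but carries no information about whether a sentence explains the cause of performance, which is consistent with the limitation reported above.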

F&C3. We compare the content of narrative disclosures in firms' PEAs with disclosures made in the corresponding annual report (typically published 1-3 months later). Comparing content in these two disclosures is important because: (a) PEAs typically contain new (price sensitive) information whereas annual reports are viewed as providing a comprehensive review of information reported during the year via more timely sources; (b) surprisingly, disclosure regulations focus largely on the annual report, with the content of PEAs being largely unregulated. This situation creates an opportunity for management to report strategically by emphasising good news in the PEA (to increase share price) and deferring less positive interpretations of performance to the annual report in the hope it is overlooked. This is a research question that interests academics and policymakers alike. We find evidence suggesting that PEA narratives are more positive than their annual report counterparts despite discussing the same underlying economic performance. We also find some evidence of differences in content between the two narratives. Our findings provide the first evidence that management may use their reporting discretion to create a more favourable impression of performance in the PEA because this communication channel is more likely to influence share price.

F&C4. We also study the interplay between the tone and informativeness of narrative commentary in corporate press releases and the financial media. We find that the financial media serves an important external monitoring role by interpreting the tone of commentary in corporate announcements, which in turn moderates the impact of such commentary on market participants. We find that financial journalists moderate the tone of management commentary for both positive and negative news. The effect is much more pronounced however for positively toned corporate news, which is consistent with the view that management has incentives to overstate performance and recognising this fact, sceptical financial journalists discount excessively positive tone accordingly. Further, we find that journalist-adjusted tone provides incremental information to investors above and beyond the disclosures provided by management. Financial journalists, our results suggest, provide a useful filtering role on management optimism.

F&C5. We provide evidence on the properties of high quality annual reporting. Prior research uses a set of simple linguistic features such as readability and tone to measure the quality of annual report narratives. These measures have been criticised in the literature on several grounds. First, because they do not capture meaning in narrative disclosures, these measures are unlikely to provide deep insights into reporting quality. Second, the metrics do not align with practitioners' and policymakers' views on the factors that determine high quality annual reporting. We use a combination of machine learning methods and techniques from corpus linguistics to compare the properties of UK annual reports that win awards for quality with a matched sample of non-winning reports. We use machine learning techniques to identify core themes in the reports, while corpus linguistics methods are used to refine these themes to better reflect meaning in a financial reporting context. We then compare winning and non-winning reports to identify themes that distinguish the two groups statistically. We group themes into two broad categories: content and writing style. Content relates to the topics that management discuss such as strategy, business model, growth, etc. Writing style relates to the way information is presented and its effect on cognitive processing, and includes dimensions such as the amount of cross-referencing, use of grammar, use of negation, etc. Results reveal that award-winning reports are associated with more commentary on strategy and a higher incidence of writing styles that prior research links to better cognitive processing. Both our content and writing style metrics align with notions of reporting quality emphasised by accounting practitioners and policymakers. Tests reveal that our suite of content and writing style features are better than traditional metrics such as readability and tone at distinguishing award-winning reports from non-award-winning reports. 
Out-of-sample tests indicate that our suite of features can predict future award winners with a high degree of accuracy (> 75%). Ours is the first study of which we are aware to examine the linguistic properties of award-winning annual reports and to develop a model for measuring the quality of annual report discourse.

F&C6. We study the properties and impact of annual report commentary on strategy and value creation. Financial reporting regulators are placing increasing emphasis on the need for management to do a better job of articulating their approach to creating and maintaining value for their external stakeholders. The UK is leading the way in this area with regulations introduced in 2010 and 2013 that require large companies to disclose information on strategy and business model in their annual report. Nevertheless, it remains an open question just how diligently management have responded to these reporting requirements and whether external stakeholders find the new disclosures useful. We use a range of methods from computational linguistics to study strategy-related disclosures. We find that, as expected, the number of companies discussing strategy in their annual report increased in response to requirements introduced in 2010 and 2014. We also show that this increase in disclosure improved the quality of companies' information environment by reducing the level of information asymmetry with stock market investors. We also find that requiring management to explain their value creation strategy reduces their focus on short-term earnings performance and increases discussion of non-earnings-based performance measures that correlate more closely with long-term value creation (e.g., customer satisfaction, R&D investment, operating efficiency, employee well-being, etc.). Our analysis supports the view that requiring management to articulate their long-term strategy can help reduce the focus on short-term results. On the other hand, further (ongoing) analysis suggests that the quality of mandated strategy-related discourse varies substantially across companies, with a material fraction of entities adopting a compliance approach to disclosure that involves providing bland, generic statements rather than detailed and rigorous analysis.

F&C7. Working in partnership with the Financial Reporting Council (FRC) and the Financial Conduct Authority (FCA), we are developing statistical models to predict the risk of financial misreporting. The costs of accounting scandals and unexpected corporate failures to investors and society more broadly are well known (e.g., Carillion). Financial market regulators monitor reporting quality proactively in an effort to identify poor reporting practice and promote the liquidity and efficiency of financial markets. The FRC and FCA rely on manual scrutiny of reports to identify suspicious cases, along with referrals from stakeholders such as whistle-blowers and the media. The high cost of manual scrutiny means that monitoring activities are partial and limited to high-risk areas. The inevitable consequence of this approach is that some accounting scandals go undetected until it is too late to intervene and limit losses. Using proprietary data from the FCA and FRC, we have developed a prototype statistical model to predict the probability of misreporting based on the content of published annual reports. We use a combination of quantitative and qualitative data in our model. Our quantitative features comprise common financial reporting ratios that prior research shows can predict financial fraud and failure. Our qualitative features are derived from machine learning algorithms that detect the distinctive properties of financial discourse in the presence of misreporting. Results reveal that our model can predict financial misreporting out-of-sample and that the majority of predictive power comes from the qualitative features. We are currently working with the FCA and FRC to implement the model as an automated screening tool to improve monitoring efficiency by directing scarce manual resources to reports where the risk of financial wrongdoing is highest.

F&C8. We provide evidence on the link between the quality of UK annual report narrative disclosures and companies' cost of equity capital. Prior research predicts (and finds) a negative linear relation between disclosure quality and cost of capital: higher quality disclosures reduce information asymmetry and undiversifiable risk, which in turn feeds through to a lower cost of capital. Contrary to prior research, we hypothesize and test for a U-shaped relation between the cost of equity capital and the level of disclosure in annual report narratives. Our measure of annual report disclosure quality is a disclosure index constructed using a computerized method based on word counts. Consistent with our prediction, we find that the cost of equity capital is negatively associated with annual report disclosure at low levels of disclosure, while at high disclosure levels cost of capital increases with the provision of even more narrative commentary (consistent with uninformative clutter increasing processing costs and potentially reflecting management obfuscation). Additional analyses reveal that regulatory corporate reporting initiatives such as the UK Corporate Governance Code (2010) help to move disclosure levels toward the optimal level. Our analysis helps shed new light on the role of annual report disclosures (and the regulation thereof) in shaping companies' information environment.
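
One common way to test for a U-shaped relation of this kind is a quadratic specification. The form below is an illustrative assumption rather than the model estimated in the project: the symbols CoC (cost of equity capital), DISC (the disclosure index), the coefficients, and the control vector X are hypothetical names introduced here.

```latex
\mathit{CoC}_{i,t} = \beta_0 + \beta_1\,\mathit{DISC}_{i,t} + \beta_2\,\mathit{DISC}_{i,t}^{2}
  + \gamma^{\top} X_{i,t} + \varepsilon_{i,t},
\qquad \beta_1 < 0,\quad \beta_2 > 0,
\qquad \mathit{DISC}^{*} = -\frac{\beta_1}{2\beta_2}
```

Under these sign restrictions, cost of capital falls with disclosure below the turning point DISC* and rises above it, which is the pattern described in the finding.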

F&C9. A general insight from our work is that accounting and finance research lags well behind best practice methods in computational linguistics. Large sample textual analysis is a relatively recent phenomenon and this early stage work is characterised by a reliance on naïve approaches to quantifying narrative content. Accounting and finance researchers appear slow to adopt cutting-edge methods from natural language processing and corpus linguistics, with the result that extant conclusions may be overstated. We call for researchers in accounting and finance to abandon their reliance on naïve strategies for measuring the content of financial narratives and look to research in computing and linguistics for guidance on ways to improve the robustness and rigour of their work.

F&C10. A lack of data and resources hinders research on UK corporate narratives. Narrative commentary in annual reports and corporate press releases is hard to extract in a structured manner, and the techniques for processing retrieved text require computing skills that most accounting researchers do not possess. We provide a suite of software tools, datasets and other resources to support academic and non-commercial research on UK financial narratives.

The main outputs (O) of the project consist of:
O1. A set of academic publications and working papers in preparation for submission to academic peer-reviewed journals. To date, the project has generated: six publications in ABS 3* journals (relating to F&C4, F&C9 and F&C10); one revise and resubmit at an ABS 4* journal (relating to F&C6); and four working papers scheduled for submission over the next 12 months (relating to F&C1, F&C2 and F&C5);

O2. Software, datasets, and methods to support research on UK corporate narratives (F&C10). Resources include: a software app to facilitate structured extraction of text from UK annual reports published as PDF files; a software app to facilitate extraction of management commentary from PEAs; annual report corpora for various key components including chair's letter to shareholders, management commentary, corporate governance statements, remuneration reports, commentary on environmental, social and governance factors, and risk reports; a dataset of disclosure scores for over 25,000 UK annual reports published between 2003 and 2018; a dataset of disclosure scores for over 10,000 PEAs released between 2007 and 2018; resources to clean raw text to support further analysis (e.g., removing redundant punctuation and special characters, identifying sentences, stemming, removing infrequent words, dealing with inconsistent use of hyphens, Named Entity Recognition); and machine learning resources to measure tone and attribution type.
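
The text-cleaning steps listed in O2 can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the regular expressions, and the frequency threshold are assumptions for the example and are not the project's released tools.

```python
import re
from collections import Counter


def clean_text(raw):
    """Normalise hyphens, strip special characters, collapse whitespace."""
    # Map Unicode hyphen and en-dash variants to a plain ASCII hyphen.
    text = raw.replace("\u2010", "-").replace("\u2013", "-")
    # Rejoin words split across a line break ("in-\ncreased" -> "increased").
    # Caveat: this also fuses genuine compounds split at a line end.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Replace anything that is not a word character, whitespace, or
    # common punctuation/currency symbol with a space.
    text = re.sub(r"[^\w\s.,;:!?'%£$()-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace.

    Abbreviations such as "Ltd." will be split incorrectly.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def drop_infrequent(tokens_per_doc, min_count=2):
    """Remove tokens that occur fewer than min_count times across all docs."""
    counts = Counter(t for doc in tokens_per_doc for t in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in tokens_per_doc]


raw = "Revenue in-\ncreased by 5% \u2022 profit  rose.\nCosts fell!"
print(split_sentences(clean_text(raw)))
```

Stemming and Named Entity Recognition are omitted here because they normally rely on third-party libraries rather than the standard library.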

O3. Improvements in UK research capacity through training. We have delivered two hands-on training workshops on textual analysis research in accounting and finance (70 attendees). Working in partnership with INQUIRE UK, we are scheduled to deliver a five-session workshop programme in textual analysis for up to 30 professional quantitative research analysts (April 2020). Finally, working in partnership with Universidade Católica Portuguesa (Porto), we are scheduled to deliver an introductory workshop on Python programming for up to 30 postgraduate research students and early career researchers (July 2020).

O4. Contributions to financial reporting policy. Our work with colleagues at the FRC has informed policy and practice in several areas including earnings announcements (F&C3), reporting of alternative performance measures, and strategy-related management commentary (F&C6). We will continue working with the FRC over the next 12 months to embed our datasets and textual analysis resources within their decision-making processes, with a particular focus on supporting horizon scanning activities and post-implementation reviews.

O5. A prototype model of financial misreporting to support market-monitoring activities by the FCA and FRC (F&C7). Preliminary results demonstrate that the model has the ability to predict misreporting cases out-of-sample up to two years before the start of the violation period. We will continue working closely with the FCA and FRC to embed the model within their normal market monitoring processes.

O6. A model of annual reporting quality (F&C5) that the Investor Relations (IR) Society are using to inform their annual awards process. The IR Society use the annual awards process to promote best practice in annual reporting and improve the overall quality of corporate communication to investors and other stakeholders. Our statistical model is enabling the IR Society to scan a larger number of reports proactively to find evidence of best practice.

O7. Our research on methods for extracting text from UK annual reports (F&C10) has informed internal development by Fidelity International of tools for analysing financial narratives in their global portfolios. Fidelity is using these resources to support its financial analysis and portfolio allocation decisions. Our contribution to Fidelity's work is acknowledged in a letter of support that forms part of an impact case for the forthcoming Research Excellence Framework.
Exploitation Route We see several avenues through which the results and outcomes of this project will evolve and impact user communities:
1. We will continue to maintain and update our annual report and earnings announcement databases to support academic research on UK financial narratives;

2. We are continuing to work with the FRC to embed textual analysis methods within their policymaking activities;

3. We are continuing to work with the FCA and FRC to refine and implement our model for predicting financial misreporting, with the aim of improving the efficiency of their market monitoring activities;

4. We will continue working closely with the IR Society to identify annual reports that represent exemplars of best practice;

5. We are working with Perfect Information (PI) to develop and extend our software tools for structured extraction of text from annual reports and other corporate documents. PI are keen to leverage their industry-leading data resources to provide users with the opportunity to query documents and retrieve tailored information requests. We have signed a contract with PI to share our source code on a short-term basis to help their digital team understand our methods and assess the scope for integrating and extending our methods. PI have signalled an interest in working with a consortium of universities headed by Lancaster. Discussions are ongoing and the probability of undertaking collaborative work in future seems high;

6. We are working with RPMI Railpen to develop methods for forecasting future returns using financial narratives. We have organised a nine-month internship for one of our project research assistants (Ferdinand Bratek), ending September 2020. We will be working closely with Ferdinand and the team at RPMI to develop new text-based models to support financial analysis and asset allocation decisions;

7. We are in discussions with the CIPD to build a people/workforce reporting framework for corporate reporting that unites various areas of current thinking including the PLSA's (2017) Hidden Talent framework, FRC Lab's (2020) report on Workforce Reporting, the CIPD's People Risk reporting framework, and the CIPD Good Work/UK Working Lives Job quality measures. We have a meeting scheduled for mid-March to discuss collaboration opportunities and next steps.
Sectors Digital/Communication/Information Technologies (including Software), Financial Services, and Management Consultancy

URL http://ucrel.lancs.ac.uk/cfie/
 
Description Software, datasets, and methods developed and disseminated to the academic research community are supporting new research studies on the properties and economic impact of financial narratives. Our work with colleagues at the FRC has informed policy and practice in several areas including: evidence on the properties of narrative commentary in earnings announcements that has been cited in committee meetings to evaluate the need for regulation; an analysis of reporting trends for alternative performance measures (APMs) that signalled a general improvement in reporting but also highlighted systematic weaknesses where further guidance is required; company-specific evidence on compliance with reporting guidelines for APMs that has been used by the FRC to target companies that persistently fail to follow reporting guidelines; and evidence on the properties and impact of strategy-related management commentary that has informed thinking at the FRC Lab. We have developed a prototype model of financial misreporting to support market-monitoring activities by the FCA and FRC. Preliminary results demonstrate that the model has the ability to predict misreporting cases out-of-sample up to two years before the start of the violation period. We are working closely with the FCA and FRC to embed the model within their normal market monitoring processes. We have also developed a model of annual reporting quality that the Investor Relations (IR) Society are using to inform their annual awards process. The IR Society use the annual awards process to promote best practice in annual reporting and improve the overall quality of corporate communication to investors and other stakeholders. Our statistical model is enabling the IR Society to scan a larger number of reports proactively to find evidence of best practice.
Our research on methods for extracting text from UK annual reports has informed development by Fidelity International of in-house tools for analysing financial narratives in their global portfolios. Fidelity is using these resources to support its financial analysis and portfolio allocation decisions. Our contribution to Fidelity's work is acknowledged in a letter of support that forms part of an impact case for the forthcoming Research Excellence Framework.
First Year Of Impact 2018
Sector Financial Services, and Management Consultancy,Other
Impact Types Economic,Policy & public services

 
Description Enhancing the quality of disclosures on alternative performance measures
Geographic Reach National 
Policy Influence Type Citation in systematic reviews
Impact Our evidence was commissioned by the Financial Reporting Council (FRC) to inform practice on reporting alternative performance measures (APMs) and provide input on whether further regulatory guidance in the area was required. Our analysis indicates that although the transparency with which APMs are reported has increased following guidance issued by the FRC in 2017, problem areas still remain. Our analysis also highlighted specific companies with particularly opaque reporting practices that the FRC compliance team subsequently contacted directly to request improvements in reporting quality.
 
Description Commissioned research project
Amount £31,200 (GBP)
Organisation Financial Conduct Authority (FCA) 
Sector Public
Country United Kingdom
Start 04/2018 
End 08/2018
 
Title Python module for machine learning classification of performance sentences in earnings announcements 
Description Use of machine learning classifiers to quantify content in financial discourse is still in its infancy in the mainstream accounting and finance literature. The techniques are not well understood and researchers face high set-up costs due to the technical knowledge and programming skills required for implementing machine learning classifiers. Extant research is limited to using Naïve Bayes classifiers and the small set of studies that apply these techniques do not make their code and training data available publicly. This lack of transparency hampers progress and reduces replicability. We use a suite of supervised machine learning algorithms including Naïve Bayes, random forest, support vector machines, and a neural network to measure linguistic properties of management commentary provided in firms' preliminary earnings announcements. We measure the following three features of management performance-related commentary: tone (positive versus negative), the presence of at least one attribution by management explaining the reasons for reported performance, and the type of the attribution (i.e., relating to internal factors such as strategy, cost cutting, product quality etc. versus external factors such as macroeconomic conditions, consumer behaviour, extreme weather, etc.). Our training sample comprises a large set of manually annotated sentences from firms' earnings announcements. We develop Python code to train our classifiers and also to apply the resulting models out-of-sample. We provide the Python code along with our training dataset to help researchers classify tone, attribution and attribution type on other datasets, and to refine our classifiers by adding additional features. The Python code and annotated dataset are provided with step-by-step guidelines to help researchers implement and tweak our machine learning classifiers.
As far as we are aware, this is the first resource of this type to be developed and disseminated to accounting and finance researchers.
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact No data available as yet 
URL https://github.com/apmoore1/pea_classification
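
To make the classification task described above concrete, the following is a minimal sketch of a multinomial Naïve Bayes tone classifier written with the Python standard library only. It is not the code from the repository listed above: the function names (tokenize, train_naive_bayes, predict) and the four training sentences are hypothetical and invented for illustration, and a real application would rely on the much larger manually annotated training set.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lower-case a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def train_naive_bayes(labelled_sentences):
    """Count labels and per-label word frequencies from (sentence, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for sentence, label in labelled_sentences:
        label_counts[label] += 1
        for token in tokenize(sentence):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return label_counts, word_counts, vocabulary


def predict(model, sentence):
    """Return the label with the highest posterior log-probability."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior for the label.
        score = math.log(label_counts[label] / total)
        # Denominator for add-one (Laplace) smoothing.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokenize(sentence):
            if token in vocabulary:  # ignore words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for illustration only.
TRAINING = [
    ("Revenue grew strongly and margins improved", "positive"),
    ("Record profit driven by strong demand", "positive"),
    ("Profit fell sharply amid weak demand", "negative"),
    ("Revenue declined and losses widened", "negative"),
]

model = train_naive_bayes(TRAINING)
print(predict(model, "Strong growth in profit"))
print(predict(model, "Losses widened as demand fell"))
```

The same pattern would extend to the attribution and attribution-type labels by swapping the label set, although the other algorithms named above (random forest, support vector machines, neural network) would normally come from a machine learning library rather than being written by hand.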
 
Title Annual Reports Key Sections Corpora 2003 to 2017 
Description UK Annual Reports Key Sections. Plain text content extracted from an initial sample of 31,464 annual reports published between January 2002 and December 2017 by firms listed on the London Stock Exchange (LSE). Annual reports provided as PDF files are processed using the CFIE-FRSE tool downloadable from https://github.com/drelhaj/CFIE-FRSE and described in the companion paper available at http://ssrn.com/abstract=2803275. The tool processed 26,284 reports from the initial sample (83.5%). The final sample includes reports published by financial and non-financial firms listed on either the LSE Main Market or the Alternative Investment Market (AIM). The document table of contents (TOC) forms the basis of extraction for 15,883 reports (approximately 60%); pre-existing document bookmarks are used to process the remaining 10,401 reports. The CFIE-FRSE tool partitions annual reports into the "front-end" narratives component and the "back-end" financials component (including the auditor's report, mandatory financial statements and associated footnotes, and miscellaneous disclosures). We further partition the narratives component into a set of commonly occurring annual report sections that feature prominently in prior research. These narrative subsections (together with the auditor's report) are numbered 1-12 and described in more detail in the following table. Text extracts are provided by report calendar year in separate files of one million words for each core section 1-12. All extracted content is provided for the pooled set of reports processed using TOC (N = 15,883) to ensure classification consistency across reports.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact None to date 
 
Title UK annual report narratives dataset: CFIE-FRSE May 2019 
Description This file contains a dataset of summary textual features for a large sample of UK annual report narratives published over the period 2002-2017, and extracted and processed using the CFIE-FRSE app described in El-Haj et al. (2019), "Retrieving, Classifying and Analysing Narrative Commentary in Unstructured (Glossy) Annual Reports Published as PDF Files", Accounting and Business Research (DOI: 10.1080/00014788.2019.1609346). Data are provided in three file formats: csv, SAS and Stata. Details of the sampling procedure, variable definitions and method for matching to Thomson Reuters Datastream are also provided.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact Data being used widely within the academic community 
URL http://ucrel.lancs.ac.uk/cfie/
 
Description Detecting and Disrupting Misleading Statements 
Organisation Financial Conduct Authority (FCA)
Country United Kingdom 
Sector Public 
PI Contribution Confidential
Collaborator Contribution Confidential
Impact None to date
Start Year 2017
 
Description Narrative Reporting in Preliminary Earnings Announcements 
Organisation Financial Reporting Council
Country United Kingdom 
Sector Public 
PI Contribution The research team is undertaking empirical analysis of a large sample of UK preliminary earnings announcements (PEAs) over the period 2008-2017. The work involves developing algorithms to extract and analyse narrative commentary from PEAs, and developing software tools for use by colleagues at the Financial Reporting Council (FRC).
Collaborator Contribution The FRC is keen to provide evidence on a range of issues related to reporting financial performance in general, and performance reporting in PEAs specifically. FRC colleagues are working with the research team to identify research questions and develop research designs to address questions of interest. FRC colleagues also provide feedback on early stage results and help to disseminate research findings within the financial reporting community (through conferences and references in FRC publications).
Impact None to date
Start Year 2017
 
Description Natural language processing (NLP) workshop delivered in conjunction with INQUIRE UK (http://www.inquire.org.uk/). 
Organisation INQUIRE
Sector Charity/Non Profit 
PI Contribution Bringing an understanding of NLP methods to investment practitioners interested in using textual analysis as part of their investment strategy. The programme is designed and delivered by Young (Lancaster University) in London at Lancaster University premises (The Work Foundation)
Collaborator Contribution The programme is designed to introduce participants to the key steps of textual analysis, from extraction and preprocessing through to common machine learning methods and their applications. INQUIRE UK are responsible for marketing, registration and all other administrative activities, and provide a travel and subsistence budget for Young
Impact Ongoing
Start Year 2020
 
Description Python training workshop 
Organisation Catholic University of Portugal
Department Catolica Oporto Business School
Country Portugal 
Sector Academic/University 
PI Contribution 3-day introductory workshop on Python programming delivered at Universidade Católica Portuguesa in partnership with Lancaster University Management School. Programme design and teaching are co-delivered by Young (Lancaster).
Collaborator Contribution 3-day introductory workshop on Python programming delivered at Universidade Católica Portuguesa in partnership with Lancaster University Management School. Programme design and teaching are co-delivered by Alves (Universidade Católica Portuguesa). Programme delivered at Universidade Católica Portuguesa and all administrative support provided by Universidade Católica Portuguesa.
Impact Ongoing
Start Year 2020
 
Title CFIE Final Report Structure Extractor 
Description The tool extracts text from UK annual reports published as PDF files by firms listed on the London Stock Exchange. The current version (2.0) of the tool is an update of a beta version previously available at https://drelhaj.github.io/CFIE-FRSE/. The tool retains the structure of the disclosures provided in the PDF annual report. The tool also classifies sections into generic categories to facilitate temporal and cross-sectional comparisons 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact The previous version of the tool is being used widely by academic researchers. The new version of the tool has been used by the research team to support collaborative research with the Financial Reporting Council to explore disclosure of alternative performance measures. 
URL https://github.com/drelhaj/CFIE-FRSE-2019-Runnable
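
Elsewhere in this record the tool is described as using a synonym list compiled by accounting and finance experts to map section titles to generic categories, and as extracting each section by its start and end pages. The sketch below illustrates that general idea in Python. It is an assumption-laden illustration, not the CFIE-FRSE implementation: the synonym dictionary, category names, function names and the rule that a section ends on the page before the next one starts are all hypothetical.

```python
import re

# Hypothetical synonym list; the categories and titles are illustrative
# and are not taken from the CFIE-FRSE source code.
SECTION_SYNONYMS = {
    "chairman_statement": [
        "chairman's statement",
        "letter to shareholders",
        "chairman's letter",
    ],
    "ceo_review": [
        "chief executive's review",
        "ceo review",
        "chief executive's statement",
    ],
    "remuneration_report": [
        "remuneration report",
        "directors' remuneration report",
    ],
}


def normalise_title(title):
    """Lower-case a title, unify apostrophes and collapse whitespace."""
    title = title.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", title).strip()


def classify_section(title):
    """Map a table-of-contents title to a generic category, or 'other'."""
    cleaned = normalise_title(title)
    for category, synonyms in SECTION_SYNONYMS.items():
        if cleaned in synonyms:
            return category
    return "other"


def section_page_ranges(toc, last_page):
    """Turn (title, start_page) entries into (title, category, start, end).

    Assumption: each section ends on the page before the next one starts,
    and the final section runs to the last page of the report.
    """
    entries = sorted(toc, key=lambda entry: entry[1])
    ranges = []
    for index, (title, start) in enumerate(entries):
        if index + 1 < len(entries):
            end = max(start, entries[index + 1][1] - 1)
        else:
            end = last_page
        ranges.append((title, classify_section(title), start, end))
    return ranges


# Hypothetical table of contents for a 96-page report.
toc = [
    ("Letter to Shareholders", 2),
    ("Chief Executive's Review", 5),
    ("Financial Statements", 40),
]
sections = section_page_ranges(toc, last_page=96)
for section in sections:
    print(section)
```

Exact matching against a synonym list is the simplest possible rule; titles that differ by an extra word or a typo would fall into "other" here, which is one reason a production tool would need richer heuristics.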
 
Description 2019 Summer Program in Accounting Research (SPAR) Doctoral Program, Current Issues in Empirical Financial Reporting Research 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact 4 × 75-minute sessions introducing students to methods and pitfalls of textual analysis in accounting research. Themes of the four sessions were as follows:
1. Overview of textual analysis in accounting, Tuesday 30 July 2019, 14.00-15.15
2. Bag-of-words methods, Tuesday 30 July 2019, 15.45-17.00
3. Introduction to natural language processing, Wednesday, 31 July 2019, 13.00-14.15
4. Introduction to corpus linguistics methods, Wednesday, 31 July 2019, 14.45-16.00
Year(s) Of Engagement Activity 2019
URL https://www.whu.edu/fakultaet-forschung/finance-accounting-group/internationale-rechnungslegung/spar...
 
Description 2nd Annual European Quantitative and Macro Investment Conference, hosted by Wolfe Research 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The one-day conference featured:
• How to best harness alpha from the latest alternative and unstructured Big Data sources, for both stock selection and global macro forecast
• How to take advantage of machine learning and artificial intelligence to identify market anomalies and investment opportunities
• How fundamental/discretionary PMs/analysts can take advantage of alternative data and advanced analytics in their investment process
• Pragmatic and practical applications from premier investment management firms and asset owners
• Demonstrations from data vendors providing the most unique and interesting data contents
• Great networking opportunities for portfolio managers, research analysts, asset owners, and academic researchers

The programme for the conference was:
8:30 - 9:15am: A Credit-Based Theory of the Currency Risk Premium
Pasquale Della Corte, Associate Professor of Finance, Imperial College London

9:15 - 10:30am Best Short
Dr. Robert Kosowski, Associate Professor of Finance, Imperial College London - School of Business

10:30 - 11:15am Systematic Incorporation of ESG/SRI into the Investment Process
Panel discussion

11:30 - 12:15pm Large, Global Asset Management Firms and the Credit Default Swap Market
Giovanni Calice, Senior Lecturer, Loughborough University

1:15 - 2:00pm Global REITs and Property Stock Selection Models
Yin Luo, Quantitative Analysis, Economics and Strategy, and Vice Chairman - Wolfe Research

2:00 - 2:45pm Learning Tone and Attribution for Financial Text Mining
Steven Young, Professor of Accounting, Lancaster University Management School

2:45 - 3:30pm How Active Managers Can Best Utilize Alternative Data and Quantitative Techniques
Panel discussion

3:45 - 4:30pm Measuring Horizon-Specific Systematic Risk via Spectral Betas
Andrea Tamoni, Assistant Professor of Finance, London School of Economics and Political Science

4:30 - 5:15pm The Term Structure of Sovereign CDS and the Cross-Section Exchange Rate Predictability
Abalfazl Zareei, Assistant Professor of Finance, Stockholm Business School

5:15 - 5:30pm Concluding Remarks
Yin Luo, Quantitative Analysis, Economics and Strategy, and Vice Chairman - Wolfe Research
Year(s) Of Engagement Activity 2019
 
Description 2nd ESRC Workshop on Textual Analysis in Accounting and Finance 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact To complete
Year(s) Of Engagement Activity 2019
 
Description BBC Radio Lancashire interview 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact 10-minute interview on 15 August 2019 at 16.35. The focus was on our new app for analyzing textual content in UK corporate reports.
Year(s) Of Engagement Activity 2019
 
Description COLLABORATIONS BETWEEN LINGUISTICS AND THE PROFESSIONS 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact ESRC-funded event at Lancaster University (organised by the Centre for Corpus Approaches in Social Science). A free event exploring interactions between linguists and private-sector organisations, held on 4-6 March 2019. A series of invited speakers from academia and business discussed experiences, challenges and opportunities in areas including publishing, IT, forensic analysis, organisational culture, marketing, financial reporting, and language teaching, learning and assessment. My talk reviewed ongoing work in the area of financial discourse and collaborations with financial market partners including the Financial Conduct Authority and the Financial Reporting Council.
Year(s) Of Engagement Activity 2019
URL http://cass.lancs.ac.uk/mycalendar-events/?event_id1=65
 
Description Center for Financial Reporting and Auditing Workshop "Natural Language Processing in Financial Markets" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact 1-day workshop organised by ESMT Berlin with the aim of bringing practitioners and academics together to discuss the role of natural language processing (NLP) in financial reporting research and practice. The audience comprised a mix of academics and financial market practitioners. The total number of attendees was approximately 60. The workshop comprised 5 sessions plus a panel discussion. The keynote academic presentation was delivered by Steven Young (Lancaster). The keynote practitioner presentation was delivered by Ryan Lafond (Deputy Chief Investment Officer at Algert Global LLC). Presentations and discussions focused on how accounting research and financial market practice can make better use of NLP technology. Follow-up discussions with KPMG have taken place with a view to identifying opportunities for collaboration in the area of sustainable reporting.
Year(s) Of Engagement Activity 2018
URL https://www.esmt.org/faculty-research/centers-chairs-and-institutes/center-financial-reporting-and-a...
 
Description Commissioned workshop on NLP methods 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact One-day workshop commissioned by the Quantitative Research Team at RPMI Railpen (https://www.rpmirailpen.co.uk/) on NLP methods including machine learning classification, topic modelling, and information retrieval. We summarised results of our ongoing research; reviewed methods for topic modelling and made suggestions on how RPMI can extend their current work in this area; and showcased our approach to extracting document structure from UK annual reports. The workshop resulted in several action points for further collaboration including a student internship, sharing data and methods, and working together to scale up annual report extraction to a global sample of firms.
Year(s) Of Engagement Activity 2019
 
Description European Accounting Association Doctoral Colloquium 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Plenary session on the problems and challenges facing mainstream accounting research examining financial discourse. I argue that much of the extant work fails to apply best practice methods from the NLP and corpus linguistics literatures. Instead, work adopts a quasi-scientific approach that emphasizes sample size and econometric rigor over word-sense disambiguation and meaning. The current approach represents a potential trap for inexperienced PhD researchers insofar as standard methods such as readability and tone can be computed at low cost for very large samples, which in turn can encourage researchers to focus on questions that are easy to address rather than questions that are interesting and important. I stress the need for a balanced approach to the analysis of financial discourse that combines small-sample manual analysis methods with large-sample automated scoring methods.
Year(s) Of Engagement Activity 2019
URL http://www.eiasm.org/frontoffice/event_announcement.asp?event_id=1372#5839
 
Description Financial Accounting Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The aim of my session was to review ways that academic accounting research can inform regulatory activities. I identified opportunities for researchers and discussed barriers to progress (and how they might be overcome). The talk was part of a one-day workshop organised by Bristol University. The event attracted approximately 30 participants including academic faculty, PhD students, and representatives from the accounting profession (International Accounting Standards Board). My session involved a presentation followed by discussion. Participants explored a range of issues regarding engagement activities including collaborative research opportunities, contracting, and the tension between impact and publications in the context of academic progression.
Year(s) Of Engagement Activity 2018
 
Description Paper prepared for European Financial Reporting Advisory Group (EFRAG) Academic Panel 1 June meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact 6-page non-technical report summarizing key insights and themes for financial reporting regulators highlighted in the following paper:
Lewis, C., Young, S. (2019). Fad or future: Automated analysis of financial text and its implications for corporate reporting. Accounting & Business Research 49(5) in press http://eprints.lancs.ac.uk/133315/

The executive summary of the report is as follows:
Applying natural language processing (NLP) methods to analyze unstructured data in the corporate reporting package offers two generic benefits:
• The ability to process large volumes of content at relatively low cost;
• The ability to detect latent features that even manual analysis may struggle to identify.
Realizing these benefits is conditional on low cost, reliable access to financial text on a large scale.

The IASB defines users along a single dimension reflecting the information needs associated with their contractual relation with the entity. Increasing interest in NLP approaches suggests a second dimension that distinguishes between:
• Traditional users, who adopt a manual reading strategy and typically view documents as a linear narrative or a key reference source;
• Digitally sophisticated users, who operate on a larger scale, with the aim of extracting and processing content automatically to realize the generic benefits of NLP.
The distinction foregrounds debate over the format and delivery of the financial reporting package, and whether it is possible to satisfy both groups via a single reporting model.

NLP methods have important implications for the disclosure problem as defined by the IASB:
• The problem of too much irrelevant information may matter less because information overload is less of a concern for NLP applications;
• The potential for NLP to detect latent features raises questions about ex ante definitions of relevance. From a big data NLP perspective, relevance is determined by algorithms and statistical analysis rather than regulators. The same argument holds for materiality.

NLP has implications for effective communication:
• NLP methods offer a (partial) means of overcoming ineffective communication by filtering-out boilerplate disclosure, translating technical jargon, highlighting links between relevant information, and identifying key reporting themes;
• NLP offers the potential to change the way decision-makers use unstructured data by introducing a dynamic dimension that allows users to reformulate (normalize) disclosures and select as-reported content conditional on the specific decision context faced;
• Use of NLP methods has implications for the definition of effective communication. Since NLP relies on reliable, low cost access to source data, the focus of effective communication expands to include delivery as well as content.

A series of structural impediments involving two core themes of data access and collaboration restrict the application of NLP methods to financial reporting data. Overcoming these impediments requires coordinated action by a range of key financial reporting stakeholders.
Year(s) Of Engagement Activity 2019
 
Description Plenary speaker, British Accounting & Finance Association Northern Area Group meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Session focused on the proposed benefits of automated analysis of text and evaluated extant research against these perceived advantages. Key themes emerging from a review of prior research are (a) a significant fraction of work is limited in scope and often fails to deliver many of the suggested benefits and (b) automated analysis is not a 'quick fix' replacement for close manual reading by domain experts.
Year(s) Of Engagement Activity 2019
 
Description Presentation to European Financial Reporting Advisory Group (EFRAG) 18 October meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Item on textual analysis included on meeting agenda, coupled with invitation to join the meeting and 60-minute discussion concerning the state of the art of research using these methods and how automated methods for analyzing text can assist EFRAG's activities. I opened the discussion by providing a brief overview of key issues. A range of related issues were then discussed.
Year(s) Of Engagement Activity 2019
 
Description Presentation to Senior Management at FCA 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Update on results of ongoing research collaboration involving development of methods to predict financial reporting violations. The aim of the presentation was to update senior FCA managers on results achieved to date and the next steps in the model building process.
Year(s) Of Engagement Activity 2018
 
Description Press release CFIE-FRSE app and associated publication 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact The press release reads as follows:
New software from Lancaster University cuts through hard-to-understand financial reports, to help investors and regulators.

Researchers have developed the Corporate Financial Information Environment - Final Report Structure Extractor (CFIE-FRSE) app to dissect and analyse the narrative aspects of companies' annual reports, which are aimed at shareholders but also used by other stakeholders including financial analysts, prospective investors, journalists and regulators.

Annual report narratives contain commentary on financial performance, as well as supplementary information on topics such as principal risks and corporate social responsibility policies - but management has a high level of discretion over report content and structure, and as a result, investors can struggle to find and understand the information they require.

At present, there is no uniform structure to such documents, making comparison and large-scale analysis severely challenging.

Professor Steve Young, Head of Accounting in LUMS, said: "Annual reports are highly unstructured, and different companies report in different ways, which makes extracting content and comparing reports very difficult. Almost every document is different.

"Many reports are almost impossible for non-specialists to read, which is at odds with the trend towards a broader model of stakeholder reporting.

"We have designed an app to extract commentary from these documents and normalise it across firms - making comparisons much easier. The procedure provides a reliable means of capturing and classifying these narratives."

The interdisciplinary project has involved academics from Lancaster University Management School (LUMS), the School of Computing and Communications and the Department of Linguistics & English Language.

There has already been interest in CFIE-FRSE from investment and hedge-fund managers, who would gain a greater insight into the status and stability of companies, as well as regulators looking to see where businesses may be trying to conceal information and where intervention is needed. It also allows them to see where regulation is working and where it may need to change.

More than 26,000 documents published between 2003 and 2017 by companies listed on the London Stock Exchange have been analysed by the app and scored on features such as length, readability and sentiment. Because the CFIE-FRSE app detects report structure, scores are available for each section listed in the table of contents.

Dr Mahmoud El-Haj, Senior Research Associate in the School of Computing and Communications at Lancaster University, said: "The app uses heuristic approaches and rule-based decision making to automatically detect the structure of an annual report. This helps the software to extract sections' text by knowing their start and end pages.

"The app was trained to identify a set of common section titles (types) based on a training list of synonyms generated by accounting and finance experts. For example, the app is able to identify that the 'Letter to shareholders' in one company's report is the same as the 'Chairman's statement' in another company's report."

Analysis of the annual reports processed by the app reveals a number of interesting features and reporting trends. For example, average report length has more than doubled over the last decade to almost 34,000 words. (A dissertation on a typical Masters degree comprises 10,000-13,000 words.)

Average report readability is also poor; and there has been no noticeable improvement over the sample period. (Readability is measured using an algorithm that penalizes long sentences and complex words.)

Long, unstructured documents containing complex language mean that many retail investors and other non-specialist stakeholders struggle to understand the typical annual report.

Sentiment also varies dramatically across different sections within the same report. For example, sections where content is shaped by regulation and compliance such as governance statements and remuneration reports are characterized by neutral language. In contrast, the tone of language is up to four times more positive in sections where directors have more reporting discretion and where performance is the primary focus.

The CFIE-FRSE app aims to cut through hard-to-understand annual report language and help users identify unusual patterns in corporate reports that may help to distinguish long-term financial strength from inflated short-term profits.

Coverage as at 19 August 2019 includes:
The business sites Bdaily and Business Up North have carried the news:
https://www.businessupnorth.co.uk/lancaster-university-software-innovation-cuts-through-hard-to-understand-financial-reports/
https://bdaily.co.uk/articles/2019/08/14/new-software-from-lancaster-university-cuts-through-hard-to-understand-financial-reports-to-help-investors-and-regulators

A Canadian finance website has covered the app here:
https://www.wealthprofessional.ca/market-talk/company-results-can-be-complicated-but-academics-have-an-answer-278038.aspx

And German, Swiss and Austrian sites have reported on it:
https://computerwelt.at/news/neue-app-entschluesselt-geschaeftsberichte/
https://www.pressetext.com/news/20190813002
https://www.manager24.ch/xn--neue_app_entschlsselt_geschftsberichte-hhd26g.html
https://www.ictk.ch/inhalt/neue-app-entschl%C3%BCsselt-gesch%C3%A4ftsberichte

It has also been published on these sites:
https://www.sciencecodex.com/lancaster-university-programme-brings-clarity-hard-decipher-company-annual-reports-631314
https://scienmag.com/lancaster-university-programme-brings-clarity-to-hard-to-decipher-company-annual-reports/
http://7thspace.com/headlines/929408/lancaster_university_programme_brings_clarity_to_hard_to_decipher_company_annual_reports.html
https://www.techsite.io/p/1173989/t/lancaster-university-programme-brings-clarity-to-hard-to-decipher-company-annual-reports
https://www.eurekalert.org/pub_releases/2019-08/lu-lup080819.php
https://www.alphagalileo.org/en-gb/Item-Display/ItemId/181700?returnurl=https://www.alphagalileo.org/en-gb/Item-Display/ItemId/181700

Accountancy Daily has carried an article on the new software, which can be seen here:
https://www.accountancydaily.co/university-launches-app-translate-annual-report-information

BBC Radio Lancashire interviewed the PI on 15 August 2019
Year(s) Of Engagement Activity 2019
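
The press release above notes that readability is measured using an algorithm that penalizes long sentences and complex words. One well-known measure of that kind is the Gunning Fog index, sketched below in Python. Whether the app uses exactly this formula is an assumption, and the sketch is simplified: the syllable counter is a rough vowel-group heuristic, the function names are hypothetical, and the two sample texts are invented for illustration.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fog_index(text):
    """Simplified Gunning Fog index.

    Computed as 0.4 * (average words per sentence + percentage of
    complex words), where a complex word is taken here to be any word
    with three or more estimated syllables.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    average_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (average_sentence_length + percent_complex)


# Invented sample texts: short plain sentences versus one dense sentence.
plain = "Sales rose. Costs fell. Profit grew."
dense = (
    "Notwithstanding considerable macroeconomic uncertainty, management "
    "anticipates incremental operational efficiencies."
)

print(fog_index(plain))
print(fog_index(dense))
```

Higher values indicate text that is harder to read, so the dense sentence scores well above the plain one.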
 
Description Research summary for Investor Relations Society 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Non-technical summary of a research study examining the properties of high quality annual reporting. The summary was requested by the Communications Manager of the Investor Relations Society (Laura Hayter) and the Chief Insight and Engagement Officer (Sallie Pilot) following a conference call.
Year(s) Of Engagement Activity 2020
 
Description Speaker at INQUIRE UK joint conference with INQUIRE EUROPE - 2019 Residential 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Plenary talk providing an overview of textual analysis methods in the context of quantitative investment strategies, followed by a round table discussion with Q&A. The audience comprised quantitative financial analysts and fund managers from Europe and the US. Interest in the value of using unstructured data in investment strategies is growing. However, practices are varied and core principles from linguistics and natural language processing are often overlooked in favour of "quick and dirty" approaches that ignore theory. The presentation walked the audience through the fundamental steps in the textual analysis pipeline (corpus creation, cleaning and preprocessing, corpus annotation, and text processing) with the aim of highlighting best practice and alerting audience members to potential pitfalls. The presentation stimulated significant debate about how best to approach the task of quantifying narrative content and where the most promising opportunities lie. A number of fund managers followed up requesting additional information.
Year(s) Of Engagement Activity 2019
URL https://www.inquire2019.co.uk/inquire2019/login
 
Description Talk to London Text Analytics 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Part of a one-day workshop hosted at Accenture's London office. The event was organised by Tony Russell-Rose (UXLabs), Dyaa Albakour (Signal), and Udo Kruschwitz (University of Essex). Dr El-Haj (Senior RA on project) delivered one of the two invited presentations. The talk highlighted the importance of automatic extraction and textual analysis of financial disclosures for their contribution to corporate success. Textual disclosures help to clarify issues obscured by complex accounting methods, contextualise financial results and summarise key elements of business activity that are difficult to quantify. The extraction and textual analysis of these disclosures combine with natural language processing and corpus linguistics to form the field of Financial Narrative Processing (FNP). The audience comprised 47 text analysis practitioners, academics, and PhD students. Several follow-up requests were received from text analysis practitioners regarding software tools developed by the research team.
Year(s) Of Engagement Activity 2018
URL https://www.meetup.com/en-AU/textanalytics/events/252152599/
 
Description Workshop on narrative reporting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This one-day workshop unites accounting researchers, financial market professionals, and experts in textual analysis with the goal of sharing views on the properties of high quality annual report narratives and the methods for analysing them. Enhanced practical understanding of the features shaping narrative reporting quality is critical for academic researchers interested in studying the phenomenon. Similarly, cutting-edge research in accounting affords insights into the text processing opportunities available to financial market professionals. Finally, both accounting researchers and professionals require guidance from linguists and computer scientists on the practicalities of analysing text.

The workshop programme comprises a novel combination of practice-focused sessions, research summaries, and introductions to aspects of natural language processing (NLP) and corpus linguistics for those new to the area. The workshop is supported by the Economic and Social Research Council, Lancaster University Management School, and the Centre for Corpus Approaches to Social Science.

Sessions
• Practitioner views on high quality annual reporting
• Academic evidence on the properties of award-winning annual reports and the effectiveness of automated procedures for measuring sentiment and attribution in earnings announcements
• Panel session on emerging issues in narrative reporting
• Review of research methods in corpus linguistics and machine learning

Speakers
• Sallie Pilot (Chief Insight and Engagement Officer, Black Sun Plc)
• Phil Fitz-Gerald (Director of the Financial Reporting Lab, Financial Reporting Council)
• Peter Hogarth/Mark O'Sullivan (PwC)
• Dr Eddie Bell (Head of Machine Learning, Ravelin)
• Dr Vaclav Brezina (ESRC Centre for Corpus Approaches to Social Science)
• Prof Martin Walker (Alliance Manchester Business School)
• Prof Steven Young (Lancaster University Management School)
Year(s) Of Engagement Activity 2019
URL http://ucrel.lancs.ac.uk/cfie/hqfrn2019.pdf
 
Description Workshop on textual analysis 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop on Textual Analysis Methods in Accounting and Finance
Lancaster University Management School
Programme
Day 1: 12 September
11.00-11.30 Introduction and welcome
11.30-12.30 Session 1 Overview of textual analysis literature in accounting and finance
The aim of this session is to provide participants with an overview of extant research on textual analysis in the accounting and finance literature. We will focus on the proposed benefits of automated analysis of text and evaluate extant research against these perceived advantages. A key conclusion that will emerge from the review is that prior research is limited in scope and fails to deliver many of the suggested benefits. A critical theme informing the remainder of the workshop is that automated analysis is not a "quick fix" replacement for close manual reading by domain experts: most advanced applications of computational methods rely on significant manual reading for training and validation.
The session will also mention preprocessing, which needs to be done for all text applications, and signpost its coverage in Session 7.
12.30-13.30
13.30-15.00 Session 2 Text extraction: Methods and pitfalls
Automated text retrieval is the starting point for most large-sample applications of textual analysis in accounting and finance. This session will provide general guidelines on the text retrieval process, as well as hands-on experience with retrieving: 10-K annual report text (including harvesting documents from EDGAR) using Python and R scripts; U.K. annual report narratives published as PDF files using the CFIE's Java-based annual report tool; and U.K. earnings announcement narratives using the CFIE's Java-based PEA tool.
15.00-15.15
15.15-17.15 Session 3 Readability and tone: Methods and critique
Readability and tone (sentiment) are the two most commonly analysed features of financial market text. This session will review and critique methods used in the extant literature to measure readability and tone. We will demonstrate the problems of relying on standard readability metrics such as Fog to capture sophisticated narrative features such as complexity and understandability. We will also review the various approaches for measuring tone, ranging from simple wordlists to more advanced machine learning methods. A key conclusion that will emerge from this review is that simple measures of readability and tone provide limited scope for generating significant new insights in the literature.
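For readers unfamiliar with the Fog metric mentioned above, the following is a minimal sketch of the standard Gunning Fog calculation: 0.4 times the sum of average sentence length and the percentage of words with three or more syllables. It is an illustration only and is not taken from the workshop materials. The function names and the vowel-group syllable counter are assumptions made for this example; the crude syllable heuristic is itself one reason such metrics struggle to capture complexity and understandability.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption for this sketch): one syllable per vowel group."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fog_index(text: str) -> float:
    """Gunning Fog index: 0.4 * (words per sentence + percentage of complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" words are those with three or more (heuristically counted) syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    pct_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + pct_complex)


if __name__ == "__main__":
    print(fog_index("The cat sat. The dog ran."))
```

The score depends only on sentence length and word length, so two passages with very different clarity can receive the same value.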
18.00-19.30 Dinner & research presentation: Measuring Tone and Attribution
A buffet dinner followed by a discussion of ongoing research assessing the relative accuracy of wordlists and machine learning for measuring the tone of performance sentences and the presence of managerial self-attribution bias in earnings announcements.

Day 2: 13 September
09.00-10.30 Session 4 Constructing and using wordlists
Wordlists are the most common approach to analysing financial text in the accounting and finance literature. This session discusses the advantages and weaknesses of using a wordlist approach to study financial text, reviews the most common wordlists employed in the literature, and considers some of the methods used in conjunction with wordlists to improve their classification performance. The session will also explain the different approaches to constructing wordlists, together with the strengths and weaknesses of each approach.
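As a rough illustration of the wordlist approach described above, the sketch below counts matches against a positive and a negative list and reports a net tone score, (positive - negative) / (positive + negative). The two wordlists are hypothetical and deliberately tiny, and the function name is an assumption made for this example; none of it is taken from the workshop materials or from any published wordlist.

```python
import re

# Hypothetical, deliberately tiny wordlists for illustration only.
POSITIVE = {"growth", "improved", "strong", "profit"}
NEGATIVE = {"loss", "decline", "weak", "impairment"}


def net_tone(text: str, positive=POSITIVE, negative=NEGATIVE) -> float:
    """Return (pos - neg) / (pos + neg), or 0.0 when no listed word occurs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


if __name__ == "__main__":
    print(net_tone("Strong growth offset a decline in margins"))
    # Negation is ignored: "no growth" still counts as a positive match.
    print(net_tone("There was no growth"))
```

The second call shows one weakness of simple counting: context such as negation is not taken into account unless additional rules are layered on top of the wordlist.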
10.30-11.00
11.00-12.30 Session 5 Introduction to machine learning
While machine learning forms the basis for a large proportion of research in the field of natural language processing, its uptake in accounting and finance is more limited. This session provides a broad introduction to machine learning methods, including both supervised and unsupervised approaches. Different aspects of machine learning and the relations between them will be explained, including classification, named entity recognition, summarization, and topic modelling.
12.30-13.30
13.30-15.00 Session 6 Machine learning applications: Classification
This session provides a hands-on introduction to classification using machine learning methods. Participants will use the Weka toolkit (https://www.cs.waikato.ac.nz/~ml/weka/downloading.html) to construct and evaluate a model for identifying fraudulent financial reporting using 10-K filings. Results and insights from the analysis will be used to highlight weaknesses in the extant literature and identify opportunities for future research.
15.00-15.15
15.15-17.15 Session 7 Machine learning applications: Topic modelling
Several recent papers in the accounting literature have employed topic modelling methods such as Latent Dirichlet Allocation (LDA) to identify topics in financial text (e.g., Dyer et al. 2017). This session provides a hands-on introduction to topic modelling. Participants will use MALLET (http://mallet.cs.umass.edu/index.php) to extract topics from an annual report corpus. In addition to walking participants through the practicalities of the modelling process, the session will highlight the many problems associated with topic modelling and discuss alternative approaches to the content analysis problem.
18.00-19.30 Dinner & research presentation: Characteristics of Award Winning Annual Reports
A buffet dinner followed by a discussion of ongoing research that employs corpus methods to isolate the distinguishing features of high quality annual reports as proxied by reports shortlisted for a narrative reporting award in the U.K.

Day 3: 14 September
09.00-10.30 Session 8 Introduction to corpus linguistics
This session provides an introduction to the theory and core methods underpinning the systematic analysis of a large body of text (i.e., a corpus). The session will cover the following themes: introduction to basic corpus linguistic concepts; presentation of different corpora types and examples; methodology for corpus design, compilation, and processing; corpus annotation, and examples of annotated data; basic resources and corpus analysis tools; examples from the literature of using corpus methods to analyse financial discourse.
10.30-11.00
11.00-13.00 Session 9 Applied corpus methods: Tools and techniques
This session provides hands-on experience of corpus analysis. The session will consist of two parts. Part 1 will introduce the corpus that will form the basis of our analysis (Brexit narratives in annual reports of UK financial firms), along with the AntConc software (Anthony 2014) for corpus analysis. In Part 2, participants will use the AntConc concordancer to analyse a small dataset and perform corpus tasks including: extracting word lists; finding collocates; and searching for n-grams and keywords. The session will conclude with a discussion of the insights gained from analysing the corpus.
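The corpus tasks listed above (word lists, collocates, and n-grams) can be approximated in a few lines of code. The sketch below is an illustration only and does not reproduce AntConc's behaviour: it uses raw frequency counts rather than the association statistics a concordancer typically offers, and the function names, tokenizer, and sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(tokens: list) -> list:
    """Frequency-ordered word list as (word, count) pairs."""
    return Counter(tokens).most_common()


def ngrams(tokens: list, n: int) -> Counter:
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def collocates(tokens: list, node: str, window: int = 2) -> Counter:
    """Count words occurring within `window` tokens either side of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    # Invented sample text, not drawn from the workshop corpus.
    sample = "Brexit uncertainty remains. Brexit uncertainty may affect demand."
    toks = tokenize(sample)
    print(word_list(toks)[:3])
    print(ngrams(toks, 2).most_common(1))
    print(collocates(toks, "brexit", window=1))
```

Keyword analysis, which compares frequencies against a reference corpus, is omitted here because it requires a second corpus and a choice of significance statistic.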
13.00-14.00 and workshop ends
14.00-15.30 Optional surgery session for PhD students seeking feedback on research proposals and ongoing work involving analysis of text
Year(s) Of Engagement Activity 2018