Virtual Investment Researcher

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

SEWA will gather relevant information sources through a combination of searches on the company, relevant websites, and public and internal databases. This information may be in textual or numerical form, represented in HTML, Word, PDF, Excel or SQL formats. Semi-automating this process requires the development of a system that can take a company name as input, search the relevant websites, intranets, and databases, handle the different formats, and return a superset of potentially relevant sources from which the researcher can quickly select. To exploit techniques from the field of information retrieval, such as query expansion and relevance feedback, so that all sources are found efficiently, the research team will need to develop a bespoke solution built on an open source platform. To achieve the full functionality required for the application, this solution will need to go beyond currently available technology by, for example, being able to accurately index and extract sentences and paragraphs from PDF documents.
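As an illustration of the relevance feedback idea mentioned above, the following is a minimal sketch of Rocchio-style query expansion. It is not the project's implementation: the function names, the weights alpha, beta and gamma, and the whitespace tokenisation are all assumptions chosen for brevity.

```python
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts using naive whitespace tokenisation (an assumption)."""
    return Counter(text.lower().split())


def rocchio_expand(query, relevant_docs, non_relevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15, top_k=5):
    """Return the top_k terms of a Rocchio-style expanded query.

    Terms from the original query are weighted by alpha, terms from documents
    the researcher marked relevant are added with weight beta, and terms from
    documents marked non-relevant are subtracted with weight gamma.
    """
    weights = Counter()
    for term, count in term_vector(query).items():
        weights[term] += alpha * count
    for doc in relevant_docs:
        for term, count in term_vector(doc).items():
            weights[term] += beta * count / len(relevant_docs)
    for doc in non_relevant_docs:
        for term, count in term_vector(doc).items():
            weights[term] -= gamma * count / len(non_relevant_docs)
    # Keep only positively weighted terms; break ties alphabetically.
    positive = {t: w for t, w in weights.items() if w > 0}
    ranked = sorted(positive, key=lambda t: (-positive[t], t))
    return ranked[:top_k]


# Hypothetical feedback from a researcher searching for a company.
expanded = rocchio_expand(
    "acme earnings",
    relevant_docs=["acme annual report earnings revenue", "acme revenue forecast"],
    non_relevant_docs=["acme football club"],
    top_k=3,
)
```

In this sketch the expanded query picks up "revenue" from the documents marked relevant, while terms that appear only in the rejected document are excluded.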

Text analytics or information extraction will require the system to learn contextual patterns encoding relevant types of information on the basis of the links between text regions and report content. Customising such technology for the company report application will require the researchers to deploy either open source platforms or flexible commercial toolkits, for example for named entity recognition or relation extraction, and to format the output in XML. The researchers will provide input to the application developers, for example to optimise the reliability and accuracy of information aggregation.

Automation of text generation will be undertaken using the structured XML content extracted. The university research team will develop and apply appropriate algorithms which are able to generate natural language, and produce pre-defined text according to rules set up by the business team. The researchers will also provide input to the software development team on the integration of the selected algorithms and toolkits into the CUI.
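The rule-based generation step described above can be sketched as filling pre-defined templates from extracted XML. The XML schema, the element and attribute names, the templates, and the sample values below are all hypothetical, invented purely for illustration; the abstract does not specify the actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical extracted content; the schema and values are invented for illustration.
SAMPLE_XML = """<company name="Example Ltd">
  <fact type="revenue" year="2018" value="1.2" unit="GBP million"/>
  <fact type="employees" year="2018" value="40"/>
</company>"""

# Pre-defined text rules of the kind a business team might supply (assumed wording).
TEMPLATES = {
    "revenue": "{name} reported revenue of {value} {unit} in {year}.",
    "employees": "{name} had {value} employees in {year}.",
}


def generate_sentences(xml_text, templates):
    """Turn each <fact> element into a sentence using the template for its type."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    sentences = []
    for fact in root.findall("fact"):
        template = templates.get(fact.get("type"))
        if template is None:
            continue  # No rule defined for this fact type, so skip it.
        sentences.append(template.format(name=name, **fact.attrib))
    return sentences


sentences = generate_sentences(SAMPLE_XML, TEMPLATES)
```

Keeping the templates in a plain dictionary means the wording can be changed without touching the extraction code, which is one plausible way to separate business rules from the NLP pipeline.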

Planned Impact

This project is led by All Street Research Ltd, with the intent of developing a working prototype of their proposed autonomous investment research system, SEWA. We will make fundamental contributions to the underpinning machine learning framework, which is vital for delivering the prototype. A successful outcome will have a major positive impact on the company, enabling them to move into commercial deployment of their product, raise investment, and substantially increase their subscription revenue base.

Longer term, there will be significant impact on access to investment for SMEs, as SEWA will enable more comprehensive investment reports to be produced much more rapidly, and be much more widely available at an affordable price to both institutional and individual investors.

In addition, this project will help cement the relationship between All Street and the University of Cambridge Computer Laboratory, which we believe could lead to longer term collaboration.

Publications

 
Description We have identified that using unsupervised clustering to bootstrap the training of a multiclass classifier, embedded in an active learning framework, is sufficient to support the workplan objectives and increase the productivity of analysts using SEWA. However, we await integration of our approach into SEWA in order to be able to conduct a user-based evaluation.
Exploitation Route Active learning combined with stochastic random selection of examples to annotate is a resource-efficient way of embedding machine learning based technology in an operational context where the task and data may change over time and where large-scale annotated datasets are not available.
Sectors Digital/Communication/Information Technologies (including Software)
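
The idea of combining active learning with random selection of examples to annotate can be sketched as follows. This is an illustrative assumption, not the method used in the project: the function name, the explore_rate parameter, and the least-confidence criterion are all hypothetical choices.

```python
import random


def select_for_annotation(class_probs, batch_size, explore_rate=0.3, rng=None):
    """Choose indices of unlabelled examples to send to an annotator.

    class_probs holds one list of predicted class probabilities per example.
    With probability explore_rate an example is drawn at random; otherwise the
    example whose top predicted class has the lowest probability is taken.
    """
    rng = rng or random.Random()
    # Order candidates from least to most confident prediction.
    remaining = sorted(range(len(class_probs)), key=lambda i: max(class_probs[i]))
    chosen = []
    while remaining and len(chosen) < batch_size:
        if rng.random() < explore_rate:
            pick = rng.choice(remaining)  # Random selection step.
        else:
            pick = remaining[0]  # Least-confident remaining example.
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


# Hypothetical classifier outputs for three unlabelled documents.
probs = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.6, 0.3, 0.1]]
batch = select_for_annotation(probs, batch_size=2, explore_rate=0.0)
```

The random component gives every example some chance of being annotated, which can help when the task or data drift and the classifier's confidence estimates become unreliable.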

 
Description Licensed software has increased the functionality of the SEWA tool for cost-effective generation of company investment reports
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description InnovateUK/EPSRC
Amount £1,000,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2018 
End 10/2019
 
Title RASP 
Description NLP software provided as part of technology transfer project 
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted
Licensed Yes
Impact Enhanced functionality of SEWA
 
Title RASP Toolkit 
Description Continuous development of NLP toolkit, including extension to active learning under this grant 
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted 2019
Licensed Yes
Impact Company has right to use IP. Impact indeterminate to date.