971579
Exploiting machine learning to improve the proposal submission process
Closed
Small Business Research Initiative
Innovate UK

The 3-month Phase 1 proposal describes the creation of a prototype system to assess proposals that are submitted to Innovate UK. The project has three main goals: matching proposals to assessors, identifying duplicate and resubmitted proposals, and identifying fraudulent behaviour.

The system will use a combination of Natural Language Processing (NLP), machine learning and data analytics to understand the content of a proposal and then assign it to a relevant assessor. The system uses topic modelling to identify the key concepts of a proposal, while NLP extracts entities such as people, organisations and keywords to build a rich picture of the proposal content. The extracted data is then stored in a graph database and as JSON-LD. A similar approach is taken to extract content from assessors' profiles, resumes, authored articles, blog posts, tweets and journal papers to create a rich picture of each assessor's knowledge and skills. The extracted content, topics, keywords and entities from both the proposals and the assessors' material are stored in a structured form, as JSON-LD in Elasticsearch and in a graph database (Neo4j). Search queries, social network analysis and machine learning clustering of text vectors are then used to link assessors to proposals based on the similarity between their skills and the proposal topics.

The second aim of the proposal is to identify resubmissions, duplicate content and reworded proposals using two techniques, document fingerprinting and Winnowing. These techniques break a document into chunks that can then be identified across a corpus of submitted proposals. The results are then clustered using a neural network.
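As an illustration of the fingerprinting step, the following is a minimal sketch of Winnowing and not the project's implementation. The chunk length `k`, the window size `window`, the use of truncated SHA-1 hashes and the Jaccard score in `similarity` are all assumptions chosen for the example. The sketch also keeps only the selected hash values and drops their positions, which a fuller implementation would normally record.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Lower-case the text and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every overlapping k-character chunk of the normalised text."""
    clean = normalise(text)
    return [
        # Truncate SHA-1 to 8 bytes: an assumption made to keep hashes small.
        int.from_bytes(hashlib.sha1(clean[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(clean) - k + 1)
    ]


def winnow(text: str, k: int = 5, window: int = 4) -> set[int]:
    """Keep the minimum hash from each sliding window of k-gram hashes."""
    hashes = kgram_hashes(text, k)
    if not hashes:
        return set()
    if len(hashes) < window:
        return {min(hashes)}
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def similarity(a: str, b: str, k: int = 5, window: int = 4) -> float:
    """Jaccard overlap of the two fingerprint sets (0.0 when either is empty)."""
    fa, fb = winnow(a, k, window), winnow(b, k, window)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

With these settings, any passage of at least `window + k - 1` normalised characters that two documents share fills one complete window in both, so both documents select the same minimum hash and end up with at least one fingerprint in common. A pairwise `similarity` score across the corpus could then serve as input to the clustering step.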
The final aim is to identify possible fraudulent behaviour by assessing the people and organisations involved in a proposal against publicly available information. We propose to use NLP to extract the person, organisation and website entities and then cross-reference these against data from the Companies House and Gateway to Research APIs. Again, we will use a graph database with social network analysis and machine learning techniques to identify fraudulent proposals, for example multiple proposals that have been submitted under the names of subsidiaries of the same company.
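The idea of linking proposals through shared entities can be sketched without a graph database. The example below is an assumption-laden illustration only: `link_proposals` is a hypothetical helper, the entity labels are invented, and no Companies House or Gateway to Research call is made. It groups proposals that are connected, directly or through a chain, by at least one shared person or organisation.

```python
from collections import defaultdict


def link_proposals(proposal_entities: dict[str, set[str]]) -> list[set[str]]:
    """Group proposals connected through at least one shared entity."""
    # Union-find over proposal identifiers.
    parent = {proposal: proposal for proposal in proposal_entities}

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Invert the mapping: entity -> proposals that mention it.
    by_entity: dict[str, list[str]] = defaultdict(list)
    for proposal, entities in proposal_entities.items():
        for entity in entities:
            by_entity[entity].append(proposal)

    # Proposals that mention the same entity belong to one group.
    for proposals in by_entity.values():
        for other in proposals[1:]:
            union(proposals[0], other)

    groups: dict[str, set[str]] = defaultdict(set)
    for proposal in proposal_entities:
        groups[find(proposal)].add(proposal)

    # Only groups with more than one proposal are worth reviewing.
    return [group for group in groups.values() if len(group) > 1]
```

For example, if two proposals name the same director and one of them shares a parent company with a third, all three are returned as one group for review. A group is only a prompt for human checking and not evidence of fraud in itself.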