MODS: Mapping knowledge with data science

Lead Research Organisation: University College London
Department Name: Centre for Advanced Spatial Analysis


The British Library manages the national database of UK doctoral theses, called EThOS. EThOS
enables users to search for, discover, and access theses for use in their own research; however, the almost complete
aggregation of metadata about some 450,000 dissertations enables us to ask very interesting questions about the nature
and production of knowledge in an institutional and geographic context. For example, it becomes possible to compare
one university's outputs against another (not just in a quantitative sense, but also in terms of collective contributions,
impact on the discipline, etc.), make connections between authors and their supervisors, and to analyse disciplinary
These are quintessentially social science questions about the impact of individuals, work, and mobility on organisations
and cultures, but making sense of this amount of data requires sophisticated computational approaches to digesting text
and analysing relationships. The project therefore offers an exciting opportunity for interdisciplinary working: for those
with a computer science background it is an application of cutting-edge algorithms to real-world challenges with
real-world impact, and for those with a social science grounding it is the opportunity to draw upon the full force of the 'AI
revolution' to conduct ground-breaking research at scale.
At its heart, this project is part of what has been termed 'computational social science'
(Lazer et al. 2009) in that it involves the application of cutting-edge computer science
techniques to large, rich data sets of human behaviour in order to support research into the
geography of academic knowledge production over time. However, as it approaches half a million
records, the EThOS metadata has become interpretable and navigable only through
'distant reading' approaches taken from the collaboration of the digital humanities with
computer science: by automating the processing and classification of textual data we are able
to tease out linkages between both texts (PhD theses) and individuals (supervisors and
students) within a corpus (EThOS) in order to group theses by discipline, by degree of topical
or thematic similarity, and by a kind of intellectual and historical genealogy. So although it
remains rooted in the interests and episteme of the social sciences, the research involves
work at the interface with both the natural sciences and the (digital) humanities to yield
excitingly interdisciplinary research.
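
As a rough illustration of what grouping theses "by degree of topical or thematic similarity" can mean in practice, the sketch below scores pairs of thesis titles with TF-IDF weighting and cosine similarity. This is only one common text-similarity technique and is not stated to be the project's own method. The three titles are invented for the example and are not EThOS records, and the function names (tokenize, tfidf_vectors, cosine, most_similar) are hypothetical helpers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Smoothed inverse document frequency, so rarer terms weigh more.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index, vectors):
    """Index of the document most similar to vectors[index], excluding itself."""
    others = [i for i in range(len(vectors)) if i != index]
    return max(others, key=lambda i: cosine(vectors[index], vectors[i]))


# Invented example titles (assumption: NOT taken from EThOS).
titles = [
    "Spatial analysis of urban housing markets",
    "Urban housing markets and spatial inequality",
    "Protein folding dynamics in yeast cells",
]
vectors = tfidf_vectors(titles)
print(most_similar(0, vectors))
```

At the scale of the full corpus, a pairwise comparison like this would need to be replaced by sparse matrix operations or approximate nearest-neighbour search, but the underlying idea of comparing weighted term vectors is the same.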



Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
ES/P000703/1      |              |              | 01/10/2017 | 30/09/2027 |
2115536           | Studentship  | ES/P000703/1 | 01/10/2018 | 30/09/2022 | Jennie Williams