Semantic Tools for Archaeological Resources
Lead Research Organisation:
University of South Wales
Department Name: Faculty of Advanced Technology
Abstract
Increasingly within archaeology, the Web is used for making results and findings available as datasets. However, Google and other web search engines are ill-equipped to retrieve information from these richly structured databases. Important archaeological results and reports are also appearing as 'grey literature' on the Web, before or instead of traditional publication. Typically these are not indexed or made available for searching other than as ordinary web documents. With conventional search engines it is difficult to link across these datasets or to search them using terminology other than that employed by the authors. Different people use different words for the same concept, or may employ slightly different concepts, and this is a barrier to widening scholarly access.
This research is in collaboration with English Heritage (EH). The aim is to construct a Web Demonstrator to investigate and evaluate novel computer techniques for searching across archaeological databases and linking them to grey literature reports. This will open up these datasets for wider scholarly inquiry than immediate subject specialists. English Heritage is a collaborator in the project and contributes staff time and various datasets from the Raunds project, which covers a large area in Northamptonshire, focusing mainly on Roman material (with some Iron Age).
Building upon previous work at Glamorgan (the FACET Project), the research makes use of knowledge organisation vocabularies, such as classifications, thesauri and ontologies, which can be used to structure and connect different databases together. These will be combined with computer-based linguistic techniques to make links to grey literature reports that have not been indexed with controlled keywords from any knowledge organisation system. The research investigates the potential of a high level ontology, the Conceptual Reference Model (CRM), to bridge the terminology employed by very different databases and reports. The CRM covers cultural heritage generally and is envisaged as a 'semantic glue' mediating between different sources and types of information.
EH staff are known for work in digital archiving. However, the existing situation is one of fragmented datasets and applications, employing different terminology systems. Even simply expressed queries are currently difficult to answer, due to a lack of tools for cross-database searching. There is a need for an integrative framework, and EH have built a core ontology based on the CRM. The intention is that a common ontology will provide greater semantic depth and potential for cross-domain searching by researchers within and beyond the archaeological sector. However, work to date has focused on modelling. The proposed research will investigate the potential of combining this CRM-based ontology with query expansion techniques from the FACET project to assist archaeological inquiry.
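As a rough illustration of the query expansion idea mentioned above, the following Python sketch expands a search term with its narrower terms and alternative labels before matching it against keyword-indexed records. It is a simplified stand-in, not the FACET technique itself; the thesaurus entries, record identifiers and function names are all hypothetical.

```python
from collections import deque

# Hypothetical toy thesaurus (illustrative only, not an EH vocabulary).
NARROWER = {
    "pit": ["rubbish pit", "storage pit"],
    "ditch": ["boundary ditch", "drainage ditch"],
}
ALT_LABELS = {
    "rubbish pit": ["refuse pit"],
}


def expand_query(term, max_depth=1):
    """Return the term, its narrower terms down to max_depth, and their alternative labels."""
    expanded = []
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        expanded.append(current)
        # Alternative labels are treated as equivalent to the term itself.
        for alt in ALT_LABELS.get(current, []):
            if alt not in seen:
                seen.add(alt)
                expanded.append(alt)
        # Follow narrower-term links breadth first, up to the depth limit.
        if depth < max_depth:
            for child in NARROWER.get(current, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return expanded


def search(records, term, max_depth=1):
    """Return ids of records indexed with any term in the expanded query."""
    wanted = set(expand_query(term, max_depth))
    return [r["id"] for r in records if wanted & set(r["keywords"])]


# Hypothetical records from two differently indexed datasets.
RECORDS = [
    {"id": "A1", "keywords": ["refuse pit", "pottery"]},
    {"id": "B7", "keywords": ["drainage ditch"]},
    {"id": "C3", "keywords": ["pit"]},
]

print(expand_query("pit"))
print(search(RECORDS, "pit"))
```

With expansion switched off (max_depth=0), only the record indexed with the exact term is returned; with one level of expansion, the record indexed as "refuse pit" is found as well, which is the kind of terminology gap the abstract describes.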
Many archaeologists are familiar with detailed searches within single project datasets but currently such databases exist as isolated universes. A key aim is to show archaeologists (who may not be computer specialists) new possibilities for broadening such searches, linking datasets and grey literature. These outcomes have relevance beyond the immediate EH datasets to archaeology researchers more generally and beyond that to cultural heritage scholars attempting cross-domain search.
Evaluation will include a comparison of the Demonstrator with currently available functionality, taking account of cost benefit and utility issues. Two workshops, with EH and wider archaeological users, will assist evaluation. The first workshop will evaluate a pilot search system and feed into iterative design of the interface and demonstrator. The second workshop (held at the Archaeology Data Service) will evaluate the Demonstrator, as well as reviewing the outcomes and discussing further exploitation to digital archives and grey literature more generally.
Organisations
- University of South Wales (Lead Research Organisation)
- York Archaeological Trust (Collaboration)
- Museum of London Archaeology (Collaboration)
- English Heritage (Collaboration)
- University of York (Collaboration)
- University of Copenhagen (Project Partner)
- Historic Buildings and Monuments Commission for England (Project Partner)
People | ORCID iD |
Douglas Tudhope (Principal Investigator) | |
Publications
Aloia N
(2017)
Enabling European Archaeological Research: The ARIADNE E-Infrastructure
in Internet Archaeology
Binding C
(2018)
A study of semantic integration across archaeological data and reports in different languages
in Journal of Information Science
Binding C
(2010)
Semantic Technologies for Archaeology Resources: Results from the STAR project
in CAA 2010
Binding C
(2008)
Research and Advanced Technology for Digital Libraries
May K
(2015)
Barriers and opportunities for Linked Open Data use in archaeology and cultural heritage
in Archäologische Informationen
Meghini C
(2017)
ARIADNE A Research Infrastructure for Archaeology
in Journal on Computing and Cultural Heritage
Tudhope D
(2011)
Connecting Archaeological Data and Grey Literature via Semantic Cross Search
in Internet Archaeology
Tudhope D
(2008)
Making KOS Machine Understandable
Tudhope D
(2008)
Faceted Thesauri
in Axiomathes
Description | National cultural heritage thesauri and vocabularies have acted as standards for use by both national organizations and local authority Historic Environment Records, but until now have lacked the persistent Linked Open Data (LOD) URIs that would allow them to act as vocabulary hubs for the Web of Data. The AHRC-funded SENESCHAL project has made such vocabularies available online as Semantic Web resources, publishing as Linked Open Data key vocabularies from Historic England, RCAHMS (now Historic Environment Scotland) and RCAHMW (Royal Commission on the Ancient and Historical Monuments of Wales). RESTful web services make the vocabulary resources programmatically accessible and searchable. A series of case studies, carried out in collaboration with the project partners, has explored use of these web services; they have been used, for example, by ADS (the Archaeology Data Service). In addition, a set of widgets is available to make the vocabularies easier to use. Repeated references to RDF, SKOS, LOD and REST services can sometimes seem an impenetrable wall of jargon, leaving some people cold: how do we actually use all this stuff? The SENESCHAL widgets address this. They are a suite of predefined visual user interface controls that dynamically obtain vocabulary information from the web services. The controls provide vocabulary navigation, search and selection functionality that can be embedded directly within your own web pages. A set of associated demonstration pages shows how to configure and use each widget control, and how to combine them to create functionally rich user interfaces. |
Exploitation Route | see follow on ARIADNE projects and also the Narrative Impact section here |
Sectors | Creative Economy; Digital/Communication/Information Technologies (including Software); Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections |
URL | http://www.heritagedata.org/blog/ |
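The navigation and search functions described for the vocabulary services and widgets can be pictured with a small in-memory sketch. The Python below is an assumption-laden illustration only: it does not call the SENESCHAL web services, and the URIs (under example.org), labels and function names are hypothetical. It shows SKOS-style concepts identified by URI, a label search, and a walk up the broader-term hierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """A SKOS-style concept: a persistent URI, a preferred label and broader links."""
    uri: str
    pref_label: str
    broader: List[str] = field(default_factory=list)  # URIs of broader concepts


# Hypothetical concepts; real vocabularies would be fetched from a service.
BASE = "http://example.org/concept/"
CONCEPTS: Dict[str, Concept] = {
    BASE + "1": Concept(BASE + "1", "monument"),
    BASE + "2": Concept(BASE + "2", "defence", [BASE + "1"]),
    BASE + "3": Concept(BASE + "3", "hillfort", [BASE + "2"]),
    BASE + "4": Concept(BASE + "4", "promontory fort", [BASE + "3"]),
}


def search_labels(concepts, text):
    """Return preferred labels containing the text, case-insensitively, sorted."""
    needle = text.lower()
    return sorted(
        c.pref_label for c in concepts.values() if needle in c.pref_label.lower()
    )


def broader_path(concepts, uri):
    """Follow the first broader link from a concept up to its top concept."""
    path, seen = [], set()
    current = uri
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        concept = concepts[current]
        path.append(concept.pref_label)
        current = concept.broader[0] if concept.broader else None
    return path


print(search_labels(CONCEPTS, "fort"))
print(broader_path(CONCEPTS, BASE + "4"))
```

Because every concept is addressed by its URI rather than by its label, two datasets that index records with the same URI can be linked even when their local wording differs.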
Description | We represented the English Heritage archaeological extension to the CRM ontology in RDF and as Linked Data. This allowed it to be a key ontology hub in the ADS archaeology Linked Data [I1]. This is another important step in English Heritage's strategic plans for information standards. |
First Year Of Impact | 2010 |
Sector | Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections |
Impact Types | Cultural |
Description | AHRC DEDEFI - Digital Equipment and Database Enhancement for Impact |
Amount | £110,000 (GBP) |
Organisation | Arts & Humanities Research Council (AHRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2010 |
End | 02/2011 |
Description | AHRC Follow on Fund |
Amount | £76,000 (GBP) |
Organisation | Arts & Humanities Research Council (AHRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2013 |
End | 02/2014 |
Description | EC FP7 Infrastructures Grant: ARIADNE (Advanced Research Infrastructure for Archaeological Dataset Networking in Europe) |
Amount | £205,000 (GBP) |
Funding ID | 313193 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 02/2013 |
End | 01/2017 |
Description | H2020 Programme |
Amount | € 6,597,368 (EUR) |
Funding ID | H2020-INFRAIA-2018-1-823914 |
Organisation | European Commission H2020 |
Sector | Public |
Country | Belgium |
Start | 01/2019 |
End | 12/2022 |
Description | Subsequently, building on the PERTAINS work with MIMAS, the team was successful in bidding for funding from IMSL/JISC/AHRC/ESRC - the transatlantic Digging into Data Challenge. £74,944 to Glamorgan |
Amount | £74,944 (GBP) |
Organisation | Arts & Humanities Research Council (AHRC) |
Sector | Public |
Country | United Kingdom |
Start | 02/2012 |
End | 01/2014 |
Description | University of Glamorgan Research investment Scheme |
Amount | £19,000 (GBP) |
Organisation | University of Glamorgan |
Sector | Academic/University |
Country | United Kingdom |
Start | 08/2010 |
End | 08/2011 |
Description | Welsh Assembly - Follow on award for the Welsh Natural Language Toolkit project previously funded |
Amount | £40,000 (GBP) |
Organisation | Welsh Assembly |
Sector | Public |
Country | United Kingdom |
Start | 06/2016 |
End | 03/2017 |
Description | Welsh-language technology and digital media grant (Welsh Natural Language Toolkit) |
Amount | £37,000 (GBP) |
Organisation | Welsh Assembly |
Sector | Public |
Country | United Kingdom |
Start | 06/2015 |
End | 03/2016 |
Title | CRM-based indexing of ADS OASIS Grey Literature |
Description | CIDOC CRM-based annotation of ADS OASIS Grey Literature via semantic information extraction NLP techniques |
Type Of Material | Database/Collection of data |
Year Produced | 2010 |
Provided To Others? | Yes |
Impact | This has fed into outcomes for the FP7 ARIADNE project (Advanced Research Infrastructure for Archaeological Dataset Networking in Europe) |
URL | http://hypermedia.research.southwales.ac.uk/resources/STAR-data-outputs.html |
Title | RDF implementation of CRM-EH, the EH archaeological extension of CIDOC CRM |
Description | RDF implementation of CRM-EH, the EH archaeological extension of CIDOC CRM |
Type Of Material | Database/Collection of data |
Year Produced | 2010 |
Provided To Others? | Yes |
Impact | This has fed into the ADS linked data project and influenced CRMarchaeo, the archaeological extension of CIDOC-CRM core ontology |
URL | http://hypermedia.research.southwales.ac.uk/resources/STAR-data-outputs.html |
Title | SKOS (RDF) representations of EH NMR thesauri |
Description | SKOS (RDF) representations of EH NMR thesauri |
Type Of Material | Database/Collection of data |
Year Produced | 2010 |
Provided To Others? | Yes |
Impact | These SKOS representations fed into the publication of EH, RCAHMS and RCAHMW vocabularies as Linked Data in the SENESCHAL project, published at https://www.heritagedata.org/blog/. See also REF 2014 Impact Case Study https://impact.ref.ac.uk/casestudies/CaseStudy.aspx?Id=27425 |
URL | http://hypermedia.research.southwales.ac.uk/resources/STAR-data-outputs.html |
Description | English Heritage |
Organisation | English Heritage |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | STAR, STELLAR and SENESCHAL outcomes made a significant contribution to EH strategic objectives in digital heritage and vocabulary management and standards. In STAR, the collaboration with English Heritage (EH) on digital archaeology has been interdisciplinary. EH have seen direct benefit, both to their datasets and the wider exposure of their thesauri via the terminology services and the implementation of their extension to the CIDOC CRM ontology for archaeology. The following remarks on the project's significance are contributed by EH. "One key outcome of STAR has been the development and dissemination of the EH ontological modelling (referred to by the project as the CRM-EH) in RDF, which otherwise would have been unlikely to have happened, and certainly not as soon. Another outcome from STAR has been the enhanced awareness of the CRM-EH and its ontological basis in the CIDOC CRM across wider cultural heritage and related IT sectors which has been significantly increased through the various STAR project publications, workshops and project team attendance at conferences and presentations. This has helped EH in promoting the potential use of standards like CIDOC CRM, SKOS and Thesauri for developing interoperability in the sector. Conversion of the EH Thesauri into SKOS format would have been very unlikely to happen so succinctly and effectively without the R&D expertise provided by partnership with Glamorgan Uni. This is a major benefit and considerable technological step forward for our sector that will have benefits across and beyond the heritage sector where the EH thesauri and related terminologies are the most widely used resource of their type. An example of this is the development by the STAR project of the SKOS terminology web services, particularly for other related resources such as ADS, which will most likely facilitate enhancement of the thesauri in SKOS format for the OASIS pan-UK online archaeological reporting system."
In SENESCHAL, we (and the vocabulary partners in the project) published as (SKOS) Linked Data the nationally recognised cultural heritage thesauri standards from English Heritage, the Royal Commission on the Ancient and Historical Monuments of Scotland and the Royal Commission on the Ancient and Historical Monuments of Wales. This includes concepts widely used for indexing relating to monument types, archaeological events and time periods. The significance is that previously the vocabulary providers lacked the ability to facilitate uniquely identified semantic indexing of data. Major thesauri can act as vocabulary hubs for the Web of Data (as suggested by the W3C Library Linked Data Incubator Group). For example, the availability of the Thesaurus of Monument Terms in this way is seen as a major development for the ADS archive metadata Linked Data. This Linked Data publication of the English Heritage thesauri is a significant development in their vocabulary standards practice and their information access strategy. The potential reach is wide, since it is a core activity of ADS, English Heritage and the Royal Commissions on the Ancient and Historical Monuments of Scotland and of Wales to promote and disseminate best practice to the heritage sectors, as well as providing guidance on appropriate data standards including thesauri. The linked data vocabularies and web services will be integrated into the widely used ADS reporting/archiving tool, OASIS, which is in near universal use by commercial and local government archaeologists. Adoption of linked data based vocabulary management in this tool will immediately affect how all sectors engage in archaeological field practice and development control planning. We represented the English Heritage archaeological extension to the CRM ontology in RDF and as Linked Data. This allowed it to be a key ontology hub in the ADS archaeology Linked Data. This is another important step in English Heritage's strategic plans for information standards. |
Collaborator Contribution | STAR - English Heritage. The collaboration with English Heritage was very significant to the whole research project and absolutely necessary. Although there was no formal agreement the collaboration was planned and detailed in the Proposal - EH effectively acted English Heritage. UK non Research Organisation Keith May of English Heritage was a key member of the project team and project management. His contributions included design of the CRM-EH ontology, intellectual mapping of datasets to CRM-EH, writing and presenting outcomes, etc. |
Impact | General STAR, STELLAR, SENESCHAL project outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html http://hypermedia.research.southwales.ac.uk/kos/stellar/ http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/ http://www.heritagedata.org/ |
Start Year | 2006 |
Description | Museum of London Archaeology (MOLA) |
Organisation | Museum of London Archaeology |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | STAR used an extract from the MOLA database in the project for semantic integration and the final demonstrator |
Collaborator Contribution | Museum of London Archaeology (MOLA) made datasets available and hosted a project meeting, giving early feedback |
Impact | General STAR outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html |
Start Year | 2007 |
Description | The Archaeology Data Service (ADS) |
Organisation | University of York |
Department | Archaeology Data Service (ADS) |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | In STAR: semantic information extraction from the ADS OASIS grey literature library, and a research demonstrator of semantic integration of archaeological datasets and grey literature reports, very relevant to ADS research strategy. Collaboration has continued through two other AHRC grants and the FP7 project ARIADNE. ADS were Co-Investigators with us in the AHRC projects STELLAR and SENESCHAL. They used the STELLAR tools to map and extract CRM-based RDF and published Linked Data. |
Collaborator Contribution | The Archaeology Data Service (ADS) provided the extract of OASIS grey literature reports for the STAR NLP work. ADS hosted the final STAR workshop and also hosted a joint STAR/ArcheoTools project workshop, which was very helpful in the early stages of the project. ADS were Co-Investigators in STELLAR. They used the STELLAR tools to map and extract CRM-based RDF and published Linked Data. The research on semantic data integration (STELLAR) provided tools and techniques that enabled ADS (http://archaeologydataservice.ac.uk/) to extract and publish Linked Data from major commercial archaeology units' excavation datasets, integrated semantically via mapping to the CIDOC CRM ontology. It is envisaged this will serve as a catalyst for further production of archaeological Linked Data by ADS and others. Building on this work, we are leading the FP7 ARIADNE archaeology e-infrastructure Work Package, Linking Archaeology Data. The research enabled the first foray into Linked Data by ADS (non-specialists in semantic technologies) and represents a major development in practice and capability by ADS and in UK archaeological data publication. It has generated considerable attention. The significance also derives from the importance of the published datasets and the exemplar. The Linked Data includes datasets drawn from the Channel Tunnel Rail Link and the Aggregates Levy Sustainability Fund, major archaeological programmes with excavations undertaken by two of the largest commercial units in England (Oxford Archaeology Ltd and Wessex Archaeology Ltd). Other datasets included an excavation database with details of the earliest ironworking yet known in Britain. As the only record of unrepeatable fieldwork, it is essential that these data are preserved and made available for re-use and re-interpretation.
ADS were also Co-Is in SENESCHAL; they made use of the SENESCHAL services in their content management system and actively participated throughout the project. |
Impact | STAR, STELLAR, SENESCHAL project outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html http://hypermedia.research.southwales.ac.uk/kos/stellar/ http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/ http://www.heritagedata.org/ |
Start Year | 2007 |
Description | York Archaeological Trust (YAT) |
Organisation | York Archaeological Trust |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | An extract from the YAT database formed part of the STAR project and final Demonstrator |
Collaborator Contribution | York Archaeological Trust (YAT) supplied one of the datasets used in the Demonstrator and also participated in project workshops, giving valuable feedback. |
Impact | General STAR outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html |
Start Year | 2007 |
Title | OPTIMA |
Description | Semantic information extraction: the outcome of Andreas Vlachidis's PhD thesis. Information about the NLP techniques, grey literature corpus and tools is available from the Andronikos portal http://www.andronikos.co.uk/. The NLP information extraction pipeline (OPTIMA) is based on the GATE toolkit and automatically extracts relevant CRM (and CRM-EH) entities and relationships. A full evaluation is reported in the thesis (Vlachidis 2012, Vlachidis et al. 2013). The OPTIMA information extraction tools, covering named entity recognition (NER) and relation extraction (RE), are open source and freely available from http://sourceforge.net/projects/optimacidoc/ |
Type Of Technology | Software |
Year Produced | 2013 |
Open Source License? | Yes |
Impact | STAR project Demonstrator http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html |
URL | http://sourceforge.net/projects/optimacidoc/ |
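To give a flavour of vocabulary-driven entity recognition of the kind OPTIMA performs, the sketch below annotates a sentence by looking up terms from a small gazetteer. It is a deliberately minimal Python stand-in and does not reproduce OPTIMA or use GATE; the gazetteer entries, the entity type names and the function names are hypothetical, and relation extraction is not attempted.

```python
import re

# Hypothetical gazetteer mapping vocabulary terms to illustrative entity types.
GAZETTEER = {
    "roman": "period",
    "iron age": "period",
    "pottery": "find",
    "coin": "find",
    "ditch": "context",
    "pit": "context",
}


def build_pattern(gazetteer):
    """Compile one whole-word pattern, trying longer terms first."""
    terms = sorted(gazetteer, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def annotate(text, gazetteer=GAZETTEER):
    """Return (start, end, matched text, entity type) for each gazetteer hit."""
    pattern = build_pattern(gazetteer)
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sentence = "Roman pottery was found in the pit."
for start, end, surface, entity_type in annotate(sentence):
    print(start, end, surface, entity_type)
```

The character offsets returned with each match are what would allow annotations like these to be attached back to a report and, in a fuller pipeline, mapped to ontology classes.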