Semantic Tools for Archaeological Resources

Lead Research Organisation: University of Glamorgan
Department Name: Faculty of Advanced Technology

Abstract

Increasingly within archaeology, the Web is used for making results and findings available as datasets. However, Google and other web search engines are ill-equipped to retrieve information from these richly structured databases. Important archaeological results and reports are also appearing as 'grey literature' on the Web, before or instead of traditional publication. Typically these are not indexed or made available for searching other than as ordinary web documents. With conventional search engines it is difficult to link across these datasets or to search them using terminology other than that employed by the authors. Different people use different words for the same concept, or may employ slightly different concepts, and this is a barrier to widening scholarly access.

This research is in collaboration with English Heritage (EH). The aim is to construct a Web Demonstrator to investigate and evaluate novel computer techniques for searching across archaeological databases and linking them to grey literature reports. This will open up these datasets to scholarly inquiry beyond the immediate subject specialists. EH contributes staff time and various datasets from the Raunds project, which covers a large area in Northamptonshire, focusing mainly on Roman material (with some Iron Age).

Building upon previous work at Glamorgan (the FACET Project), the research makes use of knowledge organisation vocabularies, such as classifications, thesauri and ontologies, which can be used to structure and connect different databases together. These will be combined with computer-based linguistic techniques to make links to grey literature reports that have not been indexed with controlled keywords from any knowledge organisation system. The research investigates the potential of a high level ontology, the Conceptual Reference Model (CRM), to bridge the terminology employed by very different databases and reports. The CRM covers cultural heritage generally and is envisaged as a 'semantic glue' mediating between different sources and types of information.

EH staff are known for work in digital archiving. However, the existing situation is one of fragmented datasets and applications employing different terminology systems. Even simply expressed queries are currently difficult to answer, owing to the lack of tools for cross-database searching. There is a need for an integrative framework, and EH have built a core ontology based on the CRM. The intention is that a common ontology will provide greater semantic depth and potential for cross-domain searching by researchers within and beyond the archaeological sector. However, work to date has focused on modelling. The proposed research will investigate the potential of combining this CRM-based ontology with query expansion techniques from the FACET project to assist archaeological inquiry.
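
As a rough illustration of thesaurus-based query expansion, the sketch below widens a search term with its synonyms and narrower terms. The toy thesaurus entries, the function name `expand_query` and the `max_depth` parameter are all hypothetical, invented for illustration; this is not the FACET or STAR implementation.

```python
from collections import deque

# Hypothetical toy thesaurus (invented for illustration): each preferred
# term lists its synonyms and its narrower terms.
THESAURUS = {
    "pit": {"synonyms": [], "narrower": ["rubbish pit", "storage pit"]},
    "rubbish pit": {"synonyms": ["refuse pit"], "narrower": []},
    "storage pit": {"synonyms": [], "narrower": ["grain storage pit"]},
    "grain storage pit": {"synonyms": [], "narrower": []},
}


def expand_query(term, max_depth=1):
    """Return the term plus synonyms and narrower terms, down to max_depth levels."""
    expanded = []
    seen = set()
    queue = deque([(term.lower(), 0)])
    while queue:
        current, depth = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        entry = THESAURUS.get(current)
        if entry is None:
            # Unknown terms are kept as typed but cannot be expanded.
            continue
        for synonym in entry["synonyms"]:
            if synonym not in seen:
                seen.add(synonym)
                expanded.append(synonym)
        if depth < max_depth:
            for child in entry["narrower"]:
                queue.append((child, depth + 1))
    return expanded
```

A search for "pit" with the default depth would then also match records indexed as "rubbish pit", "refuse pit" or "storage pit", even though the searcher never typed those words.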

Many archaeologists are familiar with detailed searches within single project datasets but currently such databases exist as isolated universes. A key aim is to show archaeologists (who may not be computer specialists) new possibilities for broadening such searches, linking datasets and grey literature. These outcomes have relevance beyond the immediate EH datasets to archaeology researchers more generally and beyond that to cultural heritage scholars attempting cross-domain search.

Evaluation will include a comparison of the Demonstrator with currently available functionality, taking account of cost-benefit and utility issues. Two workshops, with EH and wider archaeological users, will assist evaluation. The first workshop will evaluate a pilot search system and feed into iterative design of the interface and Demonstrator. The second workshop (held at the Archaeology Data Service) will evaluate the Demonstrator, review the outcomes and discuss further exploitation for digital archives and grey literature more generally.

Publications

Meghini C (2017) ARIADNE: A Research Infrastructure for Archaeology in Journal on Computing and Cultural Heritage

Richards J (2015) Mathematics and Archaeology

Tudhope D (2008) Faceted Thesauri in Axiomathes

Vlachidis A (2012) A pilot investigation of information extraction in the semantic annotation of archaeological reports in International Journal of Metadata, Semantics and Ontologies

Vlachidis A (2016) A knowledge-based approach to Information Extraction for semantic interoperability in the archaeology domain in Journal of the Association for Information Science and Technology

Vlachidis A (2011) Metadata and Semantic Research

Vlachidis A (2013) Computational Linguistics

Vlachidis A (2013) Metadata and Semantics Research

 
Description National cultural heritage thesauri and vocabularies have acted as standards for use by both national organizations and local authority Historic Environment Records, but until now have lacked the persistent Linked Open Data (LOD) URIs that would allow them to act as vocabulary hubs for the Web of Data. The AHRC-funded SENESCHAL project has made such vocabularies available online as Semantic Web resources. SENESCHAL has made available as Linked Open Data key vocabularies from:
Historic England
RCAHMS (now Historic Environment Scotland)
RCAHMW (Royal Commission on the Ancient and Historical Monuments of Wales)


RESTful web services make the vocabulary resources programmatically accessible and searchable. A series of case studies has explored use of these web services, in collaboration with the project partners. The services have been used, for example, by ADS (the Archaeology Data Service).

In addition, a set of widgets is available to make the vocabularies easier to use. Repeated references to RDF, SKOS, LOD and REST services can seem an impenetrable wall of jargon that leaves some people cold: how do we actually use all this stuff? One answer is the SENESCHAL widgets, a suite of predefined visual user interface controls that dynamically obtain vocabulary information from the web services. The controls provide vocabulary navigation, search and selection functionality that can be embedded directly within your own web pages. A set of associated demonstration pages shows how to configure and use each widget control, and how to combine them to create functionally rich user interfaces.
Exploitation Route See the follow-on ARIADNE projects and also the Narrative Impact section
Sectors Creative Economy; Digital/Communication/Information Technologies (including Software); Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections

URL http://www.heritagedata.org/blog/
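
To make the idea of a vocabulary published as SKOS Linked Data more concrete, the sketch below holds a few concepts as subject-predicate-object triples and resolves a label to its concept URI and its chain of broader concepts. The SKOS namespace is the standard W3C one; the concept URIs (under `example.org`), the labels, the hierarchy and the function names are hypothetical and are not taken from the SENESCHAL vocabularies or web services.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/concept/"  # placeholder namespace, not a real vocabulary URI

# Hypothetical toy vocabulary as (subject, predicate, object) triples.
TRIPLES = [
    (EX + "1", SKOS + "prefLabel", "monument"),
    (EX + "2", SKOS + "prefLabel", "settlement"),
    (EX + "2", SKOS + "broader", EX + "1"),
    (EX + "3", SKOS + "prefLabel", "villa"),
    (EX + "3", SKOS + "altLabel", "roman villa"),
    (EX + "3", SKOS + "broader", EX + "2"),
]

LABEL_PREDICATES = {SKOS + "prefLabel", SKOS + "altLabel"}


def find_concepts(label):
    """Return the URIs of concepts whose preferred or alternative label matches."""
    wanted = label.strip().lower()
    return sorted(
        {s for s, p, o in TRIPLES if p in LABEL_PREDICATES and o.lower() == wanted}
    )


def pref_label(uri):
    """Return the preferred label of a concept, or None if it has none."""
    for s, p, o in TRIPLES:
        if s == uri and p == SKOS + "prefLabel":
            return o
    return None


def broader_path(uri):
    """Follow skos:broader links upwards and return the preferred labels met."""
    path, seen = [], set()
    current = uri
    while current is not None and current not in seen:
        seen.add(current)
        path.append(pref_label(current))
        # SKOS allows several broader concepts; this sketch follows the first only.
        current = next(
            (o for s, p, o in TRIPLES if s == current and p == SKOS + "broader"),
            None,
        )
    return path
```

Because each concept has a persistent URI, two datasets that index records with the same URI can be linked even when their local labels differ.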
 
Description We represented the English Heritage archaeological extension to the CRM ontology in RDF and as Linked Data. This allowed it to be a key ontology hub in the ADS archaeology Linked Data [I1]. This is another important step in English Heritage's strategic plans for information standards.
First Year Of Impact 2010
Sector Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections
Impact Types Cultural

 
Description AHRC DEDEFI - Digital Equipment and Database Enhancement for Impact
Amount £110,000 (GBP)
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 03/2010 
End 02/2011
 
Description AHRC Follow on Fund
Amount £76,000 (GBP)
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 03/2013 
End 02/2014
 
Description EC FP7 Infrastructures Grant: ARIADNE (Advanced Research Infrastructure for Archaeological Dataset Networking in Europe)
Amount £205,000 (GBP)
Funding ID 313193 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 02/2013 
End 01/2017
 
Description H2020 Programme
Amount € 6,597,368 (EUR)
Funding ID H2020-INFRAIA-2018-1-823914 
Organisation European Commission H2020 
Sector Public
Country Belgium
Start 01/2019 
End 12/2022
 
Description Subsequently, building on the PERTAINS work with MIMAS, the team was successful in bidding for funding from IMLS/JISC/AHRC/ESRC under the transatlantic Digging into Data Challenge. Glamorgan received £74,944.
Amount £74,944 (GBP)
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 02/2012 
End 01/2014
 
Description University of Glamorgan Research investment Scheme
Amount £19,000 (GBP)
Organisation University of Glamorgan 
Sector Academic/University
Country United Kingdom
Start 09/2010 
End 08/2011
 
Description Welsh Assembly - Follow on award for the Welsh Natural Language Toolkit project previously funded
Amount £40,000 (GBP)
Organisation Welsh Assembly 
Sector Public
Country United Kingdom
Start 07/2016 
End 03/2017
 
Description Welsh-language technology and digital media grant (Welsh Natural Language Toolkit)
Amount £37,000 (GBP)
Organisation Welsh Assembly 
Sector Public
Country United Kingdom
Start 07/2015 
End 03/2016
 
Title CRM-based indexing of ADS OASIS Grey Literature 
Description CIDOC CRM-based annotation of ADS OASIS Grey Literature via semantic information extraction NLP techniques 
Type Of Material Database/Collection of data 
Year Produced 2010 
Provided To Others? Yes  
Impact This has fed into outcomes for the FP7 ARIADNE project (Advanced Research Infrastructure for Archaeological Dataset Networking in Europe) 
URL http://hypermedia.research.southwales.ac.uk/resources/STAR-data-outputs.html
 
Title RDF implementation of CRM-EH, the EH archaeological extension of CIDOC CRM 
Description RDF implementation of CRM-EH, the EH archaeological extension of CIDOC CRM 
Type Of Material Database/Collection of data 
Year Produced 2010 
Provided To Others? Yes  
Impact This has fed into the ADS linked data project and influenced CRMarchaeo, the archaeological extension of CIDOC-CRM core ontology 
URL http://hypermedia.research.southwales.ac.uk/resources/STAR-data-outputs.html
 
Title SKOS (RDF) representations of EH NMR thesauri 
Description SKOS (RDF) representations of EH NMR thesauri 
Type Of Material Database/Collection of data 
Year Produced 2010 
Provided To Others? Yes  
Impact These SKOS representations fed into the publication of the EH, RCAHMS and RCAHMW vocabularies as Linked Data in the SENESCHAL project, published at https://www.heritagedata.org/blog/ (see also REF 2014 Impact Case Study https://impact.ref.ac.uk/casestudies/CaseStudy.aspx?Id=27425) 
URL http://hypermedia.research.southwales.ac.uk/resources/STAR-data-outputs.html
 
Description English Heritage 
Organisation English Heritage
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution STAR, STELLAR and SENESCHAL outcomes made a significant contribution to EH strategic objectives in digital heritage and vocabulary management and standards. In STAR, the collaboration with English Heritage (EH) on digital archaeology has been interdisciplinary. EH have seen direct benefit, both to their datasets and the wider exposure of their thesauri via the terminology services and the implementation of their extension to the CIDOC CRM ontology for archaeology. The following remarks on the project's significance are contributed by EH. "One key outcome of STAR has been the development and dissemination of the EH ontological modelling (referred to by the project as the CRM-EH) in RDF, which otherwise would have been unlikely to have happened, and certainly not as soon. Another outcome from STAR has been the enhanced awareness of the CRM-EH and its ontological basis in the CIDOC CRM across wider cultural heritage and related IT sectors which has been significantly increased through the various STAR project publications, workshops and project team attendance at conferences and presentations. This has helped EH in promoting the potential use of standards like CIDOC CRM, SKOS and Thesauri for developing interoperability in the sector. Conversion of the EH Thesauri into SKOS format would have been very unlikely to happen so succinctly and effectively without the R&D expertise provided by partnership with Glamorgan Uni. This is a major benefit and considerable technological step forward for our sector that will have benefits across and beyond the heritage sector where the EH thesauri and related terminologies are the most widely used resource of their type. An example of this is the development by the STAR project of the SKOS terminology web services, particularly for other related resources such as ADS, which will most likely facilitate enhancement of the thesauri in SKOS format for the OASIS pan-UK online archaeological reporting system."
In SENESCHAL, we (and the vocabulary partners in the SENESCHAL project) published as (SKOS) Linked Data the nationally recognised cultural heritage thesauri standards from English Heritage, the Royal Commission on the Ancient and Historical Monuments of Scotland and the Royal Commission on the Ancient and Historical Monuments of Wales. This includes concepts widely used for indexing relating to monument types, archaeological events and time periods. The significance is that previously the vocabulary providers lacked the ability to facilitate uniquely identified semantic indexing of data. Major thesauri can act as vocabulary hubs for the Web of Data (as suggested by the W3C Library Linked Data Incubator Group). For example, the availability of the Thesaurus of Monument Terms in this way is seen as a major development for the ADS archive metadata Linked Data. This Linked Data publication of the English Heritage thesauri is a significant development in their vocabulary standards practice and their information access strategy. The potential reach is wide, since it is a core activity of ADS, English Heritage and the Royal Commissions on the Ancient and Historical Monuments of Scotland and of Wales to promote and disseminate best practice to the heritage sectors, as well as providing guidance on appropriate data standards including thesauri. The linked data vocabularies and web services will be integrated into the widely used ADS reporting/archiving tool, OASIS, which is in near universal use by commercial and local government archaeologists. Adoption of linked data based vocabulary management in this tool will immediately affect how all sectors engage in archaeological field practice and development control planning. We represented the English Heritage archaeological extension to the CRM ontology in RDF and as Linked Data. This allowed it to be a key ontology hub in the ADS archaeology Linked Data. 
This is another important step in English Heritage's strategic plans for information standards.
Collaborator Contribution STAR - English Heritage. The collaboration with English Heritage was very significant to the whole research project and absolutely necessary. Although there was no formal agreement the collaboration was planned and detailed in the Proposal - EH effectively acted English Heritage. UK non Research Organisation Keith May of English Heritage was a key member of the project team and project management. His contributions included design of the CRM-EH ontology, intellectual mapping of datasets to CRM-EH, writing and presenting outcomes, etc.
Impact General STAR, STELLAR, SENESCHAL project outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html http://hypermedia.research.southwales.ac.uk/kos/stellar/ http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/ http://www.heritagedata.org/
Start Year 2006
 
Description Museum of London Archaeology (MOLA) 
Organisation Museum of London Archaeology
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution STAR used an extract from the MOLA database in the project for semantic integration and the final demonstrator
Collaborator Contribution Museum of London Archaeology (MOLA) made datasets available and hosted a project meeting, giving early feedback
Impact General STAR outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html
Start Year 2007
 
Description The Archaeology Data Service (ADS) 
Organisation University of York
Department Archaeology Data Service (ADS)
Country United Kingdom 
Sector Academic/University 
PI Contribution In STAR: semantic information extraction from the ADS OASIS grey literature library, and a research demonstrator of semantic integration of archaeological datasets and grey literature reports, very relevant to ADS research strategy. Collaboration continues through two other AHRC grants and the FP7 project ARIADNE. ADS were Co-Investigators with us in the AHRC projects STELLAR and SENESCHAL. In STELLAR, they used the STELLAR tools to map and extract CRM-based RDF and published Linked Data.
Collaborator Contribution The Archaeology Data Service (ADS) provided the extract of OASIS grey literature reports for the STAR NLP work. ADS hosted the final STAR workshop and also hosted a joint STAR/ArcheoTools project workshop which was very helpful in the early stages of the project. ADS were Co-Investigators in STELLAR. They used the STELLAR tools to map and extract CRM-based RDF and published Linked Data. The research on semantic data integration (STELLAR) provided tools and techniques that enabled the Archaeology Data Service (ADS http://archaeologydataservice.ac.uk/) to extract and publish Linked Data from major commercial archaeology units' excavation datasets, integrated semantically via mapping to the CIDOC CRM ontology. It is envisaged this will serve as a catalyst for further production of archaeological Linked Data by ADS and others. Building on this work, we are leading the FP7 ARIADNE archaeology e-infrastructure Work Package, Linking Archaeology Data. The research enabled the first foray into Linked Data by ADS (non-specialists in semantic technologies) and represents a major development in practice and capability by ADS and in UK archaeological data publication. It has generated considerable attention. The significance also derives from the importance of the published datasets and the exemplar. The Linked Data includes datasets drawn from the Channel Tunnel Rail Link and the Aggregates Levy Sustainability Fund, major archaeological programmes with excavations undertaken by two of the largest commercial units in England (Oxford Archaeology Ltd and Wessex Archaeology Ltd). Other datasets included an excavation database with details of the earliest ironworking yet known in Britain. As the only record of unrepeatable fieldwork, it is essential that these data are preserved and made available for re-use and re-interpretation. 
ADS were also Co-Is in SENESCHAL, made use of the SENESCHAL services in their content management system and actively participated throughout the project.
Impact STAR, STELLAR, SENESCHAL project outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html http://hypermedia.research.southwales.ac.uk/kos/stellar/ http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/ http://www.heritagedata.org/
Start Year 2007
 
Description York Archaeological Trust (YAT) 
Organisation York Archaeological Trust
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution An extract from the YAT database formed part of the STAR project and final Demonstrator
Collaborator Contribution York Archaeological Trust (YAT) supplied one of the datasets used in the Demonstrator and also participated in project workshops, giving valuable feedback.
Impact General STAR outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html
Start Year 2007
 
Title OPTIMA 
Description Semantic information extraction: the outcome of Andreas Vlachidis's PhD thesis. Information about the NLP techniques, grey literature corpus and tools is available from the Andronikos portal http://www.andronikos.co.uk/. The NLP information extraction pipeline (OPTIMA) is based on the GATE toolkit and automatically extracts relevant CRM (and CRM-EH) entities and relationships. A full evaluation is reported in the thesis (Vlachidis 2012, Vlachidis et al. 2013). The OPTIMA information extraction tools, covering named entity recognition (NER) and relation extraction (RE), are open source and freely available from http://sourceforge.net/projects/optimacidoc/ 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact STAR project Demonstrator http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html 
URL http://sourceforge.net/projects/optimacidoc/
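
The kind of entity annotation described for OPTIMA can be illustrated with a far simpler gazetteer lookup. The sketch below is only an illustration under stated assumptions: it does not use GATE, and the term lists, entity type names and function names are hypothetical rather than drawn from the OPTIMA resources.

```python
import re

# Hypothetical gazetteer: entity type -> terms (invented for illustration).
GAZETTEER = {
    "Find": ["pottery", "coin", "brooch"],
    "Period": ["Roman", "Iron Age", "medieval"],
    "Context": ["pit", "ditch", "posthole"],
}


def build_pattern(terms):
    """Compile a whole-word, case-insensitive pattern matching any of the terms."""
    # Longer terms first, so multi-word terms win over any shorter prefix.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {etype: build_pattern(terms) for etype, terms in GAZETTEER.items()}


def annotate(text):
    """Return (start, end, entity_type, matched_text) tuples in document order."""
    annotations = []
    for etype, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            annotations.append((match.start(), match.end(), etype, match.group(0)))
    return sorted(annotations)
```

Applied to a sentence such as "Roman pottery was recovered from the ditch.", `annotate` would tag "Roman" as a Period, "pottery" as a Find and "ditch" as a Context. Extracting relationships between such entities would need further rules that this sketch does not attempt.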