Semantic Technologies Enhancing Links and Linked data for Archaeological Resources (STELLAR)

Lead Research Organisation: University of South Wales
Department Name: Faculty of Advanced Technology

Abstract

The Archaeology Data Service (ADS) was established in 1996 and has developed into a national repository for all digital data from the UK historic environment sector, crosscutting the academic and public and private sectors. The ADS provides online access to over one million metadata records covering the archaeology of England, Scotland, Wales and Northern Ireland. These are brokered on behalf of national government agencies, local government Historic Environment Records, and amenity and period societies and other specialist databases. The ADS has a mandate from the AHRC, NERC and English Heritage, amongst others, to provide a digital repository for all outputs from research they fund. Archaeology globally is seeing increasing use of the Web to disseminate data and the ADS is at the forefront of trying to make such data coherent and cross-searchable.

Notwithstanding these impressive operational achievements, the current situation is one of fragmented datasets and applications, with different terminology systems. The AHRC-funded STAR (Semantic Technologies for Archaeological Resources) project, in collaboration with English Heritage (EH), aimed to address these concerns by utilising semantic terminology tools to link digital archive databases, vocabularies and the associated grey literature, exploiting natural language processing techniques and the potential of a high-level core ontology: the CIDOC Conceptual Reference Model (CRM, the ISO 21127 standard), extended for archaeological purposes by May and others at EH. The STAR system cross searches over excavation datasets from different archaeological database schemas. The project is now in its concluding phase and various outcomes have been developed, including a cross search and browse demonstrator, a data mapping/extraction tool and a set of semantic web services.

Recent Semantic Web developments in 'Linked Data' techniques emerged after the STAR plans were set. The Linked Data initiative is a move towards the Semantic Web vision of a 'web of data'. Content is made available in RDF, addressed via virtual but persistent URIs that allow HTTP clients to 'negotiate' their preferred representation of the content. This facilitates data reuse by RDF-aware applications and services. This technology is very topical, as seen in the announcement of recent initiatives by both the US and UK governments to make their data available as linked data.
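The negotiation step described above can be illustrated with a minimal Python sketch. It is not taken from any ADS or STAR software: the function names and the table of supported formats are assumptions made for illustration, and a production server would normally rely on its web framework for this. The sketch reads an HTTP Accept header and picks the representation the client prefers.

```python
# Hypothetical sketch of HTTP content negotiation for a Linked Data URI.
# The media types are standard; SUPPORTED, parse_accept and negotiate are
# illustrative names, not part of any project software.

SUPPORTED = {
    "application/rdf+xml": "rdf",
    "text/turtle": "ttl",
    "text/html": "html",
}


def parse_accept(header):
    """Return (media_type, q) pairs sorted by descending q-value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1.0 when the client gives no weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept_header, default="text/html"):
    """Pick the best supported media type for the given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    # Simplification: fall back to the default instead of answering 406
    return default
```

Under these assumptions an RDF-aware client sending "Accept: application/rdf+xml" would be given RDF/XML, while a typical browser header would resolve to HTML.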

This proposal seeks to build on the major development of dataset resources at ADS together with the STAR project outcomes, in collaboration with Project Partners English Heritage. Major gains in accessibility, impact and long-term sustainability are possible by leveraging new possibilities for Linked Data and enhancing STAR tools for third party use, so that they can be used 'in the wild' (they currently require specialist knowledge).

The aim of the research is to enhance the discoverability, accessibility, impact and sustainability of ADS datasets and STAR project outcomes (services and data resources) by enhancing the interoperability between resources using the latest integration technologies and development of semantic search facilities and associated user interfaces.

Best practice guidelines and tools will be developed both for mapping/extracting archaeological data as RDF and for generating archaeological Linked Data. To this end, third party data providers will use the tools developed by the project to map and extract archaeological datasets into RDF/XML representation conforming to the CIDOC CRM-EH standard ontology. These datasets will be generated as Linked Data. Evaluation will consider both the mapping and linked data generation exercises, taking account of technical and pragmatic issues.


Planned Impact

The proposed research will benefit a wide range of users and non-academic organisations, both within the archaeological domain and wider to encompass digital library, museum, archive, government and other communities.

The creation and/or enhancement of STAR tools in the STELLAR project will benefit various groups. The first group to benefit will be the ADS user community. This comprises a broad range of users, with the ADS focus being on serving the needs of Higher Education researchers. Other user groups that regularly use ADS resources include national and local government archaeologists and cultural heritage managers, museum archaeologists, commercial archaeologists and members of the public. STELLAR will enhance these audiences' access to ADS and MoLAS resources so that the content is more discoverable, more accessible and, ultimately, more usable for research purposes. Improved discoverability and access through RDF and Linked Data could lead to the formulation of new research questions that have previously been impossible to answer due to the siloed nature of the required data sources.

Project partners, English Heritage, will also see direct benefit, both to their own datasets and vocabularies and also as regards the refinement of their extension to the CIDOC CRM ontology for archaeology.

The archaeological data creation and data curation community will also benefit from STELLAR outputs. A best practice methodology for data extraction and mapping, along with a mapping tool, will greatly enhance the ability of data creators to participate in enhanced discovery and linking scenarios facilitated by Linked Data. Commercial archaeology units can similarly benefit from wider exposure of their data and the ability to cross search across different datasets.

The project team is actively involved in a number of European and transatlantic projects whose objectives coincide with those of STELLAR. This means that cross-fertilisation and collaboration with such ADS projects as TAG (Transatlantic Archaeological Gateway) and the ESFRI DARIAH (ARENA2) is likely. The ADS is well placed to ensure that these projects, which are looking specifically at cross-searching and aggregation in an archaeological context, benefit from the guidance, methodology and tools developed under STELLAR.

The methods and techniques developed are not confined to the archaeology domain and can be applied more generally. They are particularly relevant to digital libraries, museums and potentially also to archives. The CIDOC CRM is an ISO Standard with particular relevance to the museum and library communities. Glamorgan belongs to the Europeana FP7 Thematic Network; the European Digital Library project, at the broadest level, shares similar goals and will potentially benefit from methods, guidelines and techniques generated by the work. The work can also potentially benefit various JISC communities, where the project team has good contacts and previous collaborations.

The recent interest in Linked Data by government circles associated with an open data approach means that the general methods, guidelines and experience will be of benefit to the ongoing efforts in these areas.

The ADS and Glamorgan will engage in the dissemination activities outlined in the proposal. In addition the ADS engages in a series of regular dissemination activities under the direction of the ADS's User Services Manager (STELLAR Co-I). These activities include 2 hard copy newsletters per annum with a circulation of around 2000, regular RSS feed updates and the utilisation of jiscmail email lists (ads-all). The project team regularly presents at national and international conferences (conference attendance specifically for STELLAR is included in this proposal) and engages in an extensive programme of visits and g

Publications

 
Description The benefits for semantic interoperability of mapping and extracting archaeological datasets to an integrating conceptual framework are widely recognized. However, achieving mappings in practice has required specialist knowledge of the ontology and has been resource intensive. STELLAR (Semantic Technologies Enhancing Links and Linked data for Archaeological Resources) aims to make it easier for archaeology data owners who are not ontology specialists to express their excavation data consistently in terms of a core ontology and generate linked data. The main deliverables are the freely available mapping/extraction tools and a published set of linked data.

The current situation is one of fragmented datasets and applications, with different terminology systems. The AHRC funded
STAR (Semantic Technologies for Archaeological Resources) project, in collaboration with English Heritage (EH),
addressed these concerns, exploiting the potential of a high level, core ontology - the CIDOC Conceptual Reference Model
(CRM), extended for archaeological purposes. The STAR system cross searches over excavation datasets from different
archaeological database schemas. However, the STAR data mapping and extraction to the CRM ontology was performed
by project team members.

STELLAR generalises and extends the data extraction tools produced by STAR to facilitate their adoption by third party data providers. The project is a collaboration between the University of Glamorgan and the Archaeology Data Service at the University of York, with EH as project partners. The extracted data is represented in standard RDF formats that allow the datasets to be cross searched and linked by a variety of Semantic Web tools, following a Linked Data approach. STELLAR tools work from a set of templates that express commonly occurring patterns. A user chooses a template for a particular data pattern and supplies the corresponding input from their database. In addition to the original ontology (the CRM-EH archaeological extension to CIDOC CRM), more general CIDOC CRM templates conforming to the CLAROS Project and a SKOS format for glossaries/thesauri are extra outcomes.
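The template idea can be sketched in a few lines of Python. This is an illustration only and does not reproduce the STELLAR template format: the template names, the example.org vocabulary and the function apply_template are all hypothetical, and the real tools target CRM-EH, CIDOC CRM and SKOS rather than the placeholder properties used here. Each hypothetical template names a class and a set of optional columns; a row must supply an identifier, which is appended to a user-supplied namespace to form the subject URI.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/ontology#"  # placeholder vocabulary, not CRM-EH

# Hypothetical templates: a class URI plus optional column -> predicate pairs.
TEMPLATES = {
    "context": {
        "class": EX + "Context",
        "optional": {"label": RDFS_LABEL, "note": EX + "note"},
    },
    "find": {
        "class": EX + "Find",
        "optional": {"label": RDFS_LABEL, "material": EX + "material"},
    },
}


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def apply_template(name, row, namespace):
    """Turn one database row into N-Triples lines using the named template."""
    template = TEMPLATES[name]
    record_id = (row.get("id") or "").strip()
    if not record_id:
        raise ValueError("the row must supply an id")
    # The identifier is percent-encoded and appended to the namespace
    subject = "<" + namespace + name + "/" + quote(record_id, safe="") + ">"
    class_uri = template["class"]
    triples = [f"{subject} <{RDF_TYPE}> <{class_uri}> ."]
    for column, predicate in template["optional"].items():
        value = (row.get(column) or "").strip()
        if value:  # optional elements are emitted only when present
            literal = escape_literal(value)
            triples.append(f'{subject} <{predicate}> "{literal}" .')
    return triples
```

For example, apply_template("find", {"id": "SF 12", "material": "iron"}, "http://example.org/site1/") yields one type triple and one material triple for the subject http://example.org/site1/find/SF%2012.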

Two tools have been produced. STELLAR.Console is a downloadable command line utility application that imports
delimited tabular data (TAB, CSV) files to an internal database where they can be queried using SQL. STELLAR.Web is a
simpler browser-based application that produces RDF directly from CSV data. Both tools express outputs in a form suitable
for linked data representation. The tools are freely available with guidelines and tutorials
(http://hypermedia.research.glam.ac.uk/resources/STELLAR-applications/).
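The import-then-query workflow described for STELLAR.Console can be approximated with Python's standard library. The sketch below is not the STELLAR.Console implementation, whose internals are not described here; the function load_delimited, the table name and the sample data are assumptions for illustration. It loads tab-delimited text into an in-memory SQLite database and runs an SQL query over it.

```python
import csv
import io
import sqlite3


def load_delimited(conn, table, text, delimiter=","):
    """Create a table from delimited text whose first row holds column names.

    Simplification: column names are assumed not to contain double quotes
    and every data row is assumed to have as many fields as the header.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


# Made-up sample data standing in for an exported excavation table
SAMPLE = "context_id\tcategory\n1001\tpit\n1002\tditch\n1003\tpit\n"

conn = sqlite3.connect(":memory:")
load_delimited(conn, "contexts", SAMPLE, delimiter="\t")

# Once imported, the data can be selected and reshaped with ordinary SQL
pits = conn.execute(
    "SELECT context_id FROM contexts WHERE category = ? ORDER BY context_id",
    ("pit",),
).fetchall()
```

Querying with SQL before conversion lets a data owner select or join only the columns that a given output pattern needs.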

In response to user feedback, in addition to the internal templates, a capability has been added for external user defined
templates to allow tailoring for specific user requirements and a wider set of ontologies, without rebuilding the STELLAR
tools. STELLAR user defined external templates generalise the techniques to facilitate data conversion to any user-defined
textual form.

The ADS, in its role as a digital repository, selected archived datasets for conversion to RDF. The resulting linked data were ingested into a repository (triple store), which provides a SPARQL endpoint for consumption by a number of semantic technologies including Pubby (an open source linked data front end). Content negotiation presents data in formats appropriate for the requesting application (e.g. RDF/XML for applications, HTML for browsers). A range of datasets were processed, covering a broad representative sample of archaeological excavation datasets. As ADS intends to continue to explore the use of linked data technologies, effort was devoted to ensuring that URI construction was appropriate for the domain. The linked data outputs (and the frontend) are available from the ADS website (http://data.archaeologydataservice.ac.uk).
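A client can query a SPARQL endpoint of this kind over plain HTTP. The Python sketch below builds such a request following the SPARQL 1.1 Protocol, which passes the query text in a "query" parameter. The endpoint address is a placeholder, since the actual path of the ADS endpoint is not given here, and the query is a generic one that assumes nothing about the published data. The request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address: the real endpoint path is an assumption left open
ENDPOINT = "http://example.org/sparql"

# Generic query that lists a few triples without assuming any vocabulary
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
# To run it against a live endpoint:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       body = response.read()
```

The Accept header is the same content negotiation mechanism described for the Linked Data front end, applied here to the query results format.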
Exploitation Route see Narrative impact section here.
The STELLAR tools are open source and freely available.

The online dissemination of datasets to accompany site monographs and summary documentation is becoming common practice within the archaeology domain. Since the legacy database schemas involved are often created on a per-site basis, cross searching or reusing this data remains difficult. Employing an integrating ontology, such as the CIDOC CRM, is one step towards resolving these issues. However, this has tended to require computing specialists with detailed knowledge of the ontologies involved. STELLAR provides lightweight tools to make it easier for non-specialists to publish Linked Data. Applications developed for the STELLAR project were applied by archaeologists to major excavation datasets and the resulting output was published as Linked Data, conforming to the CIDOC CRM ontology.
Sectors Creative Economy,Digital/Communication/Information Technologies (including Software),Culture, Heritage, Museums and Collections

URL http://hypermedia.research.southwales.ac.uk/kos/stellar/
 
Description The research on semantic data integration (STELLAR) provided tools and techniques that enabled the Archaeology Data Service (ADS http://archaeologydataservice.ac.uk/) to extract and publish Linked Data from major commercial archaeology units' excavation datasets, integrated semantically via mapping to the CIDOC CRM ontology. It is envisaged this will serve as a catalyst for further production of archaeological Linked Data by ADS and others. Building on this work, we are leading the FP7 ARIADNE archaeology e-infrastructure Work Package, Linking Archaeology Data. The research enabled the first foray into Linked Data by ADS (non-specialists in semantic technologies) and represents a major development in practice and capability by ADS and in UK archaeological data publication. It has generated considerable attention: from June 2012, over roughly 12 months, 41,110 requests were made to the SPARQL endpoint for the Linked Data, an average of approximately 3,425 requests per month. Lee (2012) refers positively to the non-specialist STELLAR tools in an English Heritage practitioner article. The significance also derives from the importance of the published datasets and the exemplar. The Linked Data includes datasets drawn from the Channel Tunnel Rail Link and the Aggregates Levy Sustainability Fund, major archaeological programmes with excavations undertaken by two of the largest commercial units in England (Oxford Archaeology Ltd and Wessex Archaeology Ltd). Other datasets included an excavation database with details of the earliest ironworking yet known in Britain. As the only record of unrepeatable fieldwork, it is essential that these data are preserved and made available for re-use and re-interpretation. Commercial archaeology units benefit from wider exposure of their data, from the ability to cross search across datasets from different units and from the reuse of data (with potential economic benefit).
The reach is amplified by the key strategic role played by the ADS nationally and internationally. The ADS is a national repository for digital data from the UK historic environment sector, crosscutting the academic and public and private sectors. It provides online access to over one million metadata records on behalf of national government agencies, local government Historic Environment Records, and amenity and period societies and other specialist databases. The ADS user community includes national and local government archaeologists and cultural heritage managers, museums and commercial archaeologists and members of the public. The Deutsches Archäologisches Institut have used STELLAR research tools to make a SKOS version of the DAI Archaeological Thesaurus (German language).
First Year Of Impact 2011
Sector Digital/Communication/Information Technologies (including Software),Culture, Heritage, Museums and Collections
Impact Types Cultural,Societal,Economic

 
Description AHRC Follow on Fund
Amount £76,000 (GBP)
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 03/2013 
End 02/2014
 
Description EC FP7 Infrastructures Grant: ARIADNE (Advanced Research Infrastructure for Archaeological Dataset Networking in Europe)
Amount £205,000 (GBP)
Funding ID 313193 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 02/2013 
End 01/2017
 
Description H2020 Programme
Amount € 6,597,368 (EUR)
Funding ID H2020-INFRAIA-2018-1-823914 
Organisation European Commission H2020 
Sector Public
Country Belgium
Start 01/2019 
End 12/2022
 
Description Heritage Protection Commissions grants
Amount £28,757 (GBP)
Funding ID LD4HE 
Organisation Historic England 
Sector Public
Country United Kingdom
Start 01/2019 
End 06/2020
 
Title Archaeological Excavation datasets (commercial and HE) converted to RDF via STELLAR tools and ingested into the data.archaeologydataservices.ac.uk Linked Data Repository. 
Description Archaeological Excavation datasets (commercial and HE) converted to RDF via STELLAR tools and ingested into the data.archaeologydataservices.ac.uk Linked Data Repository 
Type Of Material Database/Collection of data 
Year Produced 2011 
Provided To Others? Yes  
Impact Various impacts outlined in REF 2014 Impact Case Study https://impact.ref.ac.uk/casestudies/CaseStudy.aspx?Id=27425 
URL https://archaeologydataservice.ac.uk/research/stellar.xhtml
 
Description English Heritage 
Organisation English Heritage
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution STAR, STELLAR and SENESCHAL outcomes made a significant contribution to EH strategic objectives in digital heritage and vocabulary management and standards. In STAR, the collaboration with English Heritage (EH) on digital archaeology has been interdisciplinary. EH have seen direct benefit, both to their datasets and the wider exposure of their thesauri via the terminology services and the implementation of their extension to the CIDOC CRM ontology for archaeology. The following remarks on the project's significance are contributed by EH. "One key outcome of STAR has been the development and dissemination of the EH ontological modelling (referred to by the project as the CRM-EH) in RDF, which otherwise would have been unlikely to have happened, and certainly not as soon. Another outcome from STAR has been the enhanced awareness of the CRM-EH and its ontological basis in the CIDOC CRM across wider cultural heritage and related IT sectors which has been significantly increased through the various STAR project publications, workshops and project team attendance at conferences and presentations. This has helped EH in promoting the potential use of standards like CIDOC CRM, SKOS and Thesauri for developing interoperability in the sector. Conversion of the EH Thesauri into SKOS format would have been very unlikely to happen so succinctly and effectively without the R&D expertise provided by partnership with Glamorgan Uni. This is a major benefit and considerable technological step forward for our sector that will have benefits across and beyond the heritage sector where the EH thesauri and related terminologies are the most widely used resource of their type. An example of this is the development by the STAR project of the SKOS terminology web services, particularly for other related resources such as ADS, which will most likely facilitate enhancement of the thesauri in SKOS format for the OASIS pan-UK online archaeological reporting system."
In SENESCHAL, we (and the vocabulary partners in the SENESCHAL project) published as (SKOS) Linked Data the nationally recognised cultural heritage thesauri standards from English Heritage, the Royal Commission on the Ancient and Historical Monuments of Scotland and the Royal Commission on the Ancient and Historical Monuments of Wales. This includes concepts widely used for indexing relating to monument types, archaeological events and time periods. The significance is that previously the vocabulary providers lacked the ability to facilitate uniquely identified semantic indexing of data. Major thesauri can act as vocabulary hubs for the Web of Data (as suggested by the W3C Library Linked Data Incubator Group). For example, the availability of the Thesaurus of Monument Terms in this way is seen as a major development for the ADS archive metadata Linked Data. This Linked Data publication of the English Heritage thesauri is a significant development in their vocabulary standards practice and their information access strategy. The potential reach is wide since it is a core activity of ADS, English Heritage and the Royal Commissions on the Ancient and Historical Monuments of Scotland/Wales to promote and disseminate best practice to the heritage sectors, as well as providing guidance on appropriate data standards including thesauri. The linked data vocabularies and web services will be integrated into the widely used ADS reporting/archiving tool, OASIS, which is in near universal use by commercial and local government archaeologists. Adoption of linked data based vocabulary management in this tool will immediately affect how all sectors engage in archaeological field practice and development control planning. We represented the English Heritage archaeological extension to the CRM ontology in RDF and as Linked Data. This allowed it to be a key ontology hub in the ADS archaeology Linked Data.
This is another important step in English Heritage's strategic plans for information standards.
Collaborator Contribution STAR - English Heritage. The collaboration with English Heritage was very significant to the whole research project and absolutely necessary. Although there was no formal agreement, the collaboration was planned and detailed in the Proposal - EH effectively acted as a project partner (English Heritage, a UK non-research organisation). Keith May of English Heritage was a key member of the project team and project management. His contributions included design of the CRM-EH ontology, intellectual mapping of datasets to CRM-EH, writing and presenting outcomes, etc.
Impact General STAR, STELLAR, SENESCHAL project outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html http://hypermedia.research.southwales.ac.uk/kos/stellar/ http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/ http://www.heritagedata.org/
Start Year 2006
 
Description Museum of London Archaeology (MOLA) 
Organisation Museum of London Archaeology
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution STAR used an extract from the MOLA database in the project for semantic integration and the final demonstrator.
Collaborator Contribution Museum of London Archaeology (MOLA) made datasets available and hosted a project meeting, giving early feedback
Impact General STAR outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html
Start Year 2007
 
Description The Archaeology Data Service (ADS) 
Organisation University of York
Department Archaeology Data Service (ADS)
Country United Kingdom 
Sector Academic/University 
PI Contribution In STAR - Semantic information extraction from the ADS OASIS grey literature library. Research demonstrator of semantic integration of archaeological datasets and grey literature reports, very relevant to ADS research strategy. Continuing collaboration through two other AHRC grants and the FP7 project ARIADNE. ADS were Co-Investigators with us in the AHRC projects STELLAR and SENESCHAL. In STELLAR, they used the STELLAR tools to map and extract CRM-based RDF and published Linked Data.
Collaborator Contribution The Archaeology Data Service (ADS) provided the extract of OASIS grey literature reports for the STAR NLP work. ADS hosted the final STAR workshop and also hosted a joint STAR/ArcheoTools project workshop which was very helpful in the early stages of the project. The Archaeology Data Service (ADS) were Co-Investigators in STELLAR. They used the STELLAR tools to map and extract CRM-based RDF and published Linked Data. The research on semantic data integration (STELLAR) provided tools and techniques that enabled the Archaeology Data Service (ADS http://archaeologydataservice.ac.uk/) to extract and publish Linked Data from major commercial archaeology units' excavation datasets, integrated semantically via mapping to the CIDOC CRM ontology. It is envisaged this will serve as a catalyst for further production of archaeological Linked Data by ADS and others. Building on this work, we are leading the FP7 ARIADNE archaeology e-infrastructure Work Package, Linking Archaeology Data. The research enabled the first foray into Linked Data by ADS (non-specialists in semantic technologies) and represents a major development in practice and capability by ADS and in UK archaeological data publication. It has generated considerable attention. The significance also derives from the importance of the published datasets and the exemplar. The Linked Data includes datasets drawn from the Channel Tunnel Rail Link and the Aggregates Levy Sustainability Fund, major archaeological programmes with excavations undertaken by two of the largest commercial units in England (Oxford Archaeology Ltd and Wessex Archaeology Ltd). Other datasets included an excavation database with details of the earliest ironworking yet known in Britain. As the only record of unrepeatable fieldwork, it is essential that these data are preserved and made available for re-use and re-interpretation.
ADS were also Co-Is in SENESCHAL, made use of the SENESCHAL services in their content management system and actively participated throughout the project.
Impact STAR, STELLAR, SENESCHAL project outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html http://hypermedia.research.southwales.ac.uk/kos/stellar/ http://hypermedia.research.southwales.ac.uk/kos/SENESCHAL/ http://www.heritagedata.org/
Start Year 2007
 
Description York Archaeological Trust (YAT)
Organisation York Archaeological Trust
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution An extract from the YAT database formed part of the STAR project and final Demonstrator
Collaborator Contribution York Archaeological Trust (YAT) supplied one of the datasets used in the Demonstrator and also participated in project workshops, giving valuable feedback.
Impact General STAR outcomes and outputs http://hypermedia.research.southwales.ac.uk/kos/star/ http://intarch.ac.uk/journal/issue30/tudhope_index.html
Start Year 2007
 
Title STELLAR project templates and tools 
Description Semantic Technologies Enhancing Links and Linked data for Archaeological Resources (STELLAR) http://hypermedia.research.southwales.ac.uk/kos/stellar/ STELLAR generalised and extended the data extraction tools produced by the earlier STAR project to facilitate their adoption by third party data providers. The extracted data is represented in standard RDF formats that allow the datasets to be cross searched and linked by a variety of Semantic Web tools, following a Linked Data approach. The aim is to make it easier for data owners who are not ontology specialists to express their excavation data in terms of the CIDOC CRM ontology (and CRM-EH archaeological extension) and to generate semantic / linked data representations. The STELLAR tools convert archaeological data to RDF in a consistent manner, without requiring detailed knowledge of the underlying ontology. The current set of templates corresponds to the general aim of cross searching excavation datasets for inter-site analysis and comparison. Different templates that drew on other areas of the ontology could be designed for other purposes. Each template is a combination of various optional elements with a mandatory ID. The ID is prefixed with a namespace (a tool parameter) to generate URIs. To generate RDF, the user chooses a template for a particular data pattern and supplies the corresponding input from their database. In addition to CRM-based templates, there is a template allowing a glossary/thesaurus connected with the dataset to be expressed in SKOS. The CRM templates have elements, giving the (preferred) option of expressing controlled data items as SKOS URIs (either to local vocabularies generated by the SKOS template, or to Linked Data publications of a major SKOS vocabulary). The STELLAR templates are available from the Project website, along with the tools that operate over the templates. Documentation and tutorials are also available. 
The STELLAR tools are open source and freely available from https://github.com/cbinding/stellar 
Type Of Technology Webtool/Application 
Year Produced 2011 
Impact ADS (Archaeology Data Service) employed the STELLAR tools to publish a significant selection of excavation archives as Linked Data http://data.archaeologydataservice.ac.uk/. STELLAR tools have been used by the DAI to skosify an archaeological thesaurus http://c4tc.wordpress.com/2012/10/08/skosifying-an-archaeological-thesaurus/ and in the Colonisation of Britain Project, commissioned by English Heritage http://www.archaeogeomancy.net/2014/05/colonisation-of-britain/.
URL https://github.com/cbinding/stellar
 
Description Digital Past 2013 (Semantic Technologies and Linked Data) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Digital Past is a two day conference which showcases innovative digital technologies for data capture, interpretation and dissemination of heritage sites and artefacts. We led a workshop on the potential of Semantic Technologies and Linked Data. It stimulated discussion, questions and interest, leading to further contacts.

General raised awareness and interest and generated contacts
Year(s) Of Engagement Activity 2012,2013
URL http://www.heritagedata.org/blog/spreading-the-word/