Digging into Early Colonial Mexico: A large-scale computational analysis of 16th century historical sources

Lead Research Organisation: Lancaster University
Department Name: History

Abstract

The 'Colonisation of America' is a fundamental process in the history of the modern world. Along with archaeological remains, the historical writings related to the establishment of the so-called Virreinatos constitute primary sources of information for the understanding of this period. An extended compilation of information ordered by the Spanish crown in the 16th century, called Relaciones Geográficas, served to gather vast amounts of information about the New World through multiple records and descriptions, both in Spanish and indigenous style. Traditional research of these documents has relied on the close reading of a handful of these texts, which can take the scholar a life-time to examine. Using a Big-Data approach, this project will apply for the first time ground-breaking computational methodologies to study one of the most important sources for the colonial history of America, and it will identify, extract, cross-link, and analyse information of vital importance to historical enquiry. Our highly interdisciplinary team will combine techniques from different disciplines, including Corpus Linguistics, Text Mining, Natural Language Processing, Machine Learning, and Geographic Information Systems, to address questions related to the recording of information about indigenous cultures, the Spanish exploration of indigenous social and religious concepts, the appropriation and ideas about place and space in the indigenous world, and their attitudes towards politics and economy. In doing so, the project will transform the way historical sources and large corpora are approached and analysed by modern scholars.

Planned Impact

While the project will have significant value for established practitioners of history, archaeology, anthropology, linguistics, digital humanities, and computer science working in a range of subdisciplines, we are strongly committed to forming a skill base of new academics with a robust history of collaborative work and an interdisciplinary mindset. We believe that combining strategies in education, networking, and publicity will generate a positive impact and encourage a broad audience to adopt interdisciplinary methods and relations. We have identified a number of ways to do this: the project will employ research associates and post-doctoral researchers; it will offer a PhD studentship; and it will organise short courses, local seminars, and expert workshops. One studentship in Computer Science will be offered by the project and is described in section a of the PMDC. In addition, the project will employ three research associates working in Corpus Linguistics/NLP, two senior research associates (one in Computational Linguistics and one in Spatial Humanities), one post-doctoral researcher in NLP and Computational Social Science, and two postgraduate researchers specialised in Mexican Colonial History. All of them will work in a highly interdisciplinary environment and will develop considerable skills in their own fields, as well as in Digital Humanities. Where appropriate, they will contribute to publications either as co-authors or on their own, and the team leaders will actively support their academic development.
The Universities of Lisbon and Chester will contribute, at no cost to the project, with the organisation of two courses each year, one in Spatial Humanities (Chester), and another in NLP and Machine Learning (a summer school in Lisbon). In addition, a focused conference and extended workshop in computational approaches to historical texts will be organised in year 2 by our team in Mexico. The PI and the team leaders have extensive experience delivering courses in these topics, which have a huge demand across Europe and beyond.
The results of the project will be disseminated to students and an academic audience via workshops, papers in conferences, at least five articles in international peer-reviewed journals, and multiple contributions to other possible book collections. The outputs will target journals from a range of disciplines (archaeology, history, linguistics, computer science, geography, digital humanities) and period-specialisation, looking to reach a broad and diverse academic audience. In order to disseminate our findings to stakeholders beyond the academy, including heritage managers, librarians, educators, and the wider public, the team will contribute to public lecture series and will engage actively with social media, the project's blog, and media interviews. Our project will therefore enhance both academic and public appreciation of the possibilities in the use of data and the development of digital technologies and methods for a wide range of research in Humanities fields. From the start of the project, our aims and emerging results will be communicated to the public on a website maintained by the University of Chester. This will be linked to an active presence in social media via Twitter, Facebook and Google Plus. The blog and a dedicated YouTube channel will feature our research processes and experiments, as well as videos of lectures and presentations.
All resources and code generated within the project will be stored in an open collaborative repository (i.e., the project's GitHub). The datasets (e.g., the annotated corpus and gazetteer) and other research outputs will be stored in the Digital Humanities Research Centre Repository, the Museo del Templo Mayor in Mexico, and the UK Data Service, and they will also be 'returned' to the Internet Archive and to Europeana.

Publications

Monteiro (2019) Spatial Disaggregation of Historical Census Data Leveraging Multiple Sources of Ancillary Information in ISPRS International Journal of Geo-Information

 
Description The 16th century historical collection the project is working with is nowadays compiled in 12 volumes covering 168 reports. It is known as the Geographic Reports of New Spain, and it is one of the most important resources for the study of the early period of contact between Europeans and Americans and of the processes of colonisation of Latin America. Among the many results we have achieved so far, our project has created a 'machine ready' version of these documents. This means that researchers and future generations will be able to use these digital texts for all sorts of digital processing of their content, including text mining, automated searches, etc.
In addition to this, we have created an annotated version of the collection. This means that we have added substantial information to it, related to research topics of interest. For example, we have added 'tags' to terms, concepts or topics such as people, places, animals, resources and materials, among many others. These topics are highly relevant to answering historical research questions. This allows us to ask the computer to extract and cross-link this information in an automated way, retrieving for example 'all those instances in the corpus that are related to a particular practice such as mining'. This query results in the computer showing us the paragraphs where activities and resources related to mining are mentioned. This has the potential to change the way we do research in the Humanities, as in the not too distant future we will be able to automatically annotate other similar historical collections and cross-link information between them.
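
The kind of tag-based query described above can be sketched as follows. This is only an illustration, not the project's annotation model: the `Paragraph` structure, the tag names (such as `activity:mining`) and the two Spanish sentences are invented for the example and are not quotations from the corpus.

```python
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    """One paragraph of a report plus its annotations (hypothetical layout)."""
    report: str
    text: str
    # Maps a tag name to the terms in the paragraph that carry that tag.
    tags: dict = field(default_factory=dict)


def paragraphs_tagged(corpus, tag):
    """Return every paragraph that has at least one term annotated with `tag`."""
    return [p for p in corpus if p.tags.get(tag)]


# Invented sample data, for illustration only.
corpus = [
    Paragraph(
        report="Report A",
        text="Hay minas de plata cerca del pueblo.",
        tags={"activity:mining": ["minas"], "material": ["plata"]},
    ),
    Paragraph(
        report="Report B",
        text="Los naturales siembran maíz.",
        tags={"plant": ["maíz"]},
    ),
]

mining_hits = paragraphs_tagged(corpus, "activity:mining")
for hit in mining_hits:
    print(hit.report, "->", hit.text)
```

Because every paragraph keeps its report identifier, the same lookup could in principle be joined with annotations from another collection that uses compatible tags.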

Another exciting development is the use of Natural Language Processing (NLP) techniques combined with Machine Learning. A well known technique called Named Entity Recognition is allowing us to 'teach' the machine to identify automatically when people, places, institutions and dates, among others, are mentioned across this collection. Although this technique has been amply developed within computer science, our corpus presents important but stimulating challenges. These techniques were created with modern languages in mind. The Geographic Reports of New Spain are not only written in 16th century Spanish, but they are also peppered with around 69 indigenous languages including Nahuatl, Mixtec and several variants of Mayan. Our Computer Science team is advancing the field of NLP through this challenge by modifying the algorithms we use for the identification of this historical multilingual information.

This complex set of documents is also accompanied by a fantastic variety of 'pinturas', that is to say, maps that the original questionnaire ordered to be included, portraying geographic information on the towns and villages in New Spain. These maps are also of great importance for understanding the processes behind the contact between very different cultures. Many of them were created by indigenous informants and were painted in a style previously unseen, including aspects of the Mesoamerican codices and European conceptions of cartography. Although in the original proposal we aimed to work only with the textual collection, we have now also started to work with the 78 existing maps. We carried out a citizen science project called 'Subaltern Recogito' in which part of this map collection was annotated. The set of annotations is now freely available through the Nettie Lee Benson Library at the University of Texas. This work included the creation of a tailored ontology to identify the subjects of interest in the maps and annotate them. With this dataset it is now possible to carry out cross-queries between different maps. Here we are also developing annotation techniques combined with image processing analysis. We hope that this work will in future aid comparison with many other similar collections of maps located in the Archive of the Nation in Mexico.

The project has also developed the first 16th century geographic dictionary (digital gazetteer) of New Spain. This sounds easier than it is. Historical placenames present a phenomenal challenge, not only because they can be mentioned with different spellings, but also because they might have changed over time, disappeared, moved, etc. This makes the task of locating historical places highly challenging. Our team has developed an efficient way to investigate and disambiguate (assign correct coordinates to) more than 14,000 historical place names. This will not only substantially advance historical and archaeological research of the period, but it will also allow, for the first time, the creation of enquiries with Geographic Information Systems (GIS can be thought of as an advanced version of Google Maps) and the use of spatial analysis. In the last part of the project, our team will use this dataset to explore the first comparison of settlement patterns at a large regional scale between the times before the arrival of the Spanish (called the Late Postclassic) and the early colonial period. Although the research and GIS dataset were completed, COVID delayed further plans for this dataset. We needed to create an online database from it, and it is only now that this is being carried out.
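
To make the spelling problem concrete, the sketch below resolves a variant spelling against a tiny gazetteer. It is a minimal illustration and not the team's disambiguation method: the identifiers `P001` and `P002`, the two entries, the rounded coordinates and the 0.8 similarity cutoff are all assumptions, and the fuzzy matching uses Python's standard `difflib` as a stand-in technique.

```python
import difflib
import unicodedata


def normalise(name):
    """Lowercase a place name and strip diacritics so spellings compare cleanly."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()


# Illustrative entries only; coordinates are rounded approximations.
GAZETTEER = {
    "P001": {"name": "Tenochtitlan", "variants": ["Temixtitan", "Tenuxtitlan"],
             "lat": 19.43, "lon": -99.13},
    "P002": {"name": "Cholula", "variants": ["Chololla", "Churultecal"],
             "lat": 19.06, "lon": -98.30},
}


def build_index(gazetteer):
    """Map every normalised spelling (main name and variants) to its place ID."""
    index = {}
    for place_id, entry in gazetteer.items():
        for spelling in [entry["name"], *entry["variants"]]:
            index[normalise(spelling)] = place_id
    return index


def resolve(name, index, cutoff=0.8):
    """Return the place ID for `name`: exact match first, then closest spelling."""
    key = normalise(name)
    if key in index:
        return index[key]
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


index = build_index(GAZETTEER)
print(resolve("Tenochtitlán", index))   # exact match once the accent is stripped
print(resolve("Tenuchtitlan", index))   # resolved through a close spelling
print(resolve("Zzzz", index))           # no plausible candidate
```

String similarity alone cannot handle places that moved, were renamed or disappeared, which is why a real workflow also needs the historical research described above.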

In addition, and related to the previous development, we have also created a large dataset of historical geographies with more than 40 layers of geographic information corresponding to the 16th, 17th and 18th centuries. We are now in the process of finalising other datasets, and at the end of this year we will enter our final stage, where we will investigate exciting historical questions with this research and these methods. For this, we are refining a methodology called Geographic Text Analysis, which combines corpus linguistics and geographic approaches for the investigation of patterns in large volumes of data. Related to this, we are in the process of creating the Geographical Text Analysis online platform. This software will allow anyone to carry out this kind of analysis on any desired text. Although we started its development on time, COVID hampered a series of workshops that we needed to carry out for its development and testing, so work has been delayed. The software was supposed to be available in 2020, but we now estimate that this will only be possible at the end of 2021 or in early 2022.
Exploitation Route We experienced a substantial delay due to COVID. However, we are now in the final stage of releasing both the software and the datasets created. We believe these will benefit researchers in Latin America in the coming decades, as they will be able to carry out all sorts of research queries with the methodologies we are developing, and will also benefit research in general through the datasets. We believe that these will have a substantial impact on History, Archaeology, Ethnography and Anthropology.
We are also carrying out research in collaboration with industries and we are already impacting the way they work, particularly in NLP. Through a series of collaborations, we have been able to tailor some of the software they are creating while also shaping ideas for new research, addressing some of the challenges we encounter in the Humanities, but that can be of high relevance in the industry.
Sectors Communities and Social Services/Policy; Digital/Communication/Information Technologies (including Software); Education; Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Culture, Heritage, Museums and Collections

URL https://www.lancaster.ac.uk/digging-ecm/
 
Description Our research and findings in the field of Natural Language Processing are being used by industry to inform their practices and improve services and products. This is particularly related to our established partnership with TagTog. Part of our work concerns the training of Machine Learning models for the automated identification of information in historical documents. This currently presents a major challenge, because our corpus is written in 16th century Spanish and also contains inclusions of 69 additional indigenous languages. As current methods have only taken into account modern languages, and particularly European ones, we are in the process of testing and improving these models to account for this complexity. Our work with our industry partner TagTog has already led to an interesting collaboration: they have transformed the platform they offer as a service thanks to our detailed feedback, and we are now also carrying out research with them, as they do not normally encounter the kind of datasets we work with. This has resulted in an interesting synergy, where we are starting to collaborate closely with them on research and challenges. We carried out a workshop with them using funding provided by the ESRC Business Boost scheme run at Lancaster University, where we explored the use of Artificial Intelligence for Humanities research in collaboration with the company. More information about this event can be seen here: https://www.lancaster.ac.uk/digging-ecm/2019/02/exploring-ai-workshop/ In addition to this, we have presented the work we are doing with the company, with them as co-authors, at the international ADHO Digital Humanities Conference of 2019. Although these initiatives form part of academic outputs, we are contributing to changes in this industry that do not form part of the academic realm.
As part of this research, we received a grant from the ESRC-IAA to work together on a project to develop a tool within the tagtog platform called Interannotator Agreement. The research was finished in 2020 and we are now in the process of writing up the results and an article with Tagtog. We expect this to be published in 2021 or early 2022.
First Year Of Impact 2019
Sector Digital/Communication/Information Technologies (including Software),Culture, Heritage, Museums and Collections
Impact Types Cultural, Economic
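
As background to the Interannotator Agreement tool mentioned above, the sketch below computes Cohen's kappa for two annotators who labelled the same items. The record does not say which agreement measure the tool reports, so the choice of kappa, the label names and the sample annotations are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented annotations for six tokens.
annotator_a = ["place", "place", "person", "other", "place", "person"]
annotator_b = ["place", "person", "person", "other", "place", "person"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

Here the annotators agree on five of six items, and chance agreement is 13/36, so kappa works out to 17/23, roughly 0.739.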

 
Description DEMOS Think Tank Report
Geographic Reach National 
Policy Influence Type Citation in systematic reviews
URL https://demos.co.uk/wp-content/uploads/2019/10/Jisc-OCT-2019-2.pdf
 
Description Co-creating the future of the past: Harnessing AI for Humanities Research
Amount £11,392 (GBP)
Funding ID HIA7722 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 06/2019 
End 07/2020
 
Description Digital Humanities and Latin American History
Amount £3,900 (GBP)
Organisation Lancaster University 
Sector Academic/University
Country United Kingdom
Start 03/2018 
End 08/2018
 
Description Digital Innovations in Latin American Colonial History
Amount £3,897 (GBP)
Organisation Lancaster University 
Sector Academic/University
Country United Kingdom
Start 02/2019 
End 08/2019
 
Description Exploring Artificial Intelligence for Humanities Research
Amount £3,283 (GBP)
Funding ID HIA6999 
Organisation Lancaster University 
Sector Academic/University
Country United Kingdom
Start 02/2019 
End 07/2019
 
Description Exploring the Uppsala Map
Amount £1,474 (GBP)
Funding ID HIA1030 
Organisation Lancaster University 
Sector Academic/University
Country United Kingdom
Start 01/2020 
End 08/2020
 
Description Lean Launch Programme
Amount £3,950 (GBP)
Funding ID Lancaster Innovate UK 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 04/2021 
End 10/2021
 
Description Mesoamerican Apocalypse: A large scale analysis of the Indigenous perspective on the sixteenth-century epidemics of Colonial Mexico
Amount € 78,080 (EUR)
Organisation Heidelberg University 
Sector Academic/University
Country Germany
Start 07/2022 
End 07/2023
 
Description Pathways to understanding 16th century colonial Mexican geographies
Amount £1,498 (GBP)
Organisation Lancaster University 
Sector Academic/University
Country United Kingdom
Start 04/2019 
End 06/2019
 
Description Subaltern Recogito: Annotating the maps of the Relaciones Geográficas de Nueva España
Amount £3,500 (GBP)
Organisation Andrew W. Mellon Foundation 
Sector Private
Country United States
Start 04/2019 
End 10/2019
 
Description Text-to-map Partner Scoping
Amount £3,957 (GBP)
Funding ID ESRC-IAA 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 03/2021 
End 09/2021
 
Description Unlocking the Colonial Archive: Harnessing AI for Indigenous and Spanish American historical collections
Amount £360,307 (GBP)
Funding ID AH/V009559/1 
Organisation University of Texas at Austin 
Sector Academic/University
Country United States
Start 02/2021 
End 02/2023
 
Title Geographic Text Analysis for 16th century material 
Description *Please note that there is no correct option for the kind of work we do on the 'Select the type of research tool or method' question. We are in the process of improving a methodology that enables the combination between spatial analysis, corpus linguistics methods and natural language processing techniques. We are working particularly on aspects of the next steps. This work includes both further research and refining of the current method:
1. Preparation of corpora to transform it into machine readable format
2. Automatic and semi-automatic identification of place names and all entities mentioned in the corpus
3. Automated and semi-automatic disambiguation of toponyms
4. Annotation of the corpus
5. Geographical analysis of places mentioned in the corpus
6. Historical inquiry through the application of historical models and analysis techniques space
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? No  
Impact We are refining GTA as a method, but we have already tested parts of it with success. 
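
One small piece of the method described in this record, finding which search terms occur near which place names, can be sketched as below. This is a simplified illustration under stated assumptions, not the project's GTA implementation: it handles single-word place names only, the window size of 3 tokens is arbitrary, and the sample sentences, place list and search terms are invented.

```python
import re
from collections import Counter


def place_cooccurrences(text, place_names, search_terms, window=3):
    """Count search terms found within `window` tokens of each place name."""
    tokens = re.findall(r"\w+", text.lower())
    places = {name.lower() for name in place_names}
    terms = {term.lower() for term in search_terms}
    counts = Counter()
    for i, token in enumerate(tokens):
        if token in places:
            # Look at the tokens on both sides of the place name.
            start, end = max(0, i - window), i + window + 1
            for neighbour in tokens[start:end]:
                if neighbour in terms:
                    counts[(token, neighbour)] += 1
    return counts


# Invented sample text, for illustration only.
sample = ("En Taxco hay minas de plata. "
          "Los naturales de Cholula venden mantas en el mercado.")

pairs = place_cooccurrences(sample, ["Taxco", "Cholula"], ["minas", "mantas"])
for (place, term), n in sorted(pairs.items()):
    print(place, term, n)
```

Once the place names have coordinates from a gazetteer, counts like these can be mapped and analysed spatially, which is the geographical step of the method.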
 
Title NLP methods for multilingual historical corpora 
Description *Please note that there is no suitable option to describe the method in the 'Select the type of tool or method' question. Our team has developed a method with Machine Learning to carry out Named Entity Recognition of a corpus containing 69 indigenous languages and 16th century Spanish. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? No  
Impact The impact resulting from this tool will be substantial in the realm of History and Archaeology, particularly Mesoamerican and Colonial Latin American history. The development of this method will enable the automatic recognition of places, people, institutions and dates in historical documents, opening the possibilities of research in a substantial way. 
 
Title 16th Century Gazetteer of New Spain 
Description This dataset is the first digital historical gazetteer of 16th century New Spain. It contains at the moment around 9000 historical places. The creation of this dataset has involved extended and significant research on historical geographies and the use of lexicons and other language resources such as:
1. Nomenclatura Geográfica de México: Etimologías de los nombres de lugar, by Antonio Peñafiel Barranco, 3 vols. (1897).
2. Diccionario geográfico, histórico y biográfico de los Estados Unidos Mexicanos, by Antonio García Cubas, 2 vols. (1888-1891).
3. Aztec place-names, their meaning and mode of composition, selected from the Spanish of Agustin de la Rosa, Antonio Peñafiel and Cecilio A. Robelo, by Frederick Starr (1920).
4. Nombres geográficos indígenas del Estado de Morelos, by Cecilio Robelo (1887).
5. Nombres geográficos indígenas del Estado de México, by Cecilio Robelo (1900).
6. Nombres geográficos mexicanos del Estado de Veracruz, by Cecilio Robelo (1902).
7. Diccionario de aztequismos, by Cecilio Robelo (1904).
8. Diccionario de mitología náhuatl, by Cecilio Robelo (1905-1908).
9. Toponimia Tarasco-hispano-nahoa, by Cecilio Robelo (1905-1908).
The compilation of the gazetteer involves finding the geographical coordinates (longitude and latitude) of each place mentioned not only in the Geographic Reports, but also in other historical-geographical sources of the 16th century. A good part of this stage has therefore been dedicated to investigating and disambiguating places that no longer exist today, or whose location is doubtful for diverse reasons. We consider the gazetteer, with its roughly 9000 place names to date, a non-static product: although we are already using it for research, we will continue to add to it as we advance. All databases and datasets created by the project are being developed with Linked Open Data standards so we can guarantee their interoperability when we release them to the wider public.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact This gazetteer will change the landscape of historical research in Mexico and Guatemala. It was developed through semi-automated and automated techniques of geoparsing and disambiguation for historical placenames. It contains not only coordinates of historical placenames as recorded in the 16th century, but also all alternative spellings and placenames that our research has found. With this dataset we are now proceeding to carry out research in multiple areas. This includes the first comparison of settlement patterns between the Mesoamerican Late Post-Classic and the Early Colonial period at a country scale. Other analyses using this dataset include studies of indigenous toponymies and their changes over time, spatial analyses of the distribution of materials of archaeological interest, and answering research questions posed by the project. Another important development of this dataset is the use of the Alexandria Data Model for its interoperability with other datasets including the project 'GIS de las Indias'.
URL https://github.com/patymurrieta/Digging-into-Early-Colonial-Mexico
 
Title Annotated Corpus of 16th century Geographic Reports of New Spain 
Description This dataset constitutes the first annotated corpus of the 16th century Geographic Reports of New Spain. It consists of an annotated, machine readable version of the corpus containing 168 16th century historical reports tagged with entities as designed by the project's research team. The corpus currently includes 10 volumes of the Relaciones Geográficas del siglo XVI (UNAM-IIA, 1982-1988), edited by Rene Acuña, and 2 volumes of the Relaciones Histórico-Geográficas de la Gobernación de Yucatán (UNAM, 1983), edited by Mercedes de la Garza. We are in the process of annotating the versions by Fco. del Paso y Troncoso and the Real Academia de la Historia. We are also carrying out experiments with NLP and Machine Learning techniques on these datasets.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact This dataset will be used for three innovative purposes: 1. It allows the automated identification, extraction and analysis of data relevant to answering historical questions. 2. It now constitutes a historical training dataset for Machine Learning models that will be of use to anyone working on NLP. 3. Thanks to the detailed annotation model designed by the team, this dataset allows the cross-referencing of information with other historical datasets.
 
Title Annotated Corpus of 16th century maps 
Description This dataset contains a corpus of 78 16th century maps related to the Geographic Reports of New Spain. Our collaboration with the University of Texas resulted in the acquisition of, and research permits for, the collection held by the Nettie Lee Benson Library.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact This dataset constitutes a fully annotated corpus of maps. This will enable in the future the automated search of characteristics based on image rather than text, and it will help in the analysis of elements and form of historical hybrid colonial maps. 
 
Title DECM 16th Century Gazetteer 
Description The DECM Historical Gazetteer is a digital gazetteer of historical places in Mexico available in different formats and built from detailed research. This includes a 16th century version (The DECM 16th Century Gazetteer) with the toponyms mentioned in primary sources including the Relaciones Geográficas (1577-1585) and the Suma de Visitas de los Pueblos de la Nueva España (1548-1550), as well as information on the political, religious and administrative units of the Viceroyalty of New Spain. The gazetteer is composed of 71 main files with geographic information of colonial provinces, alcaldias, corregimientos, diocesis, among many others, as well as thousands of historical cities, towns, villages, and other places. This is integrated in an interoperable model containing thousands of historical locations with alternative spelling place-name variations. The dataset also includes 30 tables with additional historical information related to toponyms, languages, repositories, maps, etc. There are two versions of the DECM Historical Gazetteer: 1) The DECM Gazetteer, and 2) The DECM_16thC_Gazetteer. This is the 16th Century DECM Gazetteer. This version constitutes a subset of the main DECM Historical Gazetteer, including only the toponyms that were mentioned in the Suma de Visita de los Pueblos and the RG reports, providing a precise window into the period when these were recorded (1548-1550/1577-1585). This dataset contains: the toponyms with coordinates, mentioned and disambiguated for each RG volume and the Suma; plus, the 30 geographical layers and 32 tables as explained above, as well as the DECM Gazetteer Registry.
Content:
1. DECM_16thCentury_Gazetteer_RGs: Contains 11 shapefiles (points) with historical geographies. They have been created by identifying all the toponyms mentioned in the historical documents within the edited volumes of the Relaciones Geográficas de la Nueva España and Yucatán published by Acuña, De la Garza, and the Suma de Visita de los Pueblos by Del Paso y Troncoso (see sources used), disambiguated and atomized in a spatial database. Each shapefile includes the following attributes: ID, Place name, Alternative Names, Modern Name, References, Location, Confidence Degree, ID of a related location, Relationship shared, Location Type, Type Thesaurus URL, Coord X, Coord Y, Time span URL, Start date, End date. Every shapefile includes metadata.
2. DECM_Additional_Information: Contains 49 shapefiles (points and polygons) with additional information on historical geographies. They have been created by digitising historical studies on 16th century Colonial History of Mexico. 
3. DECM_Tables: Contains 2 folders with additional tabular data with relevant information about 16th century colonial Mexico in two formats: csv and xlsx.
4. Documents: a) DECM_Gazetteer_Disambiguation_Percentage.xlsx - Excel document with the total amount of toponyms included in the primary and secondary sources used, the number and percentage of places disambiguated where X and Y coordinates were assigned or not found. 
 b) Souces_for_the_Disambiguation - Text file with the bibliographical references used to locate the place names in the DECM_Gazetteer_primarysources folder. 
 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This dataset is now being widely used by Latin American scholars and researchers interested in the historical geographies of Mexico and Guatemala. We are also now in the process of integrating the gazetteer into the World Historical Gazetteer, where it will be further distributed.
URL https://figshare.com/articles/DECM_16th_Century_Gazetteer/12376814/2
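
The disambiguation percentage reported in DECM_Gazetteer_Disambiguation_Percentage.xlsx is, in essence, the share of toponyms for which X and Y coordinates were assigned. The sketch below shows that arithmetic on a hypothetical record type that mirrors a few of the shapefile attributes listed above; the class name, field names and placeholder records are assumptions, not the dataset's actual schema or contents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Toponym:
    """A reduced, hypothetical view of one gazetteer record."""
    place_name: str
    alternative_names: tuple
    coord_x: Optional[float]  # None when the place could not be located
    coord_y: Optional[float]


def disambiguation_percentage(toponyms):
    """Percentage of toponyms that have both X and Y coordinates assigned."""
    if not toponyms:
        return 0.0
    located = sum(
        1 for t in toponyms if t.coord_x is not None and t.coord_y is not None
    )
    return 100.0 * located / len(toponyms)


# Placeholder records, not real gazetteer entries.
records = [
    Toponym("Place A", ("Variant A1",), -99.1, 19.4),
    Toponym("Place B", (), -98.3, 19.1),
    Toponym("Place C", ("Variant C1", "Variant C2"), -97.0, 18.0),
    Toponym("Place D", (), None, None),
]

share = disambiguation_percentage(records)
print(f"{share:.1f}% of toponyms located")
```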
 
Title DECM 16th Century Gazetteer 
Description The DECM Historical Gazetteer is a digital gazetteer of historical places in Mexico available in different formats and built from detailed research. This includes a 16th century version (The DECM 16th Century Gazetteer) with the toponyms mentioned in primary sources including the Relaciones Geográficas (1577-1585) and the Suma de Visitas de los Pueblos de la Nueva España (1548-1550), as well as information on the political, religious and administrative units of the Viceroyalty of New Spain. The gazetteer is composed of 71 main files with geographic information of colonial provinces, alcaldias, corregimientos, diocesis, among many others, as well as thousands of historical cities, towns, villages, and other places. This is integrated in an interoperable model containing thousands of historical locations with alternative spelling place-name variations. The dataset also includes 30 tables with additional historical information related to toponyms, languages, repositories, maps, etc.There are two versions of the DECM Historical Gazetteer: 1) The DECM Gazetteer, and 2) The DECM_16thC_Gazetteer.This is the 16th Century DECM Gazetteer. This version constitutes a subset of the main DECM Historical Gazetteer, including only the toponyms that were mentioned in the the Suma de Visita de los Pueblos and the RG reports, providing a precise window into the period when these were recorded (1548-1550/1577-1585). This dataset contains: the toponyms with coordinates, mentioned and disambiguated for each RG volume and the Suma; plus, the 30 geographical layers and 32 tables as explained above, as well as the DECM Gazetteer Registry.Content:
1. DECM_16thCentury_Gazetteer_RGs: Contains 11 shapefiles (points) with historical geographies. They have been created identifying all the toponyms mentioned in the historical documents within the edited volumes of the Relaciones Geográficas de la Nueva España and Yucatán published by Acuña, De la Garza, and the Suma de Visita de los Pueblos by Del Paso y Troncoso (see sources used) disambiguated and atomized in a spatial database. Each shapefile include the following attributes: ID, Place name, Alternative Names, Modern Name, References, Location, Confidence Degree, ID of a related location, Relationship shared, Location Type, Type Thesaurus URL, Coord X, Coord Y, Time spam URL, Start date, End date. Every shapefile includes metadata.
2. DECM_Additional_Information: Contains 49 shapefiles (points and polygons) with additional information on historical geographies. They have been created by digitising historical studies on 16th century Colonial History of Mexico. 
3. DECM_Tables: Contains 2 folders with additional tabular data with relevant information about 16th century colonial Mexico in two formats: csv and xlsx.
4. Documents: a) DECM_Gazetteer_Disambiguation_Percentage.xlsx - Excel document with the total number of toponyms included in the primary and secondary sources used, and the number and percentage of places disambiguated where X and Y coordinates were assigned or not found. 
 b) Souces_for_the_Disambiguation - Text file with the bibliographical references used to locate the place names in the DECM_Gazetteer_primarysources folder. 
 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This dataset is now being widely used by Latin American scholars and researchers interested in the historical geographies of Mexico and Guatemala. We are also now in the process of integrating the gazetteer into the World Historical Gazetteer, where it will be further distributed. 
URL https://figshare.com/articles/DECM_16th_Century_Gazetteer/12376814
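The tabular side of the gazetteer (attributes such as ID, Place name, Alternative Names, Coord X and Coord Y, and the share of places for which coordinates were found) lends itself to a small worked example. The following is a minimal sketch in Python, not the project's own tooling. It assumes a CSV export whose column headers match the attribute names listed above, which may not hold for the actual files, and it runs on placeholder rows rather than real gazetteer entries.

```python
import csv
import io

# Placeholder rows, NOT real gazetteer data. Column names mirror the attribute
# list in the description; the actual field names in the files may differ.
SAMPLE_CSV = """ID,Place name,Alternative Names,Coord X,Coord Y
1,Example A,Exampla; Exsample A,-99.1,19.4
2,Example B,,,
3,Example C,Exemple C,-96.9,17.1
"""


def load_places(handle):
    """Read gazetteer-style rows and split located from unlocated places."""
    located, unlocated = [], []
    for row in csv.DictReader(handle):
        x, y = row["Coord X"].strip(), row["Coord Y"].strip()
        if x and y:
            # Both coordinates present: convert them to numbers.
            row["Coord X"], row["Coord Y"] = float(x), float(y)
            located.append(row)
        else:
            unlocated.append(row)
    return located, unlocated


def name_index(rows):
    """Map every spelling (main or alternative) to its place ID."""
    index = {}
    for row in rows:
        # Assumption: alternative names are separated by semicolons.
        names = [row["Place name"]] + row["Alternative Names"].split(";")
        for name in names:
            name = name.strip().lower()
            if name:
                index[name] = row["ID"]
    return index


located, unlocated = load_places(io.StringIO(SAMPLE_CSV))
index = name_index(located + unlocated)
percent_located = 100 * len(located) / (len(located) + len(unlocated))
```

To try this on a real table, the `io.StringIO` object could be replaced with an opened csv file from the DECM_Tables folder, after adjusting the column names and the separator for alternative names to whatever the files actually use.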
 
Title DECM 16th Century Gazetteer 
Description The DECM Historical Gazetteer is a digital gazetteer of historical places in Mexico, available in different formats and built from detailed research. This includes a 16th century version (The DECM 16th Century Gazetteer) with the toponyms mentioned in primary sources including the Relaciones Geográficas (1577-1585) and the Suma de Visitas de los Pueblos de la Nueva España (1548-1550), as well as information on the political, religious and administrative units of the Viceroyalty of New Spain. The gazetteer is composed of 71 main files with geographic information on colonial provinces, alcaldias, corregimientos, diocesis, among many others, as well as thousands of historical cities, towns, villages, and other places. This is integrated in an interoperable model containing thousands of historical locations with alternative spellings and place-name variations. The dataset also includes 30 tables with additional historical information related to toponyms, languages, repositories, maps, etc. There are two versions of the DECM Historical Gazetteer: 1) The DECM Gazetteer, and 2) The DECM_16thC_Gazetteer. This is the 16th Century DECM Gazetteer. This version constitutes a subset of the main DECM Historical Gazetteer, including only the toponyms that were mentioned in the Suma de Visita de los Pueblos and the RG reports, providing a precise window into the period when these were recorded (1548-1550/1577-1585). This dataset contains: the toponyms with coordinates, mentioned and disambiguated for each RG volume and the Suma; plus the 30 geographical layers and 32 tables as explained above, as well as the DECM Gazetteer Registry. Content:
1. DECM_16thCentury_Gazetteer_RGs: Contains 11 shapefiles (points) with historical geographies. They were created by identifying all the toponyms mentioned in the historical documents within the edited volumes of the Relaciones Geográficas de la Nueva España and Yucatán published by Acuña and De la Garza, and the Suma de Visita de los Pueblos by Del Paso y Troncoso (see sources used), which were then disambiguated and atomized in a spatial database. Each shapefile includes the following attributes: ID, Place name, Alternative Names, Modern Name, References, Location, Confidence Degree, ID of a related location, Relationship shared, Location Type, Type Thesaurus URL, Coord X, Coord Y, Time span URL, Start date, End date. Every shapefile includes metadata.
2. DECM_Additional_Information: Contains 49 shapefiles (points and polygons) with additional information on historical geographies. They have been created by digitising historical studies on 16th century Colonial History of Mexico. 
3. DECM_Tables: Contains 2 folders with additional tabular data with relevant information about 16th century colonial Mexico in two formats: csv and xlsx.
4. Documents: a) DECM_Gazetteer_Disambiguation_Percentage.xlsx - Excel document with the total number of toponyms included in the primary and secondary sources used, and the number and percentage of places disambiguated where X and Y coordinates were assigned or not found. 
 b) Souces_for_the_Disambiguation - Text file with the bibliographical references used to locate the place names in the DECM_Gazetteer_primarysources folder. 
 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This dataset is now being widely used by Latin American scholars and researchers interested in the historical geographies of Mexico and Guatemala. We are also now in the process of integrating the gazetteer into the World Historical Gazetteer, where it will be further distributed. 
URL https://figshare.com/articles/DECM_16th_Century_Gazetteer/12376814/1
 
Title DECM Bibliographic Digital Collection 
Description Another important element of collaborative research in our project has been the joint creation and compilation of a bibliography related to the historical corpus we are investigating. So far, we have compiled around 300 articles, books and other research items related to the historical corpus and period of study. This is currently stored as a collaborative project bibliography in Zotero.org. The collection will continue to grow, and we will make sure that at the end of the project it also becomes available to the wider public as a resource. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? No  
Impact We have been able to share articles and compile resources collectively and expeditiously; gathering them otherwise would have taken longer. 
URL https://www.zotero.org/groups/562590/digging_into_data
 
Title DECM Gazetteer 
Description The DECM Historical Gazetteer is a digital gazetteer of historical places in Mexico, available in different formats and built from detailed research. This includes a 16th century version (The DECM 16th Century Gazetteer) with the toponyms mentioned in primary sources including the Relaciones Geográficas (1577-1585) and the Suma de Visitas de los Pueblos de la Nueva España (1548-1550), as well as information on the political, religious and administrative units of the Viceroyalty of New Spain. The gazetteer is composed of 71 main files with geographic information on colonial provinces, alcaldias, corregimientos, diocesis, among many others, as well as thousands of historical cities, towns, villages, and other places. This is integrated in an interoperable model containing thousands of historical locations with alternative spellings and place-name variations. The dataset also includes 30 tables with additional historical information related to toponyms, languages, repositories, maps, etc. There are two versions of the DECM Historical Gazetteer: 1) The DECM Gazetteer, and 2) The DECM_16thC_Gazetteer. This is the DECM Gazetteer; this version includes all the researched historical toponyms contained in primary and secondary sources. The information is organised by RG volume. It also includes layers and tables of historical information digitised and/or created by the project from secondary sources. This set is composed of GIS shapefiles and Linked Places format information with: a) all toponyms with coordinates, mentioned and disambiguated from the primary sources (13 volumes) and secondary sources (3 volumes); b) 30 geographical layers of additional historical information derived from secondary sources; c) 32 tables with other important historical information related to the RGs; and d) the DECM Gazetteer Registry. Content:
1. DECM_Gazetteer_primarysources: Contains 17 shapefiles (points and polygons) with historical geographies. They were created from the indexes of the edited volumes of the Relaciones Geográficas de la Nueva España and Yucatán published by Acuña and De la Garza, and the Suma de Visita de los Pueblos by Del Paso y Troncoso (see sources used), which were then disambiguated and atomized in a spatial database. Each shapefile includes the following attributes: ID, Place name, Alternative Names, Modern Name, References, Location, Confidence Degree, ID of a related location, Relationship shared, Location Type, Type Thesaurus URL, Coord X, Coord Y, Time span URL, Start date, End date. Every shapefile includes metadata.
2. DECM_Gazetteer_secondarysources: Contains 4 shapefiles (points and polygons) with historical information mentioned and collected from 4 secondary sources. 
3. DECM_Additional_Information: Contains 49 shapefiles (points and polygons) with additional information on historical geographies. They have been created by digitising historical studies on 16th century Colonial History of Mexico.
4. DECM_Tables: Contains 2 folders with additional tabular data with relevant information about 16th century colonial Mexico in two formats: csv and xlsx.
5. Documents: a) DECM_Gazetteer_Disambiguation_Percentage.xlsx - Excel document with the total number of toponyms included in the primary and secondary sources used, and the number and percentage of places disambiguated where X and Y coordinates were assigned or not found. b) DECM_Gazetteer_Registry.xlsx & CSV: Excel and CSV files with a list of all the information on sources and attributes included in DECM_Gazetteer_primarysources, DECM_Gazetteer_secondarysources, DECM_Additional_Information, and DECM_Tables. c) Souces_for_the_Disambiguation - Text file with the bibliographical references used to locate the place names in the DECM_Gazetteer_primarysources folder.
 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This dataset is now being widely used by Latin American scholars and researchers interested in the historical geographies of Mexico and Guatemala. We are also now in the process of integrating the gazetteer into the World Historical Gazetteer, where it will be further distributed. 
URL https://figshare.com/articles/DECM_Gazetteer/12367385/1
 
Title DECM Gazetteer 
Description The DECM Historical Gazetteer is a digital gazetteer of historical places in Mexico, available in different formats and built from detailed research. This includes a 16th century version (The DECM 16th Century Gazetteer) with the toponyms mentioned in primary sources including the Relaciones Geográficas (1577-1585) and the Suma de Visitas de los Pueblos de la Nueva España (1548-1550), as well as information on the political, religious and administrative units of the Viceroyalty of New Spain. The gazetteer is composed of 71 main files with geographic information on colonial provinces, alcaldias, corregimientos, diocesis, among many others, as well as thousands of historical cities, towns, villages, and other places. This is integrated in an interoperable model containing thousands of historical locations with alternative spellings and place-name variations. The dataset also includes 30 tables with additional historical information related to toponyms, languages, repositories, maps, etc. There are two versions of the DECM Historical Gazetteer: 1) The DECM Gazetteer, and 2) The DECM_16thC_Gazetteer. This is the DECM Gazetteer; this version includes all the researched historical toponyms contained in primary and secondary sources. The information is organised by RG volume. It also includes layers and tables of historical information digitised and/or created by the project from secondary sources. This set is composed of GIS shapefiles and Linked Places format information with: a) all toponyms with coordinates, mentioned and disambiguated from the primary sources (13 volumes) and secondary sources (3 volumes); b) 30 geographical layers of additional historical information derived from secondary sources; c) 32 tables with other important historical information related to the RGs; and d) the DECM Gazetteer Registry. Content:
1. DECM_Gazetteer_primarysources: Contains 17 shapefiles (points and polygons) with historical geographies. They were created from the indexes of the edited volumes of the Relaciones Geográficas de la Nueva España and Yucatán published by Acuña and De la Garza, and the Suma de Visita de los Pueblos by Del Paso y Troncoso (see sources used), which were then disambiguated and atomized in a spatial database. Each shapefile includes the following attributes: ID, Place name, Alternative Names, Modern Name, References, Location, Confidence Degree, ID of a related location, Relationship shared, Location Type, Type Thesaurus URL, Coord X, Coord Y, Time span URL, Start date, End date. Every shapefile includes metadata.
2. DECM_Gazetteer_secondarysources: Contains 4 shapefiles (points and polygons) with historical information mentioned and collected from 4 secondary sources. 
3. DECM_Additional_Information: Contains 49 shapefiles (points and polygons) with additional information on historical geographies. They have been created by digitising historical studies on 16th century Colonial History of Mexico.
4. DECM_Tables: Contains 2 folders with additional tabular data with relevant information about 16th century colonial Mexico in two formats: csv and xlsx.
5. Documents: a) DECM_Gazetteer_Disambiguation_Percentage.xlsx - Excel document with the total number of toponyms included in the primary and secondary sources used, and the number and percentage of places disambiguated where X and Y coordinates were assigned or not found. b) DECM_Gazetteer_Registry.xlsx & CSV: Excel and CSV files with a list of all the information on sources and attributes included in DECM_Gazetteer_primarysources, DECM_Gazetteer_secondarysources, DECM_Additional_Information, and DECM_Tables. c) Souces_for_the_Disambiguation - Text file with the bibliographical references used to locate the place names in the DECM_Gazetteer_primarysources folder.
 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This dataset is now being widely used by Latin American scholars and researchers interested in the historical geographies of Mexico and Guatemala. We are also now in the process of integrating the gazetteer into the World Historical Gazetteer, where it will be further distributed. 
URL https://figshare.com/articles/DECM_Gazetteer/12367385/2
 
Title DECM Gazetteer 
Description The DECM Historical Gazetteer is a digital gazetteer of historical places in Mexico, available in different formats and built from detailed research. This includes a 16th century version (The DECM 16th Century Gazetteer) with the toponyms mentioned in primary sources including the Relaciones Geográficas (1577-1585) and the Suma de Visitas de los Pueblos de la Nueva España (1548-1550), as well as information on the political, religious and administrative units of the Viceroyalty of New Spain. The gazetteer is composed of 71 main files with geographic information on colonial provinces, alcaldias, corregimientos, diocesis, among many others, as well as thousands of historical cities, towns, villages, and other places. This is integrated in an interoperable model containing thousands of historical locations with alternative spellings and place-name variations. The dataset also includes 30 tables with additional historical information related to toponyms, languages, repositories, maps, etc. There are two versions of the DECM Historical Gazetteer: 1) The DECM Gazetteer, and 2) The DECM_16thC_Gazetteer. This is the DECM Gazetteer; this version includes all the researched historical toponyms contained in primary and secondary sources. The information is organised by RG volume. It also includes layers and tables of historical information digitised and/or created by the project from secondary sources. This set is composed of GIS shapefiles and Linked Places format information with: a) all toponyms with coordinates, mentioned and disambiguated from the primary sources (13 volumes) and secondary sources (3 volumes); b) 30 geographical layers of additional historical information derived from secondary sources; c) 32 tables with other important historical information related to the RGs; and d) the DECM Gazetteer Registry. Content:
1. DECM_Gazetteer_primarysources: Contains 17 shapefiles (points and polygons) with historical geographies. They were created from the indexes of the edited volumes of the Relaciones Geográficas de la Nueva España and Yucatán published by Acuña and De la Garza, and the Suma de Visita de los Pueblos by Del Paso y Troncoso (see sources used), which were then disambiguated and atomized in a spatial database. Each shapefile includes the following attributes: ID, Place name, Alternative Names, Modern Name, References, Location, Confidence Degree, ID of a related location, Relationship shared, Location Type, Type Thesaurus URL, Coord X, Coord Y, Time span URL, Start date, End date. Every shapefile includes metadata.
2. DECM_Gazetteer_secondarysources: Contains 4 shapefiles (points and polygons) with historical information mentioned and collected from 4 secondary sources. 
3. DECM_Additional_Information: Contains 49 shapefiles (points and polygons) with additional information on historical geographies. They have been created by digitising historical studies on 16th century Colonial History of Mexico.
4. DECM_Tables: Contains 2 folders with additional tabular data with relevant information about 16th century colonial Mexico in two formats: csv and xlsx.
5. Documents: a) DECM_Gazetteer_Disambiguation_Percentage.xlsx - Excel document with the total number of toponyms included in the primary and secondary sources used, and the number and percentage of places disambiguated where X and Y coordinates were assigned or not found. b) DECM_Gazetteer_Registry.xlsx & CSV: Excel and CSV files with a list of all the information on sources and attributes included in DECM_Gazetteer_primarysources, DECM_Gazetteer_secondarysources, DECM_Additional_Information, and DECM_Tables. c) Souces_for_the_Disambiguation - Text file with the bibliographical references used to locate the place names in the DECM_Gazetteer_primarysources folder.
 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This dataset is now being widely used by Latin American scholars and researchers interested in the historical geographies of Mexico and Guatemala. We are also now in the process of integrating the gazetteer into the World Historical Gazetteer, where it will be further distributed. 
URL https://figshare.com/articles/DECM_Gazetteer/12367385
 
Title DECM Gold Standard Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This version contains a sample of the RGs manually annotated by multiple researchers with the software of our industry partner, Tagtog. This corpus has been used to carry out the NLP and ML experiments and the files are available in JSON and TSV format. These files are composed of texts and annotations. They are accompanied by the DECM ontology, which provides an explanation of the entities and labels produced. This corpus can be used for further experimentation with Artificial Intelligence methods. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used in research in multiple projects, including PhD theses, funded grants, further research on NLP, etc. 
URL https://figshare.com/articles/DECM_Gold_Standard_Corpus/12366734/2
 
Title DECM Gold Standard Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This version contains a sample of the RGs manually annotated by multiple researchers with the software of our industry partner, Tagtog. This corpus has been used to carry out the NLP and ML experiments and the files are available in JSON and TSV format. These files are composed of texts and annotations. They are accompanied by the DECM ontology, which provides an explanation of the entities and labels produced. This corpus can be used for further experimentation with Artificial Intelligence methods. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used in research in multiple projects, including PhD theses, funded grants, further research on NLP, etc. 
URL https://figshare.com/articles/DECM_Gold_Standard_Corpus/12366734
 
Title DECM Gold Standard Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This version contains a sample of the RGs manually annotated by multiple researchers with the software of our industry partner, Tagtog. This corpus has been used to carry out the NLP and ML experiments and the files are available in JSON and TSV format. These files are composed of texts and annotations. They are accompanied by the DECM ontology, which provides an explanation of the entities and labels produced. This corpus can be used for further experimentation with Artificial Intelligence methods. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used in research in multiple projects, including PhD theses, funded grants, further research on NLP, etc. 
URL https://figshare.com/articles/DECM_Gold_Standard_Corpus/12366734/1
 
Title DECM Machine Annotated Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used in research in multiple projects, including PhD theses, funded grants, further research on NLP, etc. 
URL https://figshare.com/articles/DECM_Machine_Annotated_Corpus/12366956/1
 
Title DECM Machine Annotated Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This is the version of the entire RG corpus automatically annotated using the ML models trained with the DECM Gold Standard Corpus. The files are available in JSON and TSV format, and the dataset also contains the file for the DECM Ontology. This corpus can be further used for quantitative and qualitative research, as well as advanced analyses using text mining techniques, corpus linguistics and other methods such as Geographical Text Analysis. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used in research in multiple projects, including PhD theses, funded grants, further research on NLP, etc. 
URL https://figshare.com/articles/DECM_Machine_Annotated_Corpus/12366956/2
 
Title DECM Machine Annotated Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This is the version of the entire RG corpus automatically annotated using the ML models trained with the DECM Gold Standard Corpus. The files are available in JSON and TSV format, and the dataset also contains the file for the DECM Ontology. This corpus can be further used for quantitative and qualitative research, as well as advanced analyses using text mining techniques, corpus linguistics and other methods such as Geographical Text Analysis. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used in research in multiple projects, including PhD theses, funded grants, further research on NLP, etc. 
URL https://figshare.com/articles/DECM_Machine_Annotated_Corpus/12366956
 
Title DECM Machine Ready Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This is the DECM Machine Ready Corpus. This version includes text-only files (.txt) containing each of the 10 volumes originally edited by Rene Acuña, the 2 volumes edited by Mercedes de la Garza, the Suma de Visita edited by Del Paso y Troncoso, a file with the original text of the Crown mandate (Instrucción), and metadata for this collection. This version contains only the original text of each of the RGs as transcribed by the scholars, excluding any editorial note, commentary, or historical work. It can therefore be used directly for corpus linguistics analyses, visualisations, etc. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This version includes text-only files (.txt) containing each of the 10 volumes originally edited by Rene Acuña, the 2 volumes edited by Mercedes de la Garza, the Suma de Visita edited by Del Paso y Troncoso, a file with the original text of the Crown mandate (Instrucción), and metadata for this collection. This version contains only the original text of each of the RGs as transcribed by the scholars, excluding any editorial note, commentary, or historical work. It can therefore be used directly for corpus linguistics analyses, visualisations, etc. 
URL https://figshare.com/articles/DECM_Machine_Ready_Corpus/12048729/1
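Because the Machine Ready Corpus consists of plain .txt files, a basic corpus linguistics step such as a word-frequency count needs very little code. The following is a minimal sketch in Python under stated assumptions: the sample string is a placeholder and not a quotation from the RGs, and the tokenisation (lowercasing and splitting on word characters) is a simplification that a real study of 16th century Spanish would likely refine.

```python
import re
from collections import Counter

# Placeholder text, NOT a quotation from the corpus.
SAMPLE_TEXT = "El pueblo esta junto al rio. El rio pasa por el pueblo."


def word_frequencies(text):
    """Lowercase the text, split it into word tokens, and count them."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)


freqs = word_frequencies(SAMPLE_TEXT)
top_three = freqs.most_common(3)  # list of (word, count) pairs
```

For a real volume, the placeholder string could be replaced by the contents of one of the .txt files, for instance read with `pathlib.Path(...).read_text(encoding="utf-8")`; the UTF-8 encoding is an assumption and should be checked against the collection metadata.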
 
Title DECM Machine Ready Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This is the DECM Machine Ready Corpus. This version includes text-only files (.txt) containing each of the 10 volumes originally edited by Rene Acuña, the 2 volumes edited by Mercedes de la Garza, the Suma de Visita edited by Del Paso y Troncoso, a file with the original text of the Crown mandate (Instrucción), and metadata for this collection. This version contains only the original text of each of the RGs as transcribed by the scholars, excluding any editorial note, commentary, or historical work. It can therefore be used directly for corpus linguistics analyses, visualisations, etc. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used in research in multiple projects, including PhD theses, funded grants, etc. 
URL https://figshare.com/articles/DECM_Machine_Ready_Corpus/12048729
 
Title DECM Machine Ready Corpus 
Description The DECM Corpus is a digital corpus of the texts of Relaciones Geográficas de Nueva España (the Geographic Reports of New Spain) with different versions, including a machine ready version, a gold standard annotated dataset, and an automatically annotated version ready for text mining and machine learning experiments. This is the DECM Machine Ready Corpus. This version includes text-only files (.txt) containing each of the 10 volumes originally edited by Rene Acuña, the 2 volumes edited by Mercedes de la Garza, the Suma de Visita edited by Del Paso y Troncoso, a file with the original text of the Crown mandate (Instrucción), and metadata for this collection. This version contains only the original text of each of the RGs as transcribed by the scholars, excluding any editorial note, commentary, or historical work. It can therefore be used directly for corpus linguistics analyses, visualisations, etc. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This version includes text-only files (.txt) containing each of the 10 volumes originally edited by Rene Acuña, the 2 volumes edited by Mercedes de la Garza, the Suma de Visita edited by Del Paso y Troncoso, a file with the original text of the Crown mandate (Instrucción), and metadata for this collection. This version contains only the original text of each of the RGs as transcribed by the scholars, excluding any editorial note, commentary, or historical work. It can therefore be used directly for corpus linguistics analyses, visualisations, etc. 
URL https://figshare.com/articles/DECM_Machine_Ready_Corpus/12048729/2
 
Title DECM Ontology. T-AP Digging into Early Colonial Mexico Project 
Description This document contains a tailored ontology that encompasses the complexity of Relaciones Geográficas de la Nueva España (1577-1585), resulting in twenty-one entities or categories (types of information). These entities were defined and linked to DBpedia, a semantic standard, to ensure the interoperability of our dataset. Additionally, most entities contain labels, also defined and linked to DBpedia to enrich the semantic extraction process. More information about the use of this ontology in combination with Machine Learning and Natural Language Processing techniques: https://www.academia.edu/41852118/Digital_Approaches_to_Historical_Archaeology_Exploring_the_Geographies_of_16th_Century_New_Spain and https://www.academia.edu/39912267/Training_NLP_Models_for_the_Analysis_of_16th_century_Latin_American_Historical_Documents_Tagtog_and_the_Geographic_Reports_of_New_Spain 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used to develop new annotation vocabularies for historical research. The DECM Ontology and Annotation Rules: This .xls file contains two sheets. The first, called 'Ontology', defines the entities and labels used to annotate the corpus of the RGs. This comprises 18 entities and labels marking important social, political, territorial, religious, and economic information. The second, called 'Annotation rules', includes the basic rules followed by all the annotators in the project, with examples that help in making decisions while carrying out the annotations. These rules were designed to improve annotator consensus, which reached up to 98 per cent for some entities: https://github.com/patymurrieta/Digging-into-Early-Colonial-Mexico/tree/master/DECM_Corpus 
URL https://figshare.com/articles/DECM_Ontology_T-AP_Digging_into_Early_Colonial_Mexico_Project/12058095
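The annotator consensus figure mentioned above can be illustrated with a simple calculation. The record does not say which agreement measure the project used, so the following Python sketch assumes plain percent agreement between two annotators over the same items, which is only one of several possible measures. The label names are hypothetical and are not taken from the DECM Ontology.

```python
def percent_agreement(labels_a, labels_b):
    """Return the share of items (0-100) on which two annotators agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("Both annotators must label the same items.")
    if not labels_a:
        raise ValueError("At least one labelled item is required.")
    # Count the positions where both annotators chose the same label.
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100 * matches / len(labels_a)


# Hypothetical labels for four items, for illustration only.
annotator_1 = ["place", "person", "place", "other"]
annotator_2 = ["place", "person", "other", "other"]
agreement = percent_agreement(annotator_1, annotator_2)
```

Percent agreement does not correct for agreement expected by chance; chance-corrected measures such as Cohen's kappa exist for that purpose, and the project may have used something different from either.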
 
Title DECM Ontology. T-AP Digging into Early Colonial Mexico Project 
Description This document contains a tailored ontology that encompasses the complexity of Relaciones Geográficas de la Nueva España (1577-1585), resulting in twenty-one entities or categories (types of information). These entities were defined and linked to DBpedia, a semantic standard, to ensure the interoperability of our dataset. Additionally, most entities contain labels, also defined and linked to DBpedia to enrich the semantic extraction process. More information about the use of this ontology in combination with Machine Learning and Natural Language Processing techniques: https://www.academia.edu/41852118/Digital_Approaches_to_Historical_Archaeology_Exploring_the_Geographies_of_16th_Century_New_Spain and https://www.academia.edu/39912267/Training_NLP_Models_for_the_Analysis_of_16th_century_Latin_American_Historical_Documents_Tagtog_and_the_Geographic_Reports_of_New_Spain 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now the base for several annotation models being developed in other projects. 
URL https://figshare.com/articles/DECM_Ontology_T-AP_Digging_into_Early_Colonial_Mexico_Project/12058095...
 
Title DECM Ontology. T-AP Digging into Early Colonial Mexico Project 
Description This document contains a tailored ontology that encompassed the complexity of Relaciones Geográficas de la Nueva España (1577-1585), resulting in twenty-one entities or categories (types of information). These entities were defined and linked to DBpedia, a semantic standard, to ensure the interoperability of our dataset. Additionally, most entities contain labels, also defined and linked to DBpedia to enrich the semantic extraction process. More information about the use of this ontology in combination with Machine Learning and Natural Language Processing techniques: https://www.academia.edu/41852118/Digital_Approaches_to_Historical_Archaeology_Exploring_the_Geographies_of_16th_Century_New_Spain https://www.academia.edu/39912267/Training_NLP_Models_for_the_Analysis_of_16th_century_Latin_American_Historical_Documents_Tagtog_and_the_Geographic_Reports_of_New_Spain 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used for the development of new vocabularies for other historical corpora. 
URL https://figshare.com/articles/DECM_Ontology_T-AP_Digging_into_Early_Colonial_Mexico_Project/12058095...
 
Title Machine ready corpus of 16th century Geographic Reports of New Spain 
Description Curated electronic corpus of the 16th century Geographic Reports of New Spain, ready for quantitative or qualitative computational analysis. This includes: 10 volumes of the Relaciones Geográficas del siglo XVI (UNAM-IIA, 1982-1988), edited by René Acuña; 2 volumes of the Relaciones Histórico-Geográficas de la Gobernación de Yucatán (UNAM, 1983), edited by Mercedes de la Garza; 6 volumes of Papeles de Nueva España (1905), edited by Francisco del Paso y Troncoso; and 5 volumes of the Colección de Documentos Inéditos de las Posesiones de España en Ultramar (Real Academia de la Historia, Madrid, 1864-1884). 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact This corpus can be used with a wide range of computational approaches, including corpus linguistics and the extraction of information for use with Geographic Information Systems. 
URL https://github.com/patymurrieta/Digging-into-Early-Colonial-Mexico
 
Title Pathways to understanding 16th century Mesoamerica 
Description This is an online engagement collection of three storymaps exploring different aspects of historical geographies derived from the research of the Digging into Early Colonial Mexico project. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact The resource has been used for teaching at Lancaster and at other universities in Latin America and the US. It has also led some institutions to contact us with queries about our work. 
URL https://www.lancaster.ac.uk/digging-ecm/2019/07/pathways-to-understanding-16th-century-mesoamerica/
 
Title Pelagios Commons: Subaltern Recogito Project 
Description The team created an ontology to annotate the maps and carried out a workshop which was attended by 27 scholars from UNAM and ENAH, delivering training on Recogito and presenting an introduction to the Spatial Humanities and the use of these technologies. The project coordinators and contributors met online every week to take part in 'mappathons' with all our participants, completing the annotation of our full corpus of sixteenth-century maps. This was a collaboration between LLILAS Benson Latin American Studies and Collections at The University of Texas at Austin, the National School of Anthropology and History (ENAH), The National Autonomous University of Mexico (UNAM), the National Institute of Anthropology and History (INAH), and the University of Lisbon. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact This dataset is now being used for the development of Computer Vision models in Computer Science. 
URL https://dataverse.tdl.org/citation?persistentId=doi:10.18738/T8/L2SJQT
 
Title Pelagios Commons: Subaltern Recogito Project 
Description This is the annotated collection of the maps from the Relaciones Geográficas de la Nueva España that are part of the Nettie Lee Benson Latin American Collection at the University of Texas. This dataset is a product of 'Subaltern Recogito', a small citizen science spin-off from the 'Digging into Early Colonial Mexico' project. The project carried out 5 mappathons with a diverse range of participants, culminating in the creation of this dataset, now available at the Nettie Lee Benson Library at the University of Texas. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact Through a workshop carried out online, 35 participants were introduced to and trained in map annotation with the Recogito software. This project aimed to (1) spread awareness of Digital Humanities tools among Latin American scholars; (2) train students and staff at diverse Mexican institutions in Spatial Humanities methods; and (3) create a collaborative project on the Relaciones Geográficas maps, working towards a fully annotated dataset. A variety of scholars and interested members of the public in Mexico are now working with these tools and with other Latin American datasets. 
URL https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/L2SJQT
 
Title Subaltern Recogito Ontology 
Description This document contains a tailored ontology that encompassed the complexity of the maps (pinturas) of the Relaciones Geográficas de la Nueva España (1577-1585), resulting in twenty-one entities or categories (types of information). These entities were defined and linked to DBpedia, a semantic standard, to ensure the interoperability of our dataset. More information about the use of this ontology in combination with Machine Learning and Natural Language Processing techniques: https://www.academia.edu/41852118/Digital_Approaches_to_Historical_Archaeology_Exploring_the_Geographies_of_16th_Century_New_Spain 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used for work on the Unlocking the Colonial Archive project to continue research on the annotation of historical images with Computer Vision techniques. 
URL https://figshare.com/articles/Subaltern_Recogito_Ontology/12063180
 
Title Subaltern Recogito Ontology 
Description This document contains a tailored ontology that encompassed the complexity of the maps (pinturas) of the Relaciones Geográficas de la Nueva España (1577-1585), resulting in twenty-one entities or categories (types of information). These entities were defined and linked to DBpedia, a semantic standard, to ensure the interoperability of our dataset. More information about the use of this ontology in combination with Machine Learning and Natural Language Processing techniques: https://www.academia.edu/41852118/Digital_Approaches_to_Historical_Archaeology_Exploring_the_Geographies_of_16th_Century_New_Spain 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used for work on the Unlocking the Colonial Archive project to continue research on the annotation of historical images with Computer Vision techniques. 
URL https://figshare.com/articles/Subaltern_Recogito_Ontology/12063180/1
 
Title Subaltern Recogito Ontology 
Description This document contains a tailored ontology that encompassed the complexity of the maps (pinturas) of the Relaciones Geográficas de la Nueva España (1577-1585), resulting in twenty-one entities or categories (types of information). These entities were defined and linked to DBpedia, a semantic standard, to ensure the interoperability of our dataset. More information about the use of this ontology in combination with Machine Learning and Natural Language Processing techniques: https://www.academia.edu/41852118/Digital_Approaches_to_Historical_Archaeology_Exploring_the_Geographies_of_16th_Century_New_Spain 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is now being used for work on the Unlocking the Colonial Archive project to continue research on the annotation of historical images with Computer Vision techniques. 
URL https://figshare.com/articles/Subaltern_Recogito_Ontology/12063180/2
 
Title Transcripción del Catálogo Monumental de España: Provincia de Ávila por Manuel Gómez Moreno (1900-1901) 
Description Text in computer readable format of the 2 books of the volume dedicated to the province of Ávila of the Monumental Catalogue of Spain written by Manuel Gómez-Moreno (1900-1901). Using Transkribus, specific Handwriting Text Recognition models have been trained to recognize this author's spelling and are available on the online platform (Gomez-Moreno_v1). After automatic transcription, the text was manually revised. The transcriptions were carried out by Raquel Liceras-Garrido, Alba Comino and Patricia Murrieta-Flores under the project "Goodbye reading glasses: a Machine Learning experiment on handwriting documents", funded by the Faculty of Arts and Social Sciences and the Digital Humanities Hub of Lancaster University (UK). 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The transcriptions are now available for anyone to use for research, and other researchers are using the HTR models generated to do automated transcriptions. 
URL https://figshare.com/articles/Transcripci_n_del_Cat_logo_Monumental_de_Espa_a_Provincia_de_vila_por_...
 
Title Transcripción del Catálogo Monumental de la Provincia de Soria por Juan Cabré (1916-1917) 
Description Text in computer readable format of the 8 books of the volume dedicated to the province of Soria of the Monumental Catalogue of Spain written by Juan Cabré (1916-1917). Using Transkribus, specific Handwriting Text Recognition models have been trained to recognize this author's spelling and are available on the online platform (Cabre_v3). After automatic transcription, the text was manually revised. The transcriptions were carried out by Raquel Liceras-Garrido, Alba Comino and Patricia Murrieta-Flores under the project "Goodbye reading glasses: a Machine Learning experiment on handwriting documents", funded by the Faculty of Arts and Social Sciences and the Digital Humanities Hub of Lancaster University (UK). 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The transcriptions are now available for anyone to use for research, and other researchers are using the HTR models generated to do automated transcriptions. 
URL https://figshare.com/articles/Transcripci_n_del_Cat_logo_Monumental_de_la_Provincia_de_Soria_por_Jua...
 
Title Transcripción del Catálogo Monumental y Artístico de la Provincia de Burgos por Narciso Sentenach (1925) 
Description Text in computer readable format of the 7 books of the volume dedicated to the province of Burgos of the Monumental Catalogue of Spain written by Narciso Sentenach (1927). Using Transkribus, specific Handwriting Text Recognition models have been trained to recognize this author's spelling and are available on the online platform (Sentenach_v3). After automatic transcription, the text was manually revised. The transcriptions were carried out by Raquel Liceras-Garrido, Alba Comino and Patricia Murrieta-Flores under the project "Goodbye reading glasses: a Machine Learning experiment on handwriting documents", funded by the Faculty of Arts and Social Sciences and the Digital Humanities Hub of Lancaster University (UK). 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The transcriptions are now available for anyone to use for research, and other researchers are using the HTR models generated to do automated transcriptions. 
URL https://figshare.com/articles/Transcripci_n_del_Cat_logo_Monumental_y_Art_stico_de_la_Provincia_de_B...
 
Description Colegio de Mexico 
Organisation College of Mexico
Country Mexico 
Sector Academic/University 
PI Contribution During this period we organised a meeting in Mexico City with Dr. Alfonso Medina Urrea at CELL-COLMEX (Centre for Linguistic and Literary Studies) to discuss possible collaborative research and issues related to important resources they have generated in recent years, such as the corpus of Colonial Mexican Spanish and the dictionary of Mexican Spanish. The result was a collaboration agreement on work related to corpus linguistics and historical archaeology, as well as possible student and researcher exchanges between the two institutions.
Collaborator Contribution We sought a partnership with the Centre for Linguistic and Literary Studies at Colmex because of their expertise and the potential benefits of an international collaboration that takes the materials we are working on in this project as its starting point. So far, we have held meetings to discuss particular aspects of the research that could be carried out once the datasets we are preparing are ready.
Impact This is an ongoing interdisciplinary collaboration between linguistics, literature and historical archaeology, and we are working together at the moment. We foresee some collaborative outputs in the future, particularly related to the annotation of an extended historical corpus, additional to the one the project has already created.
Start Year 2018
 
Description HGIS de las Indias 
Organisation Austrian Science Fund (FWF)
Country Austria 
Sector Academic/University 
PI Contribution We have established a close collaboration with the HGIS de las Indias project, funded by the Austrian Science Fund and based at the University of Graz. We are exchanging datasets with the developer and will integrate the two databases in the near future. Our DECM dataset will be provided to the developer and also offered through his platform. The integration is intended to give users broader coverage of colonial geographical data, extending the original scope of HGIS de las Indias to include our dataset, which focuses on the 16th century. We are also providing more granular historical detail for place-name disambiguation for the 17th and 18th centuries.
Collaborator Contribution HGIS de las Indias will also offer our T-AP DECM dataset through their platform in the future, and its developer is carrying out research with us as well.
Impact It is an interdisciplinary collaboration between Geography, Archaeology, History and Computer Science.
Start Year 2019
 
Description IIA-UNAM 
Organisation National Autonomous University of Mexico
Department Faculty of Political and Social Sciences (FCPyS)
Country Mexico 
Sector Academic/University 
PI Contribution During this period we established a collaboration with the Institute of Anthropological Research at UNAM, working with Dr Jorge Manuel Herrera-Tovar.
Collaborator Contribution This collaboration has already generated two events. The first was an introduction to the research carried out at the Digital Humanities Hub and the History Department at Lancaster, presenting the project as a case study, with the participation of undergraduate and postgraduate students, as well as early career researchers and staff at IIA-UNAM. It attracted around 35 participants. During the second workshop we learned about UNAM projects being developed in Digital History and Archaeology, and participants had a guided discussion on technologies, methodologies and the future of techniques in these fields, as well as links between institutions. The workshops were organised in collaboration with Dr Jorge Manuel Herrera-Tovar, who led the UNAM side. We concluded by agreeing on future research collaborations and student and researcher exchanges.
Impact This is an interdisciplinary collaboration between History and Archaeology. The main outcomes from this collaboration will come in the third year of our project, as these will consist of research carried out together with the datasets we are creating at the moment. In addition, two students from UNAM will be carrying out internships related to the project at Lancaster University this summer.
Start Year 2018
 
Description You-I Lab Institute of Scientific and Technological Research of Potosi 
Organisation Potosino Institute of Scientific and Technological Research
Country Mexico 
Sector Academic/University 
PI Contribution We are posing a series of challenges, in the form of questions in historical archaeology that can be approached through technological means. Our research partner in turn is helping us to develop the concept for software of use to the Humanities.
Collaborator Contribution We are working together on developing the concept for software for Geographical Text Analysis.
Impact At the moment we are holding collaborative meetings. The disciplines involved are: History, Archaeology, Literature, Computer Science, and Software Engineering.
Start Year 2018
 
Title Geographical Text Analysis Tool 
Description The Geographical Text Analysis Platform is the technical implementation of the methodology of the same name created by our research group at Lancaster University. It enables researchers with a low level of digital literacy to carry out the geographic analysis of any textual corpus, incorporating information that can be further used in Geographic Information Systems. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact Not known yet. 
URL https://www.lancaster.ac.uk/digging-ecm/
 
Description DECM Media: Twitter 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Our main media channel has achieved good reach for the project in this period. We get around 7.5K impressions every 28 days (around 256 impressions per day). Our audience is international, with substantial reach in the UK, USA, Germany, Mexico, Canada, Colombia and Portugal.
Year(s) Of Engagement Activity 2018,2019
URL https://twitter.com/DiggingCH/media
 
Description Digging into Early Colonial Mexico Blog 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Through our bilingual journal we are reaching students, practitioners and the general public, expanding general knowledge of historical and archaeological research related to the Latin American colonial period. The journal includes blog posts about specific areas of research and events related to the project, and it has proved to be an excellent starting point for conversation with students in Latin America, particularly Mexico and Guatemala, as well as in Europe. People have emailed us asking about specific areas of research or possible avenues of collaboration after seeing these blog posts.
Year(s) Of Engagement Activity 2018,2019
URL https://www.lancaster.ac.uk/digging-ecm/journal/
 
Description Invited speaker in public debate panel: Man vs. Machine: the role of Artificial Intelligence within the Digital Humanities 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This was a public debate, broadcast live and recorded as a podcast, on the role of Artificial Intelligence in Humanities research. It resulted in further contacts with industry and conversations with other universities on the impact of Computer Science on Humanities research.
Year(s) Of Engagement Activity 2019
URL https://glc2.workcast.com/clusterSVCFS1/NAS/PseudoMedia/11044/5332003696877178/11044_9_9_107_68488_0...
 
Description Subaltern Recogito Citizen Science Project 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact This project sought to introduce members of the public, as well as new and established scholars in Latin America, to the field of Spatial Humanities and to train them in the use of the online tool Recogito. Using the 16th century maps of the Geographic Reports of New Spain, the aim of the project was to continue and extend digital annotations of these maps, which we hope will offer new insight into perceptions of space and place across sixteenth-century Mexico from both indigenous and European perspectives.

Through a workshop carried out online and in person, 35 participants were introduced to and trained in map annotation with Recogito. Supported by the Nettie Lee Benson Latin American Collection at the University of Texas, the National School of Anthropology and History (ENAH), The National Autonomous University of Mexico (UNAM), the National Institute of Anthropology and History (INAH), and the University of Lisbon, the project included a range of American and European scholars, students and interested members of the public. The session aimed to (1) spread awareness of Digital Humanities tools among Latin American scholars; (2) train interested members of the public, as well as students and staff at the aforementioned institutions, in Spatial Humanities methods; and (3) create a collaborative project on the Relaciones Geográficas maps, working towards a fully annotated dataset.
Year(s) Of Engagement Activity 2019