The Alan Turing Institute
Lead Research Organisation:
The Alan Turing Institute
Department Name: Research
Abstract
The work of the Alan Turing Institute will enable knowledge and predictions to be extracted from large-scale and diverse digital data. It will bring together the best people, organisations and technologies in data science for the development of foundational theory, methodologies and algorithms. These will inform scientific and technological discoveries, create new business opportunities, accelerate solutions to global challenges, inform policy-making, and improve the environment, health and infrastructure of the world in an 'Age of Algorithms'.
Planned Impact
The Institute will bring together leaders in advanced mathematics and computing science from the five founding universities and other partners. Its work is expected to encompass a wide range of scientific disciplines and be relevant to a large number of business sectors.
Publications
Candellero E
(2017)
Coupling of Brownian motions in Banach spaces
Giuffrida M
(2017)
ARIGAN: Synthetic Arabidopsis Plants Using Generative Adversarial Network
Cucuringu M
(2017)
On denoising modulo 1 samples of a function
Botev A.
(2017)
Complementary sum sampling for likelihood approximation in large scale classification
in Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017
He Z.
(2017)
Wider and deeper, cheaper and faster: Tensorized LSTMs for sequence learning
in Advances in Neural Information Processing Systems
Nanda V
(2017)
Local cohomology and stratification
Choromanski K
(2017)
The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings
Peinelt N.
(2017)
ClassifierGuesser: A Context-based Classifier Prediction System for Chinese Language Learners
in 8th International Joint Conference on Natural Language Processing - Proceedings of the IJCNLP 2017, System Demonstrations
Title | 2020-04-01 - Data Safe Havens in the Cloud - CW20 Workshop.pptx |
Description | A talk given at the SSI Collaborations Workshop in April 2020 discussing the Alan Turing Institute's "Data Safe Havens in the Cloud" project. The slides are included here. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
URL | https://cw20.figshare.com/articles/presentation/2020-04-01_-_Data_Safe_Havens_in_the_Cloud_-_CW20_Wo... |
Title | 34-productive-research-on-sensitive-data-using-cloud-based-secure-research-environments-james-robinson-martin-oreilly.mp4 |
Description | A talk given at the SSI Collaborations Workshop in April 2020 discussing the Alan Turing Institute's "Data Safe Havens in the Cloud" project. A video recording of the talk plus subsequent Q&A are included here. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
URL | https://cw20.figshare.com/articles/presentation/34-productive-research-on-sensitive-data-using-cloud... |
Title | Reproducible secure research environments: Talk from Safe Data Access Professionals Quarterly Meeting on 08 June 2021 |
Description | Overview of the challenges of supporting reproducible research on sensitive data and how the Turing addresses these in its Safe Haven secure research environment. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
URL | https://figshare.com/articles/presentation/Reproducible_secure_research_environments_Talk_from_Safe_... |
Description | For Key Findings and Impact, please see our Annual Report: https://www.turing.ac.uk/about-us/annual-report-2021-22 |
Exploitation Route | Please see our Annual Report: https://www.turing.ac.uk/about-us/annual-report-2021-22 |
Sectors | Aerospace, Defence and Marine; Agriculture, Food and Drink; Communities and Social Services/Policy; Construction; Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology |
URL | https://www.turing.ac.uk/ |
Description | For Key Findings and Impact, please see our Annual Report: https://www.turing.ac.uk/about-us/annual-report-2021-22 |
Sector | Aerospace, Defence and Marine; Agriculture, Food and Drink; Communities and Social Services/Policy; Construction; Creative Economy; Digital/Communication/Information Technologies (including Software); Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology; Security and Diplomacy; Transport; Other |
Impact Types | Cultural, Societal, Economic, Policy & public services |
Title | DETOX seismic tomography models |
Description | DETOX tomography models. This folder contains three tomography models, DETOX-P1, DETOX-P2 and DETOX-P3, in the following formats: NetCDF (dirname: grid_nc4); VTK (dirname: vtk); xyz-value (dirname: txt_tetrahedron); JPEG for GPLATES, only high-velocities (dirname: GPLATES). Each model directory (DETOX-P1, DETOX-P2, DETOX-P3) contains the four subdirectories GPLATES, grid_nc4, txt_tetrahedron and vtk. Citation: Kasra Hosseini, Karin Sigloch, Maria Tsekhmistrenko, Afsaneh Zaheri, Tarje Nissen-Meyer, Heiner Igel, Global mantle structure from multifrequency tomography using P, PP and P-diffracted waves, Geophysical Journal International, Volume 220, Issue 1, January 2020, Pages 96-141, https://doi.org/10.1093/gji/ggz394 |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3993276 |
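Assuming the directory layout described above, a small helper can enumerate the twelve model/format directories a complete local copy should contain (`expected_paths` is a hypothetical convenience, not part of the archive):

```python
from pathlib import Path

# The three DETOX models and the four per-model format directories
# documented for this archive.
MODELS = ("DETOX-P1", "DETOX-P2", "DETOX-P3")
FORMATS = ("GPLATES", "grid_nc4", "txt_tetrahedron", "vtk")

def expected_paths(root):
    """Yield every model/format directory the unpacked archive should contain."""
    root = Path(root)
    for model in MODELS:
        for fmt in FORMATS:
            yield root / model / fmt
```

Checking `all(p.is_dir() for p in expected_paths("path/to/unpacked/archive"))` would then confirm a download unpacked completely.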
Title | DETOX seismic tomography models |
Description | DETOX tomography models. This folder contains three tomography models, DETOX-P1, DETOX-P2 and DETOX-P3, in the following formats: NetCDF (dirname: grid_nc4); VTK (dirname: vtk); xyz-value (dirname: txt_tetrahedron); JPEG for GPLATES, only high-velocities (dirname: GPLATES). Each model directory (DETOX-P1, DETOX-P2, DETOX-P3) contains the four subdirectories GPLATES, grid_nc4, txt_tetrahedron and vtk. Citation: Kasra Hosseini, Karin Sigloch, Maria Tsekhmistrenko, Afsaneh Zaheri, Tarje Nissen-Meyer, Heiner Igel, Global mantle structure from multifrequency tomography using P, PP and P-diffracted waves, Geophysical Journal International, Volume 220, Issue 1, January 2020, Pages 96-141, https://doi.org/10.1093/gji/ggz394 |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3993275 |
Title | DUKweb (Diachronic UK web) |
Description | We present DUKweb, a set of large-scale resources useful for the diachronic analysis of contemporary English. The dataset is derived from the JISC UK Web Domain Dataset (1996-2013), which collects resources from the Internet Archive that were hosted on domains ending in '.uk'. The dataset includes co-occurrence matrices for each year and two types of word vectors by year: Temporal Random Indexing vectors and word2vec embeddings. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://bl.iro.bl.uk/work/f9ff33ab-56b7-4594-8aca-49781296c0c6 |
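With per-year vectors such as these, diachronic comparison typically reduces to cosine similarity between a word's embedding in two different years; a minimal self-contained sketch (the vectors in any real comparison would come from the dataset, not be hand-written):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

A word whose year-on-year cosine similarity drops sharply is a candidate for semantic change.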
Title | Data supporting "GABA, not BOLD, reveals dissociable learning-dependent plasticity mechanisms in the human brain" |
Description | Behavioural data. BOLD change measurements. GABA change measurements. Behavioural data under tDCS intervention. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Title | Dataset for Toponym Resolution in Nineteenth-Century English Newspapers |
Description | We present a new dataset for the task of toponym resolution in digitised historical newspapers in English. It consists of 343 annotated articles from newspapers based in four different locations in England (Manchester, Ashton-under-Lyne, Poole and Dorchester), published between 1780 and 1870. The articles have been manually annotated with mentions of places, which are linked---whenever possible---to their corresponding entry on Wikipedia. The dataset is published on the British Library shared research repository, and is especially of interest to researchers working on improving semantic access to historical newspaper content. We share the 343 annotated files (one file per article) in the WebAnno TSV file format version 3.2, a CoNLL-based file format. We additionally provide a TSV file with metadata at the article level, and the annotation guidelines. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://bl.iro.bl.uk/concern/datasets/de43a15c-e000-4fec-8b66-7ca94ae13db3 |
Title | Latin lexical semantic annotation |
Description | This dataset is a collection of lexical annotations of the corpus occurrences of 40 Latin lemmas. The corpus instances are from LatinISE and the process is described in Schlechtweg et al. (2020, 2021). The annotation was coordinated by Barbara McGillivray, and done by Annie Burman, Daria Kondakova, Francesca Dell'Oro, Helena Bermudez Sabel, Hugo Burgess, Paola Marongiu, and Rozalia Dobos. The pre-annotation was coordinated and designed by Barbara McGillivray and done by Manuel Márquez Cruz. References: McGillivray, B. and Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt (eds.), New Methods in Historical Corpus Linguistics. Tübingen: Narr. Barbara McGillivray, Dominik Schlechtweg, Haim Dubossarsky, Nina Tahmasebi, & Simon Hengchen (2021). DWUG LA: Diachronic Word Usage Graphs for Latin [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5255228. Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., Tahmasebi, N. (2020). SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020. International Committee for Computational Linguistics. DOI: 10.18653/v1/2020.semeval-1.1. Schlechtweg, D., Tahmasebi, N., Hengchen, S., Dubossarsky, H., McGillivray, B. (2021). DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. In Proceedings of EMNLP 2021. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://kcl.figshare.com/articles/dataset/Latin_lexical_semantic_annotation/16974823 |
Title | LatinISE subcorpora for SemEval 2020 task 1 |
Description | This data collection contains the Latin test data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection: a Latin text corpus pair (`corpus1/`, `corpus2/`), and 40 lemmas which have been annotated for their lexical semantic change between the two corpora (`targets.txt`). The corpus data have been automatically lemmatized and part-of-speech tagged, and have been partially corrected by hand. For homonyms, the lemmas are followed by the '#' symbol and the number of the homonym according to the Lewis-Short dictionary of Latin when this number is greater than 1. For example, the lemma 'dico' corresponds to the first homonym in the Lewis-Short dictionary and 'dico#2' corresponds to the second homonym. Corpus 1: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the second century before Christ (BC) to the end of the first century BC; size: ~1.7 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Corpus 2: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the first century after Christ (AD) to the end of the twenty-first century AD; size: ~9.4 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Find more information on the data in the papers referenced below.
References: Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky and Nina Tahmasebi (2020). SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. To appear in SemEval@COLING2020. McGillivray, B. and Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt (eds.), New Methods in Historical Corpus Linguistics. Tübingen: Narr. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3674988 |
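The homonym convention described above ('dico' vs 'dico#2') is easy to handle mechanically; a minimal sketch, assuming plain lemmas default to the first Lewis-Short homonym (`parse_lemma` is a hypothetical helper, not part of the released data):

```python
def parse_lemma(entry):
    """Split a target lemma into (lemma, homonym_number).

    Entries like 'dico#2' carry a Lewis-Short homonym index;
    plain entries are taken to denote the first homonym.
    """
    lemma, sep, num = entry.partition("#")
    return lemma, int(num) if sep else 1
```

This lets downstream code key statistics on the (lemma, homonym) pair rather than the raw string.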
Title | LatinISE subcorpora for SemEval 2020 task 1 |
Description | This data collection contains the Latin test data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection: a Latin text corpus pair (`corpus1/lemma`, `corpus2/lemma`); 40 lemmas which have been annotated for their lexical semantic change between the two corpora (`targets.txt`); and the annotated binary change scores of the targets for subtask 1 and their annotated graded change scores for subtask 2 (`truth/`). The corpus data have been automatically lemmatized and part-of-speech tagged, and have been partially corrected by hand. For homonyms, the lemmas are followed by the '#' symbol and the number of the homonym according to the Lewis-Short dictionary of Latin when this number is greater than 1. For example, the lemma 'dico' corresponds to the first homonym in the Lewis-Short dictionary and 'dico#2' corresponds to the second homonym. Corpus 1: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the second century before Christ (BC) to the end of the first century BC; size: ~1.7 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Corpus 2: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the first century after Christ (AD) to the end of the twenty-first century AD; size: ~9.4 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Find more information on the data in the papers referenced below.
References: Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky and Nina Tahmasebi (2020). SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. To appear in SemEval@COLING2020. McGillivray, B. and Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt (eds.), New Methods in Historical Corpus Linguistics. Tübingen: Narr. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3732944 |
Title | LatinISE test data for SemEval 2020 task 1 |
Description | This data collection contains the Latin test data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection: a Latin text corpus pair (`corpus1/lemma`, `corpus2/lemma`); 40 lemmas which have been annotated for their lexical semantic change between the two corpora (`targets.txt`); and the annotated binary change scores of the targets for subtask 1 and their annotated graded change scores for subtask 2 (`truth/`). The corpus data have been automatically lemmatized and part-of-speech tagged, and have been partially corrected by hand. For homonyms, the lemmas are followed by the '#' symbol and the number of the homonym according to the Lewis-Short dictionary of Latin when this number is greater than 1. For example, the lemma 'dico' corresponds to the first homonym in the Lewis-Short dictionary and 'dico#2' corresponds to the second homonym. Corpus 1: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the second century before Christ (BC) to the end of the first century BC; size: ~1.7 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Corpus 2: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the first century after Christ (AD) to the end of the twenty-first century AD; size: ~9.4 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Find more information on the data in the papers referenced below.
References: Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky and Nina Tahmasebi (2020). SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. To appear in SemEval@COLING2020. McGillivray, B. and Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt (eds.), New Methods in Historical Corpus Linguistics. Tübingen: Narr. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3734089 |
Title | LatinISE test data for SemEval 2020 task 1 with additional token versions of the corpora |
Description | This data collection contains the Latin test data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection: a Latin text corpus pair (`corpus1/lemma`, `corpus2/lemma`); 40 lemmas which have been annotated for their lexical semantic change between the two corpora (`targets.txt`); and the annotated binary change scores of the targets for subtask 1 and their annotated graded change scores for subtask 2 (`truth/`). The corpus data have been automatically lemmatized and part-of-speech tagged, and have been partially corrected by hand. For homonyms, the lemmas are followed by the '#' symbol and the number of the homonym according to the Lewis-Short dictionary of Latin when this number is greater than 1. For example, the lemma 'dico' corresponds to the first homonym in the Lewis-Short dictionary and 'dico#2' corresponds to the second homonym. Corpus 1: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the second century before Christ (BC) to the end of the first century BC; size: ~1.7 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Corpus 2: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the first century after Christ (AD) to the end of the twenty-first century AD; size: ~9.4 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Besides the official lemma version of the corpora for SemEval-2020 Task 1, we also provide the raw token version (`corpus1/token/`, `corpus2/token/`); it contains the raw sentences in the same order as in the lemma version. Find more information on the data and on SemEval-2020 Task 1 in the papers referenced below. The creation of the data was supported by the CRETA center and the CLARIN-D grant funded by the German Ministry for Education and Research (BMBF).
References: Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky and Nina Tahmasebi (2020). SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. To appear in SemEval@COLING2020. McGillivray, B. and Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt (eds.), New Methods in Historical Corpus Linguistics. Tübingen: Narr. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3674098 |
Title | LatinISE test data for SemEval 2020 task 1 with additional token versions of the corpora |
Description | This data collection contains the Latin test data for SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection: a Latin text corpus pair (`corpus1/lemma`, `corpus2/lemma`); 40 lemmas which have been annotated for their lexical semantic change between the two corpora (`targets.txt`); and the annotated binary change scores of the targets for subtask 1 and their annotated graded change scores for subtask 2 (`truth/`). The corpus data have been automatically lemmatized and part-of-speech tagged, and have been partially corrected by hand. For homonyms, the lemmas are followed by the '#' symbol and the number of the homonym according to the Lewis-Short dictionary of Latin when this number is greater than 1. For example, the lemma 'dico' corresponds to the first homonym in the Lewis-Short dictionary and 'dico#2' corresponds to the second homonym. Corpus 1: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the second century before Christ (BC) to the end of the first century BC; size: ~1.7 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Corpus 2: based on LatinISE (McGillivray and Kilgarriff 2013), version on Sketch Engine; language: Latin; time covered: from the beginning of the first century after Christ (AD) to the end of the twenty-first century AD; size: ~9.4 million tokens; format: lemmatized, sentence length >= 2, no punctuation, sentences randomly shuffled; encoding: UTF-8. Besides the official lemma version of the corpora for SemEval-2020 Task 1, we also provide the raw token version (`corpus1/token/`, `corpus2/token/`); it contains the raw sentences in the same order as in the lemma version. Find more information on the data and on SemEval-2020 Task 1 in the papers referenced below. The creation of the data was supported by the CRETA center and the CLARIN-D grant funded by the German Ministry for Education and Research (BMBF).
References: Dominik Schlechtweg, Barbara McGillivray, Simon Hengchen, Haim Dubossarsky and Nina Tahmasebi (2020). SemEval 2020 Task 1: Unsupervised Lexical Semantic Change Detection. To appear in SemEval@COLING2020. McGillivray, B. and Kilgarriff, A. (2013). Tools for historical corpus research, and a corpus of Latin. In Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt (eds.), New Methods in Historical Corpus Linguistics. Tübingen: Narr. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/3992738 |
Title | Living Machines atypical animacy dataset |
Description | Atypical animacy detection dataset, based on nineteenth-century sentences in English extracted from an open dataset of nineteenth-century books digitized by the British Library (available via https://doi.org/10.21250/db14, British Library Labs, 2014). This dataset contains 598 sentences containing mentions of machines. Each sentence has been annotated according to the animacy and humanness of the machine in the sentence. This dataset has been created as part of the following paper: Ardanuy, M. C., F. Nanni, K. Beelen, Kasra Hosseini, Ruth Ahnert, J. Lawrence, Katherine McDonough, Giorgia Tolfo, D. C. Wilson and B. McGillivray. "Living Machines: A study of atypical animacy." In Proceedings of the 28th International Conference on Computational Linguistics (COLING2020). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://bl.iro.bl.uk/work/323177af-6081-4e93-8aaf-7932ca4a390a |
Title | Monthly word embeddings for Twitter random sample (English, 2012-2018) |
Description | This dataset contains monthly word embeddings created from the tweets available via the statuses/sample endpoint of the Twitter Streaming API from 2012 to 2018. Full details of the creation of the dataset are given in "Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings". The md5sum of the gzipped tarball file is a76888ffec8cc7aebba09d365ca55ace. |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | Yes |
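The published md5sum above can be checked against a downloaded copy with a short helper; `md5sum` here is an illustrative function using only the standard library, and since the tarball's filename is not given in this record, the path in the usage note is hypothetical:

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute a file's MD5 checksum, reading in 1 MiB chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would look like `md5sum("embeddings.tar.gz") == "a76888ffec8cc7aebba09d365ca55ace"` (substituting the tarball's actual filename).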
Title | Research Data Supporting "Modelling prognostic trajectories of cognitive decline due to Alzheimer's disease" |
Description | |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://www.repository.cam.ac.uk/handle/1810/301740 |
Title | Research data supporting "Multimodal imaging of brain connectivity reveals predictors of individual decision strategy in statistical learning" |
Description | Behavioural data, resting-state fMRI connectivity data and graph metrics data (see supporting data description .doc file for more information) |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | Yes |
Title | Research data supporting "White-Matter Pathways for Statistical Learning of Temporal Structures" |
Description | Behavioural data and DTI connectivity data (see supporting data description .doc file for more information) |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | Yes |
Title | Supplementary material for 'A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching' |
Description | Supplementary material for the https://github.com/Living-with-machines/LwM_SIGSPATIAL2020_ToponymMatching repository, containing the underlying code and materials for the paper 'A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching', accepted to SIGSPATIAL2020 as a poster paper. Coll Ardanuy, M., Hosseini, K., McDonough, K., Krause, A., van Strien, D. and Nanni, F. (2020): A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching, SIGSPATIAL: Poster Paper. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4034818 |
Title | Supplementary material for 'A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching' |
Description | Supplementary material for the https://github.com/Living-with-machines/LwM_SIGSPATIAL2020_ToponymMatching repository, containing the underlying code and materials for the paper 'A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching', accepted to SIGSPATIAL2020 as a poster paper. Coll Ardanuy, M., Hosseini, K., McDonough, K., Krause, A., van Strien, D. and Nanni, F. (2020): A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching, SIGSPATIAL: Poster Paper. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/4034819 |
Title | Visual Identification of Individual Holstein Friesian Cattle via Deep Metric Learning |
Description | This dataset accompanies the paper - "Visual Identification of Individual Holstein Friesian Cattle via Deep Metric Learning" available at - https://arxiv.org/abs/2006.09205. It consists of two components: (a) detection and localisation, (b) identification. For an overview of this dataset, refer to Section 3 in the paper. For any queries, contact the corresponding author in the paper. For accompanying source code, check out - https://github.com/CWOA/MetricLearningIdentification |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
URL | https://data.bris.ac.uk/data/dataset/10m32xl88x2b61zlkkgz3fml17/ |
Title | DeezyMatch |
Description | DeezyMatch: A Flexible Deep Neural Network Approach to Fuzzy String Matching. DeezyMatch can be applied to the following tasks: record linkage; candidate selection for entity linking systems; toponym matching. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
URL | https://zenodo.org/record/3983554 |
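The toponym-matching task DeezyMatch targets is, at its core, scoring string pairs for similarity. DeezyMatch itself learns this with a deep neural network; the sketch below is only a classical edit-distance baseline for the same task, not DeezyMatch's API (both function names are invented for illustration):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def match_score(a, b):
    """Length-normalised similarity in [0, 1]; 1.0 for identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```

A learned model such as DeezyMatch improves on this baseline by capturing spelling variants and transliterations that plain edit distance scores poorly.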
Title | DeezyMatch |
Description | DeezyMatch: A Flexible Deep Neural Network Approach to Fuzzy String Matching. DeezyMatch can be applied to the following tasks: record linkage; candidate selection for entity linking systems; toponym matching. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
URL | https://zenodo.org/record/3983555 |
Title | passt/miceandmen: Code released with manuscript. |
Description | Source code related to Stumpf et al. (2020) Transfer learning from mouse to man. |
Type Of Technology | Software |
Year Produced | 2020 |
URL | https://zenodo.org/record/4105890 |
Title | passt/miceandmen: Code released with manuscript. |
Description | Source code related to Stumpf et al. (2020) Transfer learning from mouse to man. |
Type Of Technology | Software |
Year Produced | 2020 |
URL | https://zenodo.org/record/4105891 |