Dig that lick: Analysing large-scale data for melodic patterns in jazz performances

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

The recorded legacy of jazz spans a century and provides a vast corpus of data documenting
its development. Recent advances in digital signal processing and data analysis technologies
enable automatic recognition of musical structures and their linkage through metadata to
historical and social context. Automatic metadata extraction and aggregation give unprecedented
access to large collections, fostering new interdisciplinary research opportunities.

This project aims to develop innovative technological and music-analytical methods to gain
fresh insight into jazz history by bringing together renowned scholars and results from several
high-profile projects. Musicologists and computer scientists will together create a deeper and
more comprehensive understanding of jazz in its social and cultural context. We exemplify our
methods via a full cycle of analysis of melodic patterns, or "licks", from audio recordings to an
aesthetically contextualised and historically situated understanding.

Planned Impact

The study of jazz requires insights from, and feeds knowledge back into, African American Studies,
Anthropology, Art History, Literary Studies, Music, Philosophy, Political Science, and Sociology.
A thorough analysis of a century's worth of jazz recordings, and the practices the music entails,
is now possible thanks to recent advances in the computational analysis of audio content, or
Music Information Retrieval (MIR), and to progress in processing large datasets and information
management with Semantic Web technologies. The former enables the automatic description
of audio recordings in terms of high-level or structural musical aspects, and the latter allows
such analyses to be linked to discographic metadata, distributed over multiple sites, describing
performers and composers, listeners, performance venues, and production and consumption
factors, and general historic, cultural and geographic information from external resources. These
technologies can now facilitate access to large collections by researchers from the many disciplines
interested in the evolution of musical expression.

The Dig that Lick project will: enhance infrastructures for semantic audio analyses
of large collections; facilitate access to large collections of audio and associated metadata via
interfaces for content selection, semantic analysis, and aggregation of results that humanities
researchers can easily use; develop this infrastructure to analyse melodic patterns across large
corpora of jazz audio; and relate the results to metadata and background knowledge in order
to trace and interpret musical influence across time and space as well as cultures and societies.
We will develop tool sets and resources that allow researchers to perform studies over wide
time-spans and geographic locations, for example to trace the evolution or spread of certain
musical phenomena. This will enable cross-historical or comparative geographical music research
with direct reference to data and metadata on music performance and creation, an approach
rarely attempted in the musicology of jazz, or of non-notated music.

Our target audiences include: academic communities in jazz studies, whether in music, cultural
studies, social sciences, or business management; MIR practitioners in engineering or library and
information sciences as they relate to music; the J-DISC user community, that is, researchers
and educators who require structured, comprehensive search capabilities in investigating the
cultural background and social networks of jazz performance, accessed via the recorded legacy
of jazz; the Jazzomat community of musicologists and engineers interested in musical and cognitive
questions derived from jazz solo analysis; jazz musicians interested in a topic they wish to document
or explore for professional reasons; and jazz fans or students wishing to know more about an artist they follow.

We will engage with our target audiences via academic publications and presentations, special
events, software and data releases, and communication to the public via non-academic
channels, including the Dig that Lick web site, blogs, outreach events, social media and press
releases, as appropriate.

Our audiences will benefit in the following ways: jazz researchers - our tools and resources
will provide a powerful new paradigm for evidence-based research; jazz musicians - as this is an
unusually analytical community, we expect many of them to use the software developed by
this project to investigate and reflect on their own performances, helping them to better
understand the transmission and assimilation of patterns that are sometimes
conscious, but often opaque to the artists themselves; jazz aficionados - our software tools and
resources will be available through the web platform operated by the Center for Jazz Studies,
giving them a deeper understanding of their favourite artists.
 
Description Initial work involved analysis of a set of ~450 transcriptions of jazz solos (the Weimar Jazz Database), which identified a large number of melodic patterns repeated within and between improvisations. This work has been extended by automatic analysis of a collection of over 1000 jazz recordings, where the solos were transcribed by computer. We developed and tested an automatic transcription method that works reasonably well for the purposes of this project. The metadata for these tunes was identified and linked with various online resources, according to a new ontology that we developed, the Jazz Ontology. This ontology was populated with metadata from several large-scale audio and bibliographic corpora (the Jazz Encyclopedia, the Jazz Discography), and the resulting datasets were merged and linked to existing Linked Open Data resources. These datasets are publicly available and have been integrated into the main showcase of our project, the Dig That Lick Pattern Similarity Search web site. Users can search for patterns across several datasets, view the metadata, and listen to the relevant excerpts for all instances of the query pattern found in the datasets. This online application is being used by jazz researchers and music lovers for the systematic study of jazz.
Exploitation Route This work is being taken forward in studies of patterns automatically extracted from audio recordings. Musicians could use these resources to learn about jazz, and in particular to learn idiomatic phrases that could be built into their own improvisations.
Sectors Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections

URL http://dig-that-lick.eecs.qmul.ac.uk/index.html
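
The idea of melodic patterns "repeated within and between improvisations" can be illustrated with a small sketch. This is not the project's actual pipeline: it assumes each solo is available as a list of MIDI pitch numbers, represents a pattern as an n-gram of semitone intervals (so that transposed figures match), and uses hypothetical function names and toy data throughout.

```python
from collections import defaultdict


def intervals(pitches):
    """Convert a pitch sequence into semitone intervals between successive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def ngrams(seq, n):
    """Return all contiguous subsequences of length n as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def shared_patterns(solos, n=4, min_solos=2):
    """Return interval n-grams that occur in at least min_solos different solos."""
    seen_in = defaultdict(set)
    for solo_id, pitches in solos.items():
        for gram in ngrams(intervals(pitches), n):
            seen_in[gram].add(solo_id)
    return {g: sorted(ids) for g, ids in seen_in.items() if len(ids) >= min_solos}


# Toy data (invented for illustration, not taken from any dataset).
solos = {
    "solo_a": [60, 62, 64, 65, 67, 65, 64],
    "solo_b": [65, 67, 69, 70, 72, 60],  # same rising figure, a fourth higher
    "solo_c": [60, 59, 57, 55, 53],
}
patterns = shared_patterns(solos, n=4)
print(patterns)
```

Working on intervals rather than absolute pitches is one possible design choice; it treats the same figure played in different keys as a single pattern.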
 
Description Since the release of the Dig That Lick Pattern Similarity Search web site, we have been contacted by a number of amateur musicians who are using the site to explore jazz improvisation and the dataset that we have analysed. The feedback received was very positive.
First Year Of Impact 2019
Sector Creative Economy; Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections
Impact Types Cultural

 
Description New Directions in Digital Jazz Studies: Music Information Retrieval and AI Support for Jazz Scholarship in Digital Archives
Amount £199,685 (GBP)
Funding ID AH/V009699/1 
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 02/2021 
End 08/2023
 
Title The Jazz Ontology: Ontology and software for processing jazz metadata 
Description Jazz is a musical tradition with about 100 years of history; unlike in other Western musical traditions, improvisation plays a central role in jazz. Modelling the domain of jazz poses some ontological challenges due to specificities in musical content and performance practice, such as the prevalence of recording sessions, the fluidity of band lineups and the importance of short melodic patterns for improvisation. The Jazz Ontology is a semantic model that addresses these challenges, and also describes workflows for annotating melody transcriptions and for pattern search. The Jazz Ontology incorporates existing standards and ontologies such as FRBR and the Music Ontology. The ontology has been assessed by examining how well it supports describing and merging existing datasets and whether it facilitates novel discoveries in a music browsing application. The Jazz Ontology has been populated with the metadata from several large-scale audio and bibliographic corpora (the Jazz Encyclopedia, the Jazz Discography). The resulting RDF datasets were merged and linked to existing Linked Open Data resources. These datasets are publicly available and are driving an online application that is being used by jazz researchers and music lovers for the systematic study of jazz. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact The Jazz Ontology is used in the Dig That Lick Pattern Similarity Search website, which is one of the major deliverables of our 2-year research project. It can be found here: https://dig-that-lick.hfm-weimar.de/similarity_search/ 
URL https://osf.io/rqk7z/
 
Title History of Recorded Jazz: DTL1000, 1920-2020 
Description We present the DTL1000 dataset, which was created in the "Dig That Lick" project and covers the history of recorded jazz with a sample of 1,750 improvisations extracted from 1,060 audio tracks. The dataset contains a mixture of collected (editorial metadata), manually annotated (structure, style), and automatically generated (main melody transcriptions of solos) data describing the recordings. The motivation for creating this dataset was the study of patterns in jazz improvisation, but there are many other applications for this resource. The accompanying paper presents the dataset creation process, data structure and contents with descriptive statistics and discusses the origin and process of the annotations, as well as general use cases and specifically the case of pattern analysis. These components and their combinations enable a number of use cases for jazz studies as well as algorithm development for music analysis. The DTL1000 dataset provides a rich resource for a variety of disciplines, and constitutes a contribution to a field where large datasets with rich annotations are scarce. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact The dataset is incorporated in the Pattern Search and Pattern Similarity Search web sites for exploring patterns in jazz solos. These are public-facing resources for researchers, students and enthusiasts to analyse the use of melodic patterns in jazz improvisation. See https://dig-that-lick.hfm-weimar.de/pattern_search/ and https://dig-that-lick.hfm-weimar.de/similarity_search/ . 
URL https://dx.doi.org/10.5255/UKDA-SN-854781
 
Title The Dig That Lick Pattern Similarity Search website 
Description The Dig That Lick Pattern Similarity Search website is one of the major deliverables of our 2-year research project. Key features: We currently support four melodic databases: (1) the new DTL1000 database, comprising 300,000 tone events in 1,736 monophonic solos from over 600 jazz tunes spanning the 100 years of jazz history, with the solos extracted automatically from audio using a newly developed CRNN-based algorithm specialised for jazz; (2) the well-known Weimar Jazz Database, with about 200,000 tone events from 456 monophonic solos by 78 jazz masters; (3) the Charlie Parker Omnibook, with about 18,000 tones taken from 52 solos by the co-inventor of bebop; and (4) the Essen Folk Song Collection, comprising about 350,000 notes from 7,352 folk songs. Similarity search can be carried out using interval, refined contour, and pitch patterns (n-grams). The underlying similarity measure is based on the Levenshtein distance, which gives a reasonable approximation to true perceptual similarity. Various user-definable search parameters, a virtual keyboard for query input, and extensive metadata filters are also available. The result list shows all pattern instances for a given query in the user-defined similarity range, with essential metadata and audio snippets for quick aural control. The search results can be grouped by performer (or folk song collection) and by pattern. Users can display extra information for each pattern instance as needed. Result sets can be exported to CSV files with a single click. Furthermore, we provide several visualisation options for result sets, such as a pattern timeline and various kinds of pattern networks. Global and personal search histories are available for quick retrieval of previous searches and for exploration of other users' queries. Extensive documentation is also available. 
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact The algorithms, interfaces and data provided by this web site enable investigation of musical improvisation by musicologists and fans of jazz. 
URL https://dig-that-lick.hfm-weimar.de/similarity_search/
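
The description above states that the similarity measure is based on the Levenshtein distance over n-gram patterns. The following is a minimal sketch of that idea, not the website's implementation: the normalisation by the longer pattern's length, the `min_similarity` threshold, and all function names are assumptions made for illustration, and the candidate patterns are toy interval tuples.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical patterns, 0.0 for maximally different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def search(query, patterns, min_similarity=0.7):
    """Return (similarity, pattern) pairs at or above the threshold, best first."""
    hits = [(similarity(query, p), p) for p in patterns]
    return sorted((h for h in hits if h[0] >= min_similarity), reverse=True)


# Toy interval patterns (invented for illustration).
query = (2, 2, 1, 2)
candidates = [(2, 2, 1, 2), (2, 2, 2, 2), (2, 1, 2), (-1, -2, -2, -2)]
results = search(query, candidates, min_similarity=0.7)
for score, pattern in results:
    print(f"{score:.2f}  {pattern}")
```

A threshold on normalised similarity is one way to model a "user-defined similarity range"; the actual site may define its range and weighting differently.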