Data Stories: Engaging citizens with data in a post-truth society

Lead Research Organisation: King's College London

Department Name: Informatics

Abstract

In the post-truth society we live in, experts must find novel ways to bring hard, factual data to citizens. Data must entertain as well as inform, and excite as well as educate. It must be built with sharing through social channels in mind and become part of our everyday activities and interactions with others. Data Stories will look at novel frameworks and technologies for bringing data to people through art, games, and storytelling. It will examine the impact that varying levels of localisation, topicalisation, participation, and shareability have on the engagement of the general public with factual evidence substantiated by different forms of digital content derived and repurposed from a variety of sources. It will deliver the tools and guidance that community and civic groups need to achieve broader participation and support for their initiatives at local and national level, and empower artists, designers, statisticians, analysts, and journalists to communicate with data in inspiring, informative ways.

Our research hypotheses are as follows:
1. People engage more with data that is made relevant to them by localisation (data related to a specific geographic or geopolitical area of interest) and topicalisation (data about a particular entity, theme, or event).
2. People engage more with data and understand it better when said data is provided through interactive and participatory methods that help build a coherent narrative.
3. Data is more likely to be shared, and therefore reach more people, if shareability is built into its presentation.

We will test these hypotheses and propose a data experience framework supported by models, algorithms, tools, and guidelines that help individuals and groups create bespoke, participatory content (for example, art, games, and stories) from data. The framework design will be informed by practice-led research in three main areas: (i) finding and enriching data; (ii) generating content; and (iii) sharing and engaging with content. It will draw upon methods from several disciplines: data and content management; machine learning; human data interaction; game design and gamification; crowdsourcing; online communities; social and political sciences; creative writing; and visual arts. The research will be prototypically showcased in four contexts: (i) within the Data as Culture programme at the ODI, working together with artists, designers, and open data activists; (ii) as part of the Datapolis project run by the ODI, which looks at the use of game interfaces to demystify data, with the support of game designers and local communities; (iii) in a fact-checking & journalism showcase together with the BBC, Full Fact, and the Parliament Digital Service; and (iv) via datathons and our own Data Stories challenge, run by WSI and the ODI, alongside initiatives such as Bath:Hacked and ODCamp UK, which will build community-relevant data narratives from open data enriched with other media, using creative writing techniques.

Our proposal is well aligned with the EPSRC call, addressing several themes to varying degrees. The majority of the research is focused on enabling and facilitating content creation. Specifically, we look at providing intelligent tools to make it easier for people to create data experiences. The beneficiaries are artists, storytellers (such as journalists or analysts), game makers, and those in community and civil society groups wishing to use the modes of art, games, and narration to raise broader awareness of their work. The research will include using data to create immersive experiences through art, games, and virtual reality environments that are built from structured data alongside other forms of digital content. Ultimately, these novel ways to get to know and interact with data, relevant to one's context and presented creatively and innovatively, will inform and educate the public, leading to more sustainable digital ecosystems and to a more inclusive society.

Planned Impact

Less than a month since the EU referendum, our research could not be timelier. The lack of public engagement with facts and the distrust of experts are core challenges in the UK and elsewhere as the world will face fundamental questions over the next decades. As a society, we will be dealing with significant economic, social, and environmental challenges: a lack of international investment, inequality and divisions, and a changing climate. The decisions that we make must be informed by evidence, but our appetite is for entertainment. To avoid being misled, it is essential for the public to question and understand the figures and statistics that they are presented with. This research will target the role of the creative industries in enabling better decision making, capitalising on areas of expertise in which the UK is internationally recognised: data-driven technologies and creativity, two of the fastest growing sectors of the economy.

The UK leads the world in open data; considerable effort and resources have been devoted over the last five years to publishing and promoting open data sets to create growth and stimulate innovation. Data Stories will help the UK remain at the forefront of new developments in this space by exploring an open data theme that focuses specifically on interdisciplinary contexts at the intersection between arts, design, and technology. The proposal complements and expands existing programmes such as ODI's Data as Culture and the European-funded Open Data Incubator for Europe (ODINE), which looks at the use of open data in industrial settings. In addition, the work around data search has the potential for substantial impact on the UK's national data infrastructure; this topic is still underexplored and our research outputs will contribute directly to the success of existing investments in this space.

From an end-user and societal point of view, our showcases will prioritise the needs of local and national communities in the ODI Nodes network, with a special focus on triple bottom line impact and the three P's (people, planet, profit). In terms of academic impact, Data Stories will help maintain UK excellence in data-driven technologies, in particular in a cross-disciplinary context that seeks input from arts, design, social sciences, and HCI to define more engaging, immersive data experiences, which in turn will lead to more informed citizens and better decisions in virtually all areas from the economy to the environment. The project will shape the research agenda in this emerging field, leveraging collaborations with the national and international ODI Nodes network, as well as the outstanding position of Southampton's WSI as a pioneer of interdisciplinary research in Web and data science. Given the increasing importance of data literacy in society, Data Stories will impact the state of the art by proposing a practice-led design and scalable implementation of data discovery and search mechanisms based around localisation and topicality; and by designing frameworks, templates, and tools to produce novel ways to interact with data, which appeal to experts and non-experts alike.

From an EPSRC point of view, our main focus is on enabling and facilitating content creation, providing intelligent tools to make it easier for people to experience data in a different way and advocating the use of open data, which anyone can access, use, and share. We believe that the ability to understand and engage with data is necessary for inclusion, in particular in the democratic process. Turning it into art, stories, and games should enable more people to engage with it, use it to inform their arguments, and thus empower them. Our proposal hence responds to two of the challenge areas of the RCUK Digital economy theme: Sustainable society, which is based on people being able to make better choices; and Communities and culture, and the responsible use of digital means.

Funded Value:

£167,018

Funded Period:

Feb 20 - Jan 21

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/P025676/2

Principal Investigator:

Elena Simperl

Research Subject:

Design (20%)

Info. & commun. Technol. (40%)

Media (40%)

Research Topic:

Computer Graphics & Visual. (8%)

Human-Computer Interactions (12%)

Information & Knowledge Mgmt (20%)

See subject area (60%)

Organisations

People	ORCID iD
Elena Simperl (Principal Investigator)
Wendy Hall (Co-Investigator)	http://orcid.org/0000-0003-4327-7811
Leslie Carr (Co-Investigator)
Justin Murphy (Co-Investigator)

Publications

Author Name

Title Publication Date Published


Akhtar M (2024) Croissant: A Metadata Format for ML-Ready Datasets

Blount T (2020) Understanding the Use of Narrative Patterns by Novice Data Storytellers

Blount T (2020) Smoking gun in Interactions

Chapman A (2019) Dataset search: a survey in The VLDB Journal

Ibáñez L (2022) A comparison of dataset search behaviour of internal versus search engine referred sessions

Koesten L (2021) UX of data making data available doesn't make it usable in Interactions

Koesten L (2021) Talking datasets - Understanding data sensemaking behaviours in International Journal of Human-Computer Studies

Koesten L (2020) Everything you always wanted to know about a dataset: Studies in data summarisation in International Journal of Human-Computer Studies

Koesten L (2020) Dataset Reuse: Translating Principles to Practice in SSRN Electronic Journal

Maddison J (2024) Democratising access to data: Bridging the data divide with generative AI models New research exploring the landscape of generative AI tools to support data publishing and use to close the digital divide.

Related Projects

Project Reference	Relationship	Related To	Start	End	Award Value
EP/P025676/1			30/09/2017	31/01/2020	£704,835
EP/P025676/2	Transfer	EP/P025676/1	01/02/2020	31/01/2021	£167,019

Key Findings
Impact Summary
Further Funding
Collaboration
Software and Technical Products
Engagement Activities


Description	The framework design of the Data Stories project was informed by practice-led research in three main areas: (i) finding and enriching data; (ii) generating content from data; and (iii) sharing and engaging with content.
Finding and enriching data - A number of peer-reviewed publications in the area of data discovery and collaboration with structured data have resulted from this project. These, together with the data search workshops described in the section "Engagement Activities", demonstrate the importance and recognition of dataset search within the research community. We believe this to be a significant achievement, contributing new knowledge to this rapidly evolving topic. Due to the exploratory nature of current research on data discovery, we see a large space for future research taking our findings forward. Our findings on dataset search and dataset-specific selection criteria have been used by the European Data Portal to inform the development of their data search functionality. Based on the results of a mixed-methods study on dataset summaries for human consumption, we also proposed guidelines to help people write meaningful dataset summaries for the purpose of dataset reuse. These insights can inform the design of data discovery and exploration tools, by tailoring functionalities to user needs specifically directed at structured data. We further used our results to develop a small prototype for data publishers to guide them through the summary writing process. In order to better understand the patterns and specific attributes that data consumers use to search for data and how this compares with general web search, we performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of user data requests issued to one of them. In addition, we conducted a crowdsourcing experiment in which we asked crowdworkers to create queries for datasets described in a data request. 
The queries they provided were aimed at finding a dataset to answer a specific user need. It appeared that portals' search functionalities are currently used in an exploratory manner, rather than to retrieve a specific resource, which reinforced our hypothesis that dataset search is different to general web search and needs tailored approaches taking advantage of the dataset structure. After identifying that lack of context in dataset retrieval is a big factor in how users assess whether a dataset is suitable for their task, we looked into possible approaches to adding such context to the data inside the dataset. Approaches that assign semantic labels from knowledge bases to specific columns, disambiguating their meaning, exist, but their primary focus until now has been on columns with textual data rather than numbers. Given that numerical columns are the most common column type on open data platforms, we proposed an approach to add semantic meaning to numerical columns. The approach was evaluated using a benchmark generated for the purpose of this work. We showed the influence of the different levels of analysis on the success of assigning semantic labels to numerical values within tables. Further, we compared our work with state-of-the-art approaches looking at this problem and showed that our approach is less affected by the structure of the data and by data quality issues. One reason to engage in dataset search is to find data that can be reused for other purposes. In order to understand whether a dataset can be reused, people need to make sense of it and determine its "fitness for use". We identified a gap in research aiming to understand sensemaking specifically for structured data as opposed to information seeking more generally. To this end we conducted a qualitative mixed-methods study, looking at how researchers make sense of and reuse existing data. 
We were able to identify clusters of activity patterns and related data attributes important in data exploration and sensemaking. We derived concrete recommendations for how these activity patterns and data characteristics can inform tool design and documentation practices to support data-centric sensemaking behaviours. Through a number of partner workshops, we have been interrogating the diversity implications of the structuring of data. Data is usually categorised and structured by "neurotypicals". We have reports of neurodiverse data users being frustrated by the incoherence or illogicality of categorisation - it appears that neurotypical people have a greater capacity to cope with inconsistency and illogicality. Hierarchical classifications and seemingly subjective schema design can be difficult for neuro-atypical individuals to comprehend. "Data based decision making" is a term used in relation to evidence-based processes, but the data can be illogical to certain people. It may bring a unique perspective to the difficulties of categorisation, and the process of creating standards. These findings ask us to question who makes the rules behind database structures and presentation, and whether the designers of these rules consider a diverse user base.
Generating content from data - In investigating the use of data games and the effects of play on recall and engagement, a simple data game was implemented (based on the work of Togelius and Friberger (2013)) aiming to help players memorise simple data sets. An experiment was carried out in which participants were shown either a variant of this "gamified" visualisation, or a set of traditional bar charts. However, experimental results have shown that participants who were shown the gamified visualisation did not necessarily perform better in terms of recall than those who saw a traditional visualisation. 
There are a number of reasons this may be the case, such as participants focusing on using the in-game mechanics to achieve a higher score, rather than taking in the data. This leads us to conclude that simply incorporating the notion of play into a data visualisation is, alone, insufficient, and does not inherently help better communicate the message behind the data (and in some cases may distract from it). As such, ongoing work seeks to understand the way in which the individual mechanics of games can be used to encourage exploration of, and focus on, data, and how mid-game "gating" (or tasks/quizzes) can encourage a deeper understanding. We have a paper under review describing aspects of this work. As more of a theoretical contribution we published a paper about sensemaking with data using a mixed-methods study in which we identify three distinct clusters of sensemaking activity patterns and their related data attributes. This can be used to discuss user needs important when understanding and reusing data created by others and we propose design recommendations for tools to support data sensemaking and reuse. We further worked on the development of a web-tool to support the work of data journalists and any other authors of "data stories", which are articles or reports that tell a narrative inferred from an underlying dataset through the use of text and data visualisation. These stories incorporate a semi-automated generative logical structure and intelligently recommended visualisations. We included data journalists in the iterative development of the tool through feedback cycles and testing sessions in the form of contextual inquiries. The tool allows users to import their own data, provides an overview of said dataset, recommends suitable story-beats and visualisations, and exports the story to a number of formats. The tool was designed in concert with real-world data professionals and journalists. 
Sharing and engaging with content - Numeric data: Investigations of the shareability of data, in terms of reach and engagement, have led to a public dataset of socially derived "numeric data", a unique corpus of more than 20 million occurrences of numeric data identified as appearing in social media feeds. The use of data-rich language in natural language communication has not been the subject of significant research focus, and this dataset allows studies of the references to and the reliance on data in human communication. We conducted an analysis (and refinement) of the data to model the use of data which is incorporated into the WebData RA tool. Chart identification: To train and test our system for identifying chart images on social media, we used crowdsourcing to build a new corpus of 3k image tweets posted by the Twitter accounts of some major news agencies (e.g. nytgraphics, ReutersGraphics, and GuardianData). The corpus was formed because we found that there are differences between the chart images that are made available in benchmark corpora and those that are shared on social media platforms. The latter are often augmented with additional elements, such as text and images. This makes the task of identifying them more challenging, especially for systems that have been built based on idealised examples. Based on the statistics from this new corpus, we found that bar charts (incl. column charts) are the most common type of visualisation used by data journalism-oriented accounts, with 378 images showing solely a bar chart and 89 showing a bar chart accompanied by a different chart type; the second most common visualisation type is maps, with 382 and 14 images respectively. Furthermore, we built an architecture based on deep neural networks for predicting the virality potential of a chart image on Twitter. 
Our system predicts the expected virality as a function of the total number of its retweets and likes. Using this architecture, we tested the separate contribution of different signals (i.e. the chart image, its original poster, and the accompanying text) to the prediction of the expected number of likes and retweets. We evaluated our results using Spearman's rank correlation and Root Mean Square Error (RMSE) of the predicted values with respect to the actual retweet and like counts in our test set. We found that coupling the textual information with author-related cues (e.g. number of friends, followers and likes) results in a larger performance gain for like-count prediction than combining it with visual features extracted from the corresponding chart. On the other hand, the combinations of textual features with author-related cues and with chart-related cues are equally important for predicting the total number of expected retweets. In general, we found that the most accurate predictions are computed when all three types of information (i.e. visual from the chart, textual from its accompanying text and social from its original poster's characteristics) are taken into consideration. To analyse how data-rich content is currently being shared, information is being collected from Twitter to classify the kinds of data used, the presentation mechanisms chosen, the role played by the data in the shared content and the individuals who share data-rich content. Data experiences: A different aspect of engaging with content was addressed in our work on "data-experiences", which resulted in two game-based artworks. One was created in a participatory design process with a neurodiverse person to express a personal response to data, in an artistic context. The outcome facilitates the engagement of citizens with neurodiversity by combining a game (a playable pinball machine) with data. 
One of the key findings from the design process was how categorisations inherent to data are tailored towards neurotypical experiences. The second piece offers insights into collaborative decision making with data, to make sense of story fragments in the context of a game. The goals were to explore how to use narrative and game mechanics to change the way the public engages with data. The project asked questions such as: Can the game experience encourage people to engage with types of data with which they might not otherwise engage? Can it encourage them to engage more thoroughly and rigorously than they would have otherwise?
Exploitation Route	(To add to URLs: https://fastfamiliar.com/research/smoking-gun/ , https://www.youtube.com/watch?v=M9-TfvYw7g4)
Sectors	Creative Economy Digital/Communication/Information Technologies (including Software) Education Culture Heritage Museums and Collections
URL	http://datastories.co.uk/
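
The evaluation reported in the description above for the chart virality predictor (Spearman's rank correlation and RMSE between predicted and actual like or retweet counts) can be illustrated with a minimal sketch. This is not the project's code: the function names and the example counts below are hypothetical, and the sketch uses only the Python standard library.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def spearman(predicted, actual):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(actual))


def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical like counts for five chart tweets (illustrative only).
actual_likes = [12, 340, 45, 7, 1500]
predicted_likes = [20, 300, 30, 10, 900]

rho = spearman(predicted_likes, actual_likes)
error = rmse(predicted_likes, actual_likes)
print(f"Spearman rho = {rho:.3f}, RMSE = {error:.1f}")
```

In this made-up example the predictions order the tweets exactly as the actual counts do, so the rank correlation is approximately 1, while the RMSE remains large because the most popular tweet is underestimated by 600 likes. The two measures therefore capture different aspects of prediction quality, which is one plausible reason for reporting both.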


Description	Our work has attracted interest, and a number of collaborations based on its findings have emerged. We started a direct collaboration with the European Data Portal, including a webinar series and research activities. This resulted in interest to conduct more research on dataset search, informed by the related studies we published in the Data Stories project, which will start in April 2021 with additional user studies. The findings have also informed work in the project TheyBuyForYou, where we advised public administration in Human Data Interaction around decisions in procurement intelligence. Parts of the Data Stories team have won a project with Google and a project with Nesta on sensemaking of data charts, which is informed by the outcomes of our data-centric sensemaking work. Another project, which is a follow-up from the interactions with artists in Data Stories, is an H2020 project that started in 2020, where we will be working with 40+ artists doing work with data and AI. Work in designing engagements with data artifacts has informed the design of an open-source machine-readable metadata vocabulary to support machine learning developers and practitioners. In addition, we are working together with the Open Data Barometer team, the Open Data Institute, and Microsoft to explore how to use generative AI conversational agents to facilitate interactions with data.
First Year Of Impact	2020
Sector	Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Education,Government, Democracy and Justice,Culture, Heritage, Museums and Collections
Impact Types	Cultural Societal Economic Policy & public services


Description	Nesta
Amount	£29,452 (GBP)
Organisation	Nesta
Sector	Charity/Non Profit
Country	United Kingdom
Start	03/2021
End	12/2021


Description	Strand-Aldwych Data Stories
Amount	£10,000 (GBP)
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	03/2022
End	03/2023


Description	Birmingham Open Media co-created artwork: Tribes, Treasure Hunts & Truth Seekers
Organisation	Birmingham Open Media
Country	United Kingdom
Sector	Private
PI Contribution	Put together a brief for the artists to develop a participatory artwork with members of the neurodiverse community in and around Birmingham. Two artist fellows Harmeet Chagger Kahn and Ben Neale have been commissioned to develop a participatory artwork with members of the neurodiverse community in and around Birmingham.
Collaborator Contribution	Managed by Birmingham Open Media, two artist fellows designed a series of co-creation workshops with neurodiverse artists and residents of Birmingham with the intention to develop one or a number of 'data experiences'. Two artist fellows Harmeet Chagger Kahn and Ben Neale have been commissioned to develop a participatory artwork with members of the neurodiverse community in and around Birmingham. Two successful workshops were held with neurodiverse participants and teams from Birmingham Open Media (BOM) and ODI. Data Stories team members for each workshop were Tom Blount, Rachel Wilson and Hannah Redler Hawes. The workshops were attended by around five participants interested in data and art from the neurodiverse community in and around Birmingham. An early community building workshop was held in January 2018: The ODI organised and hosted a workshop as a kick-off for the Data Stories project, bringing in 30 people from journalism, art, civil society and academia around the theme of "bringing data to citizens, and vice versa". The purpose of this workshop was to introduce the Data Stories project to data journalists and data activists, and survey the state of the art in terms of data narratives and data engagement. Whilst the workshop was a success and achieved its objective to convene a variety of stakeholders around an interesting topic, the ODI felt it could play a more effective and unique leading role bringing a DAC flavour, rather than the initial plan to facilitate further workshops in support of the various work streams.
Impact	The artwork resulting from this collaboration was showcased in 3 exhibitions: 1) BOM Hacked!, 2) V&A's Digital Design Weekend, 3) part of the "Copy That? Surplus Data in an Age of Repetitive Duplication" exhibition. This was a multidisciplinary partnership between the Data as Culture programme at the ODI, artists and an art collective as well as the Data Stories research staff.
Start Year	2018


Description	Data springboard
Organisation	Westminster City Council
Country	United Kingdom
Sector	Public
PI Contribution	Strand Aldwych is a new destination for London, developed through an extensive co-design process led by Westminster City Council and over 70 stakeholders. The scheme aims to bring the inside out - celebrating the wealth of cultural and educational offer in the area, whilst providing a new green oasis in central London for people to come together. The Strand-Aldwych Data Springboard is a collaboration between researchers at King's Informatics and Westminster City Council, the local authority responsible for the Strand-Aldwych redevelopment. The concept of the Springboard emerged from the Smart Working Group, which explores the use of data to improve the newly created pedestrian zone at Strand, and how sharing of open, big and local data can encourage and promote cohesion and improve engagement with residents, workers and visitors. As part of the working group, we want to innovate with data during and beyond the redevelopment. Data sharing means allowing third parties specifically permissioned access to datasets to generate value. It allows organisations to innovate with and generate value from data resources that would otherwise not be accessible. For Strand-Aldwych, it means enabling the community of policy makers, students and researchers, businesses and cultural institutions to exchange data between themselves, for the benefit of the whole community. Building on interdisciplinary research in data science, open innovation, and AI, in projects such as Data Stories, MediaFutures, and EUHubs4Data, we design participatory data stewardship and governance formats facilitating bottom-up engagement with data and data-driven decision making. For the Data Springboard, we work with both public and private stakeholders of the redevelopment. 
We started with a series of workshops in 2022, in which we discussed the key design questions such as how everyone could benefit from the Springboard, what challenges it should focus on, how it should approach questions of data ethics, and what legal and community frameworks we would need to put into place to ensure everybody benefits.
Collaborator Contribution	"This area of work is a key council priority for the local authority which aligns with short and long-term ambitions of the transformation scheme and is foundational to our wider management model which involves various stakeholders and communities in the area." - Kirsten Zeller, Westminster City Council The results of our stakeholders workshops are promising. Stakeholders identified key challenges the Data Springboard could address, including increased efficiencies, supporting local businesses and governance, making data interoperable, and establishing accessible feedback routes. Participants identified a series of relevant datasets, including traffic flows and footfall data, weather and pollen forecasts, and accessibility. Ethical challenges were seen primarily in ensuring that the Springboard has core values embedded in its framework, including a duty of care for those who might be affected by the data, and especially does not put vulnerable groups at risk. Stakeholders were also especially keen to not only build a data sharing platform, but a whole community, which could not only share data, but also insights generated through it. Thanks to this demand, and the hyperlocal context for the data and insight that should be shared, there are no templates we can build on to establish this community and platform - so we have to build our own!
Impact	Workshops report to be published. Publication currently under development.
Start Year	2021


Description	Talking Datasets: A Study on Verbal Dataset Description
Organisation	University of Amsterdam
Country	Netherlands
Sector	Academic/University
PI Contribution	We have started to connect researchers and to some extent practitioners to come together and discuss challenges in data discovery, aiming to gain a better understanding of the extent to which techniques, methods and lessons learned from document retrieval (broadly construed) could be applied to data-centric contexts, providing an opportunity for an in-depth exploration of the differences between these two areas from both a technical and interaction perspective. Together with the Data Archiving Networked Services (Royal Netherlands Academy for Arts and Sciences) & Informatics Institute, University of Amsterdam, we conducted an interview study with 30 participants: design, set-up, execution and analysis of the study.
Collaborator Contribution	Working together with University of Southampton as above.
Impact	We conducted a mixed-methods study and wrote up the analysis as a paper submission to the International Journal of Human-Computer Studies, where it is currently under review. We were able to build a network of researchers and practitioners interested in bringing the topic forward. We plan to continue with a workshop focused more on non-academic audiences and to present the work at the Data Stories Symposium, which we organise in June 2020.
Start Year	2018


Description	Understanding charts
Organisation	Google
Department	Google Crowdsource
Country	India
Sector	Private
PI Contribution	In this project we aim to build a dataset of charts that vary according to a range of design choices and data properties commonly displayed in data visualizations. The dataset will then be annotated, via a crowdsourcing task, with ratings of whether or not each chart is perceived to be readable and trustworthy. This is a collaboration in which we (ES, LK) design and lead a large-scale crowdsourcing experiment together with a team from Google Crowdsource. We contribute expertise, project management, the data to be used in the experiment, and the original idea.
Collaborator Contribution	Our partners at the Crowdsource team contribute expertise, resources and the task implementation. This includes implementing and running the experiment and advertising the task to their user base to increase participation.
Impact	No outcomes yet, collaboration is ongoing.
Start Year	2020


Title	Data Storytelling Tool
Description	The Data Storytelling Tool is a web tool designed to support data journalists, auditors, procurement specialists, and any other authors of "data stories": articles or reports that tell a narrative inferred from an underlying dataset through the use of text and data visualisation. Its objective is to assist the analysis of data through authored (and semi-automated) narrative. The tool allows users to import their own data, provides an overview of the dataset, recommends suitable story-beats and visualisations, and exports the story to a number of formats. The tool is a client-side, JavaScript-powered HTML page: while it is deployed and hosted on a server, all data processing happens on the user's machine, and no sensitive data of any sort leaves the user's machine.
Workflow Overview
User uploads a (.csv) datafile (alternatively, the user can load a previously saved story)
User selects fields of interest
User selects possible/likely dependencies/correlations
Tool recommends story-beats/visualisations
User completes story-template with text/images/manually selected visualisations/etc.
User exports story to one of several formats
Features
1. Data upload/overview: Data can be uploaded from CSV (comma separated value) files; an overview of the data (including data type, a selection of values, and, if applicable, min/max values and value distribution) is shown to the user, allowing them to browse the data at a high level and select values of interest
2. Visualisation generation: Bar chart, scatterplot, and line chart (time series) visualisations are currently generated from user-submitted data, using the d3.js library
3. Simple visualisation recommendation: Visualisations are currently recommended based on the dependencies/correlations submitted by the user
4. Narrative authoring: users are currently able to author their narrative, guided by simple recommendations derived from their submitted correlations; users can supply text and images, and generate additional charts to construct the narrative
5. Export to HTML/JSON: Data stories can be exported in a number of formats, including HTML (tailored to stand-alone pages, embeddable content, or, in conjunction with additional JS libraries, slide-based content) and JSON
6. Story saving/loading: the tool supports saving/loading of data stories to user-controlled files (as no data is passed to a server)
7. Narrative template system: a rule-based system to enhance the authoring experience by guiding the user step by step through the narrative process
8. Advanced visualisation recommendation: enhanced recommendation that may include elements such as trend detection, correlation detection, and/or anomaly detection
9. Visualisation annotation: allow users to add annotations on top of the generated visualisations to highlight any elements that, given their contextual knowledge, would be valuable to their audience
Codebase: https://github.com/data-stories/storytelling
Demo: https://TBFY.github.io/storytelling
Type Of Technology	Webtool/Application
Year Produced	2020
Open Source License?	Yes
Impact	The tool was designed in concert with real-world data professionals and journalists. It has been evaluated with support from industry and project partners using a series of "talk-aloud" contextual enquiries, and there has been interest in further specialisation of the tool for relevant domains. It has also been presented at the Data Stories Symposium 2020.
URL	https://github.com/data-stories/storytelling
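
The data overview and simple visualisation recommendation steps described above can be illustrated with a minimal sketch. This is not the tool's implementation: the tool itself is client-side JavaScript using d3.js, whereas the sketch below is Python, and the function names (`infer_type`, `overview`, `recommend_chart`), the two-type scheme, and the recommendation rules are all hypothetical assumptions chosen only to show the general idea of summarising CSV fields and mapping a user-declared dependency to one of the three chart types the description mentions.

```python
import csv
import io
from collections import Counter


def infer_type(values):
    """Classify a column as 'number' or 'text' (assumed two-type scheme)."""
    if not values:
        return "text"
    try:
        for v in values:
            float(v)
    except ValueError:
        return "text"
    return "number"


def overview(csv_text):
    """Summarise each field: type, a few sample values, and either
    min/max (numeric fields) or a value distribution (text fields)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    summary = {}
    for field in rows[0].keys():
        # Skip empty or missing cells so they do not affect type inference.
        values = [r[field] for r in rows if r[field]]
        kind = infer_type(values)
        info = {"type": kind, "sample": values[:3]}
        if kind == "number":
            nums = [float(v) for v in values]
            info["min"], info["max"] = min(nums), max(nums)
        else:
            info["distribution"] = dict(Counter(values))
        summary[field] = info
    return summary


def recommend_chart(summary, x, y, x_is_time=False):
    """Hypothetical rule set for a user-declared dependency x -> y."""
    if summary[y]["type"] != "number":
        return None  # no recommendation for a non-numeric dependent field
    if x_is_time:
        return "line chart"
    if summary[x]["type"] == "number":
        return "scatterplot"
    return "bar chart"


if __name__ == "__main__":
    data = "region,year,spend\nNorth,2019,10.5\nSouth,2020,20\n"
    fields = overview(data)
    print(fields["spend"])
    print(recommend_chart(fields, "region", "spend"))
```

In this sketch the user states which field depends on which, mirroring the workflow step in which the user selects possible dependencies before the tool recommends visualisations; the real tool's recommendation logic may differ.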


Title	Metadata vocabulary
Description	Metadata vocabulary to describe datasets
Type Of Technology	New/Improved Technique/Technology
Year Produced	2023
Open Source License?	Yes
Impact	800k datasets on Dataverse, Kaggle, and Hugging Face have a record expressed using this vocabulary.
URL	https://mlcommons.org/working-groups/data/croissant/


Description	Data Stories Symposium
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	The Data Stories Symposium 2020 brought together experts from academia, industry and the third sector to discuss, generate ideas and inspire future interdisciplinary collaborations aiming to explore Human Data Interaction in relation to storytelling with data. Due to COVID-19, the event took place online over two half days. Around 100 participants were present at any one time, out of over 400 sign-ups; more than 50% of those who signed up came from academia, and the rest self-reported as coming from a mix of industry, the public sector and the third sector. The event sparked many interesting discussions and resulted in collaboration opportunities as well as explicit interest in repeating the event.
Year(s) Of Engagement Activity	2020
URL	http://datastories.co.uk/symposium/


Description	Invited Talk at the Annual Open Data Conference
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Policymakers/politicians
Results and Impact	Presented research on insights in Human Data Interaction to civil servants working on Open Data in Ireland
Year(s) Of Engagement Activity	2020
URL	https://data.gov.ie/blog/annual-conference-2020


Description	Invited Talk: University of Bristol
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	Invited talk at the University of Bristol as part of the Data Visualization Seminar, Department of Informatics
Year(s) Of Engagement Activity	2021
URL	https://dataviz.blogs.bristol.ac.uk/2020/11/16/upcoming-january-2021-talk-elena-simperl/