The use of interactive electronic-books in the teaching and application of modern quantitative methods in the social sciences

Lead Research Organisation: University of Bristol
Department Name: Education

Abstract

When social science researchers wish to carry out research into a specific problem, if they choose a quantitative approach they will collect either new data or existing data and perform statistical analysis on this data. The modern age and the impact of faster computers with larger storage capacities have resulted in far larger data sources which can be analysed. It has therefore become increasingly important for social science researchers to be trained in quantitative methods and the use of statistical computer software to analyse datasets and answer research questions. Modern statistical techniques have also become more computational and so there is a desire for software tools that simplify the research process whilst still allowing social scientists access to the most appropriate statistical methods.

In this proposal we build on work in a current grant in which we have prototyped an interactive electronic book (or eBook) system for learning about statistical techniques and performing statistical analysis. An eBook can be thought of as combining the features of a book with those of a statistical package as it contains a mixture of textual information, graphs and tables but also input boxes which when completed write sections of the book that are conditional on the inputs supplied.

We intend to investigate the appropriateness of the new technology and how it may be adapted to be used for various tasks that are commonly performed by social scientists. We will focus on three specific tasks:
1. Assisting the researcher while they work through the various stages of performing a statistical analysis for datasets with particular features.
2. Helping to make the researcher's research reproducible, so that other researchers could repeat it, by allowing the researcher to link full details of the statistical analysis that they have performed to the publication that results from their work, and by allowing others to interact with the analysis, resulting in an enhanced journal article.
3. Training social science researchers in new quantitative research methods and new statistical software through the use of interactive eBooks.

Planned Impact

Who will benefit from this research?

In the longer term we expect the whole quantitative (social) science community to benefit, directly or indirectly, from the proposed research, but we foresee this happening in stages as new groups of researchers become familiar with the tools that we produce. We will therefore focus on three groups to illustrate how our work will impact on their lives.

Group 1: Our team and our initial 'experts' working with us on work packages 1 and 2
Group 2: The community of users of our statistical software tools and training materials (currently over 10,000 strong and growing)
Group 3: Any researcher who accesses the (quantitative) scientific literature whether it be journal articles or textbooks and training materials

How will they benefit from the research?

Group 1: This first group will benefit directly during the lifetime of the project by being the first to use the new technology and develop interactive eBooks to showcase their research in a completely new way. They will learn how to use the system and have the unique opportunity to have input into the design of a system that we hope will change the way that researchers perform research. They will act as standard bearers taking the tools to their various disciplines and converting others to the eBook approach.
Group 2: The second group have already benefitted from our earlier research and will have produced many of the over 3,000 works that cite our existing software tools (MLwiN). They will therefore be used to using our tools for performing statistical analysis and will in time be converted to taking up our new system and the transparency that it offers. They will then use it to create the quantitative social science literature of tomorrow and create enhanced journal articles and training materials that use our interactive eBooks. They will be able to benefit from each other's efforts by using our repository to access research that is tailored to their research questions and disciplines.
Group 3: The third group may not benefit immediately from our research but, as time passes and our user community produces more literature in the form of interactive eBooks, we expect more and more researchers to have exposure to eBooks, whether through reading enhanced journal articles or training materials. The fact that our Stat-JR software, which sits under the eBook system, does not simply support our own MLwiN and eSTAT software but interoperates with all the main statistical software packages will open up our research to new groups who are wedded to other packages. As all our software tools are free to UK academics, financial cost will not be a factor in this market.
 
Description This grant funded three years of research and software development that studied the potential of interactive electronic-books (eBooks) for developing, applying and teaching modern quantitative methods in the social sciences.
Through the grant the software package Stat-JR (developed under previous ESRC grants in the NCRM and DSR programmes) was further developed so that it now includes a workflow-based interface (LEAF: Logging and Execution of Analysis Flows) that allows users to produce their own workflows that link together the steps of their statistical analysis. The workflow interface allows users to construct a workflow within a visual programming environment, arranging interlocking blocks describing a sequence of analytical steps, including, for example, data manipulation, visualisation, and descriptive and inferential statistics. We also conducted a series of case studies looking at how quantitative researchers do their research in practice. This highlighted a diversity of approaches, covering a range of research questions, and the difficulty of producing an automated system to cover all possible analyses.
The workflow software and our findings from the case studies fed into further software development which has resulted in the creation of a suite of statistical analysis assistant (SAA) eBooks within Stat-JR that guide the researcher in a semi-automated way through specific analyses of their own data and then produce a full analysis report. These SAAs cover many of the common, and indeed some cutting-edge, statistical models used by social scientists in their research. A range of models were covered from simple linear regression all the way up to a combined SAA that can handle a large number of different types of statistical models. Our most advanced SAA features the following: the ability to choose between continuous, binary, and count responses including the facility to transform responses; the capability to specify several underlying multilevel data structures including both hierarchical and crossed research designs; the possibility of investigating random intercepts, slopes and cross-level interaction terms. Work has also been conducted investigating statistical modelling of multilevel missing data in separate SAAs. All the SAAs embed analysis outputs in reports containing contextual descriptions of the steps of the analysis and interpretation of the resulting statistical outputs. They cover the entire process from descriptive statistics through preliminary analysis and model building and selection to residual checking and prediction.
During the period of the grant, our work on Stat-JR has been realised in additional functionality incorporated in two new release versions: firstly v1.0.4, released halfway through the grant, introduced workflows, and then v1.0.5, released at the end of the grant, further built on this work and introduced SAAs. Each release comes with extensive accompanying documentation in the form of five manuals. This has also been complemented by workflows for much of the Centre's free online LEMMA training materials (developed under ESRC grants in the NCRM programme), and the development of additional practicals for the existing LEMMA training materials.
Exploitation Route The Stat-JR software is freely available to UK academics and any of our existing 18,000-plus software users. Users can apply the software to their own dataset and get it to produce bespoke analyses for their particular research question. Other statisticians can use our suite of statistical analysis assistants (SAAs) as templates to produce their own customised SAAs if they perform analysis in subtly different ways. We have already used the software ourselves for consulting work with the Home Office on crime statistics and have had funding from the British Academy to develop additional aspects of Stat-JR, adding functionality that allows the automation of bespoke teaching materials (linking with the SPSS package) for teaching basic statistics. We also have an ESRC NCRM funded collaborative grant starting in January that in part will expand on the family of SAAs to incorporate workflows that perform small area estimation. Other researchers can view our case studies and additional LEMMA materials (which currently have over 20,000 registered users).
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Education; Environment; Healthcare; Government, Democracy and Justice

URL http://www.bristol.ac.uk/cmm/research/ebooks/
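
The workflow idea described above, in which a sequence of analytical steps (data manipulation, descriptive statistics, inferential statistics) is linked together and its execution logged, can be illustrated with a minimal Python sketch. This is not Stat-JR or LEAF code: the step functions, the `run_workflow` helper and the toy data are all hypothetical and exist only to show the general pattern of chaining steps and recording what was run.

```python
from statistics import mean

# Hypothetical workflow steps; none of these names come from Stat-JR.
# Each step takes a state dictionary and returns an updated copy.

def drop_missing(state):
    """Data manipulation: discard rows containing a missing value."""
    rows = [r for r in state["rows"] if None not in r]
    return {**state, "rows": rows}

def describe(state):
    """Descriptive statistics: means of predictor x and response y."""
    xs = [x for x, _ in state["rows"]]
    ys = [y for _, y in state["rows"]]
    return {**state, "mean_x": mean(xs), "mean_y": mean(ys)}

def fit_line(state):
    """Inferential step: simple linear regression by least squares."""
    mx, my = state["mean_x"], state["mean_y"]
    sxy = sum((x - mx) * (y - my) for x, y in state["rows"])
    sxx = sum((x - mx) ** 2 for x, _ in state["rows"])
    slope = sxy / sxx
    return {**state, "slope": slope, "intercept": my - slope * mx}

def run_workflow(steps, state):
    """Execute the steps in order and log the name of each one."""
    log = []
    for step in steps:
        state = step(state)
        log.append(step.__name__)
    return state, log

# Toy data: (x, y) pairs, one of which has a missing predictor.
workflow = [drop_missing, describe, fit_line]
result, log = run_workflow(workflow, {"rows": [(1, 3), (2, 5), (None, 4), (3, 7)]})
print(log)
print(result["slope"], result["intercept"])
```

Keeping each step as a function over a shared state is one simple way to make a workflow reorderable and to produce a log of the analysis; a visual block-based interface could sit on top of the same structure.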
 
Description Software developed in earlier grants and further developed here has many users, both academic and non-academic. These users use the software to analyse data that is then used in journal articles, reports and similar outputs. In particular we have worked with the Home Office to analyse crime data using the new SAA features of the Stat-JR software that have been developed in this grant. This work has fed into Home Office consultations on resources.
First Year Of Impact 2014
Sector Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Education; Environment; Healthcare
Impact Types Societal, Economic

 
Description Funded under quantitative skills funding
Amount £92,501 (GBP)
Organisation The British Academy 
Sector Academic/University
Country United Kingdom
Start 04/2016 
End 03/2018
 
Title ProvToolbox 
Description Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. In particular, the provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, where users find information that is often contradictory or questionable, provenance can help those users to make trust judgements. PROV is a set of W3C specifications defining a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. ProvToolbox is a Java library to create Java representations of the PROV data model (PROV-DM), and convert them between RDF, XML (in PROV-XML format), text (in PROV-N format), and JSON (in PROV-JSON format). 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact ProvToolbox is the basis of community services for provenance translation and validation at https://provenance.ecs.soton.ac.uk. ProvToolbox was used in the interoperability phase of the W3C Provenance Working Group https://www.w3.org/TR/prov-implementations/ 2016 contribution: templating system 
URL http://lucmoreau.github.io/ProvToolbox/
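
To make the provenance model described above more concrete, the following Python sketch records two analysis steps as entities, activities and an agent, then walks the record backwards to recover the order of the steps that produced a result. It is an illustration only and does not use ProvToolbox (which is a Java library): the dictionary layout loosely follows the PROV-JSON serialization but is not guaranteed to validate against it, and the `ex:` identifiers and the helper functions `new_document`, `record_step` and `lineage` are hypothetical.

```python
import json

def new_document():
    """Empty provenance record, laid out loosely after PROV-JSON."""
    return {
        "prefix": {"ex": "http://example.org/"},  # assumed example namespace
        "entity": {}, "activity": {}, "agent": {},
        "used": {}, "wasGeneratedBy": {}, "wasAssociatedWith": {},
    }

def record_step(doc, step_id, input_id, output_id, agent_id):
    """Record that an agent ran one step, using an input to generate an output."""
    doc["activity"][step_id] = {}
    doc["entity"].setdefault(input_id, {})
    doc["entity"].setdefault(output_id, {})
    doc["agent"].setdefault(agent_id, {})
    n = len(doc["used"]) + 1
    doc["used"][f"_:u{n}"] = {"prov:activity": step_id, "prov:entity": input_id}
    doc["wasGeneratedBy"][f"_:g{n}"] = {"prov:entity": output_id, "prov:activity": step_id}
    doc["wasAssociatedWith"][f"_:a{n}"] = {"prov:activity": step_id, "prov:agent": agent_id}

def lineage(doc, entity_id):
    """Return, oldest first, the activities that led to the given entity."""
    steps = []
    current = entity_id
    while True:
        gen = next((g for g in doc["wasGeneratedBy"].values()
                    if g["prov:entity"] == current), None)
        if gen is None:
            break  # reached an entity that no recorded activity generated
        activity = gen["prov:activity"]
        steps.append(activity)
        use = next((u for u in doc["used"].values()
                    if u["prov:activity"] == activity), None)
        if use is None:
            break
        current = use["prov:entity"]
    return list(reversed(steps))

doc = new_document()
record_step(doc, "ex:clean", "ex:raw_data", "ex:tidy_data", "ex:analyst")
record_step(doc, "ex:fit", "ex:tidy_data", "ex:model", "ex:analyst")
print(json.dumps(doc["wasGeneratedBy"], indent=2))
print(lineage(doc, "ex:model"))
```

The sketch assumes each activity uses a single input, which keeps the backward walk linear; real provenance graphs can branch and would need a graph traversal instead.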
 
Title Stat-JR v1.0.4 
Description Stat-JR is a multi-purpose statistical software package that we have been developing over several years. It has 2 original interfaces: a more standard statistical-package-type interface and an eBook interface. Both interfaces are accessed via a web browser and, as well as offering its own statistical estimation engines, Stat-JR also interoperates with many other software packages. Version 1.0.4 is a new version of the software which includes the new Workflow interface, developed for the first time in this grant, as a third way to interface with the software package. 
Type Of Technology Software 
Year Produced 2016 
Impact The new workflow interface is the basis of the work as yet not fully released to the community on statistical analysis assistants. The workflow interface is released along with workflows that demonstrate the LEMMA training materials developed by the centre. The eBook interface of v1.0.4 has been adapted to cope with both templates and workflows making this interface more flexible. 
URL http://www.bristol.ac.uk/cmm/software/statjr/
 
Title Stat-JR version 1.0.3 
Description Latest version of our Stat-JR software which is free to UK academics. Released in September 2014 
Type Of Technology Software 
Year Produced 2015 
Impact Stat-JR has been downloaded by 400+ users so far and cited 4 times so far 
URL http://www.bristol.ac.uk/cmm/software/statjr/
 
Description Creating a Statistical Analysis Assistant using Stat-JR 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A talk at the Royal Statistical Society conference to an audience of statistical education researchers and more general statistical researchers.
Year(s) Of Engagement Activity 2016
 
Description Developing Interactive eBooks and an Analysis Assistant to Teach and Apply Modern Quantitative Methods 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Richard Parker talked at the International Association for Statistical Education (IASE) 2015 Conference in Rio de Janeiro, Brazil
Year(s) Of Engagement Activity 2015
 
Description Developing Interactive eBooks and an Analysis Assistant to Teach and Apply Modern Quantitative Methods 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Richard Parker talked at the Royal Statistical Society (RSS) Annual Conference in Exeter, UK
Year(s) Of Engagement Activity 2015
 
Description Enabling Provenance on the Web: Standardization and Research Questions 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote presentation by Luc Moreau at the 14th International Conference on WWW/INTERNET (ICWI) in Maynooth, Greater Dublin, Ireland
Year(s) Of Engagement Activity 2015
 
Description Enabling Provenance on the Web: Standardization and Research Questions (Keynote at International Conference on WWW/INTERNET 2015) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Enabling Provenance on the Web: Standardization and Research Questions

Provenance is a record that describes the people, institutions,
entities, and activities, involved in producing, influencing, or
delivering a piece of data or a thing in the world.

Some 10 years after beginning research on the topic of provenance, I
co-chaired the provenance working group at the World Wide Web
Consortium. The working group published the PROV standard for
provenance in 2013.

In this talk, I will present some use cases for provenance, the PROV
standard and some flagship examples of adoption. I will then move
onto our current research area in exploiting provenance, in the
context of the Sociam, SmartSociety and ORCHID projects. Doing so, I will
present techniques to deal with large scale provenance, to build
predictive models based on provenance, and to analyse provenance.
Year(s) Of Engagement Activity 2015
 
Description IPAW 2006-2016: Retrospect and Prospect of Provenance (Keynote at International Provenance and Annotation Workshop ipaw'16) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact IPAW 2006-2016: Retrospect and Prospect of Provenance

IPAW, the biennial International Provenance and Annotation Workshop
series, was launched in 2006. We celebrate its 10th anniversary in
2016. During those 10 years, the field of provenance has seen a
tremendous amount of development. Among the 30 events I identified, I
will highlight some successes, such as the Provenance Challenge and a
standardisation activity at the World Wide Web Consortium. What is
the next step for the provenance community? By reviewing existing
applications of provenance and tooling, and by discussing some
research activities, I will attempt to map future directions for the
provenance community.
Year(s) Of Engagement Activity 2016
URL http://www2.mitre.org/public/provenance2016/ipaw.html
 
Description Presentation at JP Morgan TechFest, Bournemouth, Enabling Provenance on the Web: Standardization and Research Questions 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Enabling Provenance on the Web: Standardization and Research Questions

Provenance is a record that describes the people, institutions,
entities, and activities, involved in producing, influencing, or
delivering a piece of data or a thing in the world.

Some 10 years after beginning research on the topic of provenance, I
co-chaired the provenance working group at the World Wide Web
Consortium. The working group published the PROV standard for
provenance in 2013.

In this talk, I will present some use cases for provenance, the PROV
standard and some flagship examples of adoption. I will then move
onto our current research area in exploiting provenance, in the
context of the Sociam, SmartSociety and ORCHID projects. Doing so, I will
present techniques to deal with large scale provenance, to build
predictive models based on provenance, and to analyse provenance.
Year(s) Of Engagement Activity 2015
 
Description Provenance for explaining and reproducing past results (Invited presentation at the Alan Turing Institute Symposium on Reproducibility for Data-Intensive Research, University of Oxford) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Provenance for explaining and reproducing past results

The ESRC EBook project aims to offer a multi-modal tool-suite (command line, web-based interactive portal, and interactive workflows) aiding in the use and teaching of statistical analysis techniques with a particular emphasis on their application to social science. Provenance is at the heart of this approach, capturing traces of execution steps, irrespective of their modality. Provenance can also be used as input to a workflow reconstruction component, allowing traces of previously captured steps to be edited as re-executable workflows. In this talk, I will outline the EBook approach and I will illustrate the salient aspects of the provenance PROV model.
Year(s) Of Engagement Activity 2016
URL https://osf.io/bcef5/
 
Description Stat-JR Workflow and eBook Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A one-day workshop in September 15 to test out new features of our software package
Year(s) Of Engagement Activity 2015
URL http://www.bristol.ac.uk/cmm/research/ebooks/
 
Description Stat-JR Workflow and eBook Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact A training workshop that showcased new features of our software
Year(s) Of Engagement Activity 2016
 
Description Stat-JR, eBooks, workflows and other software developed at the multilevel modelling centre 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Invited research seminar by Bill Browne at the School of Mathematics, Statistics and Actuarial Science, University of Kent
Year(s) Of Engagement Activity 2015
 
Description Stat-JR: eBooks, workflows and other software developments at the Centre for Multilevel Modelling 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Invited research seminar by Bill Browne for the Statistics and Probability Group at Durham University, UK
Year(s) Of Engagement Activity 2016
 
Description Statistical Software developments at the Centre for Multilevel Modelling 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Invited research seminar by Bill Browne at the School of Mathematical Sciences, University of Nottingham, UK
Year(s) Of Engagement Activity 2015
 
Description Talk at RSS conference in Sheffield 2 Sept 2014 (A Statistical Analysis Assistant - the future or folly?) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Around 60 other statisticians came to the talk

More people downloaded our software
Year(s) Of Engagement Activity 2014
URL https://rss.conference-services.net/programme.asp?conferenceID=4023&action=prog_list&session=29956
 
Description The Use of eBooks and a Statistical Analysis Assistant to Teach Multilevel Modelling 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A talk at the International Multilevel Modelling Conference in Utrecht (Plenary) to experts in the field
Year(s) Of Engagement Activity 2015
 
Description Using Computers to Teach Statistics to Reluctant Researchers 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A talk to a more general education audience about the work in our project in Bristol.
Year(s) Of Engagement Activity 2016
 
Description What are Statistical eBooks? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A talk at the ESRC Research Methods Festival in July where I presented an overview of what is meant by the concept of statistical eBooks and how this relates to our current grant work.
Year(s) Of Engagement Activity 2016
 
Description eBook Writing in Stat-JR workshop (Edinburgh) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A workshop to demonstrate software to potential users in Edinburgh
Year(s) Of Engagement Activity 2015
URL http://www.bristol.ac.uk/cmm/research/ebooks/
 
Description The use of electronic books for teaching statistical ideas with application to statistical ecology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A talk to a different academic and related community (statistical ecologists)
Year(s) Of Engagement Activity 2015