SOCIAM: The Theory and Practice of Social Machines

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

SOCIAM - Social Machines - will research pioneering methods of supporting purposeful human interaction on the World Wide Web, of the kind exemplified by phenomena such as Wikipedia and Galaxy Zoo. These collaborations are empowering, as communities identify and solve their own problems, harnessing their commitment, local knowledge and embedded skills, without having to rely on remote experts or governments.

Such interaction is characterised by a new kind of emergent, collective problem solving, in which we see (i) problems solved by very large scale human participation via the Web, (ii) access to, or the ability to generate, large amounts of relevant data using open data standards, (iii) confidence in the quality of the data and (iv) intuitive interfaces.

"Machines" used to be programmed by programmers and used by users. The Web, and the massive participation in it, has dissolved this boundary: we now see configurations of people interacting with content and each other, typified by social web sites. Rather than dividing between the human and machine parts of the collaboration (as computer science has traditionally done), we should draw a line around them and treat each such assembly as a machine in its own right comprising digital and human components - a Social Machine. This crucial transition in thinking acknowledges the reality of today's sociotechnical systems. This view is of an ecosystem not of humans and computers but of co-evolving Social Machines.

The ambition of SOCIAM is to enable us to build social machines that solve the routine tasks of daily life as well as the emergencies. Its aim is to develop the theory and practice so that we can create the next generation of decentralised, data intensive, social machines. Understanding the attributes of the current generation of successful social machines will help us build the next.

The research undertakes four necessary tasks. First, we need to discover how social computing can emerge given that society has to undertake much of the burden of identifying problems, designing solutions and dealing with the complexity of the problem solving. Online scalable algorithms need to be put to the service of the users. This leads us to the second task, providing seamless access to a Web of Data including user generated data. Third, we need to understand how to make social machines accountable and to build the trust essential to their operation. Fourth, we need to design the interactions between all elements of social machines: between machine and human, between humans mediated by machines, and between machines, humans and the data they use and generate. SOCIAM's work will be empirically grounded by a Social Machines Observatory to track, monitor and classify existing social machines and new ones as they evolve, and act as an early warning facility for disruptive new social machines.

These lines of interlinked research will initially be tested and evaluated in the context of real-world applications in health, transport, policing and the drive towards open data cities (where all public data across an urban area is linked together) in collaboration with SOCIAM's partners. Putting research ideas into the field to encounter unvarnished reality provides a check as to their utility and durability. For example the Open City application will seek to harness citywide participation in shared problems (e.g. with health, transport and policing) exploiting common open data resources.

SOCIAM will undertake a breadth of integrated research, engaging with real application contexts, including the use of our observatory for longitudinal studies, to provide cutting edge theory and practice for social computation and social machines. It will support fundamental research, the creation of a multidisciplinary team, and collaboration with industry and government in realising the research; it will promote growth and innovation and - most importantly - have impact in changing the direction of ICT.

Planned Impact

The proposed programme will have beneficial impact on a wide range of stakeholders. Via technology transfer, companies will gain access to new technologies, and also gain the understanding that will allow them to develop new products for communities organising themselves in social machines. Those companies that partner with us or support our research will of course have the ability to feed ideas into the research, and frame the problems we are trying to solve; we consider it essential that fundamental research feeds into, and back from, real-world applications.

Smaller-scale entrepreneurs will have new outlets for innovation, and new opportunities to develop radical business models. The public sector and third sector will have available new tools and methods for achieving policy ends. Communities using social machines will also benefit, of course, by the ability to identify and define their own problems, and develop their own solutions. These benefits, in social cohesion and cooperation, will often outlive the immediate issue which drove the development of the social machine.

We should not forget the benefits to the wider academic community of the proposed research. Of course, the development of a community of multi-disciplinary researchers in social machines will benefit the computer science field, but via the observatory and the strong social relevance of the research, we would expect a wide academic community in science and social science to benefit from the deepening of expertise in this area and from the large quantity of data. The 5-year programme would allow a strong multi-disciplinary cohort of researchers to emerge, able to influence a range of fields, spreading expertise in these relatively novel methods of social collaboration. Dissemination will also take place via our programme of Town Meetings, sandpits, hackathons, disruptive skills workshops, etc. Groups associated with the consortium, such as the Web Science Trust, will be able to ensure that SOCIAM's work is widely disseminated, and one of our Partners is the world's largest Technical PR Agency.

The impacts will be both economic and non-economic. The economic impacts will be the benefits that come from innovation and cooperation, and from bottom-up solutions to problems. These will include both lowering costs of social problems (e.g. via community policing lowering the costs of crime), and creating opportunities for innovation and commercial exploitation of innovation (as for example with the development of new services based on creative uses of available data). Some of these benefits will fall to entrepreneurs, while others will spill over into the wider community.

Furthermore, the research will enable value to be extracted from the ever-growing quantities of data we see. The social return on investment in data acquisition, particularly public open data, will be dramatically improved as more tools and methods are created for using the data to drive services.

There will be several non-economic impacts too. In policy terms, the impacts will be high, particularly as local solutions for problems - inherently more efficient than centralised problem-solving which cannot always take account of local conditions - will emerge from collaboration in social machines in small communities. Communities will become empowered and self-reliant. The result will be a suite of tools and methods which can be put to work in social contexts by a range of actors - government, to achieve policy goals, groups of people, to achieve social goals, or entrepreneurs, to achieve commercial goals. Indeed, one would expect a social machine to encompass all of these at different times.

Publications


Related Projects

Project Reference | Relationship | Related To   | Start      | End        | Award Value
EP/J017728/1      |              |              | 01/06/2012 | 31/07/2015 | £6,219,059
EP/J017728/2      | Transfer     | EP/J017728/1 | 01/08/2015 | 31/05/2018 | £2,675,356
 
Description

The Key Findings of our research are set out below under 10 headings. The original research proposal was organised around 6 themes:
Theme 1: Social Computation
Theme 2: Curated Data and Social Computation
Theme 3: Privacy, Accountability and Trust
Theme 4: Interaction
Theme 5: Social Machines Implementations
Theme 6: Social Machine Observatory

Under the 10 headings to follow, we have indicated which of the above themes the research outputs fall under.

1. Classification of Social Machines [Themes 1, 4, 5]
Knowledge elicitation techniques were adapted to support the conceptual analysis and categorization of social machines. As a result of this analysis, a taxonomic framework of social machines was developed. This framework consists of a number of dimensions and associated features that define the total space of design possibilities for social machines.
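As a loose illustration of such a framework, the dimensions and their associated features can be represented as a small data structure against which individual machines are classified. The dimension and feature names below are invented for this sketch and are not the framework's actual vocabulary:

```python
# Hypothetical sketch of a taxonomic framework for social machines:
# each dimension has a set of possible features, and a classification
# picks one feature per dimension. All names here are illustrative.

DIMENSIONS = {
    "participation_scale": {"small_group", "community", "massive"},
    "task_origin": {"platform_defined", "community_defined"},
    "output_type": {"epistemic", "economic", "social"},
    "data_openness": {"closed", "partially_open", "open"},
}

def classify(machine: dict) -> dict:
    """Validate a classification against the dimension/feature space."""
    for dim, feature in machine.items():
        if dim not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dim}")
        if feature not in DIMENSIONS[dim]:
            raise ValueError(f"{feature!r} is not a feature of {dim}")
    return machine

# e.g. a citizen science system such as Galaxy Zoo:
galaxy_zoo = classify({
    "participation_scale": "massive",
    "task_origin": "platform_defined",
    "output_type": "epistemic",
    "data_openness": "open",
})
```

The product of the dimensions defines the total design space; any concrete social machine occupies one point (or region) of that space.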
Subsequent classification work has focused on a specific class of social machines, dubbed knowledge machines. Knowledge machines are social machines that yield epistemic outputs. A particularly important class of knowledge machines comprises citizen science systems and online games that support the process of scientific discovery.

2018: A new approach to the conceptualization of social machines has been developed, which is rooted in work pertaining to the philosophical analysis of mechanisms. This approach is dubbed the mechanistic view of social machines. According to this view, social machines should be seen as systems whose associated events, states, and processes are, at least in part, realized by socio-technical mechanisms, i.e., mechanisms whose material constituents include multiple human agents and human-engineered artefacts (especially computational systems). One advantage of this mechanistically-oriented approach to social machines is that it enables us to categorize social machines with respect to the properties of the mechanisms that are used to identify them. This has yielded an alternative taxonomy for social machines that draws on the kinds of distinctions made between mechanisms in other disciplines, e.g., stochastic mechanisms, fluid mechanisms, stigmergic mechanisms, autocatalytic mechanisms, ephemeral mechanisms, and so on.

Within the rubric of the mechanistic approach, special attention has been given to two kinds of social machines: knowledge machines and economic machines. Both kinds of social machine are distinguished by the nature of the phenomena (especially, processes) that are realized by underlying socio-technical mechanisms. In the former case, the phenomena of interest are knowledge-related phenomena. These are typically processes associated with the acquisition, modelling, discovery, representation, and exploitation of knowledge. Prominent examples of knowledge machines include citizen science systems, which are mostly (although not always) concerned with the discovery of (scientific) knowledge.

Economic machines are a second kind of social machine that has been studied from a mechanistic perspective. In this case, the socio-technical mechanisms of a social machine are involved in the realization of economic phenomena, i.e., the phenomena that are the primary focus of empirical and theoretical interest for economists. This highlights the way in which the mechanistic view (with its emphasis on mechanisms and the phenomena they realize) is poised to reveal points of contact with a number of other disciplines.

As part of the effort to taxonomize social machines, we have identified a new form of collective intelligence, called mandevillian intelligence. Mandevillian intelligence is a form of collective intelligence that emerges as the result of cognitive shortcomings, limitations, or biases in one or more of the individual agents that form part of the collective. Interestingly, mandevillian intelligence alters our view as to the value of interventions that might be seen to undermine the cognitive or epistemic standing of individuals. Personalized search algorithms, for example, have been criticized on account of their potential to yield filter bubbles and echo chambers, with some members of the academic community calling on governments to regulate major search engine providers. From the standpoint of mandevillian intelligence such calls are premature, for they are seen to confuse cognitive effects that emerge at the individual and collective/social levels. Just because personalized search yields no cognitive or epistemic benefits for the individual agent, this does not mean it has no benefits at the collective/social level. Indeed, it may be the case that some forms of collective cognitive good (e.g., collective intelligence) are predicated on the very individual-level cognitive limitations that personalized search systems are deemed to produce.

2. Citizen Science and Crowdsourcing [Themes 1, 2, 5]
2.1 Citizen Science
We have been leading several research streams on the observation and analysis of citizen science platforms on the Web. We have been working with our project partner, Zooniverse, and our external collaborators, Eyewire (Princeton, US), to run several studies and experiments within the space of citizen science. This work has led to recent publications in several top-tier conferences including CHI, CSCW, ICWSM and Web Science, as well as in journals including Computers in Human Behavior. The work contributes to SOCIAM's overall aim of understanding the human component in social machines, from a community aspect (e.g. social communications and interactions) to an engagement perspective, which included the large-scale analysis of user behaviour.
We have also been leading on research activities involving the development of several observational dashboards (linked to the Social Machines Web Observatory platform) which will be used by members of the Zooniverse team to observe and manage their community. The work is intended to contribute to current research activities which study how research teams manage and interact with their communities.

2018: SOCIAM has continued gaining insights into citizen science social machines mainly through its collaboration with project partner The Zooniverse. Focus was given to helping the Zooniverse view and interact with its community in real time during busy periods such as live television events or new project launches. Based on previous web observatory work, SOCIAM was able to build a dashboard for the Zooniverse to view activity on the platform like never before. Data from this dashboard is currently being analysed by SOCIAM and should lead to publications in the near future.

In addition to this, SOCIAM collaborated with the Smart Society project to run an experiment on the Zooniverse platform using interventions to stimulate increased levels of participation and volunteer retention. Findings showed that certain messages delivered at the correct time can be significantly effective at motivating volunteers to increase their levels of participation. (Segal et al., 2018)

2.2 Crowdsourcing
Beyond citizen science, we have been exploring the effects of motivation, incentives, and task design in contexts that use other forms of crowdsourcing, in particular paid microtask platforms. We looked at ways to make paid crowdsourcing more effective and rewarding for the people involved by testing how gamification enhances crowd experience. This work led to a publication at the WWW Conference in 2015 - we developed a gamified paid crowdsourcing platform, which was more effective than traditional paid crowdsourcing with respect to the volume of work completed by the crowd and with respect to the feedback received from the workers. We also studied the effect that different furtherance incentives related to game mechanics (additional rewards, feedback on task performance, feedback on performance compared to others) have on retention. As a follow-up of this work, we started experimenting with other models of crowdsourcing on the same gamified platform. In an article accepted for publication in the ACM TIST journal, we study how collaborative microtasks compare with traditional, individual task designs and how social incentives enabled by collaboration (social flow, social pressure) affect retention. A second line of research studied categories of tasks relevant to data management and curation. In an article published in the Semantic Web Journal we proposed different types of workflows to improve the performance of existing entity classification algorithms. At Semantics 2014 and ESWC 2015 we presented experiments that investigated how specific task parameters in named entity recognition tasks change crowd performance and behaviour. Finally, in a second article in the Semantic Web Journal we proposed a workflow using contests and paid microtasks which can accurately repair some of the most critical data quality issues in DBpedia.

2.3 Data Citation
Citation is essential to traditional scholarship. Citations identify the cited material, help retrieve it, give credit to the creator of the material, date it, and so on. In the context of printed materials, such as books and journals, citation is well understood. However, the world is now digital. Most of our scholarly and scientific resources are held online and many are in some kind of crowd-sourced, or better expert-sourced, database, i.e. a structured, evolving collection of data. For example, most biological reference works have been replaced by curated databases, and vast amounts of basic scientific data - geospatial, astronomical, molecular, etc. - are now available online. There is strong demand that we should accord the same scholarly status to these databases and cite them appropriately, but how can we do this effectively? Data citation is an interesting computational challenge, whose solution draws on several well-studied problems in database theory: query answering using views, and provenance. We describe the problem, suggest a practical approach to its solution, and highlight several open research problems.
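The view-based flavour of the approach can be sketched in a few lines: citation metadata is attached to views over the database, and a citation for a query result is assembled from the views that cover the retrieved records. The view predicates, metadata fields, and record shapes below are all invented for illustration:

```python
# Hypothetical sketch of view-based data citation (not a real system):
# each "citable view" pairs a predicate over records with citation
# metadata; a citation for a query result is built from covering views.

from datetime import date

CITABLE_VIEWS = [
    (lambda rec: rec["collection"] == "enzymes",
     {"creator": "Curation Team A", "title": "Enzyme Reference Set"}),
    (lambda rec: rec["collection"] == "pathways",
     {"creator": "Curation Team B", "title": "Pathway Reference Set"}),
]

def cite(records, accessed=None):
    """Build citations for the views that cover the retrieved records."""
    accessed = accessed or date.today().isoformat()
    citations = []
    for covers, meta in CITABLE_VIEWS:
        if any(covers(r) for r in records):
            citations.append(
                f"{meta['creator']}. {meta['title']} (accessed {accessed}).")
    return citations

result = [{"collection": "enzymes", "id": "E42"}]
print(cite(result, accessed="2018-03-01"))
# One citation, drawn from the enzyme view's metadata
```

The database-theoretic difficulty hinted at above is precisely the step this sketch glosses over: deciding, for an arbitrary query, which views cover its answer - an instance of query answering using views - and tracking provenance so the citation dates and credits the right curators.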

3. Lightweight Social Calculus [Themes 1, 4, 5]
We'd like to change the way in which social machines are built. The central premise is: there is a model of interaction which sits behind any social machine, governing who can do what when, which kinds of messages can be sent and to whom, shaping the ways in which the complex socio-technical system unfolds through time. However, these interaction models are often not explicit, transparent, editable, discoverable, composable and so on, as they are scattered through many interacting pieces of code. In response, we are looking at a way to create social computation systems by starting with an interaction model, allowing all of the other components of a modern, large scale interactive system to be organised around a representation of the communication and coordination which takes place.
In order to explore this space, a modified version of the Lightweight Coordination Calculus (LCC) is used, termed the Lightweight Social Calculus (LSC). It is an executable, declarative process calculus for interactions between heterogeneous agents. Use of LSC is based on the enactment of protocols, which give a minimally intrusive framework for defining patterns of communication without overly constraining the internal knowledge and decision making architecture of the actors involved.
There are several motivations for doing this:
i. Since protocols can be mechanically enacted, they provide the potential for mixed initiative human computer interaction and human computation applications. Creating a protocol which represents human interaction allows computational agents to join in on an equal footing with the humans.
ii. Making protocols first class objects allows for their exchange and manipulation. It means that communities can discover interactions which suit their needs and adopt them, after making any modifications necessary.
iii. Protocols can be transparent to users, indicating what the bounds and rules of the interactions are, leading to a greater facility for understanding the implications of engaging. This does, of course, depend on being able to represent the protocols in a manner which makes sense to users.
iv. There is a clear separation of concerns between the structure of the interaction and the mediums and communities where it is enacted. The interaction can then be framed in a manner which is most appropriate to the community in question, and integrated into their existing practice by connecting the interaction model with the technical platforms already in use - essentially, the interaction becomes key, rather than the substrate on which it is performed.
v. A protocol is amenable to formal techniques. For example, properties can be verified, such as the flow of data through the interaction, termination criteria, and other qualities which relate to privacy and security.

In order for this to work, humans need some way to engage with protocols as they are enacted. One mechanism for doing this is to create an interface with which people can engage, whether through webpages or mobile devices, or mediated through APIs of some sort. This is the means of engagement with which we're familiar from earlier LCC work. Another possibility is to find a way to run the protocols alongside existing interaction, annotating their behaviour with formal structures. We call this approach 'Soft Institutions', where the formal edges of electronic institutions are softened to provide natural, human ways for people to engage with them.
LSC is a declarative, executable specification designed to give enough structure to manage fully distributed interactions by coordinating message passing and the roles which actors play, while leaving space for the actors to make their own decisions. It is derived from LCC with extensions designed to make it more amenable to mixed human-machine interactions; in practice, this means having language elements which cover user input, external computation or database lookup and storing knowledge and state.
An LSC protocol consists of a set of clauses. Each clause is composed of a role specification and a description of what an agent should do when playing that role (which we call the body of the clause). The body contains message sending (M => a(role, ID)) and receiving (M <= a(role, ID)), sequencing and choice (then and or), implication (action <- condition), the assumption of new roles (a(role, ID)) and any extra computation or conditions necessary, including accessing environmental artifacts.
When computing with LSC, each agent joins the interaction by taking on one of the roles specified, and taking the clause given by that role as its initial state. The state is then repeatedly re-written in response to incoming events: incoming messages are matched against expected messages, role definitions are replaced with the body of matching clauses, values are substituted for variables and so on. As the interaction progresses, this state tree keeps a complete history of the agents' actions and communications. This supports the creation of multi-agent institutions where interaction is guided by shared protocols and a substrate which keeps track of state.
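The rewriting process described above can be illustrated with a toy interpreter - a sketch only, not the actual LSC engine. An agent's state is a term tree built from "recv" and "then" constructors (a deliberately reduced subset of the body language), and each incoming event rewrites the tree:

```python
# Minimal, illustrative sketch of LSC-style state rewriting: an agent's
# state is a term tree, and incoming messages are matched against
# expected receives, rewriting the state as the interaction progresses.
# Only "recv" and "then" are modelled; real LSC has richer operators.

def step(state, event):
    """Rewrite one state term in response to an incoming event."""
    kind = state[0]
    if kind == "recv":                      # ("recv", expected_msg, rest)
        _, expected, rest = state
        if event == expected:
            return rest                     # message consumed, continue
        return state                        # keep waiting
    if kind == "then":                      # ("then", first, second)
        _, first, second = state
        new_first = step(first, event)
        if new_first == ("done",):
            return second                   # first part finished
        return ("then", new_first, second)
    return state

# A clause body: receive "ask", then receive "confirm".
clause = ("then", ("recv", "ask", ("done",)), ("recv", "confirm", ("done",)))
s = step(clause, "ask")       # first receive matched
s = step(s, "confirm")        # second receive matched -> ("done",)
```

In the real calculus the successive states are retained rather than discarded, which is what gives the complete history of actions and communications mentioned above.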
There are a couple of applications that demonstrate how LSC advances the state of the art:
i. We present models and techniques for coordination of human workers in crowdsourced software development environments. We combine the Social Compute Unit - a model of ad-hoc human worker teams - with versatile coordination protocols expressed in LSC. This allows us to combine coordination and quality constraints with dynamic assessments of end-user desires, dynamically discovering and applying development protocols.
ii. The vast majority of human (and, increasingly, automated) social interaction is now taking place in social media systems where social norms are soft concepts regulated essentially by the people involved. Being able to leverage the power of electronic institutions in these systems would ease the application of computational intelligence in support of social tasks. We describe a method by which electronic institutions can act in synergy with these sorts of social media streams and, in doing so, we define a "softer" style of system that, nevertheless, retains connection to precise specifications of coordination. In addition, we question the tacit assumption that participating agents deliberately join appropriate institutions. Although our method is independent of choice of social media stream (given a few standard characteristics of these) we have an implementation of the method using Twitter as a target media stream.

In addition, we are investigating in ongoing work the use of LSC as part of a design methodology for Social Machines. Ideally, new social machines could be described by their users' interaction protocols, and prototype infrastructure could automatically be generated from these interaction protocols.
The current research focuses on identifying simple protocols that can be used as building blocks and composed to form complex social machines. We then aim to produce a tool with which non-expert users could design social machines using a graphical language, and from which prototype infrastructure could be automatically generated.
Our tentative solution involves a hypergraph where nodes are actors and hyperedges are interaction protocols, and the open research questions include the following: Can we describe (most) social machines with such structures? Do we need additional constructs, and if so, with which semantics? Can a working prototype be automatically generated from such a structure? and if not, then what additional constraints apply? What additional information is needed?

2018: We explored the use of LSC in a case study involving Unmanned Aerial Vehicles (UAVs). In particular, we worked towards a formalisation of UAV accident scenarios in LSC. This allows us to create executable simulations of such scenarios, identify specific hazards and test the effects of different solutions.

LSC facilitates this process thanks to its simple declarative syntax and distributed execution.

A key finding arising from this work is the need for an explicit specification of the context of the interaction. A user may change behaviour not only due to a change in their role or their knowledge and capabilities, but also based on changes in the context of the social interaction. For example, a UAV pilot will make different decisions depending on whether the UAV is on the ground, taking off, cruising, landing, or taxiing. The concept of the interaction context was explicitly formalised within an extension of LSC.

4. Health Social Machines [Themes 1, 4, 5]
4.1 Formal Modelling of Care Pathways
We have worked towards developing formal models of care that capture the necessary sequence of steps for effective patient care. Integrated Care Pathways (ICPs) are a prime example of such models.
Using process modelling tools, such as WorkflowFM and the Lightweight Social Calculus, we created formal, structured models that enable the following key activities:
i. An intuitive, diagrammatic visualisation of the care pathways which helps clinicians better understand their own practices, analyse and rethink them.
ii. The use of a structured data model facilitates the recording and versioning of these models so that they can be more easily maintained and evolved as policies and practices change. They can also be shared amongst the different stakeholders to enable collaborative development, as well as across sites for a fruitful comparison.
iii. Formal verification techniques can be applied on our models to mechanically check their correctness and consistency. This can help eliminate errors, increase the trustworthiness of the model, and improve patient safety.
iv. Executable code (including, for example, workflow automation and generated checklists) can be extracted from these models to form an optimised social machine for healthcare provision.
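The kind of mechanical consistency check mentioned in (iii) can be illustrated with a deliberately simple sketch - this is not WorkflowFM, and the step names and input/output resources are invented. A care pathway is treated as a sequence of steps, each consuming and producing resources, and the check verifies that every step's inputs are available when it runs:

```python
# Illustrative sketch of a structured care pathway model with a simple
# consistency check: every step's inputs must be produced by an earlier
# step or be given upfront. Step and resource names are hypothetical.

PATHWAY = [
    # (step, inputs required, outputs produced)
    ("referral",       {"patient_record"},    {"referral_form"}),
    ("assessment",     {"referral_form"},     {"assessment_report"}),
    ("treatment_plan", {"assessment_report"}, {"care_plan"}),
    ("follow_up",      {"care_plan"},         {"outcome_report"}),
]

def check_pathway(steps, initial):
    """Verify each step's inputs are available when it runs."""
    available = set(initial)
    for name, inputs, outputs in steps:
        missing = inputs - available
        if missing:
            return f"step '{name}' is missing inputs: {sorted(missing)}"
        available |= outputs
    return "pathway is consistent"

print(check_pathway(PATHWAY, {"patient_record"}))
# 'pathway is consistent'
```

Formal tools such as WorkflowFM perform far stronger checks than this resource walk (e.g. proof-based verification of composition), but the sketch conveys how a structured model makes such checking mechanical rather than manual.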

Capturing and recording care pathways is itself a social process. Our work has resulted in a rigorous, collaborative methodology for the formal modelling of care pathways, from the collection of data from all the involved stakeholders to an optimised, executable support system that enhances clinical coordination and care practices.
In short, we believe process-based workflow models can be an effective methodology to document and share care pathways provided they are developed in close collaboration with clinical stakeholders.

4.2 Designing a Social Machine for the Heart Manual Service
In addition to our ICP modelling work, we have investigated suitable methods for designing social machines in a healthcare context. We have studied the case of a social machine for the clinical facilitators of the Heart Manual Programme, which is a home-based cardiac rehabilitation programme. Requirements were elicited through various participatory design methods and a proof of concept evaluation was carried out with a prototype. The prototype was received largely positively and scored highly on the system usability scale, indicating the success of the proposed methodology. This work suggests that adopting a participatory approach where stakeholders are active, equal participants throughout the design process leads to a more usable, likeable, and thus more successful social machine.

4.3 Data Safe Havens
Generic methods for sharing data in secure infrastructures are necessary for sharing healthcare data. The potential of medical and healthcare data to support the sorts of extensive, data-intensive experiments being demanded by precision and stratified medicine is a long way from being realised. A key architectural problem remaining to be solved is how to maintain control of patient data within the governance of local data jurisdictions, while also allowing these jurisdictions to engage with experiment designs that (because of the need to scale to large population sizes) may require analyses across several jurisdictions. Our study provides a snapshot of architectural work underway to provide a clear, effective structure of data safe havens within jurisdictions. We investigate how formally specified experiment designs can be used to enable jurisdictions to work together on experiments that no single jurisdiction could tackle alone. Our work relates to two jurisdictions (in Scotland and in Italy), but the architecture and methods are general across similar jurisdictions.
2018:
4.4 Process Optimisation for the Scottish National Safe Haven
Following up on our Data Safe Haven work, we have participated in a collaborative project with the electronic Data Research and Innovation Service (eDRIS) of the Information Services Division (ISD), NHS National Services Scotland and the University of Trento, Italy. The eDRIS team is responsible for the management of research projects that request healthcare data from the Scottish National Safe Haven. They support the researchers in order to better understand what data is available, the ethical and legal considerations, and the required application to the Public Benefit and Privacy Panel for approval. They also manage data retrieval, aggregation, de-identification and provision.

Although the eDRIS process is rigorous and methodical, considerable time and effort is required from the expression of interest by the researcher to the delivery of the data. Our collaborative project aimed to investigate potential improvements to the process in order to minimize the required time and effort.

While researchers from the University of Trento applied knowledge management and semantic web techniques to allow better management and integration of the data, the SOCIAM team focused on a more detailed analysis of the eDRIS workflow. We examined the relevant documents, forms and guidelines for the standard operating procedures, analysed existing and previous case studies, and performed several contextual interviews with 3 eDRIS members and a survey across the entire team.

Our work resulted in key findings with respect to bottlenecks and issues in the workflow. For example, we discovered that the ratio of time spent communicating and discussing with the researcher to time spent on data integration and management was unexpectedly high. This led to several suggestions for improvement, involving both technological solutions and workflow modifications. In the example above, techniques such as synthetic data and an accessible schema description, which allow the researcher to understand the format and availability of the data first-hand, would be much more effective than streamlining the internal eDRIS workflow for data management.

4.5 Use of Quantified Self Data in Clinical Practice
The pervasiveness of wearable devices and apps for sensing and automatically capturing aspects of people's health and wellbeing has caused a rise in self-logging of such data. However, no work has yet considered the challenges of introducing this data into clinical workflows. This project involved working with clinical cardiologists and other clinical care providers to understand the opportunities and challenges that the use of self-logged data might pose. Key findings of this work include the following:

- Data quality poses substantial challenges for admitting QS data into certain kinds of clinical use, including differential diagnosis; however, such data may be suitable for subjective assessments of symptomatic burden and other longitudinal care needs
- Representations of QS data are highly non-standardised, which presents a barrier to clinical use
- QS data remain highly fragmented and voluminous, which presents a barrier to their effective use because interpretation of this data can require time
- QS data often represent dimensions of patient wellbeing that are not typically measured or considered during clinical workflows, which presents a challenge to effective interpretation

Nonetheless there were several potential avenues which showed promise, including for longitudinal symptom burden assessment and management.

2018:
4.6 Anonymisation methods
Anonymisation is a key method of rendering data safe. In a series of books and papers, we have developed 'functional anonymisation', an approach which in many ways parallels the safe haven concept. A paper in Computer Law and Security Review sets out the intellectual justification for functional anonymisation in the face of criticisms of the anonymisation approach. Methods for functional anonymisation have been explained in open source books for the UK and Australian legal contexts.
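Functional anonymisation treats anonymisation as a property of data in its environment rather than as a single algorithm, but assessing re-identification risk typically draws on standard checks such as k-anonymity over quasi-identifying attributes. As a minimal illustrative sketch (the records and attribute names below are invented, and this is not code from the books or papers mentioned above):

```python
# Illustrative k-anonymity check: k = 1 means some record is unique on its
# quasi-identifiers and, in a hostile data environment, would need further
# generalisation before release. Records here are invented examples.
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Smallest group size when rows are grouped by the quasi-identifier
    columns; higher k means lower re-identification risk."""
    groups = Counter(tuple(row[c] for c in quasi_identifiers) for row in rows)
    return min(groups.values())

records = [
    {"age_band": "30-39", "postcode_area": "EH1", "diagnosis": "flu"},
    {"age_band": "30-39", "postcode_area": "EH1", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode_area": "G1",  "diagnosis": "flu"},
]

# The third record is unique on (age_band, postcode_area), so k = 1.
print(k_anonymity(records, ["age_band", "postcode_area"]))  # 1
```

In the functional view, whether k = 1 is acceptable depends on the data environment (who can access the data, and what auxiliary data they hold), which is what the safe haven governance model controls.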

5. Machine Intelligence and the Mind of Society [Themes 1, 2, 4, 6]
We have explored the various ways in which social machines are relevant to the development of machine-based cognitive capabilities. This includes the idea that the Internet provides an important form of cognitive contact with the human social environment - one that provides a rich range of learning and socialization opportunities. This work dovetails with work in developmental psychology and cognitive robotics.
A second aspect of work in this area relates to the notion of human-extended machine cognition. This is a new concept in the philosophy of mind that maintains that the human social environment can form part of the material fabric that realizes certain kinds of machine-based cognizing. The notion of human-extended machine cognition is relevant to SOCIAM in the sense that social machines support the possibility that human individuals could form part of the extended realization base for advanced forms of machine intelligence.
Another strand of work concerns the notion of social computation. The terms 'social computation' and 'crowd computation' are currently popular within computer science; however, the systems to which these terms are applied reveal a tension with the philosophical conception of computation. The upshot is that either the notion of social computation is inadmissible, or the philosophical conception of computation needs to be revised. Either way, we can begin to see how the philosophical analysis of social computation is poised to impact theoretical discourses in multiple disciplines.
Finally, we have expanded the theoretical terrain of the sciences of the mind by developing the concept of the 'mind of society'. In particular, we have developed a theoretical proposal for a form of machine intelligence that is grounded in the monitoring of social processes. The idea draws inspiration from recent research in theoretical neuroscience and research into deep learning systems. Specifically, we suggest that the attempt to develop a generative model of social processes provides a sufficient basis for a form of experientially-potent machine-based cognition.
2018:
The Internet provides unprecedented access to the human social environment, yielding insight into the dynamics of human behaviour (both individual and collective), as well as providing access to the digital products of human cognitive labour (again, both individual and collective). Such access is interesting from the standpoint of research into machine intelligence, for the human social environment looks to be of crucial importance when it comes to the evolutionary and developmental origins of the human mind. As part of our work in this area, we have developed a theoretical account that sees the Internet as providing opportunities for online systems to function as socially-situated agents. The result is a vision of machine intelligence in which advanced forms of cognitive competence are seen to arise from the digital socio-ecological niche that the Internet has created. We suggest that various forms of machine learning are required to enable online systems to extract maximal cognitive benefit from their exposure to this niche. These include incremental, active, social, and (perhaps most importantly) predictive learning.

One of the reasons to regard predictive learning as particularly important relates to the cognitive significance of so-called generative models. Within cognitive science, generative models are seen to emerge in response to the brain's attempt to predict the flow of information originating from the sensorium. Similarly, the performance of some kinds of deep learning system is predicated on the acquisition (and subsequent deployment) of a generative model that aims to capture the hidden causal forces (latent variables) that give rise to bodies of training data. When we apply all of this to the realm of social machines, we can begin to see the ways in which socially-oriented generative models might both help to explain our own intelligence and provide insight into a new form of artificial intelligence. This highlights one of the ways in which our work in the SOCIAM project speaks to recent work in cognitive science and machine learning.


6. Provenance and Annotation [Themes 2, 3]
Analysis of social media data is now central to many decision-making processes, resulting in complex data processing and integration pipelines that often incorporate components from several sources. In this scenario, traceability of data across analytics pipelines is important from both the data providers' and the consumers' perspectives. While the parties involved in the data analytics process are interested in assessing the quality and reproducibility of the outcome of an analysis, the data providers, e.g. users of a social network, may want guarantees about how their data is being used. In a paper, Peter Buneman, Adria Gascon and Luc Moreau draw attention to the general problem of combining provenance collected across systems, and show through an example how a basic method of augmenting the provenance of communicating systems allows questions to be answered about the movement of data and the attribution of responsibility to agents. This work is being used by Dong Huynh to incorporate provenance in the SOCIAM Web Observatory.
2018:
Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data's provenance as represented using the World Wide Web Consortium's domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
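To illustrate the shape of the approach (this is a sketch only: the graphs, feature choices, labels, and the nearest-neighbour stand-in for the published machine learning models are all invented for illustration), one can compute simple network metrics over a provenance graph and use them as features for a predictive model:

```python
# Illustrative sketch of provenance network analytics: derive network metrics
# from a provenance graph (a plain edge list standing in for a PROV document)
# and use them as features for prediction. All examples here are invented.

def metrics(edges):
    """Feature vector: [node count, edge count, max out-degree, length of the
    longest derivation chain], for an acyclic edge list."""
    nodes = {n for e in edges for n in e}
    out = {n: sum(1 for s, _ in edges if s == n) for n in nodes}

    def depth(n):
        return 1 + max((depth(t) for s, t in edges if s == n), default=-1)

    return [len(nodes), len(edges), max(out.values()), max(map(depth, nodes))]

# Toy provenance graphs: attribution fan-out vs. a deep derivation chain.
shallow = [("doc", "alice"), ("doc", "bob")]           # wasAttributedTo edges
deep    = [("v3", "v2"), ("v2", "v1"), ("v1", "raw")]  # wasDerivedFrom chain

training = [("low_quality", metrics(shallow)), ("high_quality", metrics(deep))]

def predict(features):
    """1-nearest-neighbour stand-in for the paper's learned models."""
    return min(training,
               key=lambda lf: sum(abs(a - b) for a, b in zip(lf[1], features)))[0]

print(predict(metrics([("d3", "d2"), ("d2", "d1"), ("d1", "src")])))  # high_quality
```

The point of the sketch is only that the features are computed from the provenance graph's shape, not from the (domain-dependent) application data itself.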

Ongoing work with David De Roure has applied the technique to the PokemonGo simulator; a publication was in preparation in 2018.

The problem of data citation in curated databases (a form of social machine) is technically challenging. Data citation is in wide demand from research councils, digital librarians and bibliometricians. Recent work in collaboration with the University of Pennsylvania has shown how provenance semirings, a theory of data provenance in relational databases, can be used quite generally to form citations from queries over general databases. The results are already in use in some trial databases.
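The core of the semiring idea can be conveyed in a few lines (a sketch only: the tables, identifiers, and citation texts below are invented, and real provenance semirings also track the union/sum operation, which is omitted here). Each source tuple carries an identifier annotation; a join multiplies annotations, so every query answer records exactly which source tuples it depends on, and a citation can be assembled from that record:

```python
# Illustrative sketch of provenance-semiring citation: annotations are tuples
# of source-tuple identifiers, and join concatenates them (the semiring
# product). All data and citation strings here are invented examples.

def join(left, right, key):
    """Natural join over annotated tuples: output annotation = product."""
    out = []
    for row_l, ann_l in left:
        for row_r, ann_r in right:
            if row_l[key] == row_r[key]:
                out.append(({**row_l, **row_r}, ann_l + ann_r))
    return out

genes    = [({"gene": "BRCA1", "chromosome": 17}, ("t1",)),
            ({"gene": "TP53",  "chromosome": 17}, ("t2",))]
curation = [({"gene": "BRCA1", "curator": "Lab A"}, ("t3",))]

(row, annotation), = join(genes, curation, key="gene")

# The annotation ("t1", "t3") says exactly which curated entries to credit.
credits = {"t1": "gene table entry t1 (Smith 2015)",
           "t3": "curation record t3 (Lab A 2017)"}
citation = "; ".join(credits[t] for t in annotation)
print(citation)
```

Because the annotation is computed by the query itself, the citation is derived from the query rather than attached manually to the database as a whole.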

7. Supporting Privacy and End-User Data Management in Social Machines [Themes 3, 4]
7.1 Privacy Data Controller Indicators (App X-Rays)
Smartphone apps currently leak personal information of many forms and types to a variety of destinations, including first and third parties, for a similarly varied set of purposes. Some of these purposes are critical for providing the service, whilst others include advertising and analytics.
End-users are seldom aware of exactly how this information is disclosed, with whom it is shared, and for what purposes. This causes a number of problems, including heightened anxiety among end-users about potential misuse of data by apps and services, and an inability to reason about whether particular apps or services are "safe" to use in accordance with their privacy preferences.
This work aims to help end-users make better-informed privacy decisions by making visible the hidden information flows embedded within and behind social machines, in particular smartphone and web applications. We first used several approaches to measure these hidden information disclosure activities, including dynamic approaches (intercepting network traffic) and static ones (e.g. static analysis of disassembled compiled binaries), to identify the first- and third-party entities who gather personal data via applications. We then designed a series of visual interfaces to represent this information to users, in order to explore their privacy concerns and information management needs. We also conducted a large-scale analysis of the third-party trackers associated with the 5,000 most popular Android applications and 5,000 websites, in order to understand the concentration of power in the tracker ecosystem.
Key findings:
- Our visual representations of these hidden information flows from smartphone apps allowed individuals to make more nuanced, contextually-informed, and confident decisions
- The third party tracking market, in both the web and mobile ecosystems, is concentrated in the hands of a small number of firms; by mapping parent-subsidiary relationships we were also able to measure the impact of market consolidation in recent years.
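The classification step in a pipeline like the one described above can be sketched simply: once the hostnames an app contacts have been captured (by traffic interception or static analysis), each is matched against a list of known tracker domains. The tracker list and hostnames below are a tiny invented sample, not the lists used in the study:

```python
# Illustrative sketch: suffix-match contacted hostnames against a (hypothetical,
# tiny) list of known tracker domains, so subdomains are caught too.
TRACKER_DOMAINS = {"doubleclick.net", "graph.facebook.com", "crashlytics.com"}

def is_tracker(hostname):
    """True if hostname equals, or is a subdomain of, a known tracker domain."""
    parts = hostname.lower().split(".")
    # Check every suffix: ads.g.doubleclick.net -> g.doubleclick.net -> doubleclick.net
    return any(".".join(parts[i:]) in TRACKER_DOMAINS for i in range(len(parts)))

contacted = ["api.example-app.com", "ads.g.doubleclick.net",
             "settings.crashlytics.com"]
trackers = [h for h in contacted if is_tracker(h)]
print(trackers)  # ['ads.g.doubleclick.net', 'settings.crashlytics.com']
```

At scale, the interesting analysis then happens one level up: mapping each tracker domain to its parent company to measure market concentration.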
2018:
The App X-Ray project continued building on previous work in 2017 regarding exposure of user data to third party trackers such as advertising networks. During the summer, we recruited a team of interns to help build a new infrastructure for collecting a much larger dataset of smartphone apps, in order to map the data flows from apps to third party trackers. This dataset now includes over a million apps and has led to the creation of several research papers (including 'X-Ray Refine...', and 'Measuring third party ...').

Representing end-users' overall exposure to third parties in the X-Ray Refine interface surfaced a range of nuanced privacy concerns, highlighted the shortcomings of existing protection measures, and suggested a need to support different kinds of bespoke preferences.

Our analysis of the Android app ecosystem is (to our knowledge) the largest study of data flows from apps in terms of the number of apps included (close to 1 million, two orders of magnitude larger than previous studies). It reveals the most prevalent user tracking technologies and the corporate networks behind them. We also found that the distribution of trackers differed significantly by genre - with Games and News apps disclosing user data to the largest number of third parties on average.

7.2 Privacy Languages
Privacy remains a primary concern for most users of web and smartphone apps. Despite numerous efforts, users remain powerless to control how their personal information is used and by whom, and have limited options to opt out of dominant service providers, who often process users' information with limited transparency or respect for their privacy preferences. Privacy languages are designed to express the privacy-related preferences of users and the practices of organisations, in order to establish a privacy-preserving data handling protocol. However, in practice there has been limited adoption of these languages, by either users or data controllers.
This work focused on understanding the strengths and limitations of existing policy languages. Preliminary results showed that existing languages focus on enabling control for organisations, but lack a focus on ordinary web users and on enabling their control over actual data resources. This laid the groundwork for our next privacy protection designs, which aim to be centred on web users and to empower their control of data privacy.
Also under this heading, SOCIAM looked at understandings of privacy and divergent privacy discourses, the difficulties these have created for interpreting privacy as a single idea or ideal, and the resulting problems for determining privacy policies. Seven different types of privacy discourse were outlined, ranging from the conceptual to political, moral and rights-based discourses. Work is ongoing to employ the framework as a means of situating privacy discourses and design, and some of its ideas were employed to evaluate the recent Google Spain/Right to be Forgotten ruling of the Court of Justice of the European Union (see impact section). A full paper is in preparation, and a shorter version appeared in IEEE Internet Computing in 2016.

7.3 Self Curation Online
This study, comprising a set of interviews with active social media users, sought to understand the tensions individuals face when asked to provide personally-identifying information online, and the strategies they use to cope with such demands. We found many reasons for people fabricating their data (i.e. providing false information), including creative expression, hiding sensitive information, role-playing, and avoiding harassment or discrimination. The results suggest lying is often used for benign purposes, and we conclude that it may be an essential strategy for people to maintain their privacy in an era which incentivises companies to ask for as much personal data as possible.

7.4 Pro-Social Deception
The pervasiveness of smart devices and sensing, whilst bringing potential benefits, also poses considerable threats to personal autonomy. This experiment looks at future personal sensing tools that give people "space to deceive", or even facilitate or automate deception, to help people regain their autonomy. Using a speculative fictions methodology, we asked people to reflect on several sketches of fictional tools (called the lieCloud or liePhone suite) that used automated deception to help individuals preserve their privacy. The results revealed that many people were open to a range of pro-social uses of such tools.
2018:
Following earlier work on extremist groups and on pro-social deception, we studied activity that was either anti-social or less obviously pro-social. The use by cybercriminals of anonymising tools to trade securely is of interest from the point of view of anti-social machines, and the sub-optimality of these tools suggests opportunities for law enforcement to disrupt such machines. A more ambiguous area is digital welfare service delivery, where communities and groups of welfare recipients often find themselves as the targets of government cybersecurity programmes, thereby being both the users of the system and its adversaries. Work has been done to theorise the security framework, based on focus groups of welfare recipients (performed by collaborators outside the SOCIAM project), and a paper has been published (details not currently available).

7.5 Fairness and accountability in data use
In addition to examining aspects of privacy, we also explored questions of algorithmic fairness, accountability and transparency regarding the downstream use of data gathered through social machines. A set of alternative institutional and technical arrangements were proposed for privacy-preserving forms of data-driven discrimination analysis, and a philosophical analysis of algorithmic transparency was developed in terms of the political notion of public reason.
2018:
We continued to study fairness and accountability aspects of algorithmic decision support systems, with a particular focus on the human factors of machine learning systems in a range of applied settings. In several user studies, we tested perceptions of justice regarding the outputs of machine learning models in a range of contexts (such as insurance, loans, and employment), and with a range of explanation styles. Working with a colleague from UCL, we examined the human-computer interaction challenges involved in deploying machine learning in the public sector, based on interviews with 27 practitioners in 5 OECD countries. Finally, we examined the connections between philosophical work on fairness and the nascent literature from the 'fairness in machine learning' community.

Key findings:
- Decisions based on ML models provoke similar justice perceptions to human decisions, but also raise new kinds of ethical concerns; different ways of explaining such decisions sometimes affect justice perceptions, but are not a panacea for accountability
- The use of ML in the public sector is already widespread, and practitioners have their own ways of embedding values like transparency and fairness into their systems; but these are limited, and HCI has much to contribute to improving the application of these systems

7.6 Personal Data Stores
Based on the user-needs requirements analysis and the vision we described in "The Future of Data is Personal", we designed and developed a personal data store platform called INDX, capable of negotiating, retrieving and communicating using Linked Data (including LDP). INDX was demonstrated at a number of different forums and published as FOSS software in the SOCIAM GitHub repository.
2018:
7.7 Privacy concerns for Parents and Children
This project builds upon the X-Ray project to explore the data privacy awareness of parents and young children (aged 6 to 10) during children's use of mobile devices, such as tablets or parents' smartphones. The goal of this project is to design and develop interventions that are tailored to young children's cognitive ability and needs. During summer 2017, we recruited 12 parent-child pairs from the local Oxfordshire area, and 250 parents via an online survey platform. This pre-study provided us with useful insights regarding parents' current level of privacy awareness when choosing apps for their children, young children's limited ability to cope with risks by themselves, and their dependence on help from parents or other adults. This project has led to further funding from the EPSRC IAA grant, namely Kids Online Anonymity & Lifelong Autonomy, with the Anna Freud Centre for Children and Families as our advisor partner.

Key findings:
1. Parents are generally concerned about their children's privacy risks. However, when choosing mobile apps for their children, parents primarily focus on the content of the apps and what the apps do, rather than the personal information the apps might collect.
2. Most parents use a range of technical restrictions to safeguard their children and restrict what they can install on their devices. However, parents' motivations are more likely to be preventing accidental in-app purchases than safeguarding children's privacy.
3. Most parents are fairly happy with their children's awareness of privacy risks. However, parents often struggle to persuade their children to choose alternative apps when the parents recognise risks.
4. Young children aged 6-10 have limited ability to recognise data privacy risks related to their use of tablet computers, and largely rely on guidance from parents, who might not always understand the risks themselves.
5. Mobile apps from the Google Play Store that are designed for families were found to have a high number of third-party trackers associated with them.

7.8 Study of news media on social networks during elections
Digital-born and legacy news media are competing to control the most central positions in the flow of online news. An interesting research question is who takes the leading role: digital-born outlets, or legacy news media such as newspapers, broadcasters and radio stations. In this joint project with the Reuters Institute for the Study of Journalism at the University of Oxford, we examined how this competition unfolded during three national elections in Europe in 2017: the French presidential election, the UK general election, and the German federal election. Collecting 76 million news-related tweets jointly with Eurecat during these three elections, we identified that legacy media and digital-born news media figured differently on Twitter during each election. This is the first study to provide such empirical evidence for the journalism research community. The project outcome has been featured on theconversation.fr and generated 3 media reports and a public-facing YouTube video [all linked below]:

Key findings:
1. Overall, legacy media outlets figured very prominently in the political discussions on Twitter. During the French election, legacy media generated more than seven times as much activity and engagement as digital-born news media, and in the German and UK elections legacy media generated more than four times as much activity as digital-born news media.
2. However, in the UK, although legacy media such as BBC News, Channel 4 News, Sky News, The Economist and the Financial Times featured strongly on Twitter, a few pure digital players also figured very prominently in the political discussion.
3. A high number of followers and frequent tweeting do not automatically translate into high levels of audience engagement. For example, in the UK, a number of tabloid newspapers with considerable audience reach and frequent tweeting activity saw very limited audience engagement.

Conversation.fr: https://theconversation.com/lagenda-mediatique-de-la-presidentielle-domine-par-les-medias-traditionnels-79727

Media reports:
1.http://reutersinstitute.politics.ox.ac.uk/sites/default/files/2017-11/Digital-Born%20and%20Legacy%20News%20Media%20UK%20Factsheet.pdf
2.http://reutersinstitute.politics.ox.ac.uk/sites/default/files/2017-07/Majó-Vázquez%20-%20The%20Digital-Born%20and%20Legacy%20News%20Media%20on%20Twitter.pdf

8. Ethics [Themes 1, 3, 4]
Ethics has emerged as a lateral theme throughout the SOCIAM project and has gained increasing traction over time. The topic intersects with all of the main themes of the programme, but particularly with "Privacy, Accountability and Trust", "Interaction" and "Social Computation". Ethical behaviour is also a key 'social' dimension of social machines.
The ethics theme has influenced the development of several research outputs by SOCIAM staff, including published articles, conference papers, research reports and online commentaries focusing in particular on trust, deception and governance of digital social spaces and algorithms. Synergies between the engineering and social/policy dimension of SOCIAM have also given rise to opportunities for social machine innovation which are currently being explored, such as the development of 'ethical bots' as a means of mining the Terms and Conditions underpinning online services, social platforms or apps that could be characterized as social machines.
The ethics theme also benefits from synergies with related projects in which members of SOCIAM are involved; notably the 'responsible innovation' theme of the Smart Societies project and aspects of ethics, governance and public involvement surrounding RCUK's medical and administrative 'Big Data' data initiatives.
2018:
Work has been done to elucidate the ethical properties of social machines, which will be included in the forthcoming SOCIAM book by Shadbolt et al., as well as in papers submitted to the Web Science Conference and the associated Social Machines workshop. Social machines have a number of ethical dimensions, and ethical issues affect both the internal structure of the social machine (necessitating ethical engagement which is at least partly functional) and the external relations of the social machine with the embedding society. It has been argued in a submitted paper that virtue ethics may be able to deal with the diverse pressures of this situation somewhat better than more traditional consequentialist and deontological approaches. In this complex ethical context, privacy can be a double-edged ethical sword, which may conceal wrongdoing as well as protect the rights of participants acting in good faith.

9. Social Machine and Web Observatory [Themes 5, 6]
The Web Observatory has become a core platform, together with a set of associated technologies, to support the observation, analysis, and visualisation of social machine activity. Drawing together existing Web standards and recent advances in big data technologies (e.g. Storm) has been central to the development of our tools.
At the core of the Social Machine Observatory is the Web Observatory platform, which orchestrates several components for data ingestion, integration, storage, and streaming. The platform, which is part of a multi-site network of Social Machine observatories, allows social machine researchers to capture and share data on the Web, and to reuse this data to build analytical applications and visualisations. Furthermore, the network of Web Observatories allows for multi-institute data querying and access.
There have been several research contributions during the development of the Web Observatory platform, including methods for real-time data integration and re-streaming, and methods for data storage and retrieval in the context of collecting large-scale social Web data. We have also been working with several other institutes, including RPI (USA), to establish the Web Observatory metadata schema, an extension of the widely used Schema.org metadata vocabulary.
We have also been developing a distributed Web Observatory crawler and search engine which enables data discovery across multiple Web Observatory services, or of content which has been described using the Schema.org Web Observatory vocabulary.
Another research stream which has become a core contribution to the SOCIAM toolkit is the Web Macroscope, a research project which investigates methods to map and visualise multiple streams of Web data. Underpinned by the streaming architecture of the Web Observatory, the Macroscope was developed as a multi-screen hardware solution which uses Web-based visualisations to provide context for several curated social data streams, such as Instagram, Twitter, Wikipedia, and the Zooniverse. This has been used in several studies and experiments, including a recent public engagement exercise in London, which studied how individuals engage with and understand the production of data on the Web. The work on public engagement has also been supported by a grant from the Royal Academy of Engineering (RAEng) to investigate the level of public awareness of how data can be collected and used from human activity on the Web.
As an extension of the Web Observatory and Macroscope research activities, Dr Ramine Tinati and Dr Markus Luczak Roesch were awarded a grant from the Lloyds Foundation in order to research novel techniques for analysing Web data. This work contributed to a set of analytical tools available for use on the Web Observatory infrastructure.
The Web Observatory has also been critical in supporting the joint research activities between KAIST and the University of Southampton on disaster management studies. SOCIAM researchers and investigators have been involved in several workshops and joint projects which use the Web Observatory infrastructure and datasets.
The Web Observatory platform is also being extended as an infrastructure to support sharing of IoT datasets and analytical applications for EPSRC's PETRAS project. The extended "IoT Platform" will enable the various PETRAS academic and user partners to share and reuse datasets collected from empirical studies of IoT stakeholders, as well as high-velocity streaming sensor data, for research and implementation analysis. The platform supports metadata standards proposed by the UK-based Hypercat consortium. One of the projects within PETRAS, which is looking at cyber risk assessment in coupled systems, builds directly on the experiences of SOCIAM and uses social machines as an analytic lens.
The Digital Humanities Oxford Summer School, which is the largest event of its type outside North America, introduced a new week-long workshop strand in 2016 under the banner of "social humanities". This theme, which is running again in 2017, draws on SOCIAM and directly engages SOCIAM researchers. The intersection of SOCIAM with the humanities has proven mutually beneficial, with humanities methods being adopted in our analysis of social machines.
2018:
10. Sociograms [Themes 1, 4 , 5]
The increasing autonomy and sophistication of both computational systems and human computation methods has meant that distinctions between human and machine-based computation have started to blur. Advances on both sides have also meant that we have started to see the rise of systems comprising both human and machine components, taking complementary roles side by side. Looking at these human-machine hybrids, which we call social machines, from a systems perspective, one might ask: how does one effectively design and implement such systems in a modular, decomposable fashion to support the rapid development of heterogeneous human-machine systems that support computational interchangeability? How can we democratise the design, implementation and analysis of social machines so that users can have control over their social interactions and develop transparency and trust?

We have identified three main challenges on the way to a framework that can accomplish this, covering the spectrum from intuitive, diagrammatic models to deployed social machines at scale:
1. Creating intuitive models that accurately and clearly represent complex interactions. We draw inspiration from software modelling languages such as UML and business process modelling languages such as BPMN. The challenge is to develop intuitive primitives that can be used as building blocks for meaningful models. Composing those blocks together is also an open challenge. Finally, bridging the gap between the abstract primitives and a concrete, fully functional implementation is non-trivial.
2. Creating infrastructure on the web. We aim to reuse existing, widely adopted platforms and integrate them within the designed social machines through accessible system configuration inspired by Model Driven Development. Capturing user preferences for each participant individually, in a way that is intuitive yet manageable to integrate, is a complex challenge.
3. Analysing and debugging the social computer. This involves using simulation and verification techniques to validate the correctness of a model, analysing sociological parameters such as community adoption and incentives, and runtime monitoring.

For this purpose, we employ protocols of social coordination written in the Lightweight Social Calculus (LSC), which offers a declarative, modular way to describe executable social norms for multi-agent coordination and computation. These protocols provide building blocks for social machines, so that they can be composed, combined, and instantiated to form complex systems. In addition, our design approach focuses on the roles involved in the social machine and a graphical representation of the interactions between them.
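The building-block idea can be illustrated with a small sketch. This is not LSC syntax or the SOCIAM implementation; the `Protocol` class and its clause format are invented for illustration only. It shows the core composition property: protocols bind roles to interaction clauses, and two protocols combine by merging their role tables, so a role present in both gains the clauses of each.

```python
from dataclasses import dataclass

# Illustrative sketch only: a "protocol" maps each role to the clauses
# (message, recipient role) that the role may enact. Composition merges
# role tables, letting protocols stack like building blocks.

@dataclass
class Protocol:
    name: str
    roles: dict  # role name -> list of (message, recipient_role) clauses

    def compose(self, other: "Protocol") -> "Protocol":
        # Copy our clauses, then extend each role with the other
        # protocol's clauses; roles appearing in only one side survive.
        merged = {role: list(clauses) for role, clauses in self.roles.items()}
        for role, clauses in other.roles.items():
            merged.setdefault(role, []).extend(clauses)
        return Protocol(f"{self.name}+{other.name}", merged)

# Two tiny building blocks: requesting a task, and thanking the worker.
ask = Protocol("ask", {
    "requester": [("request_task", "worker")],
    "worker": [("accept_task", "requester")],
})
thank = Protocol("thank", {
    "requester": [("send_thanks", "worker")],
})

combined = ask.compose(thank)
for clause in combined.roles["requester"]:
    print(clause)  # the requester role now carries clauses from both blocks
```

In the real framework the clauses are executable LSC terms rather than inert tuples, but the same merge-by-role shape is what lets expert-written protocols be reused as visual building blocks.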

The system design is augmented with additional annotations involving, for example, social aspects such as incentives and motivations, management of physical objects, and knowledge sharing.

The composed LSC protocols are executable in a distributed way, reflecting the autonomy of each participating agent. This can be used to analyse and evaluate the social machines using both quantifiable properties (costs, delays) and social parameters (e.g. simulating human incentives).

We have also worked on rigorous algorithms for process composition that are both diagrammatic and intuitive, with mathematical guarantees of correctness.

In conclusion, we are developing a framework for the rapid assembly of social machines. This involves a combination of a diagrammatic representation and declarative LSC protocols to describe the participating roles and their interactions. Our approach relies on a flexible implementation that can integrate heterogeneous agents, and incorporates key social aspects that drive social machines. Overall, we envision a framework where experts provide useful, general-purpose protocols, which end-users can then use as visual building blocks to implement and analyse their own social machines.

11. SOCIAM Students [Themes 1, 2, 3, 4, 5]
There are currently six doctoral students associated with the SOCIAM Project. Their research topics and progress to date are outlined below.
11.1 Ulrik Lyngs (Oxford)
Topic: Social Machines and Attention Management
ICT users navigate a complex landscape of social machines. However, many users say that they get distracted by the large amounts of functionality perpetually available to them. Moreover, many social machines, such as Facebook and YouTube, explicitly compete to find the most efficient ways to capture users' attention. As a result, there is a rapidly growing market for 'anti-distraction tools' that people use to, for example, block out distracting functionality, tweak interfaces to reward or punish certain behaviours, or obtain statistics on how they use their devices. These tools include apps and browser plugins such as Newsfeed Eradicator, RescueTime, and Forest, which in combination have millions of users. However, we do not know how these tools influence actual user behaviour, which is the focus of my research.
Progress: I have conducted initial experimental studies on how performance in simple cognitive tasks is affected by the presence of a smartphone. I am planning further experimental studies that will compare how task switching during ICT use is influenced by the use of popular and potential anti-distraction tools. This will involve longitudinal tracking of actual usage of smartphones and laptops, as well as eye-tracking studies in a lab setting.
Outputs to date: My paper 'It's More Fun With My Phone: A replication study of cell phone presence and task performance' has been accepted for the Student Research Competition at the CHI 2017 conference in May, and will afterwards be archived in the ACM Digital Library (http://dx.doi.org/10.1145/3027063.3048418).
I have a related position paper, 'Curiosity, ICTs, and Attention Management' accepted for the Curiosity in Design workshop at CHI 2017, which will be presented as a poster.
2018 Progress: I have integrated a wide range of self-regulation literature from the cognitive neurosciences with HCI work on technology non-use, and used this to create a cognitive design space for self-regulation-supporting design. I have reviewed 300 existing anti-distraction tools and classified and analysed them using the cognitive framework. In the coming months I am conducting an experimental study of widespread anti-distraction browser plugins for Facebook, and planning more qualitative research in collaboration with Petr Slovak and Anna Cox (UCLIC) and the Head of Oxford University's student counselling services.
Outputs:
Extended abstract summarising the work mentioned above is accepted for CHI'18's Student Research Competition: "A Cognitive Design Space for Supporting Self-Regulation of ICT Use"
With Reuben Binns, I first-authored a paper for alt.chi at CHI'18 on the meaning of optimising for users' "true preferences": So, Tell Me What Users Want, What They Really, Really Want!

11.2 Frode Hegland (Southampton)
Topic: Augmenting the social machine of academic document interaction and lifecycle
Progress: In the process of literature review and writing the 9-month report. The project is outlined at frodehegland.com/phd.html and blogged at wordpress.liquid.info. Organised the 7th annual Future of Text Symposium, an investigation into this space, hosted at the University of Southampton in September 2017 (thefutureoftext.org).
Outputs to date: Produced software prototypes for testing to generate further insights into real needs and effectiveness.

11.3 Gatis Mikelsons (Oxford)
Topic: Science as social machine, challenges in the technological age
Progress: Identified three potential areas of research, surveyed key literature, made contacts in the research community, and developed competence in cutting-edge implementations of machine learning and NLP
Outputs to date: My first year is primarily taught - we aim to develop a smaller research project surveying public medical data as an applied study in machine learning. This has relevance for the public-health-related component of SOCIAM.
2018 Progress: We are seeking ways to grow my competence with hands-on machine learning research before settling on the final theme for the doctoral dissertation. A project nearing completion involves the use of machine-learning techniques for predicting human mood. This research uses data gathered by mobile phones, so it has direct social-machine relevance (health, privacy, novel interaction channels). It was started as part of an internship at the Alan Turing Institute. We are also looking to start a project in health data analysis, possibly involving deep learning techniques.
Outputs:
Gatis Mikelsons, Matthew Smith, Abhinav Mehrotra and Mirco Musolesi, Towards Deep Learning Models for Psychological State Prediction using Smartphone Data: Challenges and Opportunities, in Proceedings of the NIPS Workshop on Machine Learning for Health (2017). Preprint: https://arxiv.org/abs/1711.06350

11.4 Michal Hoffman (Southampton)
Topic: Blockchain for Democracy: improving social transparency and openness with distributed ledger technologies, using blockchains to foster spending accountability and verifiable insights into wealth distribution. The main focus is on non-financial blockchain applications in democracy and governance contexts, with a keen interest in analysing policy discourses related to distributed technologies
Progress and outputs: Started March 2017, working on classifying blockchains and distributed ledger technologies, planning to collaborate on the CreaBlock initiative for asset tracking and money flow transparency in the creative industry. You can also follow my progress on http://deblocracy.eu .

11.5 Henry Story (Southampton)
Topic: Co-operating systems for decentralised, secure, privacy-preserving social networks (project page: http://co-operating.systems/ ), built on W3C standards, W3C community work, and research done at MIT's Distributed Information Group. The aim is to build a stack of protocols and libraries that make it easy to build Linked Data apps that integrate distributed access control.
Progress: I have been building on work I contributed to over the past 10 years at the W3C, including the WebID authentication and authorisation protocols detailed at https://www.w3.org/2005/Incubator/webid/spec/ and the W3C Linked Data Platform recommendation (https://www.w3.org/TR/ldp/, 2015). This includes open-source server and client implementations (rww-play and react-foaf, available at https://github.com/read-write-web/) of the SoLiD (Social Linked Data) specification (https://github.com/solid/), an MIT project to which I have contributed regularly and which explains how these specifications can be put together.
Outputs: Currently researching mathematical/logical tools to prove the strength of the linked-data-based security mechanisms used in SoLiD, in order to extend them and make them more reliable.
2018 Progress: The first year report contained an overview of the problem, an introduction to work showing the algebraic nature of RDF using category theory, an initial description of the web (and thus of linked data) as a coalgebraic system, and initial pointers to articles that could be useful for proving security aspects of protocols.

11.6 Philip Sheldrake (Southampton)
Topic: Networked Agency. Seeking to define personal agency in the digital age, and developing normative architecture to maintain or enhance it. Framed in terms of agencement, trust, privacy, and decentralised and distributed technological architectures.
URL: http://www.philipsheldrake.com/research/
Progress and outputs: Literature review spanning sociology, privacy, and human-computer interaction. Identification and collation of software projects aiming to improve a conception of personal agency, and development of an ecosystem map to facilitate cluster analysis.
Exploitation Route This is covered under "What have you discovered or developed through the research funded on this grant" section.
Sectors Aerospace, Defence and Marine,Communities and Social Services/Policy,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Financial Services, and Management Consultancy,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy,Transport

URL http://www.sociam.org
 
Description The impacts below relate to the research headings described under the "Key Findings" Section. Citizen Science and Crowdsourcing Citizen Science: SOCIAM has had a significant impact on the Zooniverse citizen science platform. Zooniverse is the world's largest and most popular platform for citizen science, currently hosting over 50 live projects across multiple fields of research. It has tens of thousands of active volunteers worldwide, with over 1.5 million registered volunteers in total taking part throughout the last decade. SOCIAM studies and experiments have helped the Zooniverse better understand its community, looking at their motivations, structure, and behaviour. This in turn has enabled the Zooniverse to create a better platform that suits the specific needs and behaviours of their crowd. 2018: In the last year SOCIAM has continued to have a significant impact on its main citizen science project partner, The Zooniverse. One key moment was the development and running of a bespoke dashboard for the live television events BBC Stargazing Live (UK) and ABC Stargazing Live (Australia). The SOCIAM dashboard allowed Zooniverse team members and scientists to observe their community of volunteers in real time during a massive event like never before, enabling them to interact with the social machine in a new way. Importantly it also helped the team quickly identify interesting findings being produced by the project (in this case, planets around distant stars!). Finally, the data collected during these live events is now being studied by SOCIAM to help unveil even more insights into how the Zooniverse social machine operates. Crowdsourcing: As well as classical citizen science we have explored the effects of motivation, incentives, and task design in contexts that use other forms of crowdsourcing, in particular paid microtask platforms. 
We looked at ways to make paid crowdsourcing more effective and rewarding for the people involved by testing how gamification enhances crowd experience. We have also studied the effects that different incentives related to game mechanics (additional rewards, feedback on task performance, feedback on performance compared to others) have on retention - keeping individuals engaged with the tasks. Data Citation: Our paper on data citation was made the lead article in CACM (September 2016) and the proposals have been partly implemented in two curated databases. Health Social Machines We have worked closely with clinical teams in the NHS and applied our methodology to help them record and rethink their patient pathways and social interactions. Our most notable collaborations are the following: - Modelling HIV care pathways in collaboration with the sexual health care team at NHS Lothian. - Workflow modelling for the care of burns in Scotland in collaboration with the Care Of Burns In Scotland (COBIS) network. - Designing a social machine for the Heart Manual Service, which is led by NHS Lothian and delivered by several clinical teams worldwide. - Carrying out a practical experiment between data safe haven jurisdictions in Scotland and Trentino. 2018: Our work with the eDRIS team has received positive feedback. Our report has helped inform the entire team on the effectiveness of their current operating procedures, giving a detailed overview of the workflow and an analysis of its gaps and bottlenecks. The eDRIS team has acknowledged the usefulness of our report and proposed solutions and aims to use it as a key driver for change and improvement going forward. This has the potential to vastly improve the ways in which NHS data is shared with researchers by minimising the time and effort required without compromising the privacy and public benefit safeguards currently in place. 
Machine Intelligence and the Mind of Society Some informal collaboration has occurred with IBM UK to explore the implications of the work on digital immortality, data cryogenics, neuromorphic computing and sentient machines. Provenance and Annotation The initial driver for this work was the Human Brain Project, with which the findings were shared. https://www.humanbrainproject.eu/ 2018: While data citation is hardly "non-academic", it is an application that is quite general and extends well beyond the scope of SOCIAM research. The results are already in use in some curated databases. Supporting Privacy and End-User Data Management Social Machines On the basis of the X-Ray work, we are launching an Ethical Data Pledge to encourage developers to create apps which respect privacy and data ethics. We are also aiming to share the results of our large-scale measurement with data protection and competition regulators to help inform their activities with respect to third-party tracking. Our work on curation of online identity through deception was featured on a BBC Radio 4 programme, "The Online Identity Crisis", which first aired on Sunday 11 September 2016, and later twice on 9 December 2016. Two SOCIAM senior research fellows, Max Van Kleek and Dave Murray-Rust, were interviewed in the programme as online identity privacy experts in the context of the WebSci paper on online identity discussed above. The key findings from our analysis have been published as a full article in the December 2016 issue of "Inspired Research", a magazine edited and published twice a year by the Department of Computer Science, University of Oxford, to showcase world-leading research outcomes from the department for the general public. Our work has also drawn interest within Oxford University, and we have been invited to be part of the first ever Oxford University European Researchers' Night, taking place on 25 September 2017. 
European Researchers' Night is a Europe-wide celebration of academic research for the public, supported by the European Commission. This will be a city-wide, public-facing celebration of research happening at the University. The event seeks to engage a diverse audience through activities such as live experiments, debates and bite-sized talks. A six-month digital and media engagement campaign will lead up to the event, and the university aims to reach a public audience of over 100,000 before the event and attract over 10,000 visitors on the night itself. Work on privacy theory also informed the Global Commission on Internet Governance, chaired by Carl Bildt and commissioned by the Centre for International Governance Innovation (CIGI) and Chatham House in response to trends toward fragmentation of the Internet, with the aim of offering guidance on how to address new challenges as they emerge. SOCIAM researchers produced a chapter on the implementation of the controversial Google Spain/right to be forgotten decision of the Court of Justice of the European Union. Papers in the Data Protection Law review and IEEE Internet Computing extended the arguments. The INDX data platform was demonstrated at various venues, including the ReDecentralise Conference in 2015 in London, and to the W3C at MIT CSAIL. The INDX repository has been cloned over 200 times, forked 7 times, is being watched by 25 individuals, and has now been registered in the Alternative Internets project. 2018: The privacy theories developed during SOCIAM were used to inform a Royal Society/British Academy consultation about a Data Stewardship Governance Council. A paper was commissioned from O'Hara in 2017 to present at a seminar on the topic, to explore the details of a key recommendation of the Royal Society/British Academy report into Data Governance. The implications of our work on smartphone app privacy for children have led to several outreach efforts. 
We hosted an interactive stall at the Oxford Super Science Festival, where 400 families attended and were able to interact with our personalised smartphone app privacy interface and learn about privacy risks. In response, we established links with several local organisations, including Oxford Coding Club (at Wolvercote Primary School), Oxfordshire Safeguarding Children Board, and Safer Internet Day (Watlington Primary), who have requested to use our materials in their education and safeguarding efforts. We were invited by Parent Zone to advise parents on how to weigh privacy risks when choosing smart toys or smart home devices (https://parentzone.org.uk/article/everything-you-need-know-about-digital-assistants). We also contributed to policy-making processes relating to privacy and data protection in a variety of venues. We provided advice to the UK Parliamentary Digital Service on the technical capacity and risks associated with third-party analytics (derived from our X-Ray project), which has helped support their efforts to move to an in-house analytics platform to better protect citizens' privacy while browsing UK Parliament documents online. We provided consultation feedback to ICANN (the Internet Corporation for Assigned Names and Numbers) on upcoming changes to domain name system registration information, in order to protect the ability of researchers to access key information for privacy and security research (https://www.icann.org/resources/pages/gdpr-legal-analysis-2017-11-17-en). We were invited to provide evidence and guidance based on our privacy research to a project funded by the German Federal Ministry of Education and Research on user profiling in smart homes (ABIDA). 
Our work on algorithmic fairness and accountability led to invitations to participate in the revision of the Government Digital Service and Cabinet Office's Data Science Ethical Framework, and to an invited talk on explaining automated decisions hosted by the Centre for Research into Information, Surveillance and Privacy (CRISP) and the Scottish Information Commissioner's Office in March 2018. We provided an invited briefing presentation to the British Standards Institute/NESTA on the need for standards for ethical artificial intelligence in October 2017. Finally, this work also inspired an interactive educational session on explaining algorithmic decisions at the Mozilla Festival in November 2017, entitled 'Why Does Computer Say No?'. Social Machine and Web Observatory The Web Observatory has become the core platform for many of the SOCIAM observational studies performed across several themes. The underlying data streaming architecture has powered the Web Macroscope, which has had direct impact on the non-academic community. We have run several public engagement activities on interpreting and analysing big data and social machine activity. The development of the Web Observatory schema.org extension has contributed to the development of services outside the SOCIAM project, including organisations such as the Library of Congress, where services have been built to enrich and improve the discoverability of their datasets and archives. We have also been working with other institutions, such as Stanford, to enrich institutional data. 2018: Ethics: Ethics and group privacy is the focus of a workshop to be held in 2018 in Amsterdam at the ACM Web Science Conference, organised largely by SOCIAM researchers.
First Year Of Impact 2015
Sector Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Government, Democracy and Justice
Impact Types Societal

 
Description Participation in revision of Government Digital Service and Cabinet Office's Data Science Ethical Framework
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Description Royal Society/British Academy consultation about a Data Stewardship Governance Council
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
 
Title Data Safe Havens 
Description In this project, we deal with the intricate issue of secure methods and infrastructures for the sharing of healthcare data. Current medical records hold great opportunity for the development of precision and stratified medicine if made available for use in data-intensive experiments, but this potential is a long way from being realised. A key architectural problem remaining to be solved is how to maintain control of patient data within the governance of local data jurisdictions, while also making the data available for experiments which, because of the need to scale to large population sizes, may require analyses across several jurisdictions. 
Type Of Technology Software 
Year Produced 2018 
Impact Data Safe Havens have been trialled for actual use in healthcare research and have resulted in four publications: Robertson D., Giunchiglia F., Pavis S., Turra E., Bella G., Elliot E., et al. (2016). Healthcare Data Safe Havens: Towards a Logical Architecture and Experiment Automation. The Journal of Engineering. Tursunbayeva A., Bunduchi R., Franco M., & Pagliari C. (2016). Human resource information systems in health care: a systematic evidence review. Journal of the American Medical Informatics Association, ocw141. Cucciniello M., Lapsley I., Nasi G., & Pagliari C. (2015). Understanding key factors affecting electronic medical record implementation: a sociotechnical approach. BMC Health Services Research, 15. Pinciroli F., & Pagliari C. (2015). Understanding the evolving role of the Personal Health Record. Computers in Biology and Medicine, 59, 160-163. 
URL https://sociam.org/data-safe-havens
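The architectural idea in the description above - analyses span jurisdictions while raw patient data stays under local governance - can be sketched in a few lines. This is an illustrative toy, not the Data Safe Havens software; the `SafeHaven` class and its methods are invented for the sketch, and a real deployment would add governance approval, auditing, and disclosure controls.

```python
# Illustrative sketch only: each "safe haven" keeps records inside its
# own jurisdiction and releases only aggregate results, so a
# cross-jurisdiction study never moves raw data across a boundary.

class SafeHaven:
    def __init__(self, jurisdiction, records):
        self.jurisdiction = jurisdiction
        self._records = records  # raw data never leaves this object

    def run_query(self, predicate):
        # The analysis executes locally; only a count is exported.
        return sum(1 for record in self._records if predicate(record))

scotland = SafeHaven("Scotland", [{"age": 70}, {"age": 34}, {"age": 81}])
trentino = SafeHaven("Trentino", [{"age": 66}, {"age": 59}])

# A federated study combines per-jurisdiction aggregates.
over_65 = lambda record: record["age"] > 65
total = scotland.run_query(over_65) + trentino.run_query(over_65)
print(total)  # 3
```

The point of the sketch is the direction of movement: the query travels to the data, and only the aggregate travels back, which is what allows population-scale studies without ceding jurisdictional control.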
 
Title INDX 
Description INDX is the world's first Social Personal Data Store (SPDS) that aims to give people control over their data by enabling them to build decentralised social applications in which their personal data remains on their own devices. Unlike traditional personal data stores, in which applications store data in one place, INDX supports applications that work across multiple people's INDX data stores, to support distributed data governance. At its core INDX relies on a fully versioned graph data store that can store RDF, JSON, and relational data models. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact Many publications have resulted from the INDX work, including: Van Kleek M., Smith D.A., Tinati R., O'Hara K., & Hall W. (2014). 7 billion home telescopes: observing social machines through personal data stores. Proceedings of the 23rd International Conference on World Wide Web (WWW 2014). Van Kleek M., & O'Hara K. (2014). The Future of Social Is Personal: The Potential of the Personal Data Store. In: Miorandi D., Maltese V., Rovatsos M., Nijholt A., & Stewart J. (eds) Social Collective Intelligence. Computational Social Sciences. Springer, Cham. Van Kleek M., et al. (2015). Social personal data stores: the nuclei of decentralised social machines. Proceedings of the 24th International Conference on World Wide Web (WWW 2015). ACM. Van Kleek M. (2015). Not in my castle: the case for the web, not app platforms, as a model for digital home ecosystems. 
URL https://github.com/sociam/indx
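The "fully versioned graph data store" at INDX's core can be illustrated with a minimal sketch. The class and method names below are invented for illustration and are not the INDX API; the sketch only shows the versioning property the description mentions: every update yields a new immutable graph version, so any earlier state can be read back.

```python
# Illustrative sketch only: a tiny versioned triple store. Each add()
# creates a new immutable version; old versions remain readable.

class VersionedStore:
    def __init__(self):
        self._versions = [frozenset()]  # version 0: the empty graph

    @property
    def head(self):
        return len(self._versions) - 1

    def add(self, subj, pred, obj):
        # Copy-on-write: the new version is the old set plus one triple.
        new_graph = self._versions[-1] | {(subj, pred, obj)}
        self._versions.append(frozenset(new_graph))
        return self.head

    def triples(self, version=None):
        return self._versions[self.head if version is None else version]

store = VersionedStore()
v1 = store.add("alice", "knows", "bob")
v2 = store.add("alice", "worksAt", "oxford")
print(len(store.triples(v1)), len(store.triples(v2)))  # 1 2
```

Immutable versions are what make the decentralised use case tractable: two devices holding different versions of the same store can reconcile by comparing version histories rather than overwriting each other's data.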
 
Title LSC - Lightweight Social Calculus 
Description LSC, or the Lightweight Social Calculus, is a modified version of the Lightweight Coordination Calculus (LCC), an executable, declarative process calculus for interactions between heterogeneous agents. Use of LSC is based on the enaction of protocols, which give a minimally intrusive framework for defining patterns of communication without overly constraining the internal knowledge and decision-making architecture of the actors involved. Since protocols can be mechanically enacted, they provide the potential for mixed-initiative human-computer interaction and human computation applications. Creating a protocol which represents human interaction allows computational agents to join in on an equal footing with the humans. Making protocols first-class objects allows for their exchange and manipulation. It means that communities can discover interactions which suit their needs and adopt them, after making any modifications necessary. Protocols can be transparent to users, indicating what the bounds and rules of the interactions are, leading to a greater facility for understanding the implications of engaging. This does, of course, depend on being able to represent the protocols in a manner which makes sense to users. There is a clear separation of concerns between the structure of the interaction and the media and communities where it is enacted. The interaction can then be framed in a manner which is most appropriate to the community in question, and integrated into their existing practice by connecting the interaction model with the technical platforms already in use - essentially, the interaction becomes key, rather than the substrate on which it is performed. A protocol is amenable to formal techniques. For example, properties can be verified, such as the flow of data through the interaction, termination criteria, and other qualities which relate to privacy and security. 
In order for this to work, humans need some way to engage with protocols as they are enacted. One mechanism for doing this is to create an interface with which people can engage, whether through webpages or mobile devices, or mediated through APIs of some sort. This is the means of engagement with which we're familiar from earlier LCC work. Another possibility is to find a way to run the protocols alongside existing interaction, annotating their behaviour with formal structures. We call this approach 'Soft Institutions', where the formal edges of electronic institutions are softened to provide natural, human ways for people to engage with them. LSC is a declarative, executable specification designed to give enough structure to manage fully distributed interactions by coordinating message passing and the roles which actors play, while leaving space for the actors to make their own decisions. It is derived from LCC with extensions designed to make it more amenable to mixed human-machine interactions; in practice, this means having language elements which cover user input, external computation or database lookup, and storing knowledge and state. 
Type Of Technology Software 
Year Produced 2016 
Impact Several SOCIAM projects have been built on LSC, including Sociograms, Integrated Care Pathways, and Data Safe Havens. 
URL http://homepages.inf.ed.ac.uk/ppapapan/?p=81
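The mixed human-machine aspect of the description above - language elements for user input sitting alongside elements for external computation - can be sketched as follows. This is not LSC syntax; the clause format and function names are invented for illustration. The sketch shows the one property that matters: the enactment loop does not care whether a clause is discharged by a person or by code.

```python
# Illustrative sketch only: a clause is (name, kind, fn), where kind is
# "human" (discharged by user input) or "machine" (discharged by
# computation over the accumulated interaction state).

def human_input(prompt, scripted_answers):
    # Stand-in for a real UI (webpage, mobile app, chat); here the
    # answers are scripted so the demo runs unattended.
    return scripted_answers.pop(0)

def enact(clauses, scripted_answers):
    state = {}
    for name, kind, fn in clauses:
        if kind == "human":
            state[name] = human_input(name, scripted_answers)
        else:  # "machine": compute from the state built up so far
            state[name] = fn(state)
    return state

protocol = [
    ("offer_help", "human", None),  # a person decides whether to help
    ("match_task", "machine",
     lambda s: "deliver_meds" if s["offer_help"] == "yes" else None),
]

result = enact(protocol, scripted_answers=["yes"])
print(result["match_task"])  # deliver_meds
```

Because both kinds of clause feed the same state, a human participant could later be swapped for an agent (or vice versa) without rewriting the protocol - the computational interchangeability the LSC work aims at.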
 
Title ProvToolbox 
Description Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. In particular, the provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, where users find information that is often contradictory or questionable, provenance can help those users to make trust judgements. PROV is a set of W3C specifications defining a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. ProvToolbox is a Java library to create Java representations of the PROV data model (PROV-DM), and convert them between RDF, XML (in PROV-XML format), text (in PROV-N format), and JSON (in PROV-JSON format). 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact ProvToolbox is the basis of community services for provenance translation and validation at https://provenance.ecs.soton.ac.uk. ProvToolbox was used in the interoperability phase of the W3C Provenance Working Group (https://www.w3.org/TR/prov-implementations/). 2016 contribution: a templating system. 
URL http://lucmoreau.github.io/ProvToolbox/
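The PROV structure the description outlines - entities, activities and agents connected by relations such as wasGeneratedBy and wasAttributedTo - can be sketched directly. ProvToolbox itself is a Java library; the Python class below is an invented stand-in that models the PROV-DM concepts only, not any ProvToolbox API.

```python
# Illustrative sketch only: a minimal in-memory PROV-style document.
# Records are stored as tuples; a real implementation would serialise
# them to PROV-N, PROV-XML, PROV-JSON or RDF as ProvToolbox does.

class ProvDocument:
    def __init__(self):
        self.records = []

    def entity(self, ident):
        self.records.append(("entity", ident))
        return ident

    def activity(self, ident):
        self.records.append(("activity", ident))
        return ident

    def agent(self, ident):
        self.records.append(("agent", ident))
        return ident

    def was_generated_by(self, entity, activity):
        # The entity came into existence through this activity.
        self.records.append(("wasGeneratedBy", entity, activity))

    def was_attributed_to(self, entity, agent):
        # Credit: the agent bears responsibility for the entity.
        self.records.append(("wasAttributedTo", entity, agent))

doc = ProvDocument()
chart = doc.entity("ex:chart1")
plotting = doc.activity("ex:plotting")
author = doc.agent("ex:author")
doc.was_generated_by(chart, plotting)
doc.was_attributed_to(chart, author)
print(len(doc.records))  # 5
```

Even this toy form shows why provenance supports trust judgements: the wasAttributedTo record is exactly the information a reader needs to decide whether to trust ex:chart1 and whom to credit when reusing it.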
 
Title SOCIAM Web Macroscope 
Description The Web Macroscope is an infrastructure to enable real-time views of social machines. Macroscope installations in Southampton, Oxford, and the Digital Catapult in London demonstrated the use of high-resolution visualisations spanning large display arrays that enable web scientists and social machine researchers to see, in real time, the "pulse" of the Web and other social machines. The Macroscope has specific lenses for Wikipedia, the Zooniverse, and Twitter, which allow contributions and changes on these social machines to be seen as they occur. Using macroscopes, web scientists can identify hotspots (spikes) of activity as they occur across multiple social machines, identify conflict on social machines (such as Wikipedia edit wars or spambot attacks on Twitter), and observe short-term temporal trends from their germination. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact The Macroscope was installed at the London Science Museum, and showcased at multiple locations including the Web We Want Festival and Digital Catapult. 
URL https://sociam.org/web-macroscope
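One way activity hotspots of the kind the Macroscope surfaces can be flagged is a simple rolling-statistics test over per-interval event counts. The sketch below is illustrative only (the Macroscope's actual detection logic is not described in this report); the window size and threshold are arbitrary assumptions:

```python
from collections import deque
from statistics import mean, stdev

def detect_spikes(counts, window=5, threshold=3.0):
    """Flag intervals whose event count exceeds the rolling mean of the
    preceding `window` intervals by more than `threshold` standard
    deviations. Window and threshold are illustrative choices."""
    history = deque(maxlen=window)
    spikes = []
    for i, c in enumerate(counts):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and (c - mu) / sigma > threshold:
                spikes.append(i)
        history.append(c)
    return spikes

# Per-minute edit counts for a hypothetical stream: a burst at index 7.
edits = [4, 5, 4, 6, 5, 5, 4, 40, 5, 4]
print(detect_spikes(edits))  # the burst at index 7 is flagged
```

Running the same detector over several streams at once (Wikipedia edits, Zooniverse classifications, tweets) is enough to surface co-occurring spikes across social machines.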
 
Title Sociograms: Social Machines for All 
Description This project enabled the rapid assembly of social machines using a visual programming interface, with social interactions formally modelled in LSC (Lightweight Social Calculus). 
Type Of Technology Software 
Year Produced 2018 
Impact Two publications: Dave Murray-Rust, Alan Davoust, Petros Papapanagiotou, Areti Manataki, Max Van Kleek, Nigel Shadbolt and Dave Robertson. Towards Executable Representations of Social Machines. 10th International Conference on the Theory and Application of Diagrams, 2018. Petros Papapanagiotou, Alan Davoust, Dave Murray-Rust, Areti Manataki, Max Van Kleek, Nigel Shadbolt and Dave Robertson. Social Machines for All. 17th Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2018. 
URL https://sociam.org/sociagrams
 
Title The Digital Heart Manual 
Description An important case study on how to design social machines for healthcare contexts, this project investigates the Heart Manual Programme, a home-based cardiac rehabilitation programme. Through various participatory design methods, our work on this project elicited the requirements for a social machine in this space and we carried out a proof of concept evaluation with a prototype. Our prototype system was largely positively received and rated highly on system usability. Our work suggests that adopting a participatory approach where stakeholders are active, equal participants throughout the design process leads to more usable, likeable, and thus more successful social machines. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Impact Two publications have resulted, including: Hanschke V., Manataki A., Alexandru C. Adriana, Papapanagiotou P., Deighan C., Taylor L., et al. (2017). Designing a Social Machine for the Heart Manual Service. Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017), 435-440. Deighan C., Michalova L., Pagliari C., Elliott J., Taylor L., & Rinaldi H. (2017). The Digital Heart Manual: A pilot study of an innovative cardiac rehabilitation programme developed for and with users. Patient Education and Counseling. 100, 
URL https://sociam.org/the-digital-heart-manual
 
Title X-Ray Core 
Description A system for acquiring and analysing smartphone apps at scale from major app marketplaces to determine their first- and third-party data controllers. This infrastructure provides services for a variety of apps and interfaces, both to visualise the data networks behind smartphone apps and to facilitate the control of individual data privacy preferences. 
Type Of Technology Software 
Year Produced 2017 
Impact Resulted in the first very large-scale analysis of actual personal information exposure, covering almost 1 million applications. 
URL https://sociam.org/mobile-app-x-ray
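The core classification task the description mentions — separating first-party from third-party data controllers — can be sketched by comparing the registered domain of each host an app contacts against the developer's own domain. This is a deliberately crude illustration, not X-Ray Core's actual pipeline; the domains and the eTLD+1 approximation are assumptions:

```python
def registered_domain(host):
    """Crude eTLD+1 approximation: the last two labels of a hostname.
    (A real analysis needs the Public Suffix List to handle e.g. .co.uk.)"""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])

def classify_hosts(developer_domain, contacted_hosts):
    """Split the hosts an app contacts into first-party (same registered
    domain as the developer) and third-party data controllers."""
    first = registered_domain(developer_domain)
    result = {"first_party": [], "third_party": []}
    for host in contacted_hosts:
        key = "first_party" if registered_domain(host) == first else "third_party"
        result[key].append(host)
    return result

# Hypothetical traffic for an app published by example.com.
hosts = ["api.example.com", "cdn.example.com", "ads.tracker.net"]
print(classify_hosts("example.com", hosts))
```

At the scale reported (almost 1 million apps), the interesting structure emerges from aggregating these per-app classifications into a network of apps and the third-party controllers they share.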
 
Title Zooniverse (Panoptes) 
Description The Zooniverse is the world's largest and most successful citizen science platform to date, having over 1 million registered volunteers and contributions to hundreds of academic publications spanning the sciences and humanities. It initially pioneered fundamental methods and best practices in citizen science; as it has developed and matured, it has expanded its ambitions towards enabling citizens to create, launch, and manage projects themselves. Supported by SOCIAM and contributions from other projects, Zooniverse launched Panoptes, a tool for enabling non-specialists to create and launch citizen science projects themselves. Panoptes is now actively used, generating hundreds of candidate projects for Zooniverse each month, and continues to evolve to enable greater citizen autonomy. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact Panoptes continues to generate hundreds of new projects each month, which are then used by Zooniverse's active citizen volunteer army. 
URL http://zooniverse.org/
 
Description Age of Social Machines: Town Hall Meeting at Microsoft 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact The Town Hall Meeting at Microsoft's HQ in central London was an all-day event filled with plenaries and demonstrations of results from the SOCIAM project. The event featured live demonstrations of the Social Machines (Web) Macroscope, the Web Observatory, and the INDX personal data platform. We also had a plenary from our industry partner, Microsoft, who discussed work on cloud computing as it relates to social machines.
Year(s) Of Engagement Activity 2015
 
Description Digital Humanities at Oxford Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The Digital Humanities Oxford Summer School (DHOxSS) offers training to anyone with an interest in the digital humanities, bringing together academics at all career stages, students, project managers, and people who work in IT, libraries, and cultural heritage for a one-week intensive set of workshops and mini-courses on specific digital humanities topics.

For two years running, the SOCIAM project has partnered with DHOxSS to deliver a special track called SOCHUMS (Social Humanities) in which SOCIAM researchers teach participants how to apply social machine theory and practice to the study and design of online social systems. Course participants gain exposure to, and hands-on experience with, methods for understanding social machines (such as prosopography) and for designing and building social machines (for example with Panoptes and Sociograms).
Year(s) Of Engagement Activity 2016,2017
URL http://www.dhoxss.net/
 
Description Enabling Provenance on the Web: Standardization and Research Questions (Keynote at International Conference on WWW/INTERNET 2015) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Enabling Provenance on the Web: Standardization and Research Questions

Provenance is a record that describes the people, institutions,
entities, and activities, involved in producing, influencing, or
delivering a piece of data or a thing in the world.

Some 10 years after beginning research on the topic of provenance, I
co-chaired the provenance working group at the World Wide Web
Consortium. The working group published the PROV standard for
provenance in 2013.

In this talk, I will present some use cases for provenance, the PROV
standard and some flagship examples of adoption. I will then move
onto our current research in exploiting provenance, in the
context of the SOCIAM, SmartSociety, and ORCHID projects. In doing so, I will
present techniques to deal with large scale provenance, to build
predictive models based on provenance, and to analyse provenance.
Year(s) Of Engagement Activity 2015
 
Description IPAW 2006-2016: Retrospect and Prospect of Provenance (Keynote at International Provenance and Annotation Workshop ipaw'16) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact IPAW 2006-2016: Retrospect and Prospect of Provenance

IPAW, the biannual International Provenance and Annotation Workshop
series, was launched in 2006. We celebrate its 10th anniversary in
2016. During those 10 years, the field of provenance has seen a
tremendous amount of development. Among the 30 events I identified, I
will highlight some successes, such as the Provenance Challenge and a
standardisation activity at the World Wide Web Consortium. What is
the next step for the provenance community? By reviewing existing
applications of provenance and tooling, and by discussing some
research activities, I will attempt to map future directions for the
provenance community.
Year(s) Of Engagement Activity 2016
URL http://www2.mitre.org/public/provenance2016/ipaw.html
 
Description London Science Museum 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Demonstration of the SOCIAM Macroscope, a multi-screen hardware solution which uses Web-based visualisations in order to provide context to several of the curated social data streams, such as Instagram, Twitter, Wikipedia, and the Zooniverse. The public engagement exercise involved a study of how individuals engage and understand the production of data on the Web.
Year(s) Of Engagement Activity 2016
 
Description MozFest - session on explaining algorithmic decisions 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participants in the workshop learned how to detect biases in, and explain the decisions of, the machine learning models behind many of the systems that affect our lives. In groups, they investigated models trained on specific datasets to uncover how they work and their potential biases. Participants came away with a greater understanding of how biases can arise in algorithmic systems, helping them to advocate for responsible use of data in their communities, companies and platforms.
Year(s) Of Engagement Activity 2017
URL http://www.cs.ox.ac.uk/news/1412-full.html
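One simple bias check of the kind such a workshop can teach is comparing per-group selection rates of a model's decisions. The sketch below is a generic illustration (not the materials used at the session; the data and the four-fifths threshold are assumptions):

```python
def selection_rates(decisions):
    """Per-group rate of positive outcomes, given (group, outcome) pairs."""
    totals, positives = {}, {}
    for group, outcome in decisions:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if outcome else 0)
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact(decisions, group_a, group_b):
    """Ratio of group_a's selection rate to group_b's; the common
    'four-fifths rule' flags values below 0.8 as potentially biased."""
    rates = selection_rates(decisions)
    return rates[group_a] / rates[group_b]

# Hypothetical loan decisions: (applicant group, approved?)
data = [("A", True), ("A", True), ("A", False), ("A", True),
        ("B", True), ("B", False), ("B", False), ("B", False)]
print(disparate_impact(data, "B", "A"))  # 0.25 / 0.75 ≈ 0.33
```

A ratio well below 0.8, as here, is the kind of signal workshop participants learned to look for before asking *why* the model treats the groups differently.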
 
Description Oxford Super Science Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact We hosted an interactive stall at the Oxford Super Science Festival at the Oxford Museum of Natural History, where 400 families attended and were able to interact with our personalised smartphone app privacy interface and learn about privacy risks. This included X-Ray Core, X-Ray Refine, and new work pertaining to privacy perceptions and risks associated with apps for children. The implications of our work on smartphone app privacy for children have led to several outreach efforts. We established links with several local organisations, including Oxford Coding Club (at Wolvercote Primary School), the Oxfordshire Safeguarding Children Board, and Safer Internet Day (Watlington Primary), who have requested to use our materials in their education and safeguarding efforts. We were also invited by Parent Zone to advise parents on how to take additional account of privacy risks when choosing smart toys or smart home devices.
Year(s) Of Engagement Activity 2017
URL http://www.ox.ac.uk/event/super-science-saturday-1
 
Description Presentation at Brussels Privacy Symposium 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact The 2nd Annual Brussels Privacy Symposium is a global convening of practical, applicable, substantive privacy research and scholarship. The Symposium draws on the expertise of leading EU and US academics, industry practitioners and policymakers to produce an annual workshop highlighting innovative research on emerging privacy issues.
Year(s) Of Engagement Activity 2017
URL https://fpf.org/brussels-privacy-symposium-november-6-2017-brussels-belgium/
 
Description Presentation at JP Morgan TechFest, Bournemouth, Enabling Provenance on the Web: Standardization and Research Questions 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Enabling Provenance on the Web: Standardization and Research Questions

Provenance is a record that describes the people, institutions,
entities, and activities, involved in producing, influencing, or
delivering a piece of data or a thing in the world.

Some 10 years after beginning research on the topic of provenance, I
co-chaired the provenance working group at the World Wide Web
Consortium. The working group published the PROV standard for
provenance in 2013.

In this talk, I will present some use cases for provenance, the PROV
standard and some flagship examples of adoption. I will then move
onto our current research in exploiting provenance, in the
context of the SOCIAM, SmartSociety, and ORCHID projects. In doing so, I will
present techniques to deal with large scale provenance, to build
predictive models based on provenance, and to analyse provenance.
Year(s) Of Engagement Activity 2015
 
Description X-Ray Refine at ICT Forum Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact The ICT Forum exists to serve all whose job function includes support of ICT throughout the Collegiate University of Oxford. The annual conference and termly meetings showcase activities that are supported or enabled by the ICT services provided by staff at the University. At this meeting, we showcased the X-Ray Core and Refine projects, which empower end-users to take control of their privacy by showing them the first- and third-party data collection activities contained within smartphone apps. Attendees were invited to ask questions and interact with the Refine prototype.
Year(s) Of Engagement Activity 2017
URL https://www.ictf.ox.ac.uk/conference/conferences