myGrid: An OMII-UK Node (mymes: myGrid middleware for e-Scientists) (Services and Middleware for e-Science)

Lead Research Organisation: University of Manchester
Department Name: Computer Science

Abstract

Progress in science is largely made through experiments, whereby the properties and behaviour of naturally occurring or manufactured artefacts are studied in a controlled environment. However, a single experiment normally only tells part of the overall story, and real improvements in understanding normally emerge by integrating and comparing the results of many experiments. As relevant experiments may be carried out at different places, be extremely numerous, and involve large amounts of data, it is likely that many potential discoveries are missed because of difficulties accessing and interpreting diverse experimental results. e-Science seeks to use computational tools to assist scientists in making sense of, rather than simply being intimidated by, the increasing amount of scientific data that is being produced in research labs throughout the world. e-Scientists need effective tools for accessing experimental data, analysing it, managing the results of these analyses, and linking them up with other scientists' results. e-Science is a research area in its own right, in which techniques are explored for making sense of scientific data. The software used to make computers, databases, tools and people cooperate, and hence run in-silico experiments, is called middleware. The Open Middleware Infrastructure Institute (OMII) is an organisation that supplies middleware for scientists. It also collects middleware from other scientists and re-engineers it to a high standard so that people who did not develop it can easily use it too. One of the long-established projects that have produced popular and useful middleware for Life Scientists (that is, biologists, chemists, medics and so on) is myGrid. myGrid allows scientists to bring together existing globally shared or locally kept e-Science data sets and analyses to learn new lessons using workflows. There are over 1000 tools and data sets available that were not designed to work together. The workflows make them work together.
These workflows can then be shared between scientists who want to do the same or adapt them, or want to understand how results were generated. myGrid stores the history of what happened in an in-silico experiment, why and how it was performed and by whom (its provenance). It also helps manage and link up results and find other people's tools. It thus allows new hypotheses to be tested over existing experimental results, and supports scientists in managing and making sense of the results of these in-silico experiments. The alternative is to do this manually.
myGrid has been very successful and has many users all over the world, in particular its workflow workbench Taverna. It turns out to be useful for many scientists other than Life Scientists. However, the software needs to be made more robust and easier to use, properly tested and documented, and extended in the light of early feedback from its users. By becoming part of the OMII-UK consortium, the myGrid middleware can be looked after and developed to support its current users and many other scientists so they can confidently use it and rely on it. By close collaboration with OMII-UK (OGSA-DAI and the OMII Hub at Southampton), we can draw on each other's experiences to strengthen our software engineering processes. myGrid can deliver scientist-oriented software that uses the low-level plumbing middleware of OMII. We will also develop distributed query technology to link our Taverna workflows with distributed queries from OGSA-DAI. In the longer term we will have a coordinated, integrated roadmap of well-engineered e-Infrastructure for UK researchers and industry.

Publications

Belhajjame K (2008) Automatic annotation of Web services based on workflow definitions in ACM Transactions on the Web

Belhajjame K (2006) The Semantic Web - ISWC 2006

De Roure D (2009) Software Design for Empowering Scientists in IEEE Software

Goble C (2008) Data curation + process curation=data integration + science. in Briefings in bioinformatics

Goderis A (2008) Workflow Discovery Requirements from E-Science and a Graph-Based Solution in International Journal of Web Services Research

Hull D (2006) Taverna: a tool for building and running workflows of services. in Nucleic acids research

Lanzén A (2008) The Taverna Interaction Service: enabling manual interaction in workflows. in Bioinformatics (Oxford, England)

 
Description The myGrid e-Science pilot project (EPSRC Grant GR/R67743/01) produced software for finding and linking up resources into automated, multi-step data analysis pipelines called workflows. A workflow system enables scientists to define workflows and run them: calling resources, managing data, and keeping logs of results. Workflows explain how results were made, and can be reused for new problems.

The myGrid e-Science pilot, and in particular its Taverna workflow management system, was a great success. The focus of the myGrid OMII-UK node was to migrate from a research prototype to a high-quality, robust, and supported product, and to enable and facilitate widespread adoption.

Taverna is now one of the most widely used general-purpose, open source scientific workflow management systems in research, and immediately after the OMII-UK award (2010) it was the prime general toolkit. Taverna has since entered the Apache Software Foundation - see software entry.

OMII-UK myGrid Node results:
• Taverna moved from a research prototype (Taverna 1.0), used by a few enthusiasts and early adopters working closely with the team, to a production-quality suite of software adopted by over 350 organisations worldwide:
- Taverna 1 was developed into a production-quality platform, Taverna 1.x
- Taverna 2.x, a completely re-engineered production-quality platform for workflow-based research.

• Three community-leading resources were spawned: myExperiment (scientific workflows), BioCatalogue (web services in the life sciences), and SEEK (data and models in Systems Biology), each of which has its own funding stream.

• Built and sustained a vibrant, multidisciplinary team of working scientific informaticians, mainly from the life sciences, and software engineers who work with local and international researchers, infrastructure providers (NGI, EGI, Globus, Cloud), community platform providers (Bioeclipse, CDK, Galaxy, SADI and BioMart), service providers (EMBL-EBI, NCBI, DDBJ, RCS) and boutique providers (e.g. using REST and SOAP interfaces). Any web service, command-line or scripting tool can be incorporated into a workflow, and workflows can be embedded in applications.

• Established a reputation as a leading platform for bioinformatics, and expanded out to other disciplines, including biodiversity, social science, chemistry, astronomy, heliophysics, digital preservation, music, and engineering.

- over 1500 citations at the end of the award, and now over 2600 citations, of the three most widely cited Taverna papers, including https://doi.org/10.1093/nar/gkl320, which has over 1000 citations.

- routine reference as a standard community platform, including recent Science and Nature commentaries, and used by EPSRC as a showcase in publicity material.

By way of example, scientific results and experiments using Taverna (at the time) include: understanding and increasing the tolerance of zebu cattle to trypanosomiasis, for crop cultivation and dairy and beef production; colonic transcriptional profiling in resistance and susceptibility to trichuriasis; Single Nucleotide Polymorphisms; NMR-based metabolomics data analysis; sequencing pipelines for computational and statistical genomics; Systems Biology model construction and validation. Many more outcomes have arisen since.

The myGrid OMII-UK node was part of the wider Open Middleware Infrastructure Institute UK (http://www.omii.ac.uk).

- OMII-UK's prime mission was to identify, cultivate, promote the adoption of, and sustain software important to all disciplines of research, whether developed by the programme or by other means, in the UK and outside. Taverna is the prime software product sustained by the original partners that continues to thrive.
In four years of operation OMII-UK evolved from an emphasis on software delivery to one of software adoption, developing effective and sustainable pathways to impact for software developed by UK funding agency investments, and also for UK research by creating the means to adopt software developed worldwide. It has acted as the national focal point for research software sustainability, and has worked with UK, European and international funding agencies to provide a sustainable future for e-Research.

OMII-UK was highly regarded in the RCUK International Review of e-Science [1]:
"OMII (Open Middleware Infrastructure Institute) is a serious effort to provide professional support for reused software. The staffing includes around 10 full-time software engineers at Manchester, 7 at Edinburgh, and 6 at Southampton. This is a model of professional software development and maintenance that needs to expand. The OMII is a unique service organisation with global importance and impact.
"The Panel believes that the UK e-Science Programme is in a global leadership position in... workflow environments (Taverna), and Grid architecture deployment (OGSA DAI)."

OMII-UK's legacy is more than software. It has demonstrated best practice, informing activities in the international arena, and has raised awareness of software sustainability within the research community and funders. Sustainability is now being written into major software calls by UK research councils, and it is increasingly accepted that software is a valid pathway to impact.

[1] Report of the International Panel for the 2009 Review of the UK Research Councils e-Science Programme, http://www.epsrc.ac.uk/research/intrevs/escience/

This award led to the UK's Software Sustainability Institute (http://www.software.ac.uk).
Exploitation Route - Taverna is free and open source, has been widely adopted (see Software and Technical Products entry), and forms the basis of major EU infrastructure projects such as VPH-Share, BioVeL, SCAPE and HELIO. It is estimated to have over 4000 users, and at any one time over 1000 users are invoking its workbench (which calls home); this does not include invocations from applications or portals using the Taverna Server. The work on the development of Taverna led to numerous adoptions by projects as the underpinning platform for research infrastructure, both partnered with and independent of the prime investigators, including follow-on research, commercial contracts and EU infrastructure awards for the investigators.

- The OMII-UK Taverna node trained over 900 researchers in the use of Taverna. We have since trained over 1000 researchers.

- Taverna is accepted as an Apache Incubator project and is now an open development project as well as an open source one. It is now known as Apache Taverna.

- The OMII-UK concluded operation in December 2010.

The UK's Software Sustainability Institute was founded on the experience and partnership of OMII-UK and has thrived. It is now established as a world-leader in software sustainability practices and know-how (http://www.software.ac.uk).
Sectors Aerospace, Defence and Marine,Chemicals,Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Education,Environment,Healthcare,Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology

URL http://www.mygrid.org.uk
 
Description The Taverna Workflow Management System, taken from prototype to production, has been widely adopted internationally. At any one time over 1000 users are invoking the workbench tool, and this does not include applications such as Peptide Picker or portals such as the Biodiversity Virtual eLaboratory that use the server backend. It is estimated that Taverna has had over 4000 users. Over 420 organisations have used or use Taverna, the vast majority independently of the investigators. Taverna was showcased by the EPSRC for its 20th Anniversary special issue, http://www.epsrc.ac.uk/newsevents: "MyGrid's widespread use across the country clearly justifies its description as part of the UK's scientific e-infrastructure - a resource hundreds of teams resort to - to support their research." Other products arising from this grant and its sister grant EP/C536444/1 (PLATFORM: MyGrid- A Platform for eBiology) have spawned their own user bases and funding streams. These include myExperiment, BioCatalogue and SEEK4Science. OMII-UK's experience in software adoption and software sustainability led to the founding of the Software Sustainability Institute UK, now acknowledged as the leading institute of its kind and since re-funded (http://www.software.ac.uk). OMII-UK worked with 400+ organisations across 35+ countries globally, and 11 of the top 15 research-intensive HEIs in the UK use the software or services to enable their research. OMII-UK contributed software and knowledge to major international infrastructures across the world (including Teragrid, BIRN, caBIG, EC ESFRI projects, EGI and DEISA, and NAREGI). The OMII-UK model and processes have been used in Australia, China, Europe and Korea, as well as influencing new developments in the USA (NSF workshops on CyberInfrastructure).
OMII-UK's two major surveys of UK researchers' requirements, each with 40+ interviewees, identified prevailing trends, resulting in the first comprehensive analysis of the barriers and enablers to e-Infrastructure adoption in the UK. We ran four heavily subscribed Commissioned Software Programme (CSP) calls (44 submissions, 13 funded) as well as responsive-mode funding (10 funded), supporting the development of new and pre-existing software products with 20 developer groups covering research-led UK universities and including 4 international universities. We identified software gaps and opportunities in: data management, computation, collaboration, security, registry, portals and APIs for integration. 16 pieces of software from the CSP are in widespread use today with sustained funds and thriving communities, including across 7 national and international production infrastructures. OMII-UK carried out 75+ evaluations of 40+ components, including 50,000+ lines of documentation; this evaluation process was transformed from a heavyweight "throw over the wall" process to a flexible "pre-and-post release" service, which is better suited to research software whilst still helping developers to provide quality software for their users, and helping users to understand the quality of the software. The ENGAGE triage process, which used the criteria of timeliness and availability of effort from the users to engage, was used to improve the OMII-UK CSP process, and was subsequently broadened to 6 key criteria forming the basis of the Software Sustainability Institute's process for evaluating potential software sustainability work. OMII-UK delivered over 150 training events to over 2,500 researchers, improving the capability and capacity of the different groups to benefit from the use of OMII-UK software to support their research.
The OMII-UK Product / Area Liaisons (PALs) programme selected 10 individuals external to OMII-UK who were specialists within their own communities, providing them with a £5,000/year travel budget for collaboration. The PALs acted as OMII-UK ambassadors, helped OMII-UK establish user communities for software components, and provided community intelligence. The PALs programme provided an innovative way of identifying researcher requirements by recruiting advocates in different discipline areas, and the approach has been picked up by other projects. OMII-UK successfully focused on disseminating the impact of software on research. The OMII-UK Newsletter had over 1,500 downloads for each issue. We commissioned articles from prestigious organisations such as NASA and CERN, and articles were widely republished elsewhere, including in iSGTW and NGS News. Focusing the website on researchers rather than developers led to 2,500 visitors a month, with 1% monthly growth; 40% of visitors were from the UK and 15% from the US. A 2009 survey showed that 39% read news articles on a regular basis.
First Year Of Impact 2007
Sector Aerospace, Defence and Marine,Chemicals,Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Education,Energy,Environment,Healthcare,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections,Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description Annotopia
Amount £50,000 (GBP)
Organisation Massachusetts General Hospital 
Sector Hospitals
Country United States
Start 04/2015 
End 07/2015
 
Description BBSRC BBR Web Services 4 Life Science: A Curated Catalogue of Life Science Web Services
Amount £304,319 (GBP)
Funding ID BB/F01046X/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 05/2008 
End 04/2011
 
Description BBSRC From Data to Knowledge - the ONDEX System for integrating Life Sciences data sources
Amount £913,259 (GBP)
Funding ID BBF0060391 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2008 
End 03/2011
 
Description EPSRC EP/D044324/1 OMII-UK extension
Amount £136,523 (GBP)
Funding ID EP/D044324/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 08/2009 
End 03/2010
 
Description EPSRC EP/H043160/1 SSI: The UK Software Sustainability Institute
Amount £643,231 (GBP)
Funding ID EP/H043160/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 06/2010 
End 05/2016
 
Description ESRC Obesity e-Lab: e-Infrastructure for inter-disciplinary collaborative research into obesity
Amount £889,201 (GBP)
Funding ID ES/F029721/1 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 09/2008 
End 02/2012
 
Description EU FP7 231519 e-Lico, e-Laboratory for interdisciplinary collaborative research in data mining and data-intensive sciences
Amount € 495,000 (EUR)
Funding ID 231519 
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 02/2009 
End 01/2012
 
Description EU FP7 IP SCAPE, Scalable Preservation Environments
Amount € 794,000 (EUR)
Funding ID 97458 
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 12/2010 
End 11/2014
 
Description EU FP7 STREP Wf4Ever Advanced Workflow Preservation Technologies for Enhanced Science
Amount € 500,000 (EUR)
Funding ID 270192 
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 12/2010 
End 11/2013
 
Description Eli Lilly Provenance Project
Amount $1,500,000 (USD)
Organisation Eli Lilly & Company Ltd 
Sector Private
Country United Kingdom
Start 05/2008 
End 01/2009
 
Description FP7 Infrastructures BioVel: Biodiversity Virtual eLaboratory
Amount € 1,120,000 (EUR)
Funding ID 283359 
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 08/2011 
End 12/2014
 
Description JISC e-Infrastructure for Social Simulation (NeiSS)
Amount £116,000 (GBP)
Organisation Jisc 
Sector Public
Country United Kingdom
Start 04/2009 
End 03/2012
 
Description JISC myExperiment
Amount £518,000 (GBP)
Organisation Jisc 
Sector Public
Country United Kingdom
Start 03/2007 
End 10/2009
 
Description KTA Eagle Taverna: enabling the provision of commercial support capacity for the Taverna Workflow Management System
Amount £19,000 (GBP)
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Department Knowledge Transfer Account (University of Manchester)
Sector Academic/University
Country United Kingdom
Start 01/2010 
End 12/2010
 
Description Microsoft Corp Shared Genomics: Accessible High Performance Computing for Genomics Medical Research
Amount $600,000 (USD)
Organisation Microsoft Research 
Sector Private
Country Global
Start 09/2007 
End 08/2009
 
Description Microsoft Research myExperiment
Amount £229,656 (GBP)
Organisation Microsoft Research 
Sector Private
Country Global
Start 10/2007 
End 09/2008
 
Description National Cancer Institute / NIH, Taverna-caGrid
Amount $200,000 (USD)
Organisation National Institutes of Health (NIH) 
Sector Public
Country United States
Start 09/2008 
End 08/2009
 
Description TSB award Cloud Analytics for Life Sciences
Amount £149,000 (GBP)
Funding ID 100932 
Organisation Technology Strategy Board (TSB)
Sector Private
Country United Kingdom
Start 02/2010 
End 08/2012
 
Title open source software licence 
Description Taverna Workflow Workbench Suite, released under the GNU Lesser General Public License (LGPL) Version 2.1. http://www.taverna.org.uk
IP Reference  
Protection Protection not required
Year Protection Granted
Licensed Yes
Impact Wide adoption. Now accepted as an Apache Incubator Project for Open Development (2014).
 
Title Taverna Workflow Management System 2.x 
Description Scientific Workflow Management System and Toolsuite including: enactment engine, workbench, plugins and plugin framework, server, command-line tool, player, interaction service.
Type Of Technology Software 
Year Produced 2008 
Open Source License? Yes  
Impact Widespread, global use throughout research labs and universities, and some commercial adoption. A daily audit reveals that over 1000 different users a day across the globe are using the Taverna Workbench to make workflows, and this does not include workflows executed through applications or portals on a server. Taverna 2.x download figures: more than 40000 downloads in total; more than 5000 downloads of Taverna 2.5 Workbench; nearly 3000 downloads of Taverna 2.5 Command Line Tool; more than 300 downloads of Taverna 2.5.4 Server.
URL http://www.taverna.org.uk
 
Title myExperiment 
Description Public repository for retaining and sharing scientific workflows. Social sharing platform. myExperiment makes it easy to find, use and share scientific workflows and other Research Objects, and to build communities. 
Type Of Technology Webtool/Application 
Year Produced 2008 
Impact First and arguably only public sharing platform for any workflow system. Over 500 citations (combined, Google Scholar) of the 3 main myExperiment papers. On 13/03/2017 myExperiment had: 10472 registered members, 392 groups, 3811 workflows, 1223 files, 470 packs. Used by several EU projects (e.g. BioVeL, SCAPE, HELIO, VPH), US projects (e.g. FLOSS) and companies (e.g. RapidMiner) as their workflow repository. Over 22 workflow systems are represented in the repository. In the 30 days in Oct 2014 there were 2391 unique users (logged in and anonymous), which we can extrapolate.
URL http://myexperiment.org