BioSolr: addressing the challenges in making biomedical data easily accessible using the world-leading Apache-Solr search-engine framework

Lead Research Organisation: European Bioinformatics Institute
Department Name: Protein Data Bank in Europe

Abstract

Not Required

Technical Summary

Not Required

Planned Impact

Not Required

Publications

Armstrong DR (2020) PDBe: improved findability of macromolecular structure data in the PDB. in Nucleic acids research

Berman HM (2016) The archiving and dissemination of biological structure data. in Current opinion in structural biology

Berman HM (2014) The Protein Data Bank archive as an open data resource. in Journal of computer-aided molecular design

Velankar S (2021) The Protein Data Bank Archive. in Methods in molecular biology (Clifton, N.J.)

Young JY (2018) Worldwide Protein Data Bank biocuration supporting open access to high-quality 3D structural biology data. in Database : the journal of biological databases and curation

 
Description The major objectives of this grant were threefold:
1. To ensure that the interns/interchangers from the software industry working at EMBL-EBI gain background and domain knowledge in the life sciences as part of the interchange programme;
2. To ensure that their software skills are used in the process to develop new functionality relevant to the life sciences;
3. To bring together the bioinformatics community to exchange expertise and experience of using the various search engine technologies available for developing life science-related query mechanisms.

All of these objectives were met. The interchange resulted in knowledge transfer to software industry experts: the interchangers were embedded within EMBL-EBI teams for the period of the project and were exposed to life sciences data and practices. Once they understood the requirements, the interchangers also helped the teams to assess the use of search engine technologies and developed new functionality by contributing to the open source Apache Lucene Solr project. The new functionality is in use in production systems at EMBL-EBI. The other major outcome of the grant was bringing together Apache Lucene Solr "experts" and "non-experts" at workshops to discuss requirements and exchange knowledge and experience. Bioinformaticians from UK, European and US universities and institutes attended the workshops alongside software industry experts.
As a result of the project, better query systems are now available for life science data.
Exploitation Route The interactions established during the BioSolr project have been beneficial in improving the query mechanisms for life science data. Because the software was developed as open source, some of the developments were accepted by the main Apache Lucene Solr software committee and integrated into the main distribution, so anyone using the search engine has access to the new functionality. The remaining BioSolr software that did not become part of the main distribution is available as plugins via the Apache Lucene Solr site and GitHub. The software is part of the query systems at EMBL-EBI.
Sectors Digital/Communication/Information Technologies (including Software),Other

 
Title BioSolr software repository 
Description Repository of all the code developed in the BioSolr project. Contains enhancements to Apache Solr. One of the patches (facet-contains) is integrated in the official Apache Solr distribution. The remaining patches are linked to JIRA issues listed on the Apache Solr site. These patches are used in production services by the SPOT and PDBe teams.
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The new developments added Apache Solr functionality that was deemed essential for improving the services provided by the SPOT and PDBe teams. The code is also being evaluated by NCBI teams in the USA.
URL https://github.com/flaxsearch/BioSolr/
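
As an illustration of the facet-contains enhancement mentioned above, the following minimal sketch builds a Solr select URL that restricts facet values to those containing a given substring. `facet.contains` and `facet.contains.ignoreCase` are standard Solr faceting parameters; the host, core name (`example_core`) and field name (`molecule_name`) are hypothetical placeholders and do not describe any EMBL-EBI service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_facet_contains_url(base_url, field, substring, ignore_case=True):
    """Build a Solr query URL that facets on `field`, keeping only
    facet values that contain `substring`."""
    params = [
        ("q", "*:*"),                  # match all documents
        ("rows", "0"),                 # only facet counts are wanted
        ("facet", "true"),
        ("facet.field", field),
        ("facet.contains", substring),
        ("facet.contains.ignoreCase", "true" if ignore_case else "false"),
        ("wt", "json"),
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr core and field name, for illustration only.
url = build_facet_contains_url(
    "http://localhost:8983/solr/example_core/select",
    "molecule_name",
    "kinase",
)
print(url)
```

The sketch only constructs the request URL; sending it to a running Solr instance would return facet counts limited to values containing "kinase".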
 
Description Apr 21 2015 : London Solr Meetup 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact The search experts in London were informed of the BioSolr project and the planned developments. There was considerable interest from this community, and they have provided input into the development of the BioSolr plugins.
Year(s) Of Engagement Activity 2015
URL http://www.meetup.com/Apache-Lucene-Solr-London-User-Group/events/220603505/
 
Description Better search for life sciences at the BioSolr Workshop, day 1 - Apache Lucene/Solr 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Blog post describing the activities on the first day of the "Open source search for bioinformatics" workshop.
Year(s) Of Engagement Activity 2016
URL http://www.flax.co.uk/blog/2016/02/10/better-search-life-sciences-biosolr-workshop-day-1-apache-luce...
 
Description Better search for life sciences at the BioSolr Workshop, day 2 - Elasticsearch & others 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The blog describes the activities on the second day of the "Open source search for bioinformatics" workshop. The blog was advertised on the PDBe Twitter and Facebook accounts.
Year(s) Of Engagement Activity 2016
URL http://www.flax.co.uk/blog/2016/02/15/better-search-life-sciences-biosolr-workshop-day-2-elasticsear...
 
Description BioSolr workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The initial BioSolr workshop was designed to bring together Solr users from the Cambridge area. The workshop was attended by many teams on campus and by teams from NCBI in the USA. Industry was also represented, with attendees from the Cambridge area and from Siren Solutions in Ireland. Everyone who attended the initial workshop has remained involved, which has led to better interactions between the different teams.
Year(s) Of Engagement Activity 2014
URL http://www.flax.co.uk/blog/2014/10/02/biosolr-begins-with-a-workshop-day/
 
Description ECCB 2018 - PDBe/UniProt workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This international workshop was conducted jointly by PDBe and UniProt teams.
Year(s) Of Engagement Activity 2018
 
Description EMBL training course "Structural bioinformatics (Virtual)" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This course explored bioinformatics data resources and tools for the investigation, analysis, and interpretation of biomacromolecular structures. It focused on how best to analyse and interpret available structural data to gain useful information given specific research contexts. The course content also covered predicting protein structure and function, and exploring interactions with other macromolecules as well as with low-MW compounds. Workshops were presented on PDBe search, pages and tools, as well as PDBe-KB pages. This course was a virtual event delivered via a mixture of live-streamed sessions, pre-recorded lectures, and tutorials with live support.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/training/events/structural-bioinformatics-virtual/
 
Description EMBL-EBI training course "Summer school in bioinformatics (Virtual)" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This course provided an introduction to the use of bioinformatics in biological research, giving participants guidance for using bioinformatics in their work whilst also providing hands-on training in tools and resources appropriate to their research. Participants were initially introduced to bioinformatics theory and practice, including best practices for undertaking bioinformatics analysis, data management and reproducibility. To enable specific exploration of resources in their particular field of interest, participants were divided into focused groups to work on a small project set by EMBL-EBI resource and research staff, ending in a presentation from each group on the final day of the course to bring together learnings from all participants. The course included training and mentoring by experts from EMBL-EBI and external institutes. PDBe supervised the group project for independent exploration and analysis of PDBe-KB data.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/training/events/summer-school-bioinformatics-virtual/
 
Description Indian Biophysical Society-PDBe workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This workshop was conducted as part of the Indian Biophysical Society meeting at the Indian Institute of Science Education and Research (IISER), India.
Year(s) Of Engagement Activity 2018
 
Description July 10-11, 2015 : BOSC Dublin 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation at the Bioinformatics Open Source Conference (BOSC) Special Interest Group meeting in Dublin, held as part of the 2015 ISMB/ECCB conference. This made the community aware of the BioSolr developments, resulting in more requests for information.
Year(s) Of Engagement Activity 2015
URL http://www.flax.co.uk/blog/2015/07/13/biosolr-at-bosc-2015-open-source-search-for-bioinformatics/
 
Description October 13-16, 2015 : Lucene Revolution, Austin, Texas 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Matt Pearce presented the BioSolr project at the Lucene Revolution meeting, which is a meeting of Solr professionals. The presentation was accepted because of the number of votes it received as a topic of interest to a large number of conference attendees. This resulted in discussions and further interaction with search engine professionals.
Year(s) Of Engagement Activity 2015
URL http://www.flax.co.uk/blog/2015/10/16/lucenesolr-revolution-2015-biosolr-searching-the-stuff-of-life...
 
Description Open source search for bioinformatics workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact More than 40 bioinformatics and search engine professionals attended the workshop. The attendees included professionals working in biological data management at major bioinformatics centres (EBI and NCBI) as well as researchers from UK universities. The participants also included search engine technology experts from computer science and bioinformatics communities and companies.
Year(s) Of Engagement Activity 2016
URL http://www.ebi.ac.uk/pdbe/about/events/open-source-search-bioinformatics
 
Description PDBe API webinar series "Creating complex PDBe API queries" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This webinar was part of a 6-part PDBe API webinar series, introducing different levels of programmatic access at PDBe. The series ranged from basic data retrieval and search using the PDBe API to more advanced features, including access and reuse of PDBe data visualisation components.

This webinar demonstrated how to create more complex queries by combining the PDBe search API with numerous other calls. By introducing specific case studies, we highlighted the scope of PDBe programmatic access.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/training/events/creating-complex-pdbe-api-queries/
 
Description PDBe API webinar series "Introduction to PDBe programmatic access" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This webinar was part of a 6-part PDBe API webinar series, introducing different levels of programmatic access at PDBe. The series ranged from basic data retrieval and search using the PDBe API to more advanced features, including access and reuse of PDBe data visualisation components. This webinar gave an introduction to programmatic access at PDBe, highlighting the type of data that is available and how this can be utilised.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/training/events/introduction-pdbe-programmatic-access/
 
Description PDBe workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This workshop involved 20 participants at the Max F. Perutz Laboratories (MFPL) in Vienna.
Year(s) Of Engagement Activity 2018
 
Description PDBe/EMPIAR HALOS consortium virtual workshop Hamburg 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This virtual workshop was organised by PDBe and HALOS, and also involved EMDB and EMPIAR. It provided an introduction to the Protein Data Bank and associated databases. It took the form of three afternoon online workshop sessions, combined with webinars that had to be watched before the online sessions.
Year(s) Of Engagement Activity 2020
URL https://www.halos.lu.se/calendar/pdbeempiar-workshop-hamburg
 
Description PDBe/UniProt API workshop
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop was conducted at the National Institute of Immunology in India and involved 50 international participants.
Year(s) Of Engagement Activity 2018
 
Description Presentation at Diamond Light Source
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation on PDBe activities, including the SIFTS resource. The presentation described new developments at PDBe, including the web components and web-based 3D viewers that display annotations using the SIFTS API. The SIFTS resource was described as a way to obtain value-added annotation by linking sequence- and structure-based annotations from different data resources. The new query system and search API at PDBe, which are based on BioSolr developments, were also described.
Year(s) Of Engagement Activity 2016
 
Description Presentation at NII Shonan meeting in Japan 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The NII Shonan meeting was organised to discuss visualisation of biological information. The presentation concentrated both on the visualisation of data and on the sources of annotation information, with SIFTS data central to linking structure and sequence information. There were further inquiries from participants about the SIFTS data and the REST API that makes these data accessible. One of the work groups also discussed how to query information in the most efficient way, including some of the developments at PDBe that have come about through the BioSolr project.
Year(s) Of Engagement Activity 2016
URL http://shonan.nii.ac.jp/shonan/blog/2015/10/30/web-%E2%80%90based-molecular-graphics/
 
Description Presentation at Unité de glycobiologie structurale et fonctionnelle, Université de Lille 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact The presentation, entitled "PDBe - Bringing structure to biology", described PDBe developments including the SIFTS resource and the new query system. The presentation also described new developments on the REST API and planned developments for the SIFTS resource.
Year(s) Of Engagement Activity 2017
URL http://ugsf-umr-glycobiologie.univ-lille1.fr/Seminar-Friday-10th-February-Sameer-Velankar-PDBe-leade...
 
Description SWAT4LS workshop - A new Ontology Lookup Service at EMBL-EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The workshop presented how BioSolr has helped implement the new ontology lookup service. This created a lot of interest in the new developments from the participants.
Year(s) Of Engagement Activity 2015
URL http://ceur-ws.org/Vol-1546/
 
Description Structural bioinformatics course 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact 26 participants from an international background participated in this onsite workshop.
Year(s) Of Engagement Activity 2018
 
Description Talk and online training course on PDBe at Warwick University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Undergraduate students
Results and Impact Workshop and talk for undergraduate chemists at Warwick University, focusing on the PDB and accessing protein structure data.
Year(s) Of Engagement Activity 2020
 
Description Talk at IISER (Pune) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation on PDBe, the pdbe.org website and the infrastructure behind it, including SIFTS.
Year(s) Of Engagement Activity 2017
 
Description Talk at MBU (Bengaluru) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A presentation on PDBe, the pdbe.org website and the infrastructure behind it, including SIFTS, the API and search functionality. The talk was attended by over 50 people from the Molecular Biophysics Unit and other departments of the IISc in Bengaluru, India.
Year(s) Of Engagement Activity 2017
 
Description Talk at NII (New Delhi) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation on PDBe, the pdbe.org website and the infrastructure behind it, including SIFTS.
Year(s) Of Engagement Activity 2017
 
Description Talk at Pune University (Pune) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A presentation on PDBe, the pdbe.org website and the infrastructure behind it, including SIFTS.
Year(s) Of Engagement Activity 2017
 
Description The fun and frustration of writing a plugin for Elasticsearch for ontology indexing 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The blog describes the work on the ontology indexer carried out as part of the BioSolr project.
Year(s) Of Engagement Activity 2016
URL http://www.flax.co.uk/blog/2016/01/27/fun-frustration-writing-plugin-elasticsearch-ontology-indexing...
 
Description Webinar: Finding macromolecular structures more easily at PDBe 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This webinar was conducted as part of the online training program.
Year(s) Of Engagement Activity 2018
 
Description XJoin for Solr, part 1: filtering using price discount data 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The blog describes the development work carried out in BioSolr and its applications outside the bioinformatics field.
Year(s) Of Engagement Activity 2016
URL http://www.flax.co.uk/blog/2016/01/25/xjoin-solr-part-1-filtering-using-price-discount-data/
 
Description XJoin for Solr, part 2: a click-through example 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The blog describes the application of the work carried out in the BioSolr project in the fields of e-commerce and search engine development.
Year(s) Of Engagement Activity 2016
URL http://www.flax.co.uk/blog/2016/01/29/xjoin-solr-part-2-click-example/