Context of Use:
OpenBiodiv helps users navigate biodiversity publications and saves time on in-depth bibliographic exploration of how various types of biodiversity data (taxon names, taxon treatments, sequences, figures, tables, collections, persons, and more) are used in the literature. It runs in-depth searches of published biodiversity data to identify the papers, and the sections within them, most relevant to the user’s question. Beyond pointing users to a relevant paper, it also redirects them to the exact place within the original article where the related data item is mentioned. OpenBiodiv provides multiple search and browsing options: general semantic search (based on SPARQL), a SPARQL endpoint, sample SPARQL queries, user apps (based on Elasticsearch indexing), and a RESTful API.
Need:
In their studies, biodiversity researchers need to use and refer to extensive and diverse biodiversity data at once, e.g. taxon names, treatments, collections, authorities, genetic sequences, tables, images, etc. However, those would normally be scattered across a vast number of articles or belong to dissociated databases. OpenBiodiv brings all those data types together and provides semantic enhancements to them making them FAIR, easily findable, and re-usable.
Added Value:
OpenBiodiv is the very first semantic graph database for biodiversity. It offers a free-to-use and no-registration-required portal to browse and locate relevant and linked biodiversity data from a vast number of publications. Using RDF, OpenBiodiv allows for much more in-depth exploration of those data, compared to alternative methods. As a result, OpenBiodiv exposes ‘hidden’ links between data items across published content. For users comfortable with SPARQL, OpenBiodiv is also capable of further expediting bibliographic exploration by answering complex queries right away.
Competitive Advantage:
While some other tools and platforms browse the available (biodiversity) literature and retrieve relevant scientific publications, OpenBiodiv uses a knowledge graph and semantics to search within the content of each publication.
The OpenBiodiv database is updated daily with knowledge extracted from biodiversity-related articles published in Pensoft’s journals (e.g. ZooKeys, PhytoKeys, MycoKeys, Biodiversity Data Journal, and more than 30 others) and with taxonomic treatments harvested and semantically annotated by Plazi from journals of other publishers. At the time of writing, the database contains data from over 30,000 biodiversity articles, including nearly a million taxon names, more than 130,000 sequences, and 240,000 authors (see: https://openbiodiv.net/statistics).
A qualitative upgrade on the current use of biodiversity data:
OpenBiodiv facilitates and expedites bibliographic exploration so that biodiversity scientists and other users can be better informed about already existing knowledge, thereby improving its efficiency and reusability in further research and policy-making.
Exemplary Use of the Service:
By querying the OpenBiodiv database, users can retrieve articles and sub-article elements in four different ways:
* General SPARQL-based semantic search
* User apps for in-depth literature exploration, based on Elasticsearch indexing of entities
* SPARQL endpoint and sample SPARQL queries
* RESTful API
Thanks to its internal semantic structure, the use of persistent identifiers for each data element, and a consistent backend ontology, OpenBiodiv can also answer complex, open-ended biodiversity-related questions, such as:
* Which publications contain treatments of the beetle genus Carabus?
* Which publications about the plant genus Ambrosia are published by author X and/or author Y?
* Which publications provide treatments of specimens kept at the National Museum of Natural Sciences in Madrid?
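As an illustration, the first question above could be phrased as a SPARQL query and sent to an endpoint roughly as follows. This is only a minimal sketch: the endpoint URL, the `ex:` prefix, and every class and property name (`ex:Treatment`, `ex:aboutTaxon`, `ex:partOf`, `ex:genus`, `ex:title`) are hypothetical placeholders, not OpenBiodiv’s actual ontology terms. The sample SPARQL queries provided by the platform show the real vocabulary.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and vocabulary: placeholders only,
# NOT OpenBiodiv's real endpoint URL or ontology terms.
ENDPOINT = "https://example.org/sparql"
PREFIX = "PREFIX ex: <https://example.org/ontology/>"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def treatments_of_genus_query(genus: str, limit: int = 25) -> str:
    """Build a SELECT query for articles that contain treatments of a genus."""
    return f"""{PREFIX}
SELECT DISTINCT ?article ?title WHERE {{
  ?treatment a ex:Treatment ;
             ex:aboutTaxon ?taxon ;
             ex:partOf ?article .
  ?taxon ex:genus "{escape_literal(genus)}" .
  ?article ex:title ?title .
}}
LIMIT {int(limit)}"""


def request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = treatments_of_genus_query("Carabus")
url = request_url(query)
```

The resulting `url` could then be fetched with any HTTP client. The other two questions would follow the same pattern, with extra triple patterns for authors or holding institutions.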
Competencies and Skills that are needed to use the Service:
A beginner’s knowledge of biodiversity and taxonomy is sufficient to browse and use the OpenBiodiv database. For users with some SPARQL experience, the platform offers additional features for running more specific queries and retrieving better-refined results.
Challenges for the Users:
While OpenBiodiv’s database is extensive and is continuously updated with new data as they are published in Pensoft’s journals and in other journals whose treatments are extracted by Plazi, it is not exhaustive. As a result, users might not be able to retrieve all data relevant to their queries.
A lack of SPARQL experience may limit the advantages of the platform for some users.
Users’ Role in the Service Development:
OpenBiodiv is continuously updated, not only with new data as they are published and/or extracted, but also with new workflows and features. The OpenBiodiv team welcomes all users to share their feedback by writing to datascience@pensoft.net or by opening a ticket. We will be happy to use it in future developments and (re)designs.