SIBiLS provides personalized information retrieval in the biological literature. It supports fully customizable search in semantically enriched content, based on keywords and/or biomedical entities mapped from a growing set of standardized and legacy vocabularies. The services have been used, and favorably evaluated, to support the curation of genes and gene products by delivering customized literature triage engines to different curation teams. SIBiLS services are freely accessible via REST APIs, are ready to power any curation workflow, and are built on modern technologies that scale to big data: MongoDB and Elasticsearch.


Search examples

Find all documents about assassin bugs

Question-answering examples

The question-answering mode is limited to MEDLINE and PLAZI collections.

What diseases are associated with ticks?
What is the gestation time of pangolins?
What is the tail size of pangolins?
When do pangolins reach sexual maturity?
Where is Potamopyrgus antipodarum invasive?
What species can be a vector of eggs of Dermatobia hominis?

Data

SIBiLS currently covers four collections: MEDLINE, PubMed Central (PMC), Plazi treatments, and PMC supplementary files. Collections are updated daily. Contents are parsed and then enriched with billions of biomedical entities mapped from reference vocabularies. Output is JSON, in BioC format (for fetch) or native Elasticsearch format (for search).
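Because fetch results are delivered in BioC JSON, a client typically walks the standard BioC hierarchy (collection, documents, passages, annotations) to collect the mapped entities. The Python sketch below assumes that standard layout; the infon keys `type` and `identifier` are hypothetical placeholders for whatever keys SIBiLS actually uses to store the entity type and vocabulary identifier.

```python
from collections import defaultdict


def entities_by_type(bioc_collection: dict) -> dict:
    """Group annotated entity mentions by entity type.

    Assumes the standard BioC JSON layout: collection -> documents -> passages
    -> annotations, each annotation carrying "infons", "text" and "locations".
    The infon keys "type" and "identifier" are HYPOTHETICAL placeholders.
    """
    grouped = defaultdict(list)
    for document in bioc_collection.get("documents", []):
        for passage in document.get("passages", []):
            for ann in passage.get("annotations", []):
                infons = ann.get("infons", {})
                entity_type = infons.get("type", "unknown")
                # Each location gives the character offset and length of the mention
                spans = [(loc["offset"], loc["length"]) for loc in ann.get("locations", [])]
                grouped[entity_type].append(
                    {
                        "document": document.get("id"),
                        "text": ann.get("text"),
                        "identifier": infons.get("identifier"),
                        "spans": spans,
                    }
                )
    return dict(grouped)
```

Grouping by type makes it easy to, say, list every gene or disease mentioned across a batch of fetched documents.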

Fetch API

It retrieves annotated content from a given collection. The input is a set of document IDs (up to 1,000 per request). The output is a set of parsed and annotated contents, in JSON and/or BioC format. Delivered and annotated fields include, for example, abstracts and MeSH terms for MEDLINE citations; paragraphs (with their hierarchical level in the document structure) and figure captions for PMC full texts; and, for supplementary data, text extracted from Excel files or obtained by OCR from images. Each annotation comes with several features, including the type of the mapped entity (drug, gene, disease...), the vocabulary used, the entity's unique identifier and preferred term in that vocabulary, and the character offsets of the mapping.
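Since each fetch request accepts at most 1,000 document IDs, larger ID lists must be split into batches. The following Python sketch shows one way to do this; the base URL and the query parameter names `ids` and `col` are hypothetical and should be checked against the actual SIBiLS API documentation.

```python
import json
import urllib.request
from urllib.parse import urlencode

# HYPOTHETICAL base URL and parameter names ("ids", "col"); check the SIBiLS API docs.
FETCH_URL = "https://sibils.example.org/api/fetch"
MAX_IDS_PER_REQUEST = 1000  # limit stated in the Fetch API description


def batch_ids(ids: list[str], size: int = MAX_IDS_PER_REQUEST) -> list[list[str]]:
    """Split document IDs into chunks that respect the per-request limit."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def build_fetch_urls(ids: list[str], collection: str) -> list[str]:
    """Build one GET URL per batch of IDs."""
    return [
        f"{FETCH_URL}?{urlencode({'ids': ','.join(chunk), 'col': collection})}"
        for chunk in batch_ids(ids)
    ]


def fetch_all(ids: list[str], collection: str) -> list[dict]:
    """Download every batch and return the decoded JSON payloads."""
    payloads = []
    for url in build_fetch_urls(ids, collection):
        with urllib.request.urlopen(url) as response:
            payloads.append(json.load(response))
    return payloads
```

For 2,500 IDs, `build_fetch_urls` yields three requests (1,000 + 1,000 + 500 IDs).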

Customizable search API

It performs fully customizable searches for relevant documents in a given collection. The service draws its power from the efficiency of Elasticsearch and from the rich Lucene query language, which supports a wide range of search strategies, for example: basic searches with keywords or entity identifiers (“ZBED1” or “NP_NX_O96006”), searches restricted to specific fields (“figures_captions: ZBED1” or “tables: mapped treatments”), boosting of fields or query parts, Boolean queries, and queries on identified concepts or concept types. The input is therefore a JSON query written with the Lucene query language. The output is the ranked Elasticsearch result set in its native JSON format: up to 10,000 documents per request, each with a relevance score and its indexed content.
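As an illustration of field-restricted search and boosting, the Python sketch below builds an Elasticsearch query body that matches a term anywhere but ranks figure-caption hits higher, then reads `_id` and `_score` from a native Elasticsearch response. The endpoint URL, the use of POST, and the way the collection is selected are assumptions; the field name `figures_captions` comes from the examples above.

```python
import json
import urllib.request

# HYPOTHETICAL search endpoint; the real URL and how the collection is chosen may differ.
SEARCH_URL = "https://sibils.example.org/api/search?col=pmc"


def build_search_body(term: str, size: int = 100) -> dict:
    """Elasticsearch query DSL: match the term anywhere, boost matches in figure captions."""
    if not 1 <= size <= 10_000:
        raise ValueError("size must be between 1 and 10,000 (per-request limit)")
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Lucene syntax inside query_string: field-restricted search, boosted x2
                    {"query_string": {"query": f"figures_captions:{term}^2"}},
                    # Plain keyword or entity-identifier search across default fields
                    {"query_string": {"query": term}},
                ]
            }
        },
    }


def ranked_hits(es_response: dict) -> list[tuple[str, float]]:
    """Extract (document id, relevance score) pairs from a native Elasticsearch response."""
    return [(hit["_id"], hit["_score"]) for hit in es_response["hits"]["hits"]]


def search(term: str) -> dict:
    """POST the query body as JSON and return the decoded response."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(build_search_body(term)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Using two `should` clauses keeps documents that mention the term only in body text, while the `^2` boost pushes caption matches toward the top of the ranking.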

Question Answering API

It lets users ask questions in natural language and obtain answers extracted from documents in a given collection. The service builds on the Elasticsearch indexes described above and on the BERT language model. For example, one can ask "What diseases are transmitted by ticks?" in Plazi treatments. The input is a free-text question. The output is a set of answers ranked by score, along with snippets from the source documents.
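A client for the question-answering mode mainly needs to encode the question, restrict itself to the collections where QA is available (MEDLINE and Plazi), and rank the returned answers. The Python sketch below assumes a hypothetical endpoint, hypothetical parameter names (`q`, `col`), lowercase collection names, and a response with an `answers` list whose items carry a `score` key.

```python
import json
import urllib.request
from urllib.parse import urlencode

# HYPOTHETICAL endpoint, parameter names and response shape.
QA_URL = "https://sibils.example.org/api/qa"
QA_COLLECTIONS = {"medline", "plazi"}  # QA mode is limited to these collections


def build_qa_url(question: str, collection: str) -> str:
    """Encode a natural-language question for a GET request."""
    if collection not in QA_COLLECTIONS:
        raise ValueError(f"question answering is not available for {collection!r}")
    return f"{QA_URL}?{urlencode({'q': question, 'col': collection})}"


def top_answers(answers: list[dict], k: int = 5) -> list[dict]:
    """Return the k highest-scoring answers (each assumed to carry a 'score' key)."""
    return sorted(answers, key=lambda a: a["score"], reverse=True)[:k]


def ask(question: str, collection: str = "plazi") -> list[dict]:
    """Send the question and return the best-ranked answers."""
    with urllib.request.urlopen(build_qa_url(question, collection)) as response:
        return top_answers(json.load(response)["answers"])  # "answers" key is assumed
```

Rejecting unsupported collections on the client side gives a clear error instead of an empty or failed request.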

Let's consider a practical example of how researchers and citizen scientists might use SIBiLS (Swiss Institute of Bioinformatics Literature Services) for a biodiversity conservation project.

Project: Mapping Biodiversity Hotspots for Endangered Species Conservation

Objective:

To identify and map biodiversity hotspots, particularly focusing on endangered species, to aid in conservation efforts and habitat protection.

Steps Involving SIBiLS:

1. Project Setup and Initial Planning:

  • A team of biodiversity researchers and conservationists sets up a project focusing on a specific geographical region known for its rich but threatened biodiversity.
  • They aim to identify key species, particularly those endangered or at risk, and their critical habitats.

2. Customizing Search in SIBiLS:

  • The team uses SIBiLS to create a customized search query. They input keywords related to the region's biodiversity, specific endangered species, ecological indicators, and conservation studies.
  • They run the search across the SIBiLS collections most relevant to their goals, such as Plazi taxonomic treatments for species descriptions, and MEDLINE and PMC for ecological and conservation studies.

3. Literature Triage and Data Collection:

  • SIBiLS returns a ranked list of relevant documents, including recent studies and historical taxonomic treatments describing the target species.
  • The team reviews the literature, focusing on studies that detail species distribution, habitat requirements, ecological roles, and threats to their survival.

4. Data Analysis and Habitat Mapping:

  • The researchers analyze the collected data to identify critical habitats and biodiversity hotspots, particularly areas where endangered species are concentrated.
  • They use GIS tools in conjunction with the data from SIBiLS to create detailed maps of these hotspots.

5. Engaging Citizen Scientists:

  • The project involves citizen scientists, who use the SIBiLS question-answering mode to get quick, sourced answers about the target species (for example, "What is the gestation time of pangolins?").
  • These volunteers conduct field surveys, observe and record sightings of the identified endangered species, and submit their findings to the project team.

6. Collaboration and Data Sharing:

  • The team collaborates with local conservation organizations, sharing their findings and maps.
  • Because SIBiLS collections are updated daily, they rerun their searches regularly and update their data and maps with newly published research and with citizen science contributions.

7. Publishing and Advocacy:

  • The final biodiversity hotspot maps and associated species data are published in the scientific literature; once indexed in the SIBiLS collections, they become discoverable by conservationists, policymakers, and the public.
  • The team uses their findings to advocate for conservation measures, habitat protection, and sustainable development practices in the region.
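Steps 2, 3 and 6 can be sketched in Python: compose a Boolean Lucene query from a list of species names and a region name, then, because collections are updated daily, keep only the documents that earlier runs have not already triaged. The function names, the "seen" set, and the idea of sending the resulting string through the customizable search API are illustrative assumptions, not part of the SIBiLS service itself.

```python
def build_species_query(species: list[str], region: str) -> str:
    """Lucene query: any of the species names AND the region name.

    Names are quoted so multi-word names match as phrases. Quote characters
    inside the names are not escaped in this sketch.
    """
    names = " OR ".join(f'"{name}"' for name in species)
    return f'({names}) AND "{region}"'


def new_document_ids(hit_ids: list[str], seen: set[str]) -> list[str]:
    """Keep only documents not triaged in an earlier run, preserving rank order.

    The `seen` set is updated in place so the next run skips these documents.
    """
    fresh = []
    for doc_id in hit_ids:
        if doc_id not in seen:
            seen.add(doc_id)
            fresh.append(doc_id)
    return fresh
```

Persisting the `seen` set between runs lets the team review only newly indexed literature each time they refresh their maps.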

Outcome:

This project leverages SIBiLS for efficient and comprehensive literature review, data curation, and public engagement in biodiversity conservation. By identifying and mapping biodiversity hotspots, especially for endangered species, the project provides crucial data for conservation efforts and informs strategies to protect vital habitats. The involvement of citizen scientists broadens the scope of data collection and fosters community engagement in biodiversity conservation.




Last modified: Monday, 20 November 2023, 1:46 PM