SIBiLS provides personalized Information Retrieval in the biological literature. Indeed, SIBiLS allows fully customizable search in semantically enriched contents, based on keywords and/or mapped biomedical entities from a growing set of standardized and legacy vocabularies. The services have been used and favorably evaluated to assist the curation of genes and gene products, by delivering customized literature triage engines to different curation teams. SIBiLS are freely accessible via REST APIs and are ready to empower any curation workflow, built on modern technologies scalable with big data: MongoDB and Elasticsearch.


Search examples

Find all documents about assassin bugs

Question-answering examples

The question-answering mode is limited to MEDLINE and PLAZI collections.

What diseases are associated with ticks?
What is the gestation time of pangolins?
What is the tail size of pangolins?
When is the sexual maturity of pangolins?
Where potamopyrgus antipodarum are invasive?
What species can be a vector of eggs of Dermatobia hominis?

Data

SIBiLS today covers 4 collections: MEDLINE, PubMedCentral (PMC), Plazi treatments, and PMC supplementary files. Collections are daily updated. Contents are parsed and then enriched by billions of mapped biomedical entities from reference vocabularies (described here). Output is JSON, in BioC (for fetch) or native Elasticsearch (for search) formats.

Fetch API

It allows to retrieval of annotated contents from a given collection. The input is a set of document IDs (up to 1,000 per request). The output is a set of parsed and annotated contents, in JSON and/or BioC formats. For MEDLINE citations, delivered and annotated fields include for example abstracts, or MeSH terms; for PMC full texts, paragraphs provided with their hierarchical level in the document structure, or figure captions; for supplementary data, text extracted from Excel files or ocerized from images. Annotations are delivered with many features including the type of the mapped entity (drug, gene, disease...), the vocabulary used, the vocabulary unique identifier and preferred term, or the mapping characters offsets.

Customizable search API

It allows to perform a fully customizable search for valuable documents in a given collection. The power of this service is based on the efficiency of Elasticsearch engines, and on the rich Lucene query language, which allows to investigate of a large panel of searching strategies. For example, basic search with keywords or entity identifiers (“ZBED1” or “NP_NX_O96006”), searches in specified fields (“figures_captions: ZBED1” or “tables: mapped treatments”), boosting fields or query parts, Boolean, exploiting identified concepts or identified concept types... The input is thus a Lucene JSON query. The output is the Elasticsearch ranked result set in its native JSON format; for each document (up to 10,000 per request), a relevance score, and the indexed content.

Question Answering API

it allows one to ask questions in natural languages and to obtain answers extracted from documents from a given collection. The power of this service is based on previous Elasticserch indexes and the BERT language model. For example: asking for "What diseases are transmitted by ticks ?" in Plazi treatments. The input is a free text question. The output is a set of answers, ranked by scores, along with documents' snippets.

Let's consider a practical example of how researchers and citizen scientists might use SIBiLS (Swiss Institute of Bioinformatics Literature Services) for a biodiversity conservation project.

Last modified: Monday, 13 November 2023, 4:35 PM