BKH: Data Exchange mechanisms

This section describes the data flows and exchange processes supported by the different BKH services, with a focus on the integration and dissemination of taxonomic, genetic, and treatment information. These flows ensure that biodiversity data, including taxonomic names, DNA sequences, treatment information, and associated metadata, are processed, integrated, and disseminated through various platforms and repositories. Because data and metadata are exchanged in standardized formats, comprehensive and up-to-date biodiversity information can be shared with the wider scientific community and the public. Each of the services involved (GBIF, Catalogue of Life, ENA, OpenBioDiv, SIBiLS, and Zenodo) plays a specific role in this ecosystem of data exchange and dissemination.

Specimens (GBIF)

Data transfer to GBIF uses Darwin Core Archives (DwC-A), with each archive bundling the data of all treatments from an individual source publication. An archive is (re-)generated whenever new treatments are added or marked in a publication, or existing ones are modified. After an archive is packed, it is either registered with the GBIF API (on first export), or the GBIF API is notified of the update (on subsequent exports). GBIF then downloads and ingests the DwC-A from TreatmentBank.
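The pack-then-register-or-notify logic can be illustrated with a minimal sketch. It assumes a deliberately simplified archive (a `meta.xml` descriptor plus a single taxon table) and uses hypothetical names (`pack_archive`, `next_gbif_action`, the `id` and `name` keys); the actual archive layout and the calls made to the GBIF API by TreatmentBank are not described here and are likely richer.

```python
import csv
import io
import zipfile

# Simplified DwC-A descriptor (assumption): one Taxon core file, two columns.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxa.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>
"""


def pack_archive(treatments):
    """Bundle all treatments of one publication into an in-memory zip archive."""
    table = io.StringIO()
    writer = csv.writer(table, lineterminator="\n")
    writer.writerow(["taxonID", "scientificName"])
    for treatment in treatments:
        writer.writerow([treatment["id"], treatment["name"]])

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("meta.xml", META_XML)
        archive.writestr("taxa.csv", table.getvalue())
    return buffer.getvalue()


def next_gbif_action(publication_id, registered):
    """Return 'register' on the first export of a publication, 'notify_update' afterwards."""
    if publication_id in registered:
        return "notify_update"
    registered.add(publication_id)
    return "register"
```

Keeping the set of already registered publications separate from the packing step means that an archive can be regenerated any number of times without registering the same dataset twice.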

Taxonomic Names (Catalogue of Life)

Newly created taxon names, as well as other nomenclatural acts, are included in the DwC-As exported from TreatmentBank and are thus ingested by GBIF. From there, the newly coined taxon names are forwarded to Catalogue of Life.

DNA sequence data (ENA)

Accession numbers cited in treatments are marked as such and linked to their respective sequence pages in ENA. In the other direction, ENA follows an update feed that informs it about new treatments as well as modifications to existing ones. Based upon this feed, ENA discovers treatments and material citations associated with the names of the taxa from whose specimens gene sequences are derived.
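As an illustration of the first direction (marking accession numbers and linking them to ENA), the following sketch finds accession-like tokens in treatment text and builds ENA browser links. The regular expression is a simplified assumption covering common nucleotide accession shapes (one letter plus five digits, or two letters plus six or eight digits); it can produce false positives and is not the detection method used by TreatmentBank. The function name `link_accessions` is hypothetical.

```python
import re

# Simplified assumption: 1 letter + 5 digits, or 2 letters + 6 or 8 digits.
ACCESSION = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})\b")
ENA_VIEW = "https://www.ebi.ac.uk/ena/browser/view/"


def link_accessions(text):
    """Return (accession, ENA URL) pairs in order of first appearance."""
    seen = []
    for match in ACCESSION.finditer(text):
        accession = match.group(0)
        if accession not in seen:  # link each accession only once
            seen.append(accession)
    return [(accession, ENA_VIEW + accession) for accession in seen]
```

For example, `link_accessions("Sequences MN123456 and U12345 were deposited.")` yields one pair for each of the two accession numbers.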

Linked Open Data (OpenBioDiv)

As new treatments are added to TreatmentBank and existing ones are modified, each one is registered in OpenBioDiv and enqueued for processing. OpenBioDiv then fetches the generic GG XML via the registered HTTP URI and ingests it into its knowledge base.
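The register-and-enqueue pattern described above might look roughly like the sketch below. The class `IngestQueue` and its methods are hypothetical, and the `fetch` callable stands in for the HTTP request against the registered URI, so the example runs without network access; parsing the XML and recording the root tag stands in for the actual ingestion into the knowledge base.

```python
from collections import deque
import xml.etree.ElementTree as ET


class IngestQueue:
    """Holds treatment URIs that were registered and still await processing."""

    def __init__(self):
        self._pending = deque()
        self._queued = set()

    def register(self, uri):
        """Enqueue a treatment URI; a URI that is already waiting is not queued twice."""
        if uri not in self._queued:
            self._queued.add(uri)
            self._pending.append(uri)

    def process(self, fetch):
        """Fetch and parse every pending treatment; return {uri: root element tag}."""
        ingested = {}
        while self._pending:
            uri = self._pending.popleft()
            self._queued.discard(uri)
            root = ET.fromstring(fetch(uri))  # fetch(uri) returns the XML text
            ingested[uri] = root.tag
        return ingested
```

Because a URI is removed from the queue once it has been processed, a later modification of the same treatment can simply register it again.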

Treatment TaxPub (SIBiLS)

As new treatments are added to TreatmentBank and existing ones are modified, each one is transformed into TaxPub via XSLT, validated, and pushed to SIBiLS via SFTP.
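The transform, validate, and push steps form a small pipeline in which an invalid result must never be uploaded. The sketch below shows only that control flow, with all three steps passed in as callables. In a real setup the transform would presumably be an XSLT processor (for instance `lxml.etree.XSLT`) and the push an SFTP client; both are assumptions here, as are the names `export_treatment` and `is_well_formed`. The well-formedness check is a stand-in and does not validate against the TaxPub schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Stand-in validator: checks XML well-formedness only, not schema validity."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def export_treatment(treatment_xml, transform, validate, push):
    """Transform a treatment, validate the result, and push it only if valid.

    Returns True if the converted document was pushed, False otherwise.
    """
    converted = transform(treatment_xml)
    if not validate(converted):
        return False  # invalid output is never uploaded
    push(converted)
    return True
```

Injecting the steps keeps the pipeline testable without an XSLT engine or a remote server, and the same skeleton fits the XHTML export to Zenodo.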

XHTML (BLR/Zenodo)

As new treatments are added to TreatmentBank and existing ones are modified, each one is transformed into XHTML via XSLT, validated, and pushed to Zenodo via the Zenodo API. The associated metadata is sent along as JSON, generated from the metadata of the source publication as well as details extracted from the treatment proper. The returned deposition number and the derived DOI are stored back into the treatment.

For newly added publications, a similar mechanism exports the underlying source PDF to Zenodo and stores the returned deposition number in the converted publication. The derived DOI is stored only if the source publication does not already carry a DOI minted and assigned by the publisher.

Individual figures and graphics are likewise exported as PNGs to their own Zenodo depositions, and the returned deposition numbers and derived DOIs are stored in the associated captions in the converted publication. The DOIs are also added to the in-text citations of the figures, thereby establishing the link between treatments and the figures they cite.
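Two details of this export lend themselves to a short sketch: assembling the JSON metadata from publication and treatment details, and the rule that a Zenodo-derived DOI is stored for a source PDF only when no publisher DOI exists. The dictionary keys on the input side (`authors`, `taxon_name`, `summary`, `doi`, `zenodo_deposition`) and the function names are hypothetical. The metadata keys `upload_type`, `title`, `creators`, and `description` follow Zenodo's deposit metadata as far as is known here, and deriving the DOI as `10.5281/zenodo.` plus the deposition number is likewise an assumption about how the "derived DOI" is formed.

```python
import json

ZENODO_DOI_PREFIX = "10.5281/zenodo."  # assumption: DOI derived from deposition number


def build_metadata(publication, treatment):
    """Combine publication metadata and treatment details into a JSON payload."""
    metadata = {
        "upload_type": "publication",
        "title": treatment["taxon_name"],
        "creators": [{"name": author} for author in publication["authors"]],
        "description": treatment["summary"],
    }
    return json.dumps({"metadata": metadata}, sort_keys=True)


def record_pdf_deposition(publication, deposition_id):
    """Store the deposition number; store the derived DOI only if no DOI exists yet."""
    publication["zenodo_deposition"] = deposition_id
    if not publication.get("doi"):
        publication["doi"] = ZENODO_DOI_PREFIX + str(deposition_id)
    return publication
```

With this rule, a publication that arrives with a publisher DOI keeps it unchanged and only gains the deposition number, while a publication without one receives the Zenodo-derived DOI.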

Last modified: Tuesday, 7 November 2023, 2:03 PM