In parallel to this deluge of raw data (especially in light of the decline in sequencing costs), scholarly publications comprise knowledge based on billions of facts in millions of published narratives. This corpus of knowledge represents a rich citation network, albeit one based almost entirely on implicit citation links. These links, set by the authors, could serve as the basis for a biodiversity knowledge graph, which is needed to improve access to and re-use of knowledge accumulated over centuries. Today, increasingly large corpora of literature are text- and data-mined and used to generate new hypotheses as a basis for further analyses. The mined content includes sub-article elements such as named entities (biological taxa), structured blocks of text and figures (taxon treatments), individual figures, specimen records and others. As a next step, such data extracted from the literature should be made FAIR and deposited in repositories together with rich metadata.


Some digitally born publications in the domain of biodiversity provide explicit links, embedded during the act of publishing, to resources such as DNA sequences, digital natural history specimens, species identifications, literature citations, people and ontologies. They thus function as hubs that bind these various data types together.

Last modified: Wednesday, 8 November 2023, 9:22 AM