Best practices help ensure efficiency, accuracy, and consistency in any workflow, particularly in the digitization and data management processes used by PLAZI. The following example shows how they apply to the PLAZI workflow:

Description:

Data quality and integrity are paramount in the digitization of taxonomic literature. The extracted information forms the basis for scientific research, conservation efforts, and policy-making. Therefore, it is crucial that the data is accurate, verifiable, and consistent.

Steps:

  1. Preparation and Pre-Processing:

    • Start from the best available copy of the literature, with legible text and clear images.
    • Use high-quality scanning equipment and settings to capture the details of the taxonomic literature accurately.
    • Apply Optical Character Recognition (OCR) software that is well-suited for scientific texts to minimize errors in text recognition.
  2. Extraction Using GoldenGATE Imagine:

    • Use GoldenGATE Imagine to extract taxonomic treatments, making sure that all relevant information, including descriptions, images, and distribution data, is captured.
    • Follow standardized templates for different journal formats to maintain consistency across the digitized literature.
  3. Manual Review and Editing:

    • Conduct a thorough manual review of the extracted treatments to correct any OCR errors and formatting issues.
    • Verify the scientific names, taxonomic hierarchy, and bibliographic references against authoritative databases.
  4. Semantic Annotation:

    • Carefully annotate the treatments with semantic tags, using a controlled vocabulary and linking to external data sources where applicable.
    • Ensure that the annotations are consistent and follow the guidelines provided by PLAZI and other relevant standards.
  5. Quality Control Checks:

    • Implement a multi-tiered quality control process involving peer reviews and cross-checks with original literature.
    • Use checklists and validation tools provided by PLAZI to ensure that all data meets the required standards.
  6. Data Export and Upload:

    • Export the annotated treatments in a format that is compatible with TreatmentBank and other biodiversity databases.
    • Upload the data along with comprehensive metadata to facilitate easy access and citation.
  7. Continuous Improvement:

    • Collect feedback from end-users and data consumers to identify areas for improvement.
    • Update the workflow and training materials regularly to incorporate new technologies and methodologies.
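Parts of steps 3 to 5 (name verification, consistent annotation, and checklist validation) can be partly automated as a pre-upload check. The Python sketch below is a minimal illustration under stated assumptions: each extracted treatment is represented as a plain dictionary, and the field names (taxon_name, rank, source_citation, page_range, status), the small status vocabulary, and the accepted_names set are all hypothetical. They are not the TreatmentBank schema, GoldenGATE Imagine output, or the validation tools PLAZI provides, and a real workflow would query an authoritative taxonomic database rather than a local set of names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical required fields; NOT the actual TreatmentBank schema.
REQUIRED_FIELDS = ("taxon_name", "rank", "source_citation", "page_range")

# Hypothetical controlled vocabulary for a nomenclatural-status tag.
STATUS_VOCABULARY = {"sp. nov.", "comb. nov.", "syn. nov.", "stat. nov."}

# Statuses whose names are not expected in external databases yet.
NEW_NAME_STATUSES = {"sp. nov.", "comb. nov."}


@dataclass
class QCReport:
    errors: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.errors


def check_treatment(record: dict, accepted_names: set[str]) -> QCReport:
    """Run simple pre-upload checks on one extracted treatment record."""
    report = QCReport()

    # Checklist: every required field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            report.errors.append(f"missing field: {name}")

    # Name verification against a local list of accepted names (a stand-in
    # for an authoritative database). Whitespace is normalized first, since
    # OCR often introduces doubled spaces. Newly proposed names are skipped.
    status = record.get("status")
    taxon = " ".join(str(record.get("taxon_name") or "").split())
    if taxon and status not in NEW_NAME_STATUSES and taxon not in accepted_names:
        report.errors.append(f"name not found in reference list: {taxon}")

    # Annotation consistency: status tags must come from the vocabulary.
    if status is not None and status not in STATUS_VOCABULARY:
        report.errors.append(f"status not in controlled vocabulary: {status}")

    return report


if __name__ == "__main__":
    accepted = {"Apis mellifera"}
    record = {
        "taxon_name": "Apis  mellifera",  # doubled space, e.g. an OCR artifact
        "rank": "species",
        "source_citation": "Example Author (2020)",
        "page_range": "12-14",
    }
    print(check_treatment(record, accepted).passed)  # prints True
```

Records flagged as newly proposed names (sp. nov., comb. nov. in this sketch) skip the reference-list lookup because such names are not expected to appear in external databases yet. An automated check like this only narrows the work; flagged and unflagged records alike still go through the manual review and peer cross-checks described in steps 3 and 5.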

Outcomes:

Adhering to this best practice ensures that the data extracted from taxonomic literature is reliable and can be confidently used by the scientific community. It also facilitates interoperability with other biodiversity data systems, enhancing the value and reach of the digitized information.

Impact:

This best practice contributes to the creation of a robust and valuable digital taxonomic record. It supports the mission of PLAZI to make taxonomic data freely and widely available, thus advancing scientific research and biodiversity conservation.

By following such best practices, organizations and individuals using the PLAZI workflow can maintain high standards in the digitization of taxonomic literature and help ensure that the resulting data is of high quality and broadly useful.


Last modified: Saturday, 4 November 2023, 5:40 PM