Geographic And Taxonomic Occurrence R-based Scrubbing (gatoRs): An R package and workflow for processing biodiversity data

Patten, N. N., M. L. Gaynor, D. E. Soltis, and P. S. Soltis. 2024. Geographic And Taxonomic Occurrence R-based Scrubbing (gatoRs): An R package and workflow for processing biodiversity data. Applications in Plant Sciences 12:e11575.

https://doi.org/10.1002/aps3.11575

Abstract

Premise

Digitized biodiversity data offer extensive information; however, obtaining and processing biodiversity data can be daunting. Complexities arise during data cleaning, such as identifying and removing problematic records. To address these issues, we created the R package Geographic And Taxonomic Occurrence R-based Scrubbing (gatoRs).

Methods and Results

The gatoRs workflow includes functions that streamline downloading records from the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio). We also created functions to clean downloaded specimen records. Unlike previous R packages, gatoRs accounts for differences in download structure between GBIF and iDigBio and allows for user control via interactive cleaning steps.
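To illustrate why differences in download structure matter, the following is a minimal sketch, written in Python rather than R, of harmonizing records from two sources before cleaning them. It is not the gatoRs API and does not reproduce the package's internals. The column mappings, the function names (`harmonize`, `has_valid_coordinates`, `merge_and_clean`), the four-decimal rounding used to detect duplicates, and the example species are all assumptions made for illustration.

```python
# Illustrative sketch only: these mappings and helpers are hypothetical and
# are not taken from gatoRs. They assume GBIF-style Darwin Core column names
# and flattened, lower-case iDigBio-style column names.
GBIF_COLUMNS = {
    "scientificName": "scientific_name",
    "decimalLatitude": "latitude",
    "decimalLongitude": "longitude",
}
IDIGBIO_COLUMNS = {
    "scientificname": "scientific_name",
    "geopoint.lat": "latitude",
    "geopoint.lon": "longitude",
}


def harmonize(record, mapping, source):
    """Rename source-specific columns to a shared schema and tag the source."""
    out = {new: record.get(old) for old, new in mapping.items()}
    out["source"] = source
    return out


def has_valid_coordinates(record):
    """Return True only if both coordinates are present, numeric, and in range."""
    try:
        lat = float(record["latitude"])
        lon = float(record["longitude"])
    except (TypeError, ValueError):
        return False
    # NaN fails both comparisons, so it is rejected here as well.
    return -90 <= lat <= 90 and -180 <= lon <= 180


def merge_and_clean(gbif_records, idigbio_records):
    """Combine both downloads, drop records with unusable coordinates,
    and keep the first record for each (name, rounded coordinates) key."""
    merged = [harmonize(r, GBIF_COLUMNS, "GBIF") for r in gbif_records]
    merged += [harmonize(r, IDIGBIO_COLUMNS, "iDigBio") for r in idigbio_records]

    kept, seen = [], set()
    for rec in merged:
        if not has_valid_coordinates(rec):
            continue
        key = (
            str(rec["scientific_name"]).lower(),
            round(float(rec["latitude"]), 4),
            round(float(rec["longitude"]), 4),
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Made-up example records for a single species.
gbif = [
    {"scientificName": "Galax urceolata", "decimalLatitude": 35.5, "decimalLongitude": -82.5},
    {"scientificName": "Galax urceolata", "decimalLatitude": None, "decimalLongitude": -82.5},
]
idigbio = [
    {"scientificname": "galax urceolata", "geopoint.lat": "35.5", "geopoint.lon": "-82.5"},
    {"scientificname": "galax urceolata", "geopoint.lat": 36.1, "geopoint.lon": -81.7},
]
cleaned = merge_and_clean(gbif, idigbio)
```

In this sketch every step runs automatically. The package described here instead exposes cleaning as interactive steps, so that the user decides which flagged records to remove.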

Conclusions

Our pipeline enables the scientific community to process biodiversity data efficiently and is accessible to the R coding novice. We anticipate that gatoRs will be useful for both established and beginning users. Furthermore, we expect our package will facilitate the introduction of biodiversity-related concepts into the classroom via the use of herbarium specimens.

See also: GitHub – nataliepatten/gatoRs