Presentations

International conference (full text linked)
July 26, 2022

Resolving species names rapidly and accurately with the “taxastand” R package

2022 Botanical Society of America Conference
  • Joel H. Nitta
  • Wataru Iwasaki

Conference dates
July 24–27, 2022
Language
English
Presentation type
Oral presentation (general)
Venue
Anchorage, AK (online)
Country/region
United States

Recently, it has become possible to analyze biodiversity on previously unimaginable scales by leveraging large public datasets such as GBIF and GenBank. Species names are the key identifiers that allow data from such datasets to be merged. However, different datasets often apply different synonyms to the same species, which prevents the data from being merged unless each synonym is resolved to the accepted species name. Synthesizing data from multiple large datasets therefore requires software that can resolve species names rapidly, automatically, and accurately.
Here, we present the “taxastand” R package for standardizing species names across datasets. The taxastand package builds on the “taxon-tools” (https://github.com/camwebb/taxon-tools) command-line tool to enable species name matching and resolution in R, the programming environment used by many ecologists and evolutionary biologists. Features of taxastand include 1) the ability to use any user-specified reference database, 2) fully local operation (no calls to an online API), which facilitates reproducibility, 3) fuzzy matching, and 4) awareness of the rules of botanical nomenclature when resolving names.
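The idea of resolving a name against a local reference database, first by exact match and then by fuzzy match, with synonyms mapped to their accepted name, can be illustrated with a short Python sketch (Python 3.9+). This is not taxastand's R interface or the taxon-tools matching algorithm: the reference table `REFERENCE`, the function `resolve_name`, the 0.85 cutoff, and the placeholder binomials ("Aus bus" and so on) are all hypothetical, fuzzy matching is stood in for by the standard library's `difflib` similarity ratio, and author strings and botanical nomenclatural rules are ignored entirely.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical local reference table: every name maps to its accepted name.
# Placeholder binomials stand in for real taxa; no real taxonomy is implied.
REFERENCE: dict[str, str] = {
    "Aus bus": "Aus bus",  # accepted name
    "Aus cus": "Aus bus",  # synonym of Aus bus
    "Xus yus": "Xus yus",  # accepted name
}


def resolve_name(
    query: str, reference: dict[str, str], cutoff: float = 0.85
) -> Optional[str]:
    """Return the accepted name for `query`, or None if nothing matches.

    Tries an exact match first, then falls back to difflib's similarity
    ratio as a simple stand-in for fuzzy matching (an assumption, not the
    algorithm used by taxon-tools or taxastand).
    """
    name = " ".join(query.split())  # collapse stray whitespace
    if name in reference:
        return reference[name]
    # Best-scoring reference name whose similarity is at least `cutoff`
    candidates = get_close_matches(name, list(reference), n=1, cutoff=cutoff)
    if candidates:
        return reference[candidates[0]]
    return None
```

With this table, `resolve_name("Aus cus", REFERENCE)` returns `"Aus bus"` (synonym to accepted name), the misspelling `"Aus buss"` also resolves to `"Aus bus"` through the fuzzy fallback, and an unrelated name returns `None`. Trying the exact match first keeps fuzzy matching, which can produce false positives, as a fallback only.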
As a case study, we demonstrate the use of taxastand to join distribution data of Japanese ferns from GBIF to a dataset on the conservation status of Japanese ferns (the “Green List”). Of 1,092 species names in the GBIF data, taxastand successfully resolved 770 to names in the Green List. Because the Japanese pteridophyte flora includes only ca. 720 species (excluding hybrids), many of the unresolved GBIF names are likely non-native taxa or artifacts. To verify the accuracy of name resolution, we generated maps of species richness and compared them to previously published maps. Except for a few outliers, the maps were nearly indistinguishable.
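The join-and-map step of the case study can be sketched in the same hedged spirit. The Python below (3.9+) is hypothetical and is not the authors' R workflow: it assumes occurrence records arrive as `(name, latitude, longitude)` tuples, that some resolver function returns an accepted name or `None`, and that richness is counted on a simple square latitude/longitude grid. The function name `richness_by_cell` and the 1-degree default cell size are illustrative assumptions; the abstract does not say how the published maps were gridded.

```python
import math
from collections import defaultdict
from typing import Callable, Iterable, Optional


def richness_by_cell(
    occurrences: Iterable[tuple[str, float, float]],
    resolve: Callable[[str], Optional[str]],
    cell_size: float = 1.0,
) -> tuple[dict[tuple[int, int], int], set[str]]:
    """Count distinct resolved species per square grid cell (hypothetical).

    occurrences: (raw name, latitude, longitude) records, e.g. from GBIF.
    resolve: maps a raw name to an accepted reference name, or None.
    Returns per-cell species richness and the set of unresolved names.
    """
    species_in_cell: dict[tuple[int, int], set[str]] = defaultdict(set)
    unresolved: set[str] = set()
    for name, lat, lon in occurrences:
        accepted = resolve(name)
        if accepted is None:
            # Names absent from the reference list (e.g. possible
            # non-native taxa or artifacts) are set aside, not mapped.
            unresolved.add(name)
            continue
        # Assign the record to a grid cell by flooring its coordinates
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        # A set, so synonyms resolved to one accepted name count once
        species_in_cell[cell].add(accepted)
    richness = {cell: len(names) for cell, names in species_in_cell.items()}
    return richness, unresolved
```

For example, with a lookup `{"Aus bus": "Aus bus", "Aus cus": "Aus bus", "Xus yus": "Xus yus"}` passed as `lookup.get`, records for "Aus bus", "Aus cus", and "Xus yus" in the same 1-degree cell yield a richness of 2 for that cell, because the synonym collapses onto its accepted name, while a name missing from the lookup ends up in `unresolved`.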
taxastand is freely available at https://github.com/joelnitta/taxastand.

Links
URL
https://joelnitta.github.io/botany_2022_taxastand (full text)