Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Paleobiology Database (PBDB) is a non-governmental, non-profit public resource for paleontological data. It has been organized and operated by a multi-disciplinary, multi-institutional, international group of paleobiological researchers. Its purpose is to provide global, collection-based occurrence and taxonomic data for organisms of all geological ages, as well as data services to allow easy access to data for independent development of analytical tools, visualization software, and applications of all types. The Database’s broader goal is to encourage and enable data-driven collaborative efforts that address large-scale paleobiological questions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Paleobiology Database is a public resource for the global scientific community. It has been organized and operated by a multi-disciplinary, multi-institutional, international group of paleobiological researchers. Its purpose is to provide global, collection-based occurrence and taxonomic data for marine and terrestrial animals and plants of any geological age, as well as web-based software for statistical analysis of the data. The project's wider, long-term goal is to encourage collaborative efforts to answer large-scale paleobiological questions by developing a useful database infrastructure and bringing together large data sets.
http://paleobiodb.org/
The Paleobiology Database is a public database of paleontological data that anyone can use, maintained by an international non-governmental group of paleontologists.
https://paleobiodb.org/#/
A non-governmental, non-profit public database for paleontological data providing researchers and the public with information about the entire fossil record. It has been organized and operated by a multi-disciplinary, multi-institutional, international group of paleobiological researchers. Its purpose is to provide global, collection-based occurrence and taxonomic data for organisms of all geological ages, as well as data services to allow easy access to data for independent development of analytical tools, visualization software, and applications of all types. The Database's broader goal is to encourage and enable data-driven collaborative efforts that address large-scale paleobiological questions. Paleontological data files are accepted for upload. However, PaleoBioDB needs some basic data types to be included in order to perform an upload. The Application Programming Interface (API) gives scientists, students, and developers programmatic access to taxonomic, spatial, and temporal data contained within the database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The Paleobiology Database is a public database of paleontological data that anyone can use, maintained by an international non-governmental group of paleontologists.
Fossil occurrences from scientific publications are added to the database by our contributing members. Thanks to our membership, which includes nearly 400 scientists from over 130 institutions in 24 countries, the Paleobiology Database is able to provide scientists and the public with information about the fossil record.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is the raw fossil data downloaded from the Paleobiology Database on April 21st 2021. These data were used for the manuscript "Deep-time climate legacies affect origination rates of marine genera". The data are deposited in this additional repository because the file was too big for the accompanying GitHub repository.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The Probiotics Database (PBDB) is a comprehensive bioinformatics resource that systematically collects and categorizes probiotic strains derived from diverse fermented food sources. Currently housing 1,730 well-characterized probiotic strains, the database features a structured classification system including production-related (768 strains), research-oriented (264 strains), environmental protection (22 strains), disease prevention and treatment (17 strains), and other specialized categories (659 strains). Designed as a dynamic knowledge platform, PBDB provides essential biological information to support research across human health, animal science, and agricultural applications, with regular updates to incorporate new discoveries. Beyond serving as a reference repository, the database enables advanced bioinformatics analyses to elucidate probiotic mechanisms and facilitates the computational prediction of novel probiotic candidates from unexplored microbial resources. By integrating empirical data with analytical tools, PBDB significantly enhances our capacity to understand, develop, and apply probiotic solutions across multiple scientific and industrial domains.
200 years after the naming of the first dinosaur, taxonomic studies remain an important component of dinosaur research. Around 50 new dinosaurs are named each year, and are discovered from across the globe. The rate of new dinosaur discovery shows no signs of slowing, but not all geographic areas and temporal windows have been equally investigated. The potential for new dinosaur discoveries in India and Africa seems particularly high, while the Carnian, when dinosaurs probably originated, and the Middle Jurassic, when the major clades diversified, offer the best opportunities to make discoveries that will fundamentally change our understanding of dinosaur evolution. A major challenge to the discovery of new dinosaurs is funding. Frontier fieldwork is sometimes viewed as too risky to fund, while basic taxonomic work is considered to lack impact. As a consequence, we risk an ‘extinction of experience’, where researchers have limited training in the basic field and specimen-based research ...
Collector curves–All dinosaur regular genera and species, both valid and invalid, were downloaded from the Paleobiology Database (PBDB; paleobiodb.org) on 17th December 2024. The data were cleaned to remove Avialae, ichnotaxa, and ootaxa. Taxa that were listed as invalid due to misspellings, obsolete variants, or that were renamed for grammatical or linguistic reasons were removed. Nomina dubia, nomina nuda, objective and subjective synonyms, and recombinations were retained. Collector curves (Fig. 1) were built in R 3.4.0 [124]. Code and raw data are available in the Supplementary Material.
Time-calibrated phylogeny–A consensus dinosaur phylogeny was manually produced in Mesquite [125]. First and last appearance data were collected for all taxa in the phylogeny and are listed in the data file provided in the Supplementary Material. First and last appearances generally correspond to the earliest and latest dates of the Stage from which the taxon is known, unless more accurate info...
# New frontiers in dinosaur exploration
https://doi.org/10.5061/dryad.05qfttfd3
These data were collected to review the state of dinosaur taxonomy and systematics today, as part of an invited review titled 'New Frontiers in Dinosaur Exploration'. The raw data tables in xlsx and csv format were downloaded from the Paleobiology Database or Scopus and then cleansed according to the methods provided here and in the publication. The .txt file was compiled from the literature, while the .nex file is a phylogenetic tree that represents a consensus dinosaur phylogeny and was hand-built in Mesquite.
Description: A file showing the first and last appearance data for dinosaur taxa in the phylogenetic tree. This file is needed for time-calibration of the phylogenetic tree (DinotreeR1.nex) and is used in the code "Time-calibration_palaeotree.R".
A Plant Proteome DataBase for Arabidopsis thaliana and maize (Zea mays). The PPDB stores experimental data from in-house proteome and mass spectrometry analysis, curated information about protein function, protein properties and subcellular localization. Importantly, proteins are particularly curated for possible (intra) plastid location and their plastid function. Protein accessions identified in published Arabidopsis (and other Brassicaceae) proteomics papers are cross-referenced to rapidly determine previous experimental identification by mass spectrometry. All protein-encoding gene models in the Arabidopsis nuclear and organellar genomes, as assembled by TAIR, as well as all maize EST assemblies (ZmGI), as assembled by the DFCI Maize Gene Index project, are uploaded in PPDB and are linked to each other via a BLAST alignment. Thus every predicted protein in both species can be searched for experimental and other information (even if not experimentally identified).
The Fezouata Shale Formation has dramatically impacted our understanding of early Ordovician marine ecosystems before the Great Ordovician Biodiversification Event (GOBE), thanks to the abundance and quality of exceptionally preserved animals within. Systematic work has noted that the shelly fossil sub-assemblages of the Fezouata Shale biota are typical of open-marine deposits from the Lower Ordovician, but no studies have tested the quantitative validity of this statement. We extracted 491 occurrences of recalcitrant fossil genera from the Paleobiology Database to reconstruct 31 sub-assemblages, to explore the paleoecology of the Fezouata Shale and other contemporary, high-latitude (66°S – 90°S) deposits from the Lower Ordovician (485.4 Ma – 470 Ma) and test the interpretation that the Fezouata Shale biota is typical for an Ordovician open-marine environment. Sørensen’s dissimilarity metrics and Wilcoxon tests indicate that the sub-assemblages of the Tremadocian-aged lower Fezouata Sha...
Script 1: RStudio script used to process and manipulate datasets, and to create visualizations and run statistical tests.
Dataset 1: Early Ordovician occurrence data downloaded from the Paleobiology Database (Early Ordovician PBDB).
Dataset 2: Taxa from Dataset 1 that were categorized as either Biomineralizing or Conventional. The sheet labeled "Notes and Legend for Excel Docu" outlines the symbols used to delineate classification of taxonomic levels examined in all other sheets. The sheet labeled "Phyla Level" pertains to which metazoan phyla are biomineralizing, conventional, etc. The sheet labeled "Class Level" pertains to which metazoan classes are biomineralizing, conventional, etc. The sheet labeled "Order Level" pertains to which metazoan orders are biomineralizing, conventional, etc. The sheet labeled "Family Level" pertains to which metazoan families are biomineralizing, conventional, etc. The sheet labeled "Genera Level" pertains to which metazoan genera are biomineralizi...
# Data from: the Fezouata Shale Formation biota is typical for the high latitudes of the early Ordovician – a quantitative approach
Dataset 1: Downloaded from the Paleobiology Database (Early Ordovician PBDB).
Data Provider/Source: The Paleobiology Database
These occurrence data were downloaded from the Paleobiology Database on June 21st 2022, setting the maximum age to 485.4 and the minimum age to 443.8.
Dataset 2: Taxa from Dataset 1 that were categorized as either Biomineralizing or Conventional. The sheet labeled "Notes and Legend for Excel Docu" outlines the symbols used to delineate classification of taxonomic levels examined in all other sheets. The sheet labeled "Phyla Level" pertains to which metazoan phyla are biomineralizing, conven...
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
Geographic range is used as a correlate of extinction risk for extant and extinct organisms across the fields of conservation and paleobiology. However, the exact method used to measure geographic range, the biases, and the limitations of each are rarely discussed explicitly despite their potential to impact conclusions. Here I examine and quantify properties of five commonly used measures of geographic range (convex hull area, maximum pairwise great circle distance, latitudinal range, longitudinal range, and cell count) along with a rarely used measure (minimum spanning tree distance) in the context of three datasets: a simulated dataset of two shapes with known areal limits, a paleontological occurrence dataset of pre-Cenozoic brachiopod genera from the Paleobiology Database (PBDB), and 50,000 occurrence records of bird species in the western hemisphere from the eBird database.
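Four of the range measures named above are simple enough to sketch directly. The following Python example is only an illustration under stated assumptions, not the author's code: the function names, the 5-degree cell size, the mean Earth radius, and the three sample points are all made up for the example. The longitudinal range is the naive maximum minus minimum and ignores wrap-around at the antimeridian. Convex hull area and minimum spanning tree distance are not covered.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(p, q):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def range_measures(points, cell_deg=5.0):
    """Return four simple geographic range measures for a list of (lat, lon) points."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    # Largest great-circle distance over all pairs of occurrences
    max_gcd = max(
        (great_circle_km(p, q) for p, q in combinations(points, 2)),
        default=0.0,
    )
    # Number of distinct grid cells (cell_deg x cell_deg) that contain an occurrence
    cells = {
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    }
    return {
        "latitudinal_range": max(lats) - min(lats),
        "longitudinal_range": max(lons) - min(lons),  # naive, no antimeridian handling
        "max_great_circle_km": max_gcd,
        "cell_count": len(cells),
    }


# Hypothetical occurrences of one genus as (latitude, longitude)
occurrences = [(0.0, 0.0), (0.0, 90.0), (10.0, 45.0)]
measures = range_measures(occurrences)
```

Cell count depends on the chosen cell size, while the great-circle distance depends only on the two most distant points, which is one reason the measures can rank the same taxa differently.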
https://spdx.org/licenses/CC0-1.0.html
This is the supplementary data repository of the Paleobiology paper titled Bedrock Geological Map Predictions for Phanerozoic Fossil Occurrences. Geographically-explicit, taxonomically resolved fossil occurrences are necessary for reconstructing macroevolutionary patterns and for testing a wide range of hypotheses in the Earth and life sciences. Heterogeneity in the spatial and temporal distribution of fossil occurrences in the Paleobiology Database (PBDB) is attributable to several different factors, including turnover among biological communities, socioeconomic disparities in the intensity of paleontological research, and geological controls on the distribution and fossil yield of sedimentary deposits. Here we use the intersection of global geologic map data from Macrostrat and fossil collections in the PBDB to assess the extent to which the potentially fossil-bearing, surface-expressed sedimentary record has yielded fossil occurrences. We find a significant and moderately strong positive correlation between geologic map area and the number of fossil occurrences. This correlation is consistent regardless of map unit age and binning protocol, except at period level; the Neogene and Quaternary have non-marine map units covering large areas and yielding fewer occurrences than expected. The sedimentary record of North America and Europe yields significantly more fossil occurrences per sedimentary area than similarly-aged deposits in most of the rest of the world. However, geographic differences in area and age of sedimentary deposits lead to regionally different expectations for fossil occurrences. Using the sampling of surface-expressed sedimentary units in North America and Europe as a predictor for what might be recoverable from the surface-expressed sedimentary deposits of other regions, we find that the rest of the globe is approximately 45% as well sampled in the PBDB. 
Using age and area of bedrock and sampling in North America and Europe as a basis for prediction, we estimate that over 639 thousand occurrences from outside of these regions would need to be added to the PBDB to achieve global geological parity in sampling. In general, new terrestrial fossil occurrences are expected to have the greatest impact on macroevolutionary patterns.
A plant promoter database that provides information on transcription start sites (TSSs), core promoter structure and regulatory element groups (REGs) as putative and comprehensive transcriptional regulatory elements. Microarray data-based predictions have been appended as REG annotations which inform their putative physiological roles.
https://spdx.org/licenses/CC0-1.0.html
Understanding how biodiversity has changed through time and space is a central aim of paleobiology. To elucidate accurate biodiversity patterns in deep time, regional case studies, where sampling biases can be minimized, are needed. The Upper Jurassic Morrison Formation of the western USA crops out over 1.2 million km2 and covers 12 degrees of latitude. It was deposited over a ~9-million-year time period and was home to some of the most iconic dinosaurs. Utilizing a new, high-resolution chronostratigraphic framework for the formation, tetrapod occurrences from the Paleobiology Database were temporally and spatially mapped to examine patterns of diversity change through time and space, and the geographic ranges of taxa were examined to shed light on niche partitioning. Latitudinally, diversity was found to peak in the center of the basin, perhaps due to the availability of water resources. Diversity increased over time in the Morrison Formation, and there is no evidence to indicate a decline in diversity prior to the extinction of the fauna at the end of the Jurassic. There appears to be some degree of geographic separation of faunas in the Morrison basin, with southeastern and northwestern fauna, albeit with a number of overlapping taxa. High-resolution climate models paired with detailed sedimentological analysis could help to elucidate the drivers of the patterns observed here.
Methods
All vertebrate occurrences in the Morrison Formation were downloaded from the Paleobiology Database (PBDB; paleobiodb.org; accessed 23/12/2022). The data were visually inspected and occurrences related to eggshells or tracks were removed, leaving only those pertaining to body fossils. This resulted in 1397 occurrences. Taxonomy was cleansed following the recent literature. Occurrences were manually attributed to systems tracts described in Maidment & Muxworthy (2019) based on stratigraphic logs or descriptions in the literature for each locality and supplemented with first-hand observations of a number of quarries. A full list of quarries, systems tracts, and references for the stratigraphic location are provided in the spreadsheet “Quarry data.csv” in the Online Supplementary Material available with the manuscript. As not all references provided stratigraphic logs or descriptions, it was not always possible to attribute quarries to stratigraphic locations, but 1144 occurrences (82%) could be attributed to a systems tract. The occurrences represent 300 discrete collections, for which stratigraphic data is known for 182 (60%). 957 occurrences are identified to the generic level or better, of which 799 could be assigned stratigraphic data (83%). These data are available in this data package in the spreadsheet “Occurrence data with STs.csv”. Diversity analyses were carried out in R ver. 4.0.4 (R Core Team, 2021) using the Tidyverse package (Wickham et al., 2019) and all code is available in this data package.
Latitudinal biodiversity
Raw diversity—In order to assess how biodiversity changed with latitude in the Morrison Formation, two measures were used. The first measure was diversity, which herein equates to generic richness. Generic occurrence data (available in this data package in the spreadsheet “Genera_with_latitude.xlsx”) were binned per degree of latitude and the number of distinct genera in each latitudinal bin was summed. This was carried out for the total dataset and for data within each system tract. The second measure was abundance. An occurrence in the PBDB is the presence of a taxon within a collection; however, for some collections, there were multiple occurrences of the same taxon, and that is signified in the PBDB using abundance data. Abundance was calculated for each collection based on the “abund_value” column in the PBDB data. Where no abundance was specified for an occurrence, the abundance was assumed to be equal to one. Not all abundances are equal: a single abundance datapoint might indicate a single, more-or-less complete articulated sauropod skeleton or might refer to a single isolated fish scale. Microsites and bone beds are therefore heavily over-represented in the abundance data, while sites with articulated skeletons may be under-represented. Abundance data was binned per degree of latitude. This data is available in the spreadsheet “corrected abundance with latitude.xlsx” in this data package.
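The binning described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the records are invented, the function name is hypothetical, and only the "abund_value" column name and the rule that a missing abundance counts as one come from the text.

```python
import math
from collections import defaultdict

# Hypothetical occurrence records: (genus, latitude in degrees, abund_value or None)
occurrences = [
    ("Allosaurus", 38.2, None),
    ("Allosaurus", 38.7, 3),
    ("Diplodocus", 38.9, 2),
    ("Camarasaurus", 40.1, None),
    ("Diplodocus", 40.6, None),
]


def bin_by_latitude(records):
    """Return (generic richness, summed abundance) per one-degree latitude bin."""
    genera = defaultdict(set)
    abundance = defaultdict(int)
    for genus, lat, abund in records:
        lat_bin = math.floor(lat)  # e.g. 38.7 falls in the 38-39 degree bin
        genera[lat_bin].add(genus)
        # Where no abundance is recorded, assume a single specimen
        abundance[lat_bin] += 1 if abund is None else abund
    richness = {b: len(g) for b, g in genera.items()}
    return richness, dict(abundance)


richness, abundance = bin_by_latitude(occurrences)
```

Richness counts each genus once per bin regardless of how many occurrences it has, whereas abundance sums specimen counts, so a single bone bed can dominate the abundance series without changing richness.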
In order to assess whether the raw diversity patterns observed were influenced by sampling bias, the number of collections per degree of latitude was calculated from the PBDB occurrence data. This data is available in the spreadsheet “Collections_with_latitude.xlsx” in this data package. Diversity, abundance, and collections were plotted against latitude in R, and correlations between the curves were investigated using Spearman’s Rho, Kendall’s Tau, and generalized least squares regression using a first-order autoregressive model (corARMA). The latter was carried out because it reduces the chances of overestimating the statistical significance of regression lines due to serial correlation in the latitudinal series. Data was naturally log-transformed prior to GLS regression, which was carried out using the gls() function in the R package nlme (Pinheiro et al., 2018). Code for these analyses is available in this data package as “diversity_analysis_code.R”.
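As an illustration of the rank-based comparison mentioned above, the sketch below computes Spearman's rho as the Pearson correlation of average ranks. It is written in plain Python rather than the R used in the study, the per-bin values are invented, and it does not cover Kendall's tau or the GLS regression with an autoregressive error structure.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-latitude-bin series: generic richness and number of collections
diversity = [2, 5, 9, 14, 7]
collections = [1, 4, 10, 20, 6]
rho = spearman_rho(diversity, collections)
```

A rho close to 1 for richness against collection counts would indicate that the raw diversity curve largely tracks sampling effort.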
Subsampled diversity—In order to account for the strong degree of sampling bias observed in the data (see Results), shareholder quorum sub-sampling (SQS; Alroy 2010) was carried out on the whole dataset using the ‘estimateD’ command and a confidence interval of 0.95 in the R package iNEXT (Hsieh et al. 2016). The analysis was carried out in R ver. 4.0.4 (R Core Team, 2021) and the code is available in this data package, “iNext_code.R”. Cleansed generic occurrence data from the PBDB was used, and abundances of specific taxa were calculated for each degree of latitude. Investigation of the corrected abundance data (see above) indicated that it was overwhelmed with occurrences from two sites: specimens of Diplodocus from the Mother’s Day Quarry in southern Montana (1483 specimens recorded), and specimens of Allosaurus from the Dry Mesa Quarry of Utah (200 specimens recorded). These quarries are bone beds and the abundance values most likely relate to the number of individual bones found, rather than the number of individuals that were actually present. Using these abundance data when sample-standardizing is therefore problematic, and consequently, it was not used. To investigate whether latitudinal bins with very low sample sizes and a limited number of generic occurrences were impacting the results of the analysis, the latitudinal data were examined and latitudinal bins with fewer than 10 occurrences were removed. The analysis was re-run. SQS was carried out at quorum levels from 0.7 to 0.3. Sub-sampled diversity analyses were also attempted for each system tract, but there was too little data to provide meaningful results.
Temporal diversity
Raw diversity—In order to assess how diversity changed through time in the Morrison Formation, diversity (=generic richness) and abundance were again used. Cleansed occurrences and abundance data from the PBDB were binned by systems tract; those for which no systems tract data was known were discarded. Diversity and abundance were plotted against systems tract in R.
Subsampled diversity—To account for different levels of sampling in different systems tracts, shareholder quorum subsampling was carried out on cleansed occurrence data for each systems tract, following the method used for latitudinal diversity. SQS was carried out with quorum levels from 0.7 to 0.3.
Collector curves
In order to assess how well sampled the B4 and C6 systems tracts were relative to each other (see Discussion), collector curves, showing the cumulative number of unique collections and the cumulative number of new taxa identified per year for the B4 and C6 systems tracts were built using the year the occurrence was published, which was provided in the PBDB download. The data is contained in the spreadsheet “Collector_curve_data.xlsx” and the code is provided as “collector_curve_code” in this data package.
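A collector curve of this kind reduces to a cumulative count of distinct taxa by publication year. The Python sketch below is a hedged illustration only: the function name, the placeholder taxon labels, and the years are hypothetical, and the study's own code is in R.

```python
def collector_curve(records):
    """Return [(year, cumulative number of distinct taxa)] in chronological order.

    records: list of (publication_year, taxon) pairs.
    """
    seen = set()
    curve = []
    for year in sorted({y for y, _ in records}):
        # Add every taxon reported in this year; repeats do not raise the count
        seen.update(taxon for y, taxon in records if y == year)
        curve.append((year, len(seen)))
    return curve


# Hypothetical occurrences for one systems tract: (year published, taxon)
records = [
    (1990, "genus_a"),
    (1990, "genus_b"),
    (1995, "genus_c"),
    (2003, "genus_a"),
    (2010, "genus_d"),
]
curve = collector_curve(records)
```

A curve that flattens in recent years suggests that new collecting is mostly recovering known taxa, while a curve that is still rising suggests the unit remains under-sampled.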
Yacobucci_PBIO_Suppl_Table1: Dataset of Cenomanian-Turonian cephalopod occurrences, derived from the Paleobiology Database and the M.S. thesis of Richard A. MacKenzie III (2007, Bowling Green State University, Bowling Green, OH, USA).
Yacobucci_PBIO_Suppl_Table2: This file contains tables of p-values from all Mann-Whitney U-tests run for this analysis, including comparisons of global and regional latitudinal distributions and geographic range sizes.
Yacobucci_PBIO_Appendix: This Appendix describes the two datasets used in this paper, the Paleobiology Database dataset (PBDB) and Richard MacKenzie Database (MKDB), and provides citations for the original data sources for the Richard MacKenzie Database (MKDB).
Conservation planners and resource managers are concerned about ecological resilience and survival of species as climate and sea level change. The fossil record contains an excellent means to test species responses to changing conditions. This dataset utilizes molluscan faunal data extracted from a fossil database – the Paleobiology Database (PBDB; https://paleobiodb.org/classic) – for the late Pleistocene through Holocene (129,000 years before present (ybp) to present), limited to the south Florida region, as a way to address the question of how many molluscan taxa survived the significant changes to Florida’s coastline over approximately the last 129,000 years. The initial PBDB download was cleaned by eliminating duplicate entries and invalid taxa. After the data cleaning and validation, 347 taxa remained (327 late Pleistocene, and 20 Holocene); of these, 314 are considered valid taxa for this study (294 late Pleistocene, 20 Holocene). The remaining 33 taxa had some uncertainty in their taxonomic standing that could not be resolved, but the names were retained for portions of the analysis. All 347 taxa were compared to databases and published lists of extant mollusks to determine which taxa have survived to the present, and if they are still found within Florida. When only the 314 valid species are examined for the late Pleistocene and Holocene, 93% of the taxa are still alive today, indicating survival throughout the last glacial cycle; 7% went extinct; and <1% were locally extirpated. Surviving species drop to 86% and extinct species rise to 13% if the 33 uncertain taxa are included for the late Pleistocene and Holocene. If just the late Pleistocene (0.129 Ma to 0.0117 Ma) valid taxa are compared to extant fauna, 92% survived, 8% went extinct, and less than 1% were locally extirpated.
These data suggest that the molluscan fauna of south Florida are relatively resilient to significant changes, information that can be of value as resource managers develop conservation plans for changing conditions. The work described here is funded by the Greater Everglades Priority Ecosystem Science program of the USGS.
https://spdx.org/licenses/CC0-1.0.html
Strata of the Ediacaran Period (635-538.8 Ma) yield the oldest known fossils of complex, macroscopic organisms in the geologic record. These “Ediacaran-type” macrofossils (known as the Ediacaran biota) first appear in mid-Ediacaran strata, experience an apparent decline through the terminal Ediacaran, and directly precede the Cambrian (538.8-485.4 Ma) radiation of animals. Existing hypotheses for the origin and demise of the Ediacaran biota include: changing oceanic redox states, biotic replacement by succeeding Cambrian-type fauna, and mass extinction driven by environmental change. Few studies frame trends in Ediacaran and Cambrian macroevolution from the perspective of the sedimentary rock record, despite well-documented Phanerozoic covariation of macroevolutionary patterns and sedimentary rock quantity. Here we present a quantitative analysis of North American Ediacaran–Cambrian rock and fossil records from Macrostrat and the Paleobiology Database. Marine sedimentary rock quantity increases nearly monotonically and by over a factor of five from the latest Ediacaran to the late Cambrian. Ediacaran–Cambrian fossil quantities exhibit a comparable trajectory and have strong (rs > 0.8) positive correlations with marine sedimentary area and volume flux at multiple temporal resolutions. Even so, Ediacaran fossil quantities are dramatically reduced in comparison to the Cambrian when normalized by the quantity of preserved marine rock. Although aspects of these results are consistent with the expectations of a simple fossil-preservation induced sampling bias, together they suggest that transgression-regression and a large expansion of marine shelf environments coincided with the diversification of animals during a dramatic transition that is starkly evident in both the sedimentary rock and fossil records. 
Methods: Two existing datasets, Macrostrat's database of rocks/stratigraphy and a subset of Paleobiology Database (PBDB) fossil occurrence data, were merged on the basis of their shared rock unit name field for Ediacaran-Cambrian age (635-485.4 Ma) rocks/fossils of North America. Once PBDB fossil occurrences were matched to Macrostrat rock units in time and space (and checked), the fossil occurrence age ranges were modified based on the Macrostrat-provided age model of a given fossil occurrence's host rock. Time series of fossil occurrences were generated from these updated data. Correlation coefficients were calculated from the generated time series of fossil occurrences and rock quantities through the Ediacaran-Cambrian geologic time Periods.
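The merge-then-rebin workflow can be sketched in a few lines. The Python example below is a simplified illustration under stated assumptions: it matches on unit name only (the study also checked matches in time and space), and every unit name, taxon label, age, field name, and function name is hypothetical rather than taken from Macrostrat or the PBDB.

```python
# Hypothetical rock units keyed by name: (older age bound, younger age bound) in Ma
rock_units = {
    "Unit A": (520.0, 510.0),
    "Unit B": (545.0, 540.0),
}

# Hypothetical fossil occurrences with their originally reported, coarse age ranges
occurrences = [
    {"taxon": "taxon_1", "unit": "Unit A", "max_ma": 538.8, "min_ma": 485.4},
    {"taxon": "taxon_2", "unit": "Unit A", "max_ma": 538.8, "min_ma": 485.4},
    {"taxon": "taxon_3", "unit": "Unit B", "max_ma": 635.0, "min_ma": 538.8},
    {"taxon": "taxon_4", "unit": "Unit C", "max_ma": 538.8, "min_ma": 485.4},
]


def apply_unit_ages(occs, units):
    """Replace each occurrence's age range with that of its host rock unit.

    Occurrences whose unit name has no match are returned separately.
    """
    matched, unmatched = [], []
    for occ in occs:
        ages = units.get(occ["unit"])
        if ages is None:
            unmatched.append(occ)
        else:
            matched.append({**occ, "max_ma": ages[0], "min_ma": ages[1]})
    return matched, unmatched


def count_per_bin(occs, bins):
    """Count occurrences whose age range overlaps each (older, younger) time bin."""
    return [
        sum(1 for o in occs if o["max_ma"] > younger and o["min_ma"] < older)
        for older, younger in bins
    ]


matched, unmatched = apply_unit_ages(occurrences, rock_units)
time_bins = [(550.0, 540.0), (540.0, 530.0), (530.0, 520.0), (520.0, 510.0)]
series = count_per_bin(matched, time_bins)
```

The resulting per-bin counts form the kind of time series that could then be compared against rock quantity per bin with a rank correlation.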
Online Appendix: This online Appendix presents the detailed derivation of the model used by Andréoletti, Zwaans et al., as well as supplementary results and figures. We extend results of Gupta et al. (2020) and Manceau et al. (2021) to piecewise-constant parameters, describe our implementation in the RevBayes software, and give detailed information on all priors used for simulation or inference in our analyses.

Cetacean molecular, morphological and occurrence datasets: The initial raw files are included, but the modified files actually used in the analysis are the following:
 - Taxa: Cetacea_genera.csv
 - Nuclear sequences: M4358_nuclear_simplified_newNames_genera_removeOutgroups.nex
 - Mitochondrial sequences: M4376_mt_simplified_newNames_genera_removeOutgroups.nex
 - Morphological characters: morpho_simplified_newNames_genera_removeOutgroupsUndescribedInvariants.nex
 - Fossil occurrences: Cetacea_occurrences_min_max_age_species_corrected.csv
All modifications are described in the methods section and/or below.

Molecular dataset:
 - "newNames" = names updated from the PBDB in May 2020 (physeter catodon -> Physeter macrocephalus)
 - "genera" = only the most complete specimen in each genus has been kept (present in the morphological dataset, then longest nuclear sequence) for genus-level analyses // Removed species: Balaenoptera acutorostrata, Balaenoptera bonaerensis, Balaenoptera borealis, Balaenoptera brydei, Balaenoptera edeni, Balaenoptera musculus, Balaenoptera omurai, Berardius arnuxii, Cephalorhynchus commersonii, Cephalorhynchus eutropia, Cephalorhynchus hectori, Delphinus capensis, Delphinus tropicalis, Eubalaena australis, Eubalaena japonica, Globicephala melas, Hyperoodon planifrons, Kogia simus, Lagenorhynchus acutus, Lagenorhynchus obliquidens, Lagenorhynchus australis, Lagenorhynchus cruciger, Lagenorhynchus obscurus, Lissodelphis peronii, Mesoplodon bidens, Mesoplodon bowdoini, Mesoplodon carlhubbsi, Mesoplodon densirostris, Mesoplodon stejnegeri, Mesoplodon ginkgodens, Mesoplodon grayi, Mesoplodon hectori, Mesoplodon layardii, Mesoplodon mirus, Mesoplodon perrini, Mesoplodon peruvianus, Mesoplodon traversii, Phocoena dioptrica, Phocoena sinus, Phocoena spinipinnis, Platanista minor, Sotalia guianensis, Stenella attenuata, Stenella clymene, Stenella frontalis, Stenella longirostris, Tursiops aduncus
 - remove outgroups (Bos taurus, Sus scrofa, Hippopotamus amphibius)

Morphological dataset:
 - morpho_conservative.nex: initial dataset
 - "newNames" = names updated from the PBDB in May 2020
 - "simplified" = simpler NEXUS files for RevBayes
 - "genera" = only the most complete specimen in each genus has been kept (lowest missing proportion, then highest number of unambiguous states) for genus-level analyses // Removed species: Atocetus nasalis, Brachydelphis jahuayensis, Haborophocoena minutus, Lophocetus repenningi, Odobenocetops peruvianus, Otekaikea huata, Parapontoporia wilsoni
 - remove outgroups (Bos taurus, Sus scrofa, Hippopotamus amphibius)
 - remove undescribed taxa (CCNHM 1078, CCNHM 208, CCNHM 210, CCNHM 567, CCNHM Schizodelphis, ChM PV2758, ChM PV2761, ChM PV2764, ChM PV4178, ChM PV4745, ChM PV4746, ChM PV4755, ChM PV4802, ChM PV4834, ChM PV4961, ChM PV5711, ChM PV5720, ChM PV5852, ChM PV7679, Schizodelphis morckhoviensis, Xenorophus sp.)
 - remove invariant characters
 - remove uncertainty-polymorphism (viewed as missing)

Occurrence dataset:
 - downloaded from the Paleobiology Database (PBDB) on May 11th 2020

Funding provided by: ETH Zürich Postdoctoral Fellowship

Online Appendix: available in the Related Works section

Cetacean Datasets: [copied from the subsection Material and methods > Cetacean data analysis > Molecular, morphological and occurrence datasets of the main paper] The data can be subdivided into three parts: molecular, morphological, and occurrences. Datasets were collected and analysed separately and are stored on the Open Science Framework (https://osf.io) ([dataset] Aguirre-Fernández et al., 2020). Molecular data comes from Steeman et al. (2009), and comprises 6 mitochondrial and 9 nuclear genes, for 87 of the 89 accepted extant cetacean species. Morphological data was obtained from Churchill et al. (2018), the most recent version of a widely-used dataset first produced by Geisler and Sanders (2003). After merging 2 taxa that are now considered synonyms on the Paleobiology Database (PBDB) and removing 3 outgroups that would have violated our model's assumptions, it now contains 327 variable morphological characters for 27 extant and 90 fossil taxa (mostly identified at the species level but 21 remain undescribed). In order to speed up the analysis we further excluded the undescribed specimens and reduced this dataset to the generic level by selecting the most complete specimen in each genus. Indeed, the computing cost increases quadratically with the maximum number of hidden linea...
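The genus-level reduction of the morphological dataset described above (lowest missing proportion, then highest number of unambiguous states) can be illustrated with a minimal Python sketch. This is not the authors' script: the `Taxon` class, the helper names, and the state encoding (one string per character, "?" for missing, a multi-symbol string such as "{01}" for a polymorphic or uncertain state) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Taxon:
    """Hypothetical container: a binomial name and one coded state per character."""
    name: str
    states: List[str]


def missing_fraction(states: List[str]) -> float:
    """Proportion of characters coded as missing ("?")."""
    return sum(s == "?" for s in states) / len(states)


def unambiguous_count(states: List[str]) -> int:
    """Number of characters coded with exactly one state symbol (not "?" or "-")."""
    return sum(len(s) == 1 and s not in {"?", "-"} for s in states)


def most_complete_per_genus(taxa: List[Taxon]) -> Dict[str, Taxon]:
    """Keep one taxon per genus: lowest missing fraction, ties broken by
    the highest count of unambiguous states."""
    best: Dict[str, Tuple[Tuple[float, int], Taxon]] = {}
    for taxon in taxa:
        genus = taxon.name.split()[0]  # assumes "Genus species" naming
        key = (missing_fraction(taxon.states), -unambiguous_count(taxon.states))
        if genus not in best or key < best[genus][0]:
            best[genus] = (key, taxon)
    return {genus: taxon for genus, (_, taxon) in best.items()}
```

Calling `most_complete_per_genus` on a list of `Taxon` objects returns a dictionary keyed by genus; the character codings passed in would come from the NEXUS matrix, which this sketch does not parse.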
This data set contains abundance data for fossil mollusk genera from the Late Cretaceous of the U.S. Coastal Plain published by Sohl and Koch (1983, 1984, 1987). It also contains global stratigraphic ranges, global geographic ranges, and taxonomic information for genera, downloaded from the Paleobiology Database (PBDB) at http://paleodb.org in February 2008. This data set is used to examine the link between rarity and extinction across the end-Cretaceous mass extinction in Coastal Plain mollusks.
https://spdx.org/licenses/CC0-1.0.html
Large-scale analysis of the fossil record requires aggregation of palaeontological data from individual fossil localities. Prior to computers these synoptic datasets were compiled by hand, a laborious undertaking that took years of effort and forced palaeontologists to make difficult choices about what types of data to tabulate. The advent of desktop computers ushered in palaeontology’s first digital revolution – online literature-based databases, such as the Paleobiology Database (PBDB). However, the published literature represents only a small proportion of the palaeontological data housed in museum collections. Although this issue has long been appreciated, the magnitude, and thus potential significance, of these so-called “dark data” has been difficult to determine. Here, in the early phases of a second digital revolution in palaeontology – the digitization of museum collections – we provide an estimate of the magnitude of palaeontology’s dark data. Digitization of our nine institutions’ holdings of Cenozoic marine invertebrate collections from California, Oregon, and Washington in the United States reveals that they represent 23 times the number of unique localities currently available in the Paleobiology Database. These data, and the vast quantity of similarly untapped dark data in other museum collections, will, when digitally mobilized, enhance palaeontologists’ ability to make inferences about the patterns and processes of past evolutionary and ecological changes.
Dataset contains abundance data for fossil mollusk genera from the Late Cretaceous of the U.S. Coastal Plain published by Sohl and Koch (1983, 1984, 1987). Also contains global stratigraphic ranges, global geographic ranges, and taxonomic information for genera, downloaded from the Paleobiology Database (PBDB) in February 2008. Used to examine the link between rarity and extinction across the end-Cretaceous mass extinction in Coastal Plain mollusks.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Paleobiology Database (PBDB) is a non-governmental, non-profit public resource for paleontological data. It has been organized and operated by a multi-disciplinary, multi-institutional, international group of paleobiological researchers. Its purpose is to provide global, collection-based occurrence and taxonomic data for organisms of all geological ages, as well as data services to allow easy access to data for independent development of analytical tools, visualization software, and applications of all types. The Database’s broader goal is to encourage and enable data-driven collaborative efforts that address large-scale paleobiological questions.
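As an illustration of the data services mentioned above, the following Python sketch builds a request URL for an occurrence download. It assumes version 1.2 of the PBDB data service (the `data1.2` path), its `occs/list` operation, and the `base_name` and `show` parameters; the helper name `occurrence_url` and the default output blocks are hypothetical choices, and the current data service documentation should be checked before relying on them.

```python
from urllib.parse import urlencode

# Assumed base path of the PBDB data service, version 1.2.
PBDB_BASE = "https://paleobiodb.org/data1.2"


def occurrence_url(base_name, fmt="csv", show=("coords", "class")):
    """Build a URL listing fossil occurrences of a taxon and its subtaxa.

    base_name: taxon name, e.g. "Cetacea"
    fmt:       output format suffix, e.g. "csv" or "json"
    show:      optional output blocks to include in each record
    """
    # safe="," keeps the comma-separated list of blocks readable in the URL.
    query = urlencode({"base_name": base_name, "show": ",".join(show)}, safe=",")
    return f"{PBDB_BASE}/occs/list.{fmt}?{query}"
```

The function only constructs the URL and makes no network request; the result can be passed to any HTTP client or opened in a browser.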