100+ datasets found
  1. International Registry of Reproductive Pathology Database

    • databank.illinois.edu
    • aws-databank-alb.library.illinois.edu
    Updated Nov 1, 2024
    Cite
    Kenneth B. McEntee (2024). International Registry of Reproductive Pathology Database [Dataset]. http://doi.org/10.13012/B2IDB-3175716_V1
    Explore at:
    Dataset updated
    Nov 1, 2024
    Authors
    Kenneth B. McEntee
    Description

    The International Registry of Reproductive Pathology Database is part of the pioneering work done by Dr. Kenneth McEntee to comprehensively document thousands of disease case studies. His large collection of case reports and physical samples was complemented by the development of the International Registry of Reproductive Pathology Database in the 1980s. The original FoxPro database files and a migrated Access version were completed by the College of Veterinary Medicine in 2016. CSV files exported from the Access database were completed by the University of Illinois Library in 2017.

  2. Community Registry

    • catalog.data.gov
    • data.austintexas.gov
    • +3more
    Updated Nov 25, 2025
    Cite
    data.austintexas.gov (2025). Community Registry [Dataset]. https://catalog.data.gov/dataset/neighborhood-groups-community-registry
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    data.austintexas.gov
    Description

    City of Austin Open Data Terms of Use: https://data.austintexas.gov/stories/s/ranj-cccq

    This dataset is a monthly upload of the Community Registry (www.AustinTexas.gov/CR), where community organizations such as neighborhood associations may register with the City of Austin to receive notices of land development permit applications within 500 feet of the organization's specified boundaries. This dataset can be used to contact multiple registered organizations at once by filtering/sorting, for example, by Association Type or by Association ZipCode. The organizations' boundaries can be viewed in the City's interactive map at www.AustinTexas.gov/GIS/PropertyProfile/ - the Community Registry layer is under the Boundaries/Grids folder.

    Austin Development Services Data Disclaimer: The data provided are for informational use only and may differ from official department data. Austin Development Services' database is continuously updated, so reports run at different times may produce different results. Care should be taken when comparing against other reports, as different data collection methods and data sources may have been used. Austin Development Services does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.

  3. Bookplate Registry Database

    • curate.nd.edu
    Updated Dec 15, 2023
    Cite
    Rare Books & Special Collections (2023). Bookplate Registry Database [Dataset]. http://doi.org/10.7274/r0-yq4p-t907
    Explore at:
    Available download formats: bin
    Dataset updated
    Dec 15, 2023
    Dataset provided by
    University of Notre Dame
    Authors
    Rare Books & Special Collections
    License

    https://www.law.cornell.edu/uscode/text/17/106

    Description

    The bookplate registry database focuses on the bookplates that are pasted into the front matter of a book to show ownership. The bookplate registry is a searchable database image catalog of approximately 1100 sample bookplates and library stamps from Hesburgh Libraries Rare Books and Special Collections at the University of Notre Dame. The database was created during preliminary explorations of the cataloging and database methodology necessary to support a cooperative online bookplate registry for multiple universities. The database covers both the owners of the books and the artists who created the bookplate designs. The attached files include a PowerPoint presentation given by Christian Dupont at the 41st Annual Preconference of the Rare Books and Manuscripts Section of the American Library Association in Chicago, Illinois on July 7, 2000. The presentation explains the project in more detail and the data that were collected. The dataset gives information on the bookplates that were reviewed at the University of Notre Dame Hesburgh Libraries Rare Books and Special Collections. The original site on which this information was searchable was retired in the Fall of 2021.

  4. [JeDI] - Jellyfish Database Initiative: Global records on gelatinous...

    • erddap.bco-dmo.org
    Updated Apr 3, 2018
    + more versions
    Cite
    BCO-DMO (2018). [JeDI] - Jellyfish Database Initiative: Global records on gelatinous zooplankton for the past 200 years, collected from global sources and literature (Trophic BATS project) (Plankton Community Composition and Trophic Interactions as Modifiers of Carbon Export in the Sargasso Sea ) [Dataset]. https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_526852/index.html
    Explore at:
    Dataset updated
    Apr 3, 2018
    Dataset provided by
    Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
    Authors
    BCO-DMO
    License

    https://www.bco-dmo.org/dataset/526852/license

    Area covered
    Sargasso Sea,
    Variables measured
    day, date, year, depth, month, taxon, contact, density, latitude, net_mesh, and 27 more
    Description

    The Jellyfish Database Initiative (JeDI) is a scientifically-coordinated global database dedicated to gelatinous zooplankton (members of the Cnidaria, Ctenophora and Thaliacea) and associated environmental data. The database holds 476,000 quantitative, categorical, presence-absence and presence-only records of gelatinous zooplankton spanning the past four centuries (1790-2011), assembled from a variety of published and unpublished sources. Gelatinous zooplankton data are reported to species level, where identified, but taxonomic information on phylum, family and order is reported for all records. Other auxiliary metadata, such as physical, environmental and biometric information relating to the gelatinous zooplankton records, are included with each respective entry. JeDI has been developed and designed as an open access research tool for the scientific community to quantitatively define the global baseline of gelatinous zooplankton populations and to describe long-term and large-scale trends in gelatinous zooplankton populations and blooms. It has also been constructed as a future repository of datasets, thus allowing retrospective analyses of the baseline and trends in global gelatinous zooplankton populations to be conducted in the future.

    • Access formats: .htmlTable, .csv, .json, .mat, .nc, .tsv, .esriCsv, .geoJson

    • Acquisition: synthesized by members of the Global Jellyfish Group from online databases and from unpublished and published datasets. More specific details may be found in the methods section of Lucas, C.J., et al. (2014), Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers, Global Ecol. Biogeogr. (DOI: 10.1111/geb.12169).

    • Funding: NSF Division of Ocean Sciences (NSF OCE) award OCE-1030149 (http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1030149); program manager David L. Garrison

    • DOI: 10.1575/1912/7191. More information: https://www.bco-dmo.org/dataset/526852

    • Geographic coverage: latitude -78.5 to 88.74 degrees north; longitude -180.0 to 180.0 degrees east; vertical range -10191.48 to 7632.0 m (positive down)

    • Version: 2015.01.08; duplicate records were removed on 2015.01.08, and the displayed view of this dataset is subject to updates

    • People: Robert Condon (University of North Carolina - Wilmington), Principal Investigator; Cathy Lucas (National Oceanography Centre), Co-Principal Investigator; Carlos M. Duarte (University of Western Australia), Co-Principal Investigator; Kylie Pitt (Griffith University), Co-Principal Investigator; Danie Kinkade (Woods Hole Oceanographic Institution), BCO-DMO Data Manager

    • Project: Trophic BATS (Plankton Community Composition and Trophic Interactions as Modifiers of Carbon Export in the Sargasso Sea), October 2010 to September 2014; geolocation: Sargasso Sea, BATS site

    Project description: Fluxes of particulate carbon from the surface ocean are greatly influenced by the size, taxonomic composition and trophic interactions of the resident planktonic community. Large and/or heavily-ballasted phytoplankton such as diatoms and coccolithophores are key contributors to carbon export due to their high sinking rates and direct routes of export through large zooplankton. The potential contributions of small, unballasted phytoplankton, through aggregation and/or trophic re-packaging, have been recognized more recently. This recognition comes as direct observations in the field show unexpected trends. In the Sargasso Sea, for example, shallow carbon export has increased in the last decade, but the corresponding shift in phytoplankton community composition during this time has not been towards larger cells like diatoms. Instead, the abundance of the picoplanktonic cyanobacterium Synechococcus has increased significantly. The trophic pathways that link the increased abundance of Synechococcus to carbon export have not been characterized. These observations helped to frame the overarching research question: "How do plankton size, community composition and trophic interactions modify carbon export from the euphotic zone?" Since small phytoplankton are responsible for the majority of primary production in oligotrophic subtropical gyres, the trophic interactions that include them must be characterized in order to achieve a mechanistic understanding of the function of the biological pump in the oligotrophic regions of the ocean. This requires a complete characterization of the major organisms and their rates of production and consumption. Accordingly, the research objectives are: 1) to characterize (qualitatively and quantitatively) trophic interactions between major plankton groups in the euphotic zone and rates of, and contributors to, carbon export, and 2) to develop a constrained food web model, based on these data, that will allow us to better understand current and predict near-future patterns in export production in the Sargasso Sea. The investigators will use a combination of field-based process studies and food web modeling to quantify rates of carbon exchange between key components of the ecosystem at the Bermuda Atlantic Time-series Study (BATS) site. Measurements will include a novel DNA-based approach to characterizing and quantifying planktonic contributors to carbon export. The well-documented seasonal variability at BATS and the occurrence of mesoscale eddies will be used as a natural laboratory in which to study ecosystems of different structure. This study is unique in that it aims to characterize multiple food web interactions and carbon export simultaneously and over similar time and space scales. A key strength of the proposed research is also the tight connection and feedback between the data collection and modeling components. Characterizing the complex interactions between the biological community and export production is critical for predicting changes in phytoplankton species dominance, trophic relationships and export production that might occur under scenarios of climate-related changes in ocean circulation and mixing. The results from this research may also contribute to understanding of the biological mechanisms that drive current regional to basin scale variability in carbon export in oligotrophic gyres.
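
    As a sketch of programmatic access: ERDDAP servers expose a tabledap CSV endpoint with a standard URL pattern, so a subset of JeDI can be pulled into Python as below. The endpoint URL is an assumption derived from the dataset identifier above, and the variable subset is taken from the "Variables measured" list; verify both against the dataset's info page before relying on them.

    import pandas as pd

    # Assumed standard ERDDAP tabledap CSV endpoint for this dataset;
    # ERDDAP CSV output puts a units row after the header, so skip it.
    url = ("https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_526852.csv"
           "?latitude,longitude,depth,year,taxon,density")
    df = pd.read_csv(url, skiprows=[1])
    print(df.head())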

  5. Data from: Open-data release of aggregated Australian school-level...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Monteiro Lobato (2020). Open-data release of aggregated Australian school-level information. Edition 2016.1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_46086
    Explore at:
    Dataset updated
    Jan 24, 2020
    Authors
    Monteiro Lobato
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    The file set is a freely downloadable aggregation of information about Australian schools. The individual files represent a series of tables which, when considered together, form a relational database. The records cover the years 2008-2014 and include information on approximately 9500 primary and secondary school main campuses and around 500 subcampuses. The records all relate to school-level data; no data about individuals is included. All the information has previously been published and is publicly available, but it has not previously been released as a documented, useful aggregation. The information includes: (a) the names of schools; (b) staffing levels, including full-time and part-time teaching and non-teaching staff; (c) student enrolments, including the number of boys and girls; (d) school financial information, including Commonwealth government, state government, and private funding; (e) test data, potentially for school years 3, 5, 7 and 9, relating to an Australian national testing programme known by the trademark 'NAPLAN'.

    Documentation of this Edition 2016.1 is incomplete but the organization of the data should be readily understandable to most people. If you are a researcher, the simplest way to study the data is to make use of the SQLite3 database called 'school-data-2016-1.db'. If you are unsure how to use an SQLite database, ask a guru.

    The database was constructed directly from the other included files by running the following command at a command-line prompt: sqlite3 school-data-2016-1.db < school-data-2016-1.sql Note that a few, non-consequential, errors will be reported if you run this command yourself. The reason for the errors is that the SQLite database is created by importing a series of '.csv' files. Each of the .csv files contains a header line with the names of the variables relevant to each column. This information is useful for many statistical packages, but it is not what SQLite expects, so it complains about the header. Despite the complaint, the database will be created correctly.

    Briefly, the data are organized as follows. (a) The .csv files ('comma separated values') do not actually use a comma as the field delimiter. Instead, the vertical bar character '|' (ASCII octal 174, decimal 124, hex 7C) is used. If you read the .csv files using Microsoft Excel, Open Office, or Libre Office, you will need to set the field separator to '|'. Check your software documentation to understand how to do this. (b) Each school-related record is indexed by an identifier called 'ageid'. The ageid uniquely identifies each school and consequently serves as the appropriate variable for JOIN-ing records in different data files. For example, the first school-related record after the header line in file 'students-headed-bar.csv' shows the ageid of the school as 40000. The relevant school name can be found by looking in the file 'ageidtoname-headed-bar.csv' to discover that the ageid of 40000 corresponds to a school called 'Corpus Christi Catholic School'. (c) In addition to the variable 'ageid', each record is also identified by one or two 'year' variables. The most important purpose of a year identifier is to indicate the year that is relevant to the record. For example, if one turns again to file 'students-headed-bar.csv', one sees that the first seven school-related records after the header line all relate to the school Corpus Christi Catholic School with ageid of 40000. The variable that identifies the important differences between these seven records is the variable 'studentyear'. 'studentyear' shows the year to which the student data refer. One can see, for example, that in 2008, there were a total of 410 students enrolled, of whom 185 were girls and 225 were boys (look at the variable names in the header line). (d) The variables relating to years are given different names in each of the different files ('studentsyear' in the file 'students-headed-bar.csv', 'financesummaryyear' in the file 'financesummary-headed-bar.csv'). Despite the different names, the year variables provide the second-level means for joining information across files. For example, if you wanted to relate the enrolments at a school in each year to its financial state, you might wish to JOIN records using 'ageid' in the two files and, secondarily, matching 'studentsyear' with 'financialsummaryyear'. (e) The manipulation of the data is most readily done using the SQL language with the SQLite database, but it can also be done in a variety of statistical packages; a minimal sketch of such a JOIN follows this paragraph. (f) It is our intention for Edition 2016-2 to create large 'flat' files suitable for use by non-researchers who want to view the data with spreadsheet software. The disadvantage of such 'flat' files is that they contain vast amounts of redundant information and might not display the data in the form that the user most wants it. (g) Geocoding of the schools is not available in this edition. (h) Some files, such as 'sector-headed-bar.csv', are not used in the creation of the database but are provided as a convenience for researchers who might wish to recode some of the data to remove redundancy. (i) A detailed example of a suitable SQLite query can be found in the file 'school-data-sqlite-example.sql'. The same query, used in the context of analyses done with the excellent, freely available R statistical package (http://www.r-project.org), can be seen in the file 'school-data-with-sqlite.R'.
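
    A minimal sketch of such a JOIN using Python's built-in sqlite3 module. The table and column names below (students, financesummary, girls, boys, totalincome) are assumptions inferred from the description above; check them against 'school-data-2016-1.sql' before use.

    import sqlite3

    # Open the database built by: sqlite3 school-data-2016-1.db < school-data-2016-1.sql
    con = sqlite3.connect("school-data-2016-1.db")

    # Join enrolment and finance records per school (ageid) and year.
    # Table and column names are assumptions; verify against the .sql file.
    query = """
        SELECT s.ageid, s.studentsyear, s.girls, s.boys, f.totalincome
        FROM students AS s
        JOIN financesummary AS f
          ON f.ageid = s.ageid
         AND f.financesummaryyear = s.studentsyear
        LIMIT 10;
    """
    for row in con.execute(query):
        print(row)
    con.close()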

  6. SCAR Southern Ocean Diet and Energetics Database

    • data.niaid.nih.gov
    • data.aad.gov.au
    • +3more
    Updated Jul 24, 2023
    + more versions
    Cite
    Scientific Committee on Antarctic Research (2023). SCAR Southern Ocean Diet and Energetics Database [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5072527
    Explore at:
    Dataset updated
    Jul 24, 2023
    Authors
    Scientific Committee on Antarctic Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Southern Ocean
    Description

    Information related to diet and energy flow is fundamental to a diverse range of Antarctic and Southern Ocean biological and ecosystem studies. This metadata record describes a database of such information being collated by the SCAR Expert Groups on Antarctic Biodiversity Informatics (EG-ABI) and Birds and Marine Mammals (EG-BAMM) to assist the scientific community in this work. It includes data related to diet and energy flow from conventional (e.g. gut content) and modern (e.g. molecular) studies, stable isotopes, fatty acids, and energetic content. It is a product of the SCAR community and open for all to participate in and use.

    Data have been drawn from published literature, existing trophic data collections, and unpublished data. The database comprises five principal tables, relating to (i) direct sampling methods of dietary assessment (e.g. gut, scat, and bolus content analyses, stomach flushing, and observed predation), (ii) stable isotopes, (iii) lipids, (iv) DNA-based diet assessment, and (v) energetics values. The schemas of these tables are described below, and a list of the sources used to populate the tables is provided with the data.

    A range of manual and automated checks were used to ensure that the entered data were as accurate as possible. These included visual checking of transcribed values, checking of row or column sums against known totals, and checking for values outside of allowed ranges. Suspicious entries were re-checked against the original source.

    Notes on names: Names have been validated against the World Register of Marine Species (http://www.marinespecies.org/). For uncertain taxa, the most specific taxonomic name has been used (e.g. prey reported in a study as "Pachyptila sp." will appear here as "Pachyptila"; "Cephalopods" will appear as "Cephalopoda"). Uncertain species identifications (e.g. "Notothenia rossii?" or "Gymnoscopelus cf. piabilis") have been assigned the genus name (e.g. "Notothenia", "Gymnoscopelus"). Original names have been retained in a separate column to allow future cross-checking. WoRMS identifiers (APHIA_ID numbers) are given where possible.

    Grouped prey data in the diet sample table need to be handled with a bit of care. Papers commonly report prey statistics aggregated over groups of prey - e.g. one might give the diet composition by individual cephalopod prey species, and then an overall record for all cephalopod prey. The PREY_IS_AGGREGATE column identifies such records. This allows grouped data like this to be differentiated from unidentified prey items from a certain prey group - for example, an unidentifiable cephalopod record would be entered as Cephalopoda (the scientific name), with "N" in the PREY_IS_AGGREGATE column. A record that groups together a number of cephalopod records, possibly including some unidentifiable cephalopods, would also be entered as Cephalopoda, but with "Y" in the PREY_IS_AGGREGATE column. See the notes on PREY_IS_AGGREGATE below, and the filtering sketch that follows this paragraph.
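
    A minimal sketch of that filtering step in Python with pandas. The CSV file name here is hypothetical; the column names come from the diet data table schema below.

    import pandas as pd

    # Load the diet data table (file name is hypothetical).
    diet = pd.read_csv("scar_diet.csv")

    # Rows flagged "Y" in PREY_IS_AGGREGATE summarize other rows from the
    # same source, so drop them to avoid double-counting prey records.
    non_aggregate = diet[diet["PREY_IS_AGGREGATE"] == "N"]

    # Example: number of distinct prey taxa per predator sample.
    prey_diversity = (
        non_aggregate
        .groupby(["SOURCE_ID", "PREDATOR_SAMPLE_ID"])["PREY_NAME"]
        .nunique()
    )
    print(prey_diversity.head())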

    There are two related R packages that provide data access and functionality for working with these data. See the package home pages for more information: https://github.com/SCAR/sohungry and https://github.com/SCAR/solong.

    Data table schemas

    Sources data table

    • SOURCE_ID: The unique identifier of this source

    • DETAILS: The bibliographic details for this source (e.g. "Hindell M (1988) The diet of the royal penguin Eudyptes schlegeli at Macquarie Island. Emu 88:219–226")

    • NOTES: Relevant notes about this source – if it’s a published paper, this is probably the abstract

    • DOI: The DOI of the source (paper or dataset), in the form "10.xxxx/yyyy"

    Diet data table

    • RECORD_ID: The unique identifier of this record

    • SOURCE_ID: The identifier of the source study from which this record was obtained (see corresponding entry in the sources data table)

    • SOURCE_DETAILS, SOURCE_DOI: The details and DOI of the source, copied from the sources data table for convenience

    • ORIGINAL_RECORD_ID: The identifier of this data record in its original source, if it had one

    • LOCATION: The name of the location at which the data was collected

    • WEST: The westernmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)

    • EAST: The easternmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)

    • SOUTH: The southernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)

    • NORTH: The northernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)

    • ALTITUDE_MIN: The minimum altitude of the sampling region, in metres

    • ALTITUDE_MAX: The maximum altitude of the sampling region, in metres

    • DEPTH_MIN: The shallowest depth of the sampling, in metres

    • DEPTH_MAX: The deepest depth of the sampling, in metres

    • OBSERVATION_DATE_START: The start of the sampling period

    • OBSERVATION_DATE_END: The end of the sampling period. If sampling was carried out over multiple seasons (e.g. during January of 2002 and January of 2003), this will be the first and last dates (in this example, from 1-Jan-2002 to 31-Jan-2003)

    • PREDATOR_NAME: The name of the predator. This may differ from PREDATOR_NAME_ORIGINAL if, for example, taxonomy has changed since the original publication, or if the original publication had spelling errors or used common (not scientific) names

    • PREDATOR_NAME_ORIGINAL: The name of the predator, as it appeared in the original source

    • PREDATOR_APHIA_ID: The numeric identifier of the predator in the WoRMS taxonomic register

    • PREDATOR_WORMS_RANK, PREDATOR_WORMS_KINGDOM, PREDATOR_WORMS_PHYLUM, PREDATOR_WORMS_CLASS, PREDATOR_WORMS_ORDER, PREDATOR_WORMS_FAMILY, PREDATOR_WORMS_GENUS: The taxonomic details of the predator, from the WoRMS taxonomic register

    • PREDATOR_GROUP_SOKI: A descriptive label of the group to which the predator belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)

    • PREDATOR_LIFE_STAGE: Life stage of the predator, e.g. "adult", "chick", "larva", "juvenile". Note that if a food sample was taken from an adult animal, but that food was destined for a juvenile, then the life stage will be "juvenile" (this is common with seabirds feeding chicks)

    • PREDATOR_BREEDING_STAGE: Stage of the breeding season of the predator, if applicable, e.g. "brooding", "chick rearing", "nonbreeding", "posthatching"

    • PREDATOR_SEX: Sex of the predator: "male", "female", "both", or "unknown"

    • PREDATOR_SAMPLE_COUNT: The number of predators for which data are given. If (say) 50 predators were caught but only 20 analysed, this column will contain 20. For scat content studies, this will be the number of scats analysed

    • PREDATOR_SAMPLE_ID: The identifier of the predator(s). If predators are being reported at the individual level (i.e. PREDATOR_SAMPLE_COUNT = 1) then PREDATOR_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of predators, then the PREDATOR_SAMPLE_ID identifies that group of predators. PREDATOR_SAMPLE_ID values are unique within a source (i.e. SOURCE_ID, PREDATOR_SAMPLE_ID pairs are globally unique). Rows with the same SOURCE_ID and PREDATOR_SAMPLE_ID values relate to the same predator individual or group of individuals, and so can be combined (e.g. for prey diversity analyses). Subsamples are indicated by a decimal number S.nnn, where S is the parent PREDATOR_SAMPLE_ID, and nnn (001-999) is the subsample number. Studies will sometimes report detailed prey information for a large sample, but then report prey information for various subsamples of that sample (e.g. broken down by predator sex, or sampling season). In the simplest case, the diet of each predator will be reported only once in the study, and in this scenario the PREDATOR_SAMPLE_ID values will simply be 1 to N (for N predators).

    • PREDATOR_SIZE_MIN, PREDATOR_SIZE_MAX, PREDATOR_SIZE_MEAN, PREDATOR_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the predators in the sample

    • PREDATOR_SIZE_UNITS: The units of size (e.g. "mm")

    • PREDATOR_SIZE_NOTES: Notes on the predator size information, including a definition of what the size value represents (e.g. "total length", "standard length")

    • PREDATOR_MASS_MIN, PREDATOR_MASS_MAX, PREDATOR_MASS_MEAN, PREDATOR_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the predators in the sample

    • PREDATOR_MASS_UNITS: The units of mass (e.g. "g", "kg")

    • PREDATOR_MASS_NOTES: Notes on the predator mass information, including a definition of what the mass value represents

    • PREY_NAME: The scientific name of the prey item (corrected, if necessary)

    • PREY_NAME_ORIGINAL: The name of the prey item, as it appeared in the original source

    • PREY_APHIA_ID: The numeric identifier of the prey in the WoRMS taxonomic register

    • PREY_WORMS_RANK, PREY_WORMS_KINGDOM, PREY_WORMS_PHYLUM, PREY_WORMS_CLASS, PREY_WORMS_ORDER, PREY_WORMS_FAMILY, PREY_WORMS_GENUS: The taxonomic details of the prey, from the WoRMS taxonomic register

    • PREY_GROUP_SOKI: A descriptive label of the group to which the prey belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)

    • PREY_IS_AGGREGATE: "Y" indicates that this row is an aggregation of other rows in this data source. For example, a study might give a number of individual squid species records, and then an overall squid record that encompasses the individual records. Use the PREY_IS_AGGREGATE information to avoid double-counting during analyses

    • PREY_LIFE_STAGE: Life stage of the prey (e.g. "adult", "chick", "larva")

    • PREY_SEX: The sex of the prey ("male", "female", "both", or "unknown"). Note that this is generally "unknown"

    • PREY_SAMPLE_COUNT: The number of prey individuals from which size and mass measurements were made (note: this is NOT the total number of individuals of

  7. A multiproxy database of western North America Holocene paleoclimate records...

    • climatelibrary.ecc.gov.nt.ca
    Updated Apr 19, 2021
    + more versions
    Cite
    (2021). A multiproxy database of western North America Holocene paleoclimate records [Dataset]. NWT Climate Change Library. https://climatelibrary.ecc.gov.nt.ca/dataset/a-multiproxy-database-of-western-north-america-holocene-paleoclimate-records
    Explore at:
    Dataset updated
    Apr 19, 2021
    Area covered
    Western North America
    Description

    Holocene climate reconstructions are useful for understanding the diverse features and spatial heterogeneity of past and future climate change. Here we present a database of western North American Holocene paleoclimate records. The database gathers paleoclimate time series from 184 terrestrial and marine sites, including 381 individual proxy records. The records span at least 4000 of the last 12 000 years (median duration of 10 725 years) and have been screened for resolution, chronologic control, and climate sensitivity. Records were included that reflect temperature, hydroclimate, or circulation features. The database is shared in the machine-readable Linked Paleo Data (LiPD) format and includes geochronologic data for generating site-level time-uncertain ensembles. This publicly accessible and curated collection of proxy paleoclimate records will have wide research applications, including, for example, investigations of the primary features of ocean–atmospheric circulation along the eastern margin of the North Pacific and the latitudinal response of climate to orbital changes.

  8. Data from: Site visit cross section surveys and multispectral image data...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Nov 12, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Site visit cross section surveys and multispectral image data from gaging stations throughout the Willamette and Delaware River Basins from 2022 and code for Bathymetric Mapping using Gage Records and Image Databases (BaMGRID) [Dataset]. https://catalog.data.gov/dataset/site-visit-cross-section-surveys-and-multispectral-image-data-from-gaging-stations-through
    Explore at:
    Dataset updated
    Nov 12, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Delaware River
    Description

    This data release includes cross section survey data collected during site visits to USGS gaging stations located throughout the Willamette and Delaware River Basins, along with multispectral images of these locations acquired as close in time as possible to the date of each site visit. In addition, MATLAB source code developed for the Bathymetric Mapping using Gage Records and Image Databases (BaMGRID) framework is also provided.

    The site visit data were obtained from the Aquarius Time Series database, part of the USGS National Water Information System (NWIS), using the Publish Application Programming Interface (API). More specifically, a custom MATLAB function was used to query the FieldVisitDataByLocationServiceRequest endpoint of the Aquarius API by specifying the gaging station ID number and the date range of interest and then retrieve the QRev XML attachments associated with site visits meeting these criteria. These XML files were then parsed using another custom MATLAB function that served to extract the cross section survey data collected during the site visit. Note that because many of the site visits involved surveying cross sections using instrumentation that was not GPS-enabled, latitude and longitude coordinates were not available, and no-data values (NaN) are used in the site visit files provided in this data release.

    Remotely sensed data acquired as close as possible to the date of each site visit were also retrieved via APIs. Multispectral satellite images from the PlanetScope constellation were obtained using custom MATLAB functions developed to interact with the Planet Orders API, which provided tools for clipping the images to a specified area of interest focused on the gaging station and harmonizing the pixel values to be consistent across the different satellites within the PlanetScope constellation. The data product retrieved was the PlanetScope orthorectified 8-band surface reflectance bundle. PlanetScope images are acquired with high frequency, often multiple times per day at a given location, so the search was restricted to a time window spanning from three days prior to three days after the site visit. All images meeting these criteria were downloaded and manually inspected; the highest quality image closest in time to the site visit date was retained for further analysis (a small sketch of this selection rule follows the file list below).

    For the gaging stations within the Willamette River Basin, digital aerial photography acquired through the National Agriculture Imagery Program (NAIP) in 2022 was obtained using a similar set of MATLAB functions developed to access the USGS EarthExplorer Machine-to-Machine (M2M) API. The NAIP quarter-quadrangle image encompassing each gaging station was downloaded and then clipped to a smaller area centered on the gaging station. Only one NAIP image at each gaging station was acquired in 2022, so differences in streamflow between the image acquisition date and the date of the site visit closest in time were accounted for by performing separate NWIS web queries to retrieve the stage and discharge recorded at the gaging station on the date the image was acquired and on the date of the site visit. These data sets were used as an example application of the framework for Bathymetric Mapping using Gage Records and Image Databases (BaMGRID), and this data release also provides MATLAB source code developed to implement this approach.
    The code is packaged in a zip archive that includes the following individual .m files:

    • getSiteVisit.m: retrieves data collected during site visits to USGS gaging stations through the Aquarius API

    • Qrev2depth.m: parses the XML file from the site visit and extracts depth measurements surveyed along a channel cross section during a direct discharge measurement

    • orderPlanet.m: searches for and orders PlanetScope images via the Planet Orders API

    • pollThenGrabPlanet.m: queries the status of an order and then downloads PlanetScope images requested through the Planet Orders API

    • organizePlanet.m: file management and cleanup of the original PlanetScope image data obtained via the previous two functions

    • ingestNaip.m: searches for, orders, and downloads NAIP data via the USGS Machine-to-Machine (M2M) API

    • naipExtractClip.m: clips the downloaded NAIP images to the specified area of interest and performs file management and cleanup

    • crossValObra.m: performs spectrally based depth retrieval via the Optimal Band Ratio Analysis (OBRA) algorithm using a k-fold cross-validation approach intended for small sample sizes

    The files provided through this data release include:

    • A zipped shapefile with polygons delineating the Willamette and Delaware River basins

    • .csv text files with information on site visits within each basin during 2022

    • .csv text files with information on PlanetScope images of each gaging station close in time to the date of each site visit, which can be used to obtain the image data through the Planet Orders API or Planet Explorer web interface

    • A .csv text file with information on NAIP images of each gaging station in the Willamette River Basin as close in time as possible to the date of each site visit, along with the stage and discharge recorded at the gaging station on the date of image acquisition and the date of the site visit

    • A zip archive of the clipped NAIP images of each gaging station in the Willamette River Basin in GeoTIFF format

    • A zip archive with source code (MATLAB *.m files) developed to implement the Bathymetric Mapping using Gage Records and Image Databases (BaMGRID) framework
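
    The release's own implementation is in MATLAB; as a language-neutral illustration, here is a minimal Python sketch of the image-selection rule described above: keep acquisitions within a window from three days before to three days after the site visit, then take the one nearest in time. The dates are illustrative only.

    from datetime import date, timedelta

    def pick_nearest_image(visit, acquisitions):
        # Keep acquisitions inside the +/- 3 day search window described
        # above, then choose the one closest in time to the site visit.
        window = [d for d in acquisitions
                  if visit - timedelta(days=3) <= d <= visit + timedelta(days=3)]
        return min(window, key=lambda d: abs(d - visit), default=None)

    # Illustrative dates only:
    visit = date(2022, 7, 14)
    candidates = [date(2022, 7, 10), date(2022, 7, 13), date(2022, 7, 18)]
    print(pick_nearest_image(visit, candidates))  # -> 2022-07-13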

  9. Methods for Specifying the Target Difference in a Randomised Controlled...

    • plos.figshare.com
    Updated May 30, 2023
    Cite
    Jenni Hislop; Temitope E. Adewuyi; Luke D. Vale; Kirsten Harrild; Cynthia Fraser; Tara Gurung; Douglas G. Altman; Andrew H. Briggs; Peter Fayers; Craig R. Ramsay; John D. Norrie; Ian M. Harvey; Brian Buckley; Jonathan A. Cook (2023). Methods for Specifying the Target Difference in a Randomised Controlled Trial: The Difference ELicitation in TriAls (DELTA) Systematic Review [Dataset]. http://doi.org/10.1371/journal.pmed.1001645
    Explore at:
    Available download formats: doc
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Jenni Hislop; Temitope E. Adewuyi; Luke D. Vale; Kirsten Harrild; Cynthia Fraser; Tara Gurung; Douglas G. Altman; Andrew H. Briggs; Peter Fayers; Craig R. Ramsay; John D. Norrie; Ian M. Harvey; Brian Buckley; Jonathan A. Cook
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Randomised controlled trials (RCTs) are widely accepted as the preferred study design for evaluating healthcare interventions. When the sample size is determined, a (target) difference is typically specified that the RCT is designed to detect. This provides reassurance that the study will be informative, i.e., should such a difference exist, it is likely to be detected with the required statistical precision. The aim of this review was to identify potential methods for specifying the target difference in an RCT sample size calculation.

    Methods and Findings: A comprehensive systematic review of medical and non-medical literature was carried out for methods that could be used to specify the target difference for an RCT sample size calculation. The databases searched were MEDLINE, MEDLINE In-Process, EMBASE, the Cochrane Central Register of Controlled Trials, the Cochrane Methodology Register, PsycINFO, Science Citation Index, EconLit, the Education Resources Information Center (ERIC), and Scopus (for in-press publications); the search period was from 1966, or the earliest date covered, to between November 2010 and January 2011. Additionally, textbooks addressing the methodology of clinical trials and International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) tripartite guidelines for clinical trials were also consulted. A narrative synthesis of methods was produced. Studies that described a method that could be used for specifying an important and/or realistic difference were included. The search identified 11,485 potentially relevant articles from the databases searched. Of these, 1,434 were selected for full-text assessment, and a further nine were identified from other sources. Fifteen clinical trial textbooks and the ICH tripartite guidelines were also reviewed. In total, 777 studies were included, and within them, seven methods were identified: anchor, distribution, health economic, opinion-seeking, pilot study, review of the evidence base, and standardised effect size.

    Conclusions: A variety of methods are available that researchers can use for specifying the target difference in an RCT sample size calculation. Appropriate methods may vary depending on the aim (e.g., specifying an important difference versus a realistic difference), context (e.g., research question and availability of data), and underlying framework adopted (e.g., Bayesian versus conventional statistical approach). Guidance on the use of each method is given. No single method provides a perfect solution for all contexts. Please see later in the article for the Editors' Summary.

  10. ICA Records in Context - Examples

    • figshare.unimelb.edu.au
    Updated Mar 12, 2025
    Cite
    Michael Falk; Experts Group on Archival Description - ICA; https://orcid.org/0000-0001-5383-6993 (2025). ICA Records in Context - Examples [Dataset]. http://doi.org/10.26188/28069103.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    The University of Melbourne
    Authors
    Michael Falk; Experts Group on Archival Description - ICA; https://orcid.org/0000-0001-5383-6993
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is an RO-Crate of a database from the Online Heritage Resource Manager (OHRM), a databasing system from the eScholarship Research Centre of the University of Melbourne. The crate has been built from a dump of the OHRM database. While we have made the best effort possible to create a valid RO-Crate, it is possible for various reasons that the crate contains broken links or improperly described files. If you are the owner of the original database, please contact the maintainer of this RO-Crate on Figshare.

    To view the contents of the crate, download the deposited file, unzip it, and open the ro-crate-preview.html file in the top directory.

  11. Conversion Software Registry

    • dknet.org
    Updated Nov 13, 2024
    Cite
    (2024). Conversion Software Registry [Dataset]. http://identifiers.org/RRID:SCR_007236
    Explore at:
    Dataset updated
    Nov 13, 2024
    Description

    Conversion Software Registry (CSR) has been designed for collecting information about software packages that are capable of file format conversions. The work is motivated by a community need for finding file format conversions inaccessible via current search engines and by the specific need to support systems that could actually perform conversions, such as the NCSA Polyglot. In addition, the value of the CSR is in complementing the existing file format registries and introducing software quality information obtained by content-based comparisons of files before and after conversions. The contribution of this work is in the CSR data model design, which includes file-format-extension-based conversion as well as software scripts, software quality measures, and test-file-specific information for evaluating software quality. The CSR system serves as the source of information and a test bed for a system that can execute the conversions automatically using third-party software, for example, NCSA Polyglot.

    The CSR system is a database with a web-based interface that provides services related to a) finding a conversion path between formats, b) uploading information about third-party software packages and file extensions, c) uploading files for testing, and d) uploading scripts in operating system (OS) specific scripting languages (Windows AutoHotkey, AppleScript and Perl) for automated conversions, according to the idea of imposed code reuse used by NCSA Polyglot. To provide file format conversion services, the CSR includes the following components related to software capable of conversions: input and output file formats (extensions), scripts operating on the software, validated files to be used for information loss measurements, and quantitative measures of the information loss for conversions.

    The CSR focuses on software and on finding the format conversion paths described by a number of software packages and unique input and output formats. The formats themselves are represented by extensions. While not always unique, extensions are often the only accessible information when the third-party software is installed (often listed under the File/Open menu in most packages). The CSR also contains information about the software, operating system, software interface, and scripts to execute the software. The scripts are important for automating conversions with the third-party software and can be implemented using AutoHotkey scripts (Windows), AppleScript (Mac) or one of a variety of scripting languages for Unix. The information loss due to file format conversions is measured externally by different techniques within the NCSA object-to-object comparison framework called Versus. The comparison is relevant to the software domain - for example, for 3D applications, surface area or spin images are used - and the loss (0-100 range, with 100 representing no loss) for a particular software-conversion pair is stored in the database. The information loss also provides the edge weights of the Input/Output (I/O) Graph, a simple workflow used for finding the shortest conversion path (a minimal sketch of this search follows this description).

    The CSR is written as a web service. It consists of three main components: Query, Add, Edit. In the Query mode users can a) view a list of all software packages with their conversion options, b) select subsets of software in the I/O Graph, and c) search the database by conversions, software, extensions, MIME type and PUID. The I/O Graph contains all information about installed applications and the conversions they allow. The Java applet front end is part of the CSR web visualization interface. The Add section allows users to add new software packages with their conversion capabilities and upload the software scripts to automate them. The last section, Edit, is designed for adding detailed information about the software and extensions and for uploading the test files. The CSR requires users to log in for adding and editing. The web fields are auto-completed to help search. Sponsors: This research was partially supported by a National Archive and Records Administration (NARA) supplement to NSF PACI cooperative agreement CA #SCI-9619019. Keywords: Software, Registry, Information, Conversion, Database, Tool
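
    Since the stored quality scores act as edge weights on the I/O Graph, finding the best conversion path is a weighted shortest-path search. A minimal sketch in Python (Dijkstra's algorithm) follows; the formats, quality scores, and tool names are illustrative rather than taken from the CSR, and the edge cost is assumed here to be 100 minus the stored quality, so lossless conversions cost least.

    import heapq

    def best_conversion_path(graph, src, dst):
        # graph: {format: [(neighbor_format, quality_0_to_100, tool), ...]}
        # Edge cost = 100 - quality, so a lossless conversion costs 0.
        pq = [(0, src, [src])]
        done = set()
        while pq:
            cost, fmt, path = heapq.heappop(pq)
            if fmt == dst:
                return cost, path
            if fmt in done:
                continue
            done.add(fmt)
            for nxt, quality, tool in graph.get(fmt, []):
                if nxt not in done:
                    heapq.heappush(pq, (cost + (100 - quality), nxt, path + [nxt]))
        return None

    # Illustrative graph; formats, scores, and tools are made up.
    graph = {
        "obj": [("stl", 95, "toolA"), ("ply", 80, "toolB")],
        "stl": [("ply", 90, "toolC")],
    }
    print(best_conversion_path(graph, "obj", "ply"))  # (15, ['obj', 'stl', 'ply'])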

  12. US Consumer Demographic Data - 269M+ Consumer Records - Programmatic Ads and...

    • datarade.ai
    Updated Jun 27, 2025
    + more versions
    Cite
    Giant Partners (2025). US Consumer Demographic Data - 269M+ Consumer Records - Programmatic Ads and Email Marketing Automation [Dataset]. https://datarade.ai/data-products/us-consumer-demographic-data-269m-consumer-records-progr-giant-partners
    Explore at:
    Dataset updated
    Jun 27, 2025
    Dataset authored and provided by
    Giant Partners
    Area covered
    United States of America
    Description

    Premium B2C Consumer Database - 269+ Million US Records

    Supercharge your B2C marketing campaigns with a comprehensive consumer database featuring over 269 million verified US consumer records. Our 20+ years of data expertise deliver higher quality and more extensive coverage than competitors.

    Core Database Statistics

    Consumer Records: Over 269 million

    Email Addresses: Over 160 million (verified and deliverable)

    Phone Numbers: Over 76 million (mobile and landline)

    Mailing Addresses: Over 116 million (NCOA processed)

    Geographic Coverage: Complete US (all 50 states)

    Compliance Status: CCPA compliant with consent management

    Targeting Categories Available

    Demographics: Age ranges, education levels, occupation types, household composition, marital status, presence of children, income brackets, and gender (where legally permitted)

    Geographic: Nationwide, state-level, MSA (Metropolitan Statistical Area), zip code radius, city, county, and SCF range targeting options

    Property & Dwelling: Home ownership status, estimated home value, years in residence, property type (single-family, condo, apartment), and dwelling characteristics

    Financial Indicators: Income levels, investment activity, mortgage information, credit indicators, and wealth markers for premium audience targeting

    Lifestyle & Interests: Purchase history, donation patterns, political preferences, health interests, recreational activities, and hobby-based targeting

    Behavioral Data: Shopping preferences, brand affinities, online activity patterns, and purchase timing behaviors

    Multi-Channel Campaign Applications

    Deploy across all major marketing channels:

    Email marketing and automation

    Social media advertising

    Search and display advertising (Google, YouTube)

    Direct mail and print campaigns

    Telemarketing and SMS campaigns

    Programmatic advertising platforms

    Data Quality & Sources

    Our consumer data aggregates from multiple verified sources:

    Public records and government databases

    Opt-in subscription services and registrations

    Purchase transaction data from retail partners

    Survey participation and research studies

    Online behavioral data (privacy compliant)

    Technical Delivery Options

    File Formats: CSV, Excel, JSON, XML formats available

    Delivery Methods: Secure FTP, API integration, direct download

    Processing: Real-time NCOA, email validation, phone verification

    Custom Selections: 1,000+ selectable demographic and behavioral attributes

    Minimum Orders: Flexible based on targeting complexity

    Unique Value Propositions

    Dual Spouse Targeting: Reach both household decision-makers for maximum impact

    Cross-Platform Integration: Seamless deployment to major ad platforms

    Real-Time Updates: Monthly data refreshes ensure maximum accuracy

    Advanced Segmentation: Combine multiple targeting criteria for precision campaigns

    Compliance Management: Built-in opt-out and suppression list management

    Ideal Customer Profiles

    E-commerce retailers seeking customer acquisition

    Financial services companies targeting specific demographics

    Healthcare organizations with compliant marketing needs

    Automotive dealers and service providers

    Home improvement and real estate professionals

    Insurance companies and agents

    Subscription services and SaaS providers

    Performance Optimization Features

    Lookalike Modeling: Create audiences similar to your best customers

    Predictive Scoring: Identify high-value prospects using AI algorithms

    Campaign Attribution: Track performance across multiple touchpoints

    A/B Testing Support: Split audiences for campaign optimization

    Suppression Management: Automatic opt-out and DNC compliance

    Pricing & Volume Options

    Flexible pricing structures accommodate businesses of all sizes:

    Pay-per-record for small campaigns

    Volume discounts for large deployments

    Subscription models for ongoing campaigns

    Custom enterprise pricing for high-volume users

    Data Compliance & Privacy

    VIA.tools maintains industry-leading compliance standards:

    CCPA (California Consumer Privacy Act) compliant

    CAN-SPAM Act adherence for email marketing

    TCPA compliance for phone and SMS campaigns

    Regular privacy audits and data governance reviews

    Transparent opt-out and data deletion processes

    Getting Started

    Our data specialists work with you to:

    1. Define your target audience criteria

    2. Recommend optimal data selections

    3. Provide sample data for testing

    4. Configure delivery methods and formats

    5. Implement ongoing campaign optimization

    Why We Lead the Industry

    With over two decades of data industry experience, we combine extensive database coverage with advanced targeting capabilities. Our commitment to data quality, compliance, and customer success has made us the preferred choice for businesses seeking superior B2C marketing performance.

    Contact our team to discuss your specific ta...

  13. Distribution of key characteristics in the overall North American dataset...

    • plos.figshare.com
    Updated May 31, 2023
    Cite
    Nandi Siegfried; Michael Clarke; Jimmy Volmink; Lize Van der Merwe (2023). Distribution of key characteristics in the overall North American dataset (785) and the random sample (150) of potentially eligible North American trials records. [Dataset]. http://doi.org/10.1371/journal.pone.0003491.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Nandi Siegfried; Michael Clarke; Jimmy Volmink; Lize Van der Merwe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    *Null hypothesis: there was no difference between the number of records identified from each of the seven databases searched for the sampled and not-sampled groups. The p-value tests whether the number of records identified in at least one of the seven databases differed from that of at least one other database.

  14. BioSharing: examples of use in NIH BD2K CEDAR, BioCADDIE and ELIXIR

    • figshare.com
    Updated May 31, 2023
    Cite
    Alejandra Gonzalez-Beltran; Peter McQuilton; Allyson L. Lister; Milo Thurston; Susanna-Assunta Sansone; Philippe Rocca-Serra (2023). BioSharing: examples of use in NIH BD2K CEDAR, BioCADDIE and ELIXIR [Dataset]. http://doi.org/10.6084/m9.figshare.1599797.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Alejandra Gonzalez-Beltran; Peter McQuilton; Allyson L. Lister; Milo Thurston; Susanna-Assunta Sansone; Philippe Rocca-Serra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    BioSharing (http://www.biosharing.org) is a curated, web-based, searchable portal of over 1,300 records describing content standards, databases and data policies in the life sciences, broadly encompassing the biological, natural and biomedical sciences. Among many features, the records can be searched and filtered, or grouped via the ‘Collection’ feature according to field of interest. An example is the Collection curated with the NIH BD2K bioCADDIE project, for various purposes. First, to select and track content standards that have been reviewed during the creation of the metadata model underpinning the Data Discovery Index. Second, as the work progresses and the prototype Index harvests dataset descriptors from different databases, the Collection will be extended to include the descriptions of these databases, including which (if any) standards they implement. This is key to support one of the bioCADDIE project use cases: to allow the searching and filtering of datasets that are compliant to a given community standard. Despite a growing set of standard guidelines and formats for describing their experiments, the barriers to authoring the experimental metadata necessary for sharing and interpreting datasets are tremendously high. Understanding how to comply with these standards takes time and effort and researchers view this as a burden that may benefit other scientists, but not themselves. To tackle this, with and for the NIH BD2K CEDAR project, we will explore methods to serve machine-readable versions of these standards that can inform the creation of metadata templates, rendering standards invisible to the researchers and driving them to strive for easier authoring of the experimental metadata. Lastly, as part of the ELIXIR-UK Node BioSharing is being developed to be the ELIXIR Standards Registry and will be progressively cross-linked to other registries, such as the ELIXIR Tools and Services Registry and the ELIXIR Training e-Support System (TeSS).

  15. Data from: The prognosis of glioblastoma: a large, multifactorial study

    • tandf.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated Feb 14, 2024
    Chen Luo; Kun Song; Shuai Wu; N. U. Farrukh Hameed; Nijiati Kudulaiti; Hao Xu; Zhi-Yong Qin; Jin-Song Wu (2024). The prognosis of glioblastoma: a large, multifactorial study [Dataset]. http://doi.org/10.6084/m9.figshare.14932954.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Feb 14, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Chen Luo; Kun Song; Shuai Wu; N. U. Farrukh Hameed; Nijiati Kudulaiti; Hao Xu; Zhi-Yong Qin; Jin-Song Wu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Glioblastoma is the most common and fatal primary brain tumor in adults. Even with maximal resection and a series of postoperative adjuvant treatments, the median overall survival (OS) of glioblastoma patients remains approximately 15 months. The Huashan Hospital glioma bank contains more than 2000 glioma tissue samples with long-term follow-up data; almost half of these samples are from glioblastoma patients. Several large glioma databases with long-term follow-up data have reported outcomes of glioblastoma patients from countries other than China. We investigated the prognosis of glioblastoma patients in China and compared the survival outcomes among patients from different databases. The data for 967 glioblastoma patients who underwent surgery at Huashan Hospital and had long-term follow-up records were obtained from our glioma registry (diagnosed from 29 March 2010 through 7 June 2017). Patients were eligible for inclusion if they underwent surgical resection for newly diagnosed glioblastoma and had available survival and personal data. Data for 778 glioblastoma patients were collected from three separate online databases: 448 patients from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov), 191 from the REpository for Molecular BRAin Neoplasia DaTa (REMBRANDT) database (GSE108476), and 132 from data set GSE16011 (hereafter called the French database). We compared the prognosis of glioblastoma patients across the different databases and the changes in survival outcomes of glioblastoma patients from Huashan Hospital over an 8-year period. The median OS of glioblastoma patients was 16.3 (95% CI: 15.4–17.2) months for Huashan Hospital, 13.8 (95% CI: 12.9–14.9) months for TCGA, 19.3 (95% CI: 17.0–20.0) months for the REMBRANDT database, and 9.1 months for the French database. The median OS of glioblastoma patients from Huashan Hospital improved from 15.6 (2010–2013, 95% CI: 14.4–16.6) months to 18.2 (2014–2017, 95% CI: 15.8–20.6) months over the study period (2010–2017). In addition, the prognosis of glioblastoma patients with total resection was significantly better than that of glioblastoma patients with sub-total resection or biopsy. Our study confirms that treatment centered around maximal surgical resection brought survival benefits to glioblastoma patients after adjusting for validated prognostic factors. In addition, an improvement in prognosis was observed among glioblastoma patients from Huashan Hospital over the course of our study. We attribute this to the adoption of a new standard of neurosurgical treatment on the basis of neurosurgical multimodal technologies. Even though the prognosis of glioblastoma patients remains poor, gradual progress is being made.
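
    The reported medians and 95% CIs are Kaplan-Meier-style estimates. As an illustration (not the authors' code), the lifelines package can produce them; the survival data below are simulated, not the Huashan records:

    ```python
    import numpy as np
    from lifelines import KaplanMeierFitter
    from lifelines.utils import median_survival_times

    # Made-up data: follow-up time in months and an event flag (True = died, False = censored).
    rng = np.random.default_rng(0)
    months = rng.exponential(scale=16.3, size=200)
    observed = rng.random(200) < 0.8

    kmf = KaplanMeierFitter()
    kmf.fit(months, event_observed=observed, label="glioblastoma (simulated)")

    print("median OS (months):", kmf.median_survival_time_)
    print(median_survival_times(kmf.confidence_interval_))  # 95% CI for the median
    ```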

  16. Biogeography of the world’s worst invasive species has spatially-biased...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Feb 29, 2024
    David Jenkins; Hannah Bevan; Wei Chen; Jacob Hart; Amanda Lindsay; Laura Macamo; Mekail Negash; Leo Ohyama; Alessandra Pandolfi; George Zaragoza (2024). Biogeography of the world’s worst invasive species has spatially-biased knowledge gaps but is predictable [Dataset]. http://doi.org/10.5061/dryad.zw3r228bh
    Explore at:
    Available download formats: zip
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    University of Florida
    University of Central Florida
    Authors
    David Jenkins; Hannah Bevan; Wei Chen; Jacob Hart; Amanda Lindsay; Laura Macamo; Mekail Negash; Leo Ohyama; Alessandra Pandolfi; George Zaragoza
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    The world’s “100 worst invasive species” were listed in 2000. The list is taxonomically diverse and often cited (typically for single-species studies), and its species are frequently reported in global biodiversity databases. We acted on the principle that these notorious species should be well-reported to help answer two questions about the global biogeography of invasive species (i.e., not just their invaded ranges): (1) “how are data distributed globally?” and (2) “what predicts diversity?” We collected location data for each of the 100 species from multiple databases; 95 had sufficient data for analyses. For question (1), we mapped global species richness and cumulative occurrences since 2000 in (0.5 degree)² grids. For question (2) we compared alternative regression models representing non-exclusive hypotheses for geography (i.e., spatial autocorrelation), sampling effort, climate, and anthropocentric effects. Reported locations of the invasive species were spatially biased, leaving large gaps on multiple continents. Accordingly, species richness was best explained by both anthropocentric effects not often used in biogeographic models (Government Effectiveness, Voice & Accountability, human population size) and typical natural factors (climate, geography; R² = 0.87). Cumulative occurrence was strongly related to anthropocentric effects (R² = 0.62). We extract five lessons for invasive species biogeography; foremost is the importance of anthropocentric measures for understanding invasive species diversity patterns and the large lacunae in their known global distributions. Despite those knowledge gaps, the models here predict the biogeography of the world’s worst invasive species well for much of the world.

    Methods

    Data Acquisition and Processing

    Data were acquired from multiple databases for the 100 invasive species in February 2022 using the spocc package in R (Chamberlain 2021). Data sources (in alphabetical order) included: the Atlas of Living Australia ('ALA'; https://www.ala.org.au); eBird (http://www.ebird.org/home; Sullivan et al. 2009); the Integrated Digitized Biocollections ('iDigBio'; https://www.idigbio.org; Matsunaga et al. 2013); the Global Biodiversity Information Facility ('GBIF'; https://www.gbif.org); the Ocean Biogeographic Information System ('OBIS'; https://portal.obis.org; Grassle and Stocks 1999); VertNet (https://vertnet.org; Constable et al. 2010); and the US Geological Survey’s Biodiversity Information Serving Our Nation ('BISON'; replaced December 2021 by GBIF). Several databases set limits of 100,000 initial point records (before cleaning, described below) when accessed using spocc. As a result, data for 19 species with >100,000 point records (e.g., the European starling (Sturnus vulgaris Linnaeus) had >23 million point records) were obtained directly from GBIF on 23–25 February 2022, which included records already contributed to GBIF from multiple databases. All searches were based on genus and species epithets, where taxonomic changes in the historical records required decisions. Where an epithet changed since the 100 species were listed in 2000 (Lowe et al. 2000), both former and current names were searched and concatenated. For example, Lowe et al. (2000) listed the American bullfrog as Rana catesbeiana Shaw, 1802, which is now Lithobates catesbeianus (Shaw, 1802); both were included in searches, as well as synonyms.

    Taxonomic synonyms were resolved by referring to the Catalogue of Life (https://www.catalogueoflife.org/), the Centre for Agriculture and Bioscience International (http://www.cabi.org) and World Flora Online (http://worldfloraonline.org). Listed synonyms and new combinations were included in data, whereas undocumented synonyms (i.e., provided in a database but not resolved above) were excluded. Database entries that lacked species epithets (i.e., genus only) were excluded and all identities were at the species level. Some taxa formerly identified in Lowe et al. (2000) as a species are now subspecies (e.g., the red-eared slider Trachemys scripta (Thunberg in Schoepff, 1792) is now Trachemys scripta elegans (Wied-Neuwied, 1838)). For those taxa, data may be more inclusive in current taxonomy than the original intent. However, our use of species-level identities includes sub-specific hybrids (e.g., Parham et al. 2020). Overall, our approach matched the taxonomic resolution of Lowe et al. (2000), recognized variation through time and space, and included potential hybrids among subspecies. We set a threshold for a species to be included in analyses at >30 records because we judged distributions with fewer records to be inadequately represented. As a result, four species (notably disease agents or vectors) had too few records to be analyzed here: Aphanomyces astaci Schikora, 1906, Cinara cupressi (Buckton, 1881), Plasmodium relictum (Grassi & Feletti, 1891), and Trogoderma granarium Everts, 1898. Likewise, banana bunchy top virus was not present in databases, despite a reported global distribution (https://www.cabi.org/isc/datasheet/8161). As a potential alternative, we searched for its aphid vector (Pentalonia nigronervosa Coquerel, 1859) but obtained records that fully lacked Africa and Asia, despite the widespread tropical distribution of the virus. We thus treated banana bunchy top virus as an under-reported species and omitted it here. Finally, rinderpest was listed by Lowe et al. (2000) but has since been eradicated (Morens et al. 2011). Following Luque et al. (2014), we replaced rinderpest with Salvinia molesta D. S. Mitch, leaving 95 species to evaluate. Species data were cleaned using two R packages. The scrubr package (https://github.com/ropensci/scrubr) was used with default settings to exclude records with geographic coordinates that were lacking, impossible, incomplete, imprecise, or unlikely. Data were further cleaned using the CoordinateCleaner package (Zizka et al. 2019), where records were excluded if geographic coordinates were zero (i.e., a flag for probable data error), near a country’s capital and geographic centroid, or near administrative locations (e.g., museums, GBIF headquarters). Data were then restricted to unique spatio-temporal records during the years 2000–2021 to exclude duplicate entries. This step also omitted older records that tend to have greater taxonomic and geographic uncertainty (e.g., GPS selective availability was removed in 2000). Finally, resulting maps were visually examined, where oddities (e.g., a tropical species located on Baffin Island or a terrestrial species in mid-ocean) were manually excluded from the data. That last step removed a few locations per species, if any. As a result of the above process, the data were cleaned to be conservative for errors in geographic distribution and consistent in taxonomy with Lowe et al. (2000).

    Aggregation and Mapping

    We spatially aggregated point data per species in (0.5 degree)² grid cells, using the World Geodetic System (WGS84), the basis for the Global Positioning System. Thus, the analyses below and summarized data refer to (0.5 degree)² grid cells as units of study. Aggregation in space simplified variable coordinate accuracy in the original records while retaining substantial resolution for global analyses. For two reasons that affected interpretations, we also aggregated data in time by pooling all records obtained for the years 2000–2021. First, species richness is then based on presence/absence of reported species at any time during 22 consecutive years and should be sensitive to infrequent observations or occurrences. We reasoned that a species absence maintained through two decades was either: (a) likely true or (b) due to lack of submitted records for that location, where the difference may be inferred from spatial patterns of records. Secondly, the difference between species richness (i.e., presence/absence) and cumulative occurrence was enhanced. Species richness is fully insensitive to commonality or rarity; a single record here obtains the same result as daily repeats for 22 years. In contrast, cumulative occurrences may range from 0 to thousands of (0.5 degree)² pixels during 22 years and could indicate commonality or rarity. Therefore, fundamental differences between species richness and occurrences were enhanced here by using data for the years 2000–2021. We mapped species richness and cumulative records to address question 1 (“how are data distributed globally?”).

    Potential Predictors of Invasive Species

    We analyzed spatial autocorrelation (using longitude and latitude of 0.5° grid cell centers) with local estimation (“loess”) regression to obtain a surface representing only geographic coordinate effects. Loess regression is a robust, nonparametric approach to represent a complex spatial surface (Ferrier et al. 2002, Helsel and Ryker 2002) and is not too computationally intensive for fine-grained global data, unlike approaches based on covariance matrices or network meshes. The spatial texture of a loess regression surface is determined by its span, where values <1 have more texture and values >1 are smoother. We modeled species richness and cumulative records using the loess command in R, with degree = 2 (i.e., 2nd-order polynomial) and least-squares fitting. We iteratively adjusted span to minimize the residual standard error and maximize the correlation between predicted values and actual total records. Predicted values represented spatial autocorrelation alone. Subsequent models using additional predictors (Table 1) included predictions from the loess model to evaluate those additional effects after already accounting for spatial autocorrelation. In addition, the hierarchical structure (i.e., non-independence of grids) of grid cells within countries and anthropogenic biomes (anthromes) was handled using spatial GLMMs (Dormann et al. 2007). All other predictors were matched to the 0.5° gridded species data using projectRaster in the R raster package.

    Climate conditions
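
    A minimal Python sketch of the (0.5 degree)² binning described under "Aggregation and Mapping" above; the study used R, and the column names here are invented:

    ```python
    import numpy as np
    import pandas as pd

    # Assumed input: one cleaned occurrence record per row.
    records = pd.DataFrame({
        "species": ["Sturnus vulgaris", "Salvinia molesta", "Sturnus vulgaris"],
        "lat": [28.5, -1.3, 28.7],
        "lon": [-81.4, 36.8, -81.2],
    })

    # Snap each point to the 0.5-degree grid cell containing it.
    records["cell_lat"] = np.floor(records["lat"] / 0.5) * 0.5
    records["cell_lon"] = np.floor(records["lon"] / 0.5) * 0.5

    per_cell = records.groupby(["cell_lat", "cell_lon"]).agg(
        richness=("species", "nunique"),   # presence/absence-based species richness
        occurrences=("species", "size"),   # cumulative occurrence count
    )
    print(per_cell)
    ```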

  17. INSDC Environment Sample Sequences

    • gbif.org
    • demo.gbif.org
    • +1more
    Updated Nov 29, 2025
    European Bioinformatics Institute (EMBL-EBI); European Bioinformatics Institute (EMBL-EBI) (2025). INSDC Environment Sample Sequences [Dataset]. http://doi.org/10.15468/mcmd5g
    Explore at:
    Dataset updated
    Nov 29, 2025
    Dataset provided by
    European Bioinformatics Institute (http://www.ebi.ac.uk/)
    Global Biodiversity Information Facility (https://www.gbif.org/)
    Authors
    European Bioinformatics Institute (EMBL-EBI); European Bioinformatics Institute (EMBL-EBI)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains INSDC sequences associated with environmental sample identifiers. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with the search parameters: `environmental_sample=True & host=""`
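
    A hedged sketch of what such a query could look like against the ENA Portal API; the exact result type and field list used by the GBIF adapter are not specified here, so the parameters below are assumptions:

    ```python
    import requests

    # Assumed parameters; see https://www.ebi.ac.uk/ena/portal/api/ for the documented options.
    params = {
        "result": "sequence",
        "query": 'environmental_sample=true AND host=""',
        "fields": "accession,scientific_name,collection_date,country",
        "format": "tsv",
        "limit": 100,
    }
    resp = requests.get("https://www.ebi.ac.uk/ena/portal/api/search",
                        params=params, timeout=60)
    resp.raise_for_status()
    print(resp.text.splitlines()[:5])  # header plus the first few records
    ```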

    EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).

    The data was then processed as follows:

    1. Human sequences were excluded.

    2. For non-CONTIG records, the sample accession number (when available) along with the scientific name were used to identify sequence records corresponding to the same individuals (or a group of organisms of the same species in the same sample). Only one record was kept for each scientific name/sample accession number.

    3. Contigs and whole genome shotgun (WGS) records were added individually.

    4. The records that were missing some information were excluded. Only records associated with a specimen voucher or records containing both a location AND a date were kept.

    5. The records associated with the same vouchers are aggregated together.

    6. A lot of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that weren't filtered out in STEP 2 because the sample accession number was missing. To identify those potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). Then we excluded the groups that contained more than 50 records (see the sketch after this list). The rationale behind the choice of threshold is explained here: https://github.com/gbif/embl-adapter/issues/10#issuecomment-855757978

    7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family, and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip
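
    A sketch of the de-duplication in step 6 with pandas, assuming a TSV export with the listed fields (the file and column names are assumptions):

    ```python
    import pandas as pd

    records = pd.read_csv("ena_records.tsv", sep="\t")  # hypothetical export

    group_cols = ["scientific_name", "collection_date", "location",
                  "country", "identified_by", "collected_by", "sample_accession"]

    # Group potential duplicates; dropna=False keeps rows whose sample_accession is missing.
    sizes = records.groupby(group_cols, dropna=False)["scientific_name"].transform("size")

    # Exclude groups with more than 50 records (threshold rationale: see the issue linked above).
    kept = records[sizes <= 50]
    ```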

    More information available here: https://github.com/gbif/embl-adapter#readme

    You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md

  18. ECG Lead 2 Dataset PhysioNet (Open Access)

    • kaggle.com
    zip
    Updated Jul 5, 2020
    Nelson Sharma (2020). ECG Lead 2 Dataset PhysioNet (Open Access) [Dataset]. https://www.kaggle.com/nelsonsharma/ecg-lead-2-dataset-physionet-open-access
    Explore at:
    Available download formats: zip (258855576 bytes)
    Dataset updated
    Jul 5, 2020
    Authors
    Nelson Sharma
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    See the introduction kernel for how to access the data and use the annotation labels.

    CONTEXT

    This dataset contains the Lead II signal (with annotations) of 201 records collected from the following 3 databases, available on PhysioNet under open access:

    1. MIT-BIH Arrhythmia Database [**mitdb**]
    2. MIT-BIH Supraventricular Arrhythmia Database [**svdb**]
    3. St Petersburg INCART 12-lead Arrhythmia Database [**incartdb**]

    NOTE

    1. All signals have been resampled to 128Hz and gain has been removed.
    2. Baseline wander has been removed using Median Filtering (see the sketch after this list).
    3. Denoising was NOT used.
    4. Signal data and annotation labels have been saved in numpy (.npy) format.
    5. All Signals are nearly 30 mins long.
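
    The resampling and baseline-removal steps in notes 1 and 2 could look roughly like this in Python; the original sampling rates and the median-filter window are assumptions, not the uploader's exact settings:

    ```python
    import numpy as np
    from scipy.signal import resample_poly, medfilt

    def preprocess(sig, fs_in, fs_out=128):
        # Resample to 128 Hz (e.g. 360 Hz -> 128 Hz for mitdb records).
        g = np.gcd(int(fs_in), fs_out)
        sig = resample_poly(sig, fs_out // g, int(fs_in) // g)

        # Estimate baseline wander with a wide median filter and subtract it.
        kernel = int(0.6 * fs_out) | 1      # ~0.6 s window, forced to odd length
        baseline = medfilt(sig, kernel_size=kernel)
        return sig - baseline
    ```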

    CONTENT

    Data has been organised as follows:

    The parent directory db_npy contains 3 sub-directories, each of which represents one database:

    mitdb_npy has 48 records

    svdb_npy has 78 records

    incartdb_npy has 75 records

    Each of these database directories contains a '***RECORDS***' file that lists the ECG records available in that database.

    Each record has 3 files associated with it (a loading sketch follows this list):

    1. rec_BEAT.npy: contains 'beat' annotations (R-peaks and their labels) for the record. Each record may have a variable number of beats depending on heart rate. Usually we are interested in beat annotation labels only; each beat label represents one R-peak and hence one beat.
    2. rec_NBEAT.npy: contains 'non-beat' annotations (other than R-peaks) for the record.
    3. rec_SIG_II.npy: contains the Lead 2 signal data of the record as a single numpy array.
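
    A minimal loading sketch for one record under the layout above; allow_pickle is an assumption about how the annotation arrays were saved:

    ```python
    import numpy as np

    rec = "100"  # a record listed in db_npy/mitdb_npy/RECORDS
    base = f"db_npy/mitdb_npy/{rec}"

    sig = np.load(f"{base}_SIG_II.npy")                      # Lead II samples at 128 Hz
    beats = np.load(f"{base}_BEAT.npy", allow_pickle=True)   # R-peak positions + labels

    print(sig.shape, len(beats))
    ```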

    (* see the introduction kernel on how to access data and use annotation labels *)

    Understanding Annotations:

    There are two types of annotations: Beat and Non-Beat annotations. Beat annotations are associated with each heart-beat. If you are working with heart-beat classification, then only Beat annotations will be useful and Non-Beat annotations can be ignored.

    Standard PhysioNet Annotations are described in the db_npy/annotations.txt file. These are common across all databases. This file has 3 columns:

    Column 1: Label
    Column 2: Type of label [ b = beat annotation; n = non-beat annotation ]
    Column 3: Description

    [Figure: table of standard PhysioNet annotation labels]

    There are 19 Beat annotations and 22 Non-Beat annotations. However, not all annotations may occur in the data files. For example, the label 'r' does not occur even once in any of the three databases, yet it is part of the standard PhysioNet labels (it might be in use in some other database). It's advised to do a full annotation count before working with the data.

    According to AAMI recommendation, each beat is classified into one of the 5 types [ N, V, S, F, Q ]. However, you are free to choose any classification strategy.

    [Figure: relation between MIT-BIH heartbeat labels and the AAMI standard classes]

    IMPORTANT

    1. mitdb's records '102' and '104' DO NOT have the Lead 2 signal available, hence the files '102_SIG_II.npy' and '104_SIG_II.npy' are not present. However, their BEAT and NBEAT files are present. It's advised not to use those two records.
    2. mitdb's records '102', '104', '107' and '217' are paced records.
    3. mitdb's record '207' is the only record with 'Flutter' waves that are not marked by beat annotations (no R-peaks marked). However, they are marked by non-beat annotations.

    Acknowledgements

    **PhysioNet** (https://physionet.org/): Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., Mietus, J. E., Moody, G. B., Peng, C. K., & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

  19. Data cleaning using unstructured data

    • zenodo.org
    zip
    Updated Jul 30, 2024
    Rihem Nasfi; Rihem Nasfi; Antoon Bronselaer; Antoon Bronselaer (2024). Data cleaning using unstructured data [Dataset]. http://doi.org/10.5281/zenodo.13135983
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rihem Nasfi; Rihem Nasfi; Antoon Bronselaer; Antoon Bronselaer
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this project, we work on repairing three datasets:

    • Trials design: This dataset was obtained from the European Union Drug Regulating Authorities Clinical Trials Database (EudraCT) register, and the ground truth was created from external registries. In the dataset, multiple countries, identified by the attribute country_protocol_code, conduct the same clinical trial, which is identified by eudract_number. Each clinical trial has a title that can help find informative details about the design of the trial.
    • Trials population: This dataset delineates the demographic origins of participants in clinical trials primarily conducted across European countries. It includes structured attributes indicating whether the trial pertains to a specific gender, age group or healthy volunteers. Each of these categories is labeled (`1') or (`0'), respectively denoting whether or not it is included in the trial. It is important to note that the population category should remain consistent across all countries conducting the same clinical trial, identified by an eudract_number. The ground truth samples in the dataset were established by aligning information about the trial populations provided by external registries, specifically the CT.gov database and the German Trials database. Additionally, the dataset comprises other unstructured attributes that categorize the inclusion criteria for trial participants, such as inclusion.
    • Allergens: This dataset contains information about products and their allergens. The data were collected from the German version of `Alnatura' (access date: 24 November 2020), from `Open Food Facts' (a free database of food products from around the world), and from the websites `Migipedia', `Piccantino', and `Das Ist Drin'. There may be overlapping products across these websites. Each product in the dataset is identified by a unique code. Samples with the same code represent the same product but are extracted from a different source. The allergens are indicated by (`2') if present, (`1') if there are traces of it, and (`0') if absent in a product. The dataset also includes information on ingredients in the products. Overall, the dataset comprises categorical structured data describing the presence, trace, or absence of specific allergens, and unstructured text describing ingredients.

    N.B.: Each '.zip' file contains a set of 5 '.csv' files which are part of the aforementioned datasets:

    • "{dataset_name}_train.csv": samples used for the ML-model training. (e.g "allergens_train.csv")
    • "{dataset_name}_test.csv": samples used to test the the ML-model performance. (e.g "allergens_test.csv")
    • "{dataset_name}_golden_standard.csv": samples represent the ground truth of the test samples. (e.g "allergens_golden_standard.csv")
    • "{dataset_name}_parker_train.csv": samples repaired using Parker Engine used for the ML-model training. (e.g "allergens_parker_train.csv")
    • "{dataset_name}_parker_train.csv": samples repaired using Parker Engine used to test the the ML-model performance. (e.g "allergens_parker_test.csv")
  20. Datasheet1_Feasibility of a drug allergy registry-based excipient allergy...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jan 16, 2024
    Kan, Andy Ka Chun; Li, Philip Hei; Au, Elaine Y. L.; Saha, Chinmoy; Chiang, Valerie (2024). Datasheet1_Feasibility of a drug allergy registry-based excipient allergy database and call for universal mandatory drug ingredient disclosure: the case of PEG.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001373353
    Explore at:
    Dataset updated
    Jan 16, 2024
    Authors
    Kan, Andy Ka Chun; Li, Philip Hei; Au, Elaine Y. L.; Saha, Chinmoy; Chiang, Valerie
    Description

    Background: Excipient allergy is a rare, but potentially lethal, form of drug allergy. Diagnosing excipient allergy remains difficult in regions without mandatory drug ingredient disclosure and is a significant barrier to drug safety.
    Objective: To investigate the feasibility of a drug allergy registry-based excipient database to identify potential excipient culprits in patients with a history of drug allergy, using polyethylene glycol (PEG) as an example.
    Methods: An excipient registry was created by compiling the excipient lists pertaining to all available formulations of the top 50 most reported drug allergy culprits in Hong Kong. Availability of excipient information, and its relationship with the total number of formulations of individual drugs, were analysed. All formulations were checked for the presence or absence of PEG.
    Results: Complete excipient information was available for 36.5% (729/2,000) of all formulations of the top 50 reported drug allergy culprits in Hong Kong. The number of formulations for each drug was associated with the proportion of available excipient information (ρ = 0.466, p = 0.001). Out of 729 formulations, 109 (15.0%) and 620 (85.0%) were confirmed to contain and not contain PEG, respectively. Excipient information was not available for the other 1,271 (63.6%) formulations. We were unable to confirm the presence or absence of PEG across all formulations of any of the top 50 drug allergy culprits in Hong Kong.
    Conclusion: In countries without mandatory drug ingredient disclosure, excipient databases are unlikely to be able to identify potential excipient allergy in drug allergy patients. Legislation to enforce mandatory and universal ingredient disclosure is urgently needed.
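
    The ρ and p reported above come from a Spearman rank correlation; a minimal sketch with placeholder values, not the study's data:

    ```python
    from scipy.stats import spearmanr

    # Placeholder per-drug values: number of formulations and the proportion
    # of those formulations with complete excipient information.
    n_formulations = [120, 35, 60, 12, 80, 45, 20, 95, 150, 30]
    pct_with_excipient_info = [0.55, 0.20, 0.40, 0.10, 0.50, 0.35, 0.15, 0.45, 0.60, 0.25]

    rho, p = spearmanr(n_formulations, pct_with_excipient_info)
    print(f"rho={rho:.3f}, p={p:.4f}")
    ```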
