The International Registry of Reproductive Pathology Database is part of pioneering work by Dr. Kenneth McEntee to comprehensively document thousands of disease case studies. His extensive collection of case reports and physical samples was complemented by development of the International Registry of Reproductive Pathology Database in the 1980s. The original FoxPro database files and a migrated Access version were completed by the College of Veterinary Medicine in 2016. CSV files exported from the Access database were completed by the University of Illinois Library in 2017.
City of Austin Open Data Terms of Use https://data.austintexas.gov/stories/s/ranj-cccq This dataset is a monthly upload of the Community Registry (www.AustinTexas.gov/CR), where community organizations such as neighborhood associations may register with the City of Austin to receive notices of land development permit applications within 500 feet of the organization's specified boundaries. This dataset can be used to contact multiple registered organizations at once by filtering/sorting, for example, by Association Type or by Association ZipCode. The organizations' boundaries can be viewed in the City's interactive map at www.AustinTexas.gov/GIS/PropertyProfile/ - the Community Registry layer is under the Boundaries/Grids folder. Austin Development Services Data Disclaimer: The data provided are for informational use only and may differ from official department data. Austin Development Services' database is continuously updated, so reports run at different times may produce different results. Care should be taken when comparing against other reports as different data collection methods and different data sources may have been used. Austin Development Services does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.
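The dataset is published on data.austintexas.gov, a Socrata-based portal, so the filtering described above can also be done programmatically through the SODA API. The sketch below is only illustrative: the resource identifier "xxxx-xxxx" and the field names are placeholders to be replaced with the values shown on the dataset's page.

import requests

# Minimal sketch of pulling the Community Registry via the Socrata SODA API.
# The resource identifier "xxxx-xxxx" and the field names "association_name",
# "association_type" and "zip_code" are placeholders; check the dataset page
# on data.austintexas.gov for the actual values.
BASE = "https://data.austintexas.gov/resource/xxxx-xxxx.json"
params = {
    "$select": "association_name, association_type, zip_code",
    "$where": "association_type = 'Neighborhood Association'",
    "$limit": 1000,
}
rows = requests.get(BASE, params=params, timeout=30).json()
for row in rows:
    print(row)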
https://www.law.cornell.edu/uscode/text/17/106
The bookplate registry database focuses on the bookplates that are pasted into the front matter of a book to show ownership. The bookplate registry is a searchable database image catalog of approximately 1100 sample bookplates and library stamps from Hesburgh Libraries Rare Books and Special Collections at the University of Notre Dame. The database was created during preliminary explorations of the cataloging and database methodology necessary to support a cooperative online bookplate registry for multiple universities. The database focuses on both the owners of the books and the artists who created the bookplate designs. The attached files include a PowerPoint presentation given by Christian Dupont at the 41st Annual Preconference of the Rare Books and Manuscripts Section of the American Library Association in Chicago, Illinois on July 7, 2000. The presentation explains the project in more detail and the data that were collected. The dataset gives information on the bookplates that were reviewed at the University of Notre Dame Hesburgh Libraries Rare Books and Special Collections. The original site on which this information was searchable was retired in the Fall of 2021.
https://www.bco-dmo.org/dataset/526852/license
The Jellyfish Database Initiative (JeDI) is a scientifically-coordinated global database dedicated to gelatinous zooplankton (members of the Cnidaria, Ctenophora and Thaliacea) and associated environmental data. The database holds 476,000 quantitative, categorical, presence-absence and presence only records of gelatinous zooplankton spanning the past four centuries (1790-2011) assembled from a variety of published and unpublished sources. Gelatinous zooplankton data are reported to species level, where identified, but taxonomic information on phylum, family and order are reported for all records. Other auxiliary metadata, such as physical, environmental and biometric information relating to the gelatinous zooplankton metadata, are included with each respective entry. JeDI has been developed and designed as an open access research tool for the scientific community to quantitatively define the global baseline of gelatinous zooplankton populations and to describe long-term and large-scale trends in gelatinous zooplankton populations and blooms. It has also been constructed as a future repository of datasets, thus allowing retrospective analyses of the baseline and trends in global gelatinous zooplankton populations to be conducted in the future. access_formats=.htmlTable,.csv,.json,.mat,.nc,.tsv,.esriCsv,.geoJson acquisition_description=This information has been synthesized by members of the Global Jellyfish Group from online databases, unpublished and published datasets. More specific details may be found in\u00a0"%5C%22http://dmoserv3.bco-%0Admo.org/data_docs/JeDI/Lucas_et_al_2014_GEB.pdf%5C%22">Lucas, C.J., et al. 2014. Gelatinous zooplankton biomass in the global oceans: geographic variation and environmental drivers. Global Ecol. Biogeogr. (DOI: 10.1111/geb.12169) in the\u00a0methods section. awards_0_award_nid=54810 awards_0_award_number=OCE-1030149 awards_0_data_url=http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1030149 awards_0_funder_name=NSF Division of Ocean Sciences awards_0_funding_acronym=NSF OCE awards_0_funding_source_nid=355 awards_0_program_manager=David L. Garrison awards_0_program_manager_nid=50534 cdm_data_type=Other comment=JeDI: Jellyfish Database Initiative, associated with the Trophic BATS project PIs: R. Condon, C. Lucas, C. Duarte, K. Pitt version 2015.01.08 Note: The displayed view of this dataset is subject to updates Note: Duplicate records were removed on 2015.01.08 See: Dataset term legend for full text of abbreviations. 
Conventions=COARDS, CF-1.6, ACDD-1.3 data_source=extract_data_as_tsv version 2.3 19 Dec 2019 defaultDataQuery=&time<now doi=10.1575/1912/7191 Easternmost_Easting=180.0 geospatial_lat_max=88.74 geospatial_lat_min=-78.5 geospatial_lat_units=degrees_north geospatial_lon_max=180.0 geospatial_lon_min=-180.0 geospatial_lon_units=degrees_east geospatial_vertical_max=7632.0 geospatial_vertical_min=-10191.48 geospatial_vertical_positive=down geospatial_vertical_units=m infoUrl=https://www.bco-dmo.org/dataset/526852 institution=BCO-DMO metadata_source=https://www.bco-dmo.org/api/dataset/526852 Northernmost_Northing=88.74 param_mapping={'526852': {'lat': 'master - latitude', 'depth': 'master - depth', 'lon': 'master - longitude'}} parameter_source=https://www.bco-dmo.org/mapserver/dataset/526852/parameters people_0_affiliation=University of North Carolina - Wilmington people_0_affiliation_acronym=UNC-Wilmington people_0_person_name=Robert Condon people_0_person_nid=51335 people_0_role=Principal Investigator people_0_role_type=originator people_1_affiliation=University of Western Australia people_1_person_name=Carlos M. Duarte people_1_person_nid=526857 people_1_role=Co-Principal Investigator people_1_role_type=originator people_2_affiliation=National Oceanography Centre people_2_affiliation_acronym=NOC people_2_person_name=Cathy Lucas people_2_person_nid=526856 people_2_role=Co-Principal Investigator people_2_role_type=originator people_3_affiliation=Griffith University people_3_person_name=Kylie Pitt people_3_person_nid=526858 people_3_role=Co-Principal Investigator people_3_role_type=originator people_4_affiliation=Woods Hole Oceanographic Institution people_4_affiliation_acronym=WHOI BCO-DMO people_4_person_name=Danie Kinkade people_4_person_nid=51549 people_4_role=BCO-DMO Data Manager people_4_role_type=related project=Trophic BATS projects_0_acronym=Trophic BATS projects_0_description=Fluxes of particulate carbon from the surface ocean are greatly influenced by the size, taxonomic composition and trophic interactions of the resident planktonic community. Large and/or heavily-ballasted phytoplankton such as diatoms and coccolithophores are key contributors to carbon export due to their high sinking rates and direct routes of export through large zooplankton. The potential contributions of small, unballasted phytoplankton, through aggregation and/or trophic re-packaging, have been recognized more recently. This recognition comes as direct observations in the field show unexpected trends. In the Sargasso Sea, for example, shallow carbon export has increased in the last decade but the corresponding shift in phytoplankton community composition during this time has not been towards larger cells like diatoms. Instead, the abundance of the picoplanktonic cyanobacterium, Synechococcus, has increased significantly. The trophic pathways that link the increased abundance of Synechococcus to carbon export have not been characterized. These observations helped to frame the overarching research question, "How do plankton size, community composition and trophic interactions modify carbon export from the euphotic zone?". Since small phytoplankton are responsible for the majority of primary production in oligotrophic subtropical gyres, the trophic interactions that include them must be characterized in order to achieve a mechanistic understanding of the function of the biological pump in the oligotrophic regions of the ocean.
This requires a complete characterization of the major organisms and their rates of production and consumption. Accordingly, the research objectives are: 1) to characterize (qualitatively and quantitatively) trophic interactions between major plankton groups in the euphotic zone and rates of, and contributors to, carbon export and 2) to develop a constrained food web model, based on these data, that will allow us to better understand current and predict near-future patterns in export production in the Sargasso Sea. The investigators will use a combination of field-based process studies and food web modeling to quantify rates of carbon exchange between key components of the ecosystem at the Bermuda Atlantic Time-series Study (BATS) site. Measurements will include a novel DNA-based approach to characterizing and quantifying planktonic contributors to carbon export. The well-documented seasonal variability at BATS and the occurrence of mesoscale eddies will be used as a natural laboratory in which to study ecosystems of different structure. This study is unique in that it aims to characterize multiple food web interactions and carbon export simultaneously and over similar time and space scales. A key strength of the proposed research is also the tight connection and feedback between the data collection and modeling components. Characterizing the complex interactions between the biological community and export production is critical for predicting changes in phytoplankton species dominance, trophic relationships and export production that might occur under scenarios of climate-related changes in ocean circulation and mixing. The results from this research may also contribute to understanding of the biological mechanisms that drive current regional to basin scale variability in carbon export in oligotrophic gyres. projects_0_end_date=2014-09 projects_0_geolocation=Sargasso Sea, BATS site projects_0_name=Plankton Community Composition and Trophic Interactions as Modifiers of Carbon Export in the Sargasso Sea projects_0_project_nid=2150 projects_0_start_date=2010-10 sourceUrl=(local files) Southernmost_Northing=-78.5 standard_name_vocabulary=CF Standard Name Table v55 version=1 Westernmost_Easting=-180.0 xml_source=osprey2erddap.update_xml() v1.3
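The metadata block above lists a machine-readable endpoint for this dataset (metadata_source=https://www.bco-dmo.org/api/dataset/526852). A minimal Python sketch of retrieving it follows; whether the endpoint returns JSON or another representation is an assumption checked at runtime.

import requests

# Minimal sketch: fetch the JeDI dataset metadata from the BCO-DMO API
# endpoint listed above. The response is assumed to be JSON; if it is not,
# the first part of the raw text is printed instead.
resp = requests.get("https://www.bco-dmo.org/api/dataset/526852", timeout=30)
resp.raise_for_status()
try:
    metadata = resp.json()
except ValueError:
    print(resp.text[:500])
else:
    if isinstance(metadata, dict):
        for key in sorted(metadata):
            print(key)
    else:
        print(metadata)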
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The file set is a freely downloadable aggregation of information about Australian schools. The individual files represent a series of tables which, when considered together, form a relational database. The records cover the years 2008-2014 and include information on approximately 9500 primary and secondary school main campuses and around 500 subcampuses. The records all relate to school-level data; no data about individuals are included. All the information has previously been published and is publicly available but it has not previously been released as a documented, useful aggregation. The information includes: (a) the names of schools (b) staffing levels, including full-time and part-time teaching and non-teaching staff (c) student enrolments, including the number of boys and girls (d) school financial information, including Commonwealth government, state government, and private funding (e) test data, potentially for school years 3, 5, 7 and 9, relating to an Australian national testing programme known by the trademark 'NAPLAN'
Documentation of this Edition 2016.1 is incomplete but the organization of the data should be readily understandable to most people. If you are a researcher, the simplest way to study the data is to make use of the SQLite3 database called 'school-data-2016-1.db'. If you are unsure how to use an SQLite database, ask a guru.
The database was constructed directly from the other included files by running the following command at a command-line prompt: sqlite3 school-data-2016-1.db < school-data-2016-1.sql Note that a few non-consequential errors will be reported if you run this command yourself. The reason for the errors is that the SQLite database is created by importing a series of '.csv' files. Each of the .csv files contains a header line with the names of the variables relevant to each column. The information is useful for many statistical packages but it is not what SQLite expects, so it complains about the header. Despite the complaint, the database will be created correctly.
Briefly, the data are organized as follows. (1) The .csv files ('comma separated values') do not actually use a comma as the field delimiter. Instead, the vertical bar character '|' (ASCII octal 174, decimal 124, hex 7C) is used. If you read the .csv files using Microsoft Excel, Open Office, or Libre Office, you will need to set the field separator to '|'. Check your software documentation to understand how to do this. (2) Each school-related record is indexed by an identifier called 'ageid'. The ageid uniquely identifies each school and consequently serves as the appropriate variable for JOIN-ing records in different data files. For example, the first school-related record after the header line in file 'students-headed-bar.csv' shows the ageid of the school as 40000. The relevant school name can be found by looking in the file 'ageidtoname-headed-bar.csv' to discover that the ageid of 40000 corresponds to a school called 'Corpus Christi Catholic School'. (3) In addition to the variable 'ageid', each record is also identified by one or two 'year' variables. The most important purpose of a year identifier is to indicate the year that is relevant to the record. For example, if one turns again to file 'students-headed-bar.csv', one sees that the first seven school-related records after the header line all relate to the school Corpus Christi Catholic School with ageid of 40000. The variable that identifies the important differences between these seven records is the variable 'studentyear'. 'studentyear' shows the year to which the student data refer. One can see, for example, that in 2008, there were a total of 410 students enrolled, of whom 185 were girls and 225 were boys (look at the variable names in the header line). (4) The variables relating to years are given different names in each of the different files ('studentsyear' in the file 'students-headed-bar.csv', 'financesummaryyear' in the file 'financesummary-headed-bar.csv'). Despite the different names, the year variables provide the second-level means for joining information across files. For example, if you wanted to relate the enrolments at a school in each year to its financial state, you might wish to JOIN records using 'ageid' in the two files and, secondarily, matching 'studentsyear' with 'financesummaryyear'. A sketch of such a query is given after this list. (5) The manipulation of the data is most readily done using the SQL language with the SQLite database, but it can also be done in a variety of statistical packages. (6) It is our intention for Edition 2016-2 to create large 'flat' files suitable for use by non-researchers who want to view the data with spreadsheet software. The disadvantage of such 'flat' files is that they contain vast amounts of redundant information and might not display the data in the form that the user most wants. (7) Geocoding of the schools is not available in this edition. (8) Some files, such as 'sector-headed-bar.csv', are not used in the creation of the database but are provided as a convenience for researchers who might wish to recode some of the data to remove redundancy. (9) A detailed example of a suitable SQLite query can be found in the file 'school-data-sqlite-example.sql'. The same query, used in the context of analyses done with the excellent, freely available R statistical package (http://www.r-project.org), can be seen in the file 'school-data-with-sqlite.R'.
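As an illustration of the two-level JOIN described above, here is a minimal Python sketch using the built-in sqlite3 module. The table names ('students', 'financesummary') are guesses based on the .csv file names, and the exact spelling of the year columns should be checked against the header lines and the .sql import script; the supplied 'school-data-sqlite-example.sql' remains the authoritative example.

import sqlite3

# Minimal sketch of the two-level JOIN described above. Table names are
# guesses based on the .csv file names; inspect school-data-2016-1.sql for
# the names actually created by the import script, and check the header
# lines for the exact year-column spellings.
con = sqlite3.connect("school-data-2016-1.db")
query = """
    SELECT s.ageid, s.studentsyear, f.financesummaryyear
    FROM students AS s
    JOIN financesummary AS f
      ON s.ageid = f.ageid
     AND s.studentsyear = f.financesummaryyear
    WHERE s.ageid = 40000;
"""
for row in con.execute(query):
    print(row)
con.close()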
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Information related to diet and energy flow is fundamental to a diverse range of Antarctic and Southern Ocean biological and ecosystem studies. This metadata record describes a database of such information being collated by the SCAR Expert Groups on Antarctic Biodiversity Informatics (EG-ABI) and Birds and Marine Mammals (EG-BAMM) to assist the scientific community in this work. It includes data related to diet and energy flow from conventional (e.g. gut content) and modern (e.g. molecular) studies, stable isotopes, fatty acids, and energetic content. It is a product of the SCAR community and open for all to participate in and use.
Data have been drawn from published literature, existing trophic data collections, and unpublished data. The database comprises five principal tables, relating to (i) direct sampling methods of dietary assessment (e.g. gut, scat, and bolus content analyses, stomach flushing, and observed predation), (ii) stable isotopes, (iii) lipids, (iv) DNA-based diet assessment, and (v) energetics values. The schemas of these tables are described below, and a list of the sources used to populate the tables is provided with the data.
A range of manual and automated checks were used to ensure that the entered data were as accurate as possible. These included visual checking of transcribed values, checking of row or column sums against known totals, and checking for values outside of allowed ranges. Suspicious entries were re-checked against the original source.
Notes on names: Names have been validated against the World Register of Marine Species (http://www.marinespecies.org/). For uncertain taxa, the most specific taxonomic name has been used (e.g. prey reported in a study as "Pachyptila sp." will appear here as "Pachyptila"; "Cephalopods" will appear as "Cephalopoda"). Uncertain species identifications (e.g. "Notothenia rossii?" or "Gymnoscopelus cf. piabilis") have been assigned the genus name (e.g. "Notothenia", "Gymnoscopelus"). Original names have been retained in a separate column to allow future cross-checking. WoRMS identifiers (APHIA_ID numbers) are given where possible.
Grouped prey data in the diet sample table need to be handled with a bit of care. Papers commonly report prey statistics aggregated over groups of prey - e.g. one might give the diet composition by individual cephalopod prey species, and then an overall record for all cephalopod prey. The PREY_IS_AGGREGATE column identifies such records. This allows us to differentiate grouped data like this from unidentified prey items from a certain prey group - for example, an unidentifiable cephalopod record would be entered as Cephalopoda (the scientific name), with "N" in the PREY_IS_AGGREGATE column. A record that groups together a number of cephalopod records, possibly including some unidentifiable cephalopods, would also be entered as Cephalopoda, but with "Y" in the PREY_IS_AGGREGATE column. See the notes on PREY_IS_AGGREGATE, below.
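To make the double-counting guard concrete, here is a minimal Python sketch (pandas) that keeps only non-aggregate prey rows before summarising diet composition. The input file name 'diet.csv' is hypothetical; the column names are those documented in the schema below.

import pandas as pd

# Minimal sketch of the double-counting guard described above: keep only
# non-aggregate prey rows before summarising diet composition. Assumes the
# diet table has been exported to a CSV file (here 'diet.csv', a hypothetical
# file name) with the column names documented below.
diet = pd.read_csv("diet.csv")
non_aggregate = diet[diet["PREY_IS_AGGREGATE"] == "N"]

# Example: number of distinct prey taxa recorded per predator sample.
richness = (
    non_aggregate
    .groupby(["SOURCE_ID", "PREDATOR_SAMPLE_ID"])["PREY_NAME"]
    .nunique()
)
print(richness.head())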
There are two related R packages that provide data access and functionality for working with these data. See the package home pages for more information: https://github.com/SCAR/sohungry and https://github.com/SCAR/solong.
Data table schemas
Sources data table
SOURCE_ID: The unique identifier of this source
DETAILS: The bibliographic details for this source (e.g. "Hindell M (1988) The diet of the royal penguin Eudyptes schlegeli at Macquarie Island. Emu 88:219–226")
NOTES: Relevant notes about this source – if it’s a published paper, this is probably the abstract
DOI: The DOI of the source (paper or dataset), in the form "10.xxxx/yyyy"
Diet data table
RECORD_ID: The unique identifier of this record
SOURCE_ID: The identifier of the source study from which this record was obtained (see corresponding entry in the sources data table)
SOURCE_DETAILS, SOURCE_DOI: The details and DOI of the source, copied from the sources data table for convenience
ORIGINAL_RECORD_ID: The identifier of this data record in its original source, if it had one
LOCATION: The name of the location at which the data was collected
WEST: The westernmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
EAST: The easternmost longitude of the sampling region, in decimal degrees (negative values for western hemisphere longitudes)
SOUTH: The southernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
NORTH: The northernmost latitude of the sampling region, in decimal degrees (negative values for southern hemisphere latitudes)
ALTITUDE_MIN: The minimum altitude of the sampling region, in metres
ALTITUDE_MAX: The maximum altitude of the sampling region, in metres
DEPTH_MIN: The shallowest depth of the sampling, in metres
DEPTH_MAX: The deepest depth of the sampling, in metres
OBSERVATION_DATE_START: The start of the sampling period
OBSERVATION_DATE_END: The end of the sampling period. If sampling was carried out over multiple seasons (e.g. during January of 2002 and January of 2003), this will be the first and last dates (in this example, from 1-Jan-2002 to 31-Jan-2003)
PREDATOR_NAME: The name of the predator. This may differ from predator_name_original if, for example, taxonomy has changed since the original publication, if the original publication had spelling errors or used common (not scientific) names
PREDATOR_NAME_ORIGINAL: The name of the predator, as it appeared in the original source
PREDATOR_APHIA_ID: The numeric identifier of the predator in the WoRMS taxonomic register
PREDATOR_WORMS_RANK, PREDATOR_WORMS_KINGDOM, PREDATOR_WORMS_PHYLUM, PREDATOR_WORMS_CLASS, PREDATOR_WORMS_ORDER, PREDATOR_WORMS_FAMILY, PREDATOR_WORMS_GENUS: The taxonomic details of the predator, from the WoRMS taxonomic register
PREDATOR_GROUP_SOKI: A descriptive label of the group to which the predator belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
PREDATOR_LIFE_STAGE: Life stage of the predator, e.g. "adult", "chick", "larva", "juvenile". Note that if a food sample was taken from an adult animal, but that food was destined for a juvenile, then the life stage will be "juvenile" (this is common with seabirds feeding chicks)
PREDATOR_BREEDING_STAGE: Stage of the breeding season of the predator, if applicable, e.g. "brooding", "chick rearing", "nonbreeding", "posthatching"
PREDATOR_SEX: Sex of the predator: "male", "female", "both", or "unknown"
PREDATOR_SAMPLE_COUNT: The number of predators for which data are given. If (say) 50 predators were caught but only 20 analysed, this column will contain 20. For scat content studies, this will be the number of scats analysed
PREDATOR_SAMPLE_ID: The identifier of the predator(s). If predators are being reported at the individual level (i.e. PREDATOR_SAMPLE_COUNT = 1) then PREDATOR_SAMPLE_ID is the individual animal ID. Alternatively, if the data values being entered here are from a group of predators, then the PREDATOR_SAMPLE_ID identifies that group of predators. PREDATOR_SAMPLE_ID values are unique within a source (i.e. SOURCE_ID, PREDATOR_SAMPLE_ID pairs are globally unique). Rows with the same SOURCE_ID and PREDATOR_SAMPLE_ID values relate to the same predator individual or group of individuals, and so can be combined (e.g. for prey diversity analyses). Subsamples are indicated by a decimal number S.nnn, where S is the parent PREDATOR_SAMPLE_ID, and nnn (001-999) is the subsample number. Studies will sometimes report detailed prey information for a large sample, but then report prey information for various subsamples of that sample (e.g. broken down by predator sex, or sampling season). In the simplest case, the diet of each predator will be reported only once in the study, and in this scenario the PREDATOR_SAMPLE_ID values will simply be 1 to N (for N predators).
PREDATOR_SIZE_MIN, PREDATOR_SIZE_MAX, PREDATOR_SIZE_MEAN, PREDATOR_SIZE_SD: The minimum, maximum, mean, and standard deviation of the size of the predators in the sample
PREDATOR_SIZE_UNITS: The units of size (e.g. "mm")
PREDATOR_SIZE_NOTES: Notes on the predator size information, including a definition of what the size value represents (e.g. "total length", "standard length")
PREDATOR_MASS_MIN, PREDATOR_MASS_MAX, PREDATOR_MASS_MEAN, PREDATOR_MASS_SD: The minimum, maximum, mean, and standard deviation of the mass of the predators in the sample
PREDATOR_MASS_UNITS: The units of mass (e.g. "g", "kg")
PREDATOR_MASS_NOTES: Notes on the predator mass information, including a definition of what the mass value represents
PREY_NAME: The scientific name of the prey item (corrected, if necessary)
PREY_NAME_ORIGINAL: The name of the prey item, as it appeared in the original source
PREY_APHIA_ID: The numeric identifier of the prey in the WoRMS taxonomic register
PREY_WORMS_RANK, PREY_WORMS_KINGDOM, PREY_WORMS_PHYLUM, PREY_WORMS_CLASS, PREY_WORMS_ORDER, PREY_WORMS_FAMILY, PREY_WORMS_GENUS: The taxonomic details of the prey, from the WoRMS taxonomic register
PREY_GROUP_SOKI: A descriptive label of the group to which the prey belongs (currently used in the Southern Ocean Knowledge and Information wiki, http://soki.aq)
PREY_IS_AGGREGATE: "Y" indicates that this row is an aggregation of other rows in this data source. For example, a study might give a number of individual squid species records, and then an overall squid record that encompasses the individual records. Use the PREY_IS_AGGREGATE information to avoid double-counting during analyses
PREY_LIFE_STAGE: Life stage of the prey (e.g. "adult", "chick", "larva")
PREY_SEX: The sex of the prey ("male", "female", "both", or "unknown"). Note that this is generally "unknown"
PREY_SAMPLE_COUNT: The number of prey individuals from which size and mass measurements were made (note: this is NOT the total number of individuals of
Holocene climate reconstructions are useful for understanding the diverse features and spatial heterogeneity of past and future climate change. Here we present a database of western North American Holocene paleoclimate records. The database gathers paleoclimate time series from 184 terrestrial and marine sites, including 381 individual proxy records. The records span at least 4,000 of the last 12,000 years (median duration of 10,725 years) and have been screened for resolution, chronologic control, and climate sensitivity. Records that reflect temperature, hydroclimate, or circulation features were included. The database is shared in the machine-readable Linked Paleo Data (LiPD) format and includes geochronologic data for generating site-level time-uncertain ensembles. This publicly accessible and curated collection of proxy paleoclimate records will have wide research applications, including, for example, investigations of the primary features of ocean–atmospheric circulation along the eastern margin of the North Pacific and the latitudinal response of climate to orbital changes.
This data release includes cross section survey data collected during site visits to USGS gaging stations located throughout the Willamette and Delaware River Basins and multispectral images of these locations acquired as close in time as possible to the date of each site visit. In addition, MATLAB source code developed for the Bathymetric Mapping using Gage Records and Image Databases (BaMGRID) framework is also provided. The site visit data were obtained from the Aquarius Time Series database, part of the USGS National Water Information System (NWIS), using the Publish Application Programming Interface (API). More specifically, a custom MATLAB function was used to query the FieldVisitDataByLocationServiceRequest endpoint of the Aquarius API by specifying the gaging station ID number and the date range of interest and then retrieve the QRev XML attachments associated with site visits meeting these criteria. These XML files were then parsed using another custom MATLAB function that served to extract the cross section survey data collected during the site visit. Note that because many of the site visits involved surveying cross sections using instrumentation that was not GPS-enabled, latitude and longitude coordinates were not available and no data values (NaN) are used in the site visit files provided in this data release. Remotely sensed data acquired as close as possible to the date of each site visit were also retrieved via APIs. Multispectral satellite images from the PlanetScope constellation were obtained using custom MATLAB functions developed to interact with the Planet Orders API, which provided tools for clipping the images to a specified area of interest focused on the gaging station and harmonizing the pixel values to be consistent across the different satellites within the PlanetScope constellation. The data product retrieved was the PlanetScope orthorectified 8-band surface reflectance bundle. PlanetScope images are acquired with high frequency, often multiple times per day at a given location, and so the search was restricted to a time window spanning from three days prior to three days after the site visit. All images meeting these criteria were downloaded and manually inspected; the highest quality image closest in time to the site visit date was retained for further analysis. For the gaging stations within the Willamette River Basin, digital aerial photography acquired through the National Agricultural Imagery Program (NAIP) in 2022 was obtained using a similar set of MATLAB functions developed to access the USGS EarthExplorer Machine-to-Machine (M2M) API. The NAIP quarter-quadrangle image encompassing each gaging station was downloaded and then clipped to a smaller area centered on the gaging station. Only one NAIP image at each gaging station was acquired in 2022, so differences in streamflow between the image acquisition date and the date of the site visit closest in time were accounted for by performing separate NWIS web queries to retrieve the stage and discharge recorded at the gaging station on the date the image was acquired and on the date of the site visit. These data sets were used as an example application of the framework for Bathymetric Mapping using Gage Records and Image Databases (BaMGRID) and this data release also provides MATLAB source code developed to implement this approach.
The code is packaged in a zip archive that includes the following individual .m files: 1) getSiteVisit.m, for retrieving data collected during site visits to USGS gaging stations through the Aquarius API; 2) Qrev2depth.m, for parsing the XML file from the site visit and extracting depth measurements surveyed along a channel cross section during a direct discharge measurement; 3) orderPlanet.m, for searching for and ordering PlanetScope images via the Planet Orders API; 4) pollThenGrabPlanet.m, for querying the status of an order and then downloading PlanetScope images requested through the Planet Orders API; 5) organizePlanet.m, for file management and cleanup of the original PlanetScope image data obtained via the previous two functions; 6) ingestNaip.m, for searching for, ordering, and downloading NAIP data via the USGS Machine-to-Machine (M2M) API; 7) naipExtractClip.m, for clipping the downloaded NAIP images to the specified area of interest and performing file management and cleanup; and 8) crossValObra.m, for performing spectrally based depth retrieval via the Optimal Band Ratio Analysis (OBRA) algorithm using a k-fold cross-validation approach intended for small sample sizes. The files provided through this data release include: 1) A zipped shapefile with polygons delineating the Willamette and Delaware River basins 2) .csv text files with information on site visits within each basin during 2022 3) .csv text files with information on PlanetScope images of each gaging station close in time to the date of each site visit that can be used to obtain the image data through the Planet Orders API or Planet Explorer web interface. 4) A .csv text file with information on NAIP images of each gaging station in the Willamette River Basin as close in time as possible to the date of each site visit, along with the stage and discharge recorded at the gaging station on the date of image acquisition and the date of the site visit. 5) A zip archive of the clipped NAIP images of each gaging station in the Willamette River Basin in GeoTIFF format. 6) A zip archive with source code (MATLAB *.m files) developed to implement the Bathymetric Mapping using Gage Records and Image Databases (BaMGRID) framework.
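For readers who want to reproduce the kind of NWIS web query mentioned above without the MATLAB code, the following Python sketch retrieves stage (parameter 00065) and discharge (parameter 00060) from the public USGS waterservices instantaneous-values endpoint. The station number and date are only examples, and the JSON layout follows the WaterML-style response of that service; this sketch is not part of the data release.

import requests

# Minimal sketch of an NWIS web query for stage (00065) and discharge (00060)
# recorded at a gaging station on a given date. The site number below is only
# an example; substitute the station of interest.
url = "https://waterservices.usgs.gov/nwis/iv/"
params = {
    "format": "json",
    "sites": "01463500",          # example gaging station ID
    "parameterCd": "00060,00065", # discharge, gage height
    "startDT": "2022-06-01",
    "endDT": "2022-06-01",
}
data = requests.get(url, params=params, timeout=60).json()
for series in data["value"]["timeSeries"]:
    name = series["variable"]["variableName"]
    first = series["values"][0]["value"][0]
    print(name, first["dateTime"], first["value"])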
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Randomised controlled trials (RCTs) are widely accepted as the preferred study design for evaluating healthcare interventions. When the sample size is determined, a (target) difference is typically specified that the RCT is designed to detect. This provides reassurance that the study will be informative, i.e., should such a difference exist, it is likely to be detected with the required statistical precision. The aim of this review was to identify potential methods for specifying the target difference in an RCT sample size calculation. Methods and Findings: A comprehensive systematic review of medical and non-medical literature was carried out for methods that could be used to specify the target difference for an RCT sample size calculation. The databases searched were MEDLINE, MEDLINE In-Process, EMBASE, the Cochrane Central Register of Controlled Trials, the Cochrane Methodology Register, PsycINFO, Science Citation Index, EconLit, the Education Resources Information Center (ERIC), and Scopus (for in-press publications); the search period was from 1966 or the earliest date covered, to between November 2010 and January 2011. Additionally, textbooks addressing the methodology of clinical trials and International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) tripartite guidelines for clinical trials were also consulted. A narrative synthesis of methods was produced. Studies that described a method that could be used for specifying an important and/or realistic difference were included. The search identified 11,485 potentially relevant articles from the databases searched. Of these, 1,434 were selected for full-text assessment, and a further nine were identified from other sources. Fifteen clinical trial textbooks and the ICH tripartite guidelines were also reviewed. In total, 777 studies were included, and within them, seven methods were identified: anchor, distribution, health economic, opinion-seeking, pilot study, review of the evidence base, and standardised effect size. Conclusions: A variety of methods are available that researchers can use for specifying the target difference in an RCT sample size calculation. Appropriate methods may vary depending on the aim (e.g., specifying an important difference versus a realistic difference), context (e.g., research question and availability of data), and underlying framework adopted (e.g., Bayesian versus conventional statistical approach). Guidance on the use of each method is given. No single method provides a perfect solution for all contexts. Please see later in the article for the Editors' Summary
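To show how the chosen target difference feeds the sample size calculation, here is a generic two-sample calculation in Python. This is a standard textbook formula, not one of the seven methods identified by the review, and the numbers in the example are arbitrary.

from math import ceil
from scipy.stats import norm

# Generic per-group sample size for detecting a target difference between two
# means with a two-sided test: n = 2 * ((z_{1-alpha/2} + z_{1-beta}) * sd / d)^2.
def n_per_group(target_difference, sd, alpha=0.05, power=0.9):
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / target_difference) ** 2)

# Example: detecting a 5-point difference with SD 12 at 90% power.
print(n_per_group(target_difference=5, sd=12))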
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is an RO-Crate of a database from the Online Heritage Resource Manager, a databasing system from the eScholarhip Research Centre of the University of Melbourne. The crate has been built from a dump of the OHRM database. While we have made the best effort possible to create a valid RO-Crate, it is possible for various reasons that the crate contains broken links or improperly described files. If you are the owner of the original database, please contact the maintainer of this RO-Crate in Figshare.
To view the contents of the crate, download the deposited file, unzip it, and open the ro-crate-preview.html file in the top directory.
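For programmatic inspection rather than the HTML preview, the crate's contents can be listed from its ro-crate-metadata.json file, which the RO-Crate specification places at the root of every crate. A minimal Python sketch follows; the zip file name is a placeholder for the deposited file.

import json
import zipfile

# Minimal sketch: list the entities described in the crate's JSON-LD metadata
# instead of opening the HTML preview. 'crate.zip' is a placeholder for the
# deposited file name.
with zipfile.ZipFile("crate.zip") as zf:
    meta_name = next(n for n in zf.namelist()
                     if n.endswith("ro-crate-metadata.json"))
    metadata = json.loads(zf.read(meta_name))

for entity in metadata["@graph"]:
    print(entity.get("@id"), entity.get("@type"))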
Conversion Software Registry (CSR) has been designed for collecting information about software packages that are capable of file format conversions. The work is motivated by a community need for finding file format conversions inaccessible via current search engines and by the specific need to support systems that could actually perform conversions, such as the NCSA Polyglot. In addition, the value of CSR is in complementing the existing file format registries and introducing software quality information obtained by content-based comparisons of files before and after conversions. The contribution of this work is in the CSR data model design that includes file format extension based conversion, as well as software scripts, software quality measures and test file specific information for evaluating software quality. The CSR system serves as the source of information and a test bed for the system that can execute the conversions automatically by using third party software, for example, NCSA Polyglot. The CSR system is a database with a web-based interface that provides services related to a) finding a conversion path between formats, b) uploading information about 3rd party software packages and file extensions, c) uploading files for testing, and finally d) uploading scripts in operating system (OS) specific scripting languages (Windows AutoHotKey, AppleScript and Perl) for automated conversions according to the idea of imposed code reuse used by NCSA Polyglot. In order to provide file format conversion services, the CSR includes the following components related to software capable of conversions: input and output file formats (extensions), scripts operating on the software, validated files to be used for information loss measurements, as well as quantitative measures of the information loss for conversions. The CSR focuses on software and on finding the format conversion paths described by a number of software packages and unique input and output formats. The formats themselves are represented by extensions. While not always unique, extensions are often the only accessible information when the 3rd party software is installed (often listed under the File/Open menu in most packages). The CSR also contains information about the software, operating system, software interface and scripts to execute the software. The scripts are important for automating conversions with the 3rd party software and can be implemented using AutoHotkey scripts (Windows), AppleScript (Mac) or one of a variety of scripting languages for Unix. The information loss due to file format conversions is measured externally by different techniques within the NCSA object-to-object comparison framework called Versus. The comparison is relevant to the software domain, for example for 3D applications surface area or spin images are used, and the loss score (0-100 range, with 100 representing no loss) for a particular software-conversion pair is stored in the database. The information loss scores also serve as edge weights in the Input/Output (I/O) Graph, a simple workflow used for finding the shortest conversion path. The CSR is written as a web service. It consists of three main components: Query, Add, Edit. In Query mode users can a) view a list of all software packages with their conversion options, b) select subsets of software in the I/O Graph, c) search the database by conversions, software, extensions, MIME type and PUID.
The I/O Graph contains all information about installed applications and the conversions they allow. The Java applet front end is part of the CSR web visualization interface. The Add section allows users to add new software packages with their conversion capabilities and upload the software scripts to automate them. The last section, Edit, is designed for adding detailed information about the software and extensions and for uploading the test files. CSR requires users to log in for adding and editing. The web fields are auto-completed to assist searching. Sponsors: This research was partially supported by a National Archive and Records Administration (NARA) supplement to NSF PACI cooperative agreement CA #SCI-9619019. Keywords: Software, Registry, Information, Conversion, Database, Tool
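To illustrate the shortest-conversion-path idea behind the I/O Graph, here is a small self-contained Python sketch. The edge data are invented, and deriving an edge cost as 100 minus the stored quality score is just one plausible weighting, not necessarily the one CSR itself uses.

import heapq

# Toy sketch of the shortest-conversion-path idea behind the CSR I/O Graph.
# Edges are (source format, target format, score) where the score is the
# stored 0-100 measure (100 = no information loss); the cost used here,
# 100 - score, is one plausible weighting, not necessarily CSR's own.
edges = [("doc", "pdf", 90), ("doc", "odt", 98), ("odt", "pdf", 97)]

graph = {}
for src, dst, score in edges:
    graph.setdefault(src, []).append((dst, 100 - score))

def shortest_path(graph, start, goal):
    # Standard Dijkstra over format nodes.
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph.get(node, []):
            heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None

print(shortest_path(graph, "doc", "pdf"))  # -> (5, ['doc', 'odt', 'pdf'])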
Premium B2C Consumer Database - 269+ Million US Records
Supercharge your B2C marketing campaigns with our comprehensive consumer database, featuring over 269 million verified US consumer records. Our 20+ years of data expertise deliver higher quality and more extensive coverage than competitors.
Core Database Statistics
Consumer Records: Over 269 million
Email Addresses: Over 160 million (verified and deliverable)
Phone Numbers: Over 76 million (mobile and landline)
Mailing Addresses: Over 116 million (NCOA processed)
Geographic Coverage: Complete US (all 50 states)
Compliance Status: CCPA compliant with consent management
Targeting Categories Available
Demographics: Age ranges, education levels, occupation types, household composition, marital status, presence of children, income brackets, and gender (where legally permitted)
Geographic: Nationwide, state-level, MSA (Metropolitan Statistical Area), zip code radius, city, county, and SCF range targeting options
Property & Dwelling: Home ownership status, estimated home value, years in residence, property type (single-family, condo, apartment), and dwelling characteristics
Financial Indicators: Income levels, investment activity, mortgage information, credit indicators, and wealth markers for premium audience targeting
Lifestyle & Interests: Purchase history, donation patterns, political preferences, health interests, recreational activities, and hobby-based targeting
Behavioral Data: Shopping preferences, brand affinities, online activity patterns, and purchase timing behaviors
Multi-Channel Campaign Applications
Deploy across all major marketing channels:
Email marketing and automation
Social media advertising
Search and display advertising (Google, YouTube)
Direct mail and print campaigns
Telemarketing and SMS campaigns
Programmatic advertising platforms
Data Quality & Sources
Our consumer data aggregates from multiple verified sources:
Public records and government databases
Opt-in subscription services and registrations
Purchase transaction data from retail partners
Survey participation and research studies
Online behavioral data (privacy compliant)
Technical Delivery Options
File Formats: CSV, Excel, JSON, XML formats available
Delivery Methods: Secure FTP, API integration, direct download
Processing: Real-time NCOA, email validation, phone verification
Custom Selections: 1,000+ selectable demographic and behavioral attributes
Minimum Orders: Flexible based on targeting complexity
Unique Value Propositions
Dual Spouse Targeting: Reach both household decision-makers for maximum impact
Cross-Platform Integration: Seamless deployment to major ad platforms
Real-Time Updates: Monthly data refreshes ensure maximum accuracy
Advanced Segmentation: Combine multiple targeting criteria for precision campaigns
Compliance Management: Built-in opt-out and suppression list management
Ideal Customer Profiles
E-commerce retailers seeking customer acquisition
Financial services companies targeting specific demographics
Healthcare organizations with compliant marketing needs
Automotive dealers and service providers
Home improvement and real estate professionals
Insurance companies and agents
Subscription services and SaaS providers
Performance Optimization Features
Lookalike Modeling: Create audiences similar to your best customers
Predictive Scoring: Identify high-value prospects using AI algorithms
Campaign Attribution: Track performance across multiple touchpoints
A/B Testing Support: Split audiences for campaign optimization
Suppression Management: Automatic opt-out and DNC compliance
Pricing & Volume Options
Flexible pricing structures accommodate businesses of all sizes:
Pay-per-record for small campaigns
Volume discounts for large deployments
Subscription models for ongoing campaigns
Custom enterprise pricing for high-volume users
Data Compliance & Privacy
VIA.tools maintains industry-leading compliance standards:
CCPA (California Consumer Privacy Act) compliant
CAN-SPAM Act adherence for email marketing
TCPA compliance for phone and SMS campaigns
Regular privacy audits and data governance reviews
Transparent opt-out and data deletion processes
Getting Started
Our data specialists work with you to:
Define your target audience criteria
Recommend optimal data selections
Provide sample data for testing
Configure delivery methods and formats
Implement ongoing campaign optimization
Why We Lead the Industry
With over two decades of data industry experience, we combine extensive database coverage with advanced targeting capabilities. Our commitment to data quality, compliance, and customer success has made us the preferred choice for businesses seeking superior B2C marketing performance.
Contact our team to discuss your specific ta...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*Null hypothesis: there was no difference between the number of records identified from each of the seven databases searched for the sampled and not sampled groups. The p-value tests whether the number of records identified in at least one of the seven databases differed from that of at least one other database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
BioSharing (http://www.biosharing.org) is a curated, web-based, searchable portal of over 1,300 records describing content standards, databases and data policies in the life sciences, broadly encompassing the biological, natural and biomedical sciences. Among many features, the records can be searched and filtered, or grouped via the ‘Collection’ feature according to field of interest. An example is the Collection curated with the NIH BD2K bioCADDIE project, for various purposes. First, to select and track content standards that have been reviewed during the creation of the metadata model underpinning the Data Discovery Index. Second, as the work progresses and the prototype Index harvests dataset descriptors from different databases, the Collection will be extended to include the descriptions of these databases, including which (if any) standards they implement. This is key to support one of the bioCADDIE project use cases: to allow the searching and filtering of datasets that are compliant to a given community standard. Despite a growing set of standard guidelines and formats for describing their experiments, the barriers to authoring the experimental metadata necessary for sharing and interpreting datasets are tremendously high. Understanding how to comply with these standards takes time and effort and researchers view this as a burden that may benefit other scientists, but not themselves. To tackle this, with and for the NIH BD2K CEDAR project, we will explore methods to serve machine-readable versions of these standards that can inform the creation of metadata templates, rendering standards invisible to the researchers and driving them to strive for easier authoring of the experimental metadata. Lastly, as part of the ELIXIR-UK Node BioSharing is being developed to be the ELIXIR Standards Registry and will be progressively cross-linked to other registries, such as the ELIXIR Tools and Services Registry and the ELIXIR Training e-Support System (TeSS).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Glioblastoma is the most common and fatal primary brain tumor in adults. Even with maximal resection and a series of postoperative adjuvant treatments, the median overall survival (OS) of glioblastoma patients remains approximately 15 months. The Huashan Hospital glioma bank contains more than 2000 glioma tissue samples with long-term follow-up data; almost half of these samples are from glioblastoma patients. Several large glioma databases with long-term follow-up data have reported outcomes of glioblastoma patients from countries other than China. We investigated the prognosis of glioblastoma patients in China and compared the survival outcomes among patients from different databases. The data for 967 glioblastoma patients who underwent surgery at Huashan Hospital and had long-term follow-up records were obtained from our glioma registry (diagnosed from 29 March 2010 through 7 June 2017). Patients were eligible for inclusion if they underwent surgical resection for newly diagnosed glioblastomas and had available survival data and personal information. Data for 778 glioblastoma patients were collected from three separate online databases: 448 patients from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov), 191 from the REpository for Molecular BRAin Neoplasia DaTa (REMBRANDT) database (GSE108476), and 132 from data set GSE16011 (hereafter called the French database). We compared the prognosis of glioblastoma patients from records among the different databases and the changes in survival outcomes of glioblastoma patients from Huashan Hospital over an 8-year period. The median OS of glioblastoma patients was 16.3 (95% CI: 15.4–17.2) months for Huashan Hospital, 13.8 (95% CI: 12.9–14.9) months for TCGA, 19.3 (95% CI: 17.0–20.0) months for the REMBRANDT database, and 9.1 months for the French database. The median OS of glioblastoma patients from Huashan Hospital improved from 15.6 (2010–2013, 95% CI: 14.4–16.6) months to 18.2 (2014–2017, 95% CI: 15.8–20.6) months over the study period (2010–2017). In addition, the prognosis of glioblastoma patients with total resection was significantly better than that of glioblastoma patients with sub-total resection or biopsy. Our study confirms that treatment centered around maximal surgical resection brought survival benefits to glioblastoma patients after adjusting for validated prognostic factors. In addition, an improvement in prognosis was observed among glioblastoma patients from Huashan Hospital over the course of our study. We attribute this improvement to the adoption of a new standard of neurosurgical treatment on the basis of neurosurgical multimodal technologies. Even though the prognosis of glioblastoma patients remains poor, gradual progress is being made.
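The median OS figures above are the kind of quantity a Kaplan-Meier fit produces; a minimal Python sketch using the lifelines package is shown here for orientation. The file and column names are hypothetical and do not refer to the Huashan registry or the public databases, and the study's own analysis pipeline is not described in this entry.

import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.utils import median_survival_times

# Sketch of estimating median overall survival (with its 95% CI) from a
# generic survival table. 'survival.csv', 'os_months' and 'event' are
# hypothetical file/column names.
df = pd.read_csv("survival.csv")
kmf = KaplanMeierFitter()
kmf.fit(durations=df["os_months"], event_observed=df["event"])

print("median OS (months):", kmf.median_survival_time_)
print(median_survival_times(kmf.confidence_interval_))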
https://spdx.org/licenses/CC0-1.0.html
The world’s “100 worst invasive species” were listed in 2000. The list is taxonomically diverse and often cited (typically for single-species studies), and its species are frequently reported in global biodiversity databases. We acted on the principle that these notorious species should be well-reported to help answer two questions about global biogeography of invasive species (i.e., not just their invaded ranges): (1) “how are data distributed globally?” and (2) “what predicts diversity?” We collected location data for each of the 100 species from multiple databases; 95 had sufficient data for analyses. For question (1), we mapped global species richness and cumulative occurrences since 2000 in (0.5 degree)2 grids. For question (2) we compared alternative regression models representing non-exclusive hypotheses for geography (i.e., spatial autocorrelation), sampling effort, climate, and anthropocentric effects. Reported locations of the invasive species were spatially-biased, leaving large gaps on multiple continents. Accordingly, species richness was best explained by both anthropocentric effects not often used in biogeographic models (Government Effectiveness, Voice & Accountability, human population size) and typical natural factors (climate, geography; R2 = 0.87). Cumulative occurrence was strongly related to anthropocentric effects (R2 = 0.62). We extract five lessons for invasive species biogeography; foremost is the importance of anthropocentric measures for understanding invasive species diversity patterns and large lacunae in their known global distributions. Despite those knowledge gaps, advanced models here predict well the biogeography of the world’s worst invasive species for much of the world. Methods Data Acquisition and Processing Data were acquired from multiple data bases for the 100 invasive species in February 2022 using the spocc package in R (Chamberlain 2021). Data sources (in alphabetical order) included: the Atlas of Living Australia ('ALA'; https://www.ala.org.au); eBird (http://www.ebird.org/home; Sullivan et al. 2009); the Integrated Digitized Biocollections ('iDigBio'; https://www.idigbio.org; Matsunaga et al. 2013); the Global Biodiversity Information Facility (GBIF (https://www.gbif.org); Ocean 'Biogeographic' Information System ('OBIS'; https://portal.obis.org; Grassle and Stocks 1999); VertNet (https://vertnet.org; Constable et al. 2010); and the US Geological Survey’s Biodiversity Information Serving Our Nation ('BISON'; replaced December 2021 by GBIF). Several databases set limits to 100,000 initial point records (before cleaning, described below) when accessed using spocc. As a result, data for 19 species with >100,000 point records (e.g., the European starling (Sturnus vulgaris Linnaeus) had >23 million point records) were obtained directly from GBIF on 23–25 February 2022, which included records already contributed to GBIF from multiple databases. All searches were based on genus and species epithets, where taxonomic changes in the historical records required decisions. Where an epithet changed since the 100 species were listed in 2000 (Lowe et al. 2000), both former and current names were searched and concatenated. For example, Lowe et al. (2000) listed the American bullfrog as Rana catesbeiana Shaw, 1802 which is now Lithobates catesbeianus (Shaw, 1802); both were included in searches, as well as synonyms. 
Taxonomic synonyms were resolved by referring to the Catalogue of Life (https://www.catalogueoflife.org/), the Centre for Agriculture and Bioscience International (http://www.cabi.org) and World Flora Online (http://worldfloraonline.org). Listed synonyms and new combinations were included in data, whereas undocumented synonyms (i.e., provided in a database but not resolved above) were excluded. Database entries that lacked species epithets (i.e., genus only) were excluded and all identities were at the species level. Some taxa formerly identified in Lowe et al. (2000) as a species are now subspecies (e.g., the red-eared slider Trachemys scripta (Thunberg in Schoepff, 1792) is now Trachemys scripta elegans (Wied-Neuwied, 1838)). For those taxa, data may be more inclusive in current taxonomy than the original intent. However, our use of species-level identities includes sub-specific hybrids (e.g., Parham et al. 2020). Overall, our approach: matched the taxonomic resolution of Lowe et al. (2000); recognized variation through time and space; and included potential hybrids among subspecies. We set a threshold for a species to be included in analyses at > 30 records because we judged distributions with fewer records to be inadequately represented. As a result, four species (notably disease agents or vectors) had too few data to be analyzed here: Aphanomyces astaci Schikora, 1906, Cinara cupressi (Buckton, 1881), Plasmodium relictum (Grassi & Feletti, 1891), and Trogoderma granarium Everts, 1898. Likewise, banana bunchy top virus was not present in databases, despite a reported global distribution (https://www.cabi.org/isc/datasheet/8161). As a potential alternative, we searched for its aphid vector (Pentalonia nigronervosa Coquerel, 1859) but obtained records that fully lacked Africa and Asia, despite the widespread tropical distribution of the virus. We thus treated banana bunchy top virus as an under-reported species and omitted it here. Finally, rinderpest was listed by Lowe et al. (2000) but has since been eradicated (Morens et al. 2011). Following Luque et al. (2014), we replaced rinderpest with Salvinia molesta D. S. Mitch, leaving 95 species to evaluate. Species data were cleaned using two R packages. The scrubr package (https://github.com/ropensci/scrubr) was used with default settings to exclude records with geographic coordinates that were lacking, impossible, incomplete, imprecise, or unlikely. Data were further cleaned using the CoordinateCleaner package (Zizka et al. 2019), where records were excluded if geographic coordinates were zero (i.e., a flag for probable data error), near a country’s capital and geographic centroid, or near administrative locations (e.g., museums, GBIF headquarters). Data were then restricted to unique spatio-temporal records during the years 2000–2021 to exclude duplicate entries. This step also omitted older records that tend to have greater taxonomic and geographic uncertainty (e.g., GPS selective availability was removed in 2000). Finally, resulting maps were visually examined, where oddities (e.g., a tropical species located on Baffin Island or a terrestrial species in mid-ocean) were manually excluded from data. That last step removed a few locations per species, if any. As a result of the above process, data were cleaned to be conservative for errors in geographic distribution and consistent in taxonomy with Lowe et al. (2000).
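The final de-duplication and the (0.5 degree)² gridding used for mapping can be sketched in a few lines of Python (pandas), shown here only as an illustration: the study itself used the R packages named above, and the occurrence file and column names below are hypothetical.

import pandas as pd

# Sketch of the last cleaning step and the 0.5-degree gridding described in
# this entry, in Python rather than the R packages actually used. The file
# name 'occurrences.csv' and its column names are hypothetical.
occ = pd.read_csv("occurrences.csv", parse_dates=["eventDate"])

# Keep records from 2000-2021 with usable coordinates, dropping (0, 0) points.
occ = occ.dropna(subset=["decimalLatitude", "decimalLongitude"])
occ = occ[(occ["decimalLatitude"] != 0) | (occ["decimalLongitude"] != 0)]
occ = occ[occ["eventDate"].dt.year.between(2000, 2021)]

# Unique spatio-temporal records only.
occ = occ.drop_duplicates(subset=["species", "decimalLatitude",
                                  "decimalLongitude", "eventDate"])

# Aggregate to (0.5 degree)^2 grid cells and count species per cell.
occ["lat_bin"] = (occ["decimalLatitude"] // 0.5) * 0.5
occ["lon_bin"] = (occ["decimalLongitude"] // 0.5) * 0.5
richness = occ.groupby(["lat_bin", "lon_bin"])["species"].nunique()
print(richness.sort_values(ascending=False).head())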
Aggregation and Mapping

We spatially aggregated point data per species into 0.5° × 0.5° grid cells, using the World Geodetic System (WGS84), the datum underlying the Global Positioning System. Analyses below and the summarized data therefore refer to 0.5° × 0.5° grid cells as the units of study. Aggregation in space smoothed over variable coordinate accuracy in the original records while retaining substantial resolution for global analyses. For two reasons that affected interpretations, we also aggregated data in time by pooling all records obtained for the years 2000–2021. First, species richness is then based on the presence/absence of reported species at any time during 22 consecutive years and should be sensitive to infrequent observations or occurrences. We reasoned that a species absence maintained through two decades was either (a) likely true or (b) due to a lack of submitted records for that location, where the difference may be inferred from the spatial patterns of records. Second, the difference between species richness (i.e., presence/absence) and cumulative occurrence was enhanced. Species richness is fully insensitive to commonality or rarity; a single record obtains the same result as daily repeats for 22 years. In contrast, cumulative occurrences within a 0.5° × 0.5° cell may range from zero to thousands of records over 22 years and so can indicate commonality or rarity. Therefore, the fundamental differences between species richness and occurrences were enhanced here by using data for the years 2000–2021. We mapped species richness and cumulative records to address question 1 (“how are data distributed globally?”).

Potential Predictors of Invasive Species

We analyzed spatial autocorrelation (using the longitude and latitude of 0.5° grid-cell centers) with local estimation (“loess”) regression to obtain a surface representing geographic coordinate effects alone. Loess regression is a robust, nonparametric approach for representing a complex spatial surface (Ferrier et al. 2002, Helsel and Ryker 2002) and is not overly computationally intensive for fine-grained global data, unlike approaches based on covariance matrices or network meshes. The spatial texture of a loess regression surface is determined by its span, where values <1 give more texture and values >1 give smoother surfaces. We modeled species richness and cumulative records using the loess command in R, with degree = 2 (i.e., a 2nd-order polynomial) and least-squares fitting. We iteratively adjusted the span to minimize the residual standard error and maximize the correlation between predicted values and actual total records. Predicted values represented spatial autocorrelation alone. Subsequent models using additional predictors (Table 1) included predictions from the loess model, so that those additional effects were evaluated after spatial autocorrelation had already been accounted for. In addition, the hierarchical structure of grid cells within countries and anthropogenic biomes (anthromes) (i.e., non-independence of grid cells) was handled using spatial GLMMs (Dormann et al. 2007). All other predictors were matched to the 0.5° gridded species data using projectRaster in the R raster package.

Climate conditions
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains INSDC sequences associated with environmental sample identifiers. The dataset is prepared periodically using the public ENA API (https://www.ebi.ac.uk/ena/portal/api/) by querying data with the search parameters: `environmental_sample=True & host=""`
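A rough sketch of such a query against the ENA Portal API is shown below; the result type and field list are assumptions to be checked against the API documentation, since the description above only specifies the search parameters.

```python
# Sketch of querying the ENA Portal API for environmental-sample sequences.
# The 'result' type and 'fields' values are assumptions; consult
# https://www.ebi.ac.uk/ena/portal/api/ for the exact parameters used.
import requests

params = {
    "result": "sequence",
    "query": 'environmental_sample=true AND host=""',
    "fields": "accession,scientific_name,collection_date,country",
    "format": "json",
    "limit": 10,
}
resp = requests.get("https://www.ebi.ac.uk/ena/portal/api/search", params=params)
resp.raise_for_status()
for rec in resp.json():
    print(rec["accession"], rec["scientific_name"])
```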
EMBL-EBI also publishes other records in separate datasets (https://www.gbif.org/publisher/ada9d123-ddb4-467d-8891-806ea8d94230).
The data was then processed as follows:
1. Human sequences were excluded.
2. For non-CONTIG records, the sample accession number (when available) along with the scientific name were used to identify sequence records corresponding to the same individuals (or groups of organisms of the same species in the same sample). Only one record was kept for each scientific name/sample accession number combination.
3. Contigs and whole genome shotgun (WGS) records were added individually.
4. Records that were missing key information were excluded: only records associated with a specimen voucher, or containing both a location AND a date, were kept.
5. Records associated with the same specimen voucher were aggregated together.
6. Many of the remaining records corresponded to individual sequences or reads from the same organisms. In practice, these were "duplicate" occurrence records that were not filtered out in step 2 because the sample accession number was missing. To identify those potential duplicates, we grouped all the remaining records by `scientific_name`, `collection_date`, `location`, `country`, `identified_by`, `collected_by` and `sample_accession` (when available). Then we excluded the groups that contained more than 50 records (a schematic sketch follows this list). The rationale behind the choice of threshold is explained here: https://github.com/gbif/embl-adapter/issues/10#issuecomment-855757978
7. To improve the matching of the EBI scientific name to the GBIF backbone taxonomy, we incorporated the ENA taxonomic information. The kingdom, phylum, class, order, family, and genus were obtained from the ENA taxonomy checklist available here: http://ftp.ebi.ac.uk/pub/databases/ena/taxonomy/sdwca.zip
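A schematic pandas sketch of the step-6 threshold is shown below; the column names are those listed above, while the actual implementation lives in the gbif/embl-adapter project.

```python
# Schematic illustration of the duplicate-suppression threshold in step 6;
# the real implementation is in the gbif/embl-adapter repository.
import pandas as pd

GROUP_COLS = ["scientific_name", "collection_date", "location", "country",
              "identified_by", "collected_by", "sample_accession"]

def drop_probable_duplicates(df: pd.DataFrame, threshold: int = 50) -> pd.DataFrame:
    """Drop groups of near-identical records larger than the threshold."""
    sizes = df.groupby(GROUP_COLS, dropna=False)["scientific_name"].transform("size")
    return df[sizes <= threshold]
```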
More information available here: https://github.com/gbif/embl-adapter#readme
You can find the mapping used to format the EMBL data to Darwin Core Archive here: https://github.com/gbif/embl-adapter/blob/master/DATAMAPPING.md
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
see the introduction kernel on how to access data and use annotation labels
This dataset contains the Lead II signal (with annotations) of 201 records collected from the following 3 databases, available on PhysioNet under open access:
- All signals have been resampled to 128 Hz and the gain has been removed.
- Baseline wander has been removed using median filtering.
- Denoising was NOT used.
- Signal data and annotation labels have been saved in numpy (.npy) format.
- All signals are nearly 30 minutes long.
Data has been organised as follows: the parent directory db_npy contains 3 sub-directories, each representing one database:
- mitdb_npy has 48 records
- svdb_npy has 78 records
- incartdb_npy has 75 records
Each of these database directories contains a 'RECORDS' file that lists the ECG records available in that database.
Each record has 3 files associated with it:
- rec_BEAT.npy: contains 'beat' annotations (R-peak positions and their labels) for the record. Each record may have a variable number of beats depending on heart rate; usually only the beat annotation labels are of interest. Each beat label represents one R-peak and hence one beat.
- rec_NBEAT.npy: contains 'non-beat' annotations (other than R-peaks) for the record.
- rec_SIG_II.npy: contains the Lead II signal data of the record as a single numpy array.
(* see the introduction kernel on how to access data and use annotation labels *)
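As a minimal loading sketch (the file-name pattern is as described above; the record id and the internal layout of the annotation arrays are assumptions, so defer to the introduction kernel):

```python
# Minimal sketch for loading one record; array layouts are assumptions,
# so verify against the dataset's introduction kernel before relying on them.
import numpy as np

rec = "100"  # hypothetical record id inside db_npy/mitdb_npy
base = f"db_npy/mitdb_npy/{rec}"

signal = np.load(f"{base}_SIG_II.npy")                      # Lead II samples at 128 Hz
beats = np.load(f"{base}_BEAT.npy", allow_pickle=True)      # R-peak positions + labels
nonbeats = np.load(f"{base}_NBEAT.npy", allow_pickle=True)  # non-beat annotations

print(signal.shape, len(beats), len(nonbeats))
```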
There are two types of annotations: beat and non-beat annotations. Beat annotations are associated with each heart-beat. If you are working on heart-beat classification, only the beat annotations will be useful and the non-beat annotations can be ignored.
Standard PhysioNet annotations are described in the db_npy/annotations.txt file; these are common across all databases. The file has 3 columns: Column 1: Label; Column 2: Type of label [ b = beat annotation, n = non-beat annotation ]; Column 3: Description.
[Figure: table of standard PhysioNet annotation labels (stdlab.png)]
There are 19 beat annotations and 22 non-beat annotations. However, not all annotations occur in the data files. For example, the label 'r' does not occur even once in any of the three databases, yet it is part of the standard PhysioNet labels (it might be used in some other database). It is advised to do a full annotation count before working with the data.
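Following that advice, here is a hedged sketch of a full annotation count over one database; it assumes the same _BEAT.npy layout as the loading sketch above.

```python
# Count beat-annotation labels across every record of one database.
# Assumption: each entry/row of a *_BEAT.npy file carries its label in the
# last position; adjust once you have checked the real array layout.
import glob
from collections import Counter
import numpy as np

counts = Counter()
for path in sorted(glob.glob("db_npy/mitdb_npy/*_BEAT.npy")):
    beats = np.load(path, allow_pickle=True)
    counts.update(str(row[-1]) for row in beats)

print(counts.most_common())
```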
According to the AAMI recommendation, each beat is classified into one of 5 types [ N, V, S, F, Q ]. However, you are free to choose any classification strategy.
[Figure: relation between MIT-BIH heartbeat labels and AAMI standard classes]
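One commonly used MIT-BIH-to-AAMI grouping is sketched below; it is not taken from this dataset's documentation, so confirm it against the figure above (and AAMI EC57) before use.

```python
# A frequently cited MIT-BIH symbol -> AAMI class mapping (verify before use;
# the assignment of a few symbols varies between papers).
AAMI_CLASSES = {
    "N": ["N", "L", "R", "e", "j"],   # normal, bundle-branch-block and escape beats
    "S": ["A", "a", "J", "S"],        # supraventricular ectopic beats
    "V": ["V", "E"],                  # ventricular ectopic beats
    "F": ["F"],                       # fusion beats
    "Q": ["/", "f", "Q"],             # paced, fusion-of-paced, unclassifiable
}
SYMBOL_TO_AAMI = {sym: cls for cls, syms in AAMI_CLASSES.items() for sym in syms}
```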
- mitdb records '102' and '104' DO NOT have the Lead II signal available, hence the files '102_SIG_II.npy' and '104_SIG_II.npy' are not present. However, their BEAT and NBEAT files are present. It is advised not to use those two records.
- mitdb records '102', '104', '107' and '217' are paced records.
- mitdb record '207' is the only record with 'Flutter' waves; these are not marked by beat annotations (no R-peaks marked), but they are marked by non-beat annotations.
PhysioNet (https://physionet.org/). Citation: Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., Mietus, J. E., Moody, G. B., Peng, C. K., & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online], 101(23), e215–e220.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this project, we work on repairing three datasets:
- A clinical trials dataset in which records sharing the same country_protocol_code belong to the same clinical trial, identified by its eudract_number. Each clinical trial has a title that can help find informative details about the design of the trial. The ground truth samples in the dataset were established by aligning information about the trial populations provided by external registries, specifically the CT.gov database and the German Trials database. Additionally, the dataset comprises other unstructured attributes that categorize the inclusion criteria for trial participants (e.g., an inclusion code).
- A product-allergens dataset in which samples with the same code represent the same product but are extracted from a different source. The allergens are indicated by '2' if present, '1' if there are traces of it, and '0' if it is absent from a product. The dataset also includes information on ingredients in the products. Overall, it comprises categorical structured data describing the presence, trace, or absence of specific allergens, and unstructured text describing ingredients.
N.B.: Each '.zip' file contains a set of 5 '.csv' files which are part of the aforementioned datasets.
Background: Excipient allergy is a rare, but potentially lethal, form of drug allergy. Diagnosing excipient allergy remains difficult in regions without mandatory drug ingredient disclosure and is a significant barrier to drug safety.
Objective: To investigate the feasibility of a drug allergy registry-based excipient database to identify potential excipient culprits in patients with a history of drug allergy, using polyethylene glycol (PEG) as an example.
Methods: An excipient registry was created by compiling the excipient lists pertaining to all available formulations of the top 50 most reported drug allergy culprits in Hong Kong. Availability of excipient information, and its relationship with the total number of formulations of individual drugs, were analysed. All formulations were checked for the presence or absence of PEG.
Results: Complete excipient information was available for 36.5% (729/2,000) of all formulations of the top 50 reported drug allergy culprits in Hong Kong. The number of formulations for each drug was associated with the proportion of available excipient information (ρ = 0.466, p = 0.001). Of the 729 formulations, 109 (15.0%) and 620 (85.0%) were confirmed to contain and not contain PEG, respectively. Excipient information was not available for the other 1,271 (63.6%) formulations. We were unable to confirm the presence or absence of PEG in any of the top 50 drug allergy culprits in Hong Kong.
Conclusion: In countries without mandatory drug ingredient disclosure, excipient databases are unlikely to be able to identify potential excipient allergy in drug allergy patients. Legislation to enforce mandatory and universal ingredient disclosure is urgently needed.