100+ datasets found
  1. O*NET Database

    • onetcenter.org
    excel, mysql, oracle +2
    Updated May 22, 2025
    Cite
    National Center for O*NET Development (2025). O*NET Database [Dataset]. https://www.onetcenter.org/database.html
    Available download formats: oracle, sql server, text, mysql, excel
    Dataset updated
    May 22, 2025
    Dataset provided by
    Occupational Information Network
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Dataset funded by
    United States Department of Labor: http://www.dol.gov/
    Description

    The O*NET Database contains hundreds of standardized and occupation-specific descriptors on almost 1,000 occupations covering the entire U.S. economy. The database, which is available to the public at no cost, is continually updated by a multi-method data collection program. Sources of data include: job incumbents, occupational experts, occupational analysts, employer job postings, and customer/professional association input.

    Data content areas include:

    • Worker Characteristics (e.g., Abilities, Interests, Work Styles)
    • Worker Requirements (e.g., Education, Knowledge, Skills)
    • Experience Requirements (e.g., On-the-Job Training, Work Experience)
    • Occupational Requirements (e.g., Detailed Work Activities, Work Context)
    • Occupation-Specific Information (e.g., Job Titles, Tasks, Technology Skills)

  2. Dr. Duke's Phytochemical and Ethnobotanical Databases

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Dr. Duke's Phytochemical and Ethnobotanical Databases [Dataset]. https://catalog.data.gov/dataset/dr-dukes-phytochemical-and-ethnobotanical-databases-0849e
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    Of interest to pharmaceutical, nutritional, and biomedical researchers, as well as individuals and companies involved with alternative therapies and herbal products, this database is one of the world's leading repositories of ethnobotanical data, evolving out of the extensive compilations by the former Chief of USDA's Economic Botany Laboratory in the Agricultural Research Service in Beltsville, Maryland, in particular his popular Handbook of phytochemical constituents of GRAS herbs and other economic plants (CRC Press, Boca Raton, FL, 1992). In addition to Duke's own publications, the database documents phytochemical information and quantitative data collected over many years through research results presented at meetings and symposia, and findings from the published scientific literature. The current Phytochemical and Ethnobotanical databases facilitate plant, chemical, bioactivity, and ethnobotany searches. A large number of plants and their chemical profiles are covered, and data are structured to support browsing and searching in several user-focused ways. For example, users can:

    • get a list of chemicals and activities for a specific plant of interest, using either its scientific or common name
    • download a list of chemicals and their known activities in PDF or spreadsheet form
    • find plants with chemicals known for a specific biological activity
    • display a list of chemicals with their LD toxicity data
    • find plants with potential cancer-preventing activity
    • display a list of plants for a given ethnobotanical use
    • find out which plants have the highest levels of a specific chemical

    References to the supporting scientific publications are provided for each specific result.

    Resources in this dataset:

    • Resource Title: Duke-Source-CSV.zip. File Name: Duke-Source-CSV.zip. Resource Description: Dr. Duke's Phytochemistry and Ethnobotany - raw database tables for archival purposes. Visit https://phytochem.nal.usda.gov/phytochem/search for the interactive web version of the database.
    • Resource Title: Data Dictionary (preliminary). File Name: DrDukesDatabaseDataDictionary-prelim.csv. Resource Description: This Data Dictionary describes the columns for each table. [Note that this is in progress and some variables are yet to be defined or are unused in the current implementation. Please send comments/suggestions to nal-adc-curator@ars.usda.gov ]

  3. Data from: CottonGen: Cotton Database Resources

    • catalog.data.gov
    • datasetcatalog.nlm.nih.gov
    • +1 more
    Updated Apr 21, 2025
    Cite
    Agricultural Research Service (2025). CottonGen: Cotton Database Resources [Dataset]. https://catalog.data.gov/dataset/cottongen-cotton-database-resources-151bf
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service: https://www.ars.usda.gov/
    Description

    CottonGen (https://www.cottongen.org) is a curated and integrated web-based relational database providing access to publicly available genomic, genetic and breeding data to enable basic, translational and applied research in cotton. Built using the open-source Tripal database infrastructure, CottonGen supersedes CottonDB and the Cotton Marker Database, which includes sequences, genetic and physical maps, genotypic and phenotypic markers and polymorphisms, quantitative trait loci (QTLs), pathogens, germplasm collections and trait evaluations, pedigrees, and relevant bibliographic citations, with enhanced tools for easier data sharing, mining, visualization, and data retrieval of cotton research data. CottonGen contains annotated whole genome sequences, unigenes from expressed sequence tags (ESTs), markers, trait loci, genetic maps, genes, taxonomy, germplasm, publications and communication resources for the cotton community. Annotated whole genome sequences of Gossypium raimondii are available with aligned genetic markers and transcripts. These whole genome data can be accessed through genome pages, search tools and GBrowse, a popular genome browser. Most of the published cotton genetic maps can be viewed and compared using CMap, a comparative map viewer, and are searchable via map search tools. Search tools also exist for markers, quantitative trait loci (QTLs), germplasm, publications and trait evaluation data. CottonGen also provides online analysis tools such as NCBI BLAST and Batch BLAST. This project is funded/supported by Cotton Incorporated, the USDA-ARS Crop Germplasm Research Unit at College Station, TX, the Southern Association of Agricultural Experiment Station Directors, Bayer CropScience, Corteva/Agriscience, Dow/Phytogen, Monsanto, Washington State University, and NRSP10.

    Resources in this dataset:

    • Resource Title: Website Pointer for CottonGen. File Name: Web Page, url: https://www.cottongen.org/ Genomic, Genetic and Breeding Resources for Cotton Research Discovery and Crop Improvement organized by: Species (Gossypium arboreum, barbadense, herbaceum, hirsutum, raimondii, others), Data (Contributors, Download, Submission, Community Projects, Archives, Cotton Trait Ontology, Nomenclatures, and links to Variety Testing Data and NCBI SRA Datasets), Search options (Colleague, Genes and Transcripts, Genotype, Germplasm, Map, Markers, Publications, QTLs, Sequences, Trait Evaluation, MegaSearch), Tools (BIMS, BLAST+, CottonCyc, JBrowse, Map Viewer, Primer3, Sequence Retrieval, Synteny Viewer), International Cotton Genome Initiative (ICGI), and Help sources (User manual, FAQs). Also provides Quick Start links for Major Species and Tools.

  4. Annotated Benchmark of Real-World Data for Approximate Functional Dependency...

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jul 1, 2023
    Cite
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren (2023). Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery [Dataset]. http://doi.org/10.5281/zenodo.8098909
    Available download formats: csv
    Dataset updated
    Jul 1, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Marcel Parciak; Sebastiaan Weytjens; Frank Neven; Niel Hens; Liesbet M. Peeters; Stijn Vansummeren
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Annotated Benchmark of Real-World Data for Approximate Functional Dependency Discovery

    This collection consists of ten open access relations commonly used by the data management community. In addition to the relations themselves (please take note of the references to the original sources below), we added three lists in this collection that describe approximate functional dependencies found in the relations. These lists are the result of a manual annotation process performed by two independent individuals by consulting the respective schemas of the relations and identifying column combinations where one column implies another based on its semantics. As an example, in the claims.csv file, the AirportCode implies AirportName, as each code should be unique for a given airport.

    The file ground_truth.csv is a comma-separated file containing approximate functional dependencies. The table column names the relation, and the lhs and rhs columns reference two columns of that relation for which we found that, semantically, lhs implies rhs.

    The files excluded_candidates.csv and included_candidates.csv list all column combinations that were excluded from or included in the manual annotation, respectively. We excluded a candidate if there was no tuple where both attributes had a value or if the g3_prime value was too small.
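
    To make the idea of an approximate functional dependency concrete, the sketch below computes the classic g3 error: the smallest fraction of tuples that would have to be removed for lhs to determine rhs exactly. This is an illustration only. The description names a g3_prime value but does not define it, so the code uses plain g3 instead; the function name g3_error and the sample rows are made up for the example, and only the AirportCode and AirportName column names come from the description.

```python
from collections import Counter, defaultdict


def g3_error(rows, lhs, rhs):
    """Return the fraction of rows to remove so that lhs -> rhs holds exactly.

    Rows where either attribute is missing are skipped. Returns None when
    no row has a value for both attributes.
    """
    groups = defaultdict(Counter)
    total = 0
    for row in rows:
        left, right = row.get(lhs), row.get(rhs)
        if left in (None, "") or right in (None, ""):
            continue  # ignore tuples without a value for both attributes
        groups[left][right] += 1
        total += 1
    if total == 0:
        return None
    # For each lhs value, keep only the rows carrying its most frequent rhs value.
    kept = sum(max(counter.values()) for counter in groups.values())
    return 1 - kept / total


# Made-up sample rows; one JFK row spells the airport name differently.
rows = [
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy International"},
    {"AirportCode": "JFK", "AirportName": "John F. Kennedy International"},
    {"AirportCode": "JFK", "AirportName": "JFK Intl"},
    {"AirportCode": "LAX", "AirportName": "Los Angeles International"},
]

print(g3_error(rows, "AirportCode", "AirportName"))  # prints 0.25
```

    A value of 0 means the dependency holds exactly on the rows considered; larger values mean more rows violate it.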

    Dataset References

  5. ACCESS DB Version (Aug 29 2017)

    • dataverse.harvard.edu
    7z
    Updated Nov 8, 2017
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Harvard Dataverse (2017). ACCESS DB Version (Aug 29 2017) [Dataset]. http://doi.org/10.7910/DVN/SHMDGU
    Available download formats: 7z(96722220), 7z(71086286)
    Dataset updated
    Nov 8, 2017
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Download CBDB Standalone Database. The standalone version of the China Biographical Database (CBDB) contains data on over 417,000 men and women. Users must download two compressed files: a BASE file and a USER file. To use this database, download 20170829CBDBavBase.7z (the BASE file) and 20170829CBDBavUser.7z (the USER file) and uncompress them to the same folder. Once uncompressed, there will be four items: CBDB_InstallationGuide.pdf, the HelpFiles folder, 20170829CBDBavUser.mdb, and 20170829CBDBavBase.mdb. The CBDB_InstallationGuide.pdf gives instructions on installation: see Part 2, "Installing the Database".
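
    After uncompressing both archives, a quick check that all four items landed in the same folder can save a failed installation. The sketch below is a hypothetical helper, not part of the CBDB distribution: the function name missing_items is made up, and the demo uses a temporary folder rather than a real download.

```python
import tempfile
from pathlib import Path

# The four items the description says should exist after uncompressing.
EXPECTED_ITEMS = (
    "CBDB_InstallationGuide.pdf",
    "HelpFiles",
    "20170829CBDBavUser.mdb",
    "20170829CBDBavBase.mdb",
)


def missing_items(folder):
    """Return the expected file or folder names not found in `folder`."""
    folder = Path(folder)
    return [name for name in EXPECTED_ITEMS if not (folder / name).exists()]


# Demo on a temporary, partially populated folder (not a real CBDB download).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "20170829CBDBavBase.mdb").touch()
    demo_missing = missing_items(tmp)

print(demo_missing)
```

    An empty list means every expected item is present in the folder.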

  6. Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey: http://www.usgs.gov/
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. 
Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note, the PAD-US inventory is now considered functionally complete with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data.
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
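
    The overlap-removal rule described above (GAP Status Code first, then public access in the order Closed, Restricted, Open, Unknown, then geodatabase load order) amounts to a three-level sort key. The sketch below is not the USGS script; it only illustrates that ordering. The field names gap_status, access, and load_order, the function names, and the sample records with their GAP values are all assumptions made for the example.

```python
# Public access values in the priority order given in the description.
ACCESS_ORDER = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}


def priority_key(record):
    """Sort key: lower GAP code, then access rank, then earlier load order."""
    return (
        record["gap_status"],
        ACCESS_ORDER[record["access"]],
        record["load_order"],
    )


def pick_designation(overlapping):
    """Return the record that takes priority among overlapping designations."""
    return min(overlapping, key=priority_key)


# Illustrative records for the "Wilderness within a National Forest" case.
overlapping = [
    {"name": "National Forest", "gap_status": 3, "access": "Open", "load_order": 1},
    {"name": "Wilderness", "gap_status": 1, "access": "Open", "load_order": 2},
]

winner = pick_designation(overlapping)
print(winner["name"])  # prints Wilderness
```

    Using a tuple as the key means later criteria only matter when all earlier ones tie, which matches the stated order of precedence.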

  7. Great Lakes Environmental Database (GLENDA)

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    Updated Mar 16, 2024
    Cite
    U.S. Environmental Protection Agency, Region 5 (2024). Great Lakes Environmental Database (GLENDA) [Dataset]. https://catalog.data.gov/dataset/great-lakes-environmental-database-glenda
    Dataset updated
    Mar 16, 2024
    Dataset provided by
    United States Environmental Protection Agency: http://www.epa.gov/
    Area covered
    The Great Lakes
    Description

    The Great Lakes Environmental Database (GLENDA) houses environmental data collected by EPA Great Lakes National Program Office (GLNPO) programs that sample water, aquatic life, sediments, and air to assess the health of the Great Lakes ecosystem. GLENDA is available to the public on the EPA Central Data Exchange (CDX). A CDX account is required, which anyone may create. GLENDA offers “Ready to Download Data Files” prepared by GLNPO or a “Query Data” interface that allows users to select from predefined parameters to create a customized query. Query results can be downloaded in .csv format. GLNPO programs providing data in GLENDA include the Great Lakes Water Quality Survey and Great Lakes Biology Monitoring Program (1983-present, biannual monitoring throughout the Great Lakes to assess water quality, chemical, nutrient, and physical parameters, and biota such as plankton and benthic invertebrates), the Great Lakes Fish Monitoring and Surveillance Program (1977-present, annual analysis of top predator fish composites to assess historic and emerging persistent, bioaccumulative, or toxic chemical contaminants), the Cooperative Science and Monitoring Initiative (2002-present, intensive water quality and biology sampling of one lake per year focusing on key challenges and data gaps), the Great Lakes Integrated Atmospheric Deposition Network (1990-present, monitoring Great Lakes air and precipitation for persistent toxic chemicals), the Lake Michigan Mass Balance Study (1993-1996, analyzed the atmosphere, tributaries, sediments, water column, and biota of Lake Michigan for nutrients, atrazine, PCBs, trans-nonachlor, and mercury modelling), and the Great Lakes Legacy Act (1996-present, evaluations of sediment contamination in Areas of Concern). GLENDA is updated frequently with new data.

  8. USDA NRCS Soil Survey Geographic Database (SSURGO) Access

    • njogis-newjersey.opendata.arcgis.com
    • share-open-data-njtpa.hub.arcgis.com
    Updated Sep 9, 2020
    Cite
    New Jersey Office of GIS (2020). USDA NRCS Soil Survey Geographic Database (SSURGO) Access [Dataset]. https://njogis-newjersey.opendata.arcgis.com/documents/2af9435fb0f447258c38e8d9609a34cd
    Dataset updated
    Sep 9, 2020
    Dataset authored and provided by
    New Jersey Office of GIS
    Description

    SSURGO consists of spatial data and a comprehensive relational database with tables that describe soil properties, interpretations and productivity values. The USDA Natural Resources Conservation Service (NRCS, formerly Soil Conservation Service) provides a download of the statewide SSURGO database that includes vector and raster spatial data, database tables and their relationship classes, and a user guide. To access SSURGO, go to the USDA NRCS Geospatial Data Gateway. To download the database, click the "Direct Data Download" link under "I Want To..." on the right side of the page. The "Direct Data / NAIP Download" page will then open. Click the "Soils Geographic Databases" link, then click the folder named "gSSURGO by State" (date in folder name). Scroll through the list and select gSSURGO_NJ.zip, then click the Download button on the upper right. A message will open saying "Your Download is In Progress", and you will then be prompted to select a file download location.

  9. RAD-IT database

    • catalog.riits.net
    Updated Nov 16, 2021
    Cite
    (2021). RAD-IT database [Dataset]. https://catalog.riits.net/dataset/rad-it-database
    Dataset updated
    Nov 16, 2021
    Description

    The RAD-IT database, based on version 8.1 of RAD-IT and ARC-IT, is available for download, and includes the interconnect details of the ITS elements and data flows. Interconnect diagrams developed from this database are used throughout the laconnect-it.com website.

  10. Data from: MIT-BIH Arrhythmia Database

    • physionet.org
    • opendatalab.com
    • +1 more
    Updated Feb 24, 2005
    Cite
    George Moody; Roger Mark (2005). MIT-BIH Arrhythmia Database [Dataset]. http://doi.org/10.13026/C2F305
    Dataset updated
    Feb 24, 2005
    Authors
    George Moody; Roger Mark
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.

  11. Genome Database for Rosaceae

    • agdatacommons.nal.usda.gov
    • datasetcatalog.nlm.nih.gov
    bin
    Updated Feb 8, 2024
    Cite
    Washington State University; Clemson University (2024). Genome Database for Rosaceae [Dataset]. http://doi.org/10.15482/USDA.ADC/1176963
    Available download formats: bin
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Ag Data Commons
    Authors
    Washington State University; Clemson University
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Initiated in 2003, the Genome Database for Rosaceae (GDR) is a curated and integrated web-based relational database providing centralized access to Rosaceae genomics, genetics and breeding data and analysis tools to facilitate basic, translational and applied Rosaceae research. GDR is supported by grants from the National Science Foundation Plant Genome Program (2003-2008), USDA National Institute of Food and Agriculture (NIFA) Specialty Crop Research Program (2009-2019), USDA NIFA National Research Support Project 10 (2014-2019), and the Washington Tree Fruit Research Commission (2008-2016), Clemson University, University of Florida and Washington State University. K6084-1: Photo by Jack Dykinga (http://www.ars.usda.gov/is/graphics/photos/aug97/k6084-1.htm).

    Resources in this dataset:

    • Resource Title: Genome Database for Rosaceae - Download Data. File Name: Web Page, url: https://www.rosaceae.org/data/download This is the download page for the Genome Database for Rosaceae; datasets can be downloaded directly from this location.

  12. Download US ZIP Code Dataset - United States of America

    • geopostcodes.com
    csv
    Updated Jun 10, 2025
    Cite
    GeoPostcodes (2025). Download US ZIP Code Dataset - United States of America [Dataset]. https://www.geopostcodes.com/country/united-states-zip-code/
    Available download formats: csv
    Dataset updated
    Jun 10, 2025
    Dataset authored and provided by
    GeoPostcodes
    Area covered
    United States
    Description

    Our United States zip code Database offers comprehensive postal code data for spatial analysis, including postal and administrative areas. This dataset contains accurate and up-to-date information on all administrative divisions, cities, and zip codes, making it an invaluable resource for various applications such as address capture and validation, map and visualization, reporting and business intelligence (BI), master data management, logistics and supply chain management, and sales and marketing. Our location data packages are available in various formats, including CSV, optimized for seamless integration with popular systems like Esri ArcGIS, Snowflake, QGIS, and more. Product features include fully and accurately geocoded data, multi-language support with address names in local and foreign languages, comprehensive city definitions, and the option to combine map data with UNLOCODE and IATA codes, time zones, and daylight saving times. Companies choose our location databases for their enterprise-grade service, reduction in integration time and cost by 30%, and weekly updates to ensure the highest quality.

  13. MIMIC-III Clinical Database

    • physionet.org
    Updated Sep 4, 2016
    + more versions
    Cite
    Alistair Johnson; Tom Pollard; Roger Mark (2016). MIMIC-III Clinical Database [Dataset]. http://doi.org/10.13026/C2XW26
    Dataset updated
    Sep 4, 2016
    Authors
    Alistair Johnson; Tom Pollard; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    MIMIC-III is a large, freely-available database comprising deidentified health-related data associated with over forty thousand patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012. The database includes information such as demographics, vital sign measurements made at the bedside (~1 data point per hour), laboratory test results, procedures, medications, caregiver notes, imaging reports, and mortality (including post-hospital discharge). MIMIC supports a diverse range of analytic studies spanning epidemiology, clinical decision-rule improvement, and electronic tool development. It is notable for three factors: it is freely available to researchers worldwide; it encompasses a diverse and very large population of ICU patients; and it contains highly granular data, including vital signs, laboratory results, and medications.

  14. The DINGO Database, v1.1 (UPDATED version available at DOI:...

    • data.bris.ac.uk
    Updated Apr 23, 2021
    + more versions
    Cite
    (2021). The DINGO Database, v1.1 (UPDATED version available at DOI: 10.5523/bris.1jraem68g7ara21p2oi6hv4z22) - Datasets - data.bris [Dataset]. https://data.bris.ac.uk/data/dataset/89r3npvewel2ea8ttb67ku4d
    Dataset updated
    Apr 23, 2021
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a database of pile load test information that has been built as part of the Engineering and Physical Sciences Research Council (EPSRC) funded project EP/P020933/1: Databases to INterrogate Geotechnical Observations (DINGO) which ran between 1 July 2017 and 9 June 2019. The database is populated with data digitised from the literature as well as datasets supplied by contributors from the geotechnical engineering industry in the United Kingdom. Contributors have agreed in writing for their data to be shared via the DINGO Database and are cited as personal communication. v1.1 is a minor revision of v1.0 with some error corrections. v1.0 can be found at https://doi.org/10.5523/bris.3r14qbdhv648b2p83gjqby2fl8. N.b. these data have been superseded by The DINGO Database, v1.2 (https://doi.org/10.5523/bris.1jraem68g7ara21p2oi6hv4z22).

  15. Global Database Dataset

    • universe.roboflow.com
    zip
    Updated May 10, 2024
    Cite
    Faisal (2024). Global Database Dataset [Dataset]. https://universe.roboflow.com/faisal-trfop/global-database
    Available download formats: zip
    Dataset updated
    May 10, 2024
    Dataset authored and provided by
    Faisal
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Defects JMG4 Bounding Boxes
    Description

    Global Database

    ## Overview
    
    Global Database is a dataset for object detection tasks - it contains Defects JMG4 annotations for 3,449 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
    
  16. Structural Antibody Database

    • dknet.org
    • neuinfo.org
    • +2 more
    Updated Apr 20, 2022
    Cite
    (2022). Structural Antibody Database [Dataset]. http://identifiers.org/RRID:SCR_022096
    Dataset updated
    Apr 20, 2022
    Description

    Database containing all antibody structures available in the PDB, annotated and presented in a consistent fashion. Each structure is annotated with a number of properties including experimental details, antibody nomenclature (e.g. heavy-light pairings), curated affinity data and sequence annotations. You can use the database to inspect individual structures, create and download datasets for analysis, search the database for structures with similar sequences to your query, and monitor the known structural repertoire of antibodies.

  17. Land cover of Bolivia - Globcover (22 classes)

    • data.amerigeoss.org
    html, http, png, zip
    Updated Mar 14, 2023
    + more versions
    Cite
    Food and Agriculture Organization (2023). Land cover of Bolivia - Globcover (22 classes) [Dataset]. https://data.amerigeoss.org/dataset/e3cd1134-b99f-4436-aa59-c540d7237bcf
    Available download formats: http, png, html, zip
    Dataset updated
    Mar 14, 2023
    Dataset provided by
    Food and Agriculture Organization: http://fao.org/
    License

    Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
    License information was derived automatically

    Area covered
    Bolivia
    Description

    This land cover data set is derived from the original raster-based Globcover global archive. It has been post-processed to generate a vector version at national extent with the LCCS regional legend (22 classes worldwide). The database can be analyzed in the GLCN software Advanced Database Gateway (ADG), which provides a user-friendly interface and advanced functionalities to break down the LCCS classes into their classifiers for further aggregations and analysis.

    The data set is intended for free public access.

    The shape file's attributes contain the following fields:

    • Area (sqm)
    • Perimeter (m)
    • ID
    • Gridcode (Globcover cell value)
    • LCCCode (unique LCCS code)

    You can download a zip archive containing:

    • the shape file (.shp)
    • the ArcGis layer file with global legend (.lyr)
    • the ArcView 3 legend file (.avl)
    • the LCCS legend table (.xls)

    Supplemental Information:

    This land cover product is a vector version (ESRI shape) of the Globcover archive that was published in 2008 as a result of an initiative launched in 2004 by the European Space Agency (ESA). Globcover is currently the most recent (2005) and highest-resolution (300 m) global land cover dataset. Given the need for this valuable information in environmental studies, natural resources management and policy formulation, Globcover has been reprocessed through activities of the Global Land Cover Network (GLCN) programme to generate databases at national extent that can be analyzed through the Advanced Database Gateway software (ADG) by GLCN. ADG is a cross-cutting interrogation software that allows the easy and fast recombination of land cover polygons according to individual end-user requirements. Aggregated land cover classes can be generated not only by name, but also using the set of existing classifiers. ADG uses land cover data with a Land Cover Classification System (LCCS) legend. The ADG software is available for download on the GLCN web site at http://www.glcn.org/sof_7_en.jsp

    Contact points:

    Metadata Contact: FAO-Data

    Resource Contact: Antonio Martucci

    Data lineage:

    This land cover database is provided as an ESRI shape file (vector format) and derives from reprocessing the raster-based global archive, Globcover. The Globcover database has undergone the following process: a) vectorization at the national extent using ESRI ArcGis (arcinfo) 9.3; b) topological reconstruction (custom AML scripts launched inside ArcGis-arcinfo 9.3); c) simplification of areas according to a minimum mapping unit of 0.1 sq km (10 ha) (custom AML scripts launched inside ArcGis-arcinfo 9.3); d) application of the FAO/UNEP Land Cover Classification System (LCCS) legend (24 classes globally); e) final processing to assure full compatibility with the GLCN software Advanced Database Gateway (ADG).

    Online resources:

    Download Land cover of Bolivia - Shape file format

    GLOBCOVER on the ESA Web site

    Global Land Cover Network - GLCN

  18. MARMICRODB database for taxonomic classification of (marine) metagenomes

    • zenodo.org
    • explore.openaire.eu
    application/gzip, bin +3
    Updated Mar 20, 2020
    Cite
    Shane L Hogle (2020). MARMICRODB database for taxonomic classification of (marine) metagenomes [Dataset]. http://doi.org/10.5281/zenodo.3520509
    Explore at:
    bin, application/gzip, tsv, html, bz2 (available download formats)
    Dataset updated
    Mar 20, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Shane L Hogle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction:
    This sequence database (MARMICRODB) was introduced in the publication JW Becker, SL Hogle, K Rosendo, and SW Chisholm. 2019. Co-culture and biogeography of Prochlorococcus and SAR11. ISME J. doi:10.1038/s41396-019-0365-4. Please see the original publication and its associated supplementary material for the original description of this resource.

    Motivation:
    We needed a reference database to annotate shotgun metagenomes from the Tara Oceans project [1], the GEOTRACES cruises GA02, GA03, GA10, and GP13, and the HOT and BATS time series [2]. Our interests are primarily in quantifying and annotating the free-living, oligotrophic bacterial groups Prochlorococcus, Pelagibacterales/SAR11, SAR116, and SAR86 from these samples using the protein classifier tool Kaiju [3]. Kaiju's sensitivity and classification accuracy depend on the composition of the reference database, and the highest sensitivity is achieved when the reference database contains a comprehensive representation of expected taxa from an environment/sample of interest. However, the speed of the algorithm decreases as database size increases. Therefore, we aimed to create a reference database that maximized the representation of sequences from marine bacteria, archaea, and microbial eukaryotes, while minimizing (but not excluding) the sequences from clinical, industrial, and terrestrial host-associated samples.

    Results/Description:
    MARMICRODB consists of 56 million non-redundant protein sequences from 18769 bacterial/archaeal/eukaryote genome and transcriptome bins and 7492 viral genomes, optimized for use with the protein homology classifier Kaiju [3]. To ensure maximum representation of marine bacteria, archaea, and microbial eukaryotes, we included translated genes/transcripts from 5397 representative “specI” species clusters from the proGenomes database [4]; 113 transcriptomes from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) [5]; 10509 metagenome assembled genomes from the Tara Oceans expedition [6,7], the Red Sea [8], the Baltic Sea [9], and other aquatic and terrestrial sources [10]; 994 isolate genomes from the Genomic Encyclopedia of Bacteria and Archaea [11]; 7492 viral genomes from NCBI RefSeq [12]; 786 bacterial and archaeal genomes from MarRef [13]; and 677 marine single cell genomes [14]. In order to annotate metagenomic reads at the clade/ecotype level (subspecies) for the focal taxa Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116, we generated custom MARMICRODB taxonomies based on curated genome phylogenies for each group. The curated phylogenies, Kaiju-formatted Burrows-Wheeler index, translated genes, the custom taxonomy hierarchy, an interactive kronaplot of the taxonomic composition, and scripts and instructions for how to use or rebuild the resource are available from 10.5281/zenodo.3520509.

    Methods:
    The curation and quality control of MARMICRODB single cell, metagenome assembled, and isolate genomes was performed as described in [15]. Briefly, we downloaded all MARMICRODB genomes as raw nucleotide assemblies from NCBI. We determined an initial genome taxonomy for these assemblies using checkM with the default lineage workflow [16]. All genome bins met the completion/contamination thresholds outlined in prior studies [7,17]. For single cell and metagenome assembled genomes, especially those from Tara Oceans Mediterranean Sea samples [18], we used the GTDB-Tk classification workflow [19] to verify the taxonomic fidelity of each genome bin. We then selected genomes with a checkM taxonomic assignment of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 for further analysis and confirmed taxonomic assignment using BLAST matches to known Prochlorococcus/Synechococcus ITS sequences and by matching 16S sequences to the SILVA database [20]. To refine our estimates of completeness/contamination of Prochlorococcus genome bins, we created a custom set of 730 single copy protein families (available from 10.5281/zenodo.3719132) from closed, isolate Prochlorococcus genomes [21] for quality assessments with checkM. For Synechococcus we used the CheckM taxonomic-specific workflow with the genus Synechococcus. After the custom CheckM quality control, we excluded from downstream analysis any genome bins that had an estimated quality < 30, defined as %completeness - 5 x %contamination, resulting in 18769 genome/transcriptome bins. We predicted genes in the resulting genome bins using prodigal [22], excluded protein sequences with lengths less than 20 or greater than 20000 amino acids, removed non-standard amino acid residues, and condensed redundant protein sequences to a single representative sequence, to which we assigned a lowest common ancestor (LCA) taxonomy identifier from the NCBI taxonomy database [23].
The resulting protein sequences were compiled and used to build a Kaiju [3] search database.
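
    The quality and length cutoffs described above can be sketched as follows. This is a minimal illustration, not the authors' scripts: the `GenomeBin` class and the function names are hypothetical, and the completeness and contamination values are assumed to be CheckM percentages.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    name: str             # hypothetical bin identifier
    completeness: float   # estimated completeness, in percent
    contamination: float  # estimated contamination, in percent


def quality(b: GenomeBin) -> float:
    """Estimated quality as defined in the text: %completeness - 5 x %contamination."""
    return b.completeness - 5.0 * b.contamination


def keep_bin(b: GenomeBin, min_quality: float = 30.0) -> bool:
    """Bins with estimated quality < 30 are excluded, so a bin at exactly 30 is kept."""
    return quality(b) >= min_quality


def keep_protein(seq: str, min_len: int = 20, max_len: int = 20000) -> bool:
    """Exclude proteins shorter than 20 or longer than 20000 amino acids."""
    return min_len <= len(seq) <= max_len
```

    For instance, a bin that is 50% complete with 5% contamination scores 25 and is dropped, while one that is 95% complete with 2% contamination scores 85 and is kept.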

    The above filtering criteria resulted in 605 Prochlorococcus, 96 Synechococcus, 186 SAR11/Pelagibacterales, 60 SAR86, and 59 SAR116 high-quality genome bins. We constructed a high-quality fixed reference phylogenetic tree for each taxonomic group based on genomes manually selected for completeness and phylogenetic diversity. For example, the Prochlorococcus and Synechococcus genomes for the fixed reference phylogeny are estimated > 90% complete, and SAR11 genomes are estimated > 70% complete. We created multiple sequence alignments of phylogenetically conserved genes from these genomes using the GTDB-Tk pipeline [19] with default settings. The pipeline identifies conserved proteins (120 bacterial proteins) and generates concatenated multi-protein alignments [17] from the genome assemblies using hmmalign from the hmmer software suite. We further filtered the resulting alignment columns using the bacterial and archaeal alignment masks from [17] (http://gtdb.ecogenomic.org/downloads). We removed columns represented by fewer than 50% of all taxa and/or columns with no single amino acid residue occurring at a frequency greater than 25%. We trimmed the alignments using trimal [24] with the automated -gappyout option to trim columns based on their gap distribution. We inferred reference phylogenies using multithreaded RAxML [25] with the GAMMA model of rate heterogeneity, empirically determined base frequencies, and the LG substitution model [26] (PROTGAMMALGF). Branch support is based on 250 resampled bootstrap trees. This tree was then pruned to only allow a maximum average distance to the closest leaf (ADCL) of 0.003 to reduce the phylogenetic redundancy in the tree [27]. We then “placed” genomes that either did not pass the completeness threshold or were considered phylogenetically redundant by ADCL within the fixed reference phylogeny for each group using pplacer [28], representing each placed genome as a pendant edge in the final tree.
We then examined the resulting tree and manually selected clade/ecotype cutoffs to be as consistent as possible with clade definitions previously outlined for these groups [29–32]. We then gave clades from each taxonomic group custom taxonomic identifiers and we added these identifiers to the MARMICRODB Kaiju taxonomic hierarchy.
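
    The alignment column filter described above (drop columns present in fewer than 50% of taxa, or with no single residue above 25% frequency) can be sketched as below. This is an illustrative reading only, not the GTDB-Tk implementation: the function names are hypothetical, gaps are assumed to be written as "-" or ".", and residue frequency is assumed to be measured over all taxa, gaps included.

```python
from collections import Counter

GAP_CHARS = set("-.")  # assumption: these characters mark gaps in the alignment


def keep_column(column, min_occupancy=0.5, min_consensus=0.25):
    """Return True if an alignment column passes both filters from the text."""
    residues = [c for c in column if c not in GAP_CHARS]
    # Filter 1: the column must be represented in at least 50% of taxa.
    if not residues or len(residues) < min_occupancy * len(column):
        return False
    # Filter 2: the most common residue must exceed 25% frequency.
    # Assumption: the denominator is the number of taxa, gaps included.
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(column) > min_consensus


def filter_alignment(rows):
    """Keep only passing columns; rows are equal-length aligned sequences."""
    columns = list(zip(*rows))
    kept = [i for i, col in enumerate(columns) if keep_column(col)]
    return ["".join(row[i] for i in kept) for row in rows]
```

    With four aligned sequences, a column reading A, C, D, E is dropped because no residue exceeds 25%, and an all-gap column is dropped for lack of occupancy.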

    Software/databases used:
    checkM v1.0.11 [16]
    HMMER v3.1b2 (http://hmmer.org/)
    prodigal v2.6.3 [22]
    trimAl v1.4.rev22 [24]
    AliView v1.18.1 [33] [34]
    Phyx v0.1 [35]
    RAxML v8.2.12 [36]
    Pplacer v1.1alpha [28]
    GTDB-Tk v0.1.3 [19]
    Kaiju v1.6.0 [34]
    GTDB RS83 (https://data.ace.uq.edu.au/public/gtdb/data/releases/release83/83.0/)
    NCBI Taxonomy (accessed 2018-07-02) [23]
    TIGRFAM v14.0 [37]
    PFAM v31.0 [38]

    Discussion/Caveats:
    MARMICRODB is optimized for metagenomic samples from the marine environment, in particular planktonic microbes from the pelagic euphotic zone. We expect this database may also be useful for classifying other types of marine metagenomic samples (for example mesopelagic, bathypelagic, or even benthic or marine host-associated), but it has not been tested as such. The original purpose of this database was to quantify clades/ecotypes of Prochlorococcus, Synechococcus, SAR11/Pelagibacterales, SAR86, and SAR116 in metagenomes from the Tara Oceans Expedition and the GEOTRACES project. We carefully annotated and quality controlled genomes from these five groups, but the processing of the other marine taxa was largely automated and unsupervised. Taxonomy for other groups was copied over from the Genome Taxonomy Database (GTDB) [19,39] and NCBI Taxonomy [23], so any inconsistencies in those databases will be propagated to MARMICRODB. For most use cases MARMICRODB can probably be used unmodified, but if the user's goal is to focus on a particular organism/clade that we did not curate in the database, then the user may wish to spend some time curating those genomes (i.e., checking for contamination, dereplicating, building a genome phylogeny for custom taxonomy node assignment). Currently the custom taxonomy is hardcoded in the MARMICRODB.fmi index, but if users wish to modify MARMICRODB by adding or removing genomes, or by reconfiguring taxonomic ranks, the names.dmp and nodes.dmp files can easily be modified, as can the fasta file of protein sequences. However, the Kaiju index will need to be rebuilt, and user will require a high

  19. Dataset

    • figshare.com
    xlsx
    Updated Aug 4, 2021
    + more versions
    Cite
    Maria Pinto (2021). Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.15104079.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Aug 4, 2021
    Dataset provided by
    figshare
    Authors
    Maria Pinto
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the dataset

  20. USGS Wind Turbine Database - Data Download

    • data.geospatialhub.org
    Updated Jul 30, 2021
    Cite
    WyomingGeoHub (2021). USGS Wind Turbine Database - Data Download [Dataset]. https://data.geospatialhub.org/items/c664094ef4684205974b5a17e7f30b0f
    Explore at:
    Dataset updated
    Jul 30, 2021
    Dataset authored and provided by
    WyomingGeoHub
    Description

    The USGS United States Wind Turbine Database (USWTDB) holds data which provide the locations of land-based and offshore wind turbines in the United States as well as corresponding wind project information and turbine technical specifications. The data are available on this page in a variety of tabular and geospatial file formats; cached and dynamic web services are available for users who wish to access the USWTDB as a Representational State Transfer (RESTful) web service. The methods of data collection and related publications are available on this page as well to inform users of the data compilations and other related data sources.
