100+ datasets found
  1. finetune-data-28fee8943227

    • huggingface.co
    Updated Aug 4, 2023
    Cite
    Subset Data, Inc. (2023). finetune-data-28fee8943227 [Dataset]. https://huggingface.co/datasets/subset-data/finetune-data-28fee8943227
    Dataset updated
    Aug 4, 2023
    Dataset authored and provided by
    Subset Data, Inc.
    Description

    Dataset Card for "finetune-data-28fee8943227"

    More Information needed

  2. NCDC Hourly Global Surface Variables-Selected Subset

    • catalog.data.gov
    • data.ca.gov
    Updated Nov 27, 2024
    Cite
    California Office of Environmental Health Hazard Assessment (2024). NCDC Hourly Global Surface Variables-Selected Subset [Dataset]. https://catalog.data.gov/dataset/ncdc-hourly-global-surface-variables-selected-subset
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    California Office of Environmental Health Hazard Assessment
    Description

    Holds hourly surface temperature data from weather stations across the globe and is an important source of temperature data for temperature-health studies.

  3. CELEX Dutch lexical database - Orthography Subset

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Oct 5, 2005
    Cite
    ELRA (European Language Resources Association) (2005). CELEX Dutch lexical database - Orthography Subset [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0029_02/
    Dataset updated
    Oct 5, 2005
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The Dutch CELEX data is derived from R.H. Baayen, R. Piepenbrock & L. Gulikers, The CELEX Lexical Database (CD-ROM), Release 2, Dutch Version 3.1, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995. Apart from orthographic features, the CELEX database comprises representations of the phonological, morphological, syntactic and frequency properties of lemmata. For the Dutch data, frequencies have been disambiguated on the basis of the 42.4m Dutch Instituut voor Nederlandse Lexicologie text corpora.

    To make for greater compatibility with other operating systems, the databases have not been tailored to fit any particular database management program. Instead, the information is presented in a series of plain ASCII files, which can be queried with tools such as AWK and ICON. Unique identity numbers allow the linking of information from different files.

    This database can be divided into different subsets:
    · orthography: with or without diacritics, with or without word division positions, alternative spellings, number of letters/syllables;
    · phonology: phonetic transcriptions with syllable boundaries or primary and secondary stress markers, consonant-vowel patterns, number of phonemes/syllables, alternative pronunciations, frequency per phonetic syllable within words;
    · morphology: division into stems and affixes, flat or hierarchical representations, stems and their inflections;
    · syntax: word class, subcategorisations per word class;
    · frequency of the entries: disambiguated for homographic lemmata.
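    The description says unique identity numbers link records across the plain ASCII files. A minimal Python sketch of such a link (standing in for the AWK or ICON queries mentioned) might look like this. The backslash field separator, the ID being the first field, and the sample rows are all assumptions for illustration, not the documented CELEX file layout.

```python
from typing import Dict, List

# Assumption: one entry per line, fields separated by a backslash, and the
# unique identity number in the first field. This layout is hypothetical.
SEP = "\\"


def parse_celex(lines: List[str]) -> Dict[int, List[str]]:
    """Map each entry's identity number to its remaining fields."""
    table: Dict[int, List[str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(SEP)
        table[int(fields[0])] = fields[1:]
    return table


def link(left: Dict[int, List[str]], right: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Inner-join two parsed files on their shared identity numbers."""
    return {idn: left[idn] + right[idn] for idn in sorted(left.keys() & right.keys())}


# Toy rows for illustration only (not real CELEX content).
orthography = parse_celex(["1\\aal\\3", "2\\aalbes\\6"])
phonology = parse_celex(["1\\al", "3\\Ap"])
print(link(orthography, phonology))  # {1: ['aal', '3', 'al']}
```

    Only IDs present in both files survive the join; an outer join would be a small change if missing entries should be kept.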

  4. Subsetting

    • paper.erudition.co.in
    html
    Updated Mar 17, 2025
    Cite
    Einetic (2025). Subsetting [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
    Available download formats: html
    Dataset updated
    Mar 17, 2025
    Dataset authored and provided by
    Einetic
    License

    https://paper.erudition.co.in/terms

    Description

    Question paper solutions for the chapter Subsetting of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024.

  5. AORC Subset

    • hydroshare.org
    • beta.hydroshare.org
    zip
    Updated Dec 6, 2023
    Cite
    AORC Subset [Dataset]. https://www.hydroshare.org/resource/c1bce473fff641d7a678565af9785c31
    Available download formats: zip (28.3 KB)
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    HydroShare
    Authors
    Ayman Nassar; David Tarboton; Anthony M. Castronova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2010 - Dec 31, 2019
    Description

    The objective of this HydroShare resource is to query AORC v1.0 Forcing data stored on HydroShare's THREDDS server and create a subset of this dataset for a designated watershed and timeframe. The user is prompted to define a temporal frame of interest, which specifies the start and end dates for the data subset, and a spatial frame of interest, which could be a bounding box or a shapefile, to subset the data spatially.

    Before the subsetting is performed, the data are queried and geospatial metadata is added to ensure that the data are correctly aligned with their corresponding locations on the Earth's surface. Two separate notebooks explain in detail how to query the dataset and how to add geospatial metadata to AORC v1.0 data, respectively. In this notebook, functions from the AORC.py script perform these preprocessing steps, resulting in a cleaner notebook that focuses solely on the subsetting process.
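    As a rough illustration of the temporal and spatial subsetting described above, the Python sketch below keeps only gridded records that fall inside a date range and a bounding box. The `Cell` and `BBox` types, the daily time step (AORC itself is hourly), and the sample values are hypothetical; the sketch does not reproduce the resource's AORC.py functions, the THREDDS query, or shapefile-based masking.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass
class Cell:
    """One hypothetical forcing value at a grid point and day."""
    day: date
    lat: float
    lon: float
    value: float


@dataclass
class BBox:
    """Spatial frame of interest as an inclusive lat/lon bounding box."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.min_lat <= lat <= self.max_lat and self.min_lon <= lon <= self.max_lon


def subset(cells: Iterable[Cell], start: date, end: date, box: BBox) -> List[Cell]:
    """Keep cells whose date lies in [start, end] and whose location lies in box."""
    return [c for c in cells if start <= c.day <= end and box.contains(c.lat, c.lon)]


cells = [
    Cell(date(2010, 1, 1), 41.7, -111.8, 2.5),
    Cell(date(2010, 1, 2), 41.7, -111.8, 0.0),
    Cell(date(2012, 6, 1), 41.7, -111.8, 1.1),  # outside the temporal frame
    Cell(date(2010, 1, 1), 35.0, -100.0, 4.2),  # outside the bounding box
]
box = BBox(41.0, 42.5, -112.5, -111.0)
kept = subset(cells, date(2010, 1, 1), date(2010, 12, 31), box)
print(len(kept))  # 2
```

    With real gridded data, a labeled-array library would normally do the same selection by slicing coordinates rather than looping over records.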

  6. Comprehensive Ocean - Atmosphere Data Set (COADS) LMRF Arctic Subset, 1950 -...

    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • search.dataone.org
    • +3 more
    Updated Feb 18, 2025
    Cite
    nasa.gov (2025). Comprehensive Ocean - Atmosphere Data Set (COADS) LMRF Arctic Subset, 1950 - 1995, Version 1 [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/comprehensive-ocean-atmosphere-data-set-coads-lmrf-arctic-subset-1950-1995-version-1
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The Comprehensive Ocean - Atmosphere Data Set (COADS) Long Marine Reports Fixed-Length (LMRF) Arctic subset contains marine surface weather reports for regions north of 65 degrees N from ships, drifting ice stations, and buoys. The COADS LMRF Arctic subset contains data collected over the years 1950 to 1995 and includes the following parameters: air and sea temperature, cloudiness, humidity, and winds. The data are in the form of individual marine reports with a given latitude and longitude.

  7. AURORA Project database - Subset of SpeechDat-Car - German database -...

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Aug 16, 2017
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). AURORA Project database - Subset of SpeechDat-Car - German database - Evaluation Package [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-AURORA-CD0003_03/
    Dataset updated
    Aug 16, 2017
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two work items within ETSI are:
    - ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
    - ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm

    This database is a subset of the German-language SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains isolated and connected German digits spoken in the following noise and driving conditions inside a car:
    1. High speed, good road
    2. Low speed, rough road
    3. Stopped with motor running
    4. Town traffic

  8. Subset of data from the TAGS database of known age seals - Weddell Seals

    • cmr.earthdata.nasa.gov
    • researchdata.edu.au
    • +2 more
    Updated Nov 16, 2020
    Cite
    (2020). Subset of data from the TAGS database of known age seals - Weddell Seals [Dataset]. http://doi.org/10.4225/15/5b35bbdc5c3de
    Dataset updated
    Nov 16, 2020
    Time period covered
    Jan 22, 1973 - Oct 2, 2006
    Description

    This database is a compendium of histories of known age seals (Weddell) from observations across the Southern Ocean but focussed on the Windmill Islands, Mawson and the Vestfold Hills. Although the following information pertains to Elephant Seals, it is assumed similar procedures were undertaken with the Weddell Seals between 1973 and 2006:

    At Macquarie Island, 1000 seals were weighed at birth each year between 1993 and 2003 and individually marked with two plastic flipper tags in the inter-digital webbing of their hind flippers. These tagged seals were weighed again at weaning, when length, girth, fat depth, and flipper measurements were made. Three weeks after weaning, 2000 seals were permanently and individually marked by hot-iron branding. Recaptures and re-weighings of these known-age individuals were used to calculate growth and age-specific survival of the seals.

    Similar data were collected from elephant seals between 1950 and 1965 when seals were individually marked by hot-iron branding. Mark-recapture data from these cohorts were used to assess the demography of the declining population. Length and mass data were also collected for these cohorts and were used, for the first time, to assess the growth of individual seals without killing them.

    The database was held by the Australian Antarctic Data Centre but was taken offline due to maintenance problems. A snapshot of the database was taken in June 2018 and stored in an Access database.

    This work was completed as part of ASAC project 90.

  9. Marine Connectivity Database subset for GovHack 2016

    • ecat.ga.gov.au
    • researchdata.edu.au
    Updated May 30, 2019
    Cite
    Commonwealth of Australia (Geoscience Australia) (2019). Marine Connectivity Database subset for GovHack 2016 [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/98f93235-e053-4dd5-b7df-8c1a2c0be461
    Available download formats: www:link-1.0-http--link
    Dataset updated
    May 30, 2019
    Dataset provided by
    Geoscience Australia (http://ga.gov.au/)
    Time period covered
    Jan 26, 2010 - Jun 24, 2010
    Description

    This is a subset of Geoscience Australia's Marine Connectivity Database, covering the North-west marine planning region for initial releases taking place between January and March 2010. The subset is intended for use in development and testing as part of the GovHack 2016 competition.

  10. Subset of data from the TAGS database of known age seals - Elephant Seals

    • catalogue-temperatereefbase.imas.utas.edu.au
    • data.aad.gov.au
    • +2 more
    Updated Nov 2, 2017
    Cite
    AU/AADC > Australian Antarctic Data Centre, Australia (2017). Subset of data from the TAGS database of known age seals - Elephant Seals [Dataset]. https://catalogue-temperatereefbase.imas.utas.edu.au/geonetwork/srv/api/records/TAGS_Elephant_Seals
    Available download formats: www:link-1.0-http--link
    Dataset updated
    Nov 2, 2017
    Dataset provided by
    Australian Antarctic Division (https://www.antarctica.gov.au/)
    Australian Antarctic Data Centre
    Time period covered
    Jan 1, 1950 - Jan 31, 2015
    Description

    This database is a compendium of histories of known age seals (Southern elephant) from observations across the Southern Ocean but focussed on Macquarie Island, Marion Island, Heard Island, Mawson and the Vestfold Hills.

    At Macquarie Island, 1000 seals were weighed at birth each year between 1993 and 2003 and individually marked with two plastic flipper tags in the inter-digital webbing of their hind flippers. These tagged seals were weighed again at weaning, when length, girth, fat depth, and flipper measurements were made. Three weeks after weaning, 2000 seals were permanently and individually marked by hot-iron branding. Recaptures and re-weighings of these known-age individuals were used to calculate growth and age-specific survival of the seals.

    Similar data were collected from elephant seals between 1950 and 1965 when seals were individually marked by hot-iron branding. Mark-recapture data from these cohorts were used to assess the demography of the declining population. Length and mass data were also collected for these cohorts and were used, for the first time, to assess the growth of individual seals without killing them.

    At Marion Island all the elephant seals have been individually marked with two plastic flipper tags in their rear flippers. Recaptures of these seals were used to compare survival at Marion and Macquarie Islands.

    At Heard Island, seals were branded between 1949 and 1953. Seal length was measured in feet and inches. Recaptures of seals were made up until 1955, and growth and age-specific survival were calculated. Survival data from Heard Island were compared with concurrent data from Macquarie Island.

    The database was held by the Australian Antarctic Data Centre but was taken offline due to maintenance problems. A snapshot of the database was taken in January 2015 and stored in an Access database and several CSV files.

    This work was completed as part of ASAC project 90.

  11. NLCD 2016 Land Cover California Subset

    • data.cnra.ca.gov
    • data.ca.gov
    • +4 more
    Updated Dec 20, 2023
    Cite
    California Department of Fish and Wildlife (2023). NLCD 2016 Land Cover California Subset [Dataset]. https://data.cnra.ca.gov/dataset/nlcd-2016-land-cover-california-subset
    Available download formats: arcgis geoservices rest api, html
    Dataset updated
    Dec 20, 2023
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    California Department of Fish and Wildlife
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    The U.S. Geological Survey (USGS), in partnership with several federal agencies, has developed and released five National Land Cover Database (NLCD) products over the past two decades: NLCD 1992, 2001, 2006, 2011, and 2016. The 2016 release added land cover for the years 2003, 2008, and 2013. These products provide spatially explicit and reliable information on the Nation’s land cover and land cover change. To continue the legacy of NLCD and further establish a long-term monitoring capability for the Nation’s land resources, the USGS has designed a new generation of NLCD products named NLCD 2019. The NLCD 2019 design aims to provide innovative, consistent, and robust methodologies for production of a multi-temporal land cover and land cover change database from 2001 to 2019 at 2–3-year intervals.

    Comprehensive research resulted in the following strategies for NLCD 2019: continued integration between impervious surface and all land cover products, with impervious surface being directly mapped as developed classes in the land cover; a streamlined compositing process for assembling and preprocessing based on Landsat imagery and geospatial ancillary datasets; multi-source integrated training data development and decision-tree-based land cover classifications; a temporally, spectrally, and spatially integrated land cover change analysis strategy; a hierarchical theme-based post-classification and integration protocol for generating land cover and change products; a continuous fields biophysical parameters modeling method; and an automated scripted operational system for NLCD 2019 production. The performance of the developed strategies and methods was tested in twenty composite referenced areas throughout the conterminous U.S. An overall accuracy assessment from the 2016 publication gives 91% overall land cover accuracy, with the developed classes also showing 91% overall accuracy. Results from this study confirm the robustness of this comprehensive and highly automated procedure for NLCD 2019 operational mapping. Questions about the NLCD 2019 land cover product can be directed to the NLCD 2019 land cover mapping team at USGS EROS, Sioux Falls, SD (605) 594-6151 or mrlc@usgs.gov. See included spatial metadata for more details.

  12. Census of Population and Housing 1960 - IPUMS Subset - United States

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1 more
    Updated Apr 26, 2018
    Cite
    Minnesota Population Center (2018). Census of Population and Housing 1960 - IPUMS Subset - United States [Dataset]. https://microdata.worldbank.org/index.php/catalog/2114
    Dataset updated
    Apr 26, 2018
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Minnesota Population Center
    Time period covered
    1960
    Area covered
    United States
    Description

    Abstract

    IPUMS-International is an effort to inventory, preserve, harmonize, and disseminate census microdata from around the world. The project has collected the world's largest archive of publicly available census samples. The data are coded and documented consistently across countries and over time to facilitate comparative research. IPUMS-International makes these data available to qualified researchers free of charge through a web dissemination system.

    The IPUMS project is a collaboration of the Minnesota Population Center, National Statistical Offices, and international data archives. Major funding is provided by the U.S. National Science Foundation and the Demographic and Behavioral Sciences Branch of the National Institute of Child Health and Human Development. Additional support is provided by the University of Minnesota Office of the Vice President for Research, the Minnesota Population Center, and Sun Microsystems.

    Geographic coverage

    National coverage

    Analysis unit

    Households and Group Quarters

    UNITS IDENTIFIED:
    - Dwellings: No
    - Vacant units: No
    - Households: Yes
    - Individuals: Yes
    - Group quarters: Yes

    UNIT DESCRIPTIONS:
    - Households: Dwelling places with fewer than five persons unrelated to a household head, excluding institutions and transient quarters.
    - Group quarters: Institutions, transient quarters, and dwelling places with five or more persons unrelated to a household head.

    Universe

    Residents of the 50 states (not the outlying areas).

    Kind of data

    Census/enumeration data [cen]

    Sampling procedure

    MICRODATA SOURCE: U.S. Census Bureau

    SAMPLE UNIT: Household

    SAMPLE FRACTION: 1%

    SAMPLE SIZE (person records): 1,799,888

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The 1960 census used a machine-readable household form. Separate forms were used for each housing unit. Housing questions were included on the same form as the population items. Every fourth enumeration unit received a "long form," containing supplemental sample questions that were asked of all members of the unit. Sample questions are available for all individuals in every unit. Of the units receiving a long form, four-fifths received one version (the 20% questionnaire), and one-fifth received a second version with the same population questions but slightly different housing questions (the 5% questionnaire).
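    The names of the two long-form versions follow from the fractions in the paragraph above, assuming those fractions apply uniformly across enumeration units: each version's share of all units is the one-in-four long-form rate multiplied by the version's share of long forms.

```latex
\begin{aligned}
\Pr(\text{long form}) &= \tfrac{1}{4} = 25\% \\
\Pr(\text{20\% questionnaire}) &= \tfrac{1}{4} \cdot \tfrac{4}{5} = \tfrac{1}{5} = 20\% \\
\Pr(\text{5\% questionnaire}) &= \tfrac{1}{4} \cdot \tfrac{1}{5} = \tfrac{1}{20} = 5\%
\end{aligned}
```

    The two versions together account for 20% + 5% = 25% of units, matching the one-in-four long-form rate.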

    Response rate

    UNDERCOUNT: No official estimates

  13. Replication Data for: More Risk, More Information: How Passive Ownership Can...

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Sundaresan, Savitar; Buss, Adrian (2023). Replication Data for: More Risk, More Information: How Passive Ownership Can Improve Informational Efficiency [Dataset]. http://doi.org/10.7910/DVN/SRAWOE
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Sundaresan, Savitar; Buss, Adrian
    Description

    This contains the Stata code (.do file) and an anonymized subset of the data (.csv file) to demonstrate that the code works. The complete dataset includes information on ownership and firm characteristics for publicly traded firms from 2000 to 2016.

  14. CELEX Dutch lexical database - Derivational Morphology Subset

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Oct 5, 2005
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2005). CELEX Dutch lexical database - Derivational Morphology Subset [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0029_05/
    Dataset updated
    Oct 5, 2005
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    The Dutch CELEX data is derived from R.H. Baayen, R. Piepenbrock & L. Gulikers, The CELEX Lexical Database (CD-ROM), Release 2, Dutch Version 3.1, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995. Apart from orthographic features, the CELEX database comprises representations of the phonological, morphological, syntactic and frequency properties of lemmata. For the Dutch data, frequencies have been disambiguated on the basis of the 42.4m Dutch Instituut voor Nederlandse Lexicologie text corpora.

    To make for greater compatibility with other operating systems, the databases have not been tailored to fit any particular database management program. Instead, the information is presented in a series of plain ASCII files, which can be queried with tools such as AWK and ICON. Unique identity numbers allow the linking of information from different files.

    This database can be divided into different subsets:
    · orthography: with or without diacritics, with or without word division positions, alternative spellings, number of letters/syllables;
    · phonology: phonetic transcriptions with syllable boundaries or primary and secondary stress markers, consonant-vowel patterns, number of phonemes/syllables, alternative pronunciations, frequency per phonetic syllable within words;
    · morphology: division into stems and affixes, flat or hierarchical representations, stems and their inflections;
    · syntax: word class, subcategorisations per word class;
    · frequency of the entries: disambiguated for homographic lemmata.

  15. Million Song Data Set Subset

    • kaggle.com
    zip
    Updated Feb 22, 2018
    Cite
    Anurag Banerjee (2018). Million Song Data Set Subset [Dataset]. https://www.kaggle.com/datasets/anuragbanerjee/million-song-data-set-subset/data
    Available download formats: zip (62,653,826 bytes)
    Dataset updated
    Feb 22, 2018
    Authors
    Anurag Banerjee
    Description

    Dataset

    This dataset was created by Anurag Banerjee

  16. Jellyfish Database Initiative: Global records on gelatinous zooplankton for...

    • obis.org
    zip
    Updated May 11, 2023
    Cite
    CSIRO National Collections and Marine Infrastructure (2023). Jellyfish Database Initiative: Global records on gelatinous zooplankton for the past 200 years, collected from global sources and literature, subset of records from Australian and adjacent seas. (1907-2011) [Dataset]. https://obis.org/dataset/0dee1c07-b9b6-4260-a4bc-6c6fff25a041
    Available download formats: zip
    Dataset updated
    May 11, 2023
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    CSIRO National Collections and Marine Infrastructure
    Time period covered
    1907 - 2011
    Description

    The Jellyfish Database Initiative (JeDI) is a scientifically-coordinated global database dedicated to gelatinous zooplankton (members of the Cnidaria, Ctenophora and Thaliacea) and associated environmental data. The database holds 476,000 quantitative, categorical, presence-absence and presence-only records of gelatinous zooplankton spanning the past four centuries (1790-2011) assembled from a variety of published and unpublished sources. Gelatinous zooplankton data are reported to species level, where identified, but taxonomic information on phylum, family and order is reported for all records. This dataset is a subset of data in Australian waters and adjacent seas.

  17. GEMM data subset

    • zenodo.org
    Updated Sep 27, 2024
    Cite
    Zenodo (2024). GEMM data subset [Dataset]. http://doi.org/10.5281/zenodo.11846301
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    For full data access, please see record and instructions at 10.5281/zenodo.13827890.

  18. NLCD 2021 Land Cover California Subset

    • data.cnra.ca.gov
    • data.ca.gov
    • +3 more
    Updated Feb 27, 2024
    Cite
    California Department of Fish and Wildlife (2024). NLCD 2021 Land Cover California Subset [Dataset]. https://data.cnra.ca.gov/dataset/nlcd-2021-land-cover-california-subset
    Available download formats: arcgis geoservices rest api, html, zip
    Dataset updated
    Feb 27, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    California Department of Fish and Wildlife
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    California
    Description

    The U.S. Geological Survey (USGS), in partnership with several federal agencies, has now developed and released seven National Land Cover Database (NLCD) products: NLCD 1992, 2001, 2006, 2011, 2016, 2019, and 2021. Beginning with the 2016 release, land cover products were created for two-to-three-year intervals between 2001 and the most recent year. These products provide spatially explicit and reliable information on the Nation’s land cover and land cover change. NLCD continues to provide innovative, consistent, and robust methodologies for production of a multi-temporal land cover and land cover change database. NLCD 2021 adds an additional year to the map products produced for NLCD 2019, with a streamlined compositing process for assembling and preprocessing Landsat imagery and geospatial ancillary datasets; a temporally, spectrally, and spatially integrated land cover change analysis strategy; a theme-based post-classification protocol for generating land cover and change products; a continuous fields biophysical parameters modeling method; and a scripted operational system. The overall accuracy of the 2019 Level I land cover was 91%. Results from this study confirm the robustness of this comprehensive and highly automated procedure for NLCD 2021 operational mapping (see https://doi.org/10.1080/15481603.2023.2181143 for the latest accuracy assessment publication). Questions about the NLCD 2021 land cover product can be directed to the NLCD 2021 land cover mapping team at USGS EROS, Sioux Falls, SD (605) 594-6151 or mrlc@usgs.gov. See included spatial metadata for more details.

  19. AOE analysis subset of the Arthropod Easy Capture (AEC) database

    • spatialdiscovery-ucsb.opendata.arcgis.com
    • figshare.com
    • +3 more
    Updated Jun 1, 2016
    Cite
    University of California, Santa Barbara (2016). AOE analysis subset of the Arthropod Easy Capture (AEC) database [Dataset]. https://spatialdiscovery-ucsb.opendata.arcgis.com/datasets/890dc2eae81b41ae9b1562339137248f
    Dataset updated
    Jun 1, 2016
    Dataset authored and provided by
    University of California, Santa Barbara
    Description

    Based on the default parameters used in the analysis, the entire AOE database available through figshare (doi: 10.6084/m9.figshare.2060979) represents a subset of the AMNH instance of the AEC database, which includes additional tables to capture host plant data and host analysis.

    1) Miridae subFamily(id): Mirinae (id: 8150), Orthotylinae (id: 6294), Phylinae (id: 6295), Deraeocorinae (id: 8163), from the AEC database SQL.
    2) Geographic range: North America, by Country.UID: Canada (id: 2), Mexico (id: 8), USA (id: 11).
    3) Complete plant host analysis.
    4) Cleaned plant host data.

  20. f

    Data from: DigiMOF: A Database of Metal–Organic Framework Synthesis...

    • figshare.com
    • acs.figshare.com
    xlsx
    Updated Jun 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lawson T. Glasby; Kristian Gubsch; Rosalee Bence; Rama Oktavian; Kesler Isoko; Seyed Mohamad Moosavi; Joan L. Cordiner; Jason C. Cole; Peyman Z. Moghadam (2023). DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining [Dataset]. http://doi.org/10.1021/acs.chemmater.3c00788.s002
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    ACS Publications
    Authors
    Lawson T. Glasby; Kristian Gubsch; Rosalee Bence; Rama Oktavian; Kesler Isoko; Seyed Mohamad Moosavi; Joan L. Cordiner; Jason C. Cole; Peyman Z. Moghadam
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    The vastness of materials space, particularly that of metal–organic frameworks (MOFs), makes the efficient identification of promising materials for specific applications a critical problem. Although high-throughput computational approaches, including machine learning, have been useful for rapid screening and rational design of MOFs, they tend to neglect descriptors related to synthesis. One way to improve the efficiency of MOF discovery is to data-mine published MOF papers and extract the materials informatics knowledge contained in journal articles. Here, by adapting the chemistry-aware natural language processing tool ChemDataExtractor (CDE), we generated an open-source database of MOFs focused on their synthetic properties: the DigiMOF database. Using the CDE web scraping package alongside the Cambridge Structural Database (CSD) MOF subset, we automatically downloaded 43,281 unique MOF journal articles, extracted 15,501 unique MOF materials, and text-mined over 52,680 associated properties, including synthesis method, solvent, organic linker, metal precursor, and topology. We also developed an alternative data extraction technique that obtains and transforms the chemical names assigned to each CSD entry in order to determine the linker types for each structure in the CSD MOF subset. These data enabled us to match MOFs to a list of known linkers provided by Tokyo Chemical Industry UK Ltd. (TCI) and to analyze the cost of these important chemicals. This centralized, structured database reveals the MOF synthetic data embedded in thousands of MOF publications and also contains topology, metal type, accessible surface area, largest cavity diameter, pore limiting diameter, open metal sites, and density calculations for all 3D MOFs in the CSD MOF subset.

    The DigiMOF database and associated software are publicly available, allowing other researchers to rapidly search for MOFs with specific properties, conduct further analysis of alternative MOF production pathways, and create new parsers that search for other desirable properties.
