100+ datasets found

finetune-data-28fee8943227
huggingface.co
Updated Aug 4, 2023
Cite
Subset Data, Inc. (2023). finetune-data-28fee8943227 [Dataset]. https://huggingface.co/datasets/subset-data/finetune-data-28fee8943227
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 4, 2023
Dataset authored and provided by
Subset Data, Inc.
Description
Dataset Card for "finetune-data-28fee8943227"

More Information needed
NCDC Hourly Global Surface Variables-Selected Subset
catalog.data.gov
data.ca.gov
Updated Nov 27, 2024
Cite
California Office of Environmental Health Hazard Assessment (2024). NCDC Hourly Global Surface Variables-Selected Subset [Dataset]. https://catalog.data.gov/dataset/ncdc-hourly-global-surface-variables-selected-subset
Explore at:
Dataset updated
Nov 27, 2024
Dataset provided by
California Office of Environmental Health Hazard Assessment
Description
Holds hourly surface temperature data from weather stations across the globe and is an important source of temperature data for temperature-health studies.
CELEX Dutch lexical database - Orthography Subset
catalogue.elra.info
live.european-language-grid.eu
Updated Oct 5, 2005
Cite
ELRA (European Language Resources Association) (2005). CELEX Dutch lexical database - Orthography Subset [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0029_02/
Explore at:
Dataset updated
Oct 5, 2005
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
The Dutch CELEX data is derived from R.H. Baayen, R. Piepenbrock & L. Gulikers, The CELEX Lexical Database (CD-ROM), Release 2, Dutch Version 3.1, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.

Apart from orthographic features, the CELEX database comprises representations of the phonological, morphological, syntactic and frequency properties of lemmata. For the Dutch data, frequencies have been disambiguated on the basis of the 42.4m Dutch Instituut voor Nederlandse Lexicologie text corpora.

To make for greater compatibility with other operating systems, the databases have not been tailored to fit any particular database management program. Instead, the information is presented in a series of plain ASCII files, which can be queried with tools such as AWK and ICON. Unique identity numbers allow the linking of information from different files.

This database can be divided into different subsets:
· orthography: with or without diacritics, with or without word division positions, alternative spellings, number of letters/syllables;
· phonology: phonetic transcriptions with syllable boundaries or primary and secondary stress markers, consonant-vowel patterns, number of phonemes/syllables, alternative pronunciations, frequency per phonetic syllable within words;
· morphology: division into stems and affixes, flat or hierarchical representations, stems and their inflections;
· syntax: word class, subcategorisations per word class;
· frequency of the entries: disambiguated for homographic lemmata.
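As a rough illustration of how unique identity numbers can link information from different plain-text files, the Python sketch below joins two small tables on their first field. The sample records, the backslash field separator, and the column order are assumptions made for this example; they are not taken from the CELEX documentation.

```python
# Hypothetical excerpts of two plain-text files; the first field is the shared ID.
ORTHOGRAPHY = "1\\aap\\3\n2\\noot\\4\n"   # id \ spelling \ number of letters
SYNTAX = "1\\N\n2\\N\n"                   # id \ word class


def read_table(text, sep="\\"):
    """Parse separator-delimited lines into {id: [remaining fields]}."""
    table = {}
    for line in text.splitlines():
        if line:
            fields = line.split(sep)
            table[fields[0]] = fields[1:]
    return table


def link(left, right):
    """Join two tables on the identity numbers they have in common."""
    return {key: left[key] + right[key] for key in left if key in right}


linked = link(read_table(ORTHOGRAPHY), read_table(SYNTAX))
print(linked["1"])
```

The same join could be written in AWK, as the description suggests; Python is used here only for readability.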
Subsetting
paper.erudition.co.in
html
Updated Mar 17, 2025
Cite
Einetic (2025). Subsetting [Dataset]. https://paper.erudition.co.in/makaut/bachelor-of-computer-application-2023-2024/2/data-analysis-with-r/subsetting
Explore at:
Available download formats: html
Dataset updated
Mar 17, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Subsetting of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
AORC Subset
hydroshare.org
beta.hydroshare.org
zip
Updated Dec 6, 2023
Cite
AORC Subset [Dataset]. https://www.hydroshare.org/resource/c1bce473fff641d7a678565af9785c31
Explore at:
Available download formats: zip (28.3 KB)
Dataset updated
Dec 6, 2023
Dataset provided by
HydroShare
Authors
Ayman Nassar; David Tarboton; Anthony M. Castronova
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 2010 - Dec 31, 2019
Area covered

Description
The objective of this HydroShare resource is to query AORC v1.0 Forcing data stored on HydroShare's Thredds server and create a subset of this dataset for a designated watershed and timeframe. The user is prompted to define a temporal frame of interest, which specifies the start and end dates for the data subset. Additionally, the user is prompted to define a spatial frame of interest, which could be a bounding box or a shapefile, to subset the data spatially.

Before the subsetting is performed, data is queried, and geospatial metadata is added to ensure that the data is correctly aligned with its corresponding location on the Earth's surface. To achieve this, two separate notebooks were created - this notebook and this notebook - which explain how to query the dataset and add geospatial metadata to AORC v1.0 data in detail, respectively. In this notebook, we call functions from the AORC.py script to perform these preprocessing steps, resulting in a cleaner notebook that focuses solely on the subsetting process.
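The idea of subsetting by a temporal frame and a bounding box can be sketched in a few lines of Python. This is only an illustration on made-up in-memory records; it does not use the functions in AORC.py or the Thredds server, and the field names (`lat`, `lon`, `day`, `precip`) and coordinates are hypothetical.

```python
from datetime import date


def in_bbox(lat, lon, bbox):
    """Return True if the point lies inside (south, west, north, east)."""
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east


def subset(records, bbox, start, end):
    """Keep records inside the bounding box and within [start, end]."""
    return [
        r for r in records
        if in_bbox(r["lat"], r["lon"], bbox) and start <= r["day"] <= end
    ]


# Made-up sample records standing in for gridded forcing data.
records = [
    {"lat": 41.7, "lon": -111.8, "day": date(2010, 1, 1), "precip": 0.0},
    {"lat": 41.7, "lon": -111.8, "day": date(2020, 1, 1), "precip": 1.2},
    {"lat": 10.0, "lon": 20.0, "day": date(2010, 6, 1), "precip": 3.4},
]

selected = subset(
    records,
    bbox=(41.0, -112.5, 42.5, -111.0),
    start=date(2010, 1, 1),
    end=date(2019, 12, 31),
)
print(len(selected))
```

A shapefile-based subset would replace `in_bbox` with a point-in-polygon test; that step is omitted here.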
Comprehensive Ocean - Atmosphere Data Set (COADS) LMRF Arctic Subset, 1950 -...
data.staging.idas-ds1.appdat.jsc.nasa.gov
search.dataone.org
+3 more
Updated Feb 18, 2025
Cite
nasa.gov (2025). Comprehensive Ocean - Atmosphere Data Set (COADS) LMRF Arctic Subset, 1950 - 1995, Version 1 [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/comprehensive-ocean-atmosphere-data-set-coads-lmrf-arctic-subset-1950-1995-version-1
Explore at:
Dataset updated
Feb 18, 2025
Dataset provided by
NASA (http://nasa.gov/)
Description
The Comprehensive Ocean - Atmosphere Data Set (COADS) Long Marine Reports Fixed-Length (LMRF) Arctic subset contains marine surface weather reports for regions north of 65 degrees N from ships, drifting ice stations, and buoys. The COADS LMRF Arctic subset contains data collected over the years 1950 to 1995 and includes the following parameters: air and sea temperature, cloudiness, humidity, and winds. The data are in the form of individual marine reports with a given latitude and longitude.
AURORA Project database - Subset of SpeechDat-Car - German database -...
catalogue.elra.info
live.european-language-grid.eu
Updated Aug 16, 2017
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2017). AURORA Project database - Subset of SpeechDat-Car - German database - Evaluation Package [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-AURORA-CD0003_03/
Explore at:
Dataset updated
Aug 16, 2017
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
The Aurora project was originally set up to establish a world wide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008.

The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm.

This database is a subset of the SpeechDat-Car database in German language which has been collected as part of the European Union funded SpeechDat-Car project. It contains isolated and connected German digits spoken in the following noise and driving conditions inside a car:
1. High speed good road
2. Low speed rough road
3. Stopped with motor running
4. Town traffic
Subset of data from the TAGS database of known age seals - Weddell Seals
cmr.earthdata.nasa.gov
researchdata.edu.au
+2 more
Updated Nov 16, 2020
Cite
(2020). Subset of data from the TAGS database of known age seals - Weddell Seals [Dataset]. http://doi.org/10.4225/15/5b35bbdc5c3de
Explore at:
Unique identifier
https://doi.org/10.4225/15/5b35bbdc5c3de
Dataset updated
Nov 16, 2020
Time period covered
Jan 22, 1973 - Oct 2, 2006
Area covered

Description
This database is a compendium of histories of known age seals (Weddell) from observations across the Southern Ocean but focussed on the Windmill Islands, Mawson and the Vestfold Hills. Although the following information pertains to Elephant Seals, it is assumed similar procedures were undertaken with the Weddell Seals between 1973 and 2006:

At Macquarie Island 1000 seals were weighed per annum between 1993-2003 at birth and individually marked with two plastic flipper tags in the inter-digital webbing of their hind flippers. These tagged seals were weighed again at weaning, when length, girth, fat depth, and flipper measurements were made. Three weeks after weaning 2000 seals were permanently and individually marked by hot-iron branding. Recaptures and re-weighings of these known aged individuals were used to calculate growth and age-specific survival of the seals.

Similar data were collected from elephant seals between 1950 and 1965 when seals were individually marked by hot-iron branding. Mark-recapture data from these cohorts were used to assess the demography of the declining population. Length and mass data were also collected for these cohorts and were used, for the first time, to assess the growth of individual seals without killing them.

The database was held by the Australian Antarctic Data Centre, but was taken offline due to maintenance problems. A snapshot of the database was taken in June 2018 and stored in an Access database.

This work was completed as part of ASAC project 90.
Marine Connectivity Database subset for GovHack 2016
ecat.ga.gov.au
researchdata.edu.au
Updated May 30, 2019
Cite
Commonwealth of Australia (Geoscience Australia) (2019). Marine Connectivity Database subset for GovHack 2016 [Dataset]. https://ecat.ga.gov.au/geonetwork/srv/api/records/98f93235-e053-4dd5-b7df-8c1a2c0be461
Explore at:
Available download formats: www:link-1.0-http--link
Dataset updated
May 30, 2019
Dataset provided by
Geoscience Australia (http://ga.gov.au/)
Time period covered
Jan 26, 2010 - Jun 24, 2010
Area covered

Description
This is a subset of Geoscience Australia's Marine Connectivity Database (here), covering the North-west marine planning region for initial releases taking place in the interval January-March 2010. The subset is intended for use in development and testing as part of the GovHack 2016 competition.
Subset of data from the TAGS database of known age seals - Elephant Seals
catalogue-temperatereefbase.imas.utas.edu.au
data.aad.gov.au
+2 more
Updated Nov 2, 2017
Cite
AU/AADC > Australian Antarctic Data Centre, Australia (2017). Subset of data from the TAGS database of known age seals - Elephant Seals [Dataset]. https://catalogue-temperatereefbase.imas.utas.edu.au/geonetwork/srv/api/records/TAGS_Elephant_Seals
Explore at:
Available download formats: www:link-1.0-http--link
Dataset updated
Nov 2, 2017
Dataset provided by
Australian Antarctic Division (https://www.antarctica.gov.au/)
Australian Antarctic Data Centre
Time period covered
Jan 1, 1950 - Jan 31, 2015
Area covered

Description
This database is a compendium of histories of known age seals (Southern elephant) from observations across the Southern Ocean but focussed on Macquarie Island, Marion Island, Heard Island, Mawson and the Vestfold Hills.

At Macquarie Island 1000 seals were weighed per annum between 1993-2003 at birth and individually marked with two plastic flipper tags in the inter-digital webbing of their hind flippers. These tagged seals were weighed again at weaning, when length, girth, fat depth, and flipper measurements were made. Three weeks after weaning 2000 seals were permanently and individually marked by hot-iron branding. Recaptures and re-weighings of these known aged individuals were used to calculate growth and age-specific survival of the seals.

Similar data were collected from elephant seals between 1950 and 1965 when seals were individually marked by hot-iron branding. Mark-recapture data from these cohorts were used to assess the demography of the declining population. Length and mass data were also collected for these cohorts and were used, for the first time, to assess the growth of individual seals without killing them.

At Marion Island all the elephant seals have been individually marked with two plastic flipper tags in their rear flippers. Recaptures of these seals were used to compare survival at Marion and Macquarie Islands.

At Heard Island, seals were branded between 1949-1953. Seal length was measured in feet and inches. Recaptures of seals were made up until 1955, and growth and age-specific survival was calculated. Survival data from Heard Island were compared with concurrent data from Macquarie Island.

The database was held by the Australian Antarctic Data Centre, but was taken offline due to maintenance problems. A snapshot of the database was taken in January 2015 and stored in an Access database and several CSV files.

This work was completed as part of ASAC project 90.
NLCD 2016 Land Cover California Subset
data.cnra.ca.gov
data.ca.gov
+4 more
Updated Dec 20, 2023
Cite
California Department of Fish and Wildlife (2023). NLCD 2016 Land Cover California Subset [Dataset]. https://data.cnra.ca.gov/dataset/nlcd-2016-land-cover-california-subset
Explore at:
Available download formats: arcgis geoservices rest api, html
Dataset updated
Dec 20, 2023
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
California Department of Fish and Wildlife
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
California
Description
The U.S. Geological Survey (USGS), in partnership with several federal agencies, has developed and released five National Land Cover Database (NLCD) products over the past two decades: NLCD 1992, 2001, 2006, 2011, and 2016. The 2016 release saw landcover created for additional years of 2003, 2008, and 2013. These products provide spatially explicit and reliable information on the Nation’s land cover and land cover change. To continue the legacy of NLCD and further establish a long-term monitoring capability for the Nation’s land resources, the USGS has designed a new generation of NLCD products named NLCD 2019. The NLCD 2019 design aims to provide innovative, consistent, and robust methodologies for production of a multi-temporal land cover and land cover change database from 2001 to 2019 at 2–3-year intervals. Comprehensive research was conducted and resulted in developed strategies for NLCD 2019: continued integration between impervious surface and all landcover products with impervious surface being directly mapped as developed classes in the landcover; a streamlined compositing process for assembling and preprocessing based on Landsat imagery and geospatial ancillary datasets; a multi-source integrated training data development and decision-tree based land cover classifications; a temporally, spectrally, and spatially integrated land cover change analysis strategy; a hierarchical theme-based post-classification and integration protocol for generating land cover and change products; a continuous fields biophysical parameters modeling method; and an automated scripted operational system for the NLCD 2019 production. The performance of the developed strategies and methods was tested in twenty composite referenced areas throughout the conterminous U.S. An overall accuracy assessment from the 2016 publication gives a 91% overall landcover accuracy, with the developed classes also showing a 91% accuracy in overall developed.
Results from this study confirm the robustness of this comprehensive and highly automated procedure for NLCD 2019 operational mapping. Questions about the NLCD 2019 land cover product can be directed to the NLCD 2019 land cover mapping team at USGS EROS, Sioux Falls, SD (605) 594-6151 or mrlc@usgs.gov. See included spatial metadata for more details.
Census of Population and Housing 1960 - IPUMS Subset - United States
microdata.worldbank.org
datacatalog.ihsn.org
+1 more
Updated Apr 26, 2018
Cite
Minnesota Population Center (2018). Census of Population and Housing 1960 - IPUMS Subset - United States [Dataset]. https://microdata.worldbank.org/index.php/catalog/2114
Explore at:
Dataset updated
Apr 26, 2018
Dataset provided by
United States Census Bureau (http://census.gov/)
Minnesota Population Center
Time period covered
1960
Area covered
United States
Description
Abstract

IPUMS-International is an effort to inventory, preserve, harmonize, and disseminate census microdata from around the world. The project has collected the world's largest archive of publicly available census samples. The data are coded and documented consistently across countries and over time to facilitate comparative research. IPUMS-International makes these data available to qualified researchers free of charge through a web dissemination system.

The IPUMS project is a collaboration of the Minnesota Population Center, National Statistical Offices, and international data archives. Major funding is provided by the U.S. National Science Foundation and the Demographic and Behavioral Sciences Branch of the National Institute of Child Health and Human Development. Additional support is provided by the University of Minnesota Office of the Vice President for Research, the Minnesota Population Center, and Sun Microsystems.

Geographic coverage

National coverage

Analysis unit

Households and Group Quarters

UNITS IDENTIFIED:
- Dwellings: No
- Vacant units: No
- Households: Yes
- Individuals: Yes
- Group quarters: Yes

UNIT DESCRIPTIONS:
- Households: Dwelling places with fewer than five persons unrelated to a household head, excluding institutions and transient quarters.
- Group quarters: Institutions, transient quarters, and dwelling places with five or more persons unrelated to a household head.

Universe

Residents of the 50 states (not the outlying areas).

Kind of data

Census/enumeration data [cen]

Sampling procedure

MICRODATA SOURCE: U.S. Census Bureau

SAMPLE UNIT: Household

SAMPLE FRACTION: 1%

SAMPLE SIZE (person records): 1,799,888

Mode of data collection

Face-to-face [f2f]

Research instrument

The 1960 census used a machine-readable household form. Separate forms were used for each housing unit. Housing questions were included on the same form as the population items. Every fourth enumeration unit received a "long form," containing supplemental sample questions that were asked of all members of the unit. Sample questions are available for all individuals in every unit. Of the units receiving a long form, four-fifths received one version (the 20% questionnaire), and one-fifth received a second version with the same population questions but slightly different housing questions (the 5% questionnaire).
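The names "20% questionnaire" and "5% questionnaire" follow from the fractions stated above, on the reading that the percentages refer to the share of all enumeration units. A small worked check in Python (the variable names are illustrative only):

```python
from fractions import Fraction

# Every fourth enumeration unit received a long form.
long_form = Fraction(1, 4)

# Four-fifths of long-form units got one version, one-fifth got the other.
first_version = long_form * Fraction(4, 5)
second_version = long_form * Fraction(1, 5)

print(first_version, second_version)  # prints "1/5 1/20"
```

One-fifth of all units is 20%, and one-twentieth is 5%, which matches the questionnaire names.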

Response rate

UNDERCOUNT: No official estimates
Replication Data for: More Risk, More Information: How Passive Ownership Can...
dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Sundaresan, Savitar; Buss, Adrian (2023). Replication Data for: More Risk, More Information: How Passive Ownership Can Improve Informational Efficiency [Dataset]. http://doi.org/10.7910/DVN/SRAWOE
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SRAWOE
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Sundaresan, Savitar; Buss, Adrian
Description
This is the Stata code (.do file) and a subset of the data (.csv file) anonymized to show that the code works. The complete dataset includes information on ownership and firm characteristics for publicly traded firms from 2000 to 2016.
CELEX Dutch lexical database - Derivational Morphology Subset
catalogue.elra.info
live.european-language-grid.eu
Updated Oct 5, 2005
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2005). CELEX Dutch lexical database - Derivational Morphology Subset [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0029_05/
Explore at:
Dataset updated
Oct 5, 2005
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
The Dutch CELEX data is derived from R.H. Baayen, R. Piepenbrock & L. Gulikers, The CELEX Lexical Database (CD-ROM), Release 2, Dutch Version 3.1, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.

Apart from orthographic features, the CELEX database comprises representations of the phonological, morphological, syntactic and frequency properties of lemmata. For the Dutch data, frequencies have been disambiguated on the basis of the 42.4m Dutch Instituut voor Nederlandse Lexicologie text corpora.

To make for greater compatibility with other operating systems, the databases have not been tailored to fit any particular database management program. Instead, the information is presented in a series of plain ASCII files, which can be queried with tools such as AWK and ICON. Unique identity numbers allow the linking of information from different files.

This database can be divided into different subsets:
· orthography: with or without diacritics, with or without word division positions, alternative spellings, number of letters/syllables;
· phonology: phonetic transcriptions with syllable boundaries or primary and secondary stress markers, consonant-vowel patterns, number of phonemes/syllables, alternative pronunciations, frequency per phonetic syllable within words;
· morphology: division into stems and affixes, flat or hierarchical representations, stems and their inflections;
· syntax: word class, subcategorisations per word class;
· frequency of the entries: disambiguated for homographic lemmata.
Million Song Data Set Subset
kaggle.com
zip
Updated Feb 22, 2018
Cite
Anurag Banerjee (2018). Million Song Data Set Subset [Dataset]. https://www.kaggle.com/datasets/anuragbanerjee/million-song-data-set-subset/data
Explore at:
Available download formats: zip (62653826 bytes)
Dataset updated
Feb 22, 2018
Authors
Anurag Banerjee
Description
Dataset

This dataset was created by Anurag Banerjee

Contents
Jellyfish Database Initiative: Global records on gelatinous zooplankton for...
obis.org
zip
Updated May 11, 2023
Cite
CSIRO National Collections and Marine Infrastructure (2023). Jellyfish Database Initiative: Global records on gelatinous zooplankton for the past 200 years, collected from global sources and literature, subset of records from Australian and adjacent seas. (1907-2011) [Dataset]. https://obis.org/dataset/0dee1c07-b9b6-4260-a4bc-6c6fff25a041
Explore at:
Available download formats: zip
Dataset updated
May 11, 2023
Dataset provided by
CSIRO (http://www.csiro.au/)
Authors
CSIRO National Collections and Marine Infrastructure
Time period covered
1907 - 2011
Description
The Jellyfish Database Initiative (JeDI) is a scientifically-coordinated global database dedicated to gelatinous zooplankton (members of the Cnidaria, Ctenophora and Thaliacea) and associated environmental data. The database holds 476,000 quantitative, categorical, presence-absence and presence only records of gelatinous zooplankton spanning the past four centuries (1790-2011) assembled from a variety of published and unpublished sources. Gelatinous zooplankton data are reported to species level, where identified, but taxonomic information on phylum, family and order is reported for all records. This dataset is a subset of data in Australian waters and adjacent seas.
GEMM data subset
zenodo.org
Updated Sep 27, 2024
Cite
Zenodo (2024). GEMM data subset [Dataset]. http://doi.org/10.5281/zenodo.11846301
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.11846301
Dataset updated
Sep 27, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
For full data access, please see record and instructions at 10.5281/zenodo.13827890.
NLCD 2021 Land Cover California Subset
data.cnra.ca.gov
data.ca.gov
+3 more
Updated Feb 27, 2024
Cite
California Department of Fish and Wildlife (2024). NLCD 2021 Land Cover California Subset [Dataset]. https://data.cnra.ca.gov/dataset/nlcd-2021-land-cover-california-subset
Explore at:
Available download formats: arcgis geoservices rest api, html, zip
Dataset updated
Feb 27, 2024
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Authors
California Department of Fish and Wildlife
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
California
Description
The U.S. Geological Survey (USGS), in partnership with several federal agencies, has now developed and released seven National Land Cover Database (NLCD) products: NLCD 1992, 2001, 2006, 2011, 2016, 2019, and 2021. Beginning with the 2016 release, land cover products were created for two-to-three-year intervals between 2001 and the most recent year. These products provide spatially explicit and reliable information on the Nation’s land cover and land cover change. NLCD continues to provide innovative, consistent, and robust methodologies for production of a multi-temporal land cover and land cover change database. NLCD 2021 adds an additional year to the map products produced for NLCD 2019, with a streamlined compositing process for assembling and preprocessing Landsat imagery and geospatial ancillary datasets; a temporally, spectrally, and spatially integrated land cover change analysis strategy; a theme-based post-classification protocol for generating land cover and change products; a continuous fields biophysical parameters modeling method; and a scripted operational system. The overall accuracy of the 2019 Level I land cover was 91%. Results from this study confirm the robustness of this comprehensive and highly automated procedure for NLCD 2021 operational mapping (see https://doi.org/10.1080/15481603.2023.2181143 for the latest accuracy assessment publication). Questions about the NLCD 2021 land cover product can be directed to the NLCD 2021 land cover mapping team at USGS EROS, Sioux Falls, SD (605) 594-6151 or mrlc@usgs.gov. See included spatial metadata for more details.
AOE analysis subset of the Arthropod Easy Capture (AEC) database
spatialdiscovery-ucsb.opendata.arcgis.com
figshare.com
+3 more
Updated Jun 1, 2016
Cite
University of California, Santa Barbara (2016). AOE analysis subset of the Arthropod Easy Capture (AEC) database [Dataset]. https://spatialdiscovery-ucsb.opendata.arcgis.com/datasets/890dc2eae81b41ae9b1562339137248f
Explore at:
Dataset updated
Jun 1, 2016
Dataset authored and provided by
University of California, Santa Barbara
Area covered

Description
Based on the default parameters used in the analysis, the entire AOE database available through figshare (doi: 10.6084/m9.figshare.2060979) represents a subset of the AMNH instance of the AEC database, which includes additional tables to capture host plant data and host analysis.

1) Miridae subFamily(id) = Mirinae(id:8150), Orthotylinae(id:6294), Phylinae(id:6295), Deraeocorinae(id:8163) from AEC database sql.
2) geographic range: North America Country.UID = Canada(id:2), Mexico(id:8), USA(id:11)
3) complete plant host analysis
4) cleaned plant host data
Data from: DigiMOF: A Database of Metal–Organic Framework Synthesis...
figshare.com
acs.figshare.com
xlsx
Updated Jun 2, 2023
Cite
Lawson T. Glasby; Kristian Gubsch; Rosalee Bence; Rama Oktavian; Kesler Isoko; Seyed Mohamad Moosavi; Joan L. Cordiner; Jason C. Cole; Peyman Z. Moghadam (2023). DigiMOF: A Database of Metal–Organic Framework Synthesis Information Generated via Text Mining [Dataset]. http://doi.org/10.1021/acs.chemmater.3c00788.s002
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1021/acs.chemmater.3c00788.s002
Dataset updated
Jun 2, 2023
Dataset provided by
ACS Publications
Authors
Lawson T. Glasby; Kristian Gubsch; Rosalee Bence; Rama Oktavian; Kesler Isoko; Seyed Mohamad Moosavi; Joan L. Cordiner; Jason C. Cole; Peyman Z. Moghadam
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
The vastness of materials space, particularly that which is concerned with metal–organic frameworks (MOFs), creates the critical problem of performing efficient identification of promising materials for specific applications. Although high-throughput computational approaches, including the use of machine learning, have been useful in rapid screening and rational design of MOFs, they tend to neglect descriptors related to their synthesis. One way to improve the efficiency of MOF discovery is to data-mine published MOF papers to extract the materials informatics knowledge contained within journal articles. Here, by adapting the chemistry-aware natural language processing tool, ChemDataExtractor (CDE), we generated an open-source database of MOFs focused on their synthetic properties: the DigiMOF database. Using the CDE web scraping package alongside the Cambridge Structural Database (CSD) MOF subset, we automatically downloaded 43,281 unique MOF journal articles, extracted 15,501 unique MOF materials, and text-mined over 52,680 associated properties including the synthesis method, solvent, organic linker, metal precursor, and topology. Additionally, we developed an alternative data extraction technique to obtain and transform the chemical names assigned to each CSD entry in order to determine linker types for each structure in the CSD MOF subset. This data enabled us to match MOFs to a list of known linkers provided by Tokyo Chemical Industry UK Ltd. (TCI) and analyze the cost of these important chemicals. This centralized, structured database reveals the MOF synthetic data embedded within thousands of MOF publications and contains further topology, metal type, accessible surface area, largest cavity diameter, pore limiting diameter, open metal sites, and density calculations for all 3D MOFs in the CSD MOF subset. 
The DigiMOF database and associated software are publicly available for other researchers to rapidly search for MOFs with specific properties, conduct further analysis of alternative MOF production pathways, and create additional parsers to search for additional desirable properties.