Dataset Card for "finetune-data-28fee8943227"
More Information needed
Holds hourly surface temperature data from weather stations across the globe and is an important source of temperature data for temperature-health studies.
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
The Dutch CELEX data is derived from R.H. Baayen, R. Piepenbrock & L. Gulikers, The CELEX Lexical Database (CD-ROM), Release 2, Dutch Version 3.1, Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, 1995.
Apart from orthographic features, the CELEX database comprises representations of the phonological, morphological, syntactic and frequency properties of lemmata. For the Dutch data, frequencies have been disambiguated on the basis of the 42.4m Dutch Instituut voor Nederlandse Lexicologie text corpora.
To make for greater compatibility with other operating systems, the databases have not been tailored to fit any particular database management program. Instead, the information is presented in a series of plain ASCII files, which can be queried with tools such as AWK and ICON. Unique identity numbers allow the linking of information from different files.
This database can be divided into different subsets:
· orthography: with or without diacritics, with or without word division positions, alternative spellings, number of letters/syllables;
· phonology: phonetic transcriptions with syllable boundaries or primary and secondary stress markers, consonant-vowel patterns, number of phonemes/syllables, alternative pronunciations, frequency per phonetic syllable within words;
· morphology: division into stems and affixes, flat or hierarchical representations, stems and their inflections;
· syntax: word class, subcategorisations per word class;
· frequency of the entries: disambiguated for homographic lemmata.
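The linking of files through identity numbers can be sketched as follows. This is a minimal illustration, not the CELEX tooling itself: the backslash field separator, the column layout, and the sample rows are assumptions made for the example, and the transcriptions are invented placeholders.

```python
# Hypothetical excerpts from two plain ASCII files.
# Assumption: the first field of every row is the shared identity number.
ORTHOGRAPHY = r"""1\aap\3
2\boek\4
3\fiets\5"""

PHONOLOGY = r"""1\ap
2\buk"""


def parse_rows(text, sep="\\"):
    """Split each non-empty line into a list of fields."""
    return [line.split(sep) for line in text.splitlines() if line.strip()]


def join_on_id(left_rows, right_rows):
    """Link rows from two files by the identity number in the first field.

    Rows without a partner in the other file are dropped (an inner join).
    """
    right_by_id = {row[0]: row[1:] for row in right_rows}
    return {
        row[0]: row[1:] + right_by_id[row[0]]
        for row in left_rows
        if row[0] in right_by_id
    }


linked = join_on_id(parse_rows(ORTHOGRAPHY), parse_rows(PHONOLOGY))
for ident, fields in linked.items():
    print(ident, fields)
```

Entry 3 has no row in the second sample file, so it does not appear in the linked result; an outer join would be the alternative if unmatched entries should be kept.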
https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Subsetting of Data Analysis with R, 2nd Semester, Bachelor of Computer Application 2023-2024
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The objective of this HydroShare resource is to query AORC v1.0 Forcing data stored on HydroShare's Thredds server and create a subset of this dataset for a designated watershed and timeframe. The user is prompted to define a temporal frame of interest, which specifies the start and end dates for the data subset. Additionally, the user is prompted to define a spatial frame of interest, which could be a bounding box or a shapefile, to subset the data spatially.
Before the subsetting is performed, data is queried, and geospatial metadata is added to ensure that the data is correctly aligned with its corresponding location on the Earth's surface. To achieve this, two separate notebooks were created, which explain in detail how to query the dataset and how to add geospatial metadata to AORC v1.0 data, respectively. In this notebook, we call functions from the AORC.py script to perform these preprocessing steps, resulting in a cleaner notebook that focuses solely on the subsetting process.
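The idea of combining a temporal frame with a bounding box can be sketched in plain Python. This is only an illustration under stated assumptions: the record layout, the field names, and the `subset` helper are hypothetical, and the functions in AORC.py are not shown in this description, so none of them are called here.

```python
from datetime import datetime

# Hypothetical hourly records; values and field names are illustrative only.
RECORDS = [
    {"time": datetime(2010, 1, 1, 0), "lat": 40.0, "lon": -105.0, "precip_mm": 0.0},
    {"time": datetime(2010, 1, 1, 1), "lat": 40.0, "lon": -105.0, "precip_mm": 1.2},
    {"time": datetime(2010, 1, 1, 1), "lat": 45.0, "lon": -100.0, "precip_mm": 0.4},
    {"time": datetime(2010, 1, 2, 0), "lat": 40.0, "lon": -105.0, "precip_mm": 2.5},
]


def subset(records, bbox, start, end):
    """Keep records inside the bounding box and the inclusive time window.

    bbox is (min_lon, min_lat, max_lon, max_lat) in decimal degrees.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        r
        for r in records
        if start <= r["time"] <= end
        and min_lat <= r["lat"] <= max_lat
        and min_lon <= r["lon"] <= max_lon
    ]


selected = subset(
    RECORDS,
    bbox=(-106.0, 39.0, -104.0, 41.0),
    start=datetime(2010, 1, 1, 0),
    end=datetime(2010, 1, 1, 23),
)
print(len(selected), "records selected")
```

A shapefile-based frame would replace the rectangular test with a point-in-polygon test; the temporal filter stays the same.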
The Comprehensive Ocean - Atmosphere Data Set (COADS) Long Marine Reports Fixed-Length (LMRF) Arctic subset contains marine surface weather reports for regions north of 65 degrees N from ships, drifting ice stations, and buoys. The COADS LMRF Arctic subset contains data collected over the years 1950 to 1995 and includes the following parameters: air and sea temperature, cloudiness, humidity, and winds. The data are in the form of individual marine reports with a given latitude and longitude.
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm
This database is a subset of the SpeechDat-Car database in German language which has been collected as part of the European Union funded SpeechDat-Car project. It contains isolated and connected German digits spoken in the following noise and driving conditions inside a car:
1. High speed good road
2. Low speed rough road
3. Stopped with motor running
4. Town traffic
This database is a compendium of histories of known-age Weddell seals from observations across the Southern Ocean, focussed on the Windmill Islands, Mawson and the Vestfold Hills. Although the following information pertains to elephant seals, it is assumed that similar procedures were undertaken with the Weddell seals between 1973 and 2006:
At Macquarie Island 1000 seals were weighed per annum between 1993-2003 at birth and individually marked with two plastic flipper tags in the inter-digital webbing of their hind flippers. These tagged seals were weighed again at weaning, when length, girth, fat depth, and flipper measurements were made. Three weeks after weaning 2000 seals were permanently and individually marked by hot-iron branding. Recaptures and re-weighings of these known aged individuals were used to calculate growth and age-specific survival of the seals.
Similar data were collected from elephant seals between 1950 and 1965 when seals were individually marked by hot-iron branding. Mark-recapture data from these cohorts were used to assess the demography of the declining population. Length and mass data were also collected for these cohorts and were used, for the first time, to assess the growth of individual seals without killing them.
The database was held by the Australian Antarctic Data Centre, but was taken offline due to maintenance problems. A snapshot of the database was taken in June 2018 and stored in an Access database.
This work was completed as part of ASAC project 90.
This is a subset of Geoscience Australia's Marine Connectivity Database, covering the North-west marine planning region for initial releases taking place in the interval January-March 2010. The subset is intended for use in development and testing as part of the GovHack 2016 competition.
This database is a compendium of histories of known-age southern elephant seals from observations across the Southern Ocean, focussed on Macquarie Island, Marion Island, Heard Island, Mawson and the Vestfold Hills.
At Macquarie Island 1000 seals were weighed per annum between 1993-2003 at birth and individually marked with two plastic flipper tags in the inter-digital webbing of their hind flippers. These tagged seals were weighed again at weaning, when length, girth, fat depth, and flipper measurements were made. Three weeks after weaning 2000 seals were permanently and individually marked by hot-iron branding. Recaptures and re-weighings of these known aged individuals were used to calculate growth and age-specific survival of the seals.
Similar data were collected from elephant seals between 1950 and 1965 when seals were individually marked by hot-iron branding. Mark-recapture data from these cohorts were used to assess the demography of the declining population. Length and mass data were also collected for these cohorts and were used, for the first time, to assess the growth of individual seals without killing them.
At Marion Island all the elephant seals have been individually marked with two plastic flipper tags in their rear flippers. Recaptures of these seals were used to compare survival at Marion and Macquarie Islands.
At Heard Island, seals were branded between 1949-1953. Seal length was measured in feet and inches. Recaptures of seals were made up until 1955, and growth and age-specific survival was calculated. Survival data from Heard Island were compared with concurrent data from Macquarie Island.
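Because the Heard Island lengths were recorded in feet and inches, comparing them with metric measurements from other sites requires a unit conversion. The following is a minimal sketch; the function name and the sample measurement are hypothetical, and only the exact definition 1 inch = 2.54 cm is relied on.

```python
INCHES_PER_FOOT = 12
CM_PER_INCH = 2.54  # exact by definition


def feet_inches_to_cm(feet, inches):
    """Convert a length given in feet and inches to centimetres."""
    if feet < 0 or inches < 0:
        raise ValueError("length components must be non-negative")
    return (feet * INCHES_PER_FOOT + inches) * CM_PER_INCH


# Hypothetical measurement of 5 ft 6 in, not taken from the database.
print(round(feet_inches_to_cm(5, 6), 2), "cm")
```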
The database was held by the Australian Antarctic Data Centre, but was taken offline due to maintenance problems. A snapshot of the database was taken in January 2015 and stored in an Access database and several CSV files.
This work was completed as part of ASAC project 90.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The U.S. Geological Survey (USGS), in partnership with several federal agencies, has developed and released five National Land Cover Database (NLCD) products over the past two decades: NLCD 1992, 2001, 2006, 2011, and 2016. The 2016 release also produced land cover for the additional years 2003, 2008, and 2013. These products provide spatially explicit and reliable information on the Nation’s land cover and land cover change. To continue the legacy of NLCD and further establish a long-term monitoring capability for the Nation’s land resources, the USGS has designed a new generation of NLCD products named NLCD 2019. The NLCD 2019 design aims to provide innovative, consistent, and robust methodologies for production of a multi-temporal land cover and land cover change database from 2001 to 2019 at 2–3-year intervals. Comprehensive research was conducted and resulted in the following strategies for NLCD 2019: continued integration between impervious surface and all land cover products, with impervious surface being directly mapped as developed classes in the land cover; a streamlined compositing process for assembling and preprocessing based on Landsat imagery and geospatial ancillary datasets; a multi-source integrated training data development and decision-tree based land cover classifications; a temporally, spectrally, and spatially integrated land cover change analysis strategy; a hierarchical theme-based post-classification and integration protocol for generating land cover and change products; a continuous fields biophysical parameters modeling method; and an automated scripted operational system for the NLCD 2019 production. The performance of the developed strategies and methods was tested in twenty composite referenced areas throughout the conterminous U.S. An overall accuracy assessment from the 2016 publication gives a 91% overall land cover accuracy, with the developed classes also showing 91% accuracy overall.
Results from this study confirm the robustness of this comprehensive and highly automated procedure for NLCD 2019 operational mapping. Questions about the NLCD 2019 land cover product can be directed to the NLCD 2019 land cover mapping team at USGS EROS, Sioux Falls, SD (605) 594-6151 or mrlc@usgs.gov. See included spatial metadata for more details.
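For readers unfamiliar with how an overall accuracy figure such as 91% is obtained, the usual calculation divides the correctly classified reference samples (the diagonal of a confusion matrix) by all reference samples. The sketch below uses an invented three-class matrix; it is not NLCD assessment data and does not reproduce the published assessment design.

```python
def overall_accuracy(matrix):
    """Overall accuracy: diagonal sum divided by the total sample count.

    matrix[i][j] counts samples of reference class i mapped as class j.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical counts for three classes (e.g. developed, forest, water).
CONFUSION = [
    [50, 3, 2],
    [4, 40, 1],
    [1, 2, 47],
]

print(f"overall accuracy: {overall_accuracy(CONFUSION):.1%}")
```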
IPUMS-International is an effort to inventory, preserve, harmonize, and disseminate census microdata from around the world. The project has collected the world's largest archive of publicly available census samples. The data are coded and documented consistently across countries and over time to facilitate comparative research. IPUMS-International makes these data available to qualified researchers free of charge through a web dissemination system.
The IPUMS project is a collaboration of the Minnesota Population Center, National Statistical Offices, and international data archives. Major funding is provided by the U.S. National Science Foundation and the Demographic and Behavioral Sciences Branch of the National Institute of Child Health and Human Development. Additional support is provided by the University of Minnesota Office of the Vice President for Research, the Minnesota Population Center, and Sun Microsystems.
National coverage
Households and Group Quarters
UNITS IDENTIFIED: - Dwellings: No - Vacant units: No - Households: Yes - Individuals: Yes - Group quarters: Yes
UNIT DESCRIPTIONS: - Households: Dwelling places with fewer than five persons unrelated to a household head, excluding institutions and transient quarters. - Group quarters: Institutions, transient quarters, and dwelling places with five or more persons unrelated to a household head.
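The unit definitions above amount to a simple decision rule, sketched here. The function and parameter names are hypothetical; they are not variables from the IPUMS data.

```python
def classify_unit(unrelated_persons, is_institution=False, is_transient=False):
    """Classify a dwelling place following the unit descriptions.

    unrelated_persons: number of persons unrelated to the household head.
    Institutions, transient quarters, and places with five or more
    unrelated persons are group quarters; everything else is a household.
    """
    if is_institution or is_transient or unrelated_persons >= 5:
        return "group quarters"
    return "household"


print(classify_unit(4))                       # household
print(classify_unit(5))                       # group quarters
print(classify_unit(0, is_institution=True))  # group quarters
```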
Residents of the 50 states (not the outlying areas).
Census/enumeration data [cen]
MICRODATA SOURCE: U.S. Census Bureau
SAMPLE UNIT: Household
SAMPLE FRACTION: 1%
SAMPLE SIZE (person records): 1,799,888
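As a rough illustration of what a 1% sample fraction implies, each person record can be thought of as standing for about 100 people. The sketch assumes a flat sample with equal weights, which this description does not state explicitly; actual analyses should use the weights supplied with the data.

```python
SAMPLE_FRACTION_PERCENT = 1     # from SAMPLE FRACTION above
PERSON_RECORDS = 1_799_888      # from SAMPLE SIZE above

# Assumption: equal weights, so each record represents 100 / 1 = 100 persons.
weight_per_record = 100 // SAMPLE_FRACTION_PERCENT
estimated_population = PERSON_RECORDS * weight_per_record

print(f"{estimated_population:,} persons represented")
```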
Face-to-face [f2f]
The 1960 census used a machine-readable household form. Separate forms were used for each housing unit. Housing questions were included on the same form as the population items. Every fourth enumeration unit received a "long form," containing supplemental sample questions that were asked of all members of the unit. Sample questions are available for all individuals in every unit. Of the units receiving a long form, four-fifths received one version (the 20% questionnaire), and one-fifth received a second version with the same population questions but slightly different housing questions (the 5% questionnaire).
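The names "20% questionnaire" and "5% questionnaire" follow from the sampling arithmetic in the paragraph above: one quarter of units received a long form, and that quarter was split four-fifths to one-fifth. A short check with exact fractions:

```python
from fractions import Fraction

long_form_share = Fraction(1, 4)  # every fourth enumeration unit

# Split of long-form units between the two questionnaire versions.
twenty_percent_form = long_form_share * Fraction(4, 5)
five_percent_form = long_form_share * Fraction(1, 5)

print(twenty_percent_form, "of all units")  # 1/5, i.e. 20%
print(five_percent_form, "of all units")    # 1/20, i.e. 5%
```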
UNDERCOUNT: No official estimates
This is the Stata code (.do file) and a subset of the data (.csv file) anonymized to show that the code works. The complete dataset includes information on ownership and firm characteristics for publicly traded firms from 2000 to 2016.
This dataset was created by Anurag Banerjee
The Jellyfish Database Initiative (JeDI) is a scientifically-coordinated global database dedicated to gelatinous zooplankton (members of the Cnidaria, Ctenophora and Thaliacea) and associated environmental data. The database holds 476,000 quantitative, categorical, presence-absence and presence only records of gelatinous zooplankton spanning the past four centuries (1790-2011) assembled from a variety of published and unpublished sources. Gelatinous zooplankton data are reported to species level, where identified, but taxonomic information on phylum, family and order are reported for all records. This dataset is a subset of data in Australian waters and adjacent seas.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For full data access, please see record and instructions at 10.5281/zenodo.13827890.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The U.S. Geological Survey (USGS), in partnership with several federal agencies, has now developed and released seven National Land Cover Database (NLCD) products: NLCD 1992, 2001, 2006, 2011, 2016, 2019, and 2021. Beginning with the 2016 release, land cover products were created for two-to-three-year intervals between 2001 and the most recent year. These products provide spatially explicit and reliable information on the Nation’s land cover and land cover change. NLCD continues to provide innovative, consistent, and robust methodologies for production of a multi-temporal land cover and land cover change database. NLCD 2021 adds an additional year to the map products produced for NLCD 2019, with a streamlined compositing process for assembling and preprocessing Landsat imagery and geospatial ancillary datasets; a temporally, spectrally, and spatially integrated land cover change analysis strategy; a theme-based post-classification protocol for generating land cover and change products; a continuous fields biophysical parameters modeling method; and a scripted operational system. The overall accuracy of the 2019 Level I land cover was 91%. Results from this study confirm the robustness of this comprehensive and highly automated procedure for NLCD 2021 operational mapping (see https://doi.org/10.1080/15481603.2023.2181143 for the latest accuracy assessment publication). Questions about the NLCD 2021 land cover product can be directed to the NLCD 2021 land cover mapping team at USGS EROS, Sioux Falls, SD (605) 594-6151 or mrlc@usgs.gov. See included spatial metadata for more details.
Based on the default parameters used in the analysis, the entire AOE database available through figshare (doi: 10.6084/m9.figshare.2060979) represents a subset of the AMNH instance of the AEC database, which includes additional tables to capture host plant data and host analysis.
1) Miridae subfamilies (id): Mirinae (id:8150), Orthotylinae (id:6294), Phylinae (id:6295), Deraeocorinae (id:8163), from AEC database SQL.
2) Geographic range: North America, Country.UID = Canada (id:2), Mexico (id:8), USA (id:11).
3) Complete plant host analysis.
4) Cleaned plant host data.
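The first two selection criteria can be expressed as a record filter. This is a sketch only: the ids come from the list above, but the record layout and the field names `subfamily_id` and `country_id` are hypothetical and do not reflect the actual AEC schema.

```python
# Ids taken from the criteria above.
SUBFAMILY_IDS = {
    8150: "Mirinae",
    6294: "Orthotylinae",
    6295: "Phylinae",
    8163: "Deraeocorinae",
}
COUNTRY_IDS = {2: "Canada", 8: "Mexico", 11: "USA"}


def in_subset(record):
    """True if the record matches both the subfamily and the country criteria."""
    return (
        record["subfamily_id"] in SUBFAMILY_IDS
        and record["country_id"] in COUNTRY_IDS
    )


# Hypothetical specimen records for illustration.
SPECIMENS = [
    {"specimen": "A", "subfamily_id": 8150, "country_id": 11},
    {"specimen": "B", "subfamily_id": 8150, "country_id": 99},
    {"specimen": "C", "subfamily_id": 1234, "country_id": 2},
    {"specimen": "D", "subfamily_id": 6295, "country_id": 8},
]

kept = [s["specimen"] for s in SPECIMENS if in_subset(s)]
print(kept)
```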
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The vastness of materials space, particularly that which is concerned with metal–organic frameworks (MOFs), creates the critical problem of performing efficient identification of promising materials for specific applications. Although high-throughput computational approaches, including the use of machine learning, have been useful in rapid screening and rational design of MOFs, they tend to neglect descriptors related to their synthesis. One way to improve the efficiency of MOF discovery is to data-mine published MOF papers to extract the materials informatics knowledge contained within journal articles. Here, by adapting the chemistry-aware natural language processing tool, ChemDataExtractor (CDE), we generated an open-source database of MOFs focused on their synthetic properties: the DigiMOF database. Using the CDE web scraping package alongside the Cambridge Structural Database (CSD) MOF subset, we automatically downloaded 43,281 unique MOF journal articles, extracted 15,501 unique MOF materials, and text-mined over 52,680 associated properties including the synthesis method, solvent, organic linker, metal precursor, and topology. Additionally, we developed an alternative data extraction technique to obtain and transform the chemical names assigned to each CSD entry in order to determine linker types for each structure in the CSD MOF subset. This data enabled us to match MOFs to a list of known linkers provided by Tokyo Chemical Industry UK Ltd. (TCI) and analyze the cost of these important chemicals. This centralized, structured database reveals the MOF synthetic data embedded within thousands of MOF publications and contains further topology, metal type, accessible surface area, largest cavity diameter, pore limiting diameter, open metal sites, and density calculations for all 3D MOFs in the CSD MOF subset. 
The DigiMOF database and associated software are publicly available for other researchers to rapidly search for MOFs with specific properties, conduct further analysis of alternative MOF production pathways, and create additional parsers to search for additional desirable properties.