85 datasets found
  1. National Open Access Monitor Survey: Defining Requirements: Responses...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 12, 2024
    Cite
    Ferris, Catherine (2024). National Open Access Monitor Survey: Defining Requirements: Responses Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_7821825
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset authored and provided by
    Ferris, Catherine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the response data from the ‘National Open Access Monitor Survey: Defining Requirements’, which was carried out between 2 and 28 February 2023 under the National Open Access Monitor Project: https://doi.org/10.5281/zenodo.7588787

    To note:

    Email addresses have been redacted.

    Responses have been pseudonymised to the level of stakeholder-group e.g. Contributor 1, Research Funding Organisation #, where requested by the participant in the participant consent form: https://doi.org/10.5281/zenodo.7589770.

    Detail within responses by individuals who requested pseudonymisation that might identify the individual or the organisation has been removed and marked "[redacted]".

    Responses by individuals who did not provide a signed consent form in advance of contributing to the survey were deleted immediately on receipt, as advised in the survey text: https://doi.org/10.5281/zenodo.7588787. Such responses are not included in these files.

    There are five files within this dataset:

    • Results.NationalOpenAccessMonitorSurvey.DefiningRequirements.README - this PDF details the changes made to the raw data, as specified in the bullet points above, and describes the files within the dataset.

    • Results.NationalOpenAccessMonitorSurvey.DefiningRequirements.Pseudonymised.100323 - this is the original raw data, in csv format, as downloaded from the Online Surveys platform and subsequently pseudonymised and redacted.

    • NationalOpenAccessMonitor.DefiningRequirements.Contributor3Government1.270223 - this PDF file contains a single pseudonymised stakeholder response to the survey, obtained by the Project Manager by email and not through the Online Surveys platform.

    • NationalOpenAccessMonitorSurvey.MasterChangeFile.080323 - this is a change file, in csv format, which documents the changes which participants requested to be made to their submissions after they were received and the survey was closed.

    • Results.NationalOpenAccessMonitorSurvey.DefiningRequirements.Pseudonymised.Updated.100323 - this is the original raw data, pseudonymised and redacted, with the requested changes implemented. This is the file on which analysis was carried out for the purposes of the survey outcome report and the resulting National Open Access Monitor tender documents.

    The context for the survey files is detailed in the National Open Access Monitor Project Plan: https://doi.org/10.5281/zenodo.7331431

    This project is managed by IReL and has received funding from Ireland’s National Open Research Forum under the NORF Open Research Fund. https://norf.ie/funding/ https://norf.ie/orf-projects-announcement/

  2. Common Metadata Elements for Cataloging Biomedical Datasets

    • figshare.com
    xlsx
    Updated Jan 20, 2016
    Cite
    Kevin Read (2016). Common Metadata Elements for Cataloging Biomedical Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.1496573.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jan 20, 2016
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Kevin Read
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset outlines a proposed set of core, minimal metadata elements that can be used to describe biomedical datasets, such as those resulting from research funded by the National Institutes of Health. It can inform efforts to better catalog or index such data to improve discoverability. The proposed metadata elements are based on an analysis of the metadata schemas used in a set of NIH-supported data sharing repositories. Common elements from these data repositories were identified, mapped to existing data-specific metadata standards from multidisciplinary data repositories (DataCite and Dryad), and compared with metadata used in MEDLINE records to establish a sustainable and integrated metadata schema. From the mappings, we developed a preliminary set of minimal metadata elements that can be used to describe NIH-funded datasets. Please see the readme file for more details about the individual sheets within the spreadsheet.
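    The mapping approach described (identifying metadata elements common across repository schemas) can be illustrated with a toy crosswalk. The repository and field names below are invented for illustration; they are not the actual schemas analyzed in this dataset.

```python
# Hypothetical crosswalk: repository-specific metadata fields mapped to a
# shared element vocabulary (names are illustrative, not the published schema).
CROSSWALK = {
    "repo_a": {"dataset_title": "Title", "investigator": "Creator", "released": "PublicationYear"},
    "repo_b": {"name": "Title", "author": "Creator", "year": "PublicationYear", "doi": "Identifier"},
}

def common_elements(crosswalk):
    """Return the minimal elements shared by every repository schema."""
    sets = [set(mapping.values()) for mapping in crosswalk.values()]
    return sorted(set.intersection(*sets))

print(common_elements(CROSSWALK))  # ['Creator', 'PublicationYear', 'Title']
```

    Elements that survive the intersection across all repositories form the candidate minimal set; elements present in only some schemas (here, Identifier) would need a separate decision.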

  3. Jurisdictional Unit (Public) - Dataset - CKAN

    • nationaldataplatform.org
    Updated Feb 28, 2024
    Cite
    (2024). Jurisdictional Unit (Public) - Dataset - CKAN [Dataset]. https://nationaldataplatform.org/catalog/dataset/jurisdictional-unit-public
    Explore at:
    Dataset updated
    Feb 28, 2024
    Description

    Jurisdictional Unit, 2022-05-21. For use with WFDSS, IFTDSS, IRWIN, and InFORM. This is a feature service which provides Identify and Copy Feature capabilities. If fast drawing at coarse zoom levels is a requirement, consider using the tile (map) service layer located at https://nifc.maps.arcgis.com/home/item.html?id=3b2c5daad00742cd9f9b676c09d03d13.

    Overview

    The Jurisdictional Agencies dataset is developed as a national land management geospatial layer, focused on representing wildland fire jurisdictional responsibility, for interagency wildland fire applications, including WFDSS (Wildland Fire Decision Support System), IFTDSS (Interagency Fuels Treatment Decision Support System), IRWIN (Interagency Reporting of Wildland Fire Information), and InFORM (Interagency Fire Occurrence Reporting Modules). It is intended to provide federal wildland fire jurisdictional boundaries on a national scale. The agency and unit names are an indication of the primary manager name and unit name, respectively, recognizing that:

    • There may be multiple owner names.
    • Jurisdiction may be held jointly by agencies at different levels of government (i.e., State and Local), especially on private lands.
    • Some owner names may be blocked for security reasons.
    • Some jurisdictions may not allow the distribution of owner names.

    Private ownerships are shown in this layer with JurisdictionalUnitIdentifier=null, JurisdictionalUnitAgency=null, JurisdictionalUnitKind=null, and LandownerKind="Private", LandownerCategory="Private". All land inside the US country boundary is covered by a polygon. Jurisdiction for privately owned land varies widely depending on state, county, or local laws and ordinances, fire workload, and other factors, and is not available in a national dataset in most cases. For publicly held lands the agency name is the surface managing agency, such as Bureau of Land Management, United States Forest Service, etc. The unit name refers to the descriptive name of the polygon (e.g., Northern California District, Boise National Forest). These data are used to automatically populate fields on the WFDSS Incident Information page. This data layer implements the NWCG Jurisdictional Unit Polygon Geospatial Data Layer Standard.

    Relevant NWCG Definitions and Standards

    • Unit: A generic term that represents an organizational entity that only has meaning when it is contextualized by a descriptor, e.g. jurisdictional. Definition Extension: When referring to an organizational entity, a unit refers to the smallest area or lowest level. Higher levels of an organization (region, agency, department, etc.) can be derived from a unit based on organization hierarchy.
    • Unit, Jurisdictional: The governmental entity having overall land and resource management responsibility for a specific geographical area as provided by law. Definition Extension: 1) Ultimately responsible for the fire report to account for statistical fire occurrence; 2) Responsible for setting fire management objectives; 3) Jurisdiction cannot be re-assigned by agreement; 4) The nature and extent of the incident determines jurisdiction (for example, Wildfire vs. All Hazard); 5) Responsible for signing a Delegation of Authority to the Incident Commander. See also: Unit, Protecting; Landowner.
    • Unit Identifier: This data standard specifies the standard format and rules for Unit Identifier, a code used within the wildland fire community to uniquely identify a particular government organizational unit.
    • Landowner Kind & Category: This data standard provides a two-tier classification (kind and category) of landownership.

    Attribute Fields

    • JurisdictionalAgencyKind: Describes the type of unit jurisdiction using the NWCG Landowner Kind data standard. There are two valid values: Federal and Other. A value may not be populated for all polygons.
    • JurisdictionalAgencyCategory: Describes the type of unit jurisdiction using the NWCG Landowner Category data standard. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State. A value may not be populated for all polygons.
    • JurisdictionalUnitName: The name of the Jurisdictional Unit. Where an NWCG Unit ID exists for a polygon, this is the name used in the Name field from the NWCG Unit ID database. Where no NWCG Unit ID exists, this is the “Unit Name” or other specific, descriptive unit name field from the source dataset. A value is populated for all polygons.
    • JurisdictionalUnitID: Where it could be determined, this is the NWCG Standard Unit Identifier (Unit ID). Where it is unknown, the value is ‘Null’. Null Unit IDs can occur because a unit may not have a Unit ID, or because one could not be reliably determined from the source data. Not every land ownership has an NWCG Unit ID. Unit ID assignment rules are available from the Unit ID standard, linked above.
    • LandownerKind: The landowner kind value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. There are three valid values: Federal, Private, or Other.
    • LandownerCategory: The landowner category value associated with the polygon. May be inferred from jurisdictional agency, or by lack of a jurisdictional agency. A value is populated for all polygons. Valid values include: ANCSA, BIA, BLM, BOR, DOD, DOE, NPS, USFS, USFWS, Foreign, Tribal, City, County, OtherLoc (other local, not in the standard), State, Private.
    • DataSource: The database from which the polygon originated. Be as specific as possible; identify the geodatabase name and feature class in which the polygon originated.
    • SecondaryDataSource: If the Data Source is an aggregation from other sources, use this field to specify the source that supplied data to the aggregation. For example, if Data Source is "PAD-US 2.1", then for a USDA Forest Service polygon, the Secondary Data Source would be "USDA FS Automated Lands Program (ALP)". For a BLM polygon in the same dataset, Secondary Data Source would be "Surface Management Agency (SMA)".
    • SourceUniqueID: Identifier (GUID or ObjectID) in the data source. Used to trace the polygon back to its authoritative source.
    • MapMethod: Controlled vocabulary to define how the geospatial feature was derived. Map method may help define data quality. MapMethod will be Mixed Methods by default for this layer as the data are from mixed sources. Valid values include: GPS-Driven; GPS-Flight; GPS-Walked; GPS-Walked/Driven; GPS-Unknown Travel Method; Hand Sketch; Digitized-Image; Digitized-Topo; Digitized-Other; Image Interpretation; Infrared Image; Modeled; Mixed Methods; Remote Sensing Derived; Survey/GCDB/Cadastral; Vector; Phone/Tablet; Other.
    • DateCurrent: The last edit or update of this GIS record. Date should follow the assigned NWCG Date Time data standard, using a 24-hour clock, YYYY-MM-DDhh.mm.ssZ, ISO 8601 Standard.
    • Comments: Additional information describing the feature.
    • GeometryID: Primary key for linking geospatial objects with other database systems. Required for every feature. This field may be renamed for each standard to fit the feature.
    • JurisdictionalUnitID_sansUS: NWCG Unit ID with the "US" characters removed from the beginning. Provided for backwards compatibility.
    • JoinMethod: Additional information on how the polygon was matched to information in the NWCG Unit ID database.
    • LocalName: Local name for the polygon, provided from PAD-US or other source.
    • LegendJurisdictionalAgency: Jurisdictional Agency, but smaller landholding agencies or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.
    • LegendLandownerAgency: Landowner Agency, but smaller landholding agencies or agencies of indeterminate status are grouped for more intuitive use in a map legend or summary table.
    • DataSourceYear: Year that the source data for the polygon were acquired.

    Data Input

    This dataset is based on an aggregation of 4 spatial data sources: Protected Areas Database US (PAD-US 2.1), data from Bureau of Indian Affairs regional offices, the BLM Alaska Fire Service/State of Alaska, and Census Block-Group geometry. NWCG Unit ID and Agency Kind/Category data are tabular and sourced from UnitIDActive.txt, in the WFMI Unit ID application (https://wfmi.nifc.gov/unit_id/Publish.html). Areas with unknown Landowner Kind/Category and Jurisdictional Agency Kind/Category are assigned LandownerKind and LandownerCategory values of "Private" by use of the non-water polygons from the Census Block-Group geometry.

    PAD-US 2.1: This dataset is based in large part on the USGS Protected Areas Database of the United States (PAD-US 2.1). PAD-US is a compilation of authoritative protected areas data between agencies and organizations that ultimately results in a comprehensive and accurate inventory of protected areas for the United States to meet a variety of needs (e.g. conservation, recreation, public health, transportation, energy siting, ecological, or watershed assessments and planning). Extensive documentation on PAD-US processes and data sources is available.

    How these data were aggregated: Boundaries, and their descriptors, available in spatial databases (i.e., shapefiles or geodatabase feature classes) from land management agencies are the desired and primary data sources in PAD-US. If these authoritative sources are unavailable, or the agency recommends another source, data may be incorporated by other aggregators such as non-governmental organizations. Data sources are tracked for each record in the PAD-US geodatabase (see below).

    BIA and Tribal Data: BIA and Tribal land management data are not available in PAD-US. As such, data were aggregated from BIA regional offices. These data date from 2012 and were substantially updated in 2022. Indian Trust Land affiliated with Tribes, Reservations, or BIA Agencies: These data are not considered the system of record and are not intended to be used as such. The Bureau of Indian Affairs (BIA), Branch of Wildland Fire Management (BWFM) is not the originator of these data. The
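    The inference described above for LandownerKind (derived from the jurisdictional agency, with a missing agency implying privately owned land) can be sketched as follows. The split of categories into Federal versus Other groups below is an assumption based on the agency names, not a grouping taken from the NWCG standard.

```python
# Assumed grouping of JurisdictionalAgencyCategory values (not from the standard).
FEDERAL_CATEGORIES = {"BIA", "BLM", "BOR", "DOD", "DOE", "NPS", "USFS", "USFWS"}
OTHER_CATEGORIES = {"ANCSA", "Foreign", "Tribal", "City", "County", "OtherLoc", "State"}

def infer_landowner_kind(jurisdictional_category):
    """Infer LandownerKind from JurisdictionalAgencyCategory.

    Per the layer description, a polygon with no jurisdictional agency
    (null category) is treated as privately owned.
    """
    if jurisdictional_category is None:
        return "Private"
    if jurisdictional_category in FEDERAL_CATEGORIES:
        return "Federal"
    return "Other"

print(infer_landowner_kind("BLM"))    # Federal
print(infer_landowner_kind(None))     # Private
print(infer_landowner_kind("State"))  # Other
```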

  4. ERA5 monthly averaged data on single levels from 1940 to present

    • cds.climate.copernicus.eu
    grib
    Updated Aug 6, 2025
    + more versions
    Cite
    ECMWF (2025). ERA5 monthly averaged data on single levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.f17050d7
    Explore at:
    Available download formats: grib
    Dataset updated
    Aug 6, 2025
    Dataset provided by
    European Centre for Medium-Range Weather Forecasts: http://ecmwf.int/
    Authors
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf

    Time period covered
    Jan 1, 1940 - Jul 1, 2025
    Description

    ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days (monthly means are available around the 6th of each month). 
    If serious flaws are detected in this early release (called ERA5T), the data could differ from the final release 2 to 3 months later; users are notified when this occurs. The dataset presented here is a regridded subset of the full ERA5 dataset on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data have been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 monthly mean data on single levels from 1940 to present".
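    The regridded subset described above can be retrieved programmatically through the Climate Data Store with the cdsapi package. A minimal sketch follows: the dataset id matches this catalogue entry, but the request keys and accepted values are assumptions to be checked against the "Download data" form on the CDS page, which generates the exact request for you. Retrieval requires a CDS account and a ~/.cdsapirc credentials file.

```python
import os

# Assumed request shape for one monthly-mean field; verify key names and
# values against the CDS download form before relying on them.
request = {
    "product_type": "monthly_averaged_reanalysis",
    "variable": "2m_temperature",
    "year": "2024",
    "month": "01",
    "time": "00:00",
    "data_format": "grib",
}

# Only attempt retrieval when CDS credentials are configured.
if os.path.exists(os.path.expanduser("~/.cdsapirc")):
    import cdsapi

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels-monthly-means",
        request,
        "era5_monthly.grib",
    )
```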

  5. turkish-image-description-dataset-shard-00

    • huggingface.co
    Updated May 4, 2025
    Cite
    ertu (2025). turkish-image-description-dataset-shard-00 [Dataset]. https://huggingface.co/datasets/ozertuu/turkish-image-description-dataset-shard-00
    Explore at:
    Dataset updated
    May 4, 2025
    Authors
    ertu
    Description

    Turkish Image Description Dataset - Shard 0

    This dataset contains translated image descriptions from English to Turkish.

      Contents
    

    Images with their Turkish and original English descriptions

      How to use
    

    from datasets import load_dataset

    # Load the dataset
    dataset = load_dataset("ozertuu/turkish-image-description-dataset-shard-00")

    # Access data
    for item in dataset["train"]:
        image = item["image"]  # PIL.Image object
        turkish_description = …

    See the full description on the dataset page: https://huggingface.co/datasets/ozertuu/turkish-image-description-dataset-shard-00.

  6. Park, Beach, Open Space, or Coastline Access

    • data.chhs.ca.gov
    • healthdata.gov
    • +5more
    csv, html, pdf, xlsx +1
    Updated Aug 5, 2025
    + more versions
    Cite
    California Department of Public Health (2025). Park, Beach, Open Space, or Coastline Access [Dataset]. https://data.chhs.ca.gov/dataset/park-beach-open-space-or-coastline-access
    Explore at:
    Available download formats: xlsx, zip, pdf, csv(129337734), html
    Dataset updated
    Aug 5, 2025
    Dataset authored and provided by
    California Department of Public Health: https://www.cdph.ca.gov/
    Description

    This table contains data on access to parks, measured as the percent of the population within ½ mile of a park, beach, open space, or coastline, for California, its regions, counties, county subdivisions, cities, towns, and census tracts. More information on the data table and a data dictionary can be found in the Data and Resources section. As communities become increasingly more urban, parks and the protection of green and open spaces within cities increase in importance. Parks and natural areas buffer pollutants and contribute to the quality of life by providing communities with social and psychological benefits such as leisure, play, sports, and contact with nature. Parks are critical to human health by providing spaces for health and wellness activities. The access to parks table is part of a series of indicators in the Healthy Communities Data and Indicators Project (HCI) of the Office of Health Equity. The goal of HCI is to enhance public health by providing data, a standardized set of statistical measures, and tools that a broad array of sectors can use for planning healthy communities and evaluating the impact of plans, projects, policy, and environmental changes on community health. The creation of healthy social, economic, and physical environments that promote healthy behaviors and healthy outcomes requires coordination and collaboration across multiple sectors, including transportation, housing, education, agriculture and others. Statistical metrics, or indicators, are needed to help local, regional, and state public health and partner agencies assess community environments and plan for healthy communities that optimize public health. The format of the access to parks table is based on the standardized data format for all HCI indicators. As a result, this data table contains certain variables used in the HCI project (e.g., indicator ID, and indicator definition). Some of these variables may contain the same value for all observations.
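    The indicator described above (percent of population within ½ mile of a park) can be approximated for point data with a plain great-circle distance check. The coordinates below are made up for illustration; the actual HCI indicator is computed from census geographies and mapped park boundaries, not individual points.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pct_within_half_mile(residences, park_points):
    """Percent of residence points within 0.5 mile of any park access point."""
    near = sum(
        1
        for home in residences
        if any(haversine_miles(*home, *park) <= 0.5 for park in park_points)
    )
    return 100.0 * near / len(residences)

# Toy data: two residences close to a hypothetical park entrance, one far away
park_points = [(34.0195, -118.4912)]
residences = [(34.0200, -118.4920), (34.0210, -118.4900), (34.1000, -118.3000)]
print(round(pct_within_half_mile(residences, park_points), 1))  # 66.7
```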

  7. COVID-19 Case Surveillance Restricted Access Detailed Data

    • data.cdc.gov
    • data.virginia.gov
    • +4more
    application/rdfxml +5
    Updated Nov 20, 2020
    + more versions
    Cite
    CDC Data, Analytics and Visualization Task Force (2020). COVID-19 Case Surveillance Restricted Access Detailed Data [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Restricted-Access-Detai/mbd7-r32t
    Explore at:
    Available download formats: application/rssxml, xml, json, csv, tsv, application/rdfxml
    Dataset updated
    Nov 20, 2020
    Dataset provided by
    Centers for Disease Control and Prevention: http://www.cdc.gov/
    Authors
    CDC Data, Analytics and Visualization Task Force
    License

    https://www.usa.gov/government-works

    Description

    Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

    Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

    This case surveillance publicly available dataset has 33 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors. This dataset requires a registration process and a data use agreement.

    CDC has three COVID-19 case surveillance datasets:

    Requesting Access to the COVID-19 Case Surveillance Restricted Access Detailed Data

    Please review the following documents to determine your interest in accessing the COVID-19 Case Surveillance Restricted Access Detailed Data file:

    1) CDC COVID-19 Case Surveillance Restricted Access Detailed Data: Summary, Guidance, Limitations Information, and Restricted Access Data Use Agreement Information
    2) Data Dictionary for the COVID-19 Case Surveillance Restricted Access Detailed Data

    The next step is to complete the Registration Information and Data Use Restrictions Agreement (RIDURA). Once complete, CDC will review your agreement. After access is granted, Ask SRRG (eocevent394@cdc.gov) will email you information about how to access the data through GitHub. If you have questions about obtaining access, email eocevent394@cdc.gov.

    Overview

    The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

    COVID-19 case surveillance data are collected by jurisdictions and are shared voluntarily with CDC. For more information, visit: https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.

    The deidentified data in the restricted access dataset include demographic characteristics, state and county of residence, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and comorbidities.

    All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

    COVID-19 case reports have been routinely submitted using standardized case reporting forms.

    On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

    CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification. All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for lab-confirmed or probable cases.

    On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

    Data are Considered Provisional

    • The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
    • Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.

    Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

    Data Limitations

    To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

    Data Quality Assurance Procedures

    CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:

    • Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question "Was the individual hospitalized?" where the possible answer choices include "Yes," "No," or "Unknown," the blank value is recoded to "Missing" because the case report form did not include a response to the question.
    • Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
    • Additional data quality processing to recode free text data is ongoing. Data on symptoms, race, ethnicity, and healthcare worker status have been prioritized.
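    The first two cleaning steps above (recoding blank answers to "Missing" and nulling illogical future onset dates) can be sketched on a single record. The field names and reference date below are illustrative, not CDC's actual pipeline.

```python
from datetime import date

def clean_record(record, today=date(2023, 6, 1)):
    """Apply two of the cleaning steps described above to one case record:
    blank answers become 'Missing'; onset dates in the future become None."""
    cleaned = dict(record)
    # Recode unanswered (blank) questions to 'Missing'.
    if cleaned.get("hospitalized", "") == "":
        cleaned["hospitalized"] = "Missing"
    # Null out a symptom onset date reported in the future, pending review.
    onset = cleaned.get("onset_date")
    if onset is not None and onset > today:
        cleaned["onset_date"] = None
    return cleaned

print(clean_record({"hospitalized": "", "onset_date": date(2024, 1, 15)}))
```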

    Data Suppression

    To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<11 COVID-19 case records with a given value). Suppression includes low-frequency combinations of case month, geographic characteristics (county and state of residence), and demographic characteristics (sex, age group, race, and ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
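    The suppression rule above (recode, never remove, records in low-frequency combinations) can be sketched as follows. The threshold of 11 comes from the text; the record structure is illustrative.

```python
from collections import Counter

def suppress_low_counts(records, keys, threshold=11):
    """Recode key combinations occurring fewer than `threshold` times to 'NA'.

    Records are kept, never dropped, mirroring the rule described above.
    """
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    out = []
    for r in records:
        if counts[tuple(r[k] for k in keys)] < threshold:
            r = {**r, **{k: "NA" for k in keys}}  # copy with suppressed keys
        out.append(r)
    return out

records = [{"county": "A", "age_group": "0-17"}] * 12 + [{"county": "B", "age_group": "65+"}] * 3
cleaned = suppress_low_counts(records, ("county", "age_group"))
print(sum(1 for r in cleaned if r["county"] == "NA"))  # 3 suppressed rows
```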

    Additional COVID-19 Data

    COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These and other COVID-19 data are available from multiple public locations:

  8. College Student Placement Factors Dataset

    • kaggle.com
    Updated Jul 2, 2025
    Sahil Islam007 (2025). College Student Placement Factors Dataset [Dataset]. https://www.kaggle.com/datasets/sahilislam007/college-student-placement-factors-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 2, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sahil Islam007
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📘 College Student Placement Dataset

    A realistic, large-scale synthetic dataset of 10,000 students designed to analyze factors affecting college placements.

    📄 Dataset Description

    This dataset simulates the academic and professional profiles of 10,000 college students, focusing on factors that influence placement outcomes. It includes features like IQ, academic performance, CGPA, internships, communication skills, and more.

    The dataset is ideal for:

    • Predictive modeling of placement outcomes
    • Educational exercises in classification
    • Feature importance analysis
    • End-to-end machine learning projects

    📊 Columns Description

    • College_ID: Unique ID of the college (e.g., CLG0001 to CLG0100)
    • IQ: Student’s IQ score (normally distributed around 100)
    • Prev_Sem_Result: GPA from the previous semester (range: 5.0 to 10.0)
    • CGPA: Cumulative Grade Point Average (range: ~5.0 to 10.0)
    • Academic_Performance: Annual academic rating (scale: 1 to 10)
    • Internship_Experience: Whether the student has completed any internship (Yes/No)
    • Extra_Curricular_Score: Involvement in extracurriculars (score from 0 to 10)
    • Communication_Skills: Soft skill rating (scale: 1 to 10)
    • Projects_Completed: Number of academic/technical projects completed (0 to 5)
    • Placement: Final placement result (Yes = Placed, No = Not Placed)

    🎯 Target Variable

    • Placement: This is the binary classification target (Yes/No) that you can try to predict based on the other features.
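A quick exploratory pass might compare placement rates across one of the features, such as Internship_Experience. A minimal stdlib sketch (column names follow the dataset description; the rows here are hypothetical):

```python
import statistics

def placement_rate_by(rows, column):
    """Fraction of students with Placement == "Yes" for each value of `column`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(1 if row["Placement"] == "Yes" else 0)
    return {value: statistics.mean(flags) for value, flags in groups.items()}
```

The same helper works for any categorical column in the table above, and the resulting rates give a first feel for feature importance before fitting a classifier.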

    🧠 Use Cases

    • 📈 Classification Modeling (Logistic Regression, Decision Trees, Random Forest, etc.)
    • 🔍 Exploratory Data Analysis (EDA)
    • 🎯 Feature Engineering and Selection
    • 🧪 Model Evaluation Practice
    • 👩‍🏫 Academic Projects & Capstone Use

    📦 Dataset Size

    • Rows: 10,000
    • Columns: 10
    • File Format: .csv

    📚 Context

    This dataset was generated to resemble real-world data in academic institutions for research and machine learning use. While it is synthetic, the variables and relationships are crafted to mimic authentic trends observed in student placements.

    📜 License

    MIT

    🔗 Source

    Created using Python (NumPy, Pandas) with data logic designed for educational and ML experimentation purposes.

  9. Database Creation Description and Data Dictionaries

    • figshare.com
    txt
    Updated Aug 11, 2016
    Jordan Kempker; John David Ike (2016). Database Creation Description and Data Dictionaries [Dataset]. http://doi.org/10.6084/m9.figshare.3569067.v3
    Explore at:
    txt (available download formats)
    Dataset updated
    Aug 11, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jordan Kempker; John David Ike
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    There are several Microsoft Word documents here detailing data creation methods, with various dictionaries describing the included and derived variables. The Database Creation Description is meant to walk a user through some of the steps detailed in the SAS code with this project. The alphabetical list of variables is intended for users, as sometimes this makes some coding steps easier to copy and paste from this list instead of retyping. The NIS Data Dictionary contains some general dataset description as well as each variable's responses.

  10. Data from: Current and projected research data storage needs of Agricultural...

    • agdatacommons.nal.usda.gov
    • datasets.ai
    • +2more
    pdf
    Updated Nov 30, 2023
    Cynthia Parr (2023). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. http://doi.org/10.15482/USDA.ADC/1346946
    Explore at:
    pdf (available download formats)
    Dataset updated
    Nov 30, 2023
    Dataset provided by
    Ag Data Commons
    Authors
    Cynthia Parr
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet, which consists of a shared high-performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets, so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to open response to all members of their unit, or to collate responses from their unit themselves before reporting in the survey.
    Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values.
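The per-person calculation described above (high end of the reported range divided by the number of individuals a response covers) can be sketched as a one-line helper; the function name and units are illustrative:

```python
def per_person_tb(range_high_tb, group_size=1):
    """High end of the reported storage range (TB) divided by the number of
    individuals covered: 1 for an individual response, G for a group response."""
    if group_size < 1:
        raise ValueError("group size must be at least 1")
    return range_high_tb / group_size
```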

    Resources in this dataset:
    • Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdf. Resource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop-down not shown here. Resource Software Recommended: Adobe Acrobat, url: https://get.adobe.com/reader/
    • Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csv. Resource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This is the same data as in the Excel spreadsheet (also provided).
    • Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsx. Resource Description: MS Excel worksheet that includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. Resource Software Recommended: Microsoft Excel, url: https://products.office.com/en-us/excel

  11. DCASE 2023 Challenge Task 2 Development Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 2, 2023
    Kota Dohi (2023). DCASE 2023 Challenge Task 2 Development Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7687463
    Explore at:
    Dataset updated
    May 2, 2023
    Dataset provided by
    Noboru
    Yuma
    Takashi
    Tomoya
    Kota Dohi
    Daisuke
    Harsh
    Keisuke
    Yohei
    Description

    This dataset is the "development dataset" for the DCASE 2023 Challenge Task 2 "First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring".

    The data consists of the normal/anomalous operating sounds of seven types of real/toy machines. Each recording is a single-channel 10-second audio that includes both a machine's operating sound and environmental noise. The following seven types of real/toy machines are used in this task:

    ToyCar

    ToyTrain

    Fan

    Gearbox

    Bearing

    Slide rail

    Valve

    Overview of the task

    Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.

    This task is the follow-up to DCASE 2020 Task 2 through DCASE 2022 Task 2. The task this year is to develop an ASD system that meets the following four requirements.

    1. Train a model using only normal sound (unsupervised learning scenario)

    Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.

    2. Detect anomalies regardless of domain shifts (domain generalization task)

    In real-world cases, the operational states of a machine or the environmental noise can change to cause domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard-to-notice. In this task, the system is required to use domain-generalization techniques for handling these domain shifts. This requirement is the same as in DCASE 2022 Task 2.

    3. Train a model for a completely new machine type

    For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should have the ability to train models without additional hyperparameter tuning.

    4. Train a model using only one machine from its machine type

    While sounds from multiple machines of the same machine type can be used to enhance detection performance, it is often the case that sound data from only one machine are available for a machine type. In such a case, the system should be able to train models using only one machine from a machine type.

    The last two requirements are newly introduced in DCASE 2023 Task 2 as the "first-shot problem".

    Definition

    We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."

    "Machine type" indicates the type of machine, which in the development dataset is one of seven: fan, gearbox, bearing, slide rail, valve, ToyCar, and ToyTrain.

    A section is defined as a subset of the dataset for calculating performance metrics.

    The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.

    Attributes are parameters that define states of machines or types of noise.

    Dataset

    This dataset consists of seven machine types. For each machine type, one section is provided, and the section is a complete set of training and test data. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training, (ii) ten clips of normal sounds in the target domain for training, and (iii) 100 clips each of normal and anomalous sounds for the test. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.

    File names and attribute csv files

    File names and attribute csv files provide reference labels for each clip. The given reference labels for each training/test clip include machine type, section index, normal/anomaly information, and attributes regarding the condition other than normal/anomaly. The machine type is given by the directory name. The section index is given by their respective file names. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are given by their respective file names. Attribute csv files are for easy access to attributes that cause domain shifts. In these files, the file names, name of parameters that cause domain shifts (domain shift parameter, dp), and the value or type of these parameters (domain shift value, dv) are listed. Each row takes the following format:

    [filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
    
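A row in that format can be parsed into a mapping from each filename to its domain-shift parameter/value pairs. A short sketch (the csv layout follows the description above; type coercion of values is omitted, and the example filename and attributes are illustrative):

```python
import csv
import io

def parse_attribute_csv(text):
    """Map each filename to a {domain-shift parameter: value} dict."""
    attributes = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        # first column is the filename; the rest alternate parameter, value
        filename, pairs = row[0], row[1:]
        attributes[filename] = dict(zip(pairs[0::2], pairs[1::2]))
    return attributes
```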

    Recording procedure

    Normal/anomalous operating sounds of machines and their related equipment are recorded. Anomalous sounds were collected by deliberately damaging target machines. For simplifying the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings of a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset to explain the details of the recording procedure by the submission deadline.

    Directory structure

    • /dev_data

      • /raw
        • /fan
          • /train (only normal clips)
            • /section_00_source_train_normal_0000_.wav
            • ...
            • /section_00_source_train_normal_0989_.wav
            • /section_00_target_train_normal_0000_.wav
            • ...
            • /section_00_target_train_normal_0009_.wav
          • /test
            • /section_00_source_test_normal_0000_.wav
            • ...
            • /section_00_source_test_normal_0049_.wav
            • /section_00_source_test_anomaly_0000_.wav
            • ...
            • /section_00_source_test_anomaly_0049_.wav
            • /section_00_target_test_normal_0000_.wav
            • ...
            • /section_00_target_test_normal_0049_.wav
            • /section_00_target_test_anomaly_0000_.wav
            • ...
            • /section_00_target_test_anomaly_0049_.wav
          • attributes_00.csv (attribute csv for section 00)
      • /gearbox (The other machine types have the same directory structure as fan.)
      • /bearing
      • /slider (slider means "slide rail")
      • /ToyCar
      • /ToyTrain
      • /valve

    Baseline system

    The baseline system is available on the GitHub repository dcase2023_task2_baseline_ae. The baseline systems provide a simple entry-level approach that gives reasonable performance on the Task 2 dataset. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
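The baseline scores a test clip by how poorly a model trained only on normal sounds can reconstruct it. As a rough stand-in for that deep autoencoder, the same reconstruction-error idea can be illustrated with a linear (PCA) model over feature vectors; this is a numpy sketch of the general technique, not the DCASE baseline itself:

```python
import numpy as np

def fit_normal_model(features, k=8):
    """Fit a linear model of normal sound features: mean + top-k principal axes."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k]

def anomaly_score(x, mean, components):
    """Reconstruction error of x under the normal model; higher = more anomalous."""
    centered = x - mean
    reconstruction = centered @ components.T @ components
    return float(np.sum((centered - reconstruction) ** 2))
```

Clips unlike the normal training data fall outside the learned subspace and therefore reconstruct badly, yielding a high anomaly score; a threshold on that score gives the normal/anomaly decision.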

    Condition of use

    This dataset was created jointly by Hitachi, Ltd. and NTT Corporation and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

    Citation

    If you use this dataset, please cite all the following papers. We will publish a paper on the description of the DCASE 2023 Task 2, so please make sure to cite that paper, too.

    Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, and Masahiro Yasuda. First-shot anomaly detection for machine condition monitoring: A domain generalization baseline. In arXiv e-prints: 2303.00455, 2023. [URL]

    Kota Dohi, Tomoya Nishida, Harsh Purohit, Ryo Tanabe, Takashi Endo, Masaaki Yamamoto, Yuki Nikaido, and Yohei Kawaguchi. MIMII DG: sound dataset for malfunctioning industrial machine investigation and inspection for domain generalization task. In Proceedings of the 7th Detection and Classification of Acoustic Scenes and Events 2022 Workshop (DCASE2022), 31-35. Nancy, France, November 2022. [URL]

    Noboru Harada, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Masahiro Yasuda, and Shoichiro Saito. ToyADMOS2: another dataset of miniature-machine operating sounds for anomalous sound detection under domain shift conditions. In Proceedings of the 6th Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), 1–5. Barcelona, Spain, November 2021. [URL]

    Contact

    If there is any problem, please contact us:

    Kota Dohi, kota.dohi.gr@hitachi.com

    Keisuke Imoto, keisuke.imoto@ieee.org

    Noboru Harada, noboru@ieee.org

    Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp

    Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com

  12. Crunchbase Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Apr 10, 2022
    Bright Data (2022). Crunchbase Datasets [Dataset]. https://brightdata.com/products/datasets/crunchbase
    Explore at:
    .json, .csv, .xlsx (available download formats)
    Dataset updated
    Apr 10, 2022
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Bright Data’s datasets are created by utilizing proprietary technology for retrieving public web data at scale, resulting in fresh, complete, and accurate datasets. Crunchbase datasets provide unique insights into the latest industry trends. They enable tracking of company growth, identification of key businesses and professionals, and tracking of employee movement between companies, as well as more efficient competitive intelligence. Easily define your Crunchbase dataset using our smart filter capabilities, which let you customize pre-existing datasets so the data you receive fits your business needs. Bright Data’s Crunchbase company data includes over 2.8 million company profiles, with subsets available by industry, region, and any other parameters according to your requirements. There are over 70 data points per company, including overview, details, news, financials, investors, products, people, and more. Choose between full coverage or a subset. Get your Crunchbase dataset today!

  13. JRA-55C: Monthly Means and Variances

    • data.ucar.edu
    • rda-web-prod.ucar.edu
    • +4more
    grib
    Updated Aug 4, 2024
    Japan Meteorological Agency, Japan (2024). JRA-55C: Monthly Means and Variances [Dataset]. http://doi.org/10.5065/D6C827B7
    Explore at:
    grib (available download formats)
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory
    Authors
    Japan Meteorological Agency, Japan
    Time period covered
    Nov 1, 1972 - Jan 1, 2013
    Area covered
    Earth
    Description

    As a subset of the Japanese 55-year Reanalysis (JRA-55) project, the Meteorological Research Institute of the Japan Meteorological Agency has conducted a global atmospheric reanalysis that assimilates only conventional surface and upper air observations, with no use of satellite observations, using the same data assimilation system as the JRA-55. The project, named the JRA-55 Conventional (JRA-55C), aims to produce a more homogeneous dataset over a long period, unaffected by changes in historical satellite observing systems. The dataset is intended to be suitable for studies of climate change or multidecadal variability. The reanalysis period of JRA-55C is from November 1972 to December 2012. The JMA recommends the use of JRA-55 to extend JRA-55C back to January 1958. The Data Support Section at NCAR has downloaded all JRA-55C data. The entire archive has been reorganized into single parameter time series, and model resolution data has been transformed to a regular Gaussian grid. The JRA-55C products are currently being made accessible to RDA registered users of JRA-55, and will appear incrementally via the Data Access tab.

  14. Cloud Access Control Parameter Management

    • kaggle.com
    Updated Sep 22, 2024
    Mainak Chaudhuri (2024). Cloud Access Control Parameter Management [Dataset]. https://www.kaggle.com/datasets/brijlaldhankour/cloud-access-control-parameter-management/suggestions?status=pending&yourSuggestions=true
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 22, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Mainak Chaudhuri
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Access control evaluation in a networking cloud architecture is influenced by a variety of factors that determine how securely and effectively resources are accessed and managed. Here are 50 factors that affect access control evaluation:

    1. Authentication Mechanisms: Type and strength of user authentication (e.g., MFA, SSO, biometric).
    2. Authorization Models: RBAC (Role-Based Access Control), ABAC (Attribute-Based Access Control), or other models.
    3. User Identity Management: How user identities are managed and verified across systems.
    4. Access Levels: Differentiation between read, write, modify, and admin privileges.
    5. User Roles: Specific permissions associated with different user roles in the system.
    6. Security Policies: Defined security policies governing who can access what data.
    7. Compliance Requirements: Regulatory compliance (GDPR, HIPAA) affecting access control configurations.
    8. User Session Management: How long user sessions last and session expiration policies.
    9. Privileged Access Management: Managing elevated access privileges for critical system components.
    10. Third-Party Integrations: Access control policies for third-party tools and applications integrated into the system.
    11. Cloud Service Provider (CSP) Policies: CSP-specific access control mechanisms (AWS IAM, GCP IAM, etc.).
    12. Geolocation Restrictions: Access restrictions based on geographical location of the user.
    13. Time-Based Access: Access control based on time of day or specific time windows.
    14. User Behavior Analytics: Using behavioral patterns to identify and restrict anomalous access attempts.
    15. Network Security Controls: Firewalls, VPNs, and segmentation impacting access control policies.
    16. Access Control Lists (ACLs): Network ACLs managing inbound/outbound traffic.
    17. Encryption Policies: Ensuring data is encrypted both at rest and in transit to prevent unauthorized access.
    18. Data Sensitivity Classification: Classification of data to impose stricter access controls based on sensitivity.
    19. Logging and Monitoring: Real-time access logging to detect and respond to unauthorized access attempts.
    20. Security Groups: Virtual firewall rules for controlling traffic to and from instances in the cloud.
    21. Identity Federation: Integration of external identity providers (Azure AD, Okta, etc.) for access control.
    22. Least Privilege Principle: Ensuring users only have the minimum access needed for their roles.
    23. Access Control Propagation: How access permissions propagate through cloud resources and services.
    24. API Access Control: Policies controlling access to cloud APIs and services.
    25. Cloud Workload Identity: How cloud workloads authenticate and authorize access to resources.
    26. Audit Trails: Comprehensive auditing for access control to ensure accountability and compliance.
    27. Access Revocation: Policies on promptly revoking access when roles or permissions change.
    28. Cross-Region Access: Managing access control across cloud regions and data centers.
    29. Data Loss Prevention (DLP): DLP policies affecting access to sensitive data.
    30. Multi-Tenancy Security: Ensuring proper segregation of access control in multi-tenant environments.
    31. Cloud Orchestration Layer Security: Managing access to orchestration platforms like Kubernetes.
    32. Token-based Access Control: Use of tokens (OAuth, JWT) for securing API calls and session management.
    33. Access Control Policies for Serverless: Security and access control for serverless functions.
    34. Granular Access Control: Fine-grained permissions for specific cloud resources.
    35. Cloud Native Directory Services: Use of services like AWS Directory Service for managing user access.
    36. Access to Logs and Monitoring Tools: Controlling who can view or manage logs, dashboards, and monitoring tools.
    37. Custom Access Control Policies: Tailored access control mechanisms beyond built-in cloud tools.
    38. Zero Trust Architecture: Implementing zero trust principles in access control.
    39. Infrastructure as Code (IaC): Managing and enforcing access control through infrastructure as code scripts.
    40. Virtual Private Cloud (VPC) Controls: VPC-specific access control rules and boundaries.
    41. Segmentation of Duties: Separation of access privileges across different roles to reduce risk.
    42. Instance Metadata Service (IMDS) Access: Controlling access to instance metadata in the cloud.
    43. Shared Responsibility Model: Understanding the shared security responsibilities between the cloud provider and customer.
    44. Cloud Storage Access Policies: Controlling access to cloud storage (e.g., S3 buckets, Azure Blob).
    45. Data Governance Framework: Governance policies that define how data access is controlled and audited. ...

  15. Emergency Medical Service Stations

    • wifire-data.sdsc.edu
    • gis-calema.opendata.arcgis.com
    csv, esri rest +4
    Updated May 22, 2019
    CA Governor's Office of Emergency Services (2019). Emergency Medical Service Stations [Dataset]. https://wifire-data.sdsc.edu/dataset/emergency-medical-service-stations
    Explore at:
    esri rest, kml, zip, geojson, csv, html (available download formats)
    Dataset updated
    May 22, 2019
    Dataset provided by
    California Governor's Office of Emergency Services
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description
    The dataset represents Emergency Medical Services (EMS) locations in the United States and its territories. EMS Stations are part of the Fire Stations / EMS Stations HSIP Freedom sub-layer, which in turn is part of the Emergency Services and Continuity of Government Sector, which is itself a part of the Critical Infrastructure Category. The EMS stations dataset consists of any location where emergency medical service (EMS) personnel are stationed or based out of, or where equipment that such personnel use in carrying out their jobs is stored for ready use. Ambulance services are included even if they only provide transportation services, but not if they are located at, and operated by, a hospital. If an independent ambulance service or EMS provider happens to be collocated with a hospital, it will be included in this dataset. The dataset includes both private and governmental entities. A concerted effort was made to include all emergency medical service locations in the United States and its territories. This dataset is composed entirely of license-free data. Records with "-DOD" appended to the end of the [NAME] value are located on a military base, as defined by the Defense Installation Spatial Data Infrastructure (DISDI) military installations and military range boundaries. At the request of NGA, text fields in this dataset have been set to all upper case to facilitate consistent database engine search results. At the request of NGA, all diacritics (e.g., the German umlaut or the Spanish tilde) have been replaced with their closest equivalent English character to facilitate use with database systems that may not support diacritics. The currentness of this dataset is indicated by the [CONTDATE] field. Based upon this field, the oldest record dates from 12/29/2004 and the newest record dates from 01/11/2010.

    This dataset represents the EMS stations of any location where emergency medical service (EMS) personnel are stationed or based out of, or where equipment that such personnel use in carrying out their jobs is stored for ready use.

    Homeland Security Use Cases: Use cases describe how the data may be used and help to define and clarify requirements.

    1. An assessment of whether or not the total emergency medical services capability in a given area is adequate.
    2. A list of resources to draw upon by surrounding areas when local resources have temporarily been overwhelmed by a disaster; route analysis can determine those entities that are able to respond the quickest.
    3. A resource for Emergency Management planning purposes.
    4. A resource for catastrophe response to aid in the retrieval of equipment by outside responders in order to deal with the disaster.
    5. A resource for situational awareness planning and response for Federal Government events.


  16. Z

    SAPFLUXNET: A global database of sap flow measurements

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Sep 26, 2020
    Cite
    Víctor Flo (2020). SAPFLUXNET: A global database of sap flow measurements [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2530797
    Explore at:
    Dataset updated
    Sep 26, 2020
    Dataset provided by
    Víctor Flo
    Maurizio Mencuccini
    Kathy Steppe
    Víctor Granda
    Roberto Molowny-Horas
    Jordi Martínez-Vilalta
    Rafael Poyatos
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    General description

    SAPFLUXNET contains a global database of sap flow and environmental data, together with metadata at different levels. SAPFLUXNET is a harmonised database, compiled from contributions from researchers worldwide.

    The SAPFLUXNET version 0.1.5 database harbours 202 globally distributed datasets, from 121 geographical locations. SAPFLUXNET contains sap flow data for 2714 individual plants (1584 angiosperms and 1130 gymnosperms), belonging to 174 species (141 angiosperms and 33 gymnosperms), 95 different genera and 45 different families. More information on the database coverage can be found here: http://sapfluxnet.creaf.cat/shiny/sfn_progress_dashboard/.

    The SAPFLUXNET project has been developed by researchers at CREAF and other institutions (http://sapfluxnet.creaf.cat/#team), coordinated by Rafael Poyatos (CREAF, http://www.creaf.cat/staff/rafael-poyatos-lopez), and funded by two Spanish Young Researcher's Grants (SAPFLUXNET, CGL2014-55883-JIN; DATAFORUSE, RTI2018-095297-J-I00) and an Alexander von Humboldt Research Fellowship for Experienced Researchers.

    Changelog

    Compared to version 0.1.4, this version includes some changes in the metadata, but all time series data (sap flow, environmental) remain the same.

    For all datasets, climate metadata (temperature and precipitation, ‘si_mat’ and ‘si_map’) have been extracted from CHELSA (https://chelsa-climate.org/), replacing the previous climate data obtained with WorldClim. This change has modified the biome classification of the datasets in ‘si_biome’.

    In ‘species’ metadata, the percentage of basal area with sap flow measurements for each species (‘sp_basal_area_perc’) is now assigned a value of 0 if species are in the understorey. This affects two datasets: AUS_MAR_UBD and AUS_MAR_UBW, where, previously, the sum of species basal area percentages could add up to more than 100%.

    In ‘species’ metadata, the percentage of basal area with sap flow measurements for each species (‘sp_basal_area_perc’) has been corrected for datasets USA_SIL_OAK_POS, USA_SIL_OAK_1PR, USA_SIL_OAK_2PR.

    In ‘site’ metadata, the vegetation type (‘si_igbp’) has been changed to SAV for datasets CHN_ARG_GWD and CHN_ARG_GWS.

    Variables and units

    SAPFLUXNET contains whole-plant sap flow and environmental variables at sub-daily temporal resolution. Both sap flow and environmental time series have accompanying flags in a data frame, one for sap flow and another for environmental variables. These flags store quality issues detected during the quality control process and can be used to add further quality flags.

    Metadata contain relevant variables informing about site conditions, stand characteristics, tree and species attributes, sap flow methodology and details on environmental measurements. The description and units of all data and metadata variables can be found here: Metadata and data units.

    To learn more about variables, units and data flags please use the functionalities implemented in the sapfluxnetr package (https://github.com/sapfluxnet/sapfluxnetr). In particular, have a look at the package vignettes using R:

    remotes::install_github(
      'sapfluxnet/sapfluxnetr',
      build_opts = c("--no-resave-data", "--no-manual", "--build-vignettes")
    )
    library(sapfluxnetr)

    # to list all vignettes
    vignette(package = 'sapfluxnetr')

    # variables and units
    vignette('metadata-and-data-units', package = 'sapfluxnetr')

    # data flags
    vignette('data-flags', package = 'sapfluxnetr')

    Data formats

    SAPFLUXNET data can be found in two formats: 1) RData files belonging to the custom-built 'sfn_data' class and 2) Text files in .csv format. We recommend using the sfn_data objects together with the sapfluxnetr package, although we also provide the text files for convenience. For each dataset, text files are structured in the same way as the slots of sfn_data objects; if working with text files, we recommend that you check the data structure of 'sfn_data' objects in the corresponding vignette.

    Working with sfn_data files

    To work with SAPFLUXNET data, first they have to be downloaded from Zenodo, maintaining the folder structure. A first level in the folder hierarchy corresponds to file format, either RData files or csv's. A second level corresponds to how sap flow is expressed: per plant, per sapwood area or per leaf area. Please note that interconversions among the magnitudes have been performed whenever possible. Below this level, data have been organised per dataset. In the case of RData files, each dataset is contained in a sfn_data object, which stores all data and metadata in different slots (see the vignette 'sfn-data-classes'). In the case of csv files, each dataset has 9 individual files, corresponding to metadata (5), sap flow and environmental data (2) and their corresponding data flags (2).

    After downloading the entire database, the sapfluxnetr package can be used to:

    - Work with data from a single site: data access, plotting and time aggregation.
    - Select the subset of datasets to work with.
    - Work with data from multiple sites: data access, plotting and time aggregation.

    Please check the following package vignettes to learn more about how to work with sfn_data files:

    Quick guide

    Metadata and data units

    sfn_data classes

    Custom aggregation

    Memory and parallelization

    Working with text files

    We recommend working with sfn_data objects in R using the sapfluxnetr package; we do not currently provide code to work with text files.

    Data issues and reporting

    Please report any issue you may find in the database by sending us an email: sapfluxnet@creaf.uab.cat.

    Temporary data fixes, detected but not yet included in released versions, will be published on the SAPFLUXNET main web page ('Known data errors').

    Data access, use and citation

    This version of the SAPFLUXNET database is open access and corresponds to the data paper submitted to Earth System Science Data in August 2020.

    When using SAPFLUXNET data in an academic work, please cite the data paper, when available, or alternatively, the Zenodo dataset (see the ‘Cite as’ section on the right panels of this web page).

  17. d

    Asset database for the Hunter subregion on 24 February 2016

    • data.gov.au
    • cloud.csiss.gmu.edu
    • +2more
    Updated Aug 9, 2023
    + more versions
    Cite
    Bioregional Assessment Program (2023). Asset database for the Hunter subregion on 24 February 2016 [Dataset]. https://data.gov.au/data/dataset/activity/a39290ac-3925-4abc-9ecb-b91e911f008f
    Explore at:
    Dataset updated
    Aug 9, 2023
    Dataset authored and provided by
    Bioregional Assessment Program
    Description

    Abstract

    The dataset was derived by the Bioregional Assessment Programme. This dataset was derived from multiple datasets. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.

    Asset database for the Hunter subregion on 24 February 2016 (V2.5) supersedes the previous version of the HUN Asset database, V2.4 (Asset database for the Hunter subregion on 20 November 2015, GUID: 0bbcd7f6-2d09-418c-9549-8cbd9520ce18). It contains the Asset database (HUN_asset_database_20160224.mdb), a Geodatabase version for GIS mapping purposes (HUN_asset_database_20160224_GISOnly.gdb), the draft Water Dependent Asset Register spreadsheet (BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20160224.xlsx), a data dictionary (HUN_asset_database_doc_20160224.doc), and a folder (NRM_DOC) containing documentation associated with the Water Asset Information Tool (WAIT) process, as outlined below. This version should be used for the second Materiality Test (M2).

    The Asset database is registered in the BA repository as an ESRI personal geodatabase (.mdb, doubling as an MS Access database) that can store, query, and manage non-spatial data, while the spatial data reside in a separate file geodatabase joined by AID/ElementID.

    Under the BA program, a spatial assets database is developed for each defined bioregional assessment project. The spatial elements that underpin the identification of water dependent assets are identified in the first instance by regional NRM organisations (via the WAIT tool) and supplemented with additional elements from national and state/territory government datasets. A report on the WAIT process for the Hunter is included in the zip file as part of this dataset.

    Elements are initially included in the preliminary assets database if they are partly or wholly within the subregion's preliminary assessment extent (Materiality Test 1, M1). Elements are then grouped into assets which are evaluated by project teams to determine whether they meet the second Materiality Test (M2). Assets meeting both Materiality Tests comprise the water dependent asset list. Descriptions of the assets identified in the Hunter subregion are found in the "AssetList" table of the database.
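    Materiality Test 1 is, in essence, a spatial intersection test: an element passes if it is partly or wholly within the preliminary assessment extent. A toy Python sketch using axis-aligned bounding boxes (the real assessment uses full GIS geometries, and the coordinates and element IDs below are hypothetical):

    ```python
    def intersects(a, b):
        """Axis-aligned bounding-box intersection test.
        Boxes are (xmin, ymin, xmax, ymax); touching edges count."""
        return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

    # Hypothetical preliminary assessment extent and element footprints.
    pae = (150.0, -33.5, 152.0, -31.5)
    elements = {
        "E1": (150.5, -32.5, 150.8, -32.2),  # wholly inside -> passes M1
        "E2": (151.9, -31.6, 152.3, -31.2),  # partly inside -> passes M1
        "E3": (153.0, -30.0, 153.5, -29.5),  # outside -> fails M1
    }

    # Elements passing the M1 screen enter the preliminary assets database.
    m1_pass = [eid for eid, box in elements.items() if intersects(box, pae)]
    print(m1_pass)  # ['E1', 'E2']
    ```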

    Assets are the spatial features used by project teams to model scenarios under the BA program. Detailed attribution does not exist at the asset level. Asset attribution includes only the core set of BA-derived attributes reflecting the BA classification hierarchy, as described in Appendix A of "HUN_asset_database_doc_20160224.doc", located in this file.

    The "Element_to_Asset" table contains the relationships and identifies the elements that were grouped to create each asset.
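    The element-to-asset relationship is a simple many-to-one mapping. As an illustration, grouping element IDs under their parent asset in Python (the rows below are invented stand-ins for the link table's records, not actual HUN AIDs):

    ```python
    from collections import defaultdict

    # Hypothetical rows mirroring the "Element_to_Asset" link table:
    # each element (ElementID) is grouped into one asset (AID).
    element_to_asset = [
        {"ElementID": 101, "AID": 1},
        {"ElementID": 102, "AID": 1},
        {"ElementID": 103, "AID": 2},
    ]

    # Collect the elements belonging to each asset, as a join on AID
    # between the link table and the AssetList table would.
    elements_by_asset = defaultdict(list)
    for row in element_to_asset:
        elements_by_asset[row["AID"]].append(row["ElementID"])

    print(dict(elements_by_asset))  # {1: [101, 102], 2: [103]}
    ```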

    Detailed information describing the database structure and content can be found in the document "HUN_asset_database_doc_20160224.doc" located in this file.

    Some of the source data used in the compilation of this dataset is restricted.

    The public version of this asset database can be accessed via the following dataset: Asset database for the Hunter subregion on 24 February 2016 Public 20170112 v02 (https://data.gov.au/data/dataset/9d16592c-543b-42d9-a1f4-0f6d70b9ffe7)

    Dataset History

    OBJECTID VersionID Date_ Notes

    1 1 29/08/2014 Initial database.

    3 1.1 16/09/2014 Updated the classification for seven identical assets from the Gloucester subregion.

    4 1.2 28/01/2015 Added in NSW GDEs from the Hunter - Central Rivers GDE mapping from NSW DPI (50 635 polygons).

    5 1.3 12/02/2015 New AIDs assigned to NSW GDE assets (existing AID + 20000) to avoid duplication of AIDs assigned in other databases.

    6 1.4 16/06/2015
      (1) Added 20 additional datasets required by the HUN assessment project team after the HUN community workshop.
      (2) Turned off previous GW point assets (AIDs 7717-7810 inclusive).
      (3) Turned off the new GW point asset (AID: 0).
      (4) Assets with AIDs 8023-8026 are duplicates of 4 assets (AIDs 4747, 4745, 4744 and 4743 respectively) in the NAM subregion; their AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry use the values from those NAM assets.
      (5) The asset with AID 8595 is a duplicate of the asset with AID 57 in the GLO subregion; its AID, Asset Name, Group, SubGroup, Depth, Source, ListDate and Geometry use the values from that GLO asset.
      (6) 39 assets (AIDs 2969 to 5040) are from the NAM asset database; their attributes were updated to the latest attributes from the NAM asset database.
      (7) The databases, especially the spatial database, were changed: duplicated attribute fields in the spatial data were removed and only the ID field was kept. Users need to join the AssetList or ElementList table to the spatial data.

    7 2 20/07/2015
      (1) Updated 131 new GW point assets with their previous AIDs; some may include a different number of elements due to the change of 77 FTypes requested by the Hunter assessment project team.
      (2) Added 104 EPBC assets, which were assessed and excluded by ERIN.
      (3) Merged 30 Darling Hardyhead assets into one (AID 60140) and deleted the other 29.
      (4) Turned off 5 assets from the community workshop (AIDs 60358-60362), as they duplicate 5 of the 104 excluded EPBC assets.
      (5) Updated M2 test results.
      (6) Changed the asset names for AIDs 4743 and 4747 as requested by the Hunter assessment project team (4 lower-case characters to upper case only). These two assets are from the Namoi asset database and their names may no longer match the original names in the Namoi asset database.
      (7) Added one NSW WSP asset (AID 60814) as requested by the Hunter assessment project team. The processing method for this asset (which does not consider the 1:M relation) is not robust and differs from that of other NSW WSP assets; it should NOT be used for other subregions.
      (8) The queries Find_All_Used_Assets and Find_All_WD_Assets in the asset database can be used to extract all used assets and all water dependent assets.

    8 2.1 27/08/2015
      (1) The following six assets in the HUN subregion are identical to 6 assets in the GIP subregion; their AID, Asset Name, Group, SubGroup, Depth, Source and ListDate use the values from the GIP assets. AIDs from AID_from_HUN no longer appear in the HUN asset database and spreadsheet; only AIDs from AID_from_GIP are used. (Specifically, (a) AID 11636 is a GIP value obtained from MBC, and (b) only AID, Asset Name and ListDate differ and were changed.)
      (2) For BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx: (a) extracted long (>255 characters) WD rationales for 19 assets (AIDs: 8682, 9065, 9073, 9087, 9088, 9100, 9102, 9103, 60000, 60001, 60792, 60793, 60801, 60713, 60739, 60751, 60764, 60774, 60812) in the "Water-dependent asset register" tab and for 37 assets (AIDs: 5040, 8651, 8677, 8682, 8650, 8686, 8687, 8718, 8762, 9094, 9065, 9067, 9073, 9077, 9081, 9086, 9087, 9088, 9100, 9102, 9103, 60000, 60001, 60739, 60742, 60751, 60713, 60764, 60771, 60774, 60792, 60793, 60798, 60801, 60809, 60811, 60812) in the "Asset list" tab of the 1.30 Excel file; (b) recreated the draft BA-NSB-HUN-130-WaterDependentAssetRegister-AssetList-V20150827.xlsx.
      (3) Modified the queries Find_All_Asset_List and Find_Waterdependent_asset_register for (2)(a).

    9 2.2 8/09/2015
      (1) Updated M2 results from the internal review for 386 Sociocultural assets.
      (2) Updated the class to Ecological/Vegetation/Habitat (potential species distribution) for assets/elements from the sources WAIT_ALA_ERIN, NSW_TSEC and NSW_DPI_Fisheries_DarlingHardyhead.

    10 2.3 22/09/2015
      (1) Updated M2 results from the internal review:
        * Changed "Assessment team do not say No" to "All economic assets are by definition water dependent".
        * Changed "Assessment team say No" to "These are water dependent, but excluded by the project team because their intersection with the PAE is negligible".
        * Changed "Rivertyles" to "RiverStyles".

    11 2.4 20/11/2015
      (1) Updated M2 test results for 86 assets from the external review.
      (2) Updated the asset names for two assets (AIDs 8642 and 8643) as required by the external review.
      (3) Created the draft Water Dependent Asset Register file using template V5.

    12 2.5 24/02/2016 The total number of registered water assets increased by 1 (= +2 - 1): two assets changed their M2 test result from "No" to "Yes", while one asset changed from "Yes" to "No", following the review by the Ecologist group.

    Dataset Citation

    Bioregional Assessment Programme (2015) Asset database for the Hunter subregion on 24 February 2016. Bioregional Assessment Derived Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/a39290ac-3925-4abc-9ecb-b91e911f008f.

    Dataset Ancestors

    *

  18. c

    Alpine gridded monthly precipitation data since 1871 derived from in-situ...

    • cds.climate.copernicus.eu
    netcdf
    Updated Nov 12, 2024
    Cite
    ECMWF (2024). Alpine gridded monthly precipitation data since 1871 derived from in-situ observations [Dataset]. http://doi.org/10.24381/cds.6a6d1bc3
    Explore at:
    netcdf (available download format)
    Dataset updated
    Nov 12, 2024
    Dataset authored and provided by
    ECMWF
    License

    https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdfhttps://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf

    Time period covered
    Jan 1, 1871
    Description

    This dataset, also known as the Long-term Alpine Precipitation Reconstruction (LAPrec), provides gridded fields of monthly precipitation for the Alpine region (eight countries). The dataset is derived from station observations and is provided in two issues:

    LAPrec1871 starts in 1871 and is based on data from 85 input series; LAPrec1901 starts in 1901 and is based on data from 165 input series.

    This gives users flexibility to choose between a longer temporal extent (LAPrec1871) and higher spatial accuracy (LAPrec1901). LAPrec was constructed to satisfy high climatological standards, such as temporal consistency and the realistic reproduction of spatial patterns in complex terrain. As the dataset covers more than one hundred years, it is a qualified basis for historical climate analysis in a mountain region that is highly affected by climate change. The production of LAPrec combines two data sources:

    HISTALP (Historical Instrumental Climatological Surface Time Series of the Greater Alpine Region) offers homogenised station series of monthly precipitation reaching back into the 19th century.

    APGD (Alpine Precipitation Grid Dataset) provides daily precipitation gridded data for the period 1971–2008 built from more than 8500 rain gauges.

    The adopted reconstruction method, Reduced Space Optimal Interpolation (RSOI), establishes a linear model between station and grid data, calibrated over the period when both are available. RSOI involves a Principal Component Analysis (PCA) of the high-resolution grid data, followed by an Optimal Interpolation (OI) using the long-term station data. The LAPrec dataset is updated on a two-year basis, no later than the end of February of each second year; the latest version of the dataset extends until the end of the year before its release date. LAPrec has been developed in the framework of the Copernicus Climate Change Service in a collaboration between the national meteorological services of Switzerland (MeteoSwiss, Federal Office of Meteorology and Climatology) and Austria (ZAMG, Zentralanstalt für Meteorologie und Geodynamik). For more information on input data, methodical construction, applicability, versioning and data access, see the product user guide in the Documentation tab.
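    The two RSOI steps named above (PCA of the calibration-period grids, then a linear mapping from station values to the leading principal components) can be sketched in a few lines of numpy. Synthetic arrays stand in for the APGD grids and HISTALP stations, and ordinary least squares stands in for the optimal-interpolation weighting, so this illustrates the reduced-space idea only, not the LAPrec production algorithm.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins: months where both grid (APGD-like) and
    # station (HISTALP-like) data exist form the calibration period.
    n_cal, n_grid, n_sta, k = 200, 400, 12, 3
    grid_cal = rng.normal(size=(n_cal, n_grid))          # calibration grids
    sta_cal = grid_cal[:, :n_sta] + 0.1 * rng.normal(size=(n_cal, n_sta))
    sta_hist = rng.normal(size=(5, n_sta))               # station-only months

    # 1) PCA of the high-resolution grid over the calibration period.
    mean = grid_cal.mean(axis=0)
    U, S, Vt = np.linalg.svd(grid_cal - mean, full_matrices=False)
    pcs = U[:, :k] * S[:k]   # PC scores, shape (n_cal, k)
    eofs = Vt[:k]            # spatial patterns (EOFs), shape (k, n_grid)

    # 2) Linear model from station values to PC scores, calibrated
    # over the overlap period (least squares as a stand-in for OI).
    coef, *_ = np.linalg.lstsq(sta_cal, pcs, rcond=None)

    # 3) Reconstruct historical grids from station-only months.
    grid_hist = sta_hist @ coef @ eofs + mean
    print(grid_hist.shape)  # (5, 400)
    ```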

  19. Meta data and supporting documentation

    • catalog.data.gov
    Updated Nov 12, 2020
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Meta data and supporting documentation [Dataset]. https://catalog.data.gov/dataset/meta-data-and-supporting-documentation
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    We include a description of the data sets in the metadata, as well as sample code and results from a simulated data set. This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: the R code is available online here: https://github.com/warrenjl/SpGPCW.

    Format:

    Abstract: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we will make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This will also allow the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    Permissions: These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures on each week by subtracting off the median exposure amount on a given week and dividing by the interquartile range (IQR) (as in the actual application to the true NC birth records data). The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way, while the medians and IQRs are not given. This further protects identifiability of the spatial locations used in the analysis.

    File format: R workspace file.

    Metadata (including data dictionary):

    • y: Vector of binary responses (1: preterm birth, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
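    The per-week standardization described above (subtract the week's median, divide by its IQR) looks like the following numpy sketch on synthetic exposures; the published dataset contains only the already-standardized values, so this is illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    # Synthetic exposures: rows = individuals, columns = pregnancy weeks.
    z_raw = rng.gamma(shape=2.0, scale=5.0, size=(100, 37))

    # Standardize each week (column) by its median and interquartile
    # range: (x - median) / IQR, computed per exposure week.
    med = np.median(z_raw, axis=0)
    iqr = np.percentile(z_raw, 75, axis=0) - np.percentile(z_raw, 25, axis=0)
    z = (z_raw - med) / iqr

    # Each column now has median 0 and IQR 1.
    print(np.allclose(np.median(z, axis=0), 0.0))  # True
    ```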

  20. Deck41 Surficial Seafloor Sediment Description Database

    • catalog.data.gov
    • data.cnra.ca.gov
    • +3more
    Updated Oct 18, 2024
    + more versions
    Cite
    NOAA National Centers for Environmental Information (Point of Contact) (2024). Deck41 Surficial Seafloor Sediment Description Database [Dataset]. https://catalog.data.gov/dataset/deck41-surficial-seafloor-sediment-description-database1
    Explore at:
    Dataset updated
    Oct 18, 2024
    Dataset provided by
    National Oceanic and Atmospheric Administrationhttp://www.noaa.gov/
    National Centers for Environmental Informationhttps://www.ncei.noaa.gov/
    Description

    Deck41 is a digital summary of surficial sediment composition for 36,401 seafloor samples worldwide. Data include collecting source, ship, cruise, sample id, latitude/longitude, date of collection, water depth, sampling device, dominant lithology, secondary lithology, and a brief description of the surficial sediment at the location. Descriptions were abstracted largely from unpublished laboratory reports and core log sheets contributed to the National Oceanographic Data Center prior to 1975. Descriptions were assigned by Ms. Susie Bershad and Dr. Martin Weiss of the Marine Geology and Geophysics Branch of NODC, which was transferred to the National Geophysical Data Center (NGDC) in 1976, at which time compilation ceased. Data are free for online search and download.

