100+ datasets found
  1. Data from: Current and projected research data storage needs of Agricultural...

    • catalog.data.gov
    • agdatacommons.nal.usda.gov
    • +2more
    Updated Apr 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Servicehttps://www.ars.usda.gov/
    Description

    The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel

  2. n

    FOI-01017 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Mar 30, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2023). FOI-01017 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-01017
    Explore at:
    Dataset updated
    Mar 30, 2023
    Description

    CSVs with more than 1 million rows can be viewed using add-ons to existing software, such as the Microsoft PowerPivot add-on for Excel, to handle larger data sets. The Microsoft PowerPivot add-on for Excel is available using the link in the 'Related Links' section below. Once PowerPivot has been installed, to load the large files, please follow the instructions below. Note that it may take at least 20 to 30 minutes to load one monthly file. Start Excel as normal

  3. m

    Global Burden of Disease analysis dataset of noncommunicable disease...

    • data.mendeley.com
    Updated Apr 6, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    David Cundiff (2023). Global Burden of Disease analysis dataset of noncommunicable disease outcomes, risk factors, and SAS codes [Dataset]. http://doi.org/10.17632/g6b39zxck4.10
    Explore at:
    Dataset updated
    Apr 6, 2023
    Authors
    David Cundiff
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This formatted dataset (AnalysisDatabaseGBD) originates from raw data files from the Institute of Health Metrics and Evaluation (IHME) Global Burden of Disease Study (GBD2017) affiliated with the University of Washington. We are volunteer collaborators with IHME and not employed by IHME or the University of Washington.

    The population weighted GBD2017 data are on male and female cohorts ages 15-69 years including noncommunicable diseases (NCDs), body mass index (BMI), cardiovascular disease (CVD), and other health outcomes and associated dietary, metabolic, and other risk factors. The purpose of creating this population-weighted, formatted database is to explore the univariate and multiple regression correlations of health outcomes with risk factors. Our research hypothesis is that we can successfully model NCDs, BMI, CVD, and other health outcomes with their attributable risks.

    These Global Burden of disease data relate to the preprint: The EAT-Lancet Commission Planetary Health Diet compared with Institute of Health Metrics and Evaluation Global Burden of Disease Ecological Data Analysis. The data include the following: 1. Analysis database of population weighted GBD2017 data that includes over 40 health risk factors, noncommunicable disease deaths/100k/year of male and female cohorts ages 15-69 years from 195 countries (the primary outcome variable that includes over 100 types of noncommunicable diseases) and over 20 individual noncommunicable diseases (e.g., ischemic heart disease, colon cancer, etc). 2. A text file to import the analysis database into SAS 3. The SAS code to format the analysis database to be used for analytics 4. SAS code for deriving Tables 1, 2, 3 and Supplementary Tables 5 and 6 5. SAS code for deriving the multiple regression formula in Table 4. 6. SAS code for deriving the multiple regression formula in Table 5 7. SAS code for deriving the multiple regression formula in Supplementary Table 7
    8. SAS code for deriving the multiple regression formula in Supplementary Table 8 9. The Excel files that accompanied the above SAS code to produce the tables

    For questions, please email davidkcundiff@gmail.com. Thanks.

  4. D

    Dataset Alerts - Open and Monitoring

    • datasf.org
    • data.sfgov.org
    • +1more
    application/rdfxml +5
    Updated Jun 20, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Dataset Alerts - Open and Monitoring [Dataset]. https://datasf.org/opendata/
    Explore at:
    json, application/rssxml, csv, tsv, xml, application/rdfxmlAvailable download formats
    Dataset updated
    Jun 20, 2025
    License

    ODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
    License information was derived automatically

    Description

    A log of dataset alerts open, monitored or resolved on the open data portal. Alerts can include issues as well as deprecation or discontinuation notices.

  5. COVID-19 Case Surveillance Public Use Data

    • data.cdc.gov
    • opendatalab.com
    • +5more
    application/rdfxml +5
    Updated Jul 9, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data/vbim-akqf
    Explore at:
    application/rdfxml, tsv, csv, json, xml, application/rssxmlAvailable download formats
    Dataset updated
    Jul 9, 2024
    Dataset provided by
    Centers for Disease Control and Preventionhttp://www.cdc.gov/
    Authors
    CDC Data, Analytics and Visualization Task Force
    License

    https://www.usa.gov/government-workshttps://www.usa.gov/government-works

    Description

    Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

    Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

    This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.

    CDC has three COVID-19 case surveillance datasets:

    The following apply to all three datasets:

    Overview

    The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

    For more information: NNDSS Supports the COVID-19 Response | CDC.

    The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.

    COVID-19 Case Reports

    COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.

    All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.

    Data are Considered Provisional

    • The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
    • Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.
    • Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

    Data Limitations

    To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

    Data Quality Assurance Procedures

    CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:

    • Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question “Was the individual hospitalized?” where the possible answer choices include “Yes,” “No,” or “Unknown,” the blank value is recoded to Missing because the case report form did not include a response to the question.
    • Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
    • Additional data quality processing to recode free text data is ongoing. Data on symptoms, race and ethnicity, and healthcare worker status have been prioritized.

    Data Suppression

    To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.

    For questions, please contact Ask SRRG (eocevent394@cdc.gov).

    Additional COVID-19 Data

    COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These

  6. Ten-year data tables by province, industry and substance – releases

    • open.canada.ca
    • ouvert.canada.ca
    csv, html
    Updated Jul 15, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Environment and Climate Change Canada (2025). Ten-year data tables by province, industry and substance – releases [Dataset]. https://open.canada.ca/data/en/dataset/ea0dc8ae-d93c-4e24-9f61-946f1736a26f
    Explore at:
    html, csvAvailable download formats
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    Environment And Climate Change Canadahttps://www.canada.ca/en/environment-climate-change.html
    License

    Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Time period covered
    Jan 1, 2014 - Dec 31, 2023
    Description

    The National Pollutant Release Inventory (NPRI) is Canada's public inventory of pollutant releases (to air, water and land), disposals and transfers for recycling. Each file contains annual total releases for the past ten years by media (air, water or land), broken-down by province, industry or substance. Files are in .CSV format. The results can be further broken down using the pre-defined search available at the bottom of the NPRI Data Search webpage. The results returned by the NPRI search engine may differ from the numbers contained in the downloadable files. The online search engine’s results will display releases, disposals and transfers reported by facilities, but does not distinguish between media type (i.e. air, water, land). It also displays facilities reporting only under Ontario Regulation 127/01 and facilities submitting “did not meet criteria” reports. Please consult the following resources to enhance your analysis: - Guide on using and Interpreting NPRI Data: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/using-interpreting-data.html - Access additional data from the NPRI, including datasets and mapping products: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/exploredata.html Supplemental Information More NPRI datasets and mapping products are available here: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/access.html Supporting Projects: National Pollutant Release Inventory (NPRI)

  7. DCR 2023 Supplemental Excel Files

    • data.cnra.ca.gov
    • data.ca.gov
    • +1more
    zip
    Updated Jul 31, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California Department of Water Resources (2024). DCR 2023 Supplemental Excel Files [Dataset]. https://data.cnra.ca.gov/dataset/dcr-2023-supplemental-excel-files
    Explore at:
    zip(225202955)Available download formats
    Dataset updated
    Jul 31, 2024
    Dataset authored and provided by
    California Department of Water Resourceshttp://www.water.ca.gov/
    Description

    DCR 2023 Supplemental Excel Files

  8. o

    Messy data for data cleaning exercise - Dataset - openAFRICA

    • open.africa
    Updated Oct 6, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). Messy data for data cleaning exercise - Dataset - openAFRICA [Dataset]. https://open.africa/dataset/messy-data-for-data-cleaning-exercise
    Explore at:
    Dataset updated
    Oct 6, 2021
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A messy data for demonstrating "how to clean data using spreadsheet". This dataset was intentionally formatted to be messy, for the purpose of demonstration. It was collated from here - https://openafrica.net/dataset/historic-and-projected-rainfall-and-runoff-for-4-lake-victoria-sub-regions

  9. FOI-01727 - Datasets - Open Data Portal

    • opendata.nhsbsa.net
    Updated Feb 29, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nhsbsa.net (2024). FOI-01727 - Datasets - Open Data Portal [Dataset]. https://opendata.nhsbsa.net/dataset/foi-01727
    Explore at:
    Dataset updated
    Feb 29, 2024
    Dataset provided by
    NHS Business Services Authority
    Description

    The number of COVID vaccinations carried out and payments made for these vaccinations to individual pharmacies, listed by their ODS code and with full postal address details. Could you provide the data for the month of January 2024 in EXCEL format please. The data should be: Column1--Administration Month Column2--ODS Code Column3--Pharmacy Name Column4--Pharmacy Trading Name Column5--Pharmacy Address Column6--Pharmacy Post Code Column7--Number of Vaccinations Claimed Column8--Number of Vaccinations Paid Column9--Payment Amount GB Response A copy of the information is attached. The NHSBSA calculates payments for Covid-19 vaccinations to Pharmacies and Primary Care Network (PCN) providers in England. Covid-19 vaccination data is keyed in via Point of Care (POC) Systems and they are transferred to the NHSBSA Manage Your Service (MYS) application. Each month, vaccine providers submit claims to request payment based on the data that has been transferred into MYS. To be paid in a timely fashion, such claims must be submitted during a specified declaration submission period. Should claims be submitted outside of the submission period, they will be processed in the following period. This means that in some cases, there is a difference between the number of vaccines that have been 'claimed' and the number that have been 'paid'. Both the number of 'claimed' and 'paid' vaccinations have been reported in this request. When considering the nature of the vaccine data, there are several ways it can be reported over time: Administration Month - This is the month in which the vaccine was administered to the patient. Payment Month - This is the month in which the payment was made to the vaccine dispenser. Note that all payments for Pharmacies are paid one month later than those for PCN providers. Keying Month - This is the month in which the vaccine record first appeared on the MYS system. Submission/Claim Month - This is the month in which the claim for payment for a vaccination occurred. For example, suppose that a PCN patient is given a Covid-19 vaccination dose 1 in January (Administration Month) and then the paper record of this is misplaced for a while. The record is found and keyed into a POC system during February (Keying Month). The Provider is allowed to claim for keying during February in the first five days of March, but they're slightly late and authorise the claim on 7 March (Submission Month). As the claim is outside the submission window, it is not paid in March, it will instead be processed during April (Payment Month). Another example could be a Pharmacy patient is given a Covid-19 vaccination dose 1 in January (Administration Month), keyed in January (Keying Month), then submitted in February (Submission Month) and then payments are calculated in February, however as this is for a pharmacy, the payments are held back and not paid until March (Payment Month). For the purposes of this request, we have chosen to report by Administration Month. Data included in this request is limited to vaccinations carried out by Pharmacies only. Data included in this request is also limited to vaccinations administered in January 2024. The latest data used is a snapshot of the MYS system data that was taken on 6 February 2024. This is the snapshot of data taken after the January 2024 submission period that was used to calculate payments. Pharmacy name and address are as held at this date. This payment data does not include any adjustments made by NHSBSA Provider Assurance as part of post-payment verification exercises. These adjustments are made at account level and may relate to several months of activity. Payment data includes payments made and those scheduled for payment in the future. Payments comprise an Item of Service (IoS) fee and potentially a supplementary fee. Payments do not relate to the value of the drugs dispensed. The total used for the payment calculation may not match the totals shown in 'live' POC systems or MYS that continue to receive updates after the snapshot used to calculate payments was taken. Vaccination records are limited to those which have been associated with a declaration submission. This may include late submission declarations received after the deadline for declarations such records are not processed until the next month. Please note that some vaccinations attract a supplementary fee, so it is not possible to determine the number of vaccinations by dividing the total paid by the basic IoS fee. It is possible for new records from old administration months to be entered in the future, thus the totals here for each administration month could change when more data is processed. Please note that this request and our response is published on our Freedom of Information disclosure log at: https://opendata.nhsbsa.net/dataset/foi-01727

  10. d

    Data from: Public Airports

    • catalog.data.gov
    • gstore.unm.edu
    Updated Dec 2, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Earth Data Analysis Center (Point of Contact) (2020). Public Airports [Dataset]. https://catalog.data.gov/dataset/public-airports
    Explore at:
    Dataset updated
    Dec 2, 2020
    Dataset provided by
    Earth Data Analysis Center (Point of Contact)
    Description

    This data set is a vector point digital data structure that contains the locations of General Public Use Airports in the State of New Mexico. It only contains those airports that the New Mexico State Highway and Transportation Department inspects. The point represents the center of the land owned by the airport.

  11. MHS Dashboard Adult Demographic Datasets

    • data.chhs.ca.gov
    • data.ca.gov
    • +3more
    csv, zip
    Updated Aug 28, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Health Care Services (2024). MHS Dashboard Adult Demographic Datasets [Dataset]. https://data.chhs.ca.gov/dataset/adult-ab470-datasets
    Explore at:
    csv(512596), csv(41051287), csv(402564), csv(1254239), csv(297746), csv(30400), csv(11278), csv(499193), csv(1637878), csv(1916691), csv(22469442), csv(3515490), csv(142267), zip, csv(54553345), csv(197911), csv(1135935), csv(42734), csv(1699554), csv(45828311), csv(451612)Available download formats
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    California Department of Health Care Serviceshttp://www.dhcs.ca.gov/
    Authors
    Department of Health Care Services
    Description

    The following datasets are based on the adult (age 21 and over) beneficiary population and consist of aggregate MHS data derived from Medi-Cal claims, encounter, and eligibility systems. These datasets were developed in accordance with California Welfare and Institutions Code (WIC) § 14707.5 (added as part of Assembly Bill 470 on 10/7/17). Please contact BHData@dhcs.ca.gov for any questions or to request previous years’ versions of these datasets. Note: The Performance Dashboard AB 470 Report Application Excel tool development has been discontinued. Please see the Behavioral Health reporting data hub at https://behavioralhealth-data.dhcs.ca.gov/ for access to dashboards utilizing these datasets and other behavioral health data.

  12. Z

    Dataset: A Systematic Literature Review on the topic of High-value datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 23, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Magdalena Ciesielska (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
    Explore at:
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Anastasija Nikiforova
    Charalampos Alexopoulos
    Nina Rizun
    Magdalena Ciesielska
    Andrea Miletič
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb) It being made public both to act as supplementary data for "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (pre-print is available in Open Access here -> https://arxiv.org/abs/2305.10234) and in order for other researchers to use these data in their own work.

    The protocol is intended for the Systematic Literature review on the topic of High-value Datasets with the aim to gather information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected in the result of the SLR over Scopus, Web of Science, and Digital Government Research library (DGRL) in 2023.

    Methodology

    To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out to by searching digital libraries covered by Scopus, Web of Science (WoS), Digital Government Research library (DGRL).

    These databases were queried for keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those, where these objects were primary research objects rather than mentioned in the body, e.g., as a future work. After deduplication, 11 articles were found unique and were further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.

    To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

    Test procedure Each study was independently examined by at least two authors, where after the in-depth examination of the full-text of the article, the structured protocol has been filled for each study. The structure of the survey is available in the supplementary file available (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx) The data collected for each study by two researchers were then synthesized in one final version by the third researcher.

    Description of the data in this data set

    Protocol_HVD_SLR provides the structure of the protocol Spreadsheets #1 provides the filled protocol for relevant studies. Spreadsheet#2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

    Descriptive information
    1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet 2) Complete reference - the complete source information to refer to the study 3) Year of publication - the year in which the study was published 4) Journal article / conference paper / book chapter - the type of the paper -{journal article, conference paper, book chapter} 5) DOI / Website- a link to the website where the study can be found 6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science 7) Availability in OA - availability of an article in the Open Access 8) Keywords - keywords of the paper as indicated by the authors 9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

    Approach- and research design-related information 10) Objective / RQ - the research objective / aim, established research questions 11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analy-sis (country, organisation, specific unit that has been ana-lysed, e.g., the number of use-cases, scope of the SLR etc.) 12) Contributions - the contributions of the study 13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach? 14) Availability of the underlying research data- whether there is a reference to the publicly available underly-ing research data e.g., transcriptions of interviews, collected data, or explanation why these data are not shared? 15) Period under investigation - period (or moment) in which the study was conducted 16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

    Quality- and relevance- related information
    17) Quality concerns - whether there are any quality concerns (e.g., limited infor-mation about the research methods used)? 18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, sec-ondary - mentioned but not studied (e.g., as part of discus-sion, future work etc.))

    HVD determination-related information
    19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term? 20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output") 21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the rela-tionships between these components? (detailed description) 22) Stakeholders and their roles - what stakeholders or actors does HVD determination in-volve? What are their roles? 23) Data - what data do HVD cover? 24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

    Format of the file .xls, .csv (for the first spreadsheet only), .odt, .docx

    Licenses or restrictions CC-BY

    For more info, see README.txt

  13. 18 excel spreadsheets by species and year giving reproduction and growth...

    • catalog.data.gov
    • data.wu.ac.at
    Updated Aug 17, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). 18 excel spreadsheets by species and year giving reproduction and growth data. One excel spreadsheet of herbicide treatment chemistry. [Dataset]. https://catalog.data.gov/dataset/18-excel-spreadsheets-by-species-and-year-giving-reproduction-and-growth-data-one-excel-sp
    Explore at:
    Dataset updated
    Aug 17, 2024
    Dataset provided by
    United States Environmental Protection Agencyhttp://www.epa.gov/
    Description

    Excel spreadsheets by species (4 letter code is abbreviation for genus and species used in study, year 2010 or 2011 is year data collected, SH indicates data for Science Hub, date is date of file preparation). The data in a file are described in a read me file which is the first worksheet in each file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read me description of the column in the date set for chemical analysis. In this file one row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk , D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee , and M. Plocher. Plant reproduction is altered by simulated herbicide drift toconstructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).

  14. MHS Dashboard Children and Youth Demographic Datasets

    • data.chhs.ca.gov
    • data.ca.gov
    • +1more
    csv, zip
    Updated Aug 28, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department of Health Care Services (2024). MHS Dashboard Children and Youth Demographic Datasets [Dataset]. https://data.chhs.ca.gov/dataset/child-youth-ab470-datasets
    Explore at:
    csv(32085), csv(268395), csv(35041649), csv(270327), csv(191127), csv(44757018), csv(430905), csv(374496), csv(1358269), csv(2298761), csv(18869990), csv(116973), csv(11599), csv(1324593), csv(1396290), csv(998465), csv(31283542), csv(43150), csv(1072808), csv(461467), zipAvailable download formats
    Dataset updated
    Aug 28, 2024
    Dataset provided by
    California Department of Health Care Serviceshttp://www.dhcs.ca.gov/
    Authors
    Department of Health Care Services
    Description

    The following datasets are based on the children and youth (under age 21) beneficiary population and consist of aggregate Mental Health Service data derived from Medi-Cal claims, encounter, and eligibility systems. These datasets were developed in accordance with California Welfare and Institutions Code (WIC) § 14707.5 (added as part of Assembly Bill 470 on 10/7/17). Please contact BHData@dhcs.ca.gov for any questions or to request previous years’ versions of these datasets. Note: The Performance Dashboard AB 470 Report Application Excel tool development has been discontinued. Please see the Behavioral Health reporting data hub at https://behavioralhealth-data.dhcs.ca.gov/ for access to dashboards utilizing these datasets and other behavioral health data.

  15. f

    Achieving minimum standards for Infection Prevention and Control in Sierra...

    • figshare.com
    xlsx
    Updated Feb 8, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bobson Fofanah (2022). Achieving minimum standards for Infection Prevention and Control in Sierra Leone [Dataset]. http://doi.org/10.6084/m9.figshare.19134608.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Feb 8, 2022
    Dataset provided by
    figshare
    Authors
    Bobson Fofanah
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Sierra Leone
    Description

    This MS Excel files contains datasets and codebooks for Fofanah BD et al's Operational Research paper on AMR in Sierra Leone, Year 2022.

  16. Bulk data files for all years – releases, disposals, transfers and facility...

    • open.canada.ca
    • ouvert.canada.ca
    csv, html
    Updated Jul 15, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Environment and Climate Change Canada (2025). Bulk data files for all years – releases, disposals, transfers and facility locations [Dataset]. https://open.canada.ca/data/en/dataset/40e01423-7728-429c-ac9d-2954385ccdfb
    Explore at:
    csv, htmlAvailable download formats
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    Environment And Climate Change Canadahttps://www.canada.ca/en/environment-climate-change.html
    License

    Open Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Time period covered
    Jan 1, 1993 - Dec 31, 2023
    Description

    The National Pollutant Release Inventory (NPRI) is Canada's public inventory of pollutant releases (to air, water and land), disposals and transfers for recycling. Each file contains data from 1993 to the latest reporting year. These CSV format datasets are in normalized or ‘list’ format and are optimized for pivot table analyses. Here is a description of each file: - The RELEASES file contains all substance release quantities. - The DISPOSALS file contains all on-site and off-site disposal quantities, including tailings and waste rock (TWR). - The TRANSFERS file contains all quantities transferred for recycling or treatment prior to disposal. - The COMMENTS file contains all the comments provided by facilities about substances included in their report. - The GEO LOCATIONS file contains complete geographic information for all facilities that have reported to the NPRI. Please consult the following resources to enhance your analysis: - Guide on using and Interpreting NPRI Data: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/using-interpreting-data.html - Access additional data from the NPRI, including datasets and mapping products: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/exploredata.html Supplemental Information More NPRI datasets and mapping products are available here: https://www.canada.ca/en/environment-climate-change/services/national-pollutant-release-inventory/tools-resources-data/access.html Supporting Projects: National Pollutant Release Inventory (NPRI)

  17. N

    Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age...

    • neilsberg.com
    csv, json
    Updated Jul 24, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Excel, AL Age Group Population Dataset: A Complete Breakdown of Excel Age Demographics from 0 to 85 Years and Over, Distributed Across 18 Age Groups // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/aa8c95e0-4983-11ef-ae5d-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jul 24, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Excel
    Variables measured
    Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Excel population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Excel. The dataset can be utilized to understand the population distribution of Excel by age. For example, using this dataset, we can identify the largest age group in Excel.

    Key observations

    The largest age group in Excel, AL was for the group of age 45 to 49 years years with a population of 74 (15.64%), according to the ACS 2018-2022 5-Year Estimates. At the same time, the smallest age group in Excel, AL was the 85 years and over years with a population of 2 (0.42%). Source: U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates

    Age groups:

    • Under 5 years
    • 5 to 9 years
    • 10 to 14 years
    • 15 to 19 years
    • 20 to 24 years
    • 25 to 29 years
    • 30 to 34 years
    • 35 to 39 years
    • 40 to 44 years
    • 45 to 49 years
    • 50 to 54 years
    • 55 to 59 years
    • 60 to 64 years
    • 65 to 69 years
    • 70 to 74 years
    • 75 to 79 years
    • 80 to 84 years
    • 85 years and over

    Variables / Data Columns

    • Age Group: This column displays the age group in consideration
    • Population: The population for the specific age group in the Excel is shown in this column.
    • % of Total Population: This column displays the population of each age group as a proportion of Excel total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Excel Population by Age. You can refer the same here

  18. d

    Warehouse and Retail Sales

    • catalog.data.gov
    • data.montgomerycountymd.gov
    • +4more
    Updated Jul 5, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.montgomerycountymd.gov (2025). Warehouse and Retail Sales [Dataset]. https://catalog.data.gov/dataset/warehouse-and-retail-sales
    Explore at:
    Dataset updated
    Jul 5, 2025
    Dataset provided by
    data.montgomerycountymd.gov
    Description

    This dataset contains a list of sales and movement data by item and department appended monthly. Update Frequency : Monthly

  19. Study of Womens Health Across the Nation (SWAN) Public Use Data

    • healthdata.gov
    • data.virginia.gov
    • +3more
    application/rdfxml +5
    Updated Feb 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2021). Study of Womens Health Across the Nation (SWAN) Public Use Data [Dataset]. https://healthdata.gov/dataset/Study-of-Womens-Health-Across-the-Nation-SWAN-Publ/2u9n-jnai
    Explore at:
    tsv, application/rssxml, csv, application/rdfxml, xml, jsonAvailable download formats
    Dataset updated
    Feb 13, 2021
    Description

    The SWAN Public Use Datasets provide access to longitudinal data describing the physical, biological, psychological, and social changes that occur during the menopausal transition. Data collected from 3,302 SWAN participants from Baseline through the 10th Annual Follow-Up visit are currently available to the public. Registered users are able to download datasets in a variety of formats, search variables and view recent publications.

  20. FOI: early years dataset as at 31 March 2016

    • gov.uk
    • s3.amazonaws.com
    Updated Jul 21, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ofsted (2021). FOI: early years dataset as at 31 March 2016 [Dataset]. https://www.gov.uk/government/statistical-data-sets/foi-early-years-dataset-as-at-31-march-2016
    Explore at:
    Dataset updated
    Jul 21, 2021
    Dataset provided by
    GOV.UKhttp://gov.uk/
    Authors
    Ofsted
    Description

    There is a requirement that public authorities, like Ofsted, must publish updated versions of datasets which are disclosed as a result of Freedom of Information requests.

    Some information which is requested is exempt from disclosure to the public under the Freedom of Information Act; it is therefore not appropriate for this information to be made available. Examples of information which it is not appropriate to make available includes the locations of women’s refuges, some military bases and all children’s homes and the personal data of providers and staff. Ofsted also considers that the names and addresses of registered childminders are their personal data which it is not appropriate to make publicly available unless those individuals have given their explicit consent to do so. This information has therefore not been included in the datasets.

    Data for both childcare and childminders are included in the excel file.

    https://assets.publishing.service.gov.uk/media/60f7f6a4d3bf7f568160edb1/FOI_early_years_dataset_as_at_31_March_2016.xlsx">FOI: early years dataset as at 31 March 2016

     <p class="gem-c-attachment_metadata"><span class="gem-c-attachment_attribute">MS Excel Spreadsheet</span>, <span class="gem-c-attachment_attribute">16.6 MB</span></p>
    
    
    
    
     <p class="gem-c-attachment_metadata">This file may not be suitable for users of assistive technology.</p>
     <details data-module="ga4-event-tracker" data-ga4-event='{"event_name":"select_content","type":"detail","text":"Request an accessible format.","section":"Request an accessible format.","index_section":1}' class="gem-c-details govuk-details govuk-!-margin-bottom-0" title="Request an accessible format.">
    

    Request an accessible format.

      If you use assistive technology (such as a screen reader) and need a version of this document in a more accessible format, please email <a href="mailto:enquiries@ofsted.gov.uk" target="_blank" class="govuk-link">enquiries@ofsted.gov.uk</a>. Please tell us what format you need. It will help us if you say what assistive technology you use.
    

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Agricultural Research Service (2025). Current and projected research data storage needs of Agricultural Research Service researchers in 2016 [Dataset]. https://catalog.data.gov/dataset/current-and-projected-research-data-storage-needs-of-agricultural-research-service-researc-f33da
Organization logo

Data from: Current and projected research data storage needs of Agricultural Research Service researchers in 2016

Related Article
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Servicehttps://www.ars.usda.gov/
Description

The USDA Agricultural Research Service (ARS) recently established SCINet , which consists of a shared high performance computing resource, Ceres, and the dedicated high-speed Internet2 network used to access Ceres. Current and potential SCINet users are using and generating very large datasets so SCINet needs to be provisioned with adequate data storage for their active computing. It is not designed to hold data beyond active research phases. At the same time, the National Agricultural Library has been developing the Ag Data Commons, a research data catalog and repository designed for public data release and professional data curation. Ag Data Commons needs to anticipate the size and nature of data it will be tasked with handling. The ARS Web-enabled Databases Working Group, organized under the SCINet initiative, conducted a study to establish baseline data storage needs and practices, and to make projections that could inform future infrastructure design, purchases, and policies. The SCINet Web-enabled Databases Working Group helped develop the survey which is the basis for an internal report. While the report was for internal use, the survey and resulting data may be generally useful and are being released publicly. From October 24 to November 8, 2016 we administered a 17-question survey (Appendix A) by emailing a Survey Monkey link to all ARS Research Leaders, intending to cover data storage needs of all 1,675 SY (Category 1 and Category 4) scientists. We designed the survey to accommodate either individual researcher responses or group responses. Research Leaders could decide, based on their unit's practices or their management preferences, whether to delegate response to a data management expert in their unit, to all members of their unit, or to themselves collate responses from their unit before reporting in the survey. Larger storage ranges cover vastly different amounts of data so the implications here could be significant depending on whether the true amount is at the lower or higher end of the range. Therefore, we requested more detail from "Big Data users," those 47 respondents who indicated they had more than 10 to 100 TB or over 100 TB total current data (Q5). All other respondents are called "Small Data users." Because not all of these follow-up requests were successful, we used actual follow-up responses to estimate likely responses for those who did not respond. We defined active data as data that would be used within the next six months. All other data would be considered inactive, or archival. To calculate per person storage needs we used the high end of the reported range divided by 1 for an individual response, or by G, the number of individuals in a group response. For Big Data users we used the actual reported values or estimated likely values. Resources in this dataset:Resource Title: Appendix A: ARS data storage survey questions. File Name: Appendix A.pdfResource Description: The full list of questions asked with the possible responses. The survey was not administered using this PDF but the PDF was generated directly from the administered survey using the Print option under Design Survey. Asterisked questions were required. A list of Research Units and their associated codes was provided in a drop down not shown here. Resource Software Recommended: Adobe Acrobat,url: https://get.adobe.com/reader/ Resource Title: CSV of Responses from ARS Researcher Data Storage Survey. File Name: Machine-readable survey response data.csvResource Description: CSV file includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed. This information is that same data as in the Excel spreadsheet (also provided).Resource Title: Responses from ARS Researcher Data Storage Survey. File Name: Data Storage Survey Data for public release.xlsxResource Description: MS Excel worksheet that Includes raw responses from the administered survey, as downloaded unfiltered from Survey Monkey, including incomplete responses. Also includes additional classification and calculations to support analysis. Individual email addresses and IP addresses have been removed.Resource Software Recommended: Microsoft Excel,url: https://products.office.com/en-us/excel

Search
Clear search
Close search
Google apps
Main menu