100+ datasets found

d
Data from: ESS-DIVE Reporting Format for Comma-separated Values (CSV) File...
dataone.org
search.dataone.org
+2more
Updated May 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas (2023). ESS-DIVE Reporting Format for Comma-separated Values (CSV) File Structure [Dataset]. https://dataone.org/datasets/ess-dive-2d07e9e9b2bb3f3-20230504T212247921492
Explore at:
Dataset updated
May 4, 2023
Dataset provided by
ESS-DIVE
Authors
Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas
Time period covered
Jan 1, 2020 - Sep 30, 2021
Description
The ESS-DIVE reporting format for Comma-separated Values (CSV) file structure is based on a combination of existing guidelines and recommendations including some found within the Earth Science Community with valuable input from the Environmental Systems Science (ESS) Community. The CSV reporting format is designed to promote interoperability and machine-readability of CSV data files while also facilitating the collection of some file-level metadata content. Tabular data in the form of rows and columns should be archived in its simplest form, and we recommend submitting these tabular data following the ESS-DIVE reporting format for generic comma-separated values (CSV) text format files. In general, the CSV file format is more likely accessible by future systems when compared to a proprietary format and CSV files are preferred because this format is easier to exchange between different programs increasing the interoperability of a data file. By defining the reporting format and providing guidelines for how to structure CSV files and some field content within, this can increase the machine-readability of the data file for extracting, compiling, and comparing the data across files and systems. Data package files are in .csv, .png, and .md. Open the .csv with e.g. Microsoft Excel, LibreOffice, or Google Sheets. Open the .md files by downloading and using a text editor (e.g., notepad or TextEdit). Open the .png in e.g. a web browser, photo viewer/editor, or Google Drive.
d
Tidal Daily Discharge and Quality Assurance Data Supporting an Assessment of...
catalog.data.gov
data.usgs.gov
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Tidal Daily Discharge and Quality Assurance Data Supporting an Assessment of Water Quality and Discharge in the Herring River, Wellfleet, Massachusetts, November 2015–September 2017 [Dataset]. https://catalog.data.gov/dataset/tidal-daily-discharge-and-quality-assurance-data-supporting-an-assessment-of-water-quality
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
Wellfleet, Massachusetts, Herring River
Description
This data release provides data in support of an assessment of water quality and discharge in the Herring River at the Chequessett Neck Road dike in Wellfleet, Massachusetts, from November 2015 to September 2017. The assessment was a cooperative project among the U.S. Geological Survey, National Park Service, Cape Cod National Seashore, and the Friends of Herring River to characterize environmental conditions prior to a future removal of the dike. It is described in U.S. Geological Survey (USGS) Scientific Investigations Report "Assessment of Water Quality and Discharge in the Herring River, Wellfleet, Massachusetts, November 2015 – September 2017." This data release is structured as a set of comma-separated values (CSV) files, each of which contains information on data source (or laboratory used for analysis), USGS site identification (ID) number, beginning date of time of observation or sampling, ending date and time of observation or sampling and data such as flow rate and analytical results. The CSV files include calculated tidal daily flows (Flood_Tide_Tidal_Day.csv and Ebb_Tide_Tidal_Day.csv) that were used in Huntington and others (2020) for estimation of nutrient loads. Tidal daily flows are the estimated mean daily discharges for two consecutive flood and ebb tide cycles (average duration: 24 hours, 48 minutes). The associated date is the day on which most of the flow occurred. CSV files contain quality assurance data for water-quality samples including blanks (Blanks.csv), replicates (Replicates.csv), standard reference materials (Standard_Reference_Material.csv), and atmospheric ammonium contamination (NH4_Atmospheric_Contamination.csv). One CSV file (EWI_vs_ISCO.csv) contains data comparing composite samples collected by an automatic sampler (ISCO) at a fixed point with depth-integrated samples collected at equal width increments (EWI). One CSV file (Cross_Section_Field_Parameters.csv) contains field parameter data (specific conductance, temperature, pH, and dissolved oxygen) collected at a fixed location and data collected along the cross sections at variable water depths and horizontal distances across the openings of the culverts at the Chequessett Neck Road dike. One CSV file (LOADEST_Bias_Statistics.csv) contains data that include estimated natural log of load, model residuals, Z-scores, and seasonal model residuals for winter (December, January, and February); spring (March, April and May); summer (June, July and August); and fall (September, October, and November). The data release also includes a data dictionary (Data_Dictionary.csv) that provides detailed descriptions of each field in each CSV file, including: data filename; laboratory or data source; U.S. Geological Survey site ID numbers; data types; constituent (analyte) U.S. Geological Survey parameter codes; descriptions of parameters; units; methods; minimum reporting limits; limits of quantitation, if appropriate; method reference citations; and minimum, maximum, median, and average values for each analyte. The data release also includes an abbreviations file (Abbreviations.pdf) that defines all the abbreviations in the data dictionary and CSV files. Note that the USGS site ID includes a leading zero (011058798) and some of the parameter codes contain leading zeros, so care must be taken in opening and subsequently saving these files in other formats where leading zeros may be dropped.
d
Data from: "A guide to using GitHub for developing and versioning data...
dataone.org
knb.ecoinformatics.org
+1more
Updated Apr 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Robert Crystal-Ornelas; Charuleka Varadharajan; Ben Bond-Lamberty; Kristin Boye; Shreyas Cholia; Michael Crow; Ranjeet Devarakonda; Kim S. Ely; Amy Goldman; Susan Heinz; Valerie Hendrix; Joan Damerow; Stephanie Pennington; Madison Burrus; Zarine Kakalia; Emily Robles; Maegen Simmonds; Alistair Rogers; Terri Velliquette; Helen Weierbach; Pamela Weisenhorn; Jessica N. Welch; Deborah A. Agarwal (2023). Data from: "A guide to using GitHub for developing and versioning data standards and reporting formats" [Dataset]. http://doi.org/10.15485/1780565
Explore at:
Unique identifier
https://doi.org/10.15485/1780565
Dataset updated
Apr 6, 2023
Dataset provided by
ESS-DIVE
Authors
Robert Crystal-Ornelas; Charuleka Varadharajan; Ben Bond-Lamberty; Kristin Boye; Shreyas Cholia; Michael Crow; Ranjeet Devarakonda; Kim S. Ely; Amy Goldman; Susan Heinz; Valerie Hendrix; Joan Damerow; Stephanie Pennington; Madison Burrus; Zarine Kakalia; Emily Robles; Maegen Simmonds; Alistair Rogers; Terri Velliquette; Helen Weierbach; Pamela Weisenhorn; Jessica N. Welch; Deborah A. Agarwal
Time period covered
Sep 1, 2020 - Dec 3, 2020
Description
These data are the results of a systematic review that investigated how data standards and reporting formats are documented on the version control platform GitHub. Our systematic review identified 32 data standards in earth science, environmental science, and ecology that use GitHub for version control of data standard documents. In our analysis, we characterized the documents and content within each of the 32 GitHub repositories to identify common practices for groups that version control their documents on GitHub. In this data package, there are 8 CSV files that contain data that we characterized from each repository, according to the location within the repository. For example, in 'readme_pages.csv' we characterize the content that appears across the 32 GitHub repositories included in our systematic review. Each of the 8 CSV files has an associated data dictionary file (names appended with '_dd.csv' and here we describe each content category within CSV files. There is one file-level metadata file (flmd.csv) that provides a description of each file within the data package.
U
myhealthlondon Indicators
data.ubdc.ac.uk
data.wu.ac.at
csv
Updated Nov 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Greater London Authority (2023). myhealthlondon Indicators [Dataset]. https://data.ubdc.ac.uk/dataset/myhealthlondon-indicators
Explore at:
csvAvailable download formats
Dataset updated
Nov 8, 2023
Dataset provided by
Greater London Authority
Description
Series of indicators underlying the myhealthlondon website. Information about outcome standards is also available.

Each indicator is provided as a separate data file which can be found using the links below. Full metadata for the indicators are also available.

Indicator 1 - One year cancer survival for lung cancer and breast cancer (Borough level)

Indicator 2 - Identifying the prevalence of cancer (Borough level)

Indicator 3 - Cervical Screening

Indicator 4 - GP recorded smoking (all patients)

Indicator 5 - GP recorded smoking (patients with long term conditions)

Indicator 6a - Identification of the prevalence of atrial fibrillation

Indicator 6b - Treatment of atrial fibrillation

Indicator 7 - Uptake of immunisations for children

Indicator 8 - Uptake of immunisation for influenza for patients aged over 65, and those at risk under 65

Indicator 9 - Identifying the prevalence of chronic obstructive pulmonary disease based on estimates for the local population

Indicator 10 - Identifying the prevalence of asthma - no data for 2009

Indicator 11 - Identifying the prevalence of Diabetes - no data for 2009

Indicator 12 - Identifying the prevalence of coronary heart disease

Indicator 13 - Identifying the prevalence of dementia - data no longer available

Indicator 14 - Monitoring safe, rational and cost effective prescribing in general practice (Borough level)

Indicator 15 - The rate of emergency hospital admission for people with long term conditions usually managed by GPs

Indicator 16 - Rate of A&E attendances

Indicator 17 - Satisfaction with the quality of consultation at the GP practice

Indicator 17a - Percentage of patients who gave a positive answer to 'did the doctor or nurse take notice of your views about how to deal with your health problem?'

Indicator 17b - Percentage of patients who gave a positive answer to 'did the doctor or nurse give you information about the things you might do to deal with your health problem?'

Indicator 17c - Percentage of patients who gave a positive answer to 'did you and the doctor or nurse agree about how best to manage your health problem?

Indicator 17d - Percentage of patients who gave a positive answer to 'did the doctor or nurse give you a written document about the discussions you had about managing your health problem?'

Indicator 17e - Percentage of patients who gave a positive answer to 'did you want a written plan summarising your discussion with the doctor or nurse?'

Indicator 17f - Percentage of patients who gave a positive answer to 'did the doctor or nurse ever tell you that you had something called a 'care plan?'

Indicator 17g - Percentage of patients who gave a positive answer to 'do you think that having these discussions with your doctor or nurse has helped improve how you manage your health problem?'

Indicator 18 - Satisfaction with overall care received at the GP practice

Indicator 19 - Patients leaving this GP practice without changing home address

Indicator 20 - Satisfaction in being able to see a preferred doctor

Indicator 21 - Satisfaction with accessing primary care

Indicator 22 - Significant event reviews (one year and three year minimum levels)

Indicators 23 & 24 - Do not exist

Indicator 25 - Early detection of Cancer

Indicator 26b - Identifying the prevalence and assessing patients with a new diagnosis of depression - 26a data no longer available

Indicator 26c - Assessment of depression

Indicator 27a - Percentage of patients in the local area with serious mental illness contacted by their GP practice after missing their health review

Indicator 27b - Percentage of patients with serious mental illness with a physical health check in the last 15 months

Indicator 27c - Blood pressure checks for serious mental health illness

Indicator 27d - Physical health checks for serious mental health illness

Indicator 28 - End of life care

All indicators are provided at GP Practice level except for those marked as Borough level.

19 and 25 as of 04/09/2012

Indicator 6 updated as of 01/05/2013

Indicators 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 17a to 17g, 18, 20, 21, 22, 26a, 26c, 27c ,27d and 28 updated as of 17/12/2013

Indicator 26a has been discontinued and is no longer available on the Datastore. There is a possibility that it may be reinstated in the future.

Data are provided in a standardised schema with each record containing an indicator ID and organisation ID. These can be matched against the two lookup files below to identify the indicator (or sub-indicator where available) and organisation.

Indicator Lookup File (csv format)

Organisation Lookup File (csv format)

Period Lookup File (csv format)
d
Dataset metadata of known Dataverse installations
search.dataone.org
dataverse.harvard.edu
+1more
Updated Nov 22, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gautier, Julian (2023). Dataset metadata of known Dataverse installations [Dataset]. http://doi.org/10.7910/DVN/DCDKZQ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/DCDKZQ
Dataset updated
Nov 22, 2023
Dataset provided by
Harvard Dataverse
Authors
Gautier, Julian
Description
This dataset contains the metadata of the datasets published in 77 Dataverse installations, information about each installation's metadata blocks, and the list of standard licenses that dataset depositors can apply to the datasets they publish in the 36 installations running more recent versions of the Dataverse software. The data is useful for reporting on the quality of dataset and file-level metadata within and across Dataverse installations. Curators and other researchers can use this dataset to explore how well Dataverse software and the repositories using the software help depositors describe data. How the metadata was downloaded The dataset metadata and metadata block JSON files were downloaded from each installation on October 2 and October 3, 2022 using a Python script kept in a GitHub repo at https://github.com/jggautier/dataverse-scripts/blob/main/other_scripts/get_dataset_metadata_of_all_installations.py. In order to get the metadata from installations that require an installation account API token to use certain Dataverse software APIs, I created a CSV file with two columns: one column named "hostname" listing each installation URL in which I was able to create an account and another named "apikey" listing my accounts' API tokens. The Python script expects and uses the API tokens in this CSV file to get metadata and other information from installations that require API tokens. How the files are organized ├── csv_files_with_metadata_from_most_known_dataverse_installations │ ├── author(citation).csv │ ├── basic.csv │ ├── contributor(citation).csv │ ├── ... │ └── topic_classification(citation).csv ├── dataverse_json_metadata_from_each_known_dataverse_installation │ ├── Abacus_2022.10.02_17.11.19.zip │ ├── dataset_pids_Abacus_2022.10.02_17.11.19.csv │ ├── Dataverse_JSON_metadata_2022.10.02_17.11.19 │ ├── hdl_11272.1_AB2_0AQZNT_v1.0.json │ ├── ... │ ├── metadatablocks_v5.6 │ ├── astrophysics_v5.6.json │ ├── biomedical_v5.6.json │ ├── citation_v5.6.json │ ├── ... │ ├── socialscience_v5.6.json │ ├── ACSS_Dataverse_2022.10.02_17.26.19.zip │ ├── ADA_Dataverse_2022.10.02_17.26.57.zip │ ├── Arca_Dados_2022.10.02_17.44.35.zip │ ├── ... │ └── World_Agroforestry_-_Research_Data_Repository_2022.10.02_22.59.36.zip └── dataset_pids_from_most_known_dataverse_installations.csv └── licenses_used_by_dataverse_installations.csv └── metadatablocks_from_most_known_dataverse_installations.csv This dataset contains two directories and three CSV files not in a directory. One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 18 CSV files that contain the values from common metadata fields of all 77 Dataverse installations. For example, author(citation)_2022.10.02-2022.10.03.csv contains the "Author" metadata for all published, non-deaccessioned, versions of all datasets in the 77 installations, where there's a row for each author name, affiliation, identifier type and identifier. The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 77 zipped files, one for each of the 77 Dataverse installations whose dataset metadata I was able to download using Dataverse APIs. Each zip file contains a CSV file and two sub-directories: The CSV file contains the persistent IDs and URLs of each published dataset in the Dataverse installation as well as a column to indicate whether or not the Python script was able to download the Dataverse JSON metadata for each dataset. For Dataverse installations using Dataverse software versions whose Search APIs include each dataset's owning Dataverse collection name and alias, the CSV files also include which Dataverse collection (within the installation) that dataset was published in. One sub-directory contains a JSON file for each of the installation's published, non-deaccessioned dataset versions. The JSON files contain the metadata in the "Dataverse JSON" metadata schema. The other sub-directory contains information about the metadata models (the "metadata blocks" in JSON files) that the installation was using when the dataset metadata was downloaded. I saved them so that they can be used when extracting metadata from the Dataverse JSON files. The dataset_pids_from_most_known_dataverse_installations.csv file contains the dataset PIDs of all published datasets in the 77 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It's a union of all of the "dataset_pids_..." files in each of the 77 zip files. The licenses_used_by_dataverse_installations.csv file contains information about the licenses that a number of the installations let depositors choose when creating datasets. When I collected ... Visit https://dataone.org/datasets/sha256%3Ad27d528dae8cf01e3ea915f450426c38fd6320e8c11d3e901c43580f997a3146 for complete metadata about this dataset.
d
Data from: Dissolved Inorganic Carbon and Dissolved Organic Carbon Data for...
search.dataone.org
knb.ecoinformatics.org
+2more
Updated Apr 9, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wenming Dong; Curtis Beutler; Wendy Brown; Alexander Newman; Dylan O'Ryan; Roelof Versteeg; Kenneth Williams (2024). Dissolved Inorganic Carbon and Dissolved Organic Carbon Data for the East River Watershed, Colorado (2015-2023) [Dataset]. https://search.dataone.org/view/ess-dive-3e2f8b427ee0700-20240409T005513024
Explore at:
Dataset updated
Apr 9, 2024
Dataset provided by
ESS-DIVE
Authors
Wenming Dong; Curtis Beutler; Wendy Brown; Alexander Newman; Dylan O'Ryan; Roelof Versteeg; Kenneth Williams
Time period covered
Sep 1, 2015 - Nov 21, 2023
Area covered

Description
This data package contains mean values for dissolved organic carbon (DOC) and dissolved inorganic carbon (DIC) for water samples taken from the East River Watershed in Colorado. The East River is part of the Watershed Function Scientific Focus Area (WFSFA) located in the Upper Colorado River Basin, United States. DOC and DIC concentrations in water samples were determined using a TOC-VCPH analyzer (Shimadzu Corporation, Japan). DOC was analyzed as non-purgeable organic carbon (NPOC) by purging HCl acidified samples with carbon-free air to remove DIC prior to measurement. After the acidified sample has been sparged, it is injected into a combustion tube filled with oxidation catalyst heated to 680 degrees C. The DOC in samples is combusted to CO2 and measured by a non-dispersive infrared (NDIR) detector. The peak area of the analog signal produced by the NDIR detector is proportional to the DOC concentration of the sample. DIC was determined by acidifying the samples with HCl first, and then purge with carbon-free air to release CO2 for analysis by NDIR detector. All files are labeled by location and variable, and data reported are the mean values upon minimum three replicate measurements with a relative standard deviation < 3%. All samples were analyzed under a rigorous quality assurance and quality control (QA/QC) process as detailed in the methods. This data package contains (1) a zip file (dic_npoc_data_2015-2023.zip) containing a total of 323 files: 322 data files of DIC and NPOC data from across the Lawrence Berkeley National Laboratory (LBNL) Watershed Function Scientific Focus Area (SFA) which is reported in .csv files per location and a locations.csv (1 file) with latitude and longitude for each location; (2) a file-level metadata (v4_20240311_flmd.csv) file that lists each file contained in the dataset with associated metadata; (3) a data dictionary (v4_20240311_dd.csv) file that contains terms/column_headers used throughout the files along with a definition, units, and data type; and (4) PDF and docx files for the determination of Method Detection Limits (MDLs) for DIC and NPOC data, which has been updated in 2024-03. Missing values within the anion data files are noted as either "-9999" or "0.0" for not detectable (N.D.) data. There are a total of 107 locations containing DIC/NPOC data. Update on 2020-10-07: Updated the data files to remove times from the timestamps, so that only dates remain. The data values have not changed. Update on 2021-04-11: Added Determination of Method Detection Limits (MDLs) for DIC, NPOC and TDN Analyses document, which can be accessed as a PDF or with Microsoft Word. Update on 6/10/2022: versioned updates to this dataset were made along with these changes: (1) updated dissolved inorganic carbon and dissolved organic carbon data for all locations up to 2021-12-31, (2) removal of units from column headers in datafiles, (3) added row underneath headers to contain units of variables, (4) restructure of units to comply with CSV reporting format requirements, (5) added -9999 for empty numerical cells, and (6) the addition of the file-level metadata (flmd.csv) and data dictionary (dd.csv) were added to comply with the File-Level Metadata Reporting Format. Update on 2022-09-09: Updates were made to reporting format specific files (file-level metadata and data dictionary) to correct swapped file names, add additional details on metadata descriptions on both files, add a header_row column to enable parsing, and add version number and date to file names (v2_20220909_flmd.csv and v2_20220909_dd.csv). Update on 2023-08-08: Updates were made to both the data files and reporting format specific files. New available anion data was added, up until 2023-01-05. The file level metadata and data dictionary files were updated to reflect the additional data added. Update on 2024-03-11: Updates were made to both the data files and reporting format specific files. New available anion data was added, up until 2023-11-21. Further, revisions to the data files were made to remove incorrect data points (from 1970 and 2001). The reporting format specific files were updated to reflect the additional data added. Revised versions of the PDF and docx files for determination of MDLs for DIC and NPOC were added to replace previous versions.
Data articles in journals
zenodo.org
data.niaid.nih.gov
bin, csv, txt
Updated Sep 22, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carlota Balsa-Sanchez; Carlota Balsa-Sanchez; Vanesa Loureiro; Vanesa Loureiro (2023). Data articles in journals [Dataset]. http://doi.org/10.5281/zenodo.8367960
Explore at:
bin, csv, txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8367960
Dataset updated
Sep 22, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Carlota Balsa-Sanchez; Carlota Balsa-Sanchez; Vanesa Loureiro; Vanesa Loureiro
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Version: 5

Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

Date of data collection: 2023/09/05

General description: The publication of datasets according to the FAIR principles, could be reached publishing a data paper (or software paper) in data journals or in academic standard journals. The excel and CSV file contains a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v5.xlsx: full list of 140 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v5.csv: full list of 140 academic journals in which data papers or/and software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 5th version
- Information updated: number of journals, URL, document types associated to a specific journal.

Version: 4

Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

Date of data collection: 2022/12/15

General description: The publication of datasets according to the FAIR principles, could be reached publishing a data paper (or software paper) in data journals or in academic standard journals. The excel and CSV file contains a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers or/and software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 4th version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.

Version: 3

Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

Date of data collection: 2022/10/28

General description: The publication of datasets according to the FAIR principles, could be reached publishing a data paper (or software paper) in data journals or in academic standard journals. The excel and CSV file contains a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers or/and software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 3rd version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).

Erratum - Data articles in journals Version 3:

Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
Data -- ISSN 2306-5729 -- JCR (JIF) n/a
Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a

Version: 2

Author: Francisco Rubio, Universitat Politècnia de València.

Date of data collection: 2020/06/23

General description: The publication of datasets according to the FAIR principles, could be reached publishing a data paper (or software paper) in data journals or in academic standard journals. The excel and CSV file contains a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers or/and software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 2nd version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)

Total size: 32 KB

Version 1: Description

This dataset contains a list of journals that publish data articles, code, software articles and database articles.

The search strategy in DOAJ and Ulrichsweb was the search for the word data in the title of the journals.
Acknowledgements:
Xaquín Lores Torres for his invaluable help in preparing this dataset.

Annotated 12 lead ECG dataset

zenodo.org

zip

Updated Jun 7, 2021

+ more versions

Facebook

Twitter

Click to copy link

Link copied

Cite

Antonio H Ribeiro; Antonio H Ribeiro; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Gabriela M. Paixão; Derick M. Oliveira; Derick M. Oliveira; Paulo R. Gomes; Paulo R. Gomes; Jéssica A. Canazart; Jéssica A. Canazart; Milton P. Ferreira; Milton P. Ferreira; Carl R. Andersson; Carl R. Andersson; Peter W. Macfarlane; Peter W. Macfarlane; Wagner Meira Jr.; Wagner Meira Jr.; Thomas B. Schön; Thomas B. Schön; Antonio Luiz P. Ribeiro; Antonio Luiz P. Ribeiro (2021). Annotated 12 lead ECG dataset [Dataset]. http://doi.org/10.5281/zenodo.3625007

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.3625007

Dataset updated

Jun 7, 2021

Dataset provided by

Zenodohttp://zenodo.org/

Authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

# Annotated 12 lead ECG dataset

Contain 827 ECG tracings from different patients, annotated by several cardiologists, residents and medical students.
It is used as test set on the paper:
"Automatic Diagnosis of the Short-Duration12-Lead ECG using a Deep Neural Network".

It contain annotations about 6 different ECGs abnormalities:
- 1st degree AV block (1dAVb);
- right bundle branch block (RBBB);
- left bundle branch block (LBBB);
- sinus bradycardia (SB);
- atrial fibrillation (AF); and,
- sinus tachycardia (ST).

## Folder content:

- `ecg_tracings.hdf5`: HDF5 file containing a single dataset named `tracings`. This dataset is a 
`(827, 4096, 12)` tensor. The first dimension correspond to the 827 different exams from different 
patients; the second dimension correspond to the 4096 signal samples; the third dimension to the 12
different leads of the ECG exam. 

The signals are sampled at 400 Hz. Some signals originally have a duration of 
10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples).
In order to make them all have the same size (4096 samples) we fill them with zeros
on both sizes. For instance, for a 7 seconds ECG signal with 2800 samples we include 648
samples at the beginning and 648 samples at the end, yielding 4096 samples that are them saved
in the hdf5 dataset. All signal are represented as floating point numbers at the scale 1e-4V: so it should
be multiplied by 1000 in order to obtain the signals in V.

In python, one can read this file using the following sequence:
```python
import h5py
with h5py.File(args.tracings, "r") as f:
  x = np.array(f['tracings'])
```

- The file `attributes.csv` contain basic patient attributes: sex (M or F) and age. It
contain 827 lines (plus the header). The i-th tracing in `ecg_tracings.hdf5` correspond to the i-th line.
- `annotations/`: folder containing annotations csv format. Each csv file contain 827 lines (plus the header).
The i-th line correspond to the i-th tracing in `ecg_tracings.hdf5` correspond to the in all csv files.
The csv files all have 6 columns `1dAVb, RBBB, LBBB, SB, AF, ST`
corresponding to weather the annotator have detect the abnormality in the ECG (`=1`) or not (`=0`).
 1. `cardiologist[1,2].csv` contain annotations from two different cardiologist.
 2. `gold_standard.csv` gold standard annotation for this test dataset. When the cardiologist 1 and cardiologist 2
 agree, the common diagnosis was considered as gold standard. In cases where there was any disagreement, a 
 third senior specialist, aware of the annotations from the other two, decided the diagnosis. 
 3. `dnn.csv` prediction from the deep neural network described in 
 "Automatic Diagnosis of the Short-Duration 12-Lead ECG using a Deep Neural Network". The threshold is set in such way 
 it maximizes the F1 score.
 4. `cardiology_residents.csv` annotations from two 4th year cardiology residents (each annotated half of the dataset).
 5. `emergency_residents.csv` annotations from two 3rd year emergency residents (each annotated half of the dataset).
 6. `medical_students.csv` annotations from two 5th year medical students (each annotated half of the dataset).

A Comprehensive Surface Water Quality Monitoring Dataset (1940-2023):...
figshare.com
csv
Updated Feb 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Md. Rajaul Karim; Mahbubul Syeed; Ashifur Rahman; Khondkar Ayaz Rabbani; Kaniz Fatema; Razib Hayat Khan; Md Shakhawat Hossain; Mohammad Faisal Uddin (2025). A Comprehensive Surface Water Quality Monitoring Dataset (1940-2023): 2.82Million Record Resource for Empirical and ML-Based Research [Dataset]. http://doi.org/10.6084/m9.figshare.27800394.v2
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.27800394.v2
Dataset updated
Feb 23, 2025
Dataset provided by
figshare
Authors
Md. Rajaul Karim; Mahbubul Syeed; Ashifur Rahman; Khondkar Ayaz Rabbani; Kaniz Fatema; Razib Hayat Khan; Md Shakhawat Hossain; Mohammad Faisal Uddin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data DescriptionWater Quality Parameters: Ammonia, BOD, DO, Orthophosphate, pH, Temperature, Nitrogen, Nitrate.Countries/Regions: United States, Canada, Ireland, England, China.Years Covered: 1940-2023.Data Records: 2.82 million.Definition of ColumnsCountry: Name of the water-body region.Area: Name of the area in the region.Waterbody Type: Type of the water-body source.Date: Date of the sample collection (dd-mm-yyyy).Ammonia (mg/l): Ammonia concentration.Biochemical Oxygen Demand (BOD) (mg/l): Oxygen demand measurement.Dissolved Oxygen (DO) (mg/l): Concentration of dissolved oxygen.Orthophosphate (mg/l): Orthophosphate concentration.pH (pH units): pH level of water.Temperature (°C): Temperature in Celsius.Nitrogen (mg/l): Total nitrogen concentration.Nitrate (mg/l): Nitrate concentration.CCME_Values: Calculated water quality index values using the CCME WQI model.CCME_WQI: Water Quality Index classification based on CCME_Values.Data Directory Description:Category 1: DatasetCombined Data: This folder contains two CSV files: Combined_dataset.csv and Summary.xlsx. The Combined_dataset.csv file includes all eight water quality parameter readings across five countries, with additional data for initial preprocessing steps like missing value handling, outlier detection, and other operations. It also contains the CCME Water Quality Index calculation for empirical analysis and ML-based research. The Summary.xlsx provides a brief description of the datasets, including data distributions (e.g., maximum, minimum, mean, standard deviation).Combined_dataset.csvSummary.xlsxCountry-wise Data: This folder contains separate country-based datasets in CSV files. Each file includes the eight water quality parameters for regional analysis. The Summary_country.xlsx file presents country-wise dataset descriptions with data distributions (e.g., maximum, minimum, mean, standard deviation).England_dataset.csvCanada_dataset.csvUSA_dataset.csvIreland_dataset.csvChina_dataset.csvSummary_country.xlsxCategory 2: CodeData processing and harmonization code (e.g., Language Conversion, Date Conversion, Parameter Naming and Unit Conversion, Missing Value Handling, WQI Measurement and Classification).Data_Processing_Harmonnization.ipynbThe code used for Technical Validation (e.g., assessing the Data Distribution, Outlier Detection, Water Quality Trend Analysis, and Vrifying the Application of the Dataset for the ML Models).Technical_Validation.ipynbCategory 3: Data Collection SourcesThis category includes links to the selected dataset sources, which were used to create the dataset and are provided for further reconstruction or data formation. It contains links to various data collection sources.DataCollectionSources.xlsxOriginal Paper Title: A Comprehensive Dataset of Surface Water Quality Spanning 1940-2023 for Empirical and ML Adopted ResearchAbstractAssessment and monitoring of surface water quality are essential for food security, public health, and ecosystem protection. Although water quality monitoring is a known phenomenon, little effort has been made to offer a comprehensive and harmonized dataset for surface water at the global scale. This study presents a comprehensive surface water quality dataset that preserves spatio-temporal variability, integrity, consistency, and depth of the data to facilitate empirical and data-driven evaluation, prediction, and forecasting. The dataset is assembled from a range of sources, including regional and global water quality databases, water management organizations, and individual research projects from five prominent countries in the world, e.g., the USA, Canada, Ireland, England, and China. The resulting dataset consists of 2.82 million measurements of eight water quality parameters that span 1940 - 2023. This dataset can support meta-analysis of water quality models and can facilitate Machine Learning (ML) based data and model-driven investigation of the spatial and temporal drivers and patterns of surface water quality at a cross-regional to global scale.Note: Cite this repository and the original paper when using this dataset.
Observations of carbon dioxide (CO2), methane (CH4), and carbon monoxide...
catalog.data.gov
s.cnmilf.com
+1more
Updated May 9, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
National Institute of Standards and Technology (2023). Observations of carbon dioxide (CO2), methane (CH4), and carbon monoxide (CO) mole fractions from the NIST Northeast Corridor urban testbed [Dataset]. https://catalog.data.gov/dataset/observations-of-carbon-dioxide-co2-methane-ch4-and-carbon-monoxide-co-mole-fractions-from-
Explore at:
Dataset updated
May 9, 2023
Dataset provided by
National Institute of Standards and Technologyhttp://www.nist.gov/
Description
Note: This data archive supersedes our previous archive: https://doi.org/10.18434/mds2-2491. See Updates document for specific major and minor changes. Here we provide hourly observations of carbon dioxide (CO2), methane (CH4), and carbon monoxide (CO) from tower-based sites in the NIST Northeast Corridor network. Each *.tgz (tar/gzip) archive contains data files for a given site location, named by its 3-letter code (see NEC_sites.csv for codes and locations). To extract on a unix platform, use "tar -xvzf ". Data files within the .tgz archives are comma delimited (CSV); data files within the _NC.tgz archives are in NetCDF format. Site locations, heights, and other information are in a separate ascii (CSV) file (NEC_sites.csv), and also within each data file. Data in this archive is reported for the years 2015-2022. An ASCII Readme file (NEC_Readme_05052023) is also posted, along with an Updates_05052023.txt file that includes additional information on updates. Note about calibrations: CO2 data are reported on the NOAA/WMO X2007 calibration scale. CH4 data are reported on the NOAA/WMO X2004A calibration scale; CO data, where available, are reported on the NOAA/WMO X2014 scale. This archive, with CO2 data on the X2007 scale, will no longer be updated. A full revision of all the data on the NOAA/WMO X2019 scale for CO2 is forthcoming, and will be linked here; that archive will maintain its own record and DOI, include CH4 and CO, and will be continually updated. This data is being freely distributed for research, academic and related non-commercial purposes consistent with NIST's mandate to further the science and the promulgation of appropriate standards. Current update: May 5, 2023.
p
exp 5.csv
psycharchives.org
Updated Feb 11, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2021). exp 5.csv [Dataset]. https://www.psycharchives.org/handle/20.500.12034/4084
Explore at:
Dataset updated
Feb 11, 2021
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This article aims to present the development and test the psychometric properties of the Perceived Occupational Stress (POS) scale, a new brief instrument aimed at rating a worker’s perception of feeling stressed at work. Six studies are conducted on an overall sample of 1,805 Italian workers, to examine both the construct and concurrent validity of the POS scale. The results demonstrate the high internal consistency (α = .82) and test-retest reliability (r = .86) of the POS scale, as well as its structural validity and concurrent validity with the Maslach Burnout Inventory (r = .68 with Emotional Exhaustion) and the Effort-Reward Imbalance Questionnaire (r = .62 with Imbalance and r = .51 with Overcommitment). Moreover, the POS scale is determined to uniquely contribute toward predicting stress-related health complaints, over and above indicators of workplace stressors, as measured by the Health and Safety Executive Management Standards Indicator Tool (R2change = .06). Overall, the present findings indicate that the POS scale is a valid and reliable instrument for self-reporting occupational stress levels, and it could be used together with existing risk assessment measures of stress, to obtain a comprehensive evaluation of work-relatedstress. Dataset for: Marcatto, F., Di Blas, L., Luis, O., Festa, S., & Ferrante, D. (2021). The Perceived Occupational Stress Scale: A brief tool for measuring workers' peception of stress at work. European Journal of Psychological Assessment. https://doi.org/10.1027/1015-5759/a000677: study 5
MOT testing data for Great Britain
s3.amazonaws.com
gov.uk
Updated Mar 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Driver and Vehicle Standards Agency (2022). MOT testing data for Great Britain [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/179/1797262.html
Explore at:
Dataset updated
Mar 24, 2022
Dataset provided by
GOV.UKhttp://gov.uk/
Authors
Driver and Vehicle Standards Agency
Area covered
Great Britain, United Kingdom
Description
About this data set

This data set comes from data held by the Driver and Vehicle Standards Agency (DVSA).

It is not classed as an ‘official statistic’. This means it’s not subject to scrutiny and assessment by the UK Statistics Authority.

MOT test results by class

The MOT test checks that your vehicle meets road safety and environmental standards. Different types of vehicles (for example, cars and motorcycles) fall into different ‘classes’.

This data table shows the number of initial tests. It does not include abandoned tests, aborted tests, or retests.

The initial fail rate is the rate for vehicles as they were brought for the MOT. The final fail rate excludes vehicles that pass the test after rectification of minor defects at the time of the test.

This data table is updated every 3 months.

https://www.gov.uk/assets/whitehall/pub-cover-spreadsheet-471052e0d03e940bbc62528a05ac204a884b553e4943e63c8bffa6b8baef8967.png">

MOT test results by class of vehicle

Ref: DVSA/MOT/01 View online https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1060287/dvsa-mot-01-mot-test-results-by-class-of-vehicle1.csv"> Download CSV 16.1 KB

Initial failures by defect category

These tables give data for the following classes of vehicles:

class 1 and 2 vehicles - motorcycles

class 3 and 4 vehicles - cars and light vans up to 3,000kg

class 5 vehicles - private passenger vehicles with more than 12 seats

class 7 vehicles - goods vehicles between 3,000kg and 3,500kg gross vehicle weight

All figures are for vehicles as they were brought in for the MOT.

A failed test usually has multiple failure items.

The percentage of tests is worked out as the number of tests with one or more failure items in the defect as a percentage of total tests.

The percentage of defects is worked out as the total defects in the category as a percentage of total defects for all categories.

The average defects per initial test failure is worked out as the total failure items as a percentage of total tests failed plus tests that passed after rectification of a minor defect at the time of the test.

These data tables are updated every 3 months.

https://www.gov.uk/assets/whitehall/pub-cover-spreadsheet-471052e0d03e940bbc62528a05ac204a884b553e4943e63c8bffa6b8baef8967.png">

MOT class 1 and 2 vehicles: initial failures by defect category

Ref: DVSA/MOT/02 View online https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1060255/dvsa-mot-02-mot-class-1-and-2-vehicles-initial-failures-by-defect-category-.csv"> Download CSV 19.1 KB

https://www.gov.uk/assets/whitehall/pub-cover-spreadsheet-471052e0d03e940bbc62528a05ac204a884b553e4943e63c8bffa6b8baef8967.png">

MOT class 3 and 4 vehicles: initial failures by defect category</h3
d
Article 6 of Noise Control Standards: Noise Control Standard Values for...
data.gov.tw
csv, json, xml
Updated Mar 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yiland County Government (2025). Article 6 of Noise Control Standards: Noise Control Standard Values for Construction Projects [Dataset]. https://data.gov.tw/en/datasets/42243
Explore at:
json, csv, xmlAvailable download formats
Dataset updated
Mar 21, 2025
Dataset authored and provided by
Yiland County Government
License
https://data.gov.tw/licensehttps://data.gov.tw/license
Description
Provided by the Environmental Protection Bureau of Yilan County Government, Noise Control Standards Article 6: Construction Project Noise Control Standard Values (CSV, XML, JSON format data)
d
nzqa_exam_questions_contextual_population_parameter_definitions.csv -...
catalogue.data.govt.nz
Updated Nov 11, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). nzqa_exam_questions_contextual_population_parameter_definitions.csv - Dataset - data.govt.nz - discover and use data [Dataset]. https://catalogue.data.govt.nz/dataset/oai-figshare-com-article-27644367
Explore at:
Dataset updated
Nov 11, 2024
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data set represents contextualised population parameter definitions extracted and developed from past NZQA Level 3 Statistics exam questions. and assessment schedules, namely those used for the achievement standards AS90642 and AS91584. The data set was developed by Haozhong Wei as part of his MSc dissertation project, under the supervision of Dr Anna Fergusson and Dr Anne Patel (University of Auckland | Waipapa Taumata Rau).An overview of the variables used in the dataset:1. Year: This variable is the year of the exam. 2. Paper: This is the identifier of the paper, e.g., AS90642, indicating the specific exam to which the question belongs. 3. Type: This variable indicates the type of data and usually identifies whether the entry is a question or an answer. 4. Question part: This variable indicates the specific part number of the problem, e.g., 1a, 1b, 2, etc. 5. Text: This is the full text of the question. 6. Population parameter: A description of the parameter of the entire text. 7. Parameter type: These variables further detail the type of overall parameter, such as ‘single mean’ or ‘single proportion’ or even ‘difference between two means’.
d
Data from: ESS-DIVE Reporting Format for File-level Metadata
search.dataone.org
data.ess-dive.lbl.gov
+2more
Updated Oct 8, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas (2021). ESS-DIVE Reporting Format for File-level Metadata [Dataset]. https://search.dataone.org/view/ess-dive-a95fac98da3b481-20210928T175904096
Explore at:
Dataset updated
Oct 8, 2021
Dataset provided by
ESS-DIVE
Authors
Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas
Time period covered
Jan 1, 2020 - Sep 30, 2021
Description
The ESS-DIVE reporting format for file-level metadata (FLMD) provides granular information at the data file level to describe the contents, scope, and structure of the data file to enable comparison of data files within a data package. The FLMD are fully consistent with and augment the metadata collected at the data package level. We developed the FLMD template based on a review of a small number of existing FLMD in use at other agencies and repositories with valuable input from the Environmental Systems Science (ESS) Community. Also included is a template for a CSV Data Dictionary where users can provide file-level information about the contents of a CSV data file (e.g., define column names, provide units). Files are in .csv, .xlsx, and .md. Templates are in both .csv and .xlsx (open with e.g. Microsoft Excel, LibreOffice, or Google Sheets). Open the .md files by downloading and using a text editor (e.g. Notepad or TextEdit). Though we provide Excel templates for the file-level metadata reporting format, our instructions encourage users to 'Save the FLMD template as a CSV following the CSV Reporting Format guidance'. In addition, we developed the ESS-DIVE File Level Metadata Extractor which is a lightweight python script that can extract some FLMD fields following the recommended FLMD format and structure.
e
Standards that were repealed in 2017
data.europa.eu
csv
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Freie und Hansestadt Bremen, Standards that were repealed in 2017 [Dataset]. https://data.europa.eu/88u/dataset/bremen2014_tp-c-112574-de
Explore at:
csvAvailable download formats
Dataset authored and provided by
Freie und Hansestadt Bremen
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The dataset contains a list of all the standards listed in the Bremen law portal, which were repealed in 2017.The dataset in CSV format contains a list of all the standards listed in the Bremen law portal, which were repealed in 2017.The title and the date for the repeal are listed.
n
Standard observation data of the equatorial troposphere and lower...
heliophysicsdata.gsfc.nasa.gov
txt
Updated Jul 21, 2016
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2016). Standard observation data of the equatorial troposphere and lower stratosphere taken by the EAR (CSV format) [Dataset]. https://heliophysicsdata.gsfc.nasa.gov/WS/hdp/1/Spase?ResourceID=spase%3A%2F%2FIUGONET%2FNumericalData%2FRISH%2Fmisc%2FKTB%2FEAradar%2Fear_ktb_tropstd_csv
Explore at:
txtAvailable download formats
Dataset updated
Jul 21, 2016
License
https://cdla.io/permissive-1-0/https://cdla.io/permissive-1-0/
Variables measured
date, pwr1, pwr2, pwr3, pwr4, pwr5, time, uwnd, vwnd, wdt1, and 6 more
Description
The 10-minute average observation data in the equatorial troposphere (2-20 km) taken by the equatorial atmosphere radar (EAR) at Kototabang, Indonesia (0.20S, 100.32E, 865m MSL), which has been operated in the standard observation mode of the troposphere and lower stratosphere. The data are stored in the CSV (Comma Separated Values) file named (year)(month)(day).(variable).csv. The variable abbreviations are uwnd, vwnd, wwnd, pwr1, pwr2, pwr3, pwr4, pwr5, wdt1, wdt2, wdt3, wdt4 and wdt5, which mean zonal, meridional and vertical wind velocities, echo power and spectral width for beam 1-5, respectively. The azimuth and zenith angles of beam 1, 2, 3, 4 and 5 are (0, 0), (0, 10), (90, 10), (180, 10) and (270, 10), respectively, in unit of degree. The numbers of the first line of each file represent altitudes, and the second and below are observational local date and time (year/month/day hour:minute) which corresponds to the center time of averaging period and data at each altitude. The value of 999 means missing data.
Z
Data from: Bio-logger Ethogram Benchmark: A benchmark for computational...
data.niaid.nih.gov
portalcientifico.unileon.es
+3more
Updated Apr 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zacarian, Katherine (2024). Bio-logger Ethogram Benchmark: A benchmark for computational analysis of animal behavior, using animal-borne tags [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7807280
Explore at:
Dataset updated
Apr 19, 2024
Dataset provided by
Mata-Silva, Vicente
DeSantis, Dominic L.
Hoffman, Benjamin
Vehkaoja, Antti
Maekawa, Takuya
Cusimano, Maddie
Jeantet, Lorène
Baglione, Vittorio
Vainio, Outi
Zacarian, Katherine
Canestrari, Daniela
Friedlaender, Ari
Chevallier, Damien
Ladds, Monique A.
Moreno-González, Víctor
Yoda, Ken
Trapote, Eva
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains the datasets and experiment results presented in our arxiv paper:

B. Hoffman, M. Cusimano, V. Baglione, D. Canestrari, D. Chevallier, D. DeSantis, L. Jeantet, M. Ladds, T. Maekawa, V. Mata-Silva, V. Moreno-González, A. Pagano, E. Trapote, O. Vainio, A. Vehkaoja, K. Yoda, K. Zacarian, A. Friedlaender, "A benchmark for computational analysis of animal behavior, using animal-borne tags," 2023.

Standardized code to implement, train, and evaluate models can be found at https://github.com/earthspecies/BEBE/.

Please note the licenses in each dataset folder.

Zip folders beginning with "formatted": These are the datasets we used to run the experiments reported in the benchmark paper.

Zip folders beginning with "raw": These are the unprocessed datasets used in BEBE. Code to process these raw datasets into the formatted ones used by BEBE can be found at https://github.com/earthspecies/BEBE-datasets/.

Zip folders beginning with "experiments": Results of the cross-validation experiments reported in the paper, as well as hyperparameter optimization. Confusion matrices for all experiments can also be found here. Note that dt, rf, and svm refer to the feature set from Nathan et al., 2012.

Results used in Fig. 4 of arxiv paper (deep neural networks vs. classical models){dataset}_ harnet_nogyr{dataset}_CRNN{dataset}_CNN{dataset}_dt{dataset}_rf{dataset}_svm{dataset}_wavelet_dt{dataset}_wavelet_rf{dataset}_wavelet_svm

Results used in Fig. 5D of arxiv paper (full data setting)If dataset contains gyroscope (HAR, jeantet_turtles, vehkaoja_dogs):{dataset}_harnet_nogyr{dataset}_harnet_random_nogyr{dataset}_harnet_unfrozen_nogyr{dataset}_RNN_nogyr{dataset}_CRNN_nogyr{dataset}_rf_nogyrOtherwise:{dataset}_harnet_nogyr{dataset}_harnet_unfrozen_nogyr{dataset}_harnet_random_nogyr{dataset}_RNN_nogyr{dataset}_CRNN{dataset}_rf

Results used in Fig. 5E of arxiv paper (reduced data setting)If dataset contains gyroscope (HAR, jeantet_turtles, vehkaoja_dogs):{dataset}_harnet_low_data_nogyr{dataset}_harnet_random_low_data_nogyr{dataset}_harnet_unfrozen_low_data_nogyr{dataset}_RNN_low_data_nogyr{dataset}_wavelet_RNN_low_data_nogyr{dataset}_CRNN_low_data_nogyr{dataset}_rf_low_data_nogyr

Otherwise:{dataset}_harnet_low_data_nogyr{dataset}_harnet_random_low_data_nogyr{dataset}_harnet_unfrozen_low_data_nogyr{dataset}_RNN_low_data_nogyr{dataset}_wavelet_RNN_low_data_nogyr{dataset}_CRNN_low_data{dataset}_rf_low_data

CSV files: we also include summaries of the experimental results in experiments_summary.csv, experiments_by_fold_individual.csv, experiments_by_fold_behavior.csv.

experiments_summary.csv - results averaged over individuals and behavior classesdataset (str): name of datasetexperiment (str): name of model with experiment setting fig4 (bool): True if dataset+experiment was used in figure 4 of arxiv paperfig5d (bool): True if dataset+experiment was used in figure 5d of arxiv paperfig5e (bool): True if dataset+experiment was used in figure 5e of arxiv paperf1_mean (float): mean of macro-averaged F1 score, averaged over individuals in test foldsf1_std (float): standard deviation of macro-averaged F1 score, computed over individuals in test foldsprec_mean, prec_std (float): analogous for precisionrec_mean, rec_std (float): analogous for recallexperiments_by_fold_individual.csv - results per individual in the test foldsdataset (str): name of datasetexperiment (str): name of model with experiment setting fig4 (bool): True if dataset+experiment was used in figure 4 of arxiv paperfig5d (bool): True if dataset+experiment was used in figure 5d of arxiv paperfig5e (bool): True if dataset+experiment was used in figure 5e of arxiv paperfold (int): test fold indexindividual (int): individuals are numbered zero-indexed, starting from fold 1f1 (float): macro-averaged f1 score for this individualprecision (float): macro-averaged precision for this individualrecall (float): macro-averaged recall for this individual

experiments_by_fold_behavior.csv - results per behavior class, for each test folddataset (str): name of datasetexperiment (str): name of model with experiment setting fig4 (bool): True if dataset+experiment was used in figure 4 of arxiv paperfig5d (bool): True if dataset+experiment was used in figure 5d of arxiv paperfig5e (bool): True if dataset+experiment was used in figure 5e of arxiv paperfold (int): test fold indexbehavior_class (str): name of behavior classf1 (float): f1 score for this behavior, averaged over individuals in the test foldprecision (float): precision for this behavior, averaged over individuals in the test foldrecall (float): recall for this behavior, averaged over individuals in the test foldtrain_ground_truth_label_counts (int): number of timepoints labeled with this behavior class, in the training set
Additional file 3: of OMeta: an ontology-based, data-driven metadata...
springernature.figshare.com
txt
Updated Jun 1, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Indresh Singh; Mehmet Kuscuoglu; Derek Harkins; Granger Sutton; Derrick Fouts; Karen Nelson (2023). Additional file 3: of OMeta: an ontology-based, data-driven metadata tracking system [Dataset]. http://doi.org/10.6084/m9.figshare.7552592.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7552592.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Indresh Singh; Mehmet Kuscuoglu; Derek Harkins; Granger Sutton; Derrick Fouts; Karen Nelson
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MixS human Gut Data Standard ProjectSetup. Example CSV file for setting up project registration and update events for MixS human Gut Data Standard. (CSV 7 kb)

CODE-test: An annotated 12-lead ECG dataset

zenodo.org
data.niaid.nih.gov

zip

Updated Jun 7, 2021

Facebook

Twitter

Click to copy link

Link copied

Cite

Antonio H Ribeiro; Antonio H Ribeiro; Manoel Horta Ribeiro; Manoel Horta Ribeiro; Gabriela M. Paixão; Gabriela M. Paixão; Derick M. Oliveira; Derick M. Oliveira; Paulo R. Gomes; Paulo R. Gomes; Jéssica A. Canazart; Jéssica A. Canazart; Milton P. Ferreira; Milton P. Ferreira; Carl R. Andersson; Carl R. Andersson; Peter W. Macfarlane; Peter W. Macfarlane; Wagner Meira Jr.; Wagner Meira Jr.; Thomas B. Schön; Thomas B. Schön; Antonio Luiz P. Ribeiro; Antonio Luiz P. Ribeiro (2021). CODE-test: An annotated 12-lead ECG dataset [Dataset]. http://doi.org/10.5281/zenodo.3765780

Explore at:

zipAvailable download formats

Unique identifier

https://doi.org/10.5281/zenodo.3765780

Dataset updated

Jun 7, 2021

Dataset provided by

Zenodohttp://zenodo.org/

Authors

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

# Annotated 12 lead ECG dataset

Contain 827 ECG tracings from different patients, annotated by several cardiologists, residents and medical students. It is used as test set on the paper: "Automatic diagnosis of the 12-lead ECG using a deep neural network". https://www.nature.com/articles/s41467-020-15432-4.

It contain annotations about 6 different ECGs abnormalities:
- 1st degree AV block (1dAVb);
- right bundle branch block (RBBB);
- left bundle branch block (LBBB);
- sinus bradycardia (SB);
- atrial fibrillation (AF); and,
- sinus tachycardia (ST).

Companion python scripts are available in:
https://github.com/antonior92/automatic-ecg-diagnosis

--------

Citation
```
Ribeiro, A.H., Ribeiro, M.H., Paixão, G.M.M. et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 11, 1760 (2020). https://doi.org/10.1038/s41467-020-15432-4
```

Bibtex:
```
@article{ribeiro_automatic_2020,
 title = {Automatic Diagnosis of the 12-Lead {{ECG}} Using a Deep Neural Network},
 author = {Ribeiro, Ant{\^o}nio H. and Ribeiro, Manoel Horta and Paix{\~a}o, Gabriela M. M. and Oliveira, Derick M. and Gomes, Paulo R. and Canazart, J{\'e}ssica A. and Ferreira, Milton P. S. and Andersson, Carl R. and Macfarlane, Peter W. and Meira Jr., Wagner and Sch{\"o}n, Thomas B. and Ribeiro, Antonio Luiz P.},
 year = {2020},
 volume = {11},
 pages = {1760},
 doi = {https://doi.org/10.1038/s41467-020-15432-4},
 journal = {Nature Communications},
 number = {1}
}
```
-----


## Folder content:

- `ecg_tracings.hdf5`: The HDF5 file containing a single dataset named `tracings`. This dataset is a `(827, 4096, 12)` tensor. The first dimension correspond to the 827 different exams from different patients; the second dimension correspond to the 4096 signal samples; the third dimension to the 12 different leads of the ECG exams in the following order: `{DI, DII, DIII, AVR, AVL, AVF, V1, V2, V3, V4, V5, V6}`. 

The signals are sampled at 400 Hz. Some signals originally have a duration of 10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples). In order to make them all have the same size (4096 samples) we fill them with zeros on both sizes. For instance, for a 7 seconds ECG signal with 2800 samples we include 648 samples at the beginning and 648 samples at the end, yielding 4096 samples that are them saved in the hdf5 dataset. All signal are represented as floating point numbers at the scale 1e-4V: so it should be multiplied by 1000 in order to obtain the signals in V.

In python, one can read this file using the following sequence:
```python
import h5py
with h5py.File(args.tracings, "r") as f:
  x = np.array(f['tracings'])
```

- The file `attributes.csv` contain basic patient attributes: sex (M or F) and age. It
contain 827 lines (plus the header). The i-th tracing in `ecg_tracings.hdf5` correspond to the i-th line.
- `annotations/`: folder containing annotations csv format. Each csv file contain 827 lines (plus the header). The i-th line correspond to the i-th tracing in `ecg_tracings.hdf5` correspond to the in all csv files. The csv files all have 6 columns `1dAVb, RBBB, LBBB, SB, AF, ST`
corresponding to weather the annotator have detect the abnormality in the ECG (`=1`) or not (`=0`).
 1. `cardiologist[1,2].csv` contain annotations from two different cardiologist.
 2. `gold_standard.csv` gold standard annotation for this test dataset. When the cardiologist 1 and cardiologist 2 agree, the common diagnosis was considered as gold standard. In cases where there was any disagreement, a third senior specialist, aware of the annotations from the other two, decided the diagnosis. 
 3. `dnn.csv` prediction from the deep neural network described in the paper. THe threshold is set in such way it maximizes the F1 score.
 4. `cardiology_residents.csv` annotations from two 4th year cardiology residents (each annotated half of the dataset).
 5. `emergency_residents.csv` annotations from two 3rd year emergency residents (each annotated half of the dataset).
 6. `medical_students.csv` annotations from two 5th year medical students (each annotated half of the dataset).

Facebook

Twitter

Click to copy link

Link copied

Cite

Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas (2023). ESS-DIVE Reporting Format for Comma-separated Values (CSV) File Structure [Dataset]. https://dataone.org/datasets/ess-dive-2d07e9e9b2bb3f3-20230504T212247921492

Data from: ESS-DIVE Reporting Format for Comma-separated Values (CSV) File Structure

Explore at:

Dataset updated

May 4, 2023

Dataset provided by

ESS-DIVE

Authors

Terri Velliquette; Jessica Welch; Michael Crow; Ranjeet Devarakonda; Susan Heinz; Robert Crystal-Ornelas

Time period covered

Jan 1, 2020 - Sep 30, 2021

Description

The ESS-DIVE reporting format for Comma-separated Values (CSV) file structure is based on a combination of existing guidelines and recommendations including some found within the Earth Science Community with valuable input from the Environmental Systems Science (ESS) Community. The CSV reporting format is designed to promote interoperability and machine-readability of CSV data files while also facilitating the collection of some file-level metadata content. Tabular data in the form of rows and columns should be archived in its simplest form, and we recommend submitting these tabular data following the ESS-DIVE reporting format for generic comma-separated values (CSV) text format files. In general, the CSV file format is more likely accessible by future systems when compared to a proprietary format and CSV files are preferred because this format is easier to exchange between different programs increasing the interoperability of a data file. By defining the reporting format and providing guidelines for how to structure CSV files and some field content within, this can increase the machine-readability of the data file for extracting, compiling, and comparing the data across files and systems. Data package files are in .csv, .png, and .md. Open the .csv with e.g. Microsoft Excel, LibreOffice, or Google Sheets. Open the .md files by downloading and using a text editor (e.g., notepad or TextEdit). Open the .png in e.g. a web browser, photo viewer/editor, or Google Drive.

Clear search

Close search

Google apps

Main menu

Data from: ESS-DIVE Reporting Format for Comma-separated Values (CSV) File...

Tidal Daily Discharge and Quality Assurance Data Supporting an Assessment of...

Data from: "A guide to using GitHub for developing and versioning data...

myhealthlondon Indicators

Dataset metadata of known Dataverse installations

Data from: Dissolved Inorganic Carbon and Dissolved Organic Carbon Data for...

Data articles in journals

Annotated 12 lead ECG dataset

A Comprehensive Surface Water Quality Monitoring Dataset (1940-2023):...

Observations of carbon dioxide (CO2), methane (CH4), and carbon monoxide...

exp 5.csv

MOT testing data for Great Britain

About this data set

MOT test results by class

MOT test results by class of vehicle

Initial failures by defect category

MOT class 1 and 2 vehicles: initial failures by defect category

MOT class 3 and 4 vehicles: initial failures by defect category</h3

Article 6 of Noise Control Standards: Noise Control Standard Values for...

nzqa_exam_questions_contextual_population_parameter_definitions.csv -...

Data from: ESS-DIVE Reporting Format for File-level Metadata

Standards that were repealed in 2017

Standard observation data of the equatorial troposphere and lower...

Data from: Bio-logger Ethogram Benchmark: A benchmark for computational...

Additional file 3: of OMeta: an ontology-based, data-driven metadata...

CODE-test: An annotated 12-lead ECG dataset

Data from: ESS-DIVE Reporting Format for Comma-separated Values (CSV) File Structure