15 datasets found

m
An Extensive Dataset for the Heart Disease Classification System
data.mendeley.com
Updated Feb 15, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sozan S. Maghdid (2022). An Extensive Dataset for the Heart Disease Classification System [Dataset]. http://doi.org/10.17632/65gxgy2nmg.1
Explore at:
Unique identifier
https://doi.org/10.17632/65gxgy2nmg.1
Dataset updated
Feb 15, 2022
Authors
Sozan S. Maghdid
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Finding a good data source is the first step toward creating a database. Cardiovascular illnesses (CVDs) are the major cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70 A comprehensive database for factors that contribute to a heart attack has been constructed , The main purpose here is to collect characteristics of Heart Attack or factors that contribute to it. As a result, a form is created to accomplish this. Microsoft Excel was used to create this form. Figure 1 depicts the form which It has nine fields, where eight fields for input fields and one field for output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and Test-Troponin are representing the input fields, while the output field pertains to the presence of heart attack, which is divided into two categories (negative and positive).negative refers to the absence of a heart attack, while positive refers to the presence of a heart attack.Table 1 show the detailed information and max and min of values attributes for 1319 cases in the whole database.To confirm the validity of this data, we looked at the patient files in the hospital archive and compared them with the data stored in the laboratories system. On the other hand, we interviewed the patients and specialized doctors. Table 2 is a sample for 1320 cases, which shows 44 cases and the factors that lead to a heart attack in the whole database,After collecting this data, we checked the data if it has null values (invalid values) or if there was an error during data collection. The value is null if it is unknown. Null values necessitate special treatment. This value is used to indicate that the target isn’t a valid data element. When trying to retrieve data that isn't present, you can come across the keyword null in Processing. If you try to do arithmetic operations on a numeric column with one or more null values, the outcome will be null. An example of a null values processing is shown in Figure 2.The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs received equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages. The first is to avoid overshadowing qualities in smaller numeric ranges by employing attributes in larger numeric ranges. The second goal is to avoid any numerical problems throughout the process.After completion of the normalization process, we split the data set into two parts - training and test sets. In the test, we have utilized1060 for train 259 for testing Using the input and output variables, modeling was implemented.
d
Data and Code for: \"Universal Adaptive Normalization Scale (AMIS):...
search.dataone.org
dataverse.harvard.edu
Updated Nov 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kravtsov, Gennady (2025). Data and Code for: \"Universal Adaptive Normalization Scale (AMIS): Integration of Heterogeneous Metrics into a Unified System\" [Dataset]. http://doi.org/10.7910/DVN/BISM0N
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/BISM0N
Dataset updated
Nov 15, 2025
Dataset provided by
Harvard Dataverse
Authors
Kravtsov, Gennady
Description
Dataset Title: Data and Code for: "Universal Adaptive Normalization Scale (AMIS): Integration of Heterogeneous Metrics into a Unified System" Description: This dataset contains source data and processing results for validating the Adaptive Multi-Interval Scale (AMIS) normalization method. Includes educational performance data (student grades), economic statistics (World Bank GDP), and Python implementation of the AMIS algorithm with graphical interface. Contents: - Source data: educational grades and GDP statistics - AMIS normalization results (3, 5, 9, 17-point models) - Comparative analysis with linear normalization - Ready-to-use Python code for data processing Applications: - Educational data normalization and analysis - Economic indicators comparison - Development of unified metric systems - Methodology research in data scaling Technical info: Python code with pandas, numpy, scipy, matplotlib dependencies. Data in Excel format.
d
Data from: A systematic evaluation of normalization methods and probe...
datadryad.org
data.niaid.nih.gov
+2more
zip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra (2023). A systematic evaluation of normalization methods and probe replicability using infinium EPIC methylation data [Dataset]. http://doi.org/10.5061/dryad.cnp5hqc7v
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.cnp5hqc7v
Dataset updated
May 30, 2023
Dataset provided by
Dryad
Authors
H. Welsh; C. M. P. F. Batalha; W. Li; K. L. Mpye; N. C. Souza-Pinto; M. S. Naslavsky; E. J. Parra
Time period covered
Apr 11, 2023
Description
We provide data on an Excel file, with absolute differences in beta values between replicate samples for each probe provided in different tabs for raw data and different normalization methods.
m
Spatial segmented protein expression profiles from IBD and IBS-associated...
data.mendeley.com
Updated Jan 10, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jeffrey Robinson (2020). Spatial segmented protein expression profiles from IBD and IBS-associated colonic biopsy FFPE sections using Nanostring GeoMx/DSP [Dataset]. http://doi.org/10.17632/kytd2vdkms.1
Explore at:
Unique identifier
https://doi.org/10.17632/kytd2vdkms.1
Dataset updated
Jan 10, 2020
Authors
Jeffrey Robinson
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains results from Nanostring Digital Spatial Profiling (DSP, trade name is now GeoMx) experiments using colonic punch biopsy FFPE thin sections from IBD and IBS patients. The multiplex probe panel includes barcode-linked antibodies against 26 immune-oncology relevant proteins and 4 reference/normalization proteins.

The IF labeling strategy included Pan-cytokeratin, Tryptase, and DAPI staining for epithelia, mast cells, and sub-mucosal tissues, respectively. 21 FFPE sections were used, representing 19 individuals. 14 pediatric samples included 8 IBD, 5 IBS, and 1 recurring abdominal pain diagnoses. 7 adult samples were studied - 2 normal tissue biopsies from a single healthy control, 3 X-linked Severe Combined Immuno Deficiency (XSCID) samples from 2 individuals, 1 graft-versus-host disease, and 1 eosinophilic gastroenteritis sample. 8 representative ROIs per slide were selected, with a 9th ROI selected representing a lymphoid aggregate where present. Each of the ROIs contained the three masks (PanCK/epithelia, Tryptase/Mast cell, Dapi/submucosa), and therefore generated 24 individual 30-plex protein expression profiles per slide, with a 25th lymphoid ROI per sample (when present).

The data include: 1) Matrix of metadata with sample identifiers and clinical diagnoses (Excel file). 2) A PowerPoint for each sample showing an image of the full slide, images of each selected ROI and QC expression data. 3) An Excel file for each sample containing raw and normalized protein counts. Three normalization methods are reported: a) Normalization by nuclei count, b) Normalization by tissue area, c) Normalization by housekeeping proteins (Histone H3, Ribosomal protein S6).

Analysis derived from these data have been published in two conference proceedings (see references below)
Additional file 2 of On the optimistic performance evaluation of newly...
figshare.com
datasetcatalog.nlm.nih.gov
+1more
xlsx
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stefan Buchka; Alexander Hapfelmeier; Paul P. Gardner; Rory Wilson; Anne-Laure Boulesteix (2023). Additional file 2 of On the optimistic performance evaluation of newly introduced bioinformatic methods [Dataset]. http://doi.org/10.6084/m9.figshare.14576336.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14576336.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Stefan Buchka; Alexander Hapfelmeier; Paul P. Gardner; Rory Wilson; Anne-Laure Boulesteix
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 2 Data set (excel file). The excel data file data_set_of_extracted_data_Buchka_et_al.xlsx contains the data from our bibliographical survey.
Amazon Financial Dataset
kaggle.com
zip
Updated Dec 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Krishna Yadu (2024). Amazon Financial Dataset [Dataset]. https://www.kaggle.com/datasets/krishnayadav456wrsty/amazon-financial-dataset
Explore at:
zip(7415 bytes)Available download formats
Dataset updated
Dec 18, 2024
Authors
Krishna Yadu
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Title:

Amazon Financial Dataset: R&D, Marketing, Campaigns, and Profit

Description:

This dataset provides fictional yet insightful financial data of Amazon's business activities across all 50 states of the USA. It is specifically designed to help students, researchers, and practitioners perform various data analysis tasks such as log normalization, Gaussian distribution visualization, and financial performance comparisons.

Each row represents a state and contains the following columns:
- R&D Amount (in $): The investment made in research and development.
- Marketing Amount (in $): The expenditure on marketing activities.
- Campaign Amount (in $): The costs associated with promotional campaigns.
- State: The state in which the data is recorded.
- Profit (in $): The net profit generated from the state.

Additional features include log-normalized and Z-score transformations for advanced analysis.

Use Cases:

This dataset is ideal for practicing:
1. Log Transformation: Normalize skewed data for better modeling and analysis.
2. Statistical Analysis: Explore relationships between financial investments and profit.
3. Visualization: Create compelling graphs such as Gaussian distributions and standard normal distributions.
4. Machine Learning Projects: Build regression models to predict profits based on R&D and marketing spend.

File Information:

File Format: Excel (.xlsx)

Number of Records: 50 (one for each state of the USA)

Columns: 5 primary financial columns and additional preprocessed columns for normalization and Z-scores.

Important Note:

This dataset is synthetically generated and is not based on actual Amazon financial records. It is created solely for educational and practice purposes.

Tags:

Financial Analysis

Data Visualization

Machine Learning

Statistical Analysis

Educational Dataset
m
Data for Knowledge gaps in Latin America and the Caribbean and economic...
data.mendeley.com
Updated Oct 1, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Pablo Jarrin (2020). Data for Knowledge gaps in Latin America and the Caribbean and economic development [Dataset]. http://doi.org/10.17632/5j28czhtb7.1
Explore at:
Unique identifier
https://doi.org/10.17632/5j28czhtb7.1
Dataset updated
Oct 1, 2020
Authors
Pablo Jarrin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Latin America, Caribbean
Description
We provide the data used for this research in both Excel (one file with one matrix per sheet, 'Allmatrices.xlsx'), and CSV (one file per matrix).

Patent applications (Patent_applications.csv) Patent applications from residents and no residents per million inhabitants. Data obtained from the World Development Indicators database (World Bank 2020). Normalization by the number of inhabitants was made by the authors.

High-tech exports (High-tech_exports.csv) The proportion of exports of high-level technology manufactures from total exports by technology intensity, obtained from the Trade Structure by Partner, Product or Service-Category database (Lall, 2000; UNCTAD, 2019)

Expenditure on education (Expenditure_on_education.csv) Per capita government expenditure on education, total (2010 US$). The data was obtained from the government expenditure on education (total % of GDP), GDP (constant 2010 US$), and population indicators of the World Development Indicators database (World Bank 2020). Normalization by the number of inhabitants was made by the authors.

Scientific publications (Scientific_publications.csv) Scientific and technical journal articles per million inhabitants. The data were obtained from the scientific and technical journal articles and population indicators of the World Development Indicators database (World Bank 2020). Normalization by the number of inhabitants was made by the authors.

Expenditure on R&D (Expenditure_on_R&D.csv) Expenditure on research and development. Data obtained from the research and development expenditure (% of GDP), GDP (constant 2010 US$), and population indicators of the World Development Indicators database (World Bank 2020). Normalization by the number of inhabitants was made by the authors.

Two centuries of GDP (GDP_two_centuries.csv) GDP per capita that accounts for inflation. Data obtained from the Maddison Project Database, version 2018 (Inklaar et al. 2018), and available from the Open Numbers community (open-numbers.github.io).

Inklaar, R., de Jong, H., Bolt, J., & van Zanden, J. (2018). Rebasing “Maddison”: new income comparisons and the shape of long-run economic development (GD-174; GGDC Research Memorandum). https://www.rug.nl/research/portal/files/53088705/gd174.pdf

Lall, S. (2000). The Technological Structure and Performance of Developing Country Manufactured Exports, 1985‐98. Oxford Development Studies, 28(3), 337–369. https://doi.org/10.1080/713688318

Unctad. 2019. “Trade Structure by Partner, Product or Service-Category.” 2019. https://unctadstat.unctad.org/EN/.

World Bank. (2020). World Development Indicators. https://databank.worldbank.org/source/world-development-indicators
Data articles in journals
zenodo.org
bin, csv, txt
Updated Sep 21, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Carlota Balsa-Sanchez; Carlota Balsa-Sanchez; Vanesa Loureiro; Vanesa Loureiro (2023). Data articles in journals [Dataset]. http://doi.org/10.5281/zenodo.7458466
Explore at:
bin, txt, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7458466
Dataset updated
Sep 21, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Carlota Balsa-Sanchez; Carlota Balsa-Sanchez; Vanesa Loureiro; Vanesa Loureiro
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Last Version: 4

Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

Date of data collection: 2022/12/15

General description: The publication of datasets according to the FAIR principles, could be reached publishing a data paper (or software paper) in data journals or in academic standard journals. The excel and CSV file contains a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v4.xlsx: full list of 140 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v4.csv: full list of 140 academic journals in which data papers or/and software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 4th version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR), Scopus and Web of Science (WOS), Journal Master List.

Version: 3

Authors: Carlota Balsa-Sánchez, Vanesa Loureiro

Date of data collection: 2022/10/28

General description: The publication of datasets according to the FAIR principles, could be reached publishing a data paper (or software paper) in data journals or in academic standard journals. The excel and CSV file contains a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v3.xlsx: full list of 124 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_3.csv: full list of 124 academic journals in which data papers or/and software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 3rd version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Journal Citation Reports (JCR) and/or Scimago Journal and Country Rank (SJR).

Erratum - Data articles in journals Version 3:

Botanical Studies -- ISSN 1999-3110 -- JCR (JIF) Q2
Data -- ISSN 2306-5729 -- JCR (JIF) n/a
Data in Brief -- ISSN 2352-3409 -- JCR (JIF) n/a

Version: 2

Author: Francisco Rubio, Universitat Politècnia de València.

Date of data collection: 2020/06/23

General description: The publication of datasets according to the FAIR principles, could be reached publishing a data paper (or software paper) in data journals or in academic standard journals. The excel and CSV file contains a list of academic journals that publish data papers and software papers.
File list:

- data_articles_journal_list_v2.xlsx: full list of 56 academic journals in which data papers or/and software papers could be published
- data_articles_journal_list_v2.csv: full list of 56 academic journals in which data papers or/and software papers could be published

Relationship between files: both files have the same information. Two different formats are offered to improve reuse

Type of version of the dataset: final processed version

Versions of the files: 2nd version
- Information updated: number of journals, URL, document types associated to a specific journal, publishers normalization and simplification of document types
- Information added : listed in the Directory of Open Access Journals (DOAJ), indexed in Web of Science (WOS) and quartile in Scimago Journal and Country Rank (SJR)

Total size: 32 KB

Version 1: Description

This dataset contains a list of journals that publish data articles, code, software articles and database articles.

The search strategy in DOAJ and Ulrichsweb was the search for the word data in the title of the journals.
Acknowledgements:
Xaquín Lores Torres for his invaluable help in preparing this dataset.
m
The Notched Roller Test (NRT): Effective Volumes and Effective Surfaces for...
data.mendeley.com
Updated Oct 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maximilian Staudacher (2025). The Notched Roller Test (NRT): Effective Volumes and Effective Surfaces for Weibull Strength Scaling [Dataset]. http://doi.org/10.17632/pdxy669fvk.1
Explore at:
Unique identifier
https://doi.org/10.17632/pdxy669fvk.1
Dataset updated
Oct 27, 2025
Authors
Maximilian Staudacher
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In order to compare strength testing results of ceramic specimens obtained through different testing methods, the knowledge of the effective surface or effective volume is essential.

In this repository, data to determine the maximum tensile stress, the effective surface and effective volume for the "Notched Roller Test", described in [https://doi.org/10.1016/j.jeurceramsoc.2014.02.009], is given. The relevant geometrical and material parameters to determine the effective surface or effective volume are:

-Roller diameter D -Roller length H -Roller chamfering radius rf -Notch length l -Notch width w -Notch root radius rn -Poisson's ratio v -Weibull modulus m

The data is available within:

1 <= H/D <= 3 0 <= rf/D <= 0.05 0.74 <= l/D <= 0.9 0.05 <= w/D <= 0.2 0 <= rn/w <= 0.5 0.1 <= v <= 0.4 1 <= m <=50

Based on the data for stress interpolation, the maximum tensile stress can be determined from an interpolation of "finter" and the relevant geometrical properties (see equation 1 in the paper cited above). The normalized effective surface or effective volume can be determined through interpolation of the Seff and Veff data of this repository in the same way. The normalization volume Vnorm and normalization surface Snorm are given through the volume (= Pi*H*(D/2)^2) and surface (= Pi*H*D + 2*Pi*(D/2)^2) of the roller, respectively. To aid evaluation, interpolation files in Python, Excel and Mathematica are also provided in this repository.

Additional information:

-Data-files (.csv,.tsv,.xlsx)

The structure of the data in each file for stress evaluation is as follows:

H/D || rf/D || l/D || w/D || rn/w || v || finter

All files provided follow this convention, and the permutation follows v -> rn/w -> w/D -> l/D -> rf/D -> H/D

The structure of the data in each file for the evaluation of Veff and Seff is as follows:

H/D || rf/D || l/D || w/D || rn/w || v || m || Veff/Vnorm || Seff/Snorm

All files provided follow this convention, and the permutation follows m -> v -> rn/w -> w/D -> l/D -> rf/D -> H/D

-Interpolation files (.xlsx,.py,.nb)

The Interpolation implemented in the Excel-file is linear, while the others are cubic. The results from Python- and Mathematica-files vary slightly.

Excel-file:

Entering the specimen geometry and material parameters will automatically adjust the values for the maximum tensile stress and all effective quantities.

Python-file:

The .csv-files have to be in the same directory as the script. Running the script opens prompts in the command line to enter the specimen geometry and material parameters. Results for the maximum tensile stress and all effective quantities are given.

Mathematica-file:

The .csv-files have to be in the same directory as the script. The rows marked in red represent the input-lines for the specimen geometry and material parameters. Afterwards, results for the maximum tensile stress and all effective quantities are given in lines highlighted in green.
Data from: Blood proteome profiling using aptamer-based technology for...
figshare.com
txt
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrey Shubin; Branislav Kollar; Simon Dillon; Bohdan Pomahac; Towia A. Libermann; Leonardo V. Riella (2023). Blood proteome profiling using aptamer-based technology for rejection biomarker discovery in transplantation [Dataset]. http://doi.org/10.6084/m9.figshare.7991924.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7991924.v1
Dataset updated
May 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Andrey Shubin; Branislav Kollar; Simon Dillon; Bohdan Pomahac; Towia A. Libermann; Leonardo V. Riella
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this study, blood proteome characterization in face transplantation using longitudinal serum samples from six face transplant patients was carried out with SOMAscan platform. Overall, 24 serum samples from 13 no-rejection, 5 nonsevere rejection and 6 severe rejection episodes were analyzed.Files attached:HMS-16-007.20160218.adat - raw SomaScan dataset presented in adat format.HMS-16-007_SQS_20160218.pdf - technical validation report on the dataset.HMS-16-007.HybNorm.20160218.adat - SomaScan dataset after hybridization control normalization presented in adat format.HMS-16-007.HybNorm.MedNorm.20160218.adat - SomaScan dataset after hybridization control normalization and median signal normalization presented in adat format.HMS-16-007.HybNorm.MedNorm.Cal.20160218.adat - SomaScan dataset after hybridization control normalization, median signal normalization, and calibration presented in adat format.HMS-16-007.HybNorm.MedNorm.Cal.20160218.xls - SomaScan dataset after hybridization control normalization, median signal normalization, and calibration presented in Microsoft Excel Spreadsheet format.Patients_metadata.txt – metadata file containing patients’ demographic and clinical information presented in tab-delimited text format. Metadata is linked to records in the SomaScan dataset via ‘SampleType’ column.SciData_R_script.R – this script is given as an example of a downstream statistical analysis of the HMS-16-007.HybNorm.MedNorm.Cal.20160218.adat dataset.SciData_R_script_SessionInfo - Session information for SciData_R_script.R script.
Human-in-the-Loop Decision Support in Process Control Room Dataset
zenodo.org
zip
Updated Mar 8, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ammar N. Abbas; Winniewelsh; Ammar N. Abbas; Winniewelsh (2024). Human-in-the-Loop Decision Support in Process Control Room Dataset [Dataset]. http://doi.org/10.5281/zenodo.10695810
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10695810
Dataset updated
Mar 8, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Ammar N. Abbas; Winniewelsh; Ammar N. Abbas; Winniewelsh
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

This repository contains a comprehensive dataset to assess cognitive states, workload, situational awareness, stress, and performance in human-in-the-loop process control rooms. The dataset includes objective and subjective measures from various data collection tools such as NASA-TLX, SART, eye tracking, EEG, Health Monitoring Watch, surveys, and think-aloud situational awareness assessments. It is based on an experimental study of a formaldehyde production plant based on participants' interactions in a controlled control room experimental setting.

Purpose

The study compared three different setups of human system interfaces in four human-in-the-loop (HITL) configurations, incorporating two alarm design formats (Prioritised vs non-prioritised) and three procedural guidance setups (e.g. one presenting paper procedures, one offering digitised screen-based procedures, and lastly an AI-based procedural guidance system).

Key Features

Subject Area: Chemical Engineering, Control and Safety Engineering, Human Factors and Ergonomics, Human-Computer Interaction, and Artificial Intelligence

Data Format: Raw, Analyzed, Filtered

Type of Data: CSV File (.csv), Matlab File (.mat), Excel (.xlsx), Table

Data Collection: The dataset contains behavioural, cognitive, and performance data from 92 participants, including system data under each participant from three scenarios, each simulating a typical control room monitoring, alarm handling, planning, and intervention tasks and subtasks. The participants consented to participate on the test day, after which the researchers trained them. They performed tasks under three scenarios, each lasting 15 - 18 minutes. During these tests, the participant wore a watch for health monitoring, including an eye tracker. They were asked situational awareness questions based on the SPAM methodology at specific periods within 15 minutes, especially at the 6th, 8th, and 12th minutes. These questions assessed the three levels of situational awareness: perception, comprehension, and projection. This feedback collection process on situational awareness differed for one of the groups that used an AI-based decision support system. The question for this group was asked right after specific actions. Therefore, for the overall study, the following performance-shaping factors are considered: type of decision support system (alarm display design, procedure format, AI support, interface design), communication, situational awareness, cognitive workload, experience/training, task complexity, and stress. In both cases, communication was excluded as a factor considered in the first and second scenarios based on this absence. The data collected was normalized using the Min-Max normalization.

Potential Applications

The dataset provides an opportunity for various applications, including:

Developing human performance models and process safety models

Developing a digital twin simulating human-machine interaction in process control rooms

Optimizing human-AI interaction in safety-critical industries

Qualifying and quantifying the performance and effectiveness of AI-enhanced decision support systems incorporating Deep Reinforcement Learning (DRL) using a Specialized Reinforcement Learning Agent (SRLA) framework

Validating proposed solutions for the industry

Usage

The dataset is instrumental for researchers, decision-makers, system engineers, human factor engineers, and teams developing guidelines and standards. It is also applicable for validating proposed solutions for the industry and for researchers in similar or close domains.

Data Structure

The concatenated Excel file for the dataset may include the following detailed data:

Demographic and Educational Background Data:

Participant Identifier: A unique alphanumeric code assigned to each participant for anonymity and tracking purposes.

Age: The age of each participant at the time of the experiment.

Gender: The gender of each participant, typically categorized as male, female, or other.

Educational Background: Details of participants' academic qualifications, including degree type (e.g., Masters, PhD), year of study, and field of study (e.g., Chemical Engineering, IT).

Dominant Hand: Information on whether participants are right or left-handed, which could influence their interaction with the simulation interface.

Familiarity with Industry and Control Room: Self-reported familiarity levels with the industry in general and control room environments specifically, on a scale from 1 to 5.

SPAM Metrics:

Participant Identifier: Unique codes for participants (e.g., P04, P06), maintaining anonymity while allowing for individual analysis.

Group Assignment: Indicates the experimental group (e.g., G4, G3, G2, G1) to which participants belonged, reflecting different levels of decision support in the simulation.

Scenario Engagement: Identifies the specific scenarios (e.g., S1, S2, S3) each participant encountered, representing diverse challenges within the control room simulation.

SPAM Metrics: Participant ratings across three dimensions of the SPAM questionnaire - Perception, Understanding, and Projection, on a scale typically from 1 to 5.

SPAM Index: Composite scores derived from the SPAM, indicating overall situation awareness levels experienced by participants. Calculated as the average of the score on perception, understanding and projection.

NASA-TLX Responses:

Participant Identifier: A unique alphanumeric code assigned to each participant for anonymity and tracking purposes.

Group Assignment: Indicates the experimental group (e.g., G1) to which participants were assigned, reflecting different levels of decision support in the simulation.

TLX Ratings: Participants' responses utilizing the NASA Task Load Index (NASA TLX) questionnaire, providing insights into the cognitive, physical, and emotional workload experienced by operators in simulated control room scenarios.

TLX Index: Composite scores derived from the NASA TLX, representing the overall workload experienced by the participant, calculated as an average of the ratings across the six dimensions.

SART Data:

Participant Identifier: Unique codes for participants (e.g., P04, P06), maintaining anonymity while allowing for individual analysis.

Group Assignment: Indicates the experimental group (e.g., G1) to which participants belonged, reflecting different levels of decision support in the simulation.

SART Metrics: Participants' responses to the Situation Awareness Rating Technique (SART) questionnaire, capturing metrics reflecting the participants' situation awareness. It is calculated using the equation U - (D - S). Situation Understanding (U) comprises Information Quantity, Information Quality, and Familiarity. Situation demand (D) includes the situation's Instability, Complexity, and Variability. At the same time, the Supply of attentional resources (S) comprises Arousal, Concentration, Division of Attention, and Spare Capacity.

AI Decision Support System Feedback:

Participant Identifier: A unique alphanumeric code assigned to each participant for anonymity and tracking purposes.

AI System Ratings: Participants' feedback and ratings across different aspects of the AI decision support system, such as support, explainability, and trust, providing insights into the system's perceived strengths and areas for improvement.

Workload Impact Data: Information on the workload impact and the balance between AI benefits and additional workload, offering valuable perspectives on the practicality and efficiency of integrating AI systems in control room operations.

DRL (Deep Reinforcement Learning) Role: Emphasis on the importance of validating AI recommendations and the role of Deep Reinforcement Learning (DRL) in enhancing trust.

Performance Metrics:

Participant Identifier: A unique alphanumeric code assigned to each participant for anonymity and tracking purposes.

Scenario Engagement: Details of the specific scenario (e.g., S1, S2, S3) each participant encountered, representing various challenges in the control room environment.

Task-Specific Performance Measures: Data capturing the participants' experiences and performance across different scenarios in a control room simulation, including task-specific performance measures and outcomes related to decision-making processes in safety- critical environments.

This detailed breakdown provides a comprehensive view of the specific data elements that could be included in the concatenated Excel file, allowing for thorough analysis and exploration of the participants' experiences, cognitive states, workload, and decision-making processes in control room environments.

Citation

Please cite this article and dataset if you use this dataset in your research or publication.

Amazu, C. W., Mietkiewicz, J., Abbas, A. N., Briwa, H., Perez, A. A., Baldissone, G., ... & Leva, M. C. (2024). Experiment Data: Human-in-the-loop Decision Support in Process Control Rooms. Data in Brief, 110170.
t
Mobility Market Indicators and Macroeconomic Indicators for the MaaS Status...
researchdata.tuwien.at
researchdata.tuwien.ac.at
bin
Updated Jun 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tabea Fian; Tabea Fian (2024). Mobility Market Indicators and Macroeconomic Indicators for the MaaS Status Index (MSI) in Austria (2017-2028) [Dataset]. http://doi.org/10.48436/s7xg9-2hh50
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.48436/s7xg9-2hh50
Dataset updated
Jun 25, 2024
Dataset provided by
TU Wien
Authors
Tabea Fian; Tabea Fian
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Austria
Description
Dataset description
The datasets include mobility market indicators and macroeconomic indicators for Austria, which were used to calculate the Mobility as a Service (MaaS) Status Index (MSI). The MSI evaluates the readiness and potential for implementing Mobility as a Service (MaaS) in Austria. The datasets cover two distinct periods: 2017-2022 (T1) and 2023-2028 (T2). The indicators include annual revenues, vehicle costs, number of users, market shares, GDP per capita, urbanization rates, and investments in transportation infrastructure, among others.
Context and methodology
Each indicator is represented by the average annual growth rate, a mean value, and a normalized mean value (min-max-normalization) for period T1 and T2. The data were sourced from Statista (2024)
Technical details
The dataset contains two Microsoft Excel files (one for mobility market indicators, one for macroeconomic indicators). Other than Microsoft Excel, there is no additional software needed to investigate the data.
S
Negotiation metaphor translation
scidb.cn
Updated Nov 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dahui Dong; Meng-Lin Chen (2025). Negotiation metaphor translation [Dataset]. http://doi.org/10.57760/sciencedb.31607
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Unique identifier
https://doi.org/10.57760/sciencedb.31607
Dataset updated
Nov 20, 2025
Dataset provided by
Science Data Bank
Authors
Dahui Dong; Meng-Lin Chen
Description
This dataset, titled "Negotiation Metaphor Translation," was created as part of a research project investigating the dynamics of metaphor translation in negotiation contexts. The data was generated through a combination of controlled translation tasks, longitudinal studies, and domain-specific analyses, involving both human translators and computational models.Data Generation and ProcessingThe dataset was compiled from multiple sources:Controlled translation tasks were designed and administered to professional translators, focusing on negotiation-related metaphors in Chinese and English.Longitudinal study data was collected over several translation sessions to capture changes in translation strategies and outcomes over time.Domain-specific analysis involved expert annotation and categorization of metaphors and negotiation strategies.Performance metrics were computed using both manual evaluation and automated scoring methods.All data was anonymized to protect participant privacy. Data processing included normalization of text, removal of personally identifiable information, and standardization of file formats.Temporal and Geographical ScopeThe data was collected between 2022 and 2024, primarily involving participants from Taiwan (Province of China) and English-speaking countries. The temporal resolution varies by file, with some files representing single translation sessions and others aggregating data over multiple sessions.Data Structure and File DescriptionsThe dataset is organized as follows:performance_metrics_summary.csv: Contains summary statistics of translation performance, including accuracy, fluency, and adequacy scores. Columns include Task_ID, Translator_ID, Accuracy, Fluency, Adequacy, and Comments.domain_specific_analysis.csv: Provides detailed analysis of metaphors and negotiation strategies by domain. Columns include Domain, Metaphor_Type, Strategy, Frequency, and Notes.negotiation_dynamics.csv: Records the dynamics of negotiation during translation, such as turn-taking, conflict resolution, and schema adaptation. Columns include Session_ID, Turn_Number, Speaker, Action, and Outcome.longitudinal_study_data.csv: Tracks changes in translation strategies and outcomes over time. Columns include Participant_ID, Session_Date, Strategy_Used, Outcome, and Comments.conflict_resolution_strategies.csv: Lists various strategies used to resolve conflicts in metaphor translation. Columns include Strategy_ID, Description, Effectiveness_Rating, and Example.All files are in CSV format and can be opened with standard spreadsheet software such as Microsoft Excel or LibreOffice Calc.Data SizeEach CSV file ranges from approximately 10 KB to 200 KB, depending on the number of entries.Column Names and UnitsEach file contains a header row with descriptive column names. Units of measurement, where applicable, are indicated in the column names or described in the accompanying README file.Missing DataSome entries may contain missing values, indicated by empty cells. These typically occur when a particular metric or annotation was not applicable or could not be determined for a given instance.Data Quality and Error ReportingData was manually checked for consistency and accuracy. Any known errors or limitations are documented in the README.md file included in the dataset.File Formats and SoftwareAll data files are provided in standard CSV format, compatible with most data analysis and spreadsheet tools. No proprietary or rare file formats are used.├── 📊 experimental_results/│ ├── performance_metrics_summary.csv │ ├── domain_specific_analysis.csv│ ├── negotiation_dynamics.csv│ ├── longitudinal_study_data.csv│ ├── conflict_resolution_strategies.csv| └──Readme.md
Excel S1 - A Model-Based Analysis of Chemical and Temporal Patterns of...
plos.figshare.com
xls
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Clement Kent; Reza Azanchi; Ben Smith; Adrienne Chu; Joel Levine (2023). Excel S1 - A Model-Based Analysis of Chemical and Temporal Patterns of Cuticular Hydrocarbons in Male Drosophila melanogaster [Dataset]. http://doi.org/10.1371/journal.pone.0000962.s001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0000962.s001
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Clement Kent; Reza Azanchi; Ben Smith; Adrienne Chu; Joel Levine
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This spreadsheet implements the FA normalization technique for analyzing a set of male Drosophila cuticular hydrocarbons. It is intended for GC-FID output. Sample data is included. New data can be copied into the file to apply the normalization. (0.07 MB DOC)
Additional file 6: of De novo assembly and characterization of breast cancer...
figshare.com
xlsx
Updated Jun 18, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vinay Mittal; John McDonald (2023). Additional file 6: of De novo assembly and characterization of breast cancer transcriptomes identifies large numbers of novel fusion-gene transcripts of potential functional significance [Dataset]. http://doi.org/10.6084/m9.figshare.c.3866515_D8.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3866515_D8.v1
Dataset updated
Jun 18, 2023
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Vinay Mittal; John McDonald
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Expression (normalized read count) for breast cancer specific 79 fusion-protein and 419 3′-truncated protein transcripts. Expression is the normalized RNA-Seq read counts as estimated using RSEM and followed by upper quartile normalization. File contains expression data for breast cancer specific fusion-protein and 3′-truncated protein transcripts only. The first sheet in the excel file contains the data columns and a key describing the data is on the second excel sheet. (XLSX 33 kb)
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Sozan S. Maghdid (2022). An Extensive Dataset for the Heart Disease Classification System [Dataset]. http://doi.org/10.17632/65gxgy2nmg.1

An Extensive Dataset for the Heart Disease Classification System

Explore at:

Unique identifier

https://doi.org/10.17632/65gxgy2nmg.1

Dataset updated

Feb 15, 2022

Authors

Sozan S. Maghdid

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Finding a good data source is the first step toward creating a database. Cardiovascular illnesses (CVDs) are the major cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70 A comprehensive database for factors that contribute to a heart attack has been constructed , The main purpose here is to collect characteristics of Heart Attack or factors that contribute to it. As a result, a form is created to accomplish this. Microsoft Excel was used to create this form. Figure 1 depicts the form which It has nine fields, where eight fields for input fields and one field for output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and Test-Troponin are representing the input fields, while the output field pertains to the presence of heart attack, which is divided into two categories (negative and positive).negative refers to the absence of a heart attack, while positive refers to the presence of a heart attack.Table 1 show the detailed information and max and min of values attributes for 1319 cases in the whole database.To confirm the validity of this data, we looked at the patient files in the hospital archive and compared them with the data stored in the laboratories system. On the other hand, we interviewed the patients and specialized doctors. Table 2 is a sample for 1320 cases, which shows 44 cases and the factors that lead to a heart attack in the whole database,After collecting this data, we checked the data if it has null values (invalid values) or if there was an error during data collection. The value is null if it is unknown. Null values necessitate special treatment. This value is used to indicate that the target isn’t a valid data element. When trying to retrieve data that isn't present, you can come across the keyword null in Processing. If you try to do arithmetic operations on a numeric column with one or more null values, the outcome will be null. An example of a null values processing is shown in Figure 2.The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs received equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages. The first is to avoid overshadowing qualities in smaller numeric ranges by employing attributes in larger numeric ranges. The second goal is to avoid any numerical problems throughout the process.After completion of the normalization process, we split the data set into two parts - training and test sets. In the test, we have utilized1060 for train 259 for testing Using the input and output variables, modeling was implemented.

Clear search

Close search

Google apps

Main menu

An Extensive Dataset for the Heart Disease Classification System

Data and Code for: \"Universal Adaptive Normalization Scale (AMIS):...

Data from: A systematic evaluation of normalization methods and probe...

Spatial segmented protein expression profiles from IBD and IBS-associated...

Additional file 2 of On the optimistic performance evaluation of newly...

Amazon Financial Dataset

Title:

Description:

Use Cases:

File Information:

Important Note:

Tags:

Data for Knowledge gaps in Latin America and the Caribbean and economic...

Data articles in journals

The Notched Roller Test (NRT): Effective Volumes and Effective Surfaces for...

Data from: Blood proteome profiling using aptamer-based technology for...

Human-in-the-Loop Decision Support in Process Control Room Dataset

Overview

Purpose

Key Features

Potential Applications

Usage

Data Structure

Citation

Mobility Market Indicators and Macroeconomic Indicators for the MaaS Status...

Dataset description

Context and methodology

Technical details

Negotiation metaphor translation

Excel S1 - A Model-Based Analysis of Chemical and Temporal Patterns of...

Additional file 6: of De novo assembly and characterization of breast cancer...

An Extensive Dataset for the Heart Disease Classification SystemSee More Versions

An Extensive Dataset for the Heart Disease Classification System