Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset was created by Mustafa Ghzi
Released under CC BY-NC-SA 4.0
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Eda_all is a dataset for instance segmentation tasks - it contains All annotations for 1,314 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset accompanies the publication (currently under review):
Villalba-Bravo, R., Grande-Bueno, S., Trujillo-León, A., & Vidal-Verdú, F.
Analysis of EDA signal features under motion artifacts for non-personalized detection of startle events using a smart cane
IEEE SENSORS 2025, Vancouver, Canada.
This dataset includes Electrodermal Activity (EDA) signals collected from seven participants during an experiment in which they walked on a treadmill at a constant speed of 1 km/h while using a smart cane. During the walking task, participants were exposed to auditory startle stimuli designed to elicit stress responses. The smart cane was equipped with a Galvanic Skin Response (GSR) sensor integrated into its handle to continuously record physiological signals in a natural walking context.
The data is organized by participant. All participants provided written informed consent both to take part in the experiment and to allow their anonymized data to be publicly shared for research purposes. Furthermore, the experiment was approved by the Ethical Committee of the Universidad de Málaga (reference 46-2024-H).
Each folder corresponds to a participant session (e.g., S0/, S2/, etc.) and contains the following files:
S0/
├── S0_DataExperiment.mat
├── S0_audioEventVector.mat
└── S0_SA_Score.mat
...
S8/
├── S8_DataExperiment.mat
├── S8_audioEventVector.mat
└── S8_SA_Score.mat
In addition, the dataset includes a CSV file named caneFeatures_pre_post.csv, containing the extracted features from the GSR, tonic and phasic signals, allowing for the replication of the statistical analyses presented in the study.
S*_DataExperiment.mat
Description: This file contains the EDA signals acquired at a 4 Hz sampling rate during the experiment, stored in MATLAB .mat format as a structured variable.
Format: MATLAB Struct (3 fields)
- GSR: Contains the raw GSR signal along with associated time information: TimeStampDate (UTC date-time format) and TimeStampPosix (POSIX timestamp).
- TONIC: Contains the tonic component of the EDA signal with the same timestamp fields.
- PHASIC: Contains the phasic component of the EDA signal with the corresponding timestamps.
S*_audioEventVector.mat
Description: This file contains information about the timing of the auditory startle stimuli presented during the experiment. The data is stored as a MATLAB struct sampled at 32 Hz.
Format: MATLAB Struct (3 fields)
- data: A binary step signal indicating the presence of auditory events (0 = no stimulus, 1 = stimulus being played).
- TimeStampDate: A vector of timestamps in MATLAB datetime format, corresponding to each sample in the data field.
S*_SA_Score.mat
Description: This file contains the self-reported State Anxiety (STAI-State) scores provided by each participant before and after the experimental session. The data is stored as a MATLAB struct.
Format: MATLAB Struct (2 fields)
- Training: Numeric score reported after the training session.
- Experiment: Numeric score reported after the experimental session.
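Because the audio event vector is sampled at 32 Hz and the EDA signals at 4 Hz, locating a startle stimulus in the EDA trace needs a rate conversion. The sketch below finds the rising edges of the binary step signal and maps them to EDA sample indices. It works on plain Python lists and assumes, purely for illustration, that both recordings start at the same instant; with the real files, the TimeStampDate fields would be the safer basis for alignment. The function names are hypothetical.

```python
AUDIO_FS_HZ = 32  # sampling rate of the audio event vector, per the description
EDA_FS_HZ = 4     # sampling rate of the EDA signals, per the description


def stimulus_onsets(events):
    """Return the indices where the binary step signal rises from 0 to 1."""
    onsets = []
    previous = 0
    for index, value in enumerate(events):
        if value == 1 and previous == 0:
            onsets.append(index)
        previous = value
    return onsets


def to_eda_index(audio_index):
    """Map a 32 Hz sample index to the 4 Hz sample covering the same instant.

    Assumption: both signals share the same start time.
    """
    return audio_index * EDA_FS_HZ // AUDIO_FS_HZ


# Made-up example signal: two stimuli, starting at 0.5 s and 1.0 s.
events = [0] * 16 + [1] * 8 + [0] * 8 + [1] * 4
onsets = stimulus_onsets(events)
eda_indices = [to_eda_index(i) for i in onsets]
```

Each 4 Hz EDA sample spans eight audio samples, so the mapping is only accurate to within 0.25 s.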
For any questions or further information regarding this dataset, please contact fvidal@uma.es.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Solar Panel EDA is a dataset for object detection tasks - it contains Solar Panel annotations for 721 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Opencores
We gathered high-quality specification-code pairs from Opencores, a community aimed at developing open-source digital hardware using electronic design automation (EDA). We then filtered out data instances exceeding 4096 characters in length and those that could not be parsed into Abstract Syntax Trees (AST). The final dataset comprises approximately 800 data instances.
Dataset Features
instruction (string): The natural language instruction for… See the full description on the dataset page: https://huggingface.co/datasets/LLM-EDA/opencores.
Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global EDA market size is estimated at USD 14.9 billion in 2023 and is projected to grow at a compound annual growth rate (CAGR) of 10.50% from 2023 to 2030.
The demand for the EDA Market is rising due to the rise in outdoor and adventure activities.
Changing consumer lifestyle trends are higher in the EDA market.
The cat segment held the highest EDA Market revenue share in 2023.
North American EDA will continue to lead, whereas the European EDA Market will experience the most substantial growth until 2030.
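To see what the quoted growth rate implies, the compound growth formula can be applied to the stated 2023 figure. This is a back-of-the-envelope sketch using only the two numbers given above (USD 14.9 billion, 10.50% CAGR). The 2030 value it produces is derived arithmetic, not a figure quoted by the report, and the function name is hypothetical.

```python
def project_market_size(base_value, cagr, years):
    """Compound a base value forward: base * (1 + rate) ** years."""
    return base_value * (1 + cagr) ** years


# 2023 base of USD 14.9 billion compounded at 10.5% per year through 2030.
projected_2030 = project_market_size(14.9, 0.105, 2030 - 2023)
print(f"Implied 2030 market size: USD {projected_2030:.1f} billion")
```

Seven years of compounding at 10.5% roughly doubles the base, so the implied 2030 size is about USD 30 billion.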
Supply Chain and Risk Analysis to Provide Viable Market Output
The industry is facing supply chain and logistics disruptions. EDA tools have been instrumental in analyzing supply chain data, identifying vulnerabilities, predicting risks, and developing disruption mitigation strategies. Consumer behavior has undergone drastic changes due to blockages and restrictions. EDA helps companies analyze changing trends in buying behavior, online shopping preferences, and demand patterns, enabling organizations to adjust their marketing and sales strategies accordingly.
Health and Pharmaceutical Research to Propel Market Growth.
EDA tools have played a key role in analyzing large amounts of data related to vaccine development, drug trials, patient records and epidemiological studies. These tools have helped researchers process and interpret complex medical data, leading to advances in the development of treatments and vaccines. The pandemic has created challenges in data collection, especially in sectors affected by lockdowns or blackouts. Rapidly changing conditions and incomplete data sets make effective EDA difficult due to data quality issues. The economic uncertainty caused by the pandemic has led to budget cuts in some sectors, impacting investment in new technologies. Some organizations have limited budgets that limit their ability to adopt or update EDA tools.
Market Dynamics of the EDA
Privacy and Data Security Issues to Restrict Market Growth.
With the focus on data privacy regulations such as GDPR, CCPA, etc., organizations need to ensure compliance when handling sensitive data. These compliance requirements may limit the scope of the EDA by limiting the availability and use of certain data sets for information analysis. EDA often requires data analysts or data scientists who are skilled in statistical analysis and data visualization tools. A lack of professionals with these specialized skills can hinder an organization's ability to use EDA tools effectively, limiting adoption. Advanced EDA techniques can involve complex algorithms and statistical techniques that are difficult for non-technical users to understand. Interpreting results and deriving actionable insights from EDA results pose challenges that affect applicability to a wider audience.
Key Opportunity of market.
Growing miniaturization in various industries can be an opportunity.
With the age of highly advanced electronics, miniaturization has become a trend that enabled organizations across diverse sectors such as healthcare, consumer electronics, aerospace and defense, automotive and others to design miniature electronic devices. The devices incorporate miniaturized semiconductor components, e.g., surgical instruments and blood glucose meters in healthcare, fitness bands in wearable devices, automotive modules in the automotive sector, and intelligent baggage labels. Miniaturization has a number of advantages such as freeing space for other features and better batteries. The increased consciousness among consumers towards fitness is fueling the demand for smaller fitness devices such as smartwatches and fitness trackers. This is motivating companies to come up with innovative products with improved features, while researchers are concentrating on cost-effective and efficient product development through electronic design tools. Besides, use of portable equipment has gained immense popularity among media professionals because of the increasing demand for live reporting of different events like riots, accidents, sports, and political rallies. As a result of the inconvenience in the use of cumbersome TV production vans to access such events, demand for portable handheld equipment has risen. Such devices are simply portable and can be quickly moved to the event venue if carried in backpacks. Therefore, the need for compact devices across various indust...
This data consists of incidents involving guns. Perform EDA to find hidden patterns. Columns: 1) Race: race of the individual 2) Date: date of the incident 3) Education 4) Police involvement
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data described in this repository has five items:
- DataSpecs: This Excel file has six worksheets with the following information: demographic data, biofiles available, Immersive Tendencies Questionnaire responses, immersive questionnaire responses, items of questionnaires, and EEG electrode positions in Theta/Phi coordinates.
- LoudspeakerInformation: PDF file explaining the alignment and positions of loudspeakers for stereo, PCMA-3D, and ESMA-3D audio playback.
- RawData: Folder with individual subfolders of participants labeled with assigned ID. Each folder has EEG, EDA, and BVP files in GDF format for three conditions: 1) resting state (Bl), 2) concert hall (Music), and 3) urban park (Park) soundscapes. The assigned audio group (Stereo or 3D) is specified in file names. Sample rates are: EEG = 500 Hz, BVP = 64 Hz, and EDA = 4 Hz. For example, file “01_Stereo_BVP_Bl” corresponds to BVP data in the resting state of participant 01, assigned to the Stereo group.
- LatencyAdjustment: Folder with individual subfolders of participants labeled with assigned ID in SET/FDT format. The only difference is that the "condition 8" onset was adjusted according to the latency caused by the distance between the audio system and participants (2 m). Condition 8 indicates the moment a soundscape (Music or Park) was played.
- AudioFiles: This folder contains two subfolders:
  - Music: 2-minute WAV audio files of concert hall recordings prepared to be heard on PCMA-3D and Stereo (Downmix files) loudspeaker arrays at a 48 kHz sample rate and 24-bit depth.
  - Park: 2-minute WAV audio files of urban park recordings prepared to be heard on ESMA-3D and Stereo (Downmix files) loudspeaker arrays at a 48 kHz sample rate and 24-bit depth.
  Stereo downmix files include the word “_Downmix_”.
Note: The worksheet Items of DataSpecs includes the codes that the questionnaires provide. Only one item of the Immersive Tendencies Questionnaire and the items of the Self-assessment manikin test have no codes in their original publications.
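The file-naming convention and sample rates described for the raw data can be turned into a small lookup helper. The sketch below splits a name such as “01_Stereo_BVP_Bl” into its parts and attaches the stated sample rate. It assumes every raw file follows the same four-part pattern as that single example and that names are passed without a file extension; the function and key names are hypothetical.

```python
# Sample rates as stated in the description.
SAMPLE_RATE_HZ = {"EEG": 500, "BVP": 64, "EDA": 4}


def parse_raw_name(name):
    """Split '<participant>_<group>_<signal>_<condition>' into a dict.

    Assumption: all raw files follow the pattern of the documented
    example '01_Stereo_BVP_Bl'.
    """
    participant, group, signal, condition = name.split("_")
    return {
        "participant": participant,
        "group": group,
        "signal": signal,
        "condition": condition,
        "sample_rate_hz": SAMPLE_RATE_HZ[signal],
    }


def duration_seconds(n_samples, signal):
    """Length of a recording in seconds, given its sample count."""
    return n_samples / SAMPLE_RATE_HZ[signal]


info = parse_raw_name("01_Stereo_BVP_Bl")
```

A name that does not split into exactly four parts raises a `ValueError`, which flags files that break the assumed pattern.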
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Rehenatun Jannat
Released under CC0: Public Domain
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for Opencores
In the process of continual pre-training, we utilized the publicly available VGen dataset. VGen aggregates Verilog repositories from GitHub, systematically filters out duplicates and excessively large files, and retains only those files containing `module` and `endmodule` statements. We also incorporated the CodeSearchNet dataset, which contains approximately 40 MB of function code with its documentation.… See the full description on the dataset page: https://huggingface.co/datasets/LLM-EDA/vgen_cpp.
MIT License https://opensource.org/licenses/MIT
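The filtering steps described for VGen (drop duplicates, drop oversized files, keep only files with `module` and `endmodule`) can be sketched as below. This is an illustration, not VGen's actual pipeline. Exact-match deduplication by content hash and the `MAX_BYTES` cap are assumptions, since the description gives neither the deduplication method nor a size threshold.

```python
import hashlib
import re

MAX_BYTES = 1_000_000  # hypothetical size cap; the source states no threshold

# Word boundaries keep "endmodule" from also counting as "module".
MODULE_RE = re.compile(r"\bmodule\b")
ENDMODULE_RE = re.compile(r"\bendmodule\b")


def filter_verilog(files):
    """Filter a {path: text} mapping of Verilog sources.

    Keeps the first copy of each distinct file, skips files over
    MAX_BYTES, and requires both a `module` and an `endmodule` keyword.
    """
    seen_digests = set()
    kept = {}
    for path, text in files.items():
        encoded = text.encode("utf-8")
        digest = hashlib.sha256(encoded).hexdigest()
        if digest in seen_digests:
            continue  # exact duplicate of a file already seen
        seen_digests.add(digest)
        if len(encoded) > MAX_BYTES:
            continue  # excessively large file
        if MODULE_RE.search(text) and ENDMODULE_RE.search(text):
            kept[path] = text
    return kept
```

A plain substring test would accept any file containing `endmodule`, because that keyword itself contains `module`; the word-boundary patterns avoid this.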
License information was derived automatically
[EMNLP2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
DA-Code is a comprehensive evaluation dataset designed to assess the data analysis and code generation capabilities of LLMs in agent-based data science tasks. Our papers and experiment reports have been published on arXiv.
Dataset Overview
500 complex real-world data analysis tasks across Data Wrangling (DW), Machine Learning (ML), and Exploratory Data Analysis (EDA). Tasks cover… See the full description on the dataset page: https://huggingface.co/datasets/Jianwen2003/DA-Code.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
- `Data_Analysis.ipynb`: A Jupyter Notebook containing the code for the Exploratory Data Analysis (EDA) presented in the thesis. Running this notebook reproduces the plots in the `eda_plots/` directory.
- `Dataset_Extension.ipynb`: A Jupyter Notebook used for the data enrichment process. It takes the raw `Inference_data.csv` and produces `Inference_data_Extended.csv` by adding detailed hardware specifications, cost estimates, and derived energy metrics.
- `Optimization_Model.ipynb`: The main Jupyter Notebook for the core contribution of this thesis. It contains the code to perform the 5-fold cross-validation, train the final predictive models, generate the Pareto-optimal recommendations, and create the final result figures.
- `Inference_data.csv`: The raw, unprocessed data collected from the official MLPerf Inference v4.0 results.
- `Inference_data_Extended.csv`: The final, enriched dataset used for all analysis and modeling. This is the output of the `Dataset_Extension.ipynb` notebook.
- `eda_log.txt`: A text log file containing summary statistics generated during the exploratory data analysis.
- `requirements.txt`: A list of all necessary Python libraries and their versions required to run the code in this repository.
- `eda_plots/`: A directory containing all plots (correlation matrices, scatter plots, box plots) generated by the EDA notebook.
- `optimization_models_final/`: A directory where the trained and saved final model files (`.joblib`) are stored after running the optimization notebook.
- `pareto_validation_plot_fold_0.png`: The validation plot comparing the true vs. predicted Pareto fronts, as presented in the thesis.
- `shap_waterfall_final_model.png`: The SHAP plot used for the model interpretability analysis, as presented in the thesis.
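The Pareto-optimal recommendations mentioned for `Optimization_Model.ipynb` rest on the idea of non-dominated points: a configuration is kept only if no other configuration is at least as good on every objective and strictly better on one. The sketch below is a generic illustration of that idea, not the notebook's code. The two objectives in the example (energy, latency) and all the numbers are made up, and both objectives are assumed to be minimized.

```python
def dominates(q, p):
    """True if q is no worse than p everywhere and strictly better somewhere.

    All objectives are assumed to be minimized.
    """
    no_worse = all(q_k <= p_k for q_k, p_k in zip(q, p))
    strictly_better = any(q_k < p_k for q_k, p_k in zip(q, p))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical (energy, latency) pairs for five hardware configurations.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
```

This pairwise check is quadratic in the number of points, which is usually acceptable for a few hundred benchmark entries; larger sets would call for a sort-based method.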
```bash
git clone <repository-url>
cd <repository-directory>
```
```bash
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
```
```bash
pip install -r requirements.txt
```
The extended dataset (`Inference_data_Extended.csv`) is already provided. However, if you wish to reproduce the enrichment process from scratch, you can run the **`Dataset_Extension.ipynb`** notebook. It will take `Inference_data.csv` as input and generate the extended version.
The EDA plots are already provided in the `eda_plots/` directory. To regenerate them, run the **`Data_Analysis.ipynb`** notebook. This will overwrite the existing plots and the `eda_log.txt` file.
Running the `Optimization_Model.ipynb` notebook will execute the entire pipeline described in the paper:
- The trained final models are saved to the `optimization_models_final/` directory.
- The final result figures are `pareto_validation_plot_fold_0.png` and `shap_waterfall_final_model.png`.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
https://www.kaggle.com/code/idmitri/exploratory-data-analysis
https://www.kaggle.com/code/idmitri/rul-prediction-modeling
Power transformers at nuclear power plants may be operated beyond their design service life (25 years), which requires enhanced condition monitoring to ensure reliable and safe operation.
Transformer condition is assessed with chromatographic dissolved gas analysis, which detects defects from the gas concentrations in the oil and supports prediction of the transformer's remaining useful life (RUL). Traditional monitoring systems are limited to fixed concentration thresholds, which reduces diagnostic accuracy and the degree of automation. Machine learning methods can uncover hidden dependencies and improve prediction accuracy. More details: https://habr.com/ru/articles/743682/
This project performs an in-depth exploratory data analysis (EDA) and builds 12 feature groups:
- gases (gas concentrations)
- trend (trend components)
- seasonal (seasonal components)
- resid (residual components)
- quantiles (distribution quantiles)
- volatility (volatility of concentrations)
- range (range of values)
- coefficient of variation
- standard deviation
- skewness (asymmetry of the distribution)
- kurtosis (peakedness of the distribution)
- category (categorical fault features)
Using statistical and decomposition features made it possible to match the silhouette of the RUL distribution accurately with automatic outlier handling, something that previously required manual correction.
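Several of the feature groups above (range, standard deviation, coefficient of variation, skewness, kurtosis) are simple summary statistics of a concentration series. The sketch below computes them with the standard library only. It uses population moments and excess kurtosis, which is an assumption: the project's own definitions (for example, sample vs. population estimators, or rolling windows) are not stated. The function name and the input values are made up.

```python
import math


def distribution_features(values):
    """Summary statistics of a numeric series, using population moments.

    Assumes at least two distinct values and a non-zero mean.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    return {
        "range": max(values) - min(values),
        "std": std,
        "cv": std / mean,                # coefficient of variation
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (normal = 0)
    }


# Made-up gas concentration readings, for illustration only.
features = distribution_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

For the symmetric example series the skewness is zero and the excess kurtosis is negative, as expected for a flat, evenly spaced distribution.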
Modeling used machine learning algorithms (LightGBM, CatBoost, Extra Trees) and their ensemble. The best accuracy was achieved by the LightGBM model with hyperparameters optimized using Optuna: MAE = 61.85, RMSE = 88.21, R2 = 0.8634.
The EDA code was developed and tested locally in a VSC Jupyter Notebook with a Python 3.10.16 environment. On Kaggle, most plots render correctly, but some complex visualizations (for example, multidimensional plots with a color scale) are not adapted because of environment limitations. Despite attempts to optimize the code without substantial changes, full compatibility could not be achieved. The main problems were library version conflicts and a significant drop in performance: computation took roughly 10 times longer than on a local MacBook M3 Pro. On Kaggle, either the PyCaret operations ran correctly or the machine learning models worked, but not both at once.
A hybrid workflow is proposed:
- Publishing and displaying metrics on Kaggle to visualize the results.
- Running the computation and model training locally in a preconfigured Python 3.10.16 environment. To reproduce the experiments, a Codes folder is provided with the VSC EDA and RUL code and the libraries_for_modeling file, which lists the versions of all libraries used.
I am happy to answer any questions in the comments about setting up and running the code, and I would appreciate advice on preventing similar problems.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Revalina F
Released under MIT
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was collected in a pharmaceutical case study where participants performed repetitive packing tasks for approximately 20 minutes directly on the production line. The study aimed to assess physiological and ergonomic factors affecting workers during the task.
Key Variables:
Participant Information:
- ID participant: Unique identifier for each participant.
- Age: Age of the participant.
- Experience: Work experience in years.
Task Context:
- Moment: Time of measurement during the shift (Start, Middle, End).
- Turn: Work shift number.
- Plant/Line: Identification of the production line.
- Day: Day of the week.
- Time: Exact timestamp of data collection.
- LoTNum: Lot number for batch packing.
Physiological Measurements (from wearable devices):
- eda_scl_usiemens: Electrodermal activity (EDA) in microsiemens.
- pulse_rate_bpm: Heart rate in beats per minute.
- temperature_celsius: Skin temperature in Celsius.
- accelerometers_std_g: Standard deviation of accelerometer readings (movement intensity).
- steps_count: Number of steps taken.
- activity_counts: General activity level.
Ergonomic and Risk Indicators:
- IndexRiskR: Risk index for the right hand.
- IndexRiskL: Risk index for the left hand.
- Borg Test: Subjective rating of perceived exertion (Borg scale).
This dataset was created by Gourav Rohra