Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please also see the latest version of the repository: |
The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersecting highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a “user manual” for our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV) (link), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published study, “Systematic analysis of 200 YFP traps reveals common discordance between mRNA and protein across the nervous system” (eprint link). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source Python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Modern research projects incorporate data from several sources, and new insights are increasingly driven by the ability to interpret data in the context of other data. Glue is an interactive environment built on top of the standard Python science stack to visualize relationships within and between datasets. With Glue, users can load and visualize multiple related datasets simultaneously. Users specify the logical connections that exist between data, and Glue transparently uses this information as needed to enable visualization across files. This functionality makes it trivial, for example, to interactively overplot catalogs on top of images. The central philosophy behind Glue is that the structure of research data is highly customized and problem-specific. Glue aims to accommodate this and simplify the "data munging" process, so that researchers can more naturally explore what their data have to say. The result is a cleaner scientific workflow, faster interaction with data, and an easier avenue to insight.
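As a hedged illustration (not taken verbatim from the Glue documentation, and assuming the glue-viz Python API: glue.core.Data, DataCollection, LinkSame, and the Qt application class), a minimal sketch of linking a catalog to an image so that selections propagate between the two datasets:

```python
import numpy as np
from glue.core import Data, DataCollection
from glue.core.link_helpers import LinkSame

# Toy "image" dataset and a "catalog" of source positions.
image = Data(label="image", intensity=np.random.random((128, 128)))
catalog = Data(label="catalog",
               x=np.random.uniform(0, 128, 50),
               y=np.random.uniform(0, 128, 50),
               flux=np.random.random(50))

dc = DataCollection([image, catalog])

# Declare that the catalog coordinates correspond to the image pixel axes,
# so that selections propagate between the two datasets.
dc.add_link(LinkSame(image.pixel_component_ids[1], catalog.id["x"]))
dc.add_link(LinkSame(image.pixel_component_ids[0], catalog.id["y"]))

# Launch the interactive application (requires a desktop/Qt session;
# in newer releases this class lives in the separate glue-qt package).
from glue.app.qt import GlueApplication
GlueApplication(dc).start()
```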
This is a sample project highlighting some basic methodologies in working with the DataCite public data file and Data Citation Corpus on Redivis.
Using the transform interface, we extract all records associated with DOIs for Stanford datasets on Redivis. We then make a simple plot using a python notebook to see DOI issuance over time. The nested nature of some of the public data file fields makes exploration a bit challenging; future work could break this dataset into multiple related tables for easier analysis.
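A minimal sketch of the kind of plot described, assuming the transform output has been exported with one row per DOI and a "registered" timestamp column (the file and column names are illustrative, not the actual Redivis schema):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative export of the transform output: one row per DOI,
# with a "registered" timestamp column.
dois = pd.read_csv("stanford_redivis_dois.csv", parse_dates=["registered"])

issuance = (dois.set_index("registered")
                .resample("M")   # monthly counts of newly registered DOIs
                .size())

issuance.plot(title="Stanford-on-Redivis DOI issuance over time")
plt.ylabel("DOIs registered per month")
plt.tight_layout()
plt.show()
```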
We can also join with the Data Citation Corpus to find all citations referencing Stanford-on-Redivis DOIs (the citation corpus is a work in progress, and doesn't currently capture many of the citations in the literature).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Completed datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, the most popular imputation methods generally require scripting skills and are implemented using various packages and syntax, so implementing a full suite of methods is out of reach for all except experienced data scientists. Moreover, imputation is often treated as an exercise separate from exploratory data analysis, but it should be considered part of the data exploration process. We have created a new graphical tool, ImputEHR, which is built in Python and allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
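ImputEHR itself is a graphical tool, but a minimal scripting sketch of one of the approaches it wraps (iterative imputation with a gradient-boosted tree estimator, here via scikit-learn; the column names are illustrative) looks like this:

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import HistGradientBoostingRegressor

# Toy EHR-like table with missing values (column names are illustrative).
ehr = pd.DataFrame({
    "age":         [34, 51, np.nan, 62, 45],
    "bmi":         [22.1, np.nan, 30.4, 27.8, np.nan],
    "systolic_bp": [118, 135, 142, np.nan, 125],
})

# Iterative ("chained") imputation with a gradient-boosted tree regressor,
# one flavour of the model-based approaches described above.
imputer = IterativeImputer(estimator=HistGradientBoostingRegressor(),
                           max_iter=10, random_state=0)
completed = pd.DataFrame(imputer.fit_transform(ehr), columns=ehr.columns)
print(completed)
```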
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The explosion in biological data generation challenges the available technologies and methodologies for data interrogation. Moreover, highly rich and complex datasets together with diverse linked data are difficult to explore when provided as flat files. Here we provide a way to systematically filter and analyse a dataset with more than 18 thousand data points using Zegami, a solution for interactive data visualisation and exploration. The primary data we use are derived from our study, “Systematic analysis of 200 YFP gene traps reveals common discordance between mRNA and protein across the nervous system”, which is submitted elsewhere. This manual provides the raw image data together with annotations and associated data, and explains, with specific examples, how to use Zegami to explore all these data types together. We also provide the open-source Python code used to annotate the figures.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
📌**Context**
The Healthcare Workforce Mental Health Dataset is designed to explore workplace mental health challenges in the healthcare industry, an environment known for high stress and burnout rates.
This dataset enables users to analyze key trends related to:
💠 Workplace Stressors: Examining the impact of heavy workloads, poor work environments, and emotional demands.
💠 Mental Health Outcomes: Understanding how stress and burnout influence job satisfaction, absenteeism, and turnover intention.
💠 Educational & Analytical Applications: A valuable resource for data analysts, students, and career changers looking to practice skills in data exploration and data visualization.
To help users gain deeper insights, this dataset is fully compatible with a Power BI Dashboard, available as part of a complete analytics bundle for enhanced visualization and reporting.
📌**Source**
This dataset was synthetically generated using the following methods:
💠 Python & Data Science Techniques: Probabilistic modeling to simulate realistic data distributions, with industry-informed variable relationships based on healthcare workforce studies (a minimal illustrative sketch of this approach follows the Source list below).
💠 Guidance & Validation Using AI (ChatGPT): Assisted in refining dataset realism and logical mappings.
💠 Industry Research & Reports: Based on insights from WHO, CDC, OSHA, and academic studies on workplace stress and mental health in healthcare settings.
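A minimal sketch of the probabilistic-generation idea referenced above, assuming hypothetical variable names and coefficients (this is not the actual generator used for the dataset):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 5000  # number of synthetic healthcare workers

# Variable names and coefficients are illustrative only.
weekly_hours = rng.normal(45, 8, n).clip(20, 80)
emotional_demand = rng.beta(2, 2, n)        # 0..1 scale
perceived_support = rng.beta(2.5, 2, n)     # 0..1 scale

# Burnout rises with workload and emotional demand, falls with support.
burnout = (0.03 * (weekly_hours - 40) + 0.8 * emotional_demand
           - 0.6 * perceived_support + rng.normal(0, 0.15, n)).clip(0, 1)
turnover_intention = (burnout + rng.normal(0, 0.1, n)) > 0.7

df = pd.DataFrame({
    "weekly_hours": weekly_hours.round(1),
    "emotional_demand": emotional_demand.round(2),
    "perceived_support": perceived_support.round(2),
    "burnout_score": burnout.round(2),
    "turnover_intention": turnover_intention,
})
print(df.head())
```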
📌**Inspiration**
This dataset was inspired by ongoing discussions in healthcare regarding burnout, mental health, and staff retention. The goal is to bridge the gap between raw data and actionable insights by providing a structured, analyst-friendly dataset.
For those who want a ready-to-use reporting solution, a Power BI Dashboard Template is available, designed for interactive data exploration, workforce insights, and stress factor analysis.
📌**Important Note** This dataset is synthetic and intended for educational purposes only. It is not real-world employee data and should not be used for actual decision-making or policy implementation.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The present dataset includes the SonarQube issues uncovered as part of our exploratory research targeting code complexity issues in junior developer code written in the Python or Java programming languages. The dataset also includes the actual rule configurations and thresholds used for the Python and Java languages during source code analysis.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our dataset comprises 1000 Twitter-style tweets generated using the Python programming language. The dataset was stored in a CSV file and generated using various modules. The random module was used to generate random IDs and text, while the faker module was used to generate random user names and dates. Additionally, the textblob module was used to assign a sentiment to each tweet.
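A minimal sketch of this kind of generation pipeline, assuming illustrative column names, ID ranges, and a TextBlob-based labelling rule (the exact script used for the dataset is not reproduced here):

```python
import csv
import random
from faker import Faker
from textblob import TextBlob

fake = Faker()

def label(text: str) -> str:
    """Bucket TextBlob polarity into a coarse sentiment label (thresholds are illustrative)."""
    p = TextBlob(text).sentiment.polarity
    return "positive" if p > 0.1 else "negative" if p < -0.1 else "neutral"

with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "user", "date", "text", "sentiment"])
    for _ in range(1000):
        text = fake.sentence(nb_words=12)            # random tweet text
        writer.writerow([
            random.randint(10**17, 10**18 - 1),      # random tweet ID
            fake.user_name(),                        # random user name
            fake.date_between("-1y", "today"),       # random date
            text,
            label(text),                             # sentiment assigned via TextBlob
        ])
```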
This systematic approach ensures that the dataset is well-balanced and represents different types of tweets, user behavior, and sentiment. It is essential to have a balanced dataset to ensure that the analysis and visualization of the dataset are accurate and reliable. By generating tweets with a range of sentiments, we have created a diverse dataset that can be used to analyze and visualize sentiment trends and patterns.
In addition to generating the tweets, we have also prepared a visual representation of the dataset. This visualization provides an overview of its key features, such as the frequency distribution of the sentiment categories, the distribution of tweets over time, and the user names associated with the tweets. It will aid in the initial exploration of the dataset and enable us to identify any patterns or trends that may be present.
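A minimal plotting sketch along these lines, reusing the illustrative tweets.csv produced above:

```python
import pandas as pd
import matplotlib.pyplot as plt

tweets = pd.read_csv("tweets.csv", parse_dates=["date"])

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
tweets["sentiment"].value_counts().plot(kind="bar", ax=axes[0],
                                        title="Sentiment distribution")
tweets.set_index("date").resample("W").size().plot(ax=axes[1],
                                                   title="Tweets per week")
plt.tight_layout()
plt.show()
```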
https://creativecommons.org/publicdomain/zero/1.0/
Public "Titanic" dataset for data exploration, preprocessing and benchmarking basic classification/regression models.
Github: https://github.com/mwaskom/seaborn-data/blob/master/titanic.csv
Playground for visualizations, preprocessing, feature engineering, model pipelining, and more.
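A minimal sketch of this kind of workflow, using the copy of the dataset bundled with seaborn and a simple logistic-regression baseline (feature choices are illustrative):

```python
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# seaborn bundles the same titanic.csv referenced above.
titanic = sns.load_dataset("titanic")

# Minimal preprocessing: pick a few features, one-hot encode, drop missing rows.
features = pd.get_dummies(
    titanic[["pclass", "sex", "age", "fare"]], drop_first=True
).dropna()
target = titanic.loc[features.index, "survived"]

X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("baseline accuracy:", accuracy_score(y_test, model.predict(X_test)))
```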
This submission includes the final project report of the Snake River Plain Play Fairway Analysis project as well as a separate appendix to the final report. The final report outlines the application of Play Fairway Analysis (PFA) to geothermal exploration, specifically within the Snake River Plain volcanic province. The goals of the report are to use PFA to lower the risk and cost of geothermal exploration and to stimulate development of geothermal power resources in Idaho. Further use of this report could include the application of PFA to geothermal exploration throughout the geothermal industry. The report utilizes ArcGIS and Python for data analysis, which were used to develop a systematic workflow that automates the analysis. The appendix includes ArcGIS maps and data compilation information for the report.
This resource contains environmental data (stream temperature) for different monitoring sites of the Logan River Observatory, stored in an SQLite database. The monitoring sites with SiteIDs 1, 2, 3, 9 and 10 of the Logan River Observatory are considered for the evaluation and visualization of monthly average stream temperature (Variable ID 1). The Python code included in this resource can access the SQLite database file, and the retrieved data can be analyzed to examine the average monthly stream temperature at the different monitoring sites of the Logan River Observatory.
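A hedged sketch of such a query, assuming an ODM-style table and column layout (DataValues, SiteID, VariableID, LocalDateTime, DataValue); the actual schema should be checked against the bundled database file:

```python
import sqlite3
import pandas as pd

# Table and column names below follow a typical ODM-style layout and are
# assumptions -- adjust them to match the schema in the SQLite file.
conn = sqlite3.connect("LoganRiverObservatory.sqlite")

query = """
    SELECT SiteID,
           strftime('%Y-%m', LocalDateTime) AS month,
           AVG(DataValue) AS avg_temp_c
    FROM DataValues
    WHERE VariableID = 1
      AND SiteID IN (1, 2, 3, 9, 10)
    GROUP BY SiteID, month
    ORDER BY SiteID, month
"""
monthly_temp = pd.read_sql_query(query, conn)
conn.close()
print(monthly_temp.head())
```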
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Iris dataset is a classic and widely used dataset in machine learning for classification tasks. It consists of measurements of different iris flowers, including sepal length, sepal width, petal length, and petal width, along with their corresponding species. With a total of 150 samples, the dataset is balanced and serves as an excellent choice for understanding and implementing classification algorithms. This notebook explores the dataset, preprocesses the data, builds a decision tree classification model, and evaluates its performance, showcasing the effectiveness of decision trees in solving classification problems.
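The notebook itself is not reproduced here, but a minimal equivalent sketch with scikit-learn looks like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the 150-sample Iris dataset and split it into train/test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit a small decision tree and evaluate it on held-out data.
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```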
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides electricity consumption data collected from the building management system of GreEn-ER. This building, located in Grenoble, hosts the Grenoble-INP Ense³ Engineering School and the G2ELab (Grenoble Electrical Engineering Laboratory), bringing together in one place the teaching and research actors working on new energy technologies. The electricity consumption of the building is closely monitored with more than 300 meters. The data from each meter are available in one csv file containing two columns: one with the timestamp and the other with the electricity consumption in kWh. The sampling rate for all data is 10 min. Data are available for 2017 and 2018, along with the external temperature for both years. The files are structured as follows:
- The main folder called "Data" contains 2 sub-folders, each corresponding to one year (2017 and 2018).
- Each sub-folder contains 3 further sub-folders, each corresponding to a sector of the building.
- The main folder "Data" also contains the csv files with the electricity consumption data of the whole building and a file called "Temp.csv" with the temperature data.
- The separator used in the csv files is ";".
- The sampling rate is 10 min and the unit of consumption is kWh, so each sample corresponds to the energy consumed during those 10 minutes. To retrieve the mean power over a sample period, multiply the value by 6.
- Four Jupyter Notebook files, a format that allows combining text, graphics and Python code, are also available. These files allow exploring all the data within the dataset.
- The Jupyter Notebooks contain all the metadata necessary for understanding the system, such as drawings of the system design and of the building.
- Each file is named by the number of its meter. These numbers can be retrieved in tables and drawings available in the Jupyter Notebooks.
- A few csv files describing the system design are also available: "TGBT1_n.csv", "TGBT2_n.csv" and "PREDIS-MHI_n.csv".
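A minimal sketch of reading one meter file and applying the kWh-to-mean-power conversion described above (the file path and column label are illustrative):

```python
import pandas as pd

# File path and column label are illustrative; adjust to an actual meter file.
meter = pd.read_csv("Data/2017/Sector1/12345.csv", sep=";",
                    parse_dates=[0], index_col=0)
meter.columns = ["energy_kWh"]              # one value per 10-minute interval

# Mean power over each 10-minute interval: multiply the energy by 6 (kWh -> kW).
meter["mean_power_kW"] = meter["energy_kWh"] * 6

# Daily energy consumption, for a quick first look.
print(meter["energy_kWh"].resample("D").sum().head())
```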
From a baby’s babbling to a songbird practicing a new tune, exploration is critical to motor learning. A hallmark of exploration is the emergence of random walk behaviour along solution manifolds, where successive motor actions are not independent but rather become serially dependent. Such exploratory random walk behaviour is ubiquitous across species, neural firing, gait patterns, and reaching behaviour. Past work has suggested that exploratory random walk behaviour arises from an accumulation of movement variability and a lack of error-based corrections. Here we test a fundamentally different idea: that reinforcement-based processes regulate random walk behaviour to promote continual motor exploration to maximize success. Across three human-reaching experiments, we manipulated the size of both the visually displayed target and an unseen reward zone, as well as the probability of reinforcement feedback. Our empirical and modelling results parsimoniously support the notion that explorato...
Data was collected using a Kinarm and processed using Kinarm's Matlab scripts. The output of the Matlab scripts was then processed using Python (3.8.13) and stored in custom Python objects.
# Reinforcement-Based Processes Actively Regulate Motor Exploration Along Redundant Solution Manifolds
https://doi.org/10.5061/dryad.ngf1vhj10
All files are compressed using the Python package dill. Each file contains a custom Python object that has data attributes and analysis methods. For a complete list of methods and attributes, see Exploration_Subject.py in the repository https://github.com/CashabackLab/Exploration-Along-Solution-Manifolds-Data
Files can be read into a Python script via the class method "from_pickle" inside the Exploration_Subject class.
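A hedged sketch of loading one of these files (the file name is illustrative, and the from_pickle signature is assumed; see Exploration_Subject.py in the linked repository for the actual interface):

```python
import dill

# Path is illustrative; each file contains one custom Exploration_Subject object.
with open("subject_01.pkl", "rb") as f:
    subject = dill.load(f)

# Alternatively, via the documented class method (signature assumed):
# from Exploration_Subject import Exploration_Subject
# subject = Exploration_Subject.from_pickle("subject_01.pkl")

# Quick look at the object's public attributes and methods.
print(type(subject), [a for a in dir(subject) if not a.startswith("_")][:10])
```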
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset of 92 valid eye tracking sessions from 25 participants working in VS Code and answering 15 different code understanding questions (e.g., what is the output, side effects, algorithmic complexity, concurrency, etc.) on source code written in 3 programming languages: Python, C++, C#.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The objective of this dataset is to provide a comprehensive collection of data that explores the recognition of tactile textures in dynamic exploration scenarios. The dataset was generated using a tactile-enabled finger with a multi-modal tactile sensing module. By incorporating data from pressure, gravity, angular rate, and magnetic field sensors, the dataset aims to facilitate research on machine learning methods for texture classification.
The data is stored in pickle files, which can be read using the pandas library in Python. The data files are organized in a specific folder structure and contain multiple readings for each texture and exploratory velocity. The dataset contains the raw recorded tactile measurements for 12 different textures and 3 different exploratory velocities.
- Pickles_30 - Folder containing pickle files with tactile data at an exploratory velocity of 30 mm/s.
- Pickles_40 - Folder containing pickle files with tactile data at an exploratory velocity of 40 mm/s.
- Pickles_45 - Folder containing pickle files with tactile data at an exploratory velocity of 45 mm/s.
- Texture_01 to Texture_12 - Folders containing pickle files for each texture, labelled texture_01, texture_02, and so on.
- Full_baro - Folder containing pickle files with barometer data for each texture.
- Full_imu - Folder containing pickle files with IMU (Inertial Measurement Unit) data for each texture.
The "reading-pickle-file.ipynb" file is a script for reading and plotting the dataset.
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, covering all the transactions that happened over a period of time. The retailer will use the results to grow the business: by suggesting relevant itemsets to customers, we can increase customer engagement, improve the customer experience, and identify customer behaviour. I will solve this problem using Association Rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rule mining is most useful when you want to find associations between different objects in a set, for example frequent patterns in a transaction database. It can tell you which items customers frequently buy together and allows the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat and 8 bought both. For the rule "bought mouse mat => bought computer mouse":
- support = P(mouse & mat) = 8/100 = 0.08
- confidence = support / P(mouse mat) = 0.08 / 0.09 ≈ 0.89
- lift = confidence / P(computer mouse) = 0.89 / 0.10 = 8.9
This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
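The walkthrough below uses R; as a hedged Python counterpart (using the mlxtend package, which is an assumption rather than the original workflow), the same support/confidence/lift metrics can be computed on toy transactions:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy transactions (item names illustrative).
transactions = [
    ["computer mouse", "mouse mat"],
    ["computer mouse", "keyboard"],
    ["mouse mat"],
    ["computer mouse", "mouse mat", "keyboard"],
]

# One-hot encode the transactions, then mine frequent itemsets and rules.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

frequent = apriori(onehot, min_support=0.25, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```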
Number of Attributes: 7
First, we need to load the required libraries; each library is described briefly.
Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.
Next, we clean the data frame and remove missing values.
To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is composed of GPS stations (1 file) and seismometers (1 file) multivariate time series (MTS) data associated with three types of events (normal activity / medium earthquakes / large earthquakes).
Files Format: plain text
Files Creation Date: 02/09/2019
Data Type: multivariate time series
Number of Dimensions: 3 (east-west, north-south and up-down)
Time Series Length: 60 (one data point per second)
Period: 2001-2018
Geographic Location: -62 ≤ latitude ≤ 73, -179 ≤ longitude ≤ 25
Data Collection
- Large Earthquakes: GPS station and seismometer data are obtained from the archive [1]. This archive includes 29 large earthquakes. In order to adopt a homogeneous labeling method, the dataset is limited to the data available from the American Incorporated Research Institutions for Seismology - IRIS (14 large earthquakes remaining out of 29).
  > GPS stations (14 events): High Rate Global Navigation Satellite System (HR-GNSS) displacement data (1-5 Hz). Raw observations have been processed with a precise point positioning algorithm [2] to obtain displacement time series in geodetic coordinates. Undifferenced GNSS ambiguities were fixed to integers to improve accuracy, especially over the low frequency band of tens of seconds [3]. Coordinates were then rotated to a local east-west, north-south and up-down system.
  > Seismometers (14 events): seismometer strong motion data (1-10 Hz). Channel files specify the units, sample rates, and gains of each channel.
- Normal Activity / Medium Earthquakes:
  > GPS stations (255 events: 255 normal activity): HR-GNSS normal activity displacement data (1 Hz). GPS data outside of large earthquake periods can be considered as normal activity (noise). Data is downloaded from [4], an archive maintained by the University of Oregon which stores a representative extract of GPS noise. It is an archive of real-time three-component positions for 240 stations in the western U.S. from California to Alaska, spanning from October 2018 to the present day. The raw GPS data (observations of phase and range to visible satellites) are processed with an algorithm called FastLane [5] and converted to 1 Hz sampled positions. Normal activity MTS are randomly sampled from the archive to match the number of seismometer events and to keep a ratio above 30% between the number of large earthquake MTS and normal activity MTS, in order not to encounter a class imbalance issue.
  > Seismometers (255 events: 170 normal activity, 85 medium earthquakes): seismometer strong motion data (1-10 Hz). Time series data were collected with the International Federation of Digital Seismograph Networks (FDSN) client available in the Python package ObsPy [6]. Channel information specifies the units, sample rates, and gains of each channel. The number of medium earthquakes is calculated from the ratio of medium over large earthquakes during the past 10 years in the region. A ratio above 30% is kept between the number of 60-second MTS corresponding to earthquakes (medium + large) and the total number of MTS (earthquakes + normal activity) to prevent a class imbalance issue.
The number of GPS stations and seismometers for each event varies (tens to thousands).
Preprocessing:
- Conversion (seismometers): data are available as a digital signal, which is specific to each sensor. Therefore, each instrument's digital signal is converted to its physical signal (acceleration) to obtain comparable seismometer data.
- Aggregation (GPS stations and seismometers): data aggregation by second (mean).
Variables:
- event_id: unique ID of an event. The dataset is composed of 269 events.
- event_time: timestamp of the event occurrence
- event_magnitude: magnitude of the earthquake (Richter scale)
- event_latitude: latitude of the event recorded (degrees)
- event_longitude: longitude of the event recorded (degrees)
- event_depth: distance below Earth's surface where the earthquake happened (km)
- mts_id: unique multivariate time series ID. The dataset is composed of 2,072 MTS from GPS stations and 13,265 MTS from seismometers.
- station: sensor name (GPS station or seismometer)
- station_latitude: sensor (GPS station or seismometer) latitude (degrees)
- station_longitude: sensor (GPS station or seismometer) longitude (degrees)
- timestamp: timestamp of the multivariate time series
- dimension_E: East-West component of the sensor (GPS station or seismometer) signal (cm/s/s)
- dimension_N: North-South component of the sensor (GPS station or seismometer) signal (cm/s/s)
- dimension_Z: Up-Down component of the sensor (GPS station or seismometer) signal (cm/s/s)
- label: label associated with the event. There are 3 labels: normal activity (GPS stations: 255 events, seismometers: 170 events) / medium earthquake (GPS stations: 0 events, seismometers: 85 events) / large earthquake (GPS stations: 14 events, seismometers: 14 events). Earthquake early warning (EEW) relies on the detection of the primary wave (P-wave) before the secondary (damaging) wave arrives. P-waves follow a propagation model (IASP91 [7]). Therefore, each MTS is labeled based on the P-wave arrival time at each sensor (seismometers, GPS stations) calculated with the propagation model.
[1] Ruhl, C. J., Melgar, D., Chung, A. I., Grapenthin, R. and Allen, R. M. 2019. Quantifying the value of real-time geodetic constraints for earthquake early warning using a global seismic and geodetic data set. Journal of Geophysical Research: Solid Earth 124:3819-3837.
[2] Geng, J., Bock, Y., Melgar, D., Crowell, B. W., and Haase, J. S. 2013. A new seismogeodetic approach applied to GPS and accelerometer observations of the 2012 Brawley seismic swarm: Implications for earthquake early warning. Geochemistry, Geophysics, Geosystems 14:2124-2142.
[3] Geng, J., Jiang, P., and Liu, J. 2017. Integrating GPS with GLONASS for high-rate seismogeodesy. Geophysical Research Letters 44:3139-3146.
[4] http://tunguska.uoregon.edu/rtgnss/data/cwu/mseed/
[5] Melgar, D., Melbourne, T., Crowell, B., Geng, J., Szeliga, W., Scrivner, C., Santillan, M. and Goldberg, D. 2019. Real-Time High-Rate GNSS Displacements: Performance Demonstration During the 2019 Ridgecrest, CA Earthquakes (Version 1.0) [Data set]. Zenodo.
[6] https://docs.obspy.org/packages/obspy.clients.fdsn.html
[7] Kennett, B. L. N. 1991. IASPEI 1991 Seismological Tables. Terra Nova 3:122-122.
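A hedged sketch of pulling strong-motion waveforms through the ObsPy FDSN client [6] and converting counts to acceleration (the network, station, and channel codes and the timestamp are placeholders, not values from this dataset):

```python
from obspy import UTCDateTime
from obspy.clients.fdsn import Client

# Network/station/channel codes and the time window are illustrative.
client = Client("IRIS")
t0 = UTCDateTime("2018-01-23T09:31:40")   # placeholder origin time
stream = client.get_waveforms(network="AK", station="*", location="*",
                              channel="HN?", starttime=t0, endtime=t0 + 60)

# Convert digital counts to physical units (acceleration) using the
# instrument response, then inspect the 60-second window.
inventory = client.get_stations(network="AK", station="*", channel="HN?",
                                starttime=t0, endtime=t0 + 60, level="response")
stream.remove_response(inventory=inventory, output="ACC")
print(stream)
```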
Heat pumps are essential for decarbonizing residential heating but consume substantial electrical energy, impacting operational costs and grid demand. Many systems run inefficiently due to planning flaws, operational faults, or misconfigurations. While optimizing performance requires skilled professionals, labor shortages hinder large-scale interventions. However, digital tools and improved data availability create new service opportunities for energy efficiency, predictive maintenance, and demand-side management. To support research and practical solutions, we present an open-source dataset of electricity consumption from 1,408 households with heat pumps and smart electricity meters in the canton of Zurich, Switzerland, recorded at 15-minute and daily resolutions between 2018-11-03 and 2024-03-21. The dataset includes household metadata, weather data from 8 stations, and ground truth data from 410 field visit protocols collected by energy consultants during system optimizations. Additionally, the dataset includes a Python-based data loader to facilitate seamless data processing and exploration.
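A hedged sketch of the kind of processing the bundled data loader facilitates, assuming illustrative file and column names rather than the dataset's actual layout:

```python
import pandas as pd

# File and column names are placeholders -- consult the bundled Python data
# loader and metadata for the actual layout.
load = pd.read_csv("household_1234_15min.csv", parse_dates=["timestamp"],
                   index_col="timestamp")

# Aggregate 15-minute smart-meter readings (kWh) to daily consumption and
# join daily mean outdoor temperature from one of the weather stations.
daily_kwh = load["energy_kwh"].resample("D").sum()
weather = pd.read_csv("weather_station_1.csv", parse_dates=["timestamp"],
                      index_col="timestamp")
daily = daily_kwh.to_frame("energy_kwh").join(
    weather["temperature_c"].resample("D").mean())
print(daily.head())
```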
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this data set, we provide microstructure results from the simulation of additive manufacturing processes with the SPPARKS Monte Carlo code. The dataset will be used in our entry to the Materials Science and Engineering Data Challenge. The parameters varied during the study, and their extents are listed in the table below. All simulations were performed on a 300 x 300 x 200 rectangular lattice. All length and timescales are defined within the model and refer to no actual physical system. This release contains the input and output data files, as well as a Python script for the generation of Paraview-compatible files.