65 datasets found
  1. Data from: Multidimensional Data Exploration with Glue

    • figshare.com
    pdf
    Updated Jan 18, 2016
    Cite
    Openproceedings Bot (2016). Multidimensional Data Exploration with Glue [Dataset]. http://doi.org/10.6084/m9.figshare.935503.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jan 18, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Openproceedings Bot
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Modern research projects incorporate data from several sources, and new insights are increasingly driven by the ability to interpret data in the context of other data. Glue is an interactive environment built on top of the standard Python science stack to visualize relationships within and between datasets. With Glue, users can load and visualize multiple related datasets simultaneously. Users specify the logical connections that exist between data, and Glue transparently uses this information as needed to enable visualization across files. This functionality makes it trivial, for example, to interactively overplot catalogs on top of images. The central philosophy behind Glue is that the structure of research data is highly customized and problem-specific. Glue aims to accommodate this and simplify the "data munging" process, so that researchers can more naturally explore what their data have to say. The result is a cleaner scientific workflow, faster interaction with data, and an easier avenue to insight.
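
    A minimal sketch of the linking pattern described above, using Glue's Python API; the datasets and column names are invented for illustration (see glueviz.org for the full documentation):

    from glue.core import Data, DataCollection
    from glue.core.link_helpers import LinkSame

    # Two related tables that share an identifier column (hypothetical data)
    observations = Data(label="observations", obj_id=[1, 2, 3], flux=[10.0, 12.5, 9.8])
    catalog = Data(label="catalog", obj_id=[1, 2, 3], redshift=[0.02, 0.05, 0.01])

    dc = DataCollection([observations, catalog])

    # Declare that the two obj_id columns describe the same quantity, so
    # selections made in one dataset propagate to the other.
    dc.add_link(LinkSame(observations.id["obj_id"], catalog.id["obj_id"]))

    # Launch the interactive application (requires glue's Qt frontend):
    # from glue.app.qt import GlueApplication
    # GlueApplication(dc).start()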

  2. DataCite public data exploration

    • redivis.com
    Updated Apr 29, 2025
    Cite
    Ian Mathews (2025). DataCite public data exploration [Dataset]. https://redivis.com/workflows/hx1e-a6w8vmwsx
    Explore at:
    Dataset updated
    Apr 29, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Ian Mathews
    Description

    This is a sample project highlighting some basic methodologies in working with the DataCite public data file and Data Citation Corpus on Redivis.

    Using the transform interface, we extract all records associated with DOIs for Stanford datasets on Redivis. We then make a simple plot using a python notebook to see DOI issuance over time. The nested nature of some of the public data file fields makes exploration a bit challenging; future work could break this dataset into multiple related tables for easier analysis.

    We can also join with the Data Citation Corpus to find all citations referencing Stanford-on-Redivis DOIs (the citation corpus is a work in progress, and doesn't currently capture many of the citations in the literature).
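
    A minimal sketch of the notebook step described above; the file and column names are assumptions (in a Redivis notebook the table would be pulled via the redivis Python client rather than a local CSV):

    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical export of the extracted Stanford DOI records
    dois = pd.read_csv("stanford_redivis_dois.csv", parse_dates=["created"])

    # Count DOIs issued per year and plot the trend
    per_year = dois.groupby(dois["created"].dt.year).size()
    per_year.plot(kind="bar", xlabel="Year", ylabel="DOIs issued",
                  title="DOI issuance over time")
    plt.show()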

  3. Multi-Dimensional Data Viewer (MDV) user manual for data exploration:...

    • zenodo.org
    pdf, zip
    Updated Jul 12, 2024
    + more versions
    Cite
    Maria Kiourlappou; Martin Sergeant; Joshua S. Titlow; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Ilan Davis (2024). Multi-Dimensional Data Viewer (MDV) user manual for data exploration: "Systematic analysis of YFP traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. http://doi.org/10.5281/zenodo.7875495
    Explore at:
    Available download formats: zip, pdf
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maria Kiourlappou; Martin Sergeant; Joshua S. Titlow; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Ilan Davis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please also see the latest version of the repository:
    https://doi.org/10.5281/zenodo.6374011 and
    our website: https://ilandavis.com/jcb2023-yfp

    The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics datasets. Moreover, intersecting highly rich and complex datasets from different sources provided as flat CSV files requires advanced informatics skills, which is time-consuming and not accessible to all. Here, we provide a "user manual" for our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV) (link), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published systematic analysis of 200 YFP traps revealing common discordance between mRNA and protein across the nervous system (eprint link). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source Python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.

  4. IMDb Top 4070: Explore the Cinema Data

    • kaggle.com
    Updated Aug 15, 2023
    Cite
    K.T.S. Prabhu (2023). IMDb Top 4070: Explore the Cinema Data [Dataset]. https://www.kaggle.com/datasets/ktsprabhu/imdb-top-4070-explore-the-cinema-data
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    K.T.S. Prabhu
    Description

    Description: Dive into the world of exceptional cinema with our meticulously curated dataset, "IMDb's Gems Unveiled." This dataset is a result of an extensive data collection effort based on two critical criteria: IMDb ratings exceeding 7 and a substantial number of votes, surpassing 10,000. The outcome? A treasure trove of 4070 movies meticulously selected from IMDb's vast repository.

    What sets this dataset apart is its richness and diversity. With more than 20 data points meticulously gathered for each movie, this collection offers a comprehensive insight into each cinematic masterpiece. Our data collection process leveraged the power of Selenium and Pandas modules, ensuring accuracy and reliability.

    Cleaning this vast dataset was a meticulous task, combining both Excel and Python for optimum precision. Analysis is powered by Pandas, Matplotlib, and NLTK, enabling us to uncover hidden patterns, trends, and themes within the realm of cinema.

    Note: The data was collected as of April 2023. Future versions of this analysis will include a movie recommendation system. Please do connect for any queries. All Love, No Hate.
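
    A minimal sketch of reproducing the stated selection criteria with Pandas; the file and column names are assumptions about the CSV layout:

    import pandas as pd

    movies = pd.read_csv("imdb_top_4070.csv")  # hypothetical file name

    # IMDb rating above 7 and more than 10,000 votes, per the curation criteria
    gems = movies[(movies["rating"] > 7) & (movies["votes"] > 10_000)]
    print(len(gems))
    print(gems.sort_values("rating", ascending=False)[["title", "rating"]].head(10))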

  5. Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction...

    • frontiersin.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Yi-Hui Zhou; Ehsan Saghapour (2023). Data_Sheet_1_ImputEHR: A Visualization Tool of Imputation for the Prediction of Biomedical Data.PDF [Dataset]. http://doi.org/10.3389/fgene.2021.691274.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Yi-Hui Zhou; Ehsan Saghapour
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Electronic health records (EHRs) have been widely adopted in recent years, but often include a high proportion of missing data, which can create difficulties in implementing machine learning and other tools of personalized medicine. Complete datasets are preferred for a number of analysis methods, and successful imputation of missing EHR data can improve interpretation and increase our power to predict health outcomes. However, the most popular imputation methods mainly require scripting skills and are implemented using various packages and syntax, so the implementation of a full suite of methods is generally out of reach to all except experienced data scientists. Moreover, imputation is often treated as a separate exercise from exploratory data analysis, but should be considered part of the data exploration process. We have created a new Python-based graphical tool, ImputEHR, that allows implementation of a range of simple and sophisticated (e.g., gradient-boosted tree-based and neural network) data imputation approaches. In addition to imputation, the tool enables data exploration for informed decision-making, as well as implementing machine learning prediction tools for response data selected by the user. Although the approach works for any missing data problem, the tool is primarily motivated by problems encountered for EHR and other biomedical data. We illustrate the tool using multiple real datasets, providing performance measures of imputation and downstream predictive analysis.
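
    A minimal sketch of the simple-to-sophisticated imputation spectrum the abstract describes, using scikit-learn as a stand-in (this is not ImputEHR's own code):

    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import SimpleImputer, IterativeImputer
    from sklearn.ensemble import HistGradientBoostingRegressor

    # Toy matrix with missing entries standing in for EHR features
    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

    # Simple approach: mean imputation
    X_mean = SimpleImputer(strategy="mean").fit_transform(X)

    # Sophisticated approach: iterative imputation with gradient-boosted trees
    X_gbt = IterativeImputer(
        estimator=HistGradientBoostingRegressor(),
        max_iter=10,
        random_state=0,
    ).fit_transform(X)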

  6. Zegami user manual for data exploration: "Systematic analysis of YFP gene...

    • zenodo.org
    pdf, zip
    Updated Jul 17, 2024
    + more versions
    Cite
    Maria Kiourlappou; Stephen Taylor; Ilan Davis (2024). Zegami user manual for data exploration: "Systematic analysis of YFP gene traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. http://doi.org/10.5281/zenodo.6374012
    Explore at:
    Available download formats: pdf, zip
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maria Kiourlappou; Stephen Taylor; Ilan Davis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The explosion in biological data generation challenges the available technologies and methodologies for data interrogation. Moreover, highly rich and complex datasets together with diverse linked data are difficult to explore when provided in flat files. Here we provide a way to systematically filter and analyse a dataset with more than 18 thousand data points using Zegami, a solution for interactive data visualisation and exploration. The primary data we use are derived from a systematic analysis of 200 YFP gene traps revealing common discordance between mRNA and protein across the nervous system, which is submitted elsewhere. This manual provides the raw image data together with annotations and associated data, and explains how to use Zegami to explore all these data types together through specific examples. We also provide the open-source Python code used to annotate the figures.

  7. Storytelling with Data

    • explore.openaire.eu
    Updated Jun 9, 2021
    Cite
    Jeremy R. Manning (2021). Storytelling with Data [Dataset]. http://doi.org/10.5281/zenodo.5182774
    Explore at:
    Dataset updated
    Jun 9, 2021
    Authors
    Jeremy R. Manning
    Description

    Storytelling with Data is organized into 2 main parts.

    Part I comprises four modules, and is collectively aimed at introducing students to the process of creating "data stories" using Python data science tools:

    • Module 1: What makes a good story?
    • Module 2: Visualizing data
    • Module 3: Python and Jupyter notebooks as a medium for data storytelling
    • Module 4: Data science tools

    Part II is project-based, and revolves around mini data science projects. For each project, one or more students choose a question and dataset to explore and turn into a data story. Each week students and groups will report on their progress with the latest iterations of their stories. Students should aim to participate in three or more projects during Part II of the course. At students' discretion, those three (or more) projects may comprise the same questions and/or datasets (e.g., whereby each story builds on the previous story), or multiple questions and/or datasets that may or may not be related. In addition, students are encouraged to build off of each others' code, projects, and questions. Projects and project groups should form organically and should remain flexible to facilitate changing goals and interests.

  8. Explore data formats and ingestion methods

    • kaggle.com
    Updated Feb 12, 2021
    Cite
    Gabriel Preda (2021). Explore data formats and ingestion methods [Dataset]. https://www.kaggle.com/datasets/gpreda/iris-dataset/discussion?sort=undefined
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gabriel Preda
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Why this Dataset

    This dataset brings you the Iris Dataset in several data formats (see more details in the next sections).

    You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared a Python Jupyter Notebook and an R Markdown report that read all these formats.

    Iris Dataset

    Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.

    Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris

    Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/

    The file downloaded is iris.data and is formatted as a comma delimited file.

    This small data collection was created to help you test your skills with ingesting various data formats.

    Content

    This file was processed to convert the data into the following formats:

    * csv - comma-separated values format
    * tsv - tab-separated values format
    * parquet - parquet format
    * feather - feather format
    * parquet.gzip - compressed parquet format
    * h5 - hdf5 format
    * pickle - Python binary object file (pickle format)
    * xlsx - Excel format
    * npy - NumPy (Python library) binary format
    * npz - NumPy (Python library) binary compressed format
    * rds - Rds (R-specific data format) binary format
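
    A minimal sketch of ingesting several of these formats with Python; the "iris.<ext>" file names and the HDF5 key are assumptions, and some readers need optional dependencies (pyarrow, tables, openpyxl):

    import numpy as np
    import pandas as pd

    df_csv = pd.read_csv("iris.csv")
    df_tsv = pd.read_csv("iris.tsv", sep="\t")
    df_parquet = pd.read_parquet("iris.parquet")    # needs pyarrow or fastparquet
    df_feather = pd.read_feather("iris.feather")    # needs pyarrow
    df_hdf = pd.read_hdf("iris.h5", key="iris")     # needs tables; key is assumed
    df_pickle = pd.read_pickle("iris.pickle")
    df_excel = pd.read_excel("iris.xlsx")           # needs openpyxl
    arr = np.load("iris.npy", allow_pickle=True)    # NumPy binary array
    npz = np.load("iris.npz", allow_pickle=True)    # compressed NumPy archive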

    Acknowledgements

    I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.

    Inspiration

    Use these data formats to test your skills in ingesting data in various formats.

  9. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Aug 21, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (153859818696 bytes)
    Dataset updated
    Aug 21, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
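
    A minimal sketch of resolving a KernelVersions id to its folder under this layout; the helper function is mine, derived from the description above:

    import os

    def kernel_version_dir(version_id: int) -> str:
        """Two-level folder for a given KernelVersions id."""
        top = version_id // 1_000_000        # 123 for ids 123,000,000-123,999,999
        sub = (version_id // 1_000) % 1_000  # 456 for ids 123,456,000-123,456,999
        return os.path.join(str(top), str(sub))

    print(kernel_version_dir(123_456_789))  # -> 123/456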

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  10. Data from: Reinforcement-based processes actively regulate motor exploration...

    • dataone.org
    • data.niaid.nih.gov
    • +2more
    Updated Jul 28, 2025
    Cite
    Adam Roth; Jan Calalo; Rakshith Lokesh; Seth Sullivan; Stephen Grill; John Jeka; Katinka van der Kooij; Michael Carter; Joshua Cashaback (2025). Reinforcement-based processes actively regulate motor exploration along redundant solution manifolds [Dataset]. http://doi.org/10.5061/dryad.ngf1vhj10
    Explore at:
    Dataset updated
    Jul 28, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Adam Roth; Jan Calalo; Rakshith Lokesh; Seth Sullivan; Stephen Grill; John Jeka; Katinka van der Kooij; Michael Carter; Joshua Cashaback
    Time period covered
    Sep 12, 2023
    Description

    From a baby’s babbling to a songbird practicing a new tune, exploration is critical to motor learning. A hallmark of exploration is the emergence of random walk behaviour along solution manifolds, where successive motor actions are not independent but rather become serially dependent. Such exploratory random walk behaviour is ubiquitous across species, neural firing, gait patterns, and reaching behaviour. Past work has suggested that exploratory random walk behaviour arises from an accumulation of movement variability and a lack of error-based corrections. Here we test a fundamentally different idea—that reinforcement-based processes regulate random walk behaviour to promote continual motor exploration to maximize success. Across three human-reaching experiments, we manipulated the size of both the visually displayed target and an unseen reward zone, as well as the probability of reinforcement feedback. Our empirical and modelling results parsimoniously support the notion that explorato...

    Data was collected using a Kinarm and processed using Kinarm's Matlab scripts. The output of the Matlab scripts was then processed using Python (3.8.13) and stored in custom Python objects.

    Reinforcement-Based Processes Actively Regulate Motor Exploration Along Redundant Solution Manifolds

    https://doi.org/10.5061/dryad.ngf1vhj10

    All files are compressed using the Python package dill. Each file contains a custom Python object that has data attributes and analysis methods. For a complete list of methods and attributes, see Exploration_Subject.py in the repository https://github.com/CashabackLab/Exploration-Along-Solution-Manifolds-Data

    Files can be read into a Python script via the class method "from_pickle" inside the Exploration_Subject class.
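
    A minimal loading sketch, assuming a hypothetical file name; the actual file names and the Exploration_Subject class are documented in the archive and the linked GitHub repository:

    import dill

    with open("exploration_subject.pkl", "rb") as f:  # hypothetical file name
        subject = dill.load(f)

    # Equivalently, via the class method mentioned above (requires
    # Exploration_Subject.py from the linked repository):
    # subject = Exploration_Subject.from_pickle("exploration_subject.pkl")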

  11. Healthcare Workforce Mental Health Dataset

    • kaggle.com
    Updated Feb 16, 2025
    Cite
    Rivalytics (2025). Healthcare Workforce Mental Health Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/10768196
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 16, 2025
    Dataset provided by
    Kaggle
    Authors
    Rivalytics
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    📌**Context**

    The Healthcare Workforce Mental Health Dataset is designed to explore workplace mental health challenges in the healthcare industry, an environment known for high stress and burnout rates.

    This dataset enables users to analyze key trends related to:

    💠 Workplace Stressors: Examining the impact of heavy workloads, poor work environments, and emotional demands.

    💠 Mental Health Outcomes: Understanding how stress and burnout influence job satisfaction, absenteeism, and turnover intention.

    💠 Educational & Analytical Applications: A valuable resource for data analysts, students, and career changers looking to practice skills in data exploration and data visualization.

    To help users gain deeper insights, this dataset is fully compatible with a Power BI Dashboard, available as part of a complete analytics bundle for enhanced visualization and reporting.

    📌**Source**

    This dataset was synthetically generated using the following methods:

    💠 Python & Data Science Techniques: Probabilistic modeling to simulate realistic data distributions. Industry-informed variable relationships based on healthcare workforce studies.

    💠 Guidance & Validation Using AI (ChatGPT): Assisted in refining dataset realism and logical mappings.

    💠 Industry Research & Reports: Based on insights from WHO, CDC, OSHA, and academic studies on workplace stress and mental health in healthcare settings.
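
    A minimal, hypothetical sketch of the probabilistic-simulation approach described above; the variables, distributions, and coefficients are invented, not the dataset's actual generator:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    n = 1_000

    # Simulated workplace stressors (hypothetical distributions)
    workload = rng.normal(loc=7, scale=1.5, size=n).clip(0, 10)
    support = rng.beta(2, 3, size=n) * 10

    # Map stressors to a burnout outcome via a logistic relationship
    burnout_prob = 1 / (1 + np.exp(-(0.6 * workload - 0.5 * support - 1.0)))
    burnout = rng.random(n) < burnout_prob

    df = pd.DataFrame({"workload": workload, "support": support, "burnout": burnout})
    print(df["burnout"].mean())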

    📌**Inspiration**

    This dataset was inspired by ongoing discussions in healthcare regarding burnout, mental health, and staff retention. The goal is to bridge the gap between raw data and actionable insights by providing a structured, analyst-friendly dataset.

    For those who want a ready-to-use reporting solution, a Power BI Dashboard Template is available, designed for interactive data exploration, workforce insights, and stress factor analysis.

    📌**Important Note** This dataset is synthetic and intended for educational purposes only. It is not real-world employee data and should not be used for actual decision-making or policy implementation.

  12. World Color Survey visualization web app

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Feb 25, 2015
    Cite
    Vejdemo-Johansson, Mikael; Ek, Carl-Henrik; Vejdemo, Susanne (2015). World Color Survey visualization web app [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001923653
    Explore at:
    Dataset updated
    Feb 25, 2015
    Authors
    Vejdemo-Johansson, Mikael; Ek, Carl-Henrik; Vejdemo, Susanne
    Description

    This is a full distribution of a web app for visualizing the World Color Survey using Mapper. The zip archive contains all files used, including the PyMapper library [MB]. It unpacks into the /var/www directory of a file system, and includes a README file with technical details for getting file permissions right. The app depends on a Python 2 installation (2.7 or later), with the additional colormath, matplotlib, numpy, and scipy libraries. [MB] Daniel Müllner and Aravindakshan Babu, Python Mapper: An open-source toolchain for data exploration, analysis and visualization, 2013, URL http://danifold.net/mapper

  13. Data from: OpenColab project: OpenSim in Google colaboratory to explore...

    • tandf.figshare.com
    docx
    Updated Jul 6, 2023
    Cite
    Hossein Mokhtarzadeh; Fangwei Jiang; Shengzhe Zhao; Fatemeh Malekipour (2023). OpenColab project: OpenSim in Google colaboratory to explore biomechanics on the web [Dataset]. http://doi.org/10.6084/m9.figshare.20440340.v1
    Explore at:
    Available download formats: docx
    Dataset updated
    Jul 6, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Hossein Mokhtarzadeh; Fangwei Jiang; Shengzhe Zhao; Fatemeh Malekipour
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    OpenSim is an open-source biomechanical package with a variety of applications. It is available to many users with bindings in MATLAB, Python, and Java via its application programming interfaces (APIs). Although the developers have documented OpenSim installation on different operating systems (Windows, Mac, and Linux) well, installation is time-consuming and complex since each operating system requires a different configuration. This project aims to demystify the development of neuro-musculoskeletal modeling in OpenSim with zero installation configuration on any operating system (thus cross-platform), making it easy to share models while accessing free graphical processing units (GPUs) on Google Colab, a web-based platform. To achieve this, we developed OpenColab, using the OpenSim source code to build a Conda package that can be installed on Google Colab with only one block of code in less than 7 minutes. Using OpenColab requires only an internet connection and a Gmail account. Moreover, OpenColab can access the vast libraries of machine learning methods available within free Google products, e.g. TensorFlow. We then performed an inverse problem in biomechanics and compared OpenColab results with the OpenSim graphical user interface (GUI) for validation. The outcomes of OpenColab and the GUI matched well (r≥0.82). OpenColab takes advantage of the zero-configuration of cloud-based platforms, accesses GPUs, and enables users to share and reproduce modeling approaches for further validation, innovative online training, and research applications. Step-by-step installation processes and examples are available at: https://simtk.org/projects/opencolab.
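
    A minimal sketch of the kind of single-cell Colab setup the paper describes; the condacolab route and the opensim-org channel are assumptions here (the authors' actual install cell is on the SimTK page above):

    # Run inside a Google Colab cell
    !pip install -q condacolab
    import condacolab
    condacolab.install()  # installs conda and restarts the Colab runtime

    # After the runtime restarts, in a new cell:
    !conda install -y -c opensim-org opensim

    import opensim as osim
    print(osim.GetVersion())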

  14. Play Fairway Analysis of the Snake River Plain, Idaho: Final Report

    • catalog.data.gov
    • data.openei.org
    • +3more
    Updated Jan 20, 2025
    Cite
    Utah State University (2025). Play Fairway Analysis of the Snake River Plain, Idaho: Final Report [Dataset]. https://catalog.data.gov/dataset/play-fairway-analysis-of-the-snake-river-plain-idaho-final-report-3cd0c
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Utah State University
    Area covered
    Snake River, Idaho, Snake River Plain
    Description

    This submission includes the final project report of the Snake River Plain Play Fairway Analysis project as well as a separate appendix to the final report. The final report outlines the application of Play Fairway Analysis (PFA) to geothermal exploration, specifically within the Snake River Plain volcanic province. The goals of the report are to use PFA to lower the risk and cost of geothermal exploration and to stimulate development of geothermal power resources in Idaho. Further use of this report could include the application of PFA for geothermal exploration throughout the geothermal industry. The report utilizes ArcGIS and Python for data analysis, which were used to develop a systematic workflow that automates the analysis. The appendix includes ArcGIS maps and data compilation information for the report.

  15. Data Manipulation on Heart Disease Dataset Using Pandas Library.

    • explore.openaire.eu
    Updated Jul 4, 2023
    Cite
    Alaa Saif; Janat Alkhuld M. (2023). Data Manipulation on Heart Disease Dataset Using Pandas Library. [Dataset]. http://doi.org/10.5281/zenodo.8113014
    Explore at:
    Dataset updated
    Jul 4, 2023
    Authors
    Alaa Saif; Janat Alkhuld M.
    Description

    With the constant development our world is facing, new diseases and dangers are marked down in human history as "Modern Day Diseases". In the developing world, the risk of heart disease and related cardiovascular diseases is on the rise. The dataset acquired here is considered a stepping stone for the work to be done ahead in order to prevent the development or occurrence of a heart attack or stroke.
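
    A minimal sketch of the kind of Pandas manipulation the title refers to; the file and column names are assumptions based on the common UCI heart-disease layout:

    import pandas as pd

    df = pd.read_csv("heart.csv")  # hypothetical file name

    print(df.shape)
    print(df.isna().sum())                      # check for missing values
    print(df.groupby("target")["age"].mean())   # average age by outcome

    # Simple filter example: older patients with high cholesterol
    high_risk = df[(df["age"] > 60) & (df["chol"] > 240)]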

  16. AdventureWorks-2014

    • kaggle.com
    Updated Aug 18, 2024
    Cite
    Patrick McKown (2024). AdventureWorks-2014 [Dataset]. https://www.kaggle.com/datasets/duckduckboot/adventureworks-2014
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Patrick McKown
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    About This Dataset

    This dataset is derived from the AdventureWorks 2014 test database published by Microsoft, and is designed to simplify and enhance data analysis workflows. The dataset consists of multiple CSV files that have been pre-joined and transformed from the original SQL database, facilitating a smoother analytical experience in Python.

    Dataset Composition

    The dataset includes:

    * SalesOrderHeader: Integrates the sales header and sales item tables, providing a unified view of sales transactions.
    * CustomerMaster: Combines customer names, countries, addresses, and other related information into a single, comprehensive file.
    * VendorMaster: Combines vendor names, countries, addresses, and other related information into a single, comprehensive file.

    These pre-joined CSVs aim to streamline data analysis, making it more accessible for users working in Python. The dataset can be used to showcase various Python projects or as a foundation for your own analyses.

    Usage

    Feel free to leverage this dataset for your data analysis projects, explore trends, and create visualizations. Whether you're showcasing your own Python projects or conducting independent analyses, this dataset is designed to support a wide range of data science tasks.

    Documentation

    For those interested in recreating the CSV files from the SQL database, detailed documentation is included at the bottom of this section. It provides step-by-step instructions on how to replicate the CSVs from the AdventureWorks 2014 database using SQL queries.

    AdventureWorks_SalesOrderHeader

    SELECT
      SalesOrderID
      , CAST (OrderDate AS date) AS OrderDate
      , CAST (ShipDate AS date) AS ShipDate
      , CustomerID
      , ShipToAddressID
      , BillToAddressID
      , SubTotal
      , TaxAmt
      , Freight
      , TotalDue
    FROM
      Sales.SalesOrderHeader
    

    AdventureWorks_CustomerMaster

    SELECT
      pa.AddressID
      , pbea.BusinessEntityID
      , pa.AddressLine1
      , pa.City
      , pa.PostalCode
      , psp.[Name] AS ProvinceStateName
      , pat.[Name] AS AddressType
      , pea.EmailAddress
      , ppp.PhoneNumber
      , pp.FirstName
      , pp.LastName
      , sst.CountryRegionCode
      , pcr.[Name] AS CountryName
      , sst.[Group] AS CountryGroup
    FROM 
      Person.[Address] AS pa
    INNER JOIN
      Person.BusinessEntityAddress AS pbea ON pa.AddressID = pbea.AddressID
    INNER JOIN
      Person.StateProvince AS psp ON pa.StateProvinceID = psp.StateProvinceID
    INNER JOIN
      Person.AddressType AS pat ON pbea.AddressTypeID = pat.AddressTypeID 
    INNER JOIN
      Person.EmailAddress AS pea ON pbea.BusinessEntityID = pea.BusinessEntityID
    INNER JOIN
      Person.Person AS pp ON pbea.BusinessEntityID = pp.BusinessEntityID
    INNER JOIN
      Person.PersonPhone AS ppp ON pbea.BusinessEntityID = ppp.BusinessEntityID
    INNER JOIN
      Sales.SalesTerritory AS sst ON psp.TerritoryID = sst.TerritoryID
    INNER JOIN
      Person.CountryRegion AS pcr ON sst.CountryRegionCode = pcr.CountryRegionCode;
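
    A minimal sketch of loading the pre-joined CSVs in Python; the file names and the join key are assumptions based on the section headers above:

    import pandas as pd

    orders = pd.read_csv("AdventureWorks_SalesOrderHeader.csv",
                         parse_dates=["OrderDate", "ShipDate"])
    customers = pd.read_csv("AdventureWorks_CustomerMaster.csv")

    # Example: total sales by customer country (join key is assumed)
    sales = orders.merge(customers, left_on="CustomerID",
                         right_on="BusinessEntityID", how="left")
    print(sales.groupby("CountryName")["TotalDue"].sum()
               .sort_values(ascending=False))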
    
  17. Waterhackweek 2019 Cyberseminar: Data access and time-series statistics

    • search.dataone.org
    • hydroshare.org
    • +1more
    Updated Dec 5, 2021
    + more versions
    Cite
    Emilio Mayorga; Yifan Cheng (2021). Waterhackweek 2019 Cyberseminar: Data access and time-series statistics [Dataset]. https://search.dataone.org/view/sha256%3A2536d9bad9399388928955f129d510b295fff064e6c4fb8edb45bc921a8bbaca
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Emilio Mayorga; Yifan Cheng
    Time period covered
    Feb 7, 2019
    Description

    Data about water are found in many types of formats distributed by many different sources and depicting different spatial representations such as points, polygons and grids. How do we find and explore the data we need for our specific research or application? This seminar will present common challenges and strategies for finding and accessing relevant datasets, focusing on time series data from sites commonly represented as fixed geographical points. This type of data may come from automated monitoring stations such as river gauges and weather stations, from repeated in-person field observations and samples, or from model output and processed data products. We will present and explore useful data catalogs, including the CUAHSI HIS catalog accessible via HydroClient, CUAHSI HydroShare, the EarthCube Data Discovery Studio, Google Dataset search, and agency-specific catalogs. We will also discuss programmatic data access approaches and tools in Python, particularly the ulmo data access package, touching on the role of community standards for data formats and data access protocols. Once we have accessed datasets we are interested in, the next steps are typically exploratory, focusing on visualization and statistical summaries. This seminar will illustrate useful approaches and Python libraries used for processing and exploring time series data, with an emphasis on the distinctive needs posed by temporal data. Core Python packages used include Pandas, GeoPandas, Matplotlib and the geospatial visualization tools introduced at the last seminar. Approaches presented can be applied to other data types that can be summarized as single time series, such as averages over a watershed or data extracts from a single cell in a gridded dataset – the topic for the next seminar.
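
    A minimal sketch of the access-then-explore pattern described above, using ulmo's USGS NWIS interface with Pandas; the site code, parameter key, and period are assumptions for illustration:

    import pandas as pd
    import ulmo

    # Daily values for one USGS gauge over the past year
    data = ulmo.usgs.nwis.get_site_data("09380000", service="daily", period="P365D")
    values = data["00060:00003"]["values"]  # daily mean discharge (assumed key)

    df = pd.DataFrame(values)
    df["datetime"] = pd.to_datetime(df["datetime"])
    df["value"] = pd.to_numeric(df["value"])
    df = df.set_index("datetime")

    print(df["value"].resample("M").mean())  # monthly mean summary
    df["value"].plot(title="Daily mean discharge")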

  18. Features extraction from the LAI2200C Plant Canopy Analyzer

    • search.dataone.org
    • hydroshare.org
    Updated Dec 5, 2021
    Cite
    Rui Gao; Alfonso Faustino Torres-Rua (2021). Features extraction from the LAI2200C Plant Canopy Analyzer [Dataset]. http://doi.org/10.4211/hs.6d0c4a14289742d0951ba5ab9eca7dc0
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Rui Gao; Alfonso Faustino Torres-Rua
    Time period covered
    Jan 1, 2020 - Dec 31, 2022
    Description

    Leaf area index (LAI) plays an important role in land-surface models to describe the energy, carbon, and water fluxes between the soil and canopy vegetation. Indirect ground LAI measurements, such as those made with the LAI2200C Plant Canopy Analyzer (PCA), can not only increase measurement efficiency but also protect the vegetation, compared with direct, destructive ground LAI measurement. Additionally, indirect measurements provide opportunities for remote-sensing-based LAI monitoring. This project focuses on the extraction of several features observed using the LAI2200C PCA, because the extracted features can help to explore the relationship between the ground measurements and remote sensing data. Although the FV2200 software provides convenient data calculation, data visualization, etc., it cannot generate features such as time, coordinates, and LAI from the data log for deeper exploration, especially when a large amount of collected data needs to be processed. To increase efficiency, this project developed a simple Python script for feature extraction, and demo data are provided.
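
    A minimal, hypothetical parsing sketch in the spirit of the project's script; the line patterns below are invented, not the actual LAI2200C log format (see the project's Python script for the real parser):

    import re
    import pandas as pd

    pattern = re.compile(
        r"TIME\s+(?P<time>\S+).*?LAT\s+(?P<lat>-?\d+\.\d+).*?"
        r"LONG\s+(?P<lon>-?\d+\.\d+).*?LAI\s+(?P<lai>\d+\.\d+)",
        re.DOTALL,
    )

    with open("lai2200c_log.txt") as f:  # hypothetical file name
        records = [m.groupdict() for m in pattern.finditer(f.read())]

    df = pd.DataFrame(records)
    print(df.head())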

  19. Earth Analytics in Python Course

    • qubeshub.org
    Updated Nov 6, 2019
    Cite
    Leah Wasser; Jenny Palomino; Chris Holdgraf (2019). Earth Analytics in Python Course [Dataset]. http://doi.org/10.25334/SH4R-QH25
    Explore at:
    Dataset updated
    Nov 6, 2019
    Dataset provided by
    QUBES
    Authors
    Leah Wasser; Jenny Palomino; Chris Holdgraf
    Area covered
    Earth
    Description

    Earth analytics is an intermediate, multidisciplinary course that addresses major questions in Earth science and teaches students to use the analytical tools necessary to undertake exploration of heterogeneous ‘big scientific data’.

  20. Titanic data for Data Preprocessing

    • kaggle.com
    Updated Oct 28, 2021
    Cite
    Akshay Sehgal (2021). Titanic data for Data Preprocessing [Dataset]. https://www.kaggle.com/akshaysehgal/titanic-data-for-data-preprocessing/code
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 28, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Akshay Sehgal
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Description

    Public "Titanic" dataset for data exploration, preprocessing and benchmarking basic classification/regression models.

    Columns

    • 'survived'
    • 'pclass'
    • 'sex'
    • 'age'
    • 'sibsp'
    • 'parch'
    • 'fare'
    • 'embarked'
    • 'class'
    • 'who'
    • 'adult_male'
    • 'deck'
    • 'embark_town'
    • 'alive'
    • 'alone'

    Acknowledgements

    Github: https://github.com/mwaskom/seaborn-data/blob/master/titanic.csv

    Inspiration

    Playground for visualizations, preprocessing, feature engineering, model pipelining, and more.
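
    A minimal preprocessing sketch on this dataset; the raw-file URL is the GitHub location above in raw form (an assumption):

    import pandas as pd

    url = ("https://raw.githubusercontent.com/mwaskom/"
           "seaborn-data/master/titanic.csv")
    df = pd.read_csv(url)

    df["age"] = df["age"].fillna(df["age"].median())        # impute missing ages
    df["embark_town"] = df["embark_town"].fillna("Unknown")
    df = pd.get_dummies(df, columns=["sex", "class"], drop_first=True)

    X = df.drop(columns=["survived", "alive"])  # 'alive' duplicates the target
    y = df["survived"]
    print(X.shape, y.mean())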
