Data Manipulation on Heart Disease Dataset Using Pandas Library.
explore.openaire.eu
Updated Jul 4, 2023
Alaa Saif; Janat Alkhuld M. (2023). Data Manipulation on Heart Disease Dataset Using Pandas Library. [Dataset]. http://doi.org/10.5281/zenodo.8113014
22 scholarly articles cite this dataset.
Unique identifier
https://doi.org/10.5281/zenodo.8113014
Dataset updated
Jul 4, 2023
Authors
Alaa Saif; Janat Alkhuld M.
Description
With the constant development our world is facing, new diseases and dangers are being recorded in human history as "Modern Day Diseases". In the developing world, the risk of heart disease and related cardiovascular diseases is on the rise. This dataset is considered a stepping stone in the work ahead to prevent the development or occurrence of heart attacks and strokes.
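The description does not list the dataset's columns. As a rough illustration of the kind of pandas manipulation the title refers to, here is a minimal sketch on a small made-up table; the column names (`age`, `sex`, `chol`, `target`) and all values are hypothetical, not taken from the deposited file.

```python
import pandas as pd

# Hypothetical columns modeled on common heart-disease tables;
# the real dataset's schema may differ.
df = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 57],
    "sex": ["M", "M", "F", "M", "F", "M"],
    "chol": [233, 250, None, 236, 354, 192],
    "target": [1, 1, 1, 1, 0, 0],  # assumed: 1 = heart disease present
})

# Fill the missing cholesterol value with the column median.
df["chol"] = df["chol"].fillna(df["chol"].median())

# Boolean indexing: patients aged 50 or older.
older = df[df["age"] >= 50]

# Group-wise summary: mean cholesterol by sex and outcome.
summary = df.groupby(["sex", "target"])["chol"].mean()
print(summary)
```

Median imputation is only one choice; dropping incomplete rows with `dropna()` is the other common option when few values are missing.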
Sample data files for Python Course
figshare.com
txt
Updated Nov 4, 2022
Peter Verhaar (2022). Sample data files for Python Course [Dataset]. http://doi.org/10.6084/m9.figshare.21501549.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21501549.v1
Dataset updated
Nov 4, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Peter Verhaar
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sample data set used in an introductory course on Programming in Python
Python Codes for Data Analysis of The Impact of COVID-19 on Technical...
figshare.com
Updated Aug 1, 2022
Elizabeth Szkirpan (2022). Python Codes for Data Analysis of The Impact of COVID-19 on Technical Services Units Survey Results [Dataset]. http://doi.org/10.6084/m9.figshare.20416092.v1
Unique identifier
https://doi.org/10.6084/m9.figshare.20416092.v1
Dataset updated
Aug 1, 2022
Dataset provided by
Figshare (http://figshare.com/)
Authors
Elizabeth Szkirpan
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Copies of Anaconda 3 Jupyter Notebooks and a Python script for holistic and clustered analysis of "The Impact of COVID-19 on Technical Services Units" survey results. Data were analyzed holistically, using the cleaned and standardized survey results, and by library-type clusters. To streamline data analysis in certain locations, an off-shoot CSV file was created so that data could be standardized without compromising the integrity of the parent clean file. Three Jupyter Notebooks/Python scripts are available in relation to this project, including COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type; the clustered files are available as part of the Dataverse for this project).
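The off-shoot file approach can be sketched in pandas as follows. This assumes a small made-up survey table with hypothetical columns `library_type` and `staff_change`; it standardizes an explicit copy so the parent clean data stays untouched, then builds a per-library-type view similar in spirit to the clustered analysis. It is not the authors' notebook.

```python
import io
import pandas as pd

# Stand-in for the parent clean file (hypothetical columns and values).
parent_csv = io.StringIO(
    "respondent,library_type,staff_change\n"
    "1, Academic ,Decreased\n"
    "2,public,decreased\n"
    "3,Academic,Increased\n"
)
parent = pd.read_csv(parent_csv)

# Work on an explicit copy so the parent DataFrame is never modified.
offshoot = parent.copy()
for col in ["library_type", "staff_change"]:
    offshoot[col] = offshoot[col].str.strip().str.lower()

# Clustered view: response counts per library type.
counts = offshoot.groupby("library_type")["staff_change"].value_counts()

# Write the standardized copy to its own CSV (an in-memory buffer here).
out = io.StringIO()
offshoot.to_csv(out, index=False)
```

In practice `out` would be a separate file path, so the parent clean CSV is never overwritten.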
Data from: PLEIAData: real data from the Pleiades building for smart...
zenodo.org
zip
Updated Feb 8, 2023
Antonio Martínez Ibarra; Aurora González-Vidal; Antonio Skarmeta Gómez (2023). PLEIAData: real data from the Pleiades building for smart applications [Dataset]. http://doi.org/10.5281/zenodo.7096790
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7096790
Dataset updated
Feb 8, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Antonio Martínez Ibarra; Aurora González-Vidal; Antonio Skarmeta Gómez
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset presents detailed building operation data from the three blocks (A, B and C) of the Pleiades building of the University of Murcia, which is a pilot building of the European project PHOENIX. The aim of PHOENIX is to improve building efficiency, and we therefore included information on:
(i) consumption data, aggregated by block in kWh; (ii) HVAC (Heating, Ventilation and Air Conditioning) data with several features, such as state (ON=1, OFF=0), operation mode (None=0, Heating=1, Cooling=2), setpoint and device type; (iii) indoor temperature per room; (iv) weather data, including temperature, humidity, radiation, dew point, wind direction and precipitation; (v) carbon dioxide and presence data for a few rooms; (vi) relationships between the HVAC, temperature, carbon dioxide and presence sensor identifiers and their respective rooms and blocks. Weather data were acquired from IMIDA (Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario).
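Because the HVAC state and operation mode are stored as integer codes, one way to make them readable is to map them to labels with pandas. This sketch uses the code tables given in the description; the column names and sample rows are hypothetical.

```python
import pandas as pd

# Code tables taken from the dataset description.
STATE_LABELS = {0: "OFF", 1: "ON"}
MODE_LABELS = {0: "None", 1: "Heating", 2: "Cooling"}

# Hypothetical sample rows; the real column names may differ.
hvac = pd.DataFrame({
    "device": ["A-101", "A-102", "B-201"],
    "state": [1, 0, 1],
    "mode": [2, 0, 1],
    "setpoint": [24.0, 21.0, 21.5],
})

# Series.map replaces each code with its label (unknown codes become NaN).
hvac["state_label"] = hvac["state"].map(STATE_LABELS)
hvac["mode_label"] = hvac["mode"].map(MODE_LABELS)

# Units that are ON and actively heating or cooling.
active = hvac[(hvac["state"] == 1) & (hvac["mode"] != 0)]
```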
Learn Data Science Series Part 1
kaggle.com
Updated Dec 30, 2022
Rupesh Kumar (2022). Learn Data Science Series Part 1 [Dataset]. https://www.kaggle.com/datasets/hunter0007/learn-data-science-part-1
Dataset updated
Dec 30, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Rupesh Kumar
License
CC0 1.0 Universal (Public Domain Dedication), https://creativecommons.org/publicdomain/zero/1.0/
Description
Please feel free to share it with others and consider supporting me if you find it helpful ⭐️.

Overview:

Chapter 1: Getting started with pandas

Chapter 2: Analysis: Bringing it all together and making decisions

Chapter 3: Appending to DataFrame

Chapter 4: Boolean indexing of dataframes

Chapter 5: Categorical data

Chapter 6: Computational Tools

Chapter 7: Creating DataFrames

Chapter 8: Cross sections of different axes with MultiIndex

Chapter 9: Data Types

Chapter 10: Dealing with categorical variables

Chapter 11: Duplicated data

Chapter 12: Getting information about DataFrames

Chapter 13: Gotchas of pandas

Chapter 14: Graphs and Visualizations

Chapter 15: Grouping Data

Chapter 16: Grouping Time Series Data

Chapter 17: Holiday Calendars

Chapter 18: Indexing and selecting data

Chapter 19: IO for Google BigQuery

Chapter 20: JSON

Chapter 21: Making Pandas Play Nice With Native Python Datatypes

Chapter 22: Map Values

Chapter 23: Merge, join, and concatenate

Chapter 24: Meta: Documentation Guidelines

Chapter 25: Missing Data

Chapter 26: MultiIndex

Chapter 27: Pandas Datareader

Chapter 28: Pandas IO tools (reading and saving data sets)

Chapter 29: pd.DataFrame.apply

Chapter 30: Read MySQL to DataFrame

Chapter 31: Read SQL Server to Dataframe

Chapter 32: Reading files into pandas DataFrame

Chapter 33: Resampling

Chapter 34: Reshaping and pivoting

Chapter 35: Save pandas dataframe to a csv file

Chapter 36: Series

Chapter 37: Shifting and Lagging Data

Chapter 38: Simple manipulation of DataFrames

Chapter 39: String manipulation

Chapter 40: Using .ix, .iloc, .loc, .at and .iat to access a DataFrame

Chapter 41: Working with Time Series
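Chapter 40 refers to `.ix`, which was deprecated in pandas 0.20 and removed in pandas 1.0. Label- and position-based access now goes through `.loc` and `.iloc`, with `.at` and `.iat` for single values. A minimal sketch on a toy DataFrame (the data are made up):

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp": [4.5, 18.0, 27.3]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "temp"]          # row and column by label
by_position = df.iloc[1, 1]             # row and column by integer position
fast_label = df.at["c", "city"]         # single value by label
fast_position = df.iat[0, 0]            # single value by position
warm = df.loc[df["temp"] > 10, "city"]  # boolean mask combined with .loc
```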
Python codes for STM data analysis
researchdata.cab.unipd.it
Updated 2025
Christian Durante; Francesco Cazzadori; Alessandro Facchin; Silvio Reginato (2025). Python codes for STM data analysis [Dataset]. http://doi.org/10.25430/researchdata.cab.unipd.it.00001489
Unique identifier
https://doi.org/10.25430/researchdata.cab.unipd.it.00001489
Dataset updated
2025
Dataset provided by
Research Data Unipd
Authors
Christian Durante; Francesco Cazzadori; Alessandro Facchin; Silvio Reginato
Description
The Python code was written to work with ASCII .txt files containing XYZ arrays, both as input and output, which makes it highly compatible and universally usable. Image analysis software always allows source files to be exported as .txt files with XYZ arrays, sometimes placing a text header before the data values to indicate the data scales.
Code A converts raw STM files (.s94) into XYZ-type ASCII .txt files that can be opened with the WSxM software.
Code B reads the XYZ-type ASCII files and applies flattening and equalizing filters to an entire folder of input files.
Code C was conceived with the possibility of optimizing the number of clusters.
Code D reads a sample of images, from the first one up to image N (selected by the user), calculates the maximum extension of the Z-value distribution for every image, and returns the average extension.
Code E corrects the drift affecting STM images.
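The XYZ workflow of codes B and D can be approximated with NumPy. This is a sketch under assumptions, not the authors' scripts: it uses a least-squares plane subtraction as the flattening filter and interprets the "maximum extension" of the Z distribution as its peak-to-peak range (max minus min). The function names are hypothetical.

```python
import io
import numpy as np

def load_xyz(source, skip_header=0):
    """Read an ASCII XYZ file: three whitespace-separated columns."""
    return np.loadtxt(source, skiprows=skip_header)

def plane_flatten(xyz):
    """Subtract the least-squares plane z = a*x + b*y + c from Z."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    design = np.column_stack([x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, z, rcond=None)
    flat = xyz.copy()
    flat[:, 2] = z - design @ coeffs
    return flat

def z_extension(xyz):
    """Peak-to-peak range of Z (assumed meaning of 'maximum extension')."""
    return float(np.ptp(xyz[:, 2]))

def mean_extension(images, n):
    """Average Z extension over the first n images, as code D is described."""
    return float(np.mean([z_extension(img) for img in images[:n]]))

# Example: a tilted plane with a one-line text header.
txt = io.StringIO("X[nm] Y[nm] Z[nm]\n0 0 1.0\n1 0 1.5\n0 1 1.2\n1 1 1.7\n")
xyz = load_xyz(txt, skip_header=1)
flat = plane_flatten(xyz)
```

Plane subtraction removes only a global tilt; line-by-line flattening, common in STM software, is a different filter and is not shown here.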
Introduction to Machine Learning using Python: Classification
explore.openaire.eu
Updated Jan 1, 2020
Khuong Tran; Dr Ghulam Murtaza; Dr Anastasios Papaioannou (2020). Introduction to Machine Learning using Python: Classification [Dataset]. http://doi.org/10.5281/zenodo.6423725
Unique identifier
https://doi.org/10.5281/zenodo.6423725
Dataset updated
Jan 1, 2020
Authors
Khuong Tran; Dr Ghulam Murtaza; Dr Anastasios Papaioannou
Description
About this course
Machine Learning (ML) is a new way to program computers to solve real-world problems. It has gained popularity over the last few years by achieving tremendous success in tasks that we believed only humans could solve, from recognising images to self-driving cars. In this course, we will explore the fundamentals of Machine Learning from a practical perspective with the help of the Python programming language and its scientific computing libraries.

Learning Outcomes
A comprehensive introduction to Machine Learning models and techniques such as Logistic Regression, Decision Trees and Ensemble Learning.
Know the differences between various core Machine Learning models.
Understand Machine Learning modelling workflows.
Use Python and scikit-learn to process real datasets, and to train and apply Machine Learning models.

Prerequisites
Either "Learn to Program: Python", "Data Manipulation in Python" and "Introduction to ML using Python: Introduction & Linear Regression", or "Learn to Program: Python", "Data Manipulation and Visualisation in Python" and "Introduction to ML using Python: Introduction & Linear Regression", is needed to attend this course. If you already have programming experience, please check the topics covered in the Learn to Program: Python, Data Manipulation in Python, Data Manipulation and Visualisation in Python and Introduction to ML using Python: Introduction & Linear Regression courses to make sure you are familiar with the knowledge needed for this course, such as a good understanding of Python syntax, basic programming concepts, familiarity with the Pandas, Numpy and Seaborn libraries, and a basic understanding of Machine Learning and model training.
Maths knowledge is not required. You will see only a few mathematical formulas in this course; however, references to the mathematics required for learning about Machine Learning will be provided. Understanding the mathematics behind each Machine Learning algorithm will help you appreciate the behaviour of a model and know its pros and cons when using it.

Why do this course?
It is useful for anyone who wants to learn about Machine Learning but is overwhelmed by the tremendous amount of resources. It does not go in depth into mathematical concepts and formulas; however, formal intuitions and references are provided to guide participants in further learning.
We do have applications on real datasets! Machine Learning models are introduced in this course together with important feature engineering techniques that are guaranteed to be useful in your own projects.
It gives you enough background to kickstart your own Machine Learning journey, or to transition into Deep Learning.
For a better and more complete understanding of the most popular Machine Learning models and techniques, please consider attending all three Introduction to Machine Learning using Python workshops: Introduction to Machine Learning using Python: Introduction & Linear Regression; Introduction to Machine Learning using Python: Classification; and Introduction to Machine Learning using Python: SVM & Unsupervised Learning.

Licence
Copyright © 2021 Intersect Australia Ltd. All rights reserved.
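As a small illustration of the scikit-learn workflow listed among the course outcomes (not material from the course itself), the sketch below trains a logistic regression and a decision tree on scikit-learn's bundled Iris data and compares their held-out accuracy. The dataset choice and hyperparameters are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Stratified 70/30 split so each class is represented in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on held-out data
```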
Data analysis V5 for python.xlsx
figshare.com
xlsx
Updated May 8, 2025
Pingfei Jiang (2025). Data analysis V5 for python.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.28956233.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28956233.v1
Dataset updated
May 8, 2025
Dataset provided by
figshare
Authors
Pingfei Jiang
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the original data, prior to processing, for the manuscript "A Comparative Study on Retrieval-Augmented Generation and Chain-of-Thought Applications for LLM-Assisted Engineering Design Ideation".
Using Python Packages and HydroShare to Advance Open Data Science and...
hydroshare.org
beta.hydroshare.org
zip
Updated Sep 28, 2023
Jeffery S. Horsburgh; Amber Spackman Jones; Anthony M. Castronova; Scott Black (2023). Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water [Dataset]. https://www.hydroshare.org/resource/4f4acbab5a8c4c55aa06c52a62a1d1fb
Available download formats: zip (31.0 MB)
Dataset updated
Sep 28, 2023
Dataset provided by
HydroShare
Authors
Jeffery S. Horsburgh; Amber Spackman Jones; Anthony M. Castronova; Scott Black
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.

This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.
Data analysis codes
figshare.com
txt
Updated Sep 7, 2024
Dr Auguste Vadisiute; Fernando Messore; Marissa Mueller (2024). Data analysis codes [Dataset]. http://doi.org/10.6084/m9.figshare.26963674.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.26963674.v1
Dataset updated
Sep 7, 2024
Dataset provided by
Figshare (http://figshare.com/)
Authors
Dr Auguste Vadisiute; Fernando Messore; Marissa Mueller
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data analysis scripts for neurones, glial cells and interneurons.
(HS 2) Automate Workflows using Jupyter notebook to create Large Extent...
search.dataone.org
hydroshare.org
Updated Oct 19, 2024
Young-Don Choi (2024). (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets [Dataset]. http://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
Unique identifier
https://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
Dataset updated
Oct 19, 2024
Dataset provided by
Hydroshare
Authors
Young-Don Choi
Description
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we used the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio extension for xarray; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4 and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on Windows.
Storage and Transit Time Data and Code
zenodo.org
zip
Updated Oct 29, 2024
Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14009758
Dataset updated
Oct 29, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Andrew Felton
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Author: Andrew J. Felton
Date: 10/29/2024

This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, and analysis, and for figure production, for the study entitled:

"Global estimates of the storage and transit time of water through vegetation"

Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has been updated.

Data information:

The data folder contains key data sets used for analysis. In particular:

"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study, including global arrays summarizing five-year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and the number of months of data, available as both an array (.nc) and a data table (.csv). These data were produced in Python using the Python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folders primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but they have been extensively processed and filtered here. The "supporting_data" folder also contains the annual (2016-2020) MODIS land cover data used in the analysis, with separate folders containing the original data (.hdf) and the final processed (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in Python.

Code information:

Python scripts can be found in the "supporting_code" folder.

Each R script in this project has a role:

"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

"02_functions.R": This script contains custom functions. Load it using the `source()` function in the 01_start.R script.

"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load it using the `source()` function in the 01_start.R script.

"04_figures_tables.R": This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.

"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

"supporting_process_land_cover.R": This script takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in the pre-processing of datasets in Python.
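The project performs the mean/minimum join in R (03_import_data.R). Since the inputs are produced in Python, an equivalent step in pandas might look like the sketch below. The grid-cell key columns (`lon`, `lat`), the value column names and the numbers are hypothetical, and an inner join is assumed.

```python
import pandas as pd

# Hypothetical per-grid-cell tables; names and values are illustrative only.
mean_annual = pd.DataFrame({
    "lon": [-105.5, -105.0, -104.5],
    "lat": [40.0, 40.0, 40.0],
    "mean_transit_days": [12.1, 8.4, 15.0],
})
min_monthly = pd.DataFrame({
    "lon": [-105.5, -105.0, -104.0],
    "lat": [40.0, 40.0, 40.0],
    "min_transit_days": [3.2, 2.7, 5.9],
})

# Inner join on grid-cell coordinates keeps only cells present in both tables.
annual_turnover = mean_annual.merge(min_monthly, on=["lon", "lat"], how="inner")
```

Joining on floating-point coordinates only works when both tables share exactly the same grid; otherwise a shared integer cell ID is a safer key.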
COM model and data analysis scripts
figshare.com
search.datacite.org
txt
Updated Jan 19, 2016
José Pedro Correia (2016). COM model and data analysis scripts [Dataset]. http://doi.org/10.6084/m9.figshare.1428652.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.1428652.v1
Dataset updated
Jan 19, 2016
Dataset provided by
Figshare (http://figshare.com/)
Authors
José Pedro Correia
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This fileset contains scripts used for model implementation, simulation execution, and data processing for the work presented in J.P. Correia, R. Ocelák, and J. Mašek's "Towards more realistic modeling of linguistic color categorization" (to appear). The Python script for model implementation and simulation execution is adapted from another implementation originally by Gerhard Jaeger and later extended by Michael Franke. The code is provided as is to support a deeper understanding of the details involved in the data analysis we carried out. It is not fully organized or documented (it might even be a bit hacky in places), and for that we apologize.
Data processing tools python
kaggle.com
Updated Apr 26, 2024
Amir Islam (2024). Data processing tools python [Dataset]. https://www.kaggle.com/datasets/amirislam100/data-processing-tools-python
Dataset updated
Apr 26, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Amir Islam
License
MIT License, https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Amir Islam

Released under the MIT License.

Scientific Data Analysis and Visualization with Python
explore.openaire.eu
Updated Feb 2, 2022
Md. Jalal Uddin (2022). Scientific Data Analysis and Visualization with Python [Dataset]. http://doi.org/10.5281/zenodo.5944708
Unique identifier
https://doi.org/10.5281/zenodo.5944708
Dataset updated
Feb 2, 2022
Authors
Md. Jalal Uddin
Description
The training materials are provided for international learners. The following Python lectures are also available on YouTube for both international and Bangladeshi learners.
For international learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ9ci8DAhpizHGQ7IsCZFsKu
For Bangladeshi learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ_byYGwq3FyGhDOFRNdHRL8
My profile: https://researchsociety20.org/founder-and-director/
Raw and Processed Survey Data and Python Analysis Code
figshare.com
xlsx
Updated Jun 2, 2023
Nicolas Schmelling (2023). Raw and Processed Survey Data and Python Analysis Code [Dataset]. http://doi.org/10.6084/m9.figshare.23282879.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.23282879.v1
Dataset updated
Jun 2, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nicolas Schmelling
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data contain the raw survey data (Excel file), the manually processed data (Excel file), the literature analysis of the growth conditions of Synechocystis sp. PCC 6803 (Excel file), the manually curated open-question answers (Excel file), a comparison of the original recipes of BG11 media (Excel file), and the data analysis in Python (Jupyter notebook).
Datasets for manuscript "A data engineering framework for chemical flow...
catalog.data.gov
gimi9.com
Updated Nov 7, 2021
U.S. EPA Office of Research and Development (ORD) (2021). Datasets for manuscript "A data engineering framework for chemical flow analysis of industrial pollution abatement operations" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-a-data-engineering-framework-for-chemical-flow-analysis-of-industr
Dataset updated
Nov 7, 2021
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
The EPA GitHub repository PAU4ChemAs, as described in its README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The datasets folder contains the outputs for each framework step. The Chemicals_in_categories.csv file contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study, as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and tracking flow transfers at the end-of-life stage. The data were obtained by means of data engineering using different publicly available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to gather, at scale, information about exposure limits and physical properties from different publicly available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). All GitHub repositories also describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA Disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).
Data from: dblp XML dataset as CSV for Python Data Analysis Library
observatorio-cientifico.ua.es
Updated 2021
Carrasco, Rafael C.; Candela, Gustavo (2021). dblp XML dataset as CSV for Python Data Analysis Library [Dataset]. https://observatorio-cientifico.ua.es/documentos/668fc45db9e7c03b01bdb2d0
Dataset updated
2021
Authors
Carrasco, Rafael C.; Candela, Gustavo
Description
Based on the dblp XML file, this dataset consists of a CSV file that has been extracted using a Python script. The dataset can be easily loaded into a Python Data Analysis Library (pandas) DataFrame.
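Loading such a CSV into pandas is typically a single `read_csv` call. The sketch below uses an in-memory stand-in with hypothetical columns (`key`, `title`, `year`, `type`); the real file's header and delimiter should be checked before reusing it. For a file as large as a full dblp export, `read_csv` options such as `usecols` or `chunksize` can help keep memory use down.

```python
import io
import pandas as pd

# Stand-in for the dblp CSV; column names here are assumptions.
sample = io.StringIO(
    "key,title,year,type\n"
    "journals/x/A21,Paper A,2019,article\n"
    "conf/y/B20,Paper B,2020,inproceedings\n"
    "conf/y/C20,Paper C,2020,inproceedings\n"
)

# Nullable integer dtype tolerates records with a missing year.
dblp = pd.read_csv(sample, dtype={"year": "Int64"})

per_year = dblp.groupby("year").size()  # publications per year
by_type = dblp["type"].value_counts()   # publications per record type
```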
Python and R Basics for Environmental Data Sciences
search.dataone.org
hydroshare.org
Updated Dec 5, 2021
Tao Wen (2021). Python and R Basics for Environmental Data Sciences [Dataset]. https://search.dataone.org/view/sha256%3Aa4a66e6665773400ae76151d376607edf33cfead15ffad958fe5795436ff48ff
Dataset updated
Dec 5, 2021
Dataset provided by
Hydroshare
Authors
Tao Wen
Description
This resource collects teaching materials that were originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and then refined/revised by Tao Wen for use in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.

This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.

This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.
Python use cases globally 2022
statista.com
Updated Jul 11, 2025
Statista (2025). Python use cases globally 2022 [Dataset]. https://www.statista.com/statistics/1338409/python-use-cases/
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Oct 2022 - Dec 2022
Area covered
Worldwide
Description
Python has become one of the most popular programming languages, with a wide variety of use cases. In 2022, Python was most used for web development and data analysis, with ** percent and ** percent, respectively.
