100+ datasets found
  1. Data Manipulation on Heart Disease Dataset Using Pandas Library.

    • explore.openaire.eu
    Updated Jul 4, 2023
    Cite
    Alaa Saif; Janat Alkhuld M. (2023). Data Manipulation on Heart Disease Dataset Using Pandas Library. [Dataset]. http://doi.org/10.5281/zenodo.8113014
    Explore at:
    Dataset updated
    Jul 4, 2023
    Authors
    Alaa Saif; Janat Alkhuld M.
    Description

    With the constant development our world is facing, new diseases and dangers are marked down in human history as "Modern Day Diseases". In the developing world, the risk of heart disease and related cardiovascular diseases is on the rise. The dataset acquired here is considered a stepping stone for the work to be done ahead in order to prevent the development or occurrence of a heart attack or stroke.
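The description centers on pandas-based manipulation of heart disease records. A minimal sketch of the kind of filtering and aggregation involved; the column names here are hypothetical, not taken from the dataset:

```python
import pandas as pd

# Toy heart-disease-style records; the real dataset's columns may differ.
df = pd.DataFrame({
    "age": [63, 45, 57, 70],
    "chol": [233, 180, 250, 300],   # serum cholesterol
    "target": [1, 0, 1, 1],         # 1 = disease present
})

high_risk = df[(df["age"] > 50) & (df["chol"] > 240)]  # boolean filtering
by_target = df.groupby("target")["chol"].mean()        # per-outcome aggregation
```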

  2. Sample data files for Python Course

    • figshare.com
    txt
    Updated Nov 4, 2022
    Cite
    Peter Verhaar (2022). Sample data files for Python Course [Dataset]. http://doi.org/10.6084/m9.figshare.21501549.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Nov 4, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Peter Verhaar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample data set used in an introductory course on Programming in Python

  3. Python Codes for Data Analysis of The Impact of COVID-19 on Technical...

    • figshare.com
    Updated Aug 1, 2022
    Cite
    Elizabeth Szkirpan (2022). Python Codes for Data Analysis of The Impact of COVID-19 on Technical Services Units Survey Results [Dataset]. http://doi.org/10.6084/m9.figshare.20416092.v1
    Explore at:
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Elizabeth Szkirpan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Copies of Anaconda 3 Jupyter Notebooks and Python script for holistic and clustered analysis of "The Impact of COVID-19 on Technical Services Units" survey results. Data was analyzed holistically using cleaned and standardized survey results and by library type clusters. To streamline data analysis in certain locations, an off-shoot CSV file was created so data could be standardized without compromising the integrity of the parent clean file. Three Jupyter Notebooks/Python scripts are available in relation to this project: COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type, clustered files available as part of the Dataverse for this project).

  4. Data from: PLEIAData: real data from the Pleiades building for smart...

    • zenodo.org
    zip
    Updated Feb 8, 2023
    + more versions
    Cite
    Antonio Martínez Ibarra; Aurora González-Vidal; Antonio Skarmeta Gómez (2023). PLEIAData: real data from the Pleiades building for smart applications [Dataset]. http://doi.org/10.5281/zenodo.7096790
    Explore at:
    zip (available download formats)
    Dataset updated
    Feb 8, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Antonio Martínez Ibarra; Aurora González-Vidal; Antonio Skarmeta Gómez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset presents detailed building operation data from the three blocks (A, B and C) of the Pleiades building of the University of Murcia, which is a pilot building of the European project PHOENIX. The aim of PHOENIX is to improve building efficiency, and therefore we included information on:
    (i) consumption data, aggregated by block in kWh; (ii) HVAC (Heating, Ventilation and Air Conditioning) data with several features, such as state (ON=1, OFF=0), operation mode (None=0, Heating=1, Cooling=2), setpoint and device type; (iii) indoor temperature per room; (iv) weather data, including temperature, humidity, radiation, dew point, wind direction and precipitation; (v) carbon dioxide and presence data for a few rooms; (vi) relationships between HVAC, temperature, carbon dioxide and presence sensor identifiers and their respective rooms and blocks. Weather data was acquired from the IMIDA (Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario).
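The HVAC fields are stored as numeric codes (state ON=1/OFF=0, mode None=0/Heating=1/Cooling=2). A sketch of decoding them with pandas; the actual file layout in PLEIAData may differ:

```python
import pandas as pd

# Illustrative rows using the code conventions described above;
# not the dataset's real schema.
hvac = pd.DataFrame({
    "state": [1, 0, 1],
    "mode": [1, 0, 2],
    "setpoint": [21.0, None, 24.0],
})

# Map numeric codes to readable labels.
hvac["mode_label"] = hvac["mode"].map({0: "None", 1: "Heating", 2: "Cooling"})
hvac["is_on"] = hvac["state"].astype(bool)
```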

  5. Learn Data Science Series Part 1

    • kaggle.com
    Updated Dec 30, 2022
    Cite
    Rupesh Kumar (2022). Learn Data Science Series Part 1 [Dataset]. https://www.kaggle.com/datasets/hunter0007/learn-data-science-part-1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Rupesh Kumar
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Please feel free to share it with others and consider supporting me if you find it helpful ⭐️.

    Overview:

    • Chapter 1: Getting started with pandas
    • Chapter 2: Analysis: Bringing it all together and making decisions
    • Chapter 3: Appending to DataFrame
    • Chapter 4: Boolean indexing of dataframes
    • Chapter 5: Categorical data
    • Chapter 6: Computational Tools
    • Chapter 7: Creating DataFrames
    • Chapter 8: Cross sections of different axes with MultiIndex
    • Chapter 9: Data Types
    • Chapter 10: Dealing with categorical variables
    • Chapter 11: Duplicated data
    • Chapter 12: Getting information about DataFrames
    • Chapter 13: Gotchas of pandas
    • Chapter 14: Graphs and Visualizations
    • Chapter 15: Grouping Data
    • Chapter 16: Grouping Time Series Data
    • Chapter 17: Holiday Calendars
    • Chapter 18: Indexing and selecting data
    • Chapter 19: IO for Google BigQuery
    • Chapter 20: JSON
    • Chapter 21: Making Pandas Play Nice With Native Python Datatypes
    • Chapter 22: Map Values
    • Chapter 23: Merge, join, and concatenate
    • Chapter 24: Meta: Documentation Guidelines
    • Chapter 25: Missing Data
    • Chapter 26: MultiIndex
    • Chapter 27: Pandas Datareader
    • Chapter 28: Pandas IO tools (reading and saving data sets)
    • Chapter 29: pd.DataFrame.apply
    • Chapter 30: Read MySQL to DataFrame
    • Chapter 31: Read SQL Server to Dataframe
    • Chapter 32: Reading files into pandas DataFrame
    • Chapter 33: Resampling
    • Chapter 34: Reshaping and pivoting
    • Chapter 35: Save pandas dataframe to a csv file
    • Chapter 36: Series
    • Chapter 37: Shifting and Lagging Data
    • Chapter 38: Simple manipulation of DataFrames
    • Chapter 39: String manipulation
    • Chapter 40: Using .ix, .iloc, .loc, .at and .iat to access a DataFrame
    • Chapter 41: Working with Time Series
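A few of the chapter topics above in miniature: creating DataFrames, boolean indexing, grouping, and merging. The data is a toy example, not taken from the course:

```python
import pandas as pd

# Chapter 7: creating DataFrames
sales = pd.DataFrame({"store": ["A", "A", "B"], "units": [3, 5, 2]})
names = pd.DataFrame({"store": ["A", "B"], "city": ["Oslo", "Bergen"]})

big = sales[sales["units"] > 2]                 # Chapter 4: boolean indexing
totals = sales.groupby("store")["units"].sum()  # Chapter 15: grouping data
joined = sales.merge(names, on="store")         # Chapter 23: merge/join
```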
  6. Python codes for STM data analysis

    • researchdata.cab.unipd.it
    Updated 2025
    Cite
    Christian Durante; Francesco Cazzadori; Alessandro Facchin; Silvio Reginato (2025). Python codes for STM data analysis [Dataset]. http://doi.org/10.25430/researchdata.cab.unipd.it.00001489
    Explore at:
    Dataset updated
    2025
    Dataset provided by
    Research Data Unipd
    Authors
    Christian Durante; Francesco Cazzadori; Alessandro Facchin; Silvio Reginato
    Description

    The Python codes were conceived to work with ASCII .txt files containing XYZ arrays, both as input and output, which makes the codes highly compatible and universally usable. Image analysis software can generally export source files to .txt files with XYZ arrays, sometimes placing a text header before the data values to indicate the data scales.
    • Code A converts raw STM files (.s94) into XYZ-type ASCII files that can be opened by the WSxM software.
    • Code B reads the XYZ-type ASCII files and applies the flattening and equalizing filters, operating on an entire input file folder.
    • Code C was conceived with the possibility of optimizing the number of clusters.
    • Code D reads a sample of images, from the first one up to a user-selected number N, calculates the maximum extension of the Z-value distribution for every image, and returns an average extension value.
    • Code E corrects the drift affecting STM images.
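The XYZ ASCII round-trip that these codes rely on can be sketched with numpy. This is an illustration of the format and of a simple line-by-line flattening filter, not the actual logic of codes A-E:

```python
import io
import numpy as np

# Tiny 2x2 scan in XYZ ASCII form (columns: X, Y, Z).
xyz_text = "0 0 1.0\n1 0 1.2\n0 1 3.0\n1 1 3.2\n"
xyz = np.loadtxt(io.StringIO(xyz_text))

# Reshape Z onto the scan grid and apply a simple flattening filter:
# subtract each scan line's mean height.
z = xyz[:, 2].reshape(2, 2)
flattened = z - z.mean(axis=1, keepdims=True)

# Write back as XYZ ASCII, the format WSxM-style software can open.
out = io.StringIO()
np.savetxt(out, np.column_stack([xyz[:, 0], xyz[:, 1], flattened.ravel()]))
```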

  7. Introduction to Machine Learning using Python: Classification

    • explore.openaire.eu
    Updated Jan 1, 2020
    Cite
    Khuong Tran; Dr Ghulam Murtaza; Dr Anastasios Papaioannou (2020). Introduction to Machine Learning using Python: Classification [Dataset]. http://doi.org/10.5281/zenodo.6423725
    Explore at:
    Dataset updated
    Jan 1, 2020
    Authors
    Khuong Tran; Dr Ghulam Murtaza; Dr Anastasios Papaioannou
    Description

    About this course

    Machine Learning (ML) is a new way to program computers to solve real-world problems. It has gained popularity over the last few years by achieving tremendous success in tasks that we believed only humans could solve, from recognising images to self-driving cars. In this course, we will explore the fundamentals of Machine Learning from a practical perspective with the help of the Python programming language and its scientific computing libraries.

    Learning outcomes

    • A comprehensive introduction to Machine Learning models and techniques such as Logistic Regression, Decision Trees and Ensemble Learning.
    • Know the differences between the various core Machine Learning models.
    • Understand Machine Learning modelling workflows.
    • Use Python and scikit-learn to process real datasets, and train and apply Machine Learning models.

    Prerequisites

    Either Learn to Program: Python, Data Manipulation in Python and Introduction to ML using Python: Introduction & Linear Regression, or Learn to Program: Python, Data Manipulation and Visualisation in Python and Introduction to ML using Python: Introduction & Linear Regression, is needed to attend this course. If you already have experience with programming, please check the topics covered in those courses to ensure that you are familiar with the knowledge needed here, such as a good understanding of Python syntax, basic programming concepts, familiarity with the Pandas, Numpy and Seaborn libraries, and a basic understanding of Machine Learning and model training. Maths knowledge is not required: only a few mathematical formulas appear in this course, and references to the mathematics needed for learning about Machine Learning will be provided. Understanding the mathematics behind each Machine Learning algorithm will help you appreciate the behaviour of the model and know its pros and cons when using it.

    Why do this course?

    • Useful for anyone who wants to learn about Machine Learning but is overwhelmed by the tremendous amount of resources. It does not go into depth on mathematical concepts and formulas; instead, formal intuitions and references are provided to guide participants in further learning.
    • Includes applications on real datasets! Machine Learning models are introduced together with important feature engineering techniques that will be useful in your own projects.
    • Gives you enough background to kickstart your own Machine Learning journey, or to transition into Deep Learning.

    For a better and more complete understanding of the most popular Machine Learning models and techniques, please consider attending all three Introduction to Machine Learning using Python workshops: Introduction & Linear Regression; Classification; SVM & Unsupervised Learning.

    Licence: Copyright © 2021 Intersect Australia Ltd. All rights reserved.

  8. Data analysis V5 for python.xlsx

    • figshare.com
    xlsx
    Updated May 8, 2025
    Cite
    Pingfei Jiang (2025). Data analysis V5 for python.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.28956233.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    May 8, 2025
    Dataset provided by
    figshare
    Authors
    Pingfei Jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the original data used for processing for the manuscript "A Comparative Study on Retrieval-Augmented Generation and Chain-of-Thought Applications for LLM-Assisted Engineering Design Ideation".

  9. Using Python Packages and HydroShare to Advance Open Data Science and...

    • hydroshare.org
    • beta.hydroshare.org
    zip
    Updated Sep 28, 2023
    Cite
    Jeffery S. Horsburgh; Amber Spackman Jones; Anthony M. Castronova; Scott Black (2023). Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water [Dataset]. https://www.hydroshare.org/resource/4f4acbab5a8c4c55aa06c52a62a1d1fb
    Explore at:
    zip, 31.0 MB (available download formats)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    HydroShare
    Authors
    Jeffery S. Horsburgh; Amber Spackman Jones; Anthony M. Castronova; Scott Black
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.

    This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.

  10. Data analysis codes

    • figshare.com
    txt
    Updated Sep 7, 2024
    Cite
    Dr Auguste Vadisiute; Fernando Messore; Marissa Mueller (2024). Data analysis codes [Dataset]. http://doi.org/10.6084/m9.figshare.26963674.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Sep 7, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Dr Auguste Vadisiute; Fernando Messore; Marissa Mueller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data analysis scripts for neurons, glial cells, and interneurons.

  11. (HS 2) Automate Workflows using Jupyter notebook to create Large Extent...

    • search.dataone.org
    • hydroshare.org
    Updated Oct 19, 2024
    + more versions
    Cite
    Young-Don Choi (2024). (HS 2) Automate Workflows using Jupyter notebook to create Large Extent Spatial Datasets [Dataset]. http://doi.org/10.4211/hs.a52df87347ef47c388d9633925cde9ad
    Explore at:
    Dataset updated
    Oct 19, 2024
    Dataset provided by
    Hydroshare
    Authors
    Young-Don Choi
    Description

    We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale LES (large extent spatial) datasets in GeoTIFF format, we utilized the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio extension for xarray; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and metadata addition in the NetCDF file, while rioxarray was used to save GeoTIFF as NetCDF. These procedures resulted in the creation of three HydroShare resources (HS 3, HS 4 and HS 5) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software, the Jupyter notebook development was undertaken on a Windows OS.
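The xarray side of that GeoTIFF-to-NetCDF step can be sketched as follows. This is a minimal illustration of metadata addition with xarray only; the actual workflow also uses rioxarray and ArcPy, omitted here, and the variable names are made up:

```python
import numpy as np
import xarray as xr

# Stand-in for a raster read from GeoTIFF (in the real workflow,
# rioxarray.open_rasterio would produce this from a .tif file).
grid = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    coords={"y": [0.0, 1.0], "x": [0.0, 1.0, 2.0]},
    name="elevation",
)

# xarray makes metadata addition straightforward before saving.
grid.attrs["units"] = "m"
grid.attrs["source"] = "state-scale LES GeoTIFF (illustrative)"
# Saving to NetCDF would then be e.g. grid.to_netcdf("les.nc").
```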

  12. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Oct 29, 2024
    + more versions
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14009758
    Explore at:
    zip (available download formats)
    Dataset updated
    Oct 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 10/29/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has been updated.

    Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    Code information:

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the
    `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and
    supporting analyses. This script generates the key figures and summary statistics
    used in the study that then get saved in the manuscript_figures folder. Note that all
    maps were produced using Python code found in the "supporting_code"" folder.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  13. COM model and data analysis scripts

    • figshare.com
    • search.datacite.org
    txt
    Updated Jan 19, 2016
    Cite
    José Pedro Correia (2016). COM model and data analysis scripts [Dataset]. http://doi.org/10.6084/m9.figshare.1428652.v1
    Explore at:
    txt (available download formats)
    Dataset updated
    Jan 19, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    José Pedro Correia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This fileset contains scripts used for model implementation, simulation execution, and data processing for the work presented in J.P. Correia, R. Ocelák, and J. Mašek's "Towards more realistic modeling of linguistic color categorization" (to appear). The Python script for model implementation and simulation execution is adapted from another implementation, originally by Gerhard Jaeger and later extended by Michael Franke. The code is provided as-is to support a deeper understanding of the details involved in the data analysis we carried out. It is not fully organized or documented (it might even be a bit hacky in places), and for that we apologize.

  14. Data processing tools python

    • kaggle.com
    Updated Apr 26, 2024
    Cite
    Amir Islam (2024). Data processing tools python [Dataset]. https://www.kaggle.com/datasets/amirislam100/data-processing-tools-python
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Amir Islam
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Amir Islam

    Released under MIT


  15. Scientific Data Analysis and Visualization with Python

    • explore.openaire.eu
    Updated Feb 2, 2022
    Cite
    Md. Jalal Uddin (2022). Scientific Data Analysis and Visualization with Python [Dataset]. http://doi.org/10.5281/zenodo.5944708
    Explore at:
    Dataset updated
    Feb 2, 2022
    Authors
    Md. Jalal Uddin
    Description

    The training materials are provided for international learners. However, the following lectures on Python are available on YouTube for both international and Bangladeshi learners. For international learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ9ci8DAhpizHGQ7IsCZFsKu For Bangladeshi learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ_byYGwq3FyGhDOFRNdHRL8 My profile: https://researchsociety20.org/founder-and-director/

  16. Raw and Processed Survey Data and Python Analysis Code

    • figshare.com
    xlsx
    Updated Jun 2, 2023
    Cite
    Nicolas Schmelling (2023). Raw and Processed Survey Data and Python Analysis Code [Dataset]. http://doi.org/10.6084/m9.figshare.23282879.v1
    Explore at:
    xlsx (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nicolas Schmelling
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data contains the raw survey data (Excel file), the manually processed data (Excel file), the literature analysis of the growth conditions of Synechocystis sp. PCC 6803 (Excel file), the manually curated open-question answers (Excel file), a comparison of the original recipes of BG11 media (Excel file), and the data analysis in Python (Jupyter notebook).

  17. Datasets for manuscript "A data engineering framework for chemical flow...

    • catalog.data.gov
    • gimi9.com
    Updated Nov 7, 2021
    Cite
    U.S. EPA Office of Research and Development (ORD) (2021). Datasets for manuscript "A data engineering framework for chemical flow analysis of industrial pollution abatement operations" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-a-data-engineering-framework-for-chemical-flow-analysis-of-industr
    Explore at:
    Dataset updated
    Nov 7, 2021
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    The EPA GitHub repository PAU4Chem, as described in the README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The Chemicals_in_categories.csv contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study, as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and for tracking flow transfers at the end-of-life stage. The data was obtained by means of data engineering using different publicly-available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to massively gather information about exposure limits and physical properties from different publicly-available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). All GitHub repositories describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA Disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).

  18. Data from: dblp XML dataset as CSV for Python Data Analysis Library

    • observatorio-cientifico.ua.es
    Updated 2021
    Cite
    Carrasco, Rafael C.; Candela, Gustavo (2021). dblp XML dataset as CSV for Python Data Analysis Library [Dataset]. https://observatorio-cientifico.ua.es/documentos/668fc45db9e7c03b01bdb2d0
    Explore at:
    Dataset updated
    2021
    Authors
    Carrasco, Rafael C.; Candela, Gustavo
    Description

    Based on the dblp XML file, this dataset consists of a CSV file that has been extracted using a Python script. The dataset can be easily loaded into a Python Data Analysis Library (pandas) dataframe.
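Loading such a CSV into a pandas DataFrame is a one-liner. The columns below are illustrative, not the dataset's real schema:

```python
import io
import pandas as pd

# Stand-in for the dblp-derived CSV file (hypothetical columns).
csv_text = "author,title,year\nJ. Doe,Sample Paper,2020\n"

# pd.read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
```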

  19. Python and R Basics for Environmental Data Sciences

    • search.dataone.org
    • hydroshare.org
    Updated Dec 5, 2021
    Cite
    Tao Wen (2021). Python and R Basics for Environmental Data Sciences [Dataset]. https://search.dataone.org/view/sha256%3Aa4a66e6665773400ae76151d376607edf33cfead15ffad958fe5795436ff48ff
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Tao Wen
    Description

    This resource collects teaching materials that are originally created for the in-person course 'GEOSC/GEOG 497 – Data Mining in Environmental Sciences' at Penn State University (co-taught by Tao Wen, Susan Brantley, and Alan Taylor) and then refined/revised by Tao Wen to be used in the online teaching module 'Data Science in Earth and Environmental Sciences' hosted on the NSF-sponsored HydroLearn platform.

    This resource includes both R Notebooks and Python Jupyter Notebooks to teach the basics of R and Python coding, data analysis and data visualization, as well as building machine learning models in both programming languages by using authentic research data and questions. All of these R/Python scripts can be executed either on the CUAHSI JupyterHub or on your local machine.

    This resource is shared under the CC-BY license. Please contact the creator Tao Wen at Syracuse University (twen08@syr.edu) for any questions you have about this resource. If you identify any errors in the files, please contact the creator.

  20. Python use cases globally 2022

    • statista.com
    Updated Jul 11, 2025
    Cite
    Statista (2025). Python use cases globally 2022 [Dataset]. https://www.statista.com/statistics/1338409/python-use-cases/
    Explore at:
    Dataset updated
    Jul 11, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Oct 2022 - Dec 2022
    Area covered
    Worldwide
    Description

    Python has become one of the most popular programming languages, with a wide variety of use cases. In 2022, Python is most used for web development and data analysis, with ** percent and ** percent respectively.
