100+ datasets found
  1. Data from: PLEIAData:consumption, HVAC (Heating, Ventilation & Air...

    • zenodo.org
    • portalinvestigacion.um.es
    • +1more
    zip
    Updated Feb 8, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Antonio Martínez Ibarra; Antonio Martínez Ibarra; Aurora González-Vidal; Aurora González-Vidal; Antonio Skarmeta Gómez; Antonio Skarmeta Gómez (2023). PLEIAData:consumption, HVAC (Heating, Ventilation & Air Conditioning), temperature, weather and motion sensor data for smart buildings applications [Dataset]. http://doi.org/10.5281/zenodo.7433380
    Explore at:
    zipAvailable download formats
    Dataset updated
    Feb 8, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Antonio Martínez Ibarra; Antonio Martínez Ibarra; Aurora González-Vidal; Aurora González-Vidal; Antonio Skarmeta Gómez; Antonio Skarmeta Gómez
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset presents detailed building operation data from the three blocks (A, B and C) of the Pleiades building of the University of Murcia, which is a pilot building of the European project PHOENIX. The aim of PHOENIX is to improve buildings efficiency, and therefore we included information of:
    (i) consumption data, aggregated by block in kWh; (ii) HVAC (Heating, Ventilation and Air Conditioning) data with several features, such as state (ON=1, OFF=0), operation mode (None=0, Heating=1, Cooling=2), setpoint and device type; (iii) indoor temperature per room; (iv) weather data, including temperature, humidity, radiation, dew point, wind direction and precipitation; (v) carbon dioxide and presence data for few rooms; (vi) relationships between HVAC, temperature, carbon dioxide and presence sensors identifiers with their respective rooms and blocks. Weather data was acquired from the IMIDA (Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario).

  2. H

    Using Python Packages and HydroShare to Advance Open Data Science and...

    • beta.hydroshare.org
    • hydroshare.org
    zip
    Updated Sep 28, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jeffery S. Horsburgh; Amber Spackman Jones; Anthony M. Castronova; Scott Black (2023). Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water [Dataset]. https://beta.hydroshare.org/resource/4f4acbab5a8c4c55aa06c52a62a1d1fb/
    Explore at:
    zip(31.0 MB)Available download formats
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    HydroShare
    Authors
    Jeffery S. Horsburgh; Amber Spackman Jones; Anthony M. Castronova; Scott Black
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.

    This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.

  3. 911 Calls Data (Subset)

    • kaggle.com
    zip
    Updated Jun 3, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    hardly_human (2020). 911 Calls Data (Subset) [Dataset]. https://www.kaggle.com/rehan1024/911-calls-data-subset
    Explore at:
    zip(3828316 bytes)Available download formats
    Dataset updated
    Jun 3, 2020
    Authors
    hardly_human
    License

    https://www.usa.gov/government-works/https://www.usa.gov/government-works/

    Description

    Dataset

    This dataset was created by hardly_human

    Released under U.S. Government Works

    Contents

  4. o

    Scientific Data Analysis and Visualization with Python

    • explore.openaire.eu
    Updated Feb 2, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Md. Jalal Uddin; Nishat Rayhana Eshita; Md. Asif Newaz; Naiem Sheikh; Afifa Talukder; Aysha Akter; Md. Habibur Rahman; Md. Babul Miah (2022). Scientific Data Analysis and Visualization with Python [Dataset]. http://doi.org/10.5281/zenodo.5944707
    Explore at:
    Dataset updated
    Feb 2, 2022
    Authors
    Md. Jalal Uddin; Nishat Rayhana Eshita; Md. Asif Newaz; Naiem Sheikh; Afifa Talukder; Aysha Akter; Md. Habibur Rahman; Md. Babul Miah
    Description

    The publication "Scientific Data Analysis and Visualisation with Python" delves into various facets of Python programming, with a special focus on data analysis and visualisation. Let us deconstruct the main sections: Examining operators and expressions: The text explores arithmetic, comparison, logic, bitwise, assignment and membership operators. These operators serve as fundamental components in the construction of any Python script. Illustrative real-world scenarios show the practical applications of these operators. For example, arithmetic operators are essential for performing mathematical calculations, while comparison operators facilitate decision-making processes. Discussion of data structures and control flow: The book discusses procedures for input, handling strings, working with lists, dictionaries, loops, and conditional expressions. Scientists and software developers can learn how to manipulate data structures efficiently. In particular, lists and dictionaries play a crucial role in organising and retrieving data. Insight into functions and modularisation: Functions are central to Python programming. The publication offers valuable perspectives on the creation and use of functions. The process of modularisation increases the reusability and maintainability of code. By breaking down complex tasks into smaller functions, developers can improve the understandability of their code. Exploring data with Pandas: The book presents a detailed examination of Pandas, a robust library. Readers will gain skills in loading, manipulating, and analysing data frames. Explain data presentation and visualisation: Effective visualisation is critical to understanding data. The publication introduces matplotlib and other plotting libraries. Scientific researchers and analysts can create powerful visual representations to effectively communicate insights. In summary, this publication serves as a valuable resource for individuals at various levels of Python proficiency, including beginners and experienced users. Whether you are a scientist navigating through data or a developer honing your skills, the comprehensive content in this book will guide you towards mastering Python data analysis and visualisation. The training materials are provided for international learners. However, the following lectures on Python are available on YouTube for both international and Bangladeshi learners. For international learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ9ci8DAhpizHGQ7IsCZFsKu For Bangladeshi learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ_byYGwq3FyGhDOFRNdHRL8 My profile: https://researchsociety20.org/founder-and-director/

  5. P

    Python Program Development Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jan 25, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Data Insights Market (2025). Python Program Development Report [Dataset]. https://www.datainsightsmarket.com/reports/python-program-development-1402872
    Explore at:
    ppt, doc, pdfAvailable download formats
    Dataset updated
    Jan 25, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Python program development market is projected to reach USD 8.52 billion by 2033, exhibiting a CAGR of 11.2% from 2023 to 2033. Increasing adoption of Python in various industries, including commercial offices, industrial production management, and game entertainment, is driving market growth. Python's versatility, open-source nature, and ease of use make it an attractive choice for developers, further contributing to its popularity. Key trends in the market include the growing demand for mobile applications, the emergence of cloud-based Python development, and the increasing use of artificial intelligence (AI) and machine learning (ML) in Python-based applications. However, the lack of standardization and the shortage of skilled Python developers pose challenges to the market's growth. The market is segmented by application, type, and region, with North America and Asia Pacific holding significant market shares due to the presence of established technology hubs and the rapid adoption of Python in these regions.

  6. w

    Murach's Python for data analysis

    • workwithdata.com
    Updated Jun 25, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Work With Data (2024). Murach's Python for data analysis [Dataset]. https://www.workwithdata.com/object/murach-s-python-for-data-analysis-book-by-scott-mccoy-0000
    Explore at:
    Dataset updated
    Jun 25, 2024
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Murach's Python for data analysis is a book. It was written by Scott McCoy and published by Mike Murach&Associates Inc in 2021.

  7. Pre-Processed Power Grid Frequency Time Series

    • data.subak.org
    csv
    Updated Feb 16, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zenodo (2023). Pre-Processed Power Grid Frequency Time Series [Dataset]. https://data.subak.org/dataset/pre-processed-power-grid-frequency-time-series
    Explore at:
    csvAvailable download formats
    Dataset updated
    Feb 16, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Description

    Overview

    This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in python. The data covers three synchronous areas of the European power grid:

    • Continental Europe
    • Great Britain
    • Nordic

    This work is part of the paper "Predictability of Power Grid Frequency"[1]. Please cite this paper, when using the data and the code. For a detailed documentation of the pre-processing procedure we refer to the supplementary material of the paper.

    Data sources

    We downloaded the frequency recordings from publically available repositories of three different Transmission System Operators (TSOs).

    • Continental Europe [2]: We downloaded the data from the German TSO TransnetBW GmbH, which retains the Copyright on the data, but allows to re-publish it upon request [3].
    • Great Britain [4]: The download was supported by National Grid ESO Open Data, which belongs to the British TSO National Grid. They publish the frequency recordings under the NGESO Open License [5].
    • Nordic [6]: We obtained the data from the Finish TSO Fingrid, which provides the data under the open license CC-BY 4.0 [7].

    Content of the repository

    A) Scripts

    1. In the "Download_scripts" folder you will find three scripts to automatically download frequency data from the TSO's websites.
    2. In "convert_data_format.py" we save the data with corrected timestamp formats. Missing data is marked as NaN (processing step (1) in the supplementary material of [1]).
    3. In "clean_corrupted_data.py" we load the converted data and identify corrupted recordings. We mark them as NaN and clean some of the resulting data holes (processing step (2) in the supplementary material of [1]).

    The python scripts run with Python 3.7 and with the packages found in "requirements.txt".

    B) Yearly converted and cleansed data

    The folders "

    • File type: The files are zipped csv-files, where each file comprises one year.
    • Data format: The files contain two columns. The second column contains the frequency values in Hz. The first one represents the time stamps in the format Year-Month-Day Hour-Minute-Second, which is given as naive local time. The local time refers to the following time zones and includes Daylight Saving Times (python time zone in brackets):
      • TransnetBW: Continental European Time (CE)
      • Nationalgrid: Great Britain (GB)
      • Fingrid: Finland (Europe/Helsinki)
    • NaN representation: We mark corrupted and missing data as "NaN" in the csv-files.

    Use cases

    We point out that this repository can be used in two different was:

    • Use pre-processed data: You can directly use the converted or the cleansed data. Note however, that both data sets include segments of NaN-values due to missing and corrupted recordings. Only a very small part of the NaN-values were eliminated in the cleansed data to not manipulate the data too much.

    • Produce your own cleansed data: Depending on your application, you might want to cleanse the data in a custom way. You can easily add your custom cleansing procedure in "clean_corrupted_data.py" and then produce cleansed data from the raw data in "

    License

    This work is licensed under multiple licenses, which are located in the "LICENSES" folder.

    • We release the code in the folder "Scripts" under the MIT license .
    • The pre-processed data in the subfolders "**/Fingrid" and "**/Nationalgrid" are licensed under CC-BY 4.0.
    • TransnetBW originally did not publish their data under an open license. We have explicitly received the permission to publish the pre-processed version from TransnetBW. However, we cannot publish our pre-processed version under an open license due to the missing license of the original TransnetBW data.

    Changelog

    Version 2:

    • Add time zone information to description
    • Include new frequency data
    • Update references
    • Change folder structure to yearly folders

    Version 3:

    • Correct TransnetBW files for missing data in May 2016
  8. Sales Analysis

    • kaggle.com
    zip
    Updated Jun 30, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vinay Shaw (2020). Sales Analysis [Dataset]. https://www.kaggle.com/datasets/vinayshaw/sales-analysis
    Explore at:
    zip(2492073 bytes)Available download formats
    Dataset updated
    Jun 30, 2020
    Authors
    Vinay Shaw
    Description

    Dataset

    This dataset was created by Vinay Shaw

    Contents

  9. Demo dataset for: SPACEc, a streamlined, interactive Python workflow for...

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Jul 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yuqi Tan; Tim Kempchen (2024). Demo dataset for: SPACEc, a streamlined, interactive Python workflow for multiplexed image processing and analysis [Dataset]. http://doi.org/10.5061/dryad.brv15dvj1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Stanford University School of Medicine
    Authors
    Yuqi Tan; Tim Kempchen
    License

    https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html

    Description

    Multiplexed imaging technologies provide insights into complex tissue architectures. However, challenges arise due to software fragmentation with cumbersome data handoffs, inefficiencies in processing large images (8 to 40 gigabytes per image), and limited spatial analysis capabilities. To efficiently analyze multiplexed imaging data, we developed SPACEc, a scalable end-to-end Python solution, that handles image extraction, cell segmentation, and data preprocessing and incorporates machine-learning-enabled, multi-scaled, spatial analysis, operated through a user-friendly and interactive interface. The demonstration dataset was derived from a previous analysis and contains TMA cores from a human tonsil and tonsillitis sample that were acquired with the Akoya PhenocyclerFusion platform. The dataset can be used to test the workflow and establish it on a user’s system or to familiarize oneself with the pipeline. Methods Tissue samples: Tonsil cores were extracted from a larger multi-tumor tissue microarray (TMA), which included a total of 66 unique tissues (51 malignant and semi-malignant tissues, as well as 15 non-malignant tissues). Representative tissue regions were annotated on corresponding hematoxylin and eosin (H&E)-stained sections by a board-certified surgical pathologist (S.Z.). Annotations were used to generate the 66 cores each with cores of 1mm diameter. FFPE tissue blocks were retrieved from the tissue archives of the Institute of Pathology, University Medical Center Mainz, Germany, and the Department of Dermatology, University Medical Center Mainz, Germany. The multi-tumor-TMA block was sectioned at 3µm thickness onto SuperFrost Plus microscopy slides before being processed for CODEX multiplex imaging as previously described. CODEX multiplexed imaging and processing To run the CODEX machine, the slide was taken from the storage buffer and placed in PBS for 10 minutes to equilibrate. After drying the PBS with a tissue, a flow cell was sealed onto the tissue slide. The assembled slide and flow cell were then placed in a PhenoCycler Buffer made from 10X PhenoCycler Buffer & Additive for at least 10 minutes before starting the experiment. A 96-well reporter plate was prepared with each reporter corresponding to the correct barcoded antibody for each cycle, with up to 3 reporters per cycle per well. The fluorescence reporters were mixed with 1X PhenoCycler Buffer, Additive, nuclear-staining reagent, and assay reagent according to the manufacturer's instructions. With the reporter plate and assembled slide and flow cell placed into the CODEX machine, the automated multiplexed imaging experiment was initiated. Each imaging cycle included steps for reporter binding, imaging of three fluorescent channels, and reporter stripping to prepare for the next cycle and set of markers. This was repeated until all markers were imaged. After the experiment, a .qptiff image file containing individual antibody channels and the DAPI channel was obtained. Image stitching, drift compensation, deconvolution, and cycle concatenation are performed within the Akoya PhenoCycler software. The raw imaging data output (tiff, 377.442nm per pixel for 20x CODEX) is first examined with QuPath software (https://qupath.github.io/) for inspection of staining quality. Any markers that produce unexpected patterns or low signal-to-noise ratios should be excluded from the ensuing analysis. The qptiff files must be converted into tiff files for input into SPACEc. Data preprocessing includes image stitching, drift compensation, deconvolution, and cycle concatenation performed using the Akoya Phenocycler software. The raw imaging data (qptiff, 377.442 nm/pixel for 20x CODEX) files from the Akoya PhenoCycler technology were first examined with QuPath software (https://qupath.github.io/) to inspect staining qualities. Markers with untenable patterns or low signal-to-noise ratios were excluded from further analysis. A custom CODEX analysis pipeline was used to process all acquired CODEX data (scripts available upon request). The qptiff files were converted into tiff files for tissue detection (watershed algorithm) and cell segmentation.

  10. Python use cases globally 2022

    • statista.com
    Updated Feb 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Python use cases globally 2022 [Dataset]. https://www.statista.com/statistics/1338409/python-use-cases/
    Explore at:
    Dataset updated
    Feb 11, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Oct 2022 - Dec 2022
    Area covered
    Worldwide
    Description

    Python has become one of the most popular programming languages, with a wide variety of use cases. In 2022, Python is most used for web development and data analysis, with 46 percent and 54 percent respectively.

  11. Multi-Dimensional Data Viewer (MDV) user manual for data exploration:...

    • zenodo.org
    pdf, zip
    Updated Jul 12, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Maria Kiourlappou; Maria Kiourlappou; Martin Sergeant; Martin Sergeant; Joshua S. Titlow; Joshua S. Titlow; Jeffrey Y. Lee; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Stephen Taylor; Ilan Davis; Ilan Davis; Darragh Ennis (2024). Multi-Dimensional Data Viewer (MDV) user manual for data exploration: "Systematic analysis of YFP traps reveals common discordance between mRNA and protein across the nervous system" [Dataset]. http://doi.org/10.5281/zenodo.7875495
    Explore at:
    zip, pdfAvailable download formats
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Maria Kiourlappou; Maria Kiourlappou; Martin Sergeant; Martin Sergeant; Joshua S. Titlow; Joshua S. Titlow; Jeffrey Y. Lee; Jeffrey Y. Lee; Darragh Ennis; Stephen Taylor; Stephen Taylor; Ilan Davis; Ilan Davis; Darragh Ennis
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please also see the latest version of the repository:
    https://doi.org/10.5281/zenodo.6374011 and
    our website: https://ilandavis.com/jcb2023-yfp

    The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics data sets. Moreover, intersection of highly rich and complex datasets from different sources provided as flat csv files requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a “user manual” to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV) -link, a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published systematic analysis of 200 YFP traps reveals common discordance between mRNA and protein across the nervous system (eprint link). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.

  12. Z

    Plot Data: Analysis of thin liquid films driven by SAW

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 25, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mitas, Kevin David Joachim (2021). Plot Data: Analysis of thin liquid films driven by SAW [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5069876
    Explore at:
    Dataset updated
    Oct 25, 2021
    Dataset authored and provided by
    Mitas, Kevin David Joachim
    Description

    Data of the relevant plots of the thesis

  13. H

    Advancing Open and Reproducible Water Data Science by Integrating Data...

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Jan 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository [Dataset]. https://www.hydroshare.org/resource/45d3427e794543cfbee129c604d7e865
    Explore at:
    zip(50.9 MB)Available download formats
    Dataset updated
    Jan 9, 2024
    Dataset provided by
    HydroShare
    Authors
    Jeffery S. Horsburgh
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types and that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple MacOS, or Linux from the Python Package Index using the PIP utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Collaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.

    This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/

  14. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Nov 15, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14171251
    Explore at:
    zipAvailable download formats
    Dataset updated
    Nov 15, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 11/15/2024

    This R project contains the primary code and data (following pre-processing in python) used for data production, manipulation, visualization, and analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis has updated throughout the peer review process.

    #Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    #Code information

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study that then get saved in the "manuscript_figures" folder. Note that all maps were produced using Python code found in the "supporting_code"" folder. Also note that within the "manuscript_figures" folder there is an "extended_data" folder, which contains tables of the summary statistics (e.g., quartiles and sample sizes) behind figures containing box plots or depicting regression coefficients.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  15. Data processing tools python

    • kaggle.com
    zip
    Updated Apr 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Amir Islam (2024). Data processing tools python [Dataset]. https://www.kaggle.com/amirislam100/data-processing-tools-python
    Explore at:
    zip(22269 bytes)Available download formats
    Dataset updated
    Apr 26, 2024
    Authors
    Amir Islam
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Amir Islam

    Released under MIT

    Contents

  16. Z

    Data from: Technical Leverage Analysis in the Python Ecosystem

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 24, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Paramitha, Ranindya (2023). Technical Leverage Analysis in the Python Ecosystem [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7186627
    Explore at:
    Dataset updated
    Jul 24, 2023
    Dataset provided by
    Massacci, Fabio
    Paramitha, Ranindya
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Technical Leverage Analysis in the Python Ecosystem

    This dataset is the original dataset used in the publication [1]. It includes 21205 distinct package versions from the top 600 Python packages. An online demo for computing the proposed metrics for real-world software libraries is also available under the following URL: https://techleverage.eu/.

    This work has been partially funded by the EU under the H2020 Program AssureMOSS (Grant n. 952647).

    [1] DOI: 10.1007/s10664-023-10355-2

  17. S

    Data and Python script for article "A text mining analysis of the climate...

    • data.subak.org
    • zenodo.org
    csv
    Updated Feb 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    University of Illinois at Chicago (2023). Data and Python script for article "A text mining analysis of the climate change literature in industrial ecology' [Dataset]. https://data.subak.org/dataset/data-and-python-script-for-article-a-text-mining-analysis-of-the-climate-change-literature-in-i
    Explore at:
    csvAvailable download formats
    Dataset updated
    Feb 16, 2023
    Dataset provided by
    University of Illinois at Chicago
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data and Python script are part of the forum article "A text mining analysis of the climate change literature in industrial ecology" authored by Dayeen, F.R., Sharma, A.S., and Derrible, S., and published in the Journal of Industrial Ecology in 2020.

    The Python script and instructions are included in the LiTCoF_v1.00-py.zip file. The original data is available in two formats: .csv and .pkl.

    Updates of the script will be posted at https://github.com/csunlab/LiTCoF and at https://csun.uic.edu/codes/LiTCoF.html. The data is also available at https://csun.uic.edu/datasets.html#AbstractsIE.

    Feel free to contact any of the authors for information and questions about the data and code.

  18. t

    Data from: Decoding Wayfinding: Analyzing Wayfinding Processes in the...

    • researchdata.tuwien.at
    html, pdf, zip
    Updated Feb 23, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Negar Alinaghi; Ioannis Giannopoulos; Ioannis Giannopoulos; Negar Alinaghi; Negar Alinaghi; Negar Alinaghi (2025). Decoding Wayfinding: Analyzing Wayfinding Processes in the Outdoor Environment [Dataset]. http://doi.org/10.48436/m2ha4-t1v92
    Explore at:
    html, zip, pdfAvailable download formats
    Dataset updated
    Feb 23, 2025
    Dataset provided by
    TU Wien
    Authors
    Negar Alinaghi; Ioannis Giannopoulos; Ioannis Giannopoulos; Negar Alinaghi; Negar Alinaghi; Negar Alinaghi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Folder Structure

    The folder named “submission” contains the following:

    1. “pythonProject”: This folder contains all the Python files and subfolders needed for analysis.
    2. ijgis.yml: This file lists all the Python libraries and dependencies required to run the code.

    Setting Up the Environment

    1. Use the ijgis.yml file to create a Python project and environment. Ensure you activate the environment before running the code.
    2. The pythonProject folder contains several .py files and subfolders, each with specific functionality as described below.

    Subfolders

    1. Data_4_IJGIS

    • This folder contains the data used for the results reported in the paper.
    • Note: The data analysis that we explain in this paper already begins with the synchronization and cleaning of the recorded raw data. The published data is already synchronized and cleaned. Both the cleaned files and the merged files with features extracted for them are given in this directory. If you want to perform the segmentation and feature extraction yourself, you should run the respective Python files yourself. If not, you can use the “merged_…csv” files as input for the training.

    2. results_[DateTime] (e.g., results_20240906_15_00_13)

    • This folder will be generated when you run the code and will store the output of each step.
    • The current folder contains results created during code debugging for the submission.
    • When you run the code, a new folder with fresh results will be generated.

    Python Files

    1. helper_functions.py

    • Contains reusable functions used throughout the analysis.
    • Each function includes a description of its purpose and the input parameters required.

    2. create_sanity_plots.py

    • Generates scatter plots like those in Figure 3 of the paper.
    • Although the code has been run for all 309 trials, it can be used to check the sample data provided.
    • Output: A .png file for each column of the raw gaze and IMU recordings, color-coded with logged events.
    • Usage: Run this file to create visualizations similar to Figure 3.

    3. overlapping_sliding_window_loop.py

    • Implements overlapping sliding window segmentation and generates plots like those in Figure 4.
    • Output:
      • Two new subfolders, “Gaze” and “IMU”, will be added to the Data_4_IJGIS folder.
      • Segmented files (default: 2–10 seconds with a 1-second step size) will be saved as .csv files.
      • A visualization of the segments, similar to Figure 4, will be automatically generated.

    4. gaze_features.py & imu_features.py

    • These files compute features as explained in Tables 1 and 2 of the paper, respectively.
    • They process the segmented recordings generated by the overlapping_sliding_window_loop.py.
    • Usage: Just to know how the features are calculated, you can run this code after the segmentation with the sliding window and run these files to calculate the features from the segmented data.

    5. training_prediction.py

    • This file contains the main machine learning analysis of the paper. This file contains all the code for the training of the model, its evaluation, and its use for the inference of the “monitoring part”. It covers the following steps:
    a. Data Preparation (corresponding to Section 5.1.1 of the paper)
    • Prepares the data according to the research question (RQ) described in the paper. Since this data was collected with several RQs in mind, we remove parts of the data that are not related to the RQ of this paper.
    • A function named plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5)) in line 116 visualizes the data preparation results. As this visualization is not used in the paper, the line is commented out, but if you want to see visually what has been changed compared to the original data, you can comment out this line.
    b. Training/Validation/Test Split
    • Splits the data for machine learning experiments (an explanation can be found in Section 5.1.1. Preparation of data for training and inference of the paper).
    • Make sure that you follow the instructions in the comments to the code exactly.
    • Output: The split data is saved as .csv files in the results folder.
    c. Machine and Deep Learning Experiments

    This part contains three main code blocks:

    iii. One for the XGboost code with correct hyperparameter tuning:
    Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically test the confidence threshold of

    • MLP Network (Commented Out): This code was used for classification with the MLP network, and the results shown in Table 3 are from this code. If you wish to use this model, please comment out the following blocks accordingly.
    • XGBoost without Hyperparameter Tuning: If you want to run the code but do not want to spend time on the full training with hyperparameter tuning (as was done for the paper), just uncomment this part. This will give you a simple, untuned model with which you can achieve at least some results.
    • XGBoost with Hyperparameter Tuning: If you want to train the model the way we trained it for the analysis reported in the paper, use this block (the plots in Figure 7 are from this block). We ran this block with different feature sets and different segmentation files and created a simple bar chart from the saved results, shown in Figure 6.

    Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically calculated the confidence threshold of the model (explained in the paper in Section 5.2. Part II: Decoding surveillance by sequence analysis) is given in this block in lines 361 to 380.

    d. Inference (Monitoring Part)
    • Final inference is performed using the monitoring data. This step produces a .csv file containing inferred labels.
    • Figure 8 in the paper is generated using this part of the code.

    6. sequence_analysis.py

    • Performs analysis on the inferred data, producing Figures 9 and 10 from the paper.
    • This file reads the inferred data from the previous step and performs sequence analysis as described in Sections 5.2.1 and 5.2.2.

    Licenses

    The data is licensed under CC-BY, the code is licensed under MIT.

  19. o

    Demographic Analysis Workflow using Census API in Jupyter Notebook:...

    • openicpsr.org
    delimited
    Updated Jul 23, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Donghwan Gu; Nathanael Rosenheim (2020). Demographic Analysis Workflow using Census API in Jupyter Notebook: 1990-2000 Population Size and Change [Dataset]. http://doi.org/10.3886/E120381V1
    Explore at:
    delimitedAvailable download formats
    Dataset updated
    Jul 23, 2020
    Dataset provided by
    Texas A&M University
    Authors
    Donghwan Gu; Nathanael Rosenheim
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Boone County, Kentucky, US Counties
    Description

    This archive reproduces a table titled "Table 3.1 Boone county population size, 1990 and 2000" from Wang and vom Hofe (2007, p.58). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses Census API to retrieve data, reproduce the table, and ensure reproducibility for anyone accessing this archive.The Python code was developed in Google Colaboratory, or Google Colab for short, which is an Integrated Development Environment (IDE) of JupyterLab and streamlines package installation, code collaboration and management. The Census API is used to obtain population counts from the 1990 and 2000 Decennial Census (Summary File 1, 100% data). All downloaded data are maintained in the notebook's temporary working directory while in use. The data are also stored separately with this archive.The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in a PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code to perform the following functions:install/import necessary Python packagesintroduce a Census API Querydownload Census data via CensusAPI manipulate Census tabular data calculate absolute change and percent changeformatting numbersexport the table to csvThe notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the Census API downloads. The notebook could be adapted for use in other environments (i.e., Jupyter Notebook) as well as reading and writing files to a local or shared drive, or cloud drive (i.e., Google Drive).

  20. Interlab: A python module for consensus analysis in interlaboratory studies

    • catalog.data.gov
    • data.nist.gov
    Updated Jul 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Institute of Standards and Technology (2022). Interlab: A python module for consensus analysis in interlaboratory studies [Dataset]. https://catalog.data.gov/dataset/interlab-a-python-module-for-consensus-analysis-in-interlaboratory-studies-aa732
    Explore at:
    Dataset updated
    Jul 29, 2022
    Dataset provided by
    National Institute of Standards and Technologyhttp://www.nist.gov/
    Description

    A python module for consensus analysis in interlaboratory studies

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Antonio Martínez Ibarra; Antonio Martínez Ibarra; Aurora González-Vidal; Aurora González-Vidal; Antonio Skarmeta Gómez; Antonio Skarmeta Gómez (2023). PLEIAData:consumption, HVAC (Heating, Ventilation & Air Conditioning), temperature, weather and motion sensor data for smart buildings applications [Dataset]. http://doi.org/10.5281/zenodo.7433380
Organization logo

Data from: PLEIAData:consumption, HVAC (Heating, Ventilation & Air Conditioning), temperature, weather and motion sensor data for smart buildings applications

Related Article
Explore at:
zipAvailable download formats
Dataset updated
Feb 8, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Antonio Martínez Ibarra; Antonio Martínez Ibarra; Aurora González-Vidal; Aurora González-Vidal; Antonio Skarmeta Gómez; Antonio Skarmeta Gómez
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This dataset presents detailed building operation data from the three blocks (A, B and C) of the Pleiades building of the University of Murcia, which is a pilot building of the European project PHOENIX. The aim of PHOENIX is to improve buildings efficiency, and therefore we included information of:
(i) consumption data, aggregated by block in kWh; (ii) HVAC (Heating, Ventilation and Air Conditioning) data with several features, such as state (ON=1, OFF=0), operation mode (None=0, Heating=1, Cooling=2), setpoint and device type; (iii) indoor temperature per room; (iv) weather data, including temperature, humidity, radiation, dew point, wind direction and precipitation; (v) carbon dioxide and presence data for few rooms; (vi) relationships between HVAC, temperature, carbon dioxide and presence sensors identifiers with their respective rooms and blocks. Weather data was acquired from the IMIDA (Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario).

Search
Clear search
Close search
Google apps
Main menu