Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset presents detailed building operation data from the three blocks (A, B and C) of the Pleiades building of the University of Murcia, a pilot building of the European project PHOENIX. The aim of PHOENIX is to improve building efficiency, and we therefore include the following information:
(i) consumption data, aggregated by block, in kWh; (ii) HVAC (Heating, Ventilation and Air Conditioning) data with several features, such as state (ON=1, OFF=0), operation mode (None=0, Heating=1, Cooling=2), setpoint and device type; (iii) indoor temperature per room; (iv) weather data, including temperature, humidity, radiation, dew point, wind direction and precipitation; (v) carbon dioxide and presence data for a few rooms; (vi) relationships between the HVAC, temperature, carbon dioxide and presence sensor identifiers and their respective rooms and blocks. Weather data were acquired from IMIDA (Instituto Murciano de Investigación y Desarrollo Agrario y Alimentario).
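As an illustration of how these encodings can be consumed, here is a minimal pandas sketch; the file name and column names are assumptions and must be adapted to the actual layout of the published files.

```python
import pandas as pd

# Hypothetical file and column names -- adjust to the dataset's actual layout.
hvac = pd.read_csv("hvac_block_A.csv", parse_dates=["timestamp"])

# Decode the categorical fields described above.
hvac["state_label"] = hvac["state"].map({0: "OFF", 1: "ON"})
hvac["mode_label"] = hvac["mode"].map({0: "None", 1: "Heating", 2: "Cooling"})

# Example: share of samples per device type spent in heating mode.
heating_share = hvac.groupby("device_type")["mode"].apply(lambda m: (m == 1).mean())
print(heating_share)
```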
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.
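For readers unfamiliar with the NWIS retrieval mentioned above, a minimal sketch with the dataretrieval package follows; the gauge number and date range are illustrative and are not taken from the workshop materials.

```python
# pip install dataretrieval
import dataretrieval.nwis as nwis

# Illustrative USGS gauge number and date range.
df = nwis.get_record(sites="03339000", service="dv",
                     start="2022-01-01", end="2022-12-31")
print(df.head())
```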
This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.
https://www.usa.gov/government-works/
This dataset was created by hardly_human
Released under U.S. Government Works
The publication "Scientific Data Analysis and Visualisation with Python" delves into various facets of Python programming, with a special focus on data analysis and visualisation. Let us deconstruct the main sections: Examining operators and expressions: The text explores arithmetic, comparison, logic, bitwise, assignment and membership operators. These operators serve as fundamental components in the construction of any Python script. Illustrative real-world scenarios show the practical applications of these operators. For example, arithmetic operators are essential for performing mathematical calculations, while comparison operators facilitate decision-making processes. Discussion of data structures and control flow: The book discusses procedures for input, handling strings, working with lists, dictionaries, loops, and conditional expressions. Scientists and software developers can learn how to manipulate data structures efficiently. In particular, lists and dictionaries play a crucial role in organising and retrieving data. Insight into functions and modularisation: Functions are central to Python programming. The publication offers valuable perspectives on the creation and use of functions. The process of modularisation increases the reusability and maintainability of code. By breaking down complex tasks into smaller functions, developers can improve the understandability of their code. Exploring data with Pandas: The book presents a detailed examination of Pandas, a robust library. Readers will gain skills in loading, manipulating, and analysing data frames. Explain data presentation and visualisation: Effective visualisation is critical to understanding data. The publication introduces matplotlib and other plotting libraries. Scientific researchers and analysts can create powerful visual representations to effectively communicate insights. In summary, this publication serves as a valuable resource for individuals at various levels of Python proficiency, including beginners and experienced users. Whether you are a scientist navigating through data or a developer honing your skills, the comprehensive content in this book will guide you towards mastering Python data analysis and visualisation. The training materials are provided for international learners. However, the following lectures on Python are available on YouTube for both international and Bangladeshi learners. For international learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ9ci8DAhpizHGQ7IsCZFsKu For Bangladeshi learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ_byYGwq3FyGhDOFRNdHRL8 My profile: https://researchsociety20.org/founder-and-director/
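As a taste of the Pandas and matplotlib material the book covers, here is a short, self-contained sketch; the CSV file and its columns are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical CSV with 'year' and 'temperature' columns.
df = pd.read_csv("measurements.csv")

# Basic Pandas manipulation: filter rows and compute a yearly mean.
yearly_mean = df[df["year"] >= 2000].groupby("year")["temperature"].mean()

# Simple matplotlib visualisation of the result.
yearly_mean.plot(marker="o")
plt.xlabel("Year")
plt.ylabel("Mean temperature")
plt.show()
```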
https://www.datainsightsmarket.com/privacy-policy
The global Python program development market is projected to reach USD 8.52 billion by 2033, exhibiting a CAGR of 11.2% from 2023 to 2033. Increasing adoption of Python in various industries, including commercial offices, industrial production management, and game entertainment, is driving market growth. Python's versatility, open-source nature, and ease of use make it an attractive choice for developers, further contributing to its popularity. Key trends in the market include the growing demand for mobile applications, the emergence of cloud-based Python development, and the increasing use of artificial intelligence (AI) and machine learning (ML) in Python-based applications. However, the lack of standardization and the shortage of skilled Python developers pose challenges to the market's growth. The market is segmented by application, type, and region, with North America and Asia Pacific holding significant market shares due to the presence of established technology hubs and the rapid adoption of Python in these regions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Murach's Python for Data Analysis is a book. It was written by Scott McCoy and published by Mike Murach & Associates, Inc. in 2021.
Overview
This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in python. The data covers three synchronous areas of the European power grid:
This work is part of the paper "Predictability of Power Grid Frequency" [1]. Please cite this paper when using the data and the code. For detailed documentation of the pre-processing procedure, we refer to the supplementary material of the paper.
Data sources
We downloaded the frequency recordings from publicly available repositories of three different Transmission System Operators (TSOs).
Content of the repository
A) Scripts
The Python scripts run with Python 3.7 and with the packages listed in "requirements.txt".
B) Yearly converted and cleansed data
The folders "
Use cases
We point out that this repository can be used in two different ways:
Use pre-processed data: You can directly use the converted or the cleansed data. Note, however, that both datasets include segments of NaN values due to missing and corrupted recordings. Only a very small part of the NaN values was eliminated in the cleansed data, so as not to manipulate the data too much. (A minimal loading sketch that locates these NaN segments follows this list.)
Produce your own cleansed data: Depending on your application, you might want to cleanse the data in a custom way. You can easily add your custom cleansing procedure in "clean_corrupted_data.py" and then produce cleansed data from the raw data in "
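A short sketch of the first use case, assuming the yearly files are CSV time series with a timestamp index and a single frequency column (the file name and layout are assumptions, not taken from the repository):

```python
import pandas as pd

# Hypothetical file name; adjust to the converted/cleansed files in this repository.
freq = pd.read_csv("2019_cleansed.csv", index_col=0, parse_dates=True).squeeze("columns")

# Quantify the NaN segments left by missing or corrupted recordings.
gaps = freq.isna()
print(f"{gaps.sum()} missing samples ({gaps.mean():.2%} of the series)")

# Example: keep only days without any NaN values for analyses that need gapless windows.
complete_days = freq.groupby(freq.index.date).filter(lambda day: day.notna().all())
```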
License
This work is licensed under multiple licenses, which are located in the "LICENSES" folder.
Changelog
Version 2:
Version 3:
This dataset was created by Vinay Shaw
https://spdx.org/licenses/CC0-1.0.html
Multiplexed imaging technologies provide insights into complex tissue architectures. However, challenges arise due to software fragmentation with cumbersome data handoffs, inefficiencies in processing large images (8 to 40 gigabytes per image), and limited spatial analysis capabilities. To efficiently analyze multiplexed imaging data, we developed SPACEc, a scalable end-to-end Python solution that handles image extraction, cell segmentation, and data preprocessing, and incorporates machine-learning-enabled, multi-scaled spatial analysis operated through a user-friendly, interactive interface. The demonstration dataset was derived from a previous analysis and contains TMA cores from a human tonsil and a tonsillitis sample that were acquired with the Akoya PhenoCycler-Fusion platform. The dataset can be used to test the workflow and establish it on a user's system, or to familiarize oneself with the pipeline.

Methods

Tissue samples: Tonsil cores were extracted from a larger multi-tumor tissue microarray (TMA), which included a total of 66 unique tissues (51 malignant and semi-malignant tissues, as well as 15 non-malignant tissues). Representative tissue regions were annotated on corresponding hematoxylin and eosin (H&E)-stained sections by a board-certified surgical pathologist (S.Z.). Annotations were used to generate the 66 cores, each 1 mm in diameter. FFPE tissue blocks were retrieved from the tissue archives of the Institute of Pathology, University Medical Center Mainz, Germany, and the Department of Dermatology, University Medical Center Mainz, Germany. The multi-tumor TMA block was sectioned at 3 µm thickness onto SuperFrost Plus microscopy slides before being processed for CODEX multiplex imaging as previously described.

CODEX multiplexed imaging and processing: To run the CODEX machine, the slide was taken from the storage buffer and placed in PBS for 10 minutes to equilibrate. After drying the PBS with a tissue, a flow cell was sealed onto the tissue slide. The assembled slide and flow cell were then placed in a PhenoCycler Buffer made from 10X PhenoCycler Buffer & Additive for at least 10 minutes before starting the experiment. A 96-well reporter plate was prepared with each reporter corresponding to the correct barcoded antibody for each cycle, with up to 3 reporters per cycle per well. The fluorescence reporters were mixed with 1X PhenoCycler Buffer, Additive, nuclear-staining reagent, and assay reagent according to the manufacturer's instructions. With the reporter plate and the assembled slide and flow cell placed into the CODEX machine, the automated multiplexed imaging experiment was initiated. Each imaging cycle included steps for reporter binding, imaging of three fluorescent channels, and reporter stripping to prepare for the next cycle and set of markers. This was repeated until all markers were imaged. After the experiment, a .qptiff image file containing the individual antibody channels and the DAPI channel was obtained. Image stitching, drift compensation, deconvolution, and cycle concatenation are performed within the Akoya PhenoCycler software. The raw imaging data output (tiff, 377.442 nm per pixel for 20x CODEX) is first examined with QuPath software (https://qupath.github.io/) for inspection of staining quality. Any markers that produce unexpected patterns or low signal-to-noise ratios should be excluded from the ensuing analysis. The qptiff files must be converted into tiff files for input into SPACEc.
Data preprocessing includes image stitching, drift compensation, deconvolution, and cycle concatenation, performed using the Akoya PhenoCycler software. The raw imaging data (qptiff, 377.442 nm/pixel for 20x CODEX) from the Akoya PhenoCycler technology were first examined with QuPath software (https://qupath.github.io/) to inspect staining quality. Markers with untenable patterns or low signal-to-noise ratios were excluded from further analysis. A custom CODEX analysis pipeline was used to process all acquired CODEX data (scripts available upon request). The qptiff files were converted into tiff files for tissue detection (watershed algorithm) and cell segmentation.
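Since .qptiff is a TIFF-based container, the conversion step can often be done with the tifffile package; this is only a hedged sketch (file names are placeholders), and channel ordering or pyramid levels may need extra handling depending on the acquisition.

```python
# pip install tifffile
import tifffile

# Read the multiplexed stack from the PhenoCycler output (placeholder file name).
stack = tifffile.imread("sample.qptiff")
print(stack.shape, stack.dtype)

# Write a plain (Big)TIFF as input for downstream segmentation tools.
tifffile.imwrite("sample.tif", stack, bigtiff=True)
```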
Python has become one of the most popular programming languages, with a wide variety of use cases. In 2022, Python was most used for web development and data analysis, at 46 percent and 54 percent, respectively.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please also see the latest version of the repository:
The explosion in the volume of biological imaging data challenges the available technologies for data interrogation and its intersection with related published bioinformatics datasets. Moreover, intersecting highly rich and complex datasets from different sources, provided as flat CSV files, requires advanced informatics skills, which is time consuming and not accessible to all. Here, we provide a "user manual" to our new paradigm for systematically filtering and analysing a dataset with more than 1300 microscopy data figures using Multi-Dimensional Viewer (MDV) (link), a solution for interactive multimodal data visualisation and exploration. The primary data we use are derived from our published study "Systematic analysis of 200 YFP traps reveals common discordance between mRNA and protein across the nervous system" (eprint link). This manual provides the raw image data together with the expert annotations of the mRNA and protein distribution as well as associated bioinformatics data. We provide an explanation, with specific examples, of how to use MDV to make the multiple data types interoperable and explore them together. We also provide the open-source Python code (github link) used to annotate the figures, which could be adapted to any other kind of data annotation task.
Data of the relevant plots of the thesis
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing of analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple macOS, or Linux from the Python Package Index using the pip utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
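A minimal access sketch for the HydroShare Python Client follows; the resource identifier is a placeholder, and the method names follow the hsclient README, so check the linked documentation before relying on them.

```python
# pip install hsclient dataretrieval
from hsclient import HydroShare

# Sign in to HydroShare interactively and open an existing resource (placeholder id).
hs = HydroShare()
hs.sign_in()
resource = hs.resource("0123456789abcdef0123456789abcdef")
print(resource.metadata.title)
```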
This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 11/15/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also please note that this R project has been updated multiple times as the analysis has been revised throughout the peer review process.
#Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study, including global arrays summarizing five-year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data available, provided as both an array (.nc) and a data table (.csv). These data were produced in Python using the Python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folders primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, which have been extensively processed and filtered here. The "supporting_data" folder also contains the annual (2016-2020) MODIS land cover data used in the analysis, in separate folders containing the original data (.hdf) and the final processed (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in Python.
#Code information
Python scripts can be found in the "supporting_code" folder.
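The .nc arrays and .csv tables described under Data information can be opened in Python with xarray and pandas; a short sketch, with hypothetical file names standing in for the files in that folder:

```python
import pandas as pd
import xarray as xr

# Hypothetical file names within "data/turnover_from_python/updated/august_2024_lc/".
transit_array = xr.open_dataset("mean_annual_transit_time_2016_2020.nc")
transit_table = pd.read_csv("mean_annual_transit_time_2016_2020.csv")

print(transit_array)        # dimensions, coordinates, and variables of the global array
print(transit_table.head())
```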
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the `source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the `source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which are then saved in the "manuscript_figures" folder. Note that all maps were produced using Python code found in the "supporting_code" folder. Also note that within the "manuscript_figures" folder there is an "extended_data" folder, which contains tables of the summary statistics (e.g., quartiles and sample sizes) behind figures containing box plots or depicting regression coefficients.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Amir Islam
Released under MIT
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Technical Leverage Analysis in the Python Ecosystem
This dataset is the original dataset used in the publication [1]. It includes 21205 distinct package versions from the top 600 Python packages. An online demo for computing the proposed metrics for real-world software libraries is also available under the following URL: https://techleverage.eu/.
This work has been partially funded by the EU under the H2020 Program AssureMOSS (Grant n. 952647).
[1] DOI: 10.1007/s10664-023-10355-2
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data and Python script are part of the forum article "A text mining analysis of the climate change literature in industrial ecology" authored by Dayeen, F.R., Sharma, A.S., and Derrible, S., and published in the Journal of Industrial Ecology in 2020.
The Python script and instructions are included in the LiTCoF_v1.00-py.zip file. The original data is available in two formats: .csv and .pkl.
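Either format can be read directly with pandas; a minimal sketch with hypothetical file names standing in for the archived data:

```python
import pandas as pd

# Hypothetical file names; the archive ships the same data as .csv and .pkl.
abstracts_csv = pd.read_csv("abstracts_IE.csv")
abstracts_pkl = pd.read_pickle("abstracts_IE.pkl")
print(abstracts_csv.head())
```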
Updates of the script will be posted at https://github.com/csunlab/LiTCoF and at https://csun.uic.edu/codes/LiTCoF.html. The data is also available at https://csun.uic.edu/datasets.html#AbstractsIE.
Feel free to contact any of the authors for information and questions about the data and code.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The folder named “submission” contains the following:

- ijgis.yml: This file lists all the Python libraries and dependencies required to run the code. Use the ijgis.yml file to create a Python project and environment, and ensure you activate the environment before running the code.
- pythonProject: This folder contains several .py files and subfolders, each with specific functionality as described below.
  - A .png file is generated for each column of the raw gaze and IMU recordings, color-coded with logged events, and the prepared data are saved as .csv files.
  - In overlapping_sliding_window_loop.py, the call plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5)) in line 116 visualizes the data preparation results. As this visualization is not used in the paper, the line is commented out, but if you want to see visually what has been changed compared to the original data, you can uncomment this line. The outputs are saved as .csv files in the results folder.
  - The classification part contains three main code blocks, including: iii. one for the XGBoost code with correct hyperparameter tuning (a generic sketch is given below).

Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically calculated the confidence threshold of the model (explained in the paper in Section 5.2, Part II: Decoding surveillance by sequence analysis) is given in this block in lines 361 to 380. The output is a .csv file containing inferred labels.

The data is licensed under CC-BY; the code is licensed under MIT.
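The classification blocks described above are specific to this project; purely as a generic illustration of the XGBoost-with-hyperparameter-tuning pattern (not the authors' pipeline), here is a minimal, self-contained sketch on synthetic data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for windowed sensor features and their labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small grid search over a few common XGBoost hyperparameters.
param_grid = {"max_depth": [3, 5], "n_estimators": [100, 300], "learning_rate": [0.05, 0.1]}
search = GridSearchCV(XGBClassifier(eval_metric="logloss"), param_grid, cv=3)
search.fit(X_train, y_train)

# Class probabilities ("scores") for unseen data, to which a confidence threshold can be applied.
scores = search.predict_proba(X_test)[:, 1]
print(search.best_params_, scores[:5])
```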
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive reproduces a table titled "Table 3.1 Boone county population size, 1990 and 2000" from Wang and vom Hofe (2007, p. 58). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduce the table, and ensure reproducibility for anyone accessing this archive.

The Python code was developed in Google Colaboratory, or Google Colab for short, which is an Integrated Development Environment (IDE) of JupyterLab that streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 1990 and 2000 Decennial Census (Summary File 1, 100% data). All downloaded data are maintained in the notebook's temporary working directory while in use. The data are also stored separately with this archive.

The notebook features extensive explanations, comments, code snippets, and code output. The notebook can be viewed in PDF format or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook features code to perform the following functions:
- install/import necessary Python packages
- introduce a Census API query
- download Census data via the Census API
- manipulate Census tabular data
- calculate absolute change and percent change
- format numbers
- export the table to csv

The notebook can be modified to perform the same operations for any county in the United States by changing the State and County FIPS code parameters for the Census API downloads. The notebook could also be adapted for use in other environments (i.e., Jupyter Notebook), as well as for reading and writing files to a local or shared drive, or a cloud drive (i.e., Google Drive).
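A condensed sketch of the notebook's core steps follows; the API endpoints, variable codes, and FIPS codes are assumptions based on the Census API documentation pattern, so verify them against the notebook itself.

```python
import requests

# Assumed endpoints/variables for total population; 21/015 is used here as a placeholder
# state/county FIPS pair -- set these to the county reproduced in the table.
URLS = {
    1990: "https://api.census.gov/data/1990/sf1?get=P0010001&for=county:015&in=state:21",
    2000: "https://api.census.gov/data/2000/dec/sf1?get=P001001&for=county:015&in=state:21",
}

pop = {}
for year, url in URLS.items():
    header, row = requests.get(url).json()   # [[variable, 'state', 'county'], [count, st, cty]]
    pop[year] = int(row[0])

absolute_change = pop[2000] - pop[1990]
percent_change = 100 * absolute_change / pop[1990]
print(pop, absolute_change, f"{percent_change:.1f}%")
```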
A python module for consensus analysis in interlaboratory studies