Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Copies of Anaconda 3 Jupyter Notebooks and Python scripts for holistic and clustered analysis of "The Impact of COVID-19 on Technical Services Units" survey results. Data were analyzed both holistically, using cleaned and standardized survey results, and by library type clusters. To streamline data analysis in certain locations, an offshoot CSV file was created so data could be standardized without compromising the integrity of the parent clean file. The Jupyter Notebooks/Python scripts available for this project include COVID_Impact_TechnicalServices_HolisticAnalysis (a holistic analysis of all survey data) and COVID_Impact_TechnicalServices_LibraryTypeAnalysis (a clustered analysis of impact by library type; clustered files are available as part of the Dataverse for this project).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.
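A minimal sketch of the kind of automated NWIS retrieval described above, using the USGS dataretrieval Python package; the package choice, site number, and parameter code are illustrative assumptions rather than part of this resource:

```python
# Hypothetical example: pull daily streamflow from NWIS into a pandas DataFrame.
# Assumes `pip install dataretrieval`; site and parameter code are examples only.
import dataretrieval.nwis as nwis

df = nwis.get_record(
    sites="10109000",       # example USGS gage number
    service="dv",           # daily values
    parameterCd="00060",    # discharge, cubic feet per second
    start="2020-01-01",
    end="2020-12-31",
)
print(df.head())
```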
This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 10/29/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has been updated.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.
Code information:
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the
`source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhouse for figure/table production and
supporting analyses. This script generates the key figures and summary statistics
used in the study that then get saved in the manuscript_figures folder. Note that all
maps were produced using Python code found in the "supporting_code"" folder.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
The Python codes were conceived to work with ASCII .txt files containing XYZ arrays, both as input and output, which makes the codes highly compatible and universally usable. Code A provides an example of conversion from the .s94 data format to the requested ASCII .txt format. Image analysis software generally allows exporting source files as .txt files with XYZ arrays, sometimes placing a text header before the data values to indicate the data scales. The script (code A) converts raw STM files (.s94) into XYZ-type ASCII files that can be opened by the WSxM software. The script (code B) reads the XYZ-type ASCII files and performs the flattening and equalizing filters, operating on an entire input file folder. The script (code C) was conceived with the possibility of optimizing the number of clusters. The script (code D) reads a sample of images, from the first one up to a user-selected number N, calculates the maximum extension of the Z-value distribution for every image, and returns an average extension value. The script (code E) corrects the drift affecting STM images.
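As an illustration of the XYZ-type ASCII workflow described above (not the deposited codes themselves), the following sketch reads a three-column x, y, z text file with NumPy and applies a simple line-by-line flattening filter; the headerless three-column layout and file names are assumptions:

```python
# Illustrative sketch: line-by-line flattening of an XYZ ASCII file.
# Assumes three whitespace-separated columns (x, y, z) and no header.
import numpy as np

def flatten_xyz(path, out_path):
    x, y, z = np.loadtxt(path, unpack=True)
    for row_y in np.unique(y):                  # treat each scan line separately
        mask = y == row_y
        coeffs = np.polyfit(x[mask], z[mask], 1)
        z[mask] -= np.polyval(coeffs, x[mask])  # remove per-row tilt and offset
    np.savetxt(out_path, np.column_stack((x, y, z)), fmt="%.6e")

flatten_xyz("image_raw.txt", "image_flattened.txt")  # example file names
```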
https://creativecommons.org/publicdomain/zero/1.0/
This dataset brings you the Iris Dataset in several data formats (see more details in the next sections).
You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared a Python Jupyter Notebook and an R Markdown report that read all these formats:
Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.
Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris
Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/
The file downloaded is iris.data and is formatted as a comma delimited file.
This small data collection was created to help you test your skills with ingesting various data formats.
This file was processed to convert the data into the following formats:
* csv - comma separated values format
* tsv - tab separated values format
* parquet - parquet format
* feather - feather format
* parquet.gzip - compressed parquet format
* h5 - hdf5 format
* pickle - Python binary object file - pickle format
* xlsx - Excel format
* npy - Numpy (Python library) binary format
* npz - Numpy (Python library) binary compressed format
* rds - Rds (R specific data format) binary format
I would like to acknowledge the work of the creator of the dataset - R. A. Fisher and of the donor - Michael Marshall.
Use these data formats to test your skills in ingesting data in various formats.
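A hedged example of how several of the formats listed above can be read with common Python libraries; the iris.<extension> file names are assumptions about how the collection is organized, and some readers need the optional dependencies noted in the comments:

```python
# Reading the same Iris data from several formats (file names are assumed).
import numpy as np
import pandas as pd

df_csv     = pd.read_csv("iris.csv")
df_tsv     = pd.read_csv("iris.tsv", sep="\t")
df_parquet = pd.read_parquet("iris.parquet")   # requires pyarrow or fastparquet
df_feather = pd.read_feather("iris.feather")   # requires pyarrow
df_hdf     = pd.read_hdf("iris.h5")            # requires PyTables; single key assumed
df_pickle  = pd.read_pickle("iris.pickle")
df_excel   = pd.read_excel("iris.xlsx")        # requires openpyxl
arr_npy    = np.load("iris.npy", allow_pickle=True)
npz_files  = np.load("iris.npz", allow_pickle=True)
```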
https://dataintelo.com/privacy-and-policy
The Python Compiler market size was valued at USD 341.2 million in 2023, and it is projected to grow at a robust compound annual growth rate (CAGR) of 10.8% to reach USD 842.6 million by 2032. The market growth is primarily driven by the increasing adoption of Python as a programming language across various industries, due to its simplicity and versatility.
The growth of the Python Compiler market can be attributed to several key factors. Firstly, the rising prominence of Python in data science, machine learning, and artificial intelligence domains is a significant driver. Python’s extensive libraries and frameworks make it an ideal choice for data processing and algorithm development, leading to increased demand for efficient Python compilers. This widespread application is spurring investments and advancements in compiler technologies to support increasingly complex computational tasks. Additionally, the open-source nature of Python encourages innovation and customization, further fueling market expansion.
Secondly, the educational sector's growing emphasis on coding and computer science education is another pivotal growth factor. Python is often chosen as the introductory programming language in educational institutions due to its readability and straightforward syntax. This trend is creating a steady demand for Python compilers that are user-friendly and suitable for educational purposes. As more schools and universities integrate Python into their curriculums, the market for Python compilers is expected to grow correspondingly, supporting a new generation of programmers and developers.
Furthermore, the increasing adoption of Python by small and medium enterprises (SMEs) is propelling the market forward. SMEs are leveraging Python for various applications, including web development, automation, and data analysis, due to its cost-effectiveness and ease of use. Python’s versatility allows businesses to streamline their operations and develop robust solutions without significant financial investment. This has led to a burgeoning demand for both on-premises and cloud-based Python compilers that can cater to the diverse needs of SMEs across different sectors.
Regionally, the Python Compiler market is witnessing notable growth in North America and the Asia Pacific. North America remains a key market due to the early adoption of advanced technologies and a strong presence of tech giants and startups alike. In contrast, the Asia Pacific region is experiencing rapid growth thanks to its expanding technological infrastructure and burgeoning IT industry. Countries like India and China are emerging as significant players due to their large pool of skilled developers and increasing investment in tech education and innovation.
In the Python Compiler market, the component segment is divided into software and services. The software segment encompasses the actual compiler tools and integrated development environments (IDEs) that developers use to write and optimize Python code. This segment is crucial as it directly impacts the efficiency and performance of Python applications. The demand for advanced compiler software is on the rise due to the need for high-performance computing in areas like machine learning, artificial intelligence, and big data analytics. Enhanced features such as real-time error detection, optimization techniques, and seamless integration with other development tools are driving the adoption of sophisticated Python compiler software.
The services segment includes support, maintenance, consulting, and training services associated with Python compilers. As organizations increasingly adopt Python for critical applications, the need for professional services to ensure optimal performance and scalability is growing. Consulting services help businesses customize and optimize their Python environments to meet specific needs, while training services are essential for upskilling employees and staying competitive in the tech-driven market. Additionally, support and maintenance services ensure that the compilers continue to operate efficiently and securely, minimizing downtime and enhancing productivity.
Within the software sub-segment, integrated development environments (IDEs) like PyCharm, Spyder, and Jupyter Notebooks are gaining traction. These IDEs not only provide robust compiling capabilities but also offer features like debugging, syntax highlighting, and version control, which streamline the development process. The increasing complexity of software develo
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Recent advances in the sensitivity and speed of mass spectrometers coupled with improved sample preparation methods have enabled the field of single cell proteomics to proliferate. While heavy development is occurring in the label free space, dramatic improvements in throughput are provided by multiplexing with tandem mass tags. Hundreds or thousands of single cells can be analyzed with this method, yielding large data sets which may contain poor data arising from loss of material during cell sorting or poor digestion, labeling, and lysis. To date, no tools have been described that can assess data quality prior to data processing. We present herein a lightweight python script and accompanying graphic user interface that can rapidly quantify reporter ion peaks within each MS/MS spectrum in a file. With simple summary reports, we can identify single cell samples that fail to pass a set quality threshold, thus reducing analysis time waste. In addition, this tool, Diagnostic Ion Data Analysis Reduction (DIDAR), will create reduced MGF files containing only spectra possessing a user-specified number of single cell reporter ions. By reducing the number of spectra that have excessive zero values, we can speed up sample processing with little loss in data completeness as these spectra are removed in later stages in data processing workflows. DIDAR and the DIDAR GUI are compatible with all modern operating systems and are available at: https://github.com/orsburn/DIDARSCPQC. All files described in this study are available at www.massive.ucsd.edu as accession MSV000088887.
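To illustrate the underlying idea (a generic sketch, not DIDAR itself), spectra can be filtered by counting how many reporter ion peaks fall within a small m/z tolerance; the reporter masses, tolerance, and file names below are assumptions:

```python
# Generic sketch: keep only MS/MS spectra with at least `min_hits` reporter ions.
import numpy as np
from pyteomics import mgf

REPORTERS = [126.1277, 127.1248, 128.1344, 129.1315, 130.1411, 131.1382]  # TMT 6-plex, approx.
TOL = 0.003  # m/z tolerance (assumed)

def has_reporters(spectrum, min_hits=3):
    mz = np.asarray(spectrum["m/z array"])
    hits = sum(np.any(np.abs(mz - r) <= TOL) for r in REPORTERS)
    return hits >= min_hits

with mgf.read("input.mgf") as reader:            # example input file
    kept = [s for s in reader if has_reporters(s)]

mgf.write(kept, output="reduced.mgf")            # reduced MGF with passing spectra
```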
PyPlant is a simple coroutine-based framework for writing data processing pipelines. Its goal is to simplify caching of intermediate results in the pipeline and to avoid re-running expensive early stages when only the later stages have changed. Given a set of Python functions that consume and produce data, it automatically runs them in a correct order and caches intermediate results. When the pipeline is executed again, only the necessary parts are re-run. Importantly, PyPlant was designed with the following considerations in mind: Simple: quick to learn, no custom language or workflow design programs; start prototyping right away. DRY: function code is metadata; no need to write execution graphs or external metadata, it just works (tm). Automatic: no need to manually re-run outdated parts. Large data: handles data that doesn't fit into memory and persists it between runs. PyPlant can be installed from PyPI: pip install pyplant. For documentation, see README.md.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Given the wide diversity in applications of biological mass spectrometry, custom data analyses are often needed to fully interpret the results of an experiment. Such bioinformatics scripts necessarily include similar basic functionality to read mass spectral data from standard file formats, process it, and visualize it. To facilitate this task and avoid reimplementing this basic functionality, spectrum_utils is a Python package for mass spectrometry data processing and visualization. Its high-level functionality enables developers to quickly prototype ideas for computational mass spectrometry projects in only a few lines of code. Notably, the data processing functionality is highly optimized for computational efficiency to be able to deal with the large volumes of data that are generated during mass spectrometry experiments. The visualization functionality makes it possible to easily produce publication-quality figures as well as interactive spectrum plots for inclusion on web pages. spectrum_utils is available for Python 3.6+, includes extensive online documentation and examples, and can be easily installed using conda. It is freely available as open source under the Apache 2.0 license at https://github.com/bittremieux/spectrum_utils.
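A minimal usage sketch, assuming the MsmsSpectrum constructor and plotting helper as described in the spectrum_utils documentation; the peak values are made up:

```python
# Build a spectrum, apply a simple processing step, and plot it.
import matplotlib.pyplot as plt
import numpy as np
import spectrum_utils.spectrum as sus
import spectrum_utils.plot as sup

spectrum = sus.MsmsSpectrum(
    "example_scan_1",                                     # identifier
    718.36,                                               # precursor m/z
    2,                                                    # precursor charge
    np.array([147.11, 204.09, 276.16, 403.23, 518.26]),   # fragment m/z
    np.array([0.3, 1.0, 0.6, 0.8, 0.4]),                  # intensities
)
spectrum = spectrum.filter_intensity(min_intensity=0.05)  # example processing step

sup.spectrum(spectrum)                                    # matplotlib-based spectrum plot
plt.savefig("example_spectrum.png", dpi=300)
```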
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Alinaghi, N., Giannopoulos, I., Kattenbeck, M., & Raubal, M. (2025). Decoding wayfinding: analyzing wayfinding processes in the outdoor environment. International Journal of Geographical Information Science, 1–31. https://doi.org/10.1080/13658816.2025.2473599
Link to the paper: https://www.tandfonline.com/doi/full/10.1080/13658816.2025.2473599
The folder named “submission” contains the following:
* ijgis.yml: This file lists all the Python libraries and dependencies required to run the code. Use the ijgis.yml file to create a Python project and environment, and make sure you activate the environment before running the code.
* pythonProject: This folder contains several .py files and subfolders, each with specific functionality as described below.
* One script produces a .png file for each column of the raw gaze and IMU recordings, color-coded with logged events, and writes the results as .csv files.
* overlapping_sliding_window_loop.py: The function plot_labels_comparison(df, save_path, x_label_freq=10, figsize=(15, 5)) in line 116 visualizes the data preparation results. As this visualization is not used in the paper, the line is commented out; if you want to see visually what has been changed compared to the original data, you can uncomment this line. The outputs are .csv files in the results folder.
* Another part contains three main code blocks, including (iii) one for the XGBoost code with correct hyperparameter tuning. Note: Please read the instructions for each block carefully to ensure that the code works smoothly. Regardless of which block you use, you will get the classification results (in the form of scores) for unseen data. The way we empirically calculated the confidence threshold of the model (explained in the paper in Section 5.2, Part II: Decoding surveillance by sequence analysis) is given in this block in lines 361 to 380.
* A final script produces a .csv file containing the inferred labels.
The data is licensed under CC-BY; the code is licensed under MIT.
Python is a free computer language that prioritizes human readability and general application. It is one of the easier computer languages to learn and start with, especially with no prior programming knowledge. I have been using Python for Excel spreadsheet automation, data analysis, and data visualization, and it has allowed me to better focus on learning how to automate my data analysis workload. I am currently examining the North Carolina Department of Environmental Quality (NCDEQ) database for water quality sampling for the Town of Nags Head, NC. It spans over 26 years (1997-2023) and currently lists a total of 41 different testing site locations. You can see at the bottom of image 2 below that I have 148,204 testing data points for the entirety of the NCDEQ testing for the state. From this large dataset, 34,759 data points are from Dare County (Nags Head) specifically, subdivided into testing sites.
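A hypothetical sketch of the kind of pandas filtering this workflow involves; the CSV path and the "County" and "Station" column names are assumptions, not the actual NCDEQ schema:

```python
# Subset statewide NCDEQ results to Dare County and count samples per site.
import pandas as pd

samples = pd.read_csv("ncdeq_water_quality.csv")     # statewide results (~148,204 rows)
dare = samples[samples["County"] == "Dare"]          # Dare County subset (~34,759 rows)
per_site = dare.groupby("Station").size().sort_values(ascending=False)
print(per_site.head(10))                             # most frequently sampled sites
```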
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw CITE-seq, ASAP-seq, and ATAC-seq FASTQ files that were processed using CountASAP, a recently developed software package. These data were used in a manuscript highlighting the functionality and benchmarking of this software. These raw data should make it possible to recreate every figure in the manuscript.
Preprint Link:
https://www.biorxiv.org/content/10.1101/2024.05.20.595042v1
Github Link:
This resource contains a video recording for a presentation given as part of the National Water Quality Monitoring Council conference in April 2021. The presentation covers the motivation for performing quality control for sensor data, the development of PyHydroQC, a Python package with functions for automating sensor quality control including anomaly detection and correction, and the performance of the algorithms applied to data from multiple sites in the Logan River Observatory.
The initial abstract for the presentation: Water quality sensors deployed to aquatic environments make measurements at high frequency and commonly include artifacts that do not represent the environmental phenomena targeted by the sensor. Sensors are subject to fouling from environmental conditions, often exhibit drift and calibration shifts, and report anomalies and erroneous readings due to issues with datalogging, transmission, and other unknown causes. The suitability of data for analyses and decision making often depends on subjective and time-consuming quality control processes consisting of manual review and adjustment of data. Data-driven and machine learning techniques have the potential to automate identification and correction of anomalous data, streamlining the quality control process. We explored documented approaches and selected several for implementation in a reusable, extensible Python package designed for anomaly detection for aquatic sensor data. Implemented techniques include regression approaches that estimate values in a time series, flag a point as anomalous if the difference between the sensor measurement and the estimated value exceeds a threshold, and offer replacement values for correcting anomalies. Additional algorithms that scaffold the central regression approaches include rules-based preprocessing, thresholds for determining anomalies that adjust with data variability, and the ability to detect and correct anomalies using forecasted and backcasted estimation. The techniques were developed and tested based on several years of data from aquatic sensors deployed at multiple sites in the Logan River Observatory in northern Utah, USA. Performance was assessed based on labels and corrections applied previously by trained technicians. In this presentation, we describe the techniques for detection and correction, report their performance, illustrate the workflow for applying them to high frequency aquatic sensor data, and demonstrate the possibility for additional approaches to help increase automation of aquatic sensor data post processing.
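As a generic illustration of the regression-based detection and correction described above (not the PyHydroQC API), a rolling estimate can flag points whose residual exceeds a variability-adjusted threshold and offer the estimate as a replacement value:

```python
# Generic sketch: rolling-median estimate, residual threshold, and correction.
import pandas as pd

def flag_and_correct(series, window=24, n_sigmas=4.0):
    estimate = series.rolling(window, center=True, min_periods=1).median()
    residual = series - estimate
    threshold = n_sigmas * residual.rolling(window, center=True, min_periods=1).std()
    anomalous = residual.abs() > threshold
    corrected = series.where(~anomalous, estimate)   # replace flagged points
    return pd.DataFrame({"observed": series, "anomalous": anomalous,
                         "corrected": corrected})
```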
The EPA GitHub repository PAU4Chem, as described in the README.md file, contains Python scripts written to build the PAU dataset modules (technologies, capital and operating costs, and chemical prices) for tracking chemical flow transfers, estimating releases, and identifying potential occupational exposure scenarios in pollution abatement units (PAUs). These PAUs are employed for on-site chemical end-of-life management. The folder datasets contains the outputs for each framework step. The Chemicals_in_categories.csv file contains the chemicals for the TRI chemical categories. The EPA GitHub repository PAU_case_study, as described in its readme.md entry, contains the Python scripts to run the manuscript case study for designing the PAUs, the data-driven models, and the decision-making module for chemicals of concern and for tracking flow transfers at the end-of-life stage. The data was obtained by means of data engineering using different publicly-available databases. The properties of chemicals were obtained using the GitHub repository Properties_Scraper, while the PAU dataset was built using the repository PAU4Chem. Finally, the EPA GitHub repository Properties_Scraper contains a Python script to massively gather information about exposure limits and physical properties from different publicly-available sources: EPA, NOAA, OSHA, and the Institute for Occupational Safety and Health of the German Social Accident Insurance (IFA). Also, all GitHub repositories describe the Python libraries required for running their code, how to use them, the output files obtained after running the Python script modules, and the corresponding EPA Disclaimer. This dataset is associated with the following publication: Hernandez-Betancur, J.D., M. Martin, and G.J. Ruiz-Mercado. A data engineering framework for on-site end-of-life industrial operations. JOURNAL OF CLEANER PRODUCTION. Elsevier Science Ltd, New York, NY, USA, 327: 129514, (2021).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset corresponding to the journal article "Mitigating the effect of errors in source parameters on seismic (waveform) inversion" by Blom, Hardalupas and Rawlinson, accepted for publication in Geophysical Journal International. In this paper, we demonstrate the effect of errors in source parameters on seismic tomography, with a particular focus on (full) waveform tomography. We study the effect both on forward modelling (i.e. comparing waveforms and measurements resulting from a perturbed vs. unperturbed source) and on seismic inversion (i.e. using a source which contains an (erroneous) perturbation to invert for Earth structure). These data were obtained using Salvus, a state-of-the-art (though proprietary) 3-D solver that can be used for wave propagation simulations (Afanasiev et al., GJI 2018).
This dataset contains:
The entire Salvus project. This project was prepared using Salvus version 0.11.x and 0.12.2 and should be fully compatible with the latter.
A number of Jupyter notebooks used to create all the figures, set up the project and do the data processing.
A number of Python scripts that are used in above notebooks.
Two conda environment .yml files: one with the complete environment as used to produce this dataset, and one with the environment as supplied by Mondaic (the Salvus developers), on top of which I installed basemap and cartopy.
An overview of the inversion configurations used for each inversion experiment and the names of the corresponding figures: inversion_runs_overview.ods / .csv.
Datasets corresponding to the different figures.
One dataset for Figure 1, showing the effect of a source perturbation in a real-world setting, as previously used by Blom et al., Solid Earth 2020
One dataset for Figure 2, showing how different methodologies and assumptions can lead to significantly different source parameters, notably including systematic shifts. This dataset was kindly supplied by Tim Craig (Craig, 2019).
A number of datasets (stored as pickled Pandas dataframes) derived from the Salvus project. We have computed:
travel-time arrival predictions from every source to all stations (df_stations...pkl)
misfits for different metrics for both P-wave centered and S-wave centered windows for all components on all stations, comparing every time waveforms from a reference source against waveforms from a perturbed source (df_misfits_cc.28s.pkl)
addition of synthetic waveforms for different (perturbed) moment tensors. All waveforms are stored in HDF5 (.h5) files of the ASDF (adaptable seismic data format) type
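The pickled dataframes listed above can be inspected with pandas; a minimal sketch (the exact columns are not documented here):

```python
# Load one of the pickled Pandas dataframes and look at its structure.
import pandas as pd

df_misfits = pd.read_pickle("df_misfits_cc.28s.pkl")
print(df_misfits.columns.tolist())
print(df_misfits.head())
```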
How to use this dataset:
To set up the conda environment:
make sure you have Anaconda/Miniconda installed
make sure you have access to Salvus functionality. This is not absolutely necessary, but most of the functionality within this dataset relies on Salvus. You can do the analyses and create the figures without it, but you'll have to hack around in the scripts to build workarounds.
Set up Salvus / create a conda environment. This is best done following the instructions on the Mondaic website. Check the changelog for breaking changes, in that case download an older salvus version.
Additionally in your conda env, install basemap and cartopy:
conda-env create -n salvus_0_12 -f environment.yml
conda install -c conda-forge basemap
conda install -c conda-forge cartopy
Install LASIF (https://github.com/dirkphilip/LASIF_2.0) and test. The project uses some lasif functionality.
To recreate the figures: This is extremely straightforward. Every figure has a corresponding Jupyter notebook; it suffices to run the notebook in its entirety.
Figure 1: separate notebook, Fig1_event_98.py
Figure 2: separate notebook, Fig2_TimCraig_Andes_analysis.py
Figures 3-7: Figures_perturbation_study.py
Figures 8-10: Figures_toy_inversions.py
To recreate the dataframes in DATA: This can be done using the example notebook Create_perturbed_thrust_data_by_MT_addition.py and Misfits_moment_tensor_components.M66_M12.py . The same can easily be extended to the position shift and other perturbations you might want to investigate.
To recreate the complete Salvus project: This can be done using:
the notebook Prepare_project_Phil_28s_absb_M66.py (setting up project and running simulations)
the notebooks Moment_tensor_perturbations.py and Moment_tensor_perturbation_for_NS_thrust.py
For the inversions: using the notebook Inversion_SS_dip.M66.28s.py as an example. See the overview table inversion_runs_overview.ods (or .csv) as to naming conventions.
References:
Michael Afanasiev, Christian Boehm, Martin van Driel, Lion Krischer, Max Rietmann, Dave A May, Matthew G Knepley, Andreas Fichtner, Modular and flexible spectral-element waveform modelling in two and three dimensions, Geophysical Journal International, Volume 216, Issue 3, March 2019, Pages 1675–1692, https://doi.org/10.1093/gji/ggy469
Nienke Blom, Alexey Gokhberg, and Andreas Fichtner, Seismic waveform tomography of the central and eastern Mediterranean upper mantle, Solid Earth, Volume 11, Issue 2, 2020, Pages 669–690, 2020, https://doi.org/10.5194/se-11-669-2020
Tim J. Craig, Accurate depth determination for moderate-magnitude earthquakes using global teleseismic data. Journal of Geophysical Research: Solid Earth, 124, 2019, Pages 1759– 1780. https://doi.org/10.1029/2018JB016902
Thesis Artifacts for: Analyzing Environmental Data with the Titan Platform. A detailed description can be found in the README.md. The corresponding thesis can be found here. Abstract: The natural environment is one of our biggest treasures ― if not our biggest treasure ― and it must be preserved. Therefore, understanding our natural environment is becoming increasingly important as we struggle to respond to the implications of climate change and new diseases. Climate change is the greatest challenge faced by almost every species. With the help of environmental data, we can understand most of the complex interrelationships present in the climate and consequently take preventive measures. However, with the rise of new technologies, the collection of environmental data has become very rapid and complex, and therefore it is difficult or even impossible to process the data using traditional methods. To overcome this problem, it is important to extract pieces of information from the data as soon as they are produced. We used Titan, a software platform that applies stream processing, to analyze environmental data collected across Germany. Titan comes with a set of solutions that facilitate the processing of huge amounts of data and allow software components to consume and supply data. In this work, three data sources were connected directly to the Titan platform to collect diverse types of data. Once these connections were established, we analyzed the incoming data on the Titan platform using a set of data flows that consist of modular components connected to each other. Each modular component is responsible for a single task, such as data collection, data filtering, or data transformation. Moreover, we connected an external database to the platform to save the results of our analysis. Finally, we used a visualization tool to create a dashboard for each flow implemented on the platform. We evaluated the scalability of the Titan platform based on the number of incoming records. To do so, we continuously increased the workload on the Titan platform and determined the minimum number of component instances necessary to process the data without creating high data traffic across the platform.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Multiplexed imaging technologies provide insights into complex tissue architectures. However, challenges arise due to software fragmentation with cumbersome data handoffs, inefficiencies in processing large images (8 to 40 gigabytes per image), and limited spatial analysis capabilities. To efficiently analyze multiplexed imaging data, we developed SPACEc, a scalable end-to-end Python solution that handles image extraction, cell segmentation, and data preprocessing and incorporates machine-learning-enabled, multi-scale spatial analysis, operated through a user-friendly and interactive interface.
The demonstration dataset was derived from a previous analysis and contains TMA cores from a human tonsil and tonsillitis sample that were acquired with the Akoya PhenocyclerFusion platform. The dataset can be used to test the workflow and establish it on a user's system or to familiarize oneself with the pipeline.
Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, is heterogeneous and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.
Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.
Results: Our final analysis included 222 countries and regions…
Data collection: COVID-19 data were downloaded from WHO. Using a public repository, we added the countries' full names to the WHO data set, merging the two data sets on the two-letter abbreviations for each country. The provided COVID-19 data cover January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.
Data processing: We processed the data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).
Any text editor, including Microsoft Excel and its free alternatives, can open the uploaded CSV file. Any web browser and some code editors (such as the freely available Visual Studio Code) can display the uploaded Jupyter Notebook if the required Python environment is set up correctly.
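A hedged sketch of one reading of the binary reporting rate (the fraction of days in the study period on which a country reported new cases); the column names follow the public WHO CSV layout and are assumptions about the uploaded file:

```python
# Compute a per-country binary reporting rate from the WHO daily case counts.
import pandas as pd

who = pd.read_csv("WHO-COVID-19-global-data.csv", parse_dates=["Date_reported"])
n_days = who["Date_reported"].nunique()
binary_rate = (
    who.groupby("Country")["New_cases"]
       .apply(lambda s: (s.notna() & (s != 0)).sum() / n_days)
       .sort_values()
)
print(binary_rate.tail())   # countries reporting on the largest share of days
```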
Python has become one of the most popular programming languages, with a wide variety of use cases. In 2022, Python is most used for web development and data analysis, with ** percent and ** percent respectively.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Data processing code for the HU method, which applies the Ursell number as an index for assessing the wave dissipation capacity of coastal and estuarine forested wetlands. The data processing procedure can be done with Python or MATLAB scripts, with a readme file. Before using the scripts, users need to prepare measured data of significant wave height, water depth, and peak wave period from field measurements or lab experiments. The distance between each station should also be prepared. Instructions for setting up field measuring transects and the mechanism of the HU method can be found in the deposited file 'practice workflow.docx'. The Python codes are supported by Zhitong Jiang from the School of Ecology, Environment and Resources, Guangdong University of Technology.
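As a generic illustration of the index itself (not the deposited HU-method scripts), the Ursell number can be computed from the measured significant wave height H, water depth h, and peak period T, with the wavelength L obtained from the linear dispersion relation:

```python
# Ursell number U = H * L**2 / h**3, with L from linear wave theory.
import math

G = 9.81  # gravitational acceleration, m/s^2

def wavelength(T, h, tol=1e-6):
    L = G * T**2 / (2 * math.pi)                  # deep-water first guess
    for _ in range(100):                          # fixed-point iteration
        L_new = G * T**2 / (2 * math.pi) * math.tanh(2 * math.pi * h / L)
        if abs(L_new - L) < tol:
            break
        L = L_new
    return L

def ursell_number(H, h, T):
    return H * wavelength(T, h) ** 2 / h ** 3

print(ursell_number(H=0.5, h=2.0, T=6.0))         # example values
```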