cmomy is a Python package to calculate central moments and co-moments in a numerically stable and direct way. Behind the scenes, cmomy uses Numba to calculate moments rapidly. cmomy provides utilities to calculate central moments from individual samples, precomputed central moments, and precomputed raw moments. It also provides routines to perform bootstrap resampling based on raw data or precomputed moments. cmomy has NumPy array and xarray DataArray interfaces.
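To make "numerically stable" concrete, here is a minimal sketch, in plain NumPy-based Python rather than cmomy's own API, of the one-pass (Welford-style) central-moment update that packages of this kind build on; a naive sum-of-squares formula would lose precision on the offset data shown:

```python
import numpy as np

def one_pass_central_moments(samples):
    """Welford-style streaming update for mean and second central moment.

    Illustrates the numerically stable style of update a package like
    cmomy relies on; this is not cmomy's actual API.
    """
    count, mean, m2 = 0, 0.0, 0.0
    for x in samples:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # accumulate the second central moment
    return mean, m2 / count       # mean and (biased) variance

rng = np.random.default_rng(0)
data = rng.normal(loc=1e6, scale=1.0, size=10_000)  # large offset stresses naive formulas
print(one_pass_central_moments(data))  # close to (1e6, 1.0)
```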
Quality assurance and quality control (QAQC) of geochemical data is an important first step before any interpretation of the data is undertaken. Because laboratories performing multi-element analysis report an increasing number of elements, the time required for QAQC has grown accordingly. This Python script was developed to alleviate those time constraints. It automatically provides a quick first pass over the data, producing summary statistics and plots of the included standards, laboratory duplicates, and analytical duplicates. The statistics and plots allow rapid assessment of geochemical data to discover potential issues and trends through time, while also providing a consistent approach. Note that no general quality cut-offs are included in the script; it does not replace the need for an expert to examine the data and identify potential issues.
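As an illustration of the kind of first-pass check described above, the sketch below computes the relative percent difference (RPD) between paired duplicate assays, a common duplicate statistic; the column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical duplicate pairs: one row per pair, original vs. repeat assay.
pairs = pd.DataFrame({
    "original_ppm": [10.2, 55.0, 120.0],
    "duplicate_ppm": [9.8, 57.5, 118.0],
})

# Relative percent difference: |a - b| / mean(a, b) * 100.
rpd = 100 * (pairs["original_ppm"] - pairs["duplicate_ppm"]).abs() / pairs.mean(axis=1)
print(rpd.round(1).tolist())  # [4.0, 4.4, 1.7]
```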
The publication "Scientific Data Analysis and Visualisation with Python" delves into various facets of Python programming, with a special focus on data analysis and visualisation. Its main sections are:

Examining operators and expressions: The text explores arithmetic, comparison, logical, bitwise, assignment, and membership operators. These operators serve as fundamental components in the construction of any Python script. Illustrative real-world scenarios show their practical applications: arithmetic operators are essential for performing mathematical calculations, while comparison operators facilitate decision-making.

Discussion of data structures and control flow: The book discusses input handling, strings, lists, dictionaries, loops, and conditional expressions. Scientists and software developers can learn how to manipulate data structures efficiently; lists and dictionaries in particular play a crucial role in organising and retrieving data.

Insight into functions and modularisation: Functions are central to Python programming, and the publication offers valuable perspectives on creating and using them. Modularisation increases the reusability and maintainability of code: by breaking complex tasks into smaller functions, developers can improve the understandability of their code.

Exploring data with Pandas: The book presents a detailed examination of Pandas, a robust data-analysis library. Readers will gain skills in loading, manipulating, and analysing data frames.

Explaining data presentation and visualisation: Effective visualisation is critical to understanding data. The publication introduces matplotlib and other plotting libraries, with which researchers and analysts can create powerful visual representations to communicate insights effectively.

In summary, this publication serves as a valuable resource for individuals at various levels of Python proficiency, from beginners to experienced users. Whether you are a scientist navigating data or a developer honing your skills, its comprehensive content will guide you towards mastering Python data analysis and visualisation.

The training materials are provided for international learners. The following lectures on Python are available on YouTube for both international and Bangladeshi learners. For international learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ9ci8DAhpizHGQ7IsCZFsKu For Bangladeshi learners: https://youtube.com/playlist?list=PL4T8G4Q9_JQ_byYGwq3FyGhDOFRNdHRL8 My profile: https://researchsociety20.org/founder-and-director/
https://creativecommons.org/publicdomain/zero/1.0/
By Tarun Bisht (from Hugging Face) [source]
The python_code_instructions_18k_alpaca dataset is a comprehensive training dataset specifically curated for researchers and developers involved in the analysis and comprehension of Python code instructions. It contains a vast collection of Python code snippets along with their corresponding instruction, input, output, and prompt information. By utilizing this dataset, users can gain valuable insights into various Python programming concepts and techniques.
The dataset is organized into columns to facilitate easy access to the required information. The instruction column holds the specific task or instruction that the Python code snippet is designed to perform. This allows users to understand the purpose or requirement of each code snippet at a glance.
The input column contains all necessary input data or parameters that are required for executing the Python code snippet accurately. These inputs provide context and enable users to comprehend how different variables or values impact the overall functioning of each code snippet.
Likewise, the output column presents expected results or outcomes that should be produced when executing each Python code snippet with its specified input values. This allows for validation and verification purposes, ensuring that each code snippet performs as intended.
In addition to instruction, input, and output details, this dataset also includes prompts. The prompt column provides additional context or information intended to assist users in better understanding the purpose or requirements of each particular Python code snippet.
By leveraging this comprehensive python_code_instructions_18k_alpaca training dataset, researchers and developers can delve into numerous real-world examples of Python programming challenges, helping them enhance their coding skills while gaining invaluable knowledge about effective implementation techniques across various domains.
- Code Instruction Analysis: This dataset can be used to analyze different types of Python code instructions and identify patterns or common practices. Researchers or developers can use this dataset to gain insights into effective ways of writing code instructions.
- Code Output Prediction: With the given input and instruction, this dataset can be used to train models for predicting the expected output of a Python code snippet. This can be useful in automating the testing process or verifying the correctness of the code.
- Prompt Generation: Developers often struggle with providing clear and concise prompts for their code snippets. This dataset can serve as a resource for generating prompts by analyzing existing examples and extracting key information or requirements from them.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:------------|:------------|
| instruction | Specific tasks or instructions assigned to each Python code snippet. (Text) |
| input | The input data or parameters required for executing the code instruction. (Text) |
| output | The expected result or output that should be produced when executing the code instruction. (Text) |
| prompt | Additional information or context to help understand the purpose or requirements of each code instruction. (Text) |
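Given the schema above, a minimal sketch of loading and inspecting the file with pandas (the path is a placeholder for wherever train.csv was downloaded):

```python
import pandas as pd

df = pd.read_csv("train.csv")    # placeholder path to the downloaded file
print(df.columns.tolist())       # ['instruction', 'input', 'output', 'prompt']
print(df.loc[0, "instruction"])  # the task the first snippet solves
print(df.loc[0, "output"])       # the expected code/result for that task
```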
If you use this dataset in your research, please credit Tarun Bisht (from Hugging Face).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository complements the identically titled paper submitted to WAMTA 2025 and allows the published results to be reproduced. For a more detailed description, please consult the README.md file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This archive reproduces a figure titled "Figure 3.2 Boone County population distribution" from Wang and vom Hofe (2007, p. 60). The archive provides a Jupyter Notebook that uses Python and can be run in Google Colaboratory. The workflow uses the Census API to retrieve data, reproduces the figure, and ensures reproducibility for anyone accessing this archive. The Python code was developed in Google Colaboratory (Google Colab for short), a JupyterLab-based Integrated Development Environment (IDE) that streamlines package installation, code collaboration, and management. The Census API is used to obtain population counts from the 2000 Decennial Census (Summary File 1, 100% data). Shapefiles are downloaded from the TIGER/Line FTP server. All downloaded data are maintained in the notebook's temporary working directory while in use. The data and shapefiles are stored separately with this archive. The final map is also stored as an HTML file. The notebook features extensive explanations, comments, code snippets, and code output. It can be viewed as a PDF or downloaded and opened in Google Colab. References to external resources are also provided for the various functional components. The notebook's code performs the following functions:

- install/import necessary Python packages
- download the Census Tract shapefile from the TIGER/Line FTP server
- download Census data via the Census API
- manipulate Census tabular data
- merge Census data with the TIGER/Line shapefile
- apply a coordinate reference system
- calculate land area and population density
- map and export the map to HTML
- export the map to an ESRI shapefile
- export the table to CSV

The notebook can be modified to perform the same operations for any county in the United States by changing the state and county FIPS code parameters for the TIGER/Line shapefile and Census API downloads. The notebook can be adapted for use in other environments (e.g., Jupyter Notebook) as well as for reading and writing files to a local, shared, or cloud drive (e.g., Google Drive).
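For orientation, a hedged sketch of the data-retrieval step using the public Census API; the endpoint, the variable name (P001001, total population in SF1), and the Boone County, Kentucky FIPS codes (state 21, county 015) are stated here as assumptions, not taken from the notebook:

```python
import requests

# Endpoint and variable follow public Census API conventions;
# P001001 = total population (2000 SF1, 100% data).
url = "https://api.census.gov/data/2000/dec/sf1"
params = {
    "get": "P001001,NAME",
    "for": "tract:*",
    "in": "state:21 county:015",  # assumed: Boone County, Kentucky
}
rows = requests.get(url, params=params, timeout=30).json()
header, data = rows[0], rows[1:]
print(header)    # ['P001001', 'NAME', 'state', 'county', 'tract']
print(data[:3])  # first few tract-level population counts
```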
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset of Python projects used for the study of code change patterns and their automation. The dataset lists 120 projects, divided into four domains: Web, Media, Data, and ML+DL.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study investigating the impact of different types of automated hints while learning a formal specification language, both in terms of immediate performance and learning retention and in terms of the students' emotional response. This research artifact provides all the material required to replicate the study (except for the proprietary questionnaires used to assess emotional response and user experience), as well as the collected data and the data analysis scripts used for the discussion in the paper.
Dataset
The artifact contains the resources described below.
Experiment resources
The resources needed for replicating the experiment, namely in directory experiment:
alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was provided in Portuguese due to the population of the experiment.
alloy_sheet_en.pdf: a version of the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment, translated into English.
docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.
api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.
Experiment data
The task database used in our application of the experiment, namely in directory data/experiment:
Model.json, Instance.json, and Link.json: JSON files used to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.
identifiers.txt: the list of all 104 identifiers available to participants in the experiment.
Collected data
Data collected in the application of the experiment, a simple one-factor randomised experiment run in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared in the form of JSON and CSV files with a header row, namely in directory data/results:
data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).
data_socio.csv: data collected from socio-demographic questionnaire in the 1st session of the experiment, namely:
participant identification: participant's unique identifier (ID);
socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclose, and other, respectively), and average academic grade (GRADE, from 0 to 20; NA denotes preference not to disclose).
data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);
detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.
data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID);
user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).
participants.txt: the list of participant identifiers that have registered for the experiment.
Analysis scripts
The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:
analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.
requirements.r: An R script to install the required libraries for the analysis script.
normalize_task.py: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.
normalize_emo.py: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.
Dockerfile: Docker script to automate the analysis script from the collected data.
Setup
To replicate the experiment and the analysis of the results, only Docker is required.
If you wish to manually replicate the experiment and collect your own data, you'll need to install:
A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.
If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:
Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.
R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.
Usage
Experiment replication
This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.
To launch the Alloy4Fun platform populated with tasks for each session, just run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch; wait for the "Started your app" message to show.
cd experiment
docker-compose up
This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.
In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would just access http://localhost:3000/0CAN. The next task becomes available after a correct submission to the current task or when a time-out occurs (5 mins). Each participant was assigned to a different treatment group, so depending on the permalink different kinds of hints are provided. Below are 4 permalinks, one for each hint group:
Group N (no hints): http://localhost:3000/0CAN
Group L (error locations): http://localhost:3000/CA0L
Group E (counter-example): http://localhost:3000/350E
Group D (error description): http://localhost:3000/27AD
In the 2nd session, as in the 1st, each permalink gave access to 12 sequential tasks, and the next task becomes available after a correct submission or a time-out (5 mins). The permalink is constructed by prepending the participant's identifier with P-, so participant 0CAN would access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints, so the permalinks from different groups are undifferentiated.
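Tying the identifier conventions together, a small illustrative sketch (not part of the artifact) that derives a participant's treatment group and the permalinks for both sessions:

```python
BASE = "http://localhost:3000"
GROUPS = {"N": "no hints", "L": "error locations",
          "E": "counter-example", "D": "error description"}

def permalinks(identifier: str):
    """Group is encoded in the identifier's last character;
    the 2nd-session permalink prepends 'P-' (per the study setup)."""
    group = GROUPS[identifier[-1]]
    return group, f"{BASE}/{identifier}", f"{BASE}/P-{identifier}"

print(permalinks("0CAN"))  # ('no hints', '.../0CAN', '.../P-0CAN')
```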
Before the 1st session the participants should answer the socio-demographic questionnaire, which should ask for the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.
Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons licence (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier and for the attachment to each of the 14 depicted emotions, expressed on a 5-point Likert scale.
After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the user's unique identifier and answers to the standard 4 questions on a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric (a score ranging from 0 to 100) from the answers, please see the original paper:
Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
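For reference, a sketch of the standard UMUX scoring described by Finstad (2010), under the usual convention that odd-numbered items are positively worded (scored as response − 1) and even-numbered items negatively worded (scored as 7 − response), with the sum rescaled to 0-100:

```python
def umux_score(responses):
    """responses: list of 4 answers on a 7-point Likert scale (1-7).
    Standard UMUX scoring per Finstad (2010)."""
    assert len(responses) == 4
    items = [(r - 1) if i % 2 == 0 else (7 - r)  # items 1,3 positive; 2,4 negative
             for i, r in enumerate(responses)]
    return sum(items) * 100 / 24                 # rescale to the 0-100 range

print(umux_score([7, 1, 7, 1]))  # 100.0, best possible experience
```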
Analysis of other applications of the experiment
This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.
The analysis script expects data in 4 CSV files,
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Raw data and Python script
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The archived data 'data_figshare.zip' produces figures 3-9 of the paper under review. Data includes .sqlite databases for each run and .json files that describe run starts, run stops, and events. An archive of numpy arrays ('numpy_arrays.zip') that stores captured time sequences is also included, but it is a much larger file and is only needed if modified post-processing is to be applied to the ADA2200 data. See the acquisition and analysis code for more details, including information on how to configure databroker to run the analysis code: https://github.com/lucask07/instrbuilder/tree/master/instrbuilder/bluesky_demo/lockin_analysis. The attached local_file.yml should be placed into ~/.config/databroker/ and the placeholder 'your_directory' must be modified to point to the data_figshare directory. (No legal or ethical requirements.)
There is a suite of powerful open-source Python libraries that can be used to work with spatial data. Learn how to use geopandas, rasterio, and matplotlib to plot and manipulate spatial data in Python.
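A minimal sketch of that workflow; the file names are placeholders for your own data:

```python
import geopandas as gpd
import rasterio
import matplotlib.pyplot as plt

# File names are placeholders; substitute your own vector and raster data.
gdf = gpd.read_file("boundaries.shp")        # vector data into a GeoDataFrame
with rasterio.open("elevation.tif") as src:  # raster data
    band = src.read(1)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
gdf.plot(ax=ax1, edgecolor="black")          # plot vector geometries
ax2.imshow(band, cmap="viridis")             # plot the raster band
plt.show()
```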
Environmental DNA (eDNA) water samples were collected at 15 tree islands containing wading bird breeding colonies (order Pelecaniformes) and 15 empty control islands in the central Everglades of Florida in spring of 2017 (April through June) and analyzed for the presence of eDNA from invasive Burmese pythons (Python bivittatus). The Burmese python is now established as a breeding population throughout south Florida, USA. Pythons can consume large quantities of prey and may be a particular threat to wading bird breeding colonies in the Everglades. To quantify python occupancy rates at tree islands where wading birds breed, we utilized environmental DNA (eDNA) analysis—a genetic tool which detects shed DNA in water samples and provides high detection probabilities compared to traditional survey methods. We fitted multi-scale Bayesian occupancy models to test the prediction that Burmese pythons occupy islands with wading bird colonies in the central Everglades at higher rates compared to representative control islands in the same region containing no breeding birds.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 10/29/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has been updated.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.
Code information:
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the
`source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhouse for figure/table production and
supporting analyses. This script generates the key figures and summary statistics
used in the study that then get saved in the manuscript_figures folder. Note that all
maps were produced using Python code found in the "supporting_code"" folder.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Here, we develop and show the use of an open-source Python library to control commercial potentiostats. It standardizes the commands for different potentiostat models, opening the possibility to perform automated experiments independently of the instrument used. At the time of this writing, we have included potentiostats from CH Instruments (models 1205B, 1242B, 601E, and 760E) and PalmSens (model Emstat Pico), although the open-source nature of the library allows for more to be included in the future. To showcase the general workflow and implementation of a real experiment, we have automated the Randles–Ševčík methodology to determine the diffusion coefficient of a redox-active species in solution using cyclic voltammetry. This was accomplished by writing a Python script that includes data acquisition, data analysis, and simulation. The total run time was 1 min and 40 s, well below the time it would take even an experienced electrochemist to apply the methodology in a traditional manner. Our library has potential applications that expand beyond the automation of simple repetitive tasks; for example, it can interface with peripheral hardware and well-established third-party Python libraries as part of a more complex and intelligent setup that relies on laboratory automation, advanced optimization, and machine learning.
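To sketch the analysis step in code: the Randles–Ševčík equation, ip = 0.4463 n F A C sqrt(n F v D / (R T)), predicts a peak current proportional to the square root of the scan rate, so D follows from the slope of ip versus sqrt(v). The numbers below are illustrative, not taken from the paper:

```python
import numpy as np

# Randles-Sevcik: ip = 0.4463 * n * F * A * C * sqrt(n * F * v * D / (R * T)).
# Fit ip vs sqrt(v) and solve the slope for D. All values are illustrative.
F, R, T = 96485.0, 8.314, 298.15  # C/mol, J/(mol K), K
n, A, C = 1, 7.07e-6, 1.0         # electrons, electrode area (m^2), mol/m^3

v = np.array([0.01, 0.025, 0.05, 0.1, 0.25])              # scan rates (V/s)
ip = np.array([2.1e-6, 3.3e-6, 4.7e-6, 6.6e-6, 1.05e-5])  # peak currents (A)

slope = np.polyfit(np.sqrt(v), ip, 1)[0]
D = (slope / (0.4463 * n * F * A * C)) ** 2 * R * T / (n * F)
print(f"D = {D:.2e} m^2/s")  # ~1.2e-10 m^2/s for these made-up numbers
```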
This resource contains a video recording for a presentation given as part of the National Water Quality Monitoring Council conference in April 2021. The presentation covers the motivation for performing quality control for sensor data, the development of PyHydroQC, a Python package with functions for automating sensor quality control including anomaly detection and correction, and the performance of the algorithms applied to data from multiple sites in the Logan River Observatory.
The initial abstract for the presentation: Water quality sensors deployed to aquatic environments make measurements at high frequency and commonly include artifacts that do not represent the environmental phenomena targeted by the sensor. Sensors are subject to fouling from environmental conditions, often exhibit drift and calibration shifts, and report anomalies and erroneous readings due to issues with datalogging, transmission, and other unknown causes. The suitability of data for analyses and decision making often depends on subjective and time-consuming quality control processes consisting of manual review and adjustment of data. Data-driven and machine learning techniques have the potential to automate identification and correction of anomalous data, streamlining the quality control process. We explored documented approaches and selected several for implementation in a reusable, extensible Python package designed for anomaly detection for aquatic sensor data. Implemented techniques include regression approaches that estimate values in a time series, flag a point as anomalous if the difference between the sensor measurement and the estimate exceeds a threshold, and offer replacement values for correcting anomalies. Additional algorithms that scaffold the central regression approaches include rules-based preprocessing, thresholds for determining anomalies that adjust with data variability, and the ability to detect and correct anomalies using forecasted and backcasted estimation. The techniques were developed and tested based on several years of data from aquatic sensors deployed at multiple sites in the Logan River Observatory in northern Utah, USA. Performance was assessed based on labels and corrections applied previously by trained technicians. In this presentation, we describe the techniques for detection and correction, report their performance, illustrate the workflow for applying them to high-frequency aquatic sensor data, and demonstrate the possibility for additional approaches to help increase automation of aquatic sensor data post-processing.
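A generic illustration of the regression-residual idea described in the abstract (this is not PyHydroQC's actual API): estimate each point from its recent history, flag it when the residual exceeds a variability-scaled threshold, and offer the estimate as a replacement value:

```python
import numpy as np
import pandas as pd

def flag_and_correct(series: pd.Series, window: int = 24, k: float = 4.0):
    """Rolling-median forecast; flag points whose residual exceeds
    k * rolling MAD, and propose the estimate as the correction."""
    est = series.rolling(window, min_periods=1).median().shift(1)
    resid = series - est
    mad = resid.abs().rolling(window, min_periods=1).median()
    anomalous = resid.abs() > k * mad.clip(lower=1e-9)
    corrected = series.where(~anomalous, est)  # replace flagged points
    return anomalous, corrected

# Synthetic sensor trace with one injected spike.
ts = pd.Series(np.sin(np.linspace(0, 20, 500)) + 10.0)
ts.iloc[250] += 8.0
flags, fixed = flag_and_correct(ts)
print(int(flags.sum()), "anomalies flagged; corrected value:", round(fixed.iloc[250], 2))
```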
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
We develop an open-source Python-based Parameter Estimation Tool utilizing Bayesian Optimization (petBOA) with a unique wrapper interface for gradient-free parameter estimation of expensive black-box kinetic models. We provide examples for Python macrokinetic and microkinetic modeling (MKM) tools, such as Cantera and OpenMKM. petBOA leverages surrogate Gaussian processes to approximate and minimize the objective function designed for parameter estimation. Bayesian Optimization (BO) is implemented using the open-source BoTorch toolkit. petBOA employs local and global sensitivity analyses to identify important parameters optimized against experimental data, and leverages pMuTT for consistent kinetic and thermodynamic parameters while perturbing species binding energies within the typical error of conventional DFT exchange-correlation functionals (20-30 kJ/mol). The source code and documentation are hosted on GitHub.
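A toy sketch of the surrogate-based loop petBOA describes, substituting a scikit-learn Gaussian process and a stand-in black-box objective for brevity (petBOA itself uses BoTorch and real kinetic models):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(k):  # stand-in "expensive" black-box loss for a rate constant
    return (np.exp(-k * 2.0) - 0.1) ** 2

rng = np.random.default_rng(1)
X = rng.uniform(0.1, 3.0, size=(5, 1))  # initial design points
y = np.array([objective(k[0]) for k in X])

gp = GaussianProcessRegressor(normalize_y=True)
grid = np.linspace(0.1, 3.0, 200).reshape(-1, 1)
for _ in range(15):  # Bayesian optimization iterations
    gp.fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    best = y.min()
    z = (best - mu) / np.clip(sd, 1e-9, None)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)  # expected improvement
    x_next = grid[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0, 0]))

print("estimated k:", X[np.argmin(y), 0])  # true optimum: k = ln(10)/2 ~ 1.151
```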
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package for A Study on the Pythonic Functional Constructs' Understandability
This package contains several folders and files with code and data used in the study.
examples/
Contains the code snippets used as objects of the study, named as reported in Table 1, summarizing the experiment design.
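The actual snippets are in this folder; purely for orientation, the three functional construct families studied (comprehensions, lambdas, and map/reduce/filter) look like this next to a procedural equivalent (illustrative code, not one of the study objects):

```python
from functools import reduce

numbers = [3, 1, 4, 1, 5, 9]

# Procedural paradigm
squares_proc = []
for n in numbers:
    if n % 2 == 1:
        squares_proc.append(n * n)

# Comprehension
squares_comp = [n * n for n in numbers if n % 2 == 1]

# Lambdas with map/filter, and reduce for aggregation
squares_mrf = list(map(lambda n: n * n, filter(lambda n: n % 2 == 1, numbers)))
total = reduce(lambda acc, n: acc + n, squares_mrf, 0)

assert squares_proc == squares_comp == squares_mrf
```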
RQ1-RQ2-files-for-statistical-analysis/
Contains three .csv files used as input for conducting the statistical analysis and drawing the graphs for addressing the first two research questions of the study. Specifically:
- ConstructUsage.csv contains the declared usage frequency of the three functional constructs under study. This file is used to draw Figure 4.
- RQ1.csv contains the collected data used for the mixed-effect logistic regression relating the use of functional constructs with the correctness of the change task, and the logistic regression relating the use of map/reduce/filter functions with the correctness of the change task.
- RQ1Paired-RQ2.csv contains the collected data used for the ordinal logistic regression of the relationship between the perceived ease of understanding of the functional constructs and (i) participants' usage frequency, and (ii) constructs' complexity (except for map/reduce/filter).
inter-rater-RQ3-files/
Contains four .csv files used as input for computing the inter-rater agreement for the manual labeling used for addressing RQ3. Specifically, you will find one file for each functional construct, i.e., comprehension.csv, lambda.csv, and mrf.csv, and a different file used for highlighting the reasons why participants prefer to use the procedural paradigm, i.e., procedural.csv.
Questionnaire-Example.pdf
This file contains the questionnaire submitted to one of the ten experimental groups within our controlled experiment. Other questionnaires are similar, except for the code snippets used for the first section, i.e., change tasks, and the second section, i.e., comparison tasks.
RQ2ManualValidation.csv
This file contains the results of the manual validation performed to sanitize the answers provided by our participants, used for addressing RQ2. Specifically, we coded the behavior descriptions using four different levels: (i) correct, (ii) somewhat correct, (iii) wrong, and (iv) automatically generated.
RQ3ManualValidation.xlsx
This file contains the results of the open coding applied to address our third research question. Specifically, you will find four sheets, one for each functional construct and one for the procedural paradigm. For each sheet, you will find the provided answers together with the categories assigned to them.
Appendix.pdf
This file contains the results of the logistic regression relating the use of map, filter, and reduce functions with the correctness of the change task, not shown in the paper.
FuncConstructs-Statistics.r
This file contains an R script that you can reuse to re-run all the analyses conducted and discussed in the paper.
FuncConstructs-Statistics.ipynb
This file contains the code to re-execute all the analyses conducted in the paper as a notebook.
https://spdx.org/licenses/CC0-1.0.html
Multiplexed imaging technologies provide insights into complex tissue architectures. However, challenges arise due to software fragmentation with cumbersome data handoffs, inefficiencies in processing large images (8 to 40 gigabytes per image), and limited spatial analysis capabilities. To efficiently analyze multiplexed imaging data, we developed SPACEc, a scalable end-to-end Python solution that handles image extraction, cell segmentation, and data preprocessing and incorporates machine-learning-enabled, multi-scaled spatial analysis, operated through a user-friendly and interactive interface.

The demonstration dataset was derived from a previous analysis and contains TMA cores from a human tonsil and tonsillitis sample that were acquired with the Akoya PhenoCycler-Fusion platform. The dataset can be used to test the workflow and establish it on a user's system, or to familiarize oneself with the pipeline.

Methods

Tissue samples: Tonsil cores were extracted from a larger multi-tumor tissue microarray (TMA), which included a total of 66 unique tissues (51 malignant and semi-malignant tissues, as well as 15 non-malignant tissues). Representative tissue regions were annotated on corresponding hematoxylin and eosin (H&E)-stained sections by a board-certified surgical pathologist (S.Z.). Annotations were used to generate the 66 cores, each 1 mm in diameter. FFPE tissue blocks were retrieved from the tissue archives of the Institute of Pathology, University Medical Center Mainz, Germany, and the Department of Dermatology, University Medical Center Mainz, Germany. The multi-tumor-TMA block was sectioned at 3 µm thickness onto SuperFrost Plus microscopy slides before being processed for CODEX multiplex imaging as previously described.

CODEX multiplexed imaging and processing: To run the CODEX machine, the slide was taken from the storage buffer and placed in PBS for 10 minutes to equilibrate. After drying the PBS with a tissue, a flow cell was sealed onto the tissue slide. The assembled slide and flow cell were then placed in a PhenoCycler Buffer made from 10X PhenoCycler Buffer & Additive for at least 10 minutes before starting the experiment. A 96-well reporter plate was prepared with each reporter corresponding to the correct barcoded antibody for each cycle, with up to 3 reporters per cycle per well. The fluorescence reporters were mixed with 1X PhenoCycler Buffer, Additive, nuclear-staining reagent, and assay reagent according to the manufacturer's instructions. With the reporter plate and assembled slide and flow cell placed into the CODEX machine, the automated multiplexed imaging experiment was initiated. Each imaging cycle included steps for reporter binding, imaging of three fluorescent channels, and reporter stripping to prepare for the next cycle and set of markers. This was repeated until all markers were imaged. After the experiment, a .qptiff image file containing individual antibody channels and the DAPI channel was obtained. Image stitching, drift compensation, deconvolution, and cycle concatenation are performed within the Akoya PhenoCycler software. The raw imaging data output (tiff, 377.442 nm per pixel for 20x CODEX) is first examined with QuPath software (https://qupath.github.io/) for inspection of staining quality. Any markers that produce unexpected patterns or low signal-to-noise ratios should be excluded from the ensuing analysis. The qptiff files must be converted into tiff files for input into SPACEc.
Data preprocessing includes image stitching, drift compensation, deconvolution, and cycle concatenation performed using the Akoya Phenocycler software. The raw imaging data (qptiff, 377.442 nm/pixel for 20x CODEX) files from the Akoya PhenoCycler technology were first examined with QuPath software (https://qupath.github.io/) to inspect staining qualities. Markers with untenable patterns or low signal-to-noise ratios were excluded from further analysis. A custom CODEX analysis pipeline was used to process all acquired CODEX data (scripts available upon request). The qptiff files were converted into tiff files for tissue detection (watershed algorithm) and cell segmentation.
Want to keep the data in your Hosted Feature Service current? Not interested in writing a lot of code? Leverage this Python script from the command line, a Windows Scheduled Task, or from within your own code to automate the replacement of data in an existing Hosted Feature Service. It can also be leveraged by your Notebook environment and automatically managed by the MNCD Tool! See the Sampler Notebook that features the OverwriteFS tool run from Online to update a Feature Service. It leverages MNCD to cache the OverwriteFS script for import to the Notebook. A great way to jump start your Feature Service update workflow!

Requirements
- Python v3.x
- ArcGIS Python API
- Stored Connection Profile, defined by the Python API 'GIS' module. Also accepts 'pro', to specify using the active ArcGIS Pro connection. Will require ArcGIS Pro and Arcpy!
- Pre-existing Hosted Feature Service

Capabilities
- Overwrite a Feature Service, refreshing the Service Item and Data
- Backup and reapply Service, Layer, and Item properties - New at v2.0.0
- Manage Service to Service or Service to Data relationships - New at v2.0.0
- Repair lost Service File Item to Service relationships, re-enabling Service Overwrite - New at v2.0.0
- 'Swap Layer' capability for Views, allowing two Services to support a View, acting in Active and Idle roles during updates - New at v2.0.0
- Data Conversion capability, able to invoke following a download and before Service update - New at v2.0.0
- Includes 'Rss2Json' Conversion routine, able to read an RSS or GeoRSS source and generate GeoJson for Service update - New at v2.0.0
- Renamed 'Rss2Json' to 'Xml2GeoJSON' for its enhanced capabilities; 'Rss2Json' remains for compatibility - Revised at v2.1.0
- Added 'Json2GeoJSON' Conversion routine, able to read and manipulate Json or GeoJSON data for Service updates - New at v2.1.0
- Can update other File item types like PDF, Word, Excel, and so on - New at v2.1.0
- Supports ArcGIS Python API v2.0 - New at v2.1.2

Revisions
- Sep 29, 2021: Long awaited update to v2.0.0!
- Sep 30, 2021: v2.0.1, Patch to correct Outcome Status when download or Conversion resulted in no change. Also updated documentation.
- Oct 7, 2021: v2.0.2, Workflow patch correcting Extent update of Views when overwriting a Service, discovered following a recent ArcGIS Online update. Enhancements to the 'datetimeUtil' support script.
- Nov 30, 2021: v2.1.0, Added new 'Json2GeoJSON' Converter, enhanced 'Xml2GeoJSON' Converter, retired 'Rss2Json' Converter, added new option switches 'IgnoreAge' and 'UpdateTarget' for source age control and QA/QC workflows, revised Optimization logic and CRC comparison on downloads.
- Dec 1, 2021: v2.1.1, Only a patch to Conversion routines: corrected handling of null Z-values in Geometries (discovered immediately following release 2.1.0), improved error trapping while processing rows, and added a deprecation message to the retired 'Rss2Json' conversion routine.
- Feb 22, 2022: v2.1.2, Patch to detect and re-apply case-insensitive field indexes. Update to allow swapping Layers to a Service without an associated file item. Added cache refresh following updates. Patch to support the Python API 2.0 service 'table' property. Patches to the 'Json2GeoJSON' and 'Xml2GeoJSON' converter routines.
- Sep 5, 2024: v2.1.4, Patched service manager refresh failure issue. Added trace report to Convert execution on exception. Set 'ignore-DataItemCheck' property to True when the 'GetTarget' action is initiated. Hardened Async job status check. Updated 'overwriteFeatureService' to support the GeoPackage type and file item types when item.name includes a period; updated the retry loop to try one final overwrite after delete; fixed error stop issue on failed overwrite attempts. Removed restriction on uploading files larger than 2GB. Restores missing 'itemInfo' file on service File items. Corrected false swap success when a view has no layers. Lifted restriction of Overwrite/Swap Layers for OGC. Added 'serviceDescription' to service detail backup. Added 'thumbnail' to item backup/restore logic. Added 'byLayerOrder' parameter to 'swapFeatureViewLayers'. Added 'SwapByOrder' action switch. Patch added to 'overwriteFeatureService' status check. Patch for June 2024 update made to the 'managers.overwrite' API script that blocks uploads > 25MB, API v2.3.0.3. Patched 'overwriteFeatureService' to correctly identify the overwrite file if the service has multiple Service2Data relationships. Includes documentation updates!
For the automated workflows, we created Jupyter notebooks for each state. In these workflows, GIS processing to merge, extract, and project GeoTIFF data was the most important process. For this we used ArcPy, a Python package for geographic data analysis, data conversion, and data management in ArcGIS (Toms, 2015). After creating state-scale LSS datasets in GeoTIFF format, we converted GeoTIFF to NetCDF using the xarray and rioxarray Python packages. Xarray is a Python package for working with multi-dimensional arrays, and rioxarray is the rasterio xarray extension; rasterio is a Python library to read and write GeoTIFF and other raster formats. We used xarray to manipulate data types and add metadata to the NetCDF files, and rioxarray to save GeoTIFF to NetCDF format. Through these procedures, we created three composite HydroShare resources to share the state-scale LSS datasets. Because ArcGIS Pro is commercial GIS software with license limitations, we developed this Jupyter notebook on Windows OS.
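The GeoTIFF-to-NetCDF conversion described above can be sketched as follows with rioxarray and xarray; the file names and metadata are placeholders:

```python
import rioxarray

# Open a GeoTIFF as an xarray DataArray (rioxarray registers the .rio accessor).
da = rioxarray.open_rasterio("state_lss.tif")  # placeholder file name

# Use xarray to adjust the data type and attach metadata before export.
da = da.astype("float32")
da.attrs["description"] = "state-scale LSS dataset"  # placeholder metadata

da.to_netcdf("state_lss.nc")  # write the NetCDF file
```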