This dataset was created by Sadique Khan.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.
A description of this dataset, including the methodology and validation results, is available at:
Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.
ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
However, data gaps are still found in the record. This is particularly notable in earlier periods, when a limited number of satellites were in operation, but gaps can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they can obscure relevant events within a study area and are incompatible with (machine learning) software that often relies on gap-free inputs.
Since the requirement for a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such a gap-filling method is that it relies only on the original observational record, without the need for ancillary variables or model-based information. Due to this intrinsic challenge, no global, long-term, univariate gap-filled product has been available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments that fill satellite-like gaps introduced into GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gap-filling performance.
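As an illustration of the general idea only (not the operational ESA CCI implementation, which works on the full spatiotemporal record), a one-dimensional DCT-PLS gap filler can be sketched in a few lines of Python; the smoothing parameter s, the iteration count, and the initial guess below are arbitrary choices for the example:

import numpy as np
from scipy.fft import dct, idct

def dct_pls_fill(y, s=1.0, n_iter=100):
    """Fill NaN gaps in a 1-D series with DCT-based penalized least squares (after Garcia, 2010)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.isfinite(y).astype(float)                         # weight 0 at gaps, 1 at observations
    y0 = np.where(w > 0, y, np.nanmean(y))                   # crude initial guess inside gaps
    lam = -2.0 + 2.0 * np.cos(np.arange(n) * np.pi / n)      # Laplacian eigenvalues in the DCT basis
    gamma = 1.0 / (1.0 + s * lam ** 2)
    z = y0.copy()
    for _ in range(n_iter):
        z = idct(gamma * dct(w * (y0 - z) + z, norm="ortho"), norm="ortho")
    return np.where(w > 0, y, z)                             # keep observations, fill gaps with the smooth fit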
You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following script will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.
#!/bin/bash
# Set download directory
DOWNLOAD_DIR=~/Downloads
base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"
# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
echo "Downloading $year.zip..."
wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
rm "$DOWNLOAD_DIR/$year.zip"
done
The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), with each subdirectory containing one netCDF image file per day (DD) and month (MM) on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). The file names follow this convention:
ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc
Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:
Additional information for each variable is given in the netCDF attributes.
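As an example, a single daily file can be inspected with the xarray Python library (a minimal sketch; the variable name sm and the coordinate names lat/lon are assumptions, check the netCDF attributes for the actual names):

import xarray as xr

# Open one daily image; the file name follows the convention above
ds = xr.open_dataset("ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-20200601000000-fv09.1r1.nc")
print(ds)  # lists coordinates, data variables and their attributes
sm = ds["sm"]  # assumed name of the surface soil moisture variable
print(sm.sel(lat=48.2, lon=16.4, method="nearest").values)  # grid cell nearest to Vienna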
Changes in v09.1r1 (previous version was v09.1):
These data can be read by any software that supports Climate and Forecast (CF) conform metadata standards for netCDF files, such as:
The following records are all part of the ESA CCI Soil Moisture science data records community:

1. ESA CCI SM MODELFREE Surface Soil Moisture Record: https://doi.org/10.48436/svr1r-27j77
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COCO dataset is a large dataset of labeled images and annotations. It is a popular dataset for machine learning and artificial intelligence research. The dataset consists of 330,000 images and 500,000 object annotations. The annotations include the bounding boxes of objects in the images, as well as the labels of the objects.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets contain time series of the PV-gradient tropopause (PVG tropopause) introduced by A. Kunz (2011, doi:10.1029/2010JD014343) and calculated by K. Turhal (2024, paper "Variability and Trends in the PVG Tropopause", preprint in EGUsphere: https://doi.org/10.5194/egusphere-2024-471).
The PVG tropopause has been computed by means of the Eddy Tracking Toolkit (developed by J. Clemens and K. Turhal, to be published):
Datasets are provided for each year and isentropic level in NetCDF4 format, every file consisting of two groups for the northern and southern hemisphere. Each group contains the following variables, with time as dimension:
In this upload, the PVG tropopause time series are included as *.zip files:
The variables in these netCDF files are grouped by hemisphere. To read in the data, specify the group first ("NorthernHemisphere" or "SouthernHemisphere") and then the variable name (see list above). In Python, this can be done as follows:
import netCDF4 as nc
file = "/path/to/file.nc"  # replace with the actual file path
ds = nc.Dataset(file)
data = ds["NorthernHemisphere"]["variable_name"][:]  # choose a group and a variable name
If you would like to read in all variables in both hemispheres, you can loop e.g. as follows:
import netCDF4 as nc
file = "/path/to/file.nc"  # replace with the actual file path
with nc.Dataset(file) as ds:
    data = {(g, v): ds[g][v][:] for g in ("NorthernHemisphere", "SouthernHemisphere") for v in ds[g].variables}
This project has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – TRR 301 – Project-ID 428312742, TPChange: The Tropopause Region in a Changing Atmosphere (https://tpchange.de/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The inflexible human-autonomy relationship in autonomous driving scenarios has not yet realized deep intelligent synergy; it therefore cannot provide adaptive, context-sensitive decision-making and sometimes leads to violations of human preferences or even hazards. In this paper, we utilize functional near-infrared spectroscopy (fNIRS) signals as real-time human risk-perception feedback to establish a brain-in-the-loop (BiTL) trained artificial intelligence algorithm for decision-making. The proposed algorithm uses the result of driving-risk reasoning, which combines fNIRS-based risk with risk from a driving safety field model, as an input to reinforcement learning. This integrates human brain activity into the reinforcement learning scheme and overcomes the disadvantage of machine-oriented intelligence that can violate human intentions. To achieve policy learning within limited BiTL training periods, we add two modifications to the proposed algorithm, which is based on TD3. An experiment involving twenty participants was conducted, and the results show that in continuously high-risk driving scenarios, compared to traditional reinforcement learning algorithms without human participation, the proposed algorithm maintains a cautious driving policy and avoids potential collisions, as validated with both proximal surrogate indicators and success rates. This repository contains the experimental dataset and Python code to reproduce the experimental results used in our research on 'Brain-in-the-Loop Learning for Intelligent Vehicle Decision-Making'. Data from human subject studies, control groups, and ablation studies are included in this repository. A detailed description of the file organization, data structures, and requirements can be found in the README.md document.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports a study where fractional skyrmion tubes were observed in double-helical nanowires fabricated by 3D nano-printing using focused electron beam-induced deposition.
The dataset includes code, images and processed data for reproducing the figures from the associated paper, and is intended to support researchers interested in reproducing the data of the scientific article, including simulations and experiments. For more information about the code and data, please refer to the readme.txt file.
The published preprint can be found here: https://arxiv.org/abs/2412.14069
1) Micromagnetic Simulations
Contains Mumax3 files and scripts used to generate simulated data for the publication:
2) XMCD
Contains original ptychographic XMCD data from SOLEIL (beamtime 20210958, June 2022), processed from CL and CR reconstructions, aligned and normalized. Data saved as .dat arrays and .png images with field values in filenames. Includes metadata in [figure_name]_data_list.csv.
3) SEM
Contains an original SEM image of the fabricated double-helix structures used in Fig. 1.
4) TEM
Contains original TEM images of the FEBID Co nanostructure, used in Supplementary Fig. 7.
The code can be executed using Python, MATLAB, Paraview and Mumax3, depending on the file.
The images can be opened with any standard image software.
The data is licensed under CC-BY; the code is licensed under MIT.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Title: Python Code Metrics and Readability Dataset
Description: This dataset provides a comprehensive collection of metrics and readability scores for Python code snippets. Each entry includes information such as the problem title, Python solutions, difficulty level, number of lines, code length, comments, cyclomatic complexity, indents, loop count, line length, identifiers, and readability score. The dataset is designed to facilitate the analysis of coding patterns, complexity, and readability in Python programming.
Columns:
- problem_title: Title of the coding problem
- python_solutions: Python code solutions for the problem
- difficulty: Difficulty level of the coding problem
- num_of_lines: Number of lines in the Python code
- code_length: Length of the Python code
- comments: Number of comments in the code
- cyclomatic_complexity: Cyclomatic complexity of the code
- indents: Number of indents in the code
- loop_count: Count of loops in the code
- line_length: Average line length in the code
- identifiers: Number of identifiers used in the code
- readability: Readability score of the code
Title: C++ Code Metrics and Readability Dataset
Description: This dataset offers a comprehensive set of metrics and readability scores for C++ code snippets. Each record includes details such as the code itself, number of lines, code length, comments, cyclomatic complexity, number of indents, loop count, line length, identifiers, and readability score. The dataset is crafted to support the exploration of coding styles, complexity, and readability in C++ programming.
Columns:
- Answer: C++ code snippet
- num_of_lines: Number of lines in the C++ code
- code_length: Length of the C++ code
- comments: Number of comments in the code
- cyclomatic_complexity: Cyclomatic complexity of the code
- num_of_indents: Number of indents in the code
- loop_count: Count of loops in the code
- line_length: Average line length in the code
- identifiers: Number of identifiers used in the code
- readability: Readability score of the code
These datasets are valuable resources for researchers, educators, and practitioners interested in code analysis, programming styles, and software readability in Python and C++.
All features of the dataset have been generated through coded functions that will be linked in the code file by the author.
To better understand the heat production, electricity generation performance, and economic viability of closed-loop geothermal systems in hot-dry rock, the Closed-Loop Geothermal Working Group (a consortium of several national labs and academic institutions) has tabulated time-dependent numerical solutions and levelized cost results for two popular closed-loop heat exchanger designs (u-tube and co-axial). The heat exchanger designs were evaluated for two working fluids (water and supercritical CO2) while varying seven continuous independent parameters of interest (mass flow rate, vertical depth, horizontal extent, borehole diameter, formation gradient, formation conductivity, and injection temperature). The corresponding numerical solutions (approximately 1.2 million per heat exchanger design) are stored as multi-dimensional HDF5 datasets and can be queried at off-grid points using multi-dimensional linear interpolation. A Python script was developed to query this database, estimate time-dependent electricity generation using an organic Rankine cycle (for water) or a direct turbine expansion cycle (for CO2), and perform a cost assessment.
This document gives an overview of the HDF5 database file and highlights how to read, visualize, and query quantities of interest (e.g., levelized cost of electricity, levelized cost of heat) using the accompanying Python scripts. Details regarding the capital, operation and maintenance, and levelized cost calculations using the techno-economic analysis script are provided. This data submission contains results from the Closed Loop Geothermal Working Group study that are within the public domain, including publications, simulation results, databases, and computer codes.
GeoCLUSTER is a Python-based web application created using Dash, an open-source framework built on top of Flask that streamlines the building of data dashboards. GeoCLUSTER provides users with a collection of interactive methods for streamlining the exploration and visualization of the HDF5 dataset. The GeoCLUSTER app and database are contained in the compressed file geocluster_vx.zip, where the "x" refers to the version number; for example, geocluster_v1.zip is Version 1 of the app. This zip file also contains installation instructions.
To use the GeoCLUSTER app in the cloud, click the link to "GeoCLUSTER on AWS" in the Resources section below. To use the GeoCLUSTER app locally, download geocluster_vx.zip to your computer and uncompress it. When uncompressed, this file comprises two directories and the geocluster_installation.pdf file. The geo-data directory contains the HDF5 database in condensed format, and the GeoCLUSTER directory contains the GeoCLUSTER app as app.py in the subdirectory dash_app. The geocluster_installation.pdf file provides instructions on installing Python and the needed Python modules, and then executing the app.
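As a rough illustration of querying such a multi-dimensional HDF5 database at off-grid points (this is not the GeoCLUSTER or techno-economic code itself; the file name, group layout, and parameter names below are hypothetical):

import h5py
from scipy.interpolate import RegularGridInterpolator

with h5py.File("geothermal_results.h5", "r") as f:                 # hypothetical file name
    axes = (f["grids/mass_flow"][:], f["grids/vertical_depth"][:])  # hypothetical parameter axes
    values = f["utube/water/outlet_temperature"][:]                 # hypothetical result array

# Multi-dimensional linear interpolation at an off-grid query point
interp = RegularGridInterpolator(axes, values, method="linear")
print(interp([[35.0, 3500.0]]))  # e.g. 35 kg/s mass flow at 3500 m vertical depth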
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Depyler CITL Corpus
Python→Rust transpilation pairs for Compiler-in-the-Loop training.
Dataset Description
606 Python CLI examples with corresponding Rust translations (where available), designed for training transpiler ML models.
| Split | Examples | With Rust | Size |
| --- | --- | --- | --- |
| train | 606 | 439 (72.4%) | 957 KB |
Schema
Pretokenized GitHub Code Dataset
Dataset Description
This is a pretokenized version of the Python files of the GitHub Code dataset, which consists of 115M code files from GitHub in 32 programming languages. We tokenized the dataset using a BPE tokenizer trained on code, available in this repo. Having a pretokenized dataset can speed up the training loop by avoiding tokenizing data at each batch call. We also include ratio_char_token, which gives the ratio between the… See the full description on the dataset page: https://huggingface.co/datasets/loubnabnl/tokenized-github-code-python.
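A minimal sketch for loading the data with the Hugging Face datasets library (streaming avoids downloading all files up front; the split name "train" is an assumption):

from datasets import load_dataset

# Stream the pretokenized corpus instead of downloading it entirely
ds = load_dataset("loubnabnl/tokenized-github-code-python", split="train", streaming=True)
print(next(iter(ds)))  # inspect the first record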
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides global daily estimates of Root-Zone Soil Moisture (RZSM) content at 0.25° spatial grid resolution, derived from gap-filled, merged satellite observations from 14 passive satellite sensors operating in the microwave domain of the electromagnetic spectrum. Data are provided from January 1991 to December 2023.
This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/. Operational implementation is supported by the Copernicus Climate Change Service implemented by ECMWF through C3S2 312a/313c.
This dataset is used by Hirschi et al. (2025) to assess recent summer drought trends in Switzerland.
Hirschi, M., Michel, D., Schumacher, D. L., Preimesberger, W., and Seneviratne, S. I.: Recent summer soil moisture drying in Switzerland based on measurements from the SwissSMEX network, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2025-416, in review, 2025.
ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations from various microwave satellite remote sensing sensors (Dorigo et al., 2017, 2024; Gruber et al., 2019). This version of the dataset uses the PASSIVE record as input, which contains only observations from passive (radiometer) measurements (scaling reference AMSR-E). The surface observations are gap-filled using a univariate interpolation algorithm (Preimesberger et al., 2025). The gap-filled passive observations serve as input for an exponential filter based method to assess soil moisture in different layers of the root-zone of soil (0-200 cm) following the approach by Pasik et al. (2023). The final gap-free root-zone soil moisture estimates based on passive surface input data are provided here at 4 separate depth layers (0-10, 10-40, 40-100, 100-200 cm) over the period 1991-2023.
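For reference, the recursive exponential filter that underlies such root-zone estimates (Wagner et al., 1999) can be sketched as follows; the characteristic time length T below is an arbitrary example value, not the value used per layer in this dataset:

import numpy as np

def exp_filter(ssm, times, T=20.0):
    """Recursive exponential filter: surface soil moisture -> soil water index (SWI).

    ssm: array of surface soil moisture values; times: array of np.datetime64 time stamps.
    """
    swi = np.full(len(ssm), np.nan)
    K = 1.0
    swi[0] = ssm[0]
    for i in range(1, len(ssm)):
        dt = (times[i] - times[i - 1]) / np.timedelta64(1, "D")  # time step in days
        K = K / (K + np.exp(-dt / T))
        swi[i] = swi[i - 1] + K * (ssm[i] - swi[i - 1])
    return swi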
You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following command will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.
#!/bin/bash
# Set download directory
DOWNLOAD_DIR=~/Downloads
base_url="https://researchdata.tuwien.ac.at/records/8dda4-xne96/files"
# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
echo "Downloading $year.zip..."
wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
rm "$DOWNLOAD_DIR/$year.zip"
done
The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), with each subdirectory containing one netCDF image file per day (DD) and month (MM) on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). The file names follow this convention:
ESA_CCI_PASSIVERZSM-YYYYMMDD000000-fv09.1.nc
Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:
Additional information for each variable is given in the netCDF attributes.
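For example, a point time series across the daily files of one year can be extracted with xarray (a minimal sketch; the variable name rzsm_1 and the coordinate names lat/lon are assumptions, check the netCDF attributes for the actual names):

import xarray as xr
from pathlib import Path

# Combine the daily files of one year along the time dimension
files = sorted(Path.home().glob("Downloads/2020/ESA_CCI_PASSIVERZSM-*.nc"))
ds = xr.open_mfdataset(files, combine="by_coords")
# Time series of one (assumed) root-zone layer variable at the grid cell nearest a point of interest
ts = ds["rzsm_1"].sel(lat=47.0, lon=8.0, method="nearest")
ts.to_dataframe().to_csv("rzsm_timeseries.csv")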
These data can be read by any software that supports Climate and Forecast (CF) conform metadata standards for netCDF files, such as:
Please see the ESA CCI Soil Moisture science data records community for more records based on ESA CCI SM.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A dataset of 61,486 images of plant leaves and backgrounds, with each image labeled with the disease or pest that is present. The dataset was created by researchers at the University of Wisconsin-Madison and is used for research in machine learning and computer vision tasks such as plant disease detection and pest identification.
AG News train losses
This dataset is part of an experiment using Rubrix, an open-source Python framework for human-in-the-loop NLP data annotation and management.
Attribution-NonCommercial-NoDerivs 2.0 (CC BY-NC-ND 2.0): https://creativecommons.org/licenses/by-nc-nd/2.0/
License information was derived automatically
A dataset of 20,000 handwritten digits from US mail service forms. The dataset was created by researchers at the University of California, Berkeley and is used for research in machine learning and computer vision tasks such as digit recognition.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset Overview
This is the Rock, Punk, Metal, and Core - Livehouse Lighting (RPMC-L2) Dataset.
Purpose: Dataset for studying the relationship between music and lighting in live music performances
Music Genres: Rock, Punk, Metal, and Core
Total Files: 699 files of synchronized music and lighting data
Collection Method: Collected from professional live performance venues
Data Format: HDF5 file format (.h5)
Total Size: ~40 GB
Dataset Data Structure
The music group contains audio-related features, stored as np.ndarray arrays. Each feature has shape (X, L), where L is the sequence length and X is the feature dimension listed below.
| Feature | Shape | Description |
| --- | --- | --- |
| openl3 | (512, L) | OpenL3 deep audio embedding. |
| mel_spectrogram | (128, L) | Mel spectrogram. |
| mel_spectrogram_db | (128, L) | Mel spectrogram in decibels. |
| cqt | (84, L) | Constant-Q transform (CQT). |
| stft | (1025, L) | Short-time Fourier transform (STFT). |
| mfcc | (128, L) | Mel-frequency cepstral coefficients. |
| chroma_stft | (12, L) | Chroma features from STFT. |
| chroma_cqt | (12, L) | Chroma features from CQT. |
| chroma_cens | (12, L) | Chroma Energy Normalized Statistics. |
| spectral_centroids | (1, L) | Spectral centroid. |
| spectral_bandwidth | (1, L) | Spectral bandwidth. |
| spectral_contrast | (7, L) | Spectral contrast. |
| spectral_rolloff | (1, L) | Spectral rolloff frequency. |
| zero_crossing_rate | (1, L) | Zero-crossing rate. |
The light group contains lighting-related data, structured as np.ndarray arrays with specific ranges and shapes.
| Feature | Range | Shape | Description |
| --- | --- | --- | --- |
| threshold | 0 to 240 | (F, 3, 256) | Frame-specific light threshold data. |

Details of threshold (per frame):
- Frame (np.ndarray): length F, where each frame has shape (3, 256):
  - h (Hue): values range from 0 to 179; shape (180,), padded to 256.
  - s (Saturation): values range from 0 to 255; shape (256,).
  - v (Value): values range from 0 to 255; shape (256,).
This structure organizes the datasets into two main categories: music features for audio characteristics and light features for lighting data, enabling efficient data processing and analysis.
Data Usage
Use the cat command to merge the split files into a single .h5 file:
cat RPMC_L2_part_aa RPMC_L2_part_ab RPMC_L2_part_ac RPMC_L2_part_ad > RPMC_L2.h5
Use the following Python code to read the merged .h5 file and iterate through its contents:
import os
import h5py

root_folder = "/path/to/your/folder"  # Replace with your actual folder path

with h5py.File(os.path.join(root_folder, 'RPMC_L2.h5'), 'r') as f:
    for key in f.keys():  # Iterate through each file hash
        print(f"File {key}:")
        for group_name in f[key].keys():  # Iterate through 'music' and 'light' groups
            print(f"  Group: {group_name}")
            for dataset_name in f[key][group_name].keys():  # Iterate through specific datasets
                print(f"    {dataset_name}: {f[key][group_name][dataset_name].shape}")
f.keys(): Retrieves the top-level keys, typically representing file hashes.
f[key].keys(): Accesses the groups within each file (e.g., music and light).
f[key][group_name].keys(): Accesses the specific datasets within each group.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains a Python script implementing a proportional-integral (PI) control loop for stabilising the wavelength of a laser system. The script communicates with a SLICE-DHV high-voltage driver via PyVISA to tune the piezo actuator of the laser, while simultaneously reading the laser wavelength from a HighFinesse wavemeter through the wlmData.dll interface. The control loop compensates deviations from a user-defined setpoint, applying anti-windup protection and enforcing safety limits on the wavelength error and control voltage. All measurements and control signals are continuously logged to a file for later analysis. The software has been developed for laboratory use in high-precision laser spectroscopy and frequency metrology experiments, where long-term wavelength stability of ultra-stable laser systems is essential.
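A minimal sketch of the control structure described above (not the distributed script itself; read_wavelength(), set_piezo_voltage(), and all numerical values below are hypothetical placeholders for the wavemeter and SLICE-DHV interfaces):

import time

def read_wavelength():
    """Placeholder for the HighFinesse wavemeter readout (wlmData.dll in the real script)."""
    return 780.2412  # nm, dummy value

def set_piezo_voltage(voltage):
    """Placeholder for the PyVISA write to the SLICE-DHV piezo driver."""
    print(f"piezo voltage -> {voltage:.3f} V")

KP, KI = 50.0, 5.0            # hypothetical PI gains
SETPOINT_NM = 780.2410        # hypothetical wavelength setpoint in nm
V_MIN, V_MAX = 0.0, 100.0     # hypothetical piezo voltage safety limits
MAX_ERR_NM = 0.01             # hypothetical safety limit on the wavelength error
DT = 0.1                      # loop period in seconds

integral = 0.0
for _ in range(100):                               # finite demo loop; the real script runs continuously
    error = SETPOINT_NM - read_wavelength()
    if abs(error) > MAX_ERR_NM:
        break                                      # safety stop on excessive wavelength error
    voltage = KP * error + KI * integral
    if V_MIN < voltage < V_MAX:
        integral += error * DT                     # anti-windup: integrate only while unsaturated
    voltage = min(max(voltage, V_MIN), V_MAX)      # enforce the voltage safety limits
    set_piezo_voltage(voltage)
    time.sleep(DT)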
This dataset was created using Python code to generate QR codes from the REAL list of URLs provided in the following Kaggle dataset: https://www.kaggle.com/datasets/samahsadiq/benign-and-malicious-urls
The mentioned dataset consists of over 600,000 URLs. However, only the first 100,000 URLs from each class (benign and malicious) are used to generate the QR codes. In total, there are 200,000 QR code images in the dataset that encode REAL URLs.
This dataset is a 'Balanced Dataset' of version-2 QR codes. The 100,000 benign QR codes were generated by a single loop in Python, and the same applies to the malicious QR codes.
The QR code images that belong to malicious URLs are under the 'malicious' folder, with 'malicious' in their file names. The QR codes that belong to benign URLs are under the 'benign' folder, with 'benign' in their file names.
NOTE: Keep in mind that the malicious QR codes encode REAL malicious URLs; it is not recommended to scan them manually or visit the encoded websites.
For more information about the encoded URLs, please refer to the Kaggle dataset mentioned above.
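A minimal sketch of how such a generation loop might look with the qrcode Python package (not the authors' exact code; the example URLs and output file names are assumptions):

import qrcode

# Hypothetical short example URLs; the real dataset uses the first 100,000 URLs of each class
urls = ["http://example.org/a", "http://example.org/b"]

for i, url in enumerate(urls):
    qr = qrcode.QRCode(version=2, error_correction=qrcode.constants.ERROR_CORRECT_L,
                       box_size=10, border=4)      # fixed version-2 QR codes
    qr.add_data(url)
    qr.make(fit=False)                             # keep version 2 instead of auto-growing
    img = qr.make_image(fill_color="black", back_color="white")
    img.save(f"benign_{i}.png")                    # e.g. 'malicious_{i}.png' for the other class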
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A test dataset for MUSCLE (MUltiplexed Single-molecule Characterization at the Library scalE) data analysis. See "\Python codes for MUSCLE data analysis\README.txt" for instructions on running the data analysis codes. Use the files in the "Test MUSCLE dataset" folder as input for the codes. "Test MUSCLE dataset\Output_tile1" contains the code output for the test dataset. The example dataset corresponds to one MiSeq tile in an experiment analyzing dCas9-induced R-loop formation for a library of 256 different target sequences. The latest version of the Python codes for matching single-molecule FRET traces with sequenced clusters is available at https://github.com/deindllab/MUSCLE/.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
These code snippets demonstrate image processing techniques for eye detection and red-eye removal, which are essential components of image restoration. The context provided by these snippets lays the groundwork for understanding how image restoration algorithms can be implemented and applied in real-world scenarios to enhance the quality of digital images.
Screenshot (2024-03-04 214443): https://github.com/brianlangay4/Image-Restoration-Computer-Vision/assets/67788456/714097a0-01ab-43dc-86b0-d6cc68d96b97
The eyesCascade variable in our code refers to a Haar cascade classifier specifically trained for detecting eyes in images.
Haar Cascade Classifiers: Haar cascade classifiers are machine learning-based algorithms used for object detection. They work by using a series of feature templates (Haar features) to detect objects of interest. These features are simple rectangular areas where the pixel values are summed up and compared to a threshold.
Eyes Cascade Classifier: The eyes cascade classifier is trained specifically to detect eyes in images. It's pre-trained using a large dataset of positive samples (images containing eyes) and negative samples (images without eyes). During training, the classifier learns to distinguish between these two types of samples based on the patterns of Haar features present in the images.
How it Works: When applied to an input image, the eyes cascade classifier scans the image at multiple scales and locations, searching for regions that match the learned patterns of eye features. It uses a sliding window approach, where a window of fixed size moves across the image, and at each position, the Haar features are computed and compared to the learned patterns. If a region matches the eye patterns above a certain threshold, it's considered a positive detection, and the bounding box coordinates of the detected eyes are returned.
Usage in the Code:
In the provided code, the eyesCascade variable is loaded with a pre-trained eyes cascade classifier XML file using cv2.CascadeClassifier(). This file contains the learned patterns necessary for eye detection. Later, the detectMultiScale() function of the eyesCascade object is called to perform eye detection on the input image (img). The function returns a list of rectangles representing the bounding boxes of the detected eyes in the image.
Overall, the eyes cascade classifier plays a crucial role in automatically identifying eye regions within images, which is essential for subsequent processing tasks, such as red-eye removal, as demonstrated in the code.
Eye processing
To understand how the code detects and removes red eyes, let's break down the relevant parts:
1. **Eye Detection**:
```python
eyes = eyesCascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=4, minSize=(100, 100))
```
- This line utilizes the Haar cascade classifier (`eyesCascade`) to detect eyes in the input image (`img`).
- The `detectMultiScale` function detects objects (in this case, eyes) of different sizes in the input image. It returns a list of rectangles where it believes it found eyes.
2. **Processing Detected Eyes**:
```python
for (x, y, w, h) in eyes:
```
- This loop iterates over each detected eye, represented by its bounding box `(x, y, w, h)`.
3. **Extracting Eye Region**:
```python
eye = img[y:y+h, x:x+w]
```
- This line extracts the region of interest (ROI) from the original image (`img`) corresponding to the detected eye. It crops the image based on the coordinates of the bounding box.
4. **Red Eye Removal**:
- Once the eye region is extracted, the code performs the following steps to remove the red-eye effect:
- **Extracting Channels**: It separates the eye image into its three color channels: blue (`b`), green (`g`), and red (`r`).
- **Calculating Background**: It calculates the sum of blue and green channels (`bg`), representing the background color without the red-eye effect.
- **Creating Mask**: It creates a binary mask (`mask`) to identify pixels that are significantly more red than the background. This is done by com...
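The description above is cut off; a rough sketch of the mask-and-replace idea it outlines (with an arbitrary red threshold, not necessarily the one used in the repository) could look like this:

```python
import cv2
import numpy as np

def remove_red_eye(eye, red_threshold=150):
    # Split the eye ROI into its blue, green and red channels (as int to avoid overflow)
    b, g, r = cv2.split(eye.astype(np.int32))
    bg = b + g                                       # background: sum of blue and green channels
    # Pixels that are both strongly red and redder than the background
    mask = (r > red_threshold) & (r > bg)
    mean_bg = np.clip(bg // 2, 0, 255).astype(np.uint8)   # neutral replacement value
    fixed = eye.copy()
    for c in range(3):                               # overwrite all three channels inside the mask
        fixed[:, :, c][mask] = mean_bg[mask]
    return fixed
```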
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Critical infrastructures encompass a wide range of process control systems, each with unique security needs. Securing such diverse systems is a challenge because they require custom defenses. To address this gap, this study describes a process-aware anomaly detection framework that can automatically baseline the behavior of the process. Utilizing a sliding-window Granger causality method, the framework detects time-varying dependencies, allowing it to capture stable and transient causal links across different operational states. Additionally, the anomaly detection framework considers the criticality of various components. The study evaluates the framework on a hardware-in-the-loop (HIL) water tank testbed, where it successfully identified four sensor and actuator spoofing scenarios on the water tank system.

List of Variables in PLC Memory (Table I)

| Variable name | Variable address | Variable functionality |
| --- | --- | --- |
| I_PbFill | IX100.0 | Push button to manually fill the tank |
| I_PbDischarge | IX100.1 | Push button to manually discharge the tank |
| I_Level_Meter | IW100 | Display the level of water in the tank |
| I_ModeSelector | %IX100.2 | Switches between auto and manual process |
| Q_Fill_Valve | %QW101 | Pumps water into the tank |
| Q_Discharge_Valve | %QW102 | Discharges water from the tank |
| Q_Display | %QW100 | Shows the numerical current tank water level |
| Q_Fill_Light | %QX100.0 | Lights when the filling process is on |
| Q_Discharge_Light | %QX100.2 | Lights when the discharging process is on |
| I_Flow_Meter | %IW101 | Shows the current diameter of the discharge valve nozzle |
| LowSetpoint | %MW1 | Used to actuate the automatic filling process |
| HighSetpoint | %MW2 | Used to actuate the automatic discharging process |
| TankLevel | %MW0 | Used to calibrate the water level and control the LowSetpoint/HighSetpoint |
| I_PbSet | %MX0.1 | Used to set the Q_Fill_Light |
| I_PbReset | %MX0.2 | Used to set the Q_Discharge_Light |
| Q_Discharge_Valve_M | %MW3 | Used to set the manual discharging process |
| Q_Fill_Valve_M | %MW4 | Used to set the manual filling process |

To investigate variable dependencies, we capture multivariate time series data from the OpenPLC's hardware layer. In a physical system, the hardware layer represents the wired connection between the PLC, sensor, and actuator network. By capturing data from the hardware layer, we can track the state of the sensors, actuators, and the MODBUS memory map. The memory map includes discrete output coils, discrete input contacts, analog input registers, and holding registers. Table I shows the list of variables in the water tank simulation.

During data collection, the water tank is set to auto mode. A network-connected Python program writes random low and high setpoint values at random intervals. The Python program also randomly opens and closes the valve. The normal capture spans over 15 hours and has 893,795 entries of data. Table II provides details on the datasets.

For abnormal data, we simulate four spoofing scenarios involving the level sensor, flow sensor, fill valve, and display interface. The level sensor measures the water level in the tank, the flow sensor measures the outgoing flow, and the fill valve controls the water inflow. The display interface is a digital meter showing the current water level in the tank.

Description of the Datasets (Table II)

| Data type | Duration | Total sample size | Notes |
| --- | --- | --- | --- |
| Dataset 1: Normal operation [monitor_data_randomized_setpoints] | 15 hours, 34 minutes, and 32 seconds | 893,795 | Normal operation; data used for baselining the water tank using frequency-based causal structure analysis |
| Dataset 2: Level sensor spoof [monitor_data_levelmeter] | 1 hour, 6 minutes, and 3 seconds | 64,156 | Data captured during the level sensor spoofing scenario |
| Dataset 3: Flow meter sensor spoof [monitor_data_flowmeter] | 55 minutes and 1 second | 52,439 | Data captured during the flow meter spoofing scenario |
| Dataset 4: Fill valve spoof [monitor_data_fillvalve_march21st] | 1 hour, 21 minutes, and 54 seconds | 79,224 | Data captured during the fill valve spoofing scenario |
| Dataset 5: Display interface anomaly [monitor_data_Display] | 1 hour, 26 minutes, and 51 seconds | 84,680 | Data captured during the display interface spoofing scenario |
| Dataset 6: Normal operation [monitor_data_normal_march21st] | 1 hour, 24 minutes, and 23 seconds | 81,348 | Testing data used to establish the normal threshold |

Feel free to contact Dr. Rishabh Das for additional details (Email: rishabh.das@ohio.edu or das.rishabh92@gmail.com).

If you use this dataset, please cite the following research paper:
R. Das and G. Agendia, "Process-Aware Anomaly Detection in Industrial Control Systems Using Frequency-Based Causal Structure Analysis," 2025 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2025, pp. 0228-0234, doi: 10.1109/AIIoT65859.2025.11105316.
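As a rough illustration of the sliding-window Granger causality idea (not the authors' implementation; the column names, window length, and lag order below are assumptions), the dependency between two captured PLC signals could be scanned like this:

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def sliding_granger_pvalues(df, cause, effect, window=2000, step=500, maxlag=5):
    """Minimum Granger-causality p-value (ssr F-test) for each sliding window of the record."""
    pvals = []
    for start in range(0, len(df) - window + 1, step):
        chunk = df.iloc[start:start + window][[effect, cause]].to_numpy()
        res = grangercausalitytests(chunk, maxlag=maxlag)
        pvals.append(min(res[lag][0]["ssr_ftest"][1] for lag in res))
    return np.array(pvals)

# Example: does the fill valve command Granger-cause the measured tank level?
# df = pd.read_csv("monitor_data_randomized_setpoints.csv")   # assumed CSV export of the capture
# p_per_window = sliding_granger_pvalues(df, cause="Q_Fill_Valve", effect="I_Level_Meter")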