This dataset was created by Sadique Khan.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.
A description of this dataset, including the methodology and validation results, is available at:
Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.
ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
However, data gaps are still found in the record. This is particularly notable in earlier periods, when a limited number of satellites were in operation, but gaps can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they can obscure relevant events within a study area and are incompatible with (machine learning) software that often relies on gap-free inputs.
Since the requirement for a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such a gap-filling method is that it relies only on the original observational record, without the need for ancillary variables or model-based information. Due to this intrinsic challenge, no global, long-term, univariate gap-filled product has been available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments that fill satellite-like gaps introduced into GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gap-filling performance.
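As an illustration of the general idea only (not the operational ESA CCI implementation, which works on the full spatiotemporal record), a one-dimensional DCT-PLS gap filler can be sketched in a few lines of Python; the smoothing parameter s, the iteration count, and the initial guess below are arbitrary choices for the example:

import numpy as np
from scipy.fft import dct, idct

def dct_pls_fill(y, s=1.0, n_iter=100):
    """Fill NaN gaps in a 1-D series with DCT-based penalized least squares (after Garcia, 2010)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.isfinite(y).astype(float)                         # weight 0 at gaps, 1 at observations
    y0 = np.where(w > 0, y, np.nanmean(y))                   # crude initial guess inside gaps
    lam = -2.0 + 2.0 * np.cos(np.arange(n) * np.pi / n)      # Laplacian eigenvalues in the DCT basis
    gamma = 1.0 / (1.0 + s * lam ** 2)
    z = y0.copy()
    for _ in range(n_iter):
        z = idct(gamma * dct(w * (y0 - z) + z, norm="ortho"), norm="ortho")
    return np.where(w > 0, y, z)                             # keep observations, fill gaps with the smooth fit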
You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following script will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.
#!/bin/bash
# Set download directory
DOWNLOAD_DIR=~/Downloads
base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"
# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
echo "Downloading $year.zip..."
wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
rm "$DOWNLOAD_DIR/$year.zip"
done
The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), with each subdirectory containing one netCDF image file per day (DD) and month (MM) on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). The file names follow this convention:
ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc
Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:
Additional information for each variable is given in the netCDF attributes.
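As an example, a single daily file can be inspected with the xarray Python library (a minimal sketch; the variable name sm and the coordinate names lat/lon are assumptions, check the netCDF attributes for the actual names):

import xarray as xr

# Open one daily image; the file name follows the convention above
ds = xr.open_dataset("ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-20200601000000-fv09.1r1.nc")
print(ds)  # lists coordinates, data variables and their attributes
sm = ds["sm"]  # assumed name of the surface soil moisture variable
print(sm.sel(lat=48.2, lon=16.4, method="nearest").values)  # grid cell nearest to Vienna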
Changes in v09.1r1 (previous version was v09.1):
These data can be read by any software that supports Climate and Forecast (CF) conform metadata standards for netCDF files, such as:
The following records are all part of the ESA CCI Soil Moisture science data records community:

1. ESA CCI SM MODELFREE Surface Soil Moisture Record: https://doi.org/10.48436/svr1r-27j77
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COCO dataset is a large dataset of labeled images and annotations. It is a popular dataset for machine learning and artificial intelligence research. The dataset consists of 330,000 images and 500,000 object annotations. The annotations include the bounding boxes of objects in the images, as well as the labels of the objects.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets contain time series of the PV-gradient tropopause (PVG tropopause) introduced by A. Kunz (2011, doi:10.1029/2010JD014343) and calculated by K. Turhal (2024, paper "Variability and Trends in the PVG Tropopause", preprint in EGUsphere: https://doi.org/10.5194/egusphere-2024-471).
The PVG tropopause has been computed by means of the Eddy Tracking Toolkit (developed by J. Clemens and K. Turhal, to be published):
Datasets are provided for each year and isentropic level in NetCDF4 format, every file consisting of two groups for the northern and southern hemisphere. Each group contains the following variables, with time as dimension:
In this upload, the PVG tropopause time series are included as *.zip files:
The variables in these netCDF files are grouped by hemisphere. To read in the data, specify the group first ("NorthernHemisphere" or "SouthernHemisphere") and then the variable name (see list above). In Python, this can be done as follows:
import netCDF4 as nc
file = "/path/to/file.nc"  # replace with the actual file path
ds = nc.Dataset(file)
data = ds["NorthernHemisphere"]["variable_name"][:]  # choose a group and a variable name
If you would like to read in all variables in both hemispheres, you can loop e.g. as follows:
import netCDF4 as nc
file = "/path/to/file.nc"  # replace with the actual file path
with nc.Dataset(file) as ds:
    data = {(g, v): ds[g][v][:] for g in ("NorthernHemisphere", "SouthernHemisphere") for v in ds[g].variables}
This project has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – TRR 301 – Project-ID 428312742, TPChange: The Tropopause Region in a Changing Atmosphere (https://tpchange.de/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The inflexible human-autonomy relationship in autonomous driving scenarios has not yet realized deep intelligent synergy; it therefore cannot provide adaptive, context-sensitive decision-making and sometimes leads to violations of human preferences or even hazards. In this paper, we utilize functional near-infrared spectroscopy (fNIRS) signals as real-time human risk-perception feedback to establish a brain-in-the-loop (BiTL) trained artificial intelligence algorithm for decision-making. The proposed algorithm uses the result of driving-risk reasoning, which combines fNIRS-based risk with risk from a driving safety field model, as an input to reinforcement learning. This integrates human brain activity into the reinforcement learning scheme and overcomes the disadvantage of machine-oriented intelligence that can violate human intentions. To achieve policy learning within limited BiTL training periods, we add two modifications to the proposed algorithm, which is based on TD3. An experiment involving twenty participants was conducted, and the results show that in continuously high-risk driving scenarios, compared to traditional reinforcement learning algorithms without human participation, the proposed algorithm maintains a cautious driving policy and avoids potential collisions, as validated with both proximal surrogate indicators and success rates. This repository contains the experimental dataset and Python code to reproduce the experimental results used in our research on 'Brain-in-the-Loop Learning for Intelligent Vehicle Decision-Making'. Data from human subject studies, control groups, and ablation studies are included in this repository. A detailed description of the file organization, data structures, and requirements can be found in the README.md document.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset supports a study where fractional skyrmion tubes were observed in double-helical nanowires fabricated by 3D nano-printing using focused electron beam-induced deposition.
The dataset includes code, images and processed data for reproducing the figures from the associated paper, and is intended to support researchers interested in reproducing the data of the scientific article, including simulations and experiments. For more information about the code and data, please refer to the readme.txt file.
The published preprint can be found here: https://arxiv.org/abs/2412.14069
1) Micromagnetic Simulations
Contains Mumax3 files and scripts used to generate simulated data for the publication:
2) XMCD
Contains original ptychographic XMCD data from SOLEIL (beamtime 20210958, June 2022), processed from CL and CR reconstructions, aligned and normalized. Data saved as .dat arrays and .png images with field values in filenames. Includes metadata in [figure_name]_data_list.csv.
3) SEM
Contains an original SEM image of the fabricated double-helix structures used in Fig. 1.
4) TEM
Contains original TEM images of the FEBID Co nanostructure, used in Supplementary Fig. 7.
The code can be executed using Python, MATLAB, Paraview and Mumax3, depending on the file.
The images can be opened with any standard image software.
The data is licensed under CC-BY; the code is licensed under MIT.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Title: Python Code Metrics and Readability Dataset
Description: This dataset provides a comprehensive collection of metrics and readability scores for Python code snippets. Each entry includes information such as the problem title, Python solutions, difficulty level, number of lines, code length, comments, cyclomatic complexity, indents, loop count, line length, identifiers, and readability score. The dataset is designed to facilitate the analysis of coding patterns, complexity, and readability in Python programming.
Columns:
- problem_title: Title of the coding problem
- python_solutions: Python code solutions for the problem
- difficulty: Difficulty level of the coding problem
- num_of_lines: Number of lines in the Python code
- code_length: Length of the Python code
- comments: Number of comments in the code
- cyclomatic_complexity: Cyclomatic complexity of the code
- indents: Number of indents in the code
- loop_count: Count of loops in the code
- line_length: Average line length in the code
- identifiers: Number of identifiers used in the code
- readability: Readability score of the code
Title: C++ Code Metrics and Readability Dataset
Description: This dataset offers a comprehensive set of metrics and readability scores for C++ code snippets. Each record includes details such as the code itself, number of lines, code length, comments, cyclomatic complexity, number of indents, loop count, line length, identifiers, and readability score. The dataset is crafted to support the exploration of coding styles, complexity, and readability in C++ programming.
Columns:
- Answer: C++ code snippet
- num_of_lines: Number of lines in the C++ code
- code_length: Length of the C++ code
- comments: Number of comments in the code
- cyclomatic_complexity: Cyclomatic complexity of the code
- num_of_indents: Number of indents in the code
- loop_count: Count of loops in the code
- line_length: Average line length in the code
- identifiers: Number of identifiers used in the code
- readability: Readability score of the code
These datasets are valuable resources for researchers, educators, and practitioners interested in code analysis, programming styles, and software readability in Python and C++.
All features of the dataset have been generated through coded functions that will be linked in the code file by the author.
To better understand the heat production, electricity generation performance, and economic viability of closed-loop geothermal systems in hot-dry rock, the Closed-Loop Geothermal Working Group (a consortium of several national labs and academic institutions) has tabulated time-dependent numerical solutions and levelized cost results for two popular closed-loop heat exchanger designs (u-tube and co-axial). The heat exchanger designs were evaluated for two working fluids (water and supercritical CO2) while varying seven continuous independent parameters of interest (mass flow rate, vertical depth, horizontal extent, borehole diameter, formation gradient, formation conductivity, and injection temperature). The corresponding numerical solutions (approximately 1.2 million per heat exchanger design) are stored as multi-dimensional HDF5 datasets and can be queried at off-grid points using multi-dimensional linear interpolation. A Python script was developed to query this database, estimate time-dependent electricity generation using an organic Rankine cycle (for water) or a direct turbine expansion cycle (for CO2), and perform a cost assessment.
This document gives an overview of the HDF5 database file and highlights how to read, visualize, and query quantities of interest (e.g., levelized cost of electricity, levelized cost of heat) using the accompanying Python scripts. Details regarding the capital, operation and maintenance, and levelized cost calculations using the techno-economic analysis script are provided. This data submission contains results from the Closed Loop Geothermal Working Group study that are within the public domain, including publications, simulation results, databases, and computer codes.
GeoCLUSTER is a Python-based web application created using Dash, an open-source framework built on top of Flask that streamlines the building of data dashboards. GeoCLUSTER provides users with a collection of interactive methods for streamlining the exploration and visualization of the HDF5 dataset. The GeoCLUSTER app and database are contained in the compressed file geocluster_vx.zip, where the "x" refers to the version number; for example, geocluster_v1.zip is Version 1 of the app. This zip file also contains installation instructions.
To use the GeoCLUSTER app in the cloud, click the link to "GeoCLUSTER on AWS" in the Resources section below. To use the GeoCLUSTER app locally, download geocluster_vx.zip to your computer and uncompress it. When uncompressed, this file comprises two directories and the geocluster_installation.pdf file. The geo-data directory contains the HDF5 database in condensed format, and the GeoCLUSTER directory contains the GeoCLUSTER app as app.py in the subdirectory dash_app. The geocluster_installation.pdf file provides instructions on installing Python and the needed Python modules, and then executing the app.
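As a rough illustration of querying such a multi-dimensional HDF5 database at off-grid points (this is not the GeoCLUSTER or techno-economic code itself; the file name, group layout, and parameter names below are hypothetical):

import h5py
from scipy.interpolate import RegularGridInterpolator

with h5py.File("geothermal_results.h5", "r") as f:                 # hypothetical file name
    axes = (f["grids/mass_flow"][:], f["grids/vertical_depth"][:])  # hypothetical parameter axes
    values = f["utube/water/outlet_temperature"][:]                 # hypothetical result array

# Multi-dimensional linear interpolation at an off-grid query point
interp = RegularGridInterpolator(axes, values, method="linear")
print(interp([[35.0, 3500.0]]))  # e.g. 35 kg/s mass flow at 3500 m vertical depth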
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Depyler CITL Corpus
Python→Rust transpilation pairs for Compiler-in-the-Loop training.
Dataset Description
606 Python CLI examples with corresponding Rust translations (where available), designed for training transpiler ML models.
| Split | Examples | With Rust | Size |
| --- | --- | --- | --- |
| train | 606 | 439 (72.4%) | 957 KB |
Schema
Pretokenized GitHub Code Dataset
Dataset Description
This is a pretokenized version of the Python files of the GitHub Code dataset, which consists of 115M code files from GitHub in 32 programming languages. We tokenized the dataset using a BPE tokenizer trained on code, available in this repo. Having a pretokenized dataset can speed up the training loop by avoiding tokenizing data at each batch call. We also include ratio_char_token, which gives the ratio between the… See the full description on the dataset page: https://huggingface.co/datasets/loubnabnl/tokenized-github-code-python.
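A minimal sketch for loading the data with the Hugging Face datasets library (streaming avoids downloading all files up front; the split name "train" is an assumption):

from datasets import load_dataset

# Stream the pretokenized corpus instead of downloading it entirely
ds = load_dataset("loubnabnl/tokenized-github-code-python", split="train", streaming=True)
print(next(iter(ds)))  # inspect the first record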
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides global daily estimates of Root-Zone Soil Moisture (RZSM) content at 0.25° spatial grid resolution, derived from gap-filled, merged satellite observations from 14 passive satellite sensors operating in the microwave domain of the electromagnetic spectrum. Data are provided from January 1991 to December 2023.
This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/. Operational implementation is supported by the Copernicus Climate Change Service implemented by ECMWF through C3S2 312a/313c.
This dataset is used by Hirschi et al. (2025) to assess recent summer drought trends in Switzerland.
Hirschi, M., Michel, D., Schumacher, D. L., Preimesberger, W., and Seneviratne, S. I.: Recent summer soil moisture drying in Switzerland based on measurements from the SwissSMEX network, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2025-416, in review, 2025.
ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations from various microwave satellite remote sensing sensors (Dorigo et al., 2017, 2024; Gruber et al., 2019). This version of the dataset uses the PASSIVE record as input, which contains only observations from passive (radiometer) measurements (scaling reference AMSR-E). The surface observations are gap-filled using a univariate interpolation algorithm (Preimesberger et al., 2025). The gap-filled passive observations serve as input for an exponential filter based method to assess soil moisture in different layers of the root-zone of soil (0-200 cm) following the approach by Pasik et al. (2023). The final gap-free root-zone soil moisture estimates based on passive surface input data are provided here at 4 separate depth layers (0-10, 10-40, 40-100, 100-200 cm) over the period 1991-2023.
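For reference, the recursive exponential filter that underlies such root-zone estimates (Wagner et al., 1999) can be sketched as follows; the characteristic time length T below is an arbitrary example value, not the value used per layer in this dataset:

import numpy as np

def exp_filter(ssm, times, T=20.0):
    """Recursive exponential filter: surface soil moisture -> soil water index (SWI).

    ssm: array of surface soil moisture values; times: array of np.datetime64 time stamps.
    """
    swi = np.full(len(ssm), np.nan)
    K = 1.0
    swi[0] = ssm[0]
    for i in range(1, len(ssm)):
        dt = (times[i] - times[i - 1]) / np.timedelta64(1, "D")  # time step in days
        K = K / (K + np.exp(-dt / T))
        swi[i] = swi[i - 1] + K * (ssm[i] - swi[i - 1])
    return swi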
You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following command will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.
#!/bin/bash
# Set download directory
DOWNLOAD_DIR=~/Downloads
base_url="https://researchdata.tuwien.ac.at/records/8dda4-xne96/files"
# Loop through years 1991 to 2023 and download & extract data
for year in {1991..2023}; do
echo "Downloading $year.zip..."
wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"
rm "$DOWNLOAD_DIR/$year.zip"
done
The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), with each subdirectory containing one netCDF image file per day (DD) and month (MM) on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). The file names follow this convention:
ESA_CCI_PASSIVERZSM-YYYYMMDD000000-fv09.1.nc
Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:
Additional information for each variable is given in the netCDF attributes.
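For example, a point time series across the daily files of one year can be extracted with xarray (a minimal sketch; the variable name rzsm_1 and the coordinate names lat/lon are assumptions, check the netCDF attributes for the actual names):

import xarray as xr
from pathlib import Path

# Combine the daily files of one year along the time dimension
files = sorted(Path.home().glob("Downloads/2020/ESA_CCI_PASSIVERZSM-*.nc"))
ds = xr.open_mfdataset(files, combine="by_coords")
# Time series of one (assumed) root-zone layer variable at the grid cell nearest a point of interest
ts = ds["rzsm_1"].sel(lat=47.0, lon=8.0, method="nearest")
ts.to_dataframe().to_csv("rzsm_timeseries.csv")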
These data can be read by any software that supports Climate and Forecast (CF) conform metadata standards for netCDF files, such as:
Please see the ESA CCI Soil Moisture science data records community for more records based on ESA CCI SM.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
A dataset of 61,486 images of plant leaves and backgrounds, with each image labeled with the disease or pest that is present. The dataset was created by researchers at the University of Wisconsin-Madison and is used for research in machine learning and computer vision tasks such as plant disease detection and pest identification.
AG News train losses
This dataset is part of an experiment using Rubrix, an open-source Python framework for human-in-the-loop NLP data annotation and management.
Attribution-NonCommercial-NoDerivs 2.0 (CC BY-NC-ND 2.0): https://creativecommons.org/licenses/by-nc-nd/2.0/
License information was derived automatically
A dataset of 20,000 handwritten digits from US mail service forms. The dataset was created by researchers at the University of California, Berkeley and is used for research in machine learning and computer vision tasks such as digit recognition.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Dataset Overview
This is the Rock, Punk, Metal, and Core - Livehouse Lighting (RPMC-L2) Dataset.
Purpose: Dataset for studying the relationship between music and lighting in live music performances
Music Genres: Rock, Punk, Metal, and Core
Total Files: 699 files of synchronized music and lighting data
Collection Method: Collected from professional live performance venues
Data Format: HDF5 file format (.h5)
Total Size: ~40 GB
Dataset Data Structure
The music group contains audio-related features, stored as np.ndarray arrays. Each feature has shape (X, L), where L is the sequence length and X is the feature dimension listed below.
| Feature | Shape | Description |
| --- | --- | --- |
| openl3 | (512, L) | OpenL3 deep audio embedding. |
| mel_spectrogram | (128, L) | Mel spectrogram. |
| mel_spectrogram_db | (128, L) | Mel spectrogram in decibels. |
| cqt | (84, L) | Constant-Q transform (CQT). |
| stft | (1025, L) | Short-time Fourier transform (STFT). |
| mfcc | (128, L) | Mel-frequency cepstral coefficients. |
| chroma_stft | (12, L) | Chroma features from STFT. |
| chroma_cqt | (12, L) | Chroma features from CQT. |
| chroma_cens | (12, L) | Chroma Energy Normalized Statistics. |
| spectral_centroids | (1, L) | Spectral centroid. |
| spectral_bandwidth | (1, L) | Spectral bandwidth. |
| spectral_contrast | (7, L) | Spectral contrast. |
| spectral_rolloff | (1, L) | Spectral rolloff frequency. |
| zero_crossing_rate | (1, L) | Zero-crossing rate. |
The light group contains lighting-related data, structured as np.ndarray arrays with specific ranges and shapes.
| Feature | Range | Shape | Description |
| --- | --- | --- | --- |
| threshold | 0 to 240 | (F, 3, 256) | Frame-specific light threshold data. |

Details of threshold (per frame):
- Frame (np.ndarray): length F, where each frame has shape (3, 256):
  - h (Hue): values range from 0 to 179; shape (180,), padded to 256.
  - s (Saturation): values range from 0 to 255; shape (256,).
  - v (Value): values range from 0 to 255; shape (256,).
This structure organizes the datasets into two main categories: music features for audio characteristics and light features for lighting data, enabling efficient data processing and analysis.
Data Usage
Use the cat command to merge the split files into a single .h5 file:
cat RPMC_L2_part_aa RPMC_L2_part_ab RPMC_L2_part_ac RPMC_L2_part_ad > RPMC_L2.h5
Use the following Python code to read the merged .h5 file and iterate through its contents:
import os
import h5py

root_folder = "/path/to/your/folder"  # Replace with your actual folder path

with h5py.File(os.path.join(root_folder, 'RPMC_L2.h5'), 'r') as f:
    for key in f.keys():  # Iterate through each file hash
        print(f"File {key}:")
        for group_name in f[key].keys():  # Iterate through 'music' and 'light' groups
            print(f"  Group: {group_name}")
            for dataset_name in f[key][group_name].keys():  # Iterate through specific datasets
                print(f"    {dataset_name}: {f[key][group_name][dataset_name].shape}")
f.keys(): Retrieves the top-level keys, typically representing file hashes.
f[key].keys(): Accesses the groups within each file (e.g., music and light).
f[key][group_name].keys(): Accesses the specific datasets within each group.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains a Python script implementing a proportional-integral (PI) control loop for stabilising the wavelength of a laser system. The script communicates with a SLICE-DHV high-voltage driver via PyVISA to tune the piezo actuator of the laser, while simultaneously reading the laser wavelength from a HighFinesse wavemeter through the wlmData.dll interface. The control loop compensates deviations from a user-defined setpoint, applying anti-windup protection and enforcing safety limits on the wavelength error and control voltage. All measurements and control signals are continuously logged to a file for later analysis. The software has been developed for laboratory use in high-precision laser spectroscopy and frequency metrology experiments, where long-term wavelength stability of ultra-stable laser systems is essential.
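A minimal sketch of the control structure described above (not the distributed script itself; read_wavelength(), set_piezo_voltage(), and all numerical values below are hypothetical placeholders for the wavemeter and SLICE-DHV interfaces):

import time

def read_wavelength():
    """Placeholder for the HighFinesse wavemeter readout (wlmData.dll in the real script)."""
    return 780.2412  # nm, dummy value

def set_piezo_voltage(voltage):
    """Placeholder for the PyVISA write to the SLICE-DHV piezo driver."""
    print(f"piezo voltage -> {voltage:.3f} V")

KP, KI = 50.0, 5.0            # hypothetical PI gains
SETPOINT_NM = 780.2410        # hypothetical wavelength setpoint in nm
V_MIN, V_MAX = 0.0, 100.0     # hypothetical piezo voltage safety limits
MAX_ERR_NM = 0.01             # hypothetical safety limit on the wavelength error
DT = 0.1                      # loop period in seconds

integral = 0.0
for _ in range(100):                               # finite demo loop; the real script runs continuously
    error = SETPOINT_NM - read_wavelength()
    if abs(error) > MAX_ERR_NM:
        break                                      # safety stop on excessive wavelength error
    voltage = KP * error + KI * integral
    if V_MIN < voltage < V_MAX:
        integral += error * DT                     # anti-windup: integrate only while unsaturated
    voltage = min(max(voltage, V_MIN), V_MAX)      # enforce the voltage safety limits
    set_piezo_voltage(voltage)
    time.sleep(DT)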
This dataset was created using Python code to generate QR codes from the REAL list of URLs provided in the following Kaggle dataset: https://www.kaggle.com/datasets/samahsadiq/benign-and-malicious-urls
The mentioned dataset consists of over 600,000 URLs. However, only the first 100,000 URLs from each class (benign and malicious) are used to generate the QR codes. In total, there are 200,000 QR code images in the dataset that encode REAL URLs.
This dataset is a 'Balanced Dataset' of version-2 QR codes. The 100,000 benign QR codes were generated by a single loop in Python, and the same applies to the malicious QR codes.
The QR code images that belong to malicious URLs are under the 'malicious' folder, with 'malicious' in their file names. The QR codes that belong to benign URLs are under the 'benign' folder, with 'benign' in their file names.
NOTE: Keep in mind that the malicious QR codes encode REAL malicious URLs; it is not recommended to scan them manually or visit the encoded websites.
For more information about the encoded URLs, please refer to the Kaggle dataset mentioned above.
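A minimal sketch of how such a generation loop might look with the qrcode Python package (not the authors' exact code; the example URLs and output file names are assumptions):

import qrcode

# Hypothetical short example URLs; the real dataset uses the first 100,000 URLs of each class
urls = ["http://example.org/a", "http://example.org/b"]

for i, url in enumerate(urls):
    qr = qrcode.QRCode(version=2, error_correction=qrcode.constants.ERROR_CORRECT_L,
                       box_size=10, border=4)      # fixed version-2 QR codes
    qr.add_data(url)
    qr.make(fit=False)                             # keep version 2 instead of auto-growing
    img = qr.make_image(fill_color="black", back_color="white")
    img.save(f"benign_{i}.png")                    # e.g. 'malicious_{i}.png' for the other class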
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A test dataset for MUSCLE (MUltiplexed Single-molecule Characterization at the Library scalE) data analysis. See "\Python codes for MUSCLE data analysis\README.txt" for instructions on running the data analysis codes. Use the files in the "Test MUSCLE dataset" folder as input for the codes. "Test MUSCLE dataset\Output_tile1" contains the code output for the test dataset. The example dataset corresponds to one MiSeq tile in an experiment analyzing dCas9-induced R-loop formation for a library of 256 different target sequences. The latest version of the Python codes for matching single-molecule FRET traces with sequenced clusters is available at https://github.com/deindllab/MUSCLE/.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
These code snippets demonstrate image processing techniques for eye detection and red-eye removal, which are essential components of image restoration. The context provided by these snippets lays the groundwork for understanding how image restoration algorithms can be implemented and applied in real-world scenarios to enhance the quality of digital images.
Screenshot (2024-03-04 214443): https://github.com/brianlangay4/Image-Restoration-Computer-Vision/assets/67788456/714097a0-01ab-43dc-86b0-d6cc68d96b97
The eyesCascade variable in our code refers to a Haar cascade classifier specifically trained for detecting eyes in images.
Haar Cascade Classifiers: Haar cascade classifiers are machine learning-based algorithms used for object detection. They work by using a series of feature templates (Haar features) to detect objects of interest. These features are simple rectangular areas where the pixel values are summed up and compared to a threshold.
Eyes Cascade Classifier: The eyes cascade classifier is trained specifically to detect eyes in images. It's pre-trained using a large dataset of positive samples (images containing eyes) and negative samples (images without eyes). During training, the classifier learns to distinguish between these two types of samples based on the patterns of Haar features present in the images.
How it Works: When applied to an input image, the eyes cascade classifier scans the image at multiple scales and locations, searching for regions that match the learned patterns of eye features. It uses a sliding window approach, where a window of fixed size moves across the image, and at each position, the Haar features are computed and compared to the learned patterns. If a region matches the eye patterns above a certain threshold, it's considered a positive detection, and the bounding box coordinates of the detected eyes are returned.
Usage in the Code:
In the provided code, the eyesCascade variable is loaded with a pre-trained eyes cascade classifier XML file using cv2.CascadeClassifier(). This file contains the learned patterns necessary for eye detection. Later, the detectMultiScale() function of the eyesCascade object is called to perform eye detection on the input image (img). The function returns a list of rectangles representing the bounding boxes of the detected eyes in the image.
Overall, the eyes cascade classifier plays a crucial role in automatically identifying eye regions within images, which is essential for subsequent processing tasks, such as red-eye removal, as demonstrated in the code.
Eye processing
To understand how the code detects and removes red eyes, let's break down the relevant parts:
1. **Eye Detection**:
```python
eyes = eyesCascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=4, minSize=(100, 100))
```
- This line utilizes the Haar cascade classifier (`eyesCascade`) to detect eyes in the input image (`img`).
- The `detectMultiScale` function detects objects (in this case, eyes) of different sizes in the input image. It returns a list of rectangles where it believes it found eyes.
2. **Processing Detected Eyes**:
```python
for (x, y, w, h) in eyes:
```
- This loop iterates over each detected eye, represented by its bounding box `(x, y, w, h)`.
3. **Extracting Eye Region**:
```python
eye = img[y:y+h, x:x+w]
```
- This line extracts the region of interest (ROI) from the original image (`img`) corresponding to the detected eye. It crops the image based on the coordinates of the bounding box.
4. **Red Eye Removal**:
- Once the eye region is extracted, the code performs the following steps to remove the red-eye effect:
- **Extracting Channels**: It separates the eye image into its three color channels: blue (`b`), green (`g`), and red (`r`).
- **Calculating Background**: It calculates the sum of blue and green channels (`bg`), representing the background color without the red-eye effect.
- **Creating Mask**: It creates a binary mask (`mask`) to identify pixels that are significantly more red than the background. This is done by com...
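The description above is cut off; a rough sketch of the mask-and-replace idea it outlines (with an arbitrary red threshold, not necessarily the one used in the repository) could look like this:

```python
import cv2
import numpy as np

def remove_red_eye(eye, red_threshold=150):
    # Split the eye ROI into its blue, green and red channels (as int to avoid overflow)
    b, g, r = cv2.split(eye.astype(np.int32))
    bg = b + g                                       # background: sum of blue and green channels
    # Pixels that are both strongly red and redder than the background
    mask = (r > red_threshold) & (r > bg)
    mean_bg = np.clip(bg // 2, 0, 255).astype(np.uint8)   # neutral replacement value
    fixed = eye.copy()
    for c in range(3):                               # overwrite all three channels inside the mask
        fixed[:, :, c][mask] = mean_bg[mask]
    return fixed
```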
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Critical infrastructures encompass a wide range of process control systems, each with unique security needs. Securing such diverse systems is a challenge because they require custom defenses. To address this gap, this study describes a process-aware anomaly detection framework that can automatically baseline the behavior of the process. Utilizing a sliding-window Granger causality method, the framework detects time-varying dependencies, allowing it to capture stable and transient causal links across different operational states. Additionally, the anomaly detection framework considers the criticality of various components. The study evaluates the framework on a hardware-in-the-loop (HIL) water tank testbed, where it successfully identified four sensor and actuator spoofing scenarios on the water tank system.

List of Variables in PLC Memory (Table I)

| Variable name | Variable address | Variable functionality |
| --- | --- | --- |
| I_PbFill | IX100.0 | Push button to manually fill the tank |
| I_PbDischarge | IX100.1 | Push button to manually discharge the tank |
| I_Level_Meter | IW100 | Display the level of water in the tank |
| I_ModeSelector | %IX100.2 | Switches between auto and manual process |
| Q_Fill_Valve | %QW101 | Pumps water into the tank |
| Q_Discharge_Valve | %QW102 | Discharges water from the tank |
| Q_Display | %QW100 | Shows the numerical current tank water level |
| Q_Fill_Light | %QX100.0 | Lights when the filling process is on |
| Q_Discharge_Light | %QX100.2 | Lights when the discharging process is on |
| I_Flow_Meter | %IW101 | Shows the current diameter of the discharge valve nozzle |
| LowSetpoint | %MW1 | Used to actuate the automatic filling process |
| HighSetpoint | %MW2 | Used to actuate the automatic discharging process |
| TankLevel | %MW0 | Used to calibrate the water level and control the LowSetpoint/HighSetpoint |
| I_PbSet | %MX0.1 | Used to set the Q_Fill_Light |
| I_PbReset | %MX0.2 | Used to set the Q_Discharge_Light |
| Q_Discharge_Valve_M | %MW3 | Used to set the manual discharging process |
| Q_Fill_Valve_M | %MW4 | Used to set the manual filling process |

To investigate variable dependencies, we capture multivariate time series data from the OpenPLC's hardware layer. In a physical system, the hardware layer represents the wired connection between the PLC, sensor, and actuator network. By capturing data from the hardware layer, we can track the state of the sensors, actuators, and the MODBUS memory map. The memory map includes discrete output coils, discrete input contacts, analog input registers, and holding registers. Table I shows the list of variables in the water tank simulation.

During data collection, the water tank is set to auto mode. A network-connected Python program writes random low and high setpoint values at random intervals. The Python program also randomly opens and closes the valve. The normal capture spans over 15 hours and has 893,795 entries of data. Table II provides details on the datasets.

For abnormal data, we simulate four spoofing scenarios involving the level sensor, flow sensor, fill valve, and display interface. The level sensor measures the water level in the tank, the flow sensor measures the outgoing flow, and the fill valve controls the water inflow. The display interface is a digital meter showing the current water level in the tank.

Description of the Datasets (Table II)

| Data type | Duration | Total sample size | Notes |
| --- | --- | --- | --- |
| Dataset 1: Normal operation [monitor_data_randomized_setpoints] | 15 hours, 34 minutes, and 32 seconds | 893,795 | Normal operation; data used for baselining the water tank using frequency-based causal structure analysis |
| Dataset 2: Level sensor spoof [monitor_data_levelmeter] | 1 hour, 6 minutes, and 3 seconds | 64,156 | Data captured during the level sensor spoofing scenario |
| Dataset 3: Flow meter sensor spoof [monitor_data_flowmeter] | 55 minutes and 1 second | 52,439 | Data captured during the flow meter spoofing scenario |
| Dataset 4: Fill valve spoof [monitor_data_fillvalve_march21st] | 1 hour, 21 minutes, and 54 seconds | 79,224 | Data captured during the fill valve spoofing scenario |
| Dataset 5: Display interface anomaly [monitor_data_Display] | 1 hour, 26 minutes, and 51 seconds | 84,680 | Data captured during the display interface spoofing scenario |
| Dataset 6: Normal operation [monitor_data_normal_march21st] | 1 hour, 24 minutes, and 23 seconds | 81,348 | Testing data used to establish the normal threshold |

Feel free to contact Dr. Rishabh Das for additional details (Email: rishabh.das@ohio.edu or das.rishabh92@gmail.com).

If you use this dataset, please cite the following research paper:
R. Das and G. Agendia, "Process-Aware Anomaly Detection in Industrial Control Systems Using Frequency-Based Causal Structure Analysis," 2025 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2025, pp. 0228-0234, doi: 10.1109/AIIoT65859.2025.11105316.
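As a rough illustration of the sliding-window Granger causality idea (not the authors' implementation; the column names, window length, and lag order below are assumptions), the dependency between two captured PLC signals could be scanned like this:

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def sliding_granger_pvalues(df, cause, effect, window=2000, step=500, maxlag=5):
    """Minimum Granger-causality p-value (ssr F-test) for each sliding window of the record."""
    pvals = []
    for start in range(0, len(df) - window + 1, step):
        chunk = df.iloc[start:start + window][[effect, cause]].to_numpy()
        res = grangercausalitytests(chunk, maxlag=maxlag)
        pvals.append(min(res[lag][0]["ssr_ftest"][1] for lag in res))
    return np.array(pvals)

# Example: does the fill valve command Granger-cause the measured tank level?
# df = pd.read_csv("monitor_data_randomized_setpoints.csv")   # assumed CSV export of the capture
# p_per_window = sliding_granger_pvalues(df, cause="Q_Fill_Valve", effect="I_Level_Meter")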