34 datasets found
  1. Functions & Loops in Python

    • kaggle.com
    zip
    Updated May 31, 2023
    Cite
    Sadique Khan (2023). Functions & Loops in Python [Dataset]. https://www.kaggle.com/datasets/sadiquekhann/functions-and-loops-in-python/discussion?sort=undefined
    Explore at:
    zip (5790 bytes). Available download formats
    Dataset updated
    May 31, 2023
    Authors
    Sadique Khan
    Description

    Dataset

    This dataset was created by Sadique Khan

    Contents

  2. ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture...

    • researchdata.tuwien.at
    • researchdata.tuwien.ac.at
    zip
    Updated Sep 5, 2025
    + more versions
    Cite
    Wolfgang Preimesberger; Pietro Stradiotti; Wouter Arnoud Dorigo (2025). ESA CCI SM GAPFILLED Long-term Climate Data Record of Surface Soil Moisture from merged multi-satellite observations [Dataset]. http://doi.org/10.48436/3fcxr-cde10
    Explore at:
    zip. Available download formats
    Dataset updated
    Sep 5, 2025
    Dataset provided by
    TU Wien
    Authors
    Wolfgang Preimesberger; Pietro Stradiotti; Wouter Arnoud Dorigo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/

    This dataset contains information on the Surface Soil Moisture (SM) content derived from satellite observations in the microwave domain.

    Dataset Paper (Open Access)

    A description of this dataset, including the methodology and validation results, is available at:

    Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.

    Abstract

    ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations coming from 19 satellites (as of v09.1) operating in the microwave domain. The wealth of satellite information, particularly over the last decade, facilitates the creation of a data record with the highest possible data consistency and coverage.
    However, data gaps are still found in the record. This is particularly notable in earlier periods when a limited number of satellites were in operation, but can also arise from various retrieval issues, such as frozen soils, dense vegetation, and radio frequency interference (RFI). These data gaps present a challenge for many users, as they have the potential to obscure relevant events within a study area or are incompatible with (machine learning) software that often relies on gap-free inputs.
    Since the requirement for a gap-free ESA CCI SM product was identified, various studies have demonstrated the suitability of different statistical methods to achieve this goal. A fundamental feature of such a gap-filling method is that it relies only on the original observational record, without the need for ancillary variables or model-based information. Due to this intrinsic challenge, no global, long-term, univariate gap-filled product was available until now. In this version of the record, data gaps due to missing satellite overpasses and invalid measurements are filled using the Discrete Cosine Transform (DCT) Penalized Least Squares (PLS) algorithm (Garcia, 2010). A linear interpolation is applied over periods of (potentially) frozen soils with little to no variability in (frozen) soil moisture content. Uncertainty estimates are based on models calibrated in experiments that fill satellite-like gaps introduced to GLDAS Noah reanalysis soil moisture (Rodell et al., 2004), and consider the gap size and local vegetation conditions as parameters that affect the gap-filling performance.

    Summary

    • Gap-filled global estimates of volumetric surface soil moisture from 1991-2023 at 0.25° sampling
    • Fields of application (partial): climate variability and change, land-atmosphere interactions, global biogeochemical cycles and ecology, hydrological and land surface modelling, drought applications, and meteorology
    • Method: Modified version of DCT-PLS (Garcia, 2010) interpolation/smoothing algorithm, linear interpolation over periods of frozen soils. Uncertainty estimates are provided for all data points.
    • More information: See Preimesberger et al. (2025) and the ESA CCI SM Algorithm Theoretical Baseline Document [Chapter 7.2.9] (Dorigo et al., 2023), https://doi.org/10.5281/zenodo.8320869

    Programmatic Download

    You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following command will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.

    #!/bin/bash

    # Set download directory
    DOWNLOAD_DIR=~/Downloads

    base_url="https://researchdata.tuwien.at/records/3fcxr-cde10/files"

    # Loop through years 1991 to 2023 and download & extract data
    for year in {1991..2023}; do
        echo "Downloading $year.zip..."
        wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
        unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"  # quoted to handle paths with spaces
        rm "$DOWNLOAD_DIR/$year.zip"
    done

    Data details

    The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), with each subdirectory containing one netCDF image file per day (DD) and month (MM), on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). The file names follow this convention:

    ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-YYYYMMDD000000-fv09.1r1.nc

    Data Variables

    Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:

    • sm: (float) The Soil Moisture variable reflects estimates of daily average volumetric soil moisture content (m3/m3) in the soil surface layer (~0-5 cm) over a whole grid cell (0.25 degree).
    • sm_uncertainty: (float) The Soil Moisture Uncertainty variable reflects the uncertainty (random error) of the original satellite observations and of the predictions used to fill observation data gaps.
    • sm_anomaly: Soil moisture anomalies (reference period 1991-2020) derived from the gap-filled values (`sm`)
    • sm_smoothed: Contains DCT-PLS predictions used to fill data gaps in the original soil moisture field. These values are also provided for cases where an observation was initially available (compare `gapmask`); in that case, they provide a smoothed version of the original data.
    • gapmask: (0 | 1) Indicates grid cells where a satellite observation is available (1), and where the interpolated (smoothed) values are used instead (0) in the 'sm' field.
    • frozenmask: (0 | 1) Indicates grid cells where ERA5 soil temperature is <0 °C. In this case, a linear interpolation over time is applied.

    Additional information for each variable is given in the netCDF attributes.

    Version Changelog

    Changes in v9.1r1 (previous version was v09.1):

    • This version uses a novel uncertainty estimation scheme as described in Preimesberger et al. (2025).

    Software to open netCDF files

    These data can be read by any software that supports Climate and Forecast (CF) conform metadata standards for netCDF files, such as the netCDF4 and xarray Python libraries or the Panoply viewer.
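
    As an illustration, a minimal Python sketch using xarray (the file name follows the convention above with an example date; this is an assumed usage pattern, not an official snippet):

    import xarray as xr

    # Open one daily file from the record (example date: 5 August 1991)
    ds = xr.open_dataset(
        "ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_GAPFILLED-19910805000000-fv09.1r1.nc"
    )
    sm = ds["sm"]                                 # volumetric soil moisture [m3/m3]
    observed_only = sm.where(ds["gapmask"] == 1)  # keep satellite-observed cells only
    print(float(observed_only.mean()))            # mean over observed cells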

    References

    • Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: an independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data, 17, 4305–4329, https://doi.org/10.5194/essd-17-4305-2025, 2025.
    • Dorigo, W., Preimesberger, W., Stradiotti, P., Kidd, R., van der Schalie, R., van der Vliet, M., Rodriguez-Fernandez, N., Madelon, R., & Baghdadi, N. (2023). ESA Climate Change Initiative Plus - Soil Moisture Algorithm Theoretical Baseline Document (ATBD) Supporting Product Version 08.1 (version 1.1). Zenodo. https://doi.org/10.5281/zenodo.8320869
    • Garcia, D., 2010. Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 54(4), pp.1167-1178. Available at: https://doi.org/10.1016/j.csda.2009.09.020
    • Rodell, M., Houser, P. R., Jambor, U., Gottschalck, J., Mitchell, K., Meng, C.-J., Arsenault, K., Cosgrove, B., Radakovich, J., Bosilovich, M., Entin, J. K., Walker, J. P., Lohmann, D., and Toll, D.: The Global Land Data Assimilation System, Bulletin of the American Meteorological Society, 85, 381 – 394, https://doi.org/10.1175/BAMS-85-3-381, 2004.

    Related Records

    The following records are all part of the ESA CCI Soil Moisture science data records community:

    1. ESA CCI SM MODELFREE Surface Soil Moisture Record: https://doi.org/10.48436/svr1r-27j77

  3. COCO

    • datasets.activeloop.ai
    • huggingface.co
    deeplake
    Updated Feb 5, 2022
    Cite
    Tsung-Yi Lin (2022). COCO [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/coco-dataset/
    Explore at:
    deeplake. Available download formats
    Dataset updated
    Feb 5, 2022
    Authors
    Tsung-Yi Lin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2014 - Dec 31, 2015
    Dataset funded by
    Microsoft Research
    Description

    The COCO dataset is a large dataset of labeled images and annotations. It is a popular dataset for machine learning and artificial intelligence research. The dataset consists of 330,000 images and 500,000 object annotations. The annotations include the bounding boxes of objects in the images, as well as the labels of the objects.
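
    A minimal loading sketch with the deeplake library; the hub path hub://activeloop/coco-train is an assumption based on the deeplake format listed above, not confirmed by this listing:

    import deeplake

    ds = deeplake.load("hub://activeloop/coco-train")  # assumed hub path
    print(ds.tensors.keys())  # tensor names (e.g. images, boxes); exact names may differ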

  4. PV-gradient (PVG) tropopause: Time series 1980--2017 in four reanalyses

    • zenodo.org
    • data-staging.niaid.nih.gov
    zip
    Updated Mar 27, 2024
    Cite
    Katharina Turhal (2024). PV-gradient (PVG) tropopause: Time series 1980--2017 in four reanalyses [Dataset]. http://doi.org/10.5281/zenodo.10529153
    Explore at:
    zip. Available download formats
    Dataset updated
    Mar 27, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katharina Turhal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    PV-gradient tropopause time series

    General description

    These datasets contain time series of the PV-gradient tropopause (PVG tropopause) introduced by A. Kunz (2011, doi:10.1029/2010JD014343) and calculated by K. Turhal (2024, paper "Variability and Trends in the PVG Tropopause", preprint in EGUsphere: https://doi.org/10.5194/egusphere-2024-471).

    Data and methods

    The PVG tropopause has been computed by means of the Eddy Tracking Toolkit (developed by J. Clemens and K. Turhal, to be published):

    • from four reanalyses: ERA5, ERA-Interim, MERRA-2 and JRA-55
    • for the time range 1980/01/01 -- 2017/12/31, at the time steps of the respective reanalyses, i.e. four times daily at 00h, 06h, 12h and 18h
    • on each isentropic level, with potential temperatures (theta) ranging from 320 K to 380 K, in steps of 5 K for ERA5 and 10 K for the other reanalyses.

    Contents

    Datasets are provided for each year and isentropic level in NetCDF4 format, with each file consisting of two groups for the northern and southern hemispheres. Each group contains the following variables, with time as dimension:

    • time in seconds since 2000/01/01 00:00 UTC
    • u_lim: Zonal wind speed at the PVG tropopause
    • vh_lim: Horizontal wind speed at the PVG tropopause
    • q_lim: Maximum of Q = vh * Grad PV
    • eqlat_lim: Location of the PVG tropopause in equivalent latitudes
    • latmean_lim: Location of the PVG tropopause in latitudes
    • pv_lim: PV value at the PVG tropopause

    In this upload, the PVG tropopause time series are included as *.zip files:

    • ERA5 dataset: "pvg-tp_era5_ts.zip"
    • ERA-Interim dataset: "pvg-tp_eraint_ts.zip"
    • MERRA-2 dataset: "pvg-tp_merra2_ts.zip"
    • JRA-55 dataset: "pvg-tp_jra55_ts.zip"
    • Plots of time series for each reanalysis of the variables eqlat_lim, latmean_lim and pv_lim: "pvg_tropopause_timeseries_plots.zip".

    How to use

    The variables in these netCDF files are grouped by hemisphere. To read in the data, specify the group first ("NorthernHemisphere" or "SouthernHemisphere") and then the variable name (see list above). In Python, this can be done as follows:

    import netCDF4 as nc

    file = "pvg-tp_era5_ts_example.nc"  # hypothetical file name; substitute an actual dataset file
    with nc.Dataset(file) as ds:
        # group first ("NorthernHemisphere" or "SouthernHemisphere"), then the variable name
        eqlat = ds["NorthernHemisphere"]["eqlat_lim"][:]

    If you would like to read in all variables in both hemispheres, you can loop e.g. as follows:

    import netCDF4 as nc

    file = "pvg-tp_era5_ts_example.nc"  # hypothetical file name; substitute an actual dataset file
    with nc.Dataset(file) as ds:
        for hem in ("NorthernHemisphere", "SouthernHemisphere"):
            for name, var in ds[hem].variables.items():
                print(hem, name, var[:].shape)

    Funding

    This project has been funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – TRR 301 – Project-ID 428312742, TPChange: The Tropopause Region in a Changing Atmosphere (https://tpchange.de/).

  5. Brain-in-the-Loop Learning for Intelligent Vehicle Decision-Making

    • figshare.com
    zip
    Updated May 7, 2025
    Cite
    Xiaofei Zhang; Haoyi Zheng; Jun Li; Chaosheng Huang; Hong Wang (2025). Brain-in-the-Loop Learning for Intelligent Vehicle Decision-Making [Dataset]. http://doi.org/10.6084/m9.figshare.27685629.v2
    Explore at:
    zip. Available download formats
    Dataset updated
    May 7, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Xiaofei Zhang; Haoyi Zheng; Jun Li; Chaosheng Huang; Hong Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The inflexible human-autonomy relationship in autonomous driving scenarios has not yet realized deep intelligent synergy; it therefore cannot provide adaptive, context-sensitive decision-making and sometimes leads to violations of human preferences or even hazards. In this paper, we utilize functional near-infrared spectroscopy (fNIRS) signals as real-time human risk-perception feedback to establish a brain-in-the-loop (BiTL) trained artificial intelligence algorithm for decision-making. The proposed algorithm uses the result of driving risk reasoning as one input of reinforcement learning, combining fNIRS-based risk with driving safety field model-based risk, thereby integrating human brain activity into the reinforcement learning scheme and overcoming the disadvantage of machine-oriented intelligence that could violate human intentions. To achieve policy learning within limited BiTL training periods, we add two modification features to the proposed algorithm based on TD3. An experiment involving twenty participants was conducted, and the results show that in continuously high-risk driving scenarios, compared to traditional reinforcement learning algorithms without human participation, the proposed algorithm can maintain a cautious driving policy and avoid potential collisions, as validated with both proximal surrogate indicators and success rates. This repository contains the experimental dataset and Python code to reproduce the experimental results used in our research on 'Brain-in-the-Loop Learning for Intelligent Vehicle Decision-Making'. Human subject studies, control groups, and ablation studies data are all included in this repository. A detailed description of the file organization, data structures, and requirements can be found in the README.md document.

  6. Dataset for "Fractional Skyrmion Tubes in 3D Magnetic Nanowires"

    • researchdata.tuwien.ac.at
    zip
    Updated Apr 15, 2025
    Cite
    Amalio Fernandez-Pacheco Chicon; Naemi Riccarda Leo; John Fullerton (2025). Dataset for "Fractional Skyrmion Tubes in 3D Magnetic Nanowires" [Dataset]. http://doi.org/10.48436/3qbnk-w3115
    Explore at:
    zip. Available download formats
    Dataset updated
    Apr 15, 2025
    Dataset provided by
    TU Wien
    Authors
    Amalio Fernandez-Pacheco Chicon; Naemi Riccarda Leo; John Fullerton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    About the dataset

    This dataset supports a study where fractional skyrmion tubes were observed in double-helical nanowires fabricated by 3D nano-printing using focused electron beam-induced deposition.

    The dataset includes code, images and processed data for reproducing the figures from the associated paper, and is intended to support researchers interested in reproducing the data of the scientific article, including simulations and experiments. For more information about the code and data, please refer to the readme.txt file.

    The published preprint can be found here: https://arxiv.org/abs/2412.14069

    Data & File Overview

    1) Micromagnetic Simulations
    Contains Mumax3 files and scripts used to generate simulated data for the publication:

    • shape.go: Modified Mumax3 source file enabling double-helix geometry.
    • Double_Helix_Phase_Diagram.mx3: Script for calculating the energetic phase diagram (Fig. 1d), run with varying Msat, arm_Separation, and nm_per_turn_d.
    • Double_Helix_Two_Vortex_State_Hysteresis_Loop.mx3 + .ovf: Script and initial state for hysteresis loop simulations (Figs. 2e/f/k).
    • Double_Helix_Two_Vortex_State_Minor_Hysteresis_Loop.mx3 + Hybrid_Vortex-AP_State_MinorLoop.ovf: For minor loop simulations starting from hybrid vortex–AP states (Fig. 3c + Supplementary).
    • Double_Helix_Two_Vortex_State_Topological_Charge_Variation.mx3 + .ovf: For topological charge variation in two-vortex state (Figs. 4c/d).
    • Double_Helix_Fractional_Skyrmion_State_Topological_Charge_Variation.mx3 + .ovf: For topological charge variation in fractional Skyrmion state (Figs. 4e/f/g/h).

    2) XMCD
    Contains original ptychographic XMCD data from SOLEIL (beamtime 20210958, June 2022), processed from CL and CR reconstructions, aligned and normalized. Data saved as .dat arrays and .png images with field values in filenames. Includes metadata in [figure_name]_data_list.csv.

    • Fig2: Data for Fig. 2; includes Fig2_fitparameters.dat (linecut fit results for experimental loops in Fig. 2j).
    • SFig4: Data for Fig. 2 and Supplementary Fig. 4.
    • SFig5-major: Data for Fig. 3 and Supplementary Fig. 5.
    • SFig5-minor: Data for Fig. 3 and Supplementary Figs. 5 & 6.
    • SFig9: Data for Supplementary Fig. 9.

    3) SEM
    Contains an original SEM image of the fabricated double-helix structures used in Fig. 1.

    4) TEM
    Contains original TEM images of the FEBID Co nanostructure, used in Supplementary Fig. 7.

    Requirements

    The code can be executed using Python, MATLAB, Paraview and Mumax3, depending on the file.

    The images can be opened with any standard image software.

    Licenses

    The data is licensed under CC BY; the code is licensed under MIT.

  7. Code Snippets: Insights and Readability

    • kaggle.com
    zip
    Updated Feb 2, 2024
    Cite
    Paakhi Maheshwari (2024). Code Snippets: Insights and Readability [Dataset]. https://www.kaggle.com/datasets/paakhim10/code-snippets-insights-and-readability
    Explore at:
    zip (643750 bytes). Available download formats
    Dataset updated
    Feb 2, 2024
    Authors
    Paakhi Maheshwari
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset 1: data_python

    Title: Python Code Metrics and Readability Dataset

    Description: This dataset provides a comprehensive collection of metrics and readability scores for Python code snippets. Each entry includes information such as the problem title, Python solutions, difficulty level, number of lines, code length, comments, cyclomatic complexity, indents, loop count, line length, identifiers, and readability score. The dataset is designed to facilitate the analysis of coding patterns, complexity, and readability in Python programming.

    Columns:
    • problem_title: Title of the coding problem
    • python_solutions: Python code solutions for the problem
    • difficulty: Difficulty level of the coding problem
    • num_of_lines: Number of lines in the Python code
    • code_length: Length of the Python code
    • comments: Number of comments in the code
    • cyclomatic_complexity: Cyclomatic complexity of the code
    • indents: Number of indents in the code
    • loop_count: Count of loops in the code
    • line_length: Average line length in the code
    • identifiers: Number of identifiers used in the code
    • readability: Readability score of the code
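
    A minimal pandas sketch of how these columns might be explored (data_python.csv is an assumed export name, not confirmed by this listing):

    import pandas as pd

    df = pd.read_csv("data_python.csv")  # hypothetical file name
    # Relate structural metrics to the readability score
    print(df[["cyclomatic_complexity", "loop_count", "num_of_lines", "readability"]].corr())
    print(df.groupby("difficulty")["readability"].mean())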

    Dataset 2: data_cpp

    Title: C++ Code Metrics and Readability Dataset

    Description: This dataset offers a comprehensive set of metrics and readability scores for C++ code snippets. Each record includes details such as the code itself, number of lines, code length, comments, cyclomatic complexity, number of indents, loop count, line length, identifiers, and readability score. The dataset is crafted to support the exploration of coding styles, complexity, and readability in C++ programming.

    Columns:
    • Answer: C++ code snippet
    • num_of_lines: Number of lines in the C++ code
    • code_length: Length of the C++ code
    • comments: Number of comments in the code
    • cyclomatic_complexity: Cyclomatic complexity of the code
    • num_of_indents: Number of indents in the code
    • loop_count: Count of loops in the code
    • line_length: Average line length in the code
    • identifiers: Number of identifiers used in the code
    • readability: Readability score of the code

    These datasets are valuable resources for researchers, educators, and practitioners interested in code analysis, programming styles, and software readability in Python and C++.

    All features of the dataset have been generated through coded functions that will be linked in the code file by the author.

  8. Data from: Closed Loop Geothermal Working Group: GeoCLUSTER App, Subsurface...

    • catalog.data.gov
    • gdr.openei.org
    • +4more
    Updated Jan 20, 2025
    + more versions
    Cite
    Pacific Northwest National Laboratory (2025). Closed Loop Geothermal Working Group: GeoCLUSTER App, Subsurface Simulation Results, and Publications [Dataset]. https://catalog.data.gov/dataset/closed-loop-geothermal-working-group-geocluster-app-subsurface-simulation-results-and-publ-1d377
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    Pacific Northwest National Laboratory
    Description

    To better understand the heat production, electricity generation performance, and economic viability of closed-loop geothermal systems in hot-dry rock, the Closed-Loop Geothermal Working Group -- a consortium of several national labs and academic institutions -- has tabulated time-dependent numerical solutions and levelized cost results for two popular closed-loop heat exchanger designs (u-tube and co-axial). The heat exchanger designs were evaluated for two working fluids (water and supercritical CO2) while varying seven continuous independent parameters of interest (mass flow rate, vertical depth, horizontal extent, borehole diameter, formation gradient, formation conductivity, and injection temperature). The corresponding numerical solutions (approximately 1.2 million per heat exchanger design) are stored as multi-dimensional HDF5 datasets and can be queried at off-grid points using multi-dimensional linear interpolation.

    A Python script was developed to query this database, estimate time-dependent electricity generation using an organic Rankine cycle (for water) or direct turbine expansion cycle (for CO2), and perform a cost assessment. This document gives an overview of the HDF5 database file and highlights how to read, visualize, and query quantities of interest (e.g., levelized cost of electricity, levelized cost of heat) using the accompanying Python scripts. Details are provided on the capital, operation, and maintenance costs and on the levelized cost calculation using the techno-economic analysis script. This data submission contains results from the Closed Loop Geothermal Working Group study that are within the public domain, including publications, simulation results, databases, and computer codes.

    GeoCLUSTER is a Python-based web application created using Dash, an open-source framework built on top of Flask that streamlines the building of data dashboards. GeoCLUSTER provides users with a collection of interactive methods for streamlining the exploration and visualization of an HDF5 dataset. The GeoCLUSTER app and database are contained in the compressed file geocluster_vx.zip, where the "x" refers to the version number; for example, geocluster_v1.zip is Version 1 of the app. This zip file also contains installation instructions.

    To use the GeoCLUSTER app in the cloud, click the link to "GeoCLUSTER on AWS" in the Resources section below. To use the GeoCLUSTER app locally, download geocluster_vx.zip to your computer and uncompress it. When uncompressed, this file comprises two directories and the geocluster_installation.pdf file: the geo-data directory contains the HDF5 database in condensed format, and the GeoCLUSTER directory contains the GeoCLUSTER app in the subdirectory dash_app, as app.py. The geocluster_installation.pdf file provides instructions on installing Python and the needed Python modules, and then executing the app.
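
    As an illustration of the off-grid query pattern described above (not the bundled scripts themselves), a sketch assuming hypothetical group and dataset names inside the HDF5 file:

    import h5py
    from scipy.interpolate import RegularGridInterpolator

    # All file, group, and dataset names below are hypothetical placeholders
    with h5py.File("geocluster_database.h5", "r") as f:
        mdot = f["grid/mass_flow_rate"][:]      # first parameter axis
        depth = f["grid/vertical_depth"][:]     # second parameter axis
        heat = f["utube/water/heat_output"][:]  # tabulated solutions on that grid

    # Multi-dimensional linear interpolation for off-grid queries
    interp = RegularGridInterpolator((mdot, depth), heat)
    print(interp([[35.0, 3500.0]]))  # value at an off-grid (mass flow, depth) point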

  9. depyler-citl

    • huggingface.co
    Updated Nov 29, 2025
    Cite
    Noah Gift (2025). depyler-citl [Dataset]. https://huggingface.co/datasets/paiml/depyler-citl
    Explore at:
    Dataset updated
    Nov 29, 2025
    Authors
    Noah Gift
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Depyler CITL Corpus

    Python→Rust transpilation pairs for Compiler-in-the-Loop training.

      Dataset Description
    

    606 Python CLI examples with corresponding Rust translations (where available), designed for training transpiler ML models.

    Split | Examples | With Rust | Size
    train | 606 | 439 (72.4%) | 957 KB

      Schema
    
  10. tokenized-github-code-python

    • huggingface.co
    Updated Aug 8, 2023
    Cite
    Loubna Ben Allal (2023). tokenized-github-code-python [Dataset]. https://huggingface.co/datasets/loubnabnl/tokenized-github-code-python
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Aug 8, 2023
    Authors
    Loubna Ben Allal
    Description

    Pretokenized GitHub Code Dataset

      Dataset Description
    

    This is a pretokenized version of the Python files of the GitHub Code dataset, which consists of 115M code files from GitHub in 32 programming languages. We tokenized the dataset using a BPE tokenizer trained on code, available in this repo. Having a pretokenized dataset can speed up the training loop by not having to tokenize data at each batch call. We also include ratio_char_token, which gives the ratio between the… See the full description on the dataset page: https://huggingface.co/datasets/loubnabnl/tokenized-github-code-python.
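
    A minimal loading sketch with the Hugging Face datasets library (streaming avoids downloading the full corpus; the exact field names are best checked against the dataset card):

    from datasets import load_dataset

    ds = load_dataset("loubnabnl/tokenized-github-code-python", split="train", streaming=True)
    first = next(iter(ds))
    print(first.keys())  # per the description above, includes a ratio_char_token field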

  11. ESA CCI SM PASSIVE Daily Gap-filled Root-Zone Soil Moisture from merged...

    • researchdata.tuwien.at
    • researchdata.tuwien.ac.at
    zip
    Updated Oct 3, 2025
    Cite
    Wolfgang Preimesberger; Johanna Lems; Martin Hirschi; Wouter Arnoud Dorigo (2025). ESA CCI SM PASSIVE Daily Gap-filled Root-Zone Soil Moisture from merged multi-satellite observations [Dataset]. http://doi.org/10.48436/8dda4-xne96
    Explore at:
    zip. Available download formats
    Dataset updated
    Oct 3, 2025
    Dataset provided by
    TU Wien
    Authors
    Wolfgang Preimesberger; Johanna Lems; Martin Hirschi; Wouter Arnoud Dorigo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides global daily estimates of Root-Zone Soil Moisture (RZSM) content at 0.25° spatial grid resolution, derived from gap-filled, merged satellite observations from 14 passive satellite sensors operating in the microwave domain of the electromagnetic spectrum. Data is provided from January 1991 to December 2023.

    This dataset was produced with funding from the European Space Agency (ESA) Climate Change Initiative (CCI) Plus Soil Moisture Project (CCN 3 to ESRIN Contract No: 4000126684/19/I-NB "ESA CCI+ Phase 1 New R&D on CCI ECVS Soil Moisture"). Project website: https://climate.esa.int/en/projects/soil-moisture/. Operational implementation is supported by the Copernicus Climate Change Service implemented by ECMWF through C3S2 312a/313c.

    Studies using this dataset [preprint]

    This dataset is used by Hirschi et al. (2025) to assess recent summer drought trends in Switzerland.

    Hirschi, M., Michel, D., Schumacher, D. L., Preimesberger, W., and Seneviratne, S. I.: Recent summer soil moisture drying in Switzerland based on measurements from the SwissSMEX network, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2025-416, in review, 2025.

    Abstract

    ESA CCI Soil Moisture is a multi-satellite climate data record that consists of harmonized, daily observations from various microwave satellite remote sensing sensors (Dorigo et al., 2017, 2024; Gruber et al., 2019). This version of the dataset uses the PASSIVE record as input, which contains only observations from passive (radiometer) measurements (scaling reference AMSR-E). The surface observations are gap-filled using a univariate interpolation algorithm (Preimesberger et al., 2025). The gap-filled passive observations serve as input for an exponential filter based method to assess soil moisture in different layers of the root-zone of soil (0-200 cm) following the approach by Pasik et al. (2023). The final gap-free root-zone soil moisture estimates based on passive surface input data are provided here at 4 separate depth layers (0-10, 10-40, 40-100, 100-200 cm) over the period 1991-2023.

    Summary

    • Gap-free root-zone soil moisture estimates from 1991-2023 at 0.25° spatial sampling from passive measurements
    • Fields of application include: climate variability and change, land-atmosphere interactions, global biogeochemical cycles and ecology, hydrological and land surface modelling, drought applications, agriculture and meteorology
    • More information: See Dorigo et al. (2017, 2024) and Gruber et al. (2019) for a description of the satellite base product and uncertainty estimates, Preimesberger et al. (2025) for the gap-filling, and Pasik et al. (2023) for the root-zone soil moisture and uncertainty propagation algorithm.

    Programmatic Download

    You can use command line tools such as wget or curl to download (and extract) data for multiple years. The following command will download and extract the complete data set to the local directory ~/Downloads on Linux or macOS systems.

    #!/bin/bash

    # Set download directory
    DOWNLOAD_DIR=~/Downloads

    base_url="https://researchdata.tuwien.ac.at/records/8dda4-xne96/files"

    # Loop through years 1991 to 2023 and download & extract data
    for year in {1991..2023}; do
        echo "Downloading $year.zip..."
        wget -q -P "$DOWNLOAD_DIR" "$base_url/$year.zip"
        unzip -o "$DOWNLOAD_DIR/$year.zip" -d "$DOWNLOAD_DIR"  # quoted to handle paths with spaces
        rm "$DOWNLOAD_DIR/$year.zip"
    done

    Data details

    The dataset provides global daily estimates for the 1991-2023 period at 0.25° (~25 km) horizontal grid resolution. Daily images are grouped by year (YYYY), with each subdirectory containing one netCDF image file per day (DD) and month (MM), on a 2-dimensional (longitude, latitude) grid (CRS: WGS84). The file names follow this convention:

    ESA_CCI_PASSIVERZSM-YYYYMMDD000000-fv09.1.nc

    Data Variables

    Each netCDF file contains 3 coordinate variables (WGS84 longitude, latitude and time stamp), as well as the following data variables:

    • rzsm_1: (float) Root Zone Soil Moisture at 0-10 cm. Given in volumetric units [m3/m3].
    • rzsm_2: (float) Root Zone Soil Moisture at 10-40 cm. Given in volumetric units [m3/m3].
    • rzsm_3: (float) Root Zone Soil Moisture at 40-100 cm. Given in volumetric units [m3/m3].
    • rzsm_4: (float) Root Zone Soil Moisture at 100-200 cm. Given in volumetric units [m3/m3].
    • uncertainty_1: (float) Root Zone Soil Moisture uncertainty at 0-10 cm from propagated surface uncertainties [m3/m3].
    • uncertainty_2: (float) Root Zone Soil Moisture uncertainty at 10-40 cm from propagated surface uncertainties [m3/m3].
    • uncertainty_3: (float) Root Zone Soil Moisture uncertainty at 40-100 cm from propagated surface uncertainties [m3/m3].
    • uncertainty_4: (float) Root Zone Soil Moisture uncertainty at 100-200 cm from propagated surface uncertainties [m3/m3].

    Additional information for each variable is given in the netCDF attributes.

    Version Changelog

    • v9.1
      • Initial version based on PASSIVE input data from ESA CCI SM v09.1 as used by Hirschi et al. (2025).

    Software to open netCDF files

    These data can be read by any software that supports Climate and Forecast (CF) conform metadata standards for netCDF files, such as the netCDF4 and xarray Python libraries or the Panoply viewer.
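
    As an illustration, a minimal Python sketch extracting a point time series with xarray (assumes the yearly folders from the download script above and lat/lon coordinate names; an assumed usage pattern, not an official snippet):

    import os
    import xarray as xr

    # Combine all daily files into one dataset (requires dask)
    pattern = os.path.expanduser("~/Downloads/*/ESA_CCI_PASSIVERZSM-*.nc")
    ds = xr.open_mfdataset(pattern, combine="by_coords")
    # 0-10 cm root-zone soil moisture at one example grid point
    ts = ds["rzsm_1"].sel(lat=48.125, lon=16.375, method="nearest")
    ts.to_dataframe().to_csv("rzsm_point.csv")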

    References

    • Dorigo, W., Wagner, W., Albergel, C., Albrecht, F., Balsamo, G., Brocca, L., Chung, D., Ertl, M., Forkel, M., Gruber, A., Haas, E., Hamer, P. D., Hirschi, M., Ikonen, J., de Jeu, R., Kidd, R., Lahoz, W., Liu, Y. Y., Miralles, D., Mistelbauer, T., Nicolai-Shaw, N., Parinussa, R., Pratola, C., Reimer, C., van der Schalie, R., Seneviratne, S. I., Smolander, T., and Lecomte, P.: ESA CCI Soil Moisture for improved Earth system understanding: State-of-the art and future directions, Remote Sensing of Environment, 203, 185-215, 10.1016/j.rse.2017.07.001, 2017
    • Dorigo, W., Stradiotti, P., Preimesberger, W., Kidd, R., van der Schalie, R., Frederikse, T., Rodriguez-Fernandez, N., & Baghdadi, N. (2024). ESA Climate Change Initiative Plus - Soil Moisture Algorithm Theoretical Baseline Document (ATBD) Supporting Product Version 09.0. Zenodo. https://doi.org/10.5281/zenodo.13860922
    • Gruber, A., Scanlon, T., van der Schalie, R., Wagner, W., and Dorigo, W.: Evolution of the ESA CCI Soil Moisture climate data records and their underlying merging methodology, Earth Syst. Sci. Data, 11, 717–739, https://doi.org/10.5194/essd-11-717-2019, 2019.
    • Hirschi, M., Michel, D., Schumacher, D. L., Preimesberger, W., Seneviratne, S. I.: Recent summer soil moisture drying in Switzerland based on the SwissSMEX network, 2025 (paper submitted)
    • Pasik, A., Gruber, A., Preimesberger, W., De Santis, D., and Dorigo, W.: Uncertainty estimation for a new exponential-filter-based long-term root-zone soil moisture dataset from Copernicus Climate Change Service (C3S) surface observations, Geosci. Model Dev., 16, 4957–4976, https://doi.org/10.5194/gmd-16-4957-2023, 2023
    • Preimesberger, W., Stradiotti, P., and Dorigo, W.: ESA CCI Soil Moisture GAPFILLED: An independent global gap-free satellite climate data record with uncertainty estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2024-610, in review, 2025.

    Related Records

    Please see the ESA CCI Soil Moisture science data records community for more records based on ESA CCI SM.

  12. PlantVillage

    • datasets.activeloop.ai
    • tensorflow.org
    • +2more
    deeplake
    Updated Feb 3, 2022
    Cite
    Arun Pandian J, Geetharamani Gopal (2022). PlantVillage [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/plantvillage-dataset/
    Explore at:
    deeplake. Available download formats
    Dataset updated
    Feb 3, 2022
    Authors
    Arun Pandian J, Geetharamani Gopal
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A dataset of 61,486 images of plant leaves and backgrounds, with each image labeled with the disease or pest that is present. The dataset was created by researchers at the University of Wisconsin-Madison and is used for research in machine learning and computer vision tasks such as plant disease detection and pest identification.

  13. ag_news_training_set_losses

    • huggingface.co
    Cite
    Daniel Vila, ag_news_training_set_losses [Dataset]. https://huggingface.co/datasets/dvilasuero/ag_news_training_set_losses
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Authors
    Daniel Vila
    Description

    AG News train losses

    This dataset is part of an experiment using Rubrix, an open-source Python framework for human-in-the-loop NLP data annotation and management.

  14. USPS

    • datasets.activeloop.ai
    • opendatalab.com
    deeplake
    Updated Mar 28, 2022
    Cite
    J. J. Hull (2022). USPS [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/usps-dataset/
    Explore at:
    deeplake. Available download formats
    Dataset updated
    Mar 28, 2022
    Authors
    J. J. Hull
    License

    Attribution-NonCommercial-NoDerivs 2.0 (CC BY-NC-ND 2.0): https://creativecommons.org/licenses/by-nc-nd/2.0/
    License information was derived automatically

    Description

    A dataset of 20,000 handwritten digits from US mail service forms. The dataset was created by researchers at the University of California, Berkeley and is used for research in machine learning and computer vision tasks such as digit recognition.

  15. RPMC_L2

    • data.niaid.nih.gov
    Updated Feb 12, 2025
    Cite
    Anonymous (2025). RPMC_L2 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_14854216
    Explore at:
    Dataset updated
    Feb 12, 2025
    Authors
    Anonymous
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Dataset Overview

    This is the Rock, Punk, Metal, and Core - Livehouse Lighting (RPMC-L2) Dataset.

    Purpose: Dataset for studying the relationship between music and lighting in live music performances

    Music Genres: Rock, Punk, Metal, and Core

    Total Files: 699 files of synchronized music and lighting data

    Collection Method: Collected from professional live performance venues

    Data Format: HDF5 file format (.h5)

    Total Size: ~40 GB

    Dataset Data Structure

    1. music (dict)

    Contains audio-related features, stored as np.ndarray arrays. Each feature has a shape (X, L), where L is the sequence length.

    Feature | Shape | Description
    openl3 | (512, L) | OpenL3 deep audio embedding
    mel_spectrogram | (128, L) | Mel spectrogram
    mel_spectrogram_db | (128, L) | Mel spectrogram in decibels
    cqt | (84, L) | Constant-Q transform (CQT)
    stft | (1025, L) | Short-time Fourier transform (STFT)
    mfcc | (128, L) | Mel-frequency cepstral coefficients
    chroma_stft | (12, L) | Chroma features from STFT
    chroma_cqt | (12, L) | Chroma features from CQT
    chroma_cens | (12, L) | Chroma Energy Normalized Statistics
    spectral_centroids | (1, L) | Spectral centroid
    spectral_bandwidth | (1, L) | Spectral bandwidth
    spectral_contrast | (7, L) | Spectral contrast
    spectral_rolloff | (1, L) | Spectral rolloff frequency
    zero_crossing_rate | (1, L) | Zero-crossing rate

    2. light (dict)

    Contains lighting-related data, structured as np.ndarray arrays with specific ranges and shapes.

    Feature | Range | Shape | Description
    threshold | 0 to 240 | (F, 3, 256) | Frame-specific light threshold data

    Details of threshold (per frame): the array has length F, and each frame has shape (3, 256), with the three rows corresponding to h, s, and v:

    • h (Hue): values range from 0 to 179; shape (180,), padded to 256.
    • s (Saturation): values range from 0 to 255; shape (256,).
    • v (Value): values range from 0 to 255; shape (256,).

    This structure organizes the datasets into two main categories: music features for audio characteristics and light features for lighting data, enabling efficient data processing and analysis.

    Data Usage

    1. Merge the Files

    Use the cat command to merge the split files into a single .h5 file:

    cat RPMC_L2_part_aa RPMC_L2_part_ab RPMC_L2_part_ac RPMC_L2_part_ad > RPMC_L2.h5

    2. Read the Merged File

    Use the following Python code to read the merged .h5 file and iterate through its contents:

    import os
    import h5py

    root_folder = "/path/to/your/folder"  # Replace with your actual folder path

    with h5py.File(os.path.join(root_folder, 'RPMC_L2.h5'), 'r') as f:
        for key in f.keys():  # Iterate through each file hash
            print(f"File {key}:")
            for group_name in f[key].keys():  # Iterate through 'music' and 'light' groups
                print(f"  Group: {group_name}")
                for dataset_name in f[key][group_name].keys():  # Iterate through specific datasets
                    print(f"    {dataset_name}: {f[key][group_name][dataset_name].shape}")

    f.keys(): Retrieves the top-level keys, typically representing file hashes.

    f[key].keys(): Accesses the groups within each file (e.g., music and light).

    f[key][group_name].keys(): Accesses the specific datasets within each group.

  16. Python PI Control Script for Laser Wavelength Stabilisation

    • repod.icm.edu.pl
    text/x-python, txt
    Updated Sep 3, 2025
    Cite
    Linek, Adam (2025). Python PI Control Script for Laser Wavelength Stabilisation [Dataset]. http://doi.org/10.18150/S4VIZZ
    Explore at:
    txt (1067), txt (3747), text/x-python (10871), txt (257). Available download formats
    Dataset updated
    Sep 3, 2025
    Dataset provided by
    RepOD
    Authors
    Linek, Adam
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains a Python script implementing a proportional–integral (PI) control loop for stabilising the wavelength of a laser system. The script communicates with a SLICE-DHV high-voltage driver via PyVISA to tune the piezo actuator of the laser, while simultaneously reading the laser wavelength from a HighFinesse wavemeter through the wlmData.dll interface. The control loop compensates deviations from a user-defined setpoint, applying anti-windup protection and enforcing safety limits on wavelength error and control voltage. All measurements and control signals are continuously logged to a file for later analysis. The software has been developed for laboratory use in high-precision laser spectroscopy and frequency metrology experiments, where long-term wavelength stability of ultra-stable laser systems is essential.
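
    A minimal sketch of such a PI loop with anti-windup and safety limits, using hypothetical read_wavelength() and set_voltage() stubs in place of the PyVISA and wlmData.dll interfaces (gains and limits are illustrative, not taken from the script):

    import time

    def read_wavelength() -> float:
        return 780.246  # stub for the HighFinesse wavemeter readout

    def set_voltage(v: float) -> None:
        pass            # stub for the SLICE-DHV piezo driver output

    KP, KI = 0.5, 0.1           # example PI gains
    SETPOINT = 780.246          # nm, example target wavelength
    V_MIN, V_MAX = 0.0, 100.0   # safety limits on control voltage
    ERR_MAX = 0.01              # nm, safety limit on wavelength error

    integral = 0.0
    while True:
        error = SETPOINT - read_wavelength()
        if abs(error) > ERR_MAX:
            break                       # enforce wavelength-error safety limit
        v = KP * error + KI * integral
        if V_MIN < v < V_MAX:
            integral += error           # anti-windup: integrate only when output is unsaturated
        v = min(max(v, V_MIN), V_MAX)   # clamp to voltage limits
        set_voltage(v)
        time.sleep(0.1)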

  17. Benign and Malicious QR codes

    • kaggle.com
    Updated Aug 1, 2022
    Cite
    Samah Malibari (2022). Benign and Malicious QR codes [Dataset]. https://www.kaggle.com/datasets/samahsadiq/benign-and-malicious-qr-codes/code
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    Kaggle
    Authors
    Samah Malibari
    Description

    This dataset is created using Python code to generate QR codes from the REAL list of URLs provided in the following dataset from Kaggle: https://www.kaggle.com/datasets/samahsadiq/benign-and-malicious-urls

    The mentioned dataset consists of over 600,000 URLs. However, only the first 100,000 URLs from each class {Benign and Malicious} are used to generate the QR codes. In total, there are 200,000 QR code images in the dataset, each encoding a REAL URL.

    This is a balanced dataset of version-2 QR codes. The 100,000 benign QR codes were generated by a single loop in Python (see the sketch below), and the same was done for the malicious QR codes.
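
    A minimal sketch of such a loop, assuming the qrcode library and a hypothetical urls.csv with one URL per line (the author's actual script is not included in this listing):

    import os
    import qrcode

    os.makedirs("benign", exist_ok=True)
    with open("urls.csv") as f:               # hypothetical URL list, one per line
        urls = [line.strip() for line in f][:100000]

    for i, url in enumerate(urls):
        qr = qrcode.QRCode(version=2, error_correction=qrcode.constants.ERROR_CORRECT_L)
        qr.add_data(url)
        qr.make(fit=False)                    # keep version 2; overlong URLs raise an error
        qr.make_image().save(f"benign/benign_{i}.png")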

    The QR code images that belong to malicious URLs are under the 'malicious' folder, with the word 'malicious' in their file names. The QR codes that belong to benign URLs are under the 'benign' folder, with the word 'benign' in their file names.

    NOTE: Keep in mind that the malicious QR codes encode REAL malicious URLs; scanning them manually and visiting the encoded websites is not recommended.

    For more information about the encoded URLs, please refer to the dataset mentioned above on Kaggle.

  18. MUSCLE (MUltiplexed Single-molecule Characterization at the Library scalE)...

    • figshare.scilifelab.se
    • researchdata.se
    • +1more
    zip
    Updated Jan 15, 2025
    Cite
    Mikhail Panfilov; Guanzhong Mao; Jianfeng Guo; Javier Aguirre Rivera; Anton Sabantcev; Sebastian Deindl (2025). MUSCLE (MUltiplexed Single-molecule Characterization at the Library scalE) protocol data and codes [Dataset]. http://doi.org/10.17044/scilifelab.28008872.v1
    Explore at:
    zip. Available download formats
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Uppsala University
    Authors
    Mikhail Panfilov; Guanzhong Mao; Jianfeng Guo; Javier Aguirre Rivera; Anton Sabantcev; Sebastian Deindl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A test dataset for MUSCLE (MUltiplexed Single-molecule Characterization at the Library scalE) data analysis. See "\Python codes for MUSCLE data analysis\README.txt" for instructions on running the data analysis codes. Use the files in the "Test MUSCLE dataset" folder as input for the codes. "Test MUSCLE dataset\Output_tile1" contains the code output for the test dataset. The example dataset corresponds to one MiSeq tile in an experiment analyzing dCas9-induced R-loop formation for a library of 256 different target sequences. The latest version of the Python codes for matching single-molecule FRET traces with sequenced clusters is available at https://github.com/deindllab/MUSCLE/.

  19. Red Eye Removal

    • kaggle.com
    zip
    Updated Mar 6, 2024
    Cite
    Brian Langay ☺ (2024). Red Eye Removal [Dataset]. https://www.kaggle.com/datasets/brianlangay/red-eye-removal
    Explore at:
    zip (3000906 bytes). Available download formats
    Dataset updated
    Mar 6, 2024
    Authors
    Brian Langay ☺
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Image-Restoration-Computer-Vision

    This dataset covers image processing techniques for eye detection and red-eye removal, which are essential components of image restoration. The context provided by these snippets lays the groundwork for understanding how image restoration algorithms can be implemented and applied in real-world scenarios to enhance the quality of digital images.

    Screenshot: https://github.com/brianlangay4/Image-Restoration-Computer-Vision/assets/67788456/714097a0-01ab-43dc-86b0-d6cc68d96b97

    OpenCV is a powerful library for computer vision tasks in Python. It includes tools for image processing, object detection, and more. One of its key features is the Haar cascade classifier, a machine learning-based algorithm used for object detection.

    In the context of eye reduction, OpenCV's Haar cascade classifier is particularly useful. By training on a dataset of positive (eye-containing) and negative (eye-lacking) images, it learns to detect eyes in images. This pre-trained classifier can then be applied to new images to automatically locate eye regions. This functionality is leveraged in tasks like red-eye reduction, where the detected eye regions are processed to remove unwanted red-eye effects.

    The eyesCascade variable in our code refers to a Haar cascade classifier specifically trained for detecting eyes in images.

    Haar Cascade Classifiers: Haar cascade classifiers are machine learning-based algorithms used for object detection. They work by using a series of feature templates (Haar features) to detect objects of interest. These features are simple rectangular areas where the pixel values are summed up and compared to a threshold.

    Eyes Cascade Classifier: The eyes cascade classifier is trained specifically to detect eyes in images. It's pre-trained using a large dataset of positive samples (images containing eyes) and negative samples (images without eyes). During training, the classifier learns to distinguish between these two types of samples based on the patterns of Haar features present in the images.

    How it Works: When applied to an input image, the eyes cascade classifier scans the image at multiple scales and locations, searching for regions that match the learned patterns of eye features. It uses a sliding window approach, where a window of fixed size moves across the image, and at each position, the Haar features are computed and compared to the learned patterns. If a region matches the eye patterns above a certain threshold, it's considered a positive detection, and the bounding box coordinates of the detected eyes are returned.

    Usage in the Code: In the provided code, the eyesCascade variable is loaded with a pre-trained eyes cascade classifier XML file using cv2.CascadeClassifier(). This file contains the learned patterns necessary for eye detection. Later, the detectMultiScale() function of the eyesCascade object is called to perform eye detection on the input image (img). The function returns a list of rectangles representing the bounding boxes of the detected eyes in the image.

    Overall, the eyes cascade classifier plays a crucial role in automatically identifying eye regions within images, which is essential for subsequent processing tasks, such as red-eye removal, as demonstrated in the code.

    Eye processing

    To understand how the code detects and removes red eyes, let's break down the relevant parts:
    1. **Eye Detection**:
      ```python
      eyes = eyesCascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=4, minSize=(100, 100))
      ```
      - This line utilizes the Haar cascade classifier (`eyesCascade`) to detect eyes in the input image (`img`). 
      - The `detectMultiScale` function detects objects (in this case, eyes) of different sizes in the input image. It returns a list of rectangles where it believes it found eyes.
    
    2. **Processing Detected Eyes**:
      ```python
      for (x, y, w, h) in eyes:
      ```
      - This loop iterates over each detected eye, represented by its bounding box `(x, y, w, h)`.
    
    3. **Extracting Eye Region**:
      ```python
      eye = img[y:y+h, x:x+w]
      ```
      - This line extracts the region of interest (ROI) from the original image (`img`) corresponding to the detected eye. It crops the image based on the coordinates of the bounding box.
    
    4. **Red Eye Removal**:
      - Once the eye region is extracted, the code performs the following steps to remove red-eye effect:
       - **Extracting Channels**: It separates the eye image into its three color channels: blue (`b`), green (`g`), and red (`r`).
       - **Calculating Background**: It calculates the sum of blue and green channels (`bg`), representing the background color without the red-eye effect.
       - **Creating Mask**: It creates a binary mask (`mask`) to identify pixels that are significantly more red than the background. This is done by com...
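
    The truncated steps can be sketched end to end as follows — a minimal red-eye pass that mirrors the described approach (mask pixels that are much redder than the blue+green background, then replace them with the blue/green mean); `photo.jpg` and the threshold of 150 are illustrative assumptions, not the author's exact code:
      ```python
      import cv2
      import numpy as np

      # Pre-trained eye detector shipped with OpenCV
      eyesCascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

      img = cv2.imread("photo.jpg")  # hypothetical input image
      eyes = eyesCascade.detectMultiScale(img, scaleFactor=1.3, minNeighbors=4, minSize=(100, 100))

      for (x, y, w, h) in eyes:
          eye = img[y:y+h, x:x+w]
          b, g, r = cv2.split(eye)
          bg = cv2.add(b, g)                          # background brightness without red
          mask = (r > 150) & (r > bg)                 # pixels much redder than the background
          mean = ((b.astype(np.uint16) + g.astype(np.uint16)) // 2).astype(np.uint8)
          r[mask] = mean[mask]                        # replace red with the blue/green mean
          img[y:y+h, x:x+w] = cv2.merge((b, g, r))

      cv2.imwrite("photo_fixed.png", img)
      ```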
    
  20. Datasets

    • figshare.com
    csv
    Updated Sep 16, 2025
    Cite
    Rishabh Das (2025). Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.28735547.v2
    Explore at:
    csv. Available download formats
    Dataset updated
    Sep 16, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rishabh Das
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Critical infrastructures encompass a wide range of process control systems, each with unique security needs. Securing diverse systems is a challenge since they require custom defenses. To address this gap, this study describes a process-aware anomaly detection framework that can automatically baseline the behavior of the process. Utilizing a sliding window Granger causality method, the framework detects time-varying dependencies, allowing it to capture stable and transient causal links across different operational states. Additionally, the anomaly detection framework considers the criticality of various components. The study evaluates the framework on a hardware-in-the-loop (HIL) water tank testbed. The framework successfully identified four sensor and actuator spoofing scenarios on the water tank system.

    List of Variables in PLC Memory

    Variable name | Variable address | Variable functionality
    I_PbFill | %IX100.0 | Push button to manually fill the tank
    I_PbDischarge | %IX100.1 | Push button to manually discharge the tank
    I_Level_Meter | %IW100 | Display the level of water in the tank
    I_ModeSelector | %IX100.2 | Switches between auto and manual process
    Q_Fill_Valve | %QW101 | Pumps water into the tank
    Q_Discharge_Valve | %QW102 | Discharges water from the tank
    Q_Display | %QW100 | Shows the numerical current tank water level
    Q_Fill_Light | %QX100.0 | Lights when the filling process is on
    Q_Discharge_Light | %QX100.2 | Lights when the discharging process is on
    I_Flow_Meter | %IW101 | Shows the current diameter of the discharge valve nozzle
    LowSetpoint | %MW1 | Used to actuate the automatic filling process
    HighSetpoint | %MW2 | Used to actuate the automatic discharging process
    TankLevel | %MW0 | Used to calibrate the water level and control the LowSetpoint/HighSetpoint
    I_PbSet | %MX0.1 | Used to set the Q_Fill_Light
    I_PbReset | %MX0.2 | Used to set the Q_Discharge_Light
    Q_Discharge_Valve_M | %MW3 | Used to set the manual discharging process
    Q_Fill_Valve_M | %MW4 | Used to set the manual filling process

    To investigate variable dependencies, we capture multivariate time series data from the OpenPLC's hardware layer. In a physical system, the hardware layer represents the wired connection between the PLC, sensor, and actuator network. By capturing data from the hardware layer, we can track the state of the sensors, actuators, and the MODBUS memory map. The memory map includes discrete output coils, discrete input contacts, analog input registers, and holding registers. Table I shows the list of variables in the water tank simulation.

    During data collection, the water tank is set to auto mode. A network-connected Python program writes random low and high setpoint values at random intervals. The Python program also randomly opens and closes the valve. The normal capture spans over 15 hours and has 893,795 entries of data. Table II provides details on the datasets.

    For abnormal data, we simulate four spoofing scenarios involving the level sensor, flow sensor, fill valve, and display interface. The level sensor measures the water level in the tank, the flow sensor measures the outgoing flow, and the fill valve controls the water inflow. The display interface is a digital meter showing the current water level in the tank.

    Description of the Datasets

    Data type | Duration | Total sample size | Notes
    Dataset 1: Normal operation [monitor_data_randomized_setpoints] | 15 hours, 34 minutes, and 32 seconds | 893,795 | Normal operation; data used for baselining the water tank using frequency-based causal structure analysis
    Dataset 2: Level sensor spoof [monitor_data_levelmeter] | 1 hour, 6 minutes, and 3 seconds | 64,156 | Data captured during the level sensor spoofing scenario
    Dataset 3: Flow meter sensor spoof [monitor_data_flowmeter] | 55 minutes and 1 second | 52,439 | Data captured during the flow meter spoofing scenario
    Dataset 4: Fill valve spoof [monitor_data_fillvalve_march21st] | 1 hour, 21 minutes, and 54 seconds | 79,224 | Data captured during the fill valve spoofing scenario
    Dataset 5: Display interface anomaly [monitor_data_Display] | 1 hour, 26 minutes, and 51 seconds | 84,680 | Data captured during the display interface spoofing scenario
    Dataset 6: Normal operation [monitor_data_normal_march21st] | 1 hour, 24 minutes, and 23 seconds | 81,348 | Testing data for outlining; creates the normal threshold

    Feel free to contact Dr. Rishabh Das for additional details (Email: rishabh.das@ohio.edu or das.rishabh92@gmail.com).

    If you use this dataset, please cite the following research paper: R. Das and G. Agendia, "Process-Aware Anomaly Detection in Industrial Control Systems Using Frequency-Based Causal Structure Analysis," 2025 IEEE World AI IoT Congress (AIIoT), Seattle, WA, USA, 2025, pp. 0228-0234, doi: 10.1109/AIIoT65859.2025.11105316.
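
    A minimal sketch of the sliding-window Granger causality idea on two captured signals, assuming a hypothetical monitor_data.csv export with columns named as in the PLC table above (the full framework additionally weights links by component criticality):

    import pandas as pd
    from statsmodels.tsa.stattools import grangercausalitytests

    df = pd.read_csv("monitor_data.csv")  # hypothetical export of a HIL capture
    pair = df[["I_Level_Meter", "Q_Fill_Valve"]].to_numpy()

    WINDOW, STEP, MAXLAG = 2000, 500, 4   # illustrative window parameters
    for start in range(0, len(pair) - WINDOW, STEP):
        window = pair[start:start + WINDOW]
        # Does Q_Fill_Valve Granger-cause I_Level_Meter within this window?
        res = grangercausalitytests(window, maxlag=MAXLAG, verbose=False)
        pvals = [res[lag][0]["ssr_ftest"][1] for lag in range(1, MAXLAG + 1)]
        print(start, min(pvals))          # small p-value -> causal link present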
