66 datasets found
  1. Hydroinformatics Instruction Module Example Code: Sensor Data Quality...

    • hydroshare.org
    • beta.hydroshare.org
    • +1more
    zip
    Updated Mar 3, 2022
    Cite
    Amber Spackman Jones (2022). Hydroinformatics Instruction Module Example Code: Sensor Data Quality Control with pyhydroqc [Dataset]. https://www.hydroshare.org/resource/451c4f9697654b1682d87ee619cd7924
    Explore at:
    zip(159.5 MB)Available download formats
    Dataset updated
    Mar 3, 2022
    Dataset provided by
    HydroShare
    Authors
    Amber Spackman Jones
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This resource contains Jupyter Notebooks with examples for conducting quality control post processing for in situ aquatic sensor data. The code uses the Python pyhydroqc package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.

    This resource consists of 3 example notebooks and associated data files.

    Notebooks:
    1. Example 1: Import and plot data
    2. Example 2: Perform rules-based quality control
    3. Example 3: Perform model-based quality control (ARIMA)

    Data files: Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year and is named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database, and equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: a datetime index column (mountain standard time) with three columns for each variable. Variable abbreviations and units are:

    • temp: water temperature, degrees C
    • cond: specific conductance, μS/cm
    • ph: pH, standard units
    • do: dissolved oxygen, mg/L
    • turb: turbidity, NTU
    • stage: stage height, cm

    For each variable, there are 3 columns:

    • Raw data value measured by the sensor (column header is the variable abbreviation).
    • Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
    • Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').
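
    The example notebooks use pyhydroqc, but the file layout described above can also be inspected with pandas alone. The sketch below is illustrative only; the file name is hypothetical (actual names combine monitoring site and year).

    ~~~
    # Minimal sketch: load one site/year file and compare raw vs. corrected values.
    import pandas as pd

    df = pd.read_csv(
        "MainStreet2017.csv",   # hypothetical name: monitoring site + year
        index_col=0,            # first column is the datetime index (MST)
        parse_dates=True,
    )

    # Each variable has three columns: raw, '_cor' (corrected), '_qual' (qualifiers)
    print(df[["temp", "temp_cor", "temp_qual"]].head())

    # Count temperature values adjusted during technician quality control
    changed = (df["temp"] != df["temp_cor"]).sum()
    print(f"{changed} temperature values were adjusted during QC")
    ~~~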

  2. Data from: A Python-based pipeline for preprocessing LC-MS data for...

    • data.niaid.nih.gov
    xml
    Updated Nov 21, 2020
    Cite
    NICOLAS ZABALEGUI (2020). A Python-based pipeline for preprocessing LC-MS data for untargeted metabolomics workflows [Dataset]. https://data.niaid.nih.gov/resources?id=mtbls1919
    Explore at:
    xmlAvailable download formats
    Dataset updated
    Nov 21, 2020
    Dataset provided by
    CIBION-CONICET
    Authors
    NICOLAS ZABALEGUI
    Variables measured
    Metabolomics
    Description

    Preprocessing data in a reproducible and robust way is one of the current challenges in untargeted metabolomics workflows. Data curation in liquid chromatography-mass spectrometry (LC-MS) involves the removal of unwanted features (retention time; m/z pairs) to retain only high-quality data for subsequent analysis and interpretation. The present work introduces a package for the Python programming language for preprocessing LC-MS data for quality control procedures in untargeted metabolomics workflows. It is a versatile strategy that can be customized or fit for purpose according to the specific metabolomics application. It allows performing quality control procedures to ensure accuracy and reliability in LC-MS measurements, and it allows preprocessing metabolomics data to obtain cleaned matrices for subsequent statistical analysis. The capabilities of the package are showcased with pipelines for an LC-MS system suitability check, system conditioning, signal drift evaluation, and data curation. These applications were implemented to preprocess data corresponding to a new suite of candidate plasma reference materials developed by the National Institute of Standards and Technology (NIST; hypertriglyceridemic, diabetic, and African-American plasma pools) to be used in untargeted metabolomics studies, in addition to NIST SRM 1950 – Metabolites in Frozen Human Plasma. The package offers a rapid and reproducible workflow that can be used in an automated or semi-automated fashion, and it is an open and free tool available to all users.

  3. How Python Can Work For You

    • cope-open-data-deegsnccu.hub.arcgis.com
    • code-deegsnccu.hub.arcgis.com
    • +1more
    Updated Aug 26, 2023
    Cite
    East Carolina University (2023). How Python Can Work For You [Dataset]. https://cope-open-data-deegsnccu.hub.arcgis.com/datasets/ECU::how-python-can-work-for-you-
    Explore at:
    Dataset updated
    Aug 26, 2023
    Dataset authored and provided by
    East Carolina University
    Description

    Python is a free, general-purpose programming language that prioritizes human readability. It is one of the easier languages to learn and start with, especially with no prior programming knowledge. I have been using Python for Excel spreadsheet automation, data analysis, and data visualization, and it has allowed me to focus on automating my data analysis workload. I am currently examining the North Carolina Department of Environmental Quality (NCDEQ) database of water quality sampling for the Town of Nags Head, NC. It spans 26 years (1997-2023) and currently lists 41 different testing site locations. The full NCDEQ dataset contains 148,204 testing data points for the entire state; of these, 34,759 data points are from Dare County (Nags Head) specifically, further subdivided by testing site.
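
    A hypothetical pandas sketch of the kind of filtering described above; the file name and column names ("County", "Site") are assumptions, not the actual NCDEQ schema.

    ~~~
    import pandas as pd

    ncdeq = pd.read_csv("ncdeq_water_quality.csv")   # hypothetical export of the ~148,204 statewide records
    dare = ncdeq[ncdeq["County"] == "Dare"]          # subset of ~34,759 Dare County (Nags Head) records

    # Break the county subset down by testing site location
    print(dare.groupby("Site").size().sort_values(ascending=False))
    ~~~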

  4. Data from: Code Repository of "Advancing Data Quality Assurance with Machine...

    • data.4tu.nl
    zip
    Updated Nov 22, 2024
    Cite
    Vincent S. de Feiter; Jessica M.I. Strickland; Irene Garcia-Marti (2024). Code Repository of "Advancing Data Quality Assurance with Machine Learning: A Case Study on Wind Vane Stalling Detection" [Dataset]. http://doi.org/10.4121/6bee84b7-0088-4232-badb-9b3b38a3c40e.v1
    Explore at:
    zipAvailable download formats
    Dataset updated
    Nov 22, 2024
    Dataset provided by
    4TU.ResearchData
    Authors
    Vincent S. de Feiter; Jessica M.I. Strickland; Irene Garcia-Marti
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2024
    Area covered
    Description

    This is the code repository to replicate the work (e.g., figures and results) from the publication "Advancing Data Quality Assurance with Machine Learning: A Case Study on Wind Vane Stalling Detection". The repository includes dedicated Python files and a README document.

  5. Techniques for Increased Automation of Aquatic Sensor Data Post Processing...

    • search.dataone.org
    • hydroshare.org
    Updated Dec 5, 2021
    Cite
    Amber Spackman Jones; Jeffery S. Horsburgh; Tannner Jones (2021). Techniques for Increased Automation of Aquatic Sensor Data Post Processing in Python: Video Presentation [Dataset]. https://search.dataone.org/view/sha256%3Ac5b617be5f503d53736c7b2393b85b95f764e569a31935c4829ced0a048c5760
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Amber Spackman Jones; Jeffery S. Horsburgh; Tannner Jones
    Description

    This resource contains a video recording for a presentation given as part of the National Water Quality Monitoring Council conference in April 2021. The presentation covers the motivation for performing quality control for sensor data, the development of PyHydroQC, a Python package with functions for automating sensor quality control including anomaly detection and correction, and the performance of the algorithms applied to data from multiple sites in the Logan River Observatory.

    The initial abstract for the presentation: Water quality sensors deployed to aquatic environments make measurements at high frequency and commonly include artifacts that do not represent the environmental phenomena targeted by the sensor. Sensors are subject to fouling from environmental conditions, often exhibit drift and calibration shifts, and report anomalies and erroneous readings due to issues with datalogging, transmission, and other unknown causes. The suitability of data for analyses and decision making often depends on subjective and time-consuming quality control processes consisting of manual review and adjustment of data. Data-driven and machine learning techniques have the potential to automate identification and correction of anomalous data, streamlining the quality control process. We explored documented approaches and selected several for implementation in a reusable, extensible Python package designed for anomaly detection for aquatic sensor data. Implemented techniques include regression approaches that estimate values in a time series, flag a point as anomalous if the difference between the estimated value and the sensor measurement exceeds a threshold, and offer replacement values for correcting anomalies. Additional algorithms that scaffold the central regression approaches include rules-based preprocessing, thresholds for determining anomalies that adjust with data variability, and the ability to detect and correct anomalies using forecasted and backcasted estimation. The techniques were developed and tested based on several years of data from aquatic sensors deployed at multiple sites in the Logan River Observatory in northern Utah, USA. Performance was assessed based on labels and corrections applied previously by trained technicians. In this presentation, we describe the techniques for detection and correction, report their performance, illustrate the workflow for applying them to high-frequency aquatic sensor data, and demonstrate the possibility for additional approaches to help increase automation of aquatic sensor data post processing.
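
    The PyHydroQC package implements these techniques; the sketch below is only a generic illustration of the detect-and-correct idea (estimate each point, flag it when the residual exceeds a variability-scaled threshold, offer the estimate as a replacement), not the package's actual API.

    ~~~
    import numpy as np
    import pandas as pd

    def detect_and_correct(series, window=60, n_scales=4.0):
        # Estimate each point from its neighbours with a centered rolling median
        estimate = series.rolling(window, center=True, min_periods=1).median()
        residual = series - estimate
        # Threshold that adjusts with local data variability
        scale = residual.abs().rolling(window, center=True, min_periods=1).median().clip(lower=1e-6)
        anomalous = residual.abs() > n_scales * scale
        corrected = series.where(~anomalous, estimate)   # offer the estimate as replacement
        return anomalous, corrected

    # Synthetic 15-minute series with one injected spike
    t = pd.date_range("2021-01-01", periods=500, freq="15min")
    signal = pd.Series(np.sin(np.linspace(0, 20, 500)) + 8.0, index=t)
    signal.iloc[250] += 5.0
    flags, cleaned = detect_and_correct(signal)
    print(flags.sum(), "points flagged")
    ~~~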

  6. Quality Inspection Dataset

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Cite
    Python Developer (2025). Quality Inspection Dataset [Dataset]. https://www.kaggle.com/datasets/programmer3/quality-inspection-dataset
    Explore at:
    zip(226641 bytes)Available download formats
    Dataset updated
    Jul 31, 2025
    Authors
    Python Developer
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset consists of 1854 rows of real-world sensor data collected from automated manufacturing systems to detect and classify production faults. It includes domain-specific features such as temperature, vibration, acoustic signals, humidity, pressure, motor current, RPM, surface reflectance, machine cycle time, and tool wear level. Each instance is labeled with a fault type (normal, surface crack, overheating, vibration anomaly).
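
    A minimal baseline sketch for the fault-classification task described above; the file name and the "fault_type" label column are assumptions to check against the actual CSV headers.

    ~~~
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    df = pd.read_csv("quality_inspection.csv")          # hypothetical file name
    X = df.drop(columns=["fault_type"])                 # sensor features
    y = df["fault_type"]                                # normal / surface crack / ...

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))
    ~~~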

  7. Insurance Dataset for Data Engineering Practice

    • kaggle.com
    zip
    Updated Sep 24, 2025
    Cite
    KPOVIESI Olaolouwa Amiche StƩphane (2025). Insurance Dataset for Data Engineering Practice [Dataset]. https://www.kaggle.com/datasets/kpoviesistphane/insurance-dataset-for-data-engineering-practice
    Explore at:
    zip(475362 bytes)Available download formats
    Dataset updated
    Sep 24, 2025
    Authors
    KPOVIESI Olaolouwa Amiche StƩphane
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Insurance Dataset for Data Engineering Practice

    Overview

    A realistic synthetic French insurance dataset specifically designed for practicing data cleaning, transformation, and analytics with PySpark and other big data tools. This dataset contains intentional data quality issues commonly found in real-world insurance data.

    Dataset Contents

    šŸ“Š Three Main Tables:

    • contracts.csv (~15,000 rows) - Insurance contracts with client information
    • claims.csv (~6,000 rows) - Insurance claims with damage and settlement details
    • vehicles.csv (~12,000 rows) - Vehicle information for auto insurance contracts

    šŸ—ŗļø Geographic Coverage:

    • French cities with realistic postal codes
    • Risk zone classifications (High/Medium/Low)
    • Regional pricing coefficients

    šŸ·ļø Product Types:

    • Auto Insurance (majority)
    • Home Insurance
    • Life Insurance
    • Health Insurance

    šŸŽÆ Intentional Data Quality Issues

    Perfect for practicing data cleaning and transformation:

    Date Format Issues:

    • Mixed formats: 2024-01-15, 15/01/2024, 01/15/2024
    • String storage requiring parsing and standardization

    Price Format Inconsistencies:

    • Multiple currency formats: 1250.50€, €1250.50, 1250.50 EUR, $1375.55
    • Missing currency symbols: 1250.50
    • Written formats: 1250.50 euros

    Missing Data Patterns:

    • Strategic missingness in age (8%), CSP (12%), expert_id (20-25%)
    • Realistic patterns based on business logic

    Categorical Inconsistencies:

    • Gender: M, F, Male, Female, empty strings
    • Power units: 150 HP, 150hp, 150 CV, 111 kW, missing values

    Data Type Issues:

    • Numeric values stored as strings
    • Mixed data types requiring casting

    šŸš€ Perfect for Practicing:

    PySpark Operations (see the sketch after this list):

    • to_date() and date parsing functions
    • regexp_replace() for price cleaning
    • when().otherwise() conditional logic
    • cast() for data type conversions
    • fillna() and dropna() strategies
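
    A hedged sketch applying the operations listed above to contracts.csv; the column names subscription_date and premium are assumptions about the schema, used only to illustrate the cleaning patterns.

    ~~~
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("insurance-cleaning").getOrCreate()
    contracts = spark.read.csv("contracts.csv", header=True)

    cleaned = (
        contracts
        # Parse mixed string dates; coalesce tries each known format in turn
        .withColumn(
            "subscription_date",
            F.coalesce(
                F.to_date("subscription_date", "yyyy-MM-dd"),
                F.to_date("subscription_date", "dd/MM/yyyy"),
                F.to_date("subscription_date", "MM/dd/yyyy"),
            ),
        )
        # Strip currency symbols/words from prices, then cast to double
        .withColumn(
            "premium",
            F.trim(F.regexp_replace("premium", r"[€$]|EUR|euros", "")).cast("double"),
        )
        # Standardise categorical gender values, then fill the gaps
        .withColumn(
            "gender",
            F.when(F.col("gender").isin("M", "Male"), "M")
             .when(F.col("gender").isin("F", "Female"), "F")
             .otherwise(None),
        )
        .fillna({"gender": "Unknown"})
    )
    cleaned.show(5)
    ~~~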

    Data Engineering Tasks:

    • ETL pipeline development
    • Data validation and quality checks
    • Join operations across related tables
    • Aggregation with business logic
    • Data standardization workflows

    Analytics & ML:

    • Customer segmentation
    • Claim frequency analysis
    • Premium pricing models
    • Risk assessment by geography
    • Churn prediction

    šŸ¢ Business Context

    Realistic insurance business rules implemented:

    • Age-based premium adjustments
    • Geographic risk zone pricing
    • Product-specific claim patterns
    • Seasonal claim distributions
    • Client lifecycle status transitions

    šŸ’” Use Cases:

    • Data Engineering Bootcamps: Hands-on PySpark practice
    • SQL Training: Complex joins and aggregations
    • Data Science Projects: End-to-end ML pipeline development
    • Business Intelligence: Dashboard and reporting practice
    • Data Quality Workshops: Cleaning and validation techniques

    šŸ”§ Tools Compatibility:

    • Apache Spark / PySpark
    • Pandas / Python
    • SQL databases
    • Databricks
    • Google Cloud Dataflow
    • AWS Glue

    šŸ“ˆ Difficulty Level:

    Intermediate - Suitable for learners with basic Python/SQL knowledge ready to tackle real-world data challenges.

    Generated with realistic French business context and intentional quality issues for educational purposes. All data is synthetic and does not represent real individuals or companies.

  8. Omega-Prime: Data Model, Data Format and Python Library for Handling Ground...

    • doi.org
    • zenodo.org
    zip
    Updated Jul 29, 2025
    Cite
    Sven Tarlowski; Sven Tarlowski (2025). Omega-Prime: Data Model, Data Format and Python Library for Handling Ground Truth Traffic Data [Dataset]. http://doi.org/10.5281/zenodo.16565213
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jul 29, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Sven Tarlowski; Sven Tarlowski
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Model, Format and Python Library for ground truth data containing information on dynamic objects, map and environmental factors optimized for representing urban traffic. The repository contains:

    Data Model and Specification

    see ./docs/omega_prime_specification.md

    • šŸŒ Data Model: What signals exist and how these are defined.
    • 🧾 Data Format Specification: How to exchange and store those signals.

    Python Library

    • šŸ”Ø Create omega-prime files from many sources (see ./tutorial.ipynb)
    • šŸ—ŗļø Map Association: Associate Object Location with Lanes from OpenDRIVE or OSI Maps (see tutorial_locator.ipynb)
    • šŸ“ŗ Plotting of data: interactive top view plots using altair
    • āœ… Validation of data: check if your data conforms to the omega-prime specification (e.g., correct yaw) using pandera
    • šŸ“ Interpolation of data: bring your data into a fixed frequency
    • šŸ“ˆ Metrics: compute interaction metrics like PET, TTC, THW (see tutorial_metrics.ipynb)
    • šŸš€ Fast Processing directly on DataFrames using polars, polars-st

    The data model and format utilize ASAM OpenDRIVE and ASAM Open-Simulation-Interface GroundTruth messages. omega-prime sets requirements on presence and quality of ASAM OSI GroundTruth messages and ASAM OpenDRIVE files and defines a file format for the exchange and storage of these.

    Omega-Prime is the successor of the OMEGAFormat. It has the benefit that its definition is directly based on the established standards ASAM OSI and ASAM OpenDRIVE and carries over the data quality requirements and the data tooling from OMEGAFormat. Therefore, it should be easier to incorporate omega-prime into existing workflows and tooling.

    To learn more about the example data read example_files/README.md. Example data was taken and created from esmini.

  9. New Zealand Hydrological Society Data Workshop 2024: A Python Package for...

    • beta.hydroshare.org
    • hydroshare.org
    • +1more
    zip
    Updated Apr 9, 2024
    Cite
    Amber Spackman Jones (2024). New Zealand Hydrological Society Data Workshop 2024: A Python Package for Automating Aquatic Data QA/QC [Dataset]. https://beta.hydroshare.org/resource/5e942e193e494f3fab89dc317d8084fa/
    Explore at:
    zip(159.6 MB)Available download formats
    Dataset updated
    Apr 9, 2024
    Dataset provided by
    HydroShare
    Authors
    Amber Spackman Jones
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    New Zealand
    Description

    This resource was created for the 2024 New Zealand Hydrological Society Data Workshop in Queenstown, NZ. This resource contains Jupyter Notebooks with examples for conducting quality control post processing for in situ aquatic sensor data. The code uses the Python pyhydroqc package to detect anomalies. This resource consists of 3 example notebooks and associated data files. For more information, see the original resource from which this was derived: http://www.hydroshare.org/resource/451c4f9697654b1682d87ee619cd7924.

    Notebooks:
    1. Example 1: Import and plot data
    2. Example 2: Perform rules-based quality control
    3. Example 3: Perform model-based quality control (ARIMA)
    4. Example 4: Model-based quality control (ARIMA) with user data

    Data files: Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year and is named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database, and equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: a datetime index column (mountain standard time) with three columns for each variable. Variable abbreviations and units are:

    • temp: water temperature, degrees C
    • cond: specific conductance, μS/cm
    • ph: pH, standard units
    • do: dissolved oxygen, mg/L
    • turb: turbidity, NTU
    • stage: stage height, cm

    For each variable, there are 3 columns:

    • Raw data value measured by the sensor (column header is the variable abbreviation).
    • Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
    • Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').

    There is also a file "data.csv" for use with Example 4. Users who want to bring their own data file should structure it like this file: a single column of datetime values and a single column of numeric observations labeled "raw".
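
    A small pandas sketch showing how a user might reshape their own observations into that structure; the input file and its column names are hypothetical.

    ~~~
    import pandas as pd

    own = pd.read_csv("my_sensor_export.csv")            # hypothetical input file
    prepared = pd.DataFrame({
        "datetime": pd.to_datetime(own["timestamp"]),    # assumed source column name
        "raw": pd.to_numeric(own["value"], errors="coerce"),
    })
    prepared.to_csv("data.csv", index=False)             # structure expected by Example 4
    ~~~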

  10. Benchmark data set for MSPypeline, a python package for streamlined mass...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    xml
    Updated Jul 22, 2021
    Cite
    Alexander Held; Ursula Klingmüller (2021). Benchmark data set for MSPypeline, a python package for streamlined mass spectrometry-based proteomics data analysis [Dataset]. https://data-staging.niaid.nih.gov/resources?id=pxd025792
    Explore at:
    xmlAvailable download formats
    Dataset updated
    Jul 22, 2021
    Dataset provided by
    DKFZ Heidelberg
    Division Systems Biology of Signal Transduction, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
    Authors
    Alexander Held; Ursula Klingmüller
    Variables measured
    Proteomics
    Description

    Mass spectrometry-based proteomics is increasingly employed in biology and medicine. To generate reliable information from large data sets and ensure comparability of results it is crucial to implement and standardize the quality control of the raw data, the data processing steps and the statistical analyses. The MSPypeline provides a platform for the import of MaxQuant output tables, the generation of quality control reports, the preprocessing of data including normalization and exploratory analyses by statistical inference plots. These standardized steps assess data quality, provide customizable figures and enable the identification of differentially expressed proteins to reach biologically relevant conclusions.

  11. Data from: Open-source quality control routine and multi-year power...

    • zenodo.org
    • data.niaid.nih.gov
    csv, text/x-python +1
    Updated Aug 1, 2022
    + more versions
    Cite
    Lennard Visser; Lennard Visser; Boudewijn Elsinga; Tarek AlSkaif; Tarek AlSkaif; Wilfried van Sark; Wilfried van Sark; Boudewijn Elsinga (2022). Open-source quality control routine and multi-year power generation data of 175 PV systems [Dataset]. http://doi.org/10.5281/zenodo.6906504
    Explore at:
    text/x-python, csv, zipAvailable download formats
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Lennard Visser; Lennard Visser; Boudewijn Elsinga; Tarek AlSkaif; Tarek AlSkaif; Wilfried van Sark; Wilfried van Sark; Boudewijn Elsinga
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The repository contains an extensive dataset of PV power measurements and a Python package (qcpv) for quality controlling PV power measurements. The dataset features four years (2014-2017) of power measurements of 175 rooftop-mounted residential PV systems located in Utrecht, the Netherlands. The power measurements have a 1-min resolution.

    PV power measurements

    Three different versions of the power measurements are included in three data subsets in the repository. Unfiltered power measurements are enclosed in unfiltered_pv_power_measurements.csv. Filtered power measurements are included as filtered_pv_power_measurements_sc.csv and filtered_pv_power_measurements_ac.csv. The former contains the quality controlled power measurements after running single-system filters only; the latter contains the output after running both single- and across-system filters. The metadata of the PV systems is added in metadata.csv. For each PV system, this file holds a unique ID, start and end time of registered power measurements, estimated DC and AC capacity, tilt and azimuth angle, annual yield, and mapped grids of the system location (north, south, west and east boundary).
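
    A minimal pandas sketch for comparing the unfiltered and filtered subsets; it assumes the CSVs are indexed by the UTC timestamp column, so adjust to the actual file layout. The qcpv package itself (qcpv.py, with sample calls in example.py) performs the actual filtering.

    ~~~
    import pandas as pd

    raw = pd.read_csv("unfiltered_pv_power_measurements.csv", index_col=0, parse_dates=True)
    flt = pd.read_csv("filtered_pv_power_measurements_ac.csv", index_col=0, parse_dates=True)

    # Fraction of 1-min power readings (in Watt) removed by the quality control
    removed = 1 - flt.count().sum() / raw.count().sum()
    print(f"QC filters removed {removed:.1%} of the measurements")
    ~~~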

    Quality control routine

    An open-source quality control routine that can be applied to filter erroneous PV power measurements is added to the repository in the form of the Python package qcpv (qcpv.py). Sample code to call and run the functions in the qcpv package is available as example.py.

    Objective

    By publishing the dataset we provide access to high quality PV power measurements that can be used for research experiments on several topics related to PV power and the integration of PV in the electricity grid.

    By publishing the qcpv package we strive to set a next step into developing a standardized routine for quality control of PV power measurements. We hope to stimulate others to adopt and improve the routine of quality control and work towards a widely adopted standardized routine.

    Data usage

    If you use the data and/or python package in a published work please cite: Visser, L., Elsinga, B., AlSkaif, T., van Sark, W., 2022. Open-source quality control routine and multi-year power generation data of 175 PV systems. Journal of Renewable and Sustainable Energy.

    Units

    Timestamps are in UTC (YYYY-MM-DD HH:MM:SS+00:00).

    Power measurements are in Watt.

    Installed capacities (DC and AC) are in Watt-peak.

    Additional information

    A detailed discussion of the data and qcpv package is presented in: Visser, L., Elsinga, B., AlSkaif, T., van Sark, W., 2022. Open-source quality control routine and multi-year power generation data of 175 PV systems. Journal of Renewable and Sustainable Energy.

    Acknowledgements

    This work is part of the Energy Intranets (NEAT: ESI-BiDa 647.003.002) project, which is funded by the Dutch Research Council NWO in the framework of the Energy Systems Integration & Big Data programme. The authors would especially like to thank the PV owners who volunteered to take part in the measurement campaign.

  12. Golgi_HCS_Data_Analysis_Tool

    • data.mendeley.com
    Updated Sep 8, 2017
    + more versions
    Cite
    Shaista Hussain (2017). Golgi_HCS_Data_Analysis_Tool [Dataset]. http://doi.org/10.17632/pp282j4h29.2
    Explore at:
    Dataset updated
    Sep 8, 2017
    Authors
    Shaista Hussain
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data consists of a Golgi image dataset and the pipeline to perform unsupervised phenotypic analysis on these images. The data is presented as a zipped file 'Golgi_HCA_workflow.zip' and its contents include:

    1) Data folder 'snare_2' containing vignettes of Golgi images (.jpg) acquired from multiple fields of multiple wells and numerical data (.sta) corresponding to the image features extracted for each Golgi image.
    2) Plate map folder 'plate_maps' containing the .csv plate map file for the 'snare_2' dataset with the well locations for all the siRNA treatments.
    3) Repository folder 'repository' containing 'nqc.h5'. A labeled set of good and bad nuclei was used to train the nuclei quality control (NQC) classifier; the results of this pre-trained classifier are included in 'nqc.h5' for the convenience of users.
    4) Two Python scripts: 'control_model_utils.py' for the control modeling module of the pipeline, and 'HCA_workflow.py', the main script for running the entire pipeline.
    5) README file describing the steps to download and install this package and the Python software needed to run it.

  13. python-doctest-corpus-test

    • huggingface.co
    Updated Oct 18, 2007
    Cite
    Noah Gift (2007). python-doctest-corpus-test [Dataset]. https://huggingface.co/datasets/paiml/python-doctest-corpus-test
    Explore at:
    Dataset updated
    Oct 18, 2007
    Authors
    Noah Gift
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Python Doctest Corpus

    A curated corpus of Python doctest examples designed for training Python-to-Rust transpilers and testing code translation systems.

      Dataset Description
    

    This dataset contains Python function signatures, doctest inputs, and expected outputs that serve as high-quality training data for:

    • Transpilation training: Teaching models to translate Python patterns to Rust
    • Test validation: Verifying that transpiled code produces correct outputs
    • Code understanding: …

    See the full description on the dataset page: https://huggingface.co/datasets/paiml/python-doctest-corpus-test.
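
    A minimal sketch for loading the corpus with the Hugging Face datasets library; the split and field names are assumptions to verify on the dataset page.

    ~~~
    from datasets import load_dataset

    corpus = load_dataset("paiml/python-doctest-corpus-test")
    print(corpus)               # shows the available splits and features
    print(corpus["train"][0])   # inspect one example (assuming a "train" split exists)
    ~~~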

  14. zip of MetDataModel v0.6.1.

    • plos.figshare.com
    zip
    Updated Jun 18, 2024
    Cite
    Joshua M. Mitchell; Yuanye Chi; Maheshwor Thapa; Zhiqiang Pang; Jianguo Xia; Shuzhao Li (2024). zip of MetDataModel v0.6.1. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011912.s007
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jun 18, 2024
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Joshua M. Mitchell; Yuanye Chi; Maheshwor Thapa; Zhiqiang Pang; Jianguo Xia; Shuzhao Li
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To standardize metabolomics data analysis and facilitate future computational developments, it is essential to have a set of well-defined templates for common data structures. Here we describe a collection of data structures involved in metabolomics data processing and illustrate how they are utilized in a full-featured Python-centric pipeline. We demonstrate the performance of the pipeline, and the details in annotation and quality control using large-scale LC-MS metabolomics and lipidomics data and LC-MS/MS data. Multiple previously published datasets are also reanalyzed to showcase its utility in biological data analysis. This pipeline allows users to streamline data processing, quality control, annotation, and standardization in an efficient and transparent manner. This work fills a major gap in the Python ecosystem for computational metabolomics.

  15. MOSTLY AI Prize Data

    • kaggle.com
    zip
    Updated May 16, 2025
    + more versions
    Cite
    ivonaK (2025). MOSTLY AI Prize Data [Dataset]. https://www.kaggle.com/datasets/ivonav/mostly-ai-prize-data/code
    Explore at:
    zip(9871594 bytes)Available download formats
    Dataset updated
    May 16, 2025
    Authors
    ivonaK
    License

    Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    Competition

    • Generate the BEST tabular synthetic data and win 100,000 USD in cash.
    • Competition runs for 50 days: May 14 - July 3, 2025.
    • MOSTLY AI Prize

    This competition features two independent synthetic data challenges that you can join separately:

    • The FLAT DATA Challenge
    • The SEQUENTIAL DATA Challenge

    For each challenge, generate a dataset with the same size and structure as the original, capturing its statistical patterns, but without being significantly closer to the (released) original samples than to the (unreleased) holdout samples.

    Train a generative model that generalizes well, using any open-source tools (Synthetic Data SDK, synthcity, reprosyn, etc.) or your own solution. Submissions must be fully open-source, reproducible, and runnable within 6 hours on a standard machine.

    Timeline

    • Submissions open: May 14, 2025, 15:30 UTC
    • Submission credits: 3 per calendar week (+bonus)
    • Submissions close: July 3, 2025, 23:59 UTC
    • Evaluation of Leaders: July 3 - July 9
    • Winners announced: on July 9 šŸ†

    Datasets

    Flat Data
    • 100,000 records
    • 80 data columns: 60 numeric, 20 categorical

    Sequential Data
    • 20,000 groups
    • each group contains 5-10 records
    • 10 data columns: 7 numeric, 3 categorical

    Evaluation

    • CSV submissions are parsed using pandas.read_csv() and checked for expected structure & size (see the sketch after this list)
    • Evaluated using the Synthetic Data Quality Assurance toolkit
    • Compared against the released training set and a hidden holdout set (same size, non-overlapping, from the same source)
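
    A hedged sketch of the structure-and-size check described above for a FLAT DATA submission; file names are placeholders, and the official Synthetic Data Quality Assurance toolkit performs the authoritative evaluation.

    ~~~
    import pandas as pd

    EXPECTED_ROWS, EXPECTED_COLS = 100_000, 80           # FLAT DATA challenge shape

    submission = pd.read_csv("flat_submission.csv")      # hypothetical submission file
    training = pd.read_csv("flat_training.csv")          # released original data

    assert submission.shape == (EXPECTED_ROWS, EXPECTED_COLS), "unexpected size"
    assert list(submission.columns) == list(training.columns), "unexpected columns"
    print("submission structure looks valid")
    ~~~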

    Submission

    MOSTLY AI Prize

    Citation

    If you use this dataset in your research, please cite:

    @dataset{mostlyaiprize,
     author = {MOSTLY AI},
     title = {MOSTLY AI Prize Dataset},
     year = {2025},
     url = {https://www.mostlyaiprize.com/},
    }
    
  16. E-commerce_dataset

    • kaggle.com
    zip
    Updated Nov 16, 2025
    Cite
    Abhay Ayare (2025). E-commerce_dataset [Dataset]. https://www.kaggle.com/datasets/abhayayare/e-commerce-dataset
    Explore at:
    zip(644123 bytes)Available download formats
    Dataset updated
    Nov 16, 2025
    Authors
    Abhay Ayare
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    E-commerce_dataset

    This dataset is a synthetic yet realistic E-commerce retail dataset generated programmatically using Python (Faker + NumPy + Pandas).
    It is designed to closely mimic real-world online shopping behavior, user patterns, product interactions, seasonal trends, and marketplace events.
    
    

    You can use this dataset for:

    Machine Learning & Deep Learning
    Recommender Systems
    Customer Segmentation
    Sales Forecasting
    A/B Testing
    E-commerce Behaviour Analysis
    Data Cleaning / Feature Engineering Practice
    SQL practice
    

    šŸ“ **Dataset Contents**

    The dataset contains 6 CSV files:
    ~~~
    File             Rows     Description
    users.csv        ~10,000  User profiles, demographics & signup info
    products.csv     ~2,000   Product catalog with rating and pricing
    orders.csv       ~20,000  Order-level transactions
    order_items.csv  ~60,000  Items purchased per order
    reviews.csv      ~15,000  Customer-written product reviews
    events.csv       ~80,000  User event logs: view, cart, wishlist, purchase
    ~~~

    🧬 Data Dictionary

    1. Users (users.csv)
    Column Description
    user_id Unique user identifier
    name  Full customer name
    email  Email (synthetic, no real emails)
    gender Male / Female / Other
    city  City of residence
    signup_date Account creation date
    
    2. Products (products.csv)
    Column Description
    product_id Unique product identifier
    product_name  Product title
    category  Electronics, Clothing, Beauty, Home, Sports, etc.
    price  Actual selling price
    rating Average product rating
    
    3. Orders (orders.csv)
    Column Description
    order_id  Unique order identifier
    user_id User who placed the order
    order_date Timestamp of the order
    order_status  Completed / Cancelled / Returned
    total_amount  Total order value
    
    4. Order Items (order_items.csv)
    Column Description
    order_item_id  Unique identifier
    order_id  Associated order
    product_id Purchased product
    quantity  Quantity purchased
    item_price Price per unit
    
    5. Reviews (reviews.csv)
    Column Description
    review_id  Unique review identifier
    user_id User who submitted review
    product_id Reviewed product
    rating 1-5 star rating
    review_text Short synthetic review
    review_date Submission date
    
    6. Events (events.csv)
    Column Description
    event_id  Unique event identifier
    user_id User performing event
    product_id Viewed/added/purchased product
    event_type view/cart/wishlist/purchase
    event_timestamp Timestamp of event
    

    🧠 Possible Use Cases (Ideas & Projects)

    šŸ” Machine Learning

    Customer churn prediction
    Review sentiment analysis (NLP)
    Recommendation engines
    Price optimization models
    Demand forecasting (Time-series)
    

    šŸ“¦ Business Analytics

    Market basket analysis
    RFM segmentation
    Cohort analysis
    Funnel conversion tracking
    A/B testing simulations
    

    🧮 SQL Practice

    Joins
    Window functions
    Aggregations
    CTE-based funnels
    Complex queries
    

    šŸ›  How the Dataset Was Generated

    The dataset was generated entirely in Python using:

    Faker for realistic user and review generation
    NumPy for probability-based event modeling
    Pandas for data processing
    

    Custom logic for:

    demand variation
    user behavior simulation
    return/cancel probabilities
    seasonal order timestamp distribution
    The dataset does not include any real personal data.
    Everything is generated synthetically.
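
    A simplified sketch, not the author's actual generator, of how such synthetic records can be produced with Faker, NumPy, and Pandas (here for a small users table).

    ~~~
    import numpy as np
    import pandas as pd
    from faker import Faker

    fake = Faker()
    rng = np.random.default_rng(42)
    n_users = 100

    users = pd.DataFrame({
        "user_id": range(1, n_users + 1),
        "name": [fake.name() for _ in range(n_users)],
        "email": [fake.email() for _ in range(n_users)],        # synthetic, no real emails
        "gender": rng.choice(["Male", "Female", "Other"], size=n_users, p=[0.48, 0.48, 0.04]),
        "city": [fake.city() for _ in range(n_users)],
        "signup_date": [fake.date_between(start_date="-3y", end_date="today") for _ in range(n_users)],
    })
    print(users.head())
    ~~~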
    

    āš ļø License

    This dataset is released under CC BY 4.0, free to use for:
    Research
    Education
    Commercial projects
    Kaggle competitions
    Machine learning pipelines
    Just provide attribution.
    

    ⭐ If you found this dataset helpful, please:

    Upvote the dataset
    Leave a comment
    Share your notebooks using it
    
  17. SDMdata: A Web-Based Software Tool for Collecting Species Occurrence Records...

    • plos.figshare.com
    doc
    Updated May 31, 2023
    Cite
    Xiaoquan Kong; Minyi Huang; Renyan Duan (2023). SDMdata: A Web-Based Software Tool for Collecting Species Occurrence Records [Dataset]. http://doi.org/10.1371/journal.pone.0128295
    Explore at:
    docAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    Xiaoquan Kong; Minyi Huang; Renyan Duan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It is important to easily and efficiently obtain high quality species distribution data for predicting the potential distribution of species using species distribution models (SDMs). There is a need for a powerful software tool to automatically or semi-automatically assist in identifying and correcting errors. Here, we use Python to develop a web-based software tool (SDMdata) to easily collect occurrence data from the Global Biodiversity Information Facility (GBIF) and check species names and the accuracy of coordinates (latitude and longitude). It is open-source software (GNU Affero General Public License/AGPL licensed), allowing anyone to access and manipulate the source code. SDMdata is available online free of charge from .

  18. Replication data for: Autonomous flow routing for near real-time quality of...

    • dataverse.csuc.cat
    ods, pdf, txt
    Updated Jun 7, 2024
    Cite
    Sima Barzegar; Sima Barzegar; Marc Ruiz RamĆ­rez; Marc Ruiz RamĆ­rez; Luis Domingo Velasco Esteban; Luis Domingo Velasco Esteban (2024). Replication data for: Autonomous flow routing for near real-time quality of service assurance [Dataset]. http://doi.org/10.34810/data1404
    Explore at:
    txt(8187), pdf(514866), ods(1275613)Available download formats
    Dataset updated
    Jun 7, 2024
    Dataset provided by
    CORA.Repositori de Dades de Recerca
    Authors
    Sima Barzegar; Sima Barzegar; Marc Ruiz RamĆ­rez; Marc Ruiz RamĆ­rez; Luis Domingo Velasco Esteban; Luis Domingo Velasco Esteban
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    ICREA
    https://ror.org/00k4n6c32
    https://ror.org/003x0zc53
    Description

    The datasets are the output of a Python-based control plane simulator containing the deep reinforcement learning-based flow operation module, a lightweight Software-Defined Networking controller, the flow routing manager, and the packet node agents. The data plane was accurately emulated using a simulator based on CURSA-SQ.

  19. Full Range Heat Anomalies - USA 2022

    • hub.arcgis.com
    • giscommons-countyplanning.opendata.arcgis.com
    Updated Mar 11, 2023
    + more versions
    Cite
    The Trust for Public Land (2023). Full Range Heat Anomalies - USA 2022 [Dataset]. https://hub.arcgis.com/datasets/26b8ebf70dfc46c7a5eb099a2380ee1d
    Explore at:
    Dataset updated
    Mar 11, 2023
    Dataset authored and provided by
    The Trust for Public Land
    Area covered
    Description

    Notice: this is not the latest Heat Island Anomalies image service. This layer contains the relative degrees Fahrenheit difference between any given pixel and the mean heat value for the city in which it is located, for every city in the contiguous United States, Alaska, Hawaii, and Puerto Rico. This 30-meter raster was derived from Landsat 8 imagery band 10 (ground-level thermal sensor) from the summer of 2022, with patching from the summer of 2021 where necessary.

    Federal statistics over a 30-year period show extreme heat is the leading cause of weather-related deaths in the United States. Extreme heat exacerbated by urban heat islands can lead to increased respiratory difficulties, heat exhaustion, and heat stroke. These heat impacts significantly affect the most vulnerable: children, the elderly, and those with preexisting conditions.

    The purpose of this layer is to show where certain areas of cities are hotter or cooler than the average temperature for that same city as a whole. This dataset represents a snapshot in time. It will be updated yearly, but is static between updates. It does not take into account changes in heat during a single day, for example, from building shadows moving. The thermal readings detected by the Landsat 8 sensor are surface-level, whether that surface is the ground or the top of a building. Although there is strong correlation between surface temperature and air temperature, they are not the same. We believe that this is useful at the national level, and for cities that don't have the ability to conduct their own hyper-local temperature survey. Where local data is available, it may be more accurate than this dataset.

    Dataset Summary

    This dataset was developed using proprietary Python code developed at The Trust for Public Land, running on the Descartes Labs platform through the Descartes Labs API for Python. The Descartes Labs platform allows for extremely fast retrieval and processing of imagery, which makes it possible to produce heat island data for all cities in the United States in a relatively short amount of time. In order to click on the image service and see the raw pixel values in a map viewer, you must be signed in to ArcGIS Online, then Enable Pop-Ups and Configure Pop-Ups.

    Using the Urban Heat Island (UHI) Image Services

    The data is made available as an image service. There is a processing template applied that supplies the yellow-to-red or blue-to-red color ramp, but once this processing template is removed (you can do this in ArcGIS Pro or ArcGIS Desktop, or in QGIS), the actual data values come through the service and can be used directly in a geoprocessing tool (for example, to extract an area of interest). Following are instructions for doing this in Pro.

    In ArcGIS Pro, in a Map view, in the Catalog window, click on Portal. In the Portal window, click on the far-right icon representing Living Atlas. Search on the acronyms "tpl" and "uhi". The results returned will be the UHI image services. Right-click on a result and select "Add to current map" from the context menu. When the image service is added to the map, right-click on it in the map view, and select Properties. In the Properties window, select Processing Templates. On the drop-down menu at the top of the window, the default Processing Template is either a yellow-to-red ramp or a blue-to-red ramp. Click the drop-down, select "None", then "OK". Now you will have the actual pixel values displayed in the map, and available to any geoprocessing tool that takes a raster as input. (A screenshot in the original resource shows ArcGIS Pro with a UHI image service loaded, color ramp removed, and symbology changed back to a yellow-to-red ramp; a classified renderer can also be used.)

    A typical operation at this point is to clip out your area of interest. To do this, add your polygon shapefile or feature class to the map view, and use the Clip Raster tool to export your area of interest as a geoTIFF raster (file extension ".tif"). In the environments tab for the Clip Raster tool, click the dropdown for "Extent", select "Same as Layer:", and select the name of your polygon. If you then need to convert the output raster to a polygon shapefile or feature class, run the Raster to Polygon tool, and select "Value" as the field.

    Other Sources of Heat Island Information

    Please see these websites for valuable information on heat islands and to learn about exciting new heat island research being led by scientists across the country:

    • EPA's Heat Island Resource Center
    • Dr. Ladd Keith, University of Arizona
    • Dr. Ben McMahan, University of Arizona
    • Dr. Jeremy Hoffman, Science Museum of Virginia
    • Dr. Hunter Jones, NOAA
    • Daphne Lundi, Senior Policy Advisor, NYC Mayor's Office of Recovery and Resiliency

    Disclaimer/Feedback

    With nearly 14,000 cities represented, checking each city's heat island raster for quality assurance would be prohibitively time-consuming, so The Trust for Public Land checked a statistically significant sample size for data quality. The sample passed all quality checks, with about 98.5% of the output cities error-free, but there could be instances where the user finds errors in the data. These errors will most likely take the form of a line of discontinuity where there is no city boundary; this type of error is caused by large temperature differences in two adjacent Landsat scenes, so the discontinuity occurs along scene boundaries. The Trust for Public Land would appreciate feedback on these errors so that version 2 of the national UHI dataset can be improved. Contact Dale.Watt@tpl.org with feedback.

  20. Seismicity patterns and multi-scale imaging of Krafla (N-E) Iceland with...

    • zenodo.org
    text/x-python, txt +1
    Updated Sep 26, 2024
    + more versions
    Cite
    Elisabeth Glück; Stephane Garambois; Stephane Garambois; Jean Vandemeulebrouck; Jean Vandemeulebrouck; Jean Virieux; Jean Virieux; Titouan Muzellec; Titouan Muzellec; Anette Mortensen; Egill Arni Gudnason; Egill Arni Gudnason; Thorbjorg Agustsdottir; Thorbjorg Agustsdottir; Elisabeth Glück; Anette Mortensen (2024). Seismicity patterns and multi-scale imaging of Krafla (N-E) Iceland with local earthquake tomography: Raw event waveforms for all events used in the inversion and manual picks for temporary network [Dataset]. http://doi.org/10.5281/zenodo.13842962
    Explore at:
    txt, zip, text/x-pythonAvailable download formats
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Elisabeth Glück; Stephane Garambois; Stephane Garambois; Jean Vandemeulebrouck; Jean Vandemeulebrouck; Jean Virieux; Jean Virieux; Titouan Muzellec; Titouan Muzellec; Anette Mortensen; Egill Arni Gudnason; Egill Arni Gudnason; Thorbjorg Agustsdottir; Thorbjorg Agustsdottir; Elisabeth Glück; Anette Mortensen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Iceland, Krafla
    Description

    This Data and Software were used in the submitted paper "Seismicity patterns and multi-scale imaging at Krafla (N-E Iceland) with local earthquake tomography" by Glück et al.
    The data and software provided here are used to compute the velocity models with TomoTV.
    The raw data (.mseed format) can be visualised with the Python package Pyrocko/Snuffler, which was also used for the arrival time picking.
    For the temporary network, the manual picks are provided along with the code to prepare them as input files for a localisation with NonLinLoc by weighting and quality checking the data. The resulting localisations and the weighted traveltimes are then used for the LET.
    The same workflow was used for the picks from the permanent network.

    Data:
    - Raw data (\WaveformsPermanentStations): 7s waveform snippets of the events listed in the ISOR catalogue on http://lv.isor.is:8080/events/browse/ for the years 2021 and 2022.
    - Raw data (\WaveformsNodes): 5s waveform snippets of the events listed in the ISOR catalogue on http://lv.isor.is:8080/events/browse/2022 recorded with the temporary network of 98 temporary nodes in June and July 2022.
    - Pickfile (ManualPicks_100Nodes_Kafla2022.txt): Manual picks of the events listed in the ISOR catalogue for the events recorded with the temporary network.

    Software (Hyp_format.py):
    - Weighting: The picks are weighted according to their Signal-to-Noise ratio (described in more detail in Section 2.3 in the main text of the paper)
    - Writing the input file for NonLinLoc (by selecting the mode option "PorS" in line 118), including all picks, also for those stations where not both phases were picked. The file "endfile.txt" is needed to write the picks to the NonLinLoc input format.
    - Quality check of the picks: Computing a modified Wadati diagram from the traveltime differences of P and S phases for all the events available (by selecting the mode option "PandS" in line 118)
    - Python packages needed: numpy, scipy, matplotlib, pandas, obspy
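
    A minimal sketch (file path is a placeholder) for taking a quick look at one of the raw .mseed waveform snippets with ObsPy, which is listed among the required packages; Pyrocko/Snuffler remains the tool used for interactive picking.

    ~~~
    from obspy import read

    stream = read("WaveformsNodes/example_event.mseed")   # hypothetical file name
    print(stream)            # one line per trace: network, station, channel, time span
    stream.plot()            # quick-look plot of the waveform snippet
    ~~~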
