100+ datasets found
  1. Storage and Transit Time Data and Code

    • zenodo.org
    zip
    Updated Nov 15, 2024
    Cite
    Andrew Felton; Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. http://doi.org/10.5281/zenodo.14171251
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrew Felton; Andrew Felton
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Author: Andrew J. Felton
    Date: 11/15/2024

    This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

    "Global estimates of the storage and transit time of water through vegetation"

    Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis was revised throughout the peer review process.

    #Data information:

    The data folder contains key data sets used for analysis. In particular:

    "data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.

    #Code information

    Python scripts can be found in the "supporting_code" folder.

    Each R script in this project has a role:

    "01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).

    "02_functions.R": This script contains custom functions. Load this using the `source()` function in the 01_start.R script.

    "03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
    `source()` function in the 01_start.R script.

    "04_figures_tables.R": This is the main workhouse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study that then get saved in the "manuscript_figures" folder. Note that all maps were produced using Python code found in the "supporting_code"" folder. Also note that within the "manuscript_figures" folder there is an "extended_data" folder, which contains tables of the summary statistics (e.g., quartiles and sample sizes) behind figures containing box plots or depicting regression coefficients.

    "supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.

    "supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.

  2. Raw data and python code. from Metacarpophalangeal joint loads during bonobo...

    • rs.figshare.com
    7z
    Updated Jun 4, 2023
    Cite
    Alexander Synek; Szu-Ching Lu; Sandra Nauwelaerts; Dieter H. Pahr; Tracy L. Kivell (2023). Raw data and python code. from Metacarpophalangeal joint loads during bonobo locomotion: model predictions versus proxies [Dataset]. http://doi.org/10.6084/m9.figshare.11865489.v1
    Explore at:
    Available download formats: 7z
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    The Royal Society
    Authors
    Alexander Synek; Szu-Ching Lu; Sandra Nauwelaerts; Dieter H. Pahr; Tracy L. Kivell
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the raw data as well as the Python code used to generate the results and plots shown in the main manuscript.

  3. Pre-Processed Power Grid Frequency Time Series

    • data.subak.org
    csv
    Updated Feb 16, 2023
    Cite
    Zenodo (2023). Pre-Processed Power Grid Frequency Time Series [Dataset]. https://data.subak.org/dataset/pre-processed-power-grid-frequency-time-series
    Explore at:
    Available download formats: csv
    Dataset updated
    Feb 16, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Description

    Overview

    This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in python. The data covers three synchronous areas of the European power grid:

    • Continental Europe
    • Great Britain
    • Nordic

    This work is part of the paper "Predictability of Power Grid Frequency" [1]. Please cite this paper when using the data or the code. For detailed documentation of the pre-processing procedure, we refer to the supplementary material of the paper.

    Data sources

    We downloaded the frequency recordings from publicly available repositories of three different Transmission System Operators (TSOs).

    • Continental Europe [2]: We downloaded the data from the German TSO TransnetBW GmbH, which retains the copyright on the data but allows it to be re-published upon request [3].
    • Great Britain [4]: The download was supported by National Grid ESO Open Data, which belongs to the British TSO National Grid. They publish the frequency recordings under the NGESO Open License [5].
    • Nordic [6]: We obtained the data from the Finnish TSO Fingrid, which provides the data under the open license CC-BY 4.0 [7].

    Content of the repository

    A) Scripts

    1. In the "Download_scripts" folder you will find three scripts to automatically download frequency data from the TSOs' websites.
    2. In "convert_data_format.py" we save the data with corrected timestamp formats. Missing data is marked as NaN (processing step (1) in the supplementary material of [1]).
    3. In "clean_corrupted_data.py" we load the converted data and identify corrupted recordings. We mark them as NaN and clean some of the resulting data holes (processing step (2) in the supplementary material of [1]).

    The python scripts run with Python 3.7 and with the packages found in "requirements.txt".

    B) Yearly converted and cleansed data

    The folders "

    • File type: The files are zipped csv-files, where each file comprises one year.
    • Data format: The files contain two columns. The second column contains the frequency values in Hz. The first one represents the time stamps in the format Year-Month-Day Hour-Minute-Second, which is given as naive local time. The local time refers to the following time zones and includes Daylight Saving Times (python time zone in brackets):
      • TransnetBW: Continental European Time (CE)
      • Nationalgrid: Great Britain (GB)
      • Fingrid: Finland (Europe/Helsinki)
    • NaN representation: We mark corrupted and missing data as "NaN" in the csv-files (see the loading sketch below).
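    A minimal loading sketch for one of these yearly files, assuming pandas; the file name is a placeholder, and the exact header layout of the CSV may differ slightly:

    ```python
    import pandas as pd

    # Placeholder name -- substitute one of the yearly zipped CSV files.
    df = pd.read_csv("2019.csv.zip", index_col=0, parse_dates=True)

    # First column: naive local timestamps (index); second column: frequency in Hz.
    freq = df.iloc[:, 0]
    print(freq.isna().mean())  # fraction of corrupted/missing samples marked as NaN
    ```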

    Use cases

    We point out that this repository can be used in two different ways:

    • Use pre-processed data: You can directly use the converted or the cleansed data. Note, however, that both data sets include segments of NaN-values due to missing and corrupted recordings. Only a very small part of the NaN-values was eliminated in the cleansed data, so as not to manipulate the data too much.

    • Produce your own cleansed data: Depending on your application, you might want to cleanse the data in a custom way. You can easily add your custom cleansing procedure in "clean_corrupted_data.py" and then produce cleansed data from the raw data in "

    License

    This work is licensed under multiple licenses, which are located in the "LICENSES" folder.

    • We release the code in the folder "Scripts" under the MIT license.
    • The pre-processed data in the subfolders "**/Fingrid" and "**/Nationalgrid" are licensed under CC-BY 4.0.
    • TransnetBW originally did not publish their data under an open license. We have explicitly received the permission to publish the pre-processed version from TransnetBW. However, we cannot publish our pre-processed version under an open license due to the missing license of the original TransnetBW data.

    Changelog

    Version 2:

    • Add time zone information to description
    • Include new frequency data
    • Update references
    • Change folder structure to yearly folders

    Version 3:

    • Correct TransnetBW files for missing data in May 2016
  4. Python Import Data in March - Seair.co.in

    • seair.co.in
    Updated Mar 30, 2016
    Cite
    Seair Exim (2016). Python Import Data in March - Seair.co.in [Dataset]. https://www.seair.co.in
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Mar 30, 2016
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    Cyprus, Chad, Tuvalu, French Polynesia, Tanzania, Israel, Lao People's Democratic Republic, Germany, Bermuda, Maldives
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  5. STEAD subsample 4 CDiffSD

    • zenodo.org
    bin
    Updated Apr 30, 2024
    Cite
    Daniele Trappolini; Daniele Trappolini (2024). STEAD subsample 4 CDiffSD [Dataset]. http://doi.org/10.5281/zenodo.11094536
    Explore at:
    Available download formats: bin
    Dataset updated
    Apr 30, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniele Trappolini; Daniele Trappolini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 15, 2024
    Description

    STEAD Subsample Dataset for CDiffSD Training

    Overview

    This dataset is a subsampled version of the STEAD dataset, specifically tailored for training our CDiffSD model (Cold Diffusion for Seismic Denoising). It consists of four HDF5 files, each saved in a format that requires Python's `h5py` library to open.

    Dataset Files

    The dataset includes the following files:

    • train: Used for both training and validation phases (with a train/validation split). Contains earthquake ground truth traces.
    • noise_train: Used for both training and validation phases. Contains noise used to contaminate the traces.
    • test: Used for the testing phase, structured similarly to train.
    • noise_test: Used for the testing phase, contains noise data for testing.

    Each file is structured to support the training and evaluation of seismic denoising models.

    Data

    The HDF5 files named noise contain two main datasets:

    • traces: This dataset includes N number of events, with each event being 6000 in size, representing the length of the traces. Each trace is organized into three channels in the following order: E (East-West), N (North-South), Z (Vertical).
    • metadata: This dataset contains the names of the traces for each event.

    Similarly, the train and test files, which contain earthquake data, include the same traces and metadata datasets, but also feature two additional datasets:

    • p_arrival: Contains the arrival indices of P-waves, expressed in counts.
    • s_arrival: Contains the arrival indices of S-waves, also expressed in counts.


    Usage

    To load these files in a Python environment, use the following approach:

    ```python
    import h5py
    import numpy as np

    # Open the HDF5 file in read mode
    with h5py.File('train_noise.hdf5', 'r') as file:
        # Print all the main keys in the file
        print("Keys in the HDF5 file:", list(file.keys()))

        if 'traces' in file:
            # Access the dataset
            data = file['traces'][:10]  # Load the first 10 traces

        if 'metadata' in file:
            # Access the dataset
            trace_name = file['metadata'][:10]  # Load the first 10 metadata entries
    ```

    Ensure that the path to the file is correctly specified relative to your Python script.
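    A further sketch, assuming the dataset names listed above (the exact file name and channel layout may differ), for reading earthquake traces together with their P- and S-arrival indices:

    ```python
    import h5py

    # 'train.hdf5' is a placeholder for the earthquake (train/test) file.
    with h5py.File('train.hdf5', 'r') as f:
        traces = f['traces'][:10]      # first 10 earthquake traces
        names = f['metadata'][:10]     # corresponding trace names
        p_idx = f['p_arrival'][:10]    # P-wave arrival indices (in counts)
        s_idx = f['s_arrival'][:10]    # S-wave arrival indices (in counts)

    print(traces.shape, p_idx[0], s_idx[0])
    ```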

    Requirements

    To use this dataset, ensure you have Python installed along with the NumPy and h5py libraries, which can be installed via pip if not already available:

    ```bash
    pip install numpy
    pip install h5py
    ```

  6. Python-DPO

    • huggingface.co
    Updated Jul 18, 2024
    Cite
    NextWealth Entrepreneurs Private Limited (2024). Python-DPO [Dataset]. https://huggingface.co/datasets/NextWealth/Python-DPO
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 18, 2024
    Dataset authored and provided by
    NextWealth Entrepreneurs Private Limited
    Description

    Dataset Card for Python-DPO

    This dataset is the smaller version of the Python-DPO-Large dataset and has been created using Argilla.

      Load with datasets
    

    To load this dataset with the datasets library, install it with pip install datasets --upgrade and then use the following code:

    from datasets import load_dataset

    ds = load_dataset("NextWealth/Python-DPO")

      Data Fields
    

    Each data instance contains:

    instruction: The problem… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO.

  7. Historical wind measurements at 10 meters height from the DMC network

    • plataformadedatos.cl
    csv, mat, npz, xlsx
    Cite
    Meteorological Directorate of Chile, Historical wind measurements at 10 meters height from the DMC network [Dataset]. https://www.plataformadedatos.cl/datasets/en/ac80b695a1398aa5
    Explore at:
    Available download formats: npz, mat, csv, xlsx
    Dataset authored and provided by
    Meteorological Directorate of Chile
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wind speed, wind direction, and the variable wind indicator are the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 168 stations that have, at some point since 1950, recorded the wind at hourly intervals. It is important to note that not all stations are currently operational.

    The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.

    In addition, a historical database is provided in .npz* and .mat** formats and is updated every 30 days for those stations that are still in operation.

    *To load the data correctly in Python it is recommended to use the following code:

    import numpy as np
    
    with np.load(filename, allow_pickle = True) as f:
      data = {}
      for key, value in f.items():
        data[key] = value.item()
    

    **Date data is in datenum format, and to load it correctly in datetime format, it is recommended to use the following command in MATLAB:

    datetime(TS.x , 'ConvertFrom' , 'datenum')
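    If the datenum values are also needed as Python datetimes, a commonly used conversion is the following sketch (an approximation; it assumes the standard MATLAB datenum epoch):

    ```python
    from datetime import datetime, timedelta

    def datenum_to_datetime(datenum):
        """Convert a MATLAB datenum to a Python datetime (approximate)."""
        days = datenum % 1
        return (datetime.fromordinal(int(datenum))
                + timedelta(days=days)
                - timedelta(days=366))

    print(datenum_to_datetime(738521.5))  # example value
    ```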
    
  8. polyOne Data Set - 100 million hypothetical polymers including 29 properties...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Mar 24, 2023
    Cite
    Rampi Ramprasad (2023). polyOne Data Set - 100 million hypothetical polymers including 29 properties [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7124187
    Explore at:
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    Rampi Ramprasad
    Christopher Kuenneth
    Description

    polyOne Data Set

    The data set contains 100 million hypothetical polymers each with 29 predicted properties using machine learning models. We use PSMILES strings to represent polymer structures, see here and here. The polymers are generated by decomposing previously synthesized polymers into unique chemical fragments. Random and enumerative compositions of these fragments yield 100 million hypothetical PSMILES strings. All PSMILES strings are chemically valid polymers but, mostly, have never been synthesized before. More information can be found in the paper. Please note the license agreement in the LICENSE file.

    Full data set including the properties

    The data files are in Apache Parquet format. The files start with polyOne_*.parquet.

    I recommend using dask (pip install dask) to load and process the data set. Pandas also works but is slower.

    Load the sharded data set with dask:

        import dask.dataframe as dd
        ddf = dd.read_parquet("*.parquet", engine="pyarrow")

    For example, compute the description of the data set:

        df_describe = ddf.describe().compute()
        df_describe

    
    
    PSMILES strings only

    generated_polymer_smiles_train.txt - 80 million PSMILES strings for training polyBERT. One string per line.

    generated_polymer_smiles_dev.txt - 20 million PSMILES strings for testing polyBERT. One string per line.
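    Since each file holds one PSMILES string per line, the files can be streamed without loading everything into memory; a small sketch:

    ```python
    # Read the first few PSMILES strings without loading the full 80M-line file.
    with open("generated_polymer_smiles_train.txt") as f:
        for i, line in enumerate(f):
            print(line.strip())
            if i == 4:
                break
    ```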
    
  9. Community Geothermal: Soil Conductivity, Borehole Design, Energy Models, and...

    • catalog.data.gov
    • data.openei.org
    • +3more
    Updated Jan 20, 2025
    Cite
    GTI Energy (2025). Community Geothermal: Soil Conductivity, Borehole Design, Energy Models, and Load Data for a Residential System Development - Hinesburg, VT [Dataset]. https://catalog.data.gov/dataset/community-geothermal-soil-conductivity-borehole-design-energy-models-and-load-data-for-a-r-46d5d
    Explore at:
    Dataset updated
    Jan 20, 2025
    Dataset provided by
    GTI Energy
    Area covered
    Hinesburg
    Description

    This dataset contains materials from the Coalition for Community-Supported Affordable Geothermal Energy Systems (C2SAGES) project, which evaluated the techno-economic feasibility of a community geothermal system for a residential development in Hinesburg, VT. The dataset includes detailed soil conductivity test reports, energy models, borehole design reports, hourly energy loads for heating, cooling, and hot water, and design layouts. EnergyPlus was used to model building energy loads, and Modelica software was applied for geothermal loop sizing based on these loads and soil conductivity results. Python scripts for network design further refined the models. Key files include PDF reports on borehole design (with projections for 1-year, 15-year, and 30-year systems), soil conductivity test results, EnergyPlus modeling outputs, and 2D/3D design drawings in PDF, DWG, and DXF formats. Python notebooks for network design and OnePipe model files are also provided, with Modelica required for viewing certain files. Outputs and modeling data are in various formats including CSV, JPG, HTML, and IDF, with units and data clearly labeled to support understanding of system design and performance for the proposed geothermal solution.

  10. Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Dec 24, 2022
    Cite
    Lignos, Dimitrios G. (2022). Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6965146
    Explore at:
    Dataset updated
    Dec 24, 2022
    Dataset provided by
    Hartloper, Alexander R.
    de Castro e Sousa, Albano
    Lignos, Dimitrios G.
    Ozden, Selimcan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials

    Background

    This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and data from the individual experiments are also included.

    The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).

    Usage

    The data is licensed under the Creative Commons Attribution 4.0 International license.

    If you have used our data and are publishing your work, we ask that you please reference both:

    this database through its DOI, and

    any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.

    Included Files

    Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.

    Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.

    Unreduced_Data-#_v1-0-0.zip: contains the original (not downsampled) data

    Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.

    We recommend you un-zip all the folders and place them in one "Unreduced_Data" directory, similar to the "Clean_Data" directory.

    The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

    There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.

    The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

    Clean_Data_v1-0-0.zip: contains all the downsampled data

    The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.

    There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.

    The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.

    Database_References_v1-0-0.bib

    Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.

    File Format: Downsampled Data

    These are the "LP_

    The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data

    Time[s]: time in seconds since the start of the test

    e_true: true strain

    Sigma_true: true stress in MPa

    (optional) Temperature[C]: the surface temperature in degC

    These data files can be easily loaded using the pandas library in Python through:

    import pandas
    data = pandas.read_csv(data_file, index_col=0)

    The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.

    File Format: Unreduced Data

    These are the "LP_

    The first column is the index of each data point

    S/No: sample number recorded by the DAQ

    System Date: Date and time of sample

    Time[s]: time in seconds since the start of the test

    C_1_Force[kN]: load cell force

    C_1_Déform1[mm]: extensometer displacement

    C_1_Déplacement[mm]: cross-head displacement

    Eng_Stress[MPa]: engineering stress

    Eng_Strain[]: engineering strain

    e_true: true strain

    Sigma_true: true stress in MPa

    (optional) Temperature[C]: specimen surface temperature in degC

    The data can be loaded and used similarly to the downsampled data.

    File Format: Overall_Summary

    The overall summary file provides data on all the test specimens in the database. The columns include:

    hidden_index: internal reference ID

    grade: material grade

    spec: specifications for the material

    source: base material for the test specimen

    id: internal name for the specimen

    lp: load protocol

    size: type of specimen (M8, M12, M20)

    gage_length_mm_: unreduced section length in mm

    avg_reduced_dia_mm_: average measured diameter for the reduced section in mm

    avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm

    avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm

    fy_n_mpa_: nominal yield stress

    fu_n_mpa_: nominal ultimate stress

    t_a_deg_c_: ambient temperature in degC

    date: date of test

    investigator: person(s) who conducted the test

    location: laboratory where test was conducted

    machine: setup used to conduct test

    pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control

    pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control

    pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control

    citekey: reference corresponding to the Database_References.bib file

    yield_stress_mpa_: computed yield stress in MPa

    elastic_modulus_mpa_: computed elastic modulus in MPa

    fracture_strain: computed average true strain across the fracture surface

    c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass

    file: file name of corresponding clean (downsampled) stress-strain data

    File Format: Summarized_Mechanical_Props_Campaign

    Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,

    tab1 = pd.read_csv(
        'Summarized_Mechanical_Props_Campaign_' + date + version + '.csv',
        index_col=[0, 1, 2, 3],
        skipinitialspace=True,
        header=[0, 1],
        keep_default_na=False,
        na_values='',
    )

    citekey: reference in "Campaign_References.bib".

    Grade: material grade.

    Spec.: specifications (e.g., J2+N).

    Yield Stress [MPa]: initial yield stress in MPa

    size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign

    Elastic Modulus [MPa]: initial elastic modulus in MPa

    size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
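    For example, once loaded as above, a single statistic can be pulled out of the resulting column MultiIndex; the labels below are taken from this description and may differ slightly in the actual file:

    ```python
    import pandas as pd

    tab1 = pd.read_csv(
        'Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv',
        index_col=[0, 1, 2, 3], skipinitialspace=True,
        header=[0, 1], keep_default_na=False, na_values='',
    )

    # Mean initial yield stress per campaign (column labels assumed from the description).
    print(tab1[('Yield Stress [MPa]', 'mean')].head())
    ```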

    Caveats

    The files in the following directories were tested before the protocol was established. Therefore, only the true stress-strain is available for each:

    A500

    A992_Gr50

    BCP325

    BCR295

    HYP400

    S460NL

    S690QL/25mm

    S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm

  11. Stage Two Experiments - Datasets

    • figshare.com
    bin
    Updated Jan 21, 2025
    Cite
    Luke Yerbury (2025). Stage Two Experiments - Datasets [Dataset]. http://doi.org/10.6084/m9.figshare.27427629.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    Jan 21, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Luke Yerbury
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data used in the various stage two experiments in: "Comparing Clustering Approaches for Smart Meter Time Series: Investigating the Influence of Dataset Properties on Performance". This includes datasets with varied characteristics. All datasets are stored in a dict with tuples of (time series array, class labels). To access the data in Python:

        import pickle

        filename = "dataset.txt"
        with open(filename, 'rb') as f:
            data = pickle.load(f)

  12. Saved Data Results from the Workflow to Analysis Imputation for...

    • kilthub.cmu.edu
    zip
    Updated Apr 9, 2018
    Cite
    Alan Shteyman; Ruchi Asthana; Russell Schwartz (2018). Saved Data Results from the Workflow to Analysis Imputation for Microsatellite / Cell Datasets for Reconstructing Cell Lineage Maps [Dataset]. http://doi.org/10.1184/R1/6007451.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 9, 2018
    Dataset provided by
    Carnegie Mellon University
    Authors
    Alan Shteyman; Ruchi Asthana; Russell Schwartz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    These are the results of the pipeline I wrote for my research in order to determine how the imputation method affects the underlying structure of a Microsatellite/Cell dataset used for reconstructing cell lineage maps. The analysis was done using KNN, Linear Regression, and a novel method based on Fitch's Algorithm. The pipeline was applied to 3 datasets (Tree A, B, and C) taken from Frumkin et al. 2005 and is organized into three different folders. Follow the instructions txt file to load the data into Python for easy processing.

  13. Data from: Large Landing Trajectory Data Set for Go-Around Analysis

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +1more
    Updated Dec 16, 2022
    Cite
    Timothé Krauth (2022). Large Landing Trajectory Data Set for Go-Around Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7148116
    Explore at:
    Dataset updated
    Dec 16, 2022
    Dataset provided by
    Benoit Figuet
    Marcel Dettling
    Raphael Monstein
    Manuel Waltert
    Timothé Krauth
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A large data set of go-arounds, also referred to as missed approaches. The data set is in support of the paper presented at the OpenSky Symposium on November 10th.

    If you use this data for a scientific publication, please consider citing our paper.

    The data set contains landings from 176 (mostly) large airports in 44 different countries. The landings are labelled according to whether a go-around (GA) was performed or not. In total, the data set contains almost 9 million landings with more than 33,000 GAs. The data was collected from the OpenSky Network's historical database for the year 2019. The published data set contains multiple files:

    go_arounds_minimal.csv.gz

    Compressed CSV containing the minimal data set. It contains a row for each landing and a minimal amount of information about the landing, and if it was a GA. The data is structured in the following way:

    • time (date time): UTC time of landing or first GA attempt
    • icao24 (string): Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
    • callsign (string): Aircraft identifier in air-ground communications
    • airport (string): ICAO airport code where the aircraft is landing
    • runway (string): Runway designator on which the aircraft landed
    • has_ga (string): "True" if at least one GA was performed, otherwise "False"
    • n_approaches (integer): Number of approaches identified for this flight
    • n_rwy_approached (integer): Number of unique runways approached by this flight
    

    The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of n_approaches, so an easy way to exclude them is to filter out rows with n_approaches > 2.
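    For example, with the minimal data set loaded into pandas, such flights can be dropped in one line (a sketch using the columns described above):

    ```python
    import pandas as pd

    df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)

    # Keep only regular landings; training/calibration flights typically have n_approaches > 2.
    df = df[df["n_approaches"] <= 2]
    print(len(df))
    ```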

    go_arounds_augmented.csv.gz

    Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and if it was a GA. The data is structured in the following way:

    • time (date time): UTC time of landing or first GA attempt
    • icao24 (string): Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned
    • callsign (string): Aircraft identifier in air-ground communications
    • airport (string): ICAO airport code where the aircraft is landing
    • runway (string): Runway designator on which the aircraft landed
    • has_ga (string): "True" if at least one GA was performed, otherwise "False"
    • n_approaches (integer): Number of approaches identified for this flight
    • n_rwy_approached (integer): Number of unique runways approached by this flight
    • registration (string): Aircraft registration
    • typecode (string): Aircraft ICAO typecode
    • icaoaircrafttype (string): ICAO aircraft type
    • wtc (string): ICAO wake turbulence category
    • glide_slope_angle (float): Angle of the ILS glide slope in degrees
    • has_intersection (string): Boolean that is true if the runway has another runway intersecting it, otherwise false
    • rwy_length (float): Length of the runway in kilometres
    • airport_country (string): ISO Alpha-3 country code of the airport
    • airport_region (string): Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
    • operator_country (string): ISO Alpha-3 country code of the operator
    • operator_region (string): Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania)
    • wind_speed_knts (integer): METAR, surface wind speed in knots
    • wind_dir_deg (integer): METAR, surface wind direction in degrees
    • wind_gust_knts (integer): METAR, surface wind gust speed in knots
    • visibility_m (float): METAR, visibility in m
    • temperature_deg (integer): METAR, temperature in degrees Celsius
    • press_sea_level_p (float): METAR, sea level pressure in hPa
    • press_p (float): METAR, QNH in hPa
    • weather_intensity (list): METAR, list of present weather codes: qualifier - intensity
    • weather_precipitation (list): METAR, list of present weather codes: weather phenomena - precipitation
    • weather_desc (list): METAR, list of present weather codes: qualifier - descriptor
    • weather_obscuration (list): METAR, list of present weather codes: weather phenomena - obscuration
    • weather_other (list): METAR, list of present weather codes: weather phenomena - other
    

    This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different websites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.

    go_arounds_agg.csv.gz

    Compressed CSV containing the aggregated data set. It contains a row for each airport-runway, i.e. every runway at every airport for which data is available. The data is structured in the following way:

    • airport (string): ICAO airport code where the aircraft is landing
    • runway (string): Runway designator on which the aircraft landed
    • n_landings (integer): Total number of landings observed on this runway in 2019
    • ga_rate (float): Go-around rate, per 1000 landings
    • glide_slope_angle (float): Angle of the ILS glide slope in degrees
    • has_intersection (string): Boolean that is true if the runway has another runway intersecting it, otherwise false
    • rwy_length (float): Length of the runway in kilometres
    • airport_country (string): ISO Alpha-3 country code of the airport
    • airport_region (string): Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania)
    

    This aggregated data set is used in the paper for the generalized linear regression model.

    Downloading the trajectories

    Users of this data set with access to the OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. For example, suppose you want to get all the go-arounds of 4 January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:

    import datetime
    from tqdm.auto import tqdm
    import pandas as pd
    from traffic.data import opensky
    from traffic.core import Traffic

    # load minimum data set
    df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
    df["time"] = pd.to_datetime(df["time"])

    # select London City Airport, go-arounds, and 2019-01-04
    airport = "EGLC"
    start = datetime.datetime(year=2019, month=1, day=4).replace(
        tzinfo=datetime.timezone.utc
    )
    stop = datetime.datetime(year=2019, month=1, day=5).replace(
        tzinfo=datetime.timezone.utc
    )

    df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

    # iterate over flights and pull the data from OpenSky Network
    flights = []
    delta_time = pd.Timedelta(minutes=10)
    for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
        # take at most 10 minutes before and 10 minutes after the landing or go-around
        start_time = row["time"] - delta_time
        stop_time = row["time"] + delta_time

        # fetch the data from OpenSky Network
        flights.append(
            opensky.history(
                start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
                stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
                callsign=row["callsign"],
                return_flight=True,
            )
        )

    The flights can be converted into a Traffic object

    Traffic.from_flights(flights)

    Additional files

    Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:

    validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.

    validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.

  14. STEAD subsample 4 CDiffSD

    • zenodo.org
    csv
    Updated Apr 29, 2024
    Cite
    Daniele Trappolini; Daniele Trappolini (2024). STEAD subsample 4 CDiffSD [Dataset]. http://doi.org/10.5281/zenodo.11083249
    Explore at:
    Available download formats: csv
    Dataset updated
    Apr 29, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Daniele Trappolini; Daniele Trappolini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 15, 2024
    Description


    # STEAD Subsample Dataset for CDiffSD Training

    ## Overview
    This dataset is a subsampled version of the STEAD dataset, specifically tailored for training our CDiffSD model (Cold Diffusion for Seismic Denoising). It consists of four CSV files, each saved in a format that requires Python's `pd.read_pickle` method for opening.

    ## Dataset Files
    The dataset includes the following files:

    • df_train: Used for both training and validation phases (with a train/validation split). Contains earthquake ground truth traces.
    • df_noise_train: Used for both training and validation phases. Contains noise used to contaminate the traces.
    • df_test: Used for the testing phase, structured similarly to df_train.
    • df_noise_test: Used for the testing phase, contains noise data for testing.

    Each file is structured to support the training and evaluation of seismic denoising models.

    ## Data Columns
    The dataset files contain columns from the original STEAD dataset, with specific focus on the following columns that are critical for our use:

    • E_channel: East-West channel data
    • N_channel: North-South channel data
    • Z_channel: Vertical channel data
    • s_arrival_sample: Sample index for S-wave arrival
    • p_arrival_sample: Sample index for P-wave arrival
    • trace_name: Identifier for the seismic trace

    ## Usage
    To load these files in a Python environment, use the following approach:

    ```python
    import pandas as pd

    # Example of loading the training data
    df_train = pd.read_pickle('path/to/df_train.csv')
    ```

    Ensure that the path to the file is correctly specified relative to your Python script.
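    Building on the columns listed above, one trace can then be assembled into a (channels, samples) array; this is a sketch that assumes each channel column stores an array-like object:

    ```python
    import numpy as np
    import pandas as pd

    df_train = pd.read_pickle('path/to/df_train.csv')  # as loaded above
    row = df_train.iloc[0]

    # Stack the three channels (assumed order: E, N, Z) into a single array.
    trace = np.stack([row["E_channel"], row["N_channel"], row["Z_channel"]])

    p_idx = row["p_arrival_sample"]  # sample index of the P-wave arrival
    s_idx = row["s_arrival_sample"]  # sample index of the S-wave arrival
    print(trace.shape, p_idx, s_idx)
    ```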

    ## Requirements
    To use this dataset, ensure you have Python installed along with the Pandas library, which can be installed via pip if not already available:

    ```bash
    pip install pandas
    ```

  15. Historical air temperature measurements from the DMC network

    • plataformadedatos.cl
    csv, mat, npz, xlsx
    Cite
    Meteorological Directorate of Chile, Historical air temperature measurements from the DMC network [Dataset]. https://www.plataformadedatos.cl/datasets/en/6f02d1394570d4c2
    Explore at:
    Available download formats: csv, mat, xlsx, npz
    Dataset provided by
    Chilean Meteorological Office (http://www.meteochile.gob.cl/)
    Authors
    Meteorological Directorate of Chile
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Air temperature is one of the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 507 stations that have, at some point since 1950, recorded the air temperature at hourly intervals. It is important to note that not all stations are currently operational.

    The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.

    In addition, a historical database is provided in .npz* and .mat** formats and is updated every 30 days for those stations that are still in operation.

    *To load the data correctly in Python it is recommended to use the following code:

    import numpy as np
    
    with np.load(filename, allow_pickle = True) as f:
      data = {}
      for key, value in f.items():
        data[key] = value.item()
    

    **Date data is in datenum format, and to load it correctly in datetime format, it is recommended to use the following command in MATLAB:

    datetime(TS.x , 'ConvertFrom' , 'datenum')
    
  16. Historical wind measurements at 2 meters height from the DMC network

    • plataformadedatos.cl
    csv, mat, npz, xlsx
    Updated Feb 8, 2023
    Cite
    Meteorological Directorate of Chile (2023). Historical wind measurements at 2 meters height from the DMC network [Dataset]. https://www.plataformadedatos.cl/datasets/en/81a687645f99ebe4
    Explore at:
    Available download formats: csv, npz, xlsx, mat
    Dataset updated
    Feb 8, 2023
    Dataset authored and provided by
    Meteorological Directorate of Chile
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Wind speed, wind direction, and the variable wind indicator are the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 326 stations that have, at some point since 1950, recorded the wind at hourly intervals. It is important to note that not all stations are currently operational.

    The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.

    In addition, a historical database is provided in .npz* and .mat** formats and is updated every 30 days for those stations that are still in operation.

    *To load the data correctly in Python it is recommended to use the following code:

    import numpy as np
    
    with np.load(filename, allow_pickle = True) as f:
      data = {}
      for key, value in f.items():
        data[key] = value.item()
    

    **Date data is in datenum format, and to load it correctly in datetime format, it is recommended to use the following command in MATLAB:

    datetime(TS.x , 'ConvertFrom' , 'datenum')
    
  17. Python Import Data in January - Seair.co.in

    • seair.co.in
    Updated Jan 29, 2016
    Cite
    Seair Exim (2016). Python Import Data in January - Seair.co.in [Dataset]. https://www.seair.co.in
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Jan 29, 2016
    Dataset provided by
    Seair Exim Solutions
    Authors
    Seair Exim
    Area covered
    Marshall Islands, Congo, Iceland, Ecuador, Chile, Indonesia, Equatorial Guinea, Bosnia and Herzegovina, Bahrain, Vietnam
    Description

    Subscribers can find out export and import data of 23 countries by HS code or product’s name. This demo is helpful for market analysis.

  18. Zero Modes and Classification of Combinatorial Metamaterials

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 8, 2022
    Cite
    Ryan van Mastrigt; Ryan van Mastrigt; Marjolein Dijkstra; Marjolein Dijkstra; Martin van Hecke; Martin van Hecke; Corentin Coulais; Corentin Coulais (2022). Zero Modes and Classification of Combinatorial Metamaterials [Dataset]. http://doi.org/10.5281/zenodo.7070963
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 8, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ryan van Mastrigt; Ryan van Mastrigt; Marjolein Dijkstra; Marjolein Dijkstra; Martin van Hecke; Martin van Hecke; Corentin Coulais; Corentin Coulais
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the simulation data of the combinatorial metamaterial as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.

    In this paper, the data is used to classify each \(k \times k\) unit cell design into one of two classes (C or I) based on the scaling (linear or constant) of the number of zero modes \(M_k(n)\) for metamaterials consisting of an \(n\times n\) tiling of the corresponding unit cell. Additionally, a random walk through the design space starting from class C unit cells was performed to characterize the boundary between class C and I in design space. A more detailed description of the contents of the dataset follows below.

    Modescaling_raw_data.zip

    This file contains uniformly sampled unit cell designs for metamaterial M2 and \(M_k(n)\) for \(1\leq n\leq 4\), which was used to classify the unit cell designs for the data set. There is a small subset of designs for \(k=\{3, 4, 5\}\) that do not neatly fall into the class C and I classification, and instead require additional simulation for \(4 \leq n \leq 6\) before either saturating to a constant number of zero modes (class I) or linearly increasing (class C). This file contains the simulation data of size \(3 \leq k \leq 8\) unit cells. The data is organized as follows.

    Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4.npy", and contain a [Nsim, 1+k*k+4] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+5: number of zero modes \(M_k(n)\) in ascending order of \(n\), so: \(\{M_k(1), M_k(2), M_k(3), M_k(4)\}\).

    Note: the unit cell design uses the numbers \(\{0, 1, 2, 3\}\) to refer to each building block orientation. The building block orientations can be characterized through the orientation of the missing diagonal bar (see Fig. 2 in the paper), which can be Left Up (LU), Left Down (LD), Right Up (RU), or Right Down (RD). The numbers correspond to the building block orientation \(\{0, 1, 2, 3\} = \{\mathrm{LU, RU, RD, LD}\}\).
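    A minimal loading sketch based on this layout (the file name below is a placeholder for one of the files described above, and k must match the unit cell size of that file):

    ```python
    import numpy as np

    k = 5  # unit cell size; must match the chosen file
    data = np.load("data_new_rrQR_i_n_M_kxk_fixn4.npy")  # placeholder name, shape [Nsim, 1 + k*k + 4]

    labels = data[:, 0]                               # col 0: label numbers
    designs = data[:, 1:1 + k*k].reshape(-1, k, k)    # cols 1 .. k*k: unit cell designs
    modes = data[:, 1 + k*k:]                         # M_k(1) ... M_k(4)
    print(designs.shape, modes[0])
    ```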

    Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 6\) for unit cells that cannot be classified as class C or I for \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4_classX_extend.npy", and contain a [Nsim, 1+k*k+6] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+7: number of zero modes \(M_k(n)\) in ascending order of \(n\), so: \(\{M_k(1), M_k(2), M_k(3), M_k(4), M_k(5), M_k(6)\}\).

    Simulation data for \(6 \leq k \leq 8\) unit cells are stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. Note that the number of modes is now calculated for \(n_x \times n_y\) metamaterials, where we calculate \((n_x, n_y) = \{(1,1), (2, 2), (3, 2), (4,2), (2, 3), (2, 4)\}\) rather than \(n_x=n_y=n\) to save computation time. These files are named "data_new_rrQR_i_n_Mx_My_n4_kxk(_extended).npy", and contain a [Nsim, 1+k*k+8] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:

    • col 0: label number to keep track
    • col 1 - k*k+1: flattened unit cell design, numpy.reshape should bring it back to its original \(k \times k\) form.
    • col k*k+1 - k*k+9: number of zero modes \(M_k(n_x, n_y)\) in order: \(\{M_k(1, 1), M_k(2, 2), M_k(3, 2), M_k(4, 2), M_k(1, 1), M_k(2, 2), M_k(2, 3), M_k(2, 4)\}\).

    Simulation data of metamaterial M1 for \(k_x \times k_y\) metamaterials are stored in compressed numpy array format (.npz) and can be loaded in Python with the Numpy package using the numpy.load command. These files are named "smiley_cube_x_y_\(k_x\)x\(k_y\).npz", which contain all possible metamaterial designs, and "smiley_cube_uniform_sample_x_y_\(k_x\)x\(k_y\).npz", which contain uniformly sampled metamaterial designs. The configurations are accessed with the keyword argument 'configs'. The classification is accessed with the keyword argument 'compatible'. The configurations array is of shape [Nsim, \(k_x\), \(k_y\)], the classification array is of shape [Nsim]. The building blocks in the configuration are denoted by 0 or 1, which correspond to the red/green and white/dashed building blocks respectively. Classification is 0 or 1, which corresponds to I and C respectively.

    Modescaling_classification_results.zip

    This file contains the classification, slope, and offset of the scaling of the number of zero modes \(M_k(n)\) for the unit cells of metamaterial M2 in Modescaling_raw_data.zip. The data is organized as follows.

    The results for \(3 \leq k \leq 5\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n \leq 4\))

    col 2: slope from \(n \geq 2\) onward (undefined for class X)

    col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)

    col 4: \(M_k(1)\)

    The results for \(3 \leq k \leq 5\) based on the extended \(1 \leq n \leq 6\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4_classC_extend.txt". The data can be loaded using ',' as the delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class, where 0 corresponds to class I, 1 to class C, and 2 to class X (neither class I nor C for \(1 \leq n \leq 6\))

    col 2: slope from \(n \geq 2\) onward (undefined for class X)

    col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)

    col 4: \(M_k(1)\)

    The results for \(6 \leq k \leq 8\) based on the \(1 \leq n \leq 4\) mode scaling data are stored in "results_analysis_new_rrQR_i_Scenx_Sceny_slopex_slopey_offsetx_offsety_M1k_kxk(_extended).txt". The data can be loaded using ',' as the delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:

    col 0: label number to keep track

    col 1: the class_x based on \(M_k(n_x, 2)\), where 0 corresponds to class I, 1 to class C, and 2 to class X (neither class I nor C for \(1 \leq n_x \leq 4\))

    col 2: the class_y based on \(M_k(2, n_y)\), where 0 corresponds to class I, 1 to class C, and 2 to class X (neither class I nor C for \(1 \leq n_y \leq 4\))

    col 3: slope_x from \(n_x \geq 2\) onward (undefined for class X)

    col 4: slope_y from \(n_y \geq 2\) onward (undefined for class X)

    col 5: the offset_x is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_x}\)

    col 6: the offset_y is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_y}\)

    col 7: \(M_k(1, 1)\)

  19. Supporting Materials for: Advancing Open and Reproducible Water Data Science...

    • search.dataone.org
    Updated Nov 16, 2024
    Cite
    Jeffery S. Horsburgh (2024). Supporting Materials for: Advancing Open and Reproducible Water Data Science by Integrating Data Analytics with an Online Data Repository [Dataset]. https://search.dataone.org/view/sha256%3A88a5e5b98544a9512abb25c1e82acafd3b306d88f9ac33b64f72c32f23614350
    Explore at:
    Dataset updated
    Nov 16, 2024
    Dataset provided by
    Hydroshare
    Authors
    Jeffery S. Horsburgh
    Description

    This HydroShare resource was created as a demonstration of how a reproducible data science workflow can be created and shared using HydroShare. The hsclient Python Client package for HydroShare is used to show how the content files for the analysis can be managed and shared automatically in HydroShare. The content files include a Jupyter notebook that demonstrates a simple regression analysis to develop a model of annual maximum discharge in the Logan River in northern Utah, USA from annual maximum snow water equivalent data from a snowpack telemetry (SNOTEL) monitoring site located in the watershed. Streamflow data are retrieved from the United States Geological Survey (USGS) National Water Information System using the dataretrieval package. Snow water equivalent data are retrieved from the United States Department of Agriculture Natural Resources Conservation Service (NRCS) SNOTEL system. An additional notebook demonstrates how to use hsclient to retrieve data from HydroShare, load it into a performance data object, and then use the data for visualization and analysis.
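    As a rough illustration of the retrieval step these notebooks automate, the following sketch pulls daily discharge with the dataretrieval package and computes an annual maximum series; the site number, parameter code, and column name are assumptions for illustration, not values taken from the resource itself:

    ```python
    from dataretrieval import nwis

    # Assumed USGS gage and parameter for illustration (00060 = discharge, cfs)
    site = "10109000"   # replace with the gage of interest
    df = nwis.get_record(sites=site, service="dv", parameterCd="00060",
                         start="2010-01-01", end="2020-12-31")

    # Assumes a datetime-indexed DataFrame, as for a single-site daily-values query.
    # "00060_Mean" is the typical NWIS column name and may differ by site/parameter.
    annual_max = df["00060_Mean"].resample("YS").max()
    print(annual_max)
    ```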

  20. Hydroinformatics Instruction Module Example Code: Programmatic Data Access...

    • hydroshare.org
    • beta.hydroshare.org
    zip
    Updated Mar 3, 2022
    Cite
    Amber Spackman Jones; Jeffery S. Horsburgh (2022). Hydroinformatics Instruction Module Example Code: Programmatic Data Access with USGS Data Retrieval [Dataset]. https://www.hydroshare.org/resource/a58b5d522d7f4ab08c15cd05f3fd2ad3
    Explore at:
    zip(34.5 KB)Available download formats
    Dataset updated
    Mar 3, 2022
    Dataset provided by
    HydroShare
    Authors
    Amber Spackman Jones; Jeffery S. Horsburgh
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This resource contains Jupyter Notebooks with examples for accessing USGS NWIS data via web services and performing subsequent analysis related to drought, with particular focus on sites in Utah and the southwestern United States (the code could be modified for any USGS sites). The code uses the Python DataRetrieval package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.

    This resource consists of 6 example notebooks:

    1. Example 1: Import and plot daily flow data
    2. Example 2: Import and plot instantaneous flow data for multiple sites
    3. Example 3: Perform analyses with USGS annual statistics data
    4. Example 4: Retrieve data and find daily flow percentiles
    5. Example 5: Further examination of drought year flows
    6. Coding challenge: Assess drought severity
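    As a rough sketch of the kind of analysis in Example 4 above (the site number and column name are illustrative assumptions, not values from the notebooks):

    ```python
    from dataretrieval import nwis

    # Assumed USGS gage and parameter for illustration (00060 = discharge, cfs)
    site = "09180500"   # replace with a Utah / southwestern US gage of interest
    df = nwis.get_record(sites=site, service="dv", parameterCd="00060",
                         start="1990-01-01", end="2020-12-31")

    # Assumes a datetime-indexed DataFrame, as for a single-site daily-values query
    flow = df["00060_Mean"].dropna()
    # Percentiles of daily mean flow for each day of year, a common drought benchmark
    doy_percentiles = flow.groupby(flow.index.dayofyear).quantile([0.10, 0.50, 0.90])
    print(doy_percentiles.head(9))
    ```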

