Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 11/15/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis evolved throughout the peer review process.
# Data information
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.
# Code information
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the `source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhouse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study that then get saved in the "manuscript_figures" folder. Note that all maps were produced using Python code found in the "supporting_code"" folder. Also note that within the "manuscript_figures" folder there is an "extended_data" folder, which contains tables of the summary statistics (e.g., quartiles and sample sizes) behind figures containing box plots or depicting regression coefficients.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the raw data as well as the Python code used to generate the results and plots shown in the main manuscript.
Overview
This repository contains ready-to-use frequency time series as well as the corresponding pre-processing scripts in Python. The data covers three synchronous areas of the European power grid:
This work is part of the paper "Predictability of Power Grid Frequency" [1]. Please cite this paper when using the data and the code. For detailed documentation of the pre-processing procedure, we refer to the supplementary material of the paper.
Data sources
We downloaded the frequency recordings from publicly available repositories of three different Transmission System Operators (TSOs).
Content of the repository
A) Scripts
The Python scripts run with Python 3.7 and with the packages found in "requirements.txt".
B) Yearly converted and cleansed data
The folders "
Use cases
We point out that this repository can be used in two different ways:
Use pre-processed data: You can directly use the converted or the cleansed data. Note, however, that both data sets include segments of NaN values due to missing and corrupted recordings. Only a very small part of the NaN values was eliminated in the cleansed data, so as not to alter the data too much (a minimal loading sketch follows this list).
Produce your own cleansed data: Depending on your application, you might want to cleanse the data in a custom way. You can easily add your custom cleansing procedure in "clean_corrupted_data.py" and then produce cleansed data from the raw data in "
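A minimal loading sketch for the pre-processed frequency time series. This assumes the yearly files are CSVs with a time index; the file name below is a placeholder, not the repository's actual naming:

```python
import pandas as pd

# placeholder file name; substitute one of the yearly converted or cleansed files
freq = pd.read_csv("cleansed_2019.csv", index_col=0, parse_dates=True)

# missing or corrupted recordings remain as NaN segments
print(freq.isna().sum())
```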
License
This work is licensed under multiple licenses, which are located in the "LICENSES" folder.
Changelog
Version 2:
Version 3:
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a subsampled version of the STEAD dataset, specifically tailored for training our CDiffSD model (Cold Diffusion for Seismic Denoising). It consists of four HDF5 files, each saved in a format that requires Python's `h5py` library to open.
The dataset includes the following files:
Each file is structured to support the training and evaluation of seismic denoising models.
The HDF5 files named noise contain two main datasets:
Similarly, the train and test files, which contain earthquake data, include the same traces and metadata datasets, but also feature two additional datasets:
To load these files in a Python environment, use the following approach:
```python
import h5py
import numpy as np

# Open the HDF5 file in read mode
with h5py.File('train_noise.hdf5', 'r') as file:
    # Print all the main keys in the file
    print("Keys in the HDF5 file:", list(file.keys()))

    if 'traces' in file:
        # Access the dataset
        data = file['traces'][:10]  # Load the first 10 traces

    if 'metadata' in file:
        # Access the dataset
        trace_name = file['metadata'][:10]  # Load the first 10 metadata entries
```
Ensure that the path to the file is correctly specified relative to your Python script.
To use this dataset, ensure you have Python installed along with the NumPy and h5py libraries, which can be installed via pip if not already available:
```bash
pip install numpy
pip install h5py
```
Dataset Card for Python-DPO
This dataset is the smaller version of the Python-DPO-Large dataset and has been created using Argilla.
Load with datasets
To load this dataset with the `datasets` library, install it with `pip install datasets --upgrade` and then use the following code:
from datasets import load_dataset
ds = load_dataset("NextWealth/Python-DPO")
Data Fields
Each data instance contains:
instruction: The problem… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wind speed, wind direction, and the variable wind indicator are the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 168 stations that have, at some point, recorded the wind since 1950, at hourly intervals. It is important to note that not all stations are currently operational.
The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.
In addition, a historical database is provided in .npz* and .mat** formats, updated every 30 days for those stations that are still active.
*To load the data correctly in Python, it is recommended to use the following code:
import numpy as np

with np.load(filename, allow_pickle=True) as f:
    data = {}
    for key, value in f.items():
        data[key] = value.item()
**Date data is in datenum format; to load it correctly in datetime format, it is recommended to use the following command in MATLAB:
datetime(TS.x, 'ConvertFrom', 'datenum')
polyOne Data Set
The data set contains 100 million hypothetical polymers each with 29 predicted properties using machine learning models. We use PSMILES strings to represent polymer structures, see here and here. The polymers are generated by decomposing previously synthesized polymers into unique chemical fragments. Random and enumerative compositions of these fragments yield 100 million hypothetical PSMILES strings. All PSMILES strings are chemically valid polymers but, mostly, have never been synthesized before. More information can be found in the paper. Please note the license agreement in the LICENSE file.
Full data set including the properties
The data files are in Apache Parquet format. The files start with polyOne_*.parquet.
I recommend using dask (pip install dask) to load and process the data set. Pandas also works but is slower.
Load sharded data set with dask:
```python
import dask.dataframe as dd

ddf = dd.read_parquet("*.parquet", engine="pyarrow")
```
For example, compute the description of the data set:
```python
df_describe = ddf.describe().compute()
df_describe
```
PSMILES strings only
generated_polymer_smiles_train.txt - 80 million PSMILES strings for training polyBERT. One string per line.
generated_polymer_smiles_dev.txt - 20 million PSMILES strings for testing polyBERT. One string per line.
This dataset contains materials from the Coalition for Community-Supported Affordable Geothermal Energy Systems (C2SAGES) project, which evaluated the techno-economic feasibility of a community geothermal system for a residential development in Hinesburg, VT. The dataset includes detailed soil conductivity test reports, energy models, borehole design reports, hourly energy loads for heating, cooling, and hot water, and design layouts. EnergyPlus was used to model building energy loads, and Modelica software was applied for geothermal loop sizing based on these loads and soil conductivity results. Python scripts for network design further refined the models. Key files include PDF reports on borehole design (with projections for 1-year, 15-year, and 30-year systems), soil conductivity test results, EnergyPlus modeling outputs, and 2D/3D design drawings in PDF, DWG, and DXF formats. Python notebooks for network design and OnePipe model files are also provided, with Modelica required for viewing certain files. Outputs and modeling data are in various formats including CSV, JPG, HTML, and IDF, with units and data clearly labeled to support understanding of system design and performance for the proposed geothermal solution.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of Uniaxial Cyclic and Tensile Coupon Tests for Structural Metallic Materials
Background
This dataset contains data from monotonic and cyclic loading experiments on structural metallic materials. The materials are primarily structural steels; one iron-based shape memory alloy is also included. Summary files provide an overview of the database, and data from the individual experiments are also included.
The files included in the database are outlined below and the format of the files is briefly described. Additional information regarding the formatting can be found through the post-processing library (https://github.com/ahartloper/rlmtp/tree/master/protocols).
Usage
The data is licensed under the Creative Commons Attribution 4.0 International license.
If you have used our data and are publishing your work, we ask that you please reference both:
this database through its DOI, and
any publication that is associated with the experiments. See the Overall_Summary and Database_References files for the associated publication references.
Included Files
Overall_Summary_2022-08-25_v1-0-0.csv: summarises the specimen information for all experiments in the database.
Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv: summarises the average initial yield stress and average initial elastic modulus per campaign.
Unreduced_Data-#_v1-0-0.zip: contains the original (not downsampled) data
Where # is one of: 1, 2, 3, 4, 5, 6. The unreduced data is broken into separate archives because of upload limitations to Zenodo. Together they provide all the experimental data.
We recommend that you un-zip all the folders and place them in one "Unreduced_Data" directory, similar to the "Clean_Data" directory
The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the unreduced data.
The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
Clean_Data_v1-0-0.zip: contains all the downsampled data
The experimental data is provided through .csv files for each test that contain the processed data. The experiments are organised by experimental campaign and named by load protocol and specimen. A .pdf file accompanies each test showing the stress-strain graph.
There is a "db_tag_clean_data_map.csv" file that is used to map the database summary with the clean data.
The computed yield stresses and elastic moduli are stored in the "yield_stress" directory.
Database_References_v1-0-0.bib
Contains a bibtex reference for many of the experiments in the database. Corresponds to the "citekey" entry in the summary files.
File Format: Downsampled Data
These are the "LP_
The header of the first column is empty: the first column corresponds to the index of the sample point in the original (unreduced) data
Time[s]: time in seconds since the start of the test
e_true: true strain
Sigma_true: true stress in MPa
(optional) Temperature[C]: the surface temperature in degC
These data files can be easily loaded using the pandas library in Python through:
import pandas
data = pandas.read_csv(data_file, index_col=0)
The data is formatted so it can be used directly in RESSPyLab (https://github.com/AlbanoCastroSousa/RESSPyLab). Note that the column names "e_true" and "Sigma_true" were kept for backwards compatibility reasons with RESSPyLab.
File Format: Unreduced Data
These are the "LP_
The first column is the index of each data point
S/No: sample number recorded by the DAQ
System Date: Date and time of sample
Time[s]: time in seconds since the start of the test
C_1_Force[kN]: load cell force
C_1_Déform1[mm]: extensometer displacement
C_1_Déplacement[mm]: cross-head displacement
Eng_Stress[MPa]: engineering stress
Eng_Strain[]: engineering strain
e_true: true strain
Sigma_true: true stress in MPa
(optional) Temperature[C]: specimen surface temperature in degC
The data can be loaded and used similarly to the downsampled data.
File Format: Overall_Summary
The overall summary file provides data on all the test specimens in the database. The columns include:
hidden_index: internal reference ID
grade: material grade
spec: specifications for the material
source: base material for the test specimen
id: internal name for the specimen
lp: load protocol
size: type of specimen (M8, M12, M20)
gage_length_mm_: unreduced section length in mm
avg_reduced_dia_mm_: average measured diameter for the reduced section in mm
avg_fractured_dia_top_mm_: average measured diameter of the top fracture surface in mm
avg_fractured_dia_bot_mm_: average measured diameter of the bottom fracture surface in mm
fy_n_mpa_: nominal yield stress
fu_n_mpa_: nominal ultimate stress
t_a_deg_c_: ambient temperature in degC
date: date of test
investigator: person(s) who conducted the test
location: laboratory where test was conducted
machine: setup used to conduct test
pid_force_k_p, pid_force_t_i, pid_force_t_d: PID parameters for force control
pid_disp_k_p, pid_disp_t_i, pid_disp_t_d: PID parameters for displacement control
pid_extenso_k_p, pid_extenso_t_i, pid_extenso_t_d: PID parameters for extensometer control
citekey: reference corresponding to the Database_References.bib file
yield_stress_mpa_: computed yield stress in MPa
elastic_modulus_mpa_: computed elastic modulus in MPa
fracture_strain: computed average true strain across the fracture surface
c,si,mn,p,s,n,cu,mo,ni,cr,v,nb,ti,al,b,zr,sn,ca,h,fe: chemical compositions in units of %mass
file: file name of corresponding clean (downsampled) stress-strain data
File Format: Summarized_Mechanical_Props_Campaign
Meant to be loaded in Python as a pandas DataFrame with multi-indexing, e.g.,
tab1 = pd.read_csv('Summarized_Mechanical_Props_Campaign_' + date + version + '.csv', index_col=[0, 1, 2, 3], skipinitialspace=True, header=[0, 1], keep_default_na=False, na_values='')
citekey: reference in "Campaign_References.bib".
Grade: material grade.
Spec.: specifications (e.g., J2+N).
Yield Stress [MPa]: initial yield stress in MPa
size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
Elastic Modulus [MPa]: initial elastic modulus in MPa
size, count, mean, coefvar: number of experiments in campaign, number of experiments in mean, mean value for campaign, coefficient of variation for campaign
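For example, given the column layout described above (the exact header strings are assumed from this description), campaign-level statistics can be pulled from the multi-indexed DataFrame:

```python
import pandas as pd

tab1 = pd.read_csv(
    "Summarized_Mechanical_Props_Campaign_2022-08-25_v1-0-0.csv",
    index_col=[0, 1, 2, 3], skipinitialspace=True,
    header=[0, 1], keep_default_na=False, na_values="",
)

# campaign-level mean initial yield stress and its coefficient of variation
mean_fy = tab1[("Yield Stress [MPa]", "mean")]
coefvar_fy = tab1[("Yield Stress [MPa]", "coefvar")]
```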
Caveats
The files in the following directories were tested before the protocol was established. Therefore, only the true stress-strain is available for each:
A500
A992_Gr50
BCP325
BCR295
HYP400
S460NL
S690QL/25mm
S355J2_Plates/S355J2_N_25mm and S355J2_N_50mm
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used in the various stage two experiments in: "Comparing Clustering Approaches for Smart Meter Time Series: Investigating the Influence of Dataset Properties on Performance". This includes datasets with varied characteristics. All datasets are stored in a dict with tuples of (time series array, class labels). To access the data in Python:
import pickle

filename = "dataset.txt"
with open(filename, 'rb') as f:
    data = pickle.load(f)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the results of the pipeline I wrote for my research in order to determine how the imputation method affects the underlying structure of a microsatellite/cell dataset useful for reconstructing cell lineage maps. The analysis was done using KNN, linear regression, and a novel method based on Fitch's algorithm. The pipeline was applied to 3 datasets (Trees A, B, and C) taken from Frumkin et al. 2005, organized in three different folders. Follow the instructions .txt to load the data into Python for processing.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Large go-around (also referred to as missed approach) data set. The data set is in support of the paper presented at the OpenSky Symposium on November 10th.
If you use this data for a scientific publication, please consider citing our paper.
The data set contains landings from 176 (mostly) large airports in 44 different countries. The landings are labelled as performing a go-around (GA) or not. In total, the data set contains almost 9 million landings, with more than 33,000 GAs. The data was collected from the OpenSky Network's historical database for the year 2019. The published data set contains multiple files:
go_arounds_minimal.csv.gz
Compressed CSV containing the minimal data set. It contains a row for each landing with a minimal amount of information about the landing and whether it was a GA. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
The last two columns, n_approaches and n_rwy_approached, are useful for filtering out training and calibration flights. These usually have a large number of n_approaches, so an easy way to exclude them is to filter by n_approaches > 2, for example:
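A minimal pandas sketch of this filter (column names as described above):

```python
import pandas as pd

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)

# drop likely training/calibration flights, which show many approaches
df_regular = df[df["n_approaches"] <= 2]
```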
go_arounds_augmented.csv.gz
Compressed CSV containing the augmented data set. It contains a row for each landing and additional information about the landing, and whether it was a GA. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| time | date time | UTC time of landing or first GA attempt |
| icao24 | string | Unique 24-bit (hexadecimal number) ICAO identifier of the aircraft concerned |
| callsign | string | Aircraft identifier in air-ground communications |
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| has_ga | string | "True" if at least one GA was performed, otherwise "False" |
| n_approaches | integer | Number of approaches identified for this flight |
| n_rwy_approached | integer | Number of unique runways approached by this flight |
| registration | string | Aircraft registration |
| typecode | string | Aircraft ICAO typecode |
| icaoaircrafttype | string | ICAO aircraft type |
| wtc | string | ICAO wake turbulence category |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
| operator_country | string | ISO Alpha-3 country code of the operator |
| operator_region | string | Geographical region of the operator of the aircraft (either Europe, North America, South America, Asia, Africa, or Oceania) |
| wind_speed_knts | integer | METAR, surface wind speed in knots |
| wind_dir_deg | integer | METAR, surface wind direction in degrees |
| wind_gust_knts | integer | METAR, surface wind gust speed in knots |
| visibility_m | float | METAR, visibility in m |
| temperature_deg | integer | METAR, temperature in degrees Celsius |
| press_sea_level_p | float | METAR, sea level pressure in hPa |
| press_p | float | METAR, QNH in hPa |
| weather_intensity | list | METAR, list of present weather codes: qualifier - intensity |
| weather_precipitation | list | METAR, list of present weather codes: weather phenomena - precipitation |
| weather_desc | list | METAR, list of present weather codes: qualifier - descriptor |
| weather_obscuration | list | METAR, list of present weather codes: weather phenomena - obscuration |
| weather_other | list | METAR, list of present weather codes: weather phenomena - other |
This data set is augmented with data from various public data sources. Aircraft-related data is mostly from the OpenSky Network's aircraft database, the METAR information is from Iowa State University, and the rest is mostly scraped from different web sites. If you need help with the METAR information, you can consult the WMO's Aerodrome Reports and Forecasts handbook.
go_arounds_agg.csv.gz
Compressed CSV containing the aggregated data set. It contains a row for each airport-runway combination, i.e. every runway at every airport for which data is available. The data is structured in the following way:
| Column name | Type | Description |
|---|---|---|
| airport | string | ICAO airport code where the aircraft is landing |
| runway | string | Runway designator on which the aircraft landed |
| n_landings | integer | Total number of landings observed on this runway in 2019 |
| ga_rate | float | Go-around rate, per 1000 landings |
| glide_slope_angle | float | Angle of the ILS glide slope in degrees |
| has_intersection | string | Boolean that is true if the runway has another runway intersecting it, otherwise false |
| rwy_length | float | Length of the runway in kilometres |
| airport_country | string | ISO Alpha-3 country code of the airport |
| airport_region | string | Geographical region of the airport (either Europe, North America, South America, Asia, Africa, or Oceania) |
This aggregated data set is used in the paper for the generalized linear regression model.
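As an illustration of working with this file (column names as in the table above; this particular computation is ours, not taken from the paper):

```python
import pandas as pd

agg = pd.read_csv("go_arounds_agg.csv.gz")

# overall go-around rate per 1000 landings, weighting each runway by its traffic
overall_rate = (agg["ga_rate"] * agg["n_landings"]).sum() / agg["n_landings"].sum()
print(round(overall_rate, 2))
```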
Downloading the trajectories
Users of this data set with access to the OpenSky Network's Impala shell can download the historical trajectories from the historical database with a few lines of Python code. Suppose, for example, you want to get all the go-arounds on 4 January 2019 at London City Airport (EGLC). You can use the Traffic library for easy access to the database:
import datetime

from tqdm.auto import tqdm
import pandas as pd
from traffic.data import opensky
from traffic.core import Traffic

df = pd.read_csv("go_arounds_minimal.csv.gz", low_memory=False)
df["time"] = pd.to_datetime(df["time"])

airport = "EGLC"
start = datetime.datetime(year=2019, month=1, day=4).replace(tzinfo=datetime.timezone.utc)
stop = datetime.datetime(year=2019, month=1, day=5).replace(tzinfo=datetime.timezone.utc)

df_selection = df.query("airport==@airport & has_ga & (@start <= time <= @stop)")

flights = []
delta_time = pd.Timedelta(minutes=10)
for _, row in tqdm(df_selection.iterrows(), total=df_selection.shape[0]):
    # take at most 10 minutes before and 10 minutes after the landing or go-around
    start_time = row["time"] - delta_time
    stop_time = row["time"] + delta_time

    # fetch the data from OpenSky Network
    flights.append(
        opensky.history(
            start=start_time.strftime("%Y-%m-%d %H:%M:%S"),
            stop=stop_time.strftime("%Y-%m-%d %H:%M:%S"),
            callsign=row["callsign"],
            return_flight=True,
        )
    )

Traffic.from_flights(flights)
Additional files
Additional files are available to check the quality of the classification into GA/not GA and the selection of the landing runway. These are:
validation_table.xlsx: This Excel sheet was manually completed during the review of the samples for each runway in the data set. It provides an estimate of the false positive and false negative rate of the go-around classification. It also provides an estimate of the runway misclassification rate when the airport has two or more parallel runways. The columns with the headers highlighted in red were filled in manually, the rest is generated automatically.
validation_sample.zip: For each runway, 8 batches of 500 randomly selected trajectories (or as many as available, if fewer than 4000) classified as not having a GA and up to 8 batches of 10 random landings, classified as GA, are plotted. This allows the interested user to visually inspect a random sample of the landings and go-arounds easily.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
# STEAD Subsample Dataset for CDiffSD Training
## Overview
This dataset is a subsampled version of the STEAD dataset, specifically tailored for training our CDiffSD model (Cold Diffusion for Seismic Denoising). It consists of four CSV files, each saved in a format that requires Python's `pd.read_pickle` method to open.
## Dataset Files
The dataset includes the following files:
Each file is structured to support the training and evaluation of seismic denoising models.
## Data Columns
The dataset files contain columns from the original STEAD dataset, with specific focus on the following columns that are critical for our use:
## Usage
To load these files in a Python environment, use the following approach:
```python
import pandas as pd
# Example of loading the training data
df_train = pd.read_pickle('path/to/df_train.csv')
```
Ensure that the path to the file is correctly specified relative to your Python script.
## Requirements
To use this dataset, ensure you have Python installed along with the Pandas library, which can be installed via pip if not already available:
```bash
pip install pandas
```
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Air temperature is one of the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 507 stations that have, at some point, recorded the air temperature since 1950, at hourly intervals. It is important to note that not all stations are currently operational.
The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.
In addition, a historical database is provided in .npz* and .mat** formats, updated every 30 days for those stations that are still active.
*To load the data correctly in Python, it is recommended to use the following code:
import numpy as np

with np.load(filename, allow_pickle=True) as f:
    data = {}
    for key, value in f.items():
        data[key] = value.item()
**Date data is in datenum format; to load it correctly in datetime format, it is recommended to use the following command in MATLAB:
datetime(TS.x, 'ConvertFrom', 'datenum')
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wind speed, wind direction, and the variable wind indicator are the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 326 stations that have, at some point, recorded the wind since 1950, at hourly intervals. It is important to note that not all stations are currently operational.
The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.
In addition, a historical database is provided in .npz* and .mat** formats, updated every 30 days for those stations that are still active.
*To load the data correctly in Python, it is recommended to use the following code:
import numpy as np

with np.load(filename, allow_pickle=True) as f:
    data = {}
    for key, value in f.items():
        data[key] = value.item()
**Date data is in datenum format; to load it correctly in datetime format, it is recommended to use the following command in MATLAB:
datetime(TS.x, 'ConvertFrom', 'datenum')
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the simulation data of the combinatorial metamaterial as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.
In this paper, the data is used to classify each \(k \times k\) unit cell design into one of two classes (C or I) based on the scaling (linear or constant) of the number of zero modes \(M_k(n)\) for metamaterials consisting of an \(n\times n\) tiling of the corresponding unit cell. Additionally, a random walk through the design space starting from class C unit cells was performed to characterize the boundary between class C and I in design space. A more detailed description of the contents of the dataset follows below.
Modescaling_raw_data.zip
This file contains uniformly sampled unit cell designs for metamaterial M2 and \(M_k(n)\) for \(1\leq n\leq 4\), which was used to classify the unit cell designs for the data set. There is a small subset of designs for \(k=\{3, 4, 5\}\) that do not neatly fall into the class C and I classification, and instead require additional simulation for \(4 \leq n \leq 6\) before either saturating to a constant number of zero modes (class I) or linearly increasing (class C). This file contains the simulation data of size \(3 \leq k \leq 8\) unit cells. The data is organized as follows.
Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4.npy", and contain a [Nsim, 1+k*k+4] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Note: the unit cell design uses the numbers \(\{0, 1, 2, 3\}\) to refer to each building block orientation. The building block orientations can be characterized through the orientation of the missing diagonal bar (see Fig. 2 in the paper), which can be Left Up (LU), Left Down (LD), Right Up (RU), or Right Down (RD). The numbers correspond to the building block orientation \(\{0, 1, 2, 3\} = \{\mathrm{LU, RU, RD, LD}\}\).
Simulation data for \(3 \leq k \leq 5\) and \(1 \leq n \leq 6\) for unit cells that cannot be classified as class C or I for \(1 \leq n \leq 4\) is stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. These files are named "data_new_rrQR_i_n_M_kxk_fixn4_classX_extend.npy", and contain a [Nsim, 1+k*k+6] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Simulation data for \(6 \leq k \leq 8\) unit cells are stored in numpy array format (.npy) and can be readily loaded in Python with the Numpy package using the numpy.load command. Note that the number of modes is now calculated for \(n_x \times n_y\) metamaterials, where we calculate \((n_x, n_y) = \{(1,1), (2, 2), (3, 2), (4,2), (2, 3), (2, 4)\}\) rather than \(n_x=n_y=n\) to save computation time. These files are named "data_new_rrQR_i_n_Mx_My_n4_kxk(_extended).npy", and contain a [Nsim, 1+k*k+8] sized array, where Nsim is the number of simulated unit cells. Each row corresponds to a unit cell. The columns are organized as follows:
Simulation data of metamaterial M1 for \(k_x \times k_y\) metamaterials are stored in compressed numpy array format (.npz) and can be loaded in Python with the Numpy package using the numpy.load command. These files are named "smiley_cube_x_y_\(k_x\)x\(k_y\).npz", which contain all possible metamaterial designs, and "smiley_cube_uniform_sample_x_y_\(k_x\)x\(k_y\).npz", which contain uniformly sampled metamaterial designs. The configurations are accessed with the keyword argument 'configs'. The classification is accessed with the keyword argument 'compatible'. The configurations array is of shape [Nsim, \(k_x\), \(k_y\)], the classification array is of shape [Nsim]. The building blocks in the configuration are denoted by 0 or 1, which correspond to the red/green and white/dashed building blocks respectively. Classification is 0 or 1, which corresponds to I and C respectively.
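For instance, one of the M1 archives can be read as follows (the file name is a hypothetical instance of the pattern above; the keyword arguments are as stated):

```python
import numpy as np

# hypothetical instance of the "smiley_cube_x_y_<kx>x<ky>.npz" pattern
with np.load("smiley_cube_x_y_3x3.npz") as f:
    configs = f["configs"]      # shape [Nsim, k_x, k_y]; 0/1 encode the two building blocks
    labels = f["compatible"]    # shape [Nsim]; 0 = class I, 1 = class C
```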
Modescaling_classification_results.zip
This file contains the classification, slope, and offset of the scaling of the number of zero modes \(M_k(n)\) for the unit cells of metamaterial M2 in Modescaling_raw_data.zip. The data is organized as follows.
The results for \(3 \leq k \leq 5\) based on the \(1 \leq n \leq 4\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number to keep track
col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n \leq 4\))
col 2: slope from \(n \geq 2\) onward (undefined for class X)
col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)
col 4: \(M_k(1)\)
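A minimal way to read one of these result files (assuming only that it is comma-delimited, as stated; substitute the actual i and kxk values in the file name):

```python
import numpy as np

# file name pattern as above; replace "i" and "kxk" with the actual index and unit cell size
results = np.loadtxt("results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4.txt", delimiter=",")

labels = results[:, 0]   # col 0: label number
classes = results[:, 1]  # col 1: 0 = class I, 1 = class C, 2 = class X
```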
The results for \(3 \leq k \leq 5\) based on the extended \(1 \leq n \leq 6\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scen_slope_offset_M1k_kxk_fixn4_classC_extend.txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number to keep track
col 1: the class, where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n \leq 6\))
col 2: slope from \(n \geq 2\) onward (undefined for class X)
col 3: the offset is defined as \(M_k(2) - 2 \cdot \mathrm{slope}\)
col 4: \(M_k(1)\)
The results for \(6 \leq k \leq 8\) based on the \(1 \leq n \leq 4\) mode scaling data is stored in "results_analysis_new_rrQR_i_Scenx_Sceny_slopex_slopey_offsetx_offsety_M1k_kxk(_extended).txt". The data can be loaded using ',' as delimiter. Every row corresponds to a unit cell design (see the label number to compare to the earlier data). The columns are organized as follows:
col 0: label number to keep track
col 1: the class_x based on \(M_k(n_x, 2)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n_x \leq 4\))
col 2: the class_y based on \(M_k(2, n_y)\), where 0 corresponds to class I, 1 to class C and 2 to class X (neither class I or C for \(1 \leq n_y \leq 4\))
col 3: slope_x from \(n_x \geq 2\) onward (undefined for class X)
col 4: slope_y from \(n_y \geq 2\) onward (undefined for class X)
col 5: the offset_x is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_x}\)
col 6: the offset_y is defined as \(M_k(2, 2) - 2 \cdot \mathrm{slope_y}\)
col 7: (M_k(1,
This HydroShare resource was created as a demonstration of how a reproducible data science workflow can be created and shared using HydroShare. The hsclient Python Client package for HydroShare is used to show how the content files for the analysis can be managed and shared automatically in HydroShare. The content files include a Jupyter notebook that demonstrates a simple regression analysis to develop a model of annual maximum discharge in the Logan River in northern Utah, USA from annual maximum snow water equivalent data from a snowpack telemetry (SNOTEL) monitoring site located in the watershed. Streamflow data are retrieved from the United States Geological Survey (USGS) National Water Information System using the dataretrieval package. Snow water equivalent data are retrieved from the United States Department of Agriculture Natural Resources Conservation Service (NRCS) SNOTEL system. An additional notebook demonstrates how to use hsclient to retrieve data from HydroShare, load it into a pandas data object, and then use the data for visualization and analysis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains Jupyter Notebooks with examples for accessing USGS NWIS data via web services and performing subsequent analysis related to drought with particular focus on sites in Utah and the southwestern United States (could be modified to any USGS sites). The code uses the Python DataRetrieval package. The resource is part of set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 6 example notebooks:
1. Example 1: Import and plot daily flow data
2. Example 2: Import and plot instantaneous flow data for multiple sites
3. Example 3: Perform analyses with USGS annual statistics data
4. Example 4: Retrieve data and find daily flow percentiles
5. Example 5: Further examination of drought year flows
6. Coding challenge: Assess drought severity
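A minimal, hedged sketch of the kind of NWIS retrieval these notebooks perform with the DataRetrieval package (the site number and dates here are illustrative, not taken from the notebooks):

```python
from dataretrieval import nwis

# daily-values (dv) records for an illustrative southwestern USGS site and period
df = nwis.get_record(sites="09180500", service="dv", start="2018-01-01", end="2022-12-31")
print(df.head())
```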