Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Author: Andrew J. Felton
Date: 10/29/2024
This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:
"Global estimates of the storage and transit time of water through vegetation"
Please note that 'turnover' and 'transit' are used interchangeably. Also note that this R project has been updated multiple times as the analysis has evolved.
Data information:
The data folder contains key data sets used for analysis. In particular:
"data/turnover_from_python/updated/august_2024_lc/" contains the core datasets used in this study including global arrays summarizing five year (2016-2020) averages of mean (annual) and minimum (monthly) transit time, storage, canopy transpiration, and number of months of data able as both an array (.nc) or data table (.csv). These data were produced in python using the python scripts found in the "supporting_code" folder. The remaining files in the "data" and "data/supporting_data"" folder primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here. The "supporting_data"" folder also contains annual (2016-2020) MODIS land cover data used in the analysis and contains separate filters containing the original data (.hdf) and then the final process (filtered) data in .nc format. The resulting annual land cover distributions were used in the pre-processing of data in python.
Code information:
Python scripts can be found in the "supporting_code" folder.
Each R script in this project has a role:
"01_start.R": This script sets the working directory, loads in the tidyverse package (the remaining packages in this project are called using the `::` operator), and can run two other scripts: one that loads the customized functions (02_functions.R) and one for importing and processing the key dataset for this analysis (03_import_data.R).
"02_functions.R": This script contains custom functions. Load this using the
`source()` function in the 01_start.R script.
"03_import_data.R": This script imports and processes the .csv transit data. It joins the mean (annual) transit time data with the minimum (monthly) transit data to generate one dataset for analysis: annual_turnover_2. Load this using the
`source()` function in the 01_start.R script.
"04_figures_tables.R": This is the main workhouse for figure/table production and
supporting analyses. This script generates the key figures and summary statistics
used in the study that then get saved in the manuscript_figures folder. Note that all
maps were produced using Python code found in the "supporting_code"" folder.
"supporting_generate_data.R": This script processes supporting data used in the analysis, primarily the varying ground-based datasets of leaf water content.
"supporting_process_land_cover.R": This takes annual MODIS land cover distributions and processes them through a multi-step filtering process so that they can be used in preprocessing of datasets in python.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was developed by NREL's distributed energy systems integration group as part of a study on high penetrations of distributed solar PV [1]. It consists of hourly load data in CSV format for use with the PNNL taxonomy of distribution feeders [2]. These feeders were developed in the open source GridLAB-D modelling language [3]. In this dataset each of the load points in the taxonomy feeders is populated with hourly averaged load data from a utility in the feeder’s geographical region, scaled and randomized to emulate real load profiles. For more information on the scaling and randomization process, see [1].
The taxonomy feeders are statistically representative of the various types of distribution feeders found in five geographical regions of the U.S. Efforts are underway (possibly complete) to translate these feeders into the OpenDSS modelling language.
This data set consists of one large CSV file for each feeder. Within each CSV, each column represents one load bus on the feeder. The header row lists the name of the load bus. The subsequent 8760 rows represent the loads for each hour of the year. The loads were scaled and randomized using a Python script, so each load series represents only one of many possible randomizations. In the header row, "rl" = residential load and "cl" = commercial load. Commercial loads are followed by a phase letter (A, B, or C). For regions 1-3, the data is from 2009. For regions 4-5, the data is from 2000.
For use in GridLAB-D, each column will need to be separated into its own CSV file without a header. The load value goes in the second column, and corresponding datetime values go in the first column, as shown in the sample file, sample_individual_load_file.csv. Only the first value in the time column needs to be written as an absolute time; subsequent times may be written in relative format (e.g., "+1h", as in the sample). The load should be written in P+Qj format, as seen in the sample CSV, in units of Watts (W) and Volt-amps reactive (VAr). This dataset was derived from metered load data and hence includes only real power; reactive power can be generated by assuming an appropriate power factor. These loads were used with GridLAB-D version 2.2.
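As an illustration only (not the script used to build the dataset), the splitting step described above might look like the following in Python; the feeder file name, the 2009 start timestamp, and the 0.95 power factor are assumptions to check against sample_individual_load_file.csv.

import math
import pandas as pd

PF = 0.95                                   # assumed power factor
Q_FACTOR = math.tan(math.acos(PF))          # Q = P * tan(acos(PF))

feeder = pd.read_csv("R1-12.47-1.csv")      # hypothetical feeder CSV name
for bus in feeder.columns:
    p = feeder[bus].astype(float)           # real power, W
    q = p * Q_FACTOR                        # reactive power, VAr
    with open(f"{bus}.csv", "w") as out:
        for i, (pw, qv) in enumerate(zip(p, q)):
            # first timestamp absolute, later rows relative, as in the sample file
            t = "2009-01-01 0:00:00" if i == 0 else "+1h"
            out.write(f"{t},{pw:.1f}+{qv:.1f}j\n")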
Browse files in this dataset, accessible as individual files and as a single ZIP file. This dataset is approximately 242MB compressed or 475MB uncompressed.
For questions about this dataset, contact andy.hoke@nrel.gov.
If you find this dataset useful, please mention NREL and cite [1] in your work.
References:
[1] A. Hoke, R. Butler, J. Hambrick, and B. Kroposki, “Steady-State Analysis of Maximum Photovoltaic Penetration Levels on Typical Distribution Feeders,” IEEE Transactions on Sustainable Energy, April 2013, available at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6357275 .
[2] K. Schneider, D. P. Chassin, R. Pratt, D. Engel, and S. Thompson, “Modern Grid Initiative Distribution Taxonomy Final Report”, PNNL, Nov. 2008. Accessed April 27, 2012: http://www.gridlabd.org/models/feeders/taxonomy of prototypical feeders.pdf
[3] K. Schneider, D. Chassin, Y. Pratt, and J. C. Fuller, “Distribution power flow for smart grid technologies”, IEEE/PES Power Systems Conference and Exposition, Seattle, WA, Mar. 2009, pp. 1-7, 15-18.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. The classes are completely mutually exclusive. There are 50000 training images and 10000 test images.
The batches.meta file contains the label names of each class.
The dataset was originally divided into 5 training batches with 10000 images per batch. The original dataset can be found here: https://www.cs.toronto.edu/~kriz/cifar.html. This dataset contains all the training data and test data in the same CSV file so it is easier to load.
Here is the list of the 10 classes in the CIFAR-10:
Classes (label: name): 0: airplane, 1: automobile, 2: bird, 3: cat, 4: deer, 5: dog, 6: frog, 7: horse, 8: ship, 9: truck
The function used to open the file:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
Example of how to read the file:
metadata_path = './cifar-10-python/batches.meta' # change this path
metadata = unpickle(metadata_path)
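As a small follow-up, the label names stored in batches.meta can be decoded to plain strings; the b'label_names' key below reflects the usual CIFAR-10 pickle layout and is worth checking against your copy of the file.

label_names = [name.decode("utf-8") for name in metadata[b'label_names']]
print(label_names)  # expected: ['airplane', 'automobile', ..., 'truck']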
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and management challenges in the water domain require synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analyses can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS) (i.e., a Python equivalent of USGS’ R dataRetrieval package), loading data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python, and writing analysis results back to HydroShare for sharing and publication. While these Python packages can be installed for use within any Python environment, we will demonstrate how the technical burden for scientists associated with creating a computational environment for executing analyses can be reduced and how sharing and reproducibility of analyses can be enhanced through the use of these packages within CUAHSI’s HydroShare-linked JupyterHub server.
This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.
Dataset Card for Python-DPO
This dataset is the smaller version of Python-DPO-Large dataset and has been created using Argilla.
Load with datasets
To load this dataset with datasets, you'll just need to install datasets as pip install datasets --upgrade and then use the following code:
from datasets import load_dataset
ds = load_dataset("NextWealth/Python-DPO")
Data Fields
Each data instance contains:
instruction: The problem description/requirements… See the full description on the dataset page: https://huggingface.co/datasets/NextWealth/Python-DPO.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wind speed, wind direction, and the variable-wind indicator are the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 326 stations that have, at some point, recorded wind direction since 1950, at hourly intervals. It is important to note that not all stations are currently operational.
The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.
In addition, a historical database is provided in .npz* and .mat** format that is updated every 30 days for those stations that are still in operation.
*To load the data correctly in Python it is recommended to use the following code:
import numpy as np

with np.load(filename, allow_pickle=True) as f:
    data = {}
    for key, value in f.items():
        data[key] = value.item()
**Date data is in datenum format; to load it correctly in datetime format, it is recommended to use the following command in MATLAB:
datetime(TS.x , 'ConvertFrom' , 'datenum')
CIFAR-10 Python (in CSV): LINK
The CIFAR-100 dataset consists of 60000 32x32 colour images in 100 classes, with 600 images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs). There are 50000 training images and 10000 test images. The meta file contains the label names of each class and superclass.
Here is the list of the 100 classes in the CIFAR-100:
Classes: 1-5) beaver, dolphin, otter, seal, whale 6-10) aquarium fish, flatfish, ray, shark, trout 11-15) orchids, poppies, roses, sunflowers, tulips 16-20) bottles, bowls, cans, cups, plates 21-25) apples, mushrooms, oranges, pears, sweet peppers 26-30) clock, computer keyboard, lamp, telephone, television 31-35) bed, chair, couch, table, wardrobe 36-40) bee, beetle, butterfly, caterpillar, cockroach 41-45) bear, leopard, lion, tiger, wolf 46-50) bridge, castle, house, road, skyscraper 51-55) cloud, forest, mountain, plain, sea 56-60) camel, cattle, chimpanzee, elephant, kangaroo 61-65) fox, porcupine, possum, raccoon, skunk 66-70) crab, lobster, snail, spider, worm 71-75) baby, boy, girl, man, woman 76-80) crocodile, dinosaur, lizard, snake, turtle 81-85) hamster, mouse, rabbit, shrew, squirrel 86-90) maple, oak, palm, pine, willow 91-95) bicycle, bus, motorcycle, pickup truck, train 96-100) lawn-mower, rocket, streetcar, tank, tractor
and the list of the 20 superclasses: 1) aquatic mammals (classes 1-5) 2) fish (classes 6-10) 3) flowers (classes 11-15) 4) food containers (classes 16-20) 5) fruit and vegetables (classes 21-25) 6) household electrical devices (classes 26-30) 7) household furniture (classes 31-35) 8) insects (classes 36-40) 9) large carnivores (classes 41-45) 10) large man-made outdoor things (classes 46-50) 11) large natural outdoor scenes (classes 51-55) 12) large omnivores and herbivores (classes 56-60) 13) medium-sized mammals (classes 61-65) 14) non-insect invertebrates (classes 66-70) 15) people (classes 71-75) 16) reptiles (classes 76-80) 17) small mammals (classes 81-85) 18) trees (classes 86-90) 19) vehicles 1 (classes 91-95) 20) vehicles 2 (classes 96-100)
The function used to open each file:
def unpickle(file):
    import pickle
    with open(file, 'rb') as fo:
        dict = pickle.load(fo, encoding='bytes')
    return dict
Example of how to read the metadata and the superclasses:
metadata_path = './cifar-100-python/meta' # change this path
metadata = unpickle(metadata_path)
superclass_dict = dict(list(enumerate(metadata[b'coarse_label_names'])))
How to load the training and test sets (using superclasses):
import numpy as np

data_pre_path = './cifar-100-python/' # change this path
data_train_path = data_pre_path + 'train'
data_test_path = data_pre_path + 'test'
data_train_dict = unpickle(data_train_path)
data_test_dict = unpickle(data_test_path)
data_train = data_train_dict[b'data']
label_train = np.array(data_train_dict[b'coarse_labels'])
data_test = data_test_dict[b'data']
label_test = np.array(data_test_dict[b'coarse_labels'])
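As a brief follow-up sketch (not part of the original description), each row of data_train holds 3072 values in the standard CIFAR channel-first layout (3 x 32 x 32), so the rows can be reshaped into displayable RGB images:

# Reshape flat rows into (N, 32, 32, 3) images for plotting or model input
images_train = data_train.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
print(images_train.shape)
# Map the first few coarse labels to their superclass names
print([superclass_dict[label] for label in label_train[:5]])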
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scientific and related management challenges in the water domain require synthesis of data from multiple domains. Many data analysis tasks are difficult because datasets are large and complex; standard formats for data types are not always agreed upon nor mapped to an efficient structure for analysis; water scientists may lack training in methods needed to efficiently tackle large and complex datasets; and available tools can make it difficult to share, collaborate around, and reproduce scientific work. Overcoming these barriers to accessing, organizing, and preparing datasets for analyses will be an enabler for transforming scientific inquiries. Building on the HydroShare repository’s established cyberinfrastructure, we have advanced two packages for the Python language that make data loading, organization, and curation for analysis easier, reducing time spent in choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and the USGS’s National Water Information System (NWIS), loading of data into performant structures keyed to specific scientific data types and that integrate with existing visualization, analysis, and data science capabilities available in Python, and then writing analysis results back to HydroShare for sharing and eventual publication. These capabilities reduce the technical burden for scientists associated with creating a computational environment for executing analyses by installing and maintaining the packages within CUAHSI’s HydroShare-linked JupyterHub server. HydroShare users can leverage these tools to build, share, and publish more reproducible scientific workflows. The HydroShare Python Client and USGS NWIS Data Retrieval packages can be installed within a Python environment on any computer running Microsoft Windows, Apple MacOS, or Linux from the Python Package Index using the pip utility. They can also be used online via the CUAHSI JupyterHub server (https://jupyterhub.cuahsi.org/) or other Python notebook environments like Google Colaboratory (https://colab.research.google.com/). Source code, documentation, and examples for the software are freely available in GitHub at https://github.com/hydroshare/hsclient/ and https://github.com/USGS-python/dataretrieval.
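For illustration only, a minimal sketch of the kind of retrieval described above, assuming the dataretrieval package exposes nwis.get_dv as documented in its repository; the site number and parameter code are arbitrary examples rather than part of this resource.

# pip install dataretrieval
from dataretrieval import nwis

# Daily streamflow (USGS parameter code 00060) for one gage over one water year
df, metadata = nwis.get_dv(sites="03339000", parameterCd="00060",
                           start="2020-10-01", end="2021-09-30")
print(df.head())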
This presentation was delivered as part of the Hawai'i Data Science Institute's regular seminar series: https://datascience.hawaii.edu/event/data-science-and-analytics-for-water/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of Jupyter Notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior and encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we analyzed 1.4 million notebooks from GitHub.
This repository contains two files:
The dump.tar.bz2 file contains a PostgreSQL dump of the database, with all the data we extracted from the notebooks.
The jupyter_reproducibility.tar.bz2 file contains all the scripts we used to query and download Jupyter Notebooks, extract data from them, and analyze the data.
In the remainder of this text, we give instructions for reproducing the analyses using the data provided in the dump, and for reproducing the collection by collecting data from GitHub again.
Reproducing the Analysis
This section shows how to load the data in the database and run the analyses notebooks. In the analysis, we used the following environment:
Ubuntu 18.04.1 LTS
PostgreSQL 10.6
Conda 4.5.11
Python 3.7.2
PdfCrop 2012/11/02 v1.38
First, download dump.tar.bz2 and extract it:
tar -xjf dump.tar.bz2
It extracts the file db2019-03-13.dump. Create a database in PostgreSQL (we call it "jupyter"), and use psql to restore the dump:
psql jupyter < db2019-03-13.dump
It populates the database with the dump. Now, configure the connection string for sqlalchemy by setting the environment variable JUP_DB_CONNECTION:
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter";
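As an optional sanity check (not part of the original instructions), the connection string can be exercised from Python with sqlalchemy before opening the notebooks; the query below is only illustrative.

import os
from sqlalchemy import create_engine, text

engine = create_engine(os.environ["JUP_DB_CONNECTION"])
with engine.connect() as conn:
    # prints 1 if the restored "jupyter" database is reachable
    print(conn.execute(text("SELECT 1")).scalar())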
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create a conda environment with Python 3.7:
conda create -n analyses python=3.7
conda activate analyses
Go to the analyses folder and install all the dependencies of the requirements.txt
cd jupyter_reproducibility/analyses
pip install -r requirements.txt
For reproducing the analyses, run jupyter in this folder:
jupyter notebook
Execute the notebooks in this order:
Reproducing or Expanding the Collection
The collection demands more steps to reproduce and takes much longer to run (months). It also involves running arbitrary code on your machine. Proceed with caution.
Requirements
This time, we have extra requirements:
All the analysis requirements
lbzip2 2.5
gcc 7.3.0
Github account
Gmail account
Environment
First, set the following environment variables:
export JUP_MACHINE="db"; # machine identifier
export JUP_BASE_DIR="/mnt/jupyter/github"; # place to store the repositories
export JUP_LOGS_DIR="/home/jupyter/logs"; # log files
export JUP_COMPRESSION="lbzip2"; # compression program
export JUP_VERBOSE="5"; # verbose level
export JUP_DB_CONNECTION="postgresql://user:password@hostname/jupyter"; # sqlalchemy connection
export JUP_GITHUB_USERNAME="github_username"; # your github username
export JUP_GITHUB_PASSWORD="github_password"; # your github password
export JUP_MAX_SIZE="8000.0"; # maximum size of the repositories directory (in GB)
export JUP_FIRST_DATE="2013-01-01"; # initial date to query github
export JUP_EMAIL_LOGIN="gmail@gmail.com"; # your gmail address
export JUP_EMAIL_TO="target@email.com"; # email that receives notifications
export JUP_OAUTH_FILE="~/oauth2_creds.json" # oauth2 authentication file
export JUP_NOTEBOOK_INTERVAL=""; # notebook id interval for this machine. Leave it blank
export JUP_REPOSITORY_INTERVAL=""; # repository id interval for this machine. Leave it blank
export JUP_WITH_EXECUTION="1"; # execute python notebooks
export JUP_WITH_DEPENDENCY="0"; # run notebooks with and without declared dependencies
export JUP_EXECUTION_MODE="-1"; # run following the execution order
export JUP_EXECUTION_DIR="/home/jupyter/execution"; # temporary directory for running notebooks
export JUP_ANACONDA_PATH="~/anaconda3"; # conda installation path
export JUP_MOUNT_BASE="/home/jupyter/mount_ghstudy.sh"; # bash script to mount base dir
export JUP_UMOUNT_BASE="/home/jupyter/umount_ghstudy.sh"; # bash script to umount base dir
export JUP_NOTEBOOK_TIMEOUT="300"; # timeout for the extraction
# Frequency of log reports
export JUP_ASTROID_FREQUENCY="5";
export JUP_IPYTHON_FREQUENCY="5";
export JUP_NOTEBOOKS_FREQUENCY="5";
export JUP_REQUIREMENT_FREQUENCY="5";
export JUP_CRAWLER_FREQUENCY="1";
export JUP_CLONE_FREQUENCY="1";
export JUP_COMPRESS_FREQUENCY="5";
export JUP_DB_IP="localhost"; # postgres database IP
Then, configure the file ~/oauth2_creds.json, according to yagmail documentation: https://media.readthedocs.org/pdf/yagmail/latest/yagmail.pdf
Configure the mount_ghstudy.sh and umount_ghstudy.sh scripts. The first one should mount the folder that stores the directories. The second one should umount it. You can leave the scripts blank, but this is not advisable, as the reproducibility study runs arbitrary code on your machine and you may lose your data.
Scripts
Download and extract jupyter_reproducibility.tar.bz2:
tar -xjf jupyter_reproducibility.tar.bz2
Create 5 conda environments and 5 anaconda environments, one pair for each Python version. In each of them, upgrade pip, install pipenv, and install the archaeology package (note that it is a local package that has not been published to PyPI; make sure to use the -e option):
Conda 2.7
conda create -n raw27 python=2.7 -y
conda activate raw27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 2.7
conda create -n py27 python=2.7 anaconda -y
conda activate py27
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.4
It requires a manual jupyter and pathlib2 installation due to some incompatibilities found in the default installation.
conda create -n raw34 python=3.4 -y
conda activate raw34
conda install jupyter -c conda-forge -y
conda uninstall jupyter -y
pip install --upgrade pip
pip install jupyter
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
pip install pathlib2
Anaconda 3.4
conda create -n py34 python=3.4 anaconda -y
conda activate py34
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.5
conda create -n raw35 python=3.5 -y
conda activate raw35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.5
It requires the manual installation of other anaconda packages.
conda create -n py35 python=3.5 anaconda -y
conda install -y appdirs atomicwrites keyring secretstorage libuuid navigator-updater prometheus_client pyasn1 pyasn1-modules spyder-kernels tqdm jeepney automat constantly anaconda-navigator
conda activate py35
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.6
conda create -n raw36 python=3.6 -y
conda activate raw36
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Anaconda 3.6
conda create -n py36 python=3.6 anaconda -y
conda activate py36
conda install -y anaconda-navigator jupyterlab_server navigator-updater
pip install --upgrade pip
pip install pipenv
pip install -e jupyter_reproducibility/archaeology
Conda 3.7
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
3D skeletons UP-Fall Dataset
Difference between fall detection and impact detection
Overview
This dataset aims to facilitate research in fall detection, particularly focusing on the precise detection of impact moments within fall events. The accuracy and comprehensiveness of the 3D skeleton data make it a valuable resource for developing and benchmarking fall detection algorithms. The dataset contains 3D skeletal data extracted from fall events and daily activities of 5 subjects performing fall scenarios.
Data Collection
The skeletal data was extracted using a pose estimation algorithm, which processes image frames to determine the 3D coordinates of each joint. Sequences with fewer than 100 frames of extracted data were excluded to ensure the quality and reliability of the dataset. As a result, some subjects may have fewer CSV files.
CSV Structure
The data is organized by subjects, and each subject contains CSV files named according to the pattern C1S1A1T1, where:
C: Camera (1 or 2)
S: Subject (1 to 5)
A: Activity (1 to N, representing different activities)
T: Trial (1 to 3)
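For convenience, a short sketch (not part of the dataset itself) for decoding this naming convention in Python; the regular expression simply mirrors the C#S#A#T# pattern described above.

import re

def parse_name(filename):
    """Return (camera, subject, activity, trial) from a name such as 'C1S1A3T2.csv'."""
    m = re.match(r"C(\d+)S(\d+)A(\d+)T(\d+)", filename)
    if m is None:
        raise ValueError(f"Unexpected file name: {filename}")
    return tuple(int(g) for g in m.groups())

print(parse_name("C2S1A3T1.csv"))  # (2, 1, 3, 1)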
subject1/: Contains CSV files for Subject 1.
C1S1A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 1
C1S1A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 1
C1S1A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 1
C2S1A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 1
C2S1A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 1
C2S1A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 1
subject2/: Contains CSV files for Subject 2.
C1S2A1T1.csv: Data from Camera 1, Activity 1, Trial 1 for Subject 2
C1S2A2T1.csv: Data from Camera 1, Activity 2, Trial 1 for Subject 2
C1S2A3T1.csv: Data from Camera 1, Activity 3, Trial 1 for Subject 2
C2S2A1T1.csv: Data from Camera 2, Activity 1, Trial 1 for Subject 2
C2S2A2T1.csv: Data from Camera 2, Activity 2, Trial 1 for Subject 2
C2S2A3T1.csv: Data from Camera 2, Activity 3, Trial 1 for Subject 2
subject3/, subject4/, subject5/: Similar structure as above, but may contain fewer CSV files due to the data extraction criteria mentioned above.
Column Descriptions
Each CSV file contains the following columns representing different skeletal joints and their respective coordinates in 3D space:
joint_1_x: X coordinate of joint 1
joint_1_y: Y coordinate of joint 1
joint_1_z: Z coordinate of joint 1
joint_2_x: X coordinate of joint 2
joint_2_y: Y coordinate of joint 2
joint_2_z: Z coordinate of joint 2
...
joint_n_x: X coordinate of joint n
joint_n_y: Y coordinate of joint n
joint_n_z: Z coordinate of joint n
LABEL: Label indicating impact (1) or non-impact (0)
Example
Here is an example of what a row in one of the CSV files might look like:
joint_1_x  joint_1_y  joint_1_z  joint_2_x  joint_2_y  joint_2_z  ...  joint_n_x  joint_n_y  joint_n_z  LABEL
0.123      0.456      0.789      0.234      0.567      0.890      ...  0.345      0.678      0.901      0
Usage
This data can be used for developing and benchmarking impact fall detection algorithms. It provides detailed information on human posture and movement during falls, making it suitable for machine learning and deep learning applications in impact fall detection and prevention.
Using GitHub
Clone the repository:
git clone https://github.com/Tresor-Koffi/3D_skeletons-UP-Fall-Dataset
Navigate to the directory:
cd 3D_skeletons-UP-Fall-Dataset
Examples
Here's a simple example of how to load and inspect a sample data file using Python:
import pandas as pd

data = pd.read_csv('subject1/C1S1A1T1.csv')
print(data.head())
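Building on the loading example above, a minimal sketch of separating the joint coordinates from the impact label for a machine-learning workflow; the LABEL column name comes from the description above, while the number of joint columns depends on the pose estimator used.

import pandas as pd

data = pd.read_csv('subject1/C1S1A1T1.csv')
X = data.drop(columns=['LABEL']).to_numpy()   # per-frame 3D joint coordinates
y = data['LABEL'].to_numpy()                  # 1 = impact frame, 0 = non-impact
print(X.shape, y.mean())                      # feature matrix shape and impact-frame ratio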
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data inputted in the simulation were generated by two Python scripts: "GENERATE_SAMPLES.py" and "GENERATE_RESAMPLING_DATA.py".

1. "GENERATE_SAMPLES.py": This script generates:

a) "DataSet_n[N]_p[p].pickle", where N is replaced by 500 or 5000 and p is replaced by 2 or 10. This Python object contains: a1. the explicative variables "X"; a2. the responses "Y"; a3. the knots "knots"; a4. the target tail index parameters "gamma0"; a5. the k different random-state responses "Yk", with k = 1, ..., 100. To read these data, run the following Python code (taking n=5000 and p=10 for example):

import pickle
with open('DataSet_n5000_p10.pickle', 'rb') as handle:
    X = pickle.load(handle)
    Y = pickle.load(handle)
    knots = pickle.load(handle)
    gamma0 = pickle.load(handle)
    Yk = pickle.load(handle)

b) "gridX_p[p].pickle", where p is replaced by 2 or 10. This Python object contains: b1. the setting points "gridX", which correspond to (x(1)_(m1), ..., x(p)_(mp)) in the paper; b2. "prefactor", which corresponds to \Delta(p)x in the paper; b3. "gamma0_gridX", which corresponds to gamma0(gridX). To read these data, run the following Python code (taking p=10 for example):

import pickle
with open('gridX_p10.pickle', 'rb') as handle:
    gridX = pickle.load(handle)
    prefactor = pickle.load(handle)
    gamma0_gridX = pickle.load(handle)

2. "GENERATE_RESAMPLING_DATA.py": This script generates:

a) "DataSet_Resampling_n[N]_p[p]_w_replacement.pickle", where N is replaced by 500 or 5000 and p is replaced by 2 or 10. This Python object contains: a1. the resampling explicative variables "X_resample"; a2. the knots "knots"; a3. the resampling k different random-state responses "Y_resample". To read these data, run the following Python code (taking N=5000 and p=10 for example):

import pickle
with open('DataSet_Resampling_n5000_p10_w_replacement.pickle', 'rb') as handle:
    X_resample = pickle.load(handle)
    ignored = pickle.load(handle)
    Y_resample = pickle.load(handle)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of movements drawn with a wireless remote controller in an immersive VR environment for 6 different movements called "Cube," "Cylinder", "Heart", "Infinity", "Sphere" and "Triangle". Data were collected from 10 participants. Each movement was collected 3 times for each participant for each session and 3 sessions were performed thus providing a total of 9 repetitions of each movement per participant. The data were projected onto the frontal plane facing the head mounted display. The total number of movements collected in the database is 540. Data were collected using Unity 2020.3.26f1 with the Oculus Rift S and resampled to 32 points.
A Python script is also included with an example of how to load the data. The Python script was tested with Python 3.8.8, Pandas 1.3.1, NumPy 1.21.1, and Matplotlib 3.4.2.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains simultaneous electroencephalography (EEG) and near-infrared spectroscopy (fNIRS) signals recorded from 12 participants while performing a silent naming task and three sensory-based imagery tasks using visual, auditory, and tactile perception. Participants were asked to visualize an object in their minds, imagine the sounds made by the object, and imagine the feeling of touching the object.
EEG data were acquired with a BioSemi ActiveTwo system with 64 electrodes positioned according to the international 10-20 system, plus one electrode on each earlobe as references ('EXG1' channel is the left ear electrode and 'EXG2' channel is the right ear electrode). Additionally, 2 electrodes placed on the left hand measured galvanic skin response ('GSR1' channel) and a respiration belt around the waist measured respiration ('Resp' channel). The sampling rate was 2048 Hz.
The electrode names were saved in a default BioSemi labeling scheme (A1-A32, B1-B32). See the Biosemi documentation for the corresponding international 10-20 naming scheme (https://www.biosemi.com/pics/cap_64_layout_medium.jpg, https://www.biosemi.com/headcap.htm).
For convenience, the following ordered channels
['A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'A10', 'A11', 'A12', 'A13', 'A14', 'A15', 'A16', 'A17', 'A18', 'A19', 'A20', 'A21', 'A22', 'A23', 'A24', 'A25', 'A26', 'A27', 'A28', 'A29', 'A30', 'A31', 'A32', 'B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11', 'B12', 'B13', 'B14', 'B15', 'B16', 'B17', 'B18', 'B19', 'B20', 'B21', 'B22', 'B23', 'B24', 'B25', 'B26', 'B27', 'B28', 'B29', 'B30', 'B31', 'B32']
can thus be renamed to
['Fp1', 'AF7', 'AF3', 'F1', 'F3', 'F5', 'F7', 'FT7', 'FC5', 'FC3', 'FC1', 'C1', 'C3', 'C5', 'T7', 'TP7', 'CP5', 'CP3', 'CP1', 'P1', 'P3', 'P5', 'P7', 'P9', 'PO7', 'PO3', 'O1', 'Iz', 'Oz', 'POz', 'Pz', 'CPz', 'Fpz', 'Fp2', 'AF8', 'AF4', 'AFz', 'Fz', 'F2', 'F4', 'F6', 'F8', 'FT8', 'FC6', 'FC4', 'FC2', 'FCz', 'Cz', 'C2', 'C4', 'C6', 'T8', 'TP8', 'CP6', 'CP4', 'CP2', 'P2', 'P4', 'P6', 'P8', 'P10', 'PO8', 'PO4', 'O2']
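As an illustrative sketch only (the authors' own loading scripts are in the 'code' directory), the renaming above can be applied with MNE roughly as follows; the BDF file name is hypothetical.

import mne

biosemi_names = [f"A{i}" for i in range(1, 33)] + [f"B{i}" for i in range(1, 33)]
standard_names = ['Fp1', 'AF7', 'AF3', 'F1', 'F3', 'F5', 'F7', 'FT7', 'FC5', 'FC3',
                  'FC1', 'C1', 'C3', 'C5', 'T7', 'TP7', 'CP5', 'CP3', 'CP1', 'P1',
                  'P3', 'P5', 'P7', 'P9', 'PO7', 'PO3', 'O1', 'Iz', 'Oz', 'POz',
                  'Pz', 'CPz', 'Fpz', 'Fp2', 'AF8', 'AF4', 'AFz', 'Fz', 'F2', 'F4',
                  'F6', 'F8', 'FT8', 'FC6', 'FC4', 'FC2', 'FCz', 'Cz', 'C2', 'C4',
                  'C6', 'T8', 'TP8', 'CP6', 'CP4', 'CP2', 'P2', 'P4', 'P6', 'P8',
                  'P10', 'PO8', 'PO4', 'O2']

raw = mne.io.read_raw_bdf("sub-01_eeg.bdf", preload=True)  # hypothetical file name
raw.rename_channels(dict(zip(biosemi_names, standard_names)))
print(raw.ch_names[:5])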
fNIRS data were acquired with a NIRx NIRScoutXP continuous wave imaging system equipped with 4 light detectors, 8 light emitters (sources), and low-profile fNIRS optodes. Both electrodes and optodes were placed in a NIRx NIRScap for integrated fNIRS-EEG layouts. Two different montages were used: frontal and temporal, see references for more information.
Folder 'stimuli' contains all images of the semantic categories of animals and tools presented to participants.
We have prepared example scripts to demonstrate how to load the EEG and fNIRS data into Python using MNE and MNE-BIDS packages. These scripts are located in the 'code' directory.
This dataset was analyzed in the following publications:
[1] Rybář, M., Poli, R. and Daly, I., 2024. Using data from cue presentations results in grossly overestimating semantic BCI performance. Scientific Reports, 14(1), p.28003.
[2] Rybář, M., Poli, R. and Daly, I., 2021. Decoding of semantic categories of imagined concepts of animals and tools in fNIRS. Journal of Neural Engineering, 18(4), p.046035.
[3] Rybář, M., 2023. Towards EEG/fNIRS-based semantic brain-computer interfacing (Doctoral dissertation, University of Essex).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wiens, S., van Berlekom, E., Szychowska, M., & Eklund, R. (2019). Visual Perceptual Load Does Not Affect the Frequency Mismatch Negativity. Frontiers in Psychology, 10(1970). doi:10.3389/fpsyg.2019.01970

We manipulated visual perceptual load (high and low load) while we recorded electroencephalography. Event-related potentials (ERPs) were computed from these data.

OSF_*.pdf contains the preregistration at the Open Science Framework (OSF): https://doi.org/10.17605/OSF.IO/EWG9X

ERP_2019_rawdata_bdf.zip contains the raw EEG data files that were recorded with a Biosemi system (www.biosemi.com). The files can be opened in MATLAB with the FieldTrip toolbox. https://www.mathworks.com/products/matlab.html http://www.fieldtriptoolbox.org/

ERP_2019_visual_load_fieldtrip_scripts.zip contains all the MATLAB scripts that were used to process the ERP data with the FieldTrip toolbox. http://www.fieldtriptoolbox.org/

ERP_2019_fieldtrip_mat_*.zip contain the final, preprocessed individual data files. They can be opened with MATLAB.

ERP_2019_visual_load_python_scripts.zip contains the Python scripts for the main task. They need Python (https://www.python.org/) and PsychoPy (http://www.psychopy.org/).

ERP_2019_visual_load_wmc_R_scripts.zip contains the R scripts to process the working memory capacity (wmc) data. https://www.r-project.org/

ERP_2019_visual_load_R_scripts.zip contains the R scripts to analyze the data and the output files with figures (e.g., scatterplots). https://www.r-project.org/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dew point or dew temperature is the highest temperature at which the water vapor contained in the air begins to condense, producing dew, mist, any type of cloud or, if the temperature is low enough, frost. This is one of the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 321 stations that have, at some point, recorded the dew point since 1950, at hourly intervals. It is important to note that not all stations are currently operational.
The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.
In addition, a historical database is provided in .npz* and .mat** format that is updated every 30 days for those stations that are still in operation.
*To load the data correctly in Python it is recommended to use the following code:
import numpy as np

with np.load(filename, allow_pickle=True) as f:
    data = {}
    for key, value in f.items():
        data[key] = value.item()
**Date data is in datenum format; to load it correctly in datetime format, it is recommended to use the following command in MATLAB:
datetime(TS.x , 'ConvertFrom' , 'datenum')
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Open data: Visual load does not decrease the auditory steady state response to 40-Hz amplitude-modulated tones

The main results files are saved separately:
- ASSR_study1.html: R output of the main analyses
- ASSR_study1_subset_subjects.html: R output of the main analyses
- ASSR_study2.html: R output of the main analyses

The studies were preregistered:
Study 1: https://doi.org/10.17605/OSF.IO/UYJVA
Study 2: https://doi.org/10.17605/OSF.IO/JVMFD

DATA & FILE OVERVIEW

File List: The files contain the raw data, scripts, and results of main and supplementary analyses of two electroencephalography (EEG) studies (Study1, Study2). Links to the hardware and software are provided under methodological information.

ASSR_study1_experiment_scripts.zip: contains the Python files to run the experiment.

ASSR_study1_rawdata.zip: contains raw datafiles for each subject
- data_EEG: EEG data in bdf format (generated by Biosemi)
- data_log: logfiles of the EEG session (generated by Python)
- data_WMC: logfiles of the working memory capacity task (generated by Python)

ASSR_study1_EEG_scripts.zip: Python-MNE scripts to process the EEG data

ASSR_study1_EEG_preprocessed.zip: Preprocessed EEG data from Python-MNE

ASSR_study1_analysis_scripts.zip: R scripts to analyze the data together with the main datafiles. The main files in the folder are:
- ASSR_study1.html: R output of the main analyses
- ASSR_study1_subset_subjects.html: R output of the main analyses but after excluding five subjects who were excluded because of stricter, preregistered artifact rejection criteria

ASSR_study1_figures.zip: contains all figures and tables that are created by Python-MNE and R.

ASSR_study2_experiment_scripts.zip: contains the Python files to run the experiment

ASSR_study2_rawdata.zip: contains raw datafiles for each subject
- data_EEG: EEG data in bdf format (generated by Biosemi)
- data_log: logfiles of the EEG session (generated by Python)
- data_WMC: logfiles of the working memory capacity task (generated by Python)

ASSR_study2_EEG_scripts.zip: Python-MNE scripts to process the EEG data

ASSR_study2_EEG_preprocessed.zip: Preprocessed EEG data from Python-MNE

ASSR_study2_analysis_scripts.zip: R scripts to analyze the data together with the main datafiles. The main files in the folder are:
- ASSR_study2.html: R output of the main analyses
- ASSR_compare_performance_between_studies.html: R output of analyses that compare behavioral performance between study 1 and study 2.

ASSR_study2_figures.zip: contains all figures and tables that are created by Python-MNE and R.

Instrument- or software-specific information needed to interpret the data:
MNE-Python (Gramfort A., et al., 2013): https://mne.tools/stable/index.html
RStudio used with R (R Core Team, 2016): https://rstudio.com/products/rstudio/
Wiens, S. (2017). Aladins Bayes Factor in R (Version 3). https://www.doi.org/10.17045/sthlmuni.4981154.v3
Medical Dataset for Abbreviation Disambiguation for Natural Language Understanding (MeDAL) is a large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain. It was published at the ClinicalNLP workshop at EMNLP.
💻 Code 🤗 Dataset (Hugging Face) 💾 Dataset (Kaggle) 💽 Dataset (Zenodo) 📜 Paper (ACL) 📝 Paper (Arxiv) ⚡ Pre-trained ELECTRA (Hugging Face)
We recommend downloading from Kaggle if you can authenticate through their API. The advantage to Kaggle is that the data is compressed, so it will be faster to download. Links to the data can be found at the top of the readme.
First, you will need to create an account on kaggle.com. Afterwards, you will need to install the kaggle API:
pip install kaggle
Then, you will need to follow the instructions here to add your username and key. Once that's done, you can run:
kaggle datasets download xhlulu/medal-emnlp
Now, unzip everything and place them inside the data directory:
unzip -nq crawl-300d-2M-subword.zip -d data
mv data/pretrain_sample/* data/
For the LSTM models, we will need to use the fastText embeddings. To do so, first download and extract the weights:
wget -nc -P data/ https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M-subword.zip
unzip -nq data/crawl-300d-2M-subword.zip -d data/
You can directly load LSTM and LSTM-SA with torch.hub:
import torch

lstm = torch.hub.load("BruceWen120/medal", "lstm")
lstm_sa = torch.hub.load("BruceWen120/medal", "lstm_sa")
If you want to use the Electra model, you need to first install transformers:
pip install transformers
Then, you can load it with torch.hub:
import torch

electra = torch.hub.load("BruceWen120/medal", "electra")
transformers
If you are only interested in the pre-trained ELECTRA weights (without the disambiguation head), you can load it directly from the Hugging Face Repository:
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("xhlu/electra-medal")
tokenizer = AutoTokenizer.from_pretrained("xhlu/electra-medal")
Download the bibtex here, or copy the text below:
@inproceedings{wen-etal-2020-medal,
title = "{M}e{DAL}: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining",
author = "Wen, Zhi and Lu, Xing Han and Reddy, Siva",
booktitle = "Proceedings of the 3rd Clinical Natural Language Processing Workshop",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.clinicalnlp-1.15",
pages = "130--135",
}
The ELECTRA model is licensed under Apache 2.0. The license for the libraries used in this project (transformers, pytorch, etc.) can be found in their respective GitHub repositories. Our model is released under a MIT license.
The original dataset was retrieved and modified from the NLM website. By using this dataset, you are bound by the terms and conditions specified by NLM:
INTRODUCTION
Downloading data from the National Library of Medicine FTP servers indicates your acceptance of the following Terms and Conditions: No charges, usage fees or royalties are paid to NLM for this data.
MEDLINE/PUBMED SPECIFIC TERMS
NLM freely provides PubMed/MEDLINE data. Please note some PubMed/MEDLINE abstracts may be protected by copyright.
GENERAL TERMS AND CONDITIONS
Users of the data agree to:
- acknowledge NLM as the source of the data by including the phrase "Courtesy of the U.S. National Library of Medicine" in a clear and conspicuous manner,
- properly use registration and/or trademark symbols when referring to NLM products, and
- not indicate or imply that NLM has endorsed its products/services/applications.
Users who republish or redistribute the data (services, products or raw data) agree to:
- maintain the most current version of all distributed data, or
- make known in a clear and conspicuous manner that the products/services/applications do not reflect the most current/accurate data available from NLM.
These data are produced with a reasonable standard of care, but NLM makes no warranties express or implied, including no warranty of merchantability or fitness for particular purpose, regarding the accuracy or completeness of the data. Users agree to hold NLM and the U.S. Government harmless from any liability resulting from errors in the data. NLM disclaims any liability for any consequences due to use, misuse, or interpretation of information contained or not contained in the data.
NLM does not provide legal advice regarding copyright, fair use, or other aspects of intellectual property rights. See the NLM Copyright page.
NLM reserves the right to change the type and format of its machine-readable data. NLM will take reasonable steps to inform users of any changes to the format of the data before the data are distributed via the announcement section or subscription to email and RSS updates.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This is a classic and very widely used dataset in machine learning and statistics, often serving as a first dataset for classification problems. Introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems," it is a foundational resource for learning classification algorithms.
Overview:
The dataset contains measurements for 150 samples of iris flowers. Each sample belongs to one of three species of iris: Iris setosa, Iris versicolor, or Iris virginica.
For each flower, four features were measured: sepal length, sepal width, petal length, and petal width (in centimeters).
The goal is typically to build a model that can classify iris flowers into their correct species based on these four features.
File Structure:
The dataset is usually provided as a single CSV (Comma Separated Values) file, often named iris.csv or similar. This file typically contains one column for each of the four measurements and a column identifying the species.
Content of the Data:
The dataset contains an equal number of samples (50) for each of the three iris species. The measurements of the sepal and petal dimensions vary between the species, allowing for their differentiation using machine learning models.
How to Use This Dataset:
Load the iris.csv file and explore the measurements; a minimal example is shown below.
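A minimal sketch, assuming the common column names sepal_length, sepal_width, petal_length, petal_width, and species; check the header of your copy of iris.csv and adjust the names accordingly.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = pd.read_csv("iris.csv")
X = iris[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
y = iris["species"]

# Hold out 30% of the samples to check how well the classifier generalizes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")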
Citation:
When using the Iris dataset, it is common to cite Ronald Fisher's original work:
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188.
Data Contribution:
Thank you for providing this classic and fundamental dataset to the Kaggle community. The Iris dataset remains an invaluable resource for both beginners learning the basics of classification and experienced practitioners testing new algorithms. Its simplicity and clear class separation make it an ideal starting point for many data science projects.
If you find this dataset description helpful and the dataset itself useful for your learning or projects, please consider giving it an upvote after downloading. Your appreciation is valuable!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Relative humidity is the ratio of the partial pressure of water vapor to the equilibrium vapor pressure of water at a given temperature. Relative humidity depends on the temperature and pressure of the system of interest. This is one of the variables recorded by the meteorological network of the Chilean Meteorological Directorate (DMC). This collection contains the information stored by 488 stations that have, at some point, recorded the relative humidity since 1952, at hourly intervals. It is important to note that not all stations are currently operational.
The data is updated directly from the DMC's web services and can be viewed in the Data Series viewer of the Itrend Data Platform.
In addition, a historical database is provided in .npz* and .mat** format that is updated every 30 days for those stations that are still in operation.
*To load the data correctly in Python it is recommended to use the following code:
import numpy as np

with np.load(filename, allow_pickle=True) as f:
    data = {}
    for key, value in f.items():
        data[key] = value.item()
**Date data is in datenum format; to load it correctly in datetime format, it is recommended to use the following command in MATLAB:
datetime(TS.x , 'ConvertFrom' , 'datenum')
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains two hyperspectral and one multispectral anomaly detection images, and their corresponding binary pixel masks. They were initially used for real-time anomaly detection in line-scanning, but they can be used for any anomaly detection task.
They are in .npy file format (will add tiff or geotiff variants in the future), with the image datasets being in the order of (height, width, channels). The SNP dataset was collected using sentinelhub, and the Synthetic dataset was collected from AVIRIS. The Python code used to analyse these datasets can be found at: https://github.com/WiseGamgee/HyperAD
All that is needed to load these datasets is Python (preferably 3.8+) and the NumPy package. Example code for loading the Beach Dataset if you put it in a folder called "data" with the python script is:
import numpy as np
# Load image file
hsi_array = np.load("data/beach_hsi.npy")
n_pixels, n_lines, n_bands = hsi_array.shape
print(f"This dataset has {n_pixels} pixels, {n_lines} lines, and {n_bands}.")
# Load image mask
mask_array = np.load("data/beach_mask.npy")
m_pixels, m_lines = mask_array.shape
print(f"The corresponding anomaly mask is {m_pixels} pixels by {m_lines} lines.")
If you use any of these datasets, please cite the following paper:
@article{garske2024erx,
title={ERX - a Fast Real-Time Anomaly Detection Algorithm for Hyperspectral Line-Scanning},
author={Garske, Samuel and Evans, Bradley and Artlett, Christopher and Wong, KC},
journal={arXiv preprint arXiv:2408.14947},
year={2024},
}
If you use the beach dataset please cite the following paper as well (original source):
@article{mao2022openhsi,
title={OpenHSI: A complete open-source hyperspectral imaging solution for everyone},
author={Mao, Yiwei and Betters, Christopher H and Evans, Bradley and Artlett, Christopher P and Leon-Saval, Sergio G and Garske, Samuel and Cairns, Iver H and Cocks, Terry and Winter, Robert and Dell, Timothy},
journal={Remote Sensing},
volume={14},
number={9},
pages={2244},
year={2022},
publisher={MDPI}
}