8 datasets found

Data For: Herbarium specimens provide reliable estimates of phenological...
data.niaid.nih.gov
explore.openaire.eu
zip
Updated Jun 22, 2022
Cite
Tadeo Ramirez-Parada; Isaac Park; Susan Mazer (2022). Data For: Herbarium specimens provide reliable estimates of phenological responsiveness to climate at unparalleled taxonomic and spatiotemporal scales [Dataset]. http://doi.org/10.25349/D9TK64
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.25349/D9TK64
Dataset updated
Jun 22, 2022
Dataset provided by
University of California, Santa Barbara
Authors
Tadeo Ramirez-Parada; Isaac Park; Susan Mazer
License
https://spdx.org/licenses/CC0-1.0.html
Description
Understanding the effects of climate change on the phenological structure of plant communities will require measuring variation in sensitivity among thousands of co-occurring species across regions. Herbarium collections provide vast resources with which to do this, but may also exhibit biases as sources of phenological data. Despite general recognition of these caveats, validation of herbarium-based estimates of phenological sensitivity against estimates obtained using field observations remains rare and limited in scope. Here, we leveraged extensive datasets of herbarium specimens and of field observations from the USA National Phenology Network for 21 species in the United States and, for each species, compared herbarium- and field-based standardized estimates of peak flowering dates and of sensitivity of peak flowering time to geographic and interannual variation in mean spring minimum temperatures (TMIN). We found strong agreement between herbarium- and field-based estimates for standardized peak flowering time (r = 0.91, p < 0.001) and for the direction and magnitude of sensitivity to both geographic TMIN variation (r = 0.88, p < 0.001) and interannual TMIN variation (r = 0.82, p < 0.001). This agreement was robust to substantial differences between datasets in 1) the long-term TMIN conditions observed among collection and phenological monitoring sites and 2) the interannual TMIN conditions observed in the time periods encompassed by both datasets for most species. Our results show that herbarium-based sensitivity estimates are reliable among species spanning a wide diversity of life histories and biomes, demonstrating their utility in a broad range of ecological contexts, and underscoring the potential of herbarium collections to enable phenoclimatic analysis at taxonomic and spatiotemporal scales not yet captured by observational data.

Methods

Phenological data

The dataset of field observations consisted of all records of flowering onset and termination available in the USA National Phenology Network database (NPNdb), representing an initial 1,105,764 phenological observations. To ensure the quality of the observational data, we retained only observations for which we could determine that the dates of onset and termination of flowering had an arbitrary maximum error of 14 days. To do this, we filtered the data to include only records for which the date on which the first open flower on an individual was observed was preceded by an observation of the same individual without flowers no more than 14 days prior, and for which the date on which the last flower was recorded was followed by an observation of the same individual without flowers no more than 14 days later. After filtering, field observations in our data had an average maximum error of 6.4 days for the onset of flowering, and of 6.6 days for the termination of flowering.

The herbarium dataset was constructed using an initial 894,392 digital herbarium specimen records archived by 72 herbaria across North America. We excluded from analysis all specimens not explicitly recorded as being in flower, or for which GPS coordinates or dates of collection were not available. We further filtered both datasets by only retaining species that were found in both datasets and that were represented by observations at a minimum of 15 unique sites in the NPN dataset. For each species, and to more closely match the geographic ranges covered by each dataset, we filtered the herbarium dataset to include only specimens within the range of latitudes and longitudes represented by the field observations in the NPN data. Finally, we retained only species represented by 70 or more herbarium specimens to ensure sufficient sample sizes for phenoclimatic modeling.
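As a minimal sketch of the 14-day quality filter described above, the check for a single individual in a single year might look like the following. The function names and the `(date, flowering)` visit representation are assumptions made for illustration; they are not taken from the authors' R code or from the NPN data schema.

```python
from datetime import date

MAX_GAP_DAYS = 14  # maximum tolerated error stated in the Methods


def onset_is_bounded(visits, max_gap=MAX_GAP_DAYS):
    """Return True if the first flowering visit follows a non-flowering
    visit of the same individual by no more than `max_gap` days.

    `visits` is a list of (date, flowering) tuples (hypothetical layout).
    """
    visits = sorted(visits)
    first_yes = next((d for d, flowering in visits if flowering), None)
    if first_yes is None:
        return False
    prior_no = [d for d, flowering in visits if not flowering and d < first_yes]
    return bool(prior_no) and (first_yes - max(prior_no)).days <= max_gap


def termination_is_bounded(visits, max_gap=MAX_GAP_DAYS):
    """Return True if the last flowering visit is followed by a
    non-flowering visit no more than `max_gap` days later."""
    visits = sorted(visits)
    last_yes = next((d for d, flowering in reversed(visits) if flowering), None)
    if last_yes is None:
        return False
    later_no = [d for d, flowering in visits if not flowering and d > last_yes]
    return bool(later_no) and (min(later_no) - last_yes).days <= max_gap


# Toy records, not real NPN data.
good = [(date(2015, 4, 1), False), (date(2015, 4, 8), True),
        (date(2015, 4, 20), True), (date(2015, 4, 30), False)]
late_first_visit = [(date(2015, 3, 1), False), (date(2015, 4, 8), True),
                    (date(2015, 4, 30), False)]

keep_good = onset_is_bounded(good) and termination_is_bounded(good)
keep_late = onset_is_bounded(late_first_visit) and termination_is_bounded(late_first_visit)
```

A record would be retained only when both checks pass, so the second toy record is dropped because its last flower-free visit precedes onset by more than 14 days.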
This procedure identified a final set of 21 native species represented in 3,243 field observations across 1,406 unique site-year combinations, and a final sample of 5,405 herbarium specimens across 4,906 unique site-year combinations. For the herbarium dataset, sample sizes ranged from 69 unique sites and 74 specimens for Prosopis velutina, to 1,323 unique sites containing 1,368 specimens for Achillea millefolium. Sample sizes in the NPN dataset ranged from 15 unique sites with 74 observations for Impatiens capensis, to 108 unique sites with 321 observations for Cornus florida. These 21 species represented 15 families and 17 genera, spanning a diverse range of life-history strategies and growth forms, including evergreen and deciduous shrubs and trees (e.g., Quercus agrifolia and Tilia americana, respectively), as well as herbaceous perennials (e.g., Achillea millefolium) and annuals (e.g., Impatiens capensis). Our focal species covered a wide variety of biomes and regions including Western deserts (e.g., Fouquieria splendens), Mediterranean shrublands and oak woodlands (e.g., Baccharis pilularis, Quercus agrifolia), and Eastern deciduous forests (e.g., Quercus rubra, Tilia americana).

To estimate flowering dates in the herbarium dataset, we employed the day of year of collection (henceforth ‘DOY’) of each specimen collected while in flower as a proxy. Herbarium specimens in flower could have been collected at any point between the onset and termination of their flowering period and botanists may preferentially collect individuals in their flowering peak for many species. Therefore, herbarium specimen collection dates are more likely to reflect peak flowering dates than flowering onset dates. To maximize the phenological equivalence of the field and herbarium datasets, we used the median date between onset and termination of flowering for each individual in each year in the NPN data as a proxy for peak flowering time.
Due to the maximum error of 14 days for flowering onset and termination dates in the NPN dataset, median flowering dates also had a maximum error of 14 days, with an average maximum error among observations of 6.5 days. To account for the artificial DOY discontinuity between December 31st (DOY = 365, or 366 in a leap year) and January 1st (DOY = 1), we converted DOY in both datasets into a circular variable using an Azimuthal correction.

Climate data

Daily minimum temperatures mediate key developmental processes including the break of dormancy, floral induction, and anthesis. Therefore, we used minimum surface temperatures averaged over the three months leading up to (and including) the mean flowering month for each species (hereafter ‘TMIN’) as the climatic correlate of flowering time in this study; consequently, the specific months over which temperatures were averaged varied among species. Using TMIN calculated over different time periods instead (e.g., during spring for all species) did not qualitatively affect our results. Then, we partitioned variation among sites into spatial and temporal components, characterizing TMIN for each observation by the long-term mean TMIN at its site of collection (henceforth ‘TMIN normals’), and by the deviation between its TMIN in the year of collection (for the three-month window of interest) and its long-term mean TMIN (henceforth ‘TMIN anomalies’). For each site, we obtained a monthly time series of TMIN from January 1901 to December 2016 using ClimateNA v6.30, a software package that interpolates 4 km² resolution climate data from the PRISM Climate Group at Oregon State University (http://prism.oregonstate.edu) to generate elevation-adjusted climate estimates. To calculate TMIN normals, we averaged observed TMIN for the three months leading up to the mean flowering date of each species across all years between 1901 and 2016 for each site.
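The partition of TMIN into normals and anomalies can be illustrated with a short sketch. It assumes a monthly series stored as a dictionary keyed by `(year, month)`; the function names, the data layout, and the handling of windows that reach back into the previous calendar year are all illustrative assumptions, and the toy series covers three years rather than the 1901 to 2016 period used in the study.

```python
from statistics import mean


def window_tmin(monthly_tmin, year, mean_flowering_month, length=3):
    """Mean TMIN over the `length` months up to and including the
    mean flowering month, for one site and one year."""
    values = []
    for k in range(length):
        month, y = mean_flowering_month - k, year
        if month < 1:  # window reaches into the previous calendar year
            month, y = month + 12, year - 1
        values.append(monthly_tmin[(y, month)])
    return mean(values)


def tmin_normal(monthly_tmin, years, mean_flowering_month):
    """Long-term mean of the windowed TMIN across `years`."""
    return mean([window_tmin(monthly_tmin, y, mean_flowering_month) for y in years])


def tmin_anomaly(monthly_tmin, year, years, mean_flowering_month):
    """Windowed TMIN in `year` minus the long-term normal.
    Positive values indicate a warmer-than-average year."""
    return (window_tmin(monthly_tmin, year, mean_flowering_month)
            - tmin_normal(monthly_tmin, years, mean_flowering_month))


# Toy monthly series for one hypothetical site (degrees C), Feb-Apr only.
series = {
    (2001, 2): 2.0, (2001, 3): 4.0, (2001, 4): 6.0,
    (2002, 2): 3.0, (2002, 3): 5.0, (2002, 4): 7.0,
    (2003, 2): 4.0, (2003, 3): 6.0, (2003, 4): 8.0,
}
years = [2001, 2002, 2003]
normal = tmin_normal(series, years, mean_flowering_month=4)
anomaly_2003 = tmin_anomaly(series, 2003, years, mean_flowering_month=4)
```

In this toy series the windowed means are 4, 5 and 6 degrees, so the normal is 5 and the 2003 anomaly is +1, i.e. a warmer-than-average year.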
TMIN anomalies relative to long-term conditions were calculated by subtracting TMIN normals from observed TMIN conditions in the year of collection. Therefore, positive and negative values of the anomalies respectively reflect warmer-than-average and colder-than-average conditions in a given year.

Analysis

We also provide R code to reproduce all results presented in the main text and the supplemental materials of our study. This code includes 1) all steps necessary to merge herbarium and field data into a single dataset ready for analysis, 2) the formulation and specification of the varying-intercepts and varying-slopes Bayesian model used to generate herbarium- vs. field-based estimates of phenology and its sensitivity to TMINsp, 3) the steps required to process the output of the Bayesian model and to obtain all metrics required for the analyses in the paper, and 4) the code used to generate each figure.

Contributing Herbaria

Data used in this study were contributed by the Yale Peabody Museum of Natural History, the George Safford Torrey Herbarium at the University of Connecticut, the Acadia University Herbarium, the Chrysler Herbarium at Rutgers University, the University of Montreal Herbarium, the Harvard University Herbarium, the Albion Hodgdon Herbarium at the University of New Hampshire, the Academy of Natural Sciences of Drexel University, the Jepson Herbarium at the University of California-Berkeley, the University of California-Berkeley Sagehen Creek Field Station Herbarium, the California Polytechnic State University Herbarium, the University of Santa Cruz Herbarium, the Black Hills State University Herbarium, the Luther College Herbarium, the Minot State University Herbarium, the Tarleton State University Herbarium, the South Dakota State University Herbarium, the Pittsburg State University Herbarium, the Montana State University-Billings Herbarium, the Sul Ross University Herbarium, the Fort Hays State University Herbarium, the Utah State University Herbarium, the Brigham Young University Herbarium, the Eastern Nevada Landscape Coalition Herbarium, the University of Nevada Herbarium, the Natural History Museum of Utah, the Western Illinois University Herbarium, the Eastern Illinois University Herbarium, the Northern Illinois University Herbarium, the Morton Arboretum Herbarium, the Chicago Botanic Garden
Storage and Transit Time Data and Code
data.niaid.nih.gov
zenodo.org
Updated Jun 12, 2024
Cite
Andrew Felton (2024). Storage and Transit Time Data and Code [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8136816
Explore at:
Dataset updated
Jun 12, 2024
Dataset authored and provided by
Andrew Felton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Author: Andrew J. Felton
Date: 5/5/2024

This R project contains the primary code and data (following pre-processing in Python) used for data production, manipulation, visualization, analysis, and figure production for the study entitled:

"Global estimates of the storage and transit time of water through vegetation"

Please note that 'turnover' and 'transit' are used interchangeably in this project.

Data information:

The data folder contains key data sets used for analysis. In particular:

"data/turnover_from_python/updated/annual/multi_year_average/average_annual_turnover.nc" contains a global array summarizing five-year (2016-2020) averages of annual transit, storage, canopy transpiration, and number of months of data. This is the core dataset for the analysis; however, each folder has much more data, including a dataset for each year of the analysis. Data are also available in separate .csv files for each land cover type. Other data for the minimum, monthly, and seasonal transit time can be found in their respective folders. These data were produced using the Python code found in the "supporting_code" folder, given the ease of working with .nc files and the EASE grid in the xarray Python module. R was used primarily for data visualization purposes. The remaining files in the "data" and "data/supporting_data" folders primarily contain ground-based estimates of storage and transit found in public databases or through a literature search, but have been extensively processed and filtered here.

Code information

Python scripts can be found in the "supporting_code" folder.

Each R script in this project has a particular function:

01_start.R: This script loads the R packages used in the analysis, sets the directory, and imports custom functions for the project. You can also load in the main transit time (turnover) datasets here using the source() function.

02_functions.R: This script contains the custom function for this analysis, primarily to work with importing the seasonal transit data. Load this using the source() function in the 01_start.R script.

03_generate_data.R: This script is not necessary to run and is primarily for documentation. The main role of this code was to import and wrangle the data needed to calculate ground-based estimates of aboveground water storage.

04_annual_turnover_storage_import.R: This script imports the annual turnover and storage data for each landcover type. You load in these data from the 01_start.R script using the source() function.

05_minimum_turnover_storage_import.R: This script imports the minimum turnover and storage data for each landcover type. Minimum is defined as the lowest monthly estimate. You load in these data from the 01_start.R script using the source() function.

06_figures_tables.R: This is the main workhorse for figure/table production and supporting analyses. This script generates the key figures and summary statistics used in the study, which then get saved in the manuscript_figures folder. Note that all maps were produced using Python code found in the "supporting_code" folder.
Meta Kaggle Code
kaggle.com
zip
Updated Jun 5, 2025
Cite
Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
Explore at:
Available download formats: zip (143722388562 bytes)
Dataset updated
Jun 5, 2025
Dataset authored and provided by
Kaggle (http://kaggle.com/)
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Explore our public notebook content!

Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

Why we’re releasing this dataset

By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

Sensitive data

While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

Joining with Meta Kaggle

The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.

File organization

The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. - folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. - 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
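To make the two-level layout concrete, here is a small sketch that derives the folder indices for a given KernelVersions id. The function name `version_folder` is hypothetical, and the description above does not say whether folder names are zero-padded or which file extension a given version carries, so the sketch returns plain integers rather than a full path.

```python
def version_folder(version_id: int) -> tuple[int, int]:
    """Return (top_level, sub_level) folder indices for a version id.

    Follows the stated rule: the top-level folder groups 1,000,000 ids
    and each sub folder groups 1,000 ids. How these indices are rendered
    as directory names (e.g. padding) is not specified in the source.
    """
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    top_level = version_id // 1_000_000
    sub_level = (version_id // 1_000) % 1_000
    return top_level, sub_level


top, sub = version_folder(123_456_789)  # falls in 123/456 per the description
```

Joining the result with the KernelVersions csv from Meta Kaggle then only requires the id itself, since the file names match those ids.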

The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays

Questions / Comments

We love feedback! Let us know in the Discussion tab.

Happy Kaggling!
Multispectral Spectral Imaging dataset for use in Heritage Science
data.niaid.nih.gov
Updated Jul 15, 2024
Cite
Fort, Molly B.M (2024). Multispectral Spectral Imaging dataset for use in Heritage Science [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7292713
Explore at:
Dataset updated
Jul 15, 2024
Dataset provided by
Gibson, Adam
Fort, Molly B.M
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The following data sets were collected to support the potential uses of open-source data in the context of digital humanities and heritage sciences.

This proposed experiment is conducted by the UCL Institute for Sustainable Heritage in collaboration with the Centre for Digital Humanities. Data from imaging methods including Photography, Multispectral Imaging, Hyperspectral Imaging and X-ray Fluorescence Mapping have been collected along with the complete readout metadata of the instrumentation.

We hope that you find the data helpful, and we welcome you to use the data in any way you wish, for all and any analysis development purposes. For us to build upon this research, we ask that in return you would be willing to share in some regard your experiences in using open-source data, using our data, successes and issues.

If you would be willing to engage with us in this endeavor, please feel free to contact us so that we may be able to follow up with you.

Other Data sets available Here

E: molly.fort.21@ucl.ac.uk

Object Paradata;

Postcard – c. Early 1900's

Language – Eng.

Materials – colour print on card, metallic leafing.

Front transcription -

‘Greetings’

‘May your Birthday bring you Peace & perfect Happiness, Golden hopes & Love of Friends, And every Happiness this world can send.’

Object Dimensions – 138mm X 88mm

The postcard is an item of ephemera donated to the UCLDH Digitisation Suite by Prof Melissa Terras, for teaching and training purposes in 2015.

This folder contains:

Images captured using a PhaseOne XF Multispectral Camera System.

Image filenames are arranged as postacards_postcardmsi-Postcard- (Wavelength No.)(Filter)_*sequence order number*_R.tif, where wavelength number is the nominal central illumination wavelength in nm (365, 385, 410, 420, 450, 480, 510, 550, 600, 630, 640, 660, 740, 850, 940), Filter is the colour of the long-pass filter (N - no filter, I - Infrared filter, G - Green filter, R - Red filter) and sequence order number is a count from 0001 denoting the order in which the image was acquired.

Complementary flats for each of the object images, typically used to correct for uneven illumination, captured from white, flat, smooth, non-chemically processed standard imaging paper, with the same naming convention as above.

postcard_postcardmsi-Postcard.json - Metadata read out collected from MS camera system

Truecolour RGB reference image
Wimmera CMA Search API
researchdata.edu.au
demo.dev.magda.io
Updated May 1, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wimmera CMA (2018). Wimmera CMA Search API [Dataset]. https://researchdata.edu.au/wimmera-cma-search-api/2996509
Explore at:
Dataset updated
May 1, 2018
Dataset provided by
Data.gov (https://data.gov/)
Authors
Wimmera CMA
License
Attribution 2.5 (CC BY 2.5): https://creativecommons.org/licenses/by/2.5/
License information was derived automatically
Description
Search API for looking up addresses and roads within the catchment. The API can search for both address and road, or either. This dataset is updated weekly from VicMap Roads and Addresses, sourced via www.data.vic.gov.au.

Use

The Search API uses a data.gov.au datastore and allows a user to take full advantage of full text search functionality.

An sql attribute is passed to the URL to define the query against the API. Please note that the attribute must be URL encoded. The sql statement takes the form below:

    SELECT distinct display, x, y
    FROM "4bf30358-6dc6-412c-91ee-a6f15aaee62a"
    WHERE _full_text @@ to_tsquery(replace('[term]', ' ', ' %26 '))
    LIMIT 10

The above will select the top 10 results from the API matching the input 'term', and return the display name as well as an x and y coordinate.

The full URL for the above query would be:

    https://data.gov.au/api/3/action/datastore_search_sql?sql=SELECT display, x, y FROM "4bf30358-6dc6-412c-91ee-a6f15aaee62a" WHERE _full_text @@ to_tsquery(replace('[term]', ' ', ' %26 ')) LIMIT 10

Fields

Any field in the source dataset can be returned via the API. Display, x and y are used in the example above, but any other field can be returned by altering the select component of the sql statement. See examples below.

Filters

Search data sources and LGA can also be used to filter results. When not using a filter, the API defaults to using all records. See examples below.

Source Dataset

A filter can be applied to select for a particular source dataset using the 'src' field. The currently available datasets are as follows:

- 1 for Roads
- 2 for Address
- 3 for Localities
- 4 for Parcels (CREF and SPI)
- 5 for Localities (Propnum)

Local Government Area

Filters can be applied to select for a specific local government area using the 'lga_code' field. LGA codes are derived from Vicmap LGA datasets. Wimmera's LGAs include:

- 332 Horsham Rural City Council
- 330 Hindmarsh Shire Council
- 357 Northern Grampians Shire Council
- 371 West Wimmera Shire Council
- 378 Yarriambiack Shire Council

Examples

Search for the top 10 addresses and roads with the word 'darlot' in their names:

    SELECT distinct display, x, y FROM "4bf30358-6dc6-412c-91ee-a6f15aaee62a" WHERE _full_text @@ to_tsquery(replace('darlot', ' ', ' %26 ')) LIMIT 10

Search for all roads with the word 'perkins' in their names:

    SELECT distinct display, x, y FROM "4bf30358-6dc6-412c-91ee-a6f15aaee62a" WHERE _full_text @@ to_tsquery(replace('perkins', ' ', ' %26 ')) AND src=1

Search for all addresses with the word 'kalimna' in their names, within Horsham Rural City Council:

    SELECT distinct display, x, y FROM "4bf30358-6dc6-412c-91ee-a6f15aaee62a" WHERE _full_text @@ to_tsquery(replace('kalimna', ' ', ' %26 ')) AND src=2 and lga_code=332

Search for the top 10 addresses and roads with the word 'green' in their names, returning just their display name, locality, x and y:

    SELECT distinct display, locality, x, y FROM "4bf30358-6dc6-412c-91ee-a6f15aaee62a" WHERE _full_text @@ to_tsquery(replace('green', ' ', ' %26 ')) LIMIT 10

Search all addresses in Hindmarsh Shire:

    SELECT distinct display, locality, x, y FROM "4bf30358-6dc6-412c-91ee-a6f15aaee62a" WHERE lga_code=330
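Because the sql attribute must be URL encoded, it can help to let a library do the encoding instead of writing `%26` by hand. The sketch below only builds the request URL and does not contact the service; `build_search_url` and its parameters are hypothetical names introduced here, and whether the endpoint is still live is not something the description confirms.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://data.gov.au/api/3/action/datastore_search_sql"
RESOURCE_ID = "4bf30358-6dc6-412c-91ee-a6f15aaee62a"  # from the description


def build_search_url(term, fields=("display", "x", "y"),
                     src=None, lga_code=None, limit=10):
    """Build a URL-encoded full text search request (illustrative helper)."""
    columns = ", ".join(fields)
    safe_term = term.replace("'", "''")  # escape single quotes for SQL
    # The raw SQL uses ' & '; urlencode turns it into %26 for the URL.
    sql = (f'SELECT distinct {columns} FROM "{RESOURCE_ID}" '
           f"WHERE _full_text @@ to_tsquery(replace('{safe_term}', ' ', ' & '))")
    if src is not None:
        sql += f" AND src={int(src)}"
    if lga_code is not None:
        sql += f" AND lga_code={int(lga_code)}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return f"{BASE_URL}?{urlencode({'sql': sql}, quote_via=quote)}"


# Addresses (src=2) containing 'kalimna' within Horsham Rural City Council.
url = build_search_url("kalimna", src=2, lga_code=332, limit=None)
```

Passing `quote_via=quote` encodes spaces as `%20` rather than `+`, which keeps the result close to the hand-written URLs shown above.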
RoBo6 Dataset
paperswithcode.com
Updated Nov 29, 2024
Cite
Daniel Kyselica; Marek Šuppa; Jiří Šilha; Roman Ďurikovič (2024). RoBo6 Dataset [Dataset]. https://paperswithcode.com/dataset/robo6
Explore at:
Dataset updated
Nov 29, 2024
Authors
Daniel Kyselica; Marek Šuppa; Jiří Šilha; Roman Ďurikovič
Description
The dataset contains light curves of 6 rocket body types from the Mini Mega Tortora database (MMT)¹. The dataset was created to be used as a benchmark for rocket body light curve classification. For more information, see the original paper: RoBo6: Standardized MMT Light Curve Dataset for Rocket Body Classification²

Class labels:

- ARIANE 5 R/B
- ATLAS 5 CENTAUR R/B
- CZ-3B R/B
- DELTA 4 R/B
- FALCON 9 R/B
- H-2A R/B

Dataset description

Usage

```python
from datasets import load_dataset

dataset = load_dataset("kyselica/RoBo6", data_files={"train": "train.csv", "test": "test.csv"})
dataset
# DatasetDict({
#     train: Dataset({
#         features: ['label', ' id', ' part', ' period', ' mag', ' phase', ' time'],
#         num_rows: 5676
#     })
#     test: Dataset({
#         features: ['label', ' id', ' part', ' period', ' mag', ' phase', ' time'],
#         num_rows: 1404
#     })
# })
```

- label: class name
- id: unique identifier of the light curve from MMT
- part: part number of the light curve
- period: rotational period of the object
- mag: relative path to the magnitude values file
- phase: relative path to the phase values file
- time: relative path to the time values file

Mean and standard deviation of magnitudes are stored in mean_std.csv file.

File structure

data directory contains 5 subdirectories, one for each class. Light curves are stored in file triplets in the following format:

where

MMT Rocket Bodies
├── README.md
├── train.csv
├── test.csv
├── mean_std.csv
├── data
│   ├── ARIANE 5 R_B
│   │   ├──

Data preprocessing

To create data suitable for both CNN- and RNN-based models, the light curves were preprocessed in the following way:

- Split the light curves if the gap between two consecutive measurements is larger than the object's rotational period.
- Split the light curves to have a maximum span of 1_000 seconds.
- Filter out light curves whose folded form, divided into 100 bins, has more than 25% of bins empty.
- Resample the light curves to 10_000 points with a step of 0.1 seconds.
- Filter out light curves with fewer than 100 measurements.
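Two of the preprocessing steps above, gap-based splitting and the empty-bin filter on the folded curve, can be sketched in plain Python. This is an illustration under stated assumptions and not the dataset authors' code: the function names are hypothetical, times are taken to be sorted and in seconds, and folding uses `t % period` with an arbitrary zero point.

```python
def split_on_gaps(times, period):
    """Split a light curve wherever the gap between consecutive
    measurements exceeds the rotational period.

    Returns a list of index lists, one per resulting segment.
    """
    if not times:
        return []
    segments = [[0]]
    for i in range(1, len(times)):
        if times[i] - times[i - 1] > period:
            segments.append([])  # start a new segment after a long gap
        segments[-1].append(i)
    return segments


def empty_bin_fraction(times, period, n_bins=100):
    """Fold the time stamps by `period` and return the fraction of
    phase bins that contain no measurement."""
    filled = {int((t % period) / period * n_bins) % n_bins for t in times}
    return 1 - len(filled) / n_bins


def passes_bin_filter(times, period, n_bins=100, max_empty=0.25):
    """Keep a light curve unless more than 25% of its bins are empty."""
    return empty_bin_fraction(times, period, n_bins) <= max_empty


# Toy time stamps in seconds, not real MMT measurements.
segments = split_on_gaps([0, 1, 2, 10, 11], period=5)
fraction = empty_bin_fraction([0, 1, 2.5], period=4, n_bins=4)
```

In the toy input the 8-second gap exceeds the 5-second period, so the curve is cut into two segments, and three of the four phase bins are occupied.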

Citation

@article{kyselica2024robo6,
  title={RoBo6: Standardized MMT Light Curve Dataset for Rocket Body Classification},
  author={Kyselica, Daniel and {\v{S}}uppa, Marek and {\v{S}}ilha, Ji{\v{r}}{\'\i} and {\v{D}}urikovi{\v{c}}, Roman},
  journal={arXiv preprint arXiv:2412.00544},
  year={2024}
}

References

1. Karpov, S., et al. "Mini-Mega-TORTORA wide-field monitoring system with sub-second temporal resolution: first year of operation." Revista Mexicana de Astronomía y Astrofísica 48 (2016): 91-96.

2. RoBo6: Standardized MMT Light Curve Dataset for Rocket Body Classification
Ultra high-density 255-channel EEG-AAD dataset
data.niaid.nih.gov
Updated Jun 13, 2024
Cite
Zink, Rob (2024). Ultra high-density 255-channel EEG-AAD dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4518753
Explore at:
Dataset updated
Jun 13, 2024
Dataset provided by
Mundanad Narayanan, Abhijith
Bertrand, Alexander
Zink, Rob
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description

If using this dataset, please cite the following paper and the current Zenodo repository: A. Mundanad Narayanan, R. Zink, and A. Bertrand, "EEG miniaturization limits for stimulus decoding with EEG sensor networks", Journal of Neural Engineering, vol. 18, 2021, doi: 10.1088/1741-2552/ac2629

Experiment

This dataset contains 255-channel electroencephalography (EEG) data collected during an auditory attention decoding experiment (AAD). The EEG was recorded using a SynAmps RT device (Compumedics, Australia) at a sampling rate of 1 kHz and using active Ag/Cl electrodes. The electrodes were placed on the head according to the international 10-5 (5%) system. 30 normal hearing male subjects between 22 and 35 years old participated in the experiment. All of them signed an informed consent form approved by the KU Leuven ethical committee.

Two Dutch stories narrated by different male speakers divided into two parts of 6 minutes each were used as the stimuli in the experiment [1]. A single trial of the experiment involved the presentation of these two parts (one of both stories) to the subject through insert phones (Etymotic ER3A) at 60dBA. These speech stimuli were filtered using a head-related transfer function (HRTF) such that the stories seemed to arrive from two distinct spatial locations, namely left and right with respect to the subject with 180 degrees separation. In each trial, the subjects were asked to attend to only one ear while ignoring the other. Four trials of 6 minutes each were carried out, in which each story part is used twice. The order of presentations was randomized and balanced over different subjects. Thus approximately 24 minutes of EEG data was recorded per subject.

File organization and details

The EEG data of each of the 30 subjects are uploaded as a ZIP file with the name Sx.tar.gzip, where x = 0, 1, 2, ..., 29. When a zip file is extracted, the EEG data are in their original raw format as recorded by the CURRY software [2]. The data files of each recording consist of four files with the same name but different extensions, namely .dat, .dap, .rs3 and .ceo. The name of each file follows the convention Sx_AAD_P, with P taking one of the following values for each file:

1. 1L
2. 1R
3. 2L
4. 2R

The letter 'L' or 'R' in P indicates the attended direction of each subject in a recording: left and right, respectively. A MATLAB function to read the files is provided in the directory called scripts. A Python function to read the files is available in this GitHub repository [3]. The original version of the stimuli presented to subjects, i.e. without the HRTF filtering, can be found in WAV format after extracting the stimuli.zip file. There are 4 WAV files corresponding to the two parts of each of the two stories. These files have been sampled at 44.1 kHz. The order of presentation of these WAV files is given in the table below.

Stimuli presentation and attention information of files

Trial (P)   Stimuli: Left-ear    Stimuli: Right-ear   Attention
1L          part1_track1_dry     part1_track2_dry     Left
1R          part1_track1_dry     part1_track2_dry     Right
2L          part2_track2_dry     part2_track1_dry     Left
2R          part2_track2_dry     part2_track1_dry     Right
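The naming convention and the presentation table above can be captured in a small lookup, sketched here in Python. The helper names are hypothetical, the extension list reads the source's ",dap" as ".dap", and the stimulus names are given without a file extension because the description does not spell out the exact WAV file names.

```python
EXTENSIONS = (".dat", ".dap", ".rs3", ".ceo")  # four files per recording

# Transcribed from the stimuli presentation table.
TRIALS = {
    "1L": {"left": "part1_track1_dry", "right": "part1_track2_dry", "attended": "left"},
    "1R": {"left": "part1_track1_dry", "right": "part1_track2_dry", "attended": "right"},
    "2L": {"left": "part2_track2_dry", "right": "part2_track1_dry", "attended": "left"},
    "2R": {"left": "part2_track2_dry", "right": "part2_track1_dry", "attended": "right"},
}


def recording_files(subject: int, trial: str) -> list[str]:
    """File names expected for one recording, following Sx_AAD_P."""
    if trial not in TRIALS:
        raise KeyError(f"unknown trial {trial!r}")
    base = f"S{subject}_AAD_{trial}"
    return [base + ext for ext in EXTENSIONS]


def attended_stimulus(trial: str) -> str:
    """Name of the stimulus the subject was asked to attend to."""
    info = TRIALS[trial]
    return info[info["attended"]]


files = recording_files(0, "1L")
target = attended_stimulus("2R")
```

A lookup like this makes it straightforward to pair each raw recording with the speech track used as the decoding target.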

Additional files (after extracting scripts.zip and misc.zip):

scripts/sample_script.m: Demonstrates reading an EEG-AAD recording and extracting the start and end of the experiment.

misc/channel-layout.jpeg: The 255-channel EEG cap layout

misc/eeg255ch_locs.csv: The channel names, numbers and their spherical (theta and phi) scalp coordinates.

[1] Radioboeken voor kinderen, http://radioboeken.eu/kinderradioboeken.php?lang=NL, 2007 (Accessed: 8 Feb 2021)

[2] CURRY 8 X – Data Acquisition and Online Processing, https://compumedicsneuroscan.com/product/curry-data-acquisition-online-processing-x/ (Accessed: 8 Feb 2021)

[3] Abhijith Mundanad Narayanan, "EEG analysis in python", 2021, https://github.com/mabhijithn/eeg-analyse (Accessed: 8 Feb 2021)
Processed data for the analysis of human mobility changes from COVID-19...
data.niaid.nih.gov
datadryad.org
+1 more
zip
Updated Mar 28, 2024
Jin Bai; Michael Caslin; Madhusudan Katti (2024). Processed data for the analysis of human mobility changes from COVID-19 lockdown on bird occupancy in North Carolina, USA [Dataset]. http://doi.org/10.5061/dryad.gb5mkkwxr
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.gb5mkkwxr
Dataset updated
Mar 28, 2024
Dataset provided by
North Carolina State University
Authors
Jin Bai; Michael Caslin; Madhusudan Katti
License
https://spdx.org/licenses/CC0-1.0.html
Area covered
North Carolina, United States
Description
The worldwide COVID-19 pandemic lockdown provided a unique research opportunity for ecologists to investigate the human-wildlife relationship under abrupt changes in human mobility, also known as the Anthropause. Here we chose 15 common non-migratory bird species with different levels of synanthropy, and we aimed to compare how human mobility changes could influence the occupancy of fully synanthropic species such as House Sparrow (Passer domesticus) versus casual to tangential synanthropic species such as White-breasted Nuthatch (Sitta carolinensis). We extracted data from the eBird citizen science project during three study periods in the spring and summer of 2020, when human mobility changed unevenly across different counties in North Carolina. We used the COVID-19 Community Mobility Reports from Google to examine how changes in community mobility towards workplaces, an indicator of overall human movements at the county level, could influence bird occupancy.

Methods

The data source we used for bird data was eBird, a global citizen science project run by the Cornell Lab of Ornithology. We used the COVID-19 Community Mobility Reports by Google to represent the pause of human activities at the county level in North Carolina. These data are publicly available and were last updated on 10/15/2022. We used forest land cover data from NC One Map, a high-resolution (1-meter pixel) raster derived from 2016 imagery, to represent canopy cover at each eBird checklist location. We also used the raster data of the 2019 National Land Cover Database to represent the degree of development/impervious surface at each eBird checklist location. All three measurements were used at the highest resolution available. We downloaded the eBird Basic Dataset (EBD) that contains the 15 study species from February to June 2020. We also downloaded the sampling event data that contains the checklist effort information.
First, we used the R package auk (version 0.6.0) in R (version 4.2.1) to filter data under the following conditions: (1) Date: 02/19/2020 - 03/29/2020; (2) Checklist type: stationary; (3) Complete checklist; (4) Time: 07:00 am - 06:00 pm; (5) Checklist duration: 5-20 mins; (6) Location: North Carolina. After filtering the data, we used the zero-fill function from auk to create detection/non-detection data for each study species in NC. Then we used the repeat-visits filter from auk to retain eBird checklist locations where at least 2 checklists (max 10 checklists) had been submitted at the same location by the same observer, allowing us to create a hierarchical data frame in which both the detection and state processes can be analyzed using occupancy modeling. This data frame was in a matrix format in which each row represents a sampling location and the columns represent the detection or non-detection in each of the 2-10 repeat sampling events.

For the Google Community Mobility data, we chose the “Workplaces” category of mobility data to analyze the Anthropause effect because it was highly relevant to the pause of human activities in urban areas. The mobility data from Google are expressed as a percentage change compared to a baseline for each day. A baseline day represents a normal value for that day of the week during the 5-week period 01/03/2020-02/06/2020. For example, a mobility value of -30.0 for Wake County on Apr 15, 2020, means the overall mobility in Wake County on that day decreased by 30% compared to the baseline day a few months earlier. Because the eBird data we used cover a range of dates rather than single days, we took the average value of mobility before lockdown, during lockdown, and after lockdown in each county in NC.

For the environmental variables, we calculated the values in ArcGIS Pro (version 3.1.0). We created a 200 m buffer at each eligible eBird checklist location. 
For the forest cover data, we used “Zonal Statistics as Table” to extract the percentage of forest cover within the 200-meter circular buffer at each checklist location. For the National Land Cover Database (NLCD) data, we combined low-intensity, medium-intensity, and high-intensity development as development cover and used “Summarize Within” to extract the percentage of development cover using the polygon version of NLCD.

We used a correlation matrix of the three predictors (workplace mobility, percent forest cover, and percent development cover) and found no collinearity. Thus, these three predictors plus the interaction between workplace mobility and percent development cover were the site covariates of the occupancy models. For the detection covariates, four predictors were considered: time of observation, checklist duration, number of observers, and workplace mobility. These detection covariates were also not highly correlated.

We then merged all data into an unmarked data frame using the “unmarked” R package (version 1.2.5). The unmarked data frame has eBird sampling locations as sites (rows in the data frame) and repeat checklists at the same sampling locations as repeat visits (columns in the data frame).
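
The site-by-visit layout described above can be illustrated with a small Python sketch. It is not the authors' R workflow and does not reproduce the behavior of auk or unmarked; the record layout (location, observer, ISO date, detected), the function name, and the choice to keep only the first 10 visits are all assumptions made for illustration.

```python
from collections import defaultdict

MIN_VISITS, MAX_VISITS = 2, 10  # repeat-visit limits quoted in the description


def detection_histories(checklists):
    """Build one detection/non-detection row per (location, observer) site.

    `checklists` is an iterable of (location_id, observer_id, iso_date, detected)
    tuples for a single species after zero-filling (hypothetical layout).
    Rows are padded with None up to MAX_VISITS so they form a rectangular matrix.
    """
    visits = defaultdict(list)
    for location, observer, date, detected in checklists:
        visits[(location, observer)].append((date, int(bool(detected))))

    histories = {}
    for site, records in visits.items():
        if len(records) < MIN_VISITS:
            continue  # sites without repeat visits cannot inform detection
        records.sort()  # ISO date strings sort chronologically
        row = [detected for _, detected in records[:MAX_VISITS]]  # assumption: keep first 10
        row += [None] * (MAX_VISITS - len(row))
        histories[site] = row
    return histories


example = [
    ("A", "obs1", "2020-03-02", False),
    ("A", "obs1", "2020-02-20", True),
    ("B", "obs1", "2020-03-01", True),   # single visit: dropped
    ("A", "obs2", "2020-03-01", False),  # different observer: separate site, single visit
]
matrix = detection_histories(example)
```

Each row of `matrix` corresponds to one site and each column to one repeat visit, which mirrors the rows-as-sites, columns-as-visits structure that the description attributes to the unmarked data frame.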
Tadeo Ramirez-Parada; Isaac Park; Susan Mazer (2022). Data For: Herbarium specimens provide reliable estimates of phenological responsiveness to climate at unparalleled taxonomic and spatiotemporal scales [Dataset]. http://doi.org/10.25349/D9TK64

Data For: Herbarium specimens provide reliable estimates of phenological responsiveness to climate at unparalleled taxonomic and spatiotemporal scales

Explore at:

Available download formats: zip

Unique identifier

https://doi.org/10.25349/D9TK64

Dataset updated

Jun 22, 2022

Dataset provided by

University of California, Santa Barbara

Authors

Tadeo Ramirez-Parada; Isaac Park; Susan Mazer

License

https://spdx.org/licenses/CC0-1.0.html

Description

Understanding the effects of climate change on the phenological structure of plant communities will require measuring variation in sensitivity among thousands of co-occurring species across regions. Herbarium collections provide vast resources with which to do this, but may also exhibit biases as sources of phenological data. Despite general recognition of these caveats, validation of herbarium-based estimates of phenological sensitivity against estimates obtained using field observations remains rare and limited in scope. Here, we leveraged extensive datasets of herbarium specimens and of field observations from the USA National Phenology Network for 21 species in the United States and, for each species, compared herbarium- and field-based standardized estimates of peak flowering dates and of sensitivity of peak flowering time to geographic and interannual variation in mean spring minimum temperatures (TMIN). We found strong agreement between herbarium- and field-based estimates for standardized peak flowering time (r=0.91, p<0.001) and for the direction and magnitude of sensitivity to both geographic TMIN variation (r=0.88, p<0.001) and interannual TMIN variation (r=0.82, p<0.001). This agreement was robust to substantial differences between datasets in 1) the long-term TMIN conditions observed among collection and phenological monitoring sites and 2) the interannual TMIN conditions observed in the time periods encompassed by both datasets for most species. Our results show that herbarium-based sensitivity estimates are reliable among species spanning a wide diversity of life histories and biomes, demonstrating their utility in a broad range of ecological contexts, and underscoring the potential of herbarium collections to enable phenoclimatic analysis at taxonomic and spatiotemporal scales not yet captured by observational data.

Methods

Phenological data

The dataset of field observations consisted of all records of flowering onset and termination available in the USA National Phenology Network database (NPNdb), representing an initial 1,105,764 phenological observations. To ensure the quality of the observational data, we retained only observations for which we could determine that the dates of onset and termination of flowering had an arbitrary maximum error of 14 days. To do this, we filtered the data to include only records for which the date on which the first open flower on an individual was observed was preceded by an observation of the same individual without flowers no more than 14 days prior, and for which the date on which the last flower was recorded was followed by an observation of the same individual without flowers no more than 14 days later. After filtering, field observations in our data had an average maximum error of 6.4 days for the onset of flowering, and of 6.6 days for the termination of flowering.

The herbarium dataset was constructed using an initial 894,392 digital herbarium specimen records archived by 72 herbaria across North America. We excluded from analysis all specimens not explicitly recorded as being in flower, or for which GPS coordinates or dates of collection were not available. We further filtered both datasets by only retaining species that were found in both datasets and that were represented by observations at a minimum of 15 unique sites in the NPN dataset. For each species, and to more closely match the geographic ranges covered by each dataset, we filtered the herbarium dataset to include only specimens within the range of latitudes and longitudes represented by the field observations in the NPN data. Finally, we retained only species represented by 70 or more herbarium specimens to ensure sufficient sample sizes for phenoclimatic modeling. 
This procedure identified a final set of 21 native species represented in 3,243 field observations across 1,406 unique site-year combinations, and a final sample of 5,405 herbarium specimens across 4,906 unique site-year combinations. For the herbarium dataset, sample sizes ranged from 69 unique sites and 74 specimens for Prosopis velutina, to 1,323 unique sites containing 1,368 specimens for Achillea millefolium. Sample sizes in the NPN dataset ranged from 15 unique sites with 74 observations for Impatiens capensis, to 108 unique sites with 321 observations for Cornus florida. These 21 species represented 15 families and 17 genera, spanning a diverse range of life-history strategies and growth forms, including evergreen and deciduous shrubs and trees (e.g., Quercus agrifolia and Tilia americana, respectively), as well as herbaceous perennials (e.g., Achillea millefolium) and annuals (e.g., Impatiens capensis). Our focal species covered a wide variety of biomes and regions including Western deserts (e.g., Fouquieria splendens), Mediterranean shrublands and oak woodlands (e.g., Baccharis pilularis, Quercus agrifolia), and Eastern deciduous forests (e.g., Quercus rubra, Tilia americana).

To estimate flowering dates in the herbarium dataset, we employed the day of year of collection (henceforth ‘DOY’) of each specimen collected while in flower as a proxy. Herbarium specimens in flower could have been collected at any point between the onset and termination of their flowering period, and botanists may preferentially collect individuals in their flowering peak for many species. Therefore, herbarium specimen collection dates are more likely to reflect peak flowering dates than flowering onset dates. To maximize the phenological equivalence of the field and herbarium datasets, we used the median date between onset and termination of flowering for each individual in each year in the NPN data as a proxy for peak flowering time. 
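
The 14-day bracketing rule and the midpoint proxy for peak flowering described above can be sketched as follows. This is a simplified illustration, not the authors' R code: the input layout (a list of day-of-year and in-flower pairs for one individual in one year), the function name, and the assumption of a single flowering episode per year are all hypothetical.

```python
MAX_GAP_DAYS = 14  # maximum error allowed for onset and termination dates


def peak_flowering_doy(observations):
    """Midpoint between flowering onset and termination, or None.

    `observations` is a list of (doy, in_flower) pairs for one individual in
    one year. The record is rejected (None) unless the first flowering
    observation is preceded, and the last flowering observation is followed,
    by a "no flowers" observation at most MAX_GAP_DAYS away.
    Assumes a single flowering episode per year.
    """
    obs = sorted(observations)
    flowering = [i for i, (_, in_flower) in enumerate(obs) if in_flower]
    if not flowering:
        return None
    first, last = flowering[0], flowering[-1]
    if first == 0 or last == len(obs) - 1:
        return None  # no bracketing "no flowers" observation on one side
    onset, termination = obs[first][0], obs[last][0]
    if onset - obs[first - 1][0] > MAX_GAP_DAYS:
        return None  # onset date too uncertain
    if obs[last + 1][0] - termination > MAX_GAP_DAYS:
        return None  # termination date too uncertain
    return (onset + termination) / 2


accepted = peak_flowering_doy(
    [(100, False), (107, True), (114, True), (121, True), (130, False)]
)
rejected = peak_flowering_doy([(80, False), (107, True), (121, True), (130, False)])
```

In the first example the gaps before onset and after termination are 7 and 9 days, so the record is kept; in the second the 27-day gap before onset exceeds the limit, so the record is dropped.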
Due to the maximum error of 14 days for flowering onset and termination dates in the NPN dataset, median flowering dates also had a maximum error of 14 days, with an average maximum error among observations of 6.5 days. To account for the artificial DOY discontinuity between December 31st (DOY = 365, or 366 in a leap year) and January 1st (DOY = 1), we converted DOY in both datasets into a circular variable using an Azimuthal correction.

Climate data

Daily minimum temperatures mediate key developmental processes including the break of dormancy, floral induction, and anthesis. Therefore, we used minimum surface temperatures averaged over the three months leading up to (and including) the mean flowering month for each species (hereafter ‘TMIN’) as the climatic correlate of flowering time in this study; consequently, the specific months over which temperatures were averaged varied among species. Using TMIN calculated over different time periods instead (e.g., during spring for all species) did not qualitatively affect our results. Then, we partitioned variation among sites into spatial and temporal components, characterizing TMIN for each observation by the long-term mean TMIN at its site of collection (henceforth ‘TMIN normals’), and by the deviation between its TMIN in the year of collection (for the three-month window of interest) and its long-term mean TMIN (henceforth ‘TMIN anomalies’). For each site, we obtained a monthly time series of TMIN from January 1901 to December 2016 using ClimateNA v6.30, a software package that interpolates 4 km² resolution climate data from the PRISM Climate Group at Oregon State University (http://prism.oregonstate.edu) to generate elevation-adjusted climate estimates. To calculate TMIN normals, we averaged observed TMIN for the three months leading up to the mean flowering date of each species across all years between 1901 and 2016 for each site. 
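
Two of the ideas in this passage, treating DOY as a circular variable and splitting TMIN into normals and anomalies, can be illustrated with a short Python sketch. The circular mean below is a generic circular-statistics technique and is not claimed to be the Azimuthal correction the authors used; the function names and the input layout (a mapping from year to the already window-averaged TMIN at one site) are assumptions.

```python
import math
from statistics import mean


def circular_mean_doy(doys, period=365.25):
    """Mean day of year computed on a circle, so late December and early
    January are treated as neighbors (generic technique, assumed here)."""
    angles = [2 * math.pi * d / period for d in doys]
    s = mean(math.sin(a) for a in angles)
    c = mean(math.cos(a) for a in angles)
    return (math.atan2(s, c) % (2 * math.pi)) * period / (2 * math.pi)


def tmin_normal(window_tmin_by_year, first_year=1901, last_year=2016):
    """Long-term mean of the three-month-window TMIN at one site."""
    return mean(
        t for year, t in window_tmin_by_year.items()
        if first_year <= year <= last_year
    )


def tmin_anomaly(window_tmin_by_year, year):
    """TMIN in the year of collection minus the site's TMIN normal.
    Positive values indicate a warmer-than-average year."""
    return window_tmin_by_year[year] - tmin_normal(window_tmin_by_year)


# Hypothetical values, for illustration only.
site_series = {1901: 4.0, 1902: 6.0, 2016: 5.0, 2017: 9.0}
normal = tmin_normal(site_series)         # 2017 lies outside 1901-2016 and is ignored
anomaly = tmin_anomaly(site_series, 1902)
wrapped = circular_mean_doy([360, 5])     # close to the year boundary, unlike the arithmetic mean 182.5
```

The circular mean shows why the correction matters: averaging DOY 360 and DOY 5 arithmetically gives a midsummer date, while the circular version stays near the turn of the year.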
TMIN anomalies relative to long-term conditions were calculated by subtracting TMIN normals from observed TMIN conditions in the year of collection. Therefore, positive and negative values of the anomalies respectively reflect warmer-than-average and colder-than-average conditions in a given year.

Analysis

We also provide R code to reproduce all results presented in the main text and the supplemental materials of our study. This code includes 1) all steps necessary to merge herbarium and field data into a single dataset ready for analysis, 2) the formulation and specification of the varying-intercepts and varying-slopes Bayesian model used to generate herbarium- vs. field-based estimates of phenology and its sensitivity to TMINsp, 3) the steps required to process the output of the Bayesian model and to obtain all metrics required for the analyses in the paper, and 4) the code used to generate each figure.

Contributing Herbaria

Data used in this study were contributed by the Yale Peabody Museum of Natural History, the George Safford Torrey Herbarium at the University of Connecticut, the Acadia University Herbarium, the Chrysler Herbarium at Rutgers University, the University of Montreal Herbarium, the Harvard University Herbarium, the Albion Hodgdon Herbarium at the University of New Hampshire, the Academy of Natural Sciences of Drexel University, the Jepson Herbarium at the University of California-Berkeley, the University of California-Berkeley Sagehen Creek Field Station Herbarium, the California Polytechnic State University Herbarium, the University of Santa Cruz Herbarium, the Black Hills State University Herbarium, the Luther College Herbarium, the Minot State University Herbarium, the Tarleton State University Herbarium, the South Dakota State University Herbarium, the Pittsburg State University Herbarium, the Montana State University-Billings Herbarium, the Sul Ross University Herbarium, the Fort Hays State University Herbarium, the Utah State University Herbarium, the Brigham Young University Herbarium, the Eastern Nevada Landscape Coalition Herbarium, the University of Nevada Herbarium, the Natural History Museum of Utah, the Western Illinois University Herbarium, the Eastern Illinois University Herbarium, the Northern Illinois University Herbarium, the Morton Arboretum Herbarium, the Chicago Botanic Garden
