Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset brings you the Iris Dataset in several data formats (see more details in the next sections).
You can use it to test the ingestion of data in all these formats using Python or R libraries. We also prepared a Python Jupyter Notebook and an R Markdown report that ingest all these formats:
Iris Dataset was created by R. A. Fisher and donated by Michael Marshall.
Repository on UCI site: https://archive.ics.uci.edu/ml/datasets/iris
Data Source: https://archive.ics.uci.edu/ml/machine-learning-databases/iris/
The downloaded file is iris.data, formatted as a comma-delimited file.
This small data collection was created to help you test your skills with ingesting various data formats.
This file was processed to convert the data in the following formats:
* csv - comma separated values format
* tsv - tab separated values format
* parquet - parquet format
* feather - feather format
* parquet.gzip - compressed parquet format
* h5 - hdf5 format
* pickle - Python binary object file - pickle format
* xlsx - Excel format
* npy - Numpy (Python library) binary format
* npz - Numpy (Python library) binary compressed format
* rds - Rds (R specific data format) binary format
I would like to acknowledge the work of the creator of the dataset, R. A. Fisher, and of the donor, Michael Marshall.
Use these data formats to test your skills in ingesting data in various formats.
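As an illustration, the snippet below sketches how the formats listed above could be read from Python. It is a minimal sketch, not the prepared notebook itself: it assumes the files are named iris.<extension> and that pandas, NumPy, and pyreadr (for the R-specific .rds file) are installed.

```python
# Minimal ingestion sketch for the formats listed above.
# Assumptions: files are named iris.<extension>; pandas, numpy, and pyreadr are installed.
import numpy as np
import pandas as pd
import pyreadr  # reads the R-specific .rds format

df_csv = pd.read_csv("iris.csv")
df_tsv = pd.read_csv("iris.tsv", sep="\t")
df_parquet = pd.read_parquet("iris.parquet")      # pandas also handles iris.parquet.gzip
df_feather = pd.read_feather("iris.feather")
df_h5 = pd.read_hdf("iris.h5")                    # works when the HDF5 file holds one object
df_pickle = pd.read_pickle("iris.pickle")
df_xlsx = pd.read_excel("iris.xlsx")
arr_npy = np.load("iris.npy", allow_pickle=True)  # NumPy binary; string labels load as objects
npz = np.load("iris.npz", allow_pickle=True)      # compressed archive of named arrays
df_rds = next(iter(pyreadr.read_r("iris.rds").values()))
```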
[2024 edition] The reference database for the infra-communal dissemination of population census results by Iris, with decametric precision. Co-published by Insee and IGN, Contours... IRIS® is a digitized map layer of the Iris blocks defined by Insee for census purposes, covering all communes with more than 10,000 inhabitants and most communes with 5,000 to 10,000 inhabitants.
https://choosealicense.com/licenses/cc0-1.0/
Palmer Penguins
The Palmer Penguins dataset by Allison Horst, Alison Hill, and Kristen Gorman was first made publicly available as an R package. The goal of the Palmer Penguins dataset is to replace the highly overused Iris dataset for data exploration and visualization. Now you can also use Palmer Penguins on Hugging Face!
License
Data are available by CC-0 license in accordance with the Palmer Station LTER Data Policy and the LTER Data Access Policy for Type I data.… See the full description on the dataset page: https://huggingface.co/datasets/SIH/palmer-penguins.
Provides de-identified client and matter information related to legal services delivered by Indigenous Legal Assistance Programme service providers. More information about this dataset can be found at: http://www.abs.gov.au/ausstats/abs@.nsf/Lookup/4533.0Main+Features382013
Author: H. Altay Guvenir, Burak Acar, Haldun Muderrisoglu
Source: UCI
Please cite: UCI
Cardiac Arrhythmia Database
The aim is to determine the type of arrhythmia from the ECG recordings. This database contains 279 attributes, 206 of which are linear valued and the rest are nominal.
Concerning the study of H. Altay Guvenir: "The aim is to distinguish between the presence and absence of cardiac arrhythmia and to classify it in one of the 16 groups. Class 01 refers to 'normal' ECG, classes 02 to 15 refer to different classes of arrhythmia, and class 16 refers to the rest of the unclassified ones. For the time being, there exists a computer program that makes such a classification. However, there are differences between the cardiologist's and the program's classification. Taking the cardiologist's as a gold standard, we aim to minimize this difference by means of machine learning tools."
The names and id numbers of the patients were recently removed from the database.
1 Age: Age in years, linear
2 Sex: Sex (0 = male; 1 = female), nominal
3 Height: Height in centimeters, linear
4 Weight: Weight in kilograms, linear
5 QRS duration: Average of QRS duration in msec., linear
6 P-R interval: Average duration between onset of P and Q waves in msec., linear
7 Q-T interval: Average duration between onset of Q and offset of T waves in msec., linear
8 T interval: Average duration of T wave in msec., linear
9 P interval: Average duration of P wave in msec., linear
Vector angles in degrees on front plane of (linear):
10 QRS
11 T
12 P
13 QRST
14 J
15 Heart rate: Number of heart beats per minute, linear
Of channel DI:
Average width, in msec., of (linear):
16 Q wave
17 R wave
18 S wave
19 R' wave, small peak just after R
20 S' wave
21 Number of intrinsic deflections, linear
22 Existence of ragged R wave, nominal
23 Existence of diphasic derivation of R wave, nominal
24 Existence of ragged P wave, nominal
25 Existence of diphasic derivation of P wave, nominal
26 Existence of ragged T wave, nominal
27 Existence of diphasic derivation of T wave, nominal
Of channel DII:
28 .. 39 (similar to 16 .. 27 of channel DI)
Of channels DIII:
40 .. 51
Of channel AVR:
52 .. 63
Of channel AVL:
64 .. 75
Of channel AVF:
76 .. 87
Of channel V1:
88 .. 99
Of channel V2:
100 .. 111
Of channel V3:
112 .. 123
Of channel V4:
124 .. 135
Of channel V5:
136 .. 147
Of channel V6:
148 .. 159
Of channel DI:
Amplitude, in units of 0.1 millivolt, of:
160 JJ wave, linear
161 Q wave, linear
162 R wave, linear
163 S wave, linear
164 R' wave, linear
165 S' wave, linear
166 P wave, linear
167 T wave, linear
168 QRSA: sum of areas of all segments divided by 10 (area = width × height / 2), linear
169 QRSTA: QRSA + 0.5 × width of T wave × 0.1 × height of T wave (if T is diphasic, the larger segment is considered), linear
Of channel DII:
170 .. 179
Of channel DIII:
180 .. 189
Of channel AVR:
190 .. 199
Of channel AVL:
200 .. 209
Of channel AVF:
210 .. 219
Of channel V1:
220 .. 229
Of channel V2:
230 .. 239
Of channel V3:
240 .. 249
Of channel V4:
250 .. 259
Of channel V5:
260 .. 269
Of channel V6:
270 .. 279
Class code - class - number of instances:
01 Normal - 245
02 Ischemic changes (Coronary Artery Disease) - 44
03 Old Anterior Myocardial Infarction - 15
04 Old Inferior Myocardial Infarction - 15
05 Sinus tachycardia - 13
06 Sinus bradycardia - 25
07 Ventricular Premature Contraction (PVC) - 3
08 Supraventricular Premature Contraction - 2
09 Left bundle branch block - 9
10 Right bundle branch block - 50
11 1st-degree AtrioVentricular block - 0
12 2nd-degree AV block - 0
13 3rd-degree AV block - 0
14 Left ventricular hypertrophy - 4
15 Atrial Fibrillation or Flutter - 5
16 Others - 22
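As a hedged illustration of working with this database, the sketch below loads the UCI file and tallies the class distribution. It assumes a local copy of arrhythmia.data with '?' marking missing values (the convention used in the UCI repository) and the class code in the last column; the class counts in the table above sum to 452 instances.

```python
# Minimal sketch, assuming arrhythmia.data has been downloaded from the UCI
# repository, is comma-delimited with '?' for missing values, and has the
# class code in the last column.
import pandas as pd

df = pd.read_csv("arrhythmia.data", header=None, na_values="?")
X, y = df.iloc[:, :-1], df.iloc[:, -1]
print(X.shape)                         # expected (452, 279): 279 attributes
print(y.value_counts().sort_index())   # instances per class code, cf. the table above
```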
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This artifact bundles the five dataset archives used in our private federated clustering evaluation, corresponding to the real-world benchmarks, scaling experiments, ablation studies, and timing performance tests described in the paper. The real_datasets.tar.xz includes ten established clustering benchmarks drawn from UCI and the Clustering basic benchmark (DOI: https://doi.org/10.1007/s10489-018-1238-7); scale_datasets.tar.xz contains the SynthNew family generated to assess scalability via the R clusterGeneration package; ablate_datasets.tar.xz holds the AblateSynth sets varying cluster separation for ablation analysis, also powered by clusterGeneration; g2_datasets.tar.xz packages the G2 sets (Gaussian clusters of size 2048 across dimensions 2–1024 with two clusters each), collected from the Clustering basic benchmark (DOI: https://doi.org/10.1007/s10489-018-1238-7); and timing_datasets.tar.xz includes the real s1 and lsun datasets alongside TimeSynth files (balanced synthetic clusters for timing), as per Mohassel et al.'s experimental framework.
Contains ten real-world benchmark datasets, formatted as one sample per line with space-separated features (a minimal loading sketch follows the list):
iris.txt: 150 samples, 4 features, 3 classes; classic UCI Iris dataset for petal/sepal measurements.
lsun.txt: 400 samples, 2 features, 3 clusters; two-dimensional variant of the LSUN dataset for clustering experiments.
s1.txt: 5,000 samples, 2 features, 15 clusters; synthetic benchmark from Fränti’s S1 series.
house.txt: 1,837 samples, 3 features, 3 clusters; housing data transformed for clustering tasks.
adult.txt: 48,842 samples, 6 features, 3 clusters; UCI Census Income (“Adult”) dataset for income bracket prediction.
wine.txt: 178 samples, 13 features, 3 cultivars; UCI Wine dataset with chemical analysis features.
breast.txt: 569 samples, 9 features, 2 classes; Wisconsin Diagnostic Breast Cancer dataset.
yeast.txt: 1,484 samples, 8 features, 10 localization sites; yeast protein localization data.
mnist.txt: 10,000 samples, 784 features (28×28 pixels), 10 digit classes; MNIST handwritten digits.
birch2.txt: a random 25,000-sample subset of the 100,000-sample synthetic BIRCH2 dataset; 2 features, 100 clusters; for high-cluster-count evaluation.
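Since every file is one sample per line with space-separated features, a minimal Python loading sketch might look like this (the filename is taken from the list above):

```python
# Minimal sketch: each benchmark file holds one sample per line, space-separated.
import numpy as np

X = np.loadtxt("iris.txt")  # expected shape (150, 4) per the description above
print(X.shape)
```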
Holds the SynthNew_{k}_{d}_{s}.txt files for scaling experiments, where:
$k \in \{2,4,8,16,32\}$ is the number of clusters,
$d \in \{2,4,8,16,32,64,128,256,512\}$ is the dimensionality,
$s \in \{1,2,3\}$ indexes different random seeds.
These are generated with the R clusterGeneration package with cluster sizes following a $1:2:...:k$ ratio. We incorporate a random number (in $[0, 100]$) of randomly sampled outliers and set the cluster separation degrees randomly in $[0.16, 0.26]$, spanning partially overlapping to separated clusters.
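The actual generation used the R clusterGeneration package. Purely as an illustrative stand-in, the Python sketch below uses scikit-learn's make_blobs to produce files in the SynthNew layout with a 1:2:...:k size ratio and injected outliers; the base cluster size and the blob generator are assumptions, and clusterGeneration's separation-degree control is not reproduced here.

```python
# Illustrative stand-in only: the paper's sets were made with R's clusterGeneration.
# make_blobs replaces it here and does not reproduce the separation-degree control.
import numpy as np
from sklearn.datasets import make_blobs

def synth_new(k, d, seed, base=100):
    rng = np.random.default_rng(seed)
    sizes = [base * (i + 1) for i in range(k)]         # cluster sizes in a 1:2:...:k ratio
    X, _ = make_blobs(n_samples=sizes, n_features=d, random_state=seed)
    n_out = int(rng.integers(0, 101))                  # random number of outliers in [0, 100]
    outliers = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_out, d))
    np.savetxt(f"SynthNew_{k}_{d}_{seed}.txt", np.vstack([X, outliers]), fmt="%.6f")

synth_new(k=4, d=8, seed=1)  # writes SynthNew_4_8_1.txt, one sample per line
```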
Contains the AblateSynth_{k}_{d}_{sep}.txt files for ablation studies, with:
$k \in \{2,4,8,16\}$ clusters,
$d \in \{2,4,8,16\}$ dimensions,
$sep \in \{0.25, 0.5, 0.75\}$ controlling cluster separation degrees.
Also generated via clusterGeneration.
Packages the G2 synthetic sets (g2-{dim}-{var}.txt) from the clustering-data benchmarks:
$N=2048$ samples, $k=2$ Gaussian clusters,
Dimensions $d \in \{1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024\}$
Includes:
s1.txt, lsun.txt: two real datasets for baseline timing.
timesynth_{k}_{d}_{n}.txt: synthetic timing datasets with balanced cluster sizes ($C_{avg} = N/k$), varying:
$k \in \{2,5\}$
$d \in \{2,5\}$
$N \in \{10000, 100000\}$
Generated similarly to the scaling sets, following Mohassel et al.'s timing experiment protocol.
Usage:
Unpack any archive with tar -xJf, e.g. tar -xJf real_datasets.tar.xz
https://spdx.org/licenses/CC0-1.0.html
There have been very few studies with an evolutionary perspective on eye (iris) color, outside of humans and domesticated animals. Extant members of the family Felidae have a great interspecific and intraspecific diversity of eye colors, in stark contrast to their closest relatives, all of which have only brown eyes. This makes the felids a great model to investigate the evolution of eye color in natural populations. Through machine learning image analysis of publicly available photographs of all felid species, as well as a number of subspecies, five felid eye colors were identified: brown, green, yellow, gray, and blue. Using phylogenetic comparative methods, the presence or absence of these colors was reconstructed on a phylogeny. Additionally, through a new color analysis method, the specific shades of the ancestors’ eyes were quantitatively reconstructed. The ancestral felid population was predicted to have brown-eyed individuals, as well as a novel evolution of gray-eyed individuals, the latter being a key innovation that allowed the rapid diversification of eye color seen in modern felids, including numerous gains and losses of different eye colors. It was also found that the gain of yellow eyes is highly associated with, and may be necessary for, the evolution of round pupils in felids, which may in turn influence the shades present in the eyes. Along with these important insights, the methods presented in this work are widely applicable and will facilitate future research into phylogenetic reconstruction of color beyond irises.
Methods
Data set
In order to sample all felid species, we took advantage of public databases. Images of individuals from 40 extant felid species (all but Felis catus, excluded due to the artificial selection on eye color in domesticated cats by humans), as well as 12 identifiable subspecies and four outgroups (banded linsang, Prionodon linsang; spotted hyena, Crocuta crocuta; common genet, Genetta genetta; and fennec fox, Vulpes zerda), were found on Google Images and iNaturalist, using both the scientific name and the common name of each species as search terms. This approach, taking advantage of the enormous resource of publicly available images, allows access to a much larger data set than exists in the published scientific literature or than would be possible to obtain de novo for this study. Public image-based methods for character state classification have been used previously, such as in a phylogenetic analysis of felid coat patterns (Werdelin and Olsson 1997) and a catalog of iris color variation in the white-browed scrubwren (Cake 2019). However, this approach does require implementing strong criteria for selecting images. Criteria used to choose images included selecting images where the animal was facing towards the camera, at least one eye was unobstructed, the animal was a non-senescent adult, and the eye was not in direct light (causing glare) or completely in shadow (causing unwanted darkening). The taxonomic identity of the animal in each selected image was verified through images present in the literature, as well as the “research grade” section of iNaturalist. When possible, we collected five images per taxon, although some rarer taxa had fewer than five acceptable images available. In addition, some species with a large number of eye colors needed more than five images to capture their variation, as determined by quantitative methods discussed below. Each of the 56 taxa and the number of images used are given in Supplementary Table 2.
Once the images were selected, they were manually edited using MacOS Preview. This editing process involved choosing the “better” of the two eyes for each image (i.e. the one that is most visible and with the least glare and shadow). Then, the section of the iris for that eye without obstruction, such as glare, shadow, or fur, was cropped out. An example of this is given in Figure S11. The strict selection criteria and image editing eliminated the need to color correct the images, a process that can introduce additional subjectivity; the consistency of the data can be seen in the lack of variation between eyes identified as the same color (Figure S5). This process resulted in a data set of 290 cropped, standardized irises. These images, along with the original photos, can be found in the Supplementary Material.
Eye color identification
To impartially identify the eye color(s) present in each felid population, the data set images were loaded by species into Python (version 3.8.8) using the Python Imaging Library (PIL) (Van Rossum and Drake 2009; Clark 2015). For each image, the red, green, and blue (RGB) values for each of its pixels were extracted. Then, they were averaged and the associated hex color code for the average R, G, and B values was printed. The color associated with this code was identified using curated and open-source color identification programs (Aerne 2022; Cooper 2022). There is no universally agreed-upon list of colors, since exact naming conventions differ on an individual and cultural basis, but these programs offer a workable solution, consisting of tens of thousands of color names derived from published, corporate, and governmental sources. These data allowed the color of each eye in the data set to be impartially assigned, removing a great deal of the bias inherent in a researcher subjectively deciding the color of each iris. Eye colors were assigned on this basis to one of five fundamental color groups: brown, green (including hazel), yellow (including beige), gray, and blue. The possible color groups were determined before observation of the data, based on basic color categories established in the literature: white, black, red, green, yellow, blue, brown, purple, pink, orange, and gray (Berlin and Kay 1991). Not all of the eleven categories ended up being represented by any irises; no irises were observed to be white, black, red, purple, pink, or orange. As an example of this method, if an iris’s color had the RGB values R: 114, G: 160, B: 193, this would correspond to the hex code #72A0C1. This hex code, when put into the color identification programs, results in the identification “Air Superiority Blue”, derived from the British Royal Air Force’s official flag specifications (Cooper 2022; Aerne 2022). Based on this identification, the iris would be added to the “blue” color group, without a researcher having to choose the color themselves. If a color’s name did not already contain one of the eleven aforementioned color categories, the name was searched for in the Inter-Society Color Council-National Bureau of Standards (ISCC–NBS) System of Color Designation (Judd and Kelly 1939). For instance, the color with RGB values R: 37, G: 29, B: 14 corresponds to hex code #251D0E, identified as “Burnt Coffee” by the color identification programs. The ISCC–NBS descriptor for this color is “moderate brown”, so the color would be added to the “brown” group.
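A minimal sketch of the pixel-averaging step described above, assuming a cropped iris image file (the filename is hypothetical):

```python
# Minimal sketch of the averaging step: load a cropped iris image, average the
# RGB values over all pixels, and print the corresponding hex color code.
from PIL import Image
import numpy as np

img = Image.open("iris_crop.png").convert("RGB")   # hypothetical cropped-iris file
rgb = np.asarray(img, dtype=float).reshape(-1, 3).mean(axis=0)
r, g, b = (int(round(v)) for v in rgb)
print(f"#{r:02X}{g:02X}{b:02X}")  # e.g. R:114, G:160, B:193 -> #72A0C1 ("Air Superiority Blue")
```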
All colors were able to be placed directly from their color name or their ISCC–NBS descriptor and, for colors with both a color category in the name and an ISCC–NBS descriptor, there were no instances in which the two conflicted. While color itself lies on a spectrum, splitting the colors into discrete fundamental groups is the most tractable approach to analyzing eye color in a biologically reasonable way. If every eye color were instead taken together on one spectrum and analyzed as a continuous trait, the results would be highly unrealistic. As an example, if there were two sister taxa, one with blue eyes (R: 0, G: 0, B: 139) and one with brown eyes (R: 150, G: 75, B: 0), a continuous reconstruction would assign the ancestor the intermediate eye color in the color space: R: 75, G: 37, B: 69. However, this color is firmly within the “purple” category. It is highly unlikely that a recent ancestor of two taxa with blue and brown eyes had purple eyes, rather than blue eyes, brown eyes, or both, which would be the result if blue and brown were considered as separate categories. Indeed, one would run into the same issue if categories were removed at an earlier stage and each taxon was only considered to have one eye color, determined by averaging all irises: a taxon with blue and brown eyes would again be said to have purple eyes, a color which none of the members of that taxon have. Separating the data into color groups is the most realistic way to investigate this trait, preventing the loss of variation present in the natural populations while avoiding impossible analyses. The lines between color categories are not always clear to an observer (e.g. grayish-blues and bluish-grays can look alike) and, no matter how they are defined, they may still be arbitrary. This is precisely why we used color identification programs, impartially defining the lines to make the analysis possible. To ensure no data were missed due to low sample size, the first 500 Google Images, as well as all the “research grade” images on iNaturalist, were manually viewed for each species, while referring back to already analyzed data and periodically checking with the color identification programs (Aerne 2022; Cooper 2022). Any missed colors were added to the data set. This method nonetheless has a small, but non-zero, chance of missing rare eye colors present in some species. Overall, however, it provides a robust and repeatable way to identify the general iris colors present in animals. In addition, if, for a given species, one, two, or three eye colors were greatly predominant in the available data online (i.e. the first 500 Google Images, as well as all the “research grade” images on iNaturalist), they were defined as being the most common eye color(s). For three colors to be considered the most common, each color had to be present in >26.6% of the images. For two colors, each had to be present in >40% of the images.
Description
The VRBiom (Virtual Reality Dataset for Biometric Applications) dataset has been acquired using a head-mounted display (HMD) to benchmark and develop various biometric use-cases, such as iris and periocular recognition, and associated sub-tasks such as detection and semantic segmentation. The VRBiom dataset consists of 900 short videos acquired from 25 individuals recorded in the NIR spectrum. To encompass real-world variations, the dataset includes recordings under three gaze conditions: steady, moving, and partially closed eyes. Additionally, the dataset maintains an equal split of recordings with and without glasses to facilitate the analysis of eyewear. The videos are characterized by non-frontal views of the eye and relatively low spatial resolution (400 × 400). The dataset also includes 1104 presentation attacks constructed from 92 PA instruments (PAIs). These PAIs fall into six categories constructed through combinations of print attacks (real and synthetic identities), fake 3D eyeballs, plastic eyes, and various types of masks and mannequins.
Reference
If you use this dataset, please cite the following publication(s) depending on the use:
@article{vrbiom_dataset_arxiv2024, author = {Kotwal, Ketan and Ulucan, Ibrahim and \"{O}zbulak, G\"{o}khan and Selliah, Janani and Marcel, S\'{e}bastien}, title = {VRBiom: A New Periocular Dataset for Biometric Applications of HMD}, year = {2024}, month = {Jul}, journal = {arXiv preprint arXiv:2407.02150}, DOI = {https://doi.org/10.48550/arXiv.2407.02150} }
@inproceedings{vrbiom_pad_ijcb2024, author = {Kotwal, Ketan and \"{O}zbulak, G\"{o}khan and Marcel, S\'{e}bastien}, title = {Assessing the Reliability of Biometric Authentication on Virtual Reality Devices}, booktitle = {Proceedings of IEEE International Joint Conference on Biometrics (IJCB2024)}, month = {Sep}, year = {2024} }
https://dataverse.nl/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.34894/PX5IVZ
PsyCorona is an ad hoc, multinational collaborative study in response to the COVID-19 pandemic. Broadly speaking, we study the psychological factors that predict how people respond to the coronavirus and to the associated public health measures. The ultimate goal is to provide actionable knowledge that can serve to enhance pandemic response. To achieve this goal, PsyCorona was designed with three distinct phases: (1) a cross-sectional survey, (2) follow-up surveys, and (3) integrative data science. The Dataset codebook can be found at: https://osf.io/qhyue/.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about: Time series of coordinates for station IRIS. Please consult parent dataset @ https://doi.org/10.1594/PANGAEA.934034 for more information.
Apalachicola Bay and St. George Sound contain the largest oyster fishery in Florida, and the growth and distribution of the numerous oyster reefs here are the combined product of modern estuarine conditions and the late Holocene evolution of the bay. A suite of geophysical data and cores were collected during a cooperative study by the U.S. Geological Survey, the National Oceanic and Atmospheric Administration Coastal Services Center, and the Apalachicola National Estuarine Research Reserve to refine the geology of the bay floor as well as the bay's Holocene stratigraphy. Sidescan-sonar imagery, bathymetry, high-resolution seismic profiles, and cores show that oyster reefs occupy the crests of sandy shoals that range from 1 to 7 kilometers in length, while most of the remainder of the bay floor is covered by mud. The sandy shoals are the surficial expression of broader sand deposits associated with deltas that advanced southward into the bay between 6,400 and 4,400 years before present. The seismic and core data indicate that the extent of oyster reefs was greatest between 2,400 and 1,200 years before present and has decreased since then due to the continued input of mud to the bay by the Apalachicola River. The association of oyster reefs with the middle to late Holocene sandy delta deposits indicates that the present distribution of oyster beds is controlled in part by the geologic evolution of the estuary. For more information on the surveys involved in this project, see http://woodshole.er.usgs.gov/operations/ia/public_ds_info.php?fa=2005-001-FA and http://woodshole.er.usgs.gov/operations/ia/public_ds_info.php?fa=2006-001-FA.
https://doi.org/10.17026/fp39-0x58
Molecular pathology reports of patients with colorectal carcinoma were collected from PALGA using specific queries from 1 October 2017 to 30 June 2019 (for details see Figure 1A; PMID: 34675090 / DOI: 10.1136/jclinpath-2021-207865). Manual curation of these reports showed 4060 patients with CRC undergoing predictive mutation analyses in this 21-month study period. Details of the mutation analyses (i.e., technique, gene panel, diagnostic yield) were manually extracted from these reports and are shown in the current dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data set and associated R code (RMarkdown) for project assessing factors influencing iris colour in Spotted Towhees (Pipilo maculatus oregonus) in urban greenspaces within Metro Vancouver during the 2022-2024 breeding and 2022-2023 non-breeding seasons. Accepted at the Journal of Field Ornithology.
Floral nectar usually functions as a pollinator reward, yet it may also attract herbivores. However, the effects of herbivore consumption of nectar or nectaries on pollination have rarely been tested. We investigated Iris bulleyana, an alpine plant that has showy tepals and abundant nectar, in the Hengduan Mountains of SW China. In this region, flowers are visited mainly by pollen-collecting pollinators and nectarivorous herbivores. We tested the hypothesis that, in I. bulleyana, sacrificing nectar and nectaries to herbivores protects tepals and thus enhances pollinator attraction. We compared rates of pollination and herbivory on different floral tissues in plants with flowers protected from nectar and nectary consumption with rates in unprotected control plants. We found that nectar and nectaries suffered more herbivore damage than did tepals in natural conditions. However, the amount of tepal damage was significantly greater in the flowers with protected nectaries than in the control...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for Nature manuscript titled
Corresponding author Dr. Sybryn Maes – sybryn.maes@gmail.com
All R scripts are available on GitHub: https://github.com/mjalava/tundraflux
The bold names below refer to scripts (see the GitHub repository https://github.com/mjalava/tundraflux) and names in italics refer to files in this repository.
df_0
-Study design Figure 1 and Extended Fig. 1 from main text
df_1a
-Effect size calculations of response (ER)
-Links to df_1.csv file with raw flux and environmental data
-Only covers the experiments that state ‘Open Access’ in the Excel file Authors_Datasets (sheet 2). For experiments stating ‘Available Upon Request’, you need to contact the authors for the raw flux data.
df_1b
-Effect size calculations of environmental drivers
-Links to df_1.csv file with raw flux data (see above) and Dataset_ID.csv (this file includes all dataset IDs to merge the drivers into one dataframe)
df_2a-f
-Meta-analysis (2a) and meta-regression models (2b-f) (ER, N=136)
-Links to df_2.csv file with effect size data and context-dependencies and Forestplot_horiz_weights_fig.csv (this file includes the mean pooled Hedges SMD as well as the individual dataset Hedges SMD to plot figure 2)
-Contains code for Figs. 2-4 and Extended Figs 2-3
df_3
-Meta-regression for experimental warming duration
-Contains code for Fig. 5
df_4a
-Effect size calculations of autotrophic-heterotrophic respiration partitioning (Ra, Rh, N=9)
-Links to df_3.csv file with raw partitioning data of subset experiments (output file df_4.csv)
df_4b
-Sub-meta-analysis models (ER, Ra, Rh)
-Links to df_4.csv (input file)
NOTES
· All additional input files for the meta-analysis R-scripts are included within the folders.
· ER, Ra, Rh = ecosystem, autotrophic, and heterotrophic respiration
· N = sample size (number of datasets)
For upscaling, the input data is described in the code files (see the Github repository) and the accompanying Readme.txt.
percentageChangeResp_tundraAlpine.tif: modelled change in respiration
baseResp_tundraAlpine.tif: baseline respiration (calculated from the data from literature)
modResp_tundraAlpine.tif: modelled respiration after warming (our calculation: (percentageChangeResp_tundraAlpine + 1) * baseResp_tundraAlpine; a sketch of this raster calculation follows the list)
changeResp_tundraAlpine.tif: modResp-baseResp
standError_tundraAlpine.tif: standard error of modelled respiration
standError_tundraAlpine_onlyDataUncertainty.tif: standard error of modelled respiration where only data uncertainty is taken into account
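As a hedged sketch of the stated calculation modResp = (percentageChangeResp + 1) * baseResp applied to the GeoTIFFs above (assuming single-band rasters and the rasterio library; this is not the authors' code):

```python
# Hedged sketch, not the authors' code: applies the stated raster calculation
# (percentageChangeResp + 1) * baseResp using rasterio on single-band GeoTIFFs.
import rasterio

with rasterio.open("percentageChangeResp_tundraAlpine.tif") as pct, \
     rasterio.open("baseResp_tundraAlpine.tif") as base:
    mod = (pct.read(1) + 1.0) * base.read(1)
    profile = base.profile  # reuse the base raster's georeferencing and metadata

with rasterio.open("modResp_tundraAlpine.tif", "w", **profile) as dst:
    dst.write(mod.astype(profile["dtype"]), 1)
```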
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pearson Correlation Coefficient, R (P value), among Iris-Related Parameters.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The retinal pigment epithelium (RPE) is a neural monolayer lining the back of the eye. Degeneration of the RPE leads to severe vision loss in, so far incurable, diseases such as age-related macular degeneration and some forms of retinitis pigmentosa. A promising future replacement therapy may be autologous iris epithelial cell transdifferentiation into RPE in vitro and, subsequently, transplantation. In this study we compared the gene expression profiles of the iris epithelium (IE) and the RPE.
Methods: We collected both primary RPE and IE cells from 5 freshly frozen human donor eyes, using laser dissection microscopy and excision, respectively. We performed whole-genome expression profiling using 44k Agilent human microarrays. We investigated the gene expression profiles at both the gene and functional network level, using R and the knowledge database Ingenuity.
Results: The major molecular pathways related to the RPE and IE were quite similar and yielded basic neuro-epithelial cell functions. Nonetheless, we also found major specific differences: for example, genes and molecular pathways related to the visual cycle and retinol biosynthesis are expressed significantly higher in the RPE than in the IE. Interestingly, Wnt and aryl hydrocarbon receptor (AhR) signaling pathways are expressed much higher in the IE than in the RPE, suggesting, respectively, a possible pluripotent and high detoxification state of the IE.
Conclusions: This study provides an evaluation of the similarities and differences between the expression profiles of the RPE and IE. Our data, combined with that of the literature, represent a most comprehensive perspective on transcriptional variation, which may support future research in the development of therapeutic transplantation of IE.
BEAST XML file used to infer phylogeny: beast_ucln_31July17.xml
Table S1: Details on the individuals for which we generated ddRAD data, including their museum voucher id, latitude, longitude, nominal species, and raw number of sequencing pairs generated (S1_individuals_v3.csv)
Phylogeny of revised taxonomy: otu.tre
Phylogeny of existing taxonomy: species.tre
Pseudo-reference genomes for revised species: assemblies.zip
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
IRIS Data Set and R code. (ZIP 291 KB)