13 datasets found
  1. Data from: Constraints on trait combinations explain climatic drivers of...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Apr 27, 2018
    Cite
    John M. Dwyer; Daniel C. Laughlin (2018). Constraints on trait combinations explain climatic drivers of biodiversity: the importance of trait covariance in community assembly [Dataset]. http://doi.org/10.5061/dryad.76kt8
    Explore at:
    zip (available download format)
    Dataset updated
    Apr 27, 2018
    Dataset provided by
    Dryad
    Authors
    John M. Dwyer; Daniel C. Laughlin
    Time period covered
    Apr 27, 2017
    Description

    quadrat.scale.data – Refer to the R script ("Dwyer_&_Laughlin_2017_Trait_covariance_script.r") for information about this dataframe.
    species.in.quadrat.scale.data – Refer to the R script ("Dwyer_&_Laughlin_2017_Trait_covariance_script.r") for information about this dataframe.
    Dwyer_&_Laughlin_2017_Trait_covariance_script – This script reads in the two dataframes of "raw" data, calculates diversity and trait metrics, and runs the major analyses presented in Dwyer & Laughlin 2017.

  2. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    zip (23875170 bytes; available download format)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. The dataset contains a retailer's transaction data, covering all transactions that happened over a period of time. The retailer will use the results to grow the business and to offer customers itemset suggestions, which should increase customer engagement, improve customer experience, and help identify customer behaviour. I solve this problem with association rules, an unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association rules are most useful when you want to discover associations between different objects in a set, in particular frequent patterns in a transaction database. They tell you which items customers frequently buy together and allow the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule "bought a mouse mat => bought a computer mouse": - support = P(mouse & mat) = 8/100 = 0.08 - confidence = support / P(mouse mat) = 0.08/0.09 ≈ 0.89 - lift = confidence / P(computer mouse) = 0.89/0.10 ≈ 8.9. This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
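    A minimal base-R sketch of these toy calculations (the counts are the hypothetical ones from the example above):

    n_customers <- 100   # hypothetical customer count from the example
    n_mouse     <- 10    # bought a computer mouse
    n_mat       <- 9     # bought a mouse mat
    n_both      <- 8     # bought both
    support    <- n_both / n_customers                  # 0.08
    confidence <- support / (n_mat / n_customers)       # ~0.89 for the rule mat => mouse
    lift       <- confidence / (n_mouse / n_customers)  # ~8.9
    round(c(support = support, confidence = confidence, lift = lift), 2)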

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    (Screenshot: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png)

    Libraries in R

    First, we need to load the required libraries. Each library is briefly described below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; the meta-package makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    (Screenshot: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png)

    Data Pre-processing

    Next, we need to load Assignment-1_Data.xlsx into R to read the dataset. Now we can see our data in R.

    (Screenshots: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png and https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png)

    After that we clean the data frame and remove missing values.

    (Screenshot: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png)

    To apply association rule mining, we need to convert the dataframe into transaction data, so that all items bought together in one invoice will be in ...
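    A minimal sketch of the workflow described above, assuming the BillNo and Itemname columns from the dataset description (the exact cleaning and parameter choices may differ):

    library(readxl)
    library(plyr)
    library(arules)
    library(arulesViz)

    retaildata <- read_excel("Assignment-1_Data.xlsx")
    retaildata <- retaildata[complete.cases(retaildata), ]   # drop rows with missing values

    # Collapse every invoice (BillNo) into one comma-separated basket of item names
    baskets <- ddply(retaildata, "BillNo",
                     function(df) paste(df$Itemname, collapse = ","))

    # Write the baskets out and read them back as an arules 'transactions' object
    write.table(baskets$V1, "baskets.csv", quote = FALSE, row.names = FALSE, col.names = FALSE)
    trans <- read.transactions("baskets.csv", format = "basket", sep = ",")

    # Mine association rules with the Apriori algorithm and inspect the strongest ones
    rules <- apriori(trans, parameter = list(supp = 0.001, conf = 0.8))
    inspect(head(sort(rules, by = "lift"), 10))
    plot(head(sort(rules, by = "lift"), 10), method = "graph")   # arulesViz visualisation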

  3. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection...

    • dataverse.harvard.edu
    • dataone.org
    Updated Jul 6, 2017
    + more versions
    Cite
    Cory A. Rieth; Ben D. Amsel; Randy Tran; Maia B. Cook (2017). Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation [Dataset]. http://doi.org/10.7910/DVN/6C3JR1
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 6, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Cory A. Rieth; Ben D. Amsel; Randy Tran; Maia B. Cook
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1

    Description

    User Agreement, Public Domain Dedication, and Disclaimer of Liability. By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms. The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission. In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights. Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law. When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work. This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website.

    Description

    This dataverse contains the data referenced in Rieth et al. (2017), "Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems," to be presented at Applied Human Factors and Ergonomics 2017. Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named 'fault_free_training', 'fault_free_testing', 'faulty_testing', and 'faulty_training', corresponding to the RData files. Each dataframe contains 55 columns:

    • Column 1 ('faultNumber') ranges from 1 to 20 in the "Faulty" datasets and represents the fault type in the TEP. The "FaultFree" datasets only contain fault 0 (i.e. normal operating conditions).
    • Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping).
    • Column 3 ('sample') ranges either from 1 to 500 ("Training" datasets) or 1 to 960 ("Testing" datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours and 48 hours respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively.
    • Columns 4 to 55 contain the process variables; the column names retain the original variable names.

    Acknowledgments. This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
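    A minimal R sketch of loading one of these dataframes; the archive file name below is illustrative, but the variable and column names are those given in the description:

    load("TEP_FaultFree_Training.RData")   # assumed file name; creates 'fault_free_training'
    str(fault_free_training[, 1:5])        # faultNumber, simulationRun, sample, first process variables
    run1 <- subset(fault_free_training, simulationRun == 1)   # one run of normal operation (fault 0)
    dim(run1)                              # 500 samples x 55 columns for a training run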

  4. Supplement 1. Code for simulations and statistical models.

    • datasetcatalog.nlm.nih.gov
    • wiley.figshare.com
    Updated Aug 10, 2016
    Cite
    Menge, Duncan N. L.; Ángeles-Pérez, Gregorio; Lichstein, Jeremy W. (2016). Supplement 1. Code for simulations and statistical models. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001526256
    Explore at:
    Dataset updated
    Aug 10, 2016
    Authors
    Menge, Duncan N. L.; Ángeles-Pérez, Gregorio; Lichstein, Jeremy W.
    Description

    File List
    • Menge_Latitudinal_Abundance_Model_Code.R (MD5: 7917640d0c7cf0c449b97517fa29133d)
    • Menge_SuccessionDynamicsModel_Script.m (MD5: e3bb898eef5b2e1d104cc90629d37120)
    • Menge_SuccessionDynamicsModel_Pars.m (MD5: 4f3456b66d123b4a957d5a21e74951d8)
    • Menge_SuccessionDynamicsModel_odes_ob_non.m (MD5: c32d1000d5a3c7047a5ed86d479e97ab)
    • Menge_SuccessionDynamicsModel_odes_fac_non.m (MD5: 401ad7cdb1a9d6d6785964231f133e82)
    • Menge_SuccessionDynamicsModel_Figures.m (MD5: e1e7346752dd3e7b8b6e9a101066a5a5)
    • Menge_SuccessionDynamicsModel_FigB8_Script.m (MD5: 9a482ad728fb63538bde2965c79a6041)
    • Menge_SuccessionDynamicsModel_Pars_3.m (MD5: 1ce6191d5cfc70061fd037ae6df80905)
    • Menge_SuccessionDynamicsModel_odes_fac_ob_non.m (MD5: 7c1ba58eb3e7563bad78d2301d2a1da0)
    • Menge_SuccessionDynamicsModel_swa.m (MD5: 2f0631669f3411753e287a264bd1ebd7)
    • Menge_SuccessionDynamicsModel_FigB8_Figure.m (MD5: 9e4a1d2b02675b6c52f64e1e39f76bd5)

    Description

    This supplement contains code for the simulations and statistical analyses: 1 .R (R) file for the "Latitudinal Abundance Model" and 10 .m (MATLAB) files for the "Succession Dynamics Model."

    The .R file, Menge_Latitudinal_Abundance_Model_Code.R, contains the statistical model code. It creates a dataframe with latitude and the 1-degree-latitude mean percent basal area occupied by N fixers (the data used in model fitting). It then creates variables for the abundance of each type in each habitat for a given age distribution; these data were output from the Successional Dynamics Model and weighted by different age distributions. It then creates some functions needed for the model fitting exercise (as described in the text), and uses nls to fit the model to the data. Finally, it plots the results. As currently set up, it creates Fig. 4 in the paper. To create Figs. B4–B7, some of the variables must be changed and the code rerun, as indicated in the code comments.

    The Successional Dynamics Model code is contained in the .m files. There are 5 .m files associated with running the simulation for Fig. 3. Menge_SuccessionDynamicsModel_Script.m is the main script. It calls Menge_SuccessionDynamicsModel_Pars.m to set the parameters, sets up the initial conditions for each simulation, runs the simulation(s), saves the data, and calls the script that makes the figure. Options in the file (used to create the different panels) are changing SevModNon and whichrun (as indicated in the code). The files Menge_SuccessionDynamicsModel_odes_ob_non.m and Menge_SuccessionDynamicsModel_odes_fac_non.m are the functions that describe the mathematical equations of the model, and are called in the main script with the function ode45. The file Menge_SuccessionDynamicsModel_Figures.m sets up the figure, loads the data for each panel (which must be made first from the main script), then fills out each panel.

    There are 5 .m files associated with running the simulation for Appendix B Fig. B8. The file Menge_SuccessionDynamicsModel_FigB8_Script.m is the main script. It initializes the parameter values for all three types (in the file Menge_SuccessionDynamicsModel_Pars_3.m), loops over habitat types and cost values (expressed as "psi," which is related to gamma in the paper by gamma = psi * c, with c = 120 per year), then numerically integrates the model Menge_SuccessionDynamicsModel_odes_fac_ob_non.m with the MATLAB function ode45, then weights the successional abundances by the FIA age distribution using the function Menge_SuccessionDynamicsModel_swa.m (it also does so for the alternate age distributions, but these are not used in the figure), saves the data, then calls the figure script. The file Menge_SuccessionDynamicsModel_FigB8_Figure.m sets up the figure, loads the data, and fills out the panels. Editing of axis and panel labels directly on the PDF was done in Adobe Illustrator. Note that this run takes a long time.

  5. Replication Data for: The Wikipedia Adventure: Field Evaluation of an...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jun 7, 2017
    Cite
    Sneha Narayan; Jake Orlowitz; Aaron D. Shaw; Benjamin Mako Hill (2017). Replication Data for: The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for New Users [Dataset]. http://doi.org/10.7910/DVN/6HPRIG
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 7, 2017
    Dataset provided by
    Harvard Dataverse
    Authors
    Sneha Narayan; Jake Orlowitz; Aaron D. Shaw; Benjamin Mako Hill
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6HPRIG

    Dataset funded by
    National Science Foundation (NSF)
    Description

    This dataset contains the data and code necessary to replicate work in the following paper: Narayan, Sneha, Jake Orlowitz, Jonathan Morgan, Benjamin Mako Hill, and Aaron Shaw. 2017. “The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for New Users.” In Proceedings of the 20th ACM Conference on Computer-Supported Cooperative Work & Social Computing (CSCW '17). New York, New York: ACM Press. http://dx.doi.org/10.1145/2998181.2998307

    The published paper contains two studies. Study 1 is a descriptive analysis of a survey of Wikipedia editors who played a gamified tutorial. Study 2 is a field experiment that evaluated the same tutorial. These data are the data used in the field experiment described in Study 2.

    Description of Files

    This dataset contains the following files beyond this README:
    • twa.RData — An RData file that includes all variables used in Study 2.
    • twa_analysis.R — A GNU R script that includes all the code used to generate the tables and plots related to Study 2 in the paper.

    The RData file contains one variable (d), which is an R dataframe (i.e., table) that includes the following columns:
    • userid (integer): The unique numerical ID representing each user in our sample. These are 8-digit integers and describe public accounts on Wikipedia.
    • sample.date (date string): The day the user was recruited to the study. Dates are formatted in “YYYY-MM-DD” format. In the case of invitees, it is the date their invitation was sent. For users in the control group, this is the date that they would have been invited to the study.
    • edits.all (integer): The total number of edits made by the user on Wikipedia in the 180 days after they joined the study. Edits to users' user pages, user talk pages and subpages are ignored.
    • edits.ns0 (integer): The total number of edits made by the user to article pages on Wikipedia in the 180 days after they joined the study.
    • edits.talk (integer): The total number of edits made by the user to talk pages on Wikipedia in the 180 days after they joined the study. Edits to a user's user page, user talk page and subpages are ignored.
    • treat (logical): TRUE if the user was invited, FALSE if the user was in the control group.
    • play (logical): TRUE if the user played the game, FALSE if the user did not. All users in control are listed as FALSE because any user who had not been invited to the game but played was removed.
    • twa.level (integer): Takes a value of 0 if the user has not played the game. Ranges from 1 to 7 for those who did, indicating the highest level they reached in the game.
    • quality.score (float): The average word persistence (over a 6-revision window) over all edits made by this userid. Our measure of word persistence (persistent word revision per word) is a measure of edit quality developed by Halfaker et al. that tracks how long words in an edit persist after subsequent revisions are made to the wiki-page. For more information on how word persistence is calculated, see the following paper: Halfaker, Aaron, Aniket Kittur, Robert Kraut, and John Riedl. 2009. “A Jury of Your Peers: Quality, Experience and Ownership in Wikipedia.” In Proceedings of the 5th International Symposium on Wikis and Open Collaboration (OpenSym '09), 1–10. New York, New York: ACM Press. doi:10.1145/1641309.1641332. Or this page: https://meta.wikimedia.org/wiki/Research:Content_persistence

    How we created twa.RData

    The file twa.RData combines datasets drawn from three places:
    • A dataset created by Wikimedia Foundation staff that tracked the details of the experiment and how far people got in the game. The variables userid, sample.date, treat, play, and twa.level were all generated in a dataset created by WMF staff when The Wikipedia Adventure was deployed. All users in the sample created their accounts within 2 days before the date they were entered into the study. None of them had received a Teahouse invitation, a Level 4 user warning, or been blocked from editing at the time that they entered the study. Additionally, all users made at least one edit after the day they were invited. Users were sorted randomly into treatment and control groups, based on which they either received or did not receive an invite to play The Wikipedia Adventure.
    • Edit and text persistence data drawn from public XML dumps created on May 21st, 2015. We used publicly available XML dumps to generate the outcome variables, namely edits.all, edits.ns0, edits.talk and quality.score. We first extracted all edits made by users in our sample during the six month period since they joined the study, excluding edits made to user pages or user talk pages. We parsed the XML dumps using the Python-based wikiq and MediaWikiUtilities software, available online at http://projects.mako.cc/source/?p=mediawiki_dump_tools and https://github.com/mediawiki-utilities/python-mediawiki-utilities. We obtained the XML dumps from: https://dumps.wikimedia.org/enwiki/
    • A list of edits made by users in our study that were subsequently deleted, created on...
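    A minimal R sketch, assuming twa.RData is in the working directory; the dataframe name and columns are those described above:

    load("twa.RData")          # loads the dataframe 'd' described above
    str(d)
    # Compare 180-day edit counts between invited (treat == TRUE) and control users
    aggregate(edits.all ~ treat, data = d, FUN = mean)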

  6. Study Hours vs Grades Dataset

    • kaggle.com
    zip
    Updated Oct 12, 2025
    Cite
    Andrey Silva (2025). Study Hours vs Grades Dataset [Dataset]. https://www.kaggle.com/datasets/andreylss/study-hours-vs-grades-dataset
    Explore at:
    zip (33964 bytes; available download format)
    Dataset updated
    Oct 12, 2025
    Authors
    Andrey Silva
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This synthetic dataset contains 5,000 student records exploring the relationship between study hours and academic performance.

    Dataset Features

    • student_id: Unique identifier for each student (1-5000)
    • study_hours: Hours spent studying (0-12 hours, continuous)
    • grade: Final exam score (0-100 points, continuous)

    Potential Use Cases

    • Linear regression modeling and practice
    • Data visualization exercises
    • Statistical analysis tutorials
    • Machine learning for beginners
    • Educational research simulations

    Data Quality

    • No missing values
    • Normally distributed residuals
    • Realistic educational scenario
    • Ready for immediate analysis

    Data Generation Code

    This dataset was generated using R.

    R Code

    # Set seed for reproducibility
    set.seed(42)
    
    # Define number of observations (students)
    n <- 5000
    
    # Generate study hours (independent variable)
    # Uniform distribution between 0 and 12 hours
    study_hours <- runif(n, min = 0, max = 12)
    
    # Create relationship between study hours and grade
    # Base grade: 40 points
    # Each study hour adds an average of 5 points
    # Add normal noise (standard deviation = 10)
    theoretical_grade <- 40 + 5 * study_hours
    
    # Add normal noise to make it realistic
    noise <- rnorm(n, mean = 0, sd = 10)
    
    # Calculate final grade
    grade <- theoretical_grade + noise
    
    # Limit grades between 0 and 100
    grade <- pmin(pmax(grade, 0), 100)
    
    # Create the dataframe
    dataset <- data.frame(
     student_id = 1:n,
     study_hours = round(study_hours, 2),
     grade = round(grade, 2)
    )
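
    A brief usage sketch (not part of the generation code above) that recovers the simulated relationship from the dataframe:

    # Fit a simple linear model; the intercept should be near 40 and the slope near 5
    fit <- lm(grade ~ study_hours, data = dataset)
    summary(fit)
    plot(dataset$study_hours, dataset$grade,
         xlab = "Study hours", ylab = "Grade", pch = 20, col = "grey50")
    abline(fit, col = "red", lwd = 2)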
    
  7. CO2_data

    • kaggle.com
    zip
    Updated Oct 29, 2022
    Cite
    Saheli Basu Roy (2022). CO2_data [Dataset]. https://www.kaggle.com/datasets/sahelibasuroy/co2data
    Explore at:
    zip (681 bytes; available download format)
    Dataset updated
    Oct 29, 2022
    Authors
    Saheli Basu Roy
    Description

    The CO2 dataframe is a dataset built into R showing the results of an experiment on the cold tolerance of a grass species. Grass samples from two regions were grown in either a chilled or nonchilled environment, and their CO2 uptake rate was tested. The dataset has been downloaded as a .csv file.

    The two regions chosen for this experiment are Quebec and Mississippi. Each region has three different plants used in this experiment, and each plant has been treated in either a chilled or a nonchilled environment. The average ambient concentration is found to be constant across all categories, so plots are made with respect to variation in average CO2 uptake.
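    A minimal sketch using the CO2 dataframe that ships with base R (per the description above, the Kaggle file is the same data exported to CSV):

    data(CO2)                         # built-in dataframe from the 'datasets' package
    str(CO2)                          # Plant, Type (Quebec/Mississippi), Treatment (nonchilled/chilled), conc, uptake
    aggregate(uptake ~ Type + Treatment, data = CO2, FUN = mean)   # mean uptake by region and treatment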

  8. Data for the Farewell and Herberg example of a two-phase experiment using a...

    • adelaide.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1more
    application/gzip
    Updated Jun 12, 2021
    Cite
    Chris Brien (2021). Data for the Farewell and Herberg example of a two-phase experiment using a plaid design [Dataset]. http://doi.org/10.25909/13122095
    Explore at:
    application/gzip (available download format)
    Dataset updated
    Jun 12, 2021
    Dataset provided by
    The University of Adelaide
    Authors
    Chris Brien
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The experiment that Farewell and Herzberg (2003) describe is a pain-rating experiment that is a subset of the experiment reported by Solomon et al. (1997). It is a two-phase experiment. The first phase is a self-assessment phase in which patients self-assess for pain while moving a painful shoulder joint. The second phase is an evaluation phase in which occupational and physical therapy students (the raters) are evaluated for rating patients, shown in a set of videos, for pain. The measured response is the difference between a student rating and the patient's rating.

    The R data file plaid.dat.rda contains the data.frame plaid.dat, which holds a revised version of the data for the Farewell and Herzberg example downloaded from https://doi.org/10.17863/CAM.54494. The comma-delimited text file plaid.dat.csv has the same information in this more commonly accepted format, but without the metadata associated with the data.frame.

    The data.frame contains the factors Raters, Viewings, Trainings, Expressiveness, Patients, Occasions, and Motions and a column for the response variable Y. The two factors Viewings and Occasions are additional to those in the downloaded file; the remaining factors have been converted from integers or characters to factors and renamed to the names given above. The column Y is unchanged from the column in the original file.

    To load the data in R, use load("plaid.dat.rda"), which makes the data.frame plaid.dat available.
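    A minimal sketch, assuming the files are in the R working directory:

    load("plaid.dat.rda")             # creates the data.frame 'plaid.dat'
    str(plaid.dat)                    # factors Raters, Viewings, Trainings, ..., response Y
    # Alternatively, read the plain-text copy (without the data.frame metadata):
    plaid.csv <- read.csv("plaid.dat.csv")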

  9. Alpha-Chaconine and Alpha-Solanine Occurrence in

    • kaggle.com
    zip
    Updated Feb 20, 2023
    Cite
    The Devastator (2023). Alpha-Chaconine and Alpha-Solanine Occurrence in [Dataset]. https://www.kaggle.com/datasets/thedevastator/alpha-chaconine-and-alpha-solanine-occurrence-in
    Explore at:
    zip (57243 bytes; available download format)
    Dataset updated
    Feb 20, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Alpha-Chaconine and Alpha-Solanine Occurrence in Food and Feed

    Dietary Exposure Assessment

    By [source]


    How to use the dataset

    The dataset includes various data points such as the country of origin, production method, sampling point, lot size, sample number, analysis method and outcome.

    In order to use this dataset effectively it is important to understand how to access and interpret the data. This guide will explain how to access the datasets using various tools such as Python or R programming languages. Additionally it will give an overview of what information can be gleaned from examining the datasets.

    Accessing Datasets

    The Alpha Chaconine/Solanine Occurrence in Food & Feed: Results from EFSA Continuous Call for Data dataset can be accessed through Kaggle (https://www.kaggle.com/efsa2/alpha-chaconinesolanine-occurrence). Once you open Kaggle, you can download the raw csv file by clicking the ‘Download’ button in the top left corner, or simply drag the files into a Python (or R) environment and read them into a Pandas dataframe (pd.read_csv).
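    A minimal R sketch of the same step (the file name below is a placeholder for the actual name of the downloaded file):

    occurrence <- read.csv("alpha_chaconine_solanine_occurrence.csv", stringsAsFactors = FALSE)
    str(occurrence)   # country of origin, production method, sampling point, analysis method, outcome, ...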

    Interpreting Datasets

    The datasets include 24 columns that provide relevant information about each result included in this survey: …

  10. Data used in "A summer heatwave reduced activity, heart rate and autumn body...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Mar 31, 2023
    Cite
    Evans, Alina L.; Albon, Steve; Król, Elżbieta; Trondrud, L. Monica; Kumpula, Jouko; Speakman, John; Loe, Leif Egil; Pigeon, Gabriel; Ropstad, Erik (2023). Data used in "A summer heatwave reduced activity, heart rate and autumn body mass in a cold-adapted ungulate" [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001093609
    Explore at:
    Dataset updated
    Mar 31, 2023
    Authors
    Evans, Alina L.; Albon, Steve; Król, Elżbieta; Trondrud, L. Monica; Kumpula, Jouko; Speakman, John; Loe, Leif Egil; Pigeon, Gabriel; Ropstad, Erik
    Description

    Overview This dataset contains biologging data and R script used to produce the results in "A summer heatwave reduced activity, heart rate and autumn body mass in a cold-adapted ungulate", a submitted manuscript. The longitudinal data of female reindeer and calf body masses used in the paper is owned by the Finnish Reindeer Herders’ Association. Natural Resources Institute Finland (Luke) updates, saves and administrates this long-term reindeer herd data. Methods of data collection Animals and study area The study involved biologging (see below) 14 adult semi-domesticated reindeer females (Focal animals: Table S1) at the Kutuharju Reindeer Research Facility (Kaamanen, Northern Finland, 69° 8’ N, 26° 59’ E, Figure S1), during June–September 2018. Ten of these individuals had been intensively handled in June as part of another study (Trondrud, 2021). The 14 females were part of a herd of ~100 animals, belonging to the Reindeer Herders’ Association. The herding management included keeping reindeer in two large enclosures (~13.8 and ~15 km2) after calving until the rut, after which animals were moved to a winter enclosure (~15 km2) and then in spring to a calving paddock (~0.3 km2) to give birth (See Supporting Information for further details on the study area). Kutuharju reindeer graze freely on natural pastures from May to November and after that are provided with silage and pellets as a supplementary feed in winter. During the period from September to April animals are weighed 5–6 times. In September, body masses of the focal females did not differ from the rest of the herd. Heart rate (HR) and subcutaneous body temperature (Tsc) data In February 2018, the focal females were instrumented with a heart rate (HR) and temperature logger (DST centi-HRT, Star-Oddi, Gardabaer, Iceland). The surgical protocol is described in the Supporting Information. The DST centi-HRT sensors recorded HR and subcutaneous body temperature (Tsc) every 15 min. HR was automatically calculated from a 4-sec electrocardiogram (ECG) at 150 Hz measurement frequency, alongside an index for signal quality. Additional data processing is described in Supporting Information. Activity data The animals were fitted with collar-mounted tri-axial accelerometers (Vertex Plus Activity Sensor, Vectronic Aerospace GmbH, Berlin, Germany) to monitor their activity levels. These sensors recorded acceleration (g) in three directions representing back-forward, lateral, and dorsal-ventral movements at 8 Hz resolution. For each axis, partial dynamic body acceleration (PDBA) was calculated by subtracting the static acceleration using a 4 sec running average from the raw acceleration (Shepard et al., 2008). We estimated vectorial dynamic body acceleration (VeDBA) by calculating the square root of the sum of squared PDBAs (Wilson et al., 2020). We aggregated VeDBA data into 15-min sums (hereafter “sum VeDBA”) to match with HR and Tsc records. Corrections for time offsets are described in Supporting Information. Due to logger failures, only 10 of the 14 individuals had complete data from both loggers (activity and heart rate). Weather and climate data We set up a HOBO weather station (Onset Computer Corporation, Bourne, MA, USA) mounted on a 2 m tall tripod in May 2018 that measured air temperature (Ta, °C) at 15-minute intervals. The placement of the station was between the two summer paddocks. These measurements were matched to the nearest timestamps for VeDBA, HR and Tsc recordings. 
Also, we obtained weather records from the nearest public weather stations for the years 1990–2021 (Table S2). Weather station IDs and locations relative to the study area are shown in Figure S1 in the Supporting Information. The temperatures at the study site and the nearest weather station were strongly correlated (Pearson’s, r = 0.99), but temperatures were on average ~1.0°C higher at the study site (Figure S2). Statistical analyses All statistical analyses were conducted in R version 4.1.1 (The R Core Team, 2021). Mean values are presented with standard deviation (SD), and parameter estimates with standard error (SE). Environmental effects on activity states and transition probabilities We fitted hidden Markov models (HMM) to 15-min sum VeDBA using the package ‘momentuHMM’ (McClintock & Michelot, 2018). HMMs assume that the observed pattern is driven by an underlying latent state sequence (a finite Markov chain). These states can then be used as proxies to interpret the animal’s unobserved behaviour (Langrock et al., 2012). We assumed only two underlying states, thought to represent ‘inactive’ and ‘active’ (Figure S3). The ‘active’ state thus contains multiple forms of movement, e.g., foraging, walking, and running, but reindeer spend more than 50% of the time foraging in summer (Skogland, 1980). We fitted several HMMs to evaluate both external (temperature and time of day) and individual-level (calf status) effects on the probability to occupy each state (stationary state probabilities). The combination of the explanatory variables in each HMM is listed in Table S5. Ta was fitted as a continuous variable with piecewise polynomial spline with 8 knots, asserted from visual inspection of the model outputs. We included sine and cosine terms for time of day to account for cyclicity. In addition, to assess the impact of Ta on activity patterns, we fitted five temperature-day categories in interaction with time of day. These categories were based on 20% intervals of the distribution of temperature data from our local weather station, in the period 19 June to 19 August 2018, with ranges of < 10°C (cold), 10−13°C (cool), 13−16°C (intermediate) 16−20°C (warm) and ≥ 20°C (hot). We evaluated the significance of each variable on the transition probabilities from the confidence intervals of each estimate, and the goodness-of-fit of each model using Akaike information criteria (AIC) (Burnham & Anderson, 2002), retaining models within ΔAIC < 5. We extracted the most likely state occupied by an individual using the viterbi function, returning the optimal state pathway, i.e., a two-level categorical variable indicating whether the individual was most likely resting or active. We used this output to calculate daily activity budgets (% time spent active). Drivers of heart rate (HR) and subcutaneous body temperature (Tsc) We matched the activity states derived from the HMM to the HR and Tsc data. We opted to investigate the drivers of variation in HR and Tsc only within the inactive state. HR and Tsc were fitted as response variables in separate generalised additive mixed-effects models (GAMM), which included the following smooth terms: calendar day as a thin-plate regression spline, time of day (ToD, in hours, knots [k] = 10) as a cubic circular regression spline and individual as random intercept. All models were fitted using restricted maximum likelihood, a penalization value (λ) of 1.4 (Wood, 2017), and an autoregressive structure (AR1) to account for temporal autocorrelation. 
    We used the ‘gam.check’ function from the ‘mgcv’ package to select k. The sum of VeDBA in the past 15 minutes was included as a predictor in all models. All models were fitted with the same set of explanatory variables: sum VeDBA, age, body mass (BM), lactation status, Ta, as well as the interaction between lactation status and Ta.

    Description of files

    1. Data:
    • "kutuharju_weather.csv" – weather data recorded from the local weather station during the study period
    • "Inari_Ivalo_lentoasema.csv" – public weather data from weather station ID 102033, owned and managed by the Finnish Meteorological Institute
    • "activitydata.Rdata" – dataset used in analyses of activity patterns in reindeer
    • "HR_temp_data.Rdata" – dataset used in analyses of heart rate and body temperature responses in reindeer
    • "HRfigureData.Rdata" and "TempFigureData.Rdata" – data files (lists) with model outputs generated in "heartrate_bodytemp_analyses.R" and used in "figures_in_paper.R"
    • "HMM_df_withStates.Rdata" – data frame used in HMM models, including output from the viterbi function
    • "plotdf_m16.Rdata" – dataframe for plotting output from model 16
    • "plotdf_m22.Rdata" – dataframe for plotting output from model 22

    2. Scripts:
    • "activitydata_HMMs.R" – R script for data prep and hidden Markov models to analyse activity patterns in reindeer
    • "heartrate_bodytemp_analyses.R" – R script for data prep and generalised additive mixed models to analyse heart rate and body temperature responses in reindeer
    • "figures_in_paper.R" – R script for generating figures 1–3 in the manuscript

    3. HMM_model:
    • "modelList.Rdata" – list containing 2 items: a string of all 25 HMM models created, and a dataframe with model number and formula
    • "m16.Rdata" and "m22.Rdata" – direct access to the two best-fit models
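    A hedged sketch of the GAMM structure described above; the object and column names are illustrative, and the exact code is in "heartrate_bodytemp_analyses.R":

    library(mgcv)   # gamm(); loading mgcv also attaches nlme, which provides corAR1()

    # 'hr_df' and its columns (HR, day, ToD, sumVeDBA, age, BM, lact, Ta, id) are
    # placeholder names for the prepared heart-rate dataset.
    m_hr <- gamm(HR ~ s(day, bs = "tp") +               # calendar day, thin-plate spline
                      s(ToD, bs = "cc", k = 10) +       # time of day, cyclic cubic spline
                      sumVeDBA + age + BM + lact * Ta,  # activity, age, body mass, lactation x temperature
                 random = list(id = ~1),                # individual as random intercept
                 correlation = corAR1(form = ~ 1 | id), # AR1 temporal autocorrelation
                 method = "REML",
                 data = hr_df)
    summary(m_hr$gam)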

  11. Supplement 2. Function dispeRsal for predicting seed dispersal distances in...

    • wiley.figshare.com
    html
    Updated May 30, 2023
    Cite
    Riin Tamme; Lars Götzenberger; Martin Zobel; James M. Bullock; Danny A. P. Hooftman; Ants Kaasik; Meelis Pärtel (2023). Supplement 2. Function dispeRsal for predicting seed dispersal distances in R, and instructions for using dispeRsal. [Dataset]. http://doi.org/10.6084/m9.figshare.3558624.v1
    Explore at:
    html (available download format)
    Dataset updated
    May 30, 2023
    Dataset provided by
    Wiley (https://www.wiley.com/)
    Authors
    Riin Tamme; Lars Götzenberger; Martin Zobel; James M. Bullock; Danny A. P. Hooftman; Ants Kaasik; Meelis Pärtel
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    File List
    • dispeRsal.rda (MD5: 3589cb5b047c9a44e83fd624dabbacad)
    • instructions.pdf (MD5: b5114a42cb9a0e1856e3fa87a3f8db89)

    Description
    
    • dispeRsal.rda – includes the function dispeRsal to predict maximum seed dispersal distances from simple plant traits in the software R (R Development Core Team 2012), as well as a dataframe used for the predictive models, an example dataframe, a dataframe for assigning taxonomic families to orders, and modified versions of the TPLck and TPL functions from the Taxonstand (Cayuela et al. 2012) package for handling synonymies in the data provided by the user. Full details and instructions for using dispeRsal can be found in the instructions file.
    • instructions.pdf – detailed instructions for using the dispeRsal function to predict maximum seed dispersal distances with confidence intervals for the user's own data sets.

    Future updates of the tool and the underlying data can be found at http://www.botany.ut.ee/dispersal. To achieve ongoing improvements to the models, we ask the research community to contact us with further data on measured seed dispersal distances and associated plant traits.
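    A minimal sketch, assuming dispeRsal.rda has been downloaded to the R working directory:

    load("dispeRsal.rda")   # loads the dispeRsal() function plus its supporting dataframes
    ls()                    # list the objects made available by the .rda file
    # See instructions.pdf for the arguments dispeRsal() expects for user-supplied data.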
    
  12. Calculation of Ferrite Core Losses with Arbitrary Waveforms using the...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Mar 28, 2025
    Cite
    Thomas Guillod; Jenna S. Lee; Haoran Li; Shukai Wang; Minjie Chen; Charles R. Sullivan (2025). Calculation of Ferrite Core Losses with Arbitrary Waveforms using the Composite Waveform Hypothesis: Reproducibility Dataset [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7368935
    Explore at:
    Dataset updated
    Mar 28, 2025
    Dataset provided by
    Princeton University
    Dartmouth College
    Authors
    Thomas Guillod; Jenna S. Lee; Haoran Li; Shukai Wang; Minjie Chen; Charles R. Sullivan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Paper

    This package contains the datasets used in the following paper:

    Calculation of Ferrite Core Losses with Arbitrary Waveforms using the Composite Waveform Hypothesis

    Thomas Guillod, Jenna S. Lee, Haoran Li, Shukai Wang, Minjie Chen, and Charles R. Sullivan

    https://doi.org/10.1109/APEC43580.2023.10131348

    IEEE APEC 2023, Orlando, Florida, USA

    Datasets

    The EPCOS TDK N87 datasets used in this paper are part of the MagNet initiative (https://mag-net.princeton.edu). MagNet is an openly available large-scale dataset including measurements of several core materials under various operating conditions. MagNet is a joint project between Princeton University, Dartmouth College, and Plexim GmbH.

    This package includes the two datasets used in the paper:

    "N87_ambient_temperature" - Loss dataset for EPCOS TDK N87 at ambient temperature (measured on a R22.1X13.7X7.9, 2022-02-01).

    "N87_variable_temperature" - Loss dataset for EPCOS TDK N87 at different temperatures (measured on a R34.0X20.5X12.5, 2022-07-14).

    It should be noted that the datasets contain more measurements than were used in the paper:

    • The measurements where the iGCC can be evaluated without extrapolation are used in the paper.

    • The measurements where the iGCC requires an extrapolation of the loss data are not used in the paper.

    A flag in the dataset indicates in which category a measurement belongs.

    More details about the measurement setup can be found on the MagNet website (https://mag-net.princeton.edu).

    More details about the iGCC method can be found on GitHub (https://github.com/otvam/magnet_webinar_eqn_models).

    File Formats

    The datasets are available in three different formats:

    CSV (text files).

    MATLAB tables (MAT v7.3 binary files, exported with MATLAB 2021a).

    Pandas dataframes (HDF5 binary files, exported with Python 3.10.6 and Pandas 1.3.5).

    The file "dataset_metadata.csv" contains the description of the different variables.The file "test_matlab.m" is a MATLAB test file for loading the MATLAB tables.The file "test_python.py" is a Python test file for loading the Pandas dataframes.

  13. Insurance Claims Data

    • kaggle.com
    zip
    Updated Jan 30, 2022
    Cite
    Satish Varma (2022). Insurance Claims Data [Dataset]. https://www.kaggle.com/datasets/saisatish09/insuranceclaimsdata
    Explore at:
    zip (1959661 bytes; available download format)
    Dataset updated
    Jan 30, 2022
    Authors
    Satish Varma
    Description

    AutoBi (Automobile Bodily Injury Claims)

    The data contains demographic information about the claimant, attorney involvement, and the economic loss (LOSS, in thousands), among other variables. The full data contains over 70,000 closed claims based on data from thirty-two insurers.

    A data frame with 1340 observations on the following 8 variables.

    • CASENUM - Case number to identify the claim, a numeric vector
    • ATTORNEY - Whether the claimant is represented by an attorney (=1 if yes and =2 if no), a numeric vector
    • CLMSEX - Claimant's gender (=1 if male and =2 if female), a numeric vector
    • MARITAL - Claimant's marital status (=1 if married, =2 if single, =3 if widowed, and =4 if divorced/separated), a numeric vector
    • CLMINSUR - Whether or not the driver of the claimant's vehicle was uninsured (=1 if yes, =2 if no, and =3 if not applicable), a numeric vector
    • SEATBELT - Whether or not the claimant was wearing a seatbelt/child restraint (=1 if yes, =2 if no, and =3 if not applicable), a numeric vector
    • CLMAGE - Claimant's age, a numeric vector
    • LOSS - The claimant's total economic loss (in thousands), a numeric vector

    AutoClaims (Automobile Insurance Claims)

    A data frame with 6773 observations on the following 5 variables.

    • STATE
    • CLASS - Rating class of operator, based on age, gender, marital status, use of vehicle
    • GENDER
    • AGE - Age of operator
    • PAID - Amount paid to settle and close a claim

    AutoCollision (Automobile UK Collision Claims)

    8,942 collision losses from private passenger United Kingdom (UK) automobile insurance policies. The average severity is in pounds sterling adjusted for inflation.

    A data frame with 32 observations on the following 4 variables.

    • Age - Age of driver
    • Vehicle_Use - Purpose of the vehicle use
    • Severity - Average amount of claims
    • Claim_Count - Number of claims

    Additional information can be found in the document: https://cran.r-project.org/web/packages/insuranceData/index.html
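
    A minimal R sketch, assuming the CRAN package insuranceData (which ships these data frames) is installed; the Kaggle zip contains the same tables:

    library(insuranceData)                               # CRAN package with these data frames
    data(AutoBi); data(AutoClaims); data(AutoCollision)
    str(AutoBi)                                          # 1340 obs. of 8 variables
    str(AutoClaims)                                      # 6773 obs. of 5 variables
    AutoCollision                                        # 32 rows: Age, Vehicle_Use, Severity, Claim_Count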

