Frequency table for different selected variables.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Weighted frequency distribution for selected variables.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ORCA-VFD is a multi-domain dataset for reliability modeling and remaining useful life (RUL) estimation of variable frequency drives (VFDs). The dataset integrates physics-derived sequences, processed fault-injection data, and field-informed degradation patterns to create a unified framework for lifecycle analysis and predictive maintenance research.
This Figshare record contains the full synthesized ORCA-VFD lifecycle dataset, including:
- 100,000-hour lifecycle trajectories spanning infant-mortality, useful-life, and wearout phases.
- Anomaly score sequences derived from physics-informed feature engineering.
- Core-8 reliability features computed from VFD electrical signatures.
- Processed versions of physics, fault, and field datasets, transformed into the standardized ORCA-VFD format.
- Training, validation, and test sets used for remaining useful life model development.
- Metadata files, including feature definitions and lifecycle documentation.
Raw third-party datasets (e.g., Hanke physics data, PMSM fault data) are not redistributed here and are available at their original sources as cited in the ORCA-VFD manuscript. This Figshare package includes only newly created or transformed data, compliant with open data licensing practices.
The ORCA-VFD dataset supports research in:
- predictive maintenance
- physics-informed machine learning
- VFD degradation modeling
- reliability engineering
- RUL prediction
- domain adaptation and cross-domain validation
- economic optimization of maintenance actions
The companion GitHub repository provides the full modeling code, lifecycle synthesis scripts, feature engineering tools, and sample files: https://github.com/gencaddy2/ORCA-VFD
This data set contains 18 metrics used to describe patterns in specific conductance (SC) and chloride concentrations in 93 streams located across the eastern United States. These data were quantified for an analysis described in Moore and others (in review). Each metric was quantified for each water year, and the median across all years with available data was taken to provide a single value for each site. SC and chloride were measured or estimated at high-frequency (sub-daily) time steps, ranging from 2-minute to hourly intervals depending on the site.
Moore, J., R. Fanelli, and A. Sekellick. In review. High-frequency data reveal deicing salts drive elevated conductivity and chloride along with pervasive and frequent exceedances of the EPA aquatic life criteria for chloride in urban streams. Submitted to Environmental Science and Technology.
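The per-site aggregation described above (one value per water year, then a median across years) might look like the following sketch. It assumes the USGS convention that a water year runs from 1 October to 30 September and is named for the calendar year in which it ends. The annual maximum stands in for one of the 18 metrics, which are not listed here, and the readings are invented for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def water_year(d: date) -> int:
    """Assumed USGS convention: 1 Oct to 30 Sep, named for the ending year."""
    return d.year + 1 if d.month >= 10 else d.year


def site_metric(readings, metric=max):
    """Apply `metric` within each water year, then take the median across years."""
    by_year = defaultdict(list)
    for day, value in readings:
        by_year[water_year(day)].append(value)
    return median(metric(values) for values in by_year.values())


# Hypothetical specific conductance readings for one site.
readings = [
    (date(2018, 11, 5), 410.0),   # water year 2019
    (date(2019, 2, 1), 980.0),    # water year 2019
    (date(2019, 10, 3), 350.0),   # water year 2020
    (date(2020, 1, 20), 1200.0),  # water year 2020
    (date(2020, 12, 1), 700.0),   # water year 2021
]
print(site_metric(readings))
```

Passing a different function as `metric` (for example `statistics.mean`) reuses the same two-stage aggregation for another summary.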
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Frequency distributions of the background variables (N = 542).
To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.
The LSMS team has extracted and harmonized variables from these surveys, applying harmonized definitions and ensuring consistent variable names. These variables cover demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables were collected using questions that were asked differently. The harmonized datafiles include the best available variables with harmonized definitions.
Two harmonized datafiles are prepared for each survey. The two datafiles are:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractors, and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.
National coverage
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
See “Ethiopia - Socioeconomic Survey 2018-2019” and “Ethiopia - COVID-19 High Frequency Phone Survey of Households 2020” available in the Microdata Library for details.
Computer Assisted Personal Interview [capi]
Ethiopia Socioeconomic Survey (ESS) 2018-2019 and Ethiopia COVID-19 High Frequency Phone Survey of Households (HFPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).
The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round is noted with “_rX” in the variable name, where X is the round number. For example, “_r3” in a variable name indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
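As an illustration of the naming convention, the sketch below splits a variable name into its base name and originating round, treating a missing suffix as Round 0. The variable names are hypothetical and are not taken from the harmonized datafiles.

```python
import re

# Matches a trailing "_r" followed by the round number, e.g. "_r3".
ROUND_SUFFIX = re.compile(r"_r(\d+)$")


def split_round(name: str) -> tuple[str, int]:
    """Return (base_name, round); names without "_rX" belong to Round 0."""
    match = ROUND_SUFFIX.search(name)
    if match is None:
        return name, 0
    return name[:match.start()], int(match.group(1))


# Hypothetical column names, for illustration only.
for column in ["hhsize", "food_insecure_r3", "work_r12"]:
    print(column, split_round(column))
```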
To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.
The LSMS team has extracted and harmonized variables from these surveys, applying harmonized definitions and ensuring consistent variable names. These variables cover demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables were collected using questions that were asked differently. The harmonized datafiles include the best available variables with harmonized definitions.
Two harmonized datafiles are prepared for each survey. The two datafiles are:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractors, and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.
National coverage
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
See “Malawi - Integrated Household Panel Survey 2010-2013-2016-2019 (Long-Term Panel, 102 EAs)” and “Malawi - High-Frequency Phone Survey on COVID-19” available in the Microdata Library for details.
Computer Assisted Personal Interview [capi]
Malawi Integrated Household Panel Survey (IHPS) 2019 and Malawi High-Frequency Phone Survey on COVID-19 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).
The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round is noted with “_rX” in the variable name, where X is the round number. For example, “_r3” in a variable name indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note that the table is based on studies focusing solely on behaviour or survival, but not both.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Overview:
This dataset contains simulated (hypothetical) but near-realistic, AI-generated data on the sleep, heart rate, and exercise habits of 500 individuals. It includes both pre-exercise and post-exercise resting heart rates, allowing for analyses such as a dependent t-test (paired-sample t-test) to observe changes in heart rate after an exercise program. The dataset also includes additional health-related variables, such as age, hours of sleep per night, and exercise frequency.
The data is designed for tasks involving hypothesis testing, health analytics, or even machine learning applications that predict changes in heart rate based on personal attributes and exercise behavior. It can be used to understand the relationships between exercise frequency, sleep, and changes in heart rate.
File:
- Filename: heart_rate_data.csv
- File Format: CSV
Features (Columns):
- Age. Description: The age of the individual. Type: Integer. Range: 18-60 years. Relevance: Age is an important factor in determining heart rate and the effects of exercise.
- Sleep Hours. Description: The average number of hours the individual sleeps per night. Type: Float. Range: 3.0-10.0 hours. Relevance: Sleep is a crucial health metric that can impact heart rate and exercise recovery.
- Exercise Frequency (Days/Week). Description: The number of days per week the individual engages in physical exercise. Type: Integer. Range: 1-7 days/week. Relevance: More frequent exercise may lead to greater heart rate improvements and better cardiovascular health.
- Resting Heart Rate Before. Description: The individual’s resting heart rate measured before beginning a 6-week exercise program. Type: Integer. Range: 50-100 bpm (beats per minute). Relevance: This is a key health indicator, providing a baseline measurement for the individual’s heart rate.
- Resting Heart Rate After. Description: The individual’s resting heart rate measured after completing the 6-week exercise program. Type: Integer. Range: 45-95 bpm (lower than the "Resting Heart Rate Before" due to the effects of exercise). Relevance: This variable is essential for understanding how exercise affects heart rate over time, and it can be used to perform a dependent t-test analysis.
- Max Heart Rate During Exercise. Description: The maximum heart rate the individual reached during exercise sessions. Type: Integer. Range: 120-190 bpm. Relevance: This metric helps in understanding cardiovascular strain during exercise and can be linked to exercise frequency or fitness levels.
Potential Uses:
- Dependent T-Test Analysis: The dataset is particularly suited for a dependent (paired) t-test in which you compare the resting heart rate before and after the exercise program for each individual.
- Exploratory Data Analysis (EDA): Investigate relationships between sleep, exercise frequency, and changes in heart rate. Potential analyses include correlations between sleep hours and resting heart rate improvement, or regression analyses to predict heart rate after exercise.
- Machine Learning: Use the dataset for predictive modeling, for example a beginner regression model that predicts post-exercise heart rate using age, sleep, and exercise frequency as features.
- Health and Fitness Insights: This dataset can be useful for studying how factors such as sleep and age influence heart rate changes and overall cardiovascular health.
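A minimal sketch of the dependent (paired) t-test mentioned above, written with only the Python standard library. The heart-rate values are invented for illustration and do not come from heart_rate_data.csv. The statistic is the mean of the per-person differences divided by its standard error, with n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a dependent t-test."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample sd / sqrt(n)).
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical resting heart rates (bpm) for six individuals.
before = [72, 80, 65, 90, 77, 84]
after = [68, 75, 66, 82, 74, 80]
t_stat, dof = paired_t(before, after)
print(f"t = {t_stat:.2f}, df = {dof}")
```

If SciPy is available, `scipy.stats.ttest_rel(before, after)` computes the same statistic and also returns a p-value.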
License: Choose an appropriate open license, such as:
CC BY 4.0 (Attribution 4.0 International).
Inspiration for Kaggle Users:
- How does exercise frequency influence the reduction in resting heart rate?
- Is there a relationship between sleep and heart rate improvements post-exercise?
- Can we predict the post-exercise heart rate using other health variables?
- How do age and exercise frequency interact to affect heart rate?
Acknowledgments: This is a simulated dataset for educational purposes, generated to demonstrate statistical and machine learning applications in the field of health analytics.
To facilitate the use of data collected through the high-frequency phone surveys on COVID-19, the Living Standards Measurement Study (LSMS) team has created harmonized datafiles using two household surveys: 1) the country’s latest face-to-face survey, which has become the sample frame for the phone survey, and 2) the country’s high-frequency phone survey on COVID-19.
The LSMS team has extracted and harmonized variables from these surveys, applying harmonized definitions and ensuring consistent variable names. These variables cover demography as well as housing, household consumption expenditure, food security, and agriculture. Inevitably, many of the original variables were collected using questions that were asked differently. The harmonized datafiles include the best available variables with harmonized definitions.
Two harmonized datafiles are prepared for each survey. The two datafiles are:
1. HH: This datafile contains household-level variables. The information includes basic household characteristics, housing, water and sanitation, asset ownership, consumption expenditure, consumption quintile, food security, and livestock ownership. It also contains information on agricultural activities such as crop cultivation, use of organic and inorganic fertilizer, hired labor, use of tractors, and crop sales.
2. IND: This datafile contains individual-level variables. It includes basic characteristics of individuals such as age, sex, marital status, disability status, literacy, education and work.
National coverage
The survey covered all de jure households excluding prisons, hospitals, military barracks, and school dormitories.
Sample survey data [ssd]
See “Nigeria - General Household Survey, Panel 2018-2019, Wave 4” and “Nigeria - COVID-19 National Longitudinal Phone Survey 2020” available in the Microdata Library for details.
Computer Assisted Personal Interview [capi]
Nigeria General Household Survey, Panel (GHS-Panel) 2018-2019 and Nigeria COVID-19 National Longitudinal Phone Survey (COVID-19 NLPS) 2020 data were harmonized following the harmonization guidelines (see “Harmonized Datafiles and Variables for High-Frequency Phone Surveys on COVID-19” for more details).
The high-frequency phone survey on COVID-19 has multiple rounds of data collection. When variables are extracted from multiple rounds of the survey, the originating round is noted with “_rX” in the variable name, where X is the round number. For example, “_r3” in a variable name indicates that the variable was extracted from Round 3 of the high-frequency phone survey. Round 0 refers to the country’s latest face-to-face survey, which has become the sample frame for the high-frequency phone surveys on COVID-19. Variables without “_rX” were extracted from Round 0.
Frequency and percentage distribution of the dependent variables.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The CoDEx-VFD dataset provides time-series current measurements from a three-phase Variable Frequency Drive (VFD) system subjected to controlled electromagnetic disturbances (EMD). This dataset is designed for benchmarking and comparing anomaly detection algorithms in the context of electromagnetic compatibility (EMC). The data was collected under controlled laboratory conditions, with varying levels of disturbance severity and frequency, providing a valuable resource for researchers developing and evaluating methods for EMI detection and mitigation in electronic systems. The dataset comprises 100 CSV files, each representing a single measurement run with different anomaly scenarios. Measurements include two directly measured phase currents along with a binary label indicating the presence or absence of an injected disturbance at each time point. The sampling rate is 2.5 MHz, providing high temporal resolution for capturing transient EMI events. Key experimental parameters, including disturbance characteristics and equipment details, are documented in the accompanying README file.
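To make the per-sample labels easier to work with, the sketch below groups a binary label sequence into contiguous disturbance intervals and converts sample indices to microseconds using the stated 2.5 MHz sampling rate (0.4 µs per sample). The label list is invented, and reading the labels out of the CSV files is left aside because the column names are documented in the README rather than here.

```python
SAMPLING_RATE_HZ = 2_500_000  # 2.5 MHz, as stated in the dataset description


def disturbance_intervals(labels, rate_hz=SAMPLING_RATE_HZ):
    """Return (start_us, end_us) for each contiguous run of non-zero labels."""
    us_per_sample = 1e6 / rate_hz
    intervals = []
    start = None
    for i, label in enumerate(labels):
        if label and start is None:
            start = i  # a disturbance run begins
        elif not label and start is not None:
            intervals.append((start * us_per_sample, i * us_per_sample))
            start = None
    if start is not None:
        # The run extends to the end of the recording.
        intervals.append((start * us_per_sample, len(labels) * us_per_sample))
    return intervals


# Hypothetical label sequence: 1 marks an injected disturbance.
labels = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
print(disturbance_intervals(labels))
```

Interval end times are exclusive, so each interval's duration is the number of labeled samples multiplied by the sample period.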
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This comprehensive earthquake dataset contains detailed records of seismic events in Southern California, specifically filtered to focus on a 100 km radius around Los Angeles from January 1, 2012, to September 1, 2024. The dataset was developed to support advanced machine learning and neural network algorithms for earthquake forecasting and prediction.
The dataset was compiled from the Southern California Earthquake Data Center (SCEDC) and underwent extensive preprocessing, including:
- Magnitude standardization to local magnitude (ML) scale
- Spatial filtering within 100 km radius of Los Angeles
- Feature engineering for enhanced predictive modeling
- Quality control to exclude inconsistent magnitude types
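The spatial filtering step could be sketched as below. This is an illustration, not the authors' code: it assumes great-circle (haversine) distance, a mean Earth radius of 6371 km, and downtown Los Angeles as the reference point, none of which are specified in the description. The event records are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0            # mean Earth radius (assumption)
LOS_ANGELES = (34.0522, -118.2437)  # assumed reference point (downtown LA)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(events, center=LOS_ANGELES, radius_km=100.0):
    """Keep events whose epicenter lies within `radius_km` of `center`."""
    return [
        e for e in events
        if haversine_km(center[0], center[1], e["lat"], e["lon"]) <= radius_km
    ]


# Hypothetical catalog rows, for illustration only.
events = [
    {"id": "a", "lat": 34.2, "lon": -118.5, "mag": 2.1},  # near Los Angeles
    {"id": "b", "lat": 32.7, "lon": -117.2, "mag": 3.4},  # near San Diego
]
print([e["id"] for e in within_radius(events)])
```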
The dataset includes multiple engineered features designed to enhance predictive modeling capabilities:
Six-category classification of earthquake magnitude classes
DOI: Yavas, C. E., Chen, L., Kadlec, C., & Ji, Y. (2024). Los Angeles, California, Earthquake Dataset with Feature-Engineered Variables (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.13738726
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
a: P value for a two-sided χ² test. b: +/− indicates presence or absence of HBV infection.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Written word frequency is a key variable used in many psycholinguistic studies and is central in explaining visual word recognition. Indeed, methodological advances on single word frequency estimates have helped to uncover novel language-related cognitive processes, fostering new ideas and studies. In an attempt to support and promote research on a related emerging topic, visual multi-word recognition, we extracted from the exhaustive Google Ngram datasets a selection of millions of multi-word sequences and computed their associated frequency estimate. Such sequences are presented with Part-of-Speech information for each individual word. An online behavioral investigation making use of the French 4-gram lexicon in a grammatical decision task was carried out. The results show an item-level frequency effect of word sequences. Moreover, the proposed datasets were found useful during the stimulus selection phase, allowing more precise control of the multi-word characteristics.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The Climatic Research Unit (CRU) Country (CY) data version 4.08 dataset consists of country averages of ten climate variables at monthly, seasonal and annual frequency: cloud cover, diurnal temperature range, frost day frequency, precipitation, daily mean temperature, monthly average daily maximum and minimum temperature, vapour pressure, potential evapotranspiration and wet day frequency. This version uses the updated set of country definitions; please see the appropriate Release Notes.
This dataset was produced in 2024 by CRU at the University of East Anglia and extends the CRU CY4.07 data to include 2023. The data are available as text files with the extension '.per' and can be opened by most text editors.
Spatial averages are calculated using area-weighted means. CRU CY4.08 is derived directly from the CRU time series (TS) 4.07 dataset. CRU CY version 4.08 spans the period 1901-2023 for 292 countries.
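As a rough illustration of an area-weighted mean, the sketch below weights each grid cell by the cosine of its latitude, which is proportional to cell area on a regular latitude-longitude grid. This weighting is an assumption made for the example. The CRU procedure may differ in detail, for instance in how cells straddling a border are handled, and the cell values are invented.

```python
from math import cos, radians


def area_weighted_mean(cells):
    """cells: list of (latitude_deg, value) for the grid cells in one country."""
    # Assumption: cell area is proportional to cos(latitude) on a regular grid.
    weights = [cos(radians(lat)) for lat, _ in cells]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, cells))
    return weighted_sum / sum(weights)


# Hypothetical cell-centre latitudes and monthly mean temperatures (deg C).
cells = [(0.0, 10.0), (60.0, 20.0)]
print(area_weighted_mean(cells))
```

The cell at 60° latitude covers about half the area of the equatorial cell, so it contributes about half the weight and the result sits closer to 10 than a simple average would.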
To understand the CRU CY4.08 dataset, it is important to understand the construction and limitations of the underlying dataset, CRU TS4.07. It is therefore recommended that all users read the Harris et al, 2020 paper and the CRU TS4.08 release notes listed in the online documentation on this record.
CRU CY data are available for download to all CEDA users.
The 2.4 GHz ISM band is shared by Wi-Fi, Bluetooth, Wireless HART, ISA100.11a, and several other industrial wireless systems. Our dataset contains comprehensive electromagnetic interference (EMI) measurements from machinery taken in various industrial environments. The measurements were taken at two frequencies: 900 MHz and 2.4 GHz. This dataset may be useful for understanding EMI emitters in factories and can be instrumental in developing interference mitigation strategies, aiding in RF band selection and enterprise frequency planning, improving wireless technology, and informing communications standardization activities such as the IEEE 3388 industrial wireless performance evaluation standard.
The interference measurements were taken in the following types of industrial environments:
1) Infrared Curing Machine: Curing process using infrared radiation producing EMI across the 2.4 GHz band.
2) Crane with an Unshielded VFD: Overhead gantry crane operating at 900 MHz with an unshielded variable frequency drive (VFD) causing broadband interference.
3) Microwave Dryer: Two independent sets of measurements of microwave oven baking machines used for a ceramic drying process. Multiple magnetrons are used, with a power output of 1100 watts each.
4) Unidentified Interference: General recording of the 2400 MHz band capturing both wireless network traffic and an unidentified broadband RFI emitter possibly caused by an unshielded VFD.
NIST Disclaimer: Certain commercial equipment, instruments, or materials are identified in this publication in order to describe the experimental procedures and data adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides daily gridded data of sea ice edge and sea ice type derived from brightness temperatures measured by satellite passive microwave radiometers. Sea ice is an important component of our climate system and a sensitive indicator of climate change. Its presence or its retreat has a strong impact on air-sea interactions, the Earth’s energy budget as well as marine ecosystems. It is recognized by the Global Climate Observing System as an Essential Climate Variable. Sea ice edge and type are some of the parameters used to characterise sea ice. Other parameters include sea ice concentration and sea ice thickness, also available in the Climate Data Store. Sea ice edge and type are defined as follows:
- Sea ice edge classifies the sea surface into open water, open ice, and closed ice depending on the amount of sea ice present in each grid cell. This variable is provided for both the Northern and Southern Hemispheres. Note that a sea ice concentration threshold of 30% is used to distinguish between open water and open ice, which differs from the 15% threshold commonly used for other sea ice products such as sea ice extent.
- Sea ice type classifies ice-covered areas into two categories based on the age of the sea ice: multiyear ice versus seasonal first-year ice. This variable is currently only available for the Northern Hemisphere and limited to the extended boreal winter months (October through April). Sea ice type classification during summer is difficult due to the effect of melting at the ice surface, which disturbs the passive microwave signature.
Both sea ice products are based on measurements from the series of Scanning Multichannel Microwave Radiometer (SMMR), Special Sensor Microwave/Imager (SSM/I), and Special Sensor Microwave Imager/Sounder (SSMIS) sensors and share the same algorithm baseline. However, sea ice edge makes use of two lower frequencies near 19 GHz and 37 GHz and a higher frequency near 90 GHz whereas sea ice type only uses the two lower frequencies. This dataset combines Climate Data Records (CDRs), which are intended to have sufficient length, consistency, and continuity to assess climate variability and change, and Interim Climate Data Records (ICDRs), which provide regular temporal extensions to the CDRs and where consistency with the CDRs is expected but not extensively checked. For this dataset, both the CDR and ICDR parts of each product were generated using the same software and algorithms. The CDRs of sea ice edge and type currently extend from 25 October 1978 to 31 December 2020 whereas the corresponding ICDRs extend from January 2021 to present (with a 16-day latency behind real time). All data from the current release of the datasets (version 3.0) are Level-4 products, in which data gaps are filled by temporal and spatial interpolation. For product limitations and known issues, please consult the Product User Guide. This dataset is produced on behalf of Copernicus Climate Change Service (C3S), with heritage from the operational products generated by EUMETSAT Ocean and Sea Ice Satellite Application Facility (OSI SAF).
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The CRU CY3.21 dataset consists of country averages at a monthly, seasonal and annual frequency, for ten climate variables in 289 countries for the period Jan. 1901 to Dec. 2012. It was produced in 2013 by the Climatic Research Unit (CRU) at the University of East Anglia. Spatial averages are calculated using area-weighted means. Variables include cloud cover (cld), diurnal temperature range (dtr), frost day frequency (frs), precipitation (pre), daily mean temperature (tmp), monthly average daily maximum (tmx) and minimum (tmn) temperature, vapour pressure (vap), Potential Evapo-transpiration (pet) and wet day frequency (wet).
CRU CY3.21 is derived directly from the CRU TS3.21 dataset. Version numbering is matched between the two datasets. The data are available as text files with the extension '.per' and can be opened by most text editors.
To understand the CRU-CY3.21 dataset, it is important to understand the construction and limitations of the underlying dataset, CRU TS3.21. It is therefore recommended that all users read the paper referenced below (Harris et al, 2014).
CRU CY data are available for download to all CEDA users.
A high-frequency record of chlorophyll fluorescence measurements collected during both under-ice and open-water periods using sonde technology. The dataset includes values adjusted using laboratory measurements.
Erken Laboratory (2025). Lake variables - Chlorophyll from Erken, 2019-04-17–2024-11-13 [Data set]. Swedish Infrastructure for Ecosystem Science (SITES). https://hdl.handle.net/11676.1/w1di9Z4rHVXirQR7FxM02TRB