Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
EA-MD-QD is a collection of large monthly and quarterly datasets for the euro area (EA) and EA member countries, built for macroeconomic analysis. The EA member countries covered are: AT, BE, DE, EL, ES, FR, IE, IT, NL, PT.
The formal reference to this dataset is:
Barigozzi, M. and Lissona, C. (2024) "EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for Macroeconomic Research". Zenodo.
Please refer to it when using the data.
Each zip file contains:
- Excel files for the EA and the countries covered, each containing an unbalanced panel of raw de-seasonalized data.
- Matlab code that takes the raw data as input and performs various operations, such as choosing the frequency, filling in missing values, transforming the data to stationarity, and controlling for COVID outliers.
- A PDF file with all information about the series names, sources, and transformation codes.
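For illustration, here is a minimal Python sketch of how such transformation codes are typically applied to reach stationarity. The code values and their meanings below are assumptions made for the example (the dataset's actual codes are documented in the accompanying PDF), and the provided Matlab code remains the reference implementation.

```python
import numpy as np
import pandas as pd

def to_stationary(series: pd.Series, tcode: int) -> pd.Series:
    """Apply a transformation code to one raw series.

    Hypothetical coding for this sketch (see the dataset's PDF for the
    real scheme): 1 = level, 2 = first difference, 4 = logarithm,
    5 = first difference of logarithms (approximate growth rate).
    """
    if tcode == 1:
        return series
    if tcode == 2:
        return series.diff()
    if tcode == 4:
        return np.log(series)
    if tcode == 5:
        return np.log(series).diff()
    raise ValueError(f"unknown transformation code: {tcode}")
```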
This version (03.2025):
Updated data as of 28 March 2025. We improved the Matlab code and added a ReadMe file detailing the user-set parameters, which were previously only briefly commented in the code.
Abstract
The dataset provided here contains the efforts of independent data aggregation, quality control, and visualization of the University of Arizona (UofA) COVID-19 testing programs during the 2019 novel Coronavirus pandemic. The dataset is provided in the form of machine-readable tables in comma-separated value (.csv) and Microsoft Excel (.xlsx) formats.

Additional Information
As part of the UofA response to the 2019-20 Coronavirus pandemic, testing was conducted on students, staff, and faculty prior to the start of the academic year and throughout the school year. This testing was done at the UofA Campus Health Center and through the university's "Test All Test Smart" (TATS) program. These tests identify active cases of SARS-CoV-2 infection using the reverse transcription polymerase chain reaction (RT-PCR) test and the antigen test. Because the antigen test provided more rapid diagnosis, it was used extensively from three weeks prior to the start of the Fall semester and throughout the academic year.

As these tests were occurring, results were provided on the COVID-19 websites. First, beginning in early March, the Campus Health Alerts website reported the total number of positive cases. Later, numbers were provided for the total number of tests (March 12 and thereafter). According to the website, these numbers were updated daily for positive cases and weekly for total tests. These numbers were reported until early September, when they were folded into the reporting for the TATS program.

For the TATS program, numbers were provided through the UofA COVID-19 Update website. Initially, on August 21, the numbers provided were the total number of tests and positive cases (July 31 and thereafter). Later (August 25), additional information was provided distinguishing PCR and antigen testing, and daily numbers were also included. On September 3, this website began providing both the Campus Health and TATS data; PCR and antigen results were combined and referred to as "Total", and both daily and cumulative numbers were provided.

No official data dashboard was available until September 16, and aside from the information provided on these websites, the full dataset was not made publicly available. As such, the authors of this dataset independently aggregated data from multiple sources. These data were made publicly available through a Google Sheet, with graphical illustration provided through the spreadsheet and on social media. The goal of providing the data and illustrations publicly was to provide factual information and to understand the infection rate of SARS-CoV-2 in the UofA community.

Because of differences in reported data between Campus Health and the TATS program, the dataset provides Campus Health numbers on September 3 and thereafter. TATS numbers are provided beginning on August 14, 2020.

Description of Dataset Content
The following terms are used in describing the dataset.
1. "Report Date" is the date and time at which the website was updated to reflect new numbers.
2. "Test Date" is the date of testing/sample collection.
3. "Total" is the combination of Campus Health and TATS numbers.
4. "Daily" is the new data associated with the Test Date.
5. "To Date (07/31--)" provides the cumulative numbers from 07/31 and thereafter.
6. "Sources" provides the source of information. The number prior to the colon refers to the number of sources. Here, "UACU" refers to the UA COVID-19 Update page, and "UARB" refers to the UA Weekly Re-Entry Briefing.
"SS" and "WBM" refers to screenshot (manually acquired) and "Wayback Machine" (see Reference section for links) with initials provided to indicate which author recorded the values. These screenshots are available in the records.zip file.The dataset is distinguished where available by the testing program and the methods of testing. Where data are not available, calculations are made to fill in missing data (e.g., extrapolating backwards on the total number of tests based on daily numbers that are deemed reliable). Where errors are found (by comparing to previous numbers), those are reported on the above Google Sheet with specifics noted.For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
This database can be used for macro-level analysis of road accidents on interurban roads in Europe. Through the variables it contains, road accidents can be explained using variables related to economic resources invested in roads, traffic, the road network, socioeconomic characteristics, legislative measures, and meteorology. This repository contains the data used for the analysis carried out in the following papers:
1. Calvo-Poyo F., Navarro-Moreno J., de Oña J. (2020) Road Investment and Traffic Safety: An International Study. Sustainability 12:6332. https://doi.org/10.3390/su12166332
2. Navarro-Moreno J., Calvo-Poyo F., de Oña J. (2022) Influence of road investment and maintenance expenses on injured traffic crashes in European roads. Int J Sustain Transp 1–11. https://doi.org/10.1080/15568318.2022.2082344
3. Navarro-Moreno, J., Calvo-Poyo, F., de Oña, J. (2022) Investment in roads and traffic safety: linked to economic development? A European comparison. Environ. Sci. Pollut. Res. https://doi.org/10.1007/s11356-022-22567
The file with the database is available in Excel.

DATA SOURCES
The database presents data from 1998 up to 2016 from 20 European countries: Austria, Belgium, Croatia, Czechia, Denmark, Estonia, Finland, France, Germany, Ireland, Italy, Latvia, Netherlands, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden and United Kingdom.

Crash data were obtained from the United Nations Economic Commission for Europe (UNECE) [2], which offers a sufficient level of disaggregation between crashes occurring inside versus outside built-up areas. With reference to the data on economic resources invested in roadways, the database of the Organisation for Economic Cooperation and Development (OECD), managed by the International Transport Forum (ITF) [1], deserves mention given its extensive coverage; it collects data on investment in the construction of roads and expenditure on their maintenance, following the definitions of the United Nations System of National Accounts (2008 SNA). Despite some data gaps, the time series are consistent from one country to the next. Moreover, to confirm this consistency and complete missing data, diverse additional sources, mainly the national transport ministries of the respective countries, were consulted. All monetary values were converted to constant 2015 prices using the OECD price index.
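As an aside for reusers, the constant-price conversion follows the standard deflation identity (nominal value divided by the price index rebased to 2015 = 100). The sketch below uses made-up figures, not values from the database.

```python
import pandas as pd

data = pd.DataFrame({
    "year": [2013, 2014, 2015, 2016],
    "road_inv_nominal": [1.20e9, 1.25e9, 1.30e9, 1.32e9],  # €, illustrative
    "price_index_2015": [97.0, 98.5, 100.0, 101.2],        # OECD-style index, 2015 = 100
    "fatalities": [520, 505, 490, 470],
    "p_km_billion": [110.0, 112.5, 114.0, 116.2],
})

# Deflate nominal investment to constant 2015 prices.
data["road_inv_2015const"] = data["road_inv_nominal"] / (data["price_index_2015"] / 100)

# Rate variable analogous to fatal_pc_km in the database.
data["fatal_pc_km"] = data["fatalities"] / data["p_km_billion"]
print(data)
```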
To obtain the rest of the variables in the database, as well as to ensure consistency in the time series and complete missing data, the following national and international sources were consulted: Eurostat [3]; Directorate-General for Mobility and Transport (DG MOVE), European Union [4]; The World Bank [5]; World Health Organization (WHO) [6]; European Transport Safety Council (ETSC) [7]; European Road Safety Observatory (ERSO) [8]; European Climatic Energy Mixes (ECEM) of the Copernicus Climate Change Service [9]; EU BestPoint-Project [10]; Ministerstvo dopravy, Czech Republic [11]; Bundesministerium für Verkehr und digitale Infrastruktur, Germany [12]; Ministerie van Infrastructuur en Waterstaat, Netherlands [13]; National Statistics Office, Malta [14]; Ministério da Economia e Transição Digital, Portugal [15]; Ministerio de Fomento, Spain [16]; Trafikverket, Sweden [17]; Ministère de l'environnement de l'énergie et de la mer, France [18]; Ministero delle Infrastrutture e dei Trasporti, Italy [19–25]; Statistisk sentralbyrå, Norway [26–29]; Instituto Nacional de Estatística, Portugal [30]; Infraestruturas de Portugal S.A., Portugal [31–35]; Road Safety Authority (RSA), Ireland [36].

DATABASE DESCRIPTION
The database was built to combine the longest possible time period with the maximum number of countries having complete data (some countries, such as Lithuania, Luxembourg, Malta and Norway, were eliminated from the definitive dataset owing to a lack of data or breaks in the time series of records). Taking the above into account, the definitive database is made up of 19 variables and contains data from 20 countries for the period between 1998 and 2016. The table below shows the coding of the variables, as well as their definition and unit of measure.

Table. Database metadata
| Code | Variable and unit |
|---|---|
| fatal_pc_km | Fatalities per billion passenger-km |
| fatal_mIn | Fatalities per million inhabitants |
| accid_adj_pc_km | Accidents per billion passenger-km |
| p_km | Billions of passenger-km |
| croad_inv_km | Investment in road construction per kilometer, €/km (2015 constant prices) |
| croad_maint_km | Expenditure on road maintenance per kilometer, €/km (2015 constant prices) |
| prop_motorwa | Proportion of motorways over the total road network (%) |
| populat | Population, in millions of inhabitants |
| unemploy | Unemployment rate (%) |
| petro_car | Consumption of gasoline and petroleum derivatives (tons) per passenger car |
| alcohol | Alcohol consumption, in liters per capita (age > 15) |
| mot_index | Motorization index, in cars per 1,000 inhabitants |
| den_populat | Population density, inhabitants/km2 |
| cgdp | Gross Domestic Product (GDP), in € (2015 constant prices) |
| cgdp_cap | GDP per capita, in € (2015 constant prices) |
| precipit | Average depth of rain water during a year (mm) |
| prop_elder | Proportion of people over 65 years (%) |
| dps | Demerit Point System, dummy variable (0: no; 1: yes) |
| freight | Freight transport, in billions of ton-km |

ACKNOWLEDGEMENTS
This database was produced in the framework of the project "Inversión en carreteras y seguridad vial: un análisis internacional (INCASE)", financed by FEDER/Ministerio de Ciencia, Innovación y Universidades–Agencia Estatal de Investigación/Proyecto RTI2018-101770-B-I00, within Spain's National Program of R+D+i Oriented to Societal Challenges. Moreover, the authors would like to express their gratitude to the Ministry of Transport, Mobility and Urban Agenda of Spain (MITMA) and the Federal Ministry of Transport and Digital Infrastructure of Germany (BMVI) for providing data for this study.

REFERENCES
1. International Transport Forum. OECD iLibrary | Transport infrastructure investment and maintenance.
2. United Nations Economic Commission for Europe. UNECE Statistical Database. Available online: https://w3.unece.org/PXWeb2015/pxweb/en/STAT/STAT_40-TRTRANS/?rxid=18ad5d0d-bd5e-476f-ab7c-40545e802eeb (accessed on Apr 28, 2020).
3. European Commission. Database - Eurostat. Available online: https://ec.europa.eu/eurostat/data/database (accessed on Apr 28, 2021).
4. Directorate-General for Mobility and Transport, European Commission. EU Transport in figures - Statistical Pocketbooks. Available online: https://ec.europa.eu/transport/facts-fundings/statistics_en (accessed on Apr 28, 2021).
5. World Bank Group. World Bank Open Data | Data. Available online: https://data.worldbank.org/ (accessed on Apr 30, 2021).
6. World Health Organization (WHO). WHO Global Information System on Alcohol and Health. Available online: https://apps.who.int/gho/data/node.main.GISAH?lang=en (accessed on Apr 29, 2021).
7. European Transport Safety Council (ETSC). Traffic Law Enforcement across the EU - Tackling the Three Main Killers on Europe's Roads; Brussels, Belgium, 2011.
8. Copernicus Climate Change Service. Climate data for the European energy sector from 1979 to 2016 derived from ERA-Interim. Available online: https://cds.climate.copernicus.eu/cdsapp#!/dataset/sis-european-energy-sector?tab=overview (accessed on Apr 29, 2021).
9. Klipp, S.; Eichel, K.; Billard, A.; Chalika, E.; Loranc, M.D.; Farrugia, B.; Jost, G.; Møller, M.; Munnelly, M.; Kallberg, V.P.; et al. European Demerit Point Systems: Overview of their main features and expert opinions. EU BestPoint-Project 2011, 1–237.
10. Ministerstvo dopravy. Serie: Ročenka dopravy; Centrum dopravního výzkumu: Prague, Czech Republic.
11. Bundesministerium für Verkehr und digitale Infrastruktur. Verkehr in Zahlen 2003/2004; Hamburg, Germany, 2004; ISBN 3871542946.
12. Bundesministerium für Verkehr und digitale Infrastruktur. Verkehr in Zahlen 2018/2019. In Verkehrsdynamik; Flensburg, Germany, 2018; ISBN 9783000612947.
13. Ministerie van Infrastructuur en Waterstaat. Rijksjaarverslag 2018 a Infrastructuurfonds; The Hague, Netherlands, 2019; ISBN 0921-7371.
14. Ministerie van Infrastructuur en Milieu. Rijksjaarverslag 2014 a Infrastructuurfonds; The Hague, Netherlands, 2015; ISBN 0921-7371.
15. Ministério da Economia e Transição Digital. Base de Dados de Infraestruturas - GEE. Available online: https://www.gee.gov.pt/pt/publicacoes/indicadores-e-estatisticas/base-de-dados-de-infraestruturas (accessed on Apr 29, 2021).
16. Ministerio de Fomento, Dirección General de Programación Económica y Presupuestos, Subdirección General de Estudios Económicos y Estadísticas. Serie: Anuario estadístico; NIPO 161-13-171-0; Centro de Publicaciones, Secretaría General Técnica, Ministerio de Fomento: Madrid, Spain.
17. Trafikverket. The Swedish Transport Administration Annual report: 2017; 2018; ISBN 978-91-7725-272-6.
18. Ministère de l'Équipement, du T. et de la M. Mémento de statistiques des transports 2003; Ministère de l'environnement de l'énergie et de la mer, 2005.
19. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2000; Istituto Poligrafico e Zecca dello Stato: Roma, Italy, 2001.
20. Ministero delle Infrastrutture e dei Trasporti. Conto nazionale dei trasporti 1999. 2000.
21. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2004.
22. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2001; 2002.
23. Ministero delle Infrastrutture e dei
This dataset is a cleaned and preprocessed version of the original Netflix Movies and TV Shows dataset available on Kaggle. All cleaning was done using Microsoft Excel; no programming was involved.
🎯 What’s Included: - Cleaned Excel file (standardized columns, proper date format, removed duplicates/missing values) - A separate "formulas_used.txt" file listing all Excel formulas used during cleaning (e.g., TRIM, CLEAN, DATE, SUBSTITUTE, TEXTJOIN, etc.) - Columns like 'date_added' have been properly formatted into DMY structure - Multi-valued columns like 'listed_in' are split for better analysis - Null values replaced with “Unknown” for clarity - Duration field broken into numeric + unit components
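Although the cleaning was performed entirely in Excel, readers who prefer to reproduce the steps programmatically could approximate them as below. This is a hedged pandas sketch: the input filename and the exact transformations are assumed from the bullet list above rather than taken from formulas_used.txt.

```python
import pandas as pd

# Load the original Kaggle export (filename assumed).
df = pd.read_csv("netflix_titles.csv")

# Trim stray whitespace on text columns, mirroring Excel's TRIM/CLEAN.
text_cols = df.select_dtypes(include="object").columns
df[text_cols] = df[text_cols].apply(lambda col: col.str.strip())

# Parse 'date_added' and render it in a day-month-year structure.
df["date_added"] = pd.to_datetime(df["date_added"], errors="coerce").dt.strftime("%d-%m-%Y")

# Drop exact duplicates and replace remaining nulls with "Unknown".
df = df.drop_duplicates()
df[text_cols] = df[text_cols].fillna("Unknown")

# Split 'duration' into numeric and unit components (e.g., "90 min").
df[["duration_value", "duration_unit"]] = df["duration"].str.extract(r"(\d+)\s*(\D+)")
```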
🔍 Dataset Purpose: Ideal for beginners and analysts who want to: - Practice data cleaning in Excel - Explore Netflix content trends - Analyze content by type, country, genre, or date added
📁 Original Dataset Credit: The base version was originally published by Shivam Bansal on Kaggle: https://www.kaggle.com/shivamb/netflix-shows
📌 Bonus: You can find a step-by-step cleaning guide and the same dataset on GitHub as well, along with screenshots and formulas documentation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is provided in a single .xlsx file named "eucalyptus_growth_environment_data_V2.xlsx" and consists of fifteen sheets:
Codebook: This sheet details the index, values, and descriptions for each field within the dataset, providing a comprehensive guide to understanding the data structure.
ALL NODES: Contains measurements from all devices, totalling 102,916 data points. This sheet aggregates the data across all nodes.
GWD1 to GWD10: These subset sheets include measurements from individual nodes, labelled with the abbreviation "Generic Wireless Dendrometer" (GWD) followed by device IDs 1 through 10. Each sheet corresponds to a specific node, so the ten sheets together represent measurements from ten trees (nodes).
Metadata: Provides detailed metadata for each node, including species, initial diameter, location, measurement frequency, battery specifications, and irrigation status. This information is essential for identifying and differentiating the nodes and their specific attributes.
Missing Data Intervals: Details gaps in the data stream, including start and end dates and times when data was not uploaded. It includes information on the total duration of each missing interval and the number of missing data points.
Missing Intervals Distribution: Offers a summary of missing data intervals and their distribution, providing insight into data gaps and reasons for missing data.
All nodes utilize LoRaWAN for data transmission. Please note that intermittent data gaps may occur due to connectivity issues between the gateway and the nodes, as well as maintenance activities or experimental procedures.
Software considerations: The provided R code named "Simple_Dendro_Imputation_and_Analysis.R" is a comprehensive analysis workflow that processes and analyses Eucalyptus growth and environmental data from the "eucalyptus_growth_environment_data_V2.xlsx" dataset. The script begins by loading the necessary libraries, setting the working directory, and reading the data from the specified Excel sheet. It then combines date and time information into a unified DateTime format and performs data type conversions for the relevant columns. The analysis focuses on a specified device, allowing for the selection of neighbouring devices for imputation of missing data. A loop checks for gaps in the time series and fills in missing intervals based on a defined threshold, followed by a function that imputes missing values using the average from nearby devices. Outliers are identified and managed through linear interpolation. The code further calculates vapour pressure metrics and applies temperature corrections to the dendrometer data. Finally, it saves the cleaned and processed data into a new Excel file while conducting dendrometer analysis using the dendRoAnalyst package, which includes visualizations and calculations of daily growth metrics and correlations with environmental factors such as vapour pressure deficit (VPD).
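The R script is the authoritative implementation; purely to illustrate the neighbour-averaging idea it describes, here is a short Python sketch (sheet and variable names are assumptions):

```python
import pandas as pd

def impute_from_neighbours(target: pd.Series, neighbours: pd.DataFrame) -> pd.Series:
    """Fill gaps in one dendrometer series using the row-wise mean of
    neighbouring devices sharing the same DateTime index."""
    return target.fillna(neighbours.mean(axis=1))

# Hypothetical usage: impute GWD3 from its two nearest neighbours,
# after pivoting the "ALL NODES" sheet to one column per node.
# wide = pd.read_excel("eucalyptus_growth_environment_data_V2.xlsx",
#                      sheet_name="ALL NODES")
# wide["GWD3"] = impute_from_neighbours(wide["GWD3"], wide[["GWD2", "GWD4"]])
```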
This data set includes soil temperature data from boreholes located at five stations in Russia: Yakutsk, Verkhoyansk, Pokrovsk, Isit', and Churapcha. The data have been compiled into five Microsoft Excel files, one for each station. Each Excel file contains three worksheets:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset was generated from a laboratory experiment based on the dot-matrix integration paradigm, designed to measure death thought accessibility (DTA). The study was conducted under controlled conditions, with participants tested individually in a quiet, dimly lit room. Stimulus presentation and response collection were implemented using PsychoPy (exact version number provided in the supplementary materials), and reaction times were recorded via a standard USB keyboard. Experimental stimuli consisted of five categories of two-character Chinese words rendered in dot-matrix form: death-related words, metaphorical-death words, positive words, neutral words, and meaningless words. Stimuli were centrally displayed on the screen, with presentation durations and inter-stimulus intervals (ISI) precisely controlled at the millisecond level.

Data collection took place in spring 2025, with a total of 39 participants contributing approximately 16,699 valid trials. Each trial-level record includes participant ID, priming condition (0 = neutral priming, 1 = mortality salience priming), word type, inter-stimulus interval (in milliseconds), reaction time (in milliseconds), and recognition accuracy (0 = incorrect, 1 = correct). In the dataset, rows correspond to single trials and columns represent experimental variables. Reaction times were measured in milliseconds and later log-transformed for statistical analyses to reduce skewness. Accuracy was coded as a binary variable indicating correct recognition.

Data preprocessing included the removal of extreme reaction times (less than 150 ms or greater than 3000 ms). Only trials with valid responses were retained for analysis. Missing data were minimal (<1% of all trials), primarily due to occasional non-responses by participants, and are explicitly marked in the dataset. Potential sources of error include natural individual variability in reaction times and minor recording fluctuations from input devices, which are within the millisecond range and do not affect overall patterns.

The data files are stored in Excel format (.xlsx), with each participant's data saved in a separate file named according to the participant ID. Within each file, the first row contains variable names, and subsequent rows record trial-level observations, allowing for straightforward data access and processing. Excel files are compatible with a wide range of statistical software, including R, Python, SPSS, and Microsoft Excel, and no additional software is required to open them. A supplementary documentation file accompanies the dataset, providing detailed explanations of all variables and data processing steps. A complete codebook of variable definitions is included in the appendix to facilitate data interpretation and ensure reproducibility of the analyses.
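A short Python sketch of the stated preprocessing rules follows; the file-name pattern and column names are assumptions, and the actual names are given in the codebook.

```python
import numpy as np
import pandas as pd
from pathlib import Path

# Combine the per-participant Excel files into one trial-level table.
files = sorted(Path("data").glob("*.xlsx"))
trials = pd.concat([pd.read_excel(f) for f in files], ignore_index=True)

# Keep only responses inside the stated reaction-time window.
trials = trials[trials["reaction_time"].between(150, 3000)]

# Log-transform reaction times to reduce skewness, as described above.
trials["log_rt"] = np.log(trials["reaction_time"])
```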
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GENERAL INFORMATION
Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation
Date of data collection: January to March 2022
Collection instrument: SurveyMonkey
Funding: Alfred P. Sloan Foundation
SHARING/ACCESS INFORMATION
Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license
Links to publications that cite or use the data:
Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437
Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266
DATA & FILE OVERVIEW
File List
Filename: MDCDatacitationReuse2021Codebookv2.pdf (codebook)
Filename: MDCDataCitationReuse2021surveydatav2.csv (dataset in CSV format)
Filename: MDCDataCitationReuse2021surveydatav2.sav (dataset in SPSS format)
Filename: MDCDataCitationReuseSurvey2021QNR.pdf (questionnaire)
Additional related data collected but not included in the current data package: open-ended questions asked of respondents
METHODOLOGICAL INFORMATION
Description of methods used for collection/generation of data:
The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.
We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses, an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails, and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
Methods for processing the data:
Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.
Instrument- or software-specific information needed to interpret the data:
The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.
DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata
Number of variables: 95
Number of cases/rows: 2,492
Missing data codes: 999 = Not asked
Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
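A minimal sketch of reading the coded CSV and honouring the documented missing-data code (pandas is assumed here; the variable count is per this README):

```python
import pandas as pd

survey = pd.read_csv("MDCDataCitationReuse2021surveydatav2.csv")

# 999 is documented above as "Not asked"; treat it as missing.
survey = survey.replace(999, pd.NA)

print(survey.shape)  # expected (2492, 95) per this README
```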
Data and R code used for the analysis of data for the publication: Coumoundouros et al., Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey. BMC Nephrology.

Summary of study
An online cross-sectional survey for informal caregivers (e.g. family and friends) of people living with chronic kidney disease in the United Kingdom. The study aimed to examine informal caregivers' cognitive behavioural therapy self-help intervention preferences, and to describe the caregiving situation (e.g. types of care activities) and informal caregivers' mental health (depression, anxiety and stress symptoms). Participants were eligible to participate if they were at least 18 years old, lived in the United Kingdom, and provided unpaid care to someone living with chronic kidney disease who was at least 18 years old. The online survey included questions regarding (1) informal caregiver characteristics; (2) care recipient characteristics; (3) intervention preferences (e.g. content, delivery format); and (4) informal caregiver mental health. Informal caregivers' mental health was assessed using the 21-item Depression, Anxiety, and Stress Scale (DASS-21), which is composed of three subscales measuring depression, anxiety, and stress, respectively. Sixty-five individuals participated in the survey. See the published article for full study details.

Description of uploaded files
1. ENTWINE_ESR14_Kidney Carer Survey Data_FULL_2022-08-30: Excel file with the complete, raw survey data. Note: the first half of participants' postal codes was collected; however, this data was removed from the uploaded dataset to ensure participant anonymity.
2. ENTWINE_ESR14_Kidney Carer Survey Data_Clean DASS-21 Data_2022-08-30: Excel file with cleaned data for the DASS-21 scale. Data cleaning involved imputation of missing data if a participant was missing data for one item within a subscale of the DASS-21. Missing values were imputed by taking the mean of all other items within the relevant subscale.
3. ENTWINE_ESR14_Kidney Carer Survey_KEY_2022-08-30: Excel file with a key linking item labels in the uploaded datasets to the corresponding survey questions.
4. R Code for Kidney Carer Survey_2022-08-30: R file of R code used to analyse the survey data.
5. R code for Kidney Carer Survey_PDF_2022-08-30: PDF file of the R code used to analyse the survey data.
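To illustrate the imputation rule described for file 2 (the actual analysis was done in the uploaded R code), a Python sketch follows; the item labels are hypothetical and should be taken from the KEY file.

```python
import pandas as pd

def impute_dass_subscale(df: pd.DataFrame, items: list[str]) -> pd.DataFrame:
    """Impute at most one missing item per subscale with the mean of the
    remaining items of that subscale, as described for file 2 above."""
    n_missing = df[items].isna().sum(axis=1)
    row_mean = df[items].mean(axis=1)  # skips NaN by default
    for item in items:
        fill = df[item].isna() & (n_missing == 1)
        df.loc[fill, item] = row_mean[fill]
    return df

# Hypothetical labels for one subscale; see the KEY file for the real ones.
depression_items = ["dass_3", "dass_5", "dass_10", "dass_13", "dass_16", "dass_17", "dass_21"]
# survey = impute_dass_subscale(survey, depression_items)
```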
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset is derived from a questionnaire survey on the psychological dependence of college students on generative AI software. The data was collected via an online questionnaire from January 24, 2025 to February 6, 2025, covering college students from multiple universities in Yunnan Province. The questionnaire covers multiple dimensions, such as basic information, usage behavior, psychological dependence, negative emotional experience, and self-efficacy, with a total of 1,110 valid sample records. The data has been anonymized and does not contain any personal identification information. All responses were filled out by the participants themselves.

In the data file, each row represents the complete answers of one respondent, and column labels include serial number, gender, grade level, major category, whether generative AI has been used, commonly used software types, frequency of use, start time, motivation for use, impact on learning efficiency, recommendation intention, attitude towards prohibition of use, future use intention, level of trust in AI, dependency behavior, anxiety and emotional reactions, self-efficacy, and other aspects. Some of the questions were scored on a five-point Likert scale, and others were single-choice or multiple-choice questions. Some questions, such as "Have you used generative AI before?", trigger automatic skips when not applicable, producing a value of "0" in the corresponding columns; this is a reasonable loss given the design logic.

There may be self-report bias in the data collection process, and some questions involve subjective evaluations of psychological states, introducing a degree of subjective error. In the data processing stage, preliminary cleaning was carried out for issues such as outliers and duplicate submissions to ensure the validity and consistency of the data. The data file is in Excel format (.xlsx) and can be opened and processed using common spreadsheet software such as Microsoft Excel, WPS Office spreadsheets, or Google Sheets. This dataset is suitable for empirical research in fields such as educational technology, psychology, and information behavior, especially for exploring the psychological and behavioral characteristics of college students during their interaction with generative AI.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures used, as described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: user stories or use cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K–O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.
All the calculations and information provided in the following sheets originate from that raw data.
Sheet 2 (Descriptive-Stats):
Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model; it is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model; it is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
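In symbols, with AL, WR, SO, and OM denoting the per-subject counts of aligned, wrongly represented, system-oriented, and omitted classes defined above:

```latex
\text{correctness} = \frac{AL}{AL + OM + SO + WR},
\qquad
\text{completeness} = \frac{AL + WR}{AL + WR + OM}
```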
For sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html (a reference sketch of the formula is given after the sheet list below). The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Medium, and Low.
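As referenced above, Hedges' g was computed with an online tool; for readers who want to verify the values, a small Python sketch of the standard formula (pooled standard deviation with the small-sample correction) is given here with made-up scores.

```python
import numpy as np

def hedges_g(x: np.ndarray, y: np.ndarray) -> float:
    """Hedges' g: Cohen's d corrected for small-sample bias."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                        / (nx + ny - 2))
    d = (x.mean() - y.mean()) / pooled_sd
    return d * (1 - 3 / (4 * (nx + ny) - 9))

# Made-up completeness scores for two notation groups.
uc = np.array([0.71, 0.64, 0.80, 0.58, 0.69])
us = np.array([0.62, 0.55, 0.66, 0.60, 0.51])
print(hedges_g(uc, us))
```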
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
PROJECT OBJECTIVE
We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility to systematize the membership roster and generate different reports as per business requirements.
Questions (KPIs)
TASK 1: STANDARDIZING THE DATASET
TASK 2: DATA FORMATTING
TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:
TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:
TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)
• Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:
Process
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the raw data used for a research study that examined university students' music listening habits while studying. There are two experiments in this research study. Experiment 1 is a retrospective survey, and Experiment 2 is a mobile experience sampling research study. This repository contains five Microsoft Excel files with data obtained from both experiments. The files are as follows:
- onlineSurvey_raw_data.xlsx
- esm_raw_data.xlsx
- esm_music_features_analysis.xlsx
- esm_demographics.xlsx
- index.xlsx

Files Description

File: onlineSurvey_raw_data.xlsx
This file contains the raw data from Experiment 1, including the (anonymised) demographic information of the sample. The sample characteristics recorded are:
- studentship
- area of study
- country of study
- type of accommodation a participant was living in
- age
- self-identified gender
- language ability (mono- or bi-/multilingual)
- (various) personality traits
- (various) musicianship
- (various) everyday music uses
- (various) music capacity

The file also contains raw data of responses to the questions about participants' music listening habits while studying in real life. These pieces of data are:
- likelihood of listening to specific music genres (rated across 23 genres) while studying and during everyday listening
- likelihood of listening to music with specific acoustic features (e.g., with/without lyrics, loud/soft, fast/slow) while studying and during everyday listening
- general likelihood of listening to music while studying in real life
- (verbatim) participants' written responses to the open-ended questions about their real-life music listening habits while studying

File: esm_raw_data.xlsx
This file contains the raw data from Experiment 2, including the following variables:
- information on the music tracks (track name, artist name, and, if available, Spotify ID) each participant was listening to during each music episode (both while studying and during everyday listening)
- level of arousal at the onset of music playing and at the end of the 30-minute study period
- level of valence at the onset of music playing and at the end of the 30-minute study period
- specific mood at the onset of music playing and at the end of the 30-minute study period
- whether participants were studying
- their location at that moment (if studying)
- whether they were studying alone (if studying)
- the types of study tasks (if studying)
- the perceived level of difficulty of the study task
- whether participants were planning to listen to music while studying
- (various) reasons for music listening
- (various) perceived positive and negative impacts of studying with music

Each row represents the data for a single participant. Rows with a record of a participant ID but no associated data indicate that the participant did not respond to the questionnaire (i.e., missing data).

File: esm_music_features_analysis.xlsx
This file presents the music features of each recorded music track during both the study-episodes and the everyday-episodes (retrieved from Spotify's "Get Track's Audio Features" API). These features are:
- energy level
- loudness
- valence
- tempo
- mode

The contextual details of the moments each track was being played are also presented here, which include:
- whether the participant was studying
- their location (e.g., at home, cafe, university)
- whether they were studying alone
- the type of study tasks they were engaging with (e.g., reading, writing)
- the perceived difficulty level of the task

File: esm_demographics.xlsx
This file contains the demographics of the sample in Experiment 2 (N = 10), which are the same as in Experiment 1 (see above). Each row represents the data for a single participant. Rows with a record of a participant ID but no associated demographic data indicate that the participant did not respond to the questionnaire (i.e., missing data).

File: index.xlsx
Finally, this file contains all the abbreviations used in each document as well as their explanations.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Dirty Retail Store Sales dataset contains 12,575 rows of synthetic data representing sales transactions from a retail store. The dataset includes eight product categories with 25 items per category, each having static prices. It is designed to simulate real-world sales data, including intentional "dirtiness" such as missing or inconsistent values. This dataset is suitable for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.
retail_store_sales.csv
| Column Name | Description | Example Values |
|---|---|---|
| Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567 |
| Customer ID | A unique identifier for each customer. 25 unique customers. | CUST_01 |
| Category | The category of the purchased item. | Food, Furniture |
| Item | The name of the purchased item. May contain missing values or None. | Item_1_FOOD, None |
| Price Per Unit | The static price of a single unit of the item. May contain missing or None values. | 4.00, None |
| Quantity | The quantity of the item purchased. May contain missing or None values. | 1, None |
| Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, None |
| Payment Method | The method of payment used. May contain missing or invalid values. | Cash, Credit Card |
| Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Online |
| Transaction Date | The date of the transaction. Always present and valid. | 2023-01-15 |
| Discount Applied | Indicates if a discount was applied to the transaction. May contain missing values. | True, False, None |
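Because the three numeric fields are tied by the identity Total Spent = Quantity × Price Per Unit, a single missing value in a row can be recovered from the other two. A minimal pandas sketch (rows missing two of the three values remain missing):

```python
import pandas as pd

df = pd.read_csv("retail_store_sales.csv")

q = df["Quantity"]
p = df["Price Per Unit"]
t = df["Total Spent"]

# Recover whichever single field is missing from the other two.
df["Total Spent"] = t.fillna(q * p)
df["Quantity"] = q.fillna(t / p)
df["Price Per Unit"] = p.fillna(t / q)
```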
The dataset includes the following categories, each containing 25 items with corresponding codes, names, and static prices:
| Item Code | Item Name | Price |
|---|---|---|
| Item_1_EHE | Blender | 5.0 |
| Item_2_EHE | Microwave | 6.5 |
| Item_3_EHE | Toaster | 8.0 |
| Item_4_EHE | Vacuum Cleaner | 9.5 |
| Item_5_EHE | Air Purifier | 11.0 |
| Item_6_EHE | Electric Kettle | 12.5 |
| Item_7_EHE | Rice Cooker | 14.0 |
| Item_8_EHE | Iron | 15.5 |
| Item_9_EHE | Ceiling Fan | 17.0 |
| Item_10_EHE | Table Fan | 18.5 |
| Item_11_EHE | Hair Dryer | 20.0 |
| Item_12_EHE | Heater | 21.5 |
| Item_13_EHE | Humidifier | 23.0 |
| Item_14_EHE | Dehumidifier | 24.5 |
| Item_15_EHE | Coffee Maker | 26.0 |
| Item_16_EHE | Portable AC | 27.5 |
| Item_17_EHE | Electric Stove | 29.0 |
| Item_18_EHE | Pressure Cooker | 30.5 |
| Item_19_EHE | Induction Cooktop | 32.0 |
| Item_20_EHE | Water Dispenser | 33.5 |
| Item_21_EHE | Hand Blender | 35.0 |
| Item_22_EHE | Mixer Grinder | 36.5 |
| Item_23_EHE | Sandwich Maker | 38.0 |
| Item_24_EHE | Air Fryer | 39.5 |
| Item_25_EHE | Juicer | 41.0 |
| Item Code | Item Name | Price |
|---|---|---|
| Item_1_FUR | Office Chair | 5.0 |
| Item_2_FUR | Sofa | 6.5 |
| Item_3_FUR | Coffee Table | 8.0 |
| Item_4_FUR | Dining Table | 9.5 |
| Item_5_FUR | Bookshelf | 11.0 |
| Item_6_FUR | Bed F... |
From the Google Data Analytics Certificate course, case study 1: Beginning in 2016, the fictional company Cyclistic launched a successful bike-share offering. The program has since grown to 5,824 geotracked bicycles and 692 docking stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
This dataset contains cleaned data for the 12-month period 02/2022 - 01/2023. The cleaning process is as follows, also documented within the "How a Bike-Share Company Navigates Speedy Success" notebook:
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
📘 Description
The Student Academic Performance Dataset contains detailed academic and lifestyle information for 250 students, created to analyze how various factors (such as study hours, sleep, attendance, stress, and social media usage) influence their overall academic outcomes and GPA.
This dataset is synthetic but realistic, carefully generated to reflect believable academic patterns and relationships. It’s perfect for learning data analysis, statistics, and visualization using Excel, Python, or R.
The data includes 12 attributes, primarily numerical, ensuring that it's suitable for a wide range of analytical tasks, from basic descriptive statistics (mean, median, SD) to correlation and regression analysis.
📊 Key Features
🧮 250 rows and 12 columns
💡 Mostly numerical, great for Excel-based statistical functions
🔍 No missing values — ready for direct use
📈 Balanced and realistic, ideal for clear visualizations and trend analysis
🎯 Suitable for:
Descriptive statistics
Correlation & regression
Data visualization projects
Dashboard creation (Excel, Tableau, Power BI)
💡 Possible Insights to Explore
How do study hours impact GPA?
Is there a relationship between stress levels and performance?
Does social media usage reduce study efficiency?
Do students with higher attendance achieve better grades?
⚙️ Data Generation Details
Each record represents a unique student.
GPA is calculated using a weighted formula based on midterm and final scores.
Relationships are designed to be realistic; for example (see the sketch after this list):
Higher study hours → higher scores and GPA
Higher stress → slightly lower sleep hours
Excessive social media time → reduced academic performance
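The exact generation formula is not published with the dataset; the Python sketch below shows one plausible scheme consistent with the bullets above, where all weights and noise levels are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 250

# Study hours drive scores, plus noise (coefficients are assumptions).
study_hours = rng.uniform(0, 8, n)
midterm = np.clip(40 + 6.0 * study_hours + rng.normal(0, 8, n), 0, 100)
final = np.clip(38 + 6.5 * study_hours + rng.normal(0, 8, n), 0, 100)

# Weighted overall score mapped to a 4.0-scale GPA (weights assumed).
overall = 0.4 * midterm + 0.6 * final
gpa = (overall / 100 * 4.0).round(2)

students = pd.DataFrame({
    "study_hours": study_hours.round(1),
    "midterm_score": midterm.round(),
    "final_score": final.round(),
    "gpa": gpa,
})
print(students.describe())
```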
⚠️ Disclaimer
This dataset is synthetically generated using statistical modeling techniques and does not contain any real student data. It is intended purely for educational, analytical, and research purposes.
This notebook serves to showcase my problem-solving ability, knowledge of the data analysis process, proficiency with Excel and its various tools and functions, as well as my strategic mindset and statistical prowess. This project consists of an auditing prompt provided by Hive Data, a raw Excel data set, a cleaned and audited version of the raw Excel data set, and my description of my thought process and knowledge used during completion of the project. The prompt can be found below:
The raw data that accompanies the prompt can be found below:
Hive Annotation Job Results - Raw Data
^ These are the tools I was given to complete my task. The rest of the work is entirely my own.
To summarize broadly, my task was to audit the dataset and summarize my process and results. Specifically, I was to create a method for identifying which "jobs" - explained in the prompt above - needed to be rerun based on a set of "background facts," or criteria. The description of my extensive thought process and results can be found below in the Content section.
Brendan Kelley April 23, 2021
Hive Data Audit Prompt Results
This paper explains the auditing process of the “Hive Annotation Job Results” data. It includes the preparation, analysis, visualization, and summary of the data. It is accompanied by the results of the audit in the Excel file “Hive Annotation Job Results – Audited”.
Observation
The “Hive Annotation Job Results” data comes in the form of a single Excel sheet. It contains 7 columns and 5,001 rows, including column headers. The data includes “file”, “object id”, and the pseudonyms for five questions that each client was instructed to answer about their respective table: “tabular”, “semantic”, “definition list”, “header row”, and “header column”. The “file” column includes non-unique numbers (that is, there are multiple instances of the same value in the column) separated by a dash. The “object id” column includes non-unique numbers ranging from 5 to 487539. The columns containing the answers to the five questions include Boolean values, TRUE or FALSE, which depend upon the yes/no worker judgement.
Use of the COUNTIF() function reveals that there are no values other than TRUE or FALSE in any of the five question columns. The VLOOKUP() function reveals that the data does not include any missing values in any of the cells.
Assumptions
Based on the clean state of the data and the guidelines of the Hive Data Audit Prompt, the assumption is that duplicate values in the “file” column are acceptable and should not be removed. Similarly, duplicated values in the “object id” column are acceptable and should not be removed. The data is therefore clean and is ready for analysis/auditing.
Preparation
The purpose of the audit is to analyze the accuracy of the yes/no worker judgement of each question according to the guidelines of the background facts. The background facts are as follows:
• A table that is a definition list should automatically be tabular and also semantic
• Semantic tables should automatically be tabular
• If a table is NOT tabular, then it is definitely not semantic nor a definition list
• A tabular table that has a header row OR header column should definitely be semantic
These background facts serve as instructions for how the answers to the five questions should interact with one another. These facts can be re-written to establish criteria for each question:
For the tabular column:
- If the table is a definition list, it is also tabular
- If the table is semantic, it is also tabular

For the semantic column:
- If the table is a definition list, it is also semantic
- If the table is not tabular, it is not semantic
- If the table is tabular and has either a header row or a header column...
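A compact way to express these criteria programmatically: the sketch below flags rows whose five answers violate any background fact. Python with pandas is assumed here purely for illustration (the audit itself was performed in Excel), and the input filename is hypothetical.

```python
import pandas as pd

df = pd.read_excel("Hive Annotation Job Results.xlsx")  # TRUE/FALSE load as booleans

# A row is inconsistent if it violates any background fact.
violations = (
    (df["definition list"] & ~(df["tabular"] & df["semantic"]))                     # fact 1
    | (df["semantic"] & ~df["tabular"])                                             # fact 2
    | (~df["tabular"] & (df["semantic"] | df["definition list"]))                   # fact 3
    | (df["tabular"] & (df["header row"] | df["header column"]) & ~df["semantic"])  # fact 4
)

print(f"{int(violations.sum())} of {len(df)} rows violate the criteria")
```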
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
A dataset of Coachella 2024 artists complete with lineup data, artist data and Spotify artist data.
Dataset derived from: https://docs.google.com/spreadsheets/d/1m7_Be2CPBGcqt4duMWRHmgomdLK_YjNSNNPfuBhf9Js/edit#gid=1826236554
Source: Data found on r/coachella from the user natnav_
Data cleaning notes from source:
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Instructions
Examine the data: Start by thoroughly examining the dataset within the Claims Data resource. Focus on key variables such as claim dates, types of claims, amounts claimed, and additional details about the incidents.
Manipulate the data: Derive the missing values in columns F, O, P, and Q. Use hints if needed. This step emphasizes data manipulation, a key component of account pricing analysis.
Identify patterns and anomalies: Conduct EDA using the data in the Claims Data resource. Identify patterns, trends, and anomalies. Utilize visual tools such as histograms, scatter plots, and bar charts within Excel to help you visualize and interpret the data.

2. Apply actuarial principles to the data
Risk assessment: Use the actuarial principles you learned in Task 1 to assess the risks associated with the claims data. Calculate key metrics such as claim frequency, severity, and loss ratios based on the data provided.
Calculate premiums: Develop a pricing model using experience-based rating. This involves adjusting historical data from the Claims Data resource to project future claims costs, considering factors such as inflation and changes in exposure.

3. Develop comprehensive reports in Excel
Analysis report: Compile your findings: Organize your EDA into a well-structured section within the Excel workbook. This section should include a detailed evaluation of the Marine Liability insurance claims data, visualizations of key findings, and a commentary on observed trends and anomalies.
Commentary on risks and uncertainties: Provide a clear commentary on the risks and uncertainties associated with your assessment. Discuss how different scenarios could impact the pricing model and the potential financial implications for Oceanic Shipping Co.
Pricing calculation: Perform a numbers-based premium calculation: Use the Claims Data resource to calculate the appropriate premiums for the Marine Liability insurance policy. Apply actuarial principles such as loss frequency, loss severity, and pure premium calculation, and adjust for expenses and profit margins.
Sensitivity analysis: Include a sensitivity analysis within the Excel workbook to assess how changes in key assumptions (e.g., an increase in loss severity) could impact the final premium.
Document your calculations: Ensure your premium calculation section in Excel clearly documents your methodology, assumptions, and final premium recommendations. Discuss the potential risks and uncertainties in your pricing model, including any external factors that could impact future claims.
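For orientation, the key metrics named above reduce to simple ratios. A hedged Python sketch with made-up exposure figures and a hypothetical claims export follows (the task itself is meant to be completed in Excel):

```python
import pandas as pd

claims = pd.read_csv("claims_data.csv")   # hypothetical export of the Claims Data resource
n_policies = 1_200                        # assumed exposure
earned_premium = 4_800_000.0              # assumed, in currency units

frequency = len(claims) / n_policies              # claims per policy
severity = claims["claim_amount"].mean()          # average cost per claim
loss_ratio = claims["claim_amount"].sum() / earned_premium

# Pure premium, then gross up for expenses and profit (25% loading assumed).
pure_premium = frequency * severity
gross_premium = pure_premium / (1 - 0.25)
print(frequency, severity, loss_ratio, pure_premium, gross_premium)
```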
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides valuable insights into the ratings distribution of bestselling books across different categories. With a meticulous categorization of bestsellers based on their user ratings, this dataset offers a comprehensive overview of the popularity and reception of top-selling books. Whether you're interested in exploring highly-rated bestsellers, very highly-rated bestsellers, or moderately rated bestsellers, this dataset empowers you to analyze trends and patterns in the literary world. Leveraging this dataset opens up opportunities for market research, trend analysis, and strategic decision-making for publishers, authors, and book enthusiasts alike.
1. Data Cleaning and Manipulation in Excel: Conducted data cleaning and manipulation tasks such as removing duplicates, handling missing values, and formatting data for analysis in Excel.
2. Data Collection from Kaggle: Gathered the initial dataset containing information about bestselling books from Kaggle, a popular platform for datasets.
3. Visualization in Tableau: Created interactive visualizations of the dataset using Tableau, a powerful data visualization tool, to explore and analyze bestseller ratings breakdowns.
4. Reporting on Google Docs: Generated reports and summaries of the findings using Google Docs, a collaborative document editing platform, to communicate insights effectively.