24 datasets found
  1. EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for...

    • data.niaid.nih.gov
    Updated Mar 31, 2025
    + more versions
    Cite
    Barigozzi, Matteo; Lissona, Claudio (2025). EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for Macroeconomic Research [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10514667
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    University of Bologna
    Authors
    Barigozzi, Matteo; Lissona, Claudio
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    EA-MD-QD is a collection of large monthly and quarterly EA and EA member countries datasets for macroeconomic analysis. The EA member countries covered are: AT, BE, DE, EL, ES, FR, IE, IT, NL, PT.

    The formal reference to this dataset is:

    Barigozzi, M. and Lissona, C. (2024) "EA-MD-QD: Large Euro Area and Euro Member Countries Datasets for Macroeconomic Research". Zenodo.

    Please refer to it when using the data.

    Each zip file contains:
    • Excel files for the EA and the countries covered, each containing an unbalanced panel of raw de-seasonalized data.
    • Matlab code that takes the raw data as input and performs various operations, such as choosing the frequency, filling in missing values, transforming the data to stationarity, and controlling for covid outliers.
    • A pdf file with all information about the series names, sources, and transformation codes.

    This version (03.2025):

    Updated data as of 28-March-2025. We improved the Matlab code and included a ReadME file detailing the user-set parameters, which were previously only briefly commented in the code.
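    The official processing code is Matlab; for a quick look at the raw panels without Matlab, a minimal pandas sketch (the filename is hypothetical, and the growth-rate transform is only a naive stand-in for the documented transformation codes):

    import pandas as pd

    panel = pd.read_excel("EA_data.xlsx", index_col=0, parse_dates=True)  # hypothetical filename
    panel = panel.interpolate(limit_direction="both")  # crude stand-in for the code's fill-in step
    growth = panel.pct_change().dropna()               # naive transform toward stationarity
    print(growth.describe())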

  2. Independent Data Aggregation, Quality Control and Visualization of...

    • datasetcatalog.nlm.nih.gov
    Updated Oct 21, 2020
    Cite
    Ly, Chun; Knott, Cheryl; McCleary, Jill; Castiello-Gutiérrez, Santiago (2020). Independent Data Aggregation, Quality Control and Visualization of University of Arizona COVID-19 Re-Entry Testing Data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000484783
    Dataset updated
    Oct 21, 2020
    Authors
    Ly, Chun; Knott, Cheryl; McCleary, Jill; Castiello-Gutiérrez, Santiago
    Description

    Abstract

    The dataset provided here contains the efforts of independent data aggregation, quality control, and visualization of the University of Arizona (UofA) COVID-19 testing programs during the 2019 novel Coronavirus pandemic. The dataset is provided in the form of machine-readable tables in comma-separated value (.csv) and Microsoft Excel (.xlsx) formats.

    Additional Information

    As part of the UofA response to the 2019-20 Coronavirus pandemic, testing was conducted on students, staff, and faculty prior to the start of the academic year and throughout the school year. This testing was done at the UofA Campus Health Center and through their "Test All Test Smart" (TATS) program. These tests identify active cases of SARS-CoV-2 infection using the reverse transcription polymerase chain reaction (RT-PCR) test and the antigen test. Because the antigen test provided more rapid diagnosis, it was used heavily in the three weeks prior to the start of the Fall semester and throughout the academic year.

    As these tests were occurring, results were provided on the COVID-19 websites. First, beginning in early March, the Campus Health Alerts website reported the total number of positive cases. Later, numbers were provided for the total number of tests (March 12 and thereafter). According to the website, these numbers were updated daily for positive cases and weekly for total tests. These numbers were reported until early September, when they were folded into the reporting for the TATS program.

    For the TATS program, numbers were provided through the UofA COVID-19 Update website. Initially (August 21), the numbers provided were the total number of tests and positive cases (July 31 and thereafter). Later (August 25), additional information was provided when both PCR and antigen testing were available; here, the daily numbers were also included. On September 3, this website began providing both the Campus Health and TATS data; PCR and antigen were combined and referred to as "Total", and daily and cumulative numbers were provided.

    No official data dashboard was available until September 16, and aside from the information provided on these websites, the full dataset was not made publicly available. As such, the authors of this dataset independently aggregated data from multiple sources. These data were made publicly available through a Google Sheet, with graphical illustration provided through the spreadsheet and on social media. The goal of providing the data and illustrations publicly was to provide factual information and to understand the infection rate of SARS-CoV-2 in the UofA community.

    Because of differences in reported data between Campus Health and the TATS program, the dataset provides Campus Health numbers from September 3 onward. TATS numbers are provided beginning on August 14, 2020.

    Description of Dataset Content

    The following terms are used in describing the dataset.
    1. "Report Date" is the date and time at which the website was updated to reflect the new numbers
    2. "Test Date" is the date of testing/sample collection
    3. "Total" is the combination of Campus Health and TATS numbers
    4. "Daily" is the new data associated with the Test Date
    5. "To Date (07/31--)" provides the cumulative numbers from 07/31 onward
    6. "Sources" provides the source of information. The number prior to the colon refers to the number of sources. Here, "UACU" refers to the UA COVID-19 Update page, and "UARB" refers to the UA Weekly Re-Entry Briefing. "SS" and "WBM" refer to screenshots (manually acquired) and the "Wayback Machine" (see Reference section for links), with initials provided to indicate which author recorded the values. These screenshots are available in the records.zip file.

    The dataset is distinguished, where available, by testing program and method of testing. Where data are not available, calculations are made to fill in missing data (e.g., extrapolating backwards on the total number of tests based on daily numbers that are deemed reliable). Where errors are found (by comparing to previous numbers), those are reported on the above Google Sheet with specifics noted.

    For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu.

  3. Data from: DATABASE FOR THE ANALYSIS OF ROAD ACCIDENTS IN EUROPE

    • produccioncientifica.ugr.es
    • data.niaid.nih.gov
    • +1more
    Updated 2022
    Cite
    Navarro-Moreno, José; De Oña, Juan; Calvo-Poyo, Francisco (2022). DATABASE FOR THE ANALYSIS OF ROAD ACCIDENTS IN EUROPE [Dataset]. https://produccioncientifica.ugr.es/documentos/668fc484b9e7c03b01bdfcfc
    Dataset updated
    2022
    Authors
    Navarro-Moreno, José; De Oña, Juan; Calvo-Poyo, Francisco
    Area covered
    Europe
    Description

    This database can be used for macro-level analysis of road accidents on interurban roads in Europe. Through the variables it contains, road accidents can be explained using variables related to economic resources invested in roads, traffic, the road network, socioeconomic characteristics, legislative measures, and meteorology. This repository contains the data used for the analysis carried out in the papers:

    1. Calvo-Poyo F., Navarro-Moreno J., de Oña J. (2020) Road Investment and Traffic Safety: An International Study. Sustainability 12:6332. https://doi.org/10.3390/su12166332
    2. Navarro-Moreno J., Calvo-Poyo F., de Oña J. (2022) Influence of road investment and maintenance expenses on injured traffic crashes in European roads. Int J Sustain Transp 1–11. https://doi.org/10.1080/15568318.2022.2082344
    3. Navarro-Moreno, J., Calvo-Poyo, F., de Oña, J. (2022) Investment in roads and traffic safety: linked to economic development? A European comparison. Environ. Sci. Pollut. Res. https://doi.org/10.1007/s11356-022-22567

    The file with the database is available in Excel.

    DATA SOURCES

    The database presents data from 1998 up to 2016 from 20 European countries: Austria, Belgium, Croatia, Czechia, Denmark, Estonia, Finland, France, Germany, Ireland, Italy, Latvia, Netherlands, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden and United Kingdom. Crash data were obtained from the United Nations Economic Commission for Europe (UNECE) [2], which offers a sufficient level of disaggregation between crashes occurring inside versus outside built-up areas. With reference to the data on economic resources invested in roadways, deserving mention, given its extensive coverage, is the database of the Organisation for Economic Cooperation and Development (OECD), managed by the International Transport Forum (ITF) [1], which collects data on investment in the construction of roads and expenditure on their maintenance, following the definitions of the United Nations System of National Accounts (2008 SNA). Despite some data gaps, the time series are consistent from one country to the next. Moreover, to confirm that consistency and to complete missing data, diverse additional sources, mainly the national Transport Ministries of the respective countries, were consulted. All monetary values were converted to constant 2015 prices using the OECD price index.

    To obtain the rest of the variables in the database, as well as to ensure consistency in the time series and complete missing data, the following national and international sources were consulted:

    • Eurostat [3]
    • Directorate-General for Mobility and Transport (DG MOVE), European Union [4]
    • The World Bank [5]
    • World Health Organization (WHO) [6]
    • European Transport Safety Council (ETSC) [7]
    • European Road Safety Observatory (ERSO) [8]
    • European Climatic Energy Mixes (ECEM) of the Copernicus Climate Change Service [9]
    • EU BestPoint-Project [10]
    • Ministerstvo dopravy, Czech Republic [11]
    • Bundesministerium für Verkehr und digitale Infrastruktur, Germany [12]
    • Ministerie van Infrastructuur en Waterstaat, Netherlands [13]
    • National Statistics Office, Malta [14]
    • Ministério da Economia e Transição Digital, Portugal [15]
    • Ministerio de Fomento, Spain [16]
    • Trafikverket, Sweden [17]
    • Ministère de l’environnement de l’énergie et de la mer, France [18]
    • Ministero delle Infrastrutture e dei Trasporti, Italy [19–25]
    • Statistisk sentralbyrå, Norway [26-29]
    • Instituto Nacional de Estatística, Portugal [30]
    • Infraestruturas de Portugal S.A., Portugal [31–35]
    • Road Safety Authority (RSA), Ireland [36]

    DATABASE DESCRIPTION

    The database was built to combine the longest possible time period with the maximum number of countries having complete data (some countries, like Lithuania, Luxembourg, Malta and Norway, were eliminated from the definitive dataset owing to a lack of data or breaks in their time series). Taking the above into account, the definitive database is made up of 19 variables and contains data from 20 countries for the period between 1998 and 2016. The table below shows the coding of the variables, as well as their definition and unit of measure.

    Table. Database metadata

    Code | Variable and unit
    fatal_pc_km | Fatalities per billion passenger-km
    fatal_mIn | Fatalities per million inhabitants
    accid_adj_pc_km | Accidents per billion passenger-km
    p_km | Billions of passenger-km
    croad_inv_km | Investment in road construction per kilometer, €/km (2015 constant prices)
    croad_maint_km | Expenditure on road maintenance per kilometer, €/km (2015 constant prices)
    prop_motorwa | Proportion of motorways over the total road network (%)
    populat | Population, in millions of inhabitants
    unemploy | Unemployment rate (%)
    petro_car | Consumption of gasoline and petroleum derivatives (tons) per passenger car
    alcohol | Alcohol consumption, in liters per capita (age > 15)
    mot_index | Motorization index, in cars per 1,000 inhabitants
    den_populat | Population density, inhabitants/km2
    cgdp | Gross Domestic Product (GDP), in € (2015 constant prices)
    cgdp_cap | GDP per capita, in € (2015 constant prices)
    precipit | Average depth of rain water during a year (mm)
    prop_elder | Proportion of people over 65 years (%)
    dps | Demerit Point System, dummy variable (0: no; 1: yes)
    freight | Freight transport, in billions of ton-km

    ACKNOWLEDGEMENTS

    This database was produced in the framework of the project “Inversión en carreteras y seguridad vial: un análisis internacional (INCASE)”, financed by FEDER/Ministerio de Ciencia, Innovación y Universidades–Agencia Estatal de Investigación/Proyecto RTI2018-101770-B-I00, within Spain's National Program of R+D+i Oriented to Societal Challenges. Moreover, the authors would like to express their gratitude to the Ministry of Transport, Mobility and Urban Agenda of Spain (MITMA) and the Federal Ministry of Transport and Digital Infrastructure of Germany (BMVI) for providing data for this study.

    REFERENCES

    1. International Transport Forum. OECD iLibrary | Transport infrastructure investment and maintenance.
    2. United Nations Economic Commission for Europe. UNECE Statistical Database. Available online: https://w3.unece.org/PXWeb2015/pxweb/en/STAT/STAT_40-TRTRANS/?rxid=18ad5d0d-bd5e-476f-ab7c-40545e802eeb (accessed on Apr 28, 2020).
    3. European Commission. Database - Eurostat. Available online: https://ec.europa.eu/eurostat/data/database (accessed on Apr 28, 2021).
    4. Directorate-General for Mobility and Transport, European Commission. EU Transport in figures - Statistical Pocketbooks. Available online: https://ec.europa.eu/transport/facts-fundings/statistics_en (accessed on Apr 28, 2021).
    5. World Bank Group. World Bank Open Data | Data. Available online: https://data.worldbank.org/ (accessed on Apr 30, 2021).
    6. World Health Organization (WHO). WHO Global Information System on Alcohol and Health. Available online: https://apps.who.int/gho/data/node.main.GISAH?lang=en (accessed on Apr 29, 2021).
    7. European Transport Safety Council (ETSC). Traffic Law Enforcement across the EU - Tackling the Three Main Killers on Europe’s Roads; Brussels, Belgium, 2011.
    8. Copernicus Climate Change Service. Climate data for the European energy sector from 1979 to 2016 derived from ERA-Interim. Available online: https://cds.climate.copernicus.eu/cdsapp#!/dataset/sis-european-energy-sector?tab=overview (accessed on Apr 29, 2021).
    9. Klipp, S.; Eichel, K.; Billard, A.; Chalika, E.; Loranc, M.D.; Farrugia, B.; Jost, G.; Møller, M.; Munnelly, M.; Kallberg, V.P.; et al. European Demerit Point Systems: Overview of their main features and expert opinions. EU BestPoint-Project 2011, 1–237.
    10. Ministerstvo dopravy. Serie: Ročenka dopravy; Centrum dopravního výzkumu: Prague, Czech Republic.
    11. Bundesministerium für Verkehr und digitale Infrastruktur. Verkehr in Zahlen 2003/2004; Hamburg, Germany, 2004; ISBN 3871542946.
    12. Bundesministerium für Verkehr und digitale Infrastruktur. Verkehr in Zahlen 2018/2019. In Verkehrsdynamik; Flensburg, Germany, 2018; ISBN 9783000612947.
    13. Ministerie van Infrastructuur en Waterstaat. Rijksjaarverslag 2018 a Infrastructuurfonds; The Hague, Netherlands, 2019; ISBN 0921-7371.
    14. Ministerie van Infrastructuur en Milieu. Rijksjaarverslag 2014 a Infrastructuurfonds; The Hague, Netherlands, 2015; ISBN 0921-7371.
    15. Ministério da Economia e Transição Digital. Base de Dados de Infraestruturas - GEE. Available online: https://www.gee.gov.pt/pt/publicacoes/indicadores-e-estatisticas/base-de-dados-de-infraestruturas (accessed on Apr 29, 2021).
    16. Ministerio de Fomento, Dirección General de Programación Económica y Presupuestos, Subdirección General de Estudios Económicos y Estadísticas. Serie: Anuario estadístico; NIPO 161-13-171-0; Centro de Publicaciones, Secretaría General Técnica, Ministerio de Fomento: Madrid, Spain.
    17. Trafikverket. The Swedish Transport Administration Annual report: 2017; 2018; ISBN 978-91-7725-272-6.
    18. Ministère de l’Équipement, du T. et de la M. Mémento de statistiques des transports 2003; Ministère de l’environnement de l’énergie et de la mer, 2005.
    19. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2000; Istituto Poligrafico e Zecca dello Stato: Roma, Italy, 2001.
    20. Ministero delle Infrastrutture e dei Trasporti. Conto nazionale dei trasporti 1999. 2000.
    21. Generale, D.; Informativi, S. delle Infrastrutture e dei Trasporti Anno 2004.
    22. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2001; 2002.
    23. Ministero delle Infrastrutture e dei
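    A minimal pandas sketch of loading the Excel database and relating road spending to the fatality rate (the filename is hypothetical; the column codes come from the metadata table above):

    import pandas as pd

    db = pd.read_excel("road_accidents_europe_1998_2016.xlsx")  # hypothetical filename
    print(db.groupby("dps")["fatal_pc_km"].mean())   # fatality rate with/without a demerit point system
    print(db[["croad_inv_km", "croad_maint_km", "fatal_pc_km"]].corr())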

  4. Netflix Movies and TV Shows Dataset Cleaned(excel)

    • kaggle.com
    Updated Apr 8, 2025
    Cite
    Gaurav Tawri (2025). Netflix Movies and TV Shows Dataset Cleaned(excel) [Dataset]. https://www.kaggle.com/datasets/gauravtawri/netflix-movies-and-tv-shows-dataset-cleanedexcel
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gaurav Tawri
    Description

    This dataset is a cleaned and preprocessed version of the original Netflix Movies and TV Shows dataset available on Kaggle. All cleaning was done using Microsoft Excel — no programming involved.

    🎯 What’s Included:
    • Cleaned Excel file (standardized columns, proper date format, removed duplicates/missing values)
    • A separate "formulas_used.txt" file listing all Excel formulas used during cleaning (e.g., TRIM, CLEAN, DATE, SUBSTITUTE, TEXTJOIN, etc.)
    • Columns like 'date_added' have been properly formatted into DMY structure
    • Multi-valued columns like 'listed_in' are split for better analysis
    • Null values replaced with "Unknown" for clarity
    • Duration field broken into numeric + unit components

    🔍 Dataset Purpose: Ideal for beginners and analysts who want to:
    • Practice data cleaning in Excel
    • Explore Netflix content trends
    • Analyze content by type, country, genre, or date added

    📁 Original Dataset Credit: The base version was originally published by Shivam Bansal on Kaggle: https://www.kaggle.com/shivamb/netflix-shows

    📌 Bonus: You can find a step-by-step cleaning guide and the same dataset on GitHub as well — along with screenshots and formulas documentation.
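    For readers who later move from Excel to a scripting workflow, a rough pandas equivalent of the documented cleaning steps (the filename and column names follow the original Kaggle dataset and are assumptions):

    import pandas as pd

    df = pd.read_csv("netflix_titles.csv").drop_duplicates()
    df["title"] = df["title"].str.strip()                            # Excel TRIM
    df["date_added"] = pd.to_datetime(df["date_added"].str.strip())  # proper date format
    df[["duration_value", "duration_unit"]] = df["duration"].str.extract(r"(\d+)\s*(\w+)")
    text_cols = df.select_dtypes(include="object").columns
    df[text_cols] = df[text_cols].fillna("Unknown")                  # nulls -> "Unknown"
    df["listed_in"] = df["listed_in"].str.split(", ")                # split multi-valued genres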

  5. Cross Regional Eucalyptus Growth and Environmental Data

    • data.mendeley.com
    Updated Oct 7, 2024
    + more versions
    Cite
    Christopher Erasmus (2024). Cross Regional Eucalyptus Growth and Environmental Data [Dataset]. http://doi.org/10.17632/2m9rcy3dr9.3
    Dataset updated
    Oct 7, 2024
    Authors
    Christopher Erasmus
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is provided in a single .xlsx file named "eucalyptus_growth_environment_data_V2.xlsx" and consists of fifteen sheets:

    Codebook: This sheet details the index, values, and descriptions for each field within the dataset, providing a comprehensive guide to understanding the data structure.

    ALL NODES: Contains measurements from all devices, totalling 102,916 data points. This sheet aggregates the data across all nodes.

    GWD1 to GWD10: These subset sheets include measurements from individual nodes, labelled according to the abbreviation “Generic Wireless Dendrometer” followed by device IDs 1 through 10. Each sheet corresponds to a specific node, representing measurements from ten trees (or nodes).

    Metadata: Provides detailed metadata for each node, including species, initial diameter, location, measurement frequency, battery specifications, and irrigation status. This information is essential for identifying and differentiating the nodes and their specific attributes.

    Missing Data Intervals: Details gaps in the data stream, including start and end dates and times when data was not uploaded. It includes information on the total duration of each missing interval and the number of missing data points.

    Missing Intervals Distribution: Offers a summary of missing data intervals and their distribution, providing insight into data gaps and reasons for missing data.

    All nodes utilize LoRaWAN for data transmission. Please note that intermittent data gaps may occur due to connectivity issues between the gateway and the nodes, as well as maintenance activities or experimental procedures.

    Software considerations: The provided R code named “Simple_Dendro_Imputation_and_Analysis.R” is a comprehensive analysis workflow that processes and analyses Eucalyptus growth and environmental data from the "eucalyptus_growth_environment_data_V2.xlsx" dataset. The script begins by loading necessary libraries, setting the working directory, and reading the data from the specified Excel sheet. It then combines date and time information into a unified DateTime format and performs data type conversions for relevant columns. The analysis focuses on a specified device, allowing for the selection of neighbouring devices for imputation of missing data. A loop checks for gaps in the time series and fills in missing intervals based on a defined threshold, followed by a function that imputes missing values using the average from nearby devices. Outliers are identified and managed through linear interpolation. The code further calculates vapor pressure metrics and applies temperature corrections to the dendrometer data. Finally, it saves the cleaned and processed data into a new Excel file while conducting dendrometer analysis using the dendRoAnalyst package, which includes visualizations and calculations of daily growth metrics and correlations with environmental factors such as vapour pressure deficit (VPD).
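    The provided workflow is R; purely to illustrate the neighbour-based imputation idea it describes, a pandas sketch (the sheet choice and column names are assumptions):

    import pandas as pd

    xlsx = "eucalyptus_growth_environment_data_V2.xlsx"
    target = pd.read_excel(xlsx, sheet_name="GWD3").set_index("DateTime")  # assumed column name
    neighbours = [
        pd.read_excel(xlsx, sheet_name=s).set_index("DateTime")["diameter"]  # assumed column name
        for s in ("GWD2", "GWD4")
    ]
    neigh_mean = pd.concat(neighbours, axis=1).mean(axis=1)
    target["diameter"] = target["diameter"].fillna(neigh_mean)  # fill gaps from nearby trees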

  6. Soil Temperature Station Data from Permafrost Regions of Russia (Selection...

    • data.globalchange.gov
    • data.wu.ac.at
    Updated Feb 17, 2011
    + more versions
    Cite
    (2011). Soil Temperature Station Data from Permafrost Regions of Russia (Selection of Five Stations), 1880s - 2000 [Dataset]. https://data.globalchange.gov/dataset/nsidc-g02189
    Dataset updated
    Feb 17, 2011
    Description

    This data set includes soil temperature data from boreholes located at five stations in Russia: Yakutsk, Verkhoyansk, Pokrovsk, Isit', and Churapcha. The data have been compiled into five Microsoft Excel files, one for each station. Each Excel file contains three worksheets:

    • G02189info worksheet: Contains the same content in each Excel file - lat/lon info and notes on the stations
    • Jan soil & surface temp worksheet: Contains winter (January) soil temperature and air temperature (except for the Churapcha Excel file that only contains soil temperature - air temperature was not available)
    • Jul soil & surface temp worksheet: Contains summer (July) soil temperature and air temperature (except for the Churapcha Excel file)
    There are two different versions of the Excel files: a complete version and a subsetted version. Both versions exist for each of the five stations, for a total of 10 files. The complete versions of the files reside in the directory called complete and have the word full in their filename. These files contain borehole temperature data at all available standard depths: 0.2 m, 0.4 m, 0.6 m, 0.8 m, 1.2 m, 1.6 m, 2.0 m, 2.4 m, and 3.2 m. The subsetted versions of the files reside in the subset directory and have subset in their filename. These files contain data from the 0.8 m and 3.2 m depths only. Missing data are indicated by the value -999.0.

    The complete version is more applicable to scientific investigation. The subset version is provided for K-12 teachers and is featured in a classroom activity called "How Permanent is Permafrost?"

    We have included air temperature measured at these five stations when it is available. There are two sources for the surface air temperature data: the NCAR World Monthly Surface Station Climatology, 1738-cont, and the NOAA Global Historical Climatology Network (GHCN) Monthly data set. These two sources both draw on the same single original source: data from the World Meteorological Organization (WMO) station network. The complete files have data from one or both sources, while the subset files only include data from the source with the most complete record.

    These data are being offered as is. NOAA@NSIDC believes these data to be of value but is unable to research and document these data as we do most data sets we publish.
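    A minimal pandas sketch of reading one complete-version file (the filename is hypothetical; the sheet name and the -999.0 missing-data convention are documented above):

    import pandas as pd

    jan = pd.read_excel("yakutsk_full.xlsx", sheet_name="Jan soil & surface temp", na_values=-999.0)
    print(jan.describe())  # soil and air temperatures with missing values as NaN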

  7. Data from a Chinese measurement tool for the accessibility of death-related...

    • scidb.cn
    Updated Sep 20, 2025
    Cite
    chen xin yu (2025). Data from a Chinese measurement tool for the accessibility of death-related thoughts [Dataset]. http://doi.org/10.57760/sciencedb.psych.00724
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 20, 2025
    Dataset provided by
    Science Data Bank
    Authors
    chen xin yu
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The dataset was generated from a laboratory experiment based on the dot-matrix integration paradigm, designed to measure death thought accessibility (DTA). The study was conducted under controlled conditions, with participants tested individually in a quiet, dimly lit room. Stimulus presentation and response collection were implemented using PsychoPy (exact version number provided in the supplementary materials), and reaction times were recorded via a standard USB keyboard. Experimental stimuli consisted of five categories of two-character Chinese words rendered in dot-matrix form: death-related words, metaphorical-death words, positive words, neutral words, and meaningless words. Stimuli were centrally displayed on the screen, with presentation durations and inter-stimulus intervals (ISI) precisely controlled at the millisecond level.

    Data collection took place in spring 2025, with a total of 39 participants contributing approximately 16,699 valid trials. Each trial-level record includes participant ID, priming condition (0 = neutral priming, 1 = mortality salience priming), word type, inter-stimulus interval (in milliseconds), reaction time (in milliseconds), and recognition accuracy (0 = incorrect, 1 = correct). In the dataset, rows correspond to single trials and columns represent experimental variables. Reaction times were measured in milliseconds and later log-transformed for statistical analyses to reduce skewness. Accuracy was coded as a binary variable indicating correct recognition.

    Data preprocessing included the removal of extreme reaction times (less than 150 ms or greater than 3000 ms). Only trials with valid responses were retained for analysis. Missing data were minimal (<1% of all trials), primarily due to occasional non-responses by participants, and are explicitly marked in the dataset. Potential sources of error include natural individual variability in reaction times and minor recording fluctuations from input devices, which are within the millisecond range and do not affect overall patterns.

    The data files are stored in Excel format (.xlsx), with each participant’s data saved in a separate file named according to the participant ID. Within each file, the first row contains variable names, and subsequent rows record trial-level observations, allowing for straightforward data access and processing. Excel files are compatible with a wide range of statistical software, including R, Python, SPSS, and Microsoft Excel, and no additional software is required to open them. A supplementary documentation file accompanies the dataset, providing detailed explanations of all variables and data processing steps. A complete codebook of variable definitions is included in the appendix to facilitate data interpretation and ensure reproducibility of the analyses.
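    A minimal pandas sketch of the documented preprocessing on one participant file (the filename and column labels are assumptions):

    import numpy as np
    import pandas as pd

    df = pd.read_excel("P01.xlsx")                          # one participant file (hypothetical name)
    df = df[(df["rt_ms"] >= 150) & (df["rt_ms"] <= 3000)]   # drop extreme reaction times
    df["log_rt"] = np.log(df["rt_ms"])                      # log-transform to reduce skewness
    print(df.groupby("word_type")["log_rt"].mean())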

  8. A dataset from a survey investigating disciplinary differences in data...

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1more
    Updated Jul 12, 2024
    Cite
    Gregory, Kathleen (2024). A dataset from a survey investigating disciplinary differences in data citation [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_7555362
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Gregory, Kathleen
    Ninkov, Anton Boudreau
    Haustein, Stefanie
    Ripp, Chantal
    Peters, Isabella
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GENERAL INFORMATION

    Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation

    Date of data collection: January to March 2022

    Collection instrument: SurveyMonkey

    Funding: Alfred P. Sloan Foundation

    SHARING/ACCESS INFORMATION

    Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license

    Links to publications that cite or use the data:

    Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437

    Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266

    DATA & FILE OVERVIEW

    File List

    Filename: MDCDatacitationReuse2021Codebookv2.pdf Codebook

    Filename: MDCDataCitationReuse2021surveydatav2.csv Dataset format in csv

    Filename: MDCDataCitationReuse2021surveydatav2.sav Dataset format in SPSS

    Filename: MDCDataCitationReuseSurvey2021QNR.pdf Questionnaire

    Additional related data collected that was not included in the current data package: open-ended questions asked to respondents

    METHODOLOGICAL INFORMATION

    Description of methods used for collection/generation of data:

    The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.

    We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses, an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produces a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).

    Methods for processing the data:

    Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.

    Instrument- or software-specific information needed to interpret the data:

    The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded CSV format. The Codebook is required to interpret the values.

    DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata

    Number of variables: 95

    Number of cases/rows: 2,492

    Missing data codes: 999 (Not asked)

    Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
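    A minimal pandas sketch of loading the CSV release with the documented missing-data code mapped to NA:

    import pandas as pd

    df = pd.read_csv("MDCDataCitationReuse2021surveydatav2.csv", na_values=[999])  # 999 = Not asked
    print(df.shape)  # expect 2,492 rows and 95 variables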

  9. Dataset for "Cognitive behavioural therapy self-help intervention...

    • datasets.ai
    • data.niaid.nih.gov
    • +2more
    Updated Sep 21, 2022
    Cite
    EU Open Research Repository (2022). Dataset for "Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey" [Dataset]. https://datasets.ai/datasets/oai-zenodo-org-7104638
    Dataset updated
    Sep 21, 2022
    Dataset authored and provided by
    EU Open Research Repository
    Description

    Data and R code used for the analysis of data for the publication: Coumoundouros et al., "Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey", BMC Nephrology.

    Summary of study

    An online cross-sectional survey for informal caregivers (e.g. family and friends) of people living with chronic kidney disease in the United Kingdom. The study aimed to examine informal caregivers' cognitive behavioural therapy self-help intervention preferences and to describe the caregiving situation (e.g. types of care activities) and informal caregivers' mental health (depression, anxiety and stress symptoms). Participants were eligible to participate if they were at least 18 years old, lived in the United Kingdom, and provided unpaid care to someone living with chronic kidney disease who was at least 18 years old. The online survey included questions regarding (1) informal caregiver's characteristics; (2) care recipient's characteristics; (3) intervention preferences (e.g. content, delivery format); and (4) informal caregiver's mental health. Informal caregivers' mental health was assessed using the 21-item Depression, Anxiety, and Stress Scale (DASS-21), which is composed of three subscales measuring depression, anxiety, and stress, respectively. Sixty-five individuals participated in the survey. See the published article for full study details.

    Description of uploaded files

    1. ENTWINE_ESR14_Kidney Carer Survey Data_FULL_2022-08-30: Excel file with the complete, raw survey data. Note: the first half of participants' postal codes was collected; however, this data was removed from the uploaded dataset to ensure participant anonymity.
    2. ENTWINE_ESR14_Kidney Carer Survey Data_Clean DASS-21 Data_2022-08-30: Excel file with cleaned data for the DASS-21 scale. Data cleaning involved imputation of missing data if participants were missing data for one item within a subscale of the DASS-21. Missing values were imputed with the mean of all other items within the relevant subscale.
    3. ENTWINE_ESR14_Kidney Carer Survey_KEY_2022-08-30: Excel file with a key linking item labels in the uploaded datasets to the corresponding survey questions.
    4. R Code for Kidney Carer Survey_2022-08-30: R file of the R code used to analyse the survey data.
    5. R code for Kidney Carer Survey_PDF_2022-08-30: PDF file of the R code used to analyse the survey data.
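    The uploaded analysis code is in R; purely as an illustration, the subscale imputation rule described for file 2 can be sketched in pandas (the item column labels are assumptions):

    import pandas as pd

    df = pd.read_excel("ENTWINE_ESR14_Kidney Carer Survey Data_FULL_2022-08-30.xlsx")
    depression_items = [f"dass_{i}" for i in (3, 5, 10, 13, 16, 17, 21)]  # assumed labels

    sub = df[depression_items]
    one_missing = sub.isna().sum(axis=1) == 1          # exactly one item missing in the subscale
    df.loc[one_missing, depression_items] = sub.loc[one_missing].apply(
        lambda row: row.fillna(row.mean()), axis=1     # impute with the mean of the other items
    )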

  10. A survey of college students' psychological dependence on AIGC

    • scidb.cn
    Updated Nov 12, 2025
    Cite
    xu huan (2025). A survey of college students' psychological dependence on AIGC [Dataset]. http://doi.org/10.57760/sciencedb.31471
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 12, 2025
    Dataset provided by
    Science Data Bank
    Authors
    xu huan
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This dataset is derived from a questionnaire survey on the psychological dependence of college students on generative AI software. The data was collected via an online questionnaire from January 24, 2025 to February 6, 2025, covering college students from multiple universities in Yunnan Province. The questionnaire design includes multiple dimensions such as basic information, usage behavior, psychological dependence, negative emotional experience, and self-efficacy, with a total of 1110 valid sample records. The data has been anonymized and does not contain any personal identification information. All responses were filled out by the participants themselves.

    In the data file, each row represents the complete answers of one respondent, and column labels include serial number, gender, grade level, major category, whether generative AI has been used, commonly used software types, frequency of use, start time, motivation for use, impact on learning efficiency, recommendation intention, attitude towards prohibition of use, future use intention, level of trust in AI, dependency behavior, anxiety and emotional reactions, self-efficacy, and other aspects. Some of the questions were scored on a five-point Likert scale, and some were single-choice or multiple-choice questions. Some questions, such as "Have you used generative AI before?", trigger automatic skips when not applicable, resulting in a missing value of "0" in the corresponding columns; this is a reasonable loss by design logic.

    There may be self-report bias in the data collection process, and some questions involve subjective evaluations of psychological states, introducing a degree of subjective error. In the data processing stage, preliminary cleaning was carried out for issues such as outliers and duplicate submissions to ensure the validity and consistency of the data. The data file is in Excel format (.xlsx) and can be opened and processed using common spreadsheet software such as Microsoft Excel, WPS Spreadsheets, or Google Sheets. This dataset is suitable for empirical research in fields such as educational technology, psychology, and information behavior, especially for exploring the psychological and behavioral characteristics of college students in their interaction with generative AI.
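    When analysing the file, the structural "0" produced by the skip logic should not be treated as a substantive answer; a minimal pandas sketch (the filename, column labels, and 0/1 coding of the filter question are assumptions):

    import numpy as np
    import pandas as pd

    df = pd.read_excel("aigc_survey.xlsx")            # hypothetical filename
    users = df[df["has_used_generative_ai"] == 1]     # restrict follow-ups to actual users
    followups = ["frequency_of_use", "trust_in_ai"]   # assumed follow-up columns
    print(users[followups].replace(0, np.nan).describe())  # 0 = structural skip, not an answer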

  11. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the used notation: user stories or use cases
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement of how well the student explained the derivation process: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present

    Tagging scheme:
    Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
    Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated, describing the size ratio. We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model; it is calculated by dividing the number of aligned concepts (AL) by the sum of the aligned (AL), omitted (OM), system-oriented (SO), and wrongly represented (WR) concepts. Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model; it is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
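    In symbols, with the tag counts defined above:

    correctness = AL / (AL + OM + SO + WR)
    completeness = (AL + WR) / (AL + WR + OM)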

    For Sheet 4, as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (t-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation (UC, US).

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case (SIM, HOS, IFA).

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process was explained (well explained, partially explained, not present).

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by exam grade, converted to the categorical values High, Low, and Medium.

  12. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Available download formats: zip (1203633 bytes)
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

    CC0 1.0 Universal (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

    We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility of systematizing the membership roster and generating different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

    • Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: all UPPERCASE)
    • Get the COUNTRY NAME to which these sportsmen belong. Make use of the LOCATION sheet to get the required data
    • Populate the LANGUAGE SPOKEN by the sportsmen. Make use of the LOCATION sheet to get the required data
    • Generate the EMAIL ADDRESS for members who speak English in the prescribed format lastname.firstname@xyz.org (Note: all lowercase); for all other members the format should be lastname.firstname@xyz.com (Note: all lowercase)
    • Populate the SPORT LOCATION of the sport played by each player. Make use of the SPORT sheet to get the required data

    TASK 2: DATA FORMATTING

    • Display MEMBER ID always as a 3-digit number (Note: 001, 002, ..., 020, ..., etc.)
    • Format the BIRTHDATE as dd mmm'yyyy (Prescribed format example: 09 May'1986)
    • Display the units for the WEIGHT column (Prescribed format example: 80 kg)
    • Format the SALARY to show the data in thousands. If SALARY is less than 100,000 then display the data with 2 decimal places, else display the data with one decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 123250 -> 123.2 k (see the sketch below)
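    A minimal Python sketch of the SALARY rule, purely to pin down the rounding (the function name is illustrative):

    def format_salary(salary: float) -> str:
        # below 100,000 -> 2 decimal places; otherwise 1; both shown in thousands
        decimals = 2 if salary < 100_000 else 1
        return f"{salary / 1000:.{decimals}f} k"

    print(format_salary(87670))   # 87.67 k
    print(format_salary(123250))  # 123.2 k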

    TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details:

    • In COLUMNS, group: GENDER.
    • In ROWS, group: COUNTRY (Note: use COUNTRY NAMES).
    • In VALUES, calculate the count of candidates for each COUNTRY and GENDER type. Remove GRAND TOTALs.

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

    • Starting from range H4, get the distinct GENDER values. Use the remove-duplicates option and transpose the data.
    • Starting from range G5, get the distinct COUNTRY values (Note: use COUNTRY NAMES).
    • In the cross table, get the count of candidates for each COUNTRY and GENDER type.

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

    • Verified the data for missing values and anomalies, and sorted out the same.
    • Made sure the data was consistent and clean with respect to data type, data format and values used.
    • Created pivot tables according to the questions asked.
  13. Help Me study! Music Listening Habits While Studying (Dataset)

    • data.niaid.nih.gov
    Updated Apr 4, 2024
    Cite
    Cheah, Yiting (2024). Help Me study! Music Listening Habits While Studying (Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10085103
    Dataset updated
    Apr 4, 2024
    Dataset provided by
    University of Liverpool
    Authors
    Cheah, Yiting
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the raw data used for a research study that examined university students' music listening habits while studying. There are two experiments in this research study. Experiment 1 is a retrospective survey, and Experiment 2 is a mobile experience sampling research study. This repository contains five Microsoft Excel files with data obtained from both experiments. The files are as follows:

    • onlineSurvey_raw_data.xlsx
    • esm_raw_data.xlsx
    • esm_music_features_analysis.xlsx
    • esm_demographics.xlsx
    • index.xlsx

    Files Description

    File: onlineSurvey_raw_data.xlsx

    This file contains the raw data from Experiment 1, including the (anonymised) demographic information of the sample. The sample characteristics recorded are:

    • studentship
    • area of study
    • country of study
    • type of accommodation a participant was living in
    • age
    • self-identified gender
    • language ability (mono- or bi-/multilingual)
    • (various) personality traits
    • (various) musicianship
    • (various) everyday music uses
    • (various) music capacity

    The file also contains the raw responses to the questions about participants' music listening habits while studying in real life. These pieces of data are:

    • likelihood of listening to specific music genres (rated across 23 genres) while studying and during everyday listening
    • likelihood of listening to music with specific acoustic features (e.g., with/without lyrics, loud/soft, fast/slow) while studying and during everyday listening
    • general likelihood of listening to music while studying in real life
    • (verbatim) written responses to the open-ended questions about participants' real-life music listening habits while studying

    File: esm_raw_data.xlsx

    This file contains the raw data from Experiment 2, including the following variables:

    • information on the music tracks (track name, artist name, and, if available, Spotify ID) each participant was listening to during each music episode (both while studying and during everyday listening)
    • level of arousal at the onset of music playing and at the end of the 30-minute study period
    • level of valence at the onset of music playing and at the end of the 30-minute study period
    • specific mood at the onset of music playing and at the end of the 30-minute study period
    • whether participants were studying
    • their location at that moment (if studying)
    • whether they were studying alone (if studying)
    • the types of study tasks (if studying)
    • the perceived level of difficulty of the study task
    • whether participants were planning to listen to music while studying
    • (various) reasons for music listening
    • (various) perceived positive and negative impacts of studying with music

    Each row represents the data for a single participant. Rows with a record of a participant ID but no associated data indicate that the participant did not respond to the questionnaire (i.e., missing data).

    File: esm_music_features_analysis.xlsx

    This file presents the music features of each recorded music track during both the study-episodes and the everyday-episodes (retrieved from Spotify's "Get Track's Audio Features" API). These features are:

    • energy level
    • loudness
    • valence
    • tempo
    • mode

    The contextual details of the moments each track was being played are also presented here, which include:

    • whether the participant was studying
    • their location (e.g., at home, cafe, university)
    • whether they were studying alone
    • the type of study task they were engaging with (e.g., reading, writing)
    • the perceived difficulty level of the task

    File: esm_demographics.xlsx

    This file contains the demographics of the sample in Experiment 2 (N = 10), which are the same as in Experiment 1 (see above). Each row represents the data for a single participant. Rows with a record of a participant ID but no associated demographic data indicate that the participant did not respond to the questionnaire (i.e., missing data).

    File: index.xlsx

    Finally, this file contains all the abbreviations used in each document as well as their explanations.
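    The track features listed above correspond to Spotify's "Get Track's Audio Features" endpoint; a minimal sketch of retrieving them for one track (the token handling is elided and the track ID is an arbitrary example):

    import requests

    TOKEN = "..."  # a valid OAuth access token from Spotify's authorization flow
    track_id = "11dFghVXANMlKmJXsNCbNl"  # example Spotify track ID
    resp = requests.get(
        f"https://api.spotify.com/v1/audio-features/{track_id}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    features = resp.json()
    print({k: features[k] for k in ("energy", "loudness", "valence", "tempo", "mode")})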

  14. Retail Store Sales: Dirty for Data Cleaning

    • kaggle.com
    zip
    Updated Jan 18, 2025
    Cite
    Ahmed Mohamed (2025). Retail Store Sales: Dirty for Data Cleaning [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/retail-store-sales-dirty-for-data-cleaning
    Available download formats: zip (226740 bytes)
    Dataset updated
    Jan 18, 2025
    Authors
    Ahmed Mohamed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dirty Retail Store Sales Dataset

    Overview

    The Dirty Retail Store Sales dataset contains 12,575 rows of synthetic data representing sales transactions from a retail store. The dataset includes eight product categories with 25 items per category, each having static prices. It is designed to simulate real-world sales data, including intentional "dirtiness" such as missing or inconsistent values. This dataset is suitable for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.

    File Information

    • File Name: retail_store_sales.csv
    • Number of Rows: 12,575
    • Number of Columns: 11

    Columns Description

    Column Name | Description | Example Values
    Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567
    Customer ID | A unique identifier for each customer. 25 unique customers. | CUST_01
    Category | The category of the purchased item. | Food, Furniture
    Item | The name of the purchased item. May contain missing values or None. | Item_1_FOOD, None
    Price Per Unit | The static price of a single unit of the item. May contain missing or None values. | 4.00, None
    Quantity | The quantity of the item purchased. May contain missing or None values. | 1, None
    Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, None
    Payment Method | The method of payment used. May contain missing or invalid values. | Cash, Credit Card
    Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Online
    Transaction Date | The date of the transaction. Always present and valid. | 2023-01-15
    Discount Applied | Indicates if a discount was applied to the transaction. May contain missing values. | True, False, None
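    As an example of the cleaning exercise this table invites, a minimal pandas sketch that reconstructs a missing member of the Quantity / Price Per Unit / Total Spent triple (treating literal "None" entries as missing is an assumption):

    import pandas as pd

    df = pd.read_csv("retail_store_sales.csv", na_values=["None"])
    q, p, t = df["Quantity"], df["Price Per Unit"], df["Total Spent"]
    df["Total Spent"] = t.fillna(q * p)     # recompute the total where both factors exist
    df["Quantity"] = q.fillna(t / p)        # back out quantity from total and unit price
    df["Price Per Unit"] = p.fillna(t / q)
    unresolved = df[["Quantity", "Price Per Unit", "Total Spent"]].isna().any(axis=1)
    print(f"{unresolved.sum()} rows still need manual attention")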

    Categories and Items

    The dataset includes the following categories, each containing 25 items with corresponding codes, names, and static prices:

    Electric Household Essentials

    Item Code | Item Name | Price
    Item_1_EHE | Blender | 5.0
    Item_2_EHE | Microwave | 6.5
    Item_3_EHE | Toaster | 8.0
    Item_4_EHE | Vacuum Cleaner | 9.5
    Item_5_EHE | Air Purifier | 11.0
    Item_6_EHE | Electric Kettle | 12.5
    Item_7_EHE | Rice Cooker | 14.0
    Item_8_EHE | Iron | 15.5
    Item_9_EHE | Ceiling Fan | 17.0
    Item_10_EHE | Table Fan | 18.5
    Item_11_EHE | Hair Dryer | 20.0
    Item_12_EHE | Heater | 21.5
    Item_13_EHE | Humidifier | 23.0
    Item_14_EHE | Dehumidifier | 24.5
    Item_15_EHE | Coffee Maker | 26.0
    Item_16_EHE | Portable AC | 27.5
    Item_17_EHE | Electric Stove | 29.0
    Item_18_EHE | Pressure Cooker | 30.5
    Item_19_EHE | Induction Cooktop | 32.0
    Item_20_EHE | Water Dispenser | 33.5
    Item_21_EHE | Hand Blender | 35.0
    Item_22_EHE | Mixer Grinder | 36.5
    Item_23_EHE | Sandwich Maker | 38.0
    Item_24_EHE | Air Fryer | 39.5
    Item_25_EHE | Juicer | 41.0

    Furniture

    Item Code     | Item Name     | Price
    Item_1_FUR    | Office Chair  | 5.0
    Item_2_FUR    | Sofa          | 6.5
    Item_3_FUR    | Coffee Table  | 8.0
    Item_4_FUR    | Dining Table  | 9.5
    Item_5_FUR    | Bookshelf     | 11.0
    Item_6_FUR    | Bed F...
  15. bikes_data_cleaned_02.2022_01.2023

    • kaggle.com
    zip
    Updated Feb 26, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mazapán Lindsey (2023). bikes_data_cleaned_02.2022_01.2023 [Dataset]. https://www.kaggle.com/datasets/mazapnlindsey/bikes-data-cleaned-022022-012023
    Explore at:
    zip(745510749 bytes)Available download formats
    Dataset updated
    Feb 26, 2023
    Authors
    Mazapán Lindsey
    Description

    From the Google Data Analytics Certificate course, case study 1: Beginning in 2016, the fictional company Cyclistic launched a successful bike-share offering. The program has grown to a fleet of 5,824 geotracked bicycles and 692 docking stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.

    This dataset contains cleaned data for the 12-month period 02/2022 - 01/2023. The cleaning process is as follows (a rough pandas equivalent is sketched after the list) and is also documented within the "How a Bike-Share Company Navigates Speedy Success" notebook:

    • downloaded the zip files
    • opened them in an Excel workbook: 12 sheets, one for each CSV file
    • formatted cells: converted the dates (started_at and ended_at columns) into mm/dd/yyyy hh:mm format in Excel on each sheet
    • made sure there were no duplicates in the ride_id column by applying conditional formatting for duplicate cell values
    • created a column named ride_length; the formula is ended_at - started_at, formatted as hh:mm:ss
    • created a column named day_of_week; the formula is =WEEKDAY() based on started_at, where 1 = Sunday and 7 = Saturday. Verified the first row of each sheet was accurate via Google
    • noted there were missing values in the start_station_name, end_station_name, and end_station_id columns. Since longitude and latitude values were present for each of these rows (columns start_lat, start_lng, end_lat, end_lng), determined that the missing data was not detrimental to this analysis
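
    A rough pandas equivalent of these spreadsheet steps, assuming the trip-data column names mentioned above (the file name is hypothetical):

        import pandas as pd

        df = pd.read_csv("202202-tripdata.csv")  # one of the 12 monthly files (hypothetical name)

        # Parse the timestamp columns (the mm/dd/yyyy hh:mm formatting step).
        df["started_at"] = pd.to_datetime(df["started_at"])
        df["ended_at"] = pd.to_datetime(df["ended_at"])

        # Duplicate check on ride_id (the conditional-formatting step).
        print("duplicate ride_ids:", df["ride_id"].duplicated().sum())

        # ride_length = ended_at - started_at.
        df["ride_length"] = df["ended_at"] - df["started_at"]

        # day_of_week matching Excel's WEEKDAY(): 1 = Sunday ... 7 = Saturday.
        # pandas dayofweek is 0 = Monday ... 6 = Sunday.
        df["day_of_week"] = (df["started_at"].dt.dayofweek + 1) % 7 + 1

        # Missing station fields are tolerable because lat/lng are present.
        print(df[["start_station_name", "end_station_name", "end_station_id"]].isna().sum())
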
  16. Data from: Student Academic Performance Dataset

    • kaggle.com
    Updated Oct 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hackathon data (2025). Student Academic Performance Dataset [Dataset]. https://www.kaggle.com/datasets/aryancodes12fyds/student-academic-performance-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Hackathon data
    License

    CC0 1.0 Universal (Public Domain Dedication)https://creativecommons.org/publicdomain/zero/1.0/

    Description

    📘 Description

    The Student Academic Performance Dataset contains detailed academic and lifestyle information of 250 students, created to analyze how various factors — such as study hours, sleep, attendance, stress, and social media usage — influence their overall academic outcomes and GPA.

    This dataset is synthetic but realistic, carefully generated to reflect believable academic patterns and relationships. It’s perfect for learning data analysis, statistics, and visualization using Excel, Python, or R.

    The data includes 12 attributes, primarily numerical, ensuring that it’s suitable for a wide range of analytical tasks — from basic descriptive statistics (mean, median, SD) to correlation and regression analysis.

    📊 Key Features

    🧮 250 rows and 12 columns

    💡 Mostly numerical — great for Excel-based statistical functions

    🔍 No missing values — ready for direct use

    📈 Balanced and realistic — ideal for clear visualizations and trend analysis

    🎯 Suitable for:

    Descriptive statistics

    Correlation & regression

    Data visualization projects

    Dashboard creation (Excel, Tableau, Power BI)

    💡 Possible Insights to Explore

    How do study hours impact GPA?

    Is there a relationship between stress levels and performance?

    Does social media usage reduce study efficiency?

    Do students with higher attendance achieve better grades?

    ⚙️ Data Generation Details

    Each record represents a unique student.

    GPA is calculated using a weighted formula based on midterm and final scores.

    Relationships are designed to be realistic — for example:

    Higher study hours → higher scores and GPA

    Higher stress → slightly lower sleep hours

    Excessive social media time → reduced academic performance
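
    A hypothetical sketch of this kind of generation process in Python. The weights (40/60 for midterm/final), ranges, coefficients, and noise levels below are illustrative assumptions, not the values used to build the dataset:

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(42)
        n = 250

        study_hours = rng.uniform(0, 10, n)    # hours studied per day (assumed range)
        social_media = rng.uniform(0, 8, n)    # hours on social media (assumed range)
        stress = rng.integers(1, 11, n)        # stress level, 1-10 (assumed scale)

        # Higher stress -> slightly lower sleep (assumed coefficient).
        sleep = 8 - 0.15 * stress + rng.normal(0, 0.5, n)

        # Study hours help, excessive social media hurts (assumed effects).
        midterm = np.clip(50 + 4 * study_hours - 2 * social_media + rng.normal(0, 5, n), 0, 100)
        final = np.clip(50 + 4.5 * study_hours - 2 * social_media + rng.normal(0, 5, n), 0, 100)

        # GPA as a weighted formula over midterm and final (assumed 40/60 split).
        gpa = np.round((0.4 * midterm + 0.6 * final) / 100 * 4, 2)

        students = pd.DataFrame({
            "study_hours": study_hours, "social_media_hours": social_media,
            "stress_level": stress, "sleep_hours": sleep,
            "midterm_score": midterm, "final_score": final, "gpa": gpa,
        })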

    ⚠️ Disclaimer

    This dataset is synthetically generated using statistical modeling techniques and does not contain any real student data. It is intended purely for educational, analytical, and research purposes.

  17. Hive Annotation Job Results - Cleaned and Audited

    • kaggle.com
    zip
    Updated Apr 28, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Brendan Kelley (2021). Hive Annotation Job Results - Cleaned and Audited [Dataset]. https://www.kaggle.com/brendankelley/hive-annotation-job-results-cleaned-and-audited
    Explore at:
    zip(471571 bytes)Available download formats
    Dataset updated
    Apr 28, 2021
    Authors
    Brendan Kelley
    Description

    Context

    This notebook serves to showcase my problem-solving ability, knowledge of the data analysis process, proficiency with Excel and its various tools and functions, as well as my strategic mindset and statistical prowess. This project consists of an auditing prompt provided by Hive Data, a raw Excel data set, a cleaned and audited version of the raw Excel data set, and a description of my thought process and the knowledge used during completion of the project. The prompt can be found below:

    Hive Data Audit Prompt

    The raw data that accompanies the prompt can be found below:

    Hive Annotation Job Results - Raw Data

    ^ These are the tools I was given to complete my task. The rest of the work is entirely my own.

    To summarize broadly, my task was to audit the dataset and summarize my process and results. Specifically, I was to create a method for identifying which "jobs" - explained in the prompt above - needed to be rerun based on a set of "background facts," or criteria. The description of my extensive thought process and results can be found below in the Content section.

    Content

    Brendan Kelley April 23, 2021

    Hive Data Audit Prompt Results

    This paper explains the auditing process of the “Hive Annotation Job Results” data. It includes the preparation, analysis, visualization, and summary of the data. It is accompanied by the results of the audit in the Excel file “Hive Annotation Job Results – Audited”.

    Observation

    The “Hive Annotation Job Results” data comes in the form of a single Excel sheet. It contains 7 columns and 5,001 rows, including column headers. The data includes “file”, “object id”, and pseudonyms for the five questions that each client was instructed to answer about their respective table: “tabular”, “semantic”, “definition list”, “header row”, and “header column”. The “file” column includes non-unique numbers (that is, there are multiple instances of the same value in the column) separated by a dash. The “object id” column includes non-unique numbers ranging from 5 to 487539. The columns containing the answers to the five questions include Boolean values - TRUE or FALSE - which depend upon the yes/no worker judgement.

    Use of the COUNTIF() function reveals that there are no values other than TRUE or FALSE in any of the five question columns. The VLOOKUP() function reveals that the data does not include any missing values in any of the cells.

    Assumptions

    Based on the clean state of the data and the guidelines of the Hive Data Audit Prompt, the assumption is that duplicate values in the “file” column are acceptable and should not be removed. Similarly, duplicated values in the “object id” column are acceptable and should not be removed. The data is therefore clean and is ready for analysis/auditing.

    Preparation

    The purpose of the audit is to analyze the accuracy of the yes/no worker judgement of each question according to the guidelines of the background facts. The background facts are as follows:

    • A table that is a definition list should automatically be tabular and also semantic
    • Semantic tables should automatically be tabular
    • If a table is NOT tabular, then it is definitely not semantic nor a definition list
    • A tabular table that has a header row OR header column should definitely be semantic

    These background facts serve as instructions for how the answers to the five questions should interact with one another. These facts can be re-written to establish criteria for each question:

    For the tabular column:
    • If the table is a definition list, it is also tabular
    • If the table is semantic, it is also tabular

    For the semantic column:
    • If the table is a definition list, it is also semantic
    • If the table is not tabular, it is not semantic
    • If the table is tabular and has either a header row or a header column...
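
    These criteria map directly onto row-level consistency checks. As a sketch only (the write-up's actual method used Excel), the same audit could be expressed in pandas, assuming the five answer columns carry the pseudonyms given above and a hypothetical file name:

        import pandas as pd

        df = pd.read_excel("hive_annotation_job_results.xlsx")  # hypothetical file name

        t, s, d = df["tabular"], df["semantic"], df["definition list"]
        header = df["header row"] | df["header column"]

        # Each background fact becomes a violation flag; rows that trip any
        # flag belong to jobs that may need to be rerun.
        violations = (
            (d & ~(t & s))        # a definition list must be tabular and semantic
            | (s & ~t)            # a semantic table must be tabular
            | (~t & (s | d))      # not tabular -> neither semantic nor definition list
            | (t & header & ~s)   # tabular with a header row/column -> semantic
        )
        print(f"{int(violations.sum())} of {len(df)} rows violate the background facts")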

  18. Coachella 2024 Artist and Lineup Data

    • kaggle.com
    zip
    Updated Apr 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Micah Lawrence (2024). Coachella 2024 Artist and Lineup Data [Dataset]. https://www.kaggle.com/datasets/micahlawrence/coachella-2024-artist-and-lineup-data/data
    Explore at:
    zip(14507 bytes)Available download formats
    Dataset updated
    Apr 30, 2024
    Authors
    Micah Lawrence
    License

    CC0 1.0 Universal (Public Domain Dedication)https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A dataset of Coachella 2024 artists complete with lineup data, artist data and Spotify artist data.

    Dataset derived from: https://docs.google.com/spreadsheets/d/1m7_Be2CPBGcqt4duMWRHmgomdLK_YjNSNNPfuBhf9Js/edit#gid=1826236554

    Source: Data found on r/coachella from the user natnav_

    Data cleaning notes from source:

    • The stage is NULL for several artists, so I referenced the lineup and filled in the stages that were missing.
    • The Spotify Listeners data has Ms and Ks for millions and thousands, so I removed those in Excel and appended the appropriate 0s (a scripted version of this conversion is sketched after the list).
    • I also added additional Spotify data sourced from https://songstats.com/platforms/spotify
    • Gender had the type of artist in parentheses next to it. I'd like to analyze that data, so I made a new field called type and used Text to Columns in Excel, separated by space, to move it over. I added Solo when it was just one gender.
    • Separated out Artist into a table with genre and country and implemented an artist_id on the other tables to allow for easier joining and less repetitive data storage.
    • Note that for b2b DJ sets, the artists were separated into individual artist lines and are presented on the lineup as separate artists playing on the stage on that day.
    • I ignored whether certain artists were weekend 1 or weekend 2 only, in order to include all of them.
    • If an artist dropped out before the lineup announcement, they were removed as well (i.e., Tyla).
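
    The Ms-and-Ks conversion above is easy to script; a small illustrative Python version (the function name is hypothetical):

        def listeners_to_int(value: str) -> int:
            """Convert listener strings like '2.3M' or '850K' to plain integers."""
            value = value.strip().upper()
            multipliers = {"M": 1_000_000, "K": 1_000}
            if value and value[-1] in multipliers:
                return int(float(value[:-1]) * multipliers[value[-1]])
            return int(float(value))

        assert listeners_to_int("2.3M") == 2_300_000
        assert listeners_to_int("850K") == 850_000
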
  19. AIG Actuarial Analyst

    • kaggle.com
    zip
    Updated Mar 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ankur kumar (2025). AIG Actuarial Analyst [Dataset]. https://www.kaggle.com/datasets/ankurkumar7078/aig-actuarial-analyst
    Explore at:
    zip(96850 bytes)Available download formats
    Dataset updated
    Mar 14, 2025
    Authors
    Ankur kumar
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Instructions

    1. Analyze the dataset using the claims data resource

    • Examine the data: Start by thoroughly examining the dataset within the Claims Data resource. Focus on key variables such as claim dates, types of claims, amounts claimed, and additional details about the incidents.
    • Manipulate the data: Derive the missing values in columns F, O, P, and Q. Use hints if needed. This step emphasizes data manipulation, a key component of account pricing analysis.
    • Identify patterns and anomalies: Conduct EDA using the data in the Claims Data resource. Identify patterns, trends, and anomalies. Utilize visual tools such as histograms, scatter plots, and bar charts within Excel to help you visualize and interpret the data.

    2. Apply actuarial principles to the data

    • Risk assessment: Use the actuarial principles you learned in Task 1 to assess the risks associated with the claims data. Calculate key metrics such as claim frequency, severity, and loss ratios based on the data provided (an illustrative version of this calculation is sketched after these instructions).
    • Calculate premiums: Develop a pricing model using experience-based rating. This involves adjusting historical data from the Claims Data resource to project future claims costs, considering factors such as inflation and changes in exposure.

    3. Develop comprehensive reports in Excel

    • Analysis report: Compile your findings and organize your EDA into a well-structured section within the Excel workbook. This section should include a detailed evaluation of the Marine Liability insurance claims data, visualizations of key findings, and a commentary on observed trends and anomalies.
    • Commentary on risks and uncertainties: Provide a clear commentary on the risks and uncertainties associated with your assessment. Discuss how different scenarios could impact the pricing model and the potential financial implications for Oceanic Shipping Co.
    • Pricing calculation: Perform a numbers-based premium calculation using the Claims Data resource to determine the appropriate premiums for the Marine Liability insurance policy. Apply actuarial principles such as loss frequency, loss severity, and pure premium calculation, and adjust for expenses and profit margins.
    • Sensitivity analysis: Include a sensitivity analysis within the Excel workbook to assess how changes in key assumptions (e.g., an increase in loss severity) could impact the final premium.
    • Document your calculations: Ensure your premium calculation section in Excel clearly documents your methodology, assumptions, and final premium recommendations. Discuss the potential risks and uncertainties in your pricing model, including any external factors that could impact future claims.
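
    As a purely illustrative aid (the figures and column names below are invented, not taken from the Claims Data resource), the core frequency/severity arithmetic the instructions describe looks like this:

        import pandas as pd

        # Hypothetical claims experience for one policy year.
        claims = pd.DataFrame({"claim_amount": [12_000, 45_000, 8_500, 30_000]})
        exposure_units = 200      # e.g., insured vessel-years (assumed)
        expense_loading = 0.25    # expenses + profit as a share of gross premium (assumed)

        frequency = len(claims) / exposure_units      # claims per exposure unit
        severity = claims["claim_amount"].mean()      # average cost per claim
        pure_premium = frequency * severity           # expected loss per exposure unit

        # Gross up for expenses and profit margin.
        gross_premium = pure_premium / (1 - expense_loading)
        print(f"frequency={frequency:.3f}, severity={severity:,.0f}, "
              f"pure premium={pure_premium:,.2f}, gross premium={gross_premium:,.2f}")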

  20. Bestseller Book Data

    • kaggle.com
    zip
    Updated Mar 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    oyebusola (2024). Bestseller Book Data [Dataset]. https://www.kaggle.com/datasets/oyecrafts/bestseller-book-data
    Explore at:
    zip(420400 bytes)Available download formats
    Dataset updated
    Mar 28, 2024
    Authors
    oyebusola
    License

    CC0 1.0 Universal (Public Domain Dedication)https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides valuable insights into the ratings distribution of bestselling books across different categories. With a meticulous categorization of bestsellers based on their user ratings, this dataset offers a comprehensive overview of the popularity and reception of top-selling books. Whether you're interested in exploring highly-rated bestsellers, very highly-rated bestsellers, or moderately rated bestsellers, this dataset empowers you to analyze trends and patterns in the literary world. Leveraging this dataset opens up opportunities for market research, trend analysis, and strategic decision-making for publishers, authors, and book enthusiasts alike.

    What questions were asked

    • What is the distribution of bestseller ratings among the top-selling books?
    • How many books fall into each category of bestseller ratings (e.g., very highly rated, highly rated, moderately rated)?
    • Which genres tend to have the highest-rated bestsellers?
    • Are there any trends or patterns in the ratings of bestsellers over time?
    • What are the characteristics of highly-rated bestsellers compared to moderately-rated ones?
    • How do the prices of bestsellers correlate with their ratings?
    • Can we identify any outliers or anomalies in the dataset that may require further investigation?
    • Are there any authors who consistently produce highly-rated bestsellers?
    • How does the number of reviews correlate with the user ratings of bestsellers?
    • What insights can be gained from comparing the ratings breakdowns across different years or time periods?

    What were the tasks completed?

    1. Data Cleaning and Manipulation in Excel: Conducted data cleaning and manipulation tasks such as removing duplicates, handling missing values, and formatting data for analysis in Excel.

    2. Data Collection from Kaggle: Gathered the initial dataset containing information about bestselling books from Kaggle, a popular platform for datasets.

    3. Visualization in Tableau: Created interactive visualizations of the dataset using Tableau, a powerful data visualization tool, to explore and analyze bestseller ratings breakdowns.

    4. Reporting on Google Docs: Generated reports and summaries of the findings using Google Docs, a collaborative document editing platform, to communicate insights effectively.
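
    The rating categories described above can be reproduced with a simple binning step. A sketch with hypothetical column and file names and illustrative cut-offs (the dataset's actual thresholds are not documented here):

        import pandas as pd

        books = pd.read_csv("bestsellers.csv")  # hypothetical file name

        # Bin user ratings (assumed 0-5 scale) into the three categories
        # used in the description; the cut-offs are illustrative only.
        books["rating_category"] = pd.cut(
            books["User Rating"],
            bins=[0, 4.0, 4.5, 5.0],
            labels=["Moderately Rated", "Highly Rated", "Very Highly Rated"],
        )
        print(books["rating_category"].value_counts())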
