67 datasets found
  1. Netflix Movies and TV Shows Dataset Cleaned(excel)

    • kaggle.com
    Updated Apr 8, 2025
    Cite
    Gaurav Tawri (2025). Netflix Movies and TV Shows Dataset Cleaned(excel) [Dataset]. https://www.kaggle.com/datasets/gauravtawri/netflix-movies-and-tv-shows-dataset-cleanedexcel
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gaurav Tawri
    Description

    This dataset is a cleaned and preprocessed version of the original Netflix Movies and TV Shows dataset available on Kaggle. All cleaning was done using Microsoft Excel — no programming involved.

    🎯 What’s Included:
    - Cleaned Excel file (standardized columns, proper date format, duplicates and missing values removed)
    - A separate "formulas_used.txt" file listing all Excel formulas used during cleaning (e.g., TRIM, CLEAN, DATE, SUBSTITUTE, TEXTJOIN)
    - Columns like 'date_added' properly formatted into a DMY structure
    - Multi-valued columns like 'listed_in' split for better analysis
    - Null values replaced with "Unknown" for clarity
    - Duration field broken into numeric and unit components
    (A rough pandas equivalent of these steps is sketched at the end of this entry.)

    🔍 Dataset Purpose: Ideal for beginners and analysts who want to:
    - Practice data cleaning in Excel
    - Explore Netflix content trends
    - Analyze content by type, country, genre, or date added

    📁 Original Dataset Credit: The base version was originally published by Shivam Bansal on Kaggle: https://www.kaggle.com/shivamb/netflix-shows

    📌 Bonus: You can find a step-by-step cleaning guide and the same dataset on GitHub as well — along with screenshots and formulas documentation.
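
    The cleaning itself was done entirely in Excel; purely as an illustration, here is a minimal pandas sketch of equivalent steps. The column names (title, date_added, listed_in, duration) come from the original Netflix dataset, while the file names are assumptions.

    import pandas as pd

    # Assumed input file name; the original Kaggle export uses columns such as
    # title, date_added, listed_in and duration.
    df = pd.read_excel("netflix_titles.xlsx")

    # Trim stray whitespace (analogous to Excel's TRIM/CLEAN).
    df["title"] = df["title"].str.strip()

    # Parse 'date_added' and re-express it in a day-month-year layout.
    df["date_added"] = pd.to_datetime(df["date_added"], errors="coerce").dt.strftime("%d-%m-%Y")

    # Replace missing values with "Unknown" and drop exact duplicates.
    df = df.fillna("Unknown").drop_duplicates()

    # Break 'duration' (e.g., "90 min", "2 Seasons") into numeric and unit parts.
    df[["duration_value", "duration_unit"]] = df["duration"].str.extract(r"(\d+)\s*(\D+)")

    # Split the multi-valued 'listed_in' column into one genre per row for analysis.
    genres = df.assign(listed_in=df["listed_in"].str.split(", ")).explode("listed_in")

    df.to_excel("netflix_cleaned.xlsx", index=False)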

  2. Data from: DATABASE FOR THE ANALYSIS OF ROAD ACCIDENTS IN EUROPE

    • produccioncientifica.ugr.es
    • data.niaid.nih.gov
    • +1more
    Updated 2022
    Cite
    Navarro-Moreno, José; De Oña, Juan; Calvo-Poyo, Francisco (2022). DATABASE FOR THE ANALYSIS OF ROAD ACCIDENTS IN EUROPE [Dataset]. https://produccioncientifica.ugr.es/documentos/668fc484b9e7c03b01bdfcfc
    Explore at:
    Dataset updated
    2022
    Authors
    Navarro-Moreno, José; De Oña, Juan; Calvo-Poyo, Francisco
    Area covered
    Europe
    Description

    This database can be used for macro-level analysis of road accidents on interurban roads in Europe. Road accidents can be explained using variables related to economic resources invested in roads, traffic, the road network, socioeconomic characteristics, legislative measures and meteorology. This repository contains the data used for the analyses carried out in the following papers:

    1. Calvo-Poyo F., Navarro-Moreno J., de Oña J. (2020) Road Investment and Traffic Safety: An International Study. Sustainability 12:6332. https://doi.org/10.3390/su12166332
    2. Navarro-Moreno J., Calvo-Poyo F., de Oña J. (2022) Influence of road investment and maintenance expenses on injured traffic crashes in European roads. Int J Sustain Transp 1–11. https://doi.org/10.1080/15568318.2022.2082344
    3. Navarro-Moreno J., Calvo-Poyo F., de Oña J. (2022) Investment in roads and traffic safety: linked to economic development? A European comparison. Environ. Sci. Pollut. Res. https://doi.org/10.1007/s11356-022-22567

    The file with the database is available in Excel.

    DATA SOURCES

    The database presents data from 1998 up to 2016 for 20 European countries: Austria, Belgium, Croatia, Czechia, Denmark, Estonia, Finland, France, Germany, Ireland, Italy, Latvia, Netherlands, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden and United Kingdom. Crash data were obtained from the United Nations Economic Commission for Europe (UNECE) [2], which offers a sufficient level of disaggregation between crashes occurring inside versus outside built-up areas. With regard to the data on economic resources invested in roadways, the database of the Organisation for Economic Cooperation and Development (OECD), managed by the International Transport Forum (ITF) [1], deserves mention given its extensive coverage: it collects data on investment in the construction of roads and expenditure on their maintenance, following the definitions of the United Nations System of National Accounts (2008 SNA). Despite some data gaps, the time series are consistent from one country to the next. Moreover, to confirm consistency and complete missing data, diverse additional sources, mainly the national transport ministries of the respective countries, were consulted. All monetary values were converted to constant 2015 prices using the OECD price index.

    To obtain the rest of the variables in the database, as well as to ensure consistency in the time series and complete missing data, the following national and international sources were consulted:

    - Eurostat [3]
    - Directorate-General for Mobility and Transport (DG MOVE), European Union [4]
    - The World Bank [5]
    - World Health Organization (WHO) [6]
    - European Transport Safety Council (ETSC) [7]
    - European Road Safety Observatory (ERSO) [8]
    - European Climatic Energy Mixes (ECEM) of the Copernicus Climate Change Service [9]
    - EU BestPoint-Project [10]
    - Ministerstvo dopravy, Czech Republic [11]
    - Bundesministerium für Verkehr und digitale Infrastruktur, Germany [12]
    - Ministerie van Infrastructuur en Waterstaat, Netherlands [13]
    - National Statistics Office, Malta [14]
    - Ministério da Economia e Transição Digital, Portugal [15]
    - Ministerio de Fomento, Spain [16]
    - Trafikverket, Sweden [17]
    - Ministère de l’environnement de l’énergie et de la mer, France [18]
    - Ministero delle Infrastrutture e dei Trasporti, Italy [19–25]
    - Statistisk sentralbyrå, Norway [26–29]
    - Instituto Nacional de Estatística, Portugal [30]
    - Infraestruturas de Portugal S.A., Portugal [31–35]
    - Road Safety Authority (RSA), Ireland [36]

    DATABASE DESCRIPTION

    The database was built by trying to combine the longest possible time period with the maximum number of countries with a complete dataset (some countries, such as Lithuania, Luxembourg, Malta and Norway, were eliminated from the definitive dataset owing to a lack of data or breaks in the time series of records). Taking the above into account, the definitive database is made up of 19 variables and contains data from 20 countries for the period between 1998 and 2016. The table below shows the coding of the variables, as well as their definition and unit of measure.

    Table. Database metadata

    Code               Variable and unit
    fatal_pc_km        Fatalities per billion passenger-km
    fatal_mIn          Fatalities per million inhabitants
    accid_adj_pc_km    Accidents per billion passenger-km
    p_km               Billions of passenger-km
    croad_inv_km       Investment in road construction per kilometer, €/km (2015 constant prices)
    croad_maint_km     Expenditure on road maintenance per kilometer, €/km (2015 constant prices)
    prop_motorwa       Proportion of motorways over the total road network (%)
    populat            Population, in millions of inhabitants
    unemploy           Unemployment rate (%)
    petro_car          Consumption of gasoline and petrol derivatives (tons), per passenger car
    alcohol            Alcohol consumption, in liters per capita (age > 15)
    mot_index          Motorization index, in cars per 1,000 inhabitants
    den_populat        Population density, inhabitants/km2
    cgdp               Gross Domestic Product (GDP), in € (2015 constant prices)
    cgdp_cap           GDP per capita, in € (2015 constant prices)
    precipit           Average depth of rain water during a year (mm)
    prop_elder         Proportion of people over 65 years (%)
    dps                Demerit Point System, dummy variable (0: no; 1: yes)
    freight            Freight transport, in billions of ton-km

    ACKNOWLEDGEMENTS

    This database was developed in the framework of the project “Inversión en carreteras y seguridad vial: un análisis internacional (INCASE)”, financed by FEDER/Ministerio de Ciencia, Innovación y Universidades–Agencia Estatal de Investigación/Proyecto RTI2018-101770-B-I00, within Spain's National Program of R+D+i Oriented to Societal Challenges. Moreover, the authors would like to express their gratitude to the Ministry of Transport, Mobility and Urban Agenda of Spain (MITMA) and the Federal Ministry of Transport and Digital Infrastructure of Germany (BMVI) for providing data for this study.

    REFERENCES

    1. International Transport Forum. OECD iLibrary | Transport infrastructure investment and maintenance.
    2. United Nations Economic Commission for Europe. UNECE Statistical Database. Available online: https://w3.unece.org/PXWeb2015/pxweb/en/STAT/STAT_40-TRTRANS/?rxid=18ad5d0d-bd5e-476f-ab7c-40545e802eeb (accessed on Apr 28, 2020).
    3. European Commission. Database - Eurostat. Available online: https://ec.europa.eu/eurostat/data/database (accessed on Apr 28, 2021).
    4. Directorate-General for Mobility and Transport, European Commission. EU Transport in figures - Statistical Pocketbooks. Available online: https://ec.europa.eu/transport/facts-fundings/statistics_en (accessed on Apr 28, 2021).
    5. World Bank Group. World Bank Open Data. Available online: https://data.worldbank.org/ (accessed on Apr 30, 2021).
    6. World Health Organization (WHO). WHO Global Information System on Alcohol and Health. Available online: https://apps.who.int/gho/data/node.main.GISAH?lang=en (accessed on Apr 29, 2021).
    7. European Transport Safety Council (ETSC). Traffic Law Enforcement across the EU - Tackling the Three Main Killers on Europe’s Roads; Brussels, Belgium, 2011.
    8. Copernicus Climate Change Service. Climate data for the European energy sector from 1979 to 2016 derived from ERA-Interim. Available online: https://cds.climate.copernicus.eu/cdsapp#!/dataset/sis-european-energy-sector?tab=overview (accessed on Apr 29, 2021).
    9. Klipp, S.; Eichel, K.; Billard, A.; Chalika, E.; Loranc, M.D.; Farrugia, B.; Jost, G.; Møller, M.; Munnelly, M.; Kallberg, V.P.; et al. European Demerit Point Systems: Overview of their main features and expert opinions. EU BestPoint-Project 2011, 1–237.
    10. Ministerstvo dopravy. Serie: Ročenka dopravy; Centrum dopravního výzkumu: Prague, Czech Republic.
    11. Bundesministerium für Verkehr und digitale Infrastruktur. Verkehr in Zahlen 2003/2004; Hamburg, Germany, 2004; ISBN 3871542946.
    12. Bundesministerium für Verkehr und digitale Infrastruktur. Verkehr in Zahlen 2018/2019. In Verkehrsdynamik; Flensburg, Germany, 2018; ISBN 9783000612947.
    13. Ministerie van Infrastructuur en Waterstaat. Rijksjaarverslag 2018 a Infrastructuurfonds; The Hague, Netherlands, 2019; ISBN 0921-7371.
    14. Ministerie van Infrastructuur en Milieu. Rijksjaarverslag 2014 a Infrastructuurfonds; The Hague, Netherlands, 2015; ISBN 0921-7371.
    15. Ministério da Economia e Transição Digital. Base de Dados de Infraestruturas - GEE. Available online: https://www.gee.gov.pt/pt/publicacoes/indicadores-e-estatisticas/base-de-dados-de-infraestruturas (accessed on Apr 29, 2021).
    16. Ministerio de Fomento, Dirección General de Programación Económica y Presupuestos, Subdirección General de Estudios Económicos y Estadísticas. Serie: Anuario estadístico; NIPO 161-13-171-0; Centro de Publicaciones, Secretaría General Técnica, Ministerio de Fomento: Madrid, Spain.
    17. Trafikverket. The Swedish Transport Administration Annual Report 2017; 2018; ISBN 978-91-7725-272-6.
    18. Ministère de l’Équipement, du T. et de la M. Mémento de statistiques des transports 2003; Ministère de l’environnement de l’énergie et de la mer, 2005.
    19. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2000; Istituto Poligrafico e Zecca dello Stato: Roma, Italy, 2001.
    20. Ministero delle Infrastrutture e dei Trasporti. Conto nazionale dei trasporti 1999. 2000.
    21. Generale, D.; Informativi, S. delle Infrastrutture e dei Trasporti Anno 2004.
    22. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2001; 2002.
    23. Ministero delle Infrastrutture e dei

  3. Cross Regional Eucalyptus Growth and Environmental Data

    • data.mendeley.com
    Updated Oct 7, 2024
    + more versions
    Cite
    Christopher Erasmus (2024). Cross Regional Eucalyptus Growth and Environmental Data [Dataset]. http://doi.org/10.17632/2m9rcy3dr9.3
    Explore at:
    Dataset updated
    Oct 7, 2024
    Authors
    Christopher Erasmus
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset is provided in a single .xlsx file named "eucalyptus_growth_environment_data_V2.xlsx" and consists of fifteen sheets:

    Codebook: This sheet details the index, values, and descriptions for each field within the dataset, providing a comprehensive guide to understanding the data structure.

    ALL NODES: Contains measurements from all devices, totalling 102,916 data points. This sheet aggregates the data across all nodes.

    GWD1 to GWD10: These subset sheets include measurements from individual nodes, labelled according to the abbreviation “Generic Wireless Dendrometer” followed by device IDs 1 through 10. Each sheet corresponds to a specific node, representing measurements from ten trees (or nodes).

    Metadata: Provides detailed metadata for each node, including species, initial diameter, location, measurement frequency, battery specifications, and irrigation status. This information is essential for identifying and differentiating the nodes and their specific attributes.

    Missing Data Intervals: Details gaps in the data stream, including start and end dates and times when data was not uploaded. It includes information on the total duration of each missing interval and the number of missing data points.

    Missing Intervals Distribution: Offers a summary of missing data intervals and their distribution, providing insight into data gaps and reasons for missing data.

    All nodes utilize LoRaWAN for data transmission. Please note that intermittent data gaps may occur due to connectivity issues between the gateway and the nodes, as well as maintenance activities or experimental procedures.

    Software considerations: The provided R code named “Simple_Dendro_Imputation_and_Analysis.R” is a comprehensive analysis workflow that processes and analyses Eucalyptus growth and environmental data from the "eucalyptus_growth_environment_data_V2.xlsx" dataset. The script begins by loading necessary libraries, setting the working directory, and reading the data from the specified Excel sheet. It then combines date and time information into a unified DateTime format and performs data type conversions for relevant columns. The analysis focuses on a specified device, allowing for the selection of neighbouring devices for imputation of missing data. A loop checks for gaps in the time series and fills in missing intervals based on a defined threshold, followed by a function that imputes missing values using the average from nearby devices. Outliers are identified and managed through linear interpolation. The code further calculates vapor pressure metrics and applies temperature corrections to the dendrometer data. Finally, it saves the cleaned and processed data into a new Excel file while conducting dendrometer analysis using the dendRoAnalyst package, which includes visualizations and calculations of daily growth metrics and correlations with environmental factors such as vapour pressure deficit (VPD).
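
    The packaged script is written in R; purely as a language-agnostic illustration of the gap-filling and imputation steps described above, here is a minimal Python/pandas sketch. The sheet names (GWD3, GWD4) follow the naming described above, but the column names ("DateTime", "dendro") and the 30-minute grid are assumptions.

    import pandas as pd

    xlsx = "eucalyptus_growth_environment_data_V2.xlsx"
    target = pd.read_excel(xlsx, sheet_name="GWD3")     # device with gaps to repair
    neighbour = pd.read_excel(xlsx, sheet_name="GWD4")  # nearby device used for imputation

    for df in (target, neighbour):
        df["DateTime"] = pd.to_datetime(df["DateTime"])  # assumed column name
        df.set_index("DateTime", inplace=True)

    # Re-grid to a regular interval so gaps in the time series become explicit NaNs.
    target = target[["dendro"]].resample("30min").mean()
    neighbour = neighbour[["dendro"]].resample("30min").mean()

    # Fill gaps in the target from the neighbouring device (a single neighbour for brevity),
    # then smooth remaining point gaps/outliers by linear interpolation in time.
    target["dendro"] = target["dendro"].fillna(neighbour["dendro"])
    target["dendro"] = target["dendro"].interpolate(method="time")

    target.to_excel("GWD3_imputed.xlsx")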

  4. CalEnviroScreen 4.0 Results

    • usc-geohealth-hub-uscssi.hub.arcgis.com
    • uscssi.hub.arcgis.com
    Updated Nov 24, 2021
    Cite
    Spatial Sciences Institute (2021). CalEnviroScreen 4.0 Results [Dataset]. https://usc-geohealth-hub-uscssi.hub.arcgis.com/maps/USCSSI::calenviroscreen-4-0-results
    Explore at:
    Dataset updated
    Nov 24, 2021
    Dataset authored and provided by
    Spatial Sciences Institute
    Area covered
    Description

    Spatial extent: California
    Spatial unit: Census Tract
    Created: Oct 20, 2021
    Updated: Oct 20, 2021
    Source: California Office of Environmental Health Hazard Assessment
    Contact Email: CalEnviroScreen@oehha.ca.gov
    Source Link: https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-40

    Microsoft Excel spreadsheet and PDF with a Data Dictionary: There are two files in this zipped folder: 1) a spreadsheet showing raw data and calculated percentiles for individual indicators and combined CalEnviroScreen scores for individual census tracts, with additional demographic information; 2) a PDF document including the data dictionary and information on zeros and missing values (CalEnviroScreen 4.0 Excel and Data Dictionary PDF).

  5. Performance characteristics of an HIV risk screening tool in Uganda

    • datadryad.org
    • zenodo.org
    zip
    Updated Aug 23, 2023
    Cite
    Matovu John Bosco Junior (2023). Performance characteristics of an HIV risk screening tool in Uganda [Dataset]. http://doi.org/10.5061/dryad.m0cfxpp8t
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 23, 2023
    Dataset provided by
    Dryad
    Authors
    Matovu John Bosco Junior
    Time period covered
    Aug 11, 2023
    Area covered
    Uganda
    Description

    Excel, Stata, SPSS, Epi Info

  6. Data underlying the thesis: Multiparty Computation: The effect of multiparty...

    • data.4tu.nl
    zip
    Updated Nov 6, 2020
    Cite
    Masud Petronia (2020). Data underlying the thesis: Multiparty Computation: The effect of multiparty computation on firms' willingness to contribute protected data [Dataset]. http://doi.org/10.4121/13102430.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 6, 2020
    Dataset provided by
    4TU.ResearchData
    Authors
    Masud Petronia
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This thesis-mpc-dataset-public-readme.txt file was generated on 2020-10-20 by Masud Petronia

    GENERAL INFORMATION
    1. Title of Dataset: Data underlying the thesis: Multiparty Computation: The effect of multiparty computation on firms' willingness to contribute protected data
    2. Author Information
       A. Principal Investigator Contact Information
          Name: Masud Petronia
          Institution: TU Delft, Faculty of Technology, Policy and Management
          Address: Mekelweg 5, 2628 CD Delft, Netherlands
          Email: masud.petronia@gmail.com
          ORCID: https://orcid.org/0000-0003-2798-046X
    3. Description of dataset: This dataset contains perceptual data on firms' willingness to contribute protected data through multiparty computation (MPC). Petronia (2020, ch. 6) draws several conclusions from this dataset and provides recommendations for future research in Petronia (2020, ch. 7.4).
    4. Date of data collection: July-August 2020
    5. Geographic location of data collection: Netherlands
    6. Information about funding sources that supported the collection of the data: Horizon 2020 Research and Innovation Programme, Grant Agreement no 825225 – Safe Data Enabled Economic Development (SAFE-DEED), from the H2020-ICT-2018-2

    SHARING/ACCESS INFORMATION
    1. Licenses/restrictions placed on the data: CC 0
    2. Links to publications that cite or use the data: Petronia, M. N. (2020). Multiparty Computation: The effect of multiparty computation on firms' willingness to contribute protected data (Master's thesis). Retrieved from http://resolver.tudelft.nl/uuid:b0de4a4b-f5a3-44b8-baa4-a6416cebe26f
    3. Was data derived from another source? No
    4. Citation for this dataset: Petronia, M. N. (2020). Multiparty Computation: The effect of multiparty computation on firms' willingness to contribute protected data (Master's thesis). Retrieved from https://data.4tu.nl/. doi:10.4121/13102430

    DATA & FILE OVERVIEW
    1. File List: thesis-mpc-dataset-public.xlsx; thesis-mpc-dataset-public-readme.txt (this document)
    2. Relationship between files: Dataset metadata and instructions
    3. Additional related data collected that was not included in the current data package: Occupation and role of respondents (traceable to unique reference), removed for privacy reasons.
    4. Are there multiple versions of the dataset? No

    METHODOLOGICAL INFORMATION
    1. Description of methods used for collection/generation of data: A pre- and post-test experimental design. For more information, see Petronia (2020, ch. 5).
    2. Methods for processing the data: Full instructions are provided by Petronia (2020, ch. 6).
    3. Instrument- or software-specific information needed to interpret the data: Microsoft Excel can be used to convert the dataset to other formats.
    4. Environmental/experimental conditions: This dataset comprises three datasets collected through three channels: Prolific (incentivized), LinkedIn/Twitter (voluntary), and respondents in a lab setting (voluntary). For more information, see Petronia (2020, ch. 6.1).
    5. Describe any quality-assurance procedures performed on the data: A thorough examination of consistency and reliability was performed. For more information, see Petronia (2020, ch. 6).
    6. People involved with sample collection, processing, analysis and/or submission: See Petronia (2020, ch. 6).

    DATA-SPECIFIC INFORMATION
    1. Number of variables: see worksheet experiment_matrix of thesis-mpc-dataset-public.xlsx
    2. Number of cases/rows: see worksheet experiment_matrix of thesis-mpc-dataset-public.xlsx
    3. Variable List: see worksheet labels of thesis-mpc-dataset-public.xlsx
    4. Missing data codes: see worksheet comments of thesis-mpc-dataset-public.xlsx
    5. Specialized formats or other abbreviations used: Multiparty computation (MPC) and Trusted Third Party (TTP).

    INSTRUCTIONS
    1. Petronia (2020, ch. 6) describes associated tests and respective syntax.

  7. Meteorological data from datalogger and sensors near 21 plots at Thule Air...

    • arcticdata.io
    • search.dataone.org
    Updated Oct 8, 2020
    Cite
    Steven F. Oberbauer (2020). Meteorological data from datalogger and sensors near 21 plots at Thule Air Base, Greenland, 2013 [Dataset]. http://doi.org/10.18739/A2K931726
    Explore at:
    Dataset updated
    Oct 8, 2020
    Dataset provided by
    Arctic Data Center
    Authors
    Steven F. Oberbauer
    Area covered
    Description

    Data from a small meteorological station set up near 21 plots in 2013. Instrumentation: Campbell Scientific CR10 datalogger, Campbell 215 temperature/humidity sensor, two Apogee PAR sensors (one facing up, one facing down), soil temperature with a type T thermocouple, and a Campbell CS616 soil reflectometer for soil water content. Data were collected between DOY 153 and DOY 224. The logger took a measurement every 60 seconds and averaged it into a 5-minute data table; post-processing produced 60-minute averages and daily mean, max, and min. The data are provided as an MS Excel (.xls) workbook with three worksheets.

    Worksheet 5_min, data columns: year, day of year, hour, minute, fractional day of year, incoming PAR (umol m-2 s-1), reflected PAR (umol m-2 s-1), albedo calculated as (par_out/par_in)*100, air temperature (C), relative humidity (%), soil temperature (C), raw reflectance time reported by the CS616, calculated volumetric water content corrected for soil temperature (v/v), battery voltage.

    Worksheet 60_min, data columns (units as above): day of year, hour, fractional day of year, week of year, air temperature, relative humidity, incoming PAR, outgoing PAR, albedo, soil temperature, and volumetric water content.

    Worksheet daily (units as above unless indicated): date, day of year, air temperature min, air temperature max, air temperature mean, relative humidity min, relative humidity max, relative humidity mean, soil temperature mean, soil water content mean, total incoming PAR (mol m-2 d-1), outgoing PAR (mol m-2 d-1), albedo, minimum battery voltage.

    Missing values are -6999 or 6999. Soil temperature and VWC are not valid until the instruments could be installed in the soil on DOY 163. The RH sensor failed on DOY 177 and did not function again. Battery issue on DOY 183.
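
    As an illustration of how the sentinel missing values and the 5-minute table might be handled, a minimal pandas sketch follows; the file name and column names are assumptions (the actual headers are listed above).

    import pandas as pd

    # Reading a legacy .xls workbook requires the xlrd package.
    df = pd.read_excel("thule_met_2013.xls", sheet_name="5_min")  # hypothetical file name

    # The dataset flags missing values as -6999 or 6999; convert them to NaN.
    df = df.replace([-6999, 6999], float("nan"))

    # Build a timestamp from year / day of year / hour / minute (assumed column names).
    df["timestamp"] = (pd.to_datetime(df["year"].astype(int).astype(str), format="%Y")
                       + pd.to_timedelta(df["day_of_year"] - 1, unit="D")
                       + pd.to_timedelta(df["hour"], unit="h")
                       + pd.to_timedelta(df["minute"], unit="m"))

    # Daily aggregation analogous to the "daily" worksheet (assumed column names).
    daily = df.set_index("timestamp")[["air_temp_C", "albedo"]].resample("1D").mean()
    print(daily.head())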

  8. Data from a Chinese measurement tool for the accessibility of death-related...

    • scidb.cn
    Updated Sep 20, 2025
    Cite
    chen xin yu (2025). Data from a Chinese measurement tool for the accessibility of death-related thoughts [Dataset]. http://doi.org/10.57760/sciencedb.psych.00724
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 20, 2025
    Dataset provided by
    Science Data Bank
    Authors
    chen xin yu
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The dataset was generated from a laboratory experiment based on the dot-matrix integration paradigm, designed to measure death thought accessibility (DTA). The study was conducted under controlled conditions, with participants tested individually in a quiet, dimly lit room. Stimulus presentation and response collection were implemented using PsychoPy (exact version number provided in the supplementary materials), and reaction times were recorded via a standard USB keyboard. Experimental stimuli consisted of five categories of two-character Chinese words rendered in dot-matrix form: death-related words, metaphorical-death words, positive words, neutral words, and meaningless words. Stimuli were centrally displayed on the screen, with presentation durations and inter-stimulus intervals (ISI) precisely controlled at the millisecond level.

    Data collection took place in spring 2025, with a total of 39 participants contributing approximately 16,699 valid trials. Each trial-level record includes participant ID, priming condition (0 = neutral priming, 1 = mortality salience priming), word type, inter-stimulus interval (in milliseconds), reaction time (in milliseconds), and recognition accuracy (0 = incorrect, 1 = correct). In the dataset, rows correspond to single trials and columns represent experimental variables. Reaction times were measured in milliseconds and later log-transformed for statistical analyses to reduce skewness. Accuracy was coded as a binary variable indicating correct recognition.

    Data preprocessing included the removal of extreme reaction times (less than 150 ms or greater than 3000 ms). Only trials with valid responses were retained for analysis. Missing data were minimal (<1% of all trials), primarily due to occasional non-responses by participants, and are explicitly marked in the dataset. Potential sources of error include natural individual variability in reaction times and minor recording fluctuations from input devices, which are within the millisecond range and do not affect overall patterns.

    The data files are stored in Excel format (.xlsx), with each participant’s data saved in a separate file named according to the participant ID. Within each file, the first row contains variable names, and subsequent rows record trial-level observations, allowing for straightforward data access and processing. Excel files are compatible with a wide range of statistical software, including R, Python, SPSS, and Microsoft Excel, and no additional software is required to open them. A supplementary documentation file accompanies the dataset, providing detailed explanations of all variables and data processing steps. A complete codebook of variable definitions is included in the appendix to facilitate data interpretation and ensure reproducibility of the analyses.
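
    A minimal pandas sketch of the stated preprocessing (dropping reaction times outside 150–3000 ms and log-transforming the rest); the file naming pattern and column names are assumptions based on the variable list above.

    import glob
    import numpy as np
    import pandas as pd

    # One .xlsx file per participant, named by participant ID (pattern is hypothetical).
    files = glob.glob("participant_*.xlsx")
    trials = pd.concat((pd.read_excel(f) for f in files), ignore_index=True)

    # Assumed column names: "rt_ms" (reaction time in ms) and "accuracy" (0/1).
    trials = trials[(trials["rt_ms"] >= 150) & (trials["rt_ms"] <= 3000)]

    # Log-transform reaction times to reduce skewness, as described above.
    trials["log_rt"] = np.log(trials["rt_ms"])

    # Mean log-RT of correct trials per word type ("word_type" is also an assumed name).
    print(trials[trials["accuracy"] == 1].groupby("word_type")["log_rt"].mean())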

  9. UC_vs_US Statistic Analysis.xlsx

    • figshare.com
    xlsx
    Updated Jul 9, 2020
    Cite
    F. (Fabiano) Dalpiaz (2020). UC_vs_US Statistic Analysis.xlsx [Dataset]. http://doi.org/10.23644/uu.12631628.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 9, 2020
    Dataset provided by
    Utrecht University
    Authors
    F. (Fabiano) Dalpiaz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures described in the paper. For each subject, it includes multiple columns:
    A. a sequential student ID
    B. an ID that defines a random group label and the notation
    C. the used notation: user stories or use cases
    D. the case they were assigned to: IFA, Sim, or Hos
    E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
    F. a categorical representation of the grade as L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
    G. the total number of classes in the student's conceptual model
    H. the total number of relationships in the student's conceptual model
    I. the total number of classes in the expert's conceptual model
    J. the total number of relationships in the expert's conceptual model
    K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, missing (see tagging scheme below)
    P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present

    Tagging scheme:
    Aligned (AL) - A concept is represented as a class in both models, either with the same name or using synonyms or clearly linkable names;
    Wrongly represented (WR) - A class in the domain expert model is incorrectly represented in the student model, either (i) via an attribute, method, or relationship rather than a class, or (ii) using a generic term (e.g., "user" instead of "urban planner");
    System-oriented (SO) - A class in CM-Stud that denotes a technical implementation aspect, e.g., access control. Classes that represent a legacy system or the system under design (portal, simulator) are legitimate;
    Omitted (OM) - A class in CM-Expert that does not appear in any way in CM-Stud;
    Missing (MI) - A class in CM-Stud that does not appear in any way in CM-Expert.

    All the calculations and information provided in the following sheets originate from that raw data.

    Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection, including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.

    Sheet 3 (Size-Ratio): The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between student and expert model.

    Sheet 4 (Overall): Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR) and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
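
    The two ratios can be computed directly from the per-subject counts; a short Python sketch of the definitions above (the counts used here are made up):

    def correctness(al: int, wr: int, so: int, om: int) -> float:
        # Aligned classes over aligned + omitted + system-oriented + wrongly represented.
        return al / (al + om + so + wr)

    def completeness(al: int, wr: int, om: int) -> float:
        # Correctly or incorrectly represented classes over classes in the expert model.
        return (al + wr) / (al + wr + om)

    print(correctness(al=12, wr=3, so=2, om=5))  # 12 / 22 = 0.545...
    print(completeness(al=12, wr=3, om=5))       # 15 / 20 = 0.75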

    For sheet 4 as well as for the following four sheets, diverging stacked bar charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:

    Sheet 5 (By-Notation): Model correctness and model completeness are compared by notation - UC, US.

    Sheet 6 (By-Case): Model correctness and model completeness are compared by case - SIM, HOS, IFA.

    Sheet 7 (By-Process): Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.

    Sheet 8 (By-Grade): Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Low, and Medium.

  10. Independent Data Aggregation, Quality Control and Visualization of...

    • datasetcatalog.nlm.nih.gov
    Updated Oct 21, 2020
    Cite
    Ly, Chun; Knott, Cheryl; McCleary, Jill; Castiello-Gutiérrez, Santiago (2020). Independent Data Aggregation, Quality Control and Visualization of University of Arizona COVID-19 Re-Entry Testing Data [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000484783
    Explore at:
    Dataset updated
    Oct 21, 2020
    Authors
    Ly, Chun; Knott, Cheryl; McCleary, Jill; Castiello-Gutiérrez, Santiago
    Description

    Abstract
    The dataset provided here contains the efforts of independent data aggregation, quality control, and visualization of the University of Arizona (UofA) COVID-19 testing programs for the 2019 novel Coronavirus pandemic. The dataset is provided in the form of machine-readable tables in comma-separated value (.csv) and Microsoft Excel (.xlsx) formats.

    Additional Information
    As part of the UofA response to the 2019-20 Coronavirus pandemic, testing was conducted on students, staff, and faculty prior to the start of the academic year and throughout the school year. These tests were done at the UofA Campus Health Center and through their program called "Test All Test Smart" (TATS). These tests identify active cases of SARS-CoV-2 infection using the reverse transcription polymerase chain reaction (RT-PCR) test and the antigen test. Because the antigen test provided more rapid diagnosis, it was used heavily three weeks prior to the start of the Fall semester and throughout the academic year.

    As these tests were occurring, results were provided on the COVID-19 websites. First, beginning in early March, the Campus Health Alerts website reported the total number of positive cases. Later, numbers were provided for the total number of tests (March 12 and thereafter). According to the website, these numbers were updated daily for positive cases and weekly for total tests. These numbers were reported until early September, when they were then included in the reporting for the TATS program.

    For the TATS program, numbers were provided through the UofA COVID-19 Update website. Initially, on August 21, the numbers provided were the total number (July 31 and thereafter) of tests and positive cases. Later (August 25), additional information was provided where both PCR and antigen testing were available; here, the daily numbers were also included. On September 3, this website then provided both the Campus Health and TATS data. Here, PCR and antigen were combined and referred to as "Total", and daily and cumulative numbers were provided.

    At this time, no official data dashboard was available until September 16, and aside from the information provided on these websites, the full dataset was not made publicly available. As such, the authors of this dataset independently aggregated data from multiple sources. These data were made publicly available through a Google Sheet, with graphical illustration provided through the spreadsheet and on social media. The goal of providing the data and illustrations publicly was to provide factual information and to understand the infection rate of SARS-CoV-2 in the UofA community.

    Because of differences in reported data between Campus Health and the TATS program, the dataset provides Campus Health numbers on September 3 and thereafter. TATS numbers are provided beginning on August 14, 2020.

    Description of Dataset Content
    The following terms are used in describing the dataset:
    1. "Report Date" is the date and time at which the website was updated to reflect the new numbers.
    2. "Test Date" is the date of testing/sample collection.
    3. "Total" is the combination of Campus Health and TATS numbers.
    4. "Daily" is the new data associated with the Test Date.
    5. "To Date (07/31--)" provides the cumulative numbers from 07/31 and thereafter.
    6. "Sources" provides the source of information. The number prior to the colon refers to the number of sources. Here, "UACU" refers to the UA COVID-19 Update page, and "UARB" refers to the UA Weekly Re-Entry Briefing. "SS" and "WBM" refer to screenshots (manually acquired) and the "Wayback Machine" (see the Reference section for links), with initials provided to indicate which author recorded the values. These screenshots are available in the records.zip file.

    The dataset is distinguished, where available, by the testing program and the methods of testing. Where data are not available, calculations are made to fill in missing data (e.g., extrapolating backwards on the total number of tests based on daily numbers that are deemed reliable). Where errors are found (by comparing to previous numbers), those are reported on the above Google Sheet with specifics noted.

    For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu.

  11. CalEnviroScreen Source Data

    • zenodo.org
    • data-staging.niaid.nih.gov
    zip
    Updated Dec 27, 2024
    Cite
    Will Fitzgerald; Will Fitzgerald; Gretchen Gehrke; Gretchen Gehrke (2024). CalEnviroScreen Source Data [Dataset]. http://doi.org/10.5281/zenodo.14563093
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 27, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Will Fitzgerald; Will Fitzgerald; Gretchen Gehrke; Gretchen Gehrke
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Published by the California Office of Environmental Health Hazard Assessment (OEHHA), CalEnviroScreen is a screening methodology that can be used to help identify California communities that are disproportionately burdened by multiple sources of pollution. This data was downloaded from the CalEnviroScreen website and processed by the Environmental Data & Governance Initiative (EDGI). The URL for the original data is: https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-40

    There are two files in this deposit:

    1. calenviroscreen40resultsdatadictionaryf2021.zip - a zip file containing (1) a spreadsheet showing raw data and calculated percentiles for individual indicators and combined CalEnviroScreen scores for individual census tracts with additional demographic information. (2) a pdf document including the data dictionary and information on zeros and missing values: CalEnviroScreen 4.0 Excel and Data Dictionary PDF.
    2. alenviroscreen40shpf2021shp.zip - a zip file containing the shapefile for the CalEnviroScreen 4.0 results

  12. Elective Disciplines Survey Data from Kryvyi Rih State Pedagogical...

    • data.niaid.nih.gov
    Updated Mar 27, 2025
    Cite
    Semerikov, Serhiy; Bondarenko, Olha; Kryvyi Rih State Pedagogical University (2025). Elective Disciplines Survey Data from Kryvyi Rih State Pedagogical University [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_15098019
    Explore at:
    Dataset updated
    Mar 27, 2025
    Dataset provided by
    Kryvyi Rih State Pedagogical University
    Authors
    Semerikov, Serhiy; Bondarenko, Olha; Kryvyi Rih State Pedagogical University
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Kryvyi Rih
    Description

    Elective Disciplines Survey Data - Kryvyi Rih State Pedagogical University

    Description

    Survey data collected from students at Kryvyi Rih State Pedagogical University regarding the process of studying elective disciplines. The survey aims to analyze priorities for improving educational quality and identify issues in the procedure for selecting and studying elective courses.

    Dataset Information

    Number of Records: 1089

    Number of Variables: 15

    Data Collection Date: 2025

    Location: Kryvyi Rih, Ukraine

    License: https://creativecommons.org/licenses/by/4.0/

    Variables in Dataset

    The dataset contains the following variables:

    номер_відповіді: int64

    вкажіть_ступінь_вищої_освіти,_який_ви_здобуваєте_в_університеті:: object

    вкажіть_освітню_програму,_за_якою_ви_навчаєтесь_в_університеті:: object

    чи_знайомі_ви_з_процедурою_вибору_навчальних_дисциплін?: object

    (04)_як_ви_ставитесь_до_процедури_вибору_навчальних_дисциплін_в_університеті?: object

    якщо_на_попереднє_питання_відповіли_інше,_вкажіть_як_саме.: object

    чи_влаштовує_вас_кількість_запропонованих_дисциплін_вільного_вибору?: object

    (06)_визначте_чинники,які_впливають_на_вибір_вами_навчальних_дисциплін(оберіть_усі_можливі_варіанти): object

    якщо_на_попереднє_питання_відповіли_інше,_вкажіть_як_саме..1: object

    чи_ознайомлювалися_ви_з_силабусами_вибіркових_дисциплін_перед_тим,_як_зробити_свій_вибір?: object

    (08)_обрані_вами_дисципліни_виявилися:: object

    якщо_на_попереднє_питання_відповіли_інше,_вкажіть_як_саме..2: object

    які_з_вибраних_вами_дисциплін_виявилися_найбільш_цікавими_й_корисними?: object

    чи_обрали_б_ви_ці_дисципліни_повторно,_чи_змінили_б_свій_вибір?: object

    ваші_пропозиції_щодо_політики_і_процедур_вибору_навчальних_дисциплін: object

    Files in Package

    This data package includes the following files:

    elective_disciplines_survey_data.csv: Data in CSV format (comma-separated)

    elective_disciplines_survey_data.tsv: Data in TSV format (tab-separated)

    elective_disciplines_survey_data.json: Data in JSON format (line-delimited JSON records)

    elective_disciplines_survey_data.xlsx: Data in Excel format

    metadata.json: Comprehensive metadata in JSON-LD format

    README.md: This file

    Usage and Citation

    When using this dataset, please cite:

    Kryvyi Rih State Pedagogical University. (2025). Elective Disciplines Survey Data. Zenodo. https://doi.org/[DOI_TO_BE_ASSIGNED]

    Processing Information

    Loaded original Excel survey data

    Cleaned and standardized column names

    Handled missing values

    Anonymized personal identifiers

    Converted to multiple standard formats (CSV, TSV, JSON, Excel); a minimal conversion sketch is shown after this list

    Generated comprehensive metadata
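
    A minimal pandas sketch of the conversion step referenced above; the name of the original Excel export is an assumption, while the output file names are those listed in this package.

    import pandas as pd

    df = pd.read_excel("elective_disciplines_survey_raw.xlsx")  # hypothetical source file

    # Standardize column names: trim, lower-case, and replace spaces with underscores.
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]

    # Export to the formats shipped in this package (JSON as line-delimited records).
    df.to_csv("elective_disciplines_survey_data.csv", index=False)
    df.to_csv("elective_disciplines_survey_data.tsv", sep="\t", index=False)
    df.to_json("elective_disciplines_survey_data.json", orient="records",
               lines=True, force_ascii=False)
    df.to_excel("elective_disciplines_survey_data.xlsx", index=False)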

    Contact Information

    For questions about this dataset, please contact:

    Research Department, Kryvyi Rih State Pedagogical University

    Email: semerikov@gmail.com

    Website: https://kdpu.edu.ua/en/

  13. Quality Assurance and Quality Control (QA/QC) of Meteorological Time Series...

    • osti.gov
    • dataone.org
    • +1more
    Updated Dec 31, 2020
    Cite
    Environmental System Science Data Infrastructure for a Virtual Ecosystem (2020). Quality Assurance and Quality Control (QA/QC) of Meteorological Time Series Data for Billy Barr, East River, Colorado USA [Dataset]. http://doi.org/10.15485/1823516
    Explore at:
    Dataset updated
    Dec 31, 2020
    Dataset provided by
    Office of Science (http://www.er.doe.gov/)
    Environmental System Science Data Infrastructure for a Virtual Ecosystem
    Area covered
    Colorado, United States, East River
    Description

    A comprehensive Quality Assurance (QA) and Quality Control (QC) statistical framework consists of three major phases: Phase 1—Preliminary raw data sets exploration, including time formatting and combining datasets of different lengths and different time intervals; Phase 2—QA of the datasets, including detecting and flagging of duplicates, outliers, and extreme values; and Phase 3—the development of time series of a desired frequency, imputation of missing values, visualization and a final statistical summary. The time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado) were analyzed. The developed statistical framework is suitable for both real-time and post-data-collection QA/QC analysis of meteorological datasets.

    The files in this data package include one Excel file converted to CSV format (Billy_Barr_raw_qaqc.csv), which contains the raw meteorological data, i.e., the input data used for the QA/QC analysis. The second CSV file (Billy_Barr_1hr.csv) contains the quality-controlled and flagged meteorological data, i.e., the output data from the QA/QC analysis. The last file (QAQC_Billy_Barr_2021-03-22.R) is a script written in R that implements the QA/QC and flagging process. The purpose of the CSV data files included in this package is to provide the input and output files used by the R script.
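
    The packaged QA/QC workflow is implemented in the R script named above; purely as an illustration of the three phases, here is a minimal Python/pandas sketch. The column names ("timestamp", "air_temp") and the outlier rule are assumptions, not the script's actual logic.

    import pandas as pd

    # Phase 1: load the raw file and parse timestamps.
    raw = pd.read_csv("Billy_Barr_raw_qaqc.csv", parse_dates=["timestamp"])
    raw = raw.set_index("timestamp").sort_index()

    # Phase 2: flag duplicates and crude extremes (simple z-score rule for illustration).
    raw = raw[~raw.index.duplicated(keep="first")]
    z = (raw["air_temp"] - raw["air_temp"].mean()) / raw["air_temp"].std()
    raw["air_temp_flag"] = (z.abs() > 4).astype(int)  # 1 = flagged as extreme

    # Phase 3: regularize to an hourly series and impute short gaps by time interpolation.
    hourly = raw[["air_temp"]].resample("1h").mean()
    hourly["air_temp"] = hourly["air_temp"].interpolate(method="time", limit=6)
    hourly.to_csv("Billy_Barr_1hr_sketch.csv")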

  14. Environmental data for Macrobenthic sampling stations in the Chukchi Sea...

    • ckanprod.data-commons.k8s.ucar.edu
    excel
    Updated Oct 7, 2025
    Cite
    Arny L. Blanchard; Howard Feder (2025). Environmental data for Macrobenthic sampling stations in the Chukchi Sea [Blanchard] [Dataset]. http://doi.org/10.5065/D6HX19QD
    Explore at:
    Available download formats: Excel
    Dataset updated
    Oct 7, 2025
    Authors
    Arny L. Blanchard; Howard Feder
    Time period covered
    Aug 29, 1986 - Oct 5, 1987
    Area covered
    Description

    These data were collected in 1986 and 1987 under the supervision of Dr. Howard Feder. The data include sediment grain-size (%), sediment organic carbon (mg per g), bottom-water temperature (C), and bottom-water salinity. See the related publication for details. This dataset is part of the Pacific Marine Arctic Regional Synthesis (PacMARS) Project. Note that there are missing values that exist as blanks in the Excel file.

  15. Help Me study! Music Listening Habits While Studying (Dataset)

    • data.niaid.nih.gov
    Updated Apr 4, 2024
    Cite
    Cheah, Yiting (2024). Help Me study! Music Listening Habits While Studying (Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10085103
    Explore at:
    Dataset updated
    Apr 4, 2024
    Dataset provided by
    University of Liverpool
    Authors
    Cheah, Yiting
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the raw data used for a research study that examined university students' music listening habits while studying. There are two experiments in this research study. Experiment 1 is a retrospective survey, and Experiment 2 is a mobile experience sampling research study. This repository contains five Microsoft Excel files with data obtained from both experiments. The files are as follows:

    - onlineSurvey_raw_data.xlsx
    - esm_raw_data.xlsx
    - esm_music_features_analysis.xlsx
    - esm_demographics.xlsx
    - index.xlsx

    Files Description

    File: onlineSurvey_raw_data.xlsx
    This file contains the raw data from Experiment 1, including the (anonymised) demographic information of the sample. The sample characteristics recorded are:
    - studentship
    - area of study
    - country of study
    - type of accommodation a participant was living in
    - age
    - self-identified gender
    - language ability (mono- or bi-/multilingual)
    - (various) personality traits
    - (various) musicianship
    - (various) everyday music uses
    - (various) music capacity

    The file also contains raw data of responses to the questions about participants' music listening habits while studying in real life. These pieces of data are:
    - likelihood of listening to specific (rated across 23) music genres while studying and during everyday listening
    - likelihood of listening to music with specific acoustic features (e.g., with/without lyrics, loud/soft, fast/slow) while studying and during everyday listening
    - general likelihood of listening to music while studying in real life
    - (verbatim) responses to participants' written responses to the open-ended questions about their real-life music listening habits while studying

    File: esm_raw_data.xlsx
    This file contains the raw data from Experiment 2, including the following variables:
    - information on the music tracks (track name, artist name, and, if available, Spotify ID) each participant was listening to during each music episode (both while studying and during everyday listening)
    - level of arousal at the onset of music playing and at the end of the 30-minute study period
    - level of valence at the onset of music playing and at the end of the 30-minute study period
    - specific mood at the onset of music playing and at the end of the 30-minute study period
    - whether participants were studying
    - their location at that moment (if studying)
    - whether they were studying alone (if studying)
    - the types of study tasks (if studying)
    - the perceived level of difficulty of the study task
    - whether participants were planning to listen to music while studying
    - (various) reasons for music listening
    - (various) perceived positive and negative impacts of studying with music

    Each row represents the data for a single participant. Rows with a record of a participant ID but no associated data indicate that the participant did not respond to the questionnaire (i.e., missing data).

    File: esm_music_features_analysis.xlsx
    This file presents the music features of each recorded music track during both the study-episodes and the everyday-episodes (retrieved from Spotify's "Get Track's Audio Features" API). These features are:
    - energy level
    - loudness
    - valence
    - tempo
    - mode

    The contextual details of the moments each track was being played are also presented here, which include:
    - whether the participant was studying
    - their location (e.g., at home, cafe, university)
    - whether they were studying alone
    - the type of study tasks they were engaging with (e.g., reading, writing)
    - the perceived difficulty level of the task

    File: esm_demographics.xlsx
    This file contains the demographics of the sample in Experiment 2 (N = 10), which are the same as in Experiment 1 (see above). Each row represents the data for a single participant. Rows with a record of a participant ID but no associated demographic data indicate that the participant did not respond to the questionnaire (i.e., missing data).

    File: index.xlsx
    Finally, this file contains all the abbreviations used in each document as well as their explanations.

  16. Environmental data for Macrobenthic sampling stations in the Chukchi Sea...

    • data.ucar.edu
    • dataone.org
    • +1more
    excel
    Updated Oct 7, 2025
    Cite
    Arny L. Blanchard; Howard Feder (2025). Environmental data for Macrobenthic sampling stations in the Chukchi Sea [Blanchard] [Dataset]. http://doi.org/10.5065/D6HX19QD
    Explore at:
    Available download formats: Excel
    Dataset updated
    Oct 7, 2025
    Authors
    Arny L. Blanchard; Howard Feder
    Time period covered
    Aug 29, 1986 - Oct 5, 1987
    Area covered
    Description

    These data were collected in 1986 and 1987 under the supervision of Dr. Howard Feder. The data include sediment grain-size (%), sediment organic carbon (mg per g), bottom-water temperature (C), and bottom-water salinity. See the related publication for details. This dataset is part of the Pacific Marine Arctic Regional Synthesis (PacMARS) Project. Note that there are missing values that exist as blanks in the Excel file.

  17. New Haven Crime Data, 2010

    • dataverse.harvard.edu
    Updated Jun 9, 2015
    + more versions
    Cite
    Raman Prasad (2015). New Haven Crime Data, 2010 [Dataset]. http://doi.org/10.7910/DVN/P9EON2
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 9, 2015
    Dataset provided by
    Harvard Dataverse
    Authors
    Raman Prasad
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    New Haven
    Description

    This dataset includes files used for the New Haven Crime Log website, www.newhavencrimelog.org. It covers the year 2010 and includes:
    - Excel files provided by the police department*
    - Text files, which should be the same as the Excel files
    - Geocoded text files
    Error files from geocoding are not included; incidents are missing between the Excel files and the geocoded text files.
    * The June/July Excel files are missing, but text versions are still available.

  18. A brief dataset highlighting online learning test scores of Bangladeshi...

    • data.mendeley.com
    Updated Feb 6, 2024
    + more versions
    Cite
    Shabab Rahman (2024). A brief dataset highlighting online learning test scores of Bangladeshi high-school students [Dataset]. http://doi.org/10.17632/g88h8vz9kg.2
    Explore at:
    Dataset updated
    Feb 6, 2024
    Authors
    Shabab Rahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    Purposive sampling was the method we chose to collect the data. We obtained information from two after-school coaching programs that voluntarily provided their online learning data to us in 2020 during the pandemic. Batches of 45 and 75 students each were used to organize the data, which were then combined to create a single dataset with 399 entries. Two phases of collection took place: on January 17, 2023, and on February 12, 2023. The initial data recording was done using Google Learning Management System's Google Classroom. The data was then exported to local storage by the classroom faculties and then passed onto the researchers. Excel was used to organize the data, with rows representing individual students and columns representing different topics. The dataset, which consists of four mock tests and sixteen physics topics, was gathered from grade 10 physics instructors and students. Every pupil was given a unique ID to protect their privacy, resulting in 399 distinct entries overall. The coaching institution standardized the dataset to score it out of 100 for consistency. It is important to note that for students who did not take the majority of the exams, the institutions did not gather or transmit missing data. The dataset displays a spread with a standard deviation of 20.5 and an average score of 69.547.

  19. A dataset from a survey investigating disciplinary differences in data citation

    • nde-dev.biothings.io
    • data.niaid.nih.gov
    • +1more
    Updated Jul 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gregory, Kathleen (2024). A dataset from a survey investigating disciplinary differences in data citation [Dataset]. https://nde-dev.biothings.io/resources?id=zenodo_7555362
    Explore at:
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Gregory, Kathleen
    Ninkov, Anton Boudreau
    Haustein, Stefanie
    Ripp, Chantal
    Peters, Isabella
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GENERAL INFORMATION

    Title of Dataset: A dataset from a survey investigating disciplinary differences in data citation

    Date of data collection: January to March 2022

    Collection instrument: SurveyMonkey

    Funding: Alfred P. Sloan Foundation

    SHARING/ACCESS INFORMATION

    Licenses/restrictions placed on the data: These data are available under a CC BY 4.0 license

    Links to publications that cite or use the data:

    Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain. https://doi.org/10.5281/ZENODO.6951437

    Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation. Zenodo. https://doi.org/10.5281/zenodo.7555266

    DATA & FILE OVERVIEW

    File List

    Filename: MDCDatacitationReuse2021Codebookv2.pdf (codebook)

    Filename: MDCDataCitationReuse2021surveydatav2.csv (dataset in CSV format)

    Filename: MDCDataCitationReuse2021surveydatav2.sav (dataset in SPSS format)

    Filename: MDCDataCitationReuseSurvey2021QNR.pdf (questionnaire)

    Additional related data collected but not included in the current data package: open-ended questions asked of respondents

    METHODOLOGICAL INFORMATION

    Description of methods used for collection/generation of data:

    The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020.

    The survey received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final total contains 2,492 complete responses, giving an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
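
    A quick back-of-the-envelope check of these rates (a sketch only; the size of the invited sample is inferred from the uncorrected rate rather than stated in the documentation):

        complete = 2492                  # complete responses retained
        invited = complete / 0.0157      # implied number of invitations, roughly 158,700
        undeliverable = 5201             # invalid emails, bounces and opt-outs
        corrected_rate = complete / (invited - undeliverable)
        print(f"{corrected_rate:.2%}")   # about 1.62%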

    Methods for processing the data:

    Results were downloaded from SurveyMonkey in CSV format and were prepared for analysis using Excel and SPSS by recoding ordinal and multiple choice questions and by removing missing values.

    Instrument- or software-specific information needed to interpret the data:

    The dataset is provided in SPSS format, which requires IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.

    DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata

    Number of variables: 95

    Number of cases/rows: 2,492

    Missing data codes: 999 (Not asked)

    Refer to MDCDatacitationReuse2021Codebook.pdf for detailed variable information.
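
    A minimal loading sketch based on the file list and missing-data code above; treating 999 as missing and the use of pandas are assumptions of this sketch, not requirements of the dataset.

        import pandas as pd

        # Load the coded CSV distribution; 999 is documented as "Not asked",
        # so it is treated as missing rather than as a numeric answer.
        df = pd.read_csv("MDCDataCitationReuse2021surveydatav2.csv", na_values=[999])

        print(df.shape)                  # expected: 2,492 rows (cases) x 95 columns (variables)

        # Variable meanings and value labels must be looked up in the codebook
        # (MDCDatacitationReuse2021Codebookv2.pdf); the .sav file can be read with
        # pyreadstat.read_sav() if IBM SPSS Statistics is not available.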

  20. SPORTS_DATA_ANALYSIS_ON_EXCEL

    • kaggle.com
    zip
    Updated Dec 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nil kamal Saha (2024). SPORTS_DATA_ANALYSIS_ON_EXCEL [Dataset]. https://www.kaggle.com/datasets/nilkamalsaha/sports-data-analysis-on-excel
    Explore at:
    zip(1203633 bytes)Available download formats
    Dataset updated
    Dec 12, 2024
    Authors
    Nil kamal Saha
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PROJECT OBJECTIVE

    We are part of XYZ Co Pvt Ltd, a company in the business of organizing sports events at the international level. Countries nominate sportsmen from different departments, and our team has been given the responsibility to systematize the membership roster and generate different reports as per business requirements.

    Questions (KPIs)

    TASK 1: STANDARDIZING THE DATASET

    • Populate the FULLNAME consisting of the following fields ONLY, in the prescribed format: PREFIX FIRSTNAME LASTNAME (Note: All UPPERCASE)
    • Get the COUNTRY NAME to which these sportsmen belong. Make use of the LOCATION sheet to get the required data
    • Populate the LANGUAGE SPOKEN by the sportsmen. Make use of the LOCATION sheet to get the required data
    • Generate the EMAIL ADDRESS for members who speak English in the prescribed format lastname.firstname@xyz.org (Note: All lowercase); for all other members, the format should be lastname.firstname@xyz.com (Note: All lowercase)
    • Populate the SPORT LOCATION of the sport played by each player. Make use of the SPORT sheet to get the required data (an illustrative sketch of this task follows the list)
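
    The same standardization logic, sketched in pandas purely for illustration; the project itself calls for Excel functions such as UPPER, VLOOKUP/XLOOKUP and LOWER, and the workbook name, join keys and exact column headers below are assumptions.

        import pandas as pd

        # Workbook and sheet names follow the task wording; column headers and the
        # COUNTRY_CODE join key are assumptions made for illustration only.
        xls = pd.ExcelFile("sports_members.xlsx")
        sportsmen = xls.parse("SPORTSMEN")
        location = xls.parse("LOCATION")
        sport = xls.parse("SPORT")

        # FULLNAME: PREFIX FIRSTNAME LASTNAME, all uppercase
        sportsmen["FULLNAME"] = (
            sportsmen["PREFIX"] + " " + sportsmen["FIRSTNAME"] + " " + sportsmen["LASTNAME"]
        ).str.upper()

        # COUNTRY NAME and LANGUAGE SPOKEN looked up from the LOCATION sheet
        sportsmen = sportsmen.merge(
            location[["COUNTRY_CODE", "COUNTRY_NAME", "LANGUAGE"]],
            on="COUNTRY_CODE", how="left",
        )

        # EMAIL: lastname.firstname@xyz.org for English speakers, @xyz.com otherwise
        local_part = (sportsmen["LASTNAME"] + "." + sportsmen["FIRSTNAME"]).str.lower()
        domain = sportsmen["LANGUAGE"].eq("English").map({True: "@xyz.org", False: "@xyz.com"})
        sportsmen["EMAIL"] = local_part + domain

        # SPORT LOCATION looked up from the SPORT sheet
        sportsmen = sportsmen.merge(sport[["SPORT", "SPORT_LOCATION"]], on="SPORT", how="left")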

    TASK 2: DATA FORMATTING

    • Display MEMBER ID always as a 3-digit number (Note: 001, 002, ..., 020, ... etc.)
    • Format the BIRTHDATE as dd mmm'yyyy (Prescribed format example: 09 May'1986)
    • Display the units for the WEIGHT column (Prescribed format example: 80 kg)
    • Format the SALARY to show the data in thousands. If SALARY is less than 100,000, display the data with 2 decimal places; otherwise display it with 1 decimal place. In both cases the units should be thousands (k), e.g. 87670 -> 87.67 k and 123250 -> 123.2 k (an illustrative sketch follows this list)
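
    Illustrative Python equivalents of these display rules (in the workbook itself they would be custom number formats and TEXT formulas); the function names below are hypothetical.

        from datetime import date

        def fmt_member_id(member_id: int) -> str:
            return f"{member_id:03d}"              # 1 -> "001", 20 -> "020"

        def fmt_birthdate(d: date) -> str:
            return d.strftime("%d %b'%Y")          # 1986-05-09 -> "09 May'1986"

        def fmt_weight(weight_kg: float) -> str:
            return f"{weight_kg:g} kg"             # 80 -> "80 kg"

        def fmt_salary(salary: float) -> str:
            # below 100,000 -> two decimals; otherwise one decimal; always in thousands
            decimals = 2 if salary < 100_000 else 1
            return f"{salary / 1000:.{decimals}f} k"

        print(fmt_member_id(20), fmt_birthdate(date(1986, 5, 9)),
              fmt_weight(80), fmt_salary(87670), fmt_salary(123250))
        # -> 020 09 May'1986 80 kg 87.67 k 123.2 k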

    TASK 3: SUMMARIZE DATA - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table in the worksheet ANALYSIS, starting at cell B3, with the following details (an equivalent pandas sketch follows this list):

    • In COLUMNS: group by GENDER.
    • In ROWS: group by COUNTRY (Note: use COUNTRY NAMES).
    • In VALUES: calculate the count of candidates for each COUNTRY and GENDER combination. Remove GRAND TOTALs.
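
    For comparison, the same pivot expressed in pandas, continuing from the Task 1 sketch above; column names remain assumptions.

        import pandas as pd

        # `sportsmen` is the DataFrame produced by the Task 1 sketch above.
        pivot = pd.pivot_table(
            sportsmen,
            index="COUNTRY_NAME",      # rows grouped by country name
            columns="GENDER",          # columns grouped by gender
            values="MEMBER_ID",
            aggfunc="count",           # count of candidates per country/gender
            margins=False,             # no grand totals
        )
        print(pivot)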

    TASK 4: SUMMARIZE DATA - EXCEL FUNCTIONS (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a SUMMARY table in the worksheet ANALYSIS, starting at cell G4, with the following details:

    • Starting from range H4, get the distinct GENDER values. Use the Remove Duplicates option and transpose the data.
    • Starting from range G5, get the distinct COUNTRY values (Note: use COUNTRY NAMES).
    • In the cross table, get the count of candidates for each COUNTRY and GENDER combination (a cross-tabulation sketch follows this list).
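
    The Task 4 cross table can likewise be sketched as a cross-tabulation, under the same assumptions as the Task 1 and Task 3 sketches:

        import pandas as pd

        # Count of candidates for each COUNTRY NAME x GENDER combination,
        # mirroring the summary table built with Remove Duplicates and COUNTIFS-style counting.
        summary = pd.crosstab(sportsmen["COUNTRY_NAME"], sportsmen["GENDER"])
        print(summary)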

    TASK 5: GENERATE REPORT - PIVOT TABLE (Use SPORTSMEN worksheet after attempting TASK 1)

    • Create a PIVOT table report in the worksheet REPORT, starting at cell A3, with the following information:

    • Change the report layout to TABULAR form.
    • Remove expand and collapse buttons.
    • Remove GRAND TOTALs.
    • Allow user to filter the data by SPORT LOCATION.

    Process

    • Verified the data for missing values and anomalies, and sorted them out.
    • Made sure the data is consistent and clean with respect to data type, data format and values used.
    • Created pivot tables according to the questions asked.