Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains one Excel workbook and five Word documents. Simulation.xlsx describes the parameter values used for the numerical analysis, based on empirical data; in this workbook we calculated the value of each capped call-option model parameter. Computation of Table 2.docx and the other documents show the results of the comparative statics.
This dataset is a cleaned and preprocessed version of the original Netflix Movies and TV Shows dataset available on Kaggle. All cleaning was done using Microsoft Excel — no programming involved.
🎯 What’s Included:
- Cleaned Excel file (standardized columns, proper date format, duplicates and missing values removed)
- A separate "formulas_used.txt" file listing all Excel formulas used during cleaning (e.g., TRIM, CLEAN, DATE, SUBSTITUTE, TEXTJOIN)
- Columns like 'date_added' properly formatted into DMY structure
- Multi-valued columns like 'listed_in' split for better analysis
- Null values replaced with “Unknown” for clarity
- Duration field broken into numeric + unit components
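As an illustration only, the same cleaning steps can be sketched in R. The file and column names below (netflix_titles.xlsx, show_id, title, country, date_added, duration) are assumptions based on the original Kaggle dataset, not files shipped with this package:

```r
# Minimal R sketch of the cleaning steps described above (assumed names).
library(readxl)

netflix <- read_excel("netflix_titles.xlsx")          # hypothetical file name
netflix <- netflix[!duplicated(netflix$show_id), ]    # remove duplicate rows

# Standardize whitespace (the Excel TRIM/CLEAN equivalent)
netflix$title <- trimws(netflix$title)

# Parse 'date_added' (e.g., "September 25, 2021") into a proper Date
netflix$date_added <- as.Date(netflix$date_added, format = "%B %d, %Y")

# Replace missing values with "Unknown"
netflix$country[is.na(netflix$country)] <- "Unknown"

# Split 'duration' (e.g., "90 min", "2 Seasons") into numeric + unit
netflix$duration_value <- as.numeric(sub("^(\\d+).*$", "\\1", netflix$duration))
netflix$duration_unit  <- sub("^\\d+\\s*", "", netflix$duration)
```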
🔍 Dataset Purpose: Ideal for beginners and analysts who want to:
- Practice data cleaning in Excel
- Explore Netflix content trends
- Analyze content by type, country, genre, or date added
📁 Original Dataset Credit: The base version was originally published by Shivam Bansal on Kaggle: https://www.kaggle.com/shivamb/netflix-shows
📌 Bonus: You can find a step-by-step cleaning guide and the same dataset on GitHub as well — along with screenshots and formulas documentation.
Spatial extent: California
Spatial Unit: Census Tract
Created: Oct 20, 2021
Updated: Oct 20, 2021
Source: California Office of Environmental Health Hazard Assessment
Contact Email: CalEnviroScreen@oehha.ca.gov
Source Link: https://oehha.ca.gov/calenviroscreen/report/calenviroscreen-40
Microsoft Excel spreadsheet and PDF with a Data Dictionary: This zipped folder contains two files: 1) a spreadsheet showing raw data and calculated percentiles for individual indicators and combined CalEnviroScreen scores for individual census tracts, with additional demographic information; and 2) a PDF document including the data dictionary and information on zeros and missing values: CalEnviroScreen 4.0 Excel and Data Dictionary PDF.
THE USE OF MICROSOFT EXCEL IN TITANIC MACHINE LEARNING ON KAGGLE
This is a Titanic dataset. It was cleaned with the help of Microsoft Excel. I used several Excel functions (e.g., IF, COUNTIFS) to apply one-hot encoding to all the categorical features such as Cabin, Embarked, and Sex. All the full names of Titanic passengers were also categorized into "Mr", "Mrs", "Master", and "Miss" by using Excel's "Remove Duplicates" feature, which helped extract only the titles from the passengers' full names. The feature "Family" was encoded as "IsAlone" using the Excel function IF, where "0" means the passenger travelled alone (without any siblings, spouse, or relatives) and "1" means the passenger travelled with at least one person, e.g., a spouse, family member, or relative. The feature "Ticket", which comes in many varieties, was also one-hot encoded with the same Excel functions, IF and COUNTIFS, to categorize the different types of tickets taken by Titanic passengers. The missing values in the feature "Age" were replaced with the mode, i.e., the age with the highest frequency of occurrence. At the end of this, the train score is 84.9% (using XGBClassifier) while the test prediction score is 84.3% (difference = 84.9% - 84.3% = 0.6%).
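For comparison with the Excel workflow, the same feature engineering can be sketched in R. Column names follow the standard Kaggle Titanic files; this is an illustration, not the author's workbook:

```r
# Minimal R sketch of the feature engineering described above,
# assuming the standard Kaggle columns (Name, SibSp, Parch, Age, Embarked).
train <- read.csv("train.csv", stringsAsFactors = FALSE)

# Extract the title ("Mr", "Mrs", "Miss", "Master") from the full name,
# e.g. "Braund, Mr. Owen Harris" -> "Mr"
train$Title <- sub("^.*,\\s*([A-Za-z]+)\\..*$", "\\1", train$Name)

# One-hot encode a categorical feature (here: Embarked)
for (lev in unique(na.omit(train$Embarked))) {
  train[[paste0("Embarked_", lev)]] <- as.integer(train$Embarked == lev)
}

# IsAlone: 0 if the passenger travelled alone, 1 otherwise
# (matching the 0/1 convention described above)
train$IsAlone <- as.integer((train$SibSp + train$Parch) > 0)

# Replace missing Age values with the mode (most frequent age)
age_mode <- as.numeric(names(which.max(table(train$Age))))
train$Age[is.na(train$Age)] <- age_mode
```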
Titanic Machine Learning leaderboard score: 0.8181, among the top 4% on the LB, and this was my first Machine Learning submission on Kaggle.com.
Note: I have since been able to attain the top 1% on Kaggle, with an LB score of 0.8889, through rigorous research into different Machine Learning approaches.
First of all, I reached this little achievement in the Kaggle Machine Learning competition barely five (5) months after I joined and started applying my acquired ML knowledge. I acknowledge those behind this amazing platform called Kaggle.com, and I really appreciate those who have taken time out to teach online how to clean a dataset using several Microsoft Excel functions. I learnt a lot from these online videos. Thus, we can imagine what the combination of Excel and Python code can do.
The use of Excel functions to clean up a dataset marveled me, and it showed how powerful Microsoft Excel can be. Nevertheless, I would love to see whether there is a new or different approach to encoding dataset features and to fixing or replacing missing values in a dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Finding a good data source is the first step toward creating a database. Cardiovascular diseases (CVDs) are the leading cause of death worldwide. CVDs include coronary heart disease, cerebrovascular disease, rheumatic heart disease, and other heart and blood vessel problems. According to the World Health Organization, 17.9 million people die from CVDs each year. Heart attacks and strokes account for more than four out of every five CVD deaths, with one-third of these deaths occurring before the age of 70.

A comprehensive database of factors that contribute to a heart attack has been constructed. The main purpose here is to collect characteristics of heart attacks, or the factors that contribute to them. A form was created in Microsoft Excel to accomplish this. Figure 1 depicts the form, which has nine fields: eight input fields and one output field. Age, gender, heart rate, systolic BP, diastolic BP, blood sugar, CK-MB, and troponin test results are the input fields, while the output field pertains to the presence of a heart attack, divided into two categories (negative and positive): negative refers to the absence of a heart attack, while positive refers to its presence. Table 1 shows detailed information and the minimum and maximum attribute values for the 1,319 cases in the whole database. To confirm the validity of these data, we examined the patient files in the hospital archive and compared them with the data stored in the laboratory system; we also interviewed the patients and specialized doctors. Table 2 shows a sample of 44 cases and the factors that lead to a heart attack in the whole database.

After collecting these data, we checked whether they contained null values (invalid values) or errors introduced during data collection. A value is null if it is unknown. Null values necessitate special treatment: this value indicates that the target is not a valid data element, you can encounter the keyword null when trying to retrieve data that is not present, and arithmetic operations on a numeric column with one or more null values will themselves yield null. An example of null-value processing is shown in Figure 2.

The data used in this investigation were scaled between 0 and 1 to guarantee that all inputs and outputs received equal attention and to eliminate their dimensionality. Prior to the use of AI models, data normalization has two major advantages: first, it prevents attributes in larger numeric ranges from overshadowing attributes in smaller numeric ranges; second, it avoids numerical problems during processing. After completing the normalization process, we split the data set into two parts, training and test sets, using 1,060 cases for training and 259 for testing. Modeling was then implemented using the input and output variables.
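The scaling and split described above can be sketched in R as follows. The file name and column names are assumptions for illustration, not part of the dataset:

```r
# Minimal R sketch: min-max scaling to [0, 1], then a 1,060/259 split.
heart <- read.csv("heart_attack_data.csv")   # hypothetical file name

# Min-max normalization: x' = (x - min) / (max - min)
inputs <- c("age", "gender", "heart_rate", "systolic_bp",
            "diastolic_bp", "blood_sugar", "ck_mb", "troponin")
heart[inputs] <- lapply(heart[inputs], function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
})

# Split into 1,060 training and 259 test cases
set.seed(42)
idx   <- sample(nrow(heart), 1060)
train <- heart[idx, ]
test  <- heart[-idx, ]
```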
Excel, Stata, SPSS, Epi Info
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is provided in a single .xlsx file named "eucalyptus_growth_environment_data_V2.xlsx" and consists of fifteen sheets:
Codebook: This sheet details the index, values, and descriptions for each field within the dataset, providing a comprehensive guide to understanding the data structure.
ALL NODES: Contains measurements from all devices, totalling 102,916 data points. This sheet aggregates the data across all nodes.
GWD1 to GWD10: These subset sheets include measurements from individual nodes, labelled according to the abbreviation “Generic Wireless Dendrometer” followed by device IDs 1 through 10. Each sheet corresponds to a specific node, representing measurements from ten trees (or nodes).
Metadata: Provides detailed metadata for each node, including species, initial diameter, location, measurement frequency, battery specifications, and irrigation status. This information is essential for identifying and differentiating the nodes and their specific attributes.
Missing Data Intervals: Details gaps in the data stream, including start and end dates and times when data was not uploaded. It includes information on the total duration of each missing interval and the number of missing data points.
Missing Intervals Distribution: Offers a summary of missing data intervals and their distribution, providing insight into data gaps and reasons for missing data.
All nodes utilize LoRaWAN for data transmission. Please note that intermittent data gaps may occur due to connectivity issues between the gateway and the nodes, as well as maintenance activities or experimental procedures.
Software considerations: The provided R code named “Simple_Dendro_Imputation_and_Analysis.R” is a comprehensive analysis workflow that processes and analyses Eucalyptus growth and environmental data from the "eucalyptus_growth_environment_data_V2.xlsx" dataset. The script begins by loading necessary libraries, setting the working directory, and reading the data from the specified Excel sheet. It then combines date and time information into a unified DateTime format and performs data type conversions for relevant columns. The analysis focuses on a specified device, allowing for the selection of neighbouring devices for imputation of missing data. A loop checks for gaps in the time series and fills in missing intervals based on a defined threshold, followed by a function that imputes missing values using the average from nearby devices. Outliers are identified and managed through linear interpolation. The code further calculates vapor pressure metrics and applies temperature corrections to the dendrometer data. Finally, it saves the cleaned and processed data into a new Excel file while conducting dendrometer analysis using the dendRoAnalyst package, which includes visualizations and calculations of daily growth metrics and correlations with environmental factors such as vapour pressure deficit (VPD).
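As an illustration of the neighbour-based imputation step, a minimal R sketch follows, under the assumption that the "ALL NODES" sheet holds one column per device (GWD1 to GWD10); the actual, full procedure is implemented in the provided Simple_Dendro_Imputation_and_Analysis.R:

```r
# Minimal sketch: impute missing readings of a target device using the
# mean of the same timestamps on neighbouring devices (assumed columns).
library(readxl)

all_nodes <- read_excel("eucalyptus_growth_environment_data_V2.xlsx",
                        sheet = "ALL NODES")

target     <- all_nodes$GWD1                      # device being repaired
neighbours <- cbind(all_nodes$GWD2, all_nodes$GWD3)  # chosen neighbours
gap        <- is.na(target)

target[gap]    <- rowMeans(neighbours, na.rm = TRUE)[gap]
all_nodes$GWD1 <- target
```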
Example of a filtered Microsoft Excel spreadsheet for TaAMY2 single null mutant detection (selected data).
Missing values are indicated as NA, and explanations of headings are provided in the tab 'codes' of the Excel files and in the "read me first" file. The data file can be opened with Excel and/or R (R code is also provided).
File List
Verberk body size.xls
Verberk body size.csv
Verberk richness.xls
Verberk richness.csv

Description
[Please be aware that ESA cannot guarantee the forward migration (availability) of the Excel (.xls) files]
Verberk body size.xls gives all the data and formulae used to analyze the relationship between amphipod body size and clines in environmental conditions purportedly linked with oxygen (altitude, temperature, salinity). Verberk body size.csv is a comma-delimited version of the body size worksheet, giving all the data and the calculated values for the oxygen supply index, but without the formulae. Verberk richness.xls gives all the data and formulae used to analyze the relationship between species richness and clines in environmental conditions purportedly linked with oxygen (altitude, organic pollution, temperature). Verberk richness.csv is a comma-delimited version of the richness worksheet, giving all the data and the calculated values for the oxygen supply index, but without the formulae.

Checksum values:
Verberk body size.xls and .csv: column 1 (Site descriptors): checksum is 136, no missing values
Verberk richness.xls: column 1 (Site descriptors): checksum is 655, 3 missing values
Verberk richness.csv: column 2 (Site descriptors): checksum is 655, 3 missing values
Data from a small meteorological station set up near 21 plots in 2013. Campbell Scientific CR10 datalogger, Campbell 215 temp/humidity sensor, two Apogee PAR sensors (one facing up, another facing down), soil temperature with a type T thermocouple, and a Campbell CS616 soil reflectometer for soil water content. Data collected between DOY 153 and DOY 224. The logger collected a measurement every 60 seconds and averaged to a 5 min data table, with post-processing to 60 min averages and daily mean, max, and min. MS Excel (.xls) workbook with three worksheets.

Worksheet 5_min data columns: year, day of year, hour, minute, fractional day of year, incoming PAR (umol m-2 s-1), reflected PAR (umol m-2 s-1), albedo calculated as (par_out/par_in)*100, air temperature (C), relative humidity (%), soil temp (C), raw reflectance time reported by CS616, calculated volumetric water content corrected for soil temperature (v/v), battery voltage.

Worksheet 60_min data columns (units as above): day of year, hour, fractional day of year, week of year, air temperature, relative humidity, incoming PAR, outgoing PAR, albedo, soil temperature, and volumetric water content.

Worksheet daily columns (units as above unless indicated): date, day of year, air temperature min, air temperature max, air temperature mean, relative humidity min, relative humidity max, relative humidity mean, soil temperature mean, soil water content mean, total incoming PAR (mol m-2 d-1), outgoing PAR (mol m-2 d-1), albedo, minimum battery voltage.

Missing values are -6999 or 6999. Soil temperature and VWC are not valid until the instruments could be installed in the soil on DOY 163. The RH sensor failed on DOY 177 and did not function again. Battery issue on DOY 183.
Methods for data collection are described in the manuscript.
There are scattered missing values for some of these variables. They appear as blank cells in the Excel file.
This dataset contains a zipped file (dailyQs.zip) of daily streamflow data for 1,378 streamgages in 19 study regions in the conterminous U.S. from October 1, 1980 through September 30, 2013 from mostly undisturbed watersheds. USGS streamgages that were identified as being of “reference” quality in the GAGES-II dataset (https://water.usgs.gov/lookup/getspatial?gagesII_Sept2011) and had at least 10 complete water years (WYs) during the study period from WY1981 through WY2013 were selected. Daily streamflow data were retrieved from the National Water Information System (NWIS) on April 18, 2016. This dataset also contains an Excel file (compWYs.xlsx) indicating for each WY during the study period whether a streamgage had a complete streamflow record (no missing values) during that year. Only complete WYs of daily streamflow data during the study period from the selected streamgages were used to compute the empirical FDC quantiles to which regression equations were fitted. These data support a concurrent publication (Over and others, 2018).
These data were collected in 1986 and 1987 under the supervision of Dr. Howard Feder. The data include sediment grain-size (%), sediment organic carbon (mg per g), bottom-water temperature (C), and bottom-water salinity. See the related publication for details. This dataset is part of the Pacific Marine Arctic Regional Synthesis (PacMARS) Project. Note that there are missing values that exist as blanks in the Excel file.
This database can be used for macro-level analysis of road accidents on interurban roads in Europe. Through the variables it contains, road accidents can be explained using variables related to economic resources invested in roads, traffic, the road network, socioeconomic characteristics, legislative measures, and meteorology. This repository contains the data used for the analysis carried out in the following papers:
1. Calvo-Poyo F., Navarro-Moreno J., de Oña J. (2020) Road Investment and Traffic Safety: An International Study. Sustainability 12:6332. https://doi.org/10.3390/su12166332
2. Navarro-Moreno J., Calvo-Poyo F., de Oña J. (2022) Influence of road investment and maintenance expenses on injured traffic crashes in European roads. Int J Sustain Transp 1–11. https://doi.org/10.1080/15568318.2022.2082344
3. Navarro-Moreno, J., Calvo-Poyo, F., de Oña, J. (2022) Investment in roads and traffic safety: linked to economic development? A European comparison. Environ. Sci. Pollut. Res. https://doi.org/10.1007/s11356-022-22567
The file with the database is available in Excel.

DATA SOURCES
The database presents data from 1998 up to 2016 from 20 European countries: Austria, Belgium, Croatia, Czechia, Denmark, Estonia, Finland, France, Germany, Ireland, Italy, Latvia, Netherlands, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden and United Kingdom.
Crash data were obtained from the United Nations Economic Commission for Europe (UNECE) [2], which offers a sufficient level of disaggregation between crashes occurring inside versus outside built-up areas. With reference to the data on economic resources invested in roadways, the database of the Organisation for Economic Co-operation and Development (OECD), managed by the International Transport Forum (ITF) [1], deserves mention given its extensive coverage; it collects data on investment in the construction of roads and expenditure on their maintenance, following the definitions of the United Nations System of National Accounts (2008 SNA). Despite some data gaps, the time series are consistent from one country to the next. Moreover, to confirm this consistency and to complete missing data, diverse additional sources, mainly the national transport ministries of the respective countries, were consulted. All monetary values were converted to constant 2015 prices using the OECD price index.
To obtain the rest of the variables in the database, as well as to ensure consistency in the time series and complete missing data, the following national and international sources were consulted:
Eurostat [3]
Directorate-General for Mobility and Transport (DG MOVE), European Union [4]
The World Bank [5]
World Health Organization (WHO) [6]
European Transport Safety Council (ETSC) [7]
European Road Safety Observatory (ERSO) [8]
European Climatic Energy Mixes (ECEM) of the Copernicus Climate Change Service [9]
EU BestPoint-Project [10]
Ministerstvo dopravy, Czech Republic [11]
Bundesministerium für Verkehr und digitale Infrastruktur, Germany [12]
Ministerie van Infrastructuur en Waterstaat, Netherlands [13]
National Statistics Office, Malta [14]
Ministério da Economia e Transição Digital, Portugal [15]
Ministerio de Fomento, Spain [16]
Trafikverket, Sweden [17]
Ministère de l’environnement de l’énergie et de la mer, France [18]
Ministero delle Infrastrutture e dei Trasporti, Italy [19–25]
Statistisk sentralbyrå, Norway [26–29]
Instituto Nacional de Estatística, Portugal [30]
Infraestruturas de Portugal S.A., Portugal [31–35]
Road Safety Authority (RSA), Ireland [36]

DATABASE DESCRIPTION
The database was built trying to combine the longest possible time period with the maximum number of countries with a complete dataset (some countries, such as Lithuania, Luxembourg, Malta and Norway, were eliminated from the definitive dataset owing to a lack of data or breaks in the time series of records). Taking the above into account, the definitive database is made up of 19 variables and contains data from 20 countries for the period between 1998 and 2016. Table 1 shows the coding of the variables, as well as their definition and unit of measure.

Table 1. Database metadata
Code | Variable and unit
fatal_pc_km | Fatalities per billion passenger-km
fatal_mIn | Fatalities per million inhabitants
accid_adj_pc_km | Accidents per billion passenger-km
p_km | Billions of passenger-km
croad_inv_km | Investment in road construction per kilometer, €/km (2015 constant prices)
croad_maint_km | Expenditure on road maintenance per kilometer, €/km (2015 constant prices)
prop_motorwa | Proportion of motorways over the total road network (%)
populat | Population, in millions of inhabitants
unemploy | Unemployment rate (%)
petro_car | Consumption of gasoline and petrol derivatives (tons), per passenger car
alcohol | Alcohol consumption, in liters per capita (age > 15)
mot_index | Motorization index, in cars per 1,000 inhabitants
den_populat | Population density, inhabitants/km2
cgdp | Gross Domestic Product (GDP), in € (2015 constant prices)
cgdp_cap | GDP per capita, in € (2015 constant prices)
precipit | Average depth of rain water during a year (mm)
prop_elder | Proportion of people over 65 years (%)
dps | Demerit Point System, dummy variable (0: no; 1: yes)
freight | Freight transport, in billions of ton-km

ACKNOWLEDGEMENTS
This database was produced in the framework of the project “Inversión en carreteras y seguridad vial: un análisis internacional (INCASE)”, financed by FEDER/Ministerio de Ciencia, Innovación y Universidades–Agencia Estatal de Investigación/Proyecto RTI2018-101770-B-I00, within Spain's National Program of R+D+i Oriented to Societal Challenges. Moreover, the authors would like to express their gratitude to the Ministry of Transport, Mobility and Urban Agenda of Spain (MITMA) and the Federal Ministry of Transport and Digital Infrastructure of Germany (BMVI) for providing data for this study.

REFERENCES
1. International Transport Forum. OECD iLibrary | Transport infrastructure investment and maintenance.
2. United Nations Economic Commission for Europe. UNECE Statistical Database. Available online: https://w3.unece.org/PXWeb2015/pxweb/en/STAT/STAT_40-TRTRANS/?rxid=18ad5d0d-bd5e-476f-ab7c-40545e802eeb (accessed on Apr 28, 2020).
3. European Commission. Database - Eurostat. Available online: https://ec.europa.eu/eurostat/data/database (accessed on Apr 28, 2021).
4. Directorate-General for Mobility and Transport, European Commission. EU Transport in figures - Statistical Pocketbooks. Available online: https://ec.europa.eu/transport/facts-fundings/statistics_en (accessed on Apr 28, 2021).
5. World Bank Group. World Bank Open Data | Data. Available online: https://data.worldbank.org/ (accessed on Apr 30, 2021).
6. World Health Organization (WHO). WHO Global Information System on Alcohol and Health. Available online: https://apps.who.int/gho/data/node.main.GISAH?lang=en (accessed on Apr 29, 2021).
7. European Transport Safety Council (ETSC). Traffic Law Enforcement across the EU - Tackling the Three Main Killers on Europe's Roads; Brussels, Belgium, 2011.
8. Copernicus Climate Change Service. Climate data for the European energy sector from 1979 to 2016 derived from ERA-Interim. Available online: https://cds.climate.copernicus.eu/cdsapp#!/dataset/sis-european-energy-sector?tab=overview (accessed on Apr 29, 2021).
9. Klipp, S.; Eichel, K.; Billard, A.; Chalika, E.; Loranc, M.D.; Farrugia, B.; Jost, G.; Møller, M.; Munnelly, M.; Kallberg, V.P.; et al. European Demerit Point Systems: Overview of their main features and expert opinions. EU BestPoint-Project 2011, 1–237.
10. Ministerstvo dopravy. Serie: Ročenka dopravy; Centrum dopravního výzkumu: Prague, Czech Republic.
11. Bundesministerium für Verkehr und digitale Infrastruktur. Verkehr in Zahlen 2003/2004; Hamburg, Germany, 2004; ISBN 3871542946.
12. Bundesministerium für Verkehr und digitale Infrastruktur. Verkehr in Zahlen 2018/2019. In Verkehrsdynamik; Flensburg, Germany, 2018; ISBN 9783000612947.
13. Ministerie van Infrastructuur en Waterstaat. Rijksjaarverslag 2018 a Infrastructuurfonds; The Hague, Netherlands, 2019; ISBN 0921-7371.
14. Ministerie van Infrastructuur en Milieu. Rijksjaarverslag 2014 a Infrastructuurfonds; The Hague, Netherlands, 2015; ISBN 0921-7371.
15. Ministério da Economia e Transição Digital. Base de Dados de Infraestruturas - GEE. Available online: https://www.gee.gov.pt/pt/publicacoes/indicadores-e-estatisticas/base-de-dados-de-infraestruturas (accessed on Apr 29, 2021).
16. Ministerio de Fomento, Dirección General de Programación Económica y Presupuestos, Subdirección General de Estudios Económicos y Estadísticas. Serie: Anuario estadístico; NIPO 161-13-171-0; Centro de Publicaciones, Secretaría General Técnica, Ministerio de Fomento: Madrid, Spain.
17. Trafikverket. The Swedish Transport Administration Annual report: 2017; 2018; ISBN 978-91-7725-272-6.
18. Ministère de l'Équipement, du T. et de la M. Mémento de statistiques des transports 2003; Ministère de l'environnement de l'énergie et de la mer, 2005.
19. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2000; Istituto Poligrafico e Zecca dello Stato: Roma, Italy, 2001.
20. Ministero delle Infrastrutture e dei Trasporti. Conto nazionale dei trasporti 1999. 2000.
21. Ministero delle Infrastrutture e dei Trasporti, Direzione Generale Sistemi Informativi. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2004.
22. Ministero delle Infrastrutture e dei Trasporti. Conto Nazionale delle Infrastrutture e dei Trasporti Anno 2001; 2002.
23. Ministero delle Infrastrutture e dei
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Elective Disciplines Survey Data - Kryvyi Rih State Pedagogical University
Description
Survey data collected from students at Kryvyi Rih State Pedagogical University regarding the process of studying elective disciplines. The survey aims to analyze priorities for improving educational quality and identify issues in the procedure for selecting and studying elective courses.
Dataset Information
Number of Records: 1089
Number of Variables: 15
Data Collection Date: 2025
Location: Kryvyi Rih, Ukraine
License: https://creativecommons.org/licenses/by/4.0/
Variables in Dataset
The dataset contains the following variables:
номер_відповіді: int64
вкажіть_ступінь_вищої_освіти,_який_ви_здобуваєте_в_університеті:: object
вкажіть_освітню_програму,_за_якою_ви_навчаєтесь_в_університеті:: object
чи_знайомі_ви_з_процедурою_вибору_навчальних_дисциплін?: object
(04)_як_ви_ставитесь_до_процедури_вибору_навчальних_дисциплін_в_університеті?: object
якщо_на_попереднє_питання_відповіли_інше,_вкажіть_як_саме.: object
чи_влаштовує_вас_кількість_запропонованих_дисциплін_вільного_вибору?: object
(06)_визначте_чинники,які_впливають_на_вибір_вами_навчальних_дисциплін(оберіть_усі_можливі_варіанти): object
якщо_на_попереднє_питання_відповіли_інше,_вкажіть_як_саме..1: object
чи_ознайомлювалися_ви_з_силабусами_вибіркових_дисциплін_перед_тим,_як_зробити_свій_вибір?: object
(08)_обрані_вами_дисципліни_виявилися:: object
якщо_на_попереднє_питання_відповіли_інше,_вкажіть_як_саме..2: object
які_з_вибраних_вами_дисциплін_виявилися_найбільш_цікавими_й_корисними?: object
чи_обрали_б_ви_ці_дисципліни_повторно,_чи_змінили_б_свій_вибір?: object
ваші_пропозиції_щодо_політики_і_процедур_вибору_навчальних_дисциплін: object
Files in Package
This data package includes the following files:
elective_disciplines_survey_data.csv: Data in CSV format (comma-separated)
elective_disciplines_survey_data.tsv: Data in TSV format (tab-separated)
elective_disciplines_survey_data.json: Data in JSON format (line-delimited JSON records)
elective_disciplines_survey_data.xlsx: Data in Excel format
metadata.json: Comprehensive metadata in JSON-LD format
README.md: This file
Usage and Citation
When using this dataset, please cite:
Kryvyi Rih State Pedagogical University. (2025). Elective Disciplines Survey Data. Zenodo. https://doi.org/[DOI_TO_BE_ASSIGNED]
Processing Information
Loaded original Excel survey data
Cleaned and standardized column names
Handled missing values
Anonymized personal identifiers
Converted to multiple standard formats (CSV, TSV, JSON, Excel)
Generated comprehensive metadata
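A minimal R sketch for loading the distributed files (file names as listed above; the jsonlite package is assumed for the line-delimited JSON):

```r
# Load the package files; column names are Ukrainian, so keep them verbatim.
survey_csv <- read.csv("elective_disciplines_survey_data.csv",
                       fileEncoding = "UTF-8", check.names = FALSE)
survey_tsv <- read.delim("elective_disciplines_survey_data.tsv",
                         fileEncoding = "UTF-8", check.names = FALSE)

# The JSON file is line-delimited: one record per line
lines       <- readLines("elective_disciplines_survey_data.json",
                         encoding = "UTF-8")
survey_json <- lapply(lines, jsonlite::fromJSON)
```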
Contact Information
For questions about this dataset, please contact:
Research Department, Kryvyi Rih State Pedagogical University
Email: semerikov@gmail.com
Website: https://kdpu.edu.ua/en/
A comprehensive Quality Assurance (QA) and Quality Control (QC) statistical framework consists of three major phases: Phase 1: preliminary exploration of the raw data sets, including time formatting and combining datasets of different lengths and different time intervals; Phase 2: QA of the datasets, including detecting and flagging duplicates, outliers, and extreme values; and Phase 3: development of a time series of the desired frequency, imputation of missing values, visualization, and a final statistical summary. The time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado) were analyzed. The developed statistical framework is suitable for both real-time and post-data-collection QA/QC analysis of meteorological datasets.

The files in this data package include one Excel file converted to CSV format (Billy_Barr_raw_qaqc.csv) that contains the raw meteorological data, i.e., the input data for the QA/QC analysis. The second CSV file (Billy_Barr_1hr.csv) is the QA/QC-processed and flagged meteorological data, i.e., the output data from the QA/QC analysis. The last file (QAQC_Billy_Barr_2021-03-22.R) is a script written in R that implements the QA/QC and flagging process. The purpose of the CSV data files included in this package is to provide the input and output files for the R script.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging results for the measures used, as described in the paper. For each subject, it includes multiple columns:
A. a sequential student ID
B. an ID that defines a random group label and the notation
C. the used notation: User Story or Use Cases
D. the case they were assigned to: IFA, Sim, or Hos
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam
F. a categorical representation of the grade as L/M/H, where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise
G. the total number of classes in the student's conceptual model
H. the total number of relationships in the student's conceptual model
I. the total number of classes in the expert's conceptual model
J. the total number of relationships in the expert's conceptual model
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below)
P. the researchers' judgement of how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or (ii) using a generic term
(e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent a legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes within the student model divided by the number of classes within the expert model is calculated (describing the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. Completeness is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
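Expressed compactly, correctness = AL / (AL + OM + SO + WR) and completeness = (AL + WR) / (AL + WR + OM). A minimal R sketch of these two measures, using the per-subject counts from columns K-O (the example numbers are illustrative, not taken from the data):

```r
# Correctness and completeness as defined above.
correctness  <- function(AL, WR, SO, OM) AL / (AL + OM + SO + WR)
completeness <- function(AL, WR, OM)     (AL + WR) / (AL + WR + OM)

# Illustrative example: 10 aligned, 2 wrongly represented,
# 1 system-oriented, 3 omitted classes
correctness(AL = 10, WR = 2, SO = 1, OM = 3)   # 10/16 = 0.625
completeness(AL = 10, WR = 2, OM = 3)          # 12/15 = 0.8
```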
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html. The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Low, and Medium.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Dirty Retail Store Sales dataset contains 12,575 rows of synthetic data representing sales transactions from a retail store. The dataset includes eight product categories with 25 items per category, each having static prices. It is designed to simulate real-world sales data, including intentional "dirtiness" such as missing or inconsistent values. This dataset is suitable for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.
retail_store_sales.csv

| Column Name | Description | Example Values |
|---|---|---|
Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567 |
Customer ID | A unique identifier for each customer. 25 unique customers. | CUST_01 |
Category | The category of the purchased item. | Food, Furniture |
Item | The name of the purchased item. May contain missing values or None. | Item_1_FOOD, None |
Price Per Unit | The static price of a single unit of the item. May contain missing or None values. | 4.00, None |
Quantity | The quantity of the item purchased. May contain missing or None values. | 1, None |
Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, None |
Payment Method | The method of payment used. May contain missing or invalid values. | Cash, Credit Card |
Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Online |
Transaction Date | The date of the transaction. Always present and valid. | 2023-01-15 |
Discount Applied | Indicates if a discount was applied to the transaction. May contain missing values. | True, False, None |
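Because the dataset is intentionally dirty, a typical first cleaning step is to recompute Total Spent from Quantity and Price Per Unit and flag rows where the stored value is missing or inconsistent. A minimal R sketch (column names as in the table above):

```r
# Validate Total Spent = Quantity * Price Per Unit and flag dirty rows.
sales <- read.csv("retail_store_sales.csv", check.names = FALSE)

expected    <- sales$Quantity * sales$`Price Per Unit`
sales$dirty <- is.na(sales$`Total Spent`) |
               is.na(expected) |
               abs(sales$`Total Spent` - expected) > 1e-9

summary(sales$dirty)   # how many rows need attention
```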
The dataset includes the following categories, each containing 25 items with corresponding codes, names, and static prices:
| Item Code | Item Name | Price |
|---|---|---|
| Item_1_EHE | Blender | 5.0 |
| Item_2_EHE | Microwave | 6.5 |
| Item_3_EHE | Toaster | 8.0 |
| Item_4_EHE | Vacuum Cleaner | 9.5 |
| Item_5_EHE | Air Purifier | 11.0 |
| Item_6_EHE | Electric Kettle | 12.5 |
| Item_7_EHE | Rice Cooker | 14.0 |
| Item_8_EHE | Iron | 15.5 |
| Item_9_EHE | Ceiling Fan | 17.0 |
| Item_10_EHE | Table Fan | 18.5 |
| Item_11_EHE | Hair Dryer | 20.0 |
| Item_12_EHE | Heater | 21.5 |
| Item_13_EHE | Humidifier | 23.0 |
| Item_14_EHE | Dehumidifier | 24.5 |
| Item_15_EHE | Coffee Maker | 26.0 |
| Item_16_EHE | Portable AC | 27.5 |
| Item_17_EHE | Electric Stove | 29.0 |
| Item_18_EHE | Pressure Cooker | 30.5 |
| Item_19_EHE | Induction Cooktop | 32.0 |
| Item_20_EHE | Water Dispenser | 33.5 |
| Item_21_EHE | Hand Blender | 35.0 |
| Item_22_EHE | Mixer Grinder | 36.5 |
| Item_23_EHE | Sandwich Maker | 38.0 |
| Item_24_EHE | Air Fryer | 39.5 |
| Item_25_EHE | Juicer | 41.0 |
| Item Code | Item Name | Price |
|---|---|---|
| Item_1_FUR | Office Chair | 5.0 |
| Item_2_FUR | Sofa | 6.5 |
| Item_3_FUR | Coffee Table | 8.0 |
| Item_4_FUR | Dining Table | 9.5 |
| Item_5_FUR | Bookshelf | 11.0 |
| Item_6_FUR | Bed F... |