This dataset is a cleaned and preprocessed version of the original Netflix Movies and TV Shows dataset available on Kaggle. All cleaning was done using Microsoft Excel, with no programming involved.
🎯 What’s Included:
- Cleaned Excel file (standardized columns, proper date format, removed duplicates/missing values)
- A separate "formulas_used.txt" file listing all Excel formulas used during cleaning (e.g., TRIM, CLEAN, DATE, SUBSTITUTE, TEXTJOIN)
- Columns like 'date_added' have been properly formatted into DMY structure
- Multi-valued columns like 'listed_in' are split for better analysis
- Null values replaced with “Unknown” for clarity
- Duration field broken into numeric + unit components
🔍 Dataset Purpose: Ideal for beginners and analysts who want to:
- Practice data cleaning in Excel
- Explore Netflix content trends
- Analyze content by type, country, genre, or date added
📁 Original Dataset Credit: The base version was originally published by Shivam Bansal on Kaggle: https://www.kaggle.com/shivamb/netflix-shows
📌 Bonus: You can find a step-by-step cleaning guide and the same dataset on GitHub as well — along with screenshots and formulas documentation.
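For readers who want a scripted counterpart to the Excel steps listed above, the following is a minimal Python sketch and not the author's method (the dataset itself was cleaned without programming). The column names 'date_added', 'listed_in' and 'duration' come from the description; the sample row, the input date pattern "Month D, YYYY", and the output column names duration_value and duration_unit are assumptions made for illustration.

```python
from datetime import datetime


def clean_row(row):
    """Return a cleaned copy of one record (a dict of column name -> raw text)."""
    cleaned = {}
    for key, value in row.items():
        # Rough analogue of TRIM: strip the ends and collapse runs of whitespace.
        text = " ".join((value or "").split())
        # Empty cells become the placeholder "Unknown".
        cleaned[key] = text if text else "Unknown"

    # Split the multi-valued genre column into a list of genres.
    cleaned["listed_in"] = [g.strip() for g in cleaned["listed_in"].split(",")]

    # Break a duration such as "90 min" or "2 Seasons" into number + unit.
    if cleaned["duration"] != "Unknown":
        number, unit = cleaned["duration"].split(" ", 1)
        cleaned["duration_value"] = int(number)  # hypothetical column name
        cleaned["duration_unit"] = unit          # hypothetical column name

    # Rewrite the date in day-month-year order (input pattern is an assumption).
    if cleaned["date_added"] != "Unknown":
        parsed = datetime.strptime(cleaned["date_added"], "%B %d, %Y")
        cleaned["date_added"] = parsed.strftime("%d-%m-%Y")
    return cleaned


# Made-up sample row, not taken from the dataset.
sample = {
    "title": "  Example   Title ",
    "country": "",
    "date_added": "September 25, 2021",
    "listed_in": "Dramas,  International Movies",
    "duration": "90 min",
}
cleaned_sample = clean_row(sample)
```

Here cleaned_sample["country"] becomes "Unknown" and cleaned_sample["date_added"] becomes "25-09-2021".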

Tool: Microsoft Excel
Dataset: Coffee Sales
Process:
1. Data Cleaning:
• Remove duplicates and blanks.
• Standardize date and currency formats.
2. Data Manipulation:
• Use sorting and filtering functions to work with subsets of interest.
• Use XLOOKUP, INDEX-MATCH and IF formulas for efficient data manipulation, such as retrieving, matching and organising information in spreadsheets.
3. Data Analysis:
• Create Pivot Tables and Pivot Charts with formatting to visualize trends.
4. Dashboard Development:
• Insert Slicers with formatting for easy filtering and dynamic updates.
Highlights: This project aims to understand coffee sales trends by country, roast type, and year, which could help identify marketing opportunities and customer segments.
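The lookup step in the process above (XLOOKUP or INDEX-MATCH) can be pictured with a small Python sketch. This is only an illustration of an exact-match, first-match lookup, which is XLOOKUP's default behaviour; the function name xlookup, the product IDs, the roast types and the orders are all hypothetical and are not taken from the Coffee Sales workbook.

```python
def xlookup(lookup_value, lookup_column, return_column, if_not_found=None):
    """Return the entry of return_column at the first exact match in lookup_column."""
    for key, result in zip(lookup_column, return_column):
        if key == lookup_value:
            return result
    # Excel would show #N/A here unless an if_not_found value is supplied.
    return if_not_found


# Hypothetical product table (two parallel columns).
product_ids = ["P-01", "P-02", "P-03"]
roast_types = ["Light", "Medium", "Dark"]

# Hypothetical orders that reference products by ID.
orders = [("O-1", "P-02"), ("O-2", "P-03"), ("O-3", "P-99")]

# Attach the roast type to each order, as a lookup formula would per row.
enriched = [
    (order_id, xlookup(pid, product_ids, roast_types, "Not found"))
    for order_id, pid in orders
]
```

The order that references the unknown ID "P-99" receives the fallback text "Not found".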

https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
This dataset was used by the NCI's Quantitative Imaging Network (QIN) PET-CT Subgroup for their project titled: Multi-center Comparison of Radiomic Features from Different Software Packages on Digital Reference Objects and Patient Datasets. The purpose of this project was to assess the agreement among radiomic features when computed by several groups by using different software packages under very tightly controlled conditions, which included common image data sets and standardized feature definitions.
The image datasets (and Volumes of Interest – VOIs) provided here are the same ones used in that project and reported in the publication listed below (ISSN 2379-1381 https://doi.org/10.18383/j.tom.2019.00031). In addition, we have provided detailed information about the software packages used (Table 1 in that publication) as well as the individual feature value results for each image dataset and each software package that was used to create the summary tables (Tables 2, 3 and 4) in that publication.
For that project, nine common quantitative imaging features were selected for comparison, including features that describe morphology, intensity, shape, and texture and that are described in detail by the Image Biomarker Standardisation Initiative (IBSI, https://arxiv.org/abs/1612.07003) and the associated publication (Zwanenburg A, Vallières M, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020 May;295(2):328-338. doi: https://doi.org/10.1148/radiol.2020191145).
There are three datasets provided: two image datasets and one dataset consisting of four Excel spreadsheets containing feature values.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Existing studies investigating 30-day in-hospital stroke case fatality rates in sub-Saharan Africa have produced varying results, underscoring the significance of obtaining precise and reliable estimations for this indicator. Consequently, this study aimed to conduct a systematic review and update of the current scientific evidence regarding 30-day in-hospital stroke case fatality and associated risk factors in sub-Saharan Africa. Medline/PubMed, Cumulative Index to Nursing and Allied Health Literature (CINAHL), APA PsycNet (encompassing PsycINFO and PsychArticle), Google Scholar, and Africa Journal Online (AJOL) were systematically searched to identify potentially relevant articles. Two independent assessors extracted the data from the eligible studies using a pre-tested and standardized Excel spreadsheet. Outcomes were 30-day in-hospital stroke case fatality and associated risk factors. Data were pooled using a random-effects model. Ninety-three (93) studies involving 42,057 participants were included. The overall stroke case fatality rate was 27% [25%-29%]. Subgroup analysis revealed stroke case fatality rates of 24% [21%-28%], 25% [21%-28%], 29% [25%-32%] and 31% [20%-43%] in East Africa, Southern Africa, West Africa, and Central Africa respectively. Stroke severity, stroke type, untyped stroke, and post-stroke complications were identified as risk factors. The most prevalent risk factors were low (

This data set is comprised of five files related to the modification and scoring of Index of Waterbird Community Integrity (IWCI) scores for all waterbirds of the Chesapeake Bay. One Excel file (A) contains a list of 100+ Chesapeake waterbird species and their species attribute and IWCI scores. Another Excel file (B) contains case study data from recent surveys of breeding and migratory waterbirds in Chesapeake Bay and shoreline delineations across a disturbance gradient that were used to demonstrate the utility of the modified index. Finally, three supplemental files include an Access database (C), R code (D) and a protocol (E) for running the complex steps to calculate index scores.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Classification on methodological quality and risk of bias of included studies.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All published datasets from the ice core paleoclimatology (ICP) group at the Byrd Polar and Climate Research Center (BPCRC) are archived in the NOAA-NCEI Paleoclimatology Database (https://www.ncei.noaa.gov/access/paleo-search/?dataTypeId=7). However, the formatting of these datasets is not consistent across the archival files, making it difficult to download and aggregate multiple datasets for research purposes. This repository is intended to provide a simple, consistently formatted archive of Excel files containing the published data for more than 16 ice core records collected by the BPCRC-ICP group since the 1980s.
These files can be accessed directly in MATLAB with the assistance of the Byrd-ICP Data App, downloadable on GitHub (https://github.com/weber1158/Byrd-ICP-Data-App).
The file "2023-ByrdICP-datasets.xlsx" contains a column for each ice core location, and each column lists the sheet names within the corresponding Excel file for that location.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overall and sub-regional estimates of 30-day in-hospital stroke case fatality in Sub-Saharan Africa.

Student name, target grade and three test grades. Used as mock data to create student data tables in Excel.

Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 1: 994 Prodigal ortholog sets with inconsistent start sites. The Excel file provides information about the 994 ortholog sets with inconsistent start sites, including the genes within each set and the gene start site revisions required to achieve consistency within each set. (XLS 1 MB)

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Age-standardized death rate of T2DM from 2020 to 2030.

https://spdx.org/licenses/CC0-1.0.html
Comprehensive sexuality education (CSE) is recognized as a critical tool for addressing sexuality and reproductive health challenges among adolescents. However, little is known about the broader impacts of CSE on populations beyond adolescents, such as schools, families, and communities. This study explores multi-level impacts of an innovative CSE program in Madagascar, which employs young adult CSE educators to teach a three-year curriculum in government middle schools across the country. The two-phased study embraced a participatory approach and qualitative Human-centered Design (HCD) methods. In phase 1, 90 school principals and administrators representing 45 schools participated in HCD workshops, which were held in six regional cities. Phase 2 took place one year later, included 50 principals from partner schools, and focused on expanding and validating findings from phase 1. From the perspective of school principals and administrators, the results indicate several areas in which CSE programming is having spill-over effects, beyond direct adolescent student sexuality knowledge and behaviors. In the case of this youth-led model in Madagascar, the program has impacted the lives of students (e.g., increased academic motivation and confidence), their parents (e.g., strengthened family relationships and increased parental involvement in schools), their schools (e.g., increased perceived value of schools and teacher effectiveness), their communities (e.g., increased community connections), and impacted broader structural issues (e.g., improved equity and access to resources such as menstrual pads). While not all impacts of the CSE program were perceived as positive (e.g., students start experimenting with sex and love), the findings uncovered opportunities for targeting investments and refining CSE programming to maximize positive impacts at family, school, and community levels.
Methods
Data were collected in June 2021 during seven workshops held in six regional cities in Madagascar. Principals and one other member of the school were invited from each of the youth-led, non-governmental organization's 45 partner schools during the 2020-2021 school year. Principals had the discretion to invite second representatives, who were often the people who worked most closely with or alongside the CSE Educator. If the principal could not attend, then they assigned someone to attend in their place. In total, 90 school administrators participated, which included principals, vice principals, and school monitors representing 45 schools, both urban and rural. Workshops were facilitated by the organization's Monitoring and Evaluation Manager and lasted 3-4 hours. Workshops were held in the organization's regional offices or rented working space (i.e., not in schools), and discussions were facilitated in the local language, Malagasy. Additional organization staff members in attendance took detailed notes, which they then cross-referenced with the audio recordings of the discussions as needed to accurately fill in a standardized data entry sheet in Microsoft Excel. The final data entry sheet was translated into English for qualitative analysis. At the workshops, participants responded to two research prompts: 1) What are the different effects CSE programming has on students’ lives? 2) What other effects does the CSE programming have in the school or in the community? In brief, participants brainstormed responses to the prompts individually on post-it notes, then participated in an affinity clustering exercise in which they grouped their ideas based on perceived similarities and differences. In phase 2, data were collected in May 2022 during the non-governmental organization's end of the school year symposium for principals of partner schools to gather feedback on the cluster analysis conducted in phase 1, validate findings, and vote on priority areas.
Fifty-one principals from the partner schools from the 2021-2022 school year were invited to participate, and a total of 50 representatives from the schools attended the workshops, representing three regions of Madagascar. Two members of the research team consolidated key findings across the seven workshops conducted in phase 1, paying particular attention to key themes that were raised in multiple different workshops using qualitative content analysis. The team used grounded theory to analyze the data using an inductive approach, in which initial cluster themes were compared across all workshops to draw broader comparisons, and the relationships between clusters were studied to propose a final conceptual model of the CSE program impacts. The final cluster labels were created by the team based on a review of the items within the clusters. The tallied votes from the phase 2 “Visualize the Vote” exercise were collated in a standardized Microsoft Excel spreadsheet. Basic counts were calculated for each cluster to reflect the principals’ observations of areas impacted by the CSE programming and the principals’ votes for the areas of impact most important to them. Translations of the Phase 2 principal feedback were reviewed by the coding team and select quotes were used as illustrative examples to contextualize the key clusters.

This dummy dataset contains one attribute, the CET score, and one target variable, Admitted.
Based on the CET (Common Entrance Test) score, a suitable algorithm has to be used to predict whether the student is admitted or not. This dataset can be used for a logistic regression problem.
Logistic regression using this dataset is demonstrated in the article and video below:
https://www.analyticsvidhya.com/blog/2022/02/logistic-regression-using-python-and-excel/
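As a rough illustration of the task described above (one feature, one binary target), here is a minimal Python sketch of logistic regression fitted by gradient descent. It is not the approach of the linked article; the scores and admission labels are made-up values rather than rows from the dataset, and the learning rate and number of epochs are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w * z + b) on the standardized feature z."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    zs = [(x - mean) / std for x in xs]

    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for z, y in zip(zs, ys):
            error = sigmoid(w * z + b) - y  # gradient of the log-loss w.r.t. the logit
            grad_w += error * z
            grad_b += error
        w -= learning_rate * grad_w / len(zs)
        b -= learning_rate * grad_b / len(zs)

    def predict_proba(x):
        """Estimated probability of admission for a raw (unstandardized) score."""
        return sigmoid(w * (x - mean) / std + b)

    return predict_proba


# Made-up CET scores and outcomes (1 = admitted), for illustration only.
scores = [35, 48, 52, 60, 66, 71, 80, 92]
admitted = [0, 0, 0, 1, 0, 1, 1, 1]

predict_proba = fit_logistic(scores, admitted)
high_score_probability = predict_proba(95)
low_score_probability = predict_proba(30)
```

A student would be classified as admitted when the predicted probability is at least 0.5; with these illustrative values, a score of 95 falls above that threshold and a score of 30 below it.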

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: The qualitative aspect of health care delivery is one of the major factors in reducing morbidity and mortality in a health care setup. The expanding suburban secondary health care delivery facilities of the Municipal Corporation of Greater Mumbai are an important part of the healthcare backbone of Mumbai, and therefore the quality of care delivered here needed standardization.
Material and Methods: The project was completed over a period of one year, from Jan to Dec 2013, and implemented in three phases. The framework with components and sub-components was developed and formats for data collection were standardized. The benchmarks were based on past performance in the same hospital, and probability was used for development of the normal range. An Excel spreadsheet was developed to facilitate data analysis.
Results: The indicators comprise 3 components: Statutory Requirements, Patient care & Cure, and Administrative efficiency. The measurements made pointed to the broad areas needing attention.
Conclusion: The indicators for patient care and monitoring standards can be used as a self-assessment tool for health care setups for standardization and improvement of delivery of health care services.

This dataset was created by Hoang

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objectives: In adult major trauma patients, admission hypocalcaemia occurs in approximately half of cases and is associated with increased mortality. However, data amongst paediatric patients are limited. The objectives of this review were to determine the incidence of admission ionised hypocalcaemia in paediatric major trauma patients and to explore whether hypocalcaemia is associated with adverse outcomes.
Methods: A systematic review was conducted following PRISMA guidelines. All studies including major trauma patients

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
We are enclosing the database used in our research titled "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary", along with our statistical calculations. For the sake of reproducibility, further information can be found in the file Short_Description_of_Data_Analysis.pdf and Statistical_formulas.pdf
The sharing of data is part of our aim to strengthen the base of our scientific research. As of March 7, 2024, the detailed submission and analysis of our research findings to a scientific journal has not yet been completed.
The dataset was expanded on 23rd September 2024 to include SPSS statistical analysis data, a heatmap, and buffer zone analysis around the Health Development Offices (HDOs) created in QGIS software.
Short Description of Data Analysis and Attached Files (datasets):
Our research utilised data from 2022, serving as the basis for statistical standardisation. The 2022 Hungarian census provided an objective basis for our analysis, with age group data available at the county level from the Hungarian Central Statistical Office (KSH) website. The 2022 demographic data provided an accurate picture compared to the data available from the 2023 microcensus. The calculation used is based on our standardisation of the 2022 data. For xlsx files, we used MS Excel 2019 (version: 1808, build: 10406.20006) with the SOLVER add-in.
Hungarian Central Statistical Office served as the data source for population by age group, county, and regions: https://www.ksh.hu/stadat_files/nep/hu/nep0035.html, (accessed 04 Jan. 2024.) with data recorded in MS Excel in the Data_of_demography.xlsx file.
In 2022, 108 Health Development Offices (HDOs) were operational, and it's noteworthy that no developments have occurred in this area since 2022. The availability of these offices and the demographic data from the Central Statistical Office in Hungary are considered public interest data, freely usable for research purposes without requiring permission.
The contact details for the Health Development Offices were sourced from the following page (Hungarian National Population Centre (NNK)): https://www.nnk.gov.hu/index.php/efi (n=107). The Semmelweis University Health Development Centre was not listed by NNK, hence it was separately recorded as the 108th HDO. More information about the office can be found here: https://semmelweis.hu/egeszsegfejlesztes/en/ (n=1). (accessed 05 Dec. 2023.)
Geocoordinates were determined using Google Maps (N=108): https://www.google.com/maps (accessed 02 Jan. 2024). The geocoordinates (latitude and longitude according to the WGS 84 standard), address data (postal code, town name, street, and house number), and the name of each HDO were recorded in the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.
The foundational software for geospatial modelling and display, QGIS 3.34, is open source and can be downloaded from https://qgis.org/en/site/forusers/download.html (accessed 04 Jan. 2024).
The HDOs_GeoCoordinates.gpkg QGIS project file contains Hungary's administrative map and the recorded addresses of the HDOs, imported from the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.
The OpenStreetMap tileset is directly accessible from www.openstreetmap.org in QGIS. (accessed 04 Jan. 2024.)
The Hungarian county administrative boundaries were downloaded from the following website: https://data2.openstreetmap.hu/hatarok/index.php?admin=6 (accessed 04 Jan. 2024.)
HDO_Buffers.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding buffer zones with a radius of 7.5 km.
Heatmap.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding heatmap (Kernel Density Estimation).
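To make the 7.5 km buffer idea concrete, the following Python sketch checks whether a location lies within 7.5 km of at least one office, using the haversine great-circle distance on WGS 84 latitude/longitude pairs. It is an approximation written for illustration, not the QGIS buffer procedure used by the authors; the function names and the example coordinates are hypothetical and do not come from the dataset.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(point, offices, radius_km=7.5):
    """True if the point is inside the buffer of at least one office."""
    return any(
        haversine_km(point[0], point[1], lat, lon) <= radius_km
        for lat, lon in offices
    )


# Hypothetical office location and test points (latitude, longitude).
offices = [(47.0, 19.0)]
near_point = (47.05, 19.0)  # about 5.6 km north of the office
far_point = (47.1, 19.0)    # about 11.1 km north of the office

near_is_covered = within_buffer(near_point, offices)
far_is_covered = within_buffer(far_point, offices)
```

With these illustrative coordinates, the nearer point falls inside the 7.5 km buffer and the farther one does not.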
A brief description of the statistical formulas applied is included in the Statistical_formulas.pdf.
Recording of our base data for statistical concentration and diversification measurement was done using MS Excel 2019 (version: 1808, build: 10406.20006) in .xlsx format.
Using the SPSS 29.0.1.0 program, we performed the following statistical calculations with the databases Data_HDOs_population_without_outliers.sav and Data_HDOs_population.sav:
For easier readability, the files have been provided in both SPV and PDF formats.
The translation of these supplementary files into English was completed on 23rd Sept. 2024.
If you have any further questions regarding the dataset, please contact the corresponding author: domjan.peter@phd.semmelweis.hu

https://creativecommons.org/publicdomain/zero/1.0/
A colleague, Lucas, has asked you to update a spreadsheet called Reseller Details that records details of Adventure Works' resellers in the United States. The information in the spreadsheet was downloaded from another system, and the download process created several inconsistencies and errors within the data.
These errors include unnecessary spaces, the use of the wrong case, and entries that need to be joined together or split apart.
You now need to add formulas to the worksheet to standardize the data so that it can be used for analysis.
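The kinds of fixes described above (extra spaces, wrong case, joining and splitting entries) can be sketched in Python as follows. The exercise itself is meant to be solved with Excel formulas, so this is only an analogy; the helper names and the sample reseller values are hypothetical, and the Python functions only approximate Excel's TRIM and PROPER.

```python
def trim(text):
    """Roughly like Excel TRIM: drop outer spaces and collapse inner runs to one space."""
    return " ".join(text.split())


def proper(text):
    """Roughly like Excel PROPER: capitalize the first letter of each word."""
    return text.title()


# Made-up raw values of the kind a system export might produce.
raw_name = "  north   WIND  traders "
raw_city = "seattle"
raw_state = "wa"
raw_contact = "Lee, Dana"

clean_name = proper(trim(raw_name))

# Joining entries, as with the & operator or TEXTJOIN.
location = f"{proper(raw_city)}, {raw_state.upper()}"

# Splitting one entry into two parts at the delimiter.
last_name, first_name = [part.strip() for part in raw_contact.split(",")]
```

Here clean_name becomes "North Wind Traders" and location becomes "Seattle, WA".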

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset and codebook of the content analysis study published in the Journal of Cleaner Production: https://doi.org/10.1016/j.jclepro.2016.02.060
Complete codebook in Word format;
Dataset as Excel file including the computed credibility scores