32 datasets found
  1. Netflix Movies and TV Shows Dataset Cleaned(excel)

    • kaggle.com
    Updated Apr 8, 2025
    Cite
    Gaurav Tawri (2025). Netflix Movies and TV Shows Dataset Cleaned(excel) [Dataset]. https://www.kaggle.com/datasets/gauravtawri/netflix-movies-and-tv-shows-dataset-cleanedexcel
    Dataset updated
    Apr 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gaurav Tawri
    Description

    This dataset is a cleaned and preprocessed version of the original Netflix Movies and TV Shows dataset available on Kaggle. All cleaning was done using Microsoft Excel — no programming involved.

    🎯 What’s Included:
    - Cleaned Excel file (standardized columns, proper date format, removed duplicates/missing values)
    - A separate "formulas_used.txt" file listing all Excel formulas used during cleaning (e.g., TRIM, CLEAN, DATE, SUBSTITUTE, TEXTJOIN, etc.)
    - Columns like 'date_added' have been properly formatted into DMY structure
    - Multi-valued columns like 'listed_in' are split for better analysis
    - Null values replaced with “Unknown” for clarity
    - Duration field broken into numeric + unit components
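
The dataset itself was cleaned entirely in Excel, but the same steps (trimming text, replacing nulls with "Unknown", splitting 'listed_in', and breaking the duration into a number and a unit) can be sketched in Python for readers who want to reproduce them programmatically. This is a minimal illustration, not the author's method: the 'title', 'country' and 'duration' column names and the sample row are assumptions.

```python
import re

def split_duration(value):
    """Split a duration such as '90 min' or '2 Seasons' into (number, unit)."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", value or "")
    if match is None:
        return None, "Unknown"
    return int(match.group(1)), match.group(2)

def clean_row(row):
    """Trim text, fill missing values with 'Unknown', split multi-valued genres."""
    cleaned = {}
    for key, value in row.items():
        text = (value or "").strip()
        cleaned[key] = text if text else "Unknown"
    # Duration becomes two fields: a numeric part and a unit part.
    number, unit = split_duration(row.get("duration"))
    cleaned["duration_value"] = number
    cleaned["duration_unit"] = unit
    # 'listed_in' holds comma-separated genres; split it into a list.
    cleaned["genres"] = [g.strip() for g in cleaned.get("listed_in", "Unknown").split(",")]
    return cleaned

# Hypothetical sample row, invented for illustration only.
row = {
    "title": "  Example Title ",
    "country": None,
    "listed_in": "Dramas, International Movies",
    "duration": "90 min",
}
result = clean_row(row)
print(result)
```

Keeping the numeric part and the unit in separate fields makes it possible to compare movie runtimes without mixing them with season counts.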

    🔍 Dataset Purpose: Ideal for beginners and analysts who want to:
    - Practice data cleaning in Excel
    - Explore Netflix content trends
    - Analyze content by type, country, genre, or date added

    📁 Original Dataset Credit: The base version was originally published by Shivam Bansal on Kaggle: https://www.kaggle.com/shivamb/netflix-shows

    📌 Bonus: You can find a step-by-step cleaning guide and the same dataset on GitHub as well — along with screenshots and formulas documentation.

  2. Coffee Sales Excel Project

    • kaggle.com
    Updated Nov 13, 2024
    Cite
    Nuha Zahidi (2024). Coffee Sales Excel Project [Dataset]. https://www.kaggle.com/datasets/nuhazahidi/coffee-sales-excel-project
    Dataset updated
    Nov 13, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nuha Zahidi
    Description

    Tool: Microsoft Excel

    Dataset: Coffee Sales

    Process:

    1. Data Cleaning:
       • Remove duplicates and blanks.
       • Standardize date and currency formats.

    2. Data Manipulation:
       • Use sorting and filtering functions to work with subsets of interest.
       • Use XLOOKUP, INDEX-MATCH and IF formulas for efficient data manipulation, such as retrieving, matching and organising information in spreadsheets.

    3. Data Analysis:
       • Create Pivot Tables and Pivot Charts, with formatting, to visualize trends.

    4. Dashboard Development:
       • Insert Slicers, with formatting, for easy filtering and dynamic updates.

    Highlights: This project aims to understand coffee sales trends by country, roast type, and year, which could help identify marketing opportunities and customer segments.
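
The lookup and pivot steps of this project are done in Excel, but the underlying idea can be sketched in Python: a dictionary lookup plays the role of XLOOKUP or INDEX-MATCH, and a grouped sum plays the role of a Pivot Table by country, roast type, and year. All field names, product IDs, and values below are hypothetical, chosen only to illustrate the technique.

```python
from collections import defaultdict

# Hypothetical product table (the "lookup range" in spreadsheet terms).
products = {
    "P1": {"roast": "Light", "unit_price": 10.0},
    "P2": {"roast": "Dark", "unit_price": 12.5},
}

# Hypothetical order rows, invented for illustration only.
orders = [
    {"product_id": "P1", "country": "US", "year": 2023, "quantity": 2},
    {"product_id": "P2", "country": "US", "year": 2023, "quantity": 1},
    {"product_id": "P1", "country": "US", "year": 2023, "quantity": 3},
    {"product_id": "P9", "country": "IE", "year": 2022, "quantity": 4},
]

def lookup(table, key, field, default="Not found"):
    """Return table[key][field], or a default when the key is missing."""
    record = table.get(key)
    return record[field] if record is not None else default

def pivot_sales(orders, products):
    """Sum sales by (country, roast, year), like a Pivot Table with three row fields."""
    totals = defaultdict(float)
    for order in orders:
        price = lookup(products, order["product_id"], "unit_price", default=0.0)
        roast = lookup(products, order["product_id"], "roast")
        totals[(order["country"], roast, order["year"])] += price * order["quantity"]
    return dict(totals)

totals = pivot_sales(orders, products)
for key, value in sorted(totals.items()):
    print(key, value)
```

Returning an explicit default for a missing key mirrors the "if not found" argument of a spreadsheet lookup, so unmatched orders stay visible in the summary instead of raising an error.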

  3. Standardization in Quantitative Imaging: A Multi-center Comparison of...

    • cancerimagingarchive.net
    n/a, nifti and zip +1
    Updated Jun 9, 2020
    Cite
    The Cancer Imaging Archive (2020). Standardization in Quantitative Imaging: A Multi-center Comparison of Radiomic Feature Values [Dataset]. http://doi.org/10.7937/tcia.2020.9era-gg29
    Available download formats
    xlsx, n/a, nifti and zip
    Dataset updated
    Jun 9, 2020
    Dataset authored and provided by
    The Cancer Imaging Archive
    License

    https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

    Time period covered
    Jun 9, 2020
    Dataset funded by
    National Cancer Institute (http://www.cancer.gov/)
    Description

    This dataset was used by the NCI's Quantitative Imaging Network (QIN) PET-CT Subgroup for their project titled: Multi-center Comparison of Radiomic Features from Different Software Packages on Digital Reference Objects and Patient Datasets. The purpose of this project was to assess the agreement among radiomic features when computed by several groups using different software packages under very tightly controlled conditions, which included common image data sets and standardized feature definitions. The image datasets (and Volumes of Interest – VOIs) provided here are the same ones used in that project and reported in the associated publication (ISSN 2379-1381 https://doi.org/10.18383/j.tom.2019.00031). In addition, we have provided detailed information about the software packages used (Table 1 in that publication) as well as the individual feature value results for each image dataset and each software package that were used to create the summary tables (Tables 2, 3 and 4) in that publication. For that project, nine common quantitative imaging features were selected for comparison, including features that describe morphology, intensity, shape, and texture and that are described in detail by the Image Biomarker Standardisation Initiative (IBSI; https://arxiv.org/abs/1612.07003) and in its publication (Zwanenburg A, Vallières M, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology. 2020 May;295(2):328-338. doi: https://doi.org/10.1148/radiol.2020191145). There are three datasets provided: two image datasets and one dataset consisting of four Excel spreadsheets containing feature values.

    1. The first image dataset is a set of three Digital Reference Objects (DROs) used in the project, which are: (a) a sphere with uniform intensity, (b) a sphere with intensity variation (c) a nonspherical (but mathematically defined) object with uniform intensity. These DROs were created by the team at Stanford University and are described in (Jaggi A, Mattonen SA, McNitt-Gray M, Napel S. Stanford DRO Toolkit: digital reference objects for standardization of radiomic features. Tomography. 2019;6:–.) and are a subset of the DROs described in DRO Toolkit. Each DRO is represented in both DICOM and NIfTI format and the VOI was provided in each format as well (DICOM Segmentation Object (DSO) as well as NIfTI segmentation boundary).
    2. The second image dataset is the set of 10 patient CT scans, originating from the LIDC-IDRI dataset, that were used in the QIN multi-site collection of Lung CT data with Nodule Segmentations project ( https://doi.org/10.7937/K9/TCIA.2015.1BUVFJR7 ). In that QIN study, a single lesion from each case was identified for analysis and then nine VOIs were generated using three repeat runs of three segmentation algorithms (one from each of three academic institutions) on each lesion. To eliminate one source of variability in our project, only one of the VOIs previously created for each lesion was identified and all sites used that same VOI definition. The specific VOI chosen for each lesion was the first run of the first algorithm (algorithm 1, run 1). DICOM images were provided for each dataset and the VOI was provided in both DICOM Segmentation Object (DSO) and NIfTI segmentation formats.
    3. The third dataset is a collection of four Excel spreadsheets, each of which contains detailed information corresponding to one of the four tables in the publication cited (https://doi.org/10.18383/j.tom.2019.00031), including the raw feature values and the summary tables for Tables 2, 3 and 4. These tables are:
       • Software Package details: This table contains detailed information about the software packages used in the study (and listed in Table 1 in the publication), including version number and any parameters specified in the calculation of the features reported.
       • DRO results: This contains the original feature values obtained for each software package for each DRO as well as the table summarizing results across software packages (Table 2 in the publication).
       • Patient Dataset results: This contains the original feature values for each software package for each patient dataset (1 lesion per case) as well as the table summarizing results across software packages and patient datasets (Table 3 in the publication).
       • Harmonized GLCM Entropy Results: This contains the values for the “Harmonized” GLCM Entropy feature for each patient dataset and each software package as well as the summary across software packages (Table 4 in the publication).

  4. Data associated with manuscript.

    • figshare.com
    xlsx
    Updated Jan 19, 2024
    Cite
    Martin Ackah; Louise Ameyaw; Richard Appiah; David Owiredu; Hosea Boakye; Webster Donaldy; Comos Yarfi; Ulric S. Abonie (2024). Data associated with manuscript. [Dataset]. http://doi.org/10.1371/journal.pgph.0002769.s002
    Available download formats
    xlsx
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    PLOS Global Public Health
    Authors
    Martin Ackah; Louise Ameyaw; Richard Appiah; David Owiredu; Hosea Boakye; Webster Donaldy; Comos Yarfi; Ulric S. Abonie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Existing studies investigating 30-day in-hospital stroke case fatality rates in sub-Saharan Africa have produced varying results, underscoring the significance of obtaining precise and reliable estimations for this indicator. Consequently, this study aimed to conduct a systematic review and update of the current scientific evidence regarding 30-day in-hospital stroke case fatality and associated risk factors in sub-Saharan Africa. Medline/PubMed, Cumulative Index to Nursing and Allied Health Literature (CINAHL), APA PsycNet (encompassing PsycINFO and PsychArticle), Google Scholar, and Africa Journal Online (AJOL) were systematically searched to identify potentially relevant articles. Two independent assessors extracted the data from the eligible studies using a pre-tested and standardized Excel spreadsheet. Outcomes were 30-day in-hospital stroke case fatality and associated risk factors. Data were pooled using a random-effects model. Ninety-three (93) studies involving 42,057 participants were included. The overall stroke case fatality rate was 27% [25%-29%]. Subgroup analysis revealed 24% [21%-28%], 25% [21%-28%], 29% [25%-32%] and 31% [20%-43%] stroke case fatality rates in East Africa, Southern Africa, West Africa, and Central Africa respectively. Stroke severity, stroke type, untyped stroke, and post-stroke complications were identified as risk factors. The most prevalent risk factors were low (

  5. Standardization and Application of an Index of Community Integrity for...

    • gimi9.com
    Updated May 24, 2017
    + more versions
    Cite
    (2017). Standardization and Application of an Index of Community Integrity for Waterbirds in the Chesapeake Bay | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_standardization-and-application-of-an-index-of-community-integrity-for-waterbirds-in-the-c/
    Dataset updated
    May 24, 2017
    Area covered
    Chesapeake Bay
    Description

    This data set is comprised of five files related to the modification and scoring of Index of Waterbird Community Integrity (IWCI) scores for all waterbirds of the Chesapeake Bay. One Excel file (A) contains a list of 100+ Chesapeake waterbird species and their species attribute and IWCI scores. Another Excel file (B) contains case study data from recent surveys of breeding and migratory waterbirds in Chesapeake Bay and shoreline delineations across a disturbance gradient that were used to demonstrate the utility of the modified index. Finally, three supplemental files include an Access database (C), R code (D) and a protocol (E) for running the complex steps to calculate index scores.

  6. Classification on methodological quality and risk of bias of included...

    • plos.figshare.com
    xls
    Updated Jan 19, 2024
    Cite
    Martin Ackah; Louise Ameyaw; Richard Appiah; David Owiredu; Hosea Boakye; Webster Donaldy; Comos Yarfi; Ulric S. Abonie (2024). Classification on methodological quality and risk of bias of included studies. [Dataset]. http://doi.org/10.1371/journal.pgph.0002769.t004
    Available download formats
    xls
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    PLOS Global Public Health
    Authors
    Martin Ackah; Louise Ameyaw; Richard Appiah; David Owiredu; Hosea Boakye; Webster Donaldy; Comos Yarfi; Ulric S. Abonie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Classification on methodological quality and risk of bias of included studies.

  7. Byrd Polar and Climate Research Center Ice Core Paleoclimatology Datasets in...

    • data.niaid.nih.gov
    Updated Oct 10, 2023
    Cite
    Weber, Austin M. (2023). Byrd Polar and Climate Research Center Ice Core Paleoclimatology Datasets in a Standardized Excel Format [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8353856
    Dataset updated
    Oct 10, 2023
    Dataset provided by
    Byrd Polar and Climate Research Center
    Authors
    Weber, Austin M.
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All published datasets from the ice core paleoclimatology (ICP) group at the Byrd Polar and Climate Research Center (BPCRC) are archived in the NOAA-NCEI Paleoclimatology Database (https://www.ncei.noaa.gov/access/paleo-search/?dataTypeId=7). However, the formatting of these datasets is not consistent across the archival files, making it difficult to download and aggregate multiple datasets for research purposes. This repository is intended to provide a simple, consistently formatted archive of Excel files containing the published data for more than 16 ice core records collected by the BPCRC-ICP group since the 1980s.

    These files can be accessed directly in MATLAB with the assistance of the Byrd-ICP Data App, downloadable on Github (https://github.com/weber1158/Byrd-ICP-Data-App).

    The file "2023-ByrdICP-datasets.xlsx" contains a column for each ice core location and a list of the sheet names within the corresponding Excel file for that ice core location.

  8. Overall and sub-regional estimates of 30-day in-hospital stroke case...

    • plos.figshare.com
    xls
    Updated Jan 19, 2024
    Cite
    Martin Ackah; Louise Ameyaw; Richard Appiah; David Owiredu; Hosea Boakye; Webster Donaldy; Comos Yarfi; Ulric S. Abonie (2024). Overall and sub-regional estimates of 30-day in-hospital stroke case fatality in Sub-Saharan Africa. [Dataset]. http://doi.org/10.1371/journal.pgph.0002769.t002
    Available download formats
    xls
    Dataset updated
    Jan 19, 2024
    Dataset provided by
    PLOS Global Public Health
    Authors
    Martin Ackah; Louise Ameyaw; Richard Appiah; David Owiredu; Hosea Boakye; Webster Donaldy; Comos Yarfi; Ulric S. Abonie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Africa, Sub-Saharan Africa
    Description

    Overall and sub-regional estimates of 30-day in-hospital stroke case fatality in Sub-Saharan Africa.

  9. Student results

    • kaggle.com
    zip
    Updated Jun 7, 2023
    Cite
    lizzyld (2023). Student results [Dataset]. https://www.kaggle.com/lizzyld/student-results
    Available download formats
    zip (11154 bytes)
    Dataset updated
    Jun 7, 2023
    Authors
    lizzyld
    Description

    Student name, target grade and three test grades. Used as mock data to create student data tables in Excel.

  10. Superstore Sales Analysis

    • kaggle.com
    zip
    Updated Oct 21, 2023
    Cite
    Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis/versions/1
    Available download formats
    zip (3009057 bytes)
    Dataset updated
    Oct 21, 2023
    Authors
    Ali Reda Elblgihy
    Description

    Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

    1- Data Import and Transformation:

    • Gather and import relevant sales data from various sources into Excel.
    • Utilize Power Query to clean, transform, and structure the data for analysis.
    • Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

    2- Data Quality Assessment:

    • Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.
    • Standardize data formats and ensure that all data is in a consistent, usable state.

    3- Calculating COGS:

    • Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.
    • Apply appropriate formulas and calculations to determine COGS accurately.

    4- Discount Analysis:

    • Analyze the discount values offered on products to understand their impact on sales and profitability.
    • Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.

    5- Sales Metrics:

    • Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.
    • Utilize Excel functions to compute these metrics and create visuals for better insights.

    6- Visualization:

    • Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.
    • Visual representations can help identify trends, outliers, and patterns in the data.

    7- Report Generation:

    • Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

    Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
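
The COGS, discount, and sales-metric steps above are carried out in Excel and Power Query in the project itself. As a rough sketch of the arithmetic involved, the Python below computes them for a couple of orders. Everything here is an assumption made for illustration: the field names, the sample values, and in particular the COGS formula (per-unit purchase price plus shipping plus other costs, times quantity), which the description does not spell out.

```python
def cogs(order):
    """Cost of goods sold for one order line (assumed formula, see lead-in)."""
    unit_cost = order["purchase_price"] + order["shipping_cost"] + order["other_cost"]
    return unit_cost * order["quantity"]

def summarize(orders):
    """Compute revenue, total COGS, profit margin, and average discount percentage."""
    revenue = sum(
        o["unit_price"] * o["quantity"] * (1 - o["discount"]) for o in orders
    )
    total_cogs = sum(cogs(o) for o in orders)
    avg_discount = sum(o["discount"] for o in orders) / len(orders)
    return {
        "revenue": revenue,
        "cogs": total_cogs,
        "profit_margin": (revenue - total_cogs) / revenue,
        "avg_discount_pct": avg_discount * 100,
    }

# Hypothetical order lines, invented for illustration only.
orders = [
    {"unit_price": 20.0, "quantity": 3, "discount": 0.10,
     "purchase_price": 10.0, "shipping_cost": 1.0, "other_cost": 0.5},
    {"unit_price": 50.0, "quantity": 1, "discount": 0.0,
     "purchase_price": 30.0, "shipping_cost": 2.0, "other_cost": 0.0},
]

summary = summarize(orders)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Here the average discount is a simple mean over order lines; weighting it by revenue would be an equally reasonable choice, depending on the question being asked.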

  11. Additional file 1 of Consistency of gene starts among Burkholderia genomes

    • springernature.figshare.com
    xls
    Updated Jun 14, 2023
    Cite
    John Dunbar; Judith D Cohn; Michael E Wall (2023). Additional file 1 of Consistency of gene starts among Burkholderia genomes [Dataset]. http://doi.org/10.6084/m9.figshare.12879290.v1
    Available download formats
    xls
    Dataset updated
    Jun 14, 2023
    Dataset provided by
    figshare
    Authors
    John Dunbar; Judith D Cohn; Michael E Wall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1: 994 Prodigal ortholog sets with inconsistent start sites. The Excel file provides information about the 994 ortholog sets with inconsistent start sites, including the genes within each set and the gene start site revisions required to achieve consistency within each set. (XLS 1 MB)

  12. Age-standardized death rate of T2DM from 2020 to 2030.

    • plos.figshare.com
    xls
    Updated Dec 21, 2023
    Cite
    Yan Li; Hao Zhang; Yi Jiang (2023). Age-standardized death rate of T2DM from 2020 to 2030. [Dataset]. http://doi.org/10.1371/journal.pone.0293681.t005
    Available download formats
    xls
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Yan Li; Hao Zhang; Yi Jiang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Age-standardized death rate of T2DM from 2020 to 2030.

  13. Data from: Exploring the multi-level impacts of a youth-led comprehensive...

    • data-staging.niaid.nih.gov
    • search.dataone.org
    • +2 more
    zip
    Updated Feb 28, 2024
    Cite
    Laura Leeson (2024). Exploring the multi-level impacts of a youth-led comprehensive sexuality education model in Madagascar using human-centered design methods [Dataset]. http://doi.org/10.5061/dryad.7sqv9s50f
    Available download formats
    zip
    Dataset updated
    Feb 28, 2024
    Dataset provided by
    Projet Jeune Leader
    Authors
    Laura Leeson
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Comprehensive sexuality education (CSE) is recognized as a critical tool for addressing sexuality and reproductive health challenges among adolescents. However, little is known about the broader impacts of CSE on populations beyond adolescents, such as schools, families, and communities. This study explores multi-level impacts of an innovative CSE program in Madagascar, which employs young adult CSE educators to teach a three-year curriculum in government middle schools across the country. The two-phased study embraced a participatory approach and qualitative Human-centered Design (HCD) methods. In phase 1, 90 school principals and administrators representing 45 schools participated in HCD workshops, which were held in six regional cities. Phase 2, which took place one year later, included 50 principals from partner schools and focused on expanding and validating findings from phase 1. From the perspective of school principals and administrators, the results indicate several areas in which CSE programming is having spill-over effects, beyond direct adolescent student sexuality knowledge and behaviors. In the case of this youth-led model in Madagascar, the program has impacted the lives of students (e.g., increased academic motivation and confidence), their parents (e.g., strengthened family relationships and increased parental involvement in schools), their schools (e.g., increased perceived value of schools and teacher effectiveness), their communities (e.g., increased community connections), and impacted broader structural issues (e.g., improved equity and access to resources such as menstrual pads). While not all impacts of the CSE program were perceived as positive (e.g., students start experimenting with sex and love), the findings uncovered opportunities for targeting investments and refining CSE programming to maximize positive impacts at family, school, and community levels.

    Methods

    Data were collected in June 2021 during seven workshops held in six regional cities in Madagascar. Principals and one other member of the school were invited from each of the youth-led, non-governmental organization's 45 partner schools during the 2020-2021 school year. Principals had the discretion to invite second representatives, which was often the person who worked most closely with or alongside the CSE Educator. If the principal could not attend, then they assigned someone to attend in their place. In total, 90 school administrators participated, which included principals, vice principals, and school monitors representing 45 schools, both urban and rural. Workshops were facilitated by the organization's Monitoring and Evaluation Manager and lasted 3-4 hours. Workshops were held in the organization's regional offices or rented working space (i.e., not in schools), and discussions were facilitated in the local language, Malagasy. Additional organization staff members in attendance took detailed notes which they then cross-referenced with the audio recordings of the discussions as needed to accurately fill a standardized data entry sheet in a Microsoft Excel file. The final data entry sheet was translated into English for qualitative analysis. At the workshops, participants responded to two research prompts: 1) What are the different effects CSE programming has on students’ lives? 2) What other effects does the CSE programming have in the school or in the community? In brief, participants brainstormed responses to the prompts individually on post-it notes, then participated in an affinity clustering exercise in which they grouped their ideas based on perceived similarities and differences.

    In phase 2, data were collected in May 2022 during the non-governmental organization's end of the school year symposium for principals of partner schools to gather feedback on the cluster analysis conducted in phase 1, validate findings, and vote on priority areas. Fifty-one principals from the partner schools from the 2021-2022 school year were invited to participate, and a total of 50 representatives from the schools attended the workshops, representing three regions of Madagascar. Two members of the research team consolidated key findings across the seven workshops conducted in phase 1, paying particular attention to key themes that were raised in multiple different workshops using qualitative content analysis. The team used grounded theory to analyze the data using an inductive approach, in which initial cluster themes were compared across all workshops to draw broader comparisons, and the relationships between clusters were studied to propose a final conceptual model of the CSE program impacts. The final cluster labels were created by the team based on a review of the items within the clusters. The tallied votes from the phase 2 “Visualize the Vote” exercise were collated in a standardized Microsoft Excel spreadsheet. Basic counts were calculated for each cluster to reflect the principal’s observations of areas impacted by the CSE programming and the principal’s vote for the areas of impact most important to them. Translations of the Phase 2 principal feedback were reviewed by the coding team and select quotes were used as illustrative examples to contextualize the key clusters.

  14. CET_Dataset

    • kaggle.com
    zip
    Updated Mar 31, 2022
    Cite
    Ami Munshi (2022). CET_Dataset [Dataset]. https://www.kaggle.com/datasets/amimunshi/logisticregression
    Available download formats
    zip (11336 bytes)
    Dataset updated
    Mar 31, 2022
    Authors
    Ami Munshi
    Description

    This dummy dataset contains one attribute, the CET score, and one target variable, Admitted.

    Based on the CET (Common Entrance Test) score, a suitable algorithm has to be used to predict whether the student is admitted or not. This dataset can be used for a logistic regression problem.

    Logistic regression using this dataset is demonstrated in the article and video below:

    https://www.analyticsvidhya.com/blog/2022/02/logistic-regression-using-python-and-excel/

    https://youtu.be/Wbtc-2f0Do0
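
As a rough illustration of the kind of model this dataset is meant for, the sketch below fits a one-feature logistic regression by plain gradient descent, using only the Python standard library. It is not taken from the linked article or video; the scores and outcomes are hypothetical values invented for the example, and the learning rate and epoch count are arbitrary choices.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(scores, admitted, lr=0.5, epochs=2000):
    """Fit P(admitted | score) by gradient descent on the log-loss."""
    # Centre and scale the scores so one learning rate works well.
    mean = sum(scores) / len(scores)
    spread = (max(scores) - min(scores)) or 1.0
    xs = [(s - mean) / spread for s in scores]
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, admitted):
            err = sigmoid(w * x + b) - y  # prediction minus label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n

    def predict_proba(score):
        """Estimated probability of admission for a raw score."""
        return sigmoid(w * (score - mean) / spread + b)

    return predict_proba

# Hypothetical scores and outcomes, invented for illustration only.
scores = [35, 42, 50, 58, 61, 67, 74, 80, 88, 95]
admitted = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

predict_proba = fit_logistic(scores, admitted)
for s in (40, 65, 90):
    print(s, round(predict_proba(s), 3))
```

A student would be predicted as admitted when the estimated probability exceeds a chosen threshold, commonly 0.5.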

  15. Development of Indicators for Patient Care and Monitoring Standards for...

    • plos.figshare.com
    application/cdfv2
    Updated May 31, 2023
    Cite
    Seema S. Malik; Roshni Cynthia D’Souza; Pramod Mukund Pashte; Smita Manohar Satoskar; Remilda Joyce D’Souza (2023). Development of Indicators for Patient Care and Monitoring Standards for Secondary Health Care Services of Mumbai [Dataset]. http://doi.org/10.1371/journal.pone.0119813
    Available download formats
    application/cdfv2
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Seema S. Malik; Roshni Cynthia D’Souza; Pramod Mukund Pashte; Smita Manohar Satoskar; Remilda Joyce D’Souza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Mumbai
    Description

    Background: The qualitative aspect of health care delivery is one of the major factors in reducing morbidity and mortality in a health care setup. The expanding suburban secondary health care delivery facilities of the Municipal Corporation of Greater Mumbai are an important part of the healthcare backbone of Mumbai and therefore the quality of care delivered here needed standardization.

    Material and Methods: The project was completed over a period of one year from Jan to Dec, 2013 and implemented in three phases. The framework with components and sub-components was developed and formats for data collection were standardized. The benchmarks were based on past performance in the same hospital and probability was used for development of normal range. An Excel spreadsheet was developed to facilitate data analysis.

    Results: The indicators comprise 3 components: Statutory Requirements, Patient care & Cure and Administrative efficiency. The measurements made pointed to the broad areas needing attention.

    Conclusion: The indicators for patient care and monitoring standards can be used as a self-assessment tool for health care setups for standardization and improvement of delivery of health care services.

  16. Diemthi thptqg 2020 dot 1

    • kaggle.com
    Updated Oct 3, 2020
    Cite
    Hoang (2020). Diemthi thptqg 2020 dot 1 [Dataset]. https://www.kaggle.com/datasets/tranviethoang/diemthi-thptqg-excel
    Dataset updated
    Oct 3, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hoang
    Description

    Dataset

    This dataset was created by Hoang

    Contents

  17. Excel spreadsheet featuring extracted data.

    • plos.figshare.com
    xlsx
    Updated May 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Owen Hibberd; James Price; Stephen H. Thomas; Tim Harris; Edward B. G. Barnard (2024). Excel spreadsheet featuring extracted data. [Dataset]. http://doi.org/10.1371/journal.pone.0303109.s003
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 28, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Owen Hibberd; James Price; Stephen H. Thomas; Tim Harris; Edward B. G. Barnard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objectives: In adult major trauma patients, admission hypocalcaemia occurs in approximately half of cases and is associated with increased mortality. However, data amongst paediatric patients are limited. The objectives of this review were to determine the incidence of admission ionised hypocalcaemia in paediatric major trauma patients and to explore whether hypocalcaemia is associated with adverse outcomes.

    Methods: A systematic review was conducted following PRISMA guidelines. All studies including major trauma patients

  18. Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health...

    • zenodo.org
    bin, csv, pdf
    Updated Sep 23, 2024
    Cite
    Peter Domjan; Viola Angyal; Istvan Vingender (2024). Extended 1.0 Dataset of "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary" [Dataset]. http://doi.org/10.5281/zenodo.13826993
    Explore at:
    Available download formats: bin, pdf, csv
    Dataset updated
    Sep 23, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Peter Domjan; Viola Angyal; Istvan Vingender
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 23, 2024
    Area covered
    Hungary
    Description

    Introduction

    We are enclosing the database used in our research titled "Concentration and Geospatial Modelling of Health Development Offices' Accessibility for the Total and Elderly Populations in Hungary", along with our statistical calculations. For the sake of reproducibility, further information can be found in the files Short_Description_of_Data_Analysis.pdf and Statistical_formulas.pdf.

    Sharing the data is part of our aim to strengthen the base of our scientific research. As of March 7, 2024, the detailed analysis of our research findings and their submission to a scientific journal had not yet been completed.

    The dataset was expanded on 23rd September 2024 to include SPSS statistical analysis data, a heatmap, and buffer zone analysis around the Health Development Offices (HDOs) created in QGIS software.

    Short Description of Data Analysis and Attached Files (datasets):

    Our research utilised data from 2022, serving as the basis for statistical standardisation. The 2022 Hungarian census provided an objective basis for our analysis, with age group data available at the county level from the Hungarian Central Statistical Office (KSH) website. The 2022 demographic data provided an accurate picture compared to the data available from the 2023 microcensus. The calculation used is based on our standardisation of the 2022 data. For xlsx files, we used MS Excel 2019 (version: 1808, build: 10406.20006) with the SOLVER add-in.

    The Hungarian Central Statistical Office served as the data source for population by age group, county, and region: https://www.ksh.hu/stadat_files/nep/hu/nep0035.html (accessed 04 Jan. 2024), with data recorded in MS Excel in the Data_of_demography.xlsx file.

    In 2022, 108 Health Development Offices (HDOs) were operational, and it's noteworthy that no developments have occurred in this area since 2022. The availability of these offices and the demographic data from the Central Statistical Office in Hungary are considered public interest data, freely usable for research purposes without requiring permission.

    The contact details for the Health Development Offices were sourced from the following page (Hungarian National Population Centre (NNK)): https://www.nnk.gov.hu/index.php/efi (n=107). The Semmelweis University Health Development Centre was not listed by NNK, hence it was separately recorded as the 108th HDO. More information about the office can be found here: https://semmelweis.hu/egeszsegfejlesztes/en/ (n=1). (accessed 05 Dec. 2023.)

    Geocoordinates were determined using Google Maps (N=108): https://www.google.com/maps (accessed 02 Jan. 2024). The geocoordinates (latitude and longitude according to the WGS 84 standard), address data (postal code, town name, street, and house number), and the name of each HDO were recorded in the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file.

    The foundational software for geospatial modelling and display (QGIS 3.34), an open-source software, can be downloaded from: https://qgis.org/en/site/forusers/download.html. (accessed 04 Jan. 2024.)

    The HDOs_GeoCoordinates.gpkg QGIS project file contains Hungary's administrative map and the recorded addresses of the HDOs from the Geo_coordinates_and_names_of_Hungarian_Health_Development_Offices.csv file, imported via .csv file.

    The OpenStreetMap tileset is directly accessible from www.openstreetmap.org in QGIS. (accessed 04 Jan. 2024.)

    The Hungarian county administrative boundaries were downloaded from the following website: https://data2.openstreetmap.hu/hatarok/index.php?admin=6 (accessed 04 Jan. 2024.)

    HDO_Buffers.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding buffer zones with a radius of 7.5 km.
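
    The buffer idea can be illustrated outside QGIS with a small Python sketch. It is not the authors' workflow: it assumes great-circle (haversine) distance on WGS 84 latitude/longitude pairs and a mean Earth radius, whereas QGIS buffers are normally computed in a projected coordinate system. The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (an approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS 84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(point, offices, radius_km=7.5):
    """True if `point` lies within `radius_km` of at least one office.

    `point` is a (lat, lon) tuple; `offices` is an iterable of (lat, lon) tuples.
    """
    lat, lon = point
    return any(
        haversine_km(lat, lon, olat, olon) <= radius_km
        for olat, olon in offices
    )


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the dataset.
    offices = [(47.05, 19.00)]
    print(within_buffer((47.00, 19.00), offices))
```

    In practice the office coordinates would be read from the CSV file named above; the 7.5 km default mirrors the buffer radius stated in the description.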

    Heatmap.gpkg is a QGIS project file that includes the administrative map of Hungary, the county boundaries, as well as the HDO offices and their corresponding heatmap (Kernel Density Estimation).

    A brief description of the statistical formulas applied is included in the Statistical_formulas.pdf.

    Recording of our base data for statistical concentration and diversification measurement was done using MS Excel 2019 (version: 1808, build: 10406.20006) in .xlsx format.

    • Aggregated number of HDOs by county: Number_of_HDOs.xlsx
    • Standardised data (Number of HDOs per 100,000 residents): Standardized_data.xlsx
    • Calculation of the Lorenz curve: Lorenz_curve.xlsx
    • Calculation of the Gini index: Gini_Index.xlsx
    • Calculation of the LQ index: LQ_Index.xlsx
    • Calculation of the Herfindahl-Hirschman Index: Herfindahl_Hirschman_Index.xlsx
    • Calculation of the Entropy index: Entropy_Index.xlsx
    • Regression and correlation analysis calculation: Regression_correlation.xlsx
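
    For readers without Excel, the measures listed above can be sketched in Python. The definitions below are common textbook forms and are an assumption here; the authors' exact formulas are in Statistical_formulas.pdf and may differ (for example in normalisation or logarithm base). All function names and the sample figures are hypothetical.

```python
import math


def per_100k(count, population):
    """Standardised rate: number of offices per 100,000 residents."""
    return count / population * 100_000


def gini(values):
    """Gini index of non-negative values (0 means a perfectly equal distribution)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def herfindahl_hirschman(values):
    """Sum of squared shares; ranges from 1/n (even) to 1 (fully concentrated)."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


def shannon_entropy(values):
    """Shannon entropy of the shares, using the natural logarithm."""
    total = sum(values)
    return -sum((v / total) * math.log(v / total) for v in values if v > 0)


def location_quotient(local_count, local_pop, total_count, total_pop):
    """Local offices-per-resident ratio divided by the national ratio."""
    return (local_count / local_pop) / (total_count / total_pop)


if __name__ == "__main__":
    # Invented illustrative counts, not values from the dataset.
    offices_by_county = [3, 5, 4, 8]
    print(gini(offices_by_county))
    print(herfindahl_hirschman(offices_by_county))
    print(shannon_entropy(offices_by_county))
```

    A location quotient above 1 would indicate that a county has more offices per resident than the national average under this definition.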

    Using the SPSS 29.0.1.0 program, we performed the following statistical calculations with the databases Data_HDOs_population_without_outliers.sav and Data_HDOs_population.sav:

    • Regression curve estimation with elderly population and number of HDOs, excluding outlier values (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_elderly_without_outlier.spv
    • Pearson correlation table between the total population, elderly population, and number of HDOs per county, excluding outlier values such as Budapest and Pest County: Pearson_Correlation_populations_HDOs_number_without_outliers.spv.
    • Dot diagram including total population and number of HDOs per county, excluding outlier values such as Budapest and Pest Counties: Dot_HDO_total_population_without_outliers.spv.
    • Dot diagram including elderly (64<) population and number of HDOs per county, excluding outlier values such as Budapest and Pest Counties: Dot_HDO_elderly_population_without_outliers.spv
    • Regression curve estimation with total population and number of HDOs, excluding outlier values (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_without_outlier.spv
    • Dot diagram including elderly (64<) population and number of HDOs per county: Dot_HDO_elderly_population.spv
    • Dot diagram including total population and number of HDOs per county: Dot_HDO_total_population.spv
    • Pearson correlation table between the total population, elderly population, and number of HDOs per county: Pearson_Correlation_populations_HDOs_number.spv
    • Regression curve estimation with total population and number of HDOs, (Types of analyzed equations: Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential, Logistic, with summary and ANOVA analysis table): Curve_estimation_total_population.spv
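
    As a rough cross-check of the Pearson correlation tables, the coefficient can be computed in a few lines of Python. This is a minimal sketch of the standard sample formula, not a reproduction of the SPSS output; the function name and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    # Invented illustrative values, not data from the dataset.
    population_thousands = [210, 300, 350, 520]
    office_counts = [3, 4, 4, 7]
    print(pearson_r(population_thousands, office_counts))
```

    Excluding outliers, as the files above do for Budapest and Pest County, would amount to dropping those entries from both sequences before calling the function.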

    For easier readability, the files have been provided in both SPV and PDF formats.

    The translation of these supplementary files into English was completed on 23rd Sept. 2024.

    If you have any further questions regarding the dataset, please contact the corresponding author: domjan.peter@phd.semmelweis.hu

  19. Reseller Information/pg60

    • kaggle.com
    zip
    Updated Oct 22, 2023
    Cite
    nsrenfarah (2023). Reseller Information/pg60 [Dataset]. https://www.kaggle.com/datasets/nsrenfarah/reseller-informationpg60
    Explore at:
    Available download formats: zip (26234 bytes)
    Dataset updated
    Oct 22, 2023
    Authors
    nsrenfarah
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    A colleague, Lucas, has asked you to update a spreadsheet called Reseller Details that records details of Adventure Works' resellers in the United States. The information in the spreadsheet was downloaded from another system. The download process created several inconsistencies or errors within the data.

    These errors include unnecessary spaces, the use of the wrong case, and entries that need to be joined together or split apart.

    You now need to add formulas to the worksheet to standardize the data so that it can be used for analysis.
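
    The exercise itself calls for Excel formulas. Purely as an analogue, the same kinds of fixes (extra spaces, wrong case, joining and splitting entries) can be sketched in Python. The function names, the comma-separated "city, state" layout, and the sample strings are assumptions for illustration and are not taken from the spreadsheet.

```python
import re


def clean_text(value):
    """Collapse runs of whitespace and strip both ends (roughly what Excel TRIM does)."""
    return re.sub(r"\s+", " ", value).strip()


def standardize_name(value):
    """Trim, then capitalise each word (similar in spirit to Excel PROPER)."""
    return clean_text(value).title()


def split_city_state(value, sep=","):
    """Split a hypothetical 'city, state' entry into a tidy (city, state) pair."""
    city, _, state = value.partition(sep)
    return standardize_name(city), clean_text(state).upper()


def join_fields(*parts, sep=" "):
    """Join cleaned, non-empty parts with a single separator."""
    cleaned = (clean_text(p) for p in parts)
    return sep.join(p for p in cleaned if p)


if __name__ == "__main__":
    print(standardize_name("  bIKE   wORLD "))
    print(split_city_state(" seattle , wa "))
    print(join_fields(" 123 ", "", "Main  St"))
```

    One difference to keep in mind: Excel's TRIM only handles ordinary space characters, while the regular expression above also collapses tabs and line breaks.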

  20. Content Analysis: The Credibility Gap in CSR Reporting

    • uvaauas.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 31, 2023
    Cite
    I.J. Lock; Peter Seele (2023). Content Analysis: The Credibility Gap in CSR Reporting [Dataset]. http://doi.org/10.21942/uva.9933266.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    University of Amsterdam / Amsterdam University of Applied Sciences
    Authors
    I.J. Lock; Peter Seele
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset and codebook of the content analysis study published in the Journal of Cleaner Production (https://doi.org/10.1016/j.jclepro.2016.02.060). Complete codebook in Word format; dataset as Excel file including the computed credibility scores.

Netflix Movies and TV Shows Dataset Cleaned(excel)

Cleaned Netflix dataset with detailed formulas and step-by-step documentation