90 datasets found
  1. Dirty Excel Data

    • kaggle.com
    zip
    Updated Feb 23, 2022
    Cite
    Shiva Vashishtha (2022). Dirty Excel Data [Dataset]. https://www.kaggle.com/datasets/shivavashishtha/dirty-excel-data
    Explore at:
    Available download formats: zip (13123 bytes)
    Dataset updated
    Feb 23, 2022
    Authors
    Shiva Vashishtha
    Description

    Dataset

    This dataset was created by Shiva Vashishtha

    Contents

  2. Data Cleaning Sample

    • borealisdata.ca
    • dataone.org
    Updated Jul 13, 2023
    Cite
    Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
    Explore at:
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Borealis
    Authors
    Rong Luo
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Sample data for exercises in Further Adventures in Data Cleaning.

  3. Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio

    • qubeshub.org
    Updated Jul 16, 2020
    Cite
    Shelly Gaynor (2020). Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio [Dataset]. http://doi.org/10.25334/DRGD-F069
    Explore at:
    Dataset updated
    Jul 16, 2020
    Dataset provided by
    QUBES
    Authors
    Shelly Gaynor
    Description

    Access and clean an open source herbarium dataset using Excel or RStudio.

  4. Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop

    • borealisdata.ca
    Updated Jul 19, 2024
    Cite
    Lucia Costanzo; Vivek Jadon (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Borealis
    Authors
    Lucia Costanzo; Vivek Jadon
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Canada
    Description

    Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.

  5. FIFa21 Messy Dataset cleaned and transformed

    • kaggle.com
    zip
    Updated Feb 26, 2024
    Cite
    Nicolas Mora Hansen (2024). FIFa21 Messy Dataset cleaned and transformed [Dataset]. https://www.kaggle.com/datasets/nicolasmorahansen/fifa21-messy-dataset-cleaned-and-transformed
    Explore at:
    Available download formats: zip (5473572 bytes)
    Dataset updated
    Feb 26, 2024
    Authors
    Nicolas Mora Hansen
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    FIFA21 - Data Cleaning and Transformation

    EA Sports FIFA 21 is a popular video game that simulates football matches. Often, data collected from this game might be messy, containing inconsistencies, missing values, and various formatting issues.

    For this project, I will attempt to clean, organize and prepare this messy FIFA_21 data for analysis using just Excel. Although this could be done somewhat faster using Python, R, or other programming languages, the challenge at hand is to use Excel.

    Observations(Rows)=18980

    1. 'Spot blank values'. 'COUNTBLANK'.

    Column 'Loan Date End' has '17966' blanks.

    2. 'Spot 'zero' values'. 'COUNTIF'.

    =COUNTIF(A1:A18980; "=0")

    'Value', 'Wage', 'Release Clause', 'Hits' have '0' values.
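
    The two counting checks above can also be sketched outside Excel. This is a minimal illustration in plain Python, not the author's method; the function names and the sample values are hypothetical. One assumption differs from Excel: whitespace-only cells are treated as blank here.

```python
def count_blank(values):
    """Count cells that are None or empty after stripping whitespace.

    Assumption: whitespace-only cells are treated as blank.
    """
    return sum(1 for v in values if v is None or str(v).strip() == "")


def count_zero(values):
    """Count cells that parse as the number 0, similar to COUNTIF(range; "=0")."""
    total = 0
    for v in values:
        try:
            if float(v) == 0:
                total += 1
        except (TypeError, ValueError):
            # Non-numeric or missing cells are skipped.
            continue
    return total


# Hypothetical sample column, not taken from the dataset.
sample = ["", None, "0", 0, "12", "  ", "0.0", "n/a"]
print(count_blank(sample), count_zero(sample))
```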

    3. 'Column Headers'

    =SUBSTITUTE(A1; " "; "_")
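
    As a rough Python counterpart to the SUBSTITUTE formula above, the following sketch replaces spaces with underscores in every header. The function name is hypothetical; the example headers are column names mentioned in this description.

```python
def normalize_headers(headers):
    """Replace every space in each header with an underscore."""
    return [h.replace(" ", "_") for h in headers]


print(normalize_headers(["Loan Date End", "Release Clause", "Hits"]))
```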

    Unique_Attributes(columns)=76

    1. 'Height'

    At first glance the height column looked like it needed only a simple formula to turn a string ending in 'cm' into a number expressing height in centimeters, but some values were given in feet and inches instead, written with an apostrophe (') for feet and a double quote (") for inches, which called for a more intricate formula to parse and convert every value. The 'IF' formula checks whether the string contains 'cm'. If it does, the 'cm' suffix is removed and the number is kept. If it does not, the digits before the apostrophe (the feet) are multiplied by 12 to give inches, the digits between the apostrophe and the double quote (the inches) are added, and the total is multiplied by 2.54 (written 2,54 in the formula's locale) and rounded to give centimeters.

    =IF(ISNUMBER(FIND("cm";$O2)); VALUE(SUBSTITUTE($O2; "cm"; "")); ROUND((LEFT($O2; FIND("'"; $O2) - 1) * 12 + MID($O2; FIND("'"; $O2) + 1; FIND(""""; $O2) - FIND("'"; $O2) - 1)) * 2,54;0))
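
    The same conversion can be sketched in Python. This is an illustration under assumptions, not the author's code: the function name is hypothetical, and the input is assumed to look like "170cm" or 6'2" as the description suggests. Note that Python's round() can differ from Excel's ROUND on exact .5 ties.

```python
import re


def height_to_cm(raw):
    """Convert a height such as '170cm' or 6'2" to whole centimeters."""
    text = raw.strip()
    if text.endswith("cm"):
        # Centimeter values only need the suffix removed.
        return round(float(text[:-2]))
    # Feet before the apostrophe, inches before the double quote.
    match = re.fullmatch(r"(\d+)'\s*(\d+)\"", text)
    if match is None:
        raise ValueError(f"unrecognized height: {raw!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    # Feet become inches, then total inches become centimeters.
    return round((feet * 12 + inches) * 2.54)


print(height_to_cm("170cm"), height_to_cm("6'2\""))
```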

    2. 'Weight'

    Weight was recorded in either 'kg' or 'lbs'. For 'kg', the value is converted to a number. For 'lbs', the value is converted to kilograms by dividing by 2.205 and then turned into a number. The result is rounded to zero decimal places.

    =ROUND(IF(ISNUMBER(FIND("kg";$P2));VALUE(SUBSTITUTE($P2;"kg";""))*1;IF(ISNUMBER(FIND("lbs";$P2));VALUE(SUBSTITUTE($P2;"lbs";""))/2,205;0));0)
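
    A comparable Python sketch for the weight column follows. The function name is hypothetical, and lowercasing the input first is an added assumption (Excel's FIND is case-sensitive). As in the formula, unrecognized values fall back to 0.

```python
def weight_to_kg(raw):
    """Convert a weight such as '72kg' or '159lbs' to whole kilograms."""
    text = raw.strip().lower()  # assumption: accept 'Kg' / 'LBS' as well
    if text.endswith("kg"):
        return round(float(text[:-2]))
    if text.endswith("lbs"):
        # Same divisor as the Excel formula (2,205 in its locale).
        return round(float(text[:-3]) / 2.205)
    return 0  # mirrors the formula's fallback for unrecognized values


print(weight_to_kg("72kg"), weight_to_kg("159lbs"))
```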

    3. 'Joined'

    A new column is added to the right of 'Joined' by the name 'WithClub10Years'. This column shows whether the player has been at the same club for a minimum of 10 years.

    =IF(YEAR(NOW())-YEAR(T2)>=10; "10 Years"; "")
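
    The formula above compares calendar years only, so a player who joined in July 2010 would already be flagged in January 2020. A small Python sketch of the same logic makes that behavior easy to check; the function name and the dates are hypothetical.

```python
from datetime import date


def with_club_10_years(joined, today=None):
    """Return '10 Years' when the calendar-year difference is at least 10."""
    today = today or date.today()
    # Same comparison as YEAR(NOW()) - YEAR(T2) >= 10: months and days are ignored.
    return "10 Years" if today.year - joined.year >= 10 else ""


print(with_club_10_years(date(2010, 7, 1), today=date(2020, 1, 1)))
```

    If full elapsed years are wanted instead, the month and day would also have to be compared.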

    4. 'Value', 'Wage', 'Release Clause'

    The monetary figures were converted into plain numerical values. The values are in euros. The 'M' and 'K' suffixes were removed and the corresponding figures multiplied by 1,000,000 and 1,000 respectively. The decimal delimiter was changed from '.' to ',' for calculation.

    =IF(ISNUMBER(FIND("M"; Z2)); VALUE(SUBSTITUTE(Z2; "M"; ""))*1000000; IF(ISNUMBER(FIND("K"; Z2)); VALUE(SUBSTITUTE(Z2; "K"; ""))*1000; Z2*1))
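
    A hedged Python equivalent of the suffix handling is sketched below. The function name is hypothetical, and stripping a leading euro sign is an assumption about the raw values that the Excel formula itself does not handle.

```python
def money_to_number(raw):
    """Convert strings such as '103.5M', '560K' or '0' to plain numbers."""
    text = raw.strip().lstrip("€")  # assumption: values may carry a euro sign
    if text.endswith("M"):
        return float(text[:-1]) * 1_000_000
    if text.endswith("K"):
        return float(text[:-1]) * 1_000
    return float(text)


print(money_to_number("€103.5M"), money_to_number("560K"))
```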

    5. 'W/F', 'SM', 'IR'

    The values included star symbols. The stars were removed and the strings converted to numbers.

    =LEFT(BO2; 1)
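
    LEFT returns text, so the first character still has to be converted before it behaves as a number. A minimal Python sketch of the whole step, assuming each rating is a single digit followed by a star symbol (the exact raw format is an assumption, and the function name is hypothetical):

```python
def star_rating_to_int(raw):
    """Keep the leading digit of a rating such as '4 ★' and return it as an int."""
    return int(raw.strip()[0])


print(star_rating_to_int("4 ★"))
```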

    Conclusion

    The clean dataset is now ready for further analysis, such as exploring player statistics, team performance, or other insights that can provide a deeper understanding of the FIFA 21 game.

  6. Dirty Dataset to practice Data Cleaning

    • kaggle.com
    zip
    Updated Nov 3, 2023
    Cite
    Amrutha yenikonda (2023). Dirty Dataset to practice Data Cleaning [Dataset]. https://www.kaggle.com/datasets/amruthayenikonda/dirty-dataset-to-practice-data-cleaning
    Explore at:
    Available download formats: zip (1241 bytes)
    Dataset updated
    Nov 3, 2023
    Authors
    Amrutha yenikonda
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The dataset has been obtained by web scraping a Wikipedia page for which code is linked below: https://www.kaggle.com/amruthayenikonda/simple-web-scraping-using-pandas

    This dataset can be used to practice data cleaning and manipulation, for example dropping unwanted columns and null values, removing symbols, etc.
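
    The cleaning steps named above (dropping unwanted columns, dropping rows with null values, removing symbols) can be sketched in plain Python. Everything here is illustrative: the function name, the column names, the sample rows and the choice of which symbols to strip are assumptions, not taken from the dataset.

```python
import re


def clean_rows(rows, drop_columns, symbol_pattern=r"[^\w\s.,-]"):
    """Drop unwanted columns, skip rows with missing values, strip symbols."""
    cleaned = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in drop_columns}
        # Skip any row that still has a missing or empty value.
        if any(v is None or str(v).strip() == "" for v in kept.values()):
            continue
        cleaned.append(
            {k: re.sub(symbol_pattern, "", str(v)).strip() for k, v in kept.items()}
        )
    return cleaned


# Hypothetical rows resembling a scraped table.
rows = [
    {"Name": "Alpha*", "Gross": "$1,200", "Ref": "[1]"},
    {"Name": "Beta", "Gross": None, "Ref": "[2]"},
]
print(clean_rows(rows, drop_columns={"Ref"}))
```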

  7. Clean,excel Imports in India from Italy

    • volza.com
    csv
    Updated Jun 3, 2026
    Cite
    Volza FZ LLC (2026). Clean,excel Imports in India from Italy [Dataset]. https://www.volza.com/imports-india/india-import-data-of-clean-excel-from-italy
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 3, 2026
    Dataset authored and provided by
    Volza FZ LLC
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Italy, India
    Variables measured
    Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
    Description

    Analyze 950 Clean,excel import shipments to India from Italy till Mar-26. Import data includes Buyers, Suppliers, Pricing, Qty & Contacts.

  8. Global Clean Excel export import trade data, buyers & suppliers

    • volza.com
    csv
    Updated Sep 3, 2026
    Cite
    Volza FZ LLC (2026). Global Clean Excel export import trade data, buyers & suppliers [Dataset]. https://www.volza.com/trade-data-global/global-exporters-importers-export-import-data-of-clean+excel
    Explore at:
    Available download formats: csv
    Dataset updated
    Sep 3, 2026
    Dataset authored and provided by
    Volza FZ LLC
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    Count of exporters, Count of importers, Count of shipments, Sum of export import value
    Description

    Discover New & profitable Clean Excel buyers & suppliers, Access 2,289 export import shipment records till Dec - 25 with 52 importers & 33 Exporters.

  9. Data-analysis-EXCEL-POWER-BI

    • kaggle.com
    zip
    Updated Jul 27, 2023
    Cite
    Ahmed Samir (2023). Data-analysis-EXCEL-POWER-BI [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/data-analysis-excel-power-bi/discussion
    Explore at:
    Available download formats: zip (3235955 bytes)
    Dataset updated
    Jul 27, 2023
    Authors
    Ahmed Samir
    Description

    Initially, the case was just raw company data that offered no useful information to help decision-makers. After revenues and expenses had been collected over a number of months, answers to several questions were needed in order to make important decisions based on data rather than intuition.

    The questions:

    About revenue and expenses
    - What are the total sales and profit for the whole period? How many products were sold in total? What is the net profit?
    - In which month was the highest percentage of revenue achieved, and which day in that month had the largest revenue?
    - In which month was the highest percentage of expenses incurred, and which day in that month had the largest expenses?
    - How much did expenditures change each month? What was the percentage change in net profit over the months?

    About distribution
    - How many products were sold each month in the largest state?
    - Which were the top 3 states buying products during the two years?

    Comparison
    - Between sales methods, by sales
    - Between men's and women's products, by sales
    - Between retailers, by profit

    What I did
    - Understood the data.
    - Preprocessed and cleaned the data.
    - Solved problems found during cleaning, such as missing data or incorrectly typed data.
    - Queried the data and made some calculations, such as "COGS", with Power Query in Excel.
    - Modeled the data and created some measures with Power Pivot in Excel.
    - After finishing processing and preparation, made some pivot tables to answer the questions.
    - Finally, made a dashboard with Power BI to visualize the results.

  10. Cleaned NHANES 1988-2018

    • figshare.com
    txt
    Updated Feb 18, 2025
    Cite
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
    Explore at:
    Available download formats: txt
    Dataset updated
    Feb 18, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposure of the non-institutionalized US population. However, as NHANES data are plagued with multiple inconsistencies, processing these data is required before deriving new insights through large-scale analyses. Thus, we developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

    CSV Data Record: The curated NHANES datasets and the data dictionaries include 23 .csv files and 1 Excel file. The curated NHANES datasets comprise 20 .csv files, two for each module, with one as the uncleaned version and the other as the cleaned version. The modules are labeled as follows: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments.
    • "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES.
    • "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables.
    • "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes.
    • "nhanes_inconsistencies_documentation.xlsx" is an Excel file that contains the cleaning documentation, which records all the inconsistencies for all affected variables to help curate each of the NHANES modules.

    R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries can be downloaded, as a .zip file which includes an .RData file and an .R file.
    • "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts on customized functions that were written to curate the data.
    • "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e. our pipeline) to curate the original NHANES data.

    Example starter code: The set of starter code to help users conduct exposome analysis consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order.
    • "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together.
    • "example_1 - account_for_nhanes_design.Rmd" demonstrates how to conduct a linear regression model, a survey-weighted regression model, a Cox proportional hazard model, and a survey-weighted Cox proportional hazard model.
    • "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and multiple variables with and without accounting for the NHANES sampling design.
    • "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
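
    To illustrate what "with and without accounting for the sampling design" means for a point estimate, here is a minimal Python sketch (the dataset's own tutorials are in R). The function name, values and weights are made up for illustration. A full design-based analysis would also need strata and sampling units for variance estimation, which this sketch does not attempt.

```python
def weighted_mean(values, weights):
    """Weighted mean: each observation counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical measurements and survey weights, not NHANES data.
values = [1.0, 2.0, 3.0]
weights = [1.0, 1.0, 2.0]

unweighted = sum(values) / len(values)
print(unweighted, weighted_mean(values, weights))
```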

  11. Data from: Designing data science workshops for data-intensive environmental...

    • data.niaid.nih.gov
    • datasetcatalog.nlm.nih.gov
    • +1 more
    zip
    Updated Dec 8, 2020
    Cite
    Allison Theobold; Stacey Hancock; Sara Mannheimer (2020). Designing data science workshops for data-intensive environmental science research [Dataset]. http://doi.org/10.5061/dryad.7wm37pvp7
    Explore at:
    Available download formats: zip
    Dataset updated
    Dec 8, 2020
    Dataset provided by
    California State Polytechnic University
    Montana State University
    Authors
    Allison Theobold; Stacey Hancock; Sara Mannheimer
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.

    Methods: Surveys from Carpentries-style workshops, the results of which are presented in the accompanying manuscript.

    Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Form.

    • The surveys administered for the fall 2018, spring 2019 academic year are included as pre_workshop_survey and post_workshop_assessment PDF files.
    • The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw. The data files whose names include survey contain raw data from pre-workshop surveys, and the data files whose names include assessment contain raw data from the post-workshop assessment survey.
    • The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively.
    • The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean.
    • The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.

  12. Students Results Analysis using Microsoft Excel

    • kaggle.com
    zip
    Updated Oct 17, 2025
    Cite
    OIE (2025). Students Results Analysis using Microsoft Excel [Dataset]. https://www.kaggle.com/datasets/emmyofh/students-results-analysis-using-microsoft-excel
    Explore at:
    Available download formats: zip (31469 bytes)
    Dataset updated
    Oct 17, 2025
    Authors
    OIE
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
    License information was derived automatically

    Description

    This dataset was created to evaluate students’ performance in the most recent school examination. The goal is to help the school administration understand overall academic achievement, examine score distribution across grades, and identify student groups that may need additional academic support to improve learning outcomes.

    The dataset provides detailed student result records, including subjects, scores, grades, and performance categories. It serves as a practical resource for educators, analysts, and data learners who wish to explore educational data using Excel or data analytics tools.

    Tool Used: Microsoft Excel Spreadsheet

    Data Frame Process: This analysis followed the Google Data Analytics data-phase approach, which involves:

    Ask: Define the key questions and objectives

    Prepare: Organize and clean the student result data

    Process: Perform calculations and structure the data in Excel

    Analyze: Evaluate performance trends and identify weak areas

    Share: Present findings using tables, charts, and summaries

    Act: Provide actionable recommendations to improve student outcomes

  13. Sales data analysis using MS Excel

    • kaggle.com
    zip
    Updated May 8, 2024
    Cite
    Yerzat Tursunkulov (2024). Sales data analysis using MS Excel [Dataset]. https://www.kaggle.com/datasets/yerzattursunkulov/sales-data-analysis-using-ms-excel
    Explore at:
    Available download formats: zip (31983063 bytes)
    Dataset updated
    May 8, 2024
    Authors
    Yerzat Tursunkulov
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    The Orders database contains information on the following variables. • Continuous variables: Row ID, Order ID, Order Date, Ship Date, Customer ID, Product ID, Sales, Quantity, Discount, Profit, Shipping Cost
    • Categorical variables: Ship Mode, Customer Name, Segment, Postal Code, City, State, Country, Region, Market, Category, Subcategory, Product Name, Order Priority

    The purpose of this project: 1. To use descriptive statistics methods to assess the sales performance across various segments, markets, product categories and subcategories; 2. To use diagnostic analytics methods to understand the statistical significance of the factors that influence sales; 3. To use predictive analytics (regression) to understand the strength of the relationship between sales and sales drivers and generate a regression formula to predict sales; 4. To develop a sales forecasting model based on the insights.

    Descriptive analytics

    Descriptive statistics for sales
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F848f47b38b7f2360163bb2221703c658%2FPicture2.png?generation=1715109635788424&alt=media

    Frequency distribution for sales
    Around 44,500 transactions of value >=USD 500.
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F39cfd8ffd8fdf296300bb9f1fa5243e2%2FPicture3.png?generation=1715109667755923&alt=media

    Sales values across markets
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F3385959d11b6daafae24c848b4b00f13%2FPicture4.png?generation=1715109744629587&alt=media

    We see an increase in sales across all markets throughout 2012-2015. We have high sales volumes in the USCA and LATAM markets:
    • USCA: USD 757,108 in 2015;
    • LATAM: USD 706,632 in 2015.

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F4aa59b5a5b980aad6873c8a4af4cd223%2FPicture1.png?generation=1715109770510368&alt=media

    Sales across product categories
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F867cbe622bf94d25a25a1c4b9281656d%2FPicture5.png?generation=1715109794950614&alt=media

    Office supplies were the most sold product category in 2012-2015. Technology was the least sold product category by quantity. However, the Technology category yields high sales.
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F5c74664f77cce2bc2f7c77c7b01e9890%2FPicture6.png?generation=1715109834309500&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2Fd3bb766183e9f58fbf009a998c01adf6%2FPicture7.png?generation=1715109872961254&alt=media

    Further analysis of profitable products reveals that phones and copiers demonstrate high sales.
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F109c4c3eab81fa581c19a5c09beff839%2FPicture9.png?generation=1715109914590660&alt=media

    Sales across segments
    The data reveals that there are high sales in the Consumer segment across all product categories.
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F65075cc20028a37a1aff6932fa89d3d5%2FPicture10.png?generation=1715109992655572&alt=media

    Diagnostic analytics

    Two-sample t-test
    Using a t-test, we can evaluate how sales differ across different segments, regions, and product types. The t-test allows us to evaluate the statistical significance of sales samples. The two-sample t-test of sales numbers across markets resulted in the statistical significance of sales in USCA and LATAM markets with p-values >0.05.
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F7b7264d5f44a9a79b352028b28d1c618%2FPicture11.png?generation=1715110082746375&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F4061ef38ea83d7e3bbd252a802863e8f%2FPicture12.png?generation=1715110097203251&alt=media

    The two-sample t-test of sales numbers across product categories resulted in the statistical significance of sales in Office supplies and Technology categories with p-values >0.05.
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2Fd9994377d605222d77ef67af3e273771%2FPicture13.png?generation=1715110126112322&alt=media
    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F669779e9aad19d51a28fb44e7c484bc7%2FPicture14.png?generation=1715110140543290&alt=media
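
    For readers who want to see what a two-sample t-test computes, the sketch below works out the test statistic and degrees of freedom in plain Python. The description does not say which variant was used in Excel, so the unequal-variance (Welch) form is an assumption here, and the function name and sample numbers are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical sales samples for two markets.
market_a = [2, 4, 6, 8]
market_b = [1, 3, 5, 7]
t_stat, dof = welch_t(market_a, market_b)
print(round(t_stat, 4), round(dof, 2))
```

    The p-value is then read from the t distribution with these degrees of freedom. By the usual convention, a difference is called statistically significant when the p-value falls below the chosen threshold (commonly 0.05).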

    Pearson correlation The correlation of continuous values in the dataset allows us to see the relationship between sales, quantity sold, shipping costs and profit. ![](https://www.googleapis.com/download/sto...
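
    As a reminder of what the Pearson coefficient measures, here is a small self-contained Python sketch. The function name and numbers are illustrative only and are not taken from the Orders data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical quantity and sales figures with a perfectly linear relationship.
quantity = [1, 2, 3, 4]
sales = [2, 4, 6, 8]
print(pearson_r(quantity, sales))
```

    Values near 1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear relationship.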

  14. Data from: Survey data from the Australian Marine Debris Initiative

    • research.usc.edu.au
    csv
    Cite
    Heidi Tait; Jodi Jones; Caitlin Smith; Kathy Townsend, Survey data from the Australian Marine Debris Initiative [Dataset]. https://research.usc.edu.au/esploro/outputs/dataset/Survey-data-from-the-Australian-Marine/991016398702621
    Explore at:
    Available download formats: csv (7054018 bytes)
    Dataset provided by
    University of the Sunshine Coast
    Authors
    Heidi Tait; Jodi Jones; Caitlin Smith; Kathy Townsend
    Time period covered
    2024
    Description

    Survey data from the Australian Marine Debris Initiative and the result of spatial analysis from multiple creative commons datasets. Data consists of:
    • Spatial Data Queensland Coastline – Event summaries within an Excel data table and shapefile
    • All years
    • Number of Items removed, Weight volunteers, Volume, Distance, Latitude and Longitude.
    • Contributing organisation files table/ sites
    • Environmental, physical and biological variables associated with the closest catchment to each debris survey.

    TBF has made all reasonable efforts to ensure that the information in the Custom Dataset is accurate. TBF will not be held responsible:
    • for the way these data are used by the Entity for their Reports;
    • for any errors that may be contained in the Custom Dataset; or
    • any direct or indirect damage the use of the Custom Dataset may cause.

    Data collected by TBF comes from citizen science initiatives and is taken at face value from contributors with each entry being vetted and periodic checks being made to maintain the integrity of the overall dataset. Some clean-up data has been extrapolated by data collectors. Some weight and distance details have not been provided by contributors. The data was collected by various organisations and individuals in clean-up events at their chosen locations where man-made items greater than 5mm were removed from the beach, and sorted, counted and recorded on data sheets, using CyberTracker software devices or the AMDI mobile application. Items were identified according to the method laid out in the TBF Marine Debris Identification Manual in which items are grouped according to their material categories (the manual is available on the TBF website). The length of beach cleaned is at the discretion of the clean-up group and the total weight of items removed is either weighed with handheld scales or estimated.

  15. Power BI Sample Data

    • kaggle.com
    zip
    Updated Oct 20, 2022
    Cite
    Shwetank Chaudhary (2022). Power BI Sample Data [Dataset]. https://www.kaggle.com/datasets/shwetankchaudhary/power-bi-sample-data
    Explore at:
    Available download formats: zip (73587 bytes)
    Dataset updated
    Oct 20, 2022
    Authors
    Shwetank Chaudhary
    Description

    This is a dataset of finances, which is also available in Power BI for practice. Use this dataset to practice Power BI.

  16. Data for: Is visual motivation for cleaning surfaces in the kitchen...

    • data.mendeley.com
    Updated Mar 31, 2020
    Cite
    Trond Moretro (2020). Data for: Is visual motivation for cleaning surfaces in the kitchen consistent with a hygienically clean environment? [Dataset]. http://doi.org/10.17632/62js8885bn.1
    Explore at:
    Dataset updated
    Mar 31, 2020
    Authors
    Trond Moretro
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset consists of:
    1. soildatato sharing. Excel table showing visual detection (1) or no detection (0) by 15 consumers of three types of food soils on cutting boards or counter tops.
    2. visualdetection. Excel table showing data for 13 consumers doing visual detection (scale: clean = 1 to dirty = 4) of kitchen surfaces, and swabs used at kitchen surfaces.
    3. survivalpathogensdrysoil. Excel table showing fate of Salmonella, Campylobacter and total counts when dried in 3 types of food soils and water.

  17. Data set: St. Louis River Watershed, MN Conductivity Assessment March 2022

    • catalog.data.gov
    • datasets.ai
    Updated Jul 18, 2025
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2025). Data set: St. Louis River Watershed, MN Conductivity Assessment March 2022 [Dataset]. https://catalog.data.gov/dataset/data-set-st-louis-river-watershed-mn-conductivity-assessment-march-2022
    Explore at:
    Dataset updated
    Jul 18, 2025
    Dataset provided by
    United States Environmental Protection Agency http://www.epa.gov/
    Area covered
    Minnesota, Saint Louis River
    Description

    USEPA Office of Research and Development is providing, for USEPA Region 5’s use, data used to evaluate potential downstream impacts of the NorthMet Mine, including a characterization of stream specific conductivity (SC) levels, least disturbed background SC, and SC levels that may exceed the Fond du Lac Band’s WQ standards and adversely affect aquatic life, including brook trout (Salvelinus fontinalis), lake sturgeon (Acipenser fulvescens), and benthic macroinvertebrates. Keywords: Conductivity, St. Louis River, benthic invertebrates; mining

    The attached Excel Pedigree includes:
    • _Datasets: Data file uploaded to EPA Science Hub and/or Environmental Data Set Gateway
    • _R: Clean R scripts used to generate document figures and tables
    • _Tables_Figures: Files generated from R script and used in the Region 5 memo
    • 20220325 R Code and Data: All additional files used for this project, including original files, intermediate files, extra output files, and extra functions.

    The "_R" folder contains four subfolders. Each subfolder has several R scripts, input and output files, and an R project file. Users can run R scripts directly from each subfolder by installing R, RStudio, and associated R packages.

    Data Dictionary: See tab DataDictionary in Excel file

    Datasets: Simplified language is used in the text to identify parent data sets. Source and File names are retained in this pedigree in original form to enable R-scripts to retain functionality.
    • Thingvold et al. (1975-1977)
    • Griffith (1998-2009)
    • Predicted background (2000-2015)
    • Water Quality Portal (1996-2021)
    • Water Quality Portal Less Disturbed (1996-2021)
    • Minnesota Pollution Control Agency (MPCA) (1996-2013)
    • Mid-Atlantic Highlands (1990 to 2014).

    This dataset is associated with the following publication: Cormier, S., and Y. Wang. Appendix C: ORD Specific Conductance Memo, from Susan Cormier to Tera Fong. March 15, 2022. Assessment of effects of increased ion concentrations in the St. Louis River Watershed with special attention to potential mining influence and the jurisdiction of the Fond du Lac Band of Lake Superior Chippewa. U.S. Environmental Protection Agency, Washington, DC, USA, 2022.

  18. KAP WASH 2018 in South Sudan's Ajuong Thok and Pamir Camps - South Sudan

    • microdata.worldbank.org
    • datacatalog.ihsn.org
    • +1more
    Updated Sep 21, 2021
    + more versions
    Cite
    Samaritan's Purse (2021). KAP WASH 2018 in South Sudan's Ajuong Thok and Pamir Camps - South Sudan [Dataset]. https://microdata.worldbank.org/catalog/3891
    Explore at:
    Dataset updated
    Sep 21, 2021
    Dataset provided by
    United Nations High Commissioner for Refugees http://www.unhcr.org/
    Samaritan's Purse
    Time period covered
    2018
    Area covered
    South Sudan
    Description

    Abstract

    A Knowledge, Attitudes, and Practices (KAP) survey was conducted in Ajuong Thok and Pamir Refugee Camps in November 2018 to determine the current Water, Sanitation, and Hygiene (WASH) conditions as well as hygiene attitudes and practices within the households (HHs) surveyed. The assessment utilized a systematic random sampling method, and a total of 1,040 HHs (520 HHs in each location) were surveyed using mobile data collection (MDC) within a period of 10 days. Data was cleaned and analyzed in Excel. The summary of the results is presented in this report.

    The findings showed that the overall average number of liters of water per person per day was 21, in both Ajuong Thok and Pamir Camps, which was slightly higher than the recommended Office of the United Nations High Commissioner for Refugees (UNHCR) minimum standard of at least 20 liters of water available per person per day. This is a slight improvement from the 19.5 liters reported the previous year. The average HH size was six people. Women comprised 83.2% of the surveyed respondents and males 16.8%. Almost all the respondents were refugees, constituting 99.6%. The refugees were aware of the key health and hygiene practices, possibly as a result of routine health and hygiene messages delivered to them by Samaritan's Purse (SP), Africa Humanitarian Action (AHA) and International Rescue Committee (IRC). Most refugees had knowledge about keeping water containers clean, washing hands during critical times, safe excreta disposal and disease prevention.

    Geographic coverage

    Ajuong Thok and Pamir Refugee Camps

    Analysis unit

    Households

    Universe

    All households in Ajuong Thok and Pamir Refugee Camps

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Households were selected using systematic random sampling. Enumerators systematically walked through each row in each block of the camps, in such a way as to give each HH a chance to be selected. For each block, the enumerators began at one corner and went row by row, systematically using the sampling interval (SI) to select HHs. The first HH sampled in each block was determined by selecting a random number between 1 and the SI (6 in Ajuong Thok and 7 in Pamir). After selecting the first HH, the SI was used to identify the next respondent HH. The female head of the household was the preferred respondent. If she was not available, another adult (over 15 years of age) with knowledge of the HH's WASH practices was surveyed. If no one qualified to answer the survey, the HH was replaced systematically using the SI.
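The selection rule described above (a random start between 1 and the SI, then every SI-th household) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_sample`, the household labels and the block size of 30 are hypothetical, and the replacement of households with no eligible respondent is not modelled.

```python
import random


def systematic_sample(households, interval, rng=None):
    """Select every `interval`-th household after a random start.

    The start is drawn uniformly from 1..interval (a 1-based position),
    mirroring the "random number between 1 and the SI" step.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    rng = rng or random.Random()
    start = rng.randint(1, interval)  # inclusive on both ends
    # Convert the 1-based start to a 0-based index, then step by the interval.
    return households[start - 1::interval]


# Hypothetical block of 30 households, labelled in walking order.
block = [f"HH-{i:02d}" for i in range(1, 31)]
sample = systematic_sample(block, interval=6, rng=random.Random(0))
print(sample)
```

Passing a seeded `random.Random` makes the draw reproducible; in the field the start would be drawn fresh for each block.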

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The survey questionnaire used to collect the data consists of the following sections: Demographics, Water, Sanitation, Hygiene, and NFI Distribution.

    Cleaning operations

    The data collected was uploaded to a server at the end of each day. IFormBuilder generated a Microsoft (MS) Excel spreadsheet dataset which was then cleaned and analyzed using MS Excel.

    Given that SP is currently implementing a WASH program in Ajuong Thok and Pamir, the assessment data collected in these camps will not only serve as the endline for UNHCR 2018 programming but also as the baseline for 2019 programming.

    Data was anonymized through decoding and local suppression.

  19. The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and...

    • dataverse.harvard.edu
    csv, pdf, tsv
    Updated Jun 2, 2020
    + more versions
    Cite
    Harvard Dataverse (2020). The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables [Dataset]. http://doi.org/10.7910/DVN/0EGYWY
    Explore at:
    Available download formats: tsv (119723405), csv (1019978404), csv (248865834), pdf (136562), csv (1585521237), csv (289564219), tsv (75055125), csv (445965588), tsv (25746986), csv (481548943), tsv (3663564), tsv (50375826)
    Dataset updated
    Jun 2, 2020
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1952 - 2019
    Area covered
    European Union
    Dataset funded by
    European Union
    Description

    The CEPS EurLex dataset

    The dataset contains 142.036 EU laws - almost the entire corpus of the EU's digitally available legal acts passed between 1952 - 2019. It encompasses the three types of legally binding acts passed by the EU institutions: 102.304 regulations, 4.070 directives, 35.798 decisions in the English language. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python. The dataset was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU.

    Brief description:
    - The dataset is organised in tabular format, with each law representing one row and the columns representing 23 variables.
    - The full text of 134.633 laws is included (column "act_raw_text"). For newer laws, the text was scraped from Eur-lex.eu via the HTML pages, while for older laws, the text was extracted from (scanned) PDF documents (if available in English).
    - 22 additional variables are included, such as 'Act_name', 'Act_type', 'Subject_matter', 'Authors', 'Date_document', 'ELI_link', 'CELEX' (a unique identifier for every law). Please see the "CEPS_EurLex_codebook.pdf" file for an explanation of all variables.
    - Given its size, the dataset was uploaded in different batches to facilitate usage. Some Excel files are provided for non-technical users. We recommend, however, the use of the CSV files, since Excel does not save large amounts of data properly. EurLex_all.csv is the master file containing all data.

    Caveats:
    - The Eur-lex.eu website does not consistently provide data for all the variables. In addition, the HTML documents were not always cleanly formatted and text extraction from scanned PDFs is not entirely clean. Some data points are therefore missing for some laws and some laws were excluded entirely.
    - Not all (older) laws were available in English, especially since Ireland and the UK only joined the European Communities in 1973. Non-English laws are excluded from the dataset.

    Other:
    - For details on the types of EU legal acts: https://ec.europa.eu/info/law/law-making-process/types-eu-law_en
    - An example for an experimental analysis with this dataset: https://trigger-project.eu/2019/10/28/a-data-science-approach-to-eu-differentiated-integration/
    - The TRIGGER project is funded by the EU's Horizon 2020 programme, grant number 822735
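Since the description recommends the CSV files over Excel for data of this size, a streaming read is one way to work with them without loading everything into memory. The sketch below uses only Python's standard library and tallies one column row by row. The helper name `count_by_column`, the comma delimiter, the UTF-8 encoding and the tiny inline sample are assumptions for illustration; the exact column names and casing should be checked against the codebook.

```python
import csv
from collections import Counter

# Full-text cells can exceed the csv module's default field size limit,
# so raise it (the value here is an arbitrary large bound).
csv.field_size_limit(2**31 - 1)


def count_by_column(lines, column="Act_type"):
    """Stream rows from an iterable of CSV lines and tally one column."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row.get(column) or "(missing)"] += 1
    return counts


# With the real file, usage might look like this (path and encoding assumed):
# with open("EurLex_all.csv", newline="", encoding="utf-8") as f:
#     print(count_by_column(f).most_common())

# Tiny made-up sample standing in for the real data.
sample_lines = [
    "CELEX,Act_type\n",
    "A1,Regulation\n",
    "A2,Directive\n",
    "A3,Regulation\n",
    "A4,\n",
]
result = count_by_column(sample_lines)
print(result)
```

Empty cells are counted under "(missing)", which gives a quick view of the gaps mentioned in the caveats.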

  20. Data from: Elephant pathway use in a human-dominated landscape

    • search.dataone.org
    • datadryad.org
    Updated Jul 31, 2025
    Cite
    Lydia Natalie Tiller (2025). Elephant pathway use in a human-dominated landscape [Dataset]. http://doi.org/10.5061/dryad.ns1rn8q20
    Explore at:
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Lydia Natalie Tiller
    Description

    Habitat loss and fragmentation are among the biggest threats facing wildlife today. Understanding the role of wildlife pathways in connecting resource areas is key to maintaining landscape connectivity, reducing the impacts of habitat loss and helping address human-wildlife conflict. In this study, we used sign surveys and camera trapping to understand the fine scale movement of elephants moving between a protected area and agricultural zone in the Masai Mara, Kenya. We used Generalised Linear Models to determine factors driving high frequency of pathway use by elephants. Our results showed strong seasonal trends in pathway use, with peaks coinciding with the dry season. However, no correlations between rainfall and pathway use were found. Temporal patterns of pathway use indicate that elephants use risk avoidance strategies by moving between the two areas at times of low human disturbance. Spatial analysis revealed that the most frequently used pathways were closer to farms, saltlicks and for...

    We identified active pathways along the escarpment with the assistance of local rangers and farmers (Figure 2). We assumed pathways were in use if the path was devoid of vegetation (Blake and Inkamba-Nkulu, 2004), marked with elephant dung or footprints and showed signs of elephant browsing on the bordering vegetation (Von Gerhardt et al., 2014). Pathways that did not show any of these signs were not included in this study. We then mapped each pathway using a Garmin Etrek30 Global Positioning System (GPS). The GPS track was taken from the bottom of the escarpment on the border of the Masai Mara to the top of the escarpment. The end of the pathway was determined by the point at which the pathway widened and became open habitat. Habitat type was also recorded on each pathway using a classification system from Kindt et al. (2011). As each pathway went through a number of different habitats, we used a GPS to record the co-ordinate at which there was a change in habitat type. To determine s...

    # Elephant pathway use in a human-dominated landscape

    https://doi.org/10.5061/dryad.ns1rn8q20

    Data includes the final clean Excel sheets containing all the variable data that was imported into R for analysis. This data was used for Spearman’s Rank Correlation tests, a linear model and descriptive statistics.

    Description of the data and file structure

    The files 'SURVEY A_results' and 'SURVEY B_results' are Excel spreadsheets with a summary of the camera trap images from the pathways. Each row is one camera trap image with the processed data of the date, time, photo label, elephant group type, number of elephants and whether the elephants were traveling up or down the pathway.

    The file 'Data_Analysis_1' is an Excel spreadsheet that has all the data used in the paper's models. This dataset has the different pathway use variables that were tested, for example distance to farmland and slope.

    The file 'conflict' is an Excel spreadsheet wit...

Cite
Shiva Vashishtha (2022). Dirty Excel Data [Dataset]. https://www.kaggle.com/datasets/shivavashishtha/dirty-excel-data

Dirty Excel Data

Coca Cola Dirty Excel Format Data to practice Data Cleaning Skills

Explore at:
Available download formats: zip (13123 bytes)
Dataset updated
Feb 23, 2022
Authors
Shiva Vashishtha
Description

Dataset

This dataset was created by Shiva Vashishtha

Contents
