90 datasets found

Dirty Excel Data
kaggle.com
zip
Updated Feb 23, 2022
Shiva Vashishtha (2022). Dirty Excel Data [Dataset]. https://www.kaggle.com/datasets/shivavashishtha/dirty-excel-data
Available download formats: zip (13123 bytes)
Dataset updated
Feb 23, 2022
Authors
Shiva Vashishtha
Description
Dataset

This dataset was created by Shiva Vashishtha

Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio
qubeshub.org
Updated Jul 16, 2020
Shelly Gaynor (2020). Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio [Dataset]. http://doi.org/10.25334/DRGD-F069
Unique identifier
https://doi.org/10.25334/DRGD-F069
Dataset updated
Jul 16, 2020
Dataset provided by
QUBES
Authors
Shelly Gaynor
Description
Access and clean an open source herbarium dataset using Excel or RStudio.
Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop
borealisdata.ca
Updated Jul 19, 2024
Lucia Costanzo; Vivek Jadon (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
Unique identifier
https://doi.org/10.5683/SP3/FF6AI9
Dataset updated
Jul 19, 2024
Dataset provided by
Borealis
Authors
Lucia Costanzo; Vivek Jadon
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Canada
Description
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.
FIFa21 Messy Dataset cleaned and transformed
kaggle.com
zip
Updated Feb 26, 2024
Nicolas Mora Hansen (2024). FIFa21 Messy Dataset cleaned and transformed [Dataset]. https://www.kaggle.com/datasets/nicolasmorahansen/fifa21-messy-dataset-cleaned-and-transformed
Available download formats: zip (5473572 bytes)
Dataset updated
Feb 26, 2024
Authors
Nicolas Mora Hansen
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
FIFA21: Data Cleaning and Transformation

EA Sports FIFA 21 is a popular video game that simulates football matches. Data collected from this game is often messy, containing inconsistencies, missing values and various formatting issues.

For this project, I will clean, organize and prepare the messy FIFA_21 data for analysis using only Excel. Although this could be done somewhat faster in Python, R or another programming language, the challenge here is to use Excel.

Observations (rows) = 18980

1. Spot blank values with 'COUNTBLANK'.

The 'Loan Date End' column has 17966 blanks.

2. Spot zero values with 'COUNTIF'.

=COUNTIF(A1:A18980; "=0")

The 'Value', 'Wage', 'Release Clause' and 'Hits' columns contain '0' values.

3. Column headers: replace spaces with underscores.

=SUBSTITUTE(A1; " "; "_")

Unique attributes (columns) = 76
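For readers who want to cross-check these three profiling steps outside Excel, they can be sketched in Python. This is a hypothetical equivalent, not part of the original workbook: `count_blank`, `count_zero` and `normalize_headers` are illustrative names, and the blank check treats `None` and empty strings as blank, which only approximates COUNTBLANK.

```python
def count_blank(values):
    """Count cells that are empty (None) or hold an empty string, roughly like COUNTBLANK."""
    return sum(1 for v in values if v is None or v == "")


def count_zero(values):
    """Count numeric cells equal to 0, roughly like COUNTIF(range; "=0")."""
    return sum(
        1
        for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool) and v == 0
    )


def normalize_headers(headers):
    """Replace spaces with underscores, like SUBSTITUTE(A1; " "; "_")."""
    return [h.replace(" ", "_") for h in headers]


# Small illustrative column, not taken from the FIFA 21 data.
loan_date_end = [None, "", "2021-06-30", None]
print(count_blank(loan_date_end))                         # 3
print(count_zero([0, 0.0, "0", 5, None]))                 # 2
print(normalize_headers(["Loan Date End", "Release Clause"]))
```

Text such as "0" is deliberately not counted as zero here; whether the spreadsheet stores those cells as numbers or text is an assumption worth checking.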

1.'Height'

At first glance, the 'Height' column looked like it needed only a simple formula to turn strings ending in 'cm' into numbers expressing height in centimeters. However, some values were given in feet and inches instead, written with an apostrophe (') for feet and a double quote (") for inches, which called for a more intricate formula to read and convert every value. The 'IF' formula first checks whether the string contains 'cm'. If it does, 'cm' is removed and the number is kept. Otherwise, the digits before the apostrophe (feet) are multiplied by 12 to convert them to inches, the digits between the apostrophe and the double quote (inches) are added, and the total in inches is multiplied by 2.54 and rounded to whole centimeters.

=IF(ISNUMBER(FIND("cm";$O2)); VALUE(SUBSTITUTE($O2; "cm"; "")); ROUND((LEFT($O2; FIND("'"; $O2) - 1) * 12 + MID($O2; FIND("'"; $O2) + 1; FIND(""""; $O2) - FIND("'"; $O2) - 1)) * 2,54;0))
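The same conversion can be sketched in Python to check individual values. This minimal sketch assumes heights are either whole-number centimeter strings such as `170cm` or feet-and-inches strings such as `5'9"`; `height_to_cm` is an illustrative name. Like the Excel formula, it multiplies feet by 12, adds the inches and converts the total to centimeters.

```python
import re

# Feet-and-inches values such as 5'9": digits, apostrophe (feet), digits, double quote (inches).
FEET_INCHES = re.compile(r"^\s*(\d+)'\s*(\d+)\"\s*$")


def height_to_cm(raw: str) -> int:
    """Convert a height like '170cm' or 5'9" to whole centimeters.

    Assumption: centimeter values are whole numbers.
    """
    text = raw.strip()
    if text.endswith("cm"):
        return int(text[:-2])
    match = FEET_INCHES.match(text)
    if match is None:
        raise ValueError(f"unrecognised height: {raw!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    total_inches = feet * 12 + inches
    # 1 inch = 2.54 cm. Work in hundredths of a centimeter (254 per inch) so that
    # halves round up exactly, like Excel's ROUND(x; 0) for positive values.
    return (total_inches * 254 + 50) // 100


print(height_to_cm("170cm"))  # 170
print(height_to_cm("5'9\""))  # 175
```

Integer arithmetic is used instead of `round()` because Python rounds halves to even, while Excel's ROUND rounds them away from zero.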

2.'Weight'

Weight was recorded in both 'kg' and 'lbs'. For 'kg' values, the unit is removed and the text converted to a number. For 'lbs' values, the number is divided by 2.205 to convert pounds to kilograms. The result is rounded to zero decimal places, and values with neither unit return 0.

=ROUND(IF(ISNUMBER(FIND("kg";$P2));VALUE(SUBSTITUTE($P2;"kg";""))*1;IF(ISNUMBER(FIND("lbs";$P2));VALUE(SUBSTITUTE($P2;"lbs";""))/2,205;0));0)
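A Python version of the weight rule makes the unit handling explicit. It is a sketch that assumes values end in `kg` or `lbs` with no other trailing text; like the Excel formula, it divides pounds by 2.205 and returns 0 when neither unit is found. `weight_to_kg` is an illustrative name.

```python
import math


def weight_to_kg(raw: str) -> int:
    """Convert '72kg' or '159lbs' to whole kilograms; anything else becomes 0, as in the formula."""
    text = raw.strip()
    if text.endswith("kg"):
        kg = float(text[:-2])
    elif text.endswith("lbs"):
        kg = float(text[:-3]) / 2.205  # same pounds-per-kilogram factor as the Excel formula
    else:
        kg = 0.0
    # Round half up to a whole number, matching ROUND(x; 0) for non-negative values.
    return math.floor(kg + 0.5)


print(weight_to_kg("72kg"))    # 72
print(weight_to_kg("159lbs"))  # 72
```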

3.'Joined'

A new column named 'WithClub10Years' is added to the right of 'Joined'. It flags players whose joining year is at least 10 years before the current year, as a proxy for having been at the same club for a minimum of 10 years.

=IF(YEAR(NOW())-YEAR(T2)>=10; "10 Years"; "")
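Because the formula compares calendar years only, a player who joined in late December can be flagged almost a year early. The sketch below reproduces the formula and, for comparison, adds a stricter variant that counts completed years; the variant is an alternative, not part of the original project. Both functions take the reference date as a parameter instead of calling `NOW()`, which keeps them testable.

```python
from datetime import date


def with_club_10_years(joined: date, today: date) -> str:
    """Mirror of =IF(YEAR(NOW())-YEAR(T2)>=10; "10 Years"; ""): compares calendar years only."""
    return "10 Years" if today.year - joined.year >= 10 else ""


def with_club_10_full_years(joined: date, today: date) -> str:
    """Stricter variant (not in the original): requires 10 completed years."""
    before_anniversary = (today.month, today.day) < (joined.month, joined.day)
    years = today.year - joined.year - int(before_anniversary)
    return "10 Years" if years >= 10 else ""


joined = date(2010, 12, 31)
check_date = date(2020, 1, 1)
print(repr(with_club_10_years(joined, check_date)))       # '10 Years'
print(repr(with_club_10_full_years(joined, check_date)))  # ''
```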

4.'Value', 'Wage', 'Release Clause'

The monetary figures were converted to plain numeric values; all values are in euros. The 'M' and 'K' suffixes were removed and the remaining figure multiplied by 1,000,000 or 1,000 to express millions and thousands respectively. The decimal delimiter was changed from '.' to ',' so that the values could be used in calculations in this comma-decimal Excel locale.

=IF(ISNUMBER(FIND("M"; Z2)); VALUE(SUBSTITUTE(Z2; "M"; ""))*1000000; IF(ISNUMBER(FIND("K"; Z2)); VALUE(SUBSTITUTE(Z2; "K"; ""))*1000; Z2*1))
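The suffix rule can also be sketched in Python. This assumes values look like `€110.5M`, `€565K` or a bare number; the leading euro sign is an assumption about the raw data, since the Excel formula does not strip it. Python uses '.' as its decimal separator, so no locale change is needed. Unlike the Excel formula, which searches for 'M' or 'K' anywhere in the text, this sketch only checks the last character. `money_to_number` is an illustrative name.

```python
def money_to_number(raw: str) -> float:
    """Convert strings like '€110.5M' or '€565K' to plain euro amounts."""
    # Assumption: values may start with a euro sign, which is stripped here.
    text = raw.strip().lstrip("€")
    if text.endswith("M"):
        return float(text[:-1]) * 1_000_000
    if text.endswith("K"):
        return float(text[:-1]) * 1_000
    return float(text)


print(money_to_number("€110.5M"))  # 110500000.0
print(money_to_number("€565K"))    # 565000.0
```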

5.'W/F', 'SM', 'IR'

These columns contained star symbols. The stars were removed and the remaining digit converted to a number; VALUE is needed because LEFT returns text.

=VALUE(LEFT(BO2; 1))
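Taking the first character assumes a single-digit rating. A slightly more defensive sketch in Python keeps every digit and ignores the star symbol; `stars_to_int` is an illustrative name, and the `'4 ★'` input format is an assumption about how the raw values look.

```python
def stars_to_int(raw: str) -> int:
    """Keep only the digits of a rating such as '4 ★' and return them as an integer."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    if not digits:
        raise ValueError(f"no digits in rating: {raw!r}")
    return int(digits)


print(stars_to_int("4 ★"))  # 4
print(stars_to_int("3★"))   # 3
```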

Conclusion

The clean dataset is now ready for further analysis, such as exploring player statistics, team performance or other insights that can provide a deeper understanding of the FIFA 21 game.
Dirty Dataset to practice Data Cleaning
kaggle.com
zip
Updated Nov 3, 2023
Amrutha yenikonda (2023). Dirty Dataset to practice Data Cleaning [Dataset]. https://www.kaggle.com/datasets/amruthayenikonda/dirty-dataset-to-practice-data-cleaning
Available download formats: zip (1241 bytes)
Dataset updated
Nov 3, 2023
Authors
Amrutha yenikonda
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
The dataset has been obtained by web scraping a Wikipedia page for which code is linked below: https://www.kaggle.com/amruthayenikonda/simple-web-scraping-using-pandas

This dataset can be used to practice data cleaning and manipulation, for example dropping unwanted columns, handling null values and removing symbols.
Clean,excel Imports in India from Italy
volza.com
csv
Updated Jun 3, 2026
Volza FZ LLC (2026). Clean,excel Imports in India from Italy [Dataset]. https://www.volza.com/imports-india/india-import-data-of-clean-excel-from-italy
Available download formats: csv
Dataset updated
Jun 3, 2026
Dataset authored and provided by
Volza FZ LLC
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Italy, India
Variables measured
Count of importers, Sum of import value, 2014-01-01/2021-09-30, Count of import shipments
Description
Analyze 950 Clean,excel import shipments to India from Italy till Mar-26. Import data includes Buyers, Suppliers, Pricing, Qty & Contacts.
Global Clean Excel export import trade data, buyers & suppliers
volza.com
csv
Updated Sep 3, 2026
Volza FZ LLC (2026). Global Clean Excel export import trade data, buyers & suppliers [Dataset]. https://www.volza.com/trade-data-global/global-exporters-importers-export-import-data-of-clean+excel
Available download formats: csv
Dataset updated
Sep 3, 2026
Dataset authored and provided by
Volza FZ LLC
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Variables measured
Count of exporters, Count of importers, Count of shipments, Sum of export import value
Description
Discover new and profitable Clean Excel buyers and suppliers. Access 2,289 export-import shipment records till Dec-25, covering 52 importers and 33 exporters.
Data-analysis-EXCEL-POWER-BI
kaggle.com
zip
Updated Jul 27, 2023
Ahmed Samir (2023). Data-analysis-EXCEL-POWER-BI [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/data-analysis-excel-power-bi/discussion
Available download formats: zip (3235955 bytes)
Dataset updated
Jul 27, 2023
Authors
Ahmed Samir
Description
In the beginning, the case consisted only of raw company data that did not provide any useful information to decision-makers. After revenues and expenses had been collected over a number of months, the company needed answers to several questions in order to make important decisions based on data rather than intuition.

The questions:

About revenue and expenses
- What are the total sales and profit for the whole period? How many products were sold in total? What is the net profit?
- In which month was the highest percentage of revenue achieved? Within that month, which day had the largest revenue?
- In which month was the highest percentage of expenses incurred? Within that month, which day had the largest expenses?
- How much did expenditures change each month? What is the percentage change in net profit over the months?

About distribution
- How many products were sold each month in the largest state?
- Which were the top 3 states by products purchased during the two years?

Comparison
- Sales by sales method
- Sales of men's vs. women's products
- Profit by retailer

What I did:
- Understood the data.
- Preprocessed and cleaned the data, fixing problems such as missing data and incorrect data types.
- Queried the data and made calculations such as "COGS" with Power Query in Excel.
- Built a data model and created measures with Power Pivot in Excel.
- After processing and preparation, built pivot tables to answer the questions.
- Finally, built a Power BI dashboard to visualize the results.
Cleaned NHANES 1988-2018
figshare.com
txt
Updated Feb 18, 2025
Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet (2025). Cleaned NHANES 1988-2018 [Dataset]. http://doi.org/10.6084/m9.figshare.21743372.v9
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.21743372.v9
Dataset updated
Feb 18, 2025
Dataset provided by
figshare
Figshare (http://figshare.com/)
Authors
Vy Nguyen; Lauren Y. M. Middleton; Neil Zhao; Lei Huang; Eliseu Verly; Jacob Kvasnicka; Luke Sagers; Chirag Patel; Justin Colacino; Olivier Jolliet
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The National Health and Nutrition Examination Survey (NHANES) provides data with considerable potential for studying the health and environmental exposures of the non-institutionalized US population. However, because NHANES data are plagued with multiple inconsistencies, the data must be processed before new insights can be derived through large-scale analyses. We therefore developed a set of curated and unified datasets by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous NHANES (1999-2018), totaling 135,310 participants and 5,078 variables. The variables convey demographics (281 variables), dietary consumption (324 variables), physiological functions (1,040 variables), occupation (61 variables), questionnaires (1,444 variables, e.g., physical activity, medical conditions, diabetes, reproductive health, blood pressure and cholesterol, early childhood), medications (29 variables), mortality information linked from the National Death Index (15 variables), survey weights (857 variables), environmental exposure biomarker measurements (598 variables), and chemical comments indicating which measurements are below or above the lower limit of detection (505 variables).

csv Data Record: The curated NHANES datasets and the data dictionaries comprise 23 .csv files and 1 Excel file. The curated NHANES datasets consist of 20 .csv files, two for each module: an uncleaned version and a cleaned version. The modules are: 1) mortality, 2) dietary, 3) demographics, 4) response, 5) medications, 6) questionnaire, 7) chemicals, 8) occupation, 9) weights, and 10) comments. "dictionary_nhanes.csv" is a dictionary that lists the variable name, description, module, category, units, CAS Number, comment use, chemical family, chemical family shortened, number of measurements, and cycles available for all 5,078 variables in NHANES. "dictionary_harmonized_categories.csv" contains the harmonized categories for the categorical variables. "dictionary_drug_codes.csv" contains the dictionary of descriptors for the drug codes. "nhanes_inconsistencies_documentation.xlsx" is an Excel file containing the cleaning documentation, which records all inconsistencies for all affected variables to help curate each NHANES module.

R Data Record: For researchers who want to conduct their analysis in the R programming language, only the cleaned NHANES modules and the data dictionaries are provided, as a .zip file that includes an .RData file and an .R file. "w - nhanes_1988_2018.RData" contains all the aforementioned datasets as R data objects. We make available all R scripts containing the customized functions that were written to curate the data. "m - nhanes_1988_2018.R" shows how we used the customized functions (i.e., our pipeline) to curate the original NHANES data.

Example starter code: The starter code to help users conduct exposome analyses consists of four R Markdown files (.Rmd). We recommend going through the tutorials in order. "example_0 - merge_datasets_together.Rmd" demonstrates how to merge the curated NHANES datasets together. "example_1 - account_for_nhanes_design.Rmd" demonstrates how to fit a linear regression model, a survey-weighted regression model, a Cox proportional hazards model, and a survey-weighted Cox proportional hazards model. "example_2 - calculate_summary_statistics.Rmd" demonstrates how to calculate summary statistics for one variable and for multiple variables, with and without accounting for the NHANES sampling design. "example_3 - run_multiple_regressions.Rmd" demonstrates how to run multiple regression models with and without adjusting for the sampling design.
Data from: Designing data science workshops for data-intensive environmental...
data.niaid.nih.gov
datasetcatalog.nlm.nih.gov
zip
Updated Dec 8, 2020
Allison Theobold; Stacey Hancock; Sara Mannheimer (2020). Designing data science workshops for data-intensive environmental science research [Dataset]. http://doi.org/10.5061/dryad.7wm37pvp7
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.7wm37pvp7
Dataset updated
Dec 8, 2020
Dataset provided by
California State Polytechnic University
Montana State University
Authors
Allison Theobold; Stacey Hancock; Sara Mannheimer
License
https://spdx.org/licenses/CC0-1.0.html
Description
Over the last 20 years, statistics preparation has become vital for a broad range of scientific fields, and statistics coursework has been readily incorporated into undergraduate and graduate programs. However, a gap remains between the computational skills taught in statistics service courses and those required for the use of statistics in scientific research. Ten years after the publication of "Computing in the Statistics Curriculum," the nature of statistics continues to change, and computing skills are more necessary than ever for modern scientific researchers. In this paper, we describe research on the design and implementation of a suite of data science workshops for environmental science graduate students, providing students with the skills necessary to retrieve, view, wrangle, visualize, and analyze their data using reproducible tools. These workshops help to bridge the gap between the computing skills necessary for scientific research and the computing skills with which students leave their statistics service courses. Moreover, though targeted to environmental science graduate students, these workshops are open to the larger academic community. As such, they promote the continued learning of the computational tools necessary for working with data, and provide resources for incorporating data science into the classroom.

Methods

Surveys from Carpentries-style workshops, the results of which are presented in the accompanying manuscript.

Pre- and post-workshop surveys for each workshop (Introduction to R, Intermediate R, Data Wrangling in R, Data Visualization in R) were collected via Google Form.

The surveys administered for the fall 2018, spring 2019 academic year are included as pre_workshop_survey and post_workshop_assessment PDF files. The raw versions of these data are included in the Excel files ending in survey_raw or assessment_raw. The data files whose name includes survey contain raw data from pre-workshop surveys and the data files whose name includes assessment contain raw data from the post-workshop assessment survey. The annotated RMarkdown files used to clean the pre-workshop surveys and post-workshop assessments are included as workshop_survey_cleaning and workshop_assessment_cleaning, respectively. The cleaned pre- and post-workshop survey data are included in the Excel files ending in clean. The summaries and visualizations presented in the manuscript are included in the analysis annotated RMarkdown file.
Students Results Analysis using Microsoft Excel
kaggle.com
zip
Updated Oct 17, 2025
OIE (2025). Students Results Analysis using Microsoft Excel [Dataset]. https://www.kaggle.com/datasets/emmyofh/students-results-analysis-using-microsoft-excel
Available download formats: zip (31469 bytes)
Dataset updated
Oct 17, 2025
Authors
OIE
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
This dataset was created to evaluate students’ performance in the most recent school examination. The goal is to help the school administration understand overall academic achievement, examine score distribution across grades, and identify student groups that may need additional academic support to improve learning outcomes.

The dataset provides detailed student result records, including subjects, scores, grades, and performance categories. It serves as a practical resource for educators, analysts, and data learners who wish to explore educational data using Excel or data analytics tools.

Tool Used: Microsoft Excel Spreadsheet

Data Frame Process: This analysis followed the Google Data Analytics data-phase approach, which involves:

Ask: Define the key questions and objectives

Prepare: Organize and clean the student result data

Process: Perform calculations and structure the data in Excel

Analyze: Evaluate performance trends and identify weak areas

Share: Present findings using tables, charts, and summaries

Act: Provide actionable recommendations to improve student outcomes
Sales data analysis using MS Excel
kaggle.com
zip
Updated May 8, 2024
Yerzat Tursunkulov (2024). Sales data analysis using MS Excel [Dataset]. https://www.kaggle.com/datasets/yerzattursunkulov/sales-data-analysis-using-ms-excel
Available download formats: zip (31983063 bytes)
Dataset updated
May 8, 2024
Authors
Yerzat Tursunkulov
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
The Orders database contains information on the following variables:
• Continuous variables: Row ID, Order ID, Order Date, Ship Date, Customer ID, Product ID, Sales, Quantity, Discount, Profit, Shipping Cost
• Categorical variables: Ship Mode, Customer Name, Segment, Postal Code, City, State, Country, Region, Market, Category, Subcategory, Product Name, Order Priority

The purposes of this project are:
1. To use descriptive statistics to assess sales performance across segments, markets, product categories and subcategories;
2. To use diagnostic analytics to understand the statistical significance of the factors that influence sales;
3. To use predictive analytics (regression) to understand the strength of the relationship between sales and sales drivers, and to derive a regression formula to predict sales;
4. To develop a sales forecasting model based on these insights.

Descriptive analytics

Descriptive statistics for sales:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F848f47b38b7f2360163bb2221703c658%2FPicture2.png?generation=1715109635788424&alt=media

Frequency distribution for sales: around 44,500 transactions of value >= USD 500.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F39cfd8ffd8fdf296300bb9f1fa5243e2%2FPicture3.png?generation=1715109667755923&alt=media

Sales values across markets:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F3385959d11b6daafae24c848b4b00f13%2FPicture4.png?generation=1715109744629587&alt=media

Sales increased across all markets throughout 2012-2015. Sales volumes are highest in the USCA and LATAM markets:
• USCA: USD 757,108 in 2015;
• LATAM: USD 706,632 in 2015.

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F4aa59b5a5b980aad6873c8a4af4cd223%2FPicture1.png?generation=1715109770510368&alt=media

Sales across product categories:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F867cbe622bf94d25a25a1c4b9281656d%2FPicture5.png?generation=1715109794950614&alt=media

Office Supplies was the best-selling product category in 2012-2015, and Technology the least sold by quantity. However, the Technology category still yields high sales.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F5c74664f77cce2bc2f7c77c7b01e9890%2FPicture6.png?generation=1715109834309500&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2Fd3bb766183e9f58fbf009a998c01adf6%2FPicture7.png?generation=1715109872961254&alt=media

Further analysis of profitable products shows that phones and copiers have high sales.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F109c4c3eab81fa581c19a5c09beff839%2FPicture9.png?generation=1715109914590660&alt=media

Sales across segments: the data show high sales in the Consumer segment across all product categories.
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F65075cc20028a37a1aff6932fa89d3d5%2FPicture10.png?generation=1715109992655572&alt=media

Diagnostic analytics

Two-sample t-test

A t-test lets us evaluate how sales differ across segments, regions and product types, and whether those differences are statistically significant. The two-sample t-test of sales across markets showed statistically significant results for sales in the USCA and LATAM markets (p-values < 0.05).
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F7b7264d5f44a9a79b352028b28d1c618%2FPicture11.png?generation=1715110082746375&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F4061ef38ea83d7e3bbd252a802863e8f%2FPicture12.png?generation=1715110097203251&alt=media

The two-sample t-test of sales across product categories showed statistically significant results for sales in the Office Supplies and Technology categories (p-values < 0.05).
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2Fd9994377d605222d77ef67af3e273771%2FPicture13.png?generation=1715110126112322&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F20744393%2F669779e9aad19d51a28fb44e7c484bc7%2FPicture14.png?generation=1715110140543290&alt=media
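A two-sample t-test compares the mean sales of two groups. The sketch below computes Welch's t statistic (which does not assume equal variances) and its Welch-Satterthwaite degrees of freedom with the Python standard library; it is not necessarily the test variant used in the original workbook. The p-value then comes from the t distribution, for example with Excel's `T.TEST(array1, array2, 2, 3)` (two-tailed, unequal variances) or `scipy.stats.ttest_ind(a, b, equal_var=False)`; a difference is significant at the 5% level when p < 0.05.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom for samples a and b."""
    se2_a = variance(a) / len(a)  # squared standard error of the mean of a (sample variance, n - 1)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Illustrative numbers only, not the sales data.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))  # -1.897 5.882
```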

Pearson correlation

The correlation of continuous values in the dataset shows the relationship between sales, quantity sold, shipping costs and profit. ![](https://www.googleapis.com/download/sto...
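Pearson's correlation coefficient r, used in the correlation step above, divides the covariance of two variables by the product of their standard deviations. Excel computes it with `CORREL` or `PEARSON`; the standard-library sketch below shows the same calculation on illustrative numbers, not on the sales data. `pearson` is an illustrative name.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Illustrative values: quantity sold vs. sales amount.
print(pearson([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```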
Data from: Survey data from the Australian Marine Debris Initiative
research.usc.edu.au
csv
Heidi Tait; Jodi Jones; Caitlin Smith; Kathy Townsend, Survey data from the Australian Marine Debris Initiative [Dataset]. https://research.usc.edu.au/esploro/outputs/dataset/Survey-data-from-the-Australian-Marine/991016398702621
Available download formats: csv (7054018 bytes)
Dataset provided by
University of the Sunshine Coast
Authors
Heidi Tait; Jodi Jones; Caitlin Smith; Kathy Townsend
Time period covered
2024
Description
Survey data from the Australian Marine Debris Initiative and the results of spatial analysis from multiple Creative Commons datasets. The data consist of:
• Spatial data, Queensland coastline: event summaries within an Excel data table and shapefile
• All years
• Number of items removed, weight, volunteers, volume, distance, latitude and longitude
• Contributing organisation files table/sites
• Environmental, physical and biological variables associated with the closest catchment to each debris survey

TBF has made all reasonable efforts to ensure that the information in the Custom Dataset is accurate. TBF will not be held responsible:
• for the way these data are used by the Entity for their Reports;
• for any errors that may be contained in the Custom Dataset; or
• for any direct or indirect damage the use of the Custom Dataset may cause.

Data collected by TBF comes from citizen science initiatives and is taken at face value from contributors, with each entry vetted and periodic checks made to maintain the integrity of the overall dataset. Some clean-up data has been extrapolated by data collectors. Some weight and distance details have not been provided by contributors. The data was collected by various organisations and individuals in clean-up events at locations of their choosing, where man-made items larger than 5 mm were removed from the beach, sorted, counted and recorded on data sheets, on devices running CyberTracker software, or in the AMDI mobile application. Items were identified according to the method laid out in the TBF Marine Debris Identification Manual, in which items are grouped by material category (the manual is available on the TBF website). The length of beach cleaned is at the discretion of the clean-up group, and the total weight of items removed is either measured with handheld scales or estimated.
Power BI Sample Data
kaggle.com
zip
Updated Oct 20, 2022
Shwetank Chaudhary (2022). Power BI Sample Data [Dataset]. https://www.kaggle.com/datasets/shwetankchaudhary/power-bi-sample-data
Available download formats: zip (73587 bytes)
Dataset updated
Oct 20, 2022
Authors
Shwetank Chaudhary
Description
This is a dataset of financial data that is also available in Power BI for practice. Use it to practice Power BI.
Data for: Is visual motivation for cleaning surfaces in the kitchen...
data.mendeley.com
Updated Mar 31, 2020
Trond Moretro (2020). Data for: Is visual motivation for cleaning surfaces in the kitchen consistent with a hygienically clean environment? [Dataset]. http://doi.org/10.17632/62js8885bn.1
Unique identifier
https://doi.org/10.17632/62js8885bn.1
Dataset updated
Mar 31, 2020
Authors
Trond Moretro
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The dataset consists of:
1. soildatato sharing: Excel table showing visual detection (1) or no detection (0) by 15 consumers of three types of food soils on cutting boards or countertops.
2. visualdetection: Excel table showing data for 13 consumers performing visual detection (on a scale from clean = 1 to dirty = 4) of kitchen surfaces, and for swabs taken from kitchen surfaces.
3. survivalpathogensdrysoil: Excel table showing the fate of Salmonella, Campylobacter and total counts when dried in 3 types of food soils and in water.
Data set: St. Louis River Watershed, MN Conductivity Assessment March 2022
catalog.data.gov
datasets.ai
Updated Jul 18, 2025
U.S. EPA Office of Research and Development (ORD) (2025). Data set: St. Louis River Watershed, MN Conductivity Assessment March 2022 [Dataset]. https://catalog.data.gov/dataset/data-set-st-louis-river-watershed-mn-conductivity-assessment-march-2022
Dataset updated
Jul 18, 2025
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Area covered
Minnesota, Saint Louis River
Description
Data used by the USEPA Office of Research and Development to evaluate potential downstream impacts of the NorthMet Mine, provided for USEPA Region 5's use. This includes a characterization of stream specific conductivity (SC) levels, least-disturbed background SC, and SC levels that may exceed the Fond du Lac Band's water quality (WQ) standards and adversely affect aquatic life, including brook trout (Salvelinus fontinalis), lake sturgeon (Acipenser fulvescens), and benthic macroinvertebrates.

Keywords: Conductivity, St. Louis River, benthic invertebrates; mining

The attached Excel Pedigree includes:
_Datasets: Data file uploaded to EPA Science Hub and/or Environmental Data Set Gateway
_R: Clean R scripts used to generate document figures and tables
_Tables_Figures: Files generated from R script and used in the Region 5 memo
20220325 R Code and Data: All additional files used for this project, including original files, intermediate files, extra output files, and extra functions.

The "_R" folder contains four subfolders. Each subfolder has several R scripts, input and output files, and an R project file. Users can run the R scripts directly from each subfolder after installing R, RStudio, and the associated R packages.

Data Dictionary: See the DataDictionary tab in the Excel file.

Datasets: Simplified language is used in the text to identify parent data sets. Source and file names are retained in this pedigree in their original form so that the R scripts remain functional.
• Thingvold et al. (1975-1977)
• Griffith (1998-2009)
• Predicted background (2000-2015)
• Water Quality Portal (1996-2021)
• Water Quality Portal Less Disturbed (1996-2021)
• Minnesota Pollution Control Agency (MPCA) (1996-2013)
• Mid-Atlantic Highlands (1990 to 2014)

This dataset is associated with the following publication: Cormier, S., and Y. Wang. Appendix C: ORD Specific Conductance Memo, from Susan Cormier to Tera Fong. March 15, 2022. Assessment of effects of increased ion concentrations in the St. Louis River Watershed with special attention to potential mining influence and the jurisdiction of the Fond du Lac Band of Lake Superior Chippewa. U.S. Environmental Protection Agency, Washington, DC, USA, 2022.
KAP WASH 2018 in South Sudan's Ajuong Thok and Pamir Camps - South Sudan
microdata.worldbank.org
datacatalog.ihsn.org
Updated Sep 21, 2021
Cite
Samaritan's Purse (2021). KAP WASH 2018 in South Sudan's Ajuong Thok and Pamir Camps - South Sudan [Dataset]. https://microdata.worldbank.org/catalog/3891
Dataset updated
Sep 21, 2021
Dataset provided by
United Nations High Commissioner for Refugees (http://www.unhcr.org/)
Samaritan's Purse
Time period covered
2018
Area covered
South Sudan
Description
Abstract

A Knowledge, Attitudes, and Practices (KAP) survey was conducted in Ajuong Thok and Pamir Refugee Camps in November 2018 to determine the current Water, Sanitation, and Hygiene (WASH) conditions as well as hygiene attitudes and practices within the households (HHs) surveyed. The assessment utilized a systematic random sampling method, and a total of 1,040 HHs (520 HHs in each location) were surveyed using mobile data collection (MDC) within a period of 10 days. Data was cleaned and analyzed in Excel. The summary of the results is presented in this report.

The findings showed that the overall average amount of water available was 21 liters per person per day in both Ajuong Thok and Pamir Camps. This is slightly above the minimum standard of the Office of the United Nations High Commissioner for Refugees (UNHCR), which is at least 20 liters per person per day. It is also a slight improvement on the 19.5 liters reported the previous year. The average HH size was six people. Women made up 83.2% of the surveyed respondents and men 16.8%. Almost all respondents (99.6%) were refugees. The refugees were aware of the key health and hygiene practices. This may be a result of routine health and hygiene messages delivered to them by Samaritan's Purse (SP), Africa Humanitarian Action (AHA) and the International Rescue Committee (IRC). Most refugees knew about keeping water containers clean, washing hands at critical times, safe excreta disposal and disease prevention.
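The 21-liter figure is a per-person ratio, and the abstract does not say how it was averaged across households. The minimal Python sketch below shows two common averaging choices and checks the result against the UNHCR 20-liter minimum. It assumes each household record holds the liters of water available per day and the household size. These inputs are hypothetical and are not the survey's actual field names.

```python
from statistics import mean

# UNHCR minimum cited in the abstract: at least 20 liters per person per day.
UNHCR_MIN_LPPD = 20.0


def liters_per_person(households):
    """Per-household liters per person per day from (liters_per_day, size) pairs."""
    return [liters / size for liters, size in households if size > 0]


def summarize_water_access(households, minimum=UNHCR_MIN_LPPD):
    """Compare two averaging choices against the minimum standard.

    mean_of_households: average of each household's own per-person figure
    pooled: total liters divided by total people across all households
    """
    valid = [(liters, size) for liters, size in households if size > 0]
    mean_of_households = mean(liters_per_person(valid))
    pooled = sum(l for l, _ in valid) / sum(s for _, s in valid)
    return {
        "mean_of_households": mean_of_households,
        "pooled": pooled,
        "meets_minimum": mean_of_households >= minimum,
    }


# Hypothetical toy households, not survey data: (liters per day, household size).
example = [(120, 6), (60, 2), (160, 8)]
print(summarize_water_access(example))
```

With these toy households, the mean of the household ratios is about 23.3 liters, while the pooled ratio is 21.25 liters. The averaging choice therefore matters when a result is compared against a fixed threshold.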

Geographic coverage

Ajuong Thok and Pamir Refugee Camps

Analysis unit

Households

Universe

All households in Ajuong Thok and Pamir Refugee Camps

Kind of data

Sample survey data [ssd]

Sampling procedure

Households were selected using systematic random sampling. Enumerators walked systematically through each row in each block of the camps, so that every HH had a chance to be selected. In each block, the enumerators began at one corner and went row by row, using the sampling interval (SI) to select HHs. The first HH sampled in each block was determined by selecting a random number between 1 and the SI (6 in Ajuong Thok and 7 in Pamir). After the first HH was selected, the SI was used to identify the next respondent HH. The female head of household was the preferred respondent. If she was not available, another adult (over 15 years of age) with knowledge of the HH's WASH practices was surveyed. If no one qualified to answer the survey, the HH was replaced systematically using the SI.
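The selection rule above can be sketched in Python as a minimal illustration. The sketch assumes each block's households are available as a list in the enumerators' walking order. The block IDs, household IDs and seed are hypothetical. The replacement rule for households with no eligible respondent is not modeled.

```python
import random


def systematic_sample(households, interval, rng=None):
    """Pick every `interval`-th household after a random start between 1 and `interval`.

    `households` must be listed in the order enumerators walk the block
    (starting at one corner, then row by row).
    """
    if interval < 1:
        raise ValueError("sampling interval must be >= 1")
    rng = rng or random.Random()
    start = rng.randint(1, interval)  # 1-based random start, inclusive on both ends
    return households[start - 1::interval]  # 0-based slice: start, start + SI, ...


def sample_camp(blocks, interval, seed=None):
    """Draw a separate random start within each block, using one shared RNG."""
    rng = random.Random(seed)
    return {block_id: systematic_sample(hhs, interval, rng) for block_id, hhs in blocks.items()}


# Hypothetical block layout: household IDs in walking order.
blocks = {
    "A1": [f"A1-{i:03d}" for i in range(1, 43)],
    "A2": [f"A2-{i:03d}" for i in range(1, 37)],
}
print(sample_camp(blocks, interval=6, seed=2018))  # SI of 6, as reported for Ajuong Thok
```

Each block gets its own random start in the range 1 to SI, mirroring the per-block procedure described above. A seeded `random.Random` makes a draw reproducible for auditing.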

Mode of data collection

Face-to-face [f2f]

Research instrument

The survey questionnaire used to collect the data consists of the following sections:
- Demographics
- Water
- Sanitation
- Hygiene
- NFI (non-food item) Distribution

Cleaning operations

The data collected were uploaded to a server at the end of each day. iFormBuilder generated a Microsoft (MS) Excel spreadsheet dataset, which was then cleaned and analyzed using MS Excel.

Given that SP is currently implementing a WASH program in Ajuong Thok and Pamir, the assessment data collected in these camps will not only serve as the endline for UNHCR 2018 programming but also as the baseline for 2019 programming.

Data was anonymized through decoding and local suppression.
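Local suppression replaces selected values with missing ones in records whose combination of identifying variables (quasi-identifiers) is rare enough to single out a household. The simplified Python sketch below illustrates the idea. Its quasi-identifiers (camp, household size, respondent sex) and threshold k are hypothetical. It does not model the "decoding" step. A single pass like this also does not guarantee k-anonymity afterwards. Dedicated tools such as the R package sdcMicro choose which values to suppress more carefully.

```python
from collections import Counter


def local_suppression(records, quasi_identifiers, suppress_field, k=2):
    """Blank out `suppress_field` in records whose quasi-identifier combination is rare.

    A record is rare when fewer than `k` records share its exact combination
    of quasi-identifier values. Returns new dicts; the inputs are not modified.
    """
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    frequencies = Counter(key(r) for r in records)
    suppressed = []
    for record in records:
        copy = dict(record)
        if frequencies[key(record)] < k:
            copy[suppress_field] = None  # None marks a locally suppressed value
        suppressed.append(copy)
    return suppressed


# Hypothetical records, not survey data.
records = [
    {"camp": "Ajuong Thok", "hh_size": 6, "sex": "F"},
    {"camp": "Ajuong Thok", "hh_size": 6, "sex": "F"},
    {"camp": "Pamir", "hh_size": 14, "sex": "M"},
]
# The Pamir record is unique on all three variables, so its hh_size becomes None.
print(local_suppression(records, ("camp", "hh_size", "sex"), "hh_size", k=2))
```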
The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and...
dataverse.harvard.edu
csv, pdf, tsv
Updated Jun 2, 2020
Cite
Harvard Dataverse (2020). The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and 22 variables [Dataset]. http://doi.org/10.7910/DVN/0EGYWY
Explore at:
tsv(119723405), csv(1019978404), csv(248865834), pdf(136562), csv(1585521237), csv(289564219), tsv(75055125), csv(445965588), tsv(25746986), csv(481548943), tsv(3663564), tsv(50375826)Available download formats
Unique identifier
https://doi.org/10.7910/DVN/0EGYWY
Dataset updated
Jun 2, 2020
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
1952 - 2019
Area covered
European Union
Dataset funded by
European Union-
Description
The CEPS EurLex dataset contains 142,036 EU laws, almost the entire corpus of the EU's digitally available legal acts passed between 1952 and 2019. It covers the three types of legally binding acts passed by the EU institutions, in English: 102,304 regulations, 4,070 directives and 35,798 decisions. The dataset was scraped from the official EU legal database (Eur-lex.eu) and transformed into machine-readable CSV format with the programming languages R and Python. It was collected by the Centre for European Policy Studies (CEPS) for the TRIGGER project (https://trigger-project.eu/). We hope that it will facilitate future quantitative and computational research on the EU.

Brief description:
- The dataset is organised in tabular format, with each law representing one row and the columns representing 23 variables.
- The full text of 134,633 laws is included (column "act_raw_text"). For newer laws, the text was scraped from the HTML pages on Eur-lex.eu. For older laws, it was extracted from (scanned) PDF documents, where these were available in English.
- 22 additional variables are included, such as 'Act_name', 'Act_type', 'Subject_matter', 'Authors', 'Date_document', 'ELI_link' and 'CELEX' (a unique identifier for every law). See the "CEPS_EurLex_codebook.pdf" file for an explanation of all variables.
- Given its size, the dataset was uploaded in several batches to facilitate use. Some Excel files are provided for non-technical users. We recommend the CSV files, however, since Excel cannot reliably handle data of this size. EurLex_all.csv is the master file containing all data.

Caveats:
- The Eur-lex.eu website does not consistently provide data for all variables. In addition, the HTML documents were not always cleanly formatted, and text extracted from scanned PDFs is not entirely clean. Some data points are therefore missing for some laws, and some laws were excluded entirely.
- Not all (older) laws were available in English, especially since Ireland and the UK joined the European Communities only in 1973. Non-English laws are excluded from the dataset.

Other:
- For details on the types of EU legal acts: https://ec.europa.eu/info/law/law-making-process/types-eu-law_en
- An example of an experimental analysis with this dataset: https://trigger-project.eu/2019/10/28/a-data-science-approach-to-eu-differentiated-integration/
- The TRIGGER project is funded by the EU's Horizon 2020 programme, grant number 822735.
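Because the authors recommend the CSV files over Excel, a reader may want to summarize EurLex_all.csv without loading it all at once. The minimal sketch below does this with Python's standard csv module. It assumes the file is comma-delimited and UTF-8 encoded, with an 'Act_type' header spelled as in the description. Check the codebook before relying on these assumptions.

```python
import csv
from collections import Counter

# Full-text cells ("act_raw_text") can exceed the csv module's default field
# limit of 131,072 characters, so raise it. 2**31 - 1 fits a C long on all platforms.
csv.field_size_limit(2**31 - 1)


def count_by_column(lines, column="Act_type"):
    """Stream CSV rows (any iterable of text lines) and tally the values of one column."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


def count_file(path, column="Act_type"):
    """Open a CSV file and tally one column without loading the whole file into memory."""
    # newline="" lets the csv module handle line breaks inside quoted full-text cells.
    with open(path, newline="", encoding="utf-8") as handle:
        return count_by_column(handle, column)
```

For example, `count_file("EurLex_all.csv").most_common()` would list act types by frequency. Streaming row by row keeps memory use roughly constant, whereas spreadsheet tools cap both row counts and cell length.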
Data from: Elephant pathway use in a human-dominated landscape
search.dataone.org
datadryad.org
Updated Jul 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Lydia Natalie Tiller (2025). Elephant pathway use in a human-dominated landscape [Dataset]. http://doi.org/10.5061/dryad.ns1rn8q20
Unique identifier
https://doi.org/10.5061/dryad.ns1rn8q20
Dataset updated
Jul 31, 2025
Dataset provided by
Dryad Digital Repository
Authors
Lydia Natalie Tiller
Description
Habitat loss and fragmentation are among the biggest threats facing wildlife today. Understanding the role of wildlife pathways in connecting resource areas is key to maintaining landscape connectivity, reducing the impacts of habitat loss and helping to address human-wildlife conflict. In this study, we used sign surveys and camera trapping to understand the fine-scale movement of elephants moving between a protected area and an agricultural zone in the Masai Mara, Kenya. We used Generalised Linear Models to determine the factors driving high-frequency pathway use by elephants. Our results showed strong seasonal trends in pathway use, with peaks coinciding with the dry season. However, no correlations between rainfall and pathway use were found. Temporal patterns of pathway use indicate that elephants use risk-avoidance strategies, moving between the two areas at times of low human disturbance. Spatial analysis revealed that the most frequently used pathways were closer to farms, saltlicks and for...

We identified active pathways along the escarpment with the assistance of local rangers and farmers (Figure 2). We assumed pathways were in use if the path was devoid of vegetation (Blake and Inkamba-Nkulu, 2004), was marked with elephant dung or footprints, and showed signs of elephant browsing on the bordering vegetation (Von Gerhardt et al., 2014). Pathways that did not show any of these signs were not included in this study. We then mapped each pathway using a Garmin eTrex 30 Global Positioning System (GPS) unit. The GPS track was taken from the bottom of the escarpment, on the border of the Masai Mara, to the top of the escarpment. The end of the pathway was determined by the point at which the pathway widened into open habitat. Habitat type was also recorded on each pathway using the classification system of Kindt et al. (2011). As each pathway passed through several different habitats, we used the GPS to record the coordinate at which the habitat type changed. To determine s...

# Elephant pathway use in a human-dominated landscape

https://doi.org/10.5061/dryad.ns1rn8q20

Data includes the final clean Excel sheets containing all the variable data that was imported into R for analysis. This data was used for Spearman's Rank Correlation tests, a linear model and descriptive statistics.
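Spearman's rank correlation is the Pearson correlation computed on ranks, with tied values sharing the average of their ranks. The study ran its tests in R, where `cor.test(x, y, method = "spearman")` is the usual call. The plain-Python sketch below only illustrates the coefficient itself, without a p-value. The pathway values in the example are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pathway values, not data from the study.
distance_to_farm_m = [120, 300, 80, 500, 260]
passages_per_month = [14, 6, 20, 2, 7]
print(spearman_rho(distance_to_farm_m, passages_per_month))  # → -1.0
```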

Description of the data and file structure

The files 'SURVEY A_results' and 'SURVEY B_results' are Excel spreadsheets with a summary of the camera trap images from the pathways. Each row is one camera trap image with the processed data of the date, time, photo label, elephant group type, number of elephants and whether the elephants were traveling up or down the pathway.

The file 'Data_Analysis_1' is an Excel spreadsheet that has all the data used in the paper's models. This dataset has the different pathway use variables that were tested, for example distance to farmland and slope.

The file 'conflict' is an Excel spreadsheet wit...
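The 'SURVEY A_results' and 'SURVEY B_results' sheets hold one row per camera trap image. This layout lends itself to the kind of temporal summary the abstract describes. The minimal Python sketch below assumes the rows have been exported with hypothetical keys for time, direction and elephant count; the real column names are not given here.

```python
from collections import Counter
from datetime import datetime


def elephants_by_hour(rows):
    """Total elephants per (hour of day, direction) from per-image rows.

    Each row is a dict with hypothetical keys:
      "time": "HH:MM" string from the camera trap image
      "direction": "up" or "down" the pathway
      "n_elephants": number of elephants in the image
    """
    totals = Counter()
    for row in rows:
        hour = datetime.strptime(row["time"], "%H:%M").hour
        totals[(hour, row["direction"])] += int(row["n_elephants"])
    return totals


# Hypothetical rows, not data from the study.
rows = [
    {"time": "02:15", "direction": "up", "n_elephants": 3},
    {"time": "02:50", "direction": "up", "n_elephants": 2},
    {"time": "14:05", "direction": "down", "n_elephants": 1},
]
for (hour, direction), n in sorted(elephants_by_hour(rows).items()):
    print(f"{hour:02d}:00 {direction}: {n}")
```

Summing per image can count the same group more than once if several consecutive images capture it. A real analysis would need a rule for what counts as an independent event.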


Dirty Excel Data

Coca Cola Dirty Excel Format Data to practice Data Cleaning Skills


Dirty Excel Data

Dataset

Contents

Data Cleaning Sample

Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio

Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop

FIFa21 Messy Dataset cleaned and transformed

FIFA21 - Data Transformation Cleaning and Transformation

1. Spot blank values ('COUNTBLANK')

2. Spot 'zero' values ('COUNTIF')

3. Column headers

   1. 'Height'

   2. 'Weight'

   3. 'Joined'

   4. 'Value', 'Wage', 'Release Clause'

   5. 'W/F', 'SM', 'IR'
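The outline above names Excel steps (COUNTBLANK, COUNTIF) and columns to clean, and a rough Python equivalent is sketched below. The input formats are assumptions about this messy dataset and are not confirmed by the listing:
- heights as '170cm' or 5'9"
- weights as '72kg' or '159lbs'
- money values such as '€103.5M' or '€600K'

The 'Joined', 'W/F', 'SM' and 'IR' columns are not covered.

```python
import re


def count_blank(values):
    """Rough analogue of Excel COUNTBLANK: count missing (None) or empty-string cells."""
    return sum(1 for v in values if v is None or v == "")


def count_zero(values):
    """Rough analogue of COUNTIF(range, 0): count numeric zeros (booleans excluded)."""
    return sum(
        1 for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool) and v == 0
    )


def height_to_cm(raw):
    """Normalise heights written as '170cm' or 5'9" (assumed formats) to centimetres."""
    text = raw.strip()
    metric = re.fullmatch(r"(\d+(?:\.\d+)?)\s*cm", text)
    if metric:
        return float(metric.group(1))
    imperial = re.fullmatch(r"(\d+)'(\d+)\"?", text)
    if imperial:
        inches = int(imperial.group(1)) * 12 + int(imperial.group(2))
        return round(inches * 2.54, 1)  # 1 inch = 2.54 cm exactly
    raise ValueError(f"unrecognised height: {raw!r}")


def weight_to_kg(raw):
    """Normalise weights written as '72kg' or '159lbs' (assumed formats) to kilograms."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(kg|lbs)", raw.strip().lower())
    if not match:
        raise ValueError(f"unrecognised weight: {raw!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value if unit == "kg" else round(value * 0.45359237, 1)  # 1 lb = 0.45359237 kg


def money_to_number(raw):
    """Convert strings such as '€103.5M' or '€600K' (assumed formats) to plain numbers."""
    text = raw.strip().lstrip("€")
    multipliers = {"K": 1_000, "M": 1_000_000}
    if text and text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)
```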

Dirty Dataset to practice Data Cleaning

Clean,excel Imports in India from Italy

Global Clean Excel export import trade data, buyers & suppliers

Data-analysis-EXCEL-POWER-BI

Cleaned NHANES 1988-2018

Data from: Designing data science workshops for data-intensive environmental...

Students Results Analysis using Microsoft Excel

Sales data analysis using MS Excel

Data from: Survey data from the Australian Marine Debris Initiative

Power BI Sample Data

Data for: Is visual motivation for cleaning surfaces in the kitchen...

Data set: St. Louis River Watershed, MN Conductivity Assessment March 2022

KAP WASH 2018 in South Sudan's Ajuong Thok and Pamir Camps - South Sudan

Abstract

Geographic coverage

Analysis unit

Universe

Kind of data

Sampling procedure

Mode of data collection

Research instrument

Cleaning operations

The CEPS EurLex dataset: 142.036 EU laws from 1952-2019 with full text and...

Data from: Elephant pathway use in a human-dominated landscape

Description of the data and file structure
