45 datasets found

Data Cleaning Sample
borealisdata.ca
dataone.org
Updated Jul 13, 2023
Rong Luo (2023). Data Cleaning Sample [Dataset]. http://doi.org/10.5683/SP3/ZCN177
Unique identifier
https://doi.org/10.5683/SP3/ZCN177
Dataset updated
Jul 13, 2023
Dataset provided by
Borealis
Authors
Rong Luo
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Sample data for exercises in Further Adventures in Data Cleaning.
Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio
qubeshub.org
Updated Jul 16, 2020
Shelly Gaynor (2020). Cleaning Biodiversity Data: A Botanical Example Using Excel or RStudio [Dataset]. http://doi.org/10.25334/DRGD-F069
Unique identifier
https://doi.org/10.25334/DRGD-F069
Dataset updated
Jul 16, 2020
Dataset provided by
QUBES
Authors
Shelly Gaynor
Description
Access and clean an open source herbarium dataset using Excel or RStudio.
Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop
borealisdata.ca
search.dataone.org
Updated Jul 19, 2024
Lucia Costanzo; Vivek Jadon (2024). Navigating Stats Can Data & Scrubbing Data Clean with Excel Workshop [Dataset]. http://doi.org/10.5683/SP3/FF6AI9
Unique identifier
https://doi.org/10.5683/SP3/FF6AI9
Dataset updated
Jul 19, 2024
Dataset provided by
Borealis
Authors
Lucia Costanzo; Vivek Jadon
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Canada
Description
Ahoy, data enthusiasts! Join us for a hands-on workshop where you will hoist your sails and navigate through the Statistics Canada website, uncovering hidden treasures in the form of data tables. With the wind at your back, you’ll master the art of downloading these invaluable Stats Can datasets while braving the occasional squall of data cleaning challenges using Excel with your trusty captains Vivek and Lucia at the helm.
Sales and workload in retail industry
kaggle.com
Updated Dec 12, 2019
Dennis Gluesenkamp (2019). Sales and workload in retail industry [Dataset]. https://www.kaggle.com/dgluesen/sales-and-workload-data-from-retail-industry/code
Dataset updated
Dec 12, 2019
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Dennis Gluesenkamp
Description
Context

Raw data from real analytical use cases across many industries and companies is frequently provided in Excel form. Such files usually cannot be fed directly into machine learning models but must first be cleaned and preprocessed, a process in which many kinds of pitfalls can occur. This makes data preprocessing a major time cost in a data scientist's daily work.

This dataset presents an Excel spreadsheet that closely follows a real case but contains only simulated figures, to protect the underlying data and business results. The form and structure of the file correspond to a real case, as a data scientist might encounter it in a company. Such a file can be the result of an export from a financial controlling system such as SAP.

Content

The data includes the number of goods sold (product units), the associated turnover, and hours worked, grouped by month, store, and department of the retailer. It also provides the sales area of each department and the opening hours of each store.

Possible objectives

The following goals of data cleansing might be addressed:

Import the Excel-file

Inspect the dataset

Check data types and do meaningful modifications

Handle missing values and data gaps

Find and solve data inconsistencies

Rename columns for improved usage

Join tables to a single one

Furthermore, the data can be investigated with regard to correlations between different features and/or a regression model.
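Several of the cleaning objectives listed above (renaming columns, checking data types, handling gaps, resolving inconsistencies, joining tables) can be sketched in plain Python. This is a minimal sketch, not the dataset author's solution: it assumes the sheet has been exported to CSV, and the header names in `RENAME`, the helper names, and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical header names: the actual spreadsheet's labels will differ.
RENAME = {"Store No.": "store", "Month": "month", "Turnover": "turnover", "Hours": "hours"}


def to_float(value):
    """Parse a numeric cell, returning None for blanks or non-numeric text.

    Assumes "," is a thousands separator (e.g. "1,200.50").
    """
    try:
        return float(value.replace(",", ""))
    except (AttributeError, ValueError):
        return None


def clean_rows(raw_csv):
    """Rename columns, convert types and blank out inconsistent values."""
    rows = []
    for raw in csv.DictReader(io.StringIO(raw_csv)):
        row = {RENAME.get(key, key): (value or "").strip() for key, value in raw.items()}
        row["turnover"] = to_float(row["turnover"])
        row["hours"] = to_float(row["hours"])
        # Negative working hours are impossible: treat them as missing.
        if row["hours"] is not None and row["hours"] < 0:
            row["hours"] = None
        rows.append(row)
    return rows


def join_store_info(rows, store_info):
    """Left-join per-store attributes (e.g. opening hours) onto each row."""
    return [{**row, **store_info.get(row["store"], {})} for row in rows]


sample = 'Store No.,Month,Turnover,Hours\n1,2019-01,"1,200.50",-3\n2,2019-01,,40\n'
rows = join_store_info(clean_rows(sample), {"1": {"opening_hours": 60}})
```

Invalid values are set to `None` rather than dropped, so the row count is preserved and gaps can be imputed or excluded in a later, explicit step.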

License

GNU General Public License v3.0 - https://www.gnu.org/licenses/gpl-3.0.en.html
Data-analysis-EXCEL-POWER-BI
kaggle.com
Updated Jul 27, 2023
Ahmed Samir (2023). Data-analysis-EXCEL-POWER-BI [Dataset]. https://www.kaggle.com/datasets/ahmedsamir11111/data-analysis-excel-power-bi/discussion
Dataset updated
Jul 27, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Ahmed Samir
Description
At the outset, the company's data was raw and offered no useful information to decision-makers. After revenues and expenses had been collected over several months, the company needed answers to a number of questions in order to make important decisions based on data rather than intuition.

The questions:

About revenue and expenses
- What are the total sales and profit for the whole period? How many products were sold in total? What is the net profit?
- In which month was the highest percentage of revenue achieved? Within that month, which day had the highest revenue?
- In which month was the highest percentage of expenses incurred? Within that month, which day had the highest expenses?
- How much did expenditures change from month to month? What is the percentage change in net profit over the months?

About distribution
- How many products were sold each month in the largest state?
- Which were the top 3 states by products purchased over the two years?

Comparisons
- Sales by sales method
- Sales of men's versus women's products
- Profit by retailer

What I did:
- Explored the data to understand it.
- Preprocessed and cleaned the data, fixing problems such as missing values and incorrect data types.
- Queried the data and computed derived fields such as "COGS" with Power Query in Excel.
- Built a data model and defined measures with Power Pivot in Excel.
- After processing and preparation, created pivot tables to answer the questions.
- Finally, built a Power BI dashboard to visualize the results.
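Two of the questions above, net profit and its month-over-month percentage change, reduce to simple arithmetic. The sketch below shows that arithmetic in Python rather than the Power Query/Power Pivot steps the author used; the function names and monthly figures are hypothetical.

```python
def net_profit(revenues, expenses):
    """Net profit per month: revenue minus expenses."""
    return [rev - exp for rev, exp in zip(revenues, expenses)]


def pct_change(series):
    """Month-over-month percentage change; None for the first month or a zero base."""
    if not series:
        return []
    changes = [None]
    for prev, curr in zip(series, series[1:]):
        changes.append(None if prev == 0 else (curr - prev) * 100 / prev)
    return changes


# Hypothetical monthly figures, not taken from the dataset.
profit = net_profit([1000, 1200, 900], [600, 700, 700])
monthly_change = pct_change(profit)
```

Returning `None` for a zero base avoids a division error and keeps the output aligned with the input months.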
Household Income and Expenditure 2010 - Tuvalu
catalog.ihsn.org
Updated Mar 29, 2019
Central Statistics Division (2019). Household Income and Expenditure 2010 - Tuvalu [Dataset]. http://catalog.ihsn.org/catalog/3203
Dataset updated
Mar 29, 2019
Dataset authored and provided by
Central Statistics Division
Time period covered
2010
Area covered
Tuvalu
Description
Abstract

The main objectives of the survey were:
- To obtain weights for the revision of the Consumer Price Index (CPI) for Funafuti;
- To provide information on the nature and distribution of household income, expenditure and food consumption patterns;
- To provide data on the household sector's contribution to the National Accounts;
- To provide information on the economic activity of men and women to study gender issues;
- To undertake some poverty analysis.

Geographic coverage

National, including Funafuti and Outer islands

Analysis unit

Household

individual

Universe

All private households are included in the sampling frame. In each selected household, current residents are surveyed, as are usual residents who are currently away (for example, for work, health or holiday reasons, or boarding students). If the household had been residing in Tuvalu for less than one year:
- if it intends to reside for more than 12 months, the household is included;
- if it does not intend to reside for more than 12 months, it is out of scope.
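The scope rule above can be written as a small decision function. This is an illustrative sketch only: the `Household` fields are hypothetical names, not the survey's variables, and it encodes just the private-household and length-of-residence conditions stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical fields for illustration; not the survey's variable names.
    is_private: bool
    months_resident: int
    intends_to_stay_over_12_months: bool


def in_scope(household):
    """Apply the scope rule for the sampling frame described above."""
    if not household.is_private:
        return False
    if household.months_resident >= 12:
        return True
    # Resident for less than one year: include only if they intend to stay.
    return household.intends_to_stay_over_12_months
```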

Kind of data

Sample survey data [ssd]

Sampling procedure

It was decided that a 33% (one-third) sample was sufficient to achieve suitable levels of accuracy for key estimates in the survey. The sample was therefore spread proportionally across all islands except Niulakita, which was considered too small. For selection purposes, each island was treated as a separate stratum and independent samples were selected from each. The strategy was to list each dwelling on the island by its geographical position and run a systematic skip through the list to achieve the 33% sample. This approach ensured that the sample was spread out across each island as much as possible and was thus more representative.

For details please refer to Table 1.1 of the Report.
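The stratified systematic selection described above (one stratum per island, a skip of 3 through a geographically ordered dwelling list, Niulakita excluded) can be sketched as follows. The random start within the first interval is an assumption, since the text does not say how the starting dwelling was chosen; the island lists in the usage line are made up.

```python
import random


def systematic_sample(dwellings, interval=3, rng=random):
    """Take every `interval`-th dwelling from a geographically ordered list,
    starting at a random position within the first interval (assumed)."""
    if not dwellings:
        return []
    start = rng.randrange(min(interval, len(dwellings)))
    return dwellings[start::interval]


def stratified_systematic(frame, interval=3, excluded=("Niulakita",), seed=None):
    """Sample each island (stratum) independently, skipping excluded islands."""
    rng = random.Random(seed)
    return {
        island: systematic_sample(dwellings, interval, rng)
        for island, dwellings in frame.items()
        if island not in excluded
    }


# Hypothetical dwelling lists, already sorted by geographical position.
frame = {"Funafuti": list(range(30)), "Nukufetau": list(range(10)), "Niulakita": list(range(5))}
sample = stratified_systematic(frame, seed=1)
```

Because every third dwelling along the ordered list is taken, the selected dwellings are spread evenly across each island, which is the representativeness argument made in the text.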

Sampling deviation

Only the island of Niulakita was excluded from the sampling frame, as it was considered too small.

Mode of data collection

Face-to-face [f2f]

Research instrument

Three main survey forms were used to collect data. Each question was written in English and translated into Tuvaluan on the same version of the questionnaire. The questionnaires were designed based on the 2004 survey questionnaire.

HOUSEHOLD FORM
- composition of the household and demographic profile of each member
- dwelling information
- dwelling expenditure
- transport expenditure
- education expenditure
- health expenditure
- land and property expenditure
- household furnishings
- home appliances
- cultural and social payments
- holiday/travel costs
- loans and savings
- clothing
- other major expenditure items

INDIVIDUAL FORM
- health and education
- labor force (individuals aged 15 and above)
- employment activity and income (individuals aged 15 and above): wages and salaries, working own business, agriculture and livestock, fishing, income from handicraft, income from gambling, small-scale activities, jobs in the last 12 months, other income, children's income, tobacco and alcohol use, other activities, and seafarer

DIARY (one diary per week over a two-week period, so two diaries were required per household)
- all kinds of expenses
- home production
- food and drink (eaten by the household, given away, sold)
- goods taken from own business (consumed, given away)
- monetary gifts (given away, received, winnings from gambling)
- non-monetary gifts (given away, received, winnings from gambling)

Questionnaire Design Flaws

Questionnaire design flaws are problems with the way questions were worded that lead respondents to give incorrect answers. Despite every effort to minimize this problem during the design of the survey questionnaires and diaries, problems were still identified during the analysis of the data. Some examples are provided below:

Gifts, Remittances & Donations

Collecting information on the receipt and provision of gifts, the receipt and provision of remittances, and the provision of donations to the church, other communities and family occasions is a very difficult task in a HIES. The extent of these activities in Tuvalu is very high, so every effort should be made to capture them as well as possible. A key problem lies in identifying the best instrument (questionnaire or diary) for covering such activities. A general rule of thumb for a HIES is that if the activity occurs on a regular basis and involves small monetary amounts or in-kind gifts, the diary is more appropriate. On the other hand, if the activity is less frequent and involves larger sums of money, the questionnaire with a recall approach is preferred. It is not always easy to distinguish between the two for the different activities, so both the diary and the questionnaire were used to collect this information. Unfortunately, it probably wasn't made clear enough which types of transactions were being collected from each source, so some transactions might have been missed and others counted twice. The effects of these problems are hopefully minimal overall.

Defining Remittances

Because people interpret "remittances" differently, the questionnaire needs to be very clear about how the concept is defined in the survey. Unfortunately, this wasn't explained clearly enough, so it was difficult to distinguish between a remittance, which should be of a more regular nature, and a one-off monetary gift transferred between two households.

Business Expenses Still Recorded

The aim of the survey is to measure "household" expenditure, and as such, any expenditure made by a household for an item or service primarily used for a business activity should be excluded. It was not always clear in the questionnaire that this was the case, and as a result some business expenses were included. Efforts were made during data cleaning to remove any such business expenses that would significantly affect survey results.

Purchased goods given away as a gift

When a household gives away an item it has purchased as a gift, this is recorded in section 5 of the diary. Unfortunately, it was difficult to know how to treat these items, as it was not clear whether the item had already been recorded in section 1 of the diary, which covers purchases. The decision was made to exclude all gifts given that were considered to be purchases, as these items were assumed to have already been recorded in section 1. Ideally these items should be treated as purchased gifts given away, which in turn are not household consumption expenditure, but this was not possible.

Some key items missed in the Questionnaire

Although not a big issue, some key expenditure items were omitted from the questionnaire when it would have been best to collect them via this schedule. A key example is electric fans, which many households in Tuvalu own.

Cleaning operations

Consistency of the data:
- each questionnaire was checked by the supervisor during and after collection;
- before data entry, all questionnaires were coded;
- the CSPro data entry system included consistency checks that allowed the NSO staff to flag errors and correct them by imputation, estimating from their own knowledge (there was no time for double entry); there were 4 data entry operators;
- after data entry, outliers were identified in order to check their consistency.
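The text says outliers were identified after data entry but does not state the method. As an assumed stand-in, the sketch below flags values outside Tukey's fences (quartiles plus or minus 1.5 times the interquartile range) using Python's `statistics.quantiles`; consistent with the procedure described here, flagged values are only candidates to check against the original forms, not automatic deletions.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR).

    Requires at least two values; statistics.quantiles raises otherwise.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical weekly expenditure entries with one likely keying error.
suspects = flag_outliers([10, 12, 11, 13, 12, 11, 500])
```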

All data entry, including editing, edit checks and queries, was done using CSPro (Census Survey Processing System) with additional data editing and cleaning taking place in Excel.

The staff from the CSD was responsible for undertaking the coding and data entry, with assistance from an additional four temporary staff to help produce results in a more timely manner.

Although enumeration was not completed until mid-June, coding and data entry began as soon as forms were available from Funafuti, towards the end of March. Coding and data entry were then completed around the middle of July.

A visit from an SPC consultant then took place to undertake initial cleaning of the data, primarily addressing missing data items and missing schedules. Once the initial data cleaning was undertaken in CSPro, data was transferred to Excel where it was closely scrutinized to check that all responses were sensible. In the cases where unusual values were identified, original forms were consulted for these households and modifications made to the data if required.

Despite the best efforts being made to clean the data file in preparation for the analysis, no doubt errors will still exist in the data, due to its size and complexity. Having said this, they are not expected to have significant impacts on the survey results.

Under-Reporting and Incorrect Reporting as a Result of Poor Fieldwork Procedures

The most crucial stage of any survey activity, whether a population census or a survey such as a HIES, is the fieldwork. Intensive checking must take place in the field before survey forms are returned to the office for data processing. Unfortunately, it became evident during data cleaning that fieldwork wasn't checked as thoroughly as required, and as a result some unexpected values appeared in the questionnaires, as well as unusual results in the diaries. Efforts were made to identify the main issues with the greatest impact on final results, and this information was modified, using local knowledge, to a more reasonable answer when required.

Data Entry Errors

Data entry errors are always expected, but can be kept to a minimum with
Spreadsheet Processing Capabilities
nantucketai.com
csv, xlsx
Updated Sep 10, 2025
Anthropic (2025). Spreadsheet Processing Capabilities [Dataset]. https://www.nantucketai.com/claude-just-changed-how-we-do-spreadsheets-with-its-new-feature/
Available download formats: csv, xlsx
Dataset updated
Sep 10, 2025
Dataset authored and provided by
Anthropic
Description
Types of data processing Claude's Code Interpreter can handle

💄 Cosmetics & Skincare Product Sales Data (2022)

kaggle.com

Updated Jul 21, 2025

Atharva Soundankar (2025). 💄 Cosmetics & Skincare Product Sales Data (2022) [Dataset]. https://www.kaggle.com/datasets/atharvasoundankar/cosmetics-and-skincare-product-sales-data-2022

Dataset updated

Jul 21, 2025

Dataset provided by

Kaggle: http://kaggle.com/

Authors

Atharva Soundankar

License

Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

A high-quality, clean dataset simulating global cosmetics and skincare product sales between January and August 2022. This dataset mirrors real-world transactional data, making it perfect for data analysis, Excel training, visualization projects, and machine learning prototypes.

📁 Dataset Overview

Column Name	Description
`Sales Person`	Name of the salesperson responsible for the sale
`Country`	Country or region where the sale occurred
`Product`	Cosmetic or skincare product sold
`Date`	Date of the transaction (format: YYYY-MM-DD)
`Amount ($)`	Total revenue generated from the sale (USD)
`Boxes Shipped`	Number of product boxes shipped in the order

🧾 Sample Products

Hydrating Face Serum
Vitamin C Cream
Aloe Vera Gel
Charcoal Face Wash
SPF 50 Sunscreen
Niacinamide Toner
Anti-Aging Serum
Face Sheet Masks
Hair Repair Oil
Lip Balm Pack
Body Butter Cream
Salicylic Acid Cleanser

🌏 Countries Covered

India
USA
UK
Canada
Australia
New Zealand

📊 Quick Stats

Total Rows: 374
Date Range: Jan 1, 2022 – Aug 31, 2022
Revenue Range: Varies from ~$100 to ~$20,000 per order
Box Quantity Range: 10 – 500 boxes

🎯 Ideal For

Excel Practice (VLOOKUP, IF, AVERAGEIFS, INDEX-MATCH, etc.)
Pivot tables & data cleaning tasks
Power BI / Tableau dashboards
Sales trend forecasting
Exploratory Data Analysis (EDA)
Retail analytics & product demand modeling

📌 Suggested Projects & Questions

Which salesperson generated the highest revenue overall?
What’s the average amount per order in each country?
Which product was most frequently sold?
What month had the highest total boxes shipped?
Create a dashboard comparing revenue across countries.
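The first four suggested questions can be answered with a single pass over the rows. This sketch uses only the Python standard library and the column names documented in the table above; the file name in `load_rows` is up to the reader, and the tolerance for `$` signs and thousands separators in `Amount ($)` is an assumption about the export format.

```python
import csv
from collections import Counter, defaultdict


def load_rows(path):
    """Read the CSV export into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def parse_amount(text):
    # Assumes amounts may carry a "$" sign or thousands separators.
    return float(text.replace("$", "").replace(",", ""))


def summarize(rows):
    revenue_by_person = defaultdict(float)
    amounts_by_country = defaultdict(list)
    product_counts = Counter()
    boxes_by_month = defaultdict(int)
    for row in rows:
        amount = parse_amount(row["Amount ($)"])
        revenue_by_person[row["Sales Person"]] += amount
        amounts_by_country[row["Country"]].append(amount)
        product_counts[row["Product"]] += 1
        boxes_by_month[row["Date"][:7]] += int(row["Boxes Shipped"])  # "YYYY-MM"
    return {
        "top_salesperson": max(revenue_by_person, key=revenue_by_person.get),
        "avg_amount_by_country": {c: sum(a) / len(a) for c, a in amounts_by_country.items()},
        "most_frequent_product": product_counts.most_common(1)[0][0],
        "busiest_month_by_boxes": max(boxes_by_month, key=boxes_by_month.get),
    }
```

Ties are resolved by whichever key `max` or `most_common` encounters first, so a dashboard built on these results should show the full rankings rather than a single winner.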

✅ Clean Data Guarantee

✅ No missing/null values
✅ No duplicates
✅ Realistic values
✅ Globally relatable product categories
✅ Ready for ML, BI, and teaching use cases
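Claims like "no missing values" and "no duplicates" are cheap to verify before relying on them. The sketch below is a generic check over rows read as dicts (for example with `csv.DictReader`); the helper name is hypothetical, and it treats a row as a duplicate only when every cell matches an earlier row exactly.

```python
def check_clean(rows):
    """Return (indices of rows with blank cells, indices of exact duplicate rows)."""
    blanks = [
        i for i, row in enumerate(rows)
        if any(v is None or str(v).strip() == "" for v in row.values())
    ]
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the row
        if key in seen:
            duplicates.append(i)
        seen.add(key)
    return blanks, duplicates
```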

REACH Project: Gel Dot Audits Data
researchdatafinder.qut.edu.au
researchdata.edu.au
Updated Mar 28, 2019
Professor Adrian Barnett (2019). REACH Project: Gel Dot Audits Data [Dataset]. https://researchdatafinder.qut.edu.au/display/n23301
Dataset updated
Mar 28, 2019
Dataset provided by
Queensland University of Technology (QUT)
Authors
Professor Adrian Barnett
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
These data were collected prospectively for the REACH study, funded by NHMRC grant GNT1076006, in 11 Queensland hospitals between May 2016 and July 2017. This single dataset contains the complete records of the gel dot audits used to measure the effectiveness of cleaning practices throughout the REACH trial.

Hardware used to analyse data: DAZO® Fluorescent marking gel, Ultraviolet (UV) light torch and an iPad mini.

Software used to analyse data: iCombat software (provided by Visibility Solutions; set up according to site ward structure), R, Microsoft Excel.
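A simple way to summarise fluorescent gel dot audits is the percentage of marked dots that were removed by cleaning. The source does not say how the REACH audits were scored, so this sketch assumes each audit record is a hypothetical `(ward, dot_removed)` pair and reports a per-ward percentage.

```python
from collections import defaultdict


def thoroughness_by_ward(audits):
    """audits: iterable of (ward, dot_removed) pairs; returns % of dots removed per ward."""
    totals = defaultdict(int)
    removed = defaultdict(int)
    for ward, dot_removed in audits:
        totals[ward] += 1
        removed[ward] += bool(dot_removed)
    return {ward: 100 * removed[ward] / totals[ward] for ward in totals}
```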
Superstore Sales Analysis
kaggle.com
Updated Oct 21, 2023
Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis
Dataset updated
Oct 21, 2023
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Ali Reda Elblgihy
Description
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

1- Data Import and Transformation:

Gather and import relevant sales data from various sources into Excel.

Utilize Power Query to clean, transform, and structure the data for analysis.

Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

2- Data Quality Assessment:

Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.

Standardize data formats and ensure that all data is in a consistent, usable state.
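Standardizing formats often starts with dates that arrive in several layouts across sheets. In Python rather than Power Query, and assuming a hypothetical list of source formats, a normalizer to ISO 8601 might look like this:

```python
from datetime import datetime

# Hypothetical formats seen in the raw sheets; order matters for ambiguous
# inputs such as 03/04/2023, so list the convention the data actually uses first.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y", "%d %b %Y")


def standardize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Unparseable values come back as `None` so they can be reviewed instead of silently coerced.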

3- Calculating COGS:

Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.

Apply appropriate formulas and calculations to determine COGS accurately.
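The description names purchase price, shipping costs and additional expenses as COGS inputs but gives no exact formula. The sketch below assumes one plausible per-line formula (units times unit purchase price, plus the line's shipping and other direct costs) and uses `Decimal` to avoid floating-point rounding in money amounts; the function names are hypothetical.

```python
from decimal import Decimal


def line_cogs(quantity, unit_purchase_price, shipping_cost, other_costs=Decimal("0")):
    """COGS for one order line under an assumed formula:
    quantity * unit purchase price + shipping + other direct costs."""
    return quantity * unit_purchase_price + shipping_cost + other_costs


def gross_profit(sales, cogs):
    """Sales revenue minus cost of goods sold."""
    return sales - cogs
```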

4- Discount Analysis:

Analyze the discount values offered on products to understand their impact on sales and profitability.

Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.
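An "average discount percentage" can mean a simple mean of discount rates or a mean weighted by list price, and the two diverge when large discounts fall on expensive items. This sketch computes both, assuming hypothetical `(list_price, discount_rate)` pairs with the rate stored as a fraction such as 0.2.

```python
def discount_summary(lines):
    """lines: iterable of (list_price, discount_rate) with rate as a fraction, e.g. 0.2."""
    lines = list(lines)
    total_list = sum(price for price, _ in lines)
    total_discount = sum(price * rate for price, rate in lines)
    return {
        "avg_discount_pct": 100 * sum(rate for _, rate in lines) / len(lines),
        "weighted_discount_pct": 100 * total_discount / total_list,
        "total_discount_value": total_discount,
    }
```

With one undiscounted item at 100 and a 50% discount on an item at 200, the simple average is 25% while the price-weighted figure is about 33.3%.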

5- Sales Metrics:

Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.

Utilize Excel functions to compute these metrics and create visuals for better insights.

6- Visualization:

Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.

Visual representations can help identify trends, outliers, and patterns in the data.

7- Report Generation:

Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
Data from: Data cleaning and enrichment through data integration: networking...
search.dataone.org
datadryad.org
Updated Feb 25, 2025
Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar (2025). Data cleaning and enrichment through data integration: networking the Italian academia [Dataset]. http://doi.org/10.5061/dryad.wpzgmsbwj
Unique identifier
https://doi.org/10.5061/dryad.wpzgmsbwj
Dataset updated
Feb 25, 2025
Dataset provided by
Dryad Digital Repository
Authors
Irene Finocchi; Alessio Martino; Blerina Sinaimeri; Fariba Ranjbar
Description
We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. While linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. By resembling several features of social networks, our dataset can be profitably leveraged in experimental studies in the wide social network analytics domain as well as in more specific bibliometric contexts.

The proposed network is built starting from two distinct data sources:

- the entire dataset dump from Semantic Scholar (with particular emphasis on the authors and papers datasets);
- the entire list of Italian faculty members as maintained by Cineca (under appointment by the Italian Ministry of University and Research).

By means of a custom name-identity recognition algorithm (details are available in the accompanying paper published in Scientific Data), the names of the authors in the Semantic Scholar dataset have been mapped against the names contained in the Cineca dataset, and authors with no match (e.g., because they are not part of an Italian university) have been discarded. The remaining authors make up the nodes of the network, which have been enriched with node-related (i.e., author-related) attributes. In order to build the network edges, we leveraged the papers dataset from Semantic Scholar: specifically, any two authors are said to be connected if there is at least one pap...

# Data cleaning and enrichment through data integration: networking the Italian academia

https://doi.org/10.5061/dryad.wpzgmsbwj

Manuscript published in Scientific Data with DOI .

Description of the data and file structure

This repository contains two main data files:

edge_data_AGG.csv, the full network in comma-separated edge list format (this file contains mainly temporal co-authorship information);

Coauthorship_Network_AGG.graphml, the full network in GraphML format.

along with several supplementary data, listed below, useful only to build the network (i.e., for reproducibility only):

University-City-match.xlsx, an Excel file that maps the name of a university against the city where its respective headquarter is located;

Areas-SS-CINECA-match.xlsx, an Excel file that maps the research areas in Cineca against the research areas in Semantic Scholar.

Description of the main data files

The `Coauthorship_Networ...
Vaginal Pulse Amplitude Data Cleaning Guide
osf.io
Updated Oct 9, 2022
Kirstin Clephane; Tierney Lorenz (2022). Vaginal Pulse Amplitude Data Cleaning Guide [Dataset]. http://doi.org/10.17605/OSF.IO/T67QN
Unique identifier
https://doi.org/10.17605/OSF.IO/T67QN
Dataset updated
Oct 9, 2022
Dataset provided by
Center for Open Science: https://cos.io/
Authors
Kirstin Clephane; Tierney Lorenz
Description
This is a how-to guide for cleaning vaginal photoplethysmography (VPP) signal using AcqKnowledge (v. 5.0.5) and Excel (or equivalent database management software, like OpenOffice).
Data from: Skepticism in science and punitive attitudes
openicpsr.org
delimited
Updated May 4, 2025
Jason Rydberg; Luke DeZago (2025). Skepticism in science and punitive attitudes [Dataset]. http://doi.org/10.3886/E228541V1
Available download formats: delimited
Unique identifier
https://doi.org/10.3886/E228541V1
Dataset updated
May 4, 2025
Dataset provided by
University of Massachusetts Lowell
Authors
Jason Rydberg; Luke DeZago
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication materials for the manuscript "Skepticism in Science and Punitive Attitudes", published in the Journal of Criminal Justice. Note that the GSS repeated cross sections for 1972 to 2018 are too large to upload here, but they can be accessed from https://gss.norc.org/content/dam/gss/get-the-data/documents/spss/GSS_spss.zip

Included here are:
- A link to the repeated cross-sections data
- Each of the 3 wave panels (2006-2010; 2008-2012; 2010-2014)
- Replication R script for the repeated cross-sections cleaning and analysis
- Replication R script for the panel data cleaning and analysis
- An Excel spreadsheet with Uniform Crime Report data to merge to the cross sections
Tata Motors Sales Analysis (2021-2022)
kaggle.com
Updated Sep 15, 2023
numen_Vikrant (2023). Tata Motors Sales Analysis (2021-2022) [Dataset]. https://www.kaggle.com/datasets/numenvikrant/tata-motors-sales-analysis-2021-2022
Dataset updated
Sep 15, 2023
Dataset provided by
Kaggle
Authors
numen_Vikrant
Description
I'm excited to share my recent project where I dived deep into the world of data analysis to gain valuable insights into Tata Motors' sales data for the fiscal year 2021-2022. 📈

Project Highlights:

Data Processing and Cleaning: I meticulously cleaned and processed the dataset, ensuring accuracy and reliability in the analysis.

In-Depth Analysis: Through advanced analytical techniques, I uncovered patterns, trends, and key metrics within the data, helping to reveal critical business insights.

Data Visualization: I transformed the complex sales data into clear and insightful visual representations, making it easier for stakeholders to grasp the findings.

Interactive Dashboard: I designed an interactive dashboard that allows users to explore the data dynamically, facilitating a deeper understanding of the sales performance.

Findings: Tata Motors achieved 105% growth in sales and a 126% increase in profit compared with 2021.

This remarkable growth not only showcases the company's resilience but also the effectiveness of their strategies and operations. It's a testament to the hard work and dedication of the entire Tata Motors team.
Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum...
skyquestt.com
Updated Apr 16, 2024
SkyQuest Technology (2024). Cleaning Robot Market Size, Share, Growth Analysis, By Product(Vacuum Cleaning Robots, Floor Cleaning Robots, Window Cleaning Robots, Pool Cleaning Robots), By Application(Residential, Commercial, Industrial, and others), By Sales Channel(Online, Offline, and Others), By Region - Industry Forecast 2024-2031 [Dataset]. https://www.skyquestt.com/report/cleaning-robot-market
Dataset updated
Apr 16, 2024
Dataset authored and provided by
SkyQuest Technology
License
https://www.skyquestt.com/privacy/
Time period covered
2024 - 2031
Area covered
Global
Description
Global Cleaning Robot Market size was valued at USD 4.19 billion in 2022 and is poised to grow from USD 4.97 billion in 2023 to USD 12.81 billion by 2031, growing at a CAGR of 22.9% in the forecast period (2024-2031).
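Market-size projections like the one above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The helper below is generic, not SkyQuest's methodology; treating 2023 to 2031 as eight compounding years is an assumption.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


implied = cagr(4.97, 12.81, 8)  # USD billions, 2023 -> 2031
```

With the quoted endpoints the implied rate is roughly 0.126 (about 12.6% a year), noticeably below the stated 22.9%. The report may use a different base year or period, so the original report should be consulted before relying on either figure.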
Correspondence Metadata from the Digital Scholarly Edition of Edvard Munch's...
search.dataone.org
dataverse.azure.uit.no
Updated Sep 25, 2024
Cite
Rockenberger, Annika; Sjølie, Loke; Bøe, Hilde (2024). Correspondence Metadata from the Digital Scholarly Edition of Edvard Munch's Writings [Dataset]. http://doi.org/10.18710/TAFUSV
Unique identifier
https://doi.org/10.18710/TAFUSV
Dataset updated
Sep 25, 2024
Dataset provided by
DataverseNO
Authors
Rockenberger, Annika; Sjølie, Loke; Bøe, Hilde
Time period covered
Jan 1, 1874 - Jan 1, 1944
Description
The eMunch dataset contains correspondence metadata for 8,527 letters to and from the Norwegian painter Edvard Munch (1863-1944). The dataset is derived from the digital scholarly edition of Edvard Munch's Writings, eMunch.no, edited by Hilde Bøe, The Munch Museum, Oslo. It is part of the NorKorr (Norwegian Correspondences) project, which aims to collect metadata from all correspondence in the collections of Norwegian academic and cultural heritage institutions; the project website is hosted on GitHub. A Python script was developed to parse the XML files on eMunch.no and supplementary data files (an Excel spreadsheet with updated dates and a CSV file with GeoNames IDs for places) and to extract the following metadata: sender's name, receiver's name, place name, date, and letter ID in the scholarly edition. These metadata were then converted into the Correspondence Metadata Interchange Format (CMIF). The entire dataset has been integrated into CorrespSearch, the international search service for scholarly editions of letters hosted by the Berlin-Brandenburg Academy of Sciences.
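The conversion step described above can be illustrated with a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It is not the eMunch project's script: the input field names (`sender`, `receiver`, `place`, `geonames_id`, `date`, `url`) and the sample values are hypothetical, and the sketch emits only one simplified `correspDesc` element, omitting the TEI header and other parts a complete CMIF file requires.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def corresp_desc(letter: dict) -> ET.Element:
    """Build one simplified CMIF-style <correspDesc> from a flat metadata record.

    The record's keys are hypothetical; place, GeoNames ID and date are optional.
    """
    desc = ET.Element(tei("correspDesc"), {"ref": letter["url"]})

    sent = ET.SubElement(desc, tei("correspAction"), {"type": "sent"})
    ET.SubElement(sent, tei("persName")).text = letter["sender"]
    if letter.get("place"):
        place = ET.SubElement(sent, tei("placeName"))
        place.text = letter["place"]
        if letter.get("geonames_id"):
            # Link the place name to its GeoNames record.
            place.set("ref", f"https://www.geonames.org/{letter['geonames_id']}")
    if letter.get("date"):
        ET.SubElement(sent, tei("date"), {"when": letter["date"]})  # ISO 8601 date

    received = ET.SubElement(desc, tei("correspAction"), {"type": "received"})
    ET.SubElement(received, tei("persName")).text = letter["receiver"]
    return desc


# Hypothetical record, for illustration only.
record = {
    "url": "https://example.org/letter/1",
    "sender": "Sender Name",
    "receiver": "Edvard Munch",
    "place": "Oslo",
    "geonames_id": "0000000",
    "date": "1900-01-01",
}
print(ET.tostring(corresp_desc(record), encoding="unicode"))
```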
Dataset for "Cognitive behavioural therapy self-help intervention...
zenodo.org
data.niaid.nih.gov
Updated Jul 16, 2024
Chelsea Coumoundouros; Paul Farrand; Alexander Hamilton; Louise Von Essen; Robbert Sanderman; Joanne Woodford (2024). Dataset for "Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey" [Dataset]. http://doi.org/10.5281/zenodo.7104638
Unique identifier
https://doi.org/10.5281/zenodo.7104638
Dataset updated
Jul 16, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Chelsea Coumoundouros; Paul Farrand; Alexander Hamilton; Louise Von Essen; Robbert Sanderman; Joanne Woodford
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data and R code used to analyse the data for the publication Coumoundouros et al., "Cognitive behavioural therapy self-help intervention preferences among informal caregivers of adults with chronic kidney disease: an online cross-sectional survey", BMC Nephrology.

Summary of study

An online cross-sectional survey of informal caregivers (e.g. family and friends) of people living with chronic kidney disease in the United Kingdom. The study aimed to examine informal caregivers' cognitive behavioural therapy self-help intervention preferences and to describe the caregiving situation (e.g. types of care activities) and informal caregivers' mental health (depression, anxiety and stress symptoms).

Participants were eligible to participate if they were at least 18 years old, lived in the United Kingdom, and provided unpaid care to someone living with chronic kidney disease who was at least 18 years old.

The online survey included questions regarding (1) the informal caregiver's characteristics; (2) the care recipient's characteristics; (3) intervention preferences (e.g. content, delivery format); and (4) the informal caregiver's mental health. Informal caregivers' mental health was assessed using the 21-item Depression, Anxiety and Stress Scale (DASS-21), which is composed of three subscales measuring depression, anxiety and stress, respectively.
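DASS-21 subscale scores are conventionally computed by summing the seven items in each subscale (each rated 0 to 3) and multiplying by 2, so that they are comparable with the full 42-item version. A minimal sketch, assuming the standard item allocation shown below; the description does not state which scoring convention the authors used, and the item numbering should be checked against the study's KEY file before use.

```python
# Standard DASS-21 item allocation (item numbers 1-21); verify against the survey key.
SUBSCALES = {
    "depression": (3, 5, 10, 13, 16, 17, 21),
    "anxiety": (2, 4, 7, 9, 15, 19, 20),
    "stress": (1, 6, 8, 11, 12, 14, 18),
}


def score_dass21(responses: dict[int, float]) -> dict[str, float]:
    """Return doubled subscale sums for one participant.

    `responses` maps item number (1-21) to a 0-3 rating; all items must be present.
    """
    scores = {}
    for name, items in SUBSCALES.items():
        scores[name] = 2 * sum(responses[item] for item in items)
    return scores


# Every item rated 1 gives 7 items x 1 x 2 = 14 on each subscale.
print(score_dass21({item: 1 for item in range(1, 22)}))
```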

Sixty-five individuals participated in the survey.

See the published article for full study details.

Description of uploaded files

1. ENTWINE_ESR14_Kidney Carer Survey Data_FULL_2022-08-30: Excel file with the complete, raw survey data. Note: the first half of each participant's postal code was collected; however, these data were removed from the uploaded dataset to ensure participant anonymity.

2. ENTWINE_ESR14_Kidney Carer Survey Data_Clean DASS-21 Data_2022-08-30: Excel file with cleaned data for the DASS-21 scale. Data cleaning involved imputation of missing data if participants were missing data for one item within a subscale of the DASS-21. Missing values were imputed by finding the mean of all other items within the relevant subscale.

3. ENTWINE_ESR14_Kidney Carer Survey_KEY_2022-08-30: Excel file with a key linking item labels in the uploaded datasets to the corresponding survey questions.

4. R Code for Kidney Carer Survey_2022-08-30: R file of R code used to analyse survey data.

5. R code for Kidney Carer Survey_PDF_2022-08-30: PDF file of R code used to analyse survey data.
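The DASS-21 imputation rule described for file 2 can be sketched in Python as follows. Two assumptions are not stated in the description: the mean is taken over the same participant's remaining items in that subscale, and a subscale with two or more missing items is left unimputed. No rounding is applied.

```python
from statistics import mean
from typing import Optional


def impute_subscale(items: list[Optional[float]]) -> list[Optional[float]]:
    """Fill a single missing item with the mean of the other items in the subscale.

    `None` marks a missing response. If zero or more than one item is missing,
    a copy of the responses is returned unchanged.
    """
    missing = [i for i, value in enumerate(items) if value is None]
    if len(missing) != 1:
        return list(items)
    fill = mean([value for value in items if value is not None])
    return [fill if value is None else value for value in items]


print(impute_subscale([2, None, 2, 1, 3, 0, 4]))     # one missing item, filled with 2
print(impute_subscale([2, None, None, 1, 3, 0, 4]))  # two missing items, left as-is
```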
Climate Change: Earth Surface Temperature Data
kaggle.com
redivis.com
zip
Updated May 1, 2017
Berkeley Earth (2017). Climate Change: Earth Surface Temperature Data [Dataset]. https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
Available download formats: zip (88,843,537 bytes)
Dataset updated
May 1, 2017
Dataset authored and provided by
Berkeley Earth (http://berkeleyearth.org/)
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Area covered
Earth
Description
Some say climate change is the biggest threat of our age while others say it’s a myth based on dodgy science. We are turning some of the data over to you so you can form your own view.

Even more than with other data sets that Kaggle has featured, there’s a huge amount of data cleaning and preparation that goes into putting together a long-time study of climate trends. Early data was collected by technicians using mercury thermometers, where any variation in the visit time impacted measurements. In the 1940s, the construction of airports caused many weather stations to be moved. In the 1980s, there was a move to electronic thermometers that are said to have a cooling bias.

Given this complexity, a range of organizations collate climate trends data. The three most cited land and ocean temperature data sets are NOAA's MLOST, NASA's GISTEMP and the UK's HadCrut.

We have repackaged the data from a newer compilation put together by Berkeley Earth, which is affiliated with Lawrence Berkeley National Laboratory. The Berkeley Earth Surface Temperature Study combines 1.6 billion temperature reports from 16 pre-existing archives. It is nicely packaged and allows for slicing into interesting subsets (for example by country). They publish the source data and the code for the transformations they applied. They also use methods that allow weather observations from shorter time series to be included, meaning fewer observations need to be thrown away.

In this dataset, we have included several files:

Global Land and Ocean-and-Land Temperatures (GlobalTemperatures.csv):

Date: starts in 1750 for average land temperature and 1850 for max and min land temperatures and global ocean and land temperatures

LandAverageTemperature: global average land temperature in Celsius

LandAverageTemperatureUncertainty: the 95% confidence interval around the average

LandMaxTemperature: global average maximum land temperature in Celsius

LandMaxTemperatureUncertainty: the 95% confidence interval around the maximum land temperature

LandMinTemperature: global average minimum land temperature in Celsius

LandMinTemperatureUncertainty: the 95% confidence interval around the minimum land temperature

LandAndOceanAverageTemperature: global average land and ocean temperature in Celsius

LandAndOceanAverageTemperatureUncertainty: the 95% confidence interval around the global average land and ocean temperature
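As an illustration of working with GlobalTemperatures.csv, the sketch below averages LandAverageTemperature by calendar year using only Python's standard library. It assumes the date column is named `dt` and holds ISO dates (YYYY-MM-DD); check the file's header, since the description above only calls it Date. Blank cells (for example in columns whose series start later) are skipped rather than treated as zero.

```python
import csv
import io
from collections import defaultdict


def annual_means(csv_file, column="LandAverageTemperature", date_col="dt"):
    """Average `column` per calendar year, skipping rows where the value is blank."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        if not value:
            continue  # missing observation
        year = int(row[date_col][:4])  # assumes ISO dates such as 1750-01-01
        sums[year] += float(value)
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}


# Tiny in-memory sample with made-up values; in practice pass
# open("GlobalTemperatures.csv", newline="") instead.
sample = io.StringIO(
    "dt,LandAverageTemperature\n"
    "1750-01-01,3.0\n"
    "1750-02-01,5.0\n"
    "1751-01-01,\n"
    "1751-02-01,6.0\n"
)
print(annual_means(sample))
```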

Other files include:

Global Average Land Temperature by Country (GlobalLandTemperaturesByCountry.csv)

Global Average Land Temperature by State (GlobalLandTemperaturesByState.csv)

Global Land Temperatures By Major City (GlobalLandTemperaturesByMajorCity.csv)

Global Land Temperatures By City (GlobalLandTemperaturesByCity.csv)

The raw data comes from the Berkeley Earth data page.
Sachem Capital Corp (SCCF): When Will the 7.125% Notes Due 2027 Excel?...
kappasignal.com
Updated Mar 9, 2024
KappaSignal (2024). Sachem Capital Corp (SCCF): When Will the 7.125% Notes Due 2027 Excel? (Forecast) [Dataset]. https://www.kappasignal.com/2024/03/sachem-capital-corp-sccf-when-will-7125.html
Dataset updated
Mar 9, 2024
Dataset authored and provided by
KappaSignal
License
https://www.kappasignal.com/p/legal-disclaimer.html
Description
This analysis presents a rigorous exploration of financial data, incorporating a diverse range of statistical features. By providing a robust foundation, it facilitates advanced research and innovative modeling techniques within the field of finance.

Sachem Capital Corp (SCCF): When Will the 7.125% Notes Due 2027 Excel?

Financial data:

Historical daily stock prices (open, high, low, close, volume)

Fundamental data (e.g., market capitalization, price-to-earnings (P/E) ratio, dividend yield, earnings per share (EPS), price-to-earnings growth, debt-to-equity ratio, price-to-book ratio, current ratio, free cash flow, projected earnings growth, return on equity, dividend payout ratio, price-to-sales ratio, credit rating)

Technical indicators (e.g., moving averages, RSI, MACD, average directional index, Aroon oscillator, stochastic oscillator, on-balance volume, accumulation/distribution (A/D) line, parabolic SAR, Bollinger Bands, Fibonacci, Williams percent range, commodity channel index)
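Of the technical indicators listed, a simple moving average is the most basic: the mean of the last n closing prices, recomputed at each step. A minimal sketch with made-up prices; the window length is an arbitrary choice, and the listing does not specify which moving-average variants or windows the dataset uses.

```python
from collections import deque


def simple_moving_average(prices, window):
    """Yield the mean of each full window of `window` consecutive prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)
    total = 0.0
    for price in prices:
        if len(buf) == window:
            total -= buf[0]  # drop the oldest price before append() evicts it
        buf.append(price)
        total += price
        if len(buf) == window:
            yield total / window


closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up closing prices
print(list(simple_moving_average(closes, 3)))
```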

Machine learning features:

Feature engineering based on financial data and technical indicators

Sentiment analysis data from social media and news articles

Macroeconomic data (e.g., GDP, unemployment rate, interest rates, consumer spending, building permits, consumer confidence, inflation, producer price index, money supply, home sales, retail sales, bond yields)

Potential Applications:

Stock price prediction

Portfolio optimization

Algorithmic trading

Market sentiment analysis

Risk management

Use Cases:

Researchers investigating the effectiveness of machine learning in stock market prediction

Analysts developing quantitative buy/sell trading strategies

Individuals interested in building their own stock market prediction models

Students learning about machine learning and financial applications

Additional Notes:

The dataset may include different levels of granularity (e.g., daily, hourly)

Data cleaning and preprocessing are essential before model training

Regular updates are recommended to maintain the accuracy and relevance of the data
Supporting data for "Using the scanning fluid dynamic gauging device to...
repository.cam.ac.uk
bin, xls
Updated Sep 16, 2015
Ali, Akin; Ward, Glenn; Alam, Zayeem; Wilson, David Ian (2015). Supporting data for "Using the scanning fluid dynamic gauging device to understand the cleaning of baked lard layers" (Ali et al., Journal of Surfactants and Detergents) [Dataset]. http://doi.org/10.17863/CAM.68933
Available download formats: xls (182,272 bytes), bin (493,625 bytes), bin (302,421 bytes), bin (123,870 bytes), bin (234,148 bytes)
Unique identifier
https://doi.org/10.17863/CAM.68933
Dataset updated
Sep 16, 2015
Dataset provided by
University of Cambridge
Apollo
Authors
Ali, Akin; Ward, Glenn; Alam, Zayeem; Wilson, David Ian
License
Attribution-ShareAlike 2.0 (CC BY-SA 2.0), https://creativecommons.org/licenses/by-sa/2.0/
License information was derived automatically
Description
These are Microsoft Excel files which contain the data used to generate the plots in the paper. The files are labelled by figure number; a complete description is given in the paper.


Data Cleaning Sample (Rong Luo, 2023): 160 scholarly articles cite this dataset, according to Google Scholar.

