HOW TO:
- Build a hierarchy using the category, subcategory & product fields (columns "Product Category", "Product SubCategory" & "Product Name").
- Group the values of the column "Region" into 2 groups, alphabetically, based on the name of each region.
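In Power BI these two steps are done through the Fields pane (hierarchy) and the Data Groups dialog. Purely as a conceptual illustration of the alphabetical two-group split, here is a minimal pandas sketch; the file name "sales.csv" and the "Region" column are assumptions, not part of the original dataset description.

import pandas as pd

# Hypothetical sales table with a "Region" column (file name is an assumption).
df = pd.read_csv("sales.csv")

# Sort the distinct region names alphabetically and split them into two halves.
regions = sorted(df["Region"].dropna().unique())
half = len(regions) // 2
group_map = {name: ("Group 1" if i < half else "Group 2") for i, name in enumerate(regions)}

df["Region Group"] = df["Region"].map(group_map)
print(df[["Region", "Region Group"]].drop_duplicates().sort_values("Region"))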
GNU Lesser General Public License v3.0: http://www.gnu.org/licenses/lgpl-3.0.html
On the official website, the dataset is available through a SQL Server instance (localhost) and as CSV files for use with Power BI Desktop running in the Virtual Lab (virtual machine). The first two data-import steps were executed in the virtual lab, and the resulting Power BI tables were then exported to CSVs. Records through the year 2022 have been added as required.
This dataset is helpful if you want to work offline with Adventure Works data in Power BI Desktop in order to follow the lab instructions in the training material on the official website, for example the Power BI Desktop Sales Analysis exercise from Microsoft's PL-300 learning path.
Download the CSV file(s) and import them into Power BI Desktop as tables. The CSVs are named after the tables created in the first two data-import steps described in the PL-300 Microsoft Power BI Data Analyst exam lab.
Terms and conditions: https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Warning: large file size (over 1 GB). Each monthly data set is large (over 4 million rows), but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or the equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste the information into another software application, such as a spreadsheet. Alternatively, add-ons to existing software that handle larger data sets can be used, such as the Microsoft PowerPivot add-on for Excel, available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.
1. Start Excel as normal.
2. Click on the PowerPivot tab.
3. Click on the PowerPivot Window icon (top left).
4. In the PowerPivot Window, click on the "From Other Sources" icon.
5. In the Table Import Wizard, scroll to the bottom and select Text File.
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.
Once the data has been imported you can view it in a spreadsheet.

What does the data cover?
General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record is only produced when this has occurred; there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):
- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity

The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included. The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file, linked to the first by the practice code, provides further detail about the practice. Presentations are identified only by their BNF code, so an additional data file, linked to the first by the BNF code, provides the chemical name for that presentation.
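As an alternative to PowerPivot, the monthly files can also be processed programmatically. The snippet below is a minimal pandas sketch for reading one of these large CSVs in chunks and summing items per practice; the file name and the column names "PRACTICE" and "ITEMS" are assumptions and should be checked against the actual extract.

import pandas as pd

# File name and column names are assumptions; adjust to the actual monthly extract.
path = "prescribing_monthly.csv"

totals = None
# Read the multi-million-row file in manageable chunks instead of loading it at once.
for chunk in pd.read_csv(path, chunksize=500_000):
    part = chunk.groupby("PRACTICE")["ITEMS"].sum()
    totals = part if totals is None else totals.add(part, fill_value=0)

print(totals.sort_values(ascending=False).head(10))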
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
1. Conduct a thorough examination of the dataset to identify any anomalies or inconsistencies.
2. Duplicate Removal:
- Identify and remove duplicate rows within the dataset.
- Ensure data integrity by eliminating redundant entries.
3. Standardization of Marital Status and Gender:
- Replace 'M' with 'Married' and 'S' with 'Single' in the Marital Status column.
- Standardize gender data by replacing 'M' with 'Male' and 'F' with 'Female'.
4. Commute Distance Standardization:
- Modify "10+ Miles" to "Above 10 Miles" for uniformity.
- Arrange Commute Distance in ascending order to facilitate analysis.
5. Age Group Classification:
- Introduce an additional column named "Age Group" for age categorization.
- Calculate ages from existing data, categorizing:
  - Below 30 years as "Young Adults".
  - Between 31 and 45 years as "Middle-aged Adults".
  - Above 45 years as "Old-aged Adults".
6. Verification and Data Loading:
- Validate all transformations to ensure accuracy and coherence.
- Load the refined dataset back into Excel for further analysis.
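The steps above are described for Excel/Power Query. As an illustrative alternative only, here is a minimal pandas sketch of steps 2, 3 and 5; the file name and the column names "Marital Status", "Gender" and "Age" are assumptions and may differ from the actual workbook.

import pandas as pd

# Hypothetical file and column names; adjust to the actual workbook.
df = pd.read_excel("bike_sales.xlsx")

# Step 2: duplicate removal.
df = df.drop_duplicates()

# Step 3: standardize marital status and gender codes.
df["Marital Status"] = df["Marital Status"].replace({"M": "Married", "S": "Single"})
df["Gender"] = df["Gender"].replace({"M": "Male", "F": "Female"})

# Step 5: age group classification (age exactly 30 is mapped to Middle-aged here).
def age_group(age):
    if age < 30:
        return "Young Adults"
    elif age <= 45:
        return "Middle-aged Adults"
    return "Old-aged Adults"

df["Age Group"] = df["Age"].apply(age_group)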
Bi Power World Wide Z Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
*****Documentation Process*****
1. Data Preparation:
- Upload the data into Power Query to assess quality and identify duplicate values, if any.
- Verify data quality and types for each column, addressing any miswriting or inconsistencies.
2. Data Management:
- Duplicate the original data sheet for future reference and label the new sheet as the "Working File" to preserve the integrity of the original dataset.
3. Understanding Metrics:
- Clarify the meaning of column headers, particularly distinguishing between Impressions and Reach, and understand how Engagement Rate is calculated.
- Engagement Rate formula: total likes, comments, and shares divided by Reach.
4. Data Integrity Assurance:
- Recognize that Impressions should outnumber Reach, reflecting total views versus unique audience size.
- Investigate discrepancies between Reach and Impressions to ensure data integrity, identifying and resolving root causes for accurate reporting and analysis.
5. Data Correction:
- Collaborate with the relevant team to rectify data inaccuracies, specifically the discrepancy between Impressions and Reach.
- Engage with the concerned team to understand the root cause of the discrepancies.
- Identify instances where Reach surpasses Impressions, potentially attributable to data transformation errors.
- Following the rectification process, adjust the dataset to reflect the corrected Impressions and Reach values accurately.
- Ensure diligent implementation of the corrections to maintain the integrity and reliability of the data.
- Recalculate the Engagement Rate post-correction, adhering to rigorous data integrity standards to uphold the credibility of the analysis.
6. Data Enhancement:
- Categorize Audience Age into three groups: "Senior Adults" (45+ years), "Mature Adults" (31-45 years), and "Adolescent Adults" (<30 years) within a new column named "Age Group".
- Split date and time into separate columns using the text-to-columns option for improved analysis.
7. Temporal Analysis:
- Introduce a new column for "Weekend and Weekday", renamed "Weekday Type", to discern patterns and trends in engagement.
- Define time periods by categorizing into "Morning", "Afternoon", "Evening" and "Night" based on time intervals.
8. Sentiment Analysis:
- Populate blank cells in the Sentiment column with "Mixed Sentiment", denoting content containing both positive and negative sentiments or ambiguity.
9. Geographical Analysis:
- Group countries and obtain additional continent data from an online source (e.g., https://statisticstimes.com/geography/countries-by-continents.php).
- Add a new column for "Audience Continent" and use the XLOOKUP function to retrieve the corresponding continent data.
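The workflow above targets Excel/Power Query. Purely as an illustration of the Engagement Rate formula and the Age Group buckets from steps 3 and 6, here is a minimal pandas sketch; the file name and the column names "Likes", "Comments", "Shares", "Reach" and "Audience Age" are assumptions.

import pandas as pd

# Column and file names are assumptions for illustration only.
df = pd.read_csv("social_media_posts.csv")

# Step 3: Engagement Rate = (likes + comments + shares) / Reach.
df["Engagement Rate"] = (df["Likes"] + df["Comments"] + df["Shares"]) / df["Reach"]

# Step 6: bucket audience age into the three described groups (age 45 falls into "Senior Adults" here).
bins = [0, 30, 45, 200]
labels = ["Adolescent Adults", "Mature Adults", "Senior Adults"]
df["Age Group"] = pd.cut(df["Audience Age"], bins=bins, labels=labels, right=False)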
*****Drawing Conclusions and Providing a Summary*****
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset represents a large tertiary-care hospital with 25 clinical departments and a total capacity of 5,500 inpatient beds, including 1,080 ICU beds. All data is fully synthetic, created for educational and analytical purposes, and is designed to support the kinds of occupancy and capacity analysis described below.
The dataset approximates real-world bed allocation patterns across major clinical specialties such as Emergency Care, Surgery, Pediatrics, ICU, Oncology, and Long-Term Care.
To maintain realism, departments have varying occupancy levels:
- some are under low load (free capacity),
- some operate under normal/medium load,
- several are intentionally modeled as overloaded/high occupancy, to reflect real hospital dynamics.
All metrics simulate plausible hospital operations.
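For example, a typical first analysis on such a dataset is an occupancy rate per department. The sketch below assumes hypothetical columns "Department", "Occupied_Beds" and "Total_Beds", which may differ from the actual schema.

import pandas as pd

# Column and file names are assumptions; check the actual dataset schema.
df = pd.read_csv("hospital_beds.csv")

# Occupancy rate per department: occupied beds as a share of total beds.
occupancy = (
    df.groupby("Department")[["Occupied_Beds", "Total_Beds"]].sum()
      .assign(Occupancy_Rate=lambda x: x["Occupied_Beds"] / x["Total_Beds"])
      .sort_values("Occupancy_Rate", ascending=False)
)
print(occupancy)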
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
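The project itself is carried out in Excel/Power Query. As a hedged illustration of steps 3-5 only, here is a minimal pandas sketch of one common way to derive COGS and discount value; the file name, sheet name, column names and the formulas themselves are assumptions and may not match the actual Superstore workbook.

import pandas as pd

# File, sheet and column names are assumptions for illustration; adjust to the real sheets.
orders = pd.read_excel("superstore.xlsx", sheet_name="Orders")

# One common convention: COGS = Sales - Profit (both already net of discount).
orders["COGS"] = orders["Sales"] - orders["Profit"]

# Discount value: amount given away relative to the undiscounted price,
# assuming "Sales" is the post-discount amount and "Discount" is a fraction.
orders["Discount Value"] = orders["Sales"] / (1 - orders["Discount"]) * orders["Discount"]

# Basic sales metrics per category (step 5).
summary = orders.groupby("Category")[["Sales", "COGS", "Discount Value", "Profit"]].sum()
print(summary)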
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is a synthetic dataset generated to mimic real-world e-commerce return management scenarios. Since actual return data is often confidential and unavailable, this dataset was created with realistic assumptions around orders, products, customers, and return behaviors.
It can be used for:
Predictive modeling of return likelihood (classification problems).
Business analytics on profitability loss due to returns.
Sustainability analysis (CO₂ emissions and waste impact from reverse logistics).
📌 Dataset Features (Columns)
Order_ID → Unique order identifier.
Product_ID → Unique product identifier.
User_ID → Unique customer identifier.
Order_Date → Date when the order was placed.
Return_Date → Date when the product was returned (if returned).
Product_Category → Category of the product (e.g., Clothing, Electronics, Books, Toys, etc.).
Product_Price → Price of the product per unit.
Order_Quantity → Number of units purchased in the order.
Discount_Applied → Discount percentage applied on the product.
Return_Status → Whether the order was Returned or Not Returned.
Return_Reason → Reason for return (e.g., Damaged, Wrong Item, Changed Mind).
Days_to_Return → Number of days taken by customer to return (0 if not returned).
User_Age → Age of the customer.
User_Gender → Gender of the customer (Male/Female).
User_Location → City/region of the customer.
Payment_Method → Mode of payment (Credit Card, Debit Card, PayPal, Gift Card, etc.).
Shipping_Method → Chosen shipping type (Standard, Express, Next-Day).
Return_Cost → Estimated logistics cost incurred when a return happens.
Profit_Loss → Net profit or loss for the order, considering product price, discount, and return cost.
CO2_Saved → Estimated CO₂ emissions saved (if return avoided).
Waste_Avoided → Estimated physical waste avoided (in units/items).
💡 Use Cases
MBA & academic projects in Business Analytics and Supply Chain Management.
Training predictive models for return forecasting.
Measuring sustainability KPIs (CO₂ reduction, waste avoidance).
Dashboards in Power BI/Tableau for business decision-making.
Quick Start Example:
import pandas as pd

# Load the dataset from the Kaggle input path.
df = pd.read_csv("/kaggle/input/synthetic-ecommerce-returns/returns_sustainability_dataset.csv")
print(df.head())
df.info()
# Share of returned vs. not-returned orders.
print(df['Return_Status'].value_counts(normalize=True))
# Return rate by category: Return_Status is text, so convert it to a boolean first.
category_returns = df['Return_Status'].eq('Returned').groupby(df['Product_Category']).mean().sort_values(ascending=False)
print(category_returns)
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Data Import and Table Selection: Import Excel data into Power BI. Select specific tables (Calendar, Customer, Product, Sales, Territory).
Data Modeling: Design a star schema architecture in Model view. Establish relationships between tables.
Data Transformation: Filter the Calendar table for the years 2017 and 2018. Remove unnecessary columns from the Calendar table. Use Power Query Editor for data manipulation.
DAX Measures: Create measures for analyzing sales data. Use DAX functions to calculate total sales, tax amount, total orders, distinct product count, etc. Add comments to DAX measures for clarity.
Visualization: Create matrices to display summarized data. Format measures (e.g., change to currency). Use visual elements like icons and tooltips for better understanding.
Drill-Down Analysis: Implement drill-down functionality to explore data hierarchically.
Additional Measures: Calculate total customers and the percentage of distinct customers. Analyze product-related metrics (e.g., max price, weight values).
Data Quality Analysis: Identify and analyze empty cells in specific columns.
Multiple Sheets and Visuals: Create multiple sheets with different matrix tables. Use slicers for interactive filtering. Implement visual filters for dynamic data exploration.
Advanced DAX Functions: Use the SUMX function to calculate total sales including tax. Calculate dealer margin using SUMX.
Conclusion: Summarize the project and its focus on measures, matrix tables, and advanced DAX functions.
Overall, the project covers various aspects of data analysis and visualization in Power BI, from data import to advanced calculations and visualization techniques, providing a comprehensive guide for analysis and decision-making.
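DAX itself runs inside Power BI, but the SUMX pattern (evaluate an expression row by row, then sum the results) has a straightforward pandas analogue. The sketch below is illustrative only; the file name and the column names "SalesAmount" and "TaxAmt" are assumptions, not the actual model's fields.

import pandas as pd

# Column and file names are assumptions; the real Sales table may differ.
sales = pd.read_csv("sales.csv")

# SUMX-style measure: compute (sales amount + tax) per row, then sum over all rows.
total_sales_incl_tax = (sales["SalesAmount"] + sales["TaxAmt"]).sum()
print(f"Total sales including tax: {total_sales_incl_tax:,.2f}")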