Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is Pro Power BI Desktop : Free Interactive Data Analysis with Microsoft Power BI. It features 7 columns including author, publication date, language, and book publisher.
Company Datasets for valuable business insights!
Discover new business prospects, identify investment opportunities, track competitor performance, and streamline your sales efforts with comprehensive Company Datasets.
These datasets are sourced from top industry providers, ensuring you have access to high-quality information:
We provide fresh and ready-to-use company data, eliminating the need for complex scraping and parsing. Our data includes crucial details such as:
You can choose your preferred data delivery method, including various storage options, delivery frequency, and input/output formats.
Receive datasets in CSV, JSON, and other formats, with storage options like AWS S3 and Google Cloud Storage. Opt for one-time, monthly, quarterly, or bi-annual data delivery.
With Oxylabs Datasets, you can count on:
Pricing Options:
Standard Datasets: Choose from various ready-to-use datasets with standardized data schemas, priced from $1,000/month.
Custom Datasets: Tailor datasets from any public web domain to your unique business needs. Contact our sales team for custom pricing.
Experience a seamless journey with Oxylabs:
Unlock the power of data with Oxylabs' Company Datasets and supercharge your business insights today!
In the beginning, the case was just raw data for a company, indicating nothing useful to decision-makers. After collecting revenue and expense figures over several months, the company needed answers to a number of questions so that important decisions could be based on data rather than intuition.
The Questions:
About Revenues & Expenses
- What are the total sales and profit for the whole period? How many products were sold in total? What is the net profit?
- In which month was the highest percentage of revenue achieved, and within that month, which day had the highest revenue?
- In which month was the highest percentage of expenses achieved, and within that month, which day had the highest expenses?
- How much did expenditures change from month to month?
- What is the percentage change in net profit over the months?
About Distribution
- How many products were sold each month in the largest state?
- Which were the top 3 states by products bought during the two years?
Comparison
- Sales by sales method?
- Sales of men's versus women's products?
- Profit by retailer?
What I did:
- Understood the data.
- Preprocessed and cleaned the data, solving problems such as missing data or incorrectly typed data.
- Queried the data and made calculations such as COGS with Power Query (Excel).
- Modelled the data and built measures with Power Pivot (Excel).
- After processing and preparation, built pivot tables to answer the questions.
- Finally, built a dashboard in Power BI to visualize the results.
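The revenue-and-profit questions above can be sketched in pandas rather than Excel; the column names (`month`, `revenue`, `expenses`) and figures are illustrative assumptions, not the project's actual schema:

```python
import pandas as pd

# Hypothetical monthly figures; the real project used Power Query / Power Pivot.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [120_000, 150_000, 90_000, 180_000],
    "expenses": [80_000, 95_000, 70_000, 110_000],
})
df["net_profit"] = df["revenue"] - df["expenses"]
# Month-over-month percentage change in net profit
df["profit_pct_change"] = df["net_profit"].pct_change() * 100
# Month with the highest revenue
top_month = df.loc[df["revenue"].idxmax(), "month"]
print(top_month)
print(df["net_profit"].sum())  # net profit for the whole period
```

The same month-over-month logic extends to expenses by applying `pct_change()` to the `expenses` column.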
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Dataset Overview:
Contains sales data from Blinkit, including product details, order quantities, revenue, and timestamps.
Useful for demand forecasting, price optimization, trend analysis, and business insights.
Helps in understanding customer behavior and seasonal variations in online grocery shopping.
Potential Use Cases:
- Time Series Analysis: Analyze sales trends over different periods.
- Demand Forecasting: Predict future product demand based on historical data.
- Price Optimization: Identify the impact of pricing on sales and revenue.
- Customer Behavior Analysis: Understand buying patterns and preferences.
- Market Trends: Explore how different factors affect grocery sales performance.
This dataset can be beneficial for data scientists, business analysts, and researchers looking to explore e-commerce and retail trends. Feel free to use it for analysis, machine learning models, and business intelligence projects.
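As a minimal sketch of the time-series use case, assuming columns named `order_date` and `revenue` (the actual Blinkit schema may differ):

```python
import pandas as pd

# Hypothetical Blinkit-style rows; real column names and values may differ.
sales = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18", "2024-03-02"]),
    "revenue": [250.0, 310.0, 180.0, 420.0, 275.0],
})
# Monthly revenue totals: the starting point for trend analysis or forecasting
monthly = sales.set_index("order_date")["revenue"].resample("MS").sum()
print(monthly)
```

From a monthly series like this, seasonal decomposition or a forecasting model can be layered on for the demand-forecasting use case.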
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Intellectual Property Government Open Data (IPGOD) includes over 100 years of registry data on all intellectual property (IP) rights administered by IP Australia. It also has derived information about the applicants who filed these IP rights, to allow for research and analysis at the regional, business and individual level. This is the 2019 release of IPGOD.
IPGOD is large, with millions of data points across up to 40 tables, making it too large to open in Microsoft Excel. Furthermore, analysis often requires information from separate tables, which calls for specialised software to merge them. We recommend that advanced users interact with the IPGOD data using appropriate tools with enough memory and compute power. This includes a wide range of programming and statistical software such as Tableau, Power BI, Stata, SAS, R, Python, and Scala.
IP Australia also provides free trials of a cloud-based analytics platform, the IP Data Platform, with the capabilities to work with large intellectual property datasets such as IPGOD through the web browser, without installing any software.
The following pages can help you gain an understanding of intellectual property administration and processes in Australia to support your analysis of the dataset.
Due to the changes in our systems, some tables have been affected.
Data quality has been improved across all tables.
NHS Digital terms and conditions: https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Warning: Large file size (over 1GB). Each monthly data set is large (over 4 million rows) but can be viewed in standard software such as Microsoft WordPad (save by right-clicking on the file name and selecting 'Save Target As', or the equivalent on Mac OS X). It is then possible to select the required rows of data and copy and paste the information into another application, such as a spreadsheet. Alternatively, add-ons to existing software, such as the Microsoft PowerPivot add-on for Excel, can be used to handle larger data sets. The Microsoft PowerPivot add-on for Excel is available from Microsoft: http://office.microsoft.com/en-gb/excel/download-power-pivot-HA101959985.aspx

Once PowerPivot has been installed, follow the instructions below to load the large files. Note that it may take at least 20 to 30 minutes to load one monthly file.
1. Start Excel as normal.
2. Click on the PowerPivot tab.
3. Click on the PowerPivot Window icon (top left).
4. In the PowerPivot Window, click on the "From Other Sources" icon.
5. In the Table Import Wizard, scroll to the bottom and select Text File.
6. Browse to the file you want to open and choose the file extension you require, e.g. CSV.
Once the data has been imported, you can view it in a spreadsheet.

What does the data cover? General practice prescribing data is a list of all medicines, dressings and appliances that are prescribed and dispensed each month. A record is only produced when this has occurred; there is no record for a zero total. For each practice in England, the following information is presented at presentation level for each medicine, dressing and appliance (by presentation name):
- the total number of items prescribed and dispensed
- the total net ingredient cost
- the total actual cost
- the total quantity
The data covers NHS prescriptions written in England and dispensed in the community in the UK. Prescriptions written in England but dispensed outside England are included.
The data includes prescriptions written by GPs and other non-medical prescribers (such as nurses and pharmacists) who are attached to GP practices. GP practices are identified only by their national code, so an additional data file - linked to the first by the practice code - provides further detail in relation to the practice. Presentations are identified only by their BNF code, so an additional data file - linked to the first by the BNF code - provides the chemical name for that presentation.
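Outside of Excel/PowerPivot, a monthly file of several million rows can also be processed in chunks with pandas so it never has to fit in memory at once. The tiny inline CSV below stands in for a real prescribing file, and the column names are illustrative placeholders:

```python
import io

import pandas as pd

# Placeholder standing in for a >4-million-row monthly prescribing CSV.
csv_data = io.StringIO(
    "PRACTICE,BNF_CODE,ITEMS,NIC,ACT_COST,QUANTITY\n"
    "A1,0401,10,25.0,23.5,100\n"
    "A1,0402,5,12.0,11.0,50\n"
    "B2,0401,8,20.0,18.7,80\n"
)
# Stream the file in chunks, accumulating items prescribed per practice.
totals = {}
for chunk in pd.read_csv(csv_data, chunksize=2):
    for practice, items in chunk.groupby("PRACTICE")["ITEMS"].sum().items():
        totals[practice] = totals.get(practice, 0) + items
print(totals)
```

For a real file, `csv_data` would be replaced with the file path, and the chunk size raised to something like 100,000 rows.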
Power BI Dashboard: https://www.mavenanalytics.io/project/3776
The IPL (Indian Premier League) is one of the most popular and widely followed cricket leagues in the world. It features top cricket players from around the world playing for various franchise teams in India. The league is known for its high-scoring matches, intense rivalries, and innovative marketing strategies.
If you are a data enthusiast or a cricket fan, you will be excited to know that there is a dataset available on Kaggle that contains comprehensive information about the IPL matches played over the years. This dataset is a valuable resource for anyone interested in analyzing the performance of players and teams in the league.
The IPL dataset on Kaggle contains information on over 800 IPL matches played from 2008 to 2020. It includes details on the date, time, venue, teams, players, and various statistics such as runs scored, wickets taken, and more. The dataset also contains information on the individual performances of players and teams, as well as the overall performance of the league over the years.
The IPL dataset is a goldmine for data analysts and cricket enthusiasts alike. It provides a wealth of information that can be used to uncover insights about the league and its players. For example, you can use the dataset to analyze the performance of a particular player or team over the years, or to identify trends in the league such as changes in team strategies or the emergence of new players.
If you are new to data analysis, the IPL dataset is a great place to start. You can use it to learn how to use tools such as Excel or Power BI to create visualizations and gain insights from data. With the right skills and tools, you can use the IPL dataset to create interactive dashboards and reports that provide valuable insights into the world of cricket.
Overall, the IPL dataset on Kaggle is an excellent resource for anyone interested in cricket or data analysis. It contains a wealth of information that can be used to analyze and gain insights into the performance of players and teams in one of the most exciting cricket leagues in the world.
This dataset contains the points table and player information. To view more data, such as match stats, ball-by-ball data, and player innings data, please visit the links below:
Match stats, Ball_by_ball data: https://www.kaggle.com/datasets/biswajitbrahmma/ipl-complete-dataset-2008-2022
Player innings data: https://www.kaggle.com/datasets/paritosh712/cricket-every-single-ipl-inning-20082022
Thanks to Biswajit Brahmma & Paritosh Anand for their dataset.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
This dataset is part of my portfolio project and is shared here to encourage exploration and further analysis. Feel free to use it as-is, build upon it, or integrate it into your own projects. Whether you're practicing data analysis, testing models, or just need a clean dataset to work with—this resource is available for you.
This dataset represents simulated sales data for an electronics shop operating in the United States from 2024 (January to November). It is designed for individuals who want to practice data analysis, visualization, and machine learning techniques. The dataset reflects real-world sales scenarios, including various products, customer information, order statuses, and sales channels. It is ideal for learning and experimenting with data analytics, business insights, and visualization tools like Power BI, Tableau, or Python libraries.
Dataset Features
- ProductID: Unique identifier for each product.
- ProductName: Name of the electronic product (e.g., Phone, Laptop, Drone).
- ProductPrice: Price of the product in USD.
- OrderedQuantity: Number of units ordered by the customer.
- OrderStatus: Status of the order (e.g., Delivered, In Process, On Hold, Canceled).
- CustomerName: Name of the customer who placed the order.
- State: US state of the customer (e.g., California, Texas).
- City: City of the customer within the state.
- Latitude & Longitude: Geographic coordinates of the customer's location for mapping purposes.
- OrderChannel: Channel through which the order was placed (e.g., Website, Phone, Physical Store, Social Media).
- OrderDate: Date of the order (range: January 1, 2024, to November 30, 2024).
Potential Use Cases
- Exploratory Data Analysis (EDA): Analyze sales trends across months, states, or product categories; identify the most popular sales channels or products; examine the distribution of order statuses.
- Data Visualization: Create dashboards to visualize sales performance, customer demographics, and geographic distribution; plot order locations on a map using latitude and longitude.
- Machine Learning: Predict future sales trends using historical data; classify order statuses based on product and order details; cluster customers based on purchase behavior or location.
- Business Insights: Analyze revenue contributions from different states or cities; understand customer preferences across product categories.
Technical Details
- File Format: Excel (.xlsx)
- Number of Rows: 11,000
- Period: January 1, 2024, to November 30, 2024
- Simulated Data: The data is entirely synthetic and does not represent real customers or transactions.
Why Use This Dataset? This dataset is tailored for individuals and students interested in:
- Building their data analysis and visualization skills.
- Learning how to work with real-world-like business datasets.
- Practicing machine learning with structured data.

Acknowledgment: This dataset was generated to mimic real-world sales data scenarios for educational and research purposes. Feel free to use it for learning and projects, and share your insights with the community!
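A quick EDA sketch using the dataset's documented columns (the rows below are made up for illustration; the real file has 11,000 rows):

```python
import pandas as pd

# A few rows with the dataset's documented columns (values are illustrative).
orders = pd.DataFrame({
    "ProductName": ["Phone", "Laptop", "Drone", "Phone"],
    "ProductPrice": [699.0, 1299.0, 499.0, 699.0],
    "OrderedQuantity": [2, 1, 1, 3],
    "OrderStatus": ["Delivered", "Canceled", "Delivered", "Delivered"],
    "State": ["California", "Texas", "California", "Texas"],
    "OrderChannel": ["Website", "Phone", "Social Media", "Website"],
})
orders["Revenue"] = orders["ProductPrice"] * orders["OrderedQuantity"]
# Revenue by state for delivered orders only
delivered = orders[orders["OrderStatus"] == "Delivered"]
by_state = delivered.groupby("State")["Revenue"].sum().sort_values(ascending=False)
print(by_state)
```

Swapping `State` for `OrderChannel` or `ProductName` in the `groupby` answers the "most popular channels or products" question the same way.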
Harness AI-Driven Precision for Global Company Insights Leverage cutting-edge AI agents to fetch and validate company registry data in real-time, bypassing obsolete databases. Unlike traditional providers, our service dynamically retrieves data directly from government registries worldwide, ensuring up-to-the-minute accuracy and eliminating outdated records.
Key Features
- AI-Powered Real-Time Access: Deploy autonomous AI agents to collect and structure data from any national registry, even those with dynamic layouts or authentication barriers.
- Universal Registry Compatibility: Seamlessly extract data from 250+ countries, including hard-to-access regions, with automatic translation and normalization.
- Document Processing: Parse financial filings, annual reports, and legal documents (PDF, DOCX) using NLP-driven analysis. Extract key attributes like ownership structures, director details, and compliance status.
- Format Flexibility: Receive data via API, CSV, JSON, or custom formats (e.g., PostgreSQL DB, Google Sheets) with hourly/daily refresh options.
- 99% Accuracy Guarantee: Multi-layer validation via AI cross-referencing and human audits ensures error-free datasets.
Data Sourcing & Coverage
- Sources: Direct integration with 1,800+ government registries of your choice on demand, supplemented by AI-enhanced verification of public filings and regulatory submissions.
- Attributes: Company name, registration number, directors, shareholders, financials, litigation history, and industry-specific certifications (e.g., ISO, NAICS).
- Historical Data: 10+ years of archived records, updated in real-time.
Use Cases
- Due Diligence: Verify company legitimacy for mergers, acquisitions, or partnerships.
- Compliance: Streamline KYC/AML workflows with automated registry checks.
- Market Research: Track competitor expansions, ownership changes, or industry trends.
- Risk Management: Monitor regulatory violations or financial instability signals.
- Credit Reporting: Automate the end-to-end credit report creation process.
Technical Specifications
- Delivery: API (REST/GraphQL), SFTP, cloud sync (AWS S3, Google Cloud).
- Integration: Custom connectors for Salesforce, HubSpot, and BI tools (Tableau, Power BI).
- Latency: Response times from under 5 seconds to 60 minutes for on-demand queries, depending on the complexity and response time of the registry.
Why Choose Us?
- Pioneers in AI Agent Technology: Outperform static datasets with live registry scraping.
- GDPR/CCPA Compliance: Data sourced ethically from public registries, with audit trails on output.
- Free Sample: Test 100 records at zero cost.
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays impact sales on each day. Sales data are available for 45 Walmart stores. The business faces a challenge: owing to unforeseen demand, it sometimes runs out of stock, partly because of an inadequate machine learning algorithm. An ideal ML algorithm will predict demand accurately while ingesting factors such as economic conditions, including CPI, the Unemployment Index, etc.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
The dataset is taken from Kaggle.
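The five-times weighting of holiday weeks mentioned above corresponds to a weighted mean absolute error; a minimal sketch of that metric (the toy inputs are invented):

```python
import numpy as np

# Weighted MAE: holiday weeks count five times as much, per the competition rules.
def wmae(y_true, y_pred, is_holiday):
    weights = np.where(is_holiday, 5.0, 1.0)
    return np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights)

y_true = np.array([100.0, 200.0, 150.0])
y_pred = np.array([110.0, 190.0, 160.0])
is_holiday = np.array([False, True, False])
print(wmae(y_true, y_pred, is_holiday))
```

Evaluating candidate models with this metric, rather than a plain MAE, keeps the holiday-week emphasis that the challenge description specifies.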
Comprehensive dataset of 33 Solar photovoltaic power plants in Free municipal consortium of Ragusa, Italy as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
📝 Dataset Overview: This enhanced dataset captures the real-world operational and clinical performance data from a major hospital in Nigeria — Lagos University Teaching Hospital (LUTH). It includes detailed information on admissions, patient care, medical services, billing, and staff activities, ideal for healthcare analytics, hospital management dashboards, and machine learning projects.
🔍 Dataset Features (Suggested Columns):
- Patient_ID: Unique anonymized patient ID
- Admission_Date: Date of admission
- Discharge_Date: Date of discharge
- Gender: Patient's gender
- Age: Patient's age
- Department: Medical department involved
- Diagnosis: Primary diagnosis
- Doctor: Attending physician (anonymized)
- Treatment_Provided: Type of treatment/procedure
- Lab_Tests: Count of lab tests conducted
- Medications_Administered: Total medications given
- Surgery_Cost (₦): Cost of surgery, if applicable
- Bill_Amount (₦): Total bill charged to patient
- Ward: Hospital ward assigned
- Length_of_Stay (days): Duration of hospitalization
🎯 Use Cases: Build hospital operations dashboards in Power BI
Analyze billing and cost patterns across departments
Predict length of stay or discharge outcomes
Explore departmental workload and performance
Use as a base for AI in hospital management systems
🏥 Clinical & Operational Value: This dataset empowers analysts and healthcare professionals to:
Track patient outcomes and billing efficiency
Reduce operational bottlenecks
Improve patient care with data-driven recommendations
Benchmark departmental performance
Train predictive models for resource allocation
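As a small illustration of the length-of-stay and departmental-benchmarking use cases, length of stay can be derived from the suggested date columns (the rows below are invented examples):

```python
import pandas as pd

# Illustrative rows using the dataset's suggested columns.
records = pd.DataFrame({
    "Patient_ID": ["P001", "P002", "P003"],
    "Department": ["Surgery", "Pediatrics", "Surgery"],
    "Admission_Date": pd.to_datetime(["2024-03-01", "2024-03-02", "2024-03-05"]),
    "Discharge_Date": pd.to_datetime(["2024-03-06", "2024-03-04", "2024-03-12"]),
})
# Derive length of stay, then benchmark departments by their average
records["Length_of_Stay"] = (
    records["Discharge_Date"] - records["Admission_Date"]).dt.days
avg_stay = records.groupby("Department")["Length_of_Stay"].mean()
print(avg_stay)
```

A derived column like this is also a natural target variable when training the predictive models mentioned above.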
👤 Created By: Fatolu Peter (Emperor Analytics) Dedicated to transforming public healthcare using analytics and real-world data across Nigerian hospitals. This is Project 14 in my growing health-tech analytics journey.
This dataset gives you: ✅ Real hospital operations data ✅ Billing and medication insights ✅ Doctor and ward-level activity ✅ A perfect base for building Power BI dashboards or training ML models
Whether you're a data scientist, health analyst, or Power BI pro — this is real-world data to make real impact. Let’s build something powerful together. 💡
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
Airline data holds immense importance as it offers insights into the functioning and efficiency of the aviation industry. It provides valuable information about flight routes, schedules, passenger demographics, and preferences, which airlines can leverage to optimize their operations and enhance customer experiences. By analyzing data on delays, cancellations, and on-time performance, airlines can identify trends and implement strategies to improve punctuality and mitigate disruptions. Moreover, regulatory bodies and policymakers rely on this data to ensure safety standards, enforce regulations, and make informed decisions regarding aviation policies. Researchers and analysts use airline data to study market trends, assess environmental impacts, and develop strategies for sustainable growth within the industry. In essence, airline data serves as a foundation for informed decision-making, operational efficiency, and the overall advancement of the aviation sector.
This dataset comprises diverse parameters relating to airline operations on a global scale. The dataset prominently incorporates fields such as Passenger ID, First Name, Last Name, Gender, Age, Nationality, Airport Name, Airport Country Code, Country Name, Airport Continent, Continents, Departure Date, Arrival Airport, Pilot Name, and Flight Status. These columns collectively provide comprehensive insights into passenger demographics, travel details, flight routes, crew information, and flight statuses. Researchers and industry experts can leverage this dataset to analyze trends in passenger behavior, optimize travel experiences, evaluate pilot performance, and enhance overall flight operations.
The dataset provided here is a simulated example generated using the online platform Mockaroo. This web-based tool enables the creation of customizable synthetic datasets that closely resemble real data. It is primarily intended for developers, testers, and data experts who need sample data for a range of uses, including testing databases, populating applications with demonstration data, and crafting lifelike illustrations for presentations and tutorials. To explore further details, you can visit their website.
Cover Photo by: Kevin Woblick on Unsplash
Thumbnail by: Airplane icons created by Freepik - Flaticon
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The weather station on the campus of Loughborough University, in the East Midlands of the UK, had fallen into disuse and disrepair by the mid-2000s, but in 2007 the availability of infrastructure funding made it possible to re-establish regular weather observation with new equipment. The meteorological dataset subsequently collected at this facility between 2008 and 2021 is archived here. The dataset comes as fourteen Excel (.xlsx) files of annual data, with explanatory notes in each.

Site description
The campus weather station is located at latitude 52.7632°, longitude -1.235° and 68 m a.s.l., in a dedicated paddock on a green space near the centre-east boundary of the campus. A cabin, which houses power and network points, sits 10 m to the northeast of the main meteorological instrument tower. The paddock is otherwise mostly open on an arc from the northwest to the northeast, but on the other sides there are fruit trees (mainly varieties of prunus domestica) at distances of 13–16 m, forming part of the university's "Fruit Routes" biodiversity initiative.

Data collection
Instruments were fixed to a 3 m lattice mast which is concreted into the ground in the centre of the paddock described above. Up to late July 2013, the instruments were controlled by a solar-charged, battery-powered Campbell Scientific CR1000 data logger, and periodically manually downloaded. From early November 2013, this logger was replaced with a Campbell Scientific CR3000, run from the mains power supply from the cabin and connected to the campus network by ethernet. At the same time, the station's Young 01503 Wind Monitor was replaced by a Gill WindSonic ultrasonic anemometer. This combination remained in place for the rest of the measurement period described here. Frustratingly, the CS215 temperature/relative humidity sensor failed shortly before the peak of the 2018 heatwave, and had to be replaced with another CS215. Likewise, the ARG100 rain gauge was replaced in 2011 and 2016.
The main cause of data gaps is the unreliable power supply from the cabin, particularly in 2013 and 2021 (the latter leading to the complete replacement of the cabin and all other equipment). Furthermore, even though the post-2013 CR3000 logger had a backup battery, it sometimes failed to restart after mains power was lost, yielding data gaps until it was manually restarted. Nevertheless, out of 136 instrument-years of deployment, only 36 are less than 90% complete, and 21 less than 75% complete.

Data processing
Data retrieved manually or downloaded remotely were filtered for invalid measurements. The 15-minute data were then processed to daily and monthly values, using the pivot table function in Microsoft Excel. Most variables could be output simply as midnight-to-midnight daily means (e.g. solar and net radiation, wind speed). However, certain variables needed to be referred to the UK and Ireland standard 'Climatological Day' (Burt, 2012:272), 0900-0900: namely, air temperature minimum and maximum, plus rainfall total. The procedure for this follows Burt (2012; https://www.measuringtheweather.net/) and requires the insertion of additional date columns into the spreadsheet, to define two further, separate 'Climate Dates' for maximum temperature and rainfall total (the 24 hours commencing at 0900 on the date given, 'ClimateDateMax'), and for minimum temperatures (24 hours ending at 0900 on the date given, 'ClimateDateMin'). For the archived data, in the spreadsheet tabs labelled 'Output - Daily 09-09 minima', the pivot table function derives daily minimum temperatures by the correct 0900-0900 date, given by the ClimateDateMin variable. Similarly, in the tabs labelled 'Output - Daily 09-09 maxima', the pivot table function derives daily maximum temperatures and daily rainfall totals by the correct 0900-0900 date, given by the ClimateDateMax variable.
Then in the tabs labelled 'Output - Daily 00-00 means', variables with midnight-to-midnight means use the unmodified date variable. To take into account the effect of missing data, the tab 'Completeness' again uses a pivot table to count the numbers of daily and monthly observations where the 15-minute data are not at least 99.99% complete. Values are only entered into the 'Daily data' tab of the archived spreadsheets where 15-minute data are at least 75% complete; values are only entered into 'Monthly data' tabs where daily data are at least 75% complete.

Wind directions are particularly important in UK meteorology because they indicate the origin of air masses with potentially contrasting characteristics. But wind directions are not averaged in the same way as other variables, as they are measured on a circular scale. Instead, 15-minute wind direction data in degrees are converted to 16 compass points (the formula is included in the spreadsheets), and a pivot table is used to summarise these into wind speed categories, giving the frequency and strength of winds by compass point.

In order to evaluate the reliability of the collected dataset, it was compared to equivalent variables from the HadUK-Grid dataset (Hollis et al., 2019). HadUK-Grid is a collection of gridded climate variables derived from the network of UK land surface observations, which have been interpolated from meteorological station data onto a uniform grid to provide coherent coverage across the UK at 1 km x 1 km resolution. Daily and monthly air temperature and rainfall variables from the HadUK-Grid v1.1.0.0 Met Office (2022) were downloaded from the Centre for Environmental Data Analysis (CEDA) archive (https://catalogue.ceda.ac.uk/uuid/bbca3267dc7d4219af484976734c9527/).
Then the grid square containing the campus weather station was identified using the Point Subset Tool of the NOAA Weather and Climate Toolkit (https://www.ncdc.noaa.gov/wct/index.php) in order to retrieve data from that specific location. Daily and monthly HadUK-Grid data are included in the spreadsheets for convenience.

Campus temperatures are slightly, but consistently, higher than those indicated by HadUK-Grid, while HadUK-Grid rainfall is on average almost 10% higher than that recorded on the campus. Trend-free statistical relationships between campus and HadUK-Grid data imply that there is unlikely to be any significant temporal bias in the campus dataset.

References
- Burt, S. (2012). The Weather Observer's Handbook. Cambridge University Press, https://doi.org/10.1017/CBO9781139152167.
- Hollis, D., McCarthy, M., Kendon, M., Legg, T., Simpson, I. (2019). HadUK-Grid—A new UK dataset of gridded climate observations. Geoscience Data Journal 6, 151–159, https://doi.org/10.1002/gdj3.78.
- Met Office; Hollis, D.; McCarthy, M.; Kendon, M.; Legg, T. (2022). HadUK-Grid Gridded Climate Observations on a 1km grid over the UK, v1.1.0.0 (1836-2021). NERC EDS Centre for Environmental Data Analysis, https://dx.doi.org/10.5285/bbca3267dc7d4219af484976734c9527.
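The degrees-to-16-compass-points conversion described in the data processing notes can be sketched as a small function. This is an assumed Python equivalent of the spreadsheet formula, not a copy of it:

```python
# Convert a wind direction in degrees (0-360) to one of 16 compass points.
# Each sector is 22.5 degrees wide, centred on its compass point.
POINTS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
          "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def compass_point(degrees: float) -> str:
    index = int((degrees / 22.5) + 0.5) % 16
    return POINTS[index]

print(compass_point(0))    # N
print(compass_point(95))   # E
print(compass_point(350))  # N
```

The `% 16` wrap-around handles directions near 360°, which belong to the northern sector along with those near 0°.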
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
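A hedged sketch of the COGS and discount calculations in pandas; the `UnitCost` column, the sample rows, and the exact formulas are assumptions for illustration, since the original project performed these steps in Power Query:

```python
import pandas as pd

# Illustrative Superstore-style rows; real column names may differ.
sales = pd.DataFrame({
    "Sales": [100.0, 250.0, 80.0],
    "Quantity": [2, 5, 1],
    "UnitCost": [30.0, 35.0, 50.0],  # assumed cost column for this sketch
    "Discount": [0.1, 0.0, 0.2],
})
# COGS = units sold x unit cost; discount value = sales x discount rate
sales["COGS"] = sales["Quantity"] * sales["UnitCost"]
sales["DiscountValue"] = sales["Sales"] * sales["Discount"]
sales["Profit"] = sales["Sales"] - sales["COGS"] - sales["DiscountValue"]
print(sales[["COGS", "DiscountValue", "Profit"]].sum())
```

The same derived columns feed directly into the sales-metrics and visualization steps listed above.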
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo, which is featured in the Domo blog post "Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard".
This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. Sourced from KenPom, the data goes back to 2002 and is updated with the latest 2025 data. The dataset is meticulously structured to provide every piece of information that I could pull from the site as an open-source tool for March Madness analysis.
Key features of the dataset include: - Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward. - Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of. - 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.
These datasets were created by downloading the raw CSV files for each season from the various sections of KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of the raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, the column headers for each previous season are standardized to the current 2025 naming structure so that all of the historical data can be viewed under the same field names. The cleaned datasets are then appended together, and some additional clean-up takes place before the intermediate (INT) datasets uploaded to this Kaggle dataset are created.

Once all of the INT datasets were created, I joined the tables together on team name and season so that all of these metrics can be viewed in a single view. I then joined an NCAAM Conference & ESPN Team Name Mapping table to add each conference's full name and its acronym, as well as the team name that ESPN currently uses. This reference table is an aggregated view of the different conferences a team has belonged to since 2002 and the different team names KenPom has used historically, so it is necessary for mapping all of the teams properly and for differentiating historical conferences from current ones. Next, I joined a reference table of current NCAAM coaches and their active coaching lengths, because active coaching length typically correlates with a team's success in the March Madness tournament. I also joined a reference table of historical post-season tournament teams (March Madness, NIT, CBI, and CIT) and another that flags teams ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season.

After some additional data clean-up, all of this cleaned data exports into the "DEV _ March Madness" file that contains the consolidated view of all of this data.
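The dataflow described above (standardize historical headers to the 2025 naming, append the seasons, then join metric tables on team and season) can be sketched in plain Python. The rename map and field names here are illustrative assumptions, not KenPom's real schema:

```python
# Sketch of the Magic ETL flow: rename historical headers, append seasons,
# then join two metric tables on (team, season).
# RENAME_2025 and all field names are hypothetical, not KenPom's real headers.

RENAME_2025 = {"AdjOE": "adj_off_eff", "AdjDE": "adj_def_eff"}

def standardize(rows):
    """Rename each row's keys to the current 2025 naming structure."""
    return [{RENAME_2025.get(k, k): v for k, v in row.items()} for row in rows]

season_2024 = [{"team": "Duke", "season": 2024, "AdjOE": 121.3}]
season_2025 = [{"team": "Duke", "season": 2025, "adj_off_eff": 124.9}]
efficiency = standardize(season_2024) + season_2025  # append all seasons

heights = [{"team": "Duke", "season": 2025, "avg_height": 78.1}]

def join_on_team_season(left, right):
    """Inner join two row lists on the (team, season) key."""
    index = {(r["team"], r["season"]): r for r in right}
    return [{**l, **index[(l["team"], l["season"])]}
            for l in left if (l["team"], l["season"]) in index]

combined = join_on_team_season(efficiency, heights)
print(combined)
```

The (team, season) composite key mirrors the join described above; without the rename step, the same metric would appear under two different field names and the historical seasons could not be appended cleanly.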
This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
A high-quality, clean dataset simulating global cosmetics and skincare product sales between January and August 2022. This dataset mirrors real-world transactional data, making it perfect for data analysis, Excel training, visualization projects, and machine learning prototypes.
| Column Name | Description |
|---|---|
| Sales Person | Name of the salesperson responsible for the sale |
| Country | Country or region where the sale occurred |
| Product | Cosmetic or skincare product sold |
| Date | Date of the transaction (format: YYYY-MM-DD) |
| Amount ($) | Total revenue generated from the sale (USD) |
| Boxes Shipped | Number of product boxes shipped in the order |
The dataset is also well suited to practicing common Excel functions (VLOOKUP, IF, AVERAGEIFS, INDEX-MATCH, etc.).

This dataset can be used for creating an Inventory Dashboard. We can find the:
- ABC Inventory Classification
- XYZ Classification
- Inventory Turnover Ratio
- Calculation of Safety Stock
- Reorder Points
- Stock Status Classification
- Demand Forecasting on Power BI

It is extremely useful for Warehouse/In-plant Inventory Managers to effectively control inventory levels while maintaining service levels.
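As a sketch of the safety-stock and reorder-point calculations named above, using the standard formulas SS = z * sigma_d * sqrt(L) and ROP = mean_demand * L + SS. The daily demand figures and the ~95% service level (z = 1.645) are illustrative assumptions, not values from the dataset:

```python
import math
import statistics

# Sketch: safety stock and reorder point from a daily demand history.
# Demand numbers and the ~95% service level are illustrative assumptions.

daily_demand = [40, 52, 38, 45, 60, 48, 41, 55]  # boxes shipped per day
lead_time_days = 4
z = 1.645  # z-score for a ~95% cycle service level

avg_demand = statistics.mean(daily_demand)   # mean daily demand
demand_sd = statistics.stdev(daily_demand)   # sample std. dev. of daily demand

# Safety stock buffers demand variability over the replenishment lead time.
safety_stock = z * demand_sd * math.sqrt(lead_time_days)
# Reorder when on-hand stock falls to expected lead-time demand plus the buffer.
reorder_point = avg_demand * lead_time_days + safety_stock

print(round(safety_stock, 1), round(reorder_point, 1))
```

The same two formulas translate directly into Power BI measures once mean and standard deviation of daily demand are computed per SKU.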