6 datasets found
  1. March Madness Historical DataSet (2002 to 2025)

    • kaggle.com
    Updated Apr 22, 2025
    Cite
    Jonathan Pilafas (2025). March Madness Historical DataSet (2002 to 2025) [Dataset]. https://www.kaggle.com/datasets/jonathanpilafas/2024-march-madness-statistical-analysis/discussion?sort=undefined
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jonathan Pilafas
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This Kaggle dataset comes from an output dataset that powers my March Madness Data Analysis dashboard in Domo. The dashboard is also featured in a Domo blog post, "Hoops, Data, and Madness: Unveiling the Ultimate NCAA Dashboard".

    This dataset offers one of the most robust resources you will find for discovering key insights through data science and data analytics using historical NCAA Division 1 men's basketball data. This data, sourced from KenPom, goes as far back as 2002 and is updated with the latest 2025 data. This dataset is meticulously structured to provide every piece of information that I could pull from this site as an open-source tool for March Madness analysis.

    Key features of the dataset include:

    • Historical Data: Provides all historical KenPom data from 2002 to 2025 from the Efficiency, Four Factors (Offense & Defense), Point Distribution, Height/Experience, and Misc. Team Stats endpoints from KenPom's website. Please note that the Height/Experience data only goes as far back as 2007, but every other source contains data from 2002 onward.
    • Data Granularity: This dataset features an individual line item for every NCAA Division 1 men's basketball team in every season that contains every KenPom metric that you can possibly think of. This dataset has the ability to serve as a single source of truth for your March Madness analysis and provide you with the granularity necessary to perform any type of analysis you can think of.
    • 2025 Tournament Insights: Contains all seed and region information for the 2025 NCAA March Madness tournament. Please note that I will continually update this dataset with the seed and region information for previous tournaments as I continue to work on this dataset.

    These datasets were created by downloading the raw CSV files for each season from the various sections of KenPom's website (Efficiency, Offense, Defense, Point Distribution, Summary, Miscellaneous Team Stats, and Height). All of these raw files were uploaded to Domo and imported into a dataflow using Domo's Magic ETL. In these dataflows, the column headers for each previous season were standardized to the current 2025 naming structure so that all of the historical data can be viewed under the exact same field names. The cleaned datasets were then appended together, and some additional clean-up took place before creating the intermediate (INT) datasets that are uploaded to this Kaggle dataset.

    Once all of the INT datasets were created, I joined all of the tables together on team name and season so that all of these metrics can be viewed in a single view. From there, I joined an NCAAM Conference & ESPN Team Name Mapping table to add each conference's full name and acronym as well as the team name that ESPN currently uses. Please note that this reference table is an aggregated view of all of the conferences a team has been a part of since 2002 and the different team names that KenPom has used historically, so it is necessary for mapping all of the teams properly and for differentiating historical conferences from current ones.

    I then joined a reference table that includes all of the current NCAAM coaches and their active coaching lengths, because the active coaching length typically correlates with a team's success in the March Madness tournament. I also joined a reference table of the historical post-season tournament teams in the March Madness, NIT, CBI, and CIT tournaments, and another reference table that flags the teams ranked in the top 12 of the AP Top 25 during week 6 of the respective NCAA season. After some additional data clean-up, all of this cleaned data is exported into the "DEV _ March Madness" file, which contains the consolidated view of all of this data.
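    The standardize, append, and join steps described above can be sketched in plain Python. This is only an illustration of the general technique, not the author's Domo Magic ETL dataflow: the field names (team, season, adj_off, avg_height), the legacy header spellings in RENAMES, and the sample rows are all made up, and the join keeps every team-season found in any table, which is an assumption about the join type.

```python
from collections import defaultdict

# Hypothetical mapping from older header spellings to the current field names.
RENAMES = {"TeamName": "team", "Year": "season"}


def standardize(rows):
    """Rename legacy headers so every season shares the same field names."""
    return [{RENAMES.get(key, key): value for key, value in row.items()} for row in rows]


def append_seasons(*season_tables):
    """Stack the standardized per-season tables into one table."""
    combined = []
    for table in season_tables:
        combined.extend(standardize(table))
    return combined


def join_on_team_season(*tables):
    """Merge tables on (team, season), keeping keys that appear in any table."""
    merged = defaultdict(dict)
    for table in tables:
        for row in table:
            merged[(row["team"], row["season"])].update(row)
    return [merged[key] for key in sorted(merged)]


# Made-up rows standing in for the raw per-season CSV exports.
efficiency_2024 = [{"TeamName": "Team A", "Year": 2024, "adj_off": 118.2}]
efficiency_2025 = [{"team": "Team A", "season": 2025, "adj_off": 120.1}]
height_2025 = [{"team": "Team A", "season": 2025, "avg_height": 77.4}]

efficiency = append_seasons(efficiency_2024, efficiency_2025)
height = append_seasons(height_2025)
combined = join_on_team_season(efficiency, height)
```

    Here combined holds one record per team and season, and a season with no height row simply lacks the height field, which mirrors the note that Height/Experience data starts later than the other sources.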

    This dataset provides users with the flexibility to export data for further analysis in platforms such as Domo, Power BI, Tableau, Excel, and more. This dataset is designed for users who wish to conduct their own analysis, develop predictive models, or simply gain a deeper understanding of the intricacies that result in the excitement that Division 1 men's college basketball provides every year in March. Whether you are using this dataset for academic research, personal interest, or professional interest, I hope this dataset serves as a foundational tool for exploring the vast landscape of college basketball's most riveting and anticipated event of its season.

  2. Superstore Sales Analysis

    • kaggle.com
    Updated Oct 21, 2023
    Cite
    Ali Reda Elblgihy (2023). Superstore Sales Analysis [Dataset]. https://www.kaggle.com/datasets/aliredaelblgihy/superstore-sales-analysis/code
    Dataset updated
    Oct 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ali Reda Elblgihy
    Description

    Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:

    1- Data Import and Transformation:

    • Gather and import relevant sales data from various sources into Excel.
    • Utilize Power Query to clean, transform, and structure the data for analysis.
    • Merge and link different data sheets to create a cohesive dataset, ensuring that all data fields are connected logically.

    2- Data Quality Assessment:

    • Perform data quality checks to identify and address issues like missing values, duplicates, outliers, and data inconsistencies.
    • Standardize data formats and ensure that all data is in a consistent, usable state.

    3- Calculating COGS:

    • Determine the Cost of Goods Sold (COGS) for each product sold by considering factors like purchase price, shipping costs, and any additional expenses.
    • Apply appropriate formulas and calculations to determine COGS accurately.

    4- Discount Analysis:

    • Analyze the discount values offered on products to understand their impact on sales and profitability.
    • Calculate the average discount percentage, identify trends, and visualize the data using charts or graphs.

    5- Sales Metrics:

    • Calculate and analyze various sales metrics, such as total revenue, profit margins, and sales growth.
    • Utilize Excel functions to compute these metrics and create visuals for better insights.

    6- Visualization:

    • Create visualizations, such as charts, graphs, and pivot tables, to present the data in an understandable and actionable format.
    • Visual representations can help identify trends, outliers, and patterns in the data.

    7- Report Generation:

    • Compile the findings and insights into a well-structured report or dashboard, making it easy for stakeholders to understand and make informed decisions.

    Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
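    The project itself uses Excel and Power Query, but the arithmetic behind the quality checks, COGS, discount, and sales-metric steps can be sketched in Python. Everything below is an assumption made for illustration: the field names, the sample order lines, and the COGS definition (purchase price times quantity plus shipping) are not taken from the dataset.

```python
from statistics import mean

# Made-up order lines; every field name is a hypothetical stand-in.
orders = [
    {"order_id": "A1", "unit_price": 50.0, "quantity": 2, "discount_pct": 0.10,
     "purchase_price": 30.0, "shipping_cost": 4.0},
    {"order_id": "A2", "unit_price": 20.0, "quantity": 5, "discount_pct": 0.00,
     "purchase_price": 12.0, "shipping_cost": 5.0},
    # Duplicate of A2, dropped by the quality check.
    {"order_id": "A2", "unit_price": 20.0, "quantity": 5, "discount_pct": 0.00,
     "purchase_price": 12.0, "shipping_cost": 5.0},
    # Missing unit price, dropped by the quality check.
    {"order_id": "A3", "unit_price": None, "quantity": 1, "discount_pct": 0.05,
     "purchase_price": 8.0, "shipping_cost": 2.0},
]


def clean(rows):
    """Drop rows with missing values and repeated order ids."""
    seen, kept = set(), []
    for row in rows:
        if row["order_id"] in seen or any(value is None for value in row.values()):
            continue
        seen.add(row["order_id"])
        kept.append(row)
    return kept


def add_metrics(row):
    """Attach discount value, revenue, COGS, and profit to one order line."""
    gross = row["unit_price"] * row["quantity"]
    discount_value = gross * row["discount_pct"]
    revenue = gross - discount_value
    # Assumed COGS definition: purchase cost of the units plus shipping.
    cogs = row["purchase_price"] * row["quantity"] + row["shipping_cost"]
    return {**row, "discount_value": discount_value, "revenue": revenue,
            "cogs": cogs, "profit": revenue - cogs}


enriched = [add_metrics(row) for row in clean(orders)]
total_revenue = sum(row["revenue"] for row in enriched)
total_profit = sum(row["profit"] for row in enriched)
profit_margin = total_profit / total_revenue
avg_discount_pct = mean(row["discount_pct"] for row in enriched)
```

    Keeping the cleaning step separate from the metric step means the totals are computed only from rows that passed the quality checks.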

  3. Business Intelligence Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Jan 30, 2025
    Cite
    Pro Market Reports (2025). Business Intelligence Market Report [Dataset]. https://www.promarketreports.com/reports/business-intelligence-market-9138
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jan 30, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Business Intelligence Market was valued at USD 33.12 billion in 2024 and is projected to reach USD 70.38 billion by 2033, with an expected CAGR of 11.37% during the forecast period. The Business Intelligence (BI) market is witnessing significant growth as organizations increasingly rely on data-driven strategies to enhance decision-making and operational efficiency. With the rising adoption of cloud computing, big data analytics, and artificial intelligence, BI tools are evolving to provide real-time insights and predictive analytics. Companies across industries, including healthcare, retail, finance, and manufacturing, are leveraging BI solutions to optimize processes, improve customer experiences, and gain a competitive edge. The market is fueled by the need for self-service analytics, data visualization, and integration of BI platforms with enterprise resource planning (ERP) and customer relationship management (CRM) systems. Additionally, advancements in machine learning and automation are further enhancing BI capabilities, enabling businesses to extract actionable insights from vast datasets. Small and medium-sized enterprises (SMEs) are also adopting BI solutions to streamline operations and enhance agility. However, challenges such as data security concerns, high implementation costs, and integration complexities persist. As organizations continue prioritizing digital transformation, the BI market is expected to expand further, with innovations in augmented analytics and embedded BI shaping its future landscape.

    Recent developments include:

    • January 2023: Microsoft unveiled enhanced Power BI experiences in Microsoft Teams. The three new features announced are rich broadcasting cards for Conversation in Microsoft Teams and an upgrade for old Power BI tabs for taking notes and learning from experiences and needs.
    • December 2022: Tableau 2022.4 was released for customers and researchers to explore information. It automates creating, analyzing, and communicating insights through data stories, including Data Change Radar, Information Guide, and Explaining the Viz.
    • October 2022: Oracle increased its data and analytics capabilities to empower business users. With the additions to Oracle Fusion Analytics for ERP, CX, HCM, and SCM data analysis, business users can track performance against corporate objectives using visualizations, KPIs, and analytics.

    Key drivers for this market are:

    • Growing Volume of Data: The increasing generation of data from various sources drives the need for effective data management and analysis capabilities.
    • Demand for Real-Time Insights: Businesses require real-time data insights to make timely decisions and respond to market changes effectively.
    • Adoption of Cloud-Based Solutions: Cloud-based BI solutions offer flexibility, cost-effectiveness, and scalability, driving their adoption.

    Potential restraints include:

    • Data Security and Privacy Concerns: The handling and storage of sensitive data raise concerns about data breaches and privacy violations.
    • Integration Complexity: Integrating BI systems with other enterprise applications and data sources can be complex and time-consuming.
    • Skill Shortage: The lack of skilled professionals with expertise in data analysis and business intelligence poses a challenge.

    Notable trends are:

    • Cognitive BI: BI tools are incorporating cognitive technologies to automate data analysis and provide personalized insights.
    • Predictive Analytics: BI platforms are leveraging predictive analytics to anticipate future events and trends.
    • Self-Service BI: Self-service BI empowers business users to create their own reports and analyses without the need for technical assistance.
    • Natural Language Processing (NLP): NLP capabilities enable users to interact with BI tools using natural language queries.
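    For readers who want to work with the growth figures, a compound annual growth rate (CAGR) relates a starting value, an ending value, and a number of compounding periods. The sketch below uses the standard CAGR formula; the function names are hypothetical, and the period count is left as a parameter because the listing does not say which years the rate is compounded over.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate over the given number of periods."""
    return (end_value / start_value) ** (1 / periods) - 1


def project(start_value, rate, periods):
    """Value reached after compounding at a fixed rate for the given periods."""
    return start_value * (1 + rate) ** periods


# Round illustrative numbers, not figures from the report.
example_rate = cagr(100.0, 121.0, periods=2)
example_value = project(100.0, 0.10, periods=2)
```

    The two functions are inverses of each other, so projecting a start value forward at the computed rate returns the end value. The report's own start and end values can be passed in once the compounding window is known.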

  4. Los Angeles Census Tracts (500 Cities): Local Data for Better Health, 2017...

    • metropolis.demo.socrata.com
    csv, xlsx, xml
    Updated May 12, 2018
    Cite
    Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Population Health (2018). Los Angeles Census Tracts (500 Cities): Local Data for Better Health, 2017 release for Power BI OData Demo [Dataset]. https://metropolis.demo.socrata.com/Health/Los-Angeles-Census-Tracts-500-Cities-Local-Data-fo/5tyu-tf6k
    Available download formats: xlsx, xml, csv
    Dataset updated
    May 12, 2018
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Authors
    Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Population Health
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Area covered
    Los Angeles
    Description

    This is the filtered dataset of LA Census Tracts from the 500 Cities project 2017 release. This dataset includes 2015 and 2014 model-based small area estimates for 27 measures of chronic disease related to unhealthy behaviors (5), health outcomes (13), and use of preventive services (9). Data were provided by the Centers for Disease Control and Prevention (CDC), Division of Population Health, Epidemiology and Surveillance Branch. The project was funded by the Robert Wood Johnson Foundation (RWJF) in conjunction with the CDC Foundation. It represents a first-of-its-kind effort to release information on a large scale for cities and for small areas within those cities. It includes estimates for the 500 largest US cities and approximately 28,000 census tracts within these cities. These estimates can be used to identify emerging health problems and to inform development and implementation of effective, targeted public health prevention activities. Because the small area model cannot detect effects due to local interventions, users are cautioned against using these estimates for program or policy evaluations. Data sources used to generate these measures include Behavioral Risk Factor Surveillance System (BRFSS) data (2015 and 2014), Census Bureau 2010 census population data, and American Community Survey (ACS) 2011-2015 and 2010-2014 estimates. Because some questions are only asked every other year in the BRFSS, there are 7 measures from the 2014 BRFSS that are the same in the 2017 release as in the previous 2016 release. More information about the methodology can be found at www.cdc.gov/500cities.

  5. US Business Intelligence Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Jan 8, 2025
    Cite
    Pro Market Reports (2025). US Business Intelligence Market Report [Dataset]. https://www.promarketreports.com/reports/us-business-intelligence-market-8130
    Available download formats: pdf, ppt, doc
    Dataset updated
    Jan 8, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    United States, Global
    Variables measured
    Market Size
    Description

    The US Business Intelligence Market was valued at USD 19,942.01 million in 2023 and is projected to reach USD 38,369.43 million by 2032, with an expected CAGR of 9.80% during the forecast period. Business Intelligence (BI) refers to the technologies, processes, and practices used to collect, analyze, and present business data in a meaningful way to support decision-making within an organization. BI involves a wide range of tools and techniques, including data mining, reporting, performance management, analytics, and querying, to convert raw data into actionable insights. By integrating data from various sources such as internal databases, external data providers, and cloud platforms, BI enables companies to gain a comprehensive view of their operations, market trends, customer behavior, and financial performance. The market's growth is driven by factors such as the increasing adoption of data-driven decision-making, the need for real-time insights, and advancements in artificial intelligence (AI) and machine learning (ML) technologies. The market benefits from the integration of BI with other technologies such as cloud computing, big data, and the Internet of Things (IoT). Additionally, government initiatives promoting data transparency and accountability, as well as rising data security concerns, are contributing to the growth of the US Business Intelligence Market.

    Recent developments include:

    • In January 2023, Microsoft launched Power BI in Microsoft Teams to enhance user experiences. The announcements include three new features: rich broadcast cards for Chat in Microsoft Teams, an update for classic Power BI tabs for Channels 2.0, and listening to and learning from experiences and requirements.
    • In December 2022, Tableau released its improved Tableau 2022.4 for business users and analysts to discover insights. It automates the creation, analysis, and communication of insights through data stories like Data Change Radar, Data Guide, and Explain the Viz.
    • In November 2022, Qlik introduced a new cloud-based data integration platform. The sophisticated platform as a service brings together catalog capabilities and data preparation in one place. The new integration enables firms to do real-time data analysis. The advanced platform includes a number of services that combine to form a data fabric, connecting data sources and providing an organization with an integrated view of its data.

    Notable trends are: Increased capital infusion promotes market growth.

  6. Open Database Connectivity Market Report

    • promarketreports.com
    doc, pdf, ppt
    Updated Jan 6, 2025
    Cite
    Pro Market Reports (2025). Open Database Connectivity Market Report [Dataset]. https://www.promarketreports.com/reports/open-database-connectivity-market-7981
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jan 6, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The ODBC market offers a range of products to cater to diverse customer requirements. Multi-tier ODBC drivers are designed for complex data environments that require connectivity to multiple databases simultaneously. Single-tier ODBC drivers are suitable for simpler data environments where connectivity to a single database is sufficient. Cloud-based ODBC solutions provide the benefits of cloud computing, such as scalability, flexibility, and ease of maintenance. On-premise ODBC solutions offer greater control and customization options for organizations with specific data management requirements.

    Recent developments include:

    • April 2023: Amazon DocumentDB (with MongoDB compatibility) is a scalable, highly durable, fully managed database service for running mission-critical MongoDB workloads. Amazon DocumentDB announced a new ODBC connector that allows Microsoft Excel and Power BI to connect to Amazon DocumentDB clusters. With the ODBC connector, anyone can now query and view data stored in DocumentDB from programs that allow ODBC access.
    • February 2021: Dremio, a pioneer in the field of data lake transformation, announced support for Apache Arrow Flight, the open-source data networking technology that Dremio co-developed and that dramatically increases data transmission rates. Client applications can now communicate with Dremio's data lake service more swiftly than they could using technologies more than ten years old, such as Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC).


March Madness Historical DataSet (2002 to 2025)

March Madness Analytics: Insights and Projections using Historical KenPom Data
