17 datasets found
  1. 1000 Richest People in the World

    • kaggle.com
    zip
    Updated Jul 28, 2024
    Cite
    Waqar Ali (2024). 1000 Richest People in the World [Dataset]. https://www.kaggle.com/datasets/waqi786/1000-richest-people-in-the-world
    Explore at:
    Available download formats: zip (8652 bytes)
    Dataset updated
    Jul 28, 2024
    Authors
    Waqar Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a synthetic overview of the 1,000 wealthiest individuals in the world, offering insights into the distribution of wealth across industries and regions. It is designed to help analysts, researchers, and data enthusiasts explore global wealth trends, industry dominance, and regional wealth concentration.

    Whether you're conducting market research, financial analysis, or data modeling, this dataset serves as a valuable resource for understanding the characteristics of the world's top billionaires.

    📊 Key Features:

    • Name 👤: The name of the billionaire.
    • Country 🌍: Country of residence or primary business operation.
    • Industry 🏭: Industry in which the individual has built their wealth.
    • Net Worth (in billions) 💵: Estimated net worth in billions of USD.
    • Company 🏢: The primary company or business associated with the billionaire.

    ⚠️ Important Note: This dataset is 100% synthetic and does not contain real financial or personal data. It is artificially generated for educational, analytical, and research purposes.

  2. Forbes billionaires 2022

    • kaggle.com
    zip
    Updated Apr 30, 2022
    Cite
    gsusAguirreArz (2022). Forbes billionaires 2022 [Dataset]. https://www.kaggle.com/datasets/jjdaguirre/forbes-billionaires-2022/code
    Explore at:
    Available download formats: zip (57096 bytes)
    Dataset updated
    Apr 30, 2022
    Authors
    gsusAguirreArz
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset containing the list of 2,500+ people with fortunes valued at 1 billion USD or more.

    Dataset Features

    • Rank
    • Name
    • Net Worth
    • Age
    • Country
    • Source
    • Industry

    Source

    Scraping Python script here

  3. Most valuable media & entertainment brands worldwide 2024

    • statista.com
    • de.statista.com
    Cite
    Julia Faria, Most valuable media & entertainment brands worldwide 2024 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Julia Faria
    Description

    In 2024, Google ranked as the most valuable media and entertainment brand worldwide, with a brand value of 683 billion U.S. dollars. Facebook ranked second, valued at around 167 billion dollars. Part of the Tencent Group, WeChat and v.qq.com (Tencent Video) had a brand value of 56 billion and 17.5 billion dollars, respectively.

  4. World Bank Dataset

    • kaggle.com
    zip
    Updated Oct 20, 2024
    Cite
    Bhadra Mohit (2024). World Bank Dataset [Dataset]. https://www.kaggle.com/datasets/bhadramohit/world-bank-dataset
    Explore at:
    Available download formats: zip (5074 bytes)
    Dataset updated
    Oct 20, 2024
    Authors
    Bhadra Mohit
    License

    https://cdla.io/sharing-1-0/

    Description

    This dataset simulates a set of key economic, social, and environmental indicators for 20 countries over the period from 2010 to 2019. The dataset is designed to reflect typical World Bank metrics, which are used for analysis, policy-making, and forecasting. It includes the following variables:

    • Country Name: The country for which the data is recorded.
    • Year: The specific year of the observation (from 2010 to 2019).
    • GDP (USD): Gross Domestic Product in billions of US dollars, indicating the economic output of a country.
    • Population: The total population of the country in millions.
    • Life Expectancy (in years): The average life expectancy at birth for the country’s population.
    • Unemployment Rate (%): The percentage of the total labor force that is unemployed but actively seeking employment.
    • CO2 Emissions (metric tons per capita): The per capita carbon dioxide emissions, reflecting environmental impact.
    • Access to Electricity (% of population): The percentage of the population with access to electricity, representing infrastructure development.

    Country
    • Description: Name of the country for which the data is recorded.
    • Data Type: String
    • Example: "United States", "India", "Brazil"

    Year
    • Description: The year in which the data is observed.
    • Data Type: Integer
    • Range: 2010 to 2019
    • Example: 2012, 2015

    GDP (USD)
    • Description: The Gross Domestic Product of the country in billions of US dollars, indicating the economic output.
    • Data Type: Float (billions of USD)
    • Example: 14200.56 (represents 14,200.56 billion USD)

    Population
    • Description: The total population of the country in millions.
    • Data Type: Float (millions of people)
    • Example: 331.42 (represents 331.42 million people)

    Life Expectancy (in years)
    • Description: The average number of years a newborn is expected to live, assuming that current mortality rates remain constant throughout their life.
    • Data Type: Float (years)
    • Range: Typically between 50 and 85 years
    • Example: 78.5 years

    Unemployment Rate (%)
    • Description: The percentage of the total labor force that is unemployed but actively seeking employment.
    • Data Type: Float (percentage)
    • Range: Typically between 2% and 25%
    • Example: 6.25%

    CO2 Emissions (metric tons per capita)
    • Description: The amount of carbon dioxide emissions per person in the country, measured in metric tons.
    • Data Type: Float (metric tons)
    • Range: Typically between 0.5 and 20 metric tons per capita
    • Example: 4.32 metric tons per capita

    Access to Electricity (%)
    • Description: The percentage of the population with access to electricity.
    • Data Type: Float (percentage)
    • Range: Typically between 50% and 100%
    • Example: 95.7%
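
    Because GDP is stored in billions of USD and Population in millions of people, a derived figure such as GDP per capita needs a unit conversion first. The following is a minimal sketch of that arithmetic. The row is hypothetical: it combines the example values quoted in the description and is not taken from the dataset itself, and the function name is an illustration only.

```python
# Hypothetical row built from the example values in the column descriptions;
# it is not an actual record from the dataset.
row = {
    "Country": "United States",
    "Year": 2015,
    "GDP (USD)": 14200.56,   # billions of USD
    "Population": 331.42,    # millions of people
}


def gdp_per_capita(gdp_billions: float, population_millions: float) -> float:
    """Convert both figures to base units (USD, people), then divide."""
    return (gdp_billions * 1e9) / (population_millions * 1e6)


per_capita = gdp_per_capita(row["GDP (USD)"], row["Population"])
print(f"{row['Country']} ({row['Year']}): {per_capita:,.0f} USD per person")
```

    Dividing the raw columns directly would give a result that is off by a factor of 1,000, which is why the conversion is made explicit here.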

  5. Unicorn Startups

    • kaggle.com
    zip
    Updated Sep 5, 2022
    Cite
    Ram Jas (2022). Unicorn Startups [Dataset]. https://www.kaggle.com/ramjasmaurya/unicorn-startups
    Explore at:
    Available download formats: zip (37734 bytes)
    Dataset updated
    Sep 5, 2022
    Authors
    Ram Jas
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    "Unicorn" is a term used in the venture capital industry to describe a privately held startup company with a value of over $1 billion. The term was first popularized by venture capitalist Aileen Lee, founder of Cowboy Ventures, a seed-stage venture capital fund based in Palo Alto, California.

    Unicorns can also refer to a recruitment phenomenon within the human resources (HR) sector. HR managers may have high expectations to fill a position, leading them to look for candidates with qualifications that are higher than required for a specific job. In essence, these managers are looking for a unicorn, which leads to a disconnect between their ideal candidate versus who they can hire from the pool of people available.


  6. Social media revenue of selected companies 2023

    • statista.com
    • de.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Social media revenue of selected companies 2023 [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    In 2023, Meta Platforms had a total annual revenue of over 134 billion U.S. dollars, up from 116 billion in 2022. LinkedIn reported its highest annual revenue to date, generating over 15 billion USD, whilst Snapchat reported an annual revenue of 4.6 billion USD.

  7. Major Businesses with Highest Net Assets

    • kaggle.com
    zip
    Updated May 19, 2024
    Cite
    Danish Ammar (2024). Major Businesses with Highest Net Assets [Dataset]. https://www.kaggle.com/datasets/danishammar/major-businesses-with-highest-net-assets
    Explore at:
    Available download formats: zip (202570 bytes)
    Dataset updated
    May 19, 2024
    Authors
    Danish Ammar
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Major Businesses with Highest Net Assets


    Introduction: This dataset provides comprehensive information on the leading companies globally by net assets. It includes various key metrics and identifiers for each company, facilitating detailed analysis and comparisons. The data was gathered from the companies market capital website. The dataset and its columns are described below.

    About Dataset Columns: Below is a detailed description of each column in the dataset:

    1. Rank
    • Description: This column shows the ranking number of the company based on its Net Assets. The rankings are in ascending order, with rank 1 representing the company with the highest market capitalization.
    • Data Type: Integer
    • Example Values: 1, 2, 3, ...

    2. Company
    • Description: This column displays the full name of the company. It helps identify the company being analyzed.
    • Data Type: String
    • Example Values: "Apple Inc.", "Microsoft Corporation", "Amazon.com Inc."

    3. Stock Symbol
    • Description: This column contains the stock symbols (ticker symbols) of the companies, which are used for trading on stock exchanges. This is essential for identifying the company's stock in financial markets.
    • Data Type: String
    • Example Values: "AAPL", "MSFT", "AMZN"

    4. Net Assets (USD) in Billion
    • Description: This column provides the net assets of the company in billion US dollars. Net assets are given here according to company balance sheets.
    • Data Type: Float (to handle large values with precision)
    • Example Values: 2.43, 1.87, 1.76

    5. Share Price
    • Description: This column contains the current share price of the respective company in US dollars. It represents the price at which a single share of the company is traded on the stock market.
    • Data Type: Float
    • Example Values: 145.09, 250.35, 3400.50

    6. Company Origin
    • Description: This column provides the country name where the company is headquartered. It helps in understanding the geographical distribution of the leading companies.
    • Data Type: String
    • Example Values: "United States", "China", "Germany"
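
    As an illustration of how the documented column types might be applied when reading the file, the sketch below parses CSV text into typed records. The exact header names are an assumption inferred from the column list, and the sample rows simply reuse the example values quoted above; they are not real records from the dataset.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass
class Company:
    rank: int             # Integer
    name: str             # String
    symbol: str           # String
    net_assets_bn: float  # Float, billions of USD
    share_price: float    # Float, USD
    origin: str           # String


# Hypothetical CSV text: header names are assumed, rows reuse the
# description's example values and are not real data.
SAMPLE = """Rank,Company,Stock Symbol,Net Assets (USD) in Billion,Share Price,Company Origin
1,Apple Inc.,AAPL,2.43,145.09,United States
2,Microsoft Corporation,MSFT,1.87,250.35,United States
3,Amazon.com Inc.,AMZN,1.76,3400.50,United States
"""


def load_companies(text: str) -> list[Company]:
    """Parse CSV text and cast each field to its documented type."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Company(
            rank=int(r["Rank"]),
            name=r["Company"],
            symbol=r["Stock Symbol"],
            net_assets_bn=float(r["Net Assets (USD) in Billion"]),
            share_price=float(r["Share Price"]),
            origin=r["Company Origin"],
        )
        for r in reader
    ]


companies = load_companies(SAMPLE)
by_origin = Counter(c.origin for c in companies)  # companies per country
print(by_origin)
```

    If the real file uses different header spellings, only the dictionary keys in load_companies would need to change.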

  8. Data Science for Good: Kiva Crowdfunding

    • kaggle.com
    zip
    Updated Mar 2, 2018
    Cite
    Kiva (2018). Data Science for Good: Kiva Crowdfunding [Dataset]. https://www.kaggle.com/datasets/kiva/data-science-for-good-kiva-crowdfunding/code
    Explore at:
    Available download formats: zip (43895508 bytes)
    Dataset updated
    Mar 2, 2018
    Dataset authored and provided by
    Kiva (http://kiva.org/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Kiva.org is an online crowdfunding platform to extend financial services to poor and financially excluded people around the world. Kiva lenders have provided over $1 billion in loans to over 2 million people. In order to set investment priorities, help inform lenders, and understand their target communities, knowing the level of poverty of each borrower is critical. However, this requires inference based on a limited set of information for each borrower.

    In Kaggle Datasets' inaugural Data Science for Good challenge, Kiva is inviting the Kaggle community to help them build more localized models to estimate the poverty levels of residents in the regions where Kiva has active loans. Unlike traditional machine learning competitions with rigid evaluation criteria, participants will develop their own creative approaches to addressing the objective. Instead of making a prediction file as in a supervised machine learning problem, submissions in this challenge will take the form of Python and/or R data analyses using Kernels, Kaggle's hosted Jupyter Notebooks-based workbench.

    Kiva has provided a dataset of loans issued over the last two years, and participants are invited to use this data as well as source external public datasets to help Kiva build models for assessing borrower welfare levels. Participants will write kernels on this dataset to submit as solutions to this objective and five winners will be selected by Kiva judges at the close of the event. In addition, awards will be made to encourage public code and data sharing. With a stronger understanding of their borrowers and their poverty levels, Kiva will be able to better assess and maximize the impact of their work.

    The sections that follow describe in more detail how to participate, win, and use available resources to make a contribution towards helping Kiva better understand and help entrepreneurs around the world.

    Problem Statement

    For the locations in which Kiva has active loans, your objective is to pair Kiva's data with additional data sources to estimate the welfare level of borrowers in specific regions, based on shared economic and demographic characteristics.

    A good solution would connect the features of each loan or product to one of several poverty mapping datasets, which indicate the average level of welfare in a region on as granular a level as possible. Many datasets indicate the poverty rate in a given area, with varying levels of granularity. Kiva would like to be able to disaggregate these regional averages by gender, sector, or borrowing behavior in order to estimate a Kiva borrower’s level of welfare using all of the relevant information about them. Strong submissions will attempt to map vaguely described locations to more accurate geocodes.

    Kernels submitted will be evaluated based on the following criteria:

    1. Localization - How well does a submission account for highly localized borrower situations? Leveraging a variety of external datasets and successfully building them into a single submission will be crucial.

    2. Execution - Submissions should be efficiently built and clearly explained so that Kiva’s team can readily employ them in their impact calculations.

    3. Ingenuity - While there are many best practices to learn from in the field, there is no one way of using data to assess welfare levels. It’s a challenging, nuanced field and participants should experiment with new methods and diverse datasets.

    How to Participate and Make a Submission

    To be considered a participant in the Kiva Crowdfunding Data Science for Good Event, there are a few requirements:

    1. Everyone must register and accept the rules by filling out this form (you'll need to be logged into your Kaggle account to view the form). This ensures you're a participant and also means you'll receive update emails from us about key deadlines and announcements throughout the event.
    2. To submit a kernel for consideration in the main prize track, make sure it's public and submit it here (you'll need to be logged into your Kaggle account to view the form). Read more details here.
    3. To submit a kernel or dataset for consideration in the secondary prize track, all you need to do is make sure it's public and be a registered participant before the deadline.

    Prizes and Eligibility

    There is a total prize pool of $30,000 split into two tracks:

    • Main prize track for the primary event objective: accurate and localized analyses or methods for assessing poverty levels. ($14,000; five winners total)
    • Upvoted kernels and popular datasets to encourage public sharing of code and data ($16,000; 12 winners total)

    Main Prize Track

    Kiva will award $14,000 in total prizes to five winning authors who submit public kernels effectively tackling the objective by the deadline...

  9. Related variety and Unrelated Variety: NUTS-3 Türkiye

    • figshare.com
    xlsx
    Updated Feb 21, 2025
    Cite
    İbrahim Tuğrul Çınar (2025). Related variety and Unrelated Variety: NUTS-3 Türkiye [Dataset]. http://doi.org/10.6084/m9.figshare.28457789.v1
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    İbrahim Tuğrul Çınar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Türkiye
    Description

    I present a new dataset for the related and unrelated variety at the provincial level in Turkey by employing Hidalgo, Klinger, Barabasi, and Hausmann's (2007) proximity approach following Boschma, Minondo, and Navarro (2012 and 2013). The data covers the period between 2007 and 2022.

    I used province-level export data obtained from the Turkish Statistical Institute (TurkStat) classified according to the four-digit Standard International Trade Classification (SITC). Global proximity values for product groups, on the other hand, were calculated by using the International Trade Database at the Product Level (BACI) dataset. First, the HS classification of BACI has been converted to four-digit SITC (revision 3) by using Eurostat Metadata Server (RAMON) conversion tables to provide data compliance.

    In addition, to eliminate the noise resulting from the fluctuations in prices, exchange rates, and seasonality, especially in four-digit export data, Hidalgo (2021) recommends excluding countries that export less than 1 billion dollars, countries with a population of less than 1 million, and products with a global export value below 500 million dollars from the sample. After the data cleaning process, we calculated global proximity values over 139 countries.

    Variables description

    • Column A: year
    • Column B: Province names
    • Column C: Province id
    • Column D: Related Variety Index
    • Column E: Unrelated Variety Index

  10. Personal remittances, received (current US$) - Georgia

    • macro-rankings.com
    csv, excel
    Updated Dec 31, 1997
    + more versions
    Cite
    macro-rankings (1997). Personal remittances, received (current US$) - Georgia [Dataset]. https://www.macro-rankings.com/georgia/personal-remittances-received-(current-us$)
    Explore at:
    Available download formats: excel, csv
    Dataset updated
    Dec 31, 1997
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Georgia
    Description

    Time series data for the statistic Personal remittances, received (current US$) and country Georgia.

    Indicator Definition: Personal remittances comprise personal transfers and compensation of employees. Personal transfers consist of all current transfers in cash or in kind made or received by resident households to or from nonresident households. Personal transfers thus include all current transfers between resident and nonresident individuals. Compensation of employees refers to the income of border, seasonal, and other short-term workers who are employed in an economy where they are not resident and of residents employed by nonresident entities. Data are the sum of two items defined in the sixth edition of the IMF's Balance of Payments Manual: personal transfers and compensation of employees. Data are in current U.S. dollars.

    The indicator "Personal remittances, received (current US$)" stands at 4.06 billion USD as of 12/31/2024. This is a decrease of 3.40 percent compared to the value the year prior.

    • The 1-year change is -3.40 percent.
    • The 3-year change is 53.50 percent.
    • The 5-year change is 79.72 percent.
    • The 10-year change is 104.30 percent.

    The series' long-term average value is 1.46 billion USD. Its latest available value, on 12/31/2024, is 178.69 percent higher than its long-term average value. The series' change from its minimum value, on 12/31/2000, to its latest available value, on 12/31/2024, is +1,870.79%. The series' change from its maximum value, on 12/31/2023, to its latest available value, on 12/31/2024, is -3.40%.
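
    The percentage figures in this description follow from the ordinary percent-change formula. The sketch below reproduces two of them from the rounded values quoted in the text; the function name is illustrative. Because the published inputs are rounded to two decimals, the result for the long-term average comes out near 178 percent rather than exactly matching the reported 178.69.

```python
def pct_change(new: float, old: float) -> float:
    """Percent change from old to new."""
    return (new - old) / old * 100.0


latest = 4.06           # billion USD, as of 12/31/2024 (from the description)
long_term_avg = 1.46    # billion USD (from the description)
one_year_change = -3.40 # percent (from the description)

# How far the latest value sits above the long-term average.
vs_average = pct_change(latest, long_term_avg)

# Invert the formula to recover the implied prior-year value.
prior_year = latest / (1 + one_year_change / 100.0)

print(f"vs. long-term average: {vs_average:.2f}%")
print(f"implied prior-year value: {prior_year:.2f} billion USD")
```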

  11. Personal remittances, received (current US$) - Turkey

    • macro-rankings.com
    csv, excel
    Updated Sep 14, 2025
    + more versions
    Cite
    macro-rankings (2025). Personal remittances, received (current US$) - Turkey [Dataset]. https://www.macro-rankings.com/turkey/personal-remittances-received-(current-us$)
    Explore at:
    Available download formats: excel, csv
    Dataset updated
    Sep 14, 2025
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States, Turkey
    Description

    Time series data for the statistic Personal remittances, received (current US$) and country Turkey.

    Indicator Definition: Personal remittances comprise personal transfers and compensation of employees. Personal transfers consist of all current transfers in cash or in kind made or received by resident households to or from nonresident households. Personal transfers thus include all current transfers between resident and nonresident individuals. Compensation of employees refers to the income of border, seasonal, and other short-term workers who are employed in an economy where they are not resident and of residents employed by nonresident entities. Data are the sum of two items defined in the sixth edition of the IMF's Balance of Payments Manual: personal transfers and compensation of employees. Data are in current U.S. dollars.

    The indicator "Personal remittances, received (current US$)" stands at 0.982 billion USD as of 12/31/2024. This is a decrease of 4.57 percent compared to the value the year prior.

    • The 1-year change is -4.57 percent.
    • The 3-year change is 35.26 percent.
    • The 5-year change is 14.72 percent.
    • The 10-year change is -43.53 percent.

    The series' long-term average value is 2.01 billion USD. Its latest available value, on 12/31/2024, is 51.23 percent lower than its long-term average value. The series' change from its minimum value, on 12/31/2022, to its latest available value, on 12/31/2024, is +41.09%. The series' change from its maximum value, on 12/31/1998, to its latest available value, on 12/31/2024, is -81.67%.

  12. Global Startup Success Dataset

    • kaggle.com
    zip
    Updated Mar 1, 2025
    Cite
    Hamna Kaleem (2025). Global Startup Success Dataset [Dataset]. https://www.kaggle.com/datasets/hamnakaleemds/global-startup-success-dataset
    Explore at:
    Available download formats: zip (131742 bytes)
    Dataset updated
    Mar 1, 2025
    Authors
    Hamna Kaleem
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    📊 Dataset Features

    This dataset includes 5,000 startups from 10 countries and contains 15 key features:

    • Startup Name: Name of the startup
    • Founded Year: Year the startup was founded
    • Country: Country where the startup is based
    • Industry: Industry category (Tech, FinTech, AI, etc.)
    • Funding Stage: Stage of investment (Seed, Series A, etc.)
    • Total Funding ($M): Total funding received (in million $)
    • Number of Employees: Number of employees in the startup
    • Annual Revenue ($M): Annual revenue in million dollars
    • Valuation ($B): Startup's valuation in billion dollars
    • Success Score: Score from 1 to 10 based on growth
    • Acquired?: Whether the startup was acquired (Yes/No)
    • IPO?: Did the startup go public? (Yes/No)
    • Customer Base (Millions): Number of active customers
    • Tech Stack: Technologies used by the startup
    • Social Media Followers: Total followers on social platforms

    Analysis Ideas

    📈 What Can You Do with This Dataset? Here are some exciting analyses you can perform:

    • Predict Startup Success: Train a machine learning model to predict the success score.
    • Industry Trends: Analyze which industries get the most funding.
    • Valuation vs. Funding: Explore the correlation between funding and valuation.
    • Acquisition Analysis: Investigate the factors that contribute to startups being acquired.
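
    For the "Valuation vs. Funding" idea, one possible starting point is a Pearson correlation between the Total Funding ($M) and Valuation ($B) columns. The sketch below computes it with the standard library only. The sample numbers are made up for illustration and are not drawn from the dataset.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values standing in for "Total Funding ($M)" and "Valuation ($B)".
funding_musd = [12.0, 85.0, 40.0, 150.0, 5.0]
valuation_busd = [0.1, 0.9, 0.3, 1.4, 0.05]

r = pearson(funding_musd, valuation_busd)
print(f"Pearson r = {r:.3f}")
```

    Pearson correlation only captures linear association, so a scatter plot or a rank-based measure may be worth checking alongside it.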

  13. Diabetes Health Indicators Dataset

    • kaggle.com
    zip
    Updated Nov 8, 2021
    + more versions
    Cite
    Alex Teboul (2021). Diabetes Health Indicators Dataset [Dataset]. https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset/discussion
    Explore at:
    Available download formats: zip (6324278 bytes)
    Dataset updated
    Nov 8, 2021
    Authors
    Alex Teboul
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Diabetes is among the most prevalent chronic diseases in the United States, impacting millions of Americans each year and exerting a significant financial burden on the economy. Diabetes is a serious chronic disease in which individuals lose the ability to effectively regulate levels of glucose in the blood, and can lead to reduced quality of life and life expectancy. After different foods are broken down into sugars during digestion, the sugars are then released into the bloodstream. This signals the pancreas to release insulin. Insulin helps enable cells within the body to use those sugars in the bloodstream for energy. Diabetes is generally characterized by either the body not making enough insulin or being unable to use the insulin that is made as effectively as needed.

    Complications like heart disease, vision loss, lower-limb amputation, and kidney disease are associated with chronically high levels of sugar remaining in the bloodstream for those with diabetes. While there is no cure for diabetes, strategies like losing weight, eating healthily, being active, and receiving medical treatments can mitigate the harms of this disease in many patients. Early diagnosis can lead to lifestyle changes and more effective treatment, making predictive models for diabetes risk important tools for the public and public health officials.

    The scale of this problem is also important to recognize. The Centers for Disease Control and Prevention has indicated that as of 2018, 34.2 million Americans have diabetes and 88 million have prediabetes. Furthermore, the CDC estimates that 1 in 5 diabetics, and roughly 8 in 10 prediabetics are unaware of their risk. While there are different types of diabetes, type II diabetes is the most common form and its prevalence varies by age, education, income, location, race, and other social determinants of health. Much of the burden of the disease falls on those of lower socioeconomic status as well. Diabetes also places a massive burden on the economy, with diagnosed diabetes costs of roughly $327 billion and total costs with undiagnosed diabetes and prediabetes approaching $400 billion annually.

    Content

    The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventative services. It has been conducted every year since 1984. For this project, a csv of the dataset available on Kaggle for the year 2015 was used. This original dataset contains responses from 441,455 individuals and has 330 features. These features are either questions directly asked of participants, or calculated variables based on individual participant responses.

    This dataset contains 3 files:

    1. diabetes_012_health_indicators_BRFSS2015.csv is a clean dataset of 253,680 survey responses to the CDC's BRFSS2015. The target variable Diabetes_012 has 3 classes. 0 is for no diabetes or only during pregnancy, 1 is for prediabetes, and 2 is for diabetes. There is class imbalance in this dataset. This dataset has 21 feature variables.
    2. diabetes_binary_5050split_health_indicators_BRFSS2015.csv is a clean dataset of 70,692 survey responses to the CDC's BRFSS2015. It has an equal 50-50 split of respondents with no diabetes and with either prediabetes or diabetes. The target variable Diabetes_binary has 2 classes. 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is balanced.
    3. diabetes_binary_health_indicators_BRFSS2015.csv is a clean dataset of 253,680 survey responses to the CDC's BRFSS2015. The target variable Diabetes_binary has 2 classes. 0 is for no diabetes, and 1 is for prediabetes or diabetes. This dataset has 21 feature variables and is not balanced.
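
    The relationship between the three-class target, the binary target, and a balanced 50-50 subset can be sketched as follows. This is only an illustration of the idea (collapse classes 1 and 2 into 1, then randomly downsample the larger class); it is an assumption, not a description of how the author actually produced the files, and the labels below are toy values.

```python
import random
from collections import Counter


def to_binary(label_012: int) -> int:
    """Map Diabetes_012 to Diabetes_binary: 0 stays 0, 1 or 2 becomes 1."""
    return 0 if label_012 == 0 else 1


def balance_binary(labels: list[int], seed: int = 0) -> list[int]:
    """Return sorted row indices holding equal counts of class 0 and class 1."""
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    n = min(len(idx0), len(idx1))  # size of the smaller class
    return sorted(rng.sample(idx0, n) + rng.sample(idx1, n))


# Toy three-class labels, deliberately imbalanced (not real survey data).
labels_012 = [0, 0, 0, 0, 0, 0, 0, 1, 2, 2]
labels_binary = [to_binary(y) for y in labels_012]

kept = balance_binary(labels_binary)
balanced_counts = Counter(labels_binary[i] for i in kept)
print(balanced_counts)
```

    Downsampling discards majority-class rows, so the balanced subset is smaller than the full file; class weighting is a common alternative when all rows should be kept.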

    Explore some of the following research questions:

    1. Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
    2. What risk factors are most predictive of diabetes risk?
    3. Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
    4. Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?

    Acknowledgements

    It is important to reiterate that I did not create this dataset; it is just a cleaned and consolidated dataset created from the BRFSS 2015 dataset already on Kaggle. That dataset can be found here and the notebook I used for the data cleaning can be found here.

    Inspiration

    Zidian Xie et al fo...

  14. Forecast revenue big data market worldwide 2011-2027

    • statista.com
    Updated Mar 15, 2018
    Cite
    Statista (2018). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
    Dataset updated
    Mar 15, 2018
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global big data market is forecasted to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.

    What is Big data?

    Big data is a term that refers to the kind of data sets that are too large or too complex for traditional data processing applications. It is defined as having one or some of the following characteristics: high volume, high velocity or high variety. Fast-growing mobile data traffic, cloud computing traffic, as well as the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.

    Big data analytics

    Advanced analytics tools, such as predictive analytics and data mining, help to extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.

  15. Ukraine Financial Aid 2024 (Russia-Ukraine War)

    • kaggle.com
    zip
    Updated May 19, 2024
    Cite
    jaina (2024). Ukraine Financial Aid 2024 (Russia-Ukraine War) [Dataset]. https://www.kaggle.com/datasets/jainaru/financial-aid-to-ukraine/data
    Explore at:
    Available download formats: zip (3863 bytes)
    Dataset updated
    May 19, 2024
    Authors
    jaina
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Ukraine, Russia
    Description

    This dataset provides a comprehensive overview of the financial, humanitarian, and military commitments and allocations made by various countries, with a particular focus on their contributions to the European Union (EU). The dataset includes data from both EU member countries and non-EU countries, detailing their pledges and actual disbursements in 2021.

    Description of Columns

    Financial Commitments.csv
    1. Country: The name of the country.
    2. EU member: Indicates whether the country is a member of the European Union (1 for EU member, 0 for non-EU member).
    3. GDP in 2021 ($ billion): The Gross Domestic Product (GDP) of the country in 2021.
    4. Financial commitments ($ billion): The pledged financial aid commitments made by the country.
    5. Humanitarian commitments ($ billion): The pledged humanitarian aid commitments made by the country.
    6. Military commitments ($ billion): The pledged military aid commitments made by the country.
    7. Total bilateral commitments ($ billion): The total amount of bilateral commitments (financial, humanitarian, and military) pledged by the country, in billions of dollars. This is the sum of the financial, humanitarian, and military commitments.
    8. Share in EU commitments ($ billion): The country's share in the total EU commitments, in billions of dollars. This indicates how much of the EU's total aid commitments are contributed by this particular country.
    9. Specific weapons and equipment ($ billion): The amount allocated by the country for specific weapons and equipment.
    10. Financial commitments with military purpose ($ billion): The financial commitments specifically designated for military purposes.
    11. Total bilateral commitments of short term ($ billion): The total amount of short-term bilateral commitments made by the country.

    Financial Allocations.csv
    1. Country: The name of the country.
    2. EU member: Indicates whether the country is a member of the European Union (1 for EU member, 0 for non-EU member).
    3. Financial allocations ($ billion): The actual financial aid allocated by the country.
    4. Humanitarian allocations ($ billion): The actual humanitarian aid allocated by the country.
    5. Military allocations ($ billion): The actual military aid allocated by the country.
    6. Total bilateral allocations ($ billion): The total amount of bilateral aid (financial, humanitarian, and military) allocated by the country, in billions of dollars. This is the sum of the financial, humanitarian, and military allocations.
    7. Share in EU allocations ($ billion): The country's share in the total EU allocations, in billions of dollars. This indicates how much of the EU's total aid allocations are contributed by this particular country.
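    Both files describe their total column as the sum of the financial, humanitarian, and military columns, which can be checked row by row. The sketch below assumes the CSV headers match the column names listed above exactly, which the description does not guarantee. The example row is invented, and `total_is_consistent` is a hypothetical helper.

```python
import math

# Column names as given in the description; the real CSV headers may differ.
COMMITMENT_PARTS = [
    "Financial commitments ($ billion)",
    "Humanitarian commitments ($ billion)",
    "Military commitments ($ billion)",
]
COMMITMENT_TOTAL = "Total bilateral commitments ($ billion)"


def total_is_consistent(row, parts, total, tol=0.01):
    """Check that the total column equals the sum of its parts, within tol."""
    expected = sum(float(row[name]) for name in parts)
    return math.isclose(float(row[total]), expected, abs_tol=tol)


# Invented values for illustration only, not taken from the dataset.
example_row = {
    "Country": "Exampleland",
    "Financial commitments ($ billion)": "1.5",
    "Humanitarian commitments ($ billion)": "0.5",
    "Military commitments ($ billion)": "2.0",
    "Total bilateral commitments ($ billion)": "4.0",
}

print(total_is_consistent(example_row, COMMITMENT_PARTS, COMMITMENT_TOTAL))
```

    The tolerance allows for rounding in the published figures. The same function should work for Financial Allocations.csv once the allocation column names are substituted.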

  16. Bangladesh Remittances Dataset (1995–2025)

    • kaggle.com
    zip
    Updated Jul 30, 2025
    Cite
    Hasanul Banna Himel (2025). Bangladesh Remittances Dataset (1995–2025) [Dataset]. https://www.kaggle.com/datasets/hasanulbannahimel/bangladesh-remittances-dataset-19952025
    Explore at:
    Available download formats: zip (1024 bytes)
    Dataset updated
    Jul 30, 2025
    Authors
    Hasanul Banna Himel
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    Bangladesh Remittances Dataset (1995–2025)

    Overview

    This dataset contains annual remittance inflows to Bangladesh covering fiscal years 1995–1996 to 2024–2025 (latest year may include preliminary data).
    It includes remittance amounts in both:
    • Million US dollars (USD)
    • Billion Bangladeshi Taka (BDT)

    Additionally, it features calculated columns to capture yearly changes and long-term growth trends relative to the first year.

    Columns & Descriptions

    | Column name | Description |
    | :--- | :--- |
    | Year | Fiscal year (e.g., 2024–2025). Each row represents one year's data. |
    | Remittances (million USD) | Total remittance inflow to Bangladesh for the year, expressed in million US dollars. |
    | Remittances (billion BDT) | Remittance inflow converted into billion Bangladeshi Taka (BDT), based on average yearly exchange rates. |
    | YoY Change (%) | Year-over-year percentage change in remittance compared to the previous fiscal year. Positive values indicate increase; negative indicate decrease. |
    | Cumulative Growth vs. 1995–1996 (%) | Percentage growth in remittances compared to the base year (1995–1996). Shows how remittance has multiplied over nearly three decades. |
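    The two derived columns can be recomputed from the raw remittance series. The description does not give the exact formulas, so the sketch below assumes the standard percent-change definitions. The input values are invented, and `yoy_change` and `cumulative_growth` are hypothetical helpers.

```python
def yoy_change(values):
    """Percent change versus the previous entry; the first entry has no predecessor."""
    changes = [None]
    for previous, current in zip(values, values[1:]):
        changes.append((current - previous) / previous * 100)
    return changes


def cumulative_growth(values):
    """Percent growth of every entry relative to the first (base-year) entry."""
    base = values[0]
    return [(value - base) / base * 100 for value in values]


# Invented figures in million USD, for illustration only.
remittances = [100.0, 110.0, 99.0]

print(yoy_change(remittances))
print(cumulative_growth(remittances))
```

    With these inputs the third year is down about 10% on the previous year but only about 1% below the base year, which shows why the two columns answer different questions.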

    Time Coverage

    • From: Fiscal year 1995–1996
    • To: Fiscal year 2024–2025 (* may include estimates or projections)

    Potential Uses

    • Visualize long-term remittance growth trends in Bangladesh.
    • Identify periods of rapid growth, stagnation, or decline.
    • Compare remittance inflows in USD vs. BDT across time.
    • Use cumulative growth metrics to analyze economic impact.

    ⚠ Limitations

    • Dataset only includes remittance amounts and derived growth metrics.
    • Does not include supporting context like exchange rates, number of migrants, GDP, or demographic data.
    • Latest year may reflect projections, subject to later revision.

    Note

    This dataset is prepared for exploratory data analysis, economic research, and educational purposes focused on the evolution of remittance inflows to Bangladesh from the mid-1990s to the mid-2020s.

  17. AI Workforce & Automation Dataset (2015–2025)

    • kaggle.com
    zip
    Updated Nov 16, 2025
    Cite
    Emirhan Akkuş (2025). AI Workforce & Automation Dataset (2015–2025) [Dataset]. https://www.kaggle.com/emirhanakku/ai-workforce-and-automation-dataset-20152025
    Explore at:
    Available download formats: zip (7409 bytes)
    Dataset updated
    Nov 16, 2025
    Authors
    Emirhan Akkuş
    Description

    Dataset Overview

    | Attribute | Details |
    | :--- | :--- |
    | Time Span | 2015–2025 |
    | Countries Included | 20 global economies |
    | Total Records | 220 rows |
    | Total Features | 12 quantitative & qualitative attributes |
    | Data Type | Synthetic, statistically coherent |
    | Tools Used | Python (Faker, NumPy, Pandas) |
    | License | CC BY-NC 4.0 – Attribution Non-Commercial |
    | Creator | Emirhan Akkuş – Kaggle Expert |

    This dataset provides a macro-level simulation of how artificial intelligence and automation have transformed global workforce dynamics, productivity growth, and job distribution during the last decade. It is designed for predictive analytics, forecasting, visualization, and policy research applications.

    Data Generation Process

    | Step | Description |
    | :--- | :--- |
    | 1. Initialization | A baseline AI investment and automation rate were defined for each country (between 5–80 billion USD and 10–40%). |
    | 2. Temporal Simulation | Yearly values were simulated for 2015–2025 using exponential and non-linear growth models with controlled noise. |
    | 3. Correlation Modeling | Employment, productivity, and salary were dynamically linked to automation and AI investment levels. |
    | 4. Randomization | Gaussian noise (±2%) was introduced to prevent perfect correlation and ensure natural variability. |
    | 5. Policy Simulation | Synthetic indexes were calculated for AI readiness, policy maturity, and reskilling investment efforts. |
    | 6. Export | Final data were consolidated and exported to CSV using Pandas for easy reproducibility. |

    The dataset was generated to maintain internal coherence — as automation and AI investment increase, employment tends to slightly decline, productivity grows, and reskilling budgets expand proportionally.
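    The generation steps can be illustrated with a small standard-library sketch of one country's AI investment series. This is not the author's script: the compound-growth form, the 10% growth rate, the reading of "±2%" as a standard deviation of 0.02, and the `simulate_series` helper are all assumptions made for illustration.

```python
import random


def simulate_series(baseline, growth_rate, years, noise_sd=0.02, seed=0):
    """Compound growth from a baseline, with multiplicative Gaussian noise per year."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    series = {}
    for offset, year in enumerate(years):
        trend = baseline * (1 + growth_rate) ** offset
        series[year] = trend * (1 + rng.gauss(0, noise_sd))
    return series


# Assumed inputs: a 10 billion USD baseline growing 10% per year.
investment = simulate_series(10.0, 0.10, range(2015, 2026))
for year, value in investment.items():
    print(year, round(value, 2))
```

    Eleven yearly values for each of 20 countries would give the 220 rows reported in the overview table.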

    Column Definitions

    | Column | Description | Value Range / Type |
    | :--- | :--- | :--- |
    | Year | Observation year between 2015–2025 | Integer |
    | Country | Country name | Categorical (20 unique) |
    | AI_Investment_BillionUSD | Annual AI investment (in billions of USD) | Continuous (5–200) |
    | Automation_Rate_Percent | Percentage of workforce automated | Continuous (10–95%) |
    | Employment_Rate_Percent | Percentage of total population employed | Continuous (50–80%) |
    | Average_Salary_USD | Mean annual salary in USD | Continuous (25,000–90,000) |
    | Productivity_Index | Productivity score scaled 0–100 | Continuous |
    | Reskilling_Investment_MillionUSD | Government/corporate reskilling investment | Continuous (100–5,000) |
    | AI_Policy_Index | Policy readiness index (0–1) | Float |
    | Job_Displacement_Million | Estimated number of jobs replaced by automation | Continuous (0–3 million) |
    | Job_Creation_Million | New AI-driven jobs created | Continuous (0–4 million) |
    | AI_Readiness_Score | Composite readiness and adoption index | Continuous (0–100) |

    Each feature is designed to maintain realistic relationships between AI investments, automation, and socio-economic outcomes.
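    The documented value ranges make a convenient sanity check after loading the CSV. The sketch below copies the bounds from the column definitions above. The `out_of_range` helper and the example row are hypothetical, and whether the real file honors every bound is not confirmed by the description.

```python
# Bounds copied from the column definitions; inclusive on both ends (an assumption).
RANGES = {
    "AI_Investment_BillionUSD": (5, 200),
    "Automation_Rate_Percent": (10, 95),
    "Employment_Rate_Percent": (50, 80),
    "Average_Salary_USD": (25_000, 90_000),
    "Productivity_Index": (0, 100),
    "Reskilling_Investment_MillionUSD": (100, 5_000),
    "AI_Policy_Index": (0, 1),
    "Job_Displacement_Million": (0, 3),
    "Job_Creation_Million": (0, 4),
    "AI_Readiness_Score": (0, 100),
}


def out_of_range(row):
    """Return the names of columns in `row` whose value falls outside its documented range."""
    return [
        column
        for column, (low, high) in RANGES.items()
        if column in row and not low <= float(row[column]) <= high
    ]


# Invented row for illustration; values arrive as strings, as csv.DictReader would yield them.
example = {"Year": "2020", "Automation_Rate_Percent": "97.5", "AI_Policy_Index": "0.4"}
print(out_of_range(example))
```

    Columns without a documented numeric range, such as Year and Country, are skipped rather than guessed at.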

    Analytical Applications

    | Application Area | Example Analyses |
    | :--- | :--- |
    | Exploratory Data Analysis (EDA) | Study how AI investment evolves across countries, compare productivity and employment patterns, or compute correlation...


Cite
Waqar Ali (2024). 1000 Richest People in the World [Dataset]. https://www.kaggle.com/datasets/waqi786/1000-richest-people-in-the-world

1000 Richest People in the World

A Comprehensive Dataset of the World's Wealthiest Individuals 🌍

Explore at:
3 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (8652 bytes)
Dataset updated
Jul 28, 2024
Authors
Waqar Ali
License

Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically

Description

This dataset provides a synthetic overview of the 1,000 wealthiest individuals in the world, offering insights into the distribution of wealth across industries and regions. It is designed to help analysts, researchers, and data enthusiasts explore global wealth trends, industry dominance, and regional wealth concentration.

Whether you're conducting market research, financial analysis, or data modeling, this dataset serves as a valuable resource for understanding the characteristics of the world's top billionaires.

📊 Key Features:
• Name 👤: The name of the billionaire.
• Country 🌍: Country of residence or primary business operation.
• Industry 🏭: Industry in which the individual has built their wealth.
• Net Worth (in billions) 💵: Estimated net worth in billions of USD.
• Company 🏢: The primary company or business associated with the billionaire.

⚠️ Important Note: This dataset is 100% synthetic and does not contain real financial or personal data. It is artificially generated for educational, analytical, and research purposes.
