74 datasets found
  1. Billionaires Statistics (2023) Dataset

    • cubig.ai
    Updated Apr 4, 2023
    Cite
    CUBIG (2023). Billionaires Statistics (2023) Dataset [Dataset]. https://cubig.ai/store/products/552/billionaires-statistics-2023-dataset
    Explore at:
    Dataset updated
    Apr 4, 2023
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Billionaires Statistics Dataset (2023) is a comprehensive set of personal and business information, including rankings of billionaires worldwide, net assets, industries, businesses, nationalities, birth and residence information, and asset sources.

    2) Data Utilization
    (1) Billionaires Statistics Dataset (2023) has characteristics that:
    • The dataset consists of more than 35 columns, including the billionaire's rank, final Worth, industry, country, age, country of residence, source of assets, related industries, citizenship, organization, selfMade, birth information, data collection date, economic and social indicators (GDP, CPI, education enrollment, life expectancy, tax revenue, population, etc.).
    • In addition to individual asset information, economic indicators and demographic data by country are combined, allowing a three-dimensional analysis of billionaires and each country's economic and social environment.
    (2) Billionaires Statistics Dataset (2023) can be used to:
    • Wealth Distribution and Industry Analysis: Using billionaires' net worth, industry, and national data, we can analyze global wealth concentration and wealth distribution by industry and region.
    • A study linking demographics and economic indicators: Billionaire data can be combined with various economic and social indicators such as GDP, CPI, tax revenue, education, and life expectancy to be used for in-depth research on wealth formation, social background, ratio of self-made and inherited wealth, and regional characteristics.
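
    The wealth-distribution analysis described above could be sketched roughly as follows. This is a minimal illustration, not the dataset's own tooling: the column names `finalWorth`, `industry`, and `country` are assumed from the column list in the description, and the rows are made-up placeholders rather than real records.

```python
from collections import defaultdict

# Hypothetical rows; names and values are placeholders, not real data.
rows = [
    {"personName": "A", "finalWorth": 1200, "industry": "Technology", "country": "X"},
    {"personName": "B", "finalWorth": 800, "industry": "Technology", "country": "Y"},
    {"personName": "C", "finalWorth": 500, "industry": "Retail", "country": "X"},
    {"personName": "D", "finalWorth": 1500, "industry": "Finance", "country": "Y"},
]


def total_worth_by(rows, key):
    """Sum finalWorth for each distinct value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["finalWorth"]
    return dict(totals)


def share_by(rows, key):
    """Fraction of total net worth held by each group of `key`."""
    totals = total_worth_by(rows, key)
    grand_total = sum(totals.values())
    return {group: worth / grand_total for group, worth in totals.items()}


by_industry = total_worth_by(rows, "industry")
industry_share = share_by(rows, "industry")
by_country = total_worth_by(rows, "country")
```

    Grouping by `country` instead of `industry` gives the regional view; the same helper works for any categorical column.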

  2. 🍕Food Bank🏦of the World🌍

    • kaggle.com
    Updated Nov 9, 2022
    Cite
    Pranav941 (2022). 🍕Food Bank🏦of the World🌍 [Dataset]. https://www.kaggle.com/datasets/pranav941/-world-food-wealth-bank
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 9, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Pranav941
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Structure & Description

    Dataset Structure (image): https://imgur.com/AYzsmYU.jpg

    Context and Inspiration

    I read an article yesterday that got my mind storming. An article published by the World Bank on August 15, 2022 explains it better; it is quoted below.
    I have been working on a project since Feb 2021 that tries to solve this problem; it is listed in my datasets.

    This dataset presents statistics from the past 6-7 decades, covering the production of 150+ unique crops, 50+ livestock elements, and land distribution by usage and population. Aspiring data scientists can try to extract insights that encourage the optimal use and distribution of natural resources.

    August 15, 2022 - World Bank

    Record high food prices have triggered a global crisis that will drive millions more into extreme poverty, magnifying hunger and malnutrition, while threatening to erase hard-won gains in development. The war in Ukraine, supply chain disruptions, and the continued economic fallout of the COVID-19 pandemic are reversing years of development gains and pushing food prices to all-time highs. Rising food prices have a greater impact on people in low- and middle-income countries, since they spend a larger share of their income on food than people in high-income countries. This brief looks at rising food insecurity and World Bank responses to date.

    Please leave an upvote if you found this helpful ☮️

    Hello 👋, if you are enjoying this so far, please check out my other datasets. I would love to hear your support and feedback on them. Thank you!

    <---(❁´◡`❁)--->

    Check out my other Datasets & Notebooks

  3. ‘World's Billionaires’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Sep 30, 2021
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2021). ‘World's Billionaires’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-world-s-billionaires-b643/b54af0be/?iid=010-688&v=presentation
    Explore at:
    Dataset updated
    Sep 30, 2021
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘World's Billionaires’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/seriadiallo1/world-billionaires on 30 September 2021.

    --- Dataset description provided by original source is as follows ---

    The richest people in the world, yearly rank from 2002 to 2021

    This dataset contains 200 rows and 7 columns.

    The World's Billionaires is an annual ranking by documented net worth of the world's wealthiest billionaires compiled and published in March annually by the American business magazine Forbes. The list was first published in March 1987. The total net worth of each individual on the list is estimated and is cited in United States dollars, based on their documented assets and accounting for debt. Royalty and dictators whose wealth comes from their positions are excluded from these lists. This ranking is an index of the wealthiest documented individuals, excluding and ranking against those with wealth that is not able to be completely ascertained. (wikipedia)

    --- Original source retains full ownership of the source dataset ---

  4. Global Country Information 2023

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jun 15, 2024
    Cite
    Nidula Elgiriyewithana (2024). Global Country Information 2023 [Dataset]. http://doi.org/10.5281/zenodo.8165229
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nidula Elgiriyewithana
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This comprehensive dataset provides a wealth of information about all countries worldwide, covering a wide range of indicators and attributes. It encompasses demographic statistics, economic indicators, environmental factors, healthcare metrics, education statistics, and much more. With every country represented, this dataset offers a complete global perspective on various aspects of nations, enabling in-depth analyses and cross-country comparisons.

    Key Features

    • Country: Name of the country.
    • Density (P/Km2): Population density measured in persons per square kilometer.
    • Abbreviation: Abbreviation or code representing the country.
    • Agricultural Land (%): Percentage of land area used for agricultural purposes.
    • Land Area (Km2): Total land area of the country in square kilometers.
    • Armed Forces Size: Size of the armed forces in the country.
    • Birth Rate: Number of births per 1,000 population per year.
    • Calling Code: International calling code for the country.
    • Capital/Major City: Name of the capital or major city.
    • CO2 Emissions: Carbon dioxide emissions in tons.
    • CPI: Consumer Price Index, a measure of inflation and purchasing power.
    • CPI Change (%): Percentage change in the Consumer Price Index compared to the previous year.
    • Currency_Code: Currency code used in the country.
    • Fertility Rate: Average number of children born to a woman during her lifetime.
    • Forested Area (%): Percentage of land area covered by forests.
    • Gasoline_Price: Price of gasoline per liter in local currency.
    • GDP: Gross Domestic Product, the total value of goods and services produced in the country.
    • Gross Primary Education Enrollment (%): Gross enrollment ratio for primary education.
    • Gross Tertiary Education Enrollment (%): Gross enrollment ratio for tertiary education.
    • Infant Mortality: Number of deaths per 1,000 live births before reaching one year of age.
    • Largest City: Name of the country's largest city.
    • Life Expectancy: Average number of years a newborn is expected to live.
    • Maternal Mortality Ratio: Number of maternal deaths per 100,000 live births.
    • Minimum Wage: Minimum wage level in local currency.
    • Official Language: Official language(s) spoken in the country.
    • Out of Pocket Health Expenditure (%): Percentage of total health expenditure paid out-of-pocket by individuals.
    • Physicians per Thousand: Number of physicians per thousand people.
    • Population: Total population of the country.
    • Population: Labor Force Participation (%): Percentage of the population that is part of the labor force.
    • Tax Revenue (%): Tax revenue as a percentage of GDP.
    • Total Tax Rate: Overall tax burden as a percentage of commercial profits.
    • Unemployment Rate: Percentage of the labor force that is unemployed.
    • Urban Population: Percentage of the population living in urban areas.
    • Latitude: Latitude coordinate of the country's location.
    • Longitude: Longitude coordinate of the country's location.

    Potential Use Cases

    • Analyze population density and land area to study spatial distribution patterns.
    • Investigate the relationship between agricultural land and food security.
    • Examine carbon dioxide emissions and their impact on climate change.
    • Explore correlations between economic indicators such as GDP and various socio-economic factors.
    • Investigate educational enrollment rates and their implications for human capital development.
    • Analyze healthcare metrics such as infant mortality and life expectancy to assess overall well-being.
    • Study labor market dynamics through indicators such as labor force participation and unemployment rates.
    • Investigate the role of taxation and its impact on economic development.
    • Explore urbanization trends and their social and environmental consequences.
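
    As one illustration of the first use case, the reported `Density (P/Km2)` column can be cross-checked against `Population` and `Land Area (Km2)`. The sketch below is only a sanity check under assumptions: the record is a made-up placeholder, the 5% tolerance is arbitrary, and the published density may have been computed from total area rather than land area, so a mismatch is a prompt to investigate rather than proof of an error.

```python
def population_density(population, land_area_km2):
    """Persons per square kilometer; None when land area is missing or zero."""
    if not land_area_km2:
        return None
    return population / land_area_km2


def density_mismatch(reported, population, land_area_km2, tolerance=0.05):
    """True when the reported density differs from the recomputed value
    by more than `tolerance` (relative to the recomputed value)."""
    computed = population_density(population, land_area_km2)
    if computed is None:
        return False  # nothing to compare against
    return abs(reported - computed) > tolerance * computed


# Hypothetical record using the column names listed under Key Features.
record = {
    "Country": "Exampleland",
    "Population": 5_000_000,
    "Land Area (Km2)": 100_000,
    "Density (P/Km2)": 50,
}

computed = population_density(record["Population"], record["Land Area (Km2)"])
flagged = density_mismatch(
    record["Density (P/Km2)"], record["Population"], record["Land Area (Km2)"]
)
```

    Here `computed` is 50.0 persons per square kilometer and the record is not flagged, since the reported and recomputed values agree.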

  5. World Population Density

    • global-fistula-hub-ucsf.hub.arcgis.com
    • directrelief.hub.arcgis.com
    • +1 more
    Updated May 20, 2020
    Cite
    Direct Relief (2020). World Population Density [Dataset]. https://global-fistula-hub-ucsf.hub.arcgis.com/items/8d57f7094eb64d58bdb994f6aad72ce6
    Explore at:
    Dataset updated
    May 20, 2020
    Dataset authored and provided by
    Direct Relief (http://directrelief.org/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Description

    This layer was created by Duncan Smith and based on work by the European Commission JRC and CIESIN. A description from his website follows:

    A brilliant new dataset produced by the European Commission JRC and CIESIN Columbia University was recently released: the Global Human Settlement Layer (GHSL). This is the first time that detailed and comprehensive population density and built-up area for the world has been available as open data. As usual, my first thought was to make an interactive map, now online at http://luminocity3d.org/WorldPopDen/

    The World Population Density map is exploratory, as the dataset is very rich and new, and I am also testing out new methods for navigating statistics at both national and city scales on this site. There are clearly many applications of this data in understanding urban geographies at different scales, urban development, sustainability and change over time.

  6. Are Students Ready for a Technology-Rich World? What PISA Studies Tell Us

    • catalog.data.gov
    • s.cnmilf.com
    Updated Mar 30, 2021
    Cite
    U.S. Department of State (2021). Are Students Ready for a Technology-Rich World? What PISA Studies Tell Us [Dataset]. https://catalog.data.gov/dataset/are-students-ready-for-a-technology-rich-world-what-pisa-studies-tell-us
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    U.S. Department of State
    Description

    ICT has profound implications for education, both because ICT can facilitate new forms of learning and because it has become important for young people to master ICT in preparation for adult life. But how extensive is access to ICT in schools and informal settings and how is it used by students? Drawing on data from the OECD’s Programme for International Student Assessment (PISA), Are Students Ready for a Technology-Rich World? What PISA Studies Tell Us, examines whether access to computers for students is equitable across countries and student groups; how students use ICT and what their attitudes are towards ICT; the relationship between students’ access to and use of ICT and their performance in PISA 2003; and the implications for educational policy.

  7. Hindi Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Hindi Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/hindi-healthcare-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Hindi Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Hindi-speaking regions.

    Participant & Chat Overview

    Participants: 200+ native Hindi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Hindi healthcare communication and includes:

    Authentic Naming Patterns: Hindi personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Hindi formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Hindi-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines
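
    To make the format description above concrete, here is a rough sketch of loading one JSON conversation and summarizing it. The record layout is entirely hypothetical: the listing does not publish a schema, so every field name (`conversation_id`, `metadata`, `messages`, `speaker`, `text`) is an assumption, and the message text is an English placeholder rather than real Hindi data.

```python
import json
from collections import Counter

# Hypothetical record layout; every field name below is an assumption.
SAMPLE = """
{
  "conversation_id": "chat-0001",
  "metadata": {"topic": "appointment_scheduling", "region": "IN", "sentiment": "positive"},
  "messages": [
    {"speaker": "customer", "text": "I would like to book an appointment."},
    {"speaker": "agent", "text": "Sure, which day works for you?"},
    {"speaker": "customer", "text": "Tomorrow morning, please."}
  ]
}
"""


def summarize(chat):
    """Return turn count, turns per speaker, and total word count for one chat."""
    messages = chat["messages"]
    return {
        "turns": len(messages),
        "turns_by_speaker": dict(Counter(m["speaker"] for m in messages)),
        "words": sum(len(m["text"].split()) for m in messages),
    }


chat = json.loads(SAMPLE)
summary = summarize(chat)
```

    A summary like this could be compared against the stated ranges (300–700 words and 50–150 turns per chat) once the actual field names are known.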

    Applications


  8. Economic Disparity

    • kaggle.com
    Updated Mar 9, 2024
    Cite
    willian oliveira gibin (2024). Economic Disparity [Dataset]. http://doi.org/10.34740/kaggle/dsv/7802717
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 9, 2024
    Dataset provided by
    Kaggle
    Authors
    willian oliveira gibin
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    These graphs are from Our World in Data:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F00b0f9cc2bd8326c60fd0ea3b5dbe4b7%2Finequality.png?generation=1710013947537354&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F1978511abe249d3081a3a95bae2ef7d5%2Fincome-share-top-1-before-tax-wid-extrapolations.png?generation=1710013977201099&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F2a5a54725f65801ba75b6ab07bc5cb9f%2Fincome-share-top-1-before-tax-wid-extrapolations%20(1).png?generation=1710013994341360&alt=media

    How are incomes and wealth distributed between people? Both within countries and across the world as a whole?

    On this page, you can find all our data, visualizations, and writing relating to economic inequality.

    This evidence demonstrates that inequality in many countries is substantial and, in numerous instances, has been escalating. Global economic inequality is extensive and exacerbated by intersecting disparities in health, education, and various other dimensions.

    However, economic inequality is not uniformly increasing. In many countries, it has declined or remained steady. Furthermore, global inequality – following two centuries of ascent – is presently decreasing as well.

    The significant variations observed across countries and over time are pivotal. They indicate that high and rising inequality is not inevitable and that the current extent of inequality is subject to change.

    About this data

    This data explorer offers various inequality indicators measured according to two distinct definitions of income sourced from different outlets.

    Data from the World Inequality Database pertains to inequality prior to taxes and benefits. Data from the World Bank pertains to either income post taxes and benefits or consumption, contingent on the country and year. For additional details regarding the definitions and methodologies underlying this data, refer to the accompanying article below, where you can also delve into and juxtapose a broader spectrum of indicators from various sources.

  9. Punjabi Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Punjabi Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/punjabi-healthcare-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Punjabi Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Punjabi-speaking regions.

    Participant & Chat Overview

    Participants: 200+ native Punjabi speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Punjabi healthcare communication and includes:

    Authentic Naming Patterns: Punjabi personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Punjabi formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Punjabi-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines

    Applications


  10. Celebrity Net Worth Dataset

    • crawlfeeds.com
    csv, zip
    Updated Oct 2, 2024
    Cite
    Crawl Feeds (2024). Celebrity Net Worth Dataset [Dataset]. https://crawlfeeds.com/datasets/celebrity-net-worth-list-dataset
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Oct 2, 2024
    Dataset authored and provided by
    Crawl Feeds
    License

    https://crawlfeeds.com/privacy_policy

    Description

    The Celebrity Net Worth Dataset offers an in-depth look at the estimated financial assets and wealth of global celebrities, extracted from CelebrityNetWorth.com by Crawl Feeds. This dataset provides the latest available financial data as of January 31, 2022, making it a valuable resource for analyzing the earnings, investments, and overall wealth of prominent figures in various industries such as entertainment, sports, music, and more.

    Key Features:

    • Celebrity Names: Includes a comprehensive list of celebrities from various industries.
    • Net Worth Estimates: Estimated total net worth, including assets and investments, for each celebrity.
    • Industry Classification: Categorizes celebrities into industries such as Movies, Sports, Music, and Media.
    • Source of Wealth: Details on how celebrities accumulated their wealth (e.g., acting, music, endorsements, business ventures).
    • Data Extraction Date: Last extracted on 31st January 2022, providing the most recent snapshot of celebrity finances.
    • CSV Format: Structured dataset available in CSV format, allowing seamless integration with data analysis tools.

    Applications:

    • Wealth Analysis: Conduct detailed studies on celebrity wealth, growth trends, and earnings potential.
    • Industry Comparisons: Compare net worth estimates across different industries and categories.
    • Financial Forecasting: Use historical data to predict future net worth trends for celebrities.
    • Investment Insights: Identify key areas where celebrities invest or accumulate wealth.
    • Market Research: Gain insights into the financial power of celebrity endorsements and their influence on industries.

    For access to more updated celebrity net worth datasets, reach out to the Crawl Feeds team for further assistance.

  11. The Real World Worry Waves Dataset

    • osf.io
    Updated Aug 12, 2023
    Cite
    Isabelle van der Vegt; Bennett Kleinberg (2023). The Real World Worry Waves Dataset [Dataset]. http://doi.org/10.17605/OSF.IO/9B85R
    Explore at:
    Dataset updated
    Aug 12, 2023
    Dataset provided by
    Center For Open Science
    Authors
    Isabelle van der Vegt; Bennett Kleinberg
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Besides far-reaching public health consequences, the COVID-19 pandemic had a significant psychological impact on people around the world. To gain further insight into this matter, we introduce the Real World Worry Waves Dataset (RW3D). The dataset combines rich open-ended free-text responses with survey data on emotions, significant life events, and psychological stressors in a repeated-measures design in the UK over three years (2020: n = 2441, 2021: n = 1716 and 2022: n = 1152). This paper provides background information on the data collection procedure, the recorded variables, participants’ demographics, and higher-order psychological and text-derived variables that emerged from the data. The RW3D is a unique primary data resource that could inspire new research questions on the psychological impact of the pandemic, especially those that connect modalities (here: text data, psychological survey variables and demographics) over time.

  12. OLAS Population-based Water Stress and Risk Dataset for Latin America and...

    • data.iadb.org
    • splitgraph.com
    csv
    Updated May 8, 2025
    Cite
    IDB Datasets (2025). OLAS Population-based Water Stress and Risk Dataset for Latin America and the Caribbean [Dataset]. http://doi.org/10.60966/pb1wfxl0
    Explore at:
    Available download formats: csv (69660117)
    Dataset updated
    May 8, 2025
    Dataset provided by
    Inter-American Development Bank (http://www.iadb.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 2020
    Area covered
    Latin America
    Description

    LAC is the most water-rich region in the world by most metrics; however, water resource distribution throughout the region does not correspond to demand. To understand water risk throughout the region, this dataset provides population and land area estimates for factors related to water risk, allowing users to explore vulnerability throughout the region to multiple dimensions of water risk. This dataset contains estimates of populations living in areas of water stress and risk in 27 countries in Latin America and the Caribbean (LAC) at the municipal level. The dataset contains categories of 18 factors related to water risk and 39 indices of water risk, with population estimates within each; aggregations are possible at the basin, state, country, and regional level. The population data used to generate this dataset were obtained from the WorldPop project 2020 UN-adjusted population projections, while estimates of water stress and risk come from WRI’s Aqueduct 3.0 Water Risk Framework. Municipal administrative boundaries are from the Database of Global Administrative Areas (GADM). For more information on the methodology, users are invited to read IADB Technical Note IDB-TN-2411: “Scarcity in the Land of Plenty”, and WRI’s “Aqueduct 3.0: Updated Decision-relevant Global Water Risk Indicators”.

  13. Health Nutrition and Population Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Health Nutrition and Population Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/health-nutrition-and-population-data-package/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    This data package contains data on key health, education, nutrition, and population statistics gathered from different international sources.

  14. Synthetic Ride-Sharing Dataset for Dynamic Pricing

    • kaggle.com
    Updated Jan 15, 2025
    Cite
    Vasupradha Veerapaneni (2025). Synthetic Ride-Sharing Dataset for Dynamic Pricing [Dataset]. https://www.kaggle.com/datasets/vasupradha2003/synthetic-ride-sharing-dataset-for-dynamic-pricing
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vasupradha Veerapaneni
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset was synthetically generated to simulate ride-sharing pricing dynamics. It includes features such as Distance, Time of Day, Demand, Weather, Base Price, Weather Multiplier, and Final Price. The dataset aims to model real-world scenarios for ride-sharing services, providing a rich resource for machine learning, data analysis, and predictive modeling tasks.

    • Distance: Distance of the ride in kilometers.
    • Time_of_Day: Time of the ride, categorized into Morning, Afternoon, Evening, and Night.
    • Demand: A demand score indicating ride demand during the time period.
    • Weather: Categorical variable representing weather conditions (e.g., Clear, Rainy, Snowy).
    • Base_Price: Base price for the ride in local currency.
    • Weather_Multiplier: A multiplier applied to the base price based on weather conditions.
    • Price: Final price of the ride after applying the weather multiplier.

    Note: This dataset is static and will not be updated regularly.
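
    The column descriptions above suggest that Price is Base_Price scaled by Weather_Multiplier. A minimal Python sketch of a consistency check under that assumption follows. The listing does not say whether Demand or Time_of_Day also enter the final price, so the simple product is an assumption, and the two sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io

# Made-up sample rows using the column names from the listing (not real data).
SAMPLE = """Distance,Time_of_Day,Demand,Weather,Base_Price,Weather_Multiplier,Price
5.0,Morning,3,Clear,10.0,1.0,10.0
8.0,Night,7,Rainy,16.0,1.25,20.0
"""


def recompute_price(row):
    """Assumed pricing rule: Price = Base_Price * Weather_Multiplier."""
    return float(row["Base_Price"]) * float(row["Weather_Multiplier"])


def check_rows(csv_text, tol=1e-6):
    """Return the indices of rows whose Price differs from the assumed rule."""
    mismatches = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        if abs(recompute_price(row) - float(row["Price"])) > tol:
            mismatches.append(i)
    return mismatches


if __name__ == "__main__":
    print(check_rows(SAMPLE))
```

    If many rows of the real file fail this check, the product rule is probably incomplete, and the other columns would be the first place to look.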

  15. Mapping Ocean Wealth Explorer

    • niue-data.sprep.org
    • pacificdata.org
    • +14more
    pdf
    Updated Feb 20, 2025
    Cite
    Secretariat of the Pacific Regional Environment Programme (2025). Mapping Ocean Wealth Explorer [Dataset]. https://niue-data.sprep.org/dataset/mapping-ocean-wealth-explorer
    Explore at:
    Available download formats: pdf (12573434)
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0, https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Pacific Region
    Description

    The Mapping Ocean Wealth data viewer is a live online resource for sharing understanding of the value of marine and coastal ecosystems to people. It includes global maps, regionally specific studies, reference data, and a number of “apps” providing key data analytics. Maps and apps can be opened according to key themes or geographies. The navigator to the left of the maps enables you to add or remove any additional map layers as you explore. Information keys explain how the maps were made and provide additional links. Further information and resources can be found on Oceanwealth.org.

    • Recreation and Tourism App - Explore the value of healthy ecosystems to the tourism industry
    • Natural Coastal Protection App - Discover the coastal protection benefits of coral reefs around the world
    • Blue Carbon App - View Mangrove Carbon Storage
    • Coral Reef Fisheries App - Learn about the status of coral reef fisheries
    • Regional Planning
    • Mangrove Restoration
  16. Decomposing World Income Distribution Database

    • datacatalog.worldbank.org
    excel, pdf
    Cite
    Roula I. Yazigi, Decomposing World Income Distribution Database [Dataset]. https://datacatalog.worldbank.org/search/dataset/0041692/decomposing-world-income-distribution-database
    Explore at:
    Available download formats: excel, pdf
    Dataset provided by
    Roula I. Yazigi
    License

    https://datacatalog.worldbank.org/public-licenses?fragment=cc

    Area covered
    World
    Description

    Using national income and expenditure distribution data from 119 countries, the authors decompose total income inequality between the individuals in the world, by continent and by "region" (countries grouped by income level). They use a Gini decomposition that allows for an exact breakdown (without a residual term) of the overall Gini by recipients. Looking first at income inequality in income between countries is more important than inequality within countries. Africa, Latin America, and Western Europe and North America are quite homogeneous continents, with small differences between countries (so that most of the inequality on these continents is explained by inequality within countries). Next, the authors divide the world into three groups: the rich G7 countries (and those with similar income levels), the less developed countries (those with per capita income less than or equal to Brazil's), and the middle-income countries (those with per capita income between Brazil's and Italy's). They find little overlap between such groups: very few people in developing countries have incomes in the range of those in the rich countries.
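
    For readers unfamiliar with the measure being decomposed, here is a minimal Python sketch of the overall Gini coefficient for a list of non-negative incomes, using the standard sorted-rank formula. It is not the authors' decomposition by recipients, which the description does not spell out. The function name and the toy inputs are illustrative assumptions.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes via the sorted-rank formula.

    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x_1 <= ... <= x_n and i is the 1-based rank.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    if xs[0] < 0:
        raise ValueError("incomes must be non-negative")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))  # perfectly equal incomes
    print(gini([0, 0, 0, 1]))  # one person holds everything
```

    With perfectly equal incomes the value is 0. When one of n people holds everything it is (n - 1) / n, which approaches 1 as n grows.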

  17. MPIIPrivacEye - Dataset - B2FIND

    • b2find.eudat.eu
    Updated Oct 21, 2023
    + more versions
    Cite
    (2023). MPIIPrivacEye - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/681470ea-d6d4-5b46-b793-d76a0154a991
    Explore at:
    Dataset updated
    Oct 21, 2023
    Description

    First-person video dataset recorded in daily life situations of 17 participants, annotated by themselves for privacy sensitivity. The dataset of Steil et al. contains more than 90 hours of data recorded continuously from 20 participants (six females, aged 22-31) over more than four hours each. Participants were students with different backgrounds and subjects with normal or corrected-to-normal vision. During the recordings, participants roamed a university campus and performed their everyday activities, such as meeting people, eating, or working as they normally would on any day at the university. To obtain some data from multiple, and thus also “privacy-sensitive”, places on the university campus, participants were asked to not stay in one place for more than 30 minutes. Participants were further asked to stop the recording after about one and a half hours so that the laptop’s battery packs could be changed and the eye tracker re-calibrated. This yielded three recordings of about 1.5 hours per participant. Participants regularly interacted with a mobile phone provided to them and were also encouraged to use their own laptop, desktop computer, or music player if desired. The dataset thus covers a rich set of representative real-world situations, including sensitive environments and tasks. The data is only to be used for non-commercial scientific purposes.

  18. Vietnamese Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Vietnamese Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/vietnamese-healthcare-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Vietnamese Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Vietnamese-speaking regions.

    Participant & Chat Overview

    Participants: 150+ native Vietnamese speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Vietnamese healthcare communication and includes:

    Authentic Naming Patterns: Vietnamese personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Vietnamese formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Vietnamese-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines
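
    The listing names the parts of each conversation (message history with speaker labels, participant identifiers, metadata) but not the actual schema. The Python sketch below therefore uses hypothetical names (`Message`, `Conversation`, `speaker`, `text`, `metadata`) to show how a loaded conversation could be checked against the ranges stated above: 50–150 turns and 300–700 words per chat. It illustrates the idea only and is not the vendor's format.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    speaker: str  # hypothetical speaker label, e.g. "agent" or "customer"
    text: str


@dataclass
class Conversation:
    messages: list  # list of Message, in chat order
    metadata: dict = field(default_factory=dict)  # e.g. topic tags, region, sentiment


def summarize(conv):
    """Count turns, whitespace-separated words, and distinct speakers."""
    return {
        "turns": len(conv.messages),
        "words": sum(len(m.text.split()) for m in conv.messages),
        "speakers": sorted({m.speaker for m in conv.messages}),
    }


def within_listed_ranges(summary):
    """Check the ranges quoted in the listing: 50-150 turns, 300-700 words."""
    return 50 <= summary["turns"] <= 150 and 300 <= summary["words"] <= 700


if __name__ == "__main__":
    demo = Conversation(
        messages=[Message("customer", "Xin chào"), Message("agent", "Chào bạn")],
        metadata={"sentiment": "neutral"},
    )
    print(summarize(demo), within_listed_ranges(summarize(demo)))
```

    Whitespace splitting is only a rough word count for Vietnamese. A proper tokenizer would be needed to compare against the stated 300–700 words with any precision.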

  19. Revenue and Distributional Modelling for a UK Wealth Tax, 2020-2021 -...

    • b2find.eudat.eu
    Updated May 6, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). Revenue and Distributional Modelling for a UK Wealth Tax, 2020-2021 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/82c1204a-dc05-5096-af59-a922ea80ed9a
    Explore at:
    Dataset updated
    May 6, 2024
    Area covered
    United Kingdom
    Description

    Advani, Hughson and Tarrant (2021) model the revenue that could be raised from an annual and a one-off wealth tax of the design recommended by Advani, Chamberlain and Summers in the Wealth Tax Commission’s Final Report (2020). This deposit contains the code required to replicate the revenue modelling and distributional analysis. The modelling draws on data from the Wealth and Assets Survey, supplemented with the Sunday Times Rich List, which we use to implement a Pareto correction for the under-coverage of wealth at the top.

    Around the world, the unprecedented public spending required to tackle COVID-19 will inevitably be followed by a debate about how to rebuild public finances. At the same time, politicians in many countries are already facing far-reaching questions from their electorates about the widening cracks in the social fabric that this pandemic has exposed, as prior inequalities become amplified and public services are stretched to their limits. These simultaneous shocks to national politics inevitably encourage people to 'think big' on tax policy. Even before the current crisis there were widespread calls for reforms to the taxation of wealth in the UK. These proposals have so far focused on reforming existing taxes. However, other countries have begun to raise the idea of introducing a 'wealth tax', a new tax on ownership of wealth (net of debt). COVID-19 has rapidly pushed this idea higher up political agendas around the world, but existing studies fall a long way short of providing policymakers with a comprehensive blueprint for whether and how to introduce a wealth tax. Critics point to a number of legitimate issues that would need to be addressed. Would it be fair, and would the public support it? Is this type of tax justified from an economic perspective? How would you stop the wealthiest from hiding their assets? Will they all simply leave? How can you value some assets? What happens to people who own lots of wealth, but have little income with which to pay a wealth tax? And if wealth taxes are such a good idea, why have many countries abandoned them? These are important questions, without straightforward answers.

    The UK government last considered a wealth tax in the mid-1970s. This was also the last time that academics and policymakers in the UK thought seriously about how such a tax could be implemented. Over the past half century, much has changed in the mobility of people, the structure of our tax system, the availability of data, and the scope for digital solutions and coordination between tax authorities. Old plans therefore cannot be pulled 'off the shelf'.

    This project will evaluate whether a wealth tax for the UK would be desirable and deliverable. We will address the following three main research questions: (1) Is a wealth tax justified in principle, on economic or other grounds? (2) How should a wealth tax be designed, including definition of the tax base and solutions to administrative challenges such as valuation and liquidity? (3) What would be the revenue and distributional effects of a wealth tax in the UK, for a variety of design options and at specified rates/thresholds?

    To answer these questions, we will draw on a network of world-leading experts on tax policy from across academia, policy spheres, and legal practice. We will examine international experience, synthesising a large body of existing research originating in countries that already have (or have had) a wealth tax. We will add to these resources through novel research that draws on adjacent fields and disciplines to craft new solutions to the practical problems faced in delivering a wealth tax. We will also review common objections to a wealth tax. These new insights will be published in a series of 'evidence papers' made available directly to the public and policymakers. We will also publish a final report that states key recommendations for government and (if appropriate) delivers a 'ready to legislate' design for a wealth tax. We will not recommend specific rates or thresholds for the tax. Instead, we will create an online 'tax simulator' so that policymakers and members of the public can model the revenue and distributional effects of different options. We will also work with international partners to inform debates about wealth taxes in other countries.

  20. Spanish Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Spanish Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/spanish-healthcare-domain-conversation-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Spanish Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Spanish-speaking regions.

    Participant & Chat Overview

    Participants: 150+ native Spanish speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Spanish healthcare communication and includes:

    Authentic Naming Patterns: Spanish personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Spanish formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Spanish-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines

    Applications


