100+ datasets found
  1. 100-richest-people-in-world

    • huggingface.co
    Updated Aug 2, 2023
    Cite
    Nate Raw (2023). 100-richest-people-in-world [Dataset]. https://huggingface.co/datasets/nateraw/100-richest-people-in-world
    Dataset updated
    Aug 2, 2023
    Authors
    Nate Raw
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Area covered
    World
    Description

    Dataset Card for 100 Richest People In World

      Dataset Summary
    

    This dataset contains the list of the top 100 richest people in the world.

    Column information:

    • Name. The person's name.
    • NetWorth. The person's net worth.
    • Age. The person's age.
    • Country. The country the person belongs to.
    • Source. Information source.
    • Industry. Domain of expertise.

      Supported Tasks and Leaderboards
    

    [More Information Needed]

      Languages
    

    [More Information Needed]… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/100-richest-people-in-world.

  2. 1000 Richest People in the World

    • kaggle.com
    zip
    Updated Jul 28, 2024
    Cite
    Waqar Ali (2024). 1000 Richest People in the World [Dataset]. https://www.kaggle.com/datasets/waqi786/1000-richest-people-in-the-world
    Available download formats: zip (8652 bytes)
    Dataset updated
    Jul 28, 2024
    Authors
    Waqar Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset provides a synthetic overview of the 1,000 wealthiest individuals in the world, offering insights into the distribution of wealth across industries and regions. It is designed to help analysts, researchers, and data enthusiasts explore global wealth trends, industry dominance, and regional wealth concentration.

    Whether you're conducting market research, financial analysis, or data modeling, this dataset serves as a valuable resource for understanding the characteristics of the world's top billionaires.

    📊 Key Features:

    • Name 👤: The name of the billionaire.
    • Country 🌍: Country of residence or primary business operation.
    • Industry 🏭: Industry in which the individual has built their wealth.
    • Net Worth (in billions) 💵: Estimated net worth in billions of USD.
    • Company 🏢: The primary company or business associated with the billionaire.

    ⚠️ Important Note: This dataset is 100% synthetic and does not contain real financial or personal data. It is artificially generated for educational, analytical, and research purposes.

  3. Forbes World's Billionaires List 2024

    • kaggle.com
    Updated Aug 9, 2025
    Cite
    Vincent Campanaro (2025). Forbes World's Billionaires List 2024 [Dataset]. http://doi.org/10.34740/kaggle/dsv/12717950
    Dataset updated
    Aug 9, 2025
    Dataset provided by
    Kaggle
    Authors
    Vincent Campanaro
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This comprehensive dataset encapsulates a detailed snapshot of the wealthiest individuals globally, as listed by Forbes in 2024. Compiled through meticulous web scraping and data aggregation, the dataset includes a wide range of attributes for each billionaire. Fields encompass basic personal information such as name, age, and gender, alongside financial details including net worth and sources of wealth. The dataset further delves into aspects like industry involvement, organizational affiliations, philanthropic endeavors, and educational backgrounds.

    Key attributes in this dataset include:

    • Name: Full legal name of the billionaire.
    • Age: Age of the individual.
    • 2024 Net Worth: Estimated net worth in USD for the year 2024.
    • Industry: Primary industry or sector of operation.
    • Source of Wealth: Origin of the billionaire’s wealth.
    • Title: Professional title or position.
    • Organization: Name of the associated organization.
    • Self-Made: Indicator if the wealth is self-made.
    • Self-Made Score: A quantitative score assessing how self-made their wealth is.
    • Philanthropy Score: A score reflecting the extent of their philanthropic activities.
    • Residence: Main residence of the individual.
    • Citizenship: Legal citizenship.
    • Gender: Gender identity.
    • Marital Status: Current marital status.
    • Children: Number of children.
    • Education: Highest level of education attained.

    This dataset is ideal for analysis, offering insights into the distribution of wealth, the influence of education on wealth accumulation, and trends across different industries. It also provides a foundation for exploring the impact of socioeconomic factors on personal wealth. The data were collected and formatted with careful consideration to ensure accuracy, making it a valuable resource for researchers, economists, and anyone interested in the dynamics of wealth and success.

    Please note that some data is missing in this dataset, primarily due to the unavailability of information from Forbes. This issue becomes more prevalent beyond the top 400 entries. Many individuals lack a self-made score, a philanthropy score, or specific details regarding their title or organization as per Forbes' listings. I am currently working to update the dataset with this missing information. However, this update process is quite tedious and time-consuming since it is mostly manual. I appreciate your patience and understanding as I work through these details.
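
    Before relying on the score columns, it may help to measure how much of each one is actually filled in. The following is a minimal sketch using only the Python standard library. The header names are assumed to mirror the attribute list above (the real CSV headers may differ), and the sample rows are placeholders rather than records from the dataset.

```python
import csv
import io


def missing_share(rows, columns):
    """Fraction of rows in which each column is empty or whitespace."""
    rows = list(rows)
    if not rows:
        return {c: 0.0 for c in columns}
    return {
        c: sum(1 for r in rows if not (r.get(c) or "").strip()) / len(rows)
        for c in columns
    }


# Placeholder rows; header names are assumed, not confirmed by the source.
SAMPLE_CSV = """Name,Self-Made Score,Philanthropy Score
Person A,8,2
Person B,,1
Person C,,
Person D,5,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = missing_share(rows, ["Self-Made Score", "Philanthropy Score"])
print(shares)
```

    Running the same function separately on the top 400 rows and on the remainder would be one way to check the note above that gaps become more common further down the list.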

  4. Billionaires dataset cleaned

    • kaggle.com
    zip
    Updated Feb 24, 2024
    Cite
    Javier_SAB (2024). Billionaires dataset cleaned [Dataset]. https://www.kaggle.com/datasets/javiersab/billionaires-dataset-cleaned
    Available download formats: zip (128906 bytes)
    Dataset updated
    Feb 24, 2024
    Authors
    Javier_SAB
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Cleaned dataset from the Billionaires Statistic Dataset (2023) that can be found here.

    The code I used to clean and re-structure the data is also here.

    First things first: a big shout-out to Nidula Elgiriyewithana for providing the original data.

    Like the original, this dataset contains various information about the world's wealthiest people, in columns that can be grouped into three types:

    • Business-related information. These columns contain data about the industry in which the billionaires operate, their source of wealth, their total wealth and their position in the ranking.
    • Personal information. Such as name, age, nationality, country and city of residence.
    • Economic activity information. These columns are related to the country in which the billionaire resides and provide different economic indicators like GDP, education enrollment or Consumer Price Index (CPI).

    Column names

    • position. Ranking of the billionaire measured by their wealth.
    • wealth. The wealth of the billionaire measured in $.
    • industry. Industry in which the billionaire operates their businesses.
    • full_name. Complete name of the billionaire.
    • age. The age of the billionaire.
    • country_of_residence. Country in which the billionaire resides.
    • city_of_residence. City in which the billionaire resides.
    • source. The source of the billionaire's wealth.
    • citizenship. The country of citizenship of the billionaire.
    • gender. The gender of the billionaire.
    • birth_date. The birth date of the billionaire.
    • last_name. The last name of the billionaire.
    • first_name. The first name of the billionaire.
    • residence_state. State in which the billionaire resides (only for billionaires who reside in the U.S.).
    • residence_region. Region in which the billionaire resides (only for billionaires who reside in the U.S.).
    • birth_year. The birth year of the billionaire.
    • birth_month. The birth month of the billionaire.
    • birth_day. The birth day of the billionaire.
    • cpi_country. Consumer Price Index (CPI) for the billionaire's country.
    • cpi_change_country. CPI change for the billionaire's country.
    • gdp_country. Gross Domestic Product (GDP) in $ for the billionaire's country.
    • g_tertiary_ed_enroll. Enrollment in tertiary education in the billionaire's country.
    • g_primary_ed_enroll. Enrollment in primary education in the billionaire's country.
    • life_expectancy. Life expectancy in the billionaire's country.
    • tax_revenue. Tax revenue in the billionaire's country.
    • tax_rate. Total tax rate in the billionaire's country.
    • country_pop. Population of the billionaire's country.
    • country_lat. Latitude coordinate of the billionaire's country.
    • country_long. Longitude coordinate of the billionaire's country.
    • continent. Continent in which the country of the billionaire's residence is located.

    Potential analyses

    • Analyze which industries contain the biggest groups of billionaires overall and in different countries.
    • Explore number of billionaires and total wealth across countries and continents and display the result in a map.
    • Focus on personal information columns such as age or gender to explore the distribution of billionaires from this perspective.
    • Discover whether countries' economic indicators have any impact on the presence of billionaires.
    • The U.S. is the country with the most billionaires in the dataset and the only one with values in the residence_state and residence_region columns. This makes American billionaires a good focus for a dedicated analysis.
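
    The first two analyses in this list can be sketched with the Python standard library alone. The snippet below is an illustrative sketch, not the dataset author's code: it uses the column names listed above (industry, country_of_residence, wealth), while the helper name summarize_by and the sample rows are hypothetical placeholders.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_by(rows, key):
    """Count billionaires and sum wealth for each distinct value of key."""
    counts = Counter()
    totals = defaultdict(float)
    for row in rows:
        counts[row[key]] += 1
        totals[row[key]] += float(row["wealth"])
    return counts, dict(totals)


# Placeholder rows for illustration only; not real records from the dataset.
SAMPLE_CSV = """full_name,industry,country_of_residence,wealth
Person A,Technology,United States,1000
Person B,Technology,France,2000
Person C,Retail,United States,1500
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts, totals = summarize_by(rows, "industry")
print(counts.most_common())
print(totals)
```

    Passing "country_of_residence" or "continent" as the key gives the per-country or per-continent totals that a map visualization would need.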

    Bonus

    If you want a challenge, you can create a dashboard using tools such as Plotly to dynamically visualize the data by one or more attributes (such as industry, age or country). I built one; the link is below in case you want to take a look:

    Dashboard notebook here


    If you find this dataset informative or inspirational, a vote is appreciated for others to easily discover value in it 💎💰

  5. Leading billionaires worldwide 2025

    • statista.com
    Updated Mar 18, 2025
    Cite
    Statista (2025). Leading billionaires worldwide 2025 [Dataset]. https://www.statista.com/statistics/272047/top-25-global-billionaires/
    Dataset updated
    Mar 18, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2025
    Area covered
    World
    Description

    As of March 2025, Elon Musk had a net worth valued at 328.5 billion U.S. dollars, making him the richest man in the world. Amazon founder Jeff Bezos followed in second place, with Mark Zuckerberg, the founder of Facebook, in third. The list is dominated by Americans, and Alice Walton and Francoise Bettencourt Meyers are the only women among the 20 richest people worldwide.

  6. Top 100 Richest People in the World

    • kaggle.com
    zip
    Updated Sep 18, 2022
    Cite
    Ayessa (2022). Top 100 Richest People in the World [Dataset]. https://www.kaggle.com/datasets/ayessa/top-100-richest-people-in-the-world
    Available download formats: zip (3573 bytes)
    Dataset updated
    Sep 18, 2022
    Authors
    Ayessa
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    This dataset contains the top 100 richest people in the world based on their net worth. The dataset includes their rank, name, net worth, birthday, age, and nationality.

    Methodology

    This dataset was collected by web scraping (Beautiful Soup) from this website and from Wikipedia (https://en.wikipedia.org/wiki/List_of_countries_by_number_of_billionaires).

    Thumbnail Photo

  7. World_billion_2024

    • kaggle.com
    zip
    Updated Jun 25, 2024
    Cite
    willian oliveira (2024). World_billion_2024 [Dataset]. https://www.kaggle.com/willianoliveiragibin/world-billion-2024
    Available download formats: zip (55504 bytes)
    Dataset updated
    Jun 25, 2024
    Authors
    willian oliveira
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    This chart was taken from the internet:

    The "Richest People in the World - 2024" dataset provides a detailed overview of the wealthiest individuals globally for the year 2024. This dataset includes crucial information about the top executives, their net worth, and the countries they are based in, offering valuable insights for economic analysis, market research, and financial studies.

  8. Billionaries dataset

    • kaggle.com
    zip
    Updated Apr 29, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    TEJA KUMAR (2021). Billionaries dataset [Dataset]. https://www.kaggle.com/ravillatejakumar/billionaries-dataset
    Available download formats: zip (101897 bytes)
    Dataset updated
    Apr 29, 2021
    Authors
    TEJA KUMAR
    Description

    Content

    This dataset lists the world's top billionaires, with details such as their names, the kind of company behind their wealth (for example a finance or software company), and how much money they have.

    Researchers have compiled a multi-decade database of the super-rich. Building off the Forbes World’s Billionaires lists from 1996-2014, scholars at Peterson Institute for International Economics have added a couple dozen more variables about each billionaire - including whether they were self-made or inherited their wealth. (Roughly half of European billionaires and one-third of U.S. billionaires got a significant financial boost from family, the authors estimate.)

    Acknowledgements

    Reference : https://corgis-edu.github.io/corgis/csv/billionaires/

    Inspiration

    The billionaire datasets I have seen on Kaggle and elsewhere are limited to a small number of columns, so Kagglers cannot draw many insights from them. I put this dataset together to be more helpful and to make richer insights possible.

  9. Globalization and Income Distribution Dataset 1975-2002 - Aruba,...

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Oct 26, 2023
    Cite
    Branko L. Milanovic (2023). Globalization and Income Distribution Dataset 1975-2002 - Aruba, Afghanistan, Angola...and 188 more [Dataset]. https://microdata.worldbank.org/index.php/catalog/1786
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    Branko L. Milanovic
    Time period covered
    1975 - 2002
    Area covered
    Angola
    Description

    Abstract

    Dataset used in World Bank Policy Research Working Paper #2876, published in World Bank Economic Review, No. 1, 2005, pp. 21-44.

    The effects of globalization on income distribution in rich and poor countries are a matter of controversy. While international trade theory in its most abstract formulation implies that increased trade and foreign investment should make income distribution more equal in poor countries and less equal in rich countries, finding these effects has proved elusive. The author presents another attempt to discern the effects of globalization by using data from household budget surveys and looking at the impact of openness and foreign direct investment on relative income shares of low and high deciles. The author finds some evidence that at very low average income levels, it is the rich who benefit from openness. As income levels rise to those of countries such as Chile, Colombia, or Czech Republic, for example, the situation changes, and it is the relative income of the poor and the middle class that rises compared with the rich. It seems that openness makes income distribution worse before making it better-or differently in that the effect of openness on a country's income distribution depends on the country's initial income level.

    Kind of data

    Aggregate data [agg]

  10. Education Attainment and Enrollment around the World

    • datacatalog.worldbank.org
    excel, html, pdf, zip
    Updated Nov 4, 2018
    Cite
    Ryan Douglas Hahn (2018). Education Attainment and Enrollment around the World [Dataset]. https://datacatalog.worldbank.org/search/dataset/0038973/education-attainment-and-enrollment-around-the-world
    Available download formats: pdf, excel, html, zip
    Dataset updated
    Nov 4, 2018
    Dataset provided by
    Ryan Douglas Hahn
    License

    https://datacatalog.worldbank.org/public-licenses?fragment=cc

    Area covered
    World
    Description

    Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of a variety of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys.


    As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or more precisely, a proxy based on the assets owned by members of the household) there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent.


    In Mali in 2012, only 34 percent of 15 to 19 year olds in the poorest quintile have completed grade 1, whereas 80 percent of the richest quintile have done so. In many countries, for example Pakistan, Peru and Indonesia, almost all the children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on; in other countries, like Peru and Indonesia, wealth gaps emerge later in the school system.


    The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.

  11. Are Students Ready for a Technology-Rich World? What PISA Studies Tell Us

    • catalog.data.gov
    Updated Mar 30, 2021
    Cite
    U.S. Department of State (2021). Are Students Ready for a Technology-Rich World? What PISA Studies Tell Us [Dataset]. https://catalog.data.gov/dataset/are-students-ready-for-a-technology-rich-world-what-pisa-studies-tell-us
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    U.S. Department of State
    Description

    ICT has profound implications for education, both because ICT can facilitate new forms of learning and because it has become important for young people to master ICT in preparation for adult life. But how extensive is access to ICT in schools and informal settings and how is it used by students? Drawing on data from the OECD’s Programme for International Student Assessment (PISA), Are Students Ready for a Technology-Rich World? What PISA Studies Tell Us, examines whether access to computers for students is equitable across countries and student groups; how students use ICT and what their attitudes are towards ICT; the relationship between students’ access to and use of ICT and their performance in PISA 2003; and the implications for educational policy.

  12. Data from: REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic...

    • researchdata.tuwien.ac.at
    txt, zip
    Updated Jul 15, 2025
    Cite
    Daniel Jan Sliwowski; Shail Jadav; Sergej Stanovcic; Jędrzej Orbik; Johannes Heidersberger; Dongheui Lee (2025). REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly [Dataset]. http://doi.org/10.48436/0ewrv-8cb44
    Available download formats: zip, txt
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    TU Wien
    Authors
    Daniel Jan Sliwowski; Shail Jadav; Sergej Stanovcic; Jędrzej Orbik; Johannes Heidersberger; Dongheui Lee
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 9, 2025 - Jan 14, 2025
    Description

    REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly

    📋 Introduction

    Robotic manipulation remains a core challenge in robotics, particularly for contact-rich tasks such as industrial assembly and disassembly. Existing datasets have significantly advanced learning in manipulation but are primarily focused on simpler tasks like object rearrangement, falling short of capturing the complexity and physical dynamics involved in assembly and disassembly. To bridge this gap, we present REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt), a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. Our dataset features multi-modal sensor data including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. This diverse dataset supports research in areas such as learning contact-rich manipulation, task condition identification, action segmentation, and more. We believe REASSEMBLE will be a valuable resource for advancing robotic manipulation in complex, real-world scenarios.

    ✨ Key Features

    • Multimodality: REASSEMBLE contains data from robot proprioception, RGB cameras, force-torque sensors, microphones, and event cameras.
    • Multitask labels: REASSEMBLE contains labeling which enables research in Temporal Action Segmentation, Motion Policy Learning, Anomaly Detection, and Task Inversion.
    • Long horizon: Demonstrations in the REASSEMBLE dataset cover long-horizon tasks and actions which usually span multiple steps.
    • Hierarchical labels: REASSEMBLE contains action segmentation labels at two hierarchical levels.

    🔴 Dataset Collection

    Each demonstration starts by randomizing the board and object poses, after which an operator teleoperates the robot to assemble and disassemble the board while narrating their actions and marking task segment boundaries with key presses. The narrated descriptions are transcribed using Whisper [1], and the board and camera poses are measured at the beginning using a motion capture system, though continuous tracking is avoided due to interference with the event camera. Sensory data is recorded with rosbag and later post-processed into HDF5 files without downsampling or synchronization, preserving raw data and timestamps for future flexibility. To reduce memory usage, video and audio are stored as encoded MP4 and MP3 files, respectively. Transcription errors are corrected automatically or manually, and a custom visualization tool is used to validate the synchronization and correctness of all data and annotations. Missing or incorrect entries are identified and corrected, ensuring the dataset’s completeness. Low-level Skill annotations were added manually after data collection, and all labels were carefully reviewed to ensure accuracy.

    📑 Dataset Structure

    The dataset consists of several HDF5 (.h5) and JSON (.json) files, organized into two directories. The poses directory contains the JSON files, which store the poses of the cameras and the board in the world coordinate frame. The data directory contains the HDF5 files, which store the sensory readings and annotations collected as part of the REASSEMBLE dataset. Each JSON file can be matched with its corresponding HDF5 file based on their filenames, which include the timestamp when the data was recorded. For example, 2025-01-09-13-59-54_poses.json corresponds to 2025-01-09-13-59-54.h5.
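
    The filename pairing described above can be expressed as a small helper. This is a minimal sketch that assumes every pose file follows the "_poses.json" pattern of the single example given; the function names are hypothetical and are not part of the REASSEMBLE code base.

```python
from pathlib import Path

# Suffix taken from the example filename in the text; assumed to hold for all files.
POSES_SUFFIX = "_poses.json"


def poses_name_for(h5_name: str) -> str:
    """Return the pose-file name that shares a timestamp with an HDF5 file."""
    return Path(h5_name).stem + POSES_SUFFIX


def h5_name_for(poses_name: str) -> str:
    """Inverse mapping: pose-file name back to its HDF5 recording."""
    if not poses_name.endswith(POSES_SUFFIX):
        raise ValueError(f"not a pose file: {poses_name}")
    return poses_name[: -len(POSES_SUFFIX)] + ".h5"


print(poses_name_for("2025-01-09-13-59-54.h5"))
print(h5_name_for("2025-01-09-13-59-54_poses.json"))
```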

    The structure of the JSON files is as follows:

    {"Hama1": [
        [x, y, z],
        [qx, qy, qz, qw]
     ],
     "Hama2": [
        [x, y, z],
        [qx, qy, qz, qw]
     ],
     "DAVIS346": [
        [x, y, z],
        [qx, qy, qz, qw]
     ],
     "NIST_Board1": [
        [x, y, z],
        [qx, qy, qz, qw]
     ]
    }

    [x, y, z] represent the position of the object, and [qx, qy, qz, qw] represent its orientation as a quaternion.
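
    A pose file with this layout could be read as follows. This is a sketch under the assumption that each entry is exactly a position triple followed by a quaternion in (qx, qy, qz, qw) order, as the schema above suggests; the numeric values are placeholders, not measurements from the dataset, and the helper names are hypothetical.

```python
import json
import math


def parse_poses(text: str) -> dict:
    """Map each object name to a (position, quaternion) pair of tuples."""
    poses = {}
    for name, (position, quaternion) in json.loads(text).items():
        if len(position) != 3 or len(quaternion) != 4:
            raise ValueError(f"unexpected pose shape for {name}")
        poses[name] = (tuple(position), tuple(quaternion))
    return poses


def quaternion_norm(q) -> float:
    """Euclidean norm; a valid rotation quaternion has norm close to 1."""
    return math.sqrt(sum(c * c for c in q))


# Placeholder values for illustration only, not taken from the dataset.
sample = '{"NIST_Board1": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]}'
position, quaternion = parse_poses(sample)["NIST_Board1"]
print(position, quaternion_norm(quaternion))
```

    Checking the norm is a cheap sanity test when loading many files, since a quaternion far from unit length usually signals a parsing or ordering mistake.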

    The HDF5 (.h5) format organizes data into two main types of structures: datasets, which hold the actual data, and groups, which act like folders that can contain datasets or other groups. In the diagram below, groups are shown as folder icons, and datasets as file icons. The main group of the file directly contains the video, audio, and event data. To save memory, video and audio are stored as encoded byte strings, while event data is stored as arrays. The robot’s proprioceptive information is kept in the robot_state group as arrays. Because different sensors record data at different rates, the arrays vary in length (signified by the N_xxx variable in the data shapes). To align the sensory data, each sensor’s timestamps are stored separately in the timestamps group. Information about action segments is stored in the segments_info group. Each segment is saved as a subgroup, named according to its order in the demonstration, and includes a start timestamp, end timestamp, a success indicator, and a natural language description of the action. Within each segment, low-level skills are organized under a low_level subgroup, following the same structure as the high-level annotations.

    [Diagram of the HDF5 file structure]
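
    Because the sensors record at different rates and each keeps its own timestamp array, a consumer has to pick an alignment strategy. The sketch below shows one common option, nearest-timestamp matching on sorted lists. It is an assumed approach for illustration, not the authors' loader: it does not read HDF5 itself, and the timestamps are made up.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the entry in a sorted timestamp list closest to t."""
    if not timestamps:
        raise ValueError("empty timestamp list")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align(reference, other):
    """For each reference timestamp, the index of the nearest sample in other."""
    return [nearest_index(other, t) for t in reference]


# Made-up timestamps in seconds: a slow sensor matched against a faster one.
slow = [0.0, 0.5, 1.0]
fast = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
print(align(slow, fast))
```

    Using the slowest sensor as the reference keeps the aligned sequence short; interpolation would be an alternative when the signals are continuous.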

    The splits folder contains two text files which list the h5 files used for the training and validation splits.

    📌 Important Resources

    The project website contains more details about the REASSEMBLE dataset. The code for loading and visualizing the data is available in our GitHub repository.

    📄 Project website: https://tuwien-asl.github.io/REASSEMBLE_page/
    💻 Code: https://github.com/TUWIEN-ASL/REASSEMBLE

    ⚠️ File comments

    The table below lists recordings with known issues. Issues typically correspond to missing data from one of the sensors.

    Recording                 Issue
    2025-01-10-15-28-50.h5    hand cam missing at beginning
    2025-01-10-16-17-40.h5    missing hand cam
    2025-01-10-17-10-38.h5    hand cam missing at beginning
    2025-01-10-17-54-09.h5    no empty action at

  13. English Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). English Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/english-healthcare-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The English Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in English-speaking regions.

    Participant & Chat Overview

    Participants: 200+ native English speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of English healthcare communication and includes:

    Authentic Naming Patterns: English personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional English formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with English-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines
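    As a rough illustration of the structure listed above, the following Python sketch parses one conversation record from JSON and counts turns per speaker. The field names (conversation_id, participant_ids, messages, speaker, text, metadata) and the sample record are hypothetical, invented for this sketch; the dataset's actual schema is not stated here and may differ.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Message:
    speaker: str  # hypothetical speaker label, e.g. "agent" or "customer"
    text: str


@dataclass
class Conversation:
    conversation_id: str            # hypothetical field name
    participant_ids: List[str]      # hypothetical field name
    messages: List[Message]
    metadata: Dict[str, object] = field(default_factory=dict)


def parse_conversation(raw: str) -> Conversation:
    """Parse one JSON record laid out according to the assumed schema."""
    data = json.loads(raw)
    return Conversation(
        conversation_id=data["conversation_id"],
        participant_ids=list(data["participant_ids"]),
        messages=[Message(m["speaker"], m["text"]) for m in data["messages"]],
        metadata=dict(data.get("metadata", {})),
    )


def turns_by_speaker(conv: Conversation) -> Dict[str, int]:
    """Count how many messages each speaker label contributed."""
    counts: Dict[str, int] = {}
    for message in conv.messages:
        counts[message.speaker] = counts.get(message.speaker, 0) + 1
    return counts


# Invented sample record for illustration only; not taken from the dataset.
SAMPLE = """
{
  "conversation_id": "demo-001",
  "participant_ids": ["P1", "P2"],
  "messages": [
    {"speaker": "customer", "text": "Hi, I would like to book an appointment."},
    {"speaker": "agent", "text": "Of course. May I have your full name?"},
    {"speaker": "customer", "text": "Yes, it is Jane Doe."}
  ],
  "metadata": {"topic": "appointment scheduling", "sentiment": "positive"}
}
"""

if __name__ == "__main__":
    conversation = parse_conversation(SAMPLE)
    print(turns_by_speaker(conversation))
```

    Keeping metadata as a free-form dictionary is a deliberate choice in this sketch, since the listing names topic tags, region, and sentiment only as examples.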

    Applications


  14. World's Billionaires

    • kaggle.com
    zip
    Updated May 19, 2021
    Cite
    Sadialiou Diallo (2021). World's Billionaires [Dataset]. https://www.kaggle.com/seriadiallo1/world-billionaires
    Available download formats: zip (2962 bytes)
    Dataset updated
    May 19, 2021
    Authors
    Sadialiou Diallo
    Description

    The richest people in the world, yearly rank from 2002 to 2021

    This dataset contains 200 rows and 7 columns.

    The World's Billionaires is an annual ranking by documented net worth of the world's wealthiest billionaires compiled and published in March annually by the American business magazine Forbes. The list was first published in March 1987. The total net worth of each individual on the list is estimated and is cited in United States dollars, based on their documented assets and accounting for debt. Royalty and dictators whose wealth comes from their positions are excluded from these lists. This ranking is an index of the wealthiest documented individuals, excluding and ranking against those with wealth that is not able to be completely ascertained. (wikipedia)

  15. Data from: InterHub: A Naturalistic Trajectory Dataset with Dense...

    • figshare.com
    csv
    Updated May 24, 2025
    Cite
    Xiyan Jiang; Xiaocong Zhao; Yiru Liu; Zirui Li; Peng Hang; Lu Xiong; Jian Sun (2025). InterHub: A Naturalistic Trajectory Dataset with Dense Interaction for Autonomous Driving [Dataset]. http://doi.org/10.6084/m9.figshare.27899754.v6
    Available download formats: csv
    Dataset updated
    May 24, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Xiyan Jiang; Xiaocong Zhao; Yiru Liu; Zirui Li; Peng Hang; Lu Xiong; Jian Sun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We provide a dense interaction dataset, InterHub, derived from extensive naturalistic driving records to address the scarcity of real-world datasets capturing rich interaction events.

    The data provided on this page include:

    A CSV file (Interactive_Segments_Index.csv) containing the indexed list of the extracted interaction events. In addition to indexing and tracing information about interaction scenarios, it provides labels to facilitate more targeted retrieval and use of interaction scenarios. (For detailed information, please refer to https://github.com/zxc-tju/InterHub.)
    Relevant unified data cache files (InterHub_cache_files.zip, which includes cache files of lyft_train_full and nuplan_train).

    The Python code used to process and analyze the dataset can be found at https://github.com/zxc-tju/InterHub. The tools for navigating InterHub consist of the following three parts:

    0_data_unify.py converts various data resources into a unified format for seamless interaction event extraction.
    1_interaction_extract.py extracts interactive segments from unified driving records.
    2_case_visualize.py showcases interaction scenarios in InterHub.

    You can refer to the data structure of the cache files presented in dataset.md. After extracting InterHub_cache_files.zip, put its contents in the corresponding folder.

    Statement: All third-party data redistributions included in InterHub_cache_files.zip are carried out in full compliance with the original licensing terms of the respective source datasets, as required by their mandatory licensing conditions. This portion of the data remains subject to its original licenses, and users of the data are required to comply with these original licensing terms in any subsequent use or redistribution.
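    A first look at the index file described above might be sketched as follows. This is a minimal example using only the Python standard library; it assumes Interactive_Segments_Index.csv has been downloaded into the working directory and that its first row is a header. It makes no assumption about the column names, which are documented in the InterHub repository rather than here. The helper name summarize_index is hypothetical.

```python
import csv
from pathlib import Path


def summarize_index(csv_path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Count data rows; the header row is consumed by DictReader itself.
        row_count = sum(1 for _ in reader)
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was saved.
    index_path = Path("Interactive_Segments_Index.csv")
    if index_path.exists():
        columns, row_count = summarize_index(index_path)
        print("columns:", columns)
        print("interaction events listed:", row_count)
    else:
        print("Index file not found at", index_path)
```

    Printing the column names first is a cheap way to see which labels are available before filtering interaction events.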

  16. Golden Dataset Curation for LLMs Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Oct 4, 2025
    Cite
    Growth Market Reports (2025). Golden Dataset Curation for LLMs Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/golden-dataset-curation-for-llms-market
    Available download formats: csv, pdf, pptx
    Dataset updated
    Oct 4, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Golden Dataset Curation for LLMs Market Outlook



    According to our latest research, the global Golden Dataset Curation for LLMs market size stood at USD 1.42 billion in 2024, reflecting the surging demand for high-quality, bias-mitigated datasets in large language model (LLM) development. The market is projected to grow at a robust CAGR of 27.8% from 2025 to 2033, reaching an estimated USD 13.9 billion by 2033. This remarkable growth is fueled by the increasing sophistication of AI models, the critical need for reliable training data, and the expanding adoption of LLMs across diverse sectors.



    Several key factors are driving the rapid expansion of the Golden Dataset Curation for LLMs market. First and foremost is the exponential growth in the deployment of large language models across industries such as healthcare, finance, legal, and customer service. As organizations seek to leverage LLMs for complex natural language processing tasks, the demand for meticulously curated, high-quality datasets has become paramount. This is because the performance, reliability, and ethical alignment of LLMs are intrinsically linked to the quality of their training data. Companies are increasingly investing in the curation of "golden datasets"—datasets that are not only comprehensive and representative but also rigorously annotated and validated to minimize bias and ensure regulatory compliance. This trend is expected to intensify as AI regulations tighten and as organizations strive for greater transparency and accountability in AI deployments.



    Another significant growth driver for the Golden Dataset Curation for LLMs market is the advancement in data curation technologies and methodologies. The integration of automation, machine learning, and human-in-the-loop systems has revolutionized the way datasets are curated and validated. These advancements enable the efficient handling of vast and complex data sources, including text, image, audio, and multimodal datasets. The rise of specialized data curation platforms and services has further accelerated the adoption of golden dataset practices, allowing organizations to scale their AI initiatives while maintaining data integrity. Moreover, as LLMs become more multilingual and domain-specific, the need for curated datasets that reflect diverse languages, cultures, and industry-specific knowledge is growing rapidly, further boosting market demand.



    The expanding ecosystem of AI applications is also propelling the Golden Dataset Curation for LLMs market forward. As LLMs are increasingly utilized for tasks such as model training, evaluation, benchmarking, and fine-tuning, the scope and complexity of required datasets have grown exponentially. Organizations are now seeking datasets that not only support model development but also facilitate continuous evaluation and improvement of AI models in real-world scenarios. This has led to a surge in demand for datasets that are regularly updated, contextually rich, and tailored to specific use cases. Additionally, the proliferation of open-source and third-party data sources, coupled with the need for proprietary datasets, has created a dynamic and competitive market landscape where data quality and curation expertise are key differentiators.



    From a regional perspective, North America currently dominates the Golden Dataset Curation for LLMs market, accounting for the largest share in 2024. This leadership is attributed to the presence of major technology companies, a robust research ecosystem, and significant investments in AI and machine learning infrastructure. Europe and Asia Pacific are also emerging as key markets, driven by increasing regulatory focus on AI ethics and the rapid digital transformation of enterprises. The Asia Pacific region, in particular, is expected to witness the highest CAGR during the forecast period, fueled by rising AI adoption in countries such as China, Japan, and India. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, supported by growing awareness of AI's potential and investments in digital infrastructure.





    Dataset Type

  17. Spanish Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Spanish Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/spanish-healthcare-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Spanish Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Spanish-speaking regions.

    Participant & Chat Overview

    Participants: 150+ native Spanish speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Spanish healthcare communication and includes:

    Authentic Naming Patterns: Spanish personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Spanish formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Spanish-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines

    Applications


  18. Mapping Ocean Wealth Explorer

    • rmi-data.sprep.org
    • pacificdata.org
    • +14 more
    pdf
    Updated Feb 20, 2025
    Cite
    Secretariat of the Pacific Regional Environment Programme (2025). Mapping Ocean Wealth Explorer [Dataset]. https://rmi-data.sprep.org/dataset/mapping-ocean-wealth-explorer
    Available download formats: pdf (12573434)
    Dataset updated
    Feb 20, 2025
    Dataset provided by
    Pacific Regional Environment Programme (https://www.sprep.org/)
    License

    Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
    License information was derived automatically

    Area covered
    Pacific Region
    Description

    The Mapping Ocean Wealth data viewer is a live online resource for sharing understanding of the value of marine and coastal ecosystems to people. It includes global maps, regionally specific studies, reference data, and a number of “apps” providing key data analytics. Maps and apps can be opened according to key themes or geographies. The navigator to the left of the maps enables you to add or remove any additional map layers as you explore. Information keys explain how the maps were made and provide additional links. Further information and resources can be found on Oceanwealth.org.

    • Recreation and Tourism App - Explore the value of healthy ecosystems to the tourism industry
    • Natural Coastal Protection App - Discover the coastal protection benefits of coral reefs around the world
    • Blue Carbon App - View Mangrove Carbon Storage
    • Coral Reef Fisheries App - Learn about the status of coral reef fisheries
    • Regional Planning
    • Mangrove Restoration

  19. Friedl presentation at CIDU - Dataset - NASA Open Data Portal

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Friedl presentation at CIDU - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/friedl-presentation-at-cidu
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The land remote sensing community has a long history of using supervised and unsupervised methods to help interpret and analyze remote sensing data sets. Until relatively recently, most remote sensing studies have used fairly conventional image processing and pattern recognition methodologies. In the past decade, NASA has launched a series of remote sensing missions known as the Earth Observing System (EOS). The data sets acquired by EOS instruments provide an extremely rich source of information related to the properties and dynamics of the Earth’s terrestrial ecosystems. However, these data are also characterized by large volumes and complex spectral, spatial and temporal attributes. Because of the volume and complexity of EOS data sets, efficient and effective analysis of them presents significant challenges that are difficult to address using conventional remote sensing approaches. In this paper we discuss results from applying a variety of different data mining approaches to global remote sensing data sets. Specifically, we describe three main problem domains and sets of analyses: (1) supervised classification of global land cover using data from NASA’s Moderate Resolution Imaging Spectroradiometer; (2) the use of linear and non-linear cluster and dimensionality reduction methods to examine coupled climate-vegetation dynamics using a twenty-year time series of data from the Advanced Very High Resolution Radiometer; and (3) the use of functional models, non-parametric clustering, and mixture models to help interpret and understand the feature space and class structure of high-dimensional remote sensing data sets. The paper will not focus on specific details of algorithms. Instead we describe key results, successes, and lessons learned from ten years of research focusing on the use of data mining and machine learning methods for remote sensing and Earth science problems.

  20. Vietnamese Agent-Customer Chat Dataset for Healthcare Domain

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Vietnamese Agent-Customer Chat Dataset for Healthcare Domain [Dataset]. https://www.futurebeeai.com/dataset/text-dataset/vietnamese-healthcare-domain-conversation-text-dataset
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Introduction

    The Vietnamese Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Vietnamese-speaking regions.

    Participant & Chat Overview

    Participants: 150+ native Vietnamese speakers from the FutureBeeAI Crowd Community
    Conversation Length: 300–700 words per chat
    Turns per Chat: 50–150 dialogue turns across both participants
    Chat Types: Inbound and outbound
    Sentiment Coverage: Positive, neutral, and negative outcomes included

    Topic Diversity

    The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:

    Inbound Chats (Customer-Initiated): Appointment scheduling, new patient registration, surgery and treatment consultations, diet and lifestyle discussions, insurance claim inquiries, lab result follow-ups
    Outbound Chats (Agent-Initiated): Appointment reminders and confirmations, health and wellness program offers, test result notifications, preventive care and vaccination reminders, subscription renewals, risk assessment and eligibility follow-ups

    This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.

    Language Diversity & Realism

    This dataset reflects the natural flow of Vietnamese healthcare communication and includes:

    Authentic Naming Patterns: Vietnamese personal names, clinic names, and brands
    Localized Contact Elements: Addresses, emails, phone numbers, and clinic locations in regional Vietnamese formats
    Time & Currency References: Use of dates, times, numeric expressions, and currency units aligned with Vietnamese-speaking regions
    Colloquial & Medical Expressions: Local slang, informal speech, and common healthcare-related terminology

    These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.

    Conversational Flow & Structure

    Conversations range from simple inquiries to complex advisory sessions, including:

    General inquiries
    Detailed problem-solving
    Routine status updates
    Treatment recommendations
    Support and feedback interactions

    Each conversation typically includes these structural components:

    Greetings and verification
    Information gathering
    Problem definition
    Solution delivery
    Closing messages
    Follow-up and feedback (where applicable)

    This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.

    Data Format & Structure

    Available in JSON, CSV, and TXT formats, each conversation includes:

    Full message history with clear speaker labels
    Participant identifiers
    Metadata (e.g., topic tags, region, sentiment)
    Compatibility with common NLP and ML pipelines

