License: CC0 1.0 (https://choosealicense.com/licenses/cc0-1.0/)
Dataset Card for 100 Richest People In World
Dataset Summary
This dataset contains the list of the top 100 richest people in the world. Column information:
- Name: the person's name
- NetWorth: his/her net worth
- Age: the person's age
- Country: the country the person belongs to
- Source: information source
- Industry: expertise domain
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More Information Needed]… See the full description on the dataset page: https://huggingface.co/datasets/nateraw/100-richest-people-in-world.
License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This dataset provides a synthetic overview of the 1,000 wealthiest individuals in the world, offering insights into the distribution of wealth across industries and regions. It is designed to help analysts, researchers, and data enthusiasts explore global wealth trends, industry dominance, and regional wealth concentration.
Whether you're conducting market research, financial analysis, or data modeling, this dataset serves as a valuable resource for understanding the characteristics of the world's top billionaires.
📊 Key Features:
- Name 👤: The name of the billionaire.
- Country 🌍: Country of residence or primary business operation.
- Industry 🏭: Industry in which the individual has built their wealth.
- Net Worth (in billions) 💵: Estimated net worth in billions of USD.
- Company 🏢: The primary company or business associated with the billionaire.

⚠️ Important Note: This dataset is 100% synthetic and does not contain real financial or personal data. It is artificially generated for educational, analytical, and research purposes.
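As a quick illustration of the industry-level analysis this dataset is meant for, the sketch below sums net worth per industry with Python's standard `csv` module. The header spellings follow the feature list above but are assumptions about the actual file, and the sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the exact header spellings of the real file
# are assumptions based on the feature list.
SAMPLE_CSV = """Name,Country,Industry,Net Worth (in billions),Company
Person A,USA,Technology,120.5,Company A
Person B,France,Fashion,95.0,Company B
Person C,USA,Technology,80.0,Company C
Person D,India,Energy,60.0,Company D
"""


def net_worth_by_industry(csv_text: str) -> dict[str, float]:
    """Sum the 'Net Worth (in billions)' column per industry."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Industry"]] += float(row["Net Worth (in billions)"])
    # Sort industries by total wealth, largest first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    for industry, total in net_worth_by_industry(SAMPLE_CSV).items():
        print(f"{industry}: {total:.1f}B USD")
```

The same grouping works for `Country` by changing the key column.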
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This comprehensive dataset encapsulates a detailed snapshot of the wealthiest individuals globally, as listed by Forbes in 2024. Compiled through meticulous web scraping and data aggregation, the dataset includes a wide range of attributes for each billionaire. Fields encompass basic personal information such as name, age, and gender, alongside financial details including net worth and sources of wealth. The dataset further delves into aspects like industry involvement, organizational affiliations, philanthropic endeavors, and educational backgrounds.
Key attributes in this dataset include:
- Name: Full legal name of the billionaire.
- Age: Age of the individual.
- 2024 Net Worth: Estimated net worth in USD for the year 2024.
- Industry: Primary industry or sector of operation.
- Source of Wealth: Origin of the billionaire’s wealth.
- Title: Professional title or position.
- Organization: Name of the associated organization.
- Self-Made: Indicator if the wealth is self-made.
- Self-Made Score: A quantitative score assessing how self-made their wealth is.
- Philanthropy Score: A score reflecting the extent of their philanthropic activities.
- Residence: Main residence of the individual.
- Citizenship: Legal citizenship.
- Gender: Gender identity.
- Marital Status: Current marital status.
- Children: Number of children.
- Education: Highest level of education attained.
This dataset is ideal for analysis, offering insights into the distribution of wealth, the influence of education on wealth accumulation, and trends across different industries. It also provides a foundation for exploring the impact of socioeconomic factors on personal wealth. The data were collected and formatted with careful consideration to ensure accuracy, making it a valuable resource for researchers, economists, and anyone interested in the dynamics of wealth and success.
Please note that some data is missing in this dataset, primarily due to the unavailability of information from Forbes. This issue becomes more prevalent beyond the top 400 entries. Many individuals lack a self-made score, a philanthropy score, or specific details regarding their title or organization as per Forbes' listings. I am currently working to update the dataset with this missing information. However, this update process is quite tedious and time-consuming since it is mostly manual. I appreciate your patience and understanding as I work through these details.
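To see how the missing values described above are distributed, one option is to compare the share of empty cells in the top 400 entries with the share in the remaining entries. This is a minimal sketch: the column names `Rank`, `Self-Made Score` and `Philanthropy Score` and the sample rows are hypothetical, and the real export may name or encode these fields differently.

```python
import csv
import io

# Hypothetical excerpt: column names and rows are placeholders.
SAMPLE_CSV = """Rank,Name,Self-Made Score,Philanthropy Score
1,Person A,8,1
2,Person B,,2
401,Person C,,
402,Person D,5,
"""


def missing_share(csv_text, columns, rank_cutoff=400):
    """Share of empty cells per column, for ranks <= cutoff ("top") and > cutoff ("rest")."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    result = {}
    for col in columns:
        shares = {}
        for band, in_band in (("top", lambda r: r <= rank_cutoff),
                              ("rest", lambda r: r > rank_cutoff)):
            subset = [row for row in rows if in_band(int(row["Rank"]))]
            # A cell counts as missing if it is empty or whitespace only.
            missing = sum(1 for row in subset if not row[col].strip())
            shares[band] = missing / len(subset) if subset else 0.0
        result[col] = shares
    return result


if __name__ == "__main__":
    print(missing_share(SAMPLE_CSV, ["Self-Made Score", "Philanthropy Score"]))
```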
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Cleaned dataset from the Billionaires Statistic Dataset (2023) that can be found here.
The code I used to clean and re-structure the data is also here.
First things first: a big shout-out to Nidula Elgiriyewithana for providing the original data.
As with the original, this dataset contains various information about the world's wealthiest people, in columns that can be grouped into three types:
If you want a challenge, you can create a dashboard with a tool such as Plotly to visualize the data dynamically by one or more attributes (such as industry, age or country). I built one myself; the link is below if you want to explore it:
If you find this dataset informative or inspirational, a vote is appreciated for others to easily discover value in it 💎💰
As of March 2025, Elon Musk had a net worth valued at 328.5 billion U.S. dollars, making him the richest man in the world. Amazon founder Jeff Bezos followed in second, with Mark Zuckerberg, the founder of Facebook, in third. The list is dominated by Americans, and Alice Walton and Françoise Bettencourt Meyers are the only women among the 20 richest people worldwide.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset contains the top 100 richest people in the world based on their net worth. The dataset includes their rank, name, net worth, birthday, age, and nationality.
This dataset was collected by web scraping (Beautiful Soup) from this website and from Wikipedia (https://en.wikipedia.org/wiki/List_of_countries_by_number_of_billionaires).
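The original collection used Beautiful Soup. The sketch below swaps in Python's built-in `html.parser` module so that it runs without extra dependencies; it extracts the cell text of every table row from an HTML string. The sample HTML is a hypothetical stand-in for a downloaded page, and real pages may need further cleanup (for example, removing footnote markers).

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical snippet standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Name</th><th>Net worth</th></tr>
  <tr><td>1</td><td>Person A</td><td>$100 B</td></tr>
  <tr><td>2</td><td><a href="#">Person B</a></td><td>$90 B</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)
header, *records = parser.rows
```

With Beautiful Soup installed, the equivalent loop would iterate over `soup.find_all("tr")` and read each cell's text.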
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
This graph was retrieved from the internet:
The "Richest People in the World - 2024" dataset provides a detailed overview of the wealthiest individuals globally for the year 2024. This dataset includes crucial information about the top executives, their net worth, and the countries they are based in, offering valuable insights for economic analysis, market research, and financial studies.
This dataset covers the world's top billionaires, including their names, the kind of company behind their wealth (for example, a finance or software company) and how much money they have.
Researchers have compiled a multi-decade database of the super-rich. Building on the Forbes World’s Billionaires lists from 1996-2014, scholars at the Peterson Institute for International Economics have added a couple dozen more variables about each billionaire, including whether they were self-made or inherited their wealth. (Roughly half of European billionaires and one-third of U.S. billionaires got a significant financial boost from family, the authors estimate.)
Reference : https://corgis-edu.github.io/corgis/csv/billionaires/
Similar datasets I have seen on Kaggle and elsewhere are limited to a small number of columns, which makes it hard for Kagglers to draw insights from them. I built this dataset to be more helpful to them and to make richer insights possible.
Dataset used in World Bank Policy Research Working Paper #2876, published in the World Bank Economic Review, No. 1, 2005, pp. 21-44.
The effects of globalization on income distribution in rich and poor countries are a matter of controversy. While international trade theory in its most abstract formulation implies that increased trade and foreign investment should make income distribution more equal in poor countries and less equal in rich countries, finding these effects has proved elusive. The author presents another attempt to discern the effects of globalization by using data from household budget surveys and looking at the impact of openness and foreign direct investment on relative income shares of low and high deciles. The author finds some evidence that at very low average income levels, it is the rich who benefit from openness. As income levels rise to those of countries such as Chile, Colombia, or the Czech Republic, the situation changes, and it is the relative income of the poor and the middle class that rises compared with the rich. It seems that openness makes income distribution worse before making it better, or, put differently, that the effect of openness on a country's income distribution depends on the country's initial income level.
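One simple way to express an effect of openness that depends on initial income is an interaction term. The specification below is an illustrative sketch, not necessarily the exact model estimated in the paper; the symbols are introduced here: $s_{d,it}$ is the income share of decile $d$ in country $i$ at time $t$, $O_{it}$ is openness, and $y_{it}$ is mean income.

$$
s_{d,it} = \alpha_d + \beta_d\, O_{it} + \gamma_d\, (O_{it} \times \ln y_{it}) + \delta_d \ln y_{it} + \varepsilon_{d,it}
$$

$$
\frac{\partial s_{d,it}}{\partial O_{it}} = \beta_d + \gamma_d \ln y_{it},
\qquad
\ln y^{*} = -\frac{\beta_d}{\gamma_d}
$$

For a bottom decile, the pattern described above corresponds to $\beta_d < 0$ and $\gamma_d > 0$: openness lowers the poor's share when $\ln y_{it} < \ln y^{*}$ and raises it once income exceeds that threshold.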
Aggregate data [agg]
License: https://datacatalog.worldbank.org/public-licenses?fragment=cc
Patterns of educational attainment vary greatly across countries, and across population groups within countries. In some countries, virtually all children complete basic education whereas in others large groups fall short. The primary purpose of this database, and the associated research program, is to document and analyze these differences using a compilation of a variety of household-based data sets: Demographic and Health Surveys (DHS); Multiple Indicator Cluster Surveys (MICS); Living Standards Measurement Study Surveys (LSMS); as well as country-specific Integrated Household Surveys (IHS) such as Socio-Economic Surveys.
As shown at the website associated with this database, there are dramatic differences in attainment by wealth. When households are ranked according to their wealth status (or more precisely, a proxy based on the assets owned by members of the household) there are striking differences in the attainment patterns of children from the richest 20 percent compared to the poorest 20 percent.
In Mali in 2012, only 34 percent of 15- to 19-year-olds in the poorest quintile had completed grade 1, whereas 80 percent of those in the richest quintile had done so. In many countries, for example Pakistan, Peru and Indonesia, almost all children from the wealthiest households have completed at least one year of schooling. In some countries, like Mali and Pakistan, wealth gaps are evident from grade 1 on; in other countries, like Peru and Indonesia, wealth gaps emerge later in the school system.
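The Mali figures can be summarized as an absolute gap and a ratio between quintiles. The helper below (a hypothetical name, not part of the EdAttain tools) makes the arithmetic explicit: 80 - 34 = 46 percentage points, and 80 / 34 ≈ 2.35, so richest-quintile youth were roughly 2.35 times as likely to have completed grade 1.

```python
def wealth_gap(richest_pct: float, poorest_pct: float) -> tuple[float, float]:
    """Return (percentage-point gap, ratio) between richest- and poorest-quintile attainment."""
    if poorest_pct <= 0:
        raise ValueError("poorest-quintile attainment must be positive to form a ratio")
    return richest_pct - poorest_pct, richest_pct / poorest_pct


# Mali, 2012: share of 15- to 19-year-olds who completed grade 1, by wealth quintile.
gap, ratio = wealth_gap(richest_pct=80, poorest_pct=34)
print(f"gap = {gap} percentage points, ratio = {ratio:.2f}")
```

Reporting both numbers is useful because a fixed percentage-point gap means a much larger relative disadvantage when overall attainment is low.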
The EdAttain website allows a visual exploration of gaps in attainment and enrollment within and across countries, based on the international database which spans multiple years from over 120 countries and includes indicators disaggregated by wealth, gender and urban/rural location. The database underlying that site can be downloaded from here.
ICT has profound implications for education, both because ICT can facilitate new forms of learning and because it has become important for young people to master ICT in preparation for adult life. But how extensive is access to ICT in schools and informal settings, and how do students use it? Drawing on data from the OECD’s Programme for International Student Assessment (PISA), Are Students Ready for a Technology-Rich World? What PISA Studies Tell Us examines whether access to computers is equitable across countries and student groups; how students use ICT and what their attitudes towards it are; the relationship between students’ access to and use of ICT and their performance in PISA 2003; and the implications for educational policy.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Robotic manipulation remains a core challenge in robotics, particularly for contact-rich tasks such as industrial assembly and disassembly. Existing datasets have significantly advanced learning in manipulation but are primarily focused on simpler tasks like object rearrangement, falling short of capturing the complexity and physical dynamics involved in assembly and disassembly. To bridge this gap, we present REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt), a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. Our dataset features multi-modal sensor data including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. This diverse dataset supports research in areas such as learning contact-rich manipulation, task condition identification, action segmentation, and more. We believe REASSEMBLE will be a valuable resource for advancing robotic manipulation in complex, real-world scenarios.
Each demonstration starts by randomizing the board and object poses, after which an operator teleoperates the robot to assemble and disassemble the board while narrating their actions and marking task segment boundaries with key presses. The narrated descriptions are transcribed using Whisper [1], and the board and camera poses are measured at the beginning using a motion capture system, though continuous tracking is avoided due to interference with the event camera. Sensory data is recorded with rosbag and later post-processed into HDF5 files without downsampling or synchronization, preserving raw data and timestamps for future flexibility. To reduce memory usage, video and audio are stored as encoded MP4 and MP3 files, respectively. Transcription errors are corrected automatically or manually, and a custom visualization tool is used to validate the synchronization and correctness of all data and annotations. Missing or incorrect entries are identified and corrected, ensuring the dataset’s completeness. Low-level skill annotations were added manually after data collection, and all labels were carefully reviewed to ensure accuracy.
The dataset consists of several HDF5 (.h5) and JSON (.json) files, organized into two directories. The poses directory contains the JSON files, which store the poses of the cameras and the board in the world coordinate frame. The data directory contains the HDF5 files, which store the sensory readings and annotations collected as part of the REASSEMBLE dataset. Each JSON file can be matched with its corresponding HDF5 file based on their filenames, which include the timestamp when the data was recorded. For example, 2025-01-09-13-59-54_poses.json corresponds to 2025-01-09-13-59-54.h5.
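Because the JSON and HDF5 files share a timestamp in their names, pairing them only requires string manipulation. This sketch assumes the `poses/` and `data/` directory layout and the `<timestamp>_poses.json` naming described above; `pair_recordings` is a hypothetical helper, not part of the REASSEMBLE code.

```python
from pathlib import Path
from typing import Optional


def pair_recordings(root: Path) -> dict[str, tuple[Path, Optional[Path]]]:
    """Map each recording timestamp to (HDF5 file, matching pose JSON or None).

    Assumes root/data/<timestamp>.h5 and root/poses/<timestamp>_poses.json.
    """
    pairs = {}
    for h5_path in sorted((root / "data").glob("*.h5")):
        stamp = h5_path.stem  # e.g. "2025-01-09-13-59-54"
        pose_path = root / "poses" / f"{stamp}_poses.json"
        # Keep recordings without a pose file, but flag them with None.
        pairs[stamp] = (h5_path, pose_path if pose_path.exists() else None)
    return pairs
```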
The structure of the JSON files is as follows:
{
  "Hama1": [
    [x, y, z],
    [qx, qy, qz, qw]
  ],
  "Hama2": [
    [x, y, z],
    [qx, qy, qz, qw]
  ],
  "DAVIS346": [
    [x, y, z],
    [qx, qy, qz, qw]
  ],
  "NIST_Board1": [
    [x, y, z],
    [qx, qy, qz, qw]
  ]
}
[x, y, z] represent the position of the object, and [qx, qy, qz, qw] represent its orientation as a quaternion.
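Each pose entry can be turned into a 4x4 homogeneous transform using the standard unit-quaternion-to-rotation-matrix formula. The sketch assumes the quaternion order is [qx, qy, qz, qw] as listed above and that each pose expresses the object's frame in the world frame; the function names are illustrative, not taken from the REASSEMBLE repository.

```python
import json
import math


def quat_to_matrix(qx, qy, qz, qw):
    """Convert a quaternion in (x, y, z, w) order to a 3x3 rotation matrix."""
    n = math.sqrt(qx * qx + qy * qy + qz * qz + qw * qw)
    qx, qy, qz, qw = qx / n, qy / n, qz / n, qw / n  # normalise defensively
    return [
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx * qx + qy * qy)],
    ]


def pose_to_transform(position, quaternion):
    """Build a 4x4 homogeneous transform from [x, y, z] and [qx, qy, qz, qw]."""
    rot = quat_to_matrix(*quaternion)
    # Append the translation as the last column, then the homogeneous row.
    return [rot[i] + [position[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def load_poses(path):
    """Read a *_poses.json file into {name: 4x4 transform}."""
    with open(path) as f:
        raw = json.load(f)
    return {name: pose_to_transform(pos, quat) for name, (pos, quat) in raw.items()}
```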
The HDF5 (.h5) format organizes data into two main types of structures: datasets, which hold the actual data, and groups, which act like folders that can contain datasets or other groups. In the diagram below, groups are shown as folder icons, and datasets as file icons. The main group of the file directly contains the video, audio, and event data. To save memory, video and audio are stored as encoded byte strings, while event data is stored as arrays. The robot’s proprioceptive information is kept in the robot_state group as arrays. Because different sensors record data at different rates, the arrays vary in length (signified by the N_xxx variable in the data shapes). To align the sensory data, each sensor’s timestamps are stored separately in the timestamps group. Information about action segments is stored in the segments_info group. Each segment is saved as a subgroup, named according to its order in the demonstration, and includes a start timestamp, end timestamp, a success indicator, and a natural language description of the action. Within each segment, low-level skills are organized under a low_level subgroup, following the same structure as the high-level annotations.
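Since each sensor has its own timestamp array of different length, one common way to align two streams is nearest-neighbour matching on timestamps; segment annotations can then be looked up by checking which start/end interval contains a timestamp. This is a minimal pure-Python sketch that assumes the timestamps have already been read from the file into sorted lists (for example with h5py; the exact dataset keys are not shown here), and the helper names are hypothetical. Nearest-neighbour matching is only one choice; interpolation may suit continuous signals such as force-torque better.

```python
from bisect import bisect_left


def nearest_indices(reference_ts, other_ts):
    """For each reference timestamp, return the index of the closest timestamp in other_ts.

    Both sequences must be sorted in ascending order.
    """
    if not other_ts:
        raise ValueError("other_ts must not be empty")
    result = []
    for t in reference_ts:
        i = bisect_left(other_ts, t)
        if i == 0:
            result.append(0)
        elif i == len(other_ts):
            result.append(len(other_ts) - 1)
        else:
            # Pick whichever neighbour is closer; ties go to the earlier sample.
            before, after = other_ts[i - 1], other_ts[i]
            result.append(i - 1 if t - before <= after - t else i)
    return result


def segment_for(t, segments):
    """Return the name of the segment whose [start, end] interval contains t, or None."""
    for name, (start, end) in segments.items():
        if start <= t <= end:
            return name
    return None
```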
The splits folder contains two text files that list the h5 files used for the training and validation splits.
The project website contains more details about the REASSEMBLE dataset. The code for loading and visualizing the data is available in our GitHub repository.
📄 Project website: https://tuwien-asl.github.io/REASSEMBLE_page/
💻 Code: https://github.com/TUWIEN-ASL/REASSEMBLE
| Recording | Issue |
|---|---|
| 2025-01-10-15-28-50.h5 | hand cam missing at beginning |
| 2025-01-10-16-17-40.h5 | missing hand cam |
| 2025-01-10-17-10-38.h5 | hand cam missing at beginning |
| 2025-01-10-17-54-09.h5 | no empty action at |
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
The English Healthcare Chat Dataset is a rich collection of over 12,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in English-speaking regions.
The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:
This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.
This dataset reflects the natural flow of English healthcare communication and includes:
These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.
Conversations range from simple inquiries to complex advisory sessions, including:
Each conversation typically includes these structural components:
This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.
Available in JSON, CSV, and TXT formats, each conversation includes:
This dataset contains 200 rows and 7 columns.
The World's Billionaires is an annual ranking, by documented net worth, of the world's wealthiest billionaires, compiled and published every March by the American business magazine Forbes. The list was first published in March 1987. The total net worth of each individual on the list is estimated and cited in United States dollars, based on their documented assets and accounting for debt. Royalty and dictators whose wealth comes from their positions are excluded from these lists. The ranking is an index of the wealthiest documented individuals and excludes those whose wealth cannot be completely ascertained. (Wikipedia)
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
We provide a dense interaction dataset, InterHub, derived from extensive naturalistic driving records to address the scarcity of real-world datasets capturing rich interaction events. The data provided on this page include:
- A CSV file (Interactive_Segments_Index.csv) containing the indexed list of extracted interaction events. In addition to indexing and tracing information about interaction scenarios, we also provide labels to facilitate more targeted retrieval and use of interaction scenarios. (For details, please refer to https://github.com/zxc-tju/InterHub.)
- Relevant unified data cache files (InterHub_cache_files.zip, which includes cache files of lyft_train_full and nuplan_train).

The Python code used to process and analyze the dataset can be found at https://github.com/zxc-tju/InterHub. The tools for navigating InterHub consist of the following three parts:
- 0_data_unify.py converts various data resources into a unified format for seamless interaction event extraction.
- 1_interaction_extract.py extracts interactive segments from unified driving records.
- 2_case_visualize.py showcases interaction scenarios in InterHub.

You can refer to the data structure of the cache files presented in dataset.md; after extracting InterHub_cache_files.zip, put the extracted files in the corresponding folder.

Statement: All third-party data redistributions included in the interhub_cache_files.zip repository are carried out in full compliance with the original licensing terms of the respective source datasets, as required by their mandatory licensing conditions. This portion of the data remains subject to its original licenses, and users of the data are required to comply with these original licensing terms in any subsequent use or redistribution.
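The labels in the index file are meant for targeted retrieval, and a simple equality filter over the CSV shows the idea. The column names `dataset`, `scenario_id` and `num_agents` and the rows below are placeholders; the real schema is documented in the InterHub repository, not here.

```python
import csv
import io

# Made-up excerpt of Interactive_Segments_Index.csv; column names are hypothetical.
SAMPLE_INDEX = """dataset,scenario_id,num_agents
lyft_train_full,s001,2
nuplan_train,s002,3
nuplan_train,s003,2
"""


def select_segments(csv_text, **criteria):
    """Return rows whose columns equal every given criterion (compared as strings)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader
            if all(row.get(col) == str(val) for col, val in criteria.items())]


if __name__ == "__main__":
    for row in select_segments(SAMPLE_INDEX, dataset="nuplan_train", num_agents=2):
        print(row["scenario_id"])
```

Comparing as strings keeps the filter schema-agnostic; numeric range queries would need explicit type conversion per column.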
According to our latest research, the global Golden Dataset Curation for LLMs market size stood at USD 1.42 billion in 2024, reflecting the surging demand for high-quality, bias-mitigated datasets in large language model (LLM) development. The market is projected to grow at a robust CAGR of 27.8% from 2025 to 2033, reaching an estimated USD 13.9 billion by 2033. This remarkable growth is fueled by the increasing sophistication of AI models, the critical need for reliable training data, and the expanding adoption of LLMs across diverse sectors.
Several key factors are driving the rapid expansion of the Golden Dataset Curation for LLMs market. First and foremost is the exponential growth in the deployment of large language models across industries such as healthcare, finance, legal, and customer service. As organizations seek to leverage LLMs for complex natural language processing tasks, the demand for meticulously curated, high-quality datasets has become paramount. This is because the performance, reliability, and ethical alignment of LLMs are intrinsically linked to the quality of their training data. Companies are increasingly investing in the curation of "golden datasets"—datasets that are not only comprehensive and representative but also rigorously annotated and validated to minimize bias and ensure regulatory compliance. This trend is expected to intensify as AI regulations tighten and as organizations strive for greater transparency and accountability in AI deployments.
Another significant growth driver for the Golden Dataset Curation for LLMs market is the advancement in data curation technologies and methodologies. The integration of automation, machine learning, and human-in-the-loop systems has revolutionized the way datasets are curated and validated. These advancements enable the efficient handling of vast and complex data sources, including text, image, audio, and multimodal datasets. The rise of specialized data curation platforms and services has further accelerated the adoption of golden dataset practices, allowing organizations to scale their AI initiatives while maintaining data integrity. Moreover, as LLMs become more multilingual and domain-specific, the need for curated datasets that reflect diverse languages, cultures, and industry-specific knowledge is growing rapidly, further boosting market demand.
The expanding ecosystem of AI applications is also propelling the Golden Dataset Curation for LLMs market forward. As LLMs are increasingly utilized for tasks such as model training, evaluation, benchmarking, and fine-tuning, the scope and complexity of required datasets have grown exponentially. Organizations are now seeking datasets that not only support model development but also facilitate continuous evaluation and improvement of AI models in real-world scenarios. This has led to a surge in demand for datasets that are regularly updated, contextually rich, and tailored to specific use cases. Additionally, the proliferation of open-source and third-party data sources, coupled with the need for proprietary datasets, has created a dynamic and competitive market landscape where data quality and curation expertise are key differentiators.
From a regional perspective, North America currently dominates the Golden Dataset Curation for LLMs market, accounting for the largest share in 2024. This leadership is attributed to the presence of major technology companies, a robust research ecosystem, and significant investments in AI and machine learning infrastructure. Europe and Asia Pacific are also emerging as key markets, driven by increasing regulatory focus on AI ethics and the rapid digital transformation of enterprises. The Asia Pacific region, in particular, is expected to witness the highest CAGR during the forecast period, fueled by rising AI adoption in countries such as China, Japan, and India. Meanwhile, Latin America and the Middle East & Africa are gradually catching up, supported by growing awareness of AI's potential and investments in digital infrastructure.
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
The Spanish Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Spanish-speaking regions.
The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:
This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.
This dataset reflects the natural flow of Spanish healthcare communication and includes:
These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.
Conversations range from simple inquiries to complex advisory sessions, including:
Each conversation typically includes these structural components:
This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.
Available in JSON, CSV, and TXT formats, each conversation includes:
License: Public Domain Mark 1.0 (https://creativecommons.org/publicdomain/mark/1.0/)
License information was derived automatically
The Mapping Ocean Wealth data viewer is a live online resource for sharing understanding of the value of marine and coastal ecosystems to people. It includes global maps, regionally specific studies, reference data, and a number of “apps” providing key data analytics. Maps and apps can be opened according to key themes or geographies. The navigator on the left of the maps enables you to add or remove any additional map layers as you explore. Information keys explain how the maps were made and provide additional links. Further information and resources can be found on Oceanwealth.org.
The land remote sensing community has a long history of using supervised and unsupervised methods to help interpret and analyze remote sensing data sets. Until relatively recently, most remote sensing studies have used fairly conventional image processing and pattern recognition methodologies. In the past decade, NASA has launched a series of remote sensing missions known as the Earth Observing System (EOS). The data sets acquired by EOS instruments provide an extremely rich source of information related to the properties and dynamics of the Earth’s terrestrial ecosystems. However, these data are also characterized by large volumes and complex spectral, spatial and temporal attributes. Because of the volume and complexity of EOS data sets, efficient and effective analysis of them presents significant challenges that are difficult to address using conventional remote sensing approaches. In this paper we discuss results from applying a variety of different data mining approaches to global remote sensing data sets. Specifically, we describe three main problem domains and sets of analyses: (1) supervised classification of global land cover using data from NASA’s Moderate Resolution Imaging Spectroradiometer; (2) the use of linear and non-linear cluster and dimensionality reduction methods to examine coupled climate-vegetation dynamics using a twenty-year time series of data from the Advanced Very High Resolution Radiometer; and (3) the use of functional models, non-parametric clustering, and mixture models to help interpret and understand the feature space and class structure of high-dimensional remote sensing data sets. The paper does not focus on specific details of algorithms. Instead, we describe key results, successes, and lessons learned from ten years of research focusing on the use of data mining and machine learning methods for remote sensing and Earth science problems.
License: https://www.futurebeeai.com/policies/ai-data-license-agreement
The Vietnamese Healthcare Chat Dataset is a rich collection of over 10,000 text-based conversations between customers and call center agents, focused on real-world healthcare interactions. Designed to reflect authentic language use and domain-specific dialogue patterns, this dataset supports the development of conversational AI, chatbots, and NLP models tailored for healthcare applications in Vietnamese-speaking regions.
The dataset captures a wide spectrum of healthcare-related chat scenarios, ensuring comprehensive coverage for training robust AI systems:
This variety helps simulate realistic healthcare support workflows and patient-agent dynamics.
This dataset reflects the natural flow of Vietnamese healthcare communication and includes:
These elements ensure the dataset is contextually relevant and linguistically rich for real-world use cases.
Conversations range from simple inquiries to complex advisory sessions, including:
Each conversation typically includes these structural components:
This structured flow mirrors actual healthcare support conversations and is ideal for training advanced dialogue systems.
Available in JSON, CSV, and TXT formats, each conversation includes: