14 datasets found

Forbes Top 200 Richest American
kaggle.com
zip
Updated Feb 6, 2023
Cite
Ayessa (2023). Forbes Top 200 Richest American [Dataset]. https://www.kaggle.com/datasets/ayessa/forbes-top-200-richest-american/code
Explore at:
Available download formats: zip (9658 bytes)
Dataset updated
Feb 6, 2023
Authors
Ayessa
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Forbes published "The Definitive Ranking of the Wealthiest Americans In 2022". Here is the top 200 list, including additional information on each of the billionaires.

About Dataset

This dataset contains the 200 richest Americans, ranked by net worth.

Columns Attributes

| Column | Meaning |
| -- | -- |
| rank | their rank |
| name | their name |
| net worth | their net worth |
| age | their age |
| title | their title (e.g. CEO, Chairman) |
| source of wealth | the source of their wealth |
| self made score | how far each billionaire has climbed to reach the top. According to Forbes, the score ranges from 1 to 10: 1 through 5 indicate someone who inherited some or all of his or her fortune, while 6 through 10 indicate someone who built a company or established a fortune on his or her own. |
| philanthropy score | how much each billionaire donates to nonprofit foundations |
| residence | their residence |
| marital status | their marital status |
| children | their children |
| education | their education |
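
The self made score bands described in the table can be expressed as a small helper. This is a minimal sketch: the function name `self_made_category` and the returned labels are illustrative assumptions and are not part of the dataset.

```python
def self_made_category(score: int) -> str:
    """Map a Forbes self made score (1-10) to the band described in the column table."""
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    # 1 through 5: inherited some or all of the fortune.
    # 6 through 10: built the company or fortune on their own.
    return "inherited" if score <= 5 else "self-made"


if __name__ == "__main__":
    for s in (1, 5, 6, 10):
        print(s, self_made_category(s))
```

A helper like this could be applied to the self made score column after loading the file, assuming the column holds integers.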

Methodology

This dataset was scraped from the Forbes website using the web scraping library Beautiful Soup.

Image by pch.vector on Freepik
Survey of Consumer Finances
federalreserve.gov
Updated Oct 18, 2023
Cite
Board of Governors of the Federal Reserve Board (2023). Survey of Consumer Finances [Dataset]. http://doi.org/10.17016/8799
Explore at:
Unique identifier
https://doi.org/10.17016/8799
Dataset updated
Oct 18, 2023
Dataset provided by
Federal Reserve System (http://www.federalreserve.gov/)
Federal Reserve Board of Governors
Authors
Board of Governors of the Federal Reserve Board
Time period covered
1962 - 2023
Description
The Survey of Consumer Finances (SCF) is normally a triennial cross-sectional survey of U.S. families. The survey data include information on families' balance sheets, pensions, income, and demographic characteristics.
Data from: REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic...
researchdata.tuwien.ac.at
txt, zip
Updated Jul 15, 2025
Cite
Daniel Jan Sliwowski; Shail Jadav; Sergej Stanovcic; Jędrzej Orbik; Johannes Heidersberger; Dongheui Lee (2025). REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly [Dataset]. http://doi.org/10.48436/0ewrv-8cb44
Explore at:
Available download formats: zip, txt
Unique identifier
https://doi.org/10.48436/0ewrv-8cb44
Dataset updated
Jul 15, 2025
Dataset provided by
TU Wien
Authors
Daniel Jan Sliwowski; Shail Jadav; Sergej Stanovcic; Jędrzej Orbik; Johannes Heidersberger; Dongheui Lee
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 9, 2025 - Jan 14, 2025
Description
REASSEMBLE: A Multimodal Dataset for Contact-rich Robotic Assembly and Disassembly

📋 Introduction

Robotic manipulation remains a core challenge in robotics, particularly for contact-rich tasks such as industrial assembly and disassembly. Existing datasets have significantly advanced learning in manipulation but are primarily focused on simpler tasks like object rearrangement, falling short of capturing the complexity and physical dynamics involved in assembly and disassembly. To bridge this gap, we present REASSEMBLE (Robotic assEmbly disASSEMBLy datasEt), a new dataset designed specifically for contact-rich manipulation tasks. Built around the NIST Assembly Task Board 1 benchmark, REASSEMBLE includes four actions (pick, insert, remove, and place) involving 17 objects. The dataset contains 4,551 demonstrations, of which 4,035 were successful, spanning a total of 781 minutes. Our dataset features multi-modal sensor data including event cameras, force-torque sensors, microphones, and multi-view RGB cameras. This diverse dataset supports research in areas such as learning contact-rich manipulation, task condition identification, action segmentation, and more. We believe REASSEMBLE will be a valuable resource for advancing robotic manipulation in complex, real-world scenarios.

✨ Key Features

Multimodality: REASSEMBLE contains data from robot proprioception, RGB cameras, force-torque sensors, microphones, and event cameras.

Multitask labels: REASSEMBLE contains labeling which enables research in Temporal Action Segmentation, Motion Policy Learning, Anomaly detection, and Task Inversion.

Long horizon: Demonstrations in the REASSEMBLE dataset cover long horizon tasks and actions which usually span multiple steps.

Hierarchical labels: REASSEMBLE contains action segmentation labels at two hierarchical levels.

🔴 Dataset Collection

Each demonstration starts by randomizing the board and object poses, after which an operator teleoperates the robot to assemble and disassemble the board while narrating their actions and marking task segment boundaries with key presses. The narrated descriptions are transcribed using Whisper [1], and the board and camera poses are measured at the beginning using a motion capture system, though continuous tracking is avoided due to interference with the event camera. Sensory data is recorded with rosbag and later post-processed into HDF5 files without downsampling or synchronization, preserving raw data and timestamps for future flexibility. To reduce memory usage, video and audio are stored as encoded MP4 and MP3 files, respectively. Transcription errors are corrected automatically or manually, and a custom visualization tool is used to validate the synchronization and correctness of all data and annotations. Missing or incorrect entries are identified and corrected, ensuring the dataset’s completeness. Low-level Skill annotations were added manually after data collection, and all labels were carefully reviewed to ensure accuracy.

📑 Dataset Structure

The dataset consists of several HDF5 (.h5) and JSON (.json) files, organized into two directories. The poses directory contains the JSON files, which store the poses of the cameras and the board in the world coordinate frame. The data directory contains the HDF5 files, which store the sensory readings and annotations collected as part of the REASSEMBLE dataset. Each JSON file can be matched with its corresponding HDF5 file based on their filenames, which include the timestamp when the data was recorded. For example, 2025-01-09-13-59-54_poses.json corresponds to 2025-01-09-13-59-54.h5.

The structure of the JSON files is as follows:

{
  "Hama1": [ [x, y, z], [qx, qy, qz, qw] ],
  "Hama2": [ [x, y, z], [qx, qy, qz, qw] ],
  "DAVIS346": [ [x, y, z], [qx, qy, qz, qw] ],
  "NIST_Board1": [ [x, y, z], [qx, qy, qz, qw] ]
}

[x, y, z] represent the position of the object, and [qx, qy, qz, qw] represent its orientation as a quaternion.
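
The file naming convention and the pose layout described above can be sketched as follows. This is an illustrative sketch, not the project's loader: the helper names `h5_name_for` and `parse_poses` are hypothetical, and the numeric values in `EXAMPLE` are made up.

```python
import json
from pathlib import Path


def h5_name_for(pose_filename: str) -> str:
    """Derive the HDF5 file name that shares a timestamp with a poses JSON file."""
    name = Path(pose_filename).name
    suffix = "_poses.json"
    if not name.endswith(suffix):
        raise ValueError(f"unexpected pose file name: {pose_filename}")
    return name[: -len(suffix)] + ".h5"


def parse_poses(text: str) -> dict:
    """Parse a poses JSON document into {name: {"position": ..., "quaternion": ...}}."""
    poses = {}
    for name, (position, quaternion) in json.loads(text).items():
        if len(position) != 3 or len(quaternion) != 4:
            raise ValueError(f"malformed pose for {name}")
        # Position is [x, y, z]; orientation is a quaternion [qx, qy, qz, qw].
        poses[name] = {"position": tuple(position), "quaternion": tuple(quaternion)}
    return poses


# Made-up values, for illustration only.
EXAMPLE = '{"NIST_Board1": [[0.5, 0.0, 0.1], [0.0, 0.0, 0.0, 1.0]]}'

if __name__ == "__main__":
    print(h5_name_for("2025-01-09-13-59-54_poses.json"))
    print(parse_poses(EXAMPLE))
```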

The HDF5 (.h5) format organizes data into two main types of structures: datasets, which hold the actual data, and groups, which act like folders that can contain datasets or other groups. In the diagram below, groups are shown as folder icons, and datasets as file icons. The main group of the file directly contains the video, audio, and event data. To save memory, video and audio are stored as encoded byte strings, while event data is stored as arrays. The robot’s proprioceptive information is kept in the robot_state group as arrays. Because different sensors record data at different rates, the arrays vary in length (signified by the N_xxx variable in the data shapes). To align the sensory data, each sensor’s timestamps are stored separately in the timestamps group. Information about action segments is stored in the segments_info group. Each segment is saved as a subgroup, named according to its order in the demonstration, and includes a start timestamp, end timestamp, a success indicator, and a natural language description of the action. Within each segment, low-level skills are organized under a low_level subgroup, following the same structure as the high-level annotations.
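
Because each sensor keeps its own timestamps, selecting the samples that belong to one action segment comes down to a range lookup per sensor. The sketch below uses plain Python lists as stand-ins for the timestamp arrays; it does not read the HDF5 files. It assumes that timestamps are sorted in ascending order and share units with the segment start and end times. The function name and the sample values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def samples_in_segment(timestamps, start, end):
    """Return the (first, last_exclusive) index range of samples with start <= t <= end.

    `timestamps` must be sorted in ascending order.
    """
    first = bisect_left(timestamps, start)
    last = bisect_right(timestamps, end)
    return first, last


# Hypothetical timestamps (seconds) for two sensors recorded at different rates.
camera_ts = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
force_ts = [0.0, 0.25, 0.5]

if __name__ == "__main__":
    # The same segment selects a different number of samples per sensor.
    print(samples_in_segment(camera_ts, 0.1, 0.3))
    print(samples_in_segment(force_ts, 0.1, 0.3))
```

Slicing each sensor array with its own index range keeps the raw sampling rates intact, which matches the dataset's choice to store data without downsampling or synchronization.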

📁

The splits folder contains two text files which list the h5 files used for the training and validation splits.

📌 Important Resources

The project website contains more details about the REASSEMBLE dataset. The code for loading and visualizing the data is available in our GitHub repository.

📄 Project website: https://tuwien-asl.github.io/REASSEMBLE_page/
💻 Code: https://github.com/TUWIEN-ASL/REASSEMBLE

⚠️ File comments

Below is a table listing the recordings that have known issues. Issues typically correspond to missing data from one of the sensors.

| Recording | Issue |
| -- | -- |
| 2025-01-10-15-28-50.h5 | hand cam missing at beginning |
| 2025-01-10-16-17-40.h5 | missing hand cam |
| 2025-01-10-17-10-38.h5 | hand cam missing at beginning |
| 2025-01-10-17-54-09.h5 | no empty action at |
Cyprus Number Dataset
listtodata.com
.csv, .xls, .txt
Updated Jul 17, 2025
Cite
List to Data (2025). Cyprus Number Dataset [Dataset]. https://listtodata.com/cyprus-dataset
Explore at:
Available download formats: .csv, .xls, .txt
Dataset updated
Jul 17, 2025
Authors
List to Data
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Jan 1, 2025 - Dec 31, 2025
Area covered
Cyprus
Variables measured
phone numbers, email address, full name, address, city, state, gender, age, income, IP address
Description
Cyprus number dataset provides millions of powerful contacts for direct marketing. Our List To Data team carefully gathers these leads to maintain the privacy from many trusted sources. Further, you can get all verified contacts from this site for any business to communicate with recent clients. This Cyprus number dataset creates significant opportunities for boosting company sales. This Cyprus number dataset is also highly effective for business promotion through cold calls and text messages. This telemarketing method gives instant feedback from the consumers and extends contracts. Despite this, we collect the number directory for you in CSV or Excel format. In general, anyone can run it in any CRM software without any trouble. Cyprus phone data is a very valuable contact library for SMS and telemarketing. Besides, the marketing database plays a vital role in direct business plans. Actually, our team prioritizes security and strictly adheres to all GDPR rules. Anyone can buy this library from List To Data without any doubt. Most importantly, you can make your business more well-known by increasing productivity. Moreover, the Cyprus phone data helps in many ways to earn more money from this country. This country is very wealthy in all those sectors, so you can buy this number package now. Our website is the perfect place to bring all authentic client mobile contact numbers. To this end, our team is ready to help you 24/7 in supplying your necessary leads. Cyprus phone number list makes your business more profitable in a couple of months. This country has the nominal GDP (US$35 billion) and the most extensive by purchasing power parity (US$60 billion). As a result, there is a big chance of earning more from here. As such agriculture, services, industry, and tourism, are the main sources of income in Cyprus. In fact, you can get their mobile numbers from us for direct calls or SMS marketing. 
In addition, this Cyprus phone number list is far better for your business activities nationwide. Mainly, you can do the marketing with this massive group of individuals. Thus, it will increase your deals rapidly and develop the company’s wealth. Definitely, as a businessman, you take your required sales leads from our website at a minimal cost.
Replication Data for: "The Rich are Different from You and Me": College...
dataverse.harvard.edu
datasetcatalog.nlm.nih.gov
Updated Oct 11, 2017
Cite
Tali Mendelberg; Katherine McCabe; Adam Thal (2017). Replication Data for: "The Rich are Different from You and Me": College Socialization and the Economic Views of Affluent Americans [Dataset]. http://doi.org/10.7910/DVN/FS90RJ
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/FS90RJ
Dataset updated
Oct 11, 2017
Dataset provided by
Harvard Dataverse
Authors
Tali Mendelberg; Katherine McCabe; Adam Thal
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.7910/DVN/FS90RJ
Description
Affluent Americans support more conservative economic policies than the non-affluent, and government responds disproportionately to these views. Yet little is known about the emergence of these consequential views. We develop, test and find support for a theory of class cultural norms: these preferences are partly traceable to socialization that occurs on predominately affluent college campuses, especially those with norms of financial gain, and especially among socially embedded students. The economic views of the student’s cohort also matter, in part independently of affluence. We use a large panel dataset with a high response rate and more rigorous causal inference strategies than previous socialization studies. The affluent campus effect holds with matching, among students with limited school choice, and in a natural experiment, and passes placebo tests. College socialization partly explains why affluent Americans support economically conservative policies.
Replication Data for: Income Inequality and State Parties: Who Gets...
datasearch.gesis.org
dataverse-staging.rdmc.unc.edu
Updated Feb 22, 2020
Cite
Wright, Gerald; Rigby, Elizabeth (2020). Replication Data for: Income Inequality and State Parties: Who Gets Represented? [Dataset]. http://doi.org/10.15139/S3/XJZONF
Explore at:
Unique identifier
https://doi.org/10.15139/S3/XJZONF
Dataset updated
Feb 22, 2020
Dataset provided by
Odum Institute Dataverse Network
Authors
Wright, Gerald; Rigby, Elizabeth
Description
Recent studies of representation at the national and state levels have provided evidence that elected officials’ votes, political parties’ platforms, and enacted policy choices are more responsive to the preferences of the affluent, while those with average incomes and the poor have little or no impact in the political process. Yet, this research on the dominance of the affluent has overlooked key partisan differences in the electorate. In this era of hyper-partisanship, we argue that representation occurs through the party system, and we test whether taking this reality into account changes the story of policy dominance by the rich. We combine data on public preferences and state party positions to test for income bias in parties’ representation of their own co-partisans. The results show an interesting pattern in which under-representation of the poor is driven by Democratic parties pushing the more liberal social policy stances of rich Democrats and Republican parties reflecting the particularly conservative economic policy preferences of rich Republicans. Thus, we have ample evidence that the wealthy, more often than not, do call the shots, but that the degree to which this disproportionate party responsiveness produces less representative policies depends on the party in power and the policy dimension being considered. We conclude by linking this pattern of influence and “coincidental representation” to familiar changes which define the transformation of the New Deal party system.
Human Resources Data Set
kaggle.com
zip
Updated Oct 19, 2020
Cite
Dr. Rich (2020). Human Resources Data Set [Dataset]. https://www.kaggle.com/datasets/rhuebner/human-resources-data-set/discussion
Explore at:
Available download formats: zip (17041 bytes)
Dataset updated
Oct 19, 2020
Authors
Dr. Rich
Description
Updated 30 January 2023

Version 14 of Dataset

License Update:

There has been some confusion around licensing for this data set. Dr. Carla Patalano and Dr. Rich Huebner are the original authors of this dataset.

We provide a license to anyone who wishes to use this dataset for learning or teaching. For the purposes of sharing, please follow this license:

CC-BY-NC-ND This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Codebook

https://rpubs.com/rhuebner/hrd_cb_v14

PLEASE NOTE -- I recently updated the codebook - please use the above link. A few minor discrepancies were identified between the codebook and the dataset. Please feel free to contact me through LinkedIn (www.linkedin.com/in/RichHuebner) to report discrepancies and make requests.

Context

HR data can be hard to come by, and HR professionals generally lag behind with respect to analytics and data visualization competency. Thus, Dr. Carla Patalano and I set out to create our own HR-related dataset, which is used in one of our graduate MSHRM courses called HR Metrics and Analytics, at New England College of Business. We created this data set ourselves. We use the data set to teach HR students how to use and analyze the data in Tableau Desktop - a data visualization tool that's easy to learn.

This version provides a variety of features that are useful for both data visualization AND creating machine learning / predictive analytics models. We are working on expanding the data set even further by generating even more records and a few additional features. We will be keeping this as one file/one data set for now. There is a possibility of creating a second file perhaps down the road where you can join the files together to practice SQL/joins, etc.

Note that this dataset isn't perfect. By design, there are some issues that are present. It is primarily designed as a teaching data set - to teach human resources professionals how to work with data and analytics.

Content

We have reduced the complexity of the dataset down to a single data file (v14). The CSV revolves around a fictitious company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for termination, department, whether they are active or terminated, position title, pay rate, manager name, and performance score.

Recent additions to the data include:
- Absences
- Most Recent Performance Review Date
- Employee Engagement Score

Acknowledgements

Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over 200 Human Resource Management students at the college. Students in the course learn data visualization techniques with Tableau Desktop and use this data set to complete a series of assignments.

Inspiration

We've included some open-ended questions that you can explore and try to address through creating Tableau visualizations, or R or Python analyses. Good luck and enjoy the learning!

Is there any relationship between who a person works for and their performance score?

What is the overall diversity profile of the organization?

What are our best recruiting sources if we want to ensure a diverse organization?

Can we predict who is going to terminate and who isn't? What level of accuracy can we achieve on this?

Are there areas of the company where pay is not equitable?

There are so many other interesting questions that could be addressed through this interesting data set. Dr. Patalano and I look forward to seeing what we can come up with.

If you have any questions or comments about the dataset, please do not hesitate to reach out to me on LinkedIn: http://www.linkedin.com/in/RichHuebner

You can also reach me via email at: Richard.Huebner@go.cambridgecollege.edu
TIGER/Line Shapefile, Current, County, Rich County, UT, Address...
catalog.data.gov
Updated Aug 9, 2025
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). TIGER/Line Shapefile, Current, County, Rich County, UT, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-current-county-rich-county-ut-address-range-feature
Explore at:
Dataset updated
Aug 9, 2025
Dataset provided by
United States Census Bureau (http://census.gov/)
Area covered
Rich County
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Range Features shapefile contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address ranges are associated to the edge. This shapefile contains a record for each address range to street name combination. Address ranges associated to more than one street name are also represented by multiple coincident address range feature edge records. Note that this shapefile includes all unsuppressed address ranges compared to the All Lines shapefile (edges.shp) which only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefiles contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line shapefiles are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
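
The definition of an address range given above (every structure number from the first to the last that shares a specified parity) can be illustrated with a short helper. This is a simplified sketch under stated assumptions: structure numbers are plain integers, the parity is taken from the first number, and the range may run in either direction along the edge. The function name is hypothetical and is not part of any Census Bureau tool.

```python
def potential_structure_numbers(first: int, last: int) -> list[int]:
    """List the potential structure numbers of an address range.

    Assumes the range's parity is that of `first`, and that `last` shares it.
    """
    if (first - last) % 2 != 0:
        raise ValueError("first and last must have the same parity")
    # Step by two to stay on one parity; walk downward if the range is descending.
    step = 2 if last >= first else -2
    return list(range(first, last + step // 2, step))


if __name__ == "__main__":
    print(potential_structure_numbers(101, 109))
    print(potential_structure_numbers(208, 200))
```

The numbers returned are potential addresses only; as the description notes, the actual structures may not exist.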
TIGER/Line Shapefile, Current, County, Rich County, UT, Area Hydrography
catalog.data.gov
Updated Aug 8, 2025
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). TIGER/Line Shapefile, Current, County, Rich County, UT, Area Hydrography [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-current-county-rich-county-ut-area-hydrography
Explore at:
Dataset updated
Aug 8, 2025
Dataset provided by
United States Census Bureau (http://census.gov/)
Area covered
Rich County
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Census Bureau includes landmarks in the MTS for locating special features and to help enumerators during field operations. Some of the more common landmark types include airports, cemeteries, parks, schools, and churches and other religious institutions. The Census Bureau adds landmark features to MTS on an as-needed basis and does not ensure that all instances of a particular feature are included. The presence or absence of a landmark such as a hospital or prison does not mean that the living quarters associated with that landmark were geocoded to that census tabulation block or excluded from the census enumeration. The Area Landmark shapefile does not include military installations or water bodies because they appear in their own separate shapefiles, mil.shp and areawater.shp respectively.
TIGER/Line Shapefile, Current, County, Rich County, UT, Area Hydrography |...
gimi9.com
Cite
TIGER/Line Shapefile, Current, County, Rich County, UT, Area Hydrography | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_tiger-line-shapefile-current-county-rich-county-ut-area-hydrography/
Explore at:
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Utah, Rich County
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Census Bureau includes landmarks in the MTS for locating special features and to help enumerators during field operations. Some of the more common landmark types include airports, cemeteries, parks, schools, and churches and other religious institutions. The Census Bureau adds landmark features to MTS on an as-needed basis and does not ensure that all instances of a particular feature are included. The presence or absence of a landmark such as a hospital or prison does not mean that the living quarters associated with that landmark were geocoded to that census tabulation block or excluded from the census enumeration. The Area Landmark shapefile does not include military installations or water bodies because they appear in their own separate shapefiles, mil.shp and areawater.shp respectively.
TIGER/Line Shapefile, 2022, County, Rich County, UT, Address Range-Feature
catalog.data.gov
Updated Jan 28, 2024
Cite
U.S. Department of Commerce, U.S. Census Bureau, Geography Division, Spatial Data Collection and Products Branch (Point of Contact) (2024). TIGER/Line Shapefile, 2022, County, Rich County, UT, Address Range-Feature [Dataset]. https://catalog.data.gov/dataset/tiger-line-shapefile-2022-county-rich-county-ut-address-range-feature
Explore at:
Dataset updated
Jan 28, 2024
Dataset provided by
United States Census Bureau (http://census.gov/)
Area covered
Rich County
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The Address Ranges Feature Shapefile (ADDRFEAT.dbf) contains the geospatial edge geometry and attributes of all unsuppressed address ranges for a county or county equivalent area. The term "address range" refers to the collection of all possible structure numbers from the first structure number to the last structure number and all numbers of a specified parity in between along an edge side relative to the direction in which the edge is coded. Single-address address ranges have been suppressed to maintain the confidentiality of the addresses they describe. Multiple coincident address range feature edge records are represented in the shapefile if more than one left or right address ranges are associated to the edge. The ADDRFEAT shapefile contains a record for each address range to street name combination. Address range associated to more than one street name are also represented by multiple coincident address range feature edge records. Note that the ADDRFEAT shapefile includes all unsuppressed address ranges compared to the All Lines Shapefile (EDGES.shp) which only includes the most inclusive address range associated with each side of a street edge. The TIGER/Line shapefile contain potential address ranges, not individual addresses. The address ranges in the TIGER/Line Files are potential ranges that include the full range of possible structure numbers even though the actual structures may not exist.
Richest Zip Codes in South Carolina
incomebyzipcode.com
Updated Dec 18, 2024
Cite
Cubit Planning, Inc. (2024). Richest Zip Codes in South Carolina [Dataset]. https://www.incomebyzipcode.com/southcarolina
Explore at:
Dataset updated
Dec 18, 2024
Dataset authored and provided by
Cubit Planning, Inc.
License
https://www.incomebyzipcode.com/terms#TERMS
Area covered
South Carolina
Description
A dataset listing the richest zip codes in South Carolina per the most current US Census data, including information on rank and average income.
Web-Harvested Image and Caption Dataset
kaggle.com
zip
Updated Dec 6, 2023
Cite
The Devastator (2023). Web-Harvested Image and Caption Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/web-harvested-image-and-caption-dataset
Explore at:
Available download formats: zip (233254845 bytes)
Dataset updated
Dec 6, 2023
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description

Web-Harvested Image and Caption Dataset

By conceptual_captions (From Huggingface) [source]

About this dataset

The Conceptual Captions dataset, hosted on Kaggle, is a comprehensive and expansive collection of web-harvested images and their corresponding captions. With a staggering total of approximately 3.3 million images, this dataset offers a rich resource for training and evaluating image captioning models.

Unlike other image caption datasets, the unique feature of Conceptual Captions lies in the diverse range of styles represented in its captions. These captions are sourced from the web, specifically extracted from the Alt-text HTML attribute associated with web images. This approach ensures that the dataset encompasses a broad variety of textual descriptions that accurately reflect real-world usage scenarios.

To guarantee the quality and reliability of these captions, an elaborate automatic pipeline has been developed for extracting, filtering, and transforming each image/caption pair. The goal behind this diligent curation process is to provide clean, informative, fluent, and learnable captions that effectively describe their corresponding images.

The dataset itself consists of two primary components: train.csv and validation.csv files. The train.csv file comprises an extensive collection of over 3.3 million web-harvested images along with their respective carefully curated captions. Each image is accompanied by its unique URL to allow easy retrieval during model training.

On the other hand, validation.csv contains approximately 100,000 image URLs paired with their corresponding informative captions. This subset serves as an invaluable resource for validating and evaluating model performance after training on the larger train.csv set.

Researchers and data scientists can use the Conceptual Captions dataset to develop computer vision models for tasks such as image understanding, natural language processing (NLP), and multimodal learning that combines visual features with textual context.

By pairing a large set of images with descriptive captions gathered from across the web and filtered through an automatic curation pipeline, Conceptual Captions gives practitioners in artificial intelligence (AI), machine learning, computer vision, and natural language processing a resource for exploring visual understanding and textual comprehension.

How to use the dataset

Title: How to Use the Conceptual Captions Dataset for Web-Harvested Image and Caption Analysis

Introduction: The Conceptual Captions dataset is an extensive collection of web-harvested images, each accompanied by a caption. This guide aims to help you understand and effectively utilize this dataset for various applications, such as image captioning, natural language processing, computer vision tasks, and more. Let's dive into the details!

Step 1: Acquiring the Dataset

Step 2: Exploring the Dataset Files

After downloading the dataset files ('train.csv' and 'validation.csv'), you'll find that each file consists of multiple columns containing valuable information:

a) 'caption': This column holds captions associated with each image. It provides textual descriptions that can be used in various NLP tasks.
b) 'image_url': This column contains URLs pointing to individual images in the dataset.
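
As a minimal sketch of reading these two columns, the snippet below parses an in-memory sample with Python's standard `csv` module. The sample rows and the `read_pairs` helper are illustrative assumptions; only the column names 'caption' and 'image_url' come from the description above, and the real files are assumed to carry them in a header row.

```python
import csv
import io

# Hypothetical two-row sample standing in for train.csv. The real file is
# assumed to start with a header row naming 'caption' and 'image_url'.
SAMPLE_CSV = """caption,image_url
a dog running on a beach,https://example.com/dog.jpg
"city skyline at night, seen from the river",https://example.com/city.jpg
"""


def read_pairs(handle):
    """Yield (caption, image_url) tuples from an open CSV file handle."""
    for row in csv.DictReader(handle):
        yield row["caption"], row["image_url"]


pairs = list(read_pairs(io.StringIO(SAMPLE_CSV)))
for caption, url in pairs:
    print(f"{url} -> {caption}")
```

To read a downloaded file instead of the sample, the same helper could be called with `open("train.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory. Because `csv.DictReader` looks columns up by name, the order of the two columns in the file does not matter.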

Step 3: Understanding Dataset Structure

The Conceptual Captions dataset follows a tabular format where each row represents an image/caption pair. Combining knowledge from both train.csv and validation.csv files will give you access to a diverse range of approximately 3.4 million paired examples.

Step 4: Preprocessing Considerations

Due to its web-harvested nature, it is recommended to perform certain preprocessing steps on this dataset before utilizing it for your specific task(s). Some considerations include:

a) Text Cleaning: Perform basic text cleaning techniques such as removing special characters or applying sentence tokenization.
b) Filtering: Depending on your application, you may need to apply specific filters to remove captions that are irrelevant, inaccurate, or noisy.
c) Language Preprocessing: Consider using techniques like lemmatization or stemming if it suits your task.
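
The text-cleaning and filtering considerations above might look roughly like the sketch below. The helper names, the sample captions, and the three-word minimum are assumptions chosen for illustration, not rules from the dataset; the character filter also assumes English captions, since it discards anything outside a-z and 0-9.

```python
import re


def clean_caption(text):
    """Lowercase, replace non-alphanumeric characters with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def keep_caption(text, min_words=3):
    """Hypothetical noise filter: keep captions with at least min_words words."""
    return len(text.split()) >= min_words


# Illustrative raw captions, including one that looks like a file name.
raw = ["  A DOG -- running on the beach!! ", "img_0042", "Sunset over   the bay."]
cleaned = [clean_caption(c) for c in raw]
filtered = [c for c in cleaned if keep_caption(c)]
print(filtered)
```

Lemmatization or stemming would need a third-party NLP library and is left out of this sketch.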

Step 5: Training and Evaluation

Once you have preprocessed the dataset as per your requirements, it's time to train your models! The Conceptual Captions dataset can be used for a range of tasks such as image captioni...
Factory Workers’ Daily Performance & Attrition
kaggle.com
zip
Updated Feb 8, 2023
Cite
Matthew E. Gladden (2023). Factory Workers’ Daily Performance & Attrition [Dataset]. https://www.kaggle.com/datasets/gladdenme/factory-workers-daily-performance-attrition-s/discussion
Explore at:
Available download formats: zip (16073374 bytes)
Dataset updated
Feb 8, 2023
Authors
Matthew E. Gladden
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This synthetic dataset contains 18 months’ worth of daily performance and attrition data (411,948 observations) for a factory whose organizational structure comprises 508 workers. Due to employee turnover, a total of 687 persons appear in the dataset. The dataset’s observations cover both regular daily events (like workers’ attendance and daily level of Efficacy) and special one-time events (like accidents, an employee’s termination, or the onboarding of a new employee). A unique feature of the dataset is diverse causal relationships “hidden” within the data that are waiting to be uncovered through machine learning. For example, one might apply machine learning to investigate:
- How a worker’s high level of performance increases the likelihood that he or she will be “hired away” to a better job by a competing company.
- How a worker’s mental lapses or physical accidents may indicate that he or she is becoming sick and may soon miss a day of work due to illness.
- How workers’ Efficacy is influenced by the day of the week, day of the month, and month of the year.
- How workers’ age impacts their average daily Efficacy.
- How workers’ average daily Efficacy is influenced by the difference in age between them and their supervisor.
- How a worker’s average daily Efficacy is influenced by whether he or she is working primarily with teammates of the same or opposite sex.
- How the number of “Teamwork” and “Disruption” behaviors displayed by workers and recorded by their managers is influenced by the day of the month (and, e.g., the stress caused by impending production deadlines).
- How workers can be classified into groups with high, moderate, or low daily Efficacy that is either relatively stable or highly variable.

The dataset was prepared using Synaptans WorkforceSim version 0.3.15.
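
One of the questions listed above, how Efficacy varies with the day of the week, could be approached with a simple grouped average before any machine learning is applied. The sketch below uses only the standard library; the observations are made-up values and the tuple layout is an assumption, since the description does not give the dataset's column names.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical (date, recorded Efficacy) observations, not taken from the dataset.
observations = [
    (date(2021, 1, 4), 0.9),   # a Monday
    (date(2021, 1, 11), 0.7),  # a Monday
    (date(2021, 1, 8), 0.6),   # a Friday
]

# Group Efficacy values by weekday index (0 = Monday ... 6 = Sunday).
by_weekday = defaultdict(list)
for day, efficacy in observations:
    by_weekday[day.weekday()].append(efficacy)

# Average Efficacy per weekday, rounded for display.
weekday_means = {idx: round(mean(values), 2) for idx, values in by_weekday.items()}
print(weekday_means)
```

The same grouping key could be swapped for `day.day` or `day.month` to look at day-of-month or month-of-year effects.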

Types of observations

Each row of the dataset reflects a single “event” that occurred on a particular day in relation to a particular worker. It’s possible for a given worker to have more than one event (and row) for the same day; for example, a worker might have “Presence”, “Efficacy”, and “Teamwork” events on the same day, with each event reflecting a different qualitative or quantitative aspect of the person’s performance. There are 14 types of events reflected in the dataset:
- A “Presence” event indicates that a particular worker showed up for work on the given day, while an “Absence” event indicates that the worker failed to show up on a day when he or she was scheduled to work.
- An “Efficacy” event reflects the degree of efficiency and productivity that an employee demonstrated over the course of the given workday. There are two related elements: the actual Efficacy that the employee generated on the given day, and the recorded Efficacy that the worker’s supervisor entered into the factory’s HRM/ERP system for the employee for that day. The employee’s actual Efficacy (a float with four post-decimal digits, such as 0.9548) is a “hidden” variable whose precise value isn’t known to the employee’s manager. Instead, the manager observes the employee over the course of the day and, at the end of the workday, enters into the HRM/ERP system an estimated value for the worker’s Efficacy. Such estimated values have only one post-decimal digit (e.g., 0.9). Some managers are better than others at estimating their employees’ Efficacy scores – but no supervisor is perfect. For example, if an employee worked with an Efficacy of 0.8437 on a given day, his or her manager could easily record an estimated value of 0.8 or 0.9 or (less likely) even 0.7 or 1.0. Workers who feel as though their managers are consistently mis-recording their Efficacy levels may eventually become inclined to quit their jobs.
- A “Resignation” event indicates that on the given date, an employee quit his or her job (i.e., the employee experienced a voluntary separation). After resigning, the employee was no longer a part of the workforce and did not generate any future behaviors. Only Laborers and Team Leaders are liable to experience a Resignation event; the factory’s Shift Managers and Production Director remain in place throughout the entire period.
- A “Termination” event is like a Resignation behavior, except that the employee was fired by the organization (i.e., the employee experienced an involuntary separation).
- An “Onboarding” event indicates that the subject is a newly hired employee who began work on the given date. In order to maintain a stable size for the factory’s workforce, a new employee is hired whenever an existing employee has resigned or been terminated. The new employee assumes the organizational role vacated by the recently separated worker (i.e., having the same Shift, Team, and Role), although his or her personal characteristics may differ greatly from those of the person whom he or she is replacing...
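
To make the event-per-row layout and the actual versus recorded Efficacy distinction concrete, here is a small sketch under stated assumptions. The field names (`worker_id`, `date`, `event`, `actual`, `recorded`) and the dates are hypothetical and are not the dataset's real column names; the Efficacy values reuse the examples quoted above.

```python
from collections import defaultdict

# Hypothetical event rows; field names and dates are illustrative only.
events = [
    {"worker_id": 1, "date": "2021-01-04", "event": "Presence"},
    {"worker_id": 1, "date": "2021-01-04", "event": "Efficacy", "actual": 0.8437, "recorded": 0.9},
    {"worker_id": 1, "date": "2021-01-04", "event": "Teamwork"},
    {"worker_id": 2, "date": "2021-01-04", "event": "Efficacy", "actual": 0.9548, "recorded": 0.9},
]

# Group event types by (worker, day): one worker can have several rows per day.
by_worker_day = defaultdict(list)
for row in events:
    by_worker_day[(row["worker_id"], row["date"])].append(row["event"])

# Recording error per worker: the manager's estimate minus the hidden actual value.
errors = {
    row["worker_id"]: round(row["recorded"] - row["actual"], 4)
    for row in events
    if row["event"] == "Efficacy"
}
print(dict(by_worker_day))
print(errors)
```

A positive error means the manager over-recorded the worker's Efficacy, and a negative error means it was under-recorded. Tracking that error per worker over time is one way to explore the link to resignations that the description mentions.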

Recording	Issue
2025-01-10-15-28-50.h5	hand cam missing at beginning
2025-01-10-16-17-40.h5	missing hand cam
2025-01-10-17-10-38.h5	hand cam missing at beginning
2025-01-10-17-54-09.h5	no empty action at
