Hirai-Labs/alpr-vlm-instruct-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
A subset of the LendingClub DataSet obtained from Kaggle: https://www.kaggle.com/wordsforthewise/lending-club
LendingClub is a US peer-to-peer lending company, headquartered in San Francisco, California. It was the first peer-to-peer lender to register its offerings as securities with the Securities and Exchange Commission (SEC), and to offer loan trading on a secondary market. LendingClub is the world's largest peer-to-peer lending platform.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The Synthetic Employee Attrition Dataset is a simulated dataset designed for the analysis and prediction of employee attrition. It contains detailed information about various aspects of an employee's profile, including demographics, job-related features, and personal circumstances.
The dataset comprises 74,498 samples, split into training and testing sets to facilitate model development and evaluation. Each record includes a unique Employee ID and features that influence employee attrition. The goal is to understand the factors contributing to attrition and develop predictive models to identify at-risk employees.
This dataset is ideal for HR analytics, machine learning model development, and demonstrating advanced data analysis techniques. It provides a comprehensive and realistic view of the factors affecting employee retention, making it a valuable resource for researchers and practitioners in the field of human resources and organizational development.
FEATURES:
- Employee ID: A unique identifier assigned to each employee.
- Age: The age of the employee, ranging from 18 to 60 years.
- Gender: The gender of the employee.
- Years at Company: The number of years the employee has been working at the company.
- Monthly Income: The monthly salary of the employee, in dollars.
- Job Role: The department or role the employee works in, encoded into categories such as Finance, Healthcare, Technology, Education, and Media.
- Work-Life Balance: The employee's perceived balance between work and personal life (Poor, Below Average, Good, Excellent).
- Job Satisfaction: The employee's satisfaction with their job (Very Low, Low, Medium, High).
- Performance Rating: The employee's performance rating (Low, Below Average, Average, High).
- Number of Promotions: The total number of promotions the employee has received.
- Distance from Home: The distance between the employee's home and workplace, in miles.
- Education Level: The highest education level attained by the employee (High School, Associate Degree, Bachelor’s Degree, Master’s Degree, PhD).
- Marital Status: The marital status of the employee (Divorced, Married, Single).
- Job Level: The job level of the employee (Entry, Mid, Senior).
- Company Size: The size of the company the employee works for (Small, Medium, Large).
- Company Tenure: The total number of years the employee has been working in the industry.
- Remote Work: Whether the employee works remotely (Yes or No).
- Leadership Opportunities: Whether the employee has leadership opportunities (Yes or No).
- Innovation Opportunities: Whether the employee has opportunities for innovation (Yes or No).
- Company Reputation: The employee's perception of the company's reputation (Very Poor, Poor, Good, Excellent).
- Employee Recognition: The level of recognition the employee receives (Very Low, Low, Medium, High).
- Attrition: Whether the employee has left the company, encoded as 0 (stayed) and 1 (left).
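Several of the features above are ordered categories, so one way to prepare them for a model is to map each level to its rank. The following is a minimal sketch, assuming the column names match the feature names listed above (the actual file headers may differ); the `encode_ordinal` helper and the sample record are hypothetical and not part of the dataset.

```python
# Ordered category levels copied from the feature descriptions above.
# Assumption: column names in the data files match these feature names.
ORDINAL_LEVELS = {
    "Work-Life Balance": ["Poor", "Below Average", "Good", "Excellent"],
    "Job Satisfaction": ["Very Low", "Low", "Medium", "High"],
    "Performance Rating": ["Low", "Below Average", "Average", "High"],
    "Job Level": ["Entry", "Mid", "Senior"],
    "Company Size": ["Small", "Medium", "Large"],
}


def encode_ordinal(record):
    """Return a copy of `record` with ordinal text levels replaced by 0-based ranks."""
    encoded = dict(record)
    for feature, levels in ORDINAL_LEVELS.items():
        if feature in encoded:
            # list.index raises ValueError on an unknown level, which surfaces bad data early.
            encoded[feature] = levels.index(encoded[feature])
    return encoded


# Hypothetical record for illustration; not taken from the dataset.
sample = {"Age": 34, "Job Level": "Mid", "Work-Life Balance": "Good", "Attrition": 0}
encoded_sample = encode_ordinal(sample)
```

Nominal features such as Job Role or Marital Status have no natural order, so a rank encoding like this would not suit them; one-hot encoding is the usual alternative.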
dvs/90sclub-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
What this collection is: A curated, binary-classified dataset of grayscale (single-band) 400 x 400-pixel images, or image chips, in JPEG format, extracted from processed Sentinel-1 Synthetic Aperture Radar (SAR) satellite scenes acquired over various regions of the world. It features clear open-ocean chips, look-alikes (wind or biogenic features), and oil slick chips.
This binary dataset contains chips labelled as:
- "0" for chips not containing any oil features (look-alikes or clean seas)
- "1" for those containing oil features.
This binary dataset is imbalanced, and biased towards "0" labelled chips (i.e., no oil features), which correspond to 66% of the dataset. Chips containing oil features, labelled "1", correspond to 34% of the dataset.
Why: This dataset can be used for training, validation and/or testing of machine learning, including deep learning, algorithms for the detection of oil features in SAR imagery. Directly applicable for algorithm development for the European Space Agency Sentinel-1 SAR mission (https://sentinel.esa.int/web/sentinel/missions/sentinel-1 ), it may be suitable for the development of detection algorithms for other SAR satellite sensors.
Overview of this dataset: Total number of chips (both classes) is N = 5,630.

| Class | 0 | 1 |
|-------|-------|-------|
| Total | 3,725 | 1,905 |
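Because the two classes are imbalanced, a training loss is often weighted by class. The sketch below recomputes the class shares from the chip counts given above and derives inverse-frequency weights. The weighting formula (total / (number of classes × class count)) is a common heuristic chosen here for illustration, not something prescribed by the dataset authors.

```python
# Chip counts per class, taken from the overview above.
counts = {0: 3725, 1: 1905}

total = sum(counts.values())

# Share of each class in the dataset.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency weights: rarer classes get proportionally larger weights.
# Assumption: this heuristic is one reasonable choice among several.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"class {label}: share={shares[label]:.2%}, weight={weights[label]:.3f}")
```

With these weights the oil class ("1") counts roughly twice as much per chip as the no-oil class ("0"), which offsets the 66/34 split.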
Further information and description is found in the ReadMe file provided (ReadMe_Sentinel1_SAR_OilNoOil_20221215.txt)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents a detailed breakdown of the count of individuals within distinct income brackets, categorized by gender (men and women) and employment type, full-time (FT) and part-time (PT), offering insight into the income landscape of Glen Echo Park. The dataset can be used to examine gender-based income distribution within the Glen Echo Park population, aiding in data analysis and decision-making.
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Income brackets:
Variables / Data Columns
Employment type classifications include:
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Glen Echo Park median household income by race.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Pen Dataset is a dataset for object detection tasks - it contains Pen annotations for 304 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Oakdale by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Oakdale across both sexes and to determine which sex constitutes the majority.
Key observations
Females make up a slight majority, at 52.91% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Oakdale Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Rochester by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Rochester across both sexes and to determine which sex constitutes the majority.
Key observations
Females make up a slight majority, at 51.82% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Rochester Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Hartville by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Hartville across both sexes and to determine which sex constitutes the majority.
Key observations
Females make up a slight majority, at 51.15% of the total population. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported by the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Hartville Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the distribution of median household income among distinct age brackets of householders in Washburn town. Based on the latest 2019-2023 5-Year Estimates from the American Community Survey, it displays how income varies among householders of different ages in Washburn town. It showcases how household incomes typically rise as the head of the household gets older. The dataset can be utilized to gain insights into age-based household income trends and explore the variations in incomes across households.
Key observations: Insights from 2023
In terms of income distribution across age cohorts, in Washburn town, the median household income stands at $96,250 for householders within the 25 to 44 years age group, followed by $61,250 for the 45 to 64 years age group. Notably, householders within the 65 years and over age group had the lowest median household income at $38,000.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023-inflation-adjusted dollars.
Age groups classifications include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Washburn town median household income by age.
The dataset represents a compilation of user interaction data generated by users who participated in the project's pilot activities in Patras, Greece. Data was generated by users in the SMARTBUY app and includes information about users, stores, product categories, professions, and events.
The dataset comprises the following data:
- users: user account data for the Patras pilot users
- occupation: all possible occupations that the pilot users could choose from
- stores: stores which participated in the Patras pilot
- sel_products_cat: products uploaded to the SMARTBUY platform by retailers
- events: geo-stamped and time-stamped descriptions of a user interaction event (for instance, "user_id 67 rated product_id 722 with rating 4 at location x1 at datetime y1", or "user_id 91 denoted product_id 78 as favorite at location x2 at datetime y2")
- event_types: all possible event types captured by the SMARTBUY platform ('Product searches', 'Product views', 'Featured product', 'Products near you views', 'Product photos browsed', 'Product ratings', 'Clicks on Read More button to read product reviews', 'Clicks on Open map button', 'Clicks on Send this info by email button', 'Products denoted as Favorite')
Privacy-sensitive information, such as user names, retailer owner names, store names, and searched keywords, is anonymized.
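To make the shape of an event concrete, the example interaction quoted above ("user_id 67 rated product_id 722 with rating 4") could be modelled roughly as follows. This is only a sketch: the `InteractionEvent` class, its field names, and the coordinate and timestamp values are all hypothetical, since the actual column layout of the events table is not given here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class InteractionEvent:
    """One geo-stamped, time-stamped user interaction (hypothetical schema)."""
    user_id: int
    product_id: int
    event_type: str            # one of the event_types listed above
    latitude: float            # stands in for "location x1"
    longitude: float
    timestamp: datetime        # stands in for "datetime y1"
    rating: Optional[int] = None  # only meaningful for 'Product ratings' events


def ratings_for(events, product_id):
    """Collect the ratings given to one product across a list of events."""
    return [
        e.rating
        for e in events
        if e.product_id == product_id and e.event_type == "Product ratings"
    ]


# Placeholder location and time; the source only says "x1" and "y1".
event = InteractionEvent(
    user_id=67,
    product_id=722,
    event_type="Product ratings",
    latitude=0.0,
    longitude=0.0,
    timestamp=datetime(2000, 1, 1, 12, 0),
    rating=4,
)
print(ratings_for([event], 722))
```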
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Dataset First is a dataset for object detection tasks - it contains Dataset First annotations for 280 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
A simulated low-dose CT dataset generated from normal-dose CT images to be used for training a deep neural network to remove noise from low-dose CT images.
The dataset is used to evaluate the performance of the 𝛼-RNN model on various time series tasks.
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This dataset, titled "Anabolic Steroids", provides a meticulously curated compilation of nearly 50 steroids. It includes detailed information on their original names, common names, medicinal applications, abuse potential, side effects, historical context, and relative molecular mass (RMM). The dataset aims to serve as a resource for exploring the dual nature of anabolic steroids—both their therapeutic benefits and their misuse in sports and bodybuilding.
Anabolic steroids are synthetic derivatives of testosterone that have been used for decades in medicine to treat conditions like anemia, muscle-wasting diseases, and hormone deficiencies. However, they are also widely abused for performance enhancement and aesthetic purposes. This dataset captures a comprehensive view of these compounds, making it valuable for researchers, educators, and data enthusiasts.
While this dataset is relatively small (approximately 50 entries), it offers rich opportunities for exploratory analysis and domain-specific insights. Potential applications include:
Exploratory Data Analysis (EDA):
Domain-Specific Insights:
Educational Use:
This dataset has been ethically compiled from publicly available sources such as scientific journals, chemical databases, and educational websites. No proprietary or confidential information has been included. The data was aggregated to ensure accuracy and relevance while respecting intellectual property rights.
The following sources were instrumental in compiling this dataset:
1. PubChem Database: for verifying chemical properties and molecular mass values.
2. Wikipedia: for historical context and general information on anabolic steroids.
3. NIST Chemistry WebBook: for accurate molecular mass values and chemical details.
4. Scientific Journals: referenced for medicinal uses, side effects documentation, and abuse patterns.
5. DALL·E 3 by OpenAI: used to generate illustrative images related to anabolic steroids to complement dataset visualizations.
The misuse of anabolic steroids poses significant health risks and ethical concerns. While anabolic steroids have legitimate medical applications, their abuse for performance enhancement or aesthetic purposes can lead to severe physical and psychological side effects. Common adverse effects include liver damage, cardiovascular strain, hormonal imbalances, infertility, aggression, and mental health issues such as depression. Prolonged misuse can also result in irreversible damage to vital organs and an increased risk of life-threatening conditions like heart attacks or strokes. Beyond individual health risks, steroid abuse undermines the integrity of sports and creates unfair advantages in competitive environments. It is crucial to prioritize natural methods of achieving fitness goals and seek professional guidance for any medical conditions requiring treatment.
This dataset is not intended for machine learning due to its small size but serves as an excellent resource for exploratory data analysis (EDA), visualization projects, and domain-specific research into anabolic steroids' pharmacology and societal impact.
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
Jack Hong1, Shilin Yan1†, Jiayin Cai1, Xiaolong Jiang1, Yao Hu1, Weidi Xie2‡
†Project Leader
‡Corresponding Author
1Xiaohongshu Inc. 2Shanghai Jiao Tong University [🏠 Project Page] [📖 arXiv Paper] [🤗 Dataset] [🏆 Leaderboard]
🔥 News
2025.02.07 🌟 We release WorldSense, the first benchmark for real-world omnimodal understanding of MLLMs.
👀 WorldSense Overview
we… See the full description on the dataset page: https://huggingface.co/datasets/honglyhly/WorldSense.
A synthetic binary dataset of desired characteristics, comprising 3000 instances with 20 features.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Hydrometeorological time series and catchment attributes from the CABra dataset. The manuscript of "CABra: a novel large-sample dataset for Brazilian catchments" is under review in Hydrology and Earth System Sciences (HESS) journal.
Here we present the Catchments Attributes for Brazil (CABra), which is a large-sample dataset for Brazilian catchments that includes long-term data (30 years) for 735 catchments in eight main catchment attribute classes (climate, streamflow, groundwater, geology, soil, topography, land-use and land-cover, and hydrologic disturbance). We have collected and synthesized data from multiple sources (ground stations, remote sensing, and gridded datasets). To prepare the dataset, we delineated all the catchments using the Multi-Error-Removed Improved-Terrain Digital Elevation Model and the coordinates of the streamflow stations provided by the Brazilian Water Agency (ANA), where only the stations with 30 years (1980-2010) of data and less than 10% of missing records were included. Catchment areas range from 9 to 4,800,000 km² and the mean daily streamflow varies from 0.02 to 9 mm day⁻¹. Several signatures and indices were calculated based on the climate and streamflow data. Additionally, our dataset includes boundary shapefiles, geographic coordinates, and drainage areas for each catchment, aside from more than 100 attributes within the attribute classes.
Data can also be accessed at: thecabradataset.shinyapps.io/CABra
* This version includes water demand in CABra catchments for 2020 and 2040 (projection).
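The station selection rule described above (a fixed record period with less than 10% missing records) can be illustrated with a small filter. This is a sketch under stated assumptions, not the CABra authors' procedure: the function names, the daily dictionary layout, and the toy 20-day window are hypothetical, and days absent from the record are assumed to count as missing.

```python
from datetime import date, timedelta


def missing_fraction(observations, start, end):
    """Fraction of days in [start, end] with no streamflow value.

    `observations` maps a date to a value; None or an absent date counts as missing.
    """
    n_days = (end - start).days + 1
    missing = sum(
        1
        for offset in range(n_days)
        if observations.get(start + timedelta(days=offset)) is None
    )
    return missing / n_days


def station_qualifies(observations, start, end, max_missing=0.10):
    """Keep a station only if strictly less than `max_missing` of its records are missing."""
    return missing_fraction(observations, start, end) < max_missing


# Toy 20-day window for illustration; the dataset itself uses the 1980-2010 period.
start, end = date(2000, 1, 1), date(2000, 1, 20)
days = [start + timedelta(days=i) for i in range(20)]

mostly_complete = {d: 1.0 for d in days}
mostly_complete[days[3]] = None          # 1 of 20 days missing (5%)

sparse = {d: 1.0 for d in days[:17]}     # last 3 of 20 days absent (15%)

print(station_qualifies(mostly_complete, start, end))
print(station_qualifies(sparse, start, end))
```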