100+ datasets found

Job Postings
kaggle.com
Updated Feb 3, 2024
Akshat Jain (2024). Job Postings [Dataset]. https://www.kaggle.com/datasets/akshatkjain/job-postings/versions/1
Dataset updated
Feb 3, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Akshat Jain
Description
This dataset offers a broad collection of job postings intended to support research on job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, it contains job advertisements spanning many industries and job categories.

Description of dataset:

job_postings.csv
- Category: The category of the job.
- Workplace: Whether the job is remote, on-site, or hybrid.
- Location: Location of the job posting.
- Department: The department for which the job has been posted.
- Type: Employment type (e.g. full-time, part-time).

job_description.csv
- Category: The job category for the position.
- Description: A detailed overview of the job role, responsibilities, and qualifications, often provided by the employer.
- Benefits: Perks and advantages associated with the job, such as professional development opportunities, wellness programs, and flexible working arrangements.
- Requirements: Essential skills, qualifications, and experience expected from candidates applying for the job.
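As a starting point for the workplace-pattern questions above, the sketch below counts postings per Workplace value using only the standard library. It assumes the CSV header uses the column names listed for job_postings.csv; the sample rows and the `workplace_counts` helper are hypothetical stand-ins, not data from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the header follows the job_postings.csv
# column list above, but every value here is invented for illustration.
SAMPLE = """Category,Workplace,Location,Department,Type
Engineering,Remote,Berlin,Platform,Full-time
Engineering,Hybrid,London,Data,Full-time
Marketing,On-site,New York,Growth,Part-time
Engineering,Remote,Toronto,Platform,Full-time
"""


def workplace_counts(text: str) -> Counter:
    """Count postings per Workplace value (remote / on-site / hybrid)."""
    reader = csv.DictReader(io.StringIO(text))
    return Counter(row["Workplace"] for row in reader)


if __name__ == "__main__":
    counts = workplace_counts(SAMPLE)
    print(counts.most_common())
```

To run it on the real file, the `io.StringIO(text)` source could be swapped for `open("job_postings.csv", newline="")`, after checking that the header names match.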

Potential use cases:

Optimizing workforce planning and talent acquisition strategies.

Developing NLP models for resume parsing and job matching.

Building predictive models to forecast job market trends.

Exploring salary prediction models for various job roles.

Analyzing regional job market disparities and opportunities.
Benefits of WFH
kaggle.com
Updated Jul 23, 2021
Shital Gaikwad (2021). Benefits of WFH [Dataset]. https://www.kaggle.com/shitalgaikwad123/benefits-of-wfh/tasks
Dataset updated
Jul 23, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Shital Gaikwad
Description
Dataset

This dataset was created by Shital Gaikwad

Nutrition Powerhouse Formulations
kaggle.com
Updated Mar 29, 2024
willian oliveira (2024). Nutrition Powerhouse Formulations [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/nutrition-powerhouse-formulations
Available download formats: zip (0 bytes)
Dataset updated
Mar 29, 2024
Authors
willian oliveira
License
CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Description
These graphs were taken from Our World in Data:

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F3acab846617aabda6345f7cf9e73ce8c%2Fgraph3.png?generation=1711743920521272&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F28ca12fa98b1aa3ba0146af179e87f57%2Fgraph1.png?generation=1711743952812788&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F3cbea2b6d7978992b7bbd51d30a9d874%2Fgraph2.png?generation=1711743936272984&alt=media

Malnutrition is a multifaceted issue that extends far beyond the simple concept of hunger and caloric intake. While ensuring an adequate supply of calories is undoubtedly crucial, it is equally important to consider the nutritional quality of the foods consumed. The composition of our diets plays a pivotal role in determining our overall health and well-being.

When we discuss malnutrition, we must broaden our perspective to encompass not only the quantity but also the quality of food intake. It's not just about filling stomachs; it's about providing the body with essential nutrients such as proteins, fats, vitamins, and minerals. Even if individuals consume enough calories, they can still suffer from malnutrition if their diets lack diversity and fail to deliver the necessary array of nutrients for optimal health.

A significant concern associated with poor dietary choices is the prevalence of micronutrient deficiencies. These deficiencies arise when individuals consume diets that are inadequate in essential vitamins and minerals. A diet that lacks diversity and relies heavily on processed or refined foods often fails to meet the body's micronutrient requirements, leading to a range of health problems and complications.

Addressing malnutrition requires a comprehensive approach that considers not only individual dietary habits but also broader societal and environmental factors. The environmental impact of food production and consumption cannot be overstated. As the global population continues to grow, ensuring access to nutritious foods for everyone while minimizing the environmental footprint of agriculture has become an urgent priority.

One of the key challenges we face is finding sustainable solutions to ensure that nutritious diets are accessible and affordable for all. This necessitates a shift towards more sustainable food systems that prioritize nutrient-rich foods while minimizing environmental degradation. Sustainable agriculture practices, such as organic farming and regenerative agriculture, can play a crucial role in achieving this goal by promoting biodiversity, reducing chemical inputs, and enhancing soil health.

Furthermore, promoting dietary diversity and education about nutrition are essential components of any strategy aimed at combating malnutrition. Encouraging individuals to consume a wide variety of foods, including fruits, vegetables, whole grains, and lean proteins, can help ensure they receive a balanced intake of essential nutrients. Nutrition education programs can empower individuals to make healthier food choices and adopt sustainable eating habits that benefit both their health and the planet.

In addition to individual-level interventions, policymakers and stakeholders must work together to implement broader systemic changes that promote food security and sustainability. This includes investing in agricultural research and innovation, supporting smallholder farmers, and implementing policies that incentivize the production and consumption of nutritious, environmentally friendly foods.

Ultimately, addressing malnutrition requires a concerted effort from all sectors of society. By prioritizing nutritious diets, promoting sustainable food systems, and addressing the root causes of food insecurity and environmental degradation, we can work towards a future where everyone has access to healthy, sustainable food choices. Together, we can build a world where malnutrition is no longer a widespread concern, and all individuals can thrive and reach their full potential.
Kaggle DS Survey 2019
kaggle.com
Updated Dec 1, 2019
Alan Asri (2019). Kaggle DS Survey 2019 [Dataset]. https://www.kaggle.com/datasets/alanasri/kaggle-ds-survey-2019
Dataset updated
Dec 1, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Alan Asri
Description
Context

This notebook contains a thorough analysis of the survey conducted by Kaggle. Respondents varied in work background, age, country of residence, and employer. The survey questions concern their work in data science and machine learning.

Content

The following exploratory data analysis uses the results of the 2019 Kaggle survey, in which respondents answered questions about machine learning and data science. The core points covered are:

1. Distribution of age by formal education
2. Company type and money spent on machine learning
3. Comparison of machine learning spending levels by company
4. Data scientist experience and compensation
5. Correlation between machine learning experience and salary
6. Correlation between the data scientist role and compensation
7. Favourite media sources on data science topics
8. Favourite media by age distribution; media most used by data scientists
9. Course platforms for data scientists
10. Job roles for each title; primary job of data scientists
11. Programming languages used regularly, by job title, especially for data scientists
12. Comparison of specific programming abilities and compensation
13. Which programming language should an aspiring data scientist learn first?
14. Integrated development environments (IDEs) used on a regular basis
15. Top 5 IDEs and the countries using them; Microsoft is not dominant in the USA
16. Notebooks most used on a regular basis; Google dominates
17. Hardware used for machine learning, by country and company
18. Job roles by company type
19. Computer vision methods most used by companies
20. Distribution of companies by country
21. Cloud products: Amazon dominates, Google follows
22. Big data products: Amazon leads among enterprises, Google leads overall

‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-disease-prediction-using-machine-learning-with-gui-5ad4/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/neelima98/disease-prediction-using-machine-learning on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Due to progress in big data in the biomedical and healthcare communities, accurate analysis of medical data benefits early disease detection, patient care, and community services. When medical data is incomplete, the accuracy of the analysis is reduced. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks. This project applies machine learning algorithms (decision tree, Naive Bayes, and random forest) to structured and unstructured hospital data, and also uses machine learning to partition the data. To the best of our knowledge, no existing work in medical big data analytics has focused on both data types together. Compared with several typical prediction algorithms, the proposed algorithm reaches a prediction accuracy of 94.8%, runs faster than the unimodal disease risk prediction algorithm, and produces a report.

--- Original source retains full ownership of the source dataset ---
5 Benefits of Creative Writing
kaggle.com
Updated Jun 21, 2021
Warren Morrison (2021). 5 Benefits of Creative Writing [Dataset]. https://www.kaggle.com/warrenmorrison/5-benefits-of-creative-writing/code
Dataset updated
Jun 21, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Warren Morrison
Description
Dataset

This dataset was created by Warren Morrison

Cow Segmentation Dataset Dataset
paperswithcode.com
Updated Feb 21, 2025
(2025). Cow Segmentation Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/cow-segmentation-dataset
Dataset updated
Feb 21, 2025
Description

The Cow Segmentation Dataset is a resource for segmentation tasks in machine learning. It contains a wide variety of cow images annotated in the COCO format, making it compatible with popular models such as YOLOv8 and Mask R-CNN. Researchers and developers can use it to train models that recognize and segment cows in various contexts, supporting AI-driven applications in agriculture.


The dataset consists of high-resolution images that capture cows from different angles, poses, and environments. With detailed annotations in the COCO format, each image is segmented to highlight the cow’s body, enabling precise object recognition. The segmentation data is easily adaptable to models used in advanced image processing tasks, making it a highly flexible resource.
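Because the masks are stored in the COCO format, a loader mainly needs to link each annotation to its image and read its polygon. The sketch below does that with the standard library and computes each mask's area with the shoelace formula. It assumes polygon (not RLE) segmentations, as used for non-crowd COCO annotations; the inline JSON, file name, and helper names are hypothetical.

```python
import json
from collections import defaultdict

# A tiny, hypothetical COCO-style file with one image and one polygon mask.
COCO_JSON = """
{
  "images": [{"id": 1, "file_name": "cow_001.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "cow"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "iscrowd": 0,
     "bbox": [100, 50, 200, 100],
     "segmentation": [[100, 50, 300, 50, 300, 150, 100, 150]]}
  ]
}
"""


def polygon_area(flat_xy):
    """Shoelace formula for a polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    total = 0.0
    for i in range(len(xs)):
        j = (i + 1) % len(xs)  # wrap around to close the polygon
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0


def masks_per_image(coco):
    """Map each file_name to the areas of the polygons annotated on it."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    result = defaultdict(list)
    for ann in coco["annotations"]:
        # Assumes polygon lists; crowd annotations (iscrowd=1) use RLE instead.
        for poly in ann["segmentation"]:
            result[names[ann["image_id"]]].append(polygon_area(poly))
    return dict(result)


if __name__ == "__main__":
    print(masks_per_image(json.loads(COCO_JSON)))
```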

Applications in Agriculture

This dataset offers diverse applications in smart farming, such as automated cow monitoring, health diagnostics, and livestock management. It supports real-time systems for tracking cow behavior, analyzing health indicators, and managing livestock populations effectively. The segmentation accuracy helps in building AI models that can contribute to precision farming, reducing manual efforts and improving overall productivity.

Use Cases and Future Potential

Livestock Management: Automating tasks like cow counting, posture analysis, and herd management using AI-driven systems.

Health Monitoring: Identifying physical conditions like lameness or injury through detailed image segmentation.

Herd Behavior Analysis: Real-time behavior tracking using models trained on various cow positions and movements.

Benefits of Using the Dataset

Diverse Annotations: A rich set of segmentation masks for various cow breeds and environments.

Model Compatibility: Ready-to-use COCO annotations for easy integration into advanced machine learning models.

Real-World Applications: Supports the development of AI systems for real-time livestock monitoring and analysis.

This dataset is sourced from Kaggle.
‘📈 Pension Insurance Data’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘📈 Pension Insurance Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-pension-insurance-data-2e7e/89a13dbf/?iid=000-256&v=presentation
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘📈 Pension Insurance Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/pension-insurance-datae on 28 January 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

The tables include statistics on the people and pensions that PBGC protects, including how many Americans are in PBGC-insured pension plans, how many get PBGC benefits, and where they live.

Note: Links to each table are in the first sheet.

Source: https://catalog.data.gov/dataset/pension-insurance-data-tables

This dataset was created by Data Society and contains around 100 samples with features such as Data Book Listing and Table, along with technical information.

How to use this dataset

Analyze Data Book Listing in relation to Table

Study the influence of Data Book Listing on Table


Acknowledgements

If you use this dataset in your research, please credit Data Society


--- Original source retains full ownership of the source dataset ---
‘College Football Bowl Games’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘College Football Bowl Games’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-college-football-bowl-games-efe5/9866ff9c/
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘College Football Bowl Games’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/college-football-bowl-gamese on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Background

Home field advantage is always the most desirable, but does data back it up? I’ve pulled stats on college football bowl games to see if having the home field advantage is all it is cracked up to be.

Methodology

The data collected was scraped from www.foxsports.com.

Source

The research and blog post can be found at The Concept Center

This dataset was created by Chase Willden and contains around 20,000 samples with features such as Receiving Receiving Yards, Kicking Pat Made, Kick Return Kick Return Touchdowns, and Passing Completions, along with technical information.

How to use this dataset

Analyze Kick Return Kick Return Avg in relation to Punt Return Punt Return Long

Study the influence of Kicking Kicking Points on Kick Return Kick Return Long


Acknowledgements

If you use this dataset in your research, please credit Chase Willden


--- Original source retains full ownership of the source dataset ---
Fruits Dataset for Classification
data.mendeley.com
Updated Feb 11, 2025
GTS GTS (2025). Fruits Dataset for Classification [Dataset]. http://doi.org/10.17632/rg254yr63x.1
Unique identifier
https://doi.org/10.17632/rg254yr63x.1
Dataset updated
Feb 11, 2025
Authors
GTS GTS
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
About Dataset The dataset covers three fruits (strawberries, peaches, pomegranates). Photo requirements: (1) white background, (2) .jpg format, (3) image size 300×300. For each fruit, 250 photos of the fresh fruit and 250 photos of the rotten fruit are required, for a total of 1,500 images.
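The collection plan above fixes the class layout and the image count. A minimal sketch, assuming each (fruit, condition) pair is treated as one classification label; the `fruit/condition` label strings and the function names are hypothetical, not part of the dataset:

```python
from itertools import product

# Fruits, conditions, and per-class count come from the description above.
FRUITS = ["strawberries", "peaches", "pomegranates"]
CONDITIONS = ["fresh", "rotten"]
IMAGES_PER_CLASS = 250


def class_labels():
    """One hypothetical label per (fruit, condition) pair."""
    return [f"{fruit}/{cond}" for fruit, cond in product(FRUITS, CONDITIONS)]


def expected_total():
    """Total images implied by the plan: classes x images per class."""
    return len(class_labels()) * IMAGES_PER_CLASS


if __name__ == "__main__":
    print(class_labels())
    print(expected_total())  # 3 fruits x 2 conditions x 250 images
```

Treating freshness as part of the label gives a six-class problem; an alternative design would train separate fruit and freshness heads.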

Diverse Collection With a diverse collection of fruit images, the dataset provides a solid foundation for developing and testing machine learning models for image recognition and classification. The images are captured under different lighting conditions and backgrounds, offering a realistic challenge for algorithms.

Real-World Applications The variability in the dataset ensures that models trained on it can generalize well to real-world scenarios, making them robust and reliable. The dataset includes common fruits such as apples, bananas, oranges, and strawberries, among others, allowing for comprehensive training and evaluation.

Industry Use Cases One of the significant advantages of using the Fruits Dataset for Classification is its applicability in various fields such as agriculture, retail, and the food industry. In agriculture, it can help automate the process of fruit sorting and grading, enhancing efficiency and reducing labor costs. In retail, it can be used to develop automated checkout systems that accurately identify fruits, streamlining the purchasing process.

Educational Value The dataset is also valuable for educational purposes, providing students and educators with a practical tool to learn and teach machine learning concepts. By working with this dataset, learners can gain hands-on experience in data preprocessing, model training, and evaluation.

Conclusion The Fruits Dataset for Classification is a versatile and indispensable resource for advancing the field of image classification. Its diverse and high-quality images, coupled with practical applications, make it a go-to dataset for researchers, developers, and educators aiming to improve and innovate in machine learning and computer vision.

This dataset is sourced from Kaggle.
‘US Public Food Assistance’ analyzed by Analyst-2
analyst-2.ai
Updated Apr 22, 2019
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘US Public Food Assistance’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-public-food-assistance-5075/ca5319fe/?iid=006-512&v=presentation
Dataset updated
Apr 22, 2019
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
United States
Description
Analysis of ‘US Public Food Assistance’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jpmiller/publicassistance on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

This dataset focuses on public assistance programs in the United States that provide food, namely SNAP and WIC. If you are interested in a broader picture of food security across the world, please see Food Security Indicators for the World 2016-2020.

Initial coverage was for the Special Supplemental Nutrition Program for Women, Infants, and Children, or simply WIC. The program allocates Federal and State funds to help low-income women and children up to age five who are at nutritional risk. Funds are used to provide supplemental foods, baby formula, health care, and nutrition education.

Starting with version 5, the dataset also covers the US Supplemental Nutrition Assistance Program, more commonly known as SNAP. The program is the successor to the Food Stamps program previously in place. The program provides food assistance to low-income families in the form of a debit card. A 2016 study using POS data from SNAP-eligible vendors showed the three most purchased types of food to be meats, sweetened beverages, and vegetables.

Content

Files may include participation data and spending for state programs, and poverty data for each state. Data for WIC covers fiscal years 2013-2016, which is actually October 2012 through September 2016. Data for SNAP covers 2015 to 2020.
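The WIC range above follows the US federal fiscal year, where fiscal year N runs from October 1 of year N-1 through September 30 of year N. A small sketch for mapping calendar dates onto that convention when aligning WIC and SNAP rows; the function names are illustrative only:

```python
from datetime import date
from typing import Tuple


def federal_fiscal_year(d: date) -> int:
    """US federal fiscal year N runs Oct 1 of year N-1 to Sep 30 of year N."""
    return d.year + 1 if d.month >= 10 else d.year


def fiscal_year_bounds(fy: int) -> Tuple[date, date]:
    """First and last calendar day of federal fiscal year `fy`."""
    return date(fy - 1, 10, 1), date(fy, 9, 30)


if __name__ == "__main__":
    start, _ = fiscal_year_bounds(2013)
    _, end = fiscal_year_bounds(2016)
    print(start, end)  # calendar span covered by WIC fiscal years 2013-2016
```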

Motivation

My original purpose here is two-fold:

Explore various aspects of US public assistance: show trends over recent years and better understand differences across state agencies. Although the federal government sponsors the program and provides funding, programs are administered at the state level and can vary widely. Indian nations (Native Americans) also administer their own programs.

Share with the Kaggle Community the joy - and pain - of working with government data. Data is often spread across numerous agency sites and comes in a variety of formats. Often the data is provided in Excel, with the files consisting of multiple tabs. Also, files are formatted as reports and contain aggregated data (sums, averages, etc.) along with base data.

As of March 2nd, I am expanding the purpose to support the M5 Forecasting Challenges here on Kaggle. Store sales are partly driven by participation in Public Assistance programs. Participants typically receive the items free of charge. The store then recovers the sale price from the state agencies administering the program.

Additional Content Ideas

The dataset can benefit greatly from additional content. Economics, additional demographics, administrative costs and more. I'd like to eventually explore the money trail from taxes and corporate subsidies, through the government agencies, and on to program participants. All community ideas are welcome!

--- Original source retains full ownership of the source dataset ---
Health Insurance Marketplace
kaggle.com
Updated May 1, 2017
US Department of Health and Human Services (2017). Health Insurance Marketplace [Dataset]. https://www.kaggle.com/hhs/health-insurance-marketplace
Available download formats: zip (868,821,924 bytes)
Dataset updated
May 1, 2017
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Authors
US Department of Health and Human Services
License
CC0 1.0 Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
Description
The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.

Exploration Ideas

To help get you started, here are some data exploration ideas:

How do plan rates and benefits vary across states?

How do plan benefits relate to plan rates?

How do plan rates vary by age?

How do plans vary across insurance network providers?

See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!

Data Description

This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.

Here, we've processed the data to facilitate analytics. This processed version has three components:

1. Original versions of the data

The original versions of the 2014, 2015, 2016 data are available in the "raw" directory of the download and "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.

2. Combined CSV files

In the top level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:

BenefitsCostSharing.csv

BusinessRules.csv

Network.csv

PlanAttributes.csv

Rate.csv

ServiceArea.csv

Additionally, there are two CSV files that facilitate joining data across years:

Crosswalk2015.csv - joining 2014 and 2015 data

Crosswalk2016.csv - joining 2015 and 2016 data

3. SQLite database

The "database.sqlite" file contains tables corresponding to each of the processed CSV files.

The code to create the processed version of this data is available on GitHub.
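The SQLite database makes questions such as "How do plan rates vary across states?" a single query. The sketch below uses Python's built-in `sqlite3` module against an in-memory stand-in for database.sqlite. The Rate column names (BusinessYear, StateCode, PlanId, Age, IndividualRate) and all row values are assumptions for illustration; check them against the data dictionaries before running the query on the real file.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for database.sqlite with a few invented rows.

    The Rate column names here are an assumption about the real schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Rate (BusinessYear INTEGER, StateCode TEXT, "
        "PlanId TEXT, Age TEXT, IndividualRate REAL)"
    )
    conn.executemany(
        "INSERT INTO Rate VALUES (?, ?, ?, ?, ?)",
        [
            (2016, "AK", "P1", "30", 400.0),
            (2016, "AK", "P2", "30", 600.0),
            (2016, "TX", "P3", "30", 250.0),
            (2015, "TX", "P3", "30", 230.0),
        ],
    )
    return conn


def average_rate_by_state(conn: sqlite3.Connection, year: int):
    """Average IndividualRate per StateCode for one BusinessYear, highest first."""
    query = """
        SELECT StateCode, AVG(IndividualRate) AS avg_rate
        FROM Rate
        WHERE BusinessYear = ?
        GROUP BY StateCode
        ORDER BY avg_rate DESC
    """
    return conn.execute(query, (year,)).fetchall()


if __name__ == "__main__":
    print(average_rate_by_state(make_demo_db(), 2016))
```

Against the actual download, `make_demo_db()` would be replaced by `sqlite3.connect("database.sqlite")`.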
Paimon Dataset YOLO Detection Dataset
paperswithcode.com
gts.ai
Updated Dec 3, 2024
(2024). Paimon Dataset YOLO Detection Dataset [Dataset]. https://paperswithcode.com/dataset/paimon-dataset-yolo-detection
Dataset updated
Dec 3, 2024
Description

This dataset consists of a diverse collection of images featuring Paimon, a popular character from the game Genshin Impact. The images have been sourced from in-game gameplay footage and capture Paimon from various angles and in different sizes (scales), making the dataset suitable for training YOLO object detection models.

The dataset shows Paimon under different lighting conditions, game environments, and positions, which helps models generalize to similar characters and object detection tasks. While most annotations are accurately labeled, a small number may include minor inaccuracies due to manual labeling errors. The dataset is well suited to researchers and developers working on character recognition, object detection in gaming environments, or other AI vision tasks.


Dataset Features:

Image Format: .jpg files in 640×320 resolution.

Annotation Format: .txt files in YOLO format, containing bounding box data with:

class_id

x_center

y_center

width

height
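Each YOLO label line lists class_id, x_center, y_center, width, and height. A minimal sketch for converting one line back to pixel corner coordinates, assuming the standard YOLO convention that the four box values are normalized to [0, 1] relative to image size, and that "640×320" means width 640 and height 320; the function name is hypothetical:

```python
def yolo_to_pixels(line: str, img_w: int = 640, img_h: int = 320):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels.

    Assumes normalized coordinates, with (x_center, y_center) the box center.
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    x_max = (xc + w / 2) * img_w
    y_max = (yc + h / 2) * img_h
    return class_id, x_min, y_min, x_max, y_max


if __name__ == "__main__":
    # A box centered in the frame covering a quarter of the width and half the height.
    print(yolo_to_pixels("0 0.5 0.5 0.25 0.5"))
```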

Use Cases:

Character Detection in Games: Train YOLO models to detect and identify in-game characters or NPCs.

Gaming Analytics: Improve recognition of specific game elements for AI-powered game analytics tools.

Research: Contribute to academic research focused on object detection or computer vision in animated and gaming environments.

Data Structure:

Images: High-quality .jpg images captured from multiple perspectives, ensuring robust model training across various orientations and lighting scenarios.

Annotations: Each image has an associated .txt file that follows the YOLO format. The annotations are structured to include class identification, object location (center coordinates), and bounding box dimensions.

Key Advantages:

Varied Angles and Scales: The dataset includes Paimon from multiple perspectives, aiding in creating more versatile and adaptable object detection models.

Real-World Scenario: Extracted from actual gameplay footage, the dataset simulates real-world detection challenges such as varying backgrounds, motion blur, and changing character scales.

Training Ready: Suitable for training YOLO models and other deep learning frameworks that require object detection capabilities.

This dataset is sourced from Kaggle.

‘COVID vaccination vs. mortality ’ analyzed by Analyst-2

analyst-2.ai

Updated Aug 4, 2020

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘COVID vaccination vs. mortality ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-vaccination-vs-mortality-cbd8/06c8ccd2/?iid=010-492&v=presentation

Dataset updated

Aug 4, 2020

Dataset authored and provided by

Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)

License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Description

Analysis of ‘COVID vaccination vs. mortality ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sinakaraji/covid-vaccination-vs-death on 12 November 2021.

--- Dataset description provided by original source is as follows ---

Context

The COVID-19 outbreak has brought the whole planet to its knees. More than 4.5 million people had died as of the writing of this notebook, and the only acceptable way out of the disaster is to vaccinate all parts of society. Although the benefits of vaccination have been demonstrated many times, anti-vaccine groups are springing up all over the world. This dataset was generated to investigate the impact of coronavirus vaccination on coronavirus mortality.

Content

- country: country name
- iso_code: ISO code for each country
- date: date to which the row's data belong
- total_vaccinations: total number of COVID vaccine doses administered in that country
- people_vaccinated: number of people who received at least one dose of COVID vaccine
- people_fully_vaccinated: number of people who received all vaccine doses
- New_deaths: number of new deaths that day
- population: 2021 country population
- ratio: percentage of the population vaccinated in that country on that date, computed as people_vaccinated / population * 100
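The ratio column is a simple percentage of people_vaccinated over population. A short sketch that recomputes it, for example to sanity-check rows after merging the three source datasets; the function name and the sample figures are illustrative only:

```python
def vaccination_ratio(people_vaccinated: float, population: float) -> float:
    """ratio = people_vaccinated / population * 100, as defined for the ratio column."""
    if population <= 0:
        raise ValueError("population must be positive")
    return people_vaccinated / population * 100


if __name__ == "__main__":
    # Invented figures: 2.5 million people with at least one dose out of 10 million.
    print(vaccination_ratio(2_500_000, 10_000_000))
```

Because people_vaccinated counts anyone with at least one dose, the ratio measures first-dose coverage rather than full vaccination.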

Data Collection

This dataset is a combination of the following three datasets:

1. https://www.kaggle.com/gpreda/covid-world-vaccination-progress

2. https://covid19.who.int/WHO-COVID-19-global-data.csv

3. https://www.kaggle.com/rsrishav/world-population

You can find more details about this dataset in this notebook:

https://www.kaggle.com/sinakaraji/simple-linear-regression-covid-vaccination
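
The three sources can be combined with an inner join on country and date, with population joined on country alone. The sketch below uses simplified, hypothetical in-memory inputs rather than the actual files, and the function name combine is illustrative. The country list below uses WHO-style names (for example Viet Nam and Republic of Korea), so joining real files from different providers would likely also need a country-name mapping step, which is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified inputs: the real source files have more columns and
# their own headers, so these shapes are assumptions for illustration only.
VaccCounts = Dict[Tuple[str, str], Dict[str, float]]  # (country, date) -> dose counts
DailyDeaths = Dict[Tuple[str, str], int]              # (country, date) -> New_deaths
Populations = Dict[str, int]                          # country -> 2021 population


def combine(vacc: VaccCounts, deaths: DailyDeaths, population: Populations) -> List[dict]:
    """Inner-join the sources on (country, date); population joins on country only."""
    rows = []
    for country, date in sorted(vacc):
        if (country, date) not in deaths or country not in population:
            continue  # skip rows missing from any source
        counts = vacc[(country, date)]
        pop = population[country]
        rows.append({
            "country": country,
            "date": date,
            **counts,
            "New_deaths": deaths[(country, date)],
            "population": pop,
            "ratio": counts["people_vaccinated"] / pop * 100,
        })
    return rows


# Made-up numbers for a placeholder country, not taken from the dataset.
vacc = {
    ("Country A", "2021-06-01"): {"total_vaccinations": 300.0,
                                  "people_vaccinated": 200.0,
                                  "people_fully_vaccinated": 100.0},
    ("Country A", "2021-06-02"): {"total_vaccinations": 310.0,
                                  "people_vaccinated": 205.0,
                                  "people_fully_vaccinated": 105.0},
}
deaths = {("Country A", "2021-06-01"): 0}  # no 2021-06-02 entry, so that row is dropped
population = {"Country A": 400}

rows = combine(vacc, deaths, population)
print(len(rows), rows[0]["ratio"])  # 1 50.0
```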

Countries in this dataset:


Afghanistan	Albania	Algeria	Andorra	Angola
Anguilla	Antigua and Barbuda	Argentina	Armenia	Aruba
Australia	Austria	Azerbaijan	Bahamas	Bahrain
Bangladesh	Barbados	Belarus	Belgium	Belize
Benin	Bermuda	Bhutan	Bolivia (Plurinational State of)	Brazil
Bosnia and Herzegovina	Botswana	Brunei Darussalam	Bulgaria	Burkina Faso
Cambodia	Cameroon	Canada	Cabo Verde	Cayman Islands
Central African Republic	Chad	Chile	China	Colombia
Comoros	Cook Islands	Costa Rica	Croatia	Cuba
Curaçao	Cyprus	Denmark	Djibouti	Dominica
Dominican Republic	Ecuador	Egypt	El Salvador	Equatorial Guinea
Estonia	Ethiopia	Falkland Islands (Malvinas)	Fiji	Finland
France	French Polynesia	Gabon	Gambia	Georgia
Germany	Ghana	Gibraltar	Greece	Greenland
Grenada	Guatemala	Guinea	Guinea-Bissau	Guyana
Haiti	Honduras	Hungary	Iceland	India
Indonesia	Iran (Islamic Republic of)	Iraq	Ireland	Isle of Man
Israel	Italy	Jamaica	Japan	Jordan
Kazakhstan	Kenya	Kiribati	Kuwait	Kyrgyzstan
Lao People's Democratic Republic	Latvia	Lebanon	Lesotho	Liberia
Libya	Liechtenstein	Lithuania	Luxembourg	Madagascar
Malawi	Malaysia	Maldives	Mali	Malta
Mauritania	Mauritius	Mexico	Republic of Moldova	Monaco
Mongolia	Montenegro	Montserrat	Morocco	Mozambique
Myanmar	Namibia	Nauru	Nepal	Netherlands
New Caledonia	New Zealand	Nicaragua	Niger	Nigeria
Niue	North Macedonia	Norway	Oman	Pakistan
occupied Palestinian territory, including east Jerusalem
Panama	Papua New Guinea	Paraguay	Peru	Philippines
Poland	Portugal	Qatar	Romania	Russian Federation
Rwanda	Saint Kitts and Nevis	Saint Lucia
Saint Vincent and the Grenadines	Samoa	San Marino	Sao Tome and Principe	Saudi Arabia
Senegal	Serbia	Seychelles	Sierra Leone	Singapore
Slovakia	Slovenia	Solomon Islands	Somalia	South Africa
Republic of Korea	South Sudan	Spain	Sri Lanka	Sudan
Suriname	Sweden	Switzerland	Syrian Arab Republic	Tajikistan
United Republic of Tanzania	Thailand	Togo	Tonga	Trinidad and Tobago
Tunisia	Turkey	Turkmenistan	Turks and Caicos Islands	Tuvalu
Uganda	Ukraine	United Arab Emirates	The United Kingdom	United States of America
Uruguay	Uzbekistan	Vanuatu	Venezuela (Bolivarian Republic of)	Viet Nam
Wallis and Futuna	Yemen	Zambia	Zimbabwe

--- Original source retains full ownership of the source dataset ---

‘ Sales Conversion Optimization’ analyzed by Analyst-2
analyst-2.ai
Updated Jul 14, 2016
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2016). ‘ Sales Conversion Optimization’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sales-conversion-optimization-d134/latest
Dataset updated
Jul 14, 2016
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘ Sales Conversion Optimization’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/loveall/clicks-conversion-tracking on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Cluster Analysis for Ad Conversions Data

Content

The data used in this project come from an anonymous organisation’s social media ad campaign. The data file can be downloaded from here. The file conversion_data.csv contains 1143 observations of 11 variables, described below.

1.) ad_id: a unique ID for each ad.

2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.

3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.

4.) age: age of the person to whom the ad is shown.

5.) gender: gender of the person to whom the ad is shown.

6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).

7.) Impressions: the number of times the ad was shown.

8.) Clicks: number of clicks on that ad.

9.) Spent: amount paid by company XYZ to Facebook to show that ad.

10.) Total conversion: Total number of people who enquired about the product after seeing the ad.

11.) Approved conversion: Total number of people who bought the product after seeing the ad.
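
The variables above support a few standard ad-performance ratios: click-through rate (Clicks / Impressions), cost per click (Spent / Clicks), and the share of enquiries that turned into purchases (Approved conversion / Total conversion). The sketch assumes the conversion columns are named Total_Conversion and Approved_Conversion with underscores and uses made-up rows; check the actual headers in conversion_data.csv before reusing it. Ratios like these could also serve as features for the cluster analysis mentioned under Context.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def safe_div(num: float, den: float) -> Optional[float]:
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def ad_metrics(ad: dict) -> dict:
    """Per-ad ratios built from the variables listed above."""
    return {
        "ad_id": ad["ad_id"],
        "ctr": safe_div(ad["Clicks"], ad["Impressions"]),  # click-through rate
        "cpc": safe_div(ad["Spent"], ad["Clicks"]),         # cost per click
        # share of enquiries that became purchases
        "approval_rate": safe_div(ad["Approved_Conversion"], ad["Total_Conversion"]),
    }


def cost_per_purchase(ads: List[dict]) -> Dict[int, float]:
    """Total Spent / total Approved_Conversion per xyz_campaign_id (skips zero purchases)."""
    spent: Dict[int, float] = defaultdict(float)
    approved: Dict[int, int] = defaultdict(int)
    for ad in ads:
        spent[ad["xyz_campaign_id"]] += ad["Spent"]
        approved[ad["xyz_campaign_id"]] += ad["Approved_Conversion"]
    return {cid: spent[cid] / approved[cid] for cid in spent if approved[cid]}


# Made-up rows with hypothetical campaign IDs, not taken from the dataset.
ads = [
    {"ad_id": 1, "xyz_campaign_id": 100, "Impressions": 1000, "Clicks": 10,
     "Spent": 15.0, "Total_Conversion": 4, "Approved_Conversion": 1},
    {"ad_id": 2, "xyz_campaign_id": 100, "Impressions": 500, "Clicks": 0,
     "Spent": 5.0, "Total_Conversion": 1, "Approved_Conversion": 1},
    {"ad_id": 3, "xyz_campaign_id": 200, "Impressions": 200, "Clicks": 2,
     "Spent": 3.0, "Total_Conversion": 0, "Approved_Conversion": 0},
]
print(ad_metrics(ads[0]))      # {'ad_id': 1, 'ctr': 0.01, 'cpc': 1.5, 'approval_rate': 0.25}
print(cost_per_purchase(ads))  # {100: 10.0}
```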

Acknowledgements

Thanks to the anonymous data depositor.

Inspiration

Social media ad campaign marketing is a leading source of sales conversions, and I have made this data available for the benefit of businesses using Google AdWords to track conversions.

--- Original source retains full ownership of the source dataset ---
‘Netflix Shows’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Netflix Shows’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-netflix-shows-53e6/ea6268fc/?iid=004-315&v=presentation
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Netflix Shows’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/netflix-showse on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

Background

Over the past 5-10 years, Netflix has captured a large population of viewers. With more viewers, there is most likely an increase in show variety. However, do people understand the distribution of ratings on Netflix shows?

Netflix Suggestion Engine

Because gathering 1,000 shows one by one would take a vast amount of time, the gathering method took advantage of Netflix’s suggestion engine, which recommends shows similar to a selected show. For this dataset, I took 4 videos from each of 4 ratings (16 unique shows in total), then pulled 53 suggested shows per video. The ratings are G, PG, TV-14, and TV-MA; I chose not to pull from every rating (e.g. TV-G, TV-Y, etc.).
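
The collection method is a seed-and-expand crawl: start from a few seed titles per rating and add each seed's suggestions. With the counts given above, 4 ratings × 4 seeds gives 16 seeds, and 16 × 53 = 848 suggestions are pulled before any de-duplication. The sketch below shows the idea with a hypothetical get_suggestions callable standing in for Netflix's suggestion engine; it does not call any real Netflix interface.

```python
from typing import Callable, Dict, List


def seed_and_expand(
    seeds_by_rating: Dict[str, List[str]],
    get_suggestions: Callable[[str], List[str]],
    per_seed: int = 53,
) -> List[str]:
    """Collect each seed title plus its first `per_seed` suggestions.

    `get_suggestions` is a hypothetical stand-in for the suggestion engine.
    Titles are de-duplicated, keeping first-seen order, because the same show
    can be suggested for several seeds.
    """
    collected: Dict[str, None] = {}  # dict keys preserve insertion order
    for seeds in seeds_by_rating.values():
        for seed in seeds:
            collected.setdefault(seed, None)
            for title in get_suggestions(seed)[:per_seed]:
                collected.setdefault(title, None)
    return list(collected)


def fake_engine(seed: str) -> List[str]:
    """Toy engine that suggests the same two titles for every seed."""
    return ["X", "Y"]


print(seed_and_expand({"G": ["A"], "PG": ["B"]}, fake_engine, per_seed=2))
# ['A', 'X', 'Y', 'B']
```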

Source

The study can be accessed at The Concept Center.

This dataset was created by Chase Willden and contains around 1,000 samples with User Rating Score, Rating Description, technical information, and other features such as Release Year and Title.

How to use this dataset

Analyze User Rating Size in relation to Rating

Study the influence of Rating Level on User Rating Score
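
One way to start on the second question is to average the user rating score within each rating group. The column names in the sketch (rating, user rating score) are assumptions based on the feature names listed above, the rows are made up, and rows without a score are skipped in case some shows lack one.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def mean_score_by(rows: List[dict], key: str = "rating",
                  score_field: str = "user rating score") -> Dict[str, float]:
    """Average score per group, skipping rows whose score is missing."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        score = row.get(score_field)
        if score is not None:
            groups[row[key]].append(float(score))
    return {group: mean(scores) for group, scores in groups.items()}


# Made-up rows for illustration.
rows = [
    {"title": "Show 1", "rating": "PG", "user rating score": 80},
    {"title": "Show 2", "rating": "PG", "user rating score": 90},
    {"title": "Show 3", "rating": "TV-MA", "user rating score": None},
]
print(mean_score_by(rows))  # {'PG': 85.0}
```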

Acknowledgements

If you use this dataset in your research, please credit Chase Willden

--- Original source retains full ownership of the source dataset ---
Spanish Sign Language Alphabet Dataset
paperswithcode.com
gts.ai
Updated Feb 25, 2025
Cite
(2025). Spanish Sign Language Alphabet Dataset [Dataset]. https://paperswithcode.com/dataset/spanish-sign-language-alphabet
Dataset updated
Feb 25, 2025
Description
The Spanish Sign Language dataset consists of 19 static letters and 8 dynamic movements. It was created to support machine learning models for sign language recognition, particularly of the Spanish Sign Language alphabet.

Dataset Composition

The dataset contains around 100 images for each of the 19 static letters. Each image was captured against a white background with high-quality cameras. The images show three different hands at a uniform distance, simulating the perspective of a smartphone's built-in camera.
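
At about 100 images for each of the 19 static letters, the static portion holds roughly 19 × 100 = 1,900 images. Because the same three hands appear throughout, a random train/test split lets a model see every hand during training; holding out one hand per fold gives a more direct test of how well a model handles a hand it has not seen. The sketch below assumes each image can be tagged with a hand identifier, which the description does not state, and uses hypothetical file names.

```python
from typing import Iterator, List, Tuple

# (image_path, letter, hand_id); hand_id and the paths are hypothetical,
# since the source only states that three different hands were photographed.
Sample = Tuple[str, str, str]


def leave_one_hand_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_hand, train, test), testing each fold on an unseen hand."""
    for held_out in sorted({hand for _, _, hand in samples}):
        train = [s for s in samples if s[2] != held_out]
        test = [s for s in samples if s[2] == held_out]
        yield held_out, train, test


samples = [
    ("a_1.jpg", "A", "hand1"), ("a_2.jpg", "A", "hand2"),
    ("b_1.jpg", "B", "hand3"), ("b_2.jpg", "B", "hand1"),
]
for hand, train, test in leave_one_hand_out(samples):
    print(hand, len(train), len(test))
# hand1 2 2
# hand2 3 1
# hand3 3 1
```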

Expert Validation

To ensure high accuracy, the dataset was reviewed and validated by a certified expert in Spanish Sign Language. This helps ensure that each letter is correctly represented, making the dataset a valuable resource for machine learning applications.

Applications

This dataset is well suited to training models for:

Gesture and static sign language recognition

Computer vision projects focused on human-computer interaction

Accessibility technology development for the hearing-impaired community

Benefits of the Dataset

High-quality, standardized images

Images of multiple hands support more robust models

Realistic simulation of smartphone camera view for practical applications

Suitable for researchers focusing on AI, sign language translation, and accessibility solutions

Potential Use Cases

AI and Machine Learning: Can be used to train AI models for recognizing static signs.

Computer Vision: Enhances projects in gesture recognition and human-computer interaction.

Accessibility Technology: Advances sign language translation for hearing-impaired users.

This dataset is sourced from Kaggle.
Food Image Classification Dataset Dataset
paperswithcode.com
Updated Jul 26, 2017
Cite
Marc Bolaños; Aina Ferrà; Petia Radeva (2017). Food Image Classification Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/food-image-classification-dataset
Dataset updated
Jul 26, 2017
Authors
Marc Bolaños; Aina Ferrà; Petia Radeva
Description
About Dataset: The dataset contains 24K unique images obtained from various Google sources. The images were curated to ensure diversity and representativeness, providing a solid foundation for developing robust and precise image classification algorithms and encouraging exploration of food image classification.

Unparalleled Diversity: Dive into a vast collection spanning culinary landscapes worldwide, with a diverse array of cuisines from Italian pasta to Japanese sushi. The food imagery is curated for accuracy and breadth.

Precision Labeling: Each image is carefully labeled, and detailed metadata is provided for integration into machine learning projects, giving food recognition algorithms the clarity they need.

Endless Applications: The dataset can support advances in machine learning and computer vision, food industry automation from inventory management to quality control, and applications in health monitoring and dietary analysis.

Seamless Integration: The dataset comes with user-friendly access and documentation, high-resolution images compatible with a range of AI frameworks, and support resources for specific needs.

Conclusion: Embark on a culinary journey through the lens of artificial intelligence and unlock the potential of food image classification with this dataset. Elevate your research, your projects, and the way we perceive and interact with food in the digital age. Dive in today and savor the possibilities!

This dataset is sourced from Kaggle.
arXiv Paper Abstracts
opendatabay.com
Updated Jun 23, 2025
Cite
Datasimple (2025). arXiv Paper Abstracts [Dataset]. https://www.opendatabay.com/data/dataset/b1fe3b22-0ace-4bb5-b400-818fbf063adf
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Education & Learning Analytics
Description
Context: Paper submission systems (CMT, OpenReview, etc.) require users to upload paper titles and abstracts and then specify the subject areas their papers best belong to. Wouldn't it be nice if such systems suggested viable subject areas for each paper?

This dataset would allow developers to build baseline models that might benefit this use case. Data analysts might also enjoy analyzing the intricacies of different papers and how well their abstracts correlate with their assigned categories. Additionally, we hope that the dataset will serve as a decent benchmark for building useful text classification systems.

Content: The dataset collection process is documented in this notebook. Please use the latest version of the data to run your experiments. An accompanying blog post on keras.io, Large-scale multi-label text classification, discusses the motivation behind this dataset, how to build a simple baseline model, and more.
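
Because a paper can carry several subject areas, multi-label classification usually encodes each paper's labels as a multi-hot vector with one slot per category. The sketch assumes the category labels arrive as a stringified Python list such as "['cs.LG', 'stat.ML']"; that storage format and the helper names parse_terms and multi_hot are assumptions, not something the description above states.

```python
import ast
from typing import Dict, List, Tuple


def parse_terms(raw: str) -> List[str]:
    """Parse a stringified Python list of category labels safely."""
    return [str(term) for term in ast.literal_eval(raw)]


def multi_hot(label_sets: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted label vocabulary and a 0/1 vector per example."""
    vocab = sorted({label for labels in label_sets for label in labels})
    index: Dict[str, int] = {label: i for i, label in enumerate(vocab)}
    vectors = []
    for labels in label_sets:
        vec = [0] * len(vocab)
        for label in labels:
            vec[index[label]] = 1
        vectors.append(vec)
    return vocab, vectors


print(parse_terms("['cs.LG', 'stat.ML']"))  # ['cs.LG', 'stat.ML']
vocab, vectors = multi_hot([["cs.LG", "stat.ML"], ["cs.CV"]])
print(vocab)    # ['cs.CV', 'cs.LG', 'stat.ML']
print(vectors)  # [[0, 1, 1], [1, 0, 0]]
```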

Acknowledgements: Thanks to Lukas Schwab (author of arxiv.py) for helping us build our initial data collection utilities, to Robert Bradshaw for his input on the Apache Beam pipeline, and to the ML-GDE program for providing GCP credits that allowed us to run the Beam pipeline at scale on Dataflow.

Original Data Source: arXiv Paper Abstracts
‘1000 Netflix Shows’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 4, 2020
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘1000 Netflix Shows’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-1000-netflix-shows-774c/1a6199df/?iid=004-347&v=presentation
Dataset updated
Aug 4, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘1000 Netflix Shows’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/chasewillden/netflix-shows on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Over the past 5-10 years, Netflix has captured a large population of viewers. With more viewers, there is most likely an increase in show variety. However, do people understand the distribution of ratings on Netflix shows?

Content

Because gathering 1,000 shows one by one would take a vast amount of time, the gathering method took advantage of Netflix’s suggestion engine, which recommends shows similar to a selected show. For this dataset, I took 4 videos from each of 4 ratings (16 unique shows in total), then pulled 53 suggested shows per video. The ratings are G, PG, TV-14, and TV-MA; I chose not to pull from every rating (e.g. TV-G, TV-Y, etc.).

Acknowledgements

The data set and the research article can be found at The Concept Center

Inspiration

I was watching Netflix with my wife, and we asked ourselves: why are there so many R- and TV-MA-rated shows?

--- Original source retains full ownership of the source dataset ---
