100+ datasets found
  1. Job Postings

    • kaggle.com
    Updated Feb 3, 2024
    Cite
    Akshat Jain (2024). Job Postings [Dataset]. https://www.kaggle.com/datasets/akshatkjain/job-postings/versions/1
    Dataset updated
    Feb 3, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Akshat Jain
    Description

    This dataset offers an extensive assortment of job postings, designed to support investigations and examinations within the realms of job market patterns, natural language processing (NLP), and machine learning. Developed for educational and research objectives, this dataset presents a varied array of job advertisements spanning diverse industries and job categories.

    Description of dataset:

    job_postings.csv
    • Category - The category of the job.
    • Workplace - If the job is remote, on-site or hybrid.
    • Location - Location of the job posting.
    • Department - The department for which the job has been posted.
    • Type - If the job is full-time, part-time

    job_description.csv
    • Category - The job category for the position.
    • Description - A detailed overview of the job role, responsibilities, and qualifications, often provided by the employer.
    • Benefits - Perks and advantages associated with the job, such as professional development opportunities, wellness programs, flexible working arrangements, and more.
    • Requirements - Essential skills, qualifications, and experiences expected from candidates applying for the job.
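
    As a rough illustration of how the two files described above might be read, the sketch below checks that a CSV carries the documented columns before counting rows per value. The column names come from the description; the sample rows, the helper names `read_rows` and `count_by`, and the assumption that the files have a header row are all hypothetical.

```python
import csv
import io

# Column names taken from the dataset description above.
POSTINGS_COLUMNS = ["Category", "Workplace", "Location", "Department", "Type"]
DESCRIPTION_COLUMNS = ["Category", "Description", "Benefits", "Requirements"]


def read_rows(handle, expected_columns):
    """Read a CSV file object and fail early if a described column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in expected_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    counts = {}
    for row in rows:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


# Hypothetical sample rows, invented for illustration only.
SAMPLE_POSTINGS = io.StringIO(
    "Category,Workplace,Location,Department,Type\n"
    "Engineering,Remote,Berlin,Platform,Full-time\n"
    "Engineering,Hybrid,Pune,Data,Full-time\n"
    "Marketing,On-site,Austin,Growth,Part-time\n"
)

postings = read_rows(SAMPLE_POSTINGS, POSTINGS_COLUMNS)
workplace_counts = count_by(postings, "Workplace")
category_counts = count_by(postings, "Category")
```

    With the real download, `open("job_postings.csv", newline="")` would take the place of the in-memory sample, and `DESCRIPTION_COLUMNS` could be passed in the same way for job_description.csv.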

    Potential use cases:

    • Optimizing workforce planning and talent acquisition strategies.
    • Developing NLP models for resume parsing and job matching.
    • Building predictive models to forecast job market trends.
    • Exploring salary prediction models for various job roles.
    • Analyzing regional job market disparities and opportunities.
  2. Benefits of WFH

    • kaggle.com
    Updated Jul 23, 2021
    Cite
    Shital Gaikwad (2021). Benefits of WFH [Dataset]. https://www.kaggle.com/shitalgaikwad123/benefits-of-wfh/tasks
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shital Gaikwad
    Description

    Dataset

    This dataset was created by Shital Gaikwad

    Contents

  3. Nutrition Powerhouse Formulations

    • kaggle.com
    zip
    Updated Mar 29, 2024
    Cite
    willian oliveira (2024). Nutrition Powerhouse Formulations [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/nutrition-powerhouse-formulations
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 29, 2024
    Authors
    willian oliveira
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    These graphs were taken from Our World in Data:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F3acab846617aabda6345f7cf9e73ce8c%2Fgraph3.png?generation=1711743920521272&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F28ca12fa98b1aa3ba0146af179e87f57%2Fgraph1.png?generation=1711743952812788&alt=media

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F3cbea2b6d7978992b7bbd51d30a9d874%2Fgraph2.png?generation=1711743936272984&alt=media

    Malnutrition is a multifaceted issue that extends far beyond the simple concept of hunger and caloric intake. While ensuring an adequate supply of calories is undoubtedly crucial, it is equally important to consider the nutritional quality of the foods consumed. The composition of our diets plays a pivotal role in determining our overall health and well-being.

    When we discuss malnutrition, we must broaden our perspective to encompass not only the quantity but also the quality of food intake. It's not just about filling stomachs; it's about providing the body with essential nutrients such as proteins, fats, vitamins, and minerals. Even if individuals consume enough calories, they can still suffer from malnutrition if their diets lack diversity and fail to deliver the necessary array of nutrients for optimal health.

    A significant concern associated with poor dietary choices is the prevalence of micronutrient deficiencies. These deficiencies arise when individuals consume diets that are inadequate in essential vitamins and minerals. A diet that lacks diversity and relies heavily on processed or refined foods often fails to meet the body's micronutrient requirements, leading to a range of health problems and complications.

    Addressing malnutrition requires a comprehensive approach that considers not only individual dietary habits but also broader societal and environmental factors. The environmental impact of food production and consumption cannot be overstated. As the global population continues to grow, ensuring access to nutritious foods for everyone while minimizing the environmental footprint of agriculture has become an urgent priority.

    One of the key challenges we face is finding sustainable solutions to ensure that nutritious diets are accessible and affordable for all. This necessitates a shift towards more sustainable food systems that prioritize nutrient-rich foods while minimizing environmental degradation. Sustainable agriculture practices, such as organic farming and regenerative agriculture, can play a crucial role in achieving this goal by promoting biodiversity, reducing chemical inputs, and enhancing soil health.

    Furthermore, promoting dietary diversity and education about nutrition are essential components of any strategy aimed at combating malnutrition. Encouraging individuals to consume a wide variety of foods, including fruits, vegetables, whole grains, and lean proteins, can help ensure they receive a balanced intake of essential nutrients. Nutrition education programs can empower individuals to make healthier food choices and adopt sustainable eating habits that benefit both their health and the planet.

    In addition to individual-level interventions, policymakers and stakeholders must work together to implement broader systemic changes that promote food security and sustainability. This includes investing in agricultural research and innovation, supporting smallholder farmers, and implementing policies that incentivize the production and consumption of nutritious, environmentally friendly foods.

    Ultimately, addressing malnutrition requires a concerted effort from all sectors of society. By prioritizing nutritious diets, promoting sustainable food systems, and addressing the root causes of food insecurity and environmental degradation, we can work towards a future where everyone has access to healthy, sustainable food choices. Together, we can build a world where malnutrition is no longer a widespread concern, and all individuals can thrive and reach their full potential.

  4. Kaggle DS Survey 2019

    • kaggle.com
    Updated Dec 1, 2019
    Cite
    Alan Asri (2019). Kaggle DS Survey 2019 [Dataset]. https://www.kaggle.com/datasets/alanasri/kaggle-ds-survey-2019
    Dataset updated
    Dec 1, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Alan Asri
    Description

    Context

    This notebook contains a thorough analysis and explanation of the survey conducted by Kaggle. Respondents varied in work background, age, place of residence, and employer. The survey questions cover the fields they work in, related to Data Science and Machine Learning.

    Content

    The following exploratory data analysis uses the results of the survey conducted by Kaggle in 2019, in which respondents answered questions about Machine Learning and Data Science. The core points of this analysis are:

    1. Distribution of age by formal education
    2. Company and money spent on Machine Learning
    3. Comparison of Machine Learning spending levels by company
    4. Data Scientist experience and compensation
    5. Correlation between Machine Learning experience and salary
    6. Correlation between the Data Scientist role and compensation
    7. Favourite media sources on Data Science topics
    8. Favourite media by age distribution; media most favoured by Data Scientists
    9. Course platforms for Data Scientists
    10. Job roles for each title; primary job of a Data Scientist
    11. Programming languages used regularly by job title, especially by Data Scientists
    12. Comparison of specific programming ability and compensation
    13. Which programming language should an aspiring Data Scientist learn first?
    14. Integrated Development Environments used on a regular basis
    15. Top 5 IDEs and which countries use them; Microsoft is not dominant in the USA
    16. Notebooks most likely used on a regular basis; Google dominates
    17. Which countries and companies use which hardware for Machine Learning
    18. Job roles by specific company type
    19. Computer Vision methods most used by companies
    20. Distribution of companies by country
    21. Cloud products: Amazon dominates, Google follows
    22. Big Data products: Amazon leads in enterprise, Google leads overall

    Acknowledgements

    We wouldn't be here without the help of others. If you owe any attributions or thanks, include them here along with any citations of past research.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  5. ‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-disease-prediction-using-machine-learning-with-gui-5ad4/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘DISEASE PREDICTION USING MACHINE LEARNING WITH GUI’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/neelima98/disease-prediction-using-machine-learning on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Due to the progress of big data in the biomedical and healthcare communities, accurate analysis of medical data benefits early disease recognition, patient care and community services. When the medical data is incomplete, the accuracy of the analysis is reduced. Moreover, different regions exhibit unique characteristics of certain regional diseases, which may weaken the prediction of disease outbreaks. This project proposes machine learning with decision tree, Naive Bayes and random forest algorithms, using structured and unstructured data from hospitals. It also uses a machine learning algorithm to partition the data. To the best of our knowledge, none of the existing work has focused on both data types in the area of medical big data analytics. Compared with several typical prediction algorithms, the prediction accuracy of the proposed algorithm reaches 94.8%, at a speed faster than that of the unimodal disease risk prediction algorithm, and it produces a report.

    --- Original source retains full ownership of the source dataset ---

  6. 5 Benefits of Creative Writing

    • kaggle.com
    Updated Jun 21, 2021
    Cite
    Warren Morrison (2021). 5 Benefits of Creative Writing [Dataset]. https://www.kaggle.com/warrenmorrison/5-benefits-of-creative-writing/code
    Dataset updated
    Jun 21, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Warren Morrison
    Description

    Dataset

    This dataset was created by Warren Morrison

    Contents

  7. Cow Segmentation Dataset Dataset

    • paperswithcode.com
    Updated Feb 21, 2025
    + more versions
    Cite
    (2025). Cow Segmentation Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/cow-segmentation-dataset
    Dataset updated
    Feb 21, 2025
    Description

    The Cow Segmentation Dataset is a comprehensive resource designed for segmentation tasks in machine learning. It features a wide variety of cow images annotated in the COCO format, ensuring compatibility with a range of popular machine learning models, including YOLOv8, Mask R-CNN, and others. This dataset empowers researchers and developers to train robust models for recognizing and segmenting cows in various contexts, revolutionizing AI-driven applications in agriculture.

    The dataset consists of high-resolution images that capture cows from different angles, poses, and environments. With detailed annotations in the COCO format, each image is segmented to highlight the cow’s body, enabling precise object recognition. The segmentation data is easily adaptable to models used in advanced image processing tasks, making it a highly flexible resource.
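
    To make the COCO annotation layout concrete, here is a minimal sketch that reads polygon segmentations and computes each mask's area with the shoelace formula. The JSON content, the file name `cow_001.jpg`, and the helper names are hypothetical; the sketch also assumes polygon-style segmentations (COCO can alternatively store run-length-encoded masks, which this does not handle).

```python
import json

# Hypothetical, hand-written COCO-style annotation content.
SAMPLE_COCO = json.loads("""
{
  "images": [{"id": 1, "file_name": "cow_001.jpg", "width": 640, "height": 480}],
  "categories": [{"id": 1, "name": "cow"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "iscrowd": 0,
     "bbox": [100, 120, 200, 150], "area": 30000,
     "segmentation": [[100, 120, 300, 120, 300, 270, 100, 270]]}
  ]
}
""")


def polygon_area(flat_xy):
    """Area of a polygon given as [x1, y1, x2, y2, ...] (shoelace formula)."""
    points = list(zip(flat_xy[0::2], flat_xy[1::2]))
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def masks_per_image(coco):
    """Map each image file name to the list of its polygon areas."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    result = {name: [] for name in names.values()}
    for ann in coco["annotations"]:
        for polygon in ann["segmentation"]:
            result[names[ann["image_id"]]].append(polygon_area(polygon))
    return result


areas = masks_per_image(SAMPLE_COCO)
```

    Comparing the computed polygon area with the annotation's own `area` field is one quick sanity check before training on such a file.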

    Applications in Agriculture

    This dataset offers diverse applications in smart farming, such as automated cow monitoring, health diagnostics, and livestock management. It supports real-time systems for tracking cow behavior, analyzing health indicators, and managing livestock populations effectively. The segmentation accuracy helps in building AI models that can contribute to precision farming, reducing manual efforts and improving overall productivity.

    Use Cases and Future Potential

    Livestock Management: Automating tasks like cow counting, posture analysis, and herd management using AI-driven systems.

    Health Monitoring: Identifying physical conditions like lameness or injury through detailed image segmentation.

    Herd Behavior Analysis: Real-time behavior tracking using models trained on various cow positions and movements.

    Benefits of Using the Dataset

    Diverse Annotations: A rich set of segmentation masks for various cow breeds and environments.

    Model Compatibility: Ready-to-use COCO annotations for easy integration into advanced machine learning models.

    Real-World Applications: Supports the development of AI systems for real-time livestock monitoring and analysis.

    This dataset is sourced from Kaggle.

  8. ‘📈 Pension Insurance Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘📈 Pension Insurance Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-pension-insurance-data-2e7e/89a13dbf/?iid=000-256&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘📈 Pension Insurance Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/pension-insurance-datae on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    The tables include statistics on the people and pensions that PBGC protects, including how many Americans are in PBGC-insured pension plans, how many get PBGC benefits, and where they live.

    Note: The first sheet contains links to each of the tables that follow.

    Source: https://catalog.data.gov/dataset/pension-insurance-data-tables

    This dataset was created by Data Society and contains around 100 samples along with Data Book Listing, Table, technical information and other features such as: - Data Book Listing - Table - and more.

    How to use this dataset

    • Analyze Data Book Listing in relation to Table
    • Study the influence of Data Book Listing on Table

    Acknowledgements

    If you use this dataset in your research, please credit Data Society

    --- Original source retains full ownership of the source dataset ---

  9. ‘College Football Bowl Games’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘College Football Bowl Games’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-college-football-bowl-games-efe5/9866ff9c/
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘College Football Bowl Games’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/college-football-bowl-gamese on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Background

    Home field advantage is always the most desirable, but does data back it up? I’ve pulled stats on college football bowl games to see if having the home field advantage is all it is cracked up to be.

    Methodology

    The data collected was scraped from www.foxsports.com.

    Source

    The research and blog post can be found at The Concept Center

    This dataset was created by Chase Willden and contains around 20000 samples along with Receiving Receiving Yards, Kicking Pat Made, technical information and other features such as: - Kick Return Kick Return Touchdowns - Passing Completions - and more.

    How to use this dataset

    • Analyze Kick Return Kick Return Avg in relation to Punt Return Punt Return Long
    • Study the influence of Kicking Kicking Points on Kick Return Kick Return Long

    Acknowledgements

    If you use this dataset in your research, please credit Chase Willden

    --- Original source retains full ownership of the source dataset ---

  10. Fruits Dataset for Classification

    • data.mendeley.com
    Updated Feb 11, 2025
    + more versions
    Cite
    GTS GTS (2025). Fruits Dataset for Classification [Dataset]. http://doi.org/10.17632/rg254yr63x.1
    Dataset updated
    Feb 11, 2025
    Authors
    GTS GTS
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    About Dataset (strawberries, peaches, pomegranates)

    Photo requirements:

    1. White background
    2. .jpg format
    3. Image size 300*300

    The number of photos required is 250 of each fruit when it is fresh and 250 of each fruit when it is rotten, for a total of 1500 images.

    Diverse Collection: With a diverse collection of product images, the files provide an excellent foundation for developing and testing machine learning models designed for image recognition and classification. Each image is captured under different lighting conditions and backgrounds, offering a realistic challenge for algorithms to overcome.

    Real-World Applications: The variability in the dataset ensures that models trained on it can generalize well to real-world scenarios, making them robust and reliable. The dataset includes common fruits such as apples, bananas, oranges, and strawberries, among others, allowing for comprehensive training and evaluation.

    Industry Use Cases: One of the significant advantages of using the Fruits Dataset for Classification is its applicability in various fields such as agriculture, retail, and the food industry. In agriculture, it can help automate the process of fruit sorting and grading, enhancing efficiency and reducing labor costs. In retail, it can be used to develop automated checkout systems that accurately identify fruits, streamlining the purchasing process.

    Educational Value: The dataset is also valuable for educational purposes, providing students and educators with a practical tool to learn and teach machine learning concepts. By working with this dataset, learners can gain hands-on experience in data preprocessing, model training, and evaluation.

    Conclusion: The Fruits Dataset for Classification is a versatile resource for advancing the field of image classification. Its diverse and high-quality images, coupled with practical applications, make it a go-to dataset for researchers, developers, and educators aiming to improve and innovate in machine learning and computer vision.

    This dataset is sourced from Kaggle.

  11. ‘US Public Food Assistance’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Apr 22, 2019
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘US Public Food Assistance’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-us-public-food-assistance-5075/ca5319fe/?iid=006-512&v=presentation
    Dataset updated
    Apr 22, 2019
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Analysis of ‘US Public Food Assistance’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jpmiller/publicassistance on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset focuses on public assistance programs in the United States that provide food, namely SNAP and WIC. If you are interested in a broader picture of food security across the world, please see Food Security Indicators for the World 2016-2020.

    Initial coverage was for the Special Supplemental Nutrition Program for Women, Infants, and Children Program, or simply WIC. The program allocates Federal and State funds to help low-income women and children up to age five who are at nutritional risk. Funds are used to provide supplemental foods, baby formula, health care, and nutrition education.

    Starting with version 5, the dataset also covers the US Supplemental Nutrition Assistance Program, more commonly known as SNAP. The program is the successor to the Food Stamps program previously in place. The program provides food assistance to low-income families in the form of a debit card. A 2016 study using POS data from SNAP-eligible vendors showed the three most purchased types of food to be meats, sweetened beverages, and vegetables.

    Content

    Files may include participation data and spending for state programs, and poverty data for each state. Data for WIC covers fiscal years 2013-2016, which is actually October 2012 through September 2016. Data for SNAP covers 2015 to 2020.
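
    The fiscal-year note above follows the US federal convention that fiscal year N runs from October 1 of year N-1 through September 30 of year N. A small sketch of that mapping, with hypothetical helper names, might look like this:

```python
from datetime import date


def us_fiscal_year(day):
    """Return the US federal fiscal year containing the given date.

    Fiscal year N runs from October 1 of year N-1 through September 30 of year N.
    """
    return day.year + 1 if day.month >= 10 else day.year


def fiscal_year_bounds(fiscal_year):
    """Return the first and last calendar day of a fiscal year."""
    return date(fiscal_year - 1, 10, 1), date(fiscal_year, 9, 30)


# Fiscal years 2013-2016 span October 2012 through September 2016.
first_day, _ = fiscal_year_bounds(2013)
_, last_day = fiscal_year_bounds(2016)
```

    Tagging each monthly record with `us_fiscal_year` is one way to line the WIC figures up with calendar-year SNAP data, assuming the records carry a calendar date.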

    Motivation

    My original purpose here is two-fold:

    • Explore various aspects of US Public Assistance. Show trends over recent years and better understand differences across state agencies. Although the federal government sponsors the program and provides funding, programs are administered at the state level and can vary widely. Indian nations (Native Americans) also administer their own programs.

    • Share with the Kaggle Community the joy - and pain - of working with government data. Data is often spread across numerous agency sites and comes in a variety of formats. Often the data is provided in Excel, with the files consisting of multiple tabs. Also, files are formatted as reports and contain aggregated data (sums, averages, etc.) along with base data.

    As of March 2nd, I am expanding the purpose to support the M5 Forecasting Challenges here on Kaggle. Store sales are partly driven by participation in Public Assistance programs. Participants typically receive the items free of charge. The store then recovers the sale price from the state agencies administering the program.

    Additional Content Ideas

    The dataset can benefit greatly from additional content. Economics, additional demographics, administrative costs and more. I'd like to eventually explore the money trail from taxes and corporate subsidies, through the government agencies, and on to program participants. All community ideas are welcome!

    --- Original source retains full ownership of the source dataset ---

  12. Health Insurance Marketplace

    • kaggle.com
    zip
    Updated May 1, 2017
    Cite
    US Department of Health and Human Services (2017). Health Insurance Marketplace [Dataset]. https://www.kaggle.com/hhs/health-insurance-marketplace
    Available download formats: zip (868821924 bytes)
    Dataset updated
    May 1, 2017
    Dataset provided by
    United States Department of Health and Human Services (http://www.hhs.gov/)
    Authors
    US Department of Health and Human Services
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Health Insurance Marketplace Public Use Files contain data on health and dental plans offered to individuals and small businesses through the US Health Insurance Marketplace.

    [Figure: median plan premiums]

    Exploration Ideas

    To help get you started, here are some data exploration ideas:

    • How do plan rates and benefits vary across states?
    • How do plan benefits relate to plan rates?
    • How do plan rates vary by age?
    • How do plans vary across insurance network providers?

    See this forum thread for more ideas, and post there if you want to add your own ideas or answer some of the open questions!

    Data Description

    This data was originally prepared and released by the Centers for Medicare & Medicaid Services (CMS). Please read the CMS Disclaimer-User Agreement before using this data.

    Here, we've processed the data to facilitate analytics. This processed version has three components:

    1. Original versions of the data

    The original versions of the 2014, 2015, 2016 data are available in the "raw" directory of the download and "../input/raw" on Kaggle Scripts. Search for "dictionaries" on this page to find the data dictionaries describing the individual raw files.

    2. Combined CSV files that contain

    In the top-level directory of the download ("../input" on Kaggle Scripts), there are six CSV files that contain the combined data across all years:

    • BenefitsCostSharing.csv
    • BusinessRules.csv
    • Network.csv
    • PlanAttributes.csv
    • Rate.csv
    • ServiceArea.csv

    Additionally, there are two CSV files that facilitate joining data across years:

    • Crosswalk2015.csv - joining 2014 and 2015 data
    • Crosswalk2016.csv - joining 2015 and 2016 data

    3. SQLite database

    The "database.sqlite" file contains tables corresponding to each of the processed CSV files.
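
    As a sketch of how such a database could be queried with Python's standard library, the example below lists the tables and averages a rate per state. It builds an in-memory stand-in rather than opening "database.sqlite"; the table name mirrors Rate.csv, but the columns `StateCode`, `BusinessYear` and `IndividualRate` and all rows are assumptions made for illustration, so check the data dictionaries for the real schema.

```python
import sqlite3

# In-memory stand-in for "database.sqlite"; the table name follows Rate.csv,
# but the columns and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Rate (StateCode TEXT, BusinessYear INTEGER, IndividualRate REAL)"
)
conn.executemany(
    "INSERT INTO Rate VALUES (?, ?, ?)",
    [("AK", 2014, 300.0), ("AK", 2014, 500.0), ("AL", 2014, 250.0), ("AL", 2015, 270.0)],
)


def list_tables(connection):
    """Return the names of all tables in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def average_rate_by_state(connection, year):
    """Average the (hypothetical) IndividualRate column per state for one year."""
    rows = connection.execute(
        "SELECT StateCode, AVG(IndividualRate) FROM Rate "
        "WHERE BusinessYear = ? GROUP BY StateCode ORDER BY StateCode",
        (year,),
    )
    return dict(rows.fetchall())


tables = list_tables(conn)
rates_2014 = average_rate_by_state(conn, 2014)
```

    Against the real file, `sqlite3.connect("database.sqlite")` would replace the in-memory connection, and `list_tables` is a reasonable first call to see which tables are actually present.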

    The code to create the processed version of this data is available on GitHub.

  13. Paimon Dataset YOLO Detection Dataset

    • paperswithcode.com
    • gts.ai
    Updated Dec 3, 2024
    + more versions
    Cite
    (2024). Paimon Dataset YOLO Detection Dataset [Dataset]. https://paperswithcode.com/dataset/paimon-dataset-yolo-detection
    Dataset updated
    Dec 3, 2024
    Description

    This dataset consists of a diverse collection of images featuring Paimon, a popular character from the game Genshin Impact. The images have been sourced from in-game gameplay footage and capture Paimon from various angles and in different sizes (scales), making the dataset suitable for training YOLO object detection models.

    The dataset provides a comprehensive view of Paimon in different lighting conditions, game environments, and positions, ensuring the model can generalize well to similar characters or object detection tasks. While most annotations are accurately labeled, a small number of annotations may include minor inaccuracies due to manual labeling errors. This is ideal for researchers and developers working on character recognition, object detection in gaming environments, or other AI vision tasks.

    Dataset Features:

    Image Format: .jpg files in 640×320 resolution.

    Annotation Format: .txt files in YOLO format, containing bounding box data with:

    class_id

    x_center

    y_center

    width

    height
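
    The five fields listed above can be turned into pixel coordinates with a few lines of arithmetic. This sketch assumes the usual YOLO convention that all four box values are normalized to the range 0 to 1 relative to the image size, and it uses the 640×320 resolution stated above; the annotation line and the helper names are hypothetical.

```python
def parse_yolo_line(line):
    """Split one annotation line into (class_id, x_center, y_center, width, height)."""
    class_id, x_center, y_center, width, height = line.split()
    return int(class_id), float(x_center), float(y_center), float(width), float(height)


def to_pixel_box(annotation, image_width=640, image_height=320):
    """Convert a normalized YOLO box to (left, top, right, bottom) in pixels."""
    _, x_center, y_center, width, height = annotation
    left = (x_center - width / 2) * image_width
    top = (y_center - height / 2) * image_height
    right = (x_center + width / 2) * image_width
    bottom = (y_center + height / 2) * image_height
    return left, top, right, bottom


# Hypothetical annotation line, not taken from the dataset.
sample = parse_yolo_line("0 0.5 0.5 0.25 0.5")
box = to_pixel_box(sample)
```

    Drawing the resulting boxes over a handful of images is a quick way to spot the occasional labeling errors the description mentions.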

    Use Cases:

    Character Detection in Games: Train YOLO models to detect and identify in-game characters or NPCs.

    Gaming Analytics: Improve recognition of specific game elements for AI-powered game analytics tools.

    Research: Contribute to academic research focused on object detection or computer vision in animated and gaming environments.

    Data Structure:

    Images: High-quality .jpg images captured from multiple perspectives, ensuring robust model training across various orientations and lighting scenarios.

    Annotations: Each image has an associated .txt file that follows the YOLO format. The annotations are structured to include class identification, object location (center coordinates), and bounding box dimensions.

    Key Advantages:

    Varied Angles and Scales: The dataset includes Paimon from multiple perspectives, aiding in creating more versatile and adaptable object detection models.

    Real-World Scenario: Extracted from actual gameplay footage, the dataset simulates real-world detection challenges such as varying backgrounds, motion blur, and changing character scales.

    Training Ready: Suitable for training YOLO models and other deep learning frameworks that require object detection capabilities.

    This dataset is sourced from Kaggle.

  14. ‘COVID vaccination vs. mortality ’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘COVID vaccination vs. mortality ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-vaccination-vs-mortality-cbd8/06c8ccd2/?iid=010-492&v=presentation
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘COVID vaccination vs. mortality ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sinakaraji/covid-vaccination-vs-death on 12 November 2021.

    --- Dataset description provided by original source is as follows ---

    Context

    The COVID-19 outbreak has brought the whole planet to its knees. More than 4.5 million people had died as of the writing of this notebook, and the only acceptable way out of the disaster is to vaccinate all parts of society. Despite the fact that the benefits of vaccination have been proved to the world many times, anti-vaccine groups are springing up all over the world. This data set was generated to investigate the impact of coronavirus vaccinations on coronavirus mortality.

    Content

    country: country name
    iso_code: ISO code for each country
    date: date to which the data belongs
    total_vaccinations: number of all COVID vaccine doses administered in that country
    people_vaccinated: number of people who got at least one shot of COVID vaccine
    people_fully_vaccinated: number of people who got full vaccine shots
    New_deaths: number of daily new deaths
    population: 2021 country population
    ratio: % of vaccinations in that country at that date = people_vaccinated/population * 100
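
    The ratio column follows directly from the formula given in its description. A minimal sketch of that calculation is shown here; the function name `vaccination_ratio` and the sample numbers are illustrative assumptions, not values from the dataset.

```python
def vaccination_ratio(people_vaccinated: float, population: float) -> float:
    """Return people_vaccinated / population * 100, as in the column description."""
    if population <= 0:
        raise ValueError("population must be positive")
    return people_vaccinated / population * 100


# Made-up example: 250,000 people with at least one dose out of 1,000,000.
example_ratio = vaccination_ratio(people_vaccinated=250_000, population=1_000_000)
print(example_ratio)
```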

    Data Collection

    This dataset is a combination of the following three datasets:

    1. https://www.kaggle.com/gpreda/covid-world-vaccination-progress

    2. https://covid19.who.int/WHO-COVID-19-global-data.csv

    3. https://www.kaggle.com/rsrishav/world-population

    You can find more detail about this dataset by reading this notebook:

    https://www.kaggle.com/sinakaraji/simple-linear-regression-covid-vaccination

    Countries in this dataset:

    Afghanistan; Albania; Algeria; Andorra; Angola
    Anguilla; Antigua and Barbuda; Argentina; Armenia; Aruba
    Australia; Austria; Azerbaijan; Bahamas; Bahrain
    Bangladesh; Barbados; Belarus; Belgium; Belize
    Benin; Bermuda; Bhutan; Bolivia (Plurinational State of); Brazil
    Bosnia and Herzegovina; Botswana; Brunei Darussalam; Bulgaria; Burkina Faso
    Cambodia; Cameroon; Canada; Cabo Verde; Cayman Islands
    Central African Republic; Chad; Chile; China; Colombia
    Comoros; Cook Islands; Costa Rica; Croatia; Cuba
    Curaçao; Cyprus; Denmark; Djibouti; Dominica
    Dominican Republic; Ecuador; Egypt; El Salvador; Equatorial Guinea
    Estonia; Ethiopia; Falkland Islands (Malvinas); Fiji; Finland
    France; French Polynesia; Gabon; Gambia; Georgia
    Germany; Ghana; Gibraltar; Greece; Greenland
    Grenada; Guatemala; Guinea; Guinea-Bissau; Guyana
    Haiti; Honduras; Hungary; Iceland; India
    Indonesia; Iran (Islamic Republic of); Iraq; Ireland; Isle of Man
    Israel; Italy; Jamaica; Japan; Jordan
    Kazakhstan; Kenya; Kiribati; Kuwait; Kyrgyzstan
    Lao People's Democratic Republic; Latvia; Lebanon; Lesotho; Liberia
    Libya; Liechtenstein; Lithuania; Luxembourg; Madagascar
    Malawi; Malaysia; Maldives; Mali; Malta
    Mauritania; Mauritius; Mexico; Republic of Moldova; Monaco
    Mongolia; Montenegro; Montserrat; Morocco; Mozambique
    Myanmar; Namibia; Nauru; Nepal; Netherlands
    New Caledonia; New Zealand; Nicaragua; Niger; Nigeria
    Niue; North Macedonia; Norway; Oman; Pakistan
    occupied Palestinian territory, including east Jerusalem
    Panama; Papua New Guinea; Paraguay; Peru; Philippines
    Poland; Portugal; Qatar; Romania; Russian Federation
    Rwanda; Saint Kitts and Nevis; Saint Lucia
    Saint Vincent and the Grenadines; Samoa; San Marino; Sao Tome and Principe; Saudi Arabia
    Senegal; Serbia; Seychelles; Sierra Leone; Singapore
    Slovakia; Slovenia; Solomon Islands; Somalia; South Africa
    Republic of Korea; South Sudan; Spain; Sri Lanka; Sudan
    Suriname; Sweden; Switzerland; Syrian Arab Republic; Tajikistan
    United Republic of Tanzania; Thailand; Togo; Tonga; Trinidad and Tobago
    Tunisia; Turkey; Turkmenistan; Turks and Caicos Islands; Tuvalu
    Uganda; Ukraine; United Arab Emirates; The United Kingdom; United States of America
    Uruguay; Uzbekistan; Vanuatu; Venezuela (Bolivarian Republic of); Viet Nam
    Wallis and Futuna; Yemen; Zambia; Zimbabwe

    --- Original source retains full ownership of the source dataset ---

  15. ‘ Sales Conversion Optimization’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jul 14, 2016
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2016). ‘ Sales Conversion Optimization’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-sales-conversion-optimization-d134/latest
    Dataset updated
    Jul 14, 2016
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘ Sales Conversion Optimization’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/loveall/clicks-conversion-tracking on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Cluster Analysis for Ad Conversions Data

    Content

    The data used in this project is from an anonymous organisation’s social media ad campaign. The data file can be downloaded from here. The file conversion_data.csv contains 1143 observations in 11 variables. Below are the descriptions of the variables.

    1.) ad_id: a unique ID for each ad.

    2.) xyz_campaign_id: an ID associated with each ad campaign of XYZ company.

    3.) fb_campaign_id: an ID associated with how Facebook tracks each campaign.

    4.) age: age of the person to whom the ad is shown.

    5.) gender: gender of the person to whom the ad is shown.

    6.) interest: a code specifying the category to which the person’s interest belongs (interests are as mentioned in the person’s Facebook public profile).

    7.) Impressions: the number of times the ad was shown.

    8.) Clicks: number of clicks on that ad.

    9.) Spent: Amount paid by company xyz to Facebook, to show that ad.

    10.) Total conversion: Total number of people who enquired about the product after seeing the ad.

    11.) Approved conversion: Total number of people who bought the product after seeing the ad.
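
    The variables above are enough to derive a few common campaign metrics. The sketch below is one possible way to do so and is not part of the original analysis: click-through rate, cost per click, and approved-conversion rate are standard marketing ratios chosen here as assumptions, and the function name `ad_metrics` and the sample values are hypothetical.

```python
from typing import Optional


def ad_metrics(impressions: int, clicks: int, spent: float,
               approved_conversion: int) -> dict[str, Optional[float]]:
    """Derive simple ratios from one row of conversion_data.csv.

    Each metric is None when its denominator is zero.
    """
    return {
        # Share of impressions that led to a click.
        "click_through_rate": clicks / impressions if impressions else None,
        # Average amount paid per click.
        "cost_per_click": spent / clicks if clicks else None,
        # Share of clicks that ended in a purchase.
        "approved_conversion_rate": approved_conversion / clicks if clicks else None,
    }


# Made-up row: 1000 impressions, 10 clicks, 15.0 spent, 2 purchases.
metrics = ad_metrics(impressions=1000, clicks=10, spent=15.0, approved_conversion=2)
print(metrics)
```

    Rows with zero clicks do occur in ad data, which is why the sketch returns None instead of dividing by zero.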

    Acknowledgements

    Thanks to the Anonymous data depositor

    Inspiration

    Social Media Ad Campaign marketing is a leading source of Sales Conversion, and I have made this data available for the benefit of Businesses using Google Adwords to track Conversions.

    --- Original source retains full ownership of the source dataset ---

  16. ‘Netflix Shows’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Netflix Shows’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-netflix-shows-53e6/ea6268fc/?iid=004-315&v=presentation
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Netflix Shows’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/netflix-showse on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    Background

    Netflix in the past 5-10 years has captured a large population of viewers. With more viewers, there is most likely an increase in show variety. However, do people understand the distribution of ratings on Netflix shows?

    Netflix Suggestion Engine

    Because of the vast amount of time it would take to gather 1,000 shows one by one, the gathering method took advantage of Netflix’s suggestion engine. The suggestion engine recommends shows similar to the selected show. As part of this data set, I took 4 videos from 4 ratings (totaling 16 unique shows), then pulled 53 suggested shows per video. The ratings include: G, PG, TV-14, TV-MA. I chose not to pull from every rating (e.g. TV-G, TV-Y, etc.).

    Source

    Access to the study can be found at The Concept Center

    This dataset was created by Chase Willden and contains around 1000 samples along with User Rating Score, Rating Description, technical information and other features such as Release Year, Title, and more.

    How to use this dataset

    • Analyze User Rating Size in relation to Rating
    • Study the influence of Rating Level on User Rating Score

    Acknowledgements

    If you use this dataset in your research, please credit Chase Willden

    --- Original source retains full ownership of the source dataset ---

  17. Spanish Sign Language Alphabet Dataset

    • paperswithcode.com
    • gts.ai
    Updated Feb 25, 2025
    + more versions
    Cite
    (2025). Spanish Sign Language Alphabet Dataset [Dataset]. https://paperswithcode.com/dataset/spanish-sign-language-alphabet
    Dataset updated
    Feb 25, 2025
    Description

    The Spanish Sign Language dataset consists of 19 static letters and 8 dynamic movements. It was created with the aim of supporting machine learning models for sign language recognition, particularly the Spanish sign language alphabet.

    Dataset Composition

    The dataset contains around 100 images for each of the 19 static letters. Each image was captured over a white background with high-quality cameras. These images represent three different hands at a uniform distance, simulating the perspective from an internal smartphone camera.

    Expert Validation

    To ensure high accuracy, the dataset was reviewed and validated by a certified expert in Spanish sign language. This guarantees that each letter is correctly represented, making the dataset a valuable resource for machine learning applications.

    Applications

    This dataset is ideal for training models in:

    Gesture and static sign language recognition

    Computer vision projects focused on human-computer interaction

    Accessibility technology development for the hearing-impaired community

    Benefits of the Dataset

    High-quality, standardized images

    Multiple hand perspectives ensure model robustness

    Realistic simulation of smartphone camera view for practical applications

    Suitable for researchers focusing on AI, sign language translation, and accessibility solutions

    Potential Use Cases

    AI and Machine Learning: Can be used to train AI models for recognizing static signs.

    Computer Vision: Enhances projects in gesture recognition and human-computer interaction.

    Accessibility Technology: Advances sign language translation for hearing-impaired users.

    This dataset is sourced from Kaggle.

  18. Food Image Classification Dataset Dataset

    • paperswithcode.com
    Updated Jul 26, 2017
    + more versions
    Cite
    Marc Bolaños; Aina Ferrà; Petia Radeva (2017). Food Image Classification Dataset Dataset [Dataset]. https://paperswithcode.com/dataset/food-image-classification-dataset
    Dataset updated
    Jul 26, 2017
    Authors
    Marc Bolaños; Aina Ferrà; Petia Radeva
    Description

    About Dataset

    The file contains 24K unique images obtained from various Google resources. The images are meticulously curated to ensure diversity and representativeness. The dataset provides a solid foundation for developing robust and precise image classification algorithms and encourages exploration in the field of food image classification.

    Unparalleled Diversity

    Dive into a vast collection spanning culinary landscapes worldwide. Immerse yourself in a diverse array of cuisines, from Italian pasta to Japanese sushi. Explore a rich tapestry of food imagery, meticulously curated for accuracy and breadth.

    Precision Labeling

    Benefit from meticulous labeling, ensuring each image is tagged with precision. Access detailed metadata for seamless integration into your machine learning projects. Empower your algorithms with the clarity they need to excel in food recognition tasks.

    Endless Applications

    Fuel advancements in machine learning and computer vision with this comprehensive dataset. Revolutionize food industry automation, from inventory management to quality control. Enable innovative applications in health monitoring and dietary analysis for a healthier tomorrow.

    Seamless Integration

    Seamlessly integrate our dataset into your projects with user-friendly access and documentation. Enjoy high-resolution images optimized for compatibility with a range of AI frameworks. Access support and resources to maximize the potential of our dataset for your specific needs.

    Conclusion

    Embark on a culinary journey through the lens of artificial intelligence and unlock the potential of food image classification with our SEO-optimized file. Elevate your research, elevate your projects, and elevate the way we perceive and interact with food in the digital age. Dive in today and savor the possibilities!

    This dataset is sourced from Kaggle.

  19. arXiv Paper Abstracts

    • opendatabay.com
    Updated Jun 23, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Datasimple (2025). arXiv Paper Abstracts [Dataset]. https://www.opendatabay.com/data/dataset/b1fe3b22-0ace-4bb5-b400-818fbf063adf
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Education & Learning Analytics
    Description

    Context

    Paper submission systems (CMT, OpenReview, etc.) require users to upload paper titles and paper abstracts and then specify the subject areas their papers best belong to. Wouldn't it be nice if such submission systems suggested viable subject areas for the corresponding papers?

    This dataset would allow developers to build baseline models that might benefit this use case. Data analysts might also enjoy analyzing the intricacies of different papers and how well their abstracts correlate to their noted categories. Additionally, we hope that the dataset will serve as a decent benchmark for building useful text classification systems.

    Content

    The dataset collection process is available here in this notebook. Please use the latest version of the data to run your experiments. Here's an accompanying blog post on keras.io discussing the motivation behind this dataset, building a simple baseline model, etc.: Large-scale multi-label text classification.

    Acknowledgements

    Thanks to Lukas Schwab (author of arxiv.py) for helping us build our initial data collection utilities. Thanks to Robert Bradshaw for his inputs on the Apache Beam pipeline. Thanks to the ML-GDE program for providing GCP credits that allowed us to run the Beam pipeline at scale on Dataflow.
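
    Because a paper can belong to several subject areas at once, a multi-label classifier usually needs its targets as multi-hot vectors. The sketch below shows one simple way to build them in plain Python; it is not taken from the referenced notebook or blog post. The function names and the three sample label lists are illustrative assumptions (the category codes follow arXiv's naming style).

```python
def build_vocab(label_lists: list[list[str]]) -> dict[str, int]:
    """Map every distinct label to a column index, in sorted order."""
    distinct = sorted({label for labels in label_lists for label in labels})
    return {label: index for index, label in enumerate(distinct)}


def multi_hot(labels: list[str], vocab: dict[str, int]) -> list[int]:
    """Return a 0/1 vector with a 1 in the column of each label present."""
    vector = [0] * len(vocab)
    for label in labels:
        vector[vocab[label]] = 1
    return vector


# Made-up subject areas for three papers.
paper_labels = [["cs.CV", "cs.LG"], ["stat.ML"], ["cs.LG", "stat.ML"]]
vocab = build_vocab(paper_labels)
encoded = [multi_hot(labels, vocab) for labels in paper_labels]
print(vocab)
print(encoded)
```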

    Original Data Source: arXiv Paper Abstracts

  20. ‘1000 Netflix Shows’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘1000 Netflix Shows’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-1000-netflix-shows-774c/1a6199df/?iid=004-347&v=presentation
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘1000 Netflix Shows’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/chasewillden/netflix-shows on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Netflix in the past 5-10 years has captured a large population of viewers. With more viewers, there is most likely an increase in show variety. However, do people understand the distribution of ratings on Netflix shows?

    Content

    Because of the vast amount of time it would take to gather 1,000 shows one by one, the gathering method took advantage of Netflix’s suggestion engine. The suggestion engine recommends shows similar to the selected show. As part of this data set, I took 4 videos from 4 ratings (totaling 16 unique shows), then pulled 53 suggested shows per video. The ratings include: G, PG, TV-14, TV-MA. I chose not to pull from every rating (e.g. TV-G, TV-Y, etc.).

    Acknowledgements

    The data set and the research article can be found at The Concept Center

    Inspiration

    I was watching Netflix with my wife and we asked ourselves, why are there so many R and TV-MA rating shows?

    --- Original source retains full ownership of the source dataset ---

Cite
Akshat Jain (2024). Job Postings [Dataset]. https://www.kaggle.com/datasets/akshatkjain/job-postings/versions/1
Job Postings

Categorized Roles with Detailed Descriptions, Benefits, and requirements

Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 3, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Akshat Jain
Description

This dataset offers an extensive assortment of job postings, designed to support investigations and examinations within the realms of job market patterns, natural language processing (NLP), and machine learning. Developed for educational and research objectives, this dataset presents a varied array of job advertisements spanning diverse industries and job categories.

Description of dataset:

job_postings.csv

  • Category: The category of the job.
  • Workplace: If the job is remote, on-site or hybrid.
  • Location: Location of the job posting.
  • Department: The department for which the job has been posted.
  • Type: If the job is full-time, part-time

job_description.csv

  • Category: The job category for the position.
  • Description: A detailed overview of the job role, responsibilities, and qualifications, often provided by the employer.
  • Benefits: Perks and advantages associated with the job, such as professional development opportunities, wellness programs, flexible working arrangements, and more.
  • Requirements: Essential skills, qualifications, and experiences expected from candidates applying for the job.
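
Both files list a Category column, so one plausible way to combine them is to look up descriptions by category. The sketch below assumes, without confirmation from the source, that Category is a usable join key; the helper name `attach_descriptions` and the inline sample rows are made up for illustration and stand in for the real CSV files.

```python
import csv
import io
from collections import defaultdict

# Made-up stand-ins for job_postings.csv and job_description.csv.
postings_file = io.StringIO(
    "Category,Workplace,Location,Department,Type\n"
    "Engineering,remote,Berlin,Platform,full-time\n"
    "Design,on-site,Lisbon,Brand,part-time\n"
)
descriptions_file = io.StringIO(
    "Category,Description,Benefits,Requirements\n"
    "Engineering,Build services,Flexible hours,Python experience\n"
)


def attach_descriptions(postings: list[dict], descriptions: list[dict]) -> list[tuple]:
    """Pair each posting with every description row sharing its Category."""
    by_category = defaultdict(list)
    for row in descriptions:
        by_category[row["Category"]].append(row)
    # Postings whose category has no description get an empty list.
    return [(posting, by_category.get(posting["Category"], [])) for posting in postings]


postings = list(csv.DictReader(postings_file))
descriptions = list(csv.DictReader(descriptions_file))
joined = attach_descriptions(postings, descriptions)

for posting, matches in joined:
    print(posting["Category"], posting["Location"], len(matches))
```

If several descriptions share a category, each posting in that category is paired with all of them, so the result should be read as candidate matches, not a one-to-one link.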

Potential use cases:

  • Optimizing workforce planning and talent acquisition strategies.
  • Developing NLP models for resume parsing and job matching.
  • Building predictive models to forecast job market trends.
  • Exploring salary prediction models for various job roles.
  • Analyzing regional job market disparities and opportunities.