81 datasets found

Physical Exercise Recognition Dataset
kaggle.com
Updated Feb 16, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Muhannad Tuameh (2023). Physical Exercise Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/muhannadtuameh/exercise-recognition
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 16, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Muhannad Tuameh
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
Note:

Because this dataset has been used in a competition, we had to hide some of the data to prepare the test dataset for the competition. Thus, in the previous version of the dataset, only train.csv file is existed.

Content

This dataset represents 10 different physical poses that can be used to distinguish 5 exercises. The exercises are Push-up, Pull-up, Sit-up, Jumping Jack and Squat. For every exercise, 2 different classes have been used to represent the terminal positions of that exercise (e.g., “up” and “down” positions for push-ups).

Collection Process

About 500 videos of people doing the exercises have been used in order to collect this data. The videos are from Countix Dataset that contain the YouTube links of several human activity videos. Using a simple Python script, the videos of 5 different physical exercises are downloaded. From every video, at least 2 frames are manually extracted. The extracted frames represent the terminal positions of the exercise.

Processing Data

For every frame, MediaPipe framework is used for applying pose estimation, which detects the human skeleton of the person in the frame. The landmark model in MediaPipe Pose predicts the location of 33 pose landmarks (see figure below). Visit Mediapipe Pose Classification page for more details.

https://mediapipe.dev/images/mobile/pose_tracking_full_body_landmarks.png" alt="33 pose landmarks">
Fitness Analysis
kaggle.com
Updated Sep 8, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nithilaa (2020). Fitness Analysis [Dataset]. https://www.kaggle.com/nithilaa/fitness-analysis/activity
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 8, 2020
Dataset provided by
Kaggle
Authors
Nithilaa
Description
Context

This dataset was collected by me, along with my friends during my college days. The dataset mostly contains data from my friends and family members. This dataset has the survey data for the type of fitness practices that people follow.

Acknowledgements

This dataset wouldn't be here without the help of my friends. So, thanks to them!

What is in the dataset

Name of the person attending the survey

Gender of the person attending the survey

Age of the person attending the survey

How important is an exercise to you on the scale of 1 to 5

How do you describe your current level of fitness? - Perfect, Very good, Good, Average, Unfit

How often do you exercise? - Every day, 1 to 2 times a week, 2 to 3 times a week, 3 to 4 times a week, 5 to 6 times a week, never

What barriers, if any, prevent you from exercising more regularly? (Select all that applies) - I don't have enough time, I can't stay motivated, ill become too tired, I have an injury, I don't really enjoy exercising, I exercise regularly with no barriers

What forms of exercise do you currently participate in? (Select all that applies) - Walking or jogging, gym, swimming, yoga, Zumba dance, lifting weights, team sport, I don't really exercise

Do you exercise _? - Alone, With a friend, With a group, Within a class environment, I don't really exercise

What time of the day do you prefer to exercise? - Early morning, afternoon, evening

How long do you spend exercising per day? - 30 min, 1 hour, 2 hours, 3 hours and above, I don't really exercise

Would you say, you eat a healthy balanced diet? - Yes, No, Not always

What prevents you from eating a healthy balanced diet, if any? (Select all that applies) - Lack of time, Cost, Ease of access to fast food, Temptation, and cravings, I have a balanced diet

How healthy do you consider yourself on a scale of 1 to 5?

Have you recommended your friends to follow a fitness routine? - Yes, No

Have you ever purchased fitness equipment? - Yes, No

What motivates you to exercise? (Select all that applies) - I want to be fit, I want to increase muscle mass and strength, I want to lose weight, I want to be flexible, I want to relieve stress, I want to achieve a sporting goal, I'm not really interested in exercising.
California Adults Who Met Physical Activity Guidelines for Americans, 2013
healthdata.gov
data.ca.gov
+4more
application/rdfxml +5
Updated Apr 8, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
chhs.data.ca.gov (2025). California Adults Who Met Physical Activity Guidelines for Americans, 2013 [Dataset]. https://healthdata.gov/State/California-Adults-Who-Met-Physical-Activity-Guidel/fgbe-di4j
Explore at:
application/rssxml, json, tsv, xml, csv, application/rdfxmlAvailable download formats
Dataset updated
Apr 8, 2025
Dataset provided by
chhs.data.ca.gov
Area covered
California
Description
This dataset is from the 2013 California Dietary Practices Survey of Adults. This survey has been discontinued. Adults were asked a series of eight questions about their physical activity practices in the last month. These questions were borrowed from the Behavior Risk Factor Surveillance System. Data displayed in this table represent California adults who met the aerobic recommendation for physical activity, as defined by the 2008 U.S. Department of Health and Human Services Physical Activity Guidelines for Americans and Objectives 2.1 and 2.2 of Healthy People 2020.

The California Dietary Practices Surveys (CDPS) (now discontinued) was the most extensive dietary and physical activity assessment of adults 18 years and older in the state of California. CDPS was designed in 1989 and was administered biennially in odd years up through 2013. The CDPS was designed to monitor dietary trends, especially fruit and vegetable consumption, among California adults for evaluating their progress toward meeting the 2010 Dietary Guidelines for Americans and the Healthy People 2020 Objectives. For the data in this table, adults were asked a series of eight questions about their physical activity practices in the last month. Questions included: 1) During the past month, other than your regular job, did you participate in any physical activities or exercise such as running, calisthenics, golf, gardening or walking for exercise? 2) What type of physical activity or exercise did you spend the most time doing during the past month? 3) How many times per week or per month did you take part n this activity during the past month? 4) And when you took part in this activity, for how many minutes or hours did you usually keep at it? 5) During the past month, how many times per week or per month did you do physical activities or exercises to strengthen your muscles? Questions 2, 3, and 4 were repeated to collect a second activity. Data were collected using a list of participating CalFresh households and random digit dial, approximately 1,400-1,500 adults (ages 18 and over) were interviewed via phone survey between the months of June and October. Demographic data included gender, age, ethnicity, education level, income, physical activity level, overweight status, and food stamp eligibility status. Data were oversampled for low-income adults to provide greater sensitivity for analyzing trends among our target population.
u
Comprehensive Fitness Industry Statistics 2025
upmetrics.co
webpage
Updated Oct 25, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Upmetrics (2023). Comprehensive Fitness Industry Statistics 2025 [Dataset]. https://upmetrics.co/blog/fitness-industry-statistics
Explore at:
webpageAvailable download formats
Dataset updated
Oct 25, 2023
Dataset authored and provided by
Upmetrics
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2024
Description
A meticulously compiled dataset providing deep insights into the global fitness industry in 2025. This dataset covers high-demand topics such as the exponential growth of fitness clubs, emerging trends in boutique fitness studios, skyrocketing online fitness training statistics, the flourishing fitness equipment market, and changing consumer behavior and expenditure patterns in the fitness sector.
Workout/Exercises Video
kaggle.com
gts.ai
Updated Mar 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hasyim Abdillah (2023). Workout/Exercises Video [Dataset]. https://www.kaggle.com/hasyimabdillah/workoutfitness-video/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 10, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Hasyim Abdillah
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset was created by myself. This dataset contains videos of people doing workouts. The name of the existing workout corresponds to the name of the folder listed.

Video format: .mp4 Some of the videos are muted

What is the videos resolution? The resolution of this video varies greatly, but I'm trying to find the best possible resolution so that you can lower the resolution according to what you will use later.

How about the duration of the videos? It also varies, but there is at least 1 rep on each video

What are the data sources? Mostly sourced from YouTube, but I also create some of it by myself with my friends

Need the extracted frame of each video? Try check my other dataset for the images of workout/exercise here
Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...
zenodo.org
explore.openaire.eu
zip
Updated Oct 20, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sofia Yfantidou; Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Stefanos Efstathiou; Athena Vakali; Athena Vakali; Joao Palotti; Joao Palotti; Dimitrios Panteleimon Giakatos; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas; Šarūnas Girdzijauskas; Christina Karagianni; Andrei Kazlouski; Elena Ferrari (2022). LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive snapshots of our lives in the wild [Dataset]. http://doi.org/10.5281/zenodo.6832242
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6832242
Dataset updated
Oct 20, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sofia Yfantidou; Sofia Yfantidou; Christina Karagianni; Stefanos Efstathiou; Stefanos Efstathiou; Athena Vakali; Athena Vakali; Joao Palotti; Joao Palotti; Dimitrios Panteleimon Giakatos; Dimitrios Panteleimon Giakatos; Thomas Marchioro; Thomas Marchioro; Andrei Kazlouski; Elena Ferrari; Šarūnas Girdzijauskas; Šarūnas Girdzijauskas; Christina Karagianni; Andrei Kazlouski; Elena Ferrari
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
LifeSnaps Dataset Documentation

Ubiquitous self-tracking technologies have penetrated various aspects of our lives, from physical and mental health monitoring to fitness and entertainment. Yet, limited data exist on the association between in the wild large-scale physical activity patterns, sleep, stress, and overall health, and behavioral patterns and psychological measurements due to challenges in collecting and releasing such datasets, such as waning user engagement, privacy considerations, and diversity in data modalities. In this paper, we present the LifeSnaps dataset, a multi-modal, longitudinal, and geographically-distributed dataset, containing a plethora of anthropological data, collected unobtrusively for the total course of more than 4 months by n=71 participants, under the European H2020 RAIS project. LifeSnaps contains more than 35 different data types from second to daily granularity, totaling more than 71M rows of data. The participants contributed their data through numerous validated surveys, real-time ecological momentary assessments, and a Fitbit Sense smartwatch, and consented to make these data available openly to empower future research. We envision that releasing this large-scale dataset of multi-modal real-world data, will open novel research opportunities and potential applications in the fields of medical digital innovations, data privacy and valorization, mental and physical well-being, psychology and behavioral sciences, machine learning, and human-computer interaction.

The following instructions will get you started with the LifeSnaps dataset and are complementary to the original publication.

Data Import: Reading CSV

For ease of use, we provide CSV files containing Fitbit, SEMA, and survey data at daily and/or hourly granularity. You can read the files via any programming language. For example, in Python, you can read the files into a Pandas DataFrame with the pandas.read_csv() command.

Data Import: Setting up a MongoDB (Recommended)

To take full advantage of the LifeSnaps dataset, we recommend that you use the raw, complete data via importing the LifeSnaps MongoDB database.

To do so, open the terminal/command prompt and run the following command for each collection in the DB. Ensure you have MongoDB Database Tools installed from here.

For the Fitbit data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c fitbit

For the SEMA data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c sema

For surveys data, run the following:

mongorestore --host localhost:27017 -d rais_anonymized -c surveys

If you have access control enabled, then you will need to add the --username and --password parameters to the above commands.

Data Availability

The MongoDB database contains three collections, fitbit, sema, and surveys, containing the Fitbit, SEMA3, and survey data, respectively. Similarly, the CSV files contain related information to these collections. Each document in any collection follows the format shown below:

{ _id:
d
Health Survey for England
digital.nhs.uk
docx, pdf
Updated Dec 17, 2009
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2009). Health Survey for England [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/health-survey-for-england
Explore at:
pdf(393.3 kB), docx(137.7 kB), docx(134.9 kB), pdf(27.0 kB), pdf(7.4 MB), pdf(2.8 MB)Available download formats
Dataset updated
Dec 17, 2009
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditionshttps://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Jan 1, 2008 - Dec 31, 2008
Area covered
England
Description
Note 08/07/13: Errata for regarding two variables incorrectly labelled with the same description in the Data Archive for the Health Survey for England - 2008 dataset deposited in the UK Data Archive Author: Health and Social Care Information Centre, Lifestyle Statistics Responsible Statistician: Paul Eastwood, Lifestyles Section Head Version: 1 Original date of publication: 17th December 2009 Date of errata: 11th June 2013 · Two physical activity variables (NSWA201 and WEPWA201) in the Health Survey for England - 2008 dataset deposited in the Data Archive had the same description of 'on weekdays in the last week have you done any cycling (not to school)?'. This is correct for NSWA201, but incorrect for WEPWA201 · The correct descriptions are: · NSWA201 - 'on weekdays in the last week have you done any cycling (not to school)?' · WEPWA201 - 'on weekends in the last week have you done any cycling (not to school)?' · This has been corrected and the amended dataset has been deposited in the UK Data Archive. NatCen Social Research and the Health and Social Care Information Centre apologise for any inconvenience this may have caused. Note 18/12/09: Please note that a slightly amended version of the Health Survey for England 2008 report, Volume 1, has been made available on this page on 18 December 2009. This was in order to correct the legend and title of figure 13G on page 321 of this volume. The NHS IC apologises for any inconvenience caused. The Health Survey for England is a series of annual surveys designed to measure health and health-related behaviours in adults and children living in private households in England. The survey was commissioned originally by the Department of Health and, from April 2005 by The NHS Information Centre for health and social care. The Health Survey for England has been designed and carried out since 1994 by the Joint Health Surveys Unit of the National Centre for Social Research (NatCen) and the Department of Epidemiology and Public Health at the University College London Medical School (UCL). The 2008 Health Survey for England focused on physical activity and fitness. Adults and children were asked to recall their physical activity over recent weeks, and objective measures of physical activity and fitness were also obtained. A secondary objective was to examine results on childhood obesity and other factors affecting health, including fruit and vegetable consumption, drinking and smoking.
d
Strategic Measure_Number and percentage of people who successfully completed...
catalog.data.gov
data.austintexas.gov
+2more
Updated Jun 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.austintexas.gov (2025). Strategic Measure_Number and percentage of people who successfully completed workforce development training, EOA.F.4 & EOA.G.3 [Dataset]. https://catalog.data.gov/dataset/strategic-measure-number-and-percentage-of-people-who-successfully-completed-workforce-dev
Explore at:
Dataset updated
Jun 25, 2025
Dataset provided by
data.austintexas.gov
Description
This dataset supports measure EOA.F.4 and EOA.G.3 of SD23. It demonstrates the total number of enrollees to those who successfully completed training. Depending on the length of training which can last anywhere from 12 weeks to 3 years there may or may not be a completion rate available at the time of reporting. These trainees are all participating in Community Based Organization workforce training programs in partnership with the Master Community Workforce Plan adopted by Imagine Austin. After participants successfully complete training, the Ray Marshall Center monitors completer-participant earnings. Earnings above Federal Poverty Limit are considered "above poverty". Data sourced by University of Texas Ray Marshall Center, in partnership with Workforce Solution, Capital Area. View more details and insights related to this data set on the story page: https://data.austintexas.gov/stories/s/xfnx-fpv8 https://data.austintexas.gov/stories/s/Number-of-Persons-Moved-Out-of-Poverty-Into-Middle/xg7g-9uru/
f
ORBIT: A real-world few-shot dataset for teachable object recognition...
city.figshare.com
bin
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann (2023). ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision [Dataset]. http://doi.org/10.25383/city.14294597.v3
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.25383/city.14294597.v3
Dataset updated
May 31, 2023
Dataset provided by
City, University of London
Authors
Daniela Massiceti; Lida Theodorou; Luisa Zintgraf; Matthew Tobias Harris; Simone Stumpf; Cecily Morrison; Edward Cutrell; Katja Hofmann
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Object recognition predominately still relies on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation that these applications will face when deployed in the real-world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors. The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-DatasetThis version comprises several zip files:- train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS- other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30FPS (please note that the train, validation, test, and other files make up the unfiltered dataset)- *_224: as for the benchmark, but static individual frames are scaled down to 224 pixels.- *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format.
d
Statistics on Obesity, Physical Activity and Diet (replaced by Statistics on...
digital.nhs.uk
Updated May 5, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2020). Statistics on Obesity, Physical Activity and Diet (replaced by Statistics on Public Health) [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/statistics-on-obesity-physical-activity-and-diet
Explore at:
Dataset updated
May 5, 2020
License
https://digital.nhs.uk/about-nhs-digital/terms-and-conditionshttps://digital.nhs.uk/about-nhs-digital/terms-and-conditions
Time period covered
Apr 1, 2018 - Dec 31, 2019
Description
This report presents information on obesity, physical activity and diet drawn together from a variety of sources for England. More information can be found in the source publications which contain a wider range of data and analysis. Each section provides an overview of key findings, as well as providing links to relevant documents and sources. Some of the data have been published previously by NHS Digital. A data visualisation tool (link provided within the key facts) allows users to select obesity related hospital admissions data for any Local Authority (as contained in the data tables), along with time series data from 2013/14. Regional and national comparisons are also provided. The report includes information on: Obesity related hospital admissions, including obesity related bariatric surgery. Obesity prevalence. Physical activity levels. Walking and cycling rates. Prescriptions items for the treatment of obesity. Perception of weight and weight management. Food and drink purchases and expenditure. Fruit and vegetable consumption. Key facts cover the latest year of data available: Hospital admissions: 2018/19 Adult obesity: 2018 Childhood obesity: 2018/19 Adult physical activity: 12 months to November 2019 Children and young people's physical activity: 2018/19 academic year
P
Data from: ImageNet Dataset
paperswithcode.com
Updated Apr 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li (2021). ImageNet Dataset [Dataset]. https://paperswithcode.com/dataset/imagenet
Explore at:
Dataset updated
Apr 15, 2024
Authors
Jia Deng; Wei Dong; Richard Socher; Li-Jia Li; Kai Li; Fei-Fei Li
Description
The ImageNet dataset contains 14,197,122 annotated images according to the WordNet hierarchy. Since 2010 the dataset is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection. The publicly released dataset contains a set of manually annotated training images. A set of test images is also released, with the manual annotations withheld. ILSVRC annotations fall into one of two categories: (1) image-level annotation of a binary label for the presence or absence of an object class in the image, e.g., “there are cars in this image” but “there are no tigers,” and (2) object-level annotation of a tight bounding box and class label around an object instance in the image, e.g., “there is a screwdriver centered at position (20,25) with width of 50 pixels and height of 30 pixels”. The ImageNet project does not own the copyright of the images, therefore only thumbnails and URLs of images are provided.

Total number of non-empty WordNet synsets: 21841 Total number of images: 14197122 Number of images with bounding box annotations: 1,034,908 Number of synsets with SIFT features: 1000 Number of images with SIFT features: 1.2 million
F
German Open Ended Question Answer Text Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). German Open Ended Question Answer Text Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/german-open-ended-question-answer-text-dataset
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
What’s Included
The German Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the German language, advancing the field of artificial intelligence.
Dataset Content:
This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in German. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native German people, and references were taken from diverse sources like books, news articles, websites, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity:
To ensure diversity, this Q&A dataset includes questions with varying complexity levels, ranging from easy to medium and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains the question with constraints and persona restrictions, which makes it even more useful for LLM training.
Answer Formats:
To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short phrases, single sentences, and paragraph types of answers. The answer contains text strings, numerical values, date and time formats as well. Such diversity strengthens the Language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details:
This fully labeled German Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
Quality and Accuracy:
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
Both the question and answers in German are grammatically accurate without any word or grammatical errors. No copyrighted, toxic, or harmful content is used while building this dataset.
Continuous Updates and Customization:
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy German Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative ai models, improve response generation, and explore new approaches to NLP question-answering tasks.
Z
The ORBIT (Object Recognition for Blind Image Training)-India Dataset
data.niaid.nih.gov
zenodo.org
Updated Jul 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Massiceti, Daniela (2024). The ORBIT (Object Recognition for Blind Image Training)-India Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11394528
Explore at:
Dataset updated
Jul 2, 2024
Dataset provided by
Grayson, Martin
Jones, Matt
India, Gesu
Robinson, Simon
Massiceti, Daniela
Pearson, Jennifer
Morrison, Cecily
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
India
Description
The ORBIT (Object Recognition for Blind Image Training) -India Dataset is a collection of 105,243 images of 76 commonly used objects, collected by 12 individuals in India who are blind or have low vision. This dataset is an "Indian subset" of the original ORBIT dataset [1, 2], which was collected in the UK and Canada. In contrast to the ORBIT dataset, which was created in a Global North, Western, and English-speaking context, the ORBIT-India dataset features images taken in a low-resource, non-English-speaking, Global South context, a home to 90% of the world’s population of people with blindness. Since it is easier for blind or low-vision individuals to gather high-quality data by recording videos, this dataset, like the ORBIT dataset, contains images (each sized 224x224) derived from 587 videos. These videos were taken by our data collectors from various parts of India using the Find My Things [3] Android app. Each data collector was asked to record eight videos of at least 10 objects of their choice.

Collected between July and November 2023, this dataset represents a set of objects commonly used by people who are blind or have low vision in India, including earphones, talking watches, toothbrushes, and typical Indian household items like a belan (rolling pin), and a steel glass. These videos were taken in various settings of the data collectors' homes and workspaces using the Find My Things Android app.

The image dataset is stored in the ‘Dataset’ folder, organized by folders assigned to each data collector (P1, P2, ...P12) who collected them. Each collector's folder includes sub-folders named with the object labels as provided by our data collectors. Within each object folder, there are two subfolders: ‘clean’ for images taken on clean surfaces and ‘clutter’ for images taken in cluttered environments where the objects are typically found. The annotations are saved inside a ‘Annotations’ folder containing a JSON file per video (e.g., P1--coffee mug--clean--231220_084852_coffee mug_224.json) that contains keys corresponding to all frames/images in that video (e.g., "P1--coffee mug--clean--231220_084852_coffee mug_224--000001.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, "P1--coffee mug--clean--231220_084852_coffee mug_224--000002.jpeg": {"object_not_present_issue": false, "pii_present_issue": false}, ...). The ‘object_not_present_issue’ key is True if the object is not present in the image, and the ‘pii_present_issue’ key is True, if there is a personally identifiable information (PII) present in the image. Note, all PII present in the images has been blurred to protect the identity and privacy of our data collectors. This dataset version was created by cropping images originally sized at 1080 × 1920; therefore, an unscaled version of the dataset will follow soon.

This project was funded by the Engineering and Physical Sciences Research Council (EPSRC) Industrial ICASE Award with Microsoft Research UK Ltd. as the Industrial Project Partner. We would like to acknowledge and express our gratitude to our data collectors for their efforts and time invested in carefully collecting videos to build this dataset for their community. The dataset is designed for developing few-shot learning algorithms, aiming to support researchers and developers in advancing object-recognition systems. We are excited to share this dataset and would love to hear from you if and how you use this dataset. Please feel free to reach out if you have any questions, comments or suggestions.

REFERENCES:

Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, and Katja Hofmann. 2021. ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision. DOI: https://doi.org/10.25383/city.14294597

microsoft/ORBIT-Dataset. https://github.com/microsoft/ORBIT-Dataset

Linda Yilin Wen, Cecily Morrison, Martin Grayson, Rita Faia Marques, Daniela Massiceti, Camilla Longden, and Edward Cutrell. 2024. Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA '24). Association for Computing Machinery, New York, NY, USA, Article 403, 1–6. https://doi.org/10.1145/3613905.3648641
h
body-measurements-dataset
huggingface.co
Updated Jul 11, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Training Data (2023). body-measurements-dataset [Dataset]. https://huggingface.co/datasets/TrainingDataPro/body-measurements-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 11, 2023
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Body Measurements Dataset

The dataset consists of a compilation of people's photos along with their corresponding body measurements. It is designed to provide information and insights into the physical appearances and body characteristics of individuals. The dataset includes a diverse range of subjects representing different age groups, genders, and ethnicities. The photos are captured in a standardized manner, depicting individuals in a front and side positions. The images aim to… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/body-measurements-dataset.
Labour Force Survey Five-Quarter Longitudinal Dataset, October 2019 -...
beta.ukdataservice.ac.uk
Updated 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Office For National Statistics (2023). Labour Force Survey Five-Quarter Longitudinal Dataset, October 2019 - December 2020 [Dataset]. http://doi.org/10.5255/ukda-sn-8780-3
Explore at:
Unique identifier
https://doi.org/10.5255/ukda-sn-8780-3
Dataset updated
2023
Dataset provided by
UK Data Servicehttps://ukdataservice.ac.uk/
DataCitehttps://www.datacite.org/
Authors
Office For National Statistics
Description
Background
The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

Longitudinal data
The LFS retains each sample household for five consecutive quarters, with a fifth of the sample replaced each quarter. The main survey was designed to produce cross-sectional data, but the data on each individual have now been linked together to provide longitudinal information. The longitudinal data comprise two types of linked datasets, created using the weighting method to adjust for non-response bias. The two-quarter datasets link data from two consecutive waves, while the five-quarter datasets link across a whole year (for example January 2010 to March 2011 inclusive) and contain data from all five waves. A full series of longitudinal data has been produced, going back to winter 1992. Linking together records to create a longitudinal dimension can, for example, provide information on gross flows over time between different labour force categories (employed, unemployed and economically inactive). This will provide detail about people who have moved between the categories. Also, longitudinal information is useful in monitoring the effects of government policies and can be used to follow the subsequent activities and circumstances of people affected by specific policy initiatives, and to compare them with other groups in the population. There are however methodological problems which could distort the data resulting from this longitudinal linking. The ONS continues to research these issues and advises that the presentation of results should be carefully considered, and warnings should be included with outputs where necessary.

LFS Documentation
The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each user guide volume alongside the appropriate questionnaire for the year concerned. However, volumes are updated periodically by ONS, so users are advised to check the latest documents on the ONS Labour Force Survey - User Guidance pages before commencing analysis. This is especially important for users of older QLFS studies, where information and guidance in the user guide documents may have changed over time.
Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022.
2022 Weighting
The population totals used for the latest LFS estimates use projected growth rates from Real Time Information (RTI) data for UK, EU and non-EU populations based on 2021 patterns. The total population used for the LFS therefore does not take into account any changes in migration, birth rates, death rates, and so on since June 2021, and hence levels estimates may be under- or over-estimating the true values and should be used with caution. Estimates of rates will, however, be robust.

Latest edition information
For the third edition (February 2023), the 2022 longitudinal weight has been added to the study.
Z
SH17 Dataset for PPE Detection
data.niaid.nih.gov
Updated Jul 4, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ahmad, Hafiz Mughees (2024). SH17 Dataset for PPE Detection [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_12659324
Explore at:
Dataset updated
Jul 4, 2024
Dataset authored and provided by
Ahmad, Hafiz Mughees
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
We propose Safe Human dataset consisting of 17 different objects referred to as SH17 dataset. We scrapped images from the Pexels website, which offers clear usage rights for all its images, showcasing a range of human activities across diverse industrial operations.

To extract relevant images, we used multiple queries such as manufacturing worker, industrial worker, human worker, labor, etc. The tags associated with Pexels images proved reasonably accurate. After removing duplicate samples, we obtained a dataset of 8,099 images. The dataset exhibits significant diversity, representing manufacturing environments globally, thus minimizing potential regional or racial biases. Samples of the dataset are shown below.

Key features

Collected from diverse industrial environments globally

High quality images (max resolution 8192x5462, min 1920x1002)

Average of 9.38 instances per image

Includes small objects like ears and earmuffs (39,764 annotations < 1% image area, 59,025 annotations < 5% area)

Classes

Person

Head

Face

Glasses

Face-mask-medical

Face-guard

Ear

Earmuffs

Hands

Gloves

Foot

Shoes

Safety-vest

Tools

Helmet

Medical-suit

Safety-suit

The data consists of three folders,

images contains all images

labels contains labels in YOLO format for all images

voc_labels contains labels in VOC format for all images

train_files.txt contains list of all images we used for training

val_files.txt contains list of all images we used for validation

Disclaimer and Responsible Use:

This dataset, scrapped through the Pexels website, is intended for educational, research, and analysis purposes only. You may be able to use the data for training of the Machine learning models only. Users are urged to use this data responsibly, ethically, and within the bounds of legal stipulations.

Users should adhere to Copyright Notice of Pexels when utilizing this dataset.

Legal Simplicity: All photos and videos on Pexels can be downloaded and used for free.

Allowed 👌

All photos and videos on Pexels are free to use.

Attribution is not required. Giving credit to the photographer or Pexels is not necessary but always appreciated.

You can modify the photos and videos from Pexels. Be creative and edit them as you like.

Not allowed 👎

Identifiable people may not appear in a bad light or in a way that is offensive.

Don't sell unaltered copies of a photo or video, e.g. as a poster, print or on a physical product without modifying it first.

Don't imply endorsement of your product by people or brands on the imagery.

Don't redistribute or sell the photos and videos on other stock photo or wallpaper platforms.

Don't use the photos or videos as part of your trade-mark, design-mark, trade-name, business name or service mark.

No Warranty Disclaimer:

The dataset is provided "as is," without warranty, and the creator disclaims any legal liability for its use by others.

Ethical Use:

Users are encouraged to consider the ethical implications of their analyses and the potential impact on broader community.

GitHub Page:

https://github.com/ahmadmughees/SH17dataset
a
VizWiz v1.0 dataset (Answering Visual Questions from Blind People)
academictorrents.com
bittorrent
Updated Aug 23, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
None (2018). VizWiz v1.0 dataset (Answering Visual Questions from Blind People) [Dataset]. https://academictorrents.com/details/b633e14aa084fab57f20ad0b4612e0932ae1f2dc
Explore at:
bittorrent(15394669439)Available download formats
Dataset updated
Aug 23, 2018
Authors
None
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
We propose an artificial intelligence challenge to design algorithms that assist people who are blind to overcome their daily visual challenges. For this purpose, we introduce the VizWiz dataset, which originates from a natural visual question answering setting where blind people each took an image and recorded a spoken question about it, together with 10 crowdsourced answers per visual question. Our proposed challenge addresses the following two tasks for this dataset: (1) predict the answer to a visual question and (2) predict whether a visual question cannot be answered. Ultimately, we hope this work will educate more people about the technological needs of blind people while providing an exciting new opportunity for researchers to develop assistive technologies that eliminate accessibility barriers for blind people. VizWiz v1.0 dataset download: 20,000 training image/question pairs 200,000 training answer/answer confidence pairs 3,173 image/question pairs 31,730 validation answ
A
‘Strategic Measure_Number and percentage of people who successfully...
analyst-2.ai
Updated Apr 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘Strategic Measure_Number and percentage of people who successfully completed workforce development training, EOA.F.4 & EOA.G.3’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-strategic-measure-number-and-percentage-of-people-who-successfully-completed-workforce-development-training-eoa-f-4-eoa-g-3-6d80/e6efc6bc/?iid=003-523&v=presentation
Explore at:
Dataset updated
Apr 9, 2020
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Strategic Measure_Number and percentage of people who successfully completed workforce development training, EOA.F.4 & EOA.G.3’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/54be11d4-7b0f-4163-9602-5cc2e918f679 on 27 January 2022.

--- Dataset description provided by original source is as follows ---

This dataset supports measure EOA.F.4 and EOA.G.3 of SD23. It demonstrates the total number of enrollees to those who successfully completed training. Depending on the length of training which can last anywhere from 12 weeks to 3 years there may or may not be a completion rate available at the time of reporting. These trainees are all participating in Community Based Organization workforce training programs in partnership with the Master Community Workforce Plan adopted by Imagine Austin. After participants successfully complete training, the Ray Marshall Center monitors completer-participant earnings. Earnings above Federal Poverty Limit are considered "above poverty". Data sourced by University of Texas Ray Marshall Center, in partnership with Workforce Solution, Capital Area.

View more details and insights related to this data set on the story page: https://data.austintexas.gov/stories/s/xfnx-fpv8 https://data.austintexas.gov/stories/s/Number-of-Persons-Moved-Out-of-Poverty-Into-Middle/xg7g-9uru/

--- Original source retains full ownership of the source dataset ---
Diets, Recipes And Their Nutrients
kaggle.com
Updated Oct 18, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2022). Diets, Recipes And Their Nutrients [Dataset]. https://www.kaggle.com/datasets/thedevastator/healthy-diet-recipes-a-comprehensive-dataset
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 18, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Description
Recipes And Nutrients Per Diet

A dataset of diets and recipes

About this dataset

Do you want to nourish your body in the best and healthiest way possible? If so, then this dataset is for you! It consists of recipes from different diets and cuisines, all of which are aimed at providing healthy and nutritious meal options. The dataset includes information on the macronutrients of each recipe, as well as the extraction day and time. This makes it an incredibly valuable resource for those interested in following a healthy diet, as well as for researchers studying the relationship between diet and health. So what are you waiting for? Start exploring today!

How to use the dataset

This dataset can be used to find healthy and nutritious recipes from different diets and cuisines. The macronutrient information can be used to make sure that the recipes fit into a healthy diet plan. The extraction day and time can be used to find recipes that were extracted recently or to find recipes that have been extracted on a particular day

Research Ideas

This dataset can be used to create a healthy meal plan for those interested in following a nutritious diet.

This dataset can be used to study the relationship between diet and health.

This dataset can be used to create healthy recipes that are suitable for different diets and cuisines

Acknowledgements

We would like to thank the following people for their contributions to this dataset:

-The anonymous recipe creators who have shared their healthy and nutritious recipes with us -The researchers who have studied the relationship between diet and health, and have helped to inform our choices of recipes

License

See the dataset description for more information.

Columns

File: All_Diets.csv | Column name | Description | |:-------------------|:---------------------------------------------| | Diet_type | The type of diet the recipe is for. (String) | | Recipe_name | The name of the recipe. (String) | | Cuisine_type | The cuisine the recipe is from. (String) | | Protein(g) | The amount of protein in grams. (Float) | | Carbs(g) | The amount of carbs in grams. (Float) | | Fat(g) | The amount of fat in grams. (Float) | | Extraction_day | The day the recipe was extracted. (String) |

File: dash.csv | Column name | Description | |:-------------------|:---------------------------------------------| | Diet_type | The type of diet the recipe is for. (String) | | Recipe_name | The name of the recipe. (String) | | Cuisine_type | The cuisine the recipe is from. (String) | | Protein(g) | The amount of protein in grams. (Float) | | Carbs(g) | The amount of carbs in grams. (Float) | | Fat(g) | The amount of fat in grams. (Float) | | Extraction_day | The day the recipe was extracted. (String) |

File: keto.csv | Column name | Description | |:-------------------|:---------------------------------------------| | Diet_type | The type of diet the recipe is for. (String) | | Recipe_name | The name of the recipe. (String) | | Cuisine_type | The cuisine the recipe is from. (String) | | Protein(g) | The amount of protein in grams. (Float) | | Carbs(g) | The amount of carbs in grams. (Float) | | Fat(g) | The amount of fat in grams. (Float) | | Extraction_day | The day the recipe was extracted. (String) |

File: mediterranean.csv | Column name | Description | |:-------------------|:---------------------------------------------| | Diet_type | The type of diet the recipe is for. (String) | | Recipe_name | The name of the recipe. (String) | | Cuisine_type | The cuisine the recipe is from. (String) | | Protein(g) | The amount of protein in grams. (Float) | | Carbs(g) | The amount of carbs in grams. (Float) | | Fat(g) | The amount of fat in grams. (Float) | | Extraction_day | The day the recipe was extracted. (String) |

File: paleo.csv | Column name | Description | |:-------------------|:---------------------------------------------| | Diet_type | The type of diet the recipe is for. (String) | | Recipe_name | The name of the recipe. (String) | | Cuisine_type | The cuisine the recipe is from. (String) | | Protein(g) | The amount of protein in grams. (Float) | | Carbs(g) | The amount of carb...
Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata
datarade.ai
.csv
Updated Jul 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
WIRESTOCK (2023). Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata [Dataset]. https://datarade.ai/data-products/wirestock-s-ai-ml-image-training-data-4-5m-files-with-metadata-wirestock
Explore at:
.csvAvailable download formats
Dataset updated
Jul 18, 2023
Dataset provided by
Wirestock, Inc.
Authors
WIRESTOCK
Area covered
Pakistan, Estonia, Sudan, Jersey, New Caledonia, Swaziland, Peru, Chile, Belarus, Georgia
Description
Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata: This data product is a unique offering in the realm of AI/ML training data. What sets it apart is the sheer volume and diversity of the dataset, which includes 4.5 million files spanning across 20 different categories. These categories range from Animals/Wildlife and The Arts to Technology and Transportation, providing a rich and varied dataset for AI/ML applications.

The data is sourced from Wirestock's platform, where creators upload and sell their photos, videos, and AI art online. This means that the data is not only vast but also constantly updated, ensuring a fresh and relevant dataset for your AI/ML needs. The data is collected in a GDPR-compliant manner, ensuring the privacy and rights of the creators are respected.

The primary use-cases for this data product are numerous. It is ideal for training machine learning models for image recognition, improving computer vision algorithms, and enhancing AI applications in various industries such as retail, healthcare, and transportation. The diversity of the dataset also means it can be used for more niche applications, such as training AI to recognize specific objects or scenes.

This data product fits into Wirestock's broader data offering as a key resource for AI/ML training. Wirestock is a platform for creators to sell their work, and this dataset is a collection of that work. It represents the breadth and depth of content available on Wirestock, making it a valuable resource for any company working with AI/ML.

The core benefits of this dataset are its volume, diversity, and quality. With 4.5 million files, it provides a vast resource for AI training. The diversity of the dataset, spanning 20 categories, ensures a wide range of images for training purposes. The quality of the images is also high, as they are sourced from creators selling their work on Wirestock.

In terms of how the data is collected, creators upload their work to Wirestock, where it is then sold on various marketplaces. This means the data is sourced directly from creators, ensuring a diverse and unique dataset. The data includes both the images themselves and associated metadata, providing additional context for each image.

The different image categories included in this dataset are Animals/Wildlife, The Arts, Backgrounds/Textures, Beauty/Fashion, Buildings/Landmarks, Business/Finance, Celebrities, Education, Emotions, Food Drinks, Holidays, Industrial, Interiors, Nature Parks/Outdoor, People, Religion, Science, Signs/Symbols, Sports/Recreation, Technology, Transportation, Vintage, Healthcare/Medical, Objects, and Miscellaneous. This wide range of categories ensures a diverse dataset that can cater to a variety of AI/ML applications.

Facebook

Twitter

Click to copy link

Link copied

Cite

Muhannad Tuameh (2023). Physical Exercise Recognition Dataset [Dataset]. https://www.kaggle.com/datasets/muhannadtuameh/exercise-recognition

Physical Exercise Recognition Dataset

Dataset that represents the terminal positions of some physical exercises.

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Feb 16, 2023

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Muhannad Tuameh

License

Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically

Description

Note:

Because this dataset has been used in a competition, we had to hide some of the data to prepare the test dataset for the competition. Thus, in the previous version of the dataset, only train.csv file is existed.

Content

This dataset represents 10 different physical poses that can be used to distinguish 5 exercises. The exercises are Push-up, Pull-up, Sit-up, Jumping Jack and Squat. For every exercise, 2 different classes have been used to represent the terminal positions of that exercise (e.g., “up” and “down” positions for push-ups).

Collection Process

About 500 videos of people doing the exercises have been used in order to collect this data. The videos are from Countix Dataset that contain the YouTube links of several human activity videos. Using a simple Python script, the videos of 5 different physical exercises are downloaded. From every video, at least 2 frames are manually extracted. The extracted frames represent the terminal positions of the exercise.

Processing Data

For every frame, MediaPipe framework is used for applying pose estimation, which detects the human skeleton of the person in the frame. The landmark model in MediaPipe Pose predicts the location of 33 pose landmarks (see figure below). Visit Mediapipe Pose Classification page for more details.

https://mediapipe.dev/images/mobile/pose_tracking_full_body_landmarks.png" alt="33 pose landmarks">

Clear search

Close search

Google apps

Main menu

Physical Exercise Recognition Dataset

Note:

Content

Collection Process

Processing Data

Fitness Analysis

Context

Acknowledgements

What is in the dataset

California Adults Who Met Physical Activity Guidelines for Americans, 2013

Comprehensive Fitness Industry Statistics 2025

Workout/Exercises Video

Data from: LifeSnaps: a 4-month multi-modal dataset capturing unobtrusive...

Health Survey for England

Strategic Measure_Number and percentage of people who successfully completed...

ORBIT: A real-world few-shot dataset for teachable object recognition...

Statistics on Obesity, Physical Activity and Diet (replaced by Statistics on...

Data from: ImageNet Dataset

German Open Ended Question Answer Text Dataset

What’s Included

The ORBIT (Object Recognition for Blind Image Training)-India Dataset

body-measurements-dataset

Labour Force Survey Five-Quarter Longitudinal Dataset, October 2019 -...

SH17 Dataset for PPE Detection

VizWiz v1.0 dataset (Answering Visual Questions from Blind People)

‘Strategic Measure_Number and percentage of people who successfully...

Diets, Recipes And Their Nutrients

Recipes And Nutrients Per Diet

A dataset of diets and recipes

About this dataset

How to use the dataset

Research Ideas

Acknowledgements

License

Columns

Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata

Physical Exercise Recognition Dataset

Dataset that represents the terminal positions of some physical exercises.

Note:

Content

Collection Process

Processing Data