100+ datasets found

Exercise Detection dataset
kaggle.com
Updated Sep 22, 2024
Cite
MRIGAANK JASWAL (2024). Exercise Detection dataset [Dataset]. https://www.kaggle.com/datasets/mrigaankjaswal/exercise-detection-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 22, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
MRIGAANK JASWAL
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This project analyzes human body movements during common exercises by capturing and processing the angles of key body joints. We used video data to extract frame-by-frame joint angles during exercises such as push-ups, jumping jacks, pull-ups, squats, and Russian twists. MediaPipe was used for pose estimation to detect body landmarks, while YOLOv6 was employed for object detection to enhance accuracy.

Methodology

Video Collection: Videos were recorded for each exercise (push-ups, jumping jacks, pull-ups, squats, Russian twists), ensuring proper form and variety in movement.

Frame-by-Frame Analysis: Each video was processed frame by frame, and landmarks were detected using MediaPipe's Pose Estimation. We calculated the angles of key joints by using the positional data of landmarks across different frames.

Object Detection with YOLOv6: YOLOv6 was used to identify specific objects and enhance the robustness of the pose estimation by detecting outliers or incorrect poses during exercises, thereby improving the accuracy of the analysis.
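The angle computation mentioned under Frame-by-Frame Analysis can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes each landmark is available as a 2D (x, y) pair and computes the angle at the middle point from the dot product of the two adjoining segments. The function name `joint_angle` and the sample coordinates are hypothetical.

```python
import math

def joint_angle(a, b, c):
    """Return the angle at vertex b (in degrees) formed by points a-b-c."""
    # Vectors from the joint (b) to its two neighbouring landmarks.
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("landmarks coincide; angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

# Hypothetical (x, y) landmarks for shoulder, elbow, and wrist in one frame.
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

Applying the same function to every frame of a video would yield one angle per joint per frame, which matches the frame-by-frame layout the description suggests.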

Applications

This dataset can be used for multiple applications:

- Form Correction: By comparing these angles with standard benchmarks, feedback can be provided to improve exercise form.
- Performance Tracking: Over time, users can monitor their improvement by analyzing the changes in their joint angles during exercises.
- Pose Classification: Machine learning models can be trained to classify correct vs. incorrect form, enabling the development of smart fitness assistants.
- Real-time Feedback Systems: Using pose estimation in conjunction with live video, real-time systems can be developed to guide users during workouts.

Exercises Analyzed

The following exercises were captured and analyzed for this dataset:

Push-ups: Key focus on shoulder, elbow, and hip angles.

Jumping Jacks: Full-body motion tracked via shoulder, elbow, hip, knee, and ankle angles.

Pull-ups: Primarily focused on shoulder and elbow joint movements.

Squats: Analyzed hip, knee, and ankle angles for depth and posture analysis.

Russian Twists: Core movement tracked via shoulder and hip angles to assess rotational motion.

Potential Analysis

- Time-Series Analysis: The data can be treated as a time series, allowing for the identification of trends in joint movement over the duration of an exercise.
- Pose Optimization: Optimization models can be used to suggest improvements in form based on angle analysis.
- Machine Learning Integration: The dataset can serve as input for machine learning algorithms to automate form correction and workout optimization.
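As one illustration of the time-series view, repetitions could be counted from a per-frame joint-angle sequence using two thresholds (hysteresis). This is only a sketch under assumptions: the function name `count_reps`, the threshold values, and the sample knee angles are hypothetical and are not taken from the dataset.

```python
def count_reps(angles, low=90.0, high=160.0):
    """Count down-up cycles in a sequence of joint angles (degrees)."""
    reps = 0
    is_down = False
    for angle in angles:
        if not is_down and angle <= low:
            # Joint flexed past the lower threshold: bottom of the movement.
            is_down = True
        elif is_down and angle >= high:
            # Joint extended again: one full repetition completed.
            is_down = False
            reps += 1
    return reps

# Hypothetical knee angles over consecutive frames of a squat video.
knee_angles = [170, 150, 100, 85, 110, 165, 172, 120, 88, 95, 161]
reps = count_reps(knee_angles)
```

Using two thresholds rather than one avoids counting small jitters around a single cut-off as separate repetitions; suitable values would need to be chosen per exercise and per joint.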
Stanford Data Set Dataset
universe.roboflow.com
zip
Updated Dec 22, 2024
+ more versions
Cite
CarModelDetection (2024). Stanford Data Set Dataset [Dataset]. https://universe.roboflow.com/carmodeldetection-3otfq/stanford-data-set
Explore at:
Available download formats: zip
Dataset updated
Dec 22, 2024
Dataset authored and provided by
CarModelDetection
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Cars Models Bounding Boxes
Description
Stanford Data Set

## Overview

Stanford Data Set is a dataset for object detection tasks. It contains Cars Models annotations for 16,272 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Portuguese Chain of Thought Prompt & Response Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Portuguese Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/portuguese-chain-of-thought-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Welcome to the Portuguese Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
Dataset Content
This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Portuguese language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Portuguese speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
Prompt Diversity
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
Response Formats
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
Data Format and Annotation Details
This fully labeled Portuguese Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Portuguese version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Portuguese Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
Cline Center Coup d’État Project Dataset
databank.illinois.edu
Updated May 11, 2025
+ more versions
Cite
Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto (2025). Cline Center Coup d’État Project Dataset [Dataset]. http://doi.org/10.13012/B2IDB-9651987_V7
Explore at:
Unique identifier
https://doi.org/10.13012/B2IDB-9651987_V7
Dataset updated
May 11, 2025
Authors
Buddy Peyton; Joseph Bajjalieh; Dan Shalmon; Michael Martin; Emilio Soto
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.
Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 to a conspiracy.
Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022.
Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup.
Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Items in this Dataset
1. Cline Center Coup d'État Codebook v.2.1.3 Codebook.pdf - This 15-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. Revised February 2024
2. Coup Data v2.1.3.csv - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1000 observations. Revised February 2024
3. Source Document v2.1.3.pdf - This 325-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. Revised February 2024
4. README.md - This file contains useful information for the user about the dataset. It is a text file written in markdown language. Revised February 2024
Citation Guidelines
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation:
Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2024. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access):
Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Emilio Soto. 2024. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.1.3. February 27. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V7
Open Ended Question Answer Text Dataset in English
futurebeeai.com
wav
Updated Aug 1, 2022
+ more versions
Cite
FutureBee AI (2022). Open Ended Question Answer Text Dataset in English [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/english-open-ended-question-answer-text-dataset
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
The English Open-Ended Question Answering Dataset is a meticulously curated collection of comprehensive Question-Answer pairs. It serves as a valuable resource for training Large Language Models (LLMs) and Question-answering models in the English language, advancing the field of artificial intelligence.
Dataset Content:
This QA dataset comprises a diverse set of open-ended questions paired with corresponding answers in English. There is no context paragraph given to choose an answer from, and each question is answered without any predefined context content. The questions cover a broad range of topics, including science, history, technology, geography, literature, current affairs, and more.
Each question is accompanied by an answer, providing valuable information and insights to enhance the language model training process. Both the questions and answers were manually curated by native English speakers, and references were taken from diverse sources such as books, news articles, websites, and other reliable references.
This question-answer prompt completion dataset contains different types of prompts, including instruction type, continuation type, and in-context learning (zero-shot, few-shot) type. The dataset also contains questions and answers with different types of rich text, including tables, code, JSON, etc., with proper markdown.
Question Diversity:
To ensure diversity, this Q&A dataset includes questions with varying complexity levels: easy, medium, and hard. Different types of questions, such as multiple-choice, direct, and true/false, are included. Additionally, questions are further classified into fact-based and opinion-based categories, creating a comprehensive variety. The QA dataset also contains questions with constraints and persona restrictions, which makes it even more useful for LLM training.
Answer Formats:
To accommodate varied learning experiences, the dataset incorporates different types of answer formats. These formats include single-word, short-phrase, single-sentence, and paragraph answers. Answers also contain text strings, numerical values, and date and time formats. Such diversity strengthens the language model's ability to generate coherent and contextually appropriate answers.
Data Format and Annotation Details:
This fully labeled English Open Ended Question Answer Dataset is available in JSON and CSV formats. It includes annotation details such as id, language, domain, question_length, prompt_type, question_category, question_type, complexity, answer_type, rich_text.
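To make the annotation fields concrete, here is a small sketch of filtering records by those fields once the JSON export has been loaded. The field names are the ones listed above; the record values, their types, and the helper name `select` are assumptions made for illustration, since the listing does not show a sample record.

```python
import json

# Hypothetical records: the field names come from the listing above, but the
# values (and their types) are made up for illustration.
RAW = """
[
  {"id": "q-001", "language": "en", "domain": "science", "question_length": 9,
   "prompt_type": "instruction", "question_category": "fact-based",
   "question_type": "direct", "complexity": "easy",
   "answer_type": "single sentence", "rich_text": false},
  {"id": "q-002", "language": "en", "domain": "history", "question_length": 14,
   "prompt_type": "few-shot", "question_category": "opinion-based",
   "question_type": "multiple-choice", "complexity": "hard",
   "answer_type": "paragraph", "rich_text": true}
]
"""

def select(records, **criteria):
    """Return the records whose annotation fields match every criterion."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]

records = json.loads(RAW)
hard_ids = [r["id"] for r in select(records, complexity="hard")]
```

The same helper could be used to build subsets by `prompt_type` or `question_category`, assuming the real export stores those fields as flat key-value pairs.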
Quality and Accuracy:
The dataset upholds the highest standards of quality and accuracy. Each question undergoes careful validation, and the corresponding answers are thoroughly verified. To prioritize inclusivity, the dataset incorporates questions and answers representing diverse perspectives and writing styles, ensuring it remains unbiased and avoids perpetuating discrimination.
Both the questions and answers are grammatically accurate, with no spelling or grammatical errors. No copyrighted, toxic, or harmful content is used while building this dataset.
Continuous Updates and Customization:
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Continuous efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to collect custom question-answer data tailored to specific needs, providing flexibility and customization options.
License:
The dataset, created by FutureBeeAI, is now ready for commercial use. Researchers, data scientists, and developers can utilize this fully labeled and ready-to-deploy English Open Ended Question Answer Dataset to enhance the language understanding capabilities of their generative AI models, improve response generation, and explore new approaches to NLP question-answering tasks.
10 Years Bug-Fix Dataset (PROMISE'19)
figshare.com
zip
Updated Sep 27, 2021
Cite
Renan Vieira (2021). 10 Years Bug-Fix Dataset (PROMISE'19) [Dataset]. http://doi.org/10.6084/m9.figshare.8852084.v5
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.8852084.v5
Dataset updated
Sep 27, 2021
Dataset provided by
figshare
Authors
Renan Vieira
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication Package of the paper "From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects"
ABSTRACT: Bugs appear in almost any software development. Solving all or at least a large part of them requires a great deal of time, effort, and budget. Software projects typically use issue tracking systems as a way to report and monitor bug-fixing tasks. In recent years, several researchers have been conducting bug tracking analysis to better understand the problem and thus provide means to reduce costs and improve the efficiency of the bug-fixing task. In this paper, we introduce a new dataset composed of more than 70,000 bug-fix reports from 10 years of bug-fixing activity of 55 projects from the Apache Software Foundation, distributed in 9 categories. We have mined this information from the Jira issue tracking system concerning two different perspectives of reports with closed/resolved status: static (the latest version of reports) and dynamic (the changes that have occurred in reports over time). We also extract information from the commits (if they exist) that fix such bugs from their respective version-control system (Git). We also provide an analysis of the changes that occur in the reports as a way of illustrating and characterizing the proposed dataset. Since the data extraction process is an error-prone, nontrivial task, we believe initiatives like this could be useful to support researchers in further, more detailed investigations.
You can find the full paper at: https://doi.org/10.1145/3345629.3345639
If you use this dataset for your research, please reference the following paper:
@inproceedings{Vieira:2019:RBC:3345629.3345639,
  author = {Vieira, Renan and da Silva, Ant\^{o}nio and Rocha, Lincoln and Gomes, Jo\~{a}o Paulo},
  title = {From Reports to Bug-Fix Commits: A 10 Years Dataset of Bug-Fixing Activity from 55 Apache's Open Source Projects},
  booktitle = {Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering},
  series = {PROMISE'19},
  year = {2019},
  isbn = {978-1-4503-7233-6},
  location = {Recife, Brazil},
  pages = {80--89},
  numpages = {10},
  url = {http://doi.acm.org/10.1145/3345629.3345639},
  doi = {10.1145/3345629.3345639},
  acmid = {3345639},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {Bug-Fix Dataset, Mining Software Repositories, Software Traceability},
}
P.S.: We added a new dataset version (v1.0.1). In this version, we fix the git commit features that track the src and test files. More info can be found in the fix-script.py file.
Food Insecurity Hotspots Data Set
catalog.data.gov
Updated Aug 22, 2025
+ more versions
Cite
SEDAC (2025). Food Insecurity Hotspots Data Set [Dataset]. https://catalog.data.gov/dataset/food-insecurity-hotspots-data-set
Dataset updated
Aug 22, 2025
Dataset provided by
SEDAC
Description
The Food Insecurity Hotspots Data Set consists of grids at 250 meter (~7.2 arc-seconds) resolution that identify the level of intensity and frequency of food insecurity over the 10 years between 2009 and 2019, as well as hotspot areas that have experienced consecutive food insecurity events. The gridded data are based on subnational food security analysis provided by FEWS NET (Famine Early Warning Systems Network) in five (5) regions, including Central America and the Caribbean, Central Asia, East Africa, Southern Africa, and West Africa. Based on the Integrated Food Security Phase Classification (IPC), food insecurity is defined as Minimal, Stressed, Crisis, Emergency, and Famine.
AI Training Dataset In Healthcare Market Report
archivemarketresearch.com
doc, pdf, ppt
Updated Jun 20, 2025
Cite
Archive Market Research (2025). AI Training Dataset In Healthcare Market Report [Dataset]. https://www.archivemarketresearch.com/reports/ai-training-dataset-in-healthcare-market-5352
Explore at:
Available download formats: pdf, ppt, doc
Dataset updated
Jun 20, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
global
Variables measured
Market Size
Description
The AI Training Dataset In Healthcare Market size was valued at USD 341.8 million in 2023 and is projected to reach USD 1464.13 million by 2032, exhibiting a CAGR of 23.1% during the forecast period. The growth is attributed to the rising adoption of AI in healthcare, increasing demand for accurate and reliable training datasets, government initiatives to promote AI in healthcare, and technological advancements in data collection and annotation. These factors are contributing to the expansion of the AI Training Dataset In Healthcare Market. Healthcare AI training datasets are vital for building effective algorithms and enhancing patient care and diagnosis in the industry. These datasets include large volumes of electronic health records, images such as X-ray and MRI scans, and genomics data, all thoroughly labeled. They help AI systems identify trends, make forecasts, and even develop unique approaches to treating disease. However, patient privacy and the ethical use of patient information are of the utmost importance, requiring high levels of anonymization and compliance with laws such as HIPAA. Ongoing expansion and variety of datasets are crucial to address existing bias and improve the efficiency of AI for different populations and diseases, providing safer solutions for global health.
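For readers who want to relate a start value, an end value, and a growth rate, the standard compound annual growth rate formula can be sketched as below. This is a generic illustration, not the report's methodology: the function name `cagr` is hypothetical, and because the listing does not state which base year and horizon the publisher used, the result need not reproduce the quoted 23.1% figure.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    # Solve end = start * (1 + r) ** years for r.
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures quoted in the description; the 2023-2032 horizon is an assumption.
implied_rate = cagr(341.8, 1464.13, 2032 - 2023)
print(f"Implied CAGR: {implied_rate:.1%}")
```

Changing the assumed number of years changes the implied rate noticeably, so the horizon should be checked against the full report before comparing numbers.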
Head Data Set 2 Dataset
universe.roboflow.com
zip
Updated Oct 1, 2024
Cite
Innovateitt (2024). Head Data Set 2 Dataset [Dataset]. https://universe.roboflow.com/innovateitt/head-data-set-2
Explore at:
Available download formats: zip
Dataset updated
Oct 1, 2024
Dataset authored and provided by
Innovateitt
Variables measured
Heads QiDz Bounding Boxes
Description
Head Data Set 2

## Overview

Head Data Set 2 is a dataset for object detection tasks. It contains Heads QiDz annotations for 2,342 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Data set for "Evaluating fit-for-purpose cell viability assays that are...
catalog.data.gov
Updated Dec 15, 2023
+ more versions
Cite
National Institute of Standards and Technology (2023). Data set for "Evaluating fit-for-purpose cell viability assays that are sensitive to proliferative capacity" [Dataset]. https://catalog.data.gov/dataset/data-set-forevaluating-fit-for-purpose-cell-viability-assays-that-are-sensitive-to-prolife-90de4
Dataset updated
Dec 15, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This dataset is associated with the manuscript "Evaluating fit-for-purpose cell viability assays that are sensitive to proliferative capacity". This dataset consists of 12 individual studies containing Jurkat cell proliferation data and viability assay data. A README file describes the data sets.
Walmart products free dataset
crawlfeeds.com
csv, zip
Updated Apr 27, 2025
Cite
Crawl Feeds (2025). Walmart products free dataset [Dataset]. https://crawlfeeds.com/datasets/walmart-products-free-dataset
Explore at:
Available download formats: zip, csv
Dataset updated
Apr 27, 2025
Dataset authored and provided by
Crawl Feeds
License
https://crawlfeeds.com/privacy_policy
Description
Discover the Walmart Products Free Dataset, featuring 2,000 records in CSV format. This dataset includes detailed information about various Walmart products, such as names, prices, categories, and descriptions.

It’s perfect for data analysis, e-commerce research, and machine learning projects. Download now and kickstart your insights with accurate, real-world data.
bulgarian-grammar-mistakes
huggingface.co
Updated Dec 5, 2024
+ more versions
Cite
thebogko (2024). bulgarian-grammar-mistakes [Dataset]. https://huggingface.co/datasets/thebogko/bulgarian-grammar-mistakes
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 5, 2024
Authors
thebogko
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Bulgaria
Description
Dataset of Bulgarian Grammar Mistakes

Dataset Summary

This is a dataset of sentences in Bulgarian with grammar mistakes created by automatically inducing errors in correct sentences.

Supported Tasks

text2text-generation: The dataset can be used to train a model for grammar error correction, which consists of correcting grammatical errors in a source sentence, resulting in a correct version.

Languages

bg: Only Bulgarian is supported by this… See the full description on the dataset page: https://huggingface.co/datasets/thebogko/bulgarian-grammar-mistakes.
Validation Data Set Dataset
universe.roboflow.com
zip
Updated Oct 13, 2022
Cite
University of Santo Tomas (2022). Validation Data Set Dataset [Dataset]. https://universe.roboflow.com/university-of-santo-tomas/validation-data-set
Explore at:
Available download formats: zip
Dataset updated
Oct 13, 2022
Dataset authored and provided by
University of Santo Tomas
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Microscopic Eggs Bounding Boxes
Description
Validation Data Set

## Overview

Validation Data Set is a dataset for object detection tasks. It contains Microscopic Eggs annotations for 300 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Gloves Data Set Dataset
universe.roboflow.com
zip
Updated Sep 27, 2023
+ more versions
Cite
Thesis (2023). Gloves Data Set Dataset [Dataset]. https://universe.roboflow.com/thesis-e02gj/gloves-data-set
Explore at:
Available download formats: zip
Dataset updated
Sep 27, 2023
Dataset authored and provided by
Thesis
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Gloves Bounding Boxes
Description
Gloves Data Set

## Overview

Gloves Data Set is a dataset for object detection tasks. It contains Gloves annotations for 685 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Tree Data Set Dataset
universe.roboflow.com
zip
Updated Jun 3, 2025
+ more versions
Cite
ANISHA (2025). Tree Data Set Dataset [Dataset]. https://universe.roboflow.com/anisha-npnku/tree-data-set
Explore at:
Available download formats: zip
Dataset updated
Jun 3, 2025
Dataset authored and provided by
ANISHA
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Tree Bounding Boxes
Description
Tree Data Set

## Overview

Tree Data Set is a dataset for object detection tasks. It contains Tree annotations for 519 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
DATASET COLLECTION ON THE RELATIONSHIP OF LEARNING MODELS FOR 21st-CENTURY...
data.mendeley.com
Updated Mar 2, 2023
+ more versions
Cite
Yauk Hidayah (2023). DATASET COLLECTION ON THE RELATIONSHIP OF LEARNING MODELS FOR 21st-CENTURY CITIZENSHIP SKILL DEVELOPMENT [Dataset]. http://doi.org/10.17632/5hzwnr448r.2
Explore at:
Unique identifier
https://doi.org/10.17632/5hzwnr448r.2
Dataset updated
Mar 2, 2023
Authors
Yauk Hidayah
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data set is presented to explore the relationship between learning models and the development of 21st-century citizenship skills. The variables are: the level of understanding of learning models; the ability to develop learning models; the relevance of learning models to the development of 21st-century citizenship skills; the contribution of learning models to the development of critical thinking skills; the contribution of learning models to the development of collaborative skills; the contribution of learning models to the development of communication skills; the contribution of learning models to the development of creative skills; the relevance of learning models to the development of aspects of spiritual attitudes; the relevance of learning models to the development of aspects of social attitudes; the relevance of learning models to the development of aspects of knowledge; the relevance of learning models to the development of aspects of factual knowledge; the relevance of the following learning models to the development of aspects of knowledge; the relevance of learning models to the development of aspects of procedural knowledge; and the relevance of learning models to the development of aspects of metacognitive knowledge.
Computer Science Theory QA Dataset
kaggle.com
Updated Apr 6, 2023
Cite
Mujtaba Mateen (2023). Computer Science Theory QA Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/5333319
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/5333319
Dataset updated
Apr 6, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Mujtaba Mateen
Description
This comprehensive dataset contains a wide range of theoretical questions related to computer science, covering various domains such as operating systems, machine learning, software engineering, computer architecture and design, data structures, and algorithms. The questions are carefully curated to encompass a diverse set of topics, including hardware and software concepts, and are designed to challenge and enhance the knowledge of individuals interested in the computer science field.

The dataset is specifically tailored for training a chatbot or a question-answering system, with a focus on providing accurate and informative answers to technical questions. The questions cover a broad spectrum of complexity, ranging from basic to advanced, and are aimed at assisting users in gaining a deeper understanding of computer science concepts. Whether it's preparing for technical interviews or exams, or simply seeking guidance in the computer science field, this dataset can be a valuable resource for users looking to improve their knowledge and expertise.
A Dataset for Machine Learning Algorithm Development
fisheries.noaa.gov
Updated Jan 1, 2021
Cite
Alaska Fisheries Science Center (AFSC) (2021). A Dataset for Machine Learning Algorithm Development [Dataset]. https://www.fisheries.noaa.gov/inport/item/63322
Explore at:
Dataset updated
Jan 1, 2021
Dataset provided by
Alaska Fisheries Science Center
Authors
Alaska Fisheries Science Center (AFSC)
Area covered
Kotzebue Sound, Alaska, Chukchi Sea, Beaufort Sea
Description
This dataset consists of imagery, imagery footprints, associated ice seal detections and homography files associated with the KAMERA Test Flights conducted in 2019. This dataset was subset to include relevant data for detection algorithm development. This dataset is limited to data collected during flights 4, 5, 6 and 7 from our 2019 surveys.
Russian Speech Recognition Dataset - 338 Hours
kaggle.com
Updated Jun 30, 2025
+ more versions
Cite
Unidata (2025). Russian Speech Recognition Dataset - 338 Hours [Dataset]. https://www.kaggle.com/datasets/unidpro/russian-speech-recognition-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 30, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Unidata
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
Russian Speech Dataset for recognition task

The dataset comprises 338 hours of telephone dialogues in Russian, collected from 460 native speakers across various topics and domains, with a 98% Word Accuracy Rate. It is designed for speech recognition research across various recognition models and is primarily aimed at meeting the requirements of automatic speech recognition (ASR) systems.

By using this dataset, researchers and developers can advance their understanding and capabilities in automatic speech recognition (ASR) systems, audio transcription, and natural language processing (NLP).

💵 Buy the Dataset: This is a limited preview of the data. To access the full dataset, please contact us at https://unidata.pro to discuss your requirements and pricing options.

Metadata for the dataset

- Audio files: High-quality recordings in WAV format
- Text transcriptions: Accurate and detailed transcripts for each audio segment
- Speaker information: Metadata on native speakers, including gender, etc.
- Topics: Diverse domains such as general conversations, business, etc.

The range of native speakers, topics, and domains covered in the dataset makes it an ideal resource for the research community, allowing researchers to study spoken languages, dialects, and language patterns.

🌐 UniData provides high-quality datasets, content moderation, data collection and annotation for your AI/ML projects
stack-exchange-preferences
huggingface.co
Cite
Hugging Face H4, stack-exchange-preferences [Dataset]. https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset provided by
Hugging Face (https://huggingface.co/)
Authors
Hugging Face H4
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for H4 Stack Exchange Preferences Dataset

Dataset Summary

This dataset contains questions and answers from the Stack Overflow Data Dump for the purpose of preference model training. Importantly, the questions have been filtered to fit the following criteria for preference models (following closely from Askell et al. 2021): have >=2 answers. This data could also be used for instruction fine-tuning and language model training. The questions are grouped with… See the full description on the dataset page: https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences.
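
As a rough illustration of how questions with at least two answers could be turned into preference pairs, here is a minimal Python sketch. The record layout (`question`, `answers`, `text`, `score`) is hypothetical and is not taken from the dataset card; check the dataset page for the real schema before adapting it.

```python
from itertools import combinations


def preference_pairs(questions, min_answers=2):
    """Build (question, chosen, rejected) triples from scored answers.

    Assumes each record looks like
    {"question": str, "answers": [{"text": str, "score": int}, ...]},
    which is a hypothetical layout used only for this example.
    """
    pairs = []
    for record in questions:
        answers = record["answers"]
        # Mirror the ">=2 answers" filter described in the summary.
        if len(answers) < min_answers:
            continue
        for first, second in combinations(answers, 2):
            # Ties carry no preference signal, so skip them.
            if first["score"] == second["score"]:
                continue
            if first["score"] > second["score"]:
                chosen, rejected = first, second
            else:
                chosen, rejected = second, first
            pairs.append((record["question"], chosen["text"], rejected["text"]))
    return pairs


# Made-up sample records for demonstration only.
sample = [
    {"question": "Q1", "answers": [
        {"text": "A", "score": 5},
        {"text": "B", "score": 2},
        {"text": "C", "score": 5},
    ]},
    {"question": "Q2", "answers": [{"text": "D", "score": 1}]},
]
pairs = preference_pairs(sample)
```

Ranking by a raw score is only one possible preference signal; it is used here because it keeps the sketch short.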

Cite

MRIGAANK JASWAL (2024). Exercise Detection dataset [Dataset]. https://www.kaggle.com/datasets/mrigaankjaswal/exercise-detection-dataset

Exercise Detection dataset

This data set can be used to detect exercises

Explore at:

Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Dataset updated

Sep 22, 2024

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

MRIGAANK JASWAL

License

CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

Description

This project focuses on analyzing human body movements during common exercises by capturing and processing angles of key body joints. We used video data to extract frame-by-frame joint angles during exercises such as push-ups, jumping jacks, pull-ups, squats, and Russian twists. For pose estimation, MediaPipe was used to detect body landmarks, while YOLOv6 was employed for object detection to enhance accuracy.

Methodology

Video Collection: Videos were recorded for each exercise (push-ups, jumping jacks, pull-ups, squats, Russian twists), ensuring proper form and variety in movement.
Frame-by-Frame Analysis: Each video was processed frame by frame, and landmarks were detected using MediaPipe's Pose Estimation. We calculated the angles of key joints by using the positional data of landmarks across different frames.
Object Detection with YOLOv6: YOLOv6 was used to identify specific objects and enhance the robustness of the pose estimation by detecting outliers or incorrect poses during exercises, thereby improving the accuracy of the analysis.
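
The joint-angle step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes each landmark is reduced to an (x, y) pair within a single frame and computes the angle at the middle landmark from the two segments meeting there. The function name and the sample coordinates are hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b formed by segments b->a and b->c.

    Each argument is an (x, y) pair, e.g. shoulder, elbow, wrist.
    """
    ba = (a[0] - b[0], a[1] - b[1])
    bc = (c[0] - b[0], c[1] - b[1])
    dot = ba[0] * bc[0] + ba[1] * bc[1]
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0:
        raise ValueError("landmarks coincide; angle is undefined")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical landmark coordinates for one frame (not from the dataset).
shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

Applying the function to every frame of a video would yield one angle series per joint, which is the kind of per-frame value the description refers to.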

Applications

This dataset can be used for multiple applications:

- Form Correction: By comparing these angles with standard benchmarks, feedback can be provided to improve exercise form.
- Performance Tracking: Over time, users can monitor their improvement by analyzing the changes in their joint angles during exercises.
- Pose Classification: Machine learning models can be trained to classify correct vs. incorrect form, enabling the development of smart fitness assistants.
- Real-time Feedback Systems: Using pose estimation in conjunction with live video, real-time systems can be developed to guide users during workouts.

Exercises Analyzed

The following exercises were captured and analyzed for this dataset:

Push-ups: Key focus on shoulder, elbow, and hip angles.
Jumping Jacks: Full-body motion tracked via shoulder, elbow, hip, knee, and ankle angles.
Pull-ups: Primarily focused on shoulder and elbow joint movements.
Squats: Analyzed hip, knee, and ankle angles for depth and posture analysis.
Russian Twists: Core movement tracked via shoulder and hip angles to assess rotational motion.

Potential Analysis

- Time-Series Analysis: The data can be treated as a time series, allowing for the identification of trends in joint movement over the duration of an exercise.
- Pose Optimization: Optimization models can be used to suggest improvements in form based on angle analysis.
- Machine Learning Integration: The dataset can serve as input for machine learning algorithms to automate form correction and workout optimization.
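
One simple way to treat a joint angle as a time series is to count repetitions with two thresholds. The Python sketch below uses that technique as a stand-in for the broader trend analysis mentioned above; the threshold values and the sample knee-angle series are assumptions made for illustration, not values from the dataset.

```python
def count_reps(angles, low=90.0, high=160.0):
    """Count down-up cycles in a sequence of joint angles (degrees).

    A repetition is counted when the angle first drops below `low`
    and later rises above `high`. Both thresholds are illustrative.
    """
    reps = 0
    is_down = False
    for angle in angles:
        if not is_down and angle < low:
            is_down = True  # entered the bottom of the movement
        elif is_down and angle > high:
            is_down = False  # returned to the top: one full repetition
            reps += 1
    return reps


# Made-up knee angles for a short squat sequence.
knee_angles = [170, 150, 110, 85, 80, 100, 140, 165, 170, 120, 88, 95, 150, 162]
reps = count_reps(knee_angles)
```

Using two thresholds instead of one avoids double counting when the angle hovers near a single cutoff.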
