Open Database License (ODbL) v1.0 — https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Dataset accompanying the paper "The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning" (https://arxiv.org/abs/2305.14045), including 1.88M CoT rationales extracted across 1,060 tasks.
From the release repo https://github.com/kaistAI/CoT-Collection: Large Language Models (LLMs) have shown enhanced capabilities of solving novel tasks by reasoning step-by-step, known as Chain-of-Thought (CoT) reasoning; how can we instill the same capability of reasoning step-by-step on unseen tasks into LMs with fewer than 100B parameters? To address this question, we first introduce the CoT Collection, a new instruction-tuning dataset that augments 1.88 million CoT rationales across 1,060 tasks. We show that continually fine-tuning Flan-T5 (3B & 11B) with the CoT Collection enables the 3B & 11B LMs to perform CoT better on unseen tasks, leading to an improvement in the average zero-shot accuracy on 27 datasets of the BIG-Bench-Hard benchmark by +4.34% and +2.44%, respectively. Furthermore, we show that instruction tuning with CoT allows LMs to possess stronger few-shot learning capabilities, resulting in an improvement of +2.97% and +2.37% on 4 domain-specific tasks over Flan-T5 (3B & 11B), respectively.
MIT License — https://opensource.org/licenses/MIT
License information was derived automatically
See https://github.com/thuml/RLVR-World for examples of using this dataset.
Citation
@article{wu2025rlvr,
  title={RLVR-World: Training World Models with Reinforcement Learning},
  author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
  journal={arXiv preprint arXiv:2505.13934},
  year={2025}
}
FutureBeeAI AI Data License Agreement — https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Chinese Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Chinese language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Chinese people, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationales help the language model build a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Chinese Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
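As a minimal sketch of what one record in the JSON export might look like, the snippet below constructs a hypothetical entry using the annotation fields listed above and checks that all documented fields are present. The record contents and exact key names are assumptions; the actual FutureBeeAI export format may differ.

```python
import json

# Hypothetical record modeled on the documented annotation details
# (unique ID, prompt, prompt type, complexity, category, domain,
# response, rationale, response type, rich text presence).
record_json = '''{
  "id": "zh-cot-000001",
  "prompt": "一个袋子里有3个红球和2个蓝球，随机取出一个球是红球的概率是多少？",
  "prompt_type": "instructional",
  "prompt_complexity": "easy",
  "prompt_category": "arithmetic",
  "domain": "mathematics",
  "response": "3/5",
  "rationale": "袋子里一共有 3 + 2 = 5 个球，其中 3 个是红球，所以概率是 3/5。",
  "response_type": "numeric",
  "rich_text": false
}'''

EXPECTED_FIELDS = {
    "id", "prompt", "prompt_type", "prompt_complexity", "prompt_category",
    "domain", "response", "rationale", "response_type", "rich_text",
}

record = json.loads(record_json)
missing = EXPECTED_FIELDS - record.keys()
print(sorted(missing))  # an empty list means every documented field is present
```

A validation pass like this is a cheap first check before training on any labeled prompt-completion export.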
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Chinese version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Chinese Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
Apache License 2.0 — https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Introducing Open-CoT-Reasoning-Mini:
An open source dataset of 10,200 distilled Chain-of-Thought (CoT) reasoning samples across diverse domains including mathematics, medicine, art, social sciences, computer science, logic puzzles, etc. This comprehensive collection is designed to boost step-by-step reasoning capabilities in language models under 10 billion parameters, enabling any non-reasoning model to develop structured analytical thinking across multiple disciplines.… See the full description on the dataset page: https://huggingface.co/datasets/Raymond-dev-546730/Open-CoT-Reasoning-Mini.
FutureBeeAI AI Data License Agreement — https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Tamil Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Tamil language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Tamil people, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationales help the language model build a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Tamil Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Tamil version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Tamil Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
Apache License 2.0 — https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Anandita Garg
Released under Apache 2.0
MIT License — https://opensource.org/licenses/MIT
License information was derived automatically
A high-quality, bilingual (English & Arabic) dataset for Chain-of-Thought (COT) reasoning in mathematics and related disciplines, developed by Miscovery AI.
Math-COT is a unique dataset designed to facilitate and benchmark the development of chain-of-thought reasoning capabilities in language models across mathematical domains. With meticulously crafted examples, explicit reasoning steps, and bilingual support, this dataset offers a robust foundation for training and evaluating mathematical reasoning abilities.
Each entry in the dataset contains the following fields:
{
"en_question": "Question text in English",
"ar_question": "Question text in Arabic",
"en_answer": "Detailed step-by-step solution in English",
"ar_answer": "Detailed step-by-step solution in Arabic",
"category": "Mathematical category",
"en_q_word": "Word count of English question",
"ar_q_word": "Word count of Arabic question",
"en_a_word": "Word count of English answer",
"ar_a_word": "Word count of Arabic answer"
}
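The four word-count fields can be derived from the text fields. The sketch below fills them for a hypothetical entry by whitespace splitting; this is an assumption, since the dataset card does not specify how `en_q_word` and the other counts were computed.

```python
def fill_word_counts(entry):
    """Derive the *_word fields from the question/answer text by whitespace
    splitting (an assumption about the dataset's counting method)."""
    for lang in ("en", "ar"):
        entry[f"{lang}_q_word"] = len(entry[f"{lang}_question"].split())
        entry[f"{lang}_a_word"] = len(entry[f"{lang}_answer"].split())
    return entry

# Hypothetical entry, not drawn from the actual dataset.
sample = fill_word_counts({
    "en_question": "What is 7 plus 5?",
    "ar_question": "كم يساوي 7 زائد 5؟",
    "en_answer": "7 plus 5 equals 12, so the answer is 12.",
    "ar_answer": "7 زائد 5 يساوي 12، إذن الإجابة هي 12.",
    "category": "Arithmetic",
})
print(sample["en_q_word"], sample["en_a_word"])  # 5 10
```

Recomputing counts this way is also a quick consistency check against the stored `*_word` fields when loading the dataset.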
The dataset covers 21 distinct categories:
Here's a sample entry from the dataset:
{
"en_question": "A bag contains only red and blue balls. If one ball is drawn at random, the probability that it is red is 2/5. If 8 more red balls are added, the probability of drawing a red ball becomes 4/5. How many blue balls are there in the bag?",
"ar_question": "تحتوي الحقيبة على كرات حمراء وزرقاء فقط. إذا تم سحب كرة واحدة عشوائيًا ، فإن احتمال أن تكون حمراء هو 2/5. إذا تمت إضافة 8 كرات حمراء أخرى ، يصبح احتمال سحب كرة حمراء 4/5. كم عدد الكرات الزرقاء الموجودة في الحقيبة؟",
This dataset is especially valuable for:
If you use this dataset in your research, please cite:
@dataset{miscoveryai2025mathcot,
title={Math CoT Arabic English Reasoning: A Bilingual Dataset for Chain-of-Thought Mathematical Reasoning},
author={Miscovery AI},
year={2025},
publisher={Kaggle},
url={https://www.kaggle.com/datasets/miscovery/math-cot-arabic-english-reasoning}
}
This project is licensed under the MIT License - see the LICENSE file for details.
For questions, feedback, or issues related to this dataset, please contact Miscovery AI at info@miscovery.com.
This dataset was created by Jatin Mehra_666
FutureBeeAI AI Data License Agreement — https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Hindi Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Hindi language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Hindi people, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationales help the language model build a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Hindi Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Hindi version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Hindi Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
CC0 1.0 (Public Domain Dedication) — https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub
Know-Saraswati-COT is an open-source dataset for training models in logical reasoning and stream-of-consciousness thinking. Created using GPT-4 as an homage to Goddess Saraswati, the embodiment of wisdom and enlightenment, the corpus is aimed at deep introspection, where thought processes and free-flowing reasoning can be analyzed. Encompassing both logic and creativity, Know-Saraswati-COT lets users build AI models that combine analytical capacity with imaginative possibility, converting raw instructions into structured, well-argued reasoning. The goal is machines that understand not only instructions but also complex concepts requiring comprehensive understanding for successful execution in real-world applications.
To begin working with this dataset, start by downloading the ‘Train.csv’ file from Kaggle which contains instructions and corresponding outputs for training models in logical reasoning and stream of consciousness thinking. The columns in this file include 'instruction' - which is the instruction given to a machine learning model - as well as the 'output' that has been generated by that model based on its own interpretation of the instruction received.
Once you have downloaded the dataset, verify it with some basic checks: confirm that all columns are populated, and look for instructions that repeat within the file. Repetition tells you how many distinct examples are available for training, and tracking it over time supports continual improvement driven by user feedback.
You can then apply standard preprocessing such as normalization, tokenization, and feature extraction so a Machine Learning (ML) model can be trained properly on the dataset before its accuracy is evaluated on held-out test cases. Depending on which features need to be extracted from an individual instruction, this could involve breaking long strings into separate words or phrases; more complex scenarios may demand additional data engineering, such as speech-recognition parsing to extract text from audio.
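The basic checks described above can be sketched with pandas. The DataFrame below is a stand-in for `Train.csv` with hypothetical rows; the real file has the same `instruction` and `output` columns.

```python
import pandas as pd

# Stand-in for Train.csv (hypothetical rows, one deliberately repeated).
df = pd.DataFrame({
    "instruction": [
        "Explain why the sky is blue, step by step.",
        "List three prime numbers and justify each.",
        "Explain why the sky is blue, step by step.",
    ],
    "output": [
        "Sunlight scatters off air molecules; shorter (blue) wavelengths scatter most...",
        "2, 3, and 5: each has exactly two divisors, 1 and itself...",
        "Rayleigh scattering favors shorter wavelengths, so we see blue...",
    ],
})

# Check that both columns are fully populated.
assert not df["instruction"].isna().any()
assert not df["output"].isna().any()

# Count repeated instructions (duplicates beyond the first occurrence).
n_repeats = int(df["instruction"].duplicated().sum())
print(f"{len(df)} rows, {n_repeats} repeated instruction(s)")  # 3 rows, 1 repeated instruction(s)
```

On the real file, the same checks scale unchanged; a high repeat count signals fewer distinct training examples than the row count suggests.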
- Using Know-Saraswati-COT to create engaging story lines by training models to generate new stories with logical reasoning and stream of consciousness thought processes.
- Training AI models to develop strong creative writing skills, especially for science fiction and fantasy genres.
- Utilizing the dataset to expand knowledge resources in fields such as philosophy, psychology, science, art, and culture by better understanding how GPT-4 models respond to natural language instruction inputs.
Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
This is the replication package for the paper "Automated Unit Test Generation via Chain of Thought Prompt and Reinforcement Learning".
Organization of the Replication Package
- checkpoints.zip: fine-tuned models, including TestCTRL, TestCT, TestCT-no-cot, TestCT-intention, TestCT-input, TestCT-ti, CodeBERT-line, CodeT5-line, CodeGPT-line, CodeBERT-branch, CodeT5-branch, and CodeGPT-branch.
- dataset.zip: datasets for fine-tuning and reinforcement learning, including the CoT dataset, the reward dataset (reward folder), and the dataset for PPO optimization (rl folder).
- evaluation.zip: scripts for evaluating the generated tests, including CodeBLEU, syntactic correctness rate, compilation passing rate, line coverage rate, and branch coverage rate.
- finetune.zip: scripts and configs for fine-tuning large language models for test generation.
- generated_test_result.zip: the generated tests.
- pretrain.zip: pre-trained models, including CodeLlama, CodeBERT, and CodeT5.
- CoT_quality.zip: an example of evaluating CoT prompts.
MIT License — https://opensource.org/licenses/MIT
License information was derived automatically
CoT-Verification-340k Dataset: Improving Reasoning Model Efficiency through Verification
This dataset is used for supervised verification fine-tuning of large reasoning models. It contains 340,000 question-solution pairs annotated with solution correctness, including 160,000 correct Chain-of-Thought (CoT) solutions and 190,000 incorrect ones. This data is designed to train models to effectively verify the correctness of reasoning steps, leading to more efficient and accurate… See the full description on the dataset page: https://huggingface.co/datasets/Zigeng/CoT-Verification-340k.
Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
COT is a dataset for object detection tasks - it contains Objects annotations for 2,221 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Compiled-CoT: Enhancing Chain-of-Thought Reasoning
Compiled-CoT is a framework for improving Chain-of-Thought (CoT) reasoning capabilities in language models by leveraging curated datasets, refined prompting techniques, and adaptive learning mechanisms. It aims to enhance model reasoning across various domains, especially mathematical, logical, and commonsense tasks.
Contributing
Contributions are welcome! If you'd like to improve the framework or add new… See the full description on the dataset page: https://huggingface.co/datasets/Kameshr/Compiled-COT.
FutureBeeAI AI Data License Agreement — https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Bengali Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Bengali language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Bengali people, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationales help the language model build a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Bengali Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Bengali version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Bengali Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
🦖🧠 Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning 🦖🧠
We propose Rex-Thinker, a Chain-of-Thought (CoT) reasoning model for object referring that addresses two key challenges: lack of interpretability and inability to reject unmatched expressions. Instead of directly predicting bounding boxes, Rex-Thinker reasons step-by-step over candidate objects to determine which, if any, match a given expression.… See the full description on the dataset page: https://huggingface.co/datasets/IDEA-Research/HumanRef-CoT-45k.
MIT License — https://opensource.org/licenses/MIT
License information was derived automatically
Scheming Detection CoT Dataset
Dataset Description
This dataset contains Chain-of-Thought (CoT) reasoning for the scheming detection task. The model is trained to explicitly reason through safety specifications before producing classifications, enabling:
- More interpretable safety decisions
- Better policy adherence
- Improved robustness to edge cases
- Reduced overrefusal rates
Dataset Statistics
- Total Samples: 44,129
- Generated: 2025-11-27
- Generation Model: … See the full description on the dataset page: https://huggingface.co/datasets/Syghmon/rich-cot.
Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison with the results of LLMs with CoT prompt.
According to our latest research, the global camping cot market size reached USD 1.34 billion in 2024, reflecting a robust and expanding industry. The market is poised for steady growth, with a projected compound annual growth rate (CAGR) of 5.7% from 2025 to 2033. By the end of 2033, the camping cot market is forecasted to achieve a value of USD 2.22 billion. This growth trajectory is primarily driven by the increasing popularity of outdoor recreational activities, rising disposable incomes, and the growing trend of adventure tourism worldwide. As per our latest research, these factors are expected to continue fueling demand for camping cots across diverse end-user segments over the next decade.
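The headline figures are internally consistent: compounding the 2024 base of USD 1.34 billion at a 5.7% CAGR over the nine years to 2033 reproduces the forecast value, as this quick arithmetic check shows.

```python
# Sanity-check the headline figures: USD 1.34B in 2024 compounding at a
# 5.7% CAGR through 2033.
base_2024 = 1.34            # market size in 2024, USD billion
cagr = 0.057                # projected compound annual growth rate
years = 2033 - 2024         # 9 compounding years

forecast_2033 = base_2024 * (1 + cagr) ** years
print(round(forecast_2033, 2))  # 2.21
```

The result, roughly USD 2.21 billion, matches the forecast USD 2.22 billion to within the rounding of the quoted CAGR.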
The primary growth factor propelling the camping cot market is the surge in outdoor recreational activities, including camping, hiking, and backpacking. As urbanization intensifies and lifestyles become more hectic, consumers are increasingly seeking opportunities to reconnect with nature and pursue wellness through outdoor experiences. This shift in consumer behavior has resulted in heightened demand for comfortable and convenient camping gear, with camping cots emerging as a preferred choice for ensuring restful sleep in outdoor environments. The proliferation of camping sites, national parks, and adventure travel operators has further contributed to the widespread adoption of camping cots, especially among millennials and families looking for safe and ergonomic sleeping solutions during their excursions.
Another crucial driver is the continuous innovation in materials and product design, which has significantly enhanced the functionality and portability of camping cots. Manufacturers are leveraging advanced materials such as lightweight aluminum alloys, high-tensile steel, and weather-resistant fabrics to create durable yet easy-to-carry products. The introduction of folding and portable camping cots has made it feasible for users to transport and set up their sleeping arrangements with minimal effort, catering to both individual campers and group expeditions. Additionally, the market has witnessed the emergence of specialized camping cots, including double cots for couples, kids' cots for families, and heavy-duty models for commercial or institutional use, further expanding the consumer base.
The expanding e-commerce ecosystem and the rise of online retail channels have also played a pivotal role in accelerating market growth. Online platforms provide consumers with access to a wide array of camping cot options, detailed product descriptions, user reviews, and competitive pricing, thus enabling informed purchase decisions. The convenience of home delivery and easy return policies has encouraged more consumers to invest in camping gear online, especially in regions where physical specialty stores may be limited. This digital transformation, coupled with targeted marketing campaigns by leading brands, has significantly boosted market penetration and awareness, particularly among tech-savvy and younger demographics.
In recent years, the concept of pet camping has gained traction among outdoor enthusiasts, leading to the development of specialized products like the Pet Camping Cot. These cots are designed to provide a comfortable and elevated sleeping surface for pets, ensuring they can rest safely and comfortably during camping trips. With features such as durable frames, weather-resistant fabrics, and easy portability, pet camping cots have become a popular choice for pet owners who want to include their furry companions in outdoor adventures. As more families and individuals embrace pet-friendly travel, the demand for pet camping cots is expected to rise, encouraging manufacturers to innovate and expand their product offerings to cater to this growing market segment.
From a regional perspective, North America currently dominates the camping cot market, accounting for the largest revenue share in 2024, followed closely by Europe and the Asia Pacific. North America's leadership can be attributed to its established outdoor recreation culture, extensive network of campsites, and high consumer spending on leisure activities. Europe's market is also robust, supported by a strong tradition of camping and well-developed tourism infrastructure. Meanwhile, the Asia Pacific region is wi
CCI4.0-M2 v1 Dataset Documentation
Tech Report
Overview
CCI4.0-M2 v1 is a comprehensive dataset collection consisting of two specialized subsets designed for language model training.
CCI4.0-M2-Base v1 — Download Link: BAAI_datahub / modelscope / hf. Notes: 5.2TB Chinese webpage and 22TB English webpage; some data is released in CCI4.0-M2-Extra (BAAI_datahub / modelscope / hf) due to license concerns.
CCI4.0-M2-CoT v1 — Download Link: BAAI_datahub / modelscope / hf. Notes: 430 million CoT… See the full description on the dataset page: https://huggingface.co/datasets/BAAI/CCI4.0-M2-CoT-v1.
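For illustration only, here is a minimal sketch of how a CoT rationale record might be flattened into a single instruction-tuning training string. The field names (`instruction`, `rationale`, `answer`) and the "Let's think step by step." cue are assumptions for the sketch, not the actual schema of any dataset listed on this page; consult each dataset card for the real fields.

```python
def format_cot_example(instruction: str, rationale: str, answer: str) -> str:
    """Flatten a CoT record into one training string.

    The field layout and the step-by-step cue below are illustrative
    assumptions, not the schema of any specific dataset.
    """
    return (
        f"Question: {instruction}\n"
        f"Let's think step by step. {rationale}\n"
        f"Answer: {answer}"
    )


# Example usage with a toy arithmetic record:
text = format_cot_example(
    "What is 12 * 4?",
    "12 * 4 = 12 * 2 * 2 = 24 * 2 = 48.",
    "48",
)
print(text)
```

A preprocessing step like this is one common way such rationale-annotated records are serialized before fine-tuning, though the exact template varies per dataset.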