License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
By garage-bAInd (from Hugging Face) [source]
The garage-bAInd/Open-Platypus dataset is a curated collection of data specifically designed to enhance logical reasoning skills in large language models (LLMs). It serves as a training resource for improving the ability of these models to reason logically and provide accurate solutions or answers to various logical reasoning questions.
This dataset, which has been used to train the Platypus2 models, consists of multiple datasets that have undergone a meticulous filtering process. Through keyword search and Sentence Transformers similarity filtering, questions with a similarity score above 80% were removed, ensuring that only unique and diverse logical reasoning questions are included.
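For illustration, here is a minimal sketch of this kind of similarity-based deduplication using the sentence-transformers library. The embedding model named below is an assumption; the card does not specify which model was used.

```python
# Hedged sketch: drop questions that are more than 80% similar to one already kept.
# The all-MiniLM-L6-v2 model is an assumed stand-in, not the authors' actual choice.
from sentence_transformers import SentenceTransformer, util

questions = [
    "If x + 3 = 7, what is x?",
    "Solve for x: x + 3 = 7.",
    "Name the capital of France.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(questions, convert_to_tensor=True)
similarity = util.cos_sim(embeddings, embeddings)

kept = []
for i in range(len(questions)):
    # Keep a question only if it is at most 80% similar to every kept question.
    if all(similarity[i][j] <= 0.80 for j in kept):
        kept.append(i)

print([questions[i] for i in kept])  # near-duplicates of kept questions are dropped
```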
The columns in this dataset include:
- input: The input text or question that requires logical reasoning.
- output: The correct answer or solution to the logical reasoning question.
- instruction: Additional instructions or guidelines for solving the logical reasoning question.
- data_source: The source or origin of the logical reasoning question.
By utilizing this comprehensive and carefully curated dataset, LLMs can be trained more effectively to improve their logical reasoning capabilities.
How to Use This Dataset: Logical Reasoning Improvement
Dataset Overview
Columns
The dataset is organized into several columns, each serving a specific purpose:
- input: The input text or question that requires logical reasoning. This column provides the initial statement or problem that needs solving.
- output: The correct answer or solution to the logical reasoning question. This column contains the expected outcome or response.
- instruction: Additional instructions or guidelines for solving the logical reasoning question. This column provides any specific guidance or steps required to arrive at the correct answer.
- data_source: The source or origin of the logical reasoning question. This column specifies where the question was obtained from.
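A minimal sketch of loading the dataset and inspecting these four columns with the Hugging Face datasets library (the train split name is assumed):

```python
from datasets import load_dataset

ds = load_dataset("garage-bAInd/Open-Platypus", split="train")

example = ds[0]
print(example["instruction"])  # guidance for solving the question
print(example["input"])        # the question or statement itself
print(example["output"])       # the reference answer or solution
print(example["data_source"])  # where the question came from
```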
Usage Guidelines
To make effective use of this dataset, follow these guidelines:
- Familiarize Yourself: Take time to understand and familiarize yourself with each entry in the dataset.
- Analyze Inputs: Carefully read and analyze each input text/question provided in the input column.
- Solve Using Logic: Apply logical thinking and reasoning strategies based on your understanding of each problem.
- Confirm Answers: Compare your solutions with those provided in the output column to check their accuracy (a sketch of such an answer-checking loop follows this list).
- Follow Instructions: Always consider any additional instructions given in the instruction column while solving a problem.
- Explore Data Sources: Utilize information from different data sources mentioned in the data_source column if needed.
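As referenced above, here is a hedged sketch of an answer-checking loop. The `answer_fn` callable is a hypothetical placeholder for whatever model or solver you are testing; it is not part of the dataset.

```python
from datasets import load_dataset

def evaluate(answer_fn, limit=100):
    """Exact-match accuracy of answer_fn on the first `limit` examples."""
    ds = load_dataset("garage-bAInd/Open-Platypus", split="train")
    correct = 0
    for example in ds.select(range(limit)):
        # Combine the instruction and input columns into a single prompt.
        prompt = f"{example['instruction']}\n\n{example['input']}".strip()
        prediction = answer_fn(prompt)
        # Exact match is a crude check; real evaluations normalize answers first.
        correct += int(prediction.strip() == example["output"].strip())
    return correct / limit
```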
Remember, practice makes perfect! Continuously work through the dataset to improve your logical reasoning skills.
Please note that this guide aims to help you utilize the dataset effectively. It does not provide direct solutions or explanations for specific entries in the dataset.
Contributing and Feedback
We believe in continuous improvement! If you have any feedback or would like to contribute additional logical reasoning questions, please feel free to do so. Together, we can enhance this dataset further and promote logical reasoning skills across LLM models.
Let's get started and embark on a journey of logical reasoning improvement with this curated dataset!
- Training and evaluating logical reasoning models: The dataset can be used to train and evaluate logical reasoning models, such as Platypus2, to enhance their performance in solving a variety of logical reasoning questions.
- Benchmarking logical reasoning algorithms: Researchers and developers can use this dataset as a benchmark for testing and comparing different logical reasoning algorithms and techniques.
- Creating educational resources: The dataset can be utilized to create educational resources or platforms that focus on improving logical reasoning skills. It can serve as a valuable source of practice questions for learners looking to enhance their abilities in this area.
If you use this dataset in your research, please credit the original authors.
License: Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Dataset accompanying the paper "The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning" (https://arxiv.org/abs/2305.14045), including 1.88M CoT rationales extracted across 1,060 tasks.
From the release repo https://github.com/kaistAI/CoT-Collection: Large Language Models (LLMs) have shown enhanced capabilities for solving novel tasks by reasoning step-by-step, known as Chain-of-Thought (CoT) reasoning. How can we instill the same capability of reasoning step-by-step on unseen tasks into LMs that possess fewer than 100B parameters? To address this question, we first introduce the CoT Collection, a new instruction-tuning dataset that augments 1.88 million CoT rationales across 1,060 tasks. We show that continually fine-tuning Flan-T5 (3B & 11B) with the CoT Collection enables the 3B & 11B LMs to perform CoT better on unseen tasks, leading to an improvement in average zero-shot accuracy on the 27 datasets of the BIG-Bench-Hard benchmark of +4.34% and +2.44%, respectively. Furthermore, we show that instruction tuning with CoT allows LMs to possess stronger few-shot learning capabilities, resulting in an improvement of +2.97% and +2.37% on 4 domain-specific tasks over Flan-T5 (3B & 11B), respectively.
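As a hedged sketch, one might format CoT Collection examples for seq2seq fine-tuning as below. The dataset identifier and field names are assumptions based on the release repo; check https://github.com/kaistAI/CoT-Collection for the actual schema.

```python
from datasets import load_dataset

# Assumed Hugging Face identifier and field names, not confirmed by the card above.
ds = load_dataset("kaist-ai/CoT-Collection", split="train")

def to_seq2seq(example):
    # Train the model to emit the rationale before the final answer.
    return {
        "source": example["source"],
        "target": f"{example['rationale']} So the answer is {example['target']}.",
    }

pairs = ds.map(to_seq2seq)
print(pairs[0]["target"])
```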
Dataset Card for open-web-math_urls
This dataset provides the URLs and top-level domains associated with training records in open-web-math/open-web-math. It is part of a collection of datasets curated to make exploring LLM training datasets more straightforward and accessible.
Dataset Details
Dataset Description
This dataset was created by downloading the source data, extracting URLs and top-level domains, and retaining only those record identifiers.… See the full description on the dataset page: https://huggingface.co/datasets/nhagar/open-web-math_urls.
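A minimal sketch of the kind of URL and top-level-domain extraction described, using urllib from the standard library and the third-party tldextract package; the curator's actual pipeline is not documented here, and the example URL is hypothetical.

```python
from urllib.parse import urlparse

import tldextract  # third-party; pip install tldextract

url = "https://math.stackexchange.com/questions/12345/example"

print(urlparse(url).netloc)  # math.stackexchange.com

parts = tldextract.extract(url)
print(parts.registered_domain)  # stackexchange.com
print(parts.suffix)             # com (the top-level domain)
```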
License: FutureBeeAI AI Data License Agreement (https://www.futurebeeai.com/policies/ai-data-license-agreement)
Welcome to the Bahasa Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Bahasa language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Bahasa speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Bahasa Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
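To illustrate the annotation details listed above, a hypothetical record might look like the sketch below. The key names are illustrative assumptions, not the vendor's exact schema, and the content is invented for the example.

```python
import json

# Hypothetical record; key names and values are illustrative only.
record = json.loads("""
{
  "id": "cot-bahasa-000001",
  "prompt": "Sebuah toko menjual 3 apel seharga 6000 rupiah. Berapa harga 5 apel?",
  "prompt_type": "instructional",
  "prompt_complexity": "easy",
  "prompt_category": "arithmetic",
  "domain": "mathematics",
  "response": "10000 rupiah",
  "rationale": "3 apel berharga 6000, jadi 1 apel 2000; 5 apel = 5 x 2000 = 10000.",
  "response_type": "numerical",
  "rich_text": false
}
""")

print(record["prompt_category"], "->", record["response"])
```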
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Bahasa version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Bahasa Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
License: MIT (https://opensource.org/licenses/MIT)
License information was derived automatically
Dataset Card for GSM8K
Dataset Summary
GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.
These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
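A minimal sketch of loading GSM8K and splitting the worked solution from the final answer; per the dataset card, solutions end with a "#### <answer>" line, and "main" is the standard configuration.

```python
from datasets import load_dataset

ds = load_dataset("openai/gsm8k", "main", split="test")

example = ds[0]
print(example["question"])

# The multi-step solution precedes "####"; the final numeric answer follows it.
reasoning, _, final_answer = example["answer"].partition("####")
print(final_answer.strip())
```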
License: FutureBeeAI AI Data License Agreement (https://www.futurebeeai.com/policies/ai-data-license-agreement)
Welcome to the Spanish Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Spanish language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Spanish speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Spanish Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Spanish version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Spanish Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
License: FutureBeeAI AI Data License Agreement (https://www.futurebeeai.com/policies/ai-data-license-agreement)
Welcome to the Urdu Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Urdu language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Urdu speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Urdu Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Urdu version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Urdu Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.
License: FutureBeeAI AI Data License Agreement (https://www.futurebeeai.com/policies/ai-data-license-agreement)
Welcome to the Kannada Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.
This CoT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Kannada language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.
Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Kannada speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.
Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.
To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.
These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.
To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale aids the language model in building a reasoning process for complex questions.
These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.
This fully labeled Kannada Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
Quality and Accuracy
Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.
The Kannada version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.
Continuous Updates and Customization
The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.
License
The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Kannada Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.