8 datasets found
  1. Logical Reasoning Improvement Dataset

    • kaggle.com
    zip
    Updated Nov 30, 2023
    Cite
    The Devastator (2023). Logical Reasoning Improvement Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/logical-reasoning-improvement-dataset
    Explore at:
    Available download formats: zip (9,336,513 bytes)
    Dataset updated
    Nov 30, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Logical Reasoning Improvement Dataset

    Enhancing LLM Logical Reasoning Skills with Platypus2 Models

    By garage-bAInd (From Huggingface) [source]

    About this dataset

    The garage-bAInd/Open-Platypus dataset is a curated collection of data specifically designed to enhance logical reasoning skills in large language models (LLMs). It serves as a training resource for improving these models' ability to reason logically and provide accurate solutions or answers to a variety of logical reasoning questions.

    This dataset, which has been used to train the Platypus2 models, combines multiple datasets that underwent a careful filtering process. Through keyword search and a Sentence Transformers similarity check, questions with a similarity score above 80% were removed, ensuring that only unique and diverse logical reasoning questions remain.
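The similarity filtering described above can be sketched as a greedy pass over embedded questions. A real pipeline would embed with a Sentence Transformers model; the toy embeddings below are invented to keep the example self-contained.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedup(questions, embeddings, threshold=0.80):
    """Keep a question only if it is not too similar to any already-kept question."""
    kept, kept_vecs = [], []
    for q, v in zip(questions, embeddings):
        if all(cosine(v, kv) <= threshold for kv in kept_vecs):
            kept.append(q)
            kept_vecs.append(v)
    return kept

questions = ["Q1", "Q2 (near-duplicate of Q1)", "Q3"]
embeddings = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
print(dedup(questions, embeddings))  # → ['Q1', 'Q3']
```

The greedy order matters: of two near-duplicates, whichever appears first survives.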

    The columns in this dataset are:

    • input: the input text or question that requires logical reasoning.
    • output: the correct answer or solution to the logical reasoning question.
    • instruction: additional instructions or guidelines for solving the logical reasoning question.
    • data_source: the source or origin of the logical reasoning question.

    By utilizing this comprehensive and carefully curated dataset, LLMs can be trained more effectively to improve their logical reasoning capabilities.

    How to use the dataset

    How to Use This Dataset: Logical Reasoning Improvement

    Dataset Overview

    Columns

    The dataset is organized into several columns, each serving a specific purpose:

    • input: The input text or question that requires logical reasoning. This column provides the initial statement or problem that needs solving.
    • output: The correct answer or solution to the logical reasoning question. This column contains the expected outcome or response.
    • instruction: Additional instructions or guidelines for solving the logical reasoning question. This column provides any specific guidance or steps required to arrive at the correct answer.
    • data_source: The source or origin of the logical reasoning question. This column specifies where the question was obtained from.
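A single record under the four columns above might look like the following (all values are invented for illustration; the real dataset's entries differ):

```python
# Hypothetical example record with the four documented columns.
record = {
    "input": "If all bloops are razzies and all razzies are lazzies, "
             "are all bloops definitely lazzies?",
    "output": "Yes, by transitivity all bloops are lazzies.",
    "instruction": "Answer the question and briefly justify your reasoning.",
    "data_source": "example-source",
}

EXPECTED_COLUMNS = {"input", "output", "instruction", "data_source"}
assert set(record) == EXPECTED_COLUMNS  # record matches the documented schema
print("columns:", sorted(record))
```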

    Usage Guidelines

    To make effective use of this dataset, follow these guidelines:

    • Familiarize Yourself: Take time to understand and familiarize yourself with each entry in the dataset.
    • Analyze Inputs: Carefully read and analyze each input text/question provided in the input column.
    • Solve Using Logic: Apply logical thinking and reasoning strategies based on your understanding of each problem.
    • Confirm Answers: Compare your solutions with those provided in the output column to check their accuracy.
    • Follow Instructions: Always consider any additional instructions given in the instruction column while solving a problem.
    • Explore Data Sources: Utilize information from different data sources mentioned in the data_source column if needed.

    Remember, practice makes perfect! Continuously work through the dataset to improve your logical reasoning skills.

    Please note that this guide aims to help you utilize the dataset effectively. It does not provide direct solutions or explanations for specific entries in the dataset.

    Contributing and Feedback

    We believe in continuous improvement! If you have any feedback or would like to contribute additional logical reasoning questions, please feel free to do so. Together, we can enhance this dataset further and promote logical reasoning skills across LLM models.

    Let's get started and embark on a journey of logical reasoning improvement with this curated dataset!

    Research Ideas

    • Training and evaluating logical reasoning models: The dataset can be used to train and evaluate logical reasoning models, such as Platypus2, to enhance their performance in solving a variety of logical reasoning questions.
    • Benchmarking logical reasoning algorithms: Researchers and developers can use this dataset as a benchmark for testing and comparing different logical reasoning algorithms and techniques.
    • Creating educational resources: The dataset can be utilized to create educational resources or platforms that focus on improving logical reasoning skills. It can serve as a valuable source of practice questions for learners looking to enhance their abilities in this area.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    **Licen...

  2. Chain-of-Thought collection

    • kaggle.com
    zip
    Updated Jun 19, 2023
    Cite
    Konrad Banachewicz (2023). Chain-of-Thought collection [Dataset]. http://identifiers.org/arxiv:2305.14045
    Explore at:
    Available download formats: zip (1,260,225,915 bytes)
    Dataset updated
    Jun 19, 2023
    Authors
    Konrad Banachewicz
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Dataset accompanying the paper "The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning" (https://arxiv.org/abs/2305.14045), including 1.88M CoT rationales extracted across 1,060 tasks.

    From the release repo https://github.com/kaistAI/CoT-Collection: Large Language Models (LLMs) have shown enhanced capabilities for solving novel tasks by reasoning step by step, known as Chain-of-Thought (CoT) reasoning. How can we instill the same capability for step-by-step reasoning on unseen tasks into LMs with fewer than 100B parameters? To address this question, we first introduce the CoT Collection, a new instruction-tuning dataset that augments 1.88 million CoT rationales across 1,060 tasks. We show that continually fine-tuning Flan-T5 (3B & 11B) with the CoT Collection enables the 3B & 11B LMs to perform CoT better on unseen tasks, improving average zero-shot accuracy on 27 datasets of the BIG-Bench-Hard benchmark by +4.34% and +2.44%, respectively. Furthermore, we show that instruction tuning with CoT gives LMs stronger few-shot learning capabilities, yielding improvements of +2.97% and +2.37% on 4 domain-specific tasks over Flan-T5 (3B & 11B), respectively.
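One common way to serialize such a CoT example for instruction tuning is to train the model to emit the rationale before the final answer. The template below is a hypothetical illustration, not the exact format used by the CoT Collection:

```python
def to_training_pair(question, rationale, answer):
    """Turn a (question, rationale, answer) triple into a prompt/target pair."""
    prompt = f"{question}\nLet's think step by step."
    target = f"{rationale} So the answer is {answer}."
    return prompt, target

prompt, target = to_training_pair(
    "A pen costs $2 and a notebook costs $3. What do 2 pens and 1 notebook cost?",
    "2 pens cost 2 * 2 = 4 dollars; adding the notebook gives 4 + 3 = 7 dollars.",
    "$7",
)
print(prompt)
print(target)
```

During fine-tuning, the prompt is the model input and the target (rationale plus answer) is the supervised completion.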

  3. open-web-math_urls

    • huggingface.co
    Updated May 15, 2025
    Cite
    Nick Hagar (2025). open-web-math_urls [Dataset]. http://doi.org/10.57967/hf/5495
    Explore at:
    Dataset updated
    May 15, 2025
    Authors
    Nick Hagar
    Description

    Dataset Card for open-web-math_urls

    This dataset provides the URLs and top-level domains associated with training records in open-web-math/open-web-math. It is part of a collection of datasets curated to make exploring LLM training datasets more straightforward and accessible.

    Dataset Details

    Dataset Description

    This dataset was created by downloading the source data, extracting URLs and top-level domains, and retaining only those record identifiers.… See the full description on the dataset page: https://huggingface.co/datasets/nhagar/open-web-math_urls.
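The URL-to-top-level-domain extraction this dataset performs can be sketched with the standard library alone; a production pipeline might prefer a public-suffix-aware library such as tldextract (an assumption, not something the dataset card specifies):

```python
from urllib.parse import urlparse

def top_level_domain(url):
    """Return the last dot-separated label of a URL's hostname."""
    host = urlparse(url).netloc.split(":")[0]  # drop any port
    return host.rsplit(".", 1)[-1] if "." in host else host

urls = [
    "https://math.stackexchange.com/questions/12345",
    "http://example.org/page",
]
print([top_level_domain(u) for u in urls])  # → ['com', 'org']
```

Note this naive split treats multi-part suffixes like `co.uk` as just `uk`, which is why suffix-aware libraries exist.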

  4. Bahasa Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    + more versions
    Cite
    FutureBee AI (2022). Bahasa Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/bahasa-chain-of-thought-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Welcome to the Bahasa Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

    Dataset Content

    This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Bahasa language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

    Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Bahasa speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.

    Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity

    To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

    These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats

    To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions.

    These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details

    This fully labeled Bahasa Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.
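A JSON record with the annotation fields listed above might be validated as follows. The exact field spellings in the released files may differ; the names and values below are assumptions for illustration:

```python
import json

# Field names adapted from the description above (exact keys are assumed).
REQUIRED_FIELDS = {
    "id", "prompt", "prompt_type", "prompt_complexity", "prompt_category",
    "domain", "response", "rationale", "response_type", "rich_text_presence",
}

record = json.loads("""{
  "id": "bah-0001",
  "prompt": "Berapakah 15% daripada 200?",
  "prompt_type": "direct query",
  "prompt_complexity": "easy",
  "prompt_category": "mathematics",
  "domain": "arithmetic",
  "response": "30",
  "rationale": "15% daripada 200 ialah 0.15 * 200 = 30.",
  "response_type": "numerical",
  "rich_text_presence": false
}""")

missing = REQUIRED_FIELDS - set(record)
print("missing fields:", sorted(missing))  # → missing fields: []
```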

    Quality and Accuracy

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The Bahasa version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Bahasa Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

  5. gsm8k

    • huggingface.co
    Updated Aug 11, 2022
    + more versions
    Cite
    OpenAI (2022). gsm8k [Dataset]. https://huggingface.co/datasets/openai/gsm8k
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 11, 2022
    Dataset authored and provided by
    OpenAI (http://openai.com/)
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for GSM8K

      Dataset Summary
    

    GSM8K (Grade School Math 8K) is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems. The dataset was created to support the task of question answering on basic mathematical problems that require multi-step reasoning.

    These problems take between 2 and 8 steps to solve. Solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the… See the full description on the dataset page: https://huggingface.co/datasets/openai/gsm8k.
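GSM8K reference solutions conventionally end with a line of the form `#### <final answer>`, which makes the final value easy to extract for evaluation. A minimal sketch (the solution text below is invented):

```python
def final_answer(solution: str) -> str:
    """Extract the final answer that follows the '####' marker."""
    return solution.rsplit("####", 1)[-1].strip()

solution = (
    "Natalia sold 48 clips in April and half as many in May, "
    "so she sold 48 / 2 = 24 clips in May.\n"
    "In total she sold 48 + 24 = 72 clips.\n"
    "#### 72"
)
print(final_answer(solution))  # → 72
```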

  6. Spanish Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Spanish Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/spanish-chain-of-thought-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Welcome to the Spanish Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

    Dataset Content

    This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Spanish language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

    Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Spanish speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.

    Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity

    To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

    These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats

    To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions.

    These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details

    This fully labeled Spanish Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.

    Quality and Accuracy

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The Spanish version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Spanish Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

  7. Urdu Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Urdu Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/urdu-chain-of-thought-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Welcome to the Urdu Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

    Dataset Content

    This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Urdu language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

    Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Urdu speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.

    Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity

    To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

    These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats

    To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions.

    These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details

    This fully labeled Urdu Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.

    Quality and Accuracy

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The Urdu version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Urdu Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

  8. Kannada Chain of Thought Prompt & Response Dataset

    • futurebeeai.com
    wav
    Updated Aug 1, 2022
    Cite
    FutureBee AI (2022). Kannada Chain of Thought Prompt & Response Dataset [Dataset]. https://www.futurebeeai.com/dataset/prompt-response-dataset/kannada-chain-of-thought-text-dataset
    Explore at:
    Available download formats: wav
    Dataset updated
    Aug 1, 2022
    Dataset provided by
    FutureBeeAI
    Authors
    FutureBee AI
    License

    https://www.futurebeeai.com/policies/ai-data-license-agreement

    Dataset funded by
    FutureBeeAI
    Description

    Welcome to the Kannada Chain of Thought prompt-response dataset, a meticulously curated collection containing 3000 comprehensive prompt and response pairs. This dataset is an invaluable resource for training Language Models (LMs) to generate well-reasoned answers and minimize inaccuracies. Its primary utility lies in enhancing LLMs' reasoning skills for solving arithmetic, common sense, symbolic reasoning, and complex problems.

    Dataset Content

    This COT dataset comprises a diverse set of instructions and questions paired with corresponding answers and rationales in the Kannada language. These prompts and completions cover a broad range of topics and questions, including mathematical concepts, common sense reasoning, complex problem-solving, scientific inquiries, puzzles, and more.

    Each prompt is meticulously accompanied by a response and rationale, providing essential information and insights to enhance the language model training process. These prompts, completions, and rationales were manually curated by native Kannada speakers, drawing references from various sources, including open-source datasets, news articles, websites, and other reliable references.

    Our chain-of-thought prompt-completion dataset includes various prompt types, such as instructional prompts, continuations, and in-context learning (zero-shot, few-shot) prompts. Additionally, the dataset contains prompts and completions enriched with various forms of rich text, such as lists, tables, code snippets, JSON, and more, with proper markdown format.

    Prompt Diversity

    To ensure a wide-ranging dataset, we have included prompts from a plethora of topics related to mathematics, common sense reasoning, and symbolic reasoning. These topics encompass arithmetic, percentages, ratios, geometry, analogies, spatial reasoning, temporal reasoning, logic puzzles, patterns, and sequences, among others.

    These prompts vary in complexity, spanning easy, medium, and hard levels. Various question types are included, such as multiple-choice, direct queries, and true/false assessments.

    Response Formats

    To accommodate diverse learning experiences, our dataset incorporates different types of answers depending on the prompt and provides step-by-step rationales. The detailed rationale helps the language model build a reasoning process for complex questions.

    These responses encompass text strings, numerical values, and date and time formats, enhancing the language model's ability to generate reliable, coherent, and contextually appropriate answers.

    Data Format and Annotation Details

    This fully labeled Kannada Chain of Thought Prompt Completion Dataset is available in JSON and CSV formats. It includes annotation details such as a unique ID, prompt, prompt type, prompt complexity, prompt category, domain, response, rationale, response type, and rich text presence.

    Quality and Accuracy

    Our dataset upholds the highest standards of quality and accuracy. Each prompt undergoes meticulous validation, and the corresponding responses and rationales are thoroughly verified. We prioritize inclusivity, ensuring that the dataset incorporates prompts and completions representing diverse perspectives and writing styles, maintaining an unbiased and discrimination-free stance.

    The Kannada version is grammatically accurate without any spelling or grammatical errors. No copyrighted, toxic, or harmful content is used during the construction of this dataset.

    Continuous Updates and Customization

    The entire dataset was prepared with the assistance of human curators from the FutureBeeAI crowd community. Ongoing efforts are made to add more assets to this dataset, ensuring its growth and relevance. Additionally, FutureBeeAI offers the ability to gather custom chain of thought prompt completion data tailored to specific needs, providing flexibility and customization options.

    License

    The dataset, created by FutureBeeAI, is now available for commercial use. Researchers, data scientists, and developers can leverage this fully labeled and ready-to-deploy Kannada Chain of Thought Prompt Completion Dataset to enhance the rationale and accurate response generation capabilities of their generative AI models and explore new approaches to NLP tasks.

