Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
The dataset includes all chat conversations generated by GPT-4 that are hosted in open Hugging Face datasets. Everything is converted to the same format so the datasets can be easily merged and used for large-scale training of LLMs.
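To make the "same format" idea concrete, here is a minimal Python sketch that maps records from two differently-shaped chat datasets onto one shared schema; the target schema and both source layouts are illustrative assumptions, not this dataset's documented format.

```python
# Minimal sketch of normalizing two differently-shaped chat datasets into
# one shared schema. The target schema and both source layouts are
# illustrative assumptions, not this dataset's documented format.

def normalize_sharegpt(record):
    """ShareGPT-style records: {'conversations': [{'from': ..., 'value': ...}]}."""
    role_map = {"human": "user", "gpt": "assistant", "system": "system"}
    return {"conversations": [
        {"role": role_map.get(turn["from"], "user"), "content": turn["value"]}
        for turn in record["conversations"]
    ]}

def normalize_instruction(record):
    """Alpaca-style records: {'instruction': ..., 'output': ...}."""
    return {"conversations": [
        {"role": "user", "content": record["instruction"]},
        {"role": "assistant", "content": record["output"]},
    ]}

# Two toy source records in the two layouts above.
sharegpt_rows = [{"conversations": [{"from": "human", "value": "Hi"},
                                    {"from": "gpt", "value": "Hello!"}]}]
alpaca_rows = [{"instruction": "Say hi.", "output": "Hi there!"}]

# Once normalized, merging is plain concatenation.
merged = ([normalize_sharegpt(r) for r in sharegpt_rows]
          + [normalize_instruction(r) for r in alpaca_rows])
print(merged)
```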
This dataset is a collection of several individual chat datasets. If you use this dataset in your research, please credit the original authors of the internal datasets.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
"gpt3.5-gpt4-input-output-echram.zip":
Input to and output from GPT-3.5 and GPT-4 based on the ECHR dataset published in JSON format in this paper, for argument component classification only, i.e., clauses that are argumentative (conclusion/premise), extracted from the JSON file.
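As a rough illustration of that extraction step, the sketch below filters argumentative clauses out of a JSON case file; the file name and the 'clauses'/'type' field names are hypothetical, since the actual ECHR JSON schema is not specified here.

```python
import json

# Hypothetical file and field names; the real ECHR JSON schema may differ.
with open("echr_case.json") as f:
    case = json.load(f)

# Keep only argumentative clauses (premises and conclusions);
# non-argumentative clauses are discarded.
argumentative = [
    clause for clause in case["clauses"]
    if clause.get("type") in ("premise", "conclusion")
]

for clause in argumentative:
    print(clause["type"], "|", clause["text"])
```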
Note: Output of the model is under OpenAI Terms & policies.
Please also cite our paper if you use this dataset: Performance analysis of large language models in the domain of legal argument mining
You can copy the BibTeX entry below.
@ARTICLE{10.3389/frai.2023.1278796,
AUTHOR={Al Zubaer, Abdullah and Granitzer, Michael and Mitrović, Jelena},
TITLE={Performance analysis of large language models in the domain of legal argument mining},
JOURNAL={Frontiers in Artificial Intelligence},
VOLUME={6},
YEAR={2023},
URL={https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2023.1278796},
DOI={10.3389/frai.2023.1278796},
ISSN={2624-8212},
ABSTRACT={Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the model's performance considering diverse prompt formulation and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our result statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT 3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.}}
GPT-3's water consumption for the training phase was estimated at roughly 4.8 billion liters of water, assuming the model was trained in Microsoft's Iowa data center (OpenAI has disclosed that this data center was used for training parts of the GPT-4 model). If the model had been fully trained in the Washington data center, water consumption could have been as high as 15 billion liters. That would have amounted to more than Microsoft's total water withdrawals in 2023.
Dataset Card for "GPT4-8K"
Dataset Description
This dataset was generated using GPT-4, a powerful language model developed by OpenAI. It contains a collection of dialogs between a user and an assistant from OpenChat, along with additional information.
Dataset Configurations
The dataset includes the following configurations:
Config Name: default
Data Files: split: train, path: data/train-*
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/erfanzar/GPT4-8K.
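For reference, a minimal sketch of loading the default configuration's train split with the Hugging Face datasets library; the dataset path comes from the URL above.

```python
from datasets import load_dataset

# Load the train split of the GPT4-8K dataset (default config).
ds = load_dataset("erfanzar/GPT4-8K", split="train")

print(ds)      # row count and column names
print(ds[0])   # first dialog record
```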
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The data was generated by GPT-4 and is therefore subject to the OpenAI Terms of Service. The tool used to generate the data, airoboros, is Apache-2.0 licensed. Specific areas of focus for this training data:
- trivia
- math
- nonsensical math
- coding
- closed context question answering
- closed context question answering, with multiple contexts to choose from as confounding factors
- writing
- multiple choice
Usage and License Notices
All airoboros models and datasets are intended and licensed for research use only.… See the full description on the dataset page: https://huggingface.co/datasets/jondurbin/airoboros-gpt4.
https://creativecommons.org/publicdomain/zero/1.0/
By Huggingface Hub [source]
This dataset contains 100,000 feedback responses from GPT-4 AI models, along with rubrics designed to evaluate both absolute and ranking scores. Each response is collected through a comprehensive evaluation process that takes into account the model's feedback, instruction, scoring criteria, reference answer, and input. This data provides researchers and developers with valuable insights into the performance of their AI models on various tasks, as well as the ability to compare them against one another using precise and accurate measures. Each response is accompanied by five descriptive scores that give a detailed overview of its quality: relevance to the given input, accuracy with respect to the provided reference answer, coherence between different parts of the output such as grammar and organization, fluency of expression without errors or unnecessary repetitions, and overall quality accounting for all other factors combined. With this dataset at your disposal, you can evaluate each output qualitatively without having to manually inspect every single response.
This dataset contains feedback from GPT-4 models, along with associated rubrics for absolute and ranking scoring. It can be used to evaluate the performance of GPT-4 models on different challenging tasks.
In order to use this dataset effectively, it is important to understand the data provided in each column:
- orig_feedback – Feedback given by the original GPT-4 model
- orig_score2_description – Description of the second score given to the original GPT-4 model
- orig_reference_answer – Reference answer used to evaluate the original GPT-4 model
- output – Output from the fine-grained evaluation
- orig_response – Response from the original GPT-4 model
- orig_criteria – Criteria used to evaluate the original GPT-4 model
- orig_instruction – Instruction given to the original GPT-4 model
- orig_score3_description – Description of the third score given to the original GPT-4 model
- Data-driven evaluation of GPT-4 models using the absolute and ranking scores collected from this dataset.
- Training a deep learning model to automate the assessment of GPT-4 responses based on the rubrics provided in this dataset.
- Building a semantic search engine using GPT-4 that is able to identify relevant responses more accurately with the help of this dataset's data collection metrics and rubrics for scoring.
If you use this dataset in your research, please credit the original authors.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv

| Column name | Description |
|:---|:---|
| orig_feedback | Feedback from the evaluator. (Text) |
| orig_score2_description | Description of the second score given by the evaluator. (Text) |
| orig_reference_answer | Reference answer used to evaluate the model response. (Text) |
| output | Output from the GPT-4 model. (Text) |
| orig_response | Original response from the GPT-4 model. (Text) |
| orig_criteria | Criteria used by the evaluator to rate the response. (Text) |
| orig_instruction | Instructions provided by the evaluator. (Text) |
| orig_score3_description | Description of the third score given by the evaluator. (Text) |
| orig_score5_description | Description of the fifth score given by the evaluator. (Text) |
| orig_score1_description | Description of the first score given by the evaluator. (Text) |
| input | Input given to the evaluation. (Text) |
| orig_score4_description | Description of the fourth score given by the evaluator. (Text) |
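As a quick sanity check, a minimal pandas sketch for loading train.csv and inspecting the columns described above; the local file path is an assumption.

```python
import pandas as pd

# Path is an assumption; point it at your local copy of train.csv.
df = pd.read_csv("train.csv")

print(df.columns.tolist())  # should match the column table above
print(df[["orig_instruction", "orig_response", "orig_feedback"]].head())
```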
https://www.datainsightsmarket.com/privacy-policy
The Generative Pre-trained Transformer (GPT) market is experiencing explosive growth, projected to reach a substantial USD 35,000 million by 2025, driven by an impressive Compound Annual Growth Rate (CAGR) of 35%. This rapid expansion is fueled by the increasing adoption of advanced AI technologies across diverse applications, from enhancing customer service in large enterprises to empowering small and medium-sized businesses (SMEs) with innovative content creation and automation tools. The evolution of GPT models, particularly the advanced capabilities of GPT-4, is a significant driver, offering more sophisticated language understanding, generation, and reasoning. This surge in demand is also attributed to the growing need for personalized user experiences, efficient data analysis, and the development of intelligent applications that can streamline complex tasks. Companies are increasingly investing in GPT-powered solutions to gain a competitive edge, improve operational efficiency, and unlock new revenue streams. The market is poised for continued robust expansion through 2033, as the capabilities of GPT technology continue to mature and integrate more deeply into business processes.

Key trends include the rise of specialized GPT models tailored for specific industries, the development of multimodal GPTs capable of processing and generating various forms of data (text, images, audio), and the growing focus on ethical AI development and deployment. While the market benefits from substantial growth drivers, potential restraints include the high computational costs associated with training and running large GPT models, ongoing concerns regarding data privacy and security, and the need for skilled AI professionals to effectively implement and manage these solutions. Nonetheless, the overarching trend points towards a transformative impact of GPT across nearly every sector, with significant opportunities for innovation and market leadership.

This report delves into the dynamic Generative Pre-trained Transformer (GPT) market, encompassing its current state and future trajectory from 2019-2033. With 2025 serving as the Base Year and Estimated Year, the Forecast Period spans 2025-2033, building upon the Historical Period of 2019-2024. The analysis will quantify market opportunities, projecting significant growth, potentially reaching hundreds of millions of dollars in market value.
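For readers unfamiliar with how such projections compound, here is a back-of-envelope illustration of the CAGR formula using the report's stated 2025 base and growth rate; the resulting 2033 figure is an output of the formula, not a number quoted by the report.

```python
# Compound growth: value_n = base * (1 + CAGR) ** n_years.
base_2025 = 35_000  # USD million, the report's 2025 estimate
cagr = 0.35         # 35% compound annual growth rate
years = 2033 - 2025

projection_2033 = base_2025 * (1 + cagr) ** years
print(f"Implied 2033 market size: USD {projection_2033:,.0f} million")
```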
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study provides a comprehensive review of OpenAI's Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings like healthcare. A diverse team, including experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics, analyzed the report against established peer review guidelines. The GPT-4 report shows a significant commitment to transparent AI research, particularly in creating a systems card for risk assessment and mitigation. However, it reveals limitations such as restricted access to training data, inadequate confidence and uncertainty estimations, and concerns over privacy and intellectual property rights. Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card. On the other hand, the lack of clarity in training processes and data raises concerns about encoded biases and interests in GPT-4. The report also lacks confidence and uncertainty estimations, crucial in high-risk areas like healthcare, and fails to address potential privacy and intellectual property issues. Furthermore, this study emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations such as improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4's report is a step towards open discussions on LLMs, more extensive interdisciplinary reviews are essential for addressing bias, harm, and risk concerns, especially in high-risk domains. The review aims to expand the understanding of LLMs in general and highlights the need for new forms of reflection on how LLMs are reviewed, on the data required for effective evaluation, and on addressing critical issues like bias and risk.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Multi-Turn Conversational Prompts from ChatGPT-4 (10K+ Tokens)
Abstract: This dataset offers a valuable collection of multi-turn conversational prompts generated by ChatGPT-4, carefully curated for diverse prompt styles (chatml, gemma, llama). Each prompt exceeds 10,000 tokens, providing ample context and inspiration for training and evaluating large language models. Ideal for researchers and developers interested in exploring advanced conversational AI capabilities.
Table of Contents:… See the full description on the dataset page: https://huggingface.co/datasets/erfanzar/GPT-4-Prompts.
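To make the prompt styles concrete, here is a minimal sketch that renders one dialog in the ChatML style named above; the dialog content is invented, and the gemma and llama templates differ in their delimiter tokens.

```python
# ChatML wraps each turn in <|im_start|>role ... <|im_end|> markers.
# The dialog below is invented for illustration.
dialog = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the plot of Hamlet."},
    {"role": "assistant", "content": "A Danish prince avenges his father..."},
]

def to_chatml(turns):
    return "".join(
        f"<|im_start|>{t['role']}\n{t['content']}<|im_end|>\n" for t in turns
    )

print(to_chatml(dialog))
```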
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accuracy percentage of different models. Our model achieved 82.92% accuracy on MedQA’s MCMLE, surpassing GPT-4’s 71.07%, highlighting the efficacy of our architecture and training methods [35, 36]. Our system achieved a 64.02% accuracy on the USMLE, lower than GPT-4’s 74.71%, primarily due to the evidence-based categorization in the datasets.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
LLM Fine-Tuning Dataset - 4,000,000+ logs, 32 languages
The dataset contains over 4 million logs written in 32 languages and is tailored for LLM training. It includes log and response pairs from 3 models, and is designed for language models and instruction fine-tuning to achieve improved performance in various NLP tasks.
Models used for text generation:
- GPT-3.5
- GPT-4
- Uncensored GPT version (not included in the sample)
Languages in the… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/llm-training-dataset.
This is the GPT4-LLM dataset from https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM. It has been filtered of all OpenAI disclaimers and refusals. (Disclaimer: it may have removed some additional things besides just OAI disclaimers, as I used the following script, which is a bit broader: https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered/blob/main/wizardlm_clean.py) There is a modified script of that in the repo that was used specifically for… See the full description on the dataset page: https://huggingface.co/datasets/teknium/GPT4-LLM-Cleaned.
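The linked wizardlm_clean.py script is the authoritative reference for the filtering; as a rough illustration of the idea, a sketch like the following drops any example whose output matches common disclaimer phrases. The phrase list here is a small invented sample, not the script's actual list.

```python
import re

# Small invented sample of disclaimer phrases; the real cleaning script
# uses a much broader list.
DISCLAIMER_PATTERNS = [
    r"as an ai language model",
    r"i('m| am) sorry, but i (can('|no)t|am unable to)",
    r"openai",
]
pattern = re.compile("|".join(DISCLAIMER_PATTERNS), re.IGNORECASE)

def is_clean(example):
    """Keep an example only if its output contains no disclaimer phrase."""
    return not pattern.search(example["output"])

examples = [
    {"instruction": "Explain recursion.", "output": "Recursion is..."},
    {"instruction": "Who are you?", "output": "As an AI language model, I..."},
]
cleaned = [ex for ex in examples if is_clean(ex)]
print(len(cleaned))  # 1
```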
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is the datamix created by Team 🔍 📝 🕵️‍♂️ 🤖 during the LLM - Detect AI Generated Text competition. This dataset helped us win the competition. It supports a text-classification task that separates LLM-generated essays from student-written ones.
It was developed incrementally, focusing on size, diversity, and complexity. For each datamix iteration, we attempted to plug blind spots of the previous generation's models while maintaining robustness.
To maximally leverage in-domain human texts, we used the entire Persuade corpus comprising all 15 prompts. We also included diverse human texts from sources such as the OpenAI GPT-2 output dataset, the ELLIPSE corpus, NarrativeQA, Wikipedia, the NLTK Brown corpus, and IMDB movie reviews.
Sources for our generated essays can be grouped under four categories:
- Proprietary LLMs (gpt-3.5, gpt-4, claude, cohere, gemini, palm)
- Open source LLMs (llama, falcon, mistral, mixtral)
- Existing LLM generated text datasets (DAIGT V2 subset, OUTFOX, Ghostbuster, gpt-2-output-dataset)
- Synthetic dataset made by T5
We used a wide variety of generation configs and prompting strategies to promote diversity & complexity in the data. Generated essays leveraged a combination of the following:
- Contrastive search
- Use of guidance scale, typical_p, suppress_tokens
- High temperature & large values of top-k
- Fill-in-the-blank prompting: randomly mask words in an essay and ask the LLM to reconstruct the original essay (similar to MLM)
- Prompting without source texts
- Prompting with source texts
- Prompting to rewrite existing essays
Finally, we incorporated augmented essays to make our models aware of typical attacks on LLM content detection systems and obfuscations present in the provided training data. We mainly used a combination of the following augmentations on a random subset of essays (a sketch of the character-level ones appears below):
- Spelling correction
- Deletion/insertion/swapping of characters
- Replacement with synonyms
- Introducing obfuscations
- Back translation
- Random capitalization
- Sentence swapping
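As a minimal illustration of the character-level augmentations (not the team's actual implementation), this sketch randomly deletes, inserts, or swaps characters at a configurable rate:

```python
import random
import string

def perturb_characters(text: str, rate: float = 0.02, seed: int = 0) -> str:
    """Randomly delete, insert, or swap characters at roughly `rate` of
    positions. Illustrative only, not the competition code."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate:
            op = rng.choice(["delete", "insert", "swap"])
            if op == "delete":
                i += 1
                continue
            if op == "insert":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])
                i += 2
                continue
        out.append(chars[i])
        i += 1
    return "".join(out)

print(perturb_characters("The quick brown fox jumps over the lazy dog.", rate=0.1))
```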
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generative pre-trained transformers (GPT) have recently demonstrated excellent performance in various natural language tasks. The development of ChatGPT and the recently released GPT-4 model has shown competence in solving complex and higher-order reasoning tasks without further training or fine-tuning. However, the applicability and strength of these models in classifying legal texts in the context of argument mining are yet to be realized and have not been tested thoroughly. In this study, we investigate the effectiveness of GPT-like models, specifically GPT-3.5 and GPT-4, for argument mining via prompting. We closely study the model's performance considering diverse prompt formulation and example selection in the prompt via semantic search using state-of-the-art embedding models from OpenAI and sentence transformers. We primarily concentrate on the argument component classification task on the legal corpus from the European Court of Human Rights. To address these models' inherent non-deterministic nature and make our result statistically sound, we conducted 5-fold cross-validation on the test set. Our experiments demonstrate, quite surprisingly, that relatively small domain-specific models outperform GPT 3.5 and GPT-4 in the F1-score for premise and conclusion classes, with 1.9% and 12% improvements, respectively. We hypothesize that the performance drop indirectly reflects the complexity of the structure in the dataset, which we verify through prompt and data analysis. Nevertheless, our results demonstrate a noteworthy variation in the performance of GPT models based on prompt formulation. We observe comparable performance between the two embedding models, with a slight improvement in the local model's ability for prompt selection. This suggests that local models are as semantically rich as the embeddings from the OpenAI model. Our results indicate that the structure of prompts significantly impacts the performance of GPT models and should be considered when designing them.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
- jojo_pii_refine, pii_with_document v1-v4: generated by gpt-3.5
- gemini-1.0-pro-*: generated by gemini-1.0-pro; in the 0.9k and 1.2k sets the writers' occupation is non-student, while in the 1.5k set the writers' occupation is student at different stages
- gpt4_10k.json: generated by gpt-4
jojo_pii_refine has versions 1 to 20; versions 1-12 are used as jojo_5k.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Performance of GPT-3.5, GPT-4, and GPT-4o according to different subjects.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is a subsample of the original OpenOrca dataset.
The OpenOrca dataset is a collection of augmented FLAN Collection data. Currently ~1M GPT-4 completions, and ~3.2M GPT-3.5 completions. It is tabularized in alignment with the distributions presented in the ORCA paper and currently represents a partial completion of the full intended dataset, with ongoing generation to expand its scope. The data is primarily used for training and evaluation in the field of natural language processing.
Each data instance in this dataset represents entries from the FLAN collection that have been augmented by submitting a listed question to either the GPT-4 or GPT-3.5 model. The response generated by the model is then recorded in the dataset.
Original Dataset:
OpenOrca (https://huggingface.co/datasets/Open-Orca/OpenOrca)
Subsampling Methodology:
This subsample preserves the original distribution of the 17 unique 'system_prompt' values available in this feature in OpenOrca. We employed a stratified random sampling approach, selecting 5% (a 0.05 ratio) of the data points from each prompt-style category. This ensures that the subsample retains the relative representation of different 'system_prompt' values while reducing the overall dataset size for focused analysis. While the original dataset is around 4M rows, this subsample is about 200K rows.
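A minimal pandas sketch of that stratified subsampling; the 'system_prompt' column name comes from the dataset card, while the toy data and seed are placeholders.

```python
import pandas as pd

def stratified_subsample(df: pd.DataFrame, ratio: float = 0.05,
                         seed: int = 42) -> pd.DataFrame:
    # Sample `ratio` of rows from each system_prompt stratum, so the
    # relative representation of the prompt styles is preserved.
    return df.groupby("system_prompt").sample(frac=ratio, random_state=seed)

# Toy stand-in for the real ~4M-row table.
df = pd.DataFrame({
    "system_prompt": ["a"] * 60 + ["b"] * 40,
    "question": [f"q{i}" for i in range(100)],
})
print(stratified_subsample(df)["system_prompt"].value_counts())  # a: 3, b: 2
```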
Supported Tasks and Leaderboards:
This dataset supports a range of tasks including language modeling, text generation, and text augmentation. It has been instrumental in the generation of multiple high-performing model checkpoints which have exhibited exceptional performance in our unit testing. Further information on leaderboards will be updated as they become available.
Use Cases
The dataset can be used for tasks related to language understanding, natural language processing, machine learning model training, and model performance evaluation.
Dataset Structure
Data Instances
A data instance in this dataset represents entries from the FLAN collection which have been augmented by submitting the listed question to either GPT-4 or GPT-3.5. The response is then entered into the response field.
Features
- 'id': a unique numbered identifier which includes one of 'niv', 't0', 'cot', or 'flan' to indicate which source FLAN Collection submix the 'question' is sourced from
- 'system_prompt': the system prompt presented to the GPT-3.5 or GPT-4 API for the datapoint
- 'question': a question entry as provided by the FLAN Collection
- 'response': a response to that question received from a query to either GPT-3.5 or GPT-4
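Putting those features together, a record looks roughly like the following; the values are invented placeholders, and only the field names come from the card.

```python
# Invented placeholder record; only the field names come from the card.
instance = {
    "id": "flan.123456",
    "system_prompt": "You are a helpful assistant that answers concisely.",
    "question": "What is the capital of France?",
    "response": "The capital of France is Paris.",
}
```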
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study examines the multilingual capabilities of GPT, focusing on its handling of syntactic ambiguity across English, Korean, and Japanese. We investigate whether GPT can capture language-specific attachment preferences or if it relies primarily on English-centric training patterns. Using ambiguous relative clauses as a testing ground, we assess GPT's interpretation tendencies across language contexts. Our findings reveal that, while the performance of GPT (GPT-3.5-turbo, GPT-4-turbo, GPT-4o) aligns with native English speakers' preferred interpretations, it overgeneralizes this interpretation in Korean and lacks clear preferences in Japanese, despite distinct attachment biases among native speakers of these languages. The newer, smaller-scale models, o1-mini and o3-mini, further reinforce this trend by closely mirroring English attachment patterns in both Korean and Japanese. Overall results suggest that GPT's multilingual proficiency is limited, likely reflecting a bias toward high-resource languages like English, although differences in model size and tuning strategies may partially mitigate the extent of English-centric generalization. While GPT models demonstrate aspects of human-like language processing, our findings underscore the need for further refinement to achieve a more nuanced engagement with linguistic diversity across languages.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for the paper entitled "Flexible, Model-Agnostic Method for Materials Data Extraction from Text Using General Purpose Language Models" by Maciej P. Polak, Shrey Modi, Anna Latosinska, Jinming Zhang, Ching-Wen Wang, Shanonan Wang, Ayan Deep Hazra, and Dane Morgan.
- MPPolak_BulkModulus_ValidationData.xlsx: a dataset of bulk modulus sentences, positive (containing bulk modulus data) and negative (not containing data), used for model assessment.
- MPPolak_BulkModulus_AllTrainData.xlsx: a dataset of bulk modulus sentences, positive (containing bulk modulus data) and negative (not containing data), used for fine-tuning of the model and model assessment.
- MPPolak_CritCoolRate_Dataset.xlsx: a dataset of critical cooling rates for metallic glasses developed in this paper with the method presented in the paper, consisting of names of materials, values of critical cooling rates, their units, and DOIs of the source documents.
- MPPolak_DataExtraction_codes.zip: simple example codes necessary to reproduce the results. The provided 'positive' and 'negative' files are shortened versions of the training data, allowing for quick execution and testing. The 'pos' and 'neg' files contain full testing sets. The 'plotting' directory contains data and scripts that allow reproduction of the figures.