100+ datasets found
  1. Data from: TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability...

    • borealisdata.ca
    Updated Jul 30, 2024
    Cite
    Aisha Khatun; Dan Brown (2024). TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability [Dataset]. http://doi.org/10.5683/SP3/5MZWBV
    Explore at:
    Croissant — a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jul 30, 2024
    Dataset provided by
    Borealis
    Authors
    Aisha Khatun; Dan Brown
    License

    Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Large Language Model (LLM) evaluation is currently one of the most important areas of research, with existing benchmarks proving to be insufficient and not completely representative of LLMs' various capabilities. We present a curated collection of challenging statements on sensitive topics for LLM benchmarking called TruthEval. These statements were curated by hand and contain known truth values. The categories were chosen to distinguish LLMs' abilities from their stochastic nature. Details of collection method and use cases can be found in this paper: TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability

  2. FileMarket | Telegram Users Geolocation Data with IP & Consent | 50,000...

    • datarade.ai
    Updated Aug 18, 2024
    + more versions
    Cite
    FileMarket (2024). FileMarket | Telegram Users Geolocation Data with IP & Consent | 50,000 Records | AI, ML, DL & LLM Training Data [Dataset]. https://datarade.ai/data-products/filemarket-telegram-users-geolocation-data-with-ip-consen-filemarket
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Aug 18, 2024
    Dataset authored and provided by
    FileMarket
    Area covered
    Malaysia, Syrian Arab Republic, Portugal, Korea (Republic of), Anguilla, Uzbekistan, Martinique, Kiribati, Gambia, Thailand
    Description

    This dataset offers a comprehensive collection of Telegram users' geolocation data, including IP addresses, with full user consent, covering 50,000 records. This data is specifically tailored for use in AI, ML, DL, and LLM models, as well as applications requiring Geographic Data and Social Media Data. The dataset provides critical geospatial information, making it a valuable resource for developing location-based services, targeted marketing strategies, and more.

    What Makes This Data Unique? This dataset is unique due to its focus on geolocation data tied to Telegram users, a platform with a global user base. It includes IP to Geolocation Data, offering precise geospatial insights that are essential for accurate geographic analysis. The inclusion of user consent ensures that the data is ethically sourced and legally compliant. The dataset's broad coverage across various regions makes it particularly valuable for AI and machine learning models that require diverse, real-world data inputs.

    Data Sourcing: The data is collected through a network of in-app tasks across different mini-apps within Telegram. Users participate in these tasks voluntarily, providing explicit consent to share their geolocation and IP information. The data is collected in real-time, capturing accurate geospatial details as users interact with various Telegram mini-apps. This method of data collection ensures that the information is both relevant and up-to-date, making it highly valuable for applications that require current location data.

    Primary Use-Cases: This dataset is highly versatile and can be applied across multiple categories, including:

    • IP to Geolocation Data: The dataset provides precise mapping of IP addresses to geographical locations, making it ideal for applications that require accurate geolocation services.
    • Geographic Data: The geospatial information contained in the dataset supports a wide range of geographic analysis, including regional behavior studies and location-based service optimization.
    • Social Media Data: The dataset's integration with Telegram users' activities provides insights into social media behaviors across different regions, enhancing social media analytics and targeted marketing.
    • Large Language Model (LLM) Data: The geolocation data can be used to train LLMs to better understand and generate content that is contextually relevant to specific regions.
    • Deep Learning (DL) Data: The dataset is ideal for training deep learning models that require accurate and diverse geospatial inputs, such as those used in autonomous systems and advanced geographic analytics.

    Integration with Broader Data Offering: This geolocation dataset is a valuable addition to the broader data offerings from FileMarket. It can be combined with other datasets, such as web browsing behavior or social media activity data, to create comprehensive AI models that provide deep insights into user behaviors across different contexts. Whether used independently or as part of a larger data strategy, this dataset offers unique value for developers and data scientists focused on enhancing their models with precise, consented geospatial data.

  3. FileMarket | AI & ML Training Data from Sotheby's International Realty | Real...

    • datarade.ai
    Updated Aug 30, 2024
    + more versions
    Cite
    FileMarket (2024). FileMarket |AI & ML Training Data from Sotheby's International Realty | Real Estate Dataset for AI Agents | LLM | ML | DL Training Data [Dataset]. https://datarade.ai/data-products/filemarket-ai-ml-training-data-from-sotheby-s-internationa-filemarket
    Explore at:
    Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
    Dataset updated
    Aug 30, 2024
    Dataset authored and provided by
    FileMarket
    Area covered
    Mali, Virgin Islands (British), Montenegro, Ethiopia, Ukraine, Palestine, United Republic of, Bolivia (Plurinational State of), Sint Maarten (Dutch part), Togo
    Description

    The Sotheby's International Realty dataset provides a premium collection of real estate data, ideal for training AI models and enhancing various business operations in the luxury real estate market. Our data is carefully curated and prepared to ensure seamless integration with your AI systems, allowing you to innovate and optimize your business processes with minimal effort. This dataset is versatile and suitable for small boutique agencies, mid-sized firms, and large real estate enterprises.

    Key features include:

    • Custom Delivery Options: Data can be delivered through REST API, WebSockets, tRPC/gRPC, or other preferred methods, ensuring smooth integration with your AI infrastructure.
    • Vectorized Data: Choose from multiple embedding models (Llama, ChatGPT, etc.) and vector databases (Chroma, FAISS, QdrantVectorStore) for optimal AI model performance and vectorized data processing (see the sketch after this list).
    • Comprehensive Data Coverage: Includes detailed property listings, luxury market trends, customer engagement data, and agent performance metrics, providing a robust foundation for AI-driven analytics.
    • Ease of Integration: Our dataset is designed for easy integration with existing AI systems, providing the flexibility to create AI-driven analytics, notifications, and other business applications with minimal hassle.
    • Additional Services: Beyond data provision, we offer AI agent development and integration services, helping you seamlessly incorporate AI into your business workflows.

    With this dataset, you can enhance property valuation models, optimize customer engagement strategies, and perform advanced market analysis using AI-driven insights. This dataset is perfect for training AI models that require high-quality, structured data, helping luxury real estate businesses stay competitive in a dynamic market.
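    No client code accompanies this listing, so the following is only a minimal sketch of the "vectorized data" idea named above: embedding free-text property descriptions and indexing them in FAISS (one of the vector stores mentioned) for similarity search. The listing texts and the sentence-transformers model are illustrative assumptions, not part of the product.

```python
# Hypothetical example: embed property descriptions and index them in FAISS.
# Requires `pip install sentence-transformers faiss-cpu`; all data below is invented.
import faiss
from sentence_transformers import SentenceTransformer

listings = [
    "Four-bedroom penthouse with harbour views, 320 sqm, listed at 4.8M USD",
    "Restored 19th-century villa, five bedrooms, private garden, 6.2M EUR",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for the embedding models named above
vectors = model.encode(listings, normalize_embeddings=True)

index = faiss.IndexFlatIP(int(vectors.shape[1]))  # inner product on normalized vectors = cosine similarity
index.add(vectors)

query = model.encode(["luxury penthouse with sea view"], normalize_embeddings=True)
scores, ids = index.search(query, 1)
print(listings[int(ids[0][0])], float(scores[0][0]))
```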

  4. Replication Package: Tracking the Moving Target: A Framework for Continuous...

    • zenodo.org
    application/gzip, bin +1
    Updated Apr 27, 2025
    Cite
    Maider Azanza Sesé; Beatriz Pérez Lamancha; Eneko Pizarro (2025). Replication Package: Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry [Dataset]. http://doi.org/10.5281/zenodo.15274212
    Explore at:
    Available download formats: pdf, bin, application/gzip
    Dataset updated
    Apr 27, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Maider Azanza Sesé; Beatriz Pérez Lamancha; Eneko Pizarro
    License

    Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication Package: Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry

    Version: 1.0 (Date: April 27, 2025)
    DOI: https://doi.org/10.5281/zenodo.14779767

    Paper Information

    Title: Tracking the Moving Target: A Framework for Continuous Evaluation of LLM Test Generation in Industry
    Authors: Maider Azanza, Beatriz Pérez Lamancha, Eneko Pizarro
    Publication: International Conference on Evaluation and Assessment in Software Engineering (EASE), 2025 edition.

    Package Overview

    This repository contains the replication package for the research paper cited above. It provides the necessary data, source code, and prompts to understand, verify, and potentially extend our findings on the continuous evaluation of LLM-based test generation in an industrial context. The data reflects evaluations conducted between November 2024 and January 2025.

    Package Contents

    1. Metrics-Results-by-Function.7z (Archive, requires 7-Zip or a compatible tool to extract; see the extraction sketch after this list)

      • Description: Contains the detailed, raw, and processed metric results for each of the 7 Java methods and classes evaluated in the study.
      • Structure: Inside this archive, you will find 7 individual .zip files, one for each function (e.g., addUser-Metrics-Results.zip, assemble-Metrics-Results.zip, ...).
      • Contents (per function zip): Each function-specific zip file typically includes:
        • Raw test cases generated by the evaluated LLMs.
        • Metric measurements (e.g., code coverage reports from SonarQube/JaCoCo).
        • Analysis or intermediate conclusions specific to that function.
        • The specific prompt variations used for that function, if applicable beyond the main prompt.
      • Purpose: Allows for in-depth analysis of LLM performance on specific methods and verification of the metric collection process described in the paper. Data collected between November 2024 and January 2025.
    2. Metric Results by function Nov. 2024 - Jan.2025.pdf (PDF Document)

      • Description: Provides a consolidated tabular view of the key raw metrics collected for each function and LLM evaluated during the November 2024 - January 2025 period.
      • Contents: Tables summarizing metrics like code coverage, number of generated tests, expert assessment scores, etc., broken down by function and LLM. This data is directly derived from the detailed results in Metrics-Results-by-Function.7z.
      • Purpose: Offers a more detailed quantitative overview than the aggregated summary, facilitating direct comparison of raw performance metrics across functions and LLMs without needing to extract all archives.
    3. Aggregated Results by function Nov. 2024 - Jan.2025.pdf (PDF Document)

      • Description: Presents a high-level summary of the evaluation results across all tested methods and LLMs.
      • Contents: Includes an aggregated metric table showing overall performance trends, potentially including the weighted metrics discussed in the paper.
      • Purpose: Provides a quick overview of the main findings and comparative performance of the LLMs according to the evaluation framework.
    4. Prompt_for_Integration_Testing-2025.pdf (PDF Document)

      • Description: The final, refined version of the prompt provided to the LLMs for generating integration test cases.
      • Contents: Details the instructions, context (including source code snippets or descriptions), constraints, and desired output format given to the LLMs. Reflects the prompt-chaining methodology described in the paper.
      • Purpose: Enables understanding of how the LLMs were instructed and allows others to reuse or adapt the prompt engineering approach.
    5. sources.tar.gz (Compressed Tar Archive, requires tar or compatible tool to extract)

      • Description: Contains the original Java source code for the 7 methods that were the targets for test generation.
      • Contents:
        • The specific Java files containing the methods under test.
        • Relevant context or dependency information needed to understand the methods' functionality and complexity.
        • May include documentation (e.g., Javadoc) describing the intended behavior of each method.
      • Purpose: Provides the necessary code context for understanding the test generation task and potentially replicating the test execution or analysis.
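    The archives above can be unpacked with standard tools; as a convenience, here is a small, hypothetical Python script that does so. It assumes the third-party py7zr package for the .7z archive (the standard library handles the .tar.gz and the nested per-function .zip files), and the output directory names are arbitrary.

```python
# Hypothetical helper to unpack the replication package; output directories are arbitrary.
import tarfile
import zipfile
from pathlib import Path

import py7zr  # third-party: pip install py7zr

# Java sources for the 7 methods under test
with tarfile.open("sources.tar.gz", "r:gz") as tar:
    tar.extractall("sources")

# Per-function metric results: outer .7z archive, then one .zip per function
with py7zr.SevenZipFile("Metrics-Results-by-Function.7z", mode="r") as archive:
    archive.extractall("metrics")

for inner in Path("metrics").rglob("*-Metrics-Results.zip"):
    with zipfile.ZipFile(inner) as zf:
        zf.extractall(inner.with_suffix(""))
```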
  5. Supplementary Material for "Investigating Software Development Teams...

    • explore.openaire.eu
    Updated Jul 26, 2024
    Cite
    Edna Dias CANEDO; Fabiano Damasceno Sousa FALCAO (2024). Supplementary Material for "Investigating Software Development Teams Members' Perceptions of Data Privacy in the Use of Large Language Models (LLMs)" [Dataset]. http://doi.org/10.5281/zenodo.13139492
    Explore at:
    Dataset updated
    Jul 26, 2024
    Authors
    Edna Dias CANEDO; Fabiano Damasceno Sousa FALCAO
    Description

    ABSTRACT: Context: Large Language Models (LLMs) have revolutionized natural language generation and understanding. However, they raise significant data privacy concerns, especially when sensitive data is processed and stored by third parties. Goal: This paper investigates the perceptions of software development team members regarding data privacy when using LLMs in their professional activities. Additionally, we examine the challenges faced and the practices adopted by these practitioners. Method: We conducted a survey with 78 ICT practitioners from five regions of the country. Results: Software development team members have basic knowledge about data privacy and the LGPD, but most have never received formal training on LLMs and possess only basic knowledge about them. Their main concerns include the leakage of sensitive data and the misuse of personal data. To mitigate risks, they avoid using sensitive data and implement anonymization techniques. The primary challenges practitioners face are ensuring transparency in the use of LLMs and minimizing data collection. Software development team members consider current legislation inadequate for protecting data privacy in the context of LLM use. Conclusions: The results reveal a need to improve knowledge and practices related to data privacy in the context of LLM use. According to software development team members, organizations need to invest in training, develop new tools, and adopt more robust policies to protect user data privacy. They advocate for a multifaceted approach that combines education, technology, and regulation to ensure the safe and responsible use of LLMs.

  6. Research Method Instruction for LLM

    • scidb.cn
    Updated Apr 25, 2025
    + more versions
    Cite
    王猛 (2025). Research Method Instruction for LLM [Dataset]. http://doi.org/10.57760/sciencedb.24108
    Explore at:
    Croissant — a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Apr 25, 2025
    Dataset provided by
    Science Data Bank
    Authors
    王猛
    Description

    The Research Method Instruction Dataset contains 1,602,207 annotated sentences collected from peer-reviewed scientific articles across computer science, engineering, and biomedical domains. Each sentence is labeled as either a research method statement or non-research method statement.

  7. LLM-Assisted Content Analysis (LACA): Coded data and model reasons

    • figshare.com
    txt
    Updated Jun 22, 2023
    Cite
    Rob Chew; Michael Wenger; John Bollenbacher; Jessica Speer; Annice Kim (2023). LLM-Assisted Content Analysis (LACA): Coded data and model reasons [Dataset]. http://doi.org/10.6084/m9.figshare.23291147.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 22, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rob Chew; Michael Wenger; John Bollenbacher; Jessica Speer; Annice Kim
    License

    Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This resource consists of calibration sets (N=100) for 4 publicly available datasets coded using the LLM-Assisted Content Analysis (LACA) method. Each dataset contains the following columns (a small loading sketch follows the list):

    • text_id: Unique ID for each text document
    • code_id: Unique ID for each code category
    • text: Document text that's been coded
    • original_code: Coded response from the original datasets
    • replicated_code: Coded response from the independent coding exercise by our study team
    • model_code: Coded response generated from the LLM (GPT-3.5-turbo)
    • reason: LLM-generated reason for the coding decision
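    As a minimal illustration (not part of the release), the columns above are enough to score agreement between the LLM and the human coders on a calibration set; the filename below is a placeholder and pandas is assumed.

```python
# Hypothetical example: agreement between LLM codes and human codes on one calibration set.
import pandas as pd

df = pd.read_csv("bbc_news_calibration.csv")  # placeholder path; substitute one of the four released files

print("LLM vs. original codes:  ", (df["model_code"] == df["original_code"]).mean())
print("LLM vs. replicated codes:", (df["model_code"] == df["replicated_code"]).mean())
```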

    Additional details on methods and definitions of individual code categories are available in the following paper:

    Chew, R., Bollenbacher, J., Speer, J., Wenger, M., & Kim, A. (2023). LLM-Assisted Content Analysis: Using Large Language Models to Support Deductive Coding.

    Trump Tweets

    Citation: Coe, Kevin, Berger, Julia, Blumling, Allison, Brooks, Katelyn, Giorgi, Elizabeth, Jackson, Jennifer, … Wellman, Mariah. Quantitative Content Analysis of Donald Trump's Twitter, 2017-2019. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2020-04-01. https://doi.org/10.3886/E118603V1 Source: https://www.openicpsr.org/openicpsr/project/118603/version/V1/view

    BBC News

    Citation: Greene, D., & Cunningham, P. (2006). Practical solutions to the problem of diagonal dominance in kernel document clustering. In Proceedings of the 23rd international conference on Machine learning (pp. 377-384). Source: https://www.kaggle.com/datasets/shivamkushwaha/bbc-full-text-document-classification

    Contrarian Claims

    Citation: Coan, T. G., Boussalis, C., Cook, J., & Nanko, M. O. (2021). Computer-assisted classification of contrarian claims about climate change. Scientific reports, 11(1), 22320. Source: https://socialanalytics.ex.ac.uk/cards/data.zip

    Ukraine Water Problems

    Citation: Afanasyev S, N. B, Bodnarchuk T, S. V, M. V, T. V, Yu V, K. G, V. D, Konovalenko O, O. K, E. K, Lietytska O, O. L, V. M, Marushevska O, Mokin V, K. M, Osadcha N, O. I (2013) River Basin Management Plan for Pivdenny Bug: river basin analysis and measures Source: https://www.kaggle.com/datasets/vbmokin/nlp-reports-news-classification

  8. SynthPAI Dataset

    • paperswithcode.com
    Updated Jun 13, 2024
    + more versions
    Cite
    Hanna Yukhymenko; Robin Staab; Mark Vero; Martin Vechev (2024). SynthPAI Dataset [Dataset]. https://paperswithcode.com/dataset/synthpai
    Explore at:
    Dataset updated
    Jun 13, 2024
    Authors
    Hanna Yukhymenko; Robin Staab; Mark Vero; Martin Vechev
    Description

    SynthPAI was created to provide a dataset that can be used to investigate the personal attribute inference (PAI) capabilities of LLMs on online texts. Due to the privacy concerns associated with real-world data, open datasets are rare (effectively non-existent) in the research community. SynthPAI is a synthetic dataset that aims to fill this gap.

    Dataset Details

    Dataset Description

    SynthPAI was created using 300 GPT-4 agents seeded with individual personalities interacting with each other in a simulated online forum and consists of 103 threads and 7823 comments. For each profile, we further provide a set of personal attributes that a human could infer from the profile. We additionally conducted a user study to evaluate the quality of the synthetic comments, establishing that humans can barely distinguish between real and synthetic comments.

    • Curated by: The dataset was created by SRILab at ETH Zurich. It was not created on behalf of any outside entity.
    • Funded by: Two authors of this work are supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) (SERI-funded ERC Consolidator Grant). This project did not, however, receive explicit funding from SERI and was devised independently. Views and opinions expressed are those of the authors only and do not necessarily reflect those of the SERI-funded ERC Consolidator Grant.
    • Shared by: SRILab at ETH Zurich
    • Language(s) (NLP): English
    • License: CC-BY-NC-SA-4.0

    Dataset Sources

    • Repository: https://github.com/eth-sri/SynthPAI
    • Paper: https://arxiv.org/abs/2406.07217

    Uses

    The dataset is intended to be used as a privacy-preserving method of (i) evaluating PAI capabilities of language models and (ii) aiding the development of potential defenses against such automated inferences.

    Direct Use

    As in the associated paper, where we include an analysis of the personal attribute inference (PAI) capabilities of 18 state-of-the-art LLMs across different attributes and on anonymized texts.

    Out-of-Scope Use

    The dataset shall not be used as part of any system that performs attribute inferences on real natural persons without their consent or otherwise maliciously.

    Dataset Structure

    We provide the instance descriptions below; a minimal code illustration follows the attribute list. Each data point consists of a single comment (that can be a top-level post):

    Comment

    author str: unique identifier of the person writing

    username str: corresponding username

    parent_id str: unique identifier of the parent comment

    thread_id str: unique identifier of the thread

    children list[str]: unique identifiers of children comments

    profile Profile: profile making the comment - described below

    text str: text of the comment

    guesses list[dict]: Dict containing model estimates of attributes based on the comment. Only contains attributes for which a prediction exists.

    reviews dict: Dict containing human estimates of attributes based on the comment. Each guess contains a corresponding hardness rating (and certainty rating). Contains all attributes

    The associated profiles are structured as follows

    Profile

    username str: identifier

    attributes: set of personal attributes that describe the user (directly listed below)

    The corresponding attributes and values are

    Attributes

    Age continuous [18-99] The age of a user in years.

    Place of Birth tuple [city, country] The place of birth of a user. We create tuples jointly for city and country in free-text format. (field name: birth_city_country)

    Location tuple [city, country] The current location of a user. We create tuples jointly for city and country in free-text format. (field name: city_country)

    Education free-text We use a free-text field to describe the user's education level. This includes additional details such as the degree and major. To ensure comparability with the evaluation of prior work, we later map these to a categorical scale: high school, college degree, master's degree, PhD.

    Income Level free-text [low, medium, high, very high] The income level of a user. We first generate a continuous income level in the profile's local currency. In our code, we map this to a categorical value considering the distribution of income levels in the respective profile location. For this, we roughly follow the local equivalents of the following reference levels for the US: Low (<30k USD), Middle (30-60k USD), High (60-150k USD), Very High (>150k USD).

    Occupation free-text The occupation of a user, described as a free-text field.

    Relationship Status categorical [single, In a Relationship, married, divorced, widowed] The relationship status of a user as one of 5 categories.

    Sex categorical [Male, Female] Biological Sex of a profile.
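    To make the record layout above concrete, here is a minimal, unofficial sketch of the comment and profile objects as Python dataclasses; the field names follow the dataset card, while the types are approximations rather than the released schema.

```python
# Unofficial sketch of the SynthPAI record layout; types are approximate.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Profile:
    username: str
    attributes: dict[str, str]  # e.g. {"age": "27", "city_country": "Lyon, France", ...}

@dataclass
class Comment:
    author: str                     # unique identifier of the person writing
    username: str                   # corresponding username
    parent_id: str | None           # None for a top-level post
    thread_id: str
    children: list[str] = field(default_factory=list)   # identifiers of child comments
    profile: Profile | None = None                       # profile making the comment
    text: str = ""
    guesses: list[dict] = field(default_factory=list)    # model attribute estimates
    reviews: dict = field(default_factory=dict)          # human estimates with hardness/certainty ratings
```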

    Dataset Creation

    Curation Rationale

    SynthPAI was created to provide a dataset that can be used to investigate the personal attribute inference (PAI) capabilities of LLMs on online texts. Due to the privacy concerns associated with real-world data, open datasets are rare (effectively non-existent) in the research community. SynthPAI is a synthetic dataset that aims to fill this gap. We additionally conducted a user study to evaluate the quality of the synthetic comments, establishing that humans can barely distinguish between real and synthetic comments.

    Source Data

    The dataset is fully synthetic and was created using GPT-4 agents (version gpt-4-1106-preview) seeded with individual personalities interacting with each other in a simulated online forum.

    Data Collection and Processing

    The dataset was created by sampling comments from the agents in threads. A human then inferred a set of personal attributes from sets of comments associated with each profile. Further, it was manually reviewed to remove any offensive or inappropriate content. We give a detailed overview of our dataset-creation procedure in the corresponding paper.

    Annotations

    Annotations are provided by authors of the paper.

    Personal and Sensitive Information

    All contained personal information is purely synthetic and does not relate to any real individual.

    Bias, Risks, and Limitations

    All profiles are synthetic and do not correspond to any real subpopulations. We provide a distribution of the personal attributes of the profiles in the accompanying paper. As the dataset has been created synthetically, data points can inherit limitations (e.g., biases) from the underlying model, GPT-4. While we manually reviewed comments individually, we cannot provide respective guarantees.

    Citation

    BibTeX:

    @misc{2406.07217,
      Author = {Hanna Yukhymenko and Robin Staab and Mark Vero and Martin Vechev},
      Title = {A Synthetic Dataset for Personal Attribute Inference},
      Year = {2024},
      Eprint = {arXiv:2406.07217},
    }

    APA:

    Hanna Yukhymenko, Robin Staab, Mark Vero, Martin Vechev: “A Synthetic Dataset for Personal Attribute Inference”, 2024; arXiv:2406.07217.

    Dataset Card Authors

    Hanna Yukhymenko, Robin Staab, Mark Vero

  9. X (Twitter) IDs and LLM-Generated Analyses for Economic Narratives: Datasets...

    • socialmediaarchive.org
    zip
    Updated Apr 2, 2025
    Cite
    (2025). X (Twitter) IDs and LLM-Generated Analyses for Economic Narratives: Datasets for Pre-pandemic (2007-2020) and Post-LLM training cutoff (2021-2023) [Dataset]. https://socialmediaarchive.org/record/77
    Explore at:
    Available download formats: zip (21172538), zip (131733)
    Dataset updated
    Apr 2, 2025
    Description

    This research comprises two distinct collections of economy-related posts from the X (formerly Twitter) platform – one spanning 2007-2020 (pre-pandemic) and the other 2021-2023 (post LLM training cutoff) – alongside corresponding LLM-generated analyses of the 2021-2023 posts. These collections, curated using targeted keywords, along with the LLM analyses, are provided to facilitate investigations into the potential of economic narratives and their influence. For more information about the data collection methodology, please refer to the paper:
    Almog Gueta, Amir Feder, Zorik Gekhman, Ariel Goldstein, & Roi Reichart (2025). Can LLMs Learn Macroeconomic Narratives from Social Media? arXiv:2406.12109. https://doi.org/10.48550/arXiv.2406.12109

    The data provided here are the post (tweet) IDs for the pre-pandemic dataset and the LLM-generated analyses for the pre-pandemic data collection. The post-LLM training cutoff data collection could not be shared due to platform data sharing restrictions.

  10. Method Value Instruction for LLM

    • scidb.cn
    Updated Apr 23, 2025
    Cite
    WANG Meng (2025). Method Value Instruction for LLM [Dataset]. http://doi.org/10.57760/sciencedb.24104
    Explore at:
    Croissant — a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Apr 23, 2025
    Dataset provided by
    Science Data Bank
    Authors
    WANG Meng
    Description

    The Method Value Instruction dataset contains 11,150 annotated sentences collected from peer-reviewed scientific articles across computer science, engineering, and biomedical domains. Each sentence is labeled as either a research question statement or a non-research question statement.

  11. [WSESE] [Prompt Engineering in Data Analysis] Included and Excluded Papers

    • figshare.com
    xlsx
    Updated Feb 2, 2025
    Cite
    Lucas Valença; Ronnie de Souza Santos; Reydne Santos; Matheus de Morais Leça (2025). [WSESE] [Prompt Engineering in Data Analysis] Included and Excluded Papers [Dataset]. http://doi.org/10.6084/m9.figshare.28326737.v6
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 2, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Lucas Valença; Ronnie de Souza Santos; Reydne Santos; Matheus de Morais Leça
    License

    Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context. The use of large language models for qualitative analysis is gaining attention in various fields, including software engineering, where qualitative methods are essential to understanding human and social factors. Goal. This study aimed to investigate how LLMs are currently used in qualitative analysis and how they can be used in software engineering research, focusing on identifying the benefits, limitations, and practices associated with their application. Method. We conducted a systematic mapping study and analyzed 21 relevant studies to explore reports of using LLM for qualitative analysis reported in the literature. Findings. Our findings indicate that LLMs are primarily used for tasks such as coding, thematic analysis, and data categorization, with benefits including increased efficiency and support for new researchers. However, limitations such as output variability, challenges capturing nuanced perspectives, and ethical concerns regarding privacy and transparency were also evident. Discussions. The study highlights the need for structured strategies and guidelines to optimize LLM use in qualitative research within software engineering. Such strategies could enhance the effectiveness of LLMs while addressing ethical considerations. Conclusion. While LLMs show promise for supporting qualitative analysis, human expertise remains essential for data interpretation, and continued exploration of best practices will be crucial for their effective integration into empirical software engineering research.

  12. Data from: Towards Cross-Modality Modeling for Time Series Analytics: A...

    • researchdata.ntu.edu.sg
    Updated May 13, 2025
    Cite
    DR-NTU (Data) (2025). Towards Cross-Modality Modeling for Time Series Analytics: A Survey in the LLM Era [Dataset]. http://doi.org/10.21979/N9/I0HOYZ
    Explore at:
    Dataset updated
    May 13, 2025
    Dataset provided by
    DR-NTU (Data)
    License

    https://researchdata.ntu.edu.sg/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.21979/N9/I0HOYZ

    Description

    The proliferation of edge devices has generated an unprecedented volume of time series data across different domains, motivating various well-customized methods. Recently, Large Language Models (LLMs) have emerged as a new paradigm for time series analytics by leveraging the shared sequential nature of textual data and time series. However, a fundamental cross-modality gap between time series and LLMs exists, as LLMs are pre-trained on textual corpora and are not inherently optimized for time series. Many recent proposals are designed to address this issue. In this survey, we provide an up-to-date overview of LLMs-based cross-modality modeling for time series analytics. We first introduce a taxonomy that classifies existing approaches into four groups based on the type of textual data employed for time series modeling. We then summarize key cross-modality strategies, e.g., alignment and fusion, and discuss their applications across a range of downstream tasks. Furthermore, we conduct experiments on multimodal datasets from different application domains to investigate effective combinations of textual data and cross-modality strategies for enhancing time series analytics. Finally, we suggest several promising directions for future research. This survey is designed for a range of professionals, researchers, and practitioners interested in LLM-based time series modeling.

  13. A dataset to assess Microsoft Copilot Answers in the Context of Swiss,...

    • zenodo.org
    csv
    Updated Jan 16, 2024
    Cite
    Salvatore Romano; Riccardo Angius; Andreas Kaltenbrunner (2024). A dataset to assess Microsoft Copilot Answers in the Context of Swiss, Bavarian and Hesse Elections. [Dataset]. http://doi.org/10.5281/zenodo.10517697
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 16, 2024
    Dataset provided by
    Zenodo
    Authors
    Salvatore Romano; Riccardo Angius; Andreas Kaltenbrunner
    License

    Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Sep 21, 2023
    Description

    This readme file was generated on 2024-01-15 by Salvatore Romano

    GENERAL INFORMATION

    Title of Dataset:
    A dataset to assess Microsoft Copilot Answers in the Context of Swiss, Bavarian and Hesse Elections.

    Author/Principal Investigator Information
    Name: Salvatore Romano
    ORCID: 0000-0003-0856-4989
    Institution: Universitat Oberta de Catalunya, AID4So.
    Address: Rambla del Poblenou, 154. 08018 Barcelona.
    Email: salvatore@aiforensics.org

    Author/Associate or Co-investigator Information
    Name: Riccardo Angius
    ORCID: 0000-0003-0291-3332
    Institution: Ai Forensics
    Address: Paris, France.
    Email: riccardo@aiforensics.org


    Date of data collection:
    from 2023-09-21 to 2023-10-02.

    Geographic location of data collection:
    Switzerland and Germany.

    Information about funding sources that supported the collection of the data:
    The data collection and analysis was supported by AlgorithmWatch's DataSkop project, funded by Germany’s Federal Ministry of Education and Research (BMBF) as part of the program “Mensch-Technik-Interaktion” (human-technology interaction). dataskop.net
    In Switzerland, the investigation was realized with the support of Stiftung Mercator Schweiz.
    AI Forensics contribution was supported in part by the Open Society Foundations.
    AI Forensics data collection infrastructure is supported by the Bright Initiative.

    SHARING/ACCESS INFORMATION

    Licenses/restrictions placed on the data:
    This publication is licensed under a Creative Commons Attribution 4.0 International License.
    https://creativecommons.org/licenses/by/4.0/deed.en

    Links to publications that cite or use the data:
    https://aiforensics.org//uploads/AIF_AW_Bing_Chat_Elections_Report_ca7200fe8d.pdf

    Links to other publicly accessible locations of the data:
    NA

    Links/relationships to ancillary data sets:
    NA

    Was data derived from another source?
    NA
    If yes, list source(s):

    Recommended citation for this dataset:
    S. Romano, R. Angius, N. Kerby, P. Bouchaud, J. Amidei, A. Kaltenbrunner. 2024. A dataset to assess Microsoft Copilot Answers in the Context of Swiss, Bavarian and Hesse Elections. https://aiforensics.org//uploads/AIF_AW_Bing_Chat_Elections_Report_ca7200fe8d.pdf


    DATA & FILE OVERVIEW

    File List:
    Microsof-Copilot-Answers_in-Swiss-Bavarian-Hess-Elections.csv
    The only dataset for this research. It includes rows with prompts and responses from Microsoft Copilot, along with associated metadata for each entry.

    Relationship between files, if important:
    NA

    Additional related data collected that was not included in the current data package:
    NA

    Are there multiple versions of the dataset?
    NA
    If yes, name of file(s) that was updated:
    Why was the file updated?
    When was the file updated?


    METHODOLOGICAL INFORMATION

    Description of methods used for collection/generation of data:
    In our algorithmic auditing research, we adopted a sock-puppet audit methodology (Sandvig et al., 2014). This method aligns with the growing interdisciplinary focus on algorithm audits, which prioritize fairness, accountability, and transparency to uncover biases in algorithmic systems (Bandy, 2021). Sock-puppet auditing offers a fully controlled environment in which to understand the behavior of the system.

    Every sample was collected by running a new browser instance connected to the internet via a network of VPNs and residential IPs based in Switzerland and Germany, then accessing Microsoft Copilot through its official URL. Every time, the settings for Language and Country/Region were set to match those of potential voters from the respective regions (English, German, French, or Italian, and Switzerland or Germany). We did not simulate any form of user history or additional personalization. Importantly, Microsoft Copilot's default settings remained unchanged, ensuring that all interactions occurred with the "Conversation Style" set to "Balanced".

    Sandvig, C.; Hamilton, K.; Karahalios, K.; and Langbort, C. 2014. Auditing algorithms: Research methods for detecting discrimination on internet platforms. Data and discrimination: converting critical concerns into productive inquiry,
    22(2014): 4349–4357.

    Bandy, J. 2021. Problematic machine behavior: A systematic literature review of algorithm audits. Proceedings of the acm on human-computer interaction, 5(CSCW1): 1–34

    Methods for processing the data:
    The process involved analyzing the HTML code of the web pages that were accessed. During this examination, key metadata were identified and extracted from the HTML structure. Once this information was successfully extracted, the rest of the HTML page, which primarily consisted of code and elements not pertinent to the needed information, was discarded. This approach ensured that only the most relevant and useful data was retained, while all unnecessary and extraneous HTML components were efficiently removed, streamlining the data collection and analysis process.

    Instrument- or software-specific information needed to interpret the data:
    NA

    Standards and calibration information, if appropriate:
    NA

    Environmental/experimental conditions:
    NA

    Describe any quality-assurance procedures performed on the data:
    NA

    People involved with sample collection, processing, analysis and/or submission:
    Salvatore Romano, Riccardo Angius, Natalie Kerby, Paul Bouchaud, Jacopo Amidei, Andreas Kaltenbrunner.

    DATA-SPECIFIC INFORMATION FOR:
    Microsof-Copilot-Answers_in-Swiss-Bavarian-Hess-Elections.csv

    Number of variables:
    33

    Number of cases/rows:
    5562

    Variable List (a loading sketch in Python follows this list):
    prompt - (object) Text of the prompt.
    answer - (object) Text of the answer.
    country - (object) Country information.
    language - (object) Language of the text.
    input_conversation_id - (object) Identifier for the conversation.
    conversation_group_ids - (object) Group IDs for the conversation.
    conversation_group_names - (object) Group names for the conversation.
    experiment_id - (object) Identifier for the experiment group.
    experiment_name - (object) Name of the experiment group.
    begin - (object) Start time.
    end - (object) End time.
    datetime - (int64) Datetime stamp.
    week - (int64) Week number.
    attributions - (object) Link quoted in the text.
    attribution_links - (object) Links for attributions.
    search_query - (object) Search query used by the chatbot.
    unlabelled - (int64) Unlabelled flag.
    exploratory_sample - (int64) Exploratory sample flag.
    very_relevant - (int64) Very relevant flag.
    needs_review - (int64) Needs review flag.
    misleading_factual_error - (int64) Misleading factual error flag.
    nonsense_factual_error - (int64) Nonsense factual error flag.
    rejects_question_framing - (int64) Rejects question framing flag.
    deflection - (int64) Deflection flag.
    shield - (int64) Shield flag.
    wrong_answer_language - (int64) Wrong answer language flag.
    political_imbalance - (int64) Political imbalance flag.
    refusal - (int64) Refusal flag.
    factual_error - (int64) Factual error flag.
    evasion - (int64) Evasion flag.
    absolutely_accurate - (int64) Absolutely accurate flag.
    macrocategory - (object) Macro-category of the content.
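    For orientation, a minimal pandas sketch of loading the file and tallying the annotation flags listed above (the grouping choice is illustrative, not part of this readme):

```python
# Minimal sketch: load the CSV and summarise annotation flags per country and language.
import pandas as pd

df = pd.read_csv("Microsof-Copilot-Answers_in-Swiss-Bavarian-Hess-Elections.csv")

flag_columns = ["misleading_factual_error", "nonsense_factual_error",
                "refusal", "evasion", "absolutely_accurate"]
print(df.groupby(["country", "language"])[flag_columns].mean())  # share of answers with each flag
```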

    Missing data codes:
    NA

    Specialized formats or other abbreviations used:
    NA

  14. clinical-synthetic-text-llm

    • huggingface.co
    Updated Jul 5, 2024
    + more versions
    Cite
    Ran Xu (2024). clinical-synthetic-text-llm [Dataset]. https://huggingface.co/datasets/ritaranx/clinical-synthetic-text-llm
    Explore at:
    Croissant — a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Jul 5, 2024
    Authors
    Ran Xu
    License

    MIT License — https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Data Description

    We release the synthetic data generated using the method described in the paper Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models (ACL 2024 Findings). The external knowledge we use is based on LLM-generated topics and writing styles.

      Generated Datasets
    

    The original train/validation/test data, and the generated synthetic training data are listed as follows. For each dataset, we generate 5000… See the full description on the dataset page: https://huggingface.co/datasets/ritaranx/clinical-synthetic-text-llm.
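    A minimal sketch of pulling the data with the Hugging Face datasets library is shown below; the available configurations are not listed in the snippet above, so they are queried first rather than assumed.

```python
# Minimal sketch: inspect and load the dataset from the Hugging Face Hub.
from datasets import get_dataset_config_names, load_dataset

configs = get_dataset_config_names("ritaranx/clinical-synthetic-text-llm")
print(configs)  # see which generated datasets are available

ds = load_dataset("ritaranx/clinical-synthetic-text-llm", configs[0])
print(ds)       # splits and sizes for the first configuration
```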

  15. Supported data for manuscript "Can LLM-Augmented autonomous agents...

    • zenodo.org
    Updated Dec 6, 2024
    Cite
    Ruben Manrique; Manuel Mosquera; Juan Sebastian Pinzon; Manuel Rios; Nicanor Quijano; Luis Felipe Giraldo (2024). Supported data for manuscript "Can LLM-Augmented autonomous agents cooperate?, An evaluation of their cooperative capabilities through Melting Pot" [Dataset]. http://doi.org/10.5281/zenodo.14287158
    Explore at:
    Dataset updated
    Dec 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Ruben Manrique; Manuel Mosquera; Juan Sebastian Pinzon; Manuel Rios; Nicanor Quijano; Luis Felipe Giraldo
    License

    Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 6, 2024
    Description

    The repository data corresponds partially to the manuscript titled "Can LLM-Augmented Autonomous Agents Cooperate? An Evaluation of Their Cooperative Capabilities through Melting Pot," submitted to IEEE Transactions on Artificial Intelligence. The dataset comprises experiments conducted with Large Language Model-Augmented Autonomous Agents (LAAs), as implemented in the ["Cooperative Agents" repository](https://github.com/Cooperative-IA/CooperativeGPT/tree/main), using substrates from the Melting Pot framework.

    Dataset Scope

    This dataset is divided into two main experiment categories:

    1. Personality_experiments:

      • These focus on a single scenario (Commons Harvest) to assess various agent personalities and their cooperative dynamics.
    2. Comparison_baselines_experiments:

      • These experiments include three distinct scenarios designed by Melting Pot:
        • Commons Harvest Open
        • Externality Mushrooms
        • Coins

    These scenarios evaluate different cooperative and competitive behaviors among agents and are used to compare decision-making architectures of LAAs against reinforcement learning (RL) baselines. Unlike the Personality_experiments, these comparisons do not involve bots but exclusively analyze RL and LAA architectures.

    Scenarios and Metrics

    The metrics and indicators extracted from the experiments depend on the scenario being evaluated:

    1. Commons Harvest Open:

      • Focus: Resource consumption and environmental impact.
      • Metrics include:
        • Number of apples consumed.
        • Devastation of trees (i.e., depletion of resources).
    2. Externality Mushrooms:

      • Focus: Self-interest vs. collective benefit.
      • Agents consume mushrooms with different outcomes:
        • Mushrooms that benefit the individual.
        • Mushrooms that benefit everyone.
        • Mushrooms that benefit only others.
        • Mushrooms that benefit the individual but penalize others.
      • Metrics evaluate trade-offs between individual gain and collective welfare.
    3. Coins:

      • Focus: Reciprocity and fairness.
      • Agents collect coins with two options:
        • Collect their own color coin for a reward.
        • Collect a different color coin, which grants a reward to the agent but penalizes the other.
      • Metrics include reciprocity rates and the balance of mutual benefits.

    Objectives of Comparison Experiments

    The Comparison_baselines_experiments aim to:

    1. Assess how LAAs compare to RL baselines in cooperative and competitive tasks across diverse scenarios.
    2. Compare decision-making architectures within LAAs, including chain-of-thought and generative approaches.

    These experiments help evaluate the robustness of LAAs in scenarios with varying complexity and social dilemmas, providing insights into their potential applications in real-world cooperative systems.

    Simulation Details (Applicable to All Experiments)

    In each simulation:

    1. Participants:

      • Experiments involve predefined numbers of LAAs or RL agents.
      • No bots are included in Comparison_baselines_experiments.
    2. Action Dynamics:

      • Each agent performs high-level actions sequentially.
      • Simulations conclude either after reaching a preset maximum number of rounds (typically 100) or prematurely if the scenario's resources are fully depleted.
    3. Metrics and Indicators:

      • Extracted metrics depend on the scenario and include measures of individual performance, collective outcomes, and agent reciprocity.

    This repository enables reproducibility and serves as a benchmark for advancing research into cooperative and competitive behaviors in LLM-based agents.

  16. USPTO-LLM: A Large Language Model-Assisted Information-enriched Chemical...

    • data.niaid.nih.gov
    Updated Dec 12, 2024
    Cite
    Gong, Shukai (2024). USPTO-LLM: A Large Language Model-Assisted Information-enriched Chemical Reaction Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11464251
    Explore at:
    Dataset updated
    Dec 12, 2024
    Dataset provided by
    Xu, Hongteng
    Yuan, Shen
    Gong, Shukai
    License

    Attribution 4.0 (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    USPTO-LLM is an information-enriched chemical reaction dataset that provides more side information (reaction conditions and reaction steps division) for developing new reaction prediction and retrosynthesis methods and inspires new problems, such as reaction condition prediction. It comprises over 247K chemical reactions extracted from the patent documents of USPTO (United States Patent and Trademark Office), encompassing abundant information on reaction conditions.

    We employ large language models to expedite the data collection procedures automatically with a reliable quality control process. The extracted chemical reactions are organized as heterogeneous directed graphs, allowing us to formulate a series of prediction tasks, such as reaction prediction, retrosynthesis, and reaction condition prediction, in a unified graph-filling framework.
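    The released schema is not reproduced here, but as a rough illustration of the "heterogeneous directed graph" framing, a single reaction record might be represented along these lines (node and edge types below are assumptions, not the dataset's actual fields):

```python
# Illustrative only: one reaction as a heterogeneous directed graph with networkx.
import networkx as nx

g = nx.DiGraph()
g.add_node("CCO", kind="reactant")           # ethanol (SMILES)
g.add_node("CC(=O)O", kind="reactant")       # acetic acid
g.add_node("CC(=O)OCC", kind="product")      # ethyl acetate
g.add_node("H2SO4", kind="condition")        # catalyst
g.add_node("reflux, 4 h", kind="condition")  # reaction conditions

for reactant in ("CCO", "CC(=O)O"):
    g.add_edge(reactant, "CC(=O)OCC", relation="reacts_to")
for condition in ("H2SO4", "reflux, 4 h"):
    g.add_edge(condition, "CC(=O)OCC", relation="applies_to")

# "Graph-filling" tasks then amount to masking part of the graph:
# the product node for reaction prediction, reactant nodes for retrosynthesis,
# or condition nodes for reaction condition prediction.
print(g.nodes(data=True))
```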

  17. LLM Influence on Medical Diagnostic Reasoning

    • kaggle.com
    Updated Dec 6, 2024
    Cite
    Patrick L Ford (2024). LLM Influence on Medical Diagnostic Reasoning [Dataset]. http://doi.org/10.34740/kaggle/dsv/10119916
    Explore at:
    Croissant — a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Dataset updated
    Dec 6, 2024
    Dataset provided by
    Kaggle
    Authors
    Patrick L Ford
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Introduction

    A new study published in JAMA Network Open revealed that ChatGPT-4 outperformed doctors in diagnosing medical conditions from case reports. The AI chatbot scored an average of 92% in the study, while doctors using the chatbot scored 76% and those without it scored 74%.

    The study involved 50 doctors (26 attending, 24 residents; median years in practice, 3 [IQR, 2-8]) who were given six case histories and graded on their ability to suggest diagnoses and explain their reasoning. The results showed that doctors often stuck to their initial diagnoses even when the chatbot suggested a better one, highlighting an overconfidence bias. Additionally, many doctors didn't fully utilise the chatbot's capabilities, treating it like a search engine instead of leveraging its ability to analyse full case histories.

    The study raises questions about how doctors think and how AI tools can be best integrated into medical practice. While AI has the potential to be a "doctor extender," providing valuable second opinions, the study suggests that more training and a shift in mindset may be needed for doctors to fully embrace and benefit from these advancements. link

    Study Findings

    [Screenshot: summary table of study findings]

    Visualisation

    The study compares the diagnostic reasoning performance of physicians using a commercial LLM AI chatbot (ChatGPT Plus [GPT-4]; OpenAI) with that of physicians using conventional diagnostic resources (e.g., UpToDate, Google):
    - Conventional Resources-Only Group (Doctor on Own): Doctors using only conventional resources (standard medical tools and knowledge) without the assistance of an LLM (large language model).
    - Doctor With LLM Group: Doctors using conventional resources along with an LLM, i.e., a tool or AI assistant helping with diagnostic reasoning.
    - LLM Alone Group: The LLM used on its own, without any conventional resources or doctor intervention.

    [Screenshots: diagnostic performance plots for the three groups]

    A Markdown document with the R code for the above plots. link

    Conclusion

    This study reveals a fascinating and potentially transformative dynamic between artificial intelligence and human medical expertise. While ChatGPT-4 demonstrated remarkable diagnostic accuracy, surpassing even experienced physicians, the study also highlighted critical challenges in integrating AI into clinical practice.

    The findings suggest that:
    - AI can significantly enhance diagnostic accuracy: LLMs like ChatGPT-4 have the potential to revolutionise how medical diagnoses are made, offering a level of accuracy exceeding current practices.
    - Human factors remain crucial: Overconfidence bias and under-utilisation of AI tools by physicians underscore the need for training and a shift in mindset to effectively leverage these advancements. Doctors must learn to collaborate with AI, viewing it as a powerful partner rather than a simple search engine.
    - Further research is needed: This study provides a crucial starting point for further investigation into the optimal integration of AI into healthcare. Future research should explore:
      - Effective training methods for physicians to utilise AI tools.
      - The impact of AI assistance on patient outcomes.
      - Ethical considerations surrounding the use of AI in medicine.
      - The potential for AI to address healthcare disparities.

    Ultimately, the successful integration of AI into healthcare will depend not only on technological advancements but also on a willingness among medical professionals to embrace new ways of thinking and working. By harnessing the power of AI while recognising the essential role of human expertise, we can strive towards a future where medical care is more accurate, efficient, and accessible for all.

    Patrick Ford 🥼🩺🖥

  18. Data Annotation Tools Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 18, 2025
    Cite
    Archive Market Research (2025). Data Annotation Tools Market Report [Dataset]. https://www.archivemarketresearch.com/reports/data-annotation-tools-market-4890
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    Feb 18, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    global
    Variables measured
    Market Size
    Description

    The Data Annotation Tools Market was valued at USD 1.31 billion in 2023 and is projected to reach USD 6.72 billion by 2032, exhibiting a CAGR of 26.3% during the forecast period. The market covers software applications used to tag and sort data for the machine learning and artificial intelligence industries. These tools support the creation of training sets by labelling images, text, audio and video with relevant information, and are used to build training data for computer vision, natural language processing and speech recognition models. Current market trends include the adoption of automated and semi-automated annotation techniques, along with rising demand for tools that support multiple data types and integrate with AI platforms. The broader application of AI and machine learning solutions across industries continues to propel growth and competition in the market. Recent developments include:

    - In November 2023, Appen Limited, a high-quality data provider for the AI lifecycle, chose Amazon Web Services (AWS) as its primary cloud for AI solutions and innovation. As Appen utilizes additional enterprise solutions for AI data sourcing, annotation, and model validation, the firms are expanding their collaboration with a multi-year deal. Appen is strengthening its AI data platform, which serves as the bridge between people and AI, by integrating cutting-edge AWS services.
    - In September 2023, Labelbox launched a Large Language Model (LLM) solution to assist organizations in innovating with generative AI and deepened its partnership with Google Cloud. With the introduction of large language models (LLMs), enterprises now have many opportunities to generate new competitive advantages and commercial value. LLM systems have the ability to revolutionize a wide range of intelligent applications; nevertheless, in many cases organizations will need to adjust or fine-tune LLMs to align them with human preferences. As part of the expanded cooperation, Labelbox is leveraging Google Cloud's generative AI capabilities to assist organizations in developing LLM solutions with Vertex AI. Labelbox's AI platform will be integrated with Google Cloud's leading AI and Data Cloud tools, including Vertex AI and Google Cloud's Model Garden repository, allowing ML teams to access cutting-edge machine learning (ML) models for vision and natural language processing (NLP) and automate key workflows.
    - In March 2023, Enlitic released the most recent version of Enlitic Curie, a platform aimed at improving radiology department workflow. The platform includes Curie|ENDEX, which uses natural language processing and computer vision to analyze and process medical images, and Curie|ENCOG, which uses artificial intelligence to detect and protect medical images for health information security.
    - In November 2022, Appen Limited, a global leader in data for the AI lifecycle, announced its partnership with CLEAR Global, a nonprofit organization dedicated to ensuring access to essential information and amplifying voices across languages. This collaboration aims to develop a speech-based healthcare FAQ bot tailored for Sheng, a Nairobi slang language.

  19. ArticleSet2

    • figshare.com
    xlsx
    Updated Mar 24, 2025
    Cite
    Zsolt Tibor Dr. habil. Kosztyán; Tünde Király; Tibor Csizmadia; Attila Katona; Ágnes Vathy-Fogarassy (2025). ArticleSet2 [Dataset]. http://doi.org/10.6084/m9.figshare.28654973.v1
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Mar 24, 2025
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Zsolt Tibor Dr. habil. Kosztyán; Tünde Király; Tibor Csizmadia; Attila Katona; Ágnes Vathy-Fogarassy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file is formatted as an Excel table (.xlsx format). It is a data table of articles from the medical sciences, pre-classified by research methodology. The table contains four columns:

    - Title of Articles: the title of the collected and classified article
    - ChatGPT: the research methodology predicted by the ChatGPT platform
    - Original: the research methodology classification by experts
    - Notebooklm: the research methodology predicted by the Notebooklm platform

    The dataset was used to train and validate classifiers that predict whether a paper's applied research methodology is (1) quantitative or (2) qualitative. For the AI platforms, a "Mixed / Unclear" category is also used where the platform could not predict the applied research methodology.
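
    As a minimal sketch of how this table could be consumed (the file name and exact column headers are assumptions based on the description above, not verified against the file), one could measure how often each platform's prediction agrees with the expert classification:

    ```python
    # Minimal sketch: agreement between AI-predicted and expert-assigned methodology.
    # "ArticleSet2.xlsx" and the column names are assumed from the dataset description.
    import pandas as pd

    df = pd.read_excel("ArticleSet2.xlsx")

    for platform in ["ChatGPT", "Notebooklm"]:
        # Rows where the platform could not decide are labelled "Mixed / Unclear"
        decided = df[df[platform] != "Mixed / Unclear"]
        agreement = (decided[platform] == decided["Original"]).mean()
        print(f"{platform}: agrees with experts on {agreement:.1%} of decided articles")
    ```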

  20. v

    Global Large Language Model (LLM) Market Size By Component, By Application,...

    • verifiedmarketresearch.com
    Updated Jul 25, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Large Language Model (LLM) Market Size By Component, By Application, By Deployment Mode, By Organization Size, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/large-language-model-llm-market/
    Explore at:
    Dataset updated
    Jul 25, 2024
    Dataset authored and provided by
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Large Language Model (LLM) Market size was valued at USD 4.6 Billion in 2023 and is projected to reach USD 64.9 Billion by 2031, growing at a CAGR of 32.1% during the forecast period 2024-2031.

    Global Large Language Model (LLM) Market Drivers

    The market drivers for the Large Language Model (LLM) Market can be influenced by various factors. These may include:

    Advancements in AI and Machine Learning: Continuous improvements in AI algorithms and machine learning techniques are pushing the capabilities of LLMs, making them more attractive for a variety of applications.

    Increasing Demand for Automation: Businesses and industries are increasingly seeking automation solutions for customer service, content creation, and data analysis, which drives the demand for LLMs.

    Rising Investments in AI: There is a significant influx of investment from both the private and public sectors in AI research and development, fostering the growth of the LLM market.

    Expanding Application Areas: LLMs are being applied in a wider range of fields such as healthcare, finance, legal, and education, which broadens their market scope.

    Enhanced Computing Power: Improvements in computing infrastructure, including the advent of advanced GPUs and cloud computing services, are making it feasible to train and deploy large language models more efficiently.

    Growing Digital Transformation Initiatives: Companies undergoing digital transformation are adopting LLMs to leverage their capabilities in natural language understanding and generation for improved business processes.

    Increased Availability of Data: The abundance of text data from the internet and other sources provides the necessary training material for developing more sophisticated LLMs.

    Consumer Demand for Better User Experiences: There is a growing expectation for intuitive and responsive user interfaces enabled by LLMs, particularly in applications like virtual assistants and chatbots.

    Developments in Natural Language Processing: Progress in natural language processing (NLP) techniques contributes to more effective and efficient LLMs, enhancing their practical utility and market value.

    Regulatory and Compliance Requirements: Certain industries are leveraging LLMs to ensure compliance with legal and regulatory standards by automating documentation and reporting tasks.
