19 datasets found

d
AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and...
datarade.ai
Updated Dec 18, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MealMe (2024). AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites [Dataset]. https://datarade.ai/data-products/ai-training-data-annotated-checkout-flows-for-retail-resta-mealme
Explore at:
Dataset updated
Dec 18, 2024
Dataset authored and provided by
MealMe
Area covered
United States of America
Description
AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites Overview

Unlock the next generation of agentic commerce and automated shopping experiences with this comprehensive dataset of meticulously annotated checkout flows, sourced directly from leading retail, restaurant, and marketplace websites. Designed for developers, researchers, and AI labs building large language models (LLMs) and agentic systems capable of online purchasing, this dataset captures the real-world complexity of digital transactions—from cart initiation to final payment.

Key Features

Breadth of Coverage: Over 10,000 unique checkout journeys across hundreds of top e-commerce, food delivery, and service platforms, including but not limited to Walmart, Target, Kroger, Whole Foods, Uber Eats, Instacart, Shopify-powered sites, and more.

Actionable Annotation: Every flow is broken down into granular, step-by-step actions, complete with timestamped events, UI context, form field details, validation logic, and response feedback. Each step includes:

Page state (URL, DOM snapshot, and metadata)

User actions (clicks, taps, text input, dropdown selection, checkbox/radio interactions)

System responses (AJAX calls, error/success messages, cart/price updates)

Authentication and account linking steps where applicable

Payment entry (card, wallet, alternative methods)

Order review and confirmation

Multi-Vertical, Real-World Data: Flows sourced from a wide variety of verticals and real consumer environments, not just demo stores or test accounts. Includes complex cases such as multi-item carts, promo codes, loyalty integration, and split payments.

Structured for Machine Learning: Delivered in standard formats (JSONL, CSV, or your preferred schema), with every event mapped to action types, page features, and expected outcomes. Optional HAR files and raw network request logs provide an extra layer of technical fidelity for action modeling and RLHF pipelines.

Rich Context for LLMs and Agents: Every annotation includes both human-readable and model-consumable descriptions:

“What the user did” (natural language)

“What the system did in response”

“What a successful action should look like”

Error/edge case coverage (invalid forms, OOS, address/payment errors)

Privacy-Safe & Compliant: All flows are depersonalized and scrubbed of PII. Sensitive fields (like credit card numbers, user addresses, and login credentials) are replaced with realistic but synthetic data, ensuring compliance with privacy regulations.

Each flow tracks the user journey from cart to payment to confirmation, including:

Adding/removing items

Applying coupons or promo codes

Selecting shipping/delivery options

Account creation, login, or guest checkout

Inputting payment details (card, wallet, Buy Now Pay Later)

Handling validation errors or OOS scenarios

Order review and final placement

Confirmation page capture (including order summary details)

Why This Dataset?

Building LLMs, agentic shopping bots, or e-commerce automation tools demands more than just page screenshots or API logs. You need deeply contextualized, action-oriented data that reflects how real users interact with the complex, ever-changing UIs of digital commerce. Our dataset uniquely captures:

The full intent-action-outcome loop

Dynamic UI changes, modals, validation, and error handling

Nuances of cart modification, bundle pricing, delivery constraints, and multi-vendor checkouts

Mobile vs. desktop variations

Diverse merchant tech stacks (custom, Shopify, Magento, BigCommerce, native apps, etc.)

Use Cases

LLM Fine-Tuning: Teach models to reason through step-by-step transaction flows, infer next-best-actions, and generate robust, context-sensitive prompts for real-world ordering.

Agentic Shopping Bots: Train agents to navigate web/mobile checkouts autonomously, handle edge cases, and complete real purchases on behalf of users.

Action Model & RLHF Training: Provide reinforcement learning pipelines with ground truth “what happens if I do X?” data across hundreds of real merchants.

UI/UX Research & Synthetic User Studies: Identify friction points, bottlenecks, and drop-offs in modern checkout design by replaying flows and testing interventions.

Automated QA & Regression Testing: Use realistic flows as test cases for new features or third-party integrations.

What’s Included

10,000+ annotated checkout flows (retail, restaurant, marketplace)

Step-by-step event logs with metadata, DOM, and network context

Natural language explanations for each step and transition

All flows are depersonalized and privacy-compliant

Example scripts for ingesting, parsing, and analyzing the dataset

Flexible licensing for research or commercial use

Sample Categories Covered

Grocery delivery (Instacart, Walmart, Kroger, Target, etc.)

Restaurant takeout/delivery (Ub...
APIGen-MT-5k
huggingface.co
Updated May 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Salesforce (2025). APIGen-MT-5k [Dataset]. https://huggingface.co/datasets/Salesforce/APIGen-MT-5k
Explore at:
Dataset updated
May 16, 2025
Dataset provided by
Salesforce Inchttp://salesforce.com/
Authors
Salesforce
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Summary

APIGen-MT is an automated agentic data generation pipeline designed to synthesize verifiable, high-quality, realistic datasets for agentic applications This dataset was released as part of APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay Code: https://github.com/apigen-mt/apigen-mt.github.io The repo contains 5000 multi-turn trajectories collected by APIGen-MT This dataset is a subset of the data used to train the xLAM-2 model… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/APIGen-MT-5k.
h
uk_retail_store_synthetic_dataset
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Syncora.ai - Agentic Synthetic Data Platform, uk_retail_store_synthetic_dataset [Dataset]. https://huggingface.co/datasets/syncora/uk_retail_store_synthetic_dataset
Explore at:
Authors
Syncora.ai - Agentic Synthetic Data Platform
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
United Kingdom
Description
Synthetic Data Generation Demo — UK Retail Dataset

Welcome to this synthetic data generation demo repository by Syncora.ai. This project showcases how to generate synthetic data using real-world tabular structures, demonstrated on a UK retail dataset with columns such as:

Country
CustomerID
UnitPrice
InvoiceDate
Quantity
StockCode

This dataset is designed for dataset for LLM training and AI development, enabling developers to work with privacy-safe, high-quality… See the full description on the dataset page: https://huggingface.co/datasets/syncora/uk_retail_store_synthetic_dataset.
Example of Labeling.
plos.figshare.com
xls
Updated Aug 26, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Libo Yang; Yuan Li; Junhua Tan; Libo Mao (2025). Example of Labeling. [Dataset]. http://doi.org/10.1371/journal.pone.0330258.t001
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0330258.t001
Dataset updated
Aug 26, 2025
Dataset provided by
PLOShttp://plos.org/
Authors
Libo Yang; Yuan Li; Junhua Tan; Libo Mao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Traditional knowledge graphs of water conservancy project risks have supported risk decision-making. However, they are constrained by limited data modalities and low accuracy in information extraction. A multimodal water conservancy project risk knowledge graph is proposed in this study, along with a synergistic strategy involving multimodal large language models Risk decision-making generation is facilitated through a multi-agent agentic retrieval-augmented generation framework. To enhance visual recognition, a DenseNet-based image classification model is improved by incorporating single-head self-attention and coordinate attention mechanisms. For textual data, risk entities such as locations, components, and events are extracted using a BERT-BiLSTM-CRF architecture. These extracted entities serve as the foundation for constructing the multimodal knowledge graph. To support generation, a multi-agent agentic retrieval-augmented generation mechanism is introduced. This mechanism enhances the reliability and interpretability of risk decision-making outputs. In experiments, the enhanced DenseNet model outperforms the original baseline in both precision and recall for image recognition tasks. In risk decision-making tasks, the proposed approach—combining a multimodal knowledge graph with a multi-agent agentic retrieval-augmented generation method—achieves strong performance on BERTScore and ROUGE-L metrics. This work presents a novel perspective for leveraging multimodal knowledge graphs in water conservancy project risk management.
Results of module ablation on the validation set.
figshare.com
xls
Updated Aug 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Libo Yang; Yuan Li; Junhua Tan; Libo Mao (2025). Results of module ablation on the validation set. [Dataset]. http://doi.org/10.1371/journal.pone.0330258.t005
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0330258.t005
Dataset updated
Aug 26, 2025
Dataset provided by
PLOShttp://plos.org/
Authors
Libo Yang; Yuan Li; Junhua Tan; Libo Mao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Traditional knowledge graphs of water conservancy project risks have supported risk decision-making. However, they are constrained by limited data modalities and low accuracy in information extraction. A multimodal water conservancy project risk knowledge graph is proposed in this study, along with a synergistic strategy involving multimodal large language models Risk decision-making generation is facilitated through a multi-agent agentic retrieval-augmented generation framework. To enhance visual recognition, a DenseNet-based image classification model is improved by incorporating single-head self-attention and coordinate attention mechanisms. For textual data, risk entities such as locations, components, and events are extracted using a BERT-BiLSTM-CRF architecture. These extracted entities serve as the foundation for constructing the multimodal knowledge graph. To support generation, a multi-agent agentic retrieval-augmented generation mechanism is introduced. This mechanism enhances the reliability and interpretability of risk decision-making outputs. In experiments, the enhanced DenseNet model outperforms the original baseline in both precision and recall for image recognition tasks. In risk decision-making tasks, the proposed approach—combining a multimodal knowledge graph with a multi-agent agentic retrieval-augmented generation method—achieves strong performance on BERTScore and ROUGE-L metrics. This work presents a novel perspective for leveraging multimodal knowledge graphs in water conservancy project risk management.
i
Middle East & Africa Generative AI in Testing Market Size, Share, Analysis...
intelevoresearch.com
Updated Aug 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://www.intelevoresearch.com/ (2025). Middle East & Africa Generative AI in Testing Market Size, Share, Analysis Report Component (Software, Services), Deployment (Cloud, On-premises, Hybrid), Application (Automated Test Case Generation, Intelligent Test Data Creation, AI-Powered Test Maintenance, Predictive Quality Analytics, Technology, NL-to-Test, Agentic Orchestration, Vision & Model-Based UI Understanding, Retrieval-Augmented Testing, Test-Data Generators), Organization Size (Large Enterprises, SMEs), End Use (IT & Telecom, BFSI, Healthcare & Life Sciences, Retail & eCommerce, Manufacturing & Industrial, Public Sector & Education) Region and Key Players - Industry Segment Overview, Market Dynamics, Competitive Strategies, Trends and Forecast 2025-2034 [Dataset]. https://www.intelevoresearch.com/reports/middle-east-africa-generative-ai-in-testing-market
Explore at:
Dataset updated
Aug 20, 2025
Dataset provided by
https://www.intelevoresearch.com/
License
https://www.intelevoresearch.com/privacy-policyhttps://www.intelevoresearch.com/privacy-policy
Area covered
Africa, Middle East
Description
Middle East & Africa Generative AI in Testing market is set to grow from USD 221.08M in 2024 to USD 884.75M by 2034, at a CAGR of 15.35%. Explore trends, drivers, growth.
i
Europe Generative AI in Testing Market Size, Share, Analysis Report...
intelevoresearch.com
Updated Aug 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://www.intelevoresearch.com/ (2025). Europe Generative AI in Testing Market Size, Share, Analysis Report Component (Software, Services), Deployment (Cloud, On-premises, Hybrid), Application (Automated Test Case Generation, Intelligent Test Data Creation, AI-Powered Test Maintenance, Predictive Quality Analytics, Technology, NL-to-Test, Agentic Orchestration, Vision & Model-Based UI Understanding, Retrieval-Augmented Testing, Test-Data Generators), Organization Size (Large Enterprises, SMEs), End Use (IT & Telecom, BFSI, Healthcare & Life Sciences, Retail & eCommerce, Manufacturing & Industrial, Public Sector & Education) Region and Key Players - Industry Segment Overview, Market Dynamics, Competitive Strategies, Trends and Forecast 2025-2034 [Dataset]. https://www.intelevoresearch.com/reports/europe-generative-ai-in-testing-market
Explore at:
Dataset updated
Aug 20, 2025
Dataset provided by
https://www.intelevoresearch.com/
License
https://www.intelevoresearch.com/privacy-policyhttps://www.intelevoresearch.com/privacy-policy
Area covered
Europe
Description
Europe Generative AI in Testing market is set to rise from USD 0.21B in 2024 to USD 3.75B by 2034, growing at a CAGR of 34.21%. Explore drivers, trends and opportunities.
h
customer_support_conversations_dataset
huggingface.co
Updated Oct 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Syncora.ai - Agentic Synthetic Data Platform (2025). customer_support_conversations_dataset [Dataset]. https://huggingface.co/datasets/syncora/customer_support_conversations_dataset
Explore at:
Dataset updated
Oct 10, 2025
Authors
Syncora.ai - Agentic Synthetic Data Platform
Description
💬 Customer Support Conversation Dataset — Powered by Syncora.ai

A free synthetic dataset for chatbot training, LLM fine-tuning, and synthetic data generation research.Created using Syncora.ai’s privacy-safe synthetic data engine, this dataset is ideal for developing, testing, and benchmarking AI customer support systems. It serves as a dataset for chatbot training and a dataset for LLM training, offering rich, structured conversation data for real-world simulation.

🌟… See the full description on the dataset page: https://huggingface.co/datasets/syncora/customer_support_conversations_dataset.
h
fitness-tracker-dataset
huggingface.co
Updated Oct 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Syncora.ai - Agentic Synthetic Data Platform (2025). fitness-tracker-dataset [Dataset]. https://huggingface.co/datasets/syncora/fitness-tracker-dataset
Explore at:
Dataset updated
Oct 5, 2025
Authors
Syncora.ai - Agentic Synthetic Data Platform
Description
🏃 Synthetic Wearable & Activity Dataset — Powered by Syncora.ai

Free dataset for health analytics, activity recognition, synthetic data generation, and dataset for LLM training.

🌟 About This Dataset

This dataset contains synthetic wearable fitness records, modeled on signals from devices such as the Apple Watch. All entries are fully synthetic, generated with Syncora.ai’s synthetic data engine, ensuring privacy-safe and bias-aware data.
The dataset provides rich… See the full description on the dataset page: https://huggingface.co/datasets/syncora/fitness-tracker-dataset.
i
Global Generative AI in Testing Market Size, Share, Analysis Report...
intelevoresearch.com
Updated Aug 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://www.intelevoresearch.com/ (2025). Global Generative AI in Testing Market Size, Share, Analysis Report Component (Software, Services), Deployment (Cloud, On-premises, Hybrid), Application (Automated Test Case Generation, Intelligent Test Data Creation, AI-Powered Test Maintenance, Predictive Quality Analytics, Technology, NL-to-Test, Agentic Orchestration, Vision & Model-Based UI Understanding, Retrieval-Augmented Testing, Test-Data Generators), Organization Size (Large Enterprises, SMEs), End Use (IT & Telecom, BFSI, Healthcare & Life Sciences, Retail & eCommerce, Manufacturing & Industrial, Public Sector & Education) Region and Key Players - Industry Segment Overview, Market Dynamics, Competitive Strategies, Trends and Forecast 2025-2034 [Dataset]. https://www.intelevoresearch.com/reports/generative-ai-in-testing-market
Explore at:
Dataset updated
Aug 20, 2025
Dataset provided by
https://www.intelevoresearch.com/
License
https://www.intelevoresearch.com/privacy-policyhttps://www.intelevoresearch.com/privacy-policy
Description
Global Generative AI in Testing market is set to grow from USD 0.71B in 2024 to USD 14.15B by 2034,at a CAGR of 34.2% (2025–2034). Explore trends, opportunities and drivers.
h
mental_health_survey_dataset
huggingface.co
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Syncora.ai - Agentic Synthetic Data Platform, mental_health_survey_dataset [Dataset]. https://huggingface.co/datasets/syncora/mental_health_survey_dataset
Explore at:
Authors
Syncora.ai - Agentic Synthetic Data Platform
Description
🧠 Mental Health Posting Dataset — Synthetic Dataset for LLM & Chatbot Training

Free dataset for mental health research, LLM training, and chatbot development, generated using synthetic data generation techniques to ensure privacy and high fidelity.

🌟 About This Dataset

This dataset contains synthetic mental health survey responses across multiple demographics and occupations. It includes participant-reported stress levels, coping mechanisms, mood swings, and social… See the full description on the dataset page: https://huggingface.co/datasets/syncora/mental_health_survey_dataset.
i
North America Generative AI in Testing Market Size, Share, Analysis Report...
intelevoresearch.com
Updated Aug 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
https://www.intelevoresearch.com/ (2025). North America Generative AI in Testing Market Size, Share, Analysis Report Component (Software, Services), Deployment (Cloud, On-premises, Hybrid), Application (Automated Test Case Generation, Intelligent Test Data Creation, AI-Powered Test Maintenance, Predictive Quality Analytics, Technology, NL-to-Test, Agentic Orchestration, Vision & Model-Based UI Understanding, Retrieval-Augmented Testing, Test-Data Generators), Organization Size (Large Enterprises, SMEs), End Use (IT & Telecom, BFSI, Healthcare & Life Sciences, Retail & eCommerce, Manufacturing & Industrial, Public Sector & Education) Region and Key Players - Industry Segment Overview, Market Dynamics, Competitive Strategies, Trends and Forecast 2025-2034 [Dataset]. https://www.intelevoresearch.com/reports/north-america-generative-ai-in-testing-market
Explore at:
Dataset updated
Aug 20, 2025
Dataset provided by
https://www.intelevoresearch.com/
License
https://www.intelevoresearch.com/privacy-policyhttps://www.intelevoresearch.com/privacy-policy
Description
North America Generative AI in Testing market is set to grow from USD 0.31B in 2024 to USD 5.8B by 2034, at a CAGR of 33.91%. Explore trends, drivers, and opportunities.
AI Procurement Intelligence Market Analysis, Size, and Forecast 2025-2029 :...
technavio.com
pdf
Updated Oct 9, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Technavio (2025). AI Procurement Intelligence Market Analysis, Size, and Forecast 2025-2029 : North America (US, Canada, and Mexico), Europe (Germany, UK, France, The Netherlands, Italy, and Spain), APAC (China, Japan, India, Australia, South Korea, and Indonesia), South America (Brazil, Argentina, and Colombia), Middle East and Africa (UAE, South Africa, and Turkey), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/ai-procurement-intelligence-market-industry-analysis
Explore at:
pdfAvailable download formats
Dataset updated
Oct 9, 2025
Dataset provided by
TechNavio
Authors
Technavio
License
https://www.technavio.com/content/privacy-noticehttps://www.technavio.com/content/privacy-notice
Time period covered
2025 - 2029
Area covered
United States, Canada
Description
Snapshot img { margin: 10px !important; } AI Procurement Intelligence Market Size 2025-2029

The ai procurement intelligence market size is forecast to increase by USD 14.5 billion, at a CAGR of 42.9% between 2024 and 2029.

Enterprises are increasingly adopting AI procurement intelligence to enhance operational efficiency and achieve significant cost savings in response to persistent economic pressures. This drive for strategic cost management is met by the proliferation of generative AI and hyper-automation, which are being integrated into advanced procurement software. These technologies are enabling a shift toward predictive sourcing functions, allowing teams to forecast market conditions and automate complex decision-making processes. By leveraging natural language prompts and cognitive capabilities, these tools make sophisticated data analysis more accessible, empowering procurement professionals to focus on higher-value activities like negotiation and strategic supplier relationship management. The focus is on creating autonomous and strategic sourcing capabilities through industrial ai software.However, realizing the full potential of these advanced systems is often constrained by foundational issues related to data integrity and accessibility. Many organizations grapple with a fragmented data landscape, where procurement information is trapped in disparate silos with inconsistent taxonomies, making the creation of a unified data view a significant hurdle. Without meticulous data cleansing and normalization, the insights generated by AI algorithms can be skewed or misleading, which erodes user trust and undermines the business case for the technology. This highlights the importance of robust AI governance tools to manage data quality, security, and integration effectively within the framework of agentic AI for data engineering.

What will be the Size of the AI Procurement Intelligence Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2019 - 2023 and forecasts 2025-2029 - in the full report.
Request Free SampleThe market is defined by a strategic shift toward proactive risk mitigation and enhanced supply chain resilience. Organizations are leveraging predictive analytics and real-time monitoring to anticipate disruptions from geopolitical or climate-related events. This move from a reactive to a proactive stance is enabled by AI-powered platforms that provide deep visibility into multi-tier supplier networks. The integration of predictive ai in supply chain systems is becoming standard practice for ensuring business continuity and managing complex global trade dynamics. This focus on foresight and preparedness underscores a fundamental change in procurement strategy.Operational efficiency is being transformed through procurement workflow automation and the adoption of hyper-automation. These technologies are streamlining routine tasks like invoice processing and purchase order generation, freeing up procurement professionals for more strategic activities. The use of generative AI is also changing user interaction via natural language prompts, making complex data analysis more accessible. This focus on intelligent automation and ai in project management helps organizations reduce sourcing cycle times and improve overall productivity.Supplier relationship management is evolving with the use of sophisticated AI tools for performance evaluation and strategic decision-making. AI-powered platforms assist in supplier discovery and vetting, ensuring that new partners meet rigorous standards for quality and compliance. These systems analyze supplier performance metrics to inform consolidation strategies and negotiation tactics. The ongoing development of ai for sales, from a procurement perspective, allows for more dynamic and data-driven interactions, fostering a collaborative and resilient supplier ecosystem.

How is this AI Procurement Intelligence Industry segmented?

The ai procurement intelligence industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in "USD million" for the period 2025-2029, as well as historical data from 2019 - 2023 for the following segments. ComponentSoftwareServicesDeploymentCloud-basedOn-premisesEnd-userLarge enterprisesSMEsGovernment and public sectorGeographyNorth AmericaUSCanadaMexicoEuropeGermanyUKFranceThe NetherlandsItalySpainAPACChinaJapanIndiaAustraliaSouth KoreaIndonesiaSouth AmericaBrazilArgentinaColombiaMiddle East and AfricaUAESouth AfricaTurkeyRest of World (ROW)

By Component Insights

The software segment is estimated to witness significant growth during the forecast period.The software segment forms the core of the market, comprising digital platforms and applications that enable data-driven procurement. These solutions, predominantly delivered via a Software-as-a-Service model, provide func
h
developer-productivity-simulated-behavioral-data
huggingface.co
Updated Aug 25, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Syncora.ai - Agentic Synthetic Data Platform (2025). developer-productivity-simulated-behavioral-data [Dataset]. https://huggingface.co/datasets/syncora/developer-productivity-simulated-behavioral-data
Explore at:
Dataset updated
Aug 25, 2025
Authors
Syncora.ai - Agentic Synthetic Data Platform
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Synthetic AI Developer Productivity Dataset — Behavioral + Cognitive Simulation

A synthetic data generation resource for modeling behavioral and cognitive dynamics in developers.

📘 About This Dataset

This dataset simulates productivity data from AI-assisted software developers. It blends behavioral signals, physiological inputs, and productivity metrics to explore the nuanced relationships between deep work, distractions, caffeine, AI usage, and cognitive strain.… See the full description on the dataset page: https://huggingface.co/datasets/syncora/developer-productivity-simulated-behavioral-data.
Risk Information Query and Decision Generation Workflow.
plos.figshare.com
xls
Updated Aug 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Libo Yang; Yuan Li; Junhua Tan; Libo Mao (2025). Risk Information Query and Decision Generation Workflow. [Dataset]. http://doi.org/10.1371/journal.pone.0330258.t003
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0330258.t003
Dataset updated
Aug 26, 2025
Dataset provided by
PLOShttp://plos.org/
Authors
Libo Yang; Yuan Li; Junhua Tan; Libo Mao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Risk Information Query and Decision Generation Workflow.
h
DataScience-Instruct-500K
huggingface.co
Updated Oct 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
RUC-DataLab (2025). DataScience-Instruct-500K [Dataset]. https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K
Explore at:
Dataset updated
Oct 21, 2025
Dataset authored and provided by
RUC-DataLab
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Authors: Shaolei Zhang, Ju Fan*, Meihao Fan, Guoliang Li, Xiaoyong Du

DeepAnalyze is the first agentic LLM for autonomous data science. It can autonomously complete a wide range of data-centric tasks without human intervention, supporting: 🛠 Entire data science pipeline: Automatically perform any data science tasks such as data preparation, analysis, modeling, visualization, and report generation. 🔍… See the full description on the dataset page: https://huggingface.co/datasets/RUC-DataLab/DataScience-Instruct-500K.
f
Data Sheet 1_On the potential of agentic workflows for animal training plan...
frontiersin.figshare.com
pdf
Updated May 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jörg Schultz (2025). Data Sheet 1_On the potential of agentic workflows for animal training plan generation.pdf [Dataset]. http://doi.org/10.3389/fvets.2025.1563233.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fvets.2025.1563233.s001
Dataset updated
May 20, 2025
Dataset provided by
Frontiers
Authors
Jörg Schultz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Effective animal training depends on well-structured training plans that ensure consistent progress and measurable outcomes. However, the creation of such plans is often time-intensive, repetitive, and detracts from hands-on training. Recent advancements in generative AI powered by large language models (LLMs) provide potential solutions but frequently fail to produce actionable, individualized plans tailored to specific contexts. This limitation is particularly significant given the diverse tasks performed by dogs–ranging from working roles in military and police operations to competitive sports–and the varying training philosophies among practitioners. To address these challenges, a modular agentic workflow framework is proposed, leveraging LLMs while mitigating their shortcomings. By decomposing the training plan generation process into specialized building blocks–autonomous agents that handle subtasks such as structuring progressions, ensuring welfare compliance, and adhering to team-specific standard operating procedures (SOPs)—this approach facilitates the creation of specific, actionable plans. The modular design further allows workflows to be tailored to the unique requirements of individual tasks and philosophies. As a proof of concept, a complete training plan generation workflow is presented, integrating these agents into a cohesive system. This framework prioritizes flexibility and adaptability, empowering trainers to create customized solutions while leveraging generative AI's capabilities. In summary, agentic workflows bridge the gap between cutting-edge technology and the practical, diverse needs of the animal training community. As such, they could form a crucial foundation for advancing computer-assisted animal training methodologies.
h
WorFBench_train
huggingface.co
Updated Jul 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ZJUNLP (2025). WorFBench_train [Dataset]. https://huggingface.co/datasets/zjunlp/WorFBench_train
Explore at:
Dataset updated
Jul 21, 2025
Dataset authored and provided by
ZJUNLP
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This repository contains the data presented in Benchmarking Agentic Workflow Generation. Code: https://github.com/zjunlp/WorfBench
h
graph-data-quantum-rl
huggingface.co
Updated Oct 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cong Yu (2025). graph-data-quantum-rl [Dataset]. https://huggingface.co/datasets/Benyucong/graph-data-quantum-rl
Explore at:
Dataset updated
Oct 5, 2025
Authors
Cong Yu
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Citation

If you use this dataset, please cite: @misc{yu2025quasarquantumassemblycode, title={QUASAR: Quantum Assembly Code Generation Using Tool-Augmented LLMs via Agentic RL}, author={Cong Yu and Valter Uotila and Shilong Deng and Qingyuan Wu and Tuo Shi and Songlin Jiang and Lei You and Bo Zhao}, year={2025}, eprint={2510.00967}, archivePrefix={arXiv}, primaryClass={cs.AI}, url={https://arxiv.org/abs/2510.00967}, }
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

MealMe (2024). AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites [Dataset]. https://datarade.ai/data-products/ai-training-data-annotated-checkout-flows-for-retail-resta-mealme

AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites

Explore at:

Dataset updated

Dec 18, 2024

Dataset authored and provided by

MealMe

Area covered

United States of America

Description

AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites Overview

Unlock the next generation of agentic commerce and automated shopping experiences with this comprehensive dataset of meticulously annotated checkout flows, sourced directly from leading retail, restaurant, and marketplace websites. Designed for developers, researchers, and AI labs building large language models (LLMs) and agentic systems capable of online purchasing, this dataset captures the real-world complexity of digital transactions—from cart initiation to final payment.

Key Features

Breadth of Coverage: Over 10,000 unique checkout journeys across hundreds of top e-commerce, food delivery, and service platforms, including but not limited to Walmart, Target, Kroger, Whole Foods, Uber Eats, Instacart, Shopify-powered sites, and more.

Actionable Annotation: Every flow is broken down into granular, step-by-step actions, complete with timestamped events, UI context, form field details, validation logic, and response feedback. Each step includes:

Page state (URL, DOM snapshot, and metadata)

User actions (clicks, taps, text input, dropdown selection, checkbox/radio interactions)

System responses (AJAX calls, error/success messages, cart/price updates)

Authentication and account linking steps where applicable

Payment entry (card, wallet, alternative methods)

Order review and confirmation

Multi-Vertical, Real-World Data: Flows sourced from a wide variety of verticals and real consumer environments, not just demo stores or test accounts. Includes complex cases such as multi-item carts, promo codes, loyalty integration, and split payments.

Structured for Machine Learning: Delivered in standard formats (JSONL, CSV, or your preferred schema), with every event mapped to action types, page features, and expected outcomes. Optional HAR files and raw network request logs provide an extra layer of technical fidelity for action modeling and RLHF pipelines.

Rich Context for LLMs and Agents: Every annotation includes both human-readable and model-consumable descriptions:

“What the user did” (natural language)

“What the system did in response”

“What a successful action should look like”

Error/edge case coverage (invalid forms, OOS, address/payment errors)

Privacy-Safe & Compliant: All flows are depersonalized and scrubbed of PII. Sensitive fields (like credit card numbers, user addresses, and login credentials) are replaced with realistic but synthetic data, ensuring compliance with privacy regulations.

Each flow tracks the user journey from cart to payment to confirmation, including:

Adding/removing items

Applying coupons or promo codes

Selecting shipping/delivery options

Account creation, login, or guest checkout

Inputting payment details (card, wallet, Buy Now Pay Later)

Handling validation errors or OOS scenarios

Order review and final placement

Confirmation page capture (including order summary details)

Why This Dataset?

Building LLMs, agentic shopping bots, or e-commerce automation tools demands more than just page screenshots or API logs. You need deeply contextualized, action-oriented data that reflects how real users interact with the complex, ever-changing UIs of digital commerce. Our dataset uniquely captures:

The full intent-action-outcome loop

Dynamic UI changes, modals, validation, and error handling

Nuances of cart modification, bundle pricing, delivery constraints, and multi-vendor checkouts

Mobile vs. desktop variations

Diverse merchant tech stacks (custom, Shopify, Magento, BigCommerce, native apps, etc.)

Use Cases

LLM Fine-Tuning: Teach models to reason through step-by-step transaction flows, infer next-best-actions, and generate robust, context-sensitive prompts for real-world ordering.

Agentic Shopping Bots: Train agents to navigate web/mobile checkouts autonomously, handle edge cases, and complete real purchases on behalf of users.

Action Model & RLHF Training: Provide reinforcement learning pipelines with ground truth “what happens if I do X?” data across hundreds of real merchants.

UI/UX Research & Synthetic User Studies: Identify friction points, bottlenecks, and drop-offs in modern checkout design by replaying flows and testing interventions.

Automated QA & Regression Testing: Use realistic flows as test cases for new features or third-party integrations.

What’s Included

10,000+ annotated checkout flows (retail, restaurant, marketplace)

Step-by-step event logs with metadata, DOM, and network context

Natural language explanations for each step and transition

All flows are depersonalized and privacy-compliant

Example scripts for ingesting, parsing, and analyzing the dataset

Flexible licensing for research or commercial use

Sample Categories Covered

Grocery delivery (Instacart, Walmart, Kroger, Target, etc.)

Restaurant takeout/delivery (Ub...

Clear search

Close search

Google apps

Main menu

AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and...

APIGen-MT-5k

uk_retail_store_synthetic_dataset

Example of Labeling.

Results of module ablation on the validation set.

Middle East & Africa Generative AI in Testing Market Size, Share, Analysis...

Europe Generative AI in Testing Market Size, Share, Analysis Report...

customer_support_conversations_dataset

fitness-tracker-dataset

Global Generative AI in Testing Market Size, Share, Analysis Report...

mental_health_survey_dataset

North America Generative AI in Testing Market Size, Share, Analysis Report...

AI Procurement Intelligence Market Analysis, Size, and Forecast 2025-2029 :...

Snapshot img { margin: 10px !important; } AI Procurement Intelligence Market Size 2025-2029

developer-productivity-simulated-behavioral-data

Risk Information Query and Decision Generation Workflow.

DataScience-Instruct-500K

Data Sheet 1_On the potential of agentic workflows for animal training plan...

WorFBench_train

graph-data-quantum-rl

AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites