Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Dataset Name
Dataset Summary
Mind2Web is a dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or only cover a limited set of websites and tasks, thus not suitable for generalist web agents. With over 2,000 open-ended tasks collected from 137 websites spanning 31 domains and crowdsourced action… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/Mind2Web.

https://creativecommons.org/publicdomain/zero/1.0/
By osunlp (from Hugging Face)
The Mind2Web dataset is a valuable resource for the development and evaluation of generalist agents that can effectively perform web tasks by comprehending and executing language instructions. This dataset supports the creation of agents capable of completing complex tasks on any website while adhering to accessibility guidelines.
The dataset comprises various columns that provide essential information for training these generalist agents. The action_reprs column contains textual representations of the actions that can be executed by the agents on websites. These representations serve as guidance for understanding and implementing specific tasks.
The confirmed_task column holds the natural-language description of the task that the agent is asked to complete. It is the instruction against which the recorded actions are interpreted and evaluated.
In addition, the subdomain column gives the category of the website on which each task was collected (a finer grouping within its broader domain). This information helps contextualize tasks across distinct web environments and makes it possible to group or filter them.
With these features present in each row of train.csv, developers can train models on language instructions paired with the web actions that fulfil them. Researchers can use the dataset to advance generalist agents that rely on natural language understanding to navigate a wide range of websites.
The Mind2Web dataset is a valuable resource for researchers and developers working on creating generalist agents capable of performing complex web tasks based on language instructions. This guide will provide you with step-by-step instructions on how to effectively use this dataset.
Understanding the Columns:
- action_reprs: This column contains representations of the actions that the generalist agents can perform on a website. It provides insights into what specific actions are available for execution.
- confirmed_task: This column contains the natural-language task description given to the generalist agent. It is the instruction that the recorded actions are meant to fulfil.
- subdomain: The subdomain column specifies the category of the website on which each task is performed. It helps to categorize and group tasks based on their respective subdomains.
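As a minimal sketch of reading these three columns, the snippet below parses a small in-memory stand-in for train.csv using only the standard library. The sample rows are hypothetical, and two things are assumptions rather than facts about the file: that action_reprs is stored as a stringified Python list, and that confirmed_task holds the task instruction text (as the task-analysis step of this guide uses it).

```python
import ast
import csv
import io

# Hypothetical stand-in for train.csv; real rows and exact formatting may differ.
SAMPLE_CSV = '''action_reprs,confirmed_task,subdomain
"['[textbox] Search -> TYPE: sushi', '[button] Search -> CLICK']",Find a sushi restaurant,Restaurant
"['[link] Flights -> CLICK']",Open the flights page,Airlines
'''

def load_rows(handle):
    """Read CSV rows and turn action_reprs into a list of action strings."""
    rows = []
    for row in csv.DictReader(handle):
        # Assumption: the column is a stringified list such as "['a', 'b']".
        row["action_reprs"] = ast.literal_eval(row["action_reprs"])
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["subdomain"], "|", row["confirmed_task"], "|", len(row["action_reprs"]), "actions")
```

To run this against the real file, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle, for example `open("train.csv", newline="", encoding="utf-8")`.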
Familiarize Yourself with the Dataset Structure:
- Take some time to explore and understand how data is organized within this dataset.
- Identify potential patterns or relationships between different columns, such as how action_reprs corresponds with confirmed_task and subdomain.
- Look for any missing values or inconsistencies in data, which might require preprocessing before using it in your research or development projects.
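The checks for missing values and inconsistencies could look roughly like the sketch below. It counts empty cells per column and exact duplicate rows; the sample rows and the `audit` helper are hypothetical and exist only for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for train.csv with one blank cell and one repeated row.
SAMPLE_CSV = '''action_reprs,confirmed_task,subdomain
"['[button] Search -> CLICK']",Find a sushi restaurant,Restaurant
"['[link] Flights -> CLICK']",,Airlines
"['[button] Search -> CLICK']",Find a sushi restaurant,Restaurant
'''

def audit(handle):
    """Return (empty-cell count per column, number of exact duplicate rows)."""
    missing = Counter()
    seen = Counter()
    for row in csv.DictReader(handle):
        for column, value in row.items():
            # DictReader yields None for cells absent from a short row.
            if value is None or not value.strip():
                missing[column] += 1
        seen[tuple(row.values())] += 1
    duplicates = sum(count - 1 for count in seen.values() if count > 1)
    return dict(missing), duplicates

missing, duplicates = audit(io.StringIO(SAMPLE_CSV))
print("missing per column:", missing)
print("duplicate rows:", duplicates)
```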
Extraction and Cleaning of Data:
- Based on your specific research goals, identify relevant subsets of data from this dataset that align with your objectives. For example, if you are interested in studying tasks related to e-commerce websites, focus on those entries within a particular subdomain(s).
- Perform any necessary data cleaning steps, such as removing duplicates, handling missing values, or correcting erroneous entries. Ensuring high-quality data will lead to more reliable results during analysis.
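One way to sketch the subset-and-clean step is shown below: keep only rows from chosen subdomains, then skip rows with a blank instruction and exact duplicates. The `clean_subset` helper, the sample rows, and the subdomain names are hypothetical, so check the values that actually occur in your copy of the data before filtering.

```python
def clean_subset(rows, subdomains):
    """Keep rows in the wanted subdomains, skipping blanks and exact duplicates."""
    wanted = {name.lower() for name in subdomains}
    seen = set()
    kept = []
    for row in rows:
        task = (row.get("confirmed_task") or "").strip()
        subdomain = (row.get("subdomain") or "").strip()
        key = (task, subdomain, row.get("action_reprs"))
        if not task or subdomain.lower() not in wanted or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept

# Hypothetical rows in the shape of train.csv; real values may differ.
rows = [
    {"action_reprs": "['[link] Flights -> CLICK']", "confirmed_task": "Open the flights page", "subdomain": "Airlines"},
    {"action_reprs": "['[link] Flights -> CLICK']", "confirmed_task": "Open the flights page", "subdomain": "Airlines"},
    {"action_reprs": "['[button] Reserve -> CLICK']", "confirmed_task": "", "subdomain": "Restaurant"},
    {"action_reprs": "['[button] Reserve -> CLICK']", "confirmed_task": "Book a table for two", "subdomain": "Restaurant"},
    {"action_reprs": "['[link] Deals -> CLICK']", "confirmed_task": "Browse hotel deals", "subdomain": "Hotel"},
]

subset = clean_subset(rows, ["Airlines", "Restaurant"])
print(len(subset), "rows kept")
```

Matching subdomains case-insensitively is a design choice made here to tolerate inconsistent capitalization; drop the `.lower()` calls if exact matching is preferred.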
Task Analysis and Model Development:
- Task Understanding: Understand each task's requirements by analyzing its corresponding language instructions (confirmed_task column) and identify the relevant actions that need to be performed on the website (action_reprs column).
- Model Development: Utilize machine learning or natural language processing techniques to develop models capable of interpreting and executing language instructions. Train these models using the Mind2Web dataset by providing both the instructions and corresponding actions.
Evaluating Model Performance:
- Use a separate validation or test set (not included in the dataset) to evaluate your model's performance. This step is crucial for determining how well your developed model can complete new, unseen tasks accurately.
- Measure key performance metrics like accuracy,
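A minimal sketch of the evaluation step, assuming you hold out part of the rows yourself and score predictions by exact match against the reference actions. The `split_rows` and `step_accuracy` helpers are hypothetical, and positional exact-match accuracy is chosen only for illustration; it is not presented as the benchmark's official metric.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

def step_accuracy(predicted, reference):
    """Fraction of reference steps matched exactly at the same position."""
    if not reference:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if p == r)
    return hits / len(reference)

# Hypothetical action strings standing in for model output and ground truth.
reference = ["[textbox] Search -> TYPE: sushi", "[button] Search -> CLICK"]
predicted = ["[textbox] Search -> TYPE: sushi", "[link] Home -> CLICK"]

train_rows, test_rows = split_rows(range(10))
print(len(train_rows), len(test_rows))
print(step_accuracy(predicted, reference))
```

Dividing by the reference length means a prediction that stops early is penalized for the steps it never produced.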
- Training and evaluating generalist agents: The dataset can be used to train and evaluate generalist agents, which are capab...

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mind2Web 2
Mind2Web 2 is an evaluation framework for agentic search capabilities, featuring Agent-as-a-Judge methodology for comprehensive assessment of web automation agents.
Mind2Web 2 features realistic and diverse long-horizon web search tasks and a novel Agent-as-a-Judge framework to evaluate complex, time-varying, and citation-backed answers.
🔗 Links
🏠 Homepage 🏆 Leaderboard 📖 Paper 💻 Code
🔄 Changelog
Oct 23, 2025: Updated several tasks… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/Mind2Web-2.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Blog | Paper | Code | Leaderboard
Online-Mind2Web
Online-Mind2Web is the online version of Mind2Web, a more diverse and user-centric dataset that includes 300 high-quality tasks from 136 popular websites across various domains. The dataset covers a diverse set of user tasks, such as clothing, food, housing, and transportation, to evaluate web agents' performance in a real-world online environment.
News
[11/03/2025] We’ve updated 36 tasks that are no longer… See the full description on the dataset page: https://huggingface.co/datasets/osunlp/Online-Mind2Web.

Dataset Card for Dataset Name
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
Curated by: [More Information Needed]
Funded by [optional]: [More Information Needed]
Shared by [optional]: [More Information Needed]
Language(s) (NLP): [More Information Needed]
License: [More Information Needed]
Dataset Sources [optional]
Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/Izazk/izaz-mind2web-dataset.

WPRM/minibench-mm-mind2web dataset hosted on Hugging Face and contributed by the HF Datasets community

hud-evals/Online-Mind2Web-Tiny dataset hosted on Hugging Face and contributed by the HF Datasets community

hyungjoochae/minibench-multimodal-mind2web dataset hosted on Hugging Face and contributed by the HF Datasets community

Genteki/Online-Mind2Web dataset hosted on Hugging Face and contributed by the HF Datasets community

isaiahbjork/web-agent-mind2web dataset hosted on Hugging Face and contributed by the HF Datasets community

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Mind2Web Subset - Human Demonstrations
A collection of human-demonstrated web navigation tasks with detailed interaction traces. This dataset captures real browser interactions including clicks, typing, scrolling, DOM states, screenshots, and HTTP requests for web agent training and evaluation.
Overview
This dataset contains tasks performed by humans in real web environments, capturing:
Golden trajectories: Step-by-step sequences of actions (clicks, typing, navigation)… See the full description on the dataset page: https://huggingface.co/datasets/josancamon/mind2web-subset-human.

LangAGI-Lab/Multimodal-Mind2Web-HTML-WM-messages-test dataset hosted on Hugging Face and contributed by the HF Datasets community

LangAGI-Lab/Multimodal-Mind2Web-HTML-WM-messages-filter-35000 dataset hosted on Hugging Face and contributed by the HF Datasets community

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Izazk/Sequence-of-action-prediction-mind2web dataset hosted on Hugging Face and contributed by the HF Datasets community

LangAGI-Lab/Mind2Web-HTML-cleaned-lite-with-desc_w_tao_value_rationale dataset hosted on Hugging Face and contributed by the HF Datasets community

Genteki/Online-Mind2Web-Tiny dataset hosted on Hugging Face and contributed by the HF Datasets community

LangAGI-Lab/Mind2Web-axtree dataset hosted on Hugging Face and contributed by the HF Datasets community

Dataset Card for "llama2d-mind2web"
More Information needed

MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang*1†
Reuben Tan1†
Qianhui Wu1†
Ruijie Zheng2‡
Baolin Peng1‡
Yongyuan Liang2‡
Yu Gu1
Mu Cai3
Seonghyeon Ye4
Joel Jang5
Yuquan Deng5
Lars Liden1
Jianfeng Gao1▽
1 Microsoft Research; 2 University of Maryland; 3 University of Wisconsin-Madison; 4 KAIST; 5 University of Washington
* Project lead † First authors ‡ Second authors ▽ Leadership
[arXiv Paper] [Project Page] [Hugging Face Paper] [Github Repo] [Video]… See the full description on the dataset page: https://huggingface.co/datasets/MagmaAI/Magma-Mind2Web-SoM.

Dataset Card for "Mind2Web-axtree-cleaned-lite-with-refined-tao"
More Information needed