51 datasets found
  1. Empirical Analysis of Ranking Models for an Adaptable Dataset Search:...

    • figshare.com
    zip
    Updated Jun 2, 2023
    Cite
    Angelo Batista Neves Júnior; Luiz André Portes Paes Leme; Marco Antonio Casanova (2023). Empirical Analysis of Ranking Models for an Adaptable Dataset Search: complementary material [Dataset]. http://doi.org/10.6084/m9.figshare.5620651.v4
    Available download formats
    zip
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Angelo Batista Neves Júnior; Luiz André Portes Paes Leme; Marco Antonio Casanova
    License

    https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This repository contains performance measures of dataset ranking models. Usage: from Results/src, run Python results m1 m2 ..., where each mi may be omitted or be any element of the list of model labels ['bayesian-12C', 'bayesian-5L', 'bayesian-5L12C', 'cos-12C', 'cos-5L', 'cos-5L5C', 'j48-12C', 'j48-5L', 'j48-5L5C', 'jrip-12C', 'jrip-5L', 'jrip-5L5C', 'sn-12C', 'sn-5L', 'sn-5L12C']. Results of the selected models are plotted in a 2D line plot. If no model is provided, all models are listed.

  2. ‘QS World University Rankings 2017 - 2022’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 1, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘QS World University Rankings 2017 - 2022’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-qs-world-university-rankings-2017-2022-7fc4/d793e726/?iid=007-103&v=presentation
    Dataset updated
    Aug 1, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘QS World University Rankings 2017 - 2022’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/padhmam/qs-world-university-rankings-2017-2022 on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    QS World University Rankings is an annual publication of global university rankings by Quacquarelli Symonds. The QS ranking receives approval from the International Ranking Expert Group (IREG), and is viewed as one of the three most-widely read university rankings in the world. QS publishes its university rankings in partnership with Elsevier.

    Content

    This dataset contains university data from the year 2017 to 2022. It has a total of 15 features.

    • university: name of the university
    • year: year of ranking
    • rank_display: rank given to the university
    • score: score of the university based on the six key metrics mentioned above
    • link: link to the university profile page on the QS website
    • country: country in which the university is located
    • city: city in which the university is located
    • region: continent in which the university is located
    • logo: link to the logo of the university
    • type: type of university (public or private)
    • research_output: quality of research at the university
    • student_faculty_ratio: number of students per faculty member
    • international_students: number of international students enrolled at the university
    • size: size of the university in terms of area
    • faculty_count: number of faculty or academic staff at the university
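As a rough sketch of how the columns described above might be used, the following Python snippet reads rows with csv.DictReader and counts listed universities per country for one ranking year. The sample rows are invented placeholders, not values from the dataset; only the column names (university, year, rank_display, country) come from the description.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; these are NOT real dataset values.
SAMPLE_CSV = """university,year,rank_display,country
Example University A,2022,1,Country X
Example University B,2022,2,Country Y
Example University C,2022,3,Country X
Example University A,2021,2,Country X
"""


def count_by_country(csv_text, year):
    """Count rows per country for the given ranking year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Values arrive as strings, so convert the year before comparing.
        if int(row["year"]) == year:
            counts[row["country"]] += 1
    return counts


if __name__ == "__main__":
    print(count_by_country(SAMPLE_CSV, 2022))
```

To run this against the real file, the in-memory string would be replaced by an open file handle; the actual file name is not given in the description.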

    Acknowledgements

    This dataset was acquired by scraping the QS World University Rankings website with Python and Selenium.

    Inspiration

    Some of the questions that can be answered with this dataset:

    1. What makes a top-ranked university?
    2. Does the location of a university play a role in its ranking?
    3. What do the best universities have in common?
    4. How important is academic research for a university?
    5. Which country is preferred by international students?

    --- Original source retains full ownership of the source dataset ---

  3. Traces captured by visiting the top 1500 website

    • kaggle.com
    zip
    Updated Aug 25, 2021
    Cite
    DNS_dataset (2021). Traces captured by visiting the top 1500 website [Dataset]. https://www.kaggle.com/jacksontang16/traces-captured-by-visiting-the-top-1500-website
    Available download formats
    zip (5852806 bytes)
    Dataset updated
    Aug 25, 2021
    Authors
    DNS_dataset
    Description

    Dataset

    This dataset was created by DNS_dataset

    Contents

  4. TripAdvisor Datasets

    • brightdata.com
    .json, .csv, .xlsx
    Updated Nov 12, 2023
    Cite
    Bright Data (2023). TripAdvisor Datasets [Dataset]. https://brightdata.com/products/datasets/tripadvisor
    Available download formats
    .json, .csv, .xlsx
    Dataset updated
    Nov 12, 2023
    Dataset authored and provided by
    Bright Data (https://brightdata.com/)
    License

    https://brightdata.com/license

    Area covered
    Worldwide
    Description

    Unlock valuable insights with our comprehensive TripAdvisor Dataset, designed for businesses, analysts, and researchers to track customer reviews, ratings, and travel trends. This dataset provides structured and reliable data from TripAdvisor to enhance market research, competitive analysis, and customer satisfaction strategies.

    Dataset Features

    • Business Listings: Access detailed information on hotels, restaurants, attractions, and other businesses, including names, locations, categories, and contact details.
    • Customer Reviews & Ratings: Extract user-generated reviews, star ratings, review dates, and sentiment analysis to understand customer experiences and preferences.
    • Pricing & Booking Data: Track pricing trends, availability, and booking options for hotels, flights, and travel services.
    • Location & Geographical Insights: Analyze travel trends by region, city, or country to identify popular destinations and emerging markets.

    Customizable Subsets for Specific Needs

    Our TripAdvisor Dataset is fully customizable, allowing you to filter data based on location, business type, review sentiment, or specific keywords. Whether you need broad coverage for industry analysis or focused data for customer insights, we tailor the dataset to your needs.

    Popular Use Cases

    • Customer Satisfaction & Brand Monitoring: Track customer feedback, analyze sentiment, and improve service offerings based on real user reviews.
    • Market Research & Competitive Analysis: Compare business performance, monitor competitor reviews, and identify industry trends.
    • Travel & Hospitality Insights: Analyze travel patterns, popular destinations, and seasonal trends to optimize marketing strategies.
    • AI & Machine Learning Applications: Use structured review data to train AI models for sentiment analysis, recommendation engines, and predictive analytics.
    • Pricing Strategy & Revenue Optimization: Monitor pricing trends and customer demand to optimize pricing strategies for hotels, restaurants, and travel services.

    Whether you're analyzing customer sentiment, tracking travel trends, or optimizing business strategies, our TripAdvisor Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.

  5. Samsung Customer Reviews Dataset

    • cubig.ai
    Updated Jul 8, 2025
    Cite
    CUBIG (2025). Samsung Customer Reviews Dataset [Dataset]. https://cubig.ai/store/products/567/samsung-customer-reviews-dataset
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Samsung Customer Reviews Dataset contains 1,000 customer reviews of various Samsung products, including smartphones, tablets, TVs, and smartwatches. It includes user feedback, ratings, and timestamps, which are useful for sentiment analysis, customer satisfaction surveys, and product quality assessment.

    2) Data Utilization
    (1) Characteristics of the Samsung Customer Reviews Dataset:
    • Each review carries structured text and numerical information, including product name, username, rating, review title, review body, and creation date, which supports detailed per-review analysis.
    (2) The Samsung Customer Reviews Dataset can be used for:
    • Customer opinion analysis and sentiment classification: Review texts and ratings can be used to identify positive and negative customer sentiment and the major complaints and compliments about Samsung products, and to improve products and develop marketing strategies.
    • Satisfaction comparison and trend analysis by product: Analyzing review data by product group and period can reveal market trends such as popular products, changes in customer preferences, and recurring issues, which can inform competitor analysis or new product planning.

  6. CORRUPTION RANK.PHP by Country Dataset

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Jun 4, 2025
    + more versions
    Cite
    TRADING ECONOMICS (2025). CORRUPTION RANK.PHP by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/corruption-rank.php
    Available download formats
    excel, csv, json, xml
    Dataset updated
    Jun 4, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    World
    Description

    This dataset provides values for CORRUPTION RANK.PHP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

  7. Data from: Evaluation of classification techniques for identifying fake...

    • scielo.figshare.com
    jpeg
    Updated May 30, 2023
    Cite
    Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda (2023). Evaluation of classification techniques for identifying fake reviews about products and services on the internet [Dataset]. http://doi.org/10.6084/m9.figshare.14283143.v1
    Available download formats
    jpeg
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO journals
    Authors
    Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: With the growth of e-commerce, more people are buying products over the internet. To increase customer satisfaction, merchants provide spaces for product and service reviews. Products with positive reviews attract customers, while products with negative reviews lose customers. Following this idea, some individuals and corporations write fake reviews to promote their products and services or defame their competitors. The difficulty in finding these reviews lies in the large amount of information available. One solution is to use data mining techniques and tools, such as the classification function. Exploring this situation, the present work evaluates classification techniques to identify fake reviews about products and services on the Internet. The research also presents a systematic literature review on fake reviews. The research used 8 classification algorithms. The algorithms were trained and tested with a hotels database. The CONCENSO algorithm presented the best result, with 88% in the precision indicator. After the first test, the algorithms classified reviews on another hotels database. To compare the results of this new classification, the Review Skeptic algorithm was used. The SVM and GLMNET algorithms presented the highest convergence with the Review Skeptic algorithm, classifying 83% of reviews with the same result. The research contributes by demonstrating the algorithms' ability to understand consumers’ real reviews of products and services on the Internet. Another contribution is to be the pioneer in the investigation of fake reviews in Brazil and in production engineering.

  8. GDP by Country Dataset

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Jun 29, 2011
    + more versions
    Cite
    TRADING ECONOMICS (2011). GDP by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/gdp
    Available download formats
    csv, json, xml, excel
    Dataset updated
    Jun 29, 2011
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    World
    Description

    This dataset provides values for GDP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

  9. 2k-ranked-images-open-image-preferences-v1

    • huggingface.co
    Updated Jun 1, 2025
    Cite
    Rapidata (2025). 2k-ranked-images-open-image-preferences-v1 [Dataset]. https://huggingface.co/datasets/Rapidata/2k-ranked-images-open-image-preferences-v1
    Dataset updated
    Jun 1, 2025
    Dataset provided by
    Rapidata AG
    Authors
    Rapidata
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    2k Ranked Images

    This dataset contains roughly two thousand images ranked from most preferred to least preferred based on human feedback on pairwise comparisons (>25k responses). The generated images, which are a sample from the open-image-preferences-v1 dataset from the team @data-is-better-together, are rated purely based on aesthetic preference, disregarding the prompt used for generation. We provide the categories of the original dataset for easy filtering. This is a new… See the full description on the dataset page: https://huggingface.co/datasets/Rapidata/2k-ranked-images-open-image-preferences-v1.

  10. Amazon review data 2018

    • cseweb.ucsd.edu
    • nijianmo.github.io
    • +1more
    Cite
    UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
    Dataset authored and provided by
    UCSD CSE Research Project
    Description

    Context

    This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

    • More reviews:
      • The total number of reviews is 233.1 million (142.8 million in 2014).
    • New reviews:
      • Current data includes reviews in the range May 1996 - Oct 2018.
    • Metadata:
      • We have added transaction metadata for each review shown on the review page.
      • Added more detailed metadata of the product landing page.

    Acknowledgements

    If you publish articles based on this dataset, please cite the following paper:

    • Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.

  11. Movehub City Rankings

    • kaggle.com
    zip
    Updated Mar 24, 2017
    Cite
    Blitzer (2017). Movehub City Rankings [Dataset]. https://www.kaggle.com/blitzr/movehub-city-rankings
    Available download formats
    zip (34310 bytes)
    Dataset updated
    Mar 24, 2017
    Authors
    Blitzer
    Description

    Context

    Movehub city ranking as published on http://www.movehub.com/city-rankings

    Content

    movehubqualityoflife.csv

    Cities ranked by
    Movehub Rating: A combination of all scores for an overall rating for a city or country.
    Purchase Power: This compares the average cost of living with the average local wage.
    Health Care: Compiled from how citizens feel about their access to healthcare, and its quality.
    Pollution: Low is good. A score of how polluted people find a city, includes air, water and noise pollution.
    Quality of Life: A balance of healthcare, pollution, purchase power, crime rate to give an overall quality of life score.
    Crime Rating: Low is good. The lower the score the safer people feel in this city.

    movehubcostofliving.csv

    Unit: GBP
    City
    Cappuccino
    Cinema
    Wine
    Gasoline
    Avg Rent
    Avg Disposable Income

    cities.csv

    Cities to countries as parsed from Wikipedia https://en.wikipedia.org/wiki/List_of_towns_and_cities_with_100,000_or_more_inhabitants/cityname:_A (A-Z)

    Acknowledgements

    Movehub

    http://www.movehub.com/city-rankings

    Wikipedia

    https://en.wikipedia.org/wiki/List_of_towns_and_cities_with_100,000_or_more_inhabitants/cityname:_A

  12. Data from: ReviewRebuttal

    • huggingface.co
    Updated May 11, 2025
    + more versions
    Cite
    Zhang (2025). ReviewRebuttal [Dataset]. https://huggingface.co/datasets/Daoze/ReviewRebuttal
    Dataset updated
    May 11, 2025
    Authors
    Zhang
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Introduction

    This dataset is the largest real-world consistency-ensured dataset for peer review, which features the widest range of conferences and the most complete review stages, including initial submissions, reviews, ratings and confidence, aspect ratings, rebuttals, discussions, score changes, meta-reviews, and final decisions.

    Comparison with Existing Datasets

    The comparison between our proposed dataset and existing peer review datasets is given below. Only the… See the full description on the dataset page: https://huggingface.co/datasets/Daoze/ReviewRebuttal.

  13. IMDb Movie Reviews Dataset

    • paperswithcode.com
    Updated Dec 20, 2013
    Cite
    Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts (2013). IMDb Movie Reviews Dataset [Dataset]. https://paperswithcode.com/dataset/imdb-movie-reviews
    Dataset updated
    Dec 20, 2013
    Authors
    Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts
    Description

    The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.
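The labeling rule described above can be sketched as a small Python function. This is only an illustration of the stated thresholds (≤ 4 is negative, ≥ 7 is positive); the function name label_from_score is hypothetical and is not part of any dataset tooling.

```python
def label_from_score(score):
    """Map a rating out of 10 to a sentiment label using the stated thresholds."""
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    # Scores of 5 and 6 fall between the thresholds and get no label,
    # consistent with only highly polarizing reviews being considered.
    return None


if __name__ == "__main__":
    for s in (1, 4, 5, 6, 7, 10):
        print(s, label_from_score(s))
```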

  14. Resources of Global City Comparison Indicators

    • cloud.csiss.gmu.edu
    • data.wu.ac.at
    xls
    Updated Jun 5, 2015
    Cite
    Greater London Authority (GLA) (2015). Resources of Global City Comparison Indicators [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/resources-of-global-city-comparison-indicators
    Available download formats
    xls
    Dataset updated
    Jun 5, 2015
    Dataset provided by
    Greater London Authority (GLA)
    Description

  15. Apple iPhone SE reviews & ratings Dataset

    • cubig.ai
    Updated Feb 25, 2025
    Cite
    CUBIG (2025). Apple iPhone SE reviews & ratings Dataset [Dataset]. https://cubig.ai/store/products/143/apple-iphone-se-reviews-ratings-dataset
    Dataset updated
    Feb 25, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data introduction
    • The Apple-iphone-se-reviews dataset was scraped from the Flipkart website using Selenium and BeautifulSoup.

    2) Data utilization
    (1) Characteristics of the Apple-iphone-se-reviews data:
    • The dataset covers user ratings for the Apple iPhone SE on the Indian e-commerce website Flipkart. It is aimed at NLP text classification using user ratings, review titles, and review text.
    (2) The Apple-iphone-se-reviews data can be used for:
    • Rating prediction: Machine learning models that predict ratings from review text can support automated review analysis and summarization.
    • Product improvement: Insights gained from reviews can help identify common issues and areas for improvement in the iPhone SE and guide product development and quality improvements.

  16. Data from: Analysis of intelligent vehicle technologies to improve...

    • datadryad.org
    • explore.openaire.eu
    • +1more
    zip
    Updated Oct 14, 2022
    Cite
    Ivan Runhua Xiao; Xiaodong Qian (2022). Analysis of intelligent vehicle technologies to improve vulnerable road users safety at signalized intersections [Dataset]. http://doi.org/10.25338/B8234N
    Available download formats
    zip
    Dataset updated
    Oct 14, 2022
    Dataset provided by
    Dryad
    Authors
    Ivan Runhua Xiao; Xiaodong Qian
    Time period covered
    2022
    Description

    The data files can be viewed in Excel.

  17. MSLR WEB30K Dataset

    • paperswithcode.com
    Updated Apr 14, 2025
    + more versions
    Cite
    Tao Qin; Tie-Yan Liu (2025). MSLR WEB30K Dataset [Dataset]. https://paperswithcode.com/dataset/mslr-web30k
    Dataset updated
    Apr 14, 2025
    Authors
    Tao Qin; Tie-Yan Liu
    Description

    The datasets are machine learning data, in which queries and urls are represented by IDs. The datasets consist of feature vectors extracted from query-url pairs along with relevance judgment labels:

    (1) The relevance judgments are obtained from a retired labeling set of a commercial web search engine (Microsoft Bing) and take 5 values from 0 (irrelevant) to 4 (perfectly relevant).

    (2) The features are basically extracted by us, and are those widely used in the research community.

    In the data files, each row corresponds to a query-url pair. The first column is the relevance label of the pair, the second column is the query id, and the following columns are features. The larger the relevance label, the more relevant the query-url pair. Each query-url pair is represented by a 136-dimensional feature vector.
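A minimal Python sketch of reading one such row is shown below. It assumes the LETOR-style text layout `label qid:<id> <index>:<value> ...`, which the description above does not spell out, and the sample line is a made-up placeholder rather than real data. The function name parse_line is hypothetical.

```python
def parse_line(line):
    """Parse one row into (relevance, query_id, features)."""
    tokens = line.split()
    # First column: integer relevance label (0 to 4 per the description).
    relevance = int(tokens[0])
    # Second column: assumed token shape "qid:<id>"; the id is kept as a string.
    query_id = tokens[1].split(":", 1)[1]
    # Remaining columns: assumed "<index>:<value>" feature pairs.
    features = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return relevance, query_id, features


if __name__ == "__main__":
    # Placeholder row with only three of the 136 features, for illustration.
    print(parse_line("2 qid:10 1:0.5 2:3 136:1.25"))
```

Storing features in a dict keyed by index keeps the sketch tolerant of rows that omit some indices; a fixed-length list of 136 values would work equally well if every row is complete.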

  18. WONDERBREAD: A Benchmark + Dataset for Business Process Management (BPM)...

    • zenodo.org
    csv, json, zip
    Updated Oct 14, 2024
    Cite
    Michael Wornow; Michael Wornow (2024). WONDERBREAD: A Benchmark + Dataset for Business Process Management (BPM) Tasks [Dataset]. http://doi.org/10.5281/zenodo.12671568
    Available download formats
    csv, zip, json
    Dataset updated
    Oct 14, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michael Wornow; Michael Wornow
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 6, 2024
    Description

    Paper: WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

    Background

    The WONDERBREAD dataset contains 2,928 human demonstrations of 598 web navigation workflows across 6 types of BPM tasks. These tasks measure the ability of a model to generate accurate documentation, assist in knowledge transfer, and improve the efficiency of workflows.

    Please see our website for more details: https://wonderbread.stanford.edu/

    Quick Start

    To start, download debug_demos.zip (~1 GB). It contains a subset of 24 demonstrations which can give you a sense of how the dataset is structured.

    To reproduce the paper, download gold_demos.zip (~33 GB). It contains 724 demonstrations corresponding to the 162 "Gold" tasks which were used for all the evaluations in the original paper.

    To obtain the full dataset, download demos.zip (~133 GB). This contains all 2,928 demonstrations and can be used for training, fine-tuning, and evaluating models.

    Dataset Structure

    The dataset contains several files, defined below.

    1. Raw Data (useful for training/fine-tuning/evaluation)
      1. debug_demos.zip -- a subset of only 24 demonstrations taken from the full dataset. Useful to get a sense of the dataset and for debugging.
      2. gold_demos.zip -- a subset of only 724 demonstrations corresponding to the 162 "Gold" tasks. This is the dataset that was used for all evaluations in the original WONDERBREAD paper.
      3. demos.zip -- all 2,928 demonstrations across 598 tasks. Useful for training your own models.
    2. Evaluation (useful for evaluation)
      1. qa_dataset.csv -- contains all 120 questions and ground truth answers used in the "Knowledge Transfer" evaluation.
      2. df_rankings.csv -- contains the rankings of all "Gold" tasks used in the "SOP Ranking" evaluation.
    3. Metadata (can be safely ignored)
      1. Process Mining Task Demonstrations.xlsx -- maps human annotators to specific demonstrations; also contains "Gold" task rankings used in the "SOP Ranking" evaluation.
      2. metadata.json -- maps Google Drive URLs to Google Drive Folder IDs to demonstration names
      3. df_valid.csv -- tracks assets associated with each demonstration

  19. Entity Relatedness Test Dataset - V2

    • figshare.com
    zip
    Updated May 15, 2017
    + more versions
    Cite
    José Eduardo Talavera Herrera; Marco Antonio Casanova (2017). Entity Relatedness Test Dataset - V2 [Dataset]. http://doi.org/10.6084/m9.figshare.5007983.v1
    Available download formats
    zip
    Dataset updated
    May 15, 2017
    Dataset provided by
    figshare
    Authors
    José Eduardo Talavera Herrera; Marco Antonio Casanova
    License

    https://www.gnu.org/copyleft/gpl.html

    Description

    The entity relatedness problem refers to the question of computing the relationship paths that better describe the connectivity between a given entity pair. This dataset supports the evaluation of approaches that address the entity relatedness problem. It covers two familiar domains, music and movies, and uses data available in IMDb and last.fm, which are popular reference datasets in these domains. The dataset contains 20 entity pairs from each of these domains and, for each entity pair, a ranked list with 50 relationship paths. It also contains entity ratings and property relevance scores for the entities and properties used in the paths. (This version supersedes the previous one.)

  20. amazon_us_reviews

    • huggingface.co
    • tensorflow.org
    Updated Jun 30, 2023
    Cite
    Polina Kazakova (2023). amazon_us_reviews [Dataset]. https://huggingface.co/datasets/polinaeterna/amazon_us_reviews
    Dataset updated
    Jun 30, 2023
    Authors
    Polina Kazakova
    License

    https://choosealicense.com/licenses/other/

    Description

    Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.

    More than 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).

    Each Dataset contains the following columns:

    • marketplace: 2 letter country code of the marketplace where the review was written.
    • customer_id: Random identifier that can be used to aggregate reviews written by a single author.
    • review_id: The unique ID of the review.
    • product_id: The unique Product ID the review pertains to. In the multilingual dataset the reviews for the same product in different countries can be grouped by the same product_id.
    • product_parent: Random identifier that can be used to aggregate reviews for the same product.
    • product_title: Title of the product.
    • product_category: Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).
    • star_rating: The 1-5 star rating of the review.
    • helpful_votes: Number of helpful votes.
    • total_votes: Number of total votes the review received.
    • vine: Review was written as part of the Vine program.
    • verified_purchase: The review is on a verified purchase.
    • review_headline: The title of the review.
    • review_body: The review text.
    • review_date: The date the review was written.
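
Because the files are tab delimited with no quote or escape characters, a reader should turn off quote handling. The Python sketch below does this with the standard csv module. It assumes each file starts with a header row naming the columns listed above, which the description does not state, and the sample row is an invented placeholder rather than real review data. The function name read_reviews is hypothetical.

```python
import csv
import io

# Column names as listed in the dataset description.
COLUMNS = [
    "marketplace", "customer_id", "review_id", "product_id",
    "product_parent", "product_title", "product_category",
    "star_rating", "helpful_votes", "total_votes", "vine",
    "verified_purchase", "review_headline", "review_body", "review_date",
]

# Placeholder row for illustration only; NOT a real review.
_ROW = [
    "US", "0", "R0", "P0", "0", 'Example "quoted" title', "Example",
    "5", "1", "2", "N", "Y", "Headline", "Body text", "2015-01-01",
]
SAMPLE_TSV = "\t".join(COLUMNS) + "\n" + "\t".join(_ROW) + "\n"


def read_reviews(tsv_file):
    """Yield one dict per review from a tab-delimited file object."""
    # QUOTE_NONE makes the reader treat quote characters as ordinary text,
    # matching the "no quote and escape characters" note in the description.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        row["star_rating"] = int(row["star_rating"])
        yield row


if __name__ == "__main__":
    for review in read_reviews(io.StringIO(SAMPLE_TSV)):
        print(review["product_title"], review["star_rating"])
```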