51 datasets found

Empirical Analysis of Ranking Models for an Adaptable Dataset Search:...
figshare.com
zip
Updated Jun 2, 2023
Cite
Angelo Batista Neves Júnior; Luiz André Portes Paes Leme; Marco Antonio Casanova (2023). Empirical Analysis of Ranking Models for an Adaptable Dataset Search: complementary material [Dataset]. http://doi.org/10.6084/m9.figshare.5620651.v4
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5620651.v4
Dataset updated
Jun 2, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Angelo Batista Neves Júnior; Luiz André Portes Paes Leme; Marco Antonio Casanova
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
This repository contains performance measures of dataset ranking models. Usage: from Results/src, run Python results m1 m2 ..., where each mi can be omitted or can be any element of the list of model labels ['bayesian-12C', 'bayesian-5L', 'bayesian-5L12C', 'cos-12C', 'cos-5L', 'cos-5L5C', 'j48-12C', 'j48-5L', 'j48-5L5C', 'jrip-12C', 'jrip-5L', 'jrip-5L5C', 'sn-12C', 'sn-5L', 'sn-5L12C']. Results of the selected models will be plotted in a 2D line plot. If no model is provided, all models will be listed.
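The argument handling described above can be sketched roughly as follows. This is only an illustration of the stated rule (each argument must be one of the listed model labels, and no arguments means all labels are listed); the function name select_models is hypothetical and is not taken from the repository's script.

```python
from typing import List

# Model labels copied from the dataset description.
MODEL_LABELS = [
    'bayesian-12C', 'bayesian-5L', 'bayesian-5L12C',
    'cos-12C', 'cos-5L', 'cos-5L5C',
    'j48-12C', 'j48-5L', 'j48-5L5C',
    'jrip-12C', 'jrip-5L', 'jrip-5L5C',
    'sn-12C', 'sn-5L', 'sn-5L12C',
]


def select_models(args: List[str]) -> List[str]:
    """Hypothetical helper: validate labels given on the command line.

    With no arguments, every known label is returned (the description
    says all models are listed in that case).
    """
    if not args:
        return list(MODEL_LABELS)
    unknown = [a for a in args if a not in MODEL_LABELS]
    if unknown:
        raise ValueError(f"unknown model label(s): {unknown}")
    return list(args)
```

How the selected models are then plotted is not covered here, since the description does not say how the script does it.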
‘QS World University Rankings 2017 - 2022’ analyzed by Analyst-2
analyst-2.ai
Updated Aug 1, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘QS World University Rankings 2017 - 2022’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-qs-world-university-rankings-2017-2022-7fc4/d793e726/?iid=007-103&v=presentation
Dataset updated
Aug 1, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘QS World University Rankings 2017 - 2022’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/padhmam/qs-world-university-rankings-2017-2022 on 13 February 2022.

--- Dataset description provided by original source is as follows ---

Context

QS World University Rankings is an annual publication of global university rankings by Quacquarelli Symonds. The QS ranking is approved by the International Ranking Expert Group (IREG) and is viewed as one of the three most widely read university rankings in the world. QS publishes its university rankings in partnership with Elsevier.

Content

This dataset contains university data from 2017 to 2022. It has a total of 15 features.
- university: name of the university
- year: year of ranking
- rank_display: rank given to the university
- score: score of the university based on the six key metrics mentioned above
- link: link to the university profile page on the QS website
- country: country in which the university is located
- city: city in which the university is located
- region: continent in which the university is located
- logo: link to the logo of the university
- type: type of university (public or private)
- research_output: quality of research at the university
- student_faculty_ratio: number of students per faculty member
- international_students: number of international students enrolled at the university
- size: size of the university in terms of area
- faculty_count: number of faculty or academic staff at the university

Acknowledgements

This dataset was acquired by scraping the QS World University Rankings website with Python and Selenium. Cover Image: Source

Inspiration

Some of the questions that can be answered with this dataset:
1. What makes a top-ranked university?
2. Does the location of a university play a role in its ranking?
3. What do the best universities have in common?
4. How important is academic research for a university?
5. Which country is preferred by international students?

--- Original source retains full ownership of the source dataset ---
Traces captured by visiting the top 1500 website
kaggle.com
zip
Updated Aug 25, 2021
Cite
DNS_dataset (2021). Traces captured by visiting the top 1500 website [Dataset]. https://www.kaggle.com/jacksontang16/traces-captured-by-visiting-the-top-1500-website
Available download formats: zip (5852806 bytes)
Dataset updated
Aug 25, 2021
Authors
DNS_dataset
Description
Dataset

This dataset was created by DNS_dataset

Contents
TripAdvisor Datasets
brightdata.com
.json, .csv, .xlsx
Updated Nov 12, 2023
Cite
Bright Data (2023). TripAdvisor Datasets [Dataset]. https://brightdata.com/products/datasets/tripadvisor
Available download formats: .json, .csv, .xlsx
Dataset updated
Nov 12, 2023
Dataset authored and provided by
Bright Data (https://brightdata.com/)
License
https://brightdata.com/license
Area covered
Worldwide
Description
Unlock valuable insights with our comprehensive TripAdvisor Dataset, designed for businesses, analysts, and researchers to track customer reviews, ratings, and travel trends. This dataset provides structured and reliable data from TripAdvisor to enhance market research, competitive analysis, and customer satisfaction strategies.

Dataset Features

Business Listings: Access detailed information on hotels, restaurants, attractions, and other businesses, including names, locations, categories, and contact details.
Customer Reviews & Ratings: Extract user-generated reviews, star ratings, review dates, and sentiment analysis to understand customer experiences and preferences.
Pricing & Booking Data: Track pricing trends, availability, and booking options for hotels, flights, and travel services.
Location & Geographical Insights: Analyze travel trends by region, city, or country to identify popular destinations and emerging markets.

Customizable Subsets for Specific Needs

Our TripAdvisor Dataset is fully customizable, allowing you to filter data based on location, business type, review sentiment, or specific keywords. Whether you need broad coverage for industry analysis or focused data for customer insights, we tailor the dataset to your needs.

Popular Use Cases

Customer Satisfaction & Brand Monitoring: Track customer feedback, analyze sentiment, and improve service offerings based on real user reviews.
Market Research & Competitive Analysis: Compare business performance, monitor competitor reviews, and identify industry trends.
Travel & Hospitality Insights: Analyze travel patterns, popular destinations, and seasonal trends to optimize marketing strategies.
AI & Machine Learning Applications: Use structured review data to train AI models for sentiment analysis, recommendation engines, and predictive analytics.
Pricing Strategy & Revenue Optimization: Monitor pricing trends and customer demand to optimize pricing strategies for hotels, restaurants, and travel services.

Whether you're analyzing customer sentiment, tracking travel trends, or optimizing business strategies, our TripAdvisor Dataset provides the structured data you need. Get started today and customize your dataset to fit your business objectives.
Samsung Customer Reviews Dataset
cubig.ai
Updated Jul 8, 2025
Cite
CUBIG (2025). Samsung Customer Reviews Dataset [Dataset]. https://cubig.ai/store/products/567/samsung-customer-reviews-dataset
Dataset updated
Jul 8, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Samsung Customer Reviews Dataset contains 1,000 customer reviews of various Samsung products, including smartphones, tablets, TVs, and smartwatches. User feedback, ratings, and timestamps are included, which are useful for sentiment analysis, customer satisfaction surveys, and product quality assessment.

2) Data Utilization
(1) Characteristics of the Samsung Customer Reviews Dataset:
• This dataset contains structured text and numerical information for each review, including product name, username, rating, review title, review body, and creation date, which allows detailed analysis of individual reviews.
(2) The Samsung Customer Reviews Dataset can be used for:
• Customer opinion analysis and sentiment classification: Review texts and ratings can be used to identify customers' positive and negative sentiment and the major complaints and compliments about Samsung products, in order to improve products and develop marketing strategies.
• Satisfaction comparison and trend analysis by product: By analyzing review data by product group and period, market trends such as popular products, changes in customer preferences, and repeatedly mentioned issues can be derived and used for competitor analysis or new product planning.
CORRUPTION RANK.PHP by Country Dataset
tradingeconomics.com
csv, excel, json, xml
Updated Jun 4, 2025
+ more versions
Cite
TRADING ECONOMICS (2025). CORRUPTION RANK.PHP by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/corruption-rank.php
Available download formats: excel, csv, json, xml
Dataset updated
Jun 4, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2025
Area covered
World
Description
This dataset provides values for CORRUPTION RANK.PHP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Data from: Evaluation of classification techniques for identifying fake...
scielo.figshare.com
jpeg
Updated May 30, 2023
Cite
Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda (2023). Evaluation of classification techniques for identifying fake reviews about products and services on the internet [Dataset]. http://doi.org/10.6084/m9.figshare.14283143.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.14283143.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELO journals
Authors
Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: With the growth of e-commerce, more people are buying products over the internet. To increase customer satisfaction, merchants provide spaces for product and service reviews. Products with positive reviews attract customers, while products with negative reviews lose customers. Following this idea, some individuals and corporations write fake reviews to promote their products and services or defame their competitors. The difficulty in finding these reviews lies in the large amount of information available. One solution is to use data mining techniques and tools, such as the classification function. Exploring this situation, the present work evaluates classification techniques to identify fake reviews about products and services on the Internet. The research also presents a systematic literature review on fake reviews. The research used 8 classification algorithms. The algorithms were trained and tested with a hotel database. The CONCENSO algorithm presented the best result, with 88% in the precision indicator. After the first test, the algorithms classified reviews from another hotel database. To compare the results of this new classification, the Review Skeptic algorithm was used. The SVM and GLMNET algorithms presented the highest convergence with the Review Skeptic algorithm, classifying 83% of reviews with the same result. The research contributes by demonstrating the algorithms' ability to understand consumers’ real reviews of products and services on the Internet. Another contribution is being the pioneer in the investigation of fake reviews in Brazil and in production engineering.
GDP by Country Dataset
tradingeconomics.com
csv, excel, json, xml
Updated Jun 29, 2011
+ more versions
Cite
TRADING ECONOMICS (2011). GDP by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/gdp
Available download formats: csv, json, xml, excel
Dataset updated
Jun 29, 2011
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2025
Area covered
World
Description
This dataset provides values for GDP reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
2k-ranked-images-open-image-preferences-v1
huggingface.co
Updated Jun 1, 2025
Cite
Rapidata (2025). 2k-ranked-images-open-image-preferences-v1 [Dataset]. https://huggingface.co/datasets/Rapidata/2k-ranked-images-open-image-preferences-v1
Dataset updated
Jun 1, 2025
Dataset provided by
Rapidata AG
Authors
Rapidata
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
2k Ranked Images

This dataset contains roughly two thousand images ranked from most preferred to least preferred based on human feedback on pairwise comparisons (>25k responses). The generated images, which are a sample from the open-image-preferences-v1 dataset from the team @data-is-better-together, are rated purely based on aesthetic preference, disregarding the prompt used for generation. We provide the categories of the original dataset for easy filtering. This is a new… See the full description on the dataset page: https://huggingface.co/datasets/Rapidata/2k-ranked-images-open-image-preferences-v1.
Amazon review data 2018
cseweb.ucsd.edu
nijianmo.github.io
+1more
Cite
UCSD CSE Research Project, Amazon review data 2018 [Dataset]. https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
Dataset authored and provided by
UCSD CSE Research Project
Description
Context

This Dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

More reviews: The total number of reviews is 233.1 million (142.8 million in 2014).

New reviews: Current data includes reviews in the range May 1996 - Oct 2018.

Metadata: We have added transaction metadata for each review shown on the review page. Added more detailed metadata of the product landing page.

Acknowledgements

If you publish articles based on this dataset, please cite the following paper:

Jianmo Ni, Jiacheng Li, Julian McAuley. Justifying recommendations using distantly-labeled reviews and fine-grained aspects. EMNLP, 2019.
Movehub City Rankings
kaggle.com
zip
Updated Mar 24, 2017
Cite
Blitzer (2017). Movehub City Rankings [Dataset]. https://www.kaggle.com/blitzr/movehub-city-rankings
Available download formats: zip (34310 bytes)
Dataset updated
Mar 24, 2017
Authors
Blitzer
Description
Context

Movehub city ranking as published on http://www.movehub.com/city-rankings

Content

movehubqualityoflife.csv

Cities ranked by
Movehub Rating: A combination of all scores for an overall rating for a city or country.
Purchase Power: This compares the average cost of living with the average local wage.
Health Care: Compiled from how citizens feel about their access to healthcare, and its quality.
Pollution: Low is good. A score of how polluted people find a city, includes air, water and noise pollution.
Quality of Life: A balance of healthcare, pollution, purchase power, crime rate to give an overall quality of life score.
Crime Rating: Low is good. The lower the score the safer people feel in this city.

movehubcostofliving.csv

Unit: GBP
City
Cappuccino
Cinema
Wine
Gasoline
Avg Rent
Avg Disposable Income

cities.csv

Cities to countries as parsed from Wikipedia https://en.wikipedia.org/wiki/List_of_towns_and_cities_with_100,000_or_more_inhabitants/cityname:_A (A-Z)

Acknowledgements

Movehub

http://www.movehub.com/city-rankings

Wikipedia

https://en.wikipedia.org/wiki/List_of_towns_and_cities_with_100,000_or_more_inhabitants/cityname:_A
Data from: ReviewRebuttal
huggingface.co
Updated May 11, 2025
+ more versions
Cite
Zhang (2025). ReviewRebuttal [Dataset]. https://huggingface.co/datasets/Daoze/ReviewRebuttal
Dataset updated
May 11, 2025
Authors
Zhang
License
Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Introduction

This dataset is the largest real-world consistency-ensured dataset for peer review, which features the widest range of conferences and the most complete review stages, including initial submissions, reviews, ratings and confidence, aspect ratings, rebuttals, discussions, score changes, meta-reviews, and final decisions.

Comparison with Existing Datasets

The comparison between our proposed dataset and existing peer review datasets is given below. Only the… See the full description on the dataset page: https://huggingface.co/datasets/Daoze/ReviewRebuttal.
IMDb Movie Reviews Dataset
paperswithcode.com
Updated Dec 20, 2013
Cite
Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts (2013). IMDb Movie Reviews Dataset [Dataset]. https://paperswithcode.com/dataset/imdb-movie-reviews
Dataset updated
Dec 20, 2013
Authors
Andrew L. Maas; Raymond E. Daly; Peter T. Pham; Dan Huang; Andrew Y. Ng; Christopher Potts
Description
The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.
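As a minimal sketch of the labeling rule stated above (scores of 4 or less out of 10 are negative, scores of 7 or more are positive, and anything in between is not included), the thresholds can be written as a small function. The function name and the use of None for excluded scores are assumptions made for illustration, not part of the dataset's tooling.

```python
from typing import Optional


def sentiment_label(score: int) -> Optional[str]:
    """Map a 1-10 review score to a binary sentiment label.

    Returns None for mid-range scores (5 and 6), which the description
    says are not considered because only polarized reviews are kept.
    """
    if score <= 4:
        return "negative"
    if score >= 7:
        return "positive"
    return None
```

For example, sentiment_label(3) gives "negative" and sentiment_label(8) gives "positive", while a score of 5 or 6 yields None.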
Resources of Global City Comparison Indicators
cloud.csiss.gmu.edu
data.wu.ac.at
xls
Updated Jun 5, 2015
Cite
Greater London Authority (GLA) (2015). Resources of Global City Comparison Indicators [Dataset]. https://cloud.csiss.gmu.edu/uddi/dataset/resources-of-global-city-comparison-indicators
Available download formats: xls
Dataset updated
Jun 5, 2015
Dataset provided by
Greater London Authority (GLA)
Description
A list of some key resources for comparing London with other world cities.

European Union/Eurostat, Urban Audit

Arcadis, Sustainable cities index

AT Kearney, Global Cities Index

McKinsey, Urban world: Mapping the economic power of cities

Knight Frank, Wealth report

OECD, Better Life Index

UNODC, Statistics on drugs, crime and criminal justice at the international level

Economist, Hot Spots

Economist, Global Liveability Ranking and Report August 2014

Mercer, Quality of Living Reports

PWC, Cities of opportunity

BCG, Decoding Global Talent

Forbes, World's most influential cities

Mastercard, Global Destination Cities Index

Numbeo, Database of user contributed data
Apple iPhone SE reviews & ratings Dataset
cubig.ai
Updated Feb 25, 2025
Cite
CUBIG (2025). Apple iPhone SE reviews & ratings Dataset [Dataset]. https://cubig.ai/store/products/143/apple-iphone-se-reviews-ratings-dataset
Dataset updated
Feb 25, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data introduction
• The Apple-iphone-se-reviews dataset was scraped from the Flipkart website using Selenium and BeautifulSoup.

2) Data utilization
(1) Characteristics of the Apple-iphone-se-reviews data:
• User ratings for the Apple iPhone SE on the Indian e-commerce website Flipkart are included. The aim is NLP text classification using user ratings, review titles, and review text.
(2) The Apple-iphone-se-reviews data can be used for:
• Rating prediction: Machine learning models that predict ratings from review text can support automated review analysis and summarization.
• Product improvement: Insights gained from reviews can help identify common issues and areas for improvement in the iPhone SE and guide product development and quality improvements.
Data from: Analysis of intelligent vehicle technologies to improve...
datadryad.org
explore.openaire.eu
+1more
zip
Updated Oct 14, 2022
Cite
Ivan Runhua Xiao; Xiaodong Qian (2022). Analysis of intelligent vehicle technologies to improve vulnerable road users safety at signalized intersections [Dataset]. http://doi.org/10.25338/B8234N
Available download formats: zip
Unique identifier
https://doi.org/10.25338/B8234N
Dataset updated
Oct 14, 2022
Dataset provided by
Dryad
Authors
Ivan Runhua Xiao; Xiaodong Qian
Time period covered
2022
Description
The data files can be viewed with Excel.
MSLR WEB30K Dataset
paperswithcode.com
Updated Apr 14, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tao Qin; Tie-Yan Liu (2025). MSLR WEB30K Dataset [Dataset]. https://paperswithcode.com/dataset/mslr-web30k
Dataset updated
Apr 14, 2025
Authors
Tao Qin; Tie-Yan Liu
Description
The datasets are machine learning data, in which queries and urls are represented by IDs. The datasets consist of feature vectors extracted from query-url pairs along with relevance judgment labels:

(1) The relevance judgments are obtained from a retired labeling set of a commercial web search engine (Microsoft Bing), and take 5 values from 0 (irrelevant) to 4 (perfectly relevant).

(2) The features are basically extracted by us, and are those widely used in the research community.

In the data files, each row corresponds to a query-url pair. The first column is the relevance label of the pair, the second column is the query id, and the following columns are features. The larger the relevance label, the more relevant the query-url pair. A query-url pair is represented by a 136-dimensional feature vector.
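The row layout described above can be parsed with a few lines of Python. This sketch assumes, beyond what the description states, that the query id is written as a `qid:<id>` token and that each feature is written as a 1-based `index:value` token (the SVMlight-style convention); the function name parse_row is hypothetical.

```python
from typing import List, Tuple

NUM_FEATURES = 136  # dimensionality stated in the description


def parse_row(line: str) -> Tuple[int, str, List[float]]:
    """Split one row into (relevance label, query id, feature vector).

    Assumed token layout: "<label> qid:<id> <index>:<value> ...",
    with feature indices starting at 1. Missing features stay 0.0.
    """
    tokens = line.split()
    label = int(tokens[0])                # first column: relevance label (0-4)
    qid = tokens[1].split(":", 1)[1]      # second column: query id
    features = [0.0] * NUM_FEATURES
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index) - 1] = float(value)
    return label, qid, features
```

Rows that share the same query id form one ranking list, so a loader would typically group the parsed rows by qid before training or evaluating a ranker.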
WONDERBREAD: A Benchmark + Dataset for Business Process Management (BPM)...
zenodo.org
csv, json, zip
Updated Oct 14, 2024
Cite
Michael Wornow; Michael Wornow (2024). WONDERBREAD: A Benchmark + Dataset for Business Process Management (BPM) Tasks [Dataset]. http://doi.org/10.5281/zenodo.12671568
Available download formats: csv, zip, json
Unique identifier
https://doi.org/10.5281/zenodo.12671568
Dataset updated
Oct 14, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Michael Wornow; Michael Wornow
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 6, 2024
Description
Paper: WONDERBREAD: A Benchmark for Evaluating Multimodal Foundation Models on Business Process Management Tasks

Background

The WONDERBREAD dataset contains 2,928 human demonstrations of 598 web navigation workflows across 6 types of BPM tasks. These tasks measure the ability of a model to generate accurate documentation, assist in knowledge transfer, and improve the efficiency of workflows.

Please see our website for more details: https://wonderbread.stanford.edu/

Quick Start

To start, download debug_demos.zip (~1 GB). It contains a subset of 24 demonstrations which can give you a sense of how the dataset is structured.

To reproduce the paper, download gold_demos.zip (~33 GB). It contains 724 demonstrations corresponding to the 162 "Gold" tasks which were used for all the evaluations in the original paper.

To obtain the full dataset, download demos.zip (~133 GB). This contains all 2,928 demonstrations and can be used for training, fine-tuning, and evaluating models.

Dataset Structure

The dataset contains several files, defined below.

Raw Data (useful for training/fine-tuning/evaluation)

debug_demos.zip -- a subset of only 24 demonstrations taken from the full dataset. Useful to get a sense of the dataset and for debugging.

gold_demos.zip -- a subset of only 724 demonstrations corresponding to the 162 "Gold" tasks. This is the dataset that was used for all evaluations in the original WONDERBREAD paper.

demos.zip -- all 2,928 demonstrations across 598 tasks. Useful for training your own models.

Evaluation (useful for evaluation)

qa_dataset.csv -- contains all 120 questions and ground truth answers used in the "Knowledge Transfer" evaluation.

df_rankings.csv -- contains the rankings of all "Gold" tasks used in the "SOP Ranking" evaluation.

Metadata (can be safely ignored)

Process Mining Task Demonstrations.xlsx -- maps human annotators to specific demonstrations; also contains "Gold" task rankings used in the "SOP Ranking" evaluation.

metadata.json -- maps Google Drive URLs to Google Drive Folder IDs to demonstration names

df_valid.csv -- tracks assets associated with each demonstration
Entity Relatedness Test Dataset - V2
figshare.com
zip
Updated May 15, 2017
+ more versions
Cite
José Eduardo Talavera Herrera; Marco Antonio Casanova (2017). Entity Relatedness Test Dataset - V2 [Dataset]. http://doi.org/10.6084/m9.figshare.5007983.v1
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.5007983.v1
Dataset updated
May 15, 2017
Dataset provided by
figshare
Authors
José Eduardo Talavera Herrera; Marco Antonio Casanova
License
https://www.gnu.org/copyleft/gpl.html
Description
The entity relatedness problem refers to the question of computing the relationship paths that best describe the connectivity between a given entity pair. This dataset supports the evaluation of approaches that address the entity relatedness problem. It covers two familiar domains, music and movies, and uses data available in IMDb and last.fm, which are popular reference datasets in these domains. The dataset contains 20 entity pairs from each of these domains and, for each entity pair, a ranked list with 50 relationship paths. It also contains entity ratings and property relevance scores for the entities and properties used in the paths. (This version supersedes the previous one.)
amazon_us_reviews
huggingface.co
tensorflow.org
Updated Jun 30, 2023
Cite
Polina Kazakova (2023). amazon_us_reviews [Dataset]. https://huggingface.co/datasets/polinaeterna/amazon_us_reviews
Dataset updated
Jun 30, 2023
Authors
Polina Kazakova
License
https://choosealicense.com/licenses/other/
Description
Amazon Customer Reviews (a.k.a. Product Reviews) is one of Amazon's iconic products. In a period of over two decades since the first review in 1995, millions of Amazon customers have contributed over a hundred million reviews to express opinions and describe their experiences regarding products on the Amazon.com website. This makes Amazon Customer Reviews a rich source of information for academic researchers in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and Machine Learning (ML), amongst others. Accordingly, we are releasing this data to further research in multiple disciplines related to understanding customer product experiences. Specifically, this dataset was constructed to represent a sample of customer evaluations and opinions, variation in the perception of a product across geographical regions, and promotional intent or bias in reviews.

Over 130 million customer reviews are available to researchers as part of this release. The data is available in TSV files in the amazon-reviews-pds S3 bucket in AWS US East Region. Each line in the data files corresponds to an individual review (tab delimited, with no quote and escape characters).

Each Dataset contains the following columns:

marketplace: 2 letter country code of the marketplace where the review was written.

customer_id: Random identifier that can be used to aggregate reviews written by a single author.

review_id: The unique ID of the review.

product_id: The unique Product ID the review pertains to. In the multilingual dataset the reviews for the same product in different countries can be grouped by the same product_id.

product_parent: Random identifier that can be used to aggregate reviews for the same product.

product_title: Title of the product.

product_category: Broad product category that can be used to group reviews (also used to group the dataset into coherent parts).

star_rating: The 1-5 star rating of the review.

helpful_votes: Number of helpful votes.

total_votes: Number of total votes the review received.

vine: Review was written as part of the Vine program.

verified_purchase: The review is on a verified purchase.

review_headline: The title of the review.

review_body: The review text.

review_date: The date the review was written.
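Because the files are described as tab delimited with no quote or escape characters, a reader should turn off quote handling so that quotation marks inside review text are kept literally. The sketch below uses Python's standard csv module for this; it assumes the first line of each file is a header row with the column names listed above, and the function name read_reviews and the sample row are made up for illustration.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_reviews(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per review from a tab-delimited file.

    csv.QUOTE_NONE makes the reader treat quote characters as plain
    text, matching the "no quote and escape characters" description.
    Assumption: the first line holds the column names.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    for row in reader:
        yield dict(zip(header, row))


# Made-up sample with three of the documented columns.
sample = io.StringIO(
    "review_id\tstar_rating\treview_body\n"
    'R1\t5\tHe said "great\n'
)
rows = list(read_reviews(sample))
```

With default quoting, the unbalanced quotation mark in the sample review body would be read as the start of a quoted field; with csv.QUOTE_NONE it is preserved as part of the text.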
