This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:

- More reviews:
- New reviews:
- Metadata: We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:
Amazon-Fraud is a multi-relational graph dataset built upon the Amazon review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.
Dataset Statistics
# Nodes | % Fraud Nodes (Class=1)
---|---
11,944 | 9.5

Relation | # Edges
---|---
U-P-U |
U-S-U |
U-V-U | 1,036,737
All |
Graph Construction
The Amazon dataset includes product reviews under the Musical Instruments category. Following this paper, we label users with more than 80% helpful votes as benign entities and users with less than 20% helpful votes as fraudulent entities. We conduct a fraudulent user detection task on the Amazon-Fraud dataset, which is a binary classification task. We take 25 handcrafted features from this paper as the raw node features for Amazon-Fraud. Users are the nodes in the graph, connected by three relations: 1) U-P-U, which connects users who have reviewed at least one common product; 2) U-S-U, which connects users who have given at least one identical star rating within one week; 3) U-V-U, which connects user pairs whose mutual review-text similarity (measured by TF-IDF) ranks in the top 5% among all user pairs; a sketch of this relation follows below.
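To make the U-V-U relation concrete, here is a minimal sketch that connects user pairs in the top 5% of pairwise TF-IDF cosine similarities. The function name and the input format (a dict mapping each user to the concatenation of their review texts) are assumptions for illustration; the released graph was built by the dataset authors.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_u_v_u_edges(user_texts, top_fraction=0.05):
    """Connect user pairs whose TF-IDF review-text similarity falls in
    the top `top_fraction` of all user pairs.

    `user_texts` is assumed to map each user id to the concatenation of
    that user's review texts.
    """
    users = list(user_texts)
    tfidf = TfidfVectorizer().fit_transform(user_texts[u] for u in users)
    sim = cosine_similarity(tfidf)

    # similarities of all distinct user pairs (upper triangle of the matrix)
    iu, ju = np.triu_indices(len(users), k=1)
    pair_sims = sim[iu, ju]
    threshold = np.quantile(pair_sims, 1.0 - top_fraction)

    keep = pair_sims >= threshold
    return {(users[i], users[j]) for i, j in zip(iu[keep], ju[keep])}
```

Note that materializing the full similarity matrix is quadratic in the number of users, so this is only an illustration of the idea, not a scalable procedure.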
To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.
Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.
Dataset Statistics
# Nodes | % Fraud Nodes (Class=1)
---|---
45,954 | 14.5

Relation | # Edges
---|---
R-U-R |
R-T-R |
R-S-R | 3,402,743
All |
Graph Construction
The Yelp spam review dataset includes hotel and restaurant reviews that Yelp has filtered (spam) or recommended (legitimate). We conduct a spam review detection task on the Yelp-Fraud dataset, which is a binary classification task. We take 32 handcrafted features from the SpEagle paper as the raw node features for Yelp-Fraud. Based on previous studies showing that opinion fraudsters are connected through users, products, review text, and time, we take reviews as the nodes in the graph and design three relations: 1) R-U-R, which connects reviews posted by the same user; 2) R-S-R, which connects reviews of the same product with the same star rating (1-5 stars); 3) R-T-R, which connects two reviews of the same product posted in the same month. All three relations follow the same bucket-and-pair pattern, sketched below.
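The following sketch illustrates the bucket-and-pair pattern behind all three relations: group reviews by a key, then connect every pair within a group. The review dicts and their field names are assumptions for illustration; the released dataset ships the graph in preprocessed form.

```python
from collections import defaultdict
from itertools import combinations

def edges_by_key(reviews, key):
    """Connect every pair of reviews that fall into the same bucket."""
    buckets = defaultdict(list)
    for review in reviews:
        buckets[key(review)].append(review["id"])
    edges = set()
    for ids in buckets.values():
        edges.update(combinations(sorted(ids), 2))
    return edges

# Assumed review dicts with keys: id, user, product, stars, month.
reviews = [
    {"id": 0, "user": "a", "product": "p1", "stars": 5, "month": "2012-03"},
    {"id": 1, "user": "a", "product": "p1", "stars": 4, "month": "2012-03"},
    {"id": 2, "user": "b", "product": "p1", "stars": 5, "month": "2012-04"},
]
r_u_r = edges_by_key(reviews, lambda r: r["user"])                   # same user
r_s_r = edges_by_key(reviews, lambda r: (r["product"], r["stars"]))  # same product + rating
r_t_r = edges_by_key(reviews, lambda r: (r["product"], r["month"]))  # same product + month
```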
To download the dataset, please visit this GitHub repo. For any other questions, please email ytongdou(AT)gmail.com.
This data set comprises a labeled training set, validation samples, and testing samples for ordinal quantification. It appears in our research paper "Ordinal Quantification Through Regularization", which we published at ECML-PKDD 2022.
The data is extracted from the McAuley data set of Amazon product reviews, where the goal is to predict the 5-star rating of each textual review. We have sampled this data according to two protocols that are suited for quantification research. The goal of quantification is not to predict the star rating of each individual instance, but the distribution of ratings in sets of textual reviews. More generally, quantification aims at estimating the distribution of labels in unlabeled samples of data.
The first protocol is the artificial prevalence protocol (APP), where all possible distributions of labels are drawn with equal probability. The second protocol, APP-OQ, is a variant thereof, where only the smoothest 20% of all APP samples are considered. This variant targets ordinal quantification, where the classes are ordered and neighboring classes can be assumed to be similar. The 5-star ratings of product reviews lie on an ordinal scale and hence pose such an ordinal quantification task.
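For illustration, the following sketch draws APP prevalence vectors uniformly from the probability simplex (via Kraemer sampling: the gaps between sorted uniform draws sum to one) and then applies one plausible smoothness filter for APP-OQ. The `roughness` criterion used here, a sum of squared second differences between neighboring class prevalences, is an assumption for this sketch; consult the paper and the extraction scripts for the exact definition.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

def sample_app_prevalences(n_samples, n_classes=5):
    """Draw label distributions uniformly from the probability simplex,
    as the artificial prevalence protocol (APP) requires."""
    u = np.sort(rng.uniform(size=(n_samples, n_classes - 1)), axis=1)
    bounds = np.hstack([np.zeros((n_samples, 1)), u, np.ones((n_samples, 1))])
    return np.diff(bounds, axis=1)  # gaps between sorted points sum to 1

def roughness(p):
    """Assumed smoothness criterion (smaller = smoother); the paper's
    exact definition may differ."""
    return np.sum(np.diff(p, n=2) ** 2)

# APP-OQ: keep only the smoothest 20% of the APP samples.
prevalences = sample_app_prevalences(1000)
scores = np.apply_along_axis(roughness, 1, prevalences)
app_oq = prevalences[scores <= np.quantile(scores, 0.2)]
```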
This data set comprises two representations of the McAuley data. The first consists of sparse TF-IDF features; the second is a dense RoBERTa embedding. In our experience, logistic regression classifiers work well with both representations, and the RoBERTa embeddings yield more accurate predictors than the TF-IDF features.
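As a minimal sketch of this pipeline, the code below trains a logistic regression classifier and turns its predictions into a prevalence estimate via classify and count, the simplest quantification baseline (not the method of our paper). The random stand-in features and labels are placeholders for the shipped TF-IDF or RoBERTa matrices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Stand-in features and 1-5 star labels; in practice these would be the
# TF-IDF matrix or the RoBERTa embeddings shipped with the data set.
X_train, y_train = rng.normal(size=(500, 16)), rng.integers(1, 6, size=500)
X_sample = rng.normal(size=(200, 16))  # one unlabeled sample to quantify

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Classify and count: predict each review in the sample, then report
# the relative class frequencies as the estimated label distribution.
predictions = clf.predict(X_sample)
prevalence_estimate = np.bincount(predictions, minlength=6)[1:] / len(predictions)
print(prevalence_estimate)  # estimated distribution over the 5 star ratings
```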
You can also extract the data yourself, for instance if you require a raw textual representation: the original McAuley data set is already public, and we provide all of our extraction scripts.
Extraction scripts and experiments: https://github.com/mirkobunse/ecml22
Original data by McAuley: https://jmcauley.ucsd.edu/data/amazon/