Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The network was collected by crawling the Amazon website. It is based on the "Customers Who Bought This Item Also Bought" feature of the Amazon website. If a product i is frequently co-purchased with product j, the graph contains a directed edge from i to j.
The data was collected by crawling the Amazon website and contains product metadata and review information about 548,552 different products (books, music CDs, DVDs and VHS video tapes).
For each product the following information is available:
- Title
- Salesrank
- List of similar products (that get co-purchased with the current product)
- Detailed product categorization
- Product reviews: time, customer, rating, number of votes, number of people that found the review helpful
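As a rough illustration of how the per-product "similar products" lists turn into the directed co-purchase graph described above, here is a minimal Python sketch. The product ids and the `similar` dictionary are made-up toy data, and `build_edges` is a hypothetical helper, not part of SNAP or of the dataset's own tooling.

```python
from collections import defaultdict

# Hypothetical toy records: product id -> ids listed as co-purchased with it.
similar = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": [],
}


def build_edges(similar_lists):
    """Return sorted directed edges (i, j), where j appears in i's similar list."""
    edges = set()
    for product, others in similar_lists.items():
        for other in others:
            if other != product:  # skip a product that lists itself
                edges.add((product, other))
    return sorted(edges)


edges = build_edges(similar)

# Out-degree = number of products listed as co-purchased with each product.
out_degree = defaultdict(int)
for src, _dst in edges:
    out_degree[src] += 1
```

Because the edges are directed, "A lists B" and "B lists A" are stored as two separate edges.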
Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.
The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.
SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in November 2009. SNAP uses GLib, a general purpose STL (Standard Template Library)-like library developed at the Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.

The dataset used in the paper is the Amazon co-purchasing graph.

This dataset is an updated version of the Amazon review dataset released in 2014. As in the previous version, this dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). In addition, this version provides the following features:
More reviews:
New reviews:
Metadata: We have added transaction metadata for each review shown on the review page.
If you publish articles based on this dataset, please cite the following paper:

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://snap.stanford.edu/data/com-Amazon.html
Dataset information
Network was collected by crawling the Amazon.com website. It is based on the
"Customers Who Bought This Item Also Bought" feature of the Amazon website.
If a product i is frequently co-purchased with product j, the graph
contains an undirected edge between i and j. Each product category provided
by Amazon defines a ground-truth community.
We regard each connected component in a product category as a separate
ground-truth community. We remove the ground-truth communities that have
fewer than 3 nodes. We also provide the top 5,000 communities with highest
quality which are described in our paper (http://arxiv.org/abs/1205.6233).
As for the network, we provide the largest connected component.
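The community construction described above (connected components inside each product category, dropping those with fewer than 3 nodes) can be sketched in plain Python as follows. This is only an illustration on toy data under the stated reading of the text; the function names, the `categories` mapping and the example graph are assumptions, not the authors' code.

```python
from collections import defaultdict, deque


def connected_components(nodes, adjacency):
    """Connected components of the subgraph induced by `nodes`, found by BFS."""
    nodes = set(nodes)
    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                # Only follow edges that stay inside the category.
                if v in nodes and v not in seen:
                    seen.add(v)
                    component.add(v)
                    queue.append(v)
        components.append(component)
    return components


def ground_truth_communities(edges, categories, min_size=3):
    """One community per connected component of each category, size >= min_size."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    communities = []
    for members in categories.values():
        for component in connected_components(members, adjacency):
            if len(component) >= min_size:
                communities.append(sorted(component))
    return communities


# Toy data (hypothetical): two categories over a six-node graph.
edges = [(1, 2), (2, 3), (4, 5), (3, 6)]
categories = {"books": [1, 2, 3, 4, 5], "music": [3, 6]}
communities = ground_truth_communities(edges, categories)
```

In the toy example, the "books" category splits into {1, 2, 3} and {4, 5}; only the first survives the size filter, and the two-node "music" component is dropped as well.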
Dataset statistics
Nodes 334863
Edges 925872
Nodes in largest WCC 334863 (1.000)
Edges in largest WCC 925872 (1.000)
Nodes in largest SCC 334863 (1.000)
Edges in largest SCC 925872 (1.000)
Average clustering coefficient 0.3967
Number of triangles 667129
Fraction of closed triangles 0.07925
Diameter (longest shortest path) 44
90-percentile effective diameter 15
Source (citation) J. Yang and J. Leskovec. Defining and Evaluating Network
Communities based on Ground-truth. ICDM, 2012.
http://arxiv.org/abs/1205.6233
Files
File Description
com-amazon.ungraph.txt.gz Undirected Amazon product co-purchasing network
com-amazon.all.dedup.cmty.txt.gz Amazon communities
com-amazon.top5000.cmty.txt.gz Amazon communities (Top 5,000)
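A minimal sketch for reading the edge-list file, assuming the usual SNAP text layout: comment lines starting with "#", followed by one whitespace-separated pair of node ids per line. That layout is an assumption here and should be checked against the downloaded file; `read_edge_list` and `read_gzipped_edge_list` are hypothetical helper names.

```python
import gzip
import io


def read_edge_list(lines):
    """Parse an iterable of text lines into a list of (u, v) integer pairs."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        u, v = line.split()[:2]
        edges.append((int(u), int(v)))
    return edges


def read_gzipped_edge_list(path):
    """Same as read_edge_list, but reads directly from a .txt.gz file."""
    with gzip.open(path, "rt") as handle:
        return read_edge_list(handle)


# Tiny in-memory sample standing in for com-amazon.ungraph.txt.
sample = io.StringIO(
    "# Undirected graph (toy sample)\n"
    "# FromNodeId\tToNodeId\n"
    "1\t2\n"
    "1\t3\n"
    "2\t3\n"
)
edges = read_edge_list(sample)
nodes = {node for edge in edges for node in edge}
```

With the real file, `read_gzipped_edge_list("com-amazon.ungraph.txt.gz")` should yield the edge and node counts listed in the statistics above, provided the layout assumption holds.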
The graph in the SNAP data set is 1-based, with nodes numbered 1 to
548,551.
In the SuiteSparse Matrix Collection, Problem.A is the undirected Amazon
product co-purchasing network, a matrix of size n-by-n with n=334,863,
which is the number of unique product id's appearing in any edge.
Problem.aux.nodeid is a list of the node id's that appear in the SNAP data
set. A(i,j)=1 if the product nodeid(i) is co-purchased with product
nodeid(j). The node id's are the same as the SNAP data set (1-based).
C = Problem.aux.Communities_all is a sparse matrix of size n by 75,149,
which holds the 75,149 categories in the com-amazon.all.dedup.cmty.txt
file. The kth line in that file defines the kth community, and is the
column C(:,k), where C(i,k)=1 if product nodeid(i) is in the kth
community. Row C(i,:) and row/column i of the A matrix thus refer to the
same product, nodeid(i).
Ctop = Problem.aux.Communities_top5000 is n-by-5000, with the same
structure as the C array above, with the content of the
com-amazon.top5000.cmty.txt.
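The relationship between SNAP node ids, matrix rows, and community columns can be sketched in plain Python. This is an illustration only: it uses 0-based indices (the MATLAB matrices are 1-based), it assumes `nodeid` is kept in ascending order, and it assumes each community line holds whitespace-separated node ids. None of these details is confirmed by the text above.

```python
def build_node_index(edges):
    """Return (nodeid, index_of): the unique ids in any edge and their row index."""
    nodeid = sorted({node for edge in edges for node in edge})
    index_of = {node: i for i, node in enumerate(nodeid)}
    return nodeid, index_of


def community_entries(community_lines, index_of):
    """Return (row, column) positions of the nonzeros of the membership matrix C.

    Line k of the community file becomes column k; a node id on that line
    sets the entry in the row that belongs to that node.
    """
    entries = []
    for k, line in enumerate(community_lines):
        for token in line.split():
            entries.append((index_of[int(token)], k))
    return entries


# Toy data (hypothetical): ids are sparse, as in the SNAP file.
edges = [(1, 88), (88, 5), (5, 1), (5, 300)]
nodeid, index_of = build_node_index(edges)
entries = community_entries(["1 88 5", "5 300"], index_of)
```

Here the four distinct ids 1, 5, 88 and 300 map to rows 0 to 3, which mirrors how n = 334,863 rows can cover ids that range far higher.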

The dataset used in the paper is a large-scale graph dataset, consisting of users and shows with multi-attribute edges. The graph is constructed by selecting user IDs and side information combinations of shows as nodes, and click/co-click relations and view time as edges.

Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-products
import os.path as osp

import datatable as dt
import pandas as pd
import torch
import torch_geometric as pyg
# Only needed on the heterogeneous branch below; ogbn-products is homogeneous.
from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
from ogb.nodeproppred import PygNodePropPredDataset


class PygOgbnProducts(PygNodePropPredDataset):
    """ogbn-products loader that reads the copy stored under /kaggle/input."""

    def __init__(self, meta_csv=None):
        root, name, transform = '/kaggle/input', 'ogbn-products', None
        if meta_csv is None:
            meta_csv = osp.join(root, name, 'ogbn-master.csv')
        # The master CSV holds one column of metadata per dataset name.
        master = pd.read_csv(meta_csv, index_col=0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name=name, root=root, transform=transform, meta_dict=meta_dict)

    def get_idx_split(self, split_type=None):
        if split_type is None:
            split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        # Use the cached split dictionary when it is available.
        split_file = osp.join(path, 'split_dict.pt')
        if osp.isfile(split_file):
            return torch.load(split_file)
        if self.is_hetero:
            train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
            for nodetype in train_idx_dict.keys():
                train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
                valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
                test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
            return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        else:
            # Each CSV stores one node index per row; datatable reads it quickly.
            train_idx = dt.fread(osp.join(path, 'train.csv'), header=None).to_numpy().T[0]
            train_idx = torch.from_numpy(train_idx).to(torch.long)
            valid_idx = dt.fread(osp.join(path, 'valid.csv'), header=None).to_numpy().T[0]
            valid_idx = torch.from_numpy(valid_idx).to(torch.long)
            test_idx = dt.fread(osp.join(path, 'test.csv'), header=None).to_numpy().T[0]
            test_idx = torch.from_numpy(test_idx).to(torch.long)
            return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}


dataset = PygOgbnProducts()
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
graph = dataset[0]  # PyG Graph object
Graph: The ogbn-products dataset is an undirected and unweighted graph, representing an Amazon product co-purchasing network [1]. Nodes represent products sold on Amazon, and an edge between two products indicates that the products are purchased together. The authors follow [2] to process node features and target categories. Specifically, node features are generated by extracting bag-of-words features from the product descriptions, followed by a Principal Component Analysis to reduce the dimension to 100.
Prediction task: The task is to predict the category of a product in a multi-class classification setup, where the 47 top-level categories are used for target labels.
Dataset splitting: The authors consider a more challenging and realistic dataset splitting that differs from the one used in [2]. Instead of randomly assigning 90% of the nodes for training and 10% of the nodes for testing (without use of a validation set), they use the sales ranking (popularity) to split nodes into training/validation/test sets. Specifically, the authors sort the products according to their sales ranking and use the top 8% for training, the next 2% for validation, and the rest for testing. This is a more challenging splitting procedure that closely matches the real-world application where labels are first assigned to important nodes in the network and ML models are subsequently used to make predictions on less important ones.
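The splitting rule can be sketched in a few lines of Python. This is an illustration on synthetic ranks, not the OGB implementation; it assumes a smaller sales-rank number means a more popular product, and `split_by_sales_rank` is a hypothetical helper.

```python
def split_by_sales_rank(sales_rank, train_frac=0.08, valid_frac=0.02):
    """Split node ids into train/valid/test by popularity.

    sales_rank maps node id -> rank, where rank 1 is assumed to be the best seller.
    """
    ordered = sorted(sales_rank, key=sales_rank.get)  # most popular first
    n = len(ordered)
    n_train = round(n * train_frac)
    n_valid = round(n * valid_frac)
    return {
        "train": ordered[:n_train],
        "valid": ordered[n_train:n_train + n_valid],
        "test": ordered[n_train + n_valid:],
    }


# Synthetic example: 100 nodes, node 99 is the best seller (rank 1).
ranks = {node: 100 - node for node in range(100)}
split = split_by_sales_rank(ranks)
```

With 100 nodes this gives 8 training, 2 validation and 90 test nodes, so the model is trained on the most popular products and evaluated on the long tail.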
Note 1: A very small number of self-connecting edges are repeated; you may remove them if necessary.
Note 2: For undirected graphs, the loaded graphs will have the doubled number of edges because the bidirectional edges will be added automatically.
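Both notes can be illustrated with a small plain-Python sketch: it stores every undirected edge in both directions (which is why the loaded edge count doubles) and drops self-loops and repeats along the way. This is not the OGB or PyG code path, and `to_bidirectional` is a hypothetical helper.

```python
def to_bidirectional(edges, drop_self_loops=True):
    """Return sorted directed edges with both (u, v) and (v, u) for each input edge.

    Duplicates collapse because a set is used; self-loops are dropped by default.
    """
    directed = set()
    for u, v in edges:
        if u == v:
            if not drop_self_loops:
                directed.add((u, v))
            continue
        directed.add((u, v))
        directed.add((v, u))
    return sorted(directed)


# Toy input with a repeated self-loop and an edge listed in both orientations.
undirected = [(0, 1), (1, 2), (2, 2), (2, 2), (1, 0)]
bidirectional = to_bidirectional(undirected)
```

The toy input has two distinct undirected edges, so the result holds four directed ones.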
| Package | #Nodes | #Edges | Split Type | Task Type | Metric |
|---|---|---|---|---|---|
| ogb>=1.1.1 | 2,449,029 | 61,859,140 | Sales rank | Multi-class classification | Accuracy |
Website: https://ogb.stanford.edu
The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.
[1] http://manikvarma.org/downloads/XC/XMLRepository.html
[2] Wei-Lin Chiang, ...

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we provide additional large-scale datasets used in our work "A Versatile Framework for Attributed Network Clustering via K-Nearest Neighbor Augmentation", along with the index files for constructing KNN graphs using ScaNN and Faiss.
Usage:
cd ANCKA/
unzip ~/Download_path/ANCKA_data.zip -d data/

These datasets contain 1.48 million question and answer pairs about products from Amazon.
Metadata includes
question and answer text
is the question binary (yes/no), and if so does it have a yes/no answer?
timestamps
product ID (to reference the review dataset)
Basic Statistics:
Questions: 1.48 million
Answers: 4,019,744
Labeled yes/no questions: 309,419
Number of unique products with questions: 191,185

https://choosealicense.com/licenses/cc/
Dataset Information
7,650 119,043 745
Pre-processed as per the official codebase of https://arxiv.org/abs/2210.02016
Citations
@article{ju2023multi, title={Multi-task Self-supervised Graph Neural Networks Enable Stronger Task Generalization}, author={Ju, Mingxuan and Zhao, Tong and Wen, Qianlong and Yu, Wenhao and Shah, Neil and Ye, Yanfang and Zhang, Chuxu}, booktitle={International Conference on Learning… See the full description on the dataset page: https://huggingface.co/datasets/SauravMaheshkar/pareto-amazon-photo.

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
EBITDA Time Series for Amazon.com Inc. Amazon.com, Inc. engages in the retail sale of consumer products, advertising, and subscription services through online and physical stores in North America and internationally. The company operates through three segments: North America, International, and Amazon Web Services (AWS). It also manufactures and sells electronic devices, including Kindle, Fire tablets, Fire TVs, Echo, Ring, Blink, and eero; and develops and produces media content. In addition, the company offers programs that enable sellers to sell their products in its stores; and programs that allow authors, independent publishers, musicians, filmmakers, Twitch streamers, skill and app developers, and others to publish and sell content. Further, it provides compute, storage, database, analytics, machine learning, and other services, as well as advertising services through programs such as sponsored ads, display, and video advertising. Additionally, the company offers Amazon Prime, a membership program. The company's products offered through its stores include merchandise and content purchased for resale and products offered by third-party sellers. It also provides AgentCore services, such as AgentCore Runtime, AgentCore Memory, AgentCore Observability, AgentCore Identity, AgentCore Gateway, AgentCore Browser, and AgentCore Code Interpreter. It serves consumers, sellers, developers, enterprises, content creators, advertisers, and employees. Amazon.com, Inc. was incorporated in 1994 and is headquartered in Seattle, Washington.

https://www.datainsightsmarket.com/privacy-policy
The graph technology market is experiencing robust growth, driven by the increasing need for advanced data analytics and the rising adoption of artificial intelligence (AI) and machine learning (ML) applications. The market's expansion is fueled by the ability of graph databases to handle complex, interconnected data more efficiently than traditional relational databases. This is particularly crucial in industries like finance (fraud detection, risk management), healthcare (patient relationship mapping, drug discovery), and e-commerce (recommendation systems, personalized marketing). Key trends include the move towards cloud-based graph solutions, the integration of graph technology with other data management systems, and the development of more sophisticated graph algorithms for advanced analytics. While challenges remain, such as the need for skilled professionals and the complexity of implementing graph databases, the overall market outlook remains positive, with a Compound Annual Growth Rate (CAGR) conservatively estimated at 25% for the forecast period 2025-2033. This growth will be driven by ongoing digital transformation initiatives across various sectors, leading to an increased demand for efficient data management and analytics capabilities. We can expect to see continued innovation in both open-source and commercial graph database solutions, further fueling the market's expansion.
The competitive landscape is characterized by a mix of established players like Oracle, IBM, and Microsoft, alongside emerging innovative companies such as Neo4j, TigerGraph, and Amazon Web Services. These companies are constantly vying for market share through product innovation, strategic partnerships, and acquisitions. The presence of both open-source and proprietary solutions caters to a diverse range of needs and budgets.
The market segmentation, while not explicitly detailed, likely includes categories based on deployment (cloud, on-premise), database type (property graph, RDF), and industry vertical. The regional distribution will likely show strong growth in North America and Europe, reflecting the higher adoption of advanced technologies in these regions, followed by a steady rise in Asia-Pacific and other developing markets. Looking ahead, the convergence of graph technology with other emerging technologies like blockchain and the Internet of Things (IoT) promises to unlock even greater opportunities for growth and innovation in the years to come.

Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Amazon Financial Dataset: R&D, Marketing, Campaigns, and Profit
This dataset provides fictional yet insightful financial data of Amazon's business activities across all 50 states of the USA. It is specifically designed to help students, researchers, and practitioners perform various data analysis tasks such as log normalization, Gaussian distribution visualization, and financial performance comparisons.
Each row represents a state and contains the following columns:
- R&D Amount (in $): The investment made in research and development.
- Marketing Amount (in $): The expenditure on marketing activities.
- Campaign Amount (in $): The costs associated with promotional campaigns.
- State: The state in which the data is recorded.
- Profit (in $): The net profit generated from the state.
Additional features include log-normalized and Z-score transformations for advanced analysis.
This dataset is ideal for practicing:
1. Log Transformation: Normalize skewed data for better modeling and analysis.
2. Statistical Analysis: Explore relationships between financial investments and profit.
3. Visualization: Create compelling graphs such as Gaussian distributions and standard normal distributions.
4. Machine Learning Projects: Build regression models to predict profits based on R&D and marketing spend.
This dataset is synthetically generated and is not based on actual Amazon financial records. It is created solely for educational and practice purposes.
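The log-normalized and Z-score transformations mentioned above can be sketched with the Python standard library. The profit values below are made up for illustration and are not taken from the dataset, and the exact transform used by the dataset author (for example `log` versus `log1p`, population versus sample standard deviation) is an assumption.

```python
import math
import statistics


def log_transform(values):
    """Compress right-skewed, non-negative amounts with log(1 + x)."""
    return [math.log1p(v) for v in values]


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up profit figures spanning several orders of magnitude.
profit = [1_000.0, 10_000.0, 100_000.0, 1_000_000.0]
log_profit = log_transform(profit)
standardized = z_scores(log_profit)
```

Taking the logarithm first keeps a few very large values from dominating the mean and standard deviation used in the Z-score step.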

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Change-Receivables Time Series for Amazon.com Inc. Amazon.com, Inc. engages in the retail sale of consumer products, advertising, and subscription services through online and physical stores in North America and internationally. The company operates through three segments: North America, International, and Amazon Web Services (AWS). It also manufactures and sells electronic devices, including Kindle, Fire tablets, Fire TVs, Echo, Ring, Blink, and eero; and develops and produces media content. In addition, the company offers programs that enable sellers to sell their products in its stores; and programs that allow authors, independent publishers, musicians, filmmakers, Twitch streamers, skill and app developers, and others to publish and sell content. Further, it provides compute, storage, database, analytics, machine learning, and other services, as well as advertising services through programs such as sponsored ads, display, and video advertising. Additionally, the company offers Amazon Prime, a membership program. The company's products offered through its stores include merchandise and content purchased for resale and products offered by third-party sellers. It also provides AgentCore services, such as AgentCore Runtime, AgentCore Memory, AgentCore Observability, AgentCore Identity, AgentCore Gateway, AgentCore Browser, and AgentCore Code Interpreter. It serves consumers, sellers, developers, enterprises, content creators, advertisers, and employees. Amazon.com, Inc. was incorporated in 1994 and is headquartered in Seattle, Washington.

The network was collected by crawling the Amazon website. It is based on the "Customers Who Bought This Item Also Bought" feature of the Amazon website. If a product i is frequently co-purchased with product j, the graph contains an undirected edge between i and j. Each product category provided by Amazon defines a ground-truth community. We regard each connected component in a product category as a separate ground-truth community. We remove the ground-truth communities that have fewer than 3 nodes. We also provide the top 5,000 communities with highest quality which are described in our paper. As for the network, we provide the largest connected component. The dataset contains 334,863 nodes and 925,872 edges.

https://www.marketreportanalytics.com/privacy-policy
The graph database market is booming, projected to reach $5.97 billion by 2025 with a 24.4% CAGR. Discover key drivers, trends, and regional insights in our comprehensive market analysis, including leading companies like Neo4j and Amazon. Explore the future of data management with this in-depth report.
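For readers unfamiliar with CAGR, the compounding arithmetic behind such figures is value_n = value_0 * (1 + r)^n. The sketch below applies it to the two numbers quoted above ($5.97 billion, 24.4%). The three-year horizon is chosen purely for illustration and does not come from the report, and both function names are hypothetical.

```python
def project(value, cagr, years):
    """Value after compounding at a constant annual growth rate."""
    return value * (1.0 + cagr) ** years


def implied_cagr(start, end, years):
    """Constant annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0


base_2025 = 5.97  # USD billions, as quoted in the text
rate = 0.244      # 24.4% CAGR, as quoted in the text

# Illustrative only: the horizon of three years is an assumption.
projection = project(base_2025, rate, years=3)
recovered_rate = implied_cagr(base_2025, projection, years=3)
```

The second function simply inverts the first, which is a quick way to sanity-check a quoted growth rate against start and end values.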

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Amazon is one of the most recognisable brands in the world, and the third largest by revenue. It was the fourth tech company to reach a $1 trillion market cap, and a market leader in e-commerce,...

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - CBOE Equity VIX on Amazon was 30.99 in November 2025, according to the United States Federal Reserve. Historically, United States - CBOE Equity VIX on Amazon reached a record high of 72.66 in March 2020 and a record low of 5.13 in March 2017. Trading Economics provides the current actual value, a historical data chart and related indicators for United States - CBOE Equity VIX on Amazon, last updated from the United States Federal Reserve in December 2025.

https://creativecommons.org/publicdomain/zero/1.0/
2 useful files:
This is a large-scale Amazon Reviews dataset, collected in 2023 by McAuley Lab, and it includes rich features such as:
- User Reviews (ratings, text, helpfulness votes, etc.);
- Item Metadata (descriptions, price, raw image, etc.);
- Links (user-item / bought together graphs).
What's New? In the Amazon Reviews'23, we provide:
- Larger Dataset: We collected 571.54M reviews, 245.2% larger than the last version;
- Newer Interactions: Current interactions range from May 1996 to Sep 2023;
- Richer Metadata: More descriptive features in item metadata;
- Fine-grained Timestamp: Interaction timestamp at the second or finer level;
- Cleaner Processing: Cleaner item metadata than previous versions;
- Standard Splitting: Standard data splits to encourage RecSys benchmarking.

These datasets contain reviews from the Goodreads book review website, and a variety of attributes describing the items. Critically, these datasets have multiple levels of user interaction, ranging from adding to a shelf to rating and reading.
Metadata includes
reviews
add-to-shelf, read, review actions
book attributes: title, isbn
graph of similar books
Basic Statistics:
Items: 1,561,465
Users: 808,749
Interactions: 225,394,930

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The graph shows the citations of ^'s papers published in each year.