https://academictorrents.com/nolicensespecified
This dataset consists of circles (or friend lists) from Facebook. Facebook data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks.
https://academictorrents.com/nolicensespecified
This dataset consists of circles (or lists) from Twitter. Twitter data was crawled from public sources. The dataset includes node features (profiles), circles, and ego networks. Data is also available from Facebook and Google+.

## Dataset statistics

| Attribute | Value |
|---|---|
| Nodes | 81306 |
| Edges | 1768149 |
| Nodes in largest WCC | 81306 (1.000) |
| Edges in largest WCC | 1768149 (1.000) |
| Nodes in largest SCC | 68413 (0.841) |
| Edges in largest SCC | 1685163 (0.953) |
| Average clustering coefficient | 0.5653 |
| Number of triangles | 13082506 |
| Fraction of closed triangles | 0.06415 |
| Diameter (longest shortest path) | 7 |
| 90-percentile effective diameter | 4.5 |
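The parenthesized values in the statistics above appear to be each component's share of the whole graph. The short check below is a sketch under that assumption; the variable and function names are illustrative and not part of the dataset.

```python
# Counts copied from the Twitter dataset statistics above.
total_nodes = 81306
total_edges = 1768149
scc_nodes = 68413
scc_edges = 1685163


def fraction(part: int, whole: int) -> float:
    """Share of the whole graph covered by a component, rounded to 3 places."""
    return round(part / whole, 3)


# Assumption: the value in parentheses is component size / graph size.
scc_node_fraction = fraction(scc_nodes, total_nodes)
scc_edge_fraction = fraction(scc_edges, total_edges)
print(scc_node_fraction, scc_edge_fraction)
```

Under this reading, the largest strongly connected component holds about 84% of the nodes and about 95% of the edges, which matches the listed figures.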
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dynamic face-to-face interaction networks represent the interactions that happen during discussions between a group of participants playing the Resistance game. This dataset contains networks extracted from 62 games. Each game is played by 5-8 participants and lasts between 45--60 minutes. We extract dynamically evolving networks from the free-form discussions using the ICAF algorithm. The extracted networks are used to characterize and detect group deceptive behavior using the DeceptionRank algorithm.
The networks are weighted, directed and temporal. Each node represents a participant. At each 1/3 second, a directed edge from node u to v is weighted by the probability of participant u looking at participant v or the laptop. Additionally, we also provide a binary version where an edge from u to v indicates participant u looks at participant v (or the laptop).
Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.
The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.
SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in November 2009. SNAP uses a general purpose STL (Standard Template Library)-like library GLib developed at Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.
Networks from SNAP (Stanford Network Analysis Platform) Network Data Sets, Jure Leskovec http://snap.stanford.edu/data/index.html email jure at cs.stanford.edu
Citation for the SNAP collection:
@misc{snapnets,
  author = {Jure Leskovec and Andrej Krevl},
  title = {{SNAP Datasets}: {Stanford} Large Network Dataset Collection},
  howpublished = {\url{http://snap.stanford.edu/data}},
  month = jun,
  year = 2014
}
The following matrices/graphs were added to the collection in June 2010 by Tim Davis (problem id and name):
2284 SNAP/soc-Epinions1 who-trusts-whom network of Epinions.com
2285 SNAP/soc-LiveJournal1 LiveJournal social network
2286 SNAP/soc-Slashdot0811 Slashdot social network, Nov 2008
2287 SNAP/soc-Slashdot0902 Slashdot social network, Feb 2009
2288 SNAP/wiki-Vote Wikipedia who-votes-on-whom network
2289 SNAP/email-EuAll Email network from a EU research institution
2290 SNAP/email-Enron Email communication network from Enron
2291 SNAP/wiki-Talk Wikipedia talk (communication) network
2292 SNAP/cit-HepPh Arxiv High Energy Physics paper citation network
2293 SNAP/cit-HepTh Arxiv High Energy Physics paper citation network
2294 SNAP/cit-Patents Citation network among US Patents
2295 SNAP/ca-AstroPh Collaboration network of Arxiv Astro Physics
2296 SNAP/ca-CondMat Collaboration network of Arxiv Condensed Matter
2297 SNAP/ca-GrQc Collaboration network of Arxiv General Relativity
2298 SNAP/ca-HepPh Collaboration network of Arxiv High Energy Physics
2299 SNAP/ca-HepTh Collaboration network of Arxiv High Energy Physics Theory
2300 SNAP/web-BerkStan Web graph of Berkeley and Stanford
2301 SNAP/web-Google Web graph from Google
2302 SNAP/web-NotreDame Web graph of Notre Dame
2303 SNAP/web-Stanford Web graph of Stanford.edu
2304 SNAP/amazon0302 Amazon product co-purchasing network from March 2 2003
2305 SNAP/amazon0312 Amazon product co-purchasing network from March 12 2003
2306 SNAP/amazon0505 Amazon product co-purchasing network from May 5 2003
2307 SNAP/amazon0601 Amazon product co-purchasing network from June 1 2003
2308 SNAP/p2p-Gnutella04 Gnutella peer to peer network from August 4 2002
2309 SNAP/p2p-Gnutella05 Gnutella peer to peer network from August 5 2002
2310 SNAP/p2p-Gnutella06 Gnutella peer to peer network from August 6 2002
2311 SNAP/p2p-Gnutella08 Gnutella peer to peer network from August 8 2002
2312 SNAP/p2p-Gnutella09 Gnutella peer to peer network from August 9 2002
2313 SNAP/p2p-Gnutella24 Gnutella peer to peer network from August 24 2002
2314 SNAP/p2p-Gnutella25 Gnutella peer to peer network from August 25 2002
2315 SNAP/p2p-Gnutella30 Gnutella peer to peer network from August 30 2002
2316 SNAP/p2p-Gnutella31 Gnutella peer to peer network from August 31 2002
2317 SNAP/roadNet-CA Road network of California
2318 SNAP/roadNet-PA Road network of Pennsylvania
2319 SNAP/roadNet-TX Road network of Texas
2320 SNAP/as-735 733 daily instances (graphs) from November 8 1997 to January 2 2000
2321 SNAP/as-Skitter Internet topology graph, from traceroutes run daily in 2005
2322 SNAP/as-caida The CAIDA AS Relationships Datasets, from January 2004 to November 2007
2323 SNAP/Oregon-1 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
2324 SNAP/Oregon-2 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
2325 SNAP/soc-sign-epinions Epinions signed social network
2326 SNAP/soc-sign-Slashdot081106 Slashdot Zoo signed social network from November 6 2008
2327 SNAP/soc-sign-Slashdot090216 Slashdot Zoo signed social network from February 16 2009
2328 SNAP/soc-sign-Slashdot090221 Slashdot Zoo signed social network from February 21 2009
Then the following problems were added in July 2018. All data and metadata from the SNAP data set was imported into the SuiteSparse Matrix Collection.
2777 SNAP/CollegeMsg Messages on a Facebook-like platform at UC-Irvine
2778 SNAP/com-Amazon Amazon product network
2779 SNAP/com-DBLP DBLP collaboration network
2780 SNAP/com-Friendster Friendster online social network
2781 SNAP/com-LiveJournal LiveJournal online social network
2782 SNAP/com-Orkut Orkut online social network
2783 SNAP/com-Youtube Youtube online social network
2784 SNAP/email-Eu-core E-mail network
2785 SNAP/email-Eu-core-temporal E-mails between users at a research institution
2786 SNAP/higgs-twitter twitter messages re: Higgs boson on 4th July 2012.
2787 SNAP/loc-Brightkite Brightkite location based online social network
2788 SNAP/loc-Gowalla Gowalla location based online social network
2789 SNAP/soc-Pokec Pokec online social network
2790 SNAP/soc-sign-bitcoin-alpha Bitcoin Alpha web of trust network
2791 SNAP/soc-sign-bitcoin-otc Bitcoin OTC web of trust network
2792 SNAP/sx-askubuntu Comments, questions, and answers on Ask Ubuntu
2793 SNAP/sx-mathoverflow Comments, questions, and answers on Math Overflow
2794 SNAP/sx-stackoverflow Comments, questions, and answers on Stack Overflow
2795 SNAP/sx-superuser Comments, questions, and answers on Super User
2796 SNAP/twitter7 A collection of 476 million tweets collected between June-Dec 2009
2797 SNAP/wiki-RfA Wikipedia Requests for Adminship (with text)
2798 SNAP/wiki-talk-temporal Users editing talk pages on Wikipedia
2799 SNAP/wiki-topcats Wikipedia hyperlinks (with communities)
The following 14 graphs/networks were in the SNAP data set in July 2018 but have not yet been imported into the SuiteSparse Matrix Collection. They may be added in the future:
amazon-meta ego-Facebook ego-Gplus ego-Twitter gemsec-Deezer gemsec-Facebook ksc-time-series memetracker9 web-flickr web-Reddit web-RedditPizzaRequests wiki-Elec wiki-meta wikispeedia
The 2010 description of the SNAP data set gave these categories:
Social networks: online social networks, edges represent interactions between people
Communication networks: email communication networks with edges representing communication
Citation networks: nodes represent papers, edges represent citations
Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
Web graphs: nodes represent webpages and edges are hyperlinks
Blog and Memetracker graphs: nodes represent time stamped blog posts, edges are hyperlinks [revised below]
Amazon networks: nodes represent products and edges link commonly co-purchased products
Internet networks: nodes represent computers and edges communication
Road networks: nodes represent intersections and edges roads connecting the intersections
Autonomous systems: graphs of the internet
Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)
By July 2018, the following categories had been added:
Networks with ground-truth communities: ground-truth network communities in social and information networks
Location-based online social networks: social networks with geographic check-ins
Wikipedia networks, articles, and metadata: talk, editing, voting, and article data from Wikipedia
Temporal networks: networks where edges have timestamps
Twitter and Memetracker: Memetracker phrases, links and 467 million Tweets
Online communities: data from online communities such as Reddit and Flickr
Online reviews: data from online review systems such as BeerAdvocate and Amazon
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Gowalla is a location-based social networking website where users share their locations by checking in. The friendship network is undirected, was collected using their public API, and consists of 196,591 nodes and 950,327 edges. We have collected a total of 6,442,890 check-ins of these users over the period of Feb. 2009 - Oct. 2010.
Brightkite was once a location-based social networking service provider where users shared their locations by checking in. The friendship network was collected using their public API, and consists of 58,228 nodes and 214,078 edges. The network is originally directed, but we have constructed a network with undirected edges, keeping an edge only when the friendship exists in both directions. We have also collected a total of 4,491,143 check-ins of these users over the period of Apr. 2008 - Oct. 2010.
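The Brightkite description above keeps an undirected edge only when the directed friendship exists in both directions. A minimal sketch of that filtering step follows; the function name and the example edges are hypothetical, and the original collection code is not shown in the source.

```python
def mutual_edges(directed_edges):
    """Return undirected edges (u, v), u < v, for which both u->v and v->u exist."""
    edge_set = set(directed_edges)
    return sorted(
        (u, v)
        for (u, v) in edge_set
        if u < v and (v, u) in edge_set  # keep each reciprocated pair once
    )


# Hypothetical directed friendships; (2, 3) is one-way and (1, 1) is a self-loop.
example = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3), (1, 1)]
reciprocated = mutual_edges(example)
print(reciprocated)
```

Self-loops and one-way links are dropped here; whether the original data contained either is not stated, so that handling is a choice of this sketch.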
https://academictorrents.com/nolicensespecified
This dataset consists of circles from Google+. Google+ data was collected from users who had manually shared their circles using the share circle feature. The dataset includes node features (profiles), circles, and ego networks. Data is also available from Facebook and Twitter.
Dataset statistics

| Attribute | Value |
|---|---|
| Nodes | 107614 |
| Edges | 13673453 |
| Nodes in largest WCC | 107614 (1.000) |
| Edges in largest WCC | 13673453 (1.000) |
| Nodes in largest SCC | 69501 (0.646) |
| Edges in largest SCC | 9168660 (0.671) |
| Average clustering coefficient | 0.4901 |
| Number of triangles | 1073677742 |
| Fraction of closed triangles | 0.6552 |
| Diameter (longest shortest path) | 6 |
| 90-percentile effective diameter | 3 |

Source (citation)
J. McAuley and J. Leskovec. Learning to Discover Social Circles in Ego Networks. NIPS, 2012.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The network was collected by crawling the Amazon website. It is based on the "Customers Who Bought This Item Also Bought" feature of the Amazon website. If a product i is frequently co-purchased with product j, the graph contains a directed edge from i to j.
The data was collected by crawling the Amazon website and contains product metadata and review information about 548,552 different products (books, music CDs, DVDs and VHS video tapes).
For each product the following information is available:
Title
Salesrank
List of similar products (that get co-purchased with the current product)
Detailed product categorization
Product reviews: time, customer, rating, number of votes, number of people that found the review helpful
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Name | Type | Nodes | Edges | Description |
---|---|---|---|---|
soc-Epinions1 | Directed | 75,879 | 508,837 | Who-trusts-whom network of Epinions.com |
soc-LiveJournal1 | Directed | 4,847,571 | 68,993,773 | LiveJournal online social network |
soc-Pokec | Directed | 1,632,803 | 30,622,564 | Pokec online social network |
soc-Slashdot0811 | Directed | 77,360 | 905,468 | Slashdot social network from November 2008 |
soc-Slashdot0902 | Directed | 82,168 | 948,464 | Slashdot social network from February 2009 |
soc-sign-bitcoin-otc | Weighted, Signed, Directed, Temporal | 5,881 | 35,592 | Bitcoin OTC web of trust network |
soc-sign-bitcoin-alpha | Weighted, Signed, Directed, Temporal | 3,783 | 24,186 | Bitcoin Alpha web of trust network |
https://networkrepository.com/policy.php
Youtube online social network - Youtube is a video-sharing web site that includes a social network. The dataset contains a list of all of the user-to-user links.
https://academictorrents.com/nolicensespecified
This is a who-trusts-whom online social network of the general consumer review site Epinions.com. Members of the site can decide whether to trust each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user.
Dataset statistics

| Attribute | Value |
|---|---|
| Nodes | 75879 |
| Edges | 508837 |
| Nodes in largest WCC | 75877 (1.000) |
| Edges in largest WCC | 508836 (1.000) |
| Nodes in largest SCC | 32223 (0.425) |
| Edges in largest SCC | 443506 (0.872) |
| Average clustering coefficient | 0.1378 |
| Number of triangles | 1624481 |
| Fraction of closed triangles | 0.0229 |
| Diameter (longest shortest path) | 14 |
| 90-percentile effective diameter | 5 |
https://academictorrents.com/nolicensespecified
LiveJournal is a free on-line community with almost 10 million members; a significant fraction of these members are highly active. (For example, roughly 300,000 update their content in any given 24-hour period.) LiveJournal allows members to maintain journals and individual and group blogs, and it allows people to declare which other members are their friends.
Dataset statistics

| Attribute | Value |
|---|---|
| Nodes | 4847571 |
| Edges | 68993773 |
| Nodes in largest WCC | 4843953 (0.999) |
| Edges in largest WCC | 68983820 (1.000) |
| Nodes in largest SCC | 3828682 (0.790) |
| Edges in largest SCC | 65825429 (0.954) |
| Average clustering coefficient | 0.2742 |
| Number of triangles | 285730264 |
| Fraction of closed triangles | 0.04266 |
| Diameter (longest shortest path) | 16 |
| 90-percentile effective diameter | 6.5 |
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The network was generated using email data from a large European research institution. For a period from October 2003 to May 2005 (18 months) we have anonymized information about all incoming and outgoing email of the research institution. For each sent or received email message we know the time, the sender and the recipient of the email. Overall we have 3,038,531 emails between 287,755 different email addresses. Note that we have a complete email graph for only 1,258 email addresses that come from the research institution. Furthermore, there are 34,203 email addresses that both sent and received email within the span of our dataset. All other email addresses are either non-existing, mistyped or spam.
Given a set of email messages, each node corresponds to an email address. We create a directed edge between nodes i and j, if i sent at least one message to j.
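The edge rule above (a directed edge from i to j if i sent at least one message to j) can be sketched as follows. The message records are made-up placeholders and the function name is hypothetical; the real data uses anonymized addresses.

```python
def build_email_graph(messages):
    """Collapse (sender, recipient) message records into a set of directed edges."""
    edges = set()
    for sender, recipient in messages:
        edges.add((sender, recipient))  # repeated messages add nothing new
    return edges


# Placeholder messages: "a" writes to "b" twice, which still yields one edge.
messages = [("a", "b"), ("a", "b"), ("b", "a"), ("a", "c")]
edges = build_email_graph(messages)
nodes = {address for edge in edges for address in edge}
print(len(nodes), len(edges))
```

Using a set makes the "at least one message" rule explicit: the number of messages between a pair does not affect the graph, only the direction does.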
The Enron email communication network covers all the email communication within a dataset of around half a million emails. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. Nodes of the network are email addresses, and if an address i sent at least one email to address j, the graph contains an undirected edge from i to j. Note that non-Enron email addresses act as sinks and sources in the network, as we only observe their communication with the Enron email addresses.
The Enron email data was originally released by William Cohen at CMU.
Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. Each registered user has a talk page that they and other users can edit in order to communicate and discuss updates to various articles on Wikipedia. Using the latest complete dump of Wikipedia page edit history (from January 3 2008) we extracted all user talk page changes and created a network.
The network contains all the users and discussion from the inception of Wikipedia until January 2008. Nodes in the network represent Wikipedia users, and a directed edge from node i to node j represents that user i edited a talk page of user j at least once.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://snap.stanford.edu/data/soc-sign-bitcoin-alpha.html
Dataset information
This is a who-trusts-whom network of people who trade using Bitcoin on a
platform called Bitcoin Alpha (http://www.btcalpha.com/). Since Bitcoin
users are anonymous, there is a need to maintain a record of users'
reputation to prevent transactions with fraudulent and risky users. Members
of Bitcoin Alpha rate other members on a scale of -10 (total distrust) to
+10 (total trust) in steps of 1. This is the first explicit weighted signed
directed network available for research.
Dataset statistics
Nodes 3,783
Edges 24,186
Range of edge weight -10 to +10
Percentage of positive edges 93%
Similar network from another Bitcoin platform, Bitcoin OTC, is available at
https://snap.stanford.edu/data/soc-sign-bitcoinotc.html (and as
SNAP/bitcoin-otc in the SuiteSparse Matrix Collection).
Source (citation) Please cite the following paper if you use this dataset:
S. Kumar, F. Spezzano, V.S. Subrahmanian, C. Faloutsos. Edge Weight
Prediction in Weighted Signed Networks. IEEE International Conference on
Data Mining (ICDM), 2016.
http://cs.stanford.edu/~srijan/pubs/wsn-icdm16.pdf
The following BibTeX citation can be used:
@inproceedings{kumar2016edge,
title={Edge weight prediction in weighted signed networks},
author={Kumar, Srijan and Spezzano, Francesca and
Subrahmanian, VS and Faloutsos, Christos},
booktitle={Data Mining (ICDM), 2016 IEEE 16th Intl. Conf. on},
pages={221--230},
year={2016},
organization={IEEE}
}
The project webpage for this paper, along with its code to calculate two
signed network metrics---fairness and goodness---is available at
http://cs.umd.edu/~srijan/wsn/
Files
File Description
soc-sign-bitcoinalpha.csv.gz Weighted Signed Directed Bitcoin Alpha web of trust network
Data format
Each line has one rating with the following format:
SOURCE, TARGET, RATING, TIME
where
SOURCE: node id of source, i.e., rater
TARGET: node id of target, i.e., ratee
RATING: the source's rating for the target,
ranging from -10 to +10 in steps of 1
TIME: the time of the rating, measured as seconds since Epoch.
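The SOURCE, TARGET, RATING, TIME format described above can be read with Python's standard `csv` module. This is a minimal sketch: the sample rows are invented for illustration, the function name is hypothetical, and parsing TIME as a float is an assumption made so that fractional timestamps would also be accepted.

```python
import csv
import io


def parse_ratings(lines):
    """Yield (source, target, rating, time) tuples from SOURCE,TARGET,RATING,TIME rows."""
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        source, target, rating, time = (field.strip() for field in row)
        # Assumption: node ids and ratings are integers; time is seconds since Epoch.
        yield int(source), int(target), int(rating), float(time)


# Invented sample rows, not taken from the dataset.
sample = "1,2,10,1407470400\n2,1,-3,1407470500\n3,1,4,1407470600\n"
ratings = list(parse_ratings(io.StringIO(sample)))
share_positive = sum(1 for r in ratings if r[2] > 0) / len(ratings)
print(ratings[0], round(share_positive, 2))
```

For the compressed file listed above, the same function should work on a text-mode handle such as `gzip.open("soc-sign-bitcoinalpha.csv.gz", "rt")`, since `csv.reader` accepts any iterable of lines.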
Notes on inclusion into the Suite...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Intersections and endpoints are represented by nodes and the roads connecting these intersections or road endpoints are represented by undirected edges.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Gowalla Dataset
The Gowalla dataset, sourced from the Stanford Network Analysis Project (SNAP), contains user check-ins and social network information from the now-defunct location-based social networking platform Gowalla.
Key features:
Check-in data: records of user check-ins at various locations with timestamps and geographical coordinates (latitude, longitude).
Social graph: user relationships represented as a graph, where edges denote friendships between users.
See the full description on the dataset page: https://huggingface.co/datasets/habedi/gowalla-dataset.
https://creativecommons.org/publicdomain/zero/1.0/
Youtube social network and ground-truth communities
Dataset information
Youtube is a video-sharing web site that includes a social network. In the Youtube social network, users form friendships with each other and can create groups which other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.
We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with highest quality, which are described in our paper. As for the network, we provide the largest connected component.
More info: https://snap.stanford.edu/data/com-Youtube.html
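The community construction described above (each connected component of a group becomes its own community, and communities with fewer than 3 nodes are removed) can be sketched with a breadth-first search. The function name, the adjacency map, and the example group are hypothetical; this is not the dataset authors' code.

```python
from collections import deque


def group_communities(group_members, adjacency, min_size=3):
    """Split one group into connected components; keep those with >= min_size nodes."""
    members = set(group_members)
    seen = set()
    communities = []
    for start in sorted(members):
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                # Only follow friendships that stay inside the group.
                if neighbor in members and neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) >= min_size:
            communities.append(sorted(component))
    return communities


# Hypothetical undirected friendships; user 6 is not a member of the group.
adjacency = {1: {2, 6}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: {1}}
communities = group_communities([1, 2, 3, 4, 5], adjacency)
print(communities)
```

In this example the group splits into {1, 2, 3} and {4, 5}; the second component is dropped because it has fewer than 3 nodes.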
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ego-nets of Eastern European users collected from the music streaming service Deezer in February 2020. Nodes are users and edges are mutual follower relationships. The related task is the prediction of gender for the ego node in the graph.
The social networks of developers who starred popular machine learning and web development repositories (with at least 10 stars) until August 2019. Nodes are users and links are follower relationships. The task is to decide whether a social network belongs to web or machine learning developers. We only included the largest component (with at least 10 users) of each graph.
Discussion and non-discussion based threads from Reddit which we collected in May 2018. Nodes are Reddit users who participate in a discussion and links are replies between them. The task is to predict whether a thread is discussion based or not (binary classification).
The ego-nets of Twitch users who participated in the partnership program in April 2018. Nodes are users and links are friendships. The binary classification task is to predict, using the ego-net, whether the ego user plays a single game or multiple games. Players who play a single game usually have a denser ego-net.
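The Twitch description above compares ego-nets by how dense they are. The source does not define density, so the sketch below assumes the usual definition for a simple undirected graph (edges present divided by edges possible); the function name and the example counts are illustrative.

```python
def density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: edges present / edges possible."""
    if num_nodes < 2:
        return 0.0  # no pair of nodes exists, so no edge is possible
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Two made-up ego-nets with 5 users each: a full clique and a sparse star.
clique_density = density(5, 10)
star_density = density(5, 4)
print(clique_density, star_density)
```

With 5 nodes there are 10 possible edges, so the clique has density 1.0 and the star, with 4 edges, is much sparser.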
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
https://snap.stanford.edu/data/sx-askubuntu.html
Dataset information
This is a temporal network of interactions on the stack exchange web site
Ask Ubuntu (http://askubuntu.com/). There are three different types of
interactions represented by a directed edge (u, v, t):
user u answered user v's question at time t (in the graph sx-askubuntu-a2q)
user u commented on user v's question at time t (in the graph sx-askubuntu-c2q)
user u commented on user v's answer at time t (in the graph sx-askubuntu-c2a)
The graph sx-askubuntu contains the union of these graphs. These graphs
were constructed from the Stack Exchange Data Dump. Node ID numbers
correspond to the 'OwnerUserId' tag in that data dump.
Dataset statistics (sx-askubuntu)
Nodes 159,316
Temporal Edges 964,437
Edges in static graph 596,933
Time span 2613 days
Dataset statistics (sx-askubuntu-a2q)
Nodes 137,517
Temporal Edges 280,102
Edges in static graph 262,106
Time span 2613 days
Dataset statistics (sx-askubuntu-c2q)
Nodes 79,155
Temporal Edges 327,513
Edges in static graph 198,852
Time span 2047 days
Dataset statistics (sx-askubuntu-c2a)
Nodes 75,555
Temporal Edges 356,822
Edges in static graph 178,210
Time span 2418 days
Source (citation)
Ashwin Paranjape, Austin R. Benson, and Jure Leskovec. "Motifs in Temporal
Networks." In Proceedings of the Tenth ACM International Conference on Web
Search and Data Mining, 2017.
Files
File Description
sx-askubuntu.txt.gz All interactions
sx-askubuntu-a2q.txt.gz Answers to questions
sx-askubuntu-c2q.txt.gz Comments to questions
sx-askubuntu-c2a.txt.gz Comments to answers
Data format
SRC DST UNIXTS
where edges are separated by a new line and
SRC: id of the source node (a user)
DST: id of the target node (a user)
UNIXTS: Unix timestamp (seconds since the epoch)
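The statistics listed for these graphs (nodes, temporal edges, edges in the static graph, time span in days) can be derived from the SRC DST UNIXTS lines roughly as follows. This is a sketch under stated assumptions: fields are whitespace-separated, the static graph counts distinct directed (SRC, DST) pairs, and the sample lines are invented.

```python
def summarize_temporal_edges(lines):
    """Count nodes, temporal edges, static edges, and the time span in days."""
    nodes, static_edges = set(), set()
    timestamps = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        src, dst, ts = (int(p) for p in parts)
        nodes.update((src, dst))
        static_edges.add((src, dst))  # assumption: direction is preserved
        timestamps.append(ts)
    span_days = (max(timestamps) - min(timestamps)) / 86400 if timestamps else 0.0
    return {
        "nodes": len(nodes),
        "temporal_edges": len(timestamps),
        "static_edges": len(static_edges),
        "span_days": span_days,
    }


# Invented sample: user 1 interacts with user 2 twice, one day apart.
summary = summarize_temporal_edges(["1 2 0", "1 2 86400", "2 3 172800", ""])
print(summary)
```

Because the combined graph is described as the union of the three interaction graphs, feeding the lines of all three files into the same function would be one way to approximate its statistics.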
...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All the real-world data sets are employed in the paper "Competition Between Homophily and Information Entropy Maximization in Social Networks", which will be published in PLOS ONE 2015. Three social networks are included:
CA-HepPh.txt: a collaboration network from the e-print arXiv (http://www.arxiv.org) that covers scientific collaborations between authors of papers submitted to High Energy Physics.
neworleans-links-connected.txt: the giant component of the Facebook network in New Orleans (all node ids are converted to random numbers).
jure_Email-Enron.txt: an email communication network that covers all the email communication within a data set of around half a million emails.
In each file, one line represents an edge, and the two nodes are separated by a tab. The demo code to read the graph can be found in test.py. These datasets were obtained from publicly available sources on the Internet; their original download links or contacts are as follows:
CA-HepPh: http://snap.stanford.edu/data/ca-HepPh.html
NewOrleans: http://socialnetworks.mpi-sws.org/datasets.html
Email-Enron: http://snap.stanford.edu/data/email-Enron.html
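The edge-list format described above (one edge per line, two node ids separated by a tab) can be loaded into an adjacency map as sketched below. This is not the test.py shipped with the dataset; the function name is hypothetical, and treating edges as undirected and skipping lines that start with "#" are assumptions of this sketch.

```python
from collections import defaultdict


def read_edge_list(lines):
    """Build an undirected adjacency map from tab-separated 'u<TAB>v' lines."""
    adjacency = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comment lines carry no edges
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # ignore lines that do not contain two node ids
        u, v = fields[0], fields[1]
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


# Invented sample lines in the described format.
graph = read_edge_list(["# comment\n", "1\t2\n", "2\t3\n"])
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
print(degrees)
```

Node ids are kept as strings so that the randomized ids mentioned for the New Orleans network do not need any particular numeric range.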
This dataset includes graph theory indicators (centrality and clustering coefficients) for the Stanford Network Analysis Project (SNAP) "email-Eu-core-temporal" network, a well-known reference dataset for Social Network Analysis (SNA) of e-mail traffic.