63 datasets found
  1. Facebook SNAP Network Data

    • academictorrents.com
    bittorrent
    Updated Nov 22, 2015
    Cite
    Stanford Network Analysis Platform (SNAP) (2015). Facebook SNAP Network Data [Dataset]. https://academictorrents.com/details/3efc53f35d49669b89039f2b4ec9de11ec1d73fd
    Explore at:
    Available download formats: bittorrent (951514)
    Dataset updated
    Nov 22, 2015
    Dataset authored and provided by
    Stanford Network Analysis Platform (SNAP)
    License

    https://academictorrents.com/nolicensespecified

    Description

    This dataset consists of circles (or friends lists) from Facebook. Facebook data was collected from survey participants using this Facebook app. The dataset includes node features (profiles), circles, and ego networks.

  2. Twitter SNAP Network Data

    • academictorrents.com
    bittorrent
    Updated Nov 22, 2015
    Cite
    Stanford Network Analysis Platform (SNAP) (2015). Twitter SNAP Network Data [Dataset]. https://academictorrents.com/details/276e1028b08decbf711f275a57901dbde88ca5ab
    Explore at:
    Available download formats: bittorrent (32962356)
    Dataset updated
    Nov 22, 2015
    Dataset authored and provided by
    Stanford Network Analysis Platform (SNAP)
    License

    https://academictorrents.com/nolicensespecified

    Description

    This dataset consists of circles (or lists) from Twitter. Twitter data was crawled from public sources. The dataset includes node features (profiles), circles, and ego networks. Data is also available from Facebook and Google+.

    Dataset statistics

    Nodes 81306
    Edges 1768149
    Nodes in largest WCC 81306 (1.000)
    Edges in largest WCC 1768149 (1.000)
    Nodes in largest SCC 68413 (0.841)
    Edges in largest SCC 1685163 (0.953)
    Average clustering coefficient 0.5653
    Number of triangles 13082506
    Fraction of closed triangles 0.06415
    Diameter (longest shortest path) 7
    90-percentile effective diameter 4.5

  3. Group SNAP Dataset

    • paperswithcode.com
    Updated Jul 21, 2018
    Cite
    (2018). Group SNAP Dataset [Dataset]. https://paperswithcode.com/dataset/group-snap-snap-suitesparse-matrix-collection
    Dataset updated
    Jul 21, 2018
    Description

    Networks from SNAP (Stanford Network Analysis Platform) Network Data Sets, Jure Leskovec http://snap.stanford.edu/data/index.html email jure at cs.stanford.edu

    Citation for the SNAP collection:

    @misc{snapnets,
      author       = {Jure Leskovec and Andrej Krevl},
      title        = {{SNAP Datasets}: {Stanford} Large Network Dataset Collection},
      howpublished = {\url{http://snap.stanford.edu/data}},
      month        = jun,
      year         = 2014
    }

    The following matrices/graphs were added to the collection in June 2010 by Tim Davis (problem id and name):

    2284 SNAP/soc-Epinions1 who-trusts-whom network of Epinions.com
    2285 SNAP/soc-LiveJournal1 LiveJournal social network
    2286 SNAP/soc-Slashdot0811 Slashdot social network, Nov 2008
    2287 SNAP/soc-Slashdot0902 Slashdot social network, Feb 2009
    2288 SNAP/wiki-Vote Wikipedia who-votes-on-whom network
    2289 SNAP/email-EuAll Email network from a EU research institution
    2290 SNAP/email-Enron Email communication network from Enron
    2291 SNAP/wiki-Talk Wikipedia talk (communication) network
    2292 SNAP/cit-HepPh Arxiv High Energy Physics paper citation network
    2293 SNAP/cit-HepTh Arxiv High Energy Physics paper citation network
    2294 SNAP/cit-Patents Citation network among US Patents
    2295 SNAP/ca-AstroPh Collaboration network of Arxiv Astro Physics
    2296 SNAP/ca-CondMat Collaboration network of Arxiv Condensed Matter
    2297 SNAP/ca-GrQc Collaboration network of Arxiv General Relativity
    2298 SNAP/ca-HepPh Collaboration network of Arxiv High Energy Physics
    2299 SNAP/ca-HepTh Collaboration network of Arxiv High Energy Physics Theory
    2300 SNAP/web-BerkStan Web graph of Berkeley and Stanford
    2301 SNAP/web-Google Web graph from Google
    2302 SNAP/web-NotreDame Web graph of Notre Dame
    2303 SNAP/web-Stanford Web graph of Stanford.edu
    2304 SNAP/amazon0302 Amazon product co-purchasing network from March 2 2003
    2305 SNAP/amazon0312 Amazon product co-purchasing network from March 12 2003
    2306 SNAP/amazon0505 Amazon product co-purchasing network from May 5 2003
    2307 SNAP/amazon0601 Amazon product co-purchasing network from June 1 2003
    2308 SNAP/p2p-Gnutella04 Gnutella peer to peer network from August 4 2002
    2309 SNAP/p2p-Gnutella05 Gnutella peer to peer network from August 5 2002
    2310 SNAP/p2p-Gnutella06 Gnutella peer to peer network from August 6 2002
    2311 SNAP/p2p-Gnutella08 Gnutella peer to peer network from August 8 2002
    2312 SNAP/p2p-Gnutella09 Gnutella peer to peer network from August 9 2002
    2313 SNAP/p2p-Gnutella24 Gnutella peer to peer network from August 24 2002
    2314 SNAP/p2p-Gnutella25 Gnutella peer to peer network from August 25 2002
    2315 SNAP/p2p-Gnutella30 Gnutella peer to peer network from August 30 2002
    2316 SNAP/p2p-Gnutella31 Gnutella peer to peer network from August 31 2002
    2317 SNAP/roadNet-CA Road network of California
    2318 SNAP/roadNet-PA Road network of Pennsylvania
    2319 SNAP/roadNet-TX Road network of Texas
    2320 SNAP/as-735 733 daily instances(graphs) from November 8 1997 to January 2 2000
    2321 SNAP/as-Skitter Internet topology graph, from traceroutes run daily in 2005
    2322 SNAP/as-caida The CAIDA AS Relationships Datasets, from January 2004 to November 2007
    2323 SNAP/Oregon-1 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
    2324 SNAP/Oregon-2 AS peering information inferred from Oregon route-views between March 31 and May 26 2001
    2325 SNAP/soc-sign-epinions Epinions signed social network
    2326 SNAP/soc-sign-Slashdot081106 Slashdot Zoo signed social network from November 6 2008
    2327 SNAP/soc-sign-Slashdot090216 Slashdot Zoo signed social network from February 16 2009
    2328 SNAP/soc-sign-Slashdot090221 Slashdot Zoo signed social network from February 21 2009

    Then the following problems were added in July 2018. All data and metadata from the SNAP data set was imported into the SuiteSparse Matrix Collection.

    2777 SNAP/CollegeMsg Messages on a Facebook-like platform at UC-Irvine
    2778 SNAP/com-Amazon Amazon product network
    2779 SNAP/com-DBLP DBLP collaboration network
    2780 SNAP/com-Friendster Friendster online social network
    2781 SNAP/com-LiveJournal LiveJournal online social network
    2782 SNAP/com-Orkut Orkut online social network
    2783 SNAP/com-Youtube Youtube online social network
    2784 SNAP/email-Eu-core E-mail network
    2785 SNAP/email-Eu-core-temporal E-mails between users at a research institution
    2786 SNAP/higgs-twitter twitter messages re: Higgs boson on 4th July 2012.
    2787 SNAP/loc-Brightkite Brightkite location based online social network
    2788 SNAP/loc-Gowalla Gowalla location based online social network
    2789 SNAP/soc-Pokec Pokec online social network
    2790 SNAP/soc-sign-bitcoin-alpha Bitcoin Alpha web of trust network
    2791 SNAP/soc-sign-bitcoin-otc Bitcoin OTC web of trust network
    2792 SNAP/sx-askubuntu Comments, questions, and answers on Ask Ubuntu
    2793 SNAP/sx-mathoverflow Comments, questions, and answers on Math Overflow
    2794 SNAP/sx-stackoverflow Comments, questions, and answers on Stack Overflow
    2795 SNAP/sx-superuser Comments, questions, and answers on Super User
    2796 SNAP/twitter7 A collection of 476 million tweets collected between June-Dec 2009
    2797 SNAP/wiki-RfA Wikipedia Requests for Adminship (with text)
    2798 SNAP/wiki-talk-temporal Users editing talk pages on Wikipedia
    2799 SNAP/wiki-topcats Wikipedia hyperlinks (with communities)

    The following 13 graphs/networks were in the SNAP data set in July 2018 but have not yet been imported into the SuiteSparse Matrix Collection. They may be added in the future:

    amazon-meta ego-Facebook ego-Gplus ego-Twitter gemsec-Deezer gemsec-Facebook ksc-time-series memetracker9 web-flickr web-Reddit web-RedditPizzaRequests wiki-Elec wiki-meta wikispeedia

    The 2010 description of the SNAP data set gave these categories:

    • Social networks: online social networks, edges represent interactions between people

    • Communication networks: email communication networks with edges representing communication

    • Citation networks: nodes represent papers, edges represent citations

    • Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)

    • Web graphs: nodes represent webpages and edges are hyperlinks

    • Blog and Memetracker graphs: nodes represent time stamped blog posts, edges are hyperlinks [revised below]

    • Amazon networks: nodes represent products and edges link commonly co-purchased products

    • Internet networks: nodes represent computers and edges communication

    • Road networks: nodes represent intersections and edges roads connecting the intersections

    • Autonomous systems: graphs of the internet

    • Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)

    By July 2018, the following categories had been added:

    • Networks with ground-truth communities: ground-truth network communities in social and information networks

    • Location-based online social networks: social networks with geographic check-ins

    • Wikipedia networks, articles, and metadata: talk, editing, voting, and article data from Wikipedia

    • Temporal networks: networks where edges have timestamps

    • Twitter and Memetracker: Memetracker phrases, links and 467 million Tweets

    • Online communities: data from online communities such as Reddit and Flickr

    • Online reviews: data from online review systems such as BeerAdvocate and Amazon

    https://sparse.tamu.edu/SNAP

  4. Data from: Youtube social network

    • kaggle.com
    zip
    Updated Sep 1, 2019
    Cite
    Lorenzo De Tomasi (2019). Youtube social network [Dataset]. https://www.kaggle.com/datasets/lodetomasi1995/youtube-social-network
    Explore at:
    Available download formats: zip (10604317 bytes)
    Dataset updated
    Sep 1, 2019
    Authors
    Lorenzo De Tomasi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    YouTube
    Description

    YouTube social network and ground-truth communities

    Dataset information

    YouTube is a video-sharing website that includes a social network. In the YouTube social network, users form friendships with each other, and users can create groups that other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with the highest quality, which are described in our paper. As for the network, we provide the largest connected component.
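
    The community construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy edge list, and the `min_size` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def split_group_into_communities(edges, group_members, min_size=3):
    """Split one user-defined group into connected components.

    Only edges with both endpoints inside the group are used, and
    components with fewer than `min_size` nodes are dropped.
    """
    members = set(group_members)
    adjacency = defaultdict(set)
    for u, v in edges:
        if u in members and v in members:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    communities = []
    for start in sorted(members):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) >= min_size:
            communities.append(sorted(component))
    return communities


# Hypothetical toy data: node 6 is not a group member, so edge (3, 6) is ignored.
edges = [(1, 2), (2, 3), (4, 5), (3, 6)]
group = [1, 2, 3, 4, 5]
communities = split_group_into_communities(edges, group)
```

    Here the group splits into the components {1, 2, 3} and {4, 5}, and only the first survives the size filter.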

    More info: https://snap.stanford.edu/data/com-Youtube.html

  5. User-actions Graphs

    • kaggle.com
    Updated Nov 12, 2021
    Cite
    Subhajit Sahu (2021). User-actions Graphs [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-user-actions/data
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 12, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The MOOC user action dataset represents the actions taken by users on a popular MOOC platform. The actions are represented as a directed, temporal network. The nodes represent users and course activities (targets), and edges represent the actions by users on the targets. The actions have attributes and timestamps. To protect user privacy, we anonymize the users, and timestamps are standardized to start from timestamp 0. The dataset is directed, temporal, and attributed.

    Additionally, each action has a binary label, representing whether the user dropped out of the course after this action, i.e., whether this is the last action of the user.
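
    As a rough sketch of the two ideas above (timestamps shifted to start at 0, and a label marking each user's last action), assuming actions arrive as hypothetical `(user, target, timestamp)` tuples. The dataset's actual preprocessing is not described here, so the function and field layout are illustrative only.

```python
def standardize_and_label(actions):
    """Return (user, target, shifted_time, is_last_action) tuples.

    `actions` is a list of (user, target, timestamp) tuples in any order.
    Timestamps are shifted so the earliest one becomes 0, and the label is
    1 only for the chronologically last action of each user.
    """
    if not actions:
        return []
    earliest = min(timestamp for _, _, timestamp in actions)
    ordered = sorted(actions, key=lambda action: action[2])

    # Remember the position of each user's final action.
    last_index = {}
    for index, (user, _, _) in enumerate(ordered):
        last_index[user] = index

    return [
        (user, target, timestamp - earliest, int(last_index[user] == index))
        for index, (user, target, timestamp) in enumerate(ordered)
    ]


# Hypothetical toy actions, not taken from the dataset.
actions = [("u1", "c1", 105), ("u2", "c1", 100), ("u1", "c2", 130)]
labeled = standardize_and_label(actions)
```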

    This dataset serves as a recommender system dataset and a dynamic network dataset.

    Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.

    The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.

    SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in Nov, 2009. SNAP uses a general purpose STL (Standard Template Library)-like library GLib developed at Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.

    http://snap.stanford.edu/data/index.html#actions

  6. Google Plus SNAP Network Data

    • academictorrents.com
    bittorrent
    Updated Nov 22, 2015
    Cite
    Stanford Network Analysis Platform (SNAP) (2015). Google Plus SNAP Network Data [Dataset]. https://academictorrents.com/details/cd595c024206ee0e10ffd607f4a3a19d37eaf83c
    Explore at:
    Available download formats: bittorrent (811541565)
    Dataset updated
    Nov 22, 2015
    Dataset authored and provided by
    Stanford Network Analysis Platform (SNAP)
    License

    https://academictorrents.com/nolicensespecified

    Description

    This dataset consists of circles from Google+. Google+ data was collected from users who had manually shared their circles using the share circle feature. The dataset includes node features (profiles), circles, and ego networks. Data is also available from Facebook and Twitter.

    Dataset statistics

    Nodes 107614
    Edges 13673453
    Nodes in largest WCC 107614 (1.000)
    Edges in largest WCC 13673453 (1.000)
    Nodes in largest SCC 69501 (0.646)
    Edges in largest SCC 9168660 (0.671)
    Average clustering coefficient 0.4901
    Number of triangles 1073677742
    Fraction of closed triangles 0.6552
    Diameter (longest shortest path) 6
    90-percentile effective diameter 3

    Source (citation)

    J. McAuley and J. Leskovec. Learning to Discover Social Circles in Ego Networks. NIPS, 2012.

  7. Geo-location Graphs

    • kaggle.com
    Updated Nov 11, 2021
    Cite
    Subhajit Sahu (2021). Geo-location Graphs [Dataset]. https://www.kaggle.com/wolfram77/graphs-geo-location/code
    Explore at:
    Croissant
    Dataset updated
    Nov 11, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Gowalla is a location-based social networking website where users share their locations by checking-in. The friendship network is undirected and was collected using their public API, and consists of 196,591 nodes and 950,327 edges. We have collected a total of 6,442,890 check-ins of these users over the period of Feb. 2009 - Oct. 2010.

    Brightkite was once a location-based social networking service provider where users shared their locations by checking in. The friendship network was collected using their public API, and consists of 58,228 nodes and 214,078 edges. The network was originally directed, but we have constructed a network with undirected edges wherever there is a friendship in both directions. We have also collected a total of 4,491,143 check-ins of these users over the period of Apr. 2008 - Oct. 2010.
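
    The conversion from a directed to an undirected friendship network can be sketched like this: an undirected edge is kept only when both directions are present. The function name and toy edges are hypothetical, and dropping self-loops is an assumption of this sketch rather than something the description states.

```python
def reciprocal_edges(directed_edges):
    """Keep one undirected edge per pair that is linked in both directions."""
    edge_set = set(directed_edges)
    undirected = set()
    for u, v in edge_set:
        # Assumption: self-loops are not treated as friendships.
        if u != v and (v, u) in edge_set:
            undirected.add((min(u, v), max(u, v)))
    return sorted(undirected)


# Hypothetical toy input: 2 -> 3 has no reverse edge, so it is dropped.
directed = [(1, 2), (2, 1), (2, 3), (3, 4), (4, 3), (1, 2)]
mutual = reciprocal_edges(directed)
```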

    Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.

    The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.

    SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in Nov, 2009. SNAP uses a general purpose STL (Standard Template Library)-like library GLib developed at Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.

    https://snap.stanford.edu/data/index.html

  8. Epinions Signed Social Network (SNAP)

    • kaggle.com
    Updated Dec 16, 2021
    Cite
    Subhajit Sahu (2021). Epinions Signed Social Network (SNAP) [Dataset]. https://www.kaggle.com/wolfram77/graphs-snap-soc-sign-epinions/discussion
    Explore at:
    Croissant
    Dataset updated
    Dec 16, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Epinions social network

    Dataset information

    This is a who-trusts-whom online social network of the general consumer review
    site Epinions.com. Members of the site can decide whether to "trust" each
    other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user.

    Dataset statistics

    Nodes 131828
    Edges 841372
    Nodes in largest WCC 119130 (0.904)
    Edges in largest WCC 833695 (0.991)
    Nodes in largest SCC 41441 (0.314)
    Edges in largest SCC 693737 (0.825)
    Average clustering coefficient 0.2424
    Number of triangles 4910076
    Fraction of closed triangles 0.08085
    Diameter (longest shortest path) 14
    90-percentile effective diameter 4.9
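
    The values in parentheses appear to be fractions of the whole graph's node or edge count. That reading is an assumption, but the listed numbers are consistent with it, as this small check shows (the dictionary keys are made up for the example):

```python
# Counts copied from the statistics listed above.
stats = {
    "nodes": 131828,
    "edges": 841372,
    "wcc_nodes": 119130,
    "wcc_edges": 833695,
    "scc_nodes": 41441,
    "scc_edges": 693737,
}


def fraction(part, whole):
    """Share of the whole graph, rounded to three decimals as in the listing."""
    return round(part / whole, 3)


fractions = {
    "wcc_nodes": fraction(stats["wcc_nodes"], stats["nodes"]),
    "wcc_edges": fraction(stats["wcc_edges"], stats["edges"]),
    "scc_nodes": fraction(stats["scc_nodes"], stats["nodes"]),
    "scc_edges": fraction(stats["scc_edges"], stats["edges"]),
}
```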

    Source (citation)

    J. Leskovec, D. Huttenlocher, J. Kleinberg: Signed Networks in Social Media.
    28th ACM Conference on Human Factors in Computing Systems (CHI), 2010.
    http://cs.stanford.edu/people/jure/pubs/triads-chi10.pdf

    Files
    File Description
    soc-sign-epinions.txt.gz Directed Epinions signed social network

  9. Live Journal SNAP Network Data

    • academictorrents.com
    bittorrent
    Updated Nov 22, 2015
    Cite
    Stanford Network Analysis Platform (SNAP) (2015). Live Journal SNAP Network Data [Dataset]. https://academictorrents.com/details/227d085132908313beb19e9d334bfbdce042a8f6
    Explore at:
    Available download formats: bittorrent (259619239)
    Dataset updated
    Nov 22, 2015
    Dataset authored and provided by
    Stanford Network Analysis Platform (SNAP)
    License

    https://academictorrents.com/nolicensespecified

    Description

    LiveJournal is a free on-line community with almost 10 million members; a significant fraction of these members are highly active. (For example, roughly 300,000 update their content in any given 24-hour period.) LiveJournal allows members to maintain journals and individual and group blogs, and it allows people to declare which other members are their friends.

    Dataset statistics

    Nodes 4847571
    Edges 68993773
    Nodes in largest WCC 4843953 (0.999)
    Edges in largest WCC 68983820 (1.000)
    Nodes in largest SCC 3828682 (0.790)
    Edges in largest SCC 65825429 (0.954)
    Average clustering coefficient 0.2742
    Number of triangles 285730264
    Fraction of closed triangles 0.04266
    Diameter (longest shortest path) 16
    90-percentile effective diameter 6.5

  10. amazon_next_item_selection_local_pipeline-preference_scorer

    • huggingface.co
    Updated May 12, 2025
    Cite
    snap-stanford (2025). amazon_next_item_selection_local_pipeline-preference_scorer [Dataset]. https://huggingface.co/datasets/snap-stanford/amazon_next_item_selection_local_pipeline-preference_scorer
    Dataset updated
    May 12, 2025
    Dataset authored and provided by
    snap-stanford
    Description

    snap-stanford/amazon_next_item_selection_local_pipeline-preference_scorer dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. bigcodebench_three_agents_pipeline-preference_scorer

    • huggingface.co
    Updated Jun 9, 2025
    Cite
    bigcodebench_three_agents_pipeline-preference_scorer [Dataset]. https://huggingface.co/datasets/snap-stanford/bigcodebench_three_agents_pipeline-preference_scorer
    Dataset updated
    Jun 9, 2025
    Dataset authored and provided by
    snap-stanford
    Description

    snap-stanford/bigcodebench_three_agents_pipeline-preference_scorer dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. Belewitte dataset

    • explore.openaire.eu
    • zenodo.org
    Updated Jun 20, 2020
    Cite
    Merijn Verstraaten; IvI Research (FNWI) (2020). Belewitte dataset [Dataset]. http://doi.org/10.5281/zenodo.3902234
    Dataset updated
    Jun 20, 2020
    Authors
    Merijn Verstraaten; IvI Research (FNWI)
    Description

    References: Jérôme Kunegis. KONECT - The Koblenz Network Collection. In Proc. Int. Web Observatory Workshop, pages 1343-1350, 2013. Andrej Krevl and Jure Leskovec. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data, 2014.

    Results of Belewitte (DOI: 10.5281/zenodo.3902239) experiments. Performance data for Breadth-First Search and PageRank on NVidia TitanX, RTX2080Ti, GTX980, and K20 GPUs, including trained Binary Decision Tree models for predicting the best implementation on an input graph.

  13. preference_iterative_hard

    • huggingface.co
    Updated Jul 21, 2013
    Cite
    snap-stanford (2013). preference_iterative_hard [Dataset]. https://huggingface.co/datasets/snap-stanford/preference_iterative_hard
    Explore at:
    Croissant
    Dataset updated
    Jul 21, 2013
    Dataset authored and provided by
    snap-stanford
    Description

    snap-stanford/preference_iterative_hard dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. Orkut Social Network and Communities (SNAP)

    • kaggle.com
    Updated Dec 16, 2021
    Cite
    Subhajit Sahu (2021). Orkut Social Network and Communities (SNAP) [Dataset]. https://www.kaggle.com/wolfram77/graphs-snap-com-orkut/discussion
    Explore at:
    Croissant
    Dataset updated
    Dec 16, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Orkut social network and ground-truth communities

    https://snap.stanford.edu/data/com-Orkut.html

    Dataset information

    Orkut (http://www.orkut.com/) is a free on-line social network where users form friendships with each other. Orkut also allows users to form a group which
    other members can then join. We consider such user-defined groups as
    ground-truth communities. We provide the Orkut friendship social network
    and ground-truth communities. This data is provided by Alan Mislove et al. (http://socialnetworks.mpi-sws.org/data-imc2007.html)

    We regard each connected component in a group as a separate ground-truth
    community. We remove the ground-truth communities which have fewer than 3
    nodes. We also provide the top 5,000 communities with the highest quality,
    which are described in our paper (http://arxiv.org/abs/1205.6233). As for
    the network, we provide the largest connected component.

    Dataset statistics
    Nodes 3,072,441
    Edges 117,185,083
    Nodes in largest WCC 3072441 (1.000)
    Edges in largest WCC 117185083 (1.000)
    Nodes in largest SCC 3072441 (1.000)
    Edges in largest SCC 117185083 (1.000)
    Average clustering coefficient 0.1666
    Number of triangles 627584181
    Fraction of closed triangles 0.01414
    Diameter (longest shortest path) 9
    90-percentile effective diameter 4.8

    Source (citation)
    J. Yang and J. Leskovec. Defining and Evaluating Network Communities based on Ground-truth. ICDM, 2012. http://arxiv.org/abs/1205.6233

    Files
    File Description
    com-orkut.ungraph.txt.gz Undirected Orkut network
    com-orkut.all.cmty.txt.gz Orkut communities
    com-orkut.top5000.cmty.txt.gz Orkut communities (Top 5,000)

    Notes on inclusion into the SuiteSparse Matrix Collection, July 2018:

    The graph in the SNAP data set is 1-based, with nodes numbered 1 to
    3,072,626.

    In the SuiteSparse Matrix Collection, Problem.A is the undirected
    Orkut network, a matrix of size n-by-n with n=3,072,441, which is
    the number of unique user id's appearing in any edge.

    Problem.aux.nodeid is a list of the node id's that appear in the SNAP data set. A(i,j)=1 if person nodeid(i) is friends with person nodeid(j). The
    node id's are the same as the SNAP data set (1-based).

    C = Problem.aux.Communities_all is a sparse matrix of size n by 15,301,901, which represents the same number of communities as in the com-orkut.all.cmty.txt file. The kth line in that file defines the kth community, and is the
    column C(:,k), where C(i,k)=1 if person nodeid(i) is in the kth
    community. Row C(i,:) and row/column i of the A matrix thus refer to the
    same person, nodeid(i).
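
    To make the nodeid mapping concrete, here is a small sketch of how sparse node ids can be compacted into row indices shared by an adjacency structure and a community membership structure. It is not the SuiteSparse import code: the function name is hypothetical, sorting the ids is an assumption, sets of index pairs stand in for sparse matrices, and Python indices are 0-based whereas the notation above is 1-based.

```python
def build_index(edges, communities):
    """Map node ids that appear in any edge to compact row indices.

    Returns (nodeid, A, C) where nodeid[i] is the original id of row i,
    A is a set of (i, j) pairs for the symmetric adjacency pattern, and
    C is a set of (i, k) pairs meaning row i belongs to community k.
    """
    # Assumption: rows are ordered by increasing original node id.
    nodeid = sorted({node for edge in edges for node in edge})
    row_of = {node: row for row, node in enumerate(nodeid)}

    A = set()
    for u, v in edges:
        i, j = row_of[u], row_of[v]
        A.add((i, j))
        A.add((j, i))  # undirected network, so store both directions

    C = set()
    for k, members in enumerate(communities):
        for node in members:
            if node in row_of:
                C.add((row_of[node], k))
    return nodeid, A, C


# Hypothetical toy data with gaps in the id range (ids 1, 5, 9).
nodeid, A, C = build_index([(1, 5), (5, 9)], [[1, 5], [9]])
```

    With this layout, row i of both structures refers to the same original id, `nodeid[i]`.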

    Ctop = Problem.aux.Communities_to...

  15. Signed Graphs

    • kaggle.com
    Updated Nov 15, 2021
    Cite
    Subhajit Sahu (2021). Signed Graphs [Dataset]. https://www.kaggle.com/wolfram77/graphs-signed
    Explore at:
    Croissant
    Dataset updated
    Nov 15, 2021
    Dataset provided by
    Kaggle
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    soc-RedditHyperlinks: Social Network: Reddit Hyperlink Network

    The hyperlink network represents the directed connections between two subreddits (a subreddit is a community on Reddit). We also provide subreddit embeddings. The network is extracted from publicly available Reddit data of 2.5 years from Jan 2014 to April 2017.

    Subreddit Hyperlink Network: the subreddit-to-subreddit hyperlink network is extracted from the posts that create hyperlinks from one subreddit to another. We say a hyperlink originates from a post in the source community and links to a post in the target community. Each hyperlink is annotated with three properties: the timestamp, the sentiment of the source community post towards the target community post, and the text property vector of the source post. The network is directed, signed, temporal, and attributed.

    Note that each post has a title and a body. The hyperlink can be present in either the title of the post or in the body. Therefore, we provide one network file for each.

    Subreddit Embeddings: We have also provided embedding vectors representing each subreddit. These can be found in this dataset link: subreddit embedding dataset. Please note that some subreddit embeddings could not be generated, so this file has 51,278 embeddings.

    soc-sign-bitcoin-otc: Bitcoin OTC trust weighted signed network

    This is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin OTC. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputation to prevent transactions with fraudulent and risky users. Members of Bitcoin OTC rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research.

    soc-sign-bitcoin-alpha: Bitcoin Alpha trust weighted signed network

    This is a who-trusts-whom network of people who trade using Bitcoin on a platform called Bitcoin Alpha. Since Bitcoin users are anonymous, there is a need to maintain a record of users' reputation to prevent transactions with fraudulent and risky users. Members of Bitcoin Alpha rate other members on a scale of -10 (total distrust) to +10 (total trust) in steps of 1. This is the first explicit weighted signed directed network available for research.

    soc-sign-epinions: Epinions social network

    This is a who-trusts-whom online social network of the general consumer review site Epinions.com. Members of the site can decide whether to "trust" each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user.

    wiki-Elec: Wikipedia adminship election data

    Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. A small share of Wikipedia contributors are administrators, who are users with access to additional technical features that aid in maintenance. For a user to become an administrator, a Request for adminship (RfA) is issued, and the Wikipedia community decides, via a public discussion or a vote, whom to promote to adminship. Using the latest complete dump of Wikipedia page edit history (from January 3, 2008), we extracted all administrator elections and vote history data. This gave us nearly 2,800 elections with around 100,000 total votes and about 7,000 users participating in the elections (either casting a vote or being voted on). Of these, 1,200 elections resulted in a successful promotion, while about 1,500 elections did not result in a promotion. About half of the votes in the dataset are by existing admins, while the other half come from ordinary Wikipedia users.

    The dataset has the following format:

    • E: did the election result in promotion (1) or not (0)
    • T: time election was closed
    • U: user id (and screen name) of editor that is being considered for promotion
    • N: user id (and screen name) of the nominator
    • V: vote (1: support, 0: neutral, -1: oppose) user_id time screen_name
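
As a rough illustration of the record layout listed above, the following Python sketch groups lines into elections. It assumes (the description does not confirm this) that fields are tab-separated, that each record starts with its E line, and that lines beginning with # are comments. The names `Election` and `parse_elections` are hypothetical, and the sample record is made up.

```python
from dataclasses import dataclass, field

# Made-up sample record, for illustration only (not taken from the dataset).
SAMPLE = (
    "E\t1\n"
    "T\t2004-09-21 01:15:53\n"
    "U\t30\tcandidate_name\n"
    "N\t32\tnominator_name\n"
    "V\t1\t3\t2004-09-14 16:26:00\tvoter_a\n"
    "V\t-1\t25\t2004-09-14 17:37:00\tvoter_b\n"
)


@dataclass
class Election:
    promoted: bool                 # E: 1 means the candidate was promoted
    closed: str = ""               # T: time the election was closed
    candidate: tuple = ()          # U: (user_id, screen_name)
    nominator: tuple = ()          # N: (user_id, screen_name)
    votes: list = field(default_factory=list)  # V: (vote, user_id, time, screen_name)


def parse_elections(lines):
    """Group tagged lines into Election records (assumes tab-separated fields)."""
    elections = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blanks and assumed comment lines
        parts = line.split("\t")
        tag = parts[0]
        if tag == "E":
            current = Election(promoted=(parts[1] == "1"))
            elections.append(current)
        elif current is None:
            continue  # ignore stray lines before the first E line
        elif tag == "T":
            current.closed = parts[1]
        elif tag == "U":
            current.candidate = (int(parts[1]), parts[2])
        elif tag == "N":
            current.nominator = (int(parts[1]), parts[2])
        elif tag == "V":
            current.votes.append((int(parts[1]), int(parts[2]), parts[3], parts[4]))
    return elections


elections = parse_elections(SAMPLE.splitlines())
```

Reading a real file would amount to passing the open file object to `parse_elections`, provided the separator assumption holds for that file.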

    wiki-RfA: Wikipedia Requests for Adminship (with text)

    For a Wikipedia editor to become an administrator, a request for adminship (RfA) must be submitted, either by the candidate or by another community member. Subsequently, any Wikipedia member may cast a supporting, neutral, or opposing vote.

    We crawled and parsed all votes since the adoption of the RfA process in 2003 through May 2013. The dataset contains 11,381 users (voters and votees) forming 189,004 distinct voter/votee pairs, for a total of 198,275 votes (this is larger than the number of distinct voter/votee pairs because, if the same user ran for election several times, the same voter/votee pair may contribute several votes).

    This induces a directed, signed network in which nodes represent Wikipedia members and edges represent votes. In this sense, the...

  16. A row generation algorithm for finding optimal burning sequences of large...

    • redu.unicamp.br
    • data.mendeley.com
    Updated Nov 11, 2024
    Cite
    Repositório de Dados de Pesquisa da Unicamp (2024). A row generation algorithm for finding optimal burning sequences of large graphs - complementary data [Dataset]. http://doi.org/10.25824/redu/ZGX0H7
    Explore at:
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Repositório de Dados de Pesquisa da Unicamp
    Dataset funded by
    National Council for Scientific and Technological Development (http://www.cnpq.br/)
    São Paulo Research Foundation
    Description

    This dataset contains complementary data to the paper "A Row Generation Algorithm for Finding Optimal Burning Sequences of Large Graphs" [1], which proposes an exact algorithm for the Graph Burning Problem, an NP-hard optimization problem that models a form of contagion diffusion on social networks. Concerning the computational experiments discussed in that paper, we make available:

    • Four sets of instances;
    • The optimal (or best known) solutions obtained;
    • The source code;
    • An Appendix with additional details about the results.

    The "delta" input sets include graphs that are real-world networks [1,2], while the "grid" input set contains graphs that are square grids.

    The directories "delta_10K_instances", "delta_100K_instances", "delta_4M_instances" and "grid_instances" contain files that describe the sets of instances. The first two lines of each file contain: {n} {m} where {n} and {m} are the number of vertices and edges in the graph. Each of the next {m} lines contains: {u} {v} where {u} and {v} identify a pair of vertices that determines an undirected edge.

    The directories "delta_10K_solutions", "delta_100K_solutions", "delta_4M_solutions" and "grid_solutions" contain files that describe the optimal (or best known) solutions for the corresponding sets of instances. The first line of each file contains: {s} where {s} is the number of vertices in the burning sequence. Each of the next {s} lines contains: {v} where {v} identifies a fire source. The fire sources are listed in the same order that they appear in a burning sequence of length {s}.

    The directory "source_code" contains the implementations of the exact algorithm proposed in the paper [1], namely, PRYM. Lastly, the file "appendix.pdf" presents additional details on the results reported in the paper.

    This work was supported by grants from Santander Bank, Brazil, Brazilian National Council for Scientific and Technological Development (CNPq), Brazil, São Paulo Research Foundation (FAPESP), Brazil and Fund for Support to Teaching, Research and Outreach Activities (FAEPEX). Caveat: the opinions, hypotheses and conclusions or recommendations expressed in this material are the sole responsibility of the authors and do not necessarily reflect the views of Santander, CNPq, FAPESP or FAEPEX.

    References
    [1] F. C. Pereira, P. J. de Rezende, T. Yunes and L. F. B. Morato. A Row Generation Algorithm for Finding Optimal Burning Sequences of Large Graphs. Submitted. 2024.
    [2] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. 2024. https://snap.stanford.edu/data
    [3] Ryan A. Rossi and Nesreen K. Ahmed. The Network Data Repository with Interactive Graph Analytics and Visualization. In: AAAI, 2022. https://networkrepository.com
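
The instance and solution layouts described for this dataset can be read with a few lines of Python. The sketch below is not the authors' PRYM code. It reads the files as whitespace-separated integers (so it does not matter whether {n} and {m} share a line), and it checks a sequence with the standard distance characterization of graph burning, which the description does not spell out: a sequence v_1, ..., v_s burns the graph when every vertex lies within distance s - i of some v_i. All function names are hypothetical, the tiny path graph is made up, and labeling vertices 0..n-1 is an assumption.

```python
from collections import deque


def parse_instance(text):
    """Return (n, edges) from '{n} {m}' followed by m '{u} {v}' pairs."""
    tokens = [int(t) for t in text.split()]
    n, m = tokens[0], tokens[1]
    edges = [(tokens[2 + 2 * i], tokens[3 + 2 * i]) for i in range(m)]
    return n, edges


def parse_solution(text):
    """Return the fire sources, in burning order, from '{s}' followed by s vertices."""
    tokens = [int(t) for t in text.split()]
    s = tokens[0]
    return tokens[1:1 + s]


def burns_graph(vertices, edges, sequence):
    """Check that balls of radius s - i around the i-th source cover all vertices."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    s = len(sequence)
    burned = set()
    for i, source in enumerate(sequence, start=1):
        radius = s - i
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search truncated at the given radius
            x = queue.popleft()
            if dist[x] == radius:
                continue
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        burned.update(dist)
    return burned == set(adj)


# Made-up example: a path on 4 vertices, assumed to be labeled 0..3.
n, edges = parse_instance("4 3\n0 1\n1 2\n2 3\n")
sequence = parse_solution("2\n1\n3\n")
is_valid = burns_graph(range(n), edges, sequence)
```

If the real files label vertices from 1, the `vertices` argument would need to be `range(1, n + 1)` instead.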

  17. Datasets of three social networks in PLOS ONE 2015 paper

    • figshare.com
    application/gzip
    Updated Jan 20, 2016
    Cite
    Jichang Zhao (2016). Datasets of three social networks in PLOS ONE 2015 paper [Dataset]. http://doi.org/10.6084/m9.figshare.1512836.v2
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    Jan 20, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jichang Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All the real-world data sets are employed in the paper "Competition Between Homophily and Information Entropy Maximization in Social Networks", which will be published in PLOS ONE 2015. Three social networks are included: CA-HepPh.txt is a collaboration network from the e-print arXiv (http://www.arxiv.org) and covers scientific collaborations between authors of papers submitted to High Energy Physics; neworleans-links-connected.txt is the giant component of the Facebook network in New Orleans (all node ids are converted to random numbers); and jure_Email-Enron.txt is an email communication network that covers all the email communication within a data set of around half a million emails. In each file, one line represents an edge and the two nodes are separated by a Tab. The demo code to read the graph can be found in test.py. These datasets were obtained from publicly available sources on the Internet, and their original download links or contacts are as follows: CA-HepPh: http://snap.stanford.edu/data/ca-HepPh.html NewOrleans: http://socialnetworks.mpi-sws.org/datasets.html Email-Enron: http://snap.stanford.edu/data/email-Enron.html
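
The one-edge-per-line, Tab-separated layout described above can be loaded with a short Python function. This is a minimal sketch and not the test.py shipped with the dataset. It assumes edges are undirected, keeps node ids as strings, and skips blank lines and lines starting with # (an assumption about possible header comments). The name `read_edge_list` and the sample lines are hypothetical.

```python
def read_edge_list(lines):
    """Build an undirected adjacency map from Tab-separated 'u<TAB>v' lines."""
    adjacency = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and assumed comment lines
        u, v = line.split("\t")[:2]
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


# Made-up sample lines standing in for a file's contents.
sample_lines = ["# comment", "1\t2", "2\t3", ""]
graph = read_edge_list(sample_lines)
num_nodes = len(graph)
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
```

For a real file, passing the open file object (for example `open("CA-HepPh.txt")`) in place of `sample_lines` should work as long as the separator assumption holds.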

  18. Graph theory indicators for e-mail network

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Oct 8, 2019
    Cite
    Panayotis Christidis (2019). Graph theory indicators for e-mail network [Dataset]. http://doi.org/10.7910/DVN/DC5M3E
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 8, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Panayotis Christidis
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This dataset includes graph theory indicators (centrality and clustering coefficients) for the Stanford Network Analysis Project (SNAP) "email-Eu-core-temporal" network, a well-known reference dataset for Social Network Analysis (SNA) of e-mail traffic.

  19. A hybrid matheuristic for the spread of influence on social networks -...

    • redu.unicamp.br
    • data.mendeley.com
    Updated Nov 11, 2024
    Cite
    Repositório de Dados de Pesquisa da Unicamp (2024). A hybrid matheuristic for the spread of influence on social networks - complementary data [Dataset]. http://doi.org/10.25824/redu/CAVFDT
    Explore at:
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Repositório de Dados de Pesquisa da Unicamp
    Dataset funded by
    National Council for Scientific and Technological Development (http://www.cnpq.br/)
    São Paulo Research Foundation
    Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
    Description

    This dataset contains complementary data to the paper "A Hybrid Matheuristic for the Spread of Influence on Social Networks" [1], which proposes a matheuristic for combinatorial optimization problems involving the spread of information in social networks. For the computational experiments discussed in that paper, we provide:

    • Two sets of instances, originally obtained from [2-6];
    • The solutions attained by exact and heuristic methods;
    • The collected results;
    • The matheuristic source code.

    The directories "benchmark_*/instances/" contain files that describe the sets of instances. Each instance is associated with a graph containing {n} vertices and {m} edges. The first {m} lines of each file contain: {u} {v} where {u} and {v} identify a pair of vertices that determines an undirected edge. The next line contains {n} integers corresponding to the costs of the vertices. The last line contains {n} integers corresponding to the thresholds of the vertices.

    The directories "benchmark_*/solutions_*/" contain files describing feasible solutions for the corresponding sets of instances. The first line of each file contains: {s} where {s} is the number of vertices in the target set. Each of the next {s} lines contains: {v} where {v} identifies a target. The last line contains an integer that represents the target set cost.

    The directory "hmf_source_code/" contains an implementation of the matheuristic framework proposed in [1], namely, HMF.

    This work was supported by grants from Santander Bank, the Brazilian National Council for Scientific and Technological Development (CNPq), the São Paulo Research Foundation (FAPESP), the Fund for Support to Teaching, Research and Outreach Activities (FAEPEX), and the Coordination for the Improvement of Higher Education Personnel (CAPES), all in Brazil. Caveat: The opinions, hypotheses and conclusions or recommendations expressed in this material are the sole responsibility of the authors and do not necessarily reflect the views of Santander, CNPq, FAPESP, FAEPEX, or CAPES.

    References
    [1] F. C. Pereira, P. J. de Rezende, and T. Yunes. A Hybrid Matheuristic for the Spread of Influence on Social Networks. 2024. Submitted.
    [2] S. Raghavan and R. Zhang. A branch-and-cut approach for the weighted target set selection problem on social networks. 2024. https://doi.org/10.1287/ijoo.2019.0012
    [3] J. Leskovec and A. Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. 2024. https://snap.stanford.edu/data
    [4] R. A. Rossi and N. K. Ahmed. The Network Data Repository with Interactive Graph Analytics and Visualization. 2022. https://networkrepository.com
    [5] J. Kunegis. KONECT – The Koblenz Network Collection. 2013. http://dl.acm.org/citation.cfm?id=2488173
    [6] O. Lesser, L. Tenenboim-Chekina, L. Rokach, and Y. Elovici. Intruder or Welcome Friend: Inferring Group Membership in Online Social Networks. 2013. https://doi.org/10.1007/978-3-642-37210-0_40
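
The instance and solution layouts described for this dataset might be parsed as sketched below in Python. This is not the HMF source code. Because the files do not state {m} explicitly, the sketch treats the last two non-empty lines as the cost and threshold lines and everything before them as edges, which is an assumption drawn from the description. The function names and the sample contents are hypothetical.

```python
def parse_instance(text):
    """Return (edges, costs, thresholds) from an instance file's text."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    # The last two lines hold the n costs and n thresholds; the rest are edges.
    *edge_lines, cost_line, threshold_line = lines
    edges = [(int(u), int(v)) for u, v in edge_lines]
    costs = [int(c) for c in cost_line]
    thresholds = [int(t) for t in threshold_line]
    if len(costs) != len(thresholds):
        raise ValueError("cost and threshold lines must both have n entries")
    return edges, costs, thresholds


def parse_solution(text):
    """Return (targets, cost) from '{s}', s target vertices, then the set cost."""
    values = [int(t) for t in text.split()]
    s = values[0]
    targets = values[1:1 + s]
    cost = values[1 + s]
    return targets, cost


# Made-up instance: 3 vertices, 2 edges, then one line of costs and one of thresholds.
edges, costs, thresholds = parse_instance("0 1\n1 2\n5 3 4\n1 1 2\n")
# Made-up solution: a single target (vertex 1) with total cost 3.
targets, cost = parse_solution("1\n1\n3\n")
```

Taking the last two lines positionally avoids misreading a two-vertex cost line as an edge, which could happen if lines were classified by their number of fields.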

  20. pubmed_pipeline-preference_scorer-combined

    • huggingface.co
    Updated Jun 9, 2025
    + more versions
    Cite
    snap-stanford (2025). pubmed_pipeline-preference_scorer-combined [Dataset]. https://huggingface.co/datasets/snap-stanford/pubmed_pipeline-preference_scorer-combined
    Explore at:
    Dataset updated
    Jun 9, 2025
    Dataset authored and provided by
    snap-stanford
    Description

    snap-stanford/pubmed_pipeline-preference_scorer-combined dataset hosted on Hugging Face and contributed by the HF Datasets community

