10 datasets found
  1. Communication Graphs

    • kaggle.com
    zip
    Updated Nov 15, 2021
    Cite
    Subhajit Sahu (2021). Communication Graphs [Dataset]. https://www.kaggle.com/wolfram77/graphs-communication
    Available download formats: zip (66715371 bytes)
    Dataset updated
    Nov 15, 2021
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    email-EuAll: EU email communication network

    The network was generated using email data from a large European research institution. For a period from October 2003 to May 2005 (18 months) we have anonymized information about all incoming and outgoing email of the research institution. For each sent or received email message we know the time, the sender and the recipient of the email. Overall we have 3,038,531 emails between 287,755 different email addresses. Note that we have a complete email graph for only 1,258 email addresses that come from the research institution. Furthermore, there are 34,203 email addresses that both sent and received email within the span of our dataset. All other email addresses are either non-existing, mistyped or spam.

    Given a set of email messages, each node corresponds to an email address. We create a directed edge between nodes i and j, if i sent at least one message to j.
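    The edge rule above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset; the function names and the toy message list are hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_email_graph(messages: Iterable[Edge]) -> Set[Edge]:
    """Return directed edges (i, j) such that i sent at least one message to j.

    Each message is a (sender, recipient) pair; repeated messages between
    the same pair collapse into a single edge.
    """
    return {(sender, recipient) for sender, recipient in messages}


def sent_and_received(edges: Set[Edge]) -> Set[str]:
    """Addresses that appear both as a sender and as a recipient."""
    senders = {i for i, _ in edges}
    recipients = {j for _, j in edges}
    return senders & recipients


# Hypothetical toy data: "a" writes to "b" twice, "b" replies, "c" writes to "a".
messages = [("a", "b"), ("a", "b"), ("b", "a"), ("c", "a")]
edges = build_email_graph(messages)
both = sent_and_received(edges)
```

    Storing edges in a set is what makes "at least one message" collapse to a single edge, regardless of how many emails were exchanged.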

    email-Enron: Enron email network

    The Enron email communication network covers all the email communication within a dataset of around half a million emails. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. Nodes of the network are email addresses, and if an address i sent at least one email to address j, the graph contains an undirected edge from i to j. Note that non-Enron email addresses act as sinks and sources in the network, as we only observe their communication with the Enron email addresses.

    The Enron email data was originally released by William Cohen at CMU.

    wiki-Talk: Wikipedia Talk network

    Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. Each registered user has a talk page that they and other users can edit in order to communicate and discuss updates to various articles on Wikipedia. Using the latest complete dump of Wikipedia page edit history (from January 3, 2008) we extracted all user talk page changes and created a network.

    The network contains all the users and discussion from the inception of Wikipedia till January 2008. Nodes in the network represent Wikipedia users and a directed edge from node i to node j represents that user i at least once edited a talk page of user j.

    comm-f2f-Resistance: Dynamic Face-to-Face Interaction Networks

    The dynamic face-to-face interaction networks represent the interactions that happen during discussions among a group of participants playing the Resistance game. This dataset contains networks extracted from 62 games. Each game is played by 5-8 participants and lasts between 45 and 60 minutes. We extract dynamically evolving networks from the free-form discussions using the ICAF algorithm. The extracted networks are used to characterize and detect group deceptive behavior using the DeceptionRank algorithm.

    The networks are weighted, directed and temporal. Each node represents a participant. At each 1/3 second, a directed edge from node u to v is weighted by the probability of participant u looking at participant v or the laptop. Additionally, we also provide a binary version where an edge from u to v indicates participant u looks at participant v (or the laptop).

    Stanford Network Analysis Platform (SNAP) is a general purpose, high performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.

    The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.

    SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in November 2009. SNAP uses GLib, a general purpose STL (Standard Template Library)-like library developed at the Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.

    http://snap.stanford.edu/data/index.html#email

  2. A hybrid matheuristic for the spread of influence on social networks -...

    • redu.unicamp.br
    • scholarship.miami.edu
    • +1more
    Updated Nov 11, 2024
    Cite
    Felipe de Carvalho Pereira; Pedro Jussieu de Rezende; Tallys Hoover Yunes (2024). A hybrid matheuristic for the spread of influence on social networks - complementary data [Dataset]. http://doi.org/10.25824/redu/CAVFDT
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Repositório de Dados de Pesquisa da Unicamp
    Authors
    Felipe de Carvalho Pereira; Pedro Jussieu de Rezende; Tallys Hoover Yunes
    License

    https://redu.unicamp.br/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.25824/redu/CAVFDT

    Dataset funded by
    São Paulo Research Foundation
    Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
    Conselho Nacional de Desenvolvimento Científico e Tecnológico
    Description

    This dataset contains complementary data to the paper "A Hybrid Matheuristic for the Spread of Influence on Social Networks" [1], which proposes a matheuristic for combinatorial optimization problems involving the spread of information in social networks. For the computational experiments discussed in that paper, we provide:
    • Two sets of instances, originally obtained from [2-6];
    • The solutions attained by exact and heuristic methods;
    • The collected results;
    • The matheuristic source code.

    The directories "benchmark_*/instances/" contain files that describe the sets of instances. Each instance is associated with a graph containing {n} vertices and {m} edges. The first {m} lines of each file contain:
    {u} {v}
    where {u} and {v} identify a pair of vertices that determines an undirected edge. The next line contains {n} integers corresponding to the costs of the vertices. The last line contains {n} integers corresponding to the thresholds of the vertices.

    The directories "benchmark_*/solutions_*/" contain files describing feasible solutions for the corresponding sets of instances. The first line of each file contains:
    {s}
    where {s} is the number of vertices in the target set. Each of the next {s} lines contains:
    {v}
    where {v} identifies a target. The last line contains an integer that represents the target set cost.

    The directory "hmf_source_code/" contains an implementation of the matheuristic framework proposed in [1], namely, HMF.

    This work was supported by grants from Santander Bank, the Brazilian National Council for Scientific and Technological Development (CNPq), the São Paulo Research Foundation (FAPESP), the Fund for Support to Teaching, Research and Outreach Activities (FAEPEX), and the Coordination for the Improvement of Higher Education Personnel (CAPES), all in Brazil.

    Caveat: The opinions, hypotheses and conclusions or recommendations expressed in this material are the sole responsibility of the authors and do not necessarily reflect the views of Santander, CNPq, FAPESP, FAEPEX, or CAPES.

    References
    [1] F. C. Pereira, P. J. de Rezende, and T. Yunes. A Hybrid Matheuristic for the Spread of Influence on Social Networks. 2024. Submitted.
    [2] S. Raghavan and R. Zhang. A branch-and-cut approach for the weighted target set selection problem on social networks. 2024. https://doi.org/10.1287/ijoo.2019.0012
    [3] J. Leskovec and A. Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. 2024. https://snap.stanford.edu/data
    [4] R. A. Rossi and N. K. Ahmed. The Network Data Repository with Interactive Graph Analytics and Visualization. 2022. https://networkrepository.com
    [5] J. Kunegis. KONECT – The Koblenz Network Collection. 2013. http://dl.acm.org/citation.cfm?id=2488173
    [6] O. Lesser, L. Tenenboim-Chekina, L. Rokach, and Y. Elovici. Intruder or Welcome Friend: Inferring Group Membership in Online Social Networks. 2013. https://doi.org/10.1007/978-3-642-37210-0_40
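    The instance and solution layouts described for this dataset could be read with a small parser like the one below. It is only a sketch: the function names are hypothetical, vertex identifiers are assumed to be integers, and the sample strings are invented for illustration rather than taken from the benchmark.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def parse_instance(text: str) -> Tuple[List[Edge], List[int], List[int]]:
    """Parse an instance: m edge lines, then a cost line, then a threshold line.

    The file does not state n or m explicitly, so the last two non-empty
    lines are taken as costs and thresholds and everything before as edges.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) < 2:
        raise ValueError("expected at least a cost line and a threshold line")
    *edge_rows, cost_row, threshold_row = rows
    edges = [(int(u), int(v)) for u, v in edge_rows]
    costs = [int(x) for x in cost_row]
    thresholds = [int(x) for x in threshold_row]
    if len(costs) != len(thresholds):
        raise ValueError("costs and thresholds must both have n entries")
    return edges, costs, thresholds


def parse_solution(text: str) -> Tuple[List[int], int]:
    """Parse a solution: s, then s target vertices, then the target set cost."""
    values = [int(token) for token in text.split()]
    s = values[0]
    targets = values[1:1 + s]
    cost = values[1 + s]
    return targets, cost


# Invented example: a triangle with costs 5, 3, 4 and thresholds 1, 2, 1.
sample_instance = "0 1\n1 2\n0 2\n5 3 4\n1 2 1\n"
edges, costs, thresholds = parse_instance(sample_instance)
targets, total_cost = parse_solution("2\n0\n2\n9\n")
```

    Taking the last two lines as costs and thresholds avoids guessing {m}, which matters when {n} is 2 and a cost line would otherwise look like an edge line.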

  3. A row generation algorithm for finding optimal burning sequences of large...

    • redu.unicamp.br
    • scholarship.miami.edu
    • +1more
    Updated Nov 11, 2024
    Cite
    Felipe de Carvalho Pereira; Pedro Jussieu de Rezende; Tallys Hoover Yunes; Luiz Fernando Batista Morato (2024). A row generation algorithm for finding optimal burning sequences of large graphs - complementary data [Dataset]. http://doi.org/10.25824/redu/ZGX0H7
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Repositório de Dados de Pesquisa da Unicamp
    Authors
    Felipe de Carvalho Pereira; Pedro Jussieu de Rezende; Tallys Hoover Yunes; Luiz Fernando Batista Morato
    License

    https://redu.unicamp.br/api/datasets/:persistentId/versions/1.1/customlicense?persistentId=doi:10.25824/redu/ZGX0H7

    Dataset funded by
    Fundação de Amparo à Pesquisa do Estado de São Paulo
    Conselho Nacional de Desenvolvimento Científico e Tecnológico
    Description

    This dataset contains complementary data to the paper "A Row Generation Algorithm for Finding Optimal Burning Sequences of Large Graphs" [1], which proposes an exact algorithm for the Graph Burning Problem, an NP-hard optimization problem that models a form of contagion diffusion on social networks. Concerning the computational experiments discussed in that paper, we make available:
    • Four sets of instances;
    • The optimal (or best known) solutions obtained;
    • The source code;
    • An Appendix with additional details about the results.

    The "delta" input sets include graphs that are real-world networks [1,2], while the "grid" input set contains graphs that are square grids.

    The directories "delta_10K_instances", "delta_100K_instances", "delta_4M_instances" and "grid_instances" contain files that describe the sets of instances. The first two lines of each file contain:
    {n}
    {m}
    where {n} and {m} are the number of vertices and edges in the graph. Each of the next {m} lines contains:
    {u} {v}
    where {u} and {v} identify a pair of vertices that determines an undirected edge.

    The directories "delta_10K_solutions", "delta_100K_solutions", "delta_4M_solutions" and "grid_solutions" contain files that describe the optimal (or best known) solutions for the corresponding sets of instances. The first line of each file contains:
    {s}
    where {s} is the number of vertices in the burning sequence. Each of the next {s} lines contains:
    {v}
    where {v} identifies a fire source. The fire sources are listed in the same order that they appear in a burning sequence of length {s}.

    The directory "source_code" contains the implementations of the exact algorithm proposed in the paper [1], namely, PRYM. Lastly, the file "appendix.pdf" presents additional details on the results reported in the paper.

    This work was supported by grants from Santander Bank, Brazil, Brazilian National Council for Scientific and Technological Development (CNPq), Brazil, São Paulo Research Foundation (FAPESP), Brazil and Fund for Support to Teaching, Research and Outreach Activities (FAEPEX).

    Caveat: the opinions, hypotheses and conclusions or recommendations expressed in this material are the sole responsibility of the authors and do not necessarily reflect the views of Santander, CNPq, FAPESP or FAEPEX.

    References
    [1] F. C. Pereira, P. J. de Rezende, T. Yunes and L. F. B. Morato. A Row Generation Algorithm for Finding Optimal Burning Sequences of Large Graphs. Submitted. 2024.
    [2] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. 2024. https://snap.stanford.edu/data
    [3] Ryan A. Rossi and Nesreen K. Ahmed. The Network Data Repository with Interactive Graph Analytics and Visualization. In: AAAI, 2022. https://networkrepository.com
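    To make the instance and solution formats concrete, the sketch below parses an instance and checks whether a list of fire sources burns the whole graph. It is not the PRYM implementation; the function names are hypothetical, vertex identifiers are assumed to be integers, and the check uses the usual graph burning rule (in each round the fire first spreads to the neighbors of burned vertices, then the next source is lit).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def parse_burning_instance(text: str) -> Tuple[int, List[Edge]]:
    """Read n, m and then m undirected edges, token by token."""
    tokens = text.split()
    n, m = int(tokens[0]), int(tokens[1])
    pairs = tokens[2:2 + 2 * m]
    edges = [(int(pairs[k]), int(pairs[k + 1])) for k in range(0, 2 * m, 2)]
    return n, edges


def burns_whole_graph(n: int, edges: Sequence[Edge], sources: Sequence[int]) -> bool:
    """Simulate the burning process and report whether all n vertices burn."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    burned: Set[int] = set()
    for source in sources:
        # Spread one step from every burned vertex, then light the new source.
        burned |= {w for v in burned for w in adj[v]}
        burned.add(source)
    return len(burned) == n


# Invented example: a path on four vertices, 0 - 1 - 2 - 3.
n, edges = parse_burning_instance("4\n3\n0 1\n1 2\n2 3\n")
valid = burns_whole_graph(n, edges, [1, 3])
invalid = burns_whole_graph(n, edges, [0, 3])
```

    Reading tokens rather than lines keeps the parser indifferent to whether {n} and {m} share a line or sit on separate lines.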

  4. RSM-OC Dataset

    • scidb.cn
    Updated Dec 2, 2025
    Cite
    xu meng yao (2025). RSM-OC Dataset [Dataset]. http://doi.org/10.57760/sciencedb.22252
    Dataset updated
    Dec 2, 2025
    Dataset provided by
    Science Data Bank
    Authors
    xu meng yao
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description
    1. The file includes four publicly available datasets: the congers_network dataset, the Netscience dataset, the email Eu core dataset, and the Facebook dataset. They can be obtained from public websites: [1] the Stanford website, https://snap.stanford.edu/data/, and [2] the Network Data Repository website, https://networkrepository.com/. All of them are real network datasets, each containing two columns of data that indicate the existence of a relationship between two nodes. Specifically:
    • The congers_network dataset is based on the Twitter interaction network of members of the 117th United States Congress, where nodes represent Congress members and edges represent forwarding, referencing, replying, or mentioning relationships between members, used to quantify the probability of information dissemination.
    • The Netscience dataset is derived from a scientific collaboration network, where nodes represent scientists and edges represent collaborative relationships between scientists. It is used to simulate the dissemination and impact of information in the field of scientific research.
    • The email Eu core dataset is based on email interactions within a large European research institution, where nodes represent members of the institution and edges represent at least one email exchange between members.
    • The Facebook dataset is composed of "circles" (or "friend lists") from Facebook, where nodes represent users and edges represent social connections between users.
    2. The file also includes comparative data on the scope of truth dissemination (xlsx). This data is the direct result of the calculation and analysis in the paper. Specifically, it compares how the number of rumor seeds and the number of truth seeds affect the diffusion range of truth under two thresholds.
  5. Wikipedia Graphs

    • kaggle.com
    zip
    Updated Nov 16, 2021
    Cite
    Subhajit Sahu (2021). Wikipedia Graphs [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-wikipedia
    Available download formats: zip (15093368102 bytes)
    Dataset updated
    Nov 16, 2021
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • Navigation paths on the Wikipedia hyperlink network, collected by the human-computation game Wikispeedia
    • Wikipedia who-votes-on-whom network
    • Wikipedia talk (communication) network
    • Wikipedia adminship election data
    • Wikipedia Requests for Adminship (with text)
    • Complete Wikipedia edit history (who edited what page)
    • Public Wikipedia hoaxes
    • Wikipedia page-to-page network with traffic information

    http://snap.stanford.edu/data/index.html#wikipedia

  6. Stanford Web Graph

    • kaggle.com
    zip
    Updated Mar 8, 2026
    Cite
    Fatulla Bashirov (2026). Stanford Web Graph [Dataset]. https://www.kaggle.com/datasets/fatullabashirov/stanford-web-graph
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 8, 2026
    Authors
    Fatulla Bashirov
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Stanford Web Graph (2002)

    Overview

    This dataset contains a directed web graph of Stanford University webpages from 2002.
    Each node represents a page from stanford.edu, and each directed edge represents a hyperlink from one page to another.

    It is useful for tasks such as:
    • network analysis
    • graph mining
    • community detection
    • link analysis
    • ranking and centrality studies
    • large-scale graph algorithm benchmarking

    Dataset Statistics

    Metric                              Value
    Nodes                               281,903
    Edges                               2,312,497
    Nodes in largest WCC                255,265 (0.906)
    Edges in largest WCC                2,234,572 (0.966)
    Nodes in largest SCC                150,532 (0.534)
    Edges in largest SCC                1,576,314 (0.682)
    Average clustering coefficient      0.5976
    Number of triangles                 11,329,473
    Fraction of closed triangles        0.002889
    Diameter (longest shortest path)    674
    90-percentile effective diameter    9.7

    Graph Description

    • Domain: Stanford University (stanford.edu)
    • Graph type: Directed
    • Nodes: Webpages
    • Edges: Hyperlinks between webpages
    • Collection year: 2002

    Because the graph is directed, a link from page A to page B does not imply a link from page B to page A.

    File

    • web-Stanford.txt.gz — Stanford web graph from 2002
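    As a rough illustration of working with this file, the sketch below reads an edge list and computes the size of the largest weakly connected component with a union-find structure. It assumes the usual SNAP text layout (comment lines starting with "#", then one whitespace-separated "source target" pair per line); the function names are hypothetical, and the in-memory sample stands in for the real file.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def read_edge_list(lines: Iterable[str]) -> List[Edge]:
    """Parse 'src dst' pairs, skipping blank lines and '#' comment lines."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        edges.append((int(src), int(dst)))
    return edges


def largest_wcc_size(edges: Iterable[Edge]) -> int:
    """Size of the largest weakly connected component (edge direction ignored)."""
    parent: Dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    sizes: Dict[int, int] = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


# For the real file one would open it with:
#   gzip.open("web-Stanford.txt.gz", "rt")   (after `import gzip`)
sample = "# FromNodeId\tToNodeId\n1\t2\n2\t3\n4\t5\n".splitlines()
sample_edges = read_edge_list(sample)
wcc = largest_wcc_size(sample_edges)
```

    Dividing the result by the node count should give a fraction comparable to the "Nodes in largest WCC" row above, provided the file matches the assumed layout.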

    Potential Use Cases

    This dataset can be used for:
    • studying the structure of real-world web graphs
    • analyzing weakly and strongly connected components
    • computing graph centrality and prestige measures
    • evaluating shortest path and diameter algorithms
    • exploring clustering, triangles, and transitivity
    • benchmarking large-scale graph processing systems

    Source

    J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney.
    Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters.
    Internet Mathematics, 6(1): 29–123, 2009.

    Citation

    If you use this dataset, please cite:

    J. Leskovec, K. Lang, A. Dasgupta, and M. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics 6(1), 29–123, 2009.

    Notes

    • The dataset represents a snapshot of Stanford’s web structure in 2002.
    • The graph is large and sparse, making it suitable for scalable network analysis experiments.
    • The reported statistics reflect standard structural properties commonly used in graph mining research.
  7. Citation Graphs

    • kaggle.com
    zip
    Updated Nov 13, 2021
    Cite
    Subhajit Sahu (2021). Citation Graphs [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-citation
    Available download formats: zip (111812120 bytes)
    Dataset updated
    Nov 13, 2021
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Arxiv HEP-PH (high energy physics phenomenology) citation graph is from the e-print arXiv and covers all the citations within a dataset of 34,546 papers with 421,578 edges. If a paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph does not contain any information about this.
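    Because an edge from i to j means "i cites j", the in-degree of a node is the number of citations it receives from papers inside the dataset. A minimal sketch, with a hypothetical function name and invented paper identifiers:

```python
from collections import Counter
from typing import Iterable, Tuple


def citation_counts(edges: Iterable[Tuple[int, int]]) -> Counter:
    """Count, for each paper j, how many edges (i, j) point at it."""
    return Counter(cited for _citing, cited in edges)


# Invented toy graph: papers 1 and 2 cite 3 and 4; paper 3 cites 4.
toy_edges = [(1, 3), (2, 3), (3, 4), (1, 4), (2, 4)]
counts = citation_counts(toy_edges)
most_cited = counts.most_common(1)
```

    Papers that are never cited within the dataset simply do not appear as keys; looking them up in the Counter returns 0.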

    The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section.

    The data was originally released as a part of 2003 KDD Cup.

    Arxiv HEP-TH (high energy physics theory) citation graph is from the e-print arXiv and covers all the citations within a dataset of 27,770 papers with 352,807 edges. If a paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph does not contain any information about this.

    The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-TH section.

    The data was originally released as a part of 2003 KDD Cup.

    U.S. patent dataset is maintained by the National Bureau of Economic Research. The data set spans 37 years (January 1, 1963 to December 30, 1999), and includes all the utility patents granted during that period, totaling 3,923,922 patents. The citation graph includes all citations made by patents granted between 1975 and 1999, totaling 16,522,438 citations. For the patents dataset there are 1,803,511 nodes for which we have no information about their citations (we only have the in-links).

    The data was originally released by NBER.

    https://snap.stanford.edu/data/index.html

  8. cit-HepPh Graph (SNAP)

    • kaggle.com
    zip
    Updated Dec 31, 2021
    Cite
    Subhajit Sahu (2021). cit-HepPh Graph (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graph-snap-cit-hepph
    Available download formats: zip (3536441 bytes)
    Dataset updated
    Dec 31, 2021
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Arxiv HEP-PH (high energy physics phenomenology) citation graph is from the e-print arXiv and covers all the citations within a dataset of 34,546 papers with 421,578 edges. If a paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph does not contain any information about this.

    The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section.

    The data was originally released as a part of 2003 KDD Cup.

    Added an additional temporal-edges file cit-HepPh-temporal.txt, which follows the same formatting as that of other temporal graphs in the Stanford Large Network Dataset Collection.
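    A temporal-edges file of this kind could be loaded roughly as follows. The sketch assumes each line holds a source, a target, and an integer Unix timestamp separated by whitespace, which is the layout used by several SNAP temporal networks; the actual columns of cit-HepPh-temporal.txt should be checked before relying on it. The function names and the sample lines are hypothetical.

```python
from typing import List, Set, Tuple

TemporalEdge = Tuple[int, int, int]


def parse_temporal_edges(text: str) -> List[TemporalEdge]:
    """Parse 'src dst timestamp' lines and return them sorted by timestamp."""
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst, ts = line.split()[:3]
        edges.append((int(src), int(dst), int(ts)))
    edges.sort(key=lambda edge: edge[2])
    return edges


def snapshot(edges: List[TemporalEdge], until: int) -> Set[Tuple[int, int]]:
    """Static edge set containing every edge with timestamp <= until."""
    return {(src, dst) for src, dst, ts in edges if ts <= until}


# Invented sample with three timestamped citations.
temporal = parse_temporal_edges("2 1 200\n3 1 100\n3 2 300\n")
early = snapshot(temporal, 200)
```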

  9. Temporal Graphs

    • kaggle.com
    zip
    Updated Nov 18, 2021
    Cite
    Subhajit Sahu (2021). Temporal Graphs [Dataset]. https://www.kaggle.com/wolfram77/graphs-temporal
    Available download formats: zip (1468033567 bytes)
    Dataset updated
    Nov 18, 2021
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description
    • Hyperlinks between subreddits on Reddit
    • Comments, questions, and answers on Stack Overflow
    • Comments, questions, and answers on Math Overflow
    • Comments, questions, and answers on Super User
    • Comments, questions, and answers on Ask Ubuntu
    • Users editing talk pages on Wikipedia
    • E-mails between users at a research institution
    • Messages on a Facebook-like platform at UC-Irvine
    • Bitcoin OTC web of trust network
    • Bitcoin Alpha web of trust network
    • Student actions on a MOOC platform, with student drop-out binary labels.
    • Dynamic face-to-face interaction network between group of people

    http://snap.stanford.edu/data/index.html#temporal

  10. Communities Graphs

    • kaggle.com
    zip
    Updated Nov 15, 2021
    Cite
    Subhajit Sahu (2021). Communities Graphs [Dataset]. https://www.kaggle.com/wolfram77/graphs-communities
    Available download formats: zip (7999979671 bytes)
    Dataset updated
    Nov 15, 2021
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    com-LiveJournal: LiveJournal social network and ground-truth communities

    LiveJournal is a free on-line blogging community where users declare friendship with each other. LiveJournal also allows users to form a group which other members can then join. We consider such user-defined groups as ground-truth communities. We provide the LiveJournal friendship social network and ground-truth communities.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with highest quality which are described in our paper. As for the network, we provide the largest connected component.
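    The community extraction rule (split each group into connected components, then drop components with fewer than 3 nodes) can be sketched as below. This is an illustration under stated assumptions, not the SNAP pipeline: the function name, the `min_size` parameter, and the toy graph are hypothetical, and components are taken within the subgraph induced by the group members.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ground_truth_communities(
    edges: Iterable[Tuple[int, int]],
    groups: Iterable[Iterable[int]],
    min_size: int = 3,
) -> List[Set[int]]:
    """Split every group into connected components and keep the large ones."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    communities = []
    for group in groups:
        members = set(group)
        seen: Set[int] = set()
        for start in sorted(members):
            if start in seen:
                continue
            # Depth-first search restricted to nodes inside the group.
            component = {start}
            stack = [start]
            while stack:
                node = stack.pop()
                for neighbor in adj[node] & members:
                    if neighbor not in component:
                        component.add(neighbor)
                        stack.append(neighbor)
            seen |= component
            if len(component) >= min_size:
                communities.append(component)
    return communities


# Invented example: the group {1..5} splits into {1, 2, 3} and {4, 5}.
toy_edges = [(1, 2), (2, 3), (4, 5), (3, 6)]
result = ground_truth_communities(toy_edges, [[1, 2, 3, 4, 5]])
```

    In the toy example the pair {4, 5} is discarded because it has fewer than 3 nodes, and node 6 is ignored because it is not a member of the group.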

    com-Friendster: Friendster social network and ground-truth communities

    Friendster is an on-line gaming network. Before re-launching as a game website, Friendster was a social networking site where users could form friendship edges with each other. The Friendster social network also allowed users to form a group which other members could then join. We consider such user-defined groups as ground-truth communities. For the social network, we take the induced subgraph of the nodes that either belong to at least one community or are connected to other nodes that belong to at least one community. This data is provided by The Web Archive Project, where the full graph is available.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with highest quality which are described in our paper. As for the network, we provide the largest connected component.

    com-Orkut: Orkut social network and ground-truth communities

    Orkut is a free on-line social network where users form friendships with each other. Orkut also allows users to form a group which other members can then join. We consider such user-defined groups as ground-truth communities. We provide the Orkut friendship social network and ground-truth communities. This data is provided by Alan Mislove et al.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities which have fewer than 3 nodes. We also provide the top 5,000 communities with highest quality which are described in our paper. As for the network, we provide the largest connected component.

    com-Youtube: Youtube social network and ground-truth communities

    Youtube is a video-sharing web site that includes a social network. In the Youtube social network, users form friendships with each other and can create groups that other users can join. We consider such user-defined groups as ground-truth communities. This data is provided by Alan Mislove et al.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities that have fewer than 3 nodes. We also provide the 5,000 highest-quality communities, which are described in our paper. As for the network, we provide the largest connected component.

    com-DBLP: DBLP collaboration network and ground-truth communities

    The DBLP computer science bibliography provides a comprehensive list of research papers in computer science. We construct a co-authorship network where two authors are connected if they have published at least one paper together. Each publication venue, e.g., a journal or conference, defines an individual ground-truth community: authors who published in a certain journal or conference form a community.

    We regard each connected component in a group as a separate ground-truth community. We remove the ground-truth communities that have fewer than 3 nodes. We also provide the 5,000 highest-quality communities, which are described in our paper. As for the network, we provide the largest connected component.

    com-Amazon: Amazon product co-purchasing network and ground-truth communities

    The network was collected by crawling the Amazon website. It is based on the "Customers Who Bought This Item Also Bought" feature of the Amazon website. If a product i is frequently co-purchased with product j, the graph contains an undirected edge between i and j. Each product category provided by Amazon defines a ground-truth community.

    We regard each connected component in a product category as a separate ground-truth community. We remove the ground-truth communities that have fewer than 3 nodes. We also provide the 5,000 highest-quality communities, which are described in our paper. As for the network, we provide the largest connected component.

    email-Eu-core: email-Eu-core network

    The network was generated using email data from a large European research institution. We have anonymized information about all incoming and outgoing email between members of the research institution. Th...

Cite
Subhajit Sahu (2021). Communication Graphs [Dataset]. https://www.kaggle.com/wolfram77/graphs-communication

Communication Graphs

Communication networks from the Stanford Network Analysis Platform (SNAP)

Explore at:
Available download formats: zip (66715371 bytes)
Dataset updated
Nov 15, 2021
Authors
Subhajit Sahu
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

email-EuAll: EU email communication network

The network was generated using email data from a large European research institution. For a period from October 2003 to May 2005 (18 months) we have anonymized information about all incoming and outgoing email of the research institution. For each sent or received email message we know the time, the sender and the recipient of the email. Overall we have 3,038,531 emails between 287,755 different email addresses. Note that we have a complete email graph for only 1,258 email addresses that come from the research institution. Furthermore, there are 34,203 email addresses that both sent and received email within the span of our dataset. All other email addresses are either non-existing, mistyped or spam.

Given a set of email messages, each node corresponds to an email address. We create a directed edge from node i to node j if i sent at least one message to j.
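As a minimal sketch of this construction, the snippet below builds the directed edge set from a list of (sender, recipient) pairs. The function name `build_directed_edges` and the toy addresses are hypothetical and are not part of the dataset or of SNAP. The sketch assumes messages are already reduced to one sender and one recipient each, and it applies no filtering beyond removing repeated pairs.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def build_directed_edges(messages: Iterable[Edge]) -> Set[Edge]:
    """Return the directed edges (i, j) where i sent at least one message to j.

    Repeated messages between the same pair collapse into a single edge,
    because the graph records only whether any message was sent.
    """
    edges: Set[Edge] = set()
    for sender, recipient in messages:
        edges.add((sender, recipient))
    return edges


# Toy input (hypothetical): "a" writes to "b" twice, "b" replies once,
# and "c" writes to "a" once.
messages = [("a", "b"), ("a", "b"), ("b", "a"), ("c", "a")]
edges = build_directed_edges(messages)

# Every address that appears as a sender or a recipient becomes a node.
nodes = {address for edge in edges for address in edge}
```

Using a set keeps the edge list unweighted: the two messages from "a" to "b" contribute one edge, while the reply from "b" to "a" is a separate edge because direction matters.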

email-Enron: Enron email network

The Enron email communication network covers all the email communication within a dataset of around half a million emails. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. Nodes of the network are email addresses, and if an address i sent at least one email to address j, the graph contains an undirected edge between i and j. Note that non-Enron email addresses act as sinks and sources in the network, as we only observe their communication with the Enron email addresses.

The Enron email data was originally released by William Cohen at CMU.

wiki-Talk: Wikipedia Talk network

Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. Each registered user has a talk page that they and other users can edit in order to communicate and discuss updates to various articles on Wikipedia. Using the latest complete dump of Wikipedia page edit history (from January 3, 2008), we extracted all user talk page changes and created a network.

The network contains all the users and discussions from the inception of Wikipedia until January 2008. Nodes in the network represent Wikipedia users, and a directed edge from node i to node j means that user i edited the talk page of user j at least once.

comm-f2f-Resistance: Dynamic Face-to-Face Interaction Networks

The dynamic face-to-face interaction networks represent the interactions that happen during discussions between a group of participants playing the Resistance game. This dataset contains networks extracted from 62 games. Each game is played by 5-8 participants and lasts 45 to 60 minutes. We extract dynamically evolving networks from the free-form discussions using the ICAF algorithm. The extracted networks are used to characterize and detect group deceptive behavior using the DeceptionRank algorithm.

The networks are weighted, directed, and temporal. Each node represents a participant. Every 1/3 second, a directed edge from node u to node v is weighted by the probability that participant u is looking at participant v or the laptop. We also provide a binary version in which an edge from u to v indicates that participant u is looking at participant v (or the laptop).
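The description does not say how the binary version is derived from the probabilities. One plausible reading, offered here only as an assumption, is that each participant's most probable target at a time step becomes that participant's single outgoing edge. The sketch below illustrates that reading on a hypothetical in-memory layout (`weights[t][u][v]`); the layout, the function name `binarize`, and the participant labels are illustrative and do not reflect the dataset's actual file format.

```python
from typing import Dict

# Hypothetical layout: weights[t][u][v] is the probability that
# participant u looks at target v during time step t (each step is 1/3 s).
Weights = Dict[int, Dict[str, Dict[str, float]]]


def binarize(weights: Weights) -> Dict[int, Dict[str, str]]:
    """For each time step and participant, keep only the most probable target.

    Assumption: the binary edge u -> v is the argmax over u's outgoing
    probabilities. The source does not confirm this rule.
    """
    binary: Dict[int, Dict[str, str]] = {}
    for t, per_source in weights.items():
        binary[t] = {
            u: max(targets, key=targets.get)
            for u, targets in per_source.items()
            if targets  # skip participants with no recorded targets
        }
    return binary


# Toy example with two participants and the laptop as possible targets.
weights = {
    0: {
        "p1": {"p2": 0.7, "laptop": 0.3},
        "p2": {"p1": 0.4, "laptop": 0.6},
    },
}
binary = binarize(weights)
```

Under this assumed rule, a time step index t corresponds to t / 3 seconds into the game, so step 0 is the start of the recording.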

Stanford Network Analysis Platform (SNAP) is a general-purpose, high-performance system for analysis and manipulation of large networks. Graphs consist of nodes and directed/undirected/multiple edges between the graph nodes. Networks are graphs with data on nodes and/or edges of the network.

The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.

SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in November 2009. SNAP uses GLib, a general-purpose STL (Standard Template Library)-like library developed at the Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.

http://snap.stanford.edu/data/index.html#email
