Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The arXiv HEP-PH (high energy physics phenomenology) citation graph comes from the e-print arXiv and covers all citations within a dataset of 34,546 papers, giving 421,578 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about that citation.
The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section.
The data was originally released as part of the 2003 KDD Cup.
The arXiv HEP-TH (high energy physics theory) citation graph comes from the e-print arXiv and covers all citations within a dataset of 27,770 papers, giving 352,807 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about that citation.
The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-TH section.
The data was originally released as part of the 2003 KDD Cup.
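The directed-edge convention used by both citation graphs above (an edge from i to j when paper i cites paper j) can be illustrated with a small sketch that counts how often each paper is cited. It assumes a whitespace-separated edge list with one "citing cited" pair per line and '#' comment lines; that layout, the function names and the sample data are assumptions for illustration, not something stated in the description.

```python
from collections import Counter
from io import StringIO


def read_edges(lines):
    """Yield (citing, cited) integer pairs from an edge-list text stream.

    Assumed layout: one edge per line, two whitespace-separated node ids,
    lines starting with '#' are comments.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        yield int(src), int(dst)


def citation_counts(edges):
    """Count citations received by each paper (its in-degree in the graph)."""
    return Counter(dst for _, dst in edges)


# Tiny made-up sample, not real arXiv data.
sample = StringIO("# FromNodeId ToNodeId\n1 2\n3 2\n3 1\n")
counts = citation_counts(read_edges(sample))
print(counts.most_common(1))  # [(2, 2)]
```

Because citations to or from papers outside the dataset are not recorded, counts computed this way are lower bounds on a paper's true citation count.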
The U.S. patent dataset is maintained by the National Bureau of Economic Research (NBER). The dataset spans 37 years (January 1, 1963 to December 30, 1999) and includes all utility patents granted during that period, totaling 3,923,922 patents. The citation graph includes all citations made by patents granted between 1975 and 1999, totaling 16,522,438 citations. For 1,803,511 nodes in the patent graph there is no information about the citations they make (only their in-links are known).
The data was originally released by NBER.
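The remark about nodes with only in-links can be made concrete with a short sketch: given directed (citing, cited) pairs, it returns the nodes that are cited but never appear as a citing node. The function name and the toy edges are hypothetical, and a node can land in this set either because its outgoing citations were not recorded or because it makes none; the sketch cannot tell the two cases apart.

```python
def nodes_with_only_inlinks(edges):
    """Return nodes that appear as a citation target but never as a source."""
    sources = set()
    targets = set()
    for src, dst in edges:
        sources.add(src)
        targets.add(dst)
    return targets - sources


# Toy edges (citing, cited); not real patent numbers.
toy_edges = [(10, 1), (10, 2), (11, 2), (2, 1)]
only_in = nodes_with_only_inlinks(toy_edges)
print(sorted(only_in))  # [1]
```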
Stanford Network Analysis Platform (SNAP) is a general-purpose, high-performance system for the analysis and manipulation of large networks. Graphs consist of nodes and directed, undirected, or multiple edges between them. Networks are graphs with data on nodes and/or edges.
The core SNAP library is written in C++ and optimized for maximum performance and compact graph representation. It easily scales to massive networks with hundreds of millions of nodes, and billions of edges. It efficiently manipulates large graphs, calculates structural properties, generates regular and random graphs, and supports attributes on nodes and edges. Besides scalability to large graphs, an additional strength of SNAP is that nodes, edges and attributes in a graph or a network can be changed dynamically during the computation.
SNAP was originally developed by Jure Leskovec in the course of his PhD studies. The first release was made available in November 2009. SNAP uses GLib, a general-purpose STL (Standard Template Library)-like library developed at the Jozef Stefan Institute. SNAP and GLib are being actively developed and used in numerous academic and industrial projects.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains complementary data to the paper "A Row Generation Algorithm for Finding Optimal Burning Sequences of Large Graphs" [1], which proposes an exact algorithm for the Graph Burning Problem, an NP-hard optimization problem that models a form of contagion diffusion on social networks.
Concerning the computational experiments discussed in that paper, we make available:
The "delta" input sets include graphs that are real-world networks [1,2], while the "grid" input set contains graphs that are square grids.
The directories "delta_10K_instances", "delta_100K_instances", "delta_4M_instances" and "grid_instances" contain files that describe the sets of instances. The first two lines of each file contain:
where
where and
The directories "delta_10K_solutions", "delta_100K_solutions", "delta_4M_solutions" and "grid_solutions" contain files that describe the optimal (or best known) solutions for the corresponding sets of instances.
The first line of each file contains:
where is the number of vertices in the burning sequence. Each of the next lines contains:
where
The directory "source_code" contains the implementations of the exact algorithm proposed in the paper [1], namely, PRYM.
Lastly, the file "appendix.pdf" presents additional details on the results reported in the paper.
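For readers unfamiliar with the Graph Burning Problem, the following sketch checks whether a given vertex sequence burns a whole graph. It uses the standard definition of graph burning, which the description above does not spell out: in each round, fire first spreads from every burned vertex to its neighbors, then the next vertex of the sequence is ignited. This is only a validity checker, not the exact algorithm of [1], and it assumes nothing about the instance or solution file formats; the function name and the example graph are made up for illustration.

```python
def is_burning_sequence(adj, seq):
    """Return True if igniting seq[0], seq[1], ... burns every vertex.

    adj maps each vertex to an iterable of its neighbors (undirected graph).
    Round t: fire spreads one step from all burned vertices, then seq[t]
    is ignited. The graph is burned if no vertex is left after len(seq) rounds.
    """
    burned = set()
    for source in seq:
        # Spread step: neighbors of already burned vertices catch fire.
        burned |= {w for v in burned for w in adj[v]}
        # Ignition step: the next source of the sequence starts burning.
        burned.add(source)
    return len(burned) == len(adj)


# Path on four vertices: 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_burning_sequence(path, [1, 3]))  # True
print(is_burning_sequence(path, [0, 3]))  # False
```

In the second call vertex 2 is still unburned after two rounds, so the sequence is rejected. The simulation recomputes the spread from all burned vertices each round, which is simple but not tuned for graphs of the sizes in these instance sets.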
This work was supported by grants from Santander Bank, Brazil, Brazilian National Council for Scientific and Technological Development (CNPq), Brazil, São Paulo Research Foundation (FAPESP), Brazil and Fund for Support to Teaching, Research and Outreach Activities (FAEPEX).
Caveat: the opinions, hypotheses and conclusions or recommendations expressed in this material are the sole responsibility of the authors and do not necessarily reflect the views of Santander, CNPq, FAPESP or FAEPEX.
References
[1] F. C. Pereira, P. J. de Rezende, T. Yunes and L. F. B. Morato. A Row Generation Algorithm for Finding Optimal Burning Sequences of Large Graphs. Submitted. 2024.
[2] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. 2024. https://snap.stanford.edu/data
[3] Ryan A. Rossi and Nesreen K. Ahmed. The Network Data Repository with Interactive Graph Analytics and Visualization. In: AAAI, 2022. https://networkrepository.com
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset information
The network was generated using email data from a large European research
institution. For a period from October 2003 to May 2005 (18 months) we have
anonymized information about all incoming and outgoing email of the research
institution. For each sent or received email message we know the time, the
sender and the recipient of the email. Overall we have 3,038,531 emails between
287,755 different email addresses. Note that we have a complete email graph for
only 1,258 email addresses that come from the research institution.
Furthermore, there are 34,203 email addresses that both sent and received email
within the span of our dataset. All other email addresses are either
non-existing, mistyped or spam.
Given a set of email messages, each node corresponds to an email address. We
create a directed edge between nodes i and j, if i sent at least one message to
j.
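The edge construction described above can be sketched as follows: each message is reduced to its (sender, recipient) pair, and repeated messages between the same pair collapse into a single directed edge. The (time, sender, recipient) tuple layout, the function names and the sample addresses are assumptions for illustration.

```python
def build_email_edges(messages):
    """Collapse messages into directed edges: (i, j) if i sent at least one email to j."""
    edges = set()
    for _time, sender, recipient in messages:
        edges.add((sender, recipient))
    return edges


def sent_and_received(edges):
    """Addresses that appear both as a sender and as a recipient."""
    senders = {s for s, _ in edges}
    recipients = {r for _, r in edges}
    return senders & recipients


# Made-up messages: (time, sender, recipient)
messages = [
    (1, "a", "b"),
    (2, "a", "b"),  # repeated pair, contributes no new edge
    (3, "b", "a"),
    (4, "c", "a"),
]
edges = build_email_edges(messages)
print(sorted(edges))                     # [('a', 'b'), ('b', 'a'), ('c', 'a')]
print(sorted(sent_and_received(edges)))  # ['a', 'b']
```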
Dataset statistics
Nodes 265214
Edges 420045
Nodes in largest WCC 224832 (0.848)
Edges in largest WCC 395270 (0.941)
Nodes in largest SCC 34203 (0.129)
Edges in largest SCC 151930 (0.362)
Average clustering coefficient 0.3093
Number of triangles 267313
Fraction of closed triangles 0.004106
Diameter (longest shortest path) 13
90-percentile effective diameter 4.5
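The parenthesized values in the statistics are fractions of the whole graph; for example, 0.848 is 224832 / 265214. As a minimal sketch of how such a figure could be obtained, the code below computes the share of nodes in the largest weakly connected component (WCC) with a union-find structure, ignoring edge direction. The function name and the toy graph are hypothetical, and this is not necessarily how the published numbers were produced.

```python
from collections import Counter


def largest_wcc_fraction(nodes, edges):
    """Fraction of nodes that lie in the largest weakly connected component."""
    parent = {v: v for v in nodes}

    def find(v):
        # Walk to the component root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        # Edge direction is ignored for weak connectivity.
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    sizes = Counter(find(v) for v in nodes)
    return max(sizes.values()) / len(nodes)


# Toy directed graph with components {1, 2, 3} and {4, 5}.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (3, 2), (4, 5)]
fraction = largest_wcc_fraction(nodes, edges)
print(fraction)  # 0.6
```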
Source (citation)
J. Leskovec, J. Kleinberg and C. Faloutsos. Graph Evolution: Densification and
Shrinking Diameters. ACM Transactions on Knowledge Discovery from Data (ACM
TKDD), 1(1), 2007.
Files
File Description
email-EuAll.txt.gz Email network of a large European Research Institution
Dataset information
The Enron email communication network covers all the email communication within
a dataset of around half a million emails. This data was originally made public,
and posted to the web, by the Federal Energy Regulatory Commission during its
investigation. Nodes of the network are email addresses and if an address i
sent at least one email to address j, the graph contains a directed edge from i
to j. Note that non-Enron email addresses act as sinks and sources in the
network as we only observe their communication with the Enron email addresses.
The Enron email data was originally released by William Cohen at CMU.
Dataset statistics
Nodes 36692
Edges 367662
Nodes in largest WCC 33696 (0.918)
Edges in largest WCC 361622 (0.984)
Nodes in largest...
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
This dataset contains complementary data to the paper "A Hybrid Matheuristic for the Spread of Influence on Social Networks" [1], which proposes a matheuristic for combinatorial optimization problems involving the spread of information in social networks.
For the computational experiments discussed in that paper, we provide:
The directories "benchmark_*/instances/" contain files that describe the sets of instances. Each instance is associated with a graph containing
The first
where and
The next line contains
The last line contains
The directories "benchmark_*/solutions_*/" contain files describing feasible solutions for the corresponding sets of instances.
The first line of each file contains:
where is the number of vertices in the target set. Each of the next lines contains:
where
The last line contains an integer that represents the target set cost.
The directory "hmf_source_code/" contains an implementation of the matheuristic framework proposed in [1], namely, HMF.
This work was supported by grants from Santander Bank, the Brazilian National Council for Scientific and Technological Development (CNPq), the São Paulo Research Foundation (FAPESP), the Fund for Support to Teaching, Research and Outreach Activities (FAEPEX), and the Coordination for the Improvement of Higher Education Personnel (CAPES), all in Brazil.
Caveat: The opinions, hypotheses and conclusions or recommendations expressed in this material are the sole responsibility of the authors and do not necessarily reflect the views of Santander, CNPq, FAPESP, FAEPEX, or CAPES.
References
[1] F. C. Pereira, P. J. de Rezende, and T. Yunes. A Hybrid Matheuristic for the Spread of Influence on Social Networks. 2024. Submitted.
[2] S. Raghavan and R. Zhang. A branch-and-cut approach for the weighted target set selection problem on social networks. 2024. https://doi.org/10.1287/ijoo.2019.0012
[3] J. Leskovec and A. Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. 2024. https://snap.stanford.edu/data
[4] R. A. Rossi and N. K. Ahmed. The Network Data Repository with Interactive Graph Analytics and Visualization. 2022. https://networkrepository.com
[5] J. Kunegis. KONECT – The Koblenz Network Collection. 2013. http://dl.acm.org/citation.cfm?id=2488173
[6] O. Lesser, L. Tenenboim-Chekina, L. Rokach, and Y. Elovici. Intruder or Welcome Friend: Inferring Group Membership in Online Social Networks. 2013. https://doi.org/10.1007/978-3-642-37210-0_40
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The arXiv HEP-PH (high energy physics phenomenology) citation graph comes from the e-print arXiv and covers all citations within a dataset of 34,546 papers, giving 421,578 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about that citation.
The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv, and thus represents essentially the complete history of its HEP-PH section.
The data was originally released as part of the 2003 KDD Cup.
Added an additional temporal-edges file cit-HepPh-temporal.txt, which follows the same formatting as that of other temporal graphs in the Stanford Large Network Dataset Collection.