The Caselaw Access Project makes 40 million pages of U.S. caselaw freely available online from the collections of Harvard Law School Library.
The CAP citation graph shows the connections between cases in the Caselaw Access Project dataset. You can use the citation graph to answer questions like "what is the most influential case?" and "what jurisdictions cite most often to this jurisdiction?".
Learn More: https://case.law/download/citation_graph/
Access Limits: https://case.law/api/#limits
This dataset includes citations and metadata for the CAP citation graph in CSV format.
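Since the citation graph ships as CSV, a question like "what is the most influential case?" can be approximated by counting incoming citations. A minimal sketch follows; the column names (citing_id, cited_id) and sample rows are assumptions, so check the actual header of the downloaded file before relying on them.

```python
import csv
import io
from collections import Counter

# Toy sample in an assumed two-column layout; the real CAP CSV's
# column names may differ -- inspect the file header first.
sample = """citing_id,cited_id
1,2
3,2
3,4
5,2
"""

def most_cited(csv_text):
    """Return (case_id, citation_count) for the most-cited case."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["cited_id"]] += 1
    return counts.most_common(1)[0]

print(most_cited(sample))  # case "2" is cited three times
```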
The Caselaw Access Project is by the Library Innovation Lab at Harvard Law School Library.
People are using CAP data to create research, applications, and more. We're sharing examples in our gallery.
Cite Grid is the first visualization we've created based on data from our citation graph.
Have something to share? We're excited to hear about it.
The arXiv HEP-TH (high energy physics theory) citation graph comes from the e-print arXiv and covers all citations within a dataset of 27,770 papers, with 352,807 edges. If paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph contains no information about that. The data covers papers in the period from January 1993 to April 2003 (124 months).
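Given the edge direction above, a paper's in-degree counts how often it is cited. The sketch below assumes a SNAP-style tab-separated "FromNodeId ToNodeId" layout with '#' comment lines; the node ids are made up.

```python
from collections import Counter

# Toy excerpt in an assumed SNAP-style layout: one "from to" pair
# per line, comment lines starting with '#'. Node ids are made up.
edges_txt = """# FromNodeId ToNodeId
9301001 9201010
9301002 9201010
9301003 9302001
"""

in_degree = Counter()
for line in edges_txt.splitlines():
    if not line or line.startswith("#"):
        continue
    src, dst = line.split()  # directed edge src -> dst (src cites dst)
    in_degree[dst] += 1

print(in_degree.most_common(1))  # [('9201010', 2)]
```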
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
We provide an academic graph based on a snapshot of the Microsoft Academic Graph from May 26, 2021. The Microsoft Academic Graph (MAG) is a large-scale dataset containing information about scientific publication records and their citation relations, as well as authors, affiliations, journals, conferences, and fields of study. We acknowledge the Microsoft Academic Graph using the URI https://aka.ms/msracad. For more information regarding the schema and the entities present in the original dataset, please refer to the MAG schema.
MAG for Heterogeneous Graph Learning We use a recent version of MAG from May 2021 and extract all relevant entities to build a graph that can be directly used for heterogeneous graph learning (node classification, link prediction, etc.). The graph contains all English papers, published after 1900, that have been cited at least 5 times per year since the time of publishing. For fairness, we set a constant citation bound of 100 for papers published before 2000. We further include two smaller subgraphs, one containing computer science papers and one containing medicine papers.
Nodes and features We define the following nodes:
paper with mag_id, graph_id, normalized title, year of publication, citations, and a 128-dimensional title embedding built using word2vec. No. of papers: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);
author with mag_id, graph_id, normalized name, citations. No. of authors: 6,363,201 (all), 1,797,980 (medicine), 557,078 (computer science);
field with mag_id, graph_id, level, citations, where level denotes the hierarchical level of the field and 0 is the highest level (e.g. computer science). No. of fields: 199,457 (all), 83,970 (medicine), 45,454 (computer science);
affiliation with mag_id, graph_id, citations. No. of affiliations: 19,421 (all), 12,103 (medicine), 10,139 (computer science);
venue with mag_id, graph_id, citations, and type denoting whether the venue is a conference or a journal. No. of venues: 24,608 (all), 8,514 (medicine), 9,893 (computer science).
Edges We define the following edges:
author is_affiliated_with affiliation. No. of author-affiliation edges: 8,292,253 (all), 2,265,728 (medicine), 665,931 (computer science);
author is_first/last/other paper. No. of author-paper edges: 24,907,473 (all), 5,081,752 (medicine), 1,269,485 (computer science);
paper has_citation_to paper. No. of citation edges: 142,684,074 (all), 16,808,837 (medicine), 4,152,804 (computer science);
paper conference/journal_published_at venue. No. of paper-venue edges: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);
paper has_field_L0/L1/L2/L3/L4 field. No. of paper-field edges: 47,531,366 (all), 9,403,708 (medicine), 3,341,395 (computer science);
field is_in field. No. of field-field edges: 339,036 (all), 138,304 (medicine), 83,245 (computer science).
We further include a reverse edge for each edge type defined above that is denoted with the prefix rev_ and can be removed based on the downstream task.
Data structure The nodes and their respective features are provided as separate .tsv files where each feature represents a column. The edges are provided as a pickled Python dictionary with the schema:
{target_type: {source_type: {edge_type: {target_id: {source_id: {time } } } } } }
We provide three compressed ZIP archives, one for each subgraph (all, medicine, computer science); the file for the complete graph is split into 500 MB chunks. Each archive contains the separate node features and the edge dictionary.
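Once unpickled, the nested edge dictionary can be flattened into plain edge tuples. The stand-in below mirrors the schema above with made-up ids, and assumes the innermost value is the timestamp; verify the exact nesting against the real file.

```python
# Hand-built stand-in for the pickled edge dictionary
# ({target_type: {source_type: {edge_type: {target_id: {source_id: time}}}}});
# ids and timestamps are illustrative only.
edge_dict = {
    "paper": {
        "paper": {
            "has_citation_to": {
                42: {7: 2015, 9: 2018},
                43: {7: 2016},
            }
        }
    }
}

def iter_edges(edges):
    """Yield (edge_type, source_id, target_id, time) tuples."""
    for target_type, by_source in edges.items():
        for source_type, by_edge_type in by_source.items():
            for edge_type, by_target in by_edge_type.items():
                for target_id, by_source_id in by_target.items():
                    for source_id, time in by_source_id.items():
                        yield edge_type, source_id, target_id, time

rows = list(iter_edges(edge_dict))
print(len(rows))  # 3 citation edges
```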
DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations. Each paper is associated with an abstract, authors, year, venue, and title. The dataset can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.
PubMedCite is a domain-specific dataset with about 192K biomedical scientific papers and a large citation graph preserving 917K citation relationships between them. It is characterized by preserving the salient contents extracted from the full texts of references, together with the weighted correlations between those salient contents.
The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Research in digital fabrication, specifically in 3D concrete printing (3DCP), has seen a substantial increase in publication output in the past five years, making it hard to keep up with the latest developments. The 3dcp.fyi database aims to provide the research community with a comprehensive, up-to-date, and manually curated literature dataset documenting the development of the field from its early beginnings in the late 1990s through its resurgence in the 2010s until today. The dataset is compiled using a systematic approach. A thorough literature search was conducted in scientific databases, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) scheme. This was then enhanced iteratively with non-indexed literature through a snowball citation search. The authors of the articles were assigned unique and persistent identifiers (ORCID® IDs) through a systematic process that combined querying APIs and manually curating data. The works in the dataset also include references to other works, as long as those referenced works are also included within the same dataset. A citation network graph is created where scientific articles are represented as vertices and their citations to other scientific articles are the edges. The constructed network graph is subjected to detailed analysis using graph-theoretic algorithms such as PageRank. These algorithms evaluate the structure and connections within the graph, yielding quantitative metrics. Currently, the high-quality dataset contains more than 2,600 manually curated scientific works, including journal articles, conference articles, books, and theses, with more than 40,000 cross-references and 2,000 authors, opening up the possibility for more detailed analysis. The data is published on https://3dcp.fyi, ready for import into several reference managers, and is continuously updated.
We encourage researchers to enrich the database by submitting their publications, adding missing works, or suggesting new features.
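To make the PageRank step concrete, here is a bare-bones power-iteration sketch on a toy citation graph. This only illustrates the algorithm; it is not the tooling actually used by 3dcp.fyi, and the toy edges are invented.

```python
# Minimal PageRank by power iteration; dangling nodes (no out-links)
# spread their rank uniformly so the total rank mass stays 1.
def pagerank(edges, d=0.85, iters=50):
    nodes = {n for e in edges for n in e}
    out = {n: [v for u, v in edges if u == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for u in nodes:
            if out[u]:
                for v in out[u]:
                    new[v] += d * rank[u] / len(out[u])
            else:
                for v in nodes:
                    new[v] += d * rank[u] / len(nodes)
        rank = new
    return rank

# Toy graph: papers a and b cite c; c cites d.
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "d")])
print(sorted(ranks, key=ranks.get, reverse=True))
```

Note that d ends up ranked above c here: d inherits all of c's accumulated rank, which is the kind of structural effect PageRank surfaces beyond raw citation counts.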
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain. Details of the dataset are given below:
FILENAME FORMAT: The filenames have the following format: btc-tx- For example, file btc-tx-100000-149999-aa.bz2 (together with the remaining parts, if any) contains transactions from block 100000 to block 149999 inclusive. The files are compressed with bzip2 and can be uncompressed using the command bunzip2.
TRANSACTION FORMAT: Each line in a file corresponds to a transaction. The transaction has the following format:
BLOCK TIME FORMAT: The block time file has the following format:
IMPORTANT NOTE: Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as https://btcscan.org . Downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner or in compliance with any other regulations. We provide the data as is and cannot be held responsible for anything.
NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14
@incollection{kilicc2022analyzing,
  title = {Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities},
  author = {K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper},
  booktitle = {Big Data and Artificial Intelligence in Digital Finance},
  pages = {253--267},
  year = {2022},
  publisher = {Springer, Cham}
}
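The bzip2 archives can also be read directly from Python with the standard-library bz2 module instead of shelling out to bunzip2. The sketch below round-trips a tiny in-memory blob; the transaction lines are placeholders, not the real per-line format.

```python
import bz2

# Round-trip sketch: a real file would be opened with
# bz2.open("btc-tx-....bz2", "rt") and iterated line by line.
payload = b"tx1 ...\ntx2 ...\n"  # placeholder transaction lines
compressed = bz2.compress(payload)
lines = bz2.decompress(compressed).decode().splitlines()
print(len(lines))  # 2
```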
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unique identifier: https://doi.org/10.5281/zenodo.4718440
Dataset updated: Dec 19, 2022
Dataset provided by: Zenodo
Authors: Can Özturan; Alper Şen; Baran Kılıç
License: Attribution 4.0 (CC BY 4.0)
License information was derived automatically
Description: This dataset contains ether as well as popular ERC20 token transfer transactions extracted from the Ethereum Mainnet blockchain. Only send-ether, contract function call, and contract deployment transactions are present in the dataset. Miner rewards (static block rewards) and uncle block inclusion rewards are added as transactions to the dataset. Transaction fee rewards and "uncles rewards" are not currently included. Details of the dataset are given below:
FILENAME FORMAT: The filenames have the following format: eth-tx- For example, file eth-tx-1000000-1099999.txt.bz2 contains transactions from block 1000000 to block 1099999 inclusive. The files are compressed with bzip2 and can be uncompressed using the command bunzip2.
TRANSACTION FORMAT: Each line in a file corresponds to a transaction. The transaction has the following format: units. ERC20 token transfers (transfer and transferFrom function calls in an ERC20 contract) are indicated by token symbol; for example, GUSD is the Gemini USD stable coin. The JSON file erc20tokens.json given below contains the details of the ERC20 tokens. Failed transactions are prefixed with "F-".
BLOCK TIME FORMAT: The block time file has the following format:
erc20tokens.json FILE: This file contains the list of popular ERC20 token contracts whose transfer/transferFrom transactions appear in the data files.
ERC20 token list: USDT TRYb XAUt BNB LEO LINK HT HEDG MKR CRO VEN INO PAX INB SNX REP MOF ZRX SXP OKB XIN OMG SAI HOT DAI EURS HPT BUSD USDC SUSD HDG QCAD PLUS BTCB WBTC cWBTC renBTC sBTC imBTC pBTC
IMPORTANT NOTE: Public Ethereum Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as http://etherscan.io . Downloaders and users of this dataset accept full responsibility for using the data in a GDPR-compliant manner or in compliance with any other regulations. We provide the data as is and cannot be held responsible for anything.
NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/article/10.1007/s10586-021-03511-0
@article{kilic2022parallel,
  title = {Parallel Analysis of Ethereum Blockchain Transaction Data using Cluster Computing},
  journal = {Cluster Computing},
  author = {K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper},
  year = {2022},
  month = {Jan}
}
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This is the Microsoft Academic Graph data from 2019-03-02. To get this, you'd normally jump through these hoops: https://docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning
As required by ODC-BY, I acknowledge Microsoft Academic using the URI https://aka.ms/msracad.
You can find out more about the data schema of the Microsoft Academic Graph at: https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
Since Microsoft docs are covered by different licensing terms, the documentation cannot be provided along with the data.
There were no changes to the files except compressing them with gzip. They were downloaded and checked twice.
After uploading, the md5 hashes reported by Zenodo matched those of the locally created compressed files.
The compressed files will expand to the following sizes (in bytes):
4563254 Affiliations.txt
16498013834 Authors.txt
2220754 ConferenceInstances.txt
427502 ConferenceSeries.txt
55232571 FieldsOfStudy.txt
5685746 Journals.txt
32387110344 PaperAuthorAffiliations.txt
32326282060 PaperReferences.txt
7763965 PaperResources.txt
60135810372 Papers.txt
22000508908 PaperUrls.txt
--------------------------------
163423619310 total (~152GiB)
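The md5 check and expanded-size figures above can be reproduced with the standard library. The sketch below round-trips a tiny in-memory blob (the file content is made up) rather than the multi-gigabyte originals.

```python
import gzip
import hashlib

# Compute the md5 of a compressed blob and its uncompressed size,
# mirroring the verification described above. mtime=0 keeps the
# gzip output deterministic across runs.
def md5_and_expanded_size(data):
    return hashlib.md5(data).hexdigest(), len(gzip.decompress(data))

blob = gzip.compress(b"PaperId\tTitle\n1\tExample\n", mtime=0)
digest, size = md5_and_expanded_size(blob)
print(digest, size)  # size is 24 bytes uncompressed
```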
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).
These results were used in the following conference papers:
Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.
Citation. If you use our dataset or tool, please cite article [1] above.
@InProceedings{Mendonca2015,
author = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},
title = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
year = {2015},
pages = {122-129},
address = {Karlskrona, SE},
publisher = {IEEE Publishing},
doi = {10.1109/ENIC.2015.25},
}
-------------------------
Details. This archive contains the following folders:
-------------------------
License. These data are shared under a Creative Commons 0 license.
Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description. This is the data used in the experiment of the following conference paper:
N. Arınık, R. Figueiredo, and V. Labatut, “Signed Graph Analysis for the Interpretation of Voting Behavior,” in International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities, Graz, AT, 2017, vol. 2025. ⟨hal-01583133⟩
Source code. The code source is accessible on GitHub: https://github.com/CompNet/NetVotes
Citation. If you use the data or source code, please cite the above paper.
@InProceedings{Arinik2017,
author = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent},
title = {Signed Graph Analysis for the Interpretation of Voting Behavior},
booktitle = {International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities},
year = {2017},
volume = {2025},
series = {CEUR Workshop Proceedings},
address = {Graz, AT},
url = {http://ceur-ws.org/Vol-2025/paper_rssna_1.pdf},
}
Details.
----------------------
# COMPARISON RESULTS
The 'material-stats' folder contains all the comparison results obtained for Ex-CC and ILS-CC. The csv files associated with the plots are also provided. The folder structure is as follows:
* material-stats/
** execTimePerf: The plot shows the execution time of Ex-CC and ILS-CC on randomly generated complete networks of different sizes.
** graphStructureAnalysis: The plots show the weight and link statistics for all instances.
** ILS-CC-vs-Ex-CC: The folder contains 4 different comparisons between Ex-CC and ILS-CC: imbalance difference, number of detected clusters, difference in the number of detected clusters, and NMI (Normalized Mutual Information).
----------------------
Funding: Agorantic FR 3621, FMJH Program Gaspard Monge in optimization and operations research (Project 2015-2842H)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Leading Indicators OECD: Reference Series: Gross Domestic Product: Original Series for the United States was 120.60270 Index 2010=1.00 in October of 2023, according to the United States Federal Reserve. Historically, Leading Indicators OECD: Reference Series: Gross Domestic Product: Original Series for the United States reached a record high of 120.60270 in October of 2023 and a record low of 11.55572 in July of 1947. Trading Economics provides the current actual value, an historical data chart and related indicators for Leading Indicators OECD: Reference Series: Gross Domestic Product: Original Series for the United States - last updated from the United States Federal Reserve on July of 2025.
https://fred.stlouisfed.org/legal/#copyright-public-domain
Graph and download economic data for Federal Debt: Total Public Debt (GFDEBTN) from Q1 1966 to Q1 2025 about public, debt, federal, government, and USA.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scholarly figures are data visualizations like bar charts, pie charts, line graphs, maps, scatter plots, or similar figures. Text extraction from scholarly figures is useful in many application scenarios, since text in scholarly figures often contains information that is not present in the surrounding text. This dataset is a corpus of 121 scholarly figures from the economics domain for evaluating text extraction tools. We randomly extracted these figures from a corpus of 288,000 open access publications from EconBiz. The dataset covers a wide variety of scholarly figures, from bar charts to maps. We manually labeled the figures to create the gold standard.
We adjusted the provided gold standard to have a uniform format for all datasets. Each figure is accompanied by a TSV file (tab-separated values) where each entry corresponds to a text line which has the following structure:
X-coordinate of the center of the bounding box in pixel
Y-coordinate of the center of the bounding box in pixel
Width of the bounding box in pixel
Height of the bounding box in pixel
Rotation angle around its center in degree
Text inside the bounding box
In addition we provide the ground truth in JSON format. A schema file is included in each dataset as well. The dataset is accompanied with a ReadMe file with further information about the figures and their origin.
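The per-line TSV structure above can be parsed with the csv module. The two rows below are invented examples, with fields in the listed order (x, y, width, height, rotation angle, text).

```python
import csv
import io

# Invented sample rows following the documented column order.
sample = "120\t45\t80\t14\t0\tGDP growth\n310\t200\t60\t12\t90\tYear\n"

boxes = []
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    x, y, width, height, angle = map(float, row[:5])
    boxes.append({"x": x, "y": y, "width": width,
                  "height": height, "angle": angle, "text": row[5]})

print(boxes[0]["text"], boxes[1]["angle"])  # GDP growth 90.0
```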
If you use this dataset in your own work, please cite one of the papers in the references.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This ds-graph ("ds" stands for "dataset"; ds-graphs represent datasets by tissue sample and donor) represents this information for the Human Reference Atlas Cell Type Populations Effort (Börner et al. 2025). It provides sample registration information submitted by consortium members in single-cell atlasing efforts, including accurate sample sizes and positions (Bueckle et al. 2025). When combined with ref-organ data, this information helps create 3D visual tissue sample placements. Additionally, the sample information is linked to datasets from researchers' assay analyses that offer deeper insights into the tissue samples. It is the dataset graph for the Human Reference Atlaspop Universe and includes all datasets considered for Human Reference Atlaspop (not enriched).
CHOCOLATE is a benchmark for detecting and correcting factual inconsistency in generated chart captions. It consists of captions produced by six advanced models, which are categorized into three subsets:
LVLM: GPT-4V, Bard (before Gemini)
LLM-based Pipeline: DePlot + GPT-4
Fine-tuned Model: ChartT5, MatCha, UniChart
The charts are from two datasets: VisText and the Pew split of Chart-to-Text. In total, CHOCOLATE consists of 1,187 examples. Each instance in CHOCOLATE consists of a caption generated by one of the models and the annotations of the factual errors for each caption sentence.
Paper Information
Paper: https://arxiv.org/abs/2312.10160
Code: https://github.com/khuangaf/CHOCOLATE/
Project: https://khuangaf.github.io/CHOCOLATE
Citation If you use the CHOCOLATE dataset in your work, please kindly cite the paper using this BibTeX:
@misc{huang-etal-2023-do,
title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
author = "Huang, Kung-Hsiang and Zhou, Mingyang and Chan, Hou Pong and Fung, Yi R. and Wang, Zhenhailong and Zhang, Lingyu and Chang, Shih-Fu and Ji, Heng",
year = {2023},
eprint = {2312.10160},
archivePrefix = {arXiv},
primaryClass = {cs.CL}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Normalised for the United States was 100.49120 Index in November of 2023, according to the United States Federal Reserve. Historically, Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Normalised for the United States reached a record high of 102.98160 in May of 1973 and a record low of 92.02608 in May of 2020. Trading Economics provides the current actual value, an historical data chart and related indicators for Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Normalised for the United States - last updated from the United States Federal Reserve on July of 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Trend for the United States was 119.91160 Index in November of 2023, according to the United States Federal Reserve. Historically, Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Trend for the United States reached a record high of 119.91160 in November of 2023 and a record low of 11.29785 in February of 1947. Trading Economics provides the current actual value, an historical data chart and related indicators for Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Trend for the United States - last updated from the United States Federal Reserve on July of 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The graph representation of the 3D Reference Organ for Larynx, Male dataset.