100+ datasets found
  1. Citation Graph

    • kaggle.com
    Updated Jun 30, 2020
    Cite
    Caselaw Access Project (2020). Citation Graph [Dataset]. https://www.kaggle.com/harvardlil/citation-graph/code
    Dataset updated
    Jun 30, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Caselaw Access Project
    Description

    Context

    The Caselaw Access Project makes 40 million pages of U.S. caselaw freely available online from the collections of Harvard Law School Library.

    The CAP citation graph shows the connections between cases in the Caselaw Access Project dataset. You can use the citation graph to answer questions like "what is the most influential case?" and "what jurisdictions cite most often to this jurisdiction?".

    Learn More: https://case.law/download/citation_graph/

    Access Limits: https://case.law/api/#limits

    Content

    This dataset includes citations and metadata for the CAP citation graph in CSV format.

    Acknowledgements

    The Caselaw Access Project is by the Library Innovation Lab at Harvard Law School Library.

    Inspiration

    People are using CAP data to create research, applications, and more. We're sharing examples in our gallery.

    Cite Grid is the first visualization we've created based on data from our citation graph.

    Have something to share? We're excited to hear about it.

  2. Arxiv HEP-TH citation graph Dataset

    • paperswithcode.com
    + more versions
    Cite
    Arxiv HEP-TH citation graph Dataset [Dataset]. https://paperswithcode.com/dataset/arxiv
    Description

    Arxiv HEP-TH (high energy physics theory) citation graph is from the e-print arXiv and covers all the citations within a dataset of 27,770 papers with 352,807 edges. If a paper i cites paper j, the graph contains a directed edge from i to j. If a paper cites, or is cited by, a paper outside the dataset, the graph does not contain any information about this. The data covers papers in the period from January 1993 to April 2003 (124 months).
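The edge convention above (an edge from i to j means paper i cites paper j) can be illustrated with a minimal sketch. The paper IDs below are made up for illustration and do not come from the dataset; the helper names are hypothetical.

```python
from collections import Counter

# Toy edge list with hypothetical paper IDs: (i, j) means paper i cites paper j.
edges = [(1, 2), (1, 3), (2, 3), (4, 3), (4, 1)]


def in_degree(edge_list):
    """Count how often each paper is cited by other papers in the edge list."""
    return Counter(cited for _, cited in edge_list)


def out_degree(edge_list):
    """Count how many papers in the edge list each paper cites."""
    return Counter(citing for citing, _ in edge_list)


cited_counts = in_degree(edges)
most_cited, n_citations = cited_counts.most_common(1)[0]
print(most_cited, n_citations)
```

Because citations to or from papers outside the dataset are not recorded, these counts are lower bounds on a paper's true citation and reference counts.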

  3. MAG for Heterogeneous Graph Learning

    • data.niaid.nih.gov
    Updated Jul 9, 2021
    Cite
    Diea, Maria-Alexandra (2021). MAG for Heterogeneous Graph Learning [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5055135
    Dataset updated
    Jul 9, 2021
    Dataset authored and provided by
    Diea, Maria-Alexandra
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    We provide an academic graph based on a snapshot of the Microsoft Academic Graph from 26.05.2021. The Microsoft Academic Graph (MAG) is a large-scale dataset containing information about scientific publication records, their citation relations, as well as authors, affiliations, journals, conferences and fields of study. We acknowledge the Microsoft Academic Graph using the URI https://aka.ms/msracad. For more information regarding schema and the entities present in the original dataset please refer to: MAG schema.

    MAG for Heterogeneous Graph Learning. We use a recent version of MAG from May 2021 and extract all relevant entities to build a graph that can be directly used for heterogeneous graph learning (node classification, link prediction, etc.). The graph contains all English papers, published after 1900, that have been cited at least 5 times per year since the time of publishing. For fairness, we set a constant citation bound of 100 for papers published before 2000. We further include two smaller subgraphs, one containing computer science papers and one containing medicine papers.

    Nodes and features. We define the following nodes:

    paper with mag_id, graph_id, normalized title, year of publication, citations and a 128-dimension title embedding built using word2vec. No. of papers: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);

    author with mag_id, graph_id, normalized name, citations. No. of authors: 6,363,201 (all), 1,797,980 (medicine), 557,078 (computer science);

    field with mag_id, graph_id, citations, and level denoting the hierarchical level of the field, where 0 is the highest level (e.g. computer science). No. of fields: 199,457 (all), 83,970 (medicine), 45,454 (computer science);

    affiliation with mag_id, graph_id, citations. No. of affiliations: 19,421 (all), 12,103 (medicine), 10,139 (computer science);

    venue with mag_id, graph_id, citations, and type denoting whether it is a conference or a journal. No. of venues: 24,608 (all), 8,514 (medicine), 9,893 (computer science).

    Edges. We define the following edges:

    author is_affiliated_with affiliation. No. of author-affiliation edges: 8,292,253 (all), 2,265,728 (medicine), 665,931 (computer science);

    author is_first/last/other paper. No. of author-paper edges: 24,907,473 (all), 5,081,752 (medicine), 1,269,485 (computer science);

    paper has_citation_to paper. No. of paper-paper edges: 142,684,074 (all), 16,808,837 (medicine), 4,152,804 (computer science);

    paper conference/journal_published_at venue. No. of paper-venue edges: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);

    paper has_field_L0/L1/L2/L3/L4 field. No. of paper-field edges: 47,531,366 (all), 9,403,708 (medicine), 3,341,395 (computer science);

    field is_in field. No. of field-field edges: 339,036 (all), 138,304 (medicine), 83,245 (computer science).

    We further include a reverse edge for each edge type defined above that is denoted with the prefix rev_ and can be removed based on the downstream task.

    Data structure. The nodes and their respective features are provided as separate .tsv files where each feature represents a column. The edges are provided as a pickled Python dictionary with the schema:

    {target_type: {source_type: {edge_type: {target_id: {source_id: {time } } } } } }

    We provide three compressed ZIP archives, one for each subgraph (all, medicine, computer science); the archive for the complete graph is split into 500 MB chunks. Each archive contains the separate node feature files and the edge dictionary.
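As a rough sketch of how the nested edge dictionary could be flattened into an edge list, the code below walks the schema level by level and optionally drops the `rev_` edge types. It assumes the innermost level maps each source_id to a single time value, which the schema does not state explicitly. The toy dictionary, its IDs, and the `iter_edges` helper are hypothetical, and the pickle round trip only stands in for loading the real file.

```python
import pickle


def iter_edges(edge_dict, include_reverse=True):
    """Yield (source_type, source_id, edge_type, target_type, target_id, time).

    Assumes the layout
    {target_type: {source_type: {edge_type: {target_id: {source_id: time}}}}}.
    """
    for target_type, by_source_type in edge_dict.items():
        for source_type, by_edge_type in by_source_type.items():
            for edge_type, by_target in by_edge_type.items():
                if not include_reverse and edge_type.startswith("rev_"):
                    continue
                for target_id, by_source in by_target.items():
                    for source_id, time in by_source.items():
                        yield (source_type, source_id, edge_type,
                               target_type, target_id, time)


# Toy stand-in for the real dictionary (all IDs and years are made up).
toy = {
    "paper": {
        "author": {"is_first": {100: {7: 2015}}},
        "paper": {"has_citation_to": {100: {101: 2018, 102: 2019}}},
    },
    "author": {
        "paper": {"rev_is_first": {7: {100: 2015}}},
    },
}

# Round trip through pickle, standing in for pickle.load() on the real file.
loaded = pickle.loads(pickle.dumps(toy))

all_edges = list(iter_edges(loaded))
forward_edges = list(iter_edges(loaded, include_reverse=False))
print(len(all_edges), len(forward_edges))
```

Only unpickle files from a source you trust, since unpickling can execute arbitrary code.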

  4. DBLP Dataset

    • paperswithcode.com
    Updated Apr 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su (2021). DBLP Dataset [Dataset]. https://paperswithcode.com/dataset/dblp
    Dataset updated
    Apr 13, 2021
    Authors
    Jie Tang; Jing Zhang; Limin Yao; Juanzi Li; Li Zhang; Zhong Su
    Description

    The DBLP is a citation network dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. The first version contains 629,814 papers and 632,752 citations. Each paper is associated with abstract, authors, year, venue, and title. The data set can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, etc.

  5. PubMedCite Dataset

    • paperswithcode.com
    Updated Jan 25, 2023
    Cite
    Zheheng Luo; Qianqian Xie; Sophia Ananiadou (2023). PubMedCite Dataset [Dataset]. https://paperswithcode.com/dataset/pubmedcite
    Dataset updated
    Jan 25, 2023
    Authors
    Zheheng Luo; Qianqian Xie; Sophia Ananiadou
    Description

    PubMedCite is a domain-specific dataset with about 192K biomedical scientific papers and a large citation graph preserving 917K citation relationships between them. It is characterized by preserving the salient contents extracted from full texts of references, and the weighted correlation between the salient.

  6. Data from: Citeseer Dataset

    • paperswithcode.com
    • huggingface.co
    Updated Mar 4, 2007
    + more versions
    Cite
    C. Lee Giles; Kurt D. Bollacker; Steve Lawrence (2007). Citeseer Dataset [Dataset]. https://paperswithcode.com/dataset/citeseer
    Dataset updated
    Mar 4, 2007
    Authors
    C. Lee Giles; Kurt D. Bollacker; Steve Lawrence
    Description

    The CiteSeer dataset consists of 3312 scientific publications classified into one of six classes. The citation network consists of 4732 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary. The dictionary consists of 3703 unique words.
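The 0/1-valued word vector described above can be sketched as follows. The four-word dictionary and the example document are made up for illustration (the real dictionary has 3703 words), and `binary_word_vector` is a hypothetical helper name.

```python
def binary_word_vector(words, dictionary):
    """Return one 0/1 entry per dictionary word: 1 if it occurs in the document."""
    present = set(words)
    return [1 if term in present else 0 for term in dictionary]


# Hypothetical miniature dictionary and document.
dictionary = ["graph", "learning", "neural", "query"]
document = "graph neural networks for graph learning".split()

vector = binary_word_vector(document, dictionary)
print(vector)  # [1, 1, 1, 0]
```

Repeated words still map to 1, since the vector records presence rather than frequency.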

  7. Data from: 3DCP.fyi - A Comprehensive Citation Network Graph on the State of...

    • data.niaid.nih.gov
    Updated Apr 15, 2024
    Cite
    Fischer, Oliver (2024). 3DCP.fyi - A Comprehensive Citation Network Graph on the State of the Art in 3D Concrete Printing [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10973877
    Dataset updated
    Apr 15, 2024
    Dataset provided by
    Auer, Daniel
    Fischer, Oliver
    Bos, Freek
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
    License information was derived automatically

    Description

    Research in digital fabrication, specifically in 3D concrete printing (3DCP), has seen a substantial increase in publication output in the past five years, making it hard to keep up with the latest developments. The 3dcp.fyi database aims to provide the research community with a comprehensive, up-to-date, and manually curated literature dataset documenting the development of the field from its early beginnings in the late 1990s to its resurgence in the 2010s until today. The data set is compiled using a systematic approach. A thorough literature search was conducted in scientific databases, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) scheme. This was then enhanced iteratively with non-indexed literature through a snowball citation search. The authors of the articles were assigned unique and persistent identifiers (ORCID® IDs) through a systematic process that combined querying APIs systematically and manually curating data. The works in the data set also include references to other works, as long as those referenced works are also included within the same data set. A citation network graph is created where scientific articles are represented as vertices, and their citations to other scientific articles are the edges. The constructed network graph is subjected to detailed analysis using specific graph-theoretic algorithms, like PageRank. These algorithms evaluate the structure and connections within the graph, yielding quantitative metrics. Currently, the high-quality dataset contains more than 2600 manually curated scientific works, including journal articles, conference articles, books, and theses, with more than 40000 cross-references and 2000 authors, opening up the possibility for more detailed analysis. The data is published on https://3dcp.fyi, ready for import into several reference managers, and is continuously updated. 
We encourage researchers to enrich the database by submitting their publications, adding missing works, or suggesting new features.
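The description mentions PageRank on a citation graph whose vertices are articles and whose edges are citations. A minimal power-iteration sketch is shown below. It is not the authors' implementation: the damping factor, iteration count, dangling-node handling, and toy edge list are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph given as (citing, cited) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by works without outgoing citations is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            for cited in out_links[node]:
                new_rank[cited] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


# Toy citation graph with hypothetical work labels: ("B", "A") means B cites A.
toy_edges = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B")]
scores = pagerank(toy_edges)
top_work = max(scores, key=scores.get)
print(top_work)
```

In this toy graph the work cited by all others receives the highest score, and the scores sum to 1 after every iteration.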

  8. Transaction Graph Dataset for the Bitcoin Blockchain - Part 2 of 4 - Dataset...

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    Cite
    cryptodata.center (2024). Transaction Graph Dataset for the Bitcoin Blockchain - Part 2 of 4 - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/transaction-graph-dataset-for-the-bitcoin-blockchain-part-2-of-4
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    CryptoDATA
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain. Details of the datasets are given below:

    FILENAME FORMAT: The filenames have the following format: btc-tx- where For example file btc-tx-100000-149999-aa.bz2 and the rest of the parts if any contain transactions from block 100000 to block 149999 inclusive. The files are compressed with bzip2. They can be uncompressed using command bunzip2.

    TRANSACTION FORMAT: Each line in a file corresponds to a transaction. The transaction has the following format:

    BLOCK TIME FORMAT: The block time file has the following format:

    IMPORTANT NOTE: Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using the block explorer web sites such as https://btcscan.org . The downloaders and users of this dataset accept the full responsibility of using the data in GDPR compliant manner or any other regulations. We provide the data as is and we cannot be held responsible for anything.

    NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14

    @incollection{kilicc2022analyzing, title={Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities}, author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper}, booktitle={Big Data and Artificial Intelligence in Digital Finance}, pages={253--267}, year={2022}, publisher={Springer, Cham} }
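The listing lost the filename template, so the sketch below infers a pattern from the single example filename (btc-tx-100000-149999-aa.bz2). The regular expression and the `parse_block_range` helper are assumptions for illustration, not the dataset's documented format.

```python
import re

# Pattern inferred from the one example filename; an assumption, not a spec.
FILENAME_RE = re.compile(r"^btc-tx-(\d+)-(\d+)(?:-([a-z]+))?\.bz2$")


def parse_block_range(filename):
    """Return (first_block, last_block, part_suffix) parsed from a filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized filename: {filename!r}")
    first_block = int(match.group(1))
    last_block = int(match.group(2))
    return first_block, last_block, match.group(3)


first, last, part = parse_block_range("btc-tx-100000-149999-aa.bz2")
n_blocks = last - first + 1  # the block range is inclusive on both ends
print(first, last, part, n_blocks)
```

Since the range is inclusive, the example file and its sibling parts together cover 50,000 blocks.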

  9. Transaction Graph Dataset for the Ethereum Blockchain - Dataset - CryptoData...

    • cryptodata.center
    Updated Dec 4, 2024
    Cite
    cryptodata.center (2024). Transaction Graph Dataset for the Ethereum Blockchain - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/transaction-graph-dataset-for-the-ethereum-blockchain
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    CryptoDATA
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Unique identifier: https://doi.org/10.5281/zenodo.4718440
    Dataset updated: Dec 19, 2022
    Dataset provided by: Zenodo
    Authors: Can Özturan; Alper Şen; Baran Kılıç
    License: Attribution 4.0 (CC BY 4.0). License information was derived automatically.
    Description:

    This dataset contains ether as well as popular ERC20 token transfer transactions extracted from the Ethereum Mainnet blockchain. Only send ether, contract function call, contract deployment transactions are present in the dataset. Miner reward (static block reward) and "uncle block inclusion reward" are added as transactions to the dataset. Transaction fee reward and "uncles reward" are not currently included in the dataset. Details of the datasets are given below:

    FILENAME FORMAT: The filenames have the following format: eth-tx- where For example file eth-tx-1000000-1099999.txt.bz2 contains transactions from block 1000000 to block 1099999 inclusive. The files are compressed with bzip2. They can be uncompressed using command bunzip2.

    TRANSACTION FORMAT: Each line in a file corresponds to a transaction. The transaction has the following format: units. ERC20 token transfers (transfer and transferFrom function calls in ERC20 contract) are indicated by token symbol. For example GUSD is Gemini USD stable coin. The JSON file erc20tokens.json given below contains the details of ERC20 tokens. Failed transactions are prefixed with "F-".

    BLOCK TIME FORMAT: The block time file has the following format:

    erc20tokens.json FILE: This file contains the list of popular ERC20 token contracts whose transfer/transferFrom transactions appear in the data files.

    ERC20 token list: USDT TRYb XAUt BNB LEO LINK HT HEDG MKR CRO VEN INO PAX INB SNX REP MOF ZRX SXP OKB XIN OMG SAI HOT DAI EURS HPT BUSD USDC SUSD HDG QCAD PLUS BTCB WBTC cWBTC renBTC sBTC imBTC pBTC

    IMPORTANT NOTE: Public Ethereum Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using the block explorer web sites such as http://etherscan.io . The downloaders and users of this dataset accept the full responsibility of using the data in GDPR compliant manner or any other regulations. We provide the data as is and we cannot be held responsible for anything.

    NOTE: If you use this dataset, please do not forget to add the DOI number to the citation. If you use our dataset in your research, please also cite our paper: https://link.springer.com/article/10.1007/s10586-021-03511-0

    @article{kilic2022parallel, title={Parallel Analysis of Ethereum Blockchain Transaction Data using Cluster Computing}, journal={Cluster Computing}, author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and Sen, Alper}, year={2022}, month={Jan} }
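Since the files are bzip2-compressed with one transaction per line, they can be streamed without unpacking them to disk first. The sketch below uses Python's standard `bz2` module on a small temporary file with placeholder lines. The field layout of a transaction is not reproduced in this listing, so lines are treated as opaque strings; the file name and the `count_transactions` helper are hypothetical.

```python
import bz2
import os
import tempfile


def count_transactions(path):
    """Stream a bzip2-compressed text file and count its non-empty lines."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


# Demo on a temporary file with placeholder content (not real transactions).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt.bz2")
    with bz2.open(sample_path, "wt", encoding="utf-8") as handle:
        handle.write("placeholder line 1\nplaceholder line 2\n")
    n_transactions = count_transactions(sample_path)

print(n_transactions)
```

Streaming keeps memory use flat regardless of file size, which matters for block ranges that span many transactions.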

  10. Microsoft Academic Graph

    • zenodo.org
    • explore.openaire.eu
    application/gzip
    Updated Apr 6, 2023
    Cite
    Microsoft Academic (2023). Microsoft Academic Graph [Dataset]. http://doi.org/10.5281/zenodo.2593154
    Available download formats: application/gzip
    Dataset updated
    Apr 6, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Microsoft Academic
    License

    Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
    License information was derived automatically

    Description

    This is the Microsoft Academic Graph data from 2019-03-02. To get this, you'd normally jump through these hoops: https://docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning

    As required by ODC-BY, I acknowledge Microsoft Academic using the URI https://aka.ms/msracad.

    You can find out more about the data schema of the Microsoft Academic Graph at: https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
    Since Microsoft docs are covered by different licensing terms, the documentation cannot be provided along with the data.

    There were no changes to the files except compressing them with gzip. They were downloaded and checked twice.
    After uploading, the md5-hashes from Zenodo match the locally created compressed files.

    The compressed files will expand to the following sizes (in bytes):

         4563254 Affiliations.txt
     16498013834 Authors.txt
         2220754 ConferenceInstances.txt
          427502 ConferenceSeries.txt
        55232571 FieldsOfStudy.txt
         5685746 Journals.txt
     32387110344 PaperAuthorAffiliations.txt
     32326282060 PaperReferences.txt
         7763965 PaperResources.txt
     60135810372 Papers.txt
     22000508908 PaperUrls.txt
    --------------------------------
    163423619310 total (~152GiB)
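The stated total can be checked by summing the per-file sizes listed above and converting bytes to GiB (1 GiB = 2^30 bytes). This is only an arithmetic check on the listed numbers; the variable names are illustrative.

```python
# Uncompressed sizes in bytes, copied from the listing above.
sizes = {
    "Affiliations.txt": 4563254,
    "Authors.txt": 16498013834,
    "ConferenceInstances.txt": 2220754,
    "ConferenceSeries.txt": 427502,
    "FieldsOfStudy.txt": 55232571,
    "Journals.txt": 5685746,
    "PaperAuthorAffiliations.txt": 32387110344,
    "PaperReferences.txt": 32326282060,
    "PaperResources.txt": 7763965,
    "Papers.txt": 60135810372,
    "PaperUrls.txt": 22000508908,
}

total_bytes = sum(sizes.values())
total_gib = total_bytes / 2**30  # 1 GiB = 1,073,741,824 bytes

print(total_bytes, round(total_gib, 1))  # 163423619310 152.2
```

Papers.txt alone accounts for a little over a third of the total, so it is the file to plan disk space around.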

  11. NetVotes ENIC Dataset

    • zenodo.org
    • explore.openaire.eu
    txt, zip
    Updated Oct 1, 2024
    Cite
    Israel Mendonça; Vincent Labatut; Rosa Figueiredo (2024). NetVotes ENIC Dataset [Dataset]. http://doi.org/10.5281/zenodo.6815510
    Available download formats: zip, txt
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Israel Mendonça; Vincent Labatut; Rosa Figueiredo
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).

    These results were used in the following conference papers:

    1. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the European Parliament,” in 2nd European Network Intelligence Conference, 2015, pp. 122–129. ⟨hal-01176090⟩ DOI: 10.1109/ENIC.2015.25
    2. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Informative Value of Negative Links for Graph Partitioning, with an application to European Parliament Votes,” in 6ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques, 2015, p. 12p. ⟨hal-02055158⟩

    Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.

    Citation. If you use our dataset or tool, please cite article [1] above.


    @InProceedings{Mendonca2015,
    author = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},

    title = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
    booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
    year = {2015},
    pages = {122-129},
    address = {Karlskrona, SE},
    publisher = {IEEE Publishing},
    doi = {10.1109/ENIC.2015.25},
    }

    -------------------------

    Details. This archive contains the following folders:

    • `votewatch_data`: the raw data extracted from the VoteWatch website.
      • `VoteWatch Europe European Parliament, Council of the EU.csv`: list of the documents voted during the considered term, with some details such as the date and topic.
      • `votes_by_document`: this folder contains a collection of CSV files, each one describing the outcome of the vote session relatively to one specific document.
      • `intermediate_files`: this folder contains several CSV files:
        • `allvotes.csv`: concatenation of all vote outcomes for all documents and all MEPs. It can be considered a compact representation of the data contained in the folder `votes_by_document`.
        • `loyalty.csv`: same as `allvotes.csv`, but for loyalty (i.e. whether or not the MEP voted like the majority of the MEPs in their political group).
        • `MPs.csv`: list of the MEPs having voted at least once in the considered term, with their details.
        • `policies.csv`: list of the topics considered during the term.
        • `qtd_docs.csv`: list of the topics with the corresponding number of documents.
    • `parallel_ils_results`: contains the raw results of the ILS tool. This is an external algorithm able to estimate the optimal partition of the network nodes in terms of structural balance. It was applied to all the networks extracted by our scripts (from the VoteWatch data), and the produced files were placed here for postprocessing. Each subfolder corresponds to one topic-year pair.
    • `output_files`: contains the files produced by our scripts.
      • `agreement`: histograms representing the distributions of agreement and rebellion indices. Each subfolder corresponds to a specific topic.
      • `community_algorithms_csv`: performances obtained by the partitioning algorithms (for both community detection and correlation clustering). Each subfolder corresponds to a specific topic.
        • `xxxx_cluster_information.csv`: table containing several variants of the imbalance measure, for the considered algorithms.
      • `community_algorithms_results`: comparison of the partitions detected by the various algorithms considered, and distribution of the cluster/community sizes. Each subfolder corresponds to a specific topic.
        • `xxxx_cluster_comparison.csv`: table comparing the partitions detected by the community detection algorithms, in terms of Rand index and other measures.
        • `xxxx_ils_cluster_comparison.csv`: like `xxxx_cluster_comparison.csv`, except we compare the partition of community detection algorithms with that of the ILS.
        • `xxxx_yyyy_distribution.pdf`: histogram of the community (or cluster) sizes detected by algorithm `yyyy`.
      • `graphs`: the networks extracted from the vote data. Each subfolder corresponds to a specific topic.
        • `xxxx_complete_graph.graphml`: network in GraphML format, with all the information: nodes, edges, nodal attributes (including communities), weights, etc.
        • `xxxx_edges_Gephi.csv`: only the links, with their weights (i.e. vote similarity).
        • `xxxx_graph.g`: network in the g format (for ILS).
        • `xxxx_net_measures.csv`: table containing some stats on the network (number of links, etc.).
        • `xxxx_nodes_Gephi.csv`: list of nodes (i.e. MEPs), with details.
      • `plots`: synthesis plots from the paper.
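The cluster comparison files above report the Rand index between partitions. As a reminder of what that measure computes, here is a minimal sketch of the plain (unadjusted) Rand index: the fraction of node pairs on which two partitions agree about being grouped together or apart. The label lists are made up, and this is not the NetVotes implementation.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of node pairs treated the same way by both partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster labels for four nodes under two algorithms.
partition_a = [0, 0, 1, 1]
partition_b = [0, 0, 1, 2]
score = rand_index(partition_a, partition_b)
print(score)
```

Here the two partitions disagree on only one of the six node pairs (the last two nodes), giving 5/6.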

    -------------------------

    License. These data are shared under a Creative Commons 0 license.

    Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>

  12. NetVotes iKnow Dataset

    • data.niaid.nih.gov
    Updated Oct 1, 2024
    Cite
    Labatut, Vincent (2024). NetVotes iKnow Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6816075
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Arınık, Nejat
    Labatut, Vincent
    Figueiredo, Rosa
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Description. This is the data used in the experiment of the following conference paper:

    N. Arınık, R. Figueiredo, and V. Labatut, “Signed Graph Analysis for the Interpretation of Voting Behavior,” in International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities, Graz, AT, 2017, vol. 2025. ⟨hal-01583133⟩

    Source code. The code source is accessible on GitHub: https://github.com/CompNet/NetVotes

    Citation. If you use the data or source code, please cite the above paper.

    @InProceedings{Arinik2017, author = {Arınık, Nejat and Figueiredo, Rosa and Labatut, Vincent}, title = {Signed Graph Analysis for the Interpretation of Voting Behavior}, booktitle = {International Conference on Knowledge Technologies and Data-driven Business - International Workshop on Social Network Analysis and Digital Humanities}, year = {2017}, volume = {2025}, series = {CEUR Workshop Proceedings}, address = {Graz, AT}, url = {http://ceur-ws.org/Vol-2025/paper_rssna_1.pdf},}

    Details.

    RAW INPUT FILES

    The 'itsyourparliament' folder contains all raw input files for further data processing (such as network extraction). The folder structure is as follows:

    • itsyourparliament/
      • domains: There are 28 domain files. Each file corresponds to a domain (such as Agriculture, Economy, etc.) and contains corresponding vote identifiers and their "itsyourparliament.eu" links.
      • meps: There are 870 Member of Parliament (MEP) files. Each file contains the MEP information (such as name, country, address, etc.).
      • votes: There are 7513 vote files. Each file contains the votes expressed by MEPs.

    NETWORKS AND CORRESPONDING PARTITIONS

    This work studies the voting behavior of French and Italian MEPs on "Agriculture and Rural Development" (AGRI) and "Economic and Monetary Affairs" (ECON) for each separate year of the 7th EP term (2009-10, 2010-11, 2011-12, 2012-13, 2013-14). Note that the interpretation part (section 4) of the published paper is limited to only a few of these instances (2009-10 in ECON and 2012-13 in AGRI). The extracted networks are located in the "networks" folder and the corresponding partitions are in the "partitions" folder. Both folders have the same structure, which is as follows:

    COUNTRY-NAME
    |_DOMAIN-NAME
      |_2009-10
      |_2010-11
      |_2011-12
      |_2012-13
      |_2013-14

    NETWORKS

    The networks in this folder are used in the article. All those networks are the ones obtained after the filtering step (as explained in the article). The networks are in 'Graphml' format. These networks are enriched with some MEPs' properties (such as name, political party, etc.) associated with each node.

    ALL NETWORKS

    For those who are interested in other countries or domains, we make available all possible networks that we can extract from raw data, with and without the filtering step.

    COUNTRY-NAME
    |_m3
      |_negtr=NA_postr=NA: This folder contains all filtered networks. Note that the filtering step is explained in Section 2.1.2 of the article.
        |_bygroup
        |_bycountry
      |_negtr=0_postr=0: This folder contains all original networks (i.e. no filtering step).
        |_bygroup
        |_bycountry

    PARTITIONS

    The partitions are obtained in this way: First, the Ex-CC (exact) method is run and we denote by 'k' the number of detected clusters in the output. This 'k' value is the reference point for running the ILS-RCC (heuristic) method, which requires specifying the number of desired clusters in the output. Then, ILS-RCC is run with various values ('k', 'k+1', 'k+2'). All those results are integrated into the initial network graphml files and then converted into gephi format, which makes it easier to explore the results interactively. Note that we need to handle the absent MEPs in clustering results, because those MEPs correspond to isolated nodes in networks. Each isolated node is considered a single cluster node in Ex-CC results. We simply omit those nodes in order to find the 'k' (number of detected clusters) value before running ILS-RCC. Note also that ILS-RCC does not process isolated nodes such that an isolated node can be part of a cluster.
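Both partitioning methods are evaluated in terms of imbalance on signed networks. The sketch below uses one common correlation-clustering definition: the total weight of negative links inside clusters plus positive links between clusters. Whether this matches the exact variant used for this dataset is an assumption, and the toy network, node names, and `imbalance` helper are hypothetical.

```python
def imbalance(edges, cluster_of):
    """Sum the weights of links that contradict the partition.

    edges: iterable of (node_u, node_v, signed_weight)
    cluster_of: dict mapping each node to its cluster label
    """
    total = 0.0
    for node_u, node_v, weight in edges:
        same_cluster = cluster_of[node_u] == cluster_of[node_v]
        if same_cluster and weight < 0:
            total += -weight  # negative link inside a cluster
        elif not same_cluster and weight > 0:
            total += weight  # positive link between clusters
    return total


# Toy signed network with made-up nodes and weights.
toy_edges = [("a", "b", 1.0), ("b", "c", -0.5), ("a", "c", 0.25), ("c", "d", 1.0)]

two_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
one_cluster = {"a": 0, "b": 0, "c": 0, "d": 0}

print(imbalance(toy_edges, two_clusters), imbalance(toy_edges, one_cluster))
```

In this toy case the two-cluster partition only misplaces the weak positive link between a and c, while the single-cluster partition misplaces the negative link between b and c.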

    ----------------------

    COMPARISON RESULTS

    The 'material-stats' folder contains all the comparison results obtained for Ex-CC and ILS-CC. The csv files associated with plots are also provided. The folder structure is as follows:

    • material-stats/
      • execTimePerf: The plot shows the execution time of Ex-CC and ILS-CC based on randomly generated complete networks of different size.
      • graphStructureAnalysis: The plots show the weights and links statistics for all instances.
      • ILS-CC-vs-Ex-CC: The folder contains 4 different comparisons between Ex-CC and ILS-CC: imbalance difference, number of detected clusters, difference of the number of detected clusters, NMI (Normalized Mutual Information).

    ----------------------

    Funding: Agorantic FR 3621, FMJH Program Gaspard Monge in optimization and operation research (Project 2015-2842H)

  13. Leading Indicators OECD: Reference Series: Gross Domestic Product: Original...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Jul 28, 2019
    Cite
    TRADING ECONOMICS (2019). Leading Indicators OECD: Reference Series: Gross Domestic Product: Original Series for the United States [Dataset]. https://tradingeconomics.com/united-states/leading-indicators-oecd-reference-series-gross-domestic-product-original-series-for-the-united-states-fed-data.html
    Available download formats: excel, xml, json, csv
    Dataset updated
    Jul 28, 2019
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1976 - Dec 31, 2025
    Area covered
    United States
    Description

    Leading Indicators OECD: Reference Series: Gross Domestic Product: Original Series for the United States was 120.60270 Index 2010=1.00 in October of 2023, according to the United States Federal Reserve. Historically, the series reached a record high of 120.60270 in October of 2023 and a record low of 11.55572 in July of 1947. Trading Economics provides the current actual value, a historical data chart and related indicators for the series, last updated from the United States Federal Reserve in July of 2025.

  14. F

    Federal Debt: Total Public Debt

    • fred.stlouisfed.org
    json
    Updated Jun 3, 2025
    + more versions
    Cite
    (2025). Federal Debt: Total Public Debt [Dataset]. https://fred.stlouisfed.org/series/GFDEBTN
    Available download formats: json
    Dataset updated
    Jun 3, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Description

    Graph and download economic data for Federal Debt: Total Public Debt (GFDEBTN) from Q1 1966 to Q1 2025 about public, debt, federal, government, and USA.

  15. E

    EconBiz Images for Text Extraction from Scholarly Figures

    • live.european-language-grid.eu
    • data.niaid.nih.gov
    json
    Updated Apr 14, 2016
    Cite
    (2016). EconBiz Images for Text Extraction from Scholarly Figures [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/7506
    Available download formats: json
    Dataset updated
    Apr 14, 2016
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    "Scholarly figures are data visualizations like bar charts, pie charts, line graphs, maps, scatter plots or similar figures. Text extraction from scholarly figures is useful in many application scenarios, since text in scholarly figures often contains information that is not present in the surrounding text. This dataset is a corpus of 121 scholarly figures from the economics domain evaluating text extraction tools. We randomly extracted these figures from a corpus of 288,000 open access publications from EconBiz. The dataset resembles a wide variety of scholarly figures from bar charts to maps. We manually labeled the figures to create the gold standard.

    We adjusted the provided gold standard to have a uniform format for all datasets. Each figure is accompanied by a TSV file (tab-separated values) where each entry corresponds to a text line which has the following structure:

    X-coordinate of the center of the bounding box in pixels

    Y-coordinate of the center of the bounding box in pixels

    Width of the bounding box in pixels

    Height of the bounding box in pixels

    Rotation angle around its center in degrees

    Text inside the bounding box

    In addition we provide the ground truth in JSON format. A schema file is included in each dataset as well. The dataset is accompanied with a ReadMe file with further information about the figures and their origin.

    If you use this dataset in your own work, please cite one of the papers in the references."
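The TSV layout described above could be read along these lines. This is a sketch under stated assumptions rather than the dataset's official loader: it assumes the six fields appear in the listed order, that the files have no header row, and that the first five values parse as numbers. The `TextLine` class, `read_text_lines` function, and sample rows are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class TextLine:
    x_center: float      # X-coordinate of the bounding-box center, in pixels
    y_center: float      # Y-coordinate of the bounding-box center, in pixels
    width: float         # bounding-box width, in pixels
    height: float        # bounding-box height, in pixels
    rotation_deg: float  # rotation around the center, in degrees
    text: str            # text inside the bounding box


def read_text_lines(tsv_text: str) -> List[TextLine]:
    """Parse tab-separated gold-standard entries into TextLine records."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    lines = []
    for row in reader:
        if len(row) < 6:
            continue  # skip blank or incomplete rows
        x, y, w, h, angle = (float(v) for v in row[:5])
        # Re-join any extra columns in case the text itself contained a tab.
        lines.append(TextLine(x, y, w, h, angle, "\t".join(row[5:])))
    return lines


# Made-up sample rows, for illustration only.
sample = "120\t45\t200\t18\t0\tGDP growth (%)\n30\t150\t90\t14\t90\tYear\n"
parsed = read_text_lines(sample)
for line in parsed:
    print(line)
```

Quoting is disabled so that quotation marks inside figure text are kept literally instead of being treated as CSV quoting.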

  16. C

    HRApop Full Dataset Graph

    • lod.humanatlas.io
    jsonld
    Updated Jun 15, 2025
    + more versions
    Cite
    Andreas Bueckle; Bruce Herr; Lu Chen; Daniel Bolin; Vicky Daiya; Devin Wright; Danial Qaurooni; Fusheng Wang; Katy Börner (2025). HRApop Full Dataset Graph [Dataset]. https://lod.humanatlas.io/ds-graph/hra-pop-full/latest/
    Available download formats: jsonld
    Dataset updated
    Jun 15, 2025
    Authors
    Andreas Bueckle; Bruce Herr; Lu Chen; Daniel Bolin; Vicky Daiya; Devin Wright; Danial Qaurooni; Fusheng Wang; Katy Börner
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    National Institutes of Health
    Description

    This ds-graph represents this information for the Human Reference Atlas Cell Type Populations Effort (Börner et al. 2025). It provides sample registration information submitted by consortium members in single-cell atlassing efforts, including accurate sample sizes and positions (Bueckle et al. 2025). When combined with ref-organ data, this information helps create 3D visual tissue sample placements. Additionally, the sample information is linked to datasets from researchers' assay analyses that offer deeper insights into the tissue samples. The “ds” stands for “dataset.” ds-graphs represent datasets by tissue sample and donor. It is a dataset graph for the HRApop Universe. It includes all datasets considered for HRApop (not enriched).

    Bibliography:

    • Börner, Katy, Philip D. Blood, Jonathan C. Silverstein, Matthew Ruffalo, Rahul Satija, Sarah A. Teichmann, Gloria J. Pryhuber, et al. 2025. “Human BioMolecular Atlas Program (HuBMAP): 3D Human Reference Atlas Construction and Usage.” Nature Methods, March, 1–16. https://doi.org/10.1038/s41592-024-02563-5.
    • Bueckle, Andreas, Bruce W. Herr II, Josef Hardi, Ellen M. Quardokus, Mark A. Musen, and Katy Börner. 2025. “Construction, Deployment, and Usage of the Human Reference Atlas Knowledge Graph for Linked Open Data.” bioRxiv. https://doi.org/10.1101/2024.12.22.630006.
    • Lonsdale, John, Jeffrey Thomas, Mike Salvatore, Rebecca Phillips, Edmund Lo, Saboor Shad, Richard Hasz, et al. 2013. “The Genotype-Tissue Expression (GTEx) Project.” Nature Genetics 45 (6): 580–85. https://doi.org/10.1038/ng.2653.
    • Börner, Katy, Andreas Bueckle, Bruce W. Herr II, Leonard E. Cross, Ellen M. Quardokus, Elizabeth G. Record, Yingnan Ju, et al. 2022. “Tissue Registration and Exploration User Interfaces in Support of a Human Reference Atlas.” Communications Biology 5 (1): 1369. https://doi.org/10.1038/s42003-022-03644-x.
  17. P

    CHOCOLATE Dataset

    • paperswithcode.com
    Updated Dec 20, 2023
    Cite
    Kung-Hsiang Huang; Mingyang Zhou; Hou Pong Chan; Yi R. Fung; Zhenhailong Wang; Lingyu Zhang; Shih-Fu Chang; Heng Ji (2023). CHOCOLATE Dataset [Dataset]. https://paperswithcode.com/dataset/chocolate
    Dataset updated
    Dec 20, 2023
    Authors
    Kung-Hsiang Huang; Mingyang Zhou; Hou Pong Chan; Yi R. Fung; Zhenhailong Wang; Lingyu Zhang; Shih-Fu Chang; Heng Ji
    Description

    CHOCOLATE is a benchmark for detecting and correcting factual inconsistency in generated chart captions. It consists of captions produced by six advanced models, which are categorized into three subsets:

    LVLM: GPT-4V, Bard (before Gemini)
    LLM-based Pipeline: DePlot + GPT-4
    Fine-tuned Model: ChartT5, MatCha, UniChart

    The charts are from two datasets: VisText and the Pew split of Chart-to-Text. In total, CHOCOLATE consists of 1,187 examples. Each instance in CHOCOLATE consists of a caption generated by one of the models and the annotations of the factual errors for each caption sentence.

    Paper Information

    Paper: https://arxiv.org/abs/2312.10160
    Code: https://github.com/khuangaf/CHOCOLATE/
    Project: https://khuangaf.github.io/CHOCOLATE

    Citation

    If you use the CHOCOLATE dataset in your work, please kindly cite the paper using this BibTeX:

    @misc{huang-etal-2023-do,
      title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
      author = "Huang, Kung-Hsiang and Zhou, Mingyang and Chan, Hou Pong and Fung, Yi R. and Wang, Zhenhailong and Zhang, Lingyu and Chang, Shih-Fu and Ji, Heng",
      year = {2023},
      eprint = {2312.10160},
      archivePrefix = {arXiv},
      primaryClass = {cs.CL}
    }

  18. T

    Leading Indicators OECD: Reference series: Gross Domestic Product (GDP):...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated May 15, 2025
    Cite
    TRADING ECONOMICS (2025). Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Normalised for the United States [Dataset]. https://tradingeconomics.com/united-states/leading-indicators-oecd-reference-series-gross-domestic-product-gdp-normalised-for-the-united-states-fed-data.html
    Available download formats: json, csv, excel, xml
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1976 - Dec 31, 2025
    Area covered
    United States
    Description

    Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Normalised for the United States was 100.49120 Index in November of 2023, according to the United States Federal Reserve. Historically, the series reached a record high of 102.98160 in May of 1973 and a record low of 92.02608 in May of 2020. Trading Economics provides the current actual value, a historical data chart and related indicators for the series, last updated from the United States Federal Reserve in July of 2025.

  19. T

    Leading Indicators OECD: Reference series: Gross Domestic Product (GDP):...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated May 15, 2025
    Cite
    TRADING ECONOMICS (2025). Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Trend for the United States [Dataset]. https://tradingeconomics.com/united-states/leading-indicators-oecd-reference-series-gross-domestic-product-gdp-trend-for-the-united-states-fed-data.html
    Available download formats: xml, csv, excel, json
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1976 - Dec 31, 2025
    Area covered
    United States
    Description

    Leading Indicators OECD: Reference series: Gross Domestic Product (GDP): Trend for the United States was 119.91160 Index in November of 2023, according to the United States Federal Reserve. Historically, the series reached a record high of 119.91160 in November of 2023 and a record low of 11.29785 in February of 1947. Trading Economics provides the current actual value, a historical data chart and related indicators for the series, last updated from the United States Federal Reserve in July of 2025.

  20. h

    larynx-male (v1.0) graph data

    • purl.humanatlas.io
    application/n-quads +5
    Updated Jun 12, 2025
    + more versions
    Cite
    HRA Digital Object Processor (2025). larynx-male (v1.0) graph data [Dataset]. https://purl.humanatlas.io/ref-organ/larynx-male/v1.0
    Available download formats: rdf, ttl, application/n-triples, application/n-quads, json, jsonld
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    HRA Digital Object Processor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The graph representation of the 3D Reference Organ for Larynx, Male dataset.
