85 datasets found
  1. Maine Beach Profiling Graph Data Table

    • maine.hub.arcgis.com
    • mgs-maine.opendata.arcgis.com
    • +3 more
    Updated Apr 3, 2023
    Cite
    State of Maine (2023). Maine Beach Profiling Graph Data Table [Dataset]. https://maine.hub.arcgis.com/maps/maine-beach-profiling-graph-data-table
    Dataset updated
    Apr 3, 2023
    Dataset authored and provided by
    State of Maine
    Area covered
    Pacific Ocean, South Pacific Ocean
    Description

    All data approved by the beach profiling administrator is included in this table. The data is formatted for production of the beach profile graphs.

  2. Sample Graph Datasets in CSV Format

    • zenodo.org
    csv
    Updated Dec 9, 2024
    + more versions
    Cite
    Edwin Carreño (2024). Sample Graph Datasets in CSV Format [Dataset]. http://doi.org/10.5281/zenodo.14335015
    Available download formats: csv
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Edwin Carreño
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Note: None of the datasets published here contain actual data; they are for testing purposes only.

    This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:

    • dataset_30_nodes_interactions.csv: contains 30 rows (nodes).
    • dataset_30_edges_interactions.csv: contains 47 rows (edges).
    • The common identifier dataset_30 refers to the same graph.
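
    The naming convention above can be used to pair node and edge files automatically. The following is a minimal sketch, assuming every file name follows the `<identifier>_nodes_...` / `<identifier>_edges_...` pattern shown in the example; the function name `group_by_graph` is hypothetical and not part of the repository.

```python
from pathlib import Path


def group_by_graph(file_names):
    """Group CSV file names by their common identifier, e.g. 'dataset_30'.

    Assumes names look like '<identifier>_nodes_<suffix>.csv' or
    '<identifier>_edges_<suffix>.csv' (an assumption based on the example).
    """
    groups = {}
    for name in file_names:
        stem = Path(name).stem  # file name without the '.csv' extension
        for marker in ("_nodes_", "_edges_"):
            if marker in stem:
                identifier = stem.split(marker)[0]
                kind = marker.strip("_")  # 'nodes' or 'edges'
                groups.setdefault(identifier, {})[kind] = name
    return groups


pairs = group_by_graph([
    "dataset_30_nodes_interactions.csv",
    "dataset_30_edges_interactions.csv",
])
print(pairs)
```

    Each entry of the returned dictionary maps one graph identifier to its node file and edge file, which can then be loaded together.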

    CSV nodes

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    UniProt ID | string | protein identification
    label | string | protein label (type of node)
    properties | string | a dictionary containing properties related to the protein

    CSV edges

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    Relationship ID | string | relationship identification
    Source ID | string | identification of the source protein in the relationship
    Target ID | string | identification of the target protein in the relationship
    label | string | relationship label (type of relationship)
    properties | string | a dictionary containing properties related to the relationship

    Metadata

    Graph | Number of Nodes | Number of Edges | Sparse graph
    dataset_30* | 30 | 47 | Y
    dataset_60* | 60 | 181 | Y
    dataset_120* | 120 | 689 | Y
    dataset_240* | 240 | 2819 | Y
    dataset_300* | 300 | 4658 | Y
    dataset_600* | 600 | 18004 | Y
    dataset_1200* | 1200 | 71785 | Y
    dataset_2400* | 2400 | 288600 | Y
    dataset_3000* | 3000 | 449727 | Y
    dataset_6000* | 6000 | 1799413 | Y
    dataset_12000* | 12000 | 7199863 | Y
    dataset_24000* | 24000 | 28792361 | Y
    dataset_30000* | 30000 | 44991744 | Y

    This repository includes two (2) additional tiny graph datasets for experimenting before dealing with the larger datasets.

    CSV nodes (tiny graphs)

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    ID | string | node identification
    label | string | node label (type of node)
    properties | string | a dictionary containing properties related to the node

    CSV edges (tiny graphs)

    Each dataset contains the following columns:

    Name of the Column | Type | Description
    ID | string | relationship identification
    source | string | identification of the source node in the relationship
    target | string | identification of the target node in the relationship
    label | string | relationship label (type of relationship)
    properties | string | a dictionary containing properties related to the relationship

    Metadata (tiny graphs)

    Graph | Number of Nodes | Number of Edges | Sparse graph
    dataset_dummy* | 3 | 6 | N
    dataset_dummy2* | 3 | 6 | N
  3. Graph Database Landscape Analysis by Graph Database Platform and Services...

    • futuremarketinsights.com
    pdf
    Updated Apr 30, 2024
    Cite
    Graph Database Landscape Analysis by Graph Database Platform and Services from 2024 to 2034 [Dataset]. https://www.futuremarketinsights.com/reports/graph-database-market
    Available download formats: pdf
    Dataset updated
    Apr 30, 2024
    Dataset authored and provided by
    Future Market Insights
    License

    https://www.futuremarketinsights.com/privacy-policy

    Time period covered
    2024 - 2034
    Area covered
    Worldwide
    Description

    The global graph database market is projected to grow at a CAGR of 19.4% through 2034. With growing data usage and rising data storage requirements, the market size is expected to increase from US$ 3.17 billion in 2024 to US$ 18.68 billion in 2034. Technological advancements are also expected to support the industry's growth prospects.

    Attributes | Key Insights
    Estimated Industry Size in 2024 | US$ 3.17 billion
    Projected Industry Value in 2034 | US$ 18.68 billion
    Value-based CAGR from 2024 to 2034 | 19.4%

    Growing Technology to Enlarge the Global Graph Database Market Size

    Attributes | Values
    Historical CAGR | 16.5%
    Valuation in 2019 | US$ 1.46 billion
    Valuation in 2023 | US$ 2.69 billion
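
    The figures above can be cross-checked with the standard compound annual growth rate formula. This is a small sketch, assuming annual compounding, a 10-year forecast window (2024 to 2034), and a 4-year historical window (2019 to 2023); the function names are illustrative only.

```python
def project_value(start: float, cagr: float, years: int) -> float:
    """Value after `years` of compounding at rate `cagr` (e.g. 0.194)."""
    return start * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end / start) ** (1.0 / years) - 1.0


# Forecast: US$ 3.17 billion in 2024 growing at 19.4% per year for 10 years.
projected_2034 = project_value(3.17, 0.194, 10)

# Historical: US$ 1.46 billion (2019) to US$ 2.69 billion (2023), 4 years.
historical_cagr = implied_cagr(1.46, 2.69, 4)

print(f"Projected 2034 value: US$ {projected_2034:.2f} billion")
print(f"Implied historical CAGR: {historical_cagr:.1%}")
```

    Under these assumptions both results land close to the reported US$ 18.68 billion and 16.5%; small differences are expected from rounding in the published figures.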

    Country-wise Analysis

    Countries | Forecasted CAGR
    Germany | 8.9%
    Japan | 9.2%
    The United States of America | 13.5%
    China | 19.9%
    Australia | 22.9%

    Category-wise Insights

    Category: Solution - Graph Database Platform
    Industry Share in 2024: 63.3%
    Segment Drivers
    • Seamless integration with traditional systems increases their usability, enhancing the demand for these platforms.
    • Excellent scalability of these platforms elevates the usability standards.
    • The wider adaptability drives the segment’s popularity, fueling the global graph database market size.
    Category: Application - Fraud & Risk Analytics
    Industry Share in 2024: 24.3%
    Segment Drivers
    • Due to the ability of graph databases to deliver accurate risk forecasting, they save a lot of time and money.
    • With the rising financial transactions, the demand for fraud detection and potential risk identification is rising.
    • Therefore, the increasing popularity of the segment drives the global graph database market size.
  4. KG20C Scholarly Knowledge Graph

    • kaggle.com
    zip
    Updated Nov 4, 2021
    Cite
    T H N (2021). KG20C Scholarly Knowledge Graph [Dataset]. https://www.kaggle.com/tranhungnghiep/kg20c-scholarly-knowledge-graph
    Available download formats: zip (851624 bytes)
    Dataset updated
    Nov 4, 2021
    Authors
    T H N
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Context

    This knowledge graph is constructed to aid research in scholarly data analysis. It can serve as a standard benchmark dataset for several tasks, including knowledge graph embedding, link prediction, recommendation systems, and question answering about high quality papers from 20 top computer science conferences.

    The dataset was introduced and used in the PhD thesis Multi-Relational Embedding for Knowledge Graph Representation and Analysis and in the TPDL'19 paper Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space.

    Content

    Construction protocol

    Scholarly data

    From the Microsoft Academic Graph dataset, we extracted high quality computer science papers published in top conferences between 1990 and 2010. The list of top conferences is based on the CORE ranking A* conferences. The data was cleaned by removing conferences with fewer than 300 publications and papers with fewer than 20 citations. The final list includes 20 top conferences: AAAI, AAMAS, ACL, CHI, COLT, DCC, EC, FOCS, ICCV, ICDE, ICDM, ICML, ICSE, IJCAI, NIPS, SIGGRAPH, SIGIR, SIGMOD, UAI, and WWW.

    Knowledge graph

    The scholarly dataset was converted to a knowledge graph by defining the entities and the relations and constructing the triples. The knowledge graph can be seen as a labeled multi-digraph between scholarly entities, where the edge labels express the relationships between the nodes. We use 5 intrinsic entity types: Paper, Author, Affiliation, Venue, and Domain. We also use 5 intrinsic relation types between the entities: author_in_affiliation, author_write_paper, paper_in_domain, paper_cite_paper, and paper_in_venue.

    Benchmark data splitting

    The knowledge graph was split uniformly at random into training, validation, and test sets. We made sure that all entities and relations in the validation and test sets also appear in the training set so that their embeddings can be learned. We also made sure that there is no data leakage and there are no redundant triples in these splits, so they constitute a challenging benchmark for link prediction, similar to WN18RR and FB15K-237.
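
    The two split properties described above can be checked mechanically. The sketch below is an illustration only, not the authors' procedure: it assumes triples are held as (head, relation, tail) tuples, and the function name and toy identifiers are hypothetical.

```python
def split_issues(train, held_out):
    """Return (unseen_entities, unseen_relations, leaked_triples).

    unseen_*: symbols in the held-out set that never occur in training.
    leaked_triples: triples present in both sets.
    """
    train_entities = {e for h, _, t in train for e in (h, t)}
    train_relations = {r for _, r, _ in train}
    unseen_entities = {e for h, _, t in held_out for e in (h, t)} - train_entities
    unseen_relations = {r for _, r, _ in held_out} - train_relations
    leaked_triples = set(train) & set(held_out)
    return unseen_entities, unseen_relations, leaked_triples


# Toy triples with made-up identifiers, for illustration only.
train = [
    ("a1", "author_write_paper", "p1"),
    ("a2", "author_write_paper", "p2"),
    ("p1", "paper_cite_paper", "p2"),
]
good_test = [("a1", "author_write_paper", "p2")]
bad_test = [("a3", "author_write_paper", "p1"), ("p1", "paper_cite_paper", "p2")]

print(split_issues(train, good_test))
print(split_issues(train, bad_test))
```

    A split that satisfies the description returns three empty sets; the second call reports an unseen entity and a leaked triple.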

    Data content

    File format

    All files are in tab-separated-values format, compatible with other popular benchmark datasets including WN18RR and FB15K-237. For example, train.txt includes "28674CFA author_in_affiliation 075CFC38", which denotes that the author with id 28674CFA works in the affiliation with id 075CFC38. The repo includes these files:

    • all_entity_info.txt contains id name type of all entities
    • all_relation_info.txt contains id of all relations
    • train.txt contains training triples of the form entity_1_id relation_id entity_2_id
    • valid.txt contains validation triples
    • test.txt contains test triples
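
    As an illustration of the tab-separated triple format described above, the following sketch parses triples with Python's standard csv module. It reads from an in-memory string so that it is self-contained; with the real files one would pass an open file handle for train.txt instead. The function name `read_triples` is hypothetical.

```python
import csv
import io


def read_triples(handle):
    """Parse tab-separated (entity_1_id, relation_id, entity_2_id) rows."""
    reader = csv.reader(handle, delimiter="\t")
    # Skip blank or malformed rows that do not have exactly three fields.
    return [tuple(row) for row in reader if len(row) == 3]


# The example triple quoted in the dataset description.
sample = io.StringIO("28674CFA\tauthor_in_affiliation\t075CFC38\n")
triples = read_triples(sample)
print(triples)
```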

    Statistics

    Data statistics of the KG20C knowledge graph:

    Author | Paper | Conference | Domain | Affiliation
    8,680 | 5,047 | 20 | 1,923 | 692

    Entities | Relations | Training triples | Validation triples | Test triples
    16,362 | 5 | 48,213 | 3,670 | 3,724

    Acknowledgements

    For the dataset and semantic query method, please cite:
    • Hung Nghiep Tran and Atsuhiro Takasu. Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space. In Proceedings of International Conference on Theory and Practice of Digital Libraries (TPDL), 2019.

    For the MEI knowledge graph embedding model, please cite:
    • Hung Nghiep Tran and Atsuhiro Takasu. Multi-Partition Embedding Interaction with Block Term Format for Knowledge Graph Completion. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2020.

    For the baseline results and extended semantic query method, please cite:
    • Hung Nghiep Tran. Multi-Relational Embedding for Knowledge Graph Representation and Analysis. PhD Dissertation, The Graduate University for Advanced Studies, SOKENDAI, Japan, 2020.

    For the Microsoft Academic Graph dataset, please cite:
    • Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the International Conference on World Wide Web (WWW), 2015.

    Inspiration

    We include baseline results for two tasks on the KG20C dataset: link prediction and semantic queries. Link prediction is a relational query task: given a relation and the head or tail entity, predict the corresponding tail or head entities. Semantic queries are human-friendly queries on the scholarly data. MRR is the mean reciprocal rank, and Hit@k is the percentage of correct predictions ranked in the top k.
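
    The two metrics can be computed directly from the rank assigned to each correct answer. The sketch below uses the standard definitions with 1-based ranks; the rank list is made up for illustration and is unrelated to the reported results.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose correct answer is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four queries.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)   # (1 + 0.5 + 0.25 + 0.1) / 4
hits_at_3 = hits_at_k(ranks, 3)     # two of four ranks are <= 3
print(mrr, hits_at_3)
```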

    For more information, please refer to the citations.

    Link prediction results

    We report results for 4 methods: Random, which is a random guess that shows the task difficulty; Word2vec, which is a popular embedding method; and SimplE/CP and MEI, which are two recent knowledge graph embedding methods.

    All models are in small size settings, equivalent to total embedding size 100 (50x2 for Word2vec and SimplE/CP, 10x10 for MEI).

    Models | MRR | Hit@1 | Hit@3 | Hit@10
    Random | 0.001 | < 5e-4 | < 5e-4 | < 5e-4
    Word2vec (small) | 0.068 | 0.011 | 0.070 | 0.177
    SimplE/CP (small) | 0.215 | 0.148 | 0.234 | 0.348
    MEI (small) | 0.230 | 0.157 | 0.258 | 0.368

    Semantic queries results

    The following results demonstrate semantic queries on knowledge graph embedding space, using the above MEI (small) model.

    Queries | MRR | Hit@1 | Hit@3 | Hit@10
    Who may work at this organization? | 0.299 | 0.221 | 0.342 | 0.440
    Where may this author work at? | 0.626 | 0.562 | 0.669 | 0.731
    Who may write this paper? | 0.247 | 0.164 | 0.283 | 0.405
    What papers may this author write? | 0.273 | 0.182 | 0.324 | 0.430
    Which papers may cite this paper? | 0.116 | 0.033 | 0.120 | 0.290
    Which papers may this paper cite? | 0.193 | 0.097 | 0.225 | 0.404
    Which papers may belong to this domain? | 0.052 | 0.025 | 0.049 | 0.100
    Which may be the domains of this paper? | 0.189 | 0.114 | 0.206 | 0.333
    Which papers may publish in this conference? | 0.148 | 0.084 | 0.168 | 0.257
    Which conferences may this paper publish in? | 0.693 | 0.542 | 0.810 | 0.976
  5. Data from: The OREGANO knowledge graph for computational drug repurposing

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    bin, tsv
    Updated Nov 13, 2023
    Cite
    Marina Boudin; Fleur Mougin; Gayo Diallo; Martin Drancé (2023). The OREGANO knowledge graph for computational drug repurposing [Dataset]. http://doi.org/10.5281/zenodo.10103842
    Available download formats: bin, tsv
    Dataset updated
    Nov 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Marina Boudin; Fleur Mougin; Gayo Diallo; Martin Drancé
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 16, 2023
    Description

    The files here are data files from the OREGANO project, which consists of building a holistic knowledge graph on drugs, including natural compounds. Here is the list of files:

    - OREGANO_V2.tsv: The triplet file used for link prediction. 3 columns: Subject; Predicate; Object

    - oreganov2.1_metadata_complet.ttl : The OREGANO knowledge graph in turtle format with the names and cross-references of the various integrated entities.

    The following files contain the cross-references of OREGANO entities according to their type. They are all organised as follows: the external sources are the titles of the columns and each line begins with the identifier of the entity in OREGANO :

    - TARGET.tsv: Cross-reference table of the 22,096 targets.
    - PHENOTYPES.tsv: Cross-reference table of the 11,605 phenotypes.
    - DISEASES.tsv: Cross-reference table of the 18,333 diseases.
    - PATHWAYS.tsv: Cross-reference table of the 2,129 pathways.
    - GENES.tsv: Cross-reference table of the 35,794 genes.
    - COMPOUND.tsv: Cross-reference table of the 90,868 compounds.
    - INDICATIONS.tsv: Cross-reference table of the 2,714 indications.
    - SIDE_EFFECT.tsv: Cross-reference table of the 6,060 side-effects.
    - ACTIVITY.tsv: Names of the 78 activities.
    - EFFECT.tsv: Names of the 171 effects.

    The OREGANO knowledge graph is composed of 11 types of nodes and 19 types of links. The current version of the graph contains 88,937 nodes and 824,231 links.

    A SPARQL endpoint has been provided to enable users to retrieve and explore the knowledge graph (OREGANO SPARQL endpoint).

    The integration files and the knowledge graph are available in the Integration folder of the OREGANO project's GitHub repository.

  6. Key generic technology prediction in patent citation using graph neural...

    • dataone.org
    • data.niaid.nih.gov
    • +1 more
    Updated Jun 5, 2024
    Cite
    M. L. Ding (2024). Key generic technology prediction in patent citation using graph neural networks [Dataset]. http://doi.org/10.5061/dryad.nk98sf803
    Dataset updated
    Jun 5, 2024
    Dataset provided by
    Dryad Digital Repository
    Authors
    M. L. Ding
    Time period covered
    Jan 11, 2024
    Description

    With the rapid advancement of the Fourth Industrial Revolution, international competition in technology and industry is intensifying. However, in the era of big data and large-scale science, making accurate judgments about the key areas of technology and innovative trends has become exceptionally difficult. This paper constructs a patent indicator evaluation system based on the dimensions of key and generic patent citation, integrates graph neural network modeling to predict key common technologies, and confirms the effectiveness of the method using the field of genetic engineering as an example. According to the LDA topic model, the main technical R&D directions in genetic engineering are genetic analysis and detection technologies, the application of microorganisms in industrial production, virology research involving vaccine development and immune responses, high-throughput sequencing and analysis technologies in genomics, targeted drug design and molecular therapeutic strategies...

    These datasets were obtained from the Incopat patent database for cited patents (2013-2022) in the field of genetic engineering. Details for the datasets are provided in the README file. This directory contains the selection of the patent datasets.

    1) Table of key generic indicators for nodes (partial 1).csv: This file consists of 10 indicators of patents: technical coverage, patent families, patent family citation, patent cooperation, enterprise-enterprise cooperation, industry-university-research cooperation, claims, citation frequency, layout countries, and layout countries.
    2) Table of key generic indicators for nodes (partial 2).csv: This file consists of 10 indicators of patents: technical convergence, cited countries, inventors, citations, homologous countries/areas, degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and PageRank.
    3) patent.content: The content file contains descriptions of the patents in the following format:

    This README file was generated on 2023-11-25 by Mingli Ding.

    GENERAL INFORMATION

    1. Author Information
       Investigators Contact Information
       Name: Mingli Ding; Wangke Yu; Shuhua Wang
       Institution: Jingdezhen Ceramic University
       Address: Jingdezhen, Jiangxi, China
       Email: mlding1@163.com
    2. Date of data collection: 2013-2022

    DATA & FILE OVERVIEW

    1. File List:

    A) Table of key generic indicators for nodes (partial 1).csv

    B) Table of key generic indicators for nodes (partial 2).csv

    C) patent.content

    D) patent.cites

    E) Graph neural network modeling highest accuracy for different dimensions.csv

    F) Prediction effects of key generic technologies.csv

    DATA-SPECIFIC INFORMATION FOR: Table of key generic indicators for nodes (partial 1).csv

    1. Number of variables: 10
    2. Number of cases/rows: 72489
    3. Variable List:
    • technical coverage: number ...
  7. CDC's PRAMS Online Data for Epidemiological Research (CPONDER)

    • dataverse.harvard.edu
    • data.niaid.nih.gov
    Updated Nov 30, 2010
    Cite
    Harvard Dataverse (2010). CDC's PRAMS Online Data for Epidemiological Research (CPONDER) [Dataset]. http://doi.org/10.7910/DVN/1JPCH8
    Dataset updated
    Nov 30, 2010
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This interactive tool allows users to generate tables and graphs on information relating to pregnancy and childbirth. All data comes from the CDC's PRAMS. Topics include breastfeeding, prenatal care, insurance coverage, and alcohol use during pregnancy.

    Background: CPONDER is the interactive online data tool for the Centers for Disease Control and Prevention (CDC)'s Pregnancy Risk Assessment Monitoring System (PRAMS). PRAMS gathers state and national level data on a variety of topics related to pregnancy and childbirth. Examples of information include breastfeeding, alcohol use, multivitamin use, prenatal care, and contraception.

    User Functionality: Users select choices from three drop-down menus to search for data. The menus are state, year, and topic. Users can then select the specific PRAMS question they are interested in, and the data table or graph will appear. Users can then compare that question to another state or to another year to generate a new data table or graph.

    Data Notes: The data source for CPONDER is PRAMS. The data covers every year between 2000 and 2008 and is available at the state and national level. However, states must have participated in PRAMS to be part of CPONDER. Not every state, and not every year for every state, is available.

  8. anatomical-systems (v1.1) graph data

    • purl.humanatlas.io
    application/n-quads +4
    Updated Dec 12, 2024
    + more versions
    Cite
    HRA Digital Object Processor (2024). anatomical-systems (v1.1) graph data [Dataset]. https://purl.humanatlas.io/asct-b/anatomical-systems/v1.1
    Available download formats: ttl, jsonld, rdf, application/n-quads, application/n-triples
    Dataset updated
    Dec 12, 2024
    Dataset authored and provided by
    HRA Digital Object Processor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The graph representation of the Anatomical Structures, Cell Types, plus Biomarkers (ASCT+B) table for Anatomical Systems dataset.

  9. Data from: NeMig - A Bilingual News Collection and Knowledge Graph about...

    • zenodo.org
    Updated May 9, 2023
    + more versions
    Cite
    Andreea Iana; Mehwish Alam; Alexander Grote; Katharina Ludwig; Philipp Müller; Christof Weinhardt; Heiko Paulheim (2023). NeMig - A Bilingual News Collection and Knowledge Graph about Migration [Dataset]. http://doi.org/10.5281/zenodo.7442425
    Dataset updated
    May 9, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andreea Iana; Mehwish Alam; Alexander Grote; Katharina Ludwig; Philipp Müller; Christof Weinhardt; Heiko Paulheim
    Description

    NeMig consists of two knowledge graphs, one German and one English, constructed from news articles on the topic of migration, collected from online media outlets from Germany and the US, respectively. NeMig contains rich textual and metadata information, sub-topics and sentiment annotations, as well as named entities extracted from the articles' content and metadata and linked to Wikidata. The graphs are expanded with up to two-hop neighbors from Wikidata of the initial set of linked entities.

    NeMig comes in four flavors for both the German and the English corpora:

    • Base NeMig: contains literals and entities from the corresponding annotated news corpus;
    • Entities NeMig: derived from the Base NeMig by removing all literal nodes, it contains only resource nodes;
    • Enriched Entities NeMig: derived from the Entities NeMig by enriching it with up to two-hop neighbors from Wikidata, it contains only resource nodes and Wikidata triples;
    • Complete NeMig: the combination of the Base and Enriched Entities NeMig, it contains both literals and resources.

    Information about uploaded files:

    (All files are bzip2-compressed and in the N-Triples format.)

    File | Description
    nemig_${language}_${graph_type}-metadata.nt.bz2 | Metadata about the dataset, described using the VoID vocabulary.
    nemig_${language}_${graph_type}-instances_types.nt.bz2 | Class definitions of news and event instances.
    nemig_${language}_${graph_type}-instances_labels.nt.bz2 | Labels of instances.
    nemig_${language}_${graph_type}-instances_related.nt.bz2 | Relations between news instances based on one another.
    nemig_${language}_${graph_type}-instances_metadata_literals.nt.bz2 | Relations between news instances and metadata literals (e.g. URL, publishing date, modification date, sentiment label, political orientation of news outlets).
    nemig_${language}_${graph_type}-instances_content_mapping.nt.bz2 | Mapping of news instances to content instances (e.g. title, abstract, body).
    nemig_${language}_${graph_type}-instances_topic_mapping.nt.bz2 | Mapping of news instances to sub-topic instances.
    nemig_${language}_${graph_type}-instances_content_literals.nt.bz2 | Relations between content instances and corresponding literals (e.g. text of title, abstract, body).
    nemig_${language}_${graph_type}-instances_metadata_resources.nt.bz2 | Relations between news or sub-topic instances and entities extracted from metadata (i.e. publishers, authors, keywords).
    nemig_${language}_${graph_type}-instances_event_mapping.nt.bz2 | Mapping of news instances to event instances.
    nemig_${language}_${graph_type}-event_resources.nt.bz2 | Relations between event instances and entities extracted from the text of the news (i.e. actors, places, mentions).
    nemig_${language}_${graph_type}-resources_provenance.nt.bz2 | Provenance information about the entities extracted from the text of the news (e.g. title, abstract, body).
    nemig_${language}_${graph_type}-wiki_resources.nt.bz2 | Relations between Wikidata entities from news and their k-hop entity neighbors from Wikidata.
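
    Because the files are compressed N-Triples, they can be inspected line by line with Python's standard bz2 module. The sketch below is a rough illustration, not a full N-Triples parser: it works on in-memory bytes, and the sample triple with its example.org URIs is invented for the demonstration. For the real files, `bz2.open(path, "rt", encoding="utf-8")` would give a line iterator over the same content.

```python
import bz2


def iter_ntriples_lines(compressed: bytes):
    """Yield non-empty, non-comment lines from bzip2-compressed N-Triples."""
    text = bz2.decompress(compressed).decode("utf-8")
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


# Hypothetical content standing in for one of the .nt.bz2 files.
sample = bz2.compress(
    b'<http://example.org/n1> <http://example.org/p> "x" .\n\n# a comment\n'
)
lines = list(iter_ntriples_lines(sample))
print(lines)
```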

  10. OGBN-Products (Processed for PyG)

    • kaggle.com
    Updated Feb 27, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Redao da Taupl (2021). OGBN-Products (Processed for PyG) [Dataset]. https://www.kaggle.com/datasets/dataup1/ogbn-products/data
    Dataset updated
    Feb 27, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Redao da Taupl
    Description

    OGBN-Products

    Webpage: https://ogb.stanford.edu/docs/nodeprop/#ogbn-products

    Usage in Python

    import os.path as osp
    import pandas as pd
    import datatable as dt
    import torch
    import torch_geometric as pyg
    # Helper used for heterogeneous splits; imported from OGB's raw graph reader.
    from ogb.io.read_graph_raw import read_nodesplitidx_split_hetero
    from ogb.nodeproppred import PygNodePropPredDataset
    
    class PygOgbnProducts(PygNodePropPredDataset):
      def __init__(self, meta_csv = None):
        # Point the loader at the Kaggle input directory instead of downloading.
        root, name, transform = '/kaggle/input', 'ogbn-products', None
        if meta_csv is None:
          meta_csv = osp.join(root, name, 'ogbn-master.csv')
        master = pd.read_csv(meta_csv, index_col = 0)
        meta_dict = master[name]
        meta_dict['dir_path'] = osp.join(root, name)
        super().__init__(name = name, root = root, transform = transform, meta_dict = meta_dict)
      def get_idx_split(self, split_type = None):
        if split_type is None:
          split_type = self.meta_info['split']
        path = osp.join(self.root, 'split', split_type)
        # Use the cached split if it exists.
        if osp.isfile(osp.join(path, 'split_dict.pt')):
          return torch.load(osp.join(path, 'split_dict.pt'))
        if self.is_hetero:
          train_idx_dict, valid_idx_dict, test_idx_dict = read_nodesplitidx_split_hetero(path)
          for nodetype in train_idx_dict.keys():
            train_idx_dict[nodetype] = torch.from_numpy(train_idx_dict[nodetype]).to(torch.long)
            valid_idx_dict[nodetype] = torch.from_numpy(valid_idx_dict[nodetype]).to(torch.long)
            test_idx_dict[nodetype] = torch.from_numpy(test_idx_dict[nodetype]).to(torch.long)
          # Return only after every node type has been converted.
          return {'train': train_idx_dict, 'valid': valid_idx_dict, 'test': test_idx_dict}
        else:
          # Read the split index files with datatable, which is fast on large CSVs.
          train_idx = dt.fread(osp.join(path, 'train.csv'), header = None).to_numpy().T[0]
          train_idx = torch.from_numpy(train_idx).to(torch.long)
          valid_idx = dt.fread(osp.join(path, 'valid.csv'), header = None).to_numpy().T[0]
          valid_idx = torch.from_numpy(valid_idx).to(torch.long)
          test_idx = dt.fread(osp.join(path, 'test.csv'), header = None).to_numpy().T[0]
          test_idx = torch.from_numpy(test_idx).to(torch.long)
          return {'train': train_idx, 'valid': valid_idx, 'test': test_idx}
    
    dataset = PygOgbnProducts()
    split_idx = dataset.get_idx_split()
    train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
    graph = dataset[0] # PyG Graph object
    

    Description

    Graph: The ogbn-products dataset is an undirected and unweighted graph, representing an Amazon product co-purchasing network [1]. Nodes represent products sold in Amazon, and edges between two products indicate that the products are purchased together. The authors follow [2] to process node features and target categories. Specifically, node features are generated by extracting bag-of-words features from the product descriptions followed by a Principal Component Analysis to reduce the dimension to 100.

    Prediction task: The task is to predict the category of a product in a multi-class classification setup, where the 47 top-level categories are used for target labels.

    Dataset splitting: The authors consider a more challenging and realistic dataset splitting that differs from the one used in [2]. Instead of randomly assigning 90% of the nodes for training and 10% of the nodes for testing (without use of a validation set), they use the sales ranking (popularity) to split nodes into training/validation/test sets. Specifically, the authors sort the products according to their sales ranking and use the top 8% for training, the next 2% for validation, and the rest for testing. This is a more challenging splitting procedure that closely matches the real-world application where labels are first assigned to important nodes in the network and ML models are subsequently used to make predictions on less important ones.
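
    The splitting rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the OGB implementation: it assumes the node ids are already sorted by sales rank (most popular first) and that fractional counts are truncated; the function name is hypothetical.

```python
def split_by_sales_rank(node_ids_by_rank, train_frac=0.08, valid_frac=0.02):
    """Split rank-ordered node ids into train/valid/test lists.

    The top `train_frac` goes to training, the next `valid_frac` to
    validation, and everything else to testing.
    """
    n = len(node_ids_by_rank)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    return {
        "train": node_ids_by_rank[:n_train],
        "valid": node_ids_by_rank[n_train:n_train + n_valid],
        "test": node_ids_by_rank[n_train + n_valid:],
    }


# 100 hypothetical products, already ordered from best to worst sales rank.
split = split_by_sales_rank(list(range(100)))
print({name: len(ids) for name, ids in split.items()})
```

    With 100 nodes this yields 8 training, 2 validation, and 90 test nodes, so most labels must be predicted for the less popular products.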

    Note 1: A very small number of self-connecting edges are repeated (see here); you may remove them if necessary.

    Note 2: For undirected graphs, the loaded graphs will have double the number of edges because the bidirectional edges are added automatically.
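
    The doubling mentioned in Note 2 follows from storing each undirected edge as two directed edges. A minimal sketch on a plain edge list, assuming no self-loops (how the actual loader treats self-loops is not covered here):

```python
def to_bidirectional(edges):
    """Return the edge list with a reversed copy of every edge appended."""
    return edges + [(v, u) for u, v in edges]


# Three hypothetical undirected edges become six directed edges.
undirected = [(0, 1), (1, 2), (0, 2)]
directed = to_bidirectional(undirected)
print(len(undirected), len(directed))
```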

    Summary

    Package | #Nodes | #Edges | Split Type | Task Type | Metric
    ogb>=1.1.1 | 2,449,029 | 61,859,140 | Sales rank | Multi-class classification | Accuracy

    Open Graph Benchmark

    Website: https://ogb.stanford.edu

    The Open Graph Benchmark (OGB) [3] is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner.

    References

    [1] http://manikvarma.org/downloads/XC/XMLRepository.html
    [2] Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 257–266, 2019.
    [3] Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. Advances in Neural Information Processing Systems, pp. 22118–22133, 2020.

    License: Amazon License

    By accessing the Amazon Customer Reviews Library ("Reviews Library"), you agree that the Reviews Library is an Amazon Service subject to the Amazon.com Conditions of Use (https://www.amazon.com/gp/help/customer/display.html/ref=footer_cou?ie=UTF8&nodeId=508088) and you agree to be bound by them, with the following additional conditions: In addition to the license rights granted under the Conditions of Use, Amazon or its content providers grant you a limited, non-exclusive, non-transferable, non-sublicensable, revocable license to access and use the Reviews Library for purposes of academic research. You may not resell, republish, or make any commercial use of the Reviews Library or its contents, including use of the Reviews Library for commercial research, such as research related to a funding or consultancy contract, internship, or other relationship in which the results are provided for a fee or delivered to a for-profit organization. You may not (a) link or associate content in the Reviews Library with any personal information (including Amazon customer accounts), or (b) attempt to determine the identity of the author of any content in the Reviews Library. If you violate any of the foregoing conditions, your license to access and use the Reviews Library will automatically terminate without prejudice to any of the other rights or remedies Amazon may have.

    Disclaimer

    I am NOT the author of this dataset. It was downloaded from its official website. I assume no responsibility or liability for the content in this dataset. For any questions, problems, or issues, please contact the original authors at their website or their GitHub repo.

  11. United States - Assets: Other: Other Assets, Consolidated Table: Wednesday...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Feb 3, 2020
    Cite
    TRADING ECONOMICS (2020). United States - Assets: Other: Other Assets, Consolidated Table: Wednesday Level [Dataset]. https://tradingeconomics.com/united-states/assets-other-assets-fed-data.html
    Explore at:
    Available download formats: xml, csv, json, excel
    Dataset updated
    Feb 3, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1976 - Dec 31, 2025
    Area covered
    United States
    Description

    United States - Assets: Other: Other Assets, Consolidated Table: Wednesday Level was 35274.00000 Mil. of $ in March of 2025, according to the United States Federal Reserve. Historically, it reached a record high of 50550.00000 in August of 2023 and a record low of 5958.00000 in November of 2006. Trading Economics provides the current actual value, a historical data chart, and related indicators for this series, last updated from the United States Federal Reserve in March of 2025.

  12. Life tables and graphs for Bahry (2022) - Equilibrium conditions in the...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Sep 12, 2022
    Cite
    Bahry, David (2022). Life tables and graphs for Bahry (2022) - Equilibrium conditions in the evolution of senescence [MSc thesis, Carleton University] [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7069069
    Explore at:
    Dataset updated
    Sep 12, 2022
    Dataset authored and provided by
    Bahry, David
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Life table data, and derived quantities, for Equilibrium Conditions in the Evolution of Senescence (Bahry, 2022, MSc thesis); adapted from the supplementary data of (Jones et al., 2014). Life table data for human (Japan 2009), human (Aché hunter-gatherer), fruit fly, Soay sheep, freshwater hydra, and desert tortoise.

    Basic life table quantities: age interval \(X\); survival function \(l_X\); and age-specific interval fecundity \(m_X\). Derived quantities include interval average force of mortality; reproductive value; residual reproductive value; Hamilton's indicators of the age-specific forces of selection; and actual age-specific mortality vs. predicted age-specific mortality based on models treated in (Bahry, 2022).

    In the original life tables of Jones et al. (2014), desert tortoises negatively senesce over the range of observed ages but have a final observed cutoff age of 74; this causes reproductive value to fall artifactually to 0 as age approaches the cutoff. To get around this, I also used an extrapolated desert tortoise life table, assuming the age-74 mortality and fecundity rates remained constant until age 1000, and then used the extrapolated life table to calculate reproductive value (and Hamilton's indicators) up to the cutoff age of 74.
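The extrapolation idea can be illustrated with a small sketch. It assumes the standard discrete form of Fisher's reproductive value, v_x = (e^{rx} / l_x) * sum over y >= x of e^{-ry} l_y m_y, with growth rate r = 0 by default; the thesis may use a different discretization. The function names, the constant-survival rule for extending the table, and the toy numbers are all hypothetical and are not taken from the dataset.

```python
import math


def extrapolate(l, m, max_age):
    """Extend survival l and fecundity m to max_age with constant final rates."""
    l, m = list(l), list(m)
    # Assumption: survival over the last observed interval stays constant.
    p_last = l[-1] / l[-2]
    while len(l) <= max_age:
        l.append(l[-1] * p_last)
        m.append(m[-1])
    return l, m


def reproductive_value(l, m, r=0.0):
    """Reproductive value at each age, accumulated from the oldest age down."""
    v = [0.0] * len(l)
    tail = 0.0
    for x in range(len(l) - 1, -1, -1):
        tail += math.exp(-r * x) * l[x] * m[x]
        v[x] = math.exp(r * x) * tail / l[x] if l[x] > 0 else 0.0
    return v


# Toy life table observed up to age 2 (hypothetical values).
l_obs = [1.0, 0.5, 0.25]
m_obs = [0.0, 1.0, 1.0]

v_truncated = reproductive_value(l_obs, m_obs)
l_ext, m_ext = extrapolate(l_obs, m_obs, max_age=60)
v_extrapolated = reproductive_value(l_ext, m_ext)[:len(l_obs)]
```

In this toy table the truncated calculation gives a lower reproductive value at the last observed age than the extrapolated one, because the truncated sum ignores all reproduction after the cutoff. That is the kind of artifact the extrapolated life table is meant to avoid.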

    References

    Bahry, D. (2022). Equilibrium Conditions in the Evolution of Senescence [Master's thesis, Carleton University].

    Jones, O. R. et al. (2014). Diversity of ageing across the tree of life. Nature 505: 169–174. https://doi.org/10.1038/nature12789

  13. Data Visualization in Social Work Research

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Rothwell, David; Esposito, Tonino; Wegner-Lohin (2023). Data Visualization in Social Work Research [Dataset]. http://doi.org/10.7910/DVN/I6IIXL
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Rothwell, David; Esposito, Tonino; Wegner-Lohin
    Time period covered
    Jan 1, 2009 - Jan 1, 2012
    Description

    Research dissemination and knowledge translation are imperative in social work. Methodological developments in data visualization techniques have improved the ability to convey meaning and reduce erroneous conclusions. The purpose of this project is to examine: (1) How are empirical results presented visually in social work research?; (2) To what extent do top social work journals vary in the publication of data visualization techniques?; (3) What is the predominant type of analysis presented in tables and graphs?; (4) How can current data visualization methods be improved to increase understanding of social work research? Method: A database was built from a systematic literature review of the four most recent issues of Social Work Research and 6 other highly ranked journals in social work based on the 2009 5-year impact factor (Thomson Reuters ISI Web of Knowledge). Overall, 294 articles were reviewed. Articles without any form of data visualization were not included in the final database. The number of articles reviewed by journal includes: Child Abuse & Neglect (38), Child Maltreatment (30), American Journal of Community Psychology (31), Family Relations (36), Social Work (29), Children and Youth Services Review (112), and Social Work Research (18). Articles with any type of data visualization (table, graph, other) were included in the database and coded sequentially by two reviewers based on the type of visualization method and type of analyses presented (descriptive, bivariate, measurement, estimate, predicted value, other). Additional review was required from the entire research team for 68 articles. Codes were discussed until 100% agreement was reached. The final database includes 824 data visualization entries.

  14. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Explore at:
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world as of October 22, 2020 (on the eve of the second wave of the pandemic) were considered, and those represented in the Global 500 ranking for 2020 were selected: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating were selected, separately for 2020 and 2019. The arithmetic averages and the change (increase) were calculated for indicators such as profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators for all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020 on the eve of the second wave of the pandemic. The data is collected in a general Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. It is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, adding and/or changing the values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data, but also charts that provide data visualization. The dataset contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020.

    The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship, substituting various predicted morbidity and mortality rates in risk assessment tables and obtaining automatically calculated consequences (changes) for the characteristics of international entrepreneurship. It is also possible to substitute the actual values identified during and after the second wave of the pandemic to check the reliability of the earlier forecasts and conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the COVID-19 pandemic and crisis for international entrepreneurship.

  15. Station-B Biological Knowledge Graph Data

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 27, 2021
    Cite
    Prashant Vaidyanathan; Boyan Yordanov; Paul K. Grant; Colin Gravill; Neil Dalchau (2021). Station-B Biological Knowledge Graph Data [Dataset]. http://doi.org/10.5281/zenodo.5245860
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 27, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Prashant Vaidyanathan; Boyan Yordanov; Paul K. Grant; Colin Gravill; Neil Dalchau
    Description

    This dataset contains all the experimental data and metadata collected as part of the Station-B project at Microsoft Research Cambridge. The data has been structured using the Biological Knowledge Graph Schema and was stored in Azure Tables and Azure Blobs. This data includes two files:

    • blobs.zip: This zipped file primarily contains blobs that stored raw and processed fluorescence data from the Microplate Reader at the Station-B wet lab. This zip also contains bundles compatible with the Synthace Platform to enable lab automation with Liquid handling robots.
    • tables.zip: This zipped file contains all the data and metadata associated with the Assembly and Characterization experiments conducted at Station-B. Each CSV in this zipped file represents data stored in an Azure Table. The columns in each CSV are based on the Biological Knowledge Graph Schema.
  16. United States Stock Market Index Data

    • tradingeconomics.com
    • ar.tradingeconomics.com
    • +15 more
    csv, excel, json, xml
    + more versions
    Cite
    TRADING ECONOMICS, United States Stock Market Index Data [Dataset]. https://tradingeconomics.com/united-states/stock-market
    Explore at:
    Available download formats: excel, xml, json, csv
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 3, 1928 - Mar 27, 2025
    Area covered
    United States
    Description

    The main stock market index in the United States (US500) decreased 176 points or 2.99% since the beginning of 2025, according to trading on a contract for difference (CFD) that tracks this benchmark index from the United States. United States Stock Market Index - values, historical data, forecasts and news - updated in March of 2025.

  17. skeleton (v1.0) graph data

    • lod.humanatlas.io
    application/n-quads +4
    Updated Dec 12, 2024
    + more versions
    Cite
    HRA Digital Object Processor (2024). skeleton (v1.0) graph data [Dataset]. https://lod.humanatlas.io/asct-b/skeleton/v1.0/
    Explore at:
    Available download formats: rdf, application/n-triples, application/n-quads, jsonld, ttl
    Dataset updated
    Dec 12, 2024
    Dataset authored and provided by
    HRA Digital Object Processor
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The graph representation of the Anatomical Structures, Cell Types, plus Biomarkers (ASCT+B) table for the Skeleton dataset.

  18. United States - Commercial and Industrial Loans, Domestically Chartered...

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Apr 28, 2020
    + more versions
    Cite
    TRADING ECONOMICS (2020). United States - Commercial and Industrial Loans, Domestically Chartered Commercial Banks [Dataset]. https://tradingeconomics.com/united-states/commercial-and-industrial-loans-domestically-chartered-commercial-banks-bil-of-u-s-dollar-sa-fed-data.html
    Explore at:
    Available download formats: excel, xml, csv, json
    Dataset updated
    Apr 28, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1976 - Dec 31, 2025
    Area covered
    United States
    Description

    United States - Commercial and Industrial Loans, Domestically Chartered Commercial Banks was 2098.26530 Bil. of U.S. $ in March of 2022, according to the United States Federal Reserve. Historically, it reached a record high of 2515.61760 in May of 2020 and a record low of 128.86610 in January of 1973. Trading Economics provides the current actual value, a historical data chart, and related indicators for this series, last updated from the United States Federal Reserve in March of 2025.

  19. Model predictions for heterogeneous stream-reservoir graph networks with...

    • catalog.data.gov
    • data.usgs.gov
    Updated Jul 6, 2024
    + more versions
    Cite
    U.S. Geological Survey (2024). Model predictions for heterogeneous stream-reservoir graph networks with data assimilation [Dataset]. https://catalog.data.gov/dataset/model-predictions-for-heterogeneous-stream-reservoir-graph-networks-with-data-assimilation
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    This data release provides the predictions from stream temperature models described in Chen et al. (2021). Briefly, various deep learning and process-guided deep learning models were built to test improved performance of stream temperature predictions below reservoirs in the Delaware River Basin. The spatial extent of predictions was restricted to streams above the Delaware River at Lordville, NY, and includes the West Branch of the Delaware River below Cannonsville Reservoir and the East Branch of the Delaware River below Pepacton Reservoir. Various model architectures, training schemes, and data assimilation methods were used to generate the table and figures in Chen et al. (2021), and the predictions of each model are captured in this release. For each model, there are test period predictions for 56 river reaches from 2006-10-01 through 2020-09-30. Model input and validation data can be found in Oliver et al. (2021).

    The publication associated with this data release is Chen, S., Appling, A.P., Oliver, S.K., Corson-Dosch, H.R., Read, J.S., Sadler, J.M., Zwart, J.A., Jia, X, 2021, Heterogeneous stream-reservoir graph networks with data assimilation. International Conference on Data Mining (ICDM). DOI: https://doi.org/10.1109/ICDM51629.2021.00117.

  20. NetVotes ENIC Dataset

    • zenodo.org
    txt, zip
    Updated Oct 1, 2024
    Cite
    Israel Mendonça; Vincent Labatut; Rosa Figueiredo (2024). NetVotes ENIC Dataset [Dataset]. http://doi.org/10.5281/zenodo.6815510
    Explore at:
    Available download formats: zip, txt
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Israel Mendonça; Vincent Labatut; Rosa Figueiredo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description. The NetVote dataset contains the outputs of the NetVote program when applied to voting data coming from VoteWatch (http://www.votewatch.eu/).

    These results were used in the following conference papers:

    1. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the European Parliament,” in 2nd European Network Intelligence Conference, 2015, pp. 122–129. ⟨hal-01176090⟩ DOI: 10.1109/ENIC.2015.25
    2. I. Mendonça, R. Figueiredo, V. Labatut, and P. Michelon, “Informative Value of Negative Links for Graph Partitioning, with an application to European Parliament Votes,” in 6ème Conférence sur les modèles et l'analyse de réseaux : approches mathématiques et informatiques, 2015, p. 12p. ⟨hal-02055158⟩

    Source code. The NetVote source code is available on GitHub: https://github.com/CompNet/NetVotes.

    Citation. If you use our dataset or tool, please cite article [1] above.


    @InProceedings{Mendonca2015,
    author = {Mendonça, Israel and Figueiredo, Rosa and Labatut, Vincent and Michelon, Philippe},

    title = {Relevance of Negative Links in Graph Partitioning: A Case Study Using Votes From the {E}uropean {P}arliament},
    booktitle = {2\textsuperscript{nd} European Network Intelligence Conference ({ENIC})},
    year = {2015},
    pages = {122-129},
    address = {Karlskrona, SE},
    publisher = {IEEE Publishing},
    doi = {10.1109/ENIC.2015.25},
    }

    -------------------------

    Details. This archive contains the following folders:

    • `votewatch_data`: the raw data extracted from the VoteWatch website.
      • `VoteWatch Europe European Parliament, Council of the EU.csv`: list of the documents voted during the considered term, with some details such as the date and topic.
      • `votes_by_document`: this folder contains a collection of CSV files, each one describing the outcome of the vote session relatively to one specific document.
      • `intermediate_files`: this folder contains several CSV files:
        • `allvotes.csv`: concatenation of all vote outcomes for all documents and all MEPs. Can be considered a compact representation of the data contained in the folder `votes_by_document`.
        • `loyalty.csv`: same as `allvotes.csv`, but for the loyalty (i.e. whether or not the MEP voted like the majority of the MEPs in their political group).
        • `MPs.csv`: list of the MEPs having voted at least once in the considered term, with their details.
        • `policies.csv`: list of the topics considered during the term.
        • `qtd_docs.csv`: list of the topics with the corresponding number of documents.
    • `parallel_ils_results`: contains the raw results of the ILS tool. This is an external algorithm able to estimate the optimal partition of the network nodes in terms of structural balance. It was applied to all the networks extracted by our scripts (from the VoteWatch data), and the produced files were placed here for postprocessing. Each subfolder corresponds to one topic-year pair.
    • `output_files`: contains the files produced by our scripts.
      • `agreement`: histograms representing the distributions of agreement and rebellion indices. Each subfolder corresponds to a specific topic.
      • `community_algorithms_csv`: Performances obtained by the partitioning algorithms (for both community detection and correlation clustering). Each subfolder corresponds to a specific topic.
      • `xxxx_cluster_information.csv`: table containing several variants of the imbalance measure, for the considered algorithms.
      • `community_algorithms_results`: Comparison of the partitions detected by the various algorithms considered, and distribution of the cluster/community sizes. Each subfolder corresponds to a specific topic.
      • `xxxx_cluster_comparison.csv`: table comparing the partitions detected by the community detection algorithms, in terms of Rand index and other measures.
      • `xxxx_ils_cluster_comparison.csv`: like `xxxx_cluster_comparison.csv`, except we compare the partition of community detection algorithms with that of the ILS.
      • `xxxx_yyyy_distribution.pdf`: histogram of the community (or cluster) sizes detected by algorithm `yyyy`.
      • `graphs`: the networks extracted from the vote data. Each subfolder corresponds to a specific topic.
      • `xxxx_complete_graph.graphml`: network in GraphML format, with all the information: nodes, edges, nodal attributes (including communities), weights, etc.
      • `xxxx_edges_Gephi.csv`: only the links, with their weights (i.e. vote similarity).
      • `xxxx_graph.g`: network in the g format (for ILS).
      • `xxxx_net_measures.csv`: table containing some stats on the network (number of links, etc.).
      • `xxxx_nodes_Gephi.csv`: list of nodes (i.e. MEPs), with details.
      • `plots`: synthesis plots from the paper.
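The loyalty notion used for `loyalty.csv` above (whether an MEP voted like the majority of their political group) can be sketched as follows. This is a minimal illustration on made-up data, not the NetVotes implementation: the `loyalty` function, the input layout, and the MEP identifiers, group names, and vote labels are all hypothetical, and ties are broken arbitrarily by the first vote encountered.

```python
from collections import Counter, defaultdict


def loyalty(votes):
    """votes maps an MEP id to (group, vote); returns MEP id -> loyal or not."""
    # Count how often each vote value occurs within each political group.
    by_group = defaultdict(Counter)
    for group, vote in votes.values():
        by_group[group][vote] += 1
    # The majority vote of a group is its most frequent vote value.
    majority = {g: counts.most_common(1)[0][0] for g, counts in by_group.items()}
    return {mep: vote == majority[group] for mep, (group, vote) in votes.items()}


# Hypothetical votes on a single document.
votes = {
    "mep1": ("GroupA", "FOR"),
    "mep2": ("GroupA", "FOR"),
    "mep3": ("GroupA", "AGAINST"),
    "mep4": ("GroupB", "AGAINST"),
}
result = loyalty(votes)
```

Here `mep3` is the only rebel, since the other two members of the hypothetical GroupA voted the other way.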

    -------------------------

    License. These data are shared under a Creative Commons 0 license.

    Contact. Vincent Labatut <vincent.labatut@univ-avignon.fr> & Rosa Figueiredo <rosa.figueiredo@univ-avignon.fr>

