Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: none of the data sets published here contain actual data; they are for testing purposes only.
This data repository contains graph datasets, where each graph is represented by two CSV files: one for node information and another for edge details. To link the files to the same graph, their names include a common identifier based on the number of nodes. For example:
dataset_30_nodes_interactions.csv: contains 30 rows (nodes).
dataset_30_edges_interactions.csv: contains 47 rows (edges).
Both files share the identifier dataset_30 and therefore refer to the same graph.
Each node file contains the following columns:
Name of the Column | Type | Description |
UniProt ID | string | protein identification |
label | string | protein label (type of node) |
properties | string | a dictionary containing properties related to the protein. |
Each edge file contains the following columns:
Name of the Column | Type | Description |
Relationship ID | string | relationship identification |
Source ID | string | identification of the source protein in the relationship |
Target ID | string | identification of the target protein in the relationship |
label | string | relationship label (type of relationship) |
properties | string | a dictionary containing properties related to the relationship. |
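The naming convention and column layout described above can be sketched in Python. This is only an illustration under stated assumptions: the file-name pattern is inferred from the two example names, the CSV files are assumed to have a header row using the column names listed in the tables, and the sample rows (P1, P2, R1, R2) are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

# Assumed naming pattern, inferred from the two example file names.
FILE_PATTERN = re.compile(r"^(dataset_\w+?)_(nodes|edges)_interactions\.csv$")


def pair_graph_files(filenames):
    """Group node and edge file names by their shared graph identifier."""
    graphs = defaultdict(dict)
    for name in filenames:
        match = FILE_PATTERN.match(name)
        if match:
            graph_id, kind = match.groups()
            graphs[graph_id][kind] = name
    return dict(graphs)


def dangling_edges(nodes_csv, edges_csv):
    """Return IDs of relationships whose source or target is not a known node."""
    node_ids = {row["UniProt ID"] for row in csv.DictReader(nodes_csv)}
    return [
        row["Relationship ID"]
        for row in csv.DictReader(edges_csv)
        if row["Source ID"] not in node_ids or row["Target ID"] not in node_ids
    ]


# Hypothetical in-memory files standing in for a real node/edge pair.
sample_nodes = io.StringIO("UniProt ID,label,properties\nP1,protein,{}\nP2,protein,{}\n")
sample_edges = io.StringIO(
    "Relationship ID,Source ID,Target ID,label,properties\n"
    "R1,P1,P2,interacts,{}\n"
    "R2,P1,P9,interacts,{}\n"
)
print(pair_graph_files(["dataset_30_nodes_interactions.csv",
                        "dataset_30_edges_interactions.csv"]))
print(dangling_edges(sample_nodes, sample_edges))
```

Checking for edges that point at unknown nodes is a cheap sanity test before loading a pair of files into a graph library or database.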
Graph | Number of Nodes | Number of Edges | Sparse graph |
dataset_30* | 30 | 47 | Y |
dataset_60* | 60 | 181 | Y |
dataset_120* | 120 | 689 | Y |
dataset_240* | 240 | 2819 | Y |
dataset_300* | 300 | 4658 | Y |
dataset_600* | 600 | 18004 | Y |
dataset_1200* | 1200 | 71785 | Y |
dataset_2400* | 2400 | 288600 | Y |
dataset_3000* | 3000 | 449727 | Y |
dataset_6000* | 6000 | 1799413 | Y |
dataset_12000* | 12000 | 7199863 | Y |
dataset_24000* | 24000 | 28792361 | Y |
dataset_30000* | 30000 | 44991744 | Y |
This repository includes two (2) additional tiny graph datasets for experimenting before moving on to the larger datasets.
Each node file contains the following columns:
Name of the Column | Type | Description |
ID | string | node identification |
label | string | node label (type of node) |
properties | string | a dictionary containing properties related to the node. |
Each edge file contains the following columns:
Name of the Column | Type | Description |
ID | string | relationship identification |
source | string | identification of the source node in the relationship |
target | string | identification of the target node in the relationship |
label | string | relationship label (type of relationship) |
properties | string | a dictionary containing properties related to the relationship. |
Graph | Number of Nodes | Number of Edges | Sparse graph |
dataset_dummy* | 3 | 6 | N |
dataset_dummy2* | 3 | 6 | N |
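The source does not say how the "Sparse graph" flag was assigned. One plausible reading is edge density, the fraction of possible edges that are present. The sketch below assumes directed edges without self-loops (suggested by the source/target columns, but not confirmed) and uses node and edge counts copied from the tables above.

```python
def directed_density(num_nodes, num_edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    possible = num_nodes * (num_nodes - 1)
    return num_edges / possible if possible else 0.0


# (nodes, edges) pairs copied from the tables above.
GRAPHS = {
    "dataset_30": (30, 47),
    "dataset_3000": (3000, 449727),
    "dataset_30000": (30000, 44991744),
    "dataset_dummy": (3, 6),
}

for name, (nodes, edges) in GRAPHS.items():
    print(f"{name}: density = {directed_density(nodes, edges):.3f}")
```

Under this assumption the numbered datasets all have a density of roughly 0.05, while the dummy datasets contain every possible directed edge, which is consistent with their Y and N flags.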
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, those of the top 10 countries by COVID-19 incidence worldwide as of October 22, 2020 (on the eve of the second wave of the pandemic) that are represented in the Global 500 ranking for 2020 were selected: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, up to 10 of the largest transnational corporations were selected separately from the Global 500 rankings for 2020 and 2019. Arithmetic averages and the change (increase) were calculated for indicators such as profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators across all countries in the sample were then found, characterizing the situation in international entrepreneurship as a whole during the COVID-19 crisis in 2020, on the eve of the second wave of the pandemic.
The data is collected in a single Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics with entrepreneurship statistics. It is flexible: it can be supplemented with data from other countries and with newer statistics on the COVID-19 pandemic. Because the data in the dataset are stored as formulas rather than fixed numbers, adding or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship.
The dataset includes not only tabular data, but also charts that visualize the data. It contains not only actual, but also forecast data on morbidity and mortality from COVID-19 for the second wave of the pandemic in 2020. The forecasts are presented as a normal distribution of predicted values together with the probability of their occurrence in practice. This allows for a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain automatically calculated consequences (changes) for the characteristics of international entrepreneurship. The actual values observed during and after the second wave of the pandemic can also be substituted to check the reliability of the earlier forecasts and to conduct a plan-versus-actual analysis. The dataset contains not only the numerical values of the initial and predicted indicators, but also their qualitative interpretation, reflecting the presence and level of the risks that the COVID-19 pandemic and crisis pose to international entrepreneurship.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A MS-Excel notebook of several spreadsheets pertaining to Figs 1–9 and Table 1, including ANOVA analysis, means, standard deviations, and charts/graphs, and also an adaptation of Andersen 1958 Table 1 adjusted to convert CFU/plate to CFU/m3 with the 1.25x adjustment factor for use of plastic Petri dishes. (XLSX)
Files used to create the tables and graphs in the paper A Fistful of Dollars: Financial Incentives, Peer Information, and Retirement Savings (Bauer, Eberhardt, & Smeets, forthcoming). Sample datasets are included. Programs are written in Stata.
With the rapid advancement of the Fourth Industrial Revolution, international competition in technology and industry is intensifying. However, in the era of big data and large-scale science, making accurate judgments about the key areas of technology and innovative trends has become exceptionally difficult. This paper constructs a patent indicator evaluation system based on the dimensions of key and generic patent citation, integrates graph neural network modeling to predict key common technologies, and confirms the effectiveness of the method using the field of genetic engineering as an example. According to the LDA topic model, the main technical R&D directions in genetic engineering are genetic analysis and detection technologies, the application of microorganisms in industrial production, virology research involving vaccine development and immune responses, high-throughput sequencing and analysis technologies in genomics, targeted drug design and molecular therapeutic strategies...
These datasets were obtained from the Incopat patent database for cited patents (2013-2022) in the field of genetic engineering. Details for the datasets are provided in the README file. This directory contains the selection of the patent datasets.
1) Table of key generic indicators for nodes (partial 1).csv: This file consists of 10 indicators of patents: technical coverage, patent families, patent family citation, patent cooperation, enterprise-enterprise cooperation, industry-university-research cooperation, claims, citation frequency, layout countries, and layout countries.
2) Table of key generic indicators for nodes (partial 2).csv: This file consists of 10 indicators of patents: technical convergence, cited countries, inventors, citations, homologous countries/areas, degree centrality, closeness centrality, betweenness centrality, eigenvector centrality, and PageRank.
3) patent.content: The content file contains descriptions of the patents in the following format:
This README file was generated on 2023-11-25 by Mingli Ding.
A) Table of key generic indicators for nodes (partial 1).csv
B) Table of key generic indicators for nodes (partial 2).csv
C) patent.content
D) patent.cites
E) Graph neural network modeling highest accuracy for different dimensions.csv
F) Prediction effects of key generic technologies.csv
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The Canadian Environmental Sustainability Indicators (CESI) program provides data and information to track Canada's performance on key environmental sustainability issues. The Population status of Canada's migratory birds indicator reports the proportion of bird species listed in the Migratory Birds Convention Act whose populations fall within, or are above or below, national population goals. It provides a snapshot assessment of the state of bird populations in Canada. Some bird species are managed towards specific population levels (for example, some hunted species or species of conservation concern). While the indicator reports whether species' populations are within acceptable bounds, it does not indicate if management goals are being met. This information is provided to Canadians in a number of formats including: static and interactive maps, charts and graphs, HTML and CSV data tables and downloadable reports. See the supplementary documentation for data sources and details on how those data were collected and how the indicator was calculated.
Supplemental Information
Canadian Environmental Sustainability Indicators - Home page: https://www.canada.ca/environmental-indicators
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
tBiomedL is a dataset for tabular data to knowledge graph matching. It is derived for the Biodiversity domain and has two types of tables. Horizontal Relational Tables each represent a collection of entities, whereas Entity Tables each represent a single entity. Ground truth data is supported for Wikidata as the target knowledge graph (KG). tBiomedL is generated by KG2Tables using five levels of a recursive hierarchy of related concepts in Wikidata. It is the successor of tBiomed. tBiomedL contains 860,479 entity and horizontal tables, while this repository contains only a 1% sample of the entire benchmark with its ground truth data (gt). The full size of this dataset is 27 GB. We will update this repository with the full dataset, including the test fold with its ground truth data, in the future. Please get in touch if you are interested in the full dataset. The supported tasks for semantic table annotations are:
Topic Detection (TD) links the entire table to an entity or a class from the target KG.
Cell Entity Annotation (CEA) maps individual table cells to entities from the target KG.
Column Type Annotation (CTA) links individual table columns to classes from the target KG.
Column Property Annotation (CPA) detects the relations between column pairs from the target knowledge graph.
Row Annotation (RA) annotates the entire row to a KG entity or property.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The Australian Census Longitudinal Dataset (ACLD) brings together a 5% sample from the 2006 Census with records from the 2011 Census to create a research tool for exploring how Australian society is changing over time. In taking a longitudinal view of Australians, the ACLD may uncover new insights into the dynamics and transitions that drive social and economic change over time, conveying how these vary for diverse population groups and geographies. It is envisaged that the 2016 and successive Censuses will be added in the future, as well as administrative data sets. The ACLD is released in ABS TableBuilder and as a microdata product in the ABS Data Laboratory.
The Census of Population and Housing is conducted every five years and aims to measure accurately the number of people and dwellings in Australia on Census Night.
Microdata products are the most detailed information available from a Census or survey and are generally the responses to individual questions on the questionnaire. They also include derived data from answers to two or more questions and are released with the approval of the Australian Statistician. The following microdata products are available for this longitudinal dataset:
• ACLD in TableBuilder - an online tool for creating tables and graphs.
• ACLD in ABS Data Laboratory (ABSDL) - for in-depth analysis using a range of statistical software packages.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This dataset includes pesticide-concentration results for 32 discrete water samples that were collected on the islands of Kauai and Oahu between November 21, 2016 and April 29, 2017. The water samples were collected for the Pesticide-Monitoring Program of Surface Water in the State of Hawaii. This dataset consists of five files: a summary file, a sample-list file, two results files, and this metadata file. The summary file (Summary_of_pesticide_results_for_discrete_samples_Hawaii_Nov2016_Apr2017.pdf) includes maps and a table of the sample sites, graphs that summarize the most frequently detected pesticide compounds in the water samples, and tables that summarize comparisons between (1) concentrations of detected pesticide compounds and (2) water-quality standards, criteria, and benchmarks. The sample-list file [List_of_discrete_samples_Hawaii_Nov2016_Apr2017.csv] contains a list of 32 samples and attributes that describe where, when, and how each sample was collected. The first r ...
Aim: Planted forests are becoming increasingly common worldwide for a variety of reasons including water conservation and carbon sequestration, whereas the effects of tree plantations on biodiversity are unclear as to whether planted ecosystems are ‘green deserts’ or valuable habitats for biodiversity.
Location: Global.
Time period: 1980–2020.
Taxa studied: Flora, fauna, and microorganisms.
Methods: By conducting a meta-analysis of 361 observations from 138 sites worldwide, we explored the global patterns and associated drivers of biodiversity responding to tree plantations by comparing biodiversity levels in plantations and adjacent habitats (primary or secondary forests).
Results: Overall, the biodiversity (species richness) and abundance across multi-trophic levels in tree plantations was lower than that in primary forests, reached similar values to secondary succession, but varied with plantation and management regimes. Specifically, the biodiversity across multi-trophic levels...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
tBiodiv is a dataset for tabular data to knowledge graph matching. It is derived for the Biodiversity domain and has two types of tables. Horizontal Relational Tables each represent a collection of entities, whereas Entity Tables each represent a single entity. Ground truth data is supported for Wikidata as the target knowledge graph (KG).
tBiodiv is generated by KG2Tables using two levels of a recursive hierarchy of related concepts in Wikidata.
tBiodiv contains 57,426 entity and horizontal tables, while this repository contains only a 1% sample of the generated tables of the entire benchmark with its ground truth data (gt). The full size of this dataset is 122 GB. We will update this repository with the full dataset in the future.
Please get in touch if you are interested in the full dataset.
The supported tasks for semantic table annotations are: