Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The various performance criteria applied in this analysis include the probability of reaching the ultimate target, the costs, elapsed times and system vulnerability resulting from any intrusion. This Excel file contains all the logical, probabilistic and statistical data entered by a user, and required for the evaluation of the criteria. It also reports the results of all the computations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Companion data for the creation of a banksia plot.

Background: In research evaluating statistical analysis methods, a common aim is to compare point estimates and confidence intervals (CIs) calculated from different analyses. This can be challenging when the outcomes (and their scale ranges) differ across datasets. We therefore developed a plot to facilitate pairwise comparisons of point estimates and confidence intervals from different statistical analyses, both within and across datasets.

Methods: The plot was developed and refined over the course of an empirical study. To compare results from a variety of different studies, a system of centring and scaling is used. Firstly, the point estimates from reference analyses are centred to zero, and the confidence intervals are then scaled to span a range of one. The point estimates and confidence intervals from matching comparator analyses are adjusted by the same amounts. This enables the relative positions of the point estimates and CI widths to be quickly assessed while maintaining the relative magnitudes of the differences in point estimates and confidence interval widths between the two analyses. Banksia plots can be graphed in a matrix, showing all pairwise comparisons of multiple analyses. In this paper, we show how to create a banksia plot and present two examples: the first relates to an empirical evaluation assessing the difference between various statistical methods across 190 interrupted time series (ITS) datasets with widely varying characteristics, while the second assesses data extraction accuracy by comparing results obtained from analysing original study data (43 ITS studies) with those obtained by four researchers from datasets digitally extracted from graphs in the accompanying manuscripts.

Results: In the banksia plot of the statistical method comparison, it was clear that there was no difference, on average, in point estimates, and it was straightforward to ascertain which methods resulted in smaller, similar or larger confidence intervals than others. In the banksia plot comparing analyses from digitally extracted data with those from the original data, it was clear that both the point estimates and confidence intervals were all very similar among data extractors and original data.

Conclusions: The banksia plot, a graphical representation of centred and scaled confidence intervals, provides a concise summary of comparisons between multiple point estimates and associated CIs in a single graph. Through this visualisation, patterns and trends in the point estimates and confidence intervals can be easily identified. This collection of files allows the user to create the images used in the companion paper and to amend the code to create their own banksia plots using either Stata version 17 or R version 4.3.1.
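As a minimal illustration of the centring and scaling described in the Methods (a Python sketch with made-up numbers, not the Stata/R code supplied with the paper):

def centre_and_scale(ref_est, ref_lo, ref_hi, comp_est, comp_lo, comp_hi):
    """Centre the reference estimate to zero, scale its CI to a width of one,
    then apply the identical shift and scale to the comparator analysis."""
    shift = ref_est
    scale = ref_hi - ref_lo  # reference CI width becomes 1 after division
    transform = lambda x: (x - shift) / scale
    ref = tuple(map(transform, (ref_est, ref_lo, ref_hi)))
    comp = tuple(map(transform, (comp_est, comp_lo, comp_hi)))
    return ref, comp

# Hypothetical example: reference estimate 2.0 (95% CI 1.0 to 3.0),
# comparator estimate 2.5 (95% CI 1.0 to 4.0).
ref, comp = centre_and_scale(2.0, 1.0, 3.0, 2.5, 1.0, 4.0)
print(ref)   # (0.0, -0.5, 0.5)   -> centred at zero, CI spans one
print(comp)  # (0.25, -0.5, 1.0)  -> relative position and width preserved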
Hydrology Graphs

This repository contains the code for the manuscript "A Graph Formulation for Tracing Hydrological Pollutant Transport in Surface Waters." There are three main folders containing code and data, outlined below. We call the framework for building a graph of these hydrological systems "Hydrology Graphs". Several of the data files for building this framework are large and cannot be stored on GitHub. To conserve space, the notebook get_and_unpack_data.ipynb or the script get_and_unpack_data.py can be used to download the data from the Watershed Boundary Dataset (WBD), the National Hydrography Dataset (NHDPlusV2), and the agricultural land dataset for the state of Wisconsin. The files WILakes.df and WIRivers.df mentioned in section 1 below are contained within the WI_lakes_rivers.zip folder, and the files of the 24k Hydro Waterbodies dataset are contained in a zip file under the directory DNR_data/Hydro_Waterbodies. These files can also be unpacked by running the corresponding cells in get_and_unpack_data.ipynb or get_and_unpack_data.py.

1. graph_construction: This folder contains the data and code for building a graph of the watershed-river-waterbody hydrological system. It uses data from the Watershed Boundary Dataset (link here) and the National Hydrography Dataset (link here) as a basis and builds a list of directed edges. We use NetworkX to build and visualize the list as a graph.

2. case_studies: This folder contains three .ipynb files for three separate case studies. These case studies focus on how "Hydrology Graphs" can be used to analyze pollutant impacts in surface waters. Details of these case studies can be found in the manuscript above.

3. DNR_data: This folder contains data from the Wisconsin Department of Natural Resources (DNR) on water quality in several Wisconsin lakes. The data were obtained from here using the file Web_scraping_script.py. The original downloaded reports are found in the folder original_lake_reports. These reports were then cleaned and reformatted using the script DNR_data_filter.ipynb. The resulting, cleaned reports are found in the Lakes folder. Each subfolder of the Lakes folder contains data for a single lake. The two .csv files (lake_index_WBIC.csv and lake_index_WBIC_COMID.csv) provide an index of which lake each numbered subfolder corresponds to. In addition, we added the corresponding COMID in lake_index_WBIC_COMID.csv by matching the NHDPlusV2 data to the Wisconsin DNR's 24k Hydro Waterbodies dataset, which we downloaded from here. The DNR's reported data only matches lakes to a waterbody identification code (WBIC), so we use HYDROLakes (indexed by WBIC) to match to the COMID. This is also done in the DNR_data_filter.ipynb script.

Python Versions: The .py files in graph_construction/ were run using Python version 3.9.7. The scripts used the following packages and version numbers: geopandas (0.10.2), shapely (1.8.1.post1), tqdm (4.63.0), networkx (2.7.1), pandas (1.4.1), numpy (1.21.2).

This dataset is associated with the following publication: Cole, D.L., G.J. Ruiz-Mercado, and V.M. Zavala. A graph-based modeling framework for tracing hydrological pollutant transport in surface waters. COMPUTERS AND CHEMICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 179: 108457, (2023).
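As a rough, self-contained illustration of the directed edge-list idea used in graph_construction (the node identifiers and in-memory table below are assumptions for illustration, not the repository's actual data files):

import networkx as nx
import pandas as pd

# Hypothetical (upstream, downstream) pairs between watersheds, rivers, and lakes.
edges = pd.DataFrame(
    {"from_id": ["HUC12_A", "HUC12_A", "COMID_1"],
     "to_id":   ["COMID_1", "COMID_2", "LAKE_9"]}
)

G = nx.DiGraph()
G.add_edges_from(zip(edges["from_id"], edges["to_id"]))

# Downstream tracing: every node reachable from a potential pollutant source.
print(nx.descendants(G, "HUC12_A"))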
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years, up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.

Storage capacity also growing

Only a small percentage of this newly created data is kept, though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
The data explorer allows users to create bespoke cross tabs and charts on consumption by property attributes and characteristics, based on the data available from NEED. Two variables can be selected at once (for example property age and property type), with mean, median or number of observations shown in the table. There is also a choice of fuel (electricity or gas). The data spans 2008 to 2022.
Figures provided in the latest version of the tool (June 2024) are based on data used in the June 2023 National Energy Efficiency Data-Framework (NEED) publication. More information on the development of the framework, headline results and data quality are available in the publication. There are also additional detailed tables including distributions of consumption and estimates at local authority level. The data are also available as a comma separated value (csv) file.
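For users working with the published csv directly, the cross tabs described above can be approximated with a short script. This is a hypothetical sketch assuming pandas; the file name and column names (need_data.csv, PROP_AGE, PROP_TYPE, GAS_CONSUMPTION) are placeholders and should be checked against the actual NEED file.

import pandas as pd

need = pd.read_csv("need_data.csv")  # assumed local copy of the NEED csv

# Mean gas consumption cross-tabulated by property age and property type.
table = need.pivot_table(
    index="PROP_AGE",
    columns="PROP_TYPE",
    values="GAS_CONSUMPTION",
    aggfunc="mean",  # or "median", or "count" for the number of observations
)
print(table)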
If you have any queries or comments on these outputs please contact: energyefficiency.stats@energysecurity.gov.uk.
<p class="gem-c-attachment_metadata"><span class="gem-c-attachment_attribute">2.56 MB</span></p>
<p class="gem-c-attachment_metadata">This file may not be suitable for users of assistive technology.</p>
<details data-module="ga4-event-tracker" data-ga4-event='{"event_name":"select_content","type":"detail","text":"Request an accessible format.","section":"Request an accessible format.","index_section":1}' class="gem-c-details govuk-details govuk-!-margin-bottom-0" title="Request an accessible format.">
Request an accessible format.
If you use assistive technology (such as a screen reader) and need a version of this document in a more accessible format, please email <a href="mailto:alt.formats@energysecurity.gov.uk" target="_blank" class="govuk-link">alt.formats@energysecurity.gov.uk</a>. Please tell us what format you need. It will help us if you say what assistive technology you use.
In this Zenodo repository we present the results of using KROWN to benchmark popular RDF Graph Materialization systems such as RMLMapper, RMLStreamer, Morph-KGC, SDM-RDFizer, and Ontop (in materialization mode).
What is KROWN 👑?
KROWN 👑 is a benchmark for materialization systems to construct Knowledge Graphs from (semi-)heterogeneous data sources using declarative mappings such as RML.
Many benchmarks already exist for virtualization systems, e.g. GTFS-Madrid-Bench, NPD, and BSBM, which focus on complex queries with a single declarative mapping. However, materialization systems are unaffected by complex queries since their input is the dataset and the mappings to generate a Knowledge Graph. Some specialized datasets exist to benchmark specific limitations of materialization systems, such as duplicated or empty values in datasets, e.g. GENOMICS, but they do not cover all aspects of materialization systems. Therefore, it is hard to compare materialization systems with each other in general, which is where KROWN 👑 comes in!
Results
The raw results are available as ZIP archives; the analysis of the results is available in the spreadsheet results.ods.
Evaluation setup
We generated several scenarios using KROWN’s data generator and executed them 5 times with KROWN’s execution framework. All experiments were performed on Ubuntu 22.04 LTS machines (Linux 5.15.0, x86_64), each with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz, 48 GB of RAM, and 2 GB of swap memory. The output of each materialization system was set to N-Triples.
Materialization systems
We selected the most popular maintained materialization systems for constructing RDF graphs to perform our experiments with KROWN:
RMLMapper
RMLStreamer
Morph-KGC
SDM-RDFizer
OntopM (Ontop in materialization mode)
Note: KROWN is flexible and allows adding any other materialization system, see KROWN’s execution framework documentation for more information.
Scenarios
We consider the following scenarios:
Raw data: number of rows, columns and cell size
Duplicates & empty values: percentage of the data containing duplicates or empty values
Mappings: Triples Maps (TM), Predicate Object Maps (POM), Named Graph Maps (NG).
Joins: relations (1-N, N-1, N-M), conditions, and duplicates during joins
Note: KROWN is flexible and allows adding any other scenario, see KROWN’s data generator documentation for more information.
In the table below we list all parameter values we used to configure our scenarios:
| Scenario | Parameter values |
| --- | --- |
| Raw data: rows | 10K, 100K, 1M, 10M |
| Raw data: columns | 1, 10, 20, 30 |
| Raw data: cell size | 500, 1K, 5K, 10K |
| Duplicates: percentage | 0%, 25%, 50%, 75%, 100% |
| Empty values: percentage | 0%, 25%, 50%, 75%, 100% |
| Mappings: TMs + 5POMs | 1, 10, 20, 30 TMs |
| Mappings: 20TMs + POMs | 1, 3, 5, 10 POMs |
| Mappings: NG in SM | 1, 5, 10, 15 NGs |
| Mappings: NG in POM | 1, 5, 10, 15 NGs |
| Mappings: NG in SM/POM | 1/1, 5/5, 10/10, 15/15 NGs |
| Joins: 1-N relations | 1-1, 1-5, 1-10, 1-15 |
| Joins: N-1 relations | 1-1, 5-1, 10-1, 15-1 |
| Joins: N-M relations | 3-3, 3-5, 5-3, 10-5, 5-10 |
| Joins: join conditions | 1, 5, 10, 15 |
| Joins: join duplicates | 0, 5, 10, 15 |
biogas/biogas_0/supplydata197.csv
in step 2 where supply data are specified). This dataset is associated with the following publication: Hu, Y., W. Zhang, P. Tominac, M. Shen, D. Göreke, E. Martín-Hernández, M. Martín, G.J. Ruiz-Mercado, and V.M. Zavala. ADAM: A web platform for graph-based modeling and optimization of supply chains. COMPUTERS AND CHEMICAL ENGINEERING. Elsevier Science Ltd, New York, NY, USA, 165: 107911, (2022). CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
FOCUS ON LONDON 2010: INCOME AND SPENDING AT HOME. Household income in London far exceeds that of any other region in the UK. At £900 per week, London’s gross weekly household income is 15 per cent higher than the next highest region. Despite this, the costs to each household are also higher in the capital. Londoners pay a greater amount of their income in tax and national insurance than the UK average, as well as footing a higher bill for housing and everyday necessities. All of which leaves London households less well off than the headline figures suggest. This chapter, authored by Richard Walker in the GLA Intelligence Unit, begins with an analysis of income at both individual and household level, before discussing the distribution and sources of income. This is followed by a look at wealth and borrowing and, finally, a focus on expenditure, including an insight into the cost of housing in London compared with other regions in the UK. See other reports from this Focus on London series. REPORT: To view the report online, click on the link: Income and Spending Report PDF. PRESENTATION: This interactive presentation answers the question: who really is better off, an average London or UK household? The analysis takes into account available data from all types of income and expenditure. Click on the link to access the Prezi; a plain text version of the Prezi is also available. RANKINGS:
This interactive chart shows some key borough-level income and expenditure data and helps show the relationships between five datasets. Users can rank each of the indicators in turn (Borough rankings Tableau Chart). MAP: These interactive borough maps help to geographically present a range of income and expenditure data within London (Interactive Maps - Instant Atlas). DATA: All the data contained within the Income and Spending at Home report, as well as the data used to create the charts and maps, can be accessed in this spreadsheet (Report data). FACTS: Some interesting facts from the data…
● Five boroughs with the highest median gross weekly pay per person in 2009:
1. Kensington & Chelsea - £809
2. City of London - £767
3. Westminster - £675
4. Wandsworth - £636
5. Richmond - £623
32. Brent - £439
33. Newham - £422
● Five boroughs with the highest median weekly rent for a 2 bedroom property in October 2010:
1. Kensington & Chelsea - £550
2. Westminster - £500
3. City of London - £450
4. Camden - £375
5. Islington - £360
32. Havering - £183
33. Bexley - £173
● Five boroughs with the highest percentage of households that own their home outright in 2009:
1. Bexley - 38 per cent
2. Havering - 36 per cent
3. Richmond - 32 per cent
4. Bromley - 31 per cent
5. Barnet - 28 per cent
31. Tower Hamlets - 9 per cent
32. Southwark - 9 per cent
These datasets contain reviews from the Goodreads book review website, and a variety of attributes describing the items. Critically, these datasets capture multiple levels of user interaction, ranging from adding a book to a shelf to rating and reading it.
Metadata includes
reviews
add-to-shelf, read, review actions
book attributes: title, isbn
graph of similar books
Basic Statistics:
Items: 1,561,465
Users: 808,749
Interactions: 225,394,930
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Global Biotic Interactions: Interpreted Data Products
Global Biotic Interactions (GloBI, https://globalbioticinteractions.org, [1]) aims to facilitate access to existing species interaction records (e.g., predator-prey, plant-pollinator, virus-host). This data publication provides interpreted species interaction data products. These products are the result of a process in which versioned, existing species interaction datasets ([2]) are linked to the so-called GloBI Taxon Graph ([3]) and transformed into various aggregate formats (e.g., tsv, csv, neo4j, rdf/nquad, darwin core-ish archives). In addition, the applied name maps are included to make the applied taxonomic linking explicit.
Citation
--------
GloBI is made possible by researchers, collections, projects and institutions openly sharing their datasets. When using this data, please make sure to attribute these *original data contributors*, including citing the specific datasets in derivative work. Each species interaction record indexed by GloBI contains a reference and dataset citation. Also, a full list of all references can be found in the citations.csv/citations.tsv files in this publication. If you have ideas on how to make it easier to cite original datasets, please open/join a discussion via https://globalbioticinteractions.org or related projects.
To credit GloBI for more easily finding interaction data, please use the following citation to reference GloBI:
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
Bias and Errors
--------
As with any analysis and processing workflow, care should be taken to understand the bias and error propagation of data sources and related data transformation processes. The datasets indexed by GloBI are biased geospatially, temporally and taxonomically ([4], [5]). Also, the mapping of verbatim names from datasets to known name concepts may contain errors due to synonym mismatches, outdated name lists, typos or conflicting name authorities. Finally, bugs may introduce bias and errors in the resulting integrated data product.
To help better understand where bias and errors are introduced, only versioned data and code are used as inputs: the datasets ([2]), name maps ([3]) and integration software ([6]) are versioned so that the integration processes can be reproduced if needed. This way, the steps taken to compile an integrated data record can be traced and the sources of bias and errors can be more easily found.
This version was preceded by [7].
Contents
--------
README:
this file
citations.csv.gz:
contains data citations in a gzipped comma-separated values format.
citations.tsv.gz:
contains data citations in a gzipped tab-separated values format.
datasets.csv.gz:
contains list of indexed datasets in a gzipped comma-separated values format.
datasets.tsv.gz:
contains list of indexed datasets in a gzipped tab-separated values format.
verbatim-interactions.csv.gz:
contains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.
verbatim-interactions.tsv.gz:
contains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.
interactions.csv.gz:
contains species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
interactions.tsv.gz:
contains species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-interactions.csv.gz:
contains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-interactions.tsv.gz:
contains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are interpreted using taxonomic alignment workflows and may be different than those provided by the original sources.
refuted-verbatim-interactions.csv.gz:
contains refuted species interactions tabulated as pair-wise interactions in a gzipped comma-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.
refuted-verbatim-interactions.tsv.gz:
contains refuted species interactions tabulated as pair-wise interactions in a gzipped tab-separated values format. Included taxonomic names are *not* interpreted, but included as documented in their sources.
interactions.nq.gz:
contains species interactions expressed in the resource description framework in a gzipped rdf/quads format.
dwca-by-study.zip:
contains species interactions data as a Darwin Core Archive aggregated by study using a custom, occurrence level, association extension.
dwca.zip:
contains species interactions data as a Darwin Core Archive using a custom, occurrence level, association extension.
neo4j-graphdb.zip:
contains a neo4j v3.5.32 graph database snapshot containing a graph representation of the species interaction data.
taxonCache.tsv.gz:
contains hierarchies and identifiers associated with names from naming schemes in a gzipped tab-separated values format.
taxonMap.tsv.gz:
describes how names in existing datasets were mapped into existing naming schemes in a gzipped tab-separated values format.
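As a minimal, hedged example of working with the tabular products listed above, the interpreted interactions table can be loaded with pandas. The file name follows the contents list; the exact column layout should be checked against the file header before filtering.

import pandas as pd

interactions = pd.read_csv("interactions.tsv.gz", sep="\t", compression="gzip",
                           low_memory=False)
print(interactions.columns.tolist())  # inspect the available columns
print(interactions.head())            # first few pair-wise interaction records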
References
-----
[1] Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. doi: 10.1016/j.ecoinf.2014.08.005.
[2] Poelen, J. H. (2020) Global Biotic Interactions: Elton Dataset Cache. Zenodo. doi: 10.5281/ZENODO.3950557.
[3] Poelen, J. H. (2021). Global Biotic Interactions: Taxon Graph (Version 0.3.28) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.4451472
[4] Hortal, J. et al. (2015) Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity. Annual Review of Ecology, Evolution, and Systematics, 46(1), pp.523–549. doi: 10.1146/annurev-ecolsys-112414-054400.
[5] Cains, M. et al. (2017) Ivmooc 2017 - Gap Analysis Of Globi: Identifying Research And Data Sharing Opportunities For Species Interactions. Zenodo. Zenodo. doi: 10.5281/ZENODO.814978.
[6] Poelen, J. et al. (2022) globalbioticinteractions/globalbioticinteractions v0.24.6. Zenodo. doi: 10.5281/ZENODO.7327955.
[7] GloBI Community. (2023). Global Biotic Interactions: Interpreted Data Products hash://md5/89797a5a325ac5c50990581689718edf hash://sha256/946178b36c3ea2f2daa105ad244cf5d6cd236ec8c99956616557cf4e6666545b (0.6) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8284068
Content References
-----
hash://sha256/fb4e5f2d0288ab9936dc2298b0a7a22526f405e540e55c3de9c1cbd01afa9a00 citations.csv.gz
hash://sha256/12a154440230203b9d54f5233d4bda20c482d9d2a34a8363c6d7efdf4281ee47 citations.tsv.gz
hash://sha256/236882c394ff15eda4fe2e994a8f07cb9c0c42bd77d9a5339c9fac217b16a004 datasets.csv.gz
hash://sha256/236882c394ff15eda4fe2e994a8f07cb9c0c42bd77d9a5339c9fac217b16a004 datasets.tsv.gz
hash://sha256/42d50329eca99a6ded1b3fc63af5fa99b029b44ffeba79a02187311422c8710c dwca-by-study.zip
hash://sha256/77f7e1db20e977287ed6983ce7ea1d8b35bd88fe148372b9886ce62989bc2c22 dwca.zip
hash://sha256/4fb8f91d5638ef94ddc0b301e891629802e8080f01e3040bf3d0e819e0bfbe9e interactions.csv.gz
hash://sha256/c83ffa45ffc8e32f1933d23364c108fff92d8b9480401d54e2620a961ad9f0c5 interactions.nq.gz
hash://sha256/ce0d1ce3bebf94198996f471a03a15ad54a8c1aac5a5a6905e0f2fd4687427ac interactions.tsv.gz
hash://sha256/e4adf8c0fe545410c08e497d3189075a262f086977556c0f0fd229f8a2f39ffe neo4j-graphdb.zip
hash://sha256/8cbf6cd70ecbd724f1a4184aeeb0ba78b67747a627e5824d960fe98651871b34 refuted-interactions.csv.gz
hash://sha256/caa0f7bcf91531160fda7c4fc14020154ce6183215f77aacb8dbb0b823295022 refuted-interactions.tsv.gz
hash://sha256/29ed2703c0696d0d6ab1f1a00fcdce6da7c86d0a85ddd6e8bb00a3b1017daac9 refuted-verbatim-interactions.csv.gz
hash://sha256/5542136e32baa935ffa4834889f6af07989fab94db763ab01a3e135886a23556 refuted-verbatim-interactions.tsv.gz
hash://sha256/af742d945a1ecdb698926589fceb8147e99f491d7475b39e9b516ce1cfe2599b taxonCache.tsv.gz
hash://sha256/1a85b81dc9312994695e63966dec06858bbcd3c084f5044c29371b1c14f15c3d taxonMap.tsv.gz
hash://sha256/5f9ebc62be68f7ffb097c4ff168e6b7b45b1e835843c90a2af6b30d7e2a9eab1 verbatim-interactions.csv.gz
hash://sha256/d29704b6275a2f7aaffbd131d63009914bdbbf1d9bc2667ff4ce0713d586f4f6 verbatim-interactions.tsv.gz
hash://md5/735599feaf18a416a375d985a27f51bb citations.csv.gz
hash://md5/328049ca46682b8aee2611fe3ef2e3c9 citations.tsv.gz
hash://md5/8a645af66bf9cf8ddae0c3d6bc3ccb30 datasets.csv.gz
hash://md5/8a645af66bf9cf8ddae0c3d6bc3ccb30 datasets.tsv.gz
hash://md5/654eb9d9445ed382036f0e45398ec6bb dwca-by-study.zip
hash://md5/291e517d3ca72b727d85501a289d7d59 dwca.zip
hash://md5/4dbfb8605adce1c0e2165d5bdb918f95
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This tutorial will teach you how to take time-series data from many field sites and create a shareable online map, where clicking on a field location brings you to a page with interactive graph(s).
The tutorial can be completed with a sample dataset (provided via a Google Drive link within the document) or with your own time-series data from multiple field sites.
Part 1 covers how to make interactive graphs in Google Data Studio and Part 2 covers how to link data pages to an interactive map with ArcGIS Online. The tutorial will take 1-2 hours to complete.
An example interactive map and data portal can be found at: https://temple.maps.arcgis.com/apps/View/index.html?appid=a259e4ec88c94ddfbf3528dc8a5d77e8
Replication files for "Job-to-Job Mobility and Inflation"
Authors: Renato Faccini and Leonardo Melosi
Review of Economics and Statistics
Date: February 2, 2023

ORDER OF TOPICS
Section 1. We explain the code to replicate all the figures in the paper (except Figure 6).
Section 2. We explain how Figure 6 is constructed.
Section 3. We explain how the data are constructed.

SECTION 1
Replication_Main.m is used to reproduce all the figures of the paper except Figure 6. All the primitive variables are defined in the code and all the steps are commented in the code to facilitate the replication of our results. Replication_Main.m should be run in Matlab. The authors tested it on a DELL XPS 15 7590 laptop with the following characteristics:
Processor: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz 2.40 GHz
Installed RAM: 64.0 GB
System type: 64-bit operating system, x64-based processor
It took 2 minutes and 57 seconds for this machine to construct Figures 1, 2, 3, 4a, 4b, 5, 7a, and 7b. The following versions of Matlab and Matlab toolboxes were used for the test:
MATLAB Version: 9.7.0.1190202 (R2019b)
MATLAB License Number: 363305
Operating System: Microsoft Windows 10 Enterprise Version 10.0 (Build 19045)
Java Version: Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
MATLAB Version 9.7 (R2019b)
Financial Toolbox Version 5.14 (R2019b)
Optimization Toolbox Version 8.4 (R2019b)
Statistics and Machine Learning Toolbox Version 11.6 (R2019b)
Symbolic Math Toolbox Version 8.4 (R2019b)
The replication code uses auxiliary files and saves the figures in various subfolders:
\JL_models: contains the equations describing the model, including the observation equations, and the routine used to solve the model. To do so, the routine in this folder calls other routines located in some of the subfolders below.
\gensystoama: contains a set of codes that allow us to solve linear rational expectations models. We use the AMA solver. More information is provided in the file AMASOLVE.m. The codes in this subfolder were developed by Alejandro Justiniano.
\filters: contains the Kalman filter augmented with a routine to make sure that the zero lower bound constraint for the nominal interest rate is satisfied in every period of our sample.
\SteadyStateSolver: contains a set of routines that are used to solve the steady state of the model numerically.
\NLEquations: contains some of the equations of the model that are log-linearized using the symbolic toolbox of Matlab.
\NberDates: contains a set of routines that add shaded areas to graphs to denote NBER recessions.
\Graphics: contains useful codes enabling features to construct some of the graphs in the paper.
\Data: contains the data set used in the paper.
\Params: contains a spreadsheet with the values attributed to the model parameters.
\VAR_Estimation: contains the forecasts implied by the Bayesian VAR model of Section 2.
The outputs of Replication_Main.m are the figures of the paper, which are stored in the subfolder \Figures.

SECTION 2
The Excel file "Figure-6.xlsx" is used to create the charts in Figure 6. All three panels of the charts (A, B, and C) plot a measure of unexpected wage inflation against the unemployment rate, then fit separate linear regressions for the periods 1960-1985, 1986-2007, and 2008-2009. Unexpected wage inflation is given by the difference between wage growth and a measure of expected wage growth. In all three panels, the unemployment rate used is the civilian unemployment rate (UNRATE), seasonally adjusted, from the BLS. The sheet "Panel A" uses quarterly manufacturing sector average hourly earnings growth data, seasonally adjusted (CES3000000008), from the Bureau of Labor Statistics (BLS) Employment Situation report as the measure of wage inflation. The unexpected wage inflation is given by the difference between earnings growth at time t and the average of earnings growth across the previous four months. Growth rates are annualized quarterly values. The sheet "Panel B" uses quarterly Nonfarm Business Sector Compensation Per Hour, seasonally adjusted (COMPNFB), from the BLS Productivity and Costs report as its measure of wage inflation. As in Panel A, expected wage inflation is given by the... Visit https://dataone.org/datasets/sha256%3A44c88fe82380bfff217866cac93f85483766eb9364f66cfa03f1ebdaa0408335 for complete metadata about this dataset.
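As a rough illustration of the Panel A calculation described above (a Python sketch with placeholder values, not code from the replication package), unexpected wage inflation can be computed as wage growth at time t minus the average of the previous four observations:

import pandas as pd

# Hypothetical annualized wage growth series (percent), quarterly frequency.
wage_growth = pd.Series(
    [3.1, 2.8, 3.4, 3.0, 3.6, 2.9],
    index=pd.period_range("2006Q1", periods=6, freq="Q"),
)

expected = wage_growth.rolling(window=4).mean().shift(1)  # mean of the previous 4 periods
unexpected = wage_growth - expected                        # unexpected wage inflation
print(unexpected)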
Our consumer data is gathered and aggregated via surveys, digital services, and public data sources. We use powerful profiling algorithms to collect and ingest only fresh and reliable data points.
Our comprehensive data enrichment solution includes a variety of data sets that can help you address gaps in your customer data, gain a deeper understanding of your customers, and power superior client experiences.
1. Geography - City, State, ZIP, County, CBSA, Census Tract, etc.
2. Demographics - Gender, Age Group, Marital Status, Language, etc.
3. Financial - Income Range, Credit Rating Range, Credit Type, Net Worth Range, etc.
4. Persona - Consumer Type, Communication Preferences, Family Type, etc.
5. Interests - Content, Brands, Shopping, Hobbies, Lifestyle, etc.
6. Household - Number of Children, Number of Adults, IP Address, etc.
7. Behaviours - Brand Affinity, App Usage, Web Browsing, etc.
8. Firmographics - Industry, Company, Occupation, Revenue, etc.
9. Retail Purchase - Store, Category, Brand, SKU, Quantity, Price, etc.
10. Auto - Car Make, Model, Type, Year, etc.
11. Housing - Home Type, Home Value, Renter/Owner, Year Built, etc.
Consumer Graph Schema & Reach: Our data reach represents the total number of counts available within various categories and comprises attributes such as country location, MAU, DAU & Monthly Location Pings:
Data Export Methodology: Since we collect data dynamically, we provide the most updated data and insights via a best-suited method on a suitable interval (daily/weekly/monthly).
Consumer Graph Use Cases: 360-Degree Customer View: Get a comprehensive image of customers by means of internal and external data aggregation. Data Enrichment: Leverage online-to-offline consumer profiles to build holistic audience segments and improve campaign targeting through user data enrichment. Fraud Detection: Use multiple digital (web and mobile) identities to verify real users and detect anomalies or fraudulent activity. Advertising & Marketing: Understand audience demographics, interests, lifestyle, hobbies, and behaviors to build targeted marketing campaigns.
Here's the schema of Consumer Data:
person_id
first_name
last_name
age
gender
linkedin_url
twitter_url
facebook_url
city
state
address
zip
zip4
country
delivery_point_bar_code
carrier_route
walk_seuqence_code
fips_state_code
fips_country_code
country_name
latitude
longtiude
address_type
metropolitan_statistical_area
core_based+statistical_area
census_tract
census_block_group
census_block
primary_address
pre_address
streer
post_address
address_suffix
address_secondline
address_abrev
census_median_home_value
home_market_value
property_build+year
property_with_ac
property_with_pool
property_with_water
property_with_sewer
general_home_value
property_fuel_type
year
month
household_id
Census_median_household_income
household_size
marital_status
length+of_residence
number_of_kids
pre_school_kids
single_parents
working_women_in_house_hold
homeowner
children
adults
generations
net_worth
education_level
occupation
education_history
credit_lines
credit_card_user
newly_issued_credit_card_user
credit_range_new
credit_cards
loan_to_value
mortgage_loan2_amount
mortgage_loan_type
mortgage_loan2_type
mortgage_lender_code
mortgage_loan2_render_code
mortgage_lender
mortgage_loan2_lender
mortgage_loan2_ratetype
mortgage_rate
mortgage_loan2_rate
donor
investor
interest
buyer
hobby
personal_email
work_email
devices
phone
employee_title
employee_department
employee_job_function
skills
recent_job_change
company_id
company_name
company_description
technologies_used
office_address
office_city
office_country
office_state
office_zip5
office_zip4
office_carrier_route
office_latitude
office_longitude
office_cbsa_code
office_census_block_group
office_census_tract
office_county_code
company_phone
company_credit_score
company_csa_code
company_dpbc
company_franchiseflag
company_facebookurl
company_linkedinurl
company_twitterurl
company_website
company_fortune_rank
company_government_type
company_headquarters_branch
company_home_business
company_industry
company_num_pcs_used
company_num_employees
company_firm_individual
company_msa
company_msa_name
company_naics_code
company_naics_description
company_naics_code2
company_naics_description2
company_sic_code2
company_sic_code2_description
company_sic_code4
company_sic_code4_description
company_sic_code6
company_sic_code6_description
company_sic_code8
company_sic_code8_description
company_parent_company
company_parent_company_location
company_public_private
company_subsidiary_company
company_residential_business_code
company_revenue_at_side_code
company_revenue_range
company_revenue
company_sales_volume
company_small_business
company_stock_ticker
company_year_founded
company_minorityowned
company_female_owned_or_operated
company_franchise_code
company_dma
company_dma_name
company_hq_address
company_hq_city
company_hq_duns
company_hq_state
company_hq_zip5
company_hq_zip4
co...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data sets contain six sheets, which are: (1) samples and their statistical descriptions, (2) two grading methods and their co-occurrence maps, (3)-(6) rating results under four output conditions.
This dataset consists of weekly trajectory information for Gulf Stream Warm Core Rings (WCRs) that existed between 2021 and 2023. This work builds upon two previous datasets: (i) Warm Core Ring trajectory information from 2000 to 2010 -- Porter et al. (2022) (https://doi.org/10.5281/zenodo.7406675); (ii) Warm Core Ring trajectory information from 2011 to 2020 -- Silver et al. (2022a) (https://doi.org/10.5281/zenodo.6436380). Combining these three datasets (the previous two and this one), a total of 24 years of weekly Warm Core Ring trajectories are now available. An example of how to use such a dataset can be found in Silver et al. (2022b). The format of the dataset is similar to that of Porter et al. (2022) and Silver et al. (2022a), and the following description is adapted from those datasets. This dataset comprises individual files containing each ring's weekly center location and its surface area for 81 WCRs that existed and were tracked between January 1, 2021 and December 31, 2023 (5 WCRs formed in 2020 and still existed in 2021; 28 formed in 2021; 30 formed in 2022; 18 formed in 2023). Each Warm Core Ring is identified by a unique alphanumeric code 'WEyyyymmddX', where 'WE' represents a Warm Eddy (as identified in the analysis charts); 'yyyymmdd' is the year, month and day of formation; and the last character 'X' represents the sequential sighting (formation) of the eddy in that particular year. Continuity of a ring that passes from one year to the next is maintained by keeping its character from the previous year, with the initial letters of the next year's sequence absorbed by the carried-over rings. For example, the first ring formed in 2022 has a trailing letter of 'H', which signifies that a total of seven rings were carried over from 2021, were still present on January 1, 2022, and were assigned the initial seven letters (A, B, C, D, E, F and G). Each ring has its own netCDF (.nc) file named after its alphanumeric code. Each file contains 4 variables for every week: "Lon" - the ring center's longitude, "Lat" - the ring center's latitude, "Area" - the ring's size in km^2, and "Date" in days - the number of days since Jan 01, 0000. Five rings formed in the year 2020 that carried over into the year 2021 are included in this dataset: 'WE20200724Q', 'WE20200826R', 'WE20200911S', 'WE20200930T', and 'WE20201111W'. The two rings that formed in 2023 and carried over into the following year are included with their full trajectories going into the year 2024: 'WE20231006U' and 'WE20231211W'. The process of creating the WCR tracking dataset follows the same methodology as the previously generated WCR census (Gangopadhyay et al., 2019, 2020). The Jenifer Clark Gulf Stream Charts (Gangopadhyay et al., 2019) used to create this dataset were produced 2-3 times a week from 2021-2023; thus, we used approximately 360+ charts for the 3 years of analysis. All of these charts were reanalyzed between -75° and -55°W using QGIS 2.18.16 (2016) and geo-referenced on a WGS84 coordinate system (Decker, 1986).
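A minimal sketch of reading one ring file is shown below, assuming Python with the netCDF4 package and that "Date" follows a MATLAB-style day count since Jan 01, 0000; the file name is one of the ring codes listed above, and the conversion to calendar dates is an assumption to be verified against the data.

from netCDF4 import Dataset
from datetime import datetime, timedelta

with Dataset("WE20231006U.nc") as nc:
    lon = nc.variables["Lon"][:]        # ring center longitude
    lat = nc.variables["Lat"][:]        # ring center latitude
    area = nc.variables["Area"][:]      # ring surface area in km^2
    date_num = nc.variables["Date"][:]  # days since year 0000-01-01

# MATLAB-style datenum conversion: day 1 corresponds to 0000-01-01.
dates = [datetime.fromordinal(int(d)) - timedelta(days=366) for d in date_num]
print(dates[0], float(lon[0]), float(lat[0]), float(area[0]))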
https://www.cognitivemarketresearch.com/privacy-policy
According to Cognitive Market Research, the global semantic knowledge graphing market size is USD 1512.2 million in 2024 and will expand at a compound annual growth rate (CAGR) of 14.80% from 2024 to 2031.
North America held the largest share of the market, around 40% of global revenue, with a market size of USD 604.88 million in 2024, and will grow at a compound annual growth rate (CAGR) of 13.0% from 2024 to 2031.
Europe accounted for a share of over 30% of the global revenue, with a market size of USD 453.66 million in 2024.
Asia Pacific held around 23% of the global revenue with a market size of USD 347.81 million in 2024 and will grow at a compound annual growth rate (CAGR) of 16.8% from 2024 to 2031.
Latin America held around 5% of the global revenue with a market size of USD 75.61 million in 2024 and will grow at a compound annual growth rate (CAGR) of 14.2% from 2024 to 2031.
Middle East and Africa held around 2% of the global revenue with a market size of USD 30.24 million in 2024 and will grow at a compound annual growth rate (CAGR) of 14.5% from 2024 to 2031.
Natural language processing knowledge graphing held the highest growth rate in the semantic knowledge graphing market in 2024.
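As a quick arithmetic check of the figures above, the sketch below compounds the stated global CAGR from the 2024 base through 2031; this only illustrates the growth-rate math under the assumption of annual compounding and is not a figure reported by the source.

base_2024_musd = 1512.2           # global market size in 2024, USD million
cagr = 0.148                      # stated CAGR of 14.80%
projection_2031 = base_2024_musd * (1 + cagr) ** (2031 - 2024)
print(round(projection_2031, 1))  # roughly 3,974 million USD under this assumption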
Market Dynamics of Semantic Knowledge Graphing Market
Key Drivers of Semantic Knowledge Graphing Market
Growing Volumes of Structured, Semi-structured, and Unstructured Data to Increase the Global Demand
The global demand for semantic knowledge graphing is escalating in response to the exponential growth of structured, semi-structured, and unstructured data. Enterprises are inundated with vast amounts of data from diverse sources such as social media, IoT devices, and enterprise applications. Structured data from databases, semi-structured data like XML and JSON, and unstructured data from documents, emails, and multimedia files present significant challenges in terms of organization, analysis, and deriving actionable insights. Semantic knowledge graphing addresses these challenges by providing a unified framework for representing, integrating, and analyzing disparate data types. By leveraging semantic technologies, businesses can unlock the value hidden within their data, enabling advanced analytics, natural language processing, and knowledge discovery. As organizations increasingly recognize the importance of harnessing data for strategic decision-making, the demand for semantic knowledge graphing solutions continues to surge globally.
Demand for Contextual Insights to Propel the Growth
The burgeoning demand for contextual insights is propelling the growth of semantic knowledge graphing solutions. In today's data-driven landscape, businesses are striving to extract deeper contextual meaning from their vast datasets to gain a competitive edge. Semantic knowledge graphing enables organizations to connect disparate data points, understand relationships, and derive valuable insights within the appropriate context. This contextual understanding is crucial for various applications such as personalized recommendations, predictive analytics, and targeted marketing campaigns. By leveraging semantic technologies, companies can not only enhance decision-making processes but also improve customer experiences and operational efficiency. As industries across sectors increasingly recognize the importance of contextual insights in driving innovation and business success, the adoption of semantic knowledge graphing solutions is poised to witness significant growth. This trend underscores the pivotal role of semantic technologies in unlocking the true potential of data for strategic advantage in today's dynamic marketplace.
Restraint Factors Of Semantic Knowledge Graphing Market
Stringent Data Privacy Regulations to Hinder the Market Growth
Stringent data privacy regulations present a significant hurdle to the growth of the Semantic Knowledge Graphing market. Regulations such as GDPR (General Data Protection Regulation) in Europe and CCPA (California Consumer Privacy Act) in the United States impose strict requirements on how organizations collect, store, process, and share personal data. Compliance with these regulations necessitates robust data protection measures, including anonymization, encryption, and access controls, which can complicate the implementation of semantic knowledge graphing systems. Moreover, concerns about data breach...
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.
Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-numbered weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.
Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.
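A hedged sketch of how such node and edge data could be assembled into a weighted, undirected NetworkX graph is shown below; the in-memory table and column names are assumptions for illustration, not the dataset's documented file format.

import networkx as nx
import pandas as pd

# Hypothetical edge table: company indices, one of the 15 relation types, and
# the real-numbered weight approximating that type's similarity level.
edges = pd.DataFrame({
    "src": [0, 0, 1],
    "dst": [1, 2, 2],
    "edge_type": [3, 7, 3],
    "weight": [0.8, 0.1, 0.5],
})

G = nx.Graph()
for row in edges.itertuples(index=False):
    G.add_edge(row.src, row.dst, edge_type=row.edge_type, weight=row.weight)

print(G.number_of_nodes(), G.number_of_edges())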
Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated three evaluation tasks:
Background and Motivation
In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.
While there is no universally agreed definition of company similarity, researchers and practitioners in the PE industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.
In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Companies that are similar to the seed companies can then be searched for in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
However, a graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns to facilitate similar company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.
Source Code and Tutorial:
https://github.com/llcresearch/CompanyKG2
Paper: to be published
2.2 Full Mall Graph Clustering

Train

The sample training data for this problem is a set of 106981 fingerprints (task2_train_fingerprints.json) and some edges between them. We have provided files that indicate three different edge types, all of which should be treated differently.
task2_train_steps.csv indicates edges that connect subsequent steps within a trajectory. These edges should be highly trusted as they indicate a certainty that two fingerprints were recorded from the same floor.
task2_train_elevations.csv indicates the opposite of the steps. These elevations indicate that the fingerprints are almost definitely from a different floor. You can thus extrapolate that if fingerprint $N$ from trajectory $n$ is on a different floor to fingerprint $M$ from trajectory $m$, then all other fingerprints in both trajectories $m$ and $n$ must also be on separate floors.
task2_train_estimated_wifi_distances.csv are the pre-computed distances that we have calculated using our own distance metric. This metric is imperfect and as such we know that many of these edges will be incorrect (i.e. they will connect two floors together). We suggest that initially you use the edges in this file to construct your initial graph and compute some solution. However, if you get a high score on task1 then you might consider computing your own wifi distances to build a graph.
Your graph can be at one of two levels of detail, either trajectory level or fingerprint level; you can choose what representation you want to use, but ultimately we want to know the trajectory clusters. Trajectory level would have every node as a trajectory, and edges between nodes would occur if fingerprints in their trajectories had high similarity. Fingerprint level would have each fingerprint as a node. You can look up the trajectory id of a fingerprint using task2_train_lookup.json to convert between representations.
To help you debug and train your solution we have provided a ground truth for some of the trajectories in task2_train_GT.json. In this file the keys are the trajectory ids (the same as in task2_train_lookup.json) and the values are the real floor id of the building.
Test

The test set is in the exact same format as the training set (for a separate building; we weren't going to make it that easy ;) ), but we haven't included the equivalent ground truth file. This will be withheld to allow us to score your solution.
Points to consider
- When doing this on real data we do not know the exact number of floors to expect, so your model will need to decide this for itself as well. For this data, do not expect to find more than 20 floors or fewer than 3 floors.
- Sometimes in balcony areas the similarity between fingerprints on different floors can be deceivingly high. In these cases it may be wise to try to rely on the graph information rather than the individual similarity (e.g. what is the similarity of the other neighbour nodes to this candidate other-floor node?).
- To the best of our knowledge there are no outlier fingerprints in the data that do not belong to the building. Every fingerprint belongs to a floor.
2.3 Loading the data

In this section we provide some example code to open the files and construct both types of graph.
import os
import json
import csv

import networkx as nx
from tqdm import tqdm

path_to_data = "task2_for_participants/train"

with open(os.path.join(path_to_data, "task2_train_estimated_wifi_distances.csv")) as f:
    wifi = []
    reader = csv.DictReader(f)
    for line in tqdm(reader):
        wifi.append([line['id1'], line['id2'], float(line['estimated_distance'])])

with open(os.path.join(path_to_data, "task2_train_elevations.csv")) as f:
    elevs = []
    reader = csv.DictReader(f)
    for line in tqdm(reader):
        elevs.append([line['id1'], line['id2']])

with open(os.path.join(path_to_data, "task2_train_steps.csv")) as f:
    steps = []
    reader = csv.DictReader(f)
    for line in tqdm(reader):
        steps.append([line['id1'], line['id2'], float(line['displacement'])])

fp_lookup_path = os.path.join(path_to_data, "task2_train_lookup.json")
gt_path = os.path.join(path_to_data, "task2_train_GT.json")

with open(fp_lookup_path) as f:
    fp_lookup = json.load(f)

with open(gt_path) as f:
    gt = json.load(f)
Fingerprint graph

This is one way to construct the fingerprint-level graph, where each node in the graph is a fingerprint. We have added edge weights that correspond to the estimated/true distances from the wifi and pdr edges respectively. We have also added elevation edges to indicate this relationship. You might want to explicitly enforce that there are none of these edges (or any valid elevation edge between trajectories) when developing your solution.
G = nx.Graph()

# Step edges ("s"): subsequent fingerprints within the same trajectory.
for id1, id2, dist in tqdm(steps):
    G.add_edge(id1, id2, ty="s", weight=dist)

# Wifi edges ("w"): estimated distances from the provided metric.
for id1, id2, dist in tqdm(wifi):
    G.add_edge(id1, id2, ty="w", weight=dist)

# Elevation edges ("e"): fingerprints almost certainly on different floors.
for id1, id2 in tqdm(elevs):
    G.add_edge(id1, id2, ty="e")
Trajectory graph

The trajectory graph is arguably not as simple, as you need to think of a way to represent the many wifi connections between trajectories. In the example graph below we just take the mean distance as a weight, but is this really the best representation?
B = nx.Graph()

# Get all the trajectory ids from the lookup
valid_nodes = set(fp_lookup.values())
for node in valid_nodes:
    B.add_node(node)

# Either add an edge or append the distance to the edge data
for id1, id2, dist in tqdm(wifi):
    if not B.has_edge(fp_lookup[str(id1)], fp_lookup[str(id2)]):
        B.add_edge(fp_lookup[str(id1)],
                   fp_lookup[str(id2)],
                   ty="w", weight=[dist])
    else:
        B[fp_lookup[str(id1)]][fp_lookup[str(id2)]]['weight'].append(dist)

# Compute the mean edge weight
for edge in B.edges(data=True):
    B[edge[0]][edge[1]]['weight'] = sum(B[edge[0]][edge[1]]['weight']) / len(B[edge[0]][edge[1]]['weight'])

# If you have made a wifi connection between trajectories with an elev, delete the edge
for id1, id2 in tqdm(elevs):
    if B.has_edge(fp_lookup[str(id1)], fp_lookup[str(id2)]):
        B.remove_edge(fp_lookup[str(id1)],
                      fp_lookup[str(id2)])
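Below is one possible clustering baseline on the trajectory graph B constructed above, using modularity-based community detection from NetworkX. It is a sketch rather than a reference solution: the stored weights are mean distances (smaller means more similar), so they are inverted into similarity-style weights first, and the number of detected communities stands in for the number of floors.

from networkx.algorithms.community import greedy_modularity_communities

H = B.copy()
for u, v, data in H.edges(data=True):
    # Convert the mean wifi distance into a similarity-style weight.
    data["weight"] = 1.0 / (1.0 + data["weight"])

communities = greedy_modularity_communities(H, weight="weight")
clusters = {traj: floor for floor, nodes in enumerate(communities) for traj in nodes}
print(len(communities), "candidate floors")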
Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graph-like in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problems using GNNs. To address this issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of factory issues. By utilizing this dataset, researchers can employ GNNs to address numerous supply chain problems, thereby advancing the field of supply chain analytics and planning.
Dataset · GitHub · arXiv · PDF on arXiv
Read the paper to learn more details and data statistics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PheKnowLator (PKT) Human Disease KG (PKT-KG) was built to model mechanisms of human disease, which includes the Central Dogma and represents multiple biological scales of organization including molecular, cellular, tissue, and organ. The knowledge representation was designed in collaboration with a PhD-level molecular biologist (Figure).
The PKT Human Disease KG was constructed using 12 OBO Foundry ontologies, 31 Linked Open Data sets, and results from two large-scale experiments (Supplementary Material). The 12 OBO Foundry ontologies were selected to represent chemicals and vaccines (i.e., ChEBI and Vaccine Ontology), cells and cell lines (i.e., Cell Ontology, Cell Line Ontology), gene/gene product attributes (i.e., Gene Ontology), phenotypes and diseases (i.e., Human Phenotype Ontology, Mondo Disease Ontology), proteins, including complexes and isoforms (i.e., Protein Ontology), pathways (i.e., Pathway Ontology), types and attributes of biological sequences (i.e., Sequence Ontology), and anatomical entities (Uberon ontology). The RO is used to provide relationships between the core OBO Foundry ontologies and database entities.
The PKT Human Disease KG contained 18 node types and 33 edge types. Note that the number of node and edge types reflects those that are explicitly added to the core set of OBO Foundry ontologies and does not take into account the node and edge types provided by the ontologies. These node and edge types were used to construct 12 different PKT Human Disease benchmark KGs by altering the Knowledge Model (i.e., class- vs. instance-based), Relation Strategy (i.e., standard vs. inverse relations), and Semantic Abstraction (i.e., OWL-NETS (yes/no) with and without Knowledge Model harmonization [OWL-NETS Only vs. OWL-NETS + Harmonization]) parameters. Benchmarks within the PheKnowLator ecosystem are different versions of a KG that can be built under alternative knowledge models, relation strategies, and with or without semantic abstraction. They provide users with the ability to evaluate different modeling decisions (based on the prior mentioned parameters) and to examine the impact of these decisions on different downstream tasks.
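For reference, one reading of how these parameters combine into the 12 benchmark builds is sketched below (2 knowledge models x 2 relation strategies x 3 semantic abstraction settings); the option labels are paraphrased from the description above, not taken from the build scripts.

from itertools import product

knowledge_models = ["class-based", "instance-based"]
relation_strategies = ["standard relations", "inverse relations"]
semantic_abstraction = ["no OWL-NETS", "OWL-NETS only", "OWL-NETS + harmonization"]

builds = list(product(knowledge_models, relation_strategies, semantic_abstraction))
for km, rel, abstraction in builds:
    print(f"{km} | {rel} | {abstraction}")
print(len(builds), "benchmark KGs")  # 12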
The Figures and Tables explaining attributes in the builds can be found here.
The benchmarks were originally built and stored using Google Cloud Platform (GCP) resources. Details and a complete description of this process can be found on GitHub (here). Note that we have developed this Zenodo-based archive for the builds. While the original GCP resources contained all of the resources needed to generate the builds, due to the file size upload limits associated with each archive, we have limited the uploaded files to the KGs, associated metadata, and log files. The list of resources, including their URLs and dates of download, can all be found in the logs associated with each build.
🗂 For additional information on the KG file types please see the following Wiki page, which is also available as a download from this repository (PheKnowLator_HumanDiseaseKG_Output_FileInformation.xlsx).
Class-based Builds
Standard Relations
Inverse Relations
Instance-based Builds
Standard Relations
Inverse Relations