Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset of our EMNLP 2021 paper:
Graphine: A Dataset for Graph-aware Terminology Definition Generation.
Please read the "readme.md" included in the dataset for a description of its format.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Contains a graph representation of the English dictionary, where each word is a node and an edge connects a word to every word that appears in its definition. The JSON file has the form:
```json
{"word": ["Each", "word", "in", "its", "definition"],
 ...}
```
Use this dataset to explore the structure of natural language!
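As a rough illustration, the sketch below loads such a JSON file and builds a directed graph with networkx; the file name dictionary_graph.json and the exact key layout are assumptions, not part of the dataset description.

```python
# Sketch only: assumes a JSON dict mapping each word to the list of words
# appearing in its definition, stored in a file named dictionary_graph.json.
import json

import networkx as nx

with open("dictionary_graph.json", encoding="utf-8") as f:
    definitions = json.load(f)

# Draw an edge word -> w for every word w appearing in the definition of word.
G = nx.DiGraph()
for word, definition_words in definitions.items():
    for w in definition_words:
        G.add_edge(word, w)

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```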
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Zhang et al. (https://link.springer.com/article/10.1140/epjb/e2017-80122-8) suggest a temporal random network whose changing dynamics follow a Markov process, allowing for a continuous-time network history that moves from a static definition of a random graph with a fixed number of nodes n and edge probability p to a temporal one. Defining lambda as the probability per time granule that a new edge appears and mu as the probability per time granule that an existing edge disappears, Zhang et al. show that the equilibrium probability of an edge is p = lambda / (lambda + mu). Our implementation, a Python package that we refer to as RandomDynamicGraph (https://github.com/ScanLab-ossi/DynamicRandomGraphs), generates large-scale dynamic random graphs according to the defined density. The package focuses on massive data generation; it uses efficient math calculations, writes to file instead of in-memory when datasets are too large, and supports multi-processing. Please note that the datetime values are arbitrary.
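For intuition, here is a minimal simulation sketch of the Markov edge dynamics described above (not the RandomDynamicGraph package itself): each missing edge appears with probability lambda per time granule and each existing edge disappears with probability mu, so the density should converge to lambda / (lambda + mu).

```python
# Toy simulation of the edge birth/death process; parameters are illustrative.
import itertools
import random

n, lam, mu, steps = 50, 0.02, 0.08, 2000
pairs = list(itertools.combinations(range(n), 2))
edges = set()

for _ in range(steps):
    for pair in pairs:
        if pair in edges:
            if random.random() < mu:      # existing edge disappears
                edges.remove(pair)
        elif random.random() < lam:       # missing edge appears
            edges.add(pair)

density = len(edges) / len(pairs)
print(f"simulated density {density:.3f} vs. lam/(lam+mu) = {lam/(lam+mu):.3f}")
```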
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
We provide an academic graph based on a snapshot of the Microsoft Academic Graph from 26.05.2021. The Microsoft Academic Graph (MAG) is a large-scale dataset containing information about scientific publication records, their citation relations, as well as authors, affiliations, journals, conferences and fields of study. We acknowledge the Microsoft Academic Graph using the URI https://aka.ms/msracad. For more information regarding schema and the entities present in the original dataset please refer to: MAG schema.
MAG for Heterogeneous Graph Learning We use a recent version of MAG from May 2021 and extract all relevant entities to build a graph that can be directly used for heterogeneous graph learning (node classification, link prediction, etc.). The graph contains all English papers, published after 1900, that have been cited at least 5 times per year since the time of publishing. For fairness, we set a constant citation bound of 100 for papers published before 2000. We further include two smaller subgraphs, one containing computer science papers and one containing medicine papers.
Nodes and features. We define the following nodes:
paper with mag_id, graph_id, normalized title, year of publication, citations, and a 128-dimensional title embedding built using word2vec. No. of papers: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);
author with mag_id, graph_id, normalized name, citations. No. of authors: 6,363,201 (all), 1,797,980 (medicine), 557,078 (computer science);
field with mag_id, graph_id, level, citations, where level denotes the hierarchical level of the field and 0 is the highest level (e.g. computer science). No. of fields: 199,457 (all), 83,970 (medicine), 45,454 (computer science);
affiliation with mag_id, graph_id, citations. No. of affiliations: 19,421 (all), 12,103 (medicine), 10,139 (computer science);
venue with mag_id, graph_id, citations, and type denoting whether it is a conference or a journal. No. of venues: 24,608 (all), 8,514 (medicine), 9,893 (computer science).
Edges. We define the following edges:
author is_affiliated_with affiliation. No. of author-affiliation edges: 8,292,253 (all), 2,265,728 (medicine), 665,931 (computer science);
author is_first/last/other paper. No. of author-paper edges: 24,907,473 (all), 5,081,752 (medicine), 1,269,485 (computer science);
paper has_citation_to paper. No. of paper-paper citation edges: 142,684,074 (all), 16,808,837 (medicine), 4,152,804 (computer science);
paper conference/journal_published_at venue. No. of paper-venue edges: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);
paper has_field_L0/L1/L2/L3/L4 field. No. of paper-field edges: 47,531,366 (all), 9,403,708 (medicine), 3,341,395 (computer science);
field is_in field. No. of field-field edges: 339,036 (all), 138,304 (medicine), 83,245 (computer science).
We further include a reverse edge for each edge type defined above that is denoted with the prefix rev_ and can be removed based on the downstream task.
Data structure. The nodes and their respective features are provided as separate .tsv files where each feature represents a column. The edges are provided as a pickled Python dictionary with schema:
{target_type: {source_type: {edge_type: {target_id: {source_id: {time}}}}}}
We provide three compressed ZIP archives, one for each subgraph (all, medicine, computer science); the archive for the complete graph is split into 500 MB chunks. Each archive contains the separate node features and the edge dictionary.
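A possible way to read one extracted archive is sketched below; the file names (paper.tsv, edges.pkl) are assumptions based on the description above, not documented names.

```python
# Sketch, assuming per-node-type TSV files and a pickled edge dictionary
# with schema {target_type: {source_type: {edge_type: {target_id: {source_id: {time}}}}}}.
import pickle

import pandas as pd

papers = pd.read_csv("paper.tsv", sep="\t")   # one row per paper, one column per feature
authors = pd.read_csv("author.tsv", sep="\t")

with open("edges.pkl", "rb") as f:
    edges = pickle.load(f)

# Example: count paper -> paper citation edges, if that edge type is present.
citations = edges.get("paper", {}).get("paper", {}).get("has_citation_to", {})
n_citations = sum(len(sources) for sources in citations.values())
print("papers:", len(papers), "authors:", len(authors), "citation edges:", n_citations)
```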
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Balance Sheet: Tier 1 Leverage Capital (PCA Definition) was 2221887.22900 Mil. of U.S. $ in October of 2024, according to the United States Federal Reserve. Historically, United States - Balance Sheet: Tier 1 Leverage Capital (PCA Definition) reached a record high of 2221887.22900 in October of 2024 and a record low of 2114377.65500 in April of 2023. Trading Economics provides the current actual value, an historical data chart and related indicators for United States - Balance Sheet: Tier 1 Leverage Capital (PCA Definition) - last updated from the United States Federal Reserve on May of 2025.
These data were used to examine grammatical structures and patterns within a set of geospatial glossary definitions. The objectives of our study were to analyze the semantic structure of input definitions, use this information to build triple structures of RDF graph data, upload our lexicon to a knowledge graph software, and perform SPARQL queries on the data. Upon completion of this study, SPARQL queries were shown to effectively retrieve graph triples that carry semantic significance. These data represent and characterize the lexicon of our input text, which is used to form graph triples. These data were collected in 2024 by passing text through multiple Python programs utilizing spaCy (a natural language processing library) and its pre-trained English transformer pipeline. Before the data were processed by the Python programs, input definitions were first rewritten as natural language and formatted as tabular data. Passages were then tokenized and characterized by their part-of-speech, tag, dependency relation, dependency head, and lemma. Each word within the lexicon was tokenized. A stop-words list was utilized only to remove punctuation and symbols from the text, excluding hyphenated words (e.g., bowl-shaped), which remained as such. The tokens’ lemmas were then aggregated and totaled to find their recurrences within the lexicon. This procedure was repeated for tokenizing noun chunks using the same glossary definitions.
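The snippet below sketches the kind of spaCy processing described above (token attributes, lemma counts, noun chunks); the example sentence is illustrative, and en_core_web_trf is assumed to be the English transformer pipeline referred to.

```python
# Illustrative spaCy sketch; not the study's actual programs.
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_trf")  # pre-trained English transformer pipeline
doc = nlp("A basin is a bowl-shaped depression in the land surface or ocean floor.")

# Token text, part-of-speech, tag, dependency relation, dependency head, lemma.
rows = [(t.text, t.pos_, t.tag_, t.dep_, t.head.text, t.lemma_)
        for t in doc if not (t.is_punct or t.is_space)]
lemma_counts = Counter(lemma for *_, lemma in rows)

print(rows[:3])
print(lemma_counts.most_common(5))
print([chunk.text for chunk in doc.noun_chunks])  # repeated for noun chunks
```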
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for M3, Alternate Definition 2 for Germany (MAM3A2DEM189S) from Jan 1974 to Dec 1998 about M3, Germany, and monetary aggregates.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States - Balance Sheet: Total Risk Based Capital (PCA Definition) was 2319171.53100 Mil. of U.S. $ in January of 2025, according to the United States Federal Reserve. Historically, United States - Balance Sheet: Total Risk Based Capital (PCA Definition) reached a record high of 2319171.53100 in October of 2024 and a record low of 322350.26900 in January of 1990. Trading Economics provides the current actual value, an historical data chart and related indicators for United States - Balance Sheet: Total Risk Based Capital (PCA Definition) - last updated from the United States Federal Reserve on July of 2025.
CompanyKG is a heterogeneous graph consisting of 1,169,931 nodes and 50,815,503 undirected edges, with each node representing a real-world company and each edge signifying a relationship between the connected pair of companies.
Edges: We model 15 different inter-company relations as undirected edges, each of which corresponds to a unique edge type. These edge types capture various forms of similarity between connected company pairs. Associated with each edge of a certain type, we calculate a real-valued weight as an approximation of the similarity level of that type. It is important to note that the constructed edges do not represent an exhaustive list of all possible edges due to incomplete information. Consequently, this leads to a sparse and occasionally skewed distribution of edges for individual relation/edge types. Such characteristics pose additional challenges for downstream learning tasks. Please refer to our paper for a detailed definition of edge types and weight calculations.
Nodes: The graph includes all companies connected by edges defined previously. Each node represents a company and is associated with a descriptive text, such as "Klarna is a fintech company that provides support for direct and post-purchase payments ...". To comply with privacy and confidentiality requirements, we encoded the text into numerical embeddings using four different pre-trained text embedding models: mSBERT (multilingual Sentence BERT), ADA2, SimCSE (fine-tuned on the raw company descriptions) and PAUSE.
Evaluation Tasks. The primary goal of CompanyKG is to develop algorithms and models for quantifying the similarity between pairs of companies. In order to evaluate the effectiveness of these methods, we have carefully curated four evaluation tasks:
Similarity Prediction (SP). To assess the accuracy of pairwise company similarity, we constructed the SP evaluation set comprising 3,219 pairs of companies that are labeled either as positive (similar, denoted by "1") or negative (dissimilar, denoted by "0"). Of these pairs, 1,522 are positive and 1,697 are negative.
Competitor Retrieval (CR). Each sample contains one target company and one of its direct competitors. The set contains 76 distinct target companies, each with 5.3 annotated competitors on average. For a given target company A with N direct competitors in this CR evaluation set, we expect a competent method to retrieve all N competitors when searching for similar companies to A.
Similarity Ranking (SR) is designed to assess the ability of any method to rank candidate companies (numbered 0 and 1) based on their similarity to a query company. Paid human annotators, with backgrounds in engineering, science, and investment, were tasked with determining which candidate company is more similar to the query company. This resulted in an evaluation set comprising 1,856 rigorously labeled ranking questions. We retained 20% (368 samples) of this set as a validation set for model development.
Edge Prediction (EP) evaluates a model's ability to predict future or missing relationships between companies, providing forward-looking insights for investment professionals. The EP dataset, derived (and sampled) from new edges collected between April 6, 2023, and May 25, 2024, includes 40,000 samples, with edges not present in the pre-existing CompanyKG (a snapshot up until April 5, 2023).
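As a rough illustration of the SP task, the sketch below scores company pairs with the cosine similarity of their node embeddings and evaluates the result with ROC-AUC; the file names and array layout are assumptions, not the CompanyKG data format or API.

```python
# Hypothetical SP evaluation sketch (not the CompanyKG toolkit).
import numpy as np
from sklearn.metrics import roc_auc_score

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

embeddings = np.load("company_embeddings.npy")                    # assumed: (num_companies, dim)
sp_pairs = np.loadtxt("sp_pairs.csv", delimiter=",", dtype=int)   # assumed: rows of (i, j, label)

scores = [cosine(embeddings[i], embeddings[j]) for i, j, _ in sp_pairs]
labels = [label for _, _, label in sp_pairs]
print("SP ROC-AUC:", roc_auc_score(labels, scores))
```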
Background and Motivation
In the investment industry, it is often essential to identify similar companies for a variety of purposes, such as market/competitor mapping and Mergers & Acquisitions (M&A). Identifying comparable companies is a critical task, as it can inform investment decisions, help identify potential synergies, and reveal areas for growth and improvement. The accurate quantification of inter-company similarity, also referred to as company similarity quantification, is the cornerstone to successfully executing such tasks. However, company similarity quantification is often a challenging and time-consuming process, given the vast amount of data available on each company, and the complex and diversified relationships among them.
While there is no universally agreed definition of company similarity, researchers and practitioners in the private equity (PE) industry have adopted various criteria to measure similarity, typically reflecting the companies' operations and relationships. These criteria can embody one or more dimensions such as industry sectors, employee profiles, keywords/tags, customer reviews, financial performance, co-appearance in news, and so on. Investment professionals usually begin with a limited number of companies of interest (a.k.a. seed companies) and require an algorithmic approach to expand their search to a larger list of companies for potential investment.
In recent years, transformer-based Language Models (LMs) have become the preferred method for encoding textual company descriptions into vector-space embeddings. Then companies that are similar to the seed companies can be searched in the embedding space using distance metrics like cosine similarity. The rapid advancements in Large LMs (LLMs), such as GPT-3/4 and LLaMA, have significantly enhanced the performance of general-purpose conversational models. These models, such as ChatGPT, can be employed to answer questions related to similar company discovery and quantification in a Q&A format.
However, a graph is still the most natural choice for representing and learning diverse company relations due to its ability to model complex relationships between a large number of entities. By representing companies as nodes and their relationships as edges, we can form a Knowledge Graph (KG). Utilizing this KG allows us to efficiently capture and analyze the network structure of the business landscape. Moreover, KG-based approaches allow us to leverage powerful tools from network science, graph theory, and graph-based machine learning, such as Graph Neural Networks (GNNs), to extract insights and patterns that facilitate similar-company analysis. While there are various company datasets (mostly commercial/proprietary and non-relational) and graph datasets available (mostly for single link/node/graph-level predictions), there is a scarcity of datasets and benchmarks that combine both to create a large-scale KG dataset expressing rich pairwise company relations.
Source Code and Tutorial: https://github.com/llcresearch/CompanyKG2
Paper: to be published
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains a list of 186 Digital Humanities projects leveraging information visualisation methods. Each project has been classified according to visualisation and interaction techniques, narrativity and narrative solutions, domain, methods for the representation of uncertainty and interpretation, and the employment of critical and custom approaches to visually represent humanities data.
The project_id column contains unique internal identifiers assigned to each project. Meanwhile, the last_access column records the most recent date (in DD/MM/YYYY format) on which each project was reviewed based on the web address specified in the url column.
The remaining columns can be grouped into descriptive categories aimed at characterising projects according to different aspects:
Narrativity. It reports the presence of narratives employing information visualisation techniques. Here, the term narrative encompasses both author-driven linear data stories and more user-directed experiences where the narrative sequence is composed of user exploration [1]. We define 2 columns to identify projects using visualisation techniques in narrative, or non-narrative sections. Both conditions can be true for projects employing visualisations in both contexts. Columns:
non_narrative (boolean)
narrative (boolean)
Domain. The humanities domain to which the project is related. We rely on [2] and the chapters of the first part of [3] to abstract a set of general domains. Column:
domain (categorical):
History and archaeology
Art and art history
Language and literature
Music and musicology
Multimedia and performing arts
Philosophy and religion
Other: both extra-list domains and cases of collections without a unique or specific thematic focus.
Visualisation of uncertainty and interpretation. Building upon the frameworks proposed by [4] and [5], a set of categories was identified, highlighting a distinction between precise and impressional communication of uncertainty. Precise methods explicitly represent quantifiable uncertainty such as missing, unknown, or uncertain data, precisely locating and categorising it using visual variables and positioning. Two sub-categories are interactive distinction, when uncertain data is not visually distinguishable from the rest of the data but can be dynamically isolated or included/excluded categorically through interaction techniques (usually filters); and visual distinction, when uncertainty visually “emerges” from the representation by means of dedicated glyphs and spatial or visual cues and variables. On the other hand, impressional methods communicate the constructed and situated nature of data [6], exposing the interpretative layer of the visualisation and indicating more abstract and unquantifiable uncertainty using graphical aids or interpretative metrics. Two sub-categories are: ambiguation, when the use of graphical expedients—like permeable glyph boundaries or broken lines—visually conveys the ambiguity of a phenomenon; and interpretative metrics, when expressive, non-scientific, or non-punctual metrics are used to build a visualisation. Column:
uncertainty_interpretation (categorical):
Interactive distinction
Visual distinction
Ambiguation
Interpretative metrics
Critical adaptation. We identify projects in which, for at least one visualisation, the following criteria are fulfilled: 1) avoid uncritical repurposing of prepackaged, generic-use, or ready-made solutions; 2) be tailored and unique to reflect the peculiarities of the phenomena at hand; 3) avoid extreme simplifications, embracing and depicting complexity to promote time-spending visualisation-based inquiry. Column:
critical_adaptation (boolean)
Non-temporal visualisation techniques. We adopt and partially adapt the terminology and definitions from [7]. A column is defined for each type of visualisation and accounts for its presence within a project, also including stacked layouts and more complex variations. Columns and inclusion criteria:
plot (boolean): visual representations that map data points onto a two-dimensional coordinate system.
cluster_or_set (boolean): sets or cluster-based visualisations used to unveil possible inter-object similarities.
map (boolean): geographical maps used to show spatial insights. While we do not specify the variants of maps (e.g., pin maps, dot density maps, flow maps, etc.), we make an exception for maps where each data point is represented by another visualisation (e.g., a map where each data point is a pie chart) by accounting for the presence of both in their respective columns.
network (boolean): visual representations highlighting relational aspects through nodes connected by links or edges.
hierarchical_diagram (boolean): tree-like structures such as tree diagrams, radial trees, but also dendrograms. They differ from networks in their strictly hierarchical structure and absence of closed connection loops.
treemap (boolean): still hierarchical, but highlighting quantities expressed by means of area size. It also includes circle packing variants.
word_cloud (boolean): clouds of words, where each instance’s size is proportional to its frequency in a related context.
bars (boolean): includes bar charts, histograms, and variants. It coincides with “bar charts” in [7] but uses a more generic term to refer to all bar-based visualisations.
line_chart (boolean): the display of information as sequential data points connected by straight-line segments.
area_chart (boolean): similar to a line chart but with a filled area below the segments. It also includes density plots.
pie_chart (boolean): circular graphs divided into slices which can also use multi-level solutions.
plot_3d (boolean): plots that use a third dimension to encode an additional variable.
proportional_area (boolean): representations used to compare values through area size, typically using circle- or square-like shapes.
other (boolean): all other types of non-temporal visualisations that do not fall into the aforementioned categories.
Temporal visualisations and encodings. In addition to non-temporal visualisations, a group of techniques to encode temporality is considered in order to enable comparisons with [7]. Columns:
timeline (boolean): the display of a list of data points or spans in chronological order. They include timelines working either with a scale or simply displaying events in sequence. As in [7], we also include structured solutions resembling Gantt chart layouts.
temporal_dimension (boolean): to report when time is mapped to any dimension of a visualisation, with the exclusion of timelines. We use the term “dimension” and not “axis” as in [7] as more appropriate for radial layouts or more complex representational choices.
animation (boolean): temporality is perceived through an animation changing the visualisation according to time flow.
visual_variable (boolean): another visual encoding strategy is used to represent any temporality-related variable (e.g., colour).
Interaction techniques. A set of categories to assess affordable interaction techniques based on the concept of user intent [8] and user-allowed data actions [9]. The following categories roughly match the “processing”, “mapping”, and “presentation” actions from [9] and the manipulative subset of methods of the “how” an interaction is performed in the conception of [10]. Only interactions that affect the visual representation or the aspect of data points, symbols, and glyphs are taken into consideration. Columns:
basic_selection (boolean): the demarcation of an element either for the duration of the interaction or more permanently until the occurrence of another selection.
advanced_selection (boolean): the demarcation involves both the selected element and connected elements within the visualisation or leads to brush and link effects across views. Basic selection is tacitly implied.
navigation (boolean): interactions that allow moving, zooming, panning, rotating, and scrolling the view but only when applied to the visualisation and not to the web page. It also includes “drill” interactions (to navigate through different levels or portions of data detail, often generating a new view that replaces or accompanies the original) and “expand” interactions generating new perspectives on data by expanding and collapsing nodes.
arrangement (boolean): methods to organise visualisation elements (symbols, glyphs, etc.) or
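As a quick illustration of how this classification could be queried, the sketch below filters the tabular data with pandas; the file name dh_projects.csv is an assumption, the column names follow the schema described above, and the boolean columns are assumed to be stored as True/False values.

```python
# Hypothetical loading/filtering sketch for the project classification table.
import pandas as pd

projects = pd.read_csv("dh_projects.csv")  # assumed file name

# Projects using a network visualisation in a narrative section and reporting
# some method for visualising uncertainty/interpretation.
subset = projects[
    projects["narrative"]
    & projects["network"]
    & projects["uncertainty_interpretation"].notna()
]
print(len(subset), "of", len(projects), "projects match")
```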
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
M3, Alternate Definition 4 for Russian Federation was 51421100000000.00000 National Currency in May of 2017, according to the United States Federal Reserve. Historically, M3, Alternate Definition 4 for Russian Federation reached a record high of 51421100000000.00000 in May of 2017 and a record low of 214049700000.00000 in June of 1995. Trading Economics provides the current actual value, an historical data chart and related indicators for M3, Alternate Definition 4 for Russian Federation - last updated from the United States Federal Reserve on June of 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This folder contains the formation energies of the BDE-db, QM9, PC9, QMugs, and QMugs1.1 datasets after filtering (the training, test, and validation sets were randomly split in a ratio of 0.8, 0.1, and 0.1, respectively). The filtering process is described in the article "Graph-based deep learning models for thermodynamic property prediction: The interplay between target definition, data distribution, featurization, and model architecture", and the code can be found at https://github.com/chimie-paristech-CTM/thermo_GNN. After application of the filter procedure described in the article, final versions of QM9 (127,007 data points), BDE-db (289,639 data points), PC9 (96,634 data points), QMugs (636,821 data points), and QMugs1.1 (70,546 data points) were obtained and used throughout this study.
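A minimal sketch of the random 0.8/0.1/0.1 split mentioned above (illustrative only; the actual split is produced by the thermo_GNN code):

```python
# Toy example of an 0.8/0.1/0.1 random split over dataset indices.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 127007                      # e.g. the filtered QM9 size reported above
idx = rng.permutation(n)

n_train, n_test = int(0.8 * n), int(0.1 * n)
train_idx = idx[:n_train]
test_idx = idx[n_train:n_train + n_test]
val_idx = idx[n_train + n_test:]
print(len(train_idx), len(test_idx), len(val_idx))
```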
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Consumer Price Index: Services Less Housing National Definition for Austria (CPSELR02ATM661N) from Jan 1966 to May 2018 about Austria, services, CPI, housing, price index, indexes, and price.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The mp.2019.04.01.json.zip data set contains 133,420 graph-target pairs for Materials Project structures, used to train the latest megnet formation energy models. The whole data set is a list, with each item being a dictionary. The keys of the dictionary are:
- "material_id": the material id in Materials Project
- "graph": the graph dictionary computed with megnet; the cutoff radius is 5 A
- "formation_energy_per_atom": the formation energy per atom (eV/atom)
- "structure": CIF string of the structure
The mp_elastic.2019.04.01.json.zip file contains a list of length 12,179, where each item is a dictionary with the following keys:
- "material_id"
- "graph"
- "K": bulk modulus
- "G": shear modulus
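A simple way to inspect the unzipped files is sketched below; the key names follow the description above, while the extracted file paths are assumed.

```python
# Sketch for inspecting the two JSON files after unzipping the archives.
import json

with open("mp.2019.04.01.json", encoding="utf-8") as f:
    records = json.load(f)                       # list of dicts (133,420 expected)

first = records[0]
print(first["material_id"], first["formation_energy_per_atom"])
# first["graph"] is the megnet graph dict; first["structure"] is a CIF string.

with open("mp_elastic.2019.04.01.json", encoding="utf-8") as f:
    elastic = json.load(f)                       # list of length 12,179
print(elastic[0]["K"], elastic[0]["G"])          # bulk and shear moduli
```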
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
public_supplementary_material.pdf includes the questionnaire, the tutorial, the instructions and tasks shown during the experiment, and the visual and textual activity definitions for the tasks used in the experiment reported in our paper. data.xls includes all our raw data.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors’ protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be either bots or real people). Methods are compared against each other and against state-of-the-art results from the literature, and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency; it should scale well and remain robust to the advancement of the newest technologies. The article utilized three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study proves that graph-based data is a better-performing predictor than text data. It compares different graph algorithms for extracting feature sets for anomaly detection models. It finds that methods based on nodes’ statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency, which often means reducing the time by hours or enabling modeling on significantly larger graphs (usually not feasible in the case of embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether there are types of graph embeddings and dimensions that perform significantly better than others. The solution turned out to be dataset-specific and needed to be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard over a grid of embedding instances, where each instance takes hours to generate). This, again, speaks in favor of the proposed algorithm based on nodes’ statistics, which makes this engineering overhead redundant.
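For context, the sketch below computes a few generic node-statistic features of the kind the study favours over embeddings (degree, clustering coefficient, PageRank) on a toy graph; it is an illustration, not the algorithm proposed in the article.

```python
# Generic node-statistics feature table on a toy graph (illustrative only).
import networkx as nx
import pandas as pd

G = nx.karate_club_graph()          # stand-in for e.g. the Bitcoin OTC trust graph

features = pd.DataFrame({
    "degree": dict(G.degree()),
    "clustering": nx.clustering(G),
    "pagerank": nx.pagerank(G),
})
print(features.head())
# Such per-node features can then be fed to any tabular anomaly detection model.
```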
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is from a study that intends to define, prototype and test a graphical user interface for a housing co-design system. To define the requirements of the interface, we conducted interviews with professionals of architecture, urbanism and social sciences areas, as well as with housing cooperatives and inhabitants of these institutions. An interface solution was prototyped, tested and refined. Then we conducted a heuristic evaluation and a summative evaluation. Such evaluations involved the testing of a high-fidelity prototype, to receive feedback from UX/UI experts, potential users (inhabitants) and architects.
S1_File refers to the interview protocol used with the three groups of interviewees. We share the English and Portuguese versions of the interviews with professionals and the original (Portuguese) and translated versions of the remaining ones since these were conducted in Portuguese.
S2_File is a dataset reporting the results of the interviews. Each question includes the answers given and the identification (anonymized) of the interviewees who responded to that question.
S3_File describes the usability issues identified by the experts during the heuristic evaluation of the high-fidelity prototype. The first page organizes the issues by severity (left) and priority (right). The remaining pages have a table for each issue, including rows for problem designation, heuristic violated, problem description, solution proposal, severity degree, and an image of the interface pointing to the referred issue.
S4_File refers to the results of the heuristic evaluation. It includes the identification of each issue, which expert (anonymized) identified such issue, and the heuristic it violates, with the sum of the times each heuristic was violated at the end of each column. At the right, a table presents the consolidation of issues, organized by priority, with columns identifying the issue, severity level, frequency, and priority.
S5_File is the script given to potential users to experiment with the interface during the summative evaluation. This script guides the user through the tasks to perform since the prototype does not have all the features functioning.
S6_File refers to the questionnaires applied during the summative evaluation with inhabitants. It includes a preliminary questionnaire, a Single Ease Question (SEQ) questionnaire, a System Usability Scale (SUS) questionnaire, and a Graphical User Interface (GUI) questionnaire.
S7_File refers to the results of the summative evaluation with inhabitants (potential users).
Page A refers to the preliminary questionnaire with demographic information such as age, gender, education, relationship with digital technologies, etc. Each field corresponds to each inhabitant (anonymized), together with the sum and percentage. In the middle, a table presents a summary of the consolidation. On the right, possible relations are presented.
Page B presents the results of the SEQ questionnaire, identifying the ratings each inhabitant (anonymized) gave each task. A summary of such values is at the right.
On page C, the result of each rating for the SUS questionnaire given by each inhabitant (anonymized) is shown. At the bottom is the calculation of the SUS score.
Page D presents the GUI questionnaire results for each inhabitant (anonymized), with the average and SD identified for each question. A summary of such results is on the right.
Page E holds the notes taken by the researchers based on their observations regarding task performance. The information is organized in tables for each step of each task and includes the completeness, attempts, and time taken for each inhabitant (anonymized) to complete such task. Also, the sum, percentage, average, and SD are registered. Next to each task is a table identifying how many participants accomplished the task at the first attempt.
Page F refers to the strong and weak aspects identified by the inhabitants. Strong and weak aspects are identified, as well as which inhabitant (anonymized) has identified them. The sum and percentage are also given. At the right, there is a table with the consolidation of results by combining similar answers.
S8_File refers to the results of the discussion with architects after experiencing the interface. Such results relate to the positive and negative aspects that the architects identified in the interface and its usefulness for architecture. The left table identifies the strong and weak aspects that architects (anonymized) identified and the sum and percentage associated with them. The table on the right consolidates such results, with similar responses combined.
The global precipitation time series provides time series charts showing observations of daily precipitation as well as accumulated precipitation compared to normal accumulated amounts for various stations around the world. These charts are created for different scales of time (30, 90, 365 days). Each station has a graphic that contains two charts. The first chart in the graphic is a time series in the format of a line graph, representing accumulated precipitation for each day in the time series compared to the accumulated normal amount of precipitation. The second chart is a bar graph displaying actual daily precipitation. The total accumulation and surplus or deficit amounts are displayed as text on the charts representing the entire time scale, in both inches and millimeters. The graphics are updated daily and the graphics reflect the updated observations and accumulated precipitation amounts including the latest daily data available. The available graphics are rotated, meaning that only the most recently created graphics are available. Previously made graphics are not archived.
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for M2, Alternate Definition 2 for Canada (MAM2A2CAM189S) from Jan 1968 to Apr 2017 about M2, Canada, and monetary aggregates.
https://heidata.uni-heidelberg.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11588/DATA/RAS7U7
This dataset contains source code and data used in the PhD thesis "Metrics of Graph-Based Meaning Representations with Applications from Parsing Evaluation to Explainable NLG Evaluation and Semantic Search". The dataset is split into five repositories:
S3BERT: source code to run experiments for chapter 9, "Building efficient and effective similarity models from MR metrics".
amr-metric-suite, weisfeiler-leman-amr-metrics: source code to run metric experiments for chapters 4, 5, and 6.
amr-argument-sim: source code to run experiments for chapter 8, "Exploring argumentation with MR metrics".
bamboo-amr-benchmark: benchmark for testing and developing metrics (chapter 5).