Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
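The templates referenced above are Excel files; for readers who prefer code, a univariate scatterplot of this kind takes only a few lines of matplotlib (a generic sketch with made-up example data, not the paper's template):

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# hypothetical measurements for two small groups (n = 8 each)
groups = {"Control": rng.normal(5.0, 1.0, 8), "Treated": rng.normal(6.2, 1.2, 8)}
fig, ax = plt.subplots()
for i, (name, values) in enumerate(groups.items()):
    x = np.full(len(values), float(i)) + rng.uniform(-0.08, 0.08, len(values))  # horizontal jitter
    ax.plot(x, values, "o", alpha=0.7)          # every observation shown
    ax.hlines(values.mean(), i - 0.2, i + 0.2)  # group mean as a short bar
ax.set_xticks(range(len(groups)))
ax.set_xticklabels(groups.keys())
ax.set_ylabel("Outcome (units)")
plt.show()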
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends
TSMx is an R script developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations within the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and the results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year on the x-axis. In the example below, the cell in the top-right corner gives the direction of the slope for the temporal range 2001–2019. The red line corresponds to the temporal range 2010–2019, and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014. This publication entry also includes an Excel template that produces the same visualizations without the need to interact with any code, though minor modifications are needed to accommodate year ranges other than the one provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624
TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).
TSMx R script:

# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)

options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[["Year"]] >= start.year & data[["Year"]] <= end.year)
  batch <- data[keep, ]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[["Value"]])
    pval <- mk[["sl"]]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <- ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The main stock market index of the United States, the US500, rose to 6159 points on June 27, 2025, gaining 0.30% from the previous session. Over the past month, the index has climbed 4.60% and is up 12.80% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index. United States Stock Market Index - values, historical data, forecasts and news - updated in June 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Construction
This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest time resolution, the UNIX timestamp. Its construction is based on the blockchain, covering the period from January 3, 2009 to January 25, 2021. The blockchain extraction was performed with the bitcoin-etl (https://github.com/blockchain-etl/bitcoin-etl) Python package. The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1] (see the illustrative sketch after the reference below) as well as popular Bitcoin users' addresses provided by https://www.walletexplorer.com/
[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.
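As an aside, the common-input heuristic itself is simple to sketch: all addresses that co-sign inputs of the same transaction are merged into one entity, which amounts to a union-find over transaction inputs. The snippet below is purely illustrative (the transactions list is hypothetical; the actual dataset was built with bitcoin-etl as noted above):

# union-find over transaction inputs
parent = {}

def find(a):
    parent.setdefault(a, a)
    while parent[a] != a:
        parent[a] = parent[parent[a]]  # path halving
        a = parent[a]
    return a

def union(a, b):
    parent[find(a)] = find(b)

# hypothetical transactions: each record lists the addresses signing its inputs
transactions = [{"inputs": ["addr1", "addr2"]}, {"inputs": ["addr2", "addr3"]}]
for tx in transactions:
    first, *rest = tx["inputs"]
    for other in rest:
        union(first, other)

assert find("addr1") == find("addr3")  # all three addresses form one entity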
Dataset Description
Bitcoin Activity Temporal Coverage: From 03 January 2009 to 25 January 2021
Overview:
This dataset provides a comprehensive representation of Bitcoin exchanges between entities over a significant temporal span, from the inception of Bitcoin to recent years. It encompasses several temporal resolutions and representations to facilitate Bitcoin transaction network analysis in the context of temporal graphs.
All dates were derived from block UNIX timestamps in the GMT timezone.
Contents:
The dataset is distributed across the following compressed archives:
All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries; the files can be read with, for example, the pyspark Python package, as sketched below.
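For instance, the aggregate snapshot can be loaded as follows (a minimal sketch; the column names depend on the dataset release, so inspect the schema first):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orbitaal").getOrCreate()
edges = spark.read.parquet("SNAPSHOT/EDGES/ALL/orbitaal-snapshot-all.snappy.parquet")
edges.printSchema()  # column names are release-dependent
edges.show(5)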
orbitaal-stream_graph.tar.gz:
The root directory is STREAM_GRAPH/
Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (averaging approximately 10 minutes).
The stream graph is divided into 13 files, one for each year
Files format is parquet
Name format is orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the corresponding year and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] is equivalent to sorting by increasing year
These files are in the subdirectory STREAM_GRAPH/EDGES/
orbitaal-snapshot-all.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021).
Files format is parquet
Name format is orbitaal-snapshot-all.snappy.parquet.
These files are in the subdirectory SNAPSHOT/EDGES/ALL/
orbitaal-snapshot-year.tar.gz:
The root directory is SNAPSHOT/
Contains the yearly resolution of snapshot networks
Files format is parquet
Name format is orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the corresponding year and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] is equivalent to sorting by increasing year
These files are in the subdirectory SNAPSHOT/EDGES/year/
orbitaal-snapshot-month.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at monthly resolution
Files format is parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] stand for the corresponding year and month, and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] is equivalent to sorting by increasing year and month
These files are in the subdirectory SNAPSHOT/EDGES/month/
orbitaal-snapshot-day.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at daily resolution
Files format is parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where [YYYY], [MM], and [DD] stand for the corresponding year, month, and day, and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day
These files are in the subdirectory SNAPSHOT/EDGES/day/
orbitaal-snapshot-hour.tar.gz:
The root directory is SNAPSHOT/
Contains the snapshot networks at hourly resolution
Files format is parquet
Name format is orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where [YYYY], [MM], [DD], and [hh] stand for the corresponding year, month, day, and hour, and [ID] is an integer from 1 to N (the number of files) such that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day, and hour
These files are in the subdirectory SNAPSHOT/EDGES/hour/
orbitaal-nodetable.tar.gz:
The root directory is NODE_TABLE/
Contains two files in parquet format: the first gives information on the nodes present in the stream graph and snapshots, such as period of activity and associated global Bitcoin balance; the second contains the list of all associated Bitcoin addresses.
Small samples in CSV format
orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv
These two CSV files contain stream graph representations around the Bitcoin halving of July 2016.
orbitaal-snapshot-2016_07_08.csv and orbitaal-snapshot-2016_07_09.csv
These two CSV files contain daily snapshot representations around the Bitcoin halving of July 2016.
The Graph extension for CKAN adds the ability to visualize data resources as graphs, giving users a more intuitive understanding of the information contained within datasets. It currently supports temporal and categorical graph types, enabling count-based visualizations over time or across categories. While the current version is primarily designed for use with an Elasticsearch backend within the Natural History Museum's infrastructure, it is built to be extensible for broader applicability.
Key Features:
Temporal Graphs: Generates line graphs that display counts of data points over time, based on a designated date field within the resource, allowing users to visualize trends and patterns dynamically.
Categorical Graphs: Creates bar charts showing the distribution of counts for the values of a specified field in a resource, making it easier to understand data groupings.
Extensible Backend Architecture: Designed to support multiple backend data stores, with Elasticsearch currently implemented, paving the way for future integration with other systems such as PostgreSQL.
Template Customization: Includes a template (templates/graph/view.html) that can be extended to override or add custom content to the graph view, giving full control over the visualization design.
Configuration Options: Backend selection through the .ini configuration file; administrators can choose between Elasticsearch and SQL to align the extension with their requirements.
Technical Integration:
The Graph extension integrates with CKAN by adding a new view option to resources. Once enabled, the graph view appears as an available option alongside existing resource viewers. Configuration requires modifying the CKAN .ini file to add 'graph' to the list of enabled plugins and to set the desired backend, as sketched below. The template templates/graph/view.html allows full customization of the view.
Benefits & Impact:
The Graph extension enhances the usability of CKAN-managed datasets by providing interactive visualizations of data. Temporal graphs help users identify time-based trends, while categorical graphs illustrate data distribution. The extensible architecture means the extension can be adapted to different data storage systems, improving its versatility. By providing graphical representations of data, the extension makes complex information easier to understand, benefiting both data providers and consumers.
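A minimal sketch of the described .ini changes follows; the plugin name 'graph' comes from the text above, while the backend option key is an assumption and should be checked against the extension's documentation:

# CKAN .ini configuration file
ckan.plugins = ... graph                 # append 'graph' to the existing plugin list
ckanext.graph.backend = elasticsearch    # hypothetical key; the real option name may differ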
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Implementation of the visibility graph algorithm in the R language. These scripts generate visibility graphs from series built in or imported into RStudio; plot the series, the series histogram, and the degree distribution of the graphs generated from these series; determine the fit of the distribution curve in a log-log plot; and calculate the fundamental complex-network metrics for the visibility graphs associated with each series.
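For orientation, the natural visibility criterion (Lacasa et al., 2008) is easy to state: two points of the series are connected if the straight line between them passes above every intermediate point. Below is a minimal Python sketch using networkx (the dataset's own scripts are in R, and may implement this or the horizontal variant):

import networkx as nx

def natural_visibility_graph(y):
    n = len(y)
    g = nx.Graph()
    g.add_nodes_from(range(n))
    for a in range(n):
        for b in range(a + 1, n):
            # connect (a, b) if every intermediate point lies strictly below
            # the straight line joining (a, y[a]) and (b, y[b])
            if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                g.add_edge(a, b)
    return g

g = natural_visibility_graph([3.0, 1.0, 2.5, 0.5, 4.0])
print(sorted(d for _, d in g.degree()))  # input for the degree distribution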
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Interactive chart of the Dow Jones Industrial Average (DJIA) stock market index for the last 100 years. Historical data is inflation-adjusted using the headline CPI and each data point represents the month-end closing value. The current month is updated on an hourly basis with today's latest value.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
[Figure: Peertube "follow" graph]
Above is the Peertube "follow" graph. The colours correspond to the language of the server (purple: unknown, green: French, blue: English, black: German, orange: Italian, grey: others).
Decentralized machine learning---where each client keeps its own data locally and uses its own computational resources to collaboratively train a model by exchanging peer-to-peer messages---is increasingly popular, as it enables better scalability and control over the data. A major challenge in this setting is that learning dynamics depend on the topology of the communication graph, which motivates the use of real graph datasets for benchmarking decentralized algorithms. Unfortunately, existing graph datasets are largely limited to for-profit social networks crawled at a fixed point in time and often collected at the user scale, where links are heavily influenced by the platform and its recommendation algorithms. The Fediverse, which includes several free and open-source decentralized social media platforms such as Mastodon, Misskey, and Lemmy, offers an interesting real-world alternative. We introduce Fedivertex, a new dataset covering seven social networks from the Fediverse, crawled on a weekly basis.
We refer to our paper for a detailed presentation of the graphs: [SOON]
We implemented a simple Python API to interact easily with the dataset: https://pypi.org/project/fedivertex/
pip3 install fedivertex
This package automatically downloads the dataset and generates NetworkX graphs.
from fedivertex import GraphLoader

loader = GraphLoader()  # instantiate the loader; the class name comes from the import above
loader.list_graph_types("mastodon")
# List available graph types for a given software, here federation and active_user
G = loader.get_graph(software = "mastodon", graph_type = "active_user", index = 0, only_largest_component = True)
# G contains the NetworkX graph of the giant component of the active users graph at the 1st date of collection
We also provide a Kaggle notebook demonstrating simple operations using this library: https://www.kaggle.com/code/marcdamie/exploratory-graph-data-analysis-of-fedivertex
The dataset contains graphs crawled on a daily basis from 7 social networks of the Fediverse. Each graph quantifies/characterizes the interactions differently, depending on the information provided by the public API of each network.
We briefly present the graphs below (NB: the term "instance" refers to servers on the Fediverse):
These graphs provide diverse perspectives on the Fediverse, as they capture more or less subtle phenomena. For example, "federation" graphs are dense, while "intra-instance" graphs are sparse. We have performed a detailed exploratory data analysis in this notebook.
Our CSV files are formatted so that they can be directly imported into Gephi for graph visualization. Below is an example Gephi visualization of the Misskey "active users" graph (without the misskey.io node). The colours correspond to the language of the server (purple: unknown, red: Japanese, brown: Korean, blue: English, yellow: Chinese).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Different graph types may differ in their suitability to support group comparisons, due to the underlying graph schemas. This study examined whether graph schemas are based on perceptual features (i.e., each graph type, e.g., bar or line graph, has its own graph schema) or common invariant structures (i.e., graph types share common schemas). Furthermore, it was of interest which graph type (bar, line, or pie) is optimal for comparing discrete groups. A switching paradigm was used in three experiments. Two graph types were examined at a time (Experiment 1: bar vs. line, Experiment 2: bar vs. pie, Experiment 3: line vs. pie). On each trial, participants received a data graph presenting the data from three groups and were to determine the numerical difference of group A and group B displayed in the graph. We scrutinized whether switching the type of graph from one trial to the next prolonged RTs. The slowing of RTs in switch trials in comparison to trials with only one graph type can indicate to what extent the graph schemas differ. As switch costs were observed in all pairings of graph types, none of the different pairs of graph types tested seems to fully share a common schema. Interestingly, there was tentative evidence for differences in switch costs among different pairings of graph types. Smaller switch costs in Experiment 1 suggested that the graph schemas of bar and line graphs overlap more strongly than those of bar graphs and pie graphs or line graphs and pie graphs. This implies that results were not in line with completely distinct schemas for different graph types either. Taken together, the pattern of results is consistent with a hierarchical view according to which a graph schema consists of parts shared for different graphs and parts that are specific for each graph type. Apart from investigating graph schemas, the study provided evidence for performance differences among graph types. We found that bar graphs yielded the fastest group comparisons compared to line graphs and pie graphs, suggesting that they are the most suitable when used to compare discrete groups.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Wikipedia temporal graph.
The dataset is based on two Wikipedia SQL dumps: (1) English language articles and (2) user visit counts per page per hour (aka pagecounts). The original datasets are publicly available on the Wikimedia website.
The static graph structure is extracted from English-language Wikipedia articles. Redirects are removed. Before building the Wikipedia graph, we introduce thresholds on the minimum number of visits per hour and the maximum in-degree: pages that never reach 500 visits per hour during the specified period are removed, and nodes (pages) with in-degree higher than 8 000 are removed to build a more meaningful initial graph. After cleaning, the graph contains 116 016 nodes (out of 4 856 639 total pages) and 6 573 475 edges. The graph can be imported in two ways: (1) using edges.csv and vertices.csv, or (2) using the enwiki-20150403-graph.gt file, which can be opened with the open-source Python library Graph-Tool.
Time-series data contain users' visit counts from 02:00, 23 September 2014 until 23:00, 30 April 2015; the total number of hours is 5278. The data are stored in two formats, CSV and H5. The CSV file contains data in the format [page_id :: count_views :: layer], where layer represents an hour. In the H5 file, each layer likewise corresponds to an hour.
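Both import routes can be sketched in a few lines (column names in the CSV files are assumptions; check the files' headers):

# route 1: plain CSV files
import pandas as pd
edges = pd.read_csv("edges.csv")       # edge list
vertices = pd.read_csv("vertices.csv")

# route 2: the bundled .gt file via graph-tool
from graph_tool.all import load_graph
g = load_graph("enwiki-20150403-graph.gt")
print(g.num_vertices(), g.num_edges())  # expected: 116016 6573475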
all_state_112_graph.zip
The file contains visual representations of the 64 states for each of the 112 graphs. These states comprise 2 absorbing states and 62 transient states.
Code:
1. DB_six_nodes_random_initial.R: computes simulation results for the Death-Birth process under neutral selection for each graph in the graphs.RData file; specifically, it calculates the fixation probability, fixation time, extinction time, and absorption time under uniform initialization.
2. DB_six_nodes_temp.R: computes simulation results for the Death-Birth process under neutral selection for each graph in the graphs.RData file; it calculates the fixation probability, fixation time, extinction time, and absorption time under temperature-dependent mutation.
3. BD_six_nodes_random_initial.R: computes simulation results for the Birth-Death process under neutral selection for each graph in the graphs.RData file; specifically, it calculates the fixation probability, fixation time, extinction time, and absorption time under uniform initialization.
The speed of evolution on structured populations is crucial for biological and social systems. The likelihood of invasion is key for evolutionary stability, but it means little if fixation takes very long. Which population structures slow down evolution remains largely unknown. We investigate the absorption time of a single neutral mutant for all 112 non-isomorphic undirected graphs of size 6. We find that about three-quarters of the graphs have an absorption time close to that of the complete graph, less than one-third are accelerators, and more than two-thirds are decelerators. Surprisingly, whether a graph has a long absorption time is too complicated to be captured by the joint degree distribution. Via the largest sojourn time, we find that echo-chamber-like graphs, which consist of two homogeneous subgraphs connected by a few sparse links, are likely to slow down absorption. These results are robust across large graphs, mutation patterns, and evolutionary processes. This work serves as a benchmark for timing evolution with complex interactions and fosters the understanding of polarization in opinion formation.
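To make the simulated quantities concrete, here is an illustrative sketch (not the archived scripts) of the absorption time of a single neutral mutant under Death-Birth updating with uniform initialization, using networkx:

import random
import networkx as nx

def absorption_time(g, start, rng):
    mutants = {start}
    steps = 0
    while 0 < len(mutants) < g.number_of_nodes():
        dead = rng.choice(list(g.nodes))              # uniform death
        parent = rng.choice(list(g.neighbors(dead)))  # neutral: a uniform neighbour reproduces
        mutants.discard(dead)
        if parent in mutants:
            mutants.add(dead)
        steps += 1
    return steps  # absorption = fixation or extinction

rng = random.Random(42)
g = nx.complete_graph(6)  # stand-in for one of the 112 graphs
times = [absorption_time(g, rng.choice(list(g.nodes)), rng) for _ in range(1000)]
print(sum(times) / len(times))  # mean absorption time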
https://www.verifiedmarketresearch.com/privacy-policy/
Graph Database Market size was valued at USD 2.86 Billion in 2024 and is projected to reach USD 14.58 Billion by 2032, growing at a CAGR of 22.6% from 2026 to 2032.
Global Graph Database Market Drivers
The growth and development of the Graph Database Market is attributed to several main market drivers, which strongly influence how graph databases are demanded and adopted across sectors. The major market forces are as follows:
Growth of Connected Data: Graph databases excel at expressing and querying relationships, and businesses increasingly work with complex, interconnected datasets. Demand for graph databases grows as connected data gains significance across industries.
Knowledge Graph Emergence: In fields like artificial intelligence, machine learning, and data analytics, knowledge graphs, which arrange information in a graph structure, are becoming increasingly popular. Graph databases are a natural fit for creating and querying knowledge graphs, which drives their adoption.
Analytics and Machine Learning Advancements: Graph databases handle relationships and patterns in data effectively, enabling advanced analytics and machine learning applications. Demand grows as businesses combine graph databases with analytics and machine learning to extract more insight from their data.
Real-Time Data Processing: Graph databases can process data in real time, which makes them appropriate for applications that need quick answers and insights, such as fraud detection, recommendation systems, and network analysis.
Increasing Need for Security and Fraud Detection: Graph databases are useful for fraud detection and security applications because they can identify patterns and anomalies in linked data. The ongoing evolution of cybersecurity threats keeps increasing the need for graph databases in security solutions.
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
We provide an academic graph based on a snapshot of the Microsoft Academic Graph from 26.05.2021. The Microsoft Academic Graph (MAG) is a large-scale dataset containing information about scientific publication records, their citation relations, as well as authors, affiliations, journals, conferences and fields of study. We acknowledge the Microsoft Academic Graph using the URI https://aka.ms/msracad. For more information regarding schema and the entities present in the original dataset please refer to: MAG schema.
MAG for Heterogeneous Graph Learning
We use a recent version of MAG from May 2021 and extract all relevant entities to build a graph that can be directly used for heterogeneous graph learning (node classification, link prediction, etc.). The graph contains all English papers, published after 1900, that have been cited at least 5 times per year since the time of publishing. For fairness, we set a constant citation bound of 100 for papers published before 2000. We further include two smaller subgraphs, one containing computer science papers and one containing medicine papers.
Nodes and features We define the following nodes:
paper with mag_id, graph_id, normalized title, year of publication, citations and a 128-dimension title embedding built using word2vec No. of papers: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);
author with mag_id, graph_id, normalized name, citations No. of authors: 6,363,201 (all), 1,797,980 (medicine), 557,078 (computer science);
field with mag_id, graph_id, level, citations denoting the hierarchical level of the field where 0 is the highest-level (e.g. computer science) No. of fields: 199,457 (all), 83,970 (medicine), 45,454 (computer science);
affiliation with mag_id, graph_id, citations No. of affiliations: 19,421 (all), 12,103 (medicine), 10,139 (computer science);
venue with mag_id, graph_id, citations, type denoting whether conference or journal No. of venues: 24,608 (all), 8,514 (medicine), 9,893 (computer science).
Edges We define the following edges:
author is_affiliated_with affiliation No. of author-affiliation edges: 8,292,253 (all), 2,265,728 (medicine), 665,931 (computer science);
author is_first/last/other paper No. of author-paper edges: 24,907,473 (all), 5,081,752 (medicine), 1,269,485 (computer science);
paper has_citation_to paper No. of paper-paper citation edges: 142,684,074 (all), 16,808,837 (medicine), 4,152,804 (computer science);
paper conference/journal_published_at venue No. of paper-venue edges: 5,091,690 (all), 1,014,769 (medicine), 367,576 (computer science);
paper has_field_L0/L1/L2/L3/L4 field No. of paper-field edges: 47,531,366 (all), 9,403,708 (medicine), 3,341,395 (computer science);
field is_in field No. of field-field edges: 339,036 (all), 138,304 (medicine), 83,245 (computer science);
We further include a reverse edge for each edge type defined above that is denoted with the prefix rev_ and can be removed based on the downstream task.
Data structure The nodes and their respective features are provided as separate .tsv files where each feature represents a column. The edges are provided as a pickled python dictionary with schema:
{target_type: {source_type: {edge_type: {target_id: {source_id: time}}}}}
We provide three compressed ZIP archives, one for each subgraph (all, medicine, computer science); the file for the complete graph is split into 500 MB chunks. Each archive contains the separate node feature files and the edge dictionary.
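A hedged loading sketch follows (the file names inside the archives and the exact edge-type keys are assumptions based on the description above):

import pickle
import pandas as pd

papers = pd.read_csv("paper.tsv", sep="\t")  # one row per paper node, features as columns

with open("edges.pkl", "rb") as f:
    edges = pickle.load(f)

# walk the dictionary following the schema
# {target_type: {source_type: {edge_type: {target_id: {source_id: time}}}}}
for target_id, sources in edges["paper"]["author"]["is_first"].items():
    for source_id, time in sources.items():
        pass  # e.g. collect (source_id, target_id, time) into an edge list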
https://fred.stlouisfed.org/legal/#copyright-citation-required
Graph and download economic data for Hours: Overtime Hours: Manufacturing: Monthly for Japan (HOOVMN03JPA661N) from 1960 to 2022 about overtime, Japan, hours, manufacturing, and employment.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Inflation Rate in the United States increased to 2.40 percent in May from 2.30 percent in April of 2025. This dataset provides - United States Inflation Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
https://www.law.cornell.edu/uscode/text/17/106
Graph data represents complex relationships across diverse domains, from social networks to healthcare and chemical sciences. However, real-world graph data often spans multiple modalities, including time-varying signals from sensors, semantic information from textual representations, and domain-specific encodings. This dissertation introduces innovative multimodal learning techniques for graph-based predictive modeling, addressing the intricate nature of these multidimensional data representations. The research systematically advances graph learning through innovative methodological approaches across three critical modalities. Initially, we establish robust graph-based methodological foundations through advanced techniques, including prompt tuning for heterogeneous graphs and a comprehensive framework for imbalanced learning on graph data. We then extend these methods to time series analysis, demonstrating their practical utility through applications such as hierarchical spatio-temporal modeling for COVID-19 forecasting and graph-based density estimation for anomaly detection in unmanned aerial systems. Finally, we explore textual representations of graphs in the chemical domain, reformulating reaction yield prediction as an imbalanced regression problem to enhance performance in underrepresented high-yield regions critical to chemists.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Although methodologists have provided us ample notice of both the problem of nonproportional hazards (NPHs) and the means of correcting for them, less attention has been paid to the postestimation interpretation. The suggested inclusion of time interactions in our models is more than a statistical fix: These corrections alter the substantive meaning and interpretation of results. Framing the issue as a specific case of multiplicative-interaction modeling, I provide detailed discussion of the problem of NPHs and present several appropriate means of interpreting both the substantive impact and the significance of variables whose effects may change over time.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The main stock market index of the United States, the US500, rose to 6173 points on June 27, 2025, gaining 0.52% from the previous session. Over the past month, the index has climbed 4.83% and is up 13.05% compared to the same time last year, according to trading on a contract for difference (CFD) that tracks this benchmark index. United States Stock Market Index - values, historical data, forecasts and news - updated in June 2025.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A critical issue in intelligent building control is detecting energy consumption anomalies based on intelligent device status data. The building field is plagued by energy consumption anomalies caused by a number of factors, many of which are associated with one another in apparent temporal relationships. Most traditional detection methods rely solely on a single variable of energy consumption data and its time-series changes, so they cannot examine the correlation between the multiple characteristic factors that affect energy consumption anomalies, or how those factors relate over time; their detection outcomes are therefore one-sided. To address these problems, this paper proposes an anomaly detection method based on multivariate time series. First, to extract the correlation between the different feature variables affecting energy consumption, the paper introduces a graph convolutional network to build an anomaly detection framework. Second, because the feature variables influence one another to different degrees, the framework is enhanced with a graph attention mechanism so that time-series features with greater influence on energy consumption receive larger attention weights, yielding better anomaly detection of building energy consumption. Finally, the effectiveness of this method and of existing methods for detecting energy consumption anomalies in smart buildings is compared on standard data sets. The experimental results show that the model has better detection accuracy.
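As a rough illustration of the attention mechanism the abstract describes, a single graph-attention layer can be sketched as follows (a generic GAT-style layer in PyTorch, not the paper's implementation; node features would be per-variable time-series statistics):

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # feature projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention scorer

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency with self-loops
        h = self.W(x)
        N = h.size(0)
        # attention logit for every (i, j) pair from concatenated features
        e = self.a(torch.cat([h.unsqueeze(1).expand(N, N, -1),
                              h.unsqueeze(0).expand(N, N, -1)], dim=-1)).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))  # restrict attention to edges
        alpha = F.softmax(F.leaky_relu(e), dim=1)   # influence weights per node
        return F.elu(alpha @ h)                     # attention-weighted aggregation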
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The KIST Europe Data Set
Introduction
This dataset focuses on networks formed by Physarum polycephalum. Detailed methods are available in the companion paper.
Description
The KIST Europe data set contains raw and processed data of 81 identical experiments, carefully executed under constant laboratory conditions. The data was produced using the following procedure: A rectangular plastic dish is prepared with a thin sheet of agar. A small amount of dried P. polycephalum (HU195xHU200) sclerotia crumbs is lined up along the short edge of the dish. The dish is put into a large light-proof box. After approximately 14 hours the plasmodium has resuscitated and starts exploring the available space towards the far side of the dish. Typically, the apical zone needs to cover a distance of several centimeters before network formation can be observed properly. For the next 30 hours we take a top-view image of the growing plasmodium and its changing network every 120 seconds from a fixed position. We stop capturing when the apical zone is about to reach the far side of the dish, which is outside of the observed area.
After obtaining sequences of images showing the characteristic networks of P. polycephalum, we use a software tool called NEFI to compute corresponding sequences of graph representations of the depicted structures within a predefined region of interest. In addition to the topology, the graphs store precise information on the length and width of the edges as well as the coordinates of the nodes in the plane. Given the resulting sequence of graphs, we apply filters removing artifacts and other unwanted features of the graphs. We then compute a novel node tracking, encoding the time development of every node while taking into account the changing topology of the evolving graphs.
Repeating this experiment, we obtain 81 sequences of images, which we consider our raw data. We stress at this point that, given the inherently uncontrollable growth process of P. polycephalum, the obtained sequences differ in length and nature. That is to say, in some experiments the organism behaved unfavorably, simply stopping its growth, changing direction or even escaping the container. While such sequences are part of the raw dataset, we excluded them partially or completely from the subsequent graph extraction efforts. The removal of such data reduces the number of series depicting proper network formation to 54.
After obtaining the raw data, we transform the images into equivalent mathematical graphs, opening up a wealth of possibilities for data analysis. To this end we deploy NEFI, a convenient automatic software tool that analyzes a digital image, separates the depicted slime mold network from the background, and returns a graph representation of that structure. Using this tool effectively requires a moderate amount of image preprocessing. In particular, for each sequence of images it is necessary to decide on a suitable subsequence to be processed; here we typically exclude parts of the sequence where the apical zone is still visible. For each such subsequence a suitable region of interest is defined manually. The graph stores the position of the nodes in the plane as well as edge attributes such as edge length and width for each edge. In addition to the output of NEFI, including the unfiltered graphs, the dataset contains NEFI's input, i.e. the selected subsequences of images cropped according to their defined regions of interest.
Note that some parts of the image series showing proper network formation did not yield optimal representations of the depicted networks. This is a result of images exhibiting strong color gradients, which render them too challenging for automatic network extraction. While such cases can still be handled by tuning the image-processing parameters manually on an image-by-image basis, we decided to discard the affected series from subsequent processing. As a result, the number of usable graph sequences of highest quality was reduced to 36, to which we apply a set of filters removing artifacts, isolated small components, and dead-end paths. We thus obtain a total of 3134 distinct filtered graphs faithfully reflecting the topology and edge attributes that P. polycephalum displayed during the wet-lab experiments. At this point, available graph analysis packages or custom-written analysis code can be deployed to investigate the data in various ways. The dataset includes the filtered graphs as well as all corresponding graph drawings; the latter enable a quick visual inspection of the results of the graph extraction. Given the obtained time-ordered sequences of graphs, the development of the entire graph can be investigated. However, one may also study what happens to single nodes as P. polycephalum evolves. Given a graph in a time-ordered sequence of graphs, let us pick any node u. Can we pick a set of nodes from graphs in the sequence that are equivalent to u, that is, all nodes in...