Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The various performance criteria applied in this analysis include the probability of reaching the ultimate target, the costs, elapsed times and system vulnerability resulting from any intrusion. This Excel file contains all the logical, probabilistic and statistical data entered by a user, and required for the evaluation of the criteria. It also reports the results of all the computations.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

TSMx is an R script developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner shows the direction of the slope for the temporal range 2001–2019. The red line corresponds to the temporal range 2010–2019, and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart: that cell represents the slope for the temporal range 2004–2014.

This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel. Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).
TSMx R script:

# import packages
library(dplyr)
library(readr)
library(ggplot2)
library(tibble)
library(tidyr)
library(forcats)
library(Kendall)

options(warn = -1) # disable warnings

# read data (.csv file with "Year" and "Value" columns)
data <- read_csv("EVI.csv")

# prepare row/column names for output matrices
years <- data %>% pull("Year")
r.names <- years[-length(years)]
c.names <- years[-1]
years <- years[-length(years)]

# initialize output matrices
sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

# function to return remaining years given a start year
getRemain <- function(start.year) {
  years <- data %>% pull("Year")
  start.ind <- which(data[["Year"]] == start.year) + 1
  remain <- years[start.ind:length(years)]
  return(remain)
}

# function to subset data for a start/end year combination
splitData <- function(end.year, start.year) {
  keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
  batch <- data[keep,]
  return(batch)
}

# function to fit linear regression and return slope direction
fitReg <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(sign(slope))
}

# function to fit linear regression and return slope magnitude
fitRegv2 <- function(batch) {
  trend <- lm(Value ~ Year, data = batch)
  slope <- coefficients(trend)[[2]]
  return(slope)
}

# function to implement Mann-Kendall (MK) trend test and return significance
# the test is implemented only for n >= 8
getMann <- function(batch) {
  if (nrow(batch) >= 8) {
    mk <- MannKendall(batch[['Value']])
    pval <- mk[['sl']]
  } else {
    pval <- NA
  }
  return(pval)
}

# function to return slope direction for all combinations given a start year
getSign <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  signs <- lapply(combs, fitReg)
  return(signs)
}

# function to return MK significance for all combinations given a start year
getPval <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  pvals <- lapply(combs, getMann)
  return(pvals)
}

# function to return slope magnitude for all combinations given a start year
getMagn <- function(start.year) {
  remaining <- getRemain(start.year)
  combs <- lapply(remaining, splitData, start.year = start.year)
  magns <- lapply(combs, fitRegv2)
  return(magns)
}

# retrieve slope direction, MK significance, and slope magnitude
signs <- lapply(years, getSign)
pvals <- lapply(years, getPval)
magns <- lapply(years, getMagn)

# fill in output matrices
dimension <- nrow(sign.matrix)
for (i in 1:dimension) {
  sign.matrix[i, i:dimension] <- unlist(signs[i])
  pval.matrix[i, i:dimension] <- unlist(pvals[i])
  slope.matrix[i, i:dimension] <- unlist(magns[i])
}
sign.matrix <- ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These data support the assertions made in a paper (with the same name as this project) that surveyed three second language acquisition journals (Modern Language Journal, Language Learning, and Studies in Second Language Acquisition) from their inception to 2011/2012. The raw data used for calculations of the number and types of graphics published are included in the attached Excel file.
https://dataintelo.com/privacy-and-policy
The global graph database market size was valued at USD 1.5 billion in 2023 and is projected to reach USD 8.5 billion by 2032, growing at a CAGR of 21.2% from 2024 to 2032. The substantial growth of this market is driven primarily by increasing data complexity, advancements in data analytics technologies, and the rising need for more efficient database management systems.
One of the primary growth factors for the graph database market is the exponential increase in data generation. As organizations generate vast amounts of data from various sources such as social media, e-commerce platforms, and IoT devices, the need for sophisticated data management and analysis tools becomes paramount. Traditional relational databases struggle to handle the complexity and interconnectivity of this data, leading to a shift towards graph databases which excel in managing such intricate relationships.
Another significant driver is the growing adoption of artificial intelligence (AI) and machine learning (ML) technologies. These technologies rely heavily on connected data for predictive analytics and decision-making processes. Graph databases, with their inherent ability to model relationships between data points effectively, provide a robust foundation for AI and ML applications. This synergy between AI/ML and graph databases further accelerates market growth.
Additionally, the increasing prevalence of personalized customer experiences across industries like retail, finance, and healthcare is fueling demand for graph databases. Businesses are leveraging graph databases to analyze customer behaviors, preferences, and interactions in real-time, enabling them to offer tailored recommendations and services. This enhanced customer experience translates to higher customer satisfaction and retention, driving further adoption of graph databases.
From a regional perspective, North America currently holds the largest market share due to early adoption of advanced technologies and the presence of key market players. However, significant growth is also anticipated in the Asia-Pacific region, driven by rapid digital transformation, increasing investments in IT infrastructure, and growing awareness of the benefits of graph databases. Europe is also expected to witness steady growth, supported by stringent data management regulations and a strong focus on data privacy and security.
The graph database market can be segmented into two primary components: software and services. The software segment holds the largest market share, driven by extensive adoption across various industries. Graph database software is designed to create, manage, and query graph databases, offering features such as scalability, high performance, and efficient handling of complex data relationships. The growth in this segment is propelled by continuous advancements and innovations in graph database technologies. Companies are increasingly investing in research and development to enhance the capabilities of their graph database software products, catering to the evolving needs of their customers.
On the other hand, the services segment is also witnessing substantial growth. This segment includes consulting, implementation, and support services provided by vendors to help organizations effectively deploy and manage graph databases. As businesses recognize the benefits of graph databases, the demand for expert services to ensure successful implementation and integration into existing systems is rising. Additionally, ongoing support and maintenance services are crucial for the smooth operation of graph databases, driving further growth in this segment.
The increasing complexity of data and the need for specialized expertise to manage and analyze it effectively are key factors contributing to the growth of the services segment. Organizations often lack the in-house skills required to harness the full potential of graph databases, prompting them to seek external assistance. This trend is particularly evident in large enterprises, where the scale and complexity of data necessitate robust support services.
Moreover, the services segment is benefiting from the growing trend of outsourcing IT functions. Many organizations are opting to outsource their database management needs to specialized service providers, allowing them to focus on their core business activities. This shift towards outsourcing is further bolstering the demand for graph database services, driving market growth.
The purpose of this project is to become comfortable with obtaining citizen science datasets and spreadsheet software systems (e.g., Excel), and to gain experience working with, analyzing, and visualizing scientific data. Students will work independently (pairs or small groups optional) to create five different charts and graphs visualizing the data collected in the LOYNO Biodiversity Project.
https://www.verifiedmarketresearch.com/privacy-policy/
Knowledge Graph Market size was valued at USD 7.19 Billion in 2024 and is expected to reach USD 4.1 Billion by 2032, growing at a CAGR of 18.1% from 2025 to 2032.
Knowledge Graph Market Drivers
• Enhanced Data Integration and Analysis: Knowledge graphs excel at integrating and analyzing data from diverse sources, including structured, semi-structured, and unstructured data. This enables organizations to gain a holistic view of information and make more informed decisions.
• Improved Search and Information Retrieval: Knowledge graphs provide a more semantic understanding of information, enabling more accurate and relevant search results. Instead of just keyword matching, knowledge graphs understand the relationships between entities and provide more contextually relevant information.
• Personalized Experiences: Knowledge graphs can be used to personalize user experiences by understanding individual preferences, interests, and behaviors. This is crucial for applications like personalized recommendations, targeted advertising, and customer service.
• AI and Machine Learning: Knowledge graphs are essential for powering AI and machine learning applications, such as chatbots, recommendation systems, and fraud detection. They provide a structured representation of knowledge that AI/ML models can easily understand and utilize.
• Business Intelligence and Decision Making: Knowledge graphs can help businesses gain deeper insights into their customers, markets, and operations. They can be used to identify trends, predict future outcomes, and make more informed business decisions.
According to our latest research, the global graph database market size in 2024 stands at USD 2.92 billion, with a robust compound annual growth rate (CAGR) of 21.6% projected from 2025 to 2033. By the end of 2033, the market is expected to reach approximately USD 21.1 billion. The rapid expansion of this market is primarily driven by the rising need for advanced data analytics, real-time big data processing, and the growing adoption of artificial intelligence and machine learning across various industry verticals. As organizations continue to seek innovative solutions to manage complex and interconnected data, the demand for graph database technologies is accelerating at an unprecedented pace.
One of the most significant growth factors for the graph database market is the exponential increase in data complexity and volume. Traditional relational databases often struggle to efficiently handle highly connected data, which is becoming more prevalent in modern business environments. Graph databases excel at managing relationships between data points, making them ideal for applications such as fraud detection, social network analysis, and recommendation engines. The ability to visualize and query data relationships in real-time provides organizations with actionable insights, enabling faster and more informed decision-making. This capability is particularly valuable in sectors like BFSI, healthcare, and e-commerce, where understanding intricate data connections can lead to substantial competitive advantages.
Another key driver fueling market growth is the widespread digital transformation initiatives undertaken by enterprises worldwide. As businesses increasingly migrate to cloud-based infrastructures and adopt advanced analytics tools, the need for scalable and flexible database solutions becomes paramount. Graph databases offer seamless integration with cloud platforms, supporting both on-premises and cloud deployment models. This flexibility allows organizations to efficiently manage growing data workloads while ensuring security and compliance. Additionally, the proliferation of IoT devices and the surge in unstructured data generation further amplify the demand for graph database solutions, as they are uniquely equipped to handle dynamic and heterogeneous data sources.
The integration of artificial intelligence and machine learning with graph databases is also a pivotal growth factor. AI-driven analytics require robust data models capable of uncovering hidden patterns and relationships within vast datasets. Graph databases provide the foundational infrastructure for such applications, enabling advanced features like predictive analytics, anomaly detection, and personalized recommendations. As more organizations invest in AI-powered solutions to enhance customer experiences and operational efficiency, the adoption of graph database technologies is expected to surge. Furthermore, continuous advancements in graph processing algorithms and the emergence of open-source graph database platforms are lowering entry barriers, fostering innovation, and expanding the market’s reach.
From a regional perspective, North America currently dominates the graph database market, owing to the early adoption of advanced technologies and the presence of major industry players. However, the Asia Pacific region is anticipated to witness the highest growth rate over the forecast period, driven by rapid digitalization, increasing investments in IT infrastructure, and the rising demand for data-driven decision-making across emerging economies. Europe also holds a significant share, supported by stringent data privacy regulations and the growing emphasis on innovation across sectors such as finance, healthcare, and manufacturing. As organizations across all regions recognize the value of graph databases in unlocking business insights, the global market is poised for sustained growth.
The graph database market is broadly segmented by component into software and services.
Analyzing sales data is essential for any business looking to make informed decisions and optimize its operations. In this project, we will utilize Microsoft Excel and Power Query to conduct a comprehensive analysis of Superstore sales data. Our primary objectives will be to establish meaningful connections between various data sheets, ensure data quality, and calculate critical metrics such as the Cost of Goods Sold (COGS) and discount values. Below are the key steps and elements of this analysis:
1- Data Import and Transformation:
2- Data Quality Assessment:
3- Calculating COGS:
4- Discount Analysis:
5- Sales Metrics:
6- Visualization:
7- Report Generation:
Throughout this analysis, the goal is to provide a clear and comprehensive understanding of the Superstore's sales performance. By using Excel and Power Query, we can efficiently manage and analyze the data, ensuring that the insights gained contribute to the store's growth and success.
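To make the COGS and discount steps concrete, here is a minimal sketch in R rather than Power Query; the file name, sheet names, column names, and join key below are assumptions for illustration, not the actual Superstore schema.

# minimal sketch in R (not Power Query); sheet names, column names,
# and the join key are assumed for illustration
library(dplyr)
library(readxl)
orders <- read_excel("Superstore.xlsx", sheet = "Orders")  # assumed sheet
costs  <- read_excel("Superstore.xlsx", sheet = "Costs")   # assumed sheet
sales <- orders %>%
  left_join(costs, by = "Product ID") %>%                  # assumed join key
  mutate(COGS          = Quantity * `Unit Cost`,           # cost of goods sold
         DiscountValue = Sales * Discount)                 # discount in currency
sales %>% summarise(TotalCOGS = sum(COGS), TotalDiscount = sum(DiscountValue))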
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
To create the dataset, the top 10 countries leading in the incidence of COVID-19 in the world were considered as of October 22, 2020 (on the eve of the second wave of the pandemic); of these, the countries represented in the Global 500 ranking for 2020 were selected: USA, India, Brazil, Russia, Spain, France, and Mexico. For each of these countries, up to 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages were calculated, along with the change (increase) in indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value, and number of employees. The arithmetic mean values of these indicators across all countries of the sample were found, characterizing the situation in international entrepreneurship as a whole in the context of the COVID-19 crisis in 2020, on the eve of the second wave of the pandemic. The data are collected in a general Microsoft Excel table.

The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. It is flexible and can be supplemented with data from other countries and newer statistics on the COVID-19 pandemic. Because the data in the dataset are not ready-made numbers but formulas, when values in the original table at the beginning of the dataset are added or changed, most of the subsequent tables are automatically recalculated and the graphs are updated. This allows the dataset to be used not just as an array of data, but as an analytical tool for automating scientific research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes not only tabular data but also charts that provide data visualization.

The dataset contains not only actual but also forecast data on morbidity and mortality from COVID-19 for the period of the second wave of the pandemic in 2020. The forecasts are presented in the form of a normal distribution of predicted values and the probability of their occurrence in practice. This allows for broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables, and the consequences (changes) for the characteristics of international entrepreneurship are calculated automatically. It is also possible to substitute the actual values identified during and after the second wave of the pandemic, to check the reliability of earlier forecasts and to conduct a plan-fact analysis. The dataset contains not only the numerical values of the initial and predicted values of the set of studied indicators, but also their qualitative interpretation, reflecting the presence and level of risks of the pandemic and COVID-19 crisis for international entrepreneurship.
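As an illustration of the scenario logic described above, a small R sketch shows how predicted values drawn from a normal distribution can feed a scenario grid; the mean and standard deviation below are invented for illustration, not taken from the dataset.

# hypothetical sketch: scenario values for predicted COVID-19 incidence
# drawn from a normal distribution (mean and sd are invented for illustration)
mean_cases <- 60000
sd_cases   <- 8000
probs      <- c(0.05, 0.25, 0.50, 0.75, 0.95)  # scenario probabilities
scenarios  <- qnorm(probs, mean = mean_cases, sd = sd_cases)
round(scenarios)  # pessimistic-to-optimistic morbidity scenarios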
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column, 'Replicate', indicates the biological replicates; in the example, the month and year during which the replicate was performed is indicated. The second column, 'Condition', indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column, 'Value', contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in 'File Format', select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from Step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as desired and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
Notes
• Note 1: install the ggplot2 package. The R script requires the package 'ggplot2' to be installed. To install it, go to Packages & Data -> Package Installer, enter 'ggplot2' in the Package Search field, and click 'Get List'. Select 'ggplot2' in the Package column and click 'Install Selected'. Install all dependencies as well. (Equivalently, run install.packages('ggplot2') in the R console.)
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
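For readers without access to the PowerPoint slide, a minimal sketch of such a script follows; it assumes the three-column Replicate/Condition/Value layout from Step 1 and is not the authors' exact code.

# minimal sketch, assuming columns Replicate, Condition, Value (see Step 1)
library(ggplot2)
data <- read.csv(file.choose())             # dialog box to select the .csv file
data$Replicate <- as.factor(data$Replicate) # treat replicates as categories
graph <- ggplot(data, aes(x = Condition, y = Value))
graph + geom_boxplot(outlier.colour = 'black', colour = 'black') +
  geom_jitter(aes(col = Replicate)) + theme_bw()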
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
According to our latest research, the global Graph Database Vector Search market size reached USD 2.35 billion in 2024, exhibiting robust growth driven by the increasing demand for advanced data analytics and AI-powered search capabilities. The market is expected to expand at a CAGR of 21.7% during the forecast period, propelling the market size to an anticipated USD 16.8 billion by 2033. This remarkable growth trajectory is primarily fueled by the proliferation of big data, the widespread adoption of AI and machine learning, and the growing necessity for real-time, context-aware search solutions across diverse industry verticals.
One of the primary growth factors for the Graph Database Vector Search market is the exponential increase in unstructured and semi-structured data generated by enterprises worldwide. Organizations are increasingly seeking efficient ways to extract meaningful insights from complex datasets, and graph databases paired with vector search capabilities are emerging as the preferred solution. These technologies enable organizations to model intricate relationships and perform semantic searches with unprecedented speed and accuracy. Additionally, the integration of AI and machine learning algorithms with graph databases is enhancing their ability to deliver context-rich, relevant results, thereby improving decision-making processes and business outcomes.
Another significant driver is the rising adoption of recommendation systems and fraud detection solutions across various sectors, particularly in BFSI, retail, and e-commerce. Graph database vector search platforms excel at identifying patterns, anomalies, and connections that traditional relational databases often miss. This capability is crucial for detecting fraudulent activities, building sophisticated recommendation engines, and powering knowledge graphs that underpin intelligent digital experiences. The growing need for personalized customer engagement and proactive risk mitigation is prompting organizations to invest heavily in these advanced technologies, further accelerating market growth.
Furthermore, the shift towards cloud-based deployment models is catalyzing the adoption of graph database vector search solutions. Cloud platforms offer scalability, flexibility, and cost-effectiveness, making it easier for organizations of all sizes to implement and scale graph-powered applications. The availability of managed services and API-driven architectures is reducing the complexity associated with deployment and maintenance, enabling faster time-to-value. As more enterprises migrate their data infrastructure to the cloud, the demand for cloud-native graph database vector search solutions is expected to surge, driving sustained market expansion.
Geographically, North America currently dominates the Graph Database Vector Search market, owing to its advanced IT infrastructure, high adoption rate of AI-driven technologies, and presence of leading technology vendors. However, rapid digital transformation initiatives across Europe and the Asia Pacific are positioning these regions as high-growth markets. The increasing focus on data-driven decision-making, coupled with supportive regulatory frameworks and government investments in AI and big data analytics, is expected to fuel robust growth in these regions over the forecast period.
The Component segment of the Graph Database Vector Search market is broadly categorized into software and services. The software sub-segment commands the largest share, driven by the relentless innovation in graph database technologies and the integration of advanced vector search functionalities. Organizations are increasingly deploying graph database software to manage complex data relationships, power semantic search, and enhance the performance of AI and machine learning applications. The software market is characterized by the proliferation of both open-source and proprietary solutions, with vendors
*** Fake News on Twitter ***
These 5 datasets are the results of an empirical study on the spreading process of newly emerged fake news on Twitter. In particular, we focused on those fake news stories that gave rise to a simultaneous spread of the truth against them. The story of each fake news item is as follows:
1- FN1: A Muslim waitress refused to seat a church group at a restaurant, claiming "religious freedom" allowed her to do so.
2- FN2: Actor Denzel Washington said electing President Trump saved the U.S. from becoming an "Orwellian police state."
3- FN3: Joy Behar of "The View" sent a crass tweet about a fatal fire in Trump Tower.
4- FN4: The animated children's program 'VeggieTales' introduced a cannabis character in August 2018.
5- FN5: In September 2018, the University of Alabama football program ended its uniform contract with Nike, in response to Nike's endorsement deal with Colin Kaepernick.
The data collection was done in two stages, each providing a new dataset:
1- Attaining the Dataset of Diffusion (DD), which includes information on fake news/truth tweets and retweets.
2- Querying the neighbors of tweet spreaders, which provides the Dataset of Graph (DG).
DD
DD for each fake news story is an Excel file, named FNx_DD, where x is the number of the fake news story. Each row belongs to one captured tweet/retweet related to the rumor, and each column presents a specific piece of information about that tweet/retweet. From left to right, the columns contain the following (a sketch of loading one such file follows the column list):
User ID (user who has posted the current tweet/retweet)
The description sentence in the profile of the user who has published the tweet/retweet
The number of tweets/retweets published by the user at the time of posting the current tweet/retweet
Date and time of creation of the account by which the current tweet/retweet has been posted
Language of the tweet/retweet
Number of followers
Number of followings (friends)
Date and time of posting the current tweet/retweet
Number of likes (favorites) the current tweet had acquired before crawling it
Number of times the current tweet had been retweeted before crawling it
Whether any other tweet is embedded in the current tweet/retweet (for example, this happens when the current tweet is a quote, reply, or retweet)
The source (device/OS) from which the current tweet/retweet was posted
Tweet/Retweet ID
Retweet ID (if the post is a retweet then this feature gives the ID of the tweet that is retweeted by the current post)
Quote ID (if the post is a quote then this feature gives the ID of the tweet that is quoted by the current post)
Reply ID (if the post is a reply then this feature gives the ID of the tweet that is replied by the current post)
Frequency of tweet occurrence, meaning the number of times the current tweet is repeated in the dataset (for example, the number of times a tweet exists in the dataset in the form of a retweet posted by others)
State of the tweet, which can be one of the following forms (determined by agreement between the annotators):
r : The tweet/retweet is a fake news post
a : The tweet/retweet is a truth post
q : The tweet/retweet is a question about the fake news; it neither confirms nor denies it
n : The tweet/retweet is not related to the fake news (even though it contains queries related to the rumor, it does not refer to the given fake news)
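A short, hypothetical R sketch of loading one such file: the column names below are invented, assigned in the order described above, and the file name assumes fake news story 1.

# hypothetical sketch: load one diffusion dataset and tabulate tweet states
# (column names are invented, following the column order described above)
library(readxl)
dd <- read_excel("FN1_DD.xlsx")  # assumed file name for fake news story 1
names(dd) <- c("user_id", "profile_description", "tweets_published",
               "account_created", "language", "followers", "followings",
               "posted_at", "likes", "retweets", "embedded_tweet",
               "source", "tweet_id", "retweet_id", "quote_id",
               "reply_id", "frequency", "state")
table(dd$state)  # counts of r / a / q / n annotations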
DG
DG for each fake news story contains two files:
A file in graph format (.graph), which includes the information of the graph, such as who is linked to whom (this file is named FNx_DG.graph, where x is the number of the fake news story).
A file in JSON Lines format (.jsonl), which includes the real user IDs of the nodes in the graph file (this file is named FNx_Labels.jsonl, where x is the number of the fake news story).
In the graph file, each node is labeled by the order of its entry into the graph. For example, if the node with user ID 12345637 is the first node entered into the graph file, its label in the graph is 0 and its real ID (12345637) is at row number 1 of the jsonl file (row number 0 holds the column labels); subsequent node IDs occupy the following rows, each row corresponding to one user ID. Therefore, to find the user ID of node 200 (labeled 200 in the graph), look at row number 201 of the jsonl file.
The user IDs of spreaders in DG (those who have a post in DD) are available in DD, where extra information about them and their tweets/retweets can be found. The other user IDs in DG are the neighbors of these spreaders and might not exist in DD.
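A minimal sketch of that lookup in R, under the stated layout (row 0 of the .jsonl file holds the column labels, so node label k sits at R's 1-indexed line k + 2); the file name assumes fake news story 1.

# sketch: map a graph node label back to its real user ID
# assumes line 1 of the jsonl file holds column labels (row 0 in the text above)
lines <- readLines("FN1_Labels.jsonl")  # assumed file for fake news story 1
node_label <- 200
lines[node_label + 2]                   # record holding the real user ID of node 200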
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This MS Excel data has been processed into time-series line graphs and data tables that give insight into changing physiochemical water quality characteristics and influences. The study sets out to determine whether climate change has influenced physiochemical water quality characteristics both within and between the Breede and Olifants estuaries over a nine-year monitoring period. The data represent changes in, and comparisons between, salinity, temperature, and rainfall within and between the Olifants and Breede river estuaries in the Western Cape Province of South Africa.
Authors: Brian Brown
Date: 27th November 1981
Brief Description: Data were recorded from Rod Smallwood's arm on the 27th November 1981; the dot matrix image shows the ulna and radius bones. We made a 'radiotherapy type' mould of the arm and then put drawing pins through the plastic (pin head inwards) as electrodes. There are two sets of data: one recorded from the arm, the other with saline filling the mould. The data were published in: D.C. Barber, B.H. Brown, and I.L. Freeston, "Imaging spatial distributions of resistivity using applied potential tomography", Electronics Letters, 19(22):933-935, 1983. http://digital-library.theiet.org/content/journals/10.1049/el_19830637
License: Creative Commons Artistic License (with Attribution)
Attribution Requirement: Use or presentation of these data must reference this publication: D.C. Barber, B.H. Brown, and I.L. Freeston, "Imaging spatial distributions of resistivity using applied potential tomography", Electronics Letters, 19(22):933-935, 1983.
Format: Data are handwritten and scanned into the linked pdf file. The adjacent drive/receive data sets for both the Uniform (Saline) and Arm data are included in the attached Excel file. There are 6 columns of data in the xls file: the first three are for the uniform case and give the two reciprocal data sets and the mean of the two; columns 4-6 are for the arm. A quick reconstruction using columns 3 and 6 as reference and data respectively looked OK.
Methods: The attached pdf file shows the line printer output of the data recorded from Rod Smallwood's arm on the 27th November 1981, the dot matrix image showing the ulna and radius bones, and my plot of the XY positions of the electrodes. The data set on the line printer is a complete data set, i.e. drive 1/2 then 1/3 then 1/4 etc. for every combination; I could only find the print out for one of the data sets. However, I found my notebook with the adjacent drive/receive data set, which is page 7 of the pdf file, and extracted from it the adjacent drive/receive data sets for both the Uniform (Saline) and Arm cases included in the attached Excel file. The first column of data is 104 points, ordered as follows:
Drive 1/2 receive 3/4
Drive 1/2 receive 4/5
etc
Drive 1/2 receive 16/1
Drive 2/3 receive 4/5
Drive 2/3 receive 5/6
etc
Drive 2/3 receive 16/1
Drive 4/5 receive 6/7
Drive 4/5 receive 7/8
etc
Drive 4/5 receive 16/1
etc etc
Drive 14/15 receive 16/1
The second column is the other reciprocal set. I think these data are the ones used to produce the image in the Electronics Letters paper of 1983 (page 1 of the pdf file). The data are also in the Contributed Data section of the EIDORS project on Sourceforge: http://eidors3d.sourceforge.net/data_contrib/bb-human-arm/bb-human-arm.shtml
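As a consistency check on the '104 points' figure, here is a small R sketch (mine, not the authors') that counts the adjacent drive/receive combinations for a 16-electrode ring once reciprocal duplicates are removed; the exact ordering in the data file follows the listing above.

# sketch: count adjacent drive/receive combinations on 16 electrodes,
# keeping one measurement from each reciprocal (drive<->receive) pair
pairs <- lapply(1:16, function(i) c(i, i %% 16 + 1))  # adjacent electrode pairs
n <- 0
for (d in seq_along(pairs)) {
  for (r in seq_along(pairs)) {
    disjoint <- length(intersect(pairs[[d]], pairs[[r]])) == 0
    if (disjoint && d < r) n <- n + 1  # d < r keeps one of each reciprocal pair
  }
}
n  # 104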
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This article describes a free, open-source collection of templates for the popular Excel (2013 and later versions) spreadsheet program. These templates are spreadsheet files that allow easy and intuitive learning and the implementation of practical examples concerning descriptive statistics, random variables, confidence intervals, and hypothesis testing. Although they are designed to be used with Excel, they can also be employed with other free spreadsheet programs (changing some particular formulas). Moreover, we exploit some possibilities of the ActiveX controls of the Excel Developer Menu to produce interactive Gaussian density charts. Finally, it is important to note that they can often be embedded in a web page, so it is not necessary to employ Excel software to use them. These templates have been designed as a useful tool to teach basic statistics and to carry out data analysis even when the students are not familiar with Excel. Additionally, they can be used as a complement to other analytical software packages. They aim to assist students in learning statistics within an intuitive working environment. Supplementary materials with the Excel templates are available online.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the user manual provided at the end of the research manuscript, and the Graph Input Data Example.xlsx file as a reference, the user provides all the graph semantic data required to evaluate all the performance criteria for the system. These criteria include the probability that the principal target can be reached, and the costs, elapsed times, and total vulnerability resulting from a penetration attempt by one or more intruders. This performance computation is accurate and efficient, requiring an insignificant amount of computation time. It also resolves all the statistical dependencies and probabilistic uncertainties believed to be an important challenge to a risk manager and his or her analysts. The user enters the graph topological data in this Excel file, thereby creating a topological model.
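As a toy illustration of the 'probability that the principal target can be reached' criterion, here is a sketch only: it assumes independent steps with invented probabilities, whereas the actual tool is designed to resolve statistical dependencies.

# toy sketch: probability an intruder traverses one attack path whose
# steps are assumed independent (the real tool handles dependencies)
step_prob <- c(0.9, 0.7, 0.8)  # hypothetical per-step success probabilities
prod(step_prob)                # probability the whole path succeeds: 0.504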
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Source data used to make the graphs in Figure 1 of Kusick et al., 2020
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geological events chart in Excel; includes columns to input information on the plate tectonic setting, regional tectonic setting, depositional environment, and lithostratigraphy. This list is not exhaustive and should be built on by adding your own (e.g. local datasets, petroleum systems, data by basin or quad). Age (Ma) is based on the International Chronostratigraphic Chart v2018/08 by the International Commission on Stratigraphy. Use this spreadsheet as a basis for collating and cataloguing information when starting a new project or when working in a new area. It allows tectonostratigraphic data to be laid out chronologically along the geological timescale.