Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Graphs And Charts is a dataset for object detection tasks. It contains Bar Chart, Line Graph, and Pie annotations for 384 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
These files represent the state and regional summaries of the sensitivities of formaldehyde, acetaldehyde, and ozone to various sources and compounds.
This dataset is associated with the following publication: Luecken, D., S. Napelenok, M. Strum, R. Scheffe, and S. Phillips. Sensitivity of Ambient Atmospheric Formaldehyde and Ozone to Precursor Species and Source Types Across the United States. ENVIRONMENTAL SCIENCE & TECHNOLOGY. American Chemical Society, Washington, DC, USA, 52(8): 4668–4675, (2018).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Specialized collection of 0 free data visualization SVG illustrations from the technology & electronics category. Data visualization illustrations including bar charts, network graphs, and information graphics. Examples include: bar chart, network graph.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
https://creativecommons.org/publicdomain/zero/1.0/
Domain-Specific Dataset and Visualization Guide
This package contains 20 realistic datasets in CSV format across different industries, along with 20 text files suggesting visualization ideas. Each dataset includes about 300 rows of synthetic but domain-appropriate data. They are designed for data analysis, visualization practice, machine learning projects, and dashboard building.
What’s inside
20 CSV files, one for each domain:
20 TXT files, each listing 10 relevant graphing options for the dataset.
MASTER_INDEX.csv, which summarizes all domains with their column names.
Use cases
Example
Education dataset has columns like StudentName, Class, Subject, Marks, AttendancePercent. Suggested graphs: bar chart of average marks by subject, scatter plot of marks vs attendance percent, line chart of attendance over time.
E-Commerce dataset has columns like OrderDate, Product, Category, Price, Quantity, Total. Suggested graphs: line chart of revenue trend, bar chart of revenue by category, pie chart of payment mode share.
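As a minimal sketch of how one of the suggested graphs could be prepared, the snippet below computes the average marks per subject, which is the series a bar chart of average marks by subject would plot. The column names come from the Education example above; the rows themselves are invented for illustration and are not taken from the package.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the dataset description.
SAMPLE_CSV = """StudentName,Class,Subject,Marks,AttendancePercent
Asha,10A,Math,80,92
Ben,10A,Math,70,85
Asha,10A,Science,90,92
Ben,10A,Science,70,85
"""


def average_marks_by_subject(csv_text):
    """Return {subject: mean of the Marks column} for the given CSV text."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Subject"]] += float(row["Marks"])
        counts[row["Subject"]] += 1
    return {subject: totals[subject] / counts[subject] for subject in totals}


averages = average_marks_by_subject(SAMPLE_CSV)
for subject, mean in sorted(averages.items()):
    # Crude text bar: one '#' per 10 marks, followed by the exact mean.
    print(f"{subject:<8} {'#' * int(mean // 10)} {mean:.1f}")
```

The same dictionary could be passed to any plotting library; the text bars are only there to keep the sketch free of third-party dependencies.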
Use the Chart Viewer template to display bar charts, line charts, pie charts, histograms, and scatterplots to complement a map. Include multiple charts to view with a map or side by side with other charts for comparison. Up to three charts can be viewed side by side or stacked, but you can access and view all the charts that are authored in the map.
Examples:
Present a bar chart representing average property value by county for a given area.
Compare charts based on multiple population statistics in your dataset.
Display an interactive scatterplot based on two values in your dataset along with an essential set of map exploration tools.
Data requirements
The Chart Viewer template requires a map with at least one chart configured.
Key app capabilities
Multiple layout options - Choose Stack to display charts stacked with the map, or choose Side by side to display charts side by side with the map.
Manage chart - Reorder, rename, or turn charts on and off in the app.
Multiselect chart - Compare two charts in the panel at the same time.
Bookmarks - Allow users to zoom and pan to a collection of preset extents that are saved in the map.
Home, Zoom controls, Legend, Layer List, Search
Supportability
This web app is designed responsively to be used in browsers on desktops, mobile phones, and tablets. We are committed to ongoing efforts towards making our apps as accessible as possible. Please feel free to leave a comment on how we can improve the accessibility of our apps for those who use assistive technologies.
https://doi.org/10.17026/fp39-0x58
This deposit includes the data collected in an experimental study on debunking strategies for misleading bar charts, involving 2 surveys (one week delay) with a total of 24 unique bar charts, each with two bars, filled in by 441 representative (age, ethnicity, gender) participants from the USA. The experiment compares four methods for correcting misleading bar charts with truncated vertical axes by measuring the participants' evaluated difference between the bars at five time points. Measures were taken on a visual analogue scale. The first survey also included a short graph literacy scale and a question on highest completed educational level.
Date Submitted: 2022-06-24
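To see why a truncated vertical axis can mislead, it helps to compare the ratio of the two plotted values with the ratio of the bar heights that are actually drawn. The sketch below is illustrative only: the function name and the numbers are assumptions and are not part of the deposited data or the study's method.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn heights of bar B to bar A when the axis starts at axis_start."""
    if min(value_a, value_b) <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (value_b - axis_start) / (value_a - axis_start)


# Two hypothetical values that differ by 10 percent.
a, b = 50.0, 55.0
full_axis = drawn_height_ratio(a, b)        # axis starts at 0
truncated = drawn_height_ratio(a, b, 45.0)  # axis starts at 45

print(f"full axis: bar B is drawn {full_axis:.2f}x as tall as bar A")
print(f"truncated: bar B is drawn {truncated:.2f}x as tall as bar A")
```

With the axis starting at zero the drawn heights keep the true ratio of the values, whereas starting the axis just below the smaller value inflates the visible difference.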
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset classifies 587 visualisation–interaction units extracted from 186 web-based Digital Humanities projects, previously classified in a related dataset [1] (https://doi.org/10.5281/zenodo.14192758), allowing cross-references between them. Each row represents a distinct combination of visualisation technique(s) (e.g., map, bar chart, network) and associated interactive features within a project. The dataset provides a finer-grained view of design choices, documenting how visualisation and interaction methods are implemented, including their connection to narrative or non-narrative contexts, temporal encodings, and multi-view or reconfiguration strategies.
A visualisation–interaction unit is a distinct configuration combining a visualisation technique (or multiple techniques when linked through coordinated views) with a specific set of interactive features. Following [2], we consider these elements as working interdependently to achieve a shared data-related goal.
These units form the basic level of analysis in our dataset, with each row representing one unit. Units are distinguished not only by their visualisation and interaction techniques, but also by their temporal characteristics and narrative context. Temporal encodings—such as time axes, animated transitions, or other time-based variables—define a new unit even if the visualisation and interaction remain unchanged. Similarly, an identical configuration appearing in both a narrative and a non-narrative context counts as two separate units, reflecting their differing intent and function.
Accordingly, the number of units in a project does not directly correspond to the number of visualisations it contains. Two otherwise identical charts are treated as distinct units if they differ in interactive features, temporal encoding, or narrative context, while exact duplicates without variation are counted as a single unit. For example, if a project contains five bar charts that all support drill-down on click, they are counted as a single unit. Conversely, if the same five bar charts each offer different interaction techniques, they are treated as separate units based on their unique visualisation–interaction combinations.
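The counting rule can be sketched as deduplication over a tuple of distinguishing properties. The field names and the example charts below are hypothetical, chosen to mirror the five-bar-chart example above; they are not the dataset's actual columns.

```python
def count_units(charts):
    """Count distinct visualisation-interaction units among chart descriptions."""
    units = set()
    for chart in charts:
        # Two charts fall into the same unit only if all four properties match.
        key = (
            frozenset(chart["techniques"]),
            frozenset(chart["interactions"]),
            chart["temporal_encoding"],
            chart["narrative"],
        )
        units.add(key)
    return len(units)


def bar_chart(interactions):
    """Build a hypothetical static-in-time, non-narrative bar chart description."""
    return {
        "techniques": {"bars"},
        "interactions": set(interactions),
        "temporal_encoding": None,
        "narrative": False,
    }


same = [bar_chart(["drill-down"]) for _ in range(5)]
different = [bar_chart([name]) for name in ["drill-down", "filter", "zoom", "highlight", "sort"]]

print(count_units(same))       # 1
print(count_units(different))  # 5
```

Changing only the `narrative` flag or the `temporal_encoding` value of one chart would also produce a new unit, matching the rule that context and temporality distinguish otherwise identical configurations.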
Every unit includes at least one visualisation technique, although interaction may or may not be present (for instance, in a static chart).
Identifiers. Three columns in the dataset are dedicated to uniquely identifying units and their relationships within projects:
project_id: the identifier of the project to which the unit belongs. This reuses the same incremental IDs from [1] (https://doi.org/10.5281/zenodo.14192758) to enable cross-referencing between datasets.
vis_unit_id: the identifier of the individual visualisation–interaction unit. IDs increment within each project and reset to 1 for a new project.
visualisation_version: an identifier used to track interactive transformations of visualisations. Multiple rows can share the same project_id and vis_unit_id if they represent different states of the same view, triggered by user interactions that modify the visual form.
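A minimal sketch of how the three identifier columns could be used together: rows are grouped by (project_id, vis_unit_id), so that several visualisation_version rows collapse into one unit. The sample rows are invented for illustration; only the column names come from the description above.

```python
from collections import defaultdict

# Hypothetical rows; only the column names come from the dataset description.
rows = [
    {"project_id": 1, "vis_unit_id": 1, "visualisation_version": 1},
    {"project_id": 1, "vis_unit_id": 1, "visualisation_version": 2},
    {"project_id": 1, "vis_unit_id": 2, "visualisation_version": 1},
    {"project_id": 2, "vis_unit_id": 1, "visualisation_version": 1},
]


def versions_per_unit(rows):
    """Map each (project_id, vis_unit_id) pair to the list of its versions."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["project_id"], row["vis_unit_id"])
        grouped[key].append(row["visualisation_version"])
    return dict(grouped)


units = versions_per_unit(rows)
print(len(units))     # number of distinct units across all projects
print(units[(1, 1)])  # versions recorded for unit 1 of project 1
```

Because vis_unit_id restarts at 1 in every project, the pair rather than vis_unit_id alone has to serve as the grouping key.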
Narrativity. We record whether a visualisation–interaction unit is employed within a narrative context. Some projects contain units exclusively in narrative or non-narrative settings, while others include units in both. The relevant columns are:
non_narrative: a boolean value indicating whether the unit appears in non-narrative contexts.
narrative: a boolean value indicating whether the unit is used in narrative contexts (including both strongly guided, author-driven data stories and more interactive, user-driven narratives).
Visualisation techniques. We adopt, and where necessary adapt, the terminology and definitions from [3]. Each column corresponds to a specific type of visualisation and indicates (by means of a boolean value) whether that visualisation technique is present in a given visualisation–interaction unit. The following columns and inclusion criteria are used to encode this information:
plot: visual representations that map data points onto a two-dimensional coordinate system.
cluster_or_set: sets or cluster-based visualisations used to unveil possible inter-object similarities.
map: geographical maps used to show spatial insights. While we do not specify the variants of maps (e.g., pin maps, dot density maps, flow maps, etc.), we make an exception for maps where each data point is represented by another visualisation (e.g., a map where each data point is a pie chart) by accounting for the presence of both in their respective columns.
network: visual representations highlighting relational aspects through nodes connected by links or edges.
hierarchical_diagram: tree-like structures such as tree diagrams and radial trees, but also dendrograms. They differ from networks in their strictly hierarchical structure and absence of closed connection loops.
treemap: still hierarchical, but highlighting quantities expressed by means of area size. It also includes circle packing variants.
word_cloud: clouds of words, where each instance’s size is proportional to its frequency in a related context.
bars: includes bar charts, histograms, and variants. It coincides with “bar charts” in [7] but with a more generic term to refer to all bar-based visualisations.
line_chart: the display of information as sequential data points connected by straight-line segments.
area_chart: similar to a line chart but with a filled area below the segments. It also includes density plots.
pie_chart: circular graphs divided into slices, which can also use multi-level solutions.
plot_3d: plots that use a third dimension to encode an additional variable.
proportional_area: representations used to compare values through area size, typically using circle- or square-like shapes.
timeline: the display of a list of data points or spans in chronological order. They include timelines working either with a scale or simply displaying events in sequence. As in [3], we also include structured solutions resembling Gantt chart layouts.
other: it includes all other types of non-temporal visualisations that do not fall into the aforementioned categories.
Temporal encodings. We identify techniques used to encode temporality (except for timelines, where temporal encoding is tacitly assumed). Columns:
temporal_dimension: to report when time is mapped to any dimension of a visualisation. We use the term “dimension” and not “axis” as in [3] as more appropriate for radial layouts or more complex representational choices.
animation: temporality is perceived through an animation changing the visualisation according to time flow.
visual_variable: another visual encoding strategy is used to represent any temporality-related variable (e.g. colour).
Multi-type coordinated views. Tracking coordinated views across the dataset is limited to cases where multiple visualisation types can be clearly identified within a single view. For these instances, a dedicated column indicates which visualisation—if any—plays a central or dominant role:
primary_visualisation: contains the name of the visualisation technique (as defined in the corresponding column) that holds a dominant role in the coordinated view. If no single type can be considered guiding because multiple types have similar perceived importance, the column contains "NA".
Interaction techniques. A set of categories to assess affordable interaction techniques based on the concept of user intent [2] and user-allowed data actions [4]. The following categories roughly match the “processing”, “mapping”, and “presentation” actions from [4] and the manipulative subset of methods describing “how” an interaction is performed, as conceived in [5]. Only interactions that affect the visual representation or the aspect of data points, symbols, and glyphs are taken into consideration. A two-level analysis is enabled by the columns, referring to interaction categories also explored at an aggregated project level in [1], and their values, exposing more specific interaction techniques (multiple values are separated by a semicolon). They include:
basic_selection: the demarcation of an element either for the duration of the interaction (highlight) or more permanently until the occurrence of another selection (mark).
advanced_selection: the demarcation of an element triggers the demarcation of related instances within the same visualisation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the annotation pipeline, we applied compound figure classification, subfigure separation, and bar chart classification to obtain bar charts from this sample and then asked annotators to annotate graphical integrity issues on these bar charts. With the prediction pipeline, we applied our whole graphical integrity issues detector to this sample. Both sets are similar, as demonstrated by the analysis in Fig 2. (XLSX)
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
LLM Distribution Evaluation Dataset
This dataset contains 1000 synthetic graphs with questions and answers about statistical distributions, designed to evaluate large language models' ability to analyze data visualizations.
Dataset Description
Dataset Summary
This dataset contains diverse statistical visualizations (bar charts, line plots, scatter plots, histograms, area charts, and step plots) with associated questions about:
Normality testing Distribution… See the full description on the dataset page: https://huggingface.co/datasets/robvanvolt/llm-distribution-sample.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Contained within the 3rd Edition (1957) of the Atlas of Canada is a plate that shows six condensed maps of the distribution of plants producing the following: leather footwear, women's and children's factory-made clothing, synthetic textiles and silks, men's factory-made clothing, cotton textiles, and rubber products. All data for these maps is for 1954, with the exception of the rubber products map, which is for 1955. Each map is accompanied by a bar graph and pie chart. The bar graphs show the value of production by major categories of products. The pie charts show the percentage distribution of persons employed in each manufacturing industry by province.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"Scholarly figures are data visualizations like bar charts, pie charts, line graphs, maps, scatter plots or similar figures. Text extraction from scholarly figures is useful in many application scenarios, since text in scholarly figures often contains information that is not present in the surrounding text. This dataset is a corpus of 121 scholarly figures from the economics domain for evaluating text extraction tools. We randomly extracted these figures from a corpus of 288,000 open access publications from EconBiz. The dataset covers a wide variety of scholarly figures, from bar charts to maps. We manually labeled the figures to create the gold standard.
We adjusted the provided gold standard to have a uniform format for all datasets. Each figure is accompanied by a TSV file (tab-separated values) where each entry corresponds to a text line which has the following structure:
X-coordinate of the center of the bounding box in pixels
Y-coordinate of the center of the bounding box in pixels
Width of the bounding box in pixels
Height of the bounding box in pixels
Rotation angle around its center in degrees
Text inside the bounding box
In addition, we provide the ground truth in JSON format. A schema file is included in each dataset as well. The dataset is accompanied by a ReadMe file with further information about the figures and their origin.
If you use this dataset in your own work, please cite one of the papers in the references."
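The TSV layout described above can be read with a few lines of Python. This is a sketch under assumptions: the field order follows the list above, the sample line is invented, and the rotation is applied with the standard counter-clockwise formula in a y-up coordinate system, which the description does not specify. In image coordinates with y pointing down, the visual sense of the rotation flips.

```python
import math
from dataclasses import dataclass


@dataclass
class TextBox:
    cx: float
    cy: float
    width: float
    height: float
    angle_deg: float
    text: str


def parse_line(line):
    """Parse one tab-separated entry: cx, cy, width, height, angle, text."""
    cx, cy, w, h, angle, text = line.rstrip("\n").split("\t", 5)
    return TextBox(float(cx), float(cy), float(w), float(h), float(angle), text)


def corners(box):
    """Return the four corners of the box after rotating it around its centre."""
    theta = math.radians(box.angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = box.width / 2, box.height / 2
    offsets = [(-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)]
    return [
        (box.cx + dx * cos_t - dy * sin_t, box.cy + dx * sin_t + dy * cos_t)
        for dx, dy in offsets
    ]


# Hypothetical entry: a 40x10 box centred at (100, 50), rotated by 90 degrees.
box = parse_line("100\t50\t40\t10\t90\tGDP growth")
print(box.text)
for x, y in corners(box):
    print(f"({x:.1f}, {y:.1f})")
```

Splitting with a maximum of five separators keeps any tab characters inside the text field intact, which is a defensive choice rather than something the format description requires.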
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this project, we aimed to map the visualisation design space of visualisation embedded in right-to-left (RTL) scripts. We aimed to expand our knowledge of visualisation design beyond the dominance of research based on left-to-right (LTR) scripts. Through this project, we identify common design practices regarding the chart structure, the text, and the source. We also identify ambiguity, particularly regarding the axis position and direction, suggesting that the community may benefit from unified standards similar to those found in web design for RTL scripts. To achieve this goal, we curated a dataset that covered 128 visualisations found in Arabic news media and coded these visualisations based on the chart composition (e.g., chart type, x-axis direction, y-axis position, legend position, interaction, embellishment type), text (e.g., availability of text, availability of caption, annotation type), and source (source position, attribution to designer, ownership of the visualisation design). Links are also provided to the articles and the visualisations. This dataset is limited to stand-alone visualisations, whether they were single-panelled or included small multiples. We also did not consider infographics in this project, nor any visualisation that did not have an identifiable chart type (e.g., bar chart, line chart). The attached documents also include some graphs from our analysis of the dataset provided, where we illustrate common design patterns and their popularity within our sample.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table shows our analysis of the relationship between proportional ink violations and a group of variables (journal rank, research field, researcher seniority, affiliation country, and year of publication). (XLSX)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Hr Analytics Job Prediction’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mfaisalqureshi/hr-analytics-and-job-prediction on 30 September 2021.
--- Dataset description provided by original source is as follows ---
Hr Data Analytics
This dataset contains information about employees who worked in a company.
This dataset contains columns: Satisfactory Level, Number of Project, Average Monthly Hours, Time Spend Company, Promotion Last 5 Years, Department, Salary
You can download, copy and share this dataset for analysis and prediction of employee behaviour.
Answering the following questions would be worthwhile:
1- Do exploratory data analysis to figure out which variables have a direct and clear impact on employee retention (i.e. whether they leave the company or continue to work)
2- Plot bar charts showing the impact of employee salaries on retention
3- Plot bar charts showing a correlation between department and employee retention
4- Now build a logistic regression model using variables that were narrowed down in step 1
5- Measure the accuracy of the model
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used to produce land cover change analyses for Figures 3.8 (A-D) and Figures 3.9 (A-D) as part of a master's thesis titled "Repeatable methods for classification of alien and native vegetation in the Montane grasslands" (2024). This dataset encompasses forty-five data entries. This includes 11 Hidden Markov Model (HMM) post-processed GeoTIFFs for 1990 until 2020 under each management class (i.e. Barloworld/Commercial, Communal, Forestry/Conservation, and Plantations). An R script is also included to combine all GeoTIFFs, then format the data and run code to produce river plots and bar graphs for the four management classes. The dataset aims to illustrate the changes among each land cover class (i.e. NVF, Aliens, Indigenous Forest, Grassland, and Mixed Woody Grassland) under the different management classes. This helps identify drivers of land cover change in the various management classes and the entire study area. It is also important to note that the land cover changes will portray realistic changes because they are HMM post-processed.
Date of data collection: February 2020
Location of data collection: Blyde River Canyon Conservancy and its surrounds, in Mpumalanga/Limpopo Provinces, South Africa.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset includes 15 visual diagrams (pie and bar charts) comparing the distribution of agricultural residues, OFMSW, and used cooking oil across each state in Nigeria, province in South Africa, and county in Kenya. These summaries provide a comparative overview of regional feedstock strengths. The charts complement quantitative analyses by providing visual summaries of feedstock availability.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
JGraphQA
Introduction
We introduce JGraphQA, a multimodal benchmark designed to evaluate the chart understanding capabilities of Large Multimodal Models (LMMs) in Japanese. To create JGraphQA, we first conducted a detailed analysis of the existing ChartQA benchmark. Then, focusing on Japanese investor relations (IR) materials, we collected a total of 100 images consisting of four types: pie charts, line charts, bar charts, and tables. For each image, we created two… See the full description on the dataset page: https://huggingface.co/datasets/r-g2-2024/JGraphQA.
The dataset used is US Census data, an extraction of the 1994 census data that was donated to UC Irvine’s Machine Learning Repository. The data contains approximately 32,000 observations with over 15 variables. The dataset was downloaded from: http://archive.ics.uci.edu/ml/datasets/Adult. The dependent variable in our analysis is income level, specifically who earns above $50,000 a year. We use SQL queries, proportion analysis using bar charts, and a simple decision tree to understand the important variables and their influence on prediction.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains comprehensive data on internal student mobility between settlements and regions in Kazakhstan from 2020 through 2024. It contains yearly totals of transferring students, broken down by origin and destination sites, both at the regional and settlement levels. The data was derived from the Kazakhstan National Education Database and represents official administrative counts of students transferring schools. It is provided to help facilitate research on educational mobility, educational inequality, urbanization, and school planning and forecasting.
- This dataset consists of six individual tables derived from aggregated student migration data from Kazakhstan for the period from 2020 to 2024. The tables are in .csv form and can be found in the 'Dataset/data' folder.
- There is a detailed variable description in the provided 'Dataset/data/Codebook.xlsx' that explains every field utilized in the dataset.
- Each table is also accompanied by a related visualisation (such as bar charts, maps, line graphs) that presents main patterns and insights. The visualisations appear in the 'Dataset/data/visualizations' folder and they correspond to every table by figure number for easy reference.
- The Dataset/code folder contains the Python code used to process and analyze the raw data from the National Education Database (NEDB), along with a README.txt file that provides a step-by-step explanation of the methodology.