42 datasets found

Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm
plos.figshare.com
docx
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pbio.1002128
Dataset updated
May 31, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
w
Part I and Part II crimes bar chart
data.wu.ac.at
csv, json, xml
Updated Aug 27, 2016
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
County of San Mateo Sheriff's Office (2016). Part I and Part II crimes bar chart [Dataset]. https://data.wu.ac.at/schema/performance_smcgov_org/bnZoNi01OW1m
Explore at:
csv, json, xmlAvailable download formats
Dataset updated
Aug 27, 2016
Dataset provided by
County of San Mateo Sheriff's Office
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
Counts of Part I committed in San Mateo County from 1985 on. This dataset also includes Part II crimes from 2013 on.

Part I crimes include: homicide, rape, robbery, aggravated assault, burglary, motor vehicle theft, larceny-theft, and arson. These counts include crimes committed at San Francisco International Airport (SFO), Unincorporated San Mateo County, Woodside, Portola Valley, San Carlos from 10/31/10 forward; Half Moon Bay from 6/12/11 forward; and Millbrae from 3/4/12 forward.

Part II crimes do not include San Francisco International Airport (SFO) cases and is an estimate only. An estimate is required because there are no specific data types used when keying in Type II crime types. Therefore, Records Manager judgment is used.
w
Civic Projects - Status Count (Bar Chart)
data.wu.ac.at
csv, json, xml
Updated Nov 14, 2013
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Austin Public Works (2013). Civic Projects - Status Count (Bar Chart) [Dataset]. https://data.wu.ac.at/schema/data_austintexas_gov/ZXFjei03cXZj
Explore at:
csv, json, xmlAvailable download formats
Dataset updated
Nov 14, 2013
Dataset provided by
City of Austin Public Works
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
This dataset includes information for projects that appear on the City of Austin’s Capital Improvement Visualization Information and Communication (CIVIC) Map Viewer, www.austintexas.gov/GIS/civic. These projects, also known as Capital Improvements Program (CIP) projects, implement the construction, replacement, or renovation of city assets that are useful to the community. Data is currently available for most CIP projects funded in full or in part by voter-approved bond programs from 2010 and 2012. The dataset below is subject to change at any time, and does not represent a comprehensive list of capital improvement projects. For more information about the City of Austin’s Capital Improvement Program, please visit www.austintexas.gov/department/civic. The City of Austin has produced CIVIC, a web application to search Capital Improvement Projects, for informational purposes only. The data and information available at this web site is provided "As is", and "As Available" and without any warranties of any kind either express or implied. The City makes no warranty regarding the accuracy or completeness of this site and the information provided. By accessing or using CIVIC, you agree to these terms of use. The City of Austin may change the terms of use at any time at its sole discretion and without notice.”

Pokemon Detective: Unmask Team Rocket

kaggle.com

Updated Mar 27, 2025

Facebook

Twitter

Click to copy link

Link copied

Cite

Kotso P (2025). Pokemon Detective: Unmask Team Rocket [Dataset]. https://www.kaggle.com/datasets/kotsop/pokmon-detective-challenge

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Mar 27, 2025

Dataset provided by

Kaggle

Authors

Kotso P

License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

🔍 The Case of the Disguised Villains: Predicting Team Rocket with Data

In the bustling world of Kanto, where Pokémon battles shape destinies, crime lurks in the shadows. Detective Kotso, the sharpest mind in Pokémon crime investigations, has been tasked with an urgent mission. The mayor suspects that Team Rocket has infiltrated the city, disguising themselves as ordinary citizens.

But Kotso doesn’t work alone—he relies on you, a brilliant data scientist, to uncover the truth. Your job? Analyze the data of 5,000 residents to predict which of the 1,000 unclassified individuals are secretly part of Team Rocket.

Can you spot the hidden patterns? Can Machine Learning crack the case where traditional detective work fails? The fate of Kanto depends on your skills.

📊 Dataset Structure & Features

This dataset holds the key to exposing Team Rocket’s operatives. Below is a breakdown of the features at your disposal:

Column Name	Description
ID	Unique identifier for each citizen
Age	Age of the citizen
City	City the citizen is from
Economic Status	Low, Medium, High
Occupation	Profession in the Pokémon world
Most Frequent Pokémon Type	The type of Pokémon most frequently used
Average Pokémon Level	Average level of owned Pokémon
Criminal Record	Clean (0) or Dirty (1)
Pokéball Usage	Preferred Pokéball type (e.g., DarkBall, UltraBall)
Winning Percentage	Battle win rate (e.g., 64%, 88%)
Gym Badges	Number of gym badges collected (0 to 8)
Is Pokémon Champion	True if the citizen has defeated the Pokémon Elite Four
Battle Strategy	Defensive, Aggressive, Unpredictable
City Movement Frequency	Number of times the citizen moved between cities in the last year
Possession of Rare Items	Yes or No
Debts to the Kanto System	Amount of debt (e.g., 20,000)
Charitable Activities	Yes or No
Team Rocket Membership	Yes or No (target variable)

🕵️ Can You Crack the Case?

This dataset is not just about numbers—it’s a criminal investigation. Hidden patterns lurk beneath the surface, waiting to be uncovered.

Are certain Pokémon types more common among Team Rocket members?
Do suspicious financial transactions hint at illegal activities?
Does their battle strategy betray their allegiance?

This isn’t just another classification task—it’s a race against time to stop Team Rocket before they take control of Kanto!

Detective Kotso is counting on you. Will you rise to the challenge? 🕵️‍♂️🔎

🔎 10 Key Questions & Suggested Analysis Techniques

1️⃣ Do certain Pokémon types indicate suspicious behavior?
- 📈 Graph: Stacked bar chart comparing Pokémon type distribution between Rocket & non-Rocket members.
- 🎯 Test: Chi-square test for correlation.

2️⃣ Is economic status a reliable predictor of criminal affiliation?
- 📊 Graph: Box plot of debt and economic status per Team Rocket status.
- 🏦 Test: ANOVA test for group differences.

3️⃣ Do Team Rocket members have a preference for specific PokéBalls?
- 🎨 Graph: Heatmap of PokéBall usage vs. Team Rocket status.
- ⚡ Test: Chi-square test for independence.

4️⃣ Does a high battle win ratio correlate with Team Rocket membership?
- 📉 Graph: KDE plot of win ratio distribution for both classes.
- 🏆 Test: T-test for mean differences.

5️⃣ Are migration patterns different for Team Rocket members?
- 📈 Graph: Violin plot of migration counts per group.
- 🌍 Test: Mann-Whitney U test.

6️⃣ Do Rocket members tend to avoid charity participation?
- 📊 Graph: Grouped bar chart of charity participation rates.
- 🕵️‍♂️ Test: Fisher’s Exact Test for small sample sizes.

7️⃣ Do Rocket members disguise themselves in certain professions?
- 📊 Graph: Horizontal bar chart of profession frequency per group.
- 🕵️‍♂️ Test: Chi-square test for profession-Team Rocket relationship.

8️⃣ Is there an unusual cluster of Rocket members in specific cities?
- 🗺 Graph: Geographic heatmap of city distributions.
- 📌 Test: Spatial autocorrelation test.

9️⃣ How does badge count affect the likelihood of being a Rocket member?
- 📉 Graph: Histogram of gym badge distributions.
- 🏅 Test: Kruskal-Wallis test.

🔟 **Are there any multi-feature interactions that reve...

u
Code book of RTL visualization in Arabic News media
rdr.ucl.ac.uk
xlsx
Updated Jul 3, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Muna Alebri; No ̈elle Rakotondravony; Lane Harrison (2024). Code book of RTL visualization in Arabic News media [Dataset]. http://doi.org/10.5522/04/26150749.v1
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.5522/04/26150749.v1
Dataset updated
Jul 3, 2024
Dataset provided by
University College London
Authors
Muna Alebri; No ̈elle Rakotondravony; Lane Harrison
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In this project, we aimed to map the visualisation design space of visualisation embedded in right-to-left (RTL) scripts. We aimed to expand our knowledge of visualisation design beyond the dominance of research based on left-to-right (LTR) scripts. Through this project, we identify common design practices regarding the chart structure, the text, and the source. We also identify ambiguity, particularly regarding the axis position and direction, suggesting that the community may benefit from unified standards similar to those found on web design for RTL scripts. To achieve this goal, we curated a dataset that covered 128 visualisations found in Arabic news media and coded these visualisations based on the chart composition (e.g., chart type, x-axis direction, y-axis position, legend position, interaction, embellishment type), text (e.g., availability of text, availability of caption, annotation type), and source (source position, attribution to designer, ownership of the visualisation design). Links are also provided to the articles and the visualisations. This dataset is limited for stand-alone visualisations, whether they were single-panelled or included small multiples. We also did not consider infographics in this project, nor any visualisation that did not have an identifiable chart type (e.g., bar chart, line chart). The attached documents also include some graphs from our analysis of the dataset provided, where we illustrate common design patterns and their popularity within our sample.
c
ckanext-charts
catalog.civicdataecosystem.org
Updated Jun 4, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). ckanext-charts [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-charts
Explore at:
Dataset updated
Jun 4, 2025
Description
The Charts extension for CKAN enhances the platform's data visualization capabilities, allowing users to create, manage, and share charts that are linked to CKAN datasets. It allows users to create interactive and visually appealing chart representations of data directly within the CKAN environment, providing essential data analysis tools. This streamlines the process of visualizing data for a more intuitive and accessible experience. Key Features: Chart Creation: Enables users to create charts directly from CKAN datasets. Chart Editing: Allows users to modify and customize existing charts. Chart Embedding: Provides the ability to embed created charts into other web applications or platforms for wider dissemination. Chart Sharing: Supports sharing of chart visualizations with other users or groups within or outside the CKAN ecosystem. Multiple Chart Types: Supports a variety of common chart types, including bar charts, line charts, and pie charts. Further chart types are not mentioned explicitly, but it is implied the extension can be extended as well. Technical Integration: The extension integrates with CKAN primarily as a plugin. To enable the Charts extension, the chartsview and chartsbuilderview plugins must be added to the CKAN configuration file. The documentation also mentions the need to set CHARTS_FIELDS when autogenerating documentation for chart types fields, which implies a level of customization and extensibility for different chart types. It requires proper initialization of the CKAN instance and relies on validators and helpers, emphasizing the need for a correctly configured CKAN environment. Benefits & Impact: The primary benefit of the CKAN Charts extension is the enhancement of data analysis and presentation capabilities within CKAN. By providing tools to create, manage, and share charts, the extension makes it easier for users to understand and communicate insights from their data, fostering better data-driven decision-making. Also the documentation for chart types can be autogenerated.
Electronic Sales
kaggle.com
Updated Dec 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anshul Pachauri (2023). Electronic Sales [Dataset]. https://www.kaggle.com/datasets/anshulpachauri/electronic-sales
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 19, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Anshul Pachauri
Description
The provided Python code is a comprehensive analysis of sales data for a business that involves the merging of monthly sales data, cleaning and augmenting the dataset, and performing various analytical tasks. Here's a breakdown of the code:

Data Preparation and Merging:

The code begins by importing necessary libraries and filtering out warnings. It merges sales data from 12 months into a single file named "all_data.csv." Data Cleaning:

Rows with NaN values are dropped, and any entries starting with 'Or' in the 'Order Date' column are removed. Columns like 'Quantity Ordered' and 'Price Each' are converted to numeric types for further analysis. Data Augmentation:

Additional columns such as 'Month,' 'Sales,' and 'City' are added to the dataset. The 'City' column is derived from the 'Purchase Address' column. Analysis:

Several analyses are conducted, answering questions such as: The best month for sales and total earnings. The city with the highest number of sales. The ideal time for advertisements based on the number of orders per hour. Products that are often sold together. The best-selling products and their correlation with price. Visualization:

Bar charts and line plots are used for visualizing the analysis results, making it easier to interpret trends and patterns. Matplotlib is employed for creating visualizations. Summary:

The code concludes with a comprehensive visualization that combines the quantity ordered and average price for each product, shedding light on product performance. This code is structured to offer insights into sales patterns, customer behavior, and product performance, providing valuable information for strategic decision-making in the business.
F
SlideImages
data.uni-hannover.de
service.tib.eu
tar, zip
Updated Jan 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TIB (2022). SlideImages [Dataset]. https://data.uni-hannover.de/dataset/slideimages
Explore at:
tar(1360140103), zip(107220518)Available download formats
Dataset updated
Jan 20, 2022
Dataset authored and provided by
TIB
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Please note: this archive requires support for dangling symlinks, which excludes the Windows operating system.

To use this dataset, you will need to download the MS COCO 2017 detection images and expand them to a folder called coco17 in the train_val_combined directory. The download can be found here: https://cocodataset.org/#download You will also need to download the AI2D image description dataset and expand them to a folder called ai2d in the train_val_combined directory. The download can be found here: https://prior.allenai.org/projects/diagram-understanding

License Notes for Train and Val: Since the images in this dataset come from different sources, they are bound by different licenses.

Images for bar charts, x-y plots, maps, pie charts, tables, and technical drawings were downloaded directly from wikimedia commons. License and authorship information is stored independently for each image in these categories in the wikimedia_commons_licenses.csv file. Each row (note: some rows are multi-line) is formatted so:

Images in the slides category were taken from presentations which were downloaded from Wikimedia Commons. The names of the presentations on Wikimedia Commons omits the trailing underscore, number, and file extension, and ends with .pdf instead. The source materials' licenses are shown in source_slices_licenses.csv.

Wikimedia commons photos' information page can be found at "https://commons.wikimedia.org/wiki/File:

License Notes for Testing: The testing images have been uploaded to SlideWiki by SlideWiki users. The image authorship and copyright information is available in authors.csv.

Further information can be found for each image using the SlideWiki file service. Documentation is available at https://fileservice.slidewiki.org/documentation#/ and in particular: metadata is available at "https://fileservice.slidewiki.org/metadata/

This is the SlideImages dataset, which has been assembled for the SlideImages paper. If you find the dataset useful, please cite our paper: https://doi.org/10.1007/978-3-030-45442-5_36
Global Refugees Dataset 1951-2015
kaggle.com
Updated Aug 13, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abu Talha (2023). Global Refugees Dataset 1951-2015 [Dataset]. https://www.kaggle.com/datasets/talhabu/global-refugee-trends-and-displacement-statistics
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 13, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Abu Talha
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
This comprehensive dataset presents the global refugee landscape by providing a detailed overview of refugee and displacement statistics from various countries and territories over a span of time. With a total of 107,980 rows and 11 columns, this dataset delves into the complexities of forced migration and human displacement, offering insights into the movements of refugees, asylum-seekers, internally displaced persons (IDPs), returned refugees and IDPs, stateless individuals, and other populations of concern.

Columns in the dataset:

Year: The year of data collection.

Country / territory of asylum/residence: The host country or territory where refugees seek asylum or residence.

Origin: The country of origin from which refugees are fleeing.

Refugees (incl. refugee-like situations): The number of refugees, including those in refugee-like situations.

Asylum-seekers (pending cases): The count of individuals seeking asylum whose cases are pending.

Returned refugees: The number of refugees who have returned to their country of origin.

Internally displaced persons (IDPs): The count of people who have been displaced within their own country due to conflict or other reasons.

Returned IDPs: The number of internally displaced persons who have returned to their previous locations.

Stateless persons: Individuals who do not have the legal recognition of any country.

Others of concern: Additional populations in need of humanitarian assistance.

Total Population: The sum of all above categories, representing the overall population affected by displacement. The dataset serves as a valuable resource for studying global refugee trends, assessing the impact of conflicts and crises on displacement, and understanding the challenges faced by vulnerable populations worldwide.

Visualization Ideas: Time Series Analysis: Plot the trends in different refugee populations over the years, such as refugees, asylum-seekers, IDPs, returned refugees, etc. Geographic Analysis: Create heatmaps or choropleth maps to visualize refugee flows between different countries and regions. Origin and Destination Analysis: Show the top countries of origin and the top host countries for refugees using bar charts. Pie Charts: Visualize the distribution of different refugee populations (refugees, asylum-seekers, IDPs, etc.) as a percentage of the total population. Stacked Area Chart: Display the cumulative total of different refugee populations over time to observe changes and trends.

Data Modeling and Machine Learning Ideas: Time Series Forecasting: Use machine learning algorithms like ARIMA or LSTM to predict future refugee trends based on historical data. Clustering: Group countries based on similar refugee patterns using clustering algorithms such as K-Means or DBSCAN. Classification: Build a classification model to predict whether a country will experience a significant increase in refugee inflow based on historical and socio-political factors. Sentiment Analysis: Analyze social media or news data to determine the sentiment around refugee-related topics and how it correlates with migration patterns. Network Analysis: Construct a network graph to visualize the connections and interactions between countries in terms of refugee flows.

These visualization and modeling ideas can provide meaningful insights into the global refugee crisis and aid in decision-making, policy formulation, and humanitarian efforts.
c
ckanext-graph
catalog.civicdataecosystem.org
Updated Jun 4, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). ckanext-graph [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-graph
Explore at:
Dataset updated
Jun 4, 2025
Description
The Graph extension for CKAN adds the ability to visualize data resources as graphs, providing users with a more intuitive understanding of the information contained within datasets. It currently supports temporal and categorical graph types, enabling the creation of count-based visualizations over time or across different categories. While the current version is primarily designed for use with an Elasticsearch backend within the Natural History Museum's infrastructure, it is built to be extensible for broader applicability. Key Features: Temporal Graphs: Generates line graphs that display counts of data points over time, based on a designated date field within the resource. This allows to visualize trends and patterns dynamically. Categorical Graphs: Creates bar charts that show the distribution of counts for various values found within a specified field in a resource, making it easier to understand data groupings. Extensible Backend Architecture: Designed to support multiple backend data storage options, with Elasticsearch currently implemented, paving the way for future integration with other systems like PostgreSQL. Template Customization: Includes a template (templates/graph/view.html) that can be extended to override or add custom content to the graph view, giving full control over the visualization design. Configuration Options: Backend selection through the .ini configuration file. Users can choose between Elasticsearch or SQL, allowing administrators to align the extension with their specific requirements. Technical Integration: The Graph extension integrates with CKAN by adding a new view option to resources. Once enabled, the graph view will appear as an available option alongside existing resource viewers. The configuration requires modifying the CKAN .ini file to add 'graph' to the list of enabled plugins and setting the desired backend. The template templates/graph/view.html allows for full customization of the view. Benefits & Impact: The Graph extension enhances the usability of CKAN-managed datasets by providing interactive visualizations of data. Temporal graphs help users identify time-based trends, while categorical graphs illustrate data distribution. The extensible architecture ensures that the extension can be adapted to different data storage systems, improving its versatility. By providing a graphical representation of data, this extension makes it easier to understand complex information, benefiting both data providers and consumers.
Hanhaoyang123/Pozzolanic-activity-experimental-dataset-of-calcined-coal-gangue-v1.0.2:...
zenodo.org
bin, txt
Updated Jul 11, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Junfei Zhang; Haoyang Han; Ling Wang; Junfei Zhang; Haoyang Han; Ling Wang (2024). Hanhaoyang123/Pozzolanic-activity-experimental-dataset-of-calcined-coal-gangue-v1.0.2: Pozzolanic-activity-experimental-dataset-of-calcined-coal-gangue-v1.0.2 [Dataset]. http://doi.org/10.5281/zenodo.10049352
Explore at:
txt, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.10049352
Dataset updated
Jul 11, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Junfei Zhang; Haoyang Han; Ling Wang; Junfei Zhang; Haoyang Han; Ling Wang
Description
This release contains experimental data on the pozzolanic of calcined coal gangue. "The data on the strength" presents the compressive and flexural strength data of cement mortar specimens (40×40×160 cm) containing 30% calcined coal gangue at different temperatures and curing times (3 days, 7 days, and 28 days). "Column chart of strength" visually represents the flexural and compressive strength data mentioned above in the form of bar charts, with temperature intervals on the x-axis and flexural strength and compressive strength on the y-axis. "R3 activity test data" displays the weights before and after calcination, along with the weight difference representing the combined water content measured through R3 activity testing. "The bar chart of R3 activity test" visually represents the combined water content in the form of bar charts, with temperature intervals on the x-axis and combined water content on the y-axis. Thermogravimetric data show the changes in TG and DTG concerning temperature(T). FTIR curve data at different temperatures include Wavenumber and absorbance values. XRD curve data display Degrees and Intensity, along with 80 scanning electron microscope images capturing different temperature coal gangue powder photos.
Dataset of frequency of occurrence of words in title of journal articles of...
figshare.com
zip
Updated Oct 16, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wenfa Ng (2020). Dataset of frequency of occurrence of words in title of journal articles of different fields in biology [Dataset]. http://doi.org/10.6084/m9.figshare.13102502.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.13102502.v1
Dataset updated
Oct 16, 2020
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Wenfa Ng
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Article title should theoretically reflect the content to which it represents. But, there is a paucity of quantitative methods for measuring the relevance of a title to an article’s content. One approach for quantifying the relevance of a title with respect to an article’s content is through mining the frequency of occurrence of words in the article title in the main text of the article. Profiling of high frequency of occurrence of many words in the title would suggests higher degree of relevance of the title with respect to the article’s content. But this must be taken in light of the caveat that some titles are phrased to express the conclusion of a paper, and would thus suffer from a lower frequency of occurrence of words of the title in the main text. This work reports a dataset that has been obtained through analysing the frequency of occurrence of words in the title of journal article in the main text of the article. This process is aided by an in-house MATLAB software coded for the project. Articles from different fields of biology such as biochemistry, bioinformatics, biotechnology, cell biology, computational biology, genetics, genomics, microbiology, molecular biology, structural biology, synthetic biology, and systems biology were put through the same analysis, and an Excel file detailing the frequency of occurrence of words in the title of the article serve as the output. The data is also visualized as a bar chart in the same Excel file.
Classification of web-based Digital Humanities projects leveraging...
zenodo.org
csv, tsv
Updated Jul 18, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Tommaso Battisti; Tommaso Battisti (2025). Classification of web-based Digital Humanities projects leveraging information visualisation techniques [Dataset]. http://doi.org/10.5281/zenodo.14192758
Explore at:
tsv, csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14192758
Dataset updated
Jul 18, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Tommaso Battisti; Tommaso Battisti
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Description

This dataset contains a list of 186 Digital Humanities projects leveraging information visualisation methods. Each project has been classified according to visualisation and interaction techniques, narrativity and narrative solutions, domain, methods for the representation of uncertainty and interpretation, and the employment of critical and custom approaches to visually represent humanities data.

Classification schema: categories and columns

The project_id column contains unique internal identifiers assigned to each project. Meanwhile, the last_access column records the most recent date (in DD/MM/YYYY format) on which each project was reviewed based on the web address specified in the url column.
The remaining columns can be grouped into descriptive categories aimed at characterising projects according to different aspects:

Narrativity. It reports the presence of narratives employing information visualisation techniques. Here, the term narrative encompasses both author-driven linear data stories and more user-directed experiences where the narrative sequence is determined by user exploration [1]. We define 2 columns to identify projects using visualisation techniques in narrative, or non-narrative sections. Both conditions can be true for projects employing visualisations in both contexts. Columns:

non_narrative (boolean)

narrative (boolean)

Domain. The humanities domain to which the project is related. We rely on [2] and the chapters of the first part of [3] to abstract a set of general domains. Column:

domain (categorical):

History and archaeology

Art and art history

Language and literature

Music and musicology

Multimedia and performing arts

Philosophy and religion

Other: both extra-list domains and cases of collections without a unique or specific thematic focus.

Visualisation of uncertainty and interpretation. Buiding upon the frameworks proposed by [4] and [5], a set of categories was identified, highlighting a distinction between precise and impressional communication of uncertainty. Precise methods explicitly represent quantifiable uncertainty such as missing, unknown, or uncertain data, precisely locating and categorising it using visual variables and positioning. Two sub-categories are interactive distinction, when uncertain data is not visually distinguishable from the rest of the data but can be dynamically isolated or included/excluded categorically through interaction techniques (usually filters); and visual distinction, when uncertainty visually “emerges” from the representation by means of dedicated glyphs and spatial or visual cues and variables. On the other hand, impressional methods communicate the constructed and situated nature of data [6], exposing the interpretative layer of the visualisation and indicating more abstract and unquantifiable uncertainty using graphical aids or interpretative metrics. Two sub-categories are: ambiguation, when the use of graphical expedients—like permeable glyph boundaries or broken lines—visually convey the ambiguity of a phenomenon; and interpretative metrics, when expressive, non-scientific, or non-punctual metrics are used to build a visualisation. Column:

uncertainty_interpretation (categorical):

Interactive distinction

Visual distinction

Ambiguation

Interpretative metrics

Critical adaptation. We identify projects in which, with regards to at least a visualisation, the following criteria are fulfilled: 1) avoid repurposing of prepackaged, generic-use, or ready-made solutions; 2) being tailored and unique to reflect the peculiarities of the phenomena at hand; 3) avoid simplifications to embrace and depict complexity, promoting time-consuming visualisation-based inquiry. Column:

critical_adaptation (boolean)

Non-temporal visualisation techniques. We adopt and partially adapt the terminology and definitions from [7]. A column is defined for each type of visualisation and accounts for its presence within a project, also including stacked layouts and more complex variations. Columns and inclusion criteria:

plot (boolean): visual representations that map data points onto a two-dimensional coordinate system.

cluster_or_set (bool): sets or cluster-based visualisations used to unveil possible inter-object similarities.

map (boolean): geographical maps used to show spatial insights. While we do not specify the variants of maps (e.g., pin maps, dot density maps, flow maps, etc.), we make an exception for maps where each data point is represented by another visualisation (e.g., a map where each data point is a pie chart) by accounting for the presence of both in their respective columns.

network (boolean): visual representations highlighting relational aspects through nodes connected by links or edges.

hierarchical_diagram (boolean): tree-like structures such as tree diagrams, radial trees, but also dendrograms. They differ from networks for their strictly hierarchical structure and absence of closed connection loops.

treemap (boolean): still hierarchical, but highlighting quantities expressed by means of area size. It also includes circle packing variants.

word_cloud (boolean): clouds of words, where each instance’s size is proportional to its frequency in a related context

bars (boolean): includes bar charts, histograms, and variants. It coincides with “bar charts” in [7] but with a more generic term to refer to all bar-based visualisations.

line_chart (boolean): the display of information as sequential data points connected by straight-line segments.

area_chart (boolean): similar to a line chart but with a filled area below the segments. It also includes density plots.

pie_chart (boolean): circular graphs divided into slices which can also use multi-level solutions.

plot_3d (boolean): plots that use a third dimension to encode an additional variable.

proportional_area (boolean): representations used to compare values through area size. Typically, using circle- or square-like shapes.

other (boolean): it includes all other types of non-temporal visualisations that do not fall into the aforementioned categories.

Temporal visualisations and encodings. In addition to non-temporal visualisations, a group of techniques to encode temporality is considered in order to enable comparisons with [7]. Columns:

timeline (boolean): the display of a list of data points or spans in chronological order. They include timelines working either with a scale or simply displaying events in sequence. As in [7], we also include structured solutions resembling Gantt chart layouts.

temporal_dimension (boolean): to report when time is mapped to any dimension of a visualisation, with the exclusion of timelines. We use the term “dimension” and not “axis” as in [7] as more appropriate for radial layouts or more complex representational choices.

animation (boolean): temporality is perceived through an animation changing the visualisation according to time flow.

visual_variable (boolean): another visual encoding strategy is used to represent any temporality-related variable (e.g., colour).

Interaction techniques. A set of categories to assess affordable interaction techniques based on the concept of user intent [8] and user-allowed data actions [9]. The following categories roughly match the “processing”, “mapping”, and “presentation” actions from [9] and the manipulative subset of methods of the “how” an interaction is performed in the conception of [10]. Only interactions that affect the visual representation or the aspect of data points, symbols, and glyphs are taken into consideration. Columns:

basic_selection (boolean): the demarcation of an element either for the duration of the interaction or more permanently until the occurrence of another selection.

advanced_selection (boolean): the demarcation involves both the selected element and connected elements within the visualisation or leads to brush and link effects across views. Basic selection is tacitly implied.

navigation (boolean): interactions that allow moving, zooming, panning, rotating, and scrolling the view but only when applied to the visualisation and not to the web page. It also includes “drill” interactions (to navigate through different levels or portions of data detail, often generating a new view that replaces or accompanies the original) and “expand” interactions generating new perspectives on data by expanding and collapsing nodes.

arrangement (boolean): methods to organise visualisation elements (symbols, glyphs, etc.) or multi-visualisation
IPL 2025 Player Auction and Retention Data
kaggle.com
Updated Mar 13, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Osama Hafeez (2025). IPL 2025 Player Auction and Retention Data [Dataset]. https://www.kaggle.com/datasets/osamahafeez002/ipl-2025-player-auction-and-retention-data/data
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 13, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Osama Hafeez
Description
This dataset contains detailed information about players participating in the Indian Premier League (IPL) 2025 season. It includes player names, their auction prices, player type (capped/uncapped, Indian/Overseas), acquisition method (retained, auction, RTM), role (batter, bowler, all-rounder, wicketkeeper), and the team they belong to. This dataset is ideal for analyzing player valuations, team compositions, and trends in IPL auctions.

Columns/Features:

Player: Name of the player (including nationality for overseas players).

Price_in_cr: Price of the player in Indian Rupees (in crores).

Type: Player type (e.g., Indian capped, Indian uncapped, Overseas capped).

Acquisition: Method of acquisition (Retained, Auction, RTM).

Role: Player's role in the team (Batter, Bowler, All-rounder, Wicketkeeper).

Team: IPL team the player belongs to (e.g., Chennai Super Kings, Mumbai Indians).

Use Cases:

Player Valuation Analysis: Analyze how player prices vary based on their role, type, and acquisition method.

Team Composition Analysis: Study how teams are structured in terms of batters, bowlers, and all-rounders.

Auction Trends: Identify trends in player retention, auction prices, and RTM usage.

Machine Learning: Predict player prices or team performance based on player roles and types.

Visualizations: Create visualizations like bar charts, pie charts, and heatmaps to explore the data.
Submarine Cable Features Dataset
kaggle.com
Updated Dec 18, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Submarine Cable Features Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/submarine-cable-features-dataset/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 18, 2023
Dataset provided by
Kagglehttp://kaggle.com/
Authors
The Devastator
Description
Submarine Cable Features Dataset

Submarine Cable Features: Scale, Description, and Effective Dates

By Homeland Infrastructure Foundation [source]

About this dataset

The Submarine Cables dataset provides a comprehensive collection of features related to submarine cables. It includes information such as the scale band, description, and effective dates of these cables. These data are specifically designed to support coastal planning at both regional and national scales.

The dataset is derived from 2010 NOAA Electronic Navigational Charts (ENCs), along with 2009 NOAA Raster Navigational Charts (RNCs) which were updated in 2013 using the most recent RNCs as a reference point. The source material's scale varied significantly, resulting in discontinuities between multiple sources that were resolved with minimal spatial adjustments.

Polyline features representing submarine cables were extracted from the original sources while excluding 'cable areas' noted within the data. The S-57 data model was modified for improved readability and performance purposes.

Overall, this dataset provides valuable information regarding the occurrence and characteristics of submarine cables in and around U.S. navigable waters. It serves as an essential resource for coastal planning efforts at various geographic scales

How to use the dataset

Here's a guide on how to effectively utilize this dataset:

1. Familiarize Yourself with the Columns

The dataset contains multiple columns that provide important information:

scaleBand: This categorical column indicates the scale band of each submarine cable.

description: The text column provides a description of each submarine cable.

effectiveDate: Indicates the effective date of the information about each submarine cable.

Understanding these columns will help you navigate and interpret the data effectively.

2. Explore Scale Bands

Start by analyzing the distribution of different scale bands in the dataset. The scale band categorizes submarines cables based on their size or capacity. Identifying patterns or trends within specific scale bands can provide valuable insights into how submarine cables are deployed.

For example, you could analyze which scale bands are most commonly used in certain regions or countries, helping coastal planners understand infrastructure needs and potential connectivity gaps.

3. Analyze Cable Descriptions

The description column provides detailed information about each submarine cable's characteristics, purpose, or intended use. By examining these descriptions, you can uncover specific attributes related to each cable.

This information can be crucial when evaluating potential impacts on marine ecosystems, identifying areas prone to damage or interference with other maritime activities, or understanding connectivity options for coastal regions.

4. Consider Effective Dates

While excluding dates from this analysis as per your request (as we exclude them here), effective dates play an important role in keeping track of when information about a particular cable was collected or updated.

By considering effective dates over time: - You can monitor changes in infrastructure deployment strategies. - Identify areas where new cables have been installed. - Track outdated infrastructure that may need replacements or upgrades.

5. Combine with Other Datasets

To gain a comprehensive understanding and unlock deeper insights, consider integrating this dataset with other relevant datasets. For example: - Population density data can help identify areas in high need of improved connectivity. - Coastal environmental data can help assess potential ecological impacts of submarine cables.

By merging datasets, you can explore relationships, draw correlations, and make more informed decisions based on the available information.

6. Visualize the Data

Create meaningful visualizations to better understand and communicate insights from the dataset. Utilize scatter plots, bar charts, heatmaps, or GIS maps

Research Ideas

Coastal Planning: The dataset can be used for coastal planning at both regional and national scales. By analyzing the submarine cable features, planners can assess the impact of these cables on coastal infrastructure development and design plans accordingly.

Communication Network Analysis: The dataset can be utilized to analyze the connectivity and coverage of submarine cable networks. This information is valuable for telecommunications companies and network providers to understand gaps in communication infras...
d
Data from: Tephritid fruit fly gut bacterial population and community...
catalog.data.gov
agdatacommons.nal.usda.gov
+1more
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Data from: Tephritid fruit fly gut bacterial population and community dynamics following adult emergence [Dataset]. https://catalog.data.gov/dataset/data-from-tephritid-fruit-fly-gut-bacterial-population-and-community-dynamics-following-ad-d0838
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service
Description
Data include microbial count data (CFUs), 16S-rRNA copy number data (qPCR), and microbial community (microbiome) data from the guts of the invasive tephritid fruit flies, melon fly (Zeugodacus cucurbitae) and medfly (Ceratitis capitata). Resources in this dataset: Resource Title: R code for dada2 processing and stacked bar charts of control microbiomes File Name: Control_Processing.zip Resource Description: Data showing performance of known controls (purchased from Zymo Research) using in-house DNA extraction and PCR methods for 16S-rRNA gene amplification and sequencing. Resource Title: Data processing of 16S amplicon data File Name: 16S SSU rRNA Microbiome Data Processing and Analysis.zip Resource Description: Raw data and accompanying R scripts for analysis of figure and generation of figures and tables. Data files include both amplicon sequence variant (ASV) count data matrix and accompanying ASV sequence files and taxonomies. Analysis and figure generation are made through independent R files. Resource Title: Data and analysis of fly culturable titers File Name: CFU titers.zip Resource Description: Colony forming units (CFUs) of fruit flies at different ages and the R code for figure generation and analysis. Resource Title: qPCR of 16S rRNA of Tephritid fruit flies at different ages File Name: 16S qPCR Titers.zip Resource Description: Raw data and R code of 16S rRNA copy numbers associated with medfly and melon fly gut tissues.
NFL Injury Analysis 2012-2017
kaggle.com
Updated Dec 19, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). NFL Injury Analysis 2012-2017 [Dataset]. https://www.kaggle.com/datasets/thedevastator/nfl-injury-analysis-2012-2017
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 19, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
NFL Injury Analysis 2012-2017

NFL Injuries 2012-2017: Yearly, injury type, scenario, and season type data

By Throwback Thursday [source]

About this dataset

This dataset provides comprehensive information on injuries that occurred in the National Football League (NFL) during the period from 2012 to 2017. The dataset includes details such as the type of injury sustained by players, the specific situation or event that led to the injury, and the type of season (regular season or playoffs) during which each injury occurred.

The Injury Type column categorizes the various types of injuries suffered by players, providing insights into specific anatomical areas or specific conditions. For example, it may include injuries like concussions, ankle sprains, knee ligament tears, shoulder dislocations, and many others.

The Scenario column offers further granularity by describing the specific situation or event that caused each injury. It can provide context about whether an injury happened during a tackle, collision with another player or object on field (such as goalposts), blocking maneuvers gone wrong, falls to the ground resulting from being off-balance while making plays, and other possible scenarios leading to player harm.

The Season Type column classifies when exactly each injury occurred within a particular year. It differentiates between regular season games and playoff matches – identifying whether an incident took place during high-stakes postseason competition or routine games throughout the regular season.

The Injuries column represents numeric data detailing how many times a particular combination of year-injury type-scenario-season type has occurred within this dataset's timeframe – measuring both occurrence frequency and severity for each unique combination.

Overall, this extensive dataset provides valuable insight into NFL injuries over a six-year span. By understanding which types of injuries are most prevalent under certain scenarios and during different seasons of play - such as regular seasons versus playoffs - stakeholders within professional football can identify potential areas for improvement in safety measures and develop strategies aimed at reducing player harm on-field

How to use the dataset

The dataset contains six columns:

Year: This column represents the year in which the injury occurred. It allows you to filter and analyze data based on specific years.

Injury Type: This column indicates the specific type of injury sustained by players. It includes various categories such as concussions, fractures, sprains, strains, etc.

Scenario: The scenario column describes the situation or event that led to each injury. It provides context for understanding how injuries occur during football games.

Season Type: This column categorizes injuries based on whether they occurred during regular season games or playoff games.

Injuries: The number of injuries recorded for each specific combination of year, injury type, scenario, and season type is mentioned in this column's numeric values.

Using this dataset effectively involves several steps:

Data Exploration: Start by examining all available columns carefully and making note of their meanings and data types (categorical or numeric).

Filtering Data by Year or Season Type: If you are interested in analyzing injuries during a particular year(s) or specific seasons (regular vs playoffs), apply filters accordingly using either one or both these columns respectively.

3a. Analyzing Injury Types: To gain insights into different types of reported injuries over time periods specified by your filters (e.g., a given year), group data based on Injury Type and calculate aggregate statistics like maximum occurrences or average frequency across years/seaso

3b.Scenario-based Analysis:/frequency across years/seasons. Group the data based on Scenario and calculate aggregate values to determine which situations or events lead to more injuries.

Exploring Injury Trends: Explore the overall trend of injuries throughout the 2012-2017 period to identify any significant patterns, spikes, or declines in injury occurrence.

Visualizing Data: Utilize appropriate visualization techniques such as bar graphs, line charts, or pie charts to present your findings effectively. These visualizations will help you communicate your analysis concisely and provide clear insights into both common injuries and specific scenarios.

Drawing Conclusions: Based on your analysis of the

Research Ideas

Understanding trends in NFL injuries: This dataset can be used to analyze the number and types of in...
Paintings Collection
kaggle.com
Updated Dec 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Paintings Collection [Dataset]. https://www.kaggle.com/datasets/thedevastator/paintings-collection
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 8, 2023
Dataset provided by
Kaggle
Authors
The Devastator
Description
Paintings Collection

A diverse collection of paintings from artists throughout history

By Gove Allen [source]

About this dataset

The Paintings Dataset is a rich and diverse collection of various paintings from different artists spanning across multiple time periods. It includes a wide range of art styles, techniques, and subjects, providing an extensive resource for art enthusiasts, historians, researchers, and anyone interested in exploring the world of visual arts.

This dataset aims to capture the essence of artistic expression through its vast array of paintings. From classical masterpieces to contemporary works, it offers a comprehensive perspective on the evolution of artistic creativity throughout history.

Each record in this dataset represents an individual painting with detailed information such as artist's name, artwork title (if applicable), genre/style classification (e.g., landscape, portrait), medium (e.g., oil on canvas), dimensions (height and width), and provenance details if available. Additionally, some records may include additional metadata like the year or era in which the artwork was created.

By providing such comprehensive data about each painting included within this dataset, it enables users to study various aspects of art history. Researchers can analyze trends across different time periods or explore specific artistic movements by filtering the dataset based on genre or style categories. Art enthusiasts can also use this dataset to discover new artists or artworks that align with their interests.

This valuable collection appeals not only to those seeking knowledge or inspiration from renowned artworks but also encourages exploration into lesser-known pieces that may have been overlooked in mainstream discourse. It fosters engagement with cultural heritage while promoting diversity and inclusivity within the realm of visual arts.

Whether you are interested in studying classical works by universally acclaimed painters like Leonardo da Vinci or exploring modern expressions by emerging contemporary artists—this Paintings Dataset has something for everyone who appreciates aesthetics and enjoys unraveling stories through brushstrokes on canvas

How to use the dataset

How to Use the Paintings Dataset

Welcome to the Paintings Dataset! This dataset is a comprehensive collection of various paintings from different artists and time periods. It contains information about the artist, title, genre, style, and medium of each painting. Whether you are an art enthusiast, researcher, or just curious about paintings, this guide will help you navigate through this dataset easily.

1. Understanding the Columns

This dataset consists of several columns that provide detailed information about each painting. Here is a brief description of each column:

Artist: The name of the artist who created the painting.

Title: The title or name given to the artwork by the artist.

Genre: The artistic category or subject matter depicted in the painting.

Style: The specific artistic style or movement associated with the painting.

Medium: The materials and techniques used by the artist to create the artwork.

2. Exploring Artists and Their Paintings

One interesting way to use this dataset is to explore individual artists and their artworks. You can filter by a specific artist's name in order to retrieve all their paintings included in this collection.

For example: If you are interested in exploring all paintings by Leonardo da Vinci, simply filter using Leonardo da Vinci in Artist column using your preferred data analysis tool.

3. Analyzing Painting Genres

The genre column allows you to analyze different categories within this collection of paintings. You can examine popular genres or compare them across different eras.

To analyze genres: - Get unique values for Genre column. - Count frequency for each genre value. - Visualize results using bar charts or other graphical representations.

You might discover which genres were more predominant during certain periods or which artists were known for specific subjects!

4. Investigating Artistic Styles

Similar to genres, artistic styles also play an essential role in the world of painting. This dataset includes various styles like Impressionism, Cubism, Realism, etc. By analyzing the artistic styles column, you can explore trends and shifts in artistic movements.

To investigate styles: - Get unique values for Style column. - Count frequency for each style value. - Visualize results using bar charts or other graphical...
Data from: Mapping Research Data at the University of Bologna: Dataset
zenodo.org
csv, pdf
Updated Mar 26, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sara Coppini; Sara Coppini; Giulia Caldoni; Giulia Caldoni; Bianca Gualandi; Bianca Gualandi; Mario Marino; Mario Marino (2025). Mapping Research Data at the University of Bologna: Dataset [Dataset]. http://doi.org/10.5281/zenodo.14234555
Explore at:
csv, pdfAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14234555
Dataset updated
Mar 26, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sara Coppini; Sara Coppini; Giulia Caldoni; Giulia Caldoni; Bianca Gualandi; Bianca Gualandi; Mario Marino; Mario Marino
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Dec 12, 2024
Area covered
Bologna
Description
This dataset was developed within an analysis of research data generated and managed within the University of Bologna, with respect to the differences and commonalities between disciplines and potential challenges for institutional data support services and infrastructures. We are primarily mapping the type (e.g., image), content (e.g., scan of a manuscript) and format (e.g., .tiff) of managed data, thus sustaining the value of FAIR data as granular resources.

The analysis is based on data management plans (DMPs) produced by grantees of Horizon Europe and Horizon 2020 funding who are affiliated to the University of Bologna and are either project coordinators or partners in charge of the DMP. We are including in the study only the DMPs shared with us between May 2022 (when the data stewards team was created) and October 2023.

In short, we have selected variables of interest to be headers of a table that is progressively filled with information garnered through a close reading of the DMPs.

Computational analysis (R version 4.2.2) on the collected data produce graphs showing composition, relationship (bar graphs, pie charts and alluvial/sankey charts) and incidences (waterfall graph) of the different variables. Code for computational analysis on this data is "Mapping Reseach Data at the University of Bologna: Code" and it is also deposited on Zenodo (see Related Works).
a
The Venue Stats
aurora-public-arts-cityofaurora.hub.arcgis.com
opendata-cityofaurora.hub.arcgis.com
+2more
Updated Apr 4, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
City of Aurora GIS Online (2025). The Venue Stats [Dataset]. https://aurora-public-arts-cityofaurora.hub.arcgis.com/datasets/the-venue-stats
Explore at:
Dataset updated
Apr 4, 2025
Dataset authored and provided by
City of Aurora GIS Online
Description
The Venue dataset includes yearly show and ticket figures ranging from 2019 to present day. The data presented consists of the following columns:YearEvent TypeShow CountTicket CountThe Venue music hall which hosts concerts in downtown Aurora is operated by the Fox Valley Music Foundation, a 501(c)3 non-profit dedicated to music education and preservation. The dashboard above depicts number of shows and tickets sold from 2019 to present.It is important to note that prior to 2022, show types were not captured, only number of shows. Thus the stacked bar chart representing show types and counts only begins at 2022.

Facebook

Twitter

Click to copy link

Link copied

Cite

Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128

Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

Explore at:

325 scholarly articles cite this dataset (View in Google Scholar)

docxAvailable download formats

Unique identifier

https://doi.org/10.1371/journal.pbio.1002128

Dataset updated

May 31, 2023

Dataset provided by

PLOShttp://plos.org/

Authors

Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.

Clear search

Close search

Google apps

Main menu

Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

Part I and Part II crimes bar chart

Civic Projects - Status Count (Bar Chart)

Pokemon Detective: Unmask Team Rocket

🔍 The Case of the Disguised Villains: Predicting Team Rocket with Data

📊 Dataset Structure & Features

🕵️ Can You Crack the Case?

🔎 10 Key Questions & Suggested Analysis Techniques

Code book of RTL visualization in Arabic News media

ckanext-charts

Electronic Sales

SlideImages

Global Refugees Dataset 1951-2015

ckanext-graph

Hanhaoyang123/Pozzolanic-activity-experimental-dataset-of-calcined-coal-gangue-v1.0.2:...

Dataset of frequency of occurrence of words in title of journal articles of...

Classification of web-based Digital Humanities projects leveraging...

Description

Classification schema: categories and columns

IPL 2025 Player Auction and Retention Data

Submarine Cable Features Dataset

Submarine Cable Features Dataset

Submarine Cable Features: Scale, Description, and Effective Dates

About this dataset

How to use the dataset

1. Familiarize Yourself with the Columns

2. Explore Scale Bands

3. Analyze Cable Descriptions

4. Consider Effective Dates

5. Combine with Other Datasets

6. Visualize the Data

Research Ideas

Data from: Tephritid fruit fly gut bacterial population and community...

NFL Injury Analysis 2012-2017

NFL Injury Analysis 2012-2017

NFL Injuries 2012-2017: Yearly, injury type, scenario, and season type data

About this dataset

How to use the dataset

Research Ideas

Paintings Collection

Paintings Collection

A diverse collection of paintings from artists throughout history

About this dataset

How to use the dataset

How to Use the Paintings Dataset

1. Understanding the Columns

2. Exploring Artists and Their Paintings

3. Analyzing Painting Genres

4. Investigating Artistic Styles

Data from: Mapping Research Data at the University of Bologna: Dataset

The Venue Stats

Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm