67 datasets found

Data from: Statistical Graphs in Mathematical Textbooks of Primary Education...
scielo.figshare.com
datasetcatalog.nlm.nih.gov
jpeg
Updated May 30, 2023
Cite
Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal (2023). Statistical Graphs in Mathematical Textbooks of Primary Education in Perú [Dataset]. http://doi.org/10.6084/m9.figshare.6857033.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.6857033.v1
Dataset updated
May 30, 2023
Dataset provided by
SciELO (http://www.scielo.org/)
Authors
Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract: This paper presents an analysis of statistical graphs, relative to the curricular guidelines and their implementation, in eighteen primary education mathematics textbooks in Perú, which correspond to three complete series from different publishers. Through content analysis, we examined the sections where graphs appear, identifying the type of activity posed around each graph, the reading level demanded, and the semiotic complexity involved. The textbooks are partially aligned with the curricular guidelines on the presentation of graphs by educational level, and the number of activities proposed by the three publishers is similar. The main activities required in the textbooks are calculating and building. Bar graphs, a basic reading level, and the representation of a univariate data distribution predominate.
Data from: Statistical Graphs in Costa Rica Textbooks for Primary Education
scielo.figshare.com
jpeg
Updated Jun 3, 2023
Cite
Maynor Jiménez-Castro; Pedro Arteaga; Carmen Batanero (2023). Statistical Graphs in Costa Rica Textbooks for Primary Education [Dataset]. http://doi.org/10.6084/m9.figshare.12171666.v1
Available download formats: jpeg
Unique identifier
https://doi.org/10.6084/m9.figshare.12171666.v1
Dataset updated
Jun 3, 2023
Dataset provided by
SciELO journals
Authors
Maynor Jiménez-Castro; Pedro Arteaga; Carmen Batanero
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Costa Rica
Description
Abstract: The aim of this work was to analyze the statistical graphs included in the two most frequently used series of textbooks in Costa Rican basic education. We analyze the type of graph, its semiotic complexity, and the data context, as well as the type of task, the reading level required to complete the task, and the purpose of the graph within the task. We observed a predominance of bar graphs, the third level of semiotic complexity (representing a distribution), the second reading level (reading between the data), work and school contexts, reading and computation tasks, and an analysis purpose. We describe the differences across grades and between the two publishers, as well as differences and similarities with the results of other textbook studies carried out in Spain and Chile.
Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm
plos.figshare.com
docx
Updated May 31, 2023
Cite
Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pbio.1002128
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
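The point that different distributions can produce the same bar graph can be illustrated with a small sketch. The two samples below are made-up numbers chosen for illustration, not data from the paper, and `bar_summary` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Two hypothetical small samples (illustrative numbers only).
symmetric = [4.0, 5.0, 6.0, 5.0, 5.0]
with_outlier = [3.0, 3.0, 3.0, 3.0, 13.0]


def bar_summary(values):
    """Return the (mean, standard deviation) pair a bar graph would display."""
    return mean(values), stdev(values)


for name, sample in [("symmetric", symmetric), ("with_outlier", with_outlier)]:
    m, s = bar_summary(sample)
    # The bar height (mean) is identical; only the raw points reveal the outlier.
    print(f"{name}: mean={m:.2f}, sd={s:.2f}, points={sorted(sample)}")
```

Both samples give a bar of the same height, while listing or plotting the individual points shows that one sample is driven by a single extreme value.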
Intro to Data Types and Graphing Lab
qubeshub.org
Updated Oct 12, 2020
Cite
Stephanie Spera (2020). Intro to Data Types and Graphing Lab [Dataset]. http://doi.org/10.25334/1XYA-TF48
Unique identifier
https://doi.org/10.25334/1XYA-TF48
Dataset updated
Oct 12, 2020
Dataset provided by
QUBES
Authors
Stephanie Spera
Description
This is the third lab in an Introductory Physical Geography/Environmental Studies course. It introduces students to different data types (qualitative vs quantitative), basic statistical analyses (correlation analysis, t-test), and graphing techniques.
Stack Exchange Graphs (SNAP)
kaggle.com
zip
Updated Dec 16, 2021
Cite
Subhajit Sahu (2021). Stack Exchange Graphs (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-sx
Available download formats: zip (1480133729 bytes)
Dataset updated
Dec 16, 2021
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ask Ubuntu temporal network

https://snap.stanford.edu/data/sx-askubuntu.html

Dataset information

This is a temporal network of interactions on the stack exchange web site
Ask Ubuntu (http://askubuntu.com/). There are three different types of
interactions represented by a directed edge (u, v, t):

- user u answered user v's question at time t (in the graph sx-askubuntu-a2q)
- user u commented on user v's question at time t (in the graph sx-askubuntu-c2q)
- user u commented on user v's answer at time t (in the graph sx-askubuntu-c2a)

The graph sx-askubuntu contains the union of these graphs. These graphs
were constructed from the Stack Exchange Data Dump. Node ID numbers
correspond to the 'OwnerUserId' tag in that data dump.

Dataset statistics (sx-askubuntu)
Nodes 159,316
Temporal Edges 964,437
Edges in static graph 596,933
Time span 2613 days

Dataset statistics (sx-askubuntu-a2q)
Nodes 137,517
Temporal Edges 280,102
Edges in static graph 262,106
Time span 2613 days

Dataset statistics (sx-askubuntu-c2q)
Nodes 79,155
Temporal Edges 327,513
Edges in static graph 198,852
Time span 2047 days

Dataset statistics (sx-askubuntu-c2a)
Nodes 75,555
Temporal Edges 356,822
Edges in static graph 178,210
Time span 2418 days

Source (citation)
Ashwin Paranjape, Austin R. Benson, and Jure Leskovec. "Motifs in Temporal Networks." In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017.

Files
File Description
sx-askubuntu.txt.gz All interactions
sx-askubuntu-a2q.txt.gz Answers to questions
sx-askubuntu-c2q.txt.gz Comments to questions
sx-askubuntu-c2a.txt.gz Comments to answers

Data format

SRC DST UNIXTS

where edges are separated by a new line and

- SRC: id of the source node (a user)
- DST: id of the target node (a user)
- UNIXTS: Unix timestamp (seconds since the epoch)
...
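As a minimal sketch of reading the three-column edge format described above, the function below tallies the kinds of figures listed under "Dataset statistics". The function name `summarize_temporal_edges` and the sample lines are hypothetical, and treating a static edge as a distinct directed (source, target) pair is an assumption, not something the listing states.

```python
def summarize_temporal_edges(lines):
    """Count nodes, temporal edges, static edges, and time span in days."""
    nodes = set()
    static_edges = set()  # assumption: distinct directed (src, dst) pairs
    timestamps = []
    temporal_edges = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        src, dst, ts = (int(p) for p in parts)
        nodes.update((src, dst))
        static_edges.add((src, dst))
        timestamps.append(ts)
        temporal_edges += 1
    span_days = (max(timestamps) - min(timestamps)) / 86400 if timestamps else 0
    return {
        "nodes": len(nodes),
        "temporal_edges": temporal_edges,
        "static_edges": len(static_edges),
        "span_days": span_days,
    }


# Made-up sample: user 1 interacts with user 2 twice, user 2 with user 3 once.
sample = ["1 2 0", "1 2 86400", "2 3 172800"]
print(summarize_temporal_edges(sample))
```

For the real files, the same function could be fed lines from `gzip.open(path, "rt")`, since the listed files are gzip-compressed text.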
Further education and skills - Underlying Charts Data
explore-education-statistics.service.gov.uk
Updated Nov 28, 2024
Cite
Department for Education (2024). Further education and skills - Underlying Charts Data [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/c0579bf7-96fd-4771-9034-e8642b529114
Dataset updated
Nov 28, 2024
Dataset authored and provided by
Department for Education (https://gov.uk/dfe)
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Historical time series of headline adult (19+) further education and skills learner participation, containing breakdowns by provision type and in some cases level. Also includes some all age apprenticeship participation figures.
Academic years: 2005/06 to 2023/24 full academic years
Indicators: Participation
Filter: Provision type, Age group, Level
Compare Baseball Player Statistics using Visualiza
kaggle.com
zip
Updated Sep 28, 2024
Cite
Abdelaziz Sami (2024). Compare Baseball Player Statistics using Visualiza [Dataset]. https://www.kaggle.com/datasets/abdelazizsami/compare-baseball-player-statistics-using-visualiza
Available download formats: zip (1030978 bytes)
Dataset updated
Sep 28, 2024
Authors
Abdelaziz Sami
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.

1. Load the Data

First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.

2. Explore the Data

Before creating visualizations, it’s good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.

3. Visualization

We can create various visualizations, such as:
- A bar chart to compare the average release speed of different pitch types.
- A line plot to visualize trends over time based on game dates.
- A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).

Example Code

Here is a sample code to demonstrate how to create these visualizations using Matplotlib and Seaborn:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data
df = pd.read_csv('judge.csv')

# Display the first few rows of the dataframe
print(df.head())

# Set the style of seaborn
sns.set(style="whitegrid")

# 1. Average Release Speed by Pitch Type
plt.figure(figsize=(12, 6))
avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
sns.barplot(x=avg_speed.values, y=avg_speed.index, palette="viridis")
plt.title('Average Release Speed by Pitch Type')
plt.xlabel('Average Release Speed (mph)')
plt.ylabel('Pitch Type')
plt.show()

# 2. Trends in Release Speed Over Time
# First, convert the 'game_date' to datetime
df['game_date'] = pd.to_datetime(df['game_date'])
plt.figure(figsize=(14, 7))
sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', ci=None)
plt.title('Trends in Release Speed Over Time')
plt.xlabel('Game Date')
plt.ylabel('Average Release Speed (mph)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# 3. Scatter Plot of Release Speed vs. Events
plt.figure(figsize=(12, 6))
sns.scatterplot(data=df, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
plt.title('Release Speed vs. Events')
plt.xlabel('Release Speed (mph)')
plt.ylabel('Event Type')
plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

Explanation of the Code

Data Loading: The CSV file is loaded into a Pandas DataFrame.

Average Release Speed: A bar chart shows the average release speed for each pitch type.

Trends Over Time: A line plot illustrates the trend in release speed over time, which can indicate changes in performance or strategy.

Scatter Plot: A scatter plot visualizes the relationship between release speed and different events, providing insight into performance outcomes.

Conclusion

These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering data for specific players or seasons.
NLP feature set variables for TwiBot-20.
plos.figshare.com
xls
Updated Dec 23, 2024
Cite
Agata Skorupka (2024). NLP feature set variables for TwiBot-20. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0315849.t001
Dataset updated
Dec 23, 2024
Dataset provided by
PLOS (http://plos.org/)
Authors
Agata Skorupka
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors’ protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be both bots and real people). Methods are compared against each other and against state-of-the-art results from the literature, and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency. It should scale well and remain robust to advances in the newest technologies. The article utilized three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study proves that graph-based data is a better-performing predictor than text data. It compares different graph algorithms to extract feature sets for anomaly detection models. It states that methods based on nodes’ statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency. This often means reducing the time by hours or enabling modeling on significantly larger graphs (usually not feasible in the case of embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether there are types of graph embeddings and dimensions that perform significantly better than others.
The solution turned out to be dataset-specific and needed to be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard over a grid of embedding instances, each of which takes hours to generate). This, again, speaks in favor of the proposed algorithm based on nodes’ statistics. The research proposes its own efficient algorithm, which makes this engineering overhead redundant.
United States Cancer Statistics (USCS)
dataverse.harvard.edu
Updated May 4, 2011
Cite
Harvard Dataverse (2011). United States Cancer Statistics (USCS) [Dataset]. http://doi.org/10.7910/DVN/JBJVUW
Unique identifier
https://doi.org/10.7910/DVN/JBJVUW
Dataset updated
May 4, 2011
Dataset provided by
Harvard Dataverse
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
United States
Description
Users can download the data set and static graphs, tables and charts regarding cancers in the United States.

Background
The United States Cancer Statistics is a web-based report created by the Centers for Disease Control and Prevention, in partnership with the National Cancer Institute (NCI) and the North American Association of Central Cancer Registries (NAACCR). The site contains cancer incidence and cancer mortality data. Specific information includes: the top ten cancers, state vs. national comparisons, selected cancers, childhood cancer, cancers grouped by state/region, cancers grouped by race/ethnicity and brain cancers by tumor type.

User Functionality
Users can view static graphs, tables and charts, which can be downloaded. Within childhood cancer, users can view by year and by cancer type and age group or by cancer type and racial/ethnic group. Otherwise, users can view data by female, male or male and female. Users may also download the entire data sets directly.

Data Notes
The data sources for the cancer incidence data are the CDC's National Program of Cancer Registries (NPCR) and NCI's Surveillance, Epidemiology, and End Results (SEER) program. CDC's National Vital Statistics System (NVSS) collects the data on cancer mortality. Data is available for each year between 1999 and 2007 or for 2003-2007 combined. The site does not specify when new data becomes available.
96 wells fluorescence reading and R code statistic for analysis
zenodo.org
bin, csv, doc, pdf
Updated Aug 2, 2024
Cite
JVD Molino; JVD Molino (2024). 96 wells fluorescence reading and R code statistic for analysis [Dataset]. http://doi.org/10.5281/zenodo.1119285
Available download formats: doc, csv, pdf, bin
Unique identifier
https://doi.org/10.5281/zenodo.1119285
Dataset updated
Aug 2, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
JVD Molino; JVD Molino
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Overview

Data points present in this dataset were obtained through the following steps. To assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. We picked transformed colonies and cultured them in 400 μL TAP medium for 7 days in Deep-well plates (Corning Axygen®, No.: PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA), covered with Breathe-Easy® (Sigma-Aldrich®). Cultivation was performed on a rotary shaker, set to 150 rpm, under constant illumination (50 μmol photons/m²s). Then 100 μL samples were transferred to a clear bottom 96-well plate (Corning Costar, Tewksbury, MA, USA) and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland). Fluorescence was measured at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning Deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to the clear bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), followed by fluorescence measurement. To compare the constructs, R Statistic version 3.3.3 was used to perform one-way ANOVA (with Tukey's test), and to test statistical hypotheses, the significance level was set at 0.05. Graphs were generated in RStudio v1.0.136. The code is deposited herein.
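For readers unfamiliar with one-way ANOVA, the F statistic it relies on can be sketched as follows. This is an illustration in Python, not the deposited R scripts; the function name `one_way_anova_f` and the group values are made up, and the Tukey post-hoc test and the p-value lookup against the 0.05 level are not implemented here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                           # number of groups (constructs)
    n = sum(len(g) for g in groups)           # total number of observations
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical fluorescence readings for three constructs (illustrative only).
readings = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(readings))
```

A large F value indicates that the group means differ by more than the within-group scatter would suggest; deciding significance still requires comparing it against the F distribution with (k - 1, n - k) degrees of freedom.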

Info

ANOVA_Turkey_Sub.R -> code for ANOVA analysis in R statistic 3.3.3

barplot_R.R -> code to generate bar plot in R statistic 3.3.3

boxplotv2.R -> code to generate boxplot in R statistic 3.3.3

pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.

who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.

who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.

Anova_Output_Summary_Guide.pdf -> explains the content of the ANOVA files

ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.

Consider citing our work.

Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433
Appendix A. Supporting fertilization scheme, statistical results, graphs,...
datasetcatalog.nlm.nih.gov
wiley.figshare.com
Updated Aug 9, 2016
Cite
Pierik, Marleen; Bezemer, T. Martijn; van Ruijven, Jasper; Berendse, Frank; Geerts, Rob H. E. M. (2016). Appendix A. Supporting fertilization scheme, statistical results, graphs, and species information. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001597956
Dataset updated
Aug 9, 2016
Authors
Pierik, Marleen; Bezemer, T. Martijn; van Ruijven, Jasper; Berendse, Frank; Geerts, Rob H. E. M.
Description
Supporting fertilization scheme, statistical results, graphs, and species information.
Higgs Twitter Graphs (SNAP)
kaggle.com
zip
Updated Jan 2, 2022
Cite
Subhajit Sahu (2022). Higgs Twitter Graphs (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-higgs-twitter/versions/2
Available download formats: zip (125886451 bytes)
Dataset updated
Jan 2, 2022
Authors
Subhajit Sahu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Higgs Twitter Dataset

https://snap.stanford.edu/data/higgs-twitter.html

Dataset information

The Higgs dataset was built by monitoring the spreading processes on Twitter before, during and after the announcement of the discovery of a new particle with the features of the elusive Higgs boson on 4th July 2012. The messages posted on Twitter about this discovery between 1st and 7th July 2012 are considered.

The four directional networks made available here have been extracted from user activities in Twitter as:

1. re-tweeting (retweet network)
2. replying (reply network) to existing tweets
3. mentioning (mention network) other users
4. friends/followers social relationships among users involved in the above activities
5. information about activity on Twitter during the discovery of Higgs boson

It is worth remarking that the user IDs have been anonymized, and the same user ID is used for all networks. This choice allows the Higgs dataset to be used in studies of large-scale interdependent/interconnected multiplex/multilayer networks, where one layer accounts for the social structure and three layers encode different types of user dynamics.

For more information about data collection, please refer to our paper.

Dataset statistics are calculated for the graph with the highest number of nodes and edges:

Social Network statistics
Nodes 456,626
Edges 14,855,842
Nodes in largest WCC 456290 (0.999)
Edges in largest WCC 14855466 (1.000)
Nodes in largest SCC 360210 (0.789)
Edges in largest SCC 14102605 (0.949)
Average clustering coefficient 0.1887
Number of triangles 83023401
Fraction of closed triangles 0.002901
Diameter (longest shortest path) 9
90-percentile effective diameter 3.7
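
The parenthesized values next to the WCC rows are fractions of the whole graph. A minimal sketch of how such a fraction could be computed is shown below, using a union-find structure on a made-up toy graph; the function name `largest_wcc_fraction` is hypothetical, and this is not the tool SNAP used to produce the listed figures.

```python
def largest_wcc_fraction(num_nodes, edges):
    """Fraction of nodes in the largest weakly connected component.

    Edge direction is ignored, which is what 'weakly connected' means.
    Nodes are assumed to be labelled 0 .. num_nodes - 1.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    sizes = {}
    for node in range(num_nodes):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / num_nodes


# Toy graph: components {0, 1, 2} and {3, 4}.
print(largest_wcc_fraction(5, [(0, 1), (1, 2), (3, 4)]))

# The listed social-network fraction follows from the listed node counts.
print(round(456290 / 456626, 3))
```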

Retweet Network statistics
Nodes 256,491
Edges 328,132
Nodes in largest WCC 223833 (0.873)
Edges in largest WCC 308596 (0.940)
Nodes in largest SCC 984 (0.004)
Edges in largest SCC 3850 (0.012)
Average clustering coefficient 0.0156
Number of triangles 21172
Fraction of closed triangles 0.0001085
Diameter (longest shortest path) 19
90-percentile effective diameter 6.8

Reply Network statistics
Nodes 38,918
Edges 32,523
Nodes in largest WCC 12839 (0.330)
Edges in largest WCC 14944 (0....
Statistics information of datasets.
figshare.com
xls
Updated Oct 23, 2025
Cite
Zhen Xie; Wenzhe Hou; Feiyang Wu; Hao Xu (2025). Statistics information of datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0334724.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0334724.t001
Dataset updated
Oct 23, 2025
Dataset provided by
PLOS ONE
Authors
Zhen Xie; Wenzhe Hou; Feiyang Wu; Hao Xu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Graphs are a fundamental type of data structure, capable of representing complex relationships in diverse domains. For large-scale graph processing, stream graphs have become an efficient tool for processing dynamically evolving graph data. When processing stream graphs, subgraph counting is a key technique, which faces significant computational challenges due to its #P-complete nature. This work introduces StreamSC, a novel framework that efficiently estimates subgraph counts on stream graphs through two key innovations: (i) it is the first learning-based framework to address the subgraph counting problem on stream graphs; and (ii) it addresses the challenges arising from dynamic changes to the data graph caused by the insertion or deletion of edges. Experiments on 5 real-world graphs show the superiority of StreamSC in accuracy and efficiency.
Statistical performance indicators (SPI): Pillar 4 data sources score (scale...
macro-rankings.com
csv, excel
Updated Dec 31, 2015
Cite
macro-rankings (2015). Statistical performance indicators (SPI): Pillar 4 data sources score (scale 0-100) - Botswana [Dataset]. https://www.macro-rankings.com/botswana/statistical-performance-indicators-(spi)-pillar-4-data-sources-score-(scale-0-100)
Available download formats: csv, excel
Dataset updated
Dec 31, 2015
Dataset authored and provided by
macro-rankings
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Botswana
Description
Time series data for the statistic Statistical performance indicators (SPI): Pillar 4 data sources score (scale 0-100) and country Botswana.

Indicator Definition: The data sources overall score is a composite measure of whether countries have data available from the following sources: censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met.

The indicator "Statistical performance indicators (SPI): Pillar 4 data sources score (scale 0-100)" stands at 62.75 as of 12/31/2023. Regarding the one-year change of the series, the current value is equal to the value the year prior. The 1 year change in percent is 0.0. The 3 year change in percent is 17.97. The 5 year change in percent is 7.93. The series' long term average value is 58.13. Its latest available value, on 12/31/2023, is 7.95 percent higher, compared to its long term average value. The series' change in percent from its minimum value, on 12/31/2020, to its latest available value, on 12/31/2023, is +17.97%. The series' change in percent from its maximum value, on 12/31/2022, to its latest available value, on 12/31/2023, is 0.0%.
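The comparison with the long term average can be checked with a short calculation. The sketch below uses only the two values quoted in the description (62.75 and 58.13); the helper name `percent_change` is a hypothetical choice, and the usual relative-change formula is assumed since the listing does not state how the figure was derived.

```python
def percent_change(current, reference):
    """Relative change of `current` with respect to `reference`, in percent."""
    return (current - reference) / reference * 100


latest_value = 62.75        # value as of 12/31/2023, from the description
long_term_average = 58.13   # long term average, from the description

print(round(percent_change(latest_value, long_term_average), 2))  # → 7.95
```

The result agrees with the 7.95 percent stated in the description.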
AI4PROFHEALTH - Profession-health status co-occurrence graph statistics
zenodo.org
zip
Updated Nov 26, 2024
Cite
Rodríguez-Ortega, Miguel; Rodríguez-Ortega, Miguel (2024). AI4PROFHEALTH - Profession-health status co-occurrence graph statistics [Dataset]. http://doi.org/10.5281/zenodo.14223005
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14223005
Dataset updated
Nov 26, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rodríguez-Ortega, Miguel; Rodríguez-Ortega, Miguel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the Pointwise Mutual Information (PMI) values for co-occurrence pairs between different mention categories extracted from two distinct clinical datasets: MESINESP2 and the Clinical Case Reports Collection. PMI is a statistical measure used to assess the strength of association between pairs of entities by comparing their observed co-occurrence to the expected frequency under the assumption of independence.
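
As a rough illustration of the measure described above, PMI can be computed from raw counts as sketched below. The function name `pmi`, the base-2 logarithm, and the example counts are assumptions for illustration; the dataset description does not state which logarithm base or counting unit was used.

```python
import math


def pmi(pair_count, count_x, count_y, total):
    """Pointwise mutual information from co-occurrence counts.

    PMI = log2( p(x, y) / (p(x) * p(y)) ), where each probability is a
    count divided by `total`. The ratio simplifies to
    pair_count * total / (count_x * count_y).
    """
    return math.log2(pair_count * total / (count_x * count_y))


# Hypothetical counts: a profession seen in 20 documents, a clinical
# concept in 50, both together in 10, out of 1000 documents overall.
print(round(pmi(10, 20, 50, 1000), 4))

# Under independence the expected joint count is 20 * 50 / 1000 = 1,
# so observing exactly 1 co-occurrence gives a PMI of zero.
print(pmi(1, 20, 50, 1000))
```

Positive values indicate that the pair co-occurs more often than independence would predict, and negative values indicate less often.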

The datasets include PMI values for each co-occurrence pair, derived from the association of professions and clinical concepts, with the aim of identifying potential occupational health risks. By sharing these datasets, we aim to support further research into the relationships between professions and clinical entities, enabling the development of more accurate and targeted occupational health risk models.

There is a separate file for each corpus, and each dataset is provided in CSV format for easy access and analysis. These files include the PMI values for co-occurrence pairs extracted from the respective corpora, making them suitable for further data analysis.

Data Structure:

MESINESP2: mesinesp2_co-occurrence_pmi.zip

Clinical case reports: clinical_cases_co-occurrence_pmi.zip

The repository contains a .zip file for each corpus, each containing a .csv file with the co-occurrences between the detected professions and clinical entities. The file has the following column order:

span_mention_1: Mention string (original): profession

normalized_entity_1: Controlled vocabulary entry for this term

mention1_category: Semantic class (i.e., NER label)

mention1_freq: Absolute frequency of this mention entity 1

span_mention_2: Mention string (original): entity 2 (disease, symptom, species, etc.)

normalized_entity_2: Controlled vocabulary entry for this term

mention2_category: Semantic class (i.e., NER label)

mention2_freq: Absolute frequency of this mention entity 2

co-occurrence: Number of co-occurrences

PMID: PMID value
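A minimal sketch of how a file with these columns might be filtered, using only the Python standard library. The sample rows are invented, only three of the listed columns are used, and the comma delimiter and header row are assumptions about the CSV layout rather than facts stated in the description.

```python
import csv
import io

# Invented rows for illustration only; the real files contain more columns.
SAMPLE = """span_mention_1,span_mention_2,co-occurrence
nurse,back pain,12
nurse,asthma,3
miner,silicosis,7
"""


def frequent_pairs(handle, min_count):
    """Return (profession, entity, count) tuples with count >= min_count."""
    reader = csv.DictReader(handle)  # assumes a header row with these names
    return [
        (row["span_mention_1"], row["span_mention_2"], int(row["co-occurrence"]))
        for row in reader
        if int(row["co-occurrence"]) >= min_count
    ]


pairs = frequent_pairs(io.StringIO(SAMPLE), min_count=5)
```

To read an extracted file instead of the in-memory sample, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the .csv was unpacked.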

Notes

This resource has been funded by the Spanish National Proyectos I+D+i 2020 AI4ProfHealth project PID2020-119266RA-I00 (PID2020-119266RA-I0/AEI/10.13039/501100011033).

Contact

If you have any questions or suggestions, please contact us at:

- Miguel Rodríguez Ortega (

Additional resources and corpora

If you are interested, you might want to check out these corpora and resources:

MEDDOPROF (Corpus of mentions of professions, occupations and working status and normalization, different document collection with some overlapping documents)

MESINESP-2 (Corpus of manually indexed records with DeCS /MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)
Average Price: Gasoline, All Types (Cost per Gallon/3.785 Liters) in...
fred.stlouisfed.org
json
Updated Oct 24, 2025
Cite
(2025). Average Price: Gasoline, All Types (Cost per Gallon/3.785 Liters) in Washington-Arlington-Alexandria, DC-VA-MD-WV (CBSA) [Dataset]. https://fred.stlouisfed.org/series/APUS35A7471A
Explore at:
jsonAvailable download formats
Dataset updated
Oct 24, 2025
License
https://fred.stlouisfed.org/legal/#copyright-public-domain
Area covered
Maryland, Washington Metropolitan Area, Washington-Arlington-Alexandria, DC-VA-MD-WV, West Virginia
Description
Graph and download economic data for Average Price: Gasoline, All Types (Cost per Gallon/3.785 Liters) in Washington-Arlington-Alexandria, DC-VA-MD-WV (CBSA) (APUS35A7471A) from Jan 1978 to Sep 2025 about DC, Washington, WV, MD, energy, VA, gas, urban, retail, price, and USA.
Preferred variables (mean score 4.21) and chart types for dashboard...
plos.figshare.com
xls
Updated Sep 17, 2025
Cite
Ratchanont Thippimanporn; Wuttichai Khamna; Kannika Wiratchawa; Thanapong Intharah (2025). Preferred variables (mean score 4.21) and chart types for dashboard construction based on user assessment (n=10). [Dataset]. http://doi.org/10.1371/journal.pone.0332484.t005
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0332484.t005
Dataset updated
Sep 17, 2025
Dataset provided by
PLOShttp://plos.org/
Authors
Ratchanont Thippimanporn; Wuttichai Khamna; Kannika Wiratchawa; Thanapong Intharah
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Preferred variables (mean score 4.21) and chart types for dashboard construction based on user assessment (n=10).
Share of French people who have experienced discrimination 2016, by type and...
statista.com
Cite
Statista, Share of French people who have experienced discrimination 2016, by type and gender [Dataset]. https://www.statista.com/statistics/982298/people-discrimination-by-type-and-gender-france/
Explore at:
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Feb 18, 2016 - May 26, 2016
Area covered
France
Description
This graph shows the percentage of French people who experienced discrimination based on gender, age, origin, skin color, religion, health condition, disability, or pregnancy/maternity in France in 2016, distributed by gender and type of discrimination. More than 23 percent of responding women stated that they had been discriminated against because of their gender, compared to 5.5 percent of responding men.
Water usage statistics chart of Taiwan Water Corporation
data.gov.tw
csv
Updated Jun 1, 2025
Cite
Taiwan Water Corporation (2025). Water usage statistics chart of Taiwan Water Corporation [Dataset]. https://data.gov.tw/en/datasets/86112
Explore at:
csvAvailable download formats
Dataset updated
Jun 1, 2025
Dataset authored and provided by
Taiwan Water Corporationhttps://www.water.gov.tw/
License
https://data.gov.tw/license
Description
Provides the company's statistical table of water usage, segmented by type of use.
KG20C Scholarly Knowledge Graph
kaggle.com
zip
Updated Nov 21, 2025
Cite
T H N (2025). KG20C Scholarly Knowledge Graph [Dataset]. https://www.kaggle.com/tranhungnghiep/kg20c-scholarly-knowledge-graph
Explore at:
zip(1369962 bytes)Available download formats
Dataset updated
Nov 21, 2025
Authors
T H N
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Context

This knowledge graph is constructed to aid research in scholarly data analysis. It can serve as a standard benchmark dataset for several tasks, including knowledge graph embedding, link prediction, recommendation systems, and question answering about high quality papers from 20 top computer science conferences.

This has been introduced and used in the PhD thesis Multi-Relational Embedding for Knowledge Graph Representation and Analysis and TPDL'19 paper Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space.

Content

Construction protocol

Scholarly data

From the Microsoft Academic Graph dataset, we extracted high-quality computer science papers published in top conferences between 1990 and 2010. The list of top conferences is based on the CORE ranking A* conferences. The data was cleaned by removing conferences with fewer than 300 publications and papers with fewer than 20 citations. The final list includes 20 top conferences: AAAI, AAMAS, ACL, CHI, COLT, DCC, EC, FOCS, ICCV, ICDE, ICDM, ICML, ICSE, IJCAI, NIPS, SIGGRAPH, SIGIR, SIGMOD, UAI, and WWW.

Knowledge graph

The scholarly dataset was converted to a knowledge graph by defining the entities, the relations, and constructing the triples. The knowledge graph can be seen as a labeled multi-digraph between scholarly entities, where the edge labels express the relationships between the nodes. We use 5 intrinsic entity types: Paper, Author, Affiliation, Venue, and Domain. We also use 5 intrinsic relation types between the entities: author_in_affiliation, author_write_paper, paper_in_domain, paper_cite_paper, and paper_in_venue.

Benchmark data splitting

The knowledge graph was split uniformly at random into training, validation, and test sets. We made sure that all entities and relations in the validation and test sets also appear in the training set so that their embeddings can be learned. We also made sure that there is no data leakage and no redundant triples in these splits, so they constitute a challenging benchmark for link prediction, similar to WN18RR and FB15K-237.
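Two of the split properties described above (every entity and relation in the validation and test sets also occurs in training, and no triple is shared between splits) can be checked mechanically. The sketch below is a hypothetical helper, not the authors' procedure, and it does not cover other forms of leakage such as inverse relations. The toy triples are invented.

```python
def split_violations(train, valid, test):
    """Return (unseen_entities, unseen_relations, shared_triples).

    Each split is an iterable of (head, relation, tail) tuples.
    All three results are empty sets when the checked properties hold.
    """
    train_entities = {e for h, _, t in train for e in (h, t)}
    train_relations = {r for _, r, _ in train}

    unseen_entities, unseen_relations = set(), set()
    for h, r, t in list(valid) + list(test):
        unseen_entities.update(e for e in (h, t) if e not in train_entities)
        if r not in train_relations:
            unseen_relations.add(r)

    train_set, valid_set, test_set = set(train), set(valid), set(test)
    shared = (train_set & valid_set) | (train_set & test_set) | (valid_set & test_set)
    return unseen_entities, unseen_relations, shared


# Invented toy splits that deliberately break both properties.
train = [("a1", "author_write_paper", "p1"), ("p1", "paper_in_venue", "v1")]
valid = [("a1", "author_write_paper", "p1")]   # duplicate of a training triple
test = [("a2", "paper_cite_paper", "p1")]      # unseen entity and relation
unseen_e, unseen_r, shared = split_violations(train, valid, test)
```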

Data content

File format

All files are in tab-separated-values format, compatible with other popular benchmark datasets including WN18RR and FB15K-237. For example, train.txt includes "28674CFA author_in_affiliation 075CFC38", which denotes the author with id 28674CFA works in the affiliation with id 075CFC38. The repo includes these files:

- all_entity_info.txt contains id name type of all entities
- all_relation_info.txt contains id of all relations
- train.txt contains training triples of the form entity_1_id relation_id entity_2_id
- valid.txt contains validation triples
- test.txt contains test triples
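As a rough illustration of the triple format, the following sketch parses tab-separated lines into (head, relation, tail) tuples and counts triples per relation. The function name is hypothetical, and the only sample line is the one quoted in the description.

```python
from collections import Counter


def parse_triples(lines):
    """Parse tab-separated 'entity_1_id relation_id entity_2_id' lines."""
    triples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        triples.append((head, relation, tail))
    return triples


# The example triple quoted in the dataset description.
sample = ["28674CFA\tauthor_in_affiliation\t075CFC38\n"]
triples = parse_triples(sample)
relation_counts = Counter(relation for _, relation, _ in triples)
```

The same function accepts an open file handle, for example `parse_triples(open("train.txt", encoding="utf-8"))`, assuming the file has been downloaded to the working directory.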

Statistics

Data statistics of the KG20C knowledge graph:

Author  Paper  Conference  Domain  Affiliation
8,680   5,047  20          1,923   692

Entities  Relations  Training triples  Validation triples  Test triples
16,362    5          48,213            3,670               3,724

Acknowledgements

For the dataset and semantic query method, please cite: - Hung Nghiep Tran and Atsuhiro Takasu. Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space. In Proceedings of International Conference on Theory and Practice of Digital Libraries (TPDL), 2019.

For the MEI knowledge graph embedding model, please cite: - Hung Nghiep Tran and Atsuhiro Takasu. Multi-Partition Embedding Interaction with Block Term Format for Knowledge Graph Completion. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2020.

For the baseline results and extended semantic query method, please cite: - Hung Nghiep Tran. Multi-Relational Embedding for Knowledge Graph Representation and Analysis. PhD Dissertation, The Graduate University for Advanced Studies, SOKENDAI, Japan, 2020.

For the Microsoft Academic Graph dataset, please cite: - Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the International Conference on World Wide Web (WWW), 2015.

Inspiration

We include the baseline results for two tasks on ...
