67 datasets found
  1. Data from: Statistical Graphs in Mathematical Textbooks of Primary Education...

    • scielo.figshare.com
    • datasetcatalog.nlm.nih.gov
    jpeg
    Updated May 30, 2023
    Cite
    Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal (2023). Statistical Graphs in Mathematical Textbooks of Primary Education in Perú [Dataset]. http://doi.org/10.6084/m9.figshare.6857033.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Danilo Díaz-Levicoy; Miluska Osorio; Pedro Arteaga; Francisco Rodríguez-Alveal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: This paper presents an analysis of statistical graphs, in light of the curricular guidelines and their implementation, in eighteen primary education mathematics textbooks in Perú, comprising three complete series from different publishers. Through content analysis, we examined the sections where graphs appear, identifying the type of activity arising from each graph, the reading level demanded, and the semiotic complexity of the task. The textbooks are partially suited to the curricular guidelines regarding the presentation of graphs by educational level, and the numbers of activities proposed by the three publishers are similar. The main activities required in the textbooks are calculating and building. The study observed a predominance of bar graphs, a basic reading level, and the representation of a univariate data distribution in the graph.

  2. Data from: Statistical Graphs in Costa Rica Textbooks for Primary Education

    • scielo.figshare.com
    jpeg
    Updated Jun 3, 2023
    Cite
    Maynor Jiménez-Castro; Pedro Arteaga; Carmen Batanero (2023). Statistical Graphs in Costa Rica Textbooks for Primary Education [Dataset]. http://doi.org/10.6084/m9.figshare.12171666.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    SciELO journals
    Authors
    Maynor Jiménez-Castro; Pedro Arteaga; Carmen Batanero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Costa Rica
    Description

    Abstract: The aim of this work was to analyze the statistical graphs included in the two most frequently used series of textbooks in Costa Rican basic education. We analyze the type of graph, its semiotic complexity, and the data context, as well as the type of task, the reading level required to complete the task, and the purpose of the graph within the task. We observed a predominance of bar graphs, the third level of semiotic complexity (representing a distribution), the second reading level (reading between the data), work and school contexts, reading and computation tasks, and an analysis purpose. We describe the differences across grades and between the two publishers, as well as differences and similarities with the results of other textbook studies carried out in Spain and Chile.

  3. Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
    Explore at:
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
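
    The claim that different distributions can lead to the same bar graph can be illustrated with a short sketch. The two samples below are invented for illustration and are not taken from the review; the helper names bar_height and text_dot_plot are likewise hypothetical. Both samples give the same bar height (the mean), while a crude text version of a univariate scatterplot shows how different they are.

```python
from statistics import mean

# Two hypothetical small samples (invented for illustration only).
symmetric = [4.0, 5.0, 6.0, 5.0, 5.0]
with_outlier = [3.0, 3.0, 3.0, 3.0, 13.0]


def bar_height(sample):
    """Return the single number a bar graph would display: the sample mean."""
    return mean(sample)


def text_dot_plot(sample, lo=0, hi=14):
    """Render a one-line univariate plot over the integer axis lo..hi.

    Each position shows how many observations round to that value,
    or '.' when there are none.
    """
    counts = [0] * (hi - lo + 1)
    for x in sample:
        counts[round(x) - lo] += 1
    return "".join(str(c) if c else "." for c in counts)


for name, sample in (("symmetric", symmetric), ("with_outlier", with_outlier)):
    print(f"{name:>12}: bar = {bar_height(sample):.1f}  points = {text_dot_plot(sample)}")
```

    Both bars have the same height, yet the point-level view separates a tight cluster from a sample dominated by a single extreme value.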

  4. Intro to Data Types and Graphing Lab

    • qubeshub.org
    Updated Oct 12, 2020
    Cite
    Stephanie Spera (2020). Intro to Data Types and Graphing Lab [Dataset]. http://doi.org/10.25334/1XYA-TF48
    Explore at:
    Dataset updated
    Oct 12, 2020
    Dataset provided by
    QUBES
    Authors
    Stephanie Spera
    Description

    This is the third lab in an Introductory Physical Geography/Environmental Studies course. It introduces students to different data types (qualitative vs quantitative), basic statistical analyses (correlation analysis, t-test), and graphing techniques.

  5. Stack Exchange Graphs (SNAP)

    • kaggle.com
    zip
    Updated Dec 16, 2021
    Cite
    Subhajit Sahu (2021). Stack Exchange Graphs (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-sx
    Explore at:
    Available download formats: zip (1480133729 bytes)
    Dataset updated
    Dec 16, 2021
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Ask Ubuntu temporal network

    https://snap.stanford.edu/data/sx-askubuntu.html

    Dataset information

    This is a temporal network of interactions on the stack exchange web site
    Ask Ubuntu (http://askubuntu.com/). There are three different types of
    interactions represented by a directed edge (u, v, t):

    • user u answered user v's question at time t (in the graph sx-askubuntu-a2q)
    • user u commented on user v's question at time t (in the graph sx-askubuntu-c2q)
    • user u commented on user v's answer at time t (in the graph sx-askubuntu-c2a)

    The graph sx-askubuntu contains the union of these graphs. These graphs
    were constructed from the Stack Exchange Data Dump. Node ID numbers
    correspond to the 'OwnerUserId' tag in that data dump.

    Dataset statistics (sx-askubuntu)
    Nodes 159,316
    Temporal Edges 964,437
    Edges in static graph 596,933
    Time span 2613 days

    Dataset statistics (sx-askubuntu-a2q)
    Nodes 137,517
    Temporal Edges 280,102
    Edges in static graph 262,106
    Time span 2613 days

    Dataset statistics (sx-askubuntu-c2q)
    Nodes 79,155
    Temporal Edges 327,513
    Edges in static graph 198,852
    Time span 2047 days

    Dataset statistics (sx-askubuntu-c2a)
    Nodes 75,555
    Temporal Edges 356,822
    Edges in static graph 178,210
    Time span 2418 days

    Source (citation)
    Ashwin Paranjape, Austin R. Benson, and Jure Leskovec. "Motifs in Temporal Networks." In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017.

    Files
    File Description
    sx-askubuntu.txt.gz All interactions
    sx-askubuntu-a2q.txt.gz Answers to questions
    sx-askubuntu-c2q.txt.gz Comments to questions
    sx-askubuntu-c2a.txt.gz Comments to answers

    Data format

    SRC DST UNIXTS

    where edges are separated by a new line and

    SRC: id of the source node (a user)
    DST: id of the target node (a user)
    UNIXTS: Unix timestamp (seconds since the epoch)
                   ...
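
    The three-column layout described above can be read with a few lines of Python. This is a minimal sketch, not code shipped with the dataset: the names TemporalEdge, read_edges and static_edge_count are hypothetical, the sample rows are made up, and counting distinct directed (source, target) pairs as "edges in static graph" is an assumption about how that statistic is defined.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class TemporalEdge(NamedTuple):
    src: int     # id of the source node (a user)
    dst: int     # id of the target node (a user)
    unixts: int  # Unix timestamp, seconds since the epoch


def read_edges(lines: Iterable[str]) -> Iterator[TemporalEdge]:
    """Yield one TemporalEdge per whitespace-separated three-field line."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        yield TemporalEdge(int(parts[0]), int(parts[1]), int(parts[2]))


def static_edge_count(edges: Iterable[TemporalEdge]) -> int:
    """Count distinct directed (src, dst) pairs, ignoring timestamps."""
    return len({(e.src, e.dst) for e in edges})


# Tiny made-up sample in the same layout (not real data).
sample = io.StringIO("1 2 1000\n1 2 1005\n3 2 1010\n")
edges = list(read_edges(sample))
print(len(edges), "temporal edges,", static_edge_count(edges), "static edges")
```

    For one of the listed files, a handle opened with gzip.open("sx-askubuntu.txt.gz", "rt") could be passed to read_edges in place of the in-memory sample.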
    
  6. Further education and skills - Underlying Charts Data

    • explore-education-statistics.service.gov.uk
    Updated Nov 28, 2024
    + more versions
    Cite
    Department for Education (2024). Further education and skills - Underlying Charts Data [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/c0579bf7-96fd-4771-9034-e8642b529114
    Explore at:
    Dataset updated
    Nov 28, 2024
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Historical time series of headline adult (19+) further education and skills learner participation, containing breakdowns by provision type and in some cases level. Also includes some all age apprenticeship participation figures. Academic years: 2005/06 to 2023/24 full academic years. Indicators: Participation. Filter: Provision type, Age group, Level.

  7. Compare Baseball Player Statistics using Visualiza

    • kaggle.com
    zip
    Updated Sep 28, 2024
    Cite
    Abdelaziz Sami (2024). Compare Baseball Player Statistics using Visualiza [Dataset]. https://www.kaggle.com/datasets/abdelazizsami/compare-baseball-player-statistics-using-visualiza
    Explore at:
    Available download formats: zip (1030978 bytes)
    Dataset updated
    Sep 28, 2024
    Authors
    Abdelaziz Sami
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    To compare baseball player statistics effectively using visualization, we can create some insightful plots. Below are the steps to accomplish this in Python using libraries like Pandas and Matplotlib or Seaborn.

    1. Load the Data

    First, we need to load the judge.csv file into a DataFrame. This will allow us to manipulate and analyze the data easily.

    2. Explore the Data

    Before creating visualizations, it’s good to understand the data structure and identify the columns we want to compare. The relevant columns in your data include pitch_type, release_speed, game_date, and events.

    3. Visualization

    We can create various visualizations, such as:
    • A bar chart to compare the average release speed of different pitch types.
    • A line plot to visualize trends over time based on game dates.
    • A scatter plot to analyze the relationship between release speed and the outcome of the pitches (e.g., strikeouts, home runs).

    Example Code

    Here is a sample code to demonstrate how to create these visualizations using Matplotlib and Seaborn:

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Load the data
    df = pd.read_csv('judge.csv')
    
    # Display the first few rows of the dataframe
    print(df.head())
    
    # Set the style of seaborn
    sns.set(style="whitegrid")
    
    # 1. Average Release Speed by Pitch Type
    plt.figure(figsize=(12, 6))
    avg_speed = df.groupby('pitch_type')['release_speed'].mean().sort_values()
    sns.barplot(x=avg_speed.values, y=avg_speed.index, palette="viridis")
    plt.title('Average Release Speed by Pitch Type')
    plt.xlabel('Average Release Speed (mph)')
    plt.ylabel('Pitch Type')
    plt.show()
    
    # 2. Trends in Release Speed Over Time
    # First, convert the 'game_date' to datetime
    df['game_date'] = pd.to_datetime(df['game_date'])
    
    plt.figure(figsize=(14, 7))
    # errorbar=None replaces the deprecated ci=None (assumes seaborn >= 0.12)
    sns.lineplot(data=df, x='game_date', y='release_speed', estimator='mean', errorbar=None)
    plt.title('Trends in Release Speed Over Time')
    plt.xlabel('Game Date')
    plt.ylabel('Average Release Speed (mph)')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
    
    # 3. Scatter Plot of Release Speed vs. Events
    plt.figure(figsize=(12, 6))
    sns.scatterplot(data=df, x='release_speed', y='events', hue='pitch_type', alpha=0.7)
    plt.title('Release Speed vs. Events')
    plt.xlabel('Release Speed (mph)')
    plt.ylabel('Event Type')
    plt.legend(title='Pitch Type', bbox_to_anchor=(1.05, 1), loc='upper left')
    plt.show()
    

    Explanation of the Code

    • Data Loading: The CSV file is loaded into a Pandas DataFrame.
    • Average Release Speed: A bar chart shows the average release speed for each pitch type.
    • Trends Over Time: A line plot illustrates the trend in release speed over time, which can indicate changes in performance or strategy.
    • Scatter Plot: A scatter plot visualizes the relationship between release speed and different events, providing insight into performance outcomes.

    Conclusion

    These visualizations will help you compare player statistics in a meaningful way. You can customize the plots further based on your specific needs, such as filtering data for specific players or seasons. If you have any specific comparisons in mind or additional data to visualize, let me know!

  8. NLP feature set variables for TwiBot-20.

    • plos.figshare.com
    xls
    Updated Dec 23, 2024
    Cite
    Agata Skorupka (2024). NLP feature set variables for TwiBot-20. [Dataset]. http://doi.org/10.1371/journal.pone.0315849.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Agata Skorupka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The study examines different graph-based methods of detecting anomalous activities on digital markets, proposing the most efficient way to increase market actors’ protection and reduce information asymmetry. Anomalies are defined below as both bots and fraudulent users (who can be both bots and real people). Methods are compared against each other, and state-of-the-art results from the literature and a new algorithm is proposed. The goal is to find an efficient method suitable for threat detection, both in terms of predictive performance and computational efficiency. It should scale well and remain robust on the advancements of the newest technologies. The article utilized three publicly accessible graph-based datasets: one describing the Twitter social network (TwiBot-20) and two describing Bitcoin cryptocurrency markets (Bitcoin OTC and Bitcoin Alpha). In the former, an anomaly is defined as a bot, as opposed to a human user, whereas in the latter, an anomaly is a user who conducted a fraudulent transaction, which may (but does not have to) imply being a bot. The study proves that graph-based data is a better-performing predictor than text data. It compares different graph algorithms to extract feature sets for anomaly detection models. It states that methods based on nodes’ statistics result in better model performance than state-of-the-art graph embeddings. They also yield a significant improvement in computational efficiency. This often means reducing the time by hours or enabling modeling on significantly larger graphs (usually not feasible in the case of embeddings). On that basis, the article proposes its own graph-based statistics algorithm. Furthermore, using embeddings requires two engineering choices: the type of embedding and its dimension. The research examines whether there are types of graph embeddings and dimensions that perform significantly better than others. 
    The solution turned out to be dataset-specific and needed to be tailored on a case-by-case basis, adding even more engineering overhead to using embeddings (building a leaderboard over a grid of embedding instances, each of which takes hours to generate). This, again, speaks in favor of the proposed algorithm based on nodes’ statistics. The research proposes its own efficient algorithm, which makes this engineering overhead redundant.

  9. United States Cancer Statistics (USCS)

    • dataverse.harvard.edu
    Updated May 4, 2011
    Cite
    Harvard Dataverse (2011). United States Cancer Statistics (USCS) [Dataset]. http://doi.org/10.7910/DVN/JBJVUW
    Explore at:
    Dataset updated
    May 4, 2011
    Dataset provided by
    Harvard Dataverse
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Users can download the data set and static graphs, tables and charts regarding cancers in the United States. Background: The United States Cancer Statistics is a web-based report created by the Centers for Disease Control and Prevention, in partnership with the National Cancer Institute (NCI) and the North American Association of Central Cancer Registries (NAACCR). The site contains cancer incidence and cancer mortality data. Specific information includes: the top ten cancers, state vs. national comparisons, selected cancers, childhood cancer, cancers grouped by state/region, cancers grouped by race/ethnicity and brain cancers by tumor type. User Functionality: Users can view static graphs, tables and charts, which can be downloaded. Within childhood cancer, users can view by year and by cancer type and age group or by cancer type and racial/ethnic group. Otherwise, users can view data by female, male or male and female. Users may also download the entire data sets directly. Data Notes: The data sources for the cancer incidence data are the CDC's National Program of Cancer Registries (NPCR) and NCI's Surveillance, Epidemiology and End Results (SEER). CDC's National Vital Statistics System (NVSS) collects the data on cancer mortality. Data is available for each year between 1999 and 2007 or for 2003-2007 combined. The site does not specify when new data becomes available.

  10. 96 wells fluorescence reading and R code statistic for analysis

    • zenodo.org
    bin, csv, doc, pdf
    Updated Aug 2, 2024
    Cite
    JVD Molino; JVD Molino (2024). 96 wells fluorescence reading and R code statistic for analysis [Dataset]. http://doi.org/10.5281/zenodo.1119285
    Explore at:
    Available download formats: doc, csv, pdf, bin
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    JVD Molino; JVD Molino
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    Data points present in this dataset were obtained through the following steps. To assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. We picked transformed colonies and cultured them in 400 μL TAP medium for 7 days in Deep-well plates (Corning Axygen®, No.: PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA), covered with Breathe-Easy® (Sigma-Aldrich®). Cultivation was performed on a rotary shaker, set to 150 rpm, under constant illumination (50 μmol photons/m2s). Then 100 μL samples were transferred to a clear bottom 96-well plate (Corning Costar, Tewksbury, MA, USA) and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland). Fluorescence was measured at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning Deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to the clear bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), followed by fluorescence measurement. To compare the constructs, R Statistic version 3.3.3 was used to perform one-way ANOVA (with Tukey's test), and to test statistical hypotheses, the significance level was set at 0.05. Graphs were generated in RStudio v1.0.136. The code is deposited herein.
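
    The one-way ANOVA mentioned above compares variation between group means with variation inside the groups. The sketch below computes only the F statistic in plain Python to show that arithmetic. It is not the deposited R code, it omits the p-value and Tukey's test, and the function name one_way_anova_f and the three sample groups are invented for illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical fluorescence readings for three constructs (made-up numbers).
construct_a = [1.0, 2.0, 3.0]
construct_b = [2.0, 3.0, 4.0]
construct_c = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f([construct_a, construct_b, construct_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

    A large F relative to the F distribution with those degrees of freedom is what leads to rejecting equal group means at the chosen significance level (0.05 in the description above).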

    Info

    ANOVA_Turkey_Sub.R -> code for ANOVA analysis in R statistic 3.3.3

    barplot_R.R -> code to generate bar plot in R statistic 3.3.3

    boxplotv2.R -> code to generate boxplot in R statistic 3.3.3

    pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.

    who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.

    who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.

    Anova_Output_Summary_Guide.pdf -> Explain the ANOVA files content

    ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.

    Consider citing our work.

    Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433

  11. Appendix A. Supporting fertilization scheme, statistical results, graphs,...

    • datasetcatalog.nlm.nih.gov
    • wiley.figshare.com
    Updated Aug 9, 2016
    + more versions
    Cite
    Pierik, Marleen; Bezemer, T. Martijn; van Ruijven, Jasper; Berendse, Frank; Geerts, Rob H. E. M. (2016). Appendix A. Supporting fertilization scheme, statistical results, graphs, and species information. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001597956
    Explore at:
    Dataset updated
    Aug 9, 2016
    Authors
    Pierik, Marleen; Bezemer, T. Martijn; van Ruijven, Jasper; Berendse, Frank; Geerts, Rob H. E. M.
    Description

    Supporting fertilization scheme, statistical results, graphs, and species information.

  12. Higgs Twitter Graphs (SNAP)

    • kaggle.com
    zip
    Updated Jan 2, 2022
    Cite
    Subhajit Sahu (2022). Higgs Twitter Graphs (SNAP) [Dataset]. https://www.kaggle.com/datasets/wolfram77/graphs-snap-higgs-twitter/versions/2
    Explore at:
    Available download formats: zip (125886451 bytes)
    Dataset updated
    Jan 2, 2022
    Authors
    Subhajit Sahu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Higgs Twitter Dataset

    https://snap.stanford.edu/data/higgs-twitter.html

    Dataset information

    The Higgs dataset has been built after monitoring the spreading processes
    on Twitter before, during and after the announcement of the discovery of a new particle with the features of the elusive Higgs boson on 4th July 2012. The messages posted in Twitter about this discovery between 1st and 7th
    July 2012 are considered.

    The four directional networks made available here have been extracted from user activities in Twitter as:

    1. re-tweeting (retweet network)                    
    2. replying (reply network) to existing tweets             
    3. mentioning (mention network) other users              
    4. friends/followers social relationships among user involved     
      in the above activities                       
    5. information about activity on Twitter during the discovery of    
      Higgs boson                             
    

    It is worth remarking that the user IDs have been anonymized, and the same user ID is used for all networks. This choice allows the Higgs
    dataset to be used in studies about large-scale interdependent/interconnected
    multiplex/multilayer networks, where one layer accounts for the social
    structure and three layers encode different types of user dynamics.

    For more information about data collection, please refer to our paper.

    Dataset statistics are calculated for the graph with the highest number of nodes and edges:

    Social Network statistics
    Nodes 456,626
    Edges 14,855,842
    Nodes in largest WCC 456290 (0.999)
    Edges in largest WCC 14855466 (1.000)
    Nodes in largest SCC 360210 (0.789)
    Edges in largest SCC 14102605 (0.949)
    Average clustering coefficient 0.1887
    Number of triangles 83023401
    Fraction of closed triangles 0.002901
    Diameter (longest shortest path) 9
    90-percentile effective diameter 3.7

    Retweet Network statistics
    Nodes 256,491
    Edges 328,132
    Nodes in largest WCC 223833 (0.873)
    Edges in largest WCC 308596 (0.940)
    Nodes in largest SCC 984 (0.004)
    Edges in largest SCC 3850 (0.012)
    Average clustering coefficient 0.0156
    Number of triangles 21172
    Fraction of closed triangles 0.0001085
    Diameter (longest shortest path) 19
    90-percentile effective diameter 6.8

    Reply Network statistics
    Nodes 38,918
    Edges 32,523
    Nodes in largest WCC 12839 (0.330)
    Edges in largest WCC 14944 (0....
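
    The "Nodes in largest WCC" figures above report the size of the largest weakly connected component and, in parentheses, its share of all nodes. The sketch below shows one common way to compute such a figure with a union-find structure on a tiny made-up graph. The function name largest_wcc_size is hypothetical and this is not necessarily how SNAP derived its statistics.

```python
from collections import Counter


def largest_wcc_size(nodes, edges):
    """Size of the largest weakly connected component (edge direction ignored)."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the root, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    sizes = Counter(find(v) for v in nodes)
    return max(sizes.values())


# Tiny made-up directed graph: {1, 2, 3} and {4, 5} form two weak components.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (3, 2), (4, 5)]
size = largest_wcc_size(nodes, edges)
fraction = size / len(nodes)
print(size, round(fraction, 3))

# The same ratio applied to the social network figures listed above.
reported = round(456290 / 456626, 3)
print(reported)
```

    The last line reproduces the 0.999 shown next to the social network's largest WCC from the two node counts given in the listing.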

  13. Statistics information of datasets.

    • figshare.com
    xls
    Updated Oct 23, 2025
    + more versions
    Cite
    Zhen Xie; Wenzhe Hou; Feiyang Wu; Hao Xu (2025). Statistics information of datasets. [Dataset]. http://doi.org/10.1371/journal.pone.0334724.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Oct 23, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Zhen Xie; Wenzhe Hou; Feiyang Wu; Hao Xu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Graphs are a representative type of fundamental data structure, capable of representing complex association relationships in diverse domains. For large-scale graph processing, stream graphs have become efficient tools for processing dynamically evolving graph data. When processing stream graphs, subgraph counting is a key technique, which faces significant computational challenges due to its #P-complete nature. This work introduces StreamSC, a novel framework that efficiently estimates subgraph counting results on stream graphs through two key innovations: (i) it is the first learning-based framework to address the subgraph counting problem on stream graphs; and (ii) it addresses the challenges arising from dynamic changes of the data graph caused by the insertion or deletion of edges. Experiments on 5 real-world graphs show the superiority of StreamSC in accuracy and efficiency.

  14. Statistical performance indicators (SPI): Pillar 4 data sources score (scale...

    • macro-rankings.com
    csv, excel
    Updated Dec 31, 2015
    Cite
    macro-rankings (2015). Statistical performance indicators (SPI): Pillar 4 data sources score (scale 0-100) - Botswana [Dataset]. https://www.macro-rankings.com/botswana/statistical-performance-indicators-(spi)-pillar-4-data-sources-score-(scale-0-100)
    Explore at:
    Available download formats: csv, excel
    Dataset updated
    Dec 31, 2015
    Dataset authored and provided by
    macro-rankings
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Botswana
    Description

    Time series data for the statistic Statistical performance indicators (SPI): Pillar 4 data sources score (scale 0-100) and country Botswana. Indicator Definition: The data sources overall score is a composite measure of whether countries have data available from the following sources: censuses and surveys, administrative data, geospatial data, and private sector/citizen generated data. The data sources (input) pillar is segmented by four types of sources: those generated by (i) the statistical office (censuses and surveys), and sources accessed from elsewhere such as (ii) administrative data, (iii) geospatial data, and (iv) private sector data and citizen generated data. The appropriate balance between these source types will vary depending on a country’s institutional setting and the maturity of its statistical system. High scores should reflect the extent to which the sources being utilized enable the necessary statistical indicators to be generated. For example, a low score on environment statistics (in the data production pillar) may reflect a lack of use of (and low score for) geospatial data (in the data sources pillar). This type of linkage is inherent in the data cycle approach and can help highlight areas for investment required if country needs are to be met. The indicator "Statistical performance indicators (SPI): Pillar 4 data sources score (scale 0-100)" stands at 62.75 as of 12/31/2023. Regarding the one-year change of the series, the current value is equal to the value the year prior. The 1 year change in percent is 0.0. The 3 year change in percent is 17.97. The 5 year change in percent is 7.93. The series' long term average value is 58.13. Its latest available value, on 12/31/2023, is 7.95 percent higher compared to its long term average value. The series' change in percent from its minimum value, on 12/31/2020, to its latest available value, on 12/31/2023, is +17.97%. The series' change in percent from its maximum value, on 12/31/2022, to its latest available value, on 12/31/2023, is 0.0%.
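
    The percentage figures quoted above follow from the usual relative-change formula. The sketch below reproduces two of them from the values stated in the description. The function name percent_change is hypothetical, and rounding to two decimals is an assumption about how the site presents its numbers.

```python
def percent_change(current: float, reference: float) -> float:
    """Relative change of current with respect to reference, in percent."""
    return (current - reference) / reference * 100.0


latest = 62.75             # value as of 12/31/2023, from the description
long_term_average = 58.13  # long term average, from the description

# Latest value compared with the long term average.
vs_average = round(percent_change(latest, long_term_average), 2)

# One-year change: the description says the prior-year value equals the latest.
one_year = round(percent_change(latest, latest), 2)

print(vs_average, one_year)
```

    The 3 year and 5 year changes cannot be reproduced this way because the description does not state the earlier values themselves.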

  15. AI4PROFHEALTH - Profession-health status co-occurrence graph statistics

    • zenodo.org
    zip
    Updated Nov 26, 2024
    Cite
    Rodríguez-Ortega, Miguel; Rodríguez-Ortega, Miguel (2024). AI4PROFHEALTH - Profession-health status co-occurrence graph statistics [Dataset]. http://doi.org/10.5281/zenodo.14223005
    Explore at:
    Available download formats: zip
    Dataset updated
    Nov 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rodríguez-Ortega, Miguel; Rodríguez-Ortega, Miguel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the Pointwise Mutual Information (PMI) values for co-occurrence pairs between different mention categories extracted from two distinct clinical datasets: MESINESP2 and the Clinical Case Reports Collection. PMI is a statistical measure used to assess the strength of association between pairs of entities by comparing their observed co-occurrence to the expected frequency under the assumption of independence.
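For reference, the standard definition is PMI(x, y) = log( p(x, y) / (p(x) p(y)) ). The sketch below computes it from raw counts. It assumes probabilities are estimated as relative frequencies over a total number of observations and uses a base-2 logarithm; how the dataset authors actually estimated the probabilities is not stated here, and the function and parameter names are illustrative.

```python
import math


def pmi(pair_count: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) from raw counts.

    Assumes p(x, y) = pair_count / total, p(x) = count_x / total and
    p(y) = count_y / total. The ratio p(x, y) / (p(x) * p(y)) then
    simplifies to pair_count * total / (count_x * count_y).
    """
    if min(pair_count, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    ratio = (pair_count * total) / (count_x * count_y)
    return math.log2(ratio)


# Toy counts, invented for illustration only.
independent = pmi(pair_count=1, count_x=10, count_y=100, total=1000)
associated = pmi(pair_count=8, count_x=10, count_y=100, total=1000)
print(independent, associated)
```

A value near zero means the pair co-occurs about as often as independence would predict; positive values indicate the pair co-occurs more often than expected.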

    The datasets include PMI values for each co-occurrence pair, derived from the association of professions and clinical concepts, with the aim of identifying potential occupational health risks. By sharing these datasets, we aim to support further research into the relationships between professions and clinical entities, enabling the development of more accurate and targeted occupational health risk models.

    There is a separate file for each corpus, and each dataset is provided in CSV format for easy access and analysis. These files include the PMI values for co-occurrence pairs extracted from the respective corpora, making them suitable for further data analysis.

    Data Structure:

    • MESINESP2: mesinesp2_co-occurrence_pmi.zip
    • Clinical case reports: clinical_cases_co-occurrence_pmi.zip

    The repository contains a .zip file for each corpus, each containing a .csv file with the co-occurrences between the detected professions and clinical entities. The file has the following columns, in this order:

    • span_mention_1: Mention string (original): profession
    • normalized_entity_1: Controlled vocabulary entry for this term
    • mention1_category: Semantic class (i.e., NER label)
    • mention1_freq: Absolute frequency of this mention entity 1
    • span_mention_2: Mention string (original): entity 2 (disease, symptom, species, etc.)
    • normalized_entity_2: Controlled vocabulary entry for this term
    • mention2_category: Semantic class (i.e., NER label)
    • mention2_freq: Absolute frequency of this mention entity 2
    • co-occurrence: Number of co-occurrences
    • PMID: PMID value
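The column order above can be read positionally, which avoids depending on the exact header spellings. This is a minimal sketch, assuming a comma delimiter, a single header row, and integer frequency and co-occurrence counts; the class name, the field name `entity2_freq`, and the sample rows are all invented for illustration and are not taken from the actual files.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class CooccurrenceRow(NamedTuple):
    span_mention_1: str
    normalized_entity_1: str
    mention1_category: str
    mention1_freq: int
    span_mention_2: str
    normalized_entity_2: str
    mention2_category: str
    entity2_freq: int   # hypothetical field name for the eighth column
    cooccurrence: int
    last_column: str    # tenth column, kept as raw text


def read_rows(handle: TextIO, delimiter: str = ",") -> Iterator[CooccurrenceRow]:
    """Yield one record per data line, reading the ten columns by position."""
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row (assumed to be present)
    for rec in reader:
        if len(rec) != 10:
            continue  # skip malformed lines instead of guessing
        yield CooccurrenceRow(
            rec[0], rec[1], rec[2], int(rec[3]),
            rec[4], rec[5], rec[6], int(rec[7]),
            int(rec[8]), rec[9],
        )


# Invented sample content; the header text is ignored by read_rows.
SAMPLE = (
    "h1,h2,h3,h4,h5,h6,h7,h8,h9,h10\n"
    "nurse,nurse-id,PROFESSION,120,back pain,pain-id,SYMPTOM,300,12,1.5\n"
    "farmer,farmer-id,PROFESSION,40,asthma,asthma-id,DISEASE,90,3,0.7\n"
)

rows = list(read_rows(io.StringIO(SAMPLE)))
frequent = [r for r in rows if r.cooccurrence >= 10]
print(len(rows), len(frequent))
```

For a real file, the `io.StringIO` object would be replaced by an open file handle once the archive has been extracted.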

    Notes

    This resource has been funded by the Spanish National Proyectos I+D+i 2020 AI4ProfHealth project PID2020-119266RA-I00 (PID2020-119266RA-I0/AEI/10.13039/501100011033).

    Contact

    If you have any questions or suggestions, please contact us at:

    - Miguel Rodríguez Ortega (

    Additional resources and corpora

    If you are interested, you might want to check out these corpora and resources:

    • MEDDOPROF (Corpus of mentions of professions, occupations and working status and normalization, different document collection with some overlapping documents)
    • MESINESP-2 (Corpus of manually indexed records with DeCS /MeSH terms comprising scientific literature abstracts, clinical trials, and patent abstracts, different document collection)

  16. Average Price: Gasoline, All Types (Cost per Gallon/3.785 Liters) in...

    • fred.stlouisfed.org
    json
    Updated Oct 24, 2025
    + more versions
    Cite
    (2025). Average Price: Gasoline, All Types (Cost per Gallon/3.785 Liters) in Washington-Arlington-Alexandria, DC-VA-MD-WV (CBSA) [Dataset]. https://fred.stlouisfed.org/series/APUS35A7471A
    Explore at:
    Available download formats: json
    Dataset updated
    Oct 24, 2025
    License

    https://fred.stlouisfed.org/legal/#copyright-public-domain

    Area covered
    Maryland, Washington Metropolitan Area, Washington-Arlington-Alexandria, DC-VA-MD-WV, West Virginia
    Description

    Graph and download economic data for Average Price: Gasoline, All Types (Cost per Gallon/3.785 Liters) in Washington-Arlington-Alexandria, DC-VA-MD-WV (CBSA) (APUS35A7471A) from Jan 1978 to Sep 2025 about DC, Washington, WV, MD, energy, VA, gas, urban, retail, price, and USA.

  17. Preferred variables (mean score 4.21) and chart types for dashboard...

    • plos.figshare.com
    xls
    Updated Sep 17, 2025
    Cite
    Ratchanont Thippimanporn; Wuttichai Khamna; Kannika Wiratchawa; Thanapong Intharah (2025). Preferred variables (mean score 4.21) and chart types for dashboard construction based on user assessment (n=10). [Dataset]. http://doi.org/10.1371/journal.pone.0332484.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 17, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Ratchanont Thippimanporn; Wuttichai Khamna; Kannika Wiratchawa; Thanapong Intharah
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Preferred variables (mean score 4.21) and chart types for dashboard construction based on user assessment (n=10).

  18. Share of French people who have experienced discrimination 2016, by type and...

    • statista.com
    Cite
    Statista, Share of French people who have experienced discrimination 2016, by type and gender [Dataset]. https://www.statista.com/statistics/982298/people-discrimination-by-type-and-gender-france/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 18, 2016 - May 26, 2016
    Area covered
    France
    Description

    This graph shows the percentage of French people who experienced discrimination based on gender, age, origin, skin color, religion, health condition, disability, or pregnancy/maternity in France in 2016, broken down by gender and type of discrimination. More than 23 percent of responding women stated that they had been discriminated against because of their gender, compared to 5.5 percent of responding men.

  19. Water usage statistics chart of Taiwan Water Corporation

    • data.gov.tw
    csv
    Updated Jun 1, 2025
    Cite
    Taiwan Water Corporation (2025). Water usage statistics chart of Taiwan Water Corporation [Dataset]. https://data.gov.tw/en/datasets/86112
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 1, 2025
    Dataset authored and provided by
    Taiwan Water Corporation (https://www.water.gov.tw/)
    License

    https://data.gov.tw/license

    Description

    Provides the company's statistical table of water usage, segmented by type of use.

  20. KG20C Scholarly Knowledge Graph

    • kaggle.com
    zip
    Updated Nov 21, 2025
    Cite
    T H N (2025). KG20C Scholarly Knowledge Graph [Dataset]. https://www.kaggle.com/tranhungnghiep/kg20c-scholarly-knowledge-graph
    Explore at:
    Available download formats: zip (1369962 bytes)
    Dataset updated
    Nov 21, 2025
    Authors
    T H N
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Context

    This knowledge graph is constructed to aid research in scholarly data analysis. It can serve as a standard benchmark dataset for several tasks, including knowledge graph embedding, link prediction, recommendation systems, and question answering about high quality papers from 20 top computer science conferences.

    This has been introduced and used in the PhD thesis Multi-Relational Embedding for Knowledge Graph Representation and Analysis and TPDL'19 paper Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space.

    Content

    Construction protocol

    Scholarly data

    From the Microsoft Academic Graph dataset, we extracted high quality computer science papers published in top conferences between 1990 and 2010. The top conference list is based on the CORE ranking A* conferences. The data was cleaned by removing conferences with fewer than 300 publications and papers with fewer than 20 citations. The final list includes 20 top conferences: AAAI, AAMAS, ACL, CHI, COLT, DCC, EC, FOCS, ICCV, ICDE, ICDM, ICML, ICSE, IJCAI, NIPS, SIGGRAPH, SIGIR, SIGMOD, UAI, and WWW.

    Knowledge graph

    The scholarly dataset was converted to a knowledge graph by defining the entities and the relations, and constructing the triples. The knowledge graph can be seen as a labeled multi-digraph between scholarly entities, where the edge labels express the relationships between the nodes. We use 5 intrinsic entity types: Paper, Author, Affiliation, Venue, and Domain. We also use 5 intrinsic relation types between the entities: author_in_affiliation, author_write_paper, paper_in_domain, paper_cite_paper, and paper_in_venue.

    Benchmark data splitting

    The knowledge graph was split uniformly at random into training, validation, and test sets. We made sure that all entities and relations in the validation and test sets also appear in the training set so that their embeddings can be learned. We also made sure that there is no data leakage and no redundant triples in these splits, so they constitute a challenging benchmark for link prediction, similar to WN18RR and FB15K-237.
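The coverage property described above (every entity and relation in a held-out set also appears in training) can be verified mechanically. The following is a minimal sketch with invented toy triples. The function names are illustrative, and the exact-duplicate test is only one narrow reading of "data leakage"; it is not the authors' construction procedure.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (entity_1_id, relation_id, entity_2_id)


def vocabulary(triples: Iterable[Triple]) -> Tuple[Set[str], Set[str]]:
    """Collect the entity ids and relation ids used by a set of triples."""
    entities: Set[str] = set()
    relations: Set[str] = set()
    for head, relation, tail in triples:
        entities.update((head, tail))
        relations.add(relation)
    return entities, relations


def check_split(train: List[Triple], held_out: List[Triple]) -> Dict[str, set]:
    """Report held-out entities/relations unseen in training and shared triples."""
    train_entities, train_relations = vocabulary(train)
    held_entities, held_relations = vocabulary(held_out)
    return {
        "unseen_entities": held_entities - train_entities,
        "unseen_relations": held_relations - train_relations,
        "duplicated_triples": set(train) & set(held_out),
    }


# Toy triples, invented for illustration only.
train = [
    ("A1", "author_write_paper", "P1"),
    ("A1", "author_in_affiliation", "F1"),
    ("A2", "author_write_paper", "P1"),
    ("P1", "paper_in_venue", "V1"),
]
valid = [("A2", "author_in_affiliation", "F1")]

report = check_split(train, valid)
print(report)
```

A split satisfies the stated property when all three sets in the report are empty.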

    Data content

    File format

    All files are in tab-separated-values format, compatible with other popular benchmark datasets including WN18RR and FB15K-237. For example, train.txt includes "28674CFA author_in_affiliation 075CFC38", which denotes that the author with id 28674CFA works in the affiliation with id 075CFC38. The repo includes these files:

    • all_entity_info.txt contains id name type of all entities
    • all_relation_info.txt contains id of all relations
    • train.txt contains training triples of the form entity_1_id relation_id entity_2_id
    • valid.txt contains validation triples
    • test.txt contains test triples
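The triple format described above can be parsed with a few lines of Python. This is a minimal sketch, assuming exactly three tab-separated fields per line and no header row; the function name `parse_triples` is illustrative, and the sample text reuses the one triple quoted in the description.

```python
import io
from typing import Iterator, TextIO, Tuple

Triple = Tuple[str, str, str]  # (entity_1_id, relation_id, entity_2_id)


def parse_triples(handle: TextIO) -> Iterator[Triple]:
    """Yield (entity_1_id, relation_id, entity_2_id) from tab-separated lines."""
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields, got {len(parts)}")
        yield parts[0], parts[1], parts[2]


# The single example triple quoted in the description, tab-separated.
sample = io.StringIO("28674CFA\tauthor_in_affiliation\t075CFC38\n")

triples = list(parse_triples(sample))
print(triples)
```

For the actual files, the `io.StringIO` object would be replaced by something like `open("train.txt", encoding="utf-8")`; the encoding is an assumption.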

    Statistics

    Data statistics of the KG20C knowledge graph:

    Author    Paper    Conference    Domain    Affiliation
    8,680     5,047    20            1,923     692

    Entities    Relations    Training triples    Validation triples    Test triples
    16,362      5            48,213              3,670                 3,724

    Acknowledgements

    For the dataset and semantic query method, please cite:

    • Hung Nghiep Tran and Atsuhiro Takasu. Exploring Scholarly Data by Semantic Query on Knowledge Graph Embedding Space. In Proceedings of International Conference on Theory and Practice of Digital Libraries (TPDL), 2019.

    For the MEI knowledge graph embedding model, please cite:

    • Hung Nghiep Tran and Atsuhiro Takasu. Multi-Partition Embedding Interaction with Block Term Format for Knowledge Graph Completion. In Proceedings of the European Conference on Artificial Intelligence (ECAI), 2020.

    For the baseline results and extended semantic query method, please cite:

    • Hung Nghiep Tran. Multi-Relational Embedding for Knowledge Graph Representation and Analysis. PhD Dissertation, The Graduate University for Advanced Studies, SOKENDAI, Japan, 2020.

    For the Microsoft Academic Graph dataset, please cite:

    • Arnab Sinha, Zhihong Shen, Yang Song, Hao Ma, Darrin Eide, Bo-June (Paul) Hsu, and Kuansan Wang. An Overview of Microsoft Academic Service (MAS) and Applications. In Proceedings of the International Conference on World Wide Web (WWW), 2015.

    Inspiration

    We include the baseline results for two tasks on ...
