8 datasets found
  1. Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic (2023). Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm [Dataset]. http://doi.org/10.1371/journal.pbio.1002128
    Explore at:
    docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Tracey L. Weissgerber; Natasa M. Milic; Stacey J. Winham; Vesna D. Garovic
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
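    The authors' templates are Excel-based; an equivalent univariate scatterplot can be sketched in Python with matplotlib. The group names and measurements below are made up for illustration and are not the paper's data:

    ```python
    import numpy as np
    import matplotlib
    matplotlib.use("Agg")  # headless backend for scripted plotting
    import matplotlib.pyplot as plt

    # Made-up small-sample measurements for two hypothetical groups.
    groups = {"Control": [4.1, 5.0, 4.6, 5.3, 4.8], "Treated": [5.9, 6.4, 5.2, 7.1, 6.0]}

    fig, ax = plt.subplots()
    for i, (name, values) in enumerate(groups.items()):
        # Jitter x slightly so overlapping points remain visible.
        x = np.random.default_rng(i).normal(loc=i, scale=0.04, size=len(values))
        ax.plot(x, values, "o", label=name)
        # Overlay the group median as a short horizontal bar.
        ax.hlines(np.median(values), i - 0.2, i + 0.2)
    ax.set_xticks(range(len(groups)))
    ax.set_xticklabels(groups.keys())
    ax.set_ylabel("Measurement")
    ```

    Unlike a bar graph, every observation stays visible, so readers can judge spread and outliers directly.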

  2. Figures S1 to S10: A data-driven approach to understanding the relations between geothermal exploration parameters: insights from Coso, Brady and Desert Peak, USA

    • geolsoc.figshare.com
    zip
    Updated Oct 15, 2025
    Cite
    Yu-Ting Yu; H. Sebnem Duzgun; Andrew Sabin (2025). Figures S1 to S10: A data-driven approach to understanding the relations between geothermal exploration parameters: insights from Coso, Brady and Desert Peak, USA [Dataset]. http://doi.org/10.6084/m9.figshare.30366421.v1
    Explore at:
    zip
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    Geological Society of London
    Authors
    Yu-Ting Yu; H. Sebnem Duzgun; Andrew Sabin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Figure S1. Illustration of indicator mineral map datasets. Figure S2. Illustration of fault map datasets. Figure S3. Fault system at CGF. Figure S4. Fault system at BGF and DPGF. Figure S5. Illustration of LST datasets. Figure S6. Histograms and CDF plots of Two-Class Mineral Maps versus Fault Distance Maps. Figure S7. Histograms and CDF plots of Two-Class Mineral Maps versus Fault Density Maps. Figure S8. Histograms and CDF plots of Two-class Temperature Maps versus fault datasets. The top two rows correspond to Fault Distance Maps, while the bottom two rows correspond to Fault Density Maps. Figure S9. Histograms and CDF plots of Two-Class Mineral Maps versus Multiclass Temperature Map. Figure S10. The multiple comparisons of ANOVA. The plots show the mean estimates (circles) and 95% confidence intervals (bars) for each group of SGP. Red symbols highlight groups with Significant Differences from the control group (blue). Grey symbols indicate groups with Insignificant Differences where confidence intervals overlap with the control group.

  3. NFL Player Suspensions

    • kaggle.com
    zip
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). NFL Player Suspensions [Dataset]. https://www.kaggle.com/datasets/thedevastator/nfl-player-suspensions-2005-2015-data
    Explore at:
    zip (10856 bytes)
    Dataset updated
    Jan 15, 2023
    Authors
    The Devastator
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    NFL Player Suspensions

    Exploring Categories and Sources of Player Discipline

    By FiveThirtyEight [source]

    About this dataset

    This dataset contains comprehensive information on NFL player suspensions. It includes details such as the player's name, the team they were playing for during the suspension, the number of games suspended, and the category of suspension. This data is ideal for anyone looking to analyze or research trends in NFL suspensions over time, or to compare different players' suspension records, and it can be an invaluable source of new insights about professional football in America. So dive deep into this repository and see what meaningful stories you can tell. If you find this useful, let us know!


    How to use the dataset

    Key Columns/Variables

    The following key columns are present in this dataset:

    • Name: Name of the player who was suspended. (String)
    • Team: The team the player was playing for when the suspension was issued. (String)
    • Games: The number of games suspended, including postseason games if applicable. (Integer)
    • Category: A categorization of why the player was suspended, e.g. "substance abuse" or "personal conduct". (String)
    • Desc.: A brief synopsis describing the suspension further; often indicates what action led to the suspension (e.g. drug use). (String)
    • Year: The year the suspension originally took place. (Integer)
    • Source: The information source behind the suspension data. (String)

    Exploring and Visualizing the Data

    There are a variety of ways to explore and analyze this dataset, including visualizations such as histograms, box plots, and line graphs. You can also explore correlations between variables by performing linear regression, or isolate individual instances by filtering on specific observations (e.g. all substance abuse offences committed in 2015). To identify meaningful relationships within the dataset, we recommend starting with univariate analysis: examine one variable at a time and look for patterns that may be indicative of wider trends in the broader population it represents. A natural first step toward visualizing your own insights is to generate a histogram showing the distribution of offense categories from 2005 through 2015.
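    That first step might look like the following sketch, assuming the columns of nfl-suspensions-data.csv as documented on this page; a few made-up rows stand in for the real file:

    ```python
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")  # headless backend for scripted plotting
    import matplotlib.pyplot as plt

    # Made-up rows standing in for nfl-suspensions-data.csv
    # (real usage: df = pd.read_csv("nfl-suspensions-data.csv")).
    df = pd.DataFrame({
        "name": ["Player A", "Player B", "Player C"],
        "category": ["substance abuse", "personal conduct", "substance abuse"],
        "year": [2006, 2010, 2014],
    })

    # Count suspensions per category within the 2005-2015 window...
    window = df[df["year"].between(2005, 2015)]
    counts = window["category"].value_counts()

    # ...and draw the distribution as a bar-style histogram.
    ax = counts.plot(kind="bar")
    ax.set_xlabel("Suspension category")
    ax.set_ylabel("Number of suspensions")
    ```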

    Research Ideas

    • An analysis of suspension frequencies over time to determine overall trends in NFL player discipline.
    • Comparing the types of suspensions for players on different teams to evaluate any differences in the consequences for violations of team rules and regulations.
    • A cross-sectional analysis to assess correlations between types and length of suspensions issued given various violation categories, such as substance abuse or personal conduct violations

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: nfl-suspensions-data.csv

    | Column name | Description                                                 |
    |:------------|:------------------------------------------------------------|
    | name        | Name of the player who was suspended. (String)              |
    | team        | The team the player was suspended from. (String)            |
    | games       | The number of games the player was suspended for. (Integer) |
    | category    | The category of the suspension. (String)                    |
    | desc.       | A description of the suspension. (String)                   |
    | year        | The year the suspension occurred. (Integer)                 |
    | source      | The source of the suspension information. (String)          |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and FiveThirtyEight.

  4. Credit Rating History Dataset

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    The Devastator (2023). Credit Rating History Dataset [Dataset]. https://www.kaggle.com/datasets/thedevastator/credit-rating-history-dataset
    Explore at:
    zip (26498 bytes)
    Dataset updated
    Dec 4, 2023
    Authors
    The Devastator
    Description

    Credit Rating History Dataset

    Credit Rating History

    By Center for Municipal Finance [source]

    About this dataset

    The project that led to the creation of this dataset received funding from the Center for Corporate and Securities Law at the University of San Diego School of Law. The dataset itself can be accessed through a GitHub repository or on its dedicated website.

    In terms of columns contained in this dataset, it encompasses a range of variables relevant to analyzing credit ratings. However, specific details about these columns are not provided in the given information. To acquire a more accurate understanding of the column labels and their corresponding attributes or measurements, further exploration or referencing of additional resources may be required.

    How to use the dataset

    • Understanding the Data

      The dataset consists of several columns that provide essential information about credit ratings and fixed income securities. Familiarize yourself with the column names and their meanings to better understand the data:

      • Column 1: [Credit Agency]
      • Column 2: [Issuer Name]
      • Column 3: [CUSIP/ISIN]
      • Column 4: [Rating Type]
      • Column 5: [Rating Source]
      • Column 6: [Rating Date]
    • Exploratory Data Analysis (EDA)

      Before diving into detailed analysis, start by performing exploratory data analysis to get an overview of the dataset.

      • Identify Unique Values: Explore each column's unique values to understand rating agencies, issuers, rating types, sources, etc.

      • Frequency Distribution: Analyze the frequency distribution of various attributes like credit agencies or rating types to identify any imbalances or biases in the data.

    • Data Visualization

      Visualizing your data can provide insights that are difficult to derive from tabular representation alone. Utilize various visualization techniques such as bar charts, pie charts, histograms, or line graphs based on your specific objectives.

      For example:

      • Plotting a histogram of each credit agency's ratings can help you understand their distribution across different categories.
      • A time-series line graph can show how ratings have evolved over time for specific issuers or industries.
    • Analyzing Ratings Performance

      One of the main objectives of using credit rating datasets is to assess the performance and accuracy of different credit agencies. Conducting a thorough analysis can help you understand how ratings have changed over time and evaluate the consistency of each agency's ratings.

      • Rating Changes Over Time: Analyze how ratings for specific issuers or industries have changed over different periods.

      • Comparing Rating Agencies: Compare ratings from different agencies to identify any discrepancies or trends. Are there consistent differences in their assessments?

    • Detecting Rating Trends

      The dataset allows you to detect trends and correlations between various factors related to
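    The EDA and ratings-over-time steps above can be sketched in pandas. The column names and values here are illustrative assumptions based on the template columns listed earlier, not the dataset's actual schema:

    ```python
    import pandas as pd

    # Illustrative rows; real column labels must be checked against the file.
    ratings = pd.DataFrame({
        "credit_agency": ["Moody's", "S&P", "Moody's", "Fitch"],
        "rating": ["Aa2", "AA", "A1", "AA-"],
        "rating_date": pd.to_datetime(
            ["2019-01-02", "2019-03-04", "2020-06-01", "2020-07-15"]
        ),
    })

    # Frequency distribution: how many ratings each agency issued.
    freq = ratings["credit_agency"].value_counts()

    # Rating changes over time: each agency's ratings in chronological order.
    history = (
        ratings.sort_values("rating_date")
        .groupby("credit_agency")["rating"]
        .agg(list)
    )
    ```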

    Research Ideas

    • Credit Rating Analysis: This dataset can be used for analyzing credit ratings and trends of various fixed income securities. It provides historical credit rating data from different rating agencies, allowing researchers to study the performance, accuracy, and consistency of these ratings over time.
    • Comparative Analysis: The dataset allows for comparative analysis between different agencies' credit ratings for a specific security or issuer. Researchers can compare the ratings assigned by different agencies and identify any discrepancies or differences in their assessments. This analysis can help in understanding variations in methodologies and improving the transparency of credit rating processes.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all ...

  5. Protein secondary-structure description with a coarse-grained model: code and datasets in ActivePapers format

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Kneller, Gerald R.; Hinsen, Konrad (2020). Protein secondary-structure description with a coarse-grained model: code and datasets in ActivePapers format [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_21690
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Université d'Orléans
    CNRS
    Authors
    Kneller, Gerald R.; Hinsen, Konrad
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This file contains the supplementary material for the publication

    Protein secondary-structure description with a coarse-grained model by Gerald R. Kneller and K. Hinsen http://dx.doi.org/10.1107/S1399004715007191 Acta Cryst. (2015). D71, 1411-1422

    Datasets in this file

    1) ScrewFit and ScrewFrame parameters for ideal secondary-structure elements

    Scripts: /code/import_ideal_structures /code/analyze_ideal_structures

    1.1) The PDB files generated with Chimera

    /data/ideal_structures/3-10.pdb /data/ideal_structures/alpha.pdb /data/ideal_structures/beta-antiparallel.pdb /data/ideal_structures/beta-parallel.pdb /data/ideal_structures/pi.pdb

    1.2) The corresponding MOSAIC datasets

    /data/ideal_structures/3-10 /data/ideal_structures/alpha /data/ideal_structures/beta-antiparallel /data/ideal_structures/beta-parallel /data/ideal_structures/pi

    1.3) The ScrewFit parameters

    /data/ideal_structures/screwfit/3-10 /data/ideal_structures/screwfit/alpha /data/ideal_structures/screwfit/beta-antiparallel /data/ideal_structures/screwfit/beta-parallel /data/ideal_structures/screwfit/pi

    1.4) The ScrewFrame parameters

    /data/ideal_structures/screwframe/3-10 /data/ideal_structures/screwframe/alpha /data/ideal_structures/screwframe/beta-antiparallel /data/ideal_structures/screwframe/beta-parallel /data/ideal_structures/screwframe/pi

    2) Statistics for ScrewFit and ScrewFrame parameters computed for the ASTRAL SCOPe subset with less than 40% sequence identity.

    Scripts: /code/astral_analysis /code/fit_rho_distributions /code/plot_histograms

    2.1) The ASTRAL database (link to published ActivePaper)

    /data/astral_2.04

    2.2) The histograms for the ScrewFit and ScrewFrame parameters for the all-alpha and all-beta subsets

    /data/histograms/astral_alpha/screwfit /data/histograms/astral_alpha/screwframe

    /data/histograms/astral_beta/screwfit /data/histograms/astral_beta/screwframe

    2.3) The Gaussians fitted to the peaks in the distributions for rho

    /data/fitted_rho_distributions/screwfit /data/fitted_rho_distributions/screwframe

    2.4) Plots

    /documentation/delta.pdf /documentation/delta_q.pdf /documentation/delta_r.pdf /documentation/p.pdf /documentation/rho-detail.pdf /documentation/rho.pdf /documentation/sigma.pdf /documentation/tau.pdf

    3) Comparison of secondary-structure identification between ScrewFrame and DSSP.

    Scripts: /code/compare_secondary_structure_assignments /code/plot_histograms

    3.1) The histograms of the lengths of secondary-structure elements

    /data/histograms/secondary_structure/length-alpha-dssp /data/histograms/secondary_structure/length-alpha-screwframe /data/histograms/secondary_structure/length-beta-dssp /data/histograms/secondary_structure/length-beta-screwframe

    3.2) The 2D histograms of the number of residues inside identified secondary-structure elements

    /data/histograms/secondary_structure/n-alpha /data/histograms/secondary_structure/n-beta

    3.3) The distribution of rho inside alpha helices

    /data/histograms/secondary_structure/rho-alpha-dssp

    3.4) Plots

    /documentation/lengths-alpha.pdf /documentation/lengths-beta.pdf /documentation/n-alpha.pdf /documentation/n-beta.pdf /documentation/rho-alpha-dssp.pdf

    4) Illustration for myoglobin and VDAC-1

    Scripts: /code/import_myoglobin_vdac /code/analyze_myoglobin /code/analyze_vdac /code/perturbation_analysis

    4.1) Imported structures in MOSAIC format: PDB code 1A6G for myoglobin PDB code 2K4T for VDAC-1

    /data/myoglobin /data/VDAC-1

    4.2) Plots showing rho and delta

    /documentation/rho-myoglobin.pdf /documentation/delta-myoglobin.pdf

    4.3) Tube models for visualization with Chimera

    /documentation/myoglobin-tube.bld /documentation/VDAC-1-tube.bld

    4.4) Sensitivity to perturbations in the coordinates

    /documentation/rho-perturbed-myoglobin.pdf /documentation/delta-perturbed-VDAC-1.pdf /documentation/myoglobin-perturbation.pdf /documentation/VDAC-1-perturbation.pdf

    5) Analysis of CA-only structures in the PDB

    Scripts: /code/ca_analysis /code/import_calpha_structures /code/plot_histograms

    5.1) Imported CA-only structures in MOSAIC format

    /data/pdb_ca_only_structures

    5.2) Histograms for ScrewFrame parameters

    /data/histograms/ca_only_structures

    5.3) Plots

    /documentation/delta_ca.pdf /documentation/delta_q_ca.pdf /documentation/delta_r_ca.pdf /documentation/p_ca.pdf /documentation/rho_ca.pdf /documentation/sigma_ca.pdf /documentation/tau_ca.pdf

  6. Numberical values to generate manuscript graphs and histograms.

    • plos.figshare.com
    xlsx
    Updated Feb 22, 2024
    Cite
    Xiang Liu; Nancy Gillis; Chang Jiang; Anthony McCofie; Timothy I. Shaw; Aik-Choon Tan; Bo Zhao; Lixin Wan; Derek R. Duckett; Mingxiang Teng (2024). Numberical values to generate manuscript graphs and histograms. [Dataset]. http://doi.org/10.1371/journal.pcbi.1011873.s014
    Explore at:
    xlsx
    Dataset updated
    Feb 22, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Xiang Liu; Nancy Gillis; Chang Jiang; Anthony McCofie; Timothy I. Shaw; Aik-Choon Tan; Bo Zhao; Lixin Wan; Derek R. Duckett; Mingxiang Teng
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Numberical values to generate manuscript graphs and histograms.

  7. Speedtest Open Data - Australia 2020 Q2, Q3, Q4 extract

    • figshare.com
    txt
    Updated Oct 24, 2025
    + more versions
    Cite
    Richard Ferrers; Speedtest Global Index (2025). Speedtest Open Data - Australia 2020 Q2, Q3, Q4 extract [Dataset]. http://doi.org/10.6084/m9.figshare.13370504.v17
    Explore at:
    txt
    Dataset updated
    Oct 24, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Richard Ferrers; Speedtest Global Index
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Australia
    Description

    This is an Australian extract of the Speedtest Open Data available at Amazon WS (link below - opendata.aws). The AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be non-commercial (NC) and reuse must be share-alike (SA) (add the same licence). This restricts the standard CC-BY Figshare licence.

    The world Speedtest open data was downloaded (>400Mb, 7M lines of data). An extract for Australia's locations (lat, long) yielded 88,000 lines of data (attached as csv). A Jupyter notebook of the extract process is attached, along with a link to a Twitter thread of outputs and a link to a data tutorial (GitHub) that includes a Jupyter notebook for analysing the world Speedtest data, selecting one US state.

    Data shows (Q2):
    • 3.1M speedtests
    • 762,000 devices
    • 88,000 grid locations (600m * 600m), each summarised as a point
    • average speed 33.7Mbps (down), 12.4Mbps (up)
    • max speed 724Mbps
    • data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP); centroid and lat/long added. See the tweet image of centroids, also attached.

    Versions:
    • v15/16 - Add histogram comparing Q1-21 vs Q2-20, inc. ipynb (incHistQ121, v.1.3-Q121) for the calculation.
    • v14 - Add AUS Speedtest Q1 2021 geojson (79k lines, avg d/l 45.4Mbps).
    • v13 - Added three-colour MELB map (less than 20Mbps, over 90Mbps, 20-90Mbps).
    • v12 - Added AUS - Syd - Mel line chart Q320.
    • v11 - Add line chart comparing Q2, Q3, Q4 plus Melb; results virtually indistinguishable. Add line chart comparing Syd - Melb Q3; also virtually indistinguishable. Add histogram comparing Syd - Melb Q3. Add new Jupyter notebook with graph calcs (nbn-AUS-v1.3). Some errata documented in the notebook: an issue with resorting the table and graphing only part of it; not an issue if all lines of the table are graphed.
    • v10 - Load AURIN sample pics. Speedtest data loaded to the AURIN geo-analytic platform; requires an edu.au login.
    • v9 - Add comparative Q2, Q3, Q4 histogram pic.
    • v8 - Added Q4 data geojson. Add Q3, Q4 histogram pic.
    • v7 - Rename to include Q2, Q3 in title.
    • v6 - Add Q3 20 data. Rename geojson AUS data as Q2. Add comparative histogram; calc in International.ipynb.
    • v5 - Add Jupyter notebook inc. histograms. The histogram is a count of geo-locations' average download speed (unweighted by tests).
    • v4 - Added Melb choropleth (png, 50Mpix) inc. legend. (To do: add Melb.geojson.) Posted link to AURIN description of Speedtest data.
    • v3 - Add super-fast data (>100Mbps), less than 1% of data (697 lines). Includes png of superfast.plot(). Link below to a Google Maps version of the super-fast data points, and a Google map of the first 100 data points (sample data). Geojson format for loading into GeoPandas, per the Jupyter notebook. New version of the Jupyter notebook, v.1.1.
    • v2 - Add centroids image.
    • v1 - Initial data load.

    Future work:
    • combine Speedtest data with NBN technology-by-location data (nationalmap.gov.au): https://www.data.gov.au/dataset/national-broadband-network-connections-by-technology-type
    • combine Speedtest data with SEIFA data (socioeconomic categories); to discuss with AURIN
    • further international comparisons
    • discussed collaboration with Assoc Prof Tooran Alizadeh, USyd
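    A minimal sketch of the v5-style histogram (a count of grid locations by average download speed), assuming Ookla's tile schema with an avg_d_kbps column; the rows below are made up, not the real extract:

    ```python
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")  # headless backend for scripted plotting
    import matplotlib.pyplot as plt

    # Made-up tiles standing in for the attached 88,000-row Australian extract;
    # the avg_d_kbps column name follows Ookla's open-data naming (an assumption).
    tiles = pd.DataFrame({
        "avg_d_kbps": [33700, 12000, 95000, 724000],
        "tests": [40, 5, 120, 9],
    })

    # Convert to Mbps and histogram the per-location averages (unweighted by tests).
    down_mbps = tiles["avg_d_kbps"] / 1000
    plt.hist(down_mbps, bins=20)
    plt.xlabel("Average download speed (Mbps)")
    plt.ylabel("Number of grid locations")
    ```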

  8. Online Automotive Sales Statistics'23 (Volkswagen)

    • kaggle.com
    zip
    Updated Mar 30, 2023
    Cite
    Merve Yavuz (2023). Online Automotive Sales Statistics'23 (Volkswagen) [Dataset]. https://www.kaggle.com/datasets/bimervos/online-automotive-sales-statistics-volkswagen
    Explore at:
    zip (217687 bytes)
    Dataset updated
    Mar 30, 2023
    Authors
    Merve Yavuz
    Description

    This dataset contains information from Turkey's largest online real estate and car sales platform. The dataset covers a 3-month period from January 1, 2023, to March 31, 2023, and focuses solely on Volkswagen brand cars. The dataset consists of 13 variables, including customer_id, advertisement_number, brand, model, variant, year, kilometer, color, transmission, fuel, city, ad_date, and price.

    The dataset provides valuable insights into the sales and advertising trends for Volkswagen cars in Turkey during the first quarter of 2023. The data can be used to identify patterns and trends in consumer behavior, such as which models are most popular, the most common transmission type, and the most common fuel type. The data can also be used to evaluate the effectiveness of advertising campaigns and to identify which cities have the highest demand for Volkswagen cars.

    Overall, this dataset provides a rich source of information for anyone interested in the automotive industry in Turkey or for those who want to explore the trends in Volkswagen car sales during the first quarter of 2023.

    Here are the descriptions of the variables in the dataset:

    customer_id: Unique identifier for the customer who placed the advertisement
    advertisement_number: Unique identifier number for the advertisement
    brand: The brand of the car (in this dataset, it is always Volkswagen)
    model: The model of the car (e.g., Golf, Polo, Passat, etc.)
    variant: The variant of the car (e.g., 1.6 FSI Midline, 2.0 TDI Comfortline, etc.)
    year: The year that the car was manufactured
    kilometer: The distance that the car has been driven (in kilometers)
    color: The color of the car
    transmission: The type of transmission (manual or automatic)
    fuel: The type of fuel used by the car (e.g., petrol, diesel, hybrid, etc.)
    city: The city where the advertisement was placed
    ad_date: The date when the advertisement was placed
    price: The asking price for the car

    Here are some possible analyses and insights that can be derived from this dataset:

    Trend analysis: It is possible to analyze the trend of Volkswagen car sales over the three-month period covered by the dataset. This can be done by plotting the number of advertisements placed over time.

    Model popularity analysis: It is possible to determine which Volkswagen car models are the most popular based on the number of advertisements placed for each model. This can be done by grouping the data by model and counting the number of advertisements for each model.

    Price analysis: It is possible to analyze the distribution of prices for Volkswagen cars. This can be done by creating a histogram of the prices.

    Kilometer analysis: It is possible to analyze the distribution of kilometers driven for Volkswagen cars. This can be done by creating a histogram of the kilometer values.

    Geographic analysis: It is possible to analyze the distribution of Volkswagen car sales across different cities. This can be done by grouping the data by city and counting the number of advertisements for each city.

    Correlation analysis: It is possible to analyze the correlations between different variables, such as the year and price of the car or the kilometer and price of the car. This can be done by creating scatterplots of the variables and calculating correlation coefficients.
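    A few of the analyses above, sketched in pandas on made-up rows (the column names follow the variable descriptions; the values are not from the dataset):

    ```python
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")  # headless backend for scripted plotting
    import matplotlib.pyplot as plt

    # Made-up listings; column names follow the variable descriptions above.
    ads = pd.DataFrame({
        "model": ["Golf", "Polo", "Golf", "Passat"],
        "kilometer": [60000, 120000, 25000, 95000],
        "price": [520000, 310000, 780000, 450000],
    })

    # Model popularity: advertisements per model.
    popularity = ads["model"].value_counts()

    # Price distribution as a histogram.
    plt.hist(ads["price"], bins=10)
    plt.xlabel("Price")
    plt.ylabel("Number of advertisements")

    # Correlation between kilometers driven and asking price.
    km_price_corr = ads["kilometer"].corr(ads["price"])
    ```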

    Data Cleaning: Some data cleaning processes can be performed on the dataset. Firstly, the missing values can be checked, and the missing values may need to be filled or removed from the dataset. Additionally, the date formats in the dataset and the data types of the variables can be checked and adjusted accordingly. Outliers in the dataset may also need to be checked and corrected or removed.
    These cleaning processes in the dataset will help obtain healthier results for data analysis and machine learning algorithms.
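    The cleaning steps described above might look like this in pandas (illustrative rows; the 1,000,000 km outlier cutoff is an arbitrary assumption):

    ```python
    import pandas as pd

    # Made-up rows exhibiting the kinds of problems described above.
    ads = pd.DataFrame({
        "ad_date": ["2023-01-05", "2023-02-10", None],
        "price": [520000, None, 310000],
        "kilometer": [60000, 120000, 2500000],  # last value is an implausible outlier
    })

    # 1. Check for missing values, then drop incomplete rows.
    missing = ads.isna().sum()
    clean = ads.dropna(subset=["ad_date", "price"])

    # 2. Fix data types: parse ad_date as a proper datetime.
    clean = clean.assign(ad_date=pd.to_datetime(clean["ad_date"]))

    # 3. Remove outliers (cutoff chosen for illustration only).
    clean = clean[clean["kilometer"] < 1_000_000]
    ```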

    As a result, this dataset is a workable dataset for data cleaning and a valuable resource that interested parties can use in their data analysis and machine learning projects.

    All of these analyses can be visualized using various graphs and charts, such as line charts, histograms, and scatterplots.

312 scholarly articles cite the first result above, "Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm" (Tracey L. Weissgerber et al., 2023; http://doi.org/10.1371/journal.pbio.1002128), per Google Scholar.