Hello all, this dataset involves various factors affecting cancer, and based upon those factors I have created histograms of various columns of the table that relate to heart disease. A histogram is a bar-graph-like representation of data that buckets a range of outcomes into columns along the x-axis. The y-axis represents the count or percentage of occurrences in the data for each bucket and can be used to visualize the data distribution. Finally, I have created a combined histogram of the entire table covering all the columns. Setting titles, x-axis and y-axis labels, sizes, and colors is also done in this notebook.
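As a rough sketch of the approach described above (the file name heart.csv and the column age are placeholders, not names taken from the actual dataset):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file name -- substitute the actual dataset file.
df = pd.read_csv("heart.csv")

# Histogram of a single column, with title, axis labels, size, and color set explicitly.
plt.figure(figsize=(8, 5))
plt.hist(df["age"], bins=20, color="teal", edgecolor="black")
plt.title("Distribution of age")
plt.xlabel("Age")
plt.ylabel("Number of records")
plt.show()

# Combined histograms of every numeric column in the table.
df.hist(figsize=(12, 10), color="teal", edgecolor="black")
plt.tight_layout()
plt.show()
```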
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Charts, Histograms, and Time Series
- Create a histogram graph from band values of an image collection
- Create a time series graph from band values of an image collection
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
By Gabe Salzer [source]
This dataset contains essential performance statistics for NBA rookies from 1980-2016. Here you can find per-game minutes, points scored, field goals made and attempted, three-pointers made and attempted, free throws made and attempted (with the respective percentages for each), offensive rebounds, defensive rebounds, assists, steals, blocks, turnovers, efficiency rating, and Hall of Fame induction year. It is organized in descending order by minutes played per game as well as by draft year. This Kaggle dataset is an excellent resource for basketball analysts seeking a better understanding of how rookies have evolved over the years, from their stats to how they were inducted into the Hall of Fame. With its detail on individual players' performance, this dataset allows you to compare performances across different eras in NBA history along with overall trends in rookie statistics. Compare rookies drafted far apart or those that played together, whatever your goal may be!
This dataset is perfect for providing insight into the performance of NBA rookies over an extended period of time. The data covers rookie stats from 1980 to 2016 and includes statistics such as points scored, field goals made, free throw percentage, offensive rebounds, defensive rebounds and assists. It also provides the name of each rookie along with the year they were drafted and their Hall of Fame class.
This data set is useful for researching how rookies’ stats have changed over time in order to compare different eras or identify trends in player performance. It can also be used to evaluate players by comparing their stats against those of other players or previous years’ stats.
In order to use this dataset effectively, a few tips are helpful:
Consider using Field Goal Percentage (FG%), Three Point Percentage (3P%) and Free Throw Percentage (FT%) to measure a player’s efficiency beyond just points scored or field goals made/attempted (FGM/FGA).
Look out for anomalies such as low efficiency ratings despite high minutes played: this could indicate either that a player has not had enough playing time for their statistics to settle into a representative per-game average, or that they simply did not play well over a short period with limited opportunities.
Try different visualizations of the data, such as histograms, line graphs, and scatter plots; each may offer different insights, for example comparisons between individual years versus aggregate trends over multiple years (see the sketch after these tips).
Lastly, it is important to keep in mind whether you are dealing with cumulative totals over multiple seasons versus individual season averages or per-game numbers when analyzing these sets!
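For example, a minimal pandas/Matplotlib sketch of the histogram and line-graph ideas above (the column labels PTS and Year Drafted are assumptions about the CSV header, so verify them against the file before running):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names "PTS" and "Year Drafted" are assumptions -- check the CSV header.
rookies = pd.read_csv("NBA Rookies by Year_Hall of Fame Class.csv")

# Histogram of rookie scoring averages across the whole 1980-2016 span.
rookies["PTS"].plot(kind="hist", bins=30, edgecolor="black")
plt.xlabel("Points per game (rookie season)")
plt.title("Distribution of rookie scoring averages, 1980-2016")
plt.show()

# Line graph of the average rookie scoring by draft year.
rookies.groupby("Year Drafted")["PTS"].mean().plot(kind="line", marker="o")
plt.xlabel("Draft year")
plt.ylabel("Average rookie points per game")
plt.title("Rookie scoring trend by draft year")
plt.show()
```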
- Evaluating the performance of historical NBA rookies over time and how this can help inform future draft picks in the NBA.
- Analysing the relative importance of certain performance stats, such as three-point percentage, to overall success and Hall of Fame induction from 1980-2016.
- Comparing rookie seasons across different years to identify common trends in terms of statistical contributions and development over time
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: you must distribute your contributions under the same license as the original.
  - Keep intact: all notices that refer to this license, including copyright notices.
File: NBA Rookies by Year_Hall of Fame Class.csv

| Column name | Description |
|:------------|:------------|
| Name | The name of... |
https://creativecommons.org/publicdomain/zero/1.0/
By FiveThirtyEight [source]
This dataset contains comprehensive information on NFL player suspensions. It includes detailed information such as the player's name, team they were playing for during the suspension, number of games suspended, and category of suspension. This data is ideal for anyone looking to analyze or research trends in NFL suspensions over time or compare different players' suspension records and can represent an invaluable source for new insights about professional football in America. So dive deep into this repository and see what meaningful stories you can tell—all under the Creative Commons Attribution 4.0 International License and MIT License. If you find this useful, let us know!
Key Columns/Variables
The following is a list of key columns present in this dataset:
- Name: Name of the player who was suspended. (String)
- Team: The team the player was playing for when the suspension was issued. (String)
- Games: The number of games the player was suspended for, including postseason games if applicable. (Integer)
- Category: A description/categorization of why the player was suspended, e.g. 'substance abuse' or 'personal conduct'. (String)
- Desc.: A brief synopsis describing the suspension further; often indicates what action led to the suspension (e.g. drug use). (String)
- Year: The year the suspension originally took place. (Integer)
- Source: Information source behind the suspension data. (String)
#### Exploring and Visualizing the Data
There are a variety of ways you can explore and analyze this data set, including visualizations such as histograms, box plots, and line graphs. You can also explore correlations between variables by performing linear regression, or isolate individual instances by filtering out specific observations, e.g. all substance abuse offences committed by players in 2015. To identify meaningful relationships within the data set, we recommend starting with univariate analysis, i.e. analyzing one variable at a time and looking for patterns that may be indicative of wider trends in the broader population it represents. Below is an example code snippet as a first step towards visualizing your own insights from the NFL suspension data set: generate a histogram showing the distribution of offense categories from 2005 through 2015.
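A minimal sketch of that first step, assuming the CSV listed below is in the working directory and using the column names from the file table (category, year); since category is a categorical field, the "histogram" here is a bar chart of counts:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the suspensions data (assumes the CSV sits in the working directory).
df = pd.read_csv("nfl-suspensions-data.csv")

# Keep suspensions issued between 2005 and 2015 (inclusive).
subset = df[(df["year"] >= 2005) & (df["year"] <= 2015)]

# Count suspensions per category and plot the distribution.
counts = subset["category"].value_counts()
counts.plot(kind="bar", color="steelblue")
plt.title("NFL suspensions by category, 2005-2015")
plt.xlabel("Suspension category")
plt.ylabel("Number of suspensions")
plt.tight_layout()
plt.show()
```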
- An analysis of suspension frequencies over time to determine overall trends in NFL player discipline.
- Comparing the types of suspensions for players on different teams to evaluate any differences in the consequences for violations of team rules and regulations.
- A cross-sectional analysis to assess correlations between types and length of suspensions issued given various violation categories, such as substance abuse or personal conduct violations
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: nfl-suspensions-data.csv

| Column name | Description |
|:------------|:------------|
| name | Name of the player who was suspended. (String) |
| team | The team the player was suspended from. (String) |
| games | The number of games the player was suspended for. (Integer) |
| category | The category of the suspension. (String) |
| desc. | A description of the suspension. (String) |
| year | The year the suspension occurred. (Integer) |
| source | The source of the suspension information. (String) |
If you use this dataset in your research, please credit the original authors (FiveThirtyEight). Data Source
Histogram plot of the average alignment accuracy averaged over 10 runs for each viral genome shown in Table 1 and each aligner. Reads crossing splice junction regions are shown in pink; reads not crossing splice junction regions are shown in blue.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Do you "see" what I "see"? A Multi-panel Visual Question and Answer Dataset for Large Language Model Chart Analysis
Publication: TBD (linked on publication)
GitHub Repo: TBD (linked on publication)
This is a multi-panel figure dataset for visual question and answer (VQA) to test large language/multimodal models (LMMs). Data contains synthetically generated multi-panel figures with histogram, scatter, and line plots. Included are full data used to create plots… See the full description on the dataset page: https://huggingface.co/datasets/ReadingTimeMachine/visual_qa_multipanel.
By Center for Municipal Finance [source]
The project that led to the creation of this dataset received funding from the Center for Corporate and Securities Law at the University of San Diego School of Law. The dataset itself can be accessed through a GitHub repository or on its dedicated website.
In terms of columns contained in this dataset, it encompasses a range of variables relevant to analyzing credit ratings. However, specific details about these columns are not provided in the given information. To acquire a more accurate understanding of the column labels and their corresponding attributes or measurements present in this dataset, further exploration or referencing additional resources may be required
Understanding the Data
The dataset consists of several columns that provide essential information about credit ratings and fixed income securities. Familiarize yourself with the column names and their meanings to better understand the data:
- Column 1: [Credit Agency]
- Column 2: [Issuer Name]
- Column 3: [CUSIP/ISIN]
- Column 4: [Rating Type]
- Column 5: [Rating Source]
- Column 6: [Rating Date]
Exploratory Data Analysis (EDA)
Before diving into detailed analysis, start by performing exploratory data analysis to get an overview of the dataset.
Identify Unique Values: Explore each column's unique values to understand rating agencies, issuers, rating types, sources, etc.
Frequency Distribution: Analyze the frequency distribution of various attributes like credit agencies or rating types to identify any imbalances or biases in the data.
Data Visualization
Visualizing your data can provide insights that are difficult to derive from tabular representation alone. Utilize various visualization techniques such as bar charts, pie charts, histograms, or line graphs based on your specific objectives.
For example:
- Plotting a histogram of each credit agency's ratings can help you understand their distribution across different categories.
- A time-series line graph can show how ratings have evolved over time for specific issuers or industries.
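A minimal sketch of the first idea; the file name ratings.csv and the column labels Credit Agency and Rating are placeholders (the actual labels should be mapped from the dataset once inspected):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file and column names -- adjust to the real dataset.
ratings = pd.read_csv("ratings.csv")

# One distribution plot (bar chart of rating categories) per credit agency.
for agency, group in ratings.groupby("Credit Agency"):
    group["Rating"].value_counts().sort_index().plot(kind="bar")
    plt.title(f"Rating distribution: {agency}")
    plt.xlabel("Rating category")
    plt.ylabel("Number of ratings")
    plt.tight_layout()
    plt.show()
```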
Analyzing Ratings Performance
One of the main objectives of using credit rating datasets is to assess the performance and accuracy of different credit agencies. Conducting a thorough analysis can help you understand how ratings have changed over time and evaluate the consistency of each agency's ratings.
Rating Changes Over Time: Analyze how ratings for specific issuers or industries have changed over different periods.
Comparing Rating Agencies: Compare ratings from different agencies to identify any discrepancies or trends. Are there consistent differences in their assessments?
Detecting Rating Trends
The dataset allows you to detect trends and correlations between various factors related to
- Credit Rating Analysis: This dataset can be used for analyzing credit ratings and trends of various fixed income securities. It provides historical credit rating data from different rating agencies, allowing researchers to study the performance, accuracy, and consistency of these ratings over time.
- Comparative Analysis: The dataset allows for comparative analysis between different agencies' credit ratings for a specific security or issuer. Researchers can compare the ratings assigned by different agencies and identify any discrepancies or differences in their assessments. This analysis can help in understanding variations in methodologies and improving the transparency of credit rating processes
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: you must distribute your contributions under the same license as the original.
  - Keep intact: all ...
Figures containing a histogram of frequency of effect sizes on AG and BG herbivores and a funnel plot of effect size and sample sizes indicating absence of publication bias.
https://spdx.org/licenses/etalab-2.0.html
Main publication: Poll report and form on HAL
Authors: The raw data was generated by the poll respondents. The authors of this dataset, excluding Vlad Visan, are such respondents; there are also other respondents who chose to remain anonymous. The script was written by Vlad Visan. The raw format was adapted to a numerical format by Vlad Visan.
Overall description: A poll took place in February 2024 to understand the administrative burden of using Galaxy, specifically for small-scale admins.
Context: Useful to anyone considering using Galaxy. Done as part of the technology monitoring phase of the "Gestionnaire de workflows" (Workflow Management System) project of the OSUG LabEx.
File descriptions:
- raw_data_names_removed.tsv: Raw poll answers, with any personally identifiable information redacted.
- SSA-Poll-19-Feb-2024-Filtered-Numerical.tab: The numerically filtered format required by the script. The transformation could be done automatically in the future, but there are some subtleties: "-1" denotes "ignore/invalid"; some empty answers have to be manually converted to "0"; one answer was manually changed from "0" to "-1" after reading the associated comment, which made it clear that "invalid" was more appropriate.
- numericalCsvImportAndGenerateCharts.R: The script parses the data and creates one distribution/histogram graph per column. It expects a filtered version, with only the numerical fields.
- Form-V2.pdf: Survey questions, with several errors corrected: end-user assistance questions were worded wrongly; various spelling/wording mistakes.
http://opendatacommons.org/licenses/dbcl/1.0/
DESCRIPTION
NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases) research creates knowledge about and treatments for the most chronic, costly, and consequential diseases.
The dataset used in this project is originally from NIDDK. The objective is to predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Build a model to accurately predict whether the patients in the dataset have diabetes or not.
Dataset Description
The dataset consists of several medical predictor variables and one target variable (Outcome). Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and more.
| Variable | Description |
|:---------|:------------|
| Pregnancies | Number of times pregnant |
| Glucose | Plasma glucose concentration in an oral glucose tolerance test |
| BloodPressure | Diastolic blood pressure (mm Hg) |
| SkinThickness | Triceps skinfold thickness (mm) |
| Insulin | Two hour serum insulin |
| BMI | Body Mass Index |
| DiabetesPedigreeFunction | Diabetes pedigree function |
| Age | Age in years |
| Outcome | Class variable (either 0 or 1); 268 of 768 values are 1, the others are 0 |

Project Task: Week 1
Data Exploration:
Perform descriptive analysis. Understand the variables and their corresponding values. In the columns below, a value of zero does not make sense and thus indicates a missing value:
Glucose
BloodPressure
SkinThickness
Insulin
BMI
Visually explore these variables using histograms. Treat the missing values accordingly.
There are integer and float data type variables in this dataset. Create a count (frequency) plot describing the data types and the count of variables.
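A minimal sketch of this exploration step, assuming the data has been loaded into a pandas DataFrame df (the file name diabetes.csv is a placeholder):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file name -- substitute the actual NIDDK data file.
df = pd.read_csv("diabetes.csv")

# Treat zeros in these columns as missing values.
cols_with_hidden_missing = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
df[cols_with_hidden_missing] = df[cols_with_hidden_missing].replace(0, np.nan)

# Histograms of the affected variables.
df[cols_with_hidden_missing].hist(bins=20, figsize=(10, 6))
plt.tight_layout()
plt.show()

# Count (frequency) plot of variable data types.
df.dtypes.value_counts().plot(kind="bar")
plt.title("Count of variables by data type")
plt.xlabel("Data type")
plt.ylabel("Number of variables")
plt.show()
```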
Data Exploration:
Check the balance of the data by plotting the count of outcomes by their value. Describe your findings and plan future course of action.
Create scatter charts between the pair of variables to understand the relationships. Describe your findings.
Perform correlation analysis. Visually explore it using a heat map.
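And a possible heat-map step, continuing with the same df from the sketch above (seaborn assumed available):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# df is the DataFrame prepared in the previous sketch.
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation heat map")
plt.show()
```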
Project Task: Week 2
Data Modeling:
Devise strategies for model building. It is important to decide the right validation framework. Express your thought process.
Apply an appropriate classification algorithm to build a model.
Compare various models with the results from the KNN algorithm.
Create a classification report by analyzing sensitivity, specificity, AUC (ROC curve), etc.
Please be descriptive and explain which values of these parameters you have used.
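One possible sketch of a KNN baseline with scikit-learn, continuing from the df prepared in the exploration steps; the choice of k=5 and a 75/25 split are arbitrary starting points, not values prescribed by the project brief:

```python
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, roc_auc_score

# df is the DataFrame from the exploration sketch; Outcome is the target variable.
X = df.drop(columns=["Outcome"])
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Impute the NaNs introduced earlier, scale the features, then fit KNN.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```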
Data Reporting:
Create a dashboard in tableau by choosing appropriate chart types and metrics useful for the business. The dashboard must entail the following:
Pie chart to describe the diabetic or non-diabetic population
Scatter charts between relevant variables to analyze the relationships
Histogram or frequency charts to analyze the distribution of the data
Heatmap of correlation analysis among the relevant variables
Create bins of these age values: 20-25, 25-30, 30-35, etc. Analyze different variables for these age brackets using a bubble chart.
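If the age brackets are prepared in pandas before loading the data into Tableau, pd.cut is one way to build them (a sketch, reusing the df from the exploration steps; the upper bound of 85 is an assumption):

```python
import pandas as pd

# Build 5-year age brackets: 20-25, 25-30, ..., 80-85 (right-open intervals).
bins = list(range(20, 90, 5))
labels = [f"{lo}-{lo + 5}" for lo in bins[:-1]]
df["AgeBracket"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)
print(df["AgeBracket"].value_counts().sort_index())
```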
Coastal storms and other meteorological phenomena can have a significant impact on how high water levels rise and how often. The inundation analysis program is extremely beneficial in determining the frequency (the occurrence of high waters for different elevations above a specified threshold) and duration (the amount of time that the specified location is inundated by water) of observed high waters (tides). Statistical output from these analyses can be useful in planning marsh restoration activities. Additionally, the analyses have broader applications for the coastal engineering and mapping community, such as ecosystem management and regional climate change. Since these statistical outputs are station specific, use for evaluating surrounding areas may be limited.

Products
The data input for this tool is a 6-minute water level data time series and the tabulated times and heights of the high tides over a user-specified time period, relative to a desired tidal datum or user-specified datum. The data output of this tool provides summary statistics, which include the number of occurrences of inundation above the threshold (events) and the length of duration of inundation of each event above the threshold elevation for a specified time period. In addition to summary statistics, graphical outputs are provided using three plots. The first plot is a histogram of frequency of occurrence relative to the threshold elevation, the second plot is a histogram of the frequency of duration of inundation, and the third plot is an X-Y plot of frequency of elevation versus duration of inundation for each event. Input data time series are presently limited to the verified data from a set of operating and historical tide stations in the NOAA CO-OPS data base.

Data and Resources
- CO-OPS Frequency and Duration of Inundation Analysis Tool: User's Guide
- CO-OPS Frequency and Duration of Inundation Analysis Tool: Limitations and Uncertainty Assessment
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares FIXED-line broadband internet speeds across five cities:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU
ERRATA: 1.Data is for Q3 2020, but some files are labelled incorrectly as 02-20 of June 20. They all should read Sept 20, or 09-20 as Q3 20, rather than Q2. Will rename and reload. Amended in v7.
*Lines of data for each geojson file; a line equates to a 600m^2 location, inc total tests, devices used, and average upload and download speed:
- MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG 31745 lines => 0.65M speedtests (2.5/100pp)
- BKK 29296 lines => 1.5M speedtests (14.3/100pp)
- LAX 15899 lines => 1.3M speedtests (10.4/100pp)
- ALC 76 lines => 500 speedtests (2/100pp)
Geojsons of these 2* by 2* extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.
This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.
** To Do Will add Google Map versions so everyone can see without installing Jupyter. - Link to Google Map (BKK) added below. Key:Green > 100Mbps(Superfast). Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook. - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plot Top 20% on map for community. Google Map link - now added (and tweet).
** Python
melb = au_tiles.cx[144:146, -39:-37]   # Lat/Lon extract
shg = tiles.cx[120:122, 30:32]         # Lat/Lon extract
bkk = tiles.cx[100:102, 13:15]         # Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35]       # Lat/Lon extract
ALC = tiles.cx[132:134, -22:-24]       # Lat/Lon extract
Histograms (v9), and data visualisations (v3,5,9,11) will be provided. Data Sourced from - This is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).
** VERSIONS
- v24: Add tweet and Google Map of Top 20% (over 100Mbps locations) in Mel Q3 22. Add v1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
- v23: Add graph of 2022 broadband distribution, and compare 2020-2022. Updated v1.4 Jupyter notebook.
- v22: Add Import ipynb; workflow-import-4cities.
- v21: Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)
- v20: Speedtest - Five Cities inc ALC.
- v19: Add ALC2.ipynb.
- v18: Add ALC line graph.
- v17: Added ipynb for ALC. Added ALC to title.
- v16: Load Alice Springs data Q2 21 - csv. Added Google Map link of ALC.
- v15: Load Melb Q1 2021 data - csv.
- v14: Added Melb Q1 2021 data - geojson.
- v13: Added Twitter link to pics.
- v12: Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
- v11: Add Line-Compare pic, plotting four cities on a graph.
- v10: Add four histograms in one pic.
- v9: Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
- v8: Renamed LAX file to Q3, rather than 03.
- v7: Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
- v6: Added LAX file.
- v5: Add screenshot of BKK Google Map.
- v4: Add BKK Google Map (link below), and BKK csv mapping files.
- v3: Replaced MEL map with big key version. Prev key was very tiny in top right corner.
- v2: Uploaded MEL, SHG, BKK data and Jupyter Notebook.
- v1: Metadata record.
** LICENCE AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be: - non-commercial (NC) - reuse must be share-alike (SA)(add same licence). This restricts the standard CC-BY Figshare licence.
** Other uses of Speedtest Open Data; - see link at Speedtest below.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset accompanies the publication "Are pseudo first-order kinetic constants properly calculated for catalytic membranes?" and includes the database containing all the analysed literature, the kinetic rate constants, statistics, and the code used to complete all the analysis and generate the figures for the publication. The study compares different methods of calculating pseudo first-order (PFO) kinetic rate constants for degrading water contaminants using catalytic membranes.

This dataset includes:
- database.xlsx: Excel file containing all details about the surveyed literature and calculated PFO kinetic results.
- catalytic_membrane_meta-analysis-0.1.3.zip: zip folder containing figures from the publication, plus Python code and instructions for generating PFO results and individual figures for every research article listed in database.xlsx.

Data Contents

database.xlsx
- Porous Catalytic Membranes (Tab 1): Index (number associated with analysis and figures generated by the Python codes in catalytic_membrane_meta-analysis-0.1.3.zip), Author full names, Title, Year, Source title, DOI, Abstract, Keywords.
- Data Table (Tab 2): Index (as above), Title, Comments on C/C0 Graph (description of what data was extracted, the format of the data, and any modification required for analysis, if any), Data Location (the location within the research articles or supplementary information files where the C/C0 graphs can be found), Removal Mechanism (the removal mechanism specified by the research articles associated with the C/C0 graphs), and all the PFO kinetic constant results and statistics associated with the different methods of calculating PFO kinetic constants.

catalytic_membrane_meta-analysis-0.1.3.zip
- case_studies: directory containing the data, figures, and code to generate the figures for the four case studies presented in the article.
- concentration_graphs: directory containing subdirectories jpg and csv where concentration graphs (C/C0) are generated by the "data-analysis.py" script.
- examples: directory containing figures and code relating to the comparison of calculation methodologies for PFO kinetic rate constants in the article.
- exp_equil_graphs: directory containing subdirectories jpg and csv where the graphs related to the exponential model with equilibrium constant are generated by the "data-analysis.py" script.
- exponential_graphs: directory containing subdirectories jpg and csv where the graphs related to the exponential model are generated by the "data-analysis.py" script.
- first_order_kinetic_graphs: directory containing subdirectories jpg and csv where the graphs related to the linearized form of C/C0 are generated by the "data-analysis.py" script.
- individual_mechanisms: directory containing the meta-analysis figures for each separate mechanism (found in supplementary information) and the code to generate these figures.
- meta-analysis: directory containing the meta-analysis figure found in the article, a histograms subdirectory containing the histogram figures, and "database.xlsx" from which the data used to generate the figures is drawn.
- model_parameters: directory containing "model_parameters.csv", the output file of the "data-analysis.py" script where all the calculated PFO kinetic constant values and statistics are recorded.
- processed_data: directory containing .csv files associated with each analysed journal article, labelled by the index defined in "database.xlsx". The data is the C/C0 data from each paper after modifications were made as described in the journal article, including normalization and converting the time scale to minutes, etc. These files are used by "data-analysis.py" to generate PFO kinetic constants and the figures associated with each analysis method.
- data-analysis.py: the Python script which should be run to analyse the data and generate figures.
- Licence: the licence agreement for use of the codes and data.
- meta-analysis: the Python script for generating the meta-analysis figures and histogram figures.
- README.md: the details and instructions for using the provided codes.

Usage Notes
Please cite the original publication when using this dataset.

Licensing
This dataset is licensed under CC BY 4.0. See the publication for further details.
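For readers unfamiliar with the PFO model used throughout this dataset, a minimal illustration of the linearized calculation; the C/C0 values below are made up for demonstration and are not drawn from the database:

```python
import numpy as np

# Hypothetical C/C0 measurements over time (minutes); not taken from the dataset.
t = np.array([0, 10, 20, 30, 45, 60], dtype=float)
c_ratio = np.array([1.00, 0.72, 0.55, 0.40, 0.27, 0.18])

# Pseudo first-order model: C/C0 = exp(-k t), i.e. ln(C/C0) = -k t.
# A least-squares fit of ln(C/C0) against t, forced through the origin, gives k.
y = np.log(c_ratio)
k = -np.sum(t * y) / np.sum(t * t)
print(f"PFO rate constant k ~ {k:.4f} 1/min")
```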
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis SPSS files used in the paper to analyze the experiment results. The tests we executed in the paper are as follows, in the SPSS syntax:

** PreQuestionnaire.sav, leading to Table 2
T-TEST GROUPS=form(1 2) /MISSING=ANALYSIS /VARIABLES=grade USLEC UCLEC /CRITERIA=CI(.95).
NPAR TESTS /M-W= CDFAM UCFAM USFAM UCHW USHW CDHW BY form(1 2) /MISSING ANALYSIS.

** Anova.sav, leading to the decision of analyzing the two case studies independently
GLM EntRec EntPre RelRec RelPre TotRec TotPre AdjRelRec AdjRelPre AdjTotRec AdjTotPre BY Domain Form /METHOD=SSTYPE(3) /INTERCEPT=INCLUDE /POSTHOC=Domain Form(TUKEY) /PLOT=PROFILE(Domain*Form) TYPE=LINE ERRORBAR=NO MEANREFERENCE=NO YAXIS=AUTO /PRINT=DESCRIPTIVE ETASQ /CRITERIA=ALPHA(.05) /DESIGN= Domain Form Domain*Form.

** DH.sav, leading to Table 3
T-TEST GROUPS=Form(1 2) /MISSING=ANALYSIS /VARIABLES=EntRec EntPre RelRec RelPre TotRec TotPre AdjRelRec AdjRelPre AdjTotRec AdjTotPre /CRITERIA=CI(.95).

** PH.sav, leading to Table 4
T-TEST GROUPS=Form(1 2) /MISSING=ANALYSIS /VARIABLES=EntRec EntPre RelRec RelPre TotRec TotPre AdjRelRec AdjRelPre AdjTotRec AdjTotPre /CRITERIA=CI(.95).

** Preferences.sav, leading to Table 5 and Table 6
NPAR TESTS /M-W= UCCM USCM UCCDID USCDID UCRID USRID USSTRUCT UCSTRUCT UCOVER USOVER UCREQ USREQ BY Form(1 2) /MISSING ANALYSIS.
EXAMINE VARIABLES=UCCM USCM UCCDID USCDID UCRID USRID USSTRUCT UCSTRUCT UCOVER USOVER UCREQ USREQ BY Form /PLOT HISTOGRAM NPPLOT /STATISTICS DESCRIPTIVES /CINTERVAL 95 /MISSING LISTWISE /NOTOTAL.
NPAR TESTS /M-W= UCCM USCM UCCDID USCDID UCRID USRID USSTRUCT UCSTRUCT UCOVER USOVER UCREQ USREQ BY Form(1 2) /STATISTICS=DESCRIPTIVES /MISSING ANALYSIS.
GLM EntRec EntPre RelRec RelPre TotRec TotPre AdjRelRec AdjRelPre AdjTotRec AdjTotPre BY Domain Form /METHOD=SSTYPE(3) /INTERCEPT=INCLUDE /POSTHOC=Domain Form(TUKEY) /PLOT=PROFILE(Domain*Form) TYPE=LINE ERRORBAR=NO MEANREFERENCE=NO YAXIS=AUTO /PRINT=DESCRIPTIVE ETASQ /CRITERIA=ALPHA(.05) /DESIGN= Domain Form Domain*Form.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains flight statistics for all airports in the United States from January 2011 to December 2020. Each observation is reported by month, year, airport, and airline. Flights can be categorized as on time, delayed, canceled, or diverted. Flight delays are attributed to five causes: carrier, weather, NAS, security, and late aircraft. The data was downloaded from the Bureau of Transportation Statistics website https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp.
The accompanying notebook explores commercial airplane flight delays in the United States using Python's visualization capabilities in Matplotlib and Seaborn, through the lenses of seasonality, airport traffic, and airline performance.
The clean data set (delays_clean.csv) is analyzed using the following visualizations:
- Bar chart
- Bar chart subplots
- Lollipop chart
- Tree maps
- Line plot
- Histogram
- Histogram subplots
- Horizontal stacked bar chart
- Ranked horizontal bar chart
- Box plot
- Pareto chart (double axis)
- Marginal histogram
- Pie charts
- Scatter plot
- Violin plot
- Map chart
- Linear regression
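As one illustrative example of these charts (the column name arr_del15 is a guess at the cleaned file's header, so check delays_clean.csv before running):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# "arr_del15" (flights delayed 15+ minutes) is a hypothetical column name used
# only for illustration -- verify against the actual headers in delays_clean.csv.
delays = pd.read_csv("delays_clean.csv")

sns.histplot(data=delays, x="arr_del15", bins=30)
plt.title("Distribution of delayed arrivals per airport-airline-month record")
plt.xlabel("Delayed flights (15+ minutes)")
plt.ylabel("Count of records")
plt.tight_layout()
plt.show()
```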
This dataset contains information from Turkey's largest online real estate and car sales platform. The dataset covers a 3-month period from January 1, 2023, to March 31, 2023, and focuses solely on Volkswagen brand cars. The dataset consists of 13 variables, including customer_id, advertisement_number, brand, model, variant, year, kilometer, color, transmission, fuel, city, ad_date, and price.
The dataset provides valuable insights into the sales and advertising trends for Volkswagen cars in Turkey during the first quarter of 2023. The data can be used to identify patterns and trends in consumer behavior, such as which models are most popular, the most common transmission type, and the most common fuel type. The data can also be used to evaluate the effectiveness of advertising campaigns and to identify which cities have the highest demand for Volkswagen cars.
Overall, this dataset provides a rich source of information for anyone interested in the automotive industry in Turkey or for those who want to explore the trends in Volkswagen car sales during the first quarter of 2023.
Here are the descriptions of the variables in the dataset:
customer_id: Unique identifier for the customer who placed the advertisement
advertisement_number: Unique identifier number for the advertisement
brand: The brand of the car (in this dataset, it is always Volkswagen)
model: The model of the car (e.g., Golf, Polo, Passat, etc.)
variant: The variant of the car (e.g., 1.6 FSI Midline, 2.0 TDI Comfortline, etc.)
year: The year that the car was manufactured
kilometer: The distance that the car has been driven (in kilometers)
color: The color of the car
transmission: The type of transmission (manual or automatic)
fuel: The type of fuel used by the car (e.g., petrol, diesel, hybrid, etc.)
city: The city where the advertisement was placed
ad_date: The date when the advertisement was placed
price: The asking price for the car
Here are some possible analyses and insights that can be derived from this dataset:
Trend analysis: It is possible to analyze the trend of Volkswagen car sales over the three-month period covered by the dataset. This can be done by plotting the number of advertisements placed over time.
Model popularity analysis: It is possible to determine which Volkswagen car models are the most popular based on the number of advertisements placed for each model. This can be done by grouping the data by model and counting the number of advertisements for each model.
Price analysis: It is possible to analyze the distribution of prices for Volkswagen cars. This can be done by creating a histogram of the prices.
Kilometer analysis: It is possible to analyze the distribution of kilometers driven for Volkswagen cars. This can be done by creating a histogram of the kilometer values.
Geographic analysis: It is possible to analyze the distribution of Volkswagen car sales across different cities. This can be done by grouping the data by city and counting the number of advertisements for each city.
Correlation analysis: It is possible to analyze the correlations between different variables, such as the year and price of the car or the kilometer and price of the car. This can be done by creating scatterplots of the variables and calculating correlation coefficients.
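A short sketch combining the price-histogram and correlation ideas above (volkswagen_ads.csv is a placeholder file name; the column names follow the variable list given earlier):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file name -- substitute the actual dataset file.
ads = pd.read_csv("volkswagen_ads.csv")

# Price distribution.
ads["price"].plot(kind="hist", bins=40)
plt.xlabel("Asking price")
plt.title("Distribution of Volkswagen asking prices")
plt.show()

# Kilometer vs. price relationship, plus correlation coefficients.
ads.plot(kind="scatter", x="kilometer", y="price", alpha=0.3)
plt.title("Price vs. kilometers driven")
plt.show()
print(ads[["kilometer", "price", "year"]].corr())
```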
Data Cleaning: Some data cleaning processes can be performed on the dataset. Firstly, the missing values can be checked, and the missing values may need to be filled or removed from the dataset. Additionally, the date formats in the dataset and the data types of the variables can be checked and adjusted accordingly. Outliers in the dataset may also need to be checked and corrected or removed.
These cleaning processes in the dataset will help obtain healthier results for data analysis and machine learning algorithms.
As a result, this dataset is a workable dataset for data cleaning and a valuable resource that interested parties can use in their data analysis and machine learning projects.
All of these analyses can be visualized using various graphs and charts, such as line charts, histograms, and scatterplots.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used to evaluate the utility of unoccupied aerial vehicles (UAV) for monitoring tidal marsh restoration at a site in Elkhorn Slough, on the Central Coast of California, USA. These data were used to graph percent of total area (histogram) and percent of area vegetated (line) in each elevation bin at the restoration site, up to 2.00 m NAVD88. Percent vegetated area is the area of classified vegetation in an elevation bin out of the total area in that bin.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Assessing bias. Figure S1. Plot of standardised predicted values against standardised residuals. Figure S2. Histogram and P-P plot of standardised residuals. (ZIP 91 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mean first-passage times (unnormalized and normalized) from lung to each target site, obtained by Monte Carlo simulation. Histogram plot is shown in Figure 12.