Hello all, this dataset involves various factors affecting cancer, and based upon those factors I have created histograms of various columns of the table that relate to heart disease. A histogram is a bar-graph-like representation of data that buckets a range of outcomes into columns along the x-axis. The y-axis represents the count or percentage of occurrences in the data for each bucket and can be used to visualize the data distribution. Finally, I have created a combined histogram of the entire table covering all the columns. Setting titles, x-axis and y-axis labels, sizes, and colors is also done in this notebook.
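As a rough sketch of the approach described above (the file name heart.csv and the column age are placeholders, not names taken from the actual dataset):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file name -- substitute the actual dataset file.
df = pd.read_csv("heart.csv")

# Histogram of a single column, with title, axis labels, size, and color set explicitly.
plt.figure(figsize=(8, 5))
plt.hist(df["age"], bins=20, color="teal", edgecolor="black")
plt.title("Distribution of age")
plt.xlabel("Age")
plt.ylabel("Number of records")
plt.show()

# Combined histograms of every numeric column in the table.
df.hist(figsize=(12, 10), color="teal", edgecolor="black")
plt.tight_layout()
plt.show()
```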
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Charts, Histograms, and Time Series
- Create a histogram graph from band values of an image collection
- Create a time series graph from band values of an image collection
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
By Gabe Salzer [source]
This dataset contains essential performance statistics for NBA rookies from 1980-2016. Here you can find per-game minutes, points scored, field goals made and attempted, three-pointers made and attempted, free throws made and attempted (with the respective percentages for each), offensive rebounds, defensive rebounds, assists, steals, blocks, turnovers, efficiency rating, and Hall of Fame induction year. It is organized in descending order by minutes played per game as well as by draft year. This Kaggle dataset is an excellent resource for basketball analysts seeking a better understanding of how rookies have evolved over the years, from their stats to how they were inducted into the Hall of Fame. With its detail on individual players' performance, this dataset allows you to compare performances across different eras in NBA history along with overall trends in rookie statistics. Compare rookies drafted far apart or those that played together, whatever your goal may be!
This dataset is perfect for providing insight into the performance of NBA rookies over an extended period of time. The data covers rookie stats from 1980 to 2016 and includes statistics such as points scored, field goals made, free throw percentage, offensive rebounds, defensive rebounds and assists. It also provides the name of each rookie along with the year they were drafted and their Hall of Fame class.
This data set is useful for researching how rookies’ stats have changed over time in order to compare different eras or identify trends in player performance. It can also be used to evaluate players by comparing their stats against those of other players or previous years’ stats.
In order to use this dataset effectively, a few tips are helpful:
Consider using Field Goal Percentage (FG%), Three Point Percentage (3P%) and Free Throw Percentage (FT%) to measure a player’s efficiency beyond just points scored or field goals made/attempted (FGM/FGA).
Look out for anomalies such as low efficiency ratings despite high minutes played: this could indicate either that a player has not had enough playing time for their statistics to settle into a representative per-game average, or that they simply did not play well over a short period with limited opportunities.
Try different visualizations of the data, such as histograms, line graphs, and scatter plots; each may offer different insights, for example comparisons between individual years versus aggregate trends over multiple years (see the sketch after these tips).
Lastly, it is important to keep in mind whether you are dealing with cumulative totals over multiple seasons versus individual season averages or per-game numbers when analyzing these sets!
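For example, a minimal pandas/Matplotlib sketch of the histogram and line-graph ideas above (the column labels PTS and Year Drafted are assumptions about the CSV header, so verify them against the file before running):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names "PTS" and "Year Drafted" are assumptions -- check the CSV header.
rookies = pd.read_csv("NBA Rookies by Year_Hall of Fame Class.csv")

# Histogram of rookie scoring averages across the whole 1980-2016 span.
rookies["PTS"].plot(kind="hist", bins=30, edgecolor="black")
plt.xlabel("Points per game (rookie season)")
plt.title("Distribution of rookie scoring averages, 1980-2016")
plt.show()

# Line graph of the average rookie scoring by draft year.
rookies.groupby("Year Drafted")["PTS"].mean().plot(kind="line", marker="o")
plt.xlabel("Draft year")
plt.ylabel("Average rookie points per game")
plt.title("Rookie scoring trend by draft year")
plt.show()
```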
- Evaluating the performance of historical NBA rookies over time and how this can help inform future draft picks in the NBA.
- Analysing the relative importance of certain performance stats, such as three-point percentage, to overall success and Hall of Fame induction from 1980-2016.
- Comparing rookie seasons across different years to identify common trends in terms of statistical contributions and development over time
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: you must distribute your contributions under the same license as the original.
  - Keep intact: all notices that refer to this license, including copyright notices.
File: NBA Rookies by Year_Hall of Fame Class.csv

| Column name | Description |
|:------------|:------------|
| Name | The name of... |
https://creativecommons.org/publicdomain/zero/1.0/
By FiveThirtyEight [source]
This dataset contains comprehensive information on NFL player suspensions. It includes detailed information such as the player's name, team they were playing for during the suspension, number of games suspended, and category of suspension. This data is ideal for anyone looking to analyze or research trends in NFL suspensions over time or compare different players' suspension records and can represent an invaluable source for new insights about professional football in America. So dive deep into this repository and see what meaningful stories you can tell—all under the Creative Commons Attribution 4.0 International License and MIT License. If you find this useful, let us know!
Key Columns/Variables
The following is a list of key columns present in this dataset:
- Name: Name of the player who was suspended. (String)
- Team: The team the player was playing for when the suspension was issued. (String)
- Games: The number of games the player was suspended for, including postseason games if applicable. (Integer)
- Category: A description/categorization of why the player was suspended, e.g. 'substance abuse' or 'personal conduct'. (String)
- Desc.: A brief synopsis describing the suspension further; often indicates what action led to the suspension (e.g. drug use). (String)
- Year: The year the suspension originally took place. (Integer)
- Source: Information source behind the suspension data. (String)
#### Exploring and Visualizing the Data
There are a variety of ways you can explore and analyze this data set, including visualizations such as histograms, box plots, and line graphs. You can also explore correlations between variables by performing linear regression, or isolate individual instances by filtering out specific observations, e.g. all substance abuse offences committed by players in 2015. To identify meaningful relationships within the data set, we recommend starting with univariate analysis, i.e. analyzing one variable at a time and looking for patterns that may be indicative of wider trends in the broader population it represents. Below is an example code snippet as a first step towards visualizing your own insights from the NFL suspension data set: generate a histogram showing the distribution of offense categories from 2005 through 2015.
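A minimal sketch of that first step, assuming the CSV listed below is in the working directory and using the column names from the file table (category, year); since category is a categorical field, the "histogram" here is a bar chart of counts:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the suspensions data (assumes the CSV sits in the working directory).
df = pd.read_csv("nfl-suspensions-data.csv")

# Keep suspensions issued between 2005 and 2015 (inclusive).
subset = df[(df["year"] >= 2005) & (df["year"] <= 2015)]

# Count suspensions per category and plot the distribution.
counts = subset["category"].value_counts()
counts.plot(kind="bar", color="steelblue")
plt.title("NFL suspensions by category, 2005-2015")
plt.xlabel("Suspension category")
plt.ylabel("Number of suspensions")
plt.tight_layout()
plt.show()
```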
- An analysis of suspension frequencies over time to determine overall trends in NFL player discipline.
- Comparing the types of suspensions for players on different teams to evaluate any differences in the consequences for violations of team rules and regulations.
- A cross-sectional analysis to assess correlations between types and length of suspensions issued given various violation categories, such as substance abuse or personal conduct violations
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: nfl-suspensions-data.csv

| Column name | Description |
|:------------|:------------|
| name | Name of the player who was suspended. (String) |
| team | The team the player was suspended from. (String) |
| games | The number of games the player was suspended for. (Integer) |
| category | The category of the suspension. (String) |
| desc. | A description of the suspension. (String) |
| year | The year the suspension occurred. (Integer) |
| source | The source of the suspension information. (String) |
If you use this dataset in your research, please credit the original authors (FiveThirtyEight). Data Source
Histogram plot of the average alignment accuracy averaged over 10 runs for each viral genome shown in Table 1 and each aligner. Reads crossing splice junction regions are shown in pink; reads not crossing splice junction regions are shown in blue.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Do you "see" what I "see"? A Multi-panel Visual Question and Answer Dataset for Large Language Model Chart Analysis
Publication: TBD (linked on publication)
GitHub Repo: TBD (linked on publication)
This is a multi-panel figure dataset for visual question and answer (VQA) to test large language/multimodal models (LMMs). Data contains synthetically generated multi-panel figures with histogram, scatter, and line plots. Included are full data used to create plots… See the full description on the dataset page: https://huggingface.co/datasets/ReadingTimeMachine/visual_qa_multipanel.
By Center for Municipal Finance [source]
The project that led to the creation of this dataset received funding from the Center for Corporate and Securities Law at the University of San Diego School of Law. The dataset itself can be accessed through a GitHub repository or on its dedicated website.
In terms of columns contained in this dataset, it encompasses a range of variables relevant to analyzing credit ratings. However, specific details about these columns are not provided in the given information. To acquire a more accurate understanding of the column labels and their corresponding attributes or measurements present in this dataset, further exploration or referencing additional resources may be required
Understanding the Data
The dataset consists of several columns that provide essential information about credit ratings and fixed income securities. Familiarize yourself with the column names and their meanings to better understand the data:
- Column 1: [Credit Agency]
- Column 2: [Issuer Name]
- Column 3: [CUSIP/ISIN]
- Column 4: [Rating Type]
- Column 5: [Rating Source]
- Column 6: [Rating Date]
Exploratory Data Analysis (EDA)
Before diving into detailed analysis, start by performing exploratory data analysis to get an overview of the dataset.
Identify Unique Values: Explore each column's unique values to understand rating agencies, issuers, rating types, sources, etc.
Frequency Distribution: Analyze the frequency distribution of various attributes like credit agencies or rating types to identify any imbalances or biases in the data.
Data Visualization
Visualizing your data can provide insights that are difficult to derive from tabular representation alone. Utilize various visualization techniques such as bar charts, pie charts, histograms, or line graphs based on your specific objectives.
For example:
- Plotting a histogram of each credit agency's ratings can help you understand their distribution across different categories.
- A time-series line graph can show how ratings have evolved over time for specific issuers or industries.
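A minimal sketch of the first idea; the file name ratings.csv and the column labels Credit Agency and Rating are placeholders (the actual labels should be mapped from the dataset once inspected):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file and column names -- adjust to the real dataset.
ratings = pd.read_csv("ratings.csv")

# One distribution plot (bar chart of rating categories) per credit agency.
for agency, group in ratings.groupby("Credit Agency"):
    group["Rating"].value_counts().sort_index().plot(kind="bar")
    plt.title(f"Rating distribution: {agency}")
    plt.xlabel("Rating category")
    plt.ylabel("Number of ratings")
    plt.tight_layout()
    plt.show()
```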
Analyzing Ratings Performance
One of the main objectives of using credit rating datasets is to assess the performance and accuracy of different credit agencies. Conducting a thorough analysis can help you understand how ratings have changed over time and evaluate the consistency of each agency's ratings.
Rating Changes Over Time: Analyze how ratings for specific issuers or industries have changed over different periods.
Comparing Rating Agencies: Compare ratings from different agencies to identify any discrepancies or trends. Are there consistent differences in their assessments?
Detecting Rating Trends
The dataset allows you to detect trends and correlations between various factors related to
- Credit Rating Analysis: This dataset can be used for analyzing credit ratings and trends of various fixed income securities. It provides historical credit rating data from different rating agencies, allowing researchers to study the performance, accuracy, and consistency of these ratings over time.
- Comparative Analysis: The dataset allows for comparative analysis between different agencies' credit ratings for a specific security or issuer. Researchers can compare the ratings assigned by different agencies and identify any discrepancies or differences in their assessments. This analysis can help in understanding variations in methodologies and improving the transparency of credit rating processes
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: you must distribute your contributions under the same license as the original.
  - Keep intact: all ...
Figures containing a histogram of frequency of effect sizes on AG and BG herbivores and a funnel plot of effect size and sample sizes indicating absence of publication bias.
https://spdx.org/licenses/etalab-2.0.html
Main publication: Poll report and form on HAL
Authors: The raw data was generated by the poll respondents. The authors of this dataset, excluding Vlad Visan, are such respondents; there are also other respondents who chose to remain anonymous. The script was written by Vlad Visan. The raw format was adapted to a numerical format by Vlad Visan.
Overall description: A poll took place in February 2024 to understand the administrative burden of using Galaxy, specifically for small-scale admins.
Context: Useful to anyone considering using Galaxy. Done as part of the technology monitoring phase of the "Gestionnaire de workflows" (Workflow Management System) project of the OSUG LabEx.
File descriptions:
- raw_data_names_removed.tsv: Raw poll answers, with any personally identifiable information redacted.
- SSA-Poll-19-Feb-2024-Filtered-Numerical.tab: The numerically filtered format required by the script. The transformation could be done automatically in the future, but there are some subtleties: "-1" denotes "ignore/invalid"; some empty answers have to be manually converted to "0"; one answer was manually changed from "0" to "-1" after reading the associated comment, which made it clear that "invalid" was more appropriate.
- numericalCsvImportAndGenerateCharts.R: The script parses the data and creates one distribution/histogram graph per column. It expects a filtered version, with only the numerical fields.
- Form-V2.pdf: Survey questions, with several errors corrected: end-user assistance questions were worded wrongly; various spelling/wording mistakes.
http://opendatacommons.org/licenses/dbcl/1.0/
DESCRIPTION
NIDDK (National Institute of Diabetes and Digestive and Kidney Diseases) research creates knowledge about and treatments for the most chronic, costly, and consequential diseases.
The dataset used in this project is originally from NIDDK. The objective is to predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. Build a model to accurately predict whether the patients in the dataset have diabetes or not.
Dataset Description
The dataset consists of several medical predictor variables and one target variable (Outcome). Predictor variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and more.
| Variable | Description |
|:---------|:------------|
| Pregnancies | Number of times pregnant |
| Glucose | Plasma glucose concentration in an oral glucose tolerance test |
| BloodPressure | Diastolic blood pressure (mm Hg) |
| SkinThickness | Triceps skinfold thickness (mm) |
| Insulin | Two hour serum insulin |
| BMI | Body Mass Index |
| DiabetesPedigreeFunction | Diabetes pedigree function |
| Age | Age in years |
| Outcome | Class variable (either 0 or 1); 268 of 768 values are 1, the others are 0 |

Project Task: Week 1
Data Exploration:
Perform descriptive analysis. Understand the variables and their corresponding values. In the columns below, a value of zero does not make sense and thus indicates a missing value:
Glucose
BloodPressure
SkinThickness
Insulin
BMI
Visually explore these variables using histograms. Treat the missing values accordingly.
There are integer and float data type variables in this dataset. Create a count (frequency) plot describing the data types and the count of variables.
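A minimal sketch of this exploration step, assuming the data has been loaded into a pandas DataFrame df (the file name diabetes.csv is a placeholder):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file name -- substitute the actual NIDDK data file.
df = pd.read_csv("diabetes.csv")

# Treat zeros in these columns as missing values.
cols_with_hidden_missing = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
df[cols_with_hidden_missing] = df[cols_with_hidden_missing].replace(0, np.nan)

# Histograms of the affected variables.
df[cols_with_hidden_missing].hist(bins=20, figsize=(10, 6))
plt.tight_layout()
plt.show()

# Count (frequency) plot of variable data types.
df.dtypes.value_counts().plot(kind="bar")
plt.title("Count of variables by data type")
plt.xlabel("Data type")
plt.ylabel("Number of variables")
plt.show()
```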
Data Exploration:
Check the balance of the data by plotting the count of outcomes by their value. Describe your findings and plan future course of action.
Create scatter charts between the pair of variables to understand the relationships. Describe your findings.
Perform correlation analysis. Visually explore it using a heat map.
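And a possible heat-map step, continuing with the same df from the sketch above (seaborn assumed available):

```python
import seaborn as sns
import matplotlib.pyplot as plt

# df is the DataFrame prepared in the previous sketch.
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation heat map")
plt.show()
```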
Project Task: Week 2
Data Modeling:
Devise strategies for model building. It is important to decide the right validation framework. Express your thought process.
Apply an appropriate classification algorithm to build a model.
Compare various models with the results from the KNN algorithm.
Create a classification report by analyzing sensitivity, specificity, AUC (ROC curve), etc.
Please be descriptive and explain which values of these parameters you have used.
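One possible sketch of a KNN baseline with scikit-learn, continuing from the df prepared in the exploration steps; the choice of k=5 and a 75/25 split are arbitrary starting points, not values prescribed by the project brief:

```python
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, roc_auc_score

# df is the DataFrame from the exploration sketch; Outcome is the target variable.
X = df.drop(columns=["Outcome"])
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Impute the NaNs introduced earlier, scale the features, then fit KNN.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```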
Data Reporting:
Create a dashboard in tableau by choosing appropriate chart types and metrics useful for the business. The dashboard must entail the following:
Pie chart to describe the diabetic or non-diabetic population
Scatter charts between relevant variables to analyze the relationships
Histogram or frequency charts to analyze the distribution of the data
Heatmap of correlation analysis among the relevant variables
Create bins of these age values: 20-25, 25-30, 30-35, etc. Analyze different variables for these age brackets using a bubble chart.
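If the age brackets are prepared in pandas before loading the data into Tableau, pd.cut is one way to build them (a sketch, reusing the df from the exploration steps; the upper bound of 85 is an assumption):

```python
import pandas as pd

# Build 5-year age brackets: 20-25, 25-30, ..., 80-85 (right-open intervals).
bins = list(range(20, 90, 5))
labels = [f"{lo}-{lo + 5}" for lo in bins[:-1]]
df["AgeBracket"] = pd.cut(df["Age"], bins=bins, labels=labels, right=False)
print(df["AgeBracket"].value_counts().sort_index())
```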
Coastal storms and other meteorological phenomena can have a significant impact on how high water levels rise and how often. The inundation analysis program is extremely beneficial in determining the frequency (the occurrence of high waters for different elevations above a specified threshold) and duration (the amount of time that the specified location is inundated by water) of observed high waters (tides). Statistical output from these analyses can be useful in planning marsh restoration activities. Additionally, the analyses have broader applications for the coastal engineering and mapping community, such as ecosystem management and regional climate change. Since these statistical outputs are station specific, use for evaluating surrounding areas may be limited.

Products
The data input for this tool is a 6-minute water level data time series and the tabulated times and heights of the high tides over a user-specified time period, relative to a desired tidal datum or user-specified datum. The data output of this tool provides summary statistics, which include the number of occurrences of inundation above the threshold (events) and the length of duration of inundation of each event above the threshold elevation for a specified time period. In addition to summary statistics, graphical outputs are provided using three plots. The first plot is a histogram of frequency of occurrence relative to the threshold elevation, the second plot is a histogram of the frequency of duration of inundation, and the third plot is an X-Y plot of frequency of elevation versus duration of inundation for each event. Input data time series are presently limited to the verified data from a set of operating and historical tide stations in the NOAA CO-OPS data base.

Data and Resources
- CO-OPS Frequency and Duration of Inundation Analysis Tool: User's Guide
- CO-OPS Frequency and Duration of Inundation Analysis Tool: Limitations and Uncertainty Assessment
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset compares FIXED-line broadband internet speeds across five cities:
- Melbourne, AU
- Bangkok, TH
- Shanghai, CN
- Los Angeles, US
- Alice Springs, AU
ERRATA: 1.Data is for Q3 2020, but some files are labelled incorrectly as 02-20 of June 20. They all should read Sept 20, or 09-20 as Q3 20, rather than Q2. Will rename and reload. Amended in v7.
*Lines of data for each geojson file; a line equates to a 600m^2 location, inc total tests, devices used, and average upload and download speed:
- MEL 16181 locations/lines => 0.85M speedtests (16.7 tests per 100 people)
- SHG 31745 lines => 0.65M speedtests (2.5/100pp)
- BKK 29296 lines => 1.5M speedtests (14.3/100pp)
- LAX 15899 lines => 1.3M speedtests (10.4/100pp)
- ALC 76 lines => 500 speedtests (2/100pp)
Geojsons of these 2* by 2* extracts for MEL, BKK, SHG now added, and LAX added v6. Alice Springs added v15.
This dataset unpacks, geospatially, data summaries provided in Speedtest Global Index (linked below). See Jupyter Notebook (*.ipynb) to interrogate geo data. See link to install Jupyter.
** To Do Will add Google Map versions so everyone can see without installing Jupyter. - Link to Google Map (BKK) added below. Key:Green > 100Mbps(Superfast). Black > 500Mbps (Ultrafast). CSV provided. Code in Speedtestv1.1.ipynb Jupyter Notebook. - Community (Whirlpool) surprised [Link: https://whrl.pl/RgAPTl] that Melb has 20% at or above 100Mbps. Suggest plot Top 20% on map for community. Google Map link - now added (and tweet).
** Python
melb = au_tiles.cx[144:146, -39:-37]   # Lat/Lon extract
shg = tiles.cx[120:122, 30:32]         # Lat/Lon extract
bkk = tiles.cx[100:102, 13:15]         # Lat/Lon extract
lax = tiles.cx[-118:-120, 33:35]       # Lat/Lon extract
ALC = tiles.cx[132:134, -22:-24]       # Lat/Lon extract
Histograms (v9), and data visualisations (v3,5,9,11) will be provided. Data Sourced from - This is an extract of Speedtest Open data available at Amazon WS (link below - opendata.aws).
** VERSIONS
- v24: Add tweet and Google Map of Top 20% (over 100Mbps locations) in Mel Q3 22. Add v1.5 MEL-Superfast notebook, and CSV of results (now on Google Map; link below).
- v23: Add graph of 2022 broadband distribution, and compare 2020-2022. Updated v1.4 Jupyter notebook.
- v22: Add Import ipynb; workflow-import-4cities.
- v21: Add Q3 2022 data; five cities inc ALC. Geojson files. (2020: 4.3M tests; 2022: 2.9M tests)
- v20: Speedtest - Five Cities inc ALC.
- v19: Add ALC2.ipynb.
- v18: Add ALC line graph.
- v17: Added ipynb for ALC. Added ALC to title.
- v16: Load Alice Springs data Q2 21 - csv. Added Google Map link of ALC.
- v15: Load Melb Q1 2021 data - csv.
- v14: Added Melb Q1 2021 data - geojson.
- v13: Added Twitter link to pics.
- v12: Add Line-Compare pic (fastest 1000 locations) inc Jupyter (nbn-intl-v1.2.ipynb).
- v11: Add Line-Compare pic, plotting four cities on a graph.
- v10: Add four histograms in one pic.
- v9: Add histogram for four cities. Add NBN-Intl.v1.1.ipynb (Jupyter Notebook).
- v8: Renamed LAX file to Q3, rather than 03.
- v7: Amended file names of BKK files to correctly label as Q3, not Q2 or 06.
- v6: Added LAX file.
- v5: Add screenshot of BKK Google Map.
- v4: Add BKK Google Map (link below), and BKK csv mapping files.
- v3: Replaced MEL map with big key version. Prev key was very tiny in top right corner.
- v2: Uploaded MEL, SHG, BKK data and Jupyter Notebook.
- v1: Metadata record.
** LICENCE AWS data licence on Speedtest data is "CC BY-NC-SA 4.0", so use of this data must be: - non-commercial (NC) - reuse must be share-alike (SA)(add same licence). This restricts the standard CC-BY Figshare licence.
** Other uses of Speedtest Open Data; - see link at Speedtest below.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset accompanies the publication "Are pseudo first-order kinetic constants properly calculated for catalytic membranes?" and includes the database containing all the analysed literature, the kinetic rate constants, statistics, and the code used to complete all the analysis and generate the figures for the publication. The study compares different methods of calculating pseudo first-order (PFO) kinetic rate constants for degrading water contaminants using catalytic membranes.

This dataset includes:
- database.xlsx: Excel file containing all details about the surveyed literature and calculated PFO kinetic results.
- catalytic_membrane_meta-analysis-0.1.3.zip: zip folder containing figures from the publication, plus Python code and instructions for generating PFO results and individual figures for every research article listed in database.xlsx.

Data Contents

database.xlsx
- Porous Catalytic Membranes (Tab 1): Index (number associated with analysis and figures generated by the Python codes in catalytic_membrane_meta-analysis-0.1.3.zip), Author full names, Title, Year, Source title, DOI, Abstract, Keywords.
- Data Table (Tab 2): Index (as above), Title, Comments on C/C0 Graph (description of what data was extracted, the format of the data, and any modification required for analysis, if any), Data Location (the location within the research articles or supplementary information files where the C/C0 graphs can be found), Removal Mechanism (the removal mechanism specified by the research articles associated with the C/C0 graphs), and all the PFO kinetic constant results and statistics associated with the different methods of calculating PFO kinetic constants.

catalytic_membrane_meta-analysis-0.1.3.zip
- case_studies: directory containing the data, figures, and code to generate the figures for the four case studies presented in the article.
- concentration_graphs: directory containing subdirectories jpg and csv where concentration graphs (C/C0) are generated by the "data-analysis.py" script.
- examples: directory containing figures and code relating to the comparison of calculation methodologies for PFO kinetic rate constants in the article.
- exp_equil_graphs: directory containing subdirectories jpg and csv where the graphs related to the exponential model with equilibrium constant are generated by the "data-analysis.py" script.
- exponential_graphs: directory containing subdirectories jpg and csv where the graphs related to the exponential model are generated by the "data-analysis.py" script.
- first_order_kinetic_graphs: directory containing subdirectories jpg and csv where the graphs related to the linearized form of C/C0 are generated by the "data-analysis.py" script.
- individual_mechanisms: directory containing the meta-analysis figures for each separate mechanism (found in supplementary information) and the code to generate these figures.
- meta-analysis: directory containing the meta-analysis figure found in the article, a histograms subdirectory containing the histogram figures, and "database.xlsx" from which the data used to generate the figures is drawn.
- model_parameters: directory containing "model_parameters.csv", the output file of the "data-analysis.py" script where all the calculated PFO kinetic constant values and statistics are recorded.
- processed_data: directory containing .csv files associated with each analysed journal article, labelled by the index defined in "database.xlsx". The data is the C/C0 data from each paper after modifications were made as described in the journal article, including normalization and converting the time scale to minutes, etc. These files are used by "data-analysis.py" to generate PFO kinetic constants and the figures associated with each analysis method.
- data-analysis.py: the Python script which should be run to analyse the data and generate figures.
- Licence: the licence agreement for use of the codes and data.
- meta-analysis: the Python script for generating the meta-analysis figures and histogram figures.
- README.md: the details and instructions for using the provided codes.

Usage Notes
Please cite the original publication when using this dataset.

Licensing
This dataset is licensed under CC BY 4.0. See the publication for further details.
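For readers unfamiliar with the PFO model used throughout this dataset, a minimal illustration of the linearized calculation; the C/C0 values below are made up for demonstration and are not drawn from the database:

```python
import numpy as np

# Hypothetical C/C0 measurements over time (minutes); not taken from the dataset.
t = np.array([0, 10, 20, 30, 45, 60], dtype=float)
c_ratio = np.array([1.00, 0.72, 0.55, 0.40, 0.27, 0.18])

# Pseudo first-order model: C/C0 = exp(-k t), i.e. ln(C/C0) = -k t.
# A least-squares fit of ln(C/C0) against t, forced through the origin, gives k.
y = np.log(c_ratio)
k = -np.sum(t * y) / np.sum(t * t)
print(f"PFO rate constant k ~ {k:.4f} 1/min")
```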
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis SPSS files used in the paper to analyze the experiment results. The tests we executed in the paper are as follows, in the SPSS syntax:

** PreQuestionnaire.sav, leading to Table 2
T-TEST GROUPS=form(1 2) /MISSING=ANALYSIS /VARIABLES=grade USLEC UCLEC /CRITERIA=CI(.95).
NPAR TESTS /M-W= CDFAM UCFAM USFAM UCHW USHW CDHW BY form(1 2) /MISSING ANALYSIS.

** Anova.sav, leading to the decision of analyzing the two case studies independently
GLM EntRec EntPre RelRec RelPre TotRec TotPre AdjRelRec AdjRelPre AdjTotRec AdjTotPre BY Domain Form /METHOD=SSTYPE(3) /INTERCEPT=INCLUDE /POSTHOC=Domain Form(TUKEY) /PLOT=PROFILE(Domain*Form) TYPE=LINE ERRORBAR=NO MEANREFERENCE=NO YAXIS=AUTO /PRINT=DESCRIPTIVE ETASQ /CRITERIA=ALPHA(.05) /DESIGN= Domain Form Domain*Form.

** DH.sav, leading to Table 3
T-TEST GROUPS=Form(1 2) /MISSING=ANALYSIS /VARIABLES=EntRec EntPre RelRec RelPre TotRec TotPre AdjRelRec AdjRelPre AdjTotRec AdjTotPre /CRITERIA=CI(.95).

** PH.sav, leading to Table 4
T-TEST GROUPS=Form(1 2) /MISSING=ANALYSIS /VARIABLES=EntRec EntPre RelRec RelPre TotRec TotPre AdjRelRec AdjRelPre AdjTotRec AdjTotPre /CRITERIA=CI(.95).

** Preferences.sav, leading to Table 5 and Table 6
NPAR TESTS /M-W= UCCM USCM UCCDID USCDID UCRID USRID USSTRUCT UCSTRUCT UCOVER USOVER UCREQ USREQ BY Form(1 2) /MISSING ANALYSIS.
EXAMINE VARIABLES=UCCM USCM UCCDID USCDID UCRID USRID USSTRUCT UCSTRUCT UCOVER USOVER UCREQ USREQ BY Form /PLOT HISTOGRAM NPPLOT /STATISTICS DESCRIPTIVES /CINTERVAL 95 /MISSING LISTWISE /NOTOTAL.
NPAR TESTS /M-W= UCCM USCM UCCDID USCDID UCRID USRID USSTRUCT UCSTRUCT UCOVER USOVER UCREQ USREQ BY Form(1 2) /STATISTICS=DESCRIPTIVES /MISSING ANALYSIS.
GLM EntRec EntPre RelRec RelPre TotRec TotPre AdjRelRec AdjRelPre AdjTotRec AdjTotPre BY Domain Form /METHOD=SSTYPE(3) /INTERCEPT=INCLUDE /POSTHOC=Domain Form(TUKEY) /PLOT=PROFILE(Domain*Form) TYPE=LINE ERRORBAR=NO MEANREFERENCE=NO YAXIS=AUTO /PRINT=DESCRIPTIVE ETASQ /CRITERIA=ALPHA(.05) /DESIGN= Domain Form Domain*Form.
http://opendatacommons.org/licenses/dbcl/1.0/
This dataset contains flight statistics for all airports in the United States from January 2011 to December 2020. Each observation is reported by month, year, airport, and airline. Flights can be categorized as on time, delayed, canceled, or diverted. Flight delays are attributed to five causes: carrier, weather, NAS, security, and late aircraft. The data was downloaded from the Bureau of Transportation Statistics website https://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp.
The accompanying notebook explores commercial airplane flight delays in the United States using Python's visualization capabilities in Matplotlib and Seaborn, through the lenses of seasonality, airport traffic, and airline performance.
The clean data set (delays_clean.csv) is analyzed using the following visualizations:
- Bar chart
- Bar chart subplots
- Lollipop chart
- Tree maps
- Line plot
- Histogram
- Histogram subplots
- Horizontal stacked bar chart
- Ranked horizontal bar chart
- Box plot
- Pareto chart (double axis)
- Marginal histogram
- Pie charts
- Scatter plot
- Violin plot
- Map chart
- Linear regression
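As one illustrative example of these charts (the column name arr_del15 is a guess at the cleaned file's header, so check delays_clean.csv before running):

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# "arr_del15" (flights delayed 15+ minutes) is a hypothetical column name used
# only for illustration -- verify against the actual headers in delays_clean.csv.
delays = pd.read_csv("delays_clean.csv")

sns.histplot(data=delays, x="arr_del15", bins=30)
plt.title("Distribution of delayed arrivals per airport-airline-month record")
plt.xlabel("Delayed flights (15+ minutes)")
plt.ylabel("Count of records")
plt.tight_layout()
plt.show()
```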
This dataset contains information from Turkey's largest online real estate and car sales platform. The dataset covers a 3-month period from January 1, 2023, to March 31, 2023, and focuses solely on Volkswagen brand cars. The dataset consists of 13 variables, including customer_id, advertisement_number, brand, model, variant, year, kilometer, color, transmission, fuel, city, ad_date, and price.
The dataset provides valuable insights into the sales and advertising trends for Volkswagen cars in Turkey during the first quarter of 2023. The data can be used to identify patterns and trends in consumer behavior, such as which models are most popular, the most common transmission type, and the most common fuel type. The data can also be used to evaluate the effectiveness of advertising campaigns and to identify which cities have the highest demand for Volkswagen cars.
Overall, this dataset provides a rich source of information for anyone interested in the automotive industry in Turkey or for those who want to explore the trends in Volkswagen car sales during the first quarter of 2023.
Here are the descriptions of the variables in the dataset:
customer_id: Unique identifier for the customer who placed the advertisement
advertisement_number: Unique identifier number for the advertisement
brand: The brand of the car (in this dataset, it is always Volkswagen)
model: The model of the car (e.g., Golf, Polo, Passat, etc.)
variant: The variant of the car (e.g., 1.6 FSI Midline, 2.0 TDI Comfortline, etc.)
year: The year that the car was manufactured
kilometer: The distance that the car has been driven (in kilometers)
color: The color of the car
transmission: The type of transmission (manual or automatic)
fuel: The type of fuel used by the car (e.g., petrol, diesel, hybrid, etc.)
city: The city where the advertisement was placed
ad_date: The date when the advertisement was placed
price: The asking price for the car
Here are some possible analyses and insights that can be derived from this dataset:
Trend analysis: It is possible to analyze the trend of Volkswagen car sales over the three-month period covered by the dataset. This can be done by plotting the number of advertisements placed over time.
Model popularity analysis: It is possible to determine which Volkswagen car models are the most popular based on the number of advertisements placed for each model. This can be done by grouping the data by model and counting the number of advertisements for each model.
Price analysis: It is possible to analyze the distribution of prices for Volkswagen cars. This can be done by creating a histogram of the prices.
Kilometer analysis: It is possible to analyze the distribution of kilometers driven for Volkswagen cars. This can be done by creating a histogram of the kilometer values.
Geographic analysis: It is possible to analyze the distribution of Volkswagen car sales across different cities. This can be done by grouping the data by city and counting the number of advertisements for each city.
Correlation analysis: It is possible to analyze the correlations between different variables, such as the year and price of the car or the kilometer and price of the car. This can be done by creating scatterplots of the variables and calculating correlation coefficients.
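A short sketch combining the price-histogram and correlation ideas above (volkswagen_ads.csv is a placeholder file name; the column names follow the variable list given earlier):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder file name -- substitute the actual dataset file.
ads = pd.read_csv("volkswagen_ads.csv")

# Price distribution.
ads["price"].plot(kind="hist", bins=40)
plt.xlabel("Asking price")
plt.title("Distribution of Volkswagen asking prices")
plt.show()

# Kilometer vs. price relationship, plus correlation coefficients.
ads.plot(kind="scatter", x="kilometer", y="price", alpha=0.3)
plt.title("Price vs. kilometers driven")
plt.show()
print(ads[["kilometer", "price", "year"]].corr())
```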
Data Cleaning: Some data cleaning processes can be performed on the dataset. Firstly, the missing values can be checked, and the missing values may need to be filled or removed from the dataset. Additionally, the date formats in the dataset and the data types of the variables can be checked and adjusted accordingly. Outliers in the dataset may also need to be checked and corrected or removed.
These cleaning processes in the dataset will help obtain healthier results for data analysis and machine learning algorithms.
As a result, this dataset is a workable dataset for data cleaning and a valuable resource that interested parties can use in their data analysis and machine learning projects.
All of these analyses can be visualized using various graphs and charts, such as line charts, histograms, and scatterplots.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was used to evaluate the utility of unoccupied aerial vehicles (UAV) for monitoring tidal marsh restoration at a site in Elkhorn Slough, on the Central Coast of California, USA. These data were used to graph percent of total area (histogram) and percent of area vegetated (line) in each elevation bin at the restoration site, up to 2.00 m NAVD88. Percent vegetated area is the area of classified vegetation in an elevation bin out of the total area in that bin.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Assessing bias. Figure S1. Plot of standardised predicted values against standardised residuals. Figure S2. Histogram and P-P plot of standardised residuals. (ZIP 91 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mean first-passage times (unnormalized and normalized) from lung to each target site, obtained by Monte Carlo simulation. Histogram plot is shown in Figure 12.