Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figures in scientific publications are critically important because they often show the data supporting key findings. Our systematic review of research articles published in top physiology journals (n = 703) suggests that, as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies. Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics. We recommend training investigators in data presentation, encouraging a more complete presentation of data, and changing journal editorial policies. Investigators can quickly make univariate scatterplots for small sample size studies using our Excel templates.
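A univariate scatterplot of this kind is also easy to produce outside Excel. The sketch below is a minimal matplotlib illustration with made-up example data (it is not the authors' Excel template); it plots every observation per group next to the group mean instead of hiding the data behind a bar.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Hypothetical small-sample data for two groups (n = 8 each).
groups = {"Control": rng.normal(10, 2, 8), "Treated": rng.normal(13, 2, 8)}

fig, ax = plt.subplots()
for i, (label, values) in enumerate(groups.items()):
    x = np.full(len(values), float(i)) + rng.uniform(-0.08, 0.08, len(values))  # jitter for visibility
    ax.plot(x, values, "o", label=label)        # every observation is visible
    ax.hlines(values.mean(), i - 0.2, i + 0.2)  # group mean as a short line
ax.set_xticks(range(len(groups)), labels=groups.keys())
ax.set_ylabel("Measured value")
plt.show()
```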
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Figure S1. Illustration of indicator mineral map datasets. Figure S2. Illustration of fault map datasets. Figure S3. Fault system at CGF. Figure S4. Fault system at BGF and DPGF. Figure S5. Illustration of LST datasets. Figure S6. Histograms and CDF plots of Two-Class Mineral Maps versus Fault Distance Maps. Figure S7. Histograms and CDF plots of Two-Class Mineral Maps versus Fault Density Maps. Figure S8. Histograms and CDF plots of Two-Class Temperature Maps versus fault datasets. The top two rows correspond to Fault Distance Maps, while the bottom two rows correspond to Fault Density Maps. Figure S9. Histograms and CDF plots of Two-Class Mineral Maps versus Multiclass Temperature Map. Figure S10. The multiple comparisons of ANOVA. The plots show the mean estimates (circles) and 95% confidence intervals (bars) for each group of SGP. Red symbols highlight groups with significant differences from the control group (blue). Grey symbols indicate groups with insignificant differences, where confidence intervals overlap with the control group.
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
By FiveThirtyEight [source]
This dataset contains comprehensive information on NFL player suspensions. It includes detailed information such as the player's name, team they were playing for during the suspension, number of games suspended, and category of suspension. This data is ideal for anyone looking to analyze or research trends in NFL suspensions over time or compare different players' suspension records and can represent an invaluable source for new insights about professional football in America. So dive deep into this repository and see what meaningful stories you can tell—all under the Creative Commons Attribution 4.0 International License and MIT License. If you find this useful, let us know!
Key Columns/Variables
The following key columns are present in this dataset:
- Name: Name of the player who was suspended. (String)
- Team: The team the player was playing for when the suspension was issued. (String)
- Games: The number of games the player was suspended for, including postseason games if applicable. (Integer)
- Category: A categorization of why the player was suspended, e.g. 'substance abuse' or 'personal conduct'. (String)
- Desc.: A brief synopsis describing the suspension further; often indicates what action led to the suspension (e.g. drug use). (String)
- Year: The year the suspension originally took place. (Integer)
- Source: The information source behind the suspension data. (String)
#### Exploring and Visualizing the Data
There are a variety of ways you can explore and analyze this data set, including visualizations such as histograms, box plots, and line graphs. You can also explore correlations between variables by performing linear regression, or isolate individual instances by filtering on specific observations, e.g. all substance abuse offences committed by players in 2015. To identify meaningful relationships within the data set, we recommend performing univariate analysis, i.e. analyzing one variable at a time and looking for patterns that may be indicative of wider trends in the broader population it represents. Here is an example code snippet as a first step towards visualizing your own insights from the NFL suspension data set: generate a histogram showing the distribution of offense categories from 2005 through 2015.
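A minimal sketch of that snippet, assuming pandas and matplotlib are installed and that the column names match the file description further below (year, category):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the suspension data; column names as listed in the file description below.
df = pd.read_csv("nfl-suspensions-data.csv")

# Keep suspensions issued from 2005 through 2015 and count each offense category.
counts = df[df["year"].between(2005, 2015)]["category"].value_counts()

# Bar chart of category frequencies (a histogram over a categorical variable).
counts.plot(kind="bar")
plt.xlabel("Suspension category")
plt.ylabel("Number of suspensions")
plt.title("NFL suspensions by category, 2005-2015")
plt.tight_layout()
plt.show()
```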
- An analysis of suspension frequencies over time to determine overall trends in NFL player discipline.
- Comparing the types of suspensions for players on different teams to evaluate any differences in the consequences for violations of team rules and regulations.
- A cross-sectional analysis to assess correlations between the type and length of suspensions issued for various violation categories, such as substance abuse or personal conduct violations.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: nfl-suspensions-data.csv

| Column name | Description |
|:------------|:-------------------------------------------------------------|
| name        | Name of the player who was suspended. (String)               |
| team        | The team the player was suspended from. (String)              |
| games       | The number of games the player was suspended for. (Integer)   |
| category    | The category of the suspension. (String)                      |
| desc.       | A description of the suspension. (String)                     |
| year        | The year the suspension occurred. (Integer)                   |
| source      | The source of the suspension information. (String)            |
If you use this dataset in your research, please credit FiveThirtyEight.
By Center for Municipal Finance [source]
The project that led to the creation of this dataset received funding from the Center for Corporate and Securities Law at the University of San Diego School of Law. The dataset itself can be accessed through a GitHub repository or on its dedicated website.
In terms of columns, this dataset encompasses a range of variables relevant to analyzing credit ratings. However, specific details about these columns are not provided in the given information. To gain a more accurate understanding of the column labels and their corresponding attributes or measurements, further exploration or reference to additional resources may be required.
Understanding the Data
The dataset consists of several columns that provide essential information about credit ratings and fixed income securities. Familiarize yourself with the column names and their meanings to better understand the data:
- Column 1: [Credit Agency]
- Column 2: [Issuer Name]
- Column 3: [CUSIP/ISIN]
- Column 4: [Rating Type]
- Column 5: [Rating Source]
- Column 6: [Rating Date]
Exploratory Data Analysis (EDA)
Before diving into detailed analysis, start by performing exploratory data analysis to get an overview of the dataset.
Identify Unique Values: Explore each column's unique values to understand rating agencies, issuers, rating types, sources, etc.
Frequency Distribution: Analyze the frequency distribution of various attributes like credit agencies or rating types to identify any imbalances or biases in the data.
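A brief EDA sketch under those two headings, assuming the data is available as a CSV and using the bracketed labels above as placeholder column names (the actual labels may differ):

```python
import pandas as pd

# Hypothetical filename and column names; adjust to the real dataset layout.
ratings = pd.read_csv("credit_ratings.csv")

# Unique values: which agencies, rating types, and sources appear?
for col in ["Credit Agency", "Rating Type", "Rating Source"]:
    print(col, "->", ratings[col].nunique(), "unique values")
    print(ratings[col].unique()[:10])  # peek at the first few

# Frequency distribution: are some agencies or rating types over-represented?
print(ratings["Credit Agency"].value_counts())
print(ratings["Rating Type"].value_counts(normalize=True))  # shares rather than counts
```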
Data Visualization
Visualizing your data can provide insights that are difficult to derive from tabular representation alone. Utilize various visualization techniques such as bar charts, pie charts, histograms, or line graphs based on your specific objectives.
For example:
- Plotting a histogram of each credit agency's ratings can help you understand their distribution across different categories.
- A time-series line graph can show how ratings have evolved over time for specific issuers or industries.
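As a rough sketch of both examples, still using the placeholder column names above and assuming an additional column holding the rating itself (called 'Rating' here, which is not among the six labels listed):

```python
import pandas as pd
import matplotlib.pyplot as plt

ratings = pd.read_csv("credit_ratings.csv", parse_dates=["Rating Date"])  # hypothetical names

# Distribution of rating categories for each agency.
for agency, grp in ratings.groupby("Credit Agency"):
    grp["Rating"].value_counts().sort_index().plot(kind="bar", title=str(agency))
    plt.ylabel("Count")
    plt.show()

# Ratings over time for a single issuer, mapped to a numeric scale for plotting.
scale = {"AAA": 1, "AA": 2, "A": 3, "BBB": 4, "BB": 5, "B": 6, "CCC": 7}  # illustrative only
one = ratings[ratings["Issuer Name"] == "Example Issuer"].sort_values("Rating Date")
plt.plot(one["Rating Date"], one["Rating"].map(scale), marker="o")
plt.gca().invert_yaxis()  # better ratings appear higher on the chart
plt.ylabel("Rating (1 = AAA)")
plt.show()
```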
Analyzing Ratings Performance
One of the main objectives of using credit rating datasets is to assess the performance and accuracy of different credit agencies. Conducting a thorough analysis can help you understand how ratings have changed over time and evaluate the consistency of each agency's ratings.
Rating Changes Over Time: Analyze how ratings for specific issuers or industries have changed over different periods.
Comparing Rating Agencies: Compare ratings from different agencies to identify any discrepancies or trends. Are there consistent differences in their assessments?
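One hedged way to line agencies up against each other, still with the placeholder names above and the assumed 'Rating' column, is a pivot of the latest rating per issuer and agency:

```python
import pandas as pd

ratings = pd.read_csv("credit_ratings.csv", parse_dates=["Rating Date"])  # hypothetical names
scale = {"AAA": 1, "AA": 2, "A": 3, "BBB": 4, "BB": 5, "B": 6, "CCC": 7}   # illustrative only
ratings["rating_num"] = ratings["Rating"].map(scale)                      # assumed 'Rating' column

# Latest rating per (issuer, agency), then issuers as rows and agencies as columns.
latest = (ratings.sort_values("Rating Date")
                 .groupby(["Issuer Name", "Credit Agency"], as_index=False)
                 .last())
comparison = latest.pivot(index="Issuer Name", columns="Credit Agency", values="rating_num")

# A systematic gap between two agencies shows up as a non-zero mean difference.
print((comparison["Agency A"] - comparison["Agency B"]).describe())  # hypothetical agency names
```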
Detecting Rating Trends
The dataset allows you to detect trends and correlations between various factors related to credit ratings.
- Credit Rating Analysis: This dataset can be used for analyzing credit ratings and trends of various fixed income securities. It provides historical credit rating data from different rating agencies, allowing researchers to study the performance, accuracy, and consistency of these ratings over time.
- Comparative Analysis: The dataset allows for comparative analysis between different agencies' credit ratings for a specific security or issuer. Researchers can compare the ratings assigned by different agencies and identify any discrepancies or differences in their assessments. This analysis can help in understanding variations in methodologies and improving the transparency of credit rating processes.
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors - You are free to: - Share - copy and redistribute the material in any medium or format for any purpose, even commercially. - Adapt - remix, transform, and build upon the material for any purpose, even commercially. - You must: - Give appropriate credit - Provide a link to the license, and indicate if changes were made. - ShareAlike - You must distribute your contributions under the same license as the original. - Keep intact - all ...
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This file contains the supplementary material for the publication
"Protein secondary-structure description with a coarse-grained model" by Gerald R. Kneller and K. Hinsen, Acta Cryst. (2015), D71, 1411-1422. http://dx.doi.org/10.1107/S1399004715007191
Datasets in this file
1) ScrewFit and ScrewFrame parameters for ideal secondary-structure elements
Scripts: /code/import_ideal_structures /code/analyze_ideal_structures
1.1) The PDB files generated with Chimera
/data/ideal_structures/3-10.pdb /data/ideal_structures/alpha.pdb /data/ideal_structures/beta-antiparallel.pdb /data/ideal_structures/beta-parallel.pdb /data/ideal_structures/pi.pdb
1.2) The corresponding MOSAIC datasets
/data/ideal_structures/3-10 /data/ideal_structures/alpha /data/ideal_structures/beta-antiparallel /data/ideal_structures/beta-parallel /data/ideal_structures/pi
1.3) The ScrewFit parameters
/data/ideal_structures/screwfit/3-10 /data/ideal_structures/screwfit/alpha /data/ideal_structures/screwfit/beta-antiparallel /data/ideal_structures/screwfit/beta-parallel /data/ideal_structures/screwfit/pi
1.4) The ScrewFrame parameters
/data/ideal_structures/screwframe/3-10 /data/ideal_structures/screwframe/alpha /data/ideal_structures/screwframe/beta-antiparallel /data/ideal_structures/screwframe/beta-parallel /data/ideal_structures/screwframe/pi
2) Statistics for ScrewFit and ScrewFrame parameters computed for the ASTRAL SCOPe subset with less than 40% sequence identity.
Scripts: /code/astral_analysis /code/fit_rho_distributions /code/plot_histograms
2.1) The ASTRAL database (link to published ActivePaper)
/data/astral_2.04
2.2) The histograms for the ScrewFit and ScrewFrame parameters for the all-alpha and all-beta subsets
/data/histograms/astral_alpha/screwfit /data/histograms/astral_alpha/screwframe
/data/histograms/astral_beta/screwfit /data/histograms/astral_beta/screwframe
2.3) The Gaussians fitted to the peaks in the distributions for rho
/data/fitted_rho_distributions/screwfit /data/fitted_rho_distributions/screwframe
2.4) Plots
/documentation/delta.pdf /documentation/delta_q.pdf /documentation/delta_r.pdf /documentation/p.pdf /documentation/rho-detail.pdf /documentation/rho.pdf /documentation/sigma.pdf /documentation/tau.pdf
3) Comparison of secondary-structure identification between ScrewFrame and DSSP.
Scripts: /code/compare_secondary_structure_assignments /code/plot_histograms
3.1) The histograms of the lengths of secondary-structure elements
/data/histograms/secondary_structure/length-alpha-dssp /data/histograms/secondary_structure/length-alpha-screwframe /data/histograms/secondary_structure/length-beta-dssp /data/histograms/secondary_structure/length-beta-screwframe
3.2) The 2D histograms of the number of residues inside identified secondary-structure elements
/data/histograms/secondary_structure/n-alpha /data/histograms/secondary_structure/n-beta
3.3) The distribution of rho inside alpha helices
/data/histograms/secondary_structure/rho-alpha-dssp
3.4) Plots
/documentation/lengths-alpha.pdf /documentation/lengths-beta.pdf /documentation/n-alpha.pdf /documentation/n-beta.pdf /documentation/rho-alpha-dssp.pdf
4) Illustration for myoglobin and VDAC-1
Scripts: /code/import_myoglobin_vdac /code/analyze_myoglobin /code/analyze_vdac /code/perturbation_analysis
4.1) Imported structures in MOSAIC format: PDB code 1A6G for myoglobin PDB code 2K4T for VDAC-1
/data/myoglobin /data/VDAC-1
4.2) Plots showing rho and delta
/documentation/rho-myoglobin.pdf /documentation/delta-myoglobin.pdf
4.3) Tube models for visualization with Chimera
/documentation/myoglobin-tube.bld /documentation/VDAC-1-tube.bld
4.4) Sensitivity to perturbations in the coordinates
/documentation/rho-perturbed-myoglobin.pdf /documentation/delta-perturbed-VDAC-1.pdf /documentation/myoglobin-perturbation.pdf /documentation/VDAC-1-perturbation.pdf
5) Analysis of CA-only structures in the PDB
Scripts: /code/ca_analysis /code/import_calpha_structures /code/plot_histograms
5.1) Imported CA-only structures in MOSAIC format
/data/pdb_ca_only_structures
5.2) Histograms for ScrewFrame parameters
/data/histograms/ca_only_structures
5.3) Plots
/documentation/delta_ca.pdf /documentation/delta_q_ca.pdf /documentation/delta_r_ca.pdf /documentation/p_ca.pdf /documentation/rho_ca.pdf /documentation/sigma_ca.pdf /documentation/tau_ca.pdf
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Numerical values used to generate the manuscript graphs and histograms.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is an Australian extract of the Speedtest Open data available at Amazon WS (link below - opendata.aws). The AWS data licence is "CC BY-NC-SA 4.0", so use of this data must be:
- non-commercial (NC)
- share-alike (SA) - reuse must carry the same licence
This restricts the standard CC-BY Figshare licence.

The world Speedtest open data was downloaded (>400Mb, 7M lines of data). An extract on Australia's location (lat, long) yielded 88,000 lines of data (attached as csv). A Jupyter notebook of the extract process is attached, together with a link to a Twitter thread of outputs and a link to a data tutorial (GitHub) that includes a Jupyter Notebook for analysing the world Speedtest data, selecting one US state.

Data shows (Q2):
- 3.1M speedtests
- 762,000 devices
- 88,000 grid locations (600m * 600m), each summarised as a point
- average speed 33.7 Mbps (down), 12.4 Mbps (up)
- max speed 724 Mbps
- data is for 600m * 600m grids, showing average speed up/down, number of tests, and number of users (IP); centroid and lat/long added. See the tweet image of centroids, also attached.

Versions:
- v15/16 - Add histogram comparing Q1-21 vs Q2-20. Include ipynb (incHistQ121, v.1.3-Q121) for the calculation.
- v14 - Add AUS Speedtest Q1 2021 geojson (79k lines, avg d/l 45.4 Mbps).
- v13 - Added three-colour MELB map (less than 20 Mbps, over 90 Mbps, 20-90 Mbps).
- v12 - Added AUS - Syd - Mel line chart, Q3 20.
- v11 - Add line chart comparing Q2, Q3, Q4 plus Melb - results virtually indistinguishable. Add line chart comparing Syd - Melb Q3; also virtually indistinguishable. Add histogram comparing Syd - Melb Q3. Add new Jupyter notebook with graph calcs (nbn-AUS-v1.3). Some errata documented in the notebook: an issue with resorting the table and graphing only part of it; not an issue if all lines of the table are graphed.
- v10 - Load AURIN sample pics. Speedtest data loaded to the AURIN geo-analytic platform; requires an edu.au login.
- v9 - Add comparative Q2, Q3, Q4 histogram pic.
- v8 - Added Q4 data geojson. Add Q3, Q4 histogram pic.
- v7 - Rename to include Q2, Q3 in the title.
- v6 - Add Q3 20 data. Rename geojson AUS data as Q2. Add comparative histogram. Calc in International.ipynb.
- v5 - Add Jupyter notebook including histograms. The histogram is a count of geo-locations by average download speed (unweighted by tests).
- v4 - Added Melb choropleth (png, 50 Mpix) including legend. (To do: add Melb.geojson.) Posted link to AURIN description of Speedtest data.
- v3 - Add super-fast data (>100 Mbps), less than 1% of data - 697 lines. Includes png of superfast.plot(). Link below to Google Maps version of the super-fast data points, plus a Google map of the first 100 data points as sample data. Geojson format for loading into GeoPandas, per the Jupyter notebook. New version of the Jupyter notebook, v.1.1.
- v2 - Add centroids image.
- v1 - Initial data load.

Future work:
- Combine Speedtest data with NBN technology-by-location data (national map.gov.au); https://www.data.gov.au/dataset/national-broadband-network-connections-by-technology-type
- Combine Speedtest data with SEIFA data (socioeconomic categories) - to discuss with AURIN.
- Further international comparisons.
- Collaboration discussed with Assoc Prof Tooran Alizadeh, USyd.
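The attached notebook documents the actual extract; the sketch below is only a rough illustration of the same idea, assuming the Ookla open-data tile schema for column names (e.g. avg_d_kbps) and a crude lat/long bounding box standing in for the real Australian selection.

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# Hypothetical filename for the world Speedtest tiles; adjust to the real download.
world = gpd.read_file("world_speedtest_tiles.zip")

# One centroid point per 600m x 600m tile, then lat/long columns.
cent = world.geometry.centroid
world["lat"], world["long"] = cent.y, cent.x

# Crude bounding box for Australia (lat/long), an approximation of the actual extract.
aus = world[world["lat"].between(-44, -10) & world["long"].between(112, 154)]

# Histogram of average download speed per grid location (kbps converted to Mbps).
aus["avg_d_kbps"].div(1000).hist(bins=50)
plt.xlabel("Average download speed (Mbps)")
plt.show()

aus.drop(columns="geometry").to_csv("speedtest_australia.csv", index=False)
```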
This dataset contains information from Turkey's largest online real estate and car sales platform. The dataset covers a 3-month period from January 1, 2023, to March 31, 2023, and focuses solely on Volkswagen brand cars. The dataset consists of 13 variables, including customer_id, advertisement_number, brand, model, variant, year, kilometer, color, transmission, fuel, city, ad_date, and price.
The dataset provides valuable insights into the sales and advertising trends for Volkswagen cars in Turkey during the first quarter of 2023. The data can be used to identify patterns and trends in consumer behavior, such as which models are most popular, the most common transmission type, and the most common fuel type. The data can also be used to evaluate the effectiveness of advertising campaigns and to identify which cities have the highest demand for Volkswagen cars.
Overall, this dataset provides a rich source of information for anyone interested in the automotive industry in Turkey or for those who want to explore the trends in Volkswagen car sales during the first quarter of 2023.
Here are the descriptions of the variables in the dataset:
customer_id: Unique identifier for the customer who placed the advertisement
advertisement_number: Unique identifier number for the advertisement
brand: The brand of the car (in this dataset, it is always Volkswagen)
model: The model of the car (e.g., Golf, Polo, Passat, etc.)
variant: The variant of the car (e.g., 1.6 FSI Midline, 2.0 TDI Comfortline, etc.)
year: The year that the car was manufactured
kilometer: The distance that the car has been driven (in kilometers)
color: The color of the car
transmission: The type of transmission (manual or automatic)
fuel: The type of fuel used by the car (e.g., petrol, diesel, hybrid, etc.)
city: The city where the advertisement was placed
ad_date: The date when the advertisement was placed
price: The asking price for the car
Here are some possible analyses and insights that can be derived from this dataset:
Trend analysis: It is possible to analyze the trend of Volkswagen car sales over the three-month period covered by the dataset. This can be done by plotting the number of advertisements placed over time.
Model popularity analysis: It is possible to determine which Volkswagen car models are the most popular based on the number of advertisements placed for each model. This can be done by grouping the data by model and counting the number of advertisements for each model.
Price analysis: It is possible to analyze the distribution of prices for Volkswagen cars. This can be done by creating a histogram of the prices.
Kilometer analysis: It is possible to analyze the distribution of kilometers driven for Volkswagen cars. This can be done by creating a histogram of the kilometer values.
Geographic analysis: It is possible to analyze the distribution of Volkswagen car sales across different cities. This can be done by grouping the data by city and counting the number of advertisements for each city.
Correlation analysis: It is possible to analyze the correlations between different variables, such as the year and price of the car or the kilometer and price of the car. This can be done by creating scatterplots of the variables and calculating correlation coefficients.
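A minimal sketch of several of these analyses, assuming the data is available as a CSV with the column names listed above (ad_date, model, city, price, kilometer, year); the filename is hypothetical:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical filename; column names as described above.
ads = pd.read_csv("vw_ads.csv", parse_dates=["ad_date"])

# Trend analysis: number of advertisements placed per week over the quarter.
ads.set_index("ad_date").resample("W").size().plot(title="Ads per week")
plt.show()

# Model popularity and geographic analysis: counts per model and per city.
print(ads["model"].value_counts().head(10))
print(ads["city"].value_counts().head(10))

# Price and kilometer analysis: histograms of the two distributions.
ads["price"].hist(bins=50)
plt.xlabel("Price")
plt.show()
ads["kilometer"].hist(bins=50)
plt.xlabel("Kilometer")
plt.show()

# Correlation analysis: scatterplots and correlation coefficients.
ads.plot.scatter(x="year", y="price")
plt.show()
ads.plot.scatter(x="kilometer", y="price")
plt.show()
print(ads[["year", "kilometer", "price"]].corr())
```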
Data Cleaning: Some data cleaning can be performed on this dataset. First, check for missing values, which may need to be filled or removed. The date formats and the data types of the variables should also be checked and adjusted accordingly. Finally, outliers may need to be identified and corrected or removed.
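A hedged sketch of those checks, again assuming the column names listed above and a hypothetical filename:

```python
import pandas as pd

ads = pd.read_csv("vw_ads.csv")  # hypothetical filename

# Missing values: count per column, then decide whether to fill or drop.
print(ads.isna().sum())
ads = ads.dropna(subset=["price", "kilometer"])  # example choice: drop rows missing key fields

# Date formats and data types: coerce to proper dtypes, flagging bad entries as NaN/NaT.
ads["ad_date"] = pd.to_datetime(ads["ad_date"], errors="coerce")
ads["price"] = pd.to_numeric(ads["price"], errors="coerce")
ads["kilometer"] = pd.to_numeric(ads["kilometer"], errors="coerce")

# Outliers: a simple IQR rule on price (one common convention, not the only option).
q1, q3 = ads["price"].quantile([0.25, 0.75])
iqr = q3 - q1
ads = ads[ads["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```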
These cleaning steps will help produce more reliable results for data analysis and machine learning algorithms.
As a result, this dataset is well suited for practicing data cleaning and is a valuable resource for data analysis and machine learning projects.
All of these analyses can be visualized using various graphs and charts, such as line charts, histograms, and scatterplots.