100+ datasets found

Data from: tableone: An open source Python package for producing summary...
zenodo.org
search.dataone.org
csv, txt
Updated May 30, 2022
Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark (2022). Data from: tableone: An open source Python package for producing summary statistics for research papers [Dataset]. http://doi.org/10.5061/dryad.26c4s35
Available download formats: csv, txt
Unique identifier
https://doi.org/10.5061/dryad.26c4s35
Dataset updated
May 30, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Objectives: In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.

Materials and Methods: The tableone package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.

Results: The tableone software package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.

Discussion and Conclusion: We present open source software for researchers to facilitate carrying out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Development of a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using tableone for a research study, especially prior to submitting the study for publication.
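The Results paragraph mentions Tukey's rule for outlier detection. As a rough illustration only (this is not tableone's own implementation), the rule flags values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR. The sketch below uses only the Python standard library; the function name `tukey_outliers` and the "inclusive" quartile method are assumptions, and other quartile definitions can classify borderline values differently.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles come from statistics.quantiles(method="inclusive"); this is an
    assumed convention, not necessarily the one tableone uses.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up heart-rate values with one implausible entry
heart_rate = [72, 75, 80, 68, 90, 85, 77, 300]
print(tukey_outliers(heart_rate))  # prints [300]
```

Here Q1 = 74.25 and Q3 = 86.25, so the upper fence is 104.25 and only 300 is flagged.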
Participation measures in higher education - Headline Summary Statistics
explore-education-statistics.service.gov.uk
Updated Nov 25, 2021
Department for Education (2021). Participation measures in higher education - Headline Summary Statistics [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/82c159ca-ff2e-4f40-a38c-80b4c44a2ff7
Dataset updated
Nov 25, 2021
Dataset authored and provided by
Department for Education (https://gov.uk/dfe)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Initial Participation (HEIP30) headline data
Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...
catalog.data.gov
data.usgs.gov
Updated Oct 22, 2025
U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
Dataset updated
Oct 22, 2025
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, both to avoid overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g., Wilderness within a National Forest) based on their relative biodiversity conservation status (e.g., GAP Status Code 1 over 2), public access values (in the order Closed, Restricted, Open, Unknown), and geodatabase load order (records in the PAD-US full inventory are deliberately organized so that fee-owned lands are loaded before overlapping management designations and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip"), an associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ), was clipped to the Census state boundary file to define the extent and to serve as a common denominator for statistical summaries.
Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated values (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format; users can also explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (an associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note that the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of PAD-US may reflect improvements in the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data.
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
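To make the overlap-prioritization idea concrete, here is a minimal sketch of how competing designations could be ranked. It assumes each overlapping record is a dict with hypothetical keys `gap_status`, `access` and `load_order`, and it assumes the three criteria are applied in the order listed (GAP status first, then access, then load order, with earlier-loaded records winning). The actual USGS script in "PADUS3_0_CreateVectorAnalysisFileScript.zip" may implement this differently.

```python
# Hypothetical ranking of overlapping PAD-US designations (a sketch only).
# Access order follows the description: Closed, Restricted, Open, Unknown.
ACCESS_RANK = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}


def priority_key(record):
    """Smaller tuples win: GAP Status 1 beats 2, Closed beats Open,
    and (by assumption) records loaded earlier beat later ones."""
    return (
        record["gap_status"],
        ACCESS_RANK.get(record["access"], len(ACCESS_RANK)),
        record["load_order"],
    )


def pick_winner(overlapping):
    """Return the single record kept where designations overlap."""
    return min(overlapping, key=priority_key)


wilderness = {"name": "Wilderness", "gap_status": 1, "access": "Open", "load_order": 20}
forest = {"name": "National Forest", "gap_status": 2, "access": "Open", "load_order": 5}
print(pick_winner([forest, wilderness])["name"])  # prints Wilderness
```

Encoding the criteria as a tuple sort key keeps the precedence explicit: a later criterion is consulted only when all earlier ones tie.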
Full summary statistics from 41 EWAS conducted for the EWAS Catalog
zenodo.org
data.niaid.nih.gov
application/gzip, csv
Updated Apr 9, 2021
EWAS Catalog team (2021). Full summary statistics from 41 EWAS conducted for the EWAS Catalog [Dataset]. http://doi.org/10.5281/zenodo.4672754
Available download formats: application/gzip, csv
Unique identifier
https://doi.org/10.5281/zenodo.4672754
Dataset updated
Apr 9, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
EWAS Catalog team
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Full summary statistics from 41 epigenome-wide association studies (EWAS) conducted by The EWAS Catalog team (www.ewascatalog.org). Meta-data is found in the "studies-full.csv" file and the results are in "full_stats.tar.gz". Unzipping the "full_stats.tar.gz" file will reveal a folder containing 41 csv files, each with the full summary statistics from one EWAS. The results can be linked to the meta-data using the "Results_file" column in "studies-full.csv". These analyses were conducted using data extracted from the Gene Expression Omnibus (GEO). These data were extracted using the geograbi R package. For more information on the EWAS, please consult our paper: Battram, Thomas, et al. "The EWAS Catalog: A Database of Epigenome-wide Association Studies." OSF Preprints, 4 Feb. 2021. https://doi.org/10.31219/osf.io/837wn. Please cite the paper if you use this dataset.
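The linking step described above can be sketched with the Python standard library alone. This assumes the archive and metadata file sit in the working directory and that each `Results_file` value is the name of a CSV inside "full_stats.tar.gz", with or without the ".csv" extension. The folder name inside the archive is not documented here, so the sketch matches on base file name only; `link_results` is a hypothetical helper, not part of the EWAS Catalog tooling.

```python
import csv
import os
import tarfile


def link_results(meta_path="studies-full.csv", archive_path="full_stats.tar.gz"):
    """Pair each metadata row with the matching CSV member of the archive.

    Returns a list of (metadata_row, archive_member_name) tuples; the member
    name is None when no file matches the row's Results_file value.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        # Index archive CSVs by base name, since the folder name is unknown
        members = {
            os.path.basename(m.name): m.name
            for m in tar.getmembers()
            if m.isfile() and m.name.endswith(".csv")
        }
    linked = []
    with open(meta_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            name = os.path.basename(row["Results_file"])
            # Accept values stored with or without the .csv extension
            linked.append((row, members.get(name) or members.get(name + ".csv")))
    return linked
```

A matched member can then be read with `tarfile.open(...).extractfile(member_name)` without unpacking the whole archive.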
CAD protein association summary statistics
figshare.com
csv
Updated Aug 29, 2025
Sivateja Tangirala (2025). CAD protein association summary statistics [Dataset]. http://doi.org/10.6084/m9.figshare.30006907.v1
Available download formats: csv
Unique identifier
https://doi.org/10.6084/m9.figshare.30006907.v1
Dataset updated
Aug 29, 2025
Dataset provided by
Figshare (http://figshare.com/)
Authors
Sivateja Tangirala
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
CSV file containing summary statistics of proteins in association with incident CAD from logistic regression after adjusting for demographics, fasting status, glycemic status, BMI, and HbA1c.
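Logistic-regression summary statistics are often reported on the log-odds scale. Assuming the file gives a coefficient (beta) and its standard error for each protein (the actual column names are not stated in this description), an odds ratio and Wald 95% confidence interval can be derived as below. This is a generic sketch, not the author's analysis pipeline, and the input values are made up.

```python
import math


def odds_ratio_ci(beta, se, z=1.959964):
    """Convert a log-odds coefficient and its standard error into an odds
    ratio with a Wald confidence interval (95% for the default z)."""
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient for one protein (not taken from the dataset)
or_, lo, hi = odds_ratio_ci(beta=0.25, se=0.05)
print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the odds ratio.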
BBC NEWS SUMMARY(CSV FORMAT)
kaggle.com
zip
Updated Sep 9, 2024
Dhiraj (2024). BBC NEWS SUMMARY(CSV FORMAT) [Dataset]. https://www.kaggle.com/datasets/dignity45/bbc-news-summarycsv-format
Available download formats: zip (2097600 bytes)
Dataset updated
Sep 9, 2024
Authors
Dhiraj
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Dataset Description: Text Summarization Dataset

This dataset is designed for users aiming to train models for text summarization. It contains 2,225 rows of data with two columns: "Text" and "Summary". Each row features a detailed news article or piece of text paired with its corresponding summary, providing a rich resource for developing and fine-tuning summarization algorithms.

Key Features:

Text: Full-length articles or passages that serve as the input for summarization.

Summary: Concise summaries of the articles, which are ideal for training models to generate brief, coherent summaries from longer texts.

Future Enhancements:

This evolving dataset is planned to include additional features, such as text class labels, in future updates. These enhancements will provide more context and facilitate the development of models that can perform summarization across different categories of news content.

Usage:

Ideal for researchers and developers focused on text summarization tasks, this dataset enables the training of models to effectively compress information while retaining the essence of the original content.

Acknowledgment

We would like to extend our sincere gratitude to the dataset creator for their contribution to this valuable resource. This dataset, sourced from the BBC News Summary dataset on Kaggle, was created by Pariza. Their work has provided an invaluable asset for those working on text summarization tasks, and we appreciate their efforts in curating and sharing this data with the community.

Thank you for supporting research and development in the field of natural language processing!

File Description

This script processes and consolidates text data from various directories containing news articles and their corresponding summaries. It reads the files from specified folders, handles encoding issues, and then creates a DataFrame that is saved as a CSV file for further analysis.

Key Components:

Imports:

numpy (np): Numerical operations library; imported but not used in this script.

pandas (pd): Data manipulation and analysis library.

os: For interacting with the operating system, e.g., building file paths.

glob: For file pattern matching and retrieving file paths.

Function: get_texts

Parameters:

text_folders: List of folders containing news article text files.

text_list: List to store the content of text files.

summ_folder: List of folders containing summary text files.

sum_list: List to store the content of summary files.

encodings: List of encodings to try for reading files.

Purpose:

Reads text files from specified folders, handles different encodings, and appends the content to text_list and sum_list.

Returns the updated lists of texts and summaries.

Data Preparation:

text_folder: List of directories for news articles.

summ_folder: List of directories for summaries.

text_list and summ_list: Initialize empty lists to store the contents.

data_df: Empty DataFrame to store the final data.

Execution:

Calls get_texts function to populate text_list and summ_list.

Creates a DataFrame data_df with columns 'Text' and 'Summary'.

Saves data_df to a CSV file at /kaggle/working/bbc_news_data.csv.

Output:

Prints the first few entries of the DataFrame to verify the content.

Column Descriptions:

Text: Contains the full-length articles or passages of news content. This column is used as the input for summarization models.

Summary: Contains concise summaries of the corresponding articles in the "Text" column. This column is used as the target output for summarization models.

Usage:

This script is designed to be run in a Kaggle environment where paths to text data are predefined.

It is intended for preprocessing and saving text data from news articles and summaries for subsequent analysis or model training.
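The script itself is not reproduced in this listing. The following is a minimal standard-library sketch of the same idea: try several encodings per file, pair each article with its summary, and write a two-column Text/Summary CSV. The encoding list, the helper names, and pairing articles to summaries by identical file name are assumptions; the original script builds a pandas DataFrame rather than using the csv module.

```python
import csv
import glob
import os

ENCODINGS = ("utf-8", "latin-1")  # assumed order; latin-1 decodes any byte


def read_with_fallback(path, encodings=ENCODINGS):
    """Return the file's text using the first encoding that decodes it."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as fh:
                return fh.read()
        except UnicodeDecodeError:
            continue  # try the next encoding
    raise ValueError(f"could not decode {path} with any of {encodings}")


def get_texts(text_folders, summ_folders, encodings=ENCODINGS):
    """Pair each article with the summary file of the same name (assumed layout)."""
    rows = []
    for text_dir, summ_dir in zip(text_folders, summ_folders):
        for text_path in sorted(glob.glob(os.path.join(text_dir, "*.txt"))):
            summ_path = os.path.join(summ_dir, os.path.basename(text_path))
            if os.path.exists(summ_path):  # skip articles without a summary
                rows.append((read_with_fallback(text_path, encodings),
                             read_with_fallback(summ_path, encodings)))
    return rows


def save_csv(rows, out_path):
    """Write the pairs with the dataset's column names, Text and Summary."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Text", "Summary"])
        writer.writerows(rows)
```

Returning the pairs instead of appending to caller-supplied lists, as the original `get_texts` does, keeps each article aligned with its own summary even when some summaries are missing.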
Protected Areas Database of the United States (PAD-US) 3.0 Spatial Analysis...
catalog.data.gov
data.usgs.gov
Updated Oct 22, 2025
U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 Spatial Analysis and Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-spatial-analysis-and-statistics
Dataset updated
Oct 22, 2025
Dataset provided by
U.S. Geological Survey
Area covered
United States
Description
Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and outdoor recreation access across the nation. This data release presents results from statistical summaries of PAD-US 3.0 protection status (by GAP Status Code) and public access status for various land unit boundaries (Protected Areas Database of the United States 3.0 Vector Analysis and Summary Statistics). Summary statistics are also available to explore and download (Comma-separated Table [CSV], Microsoft Excel Workbook [.xlsx], Portable Document Format [.pdf] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). The vector GIS analysis file, source data used to summarize statistics for areas of interest to stakeholders (National, State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative), and complete Summary Statistics Tabular Data (CSV) are included in this data release. Raster GIS analysis files are also available for combination with other raster data (Protected Areas Database of the United States (PAD-US) 3.0 Raster Analysis). The PAD-US 3.0 Combined Fee, Designation, Easement feature class in the full inventory, with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class (Protected Areas Database of the United States (PAD-US) 3.0, https://doi.org/10.5066/P9Q9LQ4B), was modified to prioritize and remove overlapping management designations, both to limit overestimation in protection status or public access statistics and to support user needs for vector and raster analysis data.
Analysis files in this data release were clipped to the Census State boundary file to define the extent and fill in areas (largely private land) outside the PAD-US, providing a common denominator for statistical summaries.
North Carolina (NC) Stochastic Empirical Loading and Dilution Model (SELDM)...
gimi9.com
Updated May 26, 2019
(2019). North Carolina (NC) Stochastic Empirical Loading and Dilution Model (SELDM) summary statistics for physical and chemical data at NC highway-runoff and bridge-deck sites | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_north-carolina-nc-stochastic-empirical-loading-and-dilution-model-seldm-summary-statistics
Dataset updated
May 26, 2019
Area covered
North Carolina
Description
The purpose of this USGS data release is to publish NC SELDM streamflow statistics and summary statistics of physical and chemical data in support of the information provided in the above-referenced report. This data release consists of two data sets, "NC SELDM streamflow statistics..." and "NC SELDM summary statistics for physical and chemical data...". The tables uploaded for the "NC SELDM streamflow statistics for 266 streamgages across North Carolina" sub-section are primarily the support files for the StreamStatsDB update that was completed when the report was approved. These files were generated using the GNWISQ and QSTATS computer programs developed and described by Granato (2009, appendices 1 and 4), as discussed near the end of the "Prestorm streamflow statistics" section in the above-referenced report. A large table of selected site attributes and StreamStats basin characteristics compiled for the 266 streamgages is also provided as part of this data release, together with a ReadMe file in the sub-section. The tables uploaded for the "NC SELDM summary statistics for physical and chemical data at NC highway-runoff and bridge-deck sites" sub-section support the statewide medians table (Table 7) discussed in the "Simulating highway-runoff quality" section of the above-referenced report. This sub-section contains one .csv file for each of the 11 constituents referenced in Table 11. Descriptions of the data fields (columns) in the .csv tables are provided at the top of each .csv file. A ReadMe file is also included in this sub-section of the data release.
Private rental market summary statistics - October 2014 to September 2015
gov.uk
Updated Aug 15, 2023
Valuation Office Agency (2023). Private rental market summary statistics - October 2014 to September 2015 [Dataset]. https://www.gov.uk/government/statistics/private-rental-market-summary-statistics-england-2014-15
Dataset updated
Aug 15, 2023
Dataset provided by
GOV.UK (http://gov.uk/)
Authors
Valuation Office Agency
Description
The release presents the mean, median, lower quartile, and upper quartile of total monthly rent paid for a number of bedroom/room categories. It covers each local authority in England for the 12 months to the end of September 2015.

For further details on the information included in this release, including a glossary of terms and a variable list for the CSV format files, please refer to the statistical summary.
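For readers less familiar with these measures, the four statistics can be computed from a list of monthly rents as sketched below. The rents are made-up values, and the quartile convention used by the Valuation Office Agency is not stated here; Python's `statistics.quantiles` with its default "exclusive" method is used as an assumption, so results may differ slightly from the published figures.

```python
import statistics


def rent_summary(monthly_rents):
    """Mean, median, lower quartile and upper quartile of monthly rents."""
    # n=4 returns the three cut points Q1, Q2 (the median) and Q3
    lower_q, median, upper_q = statistics.quantiles(monthly_rents, n=4)
    return {
        "mean": statistics.mean(monthly_rents),
        "median": median,
        "lower_quartile": lower_q,
        "upper_quartile": upper_q,
    }


# Made-up monthly rents (GBP) for one bedroom category in one local authority
print(rent_summary([450, 500, 525, 550, 600, 650, 700, 950]))
```

The single high rent (950) pulls the mean (615.625) above the median (575), which is one reason releases of this kind report both.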
Real State Website Data
kaggle.com
zip
Updated Jun 11, 2023
M. Mazhar (2023). Real State Website Data [Dataset]. https://www.kaggle.com/datasets/mazhar01/real-state-website-data/code
Available download formats: zip (228356 bytes)
Dataset updated
Jun 11, 2023
Authors
M. Mazhar
License
GNU General Public License v2.0 (http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html)
Description
Check: End-to-End Regression Model Pipeline Development with FastAPI: From Data Scraping to Deployment with CI/CD Integration

This CSV dataset provides comprehensive information about house prices. It consists of 9,819 entries and 54 columns, offering a wealth of features for analysis. The dataset includes various numerical and categorical variables, providing insights into factors that influence house prices.

The key columns in the dataset are as follows:

Location1: The location of the house. Location2 is identical to, or a shorter version of, Location1.

Year: The year of construction.

Type: The type of the house.

Bedrooms: The number of bedrooms in the house.

Bathrooms: The number of bathrooms in the house.

Size_in_SqYds: The size of the house in square yards.

Price: The price of the house.

Parking_Spaces: The number of parking spaces available.

Floors_in_Building: The number of floors in the building.

Elevators: The presence of elevators in the building.

Lobby_in_Building: The presence of a lobby in the building.

In addition to these, the dataset contains several other features related to various amenities and facilities available in the houses, such as double-glazed windows, central air conditioning, central heating, waste disposal, furnished status, service elevators, and more.

By performing exploratory data analysis on this dataset using Python and the Pandas library, valuable insights can be gained regarding the relationships between different variables and the impact they have on house prices. Descriptive statistics, data visualization, and feature engineering techniques can be applied to uncover patterns and trends in the housing market.

This dataset serves as a valuable resource for real estate professionals, analysts, and researchers interested in understanding the factors that contribute to house prices and making informed decisions in the real estate market.
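As one example of the descriptive statistics mentioned above, the sketch below computes the median price for each number of bedrooms using only the standard library. It assumes the CSV uses the column names listed above and that `Price` and `Bedrooms` parse as plain numbers; the real file may store prices in another format, and the file name "house_prices.csv" in the usage note is hypothetical.

```python
import csv
import statistics
from collections import defaultdict


def median_price_by_bedrooms(path):
    """Group rows by Bedrooms and return the median Price of each group."""
    groups = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            try:
                bedrooms = int(float(row["Bedrooms"]))
                price = float(row["Price"])
            except (TypeError, ValueError):
                continue  # skip rows with missing or non-numeric values
            groups[bedrooms].append(price)
    return {b: statistics.median(p) for b, p in sorted(groups.items())}
```

Usage: `median_price_by_bedrooms("house_prices.csv")`. With pandas, as the description suggests, the equivalent is `df.groupby("Bedrooms")["Price"].median()`. The median is preferred here because house prices are typically right-skewed.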
Data to follow the statistical analysis including raw data as CSV files.
datasetcatalog.nlm.nih.gov
figshare.com
Updated Feb 21, 2023
Fey, Philipp; Mörchel, Philipp; Haddad, Daniel; Jakob, Peter; Hansmann, Jan; Stebani, Jannik; Weber, Daniel Ludwig; Hiller, Karl-Heinz (2023). Data to follow the statistical analysis including raw data as CSV files. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000951591
Dataset updated
Feb 21, 2023
Authors
Fey, Philipp; Mörchel, Philipp; Haddad, Daniel; Jakob, Peter; Hansmann, Jan; Stebani, Jannik; Weber, Daniel Ludwig; Hiller, Karl-Heinz
Description
Data that was used to train the SVM. As the train-test data were assigned randomly for every training iteration, the individual data used for generating the subfigures b–e are not separately listed, as these cannot be manually recreated but depend on the train-test assignment by the algorithm. (ZIP)
Participation measures in higher education - Headline stats summary
explore-education-statistics.service.gov.uk
Updated Oct 26, 2023
Department for Education (2023). Participation measures in higher education - Headline stats summary [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/d056c449-6d3c-41cd-bf6b-32d2bf206ec8
Dataset updated
Oct 26, 2023
Dataset authored and provided by
Department for Education (https://gov.uk/dfe)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
HE Participation by age 25 (CHEP-25) key stats
Full summary statistics from 387 EWAS conducted for the EWAS Catalog
data.niaid.nih.gov
Updated Apr 9, 2021
EWAS Catalog team (2021). Full summary statistics from 387 EWAS conducted for the EWAS Catalog [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4672644
Dataset updated
Apr 9, 2021
Dataset provided by
MRC Integrative Epidemiology Unit, University of Bristol
Authors
EWAS Catalog team
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Full summary statistics from 387 epigenome-wide association studies (EWAS) conducted by The EWAS Catalog team (http://www.ewascatalog.org/). Meta-data is found in the "studies-full.csv" file and the results are in "full_stats.tar.gz". Unzipping the "full_stats.tar.gz" file will reveal a folder containing 387 csv files, each with the full summary statistics from one EWAS. The results can be linked to the meta-data using the "Results_file" column in "studies-full.csv". These analyses were conducted using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES) subset of the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. For more information on the EWAS, please consult our paper: Battram, Thomas, et al. "The EWAS Catalog: A Database of Epigenome-wide Association Studies." OSF Preprints, 4 Feb. 2021. https://doi.org/10.31219/osf.io/837wn. Please cite the paper if you use the dataset.
Apprenticeships and traineeships - Public sector target - summary
explore-education-statistics.service.gov.uk
Updated Jan 28, 2021
Department for Education (2021). Apprenticeships and traineeships - Public sector target - summary [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/b748f358-940c-4b4c-a6fe-bea16435d67e
Dataset updated
Jan 28, 2021
Dataset authored and provided by
Department for Education (https://gov.uk/dfe)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Summary statistics of figures supplied by public sector bodies (up to and including 17 November 2021) covering the four years of the target (1 April 2017 to 31 March 2021).

Reporting periods: 2017-18 to 2020-21

Indicators:
Apprentices (prior to period, new in period, at end of period, cumulative since April 2017)
Employees (prior to period, new in period, at end of period, cumulative since April 2017)
Apprenticeship percentage (prior to period, new in period, at end of period, cumulative since April 2017)
Number of employers in the period
Percentage of employees starting apprenticeships in period

Filters: Subsector
NBA Top Scorers (2000-2024)
kaggle.com
zip
Updated Aug 22, 2024
Nick Alden (2024). NBA Top Scorers (2000-2024) [Dataset]. https://www.kaggle.com/datasets/nickalden/nba-top-scorers-stats-and-shot-details-2000-2024
Available download formats: zip (2731965 bytes)
Dataset updated
Aug 22, 2024
Authors
Nick Alden
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Data for the top 10 scorers in the NBA from the years 2000-2024. Scraped using nba_api.

leaders.csv: General season statistics for each season's top 10 scorers

shotsXXXXs.csv: Shot details for every made shot by each season's top 10 scorers, split by decade:

shots2000s.csv: 2000-01 season through 2009-10 season

shots2010s.csv: 2010-11 season through 2019-20 season

shots2020s.csv: 2020-21 season through 2023-24 season
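Because shot details are split across files by decade, a loader has to pick the right file for a given season. A small sketch, assuming seasons are written as in the description (for example "2009-10") and that each file is keyed by the decade of the season's starting year; `shots_file_for_season` is a hypothetical helper, not part of the dataset.

```python
def shots_file_for_season(season):
    """Return the shotsXXXXs.csv file holding a season such as '2009-10'."""
    start_year = int(season.split("-")[0])
    # The dataset covers 2000-01 through 2023-24
    if not 2000 <= start_year <= 2023:
        raise ValueError(f"season {season!r} is outside 2000-01 to 2023-24")
    decade = start_year // 10 * 10
    return f"shots{decade}s.csv"


print(shots_file_for_season("2009-10"))  # prints shots2000s.csv
print(shots_file_for_season("2010-11"))  # prints shots2010s.csv
```

Keying on the starting year matters at decade boundaries: "2009-10" ends in 2010 but belongs to shots2000s.csv.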
LA and school expenditure - Summary data for headline figures
explore-education-statistics.service.gov.uk
Updated Dec 16, 2021
Department for Education (2021). LA and school expenditure - Summary data for headline figures [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/9f43d48f-17f5-48c1-9e28-986577766a26
Dataset updated
Dec 16, 2021
Dataset authored and provided by
Department for Education (https://gov.uk/dfe)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
This file contains a summary of the headline, national level figures for this publication.
Data from: Data files used to study change dynamics in software systems
figshare.swinburne.edu.au
pdf
Updated Jul 22, 2024
Rajesh Vasa (2024). Data files used to study change dynamics in software systems [Dataset]. http://doi.org/10.25916/sut.26288227.v1
Available download formats: pdf
Unique identifier
https://doi.org/10.25916/sut.26288227.v1
Dataset updated
Jul 22, 2024
Dataset provided by
Swinburne
Authors
Rajesh Vasa
License
Attribution 3.0 (CC BY 3.0) (https://creativecommons.org/licenses/by/3.0/)
License information was derived automatically
Description
It is a widely accepted fact that evolving software systems change and grow. However, it is less well understood how change is distributed over time, specifically in object-oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place, as well as to inform them of the longer-term trend in the distribution profile. This knowledge helps developers record systemic and substantial changes to a release and provides useful input to a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. Yet, to manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes. Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and that classes that changed recently are likely to undergo modifications in the near future. Though this guidance is helpful, developers need more specific guidance for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution-prone parts of a system, as well as support effort estimation activities. The specific research questions that we address in this chapter are: (1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project-specific, or general?
(2) How is modification frequency distributed for classes that change? (3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications? (4) Does structural complexity make a class susceptible to change? (5) Does popularity make a class more change-prone? We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions. The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2MB in total) is provided as a comma separated values (CSV) file, and the first line of the CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
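Research questions (1) and (2) can be made concrete with a small calculation. The following Python sketch is not the authors' Stata analysis: it assumes, hypothetically, that each release has already been loaded as a mapping from class name to a tuple of metric values, and it treats a class as changed when that tuple differs between consecutive releases. The names `change_probabilities`, `modification_counts`, and the example classes are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one snapshot per release, mapping a class
# name to the tuple of its metric values in that release.
Snapshot = Dict[str, Tuple[float, ...]]


def change_probabilities(versions: List[Snapshot]) -> List[float]:
    """For each pair of consecutive releases, return the fraction of classes
    present in both releases whose metrics differ (cf. RQ1)."""
    probs: List[float] = []
    for old, new in zip(versions, versions[1:]):
        common = old.keys() & new.keys()
        if not common:
            probs.append(0.0)  # no surviving classes: report 0 by convention
            continue
        changed = sum(1 for name in common if old[name] != new[name])
        probs.append(changed / len(common))
    return probs


def modification_counts(versions: List[Snapshot]) -> Dict[str, int]:
    """Count how many release-to-release transitions modified each class (cf. RQ2)."""
    counts: Dict[str, int] = {}
    for old, new in zip(versions, versions[1:]):
        for name in old.keys() & new.keys():
            if old[name] != new[name]:
                counts[name] = counts.get(name, 0) + 1
    return counts


# Made-up snapshots for illustration (e.g. lines of code, method count).
v1 = {"Parser": (120, 4), "Lexer": (80, 2)}
v2 = {"Parser": (135, 5), "Lexer": (80, 2), "Ast": (40, 1)}
v3 = {"Parser": (135, 5), "Lexer": (82, 2), "Ast": (45, 1)}
print(change_probabilities([v1, v2, v3]))  # [0.5, 0.6666666666666666]
print(modification_counts([v1, v2, v3]))
```

Because a class only counts when it exists in both releases, newly added and deleted classes are left out of the denominator; that is a design choice of this sketch, not necessarily the definition used in the study.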
Genome-wide association summary statistics of chronic musculoskeletal pain...
zenodo.org
csv
Updated May 15, 2020
Yakov A. Tsepilov; Maxim B. Freidin; Alexandra S. Shadrina; Sodbo Z. Sharapov; Elizaveta E. Elgaeva; Jan van Zundert; Lennart C. Karssen; Pradeep Suri; Frances M.K. Williams; Yurii S. Aulchenko (2020). Genome-wide association summary statistics of chronic musculoskeletal pain at four anatomic sites and their genetically independent components [Dataset]. http://doi.org/10.5281/zenodo.3797553
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.3797553
Dataset updated
May 15, 2020
Dataset provided by
Zenodo http://zenodo.org/
Authors
Yakov A. Tsepilov; Maxim B. Freidin; Alexandra S. Shadrina; Sodbo Z. Sharapov; Elizaveta E. Elgaeva; Jan van Zundert; Lennart C. Karssen; Pradeep Suri; Frances M.K. Williams; Yurii S. Aulchenko
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains results of a genome-wide association study (GWAS) of four distinct chronic musculoskeletal pain conditions: back pain, knee pain, neck pain, and hip pain. It also contains genome-wide association summary statistics for four genetically independent components of the pain conditions listed above. For more details, please read the paper XXX.

All files contain association summary statistics from a genome-wide association meta-analysis of 265,000 White British individuals from the UK Biobank and an additional 191,580 individuals of European ancestry, also from the UK Biobank (total N = 456,580). Cases and controls were defined based on questionnaire responses. First, participants responded to “Pain type(s) experienced in last month”, followed by questions asking whether the specific pain had been present for more than 3 months. Those who reported back, neck or shoulder, hip, or knee pain lasting more than 3 months were considered chronic back, neck/shoulder, hip, and knee pain cases, respectively. Participants reporting no such pain lasting longer than 3 months were considered controls (regardless of whether or not they had another regional chronic pain, such as abdominal pain). Individuals who preferred not to answer were excluded from the study. In addition, we excluded individuals who reported pain all over the body lasting more than 3 months.
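The case/control rules above amount to a small decision procedure. The Python sketch below is illustrative only: the dictionary keys (`prefer_not_to_answer`, `widespread_pain_gt_3m`, `chronic_sites`) and site labels are hypothetical, not UK Biobank field names, and it assumes that "no such pain" means no chronic pain at the site being analysed, so chronic pain elsewhere does not disqualify a control.

```python
from typing import Any, Mapping, Optional

# Hypothetical labels for the four anatomic sites.
SITES = ("back", "neck_shoulder", "hip", "knee")


def classify(participant: Mapping[str, Any], site: str) -> Optional[str]:
    """Return "case", "control", or None (excluded) for one pain site.

    Assumed (hypothetical) keys:
      prefer_not_to_answer: True if the participant declined to answer
      widespread_pain_gt_3m: True if pain all over the body > 3 months
      chronic_sites: set of sites with pain lasting > 3 months
    """
    if site not in SITES:
        raise ValueError(f"unknown site: {site}")
    if participant.get("prefer_not_to_answer"):
        return None  # declined to answer: excluded
    if participant.get("widespread_pain_gt_3m"):
        return None  # pain all over the body for > 3 months: excluded
    chronic_sites = participant.get("chronic_sites") or set()
    if site in chronic_sites:
        return "case"
    # Assumption: chronic pain at other sites (e.g. abdominal) still
    # allows this participant to be a control for the site analysed.
    return "control"


person = {"chronic_sites": {"back"}}
print(classify(person, "back"))  # case
print(classify(person, "knee"))  # control
```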

The data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose. If investigators use these data, any and all consequences are entirely their responsibility. By downloading and using these data, you agree that you will cite the appropriate publication in any communications or publications arising directly or indirectly from these data; for utilization of data available prior to publication, you agree to respect the requested responsibilities of resource users under the 2003 Fort Lauderdale principles; you agree that you will never attempt to identify any participant. This research has been conducted using the UK Biobank Resource and the use of the data is guided by the principles formulated by the UK Biobank.

When using downloaded data, please cite the corresponding paper and this repository:

Tsepilov et al. 2020

Funding:

The work of YSA and SZS was supported by the Russian Ministry of Education and Science under the 5-100 Excellence Programme and by the Federal Agency of Scientific Organizations via the Institute of Cytology and Genetics (project 0324-2019-0040). The work of YAT, ASSh, and EEE was supported by the Russian Foundation for Basic Research (project 19-015-00151). The contribution of LCK was funded by PolyOmica. Dr. Suri was supported by VA Career Development Award #1IK2RX001515 from the United States (U.S.) Department of Veterans Affairs Rehabilitation Research and Development (RR&D) Service. Dr. Suri is a Staff Physician at the VA Puget Sound Health Care System. The contents of this work do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.

List of files:

Back_output_done.csv: GWAS summary statistics for chronic back pain

gpc1_output_done.csv: GWAS summary statistics for the first genetically independent component (GIP1)

gpc2_output_done.csv: GWAS summary statistics for the second genetically independent component (GIP2)

gpc3_output_done.csv: GWAS summary statistics for the third genetically independent component (GIP3)

gpc4_output_done.csv: GWAS summary statistics for the fourth genetically independent component (GIP4)

Hip_output_done.csv: GWAS summary statistics for chronic hip pain

Knee_output_done.csv: GWAS summary statistics for chronic knee pain

Neck_output_done.csv: GWAS summary statistics for chronic neck pain

Column headers:

gwas_id: uninformative field

rs_id: dbSNP rsID (GRCh37 build)

snp_num: uninformative field

chr: chromosome (GRCh37 build)

bp: position (GRCh37 build)

ea: effect allele (coded as "1")

ra: reference allele (coded as "0")

eaf: effect allele frequency

af_ref: uninformative field

beta: effect size of effect allele

se: standard error of effect size

p: P-value of association (without genomic control (GC) correction)

n: total sample size

z: Z-statistic of association

info: uninformative field

af_outlier: uninformative field

pz_outlier: uninformative field
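These columns support a few routine checks. The Python sketch below uses only the standard library and is not part of the dataset; it assumes the files are comma-delimited with the header listed above, and that `z` is the Wald statistic `beta / se` (the description does not state this explicitly). Because `p` is not GC-corrected, it also estimates the genomic-control inflation factor λ_GC as median(z²) divided by 0.4549364, the median of a χ² distribution with 1 degree of freedom. The 5 × 10⁻⁸ cut-off is the conventional genome-wide significance threshold, not a value taken from this dataset, and the helper names and example rows are hypothetical.

```python
import csv
import io
import statistics
from typing import Dict, Iterable, List, TextIO

GWS_THRESHOLD = 5e-8          # conventional genome-wide significance level
CHI2_1DF_MEDIAN = 0.4549364   # median of the chi-square distribution, 1 df


def read_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read a summary-statistics file into dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def z_matches_beta_se(row: Dict[str, str], tol: float = 1e-2) -> bool:
    """Check the assumption that z equals beta / se (up to rounding)."""
    return abs(float(row["beta"]) / float(row["se"]) - float(row["z"])) <= tol


def significant_snps(rows: Iterable[Dict[str, str]],
                     threshold: float = GWS_THRESHOLD) -> List[str]:
    """rsIDs whose uncorrected P-value is below the threshold."""
    return [r["rs_id"] for r in rows if float(r["p"]) < threshold]


def lambda_gc(rows: Iterable[Dict[str, str]]) -> float:
    """Genomic-control inflation factor: median(z^2) / median of chi2(1)."""
    chi2 = [float(r["z"]) ** 2 for r in rows]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Tiny in-memory example with a subset of the columns (made-up values).
example = io.StringIO(
    "rs_id,beta,se,p,z\n"
    "rs0001,0.02,0.01,0.0455,2.0\n"
    "rs0002,-0.06,0.01,1e-9,-6.0\n"
    "rs0003,0.0,0.01,1.0,0.0\n"
)
rows = read_rows(example)
print(significant_snps(rows))  # ['rs0002']
print(all(z_matches_beta_se(r) for r in rows))  # True
print(lambda_gc(rows))
```

With three rows the λ_GC value is meaningless; on a full file, values well above 1 can reflect polygenicity as well as confounding, so it is only a rough diagnostic. Missing values (e.g. `NA`) would need handling before calling `float`.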
Higher Level Learners in England - Headline stats summary
explore-education-statistics.service.gov.uk
Updated May 25, 2023
Department for Education (2023). Higher Level Learners in England - Headline stats summary [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/5bac16ac-e04c-4be8-9bcd-f5d5c62c34b8
Dataset updated
May 25, 2023
Dataset authored and provided by
Department for Education https://gov.uk/dfe
License
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Headline statistics used in the blue summary boxes of the publication.
Widening participation in higher education - Headline Stats
explore-education-statistics.service.gov.uk
Updated Jul 28, 2022
Department for Education (2022). Widening participation in higher education - Headline Stats [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/26d4d206-269f-4271-9ee6-5f24d8f17d28
Dataset updated
Jul 28, 2022
Dataset authored and provided by
Department for Education https://gov.uk/dfe
License
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Key summary statistics
- Explore Education Statistics data set Headline Stats from Widening participation in higher education

Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark (2022). Data from: tableone: An open source Python package for producing summary statistics for research papers [Dataset]. http://doi.org/10.5061/dryad.26c4s35

Data from: tableone: An open source Python package for producing summary statistics for research papers

Available download formats: csv, txt

Unique identifier

https://doi.org/10.5061/dryad.26c4s35

Dataset updated

May 30, 2022

Dataset provided by

Zenodo http://zenodo.org/

Authors

Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark

License

CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

Objectives: In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.
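To make the idea of a "Table 1" concrete, here is a minimal Python sketch of what such a table typically summarizes: continuous variables as mean (SD) and categorical variables as n (%), written out as CSV. It uses only the standard library and is not the tableone API; the function names and example data are hypothetical, and a real table would also need to handle missing values, grouping, and skewed variables (for example, reporting median [Q1, Q3] instead).

```python
import csv
import io
import statistics
from collections import Counter
from typing import Dict, List, Sequence


def summarize_continuous(name: str, values: Sequence[float]) -> List[str]:
    """One row: mean (sample SD), rounded to one decimal place."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [name, f"{mean:.1f} ({sd:.1f})"]


def summarize_categorical(name: str, values: Sequence[str]) -> List[List[str]]:
    """A header row, then one 'n (%)' row per level, levels sorted."""
    counts = Counter(values)
    total = len(values)
    rows = [[name, ""]]
    for level in sorted(counts):
        pct = 100 * counts[level] / total
        rows.append([f"  {level}", f"{counts[level]} ({pct:.1f})"])
    return rows


def table_one_csv(continuous: Dict[str, Sequence[float]],
                  categorical: Dict[str, Sequence[str]]) -> str:
    """Assemble the summary rows and return them as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["variable", "overall"])
    for name, vals in continuous.items():
        writer.writerow(summarize_continuous(f"{name}, mean (SD)", vals))
    for name, vals in categorical.items():
        writer.writerows(summarize_categorical(f"{name}, n (%)", vals))
    return buf.getvalue()


print(table_one_csv({"age": [60, 70, 80, 70]}, {"sex": ["F", "M", "F", "F"]}))
```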

Materials and Methods: The tableone package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.

Results: The tableone software package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.
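Tukey's rule, mentioned above, flags values lying more than 1.5 interquartile ranges (IQR) below the first quartile or above the third. The following standard-library Python sketch is not tableone's own implementation; quartile conventions differ between tools (this one uses `statistics.quantiles` with its default exclusive method), so borderline values may be flagged differently elsewhere. The example readings are made up.

```python
import statistics
from typing import List, Sequence


def tukey_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


heart_rate = [62, 70, 71, 75, 78, 80, 84, 88, 230]
print(tukey_outliers(heart_rate))  # [230]
```

A flagged value is a prompt to inspect the data (for instance, a unit or data-entry error), not grounds for automatic removal.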

Discussion and Conclusion: We present open source software for researchers to facilitate carrying out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Development of a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using tableone for a research study, especially prior to submitting the study for publication.

Data from: tableone: An open source Python package for producing summary...

Participation measures in higher education - Headline Summary Statistics

Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...

Full summary statistics from 41 EWAS conducted for the EWAS Catalog

CAD protein association summary statistics

BBC NEWS SUMMARY(CSV FORMAT)

Dataset Description: Text Summarization Dataset

Protected Areas Database of the United States (PAD-US) 3.0 Spatial Analysis...

North Carolina (NC) Stochastic Empirical Loading and Dilution Model (SELDM)...

Private rental market summary statistics - October 2014 to September 2015

Real State Website Data

Data to follow the statistical analysis including raw data as CSV files.

Participation measures in higher education - Headline stats summary

Full summary statistics from 387 EWAS conducted for the EWAS Catalog

Apprenticeships and traineeships - Public sector target - summary

NBA Top Scorers (2000-2024)

LA and school expenditure - Summary data for headline figures

Data from: Data files used to study change dynamics in software systems

Genome-wide association summary statistics of chronic musculoskeletal pain...

Higher Level Learners in England - Headline stats summary

Widening participation in higher education - Headline Stats

Data from: tableone: An open source Python package for producing summary statistics for research papers