100+ datasets found
  1. Data from: tableone: An open source Python package for producing summary...

    • zenodo.org
    • search.dataone.org
    • +1 more
    csv, txt
    Updated May 30, 2022
    Cite
    Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark (2022). Data from: tableone: An open source Python package for producing summary statistics for research papers [Dataset]. http://doi.org/10.5061/dryad.26c4s35
    Available download formats: csv, txt
    Dataset updated
    May 30, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Objectives: In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.

    Materials and Methods: The tableone package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.

    Results: The tableone software package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.
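
    As a rough illustration of the outlier check named above, Tukey's rule flags values that fall more than 1.5 interquartile ranges outside the quartiles. The sketch below uses only the Python standard library; the function name, the sample values, and the choice of the "inclusive" quartile method are assumptions for illustration and are not taken from the tableone source code.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up ages with one implausible entry.
ages = [34, 36, 38, 41, 43, 45, 47, 120]
print(tukey_outliers(ages))  # [120]
```

    Different quartile conventions can move the fences slightly, so borderline values may be flagged by one tool and not by another.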

    Discussion and Conclusion: We present open source software for researchers to facilitate carrying out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Development of a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using tableone for a research study, especially prior to submitting the study for publication.

  2. Participation measures in higher education - Headline Summary Statistics

    • explore-education-statistics.service.gov.uk
    Updated Nov 25, 2021
    + more versions
    Cite
    Department for Education (2021). Participation measures in higher education - Headline Summary Statistics [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/82c159ca-ff2e-4f40-a38c-80b4c44a2ff7
    Explore at:
    Dataset updated
    Nov 25, 2021
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Initial Participation (HEIP30) headline data

  3. Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis...

    • catalog.data.gov
    • data.usgs.gov
    • +1 more
    Updated Oct 22, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 Vector Analysis and Summary Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-vector-analysis-and-summary-stati
    Explore at:
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and recreation access across the nation. The PAD-US 3.0 Combined Fee, Designation, Easement feature class (with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class) was modified to remove overlaps, avoiding overestimation in protected area statistics and to support user needs. A Python scripted process ("PADUS3_0_CreateVectorAnalysisFileScript.zip") associated with this data release prioritized overlapping designations (e.g. Wilderness within a National Forest) based upon their relative biodiversity conservation status (e.g. GAP Status Code 1 over 2), public access values (in the order of Closed, Restricted, Open, Unknown), and geodatabase load order (records are deliberately organized in the PAD-US full inventory with fee owned lands loaded before overlapping management designations, and easements). The Vector Analysis File ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") associated item of PAD-US 3.0 Spatial Analysis and Statistics ( https://doi.org/10.5066/P9KLBB5D ) was clipped to the Census state boundary file to define the extent and serve as a common denominator for statistical summaries. 
    Boundaries of interest to stakeholders (State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative) were incorporated into separate geodatabase feature classes to support various data summaries ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip"). Comma-separated Value (CSV) tables ("PADUS3_0SummaryStatistics_TabularData_CSV.zip") summarizing "PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.zip" are provided as an alternative format and enable users to explore and download summary statistics of interest (Comma-separated Table [CSV], Microsoft Excel Workbook [.XLSX], Portable Document Format [.PDF] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). In addition, a "flattened" version of the PAD-US 3.0 combined file without other extent boundaries ("PADUS3_0VectorAnalysisFile_ClipCensus.zip") allows for other applications that require a representation of overall protection status without overlapping designation boundaries. The "PADUS3_0VectorAnalysis_State_Clip_CENSUS2020" feature class ("PADUS3_0VectorAnalysisFileOtherExtents_Clip_Census.gdb") is the source of the PAD-US 3.0 raster files (associated item of PAD-US 3.0 Spatial Analysis and Statistics, https://doi.org/10.5066/P9KLBB5D ). Note that the PAD-US inventory is now considered functionally complete, with the vast majority of land protection types represented in some manner, while work continues to maintain updates and improve data quality (see inventory completeness estimates at: http://www.protectedlands.net/data-stewards/ ). In addition, changes in protected area status between versions of the PAD-US may be attributed to improving the completeness and accuracy of the spatial data more than actual management actions or new acquisitions. USGS provides no legal warranty for the use of this data.
While PAD-US is the official aggregation of protected areas ( https://www.fgdc.gov/ngda-reports/NGDA_Datasets.html ), agencies are the best source of their lands data.
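
    The overlap prioritization described above (GAP Status Code first, then public access, then geodatabase load order) can be pictured as a three-part sort key. The sketch below is not the USGS script; the record fields, the sample records, and the reading of "Closed, Restricted, Open, Unknown" as a highest-to-lowest priority order are assumptions made for illustration.

```python
# Rank of each public access value, in the order the description lists them
# (assumed here to mean highest priority first).
ACCESS_RANK = {"Closed": 0, "Restricted": 1, "Open": 2, "Unknown": 3}


def priority_key(record):
    """Sort key: GAP status first, then public access, then load order."""
    return (
        record["gap_status"],            # lower GAP Status Code wins (1 over 2)
        ACCESS_RANK[record["access"]],   # then the public access ranking
        record["load_order"],            # then earlier-loaded records
    )


def top_designation(overlapping):
    """Pick the record that wins among designations covering the same area."""
    return min(overlapping, key=priority_key)


# Hypothetical records for one overlapping area; the values are illustrative.
overlap = [
    {"name": "National Forest", "gap_status": 2, "access": "Open", "load_order": 1},
    {"name": "Wilderness", "gap_status": 1, "access": "Open", "load_order": 2},
]
print(top_designation(overlap)["name"])  # Wilderness
```

    A tuple key keeps the three criteria in strict precedence: a later criterion only matters when all earlier ones tie.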

  4. Full summary statistics from 41 EWAS conducted for the EWAS Catalog

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, csv
    Updated Apr 9, 2021
    Cite
    EWAS Catalog team (2021). Full summary statistics from 41 EWAS conducted for the EWAS Catalog [Dataset]. http://doi.org/10.5281/zenodo.4672754
    Available download formats: application/gzip, csv
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    EWAS Catalog team
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Full summary statistics from 41 epigenome-wide association studies (EWAS) conducted by The EWAS Catalog team (www.ewascatalog.org). Meta-data is found in the "studies-full.csv" file and the results are in "full_stats.tar.gz". Unzipping the "full_stats.tar.gz" file will reveal a folder containing 41 csv files, each with the full summary statistics from one EWAS. The results can be linked to the meta-data using the "Results_file" column in "studies-full.csv". These analyses were conducted using data extracted from the Gene Expression Omnibus (GEO). These data were extracted using the geograbi R package. For more information on the EWAS, please consult our paper: Battram, Thomas, et al. "The EWAS Catalog: A Database of Epigenome-wide Association Studies." OSF Preprints, 4 Feb. 2021. https://doi.org/10.31219/osf.io/837wn. Please cite the paper if you use this dataset.
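
    The linking step described above can be sketched with the Python standard library. This is only an outline under stated assumptions: the helper names are hypothetical, and it assumes the "Results_file" column holds each results file's name exactly as it appears in the extracted folder, which the description does not confirm.

```python
import csv
import tarfile
from pathlib import Path


def extract_results(archive_path, dest_dir):
    """Unpack the gzip-compressed results archive into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)


def link_results(metadata_csv, results_dir):
    """Pair each metadata row with the path of its results file (or None)."""
    # Index extracted CSV files by name so the folder layout does not matter.
    by_name = {path.name: path for path in Path(results_dir).rglob("*.csv")}
    linked = []
    with open(metadata_csv, newline="") as handle:
        for row in csv.DictReader(handle):
            linked.append((row, by_name.get(row["Results_file"])))
    return linked
```

    With the real files, a call such as extract_results("full_stats.tar.gz", "results") followed by link_results("studies-full.csv", "results") would give one (metadata, path) pair per study; a path of None marks a row whose file was not found.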

  5. CAD protein association summary statistics

    • figshare.com
    csv
    Updated Aug 29, 2025
    Cite
    Sivateja Tangirala (2025). CAD protein association summary statistics [Dataset]. http://doi.org/10.6084/m9.figshare.30006907.v1
    Available download formats: csv
    Dataset updated
    Aug 29, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Sivateja Tangirala
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CSV file containing summary statistics of proteins in association with incident CAD from logistic regression after adjusting for demographics, fasting status, glycemic status, BMI, and HbA1c.

  6. BBC NEWS SUMMARY(CSV FORMAT)

    • kaggle.com
    zip
    Updated Sep 9, 2024
    Cite
    Dhiraj (2024). BBC NEWS SUMMARY(CSV FORMAT) [Dataset]. https://www.kaggle.com/datasets/dignity45/bbc-news-summarycsv-format
    Available download formats: zip (2097600 bytes)
    Dataset updated
    Sep 9, 2024
    Authors
    Dhiraj
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Description: Text Summarization Dataset

    This dataset is designed for users aiming to train models for text summarization. It contains 2,225 rows of data with two columns: "Text" and "Summary". Each row features a detailed news article or piece of text paired with its corresponding summary, providing a rich resource for developing and fine-tuning summarization algorithms.

    Key Features:

    • Text: Full-length articles or passages that serve as the input for summarization.
    • Summary: Concise summaries of the articles, which are ideal for training models to generate brief, coherent summaries from longer texts.

    Future Enhancements:

    This evolving dataset is planned to include additional features, such as text class labels, in future updates. These enhancements will provide more context and facilitate the development of models that can perform summarization across different categories of news content.

    Usage:

    Ideal for researchers and developers focused on text summarization tasks, this dataset enables the training of models to effectively compress information while retaining the essence of the original content.

    Acknowledgment

    We would like to extend our sincere gratitude to the dataset creator for their contribution to this valuable resource. This dataset, sourced from the BBC News Summary dataset on Kaggle, was created by Pariza. Their work has provided an invaluable asset for those working on text summarization tasks, and we appreciate their efforts in curating and sharing this data with the community.

    Thank you for supporting research and development in the field of natural language processing!

    File Description

    This script processes and consolidates text data from various directories containing news articles and their corresponding summaries. It reads the files from specified folders, handles encoding issues, and then creates a DataFrame that is saved as a CSV file for further analysis.

    Key Components:

    1. Imports:

      • numpy (np): Numerical operations library, though it's not used in this script.
      • pandas (pd): Data manipulation and analysis library.
      • os: For interacting with the operating system, e.g., building file paths.
      • glob: For file pattern matching and retrieving file paths.
    2. Function: get_texts

      • Parameters:
        • text_folders: List of folders containing news article text files.
        • text_list: List to store the content of text files.
        • summ_folder: List of folders containing summary text files.
        • sum_list: List to store the content of summary files.
        • encodings: List of encodings to try for reading files.
      • Purpose:
        • Reads text files from specified folders, handles different encodings, and appends the content to text_list and sum_list.
        • Returns the updated lists of texts and summaries.
    3. Data Preparation:

      • text_folder: List of directories for news articles.
      • summ_folder: List of directories for summaries.
      • text_list and summ_list: Initialize empty lists to store the contents.
      • data_df: Empty DataFrame to store the final data.
    4. Execution:

      • Calls get_texts function to populate text_list and summ_list.
      • Creates a DataFrame data_df with columns 'Text' and 'Summary'.
      • Saves data_df to a CSV file at /kaggle/working/bbc_news_data.csv.
    5. Output:

      • Prints the first few entries of the DataFrame to verify the content.

    Column Descriptions:

    • Text: Contains the full-length articles or passages of news content. This column is used as the input for summarization models.
    • Summary: Contains concise summaries of the corresponding articles in the "Text" column. This column is used as the target output for summarization models.

    Usage:

    • This script is designed to be run in a Kaggle environment where paths to text data are predefined.
    • It is intended for preprocessing and saving text data from news articles and summaries for subsequent analysis or model training.
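
    The script itself is not reproduced in this listing, so the following is only a sketch of how the reading step described above might look. The signature is simplified relative to the description (the lists are created inside the function), the fallback encodings are assumptions, and it assumes each summary file shares its file name with the matching article.

```python
import glob
import os


def read_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return the file's text using the first encoding that decodes it."""
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as handle:
                return handle.read()
        except UnicodeDecodeError:
            continue  # try the next encoding in the list
    raise ValueError(f"could not decode {path} with {encodings}")


def get_texts(text_folders, summ_folders, encodings=("utf-8", "latin-1")):
    """Read matching article and summary files into two parallel lists."""
    text_list, summ_list = [], []
    for text_dir, summ_dir in zip(text_folders, summ_folders):
        for text_path in sorted(glob.glob(os.path.join(text_dir, "*.txt"))):
            # Assumption: the summary has the same file name as the article.
            summ_path = os.path.join(summ_dir, os.path.basename(text_path))
            text_list.append(read_with_fallback(text_path, encodings))
            summ_list.append(read_with_fallback(summ_path, encodings))
    return text_list, summ_list
```

    The two returned lists line up row by row, so they could be passed to pandas as pd.DataFrame({"Text": text_list, "Summary": summ_list}) and written out with to_csv, matching the execution step listed above.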
  7. Protected Areas Database of the United States (PAD-US) 3.0 Spatial Analysis...

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 22, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 3.0 Spatial Analysis and Statistics [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-3-0-spatial-analysis-and-statistics
    Explore at:
    Dataset updated
    Oct 22, 2025
    Dataset provided by
    U.S. Geological Survey
    Area covered
    United States
    Description

    Spatial analysis and statistical summaries of the Protected Areas Database of the United States (PAD-US) provide land managers and decision makers with a general assessment of management intent for biodiversity protection, natural resource management, and outdoor recreation access across the nation. This data release presents results from statistical summaries of the PAD-US 3.0 protection status (by GAP Status Code) and public access status for various land unit boundaries (Protected Areas Database of the United States 3.0 Vector Analysis and Summary Statistics). Summary statistics are also available to explore and download (Comma-separated Table [CSV], Microsoft Excel Workbook (.xlsx), Portable Document Format [.pdf] Report) from the PAD-US Lands and Inland Water Statistics Dashboard ( https://www.usgs.gov/programs/gap-analysis-project/science/pad-us-statistics ). The vector GIS analysis file, source data used to summarize statistics for areas of interest to stakeholders (National, State, Department of the Interior Region, Congressional District, County, EcoRegions I-IV, Urban Areas, Landscape Conservation Cooperative), and complete Summary Statistics Tabular Data (CSV) are included in this data release. Raster GIS analysis files are also available for combination with other raster data (Protected Areas Database of the United States (PAD-US) 3.0 Raster Analysis). The PAD-US 3.0 Combined Fee, Designation, Easement feature class in the full inventory, with Military Lands and Tribal Areas from the Proclamation and Other Planning Boundaries feature class (Protected Areas Database of the United States (PAD-US) 3.0, https://doi.org/10.5066/P9Q9LQ4B), was modified to prioritize and remove overlapping management designations, limiting overestimation in protection status or public access statistics and to support user needs for vector and raster analysis data. 
Analysis files in this data release were clipped to the Census State boundary file to define the extent and fill in areas (largely private land) outside the PAD-US, providing a common denominator for statistical summaries.

  8. North Carolina (NC) Stochastic Empirical Loading and Dilution Model (SELDM)...

    • gimi9.com
    Updated May 26, 2019
    + more versions
    Cite
    (2019). North Carolina (NC) Stochastic Empirical Loading and Dilution Model (SELDM) summary statistics for physical and chemical data at NC highway-runoff and bridge-deck sites [Dataset]. https://gimi9.com/dataset/data-gov_north-carolina-nc-stochastic-empirical-loading-and-dilution-model-seldm-summary-statistics
    Explore at:
    Dataset updated
    May 26, 2019
    Area covered
    North Carolina
    Description

    The purpose of this USGS data release is to publish NC SELDM streamflow statistics and summary statistics of physical and chemical data in support of the information provided in the above-referenced report. This data release consists of two data sets, "NC SELDM streamflow statistics..." and "NC SELDM summary statistics for physical and chemical data...". The tables that are uploaded for the "NC SELDM streamflow statistics for 266 streamgages across North Carolina" sub-section are primarily the support files for the StreamStatsDB update that was completed when the report was approved. These files were generated using the GNWISQ and QSTATS computer programs developed and described by Granato (2009, appendices 1 and 4). This is discussed near the end of the "Prestorm streamflow statistics" section in the above-referenced report. A large table of selected site attributes and StreamStats basin characteristics that were compiled for the 266 streamgages is also provided as a part of this data release. A ReadMe file is also included in the sub-section of the data release. The tables that are uploaded for the "NC SELDM summary statistics for physical and chemical data at NC highway-runoff and bridge-deck sites" sub-section of the data release support the statewide medians table (Table 7) discussed within the "Simulating highway-runoff quality" section in the above-referenced report. This is a .csv file for each of the 11 constituents referenced in Table 11. Descriptions of the data fields (or columns) in the .csv tables are provided at the top of each .csv file. A ReadMe file is also included in the sub-section of the data release.

  9. Private rental market summary statistics - October 2014 to September 2015

    • gov.uk
    Updated Aug 15, 2023
    + more versions
    Cite
    Valuation Office Agency (2023). Private rental market summary statistics - October 2014 to September 2015 [Dataset]. https://www.gov.uk/government/statistics/private-rental-market-summary-statistics-england-2014-15
    Explore at:
    Dataset updated
    Aug 15, 2023
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Valuation Office Agency
    Description

    The release presents the mean, median, lower quartile and upper quartile total monthly rent paid, for a number of bedroom/room categories. This covers each local authority in England, for the 12 months to the end of September 2015.

    For further details on the information included in this release, including a glossary of terms and a variable list for the CSV format files, please refer to the statistical summary.
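
    For readers unfamiliar with the four measures reported in this release, the sketch below computes them for a short list of monthly rents. The rent figures are made up, the function name is hypothetical, and the "inclusive" quartile method is an assumption; the release does not state which quartile convention the Valuation Office Agency uses.

```python
import statistics


def rent_summary(rents):
    """Return the mean, lower quartile, median, and upper quartile of rents."""
    lower, median, upper = statistics.quantiles(rents, n=4, method="inclusive")
    return {
        "mean": statistics.mean(rents),
        "lower_quartile": lower,
        "median": median,
        "upper_quartile": upper,
    }


# Made-up monthly rents in pounds for one bedroom category.
summary = rent_summary([450, 500, 550, 600, 900])
print(summary)
```

    In this example the single high rent pulls the mean (600) above the median (550), which is one reason releases like this one report quartiles alongside the mean.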

  10. Real State Website Data

    • kaggle.com
    zip
    Updated Jun 11, 2023
    Cite
    M. Mazhar (2023). Real State Website Data [Dataset]. https://www.kaggle.com/datasets/mazhar01/real-state-website-data/code
    Available download formats: zip (228356 bytes)
    Dataset updated
    Jun 11, 2023
    Authors
    M. Mazhar
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Check: End-to-End Regression Model Pipeline Development with FastAPI: From Data Scraping to Deployment with CI/CD Integration

    This CSV dataset provides comprehensive information about house prices. It consists of 9,819 entries and 54 columns, offering a wealth of features for analysis. The dataset includes various numerical and categorical variables, providing insights into factors that influence house prices.

    The key columns in the dataset are as follows:

    • Location1: The location of the houses. The Location2 column is an identical or shorter version of Location1.
    • Year: The year of construction.
    • Type: The type of the house.
    • Bedrooms: The number of bedrooms in the house.
    • Bathrooms: The number of bathrooms in the house.
    • Size_in_SqYds: The size of the house in square yards.
    • Price: The price of the house.
    • Parking_Spaces: The number of parking spaces available.
    • Floors_in_Building: The number of floors in the building.
    • Elevators: The presence of elevators in the building.
    • Lobby_in_Building: The presence of a lobby in the building.

    In addition to these, the dataset contains several other features related to various amenities and facilities available in the houses, such as double-glazed windows, central air conditioning, central heating, waste disposal, furnished status, service elevators, and more.

    By performing exploratory data analysis on this dataset using Python and the Pandas library, valuable insights can be gained regarding the relationships between different variables and the impact they have on house prices. Descriptive statistics, data visualization, and feature engineering techniques can be applied to uncover patterns and trends in the housing market.

    This dataset serves as a valuable resource for real estate professionals, analysts, and researchers interested in understanding the factors that contribute to house prices and making informed decisions in the real estate market.

  11. Data to follow the statistical analysis including raw data as CSV files.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1 more
    Updated Feb 21, 2023
    Cite
    Fey, Philipp; Mörchel, Philipp; Haddad, Daniel; Jakob, Peter; Hansmann, Jan; Stebani, Jannik; Weber, Daniel Ludwig; Hiller, Karl-Heinz (2023). Data to follow the statistical analysis including raw data as CSV files. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000951591
    Explore at:
    Dataset updated
    Feb 21, 2023
    Authors
    Fey, Philipp; Mörchel, Philipp; Haddad, Daniel; Jakob, Peter; Hansmann, Jan; Stebani, Jannik; Weber, Daniel Ludwig; Hiller, Karl-Heinz
    Description

    Data that was used to train the SVM. As the train-test data were assigned randomly for every training iteration, the individual data used for generating the subfigures b–e are not separately listed, as these cannot be manually recreated but depend on the train-test assignment by the algorithm. (ZIP)

  12. Participation measures in higher education - Headline stats summary

    • explore-education-statistics.service.gov.uk
    Updated Oct 26, 2023
    + more versions
    Cite
    Department for Education (2023). Participation measures in higher education - Headline stats summary [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/d056c449-6d3c-41cd-bf6b-32d2bf206ec8
    Explore at:
    Dataset updated
    Oct 26, 2023
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    HE Participation by age 25 (CHEP-25) key stats

  13. Full summary statistics from 387 EWAS conducted for the EWAS Catalog

    • data.niaid.nih.gov
    Updated Apr 9, 2021
    Cite
    EWAS Catalog team (2021). Full summary statistics from 387 EWAS conducted for the EWAS Catalog [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4672644
    Explore at:
    Dataset updated
    Apr 9, 2021
    Dataset provided by
    MRC Integrative Epidemiology Unit, University of Bristol
    Authors
    EWAS Catalog team
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Full summary statistics from 387 epigenome-wide association studies (EWAS) conducted by The EWAS Catalog team (http://www.ewascatalog.org/). Meta-data is found in the "studies-full.csv" file and the results are in "full_stats.tar.gz". Unzipping the "full_stats.tar.gz" file will reveal a folder containing 387 csv files, each with the full summary statistics from one EWAS. The results can be linked to the meta-data using the "Results_file" column in "studies-full.csv". These analyses were conducted using data from the Accessible Resource for Integrated Epigenomics Studies (ARIES) subset of the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort. For more information on the EWAS, please consult our paper: Battram, Thomas, et al. "The EWAS Catalog: A Database of Epigenome-wide Association Studies." OSF Preprints, 4 Feb. 2021. https://doi.org/10.31219/osf.io/837wn. Please cite the paper if you use the dataset.

  14. Apprenticeships and traineeships - Public sector target - summary

    • explore-education-statistics.service.gov.uk
    Updated Jan 28, 2021
    + more versions
    Cite
    Department for Education (2021). Apprenticeships and traineeships - Public sector target - summary [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/b748f358-940c-4b4c-a6fe-bea16435d67e
    Explore at:
    Dataset updated
    Jan 28, 2021
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Summary statistics of figures supplied by public sector bodies (up to and including 17 November 2021) covering the four years of the target (period covering 1 April 2017 to 31 March 2021).

    Reporting periods: 2017-18 to 2020-21

    Indicators:

    • Apprentices (prior to period, new in period, at end of period, cumulative since April 2017)
    • Employees (prior to period, new in period, at end of period, cumulative since April 2017)
    • Apprenticeship percentage (prior to period, new in period, at end of period, cumulative since April 2017)
    • Number of employers in the period
    • Percentage of employees starting apprenticeships in period

    Filters: Subsector

  15. NBA Top Scorers (2000-2024)

    • kaggle.com
    zip
    Updated Aug 22, 2024
    Cite
    Nick Alden (2024). NBA Top Scorers (2000-2024) [Dataset]. https://www.kaggle.com/datasets/nickalden/nba-top-scorers-stats-and-shot-details-2000-2024
    Available download formats: zip (2731965 bytes)
    Dataset updated
    Aug 22, 2024
    Authors
    Nick Alden
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Data for the top 10 scorers in the NBA from the years 2000-2024. Scraped using nba_api.

    leaders.csv - General season statistics for each season's top 10 scorers

    shotsXXXXs.csv - Shot details for every made shot from each season's top 10 scorers

    shots2000s.csv - Data from 2000-01 season through 2009-10 season

    shots2010s.csv - Data from 2010-11 season through 2019-20 season

    shots2020s.csv - Data from 2020-21 season through 2023-24 season

  16. LA and school expenditure - Summary data for headline figures

    • explore-education-statistics.service.gov.uk
    Updated Dec 16, 2021
    + more versions
    Cite
    Department for Education (2021). LA and school expenditure - Summary data for headline figures [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/9f43d48f-17f5-48c1-9e28-986577766a26
    Explore at:
    Dataset updated
    Dec 16, 2021
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    This file contains a summary of the headline, national level figures for this publication.

  17. Data from: Data files used to study change dynamics in software systems

    • figshare.swinburne.edu.au
    pdf
    Updated Jul 22, 2024
    Cite
    Rajesh Vasa (2024). Data files used to study change dynamics in software systems [Dataset]. http://doi.org/10.25916/sut.26288227.v1
    Available download formats: pdf
    Dataset updated
    Jul 22, 2024
    Dataset provided by
    Swinburne
    Authors
    Rajesh Vasa
    License

    Attribution 3.0 (CC BY 3.0) (https://creativecommons.org/licenses/by/3.0/)
    License information was derived automatically

    Description

    It is a widely accepted fact that evolving software systems change and grow. However, it is less well-understood how change is distributed over time, specifically in object oriented software systems. The patterns and techniques used to measure growth permit developers to identify specific releases where significant change took place as well as to inform them of the longer term trend in the distribution profile. This knowledge assists developers in recording systemic and substantial changes to a release, as well as to provide useful information as input into a potential release retrospective. However, these analysis methods can only be applied after a mature release of the code has been developed. But in order to manage the evolution of complex software systems effectively, it is important to identify change-prone classes as early as possible. Specifically, developers need to know where they can expect change, the likelihood of a change, and the magnitude of these modifications in order to take proactive steps and mitigate any potential risks arising from these changes.

    Previous research into change-prone classes has identified some common aspects, with different studies suggesting that complex and large classes tend to undergo more changes and classes that changed recently are likely to undergo modifications in the near future. Though the guidance provided is helpful, developers need more specific guidance in order for it to be applicable in practice. Furthermore, the information needs to be available at a level that can help in developing tools that highlight and monitor evolution prone parts of a system as well as support effort estimation activities.

    The specific research questions that we address in this chapter are:

    (1) What is the likelihood that a class will change from a given version to the next? (a) Does this probability change over time? (b) Is this likelihood project specific, or general?
    (2) How is modification frequency distributed for classes that change?
    (3) What is the distribution of the magnitude of change? Are most modifications minor adjustments, or substantive modifications?
    (4) Does structural complexity make a class susceptible to change?
    (5) Does popularity make a class more change-prone?

    We make recommendations that can help developers to proactively monitor and manage change. These are derived from a statistical analysis of change in approximately 55000 unique classes across all projects under investigation. The analysis methods that we applied took into consideration the highly skewed nature of the metric data distributions.

    The raw metric data (4 .txt files and 4 .log files in a .zip file measuring ~2MB in total) is provided as a comma separated values (CSV) file, and the first line of the CSV file contains the header. A detailed output of the statistical analysis undertaken is provided as log files generated directly from Stata (statistical analysis software).
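
As a rough illustration of research question (1), the likelihood of change between two versions can be estimated as the share of classes whose recorded metrics differ. This is a minimal sketch under stated assumptions, not the author's method: the function name, the class names, and the idea that "change" means "any metric value differs" are all hypothetical, and the real data files may define change differently.

```python
from typing import Dict, Tuple

# Hypothetical per-class metric vector, e.g. (method count, field count).
Metrics = Tuple[int, ...]


def change_likelihood(prev: Dict[str, Metrics], curr: Dict[str, Metrics]) -> float:
    """Fraction of classes present in both versions whose metrics differ.

    Classes added or removed between versions are ignored here; that is an
    assumption of this sketch, not something stated in the description.
    """
    shared = prev.keys() & curr.keys()
    if not shared:
        return 0.0
    changed = sum(1 for name in shared if prev[name] != curr[name])
    return changed / len(shared)


# Made-up snapshots of two consecutive versions.
v1 = {"Parser": (10, 3), "Lexer": (5, 1), "Ast": (7, 2), "Util": (2, 0)}
v2 = {"Parser": (12, 3), "Lexer": (5, 1), "Ast": (7, 2), "Util": (2, 0), "Cache": (4, 1)}

print(change_likelihood(v1, v2))  # 1 of the 4 shared classes changed
```

Repeating this for each consecutive version pair would give a series that speaks to sub-question (a), whether the probability changes over time.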

  18. Genome-wide association summary statistics of chronic musculoskeletal pain...

    • zenodo.org
    csv
    Updated May 15, 2020
    Cite
    Yakov A. Tsepilov; Maxim B. Freidin; Alexandra S. Shadrina; Sodbo Z. Sharapov; Elizaveta E. Elgaeva; Jan van Zundert; Lennart C. Karssen; Pradeep Suri; Frances M.K. Williams; Yurii S. Aulchenko (2020). Genome-wide association summary statistics of chronic musculoskeletal pain at four anatomic sites and their genetically independent components [Dataset]. http://doi.org/10.5281/zenodo.3797553
    Available download formats: csv
    Dataset updated
    May 15, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yakov A. Tsepilov; Maxim B. Freidin; Alexandra S. Shadrina; Sodbo Z. Sharapov; Elizaveta E. Elgaeva; Jan van Zundert; Lennart C. Karssen; Pradeep Suri; Frances M.K. Williams; Yurii S. Aulchenko
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    The dataset contains results of a genome-wide association study of distinct chronic musculoskeletal pain conditions: back pain, knee pain, neck pain, and hip pain. Additionally, there are genome-wide association summary statistics for four genetically independent components of pain conditions, listed above. For more details, please, read the paper XXX.

    All files contain association summary statistics for genome-wide association meta-analysis of the 265,000 white British individuals from the UK Biobank and an additional 191,580 individuals of European ancestry from the UK Biobank (total N = 456,580). Cases and controls were defined based on questionnaire responses. First, participants responded to “Pain type(s) experienced in the last months” followed by questions inquiring if the specific pain had been present for more than 3 months. Those who reported back, neck or shoulder, hip, or knee pain lasting more than 3 months were considered chronic back, neck/shoulder, hip, and knee pain cases, respectively. Participants reporting no such pain lasting longer than 3 months were considered controls (regardless of whether they had another regional chronic pain, such as abdominal pain, or not). Individuals who preferred not to answer were excluded from the study. Besides this, we excluded individuals who reported more than 3 months of pain all over the body.

    The data are provided on an "AS-IS" basis, without warranty of any type, expressed or implied, including but not limited to any warranty as to their performance, merchantability, or fitness for any particular purpose. If investigators use these data, any and all consequences are entirely their responsibility. By downloading and using these data, you agree that you will cite the appropriate publication in any communications or publications arising directly or indirectly from these data; for utilization of data available prior to publication, you agree to respect the requested responsibilities of resource users under 2003 Fort Lauderdale principles; you agree that you will never attempt to identify any participant. This research has been conducted using the UK Biobank Resource and the use of the data is guided by the principles formulated by the UK Biobank.

    When using downloaded data, please cite the corresponding paper and this repository:

    1. Tsepilov et al 2020

    Funding:

    The work of YSA and SZS was supported by the Russian Ministry of Education and Science under the 5-100 Excellence Programme and by the Federal Agency of Scientific Organizations via the Institute of Cytology and Genetics (project 0324-2019-0040). The work of YAT, ASSh, and EEE was supported by the Russian Foundation for Basic Research (project 19-015-00151). The contribution of LСK was funded by PolyOmica. Dr. Suri was supported by VA Career Development Award # 1IK2RX001515 from the United States (U.S.) Department of Veterans Affairs Rehabilitation Research and Development (RR&D) Service. Dr. Suri is a Staff Physician at the VA Puget Sound Health Care System. The contents of this work do not represent the views of the U.S. Department of Veterans Affairs or the United States Government.

    List of files:

    1. Back_output_done.csv: GWAS summary statistics for the chronic back pain
    2. gpc1_output_done.csv: GWAS summary statistics for the GIP1
    3. gpc2_output_done.csv: GWAS summary statistics for the GIP2
    4. gpc3_output_done.csv: GWAS summary statistics for the GIP3
    5. gpc4_output_done.csv: GWAS summary statistics for the GIP4
    6. Hip_output_done.csv: GWAS summary statistics for the chronic hip pain
    7. Knee_output_done.csv: GWAS summary statistics for the chronic knee pain
    8. Neck_output_done.csv: GWAS summary statistics for the chronic neck pain

    Column headers:

    1. gwas_id: uninformative field
    2. rs_id: dbSNP rsID (GRCh37 build)
    3. snp_num: uninformative field
    4. chr: chromosome (GRCh37 build)
    5. bp: position (GRCh37 build)
    6. ea: effect allele (coded as "1")
    7. ra: reference allele (coded as "0")
    8. eaf: effect allele frequency
    9. af_ref: uninformative field
    10. beta: effect size of effect allele
    11. se: standard error of effect size
    12. p: P-value of association (without GC correction)
    13. n: total sample size
    14. z: Z-statistic of association
    15. info: uninformative field
    16. af_outlier: uninformative field
    17. pz_outlier: uninformative field
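
The relationship between the beta, se, z, and p columns listed above can be sanity-checked with a short script. This is a sketch under assumptions: it takes z = beta / se and a two-sided normal p-value as the expected relations, which is the usual convention but is not stated in the dataset description, and the sample rows and rsIDs are made up for illustration rather than taken from the files.

```python
import csv
import io
import math

# Made-up rows using a subset of the documented column names.
SAMPLE = """rs_id,beta,se,p,z
rs0000001,0.02,0.01,0.0455,2.0
rs0000002,-0.03,0.01,0.0027,-3.0
"""


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a Z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def inconsistent_rows(csv_text: str, z_tol: float = 1e-6, p_tol: float = 1e-4):
    """Return rs_ids where z != beta/se or p != two-sided normal p(z), within tolerance."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        beta, se = float(row["beta"]), float(row["se"])
        z, p = float(row["z"]), float(row["p"])
        if abs(beta / se - z) > z_tol or abs(two_sided_p(z) - p) > p_tol:
            bad.append(row["rs_id"])
    return bad


print(inconsistent_rows(SAMPLE))
```

To run the same check on a downloaded file such as Back_output_done.csv, the file contents could be read and passed in place of `SAMPLE`; the tolerances may need loosening depending on how the published values were rounded.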
  19. Higher Level Learners in England - Headline stats summary

    • explore-education-statistics.service.gov.uk
    Updated May 25, 2023
    + more versions
    Cite
    Department for Education (2023). Higher Level Learners in England - Headline stats summary [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/5bac16ac-e04c-4be8-9bcd-f5d5c62c34b8
    Dataset updated
    May 25, 2023
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Headline statistics used in the blue summary boxes of the publication.

  20. Widening participation in higher education - Headline Stats

    • explore-education-statistics.service.gov.uk
    Updated Jul 28, 2022
    + more versions
    Cite
    Department for Education (2022). Widening participation in higher education - Headline Stats [Dataset]. https://explore-education-statistics.service.gov.uk/data-catalogue/data-set/26d4d206-269f-4271-9ee6-5f24d8f17d28
    Dataset updated
    Jul 28, 2022
    Dataset authored and provided by
    Department for Education (https://gov.uk/dfe)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Key summary statistics

    - Explore Education Statistics data set Headline Stats from Widening participation in higher education

Cite
Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark (2022). Data from: tableone: An open source Python package for producing summary statistics for research papers [Dataset]. http://doi.org/10.5061/dryad.26c4s35

Data from: tableone: An open source Python package for producing summary statistics for research papers

Available download formats: csv, txt
Dataset updated
May 30, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Tom J. Pollard; Alistair E. W. Johnson; Jesse D. Raffa; Roger G. Mark
License

CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically

Description

Objectives: In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for providing summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.

Materials and Methods: The tableone package is developed following good practice guidelines for scientific computing and all code is made available under a permissive MIT License. A testing framework runs on a continuous integration server, helping to maintain code stability. Issues are tracked openly and public contributions are encouraged.

Results: The tableone software package automatically compiles summary statistics into publishable formats such as CSV, HTML, and LaTeX. An executable Jupyter Notebook demonstrates application of the package to a subset of data from the MIMIC-III database. Tests such as Tukey's rule for outlier detection and Hartigan's Dip Test for modality are computed to highlight potential issues in summarizing the data.

Discussion and Conclusion: We present open source software for researchers to facilitate carrying out reproducible studies in Python, an increasingly popular language in scientific research. The toolkit is intended to mature over time with community feedback and input. Development of a common tool for summarizing data may help to promote good practice when used as a supplement to existing guidelines and recommendations. We encourage use of tableone alongside other methods of descriptive statistics and, in particular, visualization to ensure appropriate data handling. We also suggest seeking guidance from a statistician when using tableone for a research study, especially prior to submitting the study for publication.
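
The Results paragraph mentions Tukey's rule for outlier detection. A minimal standard-library sketch of that rule follows; it is not the tableone implementation, which may compute quartiles differently or use other thresholds. The function name `tukey_outliers` and the sample values are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule).

    Quartiles use the 'inclusive' method of statistics.quantiles; other
    quartile definitions can flag slightly different points.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up measurements with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_outliers(sample))
```

Flagged values are candidates for review rather than automatic exclusion, which fits the description's advice to pair summary statistics with visualization and statistical guidance.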
