100+ datasets found
  1. Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects...

    • figshare.com
    txt
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts in filtering those projects to curate ML projects of high quality. The limited availability of such high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidences of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide "NICHE.csv" file that contains the list of the project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

    GitHub page: https://github.com/soarsmu/NICHE

  2. Social Media and Mental Health

    • kaggle.com
    zip
    Updated Jul 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SouvikAhmed071 (2023). Social Media and Mental Health [Dataset]. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
    Explore at:
    zip(10944 bytes)Available download formats
    Dataset updated
    Jul 18, 2023
    Authors
    SouvikAhmed071
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    This dataset was originally collected for a data science and machine learning project that aimed at investigating the potential correlation between the amount of time an individual spends on social media and the impact it has on their mental health.

    The project involves conducting a survey to collect data, organizing the data, and using machine learning techniques to create a predictive model that can determine whether a person should seek professional help based on their answers to the survey questions.

    This project was completed as part of a Statistics course at a university, and the team is currently in the process of writing a report and completing a paper that summarizes and discusses the findings in relation to other research on the topic.

    The following is the Google Colab link to the project, done on Jupyter Notebook -

    https://colab.research.google.com/drive/1p7P6lL1QUw1TtyUD1odNR4M6TVJK7IYN

    The following is the GitHub Repository of the project -

    https://github.com/daerkns/social-media-and-mental-health

    Libraries used for the Project -

    Pandas
    Numpy
    Matplotlib
    Seaborn
    Sci-kit Learn
    
  3. N

    Gratis, OH Population Breakdown by Gender Dataset: Male and Female...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). Gratis, OH Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b235d8fd-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    json, csvAvailable download formats
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Gratis
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Gratis by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Gratis across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a slight majority of female population, with 50.0% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of the gender in the Gratis is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of Gratis total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Gratis Population by Race & Ethnicity. You can refer the same here

  4. Hydrographic and Impairment Statistics Database: THRB

    • catalog.data.gov
    • datasets.ai
    Updated Nov 25, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Park Service (2025). Hydrographic and Impairment Statistics Database: THRB [Dataset]. https://catalog.data.gov/dataset/hydrographic-and-impairment-statistics-database-thrb
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    National Park Servicehttp://www.nps.gov/
    Description

    Hydrographic and Impairment Statistics (HIS) is a National Park Service (NPS) Water Resources Division (WRD) project established to track certain goals created in response to the Government Performance and Results Act of 1993 (GPRA). One water resources management goal established by the Department of the Interior under GRPA requires NPS to track the percent of its managed surface waters that are meeting Clean Water Act (CWA) water quality standards. This goal requires an accurate inventory that spatially quantifies the surface water hydrography that each bureau manages and a procedure to determine and track which waterbodies are or are not meeting water quality standards as outlined by Section 303(d) of the CWA. This project helps meet this DOI GRPA goal by inventorying and monitoring in a geographic information system for the NPS: (1) CWA 303(d) quality impaired waters and causes; and (2) hydrographic statistics based on the United States Geological Survey (USGS) National Hydrography Dataset (NHD). Hydrographic and 303(d) impairment statistics were evaluated based on a combination of 1:24,000 (NHD) and finer scale data (frequently provided by state GIS layers).

  5. Training statistics data

    • kaggle.com
    zip
    Updated Aug 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Saleh Zeer (2023). Training statistics data [Dataset]. https://www.kaggle.com/datasets/salehzeer/training-statistics-data
    Explore at:
    zip(187579 bytes)Available download formats
    Dataset updated
    Aug 1, 2023
    Authors
    Saleh Zeer
    Description

    Dataset

    This dataset was created by Saleh Zeer

    Contents

  6. e

    Data from: Cascade Datasets 1-25

    • ore.exeter.ac.uk
    xlsx
    Updated Jul 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Charlotte Miles (2025). Cascade Datasets 1-25 [Dataset]. https://ore.exeter.ac.uk/articles/dataset/Cascade_Datasets_1-25/29673896
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    Jul 30, 2025
    Dataset provided by
    University of Exeter
    Authors
    Charlotte Miles
    License

    https://www.rioxx.net/licenses/all-rights-reservedhttps://www.rioxx.net/licenses/all-rights-reserved

    Description

    Open datasets for the Exeter Cascade Project 1-25.

  7. Z

    Data from: A 24-hour dynamic population distribution dataset based on mobile...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 16, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen (2022). A 24-hour dynamic population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4724388
    Explore at:
    Dataset updated
    Feb 16, 2022
    Dataset provided by
    Elisa Corporation
    Unit of Urban Research and Statistics, City of Helsinki / Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki
    Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki
    Department of Built Environment, Aalto University / Centre for Advanced Spatial Analysis, University College London
    Authors
    Claudia Bergroth; Olle Järv; Henrikki Tenkanen; Matti Manninen; Tuuli Toivonen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Helsinki Metropolitan Area, Finland
    Description

    Related article: Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39.

    In this dataset:

    We present temporally dynamic population distribution data from the Helsinki Metropolitan Area, Finland, at the level of 250 m by 250 m statistical grid cells. Three hourly population distribution datasets are provided for regular workdays (Mon – Thu), Saturdays and Sundays. The data are based on aggregated mobile phone data collected by the biggest mobile network operator in Finland. Mobile phone data are assigned to statistical grid cells using an advanced dasymetric interpolation method based on ancillary data about land cover, buildings and a time use survey. The data were validated by comparing population register data from Statistics Finland for night-time hours and a daytime workplace registry. The resulting 24-hour population data can be used to reveal the temporal dynamics of the city and examine population variations relevant to for instance spatial accessibility analyses, crisis management and planning.

    Please cite this dataset as:

    Bergroth, C., Järv, O., Tenkanen, H., Manninen, M., Toivonen, T., 2022. A 24-hour population distribution dataset based on mobile phone data from Helsinki Metropolitan Area, Finland. Scientific Data 9, 39. https://doi.org/10.1038/s41597-021-01113-4

    Organization of data

    The dataset is packaged into a single Zipfile Helsinki_dynpop_matrix.zip which contains following files:

    HMA_Dynamic_population_24H_workdays.csv represents the dynamic population for average workday in the study area.

    HMA_Dynamic_population_24H_sat.csv represents the dynamic population for average saturday in the study area.

    HMA_Dynamic_population_24H_sun.csv represents the dynamic population for average sunday in the study area.

    target_zones_grid250m_EPSG3067.geojson represents the statistical grid in ETRS89/ETRS-TM35FIN projection that can be used to visualize the data on a map using e.g. QGIS.

    Column names

    YKR_ID : a unique identifier for each statistical grid cell (n=13,231). The identifier is compatible with the statistical YKR grid cell data by Statistics Finland and Finnish Environment Institute.

    H0, H1 ... H23 : Each field represents the proportional distribution of the total population in the study area between grid cells during a one-hour period. In total, 24 fields are formatted as “Hx”, where x stands for the hour of the day (values ranging from 0-23). For example, H0 stands for the first hour of the day: 00:00 - 00:59. The sum of all cell values for each field equals to 100 (i.e. 100% of total population for each one-hour period)

    In order to visualize the data on a map, the result tables can be joined with the target_zones_grid250m_EPSG3067.geojson data. The data can be joined by using the field YKR_ID as a common key between the datasets.

    License Creative Commons Attribution 4.0 International.

    Related datasets

    Järv, Olle; Tenkanen, Henrikki & Toivonen, Tuuli. (2017). Multi-temporal function-based dasymetric interpolation tool for mobile phone data. Zenodo. https://doi.org/10.5281/zenodo.252612

    Tenkanen, Henrikki, & Toivonen, Tuuli. (2019). Helsinki Region Travel Time Matrix [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3247564

  8. NIST Statistical Reference Datasets - SRD 140

    • data.nist.gov
    • gimi9.com
    • +4more
    Updated Nov 20, 2003
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    William F. Guthrie (2003). NIST Statistical Reference Datasets - SRD 140 [Dataset]. http://doi.org/10.18434/T43G6C
    Explore at:
    Dataset updated
    Nov 20, 2003
    Dataset provided by
    National Institute of Standards and Technologyhttp://www.nist.gov/
    Authors
    William F. Guthrie
    License

    https://www.nist.gov/open/licensehttps://www.nist.gov/open/license

    Description

    The purpose of this project is to improve the accuracy of statistical software by providing reference datasets with certified computational results that enable the objective evaluation of statistical software. Currently datasets and certified values are provided for assessing the accuracy of software for univariate statistics, linear regression, nonlinear regression, and analysis of variance. The collection includes both generated and 'real-world' data of varying levels of difficulty. Generated datasets are designed to challenge specific computations. These include the classic Wampler datasets for testing linear regression algorithms and the Simon & Lesage datasets for testing analysis of variance algorithms. Real-world data include challenging datasets such as the Longley data for linear regression, and more benign datasets such as the Daniel & Wood data for nonlinear regression. Certified values are 'best-available' solutions. The certification procedure is described in the web pages for each statistical method. Datasets are ordered by level of difficulty (lower, average, and higher). Strictly speaking the level of difficulty of a dataset depends on the algorithm. These levels are merely provided as rough guidance for the user. Producing correct results on all datasets of higher difficulty does not imply that your software will pass all datasets of average or even lower difficulty. Similarly, producing correct results for all datasets in this collection does not imply that your software will do the same for your particular dataset. It will, however, provide some degree of assurance, in the sense that your package provides correct results for datasets known to yield incorrect results for some software. The Statistical Reference Datasets is also supported by the Standard Reference Data Program.

  9. World Bank: Education Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    World Bank (2019). World Bank: Education Data [Dataset]. https://www.kaggle.com/datasets/theworldbank/world-bank-intl-education
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset provided by
    World Bank Grouphttp://www.worldbank.org/
    Authors
    World Bank
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The World Bank is an international financial institution that provides loans to countries of the world for capital projects. The World Bank's stated goal is the reduction of poverty. Source: https://en.wikipedia.org/wiki/World_Bank

    Content

    This dataset combines key education statistics from a variety of sources to provide a look at global literacy, spending, and access.

    For more information, see the World Bank website.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:world_bank_health_population

    http://data.worldbank.org/data-catalog/ed-stats

    https://cloud.google.com/bigquery/public-data/world-bank-education

    Citation: The World Bank: Education Statistics

    Dataset Source: World Bank. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy - and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @till_indeman from Unplash.

    Inspiration

    Of total government spending, what percentage is spent on education?

  10. d

    Community Services Statistics, July 2025

    • digital.nhs.uk
    csv, xlsx, zip
    Updated Oct 7, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Community Services Statistics, July 2025 [Dataset]. https://digital.nhs.uk/data-and-information/publications/statistical/community-services-statistics-for-children-young-people-and-adults/july-2025
    Explore at:
    zip(2.9 MB), csv(1.4 MB), zip(2.8 MB), csv(6.2 MB), xlsx(226.2 kB)Available download formats
    Dataset updated
    Oct 7, 2025
    License

    https://digital.nhs.uk/about-nhs-digital/terms-and-conditionshttps://digital.nhs.uk/about-nhs-digital/terms-and-conditions

    Time period covered
    Jul 1, 2025 - Jul 31, 2025
    Area covered
    England
    Description

    Contains data on Community Services Statistics for July 2025 and a provisional data file for August 2025 (note this is intended as an early view until providers submit a refresh of their data).

  11. Website Statistics

    • data.wu.ac.at
    • lcc.portaljs.com
    • +2more
    csv, pdf
    Updated Jun 11, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lincolnshire County Council (2018). Website Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/M2ZkZDBjOTUtMzNhYi00YWRjLWI1OWMtZmUzMzA5NjM0ZTdk
    Explore at:
    csv, pdfAvailable download formats
    Dataset updated
    Jun 11, 2018
    Dataset provided by
    Lincolnshire County Councilhttp://www.lincolnshire.gov.uk/
    License

    Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    This Website Statistics dataset has four resources showing usage of the Lincolnshire Open Data website. Web analytics terms used in each resource are defined in their accompanying Metadata file.

    • Website Usage Statistics: This document shows a statistical summary of usage of the Lincolnshire Open Data site for the latest calendar year.

    • Website Statistics Summary: This dataset shows a website statistics summary for the Lincolnshire Open Data site for the latest calendar year.

    • Webpage Statistics: This dataset shows statistics for individual Webpages on the Lincolnshire Open Data site by calendar year.

    • Dataset Statistics: This dataset shows cumulative totals for Datasets on the Lincolnshire Open Data site that have also been published on the national Open Data site Data.Gov.UK - see the Source link.

      Note: Website and Webpage statistics (the first three resources above) show only UK users, and exclude API calls (automated requests for datasets). The Dataset Statistics are confined to users with javascript enabled, which excludes web crawlers and API calls.

    These Website Statistics resources are updated annually in January by the Lincolnshire County Council Business Intelligence team. For any enquiries about the information contact opendata@lincolnshire.gov.uk.

  12. Data generation volume worldwide 2010-2029

    • statista.com
    Updated Nov 19, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Data generation volume worldwide 2010-2029 [Dataset]. https://www.statista.com/statistics/871513/worldwide-data-created/
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    Worldwide
    Description

    The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly. While it was estimated at ***** zettabytes in 2025, the forecast for 2029 stands at ***** zettabytes. Thus, global data generation will triple between 2025 and 2029. Data creation has been expanding continuously over the past decade. In 2020, the growth was higher than previously expected, caused by the increased demand due to the coronavirus (COVID-19) pandemic, as more people worked and learned from home and used home entertainment options more often.

  13. NCES Academic Library Survey Dataset 1996 - 2020 -- alsMERGE_2020.csv

    • figshare.com
    txt
    Updated Jan 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Starr Hoffman (2024). NCES Academic Library Survey Dataset 1996 - 2020 -- alsMERGE_2020.csv [Dataset]. http://doi.org/10.6084/m9.figshare.25007429.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jan 16, 2024
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Starr Hoffman
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data from the National Center for Education Statistics' Academic Library Survey, which was gathered every two years from 1996 - 2014, and annually in IPEDS starting in 2014 (this dataset has continued to only merge data every two years, following the original schedule). This data was merged, transformed, and used for research by Starr Hoffman and Samantha Godbey.This data was merged using R; R scripts for this merge can be made available upon request. Some variables changed names or definitions during this time; a view of these variables over time is provided in the related Figshare Project. Carnegie Classification changed several times during this period; all Carnegie classifications were crosswalked to the 2000 classification version; that information is also provided in the related Figshare Project. This data was used for research published in several articles, conference papers, and posters starting in 2018 (some of this research used an older version of the dataset which was deposited in the University of Nevada, Las Vegas's repository).SourcesAll data sources were downloaded from the National Center for Education Statistics website https://nces.ed.gov/. Individual datasets and years accessed are listed below.[dataset] U.S. Department of Education, National Center for Education Statistics, Academic Libraries component, Integrated Postsecondary Education Data System (IPEDS), (2020, 2018, 2016, 2014), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7[dataset] U.S. Department of Education, National Center for Education Statistics, Academic Libraries Survey (ALS) Public Use Data File, Library Statistics Program, (2012, 2010, 2008, 2006, 2004, 2002, 2000, 1998, 1996), https://nces.ed.gov/surveys/libraries/aca_data.asp[dataset] U.S. Department of Education, National Center for Education Statistics, Institutional Characteristics component, Integrated Postsecondary Education Data System (IPEDS), (2020, 2018, 2016, 2014), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7[dataset] U.S. Department of Education, National Center for Education Statistics, Fall Enrollment component, Integrated Postsecondary Education Data System (IPEDS), (2020, 2018, 2016, 2014, 2012, 2010, 2008, 2006, 2004, 2002, 2000, 1998, 1996), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7[dataset] U.S. Department of Education, National Center for Education Statistics, Human Resources component, Integrated Postsecondary Education Data System (IPEDS), (2020, 2018, 2016, 2014, 2012, 2010, 2008, 2006), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7[dataset] U.S. Department of Education, National Center for Education Statistics, Employees Assigned by Position component, Integrated Postsecondary Education Data System (IPEDS), (2004, 2002), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7[dataset] U.S. Department of Education, National Center for Education Statistics, Fall Staff component, Integrated Postsecondary Education Data System (IPEDS), (1999, 1997, 1995), https://nces.ed.gov/ipeds/datacenter/login.aspx?gotoReportId=7

  14. Hydrographic and Impairment Statistics Database: ULSG

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Nov 25, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    National Park Service (2025). Hydrographic and Impairment Statistics Database: ULSG [Dataset]. https://catalog.data.gov/dataset/hydrographic-and-impairment-statistics-database-ulsg
    Explore at:
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    National Park Servicehttp://www.nps.gov/
    Description

    Hydrographic and Impairment Statistics (HIS) is a National Park Service (NPS) Water Resources Division (WRD) project established to track certain goals created in response to the Government Performance and Results Act of 1993 (GPRA). One water resources management goal established by the Department of the Interior under GRPA requires NPS to track the percent of its managed surface waters that are meeting Clean Water Act (CWA) water quality standards. This goal requires an accurate inventory that spatially quantifies the surface water hydrography that each bureau manages and a procedure to determine and track which waterbodies are or are not meeting water quality standards as outlined by Section 303(d) of the CWA. This project helps meet this DOI GRPA goal by inventorying and monitoring in a geographic information system for the NPS: (1) CWA 303(d) quality impaired waters and causes; and (2) hydrographic statistics based on the United States Geological Survey (USGS) National Hydrography Dataset (NHD). Hydrographic and 303(d) impairment statistics were evaluated based on a combination of 1:24,000 (NHD) and finer scale data (frequently provided by state GIS layers).

  15. Dataset of the paper: "How do Hugging Face Models Document Datasets, Bias,...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Jan 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Federica Pepe; Vittoria Nardone; Vittoria Nardone; Antonio Mastropaolo; Antonio Mastropaolo; Gerardo Canfora; Gerardo Canfora; Gabriele BAVOTA; Gabriele BAVOTA; Massimiliano Di Penta; Massimiliano Di Penta; Federica Pepe (2024). Dataset of the paper: "How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study" [Dataset]. http://doi.org/10.5281/zenodo.10058142
    Explore at:
    zipAvailable download formats
    Dataset updated
    Jan 16, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Federica Pepe; Vittoria Nardone; Vittoria Nardone; Antonio Mastropaolo; Antonio Mastropaolo; Gerardo Canfora; Gerardo Canfora; Gabriele BAVOTA; Gabriele BAVOTA; Massimiliano Di Penta; Massimiliano Di Penta; Federica Pepe
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This replication package contains datasets and scripts related to the paper: "*How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study*"

    ## Root directory

    - `statistics.r`: R script used to compute the correlation between usage and downloads, and the RQ1/RQ2 inter-rater agreements

    - `modelsInfo.zip`: zip file containing all the downloaded model cards (in JSON format)

    - `script`: directory containing all the scripts used to collect and process data. For further details, see README file inside the script directory.

    ## Dataset

    - `Dataset/Dataset_HF-models-list.csv`: list of HF models analyzed

    - `Dataset/Dataset_github-prj-list.txt`: list of GitHub projects using the *transformers* library

    - `Dataset/Dataset_github-Prj_model-Used.csv`: contains usage pairs: project, model

    - `Dataset/Dataset_prj-num-models-reused.csv`: number of models used by each GitHub project

    - `Dataset/Dataset_model-download_num-prj_correlation.csv` contains, for each model used by GitHub projects: the name, the task, the number of reusing projects, and the number of downloads

    ## RQ1

    - `RQ1/RQ1_dataset-list.txt`: list of HF datasets

    - `RQ1/RQ1_datasetSample.csv`: sample set of models used for the manual analysis of datasets

    - `RQ1/RQ1_analyzeDatasetTags.py`: Python script to analyze model tags for the presence of datasets. it requires to unzip the `modelsInfo.zip` in a directory with the same name (`modelsInfo`) at the root of the replication package folder. Produces the output to stdout. To redirect in a file fo be analyzed by the `RQ2/countDataset.py` script

    - `RQ1/RQ1_countDataset.py`: given the output of `RQ2/analyzeDatasetTags.py` (passed as argument) produces, for each model, a list of Booleans indicating whether (i) the model only declares HF datasets, (ii) the model only declares external datasets, (iii) the model declares both, and (iv) the model is part of the sample for the manual analysis

    - `RQ1/RQ1_datasetTags.csv`: output of `RQ2/analyzeDatasetTags.py`

    - `RQ1/RQ1_dataset_usage_count.csv`: output of `RQ2/countDataset.py`

    ## RQ2

    - `RQ2/tableBias.pdf`: table detailing the number of occurrences of different types of bias by model Task

    - `RQ2/RQ2_bias_classification_sheet.csv`: results of the manual labeling

    - `RQ2/RQ2_isBiased.csv`: file to compute the inter-rater agreement of whether or not a model documents Bias

    - `RQ2/RQ2_biasAgrLabels.csv`: file to compute the inter-rater agreement related to bias categories

    - `RQ2/RQ2_final_bias_categories_with_levels.csv`: for each model in the sample, this file lists (i) the bias leaf category, (ii) the first-level category, and (iii) the intermediate category

    ## RQ3

    - `RQ3/RQ3_LicenseValidation.csv`: manual validation of a sample of licenses

    - `RQ3/RQ3_{NETWORK-RESTRICTIVE|RESTRICTIVE|WEAK-RESTRICTIVE|PERMISSIVE}-license-list.txt`: lists of licenses with different permissiveness

    - `RQ3/RQ3_prjs_license.csv`: for each project linked to models, among other fields it indicates the license tag and name

    - `RQ3/RQ3_models_license.csv`: for each model, indicates among other pieces of info, whether the model has a license, and if yes what kind of license

    - `RQ3/RQ3_model-prj-license_contingency_table.csv`: usage contingency table between projects' licenses (columns) and models' licenses (rows)

    - `RQ3/RQ3_models_prjs_licenses_with_type.csv`: pairs project-model, with their respective licenses and permissiveness level

    ## scripts

    Contains the scripts used to mine Hugging Face and GitHub. Details are in the enclosed README

  16. N

    Comprehensive Income by Age Group Dataset: Longitudinal Analysis of Gate, OK...

    • neilsberg.com
    Updated Aug 7, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Comprehensive Income by Age Group Dataset: Longitudinal Analysis of Gate, OK Household Incomes Across 4 Age Groups and 16 Income Brackets. Annual Editions Collection // 2024 Edition [Dataset]. https://www.neilsberg.com/research/datasets/2ecf2ac3-aeee-11ee-aaca-3860777c1fe6/
    Explore at:
    Dataset updated
    Aug 7, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Oklahoma, Gate
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the Gate household income by age. The dataset can be utilized to understand the age-based income distribution of Gate income.

    Content

    The dataset will have the following datasets when applicable

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • Gate, OK annual median income by age groups dataset (in 2022 inflation-adjusted dollars)
    • Age-wise distribution of Gate, OK household incomes: Comparative analysis across 16 income brackets

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Gate income distribution by age. You can refer the same here

  17. Family food datasets

    • gov.uk
    Updated Nov 6, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Department for Environment, Food & Rural Affairs (2025). Family food datasets [Dataset]. https://www.gov.uk/government/statistical-data-sets/family-food-datasets
    Explore at:
    Dataset updated
    Nov 6, 2025
    Dataset provided by
    GOV.UKhttp://gov.uk/
    Authors
    Department for Environment, Food & Rural Affairs
    Description

    These family food datasets contain more detailed information than the ‘Family Food’ report and mainly provide statistics from 2001 onwards. The UK household purchases and the UK household expenditure spreadsheets include statistics from 1974 onwards. These spreadsheets are updated annually when a new edition of the ‘Family Food’ report is published.

    The ‘purchases’ spreadsheets give the average quantity of food and drink purchased per person per week for each food and drink category. The ‘nutrient intake’ spreadsheets give the average nutrient intake (e.g. energy, carbohydrates, protein, fat, fibre, minerals and vitamins) from food and drink per person per day. The ‘expenditure’ spreadsheets give the average amount spent in pence per person per week on each type of food and drink. Several different breakdowns are provided in addition to the UK averages including figures by region, income, household composition and characteristics of the household reference person.

    UK (updated with new FYE 2024 data)

    countries and regions (CR) (updated with new FYE 2024 data)

    equivalised income decile group (EID) (updated with new FYE 2024 data)

  18. N

    Troy, IN Population Breakdown by Gender Dataset: Male and Female Population...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). Troy, IN Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b2585f9c-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    json, csvAvailable download formats
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Troy, IN
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Troy by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Troy across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a majority of female population, with 59.37% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of the gender in the Troy is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of Troy total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Troy Population by Race & Ethnicity. You can refer the same here

  19. Immigration statistics data tables, year ending June 2020

    • gov.uk
    Updated Aug 27, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Home Office (2020). Immigration statistics data tables, year ending June 2020 [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-statistics-data-tables-year-ending-june-2020
    Explore at:
    Dataset updated
    Aug 27, 2020
    Dataset provided by
    GOV.UKhttp://gov.uk/
    Authors
    Home Office
    Description

    The Home Office has changed the format of the published data tables for a number of areas (asylum and resettlement, entry clearance visas, extensions, citizenship, returns, detention, and sponsorship). These now include summary tables, and more detailed datasets (available on a separate page, link below). A list of all available datasets on a given topic can be found in the ‘Contents’ sheet in the ‘summary’ tables. Information on where to find historic data in the ‘old’ format is in the ‘Notes’ page of the ‘summary’ tables.

    The Home Office intends to make these changes in other areas in the coming publications. If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

    Related content

    Immigration statistics, year ending June 2020
    Immigration Statistics Quarterly Release
    Immigration Statistics User Guide
    Publishing detailed data tables in migration statistics
    Policy and legislative changes affecting migration to the UK: timeline
    Immigration statistics data archives

    Asylum and resettlement

    https://assets.publishing.service.gov.uk/media/5f6cae16e90e077517f05a5f/asylum-summary-jun-2020-tables.xlsx">Asylum and resettlement summary tables, year ending June 2020 (MS Excel Spreadsheet, 121 KB)

    Detailed asylum and resettlement datasets

    Sponsorship

    https://assets.publishing.service.gov.uk/media/5f3bcb1fe90e0732d9008e25/sponsorship-summary-jun-2020-tables.xlsx">Sponsorship summary tables, year ending June 2020 (MS Excel Spreadsheet, 72.4 KB)

    Detailed sponsorship datasets

    Entry clearance visas granted outside the UK

    https://assets.publishing.service.gov.uk/media/5f3bcb678fa8f5173cc5f9ed/visas-summary-jun-2020-tables.xlsx">Entry clearance visas summary tables, year ending June 2020 (MS Excel Spreadsheet, 64.9 KB)

    Detailed entry clearance visas datasets

    Passenger arrivals (admissions)

    https://assets.publishing.service.gov.uk/media/5f3bcbbae90e0732d9008e26/passenger-arrivals-admissions-summary-jun-2020-tables.xlsx">Passenger arrivals (admissions) summary tables, year ending June 2020 (MS Excel Spreadsheet, 76 KB)

    Detailed Passengers initially refused entry at port datasets

    Extensions

    https://assets.publishing.service.gov.uk/media/5f3bcbf18fa8f51747a88061/extentions-summary-jun-2020-tables.xlsx">Extensions summary tables, year ending June 2020 (MS Excel Spreadsheet, 42.9 KB)

    <a href="https://www.gov.uk/government/statistical-data-sets/managed-

  20. N

    Minnesota Population Breakdown by Gender Dataset: Male and Female Population...

    • neilsberg.com
    csv, json
    Updated Feb 24, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2025). Minnesota Population Breakdown by Gender Dataset: Male and Female Population Distribution // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/b244d8cf-f25d-11ef-8c1b-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Minnesota
    Variables measured
    Male Population, Female Population, Male Population as Percent of Total Population, Female Population as Percent of Total Population
    Measurement technique
    The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the gender classifications (biological sex) reported by the US Census Bureau. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the population of Minnesota by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Minnesota across both sexes and to determine which sex constitutes the majority.

    Key observations

    There is a slight majority of male population, with 50.09% of total population being male. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Scope of gender :

    Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.

    Variables / Data Columns

    • Gender: This column displays the Gender (Male / Female)
    • Population: The population of the gender in the Minnesota is shown in this column.
    • % of Total Population: This column displays the percentage distribution of each gender as a proportion of Minnesota total population. Please note that the sum of all percentages may not equal one due to rounding of values.

    Good to know

    Margin of Error

    Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

    Custom data

    If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Minnesota Population by Race & Ethnicity. You can refer the same here

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO (2023). NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python [Dataset]. http://doi.org/10.6084/m9.figshare.21967265.v1
Organization logoOrganization logo

Data from: NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python

Related Article
Explore at:
txtAvailable download formats
Dataset updated
May 30, 2023
Dataset provided by
figshare
Figsharehttp://figshare.com/
Authors
Ratnadira Widyasari; Zhou YANG; Ferdian Thung; Sheng Qin Sim; Fiona Wee; Camellia Lok; Jack Phan; Haodi Qi; Constance Tan; Qijin Tay; David LO
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Machine learning (ML) has gained much attention and has been incorporated into our daily lives. While there are numerous publicly available ML projects on open source platforms such as GitHub, there have been limited attempts in filtering those projects to curate ML projects of high quality. The limited availability of such high-quality dataset poses an obstacle to understanding ML projects. To help clear this obstacle, we present NICHE, a manually labelled dataset consisting of 572 ML projects. Based on evidences of good software engineering practices, we label 441 of these projects as engineered and 131 as non-engineered. In this repository we provide "NICHE.csv" file that contains the list of the project names along with their labels, descriptive information for every dimension, and several basic statistics, such as the number of stars and commits. This dataset can help researchers understand the practices that are followed in high-quality ML projects. It can also be used as a benchmark for classifiers designed to identify engineered ML projects.

GitHub page: https://github.com/soarsmu/NICHE

Search
Clear search
Close search
Google apps
Main menu