100+ datasets found
  1. Student Performance Dataset

    • kaggle.com
    Updated Aug 27, 2025
    Cite
    Ghulam Muhammad Nabeel (2025). Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/nabeelqureshitiii/student-performance-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ghulam Muhammad Nabeel
    Description

    📊 Student Performance Dataset (Synthetic, Realistic)

    Overview

    This dataset contains 1,000,000 rows of realistic student performance data, designed for beginners in Machine Learning to practice Linear Regression, model training, and evaluation techniques.

    Each row represents one student with features like study hours, attendance, class participation, and final score.
    The dataset is small, clean, and structured to be beginner-friendly.

    🔑 Columns Description

    • student_id → Unique identifier for each student.
    • weekly_self_study_hours → Average weekly self-study hours (0–40). Generated using a normal distribution centered around 15 hours.
    • attendance_percentage → Attendance percentage (50–100). Simulated with a normal distribution around 85%.
    • class_participation → Score between 0–10 indicating how actively the student participates in class. Generated from a normal distribution centered around 6.
    • total_score → Final performance score (0–100). Calculated as a function of study hours plus random noise, then clipped to 0–100, so it correlates more strongly with study hours than with the other features.
    • grade → Categorical label (A, B, C, D, F) derived from total_score.

    📐 Data Generation Logic

    1. Weekly Study Hours: Modeled using a normal distribution (mean ≈ 15, std ≈ 7), capped between 0 and 40 hours.
    2. Scores: More study hours → higher score. Formula:

    Random noise simulates differences in learning ability, motivation, etc.

    3. Attendance & Participation: Independent but realistic variations added.
    4. Grades: Assigned from scores using thresholds:
    • A: ≥ 85
    • B: ≥ 70
    • C: ≥ 55
    • D: ≥ 40
    • F: < 40
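
    The generation steps that are fully specified above (study hours and grade thresholds) can be sketched as follows. This is a minimal illustration, not the author's actual generator: the function names and the fixed seed are assumptions, and the score formula is left out because it is not given in the description.

```python
import random

def sample_study_hours(rng, mean=15.0, std=7.0, low=0.0, high=40.0):
    """Draw weekly self-study hours from a normal distribution, clipped to [low, high]."""
    return min(max(rng.gauss(mean, std), low), high)

def assign_grade(total_score):
    """Map a 0-100 score to a letter grade using the thresholds listed above."""
    if total_score >= 85:
        return "A"
    if total_score >= 70:
        return "B"
    if total_score >= 55:
        return "C"
    if total_score >= 40:
        return "D"
    return "F"

# Hypothetical usage: a fixed seed keeps the sample reproducible.
rng = random.Random(0)
hours = [sample_study_hours(rng) for _ in range(1000)]
print(min(hours), max(hours))
print([assign_grade(s) for s in (92, 70, 54.9, 40, 12)])
```

    Clipping with min/max mirrors the "capped between 0 and 40 hours" rule; checking the thresholds from highest to lowest means each score receives the first grade whose lower bound it meets.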

    🎯 How to Use This Dataset

    Regression Tasks

    • Predict total_score from weekly_self_study_hours.
    • Train and evaluate Linear Regression models.
    • Extend to multiple regression using attendance_percentage and class_participation.

    Classification Tasks

    • Predict grade (A–F) using study hours, attendance, and participation.

    Model Evaluation Practice

    • Apply train-test split and cross-validation.
    • Evaluate with MAE, RMSE, R².
    • Compare simple vs. multiple regression.
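
    As a rough illustration of the evaluation metrics named above, the sketch below fits a simple least-squares line and computes MAE, RMSE, and R² on a held-out portion. The helper names and the six data points are made up for illustration and are not taken from the dataset; in practice a library such as scikit-learn would normally be used.

```python
import math

def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return mae, rmse, 1.0 - ss_res / ss_tot

# Illustrative values only (not from the dataset): study hours vs. score.
hours = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
scores = [32.0, 44.0, 61.0, 70.0, 86.0, 95.0]

# A plain hold-out split: fit on the first four points, evaluate on the rest.
slope, intercept = fit_simple_linear(hours[:4], scores[:4])
predictions = [slope * h + intercept for h in hours[4:]]
print(regression_metrics(scores[4:], predictions))
```

    R² is undefined when all held-out targets are identical (ss_tot is zero), so a real evaluation would guard against that case and would shuffle the rows before splitting.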

    ✅ This dataset is intentionally kept simple, so that new ML learners can clearly see the relationship between input features (study, attendance, participation) and output (score/grade).

  2. 1-km monthly mean temperature dataset for china (1901-2023)

    • data.tpdc.ac.cn
    zip
    Updated Jul 18, 2024
    Cite
    Shouzhang PENG (2024). 1-km monthly mean temperature dataset for china (1901-2023) [Dataset]. http://doi.org/10.11888/Meteoro.tpdc.270961
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    TPDC
    Authors
    Shouzhang PENG
    Description

    This dataset includes monthly mean temperature data for China at a spatial resolution of 0.0083333 arc degrees (~1 km), from Jan 1901 to Dec 2023. The data are stored in NetCDF format (.nc files). The unit of the data is 0.1 ℃. The dataset was spatially downscaled from CRU TS v4.02 with WorldClim datasets using the Delta downscaling method. It was evaluated against 496 national weather stations across China, and the evaluation indicated that the downscaled dataset is reliable for investigations related to climate change across China. The dataset covers the main land area of China, including the Hong Kong, Macao and Taiwan regions, and excluding islands and reefs in the South China Sea. WGS84 is the recommended coordinate system.
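
    Because values are stored in units of 0.1 ℃, they need to be divided by 10 before use. The sketch below shows that conversion and a rough check of the "~1 km" grid spacing. The function names and the sample raw values are hypothetical, and the figure of about 111 km per degree of latitude is an approximation supplied here, not part of the dataset description.

```python
KM_PER_DEGREE_LATITUDE = 111.0  # approximate; assumption for a rough check

def raw_to_celsius(raw_value):
    """Convert a stored value in units of 0.1 degrees C to degrees C."""
    return raw_value / 10.0

def grid_spacing_km(resolution_degrees=0.0083333):
    """Approximate north-south size of one grid cell in kilometres."""
    return resolution_degrees * KM_PER_DEGREE_LATITUDE

# Hypothetical raw cell values as they might appear in the .nc file.
raw_cells = [153, -57, 0]
print([raw_to_celsius(v) for v in raw_cells])
print(grid_spacing_km())
```

    East-west cell size shrinks with latitude, so the spacing computed here only describes the north-south extent of a cell.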

  3. 1971-2000 mean annual precipitation data set for Louisiana StreamStats

    • catalog.data.gov
    • data.usgs.gov
    Updated Sep 13, 2025
    Cite
    U.S. Geological Survey (2025). 1971-2000 mean annual precipitation data set for Louisiana StreamStats [Dataset]. https://catalog.data.gov/dataset/1971-2000-mean-annual-precipitation-data-set-for-louisiana-streamstats
    Dataset updated
    Sep 13, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Louisiana
    Description

    These data represent mean annual precipitation in the Louisiana StreamStats study area for the period of 1971-2000.

  4. Mean house prices for administrative geographies: HPSSA dataset 12

    • ons.gov.uk
    • cy.ons.gov.uk
    xls
    Updated Sep 20, 2023
    + more versions
    Cite
    Office for National Statistics (2023). Mean house prices for administrative geographies: HPSSA dataset 12 [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/housing/datasets/meanhousepricefornationalandsubnationalgeographiesquarterlyrollingyearhpssadataset12
    Explore at:
    Available download formats: xls
    Dataset updated
    Sep 20, 2023
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Mean price paid for residential property in England and Wales, by property type and administrative geographies. Annual data.

  5. EIGHT COLOR ASTEROID SURVEY MEAN DATA V1.0

    • data.nasa.gov
    • s.cnmilf.com
    • +1 more
    Updated Mar 31, 2025
    + more versions
    Cite
    nasa.gov (2025). EIGHT COLOR ASTEROID SURVEY MEAN DATA V1.0 [Dataset]. https://data.nasa.gov/dataset/eight-color-asteroid-survey-mean-data-v1-0-1079b
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    The eight color asteroid survey provides reflection spectra for minor planets using eight filter passbands. This dataset includes mean data averaged for each of 589 minor planets. The primary data for these minor planets, the response curves for the filters, and the values determined for standard stars are included in other related datasets. The wavelength range covered is 0.33 to 1.04 micrometers.

  6. House Price Regression Dataset

    • kaggle.com
    zip
    Updated Sep 6, 2024
    Cite
    Prokshitha Polemoni (2024). House Price Regression Dataset [Dataset]. https://www.kaggle.com/datasets/prokshitha/home-value-insights
    Explore at:
    Available download formats: zip (27045 bytes)
    Dataset updated
    Sep 6, 2024
    Authors
    Prokshitha Polemoni
    Description

    Home Value Insights: A Beginner's Regression Dataset

    This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.

    Features:

    1. Square_Footage: The size of the house in square feet. Larger homes typically have higher prices.
    2. Num_Bedrooms: The number of bedrooms in the house. More bedrooms generally increase the value of a home.
    3. Num_Bathrooms: The number of bathrooms in the house. Houses with more bathrooms are typically priced higher.
    4. Year_Built: The year the house was built. Older houses may be priced lower due to wear and tear.
    5. Lot_Size: The size of the lot the house is built on, measured in acres. Larger lots tend to add value to a property.
    6. Garage_Size: The number of cars that can fit in the garage. Houses with larger garages are usually more expensive.
    7. Neighborhood_Quality: A rating of the neighborhood’s quality on a scale of 1-10, where 10 indicates a high-quality neighborhood. Better neighborhoods usually command higher prices.
    8. House_Price (Target Variable): The price of the house, which is the dependent variable you aim to predict.

    Potential Uses:

    1. Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.

    2. Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.

    3. Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.

    4. Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
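
    The feature-engineering idea in item 2 can be sketched as below. The input keys follow the feature names listed above; the derived column names (Price_Per_SqFt, House_Age), the reference year, and the sample row are assumptions made for illustration.

```python
def add_engineered_features(house, reference_year):
    """Return a copy of a house record with two derived features added.

    Price_Per_SqFt and House_Age are hypothetical column names.
    """
    enriched = dict(house)
    enriched["Price_Per_SqFt"] = house["House_Price"] / house["Square_Footage"]
    enriched["House_Age"] = reference_year - house["Year_Built"]
    return enriched

# Illustrative row only; not an actual record from the dataset.
sample_house = {"Square_Footage": 2000, "Year_Built": 1990, "House_Price": 400000.0}
print(add_engineered_features(sample_house, reference_year=2024))
```

    Note that price per square foot is derived from the target variable, so it is useful for exploration but would leak the answer if used as a model input.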

    Versatility:

    • The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.

    • It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.

    • This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
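
    For the one-hot encoding mentioned above, a minimal sketch that treats the 1-10 neighborhood rating as a categorical value might look like this. The helper name is an assumption; libraries such as pandas or scikit-learn provide equivalent functionality.

```python
def one_hot(value, categories):
    """Encode value as a list with a 1 in the position of its category, 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]

# Neighborhood_Quality is rated on a scale of 1-10.
quality_levels = list(range(1, 11))
print(one_hot(3, quality_levels))
```

    Since the rating is ordinal, keeping it as a single numeric column is also a reasonable choice; one-hot encoding is shown here only because the description suggests it as practice.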

  7. Mean Annual Precipitation in West-Central Nevada using the...

    • catalog.data.gov
    • data.usgs.gov
    • +3 more
    Updated Nov 27, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2025). Mean Annual Precipitation in West-Central Nevada using the Precipitation-Zone Method [Dataset]. https://catalog.data.gov/dataset/mean-annual-precipitation-in-west-central-nevada-using-the-precipitation-zone-method
    Dataset updated
    Nov 27, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Nevada
    Description

    This data set contains 1971-2000 mean annual precipitation estimates for west-central Nevada. This is a raster data set developed using the precipitation-zone method, which uses elevation-based regression equations to estimate mean annual precipitation for defined precipitation zones (Lopes and Medina, 2007). This data set is based on the 30-meter National Elevation Dataset. Reference cited: Lopes, T.J., and Medina, R.L., 2007, Precipitation Zones of West-Central Nevada: Journal of Nevada Water Resources Association, v. 4, no. 2, p. 21.

  8. Income Distribution by Quintile: Mean Household Income in Park City, UT

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Park City, UT [Dataset]. https://www.neilsberg.com/research/datasets/94ddc441-7479-11ee-949f-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Park City, Utah
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Park City, UT, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is 22,989, while the mean income of the highest quintile (the 20% of households with the highest income) is 725,204. The top earners thus earn about 32 times as much as the lowest earners.
    • Top 5%: The mean household income for the wealthiest population (top 5%) is 1,289,541, which is 177.82% of the highest-quintile mean and 5609.38% of the lowest-quintile mean.
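
    As a quick check on the figures above, the quoted percentages match the top-5% mean expressed as a percentage of each quintile's mean (a ratio, not an increase). The sketch below reproduces them from the three Park City values given in the text; the function name is an assumption.

```python
def percent_of(value, reference):
    """Express value as a percentage of reference."""
    return value / reference * 100.0

# Mean household incomes quoted above for Park City, UT.
lowest_quintile = 22_989
highest_quintile = 725_204
top_5_percent = 1_289_541

print(round(highest_quintile / lowest_quintile))              # multiple of the lowest quintile
print(round(percent_of(top_5_percent, highest_quintile), 2))  # percent of the highest quintile
print(round(percent_of(top_5_percent, lowest_quintile), 2))   # percent of the lowest quintile
```

    Stated as an increase, the top-5% mean is roughly 78% higher than the highest-quintile mean, which is the ratio minus 100%.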

    Figure: Mean household income by quintiles in Park City, UT (in 2022 inflation-adjusted dollars). https://i.neilsberg.com/ch/park-city-ut-mean-household-income-by-quintiles.jpeg

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column lists the income levels (as given above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Park City median household income.

  9. West Africa Mean Annual Precipitation (CHIRP dataset)

    • data.wu.ac.at
    • data.europa.eu
    wms
    Updated Nov 29, 2016
    Cite
    JRC DataCatalogue (2016). West Africa Mean Annual Precipitation (CHIRP dataset) [Dataset]. https://data.wu.ac.at/odso/drdsi_jrc_ec_europa_eu/YWJlOWQyZWUtY2YwZC00NjYyLTllYjktOGIzNjhjN2I3MTY3
    Explore at:
    Available download formats: wms
    Dataset updated
    Nov 29, 2016
    Dataset provided by
    JRC DataCatalogue
    Description

    Mean Annual Precipitation [mm/year] across West Africa using the Climate Hazards Group Infrared Precipitation with Station data (CHIRP) dataset.

  10. Regional weather in Hong Kong – the latest 1-minute mean air temperature |...

    • data.gov.hk
    Updated Dec 23, 2022
    + more versions
    Cite
    data.gov.hk (2022). Regional weather in Hong Kong – the latest 1-minute mean air temperature | DATA.GOV.HK [Dataset]. https://data.gov.hk/en-data/dataset/hk-hko-rss-latest-one-minute-mean-air-temp
    Dataset updated
    Dec 23, 2022
    Dataset provided by
    data.gov.hk
    Area covered
    Hong Kong
    Description

    Provides regional weather in Hong Kong: the latest 1-minute mean air temperature (the data provided are provisional). Multiple file formats are available for dataset download via API.

  11. Data from: BOREAS AFM-06 Mean Temperature Profile Data

    • data.nasa.gov
    • data.globalchange.gov
    • +5 more
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). BOREAS AFM-06 Mean Temperature Profile Data [Dataset]. https://data.nasa.gov/dataset/boreas-afm-06-mean-temperature-profile-data-85e49
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The BOREAS AFM-06 team from the National Oceanic and Atmospheric Administration Environment Technology Laboratory (NOAA/ETL) operated a 915 MHz wind/Radio Acoustic Sounding System (RASS) profiler system in the Southern Study Area (SSA) near the Old Jack Pine (OJP) tower from 21-May-1994 to 20-Sep-1994. The data set provides temperature profiles at 15 heights, containing the variables of virtual temperature, vertical velocity, the speed of sound, and w-bar.

  12. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1)...

    • catalog.data.gov
    • gimi9.com
    Updated Feb 4, 2025
    Cite
    U.S. Environmental Protection Agency, Office of Research and Development (ORD), Center for Public Health and Environmental Assessment (CPHEA), Pacific Ecological Systems Division (PESD) (2025). The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: Reference Stream Temperature Predictions [Dataset]. https://catalog.data.gov/dataset/the-streamcat-dataset-accumulated-attributes-for-nhdplusv2-version-2-1-catchments-for-the--8d7d3
    Dataset updated
    Feb 4, 2025
    Dataset provided by
    U.S. Environmental Protection Agency, Office of Research and Development (ORD), Center for Public Health and Environmental Assessment (CPHEA), Pacific Ecological Systems Division (PESD)
    Area covered
    Contiguous United States, United States
    Description

    This dataset represents predictions made to individual, local NHDPlusV2 stream segments. Attributes were calculated for every local NHDPlusV2 stream segment. (See Supplementary Info for Glossary of Terms.) These predictions were made to provide estimates of reference-condition stream temperatures in support of the 2008-2009 and 2013-2014 (forthcoming) National Rivers and Streams Assessments. These predictions were based on a set of published models (Hill et al. 2013; http://www.journals.uchicago.edu/doi/abs/10.1899/12-009.1). From Hill et al. (2013): "We modeled 3 ecologically important elements of the thermal regime: mean summer, mean winter, and mean annual stream temperature. These models used a set of least-disturbed USGS stations and sites to model stream temperatures from a set of landscape metrics. To build reference-condition models, we used daily mean ST data obtained from several thousand US Geological Survey temperature sites distributed across the conterminous USA and iteratively modeled ST with Random Forests to identify sites in reference condition." These data are summarized to produce local stream segment-level metrics as a continuous data type.

  13. earthquake dataset

    • kaggle.com
    zip
    Updated Jan 1, 2025
    Share
    Cite
    baki turhan (2025). earthquake dataset [Dataset]. https://www.kaggle.com/datasets/bakiturhan/earthquake-dataset
    Explore at:
    Available download formats: zip (441683 bytes)
    Dataset updated
    Jan 1, 2025
    Authors
    baki turhan
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    This data set is taken from the USGS (U.S. Geological Survey). The USGS serves the Nation as an independent fact-finding agency that collects, monitors, analyzes, and provides scientific understanding about natural resource and natural hazard conditions, issues, and problems. The value of the USGS to the Nation rests on its ability to carry out studies on a national scale and to sustain long-term monitoring and assessment of natural resources and hazards. For additional information, visit the link below.

    https://www.usgs.gov/

    Content

    This dataset contains earthquake data with a magnitude of 4.5+ and an "alert" warning level, recorded between 1976 and 2025. Below is an explanation of the columns included in the dataset:

    • time: The timestamp indicating when the earthquake or event occurred, including the date and time in UTC format.
    • latitude: The geographical latitude of the earthquake's epicenter, measured in degrees.
    • longitude: The geographical longitude of the earthquake's epicenter, measured in degrees.
    • depth: The depth at which the earthquake occurred, typically measured in kilometers below the Earth's surface.
    • mag: The magnitude of the earthquake, representing the energy released by the seismic event. For example, a value of 8.6 indicates a very large earthquake.
    • magType: The type of magnitude measurement used, such as "mww" (Moment Magnitude Scale), which is a common scale for large earthquakes.
    • nst: The number of stations reporting the earthquake, indicating how many seismic stations detected the event.
    • gap: The azimuthal gap, i.e. the largest angle (in degrees) between azimuthally adjacent seismic stations that recorded the earthquake. A smaller gap typically indicates a more reliable calculated horizontal position.
    • dmin: The minimum distance between the earthquake's epicenter and the nearest seismic station, measured in degrees.
    • rms: The root-mean-square (RMS) travel-time residual, in seconds, which measures how well the observed arrival times fit the arrival times predicted for this location.
    • net: The network identifier for the seismic station or data source that reported the earthquake.
    • id: A unique identifier for the earthquake event.
    • updated: The timestamp indicating when the earthquake data was last updated or reviewed.
    • place: The location or region where the earthquake occurred, often including the name of the area or nearby landmarks.
    • type: The type of event, such as "volcanic eruption" or "earthquake."
    • horizontalError: The error associated with the latitude and longitude coordinates of the epicenter, typically measured in kilometers.
    • depthError: The error associated with the depth measurement of the earthquake, typically measured in kilometers.
    • magError: The error associated with the magnitude measurement of the earthquake, representing the uncertainty in the reported magnitude.
    • magNst: The number of stations that contributed to the magnitude estimation.
    • status: The status of the earthquake event, such as "reviewed" or "automatic," indicating whether the data has been verified.
    • locationSource: The source of the location data for the earthquake, such as the seismic network or organization that provided the coordinates.
    • magSource: The source of the magnitude data, such as the network or organization that calculated the magnitude.
    • Alert: The alert level issued for the earthquake, such as "yellow," indicating the severity of the event and the potential for impact or danger.
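
    A minimal sketch of reading such a file and keeping events at or above a magnitude threshold is shown below. It uses only a few of the columns described above; the function name and the three sample rows are hypothetical, and the real file may differ in column order or naming.

```python
import csv
import io

def load_strong_events(csv_text, min_magnitude=4.5):
    """Parse CSV text and keep events whose 'mag' is at least min_magnitude."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["mag"]:
            continue  # skip rows with a missing magnitude
        magnitude = float(row["mag"])
        if magnitude >= min_magnitude:
            events.append({"time": row["time"], "place": row["place"], "mag": magnitude})
    return events

# Hypothetical sample rows for illustration; not taken from the dataset.
SAMPLE_CSV = """time,latitude,longitude,depth,mag,place
2020-01-01T00:00:00Z,10.0,20.0,15.0,5.1,"Example region A"
2020-01-02T00:00:00Z,11.0,21.0,10.0,4.2,"Example region B"
2020-01-03T00:00:00Z,12.0,22.0,30.0,,"Example region C"
"""

for event in load_strong_events(SAMPLE_CSV):
    print(event)
```

    For a file on disk, the same logic applies with csv.DictReader wrapped around an opened file object instead of io.StringIO.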

    Acknowledgements

    Real Time Feeds (Spreadsheet format): courtesy of the U.S. Geological Survey

    Credit: U.S. Geological Survey

    Department of the Interior/USGS

    https://www.usgs.gov/information-policies-and-instructions/copyrights-and-credits

  14. Income Distribution by Quintile: Mean Household Income in Key West, FL

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Key West, FL [Dataset]. https://www.neilsberg.com/research/datasets/94b01938-7479-11ee-949f-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Florida, Key West
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Key West, FL, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is 20,685, while the mean income of the highest quintile (the 20% of households with the highest income) is 351,156. The top earners thus earn about 17 times as much as the lowest earners.
    • Top 5%: The mean household income for the wealthiest population (top 5%) is 730,255, which is 207.96% of the highest-quintile mean and 3530.36% of the lowest-quintile mean.

    Figure: Mean household income by quintiles in Key West, FL (in 2022 inflation-adjusted dollars). https://i.neilsberg.com/ch/key-west-fl-mean-household-income-by-quintiles.jpeg

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column lists the income levels (as given above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Key West median household income.

  15. Reconstructed Global Mean Sea Level 1900-2018

    • podaac.jpl.nasa.gov
    • cmr.earthdata.nasa.gov
    • +1 more
    html
    Updated Aug 14, 2020
    + more versions
    Cite
    PO.DAAC (2020). Reconstructed Global Mean Sea Level 1900-2018 [Dataset]. http://doi.org/10.5067/GMSLT-FJPL1
    Explore at:
    Available download formats: html
    Dataset updated
    Aug 14, 2020
    Dataset provided by
    PO.DAAC
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Variables measured
    SEA SURFACE HEIGHT
    Description

    This dataset contains reconstructed global-mean sea level evolution and the estimated contributing processes over 1900-2018. Reconstructed sea level is based on annual-mean tide-gauge observations and uses the virtual-station method to aggregate the individual observations into a global estimate. The contributing processes consist of thermosteric changes, glacier mass changes, mass changes of the Greenland and Antarctic Ice Sheets, and terrestrial water storage changes. The glacier, ice sheet, and terrestrial water storage contributions are estimated by combining GRACE observations (2003-2018) with long-term estimates from in-situ observations and models. Steric estimates are based on in-situ temperature profiles. The upper and lower bounds represent the 5 and 95 percent confidence levels. The numbers are equal to the ones presented in Frederikse et al., The causes of sea-level rise since 1900, Nature, 2020. This dataset was produced by the Heat and Ocean Mass from Gravity ESDR (HOMAGE) project, with funding from MeASUREs-2017. HOMAGE is combining satellite observations to create a set of ESDRs that provide a homogeneous basis for accurate and current quantification of the planetary sea level budget, ocean heat content, and large-scale ocean transport variations.

  16. Income Distribution by Quintile: Mean Household Income in Lake City, MI

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Lake City, MI [Dataset]. https://www.neilsberg.com/research/datasets/94b36d77-7479-11ee-949f-3860777c1fe6/
    Explore at:
    Available download formats: json, csv
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Lake City, Michigan
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Lake City, MI, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is 17,899, while the mean income of the highest quintile (the 20% of households with the highest income) is 161,779. The highest quintile therefore earns about 9 times as much as the lowest quintile.
    • Top 5%: The mean household income for the wealthiest households (top 5%) is 235,068, which is 145.30% of the highest-quintile mean and 1313.30% of the lowest-quintile mean.
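
    The figures quoted in these observations can be rechecked with a few lines of arithmetic. The sketch below is illustrative only (the function name is hypothetical); it computes the ratio, the "percent of", and the "percent higher than" for two means. The quoted 145.30% and 1313.30% correspond to the "percent of" reading.

```python
def compare_means(value: float, baseline: float) -> tuple[float, float, float]:
    """Return (ratio, percent_of, percent_higher) of value relative to baseline."""
    ratio = value / baseline
    percent_of = ratio * 100.0              # value expressed as a percentage of baseline
    percent_higher = (ratio - 1.0) * 100.0  # how far value exceeds baseline, in percent
    return ratio, percent_of, percent_higher


# Means quoted above for Lake City, MI
lowest, highest, top5 = 17_899, 161_779, 235_068

print(compare_means(highest, lowest))  # highest quintile vs. lowest quintile
print(compare_means(top5, highest))    # top 5% vs. highest quintile
print(compare_means(top5, lowest))     # top 5% vs. lowest quintile
```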

    Figure: Mean household income by quintiles in Lake City, MI (in 2022 inflation-adjusted dollars). Image: https://i.neilsberg.com/ch/lake-city-mi-mean-household-income-by-quintiles.jpeg

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: The income level (one of the levels listed above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Lake City median household income.

  17. Monthly Mean Temperature Data for Major US Cities

    • kaggle.com
    zip
    Updated Mar 12, 2023
    Cite
    Garrick Hague (2023). Monthly Mean Temperature Data for Major US Cities [Dataset]. https://www.kaggle.com/datasets/garrickhague/temp-data-of-prominent-us-cities-from-1948-to-2022
    Explore at:
    zip, 93354 bytes (available download formats)
    Dataset updated
    Mar 12, 2023
    Authors
    Garrick Hague
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    The monthly mean temperature data in this dataset was obtained from the Climate Prediction Center (CPC) Global Land Surface Air Temperature Analysis, which is provided on a 0.5x0.5 degree grid across the globe, and was loaded into Python using xarray. The data was then filtered to the latitude and longitude coordinates corresponding to each city in the dataset. Each city was matched to its nearest grid point using the 'select' method with nearest-point lookup, so the temperature data may not be located exactly at the city.
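
    The nearest-point lookup described above can be illustrated without any external library. The sketch below is not the author's code: the grid layout (cell centres at 0.25, 0.75, ... and longitudes in the 0-360 convention) and all function names are assumptions made for illustration. In xarray, a lookup of this kind is typically written with `Dataset.sel(..., method="nearest")`.

```python
def nearest_index(coords: list[float], target: float) -> int:
    """Index of the grid coordinate closest to target."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


# Assumed 0.5-degree grid: cell centres, longitudes running from 0 to 360.
LATS = [-89.75 + 0.5 * i for i in range(360)]
LONS = [0.25 + 0.5 * j for j in range(720)]


def nearest_grid_point(lat: float, lon: float) -> tuple[float, float]:
    """Return the (lat, lon) of the grid cell centre nearest to a city."""
    lon = lon % 360.0  # convert -180..180 longitudes to the assumed 0..360 convention
    return LATS[nearest_index(LATS, lat)], LONS[nearest_index(LONS, lon)]


# Approximate coordinates for Chicago; the selected cell centre is close to,
# but not exactly at, the city location.
print(nearest_grid_point(41.88, -87.63))
```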

    The temperature data provides a valuable resource for time series analysis, and if you are interested in obtaining temperature data for additional cities, please let me know. I will also be sharing the source code on GitHub for anyone who would like to reproduce the data or analysis.

  18. Income Distribution by Quintile: Mean Household Income in Hope, New York

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Hope, New York [Dataset]. https://www.neilsberg.com/research/datasets/94a6beef-7479-11ee-949f-3860777c1fe6/
    Explore at:
    json, csv (available download formats)
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Hope
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Hope, New York, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is 19,893, while the mean income of the highest quintile (the 20% of households with the highest income) is 147,661. The highest quintile therefore earns about 7 times as much as the lowest quintile.
    • Top 5%: The mean household income for the wealthiest households (top 5%) is 211,499, which is 143.23% of the highest-quintile mean and 1063.18% of the lowest-quintile mean.

    Figure: Mean household income by quintiles in Hope, New York (in 2022 inflation-adjusted dollars). Image: https://i.neilsberg.com/ch/hope-ny-mean-household-income-by-quintiles.jpeg

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: The income level (one of the levels listed above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Hope town median household income.

  19. Income Distribution by Quintile: Mean Household Income in Central City, PA

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Cite
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Central City, PA [Dataset]. https://www.neilsberg.com/research/datasets/9471051c-7479-11ee-949f-3860777c1fe6/
    Explore at:
    csv, json (available download formats)
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pennsylvania, Central City
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Central City, PA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is 11,912, while the mean income of the highest quintile (the 20% of households with the highest income) is 122,542. The highest quintile therefore earns about 10 times as much as the lowest quintile.
    • Top 5%: The mean household income for the wealthiest households (top 5%) is 163,453, which is 133.39% of the highest-quintile mean and 1372.17% of the lowest-quintile mean.

    Figure: Mean household income by quintiles in Central City, PA (in 2022 inflation-adjusted dollars). Image: https://i.neilsberg.com/ch/central-city-pa-mean-household-income-by-quintiles.jpeg

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: The income level (one of the levels listed above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Central City median household income.

  20. Kokoro Speech Dataset v1.1 Tiny

    • kaggle.com
    zip
    Updated May 14, 2021
    + more versions
    Cite
    Katsuya Iida (2021). Kokoro Speech Dataset v1.1 Tiny [Dataset]. https://www.kaggle.com/datasets/kaiida/kokoro-speech-dataset-v11-tiny
    Explore at:
    zip, 48156884 bytes (available download formats)
    Dataset updated
    May 14, 2021
    Authors
    Katsuya Iida
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Kokoro Speech Dataset

    Kokoro Speech Dataset is a public domain Japanese speech dataset. It contains 34,958 short audio clips of a single speaker reading nine novels. The format of the metadata is similar to that of LJ Speech so that the dataset is compatible with modern speech synthesis systems.

    The texts are from Aozora Bunko, which is in the public domain. The audio clips are from the LibriVox project, which is also in the public domain. Readings are estimated by MeCab and UniDic Lite from the kanji-kana mixture text. Readings are romanized in a format similar to the one used by Julius.

    The audio clips were split and transcripts were aligned automatically by Voice100.

    Sample data

    Listen in your browser or download 100 randomly sampled clips.

    File Format

    Metadata is provided in metadata.csv. This file consists of one record per line, delimited by the pipe character (0x7c). The fields are:

    • ID: this is the name of the corresponding .wav file
    • Transcription: Kanji-kana mixture text spoken by the reader (UTF-8)
    • Reading: Romanized text spoken by the reader (UTF-8)

    Each audio file is a single-channel 16-bit PCM WAV with a sample rate of 22050 Hz.
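
    The pipe-delimited metadata format described above can be read with Python's standard csv module. The following is a minimal sketch, not part of the dataset's own tooling: the function name, the dictionary keys, and the sample row are made up for illustration. Quoting is disabled on the assumption that transcripts may contain quote characters that should be kept literally.

```python
import csv
import io


def read_metadata(fileobj) -> list[dict[str, str]]:
    """Parse records of the form ID|Transcription|Reading, one per line."""
    reader = csv.reader(fileobj, delimiter="|", quoting=csv.QUOTE_NONE)
    return [
        {"id": row[0], "transcription": row[1], "reading": row[2]}
        for row in reader
        if len(row) == 3  # skip blank or malformed lines
    ]


# Invented sample row, only to show the shape of a record.
sample = io.StringIO("clip-0001|こんにちは|konnichiwa\n")
records = read_metadata(sample)
print(records[0]["id"], records[0]["reading"])

# For the real file, something like:
# with open("metadata.csv", encoding="utf-8", newline="") as f:
#     records = read_metadata(f)
```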

    Statistics

    The dataset is provided in three sizes: large, small, and tiny. small and tiny do not share any clips. large contains all available clips, including those in small and tiny.

    Large:
    Total clips: 34958
    Min duration: 3.007 secs
    Max duration: 14.745 secs
    Mean duration: 4.978 secs
    Total duration: 48:20:24
    
    Small:
    Total clips: 8812
    Min duration: 3.007 secs
    Max duration: 14.431 secs
    Mean duration: 4.951 secs
    Total duration: 12:07:12
    
    Tiny:
    Total clips: 285
    Min duration: 3.019 secs
    Max duration: 9.462 secs
    Mean duration: 4.871 secs
    Total duration: 00:23:08
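
    As a quick consistency check on the statistics above, the clip count multiplied by the mean duration should come close to the reported total duration. The sketch below (helper names are hypothetical) computes the relative gap for each size; small differences are expected because the mean is rounded to three decimals.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (total clips, mean duration in seconds, reported total duration), as listed above
STATS = {
    "large": (34958, 4.978, "48:20:24"),
    "small": (8812, 4.951, "12:07:12"),
    "tiny": (285, 4.871, "00:23:08"),
}


def relative_gap(clips: int, mean_secs: float, total_hms: str) -> float:
    """Relative difference between clips * mean and the reported total."""
    reported = hms_to_seconds(total_hms)
    return abs(clips * mean_secs - reported) / reported


for size, row in STATS.items():
    print(size, f"{relative_gap(*row):.6f}")
```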
    

    How to get the data

    Because of the dataset's large size, audio files are not included in this repository, but the metadata is included.

    To make .wav files of the dataset, run

    $ bash download.sh
    

    to download the metadata from the project page. Then run

    $ pip3 install torchaudio
    $ python3 extract.py --size tiny
    

    This prints a shell script example to download MP3 audio files from archive.org and extract them if you haven't done it already.

    After doing so, run the command again

    $ python3 extract.py --size tiny
    

    to get files for tiny under ./output directory.

    Pass a different size name to the --size option to get the dataset of that size.

    Pretrained Tacotron model

    A pretrained Tacotron model trained on Kokoro Speech Dataset, along with audio samples, is available. The model was trained for 21K steps on small. According to the above repo, "Speech started to become intelligible around 20K steps" with LJ Speech Dataset. The audio samples read the first few sentences of Gon Gitsune, which is not included in small.

    Books

    The dataset contains recordings from these books read by ekzemplaro

Cite
Ghulam Muhammad Nabeel (2025). Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/nabeelqureshitiii/student-performance-dataset

Student Performance Dataset

A generic data for ML Beginners

Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 27, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ghulam Muhammad Nabeel
Description

📊 Student Performance Dataset (Synthetic, Realistic)

Overview

This dataset contains 1,000,000 rows of realistic student performance data, designed for beginners in Machine Learning to practice Linear Regression, model training, and evaluation techniques.

Each row represents one student with features like study hours, attendance, class participation, and final score.
The dataset is small, clean, and structured to be beginner-friendly.

🔑 Columns Description

  • student_id → Unique identifier for each student.
  • weekly_self_study_hours → Average weekly self-study hours (0–40). Generated using a normal distribution centered around 15 hours.
  • attendance_percentage → Attendance percentage (50–100). Simulated with a normal distribution around 85%.
  • class_participation → Score between 0–10 indicating how actively the student participates in class. Generated from a normal distribution centered around 6.
  • total_score → Final performance score (0–100). Calculated as a function of study hours + random noise, then clipped between 0–100. Stronger correlation with study hours.
  • grade → Categorical label (A, B, C, D, F) derived from total_score.

📐 Data Generation Logic

  1. Weekly Study Hours: Modeled using a normal distribution (mean ≈ 15, std ≈ 7), capped between 0 and 40 hours.
  2. Scores: More study hours → higher score. Formula:

Random noise simulates differences in learning ability, motivation, etc.

  3. Attendance & Participation: Independent but realistic variations added.
  4. Grades: Assigned from scores using thresholds:
  • A: ≥ 85
  • B: ≥ 70
  • C: ≥ 55
  • D: ≥ 40
  • F: < 40
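
The grade thresholds above translate directly into a small lookup. This is a minimal sketch (the function name is hypothetical, and it is not the dataset author's generation code); it assumes each threshold is inclusive, as the "≥" signs indicate.

```python
def assign_grade(total_score: float) -> str:
    """Map a 0-100 score to a letter grade using the listed thresholds."""
    thresholds = [(85, "A"), (70, "B"), (55, "C"), (40, "D")]
    for cutoff, letter in thresholds:
        if total_score >= cutoff:
            return letter
    return "F"  # anything below 40


for score in (92.5, 85, 69.9, 40, 12):
    print(score, assign_grade(score))
```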

🎯 How to Use This Dataset

Regression Tasks

  • Predict total_score from weekly_self_study_hours.
  • Train and evaluate Linear Regression models.
  • Extend to multiple regression using attendance_percentage and class_participation.
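
For the first regression task, a simple linear regression can be fitted in closed form with ordinary least squares. The sketch below uses only the standard library; the function name and the five toy points are invented for illustration and are not taken from the dataset, whose actual generating formula is not reproduced here.

```python
def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on a made-up line; real rows would come from the
# weekly_self_study_hours and total_score columns.
hours = [5.0, 10.0, 15.0, 20.0, 25.0]
scores = [40.0, 50.0, 60.0, 70.0, 80.0]

slope, intercept = fit_simple_ols(hours, scores)
print(slope, intercept)
```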

Classification Tasks

  • Predict grade (A–F) using study hours, attendance, and participation.

Model Evaluation Practice

  • Apply train-test split and cross-validation.
  • Evaluate with MAE, RMSE, R².
  • Compare simple vs. multiple regression.
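
The three evaluation metrics named above can be written out explicitly to see what each one measures. This is an illustrative sketch with made-up numbers (function and variable names are hypothetical); in practice a library such as scikit-learn provides equivalent functions.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> tuple[float, float, float]:
    """Return (MAE, RMSE, R^2) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n              # mean absolute error
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    rmse = math.sqrt(ss_res / n)                       # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                         # coefficient of determination
    return mae, rmse, r2


# Invented scores and predictions, for illustration only.
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [52.0, 58.0, 71.0, 79.0]

mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, rmse, r2)
```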

✅ This dataset is intentionally kept simple, so that new ML learners can clearly see the relationship between input features (study, attendance, participation) and output (score/grade).
