This dataset contains 1,000,000 rows of realistic student performance data, designed for beginners in Machine Learning to practice Linear Regression, model training, and evaluation techniques.
Each row represents one student with features like study hours, attendance, class participation, and final score.
The dataset is small, clean, and structured to be beginner-friendly.
Random noise simulates differences in learning ability, motivation, etc.
Regression Tasks
Predict total_score from weekly_self_study_hours, attendance_percentage, and class_participation.
Classification Tasks
Predict grade (A–F) using study hours, attendance, and participation.
Model Evaluation Practice
✅ This dataset is intentionally kept simple, so that new ML learners can clearly see the relationship between input features (study, attendance, participation) and output (score/grade).
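The regression task described above can be sketched as follows. This is a minimal illustration, not the dataset author's method: the column names total_score and weekly_self_study_hours come from the description, but the data here is a hypothetical stand-in generated on the spot, and the coefficients (30 and 1.5) and noise level are assumptions chosen only for the demonstration.

```python
import random


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one feature: y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical stand-in for the real columns (assumed relationship plus noise).
rng = random.Random(0)
weekly_self_study_hours = [rng.uniform(0, 40) for _ in range(1000)]
total_score = [30 + 1.5 * h + rng.gauss(0, 5) for h in weekly_self_study_hours]

intercept, slope = fit_simple_linear_regression(weekly_self_study_hours, total_score)
print(f"total_score ~ {intercept:.2f} + {slope:.2f} * weekly_self_study_hours")
```

With the real file, the two lists would be read from the corresponding columns instead of being generated.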
This dataset provides monthly mean temperature data at a spatial resolution of 0.0083333 arc degrees (~1 km) for China from Jan 1901 to Dec 2023. The data are stored in NetCDF format (.nc files). The unit of the data is 0.1 ℃. The dataset was spatially downscaled from CRU TS v4.02 with WorldClim datasets using the Delta downscaling method. The dataset was evaluated against 496 national weather stations across China, and the evaluation indicated that the downscaled dataset is reliable for investigations related to climate change across China. The dataset covers the main land area of China, including Hong Kong, Macao and Taiwan regions, and excluding islands and reefs in the South China Sea. WGS84 is the recommended coordinate system for the data.
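Because the values are stored in units of 0.1 ℃, they need to be scaled before being read as degrees Celsius. A minimal sketch, assuming the stored values are plain numbers; the sample values and the function name to_celsius are hypothetical:

```python
def to_celsius(raw_value, scale=0.1):
    """Convert a stored value in units of 0.1 degrees C to degrees C."""
    return raw_value * scale


# Hypothetical stored values for three grid cells.
raw_monthly_means = [-123, 0, 256]
celsius = [round(to_celsius(v), 1) for v in raw_monthly_means]
print(celsius)
```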
These data represent mean annual precipitation in the Louisiana StreamStats study area for the period of 1971-2000.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Mean price paid for residential property in England and Wales, by property type and administrative geographies. Annual data.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
The eight color asteroid survey provides reflection spectra for minor planets using eight filter passbands. This dataset includes mean data averaged for each of 589 minor planets. The primary data for these minor planets, the response curves for the filters, and the values determined for standard stars are included in other related datasets. The wavelength range covered is 0.33 to 1.04 micrometers.
This dataset is designed for beginners to practice regression problems, particularly in the context of predicting house prices. It contains 1000 rows, with each row representing a house and various attributes that influence its price. The dataset is well-suited for learning basic to intermediate-level regression modeling techniques.
Beginner Regression Projects: This dataset can be used to practice building regression models such as Linear Regression, Decision Trees, or Random Forests. The target variable (house price) is continuous, making this an ideal problem for supervised learning techniques.
Feature Engineering Practice: Learners can create new features by combining existing ones, such as the price per square foot or age of the house, providing an opportunity to experiment with feature transformations.
Exploratory Data Analysis (EDA): You can explore how different features (e.g., square footage, number of bedrooms) correlate with the target variable, making it a great dataset for learning about data visualization and summary statistics.
Model Evaluation: The dataset allows for various model evaluation techniques such as cross-validation, R-squared, and Mean Absolute Error (MAE). These metrics can be used to compare the effectiveness of different models.
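Two of the metrics named above, Mean Absolute Error and R-squared, can be computed directly from their definitions. The sketch below uses plain Python; the price values are hypothetical and are not taken from the dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical house prices and model predictions.
actual_prices = [200_000, 250_000, 300_000, 350_000]
predicted_prices = [210_000, 240_000, 310_000, 340_000]

mae = mean_absolute_error(actual_prices, predicted_prices)
r2 = r_squared(actual_prices, predicted_prices)
print(f"MAE: {mae:.0f}, R-squared: {r2:.3f}")
```

Computing both for each candidate model on the same held-out rows gives a like-for-like comparison.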
The dataset is highly versatile for a range of machine learning tasks. You can apply simple linear models to predict house prices based on one or two features, or use more complex models like Random Forest or Gradient Boosting Machines to understand interactions between variables.
It can also be used for dimensionality reduction techniques like PCA or to practice handling categorical variables (e.g., neighborhood quality) through encoding techniques like one-hot encoding.
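One-hot encoding of a categorical variable can be sketched in a few lines. The feature name neighborhood_quality follows the example in the text; the category values ("low", "medium", "high") are assumptions for illustration, since the description does not list the actual levels.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical category levels for the neighborhood quality feature.
neighborhood_quality = ["low", "high", "medium", "high"]
categories, encoded = one_hot_encode(neighborhood_quality)
print(categories)
print(encoded)
```

For linear models, one indicator column is often dropped to avoid perfectly collinear features.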
This dataset is ideal for anyone wanting to gain practical experience in building regression models while working with real-world features.
This data set contains 1971-2000 mean annual precipitation estimates for west-central Nevada. This is a raster data set developed using the precipitation-zone method, which uses elevation-based regression equations to estimate mean annual precipitation for defined precipitation zones (Lopes and Medina, 2007). This data set is based on the 30-meter National Elevation Dataset. Reference cited: Lopes, T.J., and Medina, R.L., 2007, Precipitation Zones of West-Central Nevada: Journal of Nevada Water Resources Association, v. 4, no. 2, p. 21.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Park City, UT, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
[Figure: Mean household income by quintiles in Park City, UT (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Park City median household income.
Mean Annual Precipitation [mm/year] across West Africa using the Climate Hazards Group Infrared Precipitation with Station data (CHIRP) dataset.
Provides regional weather in Hong Kong: the latest 1-minute mean air temperature (the data provided is provisional). Multiple file formats are available for dataset download via the API.
The BOREAS AFM-06 team from the National Oceanic and Atmospheric Administration Environmental Technology Laboratory (NOAA/ETL) operated a 915 MHz wind/Radio Acoustic Sounding System (RASS) profiler system in the Southern Study Area (SSA) near the Old Jack Pine (OJP) tower from 21-May-1994 to 20-Sep-1994. The data set provides temperature profiles at 15 heights, containing the variables of virtual temperature, vertical velocity, the speed of sound, and w-bar.
This dataset represents predictions made for individual, local NHDPlusV2 stream segments. Attributes were calculated for every local NHDPlusV2 stream segment. (See Supplementary Info for Glossary of Terms.) These predictions were made to provide estimates of reference-condition stream temperatures in support of the 2008-2009 and 2013-2014 (forthcoming) National Rivers and Streams Assessments. These predictions were based on a set of published models (Hill et al. 2013; http://www.journals.uchicago.edu/doi/abs/10.1899/12-009.1). From Hill et al. (2013): "We modeled 3 ecologically important elements of the thermal regime: mean summer, mean winter, and mean annual stream temperature. These models used a set of least-disturbed USGS stations and sites to model stream temperatures from a set of landscape metrics. To build reference-condition models, we used daily mean ST data obtained from several thousand US Geological Survey temperature sites distributed across the conterminous USA and iteratively modeled ST with Random Forests to identify sites in reference condition." These data are summarized to produce local stream segment-level metrics as a continuous data type.
https://creativecommons.org/publicdomain/zero/1.0/
Context
This data set is taken from the USGS (U.S. Geological Survey). The USGS serves the Nation as an independent fact-finding agency that collects, monitors, analyzes, and provides scientific understanding about natural resource and natural hazard conditions, issues, and problems. The value of the USGS to the Nation rests on its ability to carry out studies on a national scale and to sustain long-term monitoring and assessment of natural resources and hazards. For additional information, visit the link.
Content
This dataset contains earthquake data with a magnitude of 4.5+ and an "alert" warning level, recorded between 1976 and 2025. Below is an explanation of the columns included in the dataset:
Acknowledgements
Real Time Feeds (Spreadsheet format): courtesy of the U.S. Geological Survey
Credit: U.S. Geological Survey
Department of the Interior/USGS
https://www.usgs.gov/information-policies-and-instructions/copyrights-and-credits
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Key West, FL, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
[Figure: Mean household income by quintiles in Key West, FL (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Key West median household income.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains reconstructed global-mean sea level evolution and the estimated contributing processes over 1900-2018. Reconstructed sea level is based on annual-mean tide-gauge observations and uses the virtual-station method to aggregate the individual observations into a global estimate. The contributing processes consist of thermosteric changes, glacier mass changes, mass changes of the Greenland and Antarctic Ice Sheets, and terrestrial water storage changes. The glacier, ice sheet, and terrestrial water storage contributions are estimated by combining GRACE observations (2003-2018) with long-term estimates from in-situ observations and models. Steric estimates are based on in-situ temperature profiles. The lower and upper bounds represent the 5 and 95 percent confidence levels. The numbers are equal to the ones presented in Frederikse et al., The causes of sea-level rise since 1900, Nature, 2020. This dataset was produced by the Heat and Ocean Mass from Gravity ESDR (HOMAGE) project, with funding from MeASUREs-2017. HOMAGE is combining satellite observations to create a set of ESDRs that provide a homogeneous basis for accurate and current quantification of the planetary sea level budget, ocean heat content, and large-scale ocean transport variations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Lake City, MI, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
[Figure: Mean household income by quintiles in Lake City, MI (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lake City median household income.
https://creativecommons.org/publicdomain/zero/1.0/
The monthly mean temperature data presented in this dataset was obtained from the Climate Prediction Center (CPC) Global Land Surface Air Temperature Analysis, which was loaded into Python using xarray. The data was then filtered to include only the latitude and longitude coordinates corresponding to each city in the dataset. To select the location nearest to each city, xarray's 'sel' method with method='nearest' was used, so the temperature data may not be exactly at the city location. The data is presented on a 0.5x0.5 degree grid across the globe.
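The nearest-point selection described above can be illustrated without xarray. The sketch below snaps a city's coordinates to the closest cell center of a 0.5 degree grid. It assumes cell centers at 0.25, 0.75, and so on, and longitudes expressed in the 0 to 360 range; both are assumptions about the grid layout, and the function name and city coordinates are hypothetical. In xarray itself, the equivalent lookup is a call to sel with method="nearest".

```python
def nearest_grid_center(value, resolution=0.5, first_center=0.25):
    """Snap a coordinate to the closest grid-cell center.

    Assumes centers at first_center + k * resolution for integer k.
    """
    index = round((value - first_center) / resolution)
    return first_center + index * resolution


# Hypothetical city location (approximately London).
city_lat, city_lon = 51.5074, -0.1278

grid_lat = nearest_grid_center(city_lat)
grid_lon = nearest_grid_center(city_lon % 360)  # assumed 0..360 longitude convention
print(grid_lat, grid_lon)
```

The gap between the city and the selected cell center is at most half the grid spacing in each direction, which is why the values may not correspond exactly to the city location.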
The temperature data provides a valuable resource for time series analysis, and if you are interested in obtaining temperature data for additional cities, please let me know. I will also be sharing the source code on GitHub for anyone who would like to reproduce the data or analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Hope, New York, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
[Figure: Mean household income by quintiles in Hope, New York (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Hope town median household income.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the mean household income for each of the five quintiles in Central City, PA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.
Key observations
[Figure: Mean household income by quintiles in Central City, PA (in 2022 inflation-adjusted dollars)]
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Income Levels:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Central City median household income.
https://creativecommons.org/publicdomain/zero/1.0/
Kokoro Speech Dataset is a public domain Japanese speech dataset. It contains 34,958 short audio clips of a single speaker reading 9 novels. The format of the metadata is similar to that of LJ Speech so that the dataset is compatible with modern speech synthesis systems.
The texts are from Aozora Bunko, which is in the public domain. The audio clips are from the LibriVox project, which is also in the public domain. Readings are estimated by MeCab and UniDic Lite from kanji-kana mixture text. Readings are romanized in a format similar to the one used by Julius.
The audio clips were split and transcripts were aligned automatically by Voice100.
Listen in your browser or download 100 randomly sampled clips.
Metadata is provided in metadata.csv. This file consists of one record per line,
delimited by the pipe character (0x7c). The fields are:
Each audio file is a single-channel 16-bit PCM WAV with a sample rate of 22050 Hz.
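The stated audio format can be checked with Python's standard wave module. The sketch below is illustrative only: it builds a one-second silent clip in memory as a hypothetical stand-in for a real file from the dataset, and the helper name describe_wav is an assumption.

```python
import io
import wave


def describe_wav(file_obj):
    """Read basic format properties from a WAV file or file-like object."""
    with wave.open(file_obj, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate": wav.getframerate(),
            "duration_secs": wav.getnframes() / wav.getframerate(),
        }


# Hypothetical stand-in clip: one second of silence in the stated format.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)       # single channel
    wav.setsampwidth(2)       # 2 bytes = 16-bit PCM
    wav.setframerate(22050)   # 22050 Hz
    wav.writeframes(b"\x00\x00" * 22050)
buffer.seek(0)

info = describe_wav(buffer)
print(info)
```

Passing a path to an extracted .wav file instead of the in-memory buffer should report the same channel count, bit depth, and sample rate if the file matches the description.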
The dataset is provided in three sizes: large, small, and tiny. small and tiny
do not share any clips. large contains all available clips, including those in small and tiny.
Large:
Total clips: 34958
Min duration: 3.007 secs
Max duration: 14.745 secs
Mean duration: 4.978 secs
Total duration: 48:20:24
Small:
Total clips: 8812
Min duration: 3.007 secs
Max duration: 14.431 secs
Mean duration: 4.951 secs
Total duration: 12:07:12
Tiny:
Total clips: 285
Min duration: 3.019 secs
Max duration: 9.462 secs
Mean duration: 4.871 secs
Total duration: 00:23:08
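As a quick consistency check on the figures above, the total duration of each subset should be close to the number of clips multiplied by the mean duration. The sketch below recomputes this from the listed values. The small differences are expected, since the mean is rounded to three decimals and the total to whole seconds.

```python
def hms_to_seconds(text):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (total clips, mean duration in secs, total duration) as listed above.
subsets = {
    "large": (34958, 4.978, "48:20:24"),
    "small": (8812, 4.951, "12:07:12"),
    "tiny": (285, 4.871, "00:23:08"),
}

differences = {}
for name, (clips, mean_secs, total) in subsets.items():
    estimated = clips * mean_secs
    reported = hms_to_seconds(total)
    differences[name] = abs(estimated - reported)
    print(f"{name}: estimated {estimated:.0f} s, reported {reported} s")
```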
Because of the large size of the dataset, audio files are not included in this repository, but the metadata is included.
To make .wav files of the dataset, run
$ bash download.sh
to download the metadata from the project page. Then run
$ pip3 install torchaudio
$ python3 extract.py --size tiny
This prints a shell script example to download MP3 audio files from archive.org and extract them if you haven't done it already.
After doing so, run the command again
$ python3 extract.py --size tiny
to get the files for tiny under the ./output directory.
You can pass another size name to the --size option to get
the dataset of that size.
A pretrained Tacotron model trained with Kokoro Speech Dataset and audio samples are available.
The model was trained for 21K steps with small.
According to the above repo, "Speech started to become intelligible around 20K steps" with LJ Speech Dataset.
The audio samples read the first few sentences from Gon Gitsune, which is not included in small.
The dataset contains recordings from these books read by ekzemplaro