100+ datasets found
  1. Weather and Housing in North America

    • kaggle.com
    zip
    Updated Feb 13, 2023
    Cite
    The Devastator (2023). Weather and Housing in North America [Dataset]. https://www.kaggle.com/datasets/thedevastator/weather-and-housing-in-north-america
    Explore at:
    zip (512280 bytes)
    Dataset updated
    Feb 13, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    North America
    Description

    Weather and Housing in North America

    Exploring the Relationship between Weather and Housing Conditions in 2012

    By [source]

    About this dataset

    This comprehensive dataset explores the relationship between housing and weather conditions across North America in 2012. Through a range of climate variables such as temperature, wind speed, humidity, pressure, and visibility, it provides unique insights into the weather-influenced environment of numerous regions. The interrelated nature of housing parameters such as longitude, latitude, median income, median house value, and ocean proximity further enhances our understanding of how distinct climates play an integral part in area real estate valuations. Analyzing these two datasets offers a wealth of knowledge for understanding what factors can dictate the value and comfort level offered by residential areas throughout North America.

    How to use the dataset

    This dataset offers plenty of insights into the effects of weather and housing on North American regions. To explore these relationships, you can perform data analysis on the variables provided.

    First, start by examining descriptive statistics (i.e., mean, median, mode). This can help show you the general trend and distribution of each variable in this dataset. For example, what is the most common temperature in a given region? What is the average wind speed? How does this vary across different regions? By looking at descriptive statistics, you can get an initial idea of how various weather conditions and housing attributes interact with one another.

    Next, explore correlations between variables. Are certain weather variables correlated with specific housing attributes? Is there a link between wind speeds and median house value? Or between humidity and ocean proximity? Analyzing correlations allows for deeper insights into how different aspects may influence one another for a given region or area. These correlations may also inform broader patterns that are present across multiple North American regions or countries.

    Finally, use visualizations to further investigate the relationship between climate and housing attributes in North America in 2012. Graphs make trends such as seasonal variations or long-term changes easier to see, and they provide context beyond what summary numbers alone can tell us about the relationships within this dataset.
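    The exploratory steps above (descriptive statistics, then correlations) can be sketched with pandas. This is a minimal illustration on synthetic data, since the real files are not bundled here: the weather column names follow the Weather.csv listing in this entry, while median_house_value is a hypothetical stand-in for the housing table.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a merged weather/housing table; weather column
# names mirror the Weather.csv listing, median_house_value is hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Temp_C": rng.normal(10, 8, 365),
    "Wind Speed_km/h": rng.gamma(2.0, 6.0, 365),
    "median_house_value": rng.normal(250_000, 50_000, 365),
})

# Step 1: descriptive statistics (mean, spread, quartiles) per variable.
summary = df.describe()
print(summary.loc[["mean", "50%"]])

# Step 2: pairwise Pearson correlations between weather and housing variables.
corr = df.corr()
print(corr["median_house_value"])
```

    The same `df.corr()` matrix can then be fed to a heatmap or scatter plots for the visualization step.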

    Research Ideas

    • Analyzing the effect of climate change on housing markets across North America. By looking at temperature and weather trends in combination with housing values, researchers can better understand how climate change may be impacting certain regions differently than others.
    • Investigating the relationship between median income, house values and ocean proximity in coastal areas. Understanding how ocean proximity plays into housing prices may help inform real estate investment decisions and urban planning initiatives related to coastal development.
    • Utilizing differences in weather patterns across different climates to determine optimal seasonal rental prices for property owners. By analyzing changes in temperature, wind speed, humidity, pressure, and visibility from season to season, an investor could gain valuable insights into seasonal market trends to maximize profits from rentals or Airbnb listings over time.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

    Columns

    File: Weather.csv

    | Column name | Description |
    |:---------------------|:-----------------------------------------------|
    | Date/Time | Date and time of the observation. (Date/Time) |
    | Temp_C | Temperature in Celsius. (Numeric) |
    | Dew Point Temp_C | Dew point temperature in Celsius. (Numeric) |
    | Rel Hum_% | Relative humidity in percent. (Numeric) |
    | Wind Speed_km/h | Wind speed in kilometers per hour. (Numeric) |
    | Visibility_km | Visibilit... |

  2. Simulation Data Set

    • catalog.data.gov
    • s.cnmilf.com
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Simulation Data Set [Dataset]. https://catalog.data.gov/dataset/simulation-data-set
    Explore at:
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    These are simulated data without any identifying information or informative birth-level covariates. We also standardize the pollution exposures in each week by subtracting off the median exposure amount in a given week and dividing by the interquartile range (IQR), as in the actual application to the true NC birth records data. The dataset that we provide includes weekly average pregnancy exposures that have already been standardized in this way; the medians and IQRs are not given. This further protects the identifiability of the spatial locations used in the analysis.

    This dataset is not publicly accessible because EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.

    File format: R workspace file, "Simulated_Dataset.RData".

    Metadata (including data dictionary):

    • y: Vector of binary responses (1: adverse outcome, 0: control)
    • x: Matrix of covariates; one row for each simulated individual
    • z: Matrix of standardized pollution exposures
    • n: Number of simulated individuals
    • m: Number of exposure time periods (e.g., weeks of pregnancy)
    • p: Number of columns in the covariate design matrix
    • alpha_true: Vector of "true" critical window locations/magnitudes (i.e., the ground truth that we want to estimate)

    Code Abstract: We provide R statistical software code ("CWVS_LMC.txt") to fit the linear model of coregionalization (LMC) version of the Critical Window Variable Selection (CWVS) method developed in the manuscript. We also provide R code ("Results_Summary.txt") to summarize/plot the estimated critical windows and posterior marginal inclusion probabilities.

    "CWVS_LMC.txt": This code is delivered to the user as a .txt file containing R statistical software code. Once the "Simulated_Dataset.RData" workspace has been loaded into R, the code can be used to identify/estimate critical windows of susceptibility and posterior marginal inclusion probabilities.

    "Results_Summary.txt": This code is also delivered as a .txt file containing R statistical software code. Once the "CWVS_LMC.txt" code has been applied to the simulated dataset and the program has completed, this code can be used to summarize and plot the identified/estimated critical windows and posterior marginal inclusion probabilities (similar to the plots shown in the manuscript).

    Required R packages:

    • For running "CWVS_LMC.txt": msm (sampling from the truncated normal distribution), mnormt (sampling from the multivariate normal distribution), BayesLogit (sampling from the Polya-Gamma distribution)
    • For running "Results_Summary.txt": plotrix (plotting the posterior means and credible intervals)

    Instructions for Use / Reproducibility: The data and code can be used to identify/estimate critical windows from one of the actual simulated datasets generated under setting E4 from the presented simulation study. How to use the information:

    • Load the "Simulated_Dataset.RData" workspace
    • Run the code contained in "CWVS_LMC.txt"
    • Once the "CWVS_LMC.txt" code is complete, run "Results_Summary.txt"

    Format: The data used in the application section of the manuscript consist of geocoded birth records from the North Carolina State Center for Health Statistics, 2005-2008. In the simulation study section of the manuscript, we simulate synthetic data that closely match some of the key features of the birth certificate data while maintaining confidentiality of any actual pregnant women.

    Availability: Due to the highly sensitive and identifying information contained in the birth certificate data (including latitude/longitude and address of residence at delivery), we are unable to make the data from the application section publicly available. However, we make one of the simulated datasets available for any reader interested in applying the method to realistic simulated birth records data. This also allows the user to become familiar with the required inputs of the model, how the data should be structured, and what type of output is obtained. While we cannot provide the application data here, access to the North Carolina birth records can be requested through the North Carolina State Center for Health Statistics and requires an appropriate data use agreement.

    This dataset is associated with the following publication: Warren, J., W. Kong, T. Luben, and H. Chang. Critical Window Variable Selection: Estimating the Impact of Air Pollution on Very Preterm Birth. Biostatistics. Oxford University Press, Oxford, UK, 1-30, (2019).
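    The weekly standardization described above (subtract the weekly median, divide by the weekly IQR) is straightforward to express in code. The project's own code is in R; the sketch below is only an illustrative Python translation, and the function name is my own.

```python
import numpy as np

def standardize_by_week(exposures):
    """Standardize each column (week) by subtracting its median and
    dividing by its interquartile range (IQR).

    exposures: 2-D array-like, rows = individuals, columns = weeks.
    """
    exposures = np.asarray(exposures, dtype=float)
    med = np.median(exposures, axis=0)
    q75, q25 = np.percentile(exposures, [75, 25], axis=0)
    return (exposures - med) / (q75 - q25)

# Three individuals, two weeks of exposure.
z = standardize_by_week([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(z)  # each column becomes [-1, 0, 1]
```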

  3. Income Distribution by Quintile: Mean Household Income in Lake View, AL

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    Cite
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Lake View, AL [Dataset]. https://www.neilsberg.com/research/datasets/94b43a8f-7479-11ee-949f-3860777c1fe6/
    Explore at:
    json, csv
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Lake View, Alabama
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Lake View, AL, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $28,416, while the mean income of the highest quintile (the 20% of households with the highest income) is $200,443. This indicates that the top earners earn roughly 7 times as much as the lowest earners.
    • Top 5%: The mean household income for the wealthiest population (top 5%) is $260,140, which is 29.78% higher than the highest quintile mean and 815.47% higher than the lowest quintile mean.
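    As a quick arithmetic check of the roughly-7x disparity quoted above:

```python
# Figures quoted in the key observations above.
lowest_quintile_mean = 28_416
highest_quintile_mean = 200_443

ratio = highest_quintile_mean / lowest_quintile_mean
print(f"top-to-bottom quintile ratio: {ratio:.2f}")  # about 7.05
```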

    [Chart: Mean household income by quintiles in Lake View, AL (in 2022 inflation-adjusted dollars)]

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column showcases the income levels (as listed above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Lake View median household income.

  4. Google Analytics data of an E-commerce Company

    • kaggle.com
    zip
    Updated Oct 19, 2024
    Cite
    fehu.zone (2024). Google Analytics data of an E-commerce Company [Dataset]. https://www.kaggle.com/datasets/fehu94/google-analytics-data-of-an-e-commerce-company
    Explore at:
    zip (3156 bytes)
    Dataset updated
    Oct 19, 2024
    Authors
    fehu.zone
    Description

    📊 Dataset Title: Daily Active Users Dataset

    ๐Ÿ“ Description

    This dataset provides detailed insights into daily active users (DAU) of a platform or service, captured over a defined period of time. The dataset includes information such as the number of active users per day, allowing data analysts and business intelligence teams to track usage trends, monitor platform engagement, and identify patterns in user activity over time.

    The data is ideal for performing time series analysis, statistical analysis, and trend forecasting. You can utilize this dataset to measure the success of platform initiatives, evaluate user behavior, or predict future trends in engagement. It is also suitable for training machine learning models that focus on user activity prediction or anomaly detection.

    📂 Dataset Structure

    The dataset is structured in a simple and easy-to-use format, containing the following columns:

    • Date: The date on which the data was recorded, formatted as YYYYMMDD.
    • Number of Active Users: The number of users who were active on the platform on the corresponding date.

    Each row in the dataset represents a unique date and its corresponding number of active users. This allows for time-based analysis, such as calculating the moving average of active users, detecting seasonality, or spotting sudden spikes or drops in engagement.

    ๐Ÿง Key Use Cases

    This dataset can be used for a wide range of purposes, including:

    1. Time Series Analysis: Analyze trends and seasonality of user engagement.
    2. Trend Detection: Discover peaks and valleys in user activity.
    3. Anomaly Detection: Use statistical methods or machine learning algorithms to detect anomalies in user behavior.
    4. Forecasting User Growth: Build forecasting models to predict future platform usage.
    5. Seasonality Insights: Identify patterns like increased activity on weekends or holidays.

    📈 Potential Analysis

    Here are some specific analyses you can perform using this dataset:

    • Moving Average and Smoothing: Calculate the moving average over a 7-day or 30-day period.
    • Correlation with External Factors: Correlate daily active users with other datasets.
    • Statistical Hypothesis Testing: Perform t-tests or ANOVA to determine significant differences in user activity.
    • Machine Learning for Prediction: Train machine learning models to predict user engagement.
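    A minimal sketch of the moving-average and anomaly-flagging ideas above, using synthetic data in place of the real file; the column names follow the Dataset Structure section, and the 3-standard-deviation threshold is an illustrative choice, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the DAU table (Date stored as YYYYMMDD strings).
rng = np.random.default_rng(1)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
dau = pd.DataFrame({
    "Date": dates.strftime("%Y%m%d"),
    "Number of Active Users": (1000 + 50 * np.sin(np.arange(60) / 7)
                               + rng.normal(0, 20, 60)).round().astype(int),
})

# Parse the YYYYMMDD dates and index by time.
dau["Date"] = pd.to_datetime(dau["Date"], format="%Y%m%d")
dau = dau.set_index("Date")

# A 7-day moving average smooths daily noise; points far from it
# (here, beyond 3 standard deviations of the residual) are flagged
# as candidate anomalies.
ma7 = dau["Number of Active Users"].rolling(7).mean()
resid = dau["Number of Active Users"] - ma7
anomalies = dau[resid.abs() > 3 * resid.std()]
print(ma7.tail(3))
print(f"{len(anomalies)} candidate anomalies")
```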

    🚀 Getting Started

    To get started with this dataset, you can load it into your preferred analysis tool. Here's how to do it using Python's pandas library:

    import pandas as pd
    
    # Load the dataset (replace with the actual file path)
    data = pd.read_csv('path_to_dataset.csv')
    
    # Parse the YYYYMMDD date column into proper datetimes
    data['Date'] = pd.to_datetime(data['Date'], format='%Y%m%d')
    
    # Display the first few rows
    print(data.head())
    
    # Basic statistics
    print(data.describe())
    
  5. Data used to calculate mean resting orientations of ornithischian scapular...

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    • +1more
    Updated Dec 18, 2015
    Cite
    Senter, Phil; Robins, James H. (2015). Data used to calculate mean resting orientations of ornithischian scapular angles in lateral view. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001934624
    Explore at:
    Dataset updated
    Dec 18, 2015
    Authors
    Senter, Phil; Robins, James H.
    Description

    See Materials and Methods section for description of angle B. Group means shown without confidence intervals are those for which sample size is too small to derive 95% confidence intervals (n < 8). See Table 1 for institutional abbreviations.

  6. Pakistan House Price dataset

    • kaggle.com
    zip
    Updated May 6, 2023
    Cite
    Jillani SofTech (2023). Pakistan House Price dataset [Dataset]. https://www.kaggle.com/datasets/jillanisofttech/pakistan-house-price-dataset/versions/1
    Explore at:
    zip (8379623 bytes)
    Dataset updated
    May 6, 2023
    Authors
    Jillani SofTech
    Area covered
    Pakistan
    Description

    Dataset Description: The dataset contains information about properties. Each property has a unique property ID and is associated with a location ID based on the subcategory of the city. The dataset includes the following attributes:

    • Property ID: Unique identifier for each property.
    • Location ID: Unique identifier for each location within a city.
    • Page URL: The URL of the webpage where the property was published.
    • Property Type: Categorization of the property into six types: House, FarmHouse, Upper Portion, Lower Portion, Flat, or Room.
    • Price: The price of the property, which is the dependent feature in this dataset.
    • City: The city where the property is located. The dataset includes five cities: Lahore, Karachi, Faisalabad, Rawalpindi, and Islamabad.
    • Province: The state or province where the city is located.
    • Location: Different types of locations within each city.
    • Latitude and Longitude: Geographic coordinates of the cities.

    Steps Involved in the Analysis:

    Statistical Analysis:

    • Data Types: Determine the data types of the attributes.
    • Level of Measurement: Identify the level of measurement for each attribute.
    • Summary Statistics: Calculate mean, standard deviation, minimum, and maximum values for numerical attributes.

    Data Cleaning:

    • Filling Null Values: Handle missing values in the dataset.
    • Duplicate Values: Remove duplicate records, if any.
    • Correcting Data Types: Ensure the correct data type for each attribute.
    • Outlier Detection: Identify and handle outliers in the data.

    Exploratory Data Analysis (EDA):

    • Visualization: Use libraries such as Seaborn, Matplotlib, and Plotly to visualize the data and gain insights.

    Model Building:

    • Libraries: Utilize libraries such as Sklearn and pickle.
    • List of Models: Build models using Linear Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), XGBoost, Gradient Boosting, and AdaBoost.
    • Model Saving: Save the selected model into a pickle file for future use.
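    A minimal sketch of the final two steps (fit a model, then pickle it), using scikit-learn on toy data; the feature matrix here is illustrative, not the dataset's actual columns.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy stand-in for the cleaned property table: two numeric features
# predicting price. The real pipeline also tries tree ensembles and
# boosting models, per the list above.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 1_000_000 * X[:, 0] + 500_000 * X[:, 1] + rng.normal(0, 10_000, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
score = model.score(X_test, y_test)
print(f"R^2 on held-out data: {score:.3f}")

# Serialize the selected model with pickle for later reuse
# (pickle.dump(model, open('model.pkl', 'wb')) would write it to disk).
blob = pickle.dumps(model)
reloaded = pickle.loads(blob)
```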

  7. Median Household Income Variation by Family Size in Ocean View, DE:...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    Cite
    Neilsberg Research (2025). Median Household Income Variation by Family Size in Ocean View, DE: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/insights/ocean-view-de-median-household-income/
    Explore at:
    json, csv
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ocean View, Delaware
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in Ocean View, DE, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1-person to 7-or-more-person households) reported by the Census Bureau, Ocean View did not include 5-, 6-, or 7-person households. Across the reported household sizes in Ocean View, the mean income is $114,088 and the standard deviation is $59,951. The coefficient of variation (CV) is 52.55%; this high CV indicates high relative variability, suggesting that incomes vary significantly across different household sizes.
    • In the most recent year, 2023, the smallest household size for which the bureau reported a median household income was 1-person households, with an income of $40,761. The largest was 4-person households, with a median household income of $167,813.
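    The coefficient of variation quoted above is simply the standard deviation divided by the mean, expressed as a percentage:

```python
# Figures quoted in the key observations above.
mean_income = 114_088
std_dev = 59_951

cv = 100 * std_dev / mean_income
print(f"coefficient of variation: {cv:.2f}%")  # about 52.55%
```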
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column showcases 7 household sizes, ranging from 1-person households to 7-or-more-person households (as listed above).
    • Median Household Income: Median household income, in 2023 inflation-adjusted dollars for the specific household size.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    Neilsberg Research Team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is a part of the main dataset for Ocean View median household income.

  8. Charts of climate statistics and MODIS data for all Bioregional Assessment...

    • data.wu.ac.at
    • researchdata.edu.au
    • +1more
    Updated Jun 14, 2018
    Cite
    Bioregional Assessment Programme (2018). Charts of climate statistics and MODIS data for all Bioregional Assessment subregions [Dataset]. https://data.wu.ac.at/schema/data_gov_au/NzAxNGQ5NjYtODdmNS00ODJkLThjYWQtNjNkZTk2NGFkZGQ5
    Explore at:
    Dataset updated
    Jun 14, 2018
    Dataset provided by
    Bioregional Assessment Programme
    Description

    Abstract

    This dataset was derived by the Bioregional Assessment Programme from 'Mean climate variables for all subregions' and 'fPAR derived from MODIS for BA subregions'. You can find a link to the parent datasets in the Lineage Field in this metadata statement. The History Field in this metadata statement describes how this dataset was derived.

    These are charts of climate statistics and MODIS data for each BA subregion. There are six 600dpi PNG files per subregion, with the naming convention BA-[regioncode]-[subregioncode]-[chartname].png. The charts, according to their filename, are: rain (time-series of rainfall; Figure 1), P-PET (average monthly precipitation and potential evapotranspiration; Figure 2), 5line (assorted monthly statistics; Figure 3), trend (monthly long-term trends; Figure 4) and fPAR (fraction of photosynthetically available radiation - an indication of biomass; Figure 5).

    This version was created on 18 November 2014, using data that accounted for a modified boundary for the Gippsland Basin bioregion and the combination of two subregions to form the Sydney Basin bioregion.

    Purpose

    These charts were generated to be included in the Contextual Report (geography) for each subregion.

    Dataset History

    These charts were generated using MatPlotLib 1.3.0 in Python 2.7.5 (Anaconda distribution v1.7.0 32-bit).

    The script for generating these plots is BA-ClimateCharts.py, which is packaged with the dataset. It is a data collection and chart-drawing script; it does not perform any analysis. The data are charted as they appear in the parent datasets (see Lineage). A Word document (BA-ClimateGraphs-ReadMe) is also included; it provides examples of, and approved captions for, each chart.

    Dataset Citation

    Bioregional Assessment Programme (2014) Charts of climate statistics and MODIS data for all Bioregional Assessment subregions. Bioregional Assessment Derived Dataset. Viewed 14 June 2018, http://data.bioregionalassessments.gov.au/dataset/8a1c5f43-b150-4357-aa25-5f301b1a02e1.

    Dataset Ancestors

  9. Earth Radiation area average time series through Wide-field-of-view...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Earth Radiation area average time series through Wide-field-of-view nonscanner abroad Earth Radiation Budget Satellite - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/earth-radiation-area-average-time-series-through-wide-field-of-view-nonscanner-abroad-eart-d59c2
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Area covered
    Earth
    Description

    Understanding the mean and variability of the Earth's radiation budget (ERB) at the Top-of-Atmosphere (TOA) and surface is a fundamental quantity governing climate variability, and for that reason NASA has been making concerted efforts to observe the ERB since 1984 through two projects, ERBE and CERES, that span nearly 30 years to date. The proposed project utilizes knowledge gained in the last 10 years through CERES data analyses and applies it to existing data to develop a long-term (nearly 30 years), consistent, and calibrated data product (TOA irradiances at the same radiometric scale) from multiple missions (ERBS and CERES). This project proposes to produce level 3 surface irradiance products that are consistent with observed TOA irradiances in a framework of 1D radiative transfer theory. Based on these TOA and surface irradiance products, a data product will be developed that contains the contribution of atmospheric and cloud property variability to TOA and surface irradiance variability. All algorithms used in the process are based on existing CERES algorithms. All data sets produced by this project will be available from the Atmospheric Science Data Center.

  10. Dataset from: High consistency and repeatability in the breeding migrations...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 4, 2024
    Cite
    Anonymous (2024). Dataset from: High consistency and repeatability in the breeding migrations of a benthic shark [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11467088
    Explore at:
    Dataset updated
    Jun 4, 2024
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset and scripts used for manuscript: High consistency and repeatability in the breeding migrations of a benthic shark.

    Project title: High consistency and repeatability in the breeding migrations of a benthic shark. Date: 23/04/2024

    Folders:

    • 1_Raw_data: Perpendicular_Point_068151, Sanctuary_Point_068088, SST raw data, sst_nc_files, IMOS_animal_measurements, IMOS_detections, PS&Syd&JB tags, rainfall_raw, sample_size, Point_Perpendicular_2013_2019, Sanctuary_Point_2013_2019, EAC_transport
    • 2_Processed_data: SST (anomaly, historic_sst, mean_sst_31_years, week_1992_sst:week_2022_sst including week_2019_complete_sst), Rain (weekly_rain, weekly_rainfall_completed), Clean (clean, cleaned_data, cleaned_gam, cleaned_pj_data)
    • 3_Script_processing_data: Plots (dual_axis_plot (Fig. 1 & Fig. 4).R, period_plot (Fig. 2).R, sd_plot (Fig. 5).R, sex_plot (Fig. 3).R), cleaned_data.R, cleaned_data_gam.R, weekly_rainfall_completed.R, descriptive_stats.R, sst.R, sst_2019b.R, sst_anomaly.R
    • 4_Script_analyses: gam.R, gam_eac.R, glm.R, lme.R, Repeatability.R
    • 5_Output_doc: Plots (arrival_dual_plot_with_anomaly (Fig. 1).png, period_plot (Fig. 2).png, sex_arrival_departure (Fig. 3).png, departure_dual_plot_with_anomaly (Fig. 4).png, standard deviation plot (Fig. 5).png), Tables (gam_arrival_eac_selection_table.csv (Table S2), gam_departure_eac_selection_table (Table S5), gam_arrival_selection_table (Table S3), gam_departure_selection_table (Table S6), glm_arrival_selection_table, glm_departure_selection_table, lme_arrival_anova_table, lme_arrival_selection_table (Table S4), lme_departure_anova_table, lme_departure_selection_table (Table S8))

    Descriptions of scripts and files used:

    • cleaned_data.R: script to extract detections of sharks at Jervis Bay; calculates arrival and departure dates over the seven breeding seasons; adds sex and length for each individual; extracts moon phase (numerical value) and period of the day from arrival and departure times.
    • IMOS_detections.csv: raw data file with detections of Port Jackson sharks over different sites in Australia.
    • IMOS_animal_measurements.csv: raw data file with morphological data of Port Jackson sharks.
    • PS&Syd&JB tags: file with measurements and sex identification of sharks (different from IMOS; used to complete missing sex and length).
    • cleaned_data.csv: file with arrival and departure dates of the final sample size of sharks (N=49), with missing sex and length for some individuals.
    • clean.csv: completed file using PS&Syd&JB tags. Note: tag ID 117393679 was wrongly identified as a male in IMOS and correctly identified as a female in the PS&Syd&JB tags file, as indicated by its large size.
    • cleaned_pj_data: final data file with arrival and departure dates, sex, length, moon phase (numerical), and period of the day.

    • weekly_rainfall_completed.R: script to calculate average weekly rainfall and the correlation between the two weather stations used (Point Perpendicular and Sanctuary Point).
      - weekly_rain.csv: file with the corresponding week number (1-28) for each date (01-06-2013 to 13-12-2019)
      - weekly_rainfall_completed.csv: file with week number (1-28), year (2013-2019), and weekly rainfall average, completed with Sanctuary Point for week 2 of 2017
      - Point_Perpendicular_2013_2019: rainfall (mm) from 01-01-2013 to 31-12-2020 at the Point Perpendicular weather station
      - Sanctuary_Point_2013_2019: rainfall (mm) from 01-01-2013 to 31-12-2020 at the Sanctuary Point weather station
      - IDCJAC0009_068088_2017_Data.csv: rainfall (mm) from 01-01-2017 to 31-12-2017 at the Sanctuary Point weather station (to fill in the missing value for average rainfall of week 2 of 2017)

    • cleaned_data_gam.R: script to calculate weekly counts of sharks to run GAM models and add weekly averages of rainfall and SST anomaly.
      - cleaned_pj_data.csv
      - anomaly.csv: weekly (1-28) average SST anomalies for Jervis Bay (2013-2019)
      - weekly_rainfall_completed.csv: weekly (1-28) average rainfall for Jervis Bay (2013-2019)
      - sample_size.csv: file with the number of sharks tagged (13-49) for each year (2013-2019)

    • sst.R: script to extract daily and weekly SST from IMOS nc files from 01-05 until 31-12 for the years 1992-2022 for Jervis Bay.
      - sst_raw_data: folder with all the raw weekly (1-28) csv files for each year (1992-2022) to fill in with SST data using the sst script
      - sst_nc_files: folder with all the nc files downloaded from IMOS for the last 31 years (1992-2022) at the sensor (IMOS - SRS - SST - L3S-Single Sensor - 1 day - night time - Australia)
      - SST: folder with the average weekly (1-28) SST data extracted from the nc files using the sst script for each of the 31 years (to calculate temperature anomaly)

    • sst_2019b.R: script to extract daily and weekly SST from the IMOS nc file for 2019 (missing value for week 19) for Jervis Bay.
      - week_2019_sst: weekly average SST for 2019, with a missing value for week 19
      - week_2019b_sst: SST data from 2019 from another sensor (IMOS - SRS - MODIS - 01 day - Ocean Colour-SST) to fill in the gap of week 19
      - week_2019_complete_sst: completed average weekly SST data from the year 2019 for weeks 1-28

    • sst_anomaly.R: script to calculate mean weekly SST anomaly for the study period (2013-2019) using mean historic weekly SST (1992-2022).
      - historic_sst.csv: mean weekly (1-28) and yearly (1992-2022) SST for Jervis Bay
      - mean_sst_31_years.csv: mean weekly (1-28) SST across all years (1992-2022) for Jervis Bay
      - anomaly.csv: mean weekly and yearly SST anomalies for the study period (2013-2019)

    • Descriptive_stats.R: script to calculate minimum and maximum length of sharks, mean Julian arrival and departure dates per individual per year, mean Julian arrival and departure dates per year for all sharks (Table S10), and a summary of the standard deviation of Julian arrival dates (Table S9).
      - cleaned_pj_data.csv

    • gam.R: script used to run the Generalized additive model for rainfall and sea surface temperature - cleaned_gam.csv

    • glm.R: script used to run the Generalized linear mixed models for the period of the day and moon phase - cleaned_pj_data.csv - sample_size.csv

    • lme.R: script used to run the Linear mixed model for sex and size - cleaned_pj_data.csv

    • Repeatability.R: script used to run the Repeatability for Julian arrival and Julian departure dates - cleaned_pj_data.csv
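    The weekly SST anomaly step described for sst_anomaly.R (observed weekly SST minus the historic weekly mean) can be sketched in a few lines of pandas. This is a minimal illustration with made-up values, not the actual R script:

```python
import pandas as pd

# Hypothetical weekly mean SST records, one row per (year, week).
sst = pd.DataFrame({
    "year": [1992, 1992, 1993, 1993, 2013, 2013],
    "week": [1, 2, 1, 2, 1, 2],
    "sst":  [21.0, 21.5, 20.8, 21.2, 22.1, 22.6],
})

# Historic baseline: mean SST per week across all years.
baseline = sst.groupby("week")["sst"].mean().rename("mean_sst")

# Anomaly = observed weekly SST minus the historic weekly mean.
anomaly = sst.join(baseline, on="week")
anomaly["anomaly"] = anomaly["sst"] - anomaly["mean_sst"]
```

    In the actual workflow the baseline corresponds to mean_sst_31_years.csv (1992-2022), and the anomalies are restricted to the study period (2013-2019).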

  11. Summer Products Sales Performance

    • kaggle.com
    zip
    Updated Dec 4, 2023
    Cite
    The Devastator (2023). Summer Products Sales Performance [Dataset]. https://www.kaggle.com/thedevastator/summer-products-sales-performance
    Explore at:
    zip (436244 bytes). Available download formats
    Dataset updated
    Dec 4, 2023
    Authors
    The Devastator
    License

    CC0 1.0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Summer Products Sales Performance

    E-commerce sales performance and ratings data for summer products

    By Jeffrey Mvutu Mabilama [source]

    About this dataset

    The Summer Products and Sales Performance dataset is a comprehensive collection of product listings, ratings, and sales data from the Wish platform. The dataset aims to provide insights into the trends and patterns in e-commerce during the summer season. It contains valuable information such as product titles, prices, retail prices, currency used for pricing, units sold, whether ad boosts are used for product listings, average ratings for products, total ratings count for products, counts of five-star to one-star ratings for products.

    Additionally, the dataset includes data on various aspects related to product quality and shipping options such as badges count (indicating special qualities), local product status (whether the product is sold locally), product quality rating badges (indicating the quality of the product), fast shipping availability badges (indicating whether fast shipping is available), tags associated with products (making them more discoverable), color variations of products available in inventory along with their count. It also provides information on different shipping options including option names and their corresponding prices.

    Moreover, the dataset encompasses details about the merchants selling these products, including merchant title and name, merchant rating count (total number of ratings received by the merchant), merchant profile picture availability, and a subtitle giving additional merchant details.

    The dataset further includes links to images of individual listed products, along with links to the respective online shop pages where they are found. In addition, currency_buyer specifies the currency type used by buyers in transactions. Items flagged with urgency text have an associated urgency text rate indicating how urgently they are desired or needed.

    The dataset also allows users to analyze units sold per listed item, as well as mean units sold per listed item across different categories/themes. Further evaluation can be done using the totalunitsold variable, which represents the total sales volume across all listed items on the Wish platform.

    To aid analysis around price elasticity, users can find markdown rates/percentages describing discounts over the retail price (ranging from 0-1), as well as average discount values for individual listed products. Further custom insights include the number of countries items can be delivered to, their country of origin, whether they carry an urgency banner or fast shipping, and whether the seller is famous or has a profile picture.

    This dataset was used to build a model helping sellers predict how well an item may sell, equipping businesses to make replenishment decisions guided by that model.

    How to use the dataset

    • Familiarize Yourself with the Columns:

      • Before diving into data analysis, it's important to understand the meaning of each column in the dataset. The columns contain information such as product titles, prices, ratings, inventory details, shipping options, merchant information, and more. Refer to the dataset documentation or use descriptive statistics methods to gain insights into different attributes.
    • Explore Product Categories:

      • The dataset includes a column named theme that represents the category or theme of each product listing. By analyzing this column's values and frequency distribution, you can identify top-selling categories during the summer season. This information can be beneficial for businesses looking to optimize their product offerings.
    • Analyze Pricing Data:

      • The columns like price, retail_price, and currency_buyer provide insights into pricing strategies employed by sellers on Wish platform.
      • Calculate statistical measures such as the mean price (using 'meanproductprices'), the highest-priced items (using 'price'), and the average discount (using 'averagediscount').
      • Investigate relationships between pricing factors, such as discounted prices compared to original retail prices ('discounted_price' = 'retail_price' - 'price').
    • Examine Ratings Data:

      • Analyze Product Ratings: To gauge customer satisfaction with products listed on the Wish platform, product rating features are provided. Available columns:
        • Number of ratings received per star rating
        • Total number of ratings received (rating_count)
        • Average rating (rating)
        Perform analysis to find: Aver...
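    The pricing checks above can be expressed directly in pandas; a small sketch with invented numbers (the column names follow the dataset, the values do not):

```python
import pandas as pd

# Toy stand-in for the Wish listings (invented values).
df = pd.DataFrame({
    "price": [8.0, 12.0, 5.0],
    "retail_price": [10.0, 30.0, 5.0],
})

# Absolute discount relative to the original retail price.
df["discounted_price"] = df["retail_price"] - df["price"]

# Discount as a fraction of the retail price (0-1, as described above).
df["discount_rate"] = df["discounted_price"] / df["retail_price"]

mean_price = df["price"].mean()
highest_priced = df["price"].max()
```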

  12. Binance Coin BNB, 1m Full Historical Data

    • kaggle.com
    zip
    Updated Oct 11, 2025
    Cite
    Imran Bukhari (2025). Binance Coin BNB, 1m Full Historical Data [Dataset]. https://www.kaggle.com/datasets/imranbukhari/comprehensive-bnbusd-1m-data/data
    Explore at:
    zip (266775584 bytes). Available download formats
    Dataset updated
    Oct 11, 2025
    Authors
    Imran Bukhari
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    I am a new developer and I would greatly appreciate your support. If you find this dataset helpful, please consider giving it an upvote!

    Key Features:

    Complete 1m Data: Raw 1m historical data from multiple exchanges, covering the entire trading history of BNBUSD available through their API endpoints. This dataset is updated daily to ensure up-to-date coverage.

    Combined Index Dataset: A unique feature of this dataset is the combined index, derived by averaging all the other datasets into one (see the attached notebook). This creates the longest continuous, unbroken BNBUSD dataset available on Kaggle, with no gaps and no erroneous values, and it gives a much more comprehensive view of the market (e.g., total volume across multiple exchanges).

    Superior Performance: When training machine learning models, the combined index dataset has demonstrated superior mean absolute error (MAE) performance compared to single-source datasets, by a whole order of magnitude.

    Unbroken History: The combined dataset's continuous history is a valuable asset for researchers and traders who require accurate and uninterrupted time series data for modeling or back-testing.
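    For reference on the MAE figures cited above, the metric is simply the average magnitude of the prediction errors; a minimal NumPy version:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.abs(y_true - y_pred).mean())

# Illustrative values only, not drawn from the dataset.
print(mae([100.0, 102.0, 98.0], [101.0, 100.0, 98.0]))  # 1.0
```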

    BNBUSD Dataset Summary: https://i.imgur.com/aqtuPay.png

    Combined Dataset Close Plot (https://i.imgur.com/mnzs2f4.png): this plot illustrates the continuity of the dataset over time, with no gaps in data, making it ideal for time series analysis.

    Included Resources:

    Two Notebooks:

    Dataset Usage and Diagnostics: This notebook demonstrates how to use the dataset and includes a powerful data diagnostics function, which is useful for all time series analyses.

    Aggregating Multiple Data Sources: This notebook walks you through the process of combining multiple exchange datasets into a single, clean dataset. (Currently unavailable, will be added shortly)

  13. Fish dataset

    • kaggle.com
    zip
    Updated May 6, 2025
    Cite
    AbdElRahman16 (2025). Fish dataset [Dataset]. https://www.kaggle.com/datasets/abdelrahman16/11111111111111111111
    Explore at:
    zip (20477 bytes). Available download formats
    Dataset updated
    May 6, 2025
    Authors
    AbdElRahman16
    Description

    ๐Ÿ” Dataset Overview: ๐ŸŸ Species: Name of the fish species (e.g., Anabas testudineus)

    ๐Ÿ“ Length: Length of the fish (in centimeters)

    โš–๏ธ Weight: Weight of the fish (in grams)

    ๐Ÿงฎ W/L Ratio: Weight-to-length ratio of the fish

    ๐Ÿง  Steps to Build the Prediction Model: ๐Ÿ“‹ Data Preprocessing: 1 - Handle Missing Values: Check for and handle any missing values appropriately using methods like:

    Imputation (mean/median for numeric data)

    Row or column removal (if data is too sparse)

    2 - Convert Data Types: Ensure numerical columns (Length, Weight, W/L Ratio) are in the correct numeric format.

    3 - Handle Categorical Variables: Convert the Species column into numerical format using:

    One-Hot Encoding

    Label Encoding
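    The preprocessing steps above can be sketched with pandas; the rows here are hypothetical, and the column names follow the dataset description:

```python
import pandas as pd

fish = pd.DataFrame({
    "Species": ["Anabas testudineus", "Anabas testudineus", "Channa striata"],
    "Length": [18.0, None, 25.0],    # cm, with a missing value
    "Weight": [120.0, 150.0, None],  # g, with a missing value
})

# 1 - Impute missing numeric values with the column mean.
for col in ["Length", "Weight"]:
    fish[col] = fish[col].fillna(fish[col].mean())

# 3 - One-hot encode the categorical Species column.
fish = pd.get_dummies(fish, columns=["Species"])
```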

    ๐ŸŽฏ Feature Selection: 1 - Correlation Analysis: Use correlation heatmaps or statistical tests to identify features most related to the target variable (e.g., Weight).

    2 - Feature Importance: Use tree-based models (like Random Forest) to determine which features are most predictive.

    ๐Ÿ” Model Selection: 1 - Algorithm Choice: Choose suitable machine learning algorithms such as:

    Linear Regression

    Decision Tree Regressor

    Random Forest Regressor

    Gradient Boosting Regressor

    2 - Model Comparison: Evaluate each model using metrics like:

    Mean Absolute Error (MAE)

    Mean Squared Error (MSE)

    R-squared (Rยฒ)

    ๐Ÿš€ Model Training and Evaluation: 1 - Train the Model: Split the dataset into training and testing sets (e.g., 80/20 split). Train the selected model(s) on the training set.

    2 - Evaluate the Model: Use the test set to assess model performance and fine-tune as necessary using grid search or cross-validation.
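    The training and evaluation steps above (80/20 split, fit candidate models, compare by metric) might look like this sketch on synthetic data standing in for the fish measurements:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: weight grows roughly with length cubed.
rng = np.random.default_rng(0)
length = rng.uniform(5, 40, 200)
weight = 0.01 * length**3 + rng.normal(0, 5, 200)
X, y = length.reshape(-1, 1), weight

# 80/20 train-test split, then compare candidate models.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(type(model).__name__,
          f"MAE={mean_absolute_error(y_te, pred):.2f}",
          f"R2={r2_score(y_te, pred):.2f}")
```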

    This dataset and workflow are useful for exploring biometric relationships in fish and building regression models to predict weight based on length or species. Great for marine biology, aquaculture analytics, and educational projects.

    ๐Ÿ  Happy modeling! ๐Ÿ‘ Please upvote if you found this helpful!

    https://www.kaggle.com/code/abdelrahman16/fish-clustering-diverse-techniques

  14. Median Household Income Variation by Family Size in Forest View, IL:...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    + more versions
    Cite
    Neilsberg Research (2025). Median Household Income Variation by Family Size in Forest View, IL: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/23fe928c-f81d-11ef-a994-3860777c1fe6/
    Explore at:
    json, csv. Available download formats
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Illinois, Forest View
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in Forest View, IL, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1-person to 7-or-more-person households) reported by the Census Bureau, Forest View did not include 6- or 7-person households. Across the reported household sizes in Forest View, the mean income is $116,795 and the standard deviation is $37,772. The coefficient of variation (CV) is 32.34%; this high CV indicates high relative variability, suggesting that incomes vary significantly across household sizes.
    • In the most recent year, 2023, the smallest household size for which the bureau reported a median household income was 1-person households, with an income of $76,250. Income rose with household size to $135,179 for 5-person households, the largest household size for which the bureau reported a median household income.
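    The coefficient of variation quoted above follows directly from the reported mean and standard deviation:

```python
# CV = standard deviation / mean, expressed as a percentage.
mean_income = 116_795
std_income = 37_772
cv = std_income / mean_income * 100
print(f"{cv:.2f}%")  # 32.34%
```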
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column showcases 7 household sizes ranging from 1-person households to 7-or-more-person households (As mentioned above).
    • Median Household Income: Median household income, in 2023 inflation-adjusted dollars for the specific household size.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Forest View median household income. You can refer to it here

  15. YouTube Channel Performance Analytics

    • kaggle.com
    zip
    Updated Oct 25, 2024
    Cite
    L3WY (2024). YouTube Channel Performance Analytics [Dataset]. https://www.kaggle.com/datasets/positivealexey/youtube-channel-performance-analytics
    Explore at:
    zip (41446 bytes). Available download formats
    Dataset updated
    Oct 25, 2024
    Authors
    L3WY
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    This dataset provides an in-depth look at YouTube video analytics, capturing key metrics related to video performance, audience engagement, revenue generation, and viewer behavior. Sourced from real video data, it highlights how variables like video duration, upload time, and ad impressions contribute to monetization and audience retention. This dataset is ideal for data analysts, content creators, and marketers aiming to uncover trends in viewer engagement, optimize content strategies, and maximize ad revenue. Inspired by the evolving landscape of digital content, it serves as a resource for understanding the impact of YouTube metrics on channel growth and content reach.

    Video Details: Columns like Video Duration, Video Publish Time, Days Since Publish, Day of Week.

    Revenue Metrics: Includes Revenue per 1000 Views (USD), Estimated Revenue (USD), Ad Impressions, and various ad revenue sources (e.g., AdSense, DoubleClick).

    Engagement Metrics: Metrics such as Views, Likes, Dislikes, Shares, Comments, Average View Duration, Average View Percentage (%), and Video Thumbnail CTR (%).

    Audience Data: Data on New Subscribers, Unsubscribes, Unique Viewers, Returning Viewers, and New Viewers.

    Monetization & Transaction Metrics: Details on Monetized Playbacks, Playback-Based CPM, YouTube Premium Revenue, and transactions like Orders and Total Sales Volume (USD).

  16. Student Academic Performance (Synthetic Dataset)

    • kaggle.com
    zip
    Updated Oct 10, 2025
    Cite
    Mamun Hasan (2025). Student Academic Performance (Synthetic Dataset) [Dataset]. https://www.kaggle.com/datasets/mamunhasan2cs/student-academic-performance-synthetic-dataset
    Explore at:
    zip (9287 bytes). Available download formats
    Dataset updated
    Oct 10, 2025
    Authors
    Mamun Hasan
    License

    CC0 1.0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is a synthetic collection of student performance data created for data preprocessing, cleaning, and analysis practice in Data Mining and Machine Learning courses. It contains information about 1,020 students, including their study habits, attendance, and test performance, with intentionally introduced missing values, duplicates, and outliers to simulate real-world data issues.

    The dataset is suitable for laboratory exercises, assignments, and demonstration of key preprocessing techniques such as:

    • Handling missing values
    • Removing duplicates
    • Detecting and treating outliers
    • Data normalization and transformation
    • Encoding categorical variables
    • Exploratory data analysis (EDA)
    • Regression Analysis

    ๐Ÿ“Š Columns Description

    • Student_ID: Unique identifier for each student (e.g., S0001, S0002, …)
    • Age: Age of the student (between 18 and 25 years)
    • Gender: Gender of the student (Male/Female)
    • Study_Hours: Average number of study hours per day (contains missing values and outliers)
    • Attendance(%): Percentage of class attendance (contains missing values)
    • Test_Score: Final exam score (0–100 scale)
    • Grade: Letter grade derived from test scores (F, C, B, A, A+)

    ๐Ÿง  Example Lab Tasks Using This Dataset:

    • Identify and impute missing values using mean/median.
    • Detect and remove duplicate records.
    • Use IQR or Z-score methods to handle outliers.
    • Normalize Study_Hours and Test_Score using Min-Max scaling.
    • Encode categorical variables (Gender, Grade) for model input.
    • Prepare a clean dataset ready for classification/regression analysis.
    • Can be used for Limited Regression
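    A sketch of the outlier and scaling tasks above (IQR rule, then Min-Max scaling), using invented scores rather than the actual dataset:

```python
import pandas as pd

scores = pd.DataFrame({"Test_Score": [55, 60, 62, 58, 61, 95, 59, 3]})

# IQR rule: keep points within [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = scores["Test_Score"].quantile([0.25, 0.75])
iqr = q3 - q1
inliers = scores["Test_Score"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = scores[inliers]

# Min-Max scaling of the remaining values to [0, 1].
col = clean["Test_Score"]
clean = clean.assign(scaled=(col - col.min()) / (col.max() - col.min()))
```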

    ๐ŸŽฏ Possible Regression Targets

    Test_Score โ†’ Predict test score based on study hours, attendance, age, and gender.

    ๐Ÿงฉ Example Regression Problem

    Predict the studentโ€™s test score using their study hours, attendance percentage, and age.

    ๐Ÿง  Sample Features: X = ['Age', 'Gender', 'Study_Hours', 'Attendance(%)'] y = ['Test_Score']

    You can use:

    • Linear Regression (for simplicity)
    • Polynomial Regression (to explore nonlinear patterns)
    • Decision Tree Regressor or Random Forest Regressor

    And analyze feature influence using correlation or SHAP/LIME explainability.

  17. Student Performance Dataset

    • kaggle.com
    Updated Aug 27, 2025
    Cite
    Ghulam Muhammad Nabeel (2025). Student Performance Dataset [Dataset]. https://www.kaggle.com/datasets/nabeelqureshitiii/student-performance-dataset
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 27, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ghulam Muhammad Nabeel
    Description

    ๐Ÿ“Š Student Performance Dataset (Synthetic, Realistic)

    Overview

    This dataset contains 1,000,000 rows of realistic student performance data, designed for beginners in Machine Learning to practice Linear Regression, model training, and evaluation techniques.

    Each row represents one student with features like study hours, attendance, class participation, and final score.
    The dataset is clean and structured to be beginner-friendly.

    ๐Ÿ”‘ Columns Description

    • student_id โ†’ Unique identifier for each student.
    • weekly_self_study_hours โ†’ Average weekly self-study hours (0โ€“40). Generated using a normal distribution centered around 15 hours.
    • attendance_percentage โ†’ Attendance percentage (50โ€“100). Simulated with a normal distribution around 85%.
    • class_participation โ†’ Score between 0โ€“10 indicating how actively the student participates in class. Generated from a normal distribution centered around 6.
    • total_score โ†’ Final performance score (0โ€“100). Calculated as a function of study hours + random noise, then clipped between 0โ€“100. Stronger correlation with study hours.
    • grade โ†’ Categorical label (A, B, C, D, F) derived from total_score.

    ๐Ÿ“ Data Generation Logic

    1. Weekly Study Hours: Modeled using a normal distribution (mean โ‰ˆ 15, std โ‰ˆ 7), capped between 0 and 40 hours.
    2. Scores: More study hours โ†’ higher score. Formula:

    Random noise simulates differences in learning ability, motivation, etc.

    3. Attendance & Participation: Independent but realistic variations added.
    4. Grades: Assigned from scores using thresholds:
    • A: โ‰ฅ 85
    • B: โ‰ฅ 70
    • C: โ‰ฅ 55
    • D: โ‰ฅ 40
    • F: < 40
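    The generation logic described above can be reproduced in outline with NumPy. The slope and noise level in the score formula are illustrative assumptions, since the exact formula is not given:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# Study hours: normal(mean=15, std=7), clipped to [0, 40].
hours = np.clip(rng.normal(15, 7, n), 0, 40)

# Score rises with study hours plus noise, clipped to [0, 100].
# Slope (3) and noise std (10) are assumed, not from the description.
score = np.clip(40 + 3 * hours + rng.normal(0, 10, n), 0, 100)

# Grades from the stated thresholds.
grade = np.select(
    [score >= 85, score >= 70, score >= 55, score >= 40],
    ["A", "B", "C", "D"],
    default="F",
)
```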

    ๐ŸŽฏ How to Use This Dataset

    Regression Tasks

    • Predict total_score from weekly_self_study_hours.
    • Train and evaluate Linear Regression models.
    • Extend to multiple regression using attendance_percentage and class_participation.

    Classification Tasks

    • Predict grade (Aโ€“F) using study hours, attendance, and participation.

    Model Evaluation Practice

    • Apply train-test split and cross-validation.
    • Evaluate with MAE, RMSE, Rยฒ.
    • Compare simple vs. multiple regression.
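    The comparison of simple vs. multiple regression under cross-validation can be sketched like this, on synthetic data shaped like the columns above (coefficients and noise are invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for the dataset's columns.
rng = np.random.default_rng(1)
n = 500
study = np.clip(rng.normal(15, 7, n), 0, 40)
attendance = np.clip(rng.normal(85, 8, n), 50, 100)
participation = np.clip(rng.normal(6, 2, n), 0, 10)
score = np.clip(2 * study + 0.3 * attendance + rng.normal(0, 8, n), 0, 100)

# Simple regression (study hours only) vs. multiple regression.
for name, X in [("simple", study.reshape(-1, 1)),
                ("multiple", np.column_stack([study, attendance, participation]))]:
    r2 = cross_val_score(LinearRegression(), X, score, cv=5, scoring="r2")
    print(name, round(float(r2.mean()), 3))
```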

    โœ… This dataset is intentionally kept simple, so that new ML learners can clearly see the relationship between input features (study, attendance, participation) and output (score/grade).

  18. Kokoro Speech Dataset v1.1 Tiny

    • kaggle.com
    zip
    Updated May 14, 2021
    + more versions
    Cite
    Katsuya Iida (2021). Kokoro Speech Dataset v1.1 Tiny [Dataset]. https://www.kaggle.com/datasets/kaiida/kokoro-speech-dataset-v11-tiny
    Explore at:
    zip (48156884 bytes). Available download formats
    Dataset updated
    May 14, 2021
    Authors
    Katsuya Iida
    License

    CC0 1.0 (Public Domain): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Kokoro Speech Dataset

    Kokoro Speech Dataset is a public domain Japanese speech dataset. It contains 34,958 short audio clips of a single speaker reading 9 novel books. The format of the metadata is similar to that of LJ Speech so that the dataset is compatible with modern speech synthesis systems.

    The texts are from Aozora Bunko, which is in the public domain. The audio clips are from the LibriVox project, which is also in the public domain. Readings are estimated by MeCab and UniDic Lite from the kanji-kana mixture text, and are romanized in a format similar to that used by Julius.

    The audio clips were split and transcripts were aligned automatically by Voice100.

    Sample data

    Listen from your browser or download 100 randomly sampled clips.

    File Format

    Metadata is provided in metadata.csv. This file consists of one record per line, delimited by the pipe character (0x7c). The fields are:

    • ID: this is the name of the corresponding .wav file
    • Transcription: Kanji-kana mixture text spoken by the reader (UTF-8)
    • Reading: Romanized text spoken by the reader (UTF-8)

    Each audio file is a single-channel 16-bit PCM WAV with a sample rate of 22050 Hz.
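    Given the pipe-delimited layout described above, the metadata can be loaded with pandas; the sample rows here are invented stand-ins, not actual entries:

```python
import io
import pandas as pd

# Two invented rows in the metadata.csv layout (ID|Transcription|Reading).
sample = "clip_0001|吾輩は猫である|wagahai wa neko de aru\nclip_0002|雨|ame\n"

meta = pd.read_csv(
    io.StringIO(sample),
    sep="|",
    header=None,
    names=["id", "transcription", "reading"],
)

# The ID names the corresponding .wav file.
wav_paths = meta["id"] + ".wav"
```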

    Statistics

    The dataset is provided in three sizes: large, small, and tiny. The small and tiny sets do not share any clips. The large set contains all available clips, including those in small and tiny.

    Large:
    Total clips: 34958
    Min duration: 3.007 secs
    Max duration: 14.745 secs
    Mean duration: 4.978 secs
    Total duration: 48:20:24
    
    Small:
    Total clips: 8812
    Min duration: 3.007 secs
    Max duration: 14.431 secs
    Mean duration: 4.951 secs
    Total duration: 12:07:12
    
    Tiny:
    Total clips: 285
    Min duration: 3.019 secs
    Max duration: 9.462 secs
    Mean duration: 4.871 secs
    Total duration: 00:23:08
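    The per-size duration statistics above can be recomputed from the extracted clips with the stdlib wave module. This is a sketch that assumes the .wav files sit in one directory (e.g. the ./output directory produced by extract.py):

```python
import wave
from pathlib import Path

def clip_duration(path):
    """Duration of a WAV clip in seconds (frames / frame rate)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()

def duration_stats(wav_dir):
    """Min/max/mean/total duration over all .wav files in wav_dir."""
    durs = sorted(clip_duration(p) for p in Path(wav_dir).glob("*.wav"))
    return {
        "total_clips": len(durs),
        "min": durs[0],
        "max": durs[-1],
        "mean": sum(durs) / len(durs),
        "total": sum(durs),
    }
```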
    

    How to get the data

    Because of the dataset's large size, the audio files are not included in this repository; only the metadata is.

    To make .wav files of the dataset, run

    $ bash download.sh
    

    to download the metadata from the project page. Then run

    $ pip3 install torchaudio
    $ python3 extract.py --size tiny
    

    If you haven't done so already, this prints an example shell script that downloads the MP3 audio files from archive.org and extracts them.

    After doing so, run the command again

    $ python3 extract.py --size tiny
    

    to get files for tiny under ./output directory.

    You can pass a different size name to the --size option to extract that subset.

    Pretrained Tacotron model

    A pretrained Tacotron model trained on the Kokoro Speech Dataset is available, along with audio samples. The model was trained for 21K steps on small. According to that repository, "Speech started to become intelligible around 20K steps" with the LJ Speech Dataset. The audio samples read the first few sentences of Gon Gitsune, which is not included in small.

    Books

    The dataset contains recordings from the following books, read by ekzemplaro:

  19. Income Distribution by Quintile: Mean Household Income in Bay View, OH

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    + more versions
    Neilsberg Research (2024). Income Distribution by Quintile: Mean Household Income in Bay View, OH [Dataset]. https://www.neilsberg.com/research/datasets/945d8e9e-7479-11ee-949f-3860777c1fe6/
    Explore at:
    csv, json. Available download formats
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bay View
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Bay View, OH, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $22,803, while the mean income of the highest quintile (the 20% of households with the highest income) is $219,508. This indicates that the top earners earn roughly 10 times as much as the lowest earners.
    • Top 5%: The mean household income for the wealthiest population (top 5%) is $407,667, which is 185.72% of the mean for the highest quintile and 1787.78% of the mean for the lowest quintile.
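    The ratios behind these key observations can be checked directly; the reported percentages correspond to the plain quintile ratios expressed in percent:

```python
lowest_q = 22_803     # mean income, lowest quintile
highest_q = 219_508   # mean income, highest quintile
top_5 = 407_667       # mean income, top 5%

# Top quintile vs. lowest quintile: roughly 10x
print(f"{highest_q / lowest_q:.2f}x")  # 9.63x

# Ratios expressed in percent, matching the figures in the key observations
print(f"{top_5 / highest_q * 100:.2f}%")  # 185.72%
print(f"{top_5 / lowest_q * 100:.2f}%")   # 1787.78%
```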

    Figure: Mean household income by quintiles in Bay View, OH (in 2022 inflation-adjusted dollars).

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column showcases the income levels (As mentioned above).
    • Mean Household Income: Mean household income, in 2022 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are therefore subject to sampling variability and a margin of error. Neilsberg Research recommends caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Bay View median household income. You can refer to it here

  20. Coffee Shop Daily Revenue Prediction Dataset

    • kaggle.com
    zip
    Updated Feb 7, 2025
    Himel Sarder (2025). Coffee Shop Daily Revenue Prediction Dataset [Dataset]. https://www.kaggle.com/datasets/himelsarder/coffee-shop-daily-revenue-prediction-dataset
    Explore at:
    zip (30259 bytes). Available download formats
    Dataset updated
    Feb 7, 2025
    Authors
    Himel Sarder
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset Overview

    This dataset contains 2,000 rows of data from coffee shops, offering detailed insights into factors that influence daily revenue. It includes key operational and environmental variables that provide a comprehensive view of how business activities and external conditions affect sales performance. Designed for use in predictive analytics and business optimization, this dataset is a valuable resource for anyone looking to understand the relationship between customer behavior, operational decisions, and revenue generation in the food and beverage industry.

    Columns & Variables

    The dataset features a variety of columns that capture the operational details of coffee shops, including customer activity, store operations, and external factors such as marketing spend and location foot traffic.

    1. Number of Customers Per Day

      • The total number of customers visiting the coffee shop on any given day.
      • Range: 50 - 500 customers.
    2. Average Order Value ($)

      • The average dollar amount spent by each customer during their visit.
      • Range: $2.50 - $10.00.
    3. Operating Hours Per Day

      • The total number of hours the coffee shop is open for business each day.
      • Range: 6 - 18 hours.
    4. Number of Employees

      • The number of employees working on a given day. This can influence service speed, customer satisfaction, and ultimately, sales.
      • Range: 2 - 15 employees.
    5. Marketing Spend Per Day ($)

      • The amount of money spent on marketing campaigns or promotions on any given day.
      • Range: $10 - $500 per day.
    6. Location Foot Traffic (people/hour)

      • The number of people passing by the coffee shop per hour, a variable indicative of the shop's location and its potential to attract customers.
      • Range: 50 - 1000 people per hour.

    Target Variable

    • Daily Revenue ($)
      • This is the dependent variable representing the total revenue generated by the coffee shop each day.
      • It is calculated as a combination of customer visits, average spending, and other operational factors like marketing spend and staff availability.
      • Range: $200 - $10,000 per day.
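    Rows can be sanity-checked against the documented ranges before modeling. The column names below are assumptions derived from the descriptions above, not confirmed CSV headers:

```python
# (min, max) per column, taken from the documented ranges; names are assumed
RANGES = {
    "Number_of_Customers_Per_Day": (50, 500),
    "Average_Order_Value": (2.50, 10.00),
    "Operating_Hours_Per_Day": (6, 18),
    "Number_of_Employees": (2, 15),
    "Marketing_Spend_Per_Day": (10, 500),
    "Location_Foot_Traffic": (50, 1000),
    "Daily_Revenue": (200, 10_000),
}

def out_of_range(row):
    """Return the columns of `row` that fall outside their documented range."""
    return [col for col, (lo, hi) in RANGES.items()
            if col in row and not lo <= row[col] <= hi]
```

For example, a row with an average order value of $12.00 would be flagged, since the documented maximum is $10.00.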

    Data Distribution & Insights

    The dataset spans a wide variety of operational scenarios, from small neighborhood coffee shops with limited traffic to larger, high-traffic locations with extensive marketing budgets. This variety allows for exploring different predictive modeling strategies. Key insights that can be derived from the data include:

    • The effect of marketing spend on daily revenue.
    • The correlation between customer count and daily sales.
    • The relationship between staffing levels and revenue generation.
    • The influence of foot traffic and operating hours on customer behavior.

    Use Cases & Applications

    The dataset offers a wide range of applications, especially in predictive analytics, business optimization, and forecasting:

    • Predictive Modeling: Use machine learning models such as regression, decision trees, or neural networks to predict daily revenue based on operational data.
    • Business Strategy Development: Analyze how changes in marketing spend, staff numbers, or operating hours can optimize revenue and improve efficiency.
    • Customer Insights: Identify patterns in customer behavior related to shop operations and external factors like foot traffic and marketing campaigns.
    • Resource Allocation: Determine optimal staffing levels and marketing budgets based on predicted sales, improving overall profitability.
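    The predictive-modeling use case can be sketched with NumPy least squares on synthetic data drawn from the documented ranges. The true revenue process behind the dataset is not published, so the linear-plus-noise relationship below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000  # the dataset has 2,000 rows

# Synthetic features drawn from the documented ranges
customers = rng.uniform(50, 500, n)
order_value = rng.uniform(2.5, 10.0, n)
marketing = rng.uniform(10, 500, n)

# Assumed revenue process with noise (for illustration only)
revenue = customers * order_value + 0.5 * marketing + rng.normal(0, 50, n)

# Ordinary least squares fit of revenue on the three features
X = np.column_stack([customers, order_value, marketing, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
pred = X @ coef

# Coefficient of determination of the linear fit
r2 = 1 - np.sum((revenue - pred) ** 2) / np.sum((revenue - revenue.mean()) ** 2)
print(f"R^2 = {r2:.3f}")
```

A purely linear model cannot capture the customers-times-order-value interaction exactly, which is itself a motivation for trying decision trees or neural networks on the real data.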

    Real-World Applications in the Food & Beverage Industry

    For coffee shop owners, managers, and analysts in the food and beverage industry, this dataset provides an essential tool for refining daily operations and boosting profitability. Insights gained from this data can help:

    • Optimize Marketing Campaigns: Evaluate the effectiveness of daily or seasonal marketing campaigns on revenue.
    • Staff Scheduling: Predict busy days and ensure that the right number of employees are scheduled to maximize efficiency.
    • Revenue Forecasting: Provide accurate revenue projections that can assist with financial planning and decision-making.
    • Operational Efficiency: Discover the most profitable operating hours and adjust business hours accordingly.

    This dataset is also ideal for aspiring data scientists and machine learning practitioners looking to apply their skills to real-world business problems in the food and beverage sector.

    Conclusion

    The Coffee Shop Revenue Prediction Dataset is a versatile and comprehensive resource for understanding the dynamics of daily sales performance in coffee shops. With a focus on key operational factors, it is perfect for building predictive models, ...

The Devastator (2023). Weather and Housing in North America [Dataset]. https://www.kaggle.com/datasets/thedevastator/weather-and-housing-in-north-america
Explore at:
zip (512280 bytes). Available download formats
Dataset updated
Feb 13, 2023
Authors
The Devastator
License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered
North America
Description

Weather and Housing in North America

Exploring the Relationship between Weather and Housing Conditions in 2012

By [source]

About this dataset

This comprehensive dataset explores the relationship between housing and weather conditions across North America in 2012. Through a range of climate variables such as temperature, wind speed, humidity, pressure, and visibility, it provides unique insights into the weather-influenced environment of numerous regions. Housing parameters such as longitude, latitude, median income, median house value, and ocean proximity further enhance our understanding of how distinct climates play an integral part in area real-estate valuations. Analyzing these two data sets together offers a wealth of knowledge about which factors dictate the value and comfort of residential areas throughout North America.

More Datasets

For more datasets, click here.

Featured Notebooks

  • 🚨 Your notebook can be here! 🚨

How to use the dataset

This dataset offers plenty of insights into the effects of weather and housing on North American regions. To explore these relationships, you can perform data analysis on the variables provided.

First, start by examining descriptive statistics (i.e., mean, median, mode). This can help show you the general trend and distribution of each variable in this dataset. For example, what is the most common temperature in a given region? What is the average wind speed? How does this vary across different regions? By looking at descriptive statistics, you can get an initial idea of how various weather conditions and housing attributes interact with one another.

Next, explore correlations between variables. Are certain weather variables correlated with specific housing attributes? Is there a link between wind speeds and median house value? Or between humidity and ocean proximity? Analyzing correlations allows for deeper insights into how different aspects may influence one another for a given region or area. These correlations may also inform broader patterns that are present across multiple North American regions or countries.

Finally, use visualizations to further investigate the relationship between climate and housing attributes in North America in 2012. Graphs make it easier to see trends such as seasonal variations or long-term changes over time, so they are useful for interpreting large amounts of data quickly and for providing context beyond what the numbers alone can tell us.
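As a sketch of the correlation step, Pearson's r between any two variables can be computed directly. The pairing of temperature against median house value below is hypothetical toy data, chosen only to show the calculation:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical pairing: mean temperature vs. median house value per region
temps = [5.0, 10.0, 15.0, 20.0, 25.0]
values = [150_000, 180_000, 210_000, 240_000, 270_000]
print(pearson_r(temps, values))  # 1.0 for this perfectly linear toy data
```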

Research Ideas

  • Analyzing the effect of climate change on housing markets across North America. By looking at temperature and weather trends in combination with housing values, researchers can better understand how climate change may be impacting certain regions differently than others.
  • Investigating the relationship between median income, house values, and ocean proximity in coastal areas. Understanding how ocean proximity plays into housing prices may help inform real-estate investment decisions and urban planning initiatives related to coastal development.
  • Utilizing differences in weather patterns across climates to determine optimal seasonal rental prices for property owners. By analyzing changes in temperature, wind speed, humidity, pressure, and visibility from season to season, an investor could gain valuable insights into seasonal market trends and maximize profits from rentals or Airbnb listings over time.

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: Weather.csv

| Column name | Description |
|:---------------------|:-----------------------------------------------|
| Date/Time | Date and time of the observation. (Date/Time) |
| Temp_C | Temperature in Celsius. (Numeric) |
| Dew Point Temp_C | Dew point temperature in Celsius. (Numeric) |
| Rel Hum_% | Relative humidity in percent. (Numeric) |
| Wind Speed_km/h | Wind speed in kilometers per hour. (Numeric) |
| Visibility_km | Visibilit... |
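The descriptive-statistics step suggested above can be sketched with csv.DictReader over the Weather.csv columns. The two-row sample below is hypothetical; only the column names come from the table above:

```python
import csv
import io

# Hypothetical two-row sample using the documented Weather.csv columns
sample = """Date/Time,Temp_C,Dew Point Temp_C,Rel Hum_%,Wind Speed_km/h,Visibility_km
2012-01-01 00:00,-1.8,-3.9,86,4,8.0
2012-01-01 01:00,-1.5,-3.3,87,4,8.0
"""

reader = csv.DictReader(io.StringIO(sample))
temps = [float(row["Temp_C"]) for row in reader]
mean_temp = sum(temps) / len(temps)
print(f"mean Temp_C = {mean_temp:.2f}")  # mean Temp_C = -1.65
```

With the real file, replace io.StringIO(sample) with an open Weather.csv handle.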
