100+ datasets found
  1. Customer Satisfaction Scores and Behavior Data

    • kaggle.com
    zip
    Updated Apr 6, 2025
    Cite
    Salahuddin Ahmed (2025). Customer Satisfaction Scores and Behavior Data [Dataset]. https://www.kaggle.com/datasets/salahuddinahmedshuvo/customer-satisfaction-scores-and-behavior-data/discussion
    Available download formats: zip (2456 bytes)
    Dataset updated
    Apr 6, 2025
    Authors
    Salahuddin Ahmed
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains customer satisfaction scores collected from a survey, alongside key demographic and behavioral data. It includes variables such as customer age, gender, location, purchase history, support contact status, loyalty level, and satisfaction factors. The dataset is designed to help analyze customer satisfaction, identify trends, and develop insights that can drive business decisions.

    File Information: File Name: customer_satisfaction_data.csv

    File Type: CSV

    Number of Rows: 120

    Number of Columns: 10

    Column Names:

    Customer_ID – Unique identifier for each customer (e.g., 81-237-4704)

    Group – The group to which the customer belongs (A or B)

    Satisfaction_Score – Customer's satisfaction score on a scale of 1-10

    Age – Age of the customer

    Gender – Gender of the customer (Male, Female)

    Location – Customer's location (e.g., Phoenix.AZ, Los Angeles.CA)

    Purchase_History – Whether the customer has made a purchase (Yes or No)

    Support_Contacted – Whether the customer has contacted support (Yes or No)

    Loyalty_Level – Customer's loyalty level (Low, Medium, High)

    Satisfaction_Factor – Primary factor contributing to customer satisfaction (e.g., Price, Product Quality)

    Statistical Analyses:

    Descriptive Statistics:

    Calculate mean, median, mode, standard deviation, and range for key numerical variables (e.g., Satisfaction Score, Age).

    Summarize categorical variables (e.g., Gender, Loyalty Level, Purchase History) with frequency distributions and percentages.
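As a sketch of both steps (column names follow the description above; the values are invented for illustration), the descriptive statistics could be computed with pandas:

```python
import pandas as pd

# Toy rows mimicking the documented columns (values invented for illustration)
df = pd.DataFrame({
    "Satisfaction_Score": [7, 9, 4, 8, 6, 7, 5, 9],
    "Age": [25, 34, 45, 29, 52, 41, 33, 27],
    "Loyalty_Level": ["Low", "High", "Medium", "High", "Low", "Medium", "High", "Low"],
})

# Descriptive statistics for the numerical variables
numeric_summary = df[["Satisfaction_Score", "Age"]].agg(
    ["mean", "median", "std", "min", "max"]
)

# Frequency distribution and percentages for a categorical variable
loyalty_counts = df["Loyalty_Level"].value_counts()
loyalty_pct = df["Loyalty_Level"].value_counts(normalize=True) * 100
```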

    Two-Sample t-Test (Independent t-test):

    Compare the mean satisfaction scores between two independent groups (e.g., Group A vs. Group B) to determine if there is a significant difference in their average satisfaction scores.
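One way to run this comparison (with synthetic scores standing in for Groups A and B) is SciPy's independent-samples t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic 1-10 satisfaction scores for two hypothetical independent groups
group_a = rng.integers(5, 11, size=60)  # Group A skews higher
group_b = rng.integers(1, 8, size=60)   # Group B skews lower

# Welch's t-test; drop equal_var=False for the classic equal-variance version
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
```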

    Paired t-Test:

    If there are two related measurements (e.g., satisfaction scores before and after a certain event), you can compare the means using a paired t-test.
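A minimal sketch with invented before/after measurements for the same customers (this dataset does not itself contain paired columns):

```python
from scipy.stats import ttest_rel

# Hypothetical before/after scores for the same eight customers
before = [5, 6, 4, 7, 5, 6, 5, 4]
after  = [7, 7, 6, 8, 6, 8, 7, 6]

# Paired t-test on the per-customer differences
t_stat, p_value = ttest_rel(before, after)
```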

    One-Way ANOVA (Analysis of Variance):

    Test if there are significant differences in mean satisfaction scores across more than two groups (e.g., comparing the mean satisfaction score across different Loyalty Levels).
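With scores split by a hypothetical Loyalty_Level grouping (values invented), the one-way ANOVA looks like:

```python
from scipy.stats import f_oneway

# Invented satisfaction scores for three loyalty groups
low    = [3, 4, 2, 5, 3, 4]
medium = [5, 6, 5, 7, 6, 5]
high   = [8, 9, 7, 9, 8, 8]

f_stat, p_value = f_oneway(low, medium, high)
```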

    Chi-Square Test for Independence:

    Examine the relationship between two categorical variables (e.g., Gender vs. Purchase History or Loyalty Level vs. Support Contacted) to determine if there’s a significant association.
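A sketch using an invented Gender-by-Purchase_History contingency table (the counts are not from this dataset):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented contingency table: rows = Gender, columns = Purchase_History (Yes, No)
table = np.array([
    [30, 10],  # Male: purchased / did not purchase
    [25, 15],  # Female
])

chi2, p_value, dof, expected = chi2_contingency(table)
```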

    Mann-Whitney U Test:

    For non-normally distributed data, use this test to compare satisfaction scores between two independent groups (e.g., Group A vs. Group B) to see if their distributions differ significantly.
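With invented scores where one group is uniformly higher, the test can be run as:

```python
from scipy.stats import mannwhitneyu

# Invented scores: every Group A value exceeds every Group B value
group_a = [8, 9, 7, 10, 8, 9, 7]
group_b = [3, 4, 2, 5, 3, 4, 2]

u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
```

Because the groups are completely separated here, the U statistic equals n1 * n2 = 49, its maximum.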

    Kruskal-Wallis Test:

    Similar to ANOVA, but used for non-normally distributed data. This test can compare the median satisfaction scores across multiple groups (e.g., comparing satisfaction scores across Loyalty Levels or Satisfaction Factors).
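A minimal sketch with three invented, well-separated groups:

```python
from scipy.stats import kruskal

# Invented scores for three well-separated loyalty groups
low    = [2, 3, 2, 4, 3]
medium = [5, 6, 5, 6, 5]
high   = [8, 9, 8, 9, 8]

h_stat, p_value = kruskal(low, medium, high)
```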

    Spearman’s Rank Correlation:

    Test for a monotonic relationship between two ordinal or continuous variables (e.g., Age vs. Satisfaction Score or Satisfaction Score vs. Loyalty Level).
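With an invented, perfectly monotone age-score relationship, Spearman's rho comes out at its maximum of 1:

```python
from scipy.stats import spearmanr

# Invented data with a perfectly monotone relationship
age   = [22, 30, 35, 41, 50, 58]
score = [4, 5, 6, 7, 8, 9]

rho, p_value = spearmanr(age, score)
```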

    Regression Analysis:

    Linear Regression: Model the relationship between a continuous dependent variable (e.g., Satisfaction Score) and independent variables (e.g., Age, Gender, Loyalty Level).

    Logistic Regression: If analyzing binary outcomes (e.g., Purchase History or Support Contacted), you could model the probability of an outcome based on predictors.
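Both models can be sketched with scikit-learn on synthetic data (the generating relationships below are assumptions for illustration, not properties of this dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
age = rng.uniform(18, 70, size=100).reshape(-1, 1)

# Linear regression: score generated as 2.0 + 0.1 * age + noise (assumed relationship)
score = 2.0 + 0.1 * age.ravel() + rng.normal(0, 0.5, size=100)
linreg = LinearRegression().fit(age, score)

# Logistic regression: binary purchase outcome driven by the score
purchased = (score + rng.normal(0, 1.0, size=100) > 6).astype(int)
logreg = LogisticRegression().fit(score.reshape(-1, 1), purchased)
prob_at_9 = logreg.predict_proba([[9.0]])[0, 1]  # P(purchase) for a score of 9
```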

    Factor Analysis:

    To identify underlying patterns or groups in customer behavior or satisfaction factors, you can apply Factor Analysis to reduce the dimensionality of the dataset and group similar variables.
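A sketch on synthetic data where two latent factors drive six observed variables (the structure is assumed for illustration):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
# Synthetic data: two latent factors drive six observed variables
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 6))
X = latent @ loadings + rng.normal(0, 0.1, size=(200, 6))

fa = FactorAnalysis(n_components=2, random_state=0).fit(X)
factor_scores = fa.transform(X)  # 200 observations projected onto 2 factors
```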

    Cluster Analysis:

    Use K-Means Clustering or Hierarchical Clustering to group customers based on similarity in their satisfaction scores and other features (e.g., Loyalty Level, Purchase History).
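A K-Means sketch on two invented, well-separated customer segments (features and centers are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Two invented customer segments: columns = (satisfaction score, age)
segment_1 = rng.normal([3.0, 25.0], 0.5, size=(30, 2))
segment_2 = rng.normal([8.0, 55.0], 0.5, size=(30, 2))
X = np.vstack([segment_1, segment_2])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_
```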

    Confidence Intervals:

    Calculate confidence intervals for the mean of satisfaction scores or any other metric to estimate the range in which the true population mean might lie.
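A t-based 95% interval for the mean, on invented scores:

```python
import numpy as np
from scipy import stats

scores = np.array([7, 8, 6, 9, 7, 8, 5, 7, 8, 6])  # invented sample
mean = scores.mean()
sem = stats.sem(scores)  # standard error of the mean

# 95% CI for the mean using the t-distribution with n - 1 degrees of freedom
low, high = stats.t.interval(0.95, df=len(scores) - 1, loc=mean, scale=sem)
```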

  2. Cities' average distance to surface diversion sources

    • ihp-wins.unesco.org
    • cloud.csiss.gmu.edu
    shp
    Updated Feb 5, 2024
    Cite
    Intergovernmental Hydrological Programme (2024). Cities' average distance to surface diversion sources [Dataset]. https://ihp-wins.unesco.org/dataset/cities-average-distance-to-surface-diversion-sources
    Available download formats: shp
    Dataset updated
    Feb 5, 2024
    Dataset provided by
    Intergovernmental Hydrological Programme
    Description

    The average distance to surface diversion sources is expressed in kilometers. Today, cities are looking farther beyond their limits for clean water: on average, cities retrieve surface water from a distance of 57.86 km.

    Note that if a city gets only a small fraction of its water from surface water, values will still be calculated for this metric, but they are not particularly meaningful for the city's water risk or opportunity profile.

    For more information, see the Urban Water Blueprint report: http://www.iwa-network.org/wp-content/uploads/2016/06/Urban-Water-Blueprint-Report.pdf. You can also visit the Urban Water Blueprint website: http://water.nature.org/waterblueprint/#/intro=true

  3. Annual Average Temperature Change - Projections (12km)

    • hub.arcgis.com
    • climatedataportal.metoffice.gov.uk
    Updated Jun 1, 2023
    Cite
    Met Office (2023). Annual Average Temperature Change - Projections (12km) [Dataset]. https://hub.arcgis.com/datasets/cf8f426fffde4956af27a38857cd55b9
    Dataset updated
    Jun 1, 2023
    Dataset authored and provided by
    Met Office
    Area covered
    Description

    [Updated 28/01/25 to fix an issue in the ‘Lower’ values, which were not fully representing the range of uncertainty. ‘Median’ and ‘Higher’ values remain unchanged. The size of the change varies by grid cell and fixed period/global warming level, but the average difference between the 'lower' values before and after this update is 0.13°C.]

    What does the data show?

    This dataset shows the change in annual temperature for a range of global warming levels, including the recent past (2001-2020), compared to the 1981-2000 baseline period. Note, as the values in this dataset are averaged over a year, they do not represent possible extreme conditions.

    The dataset uses projections of daily average air temperature from UKCP18, averaged to give values for the 1981-2000 baseline, the recent past (2001-2020) and the global warming levels. The warming levels available are 1.5°C, 2.0°C, 2.5°C, 3.0°C and 4.0°C above the pre-industrial (1850-1900) period. The recent past value and global warming level values are stated as a change (in °C) relative to the 1981-2000 value, enabling users to compare annual average temperature trends across the different periods. In addition to the change values, values for the 1981-2000 baseline (corresponding to 0.51°C warming) and recent past (2001-2020, corresponding to 0.87°C warming) are also provided. This is summarised in the table below.

    Period – Description
    • 1981-2000 baseline – Average temperature (°C) for the period
    • 2001-2020 (recent past) – Average temperature (°C) for the period
    • 2001-2020 (recent past) change – Temperature change (°C) relative to 1981-2000
    • 1.5°C global warming level change – Temperature change (°C) relative to 1981-2000
    • 2°C global warming level change – Temperature change (°C) relative to 1981-2000
    • 2.5°C global warming level change – Temperature change (°C) relative to 1981-2000
    • 3°C global warming level change – Temperature change (°C) relative to 1981-2000
    • 4°C global warming level change – Temperature change (°C) relative to 1981-2000

    What is a global warming level?

    The Annual Average Temperature Change is calculated from the UKCP18 regional climate projections using the high emissions scenario (RCP 8.5), in which greenhouse gas emissions continue to grow. Instead of considering future climate change during specific time periods (e.g. decades) for this scenario, the dataset is calculated at various levels of global warming relative to the pre-industrial (1850-1900) period. The world has already warmed by around 1.1°C (between 1850-1900 and 2011-2020), whilst this dataset allows for the exploration of greater levels of warming. The global warming levels available in this dataset are 1.5°C, 2°C, 2.5°C, 3°C and 4°C.

    The data at each warming level was calculated using a 21-year period. These 21-year periods are found by taking 10 years either side of the first year at which the global warming level is reached; this year differs between model ensemble members. To calculate the value for the Annual Average Temperature Change, an average is taken across the 21-year period.

    We cannot provide a precise likelihood for particular emission scenarios being followed in the real world. However, we note that RCP8.5 corresponds to emissions considerably above those expected with current international policy agreements. The results are also expressed for several global warming levels because we do not yet know which level will be reached in the real climate; this will depend on future greenhouse gas emission choices and the sensitivity of the climate system, which is uncertain. Estimates based on current international agreements on greenhouse gas emissions suggest a median warming level in the region of 2.4-2.8°C, but it could be higher or lower than this level.

    What are the naming conventions and how do I explore the data?

    This data contains a field for the 1981-2000 baseline, the 2001-2020 period and each warming level. Fields are named 'tas annual change' (change in air 'temperature at surface'), then the warming level or historic time period, then 'upper', 'median' or 'lower' as described below, e.g. 'tas annual change 2.0 median' is the median value for the 2.0°C warming level. Decimal points are included in field aliases but not in field names, e.g. 'tas annual change 2.0 median' is named 'tas_annual_change_20_median'. To understand how to explore the data, refer to the New Users ESRI Storymap. Please note, if viewing in ArcGIS Map Viewer, the map will default to ‘tas annual change 2.0°C median’ values.

    What do the 'median', 'upper', and 'lower' values mean?

    Climate models are numerical representations of the climate system. To capture uncertainty in projections for the future, an ensemble, or group, of climate models is run, each member with slightly different starting conditions or model set-ups. Considering all of the model outcomes gives users a range of plausible conditions which could occur in the future. For this dataset, the model projections consist of 12 separate ensemble members. To select which ensemble members to use, the Annual Average Temperature Change was calculated for each ensemble member and the members were then ranked in order from lowest to highest for each location. The ‘lower’ fields are the second lowest ranked ensemble member, the ‘upper’ fields are the second highest ranked ensemble member, and the ‘median’ field is the central value of the ensemble. The spread of the ensemble members indicates the range of possible outcomes in the projections and can be used to infer their uncertainty: the larger the difference between the lower and upper fields, the greater the uncertainty. ‘Lower’, ‘median’ and ‘upper’ are also given for the baseline period, as these values also come from the model used to produce the projections; this allows a fair comparison between the model projections and the recent past.

    Useful links

    For further information on the UK Climate Projections (UKCP). Further information on understanding climate data within the Met Office Climate Data Portal.

  4. US Average, Maximum, and Minimum Temperatures

    • kaggle.com
    zip
    Updated Jan 18, 2023
    Cite
    The Devastator (2023). US Average, Maximum, and Minimum Temperatures [Dataset]. https://www.kaggle.com/datasets/thedevastator/2015-us-average-maximum-and-minimum-temperatures
    Available download formats: zip (9429155 bytes)
    Dataset updated
    Jan 18, 2023
    Authors
    The Devastator
    Area covered
    United States
    Description

    US Average, Maximum, and Minimum Temperatures

    Analyzing Daily Temperatures Across the USA

    By Matthew Winter [source]

    About this dataset

    This dataset features daily temperature summaries from various weather stations across the United States, including location, average temperature, maximum temperature, minimum temperature, state name, state code, and zip code. All values equal to -999 have been filtered out. With this data you can explore how climate conditions changed throughout the year and how they varied across different regions of the country, or narrow your studies to a specific region or city.


    How to use the dataset

    This dataset offers a detailed look at daily average, minimum, and maximum temperatures across the United States, with records from 1120 weather stations providing a comprehensive view of temperature trends throughout the year.

    The data contains a variety of columns including station, station name, location (latitude and longitude), state name, zip code, and date. The primary focus is the AvgTemp, MaxTemp, and MinTemp columns, which provide daily average, maximum, and minimum temperature records respectively, in degrees Fahrenheit.

    To use this dataset effectively, consider multiple views before undertaking analysis or drawing conclusions:
    - Plot each station's records over time as a line graph, one line per station. This helps identify outliers that need further examination and reveals variance between points that is hard to see when all points are plotted on one graph.
    - Compare states with grouped bar charts showing average, maximum, and minimum temperatures per state for a given period. For example, you could quickly see whether California had an abnormally high temperature increase in July compared with other US states, without manually calculating figures from the raw data.

    Beyond these initial approaches, further visualizations are possible: correlations between particular geographic areas and climatic conditions, or population analysis such as comparing areas warmer or colder than the median against relative population density. Combining this data with metrics collected over multiple years, rather than this single year, allows wider inferences to be drawn.
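The state-level comparison described above can be sketched with a pandas group-by (column names follow the description; the rows are invented for illustration):

```python
import pandas as pd

# Invented rows mirroring the documented columns
df = pd.DataFrame({
    "StateName": ["California", "California", "Texas", "Texas"],
    "Date": pd.to_datetime(["2015-07-01", "2015-07-02", "2015-07-01", "2015-07-02"]),
    "AvgTemp": [75.0, 77.0, 88.0, 90.0],
    "MaxTemp": [85.0, 88.0, 99.0, 101.0],
    "MinTemp": [65.0, 66.0, 77.0, 79.0],
})

# Per-state means of the Avg/Max/Min columns for the selected period
state_summary = df.groupby("StateName")[["AvgTemp", "MaxTemp", "MinTemp"]].mean()
```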

    Research Ideas

    • Using the Latitude and Longitude values, this dataset can be used to create a map of average temperatures across the USA. This would be useful for seeing which areas were consistently hotter or colder than others throughout the year.
    • Using the AvgTemp and StateName columns, regression models could predict what temperature an area will have in a given month based on its average temperature.
    • By plotting the Date column alongside MaxTemp or MinTemp values, visualizations such as timelines can show how temperatures changed during different times of year across various states in the US.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: 2015 USA Weather Data FINAL.csv

    Acknowledgements

    If you use this dataset in your research, please credit Matthew Winter.

  5. Existing own homes; average purchase prices, region

    • data.overheid.nl
    • cbs.nl
    atom, json
    Updated Feb 17, 2025
    Cite
    Centraal Bureau voor de Statistiek (Rijk) (2025). Existing own homes; average purchase prices, region [Dataset]. https://data.overheid.nl/dataset/4146-existing-own-homes--average-purchase-prices--region
    Available download formats: json, atom
    Dataset updated
    Feb 17, 2025
    Dataset provided by
    Statistics Netherlands
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This table shows the average purchase price paid in the reporting period for existing own homes purchased by a private individual. The average purchase price of existing own homes may differ from the price index of existing own homes, and it is not an indicator of price developments of owner-occupied residential property: it reflects the average price of the dwellings sold in a particular period, without accounting for the fact that the mix of dwellings sold differs from one period to another. The following example illustrates the problem caused by the continually changing composition of the dwellings sold. Suppose in February of a particular year mainly big houses with extensive gardens, beautifully situated alongside canals, are sold, whereas in March many small terraced houses are sold. The average purchase price in February will then be higher than in March, but this does not mean that house prices have increased. See note 3 for a link to the article 'Why the average purchase price is not an indicator'.

    Data available from: 1995

    Status of the figures: The figures in this table are immediately definitive. The calculation of these figures is based on the number of notary transactions that are registered every month by the Dutch Land Registry Office (Kadaster). A revision of the figures is exceptional and occurs specifically if an error significantly exceeds the acceptable statistical margins. The average purchasing prices of existing owner-occupied sold homes can be calculated by Kadaster at a later date. These figures are usually the same as the publication on Statline, but in some periods they differ. Kadaster calculates the average purchasing prices based on the most recent data. These may have changed since the first publication. Statistics Netherlands uses figures from the first publication in accordance with the revision policy described above.

    Changes as of 17 February 2025: Added average purchase prices of the municipalities for the year 2024.

    When will new figures be published? New figures are published approximately one to three months after the period under review.

  6. Earnings by Workplace, Borough - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Jun 9, 2025
    Cite
    ckan.publishing.service.gov.uk (2025). Earnings by Workplace, Borough - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/earnings-by-workplace-borough
    Explore at:
    Dataset updated
    Jun 9, 2025
    Dataset provided by
    CKAN (https://ckan.org/)
    Description

    This dataset provides information about earnings of employees working in an area, who are on adult rates and whose pay for the survey pay-period was not affected by absence. Tables provided here include total gross weekly earnings and full-time weekly earnings with breakdowns by gender, plus annual median, mean and lower-quartile earnings by borough and UK region. These are provided in both nominal and real terms. Real earnings figures are on sheets labelled "real", are in 2016 prices, and are calculated by applying ONS's annual CPI index series for April to ASHE data.

    The Annual Survey of Hours and Earnings (ASHE) is based on a sample of employee jobs taken from HM Revenue & Customs PAYE records. Information on earnings and hours is obtained in confidence from employers. ASHE covers neither the self-employed nor employees not paid during the reference period. The earnings information presented relates to gross pay before tax, National Insurance or other deductions, and excludes payments in kind.

    The confidence figure is the coefficient of variation (CV) of the estimate: the ratio of the standard error of an estimate to the estimate itself, expressed as a percentage. The smaller the coefficient of variation, the greater the accuracy of the estimate. The true value is likely to lie within +/- twice the CV.

    Results for 2003 and earlier exclude supplementary surveys. In 2006 a number of methodological changes were made; for further details go to http://www.nomisweb.co.uk/articles/341.aspx.

    The headline statistics for ASHE are based on the median rather than the mean. The median is the value below which 50 per cent of employees fall. It is ONS's preferred measure of average earnings as it is less affected by a relatively small number of very high earners and the skewed distribution of earnings, and therefore gives a better indication of typical pay than the mean.

    Notes:
    • This is survey data from a sample frame; use caution if using it for performance measurement and trend analysis.
    • '#' – These figures are suppressed as statistically unreliable.
    • '!' – Estimate and confidence interval not available since the group sample size is zero or disclosive (0-2).

    Furthermore, data from the Abstract of Regional Statistics, the New Earnings Survey and ASHE have been combined to create a long-run historical series of full-time weekly earnings for London and Great Britain, stretching back to 1965, broken down by sex.
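The CV arithmetic described above can be sketched with hypothetical numbers (not taken from ASHE itself):

```python
# Hypothetical estimate and standard error, for illustration only
estimate = 520.0        # e.g. a median weekly earnings estimate, in pounds
standard_error = 13.0

cv = standard_error / estimate * 100  # coefficient of variation, in percent
# The true value is likely to lie within +/- twice the CV of the estimate:
lower = estimate * (1 - 2 * cv / 100)
upper = estimate * (1 + 2 * cv / 100)
```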

  7. Head Hunting

    • kaggle.com
    zip
    Updated Nov 8, 2023
    Cite
    Mariyam Al Shatta (2023). Head Hunting [Dataset]. https://www.kaggle.com/datasets/mariyamalshatta/head-hunting/code
    Available download formats: zip (1515 bytes)
    Dataset updated
    Nov 8, 2023
    Authors
    Mariyam Al Shatta
    License

    Public Domain (CC0 1.0): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Business Context

    A research institute conducts a Talent Hunt Examination every year to hire people who can work on various research projects in the field of Mathematics and Computer Science. A2Z institute provides a preparatory program to help the aspirants prepare for the Talent Hunt Exam. The institute has a good record of helping many students clear the exam. Before the application for the next batch starts, the institute wants to attract more aspirants to their program. For this, the institute wants to assure the aspiring students of the quality of results obtained by students enrolled in their program in recent years.

    However, one challenge in estimating an average score is that the exam's difficulty level varies a little every year, and the distribution of scores changes accordingly. The institute keeps track of the final scores of its alumni who previously attempted the exam, and has prepared a dataset consisting of a simple random sample of the final scores of 600 aspirants from the last three years.

    Objective

    The institute wants to provide an estimate of the average score obtained by aspirants who enroll in their program. Keeping in mind the variation in scores every year, the institute wants to provide a more reliable estimate of the average score using a range of scores instead of a single estimate. It is known from previous records that the standard deviation of the scores is 10 and the cut-off score in the most recent year was 84.

    A recent social media post from A2Z institute received feedback from a reputed critic, mentioning that the students from A2Z institute score less than last year's cut-off on average. The institute wants to test if the claim by the critic is valid.

    Solution Approach

    To provide a more reliable estimate of the average score using a range of scores instead of a single value, we will construct a 95% confidence interval for the mean score of aspirants enrolled in the institute's program. To test the validity of the critic's claim (that the mean score of students from the A2Z institute is less than last year's cut-off score of 84), we will perform a hypothesis test, taking alpha = 5%.
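Since the population standard deviation is stated to be known (10), both steps can be sketched with a z-interval and a one-sided z-test. The scores below are a synthetic placeholder for Talent_hunt.csv, with a true mean of 86 assumed purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Placeholder sample standing in for Talent_hunt.csv (600 scores; assumed mean 86)
scores = rng.normal(86, 10, size=600)

sigma, n = 10, len(scores)
mean = scores.mean()

# 95% confidence interval for the mean with sigma known (z-interval)
z = stats.norm.ppf(0.975)
ci = (mean - z * sigma / np.sqrt(n), mean + z * sigma / np.sqrt(n))

# One-sided z-test of H0: mu >= 84 against H1: mu < 84 (the critic's claim)
z_stat = (mean - 84) / (sigma / np.sqrt(n))
p_value = stats.norm.cdf(z_stat)  # left-tail p-value
reject_h0 = p_value < 0.05
```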

    Data

    The dataset provided (Talent_hunt.csv) contains the final scores of 600 aspirants enrolled in the institute’s program in the last three years.

  8. Income Distribution by Quintile: Mean Household Income in Chula Vista, CA //...

    • neilsberg.com
    csv, json
    Updated Mar 3, 2025
    Cite
    Neilsberg Research (2025). Income Distribution by Quintile: Mean Household Income in Chula Vista, CA // 2025 Edition [Dataset]. https://www.neilsberg.com/insights/chula-vista-ca-median-household-income/
    Available download formats: csv, json
    Dataset updated
    Mar 3, 2025
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Chula Vista, California
    Variables measured
    Income Level, Mean Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. It delineates income distributions across income quintiles (mentioned above) following an initial analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series via current methods (R-CPI-U-RS). For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents the mean household income for each of the five quintiles in Chula Vista, CA, as reported by the U.S. Census Bureau. The dataset highlights the variation in mean household income across quintiles, offering valuable insights into income distribution and inequality.

    Key observations

    • Income disparities: The mean income of the lowest quintile (the 20% of households with the lowest income) is $22,855, while the mean income of the highest quintile (the 20% of households with the highest income) is $285,394. The top earners thus earn about 12 times as much as the lowest earners.
    • Top 5%: The mean household income of the wealthiest 5% of households is $453,494, which is 158.90% of the highest quintile's mean and 1984.22% of the lowest quintile's mean.
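The ratios in the observations above can be checked directly from the quoted means:

```python
# Figures quoted in the observations above
lowest, highest, top5 = 22_855, 285_394, 453_494

ratio_high_low  = highest / lowest       # roughly 12.5: the "12 times" figure
top5_vs_highest = top5 / highest * 100   # roughly 158.9% of the highest quintile mean
top5_vs_lowest  = top5 / lowest * 100    # roughly 1984.2% of the lowest quintile mean
```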
    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

    Income Levels:

    • Lowest Quintile
    • Second Quintile
    • Third Quintile
    • Fourth Quintile
    • Highest Quintile
    • Top 5 Percent

    Variables / Data Columns

    • Income Level: This column showcases the income levels (As mentioned above).
    • Mean Household Income: Mean household income, in 2023 inflation-adjusted dollars for the specific income level.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for a research project, report, or presentation, contact our research staff at research@neilsberg.com about the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Chula Vista median household income. You can refer to it here.

  9. Median Household Income Variation by Family Size in Texas City, TX:...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    Cite
    Neilsberg Research (2024). Median Household Income Variation by Family Size in Texas City, TX: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1b81907b-73fd-11ee-949f-3860777c1fe6/
    Available download formats: json, csv
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Texas City, Texas
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in Texas City, TX, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1-person to 7-or-more-person households) reported by the Census Bureau, Texas City did not report 6- or 7-person households. Across the household sizes reported for Texas City, the mean income is $65,952 and the standard deviation is $20,348. The coefficient of variation (CV) is 30.85%. This high CV indicates high relative variability, meaning incomes vary significantly across household sizes.
    • In the most recent year, 2021, the smallest household size for which the bureau reported a median household income was 1-person households, with an income of $31,474. Income increased with household size, reaching $65,395 for 5-person households, the largest size for which a median was reported.

    [Chart: Texas City, TX median household income, by household size (in 2022 inflation-adjusted dollars)]

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column lists the 7 household sizes, ranging from 1-person to 7-or-more-person households (as listed above).
    • Median Household Income: Median household income for the specific household size, in 2022 inflation-adjusted dollars.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for Texas City median household income. You can refer to it here.

  10. Median Household Income Variation by Family Size in San Diego County, CA:...

    • neilsberg.com
    csv, json
    Updated Jan 11, 2024
    Cite
    Neilsberg Research (2024). Median Household Income Variation by Family Size in San Diego County, CA: Comparative analysis across 7 household sizes [Dataset]. https://www.neilsberg.com/research/datasets/1b68c952-73fd-11ee-949f-3860777c1fe6/
    Explore at:
    csv, jsonAvailable download formats
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    San Diego County, California
    Variables measured
    Household size, Median Household Income
    Measurement technique
    The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. It delineates income distributions across 7 household sizes (mentioned above) following an initial analysis and categorization. Using this dataset, you can find out how household income varies with the size of the family unit. For additional information about these estimations, please contact us via email at research@neilsberg.com
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset presents median household incomes for various household sizes in San Diego County, CA, as reported by the U.S. Census Bureau. The dataset highlights the variation in median household income with the size of the family unit, offering valuable insights into economic trends and disparities within different household sizes, aiding in data analysis and decision-making.

    Key observations

    • Of the 7 household sizes (1-person to 7-or-more-person households) reported by the Census Bureau, all were found in San Diego County. Across the household sizes in San Diego County, the mean income is $114,248 and the standard deviation is $29,606. The coefficient of variation (CV) is 25.91%. This high CV indicates high relative variability, meaning incomes vary significantly across household sizes.
    • In the most recent year, 2021, the smallest household size for which the bureau reported a median household income was 1-person households, with an income of $52,320. Income increased with household size, reaching $136,905 for 7-person households, the largest size for which a median was reported.

    [Chart: San Diego County, CA median household income, by household size (in 2022 inflation-adjusted dollars)]

    Content

    When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

    Household Sizes:

    • 1-person households
    • 2-person households
    • 3-person households
    • 4-person households
    • 5-person households
    • 6-person households
    • 7-or-more-person households

    Variables / Data Columns

    • Household Size: This column lists the 7 household sizes, ranging from 1-person to 7-or-more-person households (as listed above).
    • Median Household Income: Median household income for the specific household size, in 2022 inflation-adjusted dollars.

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Recommended for further research

    This dataset is part of the main dataset for San Diego County median household income. You can refer to it here.

  11. Global Land and Surface Temperature Trends

    • kaggle.com
    zip
    Updated Jan 11, 2023
    Cite
    The Devastator (2023). Global Land and Surface Temperature Trends [Dataset]. https://www.kaggle.com/datasets/thedevastator/global-land-and-surface-temperature-trends-analy
    Explore at:
    zip(16000936 bytes)Available download formats
    Dataset updated
    Jan 11, 2023
    Authors
    The Devastator
    Description

    Global Land and Surface Temperature Trends Analysis

    Assessing climate change year by year

    By IBM Watson AI XPRIZE - Environment [source]

    About this dataset

    This dataset from Kaggle contains global land and surface temperature data from major cities around the world. By relying on the raw temperature reports that form the foundation of the averaging system, researchers can accurately track climate change over time. With this dataset, we can observe monthly averages and create detailed gridded temperature fields to analyze localized data on a country-by-country basis. The information in this dataset supports a better understanding of our changing planet and of how certain regions are being impacted more than others by climate change. With such insights, we can work towards developing better responses and strategies as temperatures continue to increase over time.


    How to use the dataset

    Introduction

    This guide will show you how to use this dataset to explore global climate change trends over time.

    Exploring the Dataset

    • Select one or more countries with df[df['Country'] == 'countryname'] to filter out rows unrelated to those countries.

    • Use df.groupby('City')['AverageTemperature'] to group all cities together with their respective average temperatures.

    • Compute basic summary statistics for each group, e.g. df['AverageTemperature'].mean() or df['AverageTemperature'].median(), according to your requirements.

    • Plot the results as line plots or bar charts with the pandas plot function, e.g. df[column].plot(kind='line') or df[column].plot(kind='bar'), to help visualize trends in these groups.

    You can also use the latitude/longitude coordinates provided with every record to further decompose records by location using the folium library in Python. Folium provides zoomable maps and many rendering options, such as mapping locations with different color shades and marker sizes based on chosen parameters. These are just some ways you could visualize your data; there are plenty more possibilities!
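
    To make the filter/group/summarize steps above concrete, here is a minimal pandas sketch run against a tiny invented DataFrame (the values are illustrative, not real measurements; the column names mirror the schema described in this listing):

```python
import pandas as pd

# Tiny synthetic stand-in for the temperature data (invented values);
# columns mirror dt / AverageTemperature / City / Country.
df = pd.DataFrame({
    "dt": pd.to_datetime(["2000-01-01", "2000-02-01", "2000-01-01", "2000-02-01"]),
    "AverageTemperature": [5.0, 6.2, 25.1, 26.3],
    "City": ["Berlin", "Berlin", "Lagos", "Lagos"],
    "Country": ["Germany", "Germany", "Nigeria", "Nigeria"],
})

# Step 1: filter to one country
germany = df[df["Country"] == "Germany"]

# Steps 2-3: group by city and summarize
city_means = df.groupby("City")["AverageTemperature"].mean()

# Step 4: plot (commented out so the sketch runs headless)
# city_means.plot(kind="bar")

print(city_means)
```

    From here, per-record latitude/longitude coordinates (where available) can be fed to folium markers for map-based views.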

    Research Ideas

    • Analyzing temperature changes across different countries to identify regional climate trends and abnormalities.
    • Investigating how global warming is affecting urban areas by looking at the average temperatures of major cities over time.
    • Comparing historic average temperatures for a given region to current day average temperatures to quantify the magnitude of global warming in that region.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Dataset copyright by authors.

    You are free to:
    • Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    • Give appropriate credit - provide a link to the license, and indicate if changes were made.
    • ShareAlike - distribute your contributions under the same license as the original.
    • Keep intact - preserve all notices that refer to this license, including copyright notices.

    Columns

    File: GlobalLandTemperaturesByCountry.csv

    | Column name | Description |
    |:----------------------------------|:--------------------------------------------------------------|
    | dt | Date of the temperature measurement. (Date) |
    | AverageTemperature | Average temperature for the given date. (Float) |
    | AverageTemperatureUncertainty | Uncertainty of the average temperature measurement. (Float) |
    | Country | Country where the temperature measurement was taken. (String) |

    File: GlobalLandTemperaturesByMajorCity.csv

    | Column name | Description |
    |:----------------------------------|:-----------------------------------------------------------------------|
    | dt | Date...

  12. IPPS DRG Provider Summary

    • kaggle.com
    zip
    Updated Jan 23, 2023
    Cite
    The Devastator (2023). IPPS DRG Provider Summary [Dataset]. https://www.kaggle.com/datasets/thedevastator/ipps-drg-provider-summary
    Explore at:
    zip(8432015 bytes)Available download formats
    Dataset updated
    Jan 23, 2023
    Authors
    The Devastator
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    IPPS DRG Provider Summary

    Average Discharges, Charges, and Medicare Payments

    By Health [source]

    About this dataset

    This dataset is a valuable resource for gaining insight into Inpatient Prospective Payment System (IPPS) utilization, average charges, and average Medicare payments across the top 100 Diagnosis-Related Groups (DRG). With columns such as DRG Definition, Hospital Referral Region Description, Total Discharges, Average Covered Charges, Average Medicare Payments, and Average Medicare Payments 2, this dataset enables researchers to discover and assess healthcare trends, such as comparing provider payments by geographic location or service costs across hospitals. Visualize the data using various methods to uncover unique information and drive further hospital research.


    How to use the dataset

    This dataset provides a provider level summary of Inpatient Prospective Payment System (IPPS) discharges, average charges and average Medicare payments for the Top 100 Diagnosis-Related Groups (DRG). This data can be used to analyze cost and utilization trends across hospital DRGs.

    To make the most of this dataset, here are some steps to consider:

    • Understand what each column means in the table: Each column provides different information from the DRG Definition to Hospital Referral Region Description and Average Medicare Payments.
    • Analyze the data by looking for patterns amongst the relevant columns: Compare different aspects such as total discharges or average Medicare payments by hospital referral region or DRG Definition. This can help identify any potential trends amongst different categories within your analysis.
    • Generate visualizations: Create charts, graphs, or maps that display your data in an easy-to-understand format using tools such as Microsoft Excel or Tableau. Such visuals may reveal more insights into patterns within your data than simply reading numerical values on a spreadsheet could provide alone.
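
    The pattern-analysis step above can be sketched in pandas. The rows below are invented stand-ins (the figures are illustrative only; the column names follow the dataset's file schema):

```python
import pandas as pd

# Invented example rows; column names mirror the IPPS provider summary file.
df = pd.DataFrame({
    "drg_definition": ["039 - EXTRACRANIAL PROCEDURES", "039 - EXTRACRANIAL PROCEDURES",
                       "057 - DEGENERATIVE NERVOUS SYSTEM DISORDERS",
                       "057 - DEGENERATIVE NERVOUS SYSTEM DISORDERS"],
    "hospital_referral_region_description": ["AL - Birmingham", "TX - Dallas",
                                             "AL - Birmingham", "TX - Dallas"],
    "total_discharges": [91, 14, 38, 25],
    "average_covered_charges": [32963.07, 20312.77, 30164.09, 17091.44],
    "average_medicare_payments": [5777.24, 4894.76, 5086.20, 4327.36],
})

# How much hospitals charge relative to what Medicare actually pays
df["charge_to_payment"] = df["average_covered_charges"] / df["average_medicare_payments"]

# Average markup per DRG across regions, highest first
markup_by_drg = (df.groupby("drg_definition")["charge_to_payment"]
                   .mean()
                   .sort_values(ascending=False))
print(markup_by_drg)
```

    The same groupby can be keyed on hospital_referral_region_description instead to compare regions rather than DRGs.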

    Research Ideas

    • Identifying potential areas of cost savings by drilling down to particular DRGs and hospital regions with the highest average covered charges compared to average Medicare payments.
    • Establishing benchmarks for typical charges and payments across different DRGs and hospital regions to help providers set market-appropriate prices.
    • Analyzing trends in total discharges, charges, and Medicare payments over time, allowing healthcare organizations to measure their performance against regional peers.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: Open Database License (ODbL) v1.0

    You are free to:
    • Share - copy and redistribute the material in any medium or format.
    • Adapt - remix, transform, and build upon the material for any purpose, even commercially.

    You must:
    • Give appropriate credit - provide a link to the license, and indicate if changes were made.
    • ShareAlike - distribute your contributions under the same license as the original.
    • Keep intact - preserve all notices that refer to this license, including copyright notices.
    • No Derivatives - if you remix, transform, or build upon the material, you may not distribute the modified material.
    • No additional restrictions - you may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

    Columns

    File: 97k6-zzx3.csv

    | Column name | Description |
    |:-----------------------------------------|:------------------------------------------------------|
    | drg_definition | Diagnosis-Related Group (DRG) definition. (String) |
    | average_medicare_payments | Average Medicare payments for each DRG. (Numeric) |
    | hospital_referral_region_description | Description of the hospital referral region. (String) |
    | total_discharges | Total number of discharges for each DRG. (Numeric) |
    | average_covered_charges | Average covered charges for each DRG. (Numeric) |
    | average_medicare_payments_2 | Average Medicare payments for each DRG. (Numeric) |

    **File: Inpatient_Prospective_Payment_System_IPPS_Provider_Summary_for_the_Top_100_Diagnosis-Related_Groups_DRG...

  13. Medical Trial Dataset

    • kaggle.com
    zip
    Updated Sep 16, 2024
    Cite
    Ta-wei Lo (2024). Medical Trial Dataset [Dataset]. https://www.kaggle.com/datasets/taweilo/medical-trial-dataset
    Explore at:
    zip(18019 bytes)Available download formats
    Dataset updated
    Sep 16, 2024
    Authors
    Ta-wei Lo
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Medical Trial Case Study

    1) Background Information

    XYZ client is currently using Medication A for all their patients and is considering switching to Medication B. An essential aspect of evaluating Medication B is determining the anticipated usage in XYZ's patients. A trial was conducted to assess Medication B's effectiveness, and data for approximately 130 patients has been collected. This data includes information from at least 2 months prior to switching medications and up to 3 months after switching to Medication B.

    Key considerations: - Patients can be on either Medication A or Medication B, but not both simultaneously. - Medication B is administered less frequently (~1 time per month) than Medication A. - The units for Medication A and Medication B are different and cannot be converted between each other. - Time on Medication A is defined as the period between the first and last recorded administration of Medication A. - A week is defined as 7 days, and a month is assumed to be 4.33 weeks.

    2) Metadata of the File

    The data file contains the following information:

    Admin:
    - ID: patient ID
    - Med: Med type
    - Admin Date: dates of administration
    - Units: dosage units administered for each medication

    Labs:
    - ID: patient ID
    - DRAW_DATE: draw date
    - LAB_RESULT_CODE: different types of lab tests
    - LAB_VALUE: lab values

    3) Business Goal

    The main objective is to evaluate the potential adoption of Medication B by XYZ's patients. Specifically, the goal is to analyze the usage patterns, switching trends, and dosing behavior to make informed decisions regarding the transition from Medication A to Medication B. Additionally, the cost-effectiveness of Medication B compared to Medication A will be assessed.

    4) Possible Questions and Analysis

    1. Total Monthly Medication Usage: What is the total number of units administered for each medication in each month across all patients?

    2. Patient Counts on Each Medication: How many patients received Medication A and Medication B from July to November?

    3. Average Monthly Dose per Patient: What is the average total monthly dose per patient for each medication from July to November?

    4. Switching Analysis: How many patients switched from Medication A to Medication B each month (September, October, November)? How many patients started on Medication B without being on Medication A in the past?

    5. Time on Medication A Before Switch: For patients who switched to Medication B, what is the average number of weeks spent on Medication A before switching?

    6. Dose Comparison Before and After Switch: What is the average monthly dose of Medication A for patients before switching to Medication B? What is the average monthly dose of Medication B post-switch?

    7. Breakeven Analysis: If Medication A costs $1 for 100 units, what is the breakeven price point for Medication B on a per-unit basis?

    8. Dose Change Over Time: How does the average total monthly dose per patient (for both Medication A and B) change for patients switched in September vs. October vs. November?

    9. Second Dose Analysis: For patients switched to Medication B, what percentage of second Medication B doses are the same, higher, lower, or zero compared to the first dose?

    10. Lab Value Comparison: For patients that switched from Medication A to B, what was the average LAB B value while on Medication A compared to while on Medication B?

    This structured approach will help identify the key metrics necessary to decide whether Medication B is a suitable replacement for Medication A across XYZ’s patients.
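
    Several of the questions above reduce to straightforward groupbys on the Admin table. Here is a minimal sketch with invented rows (the IDs, dates, and units are made up; the column names follow the metadata section):

```python
import pandas as pd

# Invented Admin rows; columns mirror ID / Med / Admin Date / Units.
admin = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Med": ["A", "A", "B", "A", "A"],
    "Admin_Date": pd.to_datetime(
        ["2024-07-03", "2024-08-07", "2024-09-10", "2024-07-15", "2024-08-20"]),
    "Units": [100, 100, 5, 120, 110],
})

# Q1: total units administered per medication per month
monthly = (admin
           .assign(month=admin["Admin_Date"].dt.to_period("M"))
           .groupby(["month", "Med"])["Units"].sum())

# Q4: switchers are patients observed on both medications
meds_per_patient = admin.groupby("ID")["Med"].agg(set)
switchers = meds_per_patient[meds_per_patient.apply(lambda s: s == {"A", "B"})].index.tolist()
print(monthly)
print(switchers)  # → [1]
```

    Time on Medication A (first to last recorded A administration) and dose comparisons follow the same shape, grouping on ID and aggregating over Admin Date and Units.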

    Feel free to leave comments on the discussion. I'd appreciate your upvote if you find my dataset useful! 😀

  14. Comprehensive Median Household Income and Distribution Dataset for Franklin...

    • neilsberg.com
    Updated Jan 11, 2024
    Cite
    Neilsberg Research (2024). Comprehensive Median Household Income and Distribution Dataset for Franklin County, OH: Analysis by Household Type, Size and Income Brackets [Dataset]. https://www.neilsberg.com/research/datasets/cd9bcce9-b041-11ee-aaca-3860777c1fe6/
    Explore at:
    Dataset updated
    Jan 11, 2024
    Dataset authored and provided by
    Neilsberg Research
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Franklin County, Ohio
    Dataset funded by
    Neilsberg Research
    Description
    About this dataset

    Context

    The dataset tabulates the median household income in Franklin County. It can be utilized to understand the trend in median household income and to analyze the income distribution in Franklin County by household type, size, and across various income brackets.

    Content

    When applicable, the dataset includes the following sub-datasets:

    Please note: The 2020 1-Year ACS estimates data was not reported by the Census Bureau due to the impact on survey collection and analysis caused by COVID-19. Consequently, median household income data for 2020 is unavailable for large cities (population 65,000 and above).

    • Franklin County, OH Median Household Income Trends (2010-2021, in 2022 inflation-adjusted dollars)
    • Median Household Income Variation by Family Size in Franklin County, OH: Comparative analysis across 7 household sizes
    • Income Distribution by Quintile: Mean Household Income in Franklin County, OH
    • Franklin County, OH households by income brackets: family, non-family, and total, in 2022 inflation-adjusted dollars

    Good to know

    Margin of Error

    Data in the dataset are based on estimates and are subject to sampling variability, and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.

    Custom data

    If you need custom data for your research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.

    Inspiration

    The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research's aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.

    Interested in deeper insights and visual analysis?

    Explore our comprehensive data analysis and visual representations for a deeper understanding of Franklin County median household income. You can refer to it here.

  15. Global Average Human Height by Age and Country

    • kaggle.com
    zip
    Updated Feb 15, 2025
    Cite
    Abdul Rahman Nazir Ahmed (2025). Global Average Human Height by Age and Country [Dataset]. https://www.kaggle.com/datasets/quantumgoat/global-average-human-height-by-age-and-country
    Explore at:
    zip(5892 bytes)Available download formats
    Dataset updated
    Feb 15, 2025
    Authors
    Abdul Rahman Nazir Ahmed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset provides insights into the average height of boys and girls at different ages (5, 10, 15, and 19) across multiple countries. The data has been sourced from various online sources, including government reports, research studies, and health organizations. It can be useful for analyzing trends in child growth, nutrition, and global health disparities.

    Researchers, data analysts, and policymakers can leverage this dataset to compare growth patterns across countries and explore how factors like nutrition, healthcare, and socio-economic conditions impact height development over time.

  16. Unemployment by Age Groups Dataset

    • kaggle.com
    zip
    Updated Jun 23, 2024
    Cite
    Sahir Maharaj (2024). Unemployment by Age Groups Dataset [Dataset]. https://www.kaggle.com/datasets/sahirmaharajj/unemployment-by-age-groups-dataset
    Explore at:
    zip(3412 bytes)Available download formats
    Dataset updated
    Jun 23, 2024
    Authors
    Sahir Maharaj
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains the non-seasonally adjusted California unemployment rate by age group, from the Current Population Survey (CPS). The age groups are: 16-19; 20-24; 25-34; 35-44; 45-54; 55-64; 65+. The data are based on a 12-month moving average.

    This dataset is invaluable for data science applications due to its granularity and the historical depth it offers. With detailed monthly data on unemployment rates by age groups, data scientists can perform a myriad of analyses:

    • Time Series Analysis: Analyzing trends over time to forecast future unemployment rates or identify cyclic patterns.
    • Demographic Analysis: Understanding which age groups are most affected by unemployment and how this changes through economic conditions.
    • Geographic Analysis: Although the dataset currently focuses on California, if similar data is available for other regions, comparisons can be made to understand regional economic differences.

    The dataset can also be merged with other socioeconomic indicators like GDP, education levels, and industry growth metrics to examine broader economic narratives or policy impacts.
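
    The published rates are already smoothed, but the smoothing itself is easy to reproduce. Here is a sketch of a 12-month moving average over hypothetical raw monthly rates (the series below is randomly generated, not CPS data):

```python
import numpy as np
import pandas as pd

# Hypothetical raw monthly unemployment rates for one age group (random, not CPS data)
rng = np.random.default_rng(0)
raw = pd.Series(
    5 + rng.normal(0, 0.5, 36),
    index=pd.period_range("2021-01", periods=36, freq="M"),
)

# 12-month moving average: each value is the mean of the trailing 12 months,
# so the first 11 months are undefined (NaN)
smoothed = raw.rolling(window=12).mean()
print(smoothed.dropna().head())
```

    Note that a trailing 12-month window dampens seasonality but also lags turning points, which matters when interpreting month-over-month changes in this dataset.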

  17. American Time Use Survey

    • kaggle.com
    zip
    Updated Dec 19, 2023
    Cite
    The Devastator (2023). American Time Use Survey [Dataset]. https://www.kaggle.com/datasets/thedevastator/american-time-use-survey
    Explore at:
    zip(21831 bytes)Available download formats
    Dataset updated
    Dec 19, 2023
    Authors
    The Devastator
    Description

    American Time Use Survey

    Average hours spent on activities in the US over a decade

    By Throwback Thursday [source]

    About this dataset

    How to use the dataset

    1. Familiarize Yourself with the Columns

    To begin, let's familiarize ourselves with the columns in this dataset:

    • Year: The year in which the data was collected.
    • Series: The name of the series, representing a specific category or topic.
    • Sub-Series: Additional details or categorization within the series.
    • Type: Specifies the type of activity being measured.
    • Average Hours: The average number of hours spent on the activity.

    These columns will be key in understanding and analyzing trends and patterns over time.

    2. Focus on Series and Sub-Series

    The 'Series' column represents specific categories or topics, while 'Sub-Series' provides additional details or categorization within those categories. Start by exploring these columns to gain an overview of different activities covered in this survey.

    For example, you can filter by a particular series such as 'Work', then further narrow it down using sub-series like 'Paid Work' or 'Unpaid Work'. This will help you dive deeper into specific areas of interest.

    3. Analyze Types of Activities

    The 'Type' column specifies the type of activity being measured. It allows you to identify different types within each series/sub-series combination.

    Use this information to segment activities based on their nature or characteristics. For instance, within the Leisure series, you may have sub-series like Socializing, Sports, and Entertainment. Analyzing these types individually can provide unique insights into how people spend their leisure time over a decade.

    4. Investigate Average Hours Spent

    The 'Average Hours' column quantifies how much time individuals spent on each specified activity on average. Use this numerical data to identify activities that are more time-consuming compared to others.

    As you explore different series, sub-series, and types of activities, pay attention to any significant changes in the average hours spent over the years. This will allow you to uncover interesting trends and patterns in time use over the decade covered by this dataset.
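
    Steps 1-4 combine naturally in pandas. A minimal sketch over invented rows shaped like the table described above (the values are illustrative, not actual survey figures):

```python
import pandas as pd

# Hypothetical rows mirroring the columns Year / Series / Sub-Series / Type / Average Hours
df = pd.DataFrame({
    "Year": [2010, 2015, 2020, 2010, 2015, 2020],
    "Series": ["Leisure"] * 3 + ["Work"] * 3,
    "Sub-Series": ["Sports"] * 3 + ["Paid Work"] * 3,
    "Type": ["Weekend"] * 6,
    "Average Hours": [0.4, 0.5, 0.7, 4.1, 3.9, 3.5],
})

# Combine Series and Sub-Series filters, then track average hours over the years
sports = df[(df["Series"] == "Leisure") & (df["Sub-Series"] == "Sports")]
trend = sports.set_index("Year")["Average Hours"]
print(trend.diff().dropna())  # change in average hours between survey years
```

    Adding a Type filter to the boolean mask narrows the analysis further, exactly as described in the step above.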

    5. Combine Filters for Deeper Analysis

    To perform more specific analysis, combine multiple filters from different columns simultaneously. For example, you can filter by a particular series like 'Leisure' and then choose a specific sub-series like 'Sports'. Next, further narrow down your analysis by selecting a

    Research Ideas

    • Analyzing trends in time use: Researchers can use this dataset to analyze how the average hours spent on different activities have changed over a decade. They can identify trends and patterns in time allocation, such as changes in leisure activities, work-related tasks, or household chores.
    • Comparing sub-groups: The dataset includes sub-series and types of activities, which allows researchers to compare average hours spent on different activities across various sub-groups of the population. For example, they can analyze if there are any differences between genders in terms of time spent on childcare or leisure activities.
    • Understanding societal shifts: By examining the changes in average hours spent on specific series or sub-series over time, researchers can gain insights into societal shifts and changing priorities. This dataset provides an opportunity to understand how behaviors and attitudes towards different activities may have evolved over a decade

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    See the dataset description for more information.

    Columns

    Column name | Description
    Year | The year in which the data was collected. (Numeric)
    Series | The name of the series, which represents a specific category or topic. (Text)
    Sub-Series | Additional details or categorization within the series. (Text)
    Type | Specifies the type of activity being measured. (Text)
    Average Hours | The average number of hours spent on the activity. (Numeric)

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Throwback Thursday.

  18. HN Ask and Show Posts

    • kaggle.com
    zip
    Updated Oct 14, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Shaun Oilund (2022). HN Ask and Show Posts [Dataset]. https://www.kaggle.com/datasets/shaunoilund/hacker-news-upto-09-26-2016-zero-comments-removed
    Explore at:
    zip(5605639 bytes)Available download formats
    Dataset updated
    Oct 14, 2022
    Authors
    Shaun Oilund
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains data related to Hacker News posts from the 12 months prior to and including September 26, 2016. This dataset has been modified by removing all posts that have zero comments in the num_comments column; no other data has been removed or modified.

    The original dataset is from Hacker News and the original dataset can be found on Kaggle here:

    HN Ask and Show 2016 - Original

    Both datasets have the following columns:

    • title: title of the post (self explanatory)
    • url: the url of the item being linked to
    • num_points: the number of upvotes the post received.
    • num_comments: the number of comments the post received.
    • author: the name of the account that made the post.
    • created_at: the date and time the post was made (the time zone is Eastern Time in the US).

    This dataset was used as part of a project to determine whether 'Ask HN' or 'Show HN' posts receive more comments on average, and whether the time a post is created affects its average number of comments.
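
The project's two questions can be sketched in a few lines of Python; the posts below are invented placeholders that only borrow the dataset's column names.

```python
from collections import defaultdict

# Invented sample posts shaped like the dataset's columns (title,
# num_comments, created_at); the real rows would come from the CSV.
posts = [
    {"title": "Ask HN: How do you learn?", "num_comments": 12, "created_at": "9/26/2016 14:30"},
    {"title": "Show HN: My side project", "num_comments": 4, "created_at": "9/26/2016 14:05"},
    {"title": "Ask HN: Best laptop?", "num_comments": 8, "created_at": "9/25/2016 9:10"},
]

def mean_comments(posts, prefix):
    """Average num_comments over posts whose title starts with `prefix`."""
    matched = [p["num_comments"] for p in posts if p["title"].lower().startswith(prefix)]
    return sum(matched) / len(matched)

ask_avg = mean_comments(posts, "ask hn")    # (12 + 8) / 2
show_avg = mean_comments(posts, "show hn")  # 4 / 1

# Average comments by the hour a post was created.
by_hour = defaultdict(list)
for p in posts:
    hour = p["created_at"].split()[1].split(":")[0]
    by_hour[hour].append(p["num_comments"])
hourly_avg = {h: sum(v) / len(v) for h, v in by_hour.items()}
```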

  19. Cancer County-Level

    • kaggle.com
    zip
    Updated Dec 3, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2022). Cancer County-Level [Dataset]. https://www.kaggle.com/datasets/thedevastator/exploring-county-level-correlations-in-cancer-ra
    Explore at:
    zip(146998 bytes)Available download formats
    Dataset updated
    Dec 3, 2022
    Authors
    The Devastator
    Description

    Exploring County-Level Correlations in Cancer Rates and Trends

    A Multivariate Ordinary Least Squares Regression Model

    By Noah Rippner [source]

    About this dataset

    This dataset offers a unique opportunity to examine patterns and trends in cancer rates in the United States at the individual county level. Drawing on data from cancer.gov and the US Census American Community Survey, it shows how the age-adjusted death rate, average deaths per year, and recent trends vary between counties, alongside other key metrics such as the average annual count and whether each county met the objective of 45.5. Linear regression models built on this data can reveal correlations between variables, helping to explain cancer prevalence across counties over time and to target health initiatives and resources where they are needed.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This Kaggle dataset provides county-level data from the US Census American Community Survey and cancer.gov for exploring correlations between county-level cancer rates, trends, and mortality statistics. It contains records for all U.S. counties covering the age-adjusted death rate, average deaths per year, the recent trend (2) in death rates, the average annual count of cases detected within 5 years, and whether the objective of 45.5 (1) was met in the county associated with each row.

    To use this dataset to its fullest potential, you need to be comfortable with basic descriptive analytics: calculating summary statistics such as the mean and median; summarizing categorical variables with frequency tables; creating data visualizations such as charts and histograms; applying linear regression or other machine learning techniques such as support vector machines (SVMs), random forests, or neural networks; distinguishing supervised from unsupervised learning; reviewing diagnostic tests to evaluate your models; interpreting your findings; forming hypotheses about the patterns your visualizations reveal; and communicating results effectively in presentations or reports. With these skills you can analyze this dataset accurately and effectively.

    Once these concepts are understood, start by importing the data into your tool of choice: Tableau (Public or Desktop), QlikView, the SAS analytical suite, or Python notebooks, loading packages such as scikit-learn if you plan to build predictive models. A brief description of the table's column structure is provided above. With basic SQL you can run simple queries, select subsets of columns under specific conditions, and sort by particular attributes; in Python you can parse, group, and aggregate the data before making predictions, joining tables where necessary. From there, dig into the available features: build correlation and covariance matrices, plot distributions and scatter plots, and use the relationships they reveal to draw informative conclusions about trends for a given metric.
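
As a minimal standard-library sketch of the correlation step mentioned above, here is Pearson's r computed from first principles; the two lists are invented stand-ins for dataset columns.

```python
# Pearson correlation from first principles. The two lists below are invented
# stand-ins for columns such as the age-adjusted death rate and a
# census-derived poverty percentage; real values would come from the dataset.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

death_rate = [150.2, 160.5, 171.3, 180.9]  # hypothetical per-county values
poverty_pct = [10.0, 12.5, 15.0, 17.5]     # hypothetical per-county values
r = pearson(death_rate, poverty_pct)       # close to +1 for this toy data
```

A full correlation matrix is just this function applied to every pair of numeric columns.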

    Research Ideas

    • Building a predictive cancer incidence model based on county-level demographic data to identify high-risk areas and target public health interventions.
    • Analyzing correlations between age-adjusted death rate, average annual count, and recent trends in order to develop more effective policy initiatives for cancer prevention and healthcare access.
    • Utilizing the dataset to construct a machine learning algorithm that can predict county-level mortality rates based on socio-economic factors such as poverty levels and educational attainment rates.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

  20. US Weather History

    • kaggle.com
    zip
    Updated Jan 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    The Devastator (2023). US Weather History [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-weather-history-12-months-of-record-setting-t/code
    Explore at:
    zip(76236 bytes)Available download formats
    Dataset updated
    Jan 18, 2023
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    US Weather History

    Actual and Average Temperatures and Precipitation

    By FiveThirtyEight [source]

    About this dataset

    This dataset contains a collection of weather data from ten major cities across the United States: Los Angeles (KCQT), Charlotte (KCLT), Houston (KHOU), Indianapolis (KIND), Jacksonville (KJAX), Chicago (KMDW), New York City (KNYC), Philadelphia (KPHL), Phoenix (KPHX), and Seattle (KSEA). These records offer insight into the temperatures and climate at these key locations over a period of 12 months. Whether you are an experienced climate researcher or simply interested in world weather trends, this dataset is an invaluable resource.

    More Datasets

    For more datasets, click here.

    Featured Notebooks

    • 🚨 Your notebook can be here! 🚨!

    How to use the dataset

    This dataset contains 12 months of daily weather records for cities across the US, from Los Angeles to New York City. Each record includes average and actual temperatures, as well as precipitation totals and related records.

    Research Ideas

    • Using the data to map out a timeline of high temperature records throughout the US and compare it to predictions of climate scientists on how climate change will affect regional temperatures in a given area.
    • Tracking average and actual precipitation levels over the course of an entire year in various cities around the US in order to develop city-specific estimates for water resource availability in future years.
    • Comparing record temperatures across cities in different regions, determining if there are any correlations between geographical location and temperature extremes, and then extrapolating these findings to better understand local weather patterns on both short-term or long-term scales

    Acknowledgements

    If you use this dataset in your research, please credit the original authors. Data Source

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: KPHL.csv

    Column name | Description
    date | The date of the weather record. (Date)
    actual_mean_temp | The actual mean temperature for the day. (Float)
    actual_min_temp | The actual minimum temperature for the day. (Float)
    actual_max_temp | The actual maximum temperature for the day. (Float)
    average_min_temp | The average minimum temperature for the day. (Float)
    average_max_temp | The average maximum temperature for the day. (Float)
    record_min_temp | The record minimum temperature for the day. (Float)
    record_max_temp | The record maximum temperature for the day. (Float)
    record_min_temp_year | The year in which the record minimum temperature was set. (Integer)
    record_max_temp_year | The year in which the record maximum temperature was set. (Integer)
    actual_precipitation | The actual precipitation for the day. (Float)
    average_precipitation | The average precipitation for the day. (Float)
    record_precipitation | The record precipitation for the day. (Float)

    File: KPHX.csv

    Column name | Description
    date | The date of the weather record. (Date)
    actual_mean_temp | The actual mean temperature for the day. (Float)
    actual_min_temp | The actual minimum temperature for the day. (Float)
    actual_max_temp | The actual maximum temperature for the day. (Float)
    average_min_temp | The average minimum temperature for the day. (Float)
    average_max_temp | The average maximum temperature for the day. (Float)
    record_min_...

Customer Satisfaction Scores and Behavior Data

A dataset containing customer satisfaction scores, demographics, and behavioral data.


Statistical Analyses:

Descriptive Statistics:

Calculate mean, median, mode, standard deviation, and range for key numerical variables (e.g., Satisfaction Score, Age).

Summarize categorical variables (e.g., Gender, Loyalty Level, Purchase History) with frequency distributions and percentages.
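
Both steps can be sketched with the standard library alone; the scores and loyalty labels below are invented stand-ins for the survey columns.

```python
import statistics
from collections import Counter

# Invented Satisfaction_Score values standing in for the survey column.
scores = [7, 8, 6, 9, 7, 5, 8, 7]
score_mean = statistics.mean(scores)      # 7.125
score_median = statistics.median(scores)  # 7.0
score_mode = statistics.mode(scores)      # 7
score_stdev = statistics.stdev(scores)    # sample standard deviation
score_range = max(scores) - min(scores)   # 4

# Frequency distribution and percentages for a categorical column.
loyalty = ["Low", "High", "Medium", "High", "High", "Low"]
freq = Counter(loyalty)                                     # High: 3, Low: 2, Medium: 1
pct = {k: 100 * v / len(loyalty) for k, v in freq.items()}  # High: 50.0, ...
```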

Two-Sample t-Test (Independent t-test):

Compare the mean satisfaction scores between two independent groups (e.g., Group A vs. Group B) to determine if there is a significant difference in their average satisfaction scores.
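
As a hedged sketch, Welch's t statistic can be computed by hand on invented group scores; in practice scipy.stats.ttest_ind(group_a, group_b, equal_var=False) also returns the p-value.

```python
import statistics

# Invented satisfaction scores for Group A and Group B.
group_a = [7, 8, 6, 9, 7]
group_b = [5, 6, 4, 5, 5]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)        # 7.4, 5.0
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)  # 1.3, 0.5

# Welch's t: difference of means over its standard error.
se = (var_a / len(group_a) + var_b / len(group_b)) ** 0.5
t_stat = (mean_a - mean_b) / se  # 2.4 / 0.6 = 4.0 for this toy data
```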

Paired t-Test:

If there are two related measurements (e.g., satisfaction scores before and after a certain event), you can compare the means using a paired t-test.

One-Way ANOVA (Analysis of Variance):

Test if there are significant differences in mean satisfaction scores across more than two groups (e.g., comparing the mean satisfaction score across different Loyalty Levels).
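
The F statistic behind one-way ANOVA can be computed directly; here is a sketch with invented per-level scores (scipy.stats.f_oneway gives the same F plus a p-value).

```python
# Invented satisfaction scores grouped by Loyalty_Level.
groups = {"Low": [4, 5, 6], "Medium": [6, 7, 8], "High": [8, 9, 10]}

values = [v for g in groups.values() for v in g]
grand_mean = sum(values) / len(values)  # 7.0

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1           # 2
df_within = len(values) - len(groups)  # 6
f_stat = (ss_between / df_between) / (ss_within / df_within)  # 12.0 here
```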

Chi-Square Test for Independence:

Examine the relationship between two categorical variables (e.g., Gender vs. Purchase History or Loyalty Level vs. Support Contacted) to determine if there’s a significant association.
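
A by-hand sketch of the chi-square statistic for a 2x2 table of invented counts; scipy.stats.chi2_contingency wraps the same arithmetic and adds the p-value.

```python
# Invented 2x2 contingency table, e.g. Gender (rows) vs Purchase_History (cols).
obs = [[20, 10],
       [15, 15]]

row_totals = [sum(row) for row in obs]
col_totals = [sum(col) for col in zip(*obs)]
total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells, where
# expected = row_total * col_total / grand_total.
chi2 = sum(
    (obs[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(len(obs))
    for j in range(len(obs[0]))
)
```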

Mann-Whitney U Test:

For non-normally distributed data, use this test to compare satisfaction scores between two independent groups (e.g., Group A vs. Group B) to see if their distributions differ significantly.
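
The U statistic is simple to compute when there are no tied scores; this sketch uses invented groups (scipy.stats.mannwhitneyu handles ties and p-values in practice).

```python
# Invented satisfaction scores; this sketch assumes no tied values.
group_a = [7, 9, 8]
group_b = [4, 5, 6]

# Rank every observation in the pooled sample (1 = smallest).
ranks = {v: i + 1 for i, v in enumerate(sorted(group_a + group_b))}

r_a = sum(ranks[v] for v in group_a)               # rank sum for group A: 15
u_a = r_a - len(group_a) * (len(group_a) + 1) / 2  # U for group A: 9
u_b = len(group_a) * len(group_b) - u_a            # U for group B: 0
u_stat = min(u_a, u_b)
```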

Kruskal-Wallis Test:

Similar to ANOVA, but used for non-normally distributed data. This test can compare the median satisfaction scores across multiple groups (e.g., comparing satisfaction scores across Loyalty Levels or Satisfaction Factors).

Spearman’s Rank Correlation:

Test for a monotonic relationship between two ordinal or continuous variables (e.g., Age vs. Satisfaction Score or Satisfaction Score vs. Loyalty Level).
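
With ranks in hand and no ties, Spearman's rho reduces to a one-line formula; this sketch uses invented ranks (scipy.stats.spearmanr computes ranks and handles ties for you).

```python
# Invented ranks for two variables (e.g. Age rank vs Satisfaction_Score rank),
# assuming no ties.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]

n = len(rank_x)
d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))  # 4

# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho = 1 - 6 * d_squared / (n * (n ** 2 - 1))  # 0.8
```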

Regression Analysis:

Linear Regression: Model the relationship between a continuous dependent variable (e.g., Satisfaction Score) and independent variables (e.g., Age, Gender, Loyalty Level).

Logistic Regression: If analyzing binary outcomes (e.g., Purchase History or Support Contacted), you could model the probability of an outcome based on predictors.
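
For the linear case, ordinary least squares with one predictor has a closed form; here is a sketch on invented Age/Satisfaction_Score pairs. A logistic model for binary outcomes such as Purchase_History would typically use a library (statsmodels or scikit-learn) rather than hand-rolled code.

```python
# Invented (Age, Satisfaction_Score) pairs; real values would come from the CSV.
ages = [20, 30, 40, 50]
scores = [6, 7, 8, 9]

n = len(ages)
mean_x = sum(ages) / n    # 35.0
mean_y = sum(scores) / n  # 7.5

# OLS slope = cov(x, y) / var(x); intercept from the means.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, scores)) \
        / sum((x - mean_x) ** 2 for x in ages)
intercept = mean_y - slope * mean_x

# The fitted line passes through the point of means.
predicted_at_35 = intercept + slope * 35
```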

Factor Analysis:

To identify underlying patterns or groups in customer behavior or satisfaction factors, you can apply Factor Analysis to reduce the dimensionality of the dataset and group similar variables.

Cluster Analysis:

Use K-Means Clustering or Hierarchical Clustering to group customers based on similarity in their satisfaction scores and other features (e.g., Loyalty Level, Purchase History).
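
A toy one-dimensional K-Means loop shows the idea on invented scores; a real analysis would use multiple features and a library implementation such as sklearn.cluster.KMeans.

```python
# Invented 1-D satisfaction scores with two visible clusters.
scores = [2, 3, 2, 8, 9, 8]
centers = [2.0, 9.0]  # arbitrary starting centers for k = 2

for _ in range(10):  # a handful of iterations converges on this toy data
    # Assign each score to its nearest center.
    clusters = [[] for _ in centers]
    for s in scores:
        nearest = min(range(len(centers)), key=lambda i: abs(s - centers[i]))
        clusters[nearest].append(s)
    # Move each center to the mean of its assigned points.
    centers = [sum(c) / len(c) for c in clusters]
```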

Confidence Intervals:

Calculate confidence intervals for the mean of satisfaction scores or any other metric to estimate the range in which the true population mean might lie.
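
A standard-library sketch of a 95% interval for the mean, using the normal critical value as an approximation; for a sample this small, the t distribution (e.g. scipy.stats.t.ppf) would be more accurate.

```python
from statistics import NormalDist, mean, stdev

# Invented satisfaction scores; real values would come from the CSV.
scores = [7, 8, 6, 9, 7, 5, 8, 7]

n = len(scores)
m = mean(scores)
s = stdev(scores)

z = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% interval
half_width = z * s / n ** 0.5
lower, upper = m - half_width, m + half_width
```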
