By Andy Kriebel [source]
The file contains data on births in the United States from 1994 to 2014. The data includes the following columns:
- year: The year of the observation. (Integer)
- month: The month of the observation. (Integer)
- date_of_month: The day of the month of the observation. (Integer)
- day_of_week: The day of the week of the observation. (Integer)
- births: The number of births on the given day. (Integer)
The US Births dataset on Kaggle contains data on births in the United States from 1994 to 2014. The data is broken down by year, month, date of month, day of week, and births.
This dataset can be used to answer questions about when people are born, how common certain birthdays are, and how births trend over time. For example, you could use it to find out which day of the week or which month has the most births. Possible applications include:
- Determining which days of the year see the most births, to help plan staffing levels in maternity wards
- Identifying trends in the number of births over time
- Predicting the number of births on a given day
This dataset is a combined effort of the U.S. National Center for Health Statistics and the U.S. Social Security Administration, provided by FiveThirtyEight. It contains data on births in the United States from 1994 to 2014, with the following columns: year, month, date_of_month, day_of_week, births.
Thank you to FiveThirtyEight for providing this dataset!
License
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: distribute your contributions under the same license as the original.
  - Keep intact: all notices that refer to this license, including copyright notices.

File: US_births_1994-2014.csv

| Column name | Description |
|:--------------|:---------------------------------------------|
| year | Year of the data. (Integer) |
| month | Month of the data. (Integer) |
| date_of_month | Day of the month of the data. (Integer) |
| day_of_week | Day of the week of the data. (Integer) |
| births | Number of births on the given day. (Integer) |
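The day-of-week question raised above comes down to a short aggregation. Below is a minimal Python sketch. It assumes the CSV layout in the table and assumes that `day_of_week` is coded 1 = Monday through 7 = Sunday (check this against the source's codebook). The function name `mean_births_by_weekday` and the sample rows are hypothetical. The sketch averages births per calendar day rather than summing them, because the seven weekdays do not occur exactly equally often over 1994-2014.

```python
import csv
import io
from collections import defaultdict

# Assumption: day_of_week uses 1 = Monday ... 7 = Sunday; verify against the source.
DAY_NAMES = {1: "Mon", 2: "Tue", 3: "Wed", 4: "Thu", 5: "Fri", 6: "Sat", 7: "Sun"}


def mean_births_by_weekday(csv_file):
    """Return {day_of_week code: average births per day} for an open CSV file.

    The file must have the columns year, month, date_of_month, day_of_week, births.
    """
    totals = defaultdict(int)  # sum of births per weekday code
    days = defaultdict(int)    # number of calendar days seen per weekday code
    for row in csv.DictReader(csv_file):
        dow = int(row["day_of_week"])
        totals[dow] += int(row["births"])
        days[dow] += 1
    # Average, not total, so weekdays that occur more often are not favored.
    return {dow: totals[dow] / days[dow] for dow in sorted(totals)}


if __name__ == "__main__":
    # Hypothetical sample rows, not real figures from the dataset.
    sample = io.StringIO(
        "year,month,date_of_month,day_of_week,births\n"
        "2014,1,1,3,100\n"
        "2014,1,2,4,140\n"
        "2014,1,8,3,120\n"
    )
    averages = mean_births_by_weekday(sample)
    busiest = max(averages, key=averages.get)
    print(averages, "busiest:", DAY_NAMES[busiest])
```

To use the real file, pass an open handle, for example `with open("US_births_1994-2014.csv", newline="") as f: mean_births_by_weekday(f)`. The same pattern, keyed on `month` instead of `day_of_week`, answers the which-month question.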
If you use this dataset in your research, please credit Andy Kriebel.
This dataset contains counts of live births for California counties based on information entered on birth certificates. Final counts are derived from static data and include out-of-state births to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all births that occurred during the time period.
The final data tables include both births that occurred in California regardless of the place of residence (by occurrence) and births to California residents (by residence), whereas the provisional data table only includes births that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by parent giving birth's age, parent giving birth's race-ethnicity, and birth place type. See temporal coverage for more information on which strata are available for which years.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains US baby names from the Social Security Administration dating back to 1879. With well over a century of data, it is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies given that name in each year. This dataset is a great resource for anyone interested in studying baby naming trends over time.
This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There is also a column for the number of babies given that name in that year.
This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of individual names has changed. It can also be used to study how naming trends differ between sexes or between different years.
This dataset could be used for a number of things, including:
1. Determining baby name trends over time
2. Finding out what the most popular baby names are in the US
3. Analyzing how baby name popularity has changed over the years
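The first two uses listed above amount to summing counts per name. Below is a minimal Python sketch. It assumes each record is a hypothetical `(name, year, sex, count)` tuple, so map the dataset's actual column names onto that shape. `most_popular_names` and `name_trend` are illustrative helpers, not part of the dataset.

```python
from collections import Counter


def most_popular_names(records, n=3, sex=None):
    """Sum counts per name across all years and return the n largest.

    records: iterable of hypothetical (name, year, sex, count) tuples.
    sex: optional filter such as "F" or "M"; None keeps every record.
    """
    totals = Counter()
    for name, _year, rec_sex, count in records:
        if sex is None or rec_sex == sex:
            totals[name] += count
    return totals.most_common(n)


def name_trend(records, target):
    """Return {year: count} for one name, summed over sexes, in year order."""
    by_year = Counter()
    for name, year, _sex, count in records:
        if name == target:
            by_year[year] += count
    return dict(sorted(by_year.items()))


if __name__ == "__main__":
    # Hypothetical records, not real SSA counts.
    records = [
        ("Mary", 1900, "F", 50), ("John", 1900, "M", 60),
        ("Mary", 2000, "F", 10), ("Emma", 2000, "F", 40),
        ("John", 2000, "M", 20),
    ]
    print(most_popular_names(records, n=2))
    print(name_trend(records, "Mary"))
```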
If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SELECTED SOCIAL CHARACTERISTICS IN THE UNITED STATES: PLACE OF BIRTH (DP02)
- Universe: Total population
- Survey/Program: American Community Survey 5-year estimates
- Years: 2020, 2021, 2022
People not reporting a place of birth were assigned the state or country of birth of another family member, or were allocated the response of another individual with similar characteristics. People born outside the United States were asked to report their place of birth according to current international boundaries. Since numerous changes in boundaries of foreign countries have occurred in the last century, some people may have reported their place of birth in terms of boundaries that existed at the time of their birth or emigration, or in accordance with their own national preference.
This dataset contains counts of live births for California as a whole based on information entered on birth certificates. Final counts are derived from static data and include out-of-state births to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all births that occurred during the time period.
The final data tables include both births that occurred in California regardless of the place of residence (by occurrence) and births to California residents (by residence), whereas the provisional data table only includes births that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by parent giving birth's age, parent giving birth's race-ethnicity, and birth place type. See temporal coverage for more information on which strata are available for which years.
https://data.gov.sg/open-data-licence
Dataset from Singapore Department of Statistics. For more information, visit https://data.gov.sg/datasets/d_6150f21b0892b3fdde546d2a1af2af82/view
This dataset describes birth outcomes (weight, gestational age, sex assigned at birth, presence of birth defects, etc.) and parental factors (age, address, health status, etc.) for people born in North Carolina between 2003 and 2015. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: Data come from the North Carolina Birth Defects Monitoring Program. These data are not publicly available, but more information can be obtained at https://schs.dph.ncdhhs.gov/units/bdmp/ (accessed 11/9/2021). Format: Data are stored as csv files and contain information on birth records in North Carolina from 2003 to 2015, including addresses of parents and medical information on parents and neonates. This dataset is associated with the following publication: Slawsky, E., A. Weaver, T. Luben, and K. Rappazzo. A Cross-sectional Study of Brownfields and Birth Defects. Birth Defects Research. John Wiley & Sons, Inc., Hoboken, NJ, USA, 114(5-6): 197-207, (2022).
Birth Statistics:
(i) Number of Known Births for Different Sexes and Crude Birth Rate for the Period from 1981 to 2024
(ii) Percentage Distribution of Live Births by Birth Weight for the Period from 2012 to 2023
https://cdla.io/sharing-1-0/
This notebook builds a machine learning model of life expectancy across countries. It combines data analysis with predictive algorithms to identify factors associated with differences in longevity between countries, and uses statistical tools to examine how healthcare and socioeconomic factors relate to life expectancy.
| Column | Description |
|:-------|:------------|
| Country | Country under study |
| Year | Year of the observation |
| Status | Development status of the country |
| Population | Population of the country |
| Hepatitis B | Percentage of one-year-olds immunized against hepatitis B |
| Measles | Number of reported measles cases per 1,000 people |
| Polio | Percentage of one-year-olds immunized against polio |
| Diphtheria | Percentage of one-year-olds immunized against diphtheria |
| HIV/AIDS | Deaths caused by HIV/AIDS among children aged 0-4, per 1,000 live births |
| infant deaths | Number of infant deaths per 1,000 people |
| under-five deaths | Number of deaths of children under 5 years old per 1,000 people |
| Total expenditure | Government health expenditure as a percentage of total government expenditure |
| GDP | Gross domestic product |
| BMI | Average body mass index of the country's population |
| thinness 1-19 years | Prevalence of thinness among children and adolescents up to 19 years old, in percent |
| Alcohol | Alcohol consumption among people over 15 years old, in liters |
| Schooling | Number of years of schooling |
| Life expectancy | Life expectancy in the country (target variable) |
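The notebook's model is not described beyond this table. As a baseline only, here is a hedged Python sketch of an ordinary least-squares line that predicts `Life expectancy` from a single feature such as `Schooling`. This is a swapped-in technique for illustration, not the notebook's method. The helper names and the one-dict-per-row input format are assumptions. Rows with a missing feature or target are dropped first, in case some values are absent.

```python
def complete_pairs(rows, feature, target="Life expectancy"):
    """Return (xs, ys) from dict rows where both values are present (not None)."""
    xs, ys = [], []
    for row in rows:
        x, y = row.get(feature), row.get(target)
        if x is not None and y is not None:
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)


if __name__ == "__main__":
    # Hypothetical rows, not values from the dataset.
    rows = [
        {"Schooling": 10, "Life expectancy": 60},
        {"Schooling": None, "Life expectancy": 70},
        {"Schooling": 12, "Life expectancy": 66},
        {"Schooling": 14, "Life expectancy": 72},
    ]
    xs, ys = complete_pairs(rows, "Schooling")
    intercept, slope = fit_simple_linear_regression(xs, ys)
    print(f"Life expectancy ~ {intercept:.1f} + {slope:.1f} * Schooling")
```

A one-feature line like this is mainly a reference point. Any richer model should beat it on held-out countries or years before it is trusted.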
We conducted an unmatched case-control study of 1,225,285 infants from a North Carolina Birth Cohort (2003-2015). Ozone and PM2.5 during critical exposure periods (gestational weeks 3-8) were estimated using residential address and a national spatiotemporal model at census tract centroids. Here we describe data sources for outcome (i.e., congenital heart defects) and exposure (i.e., ozone and PM2.5) data. This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual-level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed. It can be accessed through the following means: The North Carolina Birth Cohort data are not publicly available because they contain personally identifiable information. Data may be requested through the NCDHHS Division of Public Health, with proper approvals. Air pollutant concentrations for ozone and PM2.5 from the national spatiotemporal model are publicly available from EPA's website. Format: Birth certificate data from the State Center for Health Statistics of the NC Department of Health and Human Services were linked with data from the Birth Defects Monitoring Program (NC BDMP) to create a birth cohort of all infants born in NC between 2003 and 2015. The NC BDMP is an active surveillance system that follows NC births to obtain birth defect diagnoses up to 1 year after the date of birth and to identify infant deaths during the first year of life, including relevant information from the death certificate.
A national spatiotemporal model provided data on predicted ozone and PM2.5 concentrations over critical prenatal time periods. The prediction model used data from research and regulatory monitors as well as a large (>200) array of geographic covariates to create fine-scale spatial and temporal predictions. The model has a cross-validated R² of 0.89 for PM2.5. Concentrations were predicted daily throughout the study period at the centroid of each 2010 census tract in NC. This dataset is associated with the following publication: Arogbokun, O., T. Luben, J. Stingone, L. Engel, C. Martin, and A. Olshan. Racial disparities in maternal exposure to ambient air pollution during pregnancy and prevalence of congenital heart defects. AMERICAN JOURNAL OF EPIDEMIOLOGY. Johns Hopkins Bloomberg School of Public Health, 194(3): 709-721, (2025).
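The exposure assignment described above uses daily predictions at a tract centroid, averaged over gestational weeks 3-8. Below is an illustrative Python sketch of that averaging step, not the study's code. It assumes a dict of daily concentrations keyed by `datetime.date` for one census tract. It also assumes that gestational week 1 begins on a supplied start date, because the study's exact definition of week 1 is not given here. Days with no prediction are skipped. `window_mean` is a hypothetical name.

```python
from datetime import date, timedelta


def window_mean(daily, start, first_week=3, last_week=8):
    """Mean of daily concentrations over gestational weeks first_week..last_week.

    daily: dict mapping datetime.date -> predicted concentration at one tract centroid.
    start: date on which gestational week 1 begins (assumed convention).
    Week k covers days 7*(k-1) through 7*k - 1 counted from start, inclusive.
    """
    first_day = start + timedelta(days=7 * (first_week - 1))
    last_day = start + timedelta(days=7 * last_week - 1)
    values = []
    day = first_day
    while day <= last_day:
        if day in daily:  # days without a prediction are skipped, not imputed
            values.append(daily[day])
        day += timedelta(days=1)
    if not values:
        raise ValueError("no predictions fall inside the exposure window")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical series: each concentration equals its day offset from start.
    start = date(2010, 1, 1)
    daily = {start + timedelta(days=i): float(i) for i in range(100)}
    print(window_mean(daily, start))  # averages day offsets 14 through 55
```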
Estimated number of persons on July 1, by 5-year age groups and gender, and median age, for Canada, provinces and territories.
https://creativecommons.org/publicdomain/zero/1.0/
Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.
All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names
https://cloud.google.com/bigquery/public-data/usa-names
Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner photo by @dcp from Unsplash.
What are the most common names?
What are the most common female names?
Are there more female or male names?
Do female names outnumber male names by a wide margin?
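"More female or male names" can mean distinct names or babies, and the two counts can disagree. Below is a minimal Python sketch that computes both. It assumes rows shaped as hypothetical `(name, gender, number)` tuples, for example after selecting those fields from the BigQuery `usa_names` tables (the field names are an assumption). Because the Social Security Administration keeps only names with at least 5 occurrences, both counts cover only the names that pass that threshold.

```python
from collections import defaultdict


def distinct_names_by_gender(rows):
    """Count distinct names per gender; a name used for both genders counts in each."""
    names = defaultdict(set)
    for name, gender, _number in rows:
        names[gender].add(name)
    return {gender: len(found) for gender, found in names.items()}


def babies_by_gender(rows):
    """Sum the number of babies per gender."""
    totals = defaultdict(int)
    for _name, gender, number in rows:
        totals[gender] += number
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical rows, not real SSA counts.
    rows = [
        ("Mary", "F", 10), ("Anna", "F", 7), ("Mary", "F", 5),
        ("John", "M", 30), ("Jordan", "M", 6), ("Jordan", "F", 5),
    ]
    print(distinct_names_by_gender(rows))
    print(babies_by_gender(rows))
```

In the toy rows, "F" has more distinct names while "M" has more babies. This is why the question should say which of the two it means.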
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual UK and constituent country figures for births, deaths, marriages, divorces, civil partnerships and civil partnership dissolutions.
Number and percentage of live births, by age group of mother, 1991 to most recent year.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
The census is undertaken by the Office for National Statistics every 10 years and gives us a picture of all the people and households in England and Wales. The most recent census took place in March 2021. The census asks every household questions about the people who live there and the type of home they live in. In doing so, it helps to build a detailed snapshot of society. Information from the census helps the government and local authorities to plan and fund local services, such as education, doctors' surgeries and roads.
Key census statistics for Leicester are published on the open data platform to make information accessible to local services, voluntary and community groups, and residents. There is also a published dashboard showcasing various datasets from the census, allowing users to view data for Leicester and compare it with national statistics.
Further information about the census and full datasets can be found on the ONS website: https://www.ons.gov.uk/census/aboutcensus/censusproducts
Country of birth
This dataset provides Census 2021 estimates that classify usual residents in England and Wales by their country of birth. The estimates are as at Census Day, 21 March 2021.
Definition: The country in which a person was born. For people not born in one of the four parts of the UK, there was an option to select "elsewhere". People who selected "elsewhere" were asked to write in the current name for their country of birth.
https://digital.nhs.uk/about-nhs-digital/terms-and-conditions
This is a publication on maternity activity in English NHS hospitals. This report examines data relating to delivery and birth episodes in 2023-24, and the booking appointments for these deliveries. This annual publication covers the financial year ending March 2024. Data is included from both the Hospital Episodes Statistics (HES) data warehouse and the Maternity Services Data Set (MSDS). HES contains records of all admissions, appointments and attendances for patients admitted to NHS hospitals in England. The HES data used in this publication are called 'delivery episodes'. The MSDS collects records of each stage of the maternity service care pathway in NHS-funded maternity services, and includes information not recorded in HES. The MSDS is a maturing, national-level dataset. In April 2019, the MSDS transitioned to a new version of the dataset. This version, MSDS v2.0, is an update that introduced a new structure and content - including clinical terminology, in order to meet current clinical practice and incorporate new requirements. It is designed to meet requirements that resulted from the National Maternity Review, which led to the publication of the Better Births report in February 2016. This is the fifth publication of data from MSDS v2.0 and data from 2019-20 onwards is not directly comparable to data from previous years. This publication shows the number of HES delivery episodes during the period, with a number of breakdowns including by method of onset of labour, delivery method and place of delivery. It also shows the number of MSDS deliveries recorded during the period, with a breakdown for the mother's smoking status at the booking appointment by age group. It also provides counts of live born term babies with breakdowns for the general condition of newborns (via Apgar scores), skin-to-skin contact and baby's first feed type - all immediately after birth. There is also data available in a separate file on breastfeeding at 6 to 8 weeks. 
For the first time, information on 'Smoking at Time of Delivery' has been presented using annual data from the MSDS. This includes national data broken down by maternal age, ethnicity and deprivation. From 2025/2026, MSDS will become the official source of 'Smoking at Time of Delivery' information and will replace the historic 'Smoking at Time of Delivery' (SATOD) data, which will be retired. We are currently carrying out dual collection and quarterly reporting for 2024/25 to help users compare information from the two sources. We are working with data submitters to help reconcile any discrepancies at a local level before any close-down activities begin. A link to the dual reporting in the SATOD publication series can be found in the links below. Information on how all measures are constructed can be found in the HES Metadata and MSDS Metadata files provided below. In this publication we have also included an interactive Power BI dashboard to enable users to explore key NHS Maternity Statistics measures. The purpose of this publication is to inform and support strategic and policy-led processes for the benefit of patient care. This report will also be of interest to researchers, journalists and members of the public interested in NHS hospital activity in England. Any feedback on this publication or dashboard can be provided to enquiries@nhsdigital.nhs.uk, under the subject “NHS Maternity Statistics”.
The "Famous Birthdays" Kaggle dataset contains the birthdays of 4,700 well-known individuals. It provides information about these celebrities, including their names, the number of articles written about them, their birth dates, and their zodiac signs. The columns included in this dataset are:
This dataset is a useful resource for analyzing patterns and trends among famous people based on their birth information. For instance, users can explore which zodiac signs are most common among celebrities or identify any seasonal trends in birth dates.
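Counting zodiac signs, as suggested above, only needs a lookup from date to sign. Below is a hedged Python sketch. The sign boundaries are one common Western (tropical) convention, and published tables differ by a day at some cusps, so results for dates near a boundary may not match the dataset's own zodiac signs. `zodiac_sign` and `sign_counts` are hypothetical helpers.

```python
from collections import Counter
from datetime import date

# (month, day) on which each sign begins; one common convention, cusp dates vary by source.
SIGN_STARTS = [
    (1, 20, "Aquarius"), (2, 19, "Pisces"), (3, 21, "Aries"),
    (4, 20, "Taurus"), (5, 21, "Gemini"), (6, 21, "Cancer"),
    (7, 23, "Leo"), (8, 23, "Virgo"), (9, 23, "Libra"),
    (10, 23, "Scorpio"), (11, 22, "Sagittarius"), (12, 22, "Capricorn"),
]


def zodiac_sign(birth):
    """Return the sign whose start date is the latest one on or before birth's month/day."""
    sign = "Capricorn"  # 1 January to 19 January belongs to Capricorn
    for month, day, name in SIGN_STARTS:  # list is in calendar order
        if (birth.month, birth.day) >= (month, day):
            sign = name
    return sign


def sign_counts(birth_dates):
    """Tally zodiac signs for an iterable of datetime.date objects."""
    return Counter(zodiac_sign(d) for d in birth_dates)


if __name__ == "__main__":
    # Hypothetical birth dates, not taken from the dataset.
    sample = [date(1970, 1, 10), date(1985, 3, 21), date(1992, 12, 25)]
    print(sign_counts(sample).most_common())
```

For the seasonal question, the same `Counter` approach over `d.month` gives a births-per-month tally.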
Photo by Adi Goldstein on Unsplash
Estimated annual number of births by gender for Canada, provinces and territories.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Yearly registered births – breakdown by Month
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Annual live births in England and Wales by age of mother and father, type of registration, median interval between births, number of previous live-born children and National Statistics Socio-economic Classification (NS-SEC).