By data.world's Admin [source]
The data was obtained from multiple sources. Data from 1985-2002 were downloaded from the National Bureau for Economic Research through the National Center for Health Statistics' National Vital Statistics System. Data from 2003-2015 were sourced using aggregators provided by CDC's WONDER tool, utilizing Year, Month, State, and County filters. It is worth noting that geolocation information for individual babies born after 2005 is not released due to privacy concerns; therefore, all data has been aggregated by month.
The spatial applicability of this dataset is limited to the United States at the county level. It covers a temporal range spanning January 1, 1985 - December 31, 2015. Each row in the dataset represents aggregated birth counts within a specific county for a particular month and year.
Additional notes highlight that this dataset expands on data presented in an essay called The Timing of Baby Making, published by The Pudding website in May 2017. While only data from 1995-2015 were displayed in the essay itself, this dataset includes an extra ten years of birth data. Furthermore, any non-US residents have been excluded from this dataset.
The provided metadata gives a detailed breakdown of the columns in the dataset, including their descriptions and data types. The included variables allow researchers to analyze births at both individual county and state levels over time. Finally, the dataset is available under the MIT License for public use.
Here is a guide on how to effectively use this dataset:
Step 1: Understanding the Columns
The dataset consists of several columns that provide specific information about each birth record. Let's understand what each column represents:
- State: The state (including District of Columbia) where the mother lives.
- County: The county where the mother lives, coded using the FIPS County Code.
- Month: The month in which the birth took place (1 = January, 2 = February, etc.).
- Year: The four-digit year of the birth.
- countyBirths: The calculated sum of births that occurred to mothers living in a county for a given month. If the sum was less than 9, it is listed as NA as per NCHS reporting guidelines.
- stateBirths: The calculated sum of births that occurred to mothers living in a state for a given month. It includes all birth counts, even those from counties with fewer than 9 births.
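The column layout above can be read with a short loader. This is a minimal sketch, assuming the file is a comma-separated CSV with a header row matching the column names above and that suppressed counts appear as the literal string NA. The sample rows are made up for illustration, and whether the real file stores codes with leading zeros is also an assumption.

```python
import csv
import io

# Hypothetical sample rows in the documented column layout (values are made up).
SAMPLE = """State,County,Month,Year,countyBirths,stateBirths
01,001,1,1985,52,5120
01,003,1,1985,NA,5120
01,001,2,1985,47,4630
"""

def parse_count(value):
    """Return an int, or None when the count is suppressed (listed as NA)."""
    return None if value in ("NA", "") else int(value)

def load_births(handle):
    """Read birth rows, keeping State/County codes as strings so leading zeros survive."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "State": record["State"],
            "County": record["County"],
            "Month": int(record["Month"]),
            "Year": int(record["Year"]),
            "countyBirths": parse_count(record["countyBirths"]),
            "stateBirths": parse_count(record["stateBirths"]),
        })
    return rows

rows = load_births(io.StringIO(SAMPLE))
suppressed = [r for r in rows if r["countyBirths"] is None]
print(len(rows), "rows,", len(suppressed), "with a suppressed county count")
```

Mapping NA to None rather than 0 keeps suppressed counties distinguishable from counties that truly reported a number, which matters when summing.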
Step 2: Exploring Birth Trends by State and County
You can analyze birth trends by focusing on specific states or counties within specific time frames. Here's how you can do it:
Filter by State or County:
- Select rows based on your chosen state using the State column. Each number corresponds to a specific state (e.g., 01 = Alabama).
- Further narrow down your analysis by selecting specific counties using their respective FIPS codes in the County column.
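The filtering described here can be sketched in plain Python. The rows are hypothetical and already parsed into dictionaries, the helper name `select_rows` is illustrative, and treating the codes as zero-padded strings is an assumption about how the file is read.

```python
# Hypothetical rows in the documented column layout (values are made up).
rows = [
    {"State": "01", "County": "001", "Month": 1, "Year": 1985, "countyBirths": 52},
    {"State": "01", "County": "003", "Month": 1, "Year": 1985, "countyBirths": None},
    {"State": "02", "County": "020", "Month": 1, "Year": 1985, "countyBirths": 310},
]

def select_rows(rows, state, county=None, years=None):
    """Keep rows for one state, optionally one county and an inclusive year range."""
    selected = []
    for row in rows:
        if row["State"] != state:
            continue
        if county is not None and row["County"] != county:
            continue
        if years is not None and not (years[0] <= row["Year"] <= years[1]):
            continue
        selected.append(row)
    return selected

alabama = select_rows(rows, state="01")
one_county = select_rows(rows, state="01", county="001", years=(1985, 1990))
print(len(alabama), len(one_county))
```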
Analyze Monthly Variation:
- Calculate monthly total births within your desired location(s) by grouping data based on the Month column.
- Compare the number of births between different months to identify any seasonal trends or patterns.
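Grouping by month might look like the following sketch. It assumes rows are already parsed into dictionaries (the values are made up) and that suppressed county counts are stored as None. Because each row is a county-month, the stateBirths value presumably repeats on every county row of the same state and month, so the sketch counts it once per state, year, and month rather than summing it across rows.

```python
from collections import defaultdict

# Hypothetical rows (values are made up for illustration).
rows = [
    {"State": "01", "County": "001", "Month": 1, "Year": 1985, "countyBirths": 52, "stateBirths": 5120},
    {"State": "01", "County": "003", "Month": 1, "Year": 1985, "countyBirths": None, "stateBirths": 5120},
    {"State": "01", "County": "001", "Month": 2, "Year": 1985, "countyBirths": 47, "stateBirths": 4630},
    {"State": "01", "County": "001", "Month": 1, "Year": 1986, "countyBirths": 50, "stateBirths": 5010},
]

def county_births_by_month(rows):
    """Sum reported county births per calendar month, skipping suppressed counts."""
    totals = defaultdict(int)
    for row in rows:
        if row["countyBirths"] is not None:
            totals[row["Month"]] += row["countyBirths"]
    return dict(totals)

def state_births_by_month(rows):
    """Sum state births per calendar month, counting each state-year-month once."""
    unique = {}
    for row in rows:
        unique[(row["State"], row["Year"], row["Month"])] = row["stateBirths"]
    totals = defaultdict(int)
    for (_state, _year, month), births in unique.items():
        totals[month] += births
    return dict(totals)

county_totals = county_births_by_month(rows)
state_totals = state_births_by_month(rows)
print(county_totals, state_totals)
```

County totals built this way undercount wherever small counties are suppressed, so the state-level column is the safer basis for seasonal comparisons.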
Visualize Birth Trends:
- Create line charts or bar plots to visualize how the number of births changes over time.
- Plot a line or bar for each month across multiple years to identify any significant changes in birth rates.
Step 3: Comparison and Calculation
You can utilize this dataset to compare birth rates between states, counties, and regions. Here are a few techniques you can try:
- State vs. County Comparison:
- Calculate the total births within each state by aggregating
- Analyzing birth trends: This dataset can be used to analyze and understand the trends in birth rates across different states and counties over the period of 1985 to 2015. Researchers can study factors that may influence these trends, such as socioeconomic factors, healthcare access, or cultural changes.
- Identifying seasonal variations: The dataset includes information on the month of birth for each entry. This data can be utilized to identify any seasonal variations in births across different locations in the US. Understanding these variations can help in planning resources and healthcare services accordingly.
- Studying geographical patterns: By analyzing the county-level data, researchers can explore geographical patterns of childbirth throughout the United States. They can identify regions with high or low birth rates and...
By data.world's Admin [source]
This dataset contains an aggregation of birth data from the United States between 1985 and 2015. It consists of information on mothers' locations by state (including District of Columbia) and county, the month in which they gave birth, and aggregates giving the sum of births during that month. This data has been provided by both the National Bureau for Economic Research and the National Center for Health Statistics, whose shared mission is to understand how life works in order to aid individuals in making decisions about their health and wellbeing. This dataset provides valuable insight into population trends across time and location. For example, which states have higher or lower birth rates than others? Which counties experience dramatic fluctuations over time? Given its scope, this dataset could be used in a number of contexts, from epidemiology research to population forecasting. Be sure to check out our other datasets related to births while you're here!
This dataset could be used to examine local trends in birth rates over time or analyze births at different geographical locations. In order to maximize your use of this dataset, it is important that you understand what information the various columns contain.
The main columns are: State (including District of Columbia), County (coded using the FIPS county code number), Month (numbered from 1 for January through 12 for December), Year (4-digit year), countyBirths (calculated sum of births that occurred to mothers living in a county for a given month), and stateBirths (calculated sum of births that occurred to mothers living in a state for a given month). These fields should provide enough information for you to analyze trends across geographic locations at both monthly and yearly levels. You could also consider combining variables such as Year with State, Year with Month, or any other grouping combination, depending on your analysis goal.
In addition, while all data were downloaded on April 5th 2017, it is worth noting that all sources followed privacy guidelines as laid out by NCHS, so geolocation information for individual births occurring after 2005 is not included.
We hope you find this dataset useful and can benefit from its content! With a proper understanding of what each field contains, we are confident you will gain valuable insights into birth rates across counties within the United States during this period.
- Establishing county-level trends in birth rates for the US over time.
- Analyzing the relationship between month of birth and health outcomes for US babies after they are born (e.g., infant mortality, neurological development, etc.).
- Comparing state/county-level differences in average numbers of twins born each year
If you use this dataset in your research, please credit the original authors.
File: allBirthData.csv
| Column name | Description |
|:-------------|:------------|
| State | The numerical order of the state where the mother lives. (Integer) |
| Month | The month in which the birth took place. (Integer) |
| Year | The year of the birth. (Integer) |
| countyBirths | The calculated sum of births that occurred to mothers living in that county for that particular month. (Integer) |
| stateBirths | The aggregate number at the level of entire states for any given month-year combination. (Integer) |
| County | The county where the mother lives, coded using FIPS County Code. (Integer) |
If you use this dataset in your research, please credit data.world's Admin.
Number and percentage of live births, by month of birth, 1991 to most recent year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
United States: Total Fertility Rate (births per woman) was reported at 1.800 in 2016, a decrease from 1.843 in 2015. The series is updated yearly, with a median of 2.002 across 57 observations from Dec 1960 to 2016. It reached an all-time high of 3.654 in 1960 and a record low of 1.738 in 1976. The series remains active in CEIC and is reported by the World Bank. It is categorized under Global Database’s USA – Table US.World Bank: Health Statistics. Total fertility rate represents the number of children that would be born to a woman if she were to live to the end of her childbearing years and bear children in accordance with the age-specific fertility rates of the specified year. Sources: (1) United Nations Population Division, World Population Prospects: 2017 Revision; (2) census reports and other statistical publications from national statistical offices; (3) Eurostat: Demographic Statistics; (4) United Nations Statistical Division, Population and Vital Statistics Report (various years); (5) U.S. Census Bureau: International Database; and (6) Secretariat of the Pacific Community: Statistics and Demography Programme. Aggregation: weighted average. Relevance to gender indicator: it can indicate the status of women within households and a woman’s decision about the number and spacing of children.
This dataset includes teen birth rates for females by age group, race, and Hispanic origin in the United States since 1960. Data availability varies by race and ethnicity groups. All birth data by race before 1980 are based on race of the child. Since 1980, birth data by race are based on race of the mother. For race, data are available for Black and White births since 1960, and for American Indian/Alaska Native and Asian/Pacific Islander births since 1980. Data on Hispanic origin are available since 1989. Teen birth rates for specific racial and ethnic categories are also available since 1989.
From 2003 through 2015, the birth data by race were based on the “bridged” race categories (5). Starting in 2016, the race categories for reporting birth data changed; the new race and Hispanic origin categories are: Non-Hispanic, Single Race White; Non-Hispanic, Single Race Black; Non-Hispanic, Single Race American Indian/Alaska Native; Non-Hispanic, Single Race Asian; and Non-Hispanic, Single Race Native Hawaiian/Pacific Islander (5,6). Birth data by the prior, “bridged” race (and Hispanic origin) categories are included through 2018 for comparison.
National data on births by Hispanic origin exclude data for Louisiana, New Hampshire, and Oklahoma in 1989; New Hampshire and Oklahoma in 1990; and New Hampshire in 1991 and 1992. Birth and fertility rates for the Central and South American population include other and unknown Hispanic. Information on reporting Hispanic origin is detailed in the Technical Appendix for the 1999 public-use natality data file (see ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/DVS/natality/Nat1999doc.pdf).
SOURCES
NCHS, National Vital Statistics System, birth data (see https://www.cdc.gov/nchs/births.htm); public-use data files (see https://www.cdc.gov/nchs/data_access/VitalStatsOnline.htm); and CDC WONDER (see http://wonder.cdc.gov/).
REFERENCES
1. National Office of Vital Statistics. Vital Statistics of the United States, 1950, Volume I. 1954. Available from: https://www.cdc.gov/nchs/data/vsus/vsus_1950_1.pdf.
2. Hetzel AM. U.S. vital statistics system: major activities and developments, 1950-95. National Center for Health Statistics. 1997. Available from: https://www.cdc.gov/nchs/data/misc/usvss.pdf.
3. National Center for Health Statistics. Vital Statistics of the United States, 1967, Volume I–Natality. 1969. Available from: https://www.cdc.gov/nchs/data/vsus/nat67_1.pdf.
4. Martin JA, Hamilton BE, Osterman MJK, et al. Births: Final data for 2015. National Vital Statistics Reports; vol 66 no 1. Hyattsville, MD: National Center for Health Statistics. 2017. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr66/nvsr66_01.pdf.
5. Martin JA, Hamilton BE, Osterman MJK, Driscoll AK, Drake P. Births: Final data for 2016. National Vital Statistics Reports; vol 67 no 1. Hyattsville, MD: National Center for Health Statistics. 2018. Available from: https://www.cdc.gov/nvsr/nvsr67/nvsr67_01.pdf.
6. Martin JA, Hamilton BE, Osterman MJK, Driscoll AK. Births: Final data for 2018. National Vital Statistics Reports; vol 68 no 13. Hyattsville, MD: National Center for Health Statistics. 2019. Available from: https://www.cdc.gov/nchs/data/nvsr/nvsr68/nvsr68_13.pdf.
This dataset is about births in the US.
US_births_1994-2003_CDC_NCHS.csv contains U.S. births data for the years 1994 to 2003, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.
US_births_2000-2014_SSA.csv contains U.S. births data for the years 2000 to 2014, as provided by the Social Security Administration.
This is a source dataset for a Let's Get Healthy California indicator at https://letsgethealthy.ca.gov/. Infant mortality is defined as the number of deaths in infants under one year of age per 1,000 live births. Infant mortality is often used as an indicator to measure the health and well-being of a community, because factors affecting the health of entire populations can also impact the mortality rate of infants. Although California’s infant mortality rate is better than the national average, there are significant disparities, with African American babies dying at more than twice the rate of other groups.
Data are from the Birth Cohort Files. The infant mortality indicator computed from the birth cohort file comprises birth certificate information on all births that occur in a calendar year (denominator) plus death certificate information linked to the birth certificate for those infants who were born in that year but subsequently died within 12 months of birth (numerator). Studies of infant mortality that are based on information from death certificates alone have been found to underestimate infant death rates for infants of all race/ethnic groups and especially for certain race/ethnic groups, due to problems such as confusion about event registration requirements, incomplete data, and transfers of newborns from one facility to another for medical care. Note there is a separate data table "Infant Mortality by Race/Ethnicity" which is based on death records only, which is more timely but less accurate than the Birth Cohort File.
Single year shown to provide state-level data and county totals for the most recent year. Numerator: Infant deaths (under age 1 year). Denominator: Live births occurring to California state residents. Multiple years aggregated to allow for stratification at the county level.
For this indicator, race/ethnicity is based on the birth certificate information, which records the race/ethnicity of the mother. The mother can “decline to state”; this is considered to be a valid response. These responses are not displayed on the indicator visualization.
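The rate defined in this description (infant deaths per 1,000 live births) reduces to a one-line calculation. The sketch below uses a hypothetical function name and made-up counts; it is not the indicator's published computation.

```python
def infant_mortality_rate(infant_deaths, live_births):
    """Deaths under age 1 per 1,000 live births in the same birth cohort."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return 1000 * infant_deaths / live_births

# Made-up example: 45 linked infant deaths among 10,000 live births.
rate = infant_mortality_rate(45, 10_000)
print(rate)  # 4.5
```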
This dataset contains counts of live births to California residents by ZIP Code based on information entered on birth certificates. Final counts are derived from static data and include out-of-state births to California residents. The data tables include births to residents of California by ZIP Code of residence.
Note that ZIP Codes are intended for mail delivery routing and do not represent geographic regions. ZIP Codes are subject to change over time and may not represent the same locations between different time periods. All ZIP Codes in the list of California ZIP Codes used for validation are included for all years, but this does not mean that the ZIP Code was in use at that time.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 140 years of data, this is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year. This dataset is a great resource for anyone interested in studying baby naming trends over time.
This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There are also columns for the number of babies with that name born in that year.
This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of common names has changed. It can also be used to study how naming trends differ between sexes, or between different years.
This dataset could be used for a number of things, including:
1. Determining baby name trends over time
2. Finding out what the most popular baby names are in the US
3. Analyzing how baby name popularity has changed over the years
If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset provides birth rates and related data across the 50 states and DC from 2016 to 2021. The data was sourced from the Centers for Disease Control and Prevention (CDC) and includes detailed information such as number of births, gender, birth weight, state, and year of the delivery. A particular emphasis is given to detailed information on the mother's educational level. With this dataset, one can, for example, examine trends and patterns in birth rates across different academic groups and geographic locations.
Each row in the dataset is considered a category defined by the state, birth year, baby's gender, and educational level of the mother. Three quantities are given for each category: number of births, mother's average age, and average baby weight. The CDC is sensitive to potentially disclosing personal information, so any category with fewer than ten births is suppressed. For this reason, you will find 12 rows missing out of an expected 5,508 (51 states, counting DC, × 6 years × 2 genders × 9 education levels = 5,508). Those missing rows all had the mother's educational level listed as "unknown or not stated" and their absence should not significantly impact studies or conclusions made using the dataset.
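The expected row count quoted above can be checked with a few lines of arithmetic. The constant names are illustrative; the factors come from the description (50 states plus DC, the years 2016 to 2021, two genders, nine education levels, 12 suppressed rows).

```python
STATES_AND_DC = 51
YEARS = len(range(2016, 2022))  # 2016 through 2021 inclusive
GENDERS = 2
EDUCATION_LEVELS = 9
SUPPRESSED_ROWS = 12

expected_rows = STATES_AND_DC * YEARS * GENDERS * EDUCATION_LEVELS
rows_present = expected_rows - SUPPRESSED_ROWS
print(expected_rows, rows_present)  # 5508 5496
```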
The data in this dataset was obtained using CDC's WONDER retrieval tool on the CDC Natality page.
https://creativecommons.org/publicdomain/zero/1.0/
US Social Security applications are a great way to track trends in how babies born in the US are named.
Data.gov releases two datasets that are helpful for this: one at the national level and another at the state level. Note that only names with at least 5 babies born in the same year (or in the same year and state, for the state-level data) are included in this dataset for privacy.
I've taken the raw files here and combined/normalized them into two CSV files (one for each dataset) as well as a SQLite database with two equivalently-defined tables. The code that did these transformations is available here.
New to data exploration in R? Take the free, interactive DataCamp course, "Data Exploration With Kaggle Scripts," to learn the basics of visualizing data with ggplot. You'll also create your first Kaggle Scripts along the way.
Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.
We conducted an unmatched case-control study of 1,225,285 infants from a North Carolina Birth Cohort (2003-2015). Ozone and PM2.5 during critical exposure periods (gestational weeks 3-8) were estimated using residential address and a national spatiotemporal model at census tract centroid. Here we describe data sources for outcome (i.e., congenital heart defects) and exposure (i.e., ozone and PM2.5) data.
This dataset is not publicly accessible because: EPA cannot release personally identifiable information regarding living individuals, according to the Privacy Act and the Freedom of Information Act (FOIA). This dataset contains information about human research subjects. Because there is potential to identify individual participants and disclose personal information, either alone or in combination with other datasets, individual level data are not appropriate to post for public access. Restricted access may be granted to authorized persons by contacting the party listed.
It can be accessed through the following means: The North Carolina Birth Cohort data are not publicly available as they contain personally identifiable information. Data may be requested through the NCDHHS, Division of Public Health with proper approvals. Air pollutant concentrations for ozone and PM2.5 from the national spatiotemporal model are publicly available from EPA's website.
Format: Birth certificate data from the State Center for Health Statistics of the NC Department of Health and Human Services were linked with data from the Birth Defects Monitoring Program (NC BDMP) to create a birth cohort of all infants born in NC between 2003 and 2015. The NC BDMP is an active surveillance system that follows NC births to obtain birth defect diagnoses up to 1 year after the date of birth, as well as to identify infant deaths during the first year of life and include relevant information from the death certificate. A national spatiotemporal model provided data on predicted ozone and PM2.5 concentrations over critical prenatal time periods. The prediction model used data from research and regulatory monitors as well as a large (>200) array of geographic covariates to create fine scale spatial and temporal predictions. The model has a cross-validated R2 of 0.89 for PM2.5. Daily concentrations were predicted throughout the study period at the centroid of each 2010 census tract in NC.
This dataset is associated with the following publication: Arogbokun, O., T. Luben, J. Stingone, L. Engel, C. Martin, and A. Olshan. Racial disparities in maternal exposure to ambient air pollution during pregnancy and prevalence of congenital heart defects. AMERICAN JOURNAL OF EPIDEMIOLOGY. Johns Hopkins Bloomberg School of Public Health, 194(3): 709-721, (2025).
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations (Ancestry.com and FamilySearch) and provide the largest and richest source of individual-level and household data.
All manuscripts (and other items you'd like to publish) must be submitted to
phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
This dataset was created on 2020-01-10 22:52:11.461 by merging multiple datasets together. The source datasets for this version were:
IPUMS 1930 households: This dataset includes all households from the 1930 US census.
IPUMS 1930 persons: This dataset includes all individuals from the 1930 US census.
IPUMS 1930 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1930 datasets.
Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
The historic US 1930 census data were collected in April 1930. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
Notes
We provide IPUMS household and person data separately so that it is convenient to explore the descriptive statistics on each level. In order to obtain a full dataset, merge the household and person on the variables SERIAL and SERIALP. In order to create a longitudinal dataset, merge datasets on the variable HISTID.
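The household-person merge described in this note can be sketched as a keyed join. This is a minimal illustration that assumes SERIAL identifies a household in the household file and SERIALP carries the same identifier in the person file. The records and their values are made up, and FARM, AGE, and SEX are used only because they appear among the variables listed for this dataset.

```python
# Hypothetical records (values are made up for illustration).
households = [
    {"SERIAL": 1, "FARM": 1},
    {"SERIAL": 2, "FARM": 2},
]
persons = [
    {"SERIALP": 1, "AGE": 34, "SEX": 2},
    {"SERIALP": 1, "AGE": 5, "SEX": 1},
    {"SERIALP": 2, "AGE": 61, "SEX": 1},
]

def merge_persons_with_households(households, persons):
    """Attach household fields to each person whose SERIALP matches a SERIAL."""
    by_serial = {household["SERIAL"]: household for household in households}
    merged = []
    for person in persons:
        household = by_serial.get(person["SERIALP"])
        if household is None:
            continue  # person without a matching household record
        merged.append({**household, **person})
    return merged

full = merge_persons_with_households(households, persons)
print(len(full), full[0])
```

Indexing households by SERIAL first keeps the join linear in the number of records, which matters for files of this size.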
Households with more than 60 people in the original data were broken up for processing purposes. Every person in a large household is considered to be in their own household. The original large households can be identified using the variable SPLIT, reconstructed using the variable SPLITHID, and the original count is found in the variable SPLITNUM.
Coded variables derived from string variables are still in progress. These variables include: occupation and industry.
Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGEMARR, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, FARM, EMPSTAT, OCC1950, IND1950, MTONGUE, MARST, RACE, SEX, RELATE, CLASSWKR. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.
Most inconsistent information was not edite
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications from 1880 onward.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Newborn by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Newborn. The dataset can be utilized to understand the population distribution of Newborn by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Newborn. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Newborn.
Key observations
Largest age group (population): Male # 20-24 years (78) | Female # 70-74 years (46). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age groups:
Scope of gender :
Please note that American Community Survey asks a question about the respondents current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are supposed to respond with the answer as either of Male or Female. Our research and this dataset mirrors the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to check the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Newborn Population by Gender.
This dataset contains percent preterm and very preterm live births by race/ethnic group of mother. Preterm births are all live births less than 37 weeks of gestation. Very preterm births are all live births less than 32 weeks of gestation. Important growth and development occur throughout pregnancy, especially in the final months and weeks. There is a higher risk of serious disability or death the earlier a baby is born. Gestational age is based on obstetric estimate at delivery (OE). Data include births with gestational age of 17-47 weeks. Note: The race and ethnic groups in this table utilize eight mutually exclusive race and ethnicity categories. These categories are Hispanic and the following Non-Hispanic categories of Multi-Race, African-American, American Indian (includes Eskimo and Aleut), Asian, Pacific Islander (includes Hawaiian), White (includes Other race) and Unknown (includes refused to state and missing).
Data should not be compared to other data where gestational age is based on the date of last normal menses (LMP) and not OE. The National Center for Health Statistics recently transitioned to using an OE-based gestational age measure due to increasing evidence of its greater validity compared with the LMP-based measure. (http://www.cdc.gov/nchs/data/nvsr/nvsr64/nvsr64_05.pdf)
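The definitions above translate into a short calculation. The sketch assumes gestational age is given in completed weeks from the obstetric estimate and that percentages are taken over births within the 17-47 week range. That denominator, the function name, and the sample values are assumptions for illustration. Very preterm births are a subset of preterm births.

```python
def preterm_percentages(gestational_weeks):
    """Return (percent preterm, percent very preterm) among births of 17-47 weeks."""
    included = [weeks for weeks in gestational_weeks if 17 <= weeks <= 47]
    if not included:
        raise ValueError("no births within the 17-47 week range")
    preterm = sum(1 for weeks in included if weeks < 37)
    very_preterm = sum(1 for weeks in included if weeks < 32)
    total = len(included)
    return 100 * preterm / total, 100 * very_preterm / total

# Made-up gestational ages; the value 50 falls outside the included range.
sample_weeks = [39, 40, 36, 30, 38, 50, 41, 35, 40, 39, 37]
pct_preterm, pct_very_preterm = preterm_percentages(sample_weeks)
print(pct_preterm, pct_very_preterm)  # 30.0 10.0
```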
https://creativecommons.org/publicdomain/zero/1.0/
Baby names from social security card applications in the United States spanning three decades, including state, gender, year of birth, name, and the number of babies given each name.
Recommended Analysis
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the data for the Newborn, GA population pyramid, which represents the Newborn population distribution across age and gender, using estimates from the U.S. Census Bureau American Community Survey 5-Year estimates. It lists the male and female population for each age group, along with the total population for those age groups. Higher numbers at the bottom of the table suggest population growth, whereas higher numbers at the top indicate declining birth rates. Furthermore, the dataset can be utilized to understand the youth dependency ratio, old-age dependency ratio, total dependency ratio, and potential support ratio.
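The ratios named above are not defined in this listing. As a rough sketch, assuming the commonly used age bands (under 15, 15 to 64, 65 and over), which may not line up exactly with the dataset's own age groups, they could be computed as follows. The function name and the sample counts are made up for illustration.

```python
def dependency_ratios(young, working, old):
    """Return the four ratios for the given population counts.

    young: population under 15, working: 15 to 64, old: 65 and over.
    These bands are an assumption, not taken from the dataset description.
    """
    if working <= 0 or old <= 0:
        raise ValueError("working-age and older populations must be positive")
    return {
        # Dependents per 100 people of working age.
        "youth_dependency": 100 * young / working,
        "old_age_dependency": 100 * old / working,
        "total_dependency": 100 * (young + old) / working,
        # Working-age people per person aged 65 and over.
        "potential_support": working / old,
    }


# Invented counts, not values from the Newborn, GA table.
ratios = dependency_ratios(young=150, working=600, old=120)
```

Given the margin of error on small-area ACS estimates, ratios for a small place are best read as approximate.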
Key observations
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is a part of the main dataset for Newborn Population by Age. You can refer to the same here.
The goal is to predict the rate of heart disease (per 100,000 individuals) across the United States at the county level from other socioeconomic indicators. The data is compiled from a wide range of sources and made publicly available by the United States Department of Agriculture Economic Research Service (USDA ERS).
There are 33 variables in this dataset. Each row in the dataset represents a United States county, and the dataset we are working with covers two particular years, denoted a and b. We don't provide a unique identifier for an individual county, just a row_id for each row.
The variables in the dataset have names of the form category_variable, where category is the high-level category of the variable (e.g. econ or health) and variable is what the specific column contains.
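The naming convention makes it easy to group columns by category. A minimal sketch, using a handful of the column names listed below and a hypothetical helper called group_by_category:

```python
from collections import defaultdict

# A few column names taken from the list in this description.
columns = [
    "area_rucc",
    "econ_pct_unemployment",
    "health_pct_diabetes",
    "health_pop_per_dentist",
    "demo_pct_female",
]


def group_by_category(names):
    """Split each name at its first underscore into (category, variable)."""
    groups = defaultdict(list)
    for name in names:
        category, _, variable = name.partition("_")
        groups[category].append(variable)
    return dict(groups)


groups = group_by_category(columns)
```

Note that row_id and the target heart_disease_mortality_per_100k do not follow the category prefix scheme, so they would need to be handled separately.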
We're trying to predict the variable heart_disease_mortality_per_100k (a positive integer) for each row of the test data set.
Columns
area — information about the county
area_rucc — Rural-Urban Continuum Codes "form a classification scheme that distinguishes metropolitan counties by the population size of their metro area, and nonmetropolitan counties by degree of urbanization and adjacency to a metro area. The official Office of Management and Budget (OMB) metro and nonmetro categories have been subdivided into three metro and six nonmetro categories. Each county in the U.S. is assigned one of the 9 codes." (USDA Economic Research Service, https://www.ers.usda.gov/data-products/rural-urban-continuum-codes/)
area_urban_influence — Urban Influence Codes "form a classification scheme that distinguishes metropolitan counties by population size of their metro area, and nonmetropolitan counties by size of the largest city or town and proximity to metro and micropolitan areas." (USDA Economic Research Service, https://www.ers.usda.gov/data-products/urban-influence-codes/)
econ — economic indicators
econ_economic_typology — County Typology Codes "classify all U.S. counties according to six mutually exclusive categories of economic dependence and six overlapping categories of policy-relevant themes. The economic dependence types include farming, mining, manufacturing, Federal/State government, recreation, and nonspecialized counties. The policy-relevant types include low education, low employment, persistent poverty, persistent child poverty, population loss, and retirement destination." (USDA Economic Research Service, https://www.ers.usda.gov/data-products/county-typology-codes.aspx)
econ_pct_civilian_labor — Civilian labor force, annual average, as percent of population (Bureau of Labor Statistics, http://www.bls.gov/lau/)
econ_pct_unemployment — Unemployment, annual average, as percent of population (Bureau of Labor Statistics, http://www.bls.gov/lau/)
econ_pct_uninsured_adults — Percent of adults without health insurance (Bureau of Labor Statistics, http://www.bls.gov/lau/)
econ_pct_uninsured_children — Percent of children without health insurance (Bureau of Labor Statistics, http://www.bls.gov/lau/)
health — health indicators
health_pct_adult_obesity — Percent of adults who meet clinical definition of obese (National Center for Chronic Disease Prevention and Health Promotion)
health_pct_adult_smoking — Percent of adults who smoke (Behavioral Risk Factor Surveillance System)
health_pct_diabetes — Percent of population with diabetes (National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation)
health_pct_low_birthweight — Percent of babies born with low birth weight (National Center for Health Statistics)
health_pct_excessive_drinking — Percent of adult population that engages in excessive consumption of alcohol (Behavioral Risk Factor Surveillance System)
health_pct_physical_inacticity — Percent of adult population that is physically inactive (National Center for Chronic Disease Prevention and Health Promotion)
health_air_pollution_particulate_matter — Fine particulate matter in µg/m³ (CDC WONDER, https://wonder.cdc.gov/wonder/help/pm.html)
health_homicides_per_100k — Deaths by homicide per 100,000 population (National Center for Health Statistics)
health_motor_vehicle_crash_deaths_per_100k — Deaths by motor vehicle crash per 100,000 population (National Center for Health Statistics)
health_pop_per_dentist — Population per dentist (HRSA Area Resource File)
health_pop_per_primary_care_physician — Population per Primary Care Physician (HRSA Area Resource File)
demo — demographics information
demo_pct_female — Percent of population that is female (US Census Population Estimates)
demo_pct_below_18_years_of_age — Percent of population that is below 18 years of age (US Census Population Estimates)
demo_pct_aged_65_years_and_older — Percent of population that is aged 65 years or older (US Census Population Estimates)
dem...
By data.world's Admin [source]
The data was obtained from multiple sources. Data from 1985-2002 were downloaded from the National Bureau of Economic Research through the National Center for Health Statistics' National Vital Statistics System. Data from 2003-2015 were sourced using aggregators provided by CDC's WONDER tool, utilizing Year, Month, State, and County filters. It is worth noting that geolocation information for individual babies born after 2005 is not released due to privacy concerns; therefore, all data has been aggregated by month.
The spatial applicability of this dataset is limited to the United States at the county level. It covers a temporal range spanning January 1, 1985 - December 31, 2015. Each row in the dataset represents aggregated birth counts within a specific county for a particular month and year.
Additional notes highlight that this dataset expands on data presented in an essay called The Timing of Baby Making, published by The Pudding website in May 2017. While only data from 1995-2015 were displayed in the essay itself, this dataset includes an extra ten years of birth data. Furthermore, any non-US residents have been excluded from this dataset.
The provided metadata gives a detailed breakdown of the columns in the dataset, including their descriptions and data types. The included variables allow researchers to analyze births at both individual county and state levels over time. Finally, the dataset is available under the MIT License for public use.
Here is a guide on how to effectively use this dataset:
Step 1: Understanding the Columns
The dataset consists of several columns that describe each aggregated row (one county, one month, one year). Let's understand what each column represents:
- State: The state (including District of Columbia) where the mother lives.
- County: The county where the mother lives, coded using the FIPS County Code.
- Month: The month in which the birth took place (1 = January, 2 = February, etc.).
- Year: The four-digit year of the birth.
- countyBirths: The calculated sum of births that occurred to mothers living in a county for a given month. If the sum was less than 9, it is listed as NA as per NCHS reporting guidelines.
- stateBirths: The calculated sum of births that occurred to mothers living in a state for a given month. It includes all birth counts, even those from counties with fewer than 9 births.
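Because suppressed county counts are still included in stateBirths, the county rows for a state and month need not add up to the state total. A minimal sketch of reading such rows, assuming the data is available as CSV text with the column names listed above. The sample values, the code formats, and the helper name parse_births are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows; the values are invented for illustration.
SAMPLE = """State,County,Month,Year,countyBirths,stateBirths
01,001,1,1985,40,100
01,003,1,1985,NA,100
01,005,1,1985,52,100
"""


def parse_births(text):
    """Read rows, turning the suppressed marker 'NA' into None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        raw = rec["countyBirths"]
        rows.append({
            "state": rec["State"],      # kept as text to preserve leading zeros
            "county": rec["County"],
            "month": int(rec["Month"]),
            "year": int(rec["Year"]),
            "county_births": None if raw == "NA" else int(raw),
            "state_births": int(rec["stateBirths"]),
        })
    return rows


rows = parse_births(SAMPLE)
# Sum only the counties whose counts were actually reported.
reported = sum(r["county_births"] for r in rows if r["county_births"] is not None)
# Births in suppressed counties show up only in the state total.
unreported = rows[0]["state_births"] - reported
```

Keeping the State and County codes as strings avoids silently dropping leading zeros such as the one in 01.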
Step 2: Exploring Birth Trends by State and County
You can analyze birth trends by focusing on specific states or counties within specific time frames. Here's how you can do it:
Filter by State or County:
- Select rows based on your chosen state using the State column. Each number corresponds to a specific state (e.g., 01 = Alabama).
- Further narrow down your analysis by selecting specific counties using their respective FIPS codes in the County column.
Analyze Monthly Variation:
- Calculate monthly total births within your desired location(s) by grouping data based on the Month column.
- Compare the number of births between different months to identify any seasonal trends or patterns.
Visualize Birth Trends:
- Create line charts or bar plots to visualize how the number of births changes over time.
- Plot a line or bar for each month across multiple years to identify any significant changes in birth rates.
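The filtering and monthly grouping described in this step can be sketched with the standard library alone. The records, the county codes, and the function name monthly_totals below are hypothetical. Skipping suppressed counts is one possible choice, and it means totals for small counties will be understated.

```python
from collections import defaultdict

# Hypothetical (state, county, month, year, births) records.
# None marks a suppressed county count (listed as NA in the data).
records = [
    ("01", "001", 1, 1985, 40),
    ("01", "001", 2, 1985, 35),
    ("01", "001", 1, 1986, 44),
    ("01", "003", 1, 1985, None),
    ("02", "013", 1, 1985, 20),
]


def monthly_totals(records, state, county=None):
    """Sum reported births per calendar month for one state,
    optionally restricted to a single county."""
    totals = defaultdict(int)
    for st, co, month, _year, births in records:
        if st != state or (county is not None and co != county):
            continue
        if births is None:
            continue  # suppressed count, nothing to add
        totals[month] += births
    return dict(totals)


alabama = monthly_totals(records, "01")
```

The resulting dictionary maps month number to total births across all years, which is the shape needed for a simple bar chart of seasonal variation.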
Step 3: Comparison and Calculation
You can utilize this dataset to compare birth rates between states, counties, and regions. Here are a few techniques you can try:
- State vs. County Comparison:
- Calculate the total births within each state by aggregating
- Analyzing birth trends: This dataset can be used to analyze and understand the trends in birth rates across different states and counties over the period of 1985 to 2015. Researchers can study factors that may influence these trends, such as socioeconomic factors, healthcare access, or cultural changes.
- Identifying seasonal variations: The dataset includes information on the month of birth for each entry. This data can be utilized to identify any seasonal variations in births across different locations in the US. Understanding these variations can help in planning resources and healthcare services accordingly.
- Studying geographical patterns: By analyzing the county-level data, researchers can explore geographical patterns of childbirth throughout the United States. They can identify regions with high or low birth rates and...