81 datasets found
  1. USA Name Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datagov/usa-names
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

    Content

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

    https://cloud.google.com/bigquery/public-data/usa-names

    Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @dcp from Unsplash.

    Inspiration

    What are the most common names?

    What are the most common female names?

    Are there more female or male names?

    Female names by a wide margin?

  2. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    • data.amerigeoss.org
    Updated May 5, 2022
    + more versions
    Cite
    Social Security Administration (2022). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Explore at:
    Dataset updated
    May 5, 2022
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
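    As a rough illustration of how records with these four fields could be tallied, the sketch below sums the `number` field per name and counts distinct names per sex. The rows are made-up placeholders rather than values from the dataset, and the field order (name, year of birth, sex, number) is an assumption.

```python
from collections import Counter

# Made-up sample rows for illustration only: (name, year_of_birth, sex, number).
rows = [
    ("Mary", 1880, "F", 7),
    ("John", 1880, "M", 9),
    ("Mary", 1881, "F", 6),
    ("Anna", 1881, "F", 5),
]

def totals_by_name(records):
    """Sum the `number` field for each name across all years."""
    totals = Counter()
    for name, _year, _sex, number in records:
        totals[name] += number
    return totals

def distinct_names_by_sex(records):
    """Count how many distinct names appear for each sex."""
    names = {}
    for name, _year, sex, _number in records:
        names.setdefault(sex, set()).add(name)
    return {sex: len(found) for sex, found in names.items()}

name_totals = totals_by_name(rows)
print(name_totals.most_common(2))
print(distinct_names_by_sex(rows))
```

    The same two helpers would apply unchanged to rows read from the real files, provided the columns are mapped into this order first.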

  3. USA Names

    • console.cloud.google.com
    Updated Jul 15, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Social%20Security%20Administration&hl=de&inv=1&invt=Ab2mjA (2023). USA Names [Dataset]. https://console.cloud.google.com/marketplace/product/social-security-administration/us-names?hl=de
    Explore at:
    Dataset updated
    Jul 15, 2023
    Dataset provided by
    Google (http://google.com/)
    Area covered
    United States
    Description

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  4. US state county name & codes

    • kaggle.com
    Updated Jun 6, 2017
    Cite
    VivekMangipudi (2017). US state county name & codes [Dataset]. https://www.kaggle.com/stansilas/us-state-county-name-codes/code
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 6, 2017
    Dataset provided by
    Kaggle
    Authors
    VivekMangipudi
    Area covered
    United States
    Description

    Context

    There is no story behind this data.

    These are just supplementary datasets that I plan on using for plotting county-wise data on maps (in particular with my kernel: https://www.kaggle.com/stansilas/maps-are-beautiful-unemployment-is-not/), as that dataset didn't have the info I needed for plotting an interactive map using highcharter.

    Content

    I noticed that most demographic datasets here on Kaggle have either state code, state name, or county name + state name, but not all of it, i.e. county name, FIPS code, state name + state code.

    Using these two datasets one can get any combination of state county codes etc.

    States.csv has State name + code
    US counties.csv has county wise data.
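
    One way the two files could be combined is a lookup join on the state code, sketched below. The column names (`state_name`, `state_code`, `county_name`, `fips`) and the inline sample contents are assumptions made for illustration; the real States.csv and US counties.csv may be laid out differently.

```python
import csv
import io

# Hypothetical file contents; the real States.csv and US counties.csv
# may use different column names and layouts.
STATES_CSV = "state_name,state_code\nAlabama,AL\nAlaska,AK\n"
COUNTIES_CSV = (
    "county_name,fips,state_code\n"
    "Autauga County,01001,AL\n"
    "Anchorage Municipality,02020,AK\n"
)

def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by header name."""
    return list(csv.DictReader(io.StringIO(text)))

def join_counties_to_states(counties, states):
    """Attach the state name to every county row via the shared state code."""
    name_by_code = {s["state_code"]: s["state_name"] for s in states}
    return [
        {**county, "state_name": name_by_code.get(county["state_code"])}
        for county in counties
    ]

joined = join_counties_to_states(read_rows(COUNTIES_CSV), read_rows(STATES_CSV))
for row in joined:
    print(row["fips"], row["county_name"], row["state_name"], row["state_code"])
```

    FIPS codes are read as strings here so that leading zeros survive the join.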

    Acknowledgements

    Picture : https://unsplash.com/search/usa-states?photo=-RO2DFPl7wE
    Counties : https://www.census.gov/geo/reference/codes/cou.html
    State :

    Inspiration

    Not Applicable.

  5. Historic US Census - 1900

    • redivis.com
    application/jsonl +7
    Updated Jan 10, 2020
    + more versions
    Cite
    Stanford Center for Population Health Sciences (2020). Historic US Census - 1900 [Dataset]. http://doi.org/10.57761/mez6-j880
    Explore at:
    Available download formats: arrow, spss, avro, sas, application/jsonl, csv, parquet, stata
    Dataset updated
    Jan 10, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Feb 1, 1900 - Dec 31, 1900
    Area covered
    United States
    Description

    Documentation

    The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations (Ancestry.com and FamilySearch) and provide the largest and richest source of individual-level and household data.

    Historic data are scarce and often exist only in aggregate tables. The key advantage of the IPUMS data is the availability of individual- and household-level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data, as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.
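
    The mother's-age-at-birth calculation can be sketched as follows. The record layout (`person_id`, `age`, `mother_id`) is hypothetical and does not reflect the actual IPUMS variable names; since ages are in whole years, the result is approximate.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    person_id: int            # hypothetical identifier within a household
    age: int                  # age in whole years at enumeration
    mother_id: Optional[int]  # hypothetical pointer to the mother's person_id

def mothers_age_at_birth(household):
    """Return {child_id: mother's age when that child was born}."""
    by_id = {p.person_id: p for p in household}
    result = {}
    for person in household:
        mother = by_id.get(person.mother_id)
        if mother is not None:
            # Both ages are measured on the same day, so the difference
            # is the mother's age when the child was born.
            result[person.person_id] = mother.age - person.age
    return result

# Made-up household: a 34-year-old mother with children aged 10 and 4.
household = [Person(1, 34, None), Person(2, 10, 1), Person(3, 4, 1)]
print(mothers_age_at_birth(household))
```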

    In sum: the IPUMS data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

    The IPUMS 1900 census data was collected in June 1900. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data was collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.

    Section 2

    This dataset was created on 2020-01-10 22:51:40.810 by merging multiple datasets together. The source datasets for this version were:

    IPUMS 1900 households: This dataset includes all households from the 1900 US census.

    IPUMS 1900 persons: This dataset includes all individuals from the 1900 US census.

    IPUMS 1900 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1900 datasets.

  6. Census Data

    • catalog.data.gov
    • datadiscoverystudio.org
    • +3more
    Updated Mar 1, 2024
    Cite
    U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
    Explore at:
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    United States Census Bureau (http://census.gov/)
    Description

    The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands.

    The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed database file named TABLES.ZIP. The uncompressed file is in FoxPro database file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats.

    While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed, the tables are ready for use with FoxPro and can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level.

    The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
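
    The file naming convention described above can be sketched as a small parser. It assumes the three parts are concatenated without separators and that files carry a `.dbf` extension; the example name `NYBGPOP1.dbf` is hypothetical and not taken from the source.

```python
# Content suffixes named in the description above.
KNOWN_CONTENTS = {"GEO", "POP1", "POP2", "HU", "MA05"}

def parse_block_group_file_name(file_name):
    """Split a name such as 'NYBGPOP1.dbf' into state, level and contents."""
    stem = file_name.rsplit(".", 1)[0].upper()
    state, level, contents = stem[:2], stem[2:4], stem[4:]
    if level != "BG":
        raise ValueError(f"expected 'BG' after the state abbreviation: {file_name!r}")
    if contents not in KNOWN_CONTENTS:
        raise ValueError(f"unrecognized contents suffix: {contents!r}")
    return {"state": state, "level": level, "contents": contents}

print(parse_block_group_file_name("NYBGPOP1.dbf"))
```

    Records from the five tables of one state could then be matched on LOGRECNO, which the description names as the linking key.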

  7. US Census Demographic Data

    • kaggle.com
    zip
    Updated Mar 3, 2019
    Cite
    MuonNeutrino (2019). US Census Demographic Data [Dataset]. https://www.kaggle.com/muonneutrino/us-census-demographic-data
    Explore at:
    Available download formats: zip (11110116 bytes)
    Dataset updated
    Mar 3, 2019
    Authors
    MuonNeutrino
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    This dataset expands on my earlier New York City Census Data dataset. It includes data from the entire country instead of just New York City. The expanded data will allow for much more interesting analyses and will also be much more useful at supporting other data sets.

    Content

    The data here are taken from the DP03 and DP05 tables of the 2015 American Community Survey 5-year estimates. The full datasets and much more can be found at the American Factfinder website. Currently, I include two data files:

    1. acs2015_census_tract_data.csv: Data for each census tract in the US, including DC and Puerto Rico.
    2. acs2015_county_data.csv: Data for each county or county equivalent in the US, including DC and Puerto Rico.

    The two files have the same structure, with just a small difference in the name of the id column. Counties are political subdivisions, and the boundaries of some have been set for centuries. Census tracts, however, are defined by the census bureau and will have a much more consistent size. A typical census tract has around 5000 or so residents.

    The Census Bureau updates the estimates approximately every year. At least some of the 2016 data is already available, so I will likely update this in the near future.

    Acknowledgements

    The data here were collected by the US Census Bureau. As a product of the US federal government, this is not subject to copyright within the US.

    Inspiration

    There are many questions that we could try to answer with the data here. Can we predict things such as the state (classification) or household income (regression)? What kinds of clusters can we find in the data? What other datasets can be improved by the addition of census data?

  8. Consumers who saw or heard of the movie “Call Me by Your Name” in the U.S....

    • statista.com
    Updated Jan 5, 2023
    Cite
    Statista (2023). Consumers who saw or heard of the movie “Call Me by Your Name” in the U.S. 2018 [Dataset]. https://www.statista.com/statistics/805600/public-awareness-call-me-by-your-name/
    Explore at:
    Dataset updated
    Jan 5, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 25, 2018 - Feb 27, 2018
    Area covered
    United States
    Description

    The statistic presents data on the share of consumers who have seen, want to see, or at least heard of the movie “Call Me by Your Name” in the United States as of February 2018. During a survey, nine percent of respondents stated they wanted to see the movie “Call Me by Your Name”.

  9. Geonames - All Cities with a population > 1000

    • public.opendatasoft.com
    • data.smartidf.services
    • +1more
    csv, excel, geojson +1
    Updated Mar 10, 2024
    + more versions
    Cite
    (2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
    Explore at:
    Available download formats: csv, json, geojson, excel
    Dataset updated
    Mar 10, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All cities with a population > 1000 or seats of adm div (ca 80.000)

    Sources and Contributions
    Sources: GeoNames is aggregating over a hundred different data sources.
    Ambassadors: GeoNames Ambassadors help in many countries.
    Wiki: A wiki allows users to view the data, quickly fix errors and add missing places.
    Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

    Enrichment: add country name

  10. Dataset of books called The American polity : the people and their...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called The American polity : the people and their government [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+American+polity+%3A+the+people+and+their+government
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is The American polity : the people and their government. It features 7 columns including author, publication date, language, and book publisher.

  11. Popular White Last Names in the US

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Popular White Last Names in the US [Dataset]. https://www.johnsnowlabs.com/marketplace/popular-white-last-names-in-the-us/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    This dataset represents the popular last names in the United States for White.

  12. 🏥🏥US healthcare providers by cities 💊💊

    • kaggle.com
    Updated Nov 1, 2023
    Cite
    Shiv_D24Coder (2023). 🏥🏥US healthcare providers by cities 💊💊 [Dataset]. https://www.kaggle.com/datasets/shivd24coder/us-healthcare-providers-by-cities
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 1, 2023
    Dataset provided by
    Kaggle
    Authors
    Shiv_D24Coder
    License

    https://www.usa.gov/government-works/

    Area covered
    United States
    Description

    Key Features

    Column Name: Description
    city_name: The name of the city where healthcare providers are located.
    result_count: The count of healthcare providers in the city.
    results: Details of healthcare providers in the city.
    created_epoch: The epoch timestamp when the provider's information was created.
    enumeration_type: The type of enumeration for the provider (e.g., NPI-1, NPI-2).
    last_updated_epoch: The epoch timestamp when the provider's information was last updated.
    number: The unique identifier for the healthcare provider.
    addresses: Information about the provider's addresses, including mailing and location addresses.
    country_code: The country code for the provider's address (e.g., US for the United States).
    country_name: The country name for the provider's address.
    address_purpose: The purpose of the address (e.g., MAILING, LOCATION).
    address_type: The type of address (e.g., DOM - Domestic).
    address_1: The first line of the provider's address.
    address_2: The second line of the provider's address.
    city: The city where the provider is located.
    state: The state where the provider is located.
    postal_code: The postal code or ZIP code for the provider's location.
    telephone_number: The telephone number for the provider's contact.
    practiceLocations: Details about the provider's practice locations.
    basic: Basic information about the provider, including their name, credentials, and gender.
    first_name: The first name of the healthcare provider.
    last_name: The last name of the healthcare provider.
    middle_name: The middle name of the healthcare provider.
    credential: The credential of the healthcare provider (e.g., PT, DPT).
    sole_proprietor: Indicates whether the provider is a sole proprietor (e.g., YES, NO).
    gender: The gender of the healthcare provider (e.g., M, F).
    enumeration_date: The date when the provider's enumeration was recorded.
    last_updated: The date when the provider's information was last updated.
    taxonomies: Information about the provider's taxonomies, including code, description, state, license, and primary designation.
    identifiers: Additional identifiers for the healthcare provider.
    endpoints: Information about communication endpoints for the provider.
    other_names: Any other names associated with the healthcare provider.

    How to use this Dataset

    1. Healthcare Provider Analysis: This dataset can be used to perform in-depth analyses of healthcare providers across various cities. You can extract insights into the distribution of different types of healthcare professionals, their practice locations, and their specialties. This information is valuable for healthcare workforce planning and resource allocation.

    2. Geospatial Mapping: Utilize the city names and addresses in the dataset to create geospatial visualizations. You can map the locations of healthcare providers in each city, helping stakeholders identify areas with potential shortages or surpluses of healthcare services.

    3. Provider Directory Development: The dataset provides detailed information about healthcare providers, including their names, contact details, and credentials. You can use this data to build a comprehensive healthcare provider directory or search tool, helping patients and healthcare organizations find and connect with the right providers in their area.

    If you find this dataset useful, give it an upvote – it's a small gesture that goes a long way! Thanks for your support. 😄

  13. Distribution of first name and last name frequencies by country

    • figshare.com
    xlsx
    Updated Feb 2, 2023
    Cite
    Mike Thelwall (2023). Distribution of first name and last name frequencies by country [Dataset]. http://doi.org/10.6084/m9.figshare.21956795.v2
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Feb 2, 2023
    Dataset provided by
    figshare
    Authors
    Mike Thelwall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Distribution of first and last name frequencies of academic authors by country.

    Spreadsheet 1 contains 50 countries, with names based on affiliations in Scopus journal articles 2001-2021.

    Spreadsheet 2 contains 200 countries, with names based on affiliations in Scopus journal articles 2001-2021, using a marginally updated last name extraction algorithm that is almost the same except for Dutch/Flemish names.

    From the paper: Can national researcher mobility be tracked by first or last name uniqueness?

    For example the distribution for the UK shows a single peak for international names, with no national names, Belgium has a national peak and an international peak, and China has mainly a national peak. The 50 countries are:

    No  Code  Country
    1   SB    Serbia
    2   IE    Ireland
    3   HU    Hungary
    4   CL    Chile
    5   CO    Colombia
    6   NG    Nigeria
    7   HK    Hong Kong
    8   AR    Argentina
    9   SG    Singapore
    10  NZ    New Zealand
    11  PK    Pakistan
    12  TH    Thailand
    13  UA    Ukraine
    14  SA    Saudi Arabia
    15  RO    Romania
    16  ID    Indonesia
    17  IL    Israel
    18  MY    Malaysia
    19  DK    Denmark
    20  CZ    Czech Republic
    21  ZA    South Africa
    22  AT    Austria
    23  FI    Finland
    24  PT    Portugal
    25  GR    Greece
    26  NO    Norway
    27  EG    Egypt
    28  MX    Mexico
    29  BE    Belgium
    30  CH    Switzerland
    31  SW    Sweden
    32  PL    Poland
    33  TW    Taiwan
    34  NL    Netherlands
    35  TK    Turkey
    36  IR    Iran
    37  RU    Russia
    38  AU    Australia
    39  BR    Brazil
    40  KR    South Korea
    41  ES    Spain
    42  CA    Canada
    43  IT    Italy
    44  FR    France
    45  IN    India
    46  DE    Germany
    47  US    USA
    48  UK    UK
    49  JP    Japan
    50  CN    China

  14. US Household Income Statistics

    • kaggle.com
    Updated Apr 16, 2018
    Cite
    Golden Oak Research Group (2018). US Household Income Statistics [Dataset]. https://www.kaggle.com/goldenoakresearch/us-household-income-stats-geo-locations/data
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 16, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Golden Oak Research Group
    Area covered
    United States
    Description

    New Upload:

    Added +32,000 more locations. For information on data calculations, please refer to the methodology PDF document. Information on how to calculate the data yourself is also provided, as well as how to buy data for $1.29.

    What you get:

    The database contains 32,000 records on US Household Income Statistics & Geo Locations. The field description of the database is documented in the attached PDF file. To access all 348,893 records on a scale roughly equivalent to a neighborhood (census tract), see the link below and make sure to upvote. Upvote right now, please. Enjoy!

    Household & Geographic Statistics:

    • Mean Household Income (double)
    • Median Household Income (double)
    • Standard Deviation of Household Income (double)
    • Number of Households (double)
    • Square area of land at location (double)
    • Square area of water at location (double)

    Geographic Location:

    • Longitude (double)
    • Latitude (double)
    • State Name (character)
    • State abbreviated (character)
    • State_Code (character)
    • County Name (character)
    • City Name (character)
    • Name of city, town, village or CDP (character)
    • Primary: defines if the location is a tract and block group.
    • Zip Code (character)
    • Area Code (character)

    Abstract

    The dataset was originally developed for real estate and business investment research. Income is a vital element when determining both quality and socioeconomic features of a given geographic location. The following data was derived from over 36,000 files and covers 348,893 location records.

    License

    Only proper citing is required; please see the documentation for details. Have Fun!!!

    Golden Oak Research Group, LLC. “U.S. Income Database Kaggle”. Publication: 5, August 2017. Accessed, day, month year.

    Sources, don't have 2 dollars? Get the full information yourself!

    2011-2015 ACS 5-Year Documentation was provided by the U.S. Census Reports. Retrieved August 2, 2017, from https://www2.census.gov/programs-surveys/acs/summary_file/2015/data/5_year_by_state/

    Found Errors?

    Please tell us so we may provide you the most accurate data possible. You may reach us at: research_development@goldenoakresearch.com

    For any questions, you can reach me at 585-626-2965.

    Please note: it is my personal number, and email is preferred.

    Check our data's accuracy: Census Fact Checker

    Access all 348,893 location records and more:

    Don't settle. Go big and win big. Optimize your potential. Overcome limitation and outperform expectation. Access all household income records on a scale roughly equivalent to a neighborhood, see link below:

    Website: Golden Oak Research Kaggle Deals all databases $1.29 Limited time only

    A small startup with big dreams, giving the everyday, up-and-coming data scientist professional-grade data at affordable prices. It's what we do.

  15. Popular Black Last Names in the US

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Popular Black Last Names in the US [Dataset]. https://www.johnsnowlabs.com/marketplace/popular-black-last-names-in-the-us/
    Explore at:
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    This dataset represents the popular last names in the United States for Black.

  16. 🏥 US Work-related injury

    • kaggle.com
    Updated Aug 14, 2023
    Cite
    mexwell (2023). 🏥 US Work-related injury [Dataset]. https://www.kaggle.com/datasets/mexwell/us-work-related-injury
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 14, 2023
    Dataset provided by
    Kaggle
    Authors
    mexwell
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Area covered
    United States
    Description

    The Occupational Safety and Health Administration (OSHA) collected work-related injury and illness data from employers within specific industry and employment size specifications from 2002 through 2011. This data collection is called the OSHA Data Initiative or ODI. The data provided is used by OSHA to calculate establishment-specific injury and illness incidence rates. This searchable database contains a table with the name, address, industry, and associated Total Case Rate (TCR), Days Away, Restricted, and Transfer (DART) case rate, and the Days Away From Work (DAFWII) case rate for the establishments that provided OSHA with valid data for calendar years 2002 through 2011.

    This data has been sampled down from its original size to 4%. In addition, the original dataset only has data from a small portion of all private sector establishments in the United States (80,000 out of 7.5 million total establishments). Therefore, these data are not representative of all businesses, and general conclusions pertaining to all US businesses should not be overstated.

    Data quality: While OSHA takes multiple steps to ensure the data collected is accurate, problems and errors invariably exist for a small percentage of establishments. OSHA does not believe the data for the establishments with the highest rates on this file are accurate in absolute terms. Efforts were made during the collection cycle to correct submission errors; however, some remain unresolved. It would be a mistake to say establishments with the highest rates on this file are the 'most dangerous' or 'worst' establishments in the Nation.

    Rate Calculation: An incidence rate of injuries and illnesses is computed from the following formula: (Number of injuries and illnesses × 200,000) / Employee hours worked = Incidence rate. The Total Case Rate includes all cases recorded on the OSHA Form 300 (Column G + Column H + Column I + Column J). The Days Away/Restricted/Transfer rate includes cases recorded in Column H + Column I. The Days Away rate includes cases recorded in Column H. For further information on injury and illness incidence rates, please visit the Bureau of Labor Statistics' webpage at http://www.bls.gov/iif/osheval.htm

    State Participation: Not all state plan states participate in the ODI. The following states did not participate in the 2010 ODI (collection of CY 2009 data), so establishment data is not available for these states: Alaska; Oregon; Puerto Rico; South Carolina; Washington; Wyoming.
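
The rate calculation described above can be sketched in a few lines of Python. This is only an illustration of the stated formula and column sums; the function names, argument names, and the sample counts are hypothetical and do not come from the dataset.

```python
def incidence_rate(cases: float, hours_worked: float) -> float:
    """Apply the stated formula: (cases * 200,000) / employee hours worked."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    return cases * 200_000 / hours_worked


def osha_rates(col_g: int, col_h: int, col_i: int, col_j: int,
               hours_worked: float) -> dict:
    """Compute the three rates from OSHA Form 300 column totals.

    Column groupings follow the description above:
    total case rate = G + H + I + J, DART = H + I, days away = H.
    """
    return {
        "total_case_rate": incidence_rate(col_g + col_h + col_i + col_j, hours_worked),
        "dart_rate": incidence_rate(col_h + col_i, hours_worked),
        "days_away_rate": incidence_rate(col_h, hours_worked),
    }


# Hypothetical establishment: 10 recorded cases over 400,000 employee hours.
rates = osha_rates(col_g=0, col_h=3, col_i=2, col_j=5, hours_worked=400_000)
print(rates)
```

The 200,000 multiplier corresponds to 100 full-time employees working 40 hours a week for 50 weeks, so each rate reads as cases per 100 full-time workers per year.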

    Data Dictionary

    Key                                      | List of... | Comment | Example Value
    year                                     | Integer    | -       | 2002
    address.city                             | String     | -       | "Cherry Hill"
    address.state                            | String     | -       | "NJ"
    address.street                           | String     | -       | "100 Dobbs Ln Ste 102"
    address.zip                              | Integer    | -       | 8034
    business.name                            | String     | -       | "United States Cold Storage"
    business.second name                     | String     | -       | "US Cold"
    industry.division                        | String     | -       | "Transportation, Communications, Electric, Gas, And Sanitary Services"
    industry.id                              | Integer    | -       | 4222
    industry.label                           | String     | -       | "Refrigerated Warehousing and Storage"
    industry.major_group                     | String     | -       | "Motor Freight Transportation And Warehousing"
    statistics.days away                     | Float      | -       | 0.0
    statistics.days away/restricted/transfer | Float      | -       | 0.0
    statistics.total case rate               | Float      | -       | 0.0

    Acknowledgement

    Original Data

    CORGIS Dataset Project

    Photo by National Cancer Institute on Unsplash

  17. Places - United States of America

    • public.opendatasoft.com
    • data.smartidf.services
    • +1 more
    csv, excel, geojson +1
    Updated Jun 6, 2024
    (2024). Places - United States of America [Dataset]. https://public.opendatasoft.com/explore/dataset/georef-united-states-of-america-place/
    Explore at:
    Available download formats: geojson, csv, json, excel
    Dataset updated
    Jun 6, 2024
    License

    https://en.wikipedia.org/wiki/Public_domain

    Area covered
    United States
    Description

    This dataset is part of the Geographical repository maintained by Opendatasoft. This dataset contains data for places and equivalent entities in the United States of America. This layer includes both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. Processors and tools are using this data.

    Enhancements: Add ISO 3166-3 codes. Simplify geometries to provide better performance across the services. Add administrative hierarchy.

  18. Dataset of books called American Indian and African American people,...

    • workwithdata.com
    Updated Apr 17, 2025
    + more versions
    Work With Data (2025). Dataset of books called American Indian and African American people, communities, and interactions : an annotated bibliography [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=American+Indian+and+African+American+people%2C+communities%2C+and+interactions+%3A+an+annotated+bibliography
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is American Indian and African American people, communities, and interactions : an annotated bibliography. It features 7 columns including author, publication date, language, and book publisher.

  19. United States COVID-19 Community Levels by County

    • healthdata.gov
    • data.virginia.gov
    • +1 more
    application/rdfxml +5
    Updated Mar 8, 2022
    + more versions
    data.cdc.gov (2022). United States COVID-19 Community Levels by County [Dataset]. https://healthdata.gov/dataset/United-States-COVID-19-Community-Levels-by-County/nn5b-j5u9
    Explore at:
    Available download formats: application/rssxml, json, tsv, csv, xml, application/rdfxml
    Dataset updated
    Mar 8, 2022
    Dataset provided by
    data.cdc.gov
    Area covered
    United States
    Description

    Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.

    This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.

    The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.

    Using these data, the COVID-19 community level was classified as low, medium, or high.
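
The classification rule described above can be sketched in Python. The numeric cutoffs used here (200 cases per 100,000; 10 and 20 admissions per 100,000; 10% and 15% bed occupancy) are not stated in this listing. They are recalled from CDC's published Community Levels framework and should be treated as assumptions to check against CDC documentation. The function and parameter names are hypothetical.

```python
LEVELS = ["low", "medium", "high"]


def _band(value: float, cutoffs: tuple) -> int:
    """Return how many of the ascending cutoffs the value meets or exceeds."""
    return sum(value >= c for c in cutoffs)


def community_level(cases_per_100k: float,
                    admissions_per_100k: float,
                    inpatient_bed_pct: float) -> str:
    """Classify a county as low, medium, or high.

    The case rate selects which set of cutoffs applies; the final level
    is the higher of the admissions band and the inpatient-beds band.
    All cutoffs below are assumed values, not taken from this dataset page.
    """
    if cases_per_100k < 200:
        admissions_band = _band(admissions_per_100k, (10.0, 20.0))
        beds_band = _band(inpatient_bed_pct, (10.0, 15.0))
    else:
        # With a high case rate there is no "low" outcome: start at medium.
        admissions_band = 1 + _band(admissions_per_100k, (10.0,))
        beds_band = 1 + _band(inpatient_bed_pct, (10.0,))
    return LEVELS[max(admissions_band, beds_band)]


# Hypothetical county values, for illustration only.
print(community_level(50, 5, 3))
print(community_level(250, 5, 3))
```

Taking the maximum of the two bands is what "the higher of the new admissions and inpatient beds metrics" amounts to in code.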

    COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.

    For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.

    Archived Data Notes:

    This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.

    March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previously released.

    March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.

    March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.

    March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.

    March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).

    March 31, 2022: Data values for columns, “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added since the week of 3/3/2022, thus the values were previously missing for records released the week prior.

    April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.

    April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials t

  20. Places

    • catalog.data.gov
    • datasets.ai
    • +1 more
    Updated Sep 20, 2024
    United States Census Bureau (USCB) (Point of Contact) (2024). Places [Dataset]. https://catalog.data.gov/dataset/places2
    Explore at:
    Dataset updated
    Sep 20, 2024
    Dataset provided by
    United States Census Bureau http://census.gov/
    Description

    The Places dataset was published on August 31, 2022 from the United States Census Bureau (USCB) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation.

    The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions.

    CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population.

    The boundaries of most incorporated places in this shapefile are as of January 1, 2022, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2020 Census, but some CDPs were added or updated through the 2022 BAS as well.

Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datagov/usa-names

USA Name Data

USA Name Data (BigQuery Dataset)

Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov https://data.gov/
License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered
United States
Description

Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
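
As a starting point for the questions above, here is a minimal Python sketch that uses only the standard library. It assumes each record carries a name, a gender code ("F" or "M"), and an occurrence count; those field names and the sample rows are hypothetical placeholders, not values taken from the dataset.

```python
from collections import Counter

# Hypothetical sample rows: (name, gender, number of occurrences).
rows = [
    ("Mary", "F", 120),
    ("James", "M", 110),
    ("Mary", "F", 80),
    ("Linda", "F", 90),
    ("John", "M", 100),
    ("James", "M", 40),
    ("Patricia", "F", 70),
]


def top_names(rows, gender=None, n=3):
    """Sum occurrences per name, optionally for one gender, and return the top n."""
    totals = Counter()
    for name, g, number in rows:
        if gender is None or g == gender:
            totals[name] += number
    return totals.most_common(n)


def distinct_names_by_gender(rows):
    """Count how many different names appear for each gender code."""
    names = {}
    for name, g, _ in rows:
        names.setdefault(g, set()).add(name)
    return {g: len(found) for g, found in names.items()}


print(top_names(rows))              # most common names overall
print(top_names(rows, gender="F"))  # most common female names
print(distinct_names_by_gender(rows))
```

Comparing the per-gender counts from `distinct_names_by_gender` is one way to approach the question of whether there are more female or male names.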
