100+ datasets found
  1. d

    Popular Baby Names

    • catalog.data.gov
    • data.cityofnewyork.us
    • +5more
    Updated Jul 12, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.cityofnewyork.us (2025). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Popular Baby Names by Sex and Ethnic Group Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.

  2. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    Updated Jul 4, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Social Security Administrationhttp://ssa.gov/
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.

  3. Most frequent combinations of first and last name in the United States

    • statista.com
    Updated Nov 16, 2013
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2013). Most frequent combinations of first and last name in the United States [Dataset]. https://www.statista.com/statistics/279713/frequent-combinations-of-first-and-last-name-in-the-us/
    Explore at:
    Dataset updated
    Nov 16, 2013
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    2013
    Area covered
    United States
    Description

    This statistic shows the most frequent combinations of first name and last name in the United States, as of 2013. According to this ranking, the name "James Smith" occurs most often and is most popular in the United States.

  4. Baby Names

    • kaggle.com
    zip
    Updated Feb 9, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Evan Zhang (2021). Baby Names [Dataset]. https://www.kaggle.com/datasets/ironicninja/baby-names
    Explore at:
    zip(5656233 bytes)Available download formats
    Dataset updated
    Feb 9, 2021
    Authors
    Evan Zhang
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Dataset of US baby names from 1910 to 2021. Includes State, Sex, Year, Name, and Count as features.

    Inspiration

    Mainly used for a tutorial but can be used for classification/other visualizations.

  5. d

    Protected Areas Database of the United States (PAD-US) 2.1

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 19, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 2.1 [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-2-1
    Explore at:
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    United States Geological Surveyhttp://www.usgs.gov/
    Area covered
    United States
    Description

    NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 3.0 https://doi.org/10.5066/P9Q9LQ4B. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme (https://communities.geoplatform.gov/ngda-cadastre/). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using over twenty-five attributes and five feature classes representing the U.S. protected areas network in separate feature classes: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. Five additional feature classes include various combinations of the primary layers (for example, Combined_Fee_Easement) to support data management, queries, web mapping services, and analyses. This PAD-US Version 2.1 dataset includes a variety of updates and new data from the previous Version 2.0 dataset (USGS, 2018 https://doi.org/10.5066/P955KPLE ), achieving the primary goal to "Complete the PAD-US Inventory by 2020" (https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-vision) by addressing known data gaps with newly available data. The following list summarizes the integration of "best available" spatial data to ensure public lands and other protected areas from all jurisdictions are represented in PAD-US, along with continued improvements and regular maintenance of the federal theme. Completing the PAD-US Inventory: 1) Integration of over 75,000 city parks in all 50 States (and the District of Columbia) from The Trust for Public Land's (TPL) ParkServe data development initiative (https://parkserve.tpl.org/) added nearly 2.7 million acres of protected area and significantly reduced the primary known data gap in previous PAD-US versions (local government lands). 2) First-time integration of the Census American Indian/Alaskan Native Areas (AIA) dataset (https://www2.census.gov/geo/tiger/TIGER2019/AIANNH) representing the boundaries for federally recognized American Indian reservations and off-reservation trust lands across the nation (as of January 1, 2020, as reported by the federally recognized tribal governments through the Census Bureau's Boundary and Annexation Survey) addressed another major PAD-US data gap. 3) Aggregation of nearly 5,000 protected areas owned by local land trusts in 13 states, aggregated by Ducks Unlimited through data calls for easements to update the National Conservation Easement Database (https://www.conservationeasement.us/), increased PAD-US protected areas by over 350,000 acres. Maintaining regular Federal updates: 1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/); 2) Complete National Marine Protected Areas (MPA) update: from the National Oceanic and Atmospheric Administration (NOAA) MPA Inventory, including conservation measure ('GAP Status Code', 'IUCN Category') review by NOAA; Other changes: 1) PAD-US field name change - The "Public Access" field name changed from 'Access' to 'Pub_Access' to avoid unintended scripting errors associated with the script command 'access'. 2) Additional field - The "Feature Class" (FeatClass) field was added to all layers within PAD-US 2.1 (only included in the "Combined" layers of PAD-US 2.0 to describe which feature class data originated from). 3) Categorical GAP Status Code default changes - National Monuments are categorically assigned GAP Status Code = 2 (previously GAP 3), in the absence of other information, to better represent biodiversity protection restrictions associated with the designation. The Bureau of Land Management Areas of Environmental Concern (ACECs) are categorically assigned GAP Status Code = 3 (previously GAP 2) as the areas are administratively protected, not permanent. More information is available upon request. 4) Agency Name (FWS) geodatabase domain description changed to U.S. Fish and Wildlife Service (previously U.S. Fish & Wildlife Service). 5) Select areas in the provisional PAD-US 2.1 Proclamation feature class were removed following a consultation with the data-steward (Census Bureau). Tribal designated statistical areas are purely a geographic area for providing Census statistics with no land base. Most affected areas are relatively small; however, 4,341,120 acres and 37 records were removed in total. Contact Mason Croft (masoncroft@boisestate) for more information about how to identify these records. For more information regarding the PAD-US dataset please visit, https://usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation please review the Online PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual .

  6. d

    Census Data

    • catalog.data.gov
    • data.globalchange.gov
    • +3more
    Updated Mar 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
    Explore at:
    Dataset updated
    Mar 1, 2024
    Dataset provided by
    U.S. Bureau of the Census
    Description

    The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed data base file named TABLES.ZIP. The uncompressed is in FoxPro data base file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats. While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed the tables are ready for use with FoxPro and they can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level. The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents . The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table is total population and total housing units. POP1 and POP2 contain selected population variables and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES, and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.

  7. US geographic names 2017-06-01

    • figshare.com
    application/gzip
    Updated Jun 4, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jacob Malcom (2023). US geographic names 2017-06-01 [Dataset]. http://doi.org/10.6084/m9.figshare.4897124.v1
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Jacob Malcom
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A text database of named places in the United States. The US Board on Geographic Names controls the database of official names of places in the US, and the US Geological Survey (USGS) maintains the database. This is a copy of the 2017-06-01 database, which I am using to create an R package for textual analyses of geographic content, to ensure this version remains. The original source was from: https://geonames.usgs.gov/domestic/download_data.htm (which is a very slow server).I have chosen CC0 for the license because, as a creation of the US government, I don't think the database can be copyrighted (and CC0 is the closest match).

  8. e

    LinkedIn US city ID's

    • datarepository.eur.nl
    txt
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Paul Kievits (2023). LinkedIn US city ID's [Dataset]. http://doi.org/10.25397/eur.19932221.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Erasmus University Rotterdam (EUR)
    Authors
    Paul Kievits
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This is a very small but useful dataset if you are ever looking to get jobs for a certain US city in LinkedIn. It contains a list of US cities and states and it's corresponding LinkedIn ID (which is usually externally hidden).

    The cities list was retreived from here: https://github.com/kelvins/US-Cities-Database and the names of the ciiadjusted to match the name used in LinkedIn (which could differ in subtle ways).

    Some cities do not have an ID, this is because the city is either too small or because there was a difference in the name on LinkedIn which I did not detect (human error). If you ever run in to one of these feel free to enhance this dataset.

  9. w

    US Census Bureau TIGER data

    • data.wu.ac.at
    Updated Oct 10, 2013
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Global (2013). US Census Bureau TIGER data [Dataset]. https://data.wu.ac.at/odso/datahub_io/YmFkOGRmNGYtNmEyZS00ZTQ5LTk0NmMtNzk1MTE1OThhOGQ1
    Explore at:
    Dataset updated
    Oct 10, 2013
    Dataset provided by
    Global
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Area covered
    United States
    Description

    The US government's 'Topologically Integrated Geographic Encoding and Referencing' system, usually referred to as TIGER, is based on an extensive database of US geographic information. It is county-level data that documents physical features like roads and rivers, as well as some administrative features such as Congressional districts. Data can be downloaded for each state and for each of the following: Puerto Rico, American Samoa, Guam, Northern Mariana Islands, Midway Island, and the US Virgin Islands.

    The database does not contain demographic or topographic (terrain) data; the "topology" referenced in the name refers to how the database itself is designed.

    Only recent versions are available online for free download. Note also that the Census Bureau is making substantial changes to how the TIGER system is formatted, which should allow for wider and more effective compatibility with GIS tools. A good overview can be seen at this page

  10. d

    Alesco Phone ID Database - Phone Data with over 860 Million Phone Number...

    • datarade.ai
    .csv, .xls, .txt
    Updated Jul 5, 2018
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alesco Data (2018). Alesco Phone ID Database - Phone Data with over 860 Million Phone Number with Carrier Name, covers 94% of the US population - available for licensing! [Dataset]. https://datarade.ai/data-products/alesco-phone-id-database-the-industry-s-largest-and-most-ac-alesco-data
    Explore at:
    .csv, .xls, .txtAvailable download formats
    Dataset updated
    Jul 5, 2018
    Dataset authored and provided by
    Alesco Data
    Area covered
    United States
    Description

    The Alesco Phone ID Database data ties together a consumer's true identity, and with linkage to the Alesco Power Identity Graph, we are perfectly positioned to help customers solve today's most challenging marketing, analytics, and identity resolution problems.

    Our proprietary Phone ID database combines public and private sources and validates phone numbers against current and historical data 24 hours a day, 365 days a year.

    With over 650 million unique phone numbers, device and service information, our one-of-a-kind solutions are now available for your marketing and identity resolution challenges in both B2C and B2B applications!

    • Alesco Phone ID provides more than 860 million phone numbers monthly linked to a consumer or business name and includes landline, mobile phone number, VoIP, private and business phone numbers — all permissibly obtained and privacy-compliant and linked to other Alesco data sets

    • How we do it: Alesco Phone ID is multi-sourced with daily information and delivered monthly or quarterly to clients. Our proprietary machine learning and advanced analytics processes ensure quality levels far above industry standards. Alesco processes over 100 million phone signals per day, compiling, normalizing, and standardizing phone information from 37 input sources.

    • Accuracy: Each of Alesco’s phone data sources are vetted to ensure they are authoritative, giving you confidence in the accuracy of the information. Every record is validated, verified and processed to ensure the widest, most reliable coverage combined with stunning precision.

    Ease of use: Alesco’s Phone ID Database is available as an on-premise phone database license, giving you full control to host and access this powerful resource on-site. Ongoing updates are provided on a monthly basis ensure your data is up to date.

  11. d

    Email Address Data| Email Database | US Consumers | 650 million Consumer...

    • datarade.ai
    .csv, .txt
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Stirista, Email Address Data| Email Database | US Consumers | 650 million Consumer Email Addresses [Dataset]. https://datarade.ai/data-products/email-address-data-email-database-us-consumers-564-milli-stirista
    Explore at:
    .csv, .txtAvailable download formats
    Dataset authored and provided by
    Stirista
    Area covered
    United States of America
    Description

    Andrew Wharton's Actionable US Consumer Email Database hosts over 650 million email addresses that have been active within the last 36 months. This database is fully CAN-SPAM compliant and 100% opted-in for Third Party Use.

    This Email Address database successfully connects you with your customers and/or prospects at their most recent, deliverable online address. and Increase impression rates, deliverability, and engagement in your digital campaigns.

    The Email Address Data is 100% populated with email address, HEMS (MD5, Sha1, Sha256) first name, last name, postal address (primary and secondary), IP Address, Time Stamp(s) for Last Registration, Verification, and First Seen. An enhanced version of the database is available with Date-of-Birth (where available), Phone (mobile and landline) and MAIDs to Hashed email conversion.

    The Andrews Wharton Actionable US Consumer Email Database is updated monthly. A complete replacement database or new adds are available as update files.

    Contact us at successdelivered@andrewswharton.com or visit us at www.andrewswharton.com to learn more about this dataset.

  12. US National Baby Names (1880-2017)

    • kaggle.com
    Updated Sep 14, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hassen Morad (2018). US National Baby Names (1880-2017) [Dataset]. https://www.kaggle.com/hassenmorad/us-national-baby-names-18802017/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 14, 2018
    Dataset provided by
    Kaggle
    Authors
    Hassen Morad
    Description

    This is an updated version of the national baby names database, containing records from 1880-2017.

  13. u

    2020 New Mexico Designated Places

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Sep 10, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Earth Data Analysis Center (2021). 2020 New Mexico Designated Places [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/3fb87757-7bcf-4941-b7b8-660e32a8432d/metadata/FGDC-STD-001-1998.html
    Explore at:
    shp(5), xls(5), json(5), kml(5), zip(3), gml(5), geojson(5), csv(5)Available download formats
    Dataset updated
    Sep 10, 2021
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    May 23, 2020
    Area covered
    and feature names., Geographic Names Information System (GNIS), Federal Information Processing Series (FIPS), New Mexico, West Bounding Coordinate -109.049169 East Bounding Coordinate -103.043557 North Bounding Coordinate 37.000129 South Bounding Coordinate 31.783667, United States
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. The boundaries of most incorporated places in this shapefile are as of January 1, 2020, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2010 Census.

  14. Places

    • catalog.data.gov
    • geodata.bts.gov
    • +1more
    Updated Oct 21, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). Places [Dataset]. https://catalog.data.gov/dataset/places2
    Explore at:
    Dataset updated
    Oct 21, 2025
    Dataset provided by
    United States Census Bureauhttp://census.gov/
    United States Department of Commercehttp://commerce.gov/
    Description

    The Places dataset was published on September 22, 2025 from the U.S. Department of Commerce, U.S. Census Bureau, Geography Division and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state but may extend across county and county subdivision boundaries. An incorporated place is usually a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs are often defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. The boundaries of most incorporated places in this shapefile are as of January 1, 2024, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2020 Census, but some CDPs were added or updated through the 2024 BAS as well. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529072

  15. H

    Race and ethnicity data for first, middle, and last names

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 11, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Evan Rosenman; Santiago Olivella; Kosuke Imai (2023). Race and ethnicity data for first, middle, and last names [Dataset]. http://doi.org/10.7910/DVN/SGKW0K
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Evan Rosenman; Santiago Olivella; Kosuke Imai
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We provide datasets that that estimate the racial distributions associated with first, middle, and last names in the United States. The datasets cover five racial categories: White, Black, Hispanic, Asian, and Other. The provided data are computed from the voter files of six Southern states -- Alabama, Florida, Georgia, Louisiana, North Carolina, and South Carolina -- that collect race and ethnicity data upon registration. We include seven voter files per state, sourced between 2018 and 2021 from L2, Inc. Together, these states have approximately 36MM individuals who provide self-reported race and ethnicity. The last name datasets includes 338K surnames, while the middle name dictionaries contains 126K middle names and the first name datasets includes 136K first names. For each type of name, we provide a dataset of P(race | name) probabilities and P(name | race) probabilities. We include only names that appear at least 25 times across the 42 (= 7 voter files * 6 states) voter files in our dataset. These data are closely related to the the dataset: "Name Dictionaries for "wru" R Package", https://doi.org/10.7910/DVN/7TRYAC. These are the probabilities used in the latest iteration of the "WRU" package (Khanna et al., 2022) to make probabilistic predictions about the race of individuals, given their names and geolocations.

  16. u

    Census MAF/TIGER database

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Oct 4, 2018
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Earth Data Analysis Center (2018). Census MAF/TIGER database [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/241d8727-e304-4e46-9a96-37ba602c79e6/metadata/FGDC-STD-001-1998.html
    Explore at:
    json(5), zip(2), shp(5), kml(5), csv(5), geojson(5), xls(5), gml(5)Available download formats
    Dataset updated
    Oct 4, 2018
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    Jun 2014
    Area covered
    Geographic Names Information System (GNIS), and feature names., Federal Information Processing Series (FIPS), West Bounding Coordinate -109.049169 East Bounding Coordinate -103.043557 North Bounding Coordinate 37.000014 South Bounding Coordinate 31.783148, United States, New Mexico
    Description

    The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. The boundaries of most incorporated places in this shapefile are as of January 1, 2015, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2010 Census.

  17. K

    US Places (Population 50K-100K)

    • koordinates.com
    csv, dwg, geodatabase +6
    Updated Feb 1, 2001
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    US Bureau of Transportation Statistics (BTS) (2001). US Places (Population 50K-100K) [Dataset]. https://koordinates.com/layer/22835-us-places-population-50k-100k/
    Explore at:
    dwg, geodatabase, pdf, mapinfo mif, kml, shapefile, geopackage / sqlite, csv, mapinfo tabAvailable download formats
    Dataset updated
    Feb 1, 2001
    Dataset authored and provided by
    US Bureau of Transportation Statistics (BTS)
    Area covered
    Description

    This data set includes cities in the United States, Puerto Rico and the U.S. Virgin Islands. These cities were collected from the 1970 National Atlas of the United States. Where applicable, U.S. Census Bureau codes for named populated places were associated with each name to allow additional information to be attached. The Geographic Names Information System (GNIS) was also used as a source for additional information. This is a revised version of the December, 2003, data set.

    This layer is sourced from maps.bts.dot.gov.

  18. n

    2020 Census Block Groups for Urban Search and Rescue - 764585

    • prep-response-portal.napsgfoundation.org
    • data-napsg.opendata.arcgis.com
    • +1more
    Updated Jun 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SARGeo (2025). 2020 Census Block Groups for Urban Search and Rescue - 764585 [Dataset]. https://prep-response-portal.napsgfoundation.org/datasets/sargeo::2020-census-block-groups-for-urban-search-and-rescue-764585
    Explore at:
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    SARGeo
    Area covered
    Description

    USA Census Block Groups (CBG) for Urban Search and Rescue. This layer can be used for search segment planning. Block groups generally contain between 600 and 5,000 people and the boundaries generally follow existing roads and waterways. The field segment_designation is the last 6 digits of the unique identifier and matches the field in the SARCOP Segment layer.Data download date: August 12, 2021Census tables: P1, P2, P3, P4, H1, P5, HeaderDownloaded from: Census FTP siteProcessing Notes:Data was downloaded from the U.S. Census Bureau FTP site, imported into SAS format and joined to the 2020 TIGER boundaries. Boundaries are sourced from the 2020 TIGER/Line Geodatabases. Boundaries have been projected into Web Mercator and each attribute has been given a clear descriptive alias name. No alterations have been made to the vertices of the data.Each attribute maintains it's specified name from Census, but also has a descriptive alias name and long description derived from the technical documentation provided by the Census. For a detailed list of the attributes contained in this layer, view the Data tab and select "Fields". The following alterations have been made to the tabular data:Joined all tables to create one wide attribute table:P1 - RaceP2 - Hispanic or Latino, and not Hispanic or Latino by RaceP3 - Race for the Population 18 Years and OverP4 - Hispanic or Latino, and not Hispanic or Latino by Race for the Population 18 Years and OverH1 - Occupancy Status (Housing)P5 - Group Quarters Population by Group Quarters Type (correctional institutions, juvenile facilities, nursing facilities/skilled nursing, college/university student housing, military quarters, etc.)HeaderAfter joining, dropped fields: FILEID, STUSAB, CHARITER, CIFSN, LOGRECNO, GEOVAR, GEOCOMP, LSADC, and BLOCK.GEOCOMP was renamed to GEOID and moved be the first column in the table, the original GEOID was dropped.Placeholder fields for future legislative districts have been dropped: CD118, CD119, CD120, CD121, SLDU22, SLDU24, SLDU26, SLDU28, SLDL22, SLDL24 SLDL26, SLDL28.P0020001 was dropped, as it is duplicative of P0010001. Similarly, P0040001 was dropped, as it is duplicative of P0030001.In addition to calculated fields, County_Name and State_Name were added.The following calculated fields have been added (see long field descriptions in the Data tab for formulas used): PCT_P0030001: Percent of Population 18 Years and OverPCT_P0020002: Percent Hispanic or LatinoPCT_P0020005: Percent White alone, not Hispanic or LatinoPCT_P0020006: Percent Black or African American alone, not Hispanic or LatinoPCT_P0020007: Percent American Indian and Alaska Native alone, not Hispanic or LatinoPCT_P0020008: Percent Asian alone, Not Hispanic or LatinoPCT_P0020009: Percent Native Hawaiian and Other Pacific Islander alone, not Hispanic or LatinoPCT_P0020010: Percent Some Other Race alone, not Hispanic or LatinoPCT_P0020011: Percent Population of Two or More Races, not Hispanic or LatinoPCT_H0010002: Percent of Housing Units that are OccupiedPCT_H0010003: Percent of Housing Units that are VacantPlease note these percentages might look strange at the individual block group level, since this data has been protected using differential privacy.* *To protect the privacy and confidentiality of respondents, data has been protected using differential privacy techniques by the U.S. Census Bureau. This means that some individual block groups will have values that are inconsistent or improbable. However, when aggregated up, these issues become minimized. The pop-up on this layer uses Arcade to display aggregated values for the surrounding area rather than values for the block group itself.Download Census redistricting data in this layer as a file geodatabase.Additional links:U.S. Census BureauU.S. Census Bureau Decennial CensusAbout the 2020 Census2020 Census2020 Census data qualityDecennial Census P.L. 94-171 Redistricting Data Program

  19. d

    Alesco Email Database - Email Address Data - 2.3+ Billion US email records -...

    • datarade.ai
    .csv, .xls, .txt
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Alesco Data, Alesco Email Database - Email Address Data - 2.3+ Billion US email records - available acquisition marketing and identify resolution! [Dataset]. https://datarade.ai/data-products/alesco-email-database-over-1-8-billion-us-email-records-alesco-data
    Explore at:
    .csv, .xls, .txtAvailable download formats
    Dataset authored and provided by
    Alesco Data
    Area covered
    United States of America
    Description

    Alesco’s aggregated consumer email database consists of over 2.3 billion U.S. records with name, address and email. The database is fully CAN-SPAM and privacy compliant, and records include referring URL, IP address and date stamp. Postal addresses are address standardized and processed through NCOA. Available for licensing!

    File size: 2.3 Billion IP Address: 1.9 Billion eAppend data: 1.8 Billion (full name/postal) Acquisition: 269 Million (full demo’s)

    Fields Included: Name Address Email Phone IP Address

    Consumer Behavior Data Consumer Demographic Data Identity Data Audience Data

  20. H

    Data from: Validated Names for Experimental Studies on Race and Ethnicity

    • dataverse.harvard.edu
    • search.dataone.org
    • +1more
    Updated Mar 22, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Charles Crabtree; Jae Yeon Kim; Michael S. Gaddis; John B. Holbein; Cameron Guage; William X. Marx (2022). Validated Names for Experimental Studies on Race and Ethnicity [Dataset]. http://doi.org/10.7910/DVN/JVCUQM
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 22, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Charles Crabtree; Jae Yeon Kim; Michael S. Gaddis; John B. Holbein; Cameron Guage; William X. Marx
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    A large and fast-growing number of studies across the social sciences use experiments to better understand the role of race in human interactions, particularly in the American context. Researchers often use names to signal the race of individuals portrayed in these experiments. However, those names might also signal other attributes, such as socioeconomic status (e.g., education and income) and citizenship. If they do, researchers need pre-tested names with data on perceptions of these attributes. Such data would permit researchers to draw correct inferences about the causal effect of race in their experiments. In this paper, we provide the largest dataset of validated name perceptions based on three different surveys conducted in the United States. In total, our data include over 44,170 name evaluations from 4,026 respondents for 600 names. In addition to respondent perceptions of race, income, education, and citizenship from names, our data also include respondent characteristics. Our data will be broadly helpful for researchers conducting experiments on the manifold ways in which race shapes American life.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
data.cityofnewyork.us (2025). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names

Popular Baby Names

Explore at:
4 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jul 12, 2025
Dataset provided by
data.cityofnewyork.us
Description

Popular Baby Names by Sex and Ethnic Group Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.

Search
Clear search
Close search
Google apps
Main menu