100+ datasets found

Popular Baby Names
catalog.data.gov
data.cityofnewyork.us
Updated Jul 12, 2025
data.cityofnewyork.us (2025). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
Dataset provided by
data.cityofnewyork.us
Description
Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in order of frequency, and the data can be used to represent the popularity of a name. Use caution when assessing the rank of a baby name whose frequency count is close to 10; the ranking may vary from year to year.
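The ranking described above can be sketched in Python. This is a minimal illustration, not the method used to produce the published ranks: it assumes a simple mapping from name to count for one sex/ethnic group and year, and it assumes names with equal counts share a rank (dense ranking). The helper name `rank_names` and the names and counts are hypothetical.

```python
from typing import Dict, List, Tuple

def rank_names(counts: Dict[str, int]) -> List[Tuple[int, str, int]]:
    """Return (rank, name, count) tuples ordered by descending count.

    Names with equal counts share a rank (dense ranking); this tie rule
    is an assumption for illustration.
    """
    # Sort by count (highest first), then alphabetically for a stable order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranked: List[Tuple[int, str, int]] = []
    rank = 0
    previous = None
    for name, count in ordered:
        if count != previous:
            rank += 1  # a new, lower count starts the next rank
            previous = count
        ranked.append((rank, name, count))
    return ranked

# Hypothetical counts for one sex/ethnic group in one year.
example = {"Olivia": 120, "Emma": 120, "Ava": 95, "Mia": 11, "Luna": 10}
for rank, name, count in rank_names(example):
    print(rank, name, count)
```

With small counts, a difference of one or two births can reorder several names, which is why ranks for counts near 10 deserve caution.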
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
Updated Jul 4, 2025
Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Dataset provided by
Social Security Administration (http://ssa.gov/)
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications from 1880 onward.
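Because each record carries name, year of birth, sex, and number, totals across years are a simple aggregation. The sketch below assumes yearly files whose names contain the year (e.g. `yob1880.txt`) and hold comma-separated `name,sex,number` lines; check that layout against the actual download before relying on it. The helper names and the numbers are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int, str, int]  # (name, year, sex, number)

def parse_year_file(filename: str, lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, year, sex, number) records from one yearly file.

    Assumes a file name containing the year (e.g. 'yob1880.txt') and
    comma-separated lines of the form 'name,sex,number'.
    """
    match = re.search(r"(\d{4})", filename)
    if match is None:
        raise ValueError(f"no four-digit year in file name: {filename!r}")
    year = int(match.group(1))
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sex, number = line.split(",")
        yield name, year, sex, int(number)

def totals_by_name_and_sex(records: Iterable[Record]) -> Dict[Tuple[str, str], int]:
    """Sum the number of applications for each (name, sex) across all years."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for name, _year, sex, number in records:
        totals[(name, sex)] += number
    return dict(totals)

# Hypothetical numbers, for illustration only.
records: List[Record] = list(parse_year_file("yob1880.txt", ["Mary,F,100", "John,M,80"]))
records += list(parse_year_file("yob1881.txt", ["Mary,F,90", "John,M,85"]))
print(totals_by_name_and_sex(records))
```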
Most frequent combinations of first and last name in the United States
statista.com
Updated Nov 16, 2013
Statista (2013). Most frequent combinations of first and last name in the United States [Dataset]. https://www.statista.com/statistics/279713/frequent-combinations-of-first-and-last-name-in-the-us/
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2013
Area covered
United States
Description
This statistic shows the most frequent combinations of first name and last name in the United States as of 2013. According to this ranking, "James Smith" is the most common full name in the United States.
Baby Names
kaggle.com
Updated Feb 9, 2021
Evan Zhang (2021). Baby Names [Dataset]. https://www.kaggle.com/datasets/ironicninja/baby-names
Available download formats: zip (5,656,233 bytes)
Authors
Evan Zhang
License
CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
Description
Context

Dataset of US baby names from 1910 to 2021. Includes State, Sex, Year, Name, and Count as features.
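A typical first step with these features is finding the most common name per state, sex, and year. This sketch assumes the CSV header uses the feature names listed above (`State,Sex,Year,Name,Count`); the helper name `top_name_per_group` and the rows are hypothetical, and ties keep the first name encountered.

```python
import csv
import io
from typing import Dict, Tuple

def top_name_per_group(csv_text: str) -> Dict[Tuple[str, str, int], Tuple[str, int]]:
    """Map (State, Sex, Year) to the (Name, Count) with the largest Count."""
    best: Dict[Tuple[str, str, int], Tuple[str, int]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["State"], row["Sex"], int(row["Year"]))
        candidate = (row["Name"], int(row["Count"]))
        # Replace the stored name only when strictly more frequent.
        if key not in best or candidate[1] > best[key][1]:
            best[key] = candidate
    return best

# Hypothetical rows in the assumed column order.
sample = """State,Sex,Year,Name,Count
AK,F,1910,Mary,14
AK,F,1910,Annie,12
AK,M,1910,John,8
"""
print(top_name_per_group(sample))
```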

Inspiration

Mainly used for a tutorial, but can be used for classification or other visualizations.
Protected Areas Database of the United States (PAD-US) 2.1
catalog.data.gov
data.usgs.gov
Updated Nov 19, 2025
U.S. Geological Survey (2025). Protected Areas Database of the United States (PAD-US) 2.1 [Dataset]. https://catalog.data.gov/dataset/protected-areas-database-of-the-united-states-pad-us-2-1
Dataset provided by
United States Geological Survey (http://www.usgs.gov/)
Area covered
United States
Description
NOTE: A more current version of the Protected Areas Database of the United States (PAD-US) is available: PAD-US 3.0 https://doi.org/10.5066/P9Q9LQ4B. The USGS Protected Areas Database of the United States (PAD-US) is the nation's inventory of protected areas, including public land and voluntarily provided private protected areas, identified as an A-16 National Geospatial Data Asset in the Cadastre Theme (https://communities.geoplatform.gov/ngda-cadastre/). The PAD-US is an ongoing project with several published versions of a spatial database including areas dedicated to the preservation of biological diversity, and other natural (including extraction), recreational, or cultural uses, managed for these purposes through legal or other effective means. The database was originally designed to support biodiversity assessments; however, its scope expanded in recent years to include all public and nonprofit lands and waters. Most are public lands owned in fee (the owner of the property has full and irrevocable ownership of the land); however, long-term easements, leases, agreements, Congressional (e.g. 'Wilderness Area'), Executive (e.g. 'National Monument'), and administrative designations (e.g. 'Area of Critical Environmental Concern') documented in agency management plans are also included. The PAD-US strives to be a complete inventory of public land and other protected areas, compiling “best available” data provided by managing agencies and organizations. The PAD-US geodatabase maps and describes areas using over twenty-five attributes and five feature classes representing the U.S. protected areas network in separate feature classes: Fee (ownership parcels), Designation, Easement, Marine, Proclamation and Other Planning Boundaries. Five additional feature classes include various combinations of the primary layers (for example, Combined_Fee_Easement) to support data management, queries, web mapping services, and analyses. 
This PAD-US Version 2.1 dataset includes a variety of updates and new data from the previous Version 2.0 dataset (USGS, 2018 https://doi.org/10.5066/P955KPLE), achieving the primary goal to "Complete the PAD-US Inventory by 2020" (https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/science/pad-us-vision) by addressing known data gaps with newly available data. The following list summarizes the integration of "best available" spatial data to ensure that public lands and other protected areas from all jurisdictions are represented in PAD-US, along with continued improvements and regular maintenance of the federal theme.
Completing the PAD-US Inventory:
1) Integration of over 75,000 city parks in all 50 States (and the District of Columbia) from The Trust for Public Land's (TPL) ParkServe data development initiative (https://parkserve.tpl.org/) added nearly 2.7 million acres of protected area and significantly reduced the primary known data gap in previous PAD-US versions (local government lands).
2) First-time integration of the Census American Indian/Alaskan Native Areas (AIA) dataset (https://www2.census.gov/geo/tiger/TIGER2019/AIANNH), representing the boundaries for federally recognized American Indian reservations and off-reservation trust lands across the nation (as of January 1, 2020, as reported by the federally recognized tribal governments through the Census Bureau's Boundary and Annexation Survey), addressed another major PAD-US data gap.
3) Nearly 5,000 protected areas owned by local land trusts in 13 states, aggregated by Ducks Unlimited through data calls for easements to update the National Conservation Easement Database (https://www.conservationeasement.us/), increased PAD-US protected areas by over 350,000 acres.
Maintaining regular Federal updates:
1) Major update of the Federal estate (fee ownership parcels, easement interest, and management designations), including authoritative data from 8 agencies: Bureau of Land Management (BLM), U.S. Census Bureau (Census), Department of Defense (DOD), U.S. Fish and Wildlife Service (FWS), National Park Service (NPS), Natural Resources Conservation Service (NRCS), U.S. Forest Service (USFS), and National Oceanic and Atmospheric Administration (NOAA). The federal theme in PAD-US is developed in close collaboration with the Federal Geographic Data Committee (FGDC) Federal Lands Working Group (FLWG, https://communities.geoplatform.gov/ngda-govunits/federal-lands-workgroup/).
2) Complete National Marine Protected Areas (MPA) update from the National Oceanic and Atmospheric Administration (NOAA) MPA Inventory, including review of conservation measures ('GAP Status Code', 'IUCN Category') by NOAA.
Other changes:
1) PAD-US field name change: the "Public Access" field name changed from 'Access' to 'Pub_Access' to avoid unintended scripting errors associated with the script command 'access'.
2) Additional field: the "Feature Class" (FeatClass) field was added to all layers within PAD-US 2.1 (in PAD-US 2.0 it was only included in the "Combined" layers, to describe which feature class the data originated from).
3) Categorical GAP Status Code default changes: in the absence of other information, National Monuments are categorically assigned GAP Status Code = 2 (previously GAP 3) to better represent the biodiversity protection restrictions associated with the designation. Bureau of Land Management Areas of Critical Environmental Concern (ACECs) are categorically assigned GAP Status Code = 3 (previously GAP 2) because the areas are administratively protected, not permanent. More information is available upon request.
4) The Agency Name (FWS) geodatabase domain description changed to U.S. Fish and Wildlife Service (previously U.S. Fish & Wildlife Service).
5) Select areas in the provisional PAD-US 2.1 Proclamation feature class were removed following a consultation with the data steward (Census Bureau). Tribal designated statistical areas are purely a geographic area for providing Census statistics, with no land base. Most affected areas are relatively small; however, 4,341,120 acres and 37 records were removed in total. Contact Mason Croft (masoncroft@boisestate) for more information about how to identify these records.
For more information regarding the PAD-US dataset, please visit https://usgs.gov/gapanalysis/PAD-US/. For more information about data aggregation, please review the Online PAD-US Data Manual available at https://www.usgs.gov/core-science-systems/science-analytics-and-synthesis/gap/pad-us-data-manual.
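Scripts written against PAD-US 2.0 attribute tables may need the field changes listed above. A minimal sketch, assuming attribute records are already loaded as Python dictionaries (reading the geodatabase itself is out of scope); the helper name `to_v21_fields` and the sample record are hypothetical. It covers only the `Access` to `Pub_Access` rename and the new `FeatClass` field, not the GAP Status Code changes.

```python
from typing import Dict

# Field renamed between PAD-US 2.0 and 2.1 (from the change notes above).
RENAMED_FIELDS: Dict[str, str] = {"Access": "Pub_Access"}

def to_v21_fields(record: Dict[str, object], feature_class: str) -> Dict[str, object]:
    """Return a copy of a 2.0-style attribute record using 2.1 field names.

    FeatClass is filled with feature_class unless the record already has it,
    since 2.1 carries that field on every layer.
    """
    upgraded = {RENAMED_FIELDS.get(key, key): value for key, value in record.items()}
    upgraded.setdefault("FeatClass", feature_class)
    return upgraded

# Hypothetical record from a 2.0 Fee layer.
old = {"Name": "Example Park", "Access": "Open"}
print(to_v21_fields(old, "Fee"))
```

Returning a new dictionary leaves the original record untouched, so 2.0 and 2.1 views can be compared side by side.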
Census Data
catalog.data.gov
data.globalchange.gov
Updated Mar 1, 2024
U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
Dataset provided by
U.S. Bureau of the Census
Description
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing, or age of housing; these topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available as a downloadable compressed database file named TABLES.ZIP. The uncompressed file is in FoxPro database file (dbf) format and may be imported into Access, Excel, and other software formats. While all of this information is useful, the Office of Community Planning and Development (CPD) has downloaded selected information for all states and areas and is making it available on the CPD web pages. The tables and data items selected are those used in the CDBG and HOME allocation formulas, plus the topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed, the tables are ready for use with FoxPro and can be imported into Access, Excel, and other spreadsheet, GIS, and database software. The data are at the block group summary level.
The first two characters of each file name are the state abbreviation, and the next two letters are BG, for block group. The last part of the file name describes the contents. Each record is labeled with the code and name of the city and county in which it is located, so that the data can be summarized to higher-level geography. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code; the only data items included in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data are only for use by State CDBG grantees for reporting the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
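The naming convention and the LOGRECNO link can be sketched as follows. This assumes the dbf rows have already been read into Python dictionaries (for example with a third-party dbf reader, not shown); the helper names, the file name `nybgpop1.dbf`, and every field other than `LOGRECNO` are hypothetical.

```python
import os
from typing import Dict, List, NamedTuple

class FileNameParts(NamedTuple):
    state: str     # two-letter state abbreviation
    level: str     # "BG" for block group
    contents: str  # e.g. "GEO", "POP1", "POP2", "HU", "MA05"

def parse_file_name(path: str) -> FileNameParts:
    """Split a file name such as 'nybgpop1.dbf' into its three parts."""
    stem = os.path.splitext(os.path.basename(path))[0].upper()
    if len(stem) < 5 or stem[2:4] != "BG":
        raise ValueError(f"unexpected file name: {path!r}")
    return FileNameParts(stem[:2], stem[2:4], stem[4:])

def join_on_logrecno(*tables: List[Dict[str, object]]) -> Dict[str, Dict[str, object]]:
    """Merge rows from several tables into one dict per LOGRECNO value."""
    merged: Dict[str, Dict[str, object]] = {}
    for table in tables:
        for row in table:
            merged.setdefault(str(row["LOGRECNO"]), {}).update(row)
    return merged

# Hypothetical rows; only LOGRECNO is a field name taken from the description.
geo = [{"LOGRECNO": "0000001", "COUNTY": "Example County", "TOTPOP": 1200}]
pop1 = [{"LOGRECNO": "0000001", "MALES": 590}]
print(parse_file_name("nybgpop1.dbf"))
print(join_on_logrecno(geo, pop1)["0000001"])
```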
US geographic names 2017-06-01
figshare.com
Updated Jun 4, 2023
Jacob Malcom (2023). US geographic names 2017-06-01 [Dataset]. http://doi.org/10.6084/m9.figshare.4897124.v1
Available download formats: application/gzip
Unique identifier
https://doi.org/10.6084/m9.figshare.4897124.v1
Dataset provided by
figshare (http://figshare.com/)
Authors
Jacob Malcom
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
A text database of named places in the United States. The US Board on Geographic Names controls the database of official names of places in the US, and the US Geological Survey (USGS) maintains the database. This is a copy of the 2017-06-01 database, archived to ensure this version remains available; I am using it to create an R package for textual analyses of geographic content. The original source was https://geonames.usgs.gov/domestic/download_data.htm (a very slow server). I have chosen CC0 for the license because, as a creation of the US government, I don't think the database can be copyrighted (and CC0 is the closest match).
LinkedIn US city ID's
datarepository.eur.nl
Updated May 31, 2023
Paul Kievits (2023). LinkedIn US city ID's [Dataset]. http://doi.org/10.25397/eur.19932221.v1
Available download formats: txt
Unique identifier
https://doi.org/10.25397/eur.19932221.v1
Dataset provided by
Erasmus University Rotterdam (EUR)
Authors
Paul Kievits
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This is a very small but useful dataset if you ever want to search LinkedIn for jobs in a certain US city. It contains a list of US cities and states and their corresponding LinkedIn IDs (which are usually hidden externally).

The list of cities was retrieved from https://github.com/kelvins/US-Cities-Database, and the city names were adjusted to match the names used in LinkedIn (which can differ in subtle ways).

Some cities do not have an ID, either because the city is too small or because its name differs on LinkedIn in a way I did not detect (human error). If you ever run into one of these, feel free to enhance this dataset.
US Census Bureau TIGER data
data.wu.ac.at
Updated Oct 10, 2013
Global (2013). US Census Bureau TIGER data [Dataset]. https://data.wu.ac.at/odso/datahub_io/YmFkOGRmNGYtNmEyZS00ZTQ5LTk0NmMtNzk1MTE1OThhOGQ1
Dataset provided by
Global
License
U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
Area covered
United States
Description
The US government's 'Topologically Integrated Geographic Encoding and Referencing' system, usually referred to as TIGER, is based on an extensive database of US geographic information. It is county-level data that documents physical features like roads and rivers, as well as some administrative features such as Congressional districts. Data can be downloaded for each state and for each of the following: Puerto Rico, American Samoa, Guam, Northern Mariana Islands, Midway Island, and the US Virgin Islands.

The database does not contain demographic or topographic (terrain) data; the "topology" referenced in the name refers to how the database itself is designed.

Only recent versions are available online for free download. Note also that the Census Bureau is making substantial changes to how the TIGER system is formatted, which should allow for wider and more effective compatibility with GIS tools. A good overview can be seen at this page
Alesco Phone ID Database - Phone Data with over 860 Million Phone Number...
datarade.ai
Updated Jul 5, 2018
Alesco Data (2018). Alesco Phone ID Database - Phone Data with over 860 Million Phone Number with Carrier Name, covers 94% of the US population - available for licensing! [Dataset]. https://datarade.ai/data-products/alesco-phone-id-database-the-industry-s-largest-and-most-ac-alesco-data
Available download formats: .csv, .xls, .txt
Dataset authored and provided by
Alesco Data
Area covered
United States
Description
The Alesco Phone ID Database ties phone data to a consumer's true identity, and with linkage to the Alesco Power Identity Graph, we are perfectly positioned to help customers solve today's most challenging marketing, analytics, and identity resolution problems.

Our proprietary Phone ID database combines public and private sources and validates phone numbers against current and historical data 24 hours a day, 365 days a year.

With over 650 million unique phone numbers, device and service information, our one-of-a-kind solutions are now available for your marketing and identity resolution challenges in both B2C and B2B applications!

• Alesco Phone ID provides more than 860 million phone numbers monthly linked to a consumer or business name and includes landline, mobile phone number, VoIP, private and business phone numbers — all permissibly obtained and privacy-compliant and linked to other Alesco data sets

• How we do it: Alesco Phone ID is multi-sourced with daily information and delivered monthly or quarterly to clients. Our proprietary machine learning and advanced analytics processes ensure quality levels far above industry standards. Alesco processes over 100 million phone signals per day, compiling, normalizing, and standardizing phone information from 37 input sources.

• Accuracy: Each of Alesco's phone data sources is vetted to ensure it is authoritative, giving you confidence in the accuracy of the information. Every record is validated, verified, and processed to ensure the widest, most reliable coverage combined with stunning precision.

• Ease of use: Alesco's Phone ID Database is available as an on-premise phone database license, giving you full control to host and access this powerful resource on-site. Ongoing updates are provided monthly to ensure your data is up to date.
Email Address Data| Email Database | US Consumers | 650 million Consumer...
datarade.ai
Stirista, Email Address Data| Email Database | US Consumers | 650 million Consumer Email Addresses [Dataset]. https://datarade.ai/data-products/email-address-data-email-database-us-consumers-564-milli-stirista
Available download formats: .csv, .txt
Dataset authored and provided by
Stirista
Area covered
United States of America
Description
Andrews Wharton's Actionable US Consumer Email Database hosts over 650 million email addresses that have been active within the last 36 months. This database is fully CAN-SPAM compliant and 100% opted in for third-party use.

This email address database connects you with your customers and/or prospects at their most recent deliverable online address and increases impression rates, deliverability, and engagement in your digital campaigns.

The Email Address Data is 100% populated with email address, hashed emails (HEMs: MD5, SHA-1, SHA-256), first name, last name, postal address (primary and secondary), IP address, and timestamps for Last Registration, Verification, and First Seen. An enhanced version of the database is available with date of birth (where available), phone (mobile and landline), and MAID-to-hashed-email conversion.
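Hashed emails (HEMs) are hex digests of an email address, which lets records be matched without exchanging the plaintext address. A minimal sketch with Python's standard `hashlib`; the helper name `hashed_emails` is hypothetical, the normalization step (trim whitespace, lowercase) is an assumption, and digests only match when both sides normalize the same way.

```python
import hashlib
from typing import Dict

def hashed_emails(email: str) -> Dict[str, str]:
    """Return MD5, SHA-1, and SHA-256 hex digests of a normalized address.

    Normalization here (strip surrounding whitespace, lowercase) is an
    assumption; providers may apply different rules.
    """
    data = email.strip().lower().encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

# The same address written two ways yields identical digests after normalization.
print(hashed_emails("  Someone@Example.com ") == hashed_emails("someone@example.com"))
```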

The Andrews Wharton Actionable US Consumer Email Database is updated monthly. A complete replacement database or new adds are available as update files.

Contact us at successdelivered@andrewswharton.com or visit us at www.andrewswharton.com to learn more about this dataset.
US National Baby Names (1880-2017)
kaggle.com
Updated Sep 14, 2018
Hassen Morad (2018). US National Baby Names (1880-2017) [Dataset]. https://www.kaggle.com/hassenmorad/us-national-baby-names-18802017/discussion
Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Dataset provided by
Kaggle
Authors
Hassen Morad
Description
This is an updated version of the national baby names database, containing records from 1880-2017.
2020 New Mexico Designated Places
gstore.unm.edu
Updated Sep 10, 2021
Earth Data Analysis Center (2021). 2020 New Mexico Designated Places [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/3fb87757-7bcf-4941-b7b8-660e32a8432d/metadata/FGDC-STD-001-1998.html
Available download formats: shp (5), xls (5), json (5), kml (5), zip (3), gml (5), geojson (5), csv (5)
Dataset provided by
Earth Data Analysis Center
Time period covered
May 23, 2020
Area covered
New Mexico, United States; West Bounding Coordinate -109.049169, East Bounding Coordinate -103.043557, North Bounding Coordinate 37.000129, South Bounding Coordinate 31.783667; Geographic Names Information System (GNIS), Federal Information Processing Standards (FIPS), and feature names.
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, and the shapefiles can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people, as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population.
The boundaries of most incorporated places in this shapefile are as of January 1, 2020, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2010 Census.
Places
catalog.data.gov
geodata.bts.gov
Updated Oct 21, 2025
U.S. Department of Commerce, U.S. Census Bureau, Geography Division (Point of Contact) (2025). Places [Dataset]. https://catalog.data.gov/dataset/places2
Dataset provided by
United States Census Bureau (http://census.gov/)
United States Department of Commerce (http://commerce.gov/)
Description
The Places dataset was published on September 22, 2025 by the U.S. Department of Commerce, U.S. Census Bureau, Geography Division and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) System (MTS). The MTS represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, and the shapefiles can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people, as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state but may extend across county and county subdivision boundaries. An incorporated place is usually a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs are often defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity.
CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. The boundaries of most incorporated places in this shapefile are as of January 1, 2024, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2020 Census, but some CDPs were added or updated through the 2024 BAS as well. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529072
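Because places always nest within a state, a place's identifier can be split into a state part and a place part. A minimal sketch, assuming the 7-character `GEOID` layout used in TIGER/Line place files (2-digit state FIPS code followed by the 5-digit place FIPS code); confirm the layout against the data dictionary linked above. The helper names and GEOID values are hypothetical.

```python
from typing import Iterable, List, NamedTuple

class PlaceGeoid(NamedTuple):
    state_fips: str  # 2-digit state FIPS code
    place_fips: str  # 5-digit place FIPS code, unique within the state

def split_place_geoid(geoid: str) -> PlaceGeoid:
    """Split a 7-digit place GEOID into its state and place codes."""
    geoid = geoid.strip()
    if len(geoid) != 7 or not geoid.isdigit():
        raise ValueError(f"expected a 7-digit place GEOID, got {geoid!r}")
    return PlaceGeoid(geoid[:2], geoid[2:])

def places_in_state(geoids: Iterable[str], state_fips: str) -> List[str]:
    """Keep only places in one state; valid because places never cross state lines."""
    return [g for g in geoids if split_place_geoid(g).state_fips == state_fips]

# Hypothetical GEOIDs.
sample = ["3512345", "3567890", "0812345"]
print(split_place_geoid("3512345"))
print(places_in_state(sample, "35"))
```

Note that the same 5-digit place code can appear in different states (as in the sample), so the full GEOID, not the place code alone, identifies a place nationally.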
Race and ethnicity data for first, middle, and last names
dataverse.harvard.edu
search.dataone.org
Updated Apr 11, 2023
Evan Rosenman; Santiago Olivella; Kosuke Imai (2023). Race and ethnicity data for first, middle, and last names [Dataset]. http://doi.org/10.7910/DVN/SGKW0K
Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/SGKW0K
Dataset provided by
Harvard Dataverse
Authors
Evan Rosenman; Santiago Olivella; Kosuke Imai
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
We provide datasets that estimate the racial distributions associated with first, middle, and last names in the United States. The datasets cover five racial categories: White, Black, Hispanic, Asian, and Other. The data are computed from the voter files of six Southern states (Alabama, Florida, Georgia, Louisiana, North Carolina, and South Carolina) that collect race and ethnicity data upon registration. We include seven voter files per state, sourced between 2018 and 2021 from L2, Inc. Together, these states have approximately 36 million individuals who provide self-reported race and ethnicity. The last name dataset includes 338K surnames, the middle name dictionary contains 126K middle names, and the first name dataset includes 136K first names. For each type of name, we provide a dataset of P(race | name) probabilities and a dataset of P(name | race) probabilities. We include only names that appear at least 25 times across the 42 (= 7 voter files × 6 states) voter files in our dataset. These data are closely related to the dataset "Name Dictionaries for "wru" R Package" (https://doi.org/10.7910/DVN/7TRYAC). These are the probabilities used in the latest iteration of the "WRU" package (Khanna et al., 2022) to make probabilistic predictions about the race of individuals, given their names and geolocations.
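The two kinds of tables are different normalizations of the same (name, race) counts: P(race | name) divides by the total for the name, and P(name | race) divides by the total for the race. The sketch below only illustrates that relationship and the 25-occurrence cutoff; it is not the authors' estimation procedure. The helper name `conditional_tables` and the counts are hypothetical, and including dropped names in the race totals is an assumption.

```python
from collections import defaultdict
from typing import Dict, Tuple

Key = Tuple[str, str]        # (name, race)
Counts = Dict[Key, int]      # (name, race) -> number of registrants
Table = Dict[Key, float]

def conditional_tables(counts: Counts, min_total: int = 25) -> Tuple[Table, Table]:
    """Return (P(race | name), P(name | race)) tables from raw counts."""
    name_totals: Dict[str, int] = defaultdict(int)
    race_totals: Dict[str, int] = defaultdict(int)
    for (name, race), n in counts.items():
        name_totals[name] += n
        race_totals[race] += n  # denominator includes dropped names (assumption)

    # Keep only names seen at least min_total times overall.
    kept = {key: n for key, n in counts.items() if name_totals[key[0]] >= min_total}

    p_race_given_name = {(nm, rc): n / name_totals[nm] for (nm, rc), n in kept.items()}
    p_name_given_race = {(nm, rc): n / race_totals[rc] for (nm, rc), n in kept.items()}
    return p_race_given_name, p_name_given_race

# Hypothetical counts: "rare" falls below the 25-occurrence cutoff.
counts = {("smith", "White"): 60, ("smith", "Black"): 40, ("rare", "Asian"): 3}
p_rn, p_nr = conditional_tables(counts)
print(p_rn)
print(p_nr)
```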
Census MAF/TIGER database
gstore.unm.edu
Updated Oct 4, 2018
Earth Data Analysis Center (2018). Census MAF/TIGER database [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/241d8727-e304-4e46-9a96-37ba602c79e6/metadata/FGDC-STD-001-1998.html
Available download formats: json (5), zip (2), shp (5), kml (5), csv (5), geojson (5), xls (5), gml (5)
Dataset provided by
Earth Data Analysis Center
Time period covered
Jun 2014
Area covered
New Mexico, United States; West Bounding Coordinate -109.049169, East Bounding Coordinate -103.043557, North Bounding Coordinate 37.000014, South Bounding Coordinate 31.783148; Geographic Names Information System (GNIS), Federal Information Processing Standards (FIPS), and feature names.
Description
The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts; however, each TIGER/Line shapefile is designed to stand alone as an independent data set, and the shapefiles can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people, as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population.
The boundaries of most incorporated places in this shapefile are as of January 1, 2015, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2010 Census.
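Because places always nest within a state, a place's identifier carries its state. The following is a minimal Python sketch, assuming the common TIGER/Line convention that a place GEOID is the 2-digit state FIPS code followed by the 5-digit place FIPS code (New Mexico's state code is 35). It also uses the bounding coordinates listed for this layer, but only as a coarse prefilter: a rectangle covers ground outside every place polygon. The GEOIDs and coordinates in the example are hypothetical.

```python
from typing import Iterable

# Bounding coordinates listed in this layer's metadata (decimal degrees).
WEST, EAST = -109.049169, -103.043557
SOUTH, NORTH = 31.783148, 37.000014

NEW_MEXICO_FIPS = "35"


def split_place_geoid(geoid: str) -> tuple[str, str]:
    """Split a 7-digit place GEOID into (state FIPS, place FIPS).

    Assumes the STATEFP (2 digits) + PLACEFP (5 digits) layout.
    """
    if len(geoid) != 7 or not geoid.isdigit():
        raise ValueError(f"expected a 7-digit place GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:]


def places_in_state(geoids: Iterable[str], state_fips: str) -> list[str]:
    """Keep places whose GEOID prefix matches the state.

    This works because a place never crosses a state boundary.
    """
    return [g for g in geoids if split_place_geoid(g)[0] == state_fips]


def in_bounding_box(lon: float, lat: float) -> bool:
    """Coarse check: True if the point lies inside the layer's bounding rectangle.

    Passing this test does not prove the point is inside any place polygon.
    """
    return WEST <= lon <= EAST and SOUTH <= lat <= NORTH


# Hypothetical GEOIDs: one with New Mexico's prefix (35), one with Arizona's (04).
print(places_in_state(["3512345", "0412345"], NEW_MEXICO_FIPS))
print(in_bounding_box(-106.65, 35.08))  # a point near Albuquerque
print(in_bounding_box(-112.07, 33.45))  # a point near Phoenix, Arizona
```

A real workflow would follow the bounding-box prefilter with a true point-in-polygon test against the place boundaries.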
US Places (Population 50K-100K)
koordinates.com
csv, dwg, geodatabase +6
Updated Feb 1, 2001
Cite
US Bureau of Transportation Statistics (BTS) (2001). US Places (Population 50K-100K) [Dataset]. https://koordinates.com/layer/22835-us-places-population-50k-100k/
Available download formats: dwg, geodatabase, pdf, MapInfo MIF, KML, shapefile, GeoPackage/SQLite, CSV, MapInfo TAB
Dataset updated
Feb 1, 2001
Dataset authored and provided by
US Bureau of Transportation Statistics (BTS)
Area covered

Description
This data set includes cities in the United States, Puerto Rico, and the U.S. Virgin Islands. These cities were collected from the 1970 National Atlas of the United States. Where applicable, U.S. Census Bureau codes for named populated places were associated with each name to allow additional information to be attached. The Geographic Names Information System (GNIS) was also used as a source for additional information. This is a revised version of the December 2003 data set.
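Associating Census codes with each city name "where applicable" amounts to a left join: every city stays in the table, and Census attributes are attached only when a code is present and matches. A minimal Python sketch; the field names `census_code` and `pop_2000` and all values are hypothetical, not this layer's actual schema.

```python
from typing import Any, Optional

City = dict[str, Any]


def attach_census_attributes(
    cities: list[City], census_by_code: dict[str, dict[str, Any]]
) -> list[City]:
    """Left-join Census attributes onto cities by code.

    Cities without a code, or with a code that has no match, are kept unchanged.
    """
    joined = []
    for city in cities:
        code: Optional[str] = city.get("census_code")  # hypothetical field name
        extra = census_by_code.get(code, {}) if code else {}
        joined.append({**city, **extra})
    return joined


cities = [
    {"name": "City A", "census_code": "0100000"},  # hypothetical code
    {"name": "City B", "census_code": None},       # no Census code available
]
census = {"0100000": {"pop_2000": 75000}}  # hypothetical attributes
print(attach_census_attributes(cities, census))
```

Keeping unmatched rows (rather than an inner join) preserves every city from the atlas even when no Census code could be assigned.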

This layer is sourced from maps.bts.dot.gov.
2020 Census Block Groups for Urban Search and Rescue - 764585
prep-response-portal.napsgfoundation.org
data-napsg.opendata.arcgis.com
+1more
Updated Jun 24, 2025
Cite
SARGeo (2025). 2020 Census Block Groups for Urban Search and Rescue - 764585 [Dataset]. https://prep-response-portal.napsgfoundation.org/datasets/sargeo::2020-census-block-groups-for-urban-search-and-rescue-764585
Dataset updated
Jun 24, 2025
Dataset authored and provided by
SARGeo
Area covered
Description
USA Census Block Groups (CBG) for Urban Search and Rescue. This layer can be used for search segment planning. Block groups generally contain between 600 and 5,000 people, and their boundaries generally follow existing roads and waterways. The field segment_designation is the last 6 digits of the unique identifier and matches the field in the SARCOP Segment layer.
Data download date: August 12, 2021
Census tables: P1, P2, P3, P4, H1, P5, Header
Downloaded from: Census FTP site
Processing Notes:
Data was downloaded from the U.S. Census Bureau FTP site, imported into SAS format, and joined to the 2020 TIGER boundaries. Boundaries are sourced from the 2020 TIGER/Line Geodatabases. Boundaries have been projected into Web Mercator, and each attribute has been given a clear descriptive alias name. No alterations have been made to the vertices of the data.
Each attribute keeps its specified name from the Census Bureau, but also has a descriptive alias name and long description derived from the technical documentation provided by the Census Bureau. For a detailed list of the attributes contained in this layer, view the Data tab and select "Fields".
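The segment_designation rule can be expressed directly. A minimal Python sketch, assuming the identifier is a 12-digit block group GEOID laid out as state (2) + county (3) + tract (6) + block group (1), as in TIGER/Line; the sample GEOID is hypothetical. The sketch also flags populations outside the typical 600 to 5,000 range, which is a rule of thumb rather than a hard limit.

```python
from typing import NamedTuple


class BlockGroupId(NamedTuple):
    state: str
    county: str
    tract: str
    block_group: str


def parse_block_group_geoid(geoid: str) -> BlockGroupId:
    """Split a 12-digit block group GEOID into its nested components."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected a 12-digit block group GEOID, got {geoid!r}")
    return BlockGroupId(geoid[:2], geoid[2:5], geoid[5:11], geoid[11])


def segment_designation(geoid: str) -> str:
    """Last 6 digits of the identifier, per the layer description.

    The prefix (state, county, start of the tract code) is discarded, so the
    result is not guaranteed to be unique across counties or states.
    """
    parse_block_group_geoid(geoid)  # validate the format before slicing
    return geoid[-6:]


def outside_typical_size(population: int, low: int = 600, high: int = 5000) -> bool:
    """True when a block group's population falls outside the typical range."""
    return not (low <= population <= high)


geoid = "350010001071"  # hypothetical: state 35, county 001, tract 000107, block group 1
print(parse_block_group_geoid(geoid))
print(segment_designation(geoid))
print(outside_typical_size(250))
```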
The following alterations have been made to the tabular data:
Joined all tables to create one wide attribute table:
P1 - Race
P2 - Hispanic or Latino, and not Hispanic or Latino by Race
P3 - Race for the Population 18 Years and Over
P4 - Hispanic or Latino, and not Hispanic or Latino by Race for the Population 18 Years and Over
H1 - Occupancy Status (Housing)
P5 - Group Quarters Population by Group Quarters Type (correctional institutions, juvenile facilities, nursing facilities/skilled nursing, college/university student housing, military quarters, etc.)
Header
After joining, dropped fields: FILEID, STUSAB, CHARITER, CIFSN, LOGRECNO, GEOVAR, GEOCOMP, LSADC, and BLOCK.
GEOCODE was renamed to GEOID and moved to be the first column in the table; the original GEOID was dropped.
Placeholder fields for future legislative districts have been dropped: CD118, CD119, CD120, CD121, SLDU22, SLDU24, SLDU26, SLDU28, SLDL22, SLDL24, SLDL26, SLDL28.
P0020001 was dropped, as it duplicates P0010001. Similarly, P0040001 was dropped, as it duplicates P0030001.
In addition to calculated fields, County_Name and State_Name were added.
The following calculated fields have been added (see long field descriptions in the Data tab for formulas used):
PCT_P0030001: Percent of Population 18 Years and Over
PCT_P0020002: Percent Hispanic or Latino
PCT_P0020005: Percent White alone, not Hispanic or Latino
PCT_P0020006: Percent Black or African American alone, not Hispanic or Latino
PCT_P0020007: Percent American Indian and Alaska Native alone, not Hispanic or Latino
PCT_P0020008: Percent Asian alone, not Hispanic or Latino
PCT_P0020009: Percent Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
PCT_P0020010: Percent Some Other Race alone, not Hispanic or Latino
PCT_P0020011: Percent Population of Two or More Races, not Hispanic or Latino
PCT_H0010002: Percent of Housing Units that are Occupied
PCT_H0010003: Percent of Housing Units that are Vacant
Please note that these percentages might look strange at the individual block group level, since this data has been protected using differential privacy.*
*To protect the privacy and confidentiality of respondents, the U.S. Census Bureau has protected this data using differential privacy techniques. This means that some individual block groups will have values that are inconsistent or improbable. However, these issues are minimized when the data are aggregated. The pop-up on this layer uses Arcade to display aggregated values for the surrounding area rather than values for the block group itself.
Download Census redistricting data in this layer as a file geodatabase.
Additional links: U.S. Census Bureau; U.S. Census Bureau Decennial Census; About the 2020 Census; 2020 Census; 2020 Census data quality; Decennial Census P.L. 94-171 Redistricting Data Program
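Aggregation helps partly because counts can be summed before dividing. A minimal Python sketch of that idea, assuming each percentage field is 100 × count ÷ total, with P0010001 (total population) as the denominator for PCT_P0020002 and H0010001 (total housing units) for PCT_H0010002. The layer's actual formulas are in the Data tab, this is plain Python rather than the layer's Arcade expression, and the block group values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BlockGroupCounts:
    p0010001: int  # total population
    p0020002: int  # Hispanic or Latino population
    h0010001: int  # total housing units
    h0010002: int  # occupied housing units


def pct(part: int, whole: int) -> Optional[float]:
    """Percentage, or None when the denominator is zero."""
    return None if whole == 0 else 100.0 * part / whole


def aggregate_pcts(groups: Iterable[BlockGroupCounts]) -> dict[str, Optional[float]]:
    """Sum counts across block groups first, then compute one ratio.

    Summing before dividing weights each block group by its size, so a tiny,
    noisy block group cannot dominate the way it could in an average of
    per-block-group percentages.
    """
    groups = list(groups)
    pop = sum(g.p0010001 for g in groups)
    hisp = sum(g.p0020002 for g in groups)
    units = sum(g.h0010001 for g in groups)
    occupied = sum(g.h0010002 for g in groups)
    return {
        "PCT_P0020002": pct(hisp, pop),
        "PCT_H0010002": pct(occupied, units),
    }


# Hypothetical neighbors: one typical block group and one very small one.
neighbors = [
    BlockGroupCounts(p0010001=1000, p0020002=200, h0010001=400, h0010002=380),
    BlockGroupCounts(p0010001=10, p0020002=9, h0010001=5, h0010002=2),
]
print(aggregate_pcts(neighbors))
```

With these numbers, averaging the two per-block-group percentages (20% and 90%) would give 55% Hispanic or Latino, while the ratio of summed counts is 209 ÷ 1,010, about 20.7%.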
Alesco Email Database - Email Address Data - 2.3+ Billion US email records -...
datarade.ai
.csv, .xls, .txt
Cite
Alesco Data, Alesco Email Database - Email Address Data - 2.3+ Billion US email records - available acquisition marketing and identify resolution! [Dataset]. https://datarade.ai/data-products/alesco-email-database-over-1-8-billion-us-email-records-alesco-data
Available download formats: .csv, .xls, .txt
Dataset authored and provided by
Alesco Data
Area covered
United States of America
Description
Alesco’s aggregated consumer email database consists of over 2.3 billion U.S. records with name, address, and email. The database is fully CAN-SPAM and privacy compliant, and records include referring URL, IP address, and date stamp. Postal addresses are standardized and processed through NCOA (the USPS National Change of Address database). Available for licensing!

File size: 2.3 billion records
IP address: 1.9 billion
eAppend data: 1.8 billion (full name/postal)
Acquisition: 269 million (full demographics)

Fields included: name, address, email, phone, IP address

Consumer Behavior Data, Consumer Demographic Data, Identity Data, Audience Data
Data from: Validated Names for Experimental Studies on Race and Ethnicity
dataverse.harvard.edu
search.dataone.org
+1more
Updated Mar 22, 2022
Cite
Charles Crabtree; Jae Yeon Kim; Michael S. Gaddis; John B. Holbein; Cameron Guage; William X. Marx (2022). Validated Names for Experimental Studies on Race and Ethnicity [Dataset]. http://doi.org/10.7910/DVN/JVCUQM
Explore at: Croissant, a format for machine-learning datasets (learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/JVCUQM
Dataset updated
Mar 22, 2022
Dataset provided by
Harvard Dataverse
Authors
Charles Crabtree; Jae Yeon Kim; Michael S. Gaddis; John B. Holbein; Cameron Guage; William X. Marx
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
A large and fast-growing number of studies across the social sciences use experiments to better understand the role of race in human interactions, particularly in the American context. Researchers often use names to signal the race of individuals portrayed in these experiments. However, those names might also signal other attributes, such as socioeconomic status (e.g., education and income) and citizenship. If they do, researchers need pre-tested names with data on perceptions of these attributes. Such data would permit researchers to draw correct inferences about the causal effect of race in their experiments. In this paper, we provide the largest dataset of validated name perceptions based on three different surveys conducted in the United States. In total, our data include over 44,170 name evaluations from 4,026 respondents for 600 names. In addition to respondent perceptions of race, income, education, and citizenship from names, our data also include respondent characteristics. Our data will be broadly helpful for researchers conducting experiments on the manifold ways in which race shapes American life.
