100+ datasets found

List of first names and surnames
data.europa.eu
csv
Updated May 18, 2023
Christian Quest (2023). List of first names and surnames [Dataset]. https://data.europa.eu/data/datasets/5bc35259634f41122d982759
Available download formats: csv (2104259), csv (10841127)
Dataset updated
May 18, 2023
Dataset authored and provided by
Christian Quest
License
https://www.etalab.gouv.fr/licence-ouverte-open-licence
Description
In order to facilitate the anonymisation of data, this list of first names and surnames was extracted from the SIRENE database of INSEE.

For each first name and surname, the number of appearances is indicated.

ATTENTION: No content check is done, and these lists may contain anomalies present in the original database!
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
Updated Jul 4, 2025
+ more versions
Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Dataset updated
Jul 4, 2025
Dataset provided by
Social Security Administration (http://ssa.gov/)
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.
Database of Persian Names
catalog.elra.info
live.european-language-grid.eu
Updated Oct 7, 2019
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Persian Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0127/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
ELRA (European Language Resources Association)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
A unique resource that has been developed in cooperation with a team of native-speaker experts in Persian phonology. The data includes a confidence rank to indicate the relative likelihood that a variant will be encountered in the real world.
Business Name Search
catalog.data.gov
opendata.hawaii.gov
+3 more
Updated Apr 10, 2024
Commerce and Consumer Affairs (2024). Business Name Search [Dataset]. https://catalog.data.gov/dataset/business-name-search
Dataset updated
Apr 10, 2024
Dataset provided by
Commerce and Consumer Affairs
Description
Search for a business by name. You can obtain business information and then proceed to purchase a certificate of good standing or other documents. The purpose of this search is simply to determine whether a company/entity exists and to provide basic information on the company/entity.
Baby Names
kaggle.com
zip
Updated Feb 9, 2021
Evan Zhang (2021). Baby Names [Dataset]. https://www.kaggle.com/datasets/ironicninja/baby-names
Available download formats: zip (5656233 bytes)
Dataset updated
Feb 9, 2021
Authors
Evan Zhang
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Dataset of US baby names from 1910 to 2021. Includes State, Sex, Year, Name, and Count as features.

Inspiration

Mainly used for a tutorial but can be used for classification/other visualizations.
Plant Names Database Quarterly Changes August 2024 - Dataset - DataStore
datastore.landcareresearch.co.nz
Updated Aug 15, 2024
+ more versions
(2024). Plant Names Database Quarterly Changes August 2024 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-august-2024
Dataset updated
Aug 15, 2024
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary data on changes to data in the Plant Names Database in the following classes: the addition of new names; formal deprecation of duplicate names; changes to the status of the name as preferred name or synonym for a taxon; updating the origin or occurrence of a taxon within New Zealand; applying changes to the classification of a taxon; updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name.
Database of Arab Names in Arabic
catalog.elra.info
live.european-language-grid.eu
Updated Oct 7, 2019
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Arab Names in Arabic [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0123/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
ELRA (European Language Resources Association)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
A resource of Arab personal names and variants, in the original Arabic script, this database covers several hundred thousand Arabic script variants, along with common spelling mistakes. Every Arabic name is normalized and vocalized.
Data from: Brazilian Names
kaggle.com
zip
Updated Mar 12, 2023
FelipeKitamura, MD, PhD (2023). Brazilian Names [Dataset]. https://www.kaggle.com/datasets/felipekitamura/brazilian-names
Available download formats: zip (486308 bytes)
Dataset updated
Mar 12, 2023
Authors
FelipeKitamura, MD, PhD
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Area covered
Brazil
Description
List of first names in Brazil, collected by IBGE.

This dataset could be used in many applications, including helping de-identify text.

The files contain one name per line, including variations of the same names.
Database of Japanese Name Variants
live.european-language-grid.eu
catalog.elra.info
txt
Database of Japanese Name Variants [Dataset]. https://live.european-language-grid.eu/catalogue/lcr/2420
Available download formats: txt
License
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
http://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
This resource covers four million Japanese names and their romanized variants, and includes gender codes, classification codes, and frequency rankings.
911 Addressing - Street Name Master List
austintexas.gov
data.austintexas.gov
+3 more
csv, xlsx, xml
Updated Nov 22, 2025
+ more versions
City of Austin, Texas - data.austintexas.gov (2025). 911 Addressing - Street Name Master List [Dataset]. https://www.austintexas.gov/page/street-name-database
Available download formats: xml, xlsx, csv
Dataset updated
Nov 22, 2025
Dataset authored and provided by
City of Austin, Texas - data.austintexas.gov
Description
Street Name Master List - contains all the reserved and active street names.
HANA Database
kaggle.com
zip
Updated Jan 15, 2022
Simon Wittrock (2022). HANA Database [Dataset]. https://www.kaggle.com/sdusimonwittrock/hana-database
Available download formats: zip (50294159555 bytes)
Dataset updated
Jan 15, 2022
Authors
Simon Wittrock
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This is the HANA database of handwritten personal names as introduced in the paper HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition (official code available here). The minipics are from police register sheets from Copenhagen which cover all adults (above the age of 10) residing in the capital of Denmark, Copenhagen, in the period from 1890 to 1923.

The labels in the .csv files refer to the main character on the original register sheets. Each row contains a reference to the corresponding image as the first element and the name as the second element. The HANA database consists of 1,105,904 images with corresponding labels. The last name is always a single word: if multiple last names were transcribed, the last of these was chosen as the last name, while the remaining ones were moved to the end of the first names. The first names can consist of up to nine individual words.

All names are written in lower case letters and contain only characters which are used in Danish words, which implies 29 alphabetic characters i.e., this database includes the letters æ, ø, and å.

If anything is missing or if you are interested in the original documents from Copenhagen Archives to improve, e.g., the segmentation, feel free to reach out at sfw@sam.sdu.dk.

We wish you the best of luck.
Race and ethnicity data for first, middle, and last names
search.dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Rosenman, Evan; Olivella, Santiago; Imai, Kosuke (2023). Race and ethnicity data for first, middle, and last names [Dataset]. http://doi.org/10.7910/DVN/SGKW0K
Unique identifier
https://doi.org/10.7910/DVN/SGKW0K
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Rosenman, Evan; Olivella, Santiago; Imai, Kosuke
Description
We provide datasets that estimate the racial distributions associated with first, middle, and last names in the United States. The datasets cover five racial categories: White, Black, Hispanic, Asian, and Other. The provided data are computed from the voter files of six Southern states (Alabama, Florida, Georgia, Louisiana, North Carolina, and South Carolina) that collect race and ethnicity data upon registration. We include seven voter files per state, sourced between 2018 and 2021 from L2, Inc. Together, these states have approximately 36 million individuals who provide self-reported race and ethnicity. The last name dataset includes 338K surnames, the middle name dataset contains 126K middle names, and the first name dataset includes 136K first names. For each type of name, we provide a dataset of P(race | name) probabilities and P(name | race) probabilities. We include only names that appear at least 25 times across the 42 (= 7 voter files * 6 states) voter files in our dataset. These data are closely related to the dataset "Name Dictionaries for "wru" R Package", https://doi.org/10.7910/DVN/7TRYAC. These are the probabilities used in the latest iteration of the "WRU" package (Khanna et al., 2022) to make probabilistic predictions about the race of individuals, given their names and geolocations.
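The two kinds of probability tables mentioned in this description can be illustrated with a minimal sketch. It assumes a table of raw counts per (name, category) pair; the names, category labels, and counts below are made-up toy values, and the function name conditional_tables is hypothetical. It is not the authors' pipeline, only an illustration of how P(category | name) and P(name | category) differ.

```python
from collections import defaultdict

# Toy counts per (name, category) pair. All values here are invented
# for illustration and do not come from the dataset.
TOY_COUNTS = {
    ("alpha", "group_x"): 30,
    ("alpha", "group_y"): 10,
    ("beta", "group_x"): 20,
    ("beta", "group_y"): 40,
}


def conditional_tables(counts):
    """Return (P(category | name), P(name | category)) from raw counts."""
    name_totals = defaultdict(int)
    category_totals = defaultdict(int)
    for (name, category), n in counts.items():
        name_totals[name] += n
        category_totals[category] += n

    # Normalise each count by the total for its name ...
    p_category_given_name = {
        key: n / name_totals[key[0]] for key, n in counts.items()
    }
    # ... or by the total for its category.
    p_name_given_category = {
        key: n / category_totals[key[1]] for key, n in counts.items()
    }
    return p_category_given_name, p_name_given_category


p_cat_name, p_name_cat = conditional_tables(TOY_COUNTS)
print(p_cat_name[("alpha", "group_x")])  # share of "alpha" holders in group_x
print(p_name_cat[("alpha", "group_x")])  # share of group_x members named "alpha"
```

The same count feeds both tables; only the normalising total changes, which is why the two tables answer different questions.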
Common Brazilian Names and Gender
kaggle.com
zip
Updated Dec 17, 2017
Thiago Oliveira (2017). Common Brazilian Names and Gender [Dataset]. https://www.kaggle.com/pintowar/common-brazilian-names-and-gender
Available download formats: zip (21677 bytes)
Dataset updated
Dec 17, 2017
Authors
Thiago Oliveira
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Brazil
Description
Overview

In Portuguese (Brazil's official language), people's names are usually related to their natural gender (male or female only). That being said, a person's name alone contains patterns that can reveal their natural gender.

Content

The dataset contains only two columns: name and gender.

Challenge

The main goal of this dataset is to be able to automatically classify the natural gender of a person based only on their name.
Irish Place names database - Dataset - PSB Data Catalogue
datacatalogue.gov.ie
Updated Mar 21, 2021
(2021). Irish Place names database - Dataset - PSB Data Catalogue [Dataset]. https://datacatalogue.gov.ie/dataset/irish-place-names-database
Dataset updated
Mar 21, 2021
Area covered
Ireland
Description
Database of Irish Place Names
Name Phonics Dataset
kaggle.com
zip
Updated Sep 20, 2020
Amritvir Singh (2020). Name Phonics Dataset [Dataset]. https://www.kaggle.com/amritvirsinghx/gender-prediction-from-name-pronunciation
Available download formats: zip (678210 bytes)
Dataset updated
Sep 20, 2020
Authors
Amritvir Singh
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Content

This data set contains various files with over 1000 names and their respective gender information. It can be used to predict gender from the phonetics of the names.

Inspiration

Searching for a person's name in a database is a unique challenge. Depending on the source and age of the data, you may not be able to count on the spelling of the name being correct, or even the same name being spelled the same way when it appears more than once. Discrepancies between stored data and search terms may be introduced due to personal choice or cultural differences in spellings, homophones, transcription errors, illiteracy, or simply lack of standardized spellings during some time periods. These sorts of problems are especially prevalent in transcriptions of handwritten historical records used by historians, genealogists, and other researchers.

A common way to solve the string-search problem is to look for values that are "close" to the same as the search target. Using a traditional fuzzy match algorithm to compute the closeness of two arbitrary strings is expensive, though, and it isn't appropriate for searching large data sets. A better solution is to compute hash values for entries in the database in advance, and several special hash algorithms have been created for this purpose. These phonetic hash algorithms allow you to compare two words or names based on how they sound, rather than the precise spelling.

Early Efforts: Soundex

One such algorithm is Soundex, developed by Margaret K. Odell and Robert C. Russell in the early 1900s. The Soundex algorithm appears frequently in genealogical contexts because it's associated with the U.S. Census and is specifically designed to encode names. A Soundex hash value is calculated by using the first letter of the name and converting the consonants in the rest of the name to digits by using a simple lookup table. Vowels and duplicate encoded values are dropped, and the result is padded up to, or truncated down to, four characters.
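The Soundex steps described above can be sketched in a few lines of Python. This is a minimal illustration written from that description, not the implementation in the Fuzzy library; the lookup table follows the commonly published American Soundex letter groups, and the treatment of "h" and "w" (they do not separate duplicate codes) is an assumption taken from that common variant.

```python
# Consonant-to-digit lookup table (commonly published American Soundex groups).
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return a four-character Soundex code, or "" for input without letters."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    # Code of the previous letter, used to drop adjacent duplicates.
    previous = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")  # vowels and h, w, y have no code
        if code and code != previous:
            digits.append(code)
        # Assumption: h and w do not reset the duplicate check; vowels do.
        if ch not in "hw":
            previous = code

    # Keep the first letter, then pad with zeros or truncate to four characters.
    return (first.upper() + "".join(digits) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # both names share one code
```

Because the code is computed per name, it can be stored alongside each database entry in advance and compared with the code of a search term at query time.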

The Fuzzy library includes a Soundex implementation for Python programs.

This dataset can be used to explore the power of Fuzzy.
ArabLEX: Database of Arabic Place Names (DAP)
catalog.elra.info
Updated Oct 7, 2019
+ more versions
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). ArabLEX: Database of Arabic Place Names (DAP) [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-M0105/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
ELRA (European Language Resources Association)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
This database is part of the ArabLEX set of data, which consists of the Database of Arabic General Vocabulary (DAG), Database of Arabic Place Names (DAP), Database of Foreign Names in Arabic (DAF) and Database of Arab Names (DAN), available from ELRA under references ELRA-L0131, ELRA-M0105, ELRA-M0106 and ELRA-M0107, respectively.

This full-form Arabic-English place name database of over 21,000 lemmas and nearly 6.5 million forms provides worldwide coverage of common place names, given in standard MSA orthography, and includes all inflected and cliticized forms for each place name. In addition, precise phonemic transcriptions and full vowel diacritics are designed to enhance Arabic speech technology. Orthographic variants are also extensively covered.

This database is provided with three options: 1) proclitics, 2) phonetic information (CARS) and 3) orthographic variants. Subsets excluding some of the three proposed options may be provided upon demand. CARS is an accurate phonemic transcription. Optionally, phonetic transcriptions, IPA and/or SAMPA, can be provided, fine-tuned to a customer's specifications.

Quantity and size: 6,455,201 lines / 812 MB
File format: flat TSV text files
Samples and a specifications document available upon request.
Notices of Name Changes
data.ontario.ca
Updated Dec 9, 2021
Government and Consumer Services (2021). Notices of Name Changes [Dataset]. https://data.ontario.ca/dataset/notices-of-name-changes
Available download formats: (None)
Dataset updated
Dec 9, 2021
Dataset authored and provided by
Government and Consumer Services
License
https://www.ontario.ca/page/copyright-information
Time period covered
Oct 5, 2016
Area covered
Ontario
Description
This dataset contains a listing of individuals who have had their name formally changed in Ontario.

This data is made publicly available through the Ontario Gazette.
Name Find Source LLC Whois Database | Whois Data Center
whoisdatacenter.com
csv
Updated Oct 7, 2025
AllHeart Web Inc (2025). Name Find Source LLC Whois Database | Whois Data Center [Dataset]. https://whoisdatacenter.com/registrar/2863/
Available download formats: csv
Dataset updated
Oct 7, 2025
Dataset authored and provided by
AllHeart Web Inc
License
https://whoisdatacenter.com/terms-of-use/
Time period covered
Oct 12, 2025 - Dec 31, 2025
Description
Name Find Source LLC Whois Database, discover comprehensive ownership details, registration dates, and more for Name Find Source LLC with Whois Data Center.
GeoNames database
kaggle.com
zip
Updated Aug 29, 2017
GeoNames (2017). GeoNames database [Dataset]. https://www.kaggle.com/geonames/geonames-database
Available download formats: zip (344692118 bytes)
Dataset updated
Aug 29, 2017
Dataset authored and provided by
GeoNames
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
Context

The GeoNames geographical database contains over 10 million geographical names and consists of over 9 million unique features with 2.8 million populated places and 5.5 million alternate names. All features are categorized into one out of nine feature classes and further subcategorized into one out of 645 feature codes.

Content

The main 'geoname' table has the following fields :

geonameid : integer id of record in geonames database

name : name of geographical point (utf8) varchar(200)

asciiname : name of geographical point in plain ascii characters, varchar(200)

alternatenames : alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)

latitude : latitude in decimal degrees (wgs84)

longitude : longitude in decimal degrees (wgs84)

feature class : see http://www.geonames.org/export/codes.html, char(1)

feature code : see http://www.geonames.org/export/codes.html, varchar(10)

country code : ISO-3166 2-letter country code, 2 characters

cc2 : alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters

admin1 code : fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)

admin2 code : code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80)

admin3 code : code for third level administrative division, varchar(20)

admin4 code : code for fourth level administrative division, varchar(20)

population : bigint (8 byte int)

elevation : in meters, integer

dem : digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.

timezone : the iana timezone id (see file timeZone.txt) varchar(40)

modification date : date of last modification in yyyy-MM-dd format
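As a rough illustration of the field list above, the following sketch maps one record of the 'geoname' table onto named fields. It assumes the records are stored one per line with tab-separated values in the order listed; that layout is an assumption, not stated in this description. The sample row is invented, and the function name parse_geonames and the underscore-style field names are hypothetical.

```python
# Field names in the order given in the description (spaces replaced by underscores).
GEONAME_FIELDS = [
    "geonameid", "name", "asciiname", "alternatenames",
    "latitude", "longitude", "feature_class", "feature_code",
    "country_code", "cc2", "admin1_code", "admin2_code",
    "admin3_code", "admin4_code", "population", "elevation",
    "dem", "timezone", "modification_date",
]

# Invented sample record; real files are assumed to be tab-separated.
SAMPLE = (
    "1\tExampleville\tExampleville\tExample Town,Ex-ville\t12.5\t-45.25\t"
    "P\tPPL\tXX\t\t00\t\t\t\t1234\t\t10\tEtc/UTC\t2020-01-01\n"
)


def parse_geonames(text):
    """Parse tab-separated geoname records into a list of dictionaries."""
    rows = []
    for line in text.splitlines():
        values = line.split("\t")
        if len(values) != len(GEONAME_FIELDS):
            continue  # skip blank or malformed lines
        row = dict(zip(GEONAME_FIELDS, values))
        # Convert the fields whose types are given in the description.
        row["geonameid"] = int(row["geonameid"])
        row["latitude"] = float(row["latitude"])
        row["longitude"] = float(row["longitude"])
        row["population"] = int(row["population"]) if row["population"] else None
        # alternatenames is a comma-separated convenience attribute.
        names = row["alternatenames"]
        row["alternatenames"] = names.split(",") if names else []
        rows.append(row)
    return rows


for record in parse_geonames(SAMPLE):
    print(record["name"], record["feature_class"], record["alternatenames"])
```

Empty optional fields such as cc2 or elevation are left as empty strings here; a real loader would need to decide how to represent them.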

AdminCodes:

Most adm1 are FIPS codes. ISO codes are used for US, CH, BE and ME. UK and Greece are using an additional level between country and fips code. The code '00' stands for general features where no specific adm1 code is defined. The corresponding admin feature is found with the same countrycode and adminX codes and the respective feature code ADMx.

feature classes:

A: country, state, region,...

H: stream, lake, ...

L: parks,area, ...

P: city, village,...

R: road, railroad

S: spot, building, farm

T: mountain,hill,rock,...

U: undersea

V: forest,heath,...

Acknowledgements

Data Sources: http://www.geonames.org/data-sources.html
Database of Foreign Names in Arabic
catalogue.elra.info
live.european-language-grid.eu
Updated Oct 7, 2019
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Foreign Names in Arabic [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0124/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
ELRA (European Language Resources Association)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
This database covers non-Arabic names, their Arabic equivalents, and Arabic script variants for each name (with the most important variant given first).

List of first names and surnames: 7 scholarly articles cite this dataset (per Google Scholar).
