100+ datasets found

Database of Chinese Names
catalog.elra.info
live.european-language-grid.eu
Updated Oct 7, 2019
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Chinese Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0129/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Area covered
China
Description
Chinese name components, accompanied by accurate pinyin readings, gender codes, and flags denoting whether each name is a given name, a surname, or both.
Database of Russian names, surnames and midnames for gender identification
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Ivan Begtin (2020). Database of Russian names, surnames and midnames for gender identification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2747010
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Ivan Begtin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A database of names, surnames, and midnames from across the Russian Federation, used as a source to train algorithms for gender identification by full name.

The dataset is prepared for a MongoDB database. It includes a MongoDB dump and a dump of the tables as JSON Lines files.

Used in the gender identification and full-name parsing software https://github.com/datacoon/russiannames

Available under Creative Commons CC-BY SA by default.
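
As a rough illustration of the JSON Lines format mentioned in this description, the Python sketch below parses one record per line. The helper name `read_json_lines` and the sample fields `text` and `gender` are hypothetical placeholders, not the dataset's actual schema; check the real dump for its file and field names.

```python
import io
import json
from typing import Iterable, Iterator


def read_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line (JSON Lines format)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Hypothetical in-memory sample; the field names are assumptions.
sample = io.StringIO(
    '{"text": "Иван", "gender": "m"}\n'
    '{"text": "Анна", "gender": "f"}\n'
)
records = list(read_json_lines(sample))
for record in records:
    print(record["text"], record["gender"])
```

For a file on disk, the same function can be given an open file handle, for example `open(path, encoding="utf-8")`, since a text file iterates line by line.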
Plant Names Database Quarterly Changes May 2025 - Dataset - DataStore
datastore.landcareresearch.co.nz
Updated May 15, 2025
Cite
(2025). Plant Names Database Quarterly Changes May 2025 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-may-2025
Dataset updated
May 15, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary data on changes to data in the Plant Names Database in the following classes: the addition of new names; formal deprecation of duplicate names; changes to the status of the name as preferred name or synonym for a taxon; updating the origin or occurrence of a taxon within New Zealand; applying changes to the classification of a taxon; and updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name.
Database of Persian Names
catalog.elra.info
live.european-language-grid.eu
Updated Oct 7, 2019
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Persian Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0127/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
A unique resource that has been developed in cooperation with a team of native-speaker experts in Persian phonology. The data includes a confidence rank to indicate the relative likelihood that a variant will be encountered in the real world.
Facebook Names Dataset
academictorrents.com
bittorrent
Updated Nov 11, 2015
Cite
Ron Bowes (Skull Security) (2015). Facebook Names Dataset [Dataset]. https://academictorrents.com/details/e54c73099d291605e7579b90838c2cd86a8e9575
Available download formats: bittorrent (2991052604)
Dataset updated
Nov 11, 2015
Dataset authored and provided by
Ron Bowes (Skull Security)
License
https://academictorrents.com/nolicensespecified
Description
171 million names (100 million unique)

This torrent contains:
The URL of every searchable Facebook user's profile
The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc)
Processed lists, including first names with count, last names with count, potential usernames with count, etc
The programs I used to generate everything

So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)

Limitations

So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed. An additional limitation is that these are on
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
data.amerigeoss.org
Updated May 5, 2022
Cite
Social Security Administration (2022). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Dataset updated
May 5, 2022
Dataset provided by
Social Security Administration http://ssa.gov/
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
Plant Names Database Quarterly Changes May 2022 - Dataset - DataStore
datastore.landcareresearch.co.nz
Updated May 15, 2022
Cite
(2022). Plant Names Database Quarterly Changes May 2022 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-may-2022
Dataset updated
May 15, 2022
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary data on changes to data in the Plant Names Database in the following classes: the addition of new names; formal deprecation of duplicate names; changes to the status of the name as preferred name or synonym for a taxon; updating the origin or occurrence of a taxon within New Zealand; applying changes to the classification of a taxon; and updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name.
Database of Chinese Full Names
catalog.elra.info
live.european-language-grid.eu
Updated Oct 7, 2019
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Chinese Full Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0106/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
Covers Chinese full names of real people, including celebrities. Includes pinyin readings.
Irish Place names database - Dataset - PSB Data Catalogue
datacatalogue.gov.ie
Updated Mar 21, 2021
Cite
(2021). Irish Place names database - Dataset - PSB Data Catalogue [Dataset]. https://datacatalogue.gov.ie/dataset/irish-place-names-database
Dataset updated
Mar 21, 2021
Area covered
Ireland
Description
Database of Irish Place Names
Database of Arab Names in Arabic
catalogue.elra.info
live.european-language-grid.eu
Updated Oct 7, 2019
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Arab Names in Arabic [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0123/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
Description
A resource of Arab personal names and variants in the original Arabic script. This database covers several hundred thousand Arabic script variants, along with common spelling mistakes. Every Arabic name is normalized and vocalized.
NWT Place Names Database - Dataset - Open Data
opendata.gov.nt.ca
Updated Jan 31, 2017
Cite
(2017). NWT Place Names Database - Dataset - Open Data [Dataset]. https://opendata.gov.nt.ca/dataset/nwt-place-names-database
Dataset updated
Jan 31, 2017
Description
NWT Place Names Database
Canadian Geographical Names - CGN
open.canada.ca
catalogue.arctic-sdi.org
csv, kml, pdf, shp
Updated Apr 3, 2023
Cite
Natural Resources Canada (2023). Canadian Geographical Names - CGN [Dataset]. https://open.canada.ca/data/en/dataset/e27c6eba-3c5d-4051-9db2-082dc6411c2c
Available download formats: shp, csv, kml, pdf
Dataset updated
Apr 3, 2023
Dataset provided by
Ministry of Natural Resources of Canada https://www.nrcan.gc.ca/
License
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Area covered
Canada
Description
The Canadian Geographical Names Data Base (CGNDB) is the authoritative national database of Canada's geographical names. The purpose of the CGNDB is to store place names and their attributes that have been approved by the Geographical Names Board of Canada (GNBC), the national coordinating body responsible for standards and policies on place names. The CGNDB is maintained by Natural Resources Canada, through the Canada Centre for Mapping and Earth Observation. The geographic extent of the CGNDB is the Canadian landmass and water bodies; the temporal extent is from 1897 to present. This dataset is extracted from the CGNDB on a weekly basis, and consists of current officially approved names, feature type, coordinates of the feature, decision date, source, and other attributes. The output file formats for this product are: text (CSV), Shape (SHP), and Keyhole Markup Language (KML). Content advisory: The Canadian Geographical Names Database contains historical terminology that is considered racist, offensive and derogatory. Geographical naming authorities are in the process of addressing many offensive place names, but the work is still ongoing. For more information, please contact the GNBC Secretariat.
french_first_names_insee_2024
huggingface.co
Updated Nov 4, 2024
Cite
Ronan L.M. (2024). french_first_names_insee_2024 [Dataset]. http://doi.org/10.57967/hf/3431
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.57967/hf/3431
Dataset updated
Nov 4, 2024
Authors
Ronan L.M.
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Area covered
French
Description
French First Names from Death Records (1970-2024)

This dataset contains French first names extracted from death records provided by INSEE (French National Institute of Statistics and Economic Studies) covering the period from 1970 to September 2024.

Dataset Description

Data Source

The data is sourced from INSEE's death records database. It includes first names of deceased individuals in France, providing valuable insights into naming patterns across different… See the full description on the dataset page: https://huggingface.co/datasets/eltorio/french_first_names_insee_2024.
WGND 1.0
dataverse.harvard.edu
Updated Jul 27, 2018
Cite
Julio Raffo; Gema Lax-Martinez (2018). WGND 1.0 [Dataset]. http://doi.org/10.7910/DVN/YPRQH8
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/YPRQH8
Dataset updated
Jul 27, 2018
Dataset provided by
Harvard Dataverse
Authors
Julio Raffo; Gema Lax-Martinez
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Area covered
Wiegand Hall
Description
This dataset compiles the first version of the worldwide gender-name dictionary (WGND) including 6.2 million names for 182 different countries to disambiguate the gender.
Street Names
catalog.data.gov
data.lacity.org
+2 more
Updated May 10, 2025
Cite
data.lacity.org (2025). Street Names [Dataset]. https://catalog.data.gov/dataset/street-names-7385b
Dataset updated
May 10, 2025
Dataset provided by
data.lacity.org
Description
Official Street Names in the City of Los Angeles created and maintained by the Bureau of Engineering.
Plant Names Database Quarterly Changes February 2025 - Dataset - DataStore
datastore.landcareresearch.co.nz
Updated Feb 15, 2025
Cite
(2025). Plant Names Database Quarterly Changes February 2025 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-february-2025
Dataset updated
Feb 15, 2025
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary data on changes to data in the Plant Names Database in the following classes: the addition of new names; formal deprecation of duplicate names; changes to the status of the name as preferred name or synonym for a taxon; updating the origin or occurrence of a taxon within New Zealand; applying changes to the classification of a taxon; and updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name.
ArabLEX: Database of Arabic Place Names (DAP)
catalog.elra.info
Updated Oct 7, 2019
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). ArabLEX: Database of Arabic Place Names (DAP) [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-M0105/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
This database is part of the ArabLEX set of data which consists of the Database of Arabic General Vocabulary (DAG), Database of Arabic Place Names (DAP), Database of Foreign Names in Arabic (DAF) and Database of Arab Names (DAN) available from ELRA under references, respectively, ELRA-L0131, ELRA-M0105, ELRA-M0106 and ELRA-M0107.

This full-form Arabic-English place name database of over 21,000 lemmas and nearly 6.5 million forms provides worldwide coverage of common place names, given in standard MSA orthography, and includes all inflected and cliticized forms for each place name. In addition, precise phonemic transcriptions and full vowel diacritics are designed to enhance Arabic speech technology. Orthographic variants are also extensively covered.

This database is provided with three options: 1) proclitics, 2) phonetic information (CARS) and 3) orthographic variants. Subsets excluding some of the three proposed options may be provided upon demand. CARS is an accurate phonemic transcription. Optionally, phonetic transcriptions, IPA and/or SAMPA, can be provided, fine-tuned to a customer's specifications.

Quantity and size: 6,455,201 lines / 812 MB
File format: flat TSV text files
Samples and a specifications document available upon request.
Danish Census Handwritten Names (Large)
kaggle.com
Updated Feb 16, 2022
Cite
Simon Wittrock (2022). Danish Census Handwritten Names (Large) [Dataset]. https://www.kaggle.com/datasets/sdusimonwittrock/danish-census-handwritten-names-large
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 16, 2022
Dataset provided by
Kaggle http://kaggle.com/
Authors
Simon Wittrock
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is the large sample of minipics of the handwritten names from the Danish census from 1916. We use this sample for testing the performance of transfer learning from the HANA Database.

Each row contains a reference to the corresponding image as the first element and the name as the second element. All names are written in lower case letters and contain only characters which are used in Danish words, which implies 29 alphabetic characters, i.e., this database includes the letters æ, ø, and å.

More information can be found in: HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition and the full HANA Database can be found at HANA Database
#1 Domain Names International, Inc. dba 1dni.com Whois Database | Whois Data...
whoisdatacenter.com
csv
Cite
AllHeart Web Inc, #1 Domain Names International, Inc. dba 1dni.com Whois Database | Whois Data Center [Dataset]. https://whoisdatacenter.com/registrar/101/
Available download formats: csv
Dataset provided by
AllHeart Web
Authors
AllHeart Web Inc
License
https://whoisdatacenter.com/terms-of-use/
Time period covered
Jul 8, 2025 - Dec 31, 2025
Description
#1 Domain Names International, Inc. dba 1dni.com Whois Database: discover comprehensive ownership details, registration dates, and more for #1 Domain Names International, Inc. dba 1dni.com with Whois Data Center.
HANA Database
kaggle.com
Updated Jan 15, 2022
Cite
Simon Wittrock (2022). HANA Database [Dataset]. https://www.kaggle.com/sdusimonwittrock/hana-database/code
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 15, 2022
Dataset provided by
Kaggle http://kaggle.com/
Authors
Simon Wittrock
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This is the HANA database of handwritten personal names as introduced in the paper HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition (official code available here). The minipics are from police register sheets from Copenhagen which cover all adults (above the age of 10) residing in the capital of Denmark, Copenhagen, in the period from 1890 to 1923.

The labels in the .csv files refer to the main character on the original register sheets. Each row contains a reference to the corresponding image as the first element and the name as the second element. The HANA database consists of 1,105,904 images with corresponding labels. The last name is always only one word; if multiple last names were transcribed, the last of these was chosen as the last name, while the remaining ones were moved to the end of the first names. The first names can consist of up to nine individual words.

All names are written in lower case letters and contain only characters which are used in Danish words, which implies 29 alphabetic characters i.e., this database includes the letters æ, ø, and å.
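
The label layout described above (image reference first, name second, a single-word last name at the end of the name) can be sketched in Python as follows. The sample row, the image file name, and the helper `split_name` are hypothetical, and the sketch assumes the words of a name are separated by spaces; the actual .csv files may differ.

```python
import csv
import io
from typing import List, Tuple


def split_name(full_name: str) -> Tuple[List[str], str]:
    """Split a label into (first names, last name).

    Assumes the last word is the single-word last name and
    all preceding words are first names.
    """
    words = full_name.split()
    if not words:
        raise ValueError("empty name")
    return words[:-1], words[-1]


# Hypothetical row: image reference first, lower-case name second.
sample_csv = io.StringIO("example_image.jpg,anna marie sørensen\n")
for image_ref, name in csv.reader(sample_csv):
    first_names, last_name = split_name(name)
    print(image_ref, first_names, last_name)
```

Treating the final word as the last name mirrors the stated convention that extra transcribed last names are moved to the end of the first names.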

If anything is missing or if you are interested in the original documents from Copenhagen Archives to improve, e.g., the segmentation, feel free to reach out at sfw@sam.sdu.dk.

We wish you the best of luck.

Database of Chinese Names
8 scholarly articles cite this dataset
