https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Chinese name components, accompanied by accurate pinyin readings, gender codes, and flags denoting whether a name is a given name, a surname, or both.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Database of given names, surnames, and middle names ("midnames") from across the Russian Federation, used as a source for training algorithms that identify gender from a full name.
The dataset is prepared for MongoDB. It includes a MongoDB dump and a dump of the tables as JSON Lines files.
Used in the gender identification and full-name parsing software at https://github.com/datacoon/russiannames
Available under Creative Commons CC BY-SA by default.
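Since the tables are shipped as JSON Lines files (one JSON object per line), they can be read without MongoDB. The following is a minimal sketch only: the field names `text` and `gender` in the sample are hypothetical and may not match the actual dump.

```python
import io
import json
from typing import Iterator, TextIO


def read_json_lines(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Hypothetical sample data; the real field names in the dump may differ.
sample = io.StringIO(
    '{"text": "Иван", "gender": "m"}\n'
    '\n'
    '{"text": "Анна", "gender": "f"}\n'
)
records = list(read_json_lines(sample))
```

For a real file, the same function can be given the handle returned by `open(path, encoding="utf-8")`.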
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary data on changes to data in the Plant Names Database in the following classes: the addition of new names; formal deprecation of duplicate names; changes to the status of a name as preferred name or synonym for a taxon; updates to the origin or occurrence of a taxon within New Zealand; changes to the classification of a taxon; and updates to the scientific article applied to the taxa to determine whether a name is a synonym or preferred name.
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
A unique resource that has been developed in cooperation with a team of native-speaker experts in Persian phonology. The data includes a confidence rank to indicate the relative likelihood that a variant will be encountered in the real world.
https://academictorrents.com/nolicensespecified
171 million names (100 million unique).
This torrent contains:
The URL of every searchable Facebook user's profile
The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc.)
Processed lists, including first names with count, last names with count, potential usernames with count, etc.
The programs I used to generate everything
So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)
Limitations
So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed. An additional limitation is that these are on
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary data on changes to data in the Plant Names Database in the following classes: the addition of new names; formal deprecation of duplicate names; changes to the status of a name as preferred name or synonym for a taxon; updates to the origin or occurrence of a taxon within New Zealand; changes to the classification of a taxon; and updates to the scientific article applied to the taxa to determine whether a name is a synonym or preferred name.
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Covers Chinese full names of real people, including celebrities. Includes pinyin readings.
Database of Irish Place Names
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
A resource of Arab personal names and variants in the original Arabic script. The database covers several hundred thousand Arabic-script variants, along with common spelling mistakes. Every Arabic name is normalized and vocalized.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The Canadian Geographical Names Data Base (CGNDB) is the authoritative national database of Canada's geographical names. The purpose of the CGNDB is to store place names and their attributes that have been approved by the Geographical Names Board of Canada (GNBC), the national coordinating body responsible for standards and policies on place names. The CGNDB is maintained by Natural Resources Canada, through the Canada Centre for Mapping and Earth Observation. The geographic extent of the CGNDB is the Canadian landmass and water bodies; the temporal extent is from 1897 to present. This dataset is extracted from the CGNDB on a weekly basis, and consists of current officially approved names, feature type, coordinates of the feature, decision date, source, and other attributes. The output file formats for this product are: text (CSV), Shape (SHP), and Keyhole Markup Language (KML). Content advisory: The Canadian Geographical Names Database contains historical terminology that is considered racist, offensive and derogatory. Geographical naming authorities are in the process of addressing many offensive place names, but the work is still ongoing. For more information, please contact the GNBC Secretariat.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
French First Names from Death Records (1970-2024)
This dataset contains French first names extracted from death records provided by INSEE (French National Institute of Statistics and Economic Studies) covering the period from 1970 to September 2024.
Dataset Description
Data Source
The data is sourced from INSEE's death records database. It includes first names of deceased individuals in France, providing valuable insights into naming patterns across different… See the full description on the dataset page: https://huggingface.co/datasets/eltorio/french_first_names_insee_2024.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset compiles the first version of the worldwide gender-name dictionary (WGND) including 6.2 million names for 182 different countries to disambiguate the gender.
Official Street Names in the City of Los Angeles created and maintained by the Bureau of Engineering.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary data on changes to data in the Plant Names Database in the following classes: the addition of new names; formal deprecation of duplicate names; changes to the status of a name as preferred name or synonym for a taxon; updates to the origin or occurrence of a taxon within New Zealand; changes to the classification of a taxon; and updates to the scientific article applied to the taxa to determine whether a name is a synonym or preferred name.
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
This database is part of the ArabLEX set of data, which consists of the Database of Arabic General Vocabulary (DAG), Database of Arabic Place Names (DAP), Database of Foreign Names in Arabic (DAF) and Database of Arab Names (DAN), available from ELRA under references, respectively, ELRA-L0131, ELRA-M0105, ELRA-M0106 and ELRA-M0107.
This full-form Arabic-English place name database of over 21,000 lemmas and nearly 6.5 million forms provides worldwide coverage of common place names, given in standard MSA orthography, and includes all inflected and cliticized forms for each place name. In addition, precise phonemic transcriptions and full vowel diacritics are designed to enhance Arabic speech technology. Orthographic variants are also extensively covered.
This database is provided with three options: 1) proclitics, 2) phonetic information (CARS) and 3) orthographic variants. Subsets excluding some of the three proposed options may be provided upon demand. CARS is an accurate phonemic transcription. Optionally, phonetic transcriptions, IPA and/or SAMPA, can be provided, fine-tuned to a customer's specifications.
Quantity and size: 6,455,201 lines / 812 MB
File format: flat TSV text files
Samples and a specifications document available upon request.
https://creativecommons.org/publicdomain/zero/1.0/
This is the large sample of minipics of the handwritten names from the Danish census from 1916. We use this sample for testing the performance of transfer learning from the HANA Database.
Each row contains a reference to the corresponding image as the first element and the name as the second element. All names are written in lower case letters and contain only characters which are used in Danish words, which implies 29 alphabetic characters, i.e., this database includes the letters æ, ø, and å.
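The character constraint on the labels can be checked with a short validator. This is a sketch under stated assumptions: the 29 characters are taken to be a-z plus æ, ø, and å, and words within a name are assumed to be separated by single spaces. Neither the separator nor the function name `is_valid_label` comes from the dataset documentation.

```python
import re

# Assumption: 26 basic Latin letters plus æ, ø, å (29 in total),
# lower case only, with single spaces between the words of a name.
NAME_PATTERN = re.compile(r"[a-zæøå]+(?: [a-zæøå]+)*")


def is_valid_label(name: str) -> bool:
    """Return True if the label uses only the assumed 29-letter alphabet."""
    return NAME_PATTERN.fullmatch(name) is not None
```

A label that fails this check, such as one containing upper case letters or punctuation, would indicate either a transcription problem or a difference between these assumptions and the actual file format.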
More information can be found in: HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition and the full HANA Database can be found at HANA Database
https://whoisdatacenter.com/terms-of-use/
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This is the HANA database of handwritten personal names as introduced in the paper HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition (official code available here). The minipics are from police register sheets from Copenhagen which cover all adults (above the age of 10) residing in the capital of Denmark, Copenhagen, in the period from 1890 to 1923.
The labels in the .csv files refer to the main character on the original register sheets. Each row contains a reference to the corresponding image as the first element and the name as the second element. The HANA database consists of 1,105,904 images with corresponding labels. The last name is always only one word; if multiple last names were transcribed, the last of these was chosen as the last name, while the remaining ones were moved to the end of the first names. The first names can consist of up to nine individual words.
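The last-name rule described above can be illustrated as follows. This is only a sketch of the stated rule, not the authors' code; the function name `normalize_name` and the list-based input are hypothetical.

```python
from typing import List, Tuple


def normalize_name(first_names: List[str], last_names: List[str]) -> Tuple[List[str], str]:
    """Keep the final transcribed last name as the single last name and
    append any other transcribed last names to the end of the first names."""
    if not last_names:
        raise ValueError("at least one last name is required")
    *extra, last = last_names
    return first_names + extra, last


# Two last names were transcribed: "holm" moves to the first names.
firsts, last = normalize_name(["anna", "marie"], ["holm", "jensen"])
```

Here `firsts` becomes `["anna", "marie", "holm"]` and `last` becomes `"jensen"`.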
All names are written in lower case letters and contain only characters which are used in Danish words, which implies 29 alphabetic characters i.e., this database includes the letters æ, ø, and å.
If anything is missing or if you are interested in the original documents from Copenhagen Archives to improve, e.g., the segmentation, feel free to reach out at sfw@sam.sdu.dk.
We wish you the best of luck.
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Chinese name components, accompanied by accurate pinyin readings, gender codes, and flags denoting whether a name is a given name, a surname, or both.