100+ datasets found
  1. Database of Chinese Names

    • catalog.elra.info
    • live.european-language-grid.eu
    Updated Oct 7, 2019
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Chinese Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0129/
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Area covered
    China
    Description

    Chinese name components, accompanied by accurate pinyin readings, gender codes, and flags denoting whether the name is a given name, a surname, or both.

  2. Database of Russian names, surnames and midnames for gender identification

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Ivan Begtin (2020). Database of Russian names, surnames and midnames for gender identification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2747010
    Dataset authored and provided by
    Ivan Begtin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A database of first names, surnames and middle names (patronymics) from across the Russian Federation, used as a source for training algorithms that identify gender from a full name.

    The dataset is prepared for a MongoDB database and is distributed both as a MongoDB dump and as table dumps in JSON Lines format.

    It is used in gender identification and full-name parsing software (https://github.com/datacoon/russiannames).

    It is available under Creative Commons CC BY-SA by default.
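    The JSON Lines table dumps can be read one record per line. Below is a minimal Python sketch, assuming each line is a single JSON object. The field names `text` and `gender`, the helper `count_by_field`, and the sample records are hypothetical and may not match the real dump.

```python
import io
import json
from collections import Counter


def count_by_field(lines, field):
    """Count JSON Lines records by the value of one field.

    Records without the field are counted under None.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record.get(field)] += 1
    return counts


# Hypothetical records; the real dump's field names may differ.
sample = io.StringIO(
    '{"text": "Ivan", "gender": "m"}\n'
    '{"text": "Maria", "gender": "f"}\n'
    '{"text": "Petr", "gender": "m"}\n'
)
print(count_by_field(sample, "gender"))
```

    For a real dump file, pass an open handle such as `open(path, encoding="utf-8")` in place of the in-memory sample.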

  3. Plant Names Database Quarterly Changes May 2025 - Dataset - DataStore

    • datastore.landcareresearch.co.nz
    Updated May 15, 2025
    (2025). Plant Names Database Quarterly Changes May 2025 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-may-2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary data on changes to data in the Plant Names Database in the following classes:

      • the addition of new names
      • formal deprecation of duplicate names
      • changes to the status of the name as preferred name or synonym for a taxon
      • updating the origin or occurrence of a taxon within New Zealand
      • applying changes to the classification of a taxon
      • updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name

  4. Database of Persian Names

    • catalog.elra.info
    • live.european-language-grid.eu
    Updated Oct 7, 2019
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Persian Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0127/
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    A unique resource that has been developed in cooperation with a team of native-speaker experts in Persian phonology. The data includes a confidence rank to indicate the relative likelihood that a variant will be encountered in the real world.

  5. Facebook Names Dataset

    • academictorrents.com
    bittorrent
    Updated Nov 11, 2015
    Ron Bowes (Skull Security) (2015). Facebook Names Dataset [Dataset]. https://academictorrents.com/details/e54c73099d291605e7579b90838c2cd86a8e9575
    Available download formats: bittorrent (2991052604)
    Dataset authored and provided by
    Ron Bowes (Skull Security)
    License

    https://academictorrents.com/nolicensespecified

    Description

    171 million names (100 million unique).

    This torrent contains:

      • The URL of every searchable Facebook user's profile
      • The name of every searchable Facebook user, both unique and by count (perfect for post-processing, data mining, etc.)
      • Processed lists, including first names with count, last names with count, potential usernames with count, etc.
      • The programs I used to generate everything

    So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)

    Limitations

    So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed. An additional limitation is that these are on
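    Processed lists like "first names with count" and "last names with count" can be derived from a list of full names. Below is a minimal Python sketch, assuming names are whitespace-separated, with the first token taken as the first name and the last token as the last name. It is not the author's program; the helper name and the sample names are invented for illustration.

```python
from collections import Counter


def first_and_last_counts(full_names):
    """Return (first-name counts, last-name counts) for whitespace-separated names."""
    firsts, lasts = Counter(), Counter()
    for full in full_names:
        parts = full.split()
        if not parts:
            continue  # skip empty entries
        firsts[parts[0].lower()] += 1
        if len(parts) > 1:
            lasts[parts[-1].lower()] += 1  # single-token names have no last name
    return firsts, lasts


names = ["Alice Smith", "alice Jones", "Bob Smith", "Cher"]
firsts, lasts = first_and_last_counts(names)
print(firsts.most_common(2))
print(lasts.most_common(1))
```

    Lower-casing merges capitalization variants such as "Alice" and "alice" into one count. Whether the original lists did this is not stated.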

  6. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    • data.amerigeoss.org
    Updated May 5, 2022
    Social Security Administration (2022). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
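    The four fields named above can be aggregated once the files are parsed. The sketch below assumes the layout of the SSA national `names.zip` download: one `yobYYYY.txt` file per birth year, with comma-separated `name,sex,count` rows and the year taken from the filename. This listing does not itself describe that layout. The helper names are illustrative, and the counts in the sample are invented.

```python
import csv
import io
import re
from collections import defaultdict


def load_year(filename, text):
    """Parse one per-year file into (name, year, sex, count) tuples."""
    match = re.fullmatch(r"yob(\d{4})\.txt", filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    year = int(match.group(1))
    return [
        (name, year, sex, int(count))
        for name, sex, count in csv.reader(io.StringIO(text))
    ]


def totals_by_sex(rows):
    """Sum the count field per sex code."""
    totals = defaultdict(int)
    for _name, _year, sex, count in rows:
        totals[sex] += count
    return dict(totals)


# Invented counts, for illustration only.
rows = load_year("yob1900.txt", "Anna,F,10\nEmma,F,5\nJohn,M,7\n")
print(totals_by_sex(rows))
```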

  7. Plant Names Database Quarterly Changes May 2022 - Dataset - DataStore

    • datastore.landcareresearch.co.nz
    Updated May 15, 2022
    (2022). Plant Names Database Quarterly Changes May 2022 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-may-2022
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary data on changes to data in the Plant Names Database in the following classes:

      • the addition of new names
      • formal deprecation of duplicate names
      • changes to the status of the name as preferred name or synonym for a taxon
      • updating the origin or occurrence of a taxon within New Zealand
      • applying changes to the classification of a taxon
      • updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name

  8. Database of Chinese Full Names

    • catalog.elra.info
    • live.european-language-grid.eu
    Updated Oct 7, 2019
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Chinese Full Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0106/
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    Covers Chinese full names of real people, including celebrities. Includes pinyin readings.

  9. Irish Place names database - Dataset - PSB Data Catalogue

    • datacatalogue.gov.ie
    Updated Mar 21, 2021
    (2021). Irish Place names database - Dataset - PSB Data Catalogue [Dataset]. https://datacatalogue.gov.ie/dataset/irish-place-names-database
    Area covered
    Ireland
    Description

    Database of Irish Place Names (External Link)

  10. Database of Arab Names in Arabic

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Oct 7, 2019
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Arab Names in Arabic [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-L0123/
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    Description

    A resource of Arab personal names and their variants in the original Arabic script. The database covers several hundred thousand Arabic-script variants, along with common spelling mistakes. Every Arabic name is normalized and vocalized.
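    Because the names are vocalized, matching them against unvocalized text usually means comparing forms with the vowel diacritics removed. Below is a minimal Python sketch of that comparison. It assumes the marks to strip are the Arabic tanween, short vowels, shadda and sukun (U+064B to U+0652) plus the superscript alef (U+0670). It is not ELRA's normalization procedure, and the helper names are illustrative.

```python
import re

# Tanween, short vowels, shadda and sukun (U+064B-U+0652) plus superscript alef (U+0670).
_ARABIC_DIACRITICS = re.compile("[\u064b-\u0652\u0670]")


def strip_diacritics(text):
    """Remove Arabic vowel marks so vocalized and unvocalized forms compare equal."""
    return _ARABIC_DIACRITICS.sub("", text)


def same_unvocalized(a, b):
    """True if two spellings differ only in the stripped vowel marks."""
    return strip_diacritics(a) == strip_diacritics(b)


vocalized = "\u0645\u064f\u062d\u064e\u0645\u0651\u064e\u062f"  # "Muhammad" with vowel marks
plain = "\u0645\u062d\u0645\u062f"  # "Muhammad" without vowel marks
print(same_unvocalized(vocalized, plain))
```

    Letters that carry a hamza as a precomposed code point (for example U+0623) are left untouched, because they fall outside the stripped range.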

  11. NWT Place Names Database - Dataset - Open Data

    • opendata.gov.nt.ca
    Updated Jan 31, 2017
    (2017). NWT Place Names Database - Dataset - Open Data [Dataset]. https://opendata.gov.nt.ca/dataset/nwt-place-names-database
    Description

    NWT Place Names Database

  12. Canadian Geographical Names - CGN

    • open.canada.ca
    • catalogue.arctic-sdi.org
    csv, kml, pdf, shp
    Updated Apr 3, 2023
    Natural Resources Canada (2023). Canadian Geographical Names - CGN [Dataset]. https://open.canada.ca/data/en/dataset/e27c6eba-3c5d-4051-9db2-082dc6411c2c
    Dataset provided by
    Ministry of Natural Resources of Canada (https://www.nrcan.gc.ca/)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    The Canadian Geographical Names Data Base (CGNDB) is the authoritative national database of Canada's geographical names. The purpose of the CGNDB is to store place names and their attributes that have been approved by the Geographical Names Board of Canada (GNBC), the national coordinating body responsible for standards and policies on place names. The CGNDB is maintained by Natural Resources Canada, through the Canada Centre for Mapping and Earth Observation. The geographic extent of the CGNDB is the Canadian landmass and water bodies; the temporal extent is from 1897 to present.

    This dataset is extracted from the CGNDB on a weekly basis and consists of current officially approved names, feature type, coordinates of the feature, decision date, source, and other attributes. The output file formats for this product are: text (CSV), Shape (SHP), and Keyhole Markup Language (KML).

    Content advisory: The Canadian Geographical Names Database contains historical terminology that is considered racist, offensive and derogatory. Geographical naming authorities are in the process of addressing many offensive place names, but the work is still ongoing. For more information, please contact the GNBC Secretariat.
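    The extract's attributes (name, feature type, coordinates, decision date) lend themselves to simple filtering once the CSV is loaded. Below is a minimal Python sketch. Several details are hypothetical stand-ins for the real CGN header and data: the column names `name`, `feature_type`, `latitude`, `longitude` and `decision_date`, the ISO date format, the helper name, and the sample rows.

```python
import csv
import io
from datetime import date


def approved_features(csv_text, feature_type, since):
    """Yield (name, lat, lon) for rows of one feature type decided on or after `since`.

    The column names used here are hypothetical placeholders for the CGN CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row["feature_type"] != feature_type:
            continue
        decided = date.fromisoformat(row["decision_date"])  # assumes YYYY-MM-DD
        if decided >= since:
            yield row["name"], float(row["latitude"]), float(row["longitude"])


# Invented rows, for illustration only.
sample = (
    "name,feature_type,latitude,longitude,decision_date\n"
    "Example Lake,Lake,60.5,-115.2,1999-05-01\n"
    "Sample Hill,Hill,61.0,-114.0,2005-03-12\n"
    "Test Lake,Lake,62.1,-113.4,2010-07-30\n"
)
print(list(approved_features(sample, "Lake", date(2000, 1, 1))))
```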

  13. french_first_names_insee_2024

    • huggingface.co
    Updated Nov 4, 2024
    Ronan L.M. (2024). french_first_names_insee_2024 [Dataset]. http://doi.org/10.57967/hf/3431
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Authors
    Ronan L.M.
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    France
    Description

    French First Names from Death Records (1970-2024)

    This dataset contains French first names extracted from death records provided by INSEE (French National Institute of Statistics and Economic Studies) covering the period from 1970 to September 2024.

    Dataset Description

    Data Source

    The data is sourced from INSEE's death records database. It includes first names of deceased individuals in France, providing valuable insights into naming patterns across different… See the full description on the dataset page: https://huggingface.co/datasets/eltorio/french_first_names_insee_2024.

  14. WGND 1.0

    • dataverse.harvard.edu
    Updated Jul 27, 2018
    Julio Raffo; Gema Lax-Martinez (2018). WGND 1.0 [Dataset]. http://doi.org/10.7910/DVN/YPRQH8
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset provided by
    Harvard Dataverse
    Authors
    Julio Raffo; Gema Lax-Martinez
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Worldwide (182 countries)
    Description

    This dataset compiles the first version of the Worldwide Gender-Name Dictionary (WGND), which includes 6.2 million names across 182 countries and is used to disambiguate gender.
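    Because the dictionary is organized by country, a lookup is naturally keyed on the pair (name, country): the same name can map to different genders in different countries. Below is a minimal Python sketch. It assumes rows in a hypothetical (name, country code, gender) layout, not the actual WGND column structure. The rows and helper names are invented for illustration.

```python
def build_index(rows):
    """Index (name, country) -> gender, normalizing names to lower case."""
    return {
        (name.strip().lower(), country.upper()): gender
        for name, country, gender in rows
    }


def lookup_gender(index, name, country):
    """Return the gender code for a name in a country, or None if unknown."""
    return index.get((name.strip().lower(), country.upper()))


# Invented rows in a hypothetical (name, country code, gender) layout.
rows = [
    ("Andrea", "IT", "M"),
    ("Andrea", "DE", "F"),
    ("Maria", "ES", "F"),
]
index = build_index(rows)
print(lookup_gender(index, "andrea", "it"), lookup_gender(index, "Andrea", "DE"))
```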

  15. Street Names

    • catalog.data.gov
    • data.lacity.org
    • +2 more
    Updated May 10, 2025
    data.lacity.org (2025). Street Names [Dataset]. https://catalog.data.gov/dataset/street-names-7385b
    Dataset provided by
    data.lacity.org
    Description

    Official street names in the City of Los Angeles, created and maintained by the Bureau of Engineering.

  16. Plant Names Database Quarterly Changes February 2025 - Dataset - DataStore

    • datastore.landcareresearch.co.nz
    Updated Feb 15, 2025
    (2025). Plant Names Database Quarterly Changes February 2025 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-february-2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary data on changes to data in the Plant Names Database in the following classes:

      • the addition of new names
      • formal deprecation of duplicate names
      • changes to the status of the name as preferred name or synonym for a taxon
      • updating the origin or occurrence of a taxon within New Zealand
      • applying changes to the classification of a taxon
      • updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name

  17. ArabLEX: Database of Arabic Place Names (DAP)

    • catalog.elra.info
    Updated Oct 7, 2019
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). ArabLEX: Database of Arabic Place Names (DAP) [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-M0105/
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    This database is part of the ArabLEX set of data, which consists of the Database of Arabic General Vocabulary (DAG), Database of Arabic Place Names (DAP), Database of Foreign Names in Arabic (DAF) and Database of Arab Names (DAN), available from ELRA under references, respectively, ELRA-L0131, ELRA-M0105, ELRA-M0106 and ELRA-M0107.

    This full-form Arabic-English place name database of over 21,000 lemmas and nearly 6.5 million forms provides worldwide coverage of common place names, given in standard MSA orthography, and includes all inflected and cliticized forms for each place name. In addition, precise phonemic transcriptions and full vowel diacritics are designed to enhance Arabic speech technology. Orthographic variants are also extensively covered.

    This database is provided with three options: 1) proclitics, 2) phonetic information (CARS) and 3) orthographic variants. Subsets excluding some of the three proposed options may be provided upon demand. CARS is an accurate phonemic transcription. Optionally, phonetic transcriptions, IPA and/or SAMPA, can be provided, fine-tuned to a customer's specifications.

    Quantity and size: 6,455,201 lines / 812 MB
    File format: flat TSV text files
    Samples and a specifications document are available upon request.
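    At 6,455,201 lines (812 MB) of flat TSV, streaming rows is safer than loading the whole file at once. Below is a minimal Python sketch that counts rows per value of one column, assuming tab-separated rows with no quoting. The description does not say which column holds the lemma, so the column index, the helper name, and the placeholder sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter


def count_forms_per_key(tsv_lines, key_column):
    """Stream a TSV file and count rows per value of one column.

    Rows are read one at a time, so the file never has to fit in memory.
    Rows too short to contain `key_column` (including blank lines) are skipped.
    """
    counts = Counter()
    for row in csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > key_column:
            counts[row[key_column]] += 1
    return counts


# Placeholder rows in a hypothetical (form, lemma) layout.
sample = io.StringIO("form_a\tlemma_1\nform_b\tlemma_1\nform_c\tlemma_2\n")
print(count_forms_per_key(sample, 1))
```

    For the real files, pass `open(path, encoding="utf-8", newline="")`, which is the file-opening pattern the `csv` module documentation recommends.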

  18. Danish Census Handwritten Names (Large)

    • kaggle.com
    Updated Feb 16, 2022
    Simon Wittrock (2022). Danish Census Handwritten Names (Large) [Dataset]. https://www.kaggle.com/datasets/sdusimonwittrock/danish-census-handwritten-names-large
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Simon Wittrock
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is the large sample of minipics of the handwritten names from the Danish census from 1916. We use this sample for testing the performance of transfer learning from the HANA Database.

    Each row contains a reference to the corresponding image as the first element and the name as the second element. All names are written in lower case letters and contain only characters which are used in Danish words, which implies 29 alphabetic characters, i.e., this database includes the letters æ, ø, and å.
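    The row layout and the 29-letter alphabet described above can be checked while reading the labels. Below is a minimal Python sketch. It assumes comma-separated rows without a header, and that spaces separate the words of a name; if the file has a header row, skip it first. The helper name and the sample image references are hypothetical.

```python
import csv
import io
import string

# The 26 letters a-z plus æ, ø and å: the 29 alphabetic characters described above.
DANISH_LETTERS = frozenset(string.ascii_lowercase + "æøå")


def read_labels(csv_text):
    """Return (image_ref, name) pairs, rejecting names outside the 29-letter alphabet."""
    pairs = []
    for image_ref, name, *_rest in csv.reader(io.StringIO(csv_text)):
        # Spaces separate the words of a name; every other character must be a letter.
        bad = {ch for ch in name if ch != " " and ch not in DANISH_LETTERS}
        if bad:
            raise ValueError(f"{image_ref}: unexpected characters {sorted(bad)}")
        pairs.append((image_ref, name))
    return pairs


print(read_labels("img_0001.jpg,søren jensen\n"))
```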

    More information can be found in the paper "HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition", and the full database is published as the HANA Database.

  19. #1 Domain Names International, Inc. dba 1dni.com Whois Database | Whois Data...

    • whoisdatacenter.com
    csv
    AllHeart Web Inc, #1 Domain Names International, Inc. dba 1dni.com Whois Database | Whois Data Center [Dataset]. https://whoisdatacenter.com/registrar/101/
    Dataset provided by
    AllHeart Web
    Authors
    AllHeart Web Inc
    License

    https://whoisdatacenter.com/terms-of-use/

    Time period covered
    Jul 8, 2025 - Dec 31, 2025
    Description

    #1 Domain Names International, Inc. dba 1dni.com Whois Database: discover comprehensive ownership details, registration dates, and more for #1 Domain Names International, Inc. dba 1dni.com with Whois Data Center.

  20. HANA Database

    • kaggle.com
    Updated Jan 15, 2022
    Simon Wittrock (2022). HANA Database [Dataset]. https://www.kaggle.com/sdusimonwittrock/hana-database/code
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Simon Wittrock
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    This is the HANA database of handwritten personal names as introduced in the paper HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition (the official code is linked from the original dataset page). The minipics are from police register sheets from Copenhagen which cover all adults (above the age of 10) residing in the capital of Denmark, Copenhagen, in the period from 1890 to 1923.

    The labels in the .csv files refer to the main character on the original register sheets. Each row contains a reference to the corresponding image as the first element and the name as the second element. The HANA database consists of 1,105,904 images with corresponding labels. The last name is always only one word, and if multiple last names were transcribed, the last of these was chosen as the last name, while the remaining ones were moved to the end of the first names. The first names can consist of up to nine individual words.
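    The labelling rules above translate directly into code. The last name is the final word, up to nine first names precede it, and extra last names are moved to the end of the first names. Below is a minimal Python sketch under the assumption that words are separated by single spaces. The helper names and sample names are hypothetical, and this is not the authors' code.

```python
MAX_FIRST_NAMES = 9  # "The first names can consist of up to nine individual words."


def split_label(label):
    """Split a HANA-style label into (first_names, last_name).

    The last name is always the final word; everything before it is first names.
    """
    words = label.split()
    if not words:
        raise ValueError("empty label")
    first_names, last_name = words[:-1], words[-1]
    if len(first_names) > MAX_FIRST_NAMES:
        raise ValueError(f"more than {MAX_FIRST_NAMES} first names: {label!r}")
    return first_names, last_name


def merge_last_names(first_names, last_names):
    """Keep the last of several transcribed last names and move the others
    to the end of the first names (raises ValueError if last_names is empty)."""
    *moved, last_name = last_names
    return first_names + moved, last_name


print(merge_last_names(["hans", "peter"], ["nielsen", "hansen"]))
print(split_label("hans peter nielsen hansen"))
```

    Both calls describe the same person. Applying the merge rule to the transcribed parts yields exactly the split that the final label encodes.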

    All names are written in lower case letters and contain only characters which are used in Danish words, which implies 29 alphabetic characters, i.e., this database includes the letters æ, ø, and å.

    If anything is missing or if you are interested in the original documents from Copenhagen Archives to improve, e.g., the segmentation, feel free to reach out at sfw@sam.sdu.dk.

    We wish you the best of luck.

The Database of Chinese Names (entry 1) is cited by 8 scholarly articles, according to Google Scholar.