100+ datasets found
  1. List of first names and surnames

    • data.europa.eu
    csv
    Updated May 18, 2023
    Cite
    Christian Quest (2023). List of first names and surnames [Dataset]. https://data.europa.eu/data/datasets/5bc35259634f41122d982759
    Explore at:
    Available download formats: csv (2104259), csv (10841127)
    Dataset updated
    May 18, 2023
    Dataset authored and provided by
    Christian Quest
    License

    https://www.etalab.gouv.fr/licence-ouverte-open-licence

    Description

    In order to facilitate the anonymisation of data, this list of first names and surnames was extracted from the SIRENE database of INSEE.

    For each first name and surname, the number of appearances is indicated.

    ATTENTION: No content check is done, and these lists may contain anomalies present in the original database!

  2. Forenames and Surnames with Gender and Country

    • kaggle.com
    zip
    Updated Jul 1, 2024
    Cite
    Erpel1 (2024). Forenames and Surnames with Gender and Country [Dataset]. https://www.kaggle.com/datasets/erpel1/forenames-and-surnames-with-gender-and-country
    Explore at:
    Available download formats: zip (167277328 bytes)
    Dataset updated
    Jul 1, 2024
    Authors
    Erpel1
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains 5M forenames (first names) and 8M surnames (last names) from 105 different countries. They are annotated with gender, country and the number of occurrences in the original data. Since most names appear in multiple countries and sometimes not just for one gender, one name typically has multiple rows. The data is aggregated from https://github.com/philipperemy/name-dataset?tab=readme-ov-file#full-dataset.

    A list of all countries can be found in "country_codes.csv".

    If you want to look at names from your country, go to either "forenames.csv" or "surnames.csv" and click on the three horizontal bars in the head of the country column. Then search for your country_code with two capital letters and click apply.
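    The same country filter can be applied outside the web viewer. The sketch below is only an illustration: the column names (forename, gender, country, count) and the sample rows are assumptions, not taken from the actual files, so check the real header of "forenames.csv" before using it. Because one name typically has multiple rows, the counts are summed per name.

```python
import csv
import io
from collections import Counter

# Invented sample rows; the header names are assumed, not confirmed by the dataset.
SAMPLE = """forename,gender,country,count
Anna,F,DE,10
Anna,F,FR,4
Kim,F,DE,3
Kim,M,DE,2
Luca,M,IT,7
"""

def top_names(csv_file, country_code, n=3):
    """Sum occurrences per name for one two-letter country code."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        if row["country"] == country_code:
            # A name can appear once per gender, so add the rows together.
            totals[row["forename"]] += int(row["count"])
    return totals.most_common(n)

result = top_names(io.StringIO(SAMPLE), "DE")
```

    For the real file, pass an open file handle (for example open("forenames.csv", newline="", encoding="utf-8")) instead of the in-memory sample.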

  3. Baby Names

    • kaggle.com
    zip
    Updated Feb 9, 2021
    Cite
    Evan Zhang (2021). Baby Names [Dataset]. https://www.kaggle.com/datasets/ironicninja/baby-names
    Explore at:
    Available download formats: zip (5656233 bytes)
    Dataset updated
    Feb 9, 2021
    Authors
    Evan Zhang
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Dataset of US baby names from 1910 to 2021. Includes State, Sex, Year, Name, and Count as features.

    Inspiration

    Mainly used for a tutorial but can be used for classification/other visualizations.

  4. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    Updated Jul 4, 2025
    Cite
    Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.

  5. BDM Data - Popular Baby Names 1952 to 2024

    • data.nsw.gov.au
    csv
    Updated Oct 7, 2025
    Cite
    NSW Registry of Births Deaths and Marriages (2025). BDM Data - Popular Baby Names 1952 to 2024 [Dataset]. https://data.nsw.gov.au/data/dataset/popular-baby-names-from-1952
    Explore at:
    Available download formats: csv (373639)
    Dataset updated
    Oct 7, 2025
    Dataset authored and provided by
    NSW Registry of Births Deaths and Marriages
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Finding the right name to give to your baby can be fun and challenging. The NSW Registry of Births, Deaths and Marriages has provided a list of popular names to help you find the one that's just right for you.

    The annual list of the most popular baby names in NSW is based on the names registered in the previous calendar year. We have combined the annual lists from 1952 to 2024 and made them available in Excel.

  6. Master Street Name Table

    • catalog.data.gov
    • data.nola.gov
    • +3more
    Updated Feb 9, 2024
    Cite
    data.nola.gov (2024). Master Street Name Table [Dataset]. https://catalog.data.gov/dataset/master-street-name-table
    Explore at:
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    data.nola.gov
    Description

    This list is a work in progress and will be updated at least quarterly. This version updates column names and corrects spellings of several streets in order to alleviate confusion and simplify street name research. It represents an inventory of official street name spellings in the City of New Orleans. Several sources contain various spellings and formats of street names. This list represents street name spellings and formats researched by the City of New Orleans GIS and City Planning Commission.

    Note: This list may not represent what is currently displayed on street signs. The City of New Orleans official street list is derived from the New Orleans street centerline file, the 9-1-1 centerline file, and CPC plat maps. Fields include the full street name and the parsed elements along with abbreviations using US Postal Standards. We invite your input as we work toward one enterprise street name list.

    Status:

    Current: Currently a known used street name in New Orleans.

    Other: Currently a known used street name on a planned but not developed street. May be a retired street name.

  7. Namesakes

    • figshare.com
    json
    Updated Nov 20, 2021
    Cite
    Oleg Vasilyev; Aysu Altun; Nidhi Vyas; Vedant Dharnidharka; Erika Lampert; John Bohannon (2021). Namesakes [Dataset]. http://doi.org/10.6084/m9.figshare.17009105.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Nov 20, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Oleg Vasilyev; Aysu Altun; Nidhi Vyas; Vedant Dharnidharka; Erika Lampert; John Bohannon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Motivation: creating a challenging dataset for testing Named-Entity Linking. The Namesakes dataset consists of three closely related datasets: Entities, News and Backlinks. Entities were collected as Wikipedia text chunks corresponding to highly ambiguous entity names. The News were collected as random news text chunks, containing mentions that either belong to the Entities dataset or can be easily confused with them. Backlinks were obtained from Wikipedia dump data with the intention to have mentions linked to the entities of the Entities dataset. The Entities and News are human-labeled, resolving the mentions of the entities.

    Methods

    Entities were collected as Wikipedia text chunks corresponding to highly ambiguous entity names: the most popular people names, the most popular locations, and organizations with name ambiguity. In each Entities text chunk, the named entities with a name similar to the chunk's Wikipedia page name are labeled. For labeling, these entities were suggested to human annotators (odetta.ai) to tag as "Same" (same as the page entity) or "Other". The labeling was done by 6 experienced annotators who had passed a preliminary trial task. The only accepted tags are those assigned in agreement by no fewer than 5 annotators and then passed through reconciliation with an experienced reconciliator.

    The News were collected as random news text chunks, containing mentions which either belong to the Entities dataset or can be easily confused with them. In each News text chunk one mention was selected for labeling, and 3-10 Wikipedia pages from Entities were suggested as the labels for an annotator to choose from. The labeling was done by 3 experienced annotators (odetta.ai), after the annotators passed a preliminary trial task. The results were reconciled by an experienced reconciliator. All the labeling was done using Lighttag (lighttag.io).

    Backlinks were obtained from Wikipedia dump data (dumps.wikimedia.org/enwiki/20210701) with intention to have mentions linked to the entities of the Entity dataset. The backlinks were filtered to leave only mentions in a good quality text; each text was cut 1000 characters after the last mention.

    Usage Notes

    Entities:

    File: Namesakes_entities.jsonl

    The Entities dataset consists of 4148 Wikipedia text chunks containing human-tagged mentions of entities. Each mention is tagged either as "Same" (meaning that the mention is of this Wikipedia page entity), or "Other" (meaning that the mention is of some other entity, just having the same or similar name). The Entities dataset is a jsonl list; each item is a dictionary with the following keys and values:

    • 'pagename': page name of the Wikipedia page.
    • 'pageid': page id of the Wikipedia page.
    • 'title': title of the Wikipedia page.
    • 'url': URL of the Wikipedia page.
    • 'text': the text chunk from the Wikipedia page.
    • 'entities': list of the mentions in the page text; each entity is represented by a dictionary with the keys:
        • 'text': the mention as a string from the page text.
        • 'start': start character position of the entity in the text.
        • 'end': end (one-past-last) character position of the entity in the text.
        • 'tag': annotation tag given as a string, either 'Same' or 'Other'.
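    As a rough illustration of how one Entities item could be read, the sketch below parses a single jsonl line and slices each mention out of the text using 'start' and 'end'. The record is invented for the example (hypothetical page name, id, and URL); only the key names come from the description above.

```python
import json

# Hypothetical record: the keys follow the description above, the values are invented.
_text = "John Smith was a painter. Another John Smith was a judge."
_second = _text.rfind("John Smith")
sample_line = json.dumps({
    "pagename": "John_Smith_(painter)",
    "pageid": 0,
    "title": "John Smith (painter)",
    "url": "https://example.org/John_Smith_(painter)",
    "text": _text,
    "entities": [
        {"text": "John Smith", "start": 0, "end": 10, "tag": "Same"},
        {"text": "John Smith", "start": _second, "end": _second + 10, "tag": "Other"},
    ],
})

def tagged_mentions(jsonl_line):
    """Return (span, tag, offsets_match) for every mention in one jsonl item."""
    record = json.loads(jsonl_line)
    text = record["text"]
    mentions = []
    for ent in record["entities"]:
        # 'end' is one past the last character, so it works directly as a slice bound.
        span = text[ent["start"]:ent["end"]]
        mentions.append((span, ent["tag"], span == ent["text"]))
    return mentions
```

    To process the whole file, the same function can be applied to each line of Namesakes_entities.jsonl in turn.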

    News:

    File: Namesakes_news.jsonl

    The News dataset consists of 1000 news text chunks, each one with a single annotated entity mention. The annotation either points to the corresponding entity from the Entities dataset (if the mention is of that entity), or indicates that the mentioned entity does not belong to the Entities dataset. The News dataset is a jsonl list; each item is a dictionary with the following keys and values:

    • 'id_text': id of the sample.
    • 'text': the text chunk.
    • 'urls': list of URLs of Wikipedia entities suggested to labelers for identification of the entity mentioned in the text.
    • 'entity': a dictionary describing the annotated entity mention in the text:
        • 'text': the mention as a string found by an NER model in the text.
        • 'start': start character position of the mention in the text.
        • 'end': end (one-past-last) character position of the mention in the text.
        • 'tag': this key exists only if the mentioned entity is annotated as belonging to the Entities dataset; if so, the value is a dictionary identifying the Wikipedia page assigned by annotators to the mentioned entity:
            • 'pageid': Wikipedia page id.
            • 'pagetitle': page title.
            • 'url': page URL.

    Backlinks dataset: The Backlinks dataset consists of two parts: the Entity-to-Backlinks dictionary and the Backlinks documents. The dictionary points to backlinks for each entity of the Entities dataset (if any backlinks exist for the entity). The Backlinks documents are the backlink Wikipedia text chunks with identified mentions of the entities from the Entities dataset.

    Each mention is identified by surrounding double square brackets, e.g. "Muir built a small cabin along [[Yosemite Creek]].". However, if the mention differs from the exact entity name, the double square brackets wrap both the exact name and, separated by '|', the mention string to the right, for example: "Muir also spent time with photographer [[Carleton E. Watkins | Carleton Watkins]] and studied his photographs of Yosemite.".
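    The bracket notation can be pulled apart with a regular expression. This is a minimal sketch, assuming mentions are never nested and that the entity name itself contains no '|' character; neither assumption is confirmed by the dataset description. The two test strings are the examples quoted above.

```python
import re

# [[Entity name]] or [[Entity name | mention string]]
MENTION = re.compile(r"\[\[([^\[\]|]+?)(?:\s*\|\s*([^\[\]]+?))?\]\]")

def extract_mentions(content):
    """Return (entity_name, mention_string) pairs found in a Backlinks text."""
    pairs = []
    for match in MENTION.finditer(content):
        entity = match.group(1).strip()
        # When no '|' part is present, the mention is the entity name itself.
        mention = (match.group(2) or entity).strip()
        pairs.append((entity, mention))
    return pairs

plain = extract_mentions("Muir built a small cabin along [[Yosemite Creek]].")
piped = extract_mentions(
    "Muir also spent time with photographer "
    "[[Carleton E. Watkins | Carleton Watkins]] and studied his photographs of Yosemite."
)
```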

    The Entity-to-Backlinks is a jsonl with 1527 items.

    File: Namesakes_backlinks_entities.jsonl

    Each item is a tuple:

    • Entity name.
    • Entity Wikipedia page id.
    • Backlinks ids: a list of pageids of backlink documents.

    The Backlinks documents is a jsonl with 26903 items.

    File: Namesakes_backlinks_texts.jsonl

    Each item is a dictionary:

    • 'pageid': id of the Wikipedia page.
    • 'title': title of the Wikipedia page.
    • 'content': text chunk from the Wikipedia page, with all mentions in the double brackets; the text is cut 1000 characters after the last mention, and the cut is denoted as '...[CUT]'.
    • 'mentions': list of the mentions from the text, for convenience. Each mention is a tuple:
        • Entity name.
        • Entity Wikipedia page id.
        • Sorted list of all character indexes at which the mention occurrences start in the text.

  8. The annual list of first names of newborns — city of Nancy

    • gimi9.com
    Updated Dec 16, 2023
    Cite
    (2023). The annual list of first names of newborns — city of Nancy [Dataset]. https://gimi9.com/dataset/eu_5d2c2919634f41429aae86ce/
    Explore at:
    Dataset updated
    Dec 16, 2023
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The annual list of first names of newborns is a simple and popular dataset. These data, from the register of civil status, shall contain the following essential data: sex of the newborn, first name of the newborn, number of occurrences of the first name for the corresponding year, year of survey. The dataset consists of the list of first names of children born in Nancy since 2016, in CSV format, with the number of occurrences of each given name, classified by year and sex. The first names declared below an occurrence of five are not published, with a view to protecting personal data. The standardisation of this dataset follows the recommendations of Opendata France following the work around the Common Socle des Data Locales.

    Definition of headers

    • COLL_NOM: name of the municipality
    • COLL_INSEE: Insee code of the municipality where the first names are registered in the civil status of the place of birth. Note that the place of birth may be different from the place of residence of the parents.
    • CHILD_SEX: gender corresponding to the first name: M or F respectively for men or women
    • CHILD_PRENOM: first name of newborn(s) recorded as first name in the civil status documents of the corresponding year
    • NUMBER_OCCURENCES: occurrence of first name
    • YEAR: year of birth

    Total births reported to the City of Nancy

    • 2018: total number of births: 5135; girls: 2692; boys: 2443
    • 2017: total number of births: 5483; girls: 2704; boys: 2779
    • 2016: total number of births: 5544; girls: 2692; boys: 2852

  9. Marseille - First Name List 2020

    • data.europa.eu
    csv
    Cite
    Ville de Marseille, Marseille - First Name List 2020 [Dataset]. https://data.europa.eu/88u/dataset/66dada5c3b30732f477093f6
    Explore at:
    Available download formats: csv
    Dataset authored and provided by
    Ville de Marseille
    License

    https://www.etalab.gouv.fr/licence-ouverte-open-licence

    Description

    Produced as part of the Data Challenge with Sciences Po Saint-Germain-en-Laye students. Annual compilation of data on first names from civil status records: 99 female first names and 99 male first names grouped and ranked by number of occurrences for each year since 2010.

  10. name lists for 'Detecting intersectionality in NER models: A data-driven...

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    Updated Feb 22, 2023
    Cite
    Anonymous (2023). name lists for 'Detecting intersectionality in NER models: A data-driven approach' [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_7647194
    Explore at:
    Dataset updated
    Feb 22, 2023
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Name lists used for data augmentation when testing biases (in terms of error disparities) of Named Entity Recognition in Danish NLP pipelines.

    The following lists are from Statistics Denmark:

    majority_first_names_2023_men.csv

    majority_first_names_2023_women.csv

    majority_last_names_2023.csv

    The following lists are from Eva Villarsen Meldgaard. 2005. Muslimske fornavne i danmark. Publisher: Københavns Universitet

    minority_first_names_men.csv

    minority_first_names_men.csv

    The list majority_unisex_names.csv is retrieved from The Agency of Family Law in Denmark, and the numbers are retrieved from the above lists from Statistics Denmark.

    The list minority_last_names.csv is retrieved from FamilyEducation.

    The list overlapping_names.csv contains first names, which both occur on the list of majority names and the list of minority names.

  11. First name statistics for newborns by year of birth in Münster | gimi9.com

    • gimi9.com
    Updated Dec 15, 2024
    Cite
    (2024). First name statistics for newborns by year of birth in Münster | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_6d04e8b3-ed6e-406e-b85f-befe678205c2
    Explore at:
    Dataset updated
    Dec 15, 2024
    Description

    This data set contains the first name statistics for newborns in Münster from 2007 to 2021. Two different lists are made available:

    • A first name hit list with the top 30 most commonly used first names, grouped by year of birth and gender.
    • A list of "first name numbers". This list shows how many babies have been given multiple first names.

    First name hitlist

    The table with the first name hitlist contains the following columns:

    • Year = year of birth
    • Rank = Top 30 rank
    • Gender = girl or boy
    • Name = the chosen name
    • Number = Number of children with this name

    Please note the following additional information: All given first names are taken into account for the calculation of the first name list, i.e. also the second and third names. For example, if "Tom" leads the list in a year, that doesn't mean that Tom was the most popular name, but that Tom was the most frequently mentioned first name among the total first, second, third and other given names for babies.

    First name number

    The table with the first name number contains the following columns:

    • Year = year of birth
    • Children with.. = How many first names
    • Number = number of children

    The following is an Excel file, which contains both lists in different spreadsheets, as well as two corresponding CSV files.

  12. 10000 Indian Companies and their Basic Information

    • kaggle.com
    zip
    Updated Mar 13, 2022
    Cite
    Shiv Prakash (2022). 10000 Indian Companies and their Basic Information [Dataset]. https://www.kaggle.com/datasets/shivprakash21/10000-indian-companies-and-their-basic-information
    Explore at:
    Available download formats: zip (221965 bytes)
    Dataset updated
    Mar 13, 2022
    Authors
    Shiv Prakash
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    India
    Description

    Context

    List of companies available in India with some additional details.

    Content

    This dataset contains a list of companies along with additional details (name, type, average rating, review count, company age, company headquarters and number of employees working at that company). The whole list of companies was web-scraped from the website AmbitionBox.com.

    Acknowledgements

    Data Source: https://www.ambitionbox.com/list-of-companies This dataset could not have been made without the data available at ambitionbox.com. So a big thanks to the whole team of ambitionbox from the whole kaggle community.

    Inspiration

    My intention in creating this dataset was to list the companies available in India and do some analysis on them.

  13. Popular Baby Names

    • catalog.data.gov
    • data.cityofnewyork.us
    • +5more
    Updated Jul 12, 2025
    Cite
    data.cityofnewyork.us (2025). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.

  14. Countries and territories Named Authority List

    • data.europa.eu
    rdf xml, xml, zip
    Updated Dec 3, 2021
    Cite
    Publications Office of the European Union (2021). Countries and territories Named Authority List [Dataset]. https://data.europa.eu/data/datasets/country?locale=en
    Explore at:
    Available download formats: xml, rdf xml, zip
    Dataset updated
    Dec 3, 2021
    Dataset provided by
    Publications Office of the European Union (http://op.europa.eu/)
    European Union
    Authors
    Publications Office of the European Union
    License

    http://data.europa.eu/eli/dec/2011/833/oj

    Description

    Countries and territories is a controlled vocabulary that lists concepts associated with names of countries and territories. It is a corporate reference data asset covered by the Corporate Reference Data Management policy of the European Commission. It provides codes and names of geospatial and geopolitical entities in all official EU languages and is the result of a combination of multiple relevant standards, created to serve the requirements and use cases of the EU institutions services. Its main scope is to support documentary metadata activities. The codes of the concepts included are correlated with the ISO 3166 international standard. The authority code relies where possible on the ISO 3166-1 alpha-3 code. Additional user-assigned alpha-3 codes have been used to cover entities that are not included in the ISO 3166-1 standard. The corporate list contains mappings with the ISO 3166-1 two-letter codes, the Interinstitutional Style Guide codes and with other internal and external identifiers including ISO 3166-1 numeric, ISO 3166-3, UNSD M49, UNSD Geoscheme, IBAN, TIR, IANA domain. For the names of countries and territories, the corporate list synchronises with the Interinstitutional Style Guide (ISG, Section 7.1 and Annexes A5 and A6) and with the IATE terminology database. Membership and classification properties provide possibilities to group concepts, e.g., UN, EU, EEA, EFTA, Schengen area, Euro area, NATO, OECD, UCPM, ENP-EAST, ENP-SOUTH, EU candidate countries and potential candidates. Countries and territories is maintained by the Publications Office of the European Union and disseminated on the EU Vocabularies website. Regular updates are foreseen based on its stakeholders’ needs. Downloads in human-readable formats (.csv, .html) are also available.

  15. Dataset: Ethnicity-Based Name Partitioning for Author Name Disambiguation...

    • figshare.com
    zip
    Updated May 30, 2023
    Cite
    Jinseok Kim; Jenna Kim; Jason Owen-Smith (2023). Dataset: Ethnicity-Based Name Partitioning for Author Name Disambiguation Using Supervised Machine Learning [Dataset]. http://doi.org/10.6084/m9.figshare.14043791.v1
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Jinseok Kim; Jenna Kim; Jason Owen-Smith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data files for a research paper, "Ethnicity-Based Name Partitioning for Author Name Disambiguation Using Supervised Machine Learning," published in the Journal of the Association for Information Science and Technology.

    Four zipped files are uploaded. Each zipped file contains five data files: signatures_train.txt, signatures_test.txt, records.txt, clusters_train.txt, and clusters_test.txt.

    1. 'Signatures' files contain lists of name instances. Each name instance (a row) is associated with information as follows.

    - 1st column: instance id (numeric): unique id assigned to a name instance
    - 2nd column: paper id (numeric): unique id assigned to a paper in which the name instance appears as an author name
    - 3rd column: byline position (numeric): integer indicating the position of the name instance in the authorship byline of the paper
    - 4th column: author name (string): name string formatted as surname, comma, and forename(s)
    - 5th column: ethnic name group (string): name ethnicity assigned by Ethnea to the name instance
    - 6th column: affiliation (string): affiliation associated with the name instance, if available in the original data
    - 7th column: block (string): simplified name string of the name instance to indicate its block membership (surname and first forename initial)
    - 8th column: author id (string): unique author id (i.e., author label) assigned by the creators of the original data

    2. 'Records' files contain lists of papers. Each paper is associated with information as follows.

    - 1st column: paper id (numeric): unique paper id; this is the unique paper id (2nd column) in Signatures files
    - 2nd column: year (numeric): year of publication
      * Some papers may have wrong publication years due to incorrect indexing or delayed updates in original data
    - 3rd column: venue (string): name of journal or conference in which the paper is published
      * Venue names can be in full string or in a shortened format according to the formats in original data
    - 4th column: authors (string; separated by vertical bar): list of author names that appear in the paper's byline
      * Author names are formatted into surname, comma, and forename(s)
    - 5th column: title words (string; separated by space): words in a title of the paper.
      * Note that common words are stop-listed and each remaining word is stemmed using Porter's stemmer.

    3. 'Clusters' files contain lists of clusters. Each cluster is associated with information as follows.

    - 1st column: cluster id (numeric): unique id of a cluster
    - 2nd column: list of name instance ids (Signatures - 1st column) that belong to the same unique author id (Signatures - 8th column).

    Signatures and Clusters files consist of two subsets - train and test files - of original labeled data which are randomly split into 50%-50% by the authors of this study.

    Original labeled data for AMiner.zip, KISTI.zip, and GESIS.zip came from the studies cited below. If you use one of the uploaded data files, please cite them accordingly.

    [AMiner.zip]
    Tang, J., Fong, A. C. M., Wang, B., & Zhang, J. (2012). A Unified Probabilistic Framework for Name Disambiguation in Digital Library. IEEE Transactions on Knowledge and Data Engineering, 24(6), 975-987. doi:10.1109/Tkde.2011.13
    Wang, X., Tang, J., Cheng, H., & Yu, P. S. (2011). ADANA: Active Name Disambiguation. Paper presented at the 2011 IEEE 11th International Conference on Data Mining.

    [KISTI.zip]
    Kang, I. S., Kim, P., Lee, S., Jung, H., & You, B. J. (2011). Construction of a Large-Scale Test Set for Author Disambiguation. Information Processing & Management, 47(3), 452-465. doi:10.1016/j.ipm.2010.10.001
    Note that the original KISTI data contain errors and duplicates. This study reuses the revised version of KISTI reported in a study below.
    Kim, J. (2018). Evaluating author name disambiguation for digital libraries: A case of DBLP. Scientometrics, 116(3), 1867-1886. doi:10.1007/s11192-018-2824-5

    [GESIS.zip]
    Momeni, F., & Mayr, P. (2016). Evaluating Co-authorship Networks in Author Name Disambiguation for Common Names. Paper presented at the 20th international Conference on Theory and Practice of Digital Libraries (TPDL 2016), Hannover, Germany.
    Note that this study reuses the 'Evaluation Set' among the original GESIS data, to which titles were added by a study below.
    Kim, J., & Kim, J. (2020). Effect of forename string on author name disambiguation. Journal of the Association for Information Science and Technology, 71(7), 839-855. doi:10.1002/asi.24298

    [UM-IRIS.zip]
    This labeled dataset was created for this study. For description about the labeling method, please see 'Method' in the paper below.
    Kim, J., Kim, J., & Owen-Smith, J. (In print). Ethnicity-based name partitioning for author name disambiguation using supervised machine learning. Journal of the Association for Information Science and Technology. doi:10.1002/asi.24459.
    For details on the labeling method and limitations, see the paper below.
    Kim, J., & Owen-Smith, J. (2021). ORCID-linked labeled data for evaluating author name disambiguation at scale. Scientometrics. doi:10.1007/s11192-020-03826-6
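    To make the relationship between the Signatures and Clusters files concrete, the sketch below groups instance ids (1st column) by author id (8th column), and derives a block-style key from an author name. It rests on assumptions that the description does not confirm: a tab delimiter, no header row, and a lower-cased "surname initial" key. The sample rows are invented, and the real block strings in the 7th column may be normalized differently.

```python
from collections import defaultdict

def clusters_from_signatures(lines, sep="\t"):
    """Group instance ids (column 1) by author id (column 8)."""
    clusters = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        instance_id, author_id = fields[0], fields[7]
        clusters[author_id].append(instance_id)
    return dict(clusters)

def block_key(author_name):
    """Surname plus first forename initial, from 'surname, forename(s)'."""
    surname, _, forenames = author_name.partition(",")
    initial = forenames.strip()[:1]
    return f"{surname.strip()} {initial}".strip().lower()

# Invented rows following the eight-column layout described above.
rows = [
    "1\t100\t1\tKim, Jinseok\tKOREAN\tUniv A\tkim j\tA1",
    "2\t101\t2\tKim, Jenna\tKOREAN\t\tkim j\tA2",
    "3\t102\t1\tKim, Jinseok\tKOREAN\tUniv A\tkim j\tA1",
]
clusters = clusters_from_signatures(rows)
```

    Here the first and third instances share an author id and therefore land in the same cluster, while all three share one block because they agree on surname and first initial.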

  16. Popular Baby Names - Dataset - data.sa.gov.au

    • data.sa.gov.au
    Updated Mar 1, 2025
    Cite
    (2025). Popular Baby Names - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/popular-baby-names
    Explore at:
    Dataset updated
    Mar 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Australia
    Description

    List of male and female baby names in South Australia from 1944 to 2024. The annual data for baby names is published January/February each year.

  17. Top 100 baby names in England and Wales: historical data

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Jul 31, 2025
    Cite
    Office for National Statistics (2025). Top 100 baby names in England and Wales: historical data [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalestop100babynameshistoricaldata
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Historic lists of top 100 names for baby boys and girls for 1904 to 2024 at 10-yearly intervals.

  18. List of common first names 2017

    • ckan.mobidatalab.eu
    • data.europa.eu
    csv, pdf
    Updated Sep 26, 2020
    State Agency for Civil and Regulatory Affairs (2020). List of common first names 2017 [Dataset]. https://ckan.mobidatalab.eu/dataset/list-of-common-first-names-2017
    Explore at:
    Available download formats: pdf, csv
    Dataset updated
    Sep 26, 2020
    Dataset provided by
    State Agency for Civil and Regulatory Affairs
    License

    http://dcat-ap.de/def/licenses/cc-by

    Description

    A list of the most frequently given first names, separated by gender and broken down by district. In contrast to previous years, where a person has several first names, the position of each name is also given. The position does not allow any conclusions to be drawn about the given name. All existing years of first name data are also available collectively at https://github.com/berlinonline/haeufige-vornamen-berlin.

  19. Street Where You Live List - Dataset - City of Regina Open Data

    • openregina.ca
    Updated Feb 3, 2017
    (2017). Street Where You Live List - Dataset - City of Regina Open Data [Dataset]. https://openregina.ca/dataset/street-where-you-live-list
    Explore at:
    Dataset updated
    Feb 3, 2017
    Area covered
    Regina
    Description

    List of name origins for street and park names currently in use or approved for use within the City of Regina. This dataset is an update of the book "The Street Where You Live", copyright the Regina Public Library, and is presented in cooperation with the Regina Public Library. Dataset will be updated as new names are approved for use by the Civic Naming Committee, an administrative committee of the City of Regina.

  20. List of .gov.uk domain names - Dataset - data.gov.uk

    • ckan.publishing.service.gov.uk
    Updated Oct 16, 2012
    + more versions
    ckan.publishing.service.gov.uk (2012). List of .gov.uk domain names - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/list-of-gov-uk-domain-names
    Explore at:
    Dataset updated
    Oct 16, 2012
    Dataset provided by
    GOV.UK (http://gov.uk/)
    CKAN (https://ckan.org/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    The UK Government manages the .gov.uk domain name registry in order to signify digital services that are part of the administration of the state, so that they can be identified as authoritative and trustworthy. The rules governing which organisations are eligible to register .gov.uk domain names and those names that may be used are set out in Naming and registering websites and social media channels.

    Public sector bodies may register .gov.uk domain names for a variety of purposes:

    • Email only purposes
    • Websites, including those for campaigns, education, information and transactions
    • Page redirection (e.g. for alternative spellings of domain names)
    • To maintain access to information on the UK Government Website Archive

    Domain registration is requested via an Internet Service Provider (ISP) to the technical administrators of the .gov.uk second level domain, JANET(UK). JANET(UK) acts as the domain name registry, holding information on which domain names are registered and who owns them on behalf of Cabinet Office. Cabinet Office is responsible for approving requests for new domain names and any appeals. The .gov.uk domain name approvals and appeals process is described online.

Christian Quest (2023). List of first names and surnames [Dataset]. https://data.europa.eu/data/datasets/5bc35259634f41122d982759

List of first names and surnames

Explore at:
7 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: csv(2104259), csv(10841127)
Dataset updated
May 18, 2023
Dataset authored and provided by
Christian Quest
License

https://www.etalab.gouv.fr/licence-ouverte-open-licence

Description

In order to facilitate the anonymisation of data, this list of first names and surnames was extracted from the SIRENE database of INSEE.

For each first name and surname, the number of appearances is indicated.

ATTENTION: No content check is done, and these lists may contain anomalies present in the original database!
