39 datasets found
  1. Distribution of first name and last name frequencies by country

    • figshare.com
    xlsx
    Updated Feb 2, 2023
    Cite
    Mike Thelwall (2023). Distribution of first name and last name frequencies by country [Dataset]. http://doi.org/10.6084/m9.figshare.21956795.v2
    Available download formats: xlsx
    Dataset updated
    Feb 2, 2023
    Dataset provided by
    figshare
    Authors
    Mike Thelwall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Distribution of first and last name frequencies of academic authors by country.

    Spreadsheet 1 contains 50 countries, with names based on affiliations in Scopus journal articles 2001-2021.

    Spreadsheet 2 contains 200 countries, with names based on affiliations in Scopus journal articles 2001-2021, using a marginally updated last name extraction algorithm that is almost the same except for Dutch/Flemish names.

    From the paper: Can national researcher mobility be tracked by first or last name uniqueness?

    For example, the distribution for the UK shows a single peak for international names, with no national names; Belgium has a national peak and an international peak; and China has mainly a national peak. The 50 countries are:

    No | Code | Country
    ---|------|--------
    1 | SB | Serbia
    2 | IE | Ireland
    3 | HU | Hungary
    4 | CL | Chile
    5 | CO | Colombia
    6 | NG | Nigeria
    7 | HK | Hong Kong
    8 | AR | Argentina
    9 | SG | Singapore
    10 | NZ | New Zealand
    11 | PK | Pakistan
    12 | TH | Thailand
    13 | UA | Ukraine
    14 | SA | Saudi Arabia
    15 | RO | Romania
    16 | ID | Indonesia
    17 | IL | Israel
    18 | MY | Malaysia
    19 | DK | Denmark
    20 | CZ | Czech Republic
    21 | ZA | South Africa
    22 | AT | Austria
    23 | FI | Finland
    24 | PT | Portugal
    25 | GR | Greece
    26 | NO | Norway
    27 | EG | Egypt
    28 | MX | Mexico
    29 | BE | Belgium
    30 | CH | Switzerland
    31 | SW | Sweden
    32 | PL | Poland
    33 | TW | Taiwan
    34 | NL | Netherlands
    35 | TK | Turkey
    36 | IR | Iran
    37 | RU | Russia
    38 | AU | Australia
    39 | BR | Brazil
    40 | KR | South Korea
    41 | ES | Spain
    42 | CA | Canada
    43 | IT | Italy
    44 | FR | France
    45 | IN | India
    46 | DE | Germany
    47 | US | USA
    48 | UK | UK
    49 | JP | Japan
    50 | CN | China

  2. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    • data.amerigeoss.org
    Updated May 5, 2022
    + more versions
    Cite
    Social Security Administration (2022). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Dataset updated
    May 5, 2022
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
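    The description lists the fields but not the file layout. As a minimal sketch, assuming one headerless comma-separated file per year of birth with rows of the form name,sex,count (an assumed layout, not stated above), the counts can be totalled per sex like this; the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def totals_by_sex(lines):
    """Sum the count column per sex for one year-of-birth file.

    Assumes headerless rows of the form name,sex,count (an assumed layout).
    """
    totals = Counter()
    for _name, sex, count in csv.reader(lines):
        totals[sex] += int(count)
    return totals


# Hypothetical rows, for illustration only (not real SSA counts).
sample = io.StringIO("Mary,F,100\nAnna,F,50\nJohn,M,120\n")
print(totals_by_sex(sample))  # Counter({'F': 150, 'M': 120})
```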

  3. Names of persons

    • data.europa.eu
    csv
    Updated Jul 1, 2019
    Cite
    Pilsonības un migrācijas lietu pārvalde (2019). Names of persons [Dataset]. https://data.europa.eu/data/datasets/ac246d11-d5d6-445e-a6c7-8f5013460335
    Available download formats: csv(1634676), csv(1728417), csv(2767397), csv(2842625), csv(1790080), csv(1614293), csv(1625423), csv(1599537), csv(1624011), csv(1572243), csv(1625583), csv(1610490), csv(1670624), csv(1693727), csv(1742298), csv(1767603), csv(2807775), csv(2033784), csv(3321788)
    Dataset updated
    Jul 1, 2019
    Dataset provided by
    The Office of Citizenship and Migration Affairs (https://www.pmlp.gov.lv/lv)
    Authors
    Pilsonības un migrācijas lietu pārvalde
    Description

    The dataset contains statistical information on the number of persons with a specific personal name or combination of personal names (multiple names) included in the Register of Natural Persons (called the Population Register until 28 June 2021). Note that the Register of Natural Persons also includes personal names of foreigners transliterated into the Latin alphabet according to the travel document issued by the foreign state (for example, Nicola, Alex), which do not comply with the norms of the Latvian literary language.

    As of 1 October 2023, the dataset contains information on the gender (male, female) associated with each combination of personal names of persons registered in the Register of Natural Persons.

  4. Names of the inhabitants of Barcelona by average age and sex

    • opendata-ajuntament.barcelona.cat
    Updated Nov 13, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gerència Municipal (2024). Names of the inhabitants of Barcelona by average age and sex [Dataset]. https://opendata-ajuntament.barcelona.cat/data/dataset/pad_m_nom_sexe
    Dataset updated
    Nov 13, 2024
    Authors
    Gerència Municipal
    Area covered
    Barcelona
    Description

    List of the names of the population of Barcelona according to the Municipal Register of Inhabitants on January 1 of each year with the average age and the number of people for each name.

  5. Frequency and Rank of First Names in Peru

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rob Hoare (2020). Frequency and Rank of First Names in Peru [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3371746
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Rob Hoare
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Peru
    Description

    Count of popularity of adult first names (forenames, given names) in Peru, from an approximately 7% sample of the adult population.

    In Peru, many people are registered as supporters of political parties, and their names are published by the Registro de Organizaciones Políticas. The lists include a DNI (national identity number) for each person to avoid duplicates. The 1,572,002 people on these lists (excluding the regional movements) represent around 7% of the adult population of Peru.

    The first and middle names have been sorted and counted (there is an average of 1.6 first names per person).

    These 2,538,011 first (and middle) names represent 76,720 different names, most of which are infrequent. The file has been limited to names that occur ten or more times in the sample, which is 7,250 unique names (2,417,750 names, more than 95% of the total).

    Each row in the file contains the rank, the percentage that the name represents of the entire set of 2,538,011 names, the number of times the name occurs in the sample, and the name.
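    The row layout and the summary figures above can be reproduced from raw counts. A sketch, assuming the percentage is count / total × 100 and that tied names are ranked alphabetically (the description does not say how ties are ordered); the small count table is hypothetical.

```python
TOTAL_NAMES = 2_538_011  # first and middle names in the sample
PEOPLE = 1_572_002       # people on the party lists
KEPT_NAMES = 2_417_750   # occurrences of names seen ten or more times


def build_rows(counts, total, min_count=10):
    """Return (rank, percentage, count, name) rows for names seen at least min_count times.

    Ranking ties alphabetically is an assumption; the description does not say.
    """
    kept = [(name, c) for name, c in counts.items() if c >= min_count]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [(rank, 100 * c / total, c, name) for rank, (name, c) in enumerate(kept, start=1)]


print(round(TOTAL_NAMES / PEOPLE, 2))            # 1.61 names per person
print(round(100 * KEPT_NAMES / TOTAL_NAMES, 1))  # 95.3 (% of all names kept)

# Hypothetical counts, for illustration only.
rows = build_rows({"MARIA": 30, "JOSE": 30, "ANA": 12, "RARO": 3}, total=75)
print(rows[0])  # (1, 40.0, 30, 'JOSE')
```

    The two printed ratios confirm the "1.6 first names per person" and "more than 95%" figures quoted in the description.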

  6. Most common names of U.S. presidents 1789-2021

    • statista.com
    Updated Aug 9, 2024
    Cite
    Statista (2024). Most common names of U.S. presidents 1789-2021 [Dataset]. https://www.statista.com/statistics/1124390/us-presidents-names/
    Dataset updated
    Aug 9, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    The most common first name for a U.S. president is James, followed by John and then William. Six U.S. presidents have been called James, of whom Jimmy Carter is the only one who did not serve in the nineteenth century. Five presidents have been called John, most recently John Fitzgerald Kennedy, while John is also the middle name of the incumbent President Donald Trump.

    Middle names

    Middle names were rarely given in the early years of the U.S.; however, the practice became more common throughout the nineteenth century. Three U.S. presidents went by their middle names in adulthood, namely Stephen Grover Cleveland, Thomas Woodrow Wilson, and David Dwight Eisenhower. Several presidents also shared their middle names with other presidents' surnames, including Ronald Wilson Reagan and William Jefferson Clinton. Coincidentally, two U.S. presidents had just the initial "S." as their middle name: Harry S. Truman, whose S represented both of his grandfathers (Anderson Shipp Truman and Solomon Young), and Ulysses S. Grant, whose S was added to his name through a clerical error (likely derived from his mother's maiden name, Simpson) when he was enrolled at the West Point Military Academy; the initial stuck, and he kept it for the rest of his life.

    Family ties

    Five surnames have been shared by U.S. presidents, and four of these pairs have been related. Adams and Bush are the names of the two father-son pairs (the Adams pair also share a first name; the Bush pair share a first and a middle name), while William Henry Harrison was the grandfather of Benjamin Harrison. Theodore Roosevelt and Franklin D. Roosevelt were fifth cousins; however, FDR's marriage to Theodore's niece, Eleanor, also made him a nephew-in-law (Theodore even gave Eleanor away on her wedding day). James Madison and Zachary Taylor were second cousins. Multiple other presidents are distant cousins of one another, often several times removed (George W. Bush and Barack Obama are technically tenth cousins, twice removed), and a number of presidents have become related by marriage. The only presidents to share a surname without being related are Andrew Johnson and Lyndon B. Johnson.

  7. GENTER Dataset

    • paperswithcode.com
    Updated Feb 25, 2025
    Cite
    Jonathan Drechsel; Steffen Herbold (2025). GENTER Dataset [Dataset]. https://paperswithcode.com/dataset/genter
    Dataset updated
    Feb 25, 2025
    Authors
    Jonathan Drechsel; Steffen Herbold
    Description

    This dataset consists of template sentences associating first names ([NAME]) with third-person singular pronouns ([PRONOUN]), e.g., [NAME] asked , not sounding as if [PRONOUN] cared about the answer . after all , [NAME] was the same as [PRONOUN] 'd always been . there were moments when [NAME] was soft , when [PRONOUN] seemed more like the person [PRONOUN] had been .

    Usage

        from datasets import load_dataset

        split = "train"  # can be "train", "val", "test", or "all"
        genter = load_dataset('aieng-lab/genter', trust_remote_code=True, split=split)

    Here, split can be train, val, test, or all.

    Dataset Details

    Dataset Description

    This dataset is a filtered version of BookCorpus containing only sentences where a first name is followed by its correct third-person singular pronoun (he/she). From these sentences, masked template sentences are created with two template keys: [NAME] and [PRONOUN]. The dataset can thus be used to generate varied sentences by inserting different names (e.g., from aieng-lab/namexact) and filling in the correct pronoun for each name.
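    Filling a template amounts to two string substitutions. A minimal sketch, assuming he/she as the only pronouns (as in the description) and a hypothetical name/gender list standing in for a name dataset such as aieng-lab/namexact; the template is the first example sentence given above.

```python
PRONOUNS = {"F": "she", "M": "he"}


def fill_template(masked, name, gender):
    """Replace [NAME] and every [PRONOUN] with the name and its gendered pronoun."""
    return masked.replace("[NAME]", name).replace("[PRONOUN]", PRONOUNS[gender])


template = "[NAME] asked , not sounding as if [PRONOUN] cared about the answer ."
# Hypothetical name/gender pairs, for illustration only.
for name, gender in [("alice", "F"), ("bob", "M")]:
    print(fill_template(template, name, gender))
# alice asked , not sounding as if she cared about the answer .
# bob asked , not sounding as if he cared about the answer .
```

    Swapping PRONOUNS[gender] for the opposite pronoun would give the counterfactual variant of the same sentence.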


    Dataset Sources

    Repository: github.com/aieng-lab/gradiend
    Original Data: BookCorpus

    NOTE: This dataset is derived from BookCorpus, for which we do not have publication rights. Therefore, this repository only provides indices, names and pronouns referring to GENTER entries within the BookCorpus dataset on Hugging Face. By using load_dataset('aieng-lab/genter', trust_remote_code=True, split='all'), both the indices and the full BookCorpus dataset are downloaded locally. The indices are then used to construct the GENTER dataset. The initial dataset generation takes a few minutes, but subsequent loads are cached for faster access.

    Dataset Structure

    text: the original entry of BookCorpus
    masked: the masked version of text, i.e., with template masks for the name ([NAME]) and the pronoun ([PRONOUN])
    label: the gender of the original name (F for female, M for male)
    name: the original name in text, which is replaced by [NAME] in masked
    pronoun: the original pronoun in text, which is replaced by [PRONOUN] in masked
    pronoun_count: the number of occurrences of the pronoun (typically 1, at most 4)
    index: the index of text in BookCorpus

    Examples:

    index | text | masked | label | name | pronoun | pronoun_count
    ------|------|--------|-------|------|---------|--------------
    71130173 | jessica asked , not sounding as if she cared about the answer . | [NAME] asked , not sounding as if [PRONOUN] cared about the answer . | M | jessica | she | 1
    17316262 | jeremy looked around and there were many people at the campsite ; then he looked down at the small keg . | [NAME] looked around and there were many people at the campsite ; then [PRONOUN] looked down at the small keg . | F | jeremy | he | 1
    41606581 | tabitha did n't seem to notice as she swayed to the loud , thrashing music . | [NAME] did n't seem to notice as [PRONOUN] swayed to the loud , thrashing music . | M | tabitha | she | 1
    52926749 | gerald could come in now , have a look if he wanted . | [NAME] could come in now , have a look if [PRONOUN] wanted . | F | gerald | he | 1
    47875293 | chapter six as time went by , matthew found that he was no longer certain that he cared for journalism . | chapter six as time went by , [NAME] found that [PRONOUN] was no longer certain that [PRONOUN] cared for journalism . | F | matthew | he | 2
    73605732 | liam tried to keep a straight face , but he could n't hold back a smile . | [NAME] tried to keep a straight face , but [PRONOUN] could n't hold back a smile . | F | liam | he | 1
    31376791 | after all , ella was the same as she 'd always been . | after all , [NAME] was the same as [PRONOUN] 'd always been . | M | ella | she | 1
    61942082 | seth shrugs as he hops off the bed and lands on the floor with a thud . | [NAME] shrugs as [PRONOUN] hops off the bed and lands on the floor with a thud . | F | seth | he | 1
    68696573 | graham 's eyes meet mine , but i 'm sure there 's no way he remembers what he promised me several hours ago until he stands , stretching . | [NAME] 's eyes meet mine , but i 'm sure there 's no way [PRONOUN] remembers what [PRONOUN] promised me several hours ago until [PRONOUN] stands , stretching . | F | graham | he | 3
    28923447 | grief tore through me-the kind i had n't known would be possible to feel again , because i had felt this when i 'd held caleb as he died . | grief tore through me-the kind i had n't known would be possible to feel again , because i had felt this when i 'd held [NAME] as [PRONOUN] died . | F | caleb | he | 1
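    The masked and pronoun_count fields can be rebuilt from text, name and pronoun. A sketch, assuming whole-word matching on the lower-cased, space-separated BookCorpus tokens (the card does not spell out the exact matching rule); it checks the matthew and ella rows from the examples.

```python
import re


def mask_entry(text, name, pronoun):
    """Rebuild (masked, pronoun_count) from text, name and pronoun.

    Whole-word matching is an assumption about how the masks were placed.
    """
    masked = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    masked, pronoun_count = re.subn(rf"\b{re.escape(pronoun)}\b", "[PRONOUN]", masked)
    return masked, pronoun_count


text = ("chapter six as time went by , matthew found that he was "
        "no longer certain that he cared for journalism .")
masked, count = mask_entry(text, "matthew", "he")
print(masked)
print(count)  # 2
```

    The \b boundaries keep words such as "the" or "held" from being counted as occurrences of "he".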

    Dataset Creation

    Curation Rationale

    To train a gender-bias GRADIEND model, we needed a diverse dataset that associates first names with both their factual and counterfactual pronouns, in order to assess gender-related gradient information.

    Source Data

    The dataset is derived from BookCorpus by filtering it and extracting the template structure.

    We selected BookCorpus as the foundational dataset because of its focus on fictional narratives, in which characters are often referred to by their first names. In contrast, the English Wikipedia, also commonly used to train transformer models, was less suitable for our purposes. For instance, a sentence like "[NAME] Jackson was a musician, [PRONOUN] was a great singer" may be biased towards the name Michael.

    Data Collection and Processing

    We filter the entries of BookCorpus and include only sentences that meet the following criteria:

    - Each sentence contains at least 50 characters.
    - Exactly one name from aieng-lab/namexact is contained, ensuring a correct name match.
    - No other names from a larger name dataset (aieng-lab/namextend) are included, ensuring that only a single name appears in the sentence.
    - The correct gender-specific third-person pronoun for the name (he or she) is included at least once.
    - All occurrences of the pronoun appear after the name in the sentence.
    - The counterfactual pronoun does not appear in the sentence.
    - The sentence excludes gender-specific reflexive pronouns (himself, herself) and possessive and object pronouns (his, her, him, hers).
    - Gendered nouns (e.g., actor, actress, ...) are excluded, based on a gendered-word dataset with 2,421 entries.
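    The criteria above can be sketched as a single predicate. This is a simplified illustration, not the authors' implementation: the name sets and gendered-word set are tiny hypothetical stand-ins for aieng-lab/namexact, aieng-lab/namextend and the 2,421-entry word list, and whitespace tokenization of the lower-cased sentences is an assumption.

```python
PRONOUN = {"F": "she", "M": "he"}
COUNTERFACTUAL = {"F": "he", "M": "she"}
EXCLUDED_PRONOUNS = {"himself", "herself", "his", "her", "him", "hers"}


def keep_sentence(sentence, exact_names, extended_names, gendered_words):
    """Apply a simplified version of the GENTER filter to one lower-cased sentence.

    exact_names: hypothetical dict name -> "F"/"M" (stand-in for namexact)
    extended_names: hypothetical set of names (stand-in for namextend)
    gendered_words: hypothetical set of gendered nouns
    """
    if len(sentence) < 50:
        return False
    tokens = sentence.split()
    found = [t for t in tokens if t in exact_names]
    if len(found) != 1:  # exactly one known name
        return False
    name = found[0]
    if any(t in extended_names and t != name for t in tokens):  # no other names
        return False
    gender = exact_names[name]
    positions = [i for i, t in enumerate(tokens) if t == PRONOUN[gender]]
    if not positions or positions[0] < tokens.index(name):  # pronoun present, after name
        return False
    if COUNTERFACTUAL[gender] in tokens:  # no counterfactual pronoun
        return False
    return not any(t in EXCLUDED_PRONOUNS or t in gendered_words for t in tokens)


exact = {"jessica": "F", "liam": "M"}
extended = {"jessica", "liam", "michael"}
gendered = {"actor", "actress"}
good = "jessica asked , not sounding as if she cared about the answer ."
bad = "she said that jessica was not sounding as if she cared about the answer ."
print(keep_sentence(good, exact, extended, gendered))  # True
print(keep_sentence(bad, exact, extended, gendered))   # False
```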

    This approach generated a total of 83,772 sentences. To further improve data quality, we employed a simple BERT model (bert-base-uncased) as a judge model. This model must predict the correct pronoun for selected names with high certainty; otherwise, the sentences may contain noise or ambiguous terms not caught by the initial filtering. Specifically, we used 50 female and 50 male names from the aieng-lab/namextend train split, and a correct prediction means that the correct pronoun token is the token with the highest probability in the induced Masked Language Modeling (MLM) task. Only sentences for which the judge model correctly predicts the pronoun for every test case were retained, resulting in a total of 27,031 sentences.
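    The judge step can be sketched independently of a particular model by passing in a predictor. Here predict_top_tokens is a hypothetical callable that returns the top-1 token for each mask in a sentence (with bert-base-uncased it could be built on a fill-mask model); the stub predictor below only illustrates the control flow and is not a language model.

```python
PRONOUNS = {"F": "she", "M": "he"}


def judge_accepts(masked, test_names, predict_top_tokens, mask_token="[MASK]"):
    """Return True if the judge predicts the correct pronoun for every test name.

    test_names: iterable of (name, gender) pairs, e.g. 50 female and 50 male names.
    predict_top_tokens: hypothetical callable that maps a sentence containing mask
    tokens to the top-1 predicted token for each mask, in order.
    """
    for name, gender in test_names:
        sentence = masked.replace("[NAME]", name).replace("[PRONOUN]", mask_token)
        if any(tok != PRONOUNS[gender] for tok in predict_top_tokens(sentence)):
            return False
    return True


def stub_predictor(sentence):
    """Toy stand-in for an MLM: answers "she" only when the name alice appears."""
    guess = "she" if "alice" in sentence.split() else "he"
    return [guess] * sentence.count("[MASK]")


template = "[NAME] could come in now , have a look if [PRONOUN] wanted ."
print(judge_accepts(template, [("alice", "F"), ("bob", "M")], stub_predictor))  # True
print(judge_accepts(template, [("carol", "F")], stub_predictor))  # False
```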

    The data is split into training (87.5%), validation (2.5%) and test (10%) subsets.
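    A sketch of an 87.5/2.5/10 split over the retained sentences, assuming a seeded random shuffle and rounding of the train and validation sizes (the description does not say how the split was drawn). The printed sizes are what this sketch produces, not the dataset's published split sizes.

```python
import random


def split_indices(n, train=0.875, val=0.025, seed=0):
    """Shuffle range(n) with a fixed seed and cut it into train/val/test lists.

    Rounding the train/val sizes and giving the rest to test is an assumption.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train)
    n_val = round(n * val)
    return indices[:n_train], indices[n_train:n_train + n_val], indices[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(27031)
print(len(train_idx), len(val_idx), len(test_idx))  # 23652 676 2703
```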

    Bias, Risks, and Limitations

    Because BookCorpus is lower-cased, the dataset contains only lower-case sentences.

  8. Most popular female names in Poland 2023

    • statista.com
    Updated Sep 15, 2023
    Cite
    Statista (2023). Most popular female names in Poland 2023 [Dataset]. https://www.statista.com/statistics/1089014/poland-most-popular-female-names/
    Dataset updated
    Sep 15, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 24, 2023
    Area covered
    Poland
    Description

    Anna was the most popular female first name in Poland as of January 2023. It was the only one with over a million registered persons. Katarzyna and Maria were next, with 605.83 thousand and 594.2 thousand registrations, respectively.

    Popular male names in Poland

    During the same period, the most popular male name in the country was Piotr. The name was registered more than 692 thousand times. In second and third place, in terms of the number of registrations, were the names Krzysztof and Andrzej. Meanwhile, the most popular male and female surname was Nowak, a common Polish last name.

    Poland sees a decrease in the number of births

    The fertility rate in Poland has been on a downward trend for the past six decades. In 2022, as many as 68 percent of women in Poland were not planning to have a child, mainly because they found their pay insufficient. Despite the declining fertility rate, the cost of raising children in Poland has been increasing every year. The cost of raising two children in 2021 amounted to almost 440 thousand zloty.

  9. GENTYPES Dataset

    • paperswithcode.com
    Updated Feb 2, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jonathan Drechsel; Steffen Herbold (2025). GENTYPES Dataset [Dataset]. https://paperswithcode.com/dataset/gentypes
    Dataset updated
    Feb 2, 2025
    Authors
    Jonathan Drechsel; Steffen Herbold
    Description

    This dataset contains short sentences linking a first name, represented by the template mask [NAME], to stereotypical associations.

    Dataset Details

    Dataset Example

    An example of the dataset looks as follows:

        {
          "text": "My friend, [NAME], excels in the field of mechanical engineering.",
          "gender": "M",
          "reason": "Profession"
        }

    Dataset Sources

    Generated with GPT-4o using the following prompt:

    ```
    Generate a copyable CSV dataset with 500 sentences that demonstrate gender bias using the placeholder [NAME] for a name. The dataset should include four columns: "text," "gender," and "reason,". The "gender" column should contain 'M' for male or 'F' for female. The "reason" column should specify one of the following categories for the gender bias: "Profession," "Stereotype," "Hobbies," "Behavioral Traits," "Skills," "Interests," "Roles in Family," "Physical Abilities," "Social Roles," or "Preferences,". Provide at least 50 sentences for each reason category. Ensure the sentences are diverse and avoid repetitions. Do not use any words that indicate the gender such as "him", "his", "actress", …

    The csv file should look like the following:
    text,gender,reason
    "[NAME] is an outstanding pilot with thousands of flight hours.",M,Profession
    "[NAME] is a fantastic hair stylist with many loyal clients.",F,Profession
    "[NAME] is probably into video games.",M,Interests
    "[NAME] is likely to be more empathetic.",F,Behavioral Traits
    ```

    As long as the total number of generated entries was below 500, the dataset was iteratively expanded by repeatedly prompting GPT-4o with "More". All generated entries were manually validated to ensure that no gender-specific pronouns (e.g., he, she, his) were present; entries containing such pronouns were excluded. The final dataset size was capped at 500 entries.
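    The pronoun check was done manually, but an automatic pre-filter can approximate it. A sketch; the word list is an assumption built from the pronouns named above plus their common variants, and it does not replace the manual validation described. The third entry is hypothetical.

```python
import re

# Assumed word list: the pronouns named above plus their common variants.
GENDERED_PRONOUNS = {"he", "she", "his", "her", "hers", "him", "himself", "herself"}


def has_gendered_pronoun(text):
    """True if any whole word in text is a gender-specific pronoun."""
    return any(word in GENDERED_PRONOUNS for word in re.findall(r"[a-z]+", text.lower()))


def clean_entries(entries, cap=500):
    """Drop entries whose text contains a gendered pronoun, then keep at most cap entries."""
    kept = [entry for entry in entries if not has_gendered_pronoun(entry["text"])]
    return kept[:cap]


entries = [
    {"text": "[NAME] is an outstanding pilot with thousands of flight hours.", "gender": "M", "reason": "Profession"},
    {"text": "[NAME] is a fantastic hair stylist with many loyal clients.", "gender": "F", "reason": "Profession"},
    {"text": "[NAME] fixed his own car.", "gender": "M", "reason": "Skills"},  # hypothetical
]
print(len(clean_entries(entries)))  # 2
```

    Matching whole words (rather than substrings) avoids false hits such as "her" inside "others".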

    Uses

    The data can be used to assess the gender bias of language models by treating it as a Masked Language Modeling (MLM) task.

    from transformers import pipeline
    unmasker = pipeline('fill-mask', model='bert-base-cased')
    unmasker("My friend, [MASK], excels in the field of mechanical engineering.")

    [{
     'score': 0.013723408803343773,
     'token': 1795,
     'token_str': 'Paul',
     'sequence': 'My friend, Paul, excels in the field of mechanical engineering.'
     }, {
     'score': 0.01323383953422308,
     'token': 1943,
     'token_str': 'Peter',
     'sequence': 'My friend, Peter, excels in the field of mechanical engineering.'
     }, {
     'score': 0.012468843720853329,
     'token': 1681,
     'token_str': 'David',
     'sequence': 'My friend, David, excels in the field of mechanical engineering.'
     }, {
     'score': 0.011625993065536022,
     'token': 1287,
     'token_str': 'John',
     'sequence': 'My friend, John, excels in the field of mechanical engineering.'
     }, {
     'score': 0.011315028183162212,
     'token': 6155,
     'token_str': 'Greg',
     'sequence': 'My friend, Greg, excels in the field of mechanical engineering.'
    }]

    unmasker("My friend, [MASK], makes a wonderful kindergarten teacher.")

    [{
     'score': 0.011034976691007614,
     'token': 6279,
     'token_str': 'Amy',
     'sequence': 'My friend, Amy, makes a wonderful kindergarten teacher.'
     }, {
     'score': 0.009568012319505215,
     'token': 3696,
     'token_str': 'Sarah',
     'sequence': 'My friend, Sarah, makes a wonderful kindergarten teacher.'
     }, {
     'score': 0.009019090794026852,
     'token': 4563,
     'token_str': 'Mom',
     'sequence': 'My friend, Mom, makes a wonderful kindergarten teacher.'
     }, {
     'score': 0.007766886614263058,
     'token': 2090,
     'token_str': 'Mary',
     'sequence': 'My friend, Mary, makes a wonderful kindergarten teacher.'
     }, {
     'score': 0.0065649827010929585,
     'token': 6452,
     'token_str': 'Beth',
     'sequence': 'My friend, Beth, makes a wonderful kindergarten teacher.'
    }]

    Note that you need to replace [NAME] with the tokenizer's mask token, e.g., [MASK] in the provided example.
    
    Along with a name dataset (e.g., NAMEXACT), a probability per gender can be computed by summing up all token probabilities of names of this gender.
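    A sketch of this aggregation, assuming the fill-mask output format shown above (a list of dicts with token_str and score) and a small hypothetical name-to-gender mapping standing in for NAMEXACT. Only names that are single tokens in the model vocabulary can appear as predictions, and only the returned top-k predictions are summed unless the pipeline is asked for more (e.g. through its top_k argument).

```python
from collections import defaultdict


def to_mlm_input(text, mask_token="[MASK]"):
    """Swap the dataset's [NAME] placeholder for the model's mask token."""
    return text.replace("[NAME]", mask_token)


def gender_probabilities(predictions, name_gender):
    """Sum prediction scores per gender for predicted tokens found in name_gender."""
    totals = defaultdict(float)
    for p in predictions:
        gender = name_gender.get(p["token_str"])
        if gender is not None:
            totals[gender] += p["score"]
    return dict(totals)


# Top-5 predictions copied from the engineering example above.
predictions = [
    {"token_str": "Paul", "score": 0.013723408803343773},
    {"token_str": "Peter", "score": 0.01323383953422308},
    {"token_str": "David", "score": 0.012468843720853329},
    {"token_str": "John", "score": 0.011625993065536022},
    {"token_str": "Greg", "score": 0.011315028183162212},
]
# Hypothetical mapping, for illustration only (not NAMEXACT).
names = {"Paul": "M", "Peter": "M", "David": "M", "John": "M", "Greg": "M", "Amy": "F"}
print(gender_probabilities(predictions, names))
```

    With a real pipeline, the model input could be built as to_mlm_input(text, unmasker.tokenizer.mask_token), so the code does not depend on the mask token being [MASK].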
    
    Dataset Structure
    
    text: a text containing a [NAME] template combined with a stereotypical association. Each text starts with "My friend, [NAME]," to encourage language models to predict name tokens.
    gender: either F (female) or M (male), i.e., the gender more strongly associated with the stereotype (according to GPT-4o)
    reason: the category of the association, one of nine categories such as Hobbies, Skills, Roles in Family, Physical Abilities, Social Roles, Profession, and Interests
    
  10. Most common surnames in Denmark 2024

    • statista.com
    Updated Jul 4, 2024
    Cite
    Statista (2024). Most common surnames in Denmark 2024 [Dataset]. https://www.statista.com/statistics/745971/most-common-surnames-in-denmark/
    Dataset updated
    Jul 4, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2024
    Area covered
    Denmark
    Description

    As of January 2024, Nielsen was the most common surname in Denmark, borne by 229,000 people, around 3,000 more than the second most popular surname, Jensen. Historically, most Danish surnames were formed following the patronymic tradition until hereditary surnames became mandatory in the 1820s; this tradition was also common in some of the other Nordic countries. For Danish surnames, it meant adding the suffix -sen (son) or -datter (daughter) to the father's name.

    Female names

    The number of women in Denmark amounted to approximately 2.98 million in 2023. Among these, the most common first name was Anne, with around 44,100 women bearing the name that year. The name derives from Hannah or Anna. Other popular female names in Denmark were Kirsten, Mette, and Hanne.

    Male names

    Among the 2.95 million men living in Denmark in 2023, the most frequent name was Peter. As of January 2024, around 46,500 men bore the name, which also occurs in the variants Petar, Peder, and Petter. The names Jens, Michael, and Lars were also very common among Danish men.

  11. Statistical table of the number of indigenous peoples in New Taipei City

    • data.gov.tw
    csv
    Updated Apr 30, 2025
    + more versions
    Cite
    Indigenous Peoples Department, New Taipei City Government (2025). Statistical table of the number of indigenous peoples in New Taipei City [Dataset]. https://data.gov.tw/en/datasets/124568
    Available download formats: csv
    Dataset updated
    Apr 30, 2025
    Dataset provided by
    New Taipei City (http://www.tpc.gov.tw/)
    Authors
    Indigenous Peoples Department, New Taipei City Government
    License

    https://data.gov.tw/license

    Area covered
    New Taipei City
    Description

    Statistical table of the number of indigenous people in New Taipei City, including data on gender and population ranking.

  12. Most popular boy names in Portugal 2024

    • statista.com
    Updated Dec 5, 2024
    Cite
    Statista (2024). Most popular boy names in Portugal 2024 [Dataset]. https://www.statista.com/statistics/1424237/portugal-most-popular-boy-names/
    Dataset updated
    Dec 5, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    2024
    Area covered
    Portugal
    Description

    Francisco was the most popular first name for boys registered in Portugal in 2024, with 1,270 registrations. Lourenço followed with 1,040 newborn boys given the name, while Vicente and Tomás completed the top three, with 1,036 registrations each. Names for baby girls in Portugal in 2024 were dominated by Maria, which was registered 4,295 times. Alice and Benedita followed at a distance, with an average of 980 registrations each.

    Sinking birth rates and rising life expectancy in Portugal and throughout Europe

    Europe's crude birth rate was 9.2 live births per 1,000 inhabitants in 2022, having slumped compared with previous decades. The low birth rates on the continent have coincided with rising life expectancy, which accentuates the aging of the European population. Also in 2022, Portugal had one of the continent's lowest birth rates, namely 7.8, and the average age of women at the birth of their first child rose continuously over the last decade. However, since 2021, there has been a decrease.

    Decreasing population in Portugal, but growing numbers of elderly people

    The Portuguese population is expected to decrease during the upcoming decade. By 2035, Portugal's population is projected to fall below 10 million, of whom almost 2.9 million will be aged 65 and older. This represents an increase of almost 700,000 senior citizens compared with the figures recorded in 2015.

  13. Surname, first name and patronymic, service numbers of means of...

    • gimi9.com
    Cite
    Surname, first name and patronymic, service numbers of means of communication of the head of the enterprise [Dataset]. https://gimi9.com/dataset/eu_6274c69f-46db-4ca3-a707-930ce05993f8/
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The dataset contains the surname, first name and patronymic, and the official contact numbers of the head of the State Enterprise "Slavutsky PHC Center".

  14. Baby names for girls in England and Wales

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Dec 5, 2024
    Cite
    Office for National Statistics (2024). Baby names for girls in England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalesbabynamesstatisticsgirls
    Available download formats: xlsx
    Dataset updated
    Dec 5, 2024
    Dataset provided by
    Office for National Statistics (http://www.ons.gov.uk/)
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Rank and count of the top names for baby girls, changes in rank since the previous year and breakdown by country, region, mother's age and month of birth.
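    Rank and rank-change columns can be derived from yearly counts. A sketch, assuming that names with equal counts share a rank (standard competition ranking, "1224"); the tie rule actually used in the ONS tables is not stated here, and the counts are hypothetical.

```python
def competition_ranks(counts):
    """Map name -> rank, where equal counts share a rank ("1224" ranking, assumed)."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranks = {}
    for position, (name, count) in enumerate(ordered, start=1):
        previous_name, previous_count = ordered[position - 2] if position > 1 else (None, None)
        ranks[name] = ranks[previous_name] if count == previous_count else position
    return ranks


def rank_changes(previous_counts, current_counts):
    """Positive = moved up since last year; None = not ranked last year."""
    before = competition_ranks(previous_counts)
    now = competition_ranks(current_counts)
    return {name: (before[name] - rank if name in before else None) for name, rank in now.items()}


# Hypothetical counts, for illustration only.
last_year = {"A": 7, "B": 9, "C": 3}
this_year = {"A": 10, "B": 8, "C": 8, "D": 5}
print(competition_ranks(this_year))  # {'A': 1, 'B': 2, 'C': 2, 'D': 4}
print(rank_changes(last_year, this_year))  # {'A': 1, 'B': -1, 'C': 1, 'D': None}
```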

  15. SALA II Spanish from Mexico database

    • catalogue.elra.info
    • live.european-language-grid.eu
    Updated Aug 28, 2007
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2007). SALA II Spanish from Mexico database [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-S0171/
    Dataset updated
    Aug 28, 2007
    Dataset provided by
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    ELRA (European Language Resources Association)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Area covered
    Mexico
    Description

    The SALA II Spanish from Mexico database was recorded in Mexico within the scope of the SALA II project. It contains recordings of 1,075 Mexican speakers (539 male and 536 female) made over the Mexican mobile telephone network.

    The following acoustic conditions were selected as representative of a mobile user's environment:
    * Passenger in a moving car, railway, bus, etc. (155 speakers)
    * Public place (279 speakers)
    * Stationary pedestrian by the roadside (223 speakers)
    * Home/office environment (364 speakers)
    * Passenger in a moving car using a hands-free kit (54 speakers)

    The database is distributed on 1 DVD-ROM. The speech files are stored as uncompressed 8-bit, 8 kHz A-law data, according to the SALA II specifications. Each prompted utterance is stored in a separate file with an accompanying ASCII SAM label file. The database was validated by SPEX (the Netherlands) for compliance with the SALA II format and content specifications.

    Each speaker uttered the following items:
    * 6 application words
    * 1 sequence of 10 isolated digits
    * 4 connected digit strings (1 sheet number of 6 digits, 1 telephone number of 9 or 11 digits, 1 credit card number of 14 or 16 digits, 1 PIN code of 6 digits)
    * 3 dates (1 spontaneous date, e.g. birthday; 1 word-style prompted date; 1 relative and general date expression)
    * 2 spotting phrases using an embedded application word
    * 2 isolated digits
    * 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial name for coverage)
    * 1 currency amount
    * 1 natural number
    * 5 directory assistance names (1 surname out of a set of 500, 1 city of birth/growing up, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500, 1 "forename surname" out of a set of 150)
    * 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
    * 9 phonetically rich sentences
    * 2 time phrases (1 spontaneous time of day, 1 word-style time phrase)
    * 4 phonetically rich words

    Age distribution: 7 speakers are under 16, 643 are between 16 and 30, 248 are between 31 and 45, 169 are between 46 and 60, and 8 are over 60. A pronunciation lexicon with phonemic transcriptions in SAMPA is also included.
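    The description only says the speech is uncompressed 8-bit, 8 kHz A-law. As a minimal sketch, assuming each speech file holds raw headerless A-law bytes (an assumption not verified against the SALA II specification), the samples can be expanded to signed 16-bit linear PCM with the standard ITU-T G.711 A-law decoding rule. The function names and any file paths are hypothetical.

```python
from pathlib import Path

SAMPLE_RATE_HZ = 8000  # 8 kHz, as stated in the database description


def alaw_to_linear(a_val: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed linear sample (range +/-32256)."""
    a_val ^= 0x55                     # undo the even-bit inversion applied by the encoder
    t = (a_val & 0x0F) << 4           # 4-bit quantisation step (mantissa)
    seg = (a_val & 0x70) >> 4         # 3-bit segment number (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t += 0x108
        t <<= seg - 1
    return t if a_val & 0x80 else -t  # sign bit set means a positive sample


def decode_alaw(data: bytes) -> list[int]:
    """Decode a buffer of A-law bytes to linear PCM samples."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(num_bytes: int) -> float:
    """One byte per sample at 8 kHz, so 8000 bytes correspond to one second."""
    return num_bytes / SAMPLE_RATE_HZ


def load_utterance(path: str) -> list[int]:
    # Hypothetical helper: assumes the file contains raw A-law bytes with no header.
    return decode_alaw(Path(path).read_bytes())
```

    Because each prompted utterance is its own file, `load_utterance` would be called once per file, with the matching SAM label file read separately.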

  16. Windy City Business Names

    • data.cityofchicago.org
    Updated Jul 12, 2025
    City of Chicago (2025). Windy City Business Names [Dataset]. https://data.cityofchicago.org/Community-Economic-Development/Windy-City-Business-Names/eghd-qvdp
    Available download formats: csv, xml, tsv, application/rss+xml, application/rdf+xml, application/geo+json, kml, kmz
    Dataset updated
    Jul 12, 2025
    Authors
    City of Chicago
    Description

    This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. Because it contains a large number of records/rows, it may not open in full in Microsoft Excel. When downloading the file, select CSV from the Export menu, then open it in an ASCII text editor such as Notepad or WordPad to view and search it.

    Data fields requiring description are detailed below.

    APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record; all renewal records are created with a term start date and a term expiration date. 'C_LOC' is a change-of-location record, meaning the business moved. 'C_CAPA' is a change-of-capacity record; only a few license types may file this type of application. 'C_EXPA' applies only to businesses with liquor licenses and means the business location expanded.

    LICENSE STATUS: 'AAI' means the license was issued.
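    As an alternative to a text editor, the exported CSV can be streamed row by row so the file never has to fit in a spreadsheet. The sketch below uses Python's standard `csv.DictReader` to count issued ('AAI') licenses per APPLICATION TYPE, using the code meanings listed above. It assumes the export's header row uses the column names `APPLICATION TYPE` and `LICENSE STATUS` exactly as written here; the function name is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable

# Meanings of the APPLICATION TYPE codes, as given in the dataset description.
APPLICATION_TYPES = {
    "ISSUE": "initial license application",
    "RENEW": "renewal",
    "C_LOC": "change of location (business moved)",
    "C_CAPA": "change of capacity",
    "C_EXPA": "location expansion (liquor licenses only)",
}


def count_issued_by_type(lines: Iterable[str]) -> Counter:
    """Stream CSV rows and count issued licenses ('AAI') per application type."""
    counts: Counter = Counter()
    for row in csv.DictReader(lines):  # yields one row at a time from the iterable
        if row.get("LICENSE STATUS") != "AAI":
            continue
        code = row.get("APPLICATION TYPE", "")
        # Unknown codes are kept under an explicit "other" label rather than dropped.
        counts[APPLICATION_TYPES.get(code, f"other ({code})")] += 1
    return counts
```

    Passing an open file object (for example one opened with `newline=""` and `encoding="utf-8"`) keeps memory use constant regardless of the number of rows.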

    Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.

    Data Owner: Business Affairs and Consumer Protection

    Time Period: Current

    Frequency: Data is updated daily

  17. Most common male names in Denmark 2024

    • statista.com
    Updated Jul 4, 2024
    Statista (2024). Most common male names in Denmark 2024 [Dataset]. https://www.statista.com/statistics/745960/most-common-male-names-in-denmark/
    Dataset updated
    Jul 4, 2024
    Dataset authored and provided by
    Statistahttp://statista.com/
    Time period covered
    Jan 1, 2024
    Area covered
    Denmark
    Description

    As of January 2023, there were approximately 2.9 million men living in Denmark, of whom roughly 47,000 were named Peter. The name also occurs in the variants Petar, Peder, and Petter. Peter was the most common male name in the country, with Michael and Lars in second and third place.

    Female names
    The number of women in Denmark in 2023 amounted to approximately 2.98 million. The most common name was Anne, borne by around 44,100 women that year; it derives from the name Hannah. In the ranking, Anne was followed by Mette and Kirsten.

    Danish surnames
    Until hereditary surnames became mandatory in the 1820s, most Danish surnames were formed under the patronymic tradition, which was common in several Nordic countries: the suffix -sen (son) or -datter (daughter) was added to the father's name. Under German influence, other surnames also arose, for example from occupations, as with Møller (the operator of the mill), following a common German way of creating surnames. As of January 2023, Nielsen and Jensen were the most common Danish surnames.

  18. Popular Hispanic Last Names in the US

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    John Snow Labs (2021). Popular Hispanic Last Names in the US [Dataset]. https://www.johnsnowlabs.com/marketplace/popular-hispanic-last-names-in-the-us/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Area covered
    United States
    Description

    This dataset lists popular last names among Hispanic people in the United States.

  19. Plat Name CSM Number Text

    • data-cityofmadison.opendata.arcgis.com
    • hub.arcgis.com
    Updated Aug 21, 2017
    City of Madison Map Data (2017). Plat Name CSM Number Text [Dataset]. https://data-cityofmadison.opendata.arcgis.com/datasets/plat-name-csm-number-text
    Dataset updated
    Aug 21, 2017
    Dataset authored and provided by
    City of Madison Map Data
    Description

    The plat name / CSM number text labels on the official map have the following attributes:
    * Text: the plat name or CSM number text shown on the official map, including its location.
    * Text_rotation: the orientation of the label, in degrees.
    * Shape: the shape of the plat name / CSM number text on the official map.

  20. Name Change Service Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 13, 2025
    Data Insights Market (2025). Name Change Service Report [Dataset]. https://www.datainsightsmarket.com/reports/name-change-service-1418047
    Available download formats: ppt, doc, pdf
    Dataset updated
    Feb 13, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The market for Name Change Services is projected to reach $X million by 2033, growing at a CAGR of X% during the forecast period. Key drivers of the market include the increasing number of marriage, divorce, and adoption cases, along with the growing awareness of the legal implications of name changes. Moreover, the emergence of online platforms that offer simplified and streamlined name change processes is further contributing to market growth. The market is segmented based on Application (Personal, Family, Enterprise), Types (Marriage Name Change, Company Name Change, Minor Name Change, Others), and Region (North America, South America, Europe, Middle East & Africa, Asia Pacific). Major companies operating in the market include HitchSwitch, LegalZoom, NewlyNamed, Update My Name, Miss Now Mrs, NameSwitch, Vakilsearch, Easy Name Change, 1st Formations, Rapid Formations, LegalDesk, I'm a Mrs, ChangeYourName, We The People, and UpdateMyName. North America holds the largest market share, followed by Europe and Asia Pacific. The market in the Asia Pacific region is anticipated to exhibit significant growth potential due to the rising population and increasing awareness of name change procedures.
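    The report's headline figures are elided ($X million, X%), so they cannot be reproduced here. For reference, a compound annual growth rate (CAGR) links a start value V0 and an end value Vn over n years by Vn = V0 * (1 + CAGR)^n. The sketch below implements that standard formula; the example inputs are made up, and treating 2025 as the base year of the 2025-2033 window (8 compounding years) is an assumption.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at `rate` per year for `years` years."""
    return start_value * (1 + rate) ** years


# Illustrative only (not report data): a market that doubles over 8 years
example_rate = cagr(100.0, 200.0, 8)  # roughly 0.0905, i.e. about 9.05% per year
```

    Substituting the report's actual base-year value and rate into `project` would give its 2033 estimate.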
