Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Distribution of first and last name frequencies of academic authors by country.
Spreadsheet 1 contains 50 countries, with names based on affiliations in Scopus journal articles 2001-2021.
Spreadsheet 2 contains 200 countries, with names based on affiliations in Scopus journal articles 2001-2021. It uses a slightly updated last-name extraction algorithm that is essentially the same except for its handling of Dutch/Flemish names.
From the paper: Can national researcher mobility be tracked by first or last name uniqueness?
For example, the distribution for the UK shows a single peak for international names and no peak for national names, Belgium has both a national and an international peak, and China has mainly a national peak. The 50 countries are:
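
| No | Code | Country |
|----|------|---------|
| 1 | SB | Serbia |
| 2 | IE | Ireland |
| 3 | HU | Hungary |
| 4 | CL | Chile |
| 5 | CO | Colombia |
| 6 | NG | Nigeria |
| 7 | HK | Hong Kong |
| 8 | AR | Argentina |
| 9 | SG | Singapore |
| 10 | NZ | New Zealand |
| 11 | PK | Pakistan |
| 12 | TH | Thailand |
| 13 | UA | Ukraine |
| 14 | SA | Saudi Arabia |
| 15 | RO | Romania |
| 16 | ID | Indonesia |
| 17 | IL | Israel |
| 18 | MY | Malaysia |
| 19 | DK | Denmark |
| 20 | CZ | Czech Republic |
| 21 | ZA | South Africa |
| 22 | AT | Austria |
| 23 | FI | Finland |
| 24 | PT | Portugal |
| 25 | GR | Greece |
| 26 | NO | Norway |
| 27 | EG | Egypt |
| 28 | MX | Mexico |
| 29 | BE | Belgium |
| 30 | CH | Switzerland |
| 31 | SW | Sweden |
| 32 | PL | Poland |
| 33 | TW | Taiwan |
| 34 | NL | Netherlands |
| 35 | TK | Turkey |
| 36 | IR | Iran |
| 37 | RU | Russia |
| 38 | AU | Australia |
| 39 | BR | Brazil |
| 40 | KR | South Korea |
| 41 | ES | Spain |
| 42 | CA | Canada |
| 43 | IT | Italy |
| 44 | FR | France |
| 45 | IN | India |
| 46 | DE | Germany |
| 47 | US | USA |
| 48 | UK | UK |
| 49 | JP | Japan |
| 50 | CN | China |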
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.
The dataset contains statistical information on the number of persons with a specific given name or combination of given names (multiple names) included in the Register of Natural Persons (known as the Population Register until 06.28.2021). Note that the Register of Natural Persons also includes the given names of foreigners, transliterated into the Latin alphabet according to the travel document issued by the foreign state (for example, Nicola, Alex); such names may not comply with the norms of the Latvian literary language.
As of 2023.10.01, the dataset contains information on the gender (male, female) associated with each name combination of persons registered in the Register of Natural Persons.
List of the names of the population of Barcelona according to the Municipal Register of Inhabitants on January 1 of each year with the average age and the number of people for each name.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Count of popularity of adult first names (forenames, given names) in Peru, from an approximately 7% sample of the adult population.
In Peru, many people are registered as supporters of political parties, and their names are published by the Registro de Organizaciones Políticas. The lists include a DNI (national identity number) for each person to avoid duplicates. The 1,572,002 people on these lists (excluding the regional movements) represent around 7% of the adult population of Peru.
The first and middle names have been sorted and counted (there is an average of 1.6 first names per person).
These 2,538,011 first (and middle) names represent 76,720 different names, most of which are infrequent. The file has been limited to names that occur ten or more times in the sample, which is 7,250 unique names (2,417,750 names, more than 95% of the total).
Each row in the file contains the rank, a percentage of that name in the entire set of 2,538,011 names, a count of the times the name occurs in the sample, and the name.
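The counting and ranking procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the original processing script: it assumes each person's first and middle names are available as one whitespace-separated string, and the function name `name_frequency_table` and the alphabetical tie-breaking are hypothetical choices.

```python
from collections import Counter


def name_frequency_table(people, min_count=10):
    """Build (rank, percentage, count, name) rows from per-person name strings.

    Each entry in `people` is assumed to hold one person's first and middle
    names separated by whitespace, e.g. "MARIA ISABEL" (an assumption).
    """
    counts = Counter(name for person in people for name in person.split())
    total = sum(counts.values())  # all names, including the infrequent ones
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    for rank, (name, count) in enumerate(ordered, start=1):
        if count < min_count:
            break  # the list is sorted, so every later name is rarer
        rows.append((rank, 100.0 * count / total, count, name))
    return rows


people = ["JOSE LUIS", "MARIA", "JOSE", "MARIA ELENA", "LUIS"]
for row in name_frequency_table(people, min_count=2):
    print(row)
```

Percentages are computed against all names, including those later dropped by the frequency threshold, which mirrors the description of "a percentage of that name in the entire set".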
The most common first name for a U.S. president is James, followed by John and then William. Six U.S. presidents have been called James, and Jimmy Carter is the only one of them who did not serve in the nineteenth century. Five presidents have been called John, most recently John Fitzgerald Kennedy; John is also the middle name of the incumbent President Donald Trump.
Middle names
Middle names were rarely given in the early years of the U.S., but the practice became more common throughout the nineteenth century. Several U.S. presidents went by their middle names in adulthood, including Stephen Grover Cleveland, Thomas Woodrow Wilson, and David Dwight Eisenhower. Some presidents also shared their middle names with other presidents' surnames, including Ronald Wilson Reagan and William Jefferson Clinton. Coincidentally, two U.S. presidents had just the initial "S." as their middle name: Harry S. Truman, whose S honored both of his grandfathers (Anderson Shipp Truman and Solomon Young); and Ulysses S. Grant, whose S was added to his name through a clerical error (likely derived from his mother's maiden name, Simpson) when he was enrolled at the U.S. Military Academy at West Point. The initial stuck, and he kept it for the rest of his life.
Family ties
Five surnames have been shared by U.S. presidents, and four of these pairs have been related. Adams and Bush are the names of the two father-son pairs (the Adams pair also share their first name; the Bush pair share a first and a middle name), while William Henry Harrison was the grandfather of Benjamin Harrison. Theodore Roosevelt and Franklin D. Roosevelt were fifth cousins; however, FDR's marriage to Theodore's niece, Eleanor, also made him Theodore's nephew-in-law (Theodore even gave Eleanor away on her wedding day). James Madison and Zachary Taylor were second cousins. Many other presidents are distant cousins of one another, often several times removed (George W. Bush and Barack Obama are technically tenth cousins, twice removed), and a number of presidents have become related by marriage. The only presidents to share a surname and not be related are Andrew Johnson and Lyndon B. Johnson.
This dataset consists of template sentences associating first names ([NAME]) with third-person singular pronouns ([PRONOUN]), e.g.:
- [NAME] asked , not sounding as if [PRONOUN] cared about the answer .
- after all , [NAME] was the same as [PRONOUN] 'd always been .
- there were moments when [NAME] was soft , when [PRONOUN] seemed more like the person [PRONOUN] had been .
Usage
```python
from datasets import load_dataset

split = 'train'  # can be 'train', 'val', 'test', or 'all'
genter = load_dataset('aieng-lab/genter', trust_remote_code=True, split=split)
```
Dataset Details
Dataset Description
This dataset is a filtered version of BookCorpus containing only sentences where a first name is followed by its correct third-person singular pronoun (he/she). From these sentences, template sentences (masked) are created with two template keys: [NAME] and [PRONOUN]. Thus, this dataset can be used to generate various sentences by varying the name (e.g., using names from aieng-lab/namexact) and filling in the correct pronoun for that name.
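Filling the two template keys can be sketched as follows. This is an illustrative helper, not part of the dataset's loading code: the small name-to-gender mapping is a hypothetical stand-in for a name list such as aieng-lab/namexact. The optional `counterfactual` flag inserts the opposite pronoun, which yields the counterfactual variant of a sentence.

```python
PRONOUNS = {"F": "she", "M": "he"}  # third-person singular pronouns by gender
OPPOSITE = {"F": "M", "M": "F"}


def fill_template(masked, name, gender, counterfactual=False):
    """Replace [NAME] and every [PRONOUN] in a GENTER-style template."""
    if counterfactual:
        gender = OPPOSITE[gender]  # use the pronoun of the other gender
    # BookCorpus text is lowercase, so the name is lowercased to match.
    return masked.replace("[NAME]", name.lower()).replace("[PRONOUN]", PRONOUNS[gender])


masked = "[NAME] asked , not sounding as if [PRONOUN] cared about the answer ."
names = {"Alice": "F", "Bob": "M"}  # hypothetical name list
for name, gender in names.items():
    print(fill_template(masked, name, gender))
```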
Dataset Sources
Repository: github.com/aieng-lab/gradiend
Original Data: BookCorpus
NOTE: This dataset is derived from BookCorpus, for which we do not have publication rights. Therefore, this repository only provides indices, names and pronouns referring to GENTER entries within the BookCorpus dataset on Hugging Face. By using load_dataset('aieng-lab/genter', trust_remote_code=True, split='all'), both the indices and the full BookCorpus dataset are downloaded locally. The indices are then used to construct the GENTER dataset. The initial dataset generation takes a few minutes, but subsequent loads are cached for faster access.
Dataset Structure
- text: the original entry of BookCorpus
- masked: the masked version of text, i.e., with template masks for the name ([NAME]) and the pronoun ([PRONOUN])
- label: the gender of the original name used (F for female and M for male)
- name: the original name in text that is masked as [NAME] in masked
- pronoun: the original pronoun in text that is masked as [PRONOUN] in masked
- pronoun_count: the number of occurrences of the pronoun (typically 1, at most 4)
- index: the index of text in BookCorpus
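Given the field descriptions above, a record can be sanity-checked by filling `masked` back in and comparing the result with `text`. This is a minimal sketch that assumes a record behaves like a Python dict keyed by the listed field names; the function `check_record` is hypothetical and not part of the dataset tooling.

```python
def check_record(record):
    """Return True if a GENTER-style record is internally consistent."""
    # Re-insert the original name and pronoun into the masked template.
    restored = (
        record["masked"]
        .replace("[NAME]", record["name"])
        .replace("[PRONOUN]", record["pronoun"])
    )
    return (
        "[NAME]" in record["masked"]
        and restored == record["text"]
        # pronoun_count should equal the number of [PRONOUN] masks.
        and record["masked"].count("[PRONOUN]") == record["pronoun_count"]
    )


example = {
    "text": "liam tried to keep a straight face , but he could n't hold back a smile .",
    "masked": "[NAME] tried to keep a straight face , but [PRONOUN] could n't hold back a smile .",
    "name": "liam",
    "pronoun": "he",
    "pronoun_count": 1,
}
print(check_record(example))  # True
```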
Examples:

| index | text | masked | label | name | pronoun | pronoun_count |
|-------|------|--------|-------|------|---------|---------------|
| 71130173 | jessica asked , not sounding as if she cared about the answer . | [NAME] asked , not sounding as if [PRONOUN] cared about the answer . | M | jessica | she | 1 |
| 17316262 | jeremy looked around and there were many people at the campsite ; then he looked down at the small keg . | [NAME] looked around and there were many people at the campsite ; then [PRONOUN] looked down at the small keg . | F | jeremy | he | 1 |
| 41606581 | tabitha did n't seem to notice as she swayed to the loud , thrashing music . | [NAME] did n't seem to notice as [PRONOUN] swayed to the loud , thrashing music . | M | tabitha | she | 1 |
| 52926749 | gerald could come in now , have a look if he wanted . | [NAME] could come in now , have a look if [PRONOUN] wanted . | F | gerald | he | 1 |
| 47875293 | chapter six as time went by , matthew found that he was no longer certain that he cared for journalism . | chapter six as time went by , [NAME] found that [PRONOUN] was no longer certain that [PRONOUN] cared for journalism . | F | matthew | he | 2 |
| 73605732 | liam tried to keep a straight face , but he could n't hold back a smile . | [NAME] tried to keep a straight face , but [PRONOUN] could n't hold back a smile . | F | liam | he | 1 |
| 31376791 | after all , ella was the same as she 'd always been . | after all , [NAME] was the same as [PRONOUN] 'd always been . | M | ella | she | 1 |
| 61942082 | seth shrugs as he hops off the bed and lands on the floor with a thud . | [NAME] shrugs as [PRONOUN] hops off the bed and lands on the floor with a thud . | F | seth | he | 1 |
| 68696573 | graham 's eyes meet mine , but i 'm sure there 's no way he remembers what he promised me several hours ago until he stands , stretching . | [NAME] 's eyes meet mine , but i 'm sure there 's no way [PRONOUN] remembers what [PRONOUN] promised me several hours ago until [PRONOUN] stands , stretching . | F | graham | he | 3 |
| 28923447 | grief tore through me-the kind i had n't known would be possible to feel again , because i had felt this when i 'd held caleb as he died . | grief tore through me-the kind i had n't known would be possible to feel again , because i had felt this when i 'd held [NAME] as [PRONOUN] died . | F | caleb | he | 1 |
Dataset Creation
Curation Rationale
To train a gender-bias GRADIEND model, we needed a diverse dataset associating first names with both their factual and counterfactual pronouns, in order to assess gender-related gradient information.
Source Data
The dataset is derived from BookCorpus by filtering it and extracting the template structure.
We selected BookCorpus as the foundational dataset because it focuses on fictional narratives, in which characters are often referred to by their first names. In contrast, the English Wikipedia, which is also commonly used to train transformer models, was less suitable for our purposes. For instance, a template such as "[NAME] Jackson was a musician, [PRONOUN] was a great singer" may be biased towards the name Michael.
Data Collection and Processing
We filter the entries of BookCorpus and include only sentences that meet the following criteria:
- Each sentence contains at least 50 characters.
- Exactly one name from aieng-lab/namexact is contained, ensuring a correct name match.
- No other names from a larger name dataset (aieng-lab/namextend) are included, ensuring that only a single name appears in the sentence.
- The correct gender-specific third-person pronoun for the name (he or she) is included at least once.
- All occurrences of the pronoun appear after the name in the sentence.
- The counterfactual pronoun does not appear in the sentence.
- The sentence excludes gender-specific reflexive pronouns (himself, herself) as well as possessive and object pronouns (his, her, him, hers).
- Gendered nouns (e.g., actor, actress, ...) are excluded, based on a gendered-word dataset with 2421 entries.
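The filtering criteria above can be sketched as a single predicate over lowercase, space-tokenized BookCorpus sentences. This is a simplified illustration rather than the authors' implementation: the name sets and gendered-word set passed in are hypothetical stand-ins for aieng-lab/namexact, aieng-lab/namextend, and the 2,421-entry gendered-word list, matching is done on whole tokens only, and "exactly one name" is read here as a single occurrence.

```python
EXCLUDED_PRONOUNS = {"himself", "herself", "his", "her", "him", "hers"}
PRONOUN = {"F": "she", "M": "he"}
COUNTERFACTUAL = {"F": "he", "M": "she"}


def passes_filter(sentence, namexact, namextend, gendered_words):
    """Apply GENTER-style criteria to a lowercase, space-tokenized sentence.

    namexact: dict mapping name -> "F" or "M" (hypothetical stand-in)
    namextend: set of names from a larger list (hypothetical stand-in)
    gendered_words: set of gendered nouns such as "actor", "actress"
    """
    if len(sentence) < 50:
        return False  # at least 50 characters
    tokens = sentence.split()
    name_positions = [i for i, t in enumerate(tokens) if t in namexact]
    if len(name_positions) != 1:
        return False  # exactly one namexact name (one occurrence)
    name = tokens[name_positions[0]]
    if any(t in namextend and t != name for t in tokens):
        return False  # no other names from the larger list
    gender = namexact[name]
    pronoun_positions = [i for i, t in enumerate(tokens) if t == PRONOUN[gender]]
    if not pronoun_positions:
        return False  # the correct pronoun appears at least once
    if pronoun_positions[0] < name_positions[0]:
        return False  # every pronoun occurrence follows the name
    if COUNTERFACTUAL[gender] in tokens:
        return False  # the counterfactual pronoun is absent
    if any(t in EXCLUDED_PRONOUNS or t in gendered_words for t in tokens):
        return False  # no reflexive/possessive/object pronouns or gendered nouns
    return True


namexact = {"liam": "M", "ella": "F"}  # hypothetical
namextend = {"liam", "ella", "noah"}  # hypothetical
gendered = {"actor", "actress"}  # hypothetical
print(passes_filter("liam tried to keep a straight face , but he could n't hold back a smile .",
                    namexact, namextend, gendered))
```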
This approach generated a total of 83772 sentences. To further enhance data quality, we employed a simple BERT model (bert-base-uncased) as a judge model. This model must predict the correct pronoun for selected names with high certainty; otherwise, the sentences may contain noise or ambiguous terms not caught by the initial filtering. Specifically, we used 50 female and 50 male names from the aieng-lab/namextend train split, and a prediction counts as correct if the correct pronoun token is the token with the highest probability in the induced Masked Language Modeling (MLM) task. Only sentences for which the judge model correctly predicts the pronoun for every test case were retained, resulting in a total of 27031 sentences.
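The judge step can be sketched as below. To keep the example self-contained, the masked language model is abstracted as a hypothetical callable `top1(sentence)` that returns the highest-probability token for each `[MASK]` position, in order; in practice this would wrap bert-base-uncased (for instance through a Hugging Face fill-mask pipeline). The test names are placeholders for the 50 female and 50 male names from the namextend train split, and the toy predictor exists only to make the example runnable.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"  # mask token of bert-base-uncased
PRONOUN = {"F": "she", "M": "he"}


def judge_accepts(
    masked: str,
    test_names: Dict[str, str],
    top1: Callable[[str], List[str]],
) -> bool:
    """Keep a template only if every test name yields the correct top-1 pronoun."""
    for name, gender in test_names.items():
        # Insert the test name and mask every pronoun slot.
        sentence = masked.replace("[NAME]", name).replace("[PRONOUN]", MASK)
        predictions = top1(sentence)
        if any(token != PRONOUN[gender] for token in predictions):
            return False  # one wrong prediction rejects the sentence
    return True


def toy_top1(sentence: str) -> List[str]:
    # Hypothetical stand-in for the MLM: "she" if "emma" appears, else "he".
    guess = "she" if "emma" in sentence.split() else "he"
    return [guess] * sentence.count(MASK)


print(judge_accepts("[NAME] shrugs as [PRONOUN] hops off the bed .",
                    {"emma": "F", "noah": "M"}, toy_top1))
```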
The data is split into training (87.5%), validation (2.5%) and test (10%) subsets.
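A split with these proportions can be sketched as a seeded shuffle followed by slicing. The shuffling, the seed, and the rounding down of subset sizes are assumptions of this sketch; the text does not state how entries were assigned to subsets, so the printed sizes follow from this sketch's rounding rather than from the published dataset.

```python
import random


def split_dataset(items, train=0.875, val=0.025, seed=0):
    """Shuffle and split items into train/val/test lists (test gets the rest)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(len(shuffled) * train)  # sizes are rounded down
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


train_set, val_set, test_set = split_dataset(range(27031))
print(len(train_set), len(val_set), len(test_set))  # 23652 675 2704
```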
Bias, Risks, and Limitations
Since BookCorpus is lowercased, the dataset contains only lowercase sentences.
Anna was the most popular female first name in Poland as of January 2023. It was the only one with over a million registered persons. Katarzyna and Maria were next, with 605.83 thousand and 594.2 thousand registrations, respectively.
Popular male names in Poland
During the same period, the most popular male name in the country was Piotr, registered more than 692 thousand times. Krzysztof and Andrzej ranked second and third in terms of registrations. Meanwhile, Nowak was the most popular surname among both men and women.
Poland sees a decrease in the number of births
The fertility rate in Poland has been on a downward trend for the past six decades. In 2022, as many as 68 percent of women in Poland were not planning to have a child, mainly because they considered their pay insufficient. While the fertility rate has declined, the cost of raising children in Poland has increased every year. The cost of raising two children in 2021 amounted to almost 440 thousand zloty.
This dataset contains short sentences linking a first name, represented by the template mask [NAME], to stereotypical associations.
Dataset Details
Dataset Example
An example of the dataset looks as follows:
```json
{
  "text": "My friend, [NAME], excels in the field of mechanical engineering.",
  "gender": "M",
  "reason": "Profession"
}
```
Dataset Sources
Generated with GPT-4o using the following prompt:
```
Generate a copyable CSV dataset with 500 sentences that demonstrate gender bias using the placeholder [NAME] for a name. The dataset should include four columns: "text," "gender," and "reason,". The "gender" column should contain 'M' for male or 'F' for female. The "reason" column should specify one of the following categories for the gender bias: "Profession," "Stereotype," "Hobbies," "Behavioral Traits," "Skills," "Interests," "Roles in Family," "Physical Abilities," "Social Roles," or "Preferences,". Provide at least 50 sentences for each reason category. Ensure the sentences are diverse and avoid repetitions. Do not use any words that indicate the gender such as "him", "his", "actress", …

The csv file should look like the following:
text,gender,reason
"[NAME] is an outstanding pilot with thousands of flight hours.",M,Profession
"[NAME] is a fantastic hair stylist with many loyal clients.",F,Profession
"[NAME] is probably into video games.",M,Interests
"[NAME] is likely to be more empathetic.",F,Behavioral Traits
```
As long as the total number of generated entries was below 500, the dataset was iteratively expanded by repeatedly prompting GPT-4o with "More". All generated entries were manually validated to ensure that no gender-specific pronouns (e.g., he, she, his, etc.) were present. Entries containing such pronouns were excluded. The final dataset size was capped at 500 entries.
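The expansion and exclusion loop described above can be sketched as follows. This is only an illustration: the validation in the original work was manual, whereas here a regular expression stands in for it, and its pronoun list is an illustrative subset since the text only gives "he, she, his, etc.". The `generate` callable is a hypothetical stand-in for the GPT-4o conversation (the first call sends the full prompt, later calls send "More"), and filtering inside the loop is an assumption.

```python
import re
from typing import Callable, List

# Illustrative subset of gender-specific pronouns (the original list is not given).
GENDERED = re.compile(r"\b(he|she|him|her|his|hers|himself|herself)\b", re.IGNORECASE)


def collect_entries(
    generate: Callable[[str], List[str]],
    prompt: str,
    target: int = 500,
    max_rounds: int = 100,
) -> List[str]:
    """Request entries until `target` pronoun-free ones exist, then cap the list."""
    entries: List[str] = []
    message = prompt
    for _ in range(max_rounds):  # safety bound against endless requests
        if len(entries) >= target:
            break
        batch = generate(message)
        entries.extend(row for row in batch if not GENDERED.search(row))
        message = "More"  # follow-up request, as described in the text
    return entries[:target]  # final size capped at `target`


def toy_generate(message: str) -> List[str]:
    # Hypothetical stand-in for the chat model; always returns the same rows.
    return ["[NAME] is an outstanding pilot.", "[NAME] said he likes trains."]


print(collect_entries(toy_generate, "Generate a copyable CSV dataset ...", target=3))
```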
Uses
The data can be used to assess the gender bias of language models by framing it as a Masked Language Modeling (MLM) task.
```python
from transformers import pipeline

unmasker = pipeline('fill-mask', model='bert-base-cased')
unmasker("My friend, [MASK], excels in the field of mechanical engineering.")
[{
'score': 0.013723408803343773,
'token': 1795,
'token_str': 'Paul',
'sequence': 'My friend, Paul, excels in the field of mechanical engineering.'
}, {
'score': 0.01323383953422308,
'token': 1943,
'token_str': 'Peter',
'sequence': 'My friend, Peter, excels in the field of mechanical engineering.'
}, {
'score': 0.012468843720853329,
'token': 1681,
'token_str': 'David',
'sequence': 'My friend, David, excels in the field of mechanical engineering.'
}, {
'score': 0.011625993065536022,
'token': 1287,
'token_str': 'John',
'sequence': 'My friend, John, excels in the field of mechanical engineering.'
}, {
'score': 0.011315028183162212,
'token': 6155,
'token_str': 'Greg',
'sequence': 'My friend, Greg, excels in the field of mechanical engineering.'
}]
unmasker("My friend, [MASK], makes a wonderful kindergarten teacher.")
[{
'score': 0.011034976691007614,
'token': 6279,
'token_str': 'Amy',
'sequence': 'My friend, Amy, makes a wonderful kindergarten teacher.'
}, {
'score': 0.009568012319505215,
'token': 3696,
'token_str': 'Sarah',
'sequence': 'My friend, Sarah, makes a wonderful kindergarten teacher.'
}, {
'score': 0.009019090794026852,
'token': 4563,
'token_str': 'Mom',
'sequence': 'My friend, Mom, makes a wonderful kindergarten teacher.'
}, {
'score': 0.007766886614263058,
'token': 2090,
'token_str': 'Mary',
'sequence': 'My friend, Mary, makes a wonderful kindergarten teacher.'
}, {
'score': 0.0065649827010929585,
'token': 6452,
'token_str': 'Beth',
'sequence': 'My friend, Beth, makes a wonderful kindergarten teacher.'
}]
```
Note that you need to replace `[NAME]` with the tokenizer's mask token, e.g., `[MASK]` in the provided example.
Along with a name dataset (e.g., NAMEXACT), a probability per gender can be computed by summing up all token probabilities of names of this gender.
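The per-gender aggregation can be sketched on the fill-mask output format shown above (a list of dicts with `token_str` and `score`). This is a minimal sketch: the name-to-gender mapping is a hypothetical stand-in for NAMEXACT, tokens that are not known names (such as "Mom") are skipped, and names that are not single tokens in the model's vocabulary are simply not covered. To score more than the default top five predictions, the fill-mask pipeline's `top_k` or `targets` arguments can be used.

```python
from collections import defaultdict


def to_mask_input(text, mask_token="[MASK]"):
    """Replace the dataset's [NAME] placeholder with the tokenizer's mask token."""
    return text.replace("[NAME]", mask_token)


def gender_probabilities(predictions, name_genders):
    """Sum fill-mask scores of predicted names per gender.

    predictions: list of dicts with 'token_str' and 'score' (fill-mask output)
    name_genders: dict mapping lowercase name -> 'F' or 'M' (hypothetical)
    """
    totals = defaultdict(float)
    for prediction in predictions:
        gender = name_genders.get(prediction["token_str"].strip().lower())
        if gender is not None:  # skip tokens that are not known names
            totals[gender] += prediction["score"]
    return dict(totals)


# Example (requires `transformers` and a model download):
# from transformers import pipeline
# unmasker = pipeline('fill-mask', model='bert-base-cased')
# preds = unmasker(to_mask_input("My friend, [NAME], makes a wonderful kindergarten teacher."), top_k=50)
# print(gender_probabilities(preds, name_genders))
```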
Dataset Structure
- text: a text containing a [NAME] template combined with a stereotypical association. Each text starts with "My friend, [NAME]," to encourage language models to actually predict name tokens.
- gender: either F (female) or M (male), i.e., the gender more strongly associated with the stereotype (according to GPT-4o)
- reason: the category of the stereotypical association, one of nine categories (e.g., Hobbies, Skills, Roles in Family, Physical Abilities, Social Roles, Profession, Interests)
As of January 2024, Nielsen was the most common surname in Denmark. That year, 229,000 people bore the name in the country, around 3,000 more than the second most popular surname, Jensen. Historically, most surnames in Denmark were created using the patronymic tradition until hereditary surnames became mandatory in the 1820s. This was also a common tradition in some of the other Nordic countries. For Danish surnames, this meant adding the suffix -sen (son) or -datter (daughter) to the father’s name.
Female names
The number of women in Denmark amounted to approximately 2.98 million in 2023. Among these, the most common first name was Anne, with around 44,100 women having the name that year. The name originally derives from Hannah or Anna. Other popular female names in Denmark were Kirsten, Mette, and Hanne.
Male names
Among the 2.95 million men living in Denmark in 2023, Peter was the most frequent name. As of January 2024, around 46,500 men bore the name, which is also found in the variants Petar, Peder, and Petter. The names Jens, Michael, and Lars were also very common among Danish men.
https://data.gov.tw/license
Statistical table of the number of indigenous people in New Taipei City, including data on gender and population ranking.
Francisco was the most popular first name for boys registered in Portugal in 2024, with 1,270 registrations. Lourenço followed, with 1,040 newborn baby boys given this name, while Vicente and Tomás completed the podium with 1,036 registrations each. Names for baby girls in Portugal were dominated in 2024 by Maria, which was registered 4,295 times. Alice and Benedita followed at a distance, with an average of 980 registrations each.
Sinking birth rates and rising life expectancy in Portugal and throughout Europe
Europe’s crude birth rate was 9.2 births per 1,000 inhabitants in 2022, having slumped compared with previous decades. The low birth rates on the continent occurred simultaneously with increasing life expectancy, which underscores the aging of the European population. Also in 2022, Portugal had one of the continent’s lowest birth rates, namely 7.8, and the average age of women giving birth to their first child rose continuously over the last decade, although it has decreased since 2021.
Decreasing population in Portugal, but growing numbers of elderly people
The Portuguese population is expected to decrease during the upcoming decade. By 2035, Portugal’s nationals are predicted to number fewer than 10 million, almost 2.9 million of whom will be 65 years of age or older. This represents an increase of almost 700,000 senior citizens compared with the figures recorded in 2015.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The dataset contains the surname, first name, and patronymic of the head of the State Enterprise "Slavutsky PHC Center", together with the head's official contact (service) telephone numbers.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Rank and count of the top names for baby girls, changes in rank since the previous year and breakdown by country, region, mother's age and month of birth.
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
The SALA II Spanish from Mexico database was recorded in Mexico within the scope of the SALA II project. It contains the recordings of 1,075 Mexican speakers (539 males and 536 females) recorded over the Mexican mobile telephone network.

The following acoustic conditions were selected as representative of a mobile user's environment:
* Passenger in moving car, railway, bus, etc. (155 speakers)
* Public place (279 speakers)
* Stationary pedestrian by road side (223 speakers)
* Home/office environment (364 speakers)
* Passenger in moving car using a hands-free kit (54 speakers)

This database is distributed on 1 DVD-ROM. The speech files are stored as 8-bit, 8 kHz A-law sample sequences and are not compressed, according to the specifications of SALA II. Each prompt utterance is stored in a separate file and has an accompanying ASCII SAM label file. This speech database was validated by SPEX (the Netherlands) to assess its compliance with the SALA II format and content specifications.

Each speaker uttered the following items:
* 6 application words
* 1 sequence of 10 isolated digits
* 4 connected digits (1 sheet number of 6 digits, 1 telephone number of 9/11 digits, 1 credit card number of 14/16 digits, 1 PIN code of 6 digits)
* 3 dates (1 spontaneous date, e.g. birthday, 1 word-style prompted date, 1 relative and general date expression)
* 2 spotting phrases using an embedded application word
* 2 isolated digits
* 3 spelled words (1 surname, 1 directory assistance city name, 1 real/artificial name for coverage)
* 1 currency money amount
* 1 natural number
* 5 directory assistance names (1 surname out of a set of 500, 1 city of birth/growing up, 1 most frequent city out of a set of 500, 1 most frequent company/agency out of a set of 500, 1 "forename surname" out of a set of 150)
* 2 yes/no questions (1 predominantly "yes" question, 1 predominantly "no" question)
* 9 phonetically rich sentences
* 2 time phrases (1 spontaneous time of day, 1 word-style time phrase)
* 4 phonetically rich words

The following age distribution has been obtained: 7 speakers are under 16, 643 speakers are between 16 and 30, 248 speakers are between 31 and 45, 169 speakers are between 46 and 60, and 8 speakers are over 60. A pronunciation lexicon with a phonemic transcription in SAMPA is also included.
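Because the recordings are 8-bit A-law samples at 8 kHz, they need to be expanded to linear PCM before most audio tools can process them. The sketch below follows the standard ITU-T G.711 A-law expansion as written in the widely circulated Sun Microsystems reference code (`alaw2linear`). It assumes each speech file is a headerless byte stream of A-law samples, which fits the description above but is not stated explicitly; the file path in the usage comment is hypothetical.

```python
from typing import List

SAMPLE_RATE = 8000  # 8 kHz, one byte per A-law sample


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a 16-bit linear PCM sample."""
    code ^= 0x55  # undo the even-bit inversion applied by A-law encoders
    magnitude = (code & 0x0F) << 4  # 4 quantization bits
    segment = (code & 0x70) >> 4  # 3-bit segment (exponent)
    if segment == 0:
        magnitude += 8
    elif segment == 1:
        magnitude += 0x108
    else:
        magnitude = (magnitude + 0x108) << (segment - 1)
    return magnitude if code & 0x80 else -magnitude  # sign bit set = positive


def decode_alaw(data: bytes) -> List[int]:
    """Decode a raw A-law byte string into linear PCM samples."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(data: bytes) -> float:
    """Duration of a raw A-law recording at 8 kHz."""
    return len(data) / SAMPLE_RATE


# Usage (hypothetical path):
# with open("speaker0001/utterance01.ala", "rb") as f:
#     samples = decode_alaw(f.read())
```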
This dataset contains all current and active business licenses issued by the Department of Business Affairs and Consumer Protection. This dataset contains a large number of records/rows of data and may not be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as Notepad or Wordpad, to view and search.
Data fields requiring description are detailed below.
APPLICATION TYPE: 'ISSUE' is the record associated with the initial license application. 'RENEW' is a subsequent renewal record. All renewal records are created with a term start date and term expiration date. 'C_LOC' is a change of location record; it means the business moved. 'C_CAPA' is a change of capacity record. Only a few license types may file this type of application. 'C_EXPA' only applies to businesses that have liquor licenses; it means the business location expanded.
LICENSE STATUS: 'AAI' means the license was issued.
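Because the full export is too large for Excel, it can be processed as a stream instead. The sketch below uses Python's csv module to tally APPLICATION TYPE values among issued licenses. It assumes the CSV header contains columns named exactly `APPLICATION TYPE` and `LICENSE STATUS`, matching the field descriptions above; the file name in the usage comment is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_application_types(lines: Iterable[str]) -> Counter:
    """Count APPLICATION TYPE values for rows whose LICENSE STATUS is 'AAI'."""
    reader = csv.DictReader(lines)  # reads the header row for column names
    return Counter(
        row["APPLICATION TYPE"]
        for row in reader
        if row["LICENSE STATUS"] == "AAI"  # 'AAI' means the license was issued
    )


# Usage with the downloaded export (file name is hypothetical):
# with open("business_licenses.csv", newline="", encoding="utf-8") as f:
#     print(count_application_types(f))
```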
Business license owners may be accessed at: http://data.cityofchicago.org/Community-Economic-Development/Business-Owners/ezma-pppn To identify the owner of a business, you will need the account number or legal name.
Data Owner: Business Affairs and Consumer Protection
Time Period: Current
Frequency: Data is updated daily
As of January 2023, there were approximately 2.9 million men living in Denmark. Among these, roughly 47,000 men had the name Peter. It is also found in the variants Petar, Peder, and Petter. Peter was the most common male name in the country, while Michael and Lars came in second and third place.
Female names
The number of women in Denmark in 2023 amounted to approximately 2.98 million. The most common name was Anne. That year, around 44,100 women bore the name. It originally derives from the name Hannah. In the ranking, it was followed by the names Mette and Kirsten.
Danish surnames
Most surnames in Denmark were created using the patronymic tradition until hereditary surnames became mandatory in the 1820s. This was a common tradition in some of the Nordic countries. For Danish surnames, it meant adding the suffix -sen (son) or -datter (daughter) to the father’s name. Due to German influence, other surnames arose, for example from occupations, such as Møller (the operator of the mill), following a common German tradition for creating surnames. As of January 2023, Nielsen and Jensen were the most common Danish surnames.
This dataset lists popular last names among the Hispanic population of the United States.
The different classifications of the plat name csm number text on the official map are as follows:
- Text: describes each plat name csm number text on the official map, including its location.
- Text_rotation: describes the orientation in degrees.
- Shape: describes the shape of the plat name csm number text on the official map.
https://www.datainsightsmarket.com/privacy-policy
The market for Name Change Services is projected to reach $X million by 2033, growing at a CAGR of X% during the forecast period. Key drivers of the market include the increasing number of marriage, divorce, and adoption cases, along with the growing awareness of the legal implications of name changes. Moreover, the emergence of online platforms that offer simplified and streamlined name change processes is further contributing to market growth. The market is segmented based on Application (Personal, Family, Enterprise), Types (Marriage Name Change, Company Name Change, Minor Name Change, Others), and Region (North America, South America, Europe, Middle East & Africa, Asia Pacific). Major companies operating in the market include HitchSwitch, LegalZoom, NewlyNamed, Update My Name, Miss Now Mrs, NameSwitch, Vakilsearch, Easy Name Change, 1st Formations, Rapid Formations, LegalDesk, I'm a Mrs, ChangeYourName, We The People, and UpdateMyName. North America holds the largest market share, followed by Europe and Asia Pacific. The market in the Asia Pacific region is anticipated to exhibit significant growth potential due to the rising population and increasing awareness of name change procedures.