100+ datasets found

WGND 2.0
search.dataone.org
Updated Nov 14, 2023
Cite
Raffo, Julio (2023). WGND 2.0 [Dataset]. http://doi.org/10.7910/DVN/MSEGSJ
Unique identifier
https://doi.org/10.7910/DVN/MSEGSJ
Dataset updated
Nov 14, 2023
Dataset provided by
Harvard Dataverse
Authors
Raffo, Julio
Area covered
Wiegand Hall
Description
This paper revisits the first World Gender Name Dictionary (WGND 1.0), which makes it possible to disambiguate gender in data naming physical persons (Lax Martínez et al., 2016). We discuss its advantages and limitations and propose an expansion based on updated data and additional sources. By including more than 26 million records linking given names and 195 different countries and territories, the resulting WGND 2.0 substantially increases the international coverage of its predecessor. As a result, it is particularly designed to be applied to intellectual property unit-record data naming inventors, designers, individual applicants and other creators disclosed in these data.
Gender by Name (Time-series)
kaggle.com
Updated Dec 5, 2022
Cite
The Devastator (2022). Gender by Name (Time-series) [Dataset]. https://www.kaggle.com/datasets/thedevastator/automated-gender-identification-using-name-proba/data
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 5, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Description
Automated Gender Identification Using Name Probabilities

2019 US Social Security Administration Data

By Derek Howard [source]

About this dataset

This dataset provides a tool for adding gender information to datasets from names alone. It contains the probability that a given name belongs to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were derived from names with 5 or more people associated with them, so many less common names are still covered. With this resource, users can generate gender-aware data quickly and make gender identification in datasets easier.

How to use the dataset

This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.

To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).

In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
Good luck!
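
As a rough illustration of the workflow described above, the sketch below assumes a CSV with `name` and `probability` columns, where the probability runs from 0 (male) to 1 (female) as the text states. The function names, the sample rows, and the 0.2/0.8 cut-offs are hypothetical choices made for this example and are not part of the dataset.

```python
import csv
import io

# Hypothetical sample in the two-column layout described above.
SAMPLE_CSV = """name,probability
Mary,0.99
John,0.01
Casey,0.41
"""

def load_probabilities(handle):
    """Map lower-cased first names to the probability of being female."""
    reader = csv.DictReader(handle)
    return {row["name"].strip().lower(): float(row["probability"]) for row in reader}

def assign_gender(name, table, low=0.2, high=0.8):
    """Return 'female', 'male', or 'unknown' for a first name.

    The low/high cut-offs are illustrative assumptions; names whose
    probability falls between them, or that are missing, stay 'unknown'.
    """
    p = table.get(name.strip().lower())
    if p is None:
        return "unknown"
    if p >= high:
        return "female"
    if p <= low:
        return "male"
    return "unknown"

table = load_probabilities(io.StringIO(SAMPLE_CSV))
for first_name in ["Mary", "john", "Casey", "Zyx"]:
    print(first_name, assign_gender(first_name, table))
```

Keeping an explicit "unknown" band is one way to use the probabilities as a baseline rather than an absolute measure; tightening or widening the cut-offs trades coverage against accuracy.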

Research Ideas

Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.

Generate gender neutral names - use this data to generate random names with no gender bias.

Automate record lookup - quickly and accurately assign genders based on the probability associated with their name

License

Unknown License - Please check the dataset description for more information.

Columns

File: name_gender.csv

| Column name | Description |
|:------------|:--------------------------------------------------------------------|
| name | The name of the person. (String) |
| gender | The gender of the person. (String) |
| probability | The probability of the gender being assigned to the person. (Float) |

Acknowledgements

If you use this dataset in your research, please credit the original author, Derek Howard.
The annual list of first names of newborns — city of Nancy
gimi9.com
data.europa.eu
Updated Dec 16, 2023
Cite
(2023). The annual list of first names of newborns — city of Nancy [Dataset]. https://gimi9.com/dataset/eu_5d2c2919634f41429aae86ce/
Dataset updated
Dec 16, 2023
License
Open Database License (ODbL) v1.0 (https://www.opendatacommons.org/licenses/odbl/1.0/)
License information was derived automatically
Description
The annual list of first names of newborns is a simple and popular dataset. These data, from the civil status register, contain the following essential fields: sex of the newborn, first name of the newborn, number of occurrences of the first name for the corresponding year, and year of survey.

The dataset consists of the list of first names of children born in Nancy since 2016, in CSV format, with the number of occurrences of each given name, classified by year and sex. First names with fewer than five occurrences are not published, in order to protect personal data. The standardisation of this dataset follows the recommendations of Opendata France resulting from the work on the Socle Commun des Données Locales.

Definition of headers:
COLL_NOM: name of the municipality
COLL_INSEE: Insee code of the municipality where the first names are registered in the civil status records of the place of birth. Note that the place of birth may differ from the place of residence of the parents.
CHILD_SEX: gender corresponding to the first name: M or F, for male or female respectively
CHILD_PRENOM: first name of the newborn(s), recorded as first name in the civil status documents of the corresponding year
NUMBER_OCCURENCES: number of occurrences of the first name
YEAR: year of birth

Total births reported to the City of Nancy:
2018: 5135 births (2692 girls, 2443 boys)
2017: 5483 births (2704 girls, 2779 boys)
2016: 5544 births (2692 girls, 2852 boys)
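
To make the header definitions concrete, here is a minimal sketch that sums the published occurrences by year and sex. It assumes the file is comma-delimited and uses the header names exactly as listed in the description; the sample rows, their counts, and the function name are hypothetical and only illustrate the layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows using the header names listed in the description.
# The real file's delimiter and values may differ.
SAMPLE = """COLL_NOM,COLL_INSEE,CHILD_SEX,CHILD_PRENOM,NUMBER_OCCURENCES,YEAR
NANCY,54395,F,EMMA,12,2018
NANCY,54395,F,LOUISE,9,2018
NANCY,54395,M,GABRIEL,11,2018
NANCY,54395,M,GABRIEL,14,2017
"""

def published_births_by_year_and_sex(handle):
    """Sum NUMBER_OCCURENCES for each (YEAR, CHILD_SEX) pair."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        key = (int(row["YEAR"]), row["CHILD_SEX"])
        totals[key] += int(row["NUMBER_OCCURENCES"])
    return dict(totals)

totals = published_births_by_year_and_sex(io.StringIO(SAMPLE))
for (year, sex), count in sorted(totals.items()):
    print(year, sex, count)
```

Because first names with fewer than five occurrences are withheld, sums computed this way should be expected to fall below the total births reported by the city.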
Experts and scholars suggest a database of recommended names for the gender...
data.gov.tw
json
Cite
Public Construction Commission, EY, Experts and scholars suggest a database of recommended names for the gender ratio review committee [Dataset]. https://data.gov.tw/en/datasets/26461
Available download formats: json
Dataset authored and provided by
Public Construction Commission, EY
License
https://data.gov.tw/license
Description
Gender proportion of the annual expert and scholar recommendation list database review committee
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Jul 4, 2025
Cite
Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Dataset updated
Jul 4, 2025
Dataset provided by
Social Security Administration (http://ssa.gov/)
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications from 1880 onward.
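
Records with the four fields listed above (name, year of birth, sex, and number) can be turned into a per-name share of female applicants. The sketch below is one way to do that under stated assumptions: the tuples are hypothetical sample values, not figures from the Social Security Administration, and `female_share` is a name invented for this example.

```python
from collections import defaultdict

# Hypothetical records in (name, year_of_birth, sex, number) form.
RECORDS = [
    ("Casey", 1985, "F", 400),
    ("Casey", 1985, "M", 600),
    ("Lisa", 1985, "F", 995),
    ("Lisa", 1985, "M", 5),
]

def female_share(records):
    """Return, for each name, the fraction of counted applicants recorded as F."""
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, _year, sex, number in records:
        counts[name][sex] += number
    return {name: c["F"] / (c["F"] + c["M"]) for name, c in counts.items()}

shares = female_share(RECORDS)
for name, share in sorted(shares.items()):
    print(name, round(share, 3))
```

Pooling all years, as done here, is a simplification; restricting the records to a range of birth years would give shares for a specific cohort.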
USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
Demographics Data Package
johnsnowlabs.com
csv
Updated Jan 20, 2021
Cite
John Snow Labs (2021). Demographics Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/demographics-data-package/
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Description
This data package consists of 26 datasets all containing statistical data relating to the population and particular groups within it belonging to different countries, mostly the United States.
Database of Russian names, surnames and midnames for gender identification
data.niaid.nih.gov
Updated Jan 24, 2020
Cite
Ivan Begtin (2020). Database of Russian names, surnames and midnames for gender identification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2747010
Dataset updated
Jan 24, 2020
Dataset authored and provided by
Ivan Begtin
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Database of names, surnames and midnames across the Russian Federation, used as a source to train algorithms for gender identification by full name.

The dataset is prepared for MongoDB. It includes a MongoDB dump and a dump of the tables as JSON Lines files.

Used in gender identification and fullname parsing software https://github.com/datacoon/russiannames

Available under Creative Commons CC-BY SA by default.
Frequency and ranking of baby names by year and gender
data.urbandatacentre.ca
open.alberta.ca
Updated Jun 24, 2025
Cite
(2025). Frequency and ranking of baby names by year and gender [Dataset]. https://data.urbandatacentre.ca/dataset/ab-frequency-and-ranking-of-baby-names-by-year-and-gender
Dataset updated
Jun 24, 2025
Description
The frequency and ranking of first names given to babies born in the province of Alberta, by year of birth and gender of the baby.
ArabLEX: Database of Arab Names (DAN)
catalogue.elra.info
Updated Oct 7, 2019
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). ArabLEX: Database of Arab Names (DAN) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-M0107/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Description
This database is part of the ArabLEX set of data which consists of the Database of Arabic General Vocabulary (DAG), Database of Arabic Place Names (DAP), Database of Foreign Names in Arabic (DAF) and Database of Arab Names (DAN) available from ELRA under references, respectively, ELRA-L0131, ELRA-M0105, ELRA-M0106 and ELRA-M0107.

With over 218 million forms based on 100,000 lemmas, this full-form database covers Arab personal names (both given names and surnames) in both Arabic and English and contains a rich set of romanized name variants for each name with a variety of supplementary information such as gender, name type and frequency statistics. This comprehensive lexicon (over 6.4 million variants) contains precise phonemic transcriptions and vocalized Arabic for all inflected and cliticized forms for each name.

This database is provided with three options: 1) proclitics, 2) phonetic information (CARS) and 3) orthographic variants. Subsets excluding some of the three proposed options may be provided upon demand. CARS is an accurate phonemic transcription. Optionally, phonetic transcriptions, IPA and/or SAMPA, can be provided, fine tuned to a customer's specifications.

Quantity and size: 218,215,875 lines / 32,659 MB (31.9 GB)
File format: flat TSV text files
Samples and a specifications document available upon request.
Baby Names DataSet
kaggle.com
Updated Mar 21, 2019
Cite
Samrat Rai (2019). Baby Names DataSet [Dataset]. https://www.kaggle.com/samrat77/baby-names-dataset
Dataset updated
Mar 21, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Samrat Rai
License
https://creativecommons.org/publicdomain/zero/1.0/
Database of Chinese Names
catalog.elra.info
live.european-language-grid.eu
Updated Oct 7, 2019
Cite
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Chinese Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0129/
Dataset updated
Oct 7, 2019
Dataset provided by
ELRA (European Language Resources Association)
ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
License
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf
https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf
Area covered
China
Description
Chinese name components, accompanied by accurate pinyin readings, gender codes, and flags denoting whether a name is a given name, a surname, or both.
Popular Baby Names
data.cityofnewyork.us
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Jun 8, 2025
Cite
Department of Health and Mental Hygiene (DOHMH) (2025). Popular Baby Names [Dataset]. https://data.cityofnewyork.us/Health/Popular-Baby-Names/25th-nujf
Available download formats: csv, tsv, application/rdfxml, application/rssxml, xml, json
Dataset updated
Jun 8, 2025
Dataset authored and provided by
Department of Health and Mental Hygiene (DOHMH)
Description
Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.
Gender and Ethnicity Predictions for California City Council Members and...
openicpsr.org
dataverse.harvard.edu
delimited
Updated Oct 24, 2024
Cite
Rohan M. Dalal (2024). Gender and Ethnicity Predictions for California City Council Members and School Board Members, 2010-2023 [Dataset]. http://doi.org/10.3886/E209861V1
Available download formats: delimited
Unique identifier
https://doi.org/10.3886/E209861V1
Dataset updated
Oct 24, 2024
Dataset provided by
Crystal Springs Uplands School
Authors
Rohan M. Dalal
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
2010 - 2023
Area covered
California City, California
Description
To conduct this study, I sourced demographic data from 2010 to 2023 from the California Elections Data Archive (CEDA) for city council members and school board members. The CEDA data provide a full list of candidate names and the number of votes a given candidate received for every city council and school board election.

I assigned the gender to each candidate based on the lists of popular male and female names provided by the Social Security Administration. Since the average age of city council members is 46 years old according to the Bureau of Labor Statistics, I compiled a list of popular male and female given names for babies born in the 1960s, 1970s, and 1980s. Then, I automated the gender classification as follows: for example, as “Lisa” is identified as a popular female given name by the Social Security Administration, every candidate whose first name is “Lisa” was assigned “female” in our dataset. For a gender-neutral name that appeared on the lists for both male and female given names, such as “Alex” and “Casey,” I searched online for more information about the official’s gender using the keywords “[first name] [last name] [office type (either “city council” or “school board”)] [name of the city or the school district]”. My search returned a picture that clearly identified the official’s gender and/or an article that refers to the official with gendered pronouns.

To identify the ethnicity of each elected official, I used the 2010 Census data and the 23AndMe Surname Discovery Tool. The 2010 Census lists surnames occurring at least 100 times, and it includes self-reported ethnicity data for individuals with a given surname. Similarly, the 23AndMe Surname Discovery Tool gives the percentage of individuals with the given surname who identify as each of four different ethnicity groups: Hispanic, White, Asian/Pacific Islander, and Black, based on the 2010 US Census data. For surnames that did not appear in either the 2010 Census data or the 23AndMe Surname Discovery Tool, I used Python’s Ethnicolr library, which predicts ethnicity from either both first and last name or just the last name, based on US census data (2000 and 2010), Florida voter registration data, and Wikipedia data.
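
The gender classification rule described above can be sketched roughly as follows. This is not the author's code: the two name sets are tiny hypothetical stand-ins for the Social Security Administration lists, the function names are invented, and the "unlisted" outcome for names on neither list is an assumption, since the description does not say how such names were handled.

```python
# Hypothetical stand-ins for the SSA popular-name lists (1960s to 1980s).
POPULAR_FEMALE = {"lisa", "jennifer", "alex", "casey"}
POPULAR_MALE = {"michael", "david", "alex", "casey"}

def classify_first_name(first_name):
    """Apply the list-based rule to one first name."""
    key = first_name.strip().lower()
    in_female = key in POPULAR_FEMALE
    in_male = key in POPULAR_MALE
    if in_female and in_male:
        # Gender-neutral name: resolved by a manual online search.
        return "manual_review"
    if in_female:
        return "female"
    if in_male:
        return "male"
    # Assumption: the description does not cover names on neither list.
    return "unlisted"

def search_query(first, last, office, place):
    """Build the keyword string used for the manual search."""
    return f"{first} {last} {office} {place}"

for name in ["Lisa", "Alex", "Michael", "Zorp"]:
    print(name, classify_first_name(name))
print(search_query("Alex", "Smith", "city council", "Sacramento"))
```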
Data from: Signaling Race, Ethnicity, and Gender with Names: Challenges and...
dataverse.harvard.edu
search.dataone.org
Updated Mar 13, 2023
Cite
Elizabeth Elder (2023). Signaling Race, Ethnicity, and Gender with Names: Challenges and Recommendations. [Dataset]. http://doi.org/10.7910/DVN/47CZDX
Unique identifier
https://doi.org/10.7910/DVN/47CZDX
Dataset updated
Mar 13, 2023
Dataset provided by
Harvard Dataverse
Authors
Elizabeth Elder
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Data on perceived characteristics of first and last names. Forthcoming at the Journal of Politics; this Dataverse will be deleted when the official JOP replication archive is made available.
Data from: Gender Detection
kaggle.com
Updated Sep 19, 2021
Cite
Cyber Cop (2021). Gender Detection [Dataset]. https://www.kaggle.com/subhajournal/gender-detection
Dataset updated
Sep 19, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Cyber Cop
License
http://www.gnu.org/licenses/agpl-3.0.html
Description
Dataset

This dataset was created by Cyber Cop

Released under GNU Affero General Public License 3.0

Name linguistic data | gimi9.com
gimi9.com
Updated Jun 21, 2024
Cite
(2024). Name linguistic data | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-data-gov-lt-datasets-2664-/
Dataset updated
Jun 21, 2024
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The State List of Citizens' Names of the Republic of Lithuania of the Lithuanian Language Commission (VLKK), the source of this dataset, is based on the names of persons in the Population Register who held citizenship in 2006 and continues to be extended with the names of newborns. The collection has been compiled since 2010 and is updated every 3 months. Data include full name, number of names (comity), gender of name, normality, whether it is the name of a saint, date of name input, latest renewal and search dates, number of searches, and groups of origin, subgroups, and subdivisions of up to 5 names each.
Genni + Ethnea for the Author-ity 2009 dataset
databank.illinois.edu
search.datacite.org
Updated Apr 18, 2024
Cite
Vetle Torvik (2024). Genni + Ethnea for the Author-ity 2009 dataset [Dataset]. http://doi.org/10.13012/B2IDB-9087546_V1
Unique identifier
https://doi.org/10.13012/B2IDB-9087546_V1
Dataset updated
Apr 18, 2024
Authors
Vetle Torvik
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Dataset funded by
U.S. National Institutes of Health (NIH)
U.S. National Science Foundation (NSF)
Description
Prepared by Vetle Torvik 2018-04-15

The dataset comes as a single tab-delimited ASCII encoded file, and should be about 717MB uncompressed.

• How was the dataset created?
First and last names of authors in the Author-ity 2009 dataset were processed through several tools to predict ethnicities and gender, including Ethnea+Genni as described in:
Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927
Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720
EthnicSeer: http://singularity.ist.psu.edu/ethnicity
Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada
SexMachine 0.1.1: https://pypi.org/project/SexMachine
First names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.

• The code and back-end data is periodically updated and made available for query at Torvik Research Group

• What is the format of the dataset?
The dataset contains 9,300,182 rows and 10 columns
1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)
2. name: full name (used as input to EthnicSeer)
3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX
4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction
5. lastname: used as input for Ethnea+Genni
6. firstname: used as input for Ethnea+Genni
7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictions), or TOOSHORT (if both first and last name are too short)
8. Genni: predicted gender; 'F', 'M', or '-'
9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male
10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'
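
For readers who want to work with the ten-column layout, the following sketch reads tab-delimited rows and tallies how often the Genni and SSNgender predictions agree. It assumes the file has no header row, so the column names are supplied explicitly in the order listed in the description; the two sample rows and the function name are hypothetical.

```python
import csv
import io
from collections import Counter

# Column order as listed in the dataset description.
COLUMNS = ["auid", "name", "EthnicSeer", "prop", "lastname",
           "firstname", "Ethnea", "Genni", "SexMac", "SSNgender"]

# Two hypothetical rows for illustration only.
SAMPLE = (
    "1234_1\tJane Doe\tENG\t0.91\tDoe\tJane\tENGLISH\tF\tfemale\tF\n"
    "5678_2\tWei Zhang\tCHI\t0.88\tZhang\tWei\tCHINESE\t-\tandy\tM\n"
)

def gender_agreement(handle):
    """Count rows where Genni and SSNgender agree, disagree, or are undetermined."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    tally = Counter()
    for row in reader:
        genni, ssn = row["Genni"], row["SSNgender"]
        if "-" in (genni, ssn):
            tally["undetermined"] += 1
        elif genni == ssn:
            tally["agree"] += 1
        else:
            tally["disagree"] += 1
    return tally

tally = gender_agreement(io.StringIO(SAMPLE))
print(dict(tally))
```

If the real file does start with a header row, that row would need to be skipped before tallying. Reading row by row rather than loading the whole file keeps memory use low for a file of this size.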
List of Common First Names 2017
data.europa.eu
ckan.mobidatalab.eu
csv, pdf
Cite
Landesamt für Bürger- und Ordnungsangelegenheiten, List of Common First Names 2017 [Dataset]. https://data.europa.eu/data/datasets/ff105c67-6fb2-46da-9eac-15c730be8921
Available download formats: pdf, csv
Dataset authored and provided by
Landesamt für Bürger- und Ordnungsangelegenheiten
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The list of the most frequently given first names, separated by gender and broken down by district. In contrast to previous years, the position of a name is also indicated where a person has several first names. The position does not allow any conclusions about which name is the call name (the name in everyday use).

All available years of first name data are also available at https://github.com/berlinonline/haeufige-vornamen-berlin.
Data from: Double-blind review favours increased representation of female...
dataone.org
data.nceas.ucsb.edu
Updated Jan 6, 2015
Cite
Amber Budden (2015). Double-blind review favours increased representation of female authors [Dataset]. http://doi.org/10.5063/AA/xhan.4.1
Unique identifier
https://doi.org/10.5063/AA/xhan.4.1
Dataset updated
Jan 6, 2015
Dataset provided by
Knowledge Network for Biocomplexity
Authors
Amber Budden
Time period covered
Jan 1, 1997 - Jan 1, 2005
Variables measured
BP, PY, VL, Title, Authors, Journal, FA Gender, Pre/Post 2001, Review policy
Description
Double-blind peer review, in which neither author nor reviewer is identified, is rarely practised in ecology or evolution journals. Most journals in the field of ecology practice single-blind reviews in which the reviewer but not the author identity is concealed. In 2001, however, double-blind review was introduced by the journal Behavioral Ecology. A database of all papers published in BE between 1997 and 2005 (n=867) was generated (the year 2001 was omitted to accommodate the change in editorial policy). For each paper, gender was assigned to the first author using first names. Gender was classified as "unknown" if the author provided only initials, if the name was gender neutral or if the name could not be assigned to either gender. The same data were gathered from an out-group set of primary research journals listed by ISI as being in the category of "Ecology" or "Evolutionary Biology" with a 2004 impact factor of 2.0-2.5 (similar to that of BE). This provided an additional five journals: Behavioral Ecology and Sociobiology (BES; n=1040), Animal Behavior (AB; n=2178), Journal of Biogeography (JB; n=1040), Biological Conservation (BC; n=1719), and Landscape Ecology (LE; n=419). Missing data from complete issues omitted from the table of contents were inserted using ISI (JB and LE; four issues). This study showed that following the policy change to double-blind peer reviews, there was a significant increase in female first-authored papers.

WGND 2.0

30 scholarly articles cite this dataset (View in Google Scholar)
