100+ datasets found
  1. WGND 2.0

    • search.dataone.org
    Updated Nov 14, 2023
    + more versions
    Cite
    Raffo, Julio (2023). WGND 2.0 [Dataset]. http://doi.org/10.7910/DVN/MSEGSJ
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Raffo, Julio
    Area covered
    Wiegand Hall
    Description

    This paper revisits the first World Gender Name Dictionary (WGND 1.0), which makes it possible to disambiguate gender in data naming physical persons (Lax Martínez et al., 2016). We discuss its advantages and limitations and propose an expansion based on updated data and additional sources. By including more than 26 million records linking given names and 195 different countries and territories, the resulting WGND 2.0 substantially increases the international coverage of its predecessor. As a result, it is particularly suited to intellectual property unit-record data naming inventors, designers, individual applicants and other creators disclosed in these data.

  2. Gender by Name (Time-series)

    • kaggle.com
    Updated Dec 5, 2022
    Cite
    The Devastator (2022). Gender by Name (Time-series) [Dataset]. https://www.kaggle.com/datasets/thedevastator/automated-gender-identification-using-name-proba/data
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Automated Gender Identification Using Name Probabilities

    2019 US Social Security Administration Data

    By Derek Howard [source]

    About this dataset

    This dataset provides a tool for assigning gender from names alone. It contains the probability that a given name belongs to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders in datasets that do not natively include this information. All probability values were derived from records with 5 or more people associated with each name, so less common names are covered as well.

    How to use the dataset

    This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.

    To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).

    In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
    Good luck!
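The lookup described above can be sketched as follows. This is a minimal illustration, not the dataset author's code: it assumes the three-column layout (name, gender, probability) given in this entry's Columns listing, assumes one row per name, and uses invented sample values and a hypothetical 0.9 cut-off.

```python
import csv
import io

# Hypothetical sample rows in the name/gender/probability layout (values invented).
SAMPLE = """name,gender,probability
James,M,0.99
Mary,F,0.99
Casey,M,0.58
"""

def assign_gender(rows, threshold=0.9):
    """Map each lower-cased name to its listed gender, or 'unknown' below the threshold."""
    result = {}
    for row in rows:
        name = row["name"].strip().lower()
        probability = float(row["probability"])
        result[name] = row["gender"] if probability >= threshold else "unknown"
    return result

lookup = assign_gender(csv.DictReader(io.StringIO(SAMPLE)))
```

Raising or lowering `threshold` trades coverage against accuracy; names that fall below it are better left unlabelled than guessed.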

    Research Ideas

    • Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.

    • Generate gender neutral names - use this data to generate random names with no gender bias.

    • Automate record lookup - quickly and accurately assign genders based on the probability associated with their name

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: name_gender.csv

    | Column name | Description |
    |:------------|:------------|
    | name | The name of the person. (String) |
    | gender | The gender of the person. (String) |
    | probability | The probability of the gender being assigned to the person. (Float) |

    Acknowledgements

    If you use this dataset in your research, please credit the original author, Derek Howard.

  3. The annual list of first names of newborns — city of Nancy

    • gimi9.com
    • data.europa.eu
    Updated Dec 16, 2023
    Cite
    (2023). The annual list of first names of newborns — city of Nancy [Dataset]. https://gimi9.com/dataset/eu_5d2c2919634f41429aae86ce/
    Dataset updated
    Dec 16, 2023
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    The annual list of first names of newborns is a simple and popular dataset. These data, drawn from the civil status register, contain the following essential fields: sex of the newborn, first name of the newborn, number of occurrences of the first name for the corresponding year, and year of survey.

    The dataset lists the first names of children born in Nancy since 2016, in CSV format, with the number of occurrences of each given name, classified by year and sex. First names with fewer than five occurrences are not published, in order to protect personal data. The standardisation of this dataset follows the recommendations of Opendata France arising from its work on the Socle Commun des Données Locales.

    Definition of headers

    • COLL_NOM: name of the municipality
    • COLL_INSEE: Insee code of the municipality where the first names are registered in the civil status of the place of birth. Note that the place of birth may be different from the place of residence of the parents.
    • CHILD_SEX: gender corresponding to the first name: M or F, for boys or girls respectively
    • CHILD_PRENOM: first name of the newborn(s) recorded as first name in the civil status documents of the corresponding year
    • NUMBER_OCCURENCES: number of occurrences of the first name
    • YEAR: year of birth

    Total births reported to the City of Nancy

    | Year | Total births | Girls | Boys |
    |:-----|:-------------|:------|:-----|
    | 2018 | 5135 | 2692 | 2443 |
    | 2017 | 5483 | 2704 | 2779 |
    | 2016 | 5544 | 2692 | 2852 |
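As a rough sketch of how these figures could be checked and how the CSV might be aggregated: the totals below are the ones quoted in the description, while the sample rows are invented and the header names are assumed to match the translated names given above (the published file may use different ones).

```python
from collections import Counter

# Totals quoted in the description: year -> (girls, boys, total births)
REPORTED = {
    2018: (2692, 2443, 5135),
    2017: (2704, 2779, 5483),
    2016: (2692, 2852, 5544),
}

def totals_consistent(reported):
    """Check that girls + boys equals the stated total for each year."""
    return {year: girls + boys == total
            for year, (girls, boys, total) in reported.items()}

def published_counts(rows):
    """Sum NUMBER_OCCURENCES per (year, sex) over rows of the CSV."""
    counts = Counter()
    for row in rows:
        key = (int(row["YEAR"]), row["CHILD_SEX"])
        counts[key] += int(row["NUMBER_OCCURENCES"])
    return counts

# Hypothetical rows, for illustration only (not taken from the real file).
rows = [
    {"YEAR": "2018", "CHILD_SEX": "F", "CHILD_PRENOM": "EMMA", "NUMBER_OCCURENCES": "12"},
    {"YEAR": "2018", "CHILD_SEX": "F", "CHILD_PRENOM": "JADE", "NUMBER_OCCURENCES": "9"},
    {"YEAR": "2018", "CHILD_SEX": "M", "CHILD_PRENOM": "LOUIS", "NUMBER_OCCURENCES": "11"},
]
counts = published_counts(rows)
```

Because first names with fewer than five occurrences are withheld, sums computed from the published file should be expected to fall short of the reported birth totals.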

  4. Experts and scholars suggest a database of recommended names for the gender...

    • data.gov.tw
    json
    Cite
    Public Construction Commission, EY, Experts and scholars suggest a database of recommended names for the gender ratio review committee [Dataset]. https://data.gov.tw/en/datasets/26461
    Available download formats: json
    Dataset authored and provided by
    Public Construction Commission, EY
    License

    https://data.gov.tw/license

    Description

    Gender ratio of the review committee for the annual database of recommended experts and scholars.

  5. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    Updated Jul 4, 2025
    + more versions
    Cite
    Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.

  6. USA Name Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

    Content

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.
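Count data of this kind can be turned into a per-name gender share, as in the hedged sketch below. It assumes records shaped as (name, year of birth, sex, number) tuples with sex coded "F" or "M"; the sample values are invented and are not taken from the Social Security Administration files.

```python
from collections import defaultdict

def female_share(records):
    """Return, for each name, the fraction of counted births recorded as female."""
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for name, _year, sex, number in records:
        totals[name][sex] += number
    return {
        name: counts["F"] / (counts["F"] + counts["M"])
        for name, counts in totals.items()
    }

# Hypothetical records: (name, year of birth, sex, number)
records = [
    ("Mary", 1950, "F", 90),
    ("Mary", 1950, "M", 10),
    ("John", 1950, "M", 50),
    ("Casey", 1990, "F", 40),
    ("Casey", 1990, "M", 60),
]
shares = female_share(records)
```

Since names with fewer than 5 occurrences are withheld, rare names will simply be absent from such a table rather than appearing with a share of zero.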

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

    https://cloud.google.com/bigquery/public-data/usa-names

    Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @dcp from Unsplash.

    Inspiration

    What are the most common names?

    What are the most common female names?

    Are there more female or male names?

    Female names by a wide margin?

  7. Demographics Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). Demographics Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/demographics-data-package/
    Available download formats: csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    This data package consists of 26 datasets all containing statistical data relating to the population and particular groups within it belonging to different countries, mostly the United States.

  8. Database of Russian names, surnames and midnames for gender identification

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Ivan Begtin (2020). Database of Russian names, surnames and midnames for gender identification [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2747010
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Ivan Begtin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database of names, surnames and midnames from across the Russian Federation, used as a source for training algorithms that identify gender from a full name.

    The dataset is prepared for a MongoDB database. It includes a MongoDB dump and a dump of the tables as JSON Lines files.

    Used in gender identification and fullname parsing software https://github.com/datacoon/russiannames

    Available under Creative Commons CC-BY SA by default.

  9. Frequency and ranking of baby names by year and gender

    • data.urbandatacentre.ca
    • open.alberta.ca
    • +1 more
    Updated Jun 24, 2025
    Cite
    (2025). Frequency and ranking of baby names by year and gender [Dataset]. https://data.urbandatacentre.ca/dataset/ab-frequency-and-ranking-of-baby-names-by-year-and-gender
    Dataset updated
    Jun 24, 2025
    Description

    The frequency and ranking of first names given to babies born in the province of Alberta, by year of birth and gender of the baby.

  10. ArabLEX: Database of Arab Names (DAN)

    • catalogue.elra.info
    Updated Oct 7, 2019
    + more versions
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). ArabLEX: Database of Arab Names (DAN) [Dataset]. https://catalogue.elra.info/en-us/repository/browse/ELRA-M0107/
    Dataset updated
    Oct 7, 2019
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalogue.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Description

    This database is part of the ArabLEX set of data, which consists of the Database of Arabic General Vocabulary (DAG), Database of Arabic Place Names (DAP), Database of Foreign Names in Arabic (DAF) and Database of Arab Names (DAN), available from ELRA under references ELRA-L0131, ELRA-M0105, ELRA-M0106 and ELRA-M0107, respectively.

    With over 218 million forms based on 100,000 lemmas, this full-form database covers Arab personal names (both given names and surnames) in both Arabic and English and contains a rich set of romanized name variants for each name, with a variety of supplementary information such as gender, name type and frequency statistics. This comprehensive lexicon (over 6.4 million variants) contains precise phonemic transcriptions and vocalized Arabic for all inflected and cliticized forms of each name.

    This database is provided with three options: 1) proclitics, 2) phonetic information (CARS) and 3) orthographic variants. Subsets excluding some of the three proposed options may be provided upon demand. CARS is an accurate phonemic transcription. Optionally, phonetic transcriptions, IPA and/or SAMPA, can be provided, fine-tuned to a customer's specifications.

    Quantity and size: 218,215,875 lines / 32,659 MB (31.9 GB)
    File format: flat TSV text files
    Samples and a specifications document are available upon request.

  11. Baby Names DataSet

    • kaggle.com
    Updated Mar 21, 2019
    Cite
    Samrat Rai (2019). Baby Names DataSet [Dataset]. https://www.kaggle.com/samrat77/baby-names-dataset
    Dataset updated
    Mar 21, 2019
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Samrat Rai
    License

    https://creativecommons.org/publicdomain/zero/1.0/

  12. Database of Chinese Names

    • catalog.elra.info
    • live.european-language-grid.eu
    Updated Oct 7, 2019
    Cite
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency) (2019). Database of Chinese Names [Dataset]. https://catalog.elra.info/en-us/repository/browse/ELRA-L0129/
    Dataset updated
    Oct 7, 2019
    Dataset provided by
    ELRA (European Language Resources Association)
    ELRA (European Language Resources Association) and its operational body ELDA (Evaluations and Language resources Distribution Agency)
    License

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_END_USER.pdf

    https://catalog.elra.info/static/from_media/metashare/licences/ELRA_VAR.pdf

    Area covered
    China
    Description

    Chinese name components, accompanied by accurate pinyin readings, gender codes, and flags denoting whether name is a given name, surname, or both.

  13. Popular Baby Names

    • data.cityofnewyork.us
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +4 more
    application/rdfxml +5
    Updated Jun 8, 2025
    Cite
    Department of Health and Mental Hygiene (DOHMH) (2025). Popular Baby Names [Dataset]. https://data.cityofnewyork.us/Health/Popular-Baby-Names/25th-nujf
    Available download formats: csv, tsv, application/rdfxml, application/rssxml, xml, json
    Dataset updated
    Jun 8, 2025
    Dataset authored and provided by
    Department of Health and Mental Hygiene (DOHMH)
    Description

    Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.

  14. Gender and Ethnicity Predictions for California City Council Members and...

    • openicpsr.org
    • dataverse.harvard.edu
    delimited
    Updated Oct 24, 2024
    Cite
    Rohan M. Dalal (2024). Gender and Ethnicity Predictions for California City Council Members and School Board Members, 2010-2023 [Dataset]. http://doi.org/10.3886/E209861V1
    Available download formats: delimited
    Dataset updated
    Oct 24, 2024
    Dataset provided by
    Crystal Springs Uplands School
    Authors
    Rohan M. Dalal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2010 - 2023
    Area covered
    California City, California
    Description

    To conduct this study, I sourced demographic data from 2010 to 2023 from the California Elections Data Archive (CEDA) for city council members and school board members. The CEDA data provide a full list of candidate names and the number of votes a given candidate received for every city council and school board election.

    I assigned a gender to each candidate based on the lists of popular male and female names provided by the Social Security Administration. Since the average age of city council members is 46 years old according to the Bureau of Labor Statistics, I compiled a list of popular male and female given names for babies born in the 1960s, 1970s, and 1980s. I then automated the gender classification as follows: because “Lisa” is identified as a popular female given name by the Social Security Administration, for example, every candidate whose first name is “Lisa” was assigned “female” in the dataset. For gender-neutral names that appeared on the lists for both male and female given names, such as “Alex” and “Casey,” I searched online for more information about the official’s gender using the keywords “[first name] [last name] [office type (either “city council” or “school board”)] [name of the city or the school district]”. The search returned a picture that clearly identified the official’s gender, an article that refers to the official with gendered pronouns, or both.

    To identify the ethnicity of each elected official, I used the 2010 Census data and the 23AndMe Surname Discovery Tool. The 2010 Census lists surnames occurring at least 100 times, and it includes self-reported ethnicity data for individuals with a given surname. Similarly, the 23AndMe Surname Discovery Tool gives the percentage of individuals with the given surname who identify as each of four ethnicity groups (Hispanic, White, Asian/Pacific Islander, and Black), based on the 2010 US Census data. For surnames that did not appear in either the 2010 Census data or the 23AndMe Surname Discovery Tool, I used Python’s Ethnicolr library, which predicts ethnicity from either the first and last name together or the last name alone, based on US Census data (2000 and 2010), Florida voter registration data, and Wikipedia data.
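The list-based gender assignment described above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the author's code: the name sets are tiny hypothetical stand-ins for the Social Security Administration lists (only "Lisa", "Alex" and "Casey" come from the description), and the "unknown" outcome for names on neither list is an added assumption.

```python
# Hypothetical stand-ins for the lists of popular given names.
FEMALE_NAMES = {"lisa", "alex", "casey"}
MALE_NAMES = {"michael", "alex", "casey"}

def classify_gender(first_name, female_names=FEMALE_NAMES, male_names=MALE_NAMES):
    """Assign a gender from name lists; names on both lists go to manual review."""
    key = first_name.strip().lower()
    on_female_list = key in female_names
    on_male_list = key in male_names
    if on_female_list and on_male_list:
        return "manual_review"  # gender-neutral name: look the official up online
    if on_female_list:
        return "female"
    if on_male_list:
        return "male"
    return "unknown"  # assumption: the description does not cover this case

labels = {name: classify_gender(name) for name in ["Lisa", "Michael", "Casey", "Zyx"]}
```

Routing ambiguous names to a manual step keeps the automated pass conservative, at the cost of extra lookup work for each flagged candidate.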

  15. Data from: Signaling Race, Ethnicity, and Gender with Names: Challenges and...

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Mar 13, 2023
    Cite
    Elizabeth Elder (2023). Signaling Race, Ethnicity, and Gender with Names: Challenges and Recommendations. [Dataset]. http://doi.org/10.7910/DVN/47CZDX
    Dataset updated
    Mar 13, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Elizabeth Elder
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Data on perceived characteristics of first and last names. Forthcoming at the Journal of Politics; this Dataverse will be deleted when the official JOP replication archive is made available.

  16. Data from: Gender Detection

    • kaggle.com
    Updated Sep 19, 2021
    Cite
    Cyber Cop (2021). Gender Detection [Dataset]. https://www.kaggle.com/subhajournal/gender-detection
    Dataset updated
    Sep 19, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Cyber Cop
    License

    http://www.gnu.org/licenses/agpl-3.0.html

    Description

    Dataset

    This dataset was created by Cyber Cop

    Released under GNU Affero General Public License 3.0

  17. Name linguistic data | gimi9.com

    • gimi9.com
    Updated Jun 21, 2024
    + more versions
    Cite
    (2024). Name linguistic data | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_https-data-gov-lt-datasets-2664-/
    Dataset updated
    Jun 21, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The State List of Citizens' NAMES of the Republic of Lithuania of the Lithuanian Language Commission (VLKK), as a source of the data set, is created according to the names of persons who held citizenship in 2006 of the Population Register and continues to be filled in with the names of newborns. The collection has been compiled since 2010 and is updated every 3 months. Data include full name, number of names (comity), gender of name, normality, whether it is the name of the saint, date of name input, latest renewal and search dates, number of searches, and groups of origin, subgroups, and subdivisions of up to 5 names each.

  18. Genni + Ethnea for the Author-ity 2009 dataset

    • databank.illinois.edu
    • search.datacite.org
    Updated Apr 18, 2024
    Cite
    Vetle Torvik (2024). Genni + Ethnea for the Author-ity 2009 dataset [Dataset]. http://doi.org/10.13012/B2IDB-9087546_V1
    Dataset updated
    Apr 18, 2024
    Authors
    Vetle Torvik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    U.S. National Institutes of Health (NIH)
    U.S. National Science Foundation (NSF)
    Description

    Prepared by Vetle Torvik, 2018-04-15.

    The dataset comes as a single tab-delimited, ASCII-encoded file and should be about 717 MB uncompressed.

    How was the dataset created?

    First and last names of authors in the Author-ity 2009 dataset were processed through several tools to predict ethnicity and gender:

    • Ethnea+Genni, as described in:
      Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927
      Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720
    • EthnicSeer: http://singularity.ist.psu.edu/ethnicity
      Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada
    • SexMachine 0.1.1: https://pypi.org/project/SexMachine

    First names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.

    The code and back-end data are periodically updated and made available for query at Torvik Research Group.

    What is the format of the dataset?

    The dataset contains 9,300,182 rows and 10 columns:

    1. auid: unique ID for authors in Author-ity 2009 (PMID_authorposition)
    2. name: full name (used as input to EthnicSeer)
    3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX
    4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction
    5. lastname: used as input for Ethnea+Genni
    6. firstname: used as input for Ethnea+Genni
    7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictions), or TOOSHORT (if both first and last name are too short)
    8. Genni: predicted gender; 'F', 'M', or '-'
    9. SexMac: predicted gender based on a third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male
    10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'
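A minimal sketch of reading this layout and comparing the three gender columns is given below. It is not part of the dataset's tooling: it assumes the file has no header row, the sample rows are invented, and the agreement rule (including treating mostly_female and mostly_male as F and M) is a hypothetical choice made for illustration.

```python
import csv
import io

# Column order as listed in the dataset description.
COLUMNS = ["auid", "name", "EthnicSeer", "prop", "lastname",
           "firstname", "Ethnea", "Genni", "SexMac", "SSNgender"]

# Assumption: collapse SexMac's five labels onto the 'F' / 'M' / '-' codes.
SEXMAC_TO_CODE = {"female": "F", "mostly_female": "F",
                  "male": "M", "mostly_male": "M", "andy": "-"}

def read_rows(handle):
    """Parse tab-delimited rows into dicts keyed by the documented column names."""
    return csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t",
                          quoting=csv.QUOTE_NONE)

def consensus_gender(row):
    """Return 'F' or 'M' only when every tool that made a call agrees, else '-'."""
    votes = [row["Genni"], SEXMAC_TO_CODE.get(row["SexMac"], "-"), row["SSNgender"]]
    votes = [v for v in votes if v in ("F", "M")]
    if votes and all(v == votes[0] for v in votes):
        return votes[0]
    return "-"

# Hypothetical rows, for illustration only.
SAMPLE = (
    "1000001_1\tJane Doe\tENG\t0.91\tDoe\tJane\tENGLISH\tF\tfemale\tF\n"
    "1000002_2\tAlex Roe\tENG\t0.80\tRoe\tAlex\tENGLISH\tM\tmostly_female\t-\n"
)
rows = list(read_rows(io.StringIO(SAMPLE)))
results = [consensus_gender(row) for row in rows]
```

Requiring agreement is deliberately strict; a majority vote would label more authors but also inherit more of each tool's errors.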

  19. List of Common First Names 2017

    • data.europa.eu
    • ckan.mobidatalab.eu
    csv, pdf
    Cite
    Landesamt für Bürger- und Ordnungsangelegenheiten, List of Common First Names 2017 [Dataset]. https://data.europa.eu/data/datasets/ff105c67-6fb2-46da-9eac-15c730be8921
    Available download formats: pdf, csv
    Dataset authored and provided by
    Landesamt für Bürger- und Ordnungsangelegenheiten
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The list of the most frequently given first names, separated by gender and broken down by district. In contrast to previous years, the position of each name is also given where a child has several first names. The position does not allow any conclusions about which name is used as the everyday name.

    All available years of first name data are also available at https://github.com/berlinonline/haeufige-vornamen-berlin.

  20. Data from: Double-blind review favours increased representation of female...

    • dataone.org
    • data.nceas.ucsb.edu
    • +2 more
    Updated Jan 6, 2015
    Cite
    Amber Budden (2015). Double-blind review favours increased representation of female authors [Dataset]. http://doi.org/10.5063/AA/xhan.4.1
    Dataset updated
    Jan 6, 2015
    Dataset provided by
    Knowledge Network for Biocomplexity
    Authors
    Amber Budden
    Time period covered
    Jan 1, 1997 - Jan 1, 2005
    Variables measured
    BP, PY, VL, Title, Authors, Journal, FA Gender, Pre/Post 2001, Review policy
    Description

    Double-blind peer review, in which neither author nor reviewer is identified, is rarely practised in ecology or evolution journals. Most journals in the field of ecology practise single-blind review, in which the reviewer's identity but not the author's is concealed. In 2001, however, double-blind review was introduced by the journal Behavioral Ecology. A database of all papers published in BE between 1997 and 2005 (n=867) was generated (the year 2001 was omitted to accommodate the change in editorial policy). For each paper, gender was assigned to the first author using first names. Gender was classified as "unknown" if the author provided only initials, if the name was gender neutral or if the name could not be assigned to either gender. The same data were gathered from an out-group set of primary research journals listed by ISI as being in the category of "Ecology" or "Evolutionary Biology" with a 2004 impact factor of 2.0-2.5 (similar to that of BE). This provided an additional five journals: Behavioral Ecology and Sociobiology (BES; n=1040), Animal Behavior (AB; n=2178), Journal of Biogeography (JB; n=1040), Biological Conservation (BC; n=1719), and Landscape Ecology (LE; n=419). Missing data from complete issues omitted from the table of contents were inserted using ISI (JB and LE; four issues). This study showed that following the policy change to double-blind peer review, there was a significant increase in female first-authored papers.

WGND 2.0 (Raffo, 2023): 30 scholarly articles cite this dataset.