3 datasets found

Genni + Ethnea for the Author-ity 2009 dataset
databank.illinois.edu
search.datacite.org
Updated Apr 18, 2024
Cite
Vetle Torvik (2024). Genni + Ethnea for the Author-ity 2009 dataset [Dataset]. http://doi.org/10.13012/B2IDB-9087546_V1
Unique identifier
https://doi.org/10.13012/B2IDB-9087546_V1
Dataset updated
Apr 18, 2024
Authors
Vetle Torvik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset funded by
U.S. National Institutes of Health (NIH)
U.S. National Science Foundation (NSF)
Description
Prepared by Vetle Torvik 2018-04-15
The dataset comes as a single tab-delimited, ASCII-encoded file, and should be about 717MB uncompressed.
• How was the dataset created?
First and last names of authors in the Author-ity 2009 dataset were processed through several tools to predict ethnicities and gender, including Ethnea+Genni as described in:
Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science March 22-23, 2016 - Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927
Smith, B., Singh, M., & Torvik, V. (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings Of The ACM/IEEE Joint Conference On Digital Libraries, (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720
EthnicSeer: http://singularity.ist.psu.edu/ethnicity
Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (pp. 1141-1147). AAAI-12. Toronto, ON, Canada
SexMachine 0.1.1: https://pypi.org/project/SexMachine
First names, for some Author-ity records lacking them, were harvested from outside bibliographic databases.
• The code and back-end data are periodically updated and made available for query at Torvik Research Group
• What is the format of the dataset?
The dataset contains 9,300,182 rows and 10 columns:
1. auid: unique ID for Authors in Author-ity 2009 (PMID_authorposition)
2. name: full name used as input to EthnicSeer
3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, XXX
4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction
5. lastname: used as input for Ethnea+Genni
6. firstname: used as input for Ethnea+Genni
7. Ethnea: predicted ethnicity; either one of 26 (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE) or two ethnicities (e.g., SLAV-ENGLISH), or UNKNOWN (if no one or two dominant predictions), or TOOSHORT (if both first and last name are too short)
8. Genni: predicted gender; 'F', 'M', or '-'
9. SexMac: predicted gender based on third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, male
10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'
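
The column layout above can be read with Python's standard csv module. The following is a minimal sketch, assuming the file has no header row (the description does not say either way) and that every auid and prop value parses as described; the sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io

# Column order as listed in the description. A header row is NOT assumed.
COLUMNS = ["auid", "name", "EthnicSeer", "prop", "lastname",
           "firstname", "Ethnea", "Genni", "SexMac", "SSNgender"]

def read_rows(lines):
    """Yield one dict per tab-delimited row, with light type conversion."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # skip malformed rows rather than guess at their layout
        row = dict(zip(COLUMNS, fields))
        # auid is described as PMID_authorposition.
        pmid, _, position = row["auid"].partition("_")
        row["pmid"], row["position"] = pmid, int(position)
        row["prop"] = float(row["prop"])
        # Ethnea holds one label, or two joined by a hyphen (e.g. SLAV-ENGLISH).
        row["ethnea_labels"] = row["Ethnea"].split("-")
        yield row

# Invented sample rows, for illustration only.
sample = io.StringIO(
    "12345_1\tJane Doe\tENG\t0.91\tDoe\tJane\tENGLISH\tF\tfemale\tF\n"
    "12345_2\tIvan Novak\tRUS\t0.55\tNovak\tIvan\tSLAV-ENGLISH\tM\tmale\tM\n"
)
rows = list(read_rows(sample))
```

For the real file, the same function could be fed an open file handle in place of the in-memory sample; streaming row by row avoids loading all 9,300,182 rows at once.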
Data from: Working groups, gender and publication impact of Canada’s ecology...
open.library.ubc.ca
Updated Mar 6, 2025
Cite
Wei, Qian; Srivastava, Diane; Lachapelle, Francois; Fuller, Sylvia (2025). Data from: Working groups, gender and publication impact of Canada’s ecology and evolution faculty [Dataset]. http://doi.org/10.14288/1.0448175
Unique identifier
https://doi.org/10.14288/1.0448175
Dataset updated
Mar 6, 2025
Authors
Wei, Qian; Srivastava, Diane; Lachapelle, Francois; Fuller, Sylvia
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
Mar 4, 2025
Area covered
Canada
Description
Methods
We compiled information on 1,244 faculty members at Canadian universities who were funded by an NSERC Discovery grant (Evolution and Ecology subcommittee) between 1991 and 2019. This information included assumed binary gender from first names and institutional website use of pronouns and photographs (coded men, women); we acknowledge that we may have mis-assigned gender or failed to notice non-binary, transitional or fluid gender identities. We also collected information on the researcher’s year of PhD and all institutions they were affiliated with during their research career. This information was obtained from public curriculum vitae, institutional websites, personally-maintained researcher websites, academic networking platforms (LinkedIn, ResearchGate), Google Scholar, and other public sources such as obituaries. For each researcher, we reconstructed their H-index through time using (1) a compiled list of their peer-reviewed publications and (2) the citations for each publication, for each calendar year from the date of publication until 2019. We compiled their publications using a recursive procedure, which started by first downloading all publications for individuals with the researcher’s first initial and last name from Web of Science Core Collection (hereafter, WOS) starting from 5 years prior to their PhD until 2019, and then filtering this list by cross-referencing with known variants in authorship names for the researcher (from online curriculum vitae or Google Scholar profile) as well as their institutional affiliations, fuzzy matching of publication titles from their curriculum vitae or Google Scholar profile where possible, and recursive identification of previously unidentified affiliations to fine-tune the cross-referencing procedure. Once we had cleaned the publication record, we then calculated cumulative citations over years for each publication from WOS yearly citation counts as a precursor to calculating the H-index.
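
The last step, going from cumulative citations per publication to an H-index for each year, can be sketched as follows. This uses the standard H-index definition and is not the authors' code; the function names and the citation counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def h_index_by_year(cumulative):
    """Map year -> H-index, given year -> list of cumulative citation counts."""
    return {year: h_index(counts) for year, counts in sorted(cumulative.items())}

# Invented cumulative citation counts for three publications of one researcher.
cumulative = {2017: [1, 0, 0], 2018: [4, 2, 0], 2019: [9, 5, 3]}
trajectory = h_index_by_year(cumulative)  # {2017: 1, 2018: 2, 2019: 3}
```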

We identified a potential pool of publications from working groups by (1) matching WOS titles with known working group publications funded by the 15 synthesis centers that comprise the International Synthesis Consortium, and (2) searching the funding and acknowledgment sections of publications for synthesis centre names or acronyms, or keywords commonly used to describe working groups (“working group”, “synthesis group”, “synthesis working group”, “synthesis committee”, “synthesis workshop”, “catalysis group”). All publications from steps 1 and 2 were then manually coded as primary research vs. synthesis research, and as working group method vs. non-working group method. We further categorized synthesis research publications into the following types: statistical synthesis (statistical analysis of previously published or archived data collected by multiple different researchers and/or studies), conceptual synthesis (qualitative review of the literature or proposal of new frameworks for scientific concepts or investigation), or mathematical synthesis (theoretical mathematical models or specific application of general models for the purpose of prediction). We scored non-working group publications using similar criteria. However, given the large number of publications involved, we changed methods to allow for programmatic approaches based on keywords indicative of the three types of synthesis science. This data is presented in aggregated and anonymized form as needed to prevent the identification of individuals.

We conducted an online survey of current ecology and evolution faculty in Canada from July to September 2019, recruited by email and supplemented by in-person recruitment at the Canadian Society of Ecology and Evolution annual conference (Fredericton NB Canada, August 18-21 2019). The 169 valid responses represent an effective questionnaire response rate of 14.7%. The questionnaire asked for information designed to confirm or complete the researcher database (e.g. academic history, gender) as well as information about why researchers participated or not in working groups, and the perceived costs and benefits of participation. This data is presented in condensed and anonymized form only to maintain the privacy of personal information.

We used survival analysis to test if gender or pace of career progression (H-index adjusted for time since PhD) predicts the hazard rate of participation in working groups. We included an interaction between gender and H-index to assess whether potential selection effects tied to research record captured by the H-index are the same for women and men. We estimated hazard ratios for attending working groups using Cox proportional hazard models.
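
As a sketch only, a Cox proportional hazards model with the interaction described here has the following general form. The symbols are notation introduced for this illustration and do not come from the source: $g$ is a gender indicator, $z$ is the H-index adjusted for time since PhD, and $h_0(t)$ is the unspecified baseline hazard. Any further covariates the authors may have used are not shown.

```latex
h(t \mid g, z) = h_0(t)\,\exp\!\left(\beta_g\, g + \beta_z\, z + \beta_{gz}\, g z\right)
```

Under this form, $\exp(\beta)$ for a coefficient is the corresponding hazard ratio, and a nonzero $\beta_{gz}$ would indicate that the association between H-index and working group participation differs by gender.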

For the 183 researchers who participated in working groups, we used a fixed effects model with a linear spline to investigate the effects of working group participation and gender on researchers’ trajectory of H-indices over time. This model compares the trajectory of researchers’ H-indices in years before (0-5 years before) and after (1-5 years after, and >6 years after) participating in working groups, and then averages those differences across researchers. To account for autocorrelation within individuals and heteroscedasticity across individuals, we clustered on individuals. We used a 0.67 power transformation on the “time” variable to linearize the H-index ~ time relationship. We coded the spline specification in marginal form, which makes interpretation simple: coefficients of the second and third intervals capture changes in H-index growth rates from their prior intervals.
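
The two ingredients named in this paragraph, the 0.67 power transformation and a linear spline in marginal form, can be sketched separately. How the authors defined the time axis and combined the two steps is not stated, so the knot positions below are hypothetical and the function names are introduced only for this illustration.

```python
def power_time(years, exponent=0.67):
    """Power-transform a non-negative time value (exponent taken from the text)."""
    return years ** exponent

def marginal_spline(t, knots):
    """Marginal-form linear spline basis for t with the given knots.

    Returns [t, max(t - k1, 0), max(t - k2, 0), ...]. The coefficient on the
    first term is the slope in the first interval; each later coefficient is
    the CHANGE in slope relative to the preceding interval.
    """
    return [t] + [max(t - k, 0.0) for k in knots]

# Hypothetical knots: time counted from the start of the first interval,
# with breaks at 5 and 10 years. These values are assumptions.
knots = [5, 10]
early = marginal_spline(3, knots)    # [3, 0.0, 0.0]
late = marginal_spline(12, knots)    # [12, 7.0, 2.0]
```

With this coding, a regression on the three basis columns yields one baseline slope plus two slope changes, which matches the interpretation given for the second and third intervals.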

The effects of research type (synthesis vs primary) and method (working group vs traditional) on publication citation rates were evaluated with a zero-inflated generalized linear model based on a negative binomial error distribution with a log link (R package glmmTMB).

The survey results were evaluated with simple Chi-square tests of association.
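
For reference, the Pearson chi-square statistic for a contingency table can be computed as in the sketch below. It applies no continuity correction and stops short of a p-value; the counts are invented and do not come from the survey.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Invented 2x2 counts (e.g. two groups by two response categories).
table = [[30, 10], [20, 40]]
stat, dof = chi_square(table)
```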
Comparison of gender proportions by using SSA data (with a 95% cut-off)...
plos.figshare.com
figshare.com
xls
Updated May 30, 2023
Cite
Shubhanshu Mishra; Brent D. Fegley; Jana Diesner; Vetle I. Torvik (2023). Comparison of gender proportions by using SSA data (with a 95% cut-off) versus Genni 2.0, aggregated by ethnicity. [Dataset]. http://doi.org/10.1371/journal.pone.0195773.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0195773.t001
Dataset updated
May 30, 2023
Dataset provided by
PLOS ONE
Authors
Shubhanshu Mishra; Brent D. Fegley; Jana Diesner; Vetle I. Torvik
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
U denotes the percentage of authorships labelled Unknown, %F denotes the percentage of female authorships among male and female authorships, and G = SSA denotes the percentage of male and female SSA predictions that match the Genni predictions.
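
The three quantities defined in this caption can be sketched as below, under the assumption that predictions are coded 'F', 'M', or '-' (unknown) for both methods; that coding is borrowed from the Genni and SSNgender columns described for the Author-ity dataset, and the labels here are invented. The sketch does not guard against empty inputs.

```python
def gender_summary(genni, ssa):
    """Return (U, %F, agreement) as percentages for paired prediction lists.

    U         : share of Genni predictions labelled unknown ('-')
    %F        : share of 'F' among Genni predictions that are 'F' or 'M'
    agreement : share of SSA 'F'/'M' predictions that equal the Genni one
    """
    pct_unknown = 100.0 * sum(1 for g in genni if g == "-") / len(genni)
    known = [g for g in genni if g in ("F", "M")]
    pct_female = 100.0 * known.count("F") / len(known)
    ssa_known = [(s, g) for s, g in zip(ssa, genni) if s in ("F", "M")]
    agreement = 100.0 * sum(1 for s, g in ssa_known if s == g) / len(ssa_known)
    return pct_unknown, pct_female, agreement

# Invented paired predictions for five authorships.
genni = ["F", "M", "-", "F", "M"]
ssa = ["F", "M", "M", "-", "F"]
summary = gender_summary(genni, ssa)  # (20.0, 50.0, 50.0)
```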
Genni + Ethnea for the Author-ity 2009 dataset

13 scholarly articles cite this dataset (View in Google Scholar)
