Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prepared by Vetle Torvik, 2018-04-15. The dataset comes as a single tab-delimited, ASCII-encoded file and should be about 717 MB uncompressed.
• How was the dataset created?
The first and last names of authors in the Author-ity 2009 dataset were processed through several tools to predict ethnicity and gender, including Ethnea+Genni, as described in:
Torvik VI, Agarwal S (2016). Ethnea -- an instance-based ethnicity classifier based on geocoded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington, DC, USA. http://hdl.handle.net/2142/88927
Smith B, Singh M, Torvik V (2013). A search engine approach to estimating temporal changes in gender orientation of first names. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries), 199-208. doi:10.1145/2467696.2467720
EthnicSeer: http://singularity.ist.psu.edu/ethnicity
Treeratpituk P, Giles CL (2012). Name-Ethnicity Classification and Ethnicity-Sensitive Name Matching. Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12), Toronto, ON, Canada, pp. 1141-1147.
SexMachine 0.1.1: https://pypi.org/project/SexMachine
For some Author-ity records lacking first names, the first names were harvested from outside bibliographic databases.
• The code and back-end data are periodically updated and made available for query at the Torvik Research Group.
• What is the format of the dataset?
The dataset contains 9,300,182 rows and 10 columns:
1. auid: unique ID for authors in Author-ity 2009 (PMID_authorposition)
2. name: full name used as input to EthnicSeer
3. EthnicSeer: predicted ethnicity; ARA, CHI, ENG, FRN, GER, IND, ITA, JAP, KOR, RUS, SPA, VIE, or XXX
4. prop: decimal between 0 and 1 reflecting the confidence of the EthnicSeer prediction
5. lastname: used as input for Ethnea+Genni
6. firstname: used as input for Ethnea+Genni
7. Ethnea: predicted ethnicity; either one of 26 ethnicities (AFRICAN, ARAB, BALTIC, CARIBBEAN, CHINESE, DUTCH, ENGLISH, FRENCH, GERMAN, GREEK, HISPANIC, HUNGARIAN, INDIAN, INDONESIAN, ISRAELI, ITALIAN, JAPANESE, KOREAN, MONGOLIAN, NORDIC, POLYNESIAN, ROMANIAN, SLAV, THAI, TURKISH, VIETNAMESE), a pair of ethnicities (e.g., SLAV-ENGLISH), UNKNOWN (if there are no one or two dominant predictions), or TOOSHORT (if both the first and last names are too short)
8. Genni: predicted gender; 'F', 'M', or '-'
9. SexMac: predicted gender from a third-party Python program (default settings except case_sensitive=False); female, mostly_female, andy, mostly_male, or male
10. SSNgender: predicted gender based on US SSN data; 'F', 'M', or '-'
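A minimal sketch for reading the file with Python's standard library, assuming each row has exactly the 10 tab-separated fields in the order listed above. Whether the real file starts with a header row is not stated, so it is a parameter here; the `AuthorRecord` type and the function names are hypothetical.

```python
import csv
from typing import Iterable, Iterator, NamedTuple, Tuple

# Column order as documented in the dataset description.
COLUMNS = ("auid", "name", "EthnicSeer", "prop", "lastname",
           "firstname", "Ethnea", "Genni", "SexMac", "SSNgender")


class AuthorRecord(NamedTuple):
    """One row of the file (hypothetical container type)."""
    auid: str
    name: str
    EthnicSeer: str
    prop: float
    lastname: str
    firstname: str
    Ethnea: str
    Genni: str
    SexMac: str
    SSNgender: str


def read_records(lines: Iterable[str], has_header: bool = False) -> Iterator[AuthorRecord]:
    """Stream tab-delimited lines as AuthorRecord tuples.

    Quotes are treated as ordinary characters (QUOTE_NONE), since names may
    contain apostrophes. Rows without exactly 10 fields are skipped.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if has_header:
        next(reader, None)  # assumption: drop a header row if one exists
    for row in reader:
        if len(row) != len(COLUMNS):
            continue
        fields = dict(zip(COLUMNS, row))
        fields["prop"] = float(fields["prop"])  # EthnicSeer confidence in [0, 1]
        yield AuthorRecord(**fields)


def split_auid(auid: str) -> Tuple[int, int]:
    """Split an auid of the form PMID_authorposition into (PMID, position)."""
    pmid, position = auid.split("_")
    return int(pmid), int(position)
```

Because the file is about 717 MB, the generator yields one row at a time instead of loading everything into memory; pass it a file object opened with `encoding="ascii"` and `newline=""`, as the `csv` module recommends.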
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Methods
We compiled information on 1,244 faculty members at Canadian universities who were funded by an NSERC Discovery grant (Evolution and Ecology subcommittee) between 1991 and 2019. This information included binary gender (coded as men or women), assumed from first names and from institutional websites' use of pronouns and photographs; we acknowledge that we may have mis-assigned gender or failed to notice non-binary, transitional or fluid gender identities. We also collected each researcher's year of PhD and all institutions they were affiliated with during their research career. This information was obtained from public curricula vitae, institutional websites, personally maintained researcher websites, academic networking platforms (LinkedIn, ResearchGate), Google Scholar, and other public sources such as obituaries. For each researcher, we reconstructed their H-index through time using (1) a compiled list of their peer-reviewed publications and (2) the citations to each publication in each calendar year from the year of publication until 2019. We compiled publications using a recursive procedure. We first downloaded from the Web of Science Core Collection (hereafter, WOS) all publications by authors with the researcher's first initial and last name, from 5 years before their PhD until 2019. We then filtered this list by cross-referencing known variants of the researcher's authorship name (from online curricula vitae or Google Scholar profiles) and their institutional affiliations, by fuzzy matching against publication titles from their curriculum vitae or Google Scholar profile where possible, and by recursively identifying previously unknown affiliations to refine the cross-referencing. Once the publication record was cleaned, we calculated cumulative citations over years for each publication from WOS yearly citation counts as a precursor to calculating the H-index.
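The H-index reconstruction described above can be sketched as follows: for each calendar year, accumulate each publication's yearly citation counts up to that year, then take the largest h such that h papers have at least h citations. The input layout (per-publication dictionaries of yearly counts keyed by a publication id) and the function names are assumptions for illustration, not the authors' code.

```python
from typing import Dict, List, Mapping


def h_index(citation_counts: List[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_index_by_year(
    yearly_citations: Mapping[str, Mapping[int, int]],
    pub_year: Mapping[str, int],
    years: range,
) -> Dict[int, int]:
    """H-index at the end of each calendar year in `years`.

    yearly_citations: publication id -> {year: citations received that year}
    pub_year: publication id -> year of publication
    Only papers published by a given year count toward that year's H-index,
    and each paper's citations are summed up to and including that year.
    """
    result: Dict[int, int] = {}
    for year in years:
        cumulative = [
            sum(c for y, c in yearly_citations.get(pid, {}).items() if y <= year)
            for pid, published in pub_year.items()
            if published <= year
        ]
        result[year] = h_index(cumulative)
    return result
```

For example, three papers published in 2015, 2016 and 2017 with cumulative citations of 6, 3 and 0 by the end of 2017 give an H-index of 2 for that year.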
We identified a potential pool of working group publications by (1) matching WOS titles with known working group publications funded by the 15 synthesis centres that make up the International Synthesis Consortium, and (2) searching the funding and acknowledgment sections of publications for synthesis centre names or acronyms, or for keywords commonly used to describe working groups (“working group”, “synthesis group”, “synthesis working group”, “synthesis committee”, “synthesis workshop”, “catalysis group”). All publications from steps 1 and 2 were then manually coded as primary research vs. synthesis research, and as working group method vs. non-working group method. We further categorized synthesis research publications into the following types: statistical synthesis (statistical analysis of previously published or archived data collected by multiple different researchers and/or studies), conceptual synthesis (qualitative review of the literature or proposal of new frameworks for scientific concepts or investigation), or mathematical synthesis (theoretical mathematical models or specific application of general models for the purpose of prediction). We scored non-working group publications using similar criteria; however, given the large number of publications involved, we classified them programmatically, using keywords indicative of the three types of synthesis science, instead of coding them manually. These data are presented in aggregated and anonymized form as needed to prevent the identification of individuals.
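Step (2) of the screening above amounts to a case-insensitive phrase search over funding and acknowledgment text. A minimal sketch, assuming the text is already available as a string: the keyword list is taken from the paragraph above, while the synthesis centre names and acronyms are not listed in the source, so `CENTRE_TERMS` is left as an empty placeholder. A match only flags a candidate; the coding itself was manual.

```python
import re
from typing import List

# Phrases listed in the methods text as commonly used to describe working groups.
WG_KEYWORDS: List[str] = [
    "working group",
    "synthesis group",
    "synthesis working group",
    "synthesis committee",
    "synthesis workshop",
    "catalysis group",
]

# Placeholder: synthesis centre names/acronyms would go here (not given in the source).
CENTRE_TERMS: List[str] = []


def build_pattern(terms: List[str]) -> "re.Pattern":
    """Case-insensitive whole-phrase matcher; longer phrases are tried first."""
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


CANDIDATE_RE = build_pattern(WG_KEYWORDS + CENTRE_TERMS)


def flag_candidate(funding_and_acknowledgments: str) -> List[str]:
    """Return matched phrases, lower-cased and de-duplicated, in order of first appearance."""
    found: List[str] = []
    for match in CANDIDATE_RE.finditer(funding_and_acknowledgments):
        phrase = match.group(0).lower()
        if phrase not in found:
            found.append(phrase)
    return found
```

Sorting the alternatives by length makes "synthesis working group" win over its substring "working group", and the word boundaries keep phrases such as "networking groups" from matching.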
We conducted an online survey of current ecology and evolution faculty in Canada from July to September 2019. Participants were recruited by email, supplemented by in-person recruitment at the Canadian Society for Ecology and Evolution annual conference (Fredericton, NB, Canada, August 18-21, 2019). The 169 valid responses represent an effective questionnaire response rate of 14.7%. The questionnaire asked for information to confirm or complete the researcher database (e.g., academic history, gender), as well as why researchers did or did not participate in working groups and the perceived costs and benefits of participation. These data are presented only in condensed and anonymized form to protect the privacy of personal information.
We used survival analysis to test whether gender or pace of career progression (H-index adjusted for time since PhD) predicts the hazard rate of participation in working groups (WGs). We included an interaction between gender and H-index to assess whether potential selection effects tied to the research record, as captured by the H-index, are the same for women and men. We estimated hazard ratios for attending WGs using Cox proportional hazards models.
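As an illustration only, a Cox model with the covariates and interaction described above could be written as follows. The notation is ours, not the authors': we assume gender enters as an indicator w (1 for women, 0 for men), the career-pace measure as H_i(t) (possibly time-varying), and h_0(t) is the unspecified baseline hazard.

```latex
h_i(t) = h_0(t)\,\exp\bigl(\beta_1 w_i + \beta_2 H_i(t) + \beta_3\, w_i H_i(t)\bigr)

\mathrm{HR}_{\text{women vs. men}}(H)
  = \frac{h(t \mid w = 1, H)}{h(t \mid w = 0, H)}
  = \exp(\beta_1 + \beta_3 H)
```

Under this form the baseline hazard cancels in the ratio, and a non-zero interaction coefficient beta_3 means the association between H-index and the hazard of joining a working group differs between women and men.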
For the 183 researchers who participated in working groups, we used a fixed effects model with a linear spline to investigate the effects of working group participation and gender on researchers’ H-index trajectories over time. This model compares the trajectory of researchers’ H-indices in the years before (0-5 years before) and after (1-5 years after, and >6 years after) participating in working groups, and then averages those differences across researchers. To account for autocorrelation within individuals and heteroscedasticity across individuals, we clustered standard errors by individual. We applied a 0.67 power transformation to the “time” variable to linearize the H-index ~ time relationship. We coded the spline in marginal form, which simplifies interpretation: the coefficients of the second and third intervals capture changes in H-index growth rate relative to the preceding interval.
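A sketch of how a marginal-form linear spline basis can be built, assuming t is years relative to first working group participation and that the knots sit at 0 and 5 years (our reading of the intervals above, not a confirmed specification). Each hinge term max(t - k, 0) switches on after its knot, so its coefficient is the change in slope from the previous interval; `slopes_from_marginal` shows how the within-interval slopes are recovered. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed knots: before participation (t <= 0), 1-5 years after (0 < t <= 5),
# and later (t > 5).
KNOTS: Tuple[float, ...] = (0.0, 5.0)


def marginal_spline(t: float, knots: Sequence[float] = KNOTS) -> List[float]:
    """Basis [t, max(t - k1, 0), max(t - k2, 0), ...] for one observation.

    In a regression on these columns, the coefficient on t is the slope in
    the first interval and each hinge coefficient is the change in slope
    relative to the preceding interval (marginal coding).
    """
    return [t] + [max(t - k, 0.0) for k in knots]


def slopes_from_marginal(coefs: Sequence[float]) -> List[float]:
    """Within-interval slopes implied by marginal-form coefficients (running sums)."""
    slopes: List[float] = []
    running = 0.0
    for b in coefs:
        running += b
        slopes.append(running)
    return slopes
```

For instance, marginal coefficients of 1.0, 0.5 and -0.25 imply slopes of 1.0 before participation, 1.5 in years 1-5, and 1.25 afterwards. In the study these columns would sit alongside individual fixed effects and the transformed time variable; that part is not sketched here.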
The effects of research type (synthesis vs primary) and method (working group vs traditional) on publication citation rates were evaluated with a zero-inflated generalized linear model based on a negative binomial error distribution with a log link (R package glmmTMB).
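To make the model family concrete, here is a sketch of the zero-inflated negative binomial probability mass function with a log link for the mean. It uses the NB2 parameterization (variance mu + mu^2/theta), which is the one glmmTMB's `nbinom2` family uses; the covariates and the form of the zero-inflation model actually fitted are not stated, so `pi_zero` is simply taken as given here.

```python
import math
from typing import Sequence


def nb2_pmf(k: int, mu: float, theta: float) -> float:
    """Negative binomial P(Y = k) with mean mu and dispersion theta (variance mu + mu**2 / theta)."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k: int, mu: float, theta: float, pi_zero: float) -> float:
    """Zero-inflated NB: a structural zero with probability pi_zero, else an NB count."""
    count_part = (1.0 - pi_zero) * nb2_pmf(k, mu, theta)
    return pi_zero + count_part if k == 0 else count_part


def mean_from_linear_predictor(x: Sequence[float], beta: Sequence[float]) -> float:
    """Log link: mu = exp(x . beta), e.g. x holding research-type and method indicators."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
```

Under the log link, a coefficient b on an indicator (say, synthesis vs. primary) multiplies the expected citation count of the count component by exp(b).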
The survey results were evaluated with simple chi-square tests of association.
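A minimal sketch of Pearson's chi-square test of association for a contingency table (for example, gender by a yes/no survey answer). The table values in any example are invented; whether the authors applied a continuity correction is not stated, and this sketch applies none. The closed-form p-value is only valid for 1 degree of freedom, i.e. a 2 x 2 table.

```python
import math
from typing import Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # under independence
            stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table: Sequence[Sequence[float]]) -> int:
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


def p_value_df1(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))
```

For a 2 x 2 table of [[10, 20], [20, 10]], every expected count is 15 and the statistic is 20/3, about 6.67, with 1 degree of freedom.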
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
U denotes the percentage of authorships labelled Unknown, %F denotes the percentage of female authorships among male and female authorships, and G = SSA denotes the percentage of male and female SSA predictions that match the Genni predictions.
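The three quantities in the caption can be computed as below. This is a sketch under an assumed label scheme ('F', 'M', or 'U' for Unknown, one label per authorship from each tool); the actual label encoding behind the table is not given.

```python
import math
from typing import Sequence

# Assumed labels: 'F' (female), 'M' (male), 'U' (Unknown), one per authorship.


def unknown_pct(labels: Sequence[str]) -> float:
    """U: percentage of authorships labelled Unknown."""
    if not labels:
        return math.nan
    return 100.0 * sum(1 for x in labels if x == "U") / len(labels)


def female_pct(labels: Sequence[str]) -> float:
    """%F: female authorships as a percentage of male plus female authorships."""
    f = sum(1 for x in labels if x == "F")
    m = sum(1 for x in labels if x == "M")
    return 100.0 * f / (f + m) if f + m else math.nan


def agreement_pct(genni: Sequence[str], ssa: Sequence[str]) -> float:
    """G = SSA: percentage of male/female SSA predictions that match the Genni prediction."""
    pairs = [(g, s) for g, s in zip(genni, ssa) if s in ("F", "M")]
    if not pairs:
        return math.nan
    return 100.0 * sum(1 for g, s in pairs if g == s) / len(pairs)
```

Note that %F excludes Unknown labels from its denominator, and G = SSA only counts authorships where SSA gave a male or female prediction.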