12 datasets found
  1. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    • s.cnmilf.com
    • +1more
    Updated May 5, 2022
    Cite
    Social Security Administration (2022). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Dataset updated
    May 5, 2022
    Dataset provided by
    Social Security Administration (http://www.ssa.gov/)
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 onward.

  2. Gender and Intersectional Disparities in Biographies on English and Spanish...

    • b2find.dkrz.de
    Updated Jan 9, 2025
    Cite
    (2025). Gender and Intersectional Disparities in Biographies on English and Spanish Wikipedia Front Pages (2013-2023) - Dataset - B2FIND [Dataset]. https://b2find.dkrz.de/dataset/0f9014ea-070d-5f25-8503-e46a38cf3c40
    Dataset updated
    Jan 9, 2025
    Description

    This dataset contains two folders with different data. The "Gender" folder provides the gender distribution of individuals featured on the front pages of the English and Spanish versions of Wikipedia from 2013 to 2023. For the Spanish edition, data were collected from the "Artículos buenos" and "Artículos destacados" sections and are displayed in aggregated form. The "Intersectionality" folder provides the distribution, by various sociodemographic attributes, of individuals featured on the front pages of the English and Spanish versions of Wikipedia from 2013 to 2023. It is structured into four CSV files. Three of them correspond to the English Wikipedia edition: the English 3C CSV, containing data from the sections "Did you know...", "In the news" and "On this day..."; a CSV dedicated to "English Featured Article"; and another dedicated to "English Featured Picture". The fourth CSV contains data from the Spanish edition of Wikipedia, extracted from the sections "Artículo Destacado" and "Artículo Bueno". Within each CSV, the data are presented in columns, each dedicated to a sociodemographic attribute.

  3. Popular Baby Names

    • catalog.data.gov
    • data.cityofnewyork.us
    • +4more
    Updated Jun 15, 2024
    Cite
    data.cityofnewyork.us (2024). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
    Dataset updated
    Jun 15, 2024
    Dataset provided by
    data.cityofnewyork.us
    Description

    Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in order of frequency. The data can be used to represent the popularity of a name. Use caution when assessing the rank of a baby name whose frequency count is close to 10, because the ranking may vary from year to year.
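The caution above can be applied mechanically. The following is a minimal sketch, assuming each record carries a name, a frequency count and a rank; the field names, the `margin` parameter and the sample rows are hypothetical and not taken from the dataset.

```python
def flag_unstable_ranks(records, threshold=10, margin=5):
    """Return the records whose count lies within `margin` of `threshold`.

    `threshold` mirrors the count of 10 mentioned in the description;
    `margin` is an arbitrary illustrative choice.
    """
    return [r for r in records if abs(r["count"] - threshold) <= margin]


# Made-up rows, for illustration only.
sample = [
    {"name": "Olivia", "count": 172, "rank": 1},
    {"name": "Zelda", "count": 11, "rank": 88},
    {"name": "Ines", "count": 10, "rank": 89},
]

unstable = flag_unstable_ranks(sample)
unstable_names = [r["name"] for r in unstable]
print(unstable_names)
```

Ranks flagged this way are the ones most likely to shift between years, so they are best reported together with their counts.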

  4. Worldwide Gender Differences in Public Code Contributions - Replication...

    • zenodo.org
    • data.niaid.nih.gov
    bin, html, zip
    Updated Feb 9, 2022
    Cite
    Davide Rossi; Stefano Zacchiroli (2022). Worldwide Gender Differences in Public Code Contributions - Replication Package [Dataset]. http://doi.org/10.5281/zenodo.6020475
    Available download formats: bin, html, zip
    Dataset updated
    Feb 9, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Davide Rossi; Stefano Zacchiroli
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Worldwide Gender Differences in Public Code Contributions - Replication Package

    This document describes how to replicate the findings of the paper: Davide Rossi and Stefano Zacchiroli, 2022, Worldwide Gender Differences in Public Code Contributions. In Software Engineering in Society (ICSE-SEIS'22), May 21-29, 2022, Pittsburgh, PA, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3510458.3513011

    This document comes with the software needed to mine and analyze the data presented in the paper.

    Prerequisites

    These instructions assume the use of the bash shell, the Python programming language, the PostgreSQL DBMS (version 11 or later), the zstd compression utility and various common *nix shell utilities (cat, pv, ...), all of which are available for multiple architectures and OSs.
    It is advisable to create a Python virtual environment and install the following PyPI packages: click==8.0.3 cycler==0.10.0 gender-guesser==0.4.0 kiwisolver==1.3.2 matplotlib==3.4.3 numpy==1.21.3 pandas==1.3.4 patsy==0.5.2 Pillow==8.4.0 pyparsing==2.4.7 python-dateutil==2.8.2 pytz==2021.3 scipy==1.7.1 six==1.16.0 statsmodels==0.13.0

    Initial data

    • swh-replica, a PostgreSQL database containing a copy of Software Heritage data. The schema for the database is available at https://forge.softwareheritage.org/source/swh-storage/browse/master/swh/storage/sql/.
      We retrieved these data from Software Heritage, in collaboration with the archive operators, taking an archive snapshot as of 2021-07-07. We cannot make these data available in full as part of the replication package due to both its volume and the presence in it of personal information such as user email addresses. However, equivalent data (stripped of email addresses) can be obtained from the Software Heritage archive dataset, as documented in the article: Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli, The Software Heritage Graph Dataset: Public software development under one roof. In proceedings of MSR 2019: The 16th International Conference on Mining Software Repositories, May 2019, Montreal, Canada. Pages 138-142, IEEE 2019. http://dx.doi.org/10.1109/MSR.2019.00030.
      Once retrieved, the data can be loaded in PostgreSQL to populate swh-replica.
    • names.tab - forenames and surnames per country with their frequency
    • zones.acc.tab - countries/territories, timezones, population and world zones
    • c_c.tab - matches between ccTLD entities and world zones

    Data preparation

    • Export data from the swh-replica database to create commits.csv.zst and authors.csv.zst
      sh> ./export.sh
    • Run the authors cleanup script to create authors--clean.csv.zst
      sh> ./cleanup.sh authors.csv.zst
    • Filter out implausible names and create authors--plausible.csv.zst
      sh> pv authors--clean.csv.zst | unzstd | ./filter_names.py 2> authors--plausible.csv.log | zstdmt > authors--plausible.csv.zst
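The filtering step reads rows from a stream, writes the rows it keeps to standard output, and logs the rest to standard error. The sketch below illustrates that pattern only. It is not the package's filter_names.py: the plausibility rule, the column index and the sample rows are assumptions made for illustration.

```python
import csv
import io


def is_plausible(name):
    """Hypothetical rule: at least two characters, some letters, no digits or '@'."""
    name = name.strip()
    if len(name) < 2:
        return False
    if any(ch.isdigit() or ch == "@" for ch in name):
        return False
    return any(ch.isalpha() for ch in name)


def filter_names(src, out, log, field=1):
    """Copy rows with a plausible name in column `field` to `out`; log the others."""
    writer = csv.writer(out, lineterminator="\n")
    kept = 0
    for row in csv.reader(src):
        if len(row) > field and is_plausible(row[field]):
            writer.writerow(row)
            kept += 1
        else:
            log.write("rejected: %r\n" % (row,))
    return kept


# In a real pipeline src/out/log would be sys.stdin, sys.stdout and sys.stderr.
src = io.StringIO("1,Ada Lovelace\n2,root@localhost\n3,x\n4,Grace Hopper\n")
out, log = io.StringIO(), io.StringIO()
kept = filter_names(src, out, log)
print(kept, out.getvalue(), sep="\n")
```

Keeping the rejected rows in a separate log, as the original command does with `2>`, makes it possible to audit what the filter discarded.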

    Gender detection

    • Run the gender guessing script to create author-fullnames-gender.csv.zst
      sh> pv authors--plausible.csv.zst | unzstd | ./guess_gender.py --fullname --field 2 | zstdmt > author-fullnames-gender.csv.zst

    Database creation and data ingestion

    • Create the PostgreSQL DB
      sh> createdb gender-commit
      Note that from now on, when a command is prefixed with the psql> prompt, we assume it is executed in psql on the gender-commit database.

    • Import data into PostgreSQL DB sh> ./import_data.sh

    Zone detection

    • Extract commits data from the DB and create commits.tab, which is used as input for the world zone detection script
      sh> psql -f extract_commits.sql gender-commit
    • Run the world zone detection script to create commit_zones.tab.zst
      sh> pv commits.tab | ./assign_world_zone.py -a -n names.tab -p zones.acc.tab -x -w 8 | zstdmt > commit_zones.tab.zst
      Use ./assign_world_zone.py --help if you are interested in changing the script parameters.
    • Read zones assignment data from the file into the DB
      psql> \copy commit_culture from program 'zstdcat commit_zones.tab.zst | cut -f1,6 | grep -Ev ''\s$'''
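The shell filter inside the \copy command above keeps tab-separated columns 1 and 6 and drops every row whose output ends in whitespace, which is what happens when column 6 is empty. A rough Python equivalent is sketched below for readers who want to inspect the rows before loading them. The sample rows are invented, and reading column 6 as the assigned world zone is an assumption based on the name of the target table.

```python
import re

TRAILING_WS = re.compile(r"\s$")


def select_id_and_zone(lines):
    """Approximate `cut -f1,6 | grep -Ev '\\s$'` on tab-separated lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Keep fields 1 and 6 (0-based indices 0 and 5) when they exist.
        picked = [fields[i] for i in (0, 5) if i < len(fields)]
        joined = "\t".join(picked)
        # An empty sixth field leaves a trailing tab, so the row is dropped.
        if not TRAILING_WS.search(joined):
            yield joined


rows = [
    "c1\ta\tb\tc\td\tEurope\n",
    "c2\ta\tb\tc\td\t\n",
]
result = list(select_id_and_zone(rows))
print(result)
```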

    Extraction and graphs

    • Run the script to execute the queries to extract the data to plot from the DB. This creates commits_tz.tab, authors_tz.tab, commits_zones.tab, authors_zones.tab, and authors_zones_1620.tab.
      Edit extract_data.sql if you wish to modify extraction parameters (start/end year, sampling, ...).
      sh> ./extract_data.sh
    • Run the script to create the graphs from all the previously extracted tabfiles. This will generate commits_tzs.pdf, authors_tzs.pdf, commits_zones.pdf, authors_zones.pdf, and authors_zones_1620.pdf.
      sh> ./create_charts.sh

    Additional graphs

    This package also includes some already-made graphs

    • authors_zones_1.pdf: stacked graphs showing the ratio of female authors per world zone through the years, considering all authors with at least one commit per period
    • authors_zones_2.pdf: ditto with at least two commits per period
    • authors_zones_10.pdf: ditto with at least ten commits per period
  5. Author Gender Representation at Audio Engineering Conferences - An...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Feb 12, 2021
    Cite
    Kat Young; Michael Lovedee-Turner; Jude Brereton; Helena Daffern (2021). Author Gender Representation at Audio Engineering Conferences - An Anonymised Dataset v2 [Dataset]. http://doi.org/10.5281/zenodo.4535610
    Dataset updated
    Feb 12, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Kat Young; Michael Lovedee-Turner; Jude Brereton; Helena Daffern
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains the author gender dataset (as a comma-delimited .csv file) originally created in association with the paper entitled 'The Impact of Gender on Conference Authorship in Audio Engineering: Analysis Using a New Data Collection Method', but since extended to include conferences up to the end of 2019. The original dataset is available at: https://doi.org/10.5281/zenodo.1249693. Please cite both the paper and the relevant dataset if used. Visualisation is available at: http://tibbakoi.github.io/aesgender.

    ---

    The dataset was produced using a novel method which used self-identified pronouns, therefore allowing for as many groups as necessary to describe the population.

    1. A list of authors was generated from conference proceedings.
    2. An email was sent to each author to acquire their pronoun.
    3. If no email was available/no response was received, a pronoun was acquired from a biography.
    4. If no biography was available, a pronoun was inferred from traditional gender markers and gender presentation.
    5. If no gender marker/photograph was available, the entry was labelled as 'Information Unavailable'. For brevity, the label 'Unknown' is used in the paper.
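The five steps above form a fallback chain: each source is consulted only when the previous one yields nothing. A minimal sketch of that logic follows; the function and parameter names are hypothetical and this is not the authors' code.

```python
def resolve_pronoun(email_reply=None, biography=None, inferred=None):
    """Return (pronoun, source) using the first source that provides a value.

    The order follows the description: self-reported by email, then taken
    from a biography, then inferred from gender markers and presentation.
    """
    sources = (
        ("email", email_reply),
        ("biography", biography),
        ("inferred", inferred),
    )
    for source, value in sources:
        if value:
            return value, source
    # Step 5: nothing was available for this author.
    return "Information Unavailable", "none"


print(resolve_pronoun(email_reply="they", biography="she"))
print(resolve_pronoun(biography="she"))
print(resolve_pronoun())
```

Returning the source alongside the pronoun keeps it possible to report how many entries relied on inference rather than self-identification.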

    ---

    The columns in the dataset are as follows:

    1. ID: unique identifier of entry
    2. Pronoun: pronoun of entry
    3. Position (abs): numerical absolute position within author list for entry
    4. Position (relative): relative position within author list for entry (either First, Last, or Middle)
    5. Single/multi-author: whether the publication for that entry has a single author or has multiple authors (single author publications are excluded from author position analysis)
    6. Conference: Full conference name of entry
    7. Topic: Topic of conference of entry, taken from conference name
    8. Year: Year of conference of entry
    9. Type: Type of publication for that entry as listed on the online conference proceedings
    10. Grouped Type: Grouping of publication types for that entry for easier analysis due to inconsistencies in online conference proceedings (groups are: workshop, poster, paper, panel, keynote, invited speaker, invited paper, demo)
    11. Inc. for author pos?: True/False as to whether to include the entry for analysis over author position (included types are: paper, invited paper, poster (all with multiple authors) as these have meaningful author orders)
    12. Inc. for single/multi-author?: True/False as to whether to include the entry for analysis over single/multi author (included types are: paper, invited paper, poster as these have meaningful author orders)
    13. Invited paper status: Grouping of the types to allow statistical analysis over invited vs non-invited types (invited types are: invited speaker, invited paper, keynote, panel. Non-invited types are: poster, paper, demo, workshop)

    NB: Some grouping of the data is required as online conference proceedings are not always consistent (Column 10). Some labelling of the data is required to determine which entries to include in certain types of analysis (Columns 11-13).
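As an illustration of how the inclusion flags are meant to be used, the sketch below counts pronouns by relative author position, keeping only entries flagged for author position analysis. It assumes the CSV headers match the column names listed above and that the flag is stored as the text True or False; both are assumptions, and the sample rows are invented.

```python
import csv
import io
from collections import Counter


def pronouns_by_position(csv_text):
    """Count (relative position, pronoun) pairs over entries flagged for inclusion."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Inc. for author pos?"].strip().lower() == "true":
            counts[(row["Position (relative)"], row["Pronoun"])] += 1
    return counts


# Invented rows using a subset of the documented columns.
sample = (
    "ID,Pronoun,Position (relative),Inc. for author pos?\n"
    "1,she,First,True\n"
    "2,he,Last,True\n"
    "3,they,First,False\n"
)
counts = pronouns_by_position(sample)
print(dict(counts))
```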

    ---

    This dataset is distributed under the Creative Commons Attribution 4.0 license in the hope that it will prove useful, but with no warranty, including any implied warranty of merchantability or fitness for a particular purpose.

    ---

    Dataset curated by: Kat Young and Michael Lovedee-Turner, formerly at the AudioLab, Dept. of Electronic Engineering, University of York.
    Contact: kathryn.ae.young@gmail.com

  6. Trends in gender homophily in scientific publications (data)

    • zenodo.org
    Updated Apr 11, 2024
    Cite
    Anonymous (2024). Trends in gender homophily in scientific publications (data) [Dataset]. http://doi.org/10.5281/zenodo.7958034
    Dataset updated
    Apr 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous
    Description

    This dataset contains records of research articles extracted from the Web of Science (WoS) from 1980 to 2019: in total, 15,642 journals, 28,241,100 articles and 111,980,858 authorships across 153 research areas.

    The main dataset (author_address_article_gend_v2.parquet), in Parquet format, contains all the authorships, where an authorship is defined as the tuple article-author. There are 12 variables per authorship (row):

    • ut: unique article identifier.
    • daisng_id: unique author identifier.
    • country: author country (two-letter ISO code).
    • date: publication date.
    • gender: gender of the author ("male" or "female"), as provided by the Genderize.io API.
    • probability: probability of the gender attribute, as provided by the Genderize.io API.
    • count: number of entries for the author first name, as provided by the Genderize.io API.
    • jsc: journal subject category.
    • field: field of research.
    • research_area: area of research.
    • n_aut: number of authors in this publication.
    • journal: journal name.

    With the previous dataset, a resampler was applied to generate null homophily values for each year. There are 4 datasets in R Data Serialization (RDS) format:

    • null_field.rds: null homophily values per country, year and field of research.
    • null_field_comp.rds: null homophily values per year and field of research (only for complete authorships).
    • null_research.rds: null homophily values per year and area of research.
    • null_research_comp.rds: null homophily values per year and area of research (only for complete authorships).

    All these datasets have the same structure:

    • country: country (two-letter ISO code).
    • year: year.
    • variable: either field or research area name.
    • m: average homophily.
    • s: homophily std. error.

    Finally, some supplementary files used in the descriptive analysis and methods:

    • File null_research_l2019.rds is an example of the output from the resampling algorithm for year 2019.
    • File wos_category_to_field.csv is a mapping from WoS categories to more general fields.
    • File jcr_if_2020.csv contains the percentiles of the journal impact factor for the JCR 2020.
  7. Data from: More on the influence of gender equality on gender differences in...

    • journaldata.zbw.eu
    csv, pdf, txt
    Updated Mar 13, 2024
    Cite
    Sara Cerioli; Andrey Formozov (2024). More on the influence of gender equality on gender differences in economic preferences [Dataset]. http://doi.org/10.15456/jbnst.2024027.1150685504
    Available download formats: txt (2795), pdf (454354), csv (4103), csv (3229), csv (162836), csv (60677)
    Dataset updated
    Mar 13, 2024
    Dataset provided by
    ZBW - Leibniz Informationszentrum Wirtschaft
    Authors
    Sara Cerioli; Andrey Formozov
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Introduction

    This study reproduces the results of the article Relationship of gender differences in preferences to economic development and gender equality (DOI: 10.1126/science.aas9899) and partially its supplementary material.

    The code for the analysis can be found at the following GitHub page: https://github.com/scerioli/Global-Preferences-Survey

    Preparation of the data

    Data Collection, Cleaning, and Standardization

    The data used in Falk & Hermle (2018) is not fully available, for two reasons:

    1. Data paywall: Part of the data is not available for free; accessing it requires paying a fee to Gallup. This is the case for the additional data set used in the article, for instance the one that contains the education level and the household income quintile. Check the website of the briq - Institute on Behavior & Inequality for more information.

    2. Data used in the study is not available online: This is the case for the LogGDP p/c calculated in 2005 US dollars, which is not directly available online. We decided to calculate the LogGDP p/c in 2010 US dollars because it was easily available, which should not change the main findings of the article.

    Global Preferences Survey

    This data is protected by copyright and cannot be given to third parties.

    To download the GPS data set, go to the website of the Global Preferences Survey, in the section "downloads". There, choose the "Dataset" form; after filling it in, you can download the data set.

    Hint: The organisation can also be "private".

    The following two papers must also be cited in all publications that make use of or refer in any way to the GPS dataset:

    • Falk, A., Becker, A., Dohmen, T., Enke, B., Huffman, D., & Sunde, U. (2018). Global evidence on economic preferences. Quarterly Journal of Economics, 133 (4), 1645–1692.

    • Falk, A., Becker, A., Dohmen, T. J., Huffman, D., & Sunde, U. (2016). The preference survey module: A validated instrument for measuring risk, time, and social preferences. IZA Discussion Paper No. 9674.

    GDP per capita

    From the website of the World Bank, one can access the data about the GDP per capita on a certain set of years. We took the GDP per capita (constant 2010 US$), made an average of the data from 2003 until 2012 for all the available countries, and matched the names of the countries with the ones from the GPS data set.
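The averaging step can be sketched as follows. This is an illustration rather than the authors' code (their analysis lives in the GitHub repository linked above): the function name and the sample values are hypothetical, and the use of the natural logarithm is an assumption, since the description does not state the base.

```python
import math


def log_mean_gdp(gdp_by_year, start=2003, end=2012):
    """Average GDP per capita over [start, end] and return its natural log.

    Years with missing values (None) are skipped; returns None when no
    value is available in the window.
    """
    values = [
        value
        for year, value in gdp_by_year.items()
        if start <= year <= end and value is not None
    ]
    if not values:
        return None
    return math.log(sum(values) / len(values))


# Invented series: only 2003 and 2012 fall inside the averaging window.
series = {2003: 100.0, 2008: None, 2012: 300.0, 2015: 900.0}
log_gdp = log_mean_gdp(series)
print(log_gdp)
```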

    Gender Equality Index

    The Gender Equality Index is composed of four main data sets.

    • Time since women’s suffrage: Taken from the Inter-Parliamentary Union website. We prepared the data as follows. For several countries more than one date was provided (for example, the right to be elected and the right to vote). We use the last date on which both the right to vote and the right to stand for election were granted, with no other restrictions noted. Some countries were colonies or part of a union of countries (for instance, Kazakhstan in the Soviet Union). For these countries, the rights to vote and be elected might technically have been granted twice: within the union and as an independent state. In this case we kept the first date. It was difficult to decide on South Africa because its history of racism is closely entangled with women's rights. We kept the latest date, when Black women could also vote. For Nigeria, given the distinctions between North and South, we decided to keep only the North data because, again, it reflected the whole country and it was the last date. Note: The USA data doesn't take into account that up to 1964 Black women couldn't vote (in general, Black people couldn't vote up to that year). We didn’t keep this date, because it was not explicitly mentioned in the original data set. This is in contrast with other choices made, but it is important to reproduce exactly the results of the publication, and the USA is often easy to spot on the plots.

    • UN Gender Inequality Index: Taken from the Human Development Report 2015. We kept only the table called "Gender Inequality Index".

    • WEF Global Gender Gap: The WEF Global Gender Gap Index, taken from the World Economic Forum Global Gender Gap Report 2015. For countries where data were missing, data were added from the World Economic Forum Global Gender Gap Report 2006. We modified some of the country names directly in the csv file, which is why we provide it as an input file.

    • Ratio of female and male labour force participation: Average International Labour Organization estimates from 2003 to 2012 taken from the World Bank database (http://data.worldbank.org/indicator/SL.TLF.CACT.FM.ZS). Values were inverted to create an index of equality. We took the average for the period between 2004 and 2013.

    In our extended analysis, we also involved the following index:

    • United Nations Development Programme Gender Development Index taken from Human Development Reports 2020. Note that we have downloaded the two tables of the Human Development Index for males and females, and used the ratio of the two as a GDI index, as described in the report.
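The ratio described in the last item is a one-line computation. The sketch below takes the female HDI as the numerator, which is how the UNDP defines the GDI; the function name and the sample values are hypothetical.

```python
def gender_development_index(hdi_female, hdi_male):
    """Return the ratio of female HDI to male HDI."""
    if hdi_male <= 0:
        raise ValueError("male HDI must be positive")
    return hdi_female / hdi_male


# Invented HDI values, for illustration only.
gdi = gender_development_index(0.72, 0.80)
print(round(gdi, 3))
```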
  8. Data from: THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON...

    • zenodo.org
    csv, pdf
    Updated Jul 16, 2024
    Cite
    Janaína L. R. da S. Valentim; Sara Dias-Trindade; Eloiza da S. G. Oliveira; José A. M. Moreira; Felipe Fernandes; Manoel Honorio Romão; Philippi S. G. de Morais; Alexandre R. Caitano; Aline P. Dias; Carlos A. P. Oliveira; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo A. de M. Valentim (2024). THE RELEVANCY OF MASSIVE HEALTH EDUCATION IN THE BRAZILIAN PRISON SYSTEM: THE COURSE "HEALTH CARE FOR PEOPLE DEPRIVED OF FREEDOM" AND ITS IMPACTS [Dataset]. http://doi.org/10.5281/zenodo.6499752
    Available download formats: csv, pdf
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Janaína L. R. da S. Valentim; Sara Dias-Trindade; Eloiza da S. G. Oliveira; José A. M. Moreira; Felipe Fernandes; Manoel Honorio Romão; Philippi S. G. de Morais; Alexandre R. Caitano; Aline P. Dias; Carlos A. P. Oliveira; Karilany D. Coutinho; Ricardo B. Ceccim; Ricardo A. de M. Valentim
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Brazil
    Description

    Dataset name: asppl_dataset_v2.csv

    Version: 2.0

    Dataset period: 06/07/2018 - 01/14/2022

    Dataset Characteristics: Multivalued

    Number of Instances: 8118

    Number of Attributes: 9

    Missing Values: Yes

    Area(s): Health and education

    Sources:

    • Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);

    • Brazilian Occupational Classification (CBO) (Brasil, 2022b);

    • National Registry of Health Establishments (CNES) (Brasil, 2022c);

    • Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

    Description: The data contained in the asppl_dataset_v2.csv dataset (see Table 1) originates from participants of the technology-based educational course “Health Care for People Deprived of Freedom.” The course is available on the AVASUS (Brasil, 2022a). This dataset provides elementary data for analyzing the course’s impact and reach and the profile of its participants. In addition, it brings an update of the data presented in work by Valentim et al. (2021).

    Table 1: Description of AVASUS dataset features. Each entry gives the attribute, its datatype, its description and its possible values.

    • gender (categorical): Gender of the course participant. Value: Feminino / Masculino / Não Informado (in English: Female, Male or Uninformed).
    • course_progress (numerical): Percentage of completion of the course. Value: range from 0 to 100.
    • course_evaluation (numerical): A score given to the course by the participant. Value: 0, 1, 2, 3, 4, 5 or NaN.
    • evaluation_commentary (categorical): Comment made by the participant about the course. Value: free text or NaN.
    • region (categorical): Brazilian region in which the participant resides. Value: Brazilian region according to IBGE: Norte, Nordeste, Centro-Oeste, Sudeste or Sul (in English: North, Northeast, Midwest, Southeast or South).
    • CNES (numerical): The CNES code of the health establishment where the participant works. Value: CNES code or NaN.
    • health_care_level (categorical): Identification of the health care network level for which the course participant works. Value: “ATENCAO PRIMARIA”, “MEDIA COMPLEXIDADE”, “ALTA COMPLEXIDADE”, and their possible combinations (in English: "PRIMARY HEALTH CARE", "SECONDARY HEALTH CARE" and "TERTIARY HEALTH CARE").
    • year_enrollment (numerical): Year in which the course participant registered. Value: year (YYYY).
    • CBO (categorical): Participant occupation. Value: text coded according to the Brazilian Classification of Occupations or “Indivíduo sem afiliação formal.” (in English: “Individual without formal affiliation.”).

    Dataset name: prison_syphilis_and_population_brazil.csv

    Dataset period: 2017 - 2020

    Dataset Characteristics: Multivalued

    Number of Instances: 6

    Number of Attributes: 13

    Missing Values: No

    Source:

    • National Penitentiary Department (DEPEN) (Brasil, 2022d);

    Description: The data contained in the prison_syphilis_and_population_brazil.csv dataset (see Table 2) originate from the National Penitentiary Department Information System (SISDEPEN) (Brasil, 2022d). This dataset provides data on the population and prevalence of syphilis in the Brazilian prison system. In addition, it brings a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil.

    Table 2: Description of DEPEN dataset features. Each entry gives the attribute, its datatype, its description and its possible values.

    • Region (categorical): Brazilian region in which the participant resides. In addition, the sum of the regions, which refers to Brazil. Value: Brazil and Brazilian region according to IBGE: North, Northeast, Midwest, Southeast or South.
    • syphilis_2017 (numerical): Number of syphilis cases in the prison system in 2017. Value: number of syphilis cases.
    • syphilis_rate_2017 (numerical): Normalized rate of syphilis cases in 2017. Value: syphilis case rate.
    • syphilis_2018 (numerical): Number of syphilis cases in the prison system in 2018. Value: number of syphilis cases.
    • syphilis_rate_2018 (numerical): Normalized rate of syphilis cases in 2018. Value: syphilis case rate.
    • syphilis_2019 (numerical): Number of syphilis cases in the prison system in 2019. Value: number of syphilis cases.
    • syphilis_rate_2019 (numerical): Normalized rate of syphilis cases in 2019. Value: syphilis case rate.
    • syphilis_2020 (numerical): Number of syphilis cases in the prison system in 2020. Value: number of syphilis cases.
    • syphilis_rate_2020 (numerical): Normalized rate of syphilis cases in 2020. Value: syphilis case rate.
    • pop_2017 (numerical): Prison population in 2017. Value: population number.
    • pop_2018 (numerical): Prison population in 2018. Value: population number.
    • pop_2019 (numerical): Prison population in 2019. Value: population number.
    • pop_2020 (numerical): Prison population in 2020. Value: population number.
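The description does not say how the normalized rate is computed. One common choice, shown here purely as an assumption, is the number of cases per 1,000 people in the prison population; the scale factor, the function name and the sample values are all hypothetical.

```python
def syphilis_rate(cases, population, per=1000):
    """Cases per `per` people; `per=1000` is an assumed scale, not from the dataset."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


# Invented values: 50 cases in a prison population of 10,000.
rate = syphilis_rate(50, 10000)
print(rate)
```

Dividing by the population of each region is what makes regions of different sizes comparable, which is the stated purpose of the rate columns.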

    Dataset name: students_cumulative_sum.csv

    Dataset period: 2018 - 2020

    Dataset Characteristics: Multivalued

    Number of Instances: 6

    Number of Attributes: 7

    Missing Values: No

    Source:

    • Virtual Learning Environment of the Brazilian Health System (AVASUS) (Brasil, 2022a);

    • Brazilian Institute of Geography and Statistics (IBGE) (Brasil, 2022e).

    Description: The data contained in the students_cumulative_sum.csv dataset (see Table 3) originate mainly from AVASUS (Brasil, 2022a). This dataset provides data on the number of students by region and year. In addition, it brings a rate that represents the normalized data for purposes of comparison between the populations of each region and Brazil. We used population data estimated by the IBGE (Brasil, 2022e) to calculate the rate.

    Table 3: Description of Students dataset Features.

  9. Disambiguated researchers publication data

    • figshare.com
    txt
    Updated Jun 1, 2023
    Amaral Lab (2023). Disambiguated researchers publication data [Dataset]. http://doi.org/10.6084/m9.figshare.1591864.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Amaral Lab
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Companion dataset to "The possible role of resource requirements and academic career-choice risk on gender differences in publication rate and impact" by Duch J, Zeng XHT, Sales-Pardo M, Radicchi F, Otis S, Woodruff TK, Amaral LAN (PLoS ONE 7, e51332, 2012), doi: 10.1371/journal.pone.0051332.

    This dataset lists the total number of publications by 4,394 faculty members from 7 distinct research fields working at top U.S. institutions. The dataset also contains bibliographic information manually gathered from the CVs of those faculty members. The publication data were collected from Thomson Reuters' Web of Science according to the procedures described in the published paper.

    The data is a single csv file with the following fields:

    • author_name - researcher name as: Last name, Initials
    • gender - researcher gender as: M (male) or F (female)
    • univ_name - institution of current employment
    • field - scientific discipline
    • phd_year - year of PhD completion
    • nationality - country of origin
    • background - list of degrees
    • affiliations - list of honours and past appointments
    • total_pubs - total number of publications

    Some fields are not available for some researchers. Current employments are accurate as of June 2010. The total_pubs field shows the total number of publications published by the end of 2010.
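    Because some fields are missing for some researchers, any aggregate over this file has to skip blanks explicitly. A minimal sketch of that, using only the Python standard library; the sample rows are hypothetical, the presence of a header row is an assumption, and only a subset of the listed fields is shown:

    ```python
    import csv
    import io
    from collections import defaultdict

    # Hypothetical rows using a subset of the field names listed above.
    # A header row is assumed; the real file may differ.
    SAMPLE = '''author_name,gender,univ_name,field,phd_year,total_pubs
    "Doe, J",F,Example University,chemistry,1995,40
    "Roe, A",M,Example University,chemistry,,20
    "Poe, B",F,Sample Institute,ecology,2001,
    '''.replace("\n    ", "\n")  # strip the listing's indentation from each row


    def mean_pubs_by_gender(text: str) -> dict:
        """Average total_pubs per gender, skipping rows where it is blank."""
        counts = defaultdict(list)
        for row in csv.DictReader(io.StringIO(text)):
            raw = (row["total_pubs"] or "").strip()
            if not raw:
                continue  # field not available for this researcher
            counts[row["gender"]].append(int(raw))
        return {gender: sum(v) / len(v) for gender, v in counts.items()}


    print(mean_pubs_by_gender(SAMPLE))
    ```

    Quoting matters here: author_name contains a comma ("Last name, Initials"), so a plain split on commas would misalign the columns, whereas csv.DictReader handles quoted values.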

  10. Worldwide Bureaucracy Indicators

    • kaggle.com
    Updated Jun 12, 2024
    Joakim Arvidsson (2024). Worldwide Bureaucracy Indicators [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/worldwide-bureaucracy-indicators/suggestions
    Dataset updated
    Jun 12, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Joakim Arvidsson
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Worldwide Bureaucracy Indicators

    Worldwide Bureaucracy Indicators (WWBI) dataset from the World Bank.

    The Worldwide Bureaucracy Indicators (WWBI) database is a unique cross-national dataset on public sector employment and wages that aims to fill an information gap, thereby helping researchers, development practitioners, and policymakers gain a better understanding of the personnel dimensions of state capability, the footprint of the public sector within the overall labor market, and the fiscal implications of the public sector wage bill. The dataset is derived from administrative data and household surveys, thereby complementing existing, expert perception-based approaches.

    The World Bank introduced the dataset with a series of four blogs:

    Can you replicate the figures in the blogs? Can you display any of the data more clearly than in the blogs?

    Data Dictionary

    wwbi_data.csv

    variable | class | description
    country_code | character | 3-letter ISO_3166-1 code
    indicator_code | character | code identifying the indicator of bureaucracy
    year | numeric | year of the data
    value | numeric | numeric value of the data

    wwbi_series.csv

    variable | class | description
    indicator_code | character | code identifying the indicator of bureaucracy
    indicator_name | character | name of the indicator
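
    The two tables above share the indicator_code column, so observations in wwbi_data.csv can be labelled with the readable names from wwbi_series.csv. A minimal sketch of that lookup with the standard library; the rows and indicator codes below are hypothetical placeholders, not real WWBI values:

    ```python
    import csv
    import io

    # Hypothetical excerpts that follow the column layouts described above.
    DATA_CSV = """country_code,indicator_code,year,value
    AAA,EXAMPLE.IND.1,2015,0.25
    AAA,EXAMPLE.IND.2,2015,0.5
    """.replace("\n    ", "\n")

    SERIES_CSV = """indicator_code,indicator_name
    EXAMPLE.IND.1,Hypothetical indicator one
    EXAMPLE.IND.2,Hypothetical indicator two
    """.replace("\n    ", "\n")


    def label_observations(data_text: str, series_text: str) -> list:
        """Attach indicator_name to each observation via indicator_code."""
        names = {
            row["indicator_code"]: row["indicator_name"]
            for row in csv.DictReader(io.StringIO(series_text))
        }
        labelled = []
        for row in csv.DictReader(io.StringIO(data_text)):
            code = row["indicator_code"]
            labelled.append({
                "country_code": row["country_code"],
                "year": int(row["year"]),
                # Fall back to the raw code if the series table lacks it.
                "indicator_name": names.get(code, code),
                "value": float(row["value"]),
            })
        return labelled


    for obs in label_observations(DATA_CSV, SERIES_CSV):
        print(obs)
    ```

    The same pattern would apply to wwbi_country.csv, keyed on country_code instead.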

    wwbi_country.csv

    variable | class | description
    country_code | character | 3-letter ISO_3166-1 code
    short_name | character | short or common name for the country
    table_name | character | more alphabetically sortable name of the country
    long_name | character | full name of the country
    x2_alpha_code | character | 2-letter ISO_3166-1 code
    currency_unit | character | currency unit
    special_notes | character | special notes
    region | character | region
    income_group | character | low, lower middle, upper middle, or high income
    wb_2_code | character | alternate 2-letter code
    national_accounts_base_year | integer | national accounts base year
    national_accounts_reference_year | integer | national accounts reference year
    sna_price_valuation | character | UN system of national accounts price valuation
    lending_category | character | International Development Association (IDA), International Bank of Reconstruction and Development (IBRD), a blend or neither
    other_groups | character | Heavily Indebted Poor Countries initiative (HIPC), or countries classified as the "Euro area"
    system_of_national_accounts | integer | which System of National Accounts methodology the country uses (1968, 1993, or 2008 version)
    balance_of_payments_manual_in_use | character | the version of the Balance of Payments Manual used by the country
    external_debt_reporting_status | character | estimate, preliminary, or actual
    system_of_trade | character | Under the general system imports include goods imported for domestic consumption and imports into bonded warehouses and free trade zones. Under the special system imports comprise goods imported for domestic consumption (including transformation and repair) and withdrawals for domestic consumption from bonded warehouses and free trade zones. Goods transported through a country en route to another are excluded.
    government_accounting_concept | character | government accounting concept
    imf_data_dissemination_standard | character | International Monetary Fund data-dissemination standard: Special Data Dissemination Standard (SDDS, 1996, created for countries that have or seek to have access to international markets), SDDS Plus (2012, the highest tier of data standards, intended for systemically important economies), enhanced GDDS (e-GDDS, 2015, encouraging participants to emphasize data publication)
    latest_household_survey | character | which household survey was most recently administered
    source_of_most_recent_income_and_expenditure_data | character | which survey serves as the basis for income and expenditure data
    vital_registration_complete | logical | whether the vital registration is complete
    latest_agricultural_census | integer | year of latest agricultural census
    latest_industrial_data | integer | year of latest industrial data
    latest_trade_datain...
  11. Working groups, gender and publication impact of Canada’s ecology and...

    • search.dataone.org
    Updated Mar 5, 2025
    Qian Wei; Diane Srivastava; Francois Lachapelle; Sylvia Fuller (2025). Working groups, gender and publication impact of Canada’s ecology and evolution faculty [Dataset]. https://search.dataone.org/view/sha256%3A20fcbcb6e5ea2dbd525f9edf9064e3c36e9031f27c0a725336e238663d680d86
    Dataset updated
    Mar 5, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Qian Wei; Diane Srivastava; Francois Lachapelle; Sylvia Fuller
    Time period covered
    Jan 1, 2024
    Description

    Working groups are recognized as a highly effective method for synthesizing science. It is less clear if participating in working groups benefits individual researchers, or if benefits differ between men and women. This is a critical question, for the working group method is not sustainable if the benefit to science comes at a cost to academic careers or gender equity. Here, we analyze the publications of Canadian university faculty specialized in ecology and evolution (N=1244), a field that has embraced the working group method. Researchers were more likely to have participated in a working group as their academic age and prior H-index increased, but controlling for these factors there was no effect of gender. Using a longitudinal analysis, we find that researcher H-indices accrue 14% faster following their first working group publication, regardless of gender. Part of this acceleration may be the 3- to 5-fold higher citation rate of working group synthesis publications. In a survey (N...

    We compiled information on 1,244 faculty members at Canadian universities who were funded by an NSERC Discovery grant (Evolution and Ecology subcommittee) between 1991 and 2019. This information included assumed binary gender from first names and institutional website use of pronouns and photographs (coded men, women); we acknowledge that we may have mis-assigned gender or failed to notice non-binary, transitional or fluid gender identities. We also collected information on the researcher’s year of PhD and all institutions they were affiliated with during their research career. This information was obtained from public curriculum vitae, institutional websites, personally-maintained researcher websites, academic networking platforms (LinkedIn, Research Gate), Google Scholar, and other public sources such as obituaries. For each researcher, we reconstructed their H-index through time using (1) a compiled list of their peer-reviewed publications and (2) the citations for each publication, f...

    Data from: "Working groups, gender and publication impact of Canada’s ecology and evolution faculty"

    Description of the Data and file structure

    This readme file describes the (1) scripts and (2) datafiles included in this repository. Missing data in data files are indicated as NA. All data files are in .csv format, meaning that “,” is used as the separator.

    (1) Scripts

    paper_analysis.do:

    This do file is written using Stata 18.0. The script analyses whether working group (WG) experience has a significant impact on researchers' H-index progression, and whether this benefit or WG participation is gendered.

    Input: researcher_database.csv Output: Table 1, Table 2, Figure 1(a) (b) (c), Figure 3(a) (b)
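
    For readers unfamiliar with the metric this script tracks, the H-index is the largest h such that a researcher has h publications with at least h citations each. A minimal sketch of that standard definition in Python; this is not the authors' Stata code, and the citation counts are hypothetical:

    ```python
    def h_index(citations: list) -> int:
        """Largest h such that h papers have at least h citations each."""
        ranked = sorted(citations, reverse=True)
        h = 0
        for rank, cites in enumerate(ranked, start=1):
            if cites >= rank:
                h = rank
            else:
                break  # later papers have even fewer citations
        return h


    # Hypothetical citation counts for one researcher.
    print(h_index([10, 8, 5, 4, 3]))  # 4
    ```

    Reconstructing the index through time, as the dataset description mentions, would amount to repeating this calculation on the citations accumulated up to each year.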

    SSS_citations_public.R:

    This script is written in the programming language R. The script analyses the effect of research type and research method on the citations of publications using generalized linear models. The script also plots this data.

    Input: syn_sc_socio_public_...

  12. PAAL ADL Accelerometry dataset v2.0

    • zenodo.org
    • observatorio-cientifico.ua.es
    csv, zip
    Updated Jan 12, 2022
    Pau Climent-Pérez; Ángela María Muñoz-Antón; Angelica Poli; Susanna Spinsante; Francisco Florez-Revuelta (2022). PAAL ADL Accelerometry dataset v2.0 [Dataset]. http://doi.org/10.5281/zenodo.5785955
    Explore at:
    Available download formats: csv, zip
    Dataset updated
    Jan 12, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Pau Climent-Pérez; Ángela María Muñoz-Antón; Angelica Poli; Susanna Spinsante; Francisco Florez-Revuelta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The PAAL ADL Accelerometry dataset (v2.0) has been acquired with a high-quality wearable multisensor device, the Empatica E4. In this dataset, among the signals collected by the sensors embedded in the Empatica E4, only the acceleration has been extracted to monitor the users performing different activities of daily living (ADLs). To promote the real-life acquisition procedure, subjects acted in their natural environment, with no instructions about how and for how long to perform each activity (other than a minimum time). The device was worn on the dominant hand.

    The dataset includes 24 different ADLs performed using real objects. Each activity was repeated between 3 and 5 times (on average) by 52 healthy subjects, characterized by a gender balance (26 women and 26 men), and a large age range (between 18 and 77 years, mean = 44.08 years and standard deviation = 17.06 years).

    The PAAL ADL Accelerometry dataset (v2.0) is composed of three files:

    • users.csv: each line contains (participant id, gender, age) of each user performing the ADLs in the dataset. N.B: gender labels are 'man' and 'woman'.
    • ADLs.csv: each line contains (ADL id, ADL name)
    • data.zip: folder with 6,072 files of accelerometer data of users performing ADLs. The name of each file indicates the name of the ADL, the user id and the repetition. Each row in the files represents the continuous gravitational force (g) applied to each of the three spatial dimensions (x, y, and z). The scale is limited to [-2g, +2g]. The sampling frequency is 32 Hz, with a resolution of 0.015 g (8 bit). More information about the format here.
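
    The figures quoted for data.zip are mutually consistent: a [-2g, +2g] range spread over 8 bits gives a step of about 0.015 g, and a 32 Hz sampling frequency turns a row count into a duration. A minimal sketch of that arithmetic, assuming uniform quantization over the full range and one sample per row (both assumptions; the sample count below is hypothetical):

    ```python
    SAMPLING_HZ = 32          # sampling frequency stated in the description
    RANGE_G = (-2.0, 2.0)     # scale limits stated in the description
    BITS = 8                  # bit depth stated in the description


    def quantisation_step(range_g: tuple, bits: int) -> float:
        """Step size in g, assuming uniform quantization over the range."""
        low, high = range_g
        return (high - low) / (2 ** bits)


    def duration_seconds(n_samples: int, sampling_hz: int = SAMPLING_HZ) -> float:
        """Recording length implied by a row count at the given rate."""
        return n_samples / sampling_hz


    print(quantisation_step(RANGE_G, BITS))  # 0.015625
    print(duration_seconds(320))             # 10.0 (hypothetical 320-row file)
    ```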