63 datasets found
  1. How to Rank Journals

    • plos.figshare.com
    txt
    Updated May 31, 2023
    Corey J. A. Bradshaw; Barry W. Brook (2023). How to Rank Journals [Dataset]. http://doi.org/10.1371/journal.pone.0149852
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Corey J. A. Bradshaw; Barry W. Brook
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    There are now many methods available to assess the relative citation performance of peer-reviewed journals. Regardless of their individual faults and advantages, citation-based metrics are used by researchers to maximize the citation potential of their articles, and by employers to rank academic track records. The absolute value of any particular index is arguably meaningless unless compared to other journals, and different metrics result in divergent rankings. To provide a simple yet more objective way to rank journals within and among disciplines, we developed a κ-resampled composite journal rank incorporating five popular citation indices: Impact Factor, Immediacy Index, Source-Normalized Impact Per Paper, SCImago Journal Rank and Google 5-year h-index; this approach provides an index of relative rank uncertainty. We applied the approach to six sample sets of scientific journals from Ecology (n = 100 journals), Medicine (n = 100), Multidisciplinary (n = 50), Ecology + Multidisciplinary (n = 25), Obstetrics & Gynaecology (n = 25) and Marine Biology & Fisheries (n = 25). We then cross-compared the κ-resampled ranking for the Ecology + Multidisciplinary journal set to the results of a survey of 188 publishing ecologists who were asked to rank the same journals, and found a 0.68–0.84 Spearman’s ρ correlation between the two ranking datasets. Our composite index approach therefore approximates relative journal reputation, at least for that discipline. Agglomerative and divisive clustering and multi-dimensional scaling techniques applied to the Ecology + Multidisciplinary journal set identified specific clusters of similarly ranked journals, with only Nature & Science separating out from the others. When comparing a selection of journals within or among disciplines, we recommend collecting multiple citation-based metrics for a sample of relevant and realistic journals to calculate the composite rankings and their relative uncertainty windows.
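    The composite-ranking idea can be illustrated with a short Python sketch. It assumes the composite is each journal's mean rank across the metrics, re-ranked so that 1 = best. The authors' κ-resampling scheme is not reproduced here. As a stand-in, the sketch bootstraps over the set of metrics to produce a rank window. The metric names and scores are hypothetical, and `spearman_rho` uses the tie-free formula for comparing two rankings, such as a composite ranking against a survey ranking.

```python
import random

def rank_positions(scores, higher_is_better=True):
    """1-based rank positions (1 = best); ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    positions = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        positions[i] = pos
    return positions

def composite_rank(metrics, names=None):
    """Average each journal's rank over the chosen metrics, then re-rank (1 = best)."""
    names = list(metrics) if names is None else names
    per_metric = [rank_positions(metrics[name]) for name in names]
    n = len(per_metric[0])
    mean_ranks = [sum(r[j] for r in per_metric) / len(per_metric) for j in range(n)]
    return rank_positions(mean_ranks, higher_is_better=False)  # lower mean rank is better

def rank_window(metrics, iterations=1000, seed=42):
    """Stand-in uncertainty window (NOT the authors' kappa scheme): bootstrap the
    metric set with replacement and keep each journal's min/max composite rank."""
    rng = random.Random(seed)
    names = list(metrics)
    n = len(metrics[names[0]])
    lo, hi = [n] * n, [1] * n
    for _ in range(iterations):
        drawn = [rng.choice(names) for _ in names]
        for j, r in enumerate(composite_rank(metrics, drawn)):
            lo[j], hi[j] = min(lo[j], r), max(hi[j], r)
    return list(zip(lo, hi))

def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two tie-free rankings of the same n items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical scores for four journals (higher = better for every metric).
metrics = {
    "IF":  [40.1, 5.2, 3.9, 2.1],
    "SJR": [20.3, 3.1, 2.8, 1.0],
    "h5":  [350, 60, 70, 25],
}
print(composite_rank(metrics))  # [1, 2, 3, 4]
print(rank_window(metrics))     # (best, worst) composite rank per journal
```

    Bootstrapping over metrics is only one way to express rank uncertainty. Resampling journals or metric values would give different windows.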

  2. Supplementary data on journal quartiles and citation indicators across...

    • zenodo.org
    png
    Updated Apr 13, 2025
    Serhii Nazarovets (2025). Supplementary data on journal quartiles and citation indicators across disciplines [Dataset]. http://doi.org/10.5281/zenodo.15206056
    Available download formats: png
    Dataset updated
    Apr 13, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Serhii Nazarovets
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset provides supplementary data extracted and processed from the SCImago Journal Rank portal (2023) and the Scopus Discontinued Titles list (February 2025). It includes journal-level metrics such as SJR and h-index, quartile assignments, and subject category information. The files are intended to support exploratory analysis of citation patterns, disciplinary variations, and structural characteristics of journal evaluation systems. The dataset also contains Python code and visual materials used to examine relationships between prestige metrics and cumulative citation indicators.

    Contents:

    • Scimago Journal Rank 2023.xlsx – full SJR dataset with quartile and h-index data.
    • Q1 journals with h-index below 5 (SJR 2023).xlsx – filtered subset of Q1 journals with low citation impact.
    • Relationship between journal h-index and SJR 2023.png – visualization of SJR vs h-index by quartile.
    • Scopus Discontinued Titles (Feb 2025) – list of discontinued sources from Scopus used for consistency checks.
    • Python script for data processing and visualization.
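    The following is a minimal sketch of the kind of filter behind the low-h-index Q1 subset. It assumes the records have already been loaded from the spreadsheet. The column names (`Quartile`, `H index`), the threshold parameter and the sample rows are assumptions for illustration, not the file's actual layout, and this is not the dataset's own Python script.

```python
from collections import Counter

# Hypothetical records; the real data are in the .xlsx files, and these
# column names are assumptions made for illustration only.
rows = [
    {"Title": "Journal A", "Quartile": "Q1", "H index": 120},
    {"Title": "Journal B", "Quartile": "Q1", "H index": 3},
    {"Title": "Journal C", "Quartile": "Q2", "H index": 4},
    {"Title": "Journal D", "Quartile": "Q1", "H index": 2},
    {"Title": "Journal E", "Quartile": "Q4", "H index": 15},
]

def low_h_in_quartile(rows, quartile="Q1", threshold=5):
    """Titles in the given quartile whose h-index is below the threshold."""
    return [r["Title"] for r in rows
            if r["Quartile"] == quartile and r["H index"] < threshold]

def share_below(rows, quartile="Q1", threshold=5):
    """Fraction of journals in the quartile that fall below the h-index threshold."""
    in_q = [r for r in rows if r["Quartile"] == quartile]
    return len(low_h_in_quartile(rows, quartile, threshold)) / len(in_q) if in_q else 0.0

print(low_h_in_quartile(rows))               # ['Journal B', 'Journal D']
print(Counter(r["Quartile"] for r in rows))  # journals per quartile
```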
  3. Data from: Publication rates of editorial board members in oral health...

    • datasetcatalog.nlm.nih.gov
    • scielo.figshare.com
    Updated Sep 26, 2018
    RÖSING, Cassiano Kuchenbecker; HAAS, Alex Nogueira; JUNGES, Roger (2018). Publication rates of editorial board members in oral health journals [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000678944
    Dataset updated
    Sep 26, 2018
    Authors
    RÖSING, Cassiano Kuchenbecker; HAAS, Alex Nogueira; JUNGES, Roger
    Description

    The aim of this study was to measure the publication rate of editorial board members in their board journals and to evaluate associated variables. We evaluated the ten highest-ranked journals according to the 5-year impact factor in the ‘Dentistry, Oral Surgery & Medicine’ subject category for 2010, 2011, and 2012. All original research papers with at least one member of the editorial board as author were counted. Final analyses assessed associated variables such as size of the editorial board, number of papers published each year, and each journal’s impact factor. Overall, there was an increase in the average number of articles published from 2010 (115.2 ± 52.2) to 2012 (134.7 ± 47.4). The number and percentage of articles published with editorial board members as authors over the three years did not follow the same pattern, with a slight decrease from 2010 to 2011 and an increase in 2012. The number of articles with editorial board members as authors was significantly higher for journals with impact factors ≥4.0. Journals with a higher impact factor and a larger editorial board were associated with higher chances of editorial board members publishing in their respective journals. Participation of editorial board members as authors varies significantly among journals.
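    The publication-rate measure described above can be sketched as follows. This is an illustration, not the authors' procedure. It assumes each paper is represented as a list of author names and that board members are matched by exact (case-insensitive) name. All names are hypothetical.

```python
def board_publication_rate(articles, board_members):
    """Return (count, percent) of articles with at least one editorial board member
    among the authors. Names are matched exactly (case-insensitive), which is a
    simplification: real author disambiguation is harder."""
    board = {name.casefold() for name in board_members}
    with_board = sum(1 for authors in articles
                     if any(a.casefold() in board for a in authors))
    percent = 100 * with_board / len(articles) if articles else 0.0
    return with_board, percent

# Hypothetical author lists for four original research papers in one journal-year.
articles = [["A. Silva", "B. Costa"], ["C. Lee"], ["D. Haas", "E. Rossi"], ["F. Ng"]]
board = ["b. costa", "D. Haas"]
print(board_publication_rate(articles, board))  # (2, 50.0)
```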

  4. Top 100-Ranked Clinical Journals' Preprint Policies as of April 23, 2020

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Sep 6, 2020
    Dorothy Massey; Joshua Wallach; Joseph Ross; Michelle Opare; Harlan Krumholz (2020). Top 100-Ranked Clinical Journals' Preprint Policies as of April 23, 2020 [Dataset]. http://doi.org/10.5061/dryad.jdfn2z38f
    Available download formats: zip
    Dataset updated
    Sep 6, 2020
    Dataset provided by
    Yale University
    Yale New Haven Hospital
    Yale School of Public Health
    Authors
    Dorothy Massey; Joshua Wallach; Joseph Ross; Michelle Opare; Harlan Krumholz
    License

    CC0 1.0: https://spdx.org/licenses/CC0-1.0.html

    Description

    Objective: To determine the top 100-ranked (by impact factor) clinical journals' policies toward publishing research previously published on preprint servers (preprints).

    Design: Cross-sectional.

    Main outcome measures: Editorial guidelines toward preprints and journal rank by impact factor.

    Results: 86 (86%) of the journals examined will consider papers previously published as preprints, 13 (13%) decide on a case-by-case basis, and 1 (1%) does not allow preprints.

    Conclusions: We found wide acceptance of publishing preprints in the clinical research community, although researchers may still face uncertainty that their preprints will be accepted by all of their target journals.

    Methods: We examined the preprint policies of the 100 top-ranked clinical journals using the 2018 impact factors as reported by InCites Journal Citation Reports (JCR). First, we examined all journals with an impact factor greater than 5, and then we manually screened by title and category to identify the first 100 clinical journals. We included only those that publish original research. Next, we checked each journal's editorial policy on preprints. We examined, in order, the journal website, the publisher website, the Transpose Database, and the first 10 pages of a Google search with the journal name and the term "preprint." We classified each journal's policy, as shown in this dataset, as allowing preprints, deciding on a case-by-case basis, or not allowing any preprints. We collected data on April 23, 2020.

    (Full methods can also be found in the previously published paper.)
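    The n (%) figures in the Results follow from a simple tally of the three policy categories. The following is a Python sketch. The label strings are hypothetical, and the counts are the ones reported in the Results.

```python
from collections import Counter

def policy_summary(policies):
    """Map each policy label to (count, percent of journals), percent rounded to 0.1."""
    counts = Counter(policies)
    total = len(policies)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}

# Labels are illustrative; the counts are those reported in the Results.
policies = ["allows preprints"] * 86 + ["case-by-case"] * 13 + ["does not allow"]
print(policy_summary(policies))
```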

  5. Drivers and Barriers for Open Access Publishing - WoS 2016 Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Sergio Ruiz-Perez (2020). Drivers and Barriers for Open Access Publishing - WoS 2016 Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_842012
    Dataset updated
    Jan 24, 2020
    Authors
    Sergio Ruiz-Perez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Answers to a survey on gold Open Access run from July to October 2016. The dataset contains 15,235 unique responses from Web of Science published authors. This survey is part of a PhD thesis from the University of Granada in Spain. More details about the study can be found in the full text document, also available in Zenodo.

    The questions related to the WoS 2016 dataset are listed below. Please note that countries with fewer than 40 answers are listed as "Other" in order to preserve anonymity.

    • 1. How many years have you been employed in research?

    Fewer than 5 years

    5-14 years

    15-24 years

    25 years or longer

    Many of the questions that follow concern Open Access publishing. For the purposes of this survey, an article is Open Access if its final, peer-reviewed, version is published online by a journal and is free of charge to all users without restrictions on access or use.

    • 2. Do any journals in your research field publish Open Access articles?

    Yes

    No

    I do not know

    • 3. Do you think your research field benefits, or would benefit from journals that publish Open Access articles?

    Yes

    No

    I have no opinion

    I do not care

    • 4. How many peer reviewed research articles (Open Access or not Open Access) have you published in the last five years?

    1-5

    6-10

    11-20

    21-50

    More than 50

    • 5. What factors are important to you when selecting a journal to publish in?

    [Each factor may be rated "Extremely important", "Important", "Less important" or "Irrelevant". The factors are presented in random order.]

    Importance of the journal for academic promotion, tenure or assessment

    Recommendation of the journal by my colleagues

    Positive experience with publisher/editor(s) of the journal

    The journal is an Open Access journal

    Relevance of the journal for my community

    The journal fits the policy of my organisation

    Prestige/perceived quality of the journal

    Likelihood of article acceptance in the journal

    Absence of journal publication fees (e.g. submission charges, page charges, colour charges)

    Copyright policy of the journal

    Journal Impact Factor

    Speed of publication of the journal

    • 6. Who usually decides which journals your articles are submitted to? (Choose more than one answer if applicable)

    The decision is my own

    A collective decision is made with my fellow authors

    I am advised where to publish by a senior colleague

    The organisation that finances my research advises me where to publish

    Other (please specify) [Text box follows]

    • 7. Approximately how many Open Access articles have you published in the last five years?

    0

    1-5

    6-10

    More than 10

    I do not know

    [If the answer is "0", the survey jumps to Q10.]

    • 8. What publication fee was charged for the last Open Access article you published?

    No charge

    Up to €250 ($275)

    €251-€500 ($275-$550)

    €501-€1000 ($551-$1100)

    €1001-€3000 ($1101-$3300)

    More than €3000 ($3300)

    I do not know

    [If the answer is "No charge" or "I do not know", the survey jumps to Q20.]

    • 9. How was this publication fee covered? (Choose more than one answer if applicable)

    My research funding includes money for paying such fees

    I used part of my research funding not specifically intended for paying such fees

    My institution paid the fees

    I paid the costs myself

    Other (please specify) [Text box follows]

    • 10. How easy is it to obtain funding if needed for Open Access publishing from your institution or the organisation mainly responsible for financing your research?

    Easy

    Difficult

    I have not used these sources

    • 11. Listed below are a series of statements, both positive and negative, concerning Open Access publishing. Please indicate how strongly you agree/disagree with each statement.

    [Each statement may be rated "Strongly agree", "Agree", "Neither agree nor disagree", "Disagree" or "Strongly disagree". The statements are presented in random order.]

    Researchers should retain the rights to their published work and allow it to be used by others

    Open Access publishing undermines the system of peer review

    Open Access publishing leads to an increase in the publication of poor quality research

    If authors pay publication fees to make their articles Open Access, there will be less money available for research

    It is not beneficial for the general public to have access to published scientific and medical articles

    Open Access unfairly penalises research-intensive institutions with large publication output by making them pay high costs for publication

    Publicly-funded research should be made available to be read and used without access barrier

    Open Access publishing is more cost-effective than subscription-based publishing and so will benefit public investment in research

    Articles that are available by Open Access are likely to be read and cited more often than those not Open Access

    This study and its questionnaire are based on the SOAP Project (http://project-soap.eu). An article describing the highlights of the SOAP Survey is available at: https://arxiv.org/abs/1101.5260. The dataset of the SOAP survey is available at http://bit.ly/gSmm71. A manual describing the SOAP dataset is available at http://bit.ly/gI8nc.
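    The anonymization rule for countries (fewer than 40 answers reported as "Other") amounts to collapsing rare categories. The following is a minimal sketch. It assumes the answers are available as a plain list of country names. The function name and sample values are hypothetical.

```python
from collections import Counter

def collapse_rare(values, min_count=40, other="Other"):
    """Replace any category seen fewer than min_count times with a catch-all label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Hypothetical country answers: 50 + 39 + 1 responses.
answers = ["Spain"] * 50 + ["Chile"] * 39 + ["Peru"]
print(Counter(collapse_rare(answers)))  # Counter({'Spain': 50, 'Other': 40})
```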

  6. Data from: Distribution of Female and Male First and Last Authorship across...

    • acs.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 23, 2023
    Jacqueline E. McLaughlin; Jacob M. Bachelder; Kristy M. Ainslie (2023). Distribution of Female and Male First and Last Authorship across Drug Delivery Related Journals with Respect to Year and Journal Impact Factor [Dataset]. http://doi.org/10.1021/acs.molpharmaceut.3c00328.s002
    Available download formats: xlsx
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    ACS Publications
    Authors
    Jacqueline E. McLaughlin; Jacob M. Bachelder; Kristy M. Ainslie
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    First and last authorship are important metrics of productivity and scholarly success for trainees and professors. For 11 drug delivery-related journals in 2021, the percentage of female first (39.5%) and last (25.7%) authorship was reported. A strong negative correlation with journal impact factor was observed for both female first (r_p = −0.73) and female last authorship (r_p = −0.66). In contrast, male first and last authorship showed a strong positive correlation with journal impact factor (r_p = 0.71). Papers were ∼1.5 times more likely to have a male first author, and ∼3 times more likely to have a male last author, than a female one. A female was 22% more likely to have first authorship if the last author was female, although there is an ∼1% increase per year in female authorship with male last authorship, which equates to equality in first authorship by 2044. Considering that drug delivery is composed of engineering, chemistry, and pharmaceutical science disciplines, the observed 25.7% female last authorship does not reflect the approximately 35.5% to 50% of professors in these disciplines, internationally, who are female. Overall, female authorship in drug delivery-related journals should improve to better represent the work of female senior authors.
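    The correlations above are Pearson coefficients between a per-journal authorship share and journal impact factor. The following is a minimal sketch of that computation. The journal values are entirely hypothetical, and the sketch does not reproduce the reported coefficients.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-journal values: impact factor vs. share of female first authors (%).
impact_factor = [2.5, 3.8, 4.6, 5.9, 7.2, 10.1]
female_first_pct = [45.0, 42.0, 44.0, 38.0, 35.0, 31.0]
print(round(pearson_r(impact_factor, female_first_pct), 2))
```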

  7. Annals of Botany Impact Factor 2024-2025 - ResearchHelpDesk

    • researchhelpdesk.org
    Updated Feb 23, 2022
    Research Help Desk (2022). Annals of Botany Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/331/annals-of-botany
    Dataset updated
    Feb 23, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    Annals of Botany is an international plant science journal publishing novel and rigorous research in all areas of plant science. It is published monthly in both electronic and printed forms, with at least two extra issues each year that focus on a particular theme in plant biology. The Journal is managed by the Annals of Botany Company, a not-for-profit educational charity established to promote plant science worldwide. The Journal publishes original research papers, invited and submitted review articles, ‘Research in Context’ papers expanding on original work, ‘Botanical Briefings’ as short overviews of important topics, and ‘Viewpoints’ giving opinions. All papers in each issue are summarized briefly in Content Snapshots; there are also topical news items in the Plant Cuttings section, and Book Reviews. A rigorous review process ensures that readers are exposed to genuine and novel advances across a wide spectrum of botanical knowledge. All papers aim to advance knowledge and make a difference to our understanding of plant science. The Annals of Botany Company is a Limited Company registered in England No. 78001 at University of Exeter, Innovation Centre, Rennes Drive, Exeter EX4 4RN, UK, and is also a Registered Charity, No. 237771.
    Abstract & indexing: The journal is covered by the following services: Abstracts on Hygiene and Communicable Diseases; Agricultural Engineering Abstracts; Agbiotech News and Information; Abstracts in Anthropology; Agroforestry Abstracts; Aquatic Sciences and Fisheries Abstracts; ASCI Database; Biocontrol News and Information; Biological Abstracts; Biological and Agricultural Index; BIOSIS Previews; CAB Abstracts; Chemical Abstracts; Crop Physiology Abstracts; Current Contents®/Agriculture, Biology, and Environmental Sciences; Dairy Science Abstracts; Ecology Abstracts; Elsevier BIOBASE - Current Awareness in Biological Sciences (CABS); Environmental Science and Pollution Management; Excerpta Medica Abstract Journals; Field Crop Abstracts; Food Science and Technology Abstracts; Forest Products Abstracts; Forestry Abstracts; Geobase; Global Health; Grasslands & Forage Abstracts; Horticultural Abstracts; Irrigation and Drainage Abstracts; Journal Citation Reports/Science Edition; Maize Abstracts Online; Nematological Abstracts; Nutrition Abstracts and Reviews; Oceanic Abstracts; Ornamental Horticulture; Plant Breeding Abstracts; Plant Genetic Resources Abstracts; Plant Growth Regulators; Post Harvest News and Information; Potato Abstracts; Poultry Abstracts; PROQUEST DATABASE: AGRICOLA PlusText; PROQUEST DATABASE: MEDLINE with Full Text; PROQUEST DATABASE: ProQuest 5000 International; PROQUEST DATABASE: ProQuest Agriculture Journals; PROQUEST DATABASE: ProQuest Biology Journals; PROQUEST DATABASE: ProQuest Central; PROQUEST DATABASE: ProQuest Health & Medical Complete; PROQUEST DATABASE: ProQuest Medical Library; PROQUEST DATABASE: ProQuest Pharma Collection; PubMed; Review of Agricultural Entomology; Review of Aromatic and Medicinal Plants; Review of Plant Pathology; Rice Abstracts; Science Citation Index Expanded (SciSearch®); Science Citation Index®; Seed Abstracts; Soybean Abstracts; Sugar Industry Abstracts; The Standard Periodical Directory; VITIS - Viticulture and Oenology Abstracts; Water Resources Abstracts; Weed Abstracts; Wheat, Barley and Triticale Abstracts; Wildlife Review.

  8. Remote Work Health Impact Survey 2025

    • kaggle.com
    zip
    Updated Jul 1, 2025
    Pratyush Puri (2025). Remote Work Health Impact Survey 2025 [Dataset]. https://www.kaggle.com/datasets/pratyushpuri/remote-work-health-impact-survey-2025/data
    Available download formats: zip (45354 bytes)
    Dataset updated
    Jul 1, 2025
    Authors
    Pratyush Puri
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Post-Pandemic Remote Work Health Impact 2025 dataset presents a comprehensive, global snapshot of how remote, hybrid, and onsite work arrangements are influencing the mental and physical health of employees in the post-pandemic era. Collected in June 2025, this dataset aggregates responses from a diverse workforce spanning continents, industries, age groups, and job roles. It is designed to support research, data analysis, and policy-making around the evolving landscape of work and well-being.

    This dataset enables in-depth exploration of:

    • The prevalence of mental health conditions (e.g., anxiety, burnout, PTSD, depression) across different work setups.
    • The relationship between work arrangements and physical health complaints (e.g., back pain, eye strain, neck pain).
    • Variations in work-life balance, social isolation, and burnout levels segmented by demographic and occupational factors.
    • Salary distributions and their correlation with health outcomes and job roles.

    By providing granular, anonymized data on both subjective (self-reported) and objective (hours worked, salary range) factors, this resource empowers data scientists, health researchers, HR professionals, and business leaders to:

    • Identify risk factors and protective factors for employee well-being.
    • Benchmark health impacts across industries and regions.
    • Inform organizational policy and future-of-work strategies.

    Dataset Structure

    The dataset is in CSV format, with each row representing an individual survey response. Below is a detailed explanation of each column:

    • Survey_Date: Date when the survey response was submitted (YYYY-MM-DD). Examples: 2025-06-01
    • Age: Age of the respondent (in years). Examples: 27, 52, 40
    • Gender: Gender identity of the respondent. Examples: Female, Male, Non-binary, Prefer not to say
    • Region: Geographical region of employment. Examples: Asia, Europe, North America, Africa, Oceania
    • Industry: Industry sector of the respondent. Examples: Technology, Manufacturing, Finance, Healthcare
    • Job_Role: Specific job title or function. Examples: Data Analyst, HR Manager, Software Engineer
    • Work_Arrangement: Primary work mode. Examples: Onsite, Remote, Hybrid
    • Hours_Per_Week: Average number of hours worked per week. Examples: 36, 55, 64
    • Mental_Health_Status: Primary self-reported mental health condition. Examples: Anxiety, Burnout, Depression, None, PTSD
    • Burnout_Level: Self-assessed burnout (categorical: Low, Medium, High). Examples: High, Medium, Low
    • Work_Life_Balance_Score: Self-rated work-life balance on a scale of 1 (poor) to 5 (excellent). Examples: 1, 3, 5
    • Physical_Health_Issues: Self-reported physical health complaints (semicolon-separated if multiple). Examples: Back Pain; Eye Strain; Neck Pain; None
    • Social_Isolation_Score: Self-rated social isolation on a scale of 1 (none) to 5 (severe). Examples: 1, 2, 5
    • Salary_Range: Annual salary range in USD. Examples: $40K-60K, $80K-100K, $120K+

    Example Data Row

    Survey_Date: 2025-06-01
    Age: 27
    Gender: Female
    Region: Asia
    Industry: Professional Services
    Job_Role: Data Analyst
    Work_Arrangement: Onsite
    Hours_Per_Week: 64
    Mental_Health_Status: Stress Disorder
    Burnout_Level: High
    Work_Life_Balance_Score: 3
    Physical_Health_Issues: Shoulder Pain; Neck Pain
    Social_Isolation_Score: 2
    Salary_Range: $40K-60K
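    Because `Physical_Health_Issues` packs several complaints into one semicolon-separated field, the field has to be split before counting. The following is a sketch using Python's standard `csv` module. It assumes comma-separated columns with a header row and keeps only a subset of columns. Apart from the first row, which is taken from the example above, the rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# First data row mirrors the example above (subset of columns); the other two
# rows are hypothetical. The real file has all the columns described above.
SAMPLE = """Survey_Date,Age,Work_Arrangement,Physical_Health_Issues,Burnout_Level
2025-06-01,27,Onsite,Shoulder Pain; Neck Pain,High
2025-06-02,41,Remote,Back Pain; Eye Strain,Medium
2025-06-03,35,Remote,None,Low
"""

def issues_by_arrangement(text):
    """Count physical-health complaints per work arrangement, skipping 'None'."""
    tally = defaultdict(Counter)
    for row in csv.DictReader(io.StringIO(text)):
        issues = (s.strip() for s in row["Physical_Health_Issues"].split(";"))
        tally[row["Work_Arrangement"]].update(i for i in issues if i and i != "None")
    return tally

for arrangement, counts in issues_by_arrangement(SAMPLE).items():
    print(arrangement, dict(counts))
```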

    Key Features

    • Global Coverage: Responses from all ...
  9. Data Visualization in Social Work Research

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Rothwell, David; Esposito, Tonino; Wegner-Lohin (2023). Data Visualization in Social Work Research [Dataset]. http://doi.org/10.7910/DVN/I6IIXL
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Rothwell, David; Esposito, Tonino; Wegner-Lohin
    Time period covered
    Jan 1, 2009 - Jan 1, 2012
    Description

    Research dissemination and knowledge translation are imperative in social work. Methodological developments in data visualization techniques have improved the ability to convey meaning and reduce erroneous conclusions. The purpose of this project is to examine: (1) How are empirical results presented visually in social work research? (2) To what extent do top social work journals vary in the publication of data visualization techniques? (3) What is the predominant type of analysis presented in tables and graphs? (4) How can current data visualization methods be improved to increase understanding of social work research?

    Method: A database was built from a systematic literature review of the four most recent issues of Social Work Research and six other highly ranked social work journals, based on the 2009 5-year impact factor (Thomson Reuters ISI Web of Knowledge). Overall, 294 articles were reviewed. Articles without any form of data visualization were not included in the final database. The number of articles reviewed per journal was: Child Abuse & Neglect (38), Child Maltreatment (30), American Journal of Community Psychology (31), Family Relations (36), Social Work (29), Children and Youth Services Review (112), and Social Work Research (18). Articles with any type of data visualization (table, graph, other) were included in the database and coded sequentially by two reviewers based on the type of visualization method and type of analyses presented (descriptive, bivariate, measurement, estimate, predicted value, other). Additional review by the entire research team was required for 68 articles. Codes were discussed until 100% agreement was reached. The final database includes 824 data visualization entries.
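    Two checks implied by the method can be sketched in Python. First, the per-journal counts should sum to the stated 294 articles. Second, disagreements between the two reviewers' codes identify the entries that needed discussion. The coder labels below are hypothetical. Percent agreement is only an illustration, since the description does not say which agreement statistic, if any, was computed before discussion.

```python
# Per-journal article counts as reported in the description above.
ARTICLES_REVIEWED = {
    "Child Abuse & Neglect": 38,
    "Child Maltreatment": 30,
    "American Journal of Community Psychology": 31,
    "Family Relations": 36,
    "Social Work": 29,
    "Children and Youth Services Review": 112,
    "Social Work Research": 18,
}

def percent_agreement(codes_a, codes_b):
    """Share (in %) of items that two coders labelled identically."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)

def to_discuss(codes_a, codes_b):
    """Indices where the coders disagree, i.e. entries to resolve by discussion."""
    return [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a != b]

print(sum(ARTICLES_REVIEWED.values()))  # 294, matching the stated total

# Hypothetical visualization-type codes from two reviewers.
reviewer_1 = ["table", "graph", "table", "other"]
reviewer_2 = ["table", "table", "table", "other"]
print(percent_agreement(reviewer_1, reviewer_2), to_discuss(reviewer_1, reviewer_2))  # 75.0 [1]
```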

  10. Sample characteristics of students in transdisciplinary (TD) and traditional...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 1, 2023
    Anna-Sigrid Keck; Stephanie Sloane; Janet M. Liechty; Barbara H. Fiese; Sharon M. Donovan (2023). Sample characteristics of students in transdisciplinary (TD) and traditional doctoral programs at time of enrollment and advisor characteristics at program year 5. [Dataset]. http://doi.org/10.1371/journal.pone.0189391.t001
    Available download formats: xls
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Anna-Sigrid Keck; Stephanie Sloane; Janet M. Liechty; Barbara H. Fiese; Sharon M. Donovan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample characteristics of students in transdisciplinary (TD) and traditional doctoral programs at time of enrollment and advisor characteristics at program year 5.

  11. Regression Results, Poisson Model, Equation (5): Economics.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 4, 2023
    David I. Stern (2023). Regression Results, Poisson Model, Equation (5): Economics. [Dataset]. http://doi.org/10.1371/journal.pone.0112520.t004
    Available download formats: xls
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOShttp://plos.org/
    Authors
    David I. Stern
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Standard errors in parentheses. Regression Results, Poisson Model, Equation (5): Economics.

  12. Characterization Factors for Construction Material EPD Indicators...

    • catalog.data.gov
    Updated May 2, 2024
    Office of Chemical Safety and Pollution Prevention (2024). Characterization Factors for Construction Material EPD Indicators (ISO21930-LCIA-US) v0.1 [Dataset]. https://catalog.data.gov/dataset/characterization-factors-for-construction-material-epd-indicators-iso21930-lcia-us-v0-1
    Dataset updated
    May 2, 2024
    Dataset provided by
    Office of Chemical Safety and Pollution Prevention
    Description

    This dataset contains characterization factors (CFs) for the five mandatory life cycle impact assessment (LCIA) categories required in ISO 21930:2017: 1. Greenhouse gases (GHG), which is incorrectly named ‘GWP’ in the standard, 2. Ozone Depletion Potential (ODP), 3. Eutrophication Potential (EP), 4. Acidification Potential (AP), and 5. Photochemical Ozone Formation Potential (POCP). These CFs are appropriate for use with life cycle inventory data for activities occurring within the United States. The short name for the dataset is ISO21930-LCIA-US v0.1. With the exception of GHGs, the characterization factors are identical to those currently in TRACI v2.1 for the corresponding impact categories. These four TRACI v2.1 impact categories have the same names as in ISO 21930:2017, except POCP, which is called “smog formation” in TRACI v2.1. The characterization factors for GHGs are the 100-year global warming potentials (GWP-100) from the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5). The names for the chemicals, release contexts, units and IDs are from the Federal LCA Elementary Flow List (FEDEFL) v1.2. These datasets were created using the LCIA Formatter v1.1.2 (https://github.com/USEPA/LCIAformatter).

    Formats

    Datasets are provided as simple tables in Excel, in the openLCA JSON-LD format using Federal LCA Commons standards, and in Apache Parquet format. The fields in the Excel and identical Parquet versions use the LCIAmethod format fields: https://github.com/USEPA/LCIAformatter/blob/master/format%20specs/LCIAmethod.md

    1. Zip archives of JSON files in the JSON-LD schema, a file type associated with the openLCA schema. Two JSON-LD versions are provided:
      a. “ISO21930-LCIA-USv0.1_noflows_json-ld.zip” is without flow objects.
      b. “ISO21930-LCIA-USv0.1_wprefflows_json-ld.zip” includes flow objects for preferred flows from the FEDEFL. See the usage notes below.
    2. Excel and Parquet: tabular format following the LCIA Formatter schema, with these additional fields:
      • “source_method”: the original method source for the indicator (e.g., TRACI 2.1 or IPCC)
      • “source_indicator”: the name of the indicator in its original form (e.g., Smog Formation)
      • “category”: the desired parent folder name for the impact category (shown as “EPA EPD” in Figure 1)

    Usage

    In all formats, the CFs can generally be multiplied by the amount in kg (or the unit specified in the denominator) of the relevant chemical emitted to calculate that chemical's potential impact in a given impact category. If no CF exists for a chemical in a given impact category, the chemical is not considered to have an impact in that category. The Parquet format is the most efficient for import into applications or scripts written in languages like Python and R. The Zip archives of JSON-LD files can be loaded into openLCA or other LCA or EPD software supporting that format. When loaded into openLCA (via JSON-LD), the method appears as a separate impact assessment method, and individual indicators are categorized within the EPA EPD category. For an introduction to importing a dataset into openLCA, we recommend this training video from the National Renewable Energy Laboratory: https://youtu.be/YLao5jC5b_0?si=H0SNZ_ufOwInkgCF&t=48

    The version with no flows is designed to be imported into a database that already has FEDEFL elementary flows, or when no further modeling that would require new flows will be done. It creates only the LCIA method. The version with flows can be imported into a new, empty database; it creates not only the LCIA method but also all associated flows and more basic objects such as units and flow properties. Use it when no process data that you wish to model has been created yet, or when you want a full import of all relevant elementary flows.
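
    The usage rule above (potential impact = CF × amount emitted, with a missing CF meaning no impact in that category) can be sketched in plain Python. This is a minimal illustration, not the LCIA Formatter's API: the flow names, contexts and CF values are hypothetical placeholders rather than values from ISO21930-LCIA-US v0.1, and real column names should be taken from the LCIAmethod format specification linked above.

```python
from collections import defaultdict

# Hypothetical CF table: (impact category, flowable, context) -> CF per kg emitted.
# Placeholder numbers for illustration only; they are NOT taken from the dataset.
CFS = {
    ("GHG", "Flow A", "emission/air"): 1.0,
    ("GHG", "Flow B", "emission/air"): 30.0,
    ("AP", "Flow C", "emission/air"): 1.2,
}

def impact_totals(inventory, cfs):
    """Return {impact category: potential impact} for an inventory.

    inventory: iterable of (flowable, context, amount_kg) tuples.
    A flow with no CF in a category contributes nothing to that category.
    """
    # Index CFs by (flowable, context) so each inventory row is one lookup.
    by_flow = defaultdict(list)
    for (category, flowable, context), cf in cfs.items():
        by_flow[(flowable, context)].append((category, cf))

    totals = defaultdict(float)
    for flowable, context, amount in inventory:
        for category, cf in by_flow.get((flowable, context), []):
            totals[category] += cf * amount
    return dict(totals)

inventory = [
    ("Flow A", "emission/air", 10.0),
    ("Flow B", "emission/air", 2.0),
    ("Flow D", "emission/air", 5.0),  # no CF in any category, so it is ignored
]
print(impact_totals(inventory, CFS))
```

    With the real Parquet file, the same join could be done after loading the table with pandas.read_parquet (which requires pyarrow or fastparquet to be installed).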

  13. Mortality Rate (Under-5, Per 1000 Live Births)

    • kaggle.com
    zip
    Updated Nov 29, 2024
    Hafiz Amsal (2024). Mortality Rate (Under-5, Per 1000 Live Births) [Dataset]. https://www.kaggle.com/datasets/hafizamsal/mortality-rate-under-5-per-1000-live-births
    Available download formats: zip (26849 bytes)
    Dataset updated
    Nov 29, 2024
    Authors
    Hafiz Amsal
    License

    World Bank Terms of Use for Datasets: https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets

    Description

    Kaggle Dataset Description

    Title: Mortality Rate (Under-5, Per 1000 Live Births)
    Subtitle: Exploring global trends in child survival and health advancements.

    Detailed Description:
    This dataset contains the under-5 mortality rate, measured as the number of deaths per 1,000 live births for children under five years of age. Sourced from the World Bank, it highlights progress in child survival and health outcomes globally over decades.

    Key Highlights:
    - Annual data for countries worldwide.
    - Metric: Mortality rate (under-5, per 1000 live births).
    - Use cases: Analyze trends, compare regional disparities, and correlate mortality rates with health and economic indicators.

    4. Exploratory Data Analysis (EDA)

    Notebook Ideas

    1. Data Cleaning:

      • Handle missing or inconsistent data.
      • Normalize data for comparison across regions.
      • Add calculated fields like regional averages or year-over-year changes.
    2. Visualizations:

      • Line Graph: Trends in under-5 mortality rates over time for selected countries.
      • Heatmap: Mortality rates by region and year.
      • Scatterplot: Correlation between mortality rates and healthcare expenditure or GDP per capita.
      • Bar Chart: Top and bottom countries by under-5 mortality for a specific year.
    3. Descriptive Analysis:

      • Highlight countries with the most significant reductions in mortality.
      • Analyze regional improvements over decades (e.g., Sub-Saharan Africa vs. South Asia).
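
    The cleaning and descriptive steps above (dropping missing values, adding year-over-year changes, finding the largest reductions) could look like the following plain-Python sketch. The country names and rates are hypothetical, and the sketch assumes the data have already been reshaped into a {country: {year: rate}} mapping; the actual column layout of the Kaggle file is not described here.

```python
def clean_series(series):
    """Drop years whose rate is missing (None)."""
    return {year: rate for year, rate in series.items() if rate is not None}

def yoy_changes(series):
    """Change in rate from the previous year, for consecutive years only."""
    years = sorted(series)
    return {
        year: series[year] - series[prev]
        for prev, year in zip(years, years[1:])
        if year - prev == 1
    }

def largest_reduction(data, start, end):
    """Country with the biggest drop in rate between two years (or None)."""
    drops = {
        country: series[start] - series[end]
        for country, series in data.items()
        if start in series and end in series
    }
    return max(drops, key=drops.get) if drops else None

# Hypothetical example data (deaths per 1,000 live births).
raw = {
    "Country A": {2000: 80.0, 2001: 76.0, 2002: 73.0},
    "Country B": {2000: 40.0, 2001: None, 2002: 35.0},
}
data = {country: clean_series(series) for country, series in raw.items()}
print(yoy_changes(data["Country A"]))
print(largest_reduction(data, 2000, 2002))
```

    Year-over-year changes are only computed between adjacent years, so a gap left by a missing value does not masquerade as a one-year change.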

    5. Predictive Analysis (Optional)

    • Use time-series forecasting (e.g., ARIMA or Prophet) to predict future mortality rates for specific countries or regions.
    • Explore regression models to analyze the impact of factors like healthcare expenditure on mortality reduction.
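
    As a lightweight baseline before trying ARIMA or Prophet, a least-squares linear trend can be extrapolated per country. This is a deliberately simpler technique than the ones named above, shown only as a sketch; the example series is hypothetical, and the forecast is clipped at zero because a mortality rate cannot be negative.

```python
def linear_trend_forecast(series, target_year):
    """Fit rate = intercept + slope * year by ordinary least squares and
    extrapolate to target_year. series maps year -> rate."""
    years = sorted(series)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two years of data")
    mean_x = sum(years) / n
    mean_y = sum(series[y] for y in years) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (series[x] - mean_y) for x in years)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A rate per 1,000 live births cannot go below zero.
    return max(0.0, intercept + slope * target_year)

history = {2000: 10.0, 2001: 8.0, 2002: 6.0}  # hypothetical
print(linear_trend_forecast(history, 2003))
```

    A straight line ignores the flattening that declining rates often show, which is one reason to move on to proper time-series models once this baseline is in place.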

    6. Kaggle Notebook

    Create a Kaggle notebook with:
    1. Data Cleaning: Show how missing or inconsistent values are handled.
    2. EDA: Include visualizations like heatmaps, scatterplots, and line charts.
    3. Insights: Highlight significant findings, such as countries with notable improvements in child survival.
    4. Optional Predictive Modeling: Use regression or time-series models to project future trends.

    7. Call to Action

    For GitHub:

    • Share the GitHub repository link on LinkedIn, Twitter, and forums like Reddit (e.g., r/datascience).
    • Invite collaboration:
      • "Fork this repository to add your analyses or insights!"

    GitHub Link: https://github.com/yourusername/Under5_Mortality_Trends

    For Kaggle:

    • Encourage upvotes:
      • "If this dataset helps you, consider upvoting it to help others discover it!"
    • Include questions to engage users:
      • "Which regions have made the most progress in reducing child mortality?"
      • "What correlations can be drawn between healthcare expenditure and mortality rates?"

    Kaggle Link: https://www.kaggle.com/datasets/yourusername/under5-mortality-rate

    8. LinkedIn Post

    Post Title:
    📉 Global Trends in Under-5 Mortality Rates 🌍

    Post Body:
    I’m excited to share my latest dataset on under-5 mortality rates (per 1,000 live births), sourced from the World Bank. This dataset highlights progress in global health and child survival, spanning decades and covering countries worldwide.

    📂 Explore the Dataset:
    - GitHub Repository: https://github.com/yourusername/Under5_Mortality_Trends
    - Kaggle Dataset: https://www.kaggle.com/datasets/yourusername/under5-mortality-rate

    Why It Matters:

    Child survival is a fundamental measure of global health progress. This dataset is ideal for:
    - Trend Analysis: Explore how under-5 mortality rates have evolved globally.
    - Regional Comparisons: Identify disparities in child survival rates across regions.
    - Correlations: Study the relationship between mortality rates and economic indicators like healthcare expenditure or GDP per capita.

    📈 Get Involved:
    - Use the dataset for your own analyses and visualizations.
    - Share your insights and findings.
    - Upvote the Kaggle dataset to help others discover it!

    What trends or correlations do you find in the data?
    - Which country or region has shown the most improvement?
    - What factors would you analyze further?

    Let me know your thoughts, and feel free to share this resource with others who might benefit! 🌟

    #DataScience #ChildHealth #MortalityRates #WorldBankData #DataVisualization #GitHub #Kaggle #HealthAnalysis

  14. Data from: Inventory of online public databases and repositories holding...

    • agdatacommons.nal.usda.gov
    • catalog.data.gov
    • +1 more
    txt
    Updated Feb 8, 2024
    Erin Antognoli; Jonathan Sears; Cynthia Parr (2024). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. http://doi.org/10.15482/USDA.ADC/1389839
    Available download formats: txt
    Dataset updated
    Feb 8, 2024
    Dataset provided by
    Ag Data Commons
    Authors
    Erin Antognoli; Jonathan Sears; Cynthia Parr
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve both as a current landscape analysis and as a baseline for future studies of ag research data.

    Purpose

    As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:

    • establish where agricultural researchers in the United States (land-grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals;
    • compare how much data is in institutional vs. domain-specific vs. federal platforms;
    • determine which repositories are recommended by top journals that require or recommend the publication of supporting data; and
    • ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data.

    Approach

    The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and hold some amount of ag data, we analysed resources including re3data, libguides, and ARS lists. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods

    We first compiled a list of known domain-specific USDA / ARS datasets and databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then used search engines such as Bing and Google to look for non-USDA / non-federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects.

    We then used the same search engines to find top agricultural university repositories, using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see whether the institution had a repository for its unique, independent research data, if this was not apparent from the initial web search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories; results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

    Next, we searched the internet for open general data repositories using a variety of search engines; repositories containing a mix of data, journals, books, and other types of records were tested to determine whether they could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

    Finally, we compared scholarly journals' suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. We compiled extensive lists of journals in which USDA researchers published in 2012 and 2016, combining search results from ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites of the Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The author instructions of the top 50 journals were consulted to see whether they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories. Journal data are based on a 2012 and 2016 study of where USDA employees publish their research, ranked by number of articles, and include the 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

    Evaluation

    We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, the type of resource searched (datasets, data, images, components, etc.), the percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned more than 100 and more than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions depended on conditions such as funding or affiliation of some kind.

    Results

    A summary of the major findings from our data review:
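
    The evaluation criteria above (share of the collection matched by each search term, plus the 1%/5% and 100/500-result thresholds) reduce to simple arithmetic. A minimal sketch with hypothetical counts; it does not read the dataset's CSV files, whose exact columns are documented in Ag_Data_Repo_DD.csv, and the function names are illustrative only.

```python
def evaluate_repository(total_datasets, term_hits):
    """Apply the inventory's threshold checks to one repository.

    total_datasets: total number of datasets in the repository.
    term_hits: {search term: number of results returned}.
    """
    report = {}
    for term, hits in term_hits.items():
        percent = 100.0 * hits / total_datasets
        report[term] = {
            "hits": hits,
            "percent_of_collection": percent,
            "at_least_1_percent": percent >= 1.0,
            "at_least_5_percent": percent >= 5.0,
            "over_100_results": hits > 100,
            "over_500_results": hits > 500,
        }
    return report

def large_and_ag_heavy(report):
    """True if any term returned >500 results AND made up >=5% of the collection."""
    return any(r["over_500_results"] and r["at_least_5_percent"] for r in report.values())

# Hypothetical repository with 10,000 datasets.
report = evaluate_repository(10_000, {"agriculture": 600, "crop": 50})
print(report["agriculture"]["percent_of_collection"], large_and_ag_heavy(report))
```

    The combined check mirrors the criterion used in the results (over 500 datasets returned for at least one ag term, making up at least 5% of the collection).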

    Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
    Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

    See the included README file for descriptions of each individual data file in this dataset. Resources in this dataset:
    • Resource Title: Journals. File Name: Journals.csv
    • Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
    • Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
    • Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
    • Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
    • Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
    • Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt

  15. Data from: Data reuse and the open data citation advantage

    • data.niaid.nih.gov
    • search.dataone.org
    • +3 more
    zip
    Updated Oct 1, 2013
    Heather A. Piwowar; Todd J. Vision (2013). Data reuse and the open data citation advantage [Dataset]. http://doi.org/10.5061/dryad.781pv
    Available download formats: zip
    Dataset updated
    Oct 1, 2013
    Dataset provided by
    National Evolutionary Synthesis Center
    Authors
    Heather A. Piwowar; Todd J. Vision
    License

    CC0 1.0 Universal: https://spdx.org/licenses/CC0-1.0.html

    Description

    Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
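
    A percentage citation benefit such as 9% (95% CI 5% to 13%) is a multiplicative effect. If it comes from a regression on log-transformed citation counts (an assumption here; the description above does not state the exact transformation used), a coefficient β corresponds to a change of (e^β − 1) × 100%. The sketch below only illustrates that conversion and is not the authors' analysis code.

```python
import math

def coef_to_percent(beta):
    """Percent change in citations implied by a log-scale coefficient."""
    return (math.exp(beta) - 1.0) * 100.0

def percent_to_coef(percent):
    """Log-scale coefficient implied by a percent change."""
    return math.log1p(percent / 100.0)

# Reported point estimate and 95% CI bounds, converted to the log scale and back.
for pct in (5, 9, 13):
    beta = percent_to_coef(pct)
    print(f"{pct}% -> beta = {beta:.4f} -> back to {coef_to_percent(beta):.1f}%")
```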

  16. Health Reports - Catalogue - Canadian Urban Data Catalogue (CUDC)

    • data.urbandatacentre.ca
    • betadata.urbandatacentre.ca
    Updated Oct 19, 2025
    (2025). Health Reports - Catalogue - Canadian Urban Data Catalogue (CUDC) [Dataset]. https://data.urbandatacentre.ca/dataset/gov-canada-c13fe405-ff7f-4571-8195-d38234cc6dff
    Dataset updated
    Oct 19, 2025
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    Health Reports, published by the Health Analysis Division of Statistics Canada, is a peer-reviewed journal of population health and health services research. It is designed for a broad audience that includes health professionals, researchers, policymakers, and the general public. The journal publishes articles of wide interest that contain original and timely analyses of national or provincial/territorial surveys or administrative databases. New articles are published electronically each month. Health Reports had an impact factor of 2.673 for 2014 and a five-year impact factor of 4.167. All articles are indexed in PubMed. Our online catalogue is free and receives more than 500,000 visits per year. External submissions are welcome.

  17. Characteristics of included articles by Funding Source (n = 61).

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gail Rattinger; Lisa Bero (2023). Characteristics of included articles by Funding Source (n = 61). [Dataset]. http://doi.org/10.1371/journal.pone.0005826.t001
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Gail Rattinger; Lisa Bero
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Includes articles that were sponsored by a glitazone company, another drug company and other non-drug-company funding (n = 2), and by a glitazone company and a non-drug company (n = 2). Data from 5 articles were excluded because they were published in journals that had no impact factor. Of the N = 56 trials reported in journals with impact factors, the impact factor had a median of 2.84, a mean of 4.63, a range of 0.34–44.02 and a standard deviation σ = 6.06. **Sample size characteristics: median N = 252, mean = 390, range 20–4360, and standard deviation σ = 590.4.
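
    The footnote reports median, mean, range and standard deviation, but does not say whether σ is a population or a sample standard deviation. A short sketch with Python's statistics module computes both; the values below are hypothetical, not the trial data.

```python
import statistics

def describe(values):
    """Summary statistics in the same form as the table footnote."""
    return {
        "n": len(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "range": (min(values), max(values)),
        "sample_sd": statistics.stdev(values),       # divides by n - 1
        "population_sd": statistics.pstdev(values),  # divides by n
    }

impact_factors = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical
print(describe(impact_factors))
```

    As in the footnote, a mean well above the median signals a right-skewed distribution pulled up by a few high values.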

  18. Synthetic Particle Image Dataset (SPID)

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Nov 2, 2023
    Michel Machado; Douglas Rocha (2023). Synthetic Particle Image Dataset (SPID) [Dataset]. http://doi.org/10.5281/zenodo.7935215
    Available download formats: bin
    Dataset updated
    Nov 2, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michel Machado; Douglas Rocha
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPID is a comprehensive dataset composed of synthetic particle image velocimetry (PIV) image pairs and their corresponding exact optical flow fields. It is intended for researchers and practitioners working on PIV and optical flow estimation. The dataset is organized into three subsets (training, validation, and test) split 70%, 15%, and 15%, respectively.

    Each subset within SPID consists of an input denoted as "x", which comprises synthetic image pairs. These image pairs provide the necessary context for the optical flow computations. Additionally, an output termed "y" is provided, which represents the exact optical flow calculated for each image pair. Notably, the images within the dataset are single-channel, and the optical flow is decomposed into its u and v components.

    The shape of the input subsets in SPID is given by (number of samples, number of frames, image width, image height, number of channels), representing the dimensions of the input data. On the other hand, the shape of the output subsets is given by (number of samples, velocity components, image width, image height), denoting the shape of the optical flow data.

    Note that the SPID dataset is a preprocessed version of the Raw Synthetic Particle Image Dataset (RSPID), intended to improve usability and reliability. The dataset is packaged as a compressed NumPy NPZ file, which stores the inputs and outputs as separate NumPy NPZ files, with the labels train, validation and test as access keys. This format simplifies data extraction and integration into machine learning frameworks and libraries.
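
    Given the shapes described above, a quick sanity check after loading can catch axis-order mistakes. This is a sketch assuming NumPy: the exact key layout inside the real archive is not spelled out beyond the train, validation and test labels, so describe_npz simply lists whatever keys exist, and check_shapes encodes the stated layout (two frames, one channel, u and v components). Both function names are illustrative, and the demo arrays are tiny placeholders, not SPID data.

```python
import numpy as np

def describe_npz(path):
    """Map each key in an .npz archive to its (shape, dtype)."""
    with np.load(path) as archive:
        return {key: (archive[key].shape, archive[key].dtype) for key in archive.files}

def check_shapes(x, y):
    """Check x is (samples, frames, width, height, channels) and
    y is (samples, velocity components, width, height), per the SPID description."""
    if x.ndim != 5 or y.ndim != 4:
        return False
    n, frames, width, height, channels = x.shape
    return (
        y.shape[0] == n
        and frames == 2                      # an image pair
        and channels == 1                    # single-channel images
        and y.shape[1] == 2                  # u and v components
        and y.shape[2:] == (width, height)   # same spatial grid
    )

# Tiny placeholder arrays standing in for one subset.
x_demo = np.zeros((3, 2, 8, 6, 1), dtype=np.uint8)
y_demo = np.zeros((3, 2, 8, 6), dtype=np.float32)
print(check_shapes(x_demo, y_demo))
```

    If the archive stores object arrays (for example, nested containers), np.load refuses to read them unless allow_pickle=True is passed, which should only be done for files from a trusted source.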

    SPID incorporates various factors that impact PIV analysis to provide a comprehensive and realistic simulation. The dataset includes image pairs with an image width of 665 pixels and an image height of 630 pixels, ensuring a high level of detail and accuracy with an 8-bit depth. It incorporates different particle radii (1, 2, 3, and 4 pixels) and particle densities (15, 17, 20, 23, 25, and 32 particles) to capture diverse particle configurations.

    To simulate real-world scenarios, SPID introduces displacement variations through the delta x factor, ranging from 0.05% to 0.25%. Noise levels (1, 5, 10, and 15) are also incorporated to mimic practical PIV measurements with varying degrees of noise. Furthermore, out-of-plane motion effects are considered with standard deviations of 0.01, 0.025, and 0.05 to assess their impact on optical flow accuracy.

    The dataset covers a wide range of flow patterns encountered in fluid dynamics. It includes Rankine uniform, Rankine vortex, parabolic, stagnation, shear, and decaying vortex flows, allowing for comprehensive testing and evaluation of PIV algorithms across different scenarios.

    By leveraging the SPID dataset, researchers can develop and validate PIV algorithms and techniques under various challenging conditions. Its realistic and diverse simulation of particle image velocimetry scenarios makes it an invaluable tool for advancing the field and improving the accuracy and reliability of optical flow computations.

  19. Salary at 30 Years of Age

    • kaggle.com
    zip
    Updated Feb 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Priyansh Awasthi (2025). Salary at 30 Years of Age [Dataset]. https://www.kaggle.com/datasets/priyansh013/salary-at-30-years-of-age
    Available download formats: zip (130443 bytes)
    Dataset updated
    Feb 11, 2025
    Authors
    Priyansh Awasthi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📌 Salary Prediction Dataset: Factors Affecting Salaries

    📖 Overview

    This dataset contains synthetic salary data generated to explore the impact of various factors on salary predictions. It includes attributes such as age, education level, years of experience, certifications, GPA, and job roles, providing a realistic dataset for machine learning, data analysis, and salary estimation models.

    Researchers and data scientists can use this dataset to study patterns, perform feature engineering, and develop predictive models for salary forecasting.

    📂 Dataset Information

    | Column Name | Data Type | Description |
    |---|---|---|
    | Age | int | Age of the individual (18-65 years) |
    | Gender | category | Gender of the individual (Male, Female, Non-Binary) |
    | Education_Level | category | Highest education degree (High School, Bachelor's, Master's, PhD) |
    | Years_of_Experience | int | Total years of work experience (0-40 years) |
    | Certifications | int/float | Number of professional certifications obtained (0-5) |
    | GPA | float | Grade Point Average (GPA) from 0.0 to 4.0 (with some missing values) |
    | Job_Role | category | Job designation (Data Scientist, Software Engineer, Manager, etc.) |
    | Industry | category | Industry sector (Tech, Finance, Healthcare, Education, etc.) |
    | Company_Size | category | Size of the company (Small, Medium, Large) |
    | Location | category | Work location (Urban, Suburban, Rural) |
    | Remote_Work | binary | Whether the individual works remotely (0 = No, 1 = Yes) |
    | Salary | float | Annual salary in USD ($30,000 - $300,000) |

    📊 Data Preprocessing Notes

    • Missing Values: Some missing values are introduced in GPA and Certifications to simulate real-world scenarios.
    • Categorical Encoding: Text-based categorical features (Gender, Education_Level, Job_Role, etc.) need encoding before ML model training.
    • Normalization: Continuous features like Age, Years_of_Experience, and Salary should be normalized for better model performance.
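
    The encoding and normalization notes above can be illustrated without any libraries. This sketch one-hot encodes a categorical column (dropping the first sorted level as a baseline, the same idea as drop_first=True in pandas.get_dummies) and z-score standardizes a numeric column; the function names and sample values are hypothetical. Two caveats: tree ensembles such as random forests do not need scaled features, so normalization mainly matters for linear or distance-based models, and scaling statistics should be computed on the training split only to avoid leaking test information.

```python
from statistics import mean, pstdev

def one_hot(values, drop_first=True):
    """Return (column names, rows of 0/1 indicators) for a categorical column."""
    categories = sorted(set(values))
    if drop_first:
        categories = categories[1:]  # first sorted level becomes the baseline
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def standardize(xs):
    """Z-score: (x - mean) / population standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]

locations = ["Urban", "Rural", "Suburban", "Urban"]  # hypothetical
print(one_hot(locations))
print(standardize([25, 30, 35]))
```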

    🎯 Use Cases

    This dataset is ideal for:
    • Salary Prediction Models – Predict salaries based on experience, education, and industry.
    • Feature Importance Analysis – Identify which factors contribute most to salary variations.
    • Exploratory Data Analysis (EDA) – Discover salary trends across different demographics.
    • Machine Learning Applications – Train regression or classification models for salary forecasting.

    📌 Example ML Workflow

    Here’s how you can use this dataset for salary prediction:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestRegressor

    # Load dataset (adjust the file name to match the CSV in the download)
    df = pd.read_csv("salary_dataset.csv")

    # Handle missing values; assign the result back instead of using chained
    # inplace=True, which does not update df under pandas Copy-on-Write
    df["GPA"] = df["GPA"].fillna(df["GPA"].median())
    df["Certifications"] = df["Certifications"].fillna(0)

    # One-hot encode categorical variables (first level of each is dropped as the baseline)
    categorical_features = ["Gender", "Education_Level", "Job_Role", "Industry", "Company_Size", "Location"]
    df = pd.get_dummies(df, columns=categorical_features, drop_first=True)

    # Define features X and target y
    X = df.drop(columns=["Salary"])
    y = df["Salary"]

    # Hold out 20% of rows for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train model (random forests do not require feature scaling)
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Predict on the held-out set
    predictions = model.predict(X_test)
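
    The workflow above stops at predict(). To judge those predictions, standard regression metrics can be computed; scikit-learn provides them as sklearn.metrics.mean_absolute_error, mean_squared_error and r2_score, and the plain-Python sketch below shows what they compute. The helper name and example numbers are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Mean absolute error, root mean squared error and R^2."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sse / n),
        "r2": 1.0 - sse / sst if sst else float("nan"),
    }

# Hypothetical salaries (USD) and model predictions.
print(regression_metrics([100_000, 200_000, 300_000], [110_000, 190_000, 300_000]))
```

    Applied to the workflow, this would be called as regression_metrics(list(y_test), list(predictions)).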
    

    💡 Download the dataset now and start exploring salary trends! 🚀

  20. Bookstore Financial Dataset 2019-2024 Calgary

    • kaggle.com
    zip
    Updated Nov 13, 2025
    Gabrielle Charlton-Wells (2025). Bookstore Financial Dataset 2019-2024 Calgary [Dataset]. https://www.kaggle.com/datasets/gabriellecharlton/bookstore-financial-dataset-2019-2024-calgary
    Available download formats: zip (1673906 bytes)
    Dataset updated
    Nov 13, 2025
    Authors
    Gabrielle Charlton-Wells
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Area covered
    Calgary
    Description

    📖 Overview

    This dataset represents a medium-sized Canadian bookstore business operating three retail locations across Calgary (Downtown, NW, SE) and a central warehouse.

    It covers 2019 to 2024, including the COVID-19 impact years (2020-2021) and the post-pandemic recovery with inflation-adjusted growth. The data integrates finance, operations, HR, and customer analytics, making it suitable for data portfolio projects, KPI tracking, and realistic bookkeeping simulations.

    🧾 Files -> 12

    The dataset contains 10 CSV files representing different business metrics, a CSV data dictionary describing the columns in those files, and a Markdown README file:

    1. Bookstore Checking Balanced Dataset.csv: Daily bank account transactions (deposits, withdrawals, rolling balance).
    2. Bookstore Credit Balance Dataset.csv: Daily credit-card transactions (charges, payments, rolling balance).
    3. bookstore_sales.csv: Daily revenue by store and sales channel (gross / net / GST breakdown).
    4. bookstore_inventory.csv: Monthly warehouse-to-store inventory transfers and reorder levels.
    5. bookstore_employees_expanded.csv: Employee roster with department, role, employment type, and wages.
    6. bookstore_payroll_expanded.csv: Detailed payroll records with gross / net pay, deductions, and taxes.
    7. bookstore_loans.csv: Quarterly business loan balances, interest, and repayments (CEBA-style + LOC).
    8. bookstore_customers.csv: Clean customer file for customer-lifetime-value (LTV) analysis.
    9. bookstore_customers_expanded.csv: Expanded version of the customer dataset, including customer ratings.
    10. bookstore_customers_expanded_raw.csv: Messy version of the expanded customer dataset (duplicates, NA values) for data-cleaning exercises.
    11. data_dictionary.csv: Definitions of every column across all CSVs.
    12. README.md: Narrative summary and generation notes.
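    A minimal sketch for loading every CSV listed above into memory with Python's standard `csv` module. The folder name `bookstore_data` and the function `load_bookstore_tables` are hypothetical, and UTF-8 encoding is an assumption (`utf-8-sig` also tolerates a byte-order mark). Non-CSV files such as README.md are skipped, and all values come back as strings.

```python
import csv
from pathlib import Path


def load_bookstore_tables(folder: str) -> dict[str, list[dict[str, str]]]:
    """Read every *.csv file in `folder` into a list of row dicts, keyed by file stem."""
    tables: dict[str, list[dict[str, str]]] = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # newline="" is what the csv module documentation recommends for file objects;
        # utf-8-sig also strips a leading byte-order mark if one is present
        with path.open(newline="", encoding="utf-8-sig") as fh:
            tables[path.stem] = list(csv.DictReader(fh))
    return tables


if __name__ == "__main__":
    # Hypothetical folder holding the extracted Kaggle zip
    for name, rows in load_bookstore_tables("bookstore_data").items():
        print(f"{name}: {len(rows)} rows")
```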

    🧮 Key Features

    Time span: 2019 – 2024

    Locations: Calgary -> Downtown (DT), NW, SE

    Currency: Canadian Dollars (CAD)

    Tax context: 5% federal GST; Alberta levies no provincial sales tax (PST)

    Inflation factor: 1.00 → 1.18 (2019 → 2024) applied to payroll, sales, and loan interest
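    The GST rate and inflation factor above can be applied with simple arithmetic. This sketch assumes, without confirmation from the description, that gross sales amounts are GST-inclusive and that the inflation factor is cumulative relative to 2019 = 1.00. Only the 2019 and 2024 factors are stated, so the per-year factor is passed in as a parameter. Both function names are hypothetical.

```python
GST_RATE = 0.05  # 5% federal GST; Alberta has no provincial sales tax


def split_gst_inclusive(gross: float, rate: float = GST_RATE) -> tuple[float, float]:
    """Split a GST-inclusive amount into (net, gst), rounded to cents.

    Assumption: `gross` already includes GST.
    """
    net = round(gross / (1 + rate), 2)
    gst = round(gross - net, 2)  # derived from net so the parts sum to gross (to the cent)
    return net, gst


def to_2019_dollars(nominal: float, factor: float) -> float:
    """Deflate a nominal amount by its year's cumulative inflation factor (2019 = 1.00)."""
    return nominal / factor


if __name__ == "__main__":
    print(split_gst_inclusive(105.00))              # (100.0, 5.0)
    # 1.18 is the stated 2024 factor; factors for 2020-2023 are not given here
    print(round(to_2019_dollars(118.00, 1.18), 2))  # 100.0
```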

    💡 Example Analyses

    • Financial forecasting: Model cash flow trends using rolling bank balances.
    • Payroll cost analysis: Visualize seasonal vs. permanent staff expenses.
    • Sales forecasting: Fit time-series models by store/channel (e.g., ARIMA, Prophet).
    • Customer analytics: Segment LTV, churn probability, or satisfaction scores.
    • Data-cleaning demonstration: Compare bookstore_customers_expanded.csv vs. bookstore_customers_expanded_raw.csv.
    • Loan amortization & interest modeling: Analyze repayment structures over time.
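    For the loan amortization analysis, a common baseline is the level-payment (annuity) formula, payment = r·P / (1 - (1 + r)^-n), with periodic rate r, principal P and n periods. This is a generic sketch, not the dataset's generation method. CEBA-style loans and lines of credit (LOC) need not follow a level-payment schedule, so treat it as a reference to compare the quarterly records in bookstore_loans.csv against. The loan figures in the usage block are hypothetical.

```python
from typing import Iterator


def level_payment(principal: float, annual_rate: float, years: int,
                  periods_per_year: int = 4) -> float:
    """Standard annuity payment: r * P / (1 - (1 + r) ** -n)."""
    r = annual_rate / periods_per_year
    n = years * periods_per_year
    if r == 0:
        return principal / n  # interest-free: equal principal repayments
    return r * principal / (1 - (1 + r) ** -n)


def amortization_schedule(principal: float, annual_rate: float, years: int,
                          periods_per_year: int = 4) -> Iterator[tuple]:
    """Yield (period, payment, interest, principal_paid, balance) for each period."""
    payment = level_payment(principal, annual_rate, years, periods_per_year)
    r = annual_rate / periods_per_year
    balance = principal
    for period in range(1, years * periods_per_year + 1):
        interest = balance * r              # interest accrues on the opening balance
        principal_paid = payment - interest
        balance -= principal_paid
        yield period, payment, interest, principal_paid, balance


if __name__ == "__main__":
    # Hypothetical loan: $60,000 at 6% a year over 5 years, repaid quarterly
    for row in amortization_schedule(60_000, 0.06, 5):
        print("{:>2}  pay={:9.2f}  int={:8.2f}  prin={:9.2f}  bal={:10.2f}".format(*row))
```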

    🧠 Intended Use

    This dataset is fully synthetic and designed for:

    • Business intelligence dashboards
    • Machine learning demos (forecasting, regression, clustering)
    • Financial and accounting analysis training
    • Data-cleaning and EDA (Exploratory Data Analysis) tutorials
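    For the data-cleaning exercises, a minimal standard-library sketch that drops exact duplicate rows and counts missing values per column, for example on bookstore_customers_expanded_raw.csv. The set of strings treated as missing (`NA_MARKERS`) is an assumption, since the raw file's NA conventions are not documented here, and `clean_csv` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Assumed spellings of "missing"; adjust after inspecting the raw file
NA_MARKERS = {"", "NA", "N/A", "NaN", "null", "None"}


def clean_csv(text: str):
    """Return (header, unique_rows, missing_counts) for CSV text.

    Exact duplicate rows are dropped (first occurrence kept); short rows are
    padded with "" so absent trailing fields count as missing. Missing values
    are counted over the de-duplicated rows only.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    seen, unique = set(), []
    missing = Counter()
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        padded = row + [""] * (len(header) - len(row))
        key = tuple(padded)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        unique.append(padded)
        for col, value in zip(header, padded):
            if value.strip() in NA_MARKERS:
                missing[col] += 1
    return header, unique, missing


if __name__ == "__main__":
    sample = "id,name,rating\n1,Ann,5\n1,Ann,5\n2,,NA\n"
    header, rows, missing = clean_csv(sample)
    print(len(rows), dict(missing))  # 2 {'name': 1, 'rating': 1}
    # On the real file: clean_csv(open("bookstore_customers_expanded_raw.csv", encoding="utf-8").read())
```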

    📜 License

    This dataset is released under the MIT License, free to use for research, learning, or commercial purposes.

    Photo: Pixabay (free to use).

How to Rank Journals

47 scholarly articles cite this dataset (View in Google Scholar)