License: Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
There are now many methods available to assess the relative citation performance of peer-reviewed journals. Regardless of their individual faults and advantages, citation-based metrics are used by researchers to maximize the citation potential of their articles, and by employers to rank academic track records. The absolute value of any particular index is arguably meaningless unless compared to other journals, and different metrics result in divergent rankings. To provide a simple yet more objective way to rank journals within and among disciplines, we developed a κ-resampled composite journal rank incorporating five popular citation indices: Impact Factor, Immediacy Index, Source-Normalized Impact Per Paper, SCImago Journal Rank and Google 5-year h-index; this approach provides an index of relative rank uncertainty. We applied the approach to six sample sets of scientific journals: Ecology (n = 100 journals), Medicine (n = 100), Multidisciplinary (n = 50), Ecology + Multidisciplinary (n = 25), Obstetrics & Gynaecology (n = 25) and Marine Biology & Fisheries (n = 25). We then cross-compared the κ-resampled ranking for the Ecology + Multidisciplinary journal set to the results of a survey of 188 publishing ecologists who were asked to rank the same journals, and found a Spearman's ρ correlation of 0.68–0.84 between the two rankings. Our composite index approach therefore approximates relative journal reputation, at least for that discipline. Agglomerative and divisive clustering and multi-dimensional scaling techniques applied to the Ecology + Multidisciplinary journal set identified specific clusters of similarly ranked journals, with only Nature and Science separating out from the others. When comparing a selection of journals within or among disciplines, we recommend collecting multiple citation-based metrics for a sample of relevant and realistic journals to calculate the composite rankings and their relative uncertainty windows.
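The composite-rank idea can be sketched in a few lines. This is an illustrative approximation only, not the authors' exact κ-resampling procedure: the metric values are random placeholders, and bootstrapping over the five indices is an assumed scheme for producing the uncertainty window.

```python
import numpy as np

# Hypothetical metric matrix: rows = journals, columns = five citation
# indices (Impact Factor, Immediacy Index, SNIP, SJR, Google 5-year h-index).
rng = np.random.default_rng(0)
metrics = rng.lognormal(mean=1.0, sigma=0.5, size=(25, 5))

def composite_rank(metrics, n_boot=1000, rng=None):
    """Rank journals on each index, average the ranks, and bootstrap
    over indices to get an uncertainty window for each composite rank."""
    rng = rng or np.random.default_rng()
    n_journals, n_indices = metrics.shape
    # Rank within each index (1 = best, i.e. highest metric value).
    per_index_ranks = (-metrics).argsort(axis=0).argsort(axis=0) + 1
    boot_means = np.empty((n_boot, n_journals))
    for b in range(n_boot):
        cols = rng.integers(0, n_indices, size=n_indices)  # resample indices
        boot_means[b] = per_index_ranks[:, cols].mean(axis=1)
    mean_rank = per_index_ranks.mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5], axis=0)
    return mean_rank, lo, hi

mean_rank, lo, hi = composite_rank(metrics, rng=np.random.default_rng(1))
best = int(mean_rank.argmin())  # journal with the best (lowest) mean rank
```

Journals whose uncertainty windows (`lo`, `hi`) overlap cannot be confidently separated in the composite ranking.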
License: Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides supplementary data extracted and processed from the SCImago Journal Rank portal (2023) and the Scopus Discontinued Titles list (February 2025). It includes journal-level metrics such as SJR and h-index, quartile assignments, and subject category information. The files are intended to support exploratory analysis of citation patterns, disciplinary variations, and structural characteristics of journal evaluation systems. The dataset also contains Python code and visual materials used to examine relationships between prestige metrics and cumulative citation indicators.
Contents:
The aim of this study was to measure the publication rate of editorial board members in their board journals and to evaluate associated variables. We evaluated the ten highest-ranked journals according to the 5-year impact factor under the ‘Dentistry, Oral Surgery & Medicine’ subject category for 2010, 2011, and 2012. All original research papers with at least one member of the editorial board as author were counted. Final analyses assessed associated variables such as size of the editorial board, number of papers published each year, and each journal’s impact factor. Overall, there was an increase in the average number of articles published from 2010 (115.2 ± 52.2) to 2012 (134.7 ± 47.4). The number and percentage of articles published with editorial board members as authors over the three years did not follow the same pattern, with a slight decrease from 2010 to 2011 and an increase in 2012. The number of articles with editorial board members as authors was significantly higher for journals with impact factors ≥4.0. Journals with a higher impact factor and larger editorial board were associated with higher chances of editorial board members publishing in their respective journals. Participation of editorial board members as authors in publishing varies significantly among journals.
License: CC0 1.0 https://spdx.org/licenses/CC0-1.0.html
Objective: To determine the top 100-ranked (by impact factor) clinical journals' policies toward publishing research previously published on preprint servers (preprints).
Design: Cross sectional. Main outcome measures: Editorial guidelines toward preprints, journal rank by impact factor.
Results: 86 (86%) of the journals examined will consider papers previously published as preprints, 13 (13%) decide on a case-by-case basis, and 1 (1%) does not allow preprints.
Conclusions: We found wide acceptance of publishing preprints in the clinical research community, although researchers may still face uncertainty that their preprints will be accepted by all of their target journals.
Methods: We examined the journal policies of the 100 top-ranked clinical journals using the 2018 impact factors as reported by InCites Journal Citation Reports (JCR). First, we examined all journals with an impact factor greater than 5, and then we manually screened by title and category to identify the first 100 clinical journals. We included only those that publish original research. Next, we checked each journal's editorial policy on preprints. We examined, in order, the journal website, the publisher website, the Transpose Database, and the first 10 pages of a Google search with the journal name and the term "preprint." We classified each journal's policy, as shown in this dataset, as allowing preprints, deciding on a case-by-case basis, or not allowing any preprints. We collected data on April 23, 2020.
(Full methods can also be found in previously published paper.)
License: Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Answers to a survey on gold Open Access run from July to October 2016. The dataset contains 15,235 unique responses from Web of Science published authors. This survey is part of a PhD thesis from the University of Granada in Spain. More details about the study can be found in the full text document, also available in Zenodo.
The questions related to the WoS 2016 dataset are listed below. Please note that countries with fewer than 40 answers are listed as "Other" to preserve anonymity.
Fewer than 5 years
5-14 years
15-24 years
25 years or longer
Many of the questions that follow concern Open Access publishing. For the purposes of this survey, an article is Open Access if its final, peer-reviewed, version is published online by a journal and is free of charge to all users without restrictions on access or use.
Yes
No
I do not know
Yes
No
I have no opinion
I do not care
1-5
6-10
11-20
21-50
More than 50
[Each factor may be rated "Extremely important", "Important", "Less important" or "Irrelevant". The factors are presented in random order.]
Importance of the journal for academic promotion, tenure or assessment
Recommendation of the journal by my colleagues
Positive experience with publisher/editor(s) of the journal
The journal is an Open Access journal
Relevance of the journal for my community
The journal fits the policy of my organisation
Prestige/perceived quality of the journal
Likelihood of article acceptance in the journal
Absence of journal publication fees (e.g. submission charges, page charges, colour charges)
Copyright policy of the journal
Journal Impact Factor
Speed of publication of the journal
The decision is my own
A collective decision is made with my fellow authors
I am advised where to publish by a senior colleague
The organisation that finances my research advises me where to publish
Other (please specify) [Text box follows]
0
1-5
6-10
More than 10
I do not know
[If the answer is "0", the survey jumps to Q10.]
No charge
Up to €250 ($275)
€251-€500 ($275-$550)
€501-€1000 ($551-$1100)
€1001-€3000 ($1101-$3300)
More than €3000 ($3300)
I do not know
[If the answer is "No charge or I don't know" the survey jumps to Q20. ]
My research funding includes money for paying such fees
I used part of my research funding not specifically intended for paying such fees
My institution paid the fees
I paid the costs myself
Other (please specify) [Text box follows]
Easy
Difficult
I have not used these sources
[Each statement may be rated "Strongly agree", "Agree", "Neither agree nor disagree", "Disagree" or "Strongly disagree". The statements are presented in random order.]
Researchers should retain the rights to their published work and allow it to be used by others
Open Access publishing undermines the system of peer review
Open Access publishing leads to an increase in the publication of poor quality research
If authors pay publication fees to make their articles Open Access, there will be less money available for research
It is not beneficial for the general public to have access to published scientific and medical articles
Open Access unfairly penalises research-intensive institutions with large publication output by making them pay high costs for publication
Publicly-funded research should be made available to be read and used without access barrier
Open Access publishing is more cost-effective than subscription-based publishing and so will benefit public investment in research
Articles that are available by Open Access are likely to be read and cited more often than those not Open Access
This study and its questionnaire are based on the SOAP Project (http://project-soap.eu). An article describing the highlights of the SOAP Survey is available at: https://arxiv.org/abs/1101.5260. The dataset of the SOAP survey is available at http://bit.ly/gSmm71. A manual describing the SOAP dataset is available at http://bit.ly/gI8nc.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
First and last authorship are important metrics of productivity and scholarly success for trainees and professors. For 11 drug delivery-related journals in 2021, the percentage of female first (39.5%) and last (25.7%) authorship was reported. A strong negative correlation with female first (rp = −0.73) and female last authorship (rp = −0.66) was observed with respect to journal impact factor. In contrast, there was a strong positive correlation with male first and last authorship (rp = 0.71). Papers were ∼1.5 times more likely to have a male first author, and ∼3 times more likely to have a male last author, than a female one. A female was 22% more likely to have first authorship if the last author was female, although there is an ∼1% increase per year in female first authorship with male last authorship, which equates to equality in first authorship by 2044. Considering that drug delivery is composed of the engineering, chemistry, and pharmaceutical science disciplines, the observed 25.7% female last authorship does not represent the approximately 35.5% to 50% of professors who are female in these disciplines, internationally. Overall, female authorship in drug delivery-related journals should improve to better represent the work of female senior authors.
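The reported rp values are Pearson correlation coefficients. As a minimal sketch of how such a value is computed, with entirely hypothetical journal-level numbers:

```python
import numpy as np

# Hypothetical journal-level values, for illustration only: impact factor
# and percentage of papers with a female first author.
impact_factor = np.array([3.2, 5.1, 7.8, 10.4, 15.0])
pct_female_first = np.array([48.0, 44.0, 39.0, 33.0, 28.0])

# Pearson correlation coefficient (the r_p reported in the abstract).
r_p = np.corrcoef(impact_factor, pct_female_first)[0, 1]
```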
Annals of Botany Impact Factor 2024-2025 - ResearchHelpDesk - Annals of Botany is an international plant science journal publishing novel and rigorous research in all areas of plant science. It is published monthly in both electronic and printed forms, with at least two extra issues each year that focus on a particular theme in plant biology. The Journal is managed by the Annals of Botany Company, a not-for-profit educational charity established to promote plant science worldwide. The Journal publishes original research papers, invited and submitted review articles, ‘Research in Context’ articles expanding on original work, ‘Botanical Briefings’ as short overviews of important topics, and ‘Viewpoints’ giving opinions. All papers in each issue are summarized briefly in Content Snapshots, topical news items appear in the Plant Cuttings section, and there are Book Reviews. A rigorous review process ensures that readers are exposed to genuine and novel advances across a wide spectrum of botanical knowledge. All papers aim to advance knowledge and make a difference to our understanding of plant science. The Annals of Botany Company is a Limited Company registered in England, No. 78001, at University of Exeter, Innovation Centre, Rennes Drive, Exeter EX4 4RN, UK, and is also a Registered Charity, No. 237771.
Abstract & indexing
The journal is covered by the following services: Abstracts on Hygiene and Communicable Diseases; Agricultural Engineering Abstracts; Agbiotech News and Information; Abstracts in Anthropology; Agroforestry Abstracts; Aquatic Sciences and Fisheries Abstracts; ASCI Database; Biocontrol News and Information; Biological Abstracts; Biological and Agricultural Index; BIOSIS Previews; CAB Abstracts; Chemical Abstracts; Crop Physiology Abstracts; Current Contents®/Agriculture, Biology, and Environmental Sciences; Dairy Science Abstracts; Ecology Abstracts; Elsevier BIOBASE - Current Awareness in Biological Sciences (CABS); Environmental Science and Pollution Management; Excerpta Medica Abstract Journals; Field Crop Abstracts; Food Science and Technology Abstracts; Forest Products Abstracts; Forestry Abstracts; Geobase; Global Health; Grasslands & Forage Abstracts; Horticultural Abstracts; Irrigation and Drainage Abstracts; Journal Citation Reports/Science Edition; Maize Abstracts Online; Nematological Abstracts; Nutrition Abstracts and Reviews; Oceanic Abstracts; Ornamental Horticulture; Plant Breeding Abstracts; Plant Genetic Resources Abstracts; Plant Growth Regulators; Post Harvest News and Information; Potato Abstracts; Poultry Abstracts; ProQuest databases (AGRICOLA PlusText, MEDLINE with Full Text, ProQuest 5000 International, ProQuest Agriculture Journals, ProQuest Biology Journals, ProQuest Central, ProQuest Health & Medical Complete, ProQuest Medical Library, ProQuest Pharma Collection); PubMed; Review of Agricultural Entomology; Review of Aromatic and Medicinal Plants; Review of Plant Pathology; Rice Abstracts; Science Citation Index Expanded (SciSearch®); Science Citation Index®; Seed Abstracts; Soybean Abstracts; Sugar Industry Abstracts; The Standard Periodical Directory; VITIS - Viticulture and Oenology Abstracts; Water Resources Abstracts; Weed Abstracts; Wheat, Barley and Triticale Abstracts; Wildlife Review.
License: CC0 1.0 https://creativecommons.org/publicdomain/zero/1.0/
The Post-Pandemic Remote Work Health Impact 2025 dataset presents a comprehensive, global snapshot of how remote, hybrid, and onsite work arrangements are influencing the mental and physical health of employees in the post-pandemic era. Collected in June 2025, this dataset aggregates responses from a diverse workforce spanning continents, industries, age groups, and job roles. It is designed to support research, data analysis, and policy-making around the evolving landscape of work and well-being.
This dataset enables in-depth exploration of: - The prevalence of mental health conditions (e.g., anxiety, burnout, PTSD, depression) across different work setups. - The relationship between work arrangements and physical health complaints (e.g., back pain, eye strain, neck pain). - Variations in work-life balance, social isolation, and burnout levels segmented by demographic and occupational factors. - Salary distributions and their correlation with health outcomes and job roles.
By providing granular, anonymized data on both subjective (self-reported) and objective (hours worked, salary range) factors, this resource empowers data scientists, health researchers, HR professionals, and business leaders to: - Identify risk factors and protective factors for employee well-being. - Benchmark health impacts across industries and regions. - Inform organizational policy and future-of-work strategies.
The dataset is in CSV format, with each row representing an individual survey response. Below is a detailed explanation of each column:
| Column Name | Description | Example Values |
|---|---|---|
| Survey_Date | Date when the survey response was submitted (YYYY-MM-DD) | 2025-06-01 |
| Age | Age of the respondent (in years) | 27, 52, 40 |
| Gender | Gender identity of the respondent | Female, Male, Non-binary, Prefer not to say |
| Region | Geographical region of employment | Asia, Europe, North America, Africa, Oceania |
| Industry | Industry sector of the respondent | Technology, Manufacturing, Finance, Healthcare |
| Job_Role | Specific job title or function | Data Analyst, HR Manager, Software Engineer |
| Work_Arrangement | Primary work mode | Onsite, Remote, Hybrid |
| Hours_Per_Week | Average number of hours worked per week | 36, 55, 64 |
| Mental_Health_Status | Primary self-reported mental health condition | Anxiety, Burnout, Depression, None, PTSD |
| Burnout_Level | Self-assessed burnout (categorical: Low, Medium, High) | High, Medium, Low |
| Work_Life_Balance_Score | Self-rated work-life balance on a scale of 1 (poor) to 5 (excellent) | 1, 3, 5 |
| Physical_Health_Issues | Self-reported physical health complaints (semicolon-separated if multiple) | Back Pain; Eye Strain; Neck Pain; None |
| Social_Isolation_Score | Self-rated social isolation on a scale of 1 (none) to 5 (severe) | 1, 2, 5 |
| Salary_Range | Annual salary range in USD | $40K-60K, $80K-100K, $120K+ |
Example record:

| Field | Value |
|---|---|
| Survey_Date | 2025-06-01 |
| Age | 27 |
| Gender | Female |
| Region | Asia |
| Industry | Professional Services |
| Job_Role | Data Analyst |
| Work_Arrangement | Onsite |
| Hours_Per_Week | 64 |
| Mental_Health_Status | Stress Disorder |
| Burnout_Level | High |
| Work_Life_Balance_Score | 3 |
| Physical_Health_Issues | Shoulder Pain; Neck Pain |
| Social_Isolation_Score | 2 |
| Salary_Range | $40K-60K |
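Loading and parsing the CSV described above can be sketched as follows. The inline sample rows stand in for the real file (whose name is not specified), and `keep_default_na=False` prevents the literal category "None" from being read as missing data:

```python
import io
import pandas as pd

# Inline sample standing in for the real CSV file; rows follow the
# column layout documented above.
sample = io.StringIO(
    "Survey_Date,Age,Gender,Region,Industry,Job_Role,Work_Arrangement,"
    "Hours_Per_Week,Mental_Health_Status,Burnout_Level,"
    "Work_Life_Balance_Score,Physical_Health_Issues,"
    "Social_Isolation_Score,Salary_Range\n"
    "2025-06-01,27,Female,Asia,Professional Services,Data Analyst,Onsite,"
    "64,Stress Disorder,High,3,Shoulder Pain; Neck Pain,2,$40K-60K\n"
    "2025-06-02,40,Male,Europe,Technology,Software Engineer,Remote,"
    "36,None,Low,5,None,1,$80K-100K\n"
)
# keep_default_na=False: "None" is a real category here, not missing data.
df = pd.read_csv(sample, parse_dates=["Survey_Date"], keep_default_na=False)

# Physical_Health_Issues is semicolon-separated when multiple; split it
# into a clean list per row.
df["Physical_Health_Issues"] = df["Physical_Health_Issues"].apply(
    lambda s: [part.strip() for part in s.split(";")]
)

# Example aggregation: mean weekly hours by work arrangement.
hours_by_mode = df.groupby("Work_Arrangement")["Hours_Per_Week"].mean()
```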
Research dissemination and knowledge translation are imperative in social work. Methodological developments in data visualization techniques have improved the ability to convey meaning and reduce erroneous conclusions. The purpose of this project is to examine: (1) How are empirical results presented visually in social work research? (2) To what extent do top social work journals vary in the publication of data visualization techniques? (3) What is the predominant type of analysis presented in tables and graphs? (4) How can current data visualization methods be improved to increase understanding of social work research? Method: A database was built from a systematic literature review of the four most recent issues of Social Work Research and 6 other highly ranked journals in social work based on the 2009 5-year impact factor (Thomson Reuters ISI Web of Knowledge). Overall, 294 articles were reviewed. Articles without any form of data visualization were not included in the final database. The number of articles reviewed by journal includes: Child Abuse & Neglect (38), Child Maltreatment (30), American Journal of Community Psychology (31), Family Relations (36), Social Work (29), Children and Youth Services Review (112), and Social Work Research (18). Articles with any type of data visualization (table, graph, other) were included in the database and coded sequentially by two reviewers based on the type of visualization method and type of analyses presented (descriptive, bivariate, measurement, estimate, predicted value, other). Additional review was required from the entire research team for 68 articles. Codes were discussed until 100% agreement was reached. The final database includes 824 data visualization entries.
License: Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sample characteristics of students in transdisciplinary (TD) and traditional doctoral programs at time of enrollment and advisor characteristics at program year 5.
License: Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Regression Results, Poisson Model, Equation (5): Economics. Standard errors in parentheses.
This dataset contains characterization factors (CFs) for the five mandatory life cycle impact assessment (LCIA) categories required in ISO 21930:2017:

1. Greenhouse gases (GHG), which is incorrectly named ‘GWP’ in the standard
2. Ozone Depletion Potential (ODP)
3. Eutrophication Potential (EP)
4. Acidification Potential (AP)
5. Photochemical Ozone Formation Potential (POCP)

These CFs are appropriate for use with life cycle inventory data for activities occurring within the United States. The short name for the dataset is ISO21930-LCIA-US v0.1. The characterization factors, with the exception of GHGs, are identical to those currently in TRACI v2.1 for the corresponding impact categories. The four TRACI v2.1 impact categories have the same names as in ISO 21930:2017, with the exception of POCP, which is called “smog formation” in TRACI v2.1. The characterization factors for GHGs are the 100-year GWPs (GWP-100) from the Intergovernmental Panel on Climate Change (IPCC) 5th Assessment Report (AR5). The names for the chemicals, release contexts, units and IDs are from the Federal LCA Elementary Flow List (FEDEFL) v1.2. These datasets were created using the LCIA Formatter v1.1.2 (https://github.com/USEPA/LCIAformatter).

Formats
Datasets are provided as simple tables in Excel, in the openLCA JSON-LD format using Federal LCA Commons standards, and in Apache parquet format. The fields in the Excel and identical parquet versions use the LCIAmethod format fields: https://github.com/USEPA/LCIAformatter/blob/master/format%20specs/LCIAmethod.md

1. Zip archives of JSON files in the JSON-LD schema (a file type associated with the openLCA schema). Two JSON-LD versions are provided:
   a. “ISO21930-LCIA-USv0.1_noflows_json-ld.zip” is without flow objects.
   b. “ISO21930-LCIA-USv0.1_wprefflows_json-ld.zip” is with flow objects of preferred flows from the FEDEFL. See usage notes below.
2. Excel and parquet: tabular format according to the schema from the LCIA Formatter, with additional fields added:
   - “source_method”: the original method source for the indicator (e.g., TRACI 2.1 or IPCC)
   - “source_indicator”: the name of the indicator in its original form (e.g., Smog Formation)
   - “category”: the desired parent folder name for the impact category (shown as “EPA EPD” in Figure 1)

Usage
Generally, in all formats, the CFs can be multiplied by the kg (or the unit specified in the denominator) of the relevant chemical emitted to calculate the potential impact value for a given impact category. If no CF exists for a chemical in a given impact category, the chemical is not considered to have an impact in that category. The parquet format is most efficient for import into applications or scripts using languages like Python and R. The Zip archives of JSON-LD files can be loaded into openLCA or other LCA or EPD software supporting that format. When loaded into openLCA (via JSON-LD), the method shows as a separate impact assessment method; individual indicators are categorized within the EPA EPD category. For an introduction to importing a dataset into openLCA, we recommend this training video from the National Renewable Energy Laboratory: https://youtu.be/YLao5jC5b_0?si=H0SNZ_ufOwInkgCF&t=48

The version with no flows is designed to be imported into a database that already has FEDEFL elementary flows, or when no further modeling will be done that would use any new flows; it creates only the LCIA method. The version with flows can be imported into a new, empty database and will create not just the LCIA method but all associated flows and more basic objects such as units and flow properties. Use it when no process data that you wish to model has been created yet and/or you want a full import of all relevant elementary flows.
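The multiply-and-sum usage rule above can be illustrated with a minimal sketch. All values here are placeholders, not the dataset's actual factors (methane and nitrous oxide are shown with their AR5 GWP-100 values of 28 and 265; the dataset's own tables are the authoritative source):

```python
# Hypothetical GHG characterization factors in kg CO2-eq per kg emitted.
cfs_ghg = {"carbon dioxide": 1.0, "methane": 28.0, "nitrous oxide": 265.0}

# Hypothetical life cycle inventory: kg of each chemical emitted.
inventory = {"carbon dioxide": 120.0, "methane": 0.5, "benzene": 0.1}

def impact_score(cfs, inventory):
    """Sum CF * amount over the inventory; a chemical with no CF in this
    category (here, benzene) is treated as having no impact in it."""
    return sum(cf * inventory.get(chem, 0.0) for chem, cf in cfs.items())

ghg_total = impact_score(cfs_ghg, inventory)  # 120*1.0 + 0.5*28.0 = 134.0
```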
License: World Bank Terms of Use for Datasets https://www.worldbank.org/en/about/legal/terms-of-use-for-datasets
Title: Mortality Rate (Under-5, Per 1000 Live Births)
Subtitle: Exploring global trends in child survival and health advancements.
Detailed Description:
This dataset contains the under-5 mortality rate, measured as the number of deaths per 1,000 live births for children under five years of age. Sourced from the World Bank, it highlights progress in child survival and health outcomes globally over decades.
Key Highlights: - Annual data for countries worldwide. - Metric: Mortality rate (under-5, per 1000 live births). - Use cases: Analyze trends, compare regional disparities, and correlate mortality rates with health and economic indicators.
Data Cleaning:
Visualizations:
Descriptive Analysis:
Create a Kaggle notebook with: 1. Data Cleaning: Show how missing or inconsistent values are handled. 2. EDA: Include visualizations like heatmaps, scatterplots, and line charts. 3. Insights: Highlight significant findings, such as countries with notable improvements in child survival. 4. Optional Predictive Modeling: Use regression or time-series models to project future trends.
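A trend-analysis step like the one suggested above can be sketched as follows. The column names and values are hypothetical; the actual World Bank download is often wide-format (one column per year) and would need a `melt()` first:

```python
import pandas as pd

# Hypothetical long-format frame with one row per country-year.
df = pd.DataFrame({
    "Country": ["Nigeria", "Nigeria", "Norway", "Norway"],
    "Year": [2000, 2020, 2000, 2020],
    "Mortality_Rate": [186.0, 112.0, 4.5, 2.3],
})

# Change per country between the earliest and latest year on record.
change = (
    df.sort_values("Year")
      .groupby("Country")["Mortality_Rate"]
      .agg(first="first", last="last")
      .assign(pct_change=lambda t: 100 * (t["last"] - t["first"]) / t["first"])
)
```

Sorting negative `pct_change` values then surfaces the countries with the largest improvements in child survival.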
GitHub Link: https://github.com/yourusername/Under5_Mortality_Trends
Kaggle Link: https://www.kaggle.com/datasets/yourusername/under5-mortality-rate
Post Title:
📉 Global Trends in Under-5 Mortality Rates 🌍
Post Body:
I’m excited to share my latest dataset on under-5 mortality rates (per 1,000 live births), sourced from the World Bank. This dataset highlights progress in global health and child survival, spanning decades and covering countries worldwide.
📂 Explore the Dataset:
- GitHub Repository: https://github.com/yourusername/Under5_Mortality_Trends
- Kaggle Dataset: https://www.kaggle.com/datasets/yourusername/under5-mortality-rate
Child survival is a fundamental measure of global health progress. This dataset is ideal for:
- Trend Analysis: Explore how under-5 mortality rates have evolved globally.
- Regional Comparisons: Identify disparities in child survival rates across regions.
- Correlations: Study the relationship between mortality rates and economic indicators like healthcare expenditure or GDP per capita.
📈 Get Involved:
- Use the dataset for your own analyses and visualizations.
- Share your insights and findings.
- Upvote the Kaggle dataset to help others discover it!
❓ What trends or correlations do you find in the data?
- Which country or region has shown the most improvement?
- What factors would you analyze further?
Let me know your thoughts, and feel free to share this resource with others who might benefit! 🌟
Let me know if you'd like assistance with EDA or visualization templates!
License: CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and as a baseline for future studies of ag research data.
Purpose
As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to:
- establish where agricultural researchers in the United States -- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, domain-specific databases, and the top journals
- compare how much data is in institutional vs. domain-specific vs. federal platforms
- determine which repositories are recommended by top journals that require or recommend the publication of supporting data
- ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data
Approach
The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and hold some amount of ag data, we analyzed resources including re3data, libguides, and ARS lists. Databases that are primarily environmental or public health focused were not included, but places where ag grantees would publish data were considered.
Search methods
We first compiled a list of known domain specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” /“ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain specific, though some contained a mix of data subjects.
We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag specific university repositories and general university repositories that housed a portion of agricultural data. Ag specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied. General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.
Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo.
Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. We compiled extensive lists of journals in which USDA published in 2012 and 2016, combining search results from ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites of the Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources and Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The author instructions of the top 50 journals were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.
Data are provided for journals based on the 2012 and 2016 studies of where USDA employees publish their research, ranked by number of articles. Fields include the 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required?, and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation
We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results.
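The bookkeeping above can be sketched in a few lines of Python; the repository total and per-term hit counts below are hypothetical placeholders, not figures from our review.

```python
def evaluate_terms(total_datasets, term_hits):
    """Flag each search term against the thresholds used in the review:
    at least 1% / 5% of the collection, and more than 100 / 500 results."""
    rows = []
    for term, hits in term_hits.items():
        pct = hits / total_datasets * 100
        rows.append({
            "term": term,
            "hits": hits,
            "pct_of_collection": round(pct, 2),
            "at_least_1pct": pct >= 1.0,
            "at_least_5pct": pct >= 5.0,
            "over_100": hits > 100,
            "over_500": hits > 500,
        })
    return rows

# Hypothetical repository with 10,000 datasets
for row in evaluate_terms(10_000, {"agriculture": 650, "ag data": 90}):
    print(row)
```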
We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.
Results
A summary of the major findings from our data review:
Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.
There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.
Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.
See the included README file for descriptions of each individual data file in this dataset. Resources in this dataset:
Resource Title: Journals. File Name: Journals.csv
Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
License: CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)
Background: Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the "citation benefit". Furthermore, little is known about patterns in data reuse over time and across datasets. Method and Results: Here, we look at citation rates while controlling for many known citation predictors, and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers. 
The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties. Conclusion: After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
License: Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada). License information was derived automatically.
Health Reports, published by the Health Analysis Division of Statistics Canada, is a peer-reviewed journal of population health and health services research. It is designed for a broad audience that includes health professionals, researchers, policymakers, and the general public. The journal publishes articles of wide interest that contain original and timely analyses of national or provincial/territorial surveys or administrative databases. New articles are published electronically each month. Health Reports had an impact factor of 2.673 for 2014 and a five-year impact factor of 4.167. All articles are indexed in PubMed. Our online catalogue is free and receives more than 500,000 visits per year. External submissions are welcome.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/). License information was derived automatically.
Includes articles sponsored by a glitazone company together with another drug company and other non-drug-company funding (n = 2), and by a glitazone company and a non-drug company (n = 2). Data from 5 articles were excluded because they were published in journals that had no impact factor. For the N = 56 trials reported in journals with impact factors, the median impact factor was 2.84, the mean 4.63, the range 0.34–44.02, and the standard deviation σ = 6.06. **Sample size characteristics: median 252, mean 390, range 20–4360, standard deviation σ = 590.4.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/). License information was derived automatically.
SPID is a comprehensive dataset composed of synthetic particle image velocimetry (PIV) image pairs and their corresponding exact optical flow computations. It serves as a valuable resource for researchers and practitioners in the field. The dataset is organized into three subsets: training, validation, and test, distributed in a ratio of 70%, 15%, and 15%, respectively.
Each subset within SPID consists of an input denoted as "x", which comprises synthetic image pairs. These image pairs provide the necessary context for the optical flow computations. Additionally, an output termed "y" is provided, which represents the exact optical flow calculated for each image pair. Notably, the images within the dataset are single-channel, and the optical flow is decomposed into its u and v components.
The shape of the input subsets in SPID is given by (number of samples, number of frames, image width, image height, number of channels), representing the dimensions of the input data. On the other hand, the shape of the output subsets is given by (number of samples, velocity components, image width, image height), denoting the shape of the optical flow data.
It is important to mention that the SPID dataset is a preprocessed version of the Raw Synthetic Particle Image Dataset (RSPID), ensuring improved usability and reliability. Moreover, the dataset is packaged as a NumPy compressed NPZ file, which stores the inputs and outputs as separate NumPy arrays with train, validation, and test labels as access keys. This format simplifies data extraction and integration into machine learning frameworks and libraries, facilitating seamless usage of the dataset.
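As a concrete illustration of this packaging, the sketch below writes and reads a tiny NPZ archive with the shapes described here; the key names (`x_train`, `y_train`) and the archive filename are assumptions, since the exact access keys are not spelled out in this description.

```python
import numpy as np

# Tiny stand-in for the described packaging; the real SPID arrays are much
# larger. Key names "x_train"/"y_train" are an assumption based on the
# "x"/"y" inputs/outputs and the train/validation/test labels described above.
x_train = np.zeros((4, 2, 665, 630, 1), dtype=np.uint8)   # (samples, frames, W, H, channels)
y_train = np.zeros((4, 2, 665, 630), dtype=np.float32)    # (samples, u/v components, W, H)
np.savez_compressed("spid_sketch.npz", x_train=x_train, y_train=y_train)

with np.load("spid_sketch.npz") as data:
    print(data["x_train"].shape)  # (4, 2, 665, 630, 1)
    print(data["y_train"].shape)  # (4, 2, 665, 630)
```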
SPID incorporates various factors that impact PIV analysis to provide a comprehensive and realistic simulation. The dataset includes image pairs with an image width of 665 pixels and an image height of 630 pixels, ensuring a high level of detail and accuracy with an 8-bit depth. It incorporates different particle radii (1, 2, 3, and 4 pixels) and particle densities (15, 17, 20, 23, 25, and 32 particles) to capture diverse particle configurations.
To simulate real-world scenarios, SPID introduces displacement variations through the delta x factor, ranging from 0.05% to 0.25%. Noise levels (1, 5, 10, and 15) are also incorporated to mimic practical PIV measurements with varying degrees of noise. Furthermore, out-of-plane motion effects are considered with standard deviations of 0.01, 0.025, and 0.05 to assess their impact on optical flow accuracy.
The dataset covers a wide range of flow patterns encountered in fluid dynamics. It includes Rankine uniform, Rankine vortex, parabolic, stagnation, shear, and decaying vortex flows, allowing for comprehensive testing and evaluation of PIV algorithms across different scenarios.
By leveraging the SPID dataset, researchers can develop and validate PIV algorithms and techniques under various challenging conditions. Its realistic and diverse simulation of particle image velocimetry scenarios makes it an invaluable tool for advancing the field and improving the accuracy and reliability of optical flow computations.
License: MIT License (https://opensource.org/licenses/MIT). License information was derived automatically.
This dataset contains synthetic salary data generated to explore the impact of various factors on salary predictions. It includes attributes such as age, education level, years of experience, certifications, GPA, and job roles, providing a realistic dataset for machine learning, data analysis, and salary estimation models.
Researchers and data scientists can use this dataset to study patterns, perform feature engineering, and develop predictive models for salary forecasting.
| Column Name | Data Type | Description |
|---|---|---|
| Age | int | Age of the individual (18-65 years) |
| Gender | category | Gender of the individual (Male, Female, Non-Binary) |
| Education_Level | category | Highest education degree (High School, Bachelor's, Master's, PhD) |
| Years_of_Experience | int | Total years of work experience (0-40 years) |
| Certifications | int/float | Number of professional certifications obtained (0-5) |
| GPA | float | Grade Point Average (GPA) from 0.0 to 4.0 (with some missing values) |
| Job_Role | category | Job designation (Data Scientist, Software Engineer, Manager, etc.) |
| Industry | category | Industry sector (Tech, Finance, Healthcare, Education, etc.) |
| Company_Size | category | Size of the company (Small, Medium, Large) |
| Location | category | Work location (Urban, Suburban, Rural) |
| Remote_Work | binary | Whether the individual works remotely (0 = No, 1 = Yes) |
| Salary | float | Annual salary in USD ($30,000 - $300,000) |
Missing values are present in GPA and Certifications to simulate real-world scenarios. Categorical columns (Gender, Education_Level, Job_Role, etc.) need encoding before ML model training. Age, Years_of_Experience, and Salary should be normalized for better model performance. This dataset is ideal for:
✅ Salary Prediction Models – Predict salaries based on experience, education, and industry.
✅ Feature Importance Analysis – Identify which factors contribute most to salary variations.
✅ Exploratory Data Analysis (EDA) – Discover salary trends across different demographics.
✅ Machine Learning Applications – Train regression or classification models for salary forecasting.
Here’s how you can use this dataset for salary prediction:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
# Load dataset
df = pd.read_csv("salary_dataset.csv")
# Handle missing values (plain assignment; fillna with inplace=True on a
# column slice is deprecated in recent pandas)
df["GPA"] = df["GPA"].fillna(df["GPA"].median())
df["Certifications"] = df["Certifications"].fillna(0)
# Encode categorical variables
categorical_features = ["Gender", "Education_Level", "Job_Role", "Industry", "Company_Size", "Location"]
df = pd.get_dummies(df, columns=categorical_features, drop_first=True)
# Define X, y
X = df.drop(columns=["Salary"])
y = df["Salary"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Predict
predictions = model.predict(X_test)
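To gauge how well the model fits, one can score the held-out predictions. Since the CSV itself is not reproduced here, the sketch below substitutes a small synthetic stand-in with a few of the numeric columns; the generated salary relationship is an assumption for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for salary_dataset.csv (numeric columns only);
# salary here is driven mostly by experience, plus noise.
rng = np.random.default_rng(42)
n = 300
frame = pd.DataFrame({
    "Age": rng.integers(18, 66, n),
    "Years_of_Experience": rng.integers(0, 41, n),
    "Certifications": rng.integers(0, 6, n),
    "GPA": rng.uniform(0.0, 4.0, n),
})
frame["Salary"] = 30_000 + 5_000 * frame["Years_of_Experience"] + rng.normal(0, 10_000, n)

X_tr, X_te, y_tr, y_te = train_test_split(
    frame.drop(columns=["Salary"]), frame["Salary"], test_size=0.2, random_state=42)
reg = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)
pred = reg.predict(X_te)
print(f"MAE: {mean_absolute_error(y_te, pred):,.0f}  R^2: {r2_score(y_te, pred):.2f}")
```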
💡 Download the dataset now and start exploring salary trends! 🚀
License: MIT License (https://opensource.org/licenses/MIT). License information was derived automatically.
This dataset represents a medium-sized Canadian bookstore business operating three retail locations across Calgary (Downtown, NW, SE) and a central warehouse.
It covers 2019 to 2024, including the COVID-19 impact years (2020-2021) and the post-pandemic recovery with inflation-adjusted growth. The data integrates finance, operations, HR, and customer analytics, making it well suited to data portfolio projects, KPI tracking, and realistic bookkeeping simulations.
Time span: 2019 – 2024
Locations: Calgary -> Downtown (DT), NW, SE
Currency: Canadian Dollars (CAD)
Tax context: Alberta GST 5%, no provincial PST
Inflation factor: 1.00 → 1.18 (2019 → 2024) applied to payroll, sales, and loan interest
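A linear ramp is one way to read the stated inflation factor; the per-year schedule below is an assumption, since only the 2019 and 2024 endpoints are given.

```python
def inflation_factor(year, start=2019, end=2024, f0=1.00, f1=1.18):
    """Linearly interpolated inflation factor between the stated endpoints."""
    return f0 + (f1 - f0) * (year - start) / (end - start)

# Hypothetical CAD payroll figure adjusted to 2022
print(round(50_000 * inflation_factor(2022), 2))  # 55400.0
```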
This dataset is fully synthetic and designed for:
- Business intelligence dashboards
- Machine learning demos (forecasting, regression, clustering)
- Financial and accounting analysis training
- Data-cleaning and EDA (Exploratory Data Analysis) tutorials
This dataset is released under the MIT License, free to use for research, learning, or commercial purposes.
Photo: by Pixabay, free to use.