61 datasets found
  1. OpenAlex

    • redivis.com
    • huggingface.co
    application/jsonl +7
    Updated Apr 15, 2025
    Cite
    Kellogg School of Management (2025). OpenAlex [Dataset]. https://redivis.com/datasets/a08b-162382x4j
    Explore at:
    Available download formats: arrow, spss, stata, csv, parquet, application/jsonl, avro, sas
    Dataset updated
    Apr 15, 2025
    Dataset provided by
    Redivis Inc.
    Authors
    Kellogg School of Management
    Description

    Methodology

    OpenAlex is a fully open catalog of the global research system. It's named after the ancient library of Alexandria and made by the non-profit OurResearch.

    The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. Types of entities include works, authors, sources, institutions, topics, publishers, and funders. Together, these make a huge web (or more technically, a heterogeneous directed graph) of hundreds of millions of entities and billions of connections between them all.

    OpenAlex offers an open replacement for industry-standard scientific knowledge bases like Elsevier's Scopus and Clarivate's Web of Science. Compared to these paywalled services, OpenAlex offers significant advantages in terms of inclusivity, affordability, and availability.

    The data here are derived from the snapshot data, which is updated about once per month. The raw data are stored on Amazon S3 in the publicly available openalex bucket as gzip-compressed JSON lines files. We use custom functions in Python code to flatten these records into the relational database hosted here on Redivis.
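
    As a rough illustration of the flattening step described above, the sketch below reads gzip-compressed JSON lines and splits each nested work record into flat rows. It is not the Redivis code: the function names and the output column names are hypothetical, and the input fields used ("id", "title", "publication_year", "authorships") are assumed to follow the OpenAlex work schema.

```python
import gzip
import io
import json


def read_gzip_jsonl(raw_bytes):
    """Yield decoded text lines from gzip-compressed JSON lines bytes."""
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8") as handle:
        yield from handle


def flatten_works(lines):
    """Split nested work records into two flat row lists.

    Output column names are hypothetical; input field names are assumed
    to match the OpenAlex work schema.
    """
    works, authorships = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        works.append({
            "work_id": record.get("id"),
            "title": record.get("title"),
            "publication_year": record.get("publication_year"),
        })
        # One row per (work, author) pair, keeping the author order.
        for position, authorship in enumerate(record.get("authorships") or []):
            author = authorship.get("author") or {}
            authorships.append({
                "work_id": record.get("id"),
                "author_position": position,
                "author_id": author.get("id"),
            })
    return works, authorships
```

    Keeping works and authorships in separate row lists mirrors the usual relational layout, where a one-to-many relation gets its own table keyed by the parent ID.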

    The live data are also available for free via the REST API.

    Usage

    If you use OpenAlex in research, please cite this paper:

    Priem, J., Piwowar, H., & Orr, R. (2022).

    OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. ArXiv. https://arxiv.org/abs/2205.01833

  2. OpenAlex Affiliations Corrections

    • data.smartidf.services
    • data.enseignementsup-recherche.gouv.fr
    csv, excel, json
    Updated Nov 28, 2024
    Cite
    (2024). OpenAlex Affiliations Corrections [Dataset]. https://data.smartidf.services/explore/dataset/openalex-affiliations-corrections/
    Explore at:
    Available download formats: csv, json, excel
    Dataset updated
    Nov 28, 2024
    License

    Licence Ouverte / Open Licence 2.0 (https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
    License information was derived automatically

    Description

    This dataset lists all the issues on the GitHub repository https://github.com/dataesr/openalex-affiliations/. The dataset is updated every day at 7 AM GMT through a GitHub action on this repository: https://github.com/dataesr/openalex-affiliations/blob/main/.github/workflows/sync_openalex_affiliations_github_issues.yml. It is a list of corrections of OpenAlex affiliations.

    Fields:

    • "github_issue_id": integer, issue number according to GitHub
    • "github_issue_link": string, web link to the GitHub issue
    • "state": string, current state of the issue, can be "open" or "closed". An issue is closed if it has been ingested by OpenAlex.
    • "date_opened": date, when the issue was opened, e.g. "2024-11-15"
    • "date_closed": date, when the issue was closed, null if not closed, e.g. "2024-11-18"
    • "raw_affiliation_name": string, raw affiliation string as collected by OpenAlex
    • "has_added_rors": boolean, whether the correction suggests adding a ROR, 1 if true, 0 if false
    • "has_removed_rors": boolean, whether the correction suggests removing a ROR, 1 if true, 0 if false
    • "new_rors": string, list of corrected RORs, separated by ";"
    • "previous_rors": string, list of RORs before correction, separated by ";"
    • "added_rors": string, list of added RORs after correction, separated by ";"
    • "removed_rors": string, list of removed RORs after correction, separated by ";"
    • "openalex_works_examples": string, web link to an OpenAlex work mentioning the affiliation string
    • "searched_between": searched years range, e.g. "2018 - 2024"
    • "contact": string, encrypted version of the email of the author of the correction; only the domain name is not encrypted
    • "contact_domain": string, domain name of the contact email
    • "version": version of the works-magnet app used to collect the correction
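
    A small sketch of how the ";"-separated ROR fields might be handled once a row has been loaded as a dictionary. The helper names are hypothetical, and the sketch assumes that the added and removed RORs can be recomputed by comparing "previous_rors" with "new_rors"; the dataset description does not state how its own "added_rors" and "removed_rors" columns are derived.

```python
def split_rors(value):
    """Split a ';'-separated ROR string into a list, ignoring blanks."""
    if not value:
        return []
    return [part.strip() for part in value.split(";") if part.strip()]


def ror_changes(row):
    """Recompute added and removed RORs for one correction row.

    Assumption: added = in new_rors but not in previous_rors,
    removed = in previous_rors but not in new_rors.
    """
    previous = split_rors(row.get("previous_rors"))
    new = split_rors(row.get("new_rors"))
    return {
        "added": [ror for ror in new if ror not in previous],
        "removed": [ror for ror in previous if ror not in new],
        "is_closed": row.get("state") == "closed",
    }
```

    Comparing the recomputed lists with the published "added_rors" and "removed_rors" columns would be one way to sanity-check a download.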

  3. OpenAlex Snapshot - 23-08-21 (Complete)

    • academictorrents.com
    bittorrent
    Updated Aug 22, 2023
    Cite
    Jason Priem and Heather Piwowar and Richard Orr (2023). OpenAlex Snapshot - 23-08-21 (Complete) [Dataset]. https://academictorrents.com/details/c29c888839bf0512043c770e78ddbe321ea6567b
    Explore at:
    Available download formats: bittorrent (335409170855)
    Dataset updated
    Aug 22, 2023
    Dataset authored and provided by
    Jason Priem and Heather Piwowar and Richard Orr
    License

    https://academictorrents.com/nolicensespecified

    Description

    OpenAlex is a new, fully-open scientific knowledge graph (SKG), launched to replace the discontinued Microsoft Academic Graph (MAG). It contains metadata for 209M works (journal articles, books, etc); 2013M disambiguated authors; 124k venues (places that host works, such as journals and online repositories); 109k institutions; and 65k Wikidata concepts (linked to works via an automated hierarchical multi-tag classifier). The dataset is fully and freely available via a web-based GUI, a full data dump, and high-volume REST API. The resource is under active development and future work will improve accuracy and coverage of citation information and author/institution parsing and deduplication.

    From: Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. ArXiv.

    Upload details: Downloaded a copy from the AWS endpoint s3://openalex on 2023-08-21. Updates are rolling, so future

  4. openalex-topic-title-abstract

    • huggingface.co
    Updated Feb 12, 2025
    Cite
    Albert Martínez (2025). openalex-topic-title-abstract [Dataset]. https://huggingface.co/datasets/albertmartinez/openalex-topic-title-abstract
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 12, 2025
    Authors
    Albert Martínez
    Description

    albertmartinez/openalex-topic-title-abstract dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. OpenAlex Affiliations Corrections | gimi9.com

    • gimi9.com
    Updated Nov 28, 2024
    Cite
    (2024). OpenAlex Affiliations Corrections | gimi9.com [Dataset]. https://gimi9.com/dataset/fr_674e751add1d43173f113fcf/
    Explore at:
    Dataset updated
    Nov 28, 2024
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset lists all the issues on the GitHub repository https://github.com/dataesr/openalex-affiliations/. The dataset is updated every day at 2 AM through a GitHub action on this repository. It is a list of corrections of OpenAlex affiliations.

    Fields:

    • "github_issue_id": integer, issue number according to GitHub
    • "github_issue_link": string, web link to the GitHub issue
    • "state": string, current state of the issue, can be "open" or "closed". An issue is closed if it has been processed by OpenAlex.
    • "date_opened": date, when the issue was opened, e.g. "2024-11-15"
    • "date_closed": date, when the issue was closed, null if not closed, e.g. "2024-11-18"
    • "raw_affiliation_name": string, raw affiliation string as collected by OpenAlex
    • "has_added_rors": boolean, whether the correction suggests adding a ROR, 1 if true, 0 if false
    • "has_removed_rors": boolean, whether the correction suggests removing a ROR, 1 if true, 0 if false
    • "new_rors": string, list of corrected RORs, separated by ";"
    • "previous_rors": string, list of RORs before correction, separated by ";"
    • "added_rors": string, list of added RORs after correction, separated by ";"
    • "removed_rors": string, list of removed RORs after correction, separated by ";"
    • "openalex_works_examples": string, web link to an OpenAlex work mentioning the affiliation string
    • "contact": string, encrypted version of the email of the author of the correction; only the domain name is not encrypted
    • "contact_domain": string, domain name of the contact email

  6. University of Arizona authors' scholarly works published and cited works...

    • arizona.figshare.com
    txt
    Updated May 1, 2025
    + more versions
    Cite
    Yan Han (2025). University of Arizona authors' scholarly works published and cited works year 2020 from OpenAlex [Dataset]. http://doi.org/10.25422/azu.data.28746095.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 1, 2025
    Dataset provided by
    University of Arizona Research Data Repository
    Authors
    Yan Han
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Two datasets: works_published and works_cited for the year 2020 from the OpenAlex database.

    License: see https://github.com/ourresearch/openalex-docs/blob/main/license.md: "OpenAlex data is made available under the CC0 license. That means it's in the public domain, and free to use in any way you like. We appreciate attribution where it's convenient, but it's not at all necessary. There is one exception: the MAG Format snapshot is released under ODC-BY, as per the original MAG license applied by Microsoft (it reuses their schema). See the LICENSE.txt file in the MAG format snapshot distribution for attribution requirement details."

    Data Quality Considerations: OpenAlex has improved the accuracy of the data with help from algorithms and institutions. Our current data quality assessment showed precision and recall of 95%+.

    The first dataset, "works_published", as constructed in the provided sources, refers to the publications authored by individuals affiliated with the University of Arizona (UArizona). The data is retrieved using the openalexR package by querying the OpenAlex database with UArizona's Research Organization Registry (ROR) ID (03m2x1q45) and specific publication date ranges. Key aspects of this dataset:

    • Scope: It contains records of scholarly works associated with UArizona authors, including various publication types such as journals, repositories (like PubMed and arXiv), and others. It is also possible to filter the results to include only "journal" type publications using the primary_location.source.type = "journal" parameter in the oa_fetch function.
    • Temporal Coverage: The sources demonstrate fetching data for specific years (e.g., 2019, 2020, 2021, 2022, 2023).
    • Data Retrieval: The process involves using the oa_fetch function from the openalexR package with the entity="works" parameter and specifying the institutions.ror.
    • Data Structure: Each record in this dataset represents a publication and includes various fields. Certain fields are data frames.
    • Usage: This dataset is used as a starting point for various data analyses and data mining.

    The second dataset, "works_cited", refers to scholarly works cited by the publications within the works_published dataset. It is created by extracting the OpenAlex IDs from the $referenced_works field of the works_published data and then using the oa_fetch function to retrieve the full metadata for these cited works. Key aspects of this dataset:

    • Scope: It includes metadata for a wide range of scholarly works that have been cited by UArizona-affiliated publications. This can encompass articles, books, preprints, book chapters, and other types of scholarly outputs.
    • Data Derivation: The dataset is derived from the referenced_works field of the works_published dataset.
    • Data Structure: Each record in this dataset represents a cited work and contains various fields retrieved by the OpenAlex API.

    The third file (institution_publications.r) is the source code used to produce the above datasets. Note that the code retrieves additional years besides 2020.

    Usage: Both datasets are crucial for performing publication and citation analysis and mining, including:

    • Identifying the most frequently cited works and journals.
    • Analyzing the journal usage and publisher distribution of cited works.
    • Understanding the scholarly landscape influencing UArizona research.
    • Identifying potential resources for library collections based on citation frequency.
    • Investigating the presence and frequency of citations from specific publishers or to specific works.

    For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu

    This item is part of University of Arizona authors' scholarly works published and cited works
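
    The derivation of works_cited from referenced_works can be sketched in a few lines. The dataset's own code is in R (institution_publications.r); the Python below is only an illustration, with a hypothetical function name, and it assumes each published work is a dictionary carrying a "referenced_works" list of OpenAlex IDs.

```python
from collections import Counter


def count_cited_works(works_published):
    """Count how many published works cite each referenced OpenAlex ID.

    Assumes each record is a dict with an optional "referenced_works" list.
    """
    counts = Counter()
    for work in works_published:
        # set() so a reference listed twice in one work is counted once
        counts.update(set(work.get("referenced_works") or []))
    return counts
```

    The keys of the resulting counter are the IDs whose metadata would then be fetched, and counts.most_common(n) gives the most frequently cited works.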

  7. Core sources and core publications in OpenAlex

    • data.niaid.nih.gov
    • zenodo.org
    Updated Oct 2, 2024
    Cite
    Van Eck, Nees Jan (2024). Core sources and core publications in OpenAlex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10949670
    Explore at:
    Dataset updated
    Oct 2, 2024
    Dataset authored and provided by
    Van Eck, Nees Jan
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This data set contains data on core sources and core publications identified in the OpenAlex database (based on the OpenAlex snapshot released on August 30, 2024).

    The source code used to identify core sources and core publications in OpenAlex is available in this GitHub repository.

    See this report for more information about the identification of core sources and core publications in OpenAlex.

    This data set consists of the following tab-delimited files.

    source.tsv

    • source_id
    • source
    • source_type
    • issn_l
    • is_core_source
    • n_works
    • n_core_works

    work.tsv

    • work_id
    • work_type
    • pub_year
    • source_id
    • doi
    • is_core_work
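
    Since both files share the source_id column, they can be joined with the standard csv module. This is a minimal sketch with hypothetical function names; it assumes each file has a header row with the column names listed above, and it makes no assumption about how the is_core_source and is_core_work flags are encoded.

```python
import csv


def read_tsv(handle):
    """Read a tab-delimited file with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def attach_sources(works, sources):
    """Add the source name and type to each work row via source_id."""
    by_id = {source["source_id"]: source for source in sources}
    joined = []
    for work in works:
        source = by_id.get(work["source_id"], {})
        joined.append({
            **work,
            "source": source.get("source"),
            "source_type": source.get("source_type"),
        })
    return joined
```

    Works whose source_id has no match keep their own columns and get None for the two source columns, which makes unmatched rows easy to spot.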

  8. "Whitelist" journals data obtained from OpenAlex

    • zenodo.org
    bin
    Updated May 14, 2025
    Cite
    Irina Volkova (2025). "Whitelist" journals data obtained from OpenAlex [Dataset]. http://doi.org/10.5281/zenodo.15392888
    Explore at:
    Available download formats: bin
    Dataset updated
    May 14, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Irina Volkova
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Time period covered
    Dec 9, 2024
    Description

    The dataset contains two samples of "White List" journals, for December 2024 and February 2025. The data was collected through API queries to OpenAlex and by parsing the JSON file from the RCNI website where the "White List" is located. The following properties of "White List" journal objects were considered: ISSN, title, journal level in the "White List", date accepted, journal ID in OpenAlex, database indexing information (Web of Science, Scopus, DOAJ), publisher and country of publishing, total citations, open access data, leading thematic topic, leading thematic field, and the countries of the authors who published in the journal with the number of publications (if Russian authors were among them, an additional request was made for the number of publications in the journal with Russian authors by year).

    These are SQLite database files that can be opened in DB Browser for SQLite.
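
    The files can also be inspected programmatically. The sketch below uses Python's built-in sqlite3 module to list the tables in a database; the function name is hypothetical, and the table names inside the published files are not documented here, so the query only reads SQLite's own catalog.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an open SQLite connection."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

    With a downloaded file, the connection would come from sqlite3.connect(path), where path is the local file name.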

  9. More open abstracts? Comparing abstract coverage in Crossref and OpenAlex...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 25, 2024
    Cite
    Kramer, Bianca (2024). More open abstracts? Comparing abstract coverage in Crossref and OpenAlex [dataset] [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11580549
    Explore at:
    Dataset updated
    Nov 25, 2024
    Dataset authored and provided by
    Kramer, Bianca
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    Aggregated data underlying the blog post "More open abstracts? Comparing abstract coverage in Crossref and OpenAlex": https://bmkramer.github.io/SesameOpenScience_site/thought/202411_open_abstracts/

    The dataset contains the following files:

    abstracts_crossref_openalex_202410.csv

    abstracts_crossref_openalex_data_dictionary.txt

    The csv file contains data on abstract coverage for Crossref DOIs in Crossref and OpenAlex, aggregated by publisher, for the top 1000 publishers in terms of number of retrieved DOIs. Scope is limited to publications with Crossref type 'journal-articles' and publication years 2022-2024. Variables are described in the data dictionary included in this record.

    This analysis was performed using Curtin Open Knowledge Initiative (COKI) infrastructure, which is documented on GitHub: https://github.com/The-Academic-Observatory. Here, a number of open data sources (including Crossref, OpenAlex and OpenAIRE) are ingested into a Google BigQuery environment, which can then be queried via SQL.

    The following data sources were used:

    Crossref (Metadata Plus snapshot 2024-10-31, Crossref member route API 2024-11-20)

    OpenAlex (data snapshot 2024-10-30)

    The code used to generate the dataset is available on GitHub: https://github.com/bmkramer/more_open_abstracts

  10. Colombia Coauthorship networks divided by openalex lvl 0 concepts

    • data.niaid.nih.gov
    Updated Jul 13, 2023
    Cite
    Gerardo Gutierrez (2023). Colombia Coauthorship networks divided by openalex lvl 0 concepts [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7685127
    Explore at:
    Dataset updated
    Jul 13, 2023
    Dataset authored and provided by
    Gerardo Gutierrez
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Area covered
    Colombia
    Description

    Graphs are uploaded in GML format, which can be easily imported by NetworkX, Gephi and Neo4j.

    The undirected graphs are constructed from the OpenAlex database (last update Dec 2022), where nodes are authors and edges specify whether or not two authors coauthored at least one paper. We excluded papers with more than 10 authors since they are very scarce and could affect the subsequent analysis of the networks.

    The attributes of the nodes consist of the list of words used by each author with their frequencies, and all the concepts (of every level) with which the author's papers are labeled.

    The attributes of the edges contain only the number of papers coauthored by the two authors.
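
    The construction described above can be sketched with the standard library alone. This is not the author's code: the function name is hypothetical, each paper is assumed to be given as a list of author IDs, and the 10-author cutoff is taken from the description.

```python
from collections import Counter
from itertools import combinations

MAX_AUTHORS = 10  # papers with more authors are skipped, as described above


def coauthorship_edges(papers, max_authors=MAX_AUTHORS):
    """Build undirected, weighted coauthorship edges.

    papers: iterable of author-ID lists, one list per paper.
    Returns a Counter mapping (author_a, author_b) to the number of
    shared papers, with each pair stored in sorted order.
    """
    edges = Counter()
    for authors in papers:
        unique = sorted(set(authors))  # drop duplicate IDs within a paper
        if len(unique) > max_authors:
            continue
        for pair in combinations(unique, 2):
            edges[pair] += 1
    return edges
```

    Storing each pair in sorted order keeps the graph undirected, so (A, B) and (B, A) always land on the same edge.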

    For more information about openalex concepts, visit https://docs.openalex.org/api-entities/concepts

  11. Classification of research publications based on data from OpenAlex

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jan 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nees Jan Van Eck (2024). Classification of research publications based on data from OpenAlex [Dataset]. http://doi.org/10.5281/zenodo.10560276
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 24, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nees Jan Van Eck
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This data set contains an algorithmic classification of research publications based on data from OpenAlex. The classification is based on the OpenAlex snapshot released on November 21, 2023.

    To build the classification, we used the so-called extended direct citation approach in combination with the Leiden algorithm. The source code of our software is available here. The classification covers the 71 million journal articles, proceedings papers, preprints, and book chapters in OpenAlex that were published between 2000 and 2023 and that are connected to each other by citation links. Based on 1715 million citation links, we built a three-level hierarchical classification. Each publication was assigned to a cluster at each of the three levels of the classification. Clusters consist of publications that are relatively strongly connected by citation links and that can therefore be expected to be topically related. At each level of the classification, a publication was assigned to only one cluster, which means clusters do not overlap.

    The classification consists of 4521 micro clusters at the lowest (most granular) level, 917 meso clusters at the middle level, and 20 macro clusters at the highest (least granular) level. We also algorithmically linked each cluster in the classification to one or more of the following five broad main fields: biomedical and health sciences, life and earth sciences, mathematics and computer science, physical sciences and engineering, and social sciences and humanities.

    We used the Updated GPT 3.5 Turbo large language model, developed by OpenAI, to label the 4521 micro clusters at the lowest level in the classification. The source code of our software can be found here.

    See this blog post for more information about the classification.

    The classification, including the labels of the micro clusters, is available in the following tab-delimited files.

    clustering.tsv

    • work_id
    • doi
    • macro_cluster_id
    • meso_cluster_id
    • micro_cluster_id

    main_field.tsv

    • main_field_id
    • main_field

    macro_cluster.tsv

    • macro_cluster_id
    • macro_cluster
    • n_works

    macro_cluster_main_field.tsv

    • macro_cluster_id
    • main_field_seq
    • main_field_id
    • weight
    • is_primary_main_field

    meso_cluster.tsv

    • meso_cluster_id
    • meso_cluster
    • parent_macro_cluster_id
    • n_works

    meso_cluster_main_field.tsv

    • meso_cluster_id
    • main_field_seq
    • main_field_id
    • weight
    • is_primary_main_field

    meso_cluster_source.tsv

    • meso_cluster_id
    • source_seq
    • source_id
    • n_works

    micro_cluster.tsv

    • micro_cluster_id
    • micro_cluster
    • short_label
    • long_label
    • keywords
    • summary
    • wikipedia_url
    • parent_macro_cluster_id
    • parent_meso_cluster_id
    • n_works

    micro_cluster_main_field.tsv

    • micro_cluster_id
    • main_field_seq
    • main_field_id
    • weight
    • is_primary_main_field

    micro_cluster_keyword.tsv

    • micro_cluster_id
    • keyword_seq
    • keyword

    micro_cluster_source.tsv

    • micro_cluster_id
    • source_seq
    • source_id
    • n_works
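
    Because the classification is hierarchical and clusters do not overlap, every micro cluster in clustering.tsv should always appear with the same meso and macro cluster. The sketch below is one way to check a download for that property; the function name is hypothetical, and the rows are assumed to be dictionaries keyed by the clustering.tsv column names listed above.

```python
def inconsistent_micro_clusters(rows):
    """Return micro cluster IDs that appear with more than one parent pair.

    rows: iterable of dicts with micro_cluster_id, meso_cluster_id and
    macro_cluster_id keys (the clustering.tsv columns).
    """
    parents = {}
    inconsistent = set()
    for row in rows:
        micro = row["micro_cluster_id"]
        parent = (row["meso_cluster_id"], row["macro_cluster_id"])
        # setdefault stores the first parent pair seen for this micro cluster
        if parents.setdefault(micro, parent) != parent:
            inconsistent.add(micro)
    return sorted(inconsistent)
```

    An empty result means the three ID columns form a clean tree; the rows themselves can be read with csv.DictReader using a tab delimiter.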

  12. Data from: OPENBIB: Selected curated open metadata based on OpenAlex

    • zenodo.org
    bin, csv
    Updated May 5, 2025
    Cite
    Nick Haupka; Jack Culbert; Paul Donner; Najko Jahn; Christopher Lenke; Philipp Mayr; Andreas Meier; Bernhard Mittermaier; Barbara Scheidt; Stephan Stahlschmidt; Niels Taubert (2025). OPENBIB: Selected curated open metadata based on OpenAlex [Dataset]. http://doi.org/10.5281/zenodo.15308680
    Explore at:
    Available download formats: csv, bin
    Dataset updated
    May 5, 2025
    Dataset provided by
    Kompetenznetzwerk Bibliometrie
    Authors
    Nick Haupka; Jack Culbert; Paul Donner; Najko Jahn; Christopher Lenke; Philipp Mayr; Andreas Meier; Bernhard Mittermaier; Barbara Scheidt; Stephan Stahlschmidt; Niels Taubert
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    May 2025
    Description

    This dataset, compiled by the German Kompetenznetzwerk Bibliometrie, provides access to curated bibliometric data in OpenAlex focussing on the German research landscape.

    Curated data is provided for the following entities:

    - Address information
    - Publishers
    - Funding information
    - Document types
    - Transformative agreements
    - Authors (tba)

    For an overview of the tables included, see data-overview.md

    This release is based on the August 2024 snapshot of OpenAlex. The OPENBIB snapshot is offered in both CSV and JSONL format.

    This is an initial release to demonstrate the current state of metadata curation. The aim is to continue these efforts and improve the curation together with the community and data providers.

    Data is made available under the CC0 license.

    Github repository: https://github.com/kbopenbib/kbopenbib_data/

  13. openalex-old

    • huggingface.co
    Updated May 29, 2025
    + more versions
    Cite
    Sumuk's Archived Content (2025). openalex-old [Dataset]. https://huggingface.co/datasets/sumukshashidhar-archive/openalex-old
    Explore at:
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    Sumuk's Archived Content
    Description

    sumukshashidhar-archive/openalex-old dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. Data from: 2024 dataset on independent researchers collected from OpenAlex

    • data.niaid.nih.gov
    • repository.uantwerpen.be
    Updated Apr 22, 2024
    Cite
    Hertil Lindelöw, Camilla (2024). 2024 dataset on independent researchers collected from OpenAlex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10925111
    Explore at:
    Dataset updated
    Apr 22, 2024
    Dataset provided by
    Hertil Lindelöw, Camilla
    Vandewalle, Eline
    License

    https://creativecommons.org/public-domain/

    Description

    This dataset belongs to a paper about independent researchers submitted for the STI conference 2024 (https://sti2024.org/). It consists of several files described below. The data is from OpenAlex, collected through the InSySPo instance of the February snapshot of OpenAlex, hosted on Google Cloud. Since Topics are a new feature of OpenAlex data and therefore not part of the snapshot, this data, as well as some other data not available at the InSySPo instance at the time of collection, has been collected through the OpenAlex API and incorporated in the files. Data from Scopus and Web of Science may be retrieved by using the search string in the appendix of the article.

    Files all domains

    240307_open_alex_works.tsv

    contains all works retrieved with the search string for Independent researchers in OpenAlex in the article's appendix.

    Files Social Sciences and/or Arts & Humanities

    240312_open_alex_works_soc_sci_arts_2010.tsv

    contains articles by Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex.

    240312_open_alex_authors_soc_sci_arts_2010.tsv

    contains authors who are Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex.

    240313_open_alex_authors_all_works_soc_sci_arts_2010.tsv

    contains all works by Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex. All works mean that the researcher has at least once indicated independent status in the affiliation, and the author's other works are also included.

    author_distribution_domain1.csv

    contains number of works per number of authors in the domain Social Sciences (includes Arts & Humanities).

    author_distribution_field33.csv

    contains number of works per number of authors in the field Social Sciences.

    author_distribution_field12.csv

    contains number of works per number of authors in the field Arts & Humanities.

    all_ssh_oa.csv

    contains data for analyzing open access patterns for the domain Social Sciences (includes Arts & Humanities).

  15. OpenAlex Author Name Disambiguation V3 Initial Clusters

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jul 21, 2023
    Cite
    Justin Barrett; Jason Priem; Jason Portenoy; Richard Orr; Casey Meyer (2023). OpenAlex Author Name Disambiguation V3 Initial Clusters [Dataset]. http://doi.org/10.5281/zenodo.8170024
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jul 21, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Justin Barrett; Jason Priem; Jason Portenoy; Richard Orr; Casey Meyer
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Author name disambiguation V3 initial clusters for the OpenAlex dataset. See https://openalex.org

    There are 633803287 rows, split into 4 CSV (comma-delimited) files (with headers).

    The CSV files have two columns: "work_author_id" and "author_id"

    "work_author_id": An OpenAlex Work ID and an author sequence number, joined with an underscore ("_")

    "author_id": An OpenAlex Author ID, representing a unique author in OpenAlex
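
    A short sketch of how the two columns might be used. The function names are hypothetical, and the sketch assumes the author sequence number is a plain integer placed after the last underscore; the exact form of the Work ID in the files is not stated above, so the split is done from the right.

```python
from collections import defaultdict


def split_work_author_id(work_author_id):
    """Split 'workid_sequence' into (work_id, sequence) at the last underscore."""
    work_id, _, sequence = work_author_id.rpartition("_")
    if not work_id or not sequence.isdigit():
        raise ValueError("unexpected work_author_id: %r" % work_author_id)
    return work_id, int(sequence)


def works_by_author(rows):
    """Group work IDs by author_id from rows with the two CSV columns."""
    grouped = defaultdict(list)
    for row in rows:
        work_id, _sequence = split_work_author_id(row["work_author_id"])
        grouped[row["author_id"]].append(work_id)
    return dict(grouped)
```

    With roughly 634 million rows, a real run would stream the CSV files rather than hold every row in memory; the grouping above is only meant for small samples.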

  16. Works-magnet OpenAlex Affiliations Corrections Dataset

    • paperswithcode.com
    Updated Jun 16, 2025
    Cite
    (2025). Works-magnet OpenAlex Affiliations Corrections Dataset [Dataset]. https://paperswithcode.com/dataset/works-magnet-openalex-affiliations
    Dataset updated
    Jun 16, 2025
    Description

    The works-magnet aims to make the AI-processed metadata for scholarly outputs visible and to help curators improve that metadata. This dataset lists all the corrections requested by works-magnet users to improve OpenAlex affiliation metadata.

  17. Gephi Open-Access Articles: Curated via OpenAlex and Looker Studio

    • zenodo.org
    Updated May 25, 2025
    Cite
    Verónica Espinoza-González (2025). Gephi Open-Access Articles: Curated via OpenAlex and Looker Studio [Dataset]. http://doi.org/10.5281/zenodo.15507720
    Dataset updated
    May 25, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Verónica Espinoza-González
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    This dataset presents a curated collection of over 700 open-access research articles in which Gephi was used as a primary tool for network analysis. The records were extracted using OpenAlex, cleaned and organized to facilitate exploration by students, researchers, and educators. The goal is to provide a reliable and accessible bibliography for those seeking to understand how Gephi has been applied in diverse research contexts. An interactive dashboard was built using Looker Studio to allow users to filter and visualize the dataset by topic, year, journal, and other dimensions. This resource supports academic work by helping users find methodological references and examples of Gephi applications in scholarly research.

  18. Experimental AI corpus from OpenAlex

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 18, 2022
    Cite
    Jack Vines (2022). Experimental AI corpus from OpenAlex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6997685
    Dataset updated
    Aug 18, 2022
    Dataset provided by
    Juan Mateos-Garcia
    Jack Vines
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A corpus of AI research from OpenAlex. Includes:

    A works table with metadata about AI papers

    An authors table with information about the authors

    An institutions table with information about institutions

    A concepts table with information about concepts in works

    A MeSH table with information about MeSH terms in works

    A concepts json with the OpenAlex concept taxonomy

    An abstracts json with deinverted abstracts

    A citations json with citations from papers

    See ai_openalex_description.md for data dictionaries.

    See ai_openalex_methodology.md for a description of the method used to create the dataset.

    See here for additional information: https://github.com/nestauk/ai_genomics
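The "deinverted abstracts" mentioned above refer to the way OpenAlex distributes abstracts: as an inverted index that maps each word to the list of positions where it occurs. The sketch below shows one way such an index can be turned back into running text. It is an illustration only; the function name `deinvert_abstract` is hypothetical and the listing does not say how the corpus authors performed this step.

```python
def deinvert_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from a word -> positions inverted index."""
    # Flatten to (position, word) pairs, then order them by position.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


# Made-up example index: "world" occurs at positions 1 and 3.
example = {"Hello": [0], "world": [1, 3], "big": [2]}
text = deinvert_abstract(example)
```

Sorting the flattened pairs keeps the function short; for very long abstracts, preallocating a list indexed by position would avoid the sort.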

  19. OpenAlex Author Name Disambiguation V3 Data - Disambiguation Model

    • data.niaid.nih.gov
    Updated Aug 1, 2023
    Cite
    Portenoy, Jason (2023). OpenAlex Author Name Disambiguation V3 Data - Disambiguation Model [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8200678
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    Portenoy, Jason
    Barrett, Justin
    Orr, Richard
    Meyer, Casey
    Priem, Jason
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Five separate files used to create the OpenAlex (https://openalex.org) V3 author name disambiguation model:

    ORCID_hard_negative_pairs: Pairs of ORCIDs where the full name, family name, or given name matches, which makes the pair more difficult to disambiguate.

    Disambiguator_all_possible_training_data: Dataset created which contains all possible features for modeling and all possible samples of data. Eventually, this was split into train/val/test and also processed more to create a better balance of positive to negative samples for our purposes.

    Disambiguator_final_train_data: Final data which the disambiguator was trained on.

    Disambiguator_final_val_data: Data which was used to test the model during training to optimize the features/hyperparameters chosen.

    Disambiguator_final_test_data: Final dataset which gave model performance indication after all hyperparameters were tuned and features were chosen.

    More details can be found at https://github.com/ourresearch/openalex-name-disambiguation

  20. OpenAire Research Graph linked with OpenAlex

    • data.niaid.nih.gov
    Updated Aug 23, 2024
    Cite
    Klorek, Antoni (2024). OpenAire Research Graph linked with OpenAlex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13365368
    Dataset updated
    Aug 23, 2024
    Dataset authored and provided by
    Klorek, Antoni
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This package contains linked datasets of OpenAire Research Graph and OpenAlex.

    File descriptions:

    • author_to_publication_dic.json contains a mapping of authors to their publications

    • downloads_views_dic.json contains mappings of the publication id to the number of its downloads and views

    • id_doi_dic.json contains a mapping of the publication id to its doi

    • merged1..5.json contain all publication data from the OARG dataset

    • necessary_fields_dic.json contains extracted publications’ fields necessary for the work

    • oarg_ref_rel_dic.json contains a mapping of each publication id to the referenced and related works present in the OpenAlex dataset

    • openalex_found_publications5_4.json contains all data on the publications found in OpenAlex

    • publication_to_author_dic.json contains a mapping of publications to their authors
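The two author/publication files above describe inverse mappings. As a rough sketch of how one relates to the other, the Python below inverts an author-to-publications dictionary. It assumes, without confirmation from the listing, that the JSON is a single object mapping each author id to a list of publication ids; the function name `invert_mapping` and the sample ids are hypothetical.

```python
import json
from collections import defaultdict


def invert_mapping(author_to_publications: dict[str, list[str]]) -> dict[str, list[str]]:
    """Turn author -> [publications] into publication -> [authors]."""
    publication_to_authors = defaultdict(list)
    for author, publications in author_to_publications.items():
        for publication in publications:
            publication_to_authors[publication].append(author)
    return dict(publication_to_authors)


# Made-up stand-in for the contents of author_to_publication_dic.json
# (assumed structure: {author_id: [publication_id, ...]}).
sample = json.loads('{"author_a": ["pub_1", "pub_2"], "author_b": ["pub_2"]}')
inverted = invert_mapping(sample)
```

If the real files use a different layout (for example nested objects rather than lists), the loop body would need to be adapted accordingly.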
