61 datasets found

OpenAlex
redivis.com
huggingface.co
application/jsonl +7
Updated Apr 15, 2025
Cite
Kellogg School of Management (2025). OpenAlex [Dataset]. https://redivis.com/datasets/a08b-162382x4j
Explore at:
Available download formats: arrow, spss, stata, csv, parquet, application/jsonl, avro, sas
Dataset updated
Apr 15, 2025
Dataset provided by
Redivis Inc.
Authors
Kellogg School of Management
Description
Methodology

OpenAlex is a fully open catalog of the global research system. It's named after the ancient library of Alexandria and made by the non-profit OurResearch.

The OpenAlex dataset describes scholarly entities and how those entities are connected to each other. Types of entities include works, authors, sources, institutions, topics, publishers, and funders. Together, these make a huge web (or more technically, a heterogeneous directed graph) of hundreds of millions of entities and billions of connections between them.

OpenAlex offers an open replacement for industry-standard scientific knowledge bases like Elsevier's Scopus and Clarivate's Web of Science. Compared to these paywalled services, OpenAlex offers significant advantages in terms of inclusivity, affordability, and availability.

The data here are derived from the snapshot data, which is updated about once per month. The raw data are stored on Amazon S3 in the publicly available openalex bucket as gzip-compressed JSON lines files. We use custom functions in Python code to flatten these records into the relational database hosted here on Redivis.
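As a rough illustration of the flattening step described above, the following Python sketch reads gzip-compressed JSON Lines and splits each nested work record into a flat works row plus one row per authorship. It is not the maintainers' code: the function names are hypothetical, the fields shown are only a small illustrative subset of a work record, and the sample is built in memory rather than read from S3.

```python
import gzip
import io
import json


def flatten_work(record):
    """Split one nested work record into a flat work row and authorship rows."""
    work_row = {
        "id": record.get("id"),
        "title": record.get("title"),
        "publication_year": record.get("publication_year"),
    }
    authorship_rows = [
        {
            "work_id": record.get("id"),
            "author_id": (a.get("author") or {}).get("id"),
            "author_position": a.get("author_position"),
        }
        for a in record.get("authorships", [])
    ]
    return work_row, authorship_rows


def read_jsonl_gz(fileobj):
    """Yield one parsed JSON object per non-empty line of a gzip stream."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                yield json.loads(line)


# Tiny in-memory sample standing in for one snapshot file (IDs are placeholders).
sample = {
    "id": "W1",
    "title": "Example",
    "publication_year": 2020,
    "authorships": [
        {"author": {"id": "A1"}, "author_position": "first"},
        {"author": {"id": "A2"}, "author_position": "last"},
    ],
}
buffer = io.BytesIO(gzip.compress((json.dumps(sample) + "\n").encode("utf-8")))

works, authorships = [], []
for record in read_jsonl_gz(buffer):
    work_row, authorship_rows = flatten_work(record)
    works.append(work_row)
    authorships.extend(authorship_rows)
```

Keeping authorships in their own table, keyed by work_id, is one common way to represent a one-to-many relationship in a relational layout; the actual table design on Redivis may differ.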

The live data are also available for free via the REST API.

Usage

If you use OpenAlex in research, please cite this paper:

Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. ArXiv. https://arxiv.org/abs/2205.01833
OpenAlex Affiliations Corrections
data.smartidf.services
data.enseignementsup-recherche.gouv.fr
csv, excel, json
Updated Nov 28, 2024
Cite
(2024). OpenAlex Affiliations Corrections [Dataset]. https://data.smartidf.services/explore/dataset/openalex-affiliations-corrections/
Explore at:
Available download formats: csv, json, excel
Dataset updated
Nov 28, 2024
License
Licence Ouverte / Open Licence 2.0 (https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf)
License information was derived automatically
Description
This dataset lists all the issues on the GitHub repository https://github.com/dataesr/openalex-affiliations/. It is updated every day at 7 AM GMT through a GitHub Action in that repository: https://github.com/dataesr/openalex-affiliations/blob/main/.github/workflows/sync_openalex_affiliations_github_issues.yml

List of corrections of OpenAlex affiliations.

- "github_issue_id": integer, issue number according to GitHub
- "github_issue_link": string, web link to the GitHub issue
- "state": string, current state of the issue, can be "open" or "closed". An issue is closed if it has been ingested by OpenAlex.
- "date_opened": date, when the issue was opened, e.g. "2024-11-15"
- "date_closed": date, when the issue was closed, null if not closed, e.g. "2024-11-18"
- "raw_affiliation_name": string, raw affiliation string as collected by OpenAlex
- "has_added_rors": boolean, whether the correction suggests adding a ROR, 1 if true, 0 if false
- "has_removed_rors": boolean, whether the correction suggests removing a ROR, 1 if true, 0 if false
- "new_rors": string, list of corrected RORs, separated by ";"
- "previous_rors": string, list of RORs before correction, separated by ";"
- "added_rors": string, list of added RORs after correction, separated by ";"
- "removed_rors": string, list of removed RORs after correction, separated by ";"
- "openalex_works_examples": string, web link to an OpenAlex work mentioning the affiliation string
- "searched_between": searched years range, e.g. "2018 - 2024"
- "contact": string, encrypted version of the email of the author of the correction; only the domain name is not encrypted
- "contact_domain": string, domain name of the contact email
- "version": version of the works-magnet app used to collect the correction
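Because the ROR columns store lists as ";"-separated strings, a consumer has to split them before comparing. The Python sketch below shows one way to do that and to derive added and removed RORs from "previous_rors" and "new_rors". The helper names and the placeholder identifiers are hypothetical, and treating "added" as new minus previous (and "removed" as previous minus new) is an assumption about how the columns relate, not something the dataset description states.

```python
def split_rors(value):
    """Split a ';'-separated ROR string into a list, ignoring blanks."""
    if not value:
        return []
    return [part.strip() for part in value.split(";") if part.strip()]


def ror_changes(row):
    """Compare the ROR lists before and after a correction."""
    previous = set(split_rors(row.get("previous_rors")))
    new = set(split_rors(row.get("new_rors")))
    return {"added": sorted(new - previous), "removed": sorted(previous - new)}


# Placeholder identifiers, not real RORs.
row = {"previous_rors": "ror-a;ror-b", "new_rors": "ror-b;ror-c"}
changes = ror_changes(row)
```

Comparing the derived lists with the dataset's own "added_rors" and "removed_rors" columns would be a simple consistency check on a downloaded file.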
OpenAlex Snapshot - 23-08-21 (Complete)
academictorrents.com
bittorrent
Updated Aug 22, 2023
Cite
Jason Priem and Heather Piwowar and Richard Orr (2023). OpenAlex Snapshot - 23-08-21 (Complete) [Dataset]. https://academictorrents.com/details/c29c888839bf0512043c770e78ddbe321ea6567b
Explore at:
Available download formats: bittorrent (335409170855)
Dataset updated
Aug 22, 2023
Dataset authored and provided by
Jason Priem and Heather Piwowar and Richard Orr
License
https://academictorrents.com/nolicensespecified
Description
OpenAlex is a new, fully-open scientific knowledge graph (SKG), launched to replace the discontinued Microsoft Academic Graph (MAG). It contains metadata for 209M works (journal articles, books, etc); 213M disambiguated authors; 124k venues (places that host works, such as journals and online repositories); 109k institutions; and 65k Wikidata concepts (linked to works via an automated hierarchical multi-tag classifier). The dataset is fully and freely available via a web-based GUI, a full data dump, and high-volume REST API. The resource is under active development and future work will improve accuracy and coverage of citation information and author/institution parsing and deduplication. From: Priem, J., Piwowar, H., & Orr, R. (2022). OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts. ArXiv. Upload details: Downloaded a copy from the AWS endpoint s3://openalex on 2023-08-21. Updates are rolling, so future
openalex-topic-title-abstract
huggingface.co
Updated Feb 12, 2025
Cite
Albert Martínez (2025). openalex-topic-title-abstract [Dataset]. https://huggingface.co/datasets/albertmartinez/openalex-topic-title-abstract
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 12, 2025
Authors
Albert Martínez
Description
albertmartinez/openalex-topic-title-abstract dataset hosted on Hugging Face and contributed by the HF Datasets community
OpenAlex Affiliations Corrections | gimi9.com
gimi9.com
Updated Nov 28, 2024
Cite
(2024). OpenAlex Affiliations Corrections | gimi9.com [Dataset]. https://gimi9.com/dataset/fr_674e751add1d43173f113fcf/
Explore at:
Dataset updated
Nov 28, 2024
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset lists all the issues on the GitHub repository https://github.com/dataesr/openalex-affiliations/. It is updated every day at 2 AM through a GitHub Action in that repository.

List of corrections of OpenAlex affiliations.

- "github_issue_id": integer, issue number according to GitHub
- "github_issue_link": string, web link to the GitHub issue
- "state": string, current state of the issue, can be "open" or "closed". An issue is closed if it has been processed by OpenAlex.
- "date_opened": date, when the issue was opened, e.g. "2024-11-15"
- "date_closed": date, when the issue was closed, null if not closed, e.g. "2024-11-18"
- "raw_affiliation_name": string, raw affiliation string as collected by OpenAlex
- "has_added_rors": boolean, whether the correction suggests adding a ROR, 1 if true, 0 if false
- "has_removed_rors": boolean, whether the correction suggests removing a ROR, 1 if true, 0 if false
- "new_rors": string, list of corrected RORs, separated by ";"
- "previous_rors": string, list of RORs before correction, separated by ";"
- "added_rors": string, list of added RORs after correction, separated by ";"
- "removed_rors": string, list of removed RORs after correction, separated by ";"
- "openalex_works_examples": string, web link to an OpenAlex work mentioning the affiliation string
- "contact": string, encrypted version of the email of the author of the correction; only the domain name is not encrypted
- "contact_domain": string, domain name of the contact email
University of Arizona authors' scholarly works published and cited works...
arizona.figshare.com
txt
Updated May 1, 2025
Cite
Yan Han (2025). University of Arizona authors' scholarly works published and cited works year 2020 from OpenAlex [Dataset]. http://doi.org/10.25422/azu.data.28746095.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.25422/azu.data.28746095.v1
Dataset updated
May 1, 2025
Dataset provided by
University of Arizona Research Data Repository
Authors
Yan Han
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Two datasets, works_published and works_cited, for the year 2020 from the OpenAlex database.

License: see https://github.com/ourresearch/openalex-docs/blob/main/license.md: "OpenAlex data is made available under the CC0 license. That means it's in the public domain, and free to use in any way you like. We appreciate attribution where it's convenient, but it's not at all necessary. There is one exception: the MAG Format snapshot is released under ODC-BY, as per the original MAG license applied by Microsoft (it reuses their schema). See the LICENSE.txt file in the MAG format snapshot distribution for attribution requirement details."

Data quality considerations: OpenAlex has improved the accuracy of its data with help from algorithms and institutions. Our current data quality assessment showed precision and recall of 95%+.

The first dataset, "works_published", as constructed in the provided sources, refers to the publications authored by individuals affiliated with the University of Arizona (UArizona). The data is retrieved using the openalexR package by querying the OpenAlex database with UArizona's Research Organization Registry (ROR) ID (03m2x1q45) and specific publication date ranges. Key aspects of this dataset:

- Scope: It contains records of scholarly works associated with UArizona authors, including various publication types such as journals, repositories (like PubMed and arXiv), and others. It is also possible to filter the results to include only "journal" type publications using the primary_location.source.type = "journal" parameter in the oa_fetch function.
- Temporal coverage: The sources demonstrate fetching data for specific years (e.g., 2019, 2020, 2021, 2022, 2023).
- Data retrieval: The process involves using the oa_fetch function from the openalexR package with the entity="works" parameter and specifying the institutions.ror.
- Data structure: Each record in this dataset represents a publication and includes various fields. Certain fields are data frames.
- Usage: This dataset is used as a starting point for various data analyses and data mining.

The second dataset, "works_cited", refers to scholarly works cited by the publications within the works_published dataset. It is created by extracting the OpenAlex IDs from the $referenced_works field of the works_published data and then using the oa_fetch function to retrieve the full metadata for these cited works. Key aspects of this dataset:

- Scope: It includes metadata for a wide range of scholarly works that have been cited by UArizona-affiliated publications. This can encompass articles, books, preprints, book chapters, and other types of scholarly outputs.
- Data derivation: The dataset is derived from the referenced_works field of the works_published dataset.
- Data structure: Each record in this dataset represents a cited work and contains various fields retrieved by the OpenAlex API.

The third file (institution_publications.r) is the source code used to produce the above datasets. Note that the code retrieves additional years besides 2020.

Usage: Both datasets are crucial for performing publication and citation analysis and mining, including:

- Identifying the most frequently cited works and journals.
- Analyzing the journal usage and publisher distribution of cited works.
- Understanding the scholarly landscape influencing UArizona research.
- Identifying potential resources for library collections based on citation frequency.
- Investigating the presence and frequency of citations from specific publishers or to specific works.

For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu.

This item is part of University of Arizona authors' scholarly works published and cited works
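The derivation of works_cited from the referenced_works field, and the "most frequently cited works" analysis it enables, can be sketched in a few lines of Python. This is not the dataset's own R code (institution_publications.r); the function name and the sample records with placeholder IDs are hypothetical, and the step that fetches full metadata for each cited ID is left out.

```python
from collections import Counter

# Placeholder records standing in for rows of works_published.
works_published = [
    {"id": "W1", "referenced_works": ["W10", "W11"]},
    {"id": "W2", "referenced_works": ["W10"]},
    {"id": "W3", "referenced_works": []},
]


def cited_work_counts(works):
    """Count how many publications cite each referenced work ID."""
    counts = Counter()
    for work in works:
        # set() guards against a publication listing the same reference twice.
        counts.update(set(work.get("referenced_works") or []))
    return counts


counts = cited_work_counts(works_published)
unique_cited_ids = sorted(counts)   # the IDs whose metadata would be fetched
most_cited = counts.most_common(1)  # the most frequently cited work
```

The unique ID list corresponds to the set of works that would make up works_cited, while the counts feed the citation-frequency analyses listed above.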
Core sources and core publications in OpenAlex
data.niaid.nih.gov
zenodo.org
Updated Oct 2, 2024
Cite
Van Eck, Nees Jan (2024). Core sources and core publications in OpenAlex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10949670
Explore at:
Dataset updated
Oct 2, 2024
Dataset authored and provided by
Van Eck, Nees Jan
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This data set contains data on core sources and core publications identified in the OpenAlex database (based on the OpenAlex snapshot released on August 30, 2024).

The source code used to identify core sources and core publications in OpenAlex is available in this GitHub repository.

See this report for more information about the identification of core sources and core publications in OpenAlex.

This data set consists of the following tab-delimited files.

- source.tsv: source_id, source, source_type, issn_l, is_core_source, n_works, n_core_works
- work.tsv: work_id, work_type, pub_year, source_id, doi, is_core_work
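As a minimal sketch of working with these tab-delimited files, the Python snippet below reads source.tsv with the standard csv module and computes the share of core works per source from the n_works and n_core_works columns. The sample rows are invented placeholders, the function name is hypothetical, and the 1/0 encoding shown for is_core_source is an assumption, since the description does not say how that flag is encoded.

```python
import csv
import io

# Invented sample rows using the source.tsv column names listed above.
source_tsv = (
    "source_id\tsource\tsource_type\tissn_l\tis_core_source\tn_works\tn_core_works\n"
    "S1\tJournal A\tjournal\t0000-0001\t1\t200\t150\n"
    "S2\tJournal B\tjournal\t0000-0002\t0\t50\t0\n"
)


def core_share_by_source(tsv_text):
    """Map each source_id to n_core_works / n_works (0.0 if it has no works)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    shares = {}
    for row in reader:
        n_works = int(row["n_works"])
        n_core = int(row["n_core_works"])
        shares[row["source_id"]] = n_core / n_works if n_works else 0.0
    return shares


shares = core_share_by_source(source_tsv)
```

For the real file, passing an open file handle to csv.DictReader instead of io.StringIO avoids loading the whole file into memory.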
"Whitelist" journals data obtained from OpenAlex
zenodo.org
bin
Updated May 14, 2025
Cite
Irina Volkova (2025). "Whitelist" journals data obtained from OpenAlex [Dataset]. http://doi.org/10.5281/zenodo.15392888
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.15392888
Dataset updated
May 14, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Irina Volkova
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Dec 9, 2024
Description
The dataset contains two samples of "White List" journals, for December 2024 and February 2025. The data was collected through API queries to OpenAlex and by parsing the JSON file from the RCNI website where the "White List" is located. The following properties of "White List" journal objects were considered: ISSN, title, journal level in the "White List", date accepted, journal ID in OpenAlex, database indexing information (Web of Science, Scopus, DOAJ), publisher and country of publishing, total citations, open access data, leading thematic topic, leading thematic field, and the countries of the authors who published in the journal together with the number of publications (if Russian authors were among them, an additional request was made for the number of publications in the journal with Russian authors by year).

These are SQLite database files that can be opened in DB Browser for SQLite.
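Besides DB Browser for SQLite, the files can also be inspected programmatically. The Python sketch below uses the standard sqlite3 module to list the tables in a database; it runs against an in-memory database with a hypothetical "journals" table, because the actual table names in the dataset are not given here.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in the connected SQLite database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# In-memory stand-in; replace ":memory:" with the path to a downloaded file.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE journals (issn TEXT, title TEXT)")  # hypothetical table
tables = list_tables(connection)
connection.close()
```

Listing the tables first is a convenient way to discover the schema before writing queries against an unfamiliar database file.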
More open abstracts? Comparing abstract coverage in Crossref and OpenAlex...
data.niaid.nih.gov
zenodo.org
Updated Nov 25, 2024
Cite
Kramer, Bianca (2024). More open abstracts? Comparing abstract coverage in Crossref and OpenAlex [dataset] [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11580549
Explore at:
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Kramer, Bianca
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
Aggregated data underlying the blog post "More open abstracts? Comparing abstract coverage in Crossref and OpenAlex": https://bmkramer.github.io/SesameOpenScience_site/thought/202411_open_abstracts/

The dataset contains the following files:

- abstracts_crossref_openalex_202410.csv
- abstracts_crossref_openalex_data_dictionary.txt

The CSV file contains data on abstract coverage for Crossref DOIs in Crossref and OpenAlex, aggregated by publisher, for the top 1000 publishers in terms of number of retrieved DOIs. Scope is limited to publications with Crossref type 'journal-articles' and publication years 2022-2024. Variables are described in the data dictionary included in this record.

This analysis was performed using Curtin Open Knowledge Initiative (COKI) infrastructure, which is documented on GitHub: https://github.com/The-Academic-Observatory. Here, a number of open data sources (including Crossref, OpenAlex and OpenAIRE) are ingested into a Google BigQuery environment, which can then be queried via SQL.

The following data sources were used:

- Crossref (Metadata Plus snapshot 2024-10-31, Crossref member route API 2024-11-20)
- OpenAlex (data snapshot 2024-10-30)

The code used to generate the dataset is available on GitHub: https://github.com/bmkramer/more_open_abstracts
Colombia Coauthorship networks divided by openalex lvl 0 concepts
data.niaid.nih.gov
Updated Jul 13, 2023
Cite
Gerardo Gutierrez (2023). Colombia Coauthorship networks divided by openalex lvl 0 concepts [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7685127
Explore at:
Dataset updated
Jul 13, 2023
Dataset authored and provided by
Gerardo Gutierrez
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Colombia
Description
Graphs are uploaded in gml format, which can be easily imported by networkx, gephi and neo4j.

The undirected graphs are constructed from the OpenAlex database (last update Dec 2022), where nodes are authors and an edge indicates that two authors coauthored at least one paper. We excluded papers with more than 10 authors since they are very scarce and could affect the subsequent analysis of the networks.

The node attributes consist of the list of words used by each author with their frequencies, and all the concepts (of every level) with which the author's papers are labeled.

The edge attributes contain only the number of papers coauthored by the two authors.

For more information about openalex concepts, visit https://docs.openalex.org/api-entities/concepts
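The graph construction described above (authors as nodes, an edge per coauthoring pair weighted by the number of shared papers, papers with more than 10 authors excluded) can be sketched in plain Python. This is an illustration under those stated rules, not the author's code; the function name and the placeholder author IDs are hypothetical.

```python
from collections import Counter
from itertools import combinations

MAX_AUTHORS = 10  # papers with more authors than this are skipped


def coauthorship_edges(papers, max_authors=MAX_AUTHORS):
    """Return a Counter mapping each author pair to its number of shared papers."""
    edges = Counter()
    for authors in papers:
        unique_authors = sorted(set(authors))
        if len(unique_authors) > max_authors:
            continue
        # Sorting makes (a, b) and (b, a) the same key, so the graph is undirected.
        for pair in combinations(unique_authors, 2):
            edges[pair] += 1
    return edges


papers = [
    ["A1", "A2"],
    ["A1", "A2", "A3"],
    [f"B{i}" for i in range(11)],  # 11 authors: excluded
]
edges = coauthorship_edges(papers)
```

Each key in the resulting Counter is one undirected edge, and its value corresponds to the edge attribute holding the number of coauthored papers.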
Classification of research publications based on data from OpenAlex
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2024
Cite
Nees Jan Van Eck (2024). Classification of research publications based on data from OpenAlex [Dataset]. http://doi.org/10.5281/zenodo.10560276
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10560276
Dataset updated
Jan 24, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Nees Jan Van Eck
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This data set contains an algorithmic classification of research publications based on data from OpenAlex. The classification is based on the OpenAlex snapshot released on November 21, 2023.

To build the classification, we used the so-called extended direct citation approach in combination with the Leiden algorithm. The source code of our software is available here. The classification covers the 71 million journal articles, proceedings papers, preprints, and book chapters in OpenAlex that were published between 2000 and 2023 and that are connected to each other by citation links. Based on 1715 million citation links, we built a three-level hierarchical classification. Each publication was assigned to a cluster at each of the three levels of the classification. Clusters consist of publications that are relatively strongly connected by citation links and that can therefore be expected to be topically related. At each level of the classification, a publication was assigned to only one cluster, which means clusters do not overlap.

The classification consists of 4521 micro clusters at the lowest (most granular) level, 917 meso clusters at the middle level, and 20 macro clusters at the highest (least granular) level. We also algorithmically linked each cluster in the classification to one or more of the following five broad main fields: biomedical and health sciences, life and earth sciences, mathematics and computer science, physical sciences and engineering, and social sciences and humanities.

We used the Updated GPT 3.5 Turbo large language model, developed by OpenAI, to label the 4521 micro clusters at the lowest level in the classification. The source code of our software can be found here.

See this blog post for more information about the classification.

The classification, including the labels of the micro clusters, is available in the following tab-delimited files.

- clustering.tsv: work_id, doi, macro_cluster_id, meso_cluster_id, micro_cluster_id
- main_field.tsv: main_field_id, main_field
- macro_cluster.tsv: macro_cluster_id, macro_cluster, n_works
- macro_cluster_main_field.tsv: macro_cluster_id, main_field_seq, main_field_id, weight, is_primary_main_field
- meso_cluster.tsv: meso_cluster_id, meso_cluster, parent_macro_cluster_id, n_works
- meso_cluster_main_field.tsv: meso_cluster_id, main_field_seq, main_field_id, weight, is_primary_main_field
- meso_cluster_source.tsv: meso_cluster_id, source_seq, source_id, n_works
- micro_cluster.tsv: micro_cluster_id, micro_cluster, short_label, long_label, keywords, summary, wikipedia_url, parent_macro_cluster_id, parent_meso_cluster_id, n_works
- micro_cluster_main_field.tsv: micro_cluster_id, main_field_seq, main_field_id, weight, is_primary_main_field
- micro_cluster_keyword.tsv: micro_cluster_id, keyword_seq, keyword
- micro_cluster_source.tsv: micro_cluster_id, source_seq, source_id, n_works
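Since each publication is assigned to exactly one cluster per level and micro_cluster.tsv carries parent_meso_cluster_id and parent_macro_cluster_id columns, one plausible sanity check after loading the files is that every row of clustering.tsv agrees with the parents recorded for its micro cluster. The Python sketch below does this on invented rows; it assumes the three levels are strictly nested, which the column names suggest but the description does not state outright, and the function name is hypothetical.

```python
def find_inconsistent_works(clustering_rows, micro_cluster_rows):
    """Return work_ids whose meso/macro assignment differs from the micro cluster's parents."""
    parents = {
        row["micro_cluster_id"]: (
            row["parent_meso_cluster_id"],
            row["parent_macro_cluster_id"],
        )
        for row in micro_cluster_rows
    }
    return [
        row["work_id"]
        for row in clustering_rows
        if parents.get(row["micro_cluster_id"])
        != (row["meso_cluster_id"], row["macro_cluster_id"])
    ]


# Invented rows using the column names of micro_cluster.tsv and clustering.tsv.
micro_cluster_rows = [
    {"micro_cluster_id": "100", "parent_meso_cluster_id": "10", "parent_macro_cluster_id": "1"},
]
clustering_rows = [
    {"work_id": "W1", "macro_cluster_id": "1", "meso_cluster_id": "10", "micro_cluster_id": "100"},
    {"work_id": "W2", "macro_cluster_id": "2", "meso_cluster_id": "10", "micro_cluster_id": "100"},
]
inconsistent = find_inconsistent_works(clustering_rows, micro_cluster_rows)
```

An empty result on the real files would indicate that the parent columns and the per-work assignments describe the same hierarchy.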
Data from: OPENBIB: Selected curated open metadata based on OpenAlex
zenodo.org
bin, csv
Updated May 5, 2025
Cite
Nick Haupka; Jack Culbert; Paul Donner; Najko Jahn; Christopher Lenke; Philipp Mayr; Andreas Meier; Bernhard Mittermaier; Barbara Scheidt; Stephan Stahlschmidt; Niels Taubert (2025). OPENBIB: Selected curated open metadata based on OpenAlex [Dataset]. http://doi.org/10.5281/zenodo.15308680
Explore at:
Available download formats: csv, bin
Unique identifier
https://doi.org/10.5281/zenodo.15308680
Dataset updated
May 5, 2025
Dataset provided by
Kompetenznetzwerk Bibliometrie
Authors
Nick Haupka; Jack Culbert; Paul Donner; Najko Jahn; Christopher Lenke; Philipp Mayr; Andreas Meier; Bernhard Mittermaier; Barbara Scheidt; Stephan Stahlschmidt; Niels Taubert
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
May 2025
Description
This dataset, compiled by the German Kompetenznetzwerk Bibliometrie, provides access to curated bibliometric data in OpenAlex focussing on the German research landscape.

Curated data is provided for the following entities:

- Address information
- Publishers
- Funding information
- Document types
- Transformative agreements
- Authors (tba)

For an overview of the tables included, see data-overview.md

This release is based on the August 2024 snapshot of OpenAlex. The OPENBIB snapshot is offered in both CSV and JSONL format.

This is an initial release to demonstrate the current state of metadata curation. The aim is to continue these efforts and improve the curation together with the community and data providers.

Data is made available under the CC0 license.

Github repository: https://github.com/kbopenbib/kbopenbib_data/
openalex-old
huggingface.co
Updated May 29, 2025
Cite
Sumuk's Archived Content (2025). openalex-old [Dataset]. https://huggingface.co/datasets/sumukshashidhar-archive/openalex-old
Explore at:
Dataset updated
May 29, 2025
Dataset authored and provided by
Sumuk's Archived Content
Description
sumukshashidhar-archive/openalex-old dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: 2024 dataset on independent researchers collected from OpenAlex
data.niaid.nih.gov
repository.uantwerpen.be
Updated Apr 22, 2024
Cite
Hertil Lindelöw, Camilla (2024). 2024 dataset on independent researchers collected from OpenAlex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10925111
Explore at:
Dataset updated
Apr 22, 2024
Dataset provided by
Hertil Lindelöw, Camilla
Vandewalle, Eline
License
https://creativecommons.org/public-domain/
Description
This dataset belongs to a paper about independent researchers submitted for the STI conference 2024 (https://sti2024.org/). It consists of several files described below. The data is from OpenAlex, collected through the InSySPo instance of the February snapshot of OpenAlex, hosted on Google Cloud. Since Topics are a new feature of OpenAlex data and therefore not part of the snapshot, this data as well as some other data not available at the InSySPo instance at the time of collection have been collected through the OpenAlex API, and incorporated in the files. Data from Scopus and Web of Science may be retrieved by using the search string in the appendix of the article.

Files all domains

- 240307_open_alex_works.tsv: contains all works retrieved with the search string for Independent researchers in OpenAlex in the article's appendix.

Files Social Sciences and/or Arts & Humanities

- 240312_open_alex_works_soc_sci_arts_2010.tsv: contains articles by Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex.
- 240312_open_alex_authors_soc_sci_arts_2010.tsv: contains authors who are Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex.
- 240313_open_alex_authors_all_works_soc_sci_arts_2010.tsv: contains all works by Independent researchers in Social Sciences and Humanities published from 2010 and retrieved from OpenAlex. "All works" means that the researcher has at least once indicated independent status in the affiliation, and the author's other works are also included.
- author_distribution_domain1.csv: contains number of works per number of authors in the domain Social Sciences (includes Arts & Humanities).
- author_distribution_field33.csv: contains number of works per number of authors in the field Social Sciences.
- author_distribution_field12.csv: contains number of works per number of authors in the field Arts & Humanities.
- all_ssh_oa.csv: contains data for analyzing open access patterns for the domain Social Sciences (includes Arts & Humanities).
OpenAlex Author Name Disambiguation V3 Initial Clusters
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jul 21, 2023
Cite
Justin Barrett; Jason Priem; Jason Portenoy; Richard Orr; Casey Meyer (2023). OpenAlex Author Name Disambiguation V3 Initial Clusters [Dataset]. http://doi.org/10.5281/zenodo.8170024
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.8170024
Dataset updated
Jul 21, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Justin Barrett; Jason Priem; Jason Portenoy; Richard Orr; Casey Meyer
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Author name disambiguation V3 initial clusters for the OpenAlex dataset. See https://openalex.org

There are 633803287 rows, split into 4 CSV (comma-delimited) files (with headers).

The CSV files have two columns: "work_author_id" and "author_id"

"work_author_id": An OpenAlex Work ID and an author sequence number, joined with an underscore ("_")

"author_id": An OpenAlex Author ID, representing a unique author in OpenAlex
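Because "work_author_id" joins a Work ID and an author sequence number with an underscore, it has to be split before the two parts can be used separately. The Python sketch below splits on the last underscore; the function name and the sample value are hypothetical, and treating the sequence number as an integer is an assumption about the file contents.

```python
def split_work_author_id(work_author_id):
    """Split 'WORKID_SEQ' into (work_id, author_sequence_number)."""
    # rsplit on the last underscore, so the work ID part is kept whole.
    work_id, sequence = work_author_id.rsplit("_", 1)
    return work_id, int(sequence)


parsed = split_work_author_id("W123_2")  # placeholder value, not a real row
```

Splitting from the right is a defensive choice: it still works if the Work ID part itself ever contains an underscore.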
Works-magnet OpenAlex Affiliations Corrections Dataset
paperswithcode.com
Updated Jun 16, 2025
Cite
(2025). Works-magnet OpenAlex Affiliations Corrections Dataset [Dataset]. https://paperswithcode.com/dataset/works-magnet-openalex-affiliations
Explore at:
Dataset updated
Jun 16, 2025
Description
The works-magnet aims to make the AI-processed metadata for scholarly outputs visible and to help curators improve that metadata. This dataset lists all the corrections requested by works-magnet users to improve OpenAlex affiliations metadata.
Gephi Open-Access Articles: Curated via OpenAlex and Looker Studio
zenodo.org
Updated May 25, 2025
Cite
Verónica Espinoza-González (2025). Gephi Open-Access Articles: Curated via OpenAlex and Looker Studio [Dataset]. http://doi.org/10.5281/zenodo.15507720
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15507720
Dataset updated
May 25, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Verónica Espinoza-González
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Abstract

This dataset presents a curated collection of over 700 open-access research articles in which Gephi was used as a primary tool for network analysis. The records were extracted using OpenAlex, cleaned and organized to facilitate exploration by students, researchers, and educators. The goal is to provide a reliable and accessible bibliography for those seeking to understand how Gephi has been applied in diverse research contexts. An interactive dashboard was built using Looker Studio to allow users to filter and visualize the dataset by topic, year, journal, and other dimensions. This resource supports academic work by helping users find methodological references and examples of Gephi applications in scholarly research.
Experimental AI corpus from OpenAlex
data.niaid.nih.gov
zenodo.org
Updated Aug 18, 2022
Cite
Jack Vines (2022). Experimental AI corpus from OpenAlex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6997685
Explore at:
Dataset updated
Aug 18, 2022
Dataset provided by
Juan Mateos-Garcia
Jack Vines
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
A corpus of AI research from OpenAlex. Includes:

A works table with metadata about AI papers

An authors table with information about the authors

An institutions table with information about institutions

A concepts table with information about concepts in works

A MeSH table with information about MeSH terms in works

A concepts JSON file with the OpenAlex concept taxonomy

An abstracts JSON file with deinverted abstracts

A citations JSON file with citations from papers

See ai_openalex_description.md for data dictionaries.

See ai_openalex_methodology.md for a description of the method used to create the dataset.

See here for additional information: https://github.com/nestauk/ai_genomics
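
The "deinverted abstracts" in the file list refer to the fact that OpenAlex distributes abstracts as an inverted index (the `abstract_inverted_index` field of a work), which maps each word to the list of positions where it occurs. The sketch below shows one way such an index could be turned back into plain text. It is only an illustration: the function name `deinvert_abstract` is hypothetical, and this is not necessarily the procedure the dataset authors used.

```python
def deinvert_abstract(inverted_index):
    """Rebuild plain text from an OpenAlex-style inverted index.

    `inverted_index` maps each word to the list of integer positions
    at which that word appears in the abstract.
    """
    if not inverted_index:
        return ""
    # Place every word at each of its positions.
    word_at = {}
    for word, positions in inverted_index.items():
        for position in positions:
            word_at[position] = word
    # Read the words back in positional order.
    return " ".join(word_at[i] for i in sorted(word_at))


example = {"Gephi": [0], "is": [1], "a": [2, 5], "tool": [3], "and": [4], "library": [6]}
print(deinvert_abstract(example))
```

Sorting the positions rather than assuming they run from 0 to n-1 keeps the function tolerant of gaps in the index.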
OpenAlex Author Name Disambiguation V3 Data - Disambiguation Model
data.niaid.nih.gov
Updated Aug 1, 2023
Cite
Portenoy, Jason (2023). OpenAlex Author Name Disambiguation V3 Data - Disambiguation Model [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8200678
Dataset updated
Aug 1, 2023
Dataset provided by
Portenoy, Jason
Barrett, Justin
Orr, Richard
Meyer, Casey
Priem, Jason
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Five separate files used to create the OpenAlex (https://openalex.org) V3 author name disambiguation model:

ORCID_hard_negative_pairs: Pairs of ORCIDs where the full name, family name, or given name matches, making the pair more difficult to disambiguate.

Disambiguator_all_possible_training_data: Dataset containing all possible features for modeling and all possible samples of data. It was later split into train/val/test sets and processed further to better balance positive and negative samples for our purposes.

Disambiguator_final_train_data: Final data on which the disambiguator was trained.

Disambiguator_final_val_data: Data used to evaluate the model during training in order to choose features and tune hyperparameters.

Disambiguator_final_test_data: Final dataset used to estimate model performance after all hyperparameters were tuned and features were chosen.

More details can be found at https://github.com/ourresearch/openalex-name-disambiguation
OpenAire Research Graph linked with OpenAlex
data.niaid.nih.gov
Updated Aug 23, 2024
Cite
Klorek, Antoni (2024). OpenAire Research Graph linked with OpenAlex [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13365368
Dataset updated
Aug 23, 2024
Dataset authored and provided by
Klorek, Antoni
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This package contains linked datasets from the OpenAire Research Graph (OARG) and OpenAlex.

File descriptions:

author_to_publication_dic.json contains a mapping of authors to their publications

downloads_views_dic.json contains a mapping of each publication ID to its number of downloads and views

id_doi_dic.json contains a mapping of each publication ID to its DOI

merged1..5.json contain all publication data from the OARG dataset

necessary_fields_dic.json contains the publication fields extracted as necessary for the work

oarg_ref_rel_dic.json contains a mapping of each publication ID to its referenced and related works present in the OpenAlex dataset

openalex_found_publications5_4.json contains all data from OpenAlex on the publications that were found

publication_to_author_dic.json contains a mapping of publications to their authors
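
The author-to-publication and publication-to-author files described above are two directions of the same relation. As a rough sketch of how one could be derived from the other, the example below inverts a mapping. It assumes, without confirmation from the dataset, that the author file is a JSON object whose keys are author IDs and whose values are lists of publication IDs; the function name `invert_mapping` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def invert_mapping(author_to_publications):
    """Turn {author: [publication, ...]} into {publication: [author, ...]}.

    Assumed input shape (not confirmed by the dataset description):
    a dict from author ID to a list of publication IDs.
    """
    publication_to_authors = defaultdict(list)
    for author, publications in author_to_publications.items():
        for publication in publications:
            publication_to_authors[publication].append(author)
    return dict(publication_to_authors)


# Hypothetical sample data, for illustration only.
sample = {"author_1": ["pub_1", "pub_2"], "author_2": ["pub_2"]}
print(invert_mapping(sample))
```

With real data, the input would be loaded with `json.load` from author_to_publication_dic.json, provided the file actually follows the assumed shape.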
