Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The USPTO grants US patents to inventors and assignees all over the world. For researchers in particular, PatentsView is intended to encourage the study and understanding of the intellectual property (IP) and innovation system; to serve as a fundamental function of the government in creating “public good” platforms in these data; and to eliminate redundant cleaning, converting and matching of these data by individual researchers, thus freeing up researcher time to do what they do best—study IP, innovation, and technological change.
PatentsView Data is a database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The dataset uses data derived from USPTO bulk data files.
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
“PatentsView” by the USPTO, US Department of Agriculture (USDA), the Center for the Science of Science and Innovation Policy, New York University, the University of California at Berkeley, Twin Arch Technologies, and Periscopic, used under CC BY 4.0.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patentsview
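As a starting point for the BigQuery access mentioned above, here is a minimal sketch in Python. It uses the google-cloud-bigquery client rather than the BQhelper package, which is a substitution on our part. The table name `patents-public-data.patentsview.patent` and the columns `id` and `date` come from a query quoted elsewhere on this page; the function names are hypothetical, and running the query assumes Google Cloud credentials are already configured.

```python
# Sketch only: builds a small SQL query against the public PatentsView table.
# The table path and the `id` / `date` columns are taken from a query quoted
# on this page; everything else here is an illustrative assumption.
PATENT_TABLE = "patents-public-data.patentsview.patent"


def build_sample_query(limit: int = 10) -> str:
    """Return a SQL string that selects a few rows from the patent table."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT id, date\n"
        f"FROM `{PATENT_TABLE}`\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str):
    """Run the query and return the rows as a list.

    Requires the google-cloud-bigquery package and valid credentials,
    so the import is kept local to this function.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return list(client.query(sql).result())


if __name__ == "__main__":
    # Printing the SQL needs no credentials; call run_query() to execute it.
    print(build_sample_query(5))
```

Keeping the query builder separate from the client call makes it possible to inspect or test the SQL without touching BigQuery.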
The PatentsView PatentSearch API is intended to inspire the exploration and enhanced understanding of US intellectual property (IP) and innovation systems. The database driving the API is regularly updated and integrates the best available tools for inventor disambiguation and data quality control. We hope researchers and developers alike will explore the API to discover people and companies and to visualize trends and patterns across the US innovation landscape.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the detailed description text from granted patents (1976-2014, prefix "g_") and patent applications (2021-2014, prefix "pg_"), taken from the final release of PatentsView on 12/31/2024.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PatentsView Data is a dataset that longitudinally links inventors, their organizations, locations, and overall patenting activity. The dataset uses data derived from USPTO bulk data files.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PatentsView API makes it easy to query all granted US patents. However, the result table is hard to read and contains repeated records, one for each distinct value of a given column. I wrote a script to consolidate these records and kept only the data-rich columns that lend themselves to analysis.
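The consolidation step described above is not shown in the source. A minimal sketch of one way to do it follows, assuming each record is a dictionary of strings and that a hypothetical `patent_number` field identifies the patent; the column names in the example are illustrative, not the dataset's actual schema.

```python
def collapse_records(rows, key="patent_number", sep="; "):
    """Merge rows that share the same key into a single row.

    For every other column, the distinct values are kept in first-seen
    order and joined with `sep`.
    """
    merged = {}
    for row in rows:
        entry = merged.setdefault(row[key], {})
        for column, value in row.items():
            if column == key:
                continue
            values = entry.setdefault(column, [])
            if value not in values:
                values.append(value)
    return [
        {key: patent, **{col: sep.join(vals) for col, vals in columns.items()}}
        for patent, columns in merged.items()
    ]


if __name__ == "__main__":
    # Hypothetical example: one patent repeated once per inventor.
    rows = [
        {"patent_number": "1", "inventor": "A. Smith", "title": "Widget"},
        {"patent_number": "1", "inventor": "B. Jones", "title": "Widget"},
        {"patent_number": "2", "inventor": "C. Lee", "title": "Gadget"},
    ]
    for record in collapse_records(rows):
        print(record)
```

Joining values into one delimited string keeps the table flat; storing lists instead would be the better choice if the values need to be analyzed individually later.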
This dataset consists of all US patents granted in the first quarter of 2019 (January to March). I wanted to analyze it to see which industries are leading in innovation, which sectors and technologies these patents involve, and whether any patterns emerge.
The underlying data is publicly available; all available columns are listed at http://www.patentsview.org/api/patent.html.
I am still building my analysis and visualization dashboard. I am open to any questions you would like to see answered from this dataset.
This dataset was generated using Google's BigQuery API. The query is adapted from Appendix A of the work by Lee and Hsiang.
The query was changed to include patents from 2000 to 2015. The specific query is shown below.
SELECT STRING_AGG(distinct t2.group_id ORDER BY t2.group_id) AS cpc_ids,
t1.id, t1.date, t3.text
FROM patents-public-data.patentsview.patent t1,
patents-public-data.patentsview.cpc_current t2,
patents-public-data.patentsview.claim t3
WHERE t1.id =…
See the full description on the dataset page: https://huggingface.co/datasets/MalavP/USPTO-3M.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version 1, 15/07/2025.
This dataset includes measures associated with the technological significance of USPTO patents, along with supporting variables. It covers patents granted between 1 January 1980 and 31 December 2009.
These variables were used in “The Changing Nature of Firm Innovation: Short-term Orientation and Influential Innovation in US Public Firms”, Management Science, forthcoming.
If you use this dataset in your research or publication, we kindly ask that you acknowledge it by including the following citation: Corredoira, R. A., & Goldfarb, B. D. (2025). Measures associated to USPTO patent technological significance [Data set]. Newcastle University. https://doi.org/10.25405/data.ncl.29506094
Proper citation helps ensure the dataset's impact is recognized and supports continued data sharing.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository includes the version of the PatentsView data (March 2022) that is used to produce some of the exhibits in "Social Push and the Direction of Innovation." Because we needed citations data and the March 2022 version is no longer available, we use the latest available version of g_us_patent_citation.tsv (from April 2025), which is a bit newer than the version available at: https://doi.org/10.3886/E223582V1.
US Patent Descriptions
This dataset contains the descriptions of granted US utility patents, filtered and deduplicated. The original data comes from all granted patents in 2025 up to May 20, available from PatentsView.
Splits
train: 10,000 rows for model training
validation: 2,500 rows for validation
test: 2,500 rows for evaluation
Columns
patent_id: Identifier for the patent; useful for reconciling with other PatentsView datasets
description_text: Full…
See the full description on the dataset page: https://huggingface.co/datasets/mhurhangee/us-patent-descriptions.
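The split sizes listed above (10,000 / 2,500 / 2,500) add up to 15,000 rows. How the dataset authors produced the splits is not stated; the sketch below only illustrates one common approach, a seeded shuffle followed by slicing. The function name and the seed are assumptions.

```python
import random

# Split sizes as listed in the dataset description.
SPLIT_SIZES = {"train": 10_000, "validation": 2_500, "test": 2_500}


def split_rows(rows, sizes=SPLIT_SIZES, seed=0):
    """Shuffle rows reproducibly and cut them into named, disjoint splits."""
    total = sum(sizes.values())
    if len(rows) != total:
        raise ValueError(f"expected {total} rows, got {len(rows)}")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    splits, start = {}, 0
    for name, size in sizes.items():
        splits[name] = shuffled[start:start + size]
        start += size
    return splits


if __name__ == "__main__":
    # Stand-in row identifiers; real rows would be patent records.
    splits = split_rows(list(range(15_000)))
    for name, part in splits.items():
        print(name, len(part))
```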
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains supplementary material regarding country-specific analysis for the Nordic region.
https://cdla.dev/open-use-of-data-agreement-v1-0
The DISCERN dataset was developed to support academic research on corporate innovation by linking data on U.S. publicly listed firms from Standard & Poor’s Compustat database to their patents and scientific publications. A key feature of DISCERN is its comprehensive coverage of firms’ subsidiaries and their ownership changes over time, which is crucial for accurately mapping corporate innovation. Patents and publications may be assigned to various legal entities within a firm’s organizational structure. Subsidiaries may change ownership in M&A events. By accounting for these ownership linkages over time, DISCERN enables researchers to construct more precise measures of firms’ knowledge production and examine the factors influencing their R&D investment decisions.
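To make the idea of time-varying ownership concrete, here is a minimal sketch of how a patent assigned to a subsidiary could be attributed to whichever parent firm owned that subsidiary on a given date. The record layout, names, and dates are hypothetical and are not DISCERN's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OwnershipSpell:
    """One period during which `parent` owned `subsidiary` (hypothetical layout)."""
    subsidiary: str
    parent: str
    start: date
    end: Optional[date]  # exclusive end date; None means still owned


def owner_on(spells, subsidiary, when):
    """Return the parent that owned `subsidiary` on `when`, or None if unknown."""
    for spell in spells:
        if (
            spell.subsidiary == subsidiary
            and spell.start <= when
            and (spell.end is None or when < spell.end)
        ):
            return spell.parent
    return None


if __name__ == "__main__":
    # Hypothetical example: a subsidiary changes hands in an acquisition.
    spells = [
        OwnershipSpell("SubCo", "FirmA", date(1995, 1, 1), date(2005, 6, 1)),
        OwnershipSpell("SubCo", "FirmB", date(2005, 6, 1), None),
    ]
    print(owner_on(spells, "SubCo", date(2000, 3, 15)))
    print(owner_on(spells, "SubCo", date(2010, 3, 15)))
```

With this kind of lookup, a patent granted to the subsidiary before the acquisition counts toward the first parent and one granted afterwards counts toward the second.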
Version 2.0 incorporates several key improvements over the previous version of DISCERN. First, we shift to using the PatentsView database as the main source of patent data and OpenAlex as the main source of scientific publication data. PatentsView is publicly available and continuously maintained directly by the United States Patent and Trademark Office (USPTO). OpenAlex is currently the only open data source of scientific publication metadata. Using freely available data sources allows us to share both the patent and the publication datasets openly. This enhances data access, which was previously limited due to the use of proprietary data. Second, the updated dataset now covers the period from 1980 to 2021, providing an additional six years of data. Third, we transition to using Securities and Exchange Commission (SEC) filings as the primary source of subsidiary data. This allows us to trace ownership linkages further back, to the mid-1990s, and is more reliable than the Orbis data used in the original version, which had comprehensive coverage only from 2008. Finally, by transitioning to PatentsView and additional data sourced from the USPTO, we expand the scope of the dataset to include pre-grant patent applications and patent re-assignment information. This addition allows users to study patent applications regardless of grant status and to observe ownership transitions beyond those related to mergers and acquisitions.
A special thanks and appreciation go to Sanskriti Purohit and Ron Rabi for their diligent work and dedication to this effort.
The dataset is freely available under the O-UDA-1.0 License, permitting unrestricted use for research and commercial purposes. We request that users provide proper citations when utilizing the dataset. The license also allows for the creation of derivative datasets based on DISCERN, with the condition that creators ask their downstream users to cite the original authors appropriately.
If you use the data, please add these citations:
1. Arora, A., Belenzon, S., Cioaca, L., Sheer, L., Shin, H.M. & Shvadron, D. (2024). DISCERN 2.0: Duke Innovation & SCientific Enterprises Research Network [Dataset]. In Zenodo (CERN European Organization for Nuclear Research). https://doi.org/10.5281/zenodo.3594642
2. Arora, A., Belenzon, S., Cioaca, L., Sheer, L., & Shvadron, D. (2024). Back to the Future: Are Big Firms Regaining their Scientific and Technological Dominance? Evidence from DISCERN 2.0 (available soon)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accompanying material for the paper "The anatomy of Green AI technologies: structure, evolution, and impact" (2025).
The Green AI Patent Dataset comprises 63 326 unique U.S. patents that intersect environmental (“green”) technologies with artificial‐intelligence components, spanning from 1976 to 2023. It was assembled by combining:
PatentsView (USPTO) – U.S. patents (snapshot of January 2025) labelled under Cooperative Patent Classification classes Y02 and Y04S for climate‐change mitigation/adaptation and smart‐grid technologies.
Artificial Intelligence Patent Dataset (AIPD 2023 - most recent update) – USPTO’s machine‐learning–validated classification of AI‐related patents (predict50_any_ai = 1). Source: Pairolero, N. et al. The artificial intelligence patent dataset (AIPD) 2023 update. USPTO Economic Working Paper 2024-4, USPTO (2024). Available at https://www.uspto.gov/sites/default/files/documents/oce-aipd-2023.pdf.
| Variable | Description | Completeness (non-null count) |
|---|---|---|
| patent_id | Unique USPTO patent identifier. | 63 326 |
| cpc_subclass | Subclasses of "green" CPC taxonomy Y02 / Y04S. Refer to the USPTO's website for more details: https://www.uspto.gov/web/patents/classification/cpc/html/cpc-Y.html | 63 326 |
| patent_date | Grant date of the patent (YYYY-MM-DD). | 63 326 |
| patent_title | Title of the patent. | 63 326 |
| assignee | Disambiguated assignee organization name. | 59 479 |
| country | Disambiguated assignee country. | 59 155 |
| forward_citations | Number of times this patent is cited by later patents (forward citations). | 63 326 |
| tech_domain | BERTopic-derived technology domain (integer 0–15; –1 marks outliers). | 62 337 |
| real_value | Market‐value proxy associated with the patent, derived from the updated dataset of Kogan, L., Papanikolaou, D., Seru, A. & Stoffman, N. Technological innovation, resource allocation, and growth. The Q. J. Econ. 132, 665–712, DOI: 10.1093/qje/qjw040 (2017). | 26 306 |
Each patent was assigned to one of 16 topics (tech_domain), numbered 0–15 (with –1 for outliers). Below are the label, example keywords (with their topic cohesion scores), and the number of patents for each topic:
| ID | Label | Top Keywords (score) | Count |
|---|---|---|---|
| 0 | Data Processing & Memory Management | processing (0.516), computing (0.461), process (0.449), systems (0.443), memory (0.421) | 27 435 |
| 1 | Microgrid & Distributed Energy Systems | microgrid (0.487), electricity (0.421), utility (0.401), power (0.380), energy (0.370) | 5 378 |
| 2 | Vehicle Control & Autonomous Powertrains | vehicle (0.477), vehicles (0.468), control (0.416), driving (0.387), engine (0.386) | 3 747 |
| 3 | Irrigation & Agricultural Water Mgmt | irrigation (0.511), systems (0.431), flow (0.353), process (0.348), water (0.333) | 2 754 |
| 4 | Photovoltaic & Electrochemical Devices | semiconductor (0.518), photoelectric (0.509), electrodes (0.487), electrode (0.473), photovoltaic (0.470) | 2 599 |
| 5 | Clinical Microbiome & Therapeutics | microbiome (0.481), clinical (0.371), physiological (0.321), therapeutic (0.320), disease (0.314) | 2 286 |
| 6 | Combustion Engine Control | combustion (0.423), engine (0.373), control (0.342), fuel (0.338), ignition (0.318) | 2 179 |
| 7 | Battery Charging & Management | charging (0.485), charger (0.449), charge (0.425), battery (0.386), batteries (0.377) | 1 541 |
| 8 | HVAC & Thermal Regulation | hvac (0.515), heater (0.474), cooling (0.471), heating (0.464), evaporator (0.455) | 1 523 |
| 9 | Lighting & Illumination Systems | lighting (0.621), illumination (0.601), lights (0.545), brightness (0.526), light (0.488) | 1 219 |
| 10 | Exhaust & Emission Treatment | exhaust (0.464), catalytic (0.446), purification (0.444), catalyst (0.366), emissions (0.365) | 1 064 |
| 11 | Wind Turbine & Rotor Control | turbines (0.498), turbine (0.488), windmill (0.464), wind (0.418), rotor (0.300) | 988 |
| 12 | Aircraft Wing Aerodynamics & Control | wing (0.450), aircraft (0.448), wingtip (0.424), apparatus (0.423), aerodynamic (0.418) | 697 |
| 13 | Meteorological Radar & Weather Forecasting | radar (0.541), meteorological (0.511), weather (0.412), precipitation (0.391), systems (0.372) | 542 |
| 14 | Fuel Cell Systems & Electrodes | fuel (0.375), cell (0.313), systems (0.295), cells (0.291), controls (0.262) | 377 |
| 15 | Turbine Airfoils & Cooling | airfoils (0.584), airfoil (0.572), turbine (0.433), engine (0.333), axial (0.321) | 352 |
| –1 | Outliers | – | 7 656 |
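As a quick consistency check on the two tables above, the per-topic counts (including outliers) can be summed and compared with the non-null count reported for tech_domain. The numbers below are copied from the tables; the variable names are ours.

```python
# Patent counts per tech_domain, copied from the topic table (-1 = outliers).
topic_counts = {
    0: 27435, 1: 5378, 2: 3747, 3: 2754, 4: 2599, 5: 2286, 6: 2179,
    7: 1541, 8: 1523, 9: 1219, 10: 1064, 11: 988, 12: 697, 13: 542,
    14: 377, 15: 352, -1: 7656,
}

TOTAL_PATENTS = 63326        # non-null patent_id count from the variable table
TECH_DOMAIN_NON_NULL = 62337  # non-null tech_domain count from the variable table

assigned_total = sum(topic_counts.values())
missing_tech_domain = TOTAL_PATENTS - assigned_total
outlier_share = topic_counts[-1] / assigned_total

print(assigned_total == TECH_DOMAIN_NON_NULL)  # True
print(missing_tech_domain)                     # 989
print(f"{outlier_share:.1%}")
```

The topic counts sum to 62 337, matching the tech_domain completeness figure, which leaves 989 patents with no topic assignment at all.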
This Zenodo entry contains topic_modeling.ipynb, a fully documented Jupyter notebook containing Python code for uncovering latent themes in patent abstracts using BERTopic. It walks through text preprocessing (lowercasing, standard English stopwords plus “herein” and “invention,” tokenization, and boilerplate removal), embedding with the all-MiniLM-L6-v2 SentenceTransformer, dimensionality reduction via UMAP, clustering with HDBSCAN, and topic extraction through class-based TF-IDF. The notebook also executes a grid search over UMAP and HDBSCAN hyperparameters, computes UMass coherence and topic diversity for each configuration, and saves a CSV of evaluation metrics, enabling straightforward reproduction of our topic-modeling workflow.
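Of the evaluation metrics mentioned above, topic diversity is the simplest to illustrate. The notebook's exact definition is not given here, so the sketch below assumes the common one: the share of unique words among the top-k words of all topics. The example keywords are taken from topics 11 and 15 in the table above.

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top_k words of every topic.

    1.0 means no word is shared between topics; lower values mean more
    overlap. This follows a common definition and is an assumption about
    what the notebook computes.
    """
    words = [word for topic in topics for word in topic[:top_k]]
    if not words:
        raise ValueError("no topic words supplied")
    return len(set(words)) / len(words)


if __name__ == "__main__":
    # Top keywords of topics 11 and 15; "turbine" appears in both lists.
    topics = [
        ["turbines", "turbine", "windmill", "wind", "rotor"],
        ["airfoils", "airfoil", "turbine", "engine", "axial"],
    ]
    print(topic_diversity(topics, top_k=5))  # 0.9
```

In a grid search, this value would be computed once per UMAP/HDBSCAN configuration and stored alongside the coherence score.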
Additional analyses, such as data cleaning, merging, aggregation, and the generation of summary tables and plots, were also performed but are not included here by default, as they consist of straightforward operations using standard open-source libraries (e.g., pandas, NumPy, matplotlib, and seaborn). The full code for these steps can be made available upon request.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0): https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This repository includes the base notebooks used to prepare the patent and trademark data for SEI 2024. This covers the uploading of the PatentsView database, its curation, and the preparation of patent and trademark indicators across all the mapping classifications.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This database collects and links U.S. federally funded awards to U.S. utility patents, and those patents to virtual patent marking (VPM) pages, in line with two related projects: 3PFL and IPRoduct. Specifically, it covers awards made by the U.S. Department of Defense (DOD) under the Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) programs from 1984 to 2018.
The database is part of a project, IRIS - Insights on the "Real" Impact of Science. The project aims at assessing how public investment in research and development (R&D) translates into commercial products for the final consumer.
The database is composed of three main elements: awards, patents, and web pages. It draws on several sources, which have been combined and further processed. Information about the awards comes from the Defense Contract Action Data System (DCADS) for the years 1984-2001 and from USAspending.gov for the years 2001-2018. Most information about the patents is provided by PatentsView, while some specific information comes from the Patent Examination Research Dataset (PatEx) or from PATSTAT.