https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
PatentsView (description below) will go offline on March 28th. This torrent includes all bulk downloadable tables from: , along with the data dictionaries and the published logic diagram. Zip files contain tab-delimited files that are considerably larger than the zip files when uncompressed. The data includes patent activity from 1976 to 2024. Description: PatentsView is an award-winning visualization, data dissemination, and analysis platform that focuses on intellectual property (IP) data. Support for the site and the team that works on it comes from the Office of the Chief Economist at the U.S. Patent & Trademark Office (USPTO). PatentsView serves students, educators, researchers, policymakers, small business owners, and the public. It offers a unique and valuable open data platform providing free data dissemination and value-added analyses to foster better knowledge of the IP system and drive new insights into invention and i
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
PatentsView Data is a dataset that longitudinally links inventors, their organizations, locations, and overall patenting activity. The dataset uses data derived from USPTO bulk data files.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the detailed description text from granted patents (1976-2024, prefix "g_") and patent applications (2001-2024, prefix "pg_") from the final release of PatentsView on 12/31/2024.
The USPTO grants US patents to inventors and assignees all over the world. For researchers in particular, PatentsView is intended to encourage the study and understanding of the intellectual property (IP) and innovation system; to serve as a fundamental function of the government in creating “public good” platforms in these data; and to eliminate redundant cleaning, converting and matching of these data by individual researchers, thus freeing up researcher time to do what they do best—study IP, innovation, and technological change.
PatentsView Data is a database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The dataset uses data derived from USPTO bulk data files.
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
“PatentsView” by the USPTO, US Department of Agriculture (USDA), the Center for the Science of Science and Innovation Policy, New York University, the University of California at Berkeley, Twin Arch Technologies, and Periscopic, used under CC BY 4.0.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patentsview
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
File | Zip size | TSV size | Description | # of Rows | Origin | Last Updated |
---|---|---|---|---|---|---|
g_applicant_not_disambiguated | 217.1 MiB | 569.8 MiB | Raw information on non-inventor applicants | 6010194 | raw | March 17, 2025 |
g_application | 65.5 MiB | 399.9 MiB | Information on the applications for granted patent. | 9073162 | raw | March 17, 2025 |
g_assignee_disambiguated | 330.9 MiB | 1011.1 MiB | Disambiguated assignee data for granted patents. | 8385078 | disambig | March 17, 2025 |
g_assignee_not_disambiguated | 454.5 MiB | 926.6 MiB | Raw assignee data for granted patents. | 8385078 | raw | March 17, 2025 |
g_attorney_disambiguated | 60.4 MiB | 799.8 MiB | Disambiguated lawyer data for granted patents. | 10290325 | disambig | March 17, 2025 |
g_attorney_not_disambiguated | 327.4 MiB | 799.0 MiB | Raw lawyer data for granted patents. | 10280955 | raw | March 17, 2025 |
g_botanic | 324.1 KiB | 924.7 KiB | Information about granted plant patents. | 20887 | raw | March 17, 2025 |
g_cpc_at_issue | 306.2 MiB | 1.8 GiB | CPC classification data for granted patents at the time of their issue. | 23166175 | raw | March 17, 2025 |
g_cpc_current | 462.7 MiB | 3.0 GiB | Current CPC classifications of granted patents. | 56755723 | raw | March 17, 2025 |
g_cpc_title | 6.1 MiB | 105.4 MiB | CPC group classification at issue of the granted patent. | 269285 | raw | March 17, 2025 |
g_examiner_not_disambiguated | 181.6 MiB | 528.2 MiB | Raw information about the examiner for granted patents. | 12089390 | raw | March 17, 2025 |
g_figures | 49.1 MiB | 121.5 MiB | Number of figures and drawing sheets included with the granted patent. | 8507845 | raw | March 17, 2025 |
g_foreign_citation | 657.1 MiB | 2.5 GiB | Citations made to foreign patents by granted U.S. patents. | 42604311 | raw | March 17, 2025 |
g_foreign_priority | 64.9 MiB | 207.1 MiB | Information about an earlier patent filing in a foreign country which gives the claim priority. | 4189787 | raw | March 17, 2025 |
g_gov_interest | 5.7 MiB | 40.9 MiB | Mapping of patent numbers to raw government interest text | 181001 | raw | March 17, 2025 |
g_gov_interest_contracts | 1.7 MiB | 5.4 MiB | Mapping of Federal contract award numbers to patent numbers | 223375 | processed | March 17, 2025 |
g_gov_interest_org | 1.2 MiB | 20.9 MiB | Federal agencies with government interests in patents | 226971 | raw | March 17, 2025 |
g_inventor_disambiguated | 642.1 MiB | 2.0 GiB | Disambiguated inventor data for granted patents. | 22884194 | disambig | March 17, 2025 |
g_inventor_not_disambiguated | 939.5 MiB | 1.9 GiB | Raw inventor data for granted patents. | 22884194 | raw | March 17, 2025 |
g_ipc_at_issue | 354.8 MiB | 1.6 GiB | International Patent Classification data for all patents (as of publication date). | 24131767 | raw | March 17, 2025 |
g_location_disambiguated | 2.5 MiB | 8.8 MiB | Disambiguated location data, including latitude and longitude for granted patents. | 96968 | disambig | March 17, 2025 |
g_location_not_disambiguated | 1007.8 MiB | 3.0 GiB | Raw location data, including latitude and longitude for granted patents. | 37423146 | raw | March 17, 2025 |
g_other_reference | 3.8 GiB | 8.8 GiB | Non-patent citations (e.g. articles, papers, etc.) mentioned in granted patents. | 61072261 | raw | March 17, 2025 |
g_patent | 212.9 MiB | 1.0 GiB | Data on granted patents. | 9075421 | raw | March 17, 2025 |
g_patent_abstract | 1.5 GiB | 5.6 GiB | Abstract data for granted patents. | 9075421 | raw | March 17, 202 |
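Because the uncompressed tab-delimited files are several times larger than their zip archives, it can help to stream rows straight out of the archive rather than extracting it first. The following is a minimal sketch using only the Python standard library. The member name `g_patent.tsv`, the column names, and the sample values are hypothetical stand-ins, and the assumption that each archive holds a single table has not been checked against the actual downloads.

```python
import csv
import io
import zipfile


def iter_tsv_rows(zip_source, member=None):
    """Yield each row of a tab-delimited member of a zip archive as a dict.

    Rows are streamed, so the uncompressed table never has to fit in memory.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: the archive holds one table; otherwise pass `member`.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text, delimiter="\t")


def build_demo_zip():
    """Create a tiny in-memory archive standing in for a real download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical file name, columns, and values, for illustration only.
        archive.writestr(
            "g_patent.tsv",
            "patent_id\tpatent_date\n1000001\t1976-01-06\n1000002\t1976-01-13\n",
        )
    buffer.seek(0)
    return buffer


rows = list(iter_tsv_rows(build_demo_zip()))
print(len(rows), rows[0]["patent_id"])
```

For a real download, `zip_source` would be the path to the zip file, and the generator would typically be consumed row by row instead of being collected into a list.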
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These two data files contain information on patent citations for USPTO utility patents granted between 1976 and 2015 and for patents that have been classified in 30 specific technology domains.
The file 'CITATION_INFO_no_neg_citlag.csv' is generated by combining raw data freely downloadable from patentsview.org and then removing citations in which the filing year of the citing patent is earlier than the filing year of the cited one (a negative citation lag).
The file 'CITATIONS_DOMAINS.csv' is a sample of the previous file that only includes citations made by patents belonging to one of 30 domains defined in the paper 'Estimating technology performance improvement rates by mining patent data' by Giorgio Triulzi, Jeff Alstott and Chris Magee.
These two files complement another dataset published on Mendeley Data. The two datasets can be used, together with the code published on GitHub, to replicate the main results from the paper.
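The negative-lag filter described for 'CITATION_INFO_no_neg_citlag.csv' can be sketched as follows. This is an illustration under assumptions, not the authors' code: the patent identifiers and filing years are invented, citations with a zero lag are assumed to be kept, and citations with an unknown filing year are assumed to be dropped.

```python
def drop_negative_lag(citations, filing_year):
    """Keep citations whose citing patent was not filed before the cited one.

    `citations` is an iterable of (citing_id, cited_id) pairs and
    `filing_year` maps a patent id to its filing year.
    """
    kept = []
    for citing, cited in citations:
        if citing not in filing_year or cited not in filing_year:
            continue  # assumption: drop pairs with an unknown filing year
        lag = filing_year[citing] - filing_year[cited]
        if lag >= 0:  # assumption: a zero lag is not "negative", so keep it
            kept.append((citing, cited, lag))
    return kept


# Hypothetical patents and filing years, for illustration only.
filing_year = {"A": 1990, "B": 1995, "C": 2001}
citations = [("B", "A"), ("A", "C"), ("C", "B")]

clean = drop_negative_lag(citations, filing_year)
print(clean)
```

Here the pair ("A", "C") is removed because patent A was filed in 1990, before the 2001 filing of the patent it cites.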
Hand-disambiguation of a sample of U.S. patents inventor mentions from PatentsView.org.
Inventors were selected indirectly by sampling inventor mentions uniformly at random. This results in inventors being sampled with probability proportional to their number of granted patents.
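The size-biased nature of this sampling scheme can be checked with a small example. The sketch below uses three hypothetical inventors with 1, 3, and 6 mentions. For a single uniform draw over mentions, the probability of landing on an inventor equals that inventor's share of all mentions; a seeded simulation is included only as a sanity check.

```python
import random
from fractions import Fraction

# Hypothetical inventors and their number of patent mentions.
mentions_per_inventor = {"inv_a": 1, "inv_b": 3, "inv_c": 6}

# One list entry per inventor mention, so a uniform draw over this list
# is a uniform draw over mentions, not over inventors.
mentions = [
    inventor
    for inventor, count in mentions_per_inventor.items()
    for _ in range(count)
]


def selection_probability(inventor):
    """Exact probability that one uniformly drawn mention belongs to `inventor`."""
    return Fraction(mentions.count(inventor), len(mentions))


# Seeded simulation of repeated single draws.
rng = random.Random(0)
draws = [rng.choice(mentions) for _ in range(10_000)]
empirical = {inv: draws.count(inv) / len(draws) for inv in mentions_per_inventor}

for inv in mentions_per_inventor:
    print(inv, selection_probability(inv), round(empirical[inv], 3))
```

An inventor with six mentions is six times as likely to be picked by a single draw as an inventor with one mention, which is why prolific inventors are over-represented relative to a uniform sample of inventors.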
The time period considered is from 1976 to December 31, 2021, corresponding to the disambiguation labeled "disamb_inventor_id_20211230" in PatentsView's bulk data downloads "g_persistent_inventor.tsv" file (https://patentsview.org/download/data-download-tables). That is, the benchmark disambiguation intends to contain all inventor mentions for the sampled inventors from that time period. Note that the benchmark disambiguation contains a few extraneous mentions to patents granted outside of that time period. These should be ignored for evaluation purposes.
The methodology used for the hand-disambiguation is described in Binette et al. (2022) (https://arxiv.org/abs/2210.01230). We used one disambiguation of 200 inventors from Binette et al. (2022), as well as an additional disambiguation of 200 inventors provided by an additional staff member. The two disambiguations were reviewed and validated. However, they should be expected to contain errors due to the ambiguous nature of inventor disambiguation. Furthermore, given the use of the December 30, 2021, disambiguation from PatentsView as the starting point of the hand-labeling, a bias towards this disambiguation should be expected.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Accompanying material for the paper "The anatomy of Green AI technologies: structure, evolution, and impact" (2025).
The Green AI Patent Dataset comprises 63 326 unique U.S. patents that intersect environmental (“green”) technologies with artificial‐intelligence components, spanning from 1976 to 2023. It was assembled by combining:
PatentsView (USPTO) – U.S. patents (snapshot of January 2025) labelled under Cooperative Patent Classification classes Y02 and Y04S for climate‐change mitigation/adaptation and smart‐grid technologies.
Artificial Intelligence Patent Dataset (AIPD 2023 - most recent update) – USPTO’s machine‐learning–validated classification of AI‐related patents (predict50_any_ai = 1). Available here: Pairolero, N. et al. The artificial intelligence patent dataset (aipd) 2023 update. USPTO Economic Working Paper 2024-4, USPTO (2024). Available at https://www.uspto.gov/sites/default/files/documents/oce-aipd-2023.pdf.
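The combination of the two sources amounts to an intersection: patents that carry at least one Y02 or Y04S CPC code and that are also flagged with predict50_any_ai = 1. The sketch below illustrates that logic on invented rows. The record layout, the `patent_id` key on the AIPD side, and all sample values are assumptions made for illustration, not the authors' pipeline.

```python
GREEN_PREFIXES = ("Y02", "Y04S")


def is_green(cpc_code):
    """True if a CPC code falls under the Y02 or Y04S taxonomy."""
    return cpc_code.startswith(GREEN_PREFIXES)


def green_ai_patents(cpc_rows, aipd_rows):
    """Return ids of patents that are both green (by CPC) and AI-flagged.

    `cpc_rows` is an iterable of (patent_id, cpc_code) pairs;
    `aipd_rows` is an iterable of dicts with a hypothetical `patent_id`
    key and the `predict50_any_ai` flag.
    """
    green = {patent_id for patent_id, code in cpc_rows if is_green(code)}
    ai = {row["patent_id"] for row in aipd_rows if row["predict50_any_ai"] == 1}
    return green & ai


# Invented sample rows, for illustration only.
cpc_rows = [
    ("100", "Y02E10/50"),
    ("100", "G06N3/08"),
    ("200", "Y04S10/12"),
    ("300", "H01L31/00"),
    ("400", "Y02T10/70"),
]
aipd_rows = [
    {"patent_id": "100", "predict50_any_ai": 1},
    {"patent_id": "200", "predict50_any_ai": 0},
    {"patent_id": "300", "predict50_any_ai": 1},
]

selected = sorted(green_ai_patents(cpc_rows, aipd_rows))
print(selected)
```

Patent 200 is green but not AI-flagged, patent 300 is AI-flagged but not green, and patent 400 has no AIPD record, so only patent 100 survives the intersection.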
Variable | Description | Completeness (non-null count) |
---|---|---|
patent_id | Unique USPTO patent identifier. | 63 326 |
cpc_subclass | Subclasses of "green" CPC taxonomy Y02 / Y04S. Refer to the USPTO's website for more details: https://www.uspto.gov/web/patents/classification/cpc/html/cpc-Y.html | 63 326 |
patent_date | Grant date of the patent (YYYY-MM-DD). | 63 326 |
patent_title | Title of the patent. | 63 326 |
assignee | Disambiguated assignee organization name. | 59 479 |
country | Disambiguated assignee country. | 59 155 |
forward_citations | Number of times this patent is cited by later patents (forward citations). | 63 326 |
tech_domain | BERTopic‐derived technology domain (integer 0–15; –1 marks outliers). | 62 337 |
real_value | Market‐value proxy associated with the patent, derived from the updated dataset of Kogan, L., Papanikolaou, D., Seru, A. & Stoffman, N. Technological innovation, resource allocation, and growth. The Q. J. Econ. 132, 665–712, DOI: 10.1093/qje/qjw040 (2017). | 26 306 |
Each patent was assigned to one of 16 topics (tech_domain), numbered 0–15 (with –1 for outliers). Below are the label, example keywords (with their topic cohesion scores), and the number of patents for each topic:
ID | Label | Top Keywords (score) | Count |
---|---|---|---|
0 | Data Processing & Memory Management | processing (0.516), computing (0.461), process (0.449), systems (0.443), memory (0.421) | 27 435 |
1 | Microgrid & Distributed Energy Systems | microgrid (0.487), electricity (0.421), utility (0.401), power (0.380), energy (0.370) | 5 378 |
2 | Vehicle Control & Autonomous Powertrains | vehicle (0.477), vehicles (0.468), control (0.416), driving (0.387), engine (0.386) | 3 747 |
3 | Irrigation & Agricultural Water Mgmt | irrigation (0.511), systems (0.431), flow (0.353), process (0.348), water (0.333) | 2 754 |
4 | Photovoltaic & Electrochemical Devices | semiconductor (0.518), photoelectric (0.509), electrodes (0.487), electrode (0.473), photovoltaic (0.470) | 2 599 |
5 | Clinical Microbiome & Therapeutics | microbiome (0.481), clinical (0.371), physiological (0.321), therapeutic (0.320), disease (0.314) | 2 286 |
6 | Combustion Engine Control | combustion (0.423), engine (0.373), control (0.342), fuel (0.338), ignition (0.318) | 2 179 |
7 | Battery Charging & Management | charging (0.485), charger (0.449), charge (0.425), battery (0.386), batteries (0.377) | 1 541 |
8 | HVAC & Thermal Regulation | hvac (0.515), heater (0.474), cooling (0.471), heating (0.464), evaporator (0.455) | 1 523 |
9 | Lighting & Illumination Systems | lighting (0.621), illumination (0.601), lights (0.545), brightness (0.526), light (0.488) | 1 219 |
10 | Exhaust & Emission Treatment | exhaust (0.464), catalytic (0.446), purification (0.444), catalyst (0.366), emissions (0.365) | 1 064 |
11 | Wind Turbine & Rotor Control | turbines (0.498), turbine (0.488), windmill (0.464), wind (0.418), rotor (0.300) | 988 |
12 | Aircraft Wing Aerodynamics & Control | wing (0.450), aircraft (0.448), wingtip (0.424), apparatus (0.423), aerodynamic (0.418) | 697 |
13 | Meteorological Radar & Weather Forecasting | radar (0.541), meteorological (0.511), weather (0.412), precipitation (0.391), systems (0.372) | 542 |
14 | Fuel Cell Systems & Electrodes | fuel (0.375), cell (0.313), systems (0.295), cells (0.291), controls (0.262) | 377 |
15 | Turbine Airfoils & Cooling | airfoils (0.584), airfoil (0.572), turbine (0.433), engine (0.333), axial (0.321) | 352 |
–1 | Outliers | – | 7 656 |
This Zenodo entry contains topic_modeling.ipynb, a fully documented Jupyter notebook containing Python code for uncovering latent themes in patent abstracts using BERTopic. It walks through text preprocessing (lowercasing, standard English stopwords plus “herein” and “invention,” tokenization, and boilerplate removal), embedding with the all-MiniLM-L6-v2 SentenceTransformer, dimensionality reduction via UMAP, clustering with HDBSCAN, and topic extraction through class-based TF-IDF. The script also executes a grid search over UMAP and HDBSCAN hyperparameters, computes UMass coherence and topic diversity for each configuration, and saves a CSV of evaluation metrics, enabling straightforward reproduction of our topic-modeling workflow.
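Two of the steps named above can be illustrated without any third-party dependency: the preprocessing step and the topic diversity metric. The sketch below is not taken from the notebook. The short stopword set is a stand-in for a standard English list, the regex tokenizer is an assumption, and topic diversity is computed in its common form as the share of unique words among the top-k words of all topics.

```python
import re

# Stand-in for a standard English stopword list (assumption), extended with
# the two domain-specific words mentioned in the description.
BASE_STOPWORDS = {"a", "an", "the", "of", "and", "for", "to", "in", "is", "with"}
STOPWORDS = BASE_STOPWORDS | {"herein", "invention"}


def preprocess(abstract):
    """Lowercase, tokenize on alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return [token for token in tokens if token not in STOPWORDS]


def topic_diversity(topics, top_k=5):
    """Share of unique words among the top-k words of every topic.

    A value of 1.0 means no word is shared between topics; lower values
    indicate overlapping topics.
    """
    top_words = [word for words in topics for word in words[:top_k]]
    return len(set(top_words)) / len(top_words)


tokens = preprocess("The invention herein is a Wind Turbine controller.")
diversity = topic_diversity(
    [["turbine", "wind", "rotor"], ["turbine", "engine", "airfoil"]],
    top_k=3,
)
print(tokens, round(diversity, 3))
```

In the example, the two topics share the word "turbine", so five of the six top words are unique and the diversity is 5/6.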
Additional analyses, such as data cleaning, merging, aggregation, and the generation of summary tables and plots, were also performed but are not included here by default, as they consist of straightforward operations using standard open-source libraries (e.g., pandas, NumPy, matplotlib, and seaborn). The full code for these steps can be made available upon request.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is data, code, and replication instructions for "Do Local Conditions Determine the Direction of Science? Evidence from U.S. Land Grant Colleges." In this project, we test whether land grant colleges that are located in counties that are agriculturally unrepresentative relative to the rest of their states also tend to produce research focusing on more unrepresentative crops.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The file geoc_inv.txt contains identifiers for patent first filings (corresponding to appln_id in PATSTAT), latitude, longitude, city, region, and country of the inventor. Missing coordinates have been imputed from equivalents and other second filings or from information on the location of applicants. The file also contains a variable indicating the source of information ('source'):
1: information comes from the first filing itself
2: information comes from direct equivalent
3: information comes from other subsequent filings
4: information comes from the applicant’s location in first filings
5: information comes from the applicant’s location in the equivalent
6: information comes from the applicant’s location in other subsequent filings
The column 'coord_source' indicates the source of coordinates (whether they come from geolocalisation services, from geonames, or from PatentsView). It is possible to select certain types of first filings based on column 'type'. For example, Paris Convention priority filings can be retrieved by specifying type=priority.
The file geoc_app.txt contains location information of applicants. Sources of information (first filings, equivalents, etc.) are thus browsed in reverse order.
A detailed data description can be found in de Rassenfosse, Kozak, Seliger 2019: Geocoding of worldwide patent data, published in 'Scientific Data' and available at https://doi.org/10.1038/s41597-019-0264-6.
Please note the following: The files geoc_inv_person.txt and geoc_app_person.txt contain person IDs for inventors and applicants, respectively, whenever the location information comes from PATSTAT. Otherwise, person_id is 0. These files are not described in the paper. They have been made accessible to improve interoperability with PATSTAT data. Some files had to be zipped in order to upload them to Harvard Dataverse.
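The 'source' codes and the 'type' filter described for geoc_inv.txt can be handled as in the sketch below. It works on rows that have already been parsed into dictionaries, because the file's delimiter is not stated here. The column names `appln_id`, `source`, and `type` come from the description, while the sample rows and the non-priority type value `other` are invented for illustration.

```python
from collections import Counter

# Meaning of each 'source' code, as listed in the description.
SOURCE_LABELS = {
    1: "first filing itself",
    2: "direct equivalent",
    3: "other subsequent filings",
    4: "applicant's location in first filings",
    5: "applicant's location in the equivalent",
    6: "applicant's location in other subsequent filings",
}


def priority_filings(rows):
    """Keep only Paris Convention priority filings (type=priority)."""
    return [row for row in rows if row["type"] == "priority"]


def count_by_source(rows):
    """Count rows per human-readable source label."""
    counts = Counter(SOURCE_LABELS[int(row["source"])] for row in rows)
    return dict(counts)


# Invented rows, for illustration only.
sample_rows = [
    {"appln_id": "1", "source": "1", "type": "priority"},
    {"appln_id": "2", "source": "4", "type": "priority"},
    {"appln_id": "3", "source": "1", "type": "other"},
]

priority_ids = [row["appln_id"] for row in priority_filings(sample_rows)]
source_counts = count_by_source(sample_rows)
print(priority_ids, source_counts)
```

Counting rows per source label gives a quick view of how many locations were observed directly in the first filing and how many were imputed from equivalents, subsequent filings, or applicant locations.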
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The study analyzes quantitative micro-level data aggregated to the city-level in urban systems in Europe and the United States. The study demonstrates how urban scaling laws arise from within-city inequality. We show that indicators of interconnectivity, productivity, and innovation have heavy tailed distributions in cities, and that city tails, and their growth with city size, play an important role in the emergence of urban scaling. With agent-based simulation and an analysis of longitudinal micro-level data, we identify a city-size dependent cumulative advantage mechanism behind differences in the tailedness of urban indicators by city size.
The data and code that support the findings of this study are available for download here. We collected the online networking data for Russia and Ukraine through the VKontakte API (https://vk.com/dev/openapi), the data on US patents are from the US Patent and Trademark Office (https://www.patentsview.org) and on research grants from Dimensions (https://www.dimensions.ai). The code for these data collections is available upon request. The Swedish micro-level data come from administrative and tax records and can therefore not be shared; access may be requested from Statistics Sweden (https://scb.se/en/services/guidance-for-researchers-and-universities). Additional information and data may be requested from the authors.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Policymakers are increasingly concerned that incumbent acquisitions of small or young firms may slow down rather than speed up innovation, but it is difficult to identify which firms are related in the fast-changing space of technological innovation. This paper proposes a new, data-driven method to classify patent data into tech-business zones on a probabilistic basis, using patent assignee information. After combining M&A data from S&P Global Market Intelligence with PatentsView data from the US Patent and Trademark Office, we discuss how the zone classification can aid merger reviews and other lines of research.
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
This repository includes the base notebooks used to prepare the patent and trademark data for SEI 2024. This covers the uploading of the PatentsView database, its curation, and the preparation of patent and trademark indicators across all the mapping classifications.
2 (Per million inhabitants) in 2012. Applications to the EPO