Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications.
BIGPATENT consists of 1.3 million records of U.S. patent documents along with human-written abstractive summaries. Each US patent application is filed under a Cooperative Patent Classification (CPC) code. There are nine such classification categories:
There are two features:
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of the BIGPATENT dataset.
ds = tfds.load('big_patent', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
g_applicant_not_disambiguated | zip: 217.1 MiB | tsv: 569.8 MiB | Description: Raw information on non-inventor applicants | # of Rows: 6010194 | Origin: raw | Last Updated: March 17, 2025
g_application | zip: 65.5 MiB | tsv: 399.9 MiB | Description: Information on the applications for granted patent. | # of Rows: 9073162 | Origin: raw | Last Updated: March 17, 2025
g_assignee_disambiguated | zip: 330.9 MiB | tsv: 1011.1 MiB | Description: Disambiguated assignee data for granted patents. | # of Rows: 8385078 | Origin: disambig | Last Updated: March 17, 2025
g_assignee_not_disambiguated | zip: 454.5 MiB | tsv: 926.6 MiB | Description: Raw assignee data for granted patents. | # of Rows: 8385078 | Origin: raw | Last Updated: March 17, 2025
g_attorney_disambiguated | zip: 60.4 MiB | tsv: 799.8 MiB | Description: Disambiguated lawyer data for granted patents. | # of Rows: 10290325 | Origin: disambig | Last Updated: March 17, 2025
g_attorney_not_disambiguated | zip: 327.4 MiB | tsv: 799.0 MiB | Description: Raw lawyer data for granted patents. | # of Rows: 10280955 | Origin: raw | Last Updated: March 17, 2025
g_botanic | zip: 324.1 KiB | tsv: 924.7 KiB | Description: Information about granted plant patents. | # of Rows: 20887 | Origin: raw | Last Updated: March 17, 2025
g_cpc_at_issue | zip: 306.2 MiB | tsv: 1.8 GiB | Description: CPC classification data for granted patents at the time of their issue. | # of Rows: 23166175 | Origin: raw | Last Updated: March 17, 2025
g_cpc_current | zip: 462.7 MiB | tsv: 3.0 GiB | Description: Current CPC classifications of granted patents. | # of Rows: 56755723 | Origin: raw | Last Updated: March 17, 2025
g_cpc_title | zip: 6.1 MiB | tsv: 105.4 MiB | Description: CPC group classification at issue of the granted patent. | # of Rows: 269285 | Origin: raw | Last Updated: March 17, 2025
g_examiner_not_disambiguated | zip: 181.6 MiB | tsv: 528.2 MiB | Description: Raw information about the examiner for granted patents. | # of Rows: 12089390 | Origin: raw | Last Updated: March 17, 2025
g_figures | zip: 49.1 MiB | tsv: 121.5 MiB | Description: Number of figures and drawing sheets included with the granted patent. | # of Rows: 8507845 | Origin: raw | Last Updated: March 17, 2025
g_foreign_citation | zip: 657.1 MiB | tsv: 2.5 GiB | Description: Citations made to foreign patents by granted U.S. patents. | # of Rows: 42604311 | Origin: raw | Last Updated: March 17, 2025
g_foreign_priority | zip: 64.9 MiB | tsv: 207.1 MiB | Description: Information about an earlier patent filing in a foreign country which gives the claim priority. | # of Rows: 4189787 | Origin: raw | Last Updated: March 17, 2025
g_gov_interest | zip: 5.7 MiB | tsv: 40.9 MiB | Description: Mapping of patent numbers to raw government interest text | # of Rows: 181001 | Origin: raw | Last Updated: March 17, 2025
g_gov_interest_contracts | zip: 1.7 MiB | tsv: 5.4 MiB | Description: Mapping of Federal contract award numbers to patent numbers | # of Rows: 223375 | Origin: processed | Last Updated: March 17, 2025
g_gov_interest_org | zip: 1.2 MiB | tsv: 20.9 MiB | Description: Federal agencies with government interests in patents | # of Rows: 226971 | Origin: raw | Last Updated: March 17, 2025
g_inventor_disambiguated | zip: 642.1 MiB | tsv: 2.0 GiB | Description: Disambiguated inventor data for granted patents. | # of Rows: 22884194 | Origin: disambig | Last Updated: March 17, 2025
g_inventor_not_disambiguated | zip: 939.5 MiB | tsv: 1.9 GiB | Description: Raw inventor data for granted patents. | # of Rows: 22884194 | Origin: raw | Last Updated: March 17, 2025
g_ipc_at_issue | zip: 354.8 MiB | tsv: 1.6 GiB | Description: International Patent Classification data for all patents (as of publication date). | # of Rows: 24131767 | Origin: raw | Last Updated: March 17, 2025
g_location_disambiguated | zip: 2.5 MiB | tsv: 8.8 MiB | Description: Disambiguated location data, including latitude and longitude for granted patents. | # of Rows: 96968 | Origin: disambig | Last Updated: March 17, 2025
g_location_not_disambiguated | zip: 1007.8 MiB | tsv: 3.0 GiB | Description: Raw location data, including latitude and longitude for granted patents. | # of Rows: 37423146 | Origin: raw | Last Updated: March 17, 2025
g_other_reference | zip: 3.8 GiB | tsv: 8.8 GiB | Description: Non-patent citations (e.g. articles, papers, etc.) mentioned in granted patents. | # of Rows: 61072261 | Origin: raw | Last Updated: March 17, 2025
g_patent | zip: 212.9 MiB | tsv: 1.0 GiB | Description: Data on granted patents. | # of Rows: 9075421 | Origin: raw | Last Updated: March 17, 2025
g_patent_abstract | zip: 1.5 GiB | tsv: 5.6 GiB | Description: Abstract data for granted patents. | # of Rows: 9075421 | Origin: raw | Last Updated: March 17, 202
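Because these files are distributed as tab-separated text, tables can be combined on a shared key once downloaded. The following is a minimal sketch, assuming that the patent and abstract tables share a `patent_id` column and that the other column names are `patent_title` and `patent_abstract`; these names and the sample rows are illustrative assumptions, not taken from the listing above.

```python
import csv
import io


def read_tsv(handle):
    """Return an iterator of dict rows from a tab-separated file handle."""
    return csv.DictReader(handle, delimiter="\t")


def join_on_patent_id(patents, abstracts):
    """Merge patent rows with abstract rows that share a patent_id (assumed key)."""
    by_id = {row["patent_id"]: row for row in abstracts}
    joined = []
    for row in patents:
        match = by_id.get(row["patent_id"])
        if match is not None:
            joined.append({**row, **match})
    return joined


# Tiny in-memory stand-ins for g_patent.tsv and g_patent_abstract.tsv (made-up rows).
patent_tsv = "patent_id\tpatent_title\n1000001\tWidget\n1000002\tGadget\n"
abstract_tsv = "patent_id\tpatent_abstract\n1000001\tA widget.\n"

rows = join_on_patent_id(
    read_tsv(io.StringIO(patent_tsv)),
    read_tsv(io.StringIO(abstract_tsv)),
)
print(rows)
```

For the multi-gigabyte files in the list, the abstract lookup would likely need to be built from a streamed read or replaced by a database join rather than held in memory.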
Patent data is aggregated across multiple Intellectual Property (IP) registries, including USPTO, CIPO, EUIPO and WIPO (USA, Canada, Europe). Our complete dataset of active patent records is updated weekly. Customized reports are available based on company lists, and the full dataset is available via raw feed or one-off reports. Full bibliographic data is provided for each IP record, including filing date, grant date, expiry date, inventor(s), IPC, full text abstract, title, etc. The data also includes ownership/entity relationship mapping, ticker mapping, ISIN mapping, Crunchbase uuid mapping, and Crunchbase domain mapping. We also provide our proprietary IP Activity Score for each owner, which can help compare recent innovation activity among owners, as reflected in their Intellectual Property filings.
Ipqwery's Patent data is also available as a combined dataset with our Trademark dataset, enabling full IP profiles for corporate entities.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.
The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents
For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/
“Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of PDFs in Google Cloud Storage from the first page of select US and EU patents, and BigQuery tables with extracted entities, labels, and other properties, including a link to each file in GCS. The structured data contains labels for eleven patent entities (patent inventor, publication date, classification number, patent title, etc.), global properties (US/EU issued, language, invention type), and the location of any figures or schematics on the patent's first page. The structured data is the result of a data entry operation collecting information from PDF documents, making the dataset a useful testing ground for benchmarking and developing AI/ML systems intended to perform broad document understanding tasks like extraction of structured data from unstructured documents. This dataset can be used to develop and benchmark natural language tasks such as named entity recognition and text classification, AI/ML vision tasks such as image classification and object detection, as well as more general AI/ML tasks such as automated data entry and document understanding. Google is sharing this dataset to support the AI/ML community because there is a shortage of document extraction/understanding datasets shared under an open license. This public dataset is hosted in Google Cloud Storage and Google BigQuery. It is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery, or see this Cloud Storage quick start guide to begin.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Google Patents Research Data contains the output of much of the data analysis work used in Google Patents (patents.google.com), including machine translations of titles and abstracts from Google Translate, embedding vectors, extracted top terms, similar documents, and forward references.
Using a Bayesian supervised learning approach, we identify individual inventors from the U.S. utility patent database, from 1975 to the present. An interface to calculate and illustrate patent co-authorship networks and social network measures is also provided. The network representation does not require bounding the social network beforehand. We provide descriptive statistics of individual and collaborative variables and illustrate examples of networks for an individual, an organization, a technology, and a region. The paper provides an overview of the technical algorithms and pointers to the data, code, and documentation, with the hope of further open development by the research community. Go here for the NBER pdpass file: https://sites.google.com/site/patentdataproject/Home/downloads. It is old and has not been updated.
https://www.datainsightsmarket.com/privacy-policy
The global commercial patent database market is experiencing robust growth, driven by the increasing need for intellectual property (IP) management and competitive intelligence among businesses. The market's expansion is fueled by several key factors, including rising R&D investments across various industries (pharmaceuticals, technology, etc.), a surge in patent filings worldwide, and the growing adoption of sophisticated analytical tools for patent data mining. This necessitates comprehensive and user-friendly databases that offer advanced search functionalities, allowing businesses to identify opportunities, track competitors, and protect their own innovations effectively. The market's segmentation reflects the diverse needs of users, encompassing solutions tailored to specific industries and IP management tasks. Leading players are continuously innovating, integrating AI and machine learning capabilities to enhance search precision and data analysis, creating more efficient and insightful platforms. The competitive landscape is characterized by a mix of established players and emerging technology companies, each striving for differentiation through superior user experience, data quality, and analytical features. We estimate the market size to be approximately $2.5 billion in 2025, growing at a compound annual growth rate (CAGR) of 12% between 2025 and 2033. This strong growth is projected to continue throughout the forecast period, primarily due to the ongoing digital transformation across sectors and the increasing reliance on data-driven decision-making. However, challenges remain, including the high cost of access to premium database features and the complex nature of patent data, requiring specialized expertise to interpret effectively. The market will see continued consolidation, with larger players acquiring smaller companies to expand their market reach and product offerings. 
Furthermore, the focus on user experience and the development of more intuitive interfaces will be critical to broaden the appeal of these databases to a wider range of users, from IP professionals to business strategists. Geographic expansion, particularly in emerging economies with growing R&D activities, will also be a key driver of market growth in the coming years.
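The growth figures quoted above can be checked with a short compound-growth calculation. This is a minimal sketch, assuming the 12% CAGR compounds once per year over the eight years from 2025 to 2033; the function name is illustrative and the result is simple arithmetic on the quoted estimate, not a figure from the report.

```python
def project_market_size(base_value, annual_growth_rate, years):
    """Compound a starting value at a fixed annual growth rate for a number of years."""
    return base_value * (1.0 + annual_growth_rate) ** years


# Quoted estimate: $2.5 billion in 2025, 12% CAGR through 2033.
size_2033 = project_market_size(2.5, 0.12, 2033 - 2025)
print(f"Implied 2033 market size: ${size_2033:.2f} billion")
```

Under these assumptions the implied 2033 market size is a little under $6.2 billion.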
Encouraging disclosure is important for the patent system, yet the technical information in patent applications is often inadequate. We use algorithms from computational linguistics to quantify the effectiveness of disclosure in patent applications. Relying on the expectation that universities have more ability and incentive to disclose their inventions than corporations, we analyze 64 linguistic features of patent applications, and show that university patents are more readable by 0.4 SD of a synthetic measure of readability. Results are robust to controlling for non-disclosure-related invention heterogeneity. Testing the usefulness of linguistic metrics with disclosure and readability evaluations by an engineering student "expert" panel and by examining USPTO 112(a) (lack of disclosure) rejections, we find modest support for our approach. The ability to quantify disclosure opens new research paths and potentially facilitates improvement of disclosure.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/5.0/customlicense?persistentId=doi:10.7910/DVN/FX3DIE
Published by the European Patent Office, PATSTAT Global provides information on patent applications and granted patents collected from national and regional patent offices worldwide. The dataset has been formatted to facilitate statistical analysis. It can be used to research when a patent application was filed, how the patent progressed through the process, if and when it was granted, who the inventors were, and the textual abstract of the patent itself. PATSTAT Global is extracted from the PATSTAT Online database. It is a snapshot of the source database at the time of extraction, which is the end of January for the spring edition and the end of July for the autumn edition. Files are in .csv format. More information is available on the PATSTAT website. DATA AVAILABLE FOR PERIOD: 1900-July 2024
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Research on international migration and innovation relies heavily on inventor and patent data, with “migrant inventors” attracting a great deal of attention, especially regarding their role in easing the international transfer of knowledge. This hides the fact that many of them move to their host country before starting their inventive career or even before completing their education. We discuss the conceptual and practical difficulties that stand in the way of investigating other likely channels of influence of inventors' migration on innovation, namely the easing of skill shortages and the increase of variety in inventive teams, firms, and locations.
USPTO
Description
In the United States, patent documents are released into the public domain as government works. Patents follow a highly standardized format with distinct required sections for background, detailed description, and claims. We include patents from the US Patent and Trademark Office (USPTO) as provided by the Google Patents Public Data dataset, which includes millions of granted patents and published patent applications dating back to 1782. We processed… See the full description on the dataset page: https://huggingface.co/datasets/common-pile/uspto_filtered.
https://data.gov.sg/open-data-licence
Dataset from Agency For Science, Technology and Research. For more information, visit https://data.gov.sg/datasets/d_455fe9261807a6a0255b2e2fbe188545/view
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Explore the detailed Indian Patent Dataset featuring patent applications from 2010, 2011, and 2019. Ideal for researchers, policymakers, businesses, and academics.
The OECD has created a search strategy for environment-related technologies (ENV-TECH) based on more than 200,000 different classification symbols, containing both International Patent Classification (IPC) symbols and Cooperative Patent Classification (CPC) symbols. The classifications cover a broad spectrum of technologies related to environmental pollution, water scarcity and climate change mitigation. The classifications found in the ENV-TECH search strategy have been grouped according to their relevance within 12 innovation systems. Not all the classifications found in the ENV-TECH search strategy have been used. For each innovation system, a list of the relevant CPC and IPC schemes is created. The raw patent data found in the OECD REGPAT database (which contains all patents filed with the EPO) is then filtered for each list. This yields the number of patent applications relevant within each innovation system. These are allocated fractionally to the inventors' countries according to inventor share. The patents are then sorted according to the priority year of filing. Only patents filed with the EPO are listed in the data. This covers inventions for which protection is sought within the jurisdiction of the EPO and also captures international patents filed under the Patent Cooperation Treaty (PCT), which must also be filed with the EPO. The method does not, however, list patents which are filed with either the United States Patent and Trademark Office (USPTO) or the Japan Patent Office (JPO) alone. Hence, inventions for which protection is sought in the ERA countries and all the PCT member states are covered, but not inventions for which protection is sought under the jurisdiction of either the USPTO or JPO alone. The patents are also listed according to the country of the inventor(s), though an invention may have been developed in a different country. Many patents are never used in any industrial application, and therefore do not contribute to innovation directly.
Patents are also not sought for many inventions, either because they cannot be patented or because the inventors attempt to protect the invention through other means. These inventions are not captured through patent statistics, which are therefore not a perfect indicator for innovation. Be aware that due to delayed data entries in the OECD patent database, the values for the last couple of years might be underestimated and could possibly increase over the next years. Keep this in mind when working with data from recent years.
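The fractional allocation by inventor share described above can be illustrated with a short sketch. It assumes each application is represented as a list with one country code per inventor, so that every inventor carries an equal share of the application; the input format and sample data are illustrative assumptions, not the OECD's actual implementation.

```python
from collections import defaultdict


def fractional_counts(applications):
    """Split each application equally among its inventors and sum the shares by country."""
    totals = defaultdict(float)
    for inventor_countries in applications:
        if not inventor_countries:
            continue  # skip applications with no inventor information
        share = 1.0 / len(inventor_countries)
        for country in inventor_countries:
            totals[country] += share
    return dict(totals)


# Three made-up applications: one country code per inventor.
apps = [["DE", "DE", "FR"], ["NO"], ["FR", "NO"]]
counts = fractional_counts(apps)
print(counts)
```

Because every application contributes exactly one unit in total, the country counts sum to the number of applications, which avoids double counting multi-country teams.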
Detailed Patent Litigation Data on 74,000 Cases from 1963-2015. From the US Patent and Trademark Office. Source: https://www.kaggle.com/uspto/patent-litigations
Context
Achieving the appropriate balance of intellectual property (IP) protection through patent litigation is critical to economic growth. Examining the interplay between US patent law and economic effect is of great interest to many stakeholders. Published in March 2017, this dataset is the most comprehensive public body of information on USPTO patent litigation.
Content
The dataset covers over 74k cases across 52 years. Five different files (attorneys.csv, cases.csv, documents.csv, names.csv, pacer_cases.csv) detail the litigating parties, their attorneys, results, locations, and dates. The large documents.csv file covers more than 5 million relevant documents (a tool like split might be your friend here).
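As an alternative to splitting the file on disk, a large CSV can be processed in fixed-size chunks so that only a small part is in memory at a time. The sketch below uses only the Python standard library; the column names `case_id` and `doc_id` and the sample rows are hypothetical stand-ins, not the actual documents.csv schema.

```python
import csv
import io
from itertools import islice


def iter_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size dict rows from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# In-memory stand-in for a large file; replace with open("documents.csv", newline="").
sample = io.StringIO("case_id,doc_id\nA,1\nA,2\nB,3\nB,4\nB,5\n")

docs_per_case = {}
for chunk in iter_chunks(sample, chunk_size=2):
    for row in chunk:
        docs_per_case[row["case_id"]] = docs_per_case.get(row["case_id"], 0) + 1

print(docs_per_case)
```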
Acknowledgements
This data was collected by the Office of the Chief Economist at the USPTO. Data was collected from both Public Access to Court Electronic Records (PACER) and RECAP, an independent PACER repository. Further documentation available via this paper.
Inspiration
Patent litigation is a tug of war between patent holders, competing parties using similar IP, and government policy. Which industries see the most litigation? Any notable changes over time? Is there a positive (or negative) correlation between litigation, and a company’s economic fortunes?
License
Public Domain Mark 1.0 Also see source.
Provides the bulk zip files that contain the concatenated full text of each patent application XML document (non-provisional utility and plant). This page provides an additional feature called 'View Patent Records', which allows users to find or discover the patent applications that are bundled in the zip file. However, the zip file may contain applications that are not available in ODP, and these will not be included in the search results. Also, if there are revisions to the applications, the file will only show the original application.
Are you looking for data that tell if the companies or persons you look into own any patents? If they do, do you want to know how many patents they own?
The Assignee Query Data provides a timely and comprehensive view of the global patent ownership of companies or individuals, covering 50 years of history.
How do we do that?
We include decades’ worth of global full-text databases, such as the US, China, EM/EUIPO, Japan, Korea, WIPO and so on, and keep them updated on a timely basis—as frequently as every day or week, depending on the sources.
Furthermore, the downloaded data are cleansed to minimize data errors and, in turn, search and analysis errors. For example, we standardize assignee names so that each patent corresponds to a single owner, and logic-based corrections ensure that values are corrected based on rules.
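To illustrate what assignee-name standardization can involve, here is a minimal rule-based sketch. It is not the provider's method: the suffix list and normalization rules are assumptions chosen for illustration, and real pipelines typically need far more rules and manual review.

```python
import re

# Hypothetical list of trailing corporate designators to ignore when matching names.
SUFFIXES = {"INC", "CORP", "CORPORATION", "LTD", "LLC", "CO", "COMPANY", "GMBH"}


def standardize_assignee(name):
    """Uppercase a name, replace punctuation with spaces, and drop trailing corporate suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", name.upper())
    tokens = cleaned.split()
    # Keep at least one token so a name made only of suffix words is not erased.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


variants = ["Acme Corp.", "ACME, Inc", "Acme Co., Ltd."]
print({variant: standardize_assignee(variant) for variant in variants})
```

With these rules the three spelling variants collapse to the same key, which is what lets individual patents be grouped under a single owner.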
In addition, we use advanced algorithms to analyze, select, and present the most current and accurate information from multiple available data sources. For instance, a single patent’s legal status is triangulated across different patent data for accuracy. Moreover, proprietary Quality and Value rankings subject patents in each key market to the same evaluation process, offering subjective predictions for the patent's likelihood of validity and monetization.
Offers display/download of Patent Trial and Appeal Board (PTAB) (formerly the Board of Patent Appeals and Interferences (BPAI)) Precedential Decisions (PDF sorted alphabetically or by topic); offers display/download of PTAB Informative Decisions (PDF sorted alphabetically or by topic); and also offers search, display, and download of Board Decisions, Proceedings, or Documents.