Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fork of b-mc2/sql-create-context
Overview
This dataset builds from WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL Query answering the question using the CREATE statement as context. This dataset was built with text-to-sql LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-sql datasets. The CREATE TABLE statement can often be copy and pasted from… See the full description on the dataset page: https://huggingface.co/datasets/philschmid/sql-create-context-copy.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Patent Examination Data System gives users access to multiple records of USPTO patent application or patent filing status at no cost. PEDS is updated daily and mirrors the data available in the Patent Application Location and Monitoring system (PALM). PEDS provides access to public applications including: published patent applications and patents. PCT applications that have not been published by WIPO. Any applications that have not been released by the USPTO will not be available in PEDS.
USPTO Patent Examiner Data System (PEDS) API Data contains data from the examination process of USPTO patent applications. PEDS contains the bibliographic, published document and patent term extension data tabs in Public PAIR from 1981 to present. There is also some data dating back to 1935.
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
"Patent Examination Data System" by the USPTO, for public use.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_peds
Banner photo by Thought Catalog on Unsplash
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset is a fork from sql-create-context This dataset builds from WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL Query answering the question using the CREATE statement as context. This dataset was built with text-to-sql LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-sql datasets. The CREATE TABLE statement can often be copy and pasted from… See the full description on the dataset page: https://huggingface.co/datasets/detakarang/sql-create-context-id.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.
The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents
For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/
“Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.
Banner photo by Helloquence on Unsplash
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This curated dataset consists of 269,353 patent documents (published patent applications and granted patents) spanning the 1976 to 2016 period and is intended to help identify promising R&D on the horizon in diagnostics, therapeutics, data analytics, and model biological systems.
USPTO Cancer Moonshot Patent Data was generated using USPTO examiner tools to execute a series of queries designed to identify cancer-specific patents and patent applications. This includes drugs, diagnostics, cell lines, mouse models, radiation-based devices, surgical devices, image analytics, data analytics, and genomic-based inventions.
“USPTO Cancer Moonshot Patent Data” by the USPTO, for public use. Frumkin, Jesse and Myers, Amanda F., Cancer Moonshot Patent Data (August, 2016).
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_cancer
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Office Action Research Dataset for Patents contains detailed information derived from the Office actions issued by patent examiners to applicants during the patent examination process. The “Office action” is a written notification to the applicant of the examiner’s decision on patentability and generally discloses the grounds for a rejection, the claims affected, and the pertinent prior art.
This initial release consists of three files derived from 4.4 million Office actions mailed during the 2008 to mid-2017 period from USPTO examiners to the applicants of 2.2 million unique patent applications.
A working paper describing this dataset is available and can be cited as Lu, Qiang and Myers, Amanda F. and Beliveau, Scott, USPTO Patent Prosecution Research Data: Unlocking Office Action Traits (November 20, 2017). USPTO Economic Working Paper No. 2017-10. Available at SSRN: https://ssrn.com/abstract=3024621 (link is external).
This effort is made possible by the USPTO Digital Services & Big Data portfolio and collaboration with the USPTO Office of the Chief Economist (OCE). The OCE provides these data files for public use and encourages users to identify fixes and improvements. Please provide all feedback to: EconomicsData@uspto.gov.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_office_actions
Banner photo by Trent Erwin on Unsplash
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The Patent Claims Research Dataset contain detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication Full-Text and Patent Grant Full Text files, available at https://bulkdata.uspto.gov/, to which the Office of Chief Economist (OCE) applied a Python algorithm to identify individual claims as well as the dependency relationship between claims. From the parsed claims text, OCE created six data files containing individually-parsed claims, claim-level statistics, and document-level statistics, including newly-developed measures of patent scope.
USPTO OCE Patent Claims Research data contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014.
"USPTO OCE Patent Claims Research Data" by the USPTO, for public use. Marco, Alan C. and Sarnoff, Joshua D. and deGrazia, Charles, "Patent Claims and Patent Scope" (October 2016). USPTO Economic Working Paper 2016-04.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_claims
Banner photo by William Iven on Unsplash
Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
World Development Indicators (WDI) by World Bank includes data spanning up to 56 years—from 1960 to 2016. WDI frames global trends with indicators on population, population density, urbanization, GNI, and GDP. These indicators measure the world’s economy and progress toward improving lives, achieving sustainable development, providing support for vulnerable populations, and reducing gender disparities.
World Development Indicators Data is the primary World Bank collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates.
“World Development Indicators” by the World Bank, used under CC BY 3.0 IGO.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:worldbank_wdi
Banner photo by Joshua Rawson-Harris on Unsplash
The CPC is the result of a partnership between the EPO and the USPTO in their joint effort to develop a common, internationally compatible classification system for technical documents, in particular patent publications, which will be used by both offices in the patent granting process.
Cooperative Patent Classification Data contains the scheme and definitions of the Cooperative Patent Classification system for classifying patent documents.
“Cooperative Patent Classification” by the EPO and USPTO, for public use. Modifications have been made to parse the XML description sections to extract references to other classification symbols.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:cpc
Banner photo by Helloquence on Unsplash
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
OCE collected all of the data from the Public Access to Court Electronics Records (PACER) and RECAP, an independent project designed to serve as a repository for litigation data sourced from PACER. The final output datasets include information on the litigating parties involved and their attorneys; the cause of action; the court location; important dates in the litigation history; and descriptions of all documents submitted in a given case, which cover more than 5 million separate documents contained in the case docket reports.
USPTO OCE Patent Litigation Docket Reports Data contains detailed patent litigation data on 74,623 unique district court cases filed during the period 1963-2015.
"USPTO OCE Patent Litigation Docket Reports Data" by the USPTO, for public use. Marco, A., A. Tesfayesus, A. Toole (2017). “Patent Litigation Data from US District Court Electronic Records (1963-2015).” USPTO Economic Working Paper No. 2017-06.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_litigation
Banner photo by Samuel Zeller on Unsplash
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This database is the result of a combined effort of several European and US researchers to collect, clean and harmonise disclosed SEPs data at thirteen major standard setting organisations (including ETSI, ITU, IEEE, ISO, and more).
Disclosed Standard Essential Patents (dSEP) Data provides a full overview of disclosed intellectual property rights at setting organizations worldwide.
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:dsep
“Disclosed Standard Essential Patents Database” by Bekkers, R., Catalini, C., Martinelli, A., & Simcoe, T. (2012). Intellectual Property Disclosure in Standards Development. Proceedings from NBER conference on Standards, Patents & Innovation, Tucson (AZ), January 20 and 21, 2012.
Banner photo by Helloquence on Unsplash
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
ChEMBL is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK.
ChEMBL is a manually curated database of bioactive molecules with drug-like properties used in drug discovery, including information about existing patented drugs.
Schema: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/chembl_23_schema.png
Documentation: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/schema_documentation.html
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
“ChEMBL” by the European Bioinformatics Institute (EMBL-EBI), used under CC BY-SA 3.0. Modifications have been made to add normalized publication numbers.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:ebi_chembl
The USPTO grants US patents to inventors and assignees all over the world. For researchers in particular, PatentsView is intended to encourage the study and understanding of the intellectual property (IP) and innovation system; to serve as a fundamental function of the government in creating “public good” platforms in these data; and to eliminate redundant cleaning, converting and matching of these data by individual researchers, thus freeing up researcher time to do what they do best—study IP, innovation, and technological change.
PatentsView Data is a database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The dataset uses data derived from USPTO bulk data files.
Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.
“PatentsView” by the USPTO, US Department of Agriculture (USDA), the Center for the Science of Science and Innovation Policy, New York University, the University of California at Berkeley, Twin Arch Technologies, and Periscopic, used under CC BY 4.0.
Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patentsview
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fork of b-mc2/sql-create-context
Overview
This dataset builds from WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL Query answering the question using the CREATE statement as context. This dataset was built with text-to-sql LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-sql datasets. The CREATE TABLE statement can often be copy and pasted from… See the full description on the dataset page: https://huggingface.co/datasets/philschmid/sql-create-context-copy.