14 datasets found
  1. h

    sql-create-context-copy

    • huggingface.co
    Updated Apr 21, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Philipp Schmid (2023). sql-create-context-copy [Dataset]. https://huggingface.co/datasets/philschmid/sql-create-context-copy
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 21, 2023
    Authors
    Philipp Schmid
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Fork of b-mc2/sql-create-context

      Overview
    

    This dataset builds from WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL Query answering the question using the CREATE statement as context. This dataset was built with text-to-sql LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-sql datasets. The CREATE TABLE statement can often be copy and pasted from… See the full description on the dataset page: https://huggingface.co/datasets/philschmid/sql-create-context-copy.

  2. USPTO Patent Examiner Data System (PEDS) Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). USPTO Patent Examiner Data System (PEDS) Data [Dataset]. https://www.kaggle.com/bigquery/uspto-peds
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Patent Examination Data System gives users access to multiple records of USPTO patent application or patent filing status at no cost. PEDS is updated daily and mirrors the data available in the Patent Application Location and Monitoring system (PALM). PEDS provides access to public applications including: published patent applications and patents. PCT applications that have not been published by WIPO. Any applications that have not been released by the USPTO will not be available in PEDS.

    Content

    USPTO Patent Examiner Data System (PEDS) API Data contains data from the examination process of USPTO patent applications. PEDS contains the bibliographic, published document and patent term extension data tabs in Public PAIR from 1981 to present. There is also some data dating back to 1935.

    Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

    Acknowledgements

    "Patent Examination Data System" by the USPTO, for public use.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_peds

    Banner photo by Thought Catalog on Unsplash

  3. h

    sql-create-context-id

    • huggingface.co
    Updated Nov 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gede Putra Nugraha (2023). sql-create-context-id [Dataset]. https://huggingface.co/datasets/detakarang/sql-create-context-id
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 18, 2023
    Authors
    Gede Putra Nugraha
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset is a fork from sql-create-context This dataset builds from WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL Query answering the question using the CREATE statement as context. This dataset was built with text-to-sql LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-sql datasets. The CREATE TABLE statement can often be copy and pasted from… See the full description on the dataset page: https://huggingface.co/datasets/detakarang/sql-create-context-id.

  4. Google Patents Public Data

    • kaggle.com
    zip
    Updated Sep 19, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2018). Google Patents Public Data [Dataset]. https://www.kaggle.com/datasets/bigquery/patents
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Sep 19, 2018
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    Google Patents Public Data, provided by IFI CLAIMS Patent Services, is a worldwide bibliographic and US full-text dataset of patent publications. Patent information accessibility is critical for examining new patents, informing public policy decisions, managing corporate investment in intellectual property, and promoting future scientific innovation. The growing number of available patent data sources means researchers often spend more time downloading, parsing, loading, syncing and managing local databases than conducting analysis. With these new datasets, researchers and companies can access the data they need from multiple sources in one place, thus spending more time on analysis than data preparation.

    Content

    The Google Patents Public Data dataset contains a collection of publicly accessible, connected database tables for empirical analysis of the international patent system.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patents

    For more info, see the documentation at https://developers.google.com/web/tools/chrome-user-experience-report/

    “Google Patents Public Data” by IFI CLAIMS Patent Services and Google is licensed under a Creative Commons Attribution 4.0 International License.

    Banner photo by Helloquence on Unsplash

  5. USPTO Cancer Moonshot Patent Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). USPTO Cancer Moonshot Patent Data [Dataset]. https://www.kaggle.com/datasets/bigquery/uspto-oce-cancer
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    This curated dataset consists of 269,353 patent documents (published patent applications and granted patents) spanning the 1976 to 2016 period and is intended to help identify promising R&D on the horizon in diagnostics, therapeutics, data analytics, and model biological systems.

    Content

    USPTO Cancer Moonshot Patent Data was generated using USPTO examiner tools to execute a series of queries designed to identify cancer-specific patents and patent applications. This includes drugs, diagnostics, cell lines, mouse models, radiation-based devices, surgical devices, image analytics, data analytics, and genomic-based inventions.

    Acknowledgements

    “USPTO Cancer Moonshot Patent Data” by the USPTO, for public use. Frumkin, Jesse and Myers, Amanda F., Cancer Moonshot Patent Data (August, 2016).

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_cancer

    Banner photo by Jaron Nix on Unsplash

  6. The Office Action Research Dataset for Patents

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). The Office Action Research Dataset for Patents [Dataset]. https://www.kaggle.com/datasets/bigquery/uspto-oce-office-actions
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    The Office Action Research Dataset for Patents contains detailed information derived from the Office actions issued by patent examiners to applicants during the patent examination process. The “Office action” is a written notification to the applicant of the examiner’s decision on patentability and generally discloses the grounds for a rejection, the claims affected, and the pertinent prior art.

    Content

    This initial release consists of three files derived from 4.4 million Office actions mailed during the 2008 to mid-2017 period from USPTO examiners to the applicants of 2.2 million unique patent applications.

    Acknowledgements

    A working paper describing this dataset is available and can be cited as Lu, Qiang and Myers, Amanda F. and Beliveau, Scott, USPTO Patent Prosecution Research Data: Unlocking Office Action Traits (November 20, 2017). USPTO Economic Working Paper No. 2017-10. Available at SSRN: https://ssrn.com/abstract=3024621 (link is external).

    This effort is made possible by the USPTO Digital Services & Big Data portfolio and collaboration with the USPTO Office of the Chief Economist (OCE). The OCE provides these data files for public use and encourages users to identify fixes and improvements. Please provide all feedback to: EconomicsData@uspto.gov.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_office_actions

    Banner photo by Trent Erwin on Unsplash

  7. USPTO OCE Patent Assignment Dataset

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). USPTO OCE Patent Assignment Dataset [Dataset]. https://www.kaggle.com/datasets/bigquery/uspto-oce-assignment
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    The Office of the Chief Economist (OCE) is responsible for advising the Under Secretary of Commerce for Intellectual Property and Director of the USPTO on the economic implications of policies and programs affecting the U.S. intellectual property (IP) system. The office disseminates detailed patent and trademark data, undertakes research, and conducts economic analysis on a variety of IP issues. OCE works with policy makers, collaborates with academics, and engages the public more generally through conferences it organizes, the publicly accessible research datasets it provides, and its publications.

    Content

    The USPTO OCE Patent Assignment Dataset contains detailed data patent assignments and other transactions recorded at the USPTO since 1970.

    Acknowledgements

    "USPTO OCE Patent Assignment Data" by the USPTO, for public use. Marco, Alan C., Graham, Stuart J.H., Myers, Amanda F., D'Agostino, Paul A and Apple, Kirsten, "The USPTO Patent Assignment Dataset: Descriptions and Analysis" (July 27, 2015).

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_assignment

    Banner photo by Jeff Sheldon on Unsplash

  8. USPTO OCE Patent Claims Research Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). USPTO OCE Patent Claims Research Data [Dataset]. https://www.kaggle.com/datasets/bigquery/uspto-oce-claims
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    The Patent Claims Research Dataset contain detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication Full-Text and Patent Grant Full Text files, available at https://bulkdata.uspto.gov/, to which the Office of Chief Economist (OCE) applied a Python algorithm to identify individual claims as well as the dependency relationship between claims. From the parsed claims text, OCE created six data files containing individually-parsed claims, claim-level statistics, and document-level statistics, including newly-developed measures of patent scope.

    Content

    USPTO OCE Patent Claims Research data contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014.

    Acknowledgements

    "USPTO OCE Patent Claims Research Data" by the USPTO, for public use. Marco, Alan C. and Sarnoff, Joshua D. and deGrazia, Charles, "Patent Claims and Patent Scope" (October 2016). USPTO Economic Working Paper 2016-04.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_claims

    Banner photo by William Iven on Unsplash

  9. World Development Indicators (WDI) Data

    • kaggle.com
    zip
    Updated Aug 27, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2018). World Development Indicators (WDI) Data [Dataset]. https://www.kaggle.com/datasets/bigquery/worldbank-wdi
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Aug 27, 2018
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    World Development Indicators (WDI) by World Bank includes data spanning up to 56 years—from 1960 to 2016. WDI frames global trends with indicators on population, population density, urbanization, GNI, and GDP. These indicators measure the world’s economy and progress toward improving lives, achieving sustainable development, providing support for vulnerable populations, and reducing gender disparities.

    Content

    World Development Indicators Data is the primary World Bank collection of development indicators, compiled from officially-recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates.

    Acknowledgements

    “World Development Indicators” by the World Bank, used under CC BY 3.0 IGO.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:worldbank_wdi

    Banner photo by Joshua Rawson-Harris on Unsplash

  10. Cooperative Patent Classification (CPC) Data

    • kaggle.com
    zip
    Updated Mar 7, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). Cooperative Patent Classification (CPC) Data [Dataset]. https://www.kaggle.com/bigquery/cpc
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Mar 7, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    The CPC is the result of a partnership between the EPO and the USPTO in their joint effort to develop a common, internationally compatible classification system for technical documents, in particular patent publications, which will be used by both offices in the patent granting process.

    Content

    Cooperative Patent Classification Data contains the scheme and definitions of the Cooperative Patent Classification system for classifying patent documents.

    Acknowledgments

    “Cooperative Patent Classification” by the EPO and USPTO, for public use. Modifications have been made to parse the XML description sections to extract references to other classification symbols.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:cpc

    Banner photo by Helloquence on Unsplash

  11. USPTO OCE Patent Litigation Docket Reports Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). USPTO OCE Patent Litigation Docket Reports Data [Dataset]. https://www.kaggle.com/bigquery/uspto-oce-litigation
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Fork this notebook to get started on accessing data in the BigQuery dataset by writing SQL queries using the BQhelper module.

    Context

    OCE collected all of the data from the Public Access to Court Electronics Records (PACER) and RECAP, an independent project designed to serve as a repository for litigation data sourced from PACER. The final output datasets include information on the litigating parties involved and their attorneys; the cause of action; the court location; important dates in the litigation history; and descriptions of all documents submitted in a given case, which cover more than 5 million separate documents contained in the case docket reports.

    Content

    USPTO OCE Patent Litigation Docket Reports Data contains detailed patent litigation data on 74,623 unique district court cases filed during the period 1963-2015.

    Acknowledgements

    "USPTO OCE Patent Litigation Docket Reports Data" by the USPTO, for public use. Marco, A., A. Tesfayesus, A. Toole (2017). “Patent Litigation Data from US District Court Electronic Records (1963-2015).” USPTO Economic Working Paper No. 2017-06.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:uspto_oce_litigation

    Banner photo by Samuel Zeller on Unsplash

  12. Disclosed Standard Essential Patents (dSEP) Data

    • kaggle.com
    zip
    Updated Aug 27, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2018). Disclosed Standard Essential Patents (dSEP) Data [Dataset]. https://www.kaggle.com/datasets/bigquery/dsep
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Aug 27, 2018
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    This database is the result of a combined effort of several European and US researchers to collect, clean and harmonise disclosed SEPs data at thirteen major standard setting organisations (including ETSI, ITU, IEEE, ISO, and more).

    Content

    Disclosed Standard Essential Patents (dSEP) Data provides a full overview of disclosed intellectual property rights at setting organizations worldwide.

    Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

    Acknowledgements

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:dsep

    “Disclosed Standard Essential Patents Database” by Bekkers, R., Catalini, C., Martinelli, A., & Simcoe, T. (2012). Intellectual Property Disclosure in Standards Development. Proceedings from NBER conference on Standards, Patents & Innovation, Tucson (AZ), January 20 and 21, 2012.

    Banner photo by Helloquence on Unsplash

  13. ChEMBL EBI Small Molecules Database

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). ChEMBL EBI Small Molecules Database [Dataset]. https://www.kaggle.com/bigquery/ebi-chembl
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    ChEMBL is maintained by the European Bioinformatics Institute (EBI), of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK.

    Content

    ChEMBL is a manually curated database of bioactive molecules with drug-like properties used in drug discovery, including information about existing patented drugs.

    Schema: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/chembl_23_schema.png

    Documentation: http://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_23/schema_documentation.html

    Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

    Acknowledgements

    “ChEMBL” by the European Bioinformatics Institute (EMBL-EBI), used under CC BY-SA 3.0. Modifications have been made to add normalized publication numbers.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:ebi_chembl

    Banner photo by rawpixel on Unsplash

  14. PatentsView Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Google BigQuery (2019). PatentsView Data [Dataset]. https://www.kaggle.com/bigquery/patentsview
    Explore at:
    zip(0 bytes)Available download formats
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Googlehttp://google.com/
    BigQueryhttps://cloud.google.com/bigquery
    Authors
    Google BigQuery
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    The USPTO grants US patents to inventors and assignees all over the world. For researchers in particular, PatentsView is intended to encourage the study and understanding of the intellectual property (IP) and innovation system; to serve as a fundamental function of the government in creating “public good” platforms in these data; and to eliminate redundant cleaning, converting and matching of these data by individual researchers, thus freeing up researcher time to do what they do best—study IP, innovation, and technological change.

    Content

    PatentsView Data is a database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The dataset uses data derived from USPTO bulk data files.

    Fork this notebook to get started on accessing data in the BigQuery dataset using the BQhelper package to write SQL queries.

    Acknowledgements

    “PatentsView” by the USPTO, US Department of Agriculture (USDA), the Center for the Science of Science and Innovation Policy, New York University, the University of California at Berkeley, Twin Arch Technologies, and Periscopic, used under CC BY 4.0.

    Data Origin: https://bigquery.cloud.google.com/dataset/patents-public-data:patentsview

    Banner photo by rawpixel on Unsplash

  15. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Philipp Schmid (2023). sql-create-context-copy [Dataset]. https://huggingface.co/datasets/philschmid/sql-create-context-copy

sql-create-context-copy

sql-create-context

philschmid/sql-create-context-copy

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 21, 2023
Authors
Philipp Schmid
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Fork of b-mc2/sql-create-context

  Overview

This dataset builds from WikiSQL and Spider. There are 78,577 examples of natural language queries, SQL CREATE TABLE statements, and SQL Query answering the question using the CREATE statement as context. This dataset was built with text-to-sql LLMs in mind, intending to prevent hallucination of column and table names often seen when trained on text-to-sql datasets. The CREATE TABLE statement can often be copy and pasted from… See the full description on the dataset page: https://huggingface.co/datasets/philschmid/sql-create-context-copy.

Search
Clear search
Close search
Google apps
Main menu