4 datasets found
  1. Data from: Stack Overflow

    • console.cloud.google.com
    Updated Mar 4, 2020
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Stack%20Exchange&inv=1&invt=Abz7JQ (2020). Stack Overflow [Dataset]. https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset is an archive of Stack Overflow content, including posts, votes, tags, and badges. The dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing: each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  2. stackexchange

    • huggingface.co
    Cite
    Albert Gong, stackexchange [Dataset]. https://huggingface.co/datasets/ag2435/stackexchange
    Authors
    Albert Gong
    Description

    StackExchange Dataset

    Working doc: https://docs.google.com/document/d/1h585bH5sYcQW4pkHzqWyQqA4ape2Bq6o1Cya0TkMOQc/edit?usp=sharing

    BigQuery query (see so_bigquery.ipynb):

      CREATE TEMP TABLE answers AS
      SELECT * FROM `bigquery-public-data.stackoverflow.posts_answers`
      WHERE LOWER(Body) LIKE '%arxiv%';

      CREATE TEMPORARY TABLE questions AS
      SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`;

      SELECT * FROM answers
      JOIN questions ON questions.id = answers.parent_id;

    NOTE:… See the full description on the dataset page: https://huggingface.co/datasets/ag2435/stackexchange.

  3. MIMIC-IV

    • physionet.org
    Updated Oct 11, 2024
    Cite
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark (2024). MIMIC-IV [Dataset]. http://doi.org/10.13026/kpb9-mt58
    Dataset updated
    Oct 11, 2024
    Authors
    Alistair Johnson; Lucas Bulgarelli; Tom Pollard; Brian Gow; Benjamin Moody; Steven Horng; Leo Anthony Celi; Roger Mark
    License

    https://github.com/MIT-LCP/license-and-dua/tree/master/drafts

    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. Here we present Medical Information Mart for Intensive Care (MIMIC)-IV, a large deidentified dataset of patients admitted to the emergency department or an intensive care unit at the Beth Israel Deaconess Medical Center in Boston, MA. MIMIC-IV contains data for over 65,000 patients admitted to an ICU and over 200,000 patients admitted to the emergency department. MIMIC-IV incorporates contemporary data and adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.

  4. Patent PDF Samples with Extracted Structured Data

    • console.cloud.google.com
    Updated Jul 20, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:Subsets%20of%20Patent%20Data&hl=de&inv=1&invt=Ab0Jiw (2023). Patent PDF Samples with Extracted Structured Data [Dataset]. https://console.cloud.google.com/marketplace/product/global-patents/labeled-patents?hl=de
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of PDFs in Google Cloud Storage from the first page of select US and EU patents, and BigQuery tables with extracted entities, labels, and other properties, including a link to each file in GCS. The structured data contains labels for eleven patent entities (patent inventor, publication date, classification number, patent title, etc.), global properties (US/EU issued, language, invention type), and the location of any figures or schematics on the patent's first page. The structured data is the result of a data entry operation collecting information from PDF documents, making the dataset a useful testing ground for benchmarking and developing AI/ML systems intended to perform broad document understanding tasks like extraction of structured data from unstructured documents. This dataset can be used to develop and benchmark natural language tasks such as named entity recognition and text classification, AI/ML vision tasks such as image classification and object detection, as well as more general AI/ML tasks such as automated data entry and document understanding. Google is sharing this dataset to support the AI/ML community because there is a shortage of document extraction/understanding datasets shared under an open license. This public dataset is hosted in Google Cloud Storage and Google BigQuery. It is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery or this Cloud Storage quick start guide to begin.

Data from: Stack Overflow

5 scholarly articles cite this dataset (View in Google Scholar)