7 datasets found
  1. Stack Overflow Data

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Stack Overflow (2019). Stack Overflow Data [Dataset]. https://www.kaggle.com/datasets/stackoverflow/stackoverflow
    Explore at:
    zip (0 bytes). Available download formats
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    Stack Overflow (http://stackoverflow.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Context

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers.

    Content

    Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    Dataset Source: https://archive.org/download/stackexchange

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:stackoverflow

    https://cloud.google.com/bigquery/public-data/stackoverflow

    Banner Photo by Caspar Rubin from Unsplash.

    Inspiration

    What is the percentage of questions that have been answered over the years?

    What are the reputation and badge counts of users across different tenures on Stack Overflow?

    What are 10 of the “easier” gold badges to earn?

    Which day of the week has the most questions answered within an hour?
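    The first question, for example, could be approached with a query along these lines. This is a rough sketch only: the `posts_questions` table path and its `answer_count` and `creation_date` columns are assumed from the standard BigQuery public dataset, not taken from this page.

```sql
-- Sketch: share of questions with at least one answer, per year.
-- Table and column names assumed from bigquery-public-data.stackoverflow.
SELECT
  EXTRACT(YEAR FROM creation_date) AS year,
  ROUND(100 * COUNTIF(answer_count > 0) / COUNT(*), 1) AS pct_answered
FROM `bigquery-public-data.stackoverflow.posts_questions`
GROUP BY year
ORDER BY year;
```

    The same GROUP BY pattern generalizes to the other questions above (e.g. grouping by badge name or by day of week).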

  2. Cyclistic. Data Analysis in SQL BigQuery

    • kaggle.com
    zip
    Updated Aug 27, 2023
    Cite
    Maryia Lvouskaya (2023). Cyclistic. Data Analysis in SQL BigQuery [Dataset]. https://www.kaggle.com/datasets/maryialvouskaya/cyclistic-data-analysis-in-sql-bigquery
    Explore at:
    zip (634034907 bytes). Available download formats
    Dataset updated
    Aug 27, 2023
    Authors
    Maryia Lvouskaya
    Description

    These are cleaned datasets from the fictional bike-sharing company 'Cyclistic,' built from original data published by the Divvy Bikes company. The datasets cover the period from 2020-01-01 to 2023-06-30. I obtained the original data from the following link: https://divvy-tripdata.s3.amazonaws.com/index.html. The files that I uploaded are the originals after a cleaning process in R. The data is properly licensed, well-organized, and dependable.

  3. Data from: Stack Overflow

    • console.cloud.google.com
    Updated Aug 13, 2024
    Cite
    Stack Exchange (2024). Stack Overflow [Dataset]. https://console.cloud.google.com/marketplace/details/stack-exchange/stack-overflow?hl=id
    Explore at:
    Dataset updated
    Aug 13, 2024
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Stack Overflow is the largest online community for programmers to learn, share their knowledge, and advance their careers. Updated on a quarterly basis, this BigQuery dataset includes an archive of Stack Overflow content, including posts, votes, tags, and badges. This dataset is updated to mirror the Stack Overflow content on the Internet Archive, and is also available through the Stack Exchange Data Explorer. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing, meaning each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
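    A minimal first query against this dataset might look like the sketch below; the table path is assumed from the standard `bigquery-public-data.stackoverflow` dataset rather than stated on this page.

```sql
-- Sketch: count all questions in the public Stack Overflow dataset.
SELECT COUNT(*) AS total_questions
FROM `bigquery-public-data.stackoverflow.posts_questions`;
```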

  4. Chicago Crime (2015 - 2020)

    • kaggle.com
    zip
    Updated Dec 19, 2021
    Cite
    Ronnie (2021). Chicago Crime (2015 - 2020) [Dataset]. https://www.kaggle.com/datasets/redlineracer/chicago-crime-2015-2020
    Explore at:
    zip (1275046 bytes). Available download formats
    Dataset updated
    Dec 19, 2021
    Authors
    Ronnie
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Chicago
    Description

    Context

    This dataset contains information on Chicago crime reported between 2015 and 2020.

    Content

    This dataset is a subset of the BigQuery public database on Chicago Crime.

    Acknowledgements

    I appreciate the efforts of BigQuery in hosting and allowing access to their public databases, and Kaggle for providing a space for the widespread sharing of data and knowledge.

    Inspiration

    This dataset is a useful learning tool for applying descriptive statistics, analytics, and visualisations. For example, one could look at crime trends over time, identify areas with the lowest amount of crime, calculate the probability that an arrest is made based on crime type or area, and determine the days of the week with the highest and lowest crime.
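    The arrest-probability question, for instance, might be sketched as follows against the full BigQuery public table. The `chicago_crime.crime` path and the `primary_type`, `arrest`, and `year` columns are assumptions based on that public dataset, not details from this page.

```sql
-- Sketch: arrest rate by crime type for 2015-2020.
SELECT
  primary_type,
  ROUND(100 * COUNTIF(arrest) / COUNT(*), 1) AS arrest_rate_pct
FROM `bigquery-public-data.chicago_crime.crime`
WHERE year BETWEEN 2015 AND 2020
GROUP BY primary_type
ORDER BY arrest_rate_pct DESC;
```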

  5. stackexchange

    • huggingface.co
    Cite
    Albert Gong, stackexchange [Dataset]. https://huggingface.co/datasets/ag2435/stackexchange
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant.
    Authors
    Albert Gong
    Description

    StackExchange Dataset

    Working doc: https://docs.google.com/document/d/1h585bH5sYcQW4pkHzqWyQqA4ape2Bq6o1Cya0TkMOQc/edit?usp=sharing

    BigQuery query (see so_bigquery.ipynb):

        CREATE TEMP TABLE answers AS
        SELECT *
        FROM `bigquery-public-data.stackoverflow.posts_answers`
        WHERE LOWER(body) LIKE '%arxiv%';

        CREATE TEMP TABLE questions AS
        SELECT *
        FROM `bigquery-public-data.stackoverflow.posts_questions`;

        SELECT *
        FROM answers
        JOIN questions ON questions.id = answers.parent_id;

    NOTE:… See the full description on the dataset page: https://huggingface.co/datasets/ag2435/stackexchange.

  6. Patent PDF Samples with Extracted Structured Data

    • console.cloud.google.com
    Updated Jul 20, 2023
    Cite
    Google (2023). Patent PDF Samples with Extracted Structured Data [Dataset]. https://console.cloud.google.com/marketplace/product/global-patents/labeled-patents?hl=de
    Explore at:
    Dataset updated
    Jul 20, 2023
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset consists of PDFs in Google Cloud Storage from the first page of select US and EU patents, and BigQuery tables with extracted entities, labels, and other properties, including a link to each file in GCS. The structured data contains labels for eleven patent entities (patent inventor, publication date, classification number, patent title, etc.), global properties (US/EU issued, language, invention type), and the location of any figures or schematics on the patent's first page. The structured data is the result of a data entry operation collecting information from PDF documents, making the dataset a useful testing ground for benchmarking and developing AI/ML systems intended to perform broad document understanding tasks like extraction of structured data from unstructured documents.

    This dataset can be used to develop and benchmark natural language tasks such as named entity recognition and text classification, AI/ML vision tasks such as image classification and object detection, as well as more general AI/ML tasks such as automated data entry and document understanding. Google is sharing this dataset to support the AI/ML community because there is a shortage of document extraction/understanding datasets shared under an open license.

    This public dataset is hosted in Google Cloud Storage and Google BigQuery. It is included in BigQuery's 1TB/mo of free tier processing, meaning each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  7. Cyclistic Bike Share: A Case Study

    • kaggle.com
    zip
    Updated Jul 25, 2023
    Cite
    Casey Kellerhals (2023). Cyclistic Bike Share: A Case Study [Dataset]. https://www.kaggle.com/datasets/caskelle/cyclistic-bike-share-a-case-study/code
    Explore at:
    zip (269575250 bytes). Available download formats
    Dataset updated
    Jul 25, 2023
    Authors
    Casey Kellerhals
    Description

    The Mission Statement

    Cyclistic, a bike-sharing company, wants to analyze its user data to find the main differences in behavior between its two types of users: Casual Riders, who pay for each ride, and Annual Members, who pay a yearly subscription to the service.

    PHASE 1 : ASK

    Key objectives: 1. Identify The Business Task: Cyclistic wants to analyze the data to find the key differences between Casual Riders and Annual Members. The goal of this project is to reach out to casual riders and incentivize them to pay for the annual subscription.

    2. Consider Key Stakeholders:
      • The key stakeholders in this project are the executive team and the director of marketing, Lily Moreno.

    PHASE 2 : Prepare

    Key objectives: 1. Download Data And Store It Appropriately: I downloaded the data as .csv files, which were saved in their own folder to keep everything organized. I then uploaded those files into BigQuery for cleaning and analysis. For this project I downloaded all of 2022 and up to May of 2023, as this is the most recent data that I have access to.

    2. Identify How It's Organized

      • The data is organized into months, from 01-2022 to 05-2023.
    3. Sort and Filter The Data and Determine The Credibility of The Data

      • For this data I used BigQuery and SQL to sort, filter, and assess the credibility of the data. The data is collected first-hand by Cyclistic, and there is a lot of information to work with. I filtered the data down to what I wanted to work with: the types of bikes, the types of members, and the dates the bikes were used.

    PHASE 3 : Process

    Key objectives: 1. Clean The Data and Prepare The Data For Analysis: I used some simple SQL code to determine that no members were missing, that no information was repeated, and that there were no misspellings in the data.

    -- No misspellings in either 'member' or 'casual'. This ensures that no
    -- results will have missing information.
    SELECT DISTINCT member_casual
    FROM table;

    -- How many casual riders and members used the service; the totals should
    -- add up to the number of rows in the dataset.
    SELECT member_casual AS member_type, COUNT(*) AS total_riders
    FROM table
    GROUP BY member_type;

    -- Shows that every ride has a distinct ID.
    SELECT DISTINCT ride_id
    FROM table;

    -- Shows that there are no typos in the types of bikes, so no data will be
    -- missing from results.
    SELECT DISTINCT rideable_type
    FROM table;

    PHASE 4 : Analyze

    Key objectives: 1. Aggregate Your Data So It's Useful and Accessible: I had to write some SQL code to combine all the data from the different files I had uploaded to BigQuery.

    SELECT rideable_type, started_at, ended_at, member_casual FROM table_1
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_2
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_3
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_4
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_5
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_6
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_7
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_8
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_9
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_10
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_11
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_12
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_13
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_14
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_15
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_16
    UNION ALL SELECT rideable_type, started_at, ended_at, member_casual FROM table_17
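    If the monthly tables shared a common name prefix, BigQuery's wildcard tables could replace the long chain of UNION ALLs. The sketch below is a hedged alternative, not the author's method; the project, dataset, and `trips_` prefix are hypothetical names.

```sql
-- Sketch: query all monthly tables at once via a BigQuery wildcard table.
SELECT rideable_type, started_at, ended_at, member_casual
FROM `my_project.cyclistic.trips_*`;
```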

    2. Identify Trends and Relationships: After I had aggregated all of the data I had chosen, I ran SQL code to determine the trends and relationships contained within the data. I then uploaded that data into Google Sheets to make graphs expressing those trends, making it easier to identify the key differences between Casual Riders and Annual Members.

    -- This shows how many casual and annual members used bikes.
    SELECT member_casual AS member_type, COUNT(*) AS total_riders
    FROM aggregate_data_table
    GROUP BY member_type;

