18 datasets found
  1. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (151045619431 bytes)
    Dataset updated
    Jul 31, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle, which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is that Meta Kaggle enriches Meta Kaggle Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
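
    The join described above can be sketched with pandas. This is a minimal sketch, assuming local copies of both datasets at the hypothetical paths meta-kaggle-code/ and meta-kaggle/, and that KernelVersions.csv exposes the version id in an Id column; verify the column name against your copy.

    import pandas as pd
    from pathlib import Path

    # Hypothetical local paths to the two datasets.
    CODE_ROOT = Path("meta-kaggle-code")
    kernel_versions = pd.read_csv("meta-kaggle/KernelVersions.csv")  # assumed "Id" column

    # Each code file is named after a KernelVersions id (e.g. 123456789.py or .ipynb).
    files = [p for p in CODE_ROOT.rglob("*") if p.is_file() and p.stem.isdigit()]
    code_files = pd.DataFrame({"Id": [int(p.stem) for p in files],
                               "Path": [str(p) for p in files]})

    # Join file paths back to kernel version metadata (author, competition, votes, ...).
    joined = code_files.merge(kernel_versions, on="Id", how="inner")
    print(joined.head())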

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
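
    Assuming no zero-padding in the folder names (check this against the downloaded files), the id-to-folder mapping described above can be sketched as:

    def folder_for(version_id: int) -> str:
        # Top level folder groups up to 1 million files; sub folder groups up to 1 thousand.
        top = version_id // 1_000_000
        sub = (version_id // 1_000) % 1_000
        return f"{top}/{sub}"

    print(folder_for(123_456_789))  # -> 123/456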

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
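
    A minimal sketch of a requester-pays download with the google-cloud-storage Python client follows; the billing project id and the object path inside the bucket are placeholders.

    from google.cloud import storage

    BILLING_PROJECT = "your-gcp-project-id"  # placeholder: the project billed for the download
    client = storage.Client(project=BILLING_PROJECT)

    # user_project marks the request as "requester pays" against your own project.
    bucket = client.bucket("kaggle-meta-kaggle-code-downloads", user_project=BILLING_PROJECT)
    blob = bucket.blob("123/456/123456789.ipynb")  # hypothetical object path
    blob.download_to_filename("123456789.ipynb")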

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  2. Google Landmarks Dataset v2

    • github.com
    • opendatalab.com
    Updated Sep 27, 2019
    Cite
    Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
    Explore at:
    Dataset updated
    Sep 27, 2019
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into three sets: train, index, and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks, and technical solutions from top teams.

  3. Google Trends

    • console.cloud.google.com
    Updated Oct 12, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=JA&inv=1&invt=Ab3yrg (2023). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=JA
    Explore at:
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    Google Search (http://google.com/)
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Description

    The Google Trends dataset provides critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It is made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days, and is accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
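
    As a hedged sketch, the daily top terms can be queried with the BigQuery Python client. The table path bigquery-public-data.google_trends.top_terms and the column names (term, rank, refresh_date) are assumptions; verify them against the dataset's schema in the BigQuery console.

    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default GCP credentials and project
    query = """
        SELECT term, rank, refresh_date
        FROM `bigquery-public-data.google_trends.top_terms`
        WHERE refresh_date = (SELECT MAX(refresh_date)
                              FROM `bigquery-public-data.google_trends.top_terms`)
        GROUP BY term, rank, refresh_date
        ORDER BY rank
    """
    for row in client.query(query).result():
        print(row.rank, row.term)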

  4. Google Trends - International

    • console.cloud.google.com
    Updated May 6, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=it&inv=1&invt=Ab4Ugg (2023). Google Trends - International [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-trends-intl?hl=it
    Explore at:
    Dataset updated
    May 6, 2023
    Dataset provided by
    Google Search (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    The International Google Trends dataset provides critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It is made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days, and is accompanied by a rolling five-year window of historical data for each country and region across the globe where data is available. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  5. Data from: Google Earth Engine (GEE)

    • catalog-usgs.opendata.arcgis.com
    • data.amerigeoss.org
    • +4more
    Updated Nov 28, 2018
    + more versions
    Cite
    AmeriGEOSS (2018). Google Earth Engine (GEE) [Dataset]. https://catalog-usgs.opendata.arcgis.com/datasets/amerigeoss::google-earth-engine-gee
    Explore at:
    Dataset updated
    Nov 28, 2018
    Dataset authored and provided by
    AmeriGEOSS
    Description

    Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface. Its interactive timelapse viewer lets you travel back in time and see how the world has changed over the past twenty-nine years; Timelapse is one example of how Earth Engine can help gain insight into petabyte-scale datasets. The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily, with over twenty petabytes of geospatial data instantly available for analysis. The Earth Engine API is available in Python and JavaScript, making it easy to harness the power of Google’s cloud for your own geospatial analysis, and a web-based code editor supports fast, interactive algorithm development with instant access to petabytes of data. Scientists and non-profits use Earth Engine for remote sensing research, predicting disease outbreaks, natural resource management, and more. "Google Earth Engine has made it possible for the first time in history to rapidly and accurately process vast amounts of satellite imagery, identifying where and when tree cover change has occurred at high resolution. Global Forest Watch would not exist without it. For those who care about the future of the planet Google Earth Engine is a great blessing!" - Dr. Andrew Steer, President and CEO of the World Resources Institute.
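
    For reference, a small sketch of the Python API mentioned above follows. It assumes you have signed up for Earth Engine and authenticated; the asset id 'USGS/SRTMGL1_003' (SRTM 30 m elevation) is used purely as an example.

    import ee

    ee.Authenticate()  # one-time browser sign-in; newer versions may also require a Cloud project
    ee.Initialize()

    # Sample the SRTM 30 m elevation image at an example point.
    dem = ee.Image('USGS/SRTMGL1_003')
    point = ee.Geometry.Point(-122.0839, 37.4220)
    elevation = dem.sample(point, 30).first().get('elevation').getInfo()
    print('Elevation (m):', elevation)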

  6. NPPES Plan and Provider Enumeration System

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Centers for Medicare & Medicaid Services (2019). NPPES Plan and Provider Enumeration System [Dataset]. https://www.kaggle.com/cms/nppes
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    Centers for Medicare & Medicaid Services
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    The CMS National Plan and Provider Enumeration System (NPPES) was developed as part of the Administrative Simplification provisions in the original HIPAA act. The primary purpose of NPPES was to develop a unique identifier for each physician that billed Medicare and Medicaid. This identifier is now known as the National Provider Identifier Standard (NPI), a required 10-digit number that is unique to an individual provider at the national level.

    Once an NPI record is assigned to a healthcare provider, the parts of the NPI record that have public relevance, including the provider’s name, specialty, and practice address, are published on a searchable website, as well as in a downloadable file of zipped data containing all of the FOIA-disclosable health care provider data in NPPES, along with a separate PDF file of code values that documents and lists the descriptions for all of the codes found in the data file.

    Content

    The dataset contains the latest NPI downloadable file in an easy-to-query BigQuery table, npi_raw. In addition, there is a second table, npi_optimized, which harnesses the power of BigQuery’s next-generation columnar storage format to provide an analytical view of the NPI data containing description fields for the codes based on the mappings in the Data Dissemination Public File - Code Values documentation, as well as external lookups to the healthcare provider taxonomy codes. While this generates hundreds of columns, BigQuery makes it possible to process all this data effectively and have a convenient single lookup table for all provider information.
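
    A hedged sketch of querying npi_raw with the BigQuery Python client follows. The table path bigquery-public-data.nppes.npi_raw and the column name used below are assumptions; check the table schema before running.

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT provider_business_practice_location_address_state_name AS state,  -- assumed column name
               COUNT(*) AS providers
        FROM `bigquery-public-data.nppes.npi_raw`
        GROUP BY state
        ORDER BY providers DESC
        LIMIT 10
    """
    for row in client.query(query).result():
        print(row.state, row.providers)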

    Fork this kernel to get started.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:nppes?_ga=2.117120578.-577194880.1523455401

    https://console.cloud.google.com/marketplace/details/hhs/nppes?filter=category:science-research

    Dataset Source: Center for Medicare and Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @rawpixel from Unsplash.

    Inspiration

    What are the top ten most common types of physicians in Mountain View?

    What are the names and phone numbers of dentists in California who studied public health?

  7. Ethereum Blockchain

    • kaggle.com
    zip
    Updated Mar 4, 2019
    Cite
    Google BigQuery (2019). Ethereum Blockchain [Dataset]. https://www.kaggle.com/datasets/bigquery/ethereum-blockchain
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 4, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    Bitcoin and other cryptocurrencies have captured the imagination of technologists, financiers, and economists. Digital currencies are only one application of the underlying blockchain technology. Like its predecessor, Bitcoin, the Ethereum blockchain can be described as an immutable distributed ledger. However, creator Vitalik Buterin also extended the set of capabilities by including a virtual machine that can execute arbitrary code stored on the blockchain as smart contracts.

    Both Bitcoin and Ethereum are essentially OLTP databases, and provide little in the way of OLAP (analytics) functionality. However, the Ethereum dataset is notably distinct from the Bitcoin dataset:

    • The Ethereum blockchain has as its primary unit of value Ether, while the Bitcoin blockchain has Bitcoin. However, the majority of value transfer on the Ethereum blockchain is composed of so-called tokens. Tokens are created and managed by smart contracts.

    • Ether value transfers are precise and direct, resembling accounting ledger debits and credits. This is in contrast to the Bitcoin value transfer mechanism, for which it can be difficult to determine the balance of a given wallet address.

    • Addresses can be not only wallets that hold balances, but can also contain smart contract bytecode that allows the programmatic creation of agreements and automatic triggering of their execution. An aggregate of coordinated smart contracts could be used to build a decentralized autonomous organization.

    Content

    The Ethereum blockchain data are now available for exploration with BigQuery. All historical data are in the ethereum_blockchain dataset, which updates daily.

    Our hope is that by making the data on public blockchain systems more readily available it promotes technological innovation and increases societal benefits.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_ethereum.[TABLENAME]. Fork this kernel to get started.
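
    Following the pattern above, a sketch of such a query with the BigQuery Python client might look like this; the transactions table and its columns are assumptions to verify against the crypto_ethereum schema.

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT DATE(block_timestamp) AS day,
               COUNT(*) AS tx_count,
               SUM(value) / 1e18 AS ether_transferred  -- value is denominated in wei
        FROM `bigquery-public-data.crypto_ethereum.transactions`
        WHERE block_timestamp >= TIMESTAMP('2019-01-01')
        GROUP BY day
        ORDER BY day
        LIMIT 30
    """
    for row in client.query(query).result():
        print(row.day, row.tx_count, row.ether_transferred)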

    Acknowledgements

    Cover photo by Thought Catalog on Unsplash

    Inspiration

    • What are the most popularly exchanged digital tokens, represented by ERC-721 and ERC-20 smart contracts?
    • Compare transaction volume and transaction networks over time
    • Compare transaction volume to historical prices by joining with other available data sources like Bitcoin Historical Data
  8. Dataset: A Systematic Literature Review on the topic of High-value datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 23, 2023
    Cite
    Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
    Explore at:
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Andrea Miletič
    Nina Rizun
    Magdalena Ciesielska
    Anastasija Nikiforova
    Charalampos Alexopoulos
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (the pre-print is available in Open Access here -> https://arxiv.org/abs/2305.10234) and so that other researchers can use these data in their own work.

    The protocol is intended for the Systematic Literature Review on the topic of High-value Datasets, with the aim of gathering information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research library (DGRL) in 2023.

    Methodology

    To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research library (DGRL).

    These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those where these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.

    To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

    Test procedure: Each study was independently examined by at least two authors. After an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by the third researcher.

    Description of the data in this data set

    • Protocol_HVD_SLR provides the structure of the protocol.
    • Spreadsheet #1 provides the filled protocol for relevant studies.
    • Spreadsheet #2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

    Descriptive information
    1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
    2) Complete reference - the complete source information to refer to the study
    3) Year of publication - the year in which the study was published
    4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
    5) DOI / Website - a link to the website where the study can be found
    6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
    7) Availability in OA - availability of an article in the Open Access
    8) Keywords - keywords of the paper as indicated by the authors
    9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

    Approach- and research design-related information
    10) Objective / RQ - the research objective / aim, established research questions
    11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
    12) Contributions - the contributions of the study
    13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
    14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation why these data are not shared?
    15) Period under investigation - period (or moment) in which the study was conducted
    16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

    Quality- and relevance- related information
    17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
    18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))

    HVD determination-related information
    19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
    20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output")
    21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
    22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
    23) Data - what data do HVD cover?
    24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

    Format of the files: .xls, .csv (for the first spreadsheet only), .odt, .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  9. Linked Open Data Management Services: A Comparison

    • zenodo.org
    • data.niaid.nih.gov
    Updated Sep 18, 2023
    Cite
    Robert Nasarek; Lozana Rossenova (2023). Linked Open Data Management Services: A Comparison [Dataset]. http://doi.org/10.5281/zenodo.7738424
    Explore at:
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robert Nasarek; Lozana Rossenova
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:

    • ConedaKOR
    • LinkedDataHub
    • Metaphacts
    • Omeka S
    • ResearchSpace
    • Vitro
    • Wikibase
    • WissKI

    The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.

    The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].

    [1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link. To get editing access, contact the authors.

    [2] Full paper will be made available open access in the conference proceedings.

  10. speech_commands

    • tensorflow.org
    • datasets.activeloop.ai
    • +1more
    Updated Jan 13, 2023
    Cite
    (2023). speech_commands [Dataset]. http://identifiers.org/arxiv:1804.03209
    Explore at:
    Dataset updated
    Jan 13, 2023
    Description

    An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments. While in the test set the silence segments are regular 1-second files, in the training set they are provided as long segments under the "background_noise" folder. Here we split this background noise into 1-second clips, and also keep one of the files for the validation set.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of Speech Commands.
    ds = tfds.load('speech_commands', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  11. COVID-19 Community Mobility Reports

    • google.com
    • google.com.tr
    • +4more
    csv, pdf
    Updated Oct 17, 2022
    + more versions
    Cite
    Google (2022). COVID-19 Community Mobility Reports [Dataset]. https://www.google.com/covid19/mobility/
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Oct 17, 2022
    Dataset provided by
    Google Search (http://google.com/)
    Google (http://google.com/)
    Authors
    Google
    Description

    As global communities responded to COVID-19, we heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps would be helpful as they made critical decisions to combat COVID-19. These Community Mobility Reports aimed to provide insights into what changed in response to policies aimed at combating COVID-19. The reports charted movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.
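
    A hedged sketch of loading the global CSV export with pandas follows. The gstatic URL below is the path the reports were historically published under and may no longer be served; substitute a locally downloaded copy if needed, and verify the column names against the file.

    import pandas as pd

    # Assumed historical download path; replace with a local copy if unavailable.
    URL = "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv"
    mobility = pd.read_csv(URL, parse_dates=["date"], low_memory=False)

    # Example: country-level rows for the United States (sub-region columns left blank).
    us = mobility[(mobility["country_region_code"] == "US") & (mobility["sub_region_1"].isna())]
    print(us[["date",
              "retail_and_recreation_percent_change_from_baseline",
              "workplaces_percent_change_from_baseline"]].tail())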

  12. O uso do Instagram e TikTok como mecanismos de busca

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Jul 4, 2025
    Cite
    Claudio de Jesus Junior (2025). O uso do Instagram e TikTok como mecanismos de busca [Dataset]. http://doi.org/10.5281/zenodo.15802948
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Claudio de Jesus Junior
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset of responses to the form "The use of Instagram and TikTok as search engines" ("O uso do Instagram e TikTok como mecanismos de busca"). The form was administered online to hundreds of individuals from different age groups and educational levels, aiming to identify users' search behavior on Instagram and TikTok compared to Google Search. In total, 231 valid responses were collected, all of which are included in this file in an anonymized format. The questionnaire was administered via Google Forms and remained open for responses from July 20, 2024, to July 27, 2024.

  13. glue

    • tensorflow.org
    • tensorflow.google.cn
    • +1more
    Updated Dec 6, 2022
    Cite
    (2022). glue [Dataset]. https://www.tensorflow.org/datasets/catalog/glue
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/) is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('glue', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  14. Google energy consumption 2011-2023

    • statista.com
    • ai-chatbox.pro
    Updated Oct 11, 2024
    Cite
    Statista (2024). Google energy consumption 2011-2023 [Dataset]. https://www.statista.com/statistics/788540/energy-consumption-of-google/
    Explore at:
    Dataset updated
    Oct 11, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Google’s energy consumption has increased over the last few years, reaching 25.9 terawatt hours in 2023, up from 12.8 terawatt hours in 2019. The company has made efforts to make its data centers more efficient through customized high-performance servers, smart temperature and lighting controls, advanced cooling techniques, and machine learning.

    Data centers and energy
    Through its operations, Google pursues a more sustainable impact on the environment by creating efficient data centers that use less energy than average, transitioning towards renewable energy, creating sustainable workplaces, and providing its users with the technological means towards a cleaner future for future generations. Through its efficient data centers, Google has also managed to divert waste from its operations away from landfills.

    Reducing Google’s carbon footprint
    Google’s clean energy efforts are also related to its efforts to reduce its carbon footprint. Since its commitment to using 100 percent renewable energy, the company has met its targets largely through solar and wind energy power purchase agreements and buying renewable power from utilities. Google is one of the largest corporate purchasers of renewable energy in the world.

  15. Global net revenue of Amazon 2014-2024, by product group

    • statista.com
    • ai-chatbox.pro
    Updated Feb 24, 2025
    Cite
    Statista (2025). Global net revenue of Amazon 2014-2024, by product group [Dataset]. https://www.statista.com/statistics/672747/amazons-consolidated-net-revenue-by-segment/
    Explore at:
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2024, Amazon's net revenue from the subscription services segment amounted to 44.37 billion U.S. dollars. Subscription services include Amazon Prime, for which Amazon reported 200 million paying members worldwide at the end of 2020. The AWS category generated 107.56 billion U.S. dollars in annual sales. During the most recently reported fiscal year, the company’s net revenue amounted to 638 billion U.S. dollars.

    Amazon revenue segments
    Amazon is one of the biggest online companies worldwide. In 2019, the company’s revenue increased by 21 percent, compared to Google’s revenue growth during the same fiscal period, which was just 18 percent. The majority of Amazon’s net sales are generated through its North American business segment, which accounted for 236.3 billion U.S. dollars in 2020. The United States is the company’s leading market, followed by Germany and the United Kingdom.

    Business segment: Amazon Web Services
    Amazon Web Services, commonly referred to as AWS, is one of the strongest-growing business segments of Amazon. AWS is a cloud computing service that provides individuals, companies, and governments with a wide range of computing, networking, storage, database, analytics, and application services, among many others. As of the third quarter of 2020, AWS accounted for approximately 32 percent of the global cloud infrastructure services vendor market.

  16. Facebook: distribution of global audiences 2024, by age and gender

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, it was found that men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Additionally, Facebook's second largest audience base consisted of men aged 18 to 24 years.

    Facebook connects the world
    Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.

    Facebook users
    The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia each also have more than 100 million Facebook users.
    
  17. Amazon Web Services: year-on-year growth 2014-2025

    • statista.com
    Updated May 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Amazon Web Services: year-on-year growth 2014-2025 [Dataset]. https://www.statista.com/statistics/422273/yoy-quarterly-growth-aws-revenues/
    Explore at:
    Dataset updated
    May 13, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the first quarter of 2025, year-on-year revenue growth of Amazon Web Services (AWS) was 17 percent, a decrease from the previous three quarters. AWS is one of Amazon’s strongest revenue segments, generating over 115 billion U.S. dollars in net sales in 2024, up from 105 billion U.S. dollars in 2023.

    Amazon Web Services
    Amazon Web Services (AWS) provides on-demand cloud platforms and APIs to customers through a pay-as-you-go model. AWS launched in 2002 providing general services and tools and produced its first cloud products in 2006. Today, more than 175 different cloud services have been released for a variety of technologies and industries. AWS ranked as one of the most popular public cloud infrastructure and platform services running applications worldwide in 2020, ahead of Microsoft Azure and Google Cloud services.

    Cloud computing
    Cloud computing is essentially the delivery of online computing services to customers. As enterprises continually migrate their applications and data to the cloud instead of storing them on local machines, it becomes possible to access resources from different locations. Some of the key services of the AWS ecosystem for cloud applications include storage, database, security tools, and management tools.

    AWS is among the most popular cloud providers
    Some of the largest globally operating enterprises use AWS for their cloud services, including Netflix, BBC, and Baidu. Accordingly, AWS is one of the leading cloud providers in the global cloud market. Due to its continuously expanding portfolio of services and deepening expertise, the company continues to be not only an important cloud service provider but also a business partner.

  18. Smartphone penetration worldwide 2024, by country

    • statista.com
    Updated Dec 17, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2024). Smartphone penetration worldwide 2024, by country [Dataset]. https://www.statista.com/topics/4872/mobile-payments-worldwide/
    Explore at:
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    The smartphone penetration ranking is led by Canada with 97 percent, followed by the United Arab Emirates, also at 97 percent. In contrast, Mozambique is at the bottom of the ranking with 9.48 percent, a difference of 87.52 percentage points from Canada. The penetration rate refers to the share of the total population. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

