18 datasets found
  1. Meta Kaggle Code

    • kaggle.com
    zip
    Updated Jul 31, 2025
    Cite
    Kaggle (2025). Meta Kaggle Code [Dataset]. https://www.kaggle.com/datasets/kaggle/meta-kaggle-code/code
    Explore at:
    Available download formats: zip (151045619431 bytes)
    Dataset updated
    Jul 31, 2025
    Dataset authored and provided by
    Kaggle (http://kaggle.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Explore our public notebook content!

    Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0-licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.

    Why we’re releasing this dataset

    By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.

    Meta Kaggle Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle, which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.

    The best part is that Meta Kaggle enriches Meta Kaggle Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!

    Sensitive data

    While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.

    Joining with Meta Kaggle

    The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
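
    The join described above can be sketched with pandas. This is a minimal sketch, assuming local copies of both datasets at the hypothetical paths meta-kaggle-code/ and meta-kaggle/, and that KernelVersions.csv exposes the version id in an Id column; verify the column name against your copy.

    import pandas as pd
    from pathlib import Path

    # Hypothetical local paths to the two datasets.
    CODE_ROOT = Path("meta-kaggle-code")
    kernel_versions = pd.read_csv("meta-kaggle/KernelVersions.csv")  # assumed "Id" column

    # Each code file is named after a KernelVersions id (e.g. 123456789.py or .ipynb).
    files = [p for p in CODE_ROOT.rglob("*") if p.is_file() and p.stem.isdigit()]
    code_files = pd.DataFrame({"Id": [int(p.stem) for p in files],
                               "Path": [str(p) for p in files]})

    # Join file paths back to kernel version metadata (author, competition, votes, ...).
    joined = code_files.merge(kernel_versions, on="Id", how="inner")
    print(joined.head())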

    File organization

    The files are organized into a two-level directory structure. Each top level folder contains up to 1 million files, e.g. folder 123 contains all versions from 123,000,000 to 123,999,999. Each sub folder contains up to 1 thousand files, e.g. 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have many fewer than 1 thousand files due to private and interactive sessions.
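
    Assuming no zero-padding in the folder names (check this against the downloaded files), the id-to-folder mapping described above can be sketched as:

    def folder_for(version_id: int) -> str:
        # Top level folder groups up to 1 million files; sub folder groups up to 1 thousand.
        top = version_id // 1_000_000
        sub = (version_id // 1_000) % 1_000
        return f"{top}/{sub}"

    print(folder_for(123_456_789))  # -> 123/456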

    The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
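
    A minimal sketch of a requester-pays download with the google-cloud-storage Python client follows; the billing project id and the object path inside the bucket are placeholders.

    from google.cloud import storage

    BILLING_PROJECT = "your-gcp-project-id"  # placeholder: the project billed for the download
    client = storage.Client(project=BILLING_PROJECT)

    # user_project marks the request as "requester pays" against your own project.
    bucket = client.bucket("kaggle-meta-kaggle-code-downloads", user_project=BILLING_PROJECT)
    blob = bucket.blob("123/456/123456789.ipynb")  # hypothetical object path
    blob.download_to_filename("123456789.ipynb")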

    Questions / Comments

    We love feedback! Let us know in the Discussion tab.

    Happy Kaggling!

  2. Google Landmarks Dataset v2

    • github.com
    • opendatalab.com
    Updated Sep 27, 2019
    Cite
    Google (2019). Google Landmarks Dataset v2 [Dataset]. https://github.com/cvdfoundation/google-landmark
    Explore at:
    Dataset updated
    Sep 27, 2019
    Dataset provided by
    Google (http://google.com/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is the second version of the Google Landmarks dataset (GLDv2), which contains images annotated with labels representing human-made and natural landmarks. The dataset can be used for landmark recognition and retrieval experiments. This version of the dataset contains approximately 5 million images, split into three sets: train, index, and test. The dataset was presented in our CVPR'20 paper. In this repository, we present download links for all dataset files and relevant code for metric computation. This dataset was associated with two Kaggle challenges, on landmark recognition and landmark retrieval. Results were discussed as part of a CVPR'19 workshop. In this repository, we also provide scores for the top 10 teams in the challenges, based on the latest ground-truth version. Please visit the challenge and workshop webpages for more details on the data, tasks, and technical solutions from top teams.

  3. Google Trends

    • console.cloud.google.com
    Updated Oct 12, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=JA&inv=1&invt=Ab3yrg (2023). Google Trends [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-search-trends?hl=JA
    Explore at:
    Dataset updated
    Oct 12, 2023
    Dataset provided by
    Google Search (http://google.com/)
    Google (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Description

    The Google Trends dataset provides critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It is made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days, and is accompanied by a rolling five-year window of historical data in 210 distinct locations in the United States. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.
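
    As a hedged sketch, the daily top terms can be queried with the BigQuery Python client. The table path bigquery-public-data.google_trends.top_terms and the column names (term, rank, refresh_date) are assumptions; verify them against the dataset's schema in the BigQuery console.

    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default GCP credentials and project
    query = """
        SELECT term, rank, refresh_date
        FROM `bigquery-public-data.google_trends.top_terms`
        WHERE refresh_date = (SELECT MAX(refresh_date)
                              FROM `bigquery-public-data.google_trends.top_terms`)
        GROUP BY term, rank, refresh_date
        ORDER BY rank
    """
    for row in client.query(query).result():
        print(row.rank, row.term)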

  4. Google Trends - International

    • console.cloud.google.com
    Updated May 6, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:BigQuery%20Public%20Datasets%20Program&hl=it&inv=1&invt=Ab4Ugg (2023). Google Trends - International [Dataset]. https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/google-trends-intl?hl=it
    Explore at:
    Dataset updated
    May 6, 2023
    Dataset provided by
    Google Search (http://google.com/)
    BigQuery (https://cloud.google.com/bigquery)
    Google (http://google.com/)
    Description

    The International Google Trends dataset provides critical signals that individual users and businesses alike can leverage to make better data-driven decisions. This dataset simplifies the manual interaction with the existing Google Trends UI by automating and exposing anonymized, aggregated, and indexed search data in BigQuery. This dataset includes the Top 25 stories and Top 25 Rising queries from Google Trends. It is made available as two separate BigQuery tables, with a set of new top terms appended daily. Each set of Top 25 and Top 25 Rising terms expires after 30 days, and is accompanied by a rolling five-year window of historical data for each country and region across the globe where data is available. This Google dataset is hosted in Google BigQuery as part of Google Cloud's Datasets solution and is included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on this public dataset.

  5. Data from: Google Earth Engine (GEE)

    • catalog-usgs.opendata.arcgis.com
    • data.amerigeoss.org
    • +4more
    Updated Nov 28, 2018
    + more versions
    Cite
    AmeriGEOSS (2018). Google Earth Engine (GEE) [Dataset]. https://catalog-usgs.opendata.arcgis.com/datasets/amerigeoss::google-earth-engine-gee
    Explore at:
    Dataset updated
    Nov 28, 2018
    Dataset authored and provided by
    AmeriGEOSS
    Description

    Google Earth Engine combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities and makes it available for scientists, researchers, and developers to detect changes, map trends, and quantify differences on the Earth's surface. Its interactive timelapse viewer lets you travel back in time and see how the world has changed over the past twenty-nine years; Timelapse is one example of how Earth Engine can help gain insight into petabyte-scale datasets. The public data archive includes more than thirty years of historical imagery and scientific datasets, updated and expanded daily, with over twenty petabytes of geospatial data instantly available for analysis. The Earth Engine API is available in Python and JavaScript, making it easy to harness the power of Google’s cloud for your own geospatial analysis, and a web-based code editor supports fast, interactive algorithm development with instant access to petabytes of data. Scientists and non-profits use Earth Engine for remote sensing research, predicting disease outbreaks, natural resource management, and more. "Google Earth Engine has made it possible for the first time in history to rapidly and accurately process vast amounts of satellite imagery, identifying where and when tree cover change has occurred at high resolution. Global Forest Watch would not exist without it. For those who care about the future of the planet Google Earth Engine is a great blessing!" - Dr. Andrew Steer, President and CEO of the World Resources Institute.
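
    For reference, a small sketch of the Python API mentioned above follows. It assumes you have signed up for Earth Engine and authenticated; the asset id 'USGS/SRTMGL1_003' (SRTM 30 m elevation) is used purely as an example.

    import ee

    ee.Authenticate()  # one-time browser sign-in; newer versions may also require a Cloud project
    ee.Initialize()

    # Sample the SRTM 30 m elevation image at an example point.
    dem = ee.Image('USGS/SRTMGL1_003')
    point = ee.Geometry.Point(-122.0839, 37.4220)
    elevation = dem.sample(point, 30).first().get('elevation').getInfo()
    print('Elevation (m):', elevation)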

  6. NPPES Plan and Provider Enumeration System

    • kaggle.com
    zip
    Updated Mar 20, 2019
    Cite
    Centers for Medicare & Medicaid Services (2019). NPPES Plan and Provider Enumeration System [Dataset]. https://www.kaggle.com/cms/nppes
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 20, 2019
    Dataset authored and provided by
    Centers for Medicare & Medicaid Services
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    The CMS National Plan and Provider Enumeration System (NPPES) was developed as part of the Administrative Simplification provisions in the original HIPAA act. The primary purpose of NPPES was to develop a unique identifier for each physician that billed Medicare and Medicaid. This identifier is now known as the National Provider Identifier Standard (NPI), a required 10-digit number that is unique to an individual provider at the national level.

    Once an NPI record is assigned to a healthcare provider, the parts of the NPI record that have public relevance, including the provider’s name, specialty, and practice address, are published on a searchable website, as well as in a downloadable file of zipped data containing all of the FOIA-disclosable health care provider data in NPPES, along with a separate PDF file of code values that documents and lists the descriptions for all of the codes found in the data file.

    Content

    The dataset contains the latest NPI downloadable file in an easy-to-query BigQuery table, npi_raw. In addition, there is a second table, npi_optimized, which harnesses the power of BigQuery’s next-generation columnar storage format to provide an analytical view of the NPI data containing description fields for the codes based on the mappings in the Data Dissemination Public File - Code Values documentation, as well as external lookups to the healthcare provider taxonomy codes. While this generates hundreds of columns, BigQuery makes it possible to process all this data effectively and have a convenient single lookup table for all provider information.
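
    A hedged sketch of querying npi_raw with the BigQuery Python client follows. The table path bigquery-public-data.nppes.npi_raw and the column name used below are assumptions; check the table schema before running.

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT provider_business_practice_location_address_state_name AS state,  -- assumed column name
               COUNT(*) AS providers
        FROM `bigquery-public-data.nppes.npi_raw`
        GROUP BY state
        ORDER BY providers DESC
        LIMIT 10
    """
    for row in client.query(query).result():
        print(row.state, row.providers)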

    Fork this kernel to get started.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:nppes?_ga=2.117120578.-577194880.1523455401

    https://console.cloud.google.com/marketplace/details/hhs/nppes?filter=category:science-research

    Dataset Source: Center for Medicare and Medicaid Services. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source - http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @rawpixel from Unsplash.

    Inspiration

    What are the top ten most common types of physicians in Mountain View?

    What are the names and phone numbers of dentists in California who studied public health?

  7. Ethereum Blockchain

    • kaggle.com
    zip
    Updated Mar 4, 2019
    Cite
    Google BigQuery (2019). Ethereum Blockchain [Dataset]. https://www.kaggle.com/datasets/bigquery/ethereum-blockchain
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Mar 4, 2019
    Dataset provided by
    BigQuery (https://cloud.google.com/bigquery)
    Authors
    Google BigQuery
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Context

    Bitcoin and other cryptocurrencies have captured the imagination of technologists, financiers, and economists. Digital currencies are only one application of the underlying blockchain technology. Like its predecessor, Bitcoin, the Ethereum blockchain can be described as an immutable distributed ledger. However, creator Vitalik Buterin also extended the set of capabilities by including a virtual machine that can execute arbitrary code stored on the blockchain as smart contracts.

    Both Bitcoin and Ethereum are essentially OLTP databases, and provide little in the way of OLAP (analytics) functionality. However, the Ethereum dataset is notably distinct from the Bitcoin dataset:

    • The Ethereum blockchain has as its primary unit of value Ether, while the Bitcoin blockchain has Bitcoin. However, the majority of value transfer on the Ethereum blockchain is composed of so-called tokens. Tokens are created and managed by smart contracts.

    • Ether value transfers are precise and direct, resembling accounting ledger debits and credits. This is in contrast to the Bitcoin value transfer mechanism, for which it can be difficult to determine the balance of a given wallet address.

    • Addresses can be not only wallets that hold balances, but can also contain smart contract bytecode that allows the programmatic creation of agreements and automatic triggering of their execution. An aggregate of coordinated smart contracts could be used to build a decentralized autonomous organization.

    Content

    The Ethereum blockchain data are now available for exploration with BigQuery. All historical data are in the ethereum_blockchain dataset, which updates daily.

    Our hope is that by making the data on public blockchain systems more readily available it promotes technological innovation and increases societal benefits.

    Querying BigQuery tables

    You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_ethereum.[TABLENAME]. Fork this kernel to get started.
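
    Following the pattern above, a sketch of such a query with the BigQuery Python client might look like this; the transactions table and its columns are assumptions to verify against the crypto_ethereum schema.

    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT DATE(block_timestamp) AS day,
               COUNT(*) AS tx_count,
               SUM(value) / 1e18 AS ether_transferred  -- value is denominated in wei
        FROM `bigquery-public-data.crypto_ethereum.transactions`
        WHERE block_timestamp >= TIMESTAMP('2019-01-01')
        GROUP BY day
        ORDER BY day
        LIMIT 30
    """
    for row in client.query(query).result():
        print(row.day, row.tx_count, row.ether_transferred)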

    Acknowledgements

    Cover photo by Thought Catalog on Unsplash

    Inspiration

    • What are the most popularly exchanged digital tokens, represented by ERC-721 and ERC-20 smart contracts?
    • Compare transaction volume and transaction networks over time
    • Compare transaction volume to historical prices by joining with other available data sources like Bitcoin Historical Data
  8. Dataset: A Systematic Literature Review on the topic of High-value datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 23, 2023
    Cite
    Andrea Miletič (2023). Dataset: A Systematic Literature Review on the topic of High-value datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7944424
    Explore at:
    Dataset updated
    Jun 23, 2023
    Dataset provided by
    Andrea Miletič
    Nina Rizun
    Magdalena Ciesielska
    Anastasija Nikiforova
    Charalampos Alexopoulos
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains data collected during a study ("Towards High-Value Datasets determination for data-driven development: a systematic literature review") conducted by Anastasija Nikiforova (University of Tartu), Nina Rizun, Magdalena Ciesielska (Gdańsk University of Technology), Charalampos Alexopoulos (University of the Aegean) and Andrea Miletič (University of Zagreb). It is being made public both to act as supplementary data for the "Towards High-Value Datasets determination for data-driven development: a systematic literature review" paper (the pre-print is available in Open Access here -> https://arxiv.org/abs/2305.10234) and so that other researchers can use these data in their own work.

    The protocol is intended for the Systematic Literature Review on the topic of High-value Datasets, with the aim of gathering information on how the topic of High-value datasets (HVD) and their determination has been reflected in the literature over the years and what has been found by these studies to date, incl. the indicators used in them, involved stakeholders, data-related aspects, and frameworks. The data in this dataset were collected as a result of the SLR over Scopus, Web of Science, and the Digital Government Research library (DGRL) in 2023.

    Methodology

    To understand how HVD determination has been reflected in the literature over the years and what has been found by these studies to date, all relevant literature covering this topic has been studied. To this end, the SLR was carried out by searching the digital libraries covered by Scopus, Web of Science (WoS), and the Digital Government Research library (DGRL).

    These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), which were applied to the article title, keywords, and abstract to limit the number of papers to those where these objects were primary research objects rather than merely mentioned in the body, e.g., as future work. After deduplication, 11 unique articles were found and further checked for relevance. As a result, a total of 9 articles were further examined. Each study was independently examined by at least two authors.

    To attain the objective of our study, we developed the protocol, where the information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information.

    Test procedure: Each study was independently examined by at least two authors. After an in-depth examination of the full text of the article, the structured protocol was filled in for each study. The structure of the survey is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx). The data collected for each study by two researchers were then synthesized into one final version by the third researcher.

    Description of the data in this data set

    • Protocol_HVD_SLR provides the structure of the protocol.
    • Spreadsheet #1 provides the filled protocol for relevant studies.
    • Spreadsheet #2 provides the list of results after the search over three indexing databases, i.e. before filtering out irrelevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design- related information, (3) quality-related information, (4) HVD determination-related information

    Descriptive information
    1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet
    2) Complete reference - the complete source information to refer to the study
    3) Year of publication - the year in which the study was published
    4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
    5) DOI / Website - a link to the website where the study can be found
    6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science
    7) Availability in OA - availability of an article in the Open Access
    8) Keywords - keywords of the paper as indicated by the authors
    9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

    Approach- and research design-related information
    10) Objective / RQ - the research objective / aim, established research questions
    11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
    12) Contributions - the contributions of the study
    13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach?
    14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data, e.g., transcriptions of interviews, collected data, or an explanation why these data are not shared?
    15) Period under investigation - period (or moment) in which the study was conducted
    16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study?

    Quality- and relevance- related information
    17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)?
    18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))

    HVD determination-related information
    19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term?
    20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, “input -> output")
    21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
    22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
    23) Data - what data do HVD cover?
    24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)

    Format of the files: .xls, .csv (for the first spreadsheet only), .odt, .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt

  9. Linked Open Data Management Services: A Comparison

    • zenodo.org
    • data.niaid.nih.gov
    Updated Sep 18, 2023
    Cite
    Robert Nasarek; Lozana Rossenova (2023). Linked Open Data Management Services: A Comparison [Dataset]. http://doi.org/10.5281/zenodo.7738424
    Explore at:
    Dataset updated
    Sep 18, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robert Nasarek; Lozana Rossenova
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Thanks to a variety of software services, it has never been easier to produce, manage and publish Linked Open Data. But until now, there has been a lack of an accessible overview to help researchers make the right choice for their use case. This dataset release will be regularly updated to reflect the latest data published in a comparison table developed in Google Sheets [1]. The comparison table includes the most commonly used LOD management software tools from NFDI4Culture to illustrate what functionalities and features a service should offer for the long-term management of FAIR research data, including:

    • ConedaKOR
    • LinkedDataHub
    • Metaphacts
    • Omeka S
    • ResearchSpace
    • Vitro
    • Wikibase
    • WissKI

    The table presents two views based on a comparison system of categories developed iteratively during workshops with expert users and developers from the respective tool communities. First, a short overview with field values coming from controlled vocabularies and multiple-choice options; and a second sheet allowing for more descriptive free text additions. The table and corresponding dataset releases for each view mode are designed to provide a well-founded basis for evaluation when deciding on a LOD management service. The Google Sheet table will remain open to collaboration and community contribution, as well as updates with new data and potentially new tools, whereas the datasets released here are meant to provide stable reference points with version control.

    The research for the comparison table was first presented as a paper at DHd2023, Open Humanities – Open Culture, 13-17.03.2023, Trier and Luxembourg [2].

    [1] Non-editing access is available here: docs.google.com/spreadsheets/d/1FNU8857JwUNFXmXAW16lgpjLq5TkgBUuafqZF-yo8_I/edit?usp=share_link. To get editing access, contact the authors.

    [2] Full paper will be made available open access in the conference proceedings.

  10. speech_commands

    • tensorflow.org
    • datasets.activeloop.ai
    • +1more
    Updated Jan 13, 2023
    Cite
    (2023). speech_commands [Dataset]. http://identifiers.org/arxiv:1804.03209
    Explore at:
    Dataset updated
    Jan 13, 2023
    Description

    An audio dataset of spoken words designed to help train and evaluate keyword spotting systems. Its primary goal is to provide a way to build and test small models that detect when a single word is spoken, from a set of ten target words, with as few false positives as possible from background noise or unrelated speech. Note that in the train and validation sets, the label "unknown" is much more prevalent than the labels of the target words or background noise. One difference from the release version is the handling of silent segments. While in the test set the silence segments are regular 1-second files, in the training set they are provided as long segments under the "background_noise" folder. Here we split this background noise into 1-second clips, and also keep one of the files for the validation set.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of Speech Commands.
    ds = tfds.load('speech_commands', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  11. COVID-19 Community Mobility Reports

    • google.com
    • google.com.tr
    • +4more
    csv, pdf
    Updated Oct 17, 2022
    + more versions
    Cite
    Google (2022). COVID-19 Community Mobility Reports [Dataset]. https://www.google.com/covid19/mobility/
    Explore at:
    Available download formats: csv, pdf
    Dataset updated
    Oct 17, 2022
    Dataset provided by
    Google Search (http://google.com/)
    Google (http://google.com/)
    Authors
    Google
    Description

    As global communities responded to COVID-19, we heard from public health officials that the same type of aggregated, anonymized insights we use in products such as Google Maps would be helpful as they made critical decisions to combat COVID-19. These Community Mobility Reports aimed to provide insights into what changed in response to policies aimed at combating COVID-19. The reports charted movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential.
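
    A hedged sketch of loading the global CSV export with pandas follows. The gstatic URL below is the path the reports were historically published under and may no longer be served; substitute a locally downloaded copy if needed, and verify the column names against the file.

    import pandas as pd

    # Assumed historical download path; replace with a local copy if unavailable.
    URL = "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv"
    mobility = pd.read_csv(URL, parse_dates=["date"], low_memory=False)

    # Example: country-level rows for the United States (sub-region columns left blank).
    us = mobility[(mobility["country_region_code"] == "US") & (mobility["sub_region_1"].isna())]
    print(us[["date",
              "retail_and_recreation_percent_change_from_baseline",
              "workplaces_percent_change_from_baseline"]].tail())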

  12. O uso do Instagram e TikTok como mecanismos de busca

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Jul 4, 2025
    Cite
    Claudio de Jesus Junior (2025). O uso do Instagram e TikTok como mecanismos de busca [Dataset]. http://doi.org/10.5281/zenodo.15802948
    Explore at:
    Available download formats: bin
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Claudio de Jesus Junior
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset of responses to the form "The use of Instagram and TikTok as search engines" ("O uso do Instagram e TikTok como mecanismos de busca"). The form was administered online to hundreds of individuals from different age groups and educational levels, aiming to identify users' search behavior on Instagram and TikTok compared to Google Search. In total, 231 valid responses were collected, all of which are included in this file in an anonymized format. The questionnaire was administered via Google Forms and remained open for responses from July 20, 2024, to July 27, 2024.

  13. glue

    • tensorflow.org
    • tensorflow.google.cn
    • +1more
    Updated Dec 6, 2022
    Cite
    (2022). glue [Dataset]. https://www.tensorflow.org/datasets/catalog/glue
    Explore at:
    Dataset updated
    Dec 6, 2022
    Description

    GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/) is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('glue', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

  14. Google energy consumption 2011-2023

    • statista.com
    • ai-chatbox.pro
    Updated Oct 11, 2024
    Cite
    Statista (2024). Google energy consumption 2011-2023 [Dataset]. https://www.statista.com/statistics/788540/energy-consumption-of-google/
    Explore at:
    Dataset updated
    Oct 11, 2024
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    Google’s energy consumption has increased over the last few years, reaching 25.9 terawatt hours in 2023, up from 12.8 terawatt hours in 2019. The company has made efforts to make its data centers more efficient through customized high-performance servers, smart temperature and lighting controls, advanced cooling techniques, and machine learning.

    Data centers and energy
    Through its operations, Google pursues a more sustainable impact on the environment by creating efficient data centers that use less energy than average, transitioning towards renewable energy, creating sustainable workplaces, and providing its users with the technological means towards a cleaner future for future generations. Through its efficient data centers, Google has also managed to divert waste from its operations away from landfills.

    Reducing Google’s carbon footprint
    Google’s clean energy efforts are also related to its efforts to reduce its carbon footprint. Since its commitment to using 100 percent renewable energy, the company has met its targets largely through solar and wind energy power purchase agreements and buying renewable power from utilities. Google is one of the largest corporate purchasers of renewable energy in the world.

  15. Global net revenue of Amazon 2014-2024, by product group

    • statista.com
    • ai-chatbox.pro
    Updated Feb 24, 2025
    Cite
    Statista (2025). Global net revenue of Amazon 2014-2024, by product group [Dataset]. https://www.statista.com/statistics/672747/amazons-consolidated-net-revenue-by-segment/
    Explore at:
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In 2024, Amazon's net revenue from the subscription services segment amounted to 44.37 billion U.S. dollars. Subscription services include Amazon Prime, for which Amazon reported 200 million paying members worldwide at the end of 2020. The AWS category generated 107.56 billion U.S. dollars in annual sales. During the most recently reported fiscal year, the company’s net revenue amounted to 638 billion U.S. dollars.

    Amazon revenue segments
    Amazon is one of the biggest online companies worldwide. In 2019, the company’s revenue increased by 21 percent, compared to Google’s revenue growth during the same fiscal period, which was just 18 percent. The majority of Amazon’s net sales are generated through its North American business segment, which accounted for 236.3 billion U.S. dollars in 2020. The United States is the company’s leading market, followed by Germany and the United Kingdom.

    Business segment: Amazon Web Services
    Amazon Web Services, commonly referred to as AWS, is one of the strongest-growing business segments of Amazon. AWS is a cloud computing service that provides individuals, companies, and governments with a wide range of computing, networking, storage, database, analytics, and application services, among many others. As of the third quarter of 2020, AWS accounted for approximately 32 percent of the global cloud infrastructure services vendor market.

  16. Facebook: distribution of global audiences 2024, by age and gender

    • statista.com
    • es.statista.com
    + more versions
    Cite
    Stacy Jo Dixon, Facebook: distribution of global audiences 2024, by age and gender [Dataset]. https://www.statista.com/topics/1164/social-networks/
    Explore at:
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Stacy Jo Dixon
    Description

    As of April 2024, it was found that men between the ages of 25 and 34 years made up Facebook's largest audience, accounting for 18.4 percent of global users. Additionally, Facebook's second largest audience base consisted of men aged 18 to 24 years.

    Facebook connects the world
    Founded in 2004 and going public in 2012, Facebook is one of the biggest internet companies in the world, with influence that goes beyond social media. It is widely considered one of the Big Four tech companies, along with Google, Apple, and Amazon (together known under the acronym GAFA). Facebook is the most popular social network worldwide, and the company also owns three other billion-user properties: the mobile messaging apps WhatsApp and Facebook Messenger, as well as the photo-sharing app Instagram.

    Facebook users
    The vast majority of Facebook users connect to the social network via mobile devices. This is unsurprising, as Facebook has many users in mobile-first online markets. Currently, India ranks first in terms of Facebook audience size with 378 million users. The United States, Brazil, and Indonesia each also have more than 100 million Facebook users.
    
  17. Amazon Web Services: year-on-year growth 2014-2025

    • statista.com
    Updated May 13, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista (2025). Amazon Web Services: year-on-year growth 2014-2025 [Dataset]. https://www.statista.com/statistics/422273/yoy-quarterly-growth-aws-revenues/
    Explore at:
    Dataset updated
    May 13, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    In the first quarter of 2025, year-on-year revenue growth of Amazon Web Services (AWS) was 17 percent, a decrease from the previous three quarters. AWS is one of Amazon’s strongest revenue segments, generating over 115 billion U.S. dollars in net sales in 2024, up from 105 billion U.S. dollars in 2023.

    Amazon Web Services
    Amazon Web Services (AWS) provides on-demand cloud platforms and APIs to customers through a pay-as-you-go model. AWS launched in 2002 providing general services and tools and produced its first cloud products in 2006. Today, more than 175 different cloud services have been released for a variety of technologies and industries. AWS ranked as one of the most popular public cloud infrastructure and platform services running applications worldwide in 2020, ahead of Microsoft Azure and Google Cloud services.

    Cloud computing
    Cloud computing is essentially the delivery of online computing services to customers. As enterprises continually migrate their applications and data to the cloud instead of storing them on local machines, it becomes possible to access resources from different locations. Some of the key services of the AWS ecosystem for cloud applications include storage, database, security tools, and management tools.

    AWS is among the most popular cloud providers
    Some of the largest globally operating enterprises use AWS for their cloud services, including Netflix, BBC, and Baidu. Accordingly, AWS is one of the leading cloud providers in the global cloud market. Due to its continuously expanding portfolio of services and deepening expertise, the company continues to be not only an important cloud service provider but also a business partner.

  18. Smartphone penetration worldwide 2024, by country

    • statista.com
    Updated Dec 17, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Statista Research Department (2024). Smartphone penetration worldwide 2024, by country [Dataset]. https://www.statista.com/topics/4872/mobile-payments-worldwide/
    Explore at:
    Dataset updated
    Dec 17, 2024
    Dataset provided by
    Statista (http://statista.com/)
    Authors
    Statista Research Department
    Description

    The smartphone penetration ranking is led by Canada with 97 percent, followed by the United Arab Emirates, also at 97 percent. In contrast, Mozambique is at the bottom of the ranking with 9.48 percent, a difference of 87.52 percentage points from Canada. The penetration rate refers to the share of the total population. The data shown are an excerpt of Statista's Key Market Indicators (KMI). The KMI are a collection of primary and secondary indicators on the macro-economic, demographic, and technological environment in up to 150 countries and regions worldwide. All indicators are sourced from international and national statistical offices, trade associations, and the trade press, and they are processed to generate comparable data sets (see supplementary notes under details for more information).

