License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Meta Kaggle may not be the Rosetta Stone of data science, but we do think there's a lot to learn (and plenty of fun to be had) from this collection of rich data about Kaggle’s community and activity.
Strategizing to become a Competitions Grandmaster? Wondering who, where, and what goes into a winning team? Choosing evaluation metrics for your next data science project? The kernels published using this data can help. We also hope they'll spark some lively Kaggler conversations and be a useful resource for the larger data science community.
[Image: Kaggle Leaderboard Performance, https://imgur.com/2Egeb8R.png]
This dataset is made available as CSV files through Kaggle Kernels. It contains tables on public activity from Competitions, Datasets, Kernels, Discussions, and more. The tables are updated daily.
Please note: This data is not a complete dump of our database. Rows, columns, and tables have been filtered out and transformed.
In August 2023, we released Meta Kaggle for Code, a companion to Meta Kaggle containing public, Apache 2.0 licensed notebook data. View the dataset and instructions for how to join it with Meta Kaggle here: https://www.kaggle.com/datasets/kaggle/meta-kaggle-code
We also updated the license on Meta Kaggle from CC-BY-NC-SA to Apache 2.0.
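As a minimal, hedged sketch of loading one of these tables in a Kaggle notebook (the /kaggle/input path layout and the Competitions.csv file name are assumptions about the current export; verify them against the dataset's file listing):

```python
import pandas as pd

# Path used when the Meta Kaggle dataset is attached to a Kaggle notebook
# (assumed layout; adjust if running elsewhere or if file names differ).
competitions = pd.read_csv("/kaggle/input/meta-kaggle/Competitions.csv")

# Inspect the available columns before building anything on top of them.
print(competitions.shape)
print(competitions.columns.tolist())
print(competitions.head())
```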
License: other (https://choosealicense.com/licenses/other/)
Dataset Card for The Stack Metadata
Changelog
| Release | Description |
|---|---|
| v1.1 | This is the first release of the metadata. It is for The Stack v1.1 |
| v1.2 | Metadata dataset matching The Stack v1.2 |
Dataset Summary
This is a set of additional information for the repositories used for The Stack. It contains file paths, detected licenses, and some other information for the repositories.
Supported Tasks and Leaderboards
The main task is to recreate… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/the-stack-metadata.
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset is a curated subset of the Amazon Berkeley Objects (ABO) dataset, tailored specifically for multimodal applications like Visual Question Answering (VQA). It merges product metadata and image identifiers into a unified format, enabling rapid development and prototyping of multimodal AI models.
Each entry in the dataset corresponds to a unique product listing and includes structured information suitable for downstream tasks like:
- Multilingual product description understanding
- Image-grounded question generation
- Metadata-aware classification and retrieval
| Field Name | Description |
|---|---|
| brand | Brand name of the product (e.g., AmazonBasics, Solimo) |
| bullet_point | Short description points highlighting features |
| color | Product color (e.g., White Powder Coat, Multicolor) |
| item_id | Unique identifier for the product |
| item_keywords | A collection of search-relevant tags |
| item_name | Official product title |
| main_image_id | Main image identifier (used for image retrieval) |
| other_image_id | Additional image identifiers |
| product_type | Broad category like CELLULAR_PHONE_CASE, SHOES, etc. |
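A hypothetical loading sketch built on the fields above (the file name and JSON-lines layout are assumptions for illustration; the dataset page documents the actual distribution format):

```python
import json

# Assumed file name and JSON-lines layout; adjust to the actual distribution format.
with open("abo_vqa_subset.jsonl") as f:
    record = json.loads(f.readline())

# Build an image-grounded question from the structured metadata fields.
question = f"What color is the {record['item_name']} by {record['brand']}?"
answer = record["color"]
image_ref = record["main_image_id"]  # used to look up the product image

print(question, "->", answer, f"(image: {image_ref})")
```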
Terms: https://crawlfeeds.com/privacy_policy
Unlock one of the most comprehensive movie datasets available—4.5 million structured IMDb movie records, extracted and enriched for data science, machine learning, and entertainment research.
This dataset includes a vast collection of global movie metadata, including details on title, release year, genre, country, language, runtime, cast, directors, IMDb ratings, reviews, and synopsis. Whether you're building a recommendation engine, benchmarking trends, or training AI models, this dataset is designed to give you deep and wide access to cinematic data across decades and continents.
Perfect for use in film analytics, OTT platforms, review sentiment analysis, knowledge graphs, and LLM fine-tuning, the dataset is cleaned, normalized, and exportable in multiple formats.
Genres: Drama, Comedy, Horror, Action, Sci-Fi, Documentary, and more
Train LLMs or chatbots on cinematic language and metadata
Build or enrich movie recommendation engines
Run cross-lingual or multi-region film analytics
Benchmark genre popularity across time periods
Power academic studies or entertainment dashboards
Feed into knowledge graphs, search engines, or NLP pipelines
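As a hedged sketch of the genre-benchmarking use case (a CSV export and column names such as year, genre, and rating are assumptions; adjust to the fields in the delivered files):

```python
import pandas as pd

# Assumed file name and column names; adjust to the actual export you receive.
movies = pd.read_csv("imdb_movies.csv")

# Average IMDb rating per genre and decade.
movies["decade"] = (movies["year"] // 10) * 10
summary = (
    movies.groupby(["genre", "decade"])["rating"]
    .agg(["count", "mean"])
    .sort_values("count", ascending=False)
)
print(summary.head(10))
```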
The dataset consists of public domain acute and chronic toxicity and chemistry data for algal species. Data are accessible at https://envirotoxdatabase.org/. Data include algal species, chemical identification, and the concentrations that do and do not affect algal growth.
This dataset was created by Nicole Wong98.
Terms: https://crawlfeeds.com/privacy_policy
This comprehensive dataset features detailed metadata for over 190,000 movies and TV shows, with a strong concentration in the Horror genre. It is ideal for entertainment research, machine learning models, genre-specific trend analysis, and content recommendation systems.
Each record contains rich information, making it perfect for streaming platforms, film industry analysts, or academic media researchers.
Primary Genre Focus: Horror
Build movie recommendation systems or genre classifiers
Train NLP models on movie descriptions
Analyze Horror content trends over time
Explore box office vs. rating correlations
Enrich entertainment datasets with directorial and cast metadata
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
A comprehensive Amazon books dataset featuring 20,000 books and 727,876 reviews spanning 26 years (1997-2023), paired with a complete step-by-step data science tutorial. Perfect for learning data analytics from scratch or conducting advanced book market analysis.
What's Included:
- Raw Data: 20K book metadata (titles, authors, prices, ratings, descriptions) + 727K detailed reviews
- Complete Tutorial Series: 4 progressive Python scripts covering data loading, cleaning, exploratory analysis, and visualization
- Ready-to-Run Code: Fully documented scripts with practice exercises
- Educational Focus: Designed for ENTR 3901 coursework but suitable for all skill levels

Key Features:
- Real-world e-commerce data (pre-filtered for quality: 200+ reviews, $5+ price)
- Comprehensive documentation and setup instructions
- Generates 6+ professional visualizations
- Includes bonus analysis challenges (sentiment analysis, price optimization, time patterns)
- Perfect for business analytics, market research, and data science education

Use Cases:
- Learning data analytics fundamentals
- Book market analysis and trends
- Customer behavior insights
- Price optimization studies
- Review sentiment analysis
- Academic coursework and projects

This dataset bridges the gap between raw data and practical learning, making it ideal for both beginners and experienced analysts looking to explore e-commerce patterns in the publishing industry.
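A minimal loading sketch in the spirit of the tutorial scripts (the file names, the book_id join key, and the column names are assumptions, not the contents of the actual scripts):

```python
import pandas as pd

# Assumed file names; the dataset's own scripts document the real ones.
books = pd.read_csv("books.csv")        # ~20K rows of book metadata
reviews = pd.read_csv("reviews.csv")    # ~727K review rows

# Join reviews to book metadata and look at average rating per author (assumed columns).
merged = reviews.merge(books, on="book_id", how="inner")
stats = merged.groupby("author")["review_rating"].agg(["count", "mean"])
top_authors = stats[stats["count"] >= 50].sort_values("mean", ascending=False)
print(top_authors.head(10))
```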
License: U.S. Government Works (https://www.usa.gov/government-works)
HHS Metadata Standard: Version 1.0, published in July 2025, serves as the authoritative framework for defining HHS metadata—data about data—fields and attributes. Aligned with the Evidence Act and HealthData.gov, this standard establishes clear guidelines for metadata collection and public sharing across all data assets created, collected, managed, or maintained by HHS. It outlines required metadata fields for HHS datasets, ensuring consistency, interoperability, and discoverability in HHS data governance.
This is the metadata from the DICOM files for the UNIFESP X-ray Body Part Competition, in CSV format.
Competition and original dataset:
https://www.kaggle.com/competitions/unifesp-x-ray-body-part-classifier/
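A small, hedged example of exploring the CSV (BodyPartExamined is a standard DICOM attribute, but the file name and the exact column headers used here are assumptions):

```python
import pandas as pd

# Assumed file name; use the CSV provided with this dataset.
meta = pd.read_csv("unifesp_dicom_metadata.csv")

# Count images per body part, assuming the standard DICOM attribute name
# was kept as the column header.
if "BodyPartExamined" in meta.columns:
    print(meta["BodyPartExamined"].value_counts())
else:
    print(meta.columns.tolist())  # inspect the real column names first
```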
Acknowledgements We thank Sarah Lustosa Haiek, Julia Tagliaferri, Lucas Diniz, and Rogerio Jadjiski for annotating this dataset. We thank the PI Nitamar Abdala, MD, PhD, for supporting this work. We thank Ernandez, our PACS admin, and Jefferson, our IT manager. We thank MD.ai for providing the annotation platform.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Metadata of a Large Sonar and Stereo Camera Dataset Suitable for Sonar-to-RGB Image Translation
Introduction
This is a set of metadata describing a large dataset of synchronized sonar and stereo camera recordings that were captured between August 2021 and September 2023 during the project DeeperSense (https://robotik.dfki-bremen.de/en/research/projects/deepersense/), as training data for Sonar-to-RGB image translation. Parts of the sensor data have been published (https://zenodo.org/records/7728089, https://zenodo.org/records/10220989). Due to the size of the sensor data corpus, it is currently impractical to make the entire corpus accessible online. Instead, this metadatabase serves as a relatively compact representation, allowing interested researchers to inspect the data and select relevant portions for their particular use case, which will be made available on demand. This is an effort to comply with the FAIR principle A2 (https://www.go-fair.org/fair-principles/), which states that metadata shall be accessible even when the base data is not immediately available.
Locations and sensors
The sensor data was captured at four different locations, including one laboratory (Maritime Exploration Hall at DFKI RIC Bremen) and three field locations (Chalk Lake Hemmoor, Tank Wash Basin Neu-Ulm, Lake Starnberg). At all locations, a ZED camera and a Blueprint Oculus M1200d sonar were used. Additionally, a SeaVision camera was used at the Maritime Exploration Hall at DFKI RIC Bremen and at the Chalk Lake Hemmoor. The examples/ directory holds a typical output image for each sensor at each available location.
Data volume per session
Six data collection sessions were conducted. The table below presents an overview of the amount of data captured in each session:
| Session dates | Location | Number of datasets | Total duration of datasets [h] | Total logfile size [GB] | Number of images | Total image size [GB] |
|---|---|---|---|---|---|---|
| 2021-08-09 - 2021-08-12 | Maritime Exploration Hall at DFKI RIC Bremen | 52 | 10.8 | 28.8 | 389’047 | 88.1 |
| 2022-02-07 - 2022-02-08 | Maritime Exploration Hall at DFKI RIC Bremen | 35 | 4.4 | 54.1 | 629’626 | 62.3 |
| 2022-04-26 - 2022-04-28 | Chalk Lake Hemmoor | 52 | 8.1 | 133.6 | 1’114’281 | 97.8 |
| 2022-06-28 - 2022-06-29 | Tank Wash Basin Neu-Ulm | 42 | 6.7 | 144.2 | 824’969 | 26.9 |
| 2023-04-26 - 2023-04-27 | Maritime Exploration Hall at DFKI RIC Bremen | 55 | 7.4 | 141.9 | 739’613 | 9.6 |
| 2023-09-01 - 2023-09-02 | Lake Starnberg | 19 | 2.9 | 40.1 | 217’385 | 2.3 |
| Total | | 255 | 40.3 | 542.7 | 3’914’921 | 287.0 |
Data and metadata structure
Sensor data corpus
The sensor data corpus comprises two processing stages:
raw data streams stored in ROS bagfiles (aka logfiles),
camera and sonar images (aka datafiles) extracted from the logfiles.
The files are stored in a file tree hierarchy which groups them by session, dataset, and modality:
${session_key}/
    ${dataset_key}/
        ${logfile_name}
        ${modality_key}/
            ${datafile_name}
A typical logfile path has this form:
2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/stereo_camera-zed-2023-09-02-15-06-07.bag
A typical datafile path has this form:
2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/zed_right/1693660038_368077993.jpg
All directory and file names, and their particles, are designed to serve as identifiers in the metadatabase. Their formatting, as well as the definitions of all terms, are documented in the file entities.json.
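A minimal path-parsing sketch based on this hierarchy (treating the datafile name as UNIX epoch seconds plus a sub-second part is an assumption; entities.json remains the authoritative reference):

```python
from pathlib import Path

# Example datafile path from above.
path = Path("2023-09_starnberg_lake/2023-09-02-15-06_hydraulic_drill/zed_right/1693660038_368077993.jpg")

# Split the path into its identifier particles.
session_key, dataset_key, modality_key, datafile_name = path.parts

# Assumed interpretation: UNIX epoch seconds plus a sub-second component.
secs, subsec = datafile_name.split(".")[0].split("_")

print(session_key, dataset_key, modality_key, secs, subsec)
```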
Metadatabase
The metadatabase is provided in two equivalent forms:
as a standalone SQLite (https://www.sqlite.org/index.html) database file metadata.sqlite for users familiar with SQLite,
as a collection of CSV files in the csv/ directory for users who prefer other tools.
The database file has been generated from the CSV files, so each database table holds the same information as the corresponding CSV file. In addition, the metadatabase contains a series of convenience views that facilitate access to certain aggregate information.
An entity relationship diagram of the metadatabase tables is stored in the file entity_relationship_diagram.png. Each entity, its attributes, and relations are documented in detail in the file entities.json.
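For users starting from the SQLite form, a minimal inspection sketch using only Python's standard sqlite3 module:

```python
import sqlite3

con = sqlite3.connect("metadata.sqlite")

# List all tables and convenience views shipped with the metadatabase.
rows = con.execute(
    "SELECT type, name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY type, name"
).fetchall()
for kind, name in rows:
    print(kind, name)

con.close()
```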
Some general design remarks:
For convenience, timestamps are always given in both a human-readable form (ISO 8601 formatted datetime strings with explicit local time zone), and as seconds since the UNIX epoch.
In practice, each logfile always contains a single stream, and each stream is always stored in a single logfile. Per the database schema, however, the entities stream and logfile are modeled separately, with a “many-streams-to-one-logfile” relationship. This design was chosen to be compatible with, and open for, data collections where a single logfile contains multiple streams.
A modality is not an attribute of a sensor alone, but of a datafile: a sensor is an attribute of a stream, and a single stream may be the source of multiple modalities (e.g. RGB vs. grayscale images from the same camera, or Cartesian vs. polar projections of the same sonar output). Conversely, the same modality may originate from different sensors.
As a usage example, the data volume per session, which is tabulated at the top of this document, can be extracted from the metadatabase with the following SQL query:
```sql
SELECT
    PRINTF('%s - %s', SUBSTR(session_start, 1, 10), SUBSTR(session_end, 1, 10)) AS 'Session dates',
    location_name_english AS Location,
    number_of_datasets AS 'Number of datasets',
    total_duration_of_datasets_h AS 'Total duration of datasets [h]',
    total_logfile_size_gb AS 'Total logfile size [GB]',
    number_of_images AS 'Number of images',
    total_image_size_gb AS 'Total image size [GB]'
FROM location
JOIN session USING (location_id)
JOIN (
    SELECT
        session_id,
        COUNT(dataset_id) AS number_of_datasets,
        ROUND(SUM(dataset_duration) / 3600, 1) AS total_duration_of_datasets_h,
        ROUND(SUM(total_logfile_size) / 10e9, 1) AS total_logfile_size_gb
    FROM location
    JOIN session USING (location_id)
    JOIN dataset USING (session_id)
    JOIN view_dataset_total_logfile_size USING (dataset_id)
    GROUP BY session_id
) USING (session_id)
JOIN (
    SELECT
        session_id,
        COUNT(datafile_id) AS number_of_images,
        ROUND(SUM(datafile_size) / 10e9, 1) AS total_image_size_gb
    FROM session
    JOIN dataset USING (session_id)
    JOIN stream USING (dataset_id)
    JOIN datafile USING (stream_id)
    GROUP BY session_id
) USING (session_id)
ORDER BY session_id;
```
License: CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)
License information was derived automatically
This master metadata spreadsheet documents all of the Gede ruins heritage items published by the Zamani Project. The information in this site description is provided for contextual purposes only and should not be regarded as a primary source.

Gede is a Swahili archaeological site comprising coral stone structures, including mosques, houses, and tombs arranged within a walled town layout. Architectural features such as mihrabs, water cisterns, and decorative niches reflect Islamic influence and urban planning. Excavations have revealed trade goods and domestic artifacts, indicating participation in Indian Ocean commerce. Gede provides insights into Swahili cultural identity, religious practice, and economic networks. Gede is listed as the UNESCO World Heritage Site 'The Historic Town and Archaeological Site of Gedi'.

The Zamani Project seeks to increase awareness and knowledge of tangible cultural heritage in Africa and internationally by creating metrically accurate digital representations of historical sites. Digital spatial data of cultural heritage sites can be used for research and education, for restoration and conservation, and as a record for future generations. The Zamani Project operates as a non-profit organisation within the University of Cape Town.

Special thanks to the Saville Foundation and the Andrew W. Mellon Foundation, among others, for their contributions to the digital documentation of this heritage site. If you believe any information in this description is incorrect, please contact the repository administrators.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset comprises metadata of articles citing retracted publications. We originally obtained the DOIs from the Feet of Clay Detector of the Problematic Paper Screener (PPS-FoCD), which flags publications that cite retracted articles. Columns that were not provided by PPS were added with additional metadata from the Crossref & Retraction Watch Database (CRxRW) and the Dimensions API.
By querying the Dimensions API with the DOIs of the FoC articles, we acquired information such as more detailed document types (editorial, review article, research article), open access status (we only kept open access FoC articles in the dataset, since we want to access the full texts in the future), and research fields, classified according to the Australian and New Zealand Standard Research Classification (ANZSRC) Fields of Research (FoR), which comprises 23 main fields such as biological sciences and education.
To get further information about the cited retracted articles in the dataset, we used the joint CRxRW release. From it, we added the retraction reasons and retraction years.
The original dataset was obtained from the PPS FoCD in December 2023. At that time, 22,558 articles in total were flagged in FoCD. Using the data filtering feature in PPS, we made a preliminary selection before downloading the first version of the dataset. We applied a filter to obtain:
More information about the usage of this dataset will be updated.
*The current retraction status of the citing articles may differ, since this is a static dataset and the scientific literature is dynamic.
The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public domain academic documents, which include undergraduate and postgraduate theses, research and professional articles, along with other academic document types. The data within the dataset was collected as part of the establishment of the Slovenian Open-Access Infrastructure, which defined a unified document collection and cataloguing process for universities in Slovenia within the infrastructure repositories. The data was collected from several already established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. It consists of text and numerical fields representing attributes that describe documents. These attributes include document titles, keywords, abstracts, typologies, authors, issue years, and other identifiers such as URL and UDC. The potential of this dataset lies especially in text mining and text classification tasks, and it can also be used in the development or benchmarking of content-based recommender systems on real-world data.
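As a hedged illustration of the text-classification use case (the file name and column names such as abstract and typology are assumptions about how the metadata scheme might be exported):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumed export file and column names; adjust to the actual metadata scheme.
docs = pd.read_csv("openscience_slovenia_metadata.csv").dropna(subset=["abstract", "typology"])

X_train, X_test, y_train, y_test = train_test_split(
    docs["abstract"], docs["typology"], test_size=0.2, random_state=0
)

# Classify document typology (e.g. thesis vs. article) from abstract text.
model = make_pipeline(TfidfVectorizer(max_features=50_000), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```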
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
FSCBAC Book Metadata Dataset
This dataset contains structured metadata for children's books aligned with the FSCBAC Standard 3.1.0. It provides machine-readable entries describing book structure, linguistic load, emotional intensity, visual load, developmental purpose, and recommended usage. The dataset does not modify or extend the FSCBAC Standard. It functions as a dataset-only layer and references the standard externally.
Files Included
books.json — full dataset of… See the full description on the dataset page: https://huggingface.co/datasets/fscbac-standard/fscbac-book-metadata-dataset.
Point of Interest (POI) is defined as an entity (such as a business) at a ground location (point) which may be (of interest). We provide high-quality POI data that is fresh, consistent, customizable, easy to use and with high-density coverage for all countries of the world.
This is our process flow:
Our machine learning systems continuously crawl for new POI data
Our geoparsing and geocoding calculates their geo locations
Our categorization systems cleanup and standardize the datasets
Our data pipeline API publishes the datasets on our data store
A new POI comes into existence. It could be a bar, a stadium, a museum, a restaurant, a cinema, a store, etc. In today's interconnected world, its information will appear very quickly in social media, pictures, websites, and press releases. Soon after that, our systems will pick it up.
POI data is in constant flux. Every minute, worldwide, over 200 businesses move, over 600 new businesses open their doors, and over 400 businesses cease to exist. Over 94% of all businesses have a public online presence of some kind, which is how we track such changes: when a business changes, its website and social media presence change too. We then extract and merge the new information, thus creating the most accurate and up-to-date business information dataset across the globe.
We offer our customers perpetual data licenses for any dataset representing this ever-changing information, downloaded at any given point in time. This makes our company's licensing model unique in the current Data-as-a-Service (DaaS) industry. Our customers don't have to delete our data after the expiration of a certain "Term", regardless of whether the data was purchased as a one-time snapshot or via our data update pipeline.
Customers requiring regularly updated datasets may subscribe to our Annual subscription plans. Our data is continuously being refreshed, therefore subscription plans are recommended for those who need the most up to date data. The main differentiators between us vs the competition are our flexible licensing terms and our data freshness.
Data samples may be downloaded at https://store.poidata.xyz/us
License: Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
This dataset contains additional metadata on datasets, useful for the automatic registration of datasets on the data.gov.uk system.
License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
This dataset contains the metadata of the datasets published in 118 Dataverse installations, information about the metadata blocks of 118 installations, and the lists of pre-defined licenses or dataset terms that depositors can apply to datasets in the 100 installations that were running versions of the Dataverse software that include the "multiple-license" feature. The data is useful for improving understandings about how certain Dataverse features and metadata fields are used and for learning about the quality of dataset and file-level metadata within and across Dataverse installations.

How the metadata was downloaded

The dataset metadata and metadata block JSON files were downloaded from each installation between August 25 and September 2, 2025 using a Python script that uses the Dataverse API.

How the files are organized

├── csv_files_with_metadata_from_most_known_dataverse_installations
│   ├── author(citation)_2025.08.25-2025.09.02.csv
│   ├── contributor(citation)_2025.08.25-2025.09.02.csv
│   ├── data_source(citation)_2025.08.25-2025.09.02.csv
│   ├── ...
│   └── topic_classification(citation)_2025.08.25-2025.09.02.csv
├── dataverse_json_metadata_from_each_known_dataverse_installation
│   ├── Abacus_2025.08.26_07.14.00.zip
│   │   ├── dataset_pids_Abacus_2025.08.26_07.14.00.csv
│   │   ├── Dataverse_JSON_metadata_2025.08.26_07.14.00
│   │   │   ├── hdl_11272.1_AB2_0AQZNT_v1.0(latest_version).json
│   │   │   └── ...
│   │   └── metadatablocks_v5.9
│   │       ├── astrophysics_v5.9.json
│   │       ├── biomedical_v5.9.json
│   │       ├── citation_v5.9.json
│   │       ├── ...
│   │       └── socialscience_v5.6.json
│   ├── ACSS_Dataverse_2025.08.25_15.45.25.zip
│   ├── ...
│   └── Yale_Dataverse_2025.08.25_11.51.29.zip
├── dataverse_installations_summary_2025.09.02.csv
├── dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv
├── license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv
└── metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv

This dataset contains two directories and four CSV files not in a directory.

One directory, "csv_files_with_metadata_from_most_known_dataverse_installations", contains 20 CSV files that list the values of many of the metadata fields in the "Citation" and "Geospatial" metadata blocks of datasets in the 118 Dataverse installations. For example, author(citation)_2025.08.25-2025.09.02.csv contains the "Author" metadata for the latest versions of all published, non-deaccessioned datasets in the 118 installations, with a column for each of the four child fields: author name, affiliation, identifier type, and identifier.

The other directory, "dataverse_json_metadata_from_each_known_dataverse_installation", contains 118 zip files, one zip file for each of the 118 Dataverse installations whose sites were functioning when I attempted to collect their metadata and that have at least one published dataset. Each zip file contains:
- A CSV file listing information about the datasets published in the installation, including a column to indicate if the Python script was able to download the Dataverse JSON metadata for each dataset.
- A directory with JSON files that have information about the installation's metadata fields, such as the field names and how they're organized.
- A directory of JSON files that contain the metadata of the installation's published, non-deaccessioned dataset versions in the Dataverse JSON metadata schema.

The dataverse_installations_summary_2025.09.02.csv file contains information about each installation, including its name, URL, Dataverse software version, and counts of dataset metadata included and not included in this dataset.

The dataset_pids_from_most_known_dataverse_installations_2025.08.25-2025.09.02.csv file contains the dataset PIDs of published datasets in the 118 Dataverse installations, with a column to indicate if the Python script was able to download the dataset's metadata. It is a union of all "dataset_pids_....csv" files in each of the 118 zip files in the dataverse_json_metadata_from_each_known_dataverse_installation directory.

The license_options_for_each_dataverse_installation_2025.08.29_14.58.36.csv file contains information about the licenses and data use agreements that some installations let depositors choose when creating datasets. When I collected this data, 100 of the available 118 installations were running versions of the Dataverse software that allow depositors to choose a "predefined license or data use agreement" from a dropdown menu in the dataset deposit form. For more information about this Dataverse feature, see https://guides.dataverse.org/en/6.7/user/dataset-management.html#choosing-a-license.

The metadatablocks_from_most_known_dataverse_installations_2025.08.29.csv file contains the metadata block names, field names, child field names (if the field is a compound field), display names, descriptions/tooltip text, watermarks, and controlled vocabulary values of the fields in the 118 Dataverse installations' metadata blocks. This file is useful for learning...
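A minimal sketch for getting started with the top-level CSV files (the file names are taken from the listing above; the column headers inside them are not assumed here):

```python
import pandas as pd

# Summary of the 118 installations: name, URL, software version, and metadata counts.
summary = pd.read_csv("dataverse_installations_summary_2025.09.02.csv")
print(summary.shape)
print(summary.columns.tolist())

# One of the 20 field-level CSV files, e.g. the "Author" metadata across installations.
authors = pd.read_csv("author(citation)_2025.08.25-2025.09.02.csv")
print(len(authors), "author rows")
```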
Metadata for the data collected at the NEES@UCSB Garner Valley Downhole Array field site on September 10-12, 2013 as part of the larger PoroTomo project.
mrfakename/metadata dataset hosted on Hugging Face and contributed by the HF Datasets community.