Excel spreadsheets by species (the 4-letter code abbreviates the genus and species used in the study; 2010 or 2011 is the year the data were collected; SH indicates data for Science Hub; the date is the date of file preparation). The data in each file are described in a read-me worksheet, which is the first worksheet in the file. Each row in a species spreadsheet is for one plot (plant). The data themselves are in the data worksheet. One file includes a read-me description of the columns in the data set for chemical analysis. In this file, each row is an herbicide treatment and sample for chemical analysis (if taken). This dataset is associated with the following publication: Olszyk, D., T. Pfleeger, T. Shiroyama, M. Blakely-Smith, E. Lee, and M. Plocher. Plant reproduction is altered by simulated herbicide drift to constructed plant communities. ENVIRONMENTAL TOXICOLOGY AND CHEMISTRY. Society of Environmental Toxicology and Chemistry, Pensacola, FL, USA, 36(10): 2799-2813, (2017).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Meta Kaggle Code is an extension to our popular Meta Kaggle dataset. This extension contains all the raw source code from hundreds of thousands of public, Apache 2.0 licensed Python and R notebook versions on Kaggle used to analyze Datasets, make submissions to Competitions, and more. This represents nearly a decade of data spanning a period of tremendous evolution in the ways ML work is done.
By collecting all of this code created by Kaggle’s community in one dataset, we hope to make it easier for the world to research and share insights about trends in our industry. With the growing significance of AI-assisted development, we expect this data can also be used to fine-tune models for ML-specific code generation tasks.
Meta Kaggle for Code is also a continuation of our commitment to open data and research. This new dataset is a companion to Meta Kaggle which we originally released in 2016. On top of Meta Kaggle, our community has shared nearly 1,000 public code examples. Research papers written using Meta Kaggle have examined how data scientists collaboratively solve problems, analyzed overfitting in machine learning competitions, compared discussions between Kaggle and Stack Overflow communities, and more.
The best part is Meta Kaggle enriches Meta Kaggle for Code. By joining the datasets together, you can easily understand which competitions code was run against, the progression tier of the code’s author, how many votes a notebook had, what kinds of comments it received, and much, much more. We hope the new potential for uncovering deep insights into how ML code is written feels just as limitless to you as it does to us!
While we have made an attempt to filter out notebooks containing potentially sensitive information published by Kaggle users, the dataset may still contain such information. Research, publications, applications, etc. relying on this data should only use or report on publicly available, non-sensitive information.
The files contained here are a subset of the KernelVersions in Meta Kaggle. The file names match the ids in the KernelVersions csv file. Whereas Meta Kaggle contains data for all interactive and commit sessions, Meta Kaggle Code contains only data for commit sessions.
The files are organized into a two-level directory structure. Each top-level folder contains up to 1 million files; e.g., folder 123 contains all versions from 123,000,000 to 123,999,999. Each subfolder contains up to 1 thousand files; e.g., 123/456 contains all versions from 123,456,000 to 123,456,999. In practice, each folder will have far fewer than 1 thousand files due to private and interactive sessions.
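As a rough illustration of the folder layout described above, the following sketch maps a KernelVersions id to its two-level folder. The function name `version_folder` is hypothetical, and the sketch assumes folder names are plain integers without zero padding, which the description does not confirm.

```python
from pathlib import PurePosixPath


def version_folder(version_id: int) -> PurePosixPath:
    """Return the two-level folder that would hold a given version id."""
    if version_id < 0:
        raise ValueError("version_id must be non-negative")
    top = version_id // 1_000_000        # millions part, e.g. 123 for 123,456,789
    sub = (version_id // 1_000) % 1_000  # thousands part, e.g. 456 for 123,456,789
    # Assumption: folder names are not zero-padded (e.g. "1", not "001").
    return PurePosixPath(str(top)) / str(sub)


print(version_folder(123_456_789))  # 123/456
```

If the real folders turn out to be zero-padded, only the two `str(...)` conversions would need to change.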
The ipynb files in this dataset hosted on Kaggle do not contain the output cells. If the outputs are required, the full set of ipynbs with the outputs embedded can be obtained from this public GCS bucket: kaggle-meta-kaggle-code-downloads. Note that this is a "requester pays" bucket. This means you will need a GCP account with billing enabled to download. Learn more here: https://cloud.google.com/storage/docs/requester-pays
We love feedback! Let us know in the Discussion tab.
Happy Kaggling!
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
"The BBC's Great Debate" was broadcast live in the UK by the BBC on Tuesday 21 June 2016 between 20:00 and 22:00 BST. It saw activity on Twitter with the #BBCDebate hashtag. I collected some of the Tweets tagged with #BBCDebate using a Google Spreadsheet.
The raw data was downloaded as an Excel spreadsheet file containing an archive of 38,166 Tweets (38,066 unique Tweets) publicly published with the queried hashtag (#BBCDebate) between 14/06/2016 22:03:18 and 22/06/2016 09:12:32 BST. Due to the expected high volume of Tweets, only users with at least 10 followers were included in the archive. The Tweets contained in the Archive sheet were collected using Martin Hawksey’s TAGS 6.0.
Given the relatively large volume of activity expected around #BBCDebate and the public and political nature of the hashtag, I have only shared indicative data. Neither full Tweets nor any other associated metadata have been shared. The dataset contains a metrics summary as well as a table with column headings labeled created_at, time, geo_coordinates (anonymised; YES indicates that data was present, and the cell has been left blank where no data was present), user_lang and user_followers_count, with data corresponding to each Tweet.
Timestamps should suffice to prove the existence of the Tweets and could be useful to run analyses of activity on Twitter around a real-time media event.
No personally identifiable information (PII) or sensitive personal information (SPI) was collected or contained in the dataset. Some basic deduplication and refining of the collected data was performed.
I have shared the anonymised dataset, including the extra tables, as a sample and as an act of citizen scholarship in order to archive, document and encourage open educational and historical research and analysis. It is hoped that by sharing the data someone else might be able to run different analyses and ideally discover different or more significant insights.
For more information, including methodological issues and limitations, please click on the references listed below.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data files presented relate to a pre-registered systematic review and meta-analysis of outpatient CBT for anorexia nervosa. It was conducted to assess the effectiveness of outpatient CBT for anorexia nervosa and explore potential moderators in order to inform clinical practice. Preregistration: PROSPERO (CRD42023484924).
The following documents are intended for reading alongside the published paper: http://dx.doi.org/10.1080/16506073.2025.2465745 (published March 2025).
The provided documents are:
- An Excel file containing a workbook with the dataset used in this review, named "main datafile". The front tab serves as a contents page, and further details on how the data were obtained are provided in PROSPERO and the accompanying Word document.*
- A zip file with CSV versions of each sheet in the above Excel workbook.
- A Word document titled Reviewer Guidelines for Full Paper Screening, Data Extraction, and Quality Assessment. This document contains the instructions the reviewers used for screening, data extraction, and quality assessment. A .txt version of this file is also provided.
- An Excel file containing the risk of bias assessments conducted for the included studies. The first tab provides overall guidance for each of the three risk-of-bias assessments conducted. Each set of assessments has another tab with the guidance followed by the reviewer for that set of assessments.
- A zip file with CSV versions of each sheet in the above Excel workbook.
- A Word document titled Holm-Bonferroni Corrections for Results Tables. A .txt version of this file is also provided.
- Populated Meta-Essentials workbooks for each of the included meta-analyses (.csv versions of these workbooks are not provided as Meta-Essentials only works in Excel). There are workbooks for each of the following variables: weight, eating disorder symptoms, depression, anxiety, and quality of life.
Note. If you download any of the Meta-Essentials workbooks, please keep the following in mind:
- Workbooks must be opened in Microsoft Excel only, as other spreadsheet programs (e.g., Google Sheets, Numbers) will not run the calculations correctly.
- When you open a workbook, Excel may display a security warning. Please click “Enable Editing” and, if prompted, “Enable Content” so that all formulas and functions work properly.
- All formulas, references, and embedded calculations should work exactly as in the original file. The online preview in ORDA may not display calculations correctly, but they should work in the downloaded files.
If the files do not work for you, or if you wish to run analyses with the data file yourself, we have provided the dataset and you can download the Meta-Essentials workbooks at: https://www.eur.nl/en/erim/research-support/meta-essentials/download
*Note. Rossi et al. (2023) was corrected to Rossi et al. (2024) in the published paper.
Typically, e-commerce datasets are proprietary and consequently hard to find among publicly available data. However, the UCI Machine Learning Repository has made available this dataset containing actual transactions from 2010 and 2011. The dataset is maintained on their site, where it can be found under the title "Online Retail".
"This is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers."
Per the UCI Machine Learning Repository, this data was made available by Dr Daqing Chen, Director: Public Analytics group. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.
Image from stocksnap.io.
Analyses for this dataset could include time series, clustering, classification and more.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Project is still being worked on.
Initially, this dataset was just for a Google Data Analytics project, where I was given a task to accomplish with the data in a spreadsheet: look at the table given in the spreadsheet, and see if there's a correlation between temperature and revenue in ice cream sales. Eventually, I did see the pattern: higher temperatures usually meant more revenue, which seems realistic. However, I wanted to dig further into the data and perform a deeper analysis using a visualization, and maybe even a regression. My new questions were, "How strong is this correlation?" and "Can we represent the data using a linear regression?"
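The two questions above ("How strong is this correlation?" and "Can we represent the data using a linear regression?") can be sketched in plain Python. This is a minimal illustration, not the author's analysis: the helper names are hypothetical and the temperature and revenue values are made-up placeholders, not rows from the spreadsheet.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Placeholder values for illustration only (NOT taken from the dataset).
temperature = [20.0, 25.0, 30.0, 35.0]
revenue = [200.0, 260.0, 310.0, 370.0]

r = pearson_r(temperature, revenue)
slope, intercept = fit_line(temperature, revenue)
print(f"r = {r:.3f}, revenue ~ {slope:.2f} * temperature + {intercept:.2f}")
```

A value of r close to 1 would indicate a strong positive linear relationship, and the slope estimates the change in revenue per degree of temperature.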
Any aspiring data scientist will look at everything through the lens of data, even when chilling with friends, watching live cricket and cheering for a favorite team.
It includes ODI, Test, and T20 statistics for all players in all three categories (batting, bowling and fielding).
We wouldn't be here without the help of cricket. Thank you to all the great cricketers for their wonderful contributions.
https://creativecommons.org/publicdomain/zero/1.0/
By [source]
This dataset contains the coordinates of 11 anatomical landmarks on 14,354 pairs of field-collected tsetse fly wings. The landmarks were located automatically by a two-tier deep learning method, and they are essential for morphological or biological research on the species Glossina pallidipes and G. m. morsitans. Capturing these data points accurately is both difficult and time-consuming, which makes the two-tier method employed here a valuable resource for researchers. Columns include morphology data such as wing length measurements, landmark locations, host collections, collection dates/months/years, morphometric data strings and more, allowing detailed analysis of tsetse fly wing characteristics and of larger questions in biology and evolution.
Step 1: Download the data from Kaggle. Unzip it and open it in your favorite spreadsheet software (e.g., Excel or Google Sheets).
Step 2: Become familiar with the two available data fields in ALDTTFW: wing length measurement 'wlm' and distance between left and right wings 'dis_l'. These two pieces of information are helpful when analyzing wing-pair morphology within a larger sample, as they allow researchers to identify discrepancies between multiple sets of wings in a given group quickly and easily.
Step 3: Take note of each wing's landmark coordinates, which can be found under columns lmkl through lmkr. There are 11 landmarks measured on each left and right wing (e.g., 'L x1' is the X coordinate of the first landmark on the left wing).
Step 4: Make sure that both wings have been labeled accurately by checking their respective quality grades, found under columns 'left_good' and 'right_good'. A grade of 0 or 1 indicates whether background noise is present, which could result in an inaccurate set of landmark points later during analysis; the grade should therefore be 1 before continuing with further steps.
Step 5: Calculate pertinent averages from the given values, such as overall wing span or distances between anatomical landmarks. These averages indicate whether particular traits distinguish the groups being compared.
Lastly, always double-check accuracy! It is advised that you consult the existing literature on the locations of specific anatomical landmarks prior to making any final conclusions from your
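Steps 4 and 5 above could be sketched as follows, using only the column names mentioned in the text ('wlm', 'left_good', 'right_good'). The function name and the inline sample rows are hypothetical, and the sketch assumes the quality grades are stored as the text values "0" and "1"; the real file may encode them differently.

```python
import csv
import io
from statistics import mean


def mean_wing_length(rows):
    """Average 'wlm' over wing pairs whose two quality grades are both 1."""
    lengths = [
        float(row["wlm"])
        for row in rows
        # Assumption: grades are the strings "0"/"1"; skip rows with no length.
        if row["left_good"] == "1" and row["right_good"] == "1" and row["wlm"]
    ]
    return mean(lengths) if lengths else None


# Hypothetical rows for illustration only (NOT taken from the dataset).
sample = io.StringIO(
    "vpn,wlm,left_good,right_good\n"
    "A,1.50,1,1\n"
    "B,1.70,1,0\n"
    "C,1.60,1,1\n"
)
rows = list(csv.DictReader(sample))
print(mean_wing_length(rows))
```

To run this against the real file, the `io.StringIO` sample would be replaced by an opened `morphometric_data.csv`.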
- Comparing the morphology of tsetse fly wings across different host species, locations, and/or collections.
- Creating classification algorithms for morphometric analysis that use deep learning architectures for automatic landmark detection.
- Developing high-resolution identification methods (or markers) to distinguish between tsetse fly species and subspecies based on their wing anatomy landmarks.
If you use this dataset in your research, please credit the original authors. Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: morphometric_data.csv
| Column name | Description |
|:---------------|:----------------------------------------------|
| vpn | Unique identifier for the wing pair. (String) |
| cd | Collection date. (Date) |
| cm | Collection month. (Integer) |
| cy | Collection year. (Integer) |
| md | Morphometric data. (String) |
| g | Genus. (String) |
| wlm | Wing length measurem...
Attribution-NoDerivs 4.0 (CC BY-ND 4.0) https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
The data were collected from a smartly equipped greenhouse as part of the master's thesis research conducted by student Mohammed Ismail Lifta (2023-2024) at the Department of Computer Science, College of Computer Science and Mathematics, Tikrit University, Iraq. The study was supervised by Assistant Professor Wissam Dawood Abdullah, Director of the Cisco Networking Academy at Tikrit University. It involved the construction of a smart greenhouse equipped with advanced technologies for monitoring and controlling environmental conditions. The study included an application that links data to Google Sheets for remote monitoring and control, providing an effective platform for efficient management of the greenhouse. (13 features, 37,923 rows)
- date (datetime64): The date and time the measurements were recorded.
- temperature (int64): The recorded temperature in degrees Celsius.
- humidity (int64): The percentage of humidity in the environment.
- water_level (int64): The water level as a percentage.
- N (int64): The nitrogen level in the soil, scaled from 0 to 255.
- P (int64): The phosphorus level in the soil, scaled from 0 to 255.
- K (int64): The potassium level in the soil, scaled from 0 to 255.
- Fan_actuator_OFF (float64): Indicator for the fan actuator if it is off (0 or 1).
- Fan_actuator_ON (float64): Indicator for the fan actuator if it is on (0 or 1).
- Watering_plant_pump_OFF (float64): Indicator for the plant watering pump if it is off (0 or 1).
- Watering_plant_pump_ON (float64): Indicator for the plant watering pump if it is on (0 or 1).
- Water_pump_actuator_OFF (float64): Indicator for the water pump actuator if it is off (0 or 1).
- Water_pump_actuator_ON (float64): Indicator for the water pump actuator if it is on (0 or 1).
The data was cleaned by removing duplicate rows and missing values. Categorical columns were encoded using One-Hot Encoding technique to facilitate the use of the data in machine learning. The file is ready for analysis and modeling using machine learning tools.
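Because each actuator is one-hot encoded as a pair of `_OFF` / `_ON` columns, it can be convenient to collapse each pair back into a single state for inspection. The sketch below is one possible way to do that; the helper name `actuator_state` is hypothetical, and it assumes each row is available as a dict keyed by the column names listed above.

```python
def actuator_state(row, prefix):
    """Collapse a one-hot <prefix>_ON / <prefix>_OFF pair into "ON" or "OFF".

    Returns None when the pair is inconsistent (both set or both unset).
    """
    on = float(row[f"{prefix}_ON"])
    off = float(row[f"{prefix}_OFF"])
    if on == 1 and off == 0:
        return "ON"
    if off == 1 and on == 0:
        return "OFF"
    return None


# Hypothetical row for illustration only (NOT taken from the dataset).
row = {"Fan_actuator_ON": 1.0, "Fan_actuator_OFF": 0.0}
print(actuator_state(row, "Fan_actuator"))  # ON
```

Returning None for inconsistent pairs makes it easy to count rows where the encoding does not behave as expected.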
License: Licensed under CC BY-ND.
This data can be used for environmental research and studies. Proper attribution must be given when using this data in any publication. Do not modify the dataset.
For more information or inquiries, please contact the principal researcher: Assistant Professor Wisam Dawood Abdullah (Email: wisamdawood@tu.edu.iq).
https://creativecommons.org/publicdomain/zero/1.0/
UPDATED EVERY WEEK Last Update - 26th July 2025
Disclaimer!!! Data uploaded here are collected from the internet and some Google Drive. The sole purpose of uploading these data is to provide the Kaggle community with a good source of data for analysis and research. I don't own these datasets and am not legally responsible for them by any means. I am not charging anything (either money or any favor) for this dataset. RESEARCH PURPOSE ONLY
This data contains all the indices of NSE.
NIFTY 50,
NIFTY BANK,
NIFTY 100,
NIFTY COMMODITIES,
NIFTY CONSUMPTION,
NIFTY FIN SERVICE,
NIFTY IT,
NIFTY INFRA,
NIFTY ENERGY,
NIFTY FMCG,
NIFTY AUTO,
NIFTY 200,
NIFTY ALPHA 50,
NIFTY 500,
NIFTY CPSE,
NIFTY GS COMPSITE,
NIFTY HEALTHCARE,
NIFTY CONSR DURBL,
NIFTY LARGEMID250,
NIFTY INDIA MFG,
NIFTY IND DIGITAL,
INDIA VIX
Nifty 50 index data at 1-minute intervals. The dataset contains OHLC (Open, High, Low, and Close) prices from Jan 2015 to Aug 2024.
- This dataset can be used for time series analysis, regression problems, and time series forecasting, both one step and multiple steps ahead.
- Options data can be integrated with this minute data to get more insight.
- Different backtesting strategies can be built on this data.
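A common first step with 1-minute OHLC data is to aggregate it into coarser bars before forecasting or backtesting. The sketch below shows one way to do that in plain Python; the function name `resample_ohlc`, the tuple layout, and the sample prices are all assumptions for illustration and do not reflect the dataset's actual column layout.

```python
from datetime import datetime


def resample_ohlc(bars, minutes=5):
    """Aggregate time-sorted (timestamp, open, high, low, close) bars.

    Each output bar keeps the first open, the highest high, the lowest low
    and the last close within its window.
    """
    if minutes <= 0 or 60 % minutes != 0:
        raise ValueError("minutes must be a positive divisor of 60")
    buckets = {}  # dicts preserve insertion order, so output stays sorted
    for ts, o, h, l, c in bars:
        start = ts.replace(minute=ts.minute - ts.minute % minutes,
                           second=0, microsecond=0)
        if start not in buckets:
            buckets[start] = [o, h, l, c]
        else:
            agg = buckets[start]
            agg[1] = max(agg[1], h)  # running high
            agg[2] = min(agg[2], l)  # running low
            agg[3] = c               # latest close
    return [(start, *values) for start, values in buckets.items()]


# Hypothetical prices for illustration only (NOT taken from the dataset).
bars = [
    (datetime(2015, 1, 9, 9, 15), 100.0, 101.0, 99.0, 100.5),
    (datetime(2015, 1, 9, 9, 16), 100.5, 102.0, 100.0, 101.0),
    (datetime(2015, 1, 9, 9, 20), 101.0, 101.5, 100.5, 101.2),
]
for bar in resample_ohlc(bars, minutes=5):
    print(bar)
```

Windows are aligned to the clock (09:15, 09:20, and so on), so a session that opens at a time not divisible by the window length would need its own alignment rule.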
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains daily stock price data for 16 publicly listed companies under the Tata Group, covering the period from 2006 to 2025. The data includes key trading metrics such as Open, Close, High, Low, Volume, and Date for each company. It was sourced using the GOOGLEFINANCE() function in Google Sheets, and then cleaned and standardized — including proper formatting of the date column and converting all numeric values to appropriate data types — to make it ready for analysis.
My inspiration for creating this dataset stems from the remarkable legacy of Ratan Tata and the way the Tata Group has served India across generations. Their ethical leadership, innovation, and contributions to nation-building have always inspired me. This dataset is not just about numbers — it's about documenting the financial journey of a group that has deeply impacted lives, industries, and society.
By making this dataset public, I hope it helps analysts, researchers, and students explore market behavior, practice forecasting models, and draw insights from the evolution of one of India’s most respected business conglomerates.