52 datasets found

World's biggest companies dataset
kaggle.com
Updated Feb 2, 2023
Cite
Maryna Shut (2023). World's biggest companies dataset [Dataset]. https://www.kaggle.com/marshuu/worlds-biggest-companies-dataset/discussion
Explore at:
Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 2, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Maryna Shut
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
World
Description
The dataset contains information about the world's biggest companies.

Among them you can find companies founded in the US, the UK, Europe, Asia, South America, South Africa, and Australia.

The dataset contains the year each company was founded, its revenue and net income for the years 2018 - 2020, and its industry.

I have included 2 csv files: the raw csv file if you want to practice cleaning the data, and the clean csv ready to be analyzed.

The third dataset includes the names of all the companies included in the previous datasets and 2 additional columns: number of employees and name of the founder.

In addition, there is a tesla.csv file containing share prices for Tesla.
Data from: Company Financials Dataset
kaggle.com
Updated Aug 1, 2023
Cite
Atharva Arya (2023). Company Financials Dataset [Dataset]. https://www.kaggle.com/datasets/atharvaarya25/financials
Dataset updated
Aug 1, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Atharva Arya
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
This dataset requires a lot of preprocessing and offers interesting EDA insights for a company. It consists of sales and profit data sorted by market segment and country/region.

Tips for pre-processing:
1. Check the column names and fix any errors you find there first.
2. Remove the '$' sign and '-' from all columns where they are present.
3. Change the datatype from object to int after the two steps above.
4. Challenge: try removing "," (comma) from all numeric values.
5. Try plotting sales and profit against the timeline.
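The cleaning tips above can be sketched in plain Python. This is a minimal sketch, not the dataset author's method: the function names `clean_header` and `clean_amount` are hypothetical, the sample cells are invented for illustration, and the handling of a lone dash and of parentheses is an assumption about how the raw file encodes empty and negative amounts.

```python
def clean_header(name: str) -> str:
    """Strip stray whitespace from a column name (tip 1)."""
    return name.strip()


def clean_amount(raw: str) -> float:
    """Convert a currency string such as ' $1,234.50 ' to a float (tips 2 to 4)."""
    text = raw.replace("$", "").replace(",", "").strip()
    # Assumption: a lone dash marks an empty cell and is mapped to 0.0.
    if text in ("", "-"):
        return 0.0
    # Assumption: accounting-style parentheses mark a negative amount.
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])
    return float(text)


# Hypothetical raw cells, not taken from the dataset.
for cell in [" $1,234.50 ", " $-   ", "$(4,533.75)"]:
    print(repr(cell), "->", clean_amount(cell))
```

If whole numbers are needed, as tip 3 suggests, the result can be passed through `int(round(value))` once the fractional part is known to be irrelevant.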
Predictive Maintenance Dataset
kaggle.com
Updated Nov 7, 2022
Cite
Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
Dataset updated
Nov 7, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Himanshu Agarwal
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted.

The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column to predict is called failure, with binary value 0 for non-failure and 1 for failure.
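One way to keep track of false positives and false negatives is to count them directly and derive precision and recall. The sketch below assumes predictions have already been thresholded to 0 or 1; the function names and the sample labels are hypothetical and are not part of the dataset.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means failure."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1  # false alarm: maintenance triggered without a failure
        elif truth == 1 and pred == 0:
            fn += 1  # missed failure
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision penalizes false positives, recall penalizes false negatives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels for six device-days.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))
print(precision_recall(y_true, y_pred))
```

Because failures are usually rare in this kind of data, plain accuracy can look high even when most failures are missed, which is why the sketch reports precision and recall instead.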
civil_comments
tensorflow.org
huggingface.co
Updated Feb 28, 2023
Cite
(2023). civil_comments [Dataset]. https://www.tensorflow.org/datasets/catalog/civil_comments
Dataset updated
Feb 28, 2023
Description
This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers. The toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.
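To make the label semantics concrete, here is a small sketch of how a fractional label of this kind can be computed and binarized. The helper names, the sample votes, and the 0.5 threshold are all assumptions for illustration; the dataset already ships the fractions, so this only mirrors the definition given above.

```python
def label_fraction(votes):
    """Fraction of annotators (0.0 to 1.0) who assigned the label."""
    if not votes:
        raise ValueError("at least one annotator vote is required")
    return sum(1 for vote in votes if vote) / len(votes)


def is_positive(fraction, threshold=0.5):
    """Binarize a fractional label; the 0.5 threshold is an assumption."""
    return fraction >= threshold


# Hypothetical votes from four annotators for one comment.
votes = [True, False, True, True]
fraction = label_fraction(votes)
print(fraction, is_positive(fraction))
```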

The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.

The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.

For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.

To use this dataset:

    import tensorflow_datasets as tfds

    ds = tfds.load('civil_comments', split='train')
    for ex in ds.take(4):
        print(ex)

See the guide for more information on tensorflow_datasets.
‘Google Stock Data’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Google Stock Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-google-stock-data-1a5f/latest
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Google Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varpit94/google-stock-data on 28 January 2022.

--- Dataset description provided by original source is as follows ---

What is Google?

Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware. It is considered one of the Big Five companies in the American information technology industry, along with Amazon, Facebook, Apple, and Microsoft. Google was founded on September 4, 1998, by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14% of its publicly-listed shares and control 56% of the stockholder voting power through super-voting stock. The company went public via an initial public offering (IPO) in 2004. In 2015, Google was reorganized as a wholly-owned subsidiary of Alphabet Inc. Google is Alphabet's largest subsidiary and is a holding company for Alphabet's Internet properties and interests. Sundar Pichai was appointed CEO of Google on October 24, 2015, replacing Larry Page, who became the CEO of Alphabet. On December 3, 2019, Pichai also became the CEO of Alphabet.

Information about this dataset

This dataset provides historical data of Alphabet Inc. (GOOG). The data is available at a daily level. Currency is USD.

--- Original source retains full ownership of the source dataset ---
‘🐕 Cat VS Dog popularity per state’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 13, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘🐕 Cat VS Dog popularity per state’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-cat-vs-dog-popularity-per-state-24a0/668f83a8/?iid=001-843&v=presentation
Dataset updated
Feb 13, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘🐕 Cat VS Dog popularity per state’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/cat-vs-dog-popularity-in-u-se on 13 February 2022.

--- Dataset description provided by original source is as follows ---

About this dataset

This dataset was created by Andrew Duff and contains around 0 samples along with Percentage Of Cat Owners, Mean Number Of Dogs Per Household, technical information and other features such as Percentage Of Households With Pets, Mean Number Of Cats, and more.

How to use this dataset

Analyze Percentage Of Dog Owners in relation to Number Of Pet Households (in 1000)

Study the influence of Percentage Of Cat Owners on Mean Number Of Dogs Per Household

Acknowledgements

If you use this dataset in your research, please credit Andrew Duff

--- Original source retains full ownership of the source dataset ---
Data on Palestinian Structures Israel Demolished
kaggle.com
Updated Nov 7, 2023
Cite
asaniczka (2023). Data on Palestinian Structures Israel Demolished [Dataset]. http://doi.org/10.34740/kaggle/ds/3840933
Unique identifier
https://doi.org/10.34740/kaggle/ds/3840933
Dataset updated
Nov 7, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
asaniczka
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Israel, Palestine
Description
Demolitions in the Occupied Territories is a dataset that provides statistics on the demolition of Palestinian-owned homes and structures in the Occupied Territories.

The information is based on investigations conducted by B’Tselem – The Israeli Information Center for Human Rights in the Occupied Territories.

Dataset Details:

The dataset covers a period from January 2004 to August 2023 and includes information about the date of demolition, locality, district, area, housing units, people left homeless, minors left homeless, type of structure, and reason for demolition.

Interesting Task Ideas (for Data Analysts):

Analyze the trend of demolitions over time to identify any significant patterns or changes.

Investigate the distribution of demolitions across different localities, districts, and areas to understand the geographical impact.

Explore the relationship between the number of housing units demolished and the number of people, particularly minors, left homeless.

Examine the reasons for demolitions and assess their frequency and impact.

Visualize the data using maps and charts to highlight the magnitude and geographical distribution of demolitions.

The intention of using this data should be solely for objective analysis and understanding of the situation, without any political intent. Any analysis or interpretation should be approached with sensitivity and respect for human rights.

Related Datasets:

Fatalities in the Israeli-Palestinian Conflict

If you find this dataset valuable, don't forget to hit the upvote button! 😊💝

Photo by Oleg Solodkov on Unsplash
Kaggle-post-and-comments-question-answer-topic
huggingface.co
Cite
Duverne Mathieu, Kaggle-post-and-comments-question-answer-topic [Dataset]. https://huggingface.co/datasets/Raaxx/Kaggle-post-and-comments-question-answer-topic
Authors
Duverne Mathieu
Description
This is a dataset containing 10,000 posts from Kaggle and 60,000 comments related to those posts in the question-answer topic.

Data Fields

kaggle_post

'pseudo': The question author.
'title': Title of the post.
'question': The question's body.
'vote': Voting on Kaggle is similar to liking.
'medal': I will share with you the Kaggle medal system, which can be found at https://www.kaggle.com/progression. The system awards medals to users based on… See the full description on the dataset page: https://huggingface.co/datasets/Raaxx/Kaggle-post-and-comments-question-answer-topic.
Apple iPhone 15 (15 pro, plus and pro max) Reviews
kaggle.com
Updated Sep 20, 2023
Cite
nuhmanpk (2023). Apple iPhone 15 (15 pro, plus and pro max) Reviews [Dataset]. https://www.kaggle.com/datasets/nuhmanpk/iphone-15-15-pro-pro-max-reviews
Dataset updated
Sep 20, 2023
Dataset provided by
Kaggle
Authors
nuhmanpk
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains video transcripts from a limited number of YouTubers who posted their reviews of the iPhone 15, 15 Plus, Pro, and Pro Max models. These are the videos used for the dataset. Video credits belong to the respective creators.

For more check Here
kaggle-hugomathien-soccer
huggingface.co
Cite
Julien Chaumond, kaggle-hugomathien-soccer [Dataset]. https://huggingface.co/datasets/julien-c/kaggle-hugomathien-soccer
Authors
Julien Chaumond
License
https://choosealicense.com/licenses/odbl/
Description
Source: https://www.kaggle.com/datasets/hugomathien/soccer by Hugo Mathien

About Dataset

The ultimate Soccer database for data analysis and machine learning

What you get:

+25,000 matches
+10,000 players
11 European Countries with their lead championship
Seasons 2008 to 2016
Players and Teams' attributes* sourced from EA Sports' FIFA video game series, including the weekly updates
Team line up with squad formation (X, Y coordinates)
Betting odds from up to 10 providers… See the full description on the dataset page: https://huggingface.co/datasets/julien-c/kaggle-hugomathien-soccer.
ML-ArXiv-Papers
huggingface.co
opendatalab.com
Updated Jun 29, 2022
Cite
Connor Shorten (2022). ML-ArXiv-Papers [Dataset]. https://huggingface.co/datasets/CShorten/ML-ArXiv-Papers
Dataset updated
Jun 29, 2022
Authors
Connor Shorten
License
https://choosealicense.com/licenses/afl-3.0/
Description
This dataset contains the subset of ArXiv papers with the "cs.LG" tag to indicate the paper is about Machine Learning. The core dataset is filtered from the full ArXiv dataset hosted on Kaggle: https://www.kaggle.com/datasets/Cornell-University/arxiv. The original dataset contains roughly 2 million papers. This dataset contains roughly 100,000 papers following the category filtering. The dataset is maintained by with requests to the ArXiv API. The current iteration of the dataset only contains… See the full description on the dataset page: https://huggingface.co/datasets/CShorten/ML-ArXiv-Papers.
Dataset: 23andMe Holding Co. (ME) Stock Perform...
kaggle.com
Updated Jun 21, 2024
Cite
Nitiraj Kulkarni (2024). Dataset: 23andMe Holding Co. (ME) Stock Perform... [Dataset]. https://www.kaggle.com/datasets/nitirajkulkarni/me-stock-performance/versions/1
Dataset updated
Jun 21, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nitiraj Kulkarni
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides historical stock market performance data for specific companies. It enables users to analyze and understand the past trends and fluctuations in stock prices over time. This information can be utilized for various purposes such as investment analysis, financial research, and market trend forecasting.
Dataset: Royalty Management Holding Corporation...
kaggle.com
Updated Jun 21, 2024
Cite
Nitiraj Kulkarni (2024). Dataset: Royalty Management Holding Corporation... [Dataset]. https://www.kaggle.com/datasets/nitirajkulkarni/rmco-stock-performance
Dataset updated
Jun 21, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Nitiraj Kulkarni
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset provides historical stock market performance data for specific companies. It enables users to analyze and understand the past trends and fluctuations in stock prices over time. This information can be utilized for various purposes such as investment analysis, financial research, and market trend forecasting.
NASDAQ Company Details and Listings
kaggle.com
Updated Aug 11, 2024
Cite
Ganesh Bhabad (2024). NASDAQ Company Details and Listings [Dataset]. https://www.kaggle.com/datasets/ganeshbhabad/nasdaq-company-details-and-listings
Dataset updated
Aug 11, 2024
Dataset provided by
Kaggle
Authors
Ganesh Bhabad
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
NASDAQ Listed Companies Dataset

Description:

This dataset provides comprehensive information on companies listed on the NASDAQ stock exchange. It includes essential details about each company, making it a valuable resource for financial analysis, stock market research, and investment strategies.

Features:

symbol: The unique ticker symbol used to identify the company's stock on the NASDAQ exchange.

name: The full name of the company.

currency: The currency in which the company's stock is traded.

exchange: The stock exchange where the company is listed (in this case, NASDAQ).

mic_code: The Market Identifier Code (MIC) for the NASDAQ exchange.

country: The country where the company is headquartered.

type: The type of company, such as common stock or preferred stock.

Usage: This dataset can be used for various purposes including:

Stock Market Analysis:

Analyze stock symbols, company names, and market data.

Financial Modeling:

Incorporate company details into financial models and investment strategies.

Market Research:

Understand the distribution of companies by country and currency.

Data Visualization:

Create visualizations of the NASDAQ market landscape.

Data Source:

The data is sourced from the Twelve Data API, which provides up-to-date financial and stock market information.

Notes: The dataset includes only NASDAQ-listed companies and does not cover other exchanges. Be sure to comply with any data usage policies or licensing agreements associated with the data source. Feel free to adapt the description based on the specific details and attributes of your dataset.
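The market-research use mentioned above, counting companies by country and currency, can be sketched with the standard library alone. The column names follow the feature list in the description; the three rows and the helper `count_by` are hypothetical placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows using the documented column names.
SAMPLE_CSV = """symbol,name,currency,exchange,mic_code,country,type
AAA,Example One Inc,USD,NASDAQ,XNAS,United States,Common Stock
BBB,Example Two Ltd,USD,NASDAQ,XNAS,Israel,Common Stock
CCC,Example Three Corp,USD,NASDAQ,XNAS,United States,Preferred Stock
"""


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_country = count_by(rows, "country")
by_currency = count_by(rows, "currency")
print(by_country.most_common())
print(by_currency.most_common())
```

To run this against the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle; the file name is not given in the description, so it is left out here.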
Financial Statement Data Sets
kaggle.com
Updated Jul 4, 2025
Cite
Vadim Vanak (2025). Financial Statement Data Sets [Dataset]. https://www.kaggle.com/datasets/vadimvanak/company-facts-2
Dataset updated
Jul 4, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Vadim Vanak
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset offers a detailed collection of US-GAAP financial data extracted from the financial statements of exchange-listed U.S. companies, as submitted to the U.S. Securities and Exchange Commission (SEC) via the EDGAR database. Covering filings from January 2009 onwards, this dataset provides key financial figures reported by companies in accordance with U.S. Generally Accepted Accounting Principles (GAAP).

Dataset Features:

Data Scope: The dataset is restricted to figures reported under US-GAAP standards, with the exception of EntityCommonStockSharesOutstanding and EntityPublicFloat.

Currency and Units: The dataset exclusively includes figures reported in USD or shares, ensuring uniformity and comparability. It excludes ratios and non-financial metrics to maintain focus on financial data.

Company Selection: The dataset is limited to companies with U.S. exchange tickers, providing a concentrated analysis of publicly traded firms within the United States.

Submission Types: The dataset only incorporates data from 10-Q, 10-K, 10-Q/A, and 10-K/A filings, ensuring consistency in the type of financial reports analyzed.

Data Sources and Extraction:

This dataset primarily relies on the SEC's Financial Statement Data Sets and EDGAR APIs:
- SEC Financial Statement Data Sets
- EDGAR Application Programming Interfaces

In instances where specific figures were missing from these sources, data was directly extracted from the companies' financial statements to ensure completeness.

Please note that the dataset presents financial figures exactly as reported by the companies, which may occasionally include errors. A common issue involves incorrect reporting of scaling factors in the XBRL format. XBRL supports two tag attributes related to scaling: 'decimals' and 'scale.' The 'decimals' attribute indicates the number of significant decimal places but does not affect the actual value of the figure, while the 'scale' attribute adjusts the value by a specific factor.

However, there are several instances, numbering in the thousands, where companies have incorrectly used the 'decimals' attribute (e.g., 'decimals="-6"') under the mistaken assumption that it controls scaling. This is not correct, and as a result, some figures may be inaccurately scaled. This dataset does not attempt to detect or correct such errors; it aims to reflect the data precisely as reported by the companies. A future version of the dataset may be introduced to address and correct these issues.
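A short sketch may help show why the misuse of 'decimals' produces wrongly scaled figures. It assumes that 'scale' acts as a power-of-ten exponent applied to the displayed number, and that 'decimals' is ignored when computing the value. The function names and the sample figure are hypothetical, and this is not the dataset's extraction code.

```python
def apply_scale(displayed: int, scale: int = 0):
    """Value implied by a displayed number and a 'scale' exponent (assumed power of ten)."""
    return displayed * 10 ** scale


def value_if_decimals_were_scale(displayed: int, decimals: int):
    """What a filer probably meant when 'decimals' was misused as a scaling factor."""
    return displayed * 10 ** (-decimals)


displayed = 1234  # hypothetical figure shown in a filing "in millions"

correct = apply_scale(displayed, scale=6)   # filed properly with scale="6"
as_filed = apply_scale(displayed)           # filed with decimals="-6" and no scale
likely_intended = value_if_decimals_were_scale(displayed, decimals=-6)

print(correct, as_filed, likely_intended)
```

In the mis-filed case the stored value stays at the displayed number, so it is off by a factor of one million relative to what the filer likely meant. Any such correction is a guess about intent, which is consistent with the dataset leaving the figures as reported.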

The source code for data extraction is available here
🦈 Shark Tank India dataset 🇮🇳
kaggle.com
Updated Apr 20, 2025
Cite
Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india
Dataset updated
Apr 20, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Satya Thirumani
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Shark Tank India Data set.

Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.

All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

Here is the data dictionary for the Shark Tank India seasons dataset.

Season Number - Season number

Startup Name - Company name or product name

Episode Number - Episode number within the season

Pitch Number - Overall pitch number

Season Start - Season first aired date

Season End - Season last aired date

Original Air Date - Episode original/first aired date, on OTT/TV

Episode Title - Episode title in SonyLiv

Anchor - Name of the episode presenter/host

Industry - Industry name or type

Business Description - Business Description

Company Website - Company Website URL

Started in - Year in which startup was started/incorporated

Number of Presenters - Number of presenters

Male Presenters - Number of male presenters

Female Presenters - Number of female presenters

Transgender Presenters - Number of transgender/LGBTQ presenters

Couple Presenters - Whether the presenters are wife and husband: 1-yes, 0-no

Pitchers Average Age - All pitchers average age, <30 young, 30-50 middle, >50 old

Pitchers City - Presenter's town/city or place where company head office exists

Pitchers State - Indian state pitcher hails from or state where company head office exists

Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue

Monthly Sales - Total monthly sales, in lakhs

Gross Margin - Gross margin/profit of company, in percentages

Net Margin - Net margin/profit of company, in percentages

EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization

Cash Burn - In loss in current year; burning/paying money from their pocket (yes/no)

SKUs - Stock Keeping Units or number of varieties, at the time of pitch

Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch

Bootstrapped - Startup is bootstrapped or not (yes/no)

Part of Match off - Competition between two similar brands, pitched at same time

Original Ask Amount - Original Ask Amount, in lakhs INR

Original Offered Equity - Original Offered Equity, in percentages

Valuation Requested - Valuation Requested, in lakhs INR

Received Offer - Received offer or not, 1-received, 0-not received

Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected

Total Deal Amount - Total Deal Amount, in lakhs INR

Total Deal Equity - Total Deal Equity, in percentages

Total Deal Debt - Total Deal debt/loan amount, in lakhs INR

Debt Interest - Debt interest rate, in percentages

Deal Valuation - Deal Valuation, in lakhs INR

Number of sharks in deal - Number of sharks involved in deal

Deal has conditions - Deal has conditions or not? (yes or no)

Royalty Percentage - Royalty percentage, if it's royalty deal

Royalty Recouped Amount - Royalty recouped amount, if it's royalty deal, in lakhs

Advisory Shares Equity - Deal with Advisory shares or equity, in percentages

Namita Investment Amount - Namita Investment Amount, in lakhs INR

Namita Investment Equity - Namita Investment Equity, in percentages

Namita Debt Amount - Namita Debt Amount, in lakhs INR

Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR

Vineeta Investment Equity - Vineeta Investment Equity, in percentages

Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR

Anupam Investment Amount - Anupam Investment Amount, in lakhs INR

Anupam Investment Equity - Anupam Investment Equity, in percentages

Anupam Debt Amount - Anupam Debt Amount, in lakhs INR

Aman Investment Amount - Aman Investment Amount, in lakhs INR

Aman Investment Equity - Aman Investment Equity, in percentages

Aman Debt Amount - Aman Debt Amount, in lakhs INR

Peyush Investment Amount - Peyush Investment Amount, in lakhs INR

Peyush Investment Equity - Peyush Investment Equity, in percentages

Peyush Debt Amount - Peyush Debt Amount, in lakhs INR

Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR

Ritesh Investment Equity - Ritesh Investment Equity, in percentages

Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR

Amit Investment Amount - Amit Investment Amount, in lakhs INR

Amit Investment Equity - Amit Investment Equity, in percentages

Amit Debt Amount - Amit Debt Amount, in lakhs INR

Guest Investment Amount - Guest Investment Amount, in lakhs INR

Guest Investment Equity - Guest Investment Equity, in percentages

Guest Debt Amount - Guest Debt Amount, in lakhs INR

Invested Guest Name - Name of the guest(s) who invested in deal

All Guest Names - Names of all guests who are present in the episode

Namita Present - Whether Namita is present in the episode or not

Vineeta Present - Whether Vineeta is present in the episode or not

Anupam ...
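Several fields in the data dictionary above are amounts in lakhs INR paired with equity percentages. As a sketch of how they relate, the snippet below derives a valuation from an ask amount and an equity share using the common convention valuation = amount / equity fraction. That convention is an assumption here, since the description does not state how Valuation Requested or Deal Valuation were computed, and the function names and figures are hypothetical.

```python
LAKH_IN_INR = 100_000  # 1 lakh = 100,000 rupees


def implied_valuation_lakhs(amount_lakhs: float, equity_percent: float) -> float:
    """Valuation in lakhs implied by an amount offered for a given equity percentage."""
    if equity_percent <= 0:
        raise ValueError("equity_percent must be positive")
    return amount_lakhs * 100 / equity_percent


def lakhs_to_inr(amount_lakhs: float) -> float:
    """Convert an amount in lakhs to rupees."""
    return amount_lakhs * LAKH_IN_INR


# Hypothetical pitch: 50 lakhs asked for 5% equity.
valuation = implied_valuation_lakhs(50, 5)
print(valuation, lakhs_to_inr(valuation))
```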

Pokemon Detective: Unmask Team Rocket

kaggle.com

Updated Mar 27, 2025

Cite

Kotso P (2025). Pokemon Detective: Unmask Team Rocket [Dataset]. https://www.kaggle.com/datasets/kotsop/pokmon-detective-challenge

Dataset updated

Mar 27, 2025

Dataset provided by

Kaggle

Authors

Kotso P

License

MIT License: https://opensource.org/licenses/MIT
License information was derived automatically

Description

🔍 The Case of the Disguised Villains: Predicting Team Rocket with Data

In the bustling world of Kanto, where Pokémon battles shape destinies, crime lurks in the shadows. Detective Kotso, the sharpest mind in Pokémon crime investigations, has been tasked with an urgent mission. The mayor suspects that Team Rocket has infiltrated the city, disguising themselves as ordinary citizens.

But Kotso doesn’t work alone—he relies on you, a brilliant data scientist, to uncover the truth. Your job? Analyze the data of 5,000 residents to predict which of the 1,000 unclassified individuals are secretly part of Team Rocket.

Can you spot the hidden patterns? Can Machine Learning crack the case where traditional detective work fails? The fate of Kanto depends on your skills.

📊 Dataset Structure & Features

This dataset holds the key to exposing Team Rocket’s operatives. Below is a breakdown of the features at your disposal:

Column Name	Description
ID	Unique identifier for each citizen
Age	Age of the citizen
City	City the citizen is from
Economic Status	Low, Medium, High
Occupation	Profession in the Pokémon world
Most Frequent Pokémon Type	The type of Pokémon most frequently used
Average Pokémon Level	Average level of owned Pokémon
Criminal Record	Clean (0) or Dirty (1)
Pokéball Usage	Preferred Pokéball type (e.g., DarkBall, UltraBall)
Winning Percentage	Battle win rate (e.g., 64%, 88%)
Gym Badges	Number of gym badges collected (0 to 8)
Is Pokémon Champion	True if the citizen has defeated the Pokémon Elite Four
Battle Strategy	Defensive, Aggressive, Unpredictable
City Movement Frequency	Number of times the citizen moved between cities in the last year
Possession of Rare Items	Yes or No
Debts to the Kanto System	Amount of debt (e.g., 20,000)
Charitable Activities	Yes or No
Team Rocket Membership	Yes or No (target variable)
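Some columns in the table above hold text that needs converting before modeling, for example percentages such as 64%, amounts such as 20,000, and Yes/No flags. The parsers below are a minimal sketch: the helper names are hypothetical, and the exact string formats in the file are assumed from the examples in the table.

```python
def parse_percent(text: str) -> float:
    """Convert '64%' to 0.64."""
    return float(text.strip().rstrip("%")) / 100


def parse_amount(text: str) -> int:
    """Convert '20,000' to 20000."""
    return int(text.replace(",", "").strip())


def parse_yes_no(text: str) -> bool:
    """Convert 'Yes' or 'No' to a boolean, rejecting anything else."""
    value = text.strip().lower()
    if value not in ("yes", "no"):
        raise ValueError(f"expected Yes or No, got {text!r}")
    return value == "yes"


# One hypothetical row using example values from the column descriptions.
row = {
    "Winning Percentage": "64%",
    "Debts to the Kanto System": "20,000",
    "Charitable Activities": "No",
}
parsed = {
    "Winning Percentage": parse_percent(row["Winning Percentage"]),
    "Debts to the Kanto System": parse_amount(row["Debts to the Kanto System"]),
    "Charitable Activities": parse_yes_no(row["Charitable Activities"]),
}
print(parsed)
```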

🕵️ Can You Crack the Case?

This dataset is not just about numbers—it’s a criminal investigation. Hidden patterns lurk beneath the surface, waiting to be uncovered.

Are certain Pokémon types more common among Team Rocket members?
Do suspicious financial transactions hint at illegal activities?
Does their battle strategy betray their allegiance?

This isn’t just another classification task—it’s a race against time to stop Team Rocket before they take control of Kanto!

Detective Kotso is counting on you. Will you rise to the challenge? 🕵️‍♂️🔎

🔎 10 Key Questions & Suggested Analysis Techniques

1️⃣ Do certain Pokémon types indicate suspicious behavior?
- 📈 Graph: Stacked bar chart comparing Pokémon type distribution between Rocket & non-Rocket members.
- 🎯 Test: Chi-square test of association.

2️⃣ Is economic status a reliable predictor of criminal affiliation?
- 📊 Graph: Box plot of debt and economic status per Team Rocket status.
- 🏦 Test: ANOVA test for group differences.

3️⃣ Do Team Rocket members have a preference for specific PokéBalls?
- 🎨 Graph: Heatmap of PokéBall usage vs. Team Rocket status.
- ⚡ Test: Chi-square test for independence.

4️⃣ Does a high battle win ratio correlate with Team Rocket membership?
- 📉 Graph: KDE plot of win ratio distribution for both classes.
- 🏆 Test: T-test for mean differences.

5️⃣ Are migration patterns different for Team Rocket members?
- 📈 Graph: Violin plot of migration counts per group.
- 🌍 Test: Mann-Whitney U test.

6️⃣ Do Rocket members tend to avoid charity participation?
- 📊 Graph: Grouped bar chart of charity participation rates.
- 🕵️‍♂️ Test: Fisher’s Exact Test for small sample sizes.

7️⃣ Do Rocket members disguise themselves in certain professions?
- 📊 Graph: Horizontal bar chart of profession frequency per group.
- 🕵️‍♂️ Test: Chi-square test for profession-Team Rocket relationship.

8️⃣ Is there an unusual cluster of Rocket members in specific cities?
- 🗺 Graph: Geographic heatmap of city distributions.
- 📌 Test: Spatial autocorrelation test.

9️⃣ How does badge count affect the likelihood of being a Rocket member?
- 📉 Graph: Histogram of gym badge distributions.
- 🏅 Test: Kruskal-Wallis test.

🔟 **Are there any multi-feature interactions that reve...
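
Several of the questions above suggest a chi-square test between a categorical feature and Team Rocket membership. The following is a minimal sketch of that statistic in plain Python, assuming the data has already been tabulated into counts. The function name and the counts are invented for illustration and do not come from the dataset.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts, one row per category
    and one column per class (for example non-Rocket, Rocket).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the feature and the class were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up counts for illustration only: rows are battle strategies,
# columns are (non-Rocket, Rocket).
observed = [
    [30, 10],  # Defensive
    [20, 20],  # Aggressive
    [10, 30],  # Unpredictable
]
statistic, dof = chi_square_independence(observed)
print(f"chi2 = {statistic:.2f}, dof = {dof}")
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom (about 5.99 for 2 degrees of freedom at the 5% level). The sketch assumes every row and column has at least one observation; an empty category would cause a division by zero and should be dropped first.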

Loan Approval Dataset
kaggle.com
Updated Oct 15, 2024
Cite
Arbaaz Tamboli (2024). Loan Approval Dataset [Dataset]. https://www.kaggle.com/datasets/arbaaztamboli/loan-approval-dataset
Dataset updated
Oct 15, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arbaaz Tamboli
Description
This dataset contains a wealth of information from 52,000 loan applications, offering detailed insights into the factors that influence loan approval decisions. Collected from financial institutions, this data is highly valuable for credit risk analysis, financial modeling, and predictive analytics. The dataset is particularly useful for anyone interested in applying machine learning techniques to real-world financial decision-making scenarios.

Overview: This dataset provides information about various applicants and the loans they applied for, including their demographic details, income, loan terms, and approval status. By analyzing this data, one can gain an understanding of which factors are most critical for determining the likelihood of loan approval. The dataset can also help in evaluating credit risk and building robust credit scoring systems.

Dataset Columns:

- Applicant_ID: Unique identifier for each loan application.
- Gender: Gender of the applicant (Male/Female).
- Age: Age of the applicant.
- Marital_Status: Marital status of the applicant (Single/Married).
- Dependents: Number of dependents the applicant has.
- Education: Education level of the applicant (Graduate/Not Graduate).
- Employment_Status: Employment status of the applicant (Employed, Self-Employed, Unemployed).
- Occupation_Type: Type of occupation, which provides insights into the nature of the applicant’s job (Salaried, Business, Others).
- Residential_Status: Type of residence (Owned, Rented, Mortgage).
- City/Town: The city or town where the applicant resides.
- Annual_Income: The total annual income of the applicant, a key factor in loan eligibility.
- Monthly_Expenses: The monthly expenses of the applicant, indicating their financial obligations.
- Credit_Score: The applicant's credit score, reflecting their creditworthiness.
- Existing_Loans: Number of existing loans the applicant is servicing.
- Total_Existing_Loan_Amount: The total amount of all existing loans the applicant has.
- Outstanding_Debt: The remaining amount of debt yet to be paid by the applicant.
- Loan_History: The applicant’s previous loan history (Good/Bad), indicating their repayment reliability.
- Loan_Amount_Requested: The loan amount the applicant has applied for.
- Loan_Term: The term of the loan in months.
- Loan_Purpose: The purpose of the loan (e.g., Home, Car, Education, Personal, Business).
- Interest_Rate: The interest rate applied to the loan.
- Loan_Type: The type of loan (Secured/Unsecured).
- Co-Applicant: Indicates if there is a co-applicant for the loan (Yes/No).
- Bank_Account_History: Applicant’s banking history, showing past transactions and reliability.
- Transaction_Frequency: The frequency of financial transactions in the applicant’s bank account (Low/Medium/High).
- Default_Risk: The risk level of the applicant defaulting on the loan (Low/Medium/High).
- Loan_Approval_Status: Final decision on the loan application (Approved/Rejected).
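
One simple first look at which factors matter for approval is the approval rate within each level of a categorical column. The following is a minimal sketch using only the column names listed above. The function name and the rows are invented for illustration; real rows would be read from the dataset's CSV file.

```python
from collections import defaultdict


def approval_rate_by(rows, column):
    """Return {category: share of rows with status 'Approved'}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        if row["Loan_Approval_Status"] == "Approved":
            approved[key] += 1
    return {key: approved[key] / totals[key] for key in totals}


# Invented rows for illustration; they are not taken from the dataset.
rows = [
    {"Default_Risk": "Low", "Loan_Approval_Status": "Approved"},
    {"Default_Risk": "Low", "Loan_Approval_Status": "Approved"},
    {"Default_Risk": "High", "Loan_Approval_Status": "Rejected"},
    {"Default_Risk": "High", "Loan_Approval_Status": "Approved"},
]
rates = approval_rate_by(rows, "Default_Risk")
print(rates)
```

The same function works for any other categorical column, such as Loan_Type or Employment_Status, by changing the `column` argument.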
Chicago Veteran Owned Businesses
kaggle.com
Updated Feb 1, 2020
Cite
City of Chicago (2020). Chicago Veteran Owned Businesses [Dataset]. https://www.kaggle.com/chicago/chicago-veteran-owned-businesses/code
Dataset updated
Feb 1, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
City of Chicago
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Chicago
Description
Content

Cook County Certified Veteran Owned Businesses

Context

This is a dataset hosted by the City of Chicago. The city has an open data platform and updates its information according to the amount of data that is brought in. Explore the City of Chicago using Kaggle and all of the data sources available through the City of Chicago organization page!

Update Frequency: This dataset is updated monthly.

Acknowledgements

This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

Cover photo by 刘帅 on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Retail Transactions Dataset
kaggle.com
Updated May 18, 2024
Cite
Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
Dataset updated
May 18, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Prasad Patil
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

Context:

Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

Inspiration:

The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

Dataset Information:

The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.

Date: The date and time when the transaction occurred. It records the timestamp of each purchase.

Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.

Product: A list of products purchased in the transaction. It includes the names of the products bought.

Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.

Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.

Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.

City: The city where the purchase took place. It indicates the location of the transaction.

Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.

Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.

Customer_Category: A category representing the customer's background or age group.

Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.

Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

Use Cases:

Market Basket Analysis: Discover associations between products and uncover buying patterns.

Customer Segmentation: Group customers based on purchasing behavior.

Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.

Retail Analytics: Analyze store performance and customer trends.

Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
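
The market basket use case can be illustrated with pair co-occurrence counts. The following is a minimal sketch that computes support and lift for every product pair. It assumes each Product cell is stored as a Python-style list literal in a string, which should be verified against the actual file. The function name and the sample baskets are invented for illustration.

```python
import ast
from collections import Counter
from itertools import combinations


def pair_statistics(baskets):
    """Return {(a, b): (support, lift)} for every co-purchased pair."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeats within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        # Lift above 1 means the pair co-occurs more often than
        # independent purchases would predict.
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        stats[(a, b)] = (support, lift)
    return stats


# Invented Product values; assumes each cell is a list literal in a string.
raw_products = [
    "['Bread', 'Butter']",
    "['Bread', 'Butter', 'Milk']",
    "['Milk']",
    "['Bread']",
]
baskets = [ast.literal_eval(cell) for cell in raw_products]
stats = pair_statistics(baskets)
for pair, (support, lift) in sorted(stats.items()):
    print(pair, round(support, 2), round(lift, 2))
```

Because the data is synthetic, strong associations may not appear; the sketch shows the mechanics rather than an expected finding. For larger files, a dedicated frequent-itemset algorithm such as Apriori scales better than enumerating all pairs.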

