52 datasets found
  1. World's biggest companies dataset

    • kaggle.com
    Updated Feb 2, 2023
    Cite
    Maryna Shut (2023). World's biggest companies dataset [Dataset]. https://www.kaggle.com/marshuu/worlds-biggest-companies-dataset/discussion
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 2, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Maryna Shut
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    The dataset contains information about the world's biggest companies.

    Among them you can find companies founded in the US, the UK, Europe, Asia, South America, South Africa, and Australia.

    The dataset contains the year each company was founded, its revenue and net income for the years 2018 to 2020, and its industry.

    I have included two CSV files: the raw CSV file if you want to practice cleaning the data, and the clean CSV file ready to be analyzed.

    The third dataset includes the names of all the companies in the previous datasets and two additional columns: number of employees and name of the founder.

    In addition, there is a tesla.csv file containing share prices for Tesla.

  2. Data from: Company Financials Dataset

    • kaggle.com
    Updated Aug 1, 2023
    Cite
    Atharva Arya (2023). Company Financials Dataset [Dataset]. https://www.kaggle.com/datasets/atharvaarya25/financials
    Dataset updated
    Aug 1, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Atharva Arya
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    This dataset requires a lot of preprocessing and offers useful EDA insights for a company. It consists of sales and profit data sorted by market segment and country/region.

    Tips for pre-processing:

    1. Check the column names and find the errors there first.
    2. Remove the '$' sign and '-' from all columns where they are present.
    3. Change the datatype from object to int after the above two steps.
    4. Challenge: try removing the ',' (comma) from all numerical values.
    5. Try plotting sales and profit against the timeline.
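
    The cleaning tips above can be sketched in plain Python. This is only an illustration: the sample strings and the helper names are made up, and treating a lone '-' as a missing value is an assumption about what the placeholder means in this file.

```python
from typing import Optional


def normalize_column_name(name: str) -> str:
    """Tip 1: strip stray whitespace from a header and make it snake_case."""
    return name.strip().lower().replace(" ", "_")


def parse_amount(raw: str) -> Optional[int]:
    """Tips 2-4: drop '$' and ',' and convert to int.

    A cell that is only '-' after cleaning is treated as missing (assumption).
    """
    cleaned = raw.replace("$", "").replace(",", "").strip()
    if cleaned in ("", "-"):
        return None
    return int(round(float(cleaned)))


# Hypothetical raw cells, not taken from the real file.
print(normalize_column_name(" Gross Sales "))
print(parse_amount(" $1,234.00 "))
print(parse_amount(" $-   "))
```

    With pandas, the same idea would usually be applied column by column before plotting sales and profit over time.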

  3. Predictive Maintenance Dataset

    • kaggle.com
    Updated Nov 7, 2022
    Cite
    Himanshu Agarwal (2022). Predictive Maintenance Dataset [Dataset]. https://www.kaggle.com/datasets/hiimanshuagarwal/predictive-maintenance-dataset
    Dataset updated
    Nov 7, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Himanshu Agarwal
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    A company has a fleet of devices transmitting daily sensor readings. They would like to create a predictive maintenance solution to proactively identify when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted.

    The task is to build a predictive model using machine learning to predict the probability of a device failure. When building this model, be sure to minimize false positives and false negatives. The column you are trying to predict is called failure, with binary value 0 for non-failure and 1 for failure.
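
    One way to keep track of false positives and false negatives is to count them directly and derive precision, recall, and F1. The sketch below assumes predictions have already been thresholded to 0/1; the two short lists are made-up values, not data from this dataset, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means failure."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1  # false alarm: maintenance performed needlessly
        elif actual == 1 and predicted == 0:
            fn += 1  # missed failure
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Precision penalizes false positives, recall penalizes false negatives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels for illustration only.
failure = [0, 0, 1, 1, 0, 1, 0, 0]
predicted = [0, 1, 1, 0, 0, 1, 0, 0]
print(confusion_counts(failure, predicted))
print(precision_recall_f1(failure, predicted))
```

    Because failures are typically rare in this kind of data, plain accuracy can look high even when most failures are missed, which is why the counts are kept separate here.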

  4. civil_comments

    • tensorflow.org
    • huggingface.co
    Updated Feb 28, 2023
    Cite
    (2023). civil_comments [Dataset]. https://www.tensorflow.org/datasets/catalog/civil_comments
    Dataset updated
    Feb 28, 2023
    Description

    This version of the CivilComments Dataset provides access to the primary seven labels that were annotated by crowd workers. The toxicity and other tags are values between 0 and 1 indicating the fraction of annotators that assigned these attributes to the comment text.
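
    To make the "fraction of annotators" idea concrete, the sketch below computes such a fraction from individual votes and optionally turns it into a binary label. The votes are made up, the helper names are illustrative, and the 0.5 cutoff is an assumed convention rather than something defined by this dataset.

```python
def annotator_fraction(votes):
    """Fraction of annotators (each vote is 0 or 1) who assigned the attribute."""
    if not votes:
        raise ValueError("at least one annotator vote is required")
    return sum(votes) / len(votes)


def to_binary_label(fraction, threshold=0.5):
    """Collapse a fractional label to 0/1; the threshold is an assumption."""
    return int(fraction >= threshold)


# Five hypothetical annotators, three of whom marked the comment as toxic.
toxicity = annotator_fraction([1, 0, 0, 1, 1])
print(toxicity, to_binary_label(toxicity))
```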

    The other tags are only available for a fraction of the input examples. They are currently ignored for the main dataset; the CivilCommentsIdentities set includes those labels, but only consists of the subset of the data with them. The other attributes that were part of the original CivilComments release are included only in the raw data. See the Kaggle documentation for more details about the available features.

    The comments in this dataset come from an archive of the Civil Comments platform, a commenting plugin for independent news sites. These public comments were created from 2015 - 2017 and appeared on approximately 50 English-language news sites across the world. When Civil Comments shut down in 2017, they chose to make the public comments available in a lasting open archive to enable future research. The original data, published on figshare, includes the public comment text, some associated metadata such as article IDs, publication IDs, timestamps and commenter-generated "civility" labels, but does not include user ids. Jigsaw extended this dataset by adding additional labels for toxicity, identity mentions, as well as covert offensiveness. This data set is an exact replica of the data released for the Jigsaw Unintended Bias in Toxicity Classification Kaggle challenge. This dataset is released under CC0, as is the underlying comment text.

    For comments that have a parent_id also in the civil comments data, the text of the previous comment is provided as the "parent_text" feature. Note that the splits were made without regard to this information, so using previous comments may leak some information. The annotators did not have access to the parent text when making the labels.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('civil_comments', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  5. ‘Google Stock Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Google Stock Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-google-stock-data-1a5f/latest
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Google Stock Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/varpit94/google-stock-data on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    What is Google?

    Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware. It is considered one of the Big Five companies in the American information technology industry, along with Amazon, Facebook, Apple, and Microsoft. Google was founded on September 4, 1998, by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14% of its publicly-listed shares and control 56% of the stockholder voting power through super-voting stock. The company went public via an initial public offering (IPO) in 2004. In 2015, Google was reorganized as a wholly-owned subsidiary of Alphabet Inc. Google is Alphabet's largest subsidiary and is a holding company for Alphabet's Internet properties and interests. Sundar Pichai was appointed CEO of Google on October 24, 2015, replacing Larry Page, who became the CEO of Alphabet. On December 3, 2019, Pichai also became the CEO of Alphabet.

    Information about this dataset

    This dataset provides historical data of Alphabet Inc. (GOOG). The data is available at a daily level. Currency is USD.

    --- Original source retains full ownership of the source dataset ---

  6. ‘🐕 Cat VS Dog popularity per state’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘🐕 Cat VS Dog popularity per state’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-cat-vs-dog-popularity-per-state-24a0/668f83a8/?iid=001-843&v=presentation
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘🐕 Cat VS Dog popularity per state’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/cat-vs-dog-popularity-in-u-se on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    About this dataset

    This dataset was created by Andrew Duff and contains around 0 samples along with Percentage Of Cat Owners, Mean Number Of Dogs Per Household, technical information, and other features such as Percentage Of Households With Pets, Mean Number Of Cats, and more.

    How to use this dataset

    • Analyze Percentage Of Dog Owners in relation to Number Of Pet Households (in 1000)
    • Study the influence of Percentage Of Cat Owners on Mean Number Of Dogs Per Household

    Acknowledgements

    If you use this dataset in your research, please credit Andrew Duff

    --- Original source retains full ownership of the source dataset ---

  7. Data on Palestinian Structures Israel Demolished

    • kaggle.com
    Updated Nov 7, 2023
    Cite
    asaniczka (2023). Data on Palestinian Structures Israel Demolished [Dataset]. http://doi.org/10.34740/kaggle/ds/3840933
    Dataset updated
    Nov 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    asaniczka
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Israel, Palestine
    Description

    Demolitions in the Occupied Territories is a dataset that provides statistics on the demolition of Palestinian-owned homes and structures in the Occupied Territories.

    The information is based on investigations conducted by B’Tselem – The Israeli Information Center for Human Rights in the Occupied Territories.

    Dataset Details:

    The dataset covers a period from January 2004 to August 2023 and includes information about the date of demolition, locality, district, area, housing units, people left homeless, minors left homeless, type of structure, and reason for demolition.

    Interesting Task Ideas (for Data Analysts):

    1. Analyze the trend of demolitions over time to identify any significant patterns or changes.
    2. Investigate the distribution of demolitions across different localities, districts, and areas to understand the geographical impact.
    3. Explore the relationship between the number of housing units demolished and the number of people, particularly minors, left homeless.
    4. Examine the reasons for demolitions and assess their frequency and impact.
    5. Visualize the data using maps and charts to highlight the magnitude and geographical distribution of demolitions.
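
    As a starting point for the first and third task ideas, records can be aggregated by year with the standard library. The rows below are made-up placeholders and the column names are assumptions; the real file may name its fields differently.

```python
import csv
import io
from collections import Counter
from datetime import date

# Placeholder rows for illustration only; not real records.
SAMPLE_CSV = """date_of_demolition,district,housing_units,people_left_homeless
2004-01-15,District A,1,6
2004-03-02,District B,2,9
2005-07-20,District A,1,4
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def demolitions_per_year(rows):
    """Count records per calendar year."""
    return Counter(date.fromisoformat(r["date_of_demolition"]).year for r in rows)


def homeless_per_year(rows):
    """Sum the number of people left homeless per calendar year."""
    totals = Counter()
    for r in rows:
        year = date.fromisoformat(r["date_of_demolition"]).year
        totals[year] += int(r["people_left_homeless"])
    return totals


rows = load_rows(SAMPLE_CSV)
per_year = demolitions_per_year(rows)
homeless = homeless_per_year(rows)
print(sorted(per_year.items()))
print(sorted(homeless.items()))
```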

    The intention of using this data should be solely for objective analysis and understanding of the situation, without any political intent. Any analysis or interpretation should be approached with sensitivity and respect for human rights.

    Related Datasets:

    Fatalities in the Israeli-Palestinian Conflict

    If you find this dataset valuable, don't forget to hit the upvote button! 😊💝

    Photo by Oleg Solodkov on Unsplash

  8. Kaggle-post-and-comments-question-answer-topic

    • huggingface.co
    Cite
    Duverne Mathieu, Kaggle-post-and-comments-question-answer-topic [Dataset]. https://huggingface.co/datasets/Raaxx/Kaggle-post-and-comments-question-answer-topic
    Authors
    Duverne Mathieu
    Description

    This is a dataset containing 10,000 posts from Kaggle and 60,000 comments related to those posts in the question-answer topic.

    Data Fields

    kaggle_post

    'pseudo', The question authors. 'title', Title of the Post. 'question', The question's body. 'vote', Voting on Kaggle is similar to liking. 'medal', I will share with you the Kaggle medal system, which can be found at https://www.kaggle.com/progression. The system awards medals to users based on… See the full description on the dataset page: https://huggingface.co/datasets/Raaxx/Kaggle-post-and-comments-question-answer-topic.

  9. Apple iPhone 15 (15 pro, plus and pro max) Reviews

    • kaggle.com
    Updated Sep 20, 2023
    Cite
    nuhmanpk (2023). Apple iPhone 15 (15 pro, plus and pro max) Reviews [Dataset]. https://www.kaggle.com/datasets/nuhmanpk/iphone-15-15-pro-pro-max-reviews
    Dataset updated
    Sep 20, 2023
    Dataset provided by
    Kaggle
    Authors
    nuhmanpk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains video transcripts from a limited number of YouTubers who posted their reviews of the iPhone 15, 15 Plus, 15 Pro, and 15 Pro Max models. These are the videos used for the videos. Video credits are owned by the respective creators.


  10. kaggle-hugomathien-soccer

    • huggingface.co
    Cite
    Julien Chaumond, kaggle-hugomathien-soccer [Dataset]. https://huggingface.co/datasets/julien-c/kaggle-hugomathien-soccer
    Authors
    Julien Chaumond
    License

    https://choosealicense.com/licenses/odbl/

    Description

    Source: https://www.kaggle.com/datasets/hugomathien/soccer by Hugo Mathien

    About Dataset

    The ultimate Soccer database for data analysis and machine learning

    What you get:

    • +25,000 matches
    • +10,000 players
    • 11 European Countries with their lead championship
    • Seasons 2008 to 2016
    • Players and Teams' attributes* sourced from EA Sports' FIFA video game series, including the weekly updates
    • Team line up with squad formation (X, Y coordinates)
    • Betting odds from up to 10 providers…

    See the full description on the dataset page: https://huggingface.co/datasets/julien-c/kaggle-hugomathien-soccer.

  11. ML-ArXiv-Papers

    • huggingface.co
    • opendatalab.com
    Updated Jun 29, 2022
    Cite
    Connor Shorten (2022). ML-ArXiv-Papers [Dataset]. https://huggingface.co/datasets/CShorten/ML-ArXiv-Papers
    Dataset updated
    Jun 29, 2022
    Authors
    Connor Shorten
    License

    https://choosealicense.com/licenses/afl-3.0/

    Description

    This dataset contains the subset of ArXiv papers with the "cs.LG" tag, which indicates that the paper is about Machine Learning. The core dataset is filtered from the full ArXiv dataset hosted on Kaggle: https://www.kaggle.com/datasets/Cornell-University/arxiv. The original dataset contains roughly 2 million papers. This dataset contains roughly 100,000 papers following the category filtering. The dataset is maintained with requests to the ArXiv API. The current iteration of the dataset only contains… See the full description on the dataset page: https://huggingface.co/datasets/CShorten/ML-ArXiv-Papers.
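
    The category filtering described above might look roughly like the sketch below. It assumes one JSON record per line with a space-separated "categories" string; the sample records, ids, and function names are made up for illustration, and this is not the maintainer's actual pipeline.

```python
import json


def is_machine_learning(record, tag="cs.LG"):
    """True if the record's space-separated category string contains the tag."""
    return tag in record.get("categories", "").split()


def filter_ml_papers(lines):
    """Yield parsed records from JSON-lines input that carry the cs.LG tag."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if is_machine_learning(record):
            yield record


# Hypothetical records; the ids and titles are placeholders.
sample_lines = [
    '{"id": "paper-1", "title": "Example A", "categories": "cs.LG stat.ML"}',
    '{"id": "paper-2", "title": "Example B", "categories": "hep-ph"}',
    '{"id": "paper-3", "title": "Example C", "categories": "cs.CV cs.LG"}',
]
ml_ids = [record["id"] for record in filter_ml_papers(sample_lines)]
print(ml_ids)
```

    Splitting the category string before testing membership avoids matching a tag that merely appears as a substring of another one.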

  12. Dataset: 23andMe Holding Co. (ME) Stock Perform...

    • kaggle.com
    Updated Jun 21, 2024
    Cite
    Nitiraj Kulkarni (2024). Dataset: 23andMe Holding Co. (ME) Stock Perform... [Dataset]. https://www.kaggle.com/datasets/nitirajkulkarni/me-stock-performance/versions/1
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nitiraj Kulkarni
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides historical stock market performance data for specific companies. It enables users to analyze and understand the past trends and fluctuations in stock prices over time. This information can be utilized for various purposes such as investment analysis, financial research, and market trend forecasting.

  13. Dataset: Royalty Management Holding Corporation...

    • kaggle.com
    Updated Jun 21, 2024
    Cite
    Nitiraj Kulkarni (2024). Dataset: Royalty Management Holding Corporation... [Dataset]. https://www.kaggle.com/datasets/nitirajkulkarni/rmco-stock-performance
    Dataset updated
    Jun 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nitiraj Kulkarni
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset provides historical stock market performance data for specific companies. It enables users to analyze and understand the past trends and fluctuations in stock prices over time. This information can be utilized for various purposes such as investment analysis, financial research, and market trend forecasting.

  14. NASDAQ Company Details and Listings

    • kaggle.com
    Updated Aug 11, 2024
    Cite
    Ganesh Bhabad (2024). NASDAQ Company Details and Listings [Dataset]. https://www.kaggle.com/datasets/ganeshbhabad/nasdaq-company-details-and-listings
    Dataset updated
    Aug 11, 2024
    Dataset provided by
    Kaggle
    Authors
    Ganesh Bhabad
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    NASDAQ Listed Companies Dataset

    Description:

    This dataset provides comprehensive information on companies listed on the NASDAQ stock exchange. It includes essential details about each company, making it a valuable resource for financial analysis, stock market research, and investment strategies.

    Features:

    • symbol: The unique ticker symbol used to identify the company's stock on the NASDAQ exchange.
    • name: The full name of the company.
    • currency: The currency in which the company's stock is traded.
    • exchange: The stock exchange where the company is listed (in this case, NASDAQ).
    • mic_code: The Market Identifier Code (MIC) for the NASDAQ exchange.
    • country: The country where the company is headquartered.
    • type: The type of company, such as common stock or preferred stock.

    Usage:

    This dataset can be used for various purposes, including:

    • Stock Market Analysis: Analyze stock symbols, company names, and market data.
    • Financial Modeling: Incorporate company details into financial models and investment strategies.
    • Market Research: Understand the distribution of companies by country and currency.
    • Data Visualization: Create visualizations of the NASDAQ market landscape.
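
    For the market-research use case, counting companies by country and by currency is a natural first step. The sketch below uses the feature names listed above as the CSV header; the three rows are invented placeholders, not entries from the dataset.

```python
import csv
import io
from collections import Counter

# Header follows the feature list; the rows themselves are made up.
SAMPLE_CSV = """symbol,name,currency,exchange,mic_code,country,type
AAAA,Example One Inc,USD,NASDAQ,XNAS,United States,Common Stock
BBBB,Example Two Ltd,USD,NASDAQ,XNAS,Canada,Common Stock
CCCC,Example Three Corp,USD,NASDAQ,XNAS,United States,Preferred Stock
"""


def count_by(rows, field):
    """Count how many rows share each value of the given field."""
    return Counter(row[field] for row in rows)


listings = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_country = count_by(listings, "country")
by_currency = count_by(listings, "currency")
print(by_country.most_common())
print(by_currency.most_common())
```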

    Data Source:

    The data is sourced from the Twelve Data API, which provides up-to-date financial and stock market information.

    Notes: The dataset includes only NASDAQ-listed companies and does not cover other exchanges. Be sure to comply with any data usage policies or licensing agreements associated with the data source. Feel free to adapt the description based on the specific details and attributes of your dataset.

  15. Financial Statement Data Sets

    • kaggle.com
    Updated Jul 4, 2025
    Cite
    Vadim Vanak (2025). Financial Statement Data Sets [Dataset]. https://www.kaggle.com/datasets/vadimvanak/company-facts-2
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vadim Vanak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset offers a detailed collection of US-GAAP financial data extracted from the financial statements of exchange-listed U.S. companies, as submitted to the U.S. Securities and Exchange Commission (SEC) via the EDGAR database. Covering filings from January 2009 onwards, this dataset provides key financial figures reported by companies in accordance with U.S. Generally Accepted Accounting Principles (GAAP).

    Dataset Features:

    • Data Scope: The dataset is restricted to figures reported under US-GAAP standards, with the exception of EntityCommonStockSharesOutstanding and EntityPublicFloat.
    • Currency and Units: The dataset exclusively includes figures reported in USD or shares, ensuring uniformity and comparability. It excludes ratios and non-financial metrics to maintain focus on financial data.
    • Company Selection: The dataset is limited to companies with U.S. exchange tickers, providing a concentrated analysis of publicly traded firms within the United States.
    • Submission Types: The dataset only incorporates data from 10-Q, 10-K, 10-Q/A, and 10-K/A filings, ensuring consistency in the type of financial reports analyzed.

    Data Sources and Extraction:

    This dataset primarily relies on the SEC's Financial Statement Data Sets and EDGAR APIs:

    • SEC Financial Statement Data Sets
    • EDGAR Application Programming Interfaces

    In instances where specific figures were missing from these sources, data was directly extracted from the companies' financial statements to ensure completeness.

    Please note that the dataset presents financial figures exactly as reported by the companies, which may occasionally include errors. A common issue involves incorrect reporting of scaling factors in the XBRL format. XBRL supports two tag attributes related to scaling: 'decimals' and 'scale.' The 'decimals' attribute indicates the number of significant decimal places but does not affect the actual value of the figure, while the 'scale' attribute adjusts the value by a specific factor.

    However, there are several instances, numbering in the thousands, where companies have incorrectly used the 'decimals' attribute (e.g., 'decimals="-6"') under the mistaken assumption that it controls scaling. This is not correct, and as a result, some figures may be inaccurately scaled. This dataset does not attempt to detect or correct such errors; it aims to reflect the data precisely as reported by the companies. A future version of the dataset may be introduced to address and correct these issues.
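
    The difference between the two attributes can be illustrated with a small sketch. It models a fact's value as the displayed number times 10 to the power of 'scale', and treats 'decimals' purely as a statement of rounding precision. The function names and the figure 125 are illustrative assumptions, not code or data from this dataset.

```python
from decimal import Decimal


def reported_value(displayed: str, scale: int = 0) -> Decimal:
    """Value of a fact: displayed number times 10**scale.

    'decimals' is deliberately not a parameter, because it does not
    change the value.
    """
    return Decimal(displayed) * (Decimal(10) ** scale)


def precision_unit(decimals: int) -> Decimal:
    """Rounding unit implied by 'decimals' (for example, -6 means nearest million)."""
    return Decimal(10) ** (-decimals)


# Intended: 125 million, expressed with scale=6.
correct = reported_value("125", scale=6)

# Mistake described above: decimals="-6" is set but no scale is given,
# so the value stays 125 rather than 125 million.
mistaken = reported_value("125")

print(correct, mistaken, precision_unit(-6))
```

    A consumer of the data who suspects such an error could compare a figure against the same company's neighbouring filings, but any correction is outside what this dataset does.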

    The source code for data extraction is available here

  16. 🦈 Shark Tank India dataset 🇮🇳

    • kaggle.com
    Updated Apr 20, 2025
    Cite
    Satya Thirumani (2025). 🦈 Shark Tank India dataset 🇮🇳 [Dataset]. https://www.kaggle.com/datasets/thirumani/shark-tank-india
    Dataset updated
    Apr 20, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Satya Thirumani
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Shark Tank India Data set.

    Shark Tank India - Season 1 to season 4 information, with 80 fields/columns and 630+ records.

    All seasons/episodes of 🦈 SHARKTANK INDIA 🇮🇳 were broadcast on SonyLiv OTT/Sony TV.

    Here is the data dictionary for (Indian) Shark Tank season's dataset.

    • Season Number - Season number
    • Startup Name - Company name or product name
    • Episode Number - Episode number within the season
    • Pitch Number - Overall pitch number
    • Season Start - Season first aired date
    • Season End - Season last aired date
    • Original Air Date - Episode original/first aired date, on OTT/TV
    • Episode Title - Episode title in SonyLiv
    • Anchor - Name of the episode presenter/host
    • Industry - Industry name or type
    • Business Description - Business Description
    • Company Website - Company Website URL
    • Started in - Year in which startup was started/incorporated
    • Number of Presenters - Number of presenters
    • Male Presenters - Number of male presenters
    • Female Presenters - Number of female presenters
    • Transgender Presenters - Number of transgender/LGBTQ presenters
    • Couple Presenters - Are the presenters wife and husband? 1-yes, 0-no
    • Pitchers Average Age - All pitchers average age, <30 young, 30-50 middle, >50 old
    • Pitchers City - Presenter's town/city or place where company head office exists
    • Pitchers State - Indian state pitcher hails from or state where company head office exists
    • Yearly Revenue - Yearly revenue, in lakhs INR, -1 means negative revenue, 0 means pre-revenue
    • Monthly Sales - Total monthly sales, in lakhs
    • Gross Margin - Gross margin/profit of company, in percentages
    • Net Margin - Net margin/profit of company, in percentages
    • EBITDA - Earnings Before Interest, Taxes, Depreciation, and Amortization
    • Cash Burn - In loss in current year; burning/paying money from their pocket (yes/no)
    • SKUs - Stock Keeping Units or number of varieties, at the time of pitch
    • Has Patents - Pitcher has Patents/Intellectual property (filed/granted), at the time of pitch
    • Bootstrapped - Startup is bootstrapped or not (yes/no)
    • Part of Match off - Competition between two similar brands, pitched at same time
    • Original Ask Amount - Original Ask Amount, in lakhs INR
    • Original Offered Equity - Original Offered Equity, in percentages
    • Valuation Requested - Valuation Requested, in lakhs INR
    • Received Offer - Received offer or not, 1-received, 0-not received
    • Accepted Offer - Accepted offer or not, 1-accepted, 0-rejected
    • Total Deal Amount - Total Deal Amount, in lakhs INR
    • Total Deal Equity - Total Deal Equity, in percentages
    • Total Deal Debt - Total Deal debt/loan amount, in lakhs INR
    • Debt Interest - Debt interest rate, in percentages
    • Deal Valuation - Deal Valuation, in lakhs INR
    • Number of sharks in deal - Number of sharks involved in deal
    • Deal has conditions - Deal has conditions or not? (yes or no)
    • Royalty Percentage - Royalty percentage, if it's royalty deal
    • Royalty Recouped Amount - Royalty recouped amount, if it's royalty deal, in lakhs
    • Advisory Shares Equity - Deal with Advisory shares or equity, in percentages
    • Namita Investment Amount - Namita Investment Amount, in lakhs INR
    • Namita Investment Equity - Namita Investment Equity, in percentages
    • Namita Debt Amount - Namita Debt Amount, in lakhs INR
    • Vineeta Investment Amount - Vineeta Investment Amount, in lakhs INR
    • Vineeta Investment Equity - Vineeta Investment Equity, in percentages
    • Vineeta Debt Amount - Vineeta Debt Amount, in lakhs INR
    • Anupam Investment Amount - Anupam Investment Amount, in lakhs INR
    • Anupam Investment Equity - Anupam Investment Equity, in percentages
    • Anupam Debt Amount - Anupam Debt Amount, in lakhs INR
    • Aman Investment Amount - Aman Investment Amount, in lakhs INR
    • Aman Investment Equity - Aman Investment Equity, in percentages
    • Aman Debt Amount - Aman Debt Amount, in lakhs INR
    • Peyush Investment Amount - Peyush Investment Amount, in lakhs INR
    • Peyush Investment Equity - Peyush Investment Equity, in percentages
    • Peyush Debt Amount - Peyush Debt Amount, in lakhs INR
    • Ritesh Investment Amount - Ritesh Investment Amount, in lakhs INR
    • Ritesh Investment Equity - Ritesh Investment Equity, in percentages
    • Ritesh Debt Amount - Ritesh Debt Amount, in lakhs INR
    • Amit Investment Amount - Amit Investment Amount, in lakhs INR
    • Amit Investment Equity - Amit Investment Equity, in percentages
    • Amit Debt Amount - Amit Debt Amount, in lakhs INR
    • Guest Investment Amount - Guest Investment Amount, in lakhs INR
    • Guest Investment Equity - Guest Investment Equity, in percentages
    • Guest Debt Amount - Guest Debt Amount, in lakhs INR
    • Invested Guest Name - Name of the guest(s) who invested in the deal
    • All Guest Names - Names of all guests present in the episode
    • Namita Present - Whether Namita is present in the episode
    • Vineeta Present - Whether Vineeta is present in the episode
    • Anupam ...
  17. Pokemon Detective: Unmask Team Rocket

    • kaggle.com
    Updated Mar 27, 2025
    Cite
    Kotso P (2025). Pokemon Detective: Unmask Team Rocket [Dataset]. https://www.kaggle.com/datasets/kotsop/pokmon-detective-challenge
    Dataset updated
    Mar 27, 2025
    Dataset provided by
    Kaggle
    Authors
    Kotso P
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    🔍 The Case of the Disguised Villains: Predicting Team Rocket with Data

    In the bustling world of Kanto, where Pokémon battles shape destinies, crime lurks in the shadows. Detective Kotso, the sharpest mind in Pokémon crime investigations, has been tasked with an urgent mission. The mayor suspects that Team Rocket has infiltrated the city, disguising themselves as ordinary citizens.

    But Kotso doesn’t work alone—he relies on you, a brilliant data scientist, to uncover the truth. Your job? Analyze the data of 5,000 residents to predict which of the 1,000 unclassified individuals are secretly part of Team Rocket.

    Can you spot the hidden patterns? Can Machine Learning crack the case where traditional detective work fails? The fate of Kanto depends on your skills.

    📊 Dataset Structure & Features

    This dataset holds the key to exposing Team Rocket’s operatives. Below is a breakdown of the features at your disposal:

    Column Name - Description
    • ID - Unique identifier for each citizen
    • Age - Age of the citizen
    • City - City the citizen is from
    • Economic Status - Low, Medium, High
    • Occupation - Profession in the Pokémon world
    • Most Frequent Pokémon Type - The type of Pokémon most frequently used
    • Average Pokémon Level - Average level of owned Pokémon
    • Criminal Record - Clean (0) or Dirty (1)
    • Pokéball Usage - Preferred Pokéball type (e.g., DarkBall, UltraBall)
    • Winning Percentage - Battle win rate (e.g., 64%, 88%)
    • Gym Badges - Number of gym badges collected (0 to 8)
    • Is Pokémon Champion - True if the citizen has defeated the Pokémon Elite Four
    • Battle Strategy - Defensive, Aggressive, Unpredictable
    • City Movement Frequency - Number of times the citizen moved between cities in the last year
    • Possession of Rare Items - Yes or No
    • Debts to the Kanto System - Amount of debt (e.g., 20,000)
    • Charitable Activities - Yes or No
    • Team Rocket Membership - Yes or No (target variable)
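
    The example values in the column list (64%, 20,000, Yes or No) suggest that some fields may arrive as text rather than numbers. The following is a minimal sketch of how such values could be converted before modeling. It assumes the values are stored as strings in exactly those formats, which the description does not confirm; the helper names and the sample row are hypothetical.

```python
def parse_percentage(text):
    """Convert a string such as "64%" to the fraction 0.64."""
    return float(text.strip().rstrip("%")) / 100.0


def parse_amount(text):
    """Convert a string such as "20,000" to the integer 20000."""
    return int(text.strip().replace(",", ""))


def parse_yes_no(text):
    """Convert "Yes"/"No" (any letter case) to True/False."""
    return text.strip().lower() == "yes"


# Hypothetical row; the real file may use different formats.
row = {
    "Winning Percentage": "64%",
    "Debts to the Kanto System": "20,000",
    "Charitable Activities": "Yes",
}

cleaned = {
    "Winning Percentage": parse_percentage(row["Winning Percentage"]),
    "Debts to the Kanto System": parse_amount(row["Debts to the Kanto System"]),
    "Charitable Activities": parse_yes_no(row["Charitable Activities"]),
}
print(cleaned)
```

    If the columns are already numeric or boolean in the downloaded file, these helpers are unnecessary.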

    🕵️ Can You Crack the Case?

    This dataset is not just about numbers—it’s a criminal investigation. Hidden patterns lurk beneath the surface, waiting to be uncovered.

    • Are certain Pokémon types more common among Team Rocket members?
    • Do suspicious financial transactions hint at illegal activities?
    • Does their battle strategy betray their allegiance?

    This isn’t just another classification task—it’s a race against time to stop Team Rocket before they take control of Kanto!

    Detective Kotso is counting on you. Will you rise to the challenge? 🕵️‍♂️🔎

    🔎 10 Key Questions & Suggested Analysis Techniques

    1️⃣ Do certain Pokémon types indicate suspicious behavior?
    - 📈 Graph: Stacked bar chart comparing Pokémon type distribution between Rocket & non-Rocket members.
    - 🎯 Test: Chi-square test of association.

    2️⃣ Is economic status a reliable predictor of criminal affiliation?
    - 📊 Graph: Box plot of debt and economic status per Team Rocket status.
    - 🏦 Test: ANOVA test for group differences.

    3️⃣ Do Team Rocket members have a preference for specific PokéBalls?
    - 🎨 Graph: Heatmap of PokéBall usage vs. Team Rocket status.
    - ⚡ Test: Chi-square test for independence.

    4️⃣ Does a high battle win ratio correlate with Team Rocket membership?
    - 📉 Graph: KDE plot of win ratio distribution for both classes.
    - 🏆 Test: T-test for mean differences.

    5️⃣ Are migration patterns different for Team Rocket members?
    - 📈 Graph: Violin plot of migration counts per group.
    - 🌍 Test: Mann-Whitney U test.

    6️⃣ Do Rocket members tend to avoid charity participation?
    - 📊 Graph: Grouped bar chart of charity participation rates.
    - 🕵️‍♂️ Test: Fisher’s Exact Test for small sample sizes.

    7️⃣ Do Rocket members disguise themselves in certain professions?
    - 📊 Graph: Horizontal bar chart of profession frequency per group.
    - 🕵️‍♂️ Test: Chi-square test for profession-Team Rocket relationship.

    8️⃣ Is there an unusual cluster of Rocket members in specific cities?
    - 🗺 Graph: Geographic heatmap of city distributions.
    - 📌 Test: Spatial autocorrelation test.

    9️⃣ How does badge count affect the likelihood of being a Rocket member?
    - 📉 Graph: Histogram of gym badge distributions.
    - 🏅 Test: Kruskal-Wallis test.

    🔟 **Are there any multi-feature interactions that reve...
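
    Several of the questions above suggest a chi-square test between a categorical feature and Team Rocket membership. The following is a minimal sketch of the test statistic, written in plain Python so it has no dependencies. The counts are invented toy data, not values from the dataset, and the function name is hypothetical. It returns only the statistic and the degrees of freedom; a p-value would need a chi-square distribution function, for example from SciPy.

```python
from collections import Counter


def chi_square_statistic(pairs):
    """Chi-square statistic for a list of (category, label) pairs.

    Returns (statistic, degrees_of_freedom).
    """
    counts = Counter(pairs)
    categories = sorted({category for category, _ in counts})
    labels = sorted({label for _, label in counts})
    total = sum(counts.values())

    # Marginal totals for each category (row) and each label (column).
    row_totals = {c: sum(counts[(c, l)] for l in labels) for c in categories}
    col_totals = {l: sum(counts[(c, l)] for c in categories) for l in labels}

    statistic = 0.0
    for c in categories:
        for l in labels:
            # Count expected in this cell if category and label were independent.
            expected = row_totals[c] * col_totals[l] / total
            statistic += (counts[(c, l)] - expected) ** 2 / expected

    dof = (len(categories) - 1) * (len(labels) - 1)
    return statistic, dof


# Toy data (invented): Pokémon type versus Team Rocket membership.
toy_pairs = (
    [("Dark", "Yes")] * 30 + [("Dark", "No")] * 10
    + [("Water", "Yes")] * 10 + [("Water", "No")] * 30
)
statistic, dof = chi_square_statistic(toy_pairs)
print(statistic, dof)
```

    A large statistic relative to the degrees of freedom indicates that the feature and the label are unlikely to be independent.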

  18. Loan Approval Dataset

    • kaggle.com
    Updated Oct 15, 2024
    Cite
    Arbaaz Tamboli (2024). Loan Approval Dataset [Dataset]. https://www.kaggle.com/datasets/arbaaztamboli/loan-approval-dataset
    Dataset updated
    Oct 15, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Arbaaz Tamboli
    Description

    This dataset contains a wealth of information from 52,000 loan applications, offering detailed insights into the factors that influence loan approval decisions. Collected from financial institutions, this data is highly valuable for credit risk analysis, financial modeling, and predictive analytics. The dataset is particularly useful for anyone interested in applying machine learning techniques to real-world financial decision-making scenarios.

    Overview: This dataset provides information about various applicants and the loans they applied for, including their demographic details, income, loan terms, and approval status. By analyzing this data, one can gain an understanding of which factors are most critical for determining the likelihood of loan approval. The dataset can also help in evaluating credit risk and building robust credit scoring systems.

    Dataset Columns:

    • Applicant_ID: Unique identifier for each loan application.
    • Gender: Gender of the applicant (Male/Female).
    • Age: Age of the applicant.
    • Marital_Status: Marital status of the applicant (Single/Married).
    • Dependents: Number of dependents the applicant has.
    • Education: Education level of the applicant (Graduate/Not Graduate).
    • Employment_Status: Employment status of the applicant (Employed, Self-Employed, Unemployed).
    • Occupation_Type: Type of occupation, which provides insights into the nature of the applicant’s job (Salaried, Business, Others).
    • Residential_Status: Type of residence (Owned, Rented, Mortgage).
    • City/Town: The city or town where the applicant resides.
    • Annual_Income: The total annual income of the applicant, a key factor in loan eligibility.
    • Monthly_Expenses: The monthly expenses of the applicant, indicating their financial obligations.
    • Credit_Score: The applicant's credit score, reflecting their creditworthiness.
    • Existing_Loans: Number of existing loans the applicant is servicing.
    • Total_Existing_Loan_Amount: The total amount of all existing loans the applicant has.
    • Outstanding_Debt: The remaining amount of debt yet to be paid by the applicant.
    • Loan_History: The applicant’s previous loan history (Good/Bad), indicating their repayment reliability.
    • Loan_Amount_Requested: The loan amount the applicant has applied for.
    • Loan_Term: The term of the loan in months.
    • Loan_Purpose: The purpose of the loan (e.g., Home, Car, Education, Personal, Business).
    • Interest_Rate: The interest rate applied to the loan.
    • Loan_Type: The type of loan (Secured/Unsecured).
    • Co-Applicant: Indicates if there is a co-applicant for the loan (Yes/No).
    • Bank_Account_History: Applicant’s banking history, showing past transactions and reliability.
    • Transaction_Frequency: The frequency of financial transactions in the applicant’s bank account (Low/Medium/High).
    • Default_Risk: The risk level of the applicant defaulting on the loan (Low/Medium/High).
    • Loan_Approval_Status: Final decision on the loan application (Approved/Rejected).

  19. Chicago Veteran Owned Businesses

    • kaggle.com
    Updated Feb 1, 2020
    Cite
    City of Chicago (2020). Chicago Veteran Owned Businesses [Dataset]. https://www.kaggle.com/chicago/chicago-veteran-owned-businesses/code
    Dataset updated
    Feb 1, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    City of Chicago
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Chicago
    Description

    Content

    Cook County Certified Veteran Owned Businesses

    Context

    This is a dataset hosted by the City of Chicago. The city has an open data platform found here and they update their information according to the amount of data that is brought in. Explore the City of Chicago using Kaggle and all of the data sources available through the City of Chicago organization page!

    • Update Frequency: This dataset is updated monthly.

    Acknowledgements

    This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

    Cover photo by 刘 帅 on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  20. Retail Transactions Dataset

    • kaggle.com
    Updated May 18, 2024
    Cite
    Prasad Patil (2024). Retail Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/retail-transactions-dataset
    Dataset updated
    May 18, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Prasad Patil
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset was created to simulate a market basket dataset, providing insights into customer purchasing behavior and store operations. The dataset facilitates market basket analysis, customer segmentation, and other retail analytics tasks. Here's more information about the context and inspiration behind this dataset:

    Context:

    Retail businesses, from supermarkets to convenience stores, are constantly seeking ways to better understand their customers and improve their operations. Market basket analysis, a technique used in retail analytics, explores customer purchase patterns to uncover associations between products, identify trends, and optimize pricing and promotions. Customer segmentation allows businesses to tailor their offerings to specific groups, enhancing the customer experience.

    Inspiration:

    The inspiration for this dataset comes from the need for accessible and customizable market basket datasets. While real-world retail data is sensitive and often restricted, synthetic datasets offer a safe and versatile alternative. Researchers, data scientists, and analysts can use this dataset to develop and test algorithms, models, and analytical tools.

    Dataset Information:

    The columns provide information about the transactions, customers, products, and purchasing behavior, making the dataset suitable for various analyses, including market basket analysis and customer segmentation. Here's a brief explanation of each column in the Dataset:

    • Transaction_ID: A unique identifier for each transaction, represented as a 10-digit number. This column is used to uniquely identify each purchase.
    • Date: The date and time when the transaction occurred. It records the timestamp of each purchase.
    • Customer_Name: The name of the customer who made the purchase. It provides information about the customer's identity.
    • Product: A list of products purchased in the transaction. It includes the names of the products bought.
    • Total_Items: The total number of items purchased in the transaction. It represents the quantity of products bought.
    • Total_Cost: The total cost of the purchase, in currency. It represents the financial value of the transaction.
    • Payment_Method: The method used for payment in the transaction, such as credit card, debit card, cash, or mobile payment.
    • City: The city where the purchase took place. It indicates the location of the transaction.
    • Store_Type: The type of store where the purchase was made, such as a supermarket, convenience store, department store, etc.
    • Discount_Applied: A binary indicator (True/False) representing whether a discount was applied to the transaction.
    • Customer_Category: A category representing the customer's background or age group.
    • Season: The season in which the purchase occurred, such as spring, summer, fall, or winter.
    • Promotion: The type of promotion applied to the transaction, such as "None," "BOGO (Buy One Get One)," or "Discount on Selected Items."

    Use Cases:

    • Market Basket Analysis: Discover associations between products and uncover buying patterns.
    • Customer Segmentation: Group customers based on purchasing behavior.
    • Pricing Optimization: Optimize pricing strategies and identify opportunities for discounts and promotions.
    • Retail Analytics: Analyze store performance and customer trends.

    Note: This dataset is entirely synthetic and was generated using the Python Faker library, which means it doesn't contain real customer data. It's designed for educational and research purposes.
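
    As a rough illustration of the market basket analysis use case, the sketch below counts how often each pair of products appears in the same transaction and reports it as support (the share of transactions containing the pair). It assumes each Product value has already been parsed into a Python list of product names; the baskets and the function name are hypothetical and not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return {(item_a, item_b): support} for every co-occurring pair."""
    if not baskets:
        return {}
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical ordering.
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
    n = len(baskets)
    return {pair: count / n for pair, count in pair_counts.items()}


# Hypothetical transactions, one list of products per Transaction_ID.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "butter"],
]

support = pair_support(baskets)
for pair, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, value)
```

    Pairs with high support are candidates for further association-rule measures such as confidence and lift.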
