26 datasets found
  1. api token

    • kaggle.com
    Updated Apr 1, 2025
    Cite
    sundriedtomatoes (2025). api token [Dataset]. https://www.kaggle.com/datasets/tanushguha/api-token/suggestions
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    sundriedtomatoes
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by sundriedtomatoes

    Released under MIT


  2. codeparrot_1M

    • kaggle.com
    Updated Feb 25, 2024
    Cite
    Tanay Mehta (2024). codeparrot_1M [Dataset]. https://www.kaggle.com/datasets/heyytanay/codeparrot-1m
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Tanay Mehta
    Description

    A subset of the codeparrot/github-code dataset consisting of 1 million tokenized Python files in the Lance file format, for blazing fast and memory-efficient I/O.

    The files were tokenized using the EleutherAI/gpt-neox-20b tokenizer with no extra tokens.

    For detailed information on how the dataset was created, refer to my article on Curating Custom Datasets for efficient LLM training using Lance.

    The script used for creating the dataset can be found here.

    Instructions for using this dataset

    This dataset is not meant to be used in Kaggle Kernels: Lance requires write access to the dataset's input directory, Kaggle Kernels' input directory is read-only, and the dataset's size prohibits moving it to /kaggle/working. Hence, to use this dataset, download it via the Kaggle API or through this page, then move the unzipped files to a folder called codeparrot_1M.lance. Below are detailed snippets on how to download and use this dataset.

    First, download and unzip the dataset from your terminal (make sure your Kaggle API key is at ~/.kaggle/):

    $ pip install -q kaggle pyarrow pylance
    $ kaggle datasets download -d heyytanay/codeparrot-1m
    $ mkdir codeparrot_1M.lance/
    $ unzip -qq codeparrot-1m.zip -d codeparrot_1M.lance/
    $ rm codeparrot-1m.zip
    

    Once this is done, you will find the dataset in the codeparrot_1M.lance/ folder. To load it and get a sense of the data, run the snippet below.

    import lance

    # Open the Lance dataset and count its rows
    dataset = lance.dataset('codeparrot_1M.lance/')
    print(dataset.count_rows())
    

    This will give you the total number of tokens in the dataset.

    Considerations for Using the Data

    The dataset consists of source code from a wide range of repositories. As such, it can potentially include harmful or biased code, as well as sensitive information like passwords or usernames.

  3. amazon-product-data-2020

    • huggingface.co
    Cite
    Calm Goose, amazon-product-data-2020 [Dataset]. https://huggingface.co/datasets/calmgoose/amazon-product-data-2020
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Calm Goose
    License

    CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/

    Description

    What is this?

    This is a cleaned version of Amazon Product Dataset 2020 from Kaggle.

    Why?

    Using it via the Hugging Face API is easier; the Kaggle API is annoying because its authentication requires keeping credentials in a folder. Cleaned because 13 of the 28 columns are empty. A minimal load sketch follows.
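    A minimal sketch, assuming the Hugging Face datasets library and the default configuration of this repo:

    from datasets import load_dataset

    # Pull the cleaned dataset straight from the Hugging Face Hub;
    # no credential files are needed for a public dataset.
    dataset = load_dataset("calmgoose/amazon-product-data-2020")
    print(dataset)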

  4. ‘Doge Coin: An explosion’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 6, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Doge Coin: An explosion’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-doge-coin-an-explosion-98dc/eb83891a/?iid=002-697&v=presentation
    Explore at:
    Dataset updated
    Jan 6, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Doge Coin: An explosion’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/cyruskouhyar/doge-coin-an-explosion on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context 🔥

    This dataset contains Dogecoin prices from 2019 to the present.

    Content 💯

    Dogecoin prices with details: open, close, low, and high prices, and all related dates. The API I got the results from is CoinAPI; with the free plan you can access the REST API. I put the link below so you can also use it (see the sketch after the link).

    https://www.coinapi.io/
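    A hedged sketch of pulling daily DOGE/USD candles from CoinAPI's REST interface (the symbol ID and API key below are placeholders, not values taken from this dataset):

    import requests

    # Daily OHLCV history for DOGE/USD via CoinAPI (free-plan REST API)
    url = "https://rest.coinapi.io/v1/ohlcv/BITSTAMP_SPOT_DOGE_USD/history"
    headers = {"X-CoinAPI-Key": "YOUR_API_KEY"}  # placeholder key
    params = {"period_id": "1DAY", "time_start": "2019-01-01T00:00:00", "limit": 100}

    for candle in requests.get(url, headers=headers, params=params).json():
        print(candle["time_period_start"], candle["price_open"], candle["price_close"])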

    Acknowledgements✔️

    Thanks to CoinAPI for this amazing service. I will be happy if you upvote this dataset and follow my Kaggle profile. 😃 I did the same thing for Bitcoin: https://www.kaggle.com/cyruskouhyar/btcprices2015now

    --- Original source retains full ownership of the source dataset ---

  5. Top 100 Cryptos - 15 min cycles

    • kaggle.com
    Updated Mar 5, 2018
    Cite
    Idan Erez (2018). Top 100 Cryptos - 15 min cycles [Dataset]. https://www.kaggle.com/datasets/idanerez/top-100-cryptos-updates-every-15-min
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 5, 2018
    Dataset provided by
    Kaggle
    Authors
    Idan Erez
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    The past two months were crazy in the crypto market. The goal is to enable analysis of correlations between Bitcoin and other cryptocurrencies in order to support smarter day trading.

    Content

    This dataset was updated every 15 minutes using the CoinMarketCap API and includes the top 100 coins' market cap, price in USD, and price in BTC. Every row carries its update time in the EST time zone.
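    The public ticker endpoint used in 2018 has since been retired; a present-day equivalent pull would go through CoinMarketCap's keyed API. A sketch, assuming a valid API key (placeholder below):

    import requests

    # Top 100 coins by market cap, with USD quotes (API key is a placeholder)
    url = "https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest"
    headers = {"X-CMC_PRO_API_KEY": "YOUR_API_KEY"}
    params = {"start": 1, "limit": 100, "convert": "USD"}

    for coin in requests.get(url, headers=headers, params=params).json()["data"]:
        usd = coin["quote"]["USD"]
        print(coin["symbol"], usd["price"], usd["market_cap"])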

    Acknowledgements

    Coin Market Cap API

    Inspiration

    Who are the followers and leaders in the crypto market? When BTC goes down, which coins should be bought, and when? When it goes up, which coins start to rise after it while still leaving enough time to buy them?

  6. Credit Card Fraud Detection

    • test.researchdata.tuwien.at
    • zenodo.org
    • +1 more
    csv, json, pdf +2
    Updated Apr 28, 2025
    Cite
    Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22
    Explore at:
    Available download formats: csv, pdf, text/markdown, txt, json
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Ajdina Grizhja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 28, 2025
    Description

    Below is a draft DMP-style description of the credit-card fraud detection experiment:

    1. Dataset Description

    Research Domain
    This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.

    Purpose
    The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.

    Data Sources
    We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284,807 transactions, of which 492 are fraudulent.

    Method of Dataset Preparation

    1. Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.

    2. Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).

    3. Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.

    4. Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).

    5. Modeling: Trained a RandomForest classifier on the training split, tuned it on the validation split, and evaluated it on the held-out test set (see the sketch below).
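    A minimal sketch of steps 4 and 5, assuming a local CSV export of the training split (the file name is hypothetical; the real splits live in DBRepo under their own PIDs):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    df = pd.read_csv("creditcard_training.csv")  # hypothetical local export

    # Step 4: convert "Y"/"N" flags to 1/0 and drop non-feature identifiers
    flags = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]
    df[flags] = (df[flags] == "Y").astype(int)
    X = df.drop(columns=["actionnr", "merchant_id", "isfradulent"])
    y = df["isfradulent"]

    # Step 5: fit a RandomForest on the training split
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
    clf.fit(X, y)
    print(classification_report(y, clf.predict(X)))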

    2. Technical Details

    Dataset Structure

    • The raw data is a single CSV with columns:

      • actionnr (integer transaction ID)

      • merchant_id (string)

      • average_amount_transaction_day (float)

      • transaction_amount (float)

      • is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)

      • total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)

    Naming Conventions

    • All columns use lowercase snake_case.

    • Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.

    • Files in the code repo follow a clear structure:

      ├── data/         # local copies only; raw data lives in DBRepo 
      ├── notebooks/Task.ipynb 
      ├── models/rf_model_v1.joblib 
      ├── outputs/        # confusion_matrix.png, roc_curve.png, predictions.csv 
      ├── README.md 
      ├── requirements.txt 
      └── codemeta.json 
      

    Required Software

    • Python 3.9+

    • pandas, numpy (data handling)

    • scikit-learn (modeling, metrics)

    • matplotlib (visualizations)

    • dbrepo‐client.py (DBRepo API)

    • requests (TU WRD API)

    Additional Resources

    3. Further Details

    Data Limitations

    • Highly imbalanced: only ~0.17% of transactions are fraudulent.

    • Anonymized PCA features (V1–V28) are hidden; we extended the data with domain features but cannot reverse-engineer the raw variables.

    • Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.

    Licensing and Attribution

    • Raw data: CC-0 (per Kaggle terms)

    • Code & notebooks: MIT License

    • Model artifacts & outputs: CC-BY 4.0

    • TU WRD records include ORCID identifiers for the author.

    Recommended Uses

    • Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.

    • Educational purposes: demonstrating model‐training pipelines, FAIR data practices.

    • Extension: adding time‐series or deep‐learning models.

    Known Issues

    • Possible temporal leakage if date/time features not handled correctly.

    • Model performance may degrade on live data due to concept drift.

    • Binary flags may oversimplify nuanced transaction outcomes.

  7. Credit Card Fraud Detection

    • test.researchdata.tuwien.ac.at
    csv, json, pdf +1
    Updated Apr 28, 2025
    Cite
    Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22
    Explore at:
    Available download formats: text/markdown, csv, pdf, json
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Ajdina Grizhja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 28, 2025
    Description

    This record is a second deposit of the same dataset under the same DOI (10.82556/yvxj-9t22); its description is identical to that of dataset 6 above.

  8. Integrated Cryptocurrency Historical Data for a Predictive Data-Driven...

    • cryptodata.center
    Updated Dec 4, 2024
    + more versions
    Cite
    cryptodata.center (2024). Integrated Cryptocurrency Historical Data for a Predictive Data-Driven Decision-Making Algorithm - Dataset - CryptoData Hub [Dataset]. https://cryptodata.center/dataset/integrated-cryptocurrency-historical-data-for-a-predictive-data-driven-decision-making-algorithm
    Explore at:
    Dataset updated
    Dec 4, 2024
    Dataset provided by
    CryptoDATA
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cryptocurrency historical datasets from January 2012 (where available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs), including Yahoo Finance, CryptoDataDownload, CoinMarketCap, various Kaggle datasets, and multiple other APIs. While these sources used various time granularities (e.g., minutes, hours, days), a daily format was used to integrate the datasets in this research study.

    The integrated historical datasets cover 80 cryptocurrencies, including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), and SHIBA INU (SHIB), plus 60 more, and were uploaded to this online Mendeley data repository. Although the primary criterion for including the mentioned cryptocurrencies was market capitalization, a subject-matter expert (a professional trader) also guided the initial selection by analyzing various indicators such as the Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, the Stochastic Oscillator, and the Ichimoku Cloud.

    The primary features of this dataset, used as the decision-making criteria of the CLUS-MCDA II approach, are Timestamp, Open, High, Low, Close, Volume (Currency), % Change (7 days and 24 hours), Market Cap, and Weighted Price. The available Excel and CSV files in this dataset are just part of the integrated data; the other databases, datasets, and API references used in this study are as follows:

    [1] https://finance.yahoo.com/
    [2] https://coinmarketcap.com/historical/
    [3] https://cryptodatadownload.com/
    [4] https://kaggle.com/philmohun/cryptocurrency-financial-data
    [5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data
    [6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory
    [7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD
    [8] https://min-api.cryptocompare.com/
    [9] https://p.nomics.com/cryptocurrency-bitcoin-api
    [10] https://www.coinapi.io/
    [11] https://www.coingecko.com/en/api
    [12] https://cryptowat.ch/
    [13] https://www.alphavantage.co/

    This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDA II projects:
    https://aimaghsoodi.github.io/CLUSMCDA-R-Package/
    https://github.com/Aimaghsoodi/CLUS-MCDA-II
    https://github.com/azadkavian/CLUS-MCDA

  9. Global Financial Inclusion (Global Findex) Data

    • kaggle.com
    zip
    Updated May 16, 2019
    + more versions
    Cite
    World Bank (2019). Global Financial Inclusion (Global Findex) Data [Dataset]. https://www.kaggle.com/theworldbank/global-financial-inclusion-global-findex-data
    Explore at:
    Available download formats: zip (7,384,649 bytes)
    Dataset updated
    May 16, 2019
    Dataset authored and provided by
    World Bank (http://worldbank.org/)
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    The Global Financial Inclusion Database provides 800 country-level indicators of financial inclusion, summarized for all adults and disaggregated by key demographic characteristics: gender, age, education, income, and rural residence. Covering more than 140 economies, the indicators of financial inclusion measure how people save, borrow, make payments, and manage risk.

    The reference citation for the data is: Demirguc-Kunt, Asli, Leora Klapper, Dorothe Singer, and Peter Van Oudheusden. 2015. “The Global Findex Database 2014: Measuring Financial Inclusion around the World.” Policy Research Working Paper 7255, World Bank, Washington, DC.

    Context

    This is a dataset hosted by the World Bank. The organization has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore the World Bank using Kaggle and all of the data sources available through the World Bank organization page!

    • Update Frequency: This dataset is updated daily.

    Acknowledgements

    This dataset is maintained using the World Bank's APIs and Kaggle's API.

    Cover photo by ZACHARY STAINES on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  10. RAWG Games Dataset

    • kaggle.com
    Updated Feb 14, 2025
    + more versions
    Cite
    TF (2025). RAWG Games Dataset [Dataset]. https://www.kaggle.com/datasets/atalaydenknalbant/rawg-games-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 14, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    TF
    License

    CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/

    Description


    The RAWG Games Dataset contains video game records gathered directly from the RAWG API. It includes essential fields such as game id, title, release date, rating, genres, platforms, descriptive tags, Metacritic score, developers, publishers, playtime, and a detailed description. The data was collected to support studies, trend analysis, and insights into the gaming industry. Each field is aligned with the specifications provided in the RAWG API documentation.
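    A hedged sketch of such a pull (the API key is a placeholder; RAWG issues keys on registration):

    import requests

    # Fetch one page of game records from the RAWG API
    url = "https://api.rawg.io/api/games"
    params = {"key": "YOUR_API_KEY", "page_size": 20, "ordering": "-metacritic"}

    for game in requests.get(url, params=params).json()["results"]:
        print(game["id"], game["name"], game.get("released"), game.get("metacritic"))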

    Latest Update: February 14, 2025

    Acknowledgements

    Grateful to RAWG for data API.

    Field descriptions:

    id: A unique identifier for each game, serving as the primary key to reference detailed game data via the API.
    name: The official title of the game.
    released: The release date of the game, typically in YYYY-MM-DD format.
    rating: An aggregated score based on player reviews, computed on a standardized scale reflecting user opinions.
    genres: A list of genre objects categorizing the game (e.g., Action, Adventure, RPG).
    platforms: An array of platform objects indicating which systems the game is available on (e.g., PC, PlayStation, Xbox).
    tags: A collection of descriptive keyword tags (e.g., multiplayer, indie).
    metacritic: A numerical score derived from Metacritic reviews (usually ranging from 0 to 100).
    developers: The individuals or companies responsible for creating the game.
    publishers: Entities that market and distribute the game.
    playtime: An estimate of the average time (in hours) that players spend engaging with the game.
    description: A detailed narrative of the game, providing in-depth information about gameplay, plot, mechanics, and overall context.
  11. St. Louis Fed Economic News Index Real GDP Nowcast

    • kaggle.com
    zip
    Updated Dec 12, 2019
    + more versions
    Cite
    St. Louis Fed (2019). St. Louis Fed Economic News Index Real GDP Nowcast [Dataset]. https://www.kaggle.com/stlouisfed/st.-louis-fed-economic-news-index-real-gdp-nowcast
    Explore at:
    zip(1270 bytes)Available download formats
    Dataset updated
    Dec 12, 2019
    Dataset provided by
    Federal Reserve Bank of St. Louis (https://www.stlouisfed.org/)
    Authors
    St. Louis Fed
    Area covered
    St. Louis
    Description

    Content

    St. Louis Fed’s Economic News Index (ENI) uses economic content from key monthly economic data releases to forecast the growth of real GDP during that quarter. In general, the most-current observation is revised multiple times throughout the quarter. The final forecasted value (before the BEA’s release of the advance estimate of GDP) is the static, historical value for that quarter. For more information, see Grover, Sean P.; Kliesen, Kevin L.; and McCracken, Michael W., “A Macroeconomic News Index for Constructing Nowcasts of U.S. Real Gross Domestic Product Growth” (https://research.stlouisfed.org/publications/review/2016/12/05/a-macroeconomic-news-index-for-constructing-nowcasts-of-u-s-real-gross-domestic-product-growth/).

    Context

    This is a dataset from the Federal Reserve Bank of St. Louis hosted by the Federal Reserve Economic Database (FRED). FRED has a data platform found here and they update their information according to the frequency that the data updates. Explore the Federal Reserve Bank of St. Louis using Kaggle and all of the data sources available through the St. Louis Fed organization page!

    • Update Frequency: This dataset is updated daily.

    • Observation Start: 2013-04-01

    • Observation End: 2019-10-01

    Acknowledgements

    This dataset is maintained using FRED's API and Kaggle's API.

    Cover photo by Ferdinand Stöhr on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  12. St. Louis Fed Financial Stress Index

    • kaggle.com
    zip
    Updated Dec 11, 2019
    + more versions
    Cite
    St. Louis Fed (2019). St. Louis Fed Financial Stress Index [Dataset]. https://www.kaggle.com/stlouisfed/st.-louis-fed-financial-stress-index
    Explore at:
    zip(8684 bytes)Available download formats
    Dataset updated
    Dec 11, 2019
    Dataset provided by
    Federal Reserve Bank of St. Louis (https://www.stlouisfed.org/)
    Authors
    St. Louis Fed
    Description

    Content

    The STLFSI measures the degree of financial stress in the markets and is constructed from 18 weekly data series: seven interest rate series, six yield spreads and five other indicators. Each of these variables captures some aspect of financial stress. Accordingly, as the level of financial stress in the economy changes, the data series are likely to move together.

    How to Interpret the Index: The average value of the index, which begins in late 1993, is designed to be zero. Thus, zero is viewed as representing normal financial market conditions. Values below zero suggest below-average financial market stress, while values above zero suggest above-average financial market stress.

    More information: For additional information on the STLFSI and its construction, see "Measuring Financial Market Stress" (https://files.stlouisfed.org/research/publications/es/10/ES1002.pdf) and the related appendix (https://files.stlouisfed.org/files/htdocs/publications/net/NETJan2010Appendix.pdf).

    See this list (https://www.stlouisfed.org/news-releases/st-louis-fed-financial-stress-index/stlfsi-key) of the components that are used to construct the STLFSI.

    As of 07/15/2010 the Vanguard Financial Exchange-Traded Fund series has been replaced with the S&P 500 Financials Index. This change was made to facilitate a more timely and automated updating of the FSI. Switching from the Vanguard series to the S&P series produced no meaningful change in the index.

    Copyright, 2016, Federal Reserve Bank of St. Louis.

    Context

    This is a dataset from the Federal Reserve Bank of St. Louis hosted by the Federal Reserve Economic Database (FRED). FRED has a data platform found here and they update their information according to the frequency that the data updates. Explore the Federal Reserve Bank of St. Louis using Kaggle and all of the data sources available through the St. Louis Fed organization page!

    • Update Frequency: This dataset is updated daily.

    • Observation Start: 1993-12-31

    • Observation End: 2019-11-29

    Acknowledgements

    This dataset is maintained using FRED's API and Kaggle's API.
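    A hedged sketch of pulling the same series directly from FRED (assumes a free FRED API key; STLFSI is the series ID as of this dataset's vintage):

    import requests

    # Request STLFSI observations from the FRED API as JSON
    url = "https://api.stlouisfed.org/fred/series/observations"
    params = {"series_id": "STLFSI", "api_key": "YOUR_API_KEY", "file_type": "json"}

    observations = requests.get(url, params=params).json()["observations"]
    for obs in observations[-5:]:  # last five weekly readings
        print(obs["date"], obs["value"])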

    Cover photo by Laura Lefurgey-Smith on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  13. Historical Price Data of Tether (USDT)

    • kaggle.com
    Updated Dec 26, 2024
    Cite
    Sergio Nantenaina Raoelinirina (2024). Historical Price Data of Tether (USDT) [Dataset]. https://www.kaggle.com/datasets/sergioraoelinirina/historical-price-data-of-tether-usdt
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 26, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sergio Nantenaina Raoelinirina
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset provides comprehensive annual data on Tether (USDT), one of the most widely used stablecoins in the cryptocurrency ecosystem. The data includes key market metrics collected via the CoinGecko API, structured for in-depth analysis and versatile applications, such as market analysis, financial modeling, and machine learning algorithms.
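    A hedged sketch of collecting comparable metrics from the CoinGecko API (the free tier needs no key; the endpoint and parameters below are standard CoinGecko, not taken from this dataset):

    import requests

    # One year of daily USDT prices, market caps, and volumes
    url = "https://api.coingecko.com/api/v3/coins/tether/market_chart"
    params = {"vs_currency": "usd", "days": 365, "interval": "daily"}

    data = requests.get(url, params=params).json()
    for timestamp_ms, price in data["prices"][:5]:  # [timestamp_ms, value] pairs
        print(timestamp_ms, price)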

  14. Intrusion Detect. CICEV2023: DDoS Attack Profiling

    • kaggle.com
    zip
    Updated Mar 27, 2025
    Cite
    Agung Pambudi (2025). Intrusion Detect. CICEV2023: DDoS Attack Profiling [Dataset]. https://www.kaggle.com/datasets/agungpambudi/secure-intrusion-detection-ddos-attacks-profiling
    Explore at:
    Available download formats: zip (231,762,852 bytes)
    Dataset updated
    Mar 27, 2025
    Authors
    Agung Pambudi
    Description

    To cite the dataset please reference it as Y. Kim, S. Hakak, and A. Ghorbani. "DDoS Attack Dataset (CICEV2023) against EV Authentication in Charging Infrastructure," in 2023 20th Annual International Conference on Privacy, Security and Trust (PST), IEEE Computer Society, pp. 1-9, August 2023.

    Explore a comprehensive dataset capturing DDoS attack scenarios within electric vehicle (EV) charging infrastructure. This dataset features diverse machine learning attributes, including packet access counts, system status details, and authentication profiles across multiple charging stations and grid services. Simulated attack scenarios, authentication protocols, and extensive profiling results offer invaluable insights for training and testing detection models in safeguarding EV charging systems against cyber threats.

    Figure 1: Proposed simulator structure. Source: Y. Kim, S. Hakak, and A. Ghorbani.


    Acknowledgment:

    The authors sincerely appreciate the support provided by the Canadian Institute for Cybersecurity (CIC), as well as the funding received from the Canada Research Chair and the Atlantic Canada Opportunities Agency (ACOA).


    Reference:

    Y. Kim, S. Hakak, and A. Ghorbani. "DDoS Attack Dataset (CICEV2023) against EV Authentication in Charging Infrastructure," in 2023 20th Annual International Conference on Privacy, Security and Trust (PST), IEEE Computer Society, pp. 1-9, August 2023.

  15. Spotify API Data on The Beatles songs

    • kaggle.com
    Updated May 25, 2024
    Cite
    Suraj Karakulath (2024). Spotify API Data on The Beatles songs [Dataset]. https://www.kaggle.com/datasets/surajkarakulath/spotify-data-on-the-beatles-songs
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Suraj Karakulath
    Description

    Data on the song catalogue of The Beatles, along with audio features such as Tempo, Key, Mode, Energy, Loudness, Valence, and Danceability, among others, as assigned by the Spotify API.

  16. Data from: WRS DATASET

    • kaggle.com
    Updated Mar 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    VELOCIS (2025). WRS DATASET [Dataset]. https://www.kaggle.com/datasets/vbproductions/wrs-dataset-2
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 10, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    VELOCIS
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    🌍 AI Waste Recognition Dataset – Empowering Sustainable Solutions with Deep Learning

    📌 Overview

    The AI Waste Recognition Dataset is a high-quality dataset designed for training deep learning models to automatically classify and detect waste materials. With a growing need for smart waste management, this dataset provides a structured approach to recognizing four key waste categories:

    ♻ Plastic Bottles ♻ Aluminium Cans ♻ Paper Cups ♻ Glass Bottles

    By leveraging this dataset, researchers, data scientists, and AI enthusiasts can develop advanced computer vision models to enhance automated recycling systems, reduce environmental pollution, and contribute to a sustainable future.

    📊 Dataset Details

    🔹 Total Images: 100,000+ (augmented for diversity)
    🔹 Categories: 4 (Plastic Bottles, Aluminium Cans, Paper Cups, Glass Bottles)
    🔹 Resolution: High-quality 256x256 images
    🔹 Annotations: Labeled via folder names (stored in labels.csv)
    🔹 File Format: JPEG / PNG

    This dataset includes real-world waste images collected from various environments, augmented with advanced transformations to improve model generalization.

    🚀 Ideal Use Cases

    ✅ Object Detection & Classification – Train CNNs, YOLO, Faster R-CNN, etc.
    ✅ AI-Powered Recycling Bins – Automate waste sorting in smart bins.
    ✅ Environmental AI Research – Contribute to eco-friendly AI projects.
    ✅ Edge AI & IoT – Deploy waste detection models on edge devices.

    📥 How to Use?

    1️⃣ Download the dataset or load it via the Kaggle API.
    2️⃣ Use labels.csv to map images to their respective classes (see the sketch below).
    3️⃣ Train deep learning models using TensorFlow, PyTorch, or YOLO.
    4️⃣ Deploy your model for real-world waste classification!
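    A minimal sketch of step 2️⃣ (the column names are assumptions; check labels.csv for the actual schema):

    import pandas as pd

    # Map each image to its waste category via labels.csv
    labels = pd.read_csv("labels.csv")
    print(labels.head())
    print(labels["label"].value_counts())  # expect the four waste categories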

    🎯 Why This Dataset?

    🌟 Well-structured & diverse – Covers different lighting, backgrounds & perspectives.
    🌟 AI-ready – Optimized for deep learning & computer vision tasks.
    🌟 Promotes sustainability – Helps in developing AI solutions for waste management.
    🌟 Real-world applications – Supports smart cities & environmental research.

    🛠️ Get Started Today! Use this dataset to build innovative AI models, contribute to making the dataset better, and be part of VELOCIS.

    Let’s revolutionize waste management with AI! 🚀♻

    🔹 Keywords: AI Waste Detection, Smart Recycling, Object Recognition, Deep Learning, CNN, YOLO, Kaggle Dataset

  17. 785 Million Language Translation Database for AI

    • kaggle.com
    Updated Aug 28, 2023
    Cite
    Ramakrishnan Lakshmanan (2023). 785 Million Language Translation Database for AI [Dataset]. https://www.kaggle.com/datasets/ramakrishnan1984/785-million-language-translation-database-ai-ml
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 28, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ramakrishnan Lakshmanan
    License

    LGPL 3.0: http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    Our groundbreaking translation dataset represents a monumental advancement in the field of natural language processing and machine translation. Comprising a staggering 785 million records, this corpus bridges language barriers by offering translations from English to an astonishing 548 languages. The dataset promises to be a cornerstone resource for researchers, engineers, and developers seeking to enhance their machine translation models, cross-lingual analysis, and linguistic investigations.

    Size of the dataset: 41 GB uncompressed, 20 GB compressed.

    Key Features:

    Scope and Scale: With a comprehensive collection of 785 million records, this dataset provides an unparalleled wealth of translated text. Each record consists of an English sentence paired with its translation in one of the 548 target languages, enabling multi-directional translation applications.

    Language Diversity: Encompassing translations into 548 languages, this dataset represents a diverse array of linguistic families, dialects, and scripts. From widely spoken languages to those with limited digital representation, the dataset bridges communication gaps on a global scale.

    Quality and Authenticity: The translations have been meticulously curated, verified, and cross-referenced to ensure high quality and authenticity. This attention to detail guarantees that the dataset is not only extensive but also reliable, serving as a solid foundation for machine learning applications. Data is collected from various open datasets for my personal ML projects and looking to share it to team.

    Use Case Versatility: Researchers and practitioners across a spectrum of domains can harness this dataset for a myriad of applications. It facilitates the training and evaluation of machine translation models, empowers cross-lingual sentiment analysis, aids in linguistic typology studies, and supports cultural and sociolinguistic investigations.

    Machine Learning Advancement: Machine translation models, especially neural machine translation (NMT) systems, can leverage this dataset to enhance their training. The large-scale nature of the dataset allows for more robust and contextually accurate translation outputs.

    Fine-tuning and Customization: Developers can fine-tune translation models using specific language pairs, offering a powerful tool for specialized translation tasks. This customization capability ensures that the dataset is adaptable to various industries and use cases.

    Data Format: The dataset is provided in a structured JSON format, facilitating easy integration into existing machine learning pipelines. This structured approach expedites research and experimentation. Each JSON record contains an English word and its equivalent translation. The data was exported from a MongoDB database to ensure the uniqueness of the records; each record is unique and sorted.
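    A sketch of streaming such records, assuming one JSON record per line (the file name and field names are assumptions; adjust to the actual export):

    import json

    # Read translation records one line at a time to avoid loading 41 GB at once
    with open("translations.json", encoding="utf-8") as f:
        for i, line in enumerate(f):
            record = json.loads(line)  # e.g. {"en": "...", "<target_language>": "..."}
            print(record)
            if i == 4:
                break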

    Access: The dataset is available for academic and research purposes, enabling the global AI community to contribute to and benefit from its usage. A well-documented API and sample code are provided to expedite exploration and integration.

    The English-to-548-languages translation dataset represents an incredible leap forward in advancing multilingual communication, breaking down barriers to understanding, and fostering collaboration on a global scale. It holds the potential to reshape how we approach cross-lingual communication, linguistic studies, and the development of cutting-edge translation technologies.

    Dataset Composition: The dataset is a culmination of translations from English, a widely spoken and understood language, into 548 distinct languages. Each language represents a unique linguistic and cultural background, providing a rich array of translation contexts. This diverse range of languages spans across various language families, regions, and linguistic complexities, making the dataset a comprehensive repository for linguistic research.

    Data Volume and Scale: With a staggering 785 million records, the dataset boasts an immense scale that captures a vast array of translations and linguistic nuances. Each translation entry consists of an English source text paired with its corresponding translation in one of the 548 target languages. This vast corpus allows researchers and practitioners to explore patterns, trends, and variations across languages, enabling the development of robust and adaptable translation models.

    Linguistic Coverage: The dataset covers an extensive set of languages, including but not limited to Indo-European, Afroasiatic, Sino-Tibetan, Austronesian, Niger-Congo, and many more. This broad linguistic coverage ensures that languages with varying levels of grammatical complexity, vocabulary richness, and syntactic structures are included, enhancing the applicability of translation models across diverse linguistic landscapes.

    Dataset Preparation: The translation ...

  18. Counter Strike 2 Win Prediction (FACEIT)

    • kaggle.com
    Updated Mar 29, 2025
    Cite
    Pierce Hentosh (2025). Counter Strike 2 Win Prediction (FACEIT) [Dataset]. https://www.kaggle.com/datasets/piercehentosh/counter-strike-2-win-prediction-faceit
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 29, 2025
    Dataset provided by
    Kaggle
    Authors
    Pierce Hentosh
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains match statistics from FACEIT, an online matchmaking service for Counter-Strike 2 (CS2). The data was collected via the FACEIT API* (see below) and preprocessed for machine learning applications focused on predicting match outcomes from each team's average player performance history and Elo. Use win_prediciton_clean, or join the individual Excel files data_win_prediction_#.xlsx (the Excel files are not clean and contain duplicates and non-competitive maps).

    Dataset Details:
    - Observations: 9,651 matches
    - Response: win (the team that won the given match, a or b)
    - Match ID: the ID given by FACEIT for the match played; can be used to pull additional match data from the API.*

    Features:
    - Average Win Percentage (for the given map)
    - Average ELO (team skill rating)
    - Average Kills per Round (K/R Ratio)

    Also attached are the notebooks used to pull the data, engineer features, and tune models. The highest predictive accuracy I was able to get was 77.11% ± 0.84 using a CNN.

    *If you'd like to pull data from the FACEIT API, you need an authorization token from FACEIT; more information is available at https://docs.faceit.com/.
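    A hedged sketch of pulling one match via the FACEIT Data API (the bearer token and match ID below are placeholders; real match IDs come from this dataset's match ID column):

    import requests

    # Fetch details for a single match from the FACEIT Data API v4
    match_id = "1-abc-123"  # placeholder; use a match ID from the dataset
    url = f"https://open.faceit.com/data/v4/matches/{match_id}"
    headers = {"Authorization": "Bearer YOUR_FACEIT_API_KEY"}

    print(requests.get(url, headers=headers).json())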

  19. Taylor Swift Dataset (TTPD included)

    • kaggle.com
    Updated Jun 6, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yamini Manral (2024). Taylor Swift Dataset (TTPD included) [Dataset]. https://www.kaggle.com/datasets/yaminimanral/taylor-swift-dataset-ttpd-included
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 6, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yamini Manral
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset encompasses all albums released by the renowned US songwriter and artist Taylor Swift, up to and including June 6, 2024. The most recent addition to this collection is "The Tortured Poets Department: The Anthology," featuring 31 tracks. The dataset has been generated using the Python library SpotiPy and is provided in an untouched, unfiltered, and raw state, making it ideal for model training, data analysis, or visualization projects.

    Key Features: - Comprehensive Collection: Includes all of Taylor Swift's albums released by June 6, 2024. - Latest Album: "The Tortured Poets Department: The Anthology" with 31 tracks. - Raw and Unfiltered: The dataset is presented in its original form without any modifications, ensuring the authenticity of the data. - Generated with SpotiPy: Data extracted using the SpotiPy library, ensuring accuracy and reliability.

    Usage Notes: - Multiple Versions of Albums: Be aware that the dataset includes multiple versions of some albums. This means that tracks and their details may appear more than once if they are present in different album versions. - Model Training and Visualization: The dataset's comprehensive and unaltered nature makes it an excellent resource for various applications, including machine learning model training, data analysis, and visualizations.

    Potential Applications: - Music Analysis: Analyze trends, patterns, and characteristics of Taylor Swift's music over the years. - Machine Learning: Train models for music recommendation, genre classification, or popularity prediction. - Data Visualization: Create visual representations of Taylor Swift's discography, track features, and album details.

    Dataset Contents: - Album Details: Information about each album, including release dates, album names, and the number of tracks. - Track Information: Details about each track, such as track names, durations, and other relevant metadata. - Track Audio feature: Includes features like danceability, energy, acousticness, speechiness, etc. Note: Description of Audio Features have directly been taken from Spotify API description for each term to eliminate any confusion.

    Acknowledgements: This dataset was created using the SpotiPy library, a Python client for the Spotify Web API, which allows for easy access to Spotify's vast music catalog.
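    A hedged sketch of the kind of SpotiPy call behind such a dataset (the client credentials are placeholders; the exact extraction script is not included here):

    import spotipy
    from spotipy.oauth2 import SpotifyClientCredentials

    # Authenticate with the client-credentials flow (IDs are placeholders)
    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
        client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))

    # List a few Taylor Swift albums with their release dates
    results = sp.search(q="artist:Taylor Swift", type="album", limit=5)
    for album in results["albums"]["items"]:
        print(album["name"], album["release_date"])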

    We hope this dataset provides valuable insights and facilitates various analyses and applications related to Taylor Swift's music.

    For any questions or issues, please feel free to contact us through the Kaggle community forum.

    Enjoy exploring Taylor Swift's musical journey!

  20. UAE Real Estate 2024 Dataset

    • kaggle.com
    Updated Aug 20, 2024
    Cite
    Kanchana1990 (2024). UAE Real Estate 2024 Dataset [Dataset]. http://doi.org/10.34740/kaggle/ds/5567442
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 20, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kanchana1990
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    United Arab Emirates
    Description

    Dataset Overview

    This dataset provides a detailed snapshot of real estate properties listed in Dubai, UAE, as of August 2024. The dataset includes over 5,000 listings scraped using the Apify API from Propertyfinder and various other real estate websites in the UAE. The data includes key details such as the number of bedrooms and bathrooms, price, location, size, and whether the listing is verified. All personal identifiers, such as agent names and contact details, have been ethically removed.

    Data Science Applications

    Given the size and structure of this dataset, it is ideal for the following data science applications:

    • Price Prediction Models: Predicting the price of properties based on features like location, size, and furnishing status.
    • Market Analysis: Understanding trends in the Dubai real estate market by analyzing price distributions, property types, and locations.
    • Recommendation Systems: Developing systems to recommend properties based on user preferences (e.g., number of bedrooms, budget).
    • Sentiment Analysis: Extracting and analyzing sentiments from the property descriptions to gauge the market's tone.

    This dataset provides a practical foundation for both beginners and experts in data science, allowing for the exploration of real estate trends, development of predictive models, and implementation of machine learning algorithms.

    Column Descriptors

    • title: The listing's title, summarizing the key selling points of the property.
    • displayAddress: The public address of the property, including the community and city.
    • bathrooms: The number of bathrooms available in the property.
    • bedrooms: The number of bedrooms available in the property.
    • addedOn: The timestamp indicating when the property was added to the listing platform.
    • type: Specifies whether the property is residential, commercial, etc.
    • price: The listed price of the property in AED.
    • verified: A boolean value indicating whether the listing has been verified by the platform.
    • priceDuration: Indicates if the property is listed for sale or rent.
    • sizeMin: The minimum size of the property in square feet.
    • furnishing: Describes whether the property is furnished, unfurnished, or partially furnished.
    • description: A more detailed narrative about the property, including additional features and selling points.

    Ethically Mined Data

    This dataset was ethically scraped using the Apify API, ensuring compliance with data privacy standards. All personal data such as agent names, phone numbers, and any other sensitive information have been omitted from this dataset to ensure privacy and ethical use. The data is intended solely for educational purposes and should not be used for commercial activities.

    Acknowledgements

    This dataset was made possible thanks to the following:

    • Apify: For providing the API to ethically scrape the data.
    • Propertyfinder and various other real estate websites in the UAE for the original listings.
    • Kaggle: For providing the platform to share and analyze this dataset.

    • Photo by: Francesca Tosolini on Unsplash

    Use the Data Responsibly

    Please ensure that this dataset is used responsibly, with respect to privacy and data ethics. This data is provided for educational purposes.
