MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by sundriedtomatoes
Released under MIT
A subset of the codeparrot/github-code dataset consisting of 1 million tokenized Python files in the Lance file format, for fast and memory-efficient I/O.
The files were tokenized using the EleutherAI/gpt-neox-20b tokenizer with no extra tokens.
For detailed information on how the dataset was created, refer to my article on Curating Custom Datasets for efficient LLM training using Lance.
The script used for creating the dataset can be found here.
This dataset is not intended to be used in Kaggle Kernels: Lance requires write access to the dataset's input directory, which Kaggle's input directory does not provide, and the dataset's size prohibits moving it to /kaggle/working. To use this dataset, download it with the Kaggle API or through this page, then move the unzipped files to a folder called codeparrot_1M.lance. Below are detailed snippets on how to download and use this dataset.
First, download and unzip the dataset from your terminal (make sure your Kaggle API key is at ~/.kaggle/):
$ pip install -q kaggle pyarrow pylance
$ kaggle datasets download -d heyytanay/codeparrot-1m
$ mkdir codeparrot_1M.lance/
$ unzip -qq codeparrot-1m.zip -d codeparrot_1M.lance/
$ rm codeparrot-1m.zip
Once this is done, you will find the dataset in the codeparrot_1M.lance/ folder. To load it and get a gist of the data, run the snippet below.
import lance
dataset = lance.dataset('codeparrot_1M.lance/')
print(dataset.count_rows())
This will give you the total number of tokens in the dataset.
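If you want to read actual token data rather than just count rows, Lance supports fast random-access reads. Below is a minimal sketch; the column layout is an assumption (it is not stated above), so check dataset.schema for the real column names first.
import lance

ds = lance.dataset('codeparrot_1M.lance/')
print(ds.schema)  # confirm the actual column name(s) before indexing

# Read a contiguous window of rows without loading the whole dataset into memory
indices = list(range(0, 2048))
batch = ds.take(indices)   # returns a pyarrow.Table with only the requested rows
print(batch.num_rows)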
Considerations for Using the Data The dataset consists of source code from a wide range of repositories. As such they can potentially include harmful or biased code as well as sensitive information like passwords or usernames.
https://choosealicense.com/licenses/cc0-1.0/
What is this?
This is a cleaned version of Amazon Product Dataset 2020 from Kaggle.
Why?
Accessing it via the Hugging Face API is easier; the Kaggle API is annoying because authentication requires keeping credentials in a local folder. The data was cleaned because 13 of the 28 columns are empty.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Doge Coin: An explosion’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/cyruskouhyar/doge-coin-an-explosion on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains Dogecoin prices from 2019 to the present.
Dogecoin prices with details: open, close, low, and high prices, along with all related dates. The API I got the results from is CoinAPI; with the free plan you can access the REST API. I put the link below so you can use it as well.
Thanks to CoinAPI for this amazing service. I will be happy if you upvote the dataset and follow my Kaggle profile. 😃 I did the same thing for Bitcoin: https://www.kaggle.com/cyruskouhyar/btcprices2015now
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
The past two months were crazy in the crypto market. The goal is to allow analysis of correlations between Bitcoin and other cryptocurrencies in order to do smarter day trading.
This dataset was updated every 15 minutes using the CoinMarketCap API and includes the top 100 coins' market cap, price in USD, and price in BTC. Every row has its update time in the EST time zone.
Coin Market Cap API
Who are the followers and leaders in the crypto market? When BTC goes down - what coins should be bought and when? When it goes up - which coins start to rise following it but still giving us enough time to buy them?
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Below is a draft DMP-style description of the credit-card fraud detection experiment:
Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.
Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.
Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.
Method of Dataset Preparation
Schema validation: Renamed columns to snake_case (e.g., transaction_amount, is_declined) so they conform to DBRepo's requirements.
Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).
Splitting: Programmatically derived three subsets (training 70%, validation 15%, test 15%) using range-based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.
Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non-feature identifiers (actionnr, merchant_id).
Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
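A minimal sketch of the cleaning, splitting, and modeling steps above, assuming pandas and scikit-learn and the column names listed under Dataset Structure (the CSV file name and the exact split boundaries are illustrative, not taken from the original workflow):
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("creditcard_fraud.csv")   # hypothetical local file name

# Convert the Y/N categorical flags to 1/0
flag_cols = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]
df[flag_cols] = (df[flag_cols] == "Y").astype(int)

# Target and features: drop non-feature identifiers
y = df["isfradulent"]
X = df.drop(columns=["actionnr", "merchant_id", "isfradulent"])

# Range-based 70/15/15 split following the primary-key order (illustrative boundaries)
n = len(df)
train_end, val_end = int(0.70 * n), int(0.85 * n)
X_train, y_train = X.iloc[:train_end], y.iloc[:train_end]
X_val, y_val = X.iloc[train_end:val_end], y.iloc[train_end:val_end]
X_test, y_test = X.iloc[val_end:], y.iloc[val_end:]

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))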
Dataset Structure
The raw data is a single CSV with columns:
actionnr (integer transaction ID)
merchant_id (string)
average_amount_transaction_day (float)
transaction_amount (float)
is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)
total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)
Naming Conventions
All columns use lowercase snake_case.
Subsets are named creditcard_training, creditcard_validation, and creditcard_test in DBRepo.
Files in the code repo follow a clear structure:
├── data/ # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/ # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json
Required Software
Python 3.9+
pandas, numpy (data handling)
scikit-learn (modeling, metrics)
matplotlib (visualizations)
dbrepo_client.py (DBRepo API)
requests (TU WRD API)
Additional Resources
Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
Scikit-learn docs: https://scikit-learn.org/stable
DBRepo API guide: via the starter notebook’s dbrepo_client.py template
TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs
Data Limitations
Highly imbalanced: only ~0.17% of transactions are fraudulent.
Anonymized PCA features (V1–V28) are hidden; we extended the data with domain features but cannot reverse-engineer the raw variables.
Time-bounded: covers only two days of transactions and may not capture seasonal patterns.
Licensing and Attribution
Raw data: CC-0 (per Kaggle terms)
Code & notebooks: MIT License
Model artifacts & outputs: CC-BY 4.0
TU WRD records include ORCID identifiers for the author.
Recommended Uses
Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.
Educational purposes: demonstrating model‐training pipelines, FAIR data practices.
Extension: adding time‐series or deep‐learning models.
Known Issues
Possible temporal leakage if date/time features are not handled correctly.
Model performance may degrade on live data due to concept drift.
Binary flags may oversimplify nuanced transaction outcomes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cryptocurrency historical datasets from January 2012 (where available) to October 2021 were obtained and integrated from various sources and Application Programming Interfaces (APIs), including Yahoo Finance, CryptoDataDownload, CoinMarketCap, various Kaggle datasets, and several other APIs. While these sources used various time granularities (e.g., minutes, hours, days), daily data was used to integrate the datasets in this research study. The integrated cryptocurrency historical datasets for 80 cryptocurrencies, including but not limited to Bitcoin (BTC), Ethereum (ETH), Binance Coin (BNB), Cardano (ADA), Tether (USDT), Ripple (XRP), Solana (SOL), Polkadot (DOT), USD Coin (USDC), Dogecoin (DOGE), Tron (TRX), Bitcoin Cash (BCH), Litecoin (LTC), EOS (EOS), Cosmos (ATOM), Stellar (XLM), Wrapped Bitcoin (WBTC), Uniswap (UNI), Terra (LUNA), SHIBA INU (SHIB), and 60 more, were uploaded to this online Mendeley data repository. Although the primary criterion for including the mentioned cryptocurrencies was market capitalization, a subject-matter expert (a professional trader) also guided the initial selection by analyzing various indicators such as the Relative Strength Index (RSI), Moving Average Convergence/Divergence (MACD), MYC Signals, Bollinger Bands, Fibonacci Retracement, the Stochastic Oscillator, and the Ichimoku Cloud. The primary features of this dataset that were used as the decision-making criteria of the CLUS-MCDA II approach are Timestamps, Open, High, Low, Close, Volume (Currency), % Change (7 days and 24 hours), Market Cap, and Weighted Price values.
The available Excel and CSV files in this data set are just part of the integrated data; the other databases, datasets, and API references used in this study are as follows:
[1] https://finance.yahoo.com/
[2] https://coinmarketcap.com/historical/
[3] https://cryptodatadownload.com/
[4] https://kaggle.com/philmohun/cryptocurrency-financial-data
[5] https://kaggle.com/deepshah16/meme-cryptocurrency-historical-data
[6] https://kaggle.com/sudalairajkumar/cryptocurrencypricehistory
[7] https://min-api.cryptocompare.com/data/price?fsym=BTC&tsyms=USD
[8] https://min-api.cryptocompare.com/
[9] https://p.nomics.com/cryptocurrency-bitcoin-api
[10] https://www.coinapi.io/
[11] https://www.coingecko.com/en/api
[12] https://cryptowat.ch/
[13] https://www.alphavantage.co/
This dataset is part of the CLUS-MCDA (Cluster analysis for improving Multiple Criteria Decision Analysis) and CLUS-MCDA II project:
https://aimaghsoodi.github.io/CLUSMCDA-R-Package/
https://github.com/Aimaghsoodi/CLUS-MCDA-II
https://github.com/azadkavian/CLUS-MCDA
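Reference [7] above is a live price endpoint. A minimal sketch of querying it with Python's requests library is shown below; the shape of the response noted in the comment is an assumption to verify against the CryptoCompare documentation.
import requests

url = "https://min-api.cryptocompare.com/data/price"
resp = requests.get(url, params={"fsym": "BTC", "tsyms": "USD"}, timeout=10)
resp.raise_for_status()
print(resp.json())   # expected shape, e.g. {"USD": <current BTC price>}; verify against the API docs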
https://creativecommons.org/publicdomain/zero/1.0/
The Global Financial Inclusion Database provides 800 country-level indicators of financial inclusion summarized for all adults and disaggregated by key demographic characteristics-gender, age, education, income, and rural residence. Covering more than 140 economies, the indicators of financial inclusion measure how people save, borrow, make payments and manage risk.
The reference citation for the data is: Demirguc-Kunt, Asli, Leora Klapper, Dorothe Singer, and Peter Van Oudheusden. 2015. “The Global Findex Database 2014: Measuring Financial Inclusion around the World.” Policy Research Working Paper 7255, World Bank, Washington, DC.
This is a dataset hosted by the World Bank. The organization has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore the World Bank using Kaggle and all of the data sources available through the World Bank organization page!
This dataset is maintained using the World Bank's APIs and Kaggle's API.
Cover photo by ZACHARY STAINES on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
https://creativecommons.org/publicdomain/zero/1.0/
The RAWG Games Dataset contains video game records gathered directly from the RAWG API. It includes essential fields such as game id, title, release date, rating, genres, platforms, descriptive tags, Metacritic score, developers, publishers, playtime, and a detailed description. The data was collected to support studies, trend analysis, and insights into the gaming industry. Each field is aligned with the specifications provided in the RAWG API documentation; a request sketch follows the field table below.
Latest Update: February 14, 2025
Grateful to RAWG for the data API.
Field | Description |
---|---|
id | A unique identifier for each game, serving as the primary key to reference detailed game data via the API. |
name | The official title of the game. |
released | The release date of the game, typically in the YYYY-MM-DD format. |
rating | An aggregated score based on player reviews, computed on a standardized scale reflecting user opinions. |
genres | A list of genre objects categorizing the game (e.g., Action, Adventure, RPG). |
platforms | An array of platform objects that indicate on which systems the game is available (e.g., PC, PlayStation, Xbox). |
tags | A collection of descriptive keyword tags (e.g., multiplayer, indie). |
metacritic | A numerical score derived from Metacritic reviews (usually ranging from 0 to 100). |
developers | The individuals or companies responsible for creating the game. |
publishers | Entities that market and distribute the game. |
playtime | An estimate of the average time (in hours) that players spend engaging with the game. |
description | A detailed narrative of the game, providing in-depth information about gameplay, plot, mechanics, and overall context. |
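As noted in the id row above, each identifier can be used to pull detailed game data from the RAWG API. A minimal sketch with Python's requests library, assuming you have a RAWG API key; the endpoint and parameters should be verified against the RAWG API documentation, and the game id shown is a placeholder.
import requests

API_KEY = "YOUR_RAWG_API_KEY"   # issued by rawg.io
game_id = 3498                  # placeholder; use a value from the dataset's id column

resp = requests.get(
    f"https://api.rawg.io/api/games/{game_id}",   # endpoint per the RAWG API docs; verify before use
    params={"key": API_KEY},
    timeout=10,
)
resp.raise_for_status()
game = resp.json()
print(game["name"], game.get("metacritic"))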
St. Louis Fed’s Economic News Index (ENI) uses economic content from key monthly economic data releases to forecast the growth of real GDP during that quarter. In general, the most-current observation is revised multiple times throughout the quarter. The final forecasted value (before the BEA’s release of the advance estimate of GDP) is the static, historical value for that quarter. For more information, see Grover, Sean P.; Kliesen, Kevin L.; and McCracken, Michael W. “A Macroeconomic News Index for Constructing Nowcasts of U.S. Real Gross Domestic Product Growth" (https://research.stlouisfed.org/publications/review/2016/12/05/a-macroeconomic-news-index-for-constructing-nowcasts-of-u-s-real-gross-domestic-product-growth/ )
This is a dataset from the Federal Reserve Bank of St. Louis hosted by the Federal Reserve Economic Database (FRED). FRED has a data platform found here and they update their information according to the frequency that the data updates. Explore the Federal Reserve Bank of St. Louis using Kaggle and all of the data sources available through the St. Louis Fed organization page!
Update Frequency: This dataset is updated daily.
Observation Start: 2013-04-01
Observation End: 2019-10-01
This dataset is maintained using FRED's API and Kaggle's API.
Cover photo by Ferdinand Stöhr on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
The STLFSI measures the degree of financial stress in the markets and is constructed from 18 weekly data series: seven interest rate series, six yield spreads and five other indicators. Each of these variables captures some aspect of financial stress. Accordingly, as the level of financial stress in the economy changes, the data series are likely to move together.
How to Interpret the Index: The average value of the index, which begins in late 1993, is designed to be zero. Thus, zero is viewed as representing normal financial market conditions. Values below zero suggest below-average financial market stress, while values above zero suggest above-average financial market stress.
More information: For additional information on the STLFSI and its construction, see "Measuring Financial Market Stress" (https://files.stlouisfed.org/research/publications/es/10/ES1002.pdf) and the related appendix (https://files.stlouisfed.org/files/htdocs/publications/net/NETJan2010Appendix.pdf).
See this list (https://www.stlouisfed.org/news-releases/st-louis-fed-financial-stress-index/stlfsi-key) of the components that are used to construct the STLFSI.
As of 07/15/2010 the Vanguard Financial Exchange-Traded Fund series has been replaced with the S&P 500 Financials Index. This change was made to facilitate a more timely and automated updating of the FSI. Switching from the Vanguard series to the S&P series produced no meaningful change in the index.
Copyright, 2016, Federal Reserve Bank of St. Louis.
This is a dataset from the Federal Reserve Bank of St. Louis hosted by the Federal Reserve Economic Database (FRED). FRED has a data platform found here and they update their information according to the frequency that the data updates. Explore the Federal Reserve Bank of St. Louis using Kaggle and all of the data sources available through the St. Louis Fed organization page!
Update Frequency: This dataset is updated daily.
Observation Start: 1993-12-31
Observation End: 2019-11-29
This dataset is maintained using FRED's API and Kaggle's API.
Cover photo by Laura Lefurgey-Smith on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset provides comprehensive annual data on Tether (USDT), one of the most widely used stablecoins in the cryptocurrency ecosystem. The data includes key market metrics collected via the CoinGecko API, structured for in-depth analysis and versatile applications, such as market analysis, financial modeling, and machine learning algorithms.
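For readers who want to refresh or extend the annual metrics, a minimal sketch of pulling Tether market data from the CoinGecko API is shown below. The endpoint and parameters are assumptions to verify against the CoinGecko documentation; this is not the exact script used to build the dataset.
import requests

# Daily USDT market data for the last 365 days (verify the endpoint against the CoinGecko docs)
url = "https://api.coingecko.com/api/v3/coins/tether/market_chart"
resp = requests.get(url, params={"vs_currency": "usd", "days": 365}, timeout=10)
resp.raise_for_status()
data = resp.json()
print(len(data["prices"]), "price points")   # 'prices' is assumed to hold [timestamp_ms, price] pairs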
To cite the dataset please reference it as Y. Kim, S. Hakak, and A. Ghorbani. "DDoS Attack Dataset (CICEV2023) against EV Authentication in Charging Infrastructure," in 2023 20th Annual International Conference on Privacy, Security and Trust (PST), IEEE Computer Society, pp. 1-9, August 2023.
Explore a comprehensive dataset capturing DDoS attack scenarios within electric vehicle (EV) charging infrastructure. This dataset features diverse machine learning attributes, including packet access counts, system status details, and authentication profiles across multiple charging stations and grid services. Simulated attack scenarios, authentication protocols, and extensive profiling results offer invaluable insights for training and testing detection models in safeguarding EV charging systems against cyber threats.
Figure 1: Proposed simulator structure, source: Y. Kim, S. Hakak, and A. Ghorbani.
Acknowledgment :
The authors sincerely appreciate the support provided by the Canadian Institute for Cybersecurity (CIC), as well as the funding received from the Canada Research Chair and the Atlantic Canada Opportunities Agency (ACOA).
Reference :
Y. Kim, S. Hakak, and A. Ghorbani. "DDoS Attack Dataset (CICEV2023) against EV Authentication in Charging Infrastructure," in 2023 20th Annual International Conference on Privacy, Security and Trust (PST), IEEE Computer Society, pp. 1-9, August 2023.
Data on the song catalogue of The Beatles, along with audio features such as Tempo, Key, Mode, Energy, Loudness, Valence, and Danceability, among others, as assigned by the Spotify API.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
🌍 AI Waste Recognition Dataset – Empowering Sustainable Solutions with Deep Learning
📌 Overview
The AI Waste Recognition Dataset is a high-quality dataset designed for training deep learning models to automatically classify and detect waste materials. With a growing need for smart waste management, this dataset provides a structured approach to recognizing four key waste categories:
♻ Plastic Bottles ♻ Aluminium Cans ♻ Paper Cups ♻ Glass Bottles
By leveraging this dataset, researchers, data scientists, and AI enthusiasts can develop advanced computer vision models to enhance automated recycling systems, reduce environmental pollution, and contribute to a sustainable future.
📊 Dataset Details 🔹 Total Images: 100,000+ (Augmented for diversity) 🔹 Categories: 4 (Plastic Bottles, Aluminium Cans, Paper Cups, Glass Bottles) 🔹 Resolution: High-quality 256x256 images 🔹 Annotations: Labeled with folder names (stored in labels.csv) 🔹 File Format: JPEG / PNG
This dataset includes real-world waste images collected from various environments, augmented with advanced transformations to improve model generalization.
🚀 Ideal Use Cases ✅ Object Detection & Classification – Train CNNs, YOLO, Faster R-CNN, etc. ✅ AI-Powered Recycling Bins – Automate waste sorting in smart bins. ✅ Environmental AI Research – Contribute to eco-friendly AI projects. ✅ Edge AI & IoT – Deploy waste detection models on edge devices.
📥 How to Use? 1️⃣ Download the dataset or load it via Kaggle API. 2️⃣ Use labels.csv to map images to their respective classes. 3️⃣ Train deep learning models using TensorFlow, PyTorch, or YOLO. 4️⃣ Deploy your model for real-world waste classification!
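A minimal sketch of step 3 using PyTorch, assuming labels.csv contains image-path and class-label columns; the column names image and label below are hypothetical, so adjust them to the actual file.
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

class WasteDataset(Dataset):
    """Maps rows of labels.csv to (image_tensor, class_index) pairs."""
    def __init__(self, csv_path="labels.csv", root="."):
        self.df = pd.read_csv(csv_path)            # assumed columns: 'image', 'label'
        self.classes = sorted(self.df["label"].unique())
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}
        self.root = root
        self.tf = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])

    def __len__(self):
        return len(self.df)

    def __getitem__(self, i):
        row = self.df.iloc[i]
        img = Image.open(f"{self.root}/{row['image']}").convert("RGB")
        return self.tf(img), self.class_to_idx[row["label"]]

loader = DataLoader(WasteDataset(), batch_size=32, shuffle=True)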
🎯 Why This Dataset? 🌟 Well-structured & diverse – Covers different lighting, backgrounds & perspectives. 🌟 AI-ready – Optimized for deep learning & computer vision tasks. 🌟 Promotes sustainability – Helps in developing AI solutions for waste management. 🌟 Real-world applications – Supports smart cities & environmental research.
🛠️ Get Started Today! Use this dataset to build innovative AI models, contribute to making the dataset better, and be part of VELOCIS.
🔹 Keywords: AI Waste Detection, Smart Recycling, Object Recognition, Deep Learning, CNN, YOLO, Kaggle Dataset
http://www.gnu.org/licenses/lgpl-3.0.html
Our groundbreaking translation dataset represents a monumental advancement in the field of natural language processing and machine translation. Comprising a staggering 785 million records, this corpus bridges language barriers by offering translations from English to an astonishing 548 languages. The dataset promises to be a cornerstone resource for researchers, engineers, and developers seeking to enhance their machine translation models, cross-lingual analysis, and linguistic investigations.
Size of the dataset: 41 GB uncompressed, 20 GB compressed.
Key Features:
Scope and Scale: With a comprehensive collection of 785 million records, this dataset provides an unparalleled wealth of translated text. Each record consists of an English sentence paired with its translation in one of the 548 target languages, enabling multi-directional translation applications.
Language Diversity: Encompassing translations into 548 languages, this dataset represents a diverse array of linguistic families, dialects, and scripts. From widely spoken languages to those with limited digital representation, the dataset bridges communication gaps on a global scale.
Quality and Authenticity: The translations have been meticulously curated, verified, and cross-referenced to ensure high quality and authenticity. This attention to detail guarantees that the dataset is not only extensive but also reliable, serving as a solid foundation for machine learning applications. The data was collected from various open datasets for my personal ML projects, and I am sharing it with the team.
Use Case Versatility: Researchers and practitioners across a spectrum of domains can harness this dataset for a myriad of applications. It facilitates the training and evaluation of machine translation models, empowers cross-lingual sentiment analysis, aids in linguistic typology studies, and supports cultural and sociolinguistic investigations.
Machine Learning Advancement: Machine translation models, especially neural machine translation (NMT) systems, can leverage this dataset to enhance their training. The large-scale nature of the dataset allows for more robust and contextually accurate translation outputs.
Fine-tuning and Customization: Developers can fine-tune translation models using specific language pairs, offering a powerful tool for specialized translation tasks. This customization capability ensures that the dataset is adaptable to various industries and use cases.
Data Format: The dataset is provided in a structured JSON format, facilitating easy integration into existing machine learning pipelines. This structured approach expedites research and experimentation. The JSON format contains the English word and its equivalent word as a single record. The data was exported from a MongoDB database to ensure the uniqueness of the records; each record is unique and sorted.
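The exact field names are not specified above; the sketch below assumes a JSON Lines layout with hypothetical english, translation, and language keys, purely to illustrate streaming the records without loading the full 41 GB file into memory.
import json

# File name and keys ('english', 'translation', 'language') are hypothetical placeholders
with open("translations.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        pair = (record["english"], record["translation"], record["language"])
        # feed `pair` into your preprocessing / NMT training pipeline here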
Access: The dataset is available for academic and research purposes, enabling the global AI community to contribute to and benefit from its usage. A well-documented API and sample code are provided to expedite exploration and integration.
The English-to-548-languages translation dataset represents an incredible leap forward in advancing multilingual communication, breaking down barriers to understanding, and fostering collaboration on a global scale. It holds the potential to reshape how we approach cross-lingual communication, linguistic studies, and the development of cutting-edge translation technologies.
Dataset Composition: The dataset is a culmination of translations from English, a widely spoken and understood language, into 548 distinct languages. Each language represents a unique linguistic and cultural background, providing a rich array of translation contexts. This diverse range of languages spans across various language families, regions, and linguistic complexities, making the dataset a comprehensive repository for linguistic research.
Data Volume and Scale: With a staggering 785 million records, the dataset boasts an immense scale that captures a vast array of translations and linguistic nuances. Each translation entry consists of an English source text paired with its corresponding translation in one of the 548 target languages. This vast corpus allows researchers and practitioners to explore patterns, trends, and variations across languages, enabling the development of robust and adaptable translation models.
Linguistic Coverage: The dataset covers an extensive set of languages, including but not limited to Indo-European, Afroasiatic, Sino-Tibetan, Austronesian, Niger-Congo, and many more. This broad linguistic coverage ensures that languages with varying levels of grammatical complexity, vocabulary richness, and syntactic structures are included, enhancing the applicability of translation models across diverse linguistic landscapes.
Dataset Preparation: The translation ...
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains match statistics from FACEIT, an online matchmaking service for Counter-Strike 2 (CS2). The data has been collected via the FACEIT API* (see below) and preprocessed for machine learning applications focused on predicting match outcomes based on teams' average player performance history and Elo. Use win_prediciton_clean, or join the individual Excel files data_win_prediction_#.xlsx (the Excel files are not cleaned and contain duplicates and non-competitive maps).
Dataset Details:
- Observations: 9,651 matches
- Response: win, the team that won the given match (a or b)
- Match ID: the ID given by FACEIT for the match played; can be used to pull additional match data from the API.*
Features:
- Average Win Percentage (for the given map)
- Average ELO (team skill rating)
- Average Kills per Round (K/R Ratio)
Also attached are the notebooks used to pull the data, perform feature engineering, and tune the models. The highest predictive accuracy I was able to reach was 77.11% ± 0.84 using a CNN.
*If you'd like to pull data from FACEIT API, you need an authorization token from FACEIT, you can get more information at https://docs.faceit.com/.
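A minimal sketch of pulling a single match by its match ID with Python's requests library, assuming a FACEIT Data API key; the endpoint path should be verified against the FACEIT documentation linked above, and the match ID shown is a placeholder.
import requests

API_KEY = "YOUR_FACEIT_API_KEY"    # server-side key from the FACEIT developer portal
match_id = "1-xxxxxxxx-xxxx"       # placeholder; use a value from the dataset's match ID column

resp = requests.get(
    f"https://open.faceit.com/data/v4/matches/{match_id}",   # Data API endpoint; verify in the docs
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
match = resp.json()
print(match.get("teams", {}).keys())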
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset encompasses all albums released by the renowned US songwriter and artist Taylor Swift, up to and including June 6, 2024. The most recent addition to this collection is "The Tortured Poets Department: The Anthology," featuring 31 tracks. The dataset has been generated using the Python library SpotiPy and is provided in an untouched, unfiltered, and raw state, making it ideal for model training, data analysis, or visualization projects.
Key Features: - Comprehensive Collection: Includes all of Taylor Swift's albums released by June 6, 2024. - Latest Album: "The Tortured Poets Department: The Anthology" with 31 tracks. - Raw and Unfiltered: The dataset is presented in its original form without any modifications, ensuring the authenticity of the data. - Generated with SpotiPy: Data extracted using the SpotiPy library, ensuring accuracy and reliability.
Usage Notes: - Multiple Versions of Albums: Be aware that the dataset includes multiple versions of some albums. This means that tracks and their details may appear more than once if they are present in different album versions. - Model Training and Visualization: The dataset's comprehensive and unaltered nature makes it an excellent resource for various applications, including machine learning model training, data analysis, and visualizations.
Potential Applications: - Music Analysis: Analyze trends, patterns, and characteristics of Taylor Swift's music over the years. - Machine Learning: Train models for music recommendation, genre classification, or popularity prediction. - Data Visualization: Create visual representations of Taylor Swift's discography, track features, and album details.
Dataset Contents: - Album Details: Information about each album, including release dates, album names, and the number of tracks. - Track Information: Details about each track, such as track names, durations, and other relevant metadata. - Track Audio Features: Includes features like danceability, energy, acousticness, speechiness, etc. Note: The descriptions of the audio features have been taken directly from the Spotify API description of each term to eliminate any confusion.
Acknowledgements: This dataset was created using the SpotiPy library, a Python client for the Spotify Web API, which allows for easy access to Spotify's vast music catalog.
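A minimal sketch of how album, track, and audio-feature data like this can be pulled with SpotiPy; the client credentials and artist ID are placeholders, and this is illustrative rather than the exact script used to build the dataset.
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET"))

artist_id = "06HL4z0CvFAxyc27GXpf02"   # commonly cited Spotify artist ID for Taylor Swift; verify before use
albums = sp.artist_albums(artist_id, album_type="album")
for album in albums["items"]:
    tracks = sp.album_tracks(album["id"])["items"]
    features = sp.audio_features([t["id"] for t in tracks])   # danceability, energy, etc.
    print(album["name"], len(tracks), "tracks")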
We hope this dataset provides valuable insights and facilitates various analyses and applications related to Taylor Swift's music.
For any questions or issues, please feel free to contact us through the Kaggle community forum.
Enjoy exploring Taylor Swift's musical journey!
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Dataset Overview
This dataset provides a detailed snapshot of real estate properties listed in Dubai, UAE, as of August 2024. The dataset includes over 5,000 listings scraped using the Apify API from Propertyfinder and various other real estate websites in the UAE. The data includes key details such as the number of bedrooms and bathrooms, price, location, size, and whether the listing is verified. All personal identifiers, such as agent names and contact details, have been ethically removed.
Data Science Applications
Given the size and structure of this dataset, it is ideal for the following data science applications:
This dataset provides a practical foundation for both beginners and experts in data science, allowing for the exploration of real estate trends, development of predictive models, and implementation of machine learning algorithms.
Column Descriptors
Ethically Mined Data
This dataset was ethically scraped using the Apify API, ensuring compliance with data privacy standards. All personal data such as agent names, phone numbers, and any other sensitive information have been omitted from this dataset to ensure privacy and ethical use. The data is intended solely for educational purposes and should not be used for commercial activities.
Acknowledgements
This dataset was made possible thanks to the following:
- **Photo by**: Francesca Tosolini on Unsplash
Use the Data Responsibly
Please ensure that this dataset is used responsibly, with respect to privacy and data ethics. This data is provided for educational purposes.