Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains historical stock price data for International Business Machines Corporation (IBM) from January 1, 2020 to May 1, 2024. The dataset includes daily closing prices, adjusted closing prices, and other relevant information.
Comparing machine learning models for stock prediction
This dataset is perfect for data scientists, analysts, and students looking to practice their skills in:
Time series analysis
Stock market analysis
Predictive modeling
Machine learning
Get started: Download the dataset and start exploring!
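As a quick-start sketch of the kind of time series analysis this dataset supports, daily returns and a moving average can be derived from the closing-price column. The frame below is a tiny hand-made stand-in; the real file's column names (Date, Close) are assumptions and may differ in the download.

```python
import pandas as pd

# Tiny illustrative series; the actual dataset's column names (Date, Close)
# are assumptions, not confirmed by the description.
prices = pd.DataFrame(
    {"Date": pd.date_range("2020-01-01", periods=5, freq="D"),
     "Close": [135.0, 136.5, 134.2, 137.1, 138.0]}
).set_index("Date")

prices["Return"] = prices["Close"].pct_change()           # daily simple return
prices["MA3"] = prices["Close"].rolling(window=3).mean()  # 3-day moving average
```

The same two lines apply unchanged to the full four-year series once it is loaded with `pd.read_csv`.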
200 speeches recorded by professional debaters discussing 50 controversial topics (with their manual and automatic transcriptions), and 55 general-purpose claim-rebuttal pairs, along with the results of several annotation experiments performed on these data. The dataset includes:
- Audio files of 200 debating speeches (down-sampled, mono, and compressed with FLAC) [first released in IBM Debater® - Recorded Debating Dataset - Release #2]
- Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions [first released in IBM Debater® - Recorded Debating Dataset - Release #2]
- 55 general-purpose claim-rebuttal pairs written by an expert human debater
- The results of several annotation experiments performed using the general-purpose claim-rebuttal pairs and the speeches

Size: 1.2 GB
This sample data module contains representative retail data from a fictional coffee chain. The source data is contained in an uploaded file named April Sales.zip. Source: IBM.
We have created sample data for a fictional coffee shop chain with three locations in New York City. The chain has purchased IBM Cognos Analytics to identify factors that contribute to its success, and ultimately to make data-informed decisions.
Amber and Sandeep are the co-founders of the coffee chain. They uploaded their data in a series of spreadsheets and created a data module. From that data, they designed an operations dashboard and a marketing dashboard.
Inventory
Amber and Sandeep have created two dashboards and one data module that is based on nine spreadsheets:
Data
The sample data module named Coffee sales and marketing can be found in Team content > Samples > Data. There are nine tables:
CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/
The dataset is a collection of earnings call transcripts, the related stock prices, and the sector index: 188 transcripts, 11,970 stock prices, and 1,196 sector index values in total. All of the data originated in the period 2016-2020 and relate to the NASDAQ stock market. The collection was made possible by Yahoo Finance, which enabled the search for stock values, and Thomson Reuters Eikon, which provided the earnings call transcripts. The dataset can be used as a benchmark for evaluating several NLP techniques and their potential for financial applications; it can also be expanded by extending the period from which the data originate, following a similar procedure.
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
The underlying data is from Stack Overflow's 2019 Developer Survey responses and can be found here: https://stackoverflow.blog/2019/04/09/the-2019-stack-overflow-developer-survey-results-are-in/ Please note that my intent in uploading this is to showcase my experience working with the datasets; my goal is to build a centralized portfolio.
Please note that we are using a randomized sample of 1/10th of the original data set, so conclusions may not reflect the real world.
The goal of this project was to explore, analyze, and visualize the data.
Follow this link to see the Cognos Dashboard I created: https://dataplatform.cloud.ibm.com/dashboards/ee7bf962-3882-4145-a41c-ecdda9323484/view/4427dc2d63b71c921ee1e6e4079c29002c362d5fe4bb860ad18c7b495d607297f3614099c82f4d5bde135661a7e8400f9d
Feel free to filter and play with the dashboard as you want.
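For reference, a reproducible 1/10th random sample like the one described can be drawn with pandas. The toy frame below is a hypothetical stand-in for the actual survey file, which comes from the Stack Overflow link above.

```python
import pandas as pd

# Hypothetical stand-in for the survey responses; the real columns differ.
survey = pd.DataFrame({"Respondent": range(1, 101),
                       "YearsCode": [i % 20 for i in range(100)]})

# Draw a reproducible 1/10th random sample of the rows.
sample = survey.sample(frac=0.1, random_state=42)
```

Fixing `random_state` makes the subsample reproducible across runs, which matters when conclusions are drawn from a fraction of the data.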
The datasets are new and more effective testbeds for the bAbI dialog task 5 and Stanford Multi-Domain (SMD) datasets, which incorporate naturalistic variation by the user. Existing benchmarks used to evaluate end-to-end neural dialog systems lack a key component: the natural variation present in human conversations. Most datasets are constructed through crowdsourcing, where crowd workers follow a fixed template of instructions while enacting the role of a user or agent. This results in straightforward, somewhat routine, and mostly trouble-free conversations, as crowd workers do not think to represent the full range of actions that occur naturally with real users. We observe a significant drop in performance (more than 60% in Ent. F1 on SMD and 85% in per-dialog accuracy on the bAbI task) for recent state-of-the-art end-to-end neural methods such as BossNet and GLMP on both updated datasets.
60 speeches recorded by professional debaters about controversial topics, and their manual and automatic transcripts, in both raw and cleaned (processed) versions. The dataset includes:
- Manual and automatic transcripts of the speeches, in raw and cleaned versions

Size: 1 MB
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer, and each column contains a customer attribute, as described in the column metadata.
The data set includes information about:
To explore this type of model and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
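As a minimal sketch of the retention modeling this dataset invites: fit a classifier on per-customer attributes and read off churn probabilities. The column names and rows below are hypothetical stand-ins, not the dataset's actual schema.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in rows: one customer per row, as in the real dataset;
# column names (tenure, MonthlyCharges, Churn) are assumptions.
df = pd.DataFrame({
    "tenure":         [1, 34, 2, 45, 8, 22, 10, 28],
    "MonthlyCharges": [70.7, 56.9, 53.8, 42.3, 99.6, 89.1, 29.8, 104.8],
    "Churn":          [1, 0, 1, 0, 1, 0, 0, 1],
})

X, y = df[["tenure", "MonthlyCharges"]], df["Churn"]
model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]  # estimated churn probability per customer
```

Ranking customers by these probabilities is the usual starting point for a focused retention program.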
3,562 speeches recorded by professional debaters discussing 440 controversial topics (with their automatic and manually-corrected transcript texts), and an annotation specifying the response speeches recorded for each speech. The dataset will include:
- Audio files of all debate speeches
- Automatic and manually-corrected transcripts of the speeches, in both raw and cleaned (processed) versions
- An annotation specifying the response speeches recorded for each speech, and the type of the response (explicit/implicit)
- Metadata describing the speeches, such as the topic discussed in each speech

Size: 30 + 21.7 GB
The MarketScan Dental Database is a standalone product that corresponds with and is linkable to a given year and version of the IBM MarketScan Commercial Claims and Encounters Database and the MarketScan Medicare Supplemental and Coordination of Benefits Database. Currently, data is available for the years 2005-2023. In order to view the MarketScan Dental user guide or data dictionary, you must have data access to this dataset.
Starting in 2026, there will be a data access fee for using the full dataset (though the 1% sample will remain free to use). The pricing structure and other relevant information can be found in this FAQ Sheet.
All manuscripts (and other items you'd like to publish) must be submitted to support@stanfordphs.freshdesk.com for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Data access is required to view this section.
Metadata access is required to view this section.
Community Data License Agreement, Sharing 1.0: https://cdla.io/sharing-1-0/
CONTEXT
Money laundering is a multi-billion dollar issue. Detection of laundering is very difficult. Most automated algorithms have a high false positive rate: legitimate transactions incorrectly flagged as laundering. The converse is also a major problem -- false negatives, i.e. undetected laundering transactions. Naturally, criminals work hard to cover their tracks.
Access to real financial transaction data is highly restricted -- for both proprietary and privacy reasons. Even when access is possible, it is problematic to provide a correct tag (laundering or legitimate) to each transaction -- as noted above. This synthetic transaction data from IBM avoids these problems.
The data provided here is based on a virtual world inhabited by individuals, companies, and banks. Individuals interact with other individuals and companies. Likewise, companies interact with other companies and with individuals. These interactions can take many forms, e.g. purchase of consumer goods and services, purchase orders for industrial supplies, payment of salaries, repayment of loans, and more. These financial transactions are generally conducted via banks, i.e. the payer and receiver both have accounts, with accounts taking multiple forms from checking to credit cards to bitcoin.
Some (small) fraction of the individuals and companies in the generator model engage in criminal behavior -- such as smuggling, illegal gambling, extortion, and more. Criminals obtain funds from these illicit activities, and then try to hide the source of these illicit funds via a series of financial transactions. Such financial transactions to hide illicit funds constitute laundering. Thus, the data available here is labelled and can be used for training and testing AML (Anti Money Laundering) models and for other purposes.
The data generator that created the data here not only models illicit activity, but also tracks funds derived from illicit activity through arbitrarily many transactions -- thus creating the ability to label laundering transactions many steps removed from their illicit source. With this foundation, it is straightforward for the generator to label individual transactions as laundering or legitimate.
Note that this IBM generator models the entire money laundering cycle: - Placement: Sources like smuggling of illicit funds. - Layering: Mixing the illicit funds into the financial system. - Integration: Spending the illicit funds.
As another capability possible only with synthetic data, note that a real bank or other institution typically has access to only a portion of the transactions involved in laundering: the transactions involving that bank. Transactions happening at other banks or between other banks are not seen. Thus, models built on real transactions from one institution can have only a limited view of the world.
By contrast, these synthetic transactions contain an entire financial ecosystem. Thus it may be possible to create laundering detection models that understand the broad sweep of transactions across institutions, yet apply those models to make inferences only about transactions at a particular bank.
As another point of reference, IBM previously released data from a very early version of this data generator: https://ibm.box.com/v/AML-Anti-Money-Laundering-Data
The generator has been made significantly more robust since that previous data was released, and these transactions reflect improved realism, bug fixes, and other improvements compared to the previous release.
Credit card transaction data labeled for fraud and built using a related generator is also available on Kaggle: https://www.kaggle.com/datasets/ealtman2019/credit-card-transactions
CONTENT
We release 6 datasets here divided into two groups of three: - Group HI has a relatively higher illicit ratio (more laundering). - Group LI has a relatively lower illicit ratio (less laundering).
Both HI and LI internally have three sets of data: small, medium, and large. The goal is to support a broad degree of modeling and computational resources. All of these datasets are independent, e.g. the small datasets are not ...
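To make the false-positive discussion above concrete, here is a toy evaluation of a naive amount-threshold rule against ground-truth laundering labels. The column names (Amount, Is Laundering) and values are assumptions for illustration; check the released CSVs for the actual schema.

```python
import pandas as pd

# Hypothetical labelled transactions; the column names are assumptions.
tx = pd.DataFrame({
    "Amount":        [120.0, 5000.0, 75.5, 9800.0, 42.0, 8700.0],
    "Is Laundering": [0,     1,      0,    1,      0,    0],
})

# Naive rule: flag any transaction above a fixed amount threshold.
flagged = tx["Amount"] > 4000

true_pos = int((flagged & (tx["Is Laundering"] == 1)).sum())
false_pos = int((flagged & (tx["Is Laundering"] == 0)).sum())
fpr = false_pos / int((tx["Is Laundering"] == 0).sum())  # false positive rate
```

Even in this six-row toy, the threshold rule wrongly flags a large legitimate payment, which is exactly the false-positive problem the labelled data lets you measure and model against.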
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ASSESSMENT OF THE ECONOMIC EFFICIENCY OF FEDERAL FMCG RETAIL CHAINS IN RUSSIA: A CLUSTER APPROACH
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multinomial logistic regression parameter estimates for the Allergy Model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scenario description:
The drone receives an AVP command message (message type AutoPilot.DroneAVPCommand) from the PMS via the IBM Watson IoT Platform. The command message contains instructions about the selected parking spots to be checked. The drone takes off, flies to the corresponding parking spots, detects the occupancy (FREE or OCCUPIED) of each parking spot, publishes a message of type AutoPilot.ParkingSpotDetection to the PMS via the IBM Watson IoT Platform, then returns to the landing position and lands. During the flight, the drone continuously sends messages about its current position and status information, as messages of type AutoPilot.PositionEstimate, to the PMS via the IoT Platform.
Session description:
One free parking spot is selected to be checked (see the command message content); the drone detects the parking spot and publishes the occupancy information to the PMS for parking management purposes.
Datasets descriptions:
AUTOPILOT_BrainPort_AutomatedValetParking_DriverVehicleInteraction: Data extracted from the CAN of the vehicle
Dataset Description This dataset contains, e.g., throttlestatus, clutchstatus, brakestatus, brakeforce, wipersstatus, and steeringwheel for the vehicle
AUTOPILOT_BrainPort_AutomatedValetParking_DroneAvpCommand: Data sent from drone
Dataset Description This dataset contains route information for a vehicle to a designated parking spot
AUTOPILOT_BrainPort_AutomatedValetParking_EnvironmentSensorsAbsolute: Data extracted from the vehicle environment sensors
Dataset Description This dataset contains information about detected objects, with absolute coordinates
AUTOPILOT_BrainPort_AutomatedValetParking_EnvironmentSensorsRelative: Data extracted from the vehicle environment sensors
Dataset Description This dataset contains information about detected objects, with relative coordinates
AUTOPILOT_BrainPort_AutomatedValetParking_IotVehicleMessage: Data sent between all devices, vehicles and services
Dataset Description Each sensor data submission is a Message. A Message has an Envelope, a Path, optionally (but likely) Path Events, and optionally Path Media. The Envelope bears fundamental information about the individual sender (the vehicle), but not to a level at which the owner of the vehicle can be identified or at which different messages can be linked to a single vehicle.
AUTOPILOT_BrainPort_AutomatedValetParking_ParkingSpotDetection: Data sent from drone to parkingService
Dataset Description This dataset contains information about detected parking spots
AUTOPILOT_BrainPort_AutomatedValetParking_PositioningSystem: Data from GPS on the vehicle
Dataset Description This dataset contains speed, longitude, latitude, heading from the GPS
AUTOPILOT_BrainPort_AutomatedValetParking_PositioningSystemResampled: Data from GPS on the vehicle
Dataset Description This dataset contains speed, longitude, latitude, and heading from the GPS, resampled to 100 milliseconds
AUTOPILOT_BrainPort_AutomatedValetParking_Vehicle: Data from the CAN and sensors about the state of the vehicle
Dataset Description This dataset contains, among other things, the temperature and battery state of the vehicles
AUTOPILOT_BrainPort_AutomatedValetParking_VehicleAvpCommand: Data sent from ParkingService to vehicle
Dataset Description This dataset contains route to parkingspot, and some other environmental information
AUTOPILOT_BrainPort_AutomatedValetParking_VehicleAvpStatus: Data sent from vehicle to ParkingService
Dataset Description This dataset contains information about the current status and parkingstatus of the vehicle
AUTOPILOT_BrainPort_AutomatedValetParking_VehicleDynamics: Data from the CAN and sensors about the state of the vehicle
Dataset Description This dataset contains, among other things, the accelerations and speed limit of the vehicle, as observed from the CAN and the external sensors
Dataset Card for Telco Customer Churn
This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.
Dataset Details
Dataset Description
This merged Telco Customer Churn dataset provides a comprehensive view of customer… See the full description on the dataset page: https://huggingface.co/datasets/aai510-group1/telco-customer-churn.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was obtained through transformation and data engineering. The original IBM data has 24,386,900 rows and 15 columns; the transformation produced a dataset with over 300,000 rows and 28 columns. The remaining portion of the dataset is error-free.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by xingfenyizhen
Released under Apache 2.0
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
There is a lack of publicly available datasets on financial services, especially in the emerging mobile money transactions domain. Financial datasets are important to many researchers, and in particular to us, performing research in the domain of fraud detection. Part of the problem is the intrinsically private nature of financial transactions, which leads to no publicly available datasets.
We present a synthetic dataset generated using the simulator called PaySim as an approach to such a problem. PaySim uses aggregated data from the private dataset to generate a synthetic dataset that resembles the normal operation of transactions and injects malicious behaviour to later evaluate the performance of fraud detection methods.
PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company that is the provider of the mobile financial service, which is currently running in more than 14 countries around the world.
This synthetic dataset is scaled down to 1/4 of the original dataset and was created just for Kaggle.
This is a sample of 1 row with headers explanation:
1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0
step - maps a unit of time in the real world. In this case 1 step is 1 hour of time. Total steps: 744 (31 days of simulation).
type - CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.
amount - amount of the transaction in local currency.
nameOrig - customer who started the transaction
oldbalanceOrg - initial balance before the transaction
newbalanceOrig - new balance after the transaction.
nameDest - customer who is the recipient of the transaction
oldbalanceDest - initial balance of the recipient before the transaction. Note that there is no information for customers whose name starts with M (merchants).
newbalanceDest - new balance of the recipient after the transaction. Note that there is no information for customers whose name starts with M (merchants).
isFraud - marks the transactions made by the fraudulent agents inside the simulation. In this dataset, the fraudulent behavior of the agents aims to profit by taking control of customers' accounts, trying to empty the funds by transferring them to another account and then cashing out of the system.
isFlaggedFraud - the business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200,000 in a single transaction.
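Putting the field list together, the sample row above parses as follows. Note the balance bookkeeping (oldbalanceOrg minus amount equals newbalanceOrig: 1089.0 - 1060.31 = 28.69), which a sanity check can exploit.

```python
from io import StringIO

import pandas as pd

# Column order follows the field list above; the row is the sample given.
columns = ["step", "type", "amount", "nameOrig", "oldbalanceOrg",
           "newbalanceOrig", "nameDest", "oldbalanceDest",
           "newbalanceDest", "isFraud", "isFlaggedFraud"]
row = "1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0"
tx = pd.read_csv(StringIO(row), names=columns)

# Sanity check: the originator's balance drops by exactly the amount.
assert abs(tx.at[0, "oldbalanceOrg"] - tx.at[0, "amount"]
           - tx.at[0, "newbalanceOrig"]) < 1e-9
```

The M prefix on nameDest marks a merchant, which is why the destination balances are zero here, consistent with the field notes above.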
There are 5 similar files that contain the runs of 5 different scenarios. These files are better explained in chapter 7 of my PhD thesis (available here: http://urn.kb.se/resolve?urn=urn:nbn:se:bth-12932).
We ran PaySim several times using random seeds for 744 steps, representing each hour of one month of real time, which matches the original logs. Each run took around 45 minutes on an Intel i7 processor with 16 GB of RAM. The final result of a run contains approximately 24 million financial records divided into the 5 types of categories: CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.
This work is part of the research project ”Scalable resource-efficient systems for big data analytics” funded by the Knowledge Foundation (grant: 20140032) in Sweden.
Please refer to this dataset using the following citations:
PaySim first paper of the simulator:
E. A. Lopez-Rojas , A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus. 2016