Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset contains historical stock price data for International Business Machines Corporation (IBM) from January 1, 2020 to May 1, 2024. The dataset includes daily closing prices, adjusted closing prices, and other relevant information.
Comparing machine learning models for stock prediction
This dataset is perfect for data scientists, analysts, and students looking to practice their skills in:
Time series analysis
Stock market analysis
Predictive modeling
Machine learning
Get started: Download the dataset and start exploring!
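As a quick-start sketch of the kind of time series analysis this dataset supports, daily returns and a moving average can be derived from the closing-price column. The frame below is a tiny hand-made stand-in; the real file's column names (Date, Close) are assumptions and may differ in the download.

```python
import pandas as pd

# Tiny illustrative series; the actual dataset's column names (Date, Close)
# are assumptions, not confirmed by the description.
prices = pd.DataFrame(
    {"Date": pd.date_range("2020-01-01", periods=5, freq="D"),
     "Close": [135.0, 136.5, 134.2, 137.1, 138.0]}
).set_index("Date")

prices["Return"] = prices["Close"].pct_change()           # daily simple return
prices["MA3"] = prices["Close"].rolling(window=3).mean()  # 3-day moving average
```

The same two lines apply unchanged to the full four-year series once it is loaded with `pd.read_csv`.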
200 speeches recorded by professional debaters discussing 50 controversial topics (with their manual and automatic transcriptions), and 55 general-purpose claim-rebuttal pairs, along with the results of several annotation experiments performed on these data. The dataset includes:
- Audio files of 200 debating speeches (down-sampled, mono, and compressed with FLAC) [first released in IBM Debater® - Recorded Debating Dataset - Release #2]
- Manual and automatic transcripts of the speeches, in both raw and cleaned (processed) versions [first released in IBM Debater® - Recorded Debating Dataset - Release #2]
- 55 general-purpose claim-rebuttal pairs written by an expert human debater
- The results of several annotation experiments performed using the general-purpose claim-rebuttal pairs and the speeches

Size: 1.2 GB
This sample data module contains representative retail data from a fictional coffee chain. The source data is contained in an uploaded file named April Sales.zip. Source: IBM.
We have created sample data for a fictional coffee shop chain with three locations in New York City. The chain has purchased IBM Cognos Analytics to identify factors that contribute to its success, and ultimately to make data-informed decisions.
Amber and Sandeep are the co-founders of the coffee chain. They uploaded their data in a series of spreadsheets and created a data module. From that data, they designed an operations dashboard and a marketing dashboard.
Inventory
Amber and Sandeep have created two dashboards and one data module that is based on nine spreadsheets:
Data
The sample data module named Coffee sales and marketing can be found in Team content > Samples > Data. There are nine tables:
CC0 1.0: https://choosealicense.com/licenses/cc0-1.0/
The dataset is a collection of earnings call transcripts, the related stock prices, and the sector index: 188 transcripts, 11,970 stock prices, and 1,196 sector index values in total. All of the data originated in the period 2016-2020 and relate to the NASDAQ stock market. The collection was made possible by Yahoo Finance, which enabled the search for stock values, and Thomson Reuters Eikon, which provided the earnings call transcripts. The dataset can be used as a benchmark for evaluating several NLP techniques and their potential for financial applications; it can also be expanded by extending the period from which the data originate, following a similar procedure.
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
The underlying data is from Stack Overflow's 2019 Developer Survey responses and can be found here: https://stackoverflow.blog/2019/04/09/the-2019-stack-overflow-developer-survey-results-are-in/ Please note that my intent in uploading this is to showcase my experience working with the datasets; my goal is to build a centralized portfolio.
Please note that we are using a randomized sample of 1/10th of the original data set, so conclusions may not reflect the real world.
The goal of this project was to explore, analyze, and visualize the data.
Follow this link to see the Cognos Dashboard I created: https://dataplatform.cloud.ibm.com/dashboards/ee7bf962-3882-4145-a41c-ecdda9323484/view/4427dc2d63b71c921ee1e6e4079c29002c362d5fe4bb860ad18c7b495d607297f3614099c82f4d5bde135661a7e8400f9d
Feel free to filter and play with the dashboard as you want.
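For reference, a reproducible 1/10th random sample like the one described can be drawn with pandas. The toy frame below is a hypothetical stand-in for the actual survey file, which comes from the Stack Overflow link above.

```python
import pandas as pd

# Hypothetical stand-in for the survey responses; the real columns differ.
survey = pd.DataFrame({"Respondent": range(1, 101),
                       "YearsCode": [i % 20 for i in range(100)]})

# Draw a reproducible 1/10th random sample of the rows.
sample = survey.sample(frac=0.1, random_state=42)
```

Fixing `random_state` makes the subsample reproducible across runs, which matters when conclusions are drawn from a fraction of the data.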
The datasets are new and more effective testbeds for the bAbI dialog task 5 and Stanford Multi-Domain (SMD) datasets, which incorporate naturalistic variation by the user. Existing benchmarks used to evaluate end-to-end neural dialog systems lack a key component: the natural variation present in human conversations. Most datasets are constructed through crowdsourcing, where crowd workers follow a fixed template of instructions while enacting the role of a user or agent. This results in straightforward, somewhat routine, and mostly trouble-free conversations, as crowd workers do not think to represent the full range of actions that occur naturally with real users. We observe a significant drop in performance (more than 60% in Ent. F1 on SMD and 85% in per-dialog accuracy on the bAbI task) for recent state-of-the-art end-to-end neural methods such as BossNet and GLMP on both updated datasets.
60 speeches recorded by professional debaters about controversial topics, and their manual and automatic transcripts, in both raw and cleaned (processed) versions. The dataset includes:
- Manual and automatic transcripts of the speeches, in raw and cleaned versions

Size: 1 MB
"Predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs." [IBM Sample Data Sets]
Each row represents a customer, and each column contains a customer attribute, as described in the column metadata.
The data set includes information about:
To explore this type of model and learn more about the subject.
New version from IBM: https://community.ibm.com/community/user/businessanalytics/blogs/steven-macko/2019/07/11/telco-customer-churn-1113
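As a minimal sketch of the retention modeling this dataset invites: fit a classifier on per-customer attributes and read off churn probabilities. The column names and rows below are hypothetical stand-ins, not the dataset's actual schema.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in rows: one customer per row, as in the real dataset;
# column names (tenure, MonthlyCharges, Churn) are assumptions.
df = pd.DataFrame({
    "tenure":         [1, 34, 2, 45, 8, 22, 10, 28],
    "MonthlyCharges": [70.7, 56.9, 53.8, 42.3, 99.6, 89.1, 29.8, 104.8],
    "Churn":          [1, 0, 1, 0, 1, 0, 0, 1],
})

X, y = df[["tenure", "MonthlyCharges"]], df["Churn"]
model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]  # estimated churn probability per customer
```

Ranking customers by these probabilities is the usual starting point for a focused retention program.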
3,562 speeches recorded by professional debaters discussing 440 controversial topics (with their automatic and manually-corrected transcript texts), and an annotation specifying the response speeches recorded for each speech. The dataset will include:
- Audio files of all debate speeches
- Automatic and manually-corrected transcripts of the speeches, in both raw and cleaned (processed) versions
- An annotation specifying the response speeches recorded for each speech, and the type of the response (explicit/implicit)
- Metadata describing the speeches, such as the topic discussed in each speech

Size: 30 + 21.7 GB
The MarketScan Dental Database is a standalone product that corresponds with and is linkable to a given year and version of the IBM MarketScan Commercial Claims and Encounters Database and the MarketScan Medicare Supplemental and Coordination of Benefits Database. Currently, data is available for the years 2005-2023. In order to view the MarketScan Dental user guide or data dictionary, you must have data access to this dataset.
Starting in 2026, there will be a data access fee for using the full dataset (though the 1% sample will remain free to use). The pricing structure and other relevant information can be found in this FAQ Sheet.
All manuscripts (and other items you'd like to publish) must be submitted to support@stanfordphs.freshdesk.com for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit:
https://phsdocs.developerhub.io/need-help/citing-phs-data-core
Data access is required to view this section.
Metadata access is required to view this section.
Community Data License Agreement, Sharing 1.0: https://cdla.io/sharing-1-0/
CONTEXT
Money laundering is a multi-billion dollar issue. Detection of laundering is very difficult. Most automated algorithms have a high false positive rate: legitimate transactions incorrectly flagged as laundering. The converse is also a major problem -- false negatives, i.e. undetected laundering transactions. Naturally, criminals work hard to cover their tracks.
Access to real financial transaction data is highly restricted -- for both proprietary and privacy reasons. Even when access is possible, it is problematic to provide a correct tag (laundering or legitimate) to each transaction -- as noted above. This synthetic transaction data from IBM avoids these problems.
The data provided here is based on a virtual world inhabited by individuals, companies, and banks. Individuals interact with other individuals and companies. Likewise, companies interact with other companies and with individuals. These interactions can take many forms, e.g. purchase of consumer goods and services, purchase orders for industrial supplies, payment of salaries, repayment of loans, and more. These financial transactions are generally conducted via banks, i.e. the payer and receiver both have accounts, with accounts taking multiple forms from checking to credit cards to bitcoin.
Some (small) fraction of the individuals and companies in the generator model engage in criminal behavior -- such as smuggling, illegal gambling, extortion, and more. Criminals obtain funds from these illicit activities, and then try to hide the source of these illicit funds via a series of financial transactions. Such financial transactions to hide illicit funds constitute laundering. Thus, the data available here is labelled and can be used for training and testing AML (Anti Money Laundering) models and for other purposes.
The data generator that created the data here not only models illicit activity, but also tracks funds derived from illicit activity through arbitrarily many transactions -- thus creating the ability to label laundering transactions many steps removed from their illicit source. With this foundation, it is straightforward for the generator to label individual transactions as laundering or legitimate.
Note that this IBM generator models the entire money laundering cycle: - Placement: Sources like smuggling of illicit funds. - Layering: Mixing the illicit funds into the financial system. - Integration: Spending the illicit funds.
As another capability possible only with synthetic data, note that a real bank or other institution typically has access to only a portion of the transactions involved in laundering: the transactions involving that bank. Transactions happening at other banks or between other banks are not seen. Thus, models built on real transactions from one institution can have only a limited view of the world.
By contrast, these synthetic transactions contain an entire financial ecosystem. Thus it may be possible to create laundering detection models that understand the broad sweep of transactions across institutions, yet apply those models to make inferences only about transactions at a particular bank.
As another point of reference, IBM previously released data from a very early version of this data generator: https://ibm.box.com/v/AML-Anti-Money-Laundering-Data
The generator has been made significantly more robust since that previous data was released, and these transactions reflect improved realism, bug fixes, and other improvements compared to the previous release.
Credit card transaction data labeled for fraud and built using a related generator is also available on Kaggle: https://www.kaggle.com/datasets/ealtman2019/credit-card-transactions
CONTENT
We release 6 datasets here divided into two groups of three: - Group HI has a relatively higher illicit ratio (more laundering). - Group LI has a relatively lower illicit ratio (less laundering).
Both HI and LI internally have three sets of data: small, medium, and large. The goal is to support a broad degree of modeling and computational resources. All of these datasets are independent, e.g. the small datasets are not ...
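To make the false-positive discussion above concrete, here is a toy evaluation of a naive amount-threshold rule against ground-truth laundering labels. The column names (Amount, Is Laundering) and values are assumptions for illustration; check the released CSVs for the actual schema.

```python
import pandas as pd

# Hypothetical labelled transactions; the column names are assumptions.
tx = pd.DataFrame({
    "Amount":        [120.0, 5000.0, 75.5, 9800.0, 42.0, 8700.0],
    "Is Laundering": [0,     1,      0,    1,      0,    0],
})

# Naive rule: flag any transaction above a fixed amount threshold.
flagged = tx["Amount"] > 4000

true_pos = int((flagged & (tx["Is Laundering"] == 1)).sum())
false_pos = int((flagged & (tx["Is Laundering"] == 0)).sum())
fpr = false_pos / int((tx["Is Laundering"] == 0).sum())  # false positive rate
```

Even in this six-row toy, the threshold rule wrongly flags a large legitimate payment, which is exactly the false-positive problem the labelled data lets you measure and model against.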
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ASSESSMENT OF THE ECONOMIC EFFICIENCY OF FEDERAL FMCG RETAIL CHAINS IN RUSSIA: A CLUSTER APPROACH
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multinomial logistic regression parameter estimates for the Allergy Model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Scenario description:
The drone receives an AVP command message (message type AutoPilot.DroneAVPCommand) from the PMS via the IBM Watson IoT Platform. The command message contains instructions about the selected parking spots to be checked. The drone takes off, flies to the corresponding parking spots, detects the occupancy (FREE or OCCUPIED) of each parking spot, publishes a message of type AutoPilot.ParkingSpotDetection to the PMS via the IBM Watson IoT Platform, then returns to the landing position and lands. During the flight, the drone continuously sends messages about its current position and status information, as messages of type AutoPilot.PositionEstimate, to the PMS via the IoT Platform.
Session description:
One free parking spot is selected to be checked (see the command message content); the drone detects the parking spot and publishes the occupancy information to the PMS for parking management purposes.
Datasets descriptions:
AUTOPILOT_BrainPort_AutomatedValetParking_DriverVehicleInteraction: Data extracted from the CAN of the vehicle
Dataset Description This dataset contains, e.g., throttlestatus, clutchstatus, brakestatus, brakeforce, wipersstatus, and steeringwheel for the vehicle
AUTOPILOT_BrainPort_AutomatedValetParking_DroneAvpCommand: Data sent from drone
Dataset Description This dataset contains route information for a vehicle to a designated parking spot
AUTOPILOT_BrainPort_AutomatedValetParking_EnvironmentSensorsAbsolute: Data extracted from the vehicle environment sensors
Dataset Description This dataset contains information about detected objects, with absolute coordinates
AUTOPILOT_BrainPort_AutomatedValetParking_EnvironmentSensorsRelative: Data extracted from the vehicle environment sensors
Dataset Description This dataset contains information about detected objects, with relative coordinates
AUTOPILOT_BrainPort_AutomatedValetParking_IotVehicleMessage: Data sent between all devices, vehicles and services
Dataset Description Each sensor data submission is a Message. A Message has an Envelope, a Path, optionally (but likely) Path Events, and optionally Path Media. The Envelope bears fundamental information about the individual sender (the vehicle), but not to a level at which the owner of the vehicle can be identified or at which different messages can be linked to a single vehicle.
AUTOPILOT_BrainPort_AutomatedValetParking_ParkingSpotDetection: Data sent from drone to parkingService
Dataset Description This dataset contains information about detected parking spots
AUTOPILOT_BrainPort_AutomatedValetParking_PositioningSystem: Data from GPS on the vehicle
Dataset Description This dataset contains speed, longitude, latitude, heading from the GPS
AUTOPILOT_BrainPort_AutomatedValetParking_PositioningSystemResampled: Data from GPS on the vehicle
Dataset Description This dataset contains speed, longitude, latitude, and heading from the GPS, resampled to 100 milliseconds
AUTOPILOT_BrainPort_AutomatedValetParking_Vehicle: Data from the CAN and sensors about the state of the vehicle
Dataset Description This dataset contains, among other things, the temperature and battery state of the vehicles
AUTOPILOT_BrainPort_AutomatedValetParking_VehicleAvpCommand: Data sent from ParkingService to vehicle
Dataset Description This dataset contains route to parkingspot, and some other environmental information
AUTOPILOT_BrainPort_AutomatedValetParking_VehicleAvpStatus: Data sent from vehicle to ParkingService
Dataset Description This dataset contains information about the current status and parkingstatus of the vehicle
AUTOPILOT_BrainPort_AutomatedValetParking_VehicleDynamics: Data from the CAN and sensors about the state of the vehicle
Dataset Description This dataset contains, among other things, the accelerations and speed limit of the vehicle, as observed from the CAN and the external sensors
Dataset Card for Telco Customer Churn
This dataset contains information about customers of a fictional telecommunications company, including demographic information, services subscribed to, location details, and churn behavior. This merged dataset combines the information from the original Telco Customer Churn dataset with additional details.
Dataset Details
Dataset Description
This merged Telco Customer Churn dataset provides a comprehensive view of customer… See the full description on the dataset page: https://huggingface.co/datasets/aai510-group1/telco-customer-churn.
CC0 1.0: https://creativecommons.org/publicdomain/zero/1.0/
This dataset was obtained through transformation and data engineering. The original IBM data has 24,386,900 rows and 15 columns; the transformation produced a dataset with over 300,000 rows and 28 columns. The remaining portion of the dataset is error-free.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by xingfenyizhen
Released under Apache 2.0
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
There is a lack of publicly available datasets on financial services, especially in the emerging mobile money transactions domain. Financial datasets are important to many researchers, and in particular to us, performing research in the domain of fraud detection. Part of the problem is the intrinsically private nature of financial transactions, which leads to no publicly available datasets.
We present a synthetic dataset generated using the simulator called PaySim as an approach to such a problem. PaySim uses aggregated data from the private dataset to generate a synthetic dataset that resembles the normal operation of transactions and injects malicious behaviour to later evaluate the performance of fraud detection methods.
PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company that is the provider of the mobile financial service, which is currently running in more than 14 countries around the world.
This synthetic dataset is scaled down to 1/4 of the original dataset and was created just for Kaggle.
This is a sample of 1 row with headers explanation:
1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0
step - maps a unit of time in the real world. In this case 1 step is 1 hour of time. Total steps: 744 (31 days of simulation).
type - CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.
amount - amount of the transaction in local currency.
nameOrig - customer who started the transaction
oldbalanceOrg - initial balance before the transaction
newbalanceOrig - new balance after the transaction.
nameDest - customer who is the recipient of the transaction
oldbalanceDest - initial balance of the recipient before the transaction. Note that there is no information for customers whose name starts with M (merchants).
newbalanceDest - new balance of the recipient after the transaction. Note that there is no information for customers whose name starts with M (merchants).
isFraud - marks the transactions made by the fraudulent agents inside the simulation. In this dataset, the fraudulent behavior of the agents aims to profit by taking control of customers' accounts, trying to empty the funds by transferring them to another account and then cashing out of the system.
isFlaggedFraud - the business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200,000 in a single transaction.
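Putting the field list together, the sample row above parses as follows. Note the balance bookkeeping (oldbalanceOrg minus amount equals newbalanceOrig: 1089.0 - 1060.31 = 28.69), which a sanity check can exploit.

```python
from io import StringIO

import pandas as pd

# Column order follows the field list above; the row is the sample given.
columns = ["step", "type", "amount", "nameOrig", "oldbalanceOrg",
           "newbalanceOrig", "nameDest", "oldbalanceDest",
           "newbalanceDest", "isFraud", "isFlaggedFraud"]
row = "1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0"
tx = pd.read_csv(StringIO(row), names=columns)

# Sanity check: the originator's balance drops by exactly the amount.
assert abs(tx.at[0, "oldbalanceOrg"] - tx.at[0, "amount"]
           - tx.at[0, "newbalanceOrig"]) < 1e-9
```

The M prefix on nameDest marks a merchant, which is why the destination balances are zero here, consistent with the field notes above.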
There are 5 similar files that contain the runs of 5 different scenarios. These files are better explained in chapter 7 of my PhD thesis (available here: http://urn.kb.se/resolve?urn=urn:nbn:se:bth-12932).
We ran PaySim several times using random seeds for 744 steps, representing each hour of one month of real time, which matches the original logs. Each run took around 45 minutes on an Intel i7 processor with 16 GB of RAM. The final result of a run contains approximately 24 million financial records divided into the 5 types of categories: CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.
This work is part of the research project ”Scalable resource-efficient systems for big data analytics” funded by the Knowledge Foundation (grant: 20140032) in Sweden.
Please refer to this dataset using the following citations:
PaySim first paper of the simulator:
E. A. Lopez-Rojas , A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus. 2016