Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain.
Part 1 is available at https://zenodo.org/deposit/7157356
Part 3 is available at https://zenodo.org/deposit/7158133
Part 4 is available at https://zenodo.org/deposit/7158328
Details of the datasets are given below:
FILENAME FORMAT:
The filenames have the following format:
btc-tx-
where
For example, the file btc-tx-100000-149999-aa.bz2, together with any remaining parts, contains transactions from
block 100000 to block 149999 inclusive.
The files are compressed with bzip2. They can be uncompressed using the bunzip2 command.
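As a rough illustration of the naming scheme, the sketch below extracts the block range from a file name. The pattern is inferred only from the single example name above; the optional trailing part suffix (such as "aa") and the helper name parse_name are assumptions, not part of the dataset documentation.

```python
import re

# Pattern inferred from the example "btc-tx-100000-149999-aa.bz2".
# The optional lowercase part suffix is an assumption.
NAME_RE = re.compile(r"^btc-tx-(\d+)-(\d+)(?:-([a-z]+))?\.bz2$")


def parse_name(filename):
    """Return (first_block, last_block, part_suffix_or_None) for a data file name."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised file name: {filename}")
    first, last, part = match.groups()
    return int(first), int(last), part


print(parse_name("btc-tx-100000-149999-aa.bz2"))
```

The description does not say whether each part is an independent bzip2 stream or a slice of one larger archive, so it may be necessary to concatenate the parts of a block range before decompressing.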
TRANSACTION FORMAT:
Each line in a file corresponds to a transaction. The transaction has the following format:
BLOCK TIME FORMAT:
The block time file has the following format:
IMPORTANT NOTE:
Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer websites such as https://btcscan.org. Downloaders and users of this dataset accept full responsibility for using the data in a manner compliant with GDPR and any other applicable regulations. We provide the data as is and cannot be held responsible for anything.
NOTE:
If you use this dataset, please do not forget to add the DOI number to the citation.
If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14
@incollection{kilicc2022analyzing,
  title={Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities},
  author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper},
  booktitle={Big Data and Artificial Intelligence in Digital Finance},
  pages={253--267},
  year={2022},
  publisher={Springer, Cham}
}
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset consists of three files: a file with behaviour data (events.csv), a file with item properties (item_properties.csv) and a file that describes the category tree (category_tree.csv). The data has been collected from a real-world ecommerce website. It is raw data, i.e. without any content transformations; however, all values are hashed due to confidentiality concerns. The purpose of publishing is to motivate research in the field of recommender systems with implicit feedback.
The behaviour data, i.e. events like clicks, add to carts, transactions, represent interactions that were collected over a period of 4.5 months. A visitor can make three types of events, namely “view”, “addtocart” or “transaction”. In total there are 2 756 101 events including 2 664 312 views, 69 332 add to carts and 22 457 transactions produced by 1 407 580 unique visitors. For about 90% of events corresponding properties can be found in the “item_properties.csv” file.
For example:
The file with item properties (item_properties.csv) includes 20 275 902 rows, i.e. different properties, describing 417 053 unique items. The file is split into 2 files due to file size limitations. Since the property of an item can vary in time (e.g., price changes over time), every row in the file has a corresponding timestamp. In other words, the file consists of concatenated snapshots for every week in the file with the behaviour data. However, if a property of an item is constant over the observed period, only a single snapshot value will be present in the file. For example, suppose we have three properties for a single item and 4 weekly snapshots, as below:
timestamp,itemid,property,value
1439694000000,1,100,1000
1439695000000,1,100,1000
1439696000000,1,100,1000
1439697000000,1,100,1000
1439694000000,1,200,1000
1439695000000,1,200,1100
1439696000000,1,200,1200
1439697000000,1,200,1300
1439694000000,1,300,1000
1439695000000,1,300,1000
1439696000000,1,300,1100
1439697000000,1,300,1100
After the snapshot merge it would look like:
1439694000000,1,100,1000
1439694000000,1,200,1000
1439695000000,1,200,1100
1439696000000,1,200,1200
1439697000000,1,200,1300
1439694000000,1,300,1000
1439696000000,1,300,1100
This is because property=100 is constant over time, property=200 has a different value in every snapshot, and property=300 changed once.
The item properties file contains a timestamp column because all properties are time dependent: values such as price or category may change over time. Initially, this file consisted of snapshots for every week in the events file and contained over 200 million rows. We merged consecutive constant property values, converting the file from snapshot form to change log form, so constant values appear only once in the file. This reduced the number of rows roughly tenfold.
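The snapshot-to-change-log merge described above can be sketched as follows. This is a minimal illustration, not the publisher's actual procedure: it keeps a row only when the value differs from the previous snapshot of the same (itemid, property) pair, and the function name snapshots_to_changelog is an assumption.

```python
from itertools import groupby

# The 12 snapshot rows from the example above: (timestamp, itemid, property, value).
TIMESTAMPS = [1439694000000, 1439695000000, 1439696000000, 1439697000000]
VALUES = {
    100: [1000, 1000, 1000, 1000],
    200: [1000, 1100, 1200, 1300],
    300: [1000, 1000, 1100, 1100],
}
SNAPSHOTS = [
    (ts, 1, prop, val)
    for prop, vals in VALUES.items()
    for ts, val in zip(TIMESTAMPS, vals)
]


def snapshots_to_changelog(rows):
    """Drop rows whose value repeats the previous snapshot of the same item property."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    changelog = []
    for _, group in groupby(ordered, key=lambda r: (r[1], r[2])):
        previous = object()  # sentinel that never equals a real value
        for row in group:
            if row[3] != previous:
                changelog.append(row)
                previous = row[3]
    return changelog


for row in snapshots_to_changelog(SNAPSHOTS):
    print(",".join(str(field) for field in row))
```

On the example rows this yields the same 7 lines as the merged listing above.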
All values in the “item_properties.csv” file excluding "categoryid" and "available" properties were hashed. Value of the "categoryid" property contains item category identifier. Value of the "available" property contains availability of the item, i.e. 1 means the item was available, otherwise 0. All numerical values were marked with "n" char at the beginning, and have 3 digits precision after decimal point, e.g., "5" will become "n5.000", "-3.67584" will become "n-3.675". All words in text values were normalized (stemming procedure: https://en.wikipedia.org/wiki/Stemming) and hashed, numbers were processed as above, e.g. text "Hello world 2017!" will become "24214 44214 n2017.000"
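A possible way to decode a property value is sketched below, assuming that every token starting with "n" is a number and every other token is a hashed word, as in the "Hello world 2017!" example. The function name decode_value is hypothetical. Plain values of the "categoryid" and "available" properties carry no "n" prefix, so this sketch would label them as hashes; they need to be handled by property name instead.

```python
from decimal import Decimal


def decode_value(raw):
    """Split a property value into ("number", Decimal) and ("hash", str) tokens."""
    tokens = []
    for token in raw.split():
        if token.startswith("n"):
            # Numeric tokens look like "n5.000" or "n-3.675".
            tokens.append(("number", Decimal(token[1:])))
        else:
            tokens.append(("hash", token))
    return tokens


print(decode_value("24214 44214 n2017.000"))
print(decode_value("n-3.675"))
```

Note that the documented example maps "-3.67584" to "n-3.675", which suggests the third decimal is truncated rather than rounded; decoded numbers should be treated as approximate accordingly.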
The category tree file has 1669 rows. Every row in the file specifies a child categoryId and the corresponding parent. For example:
Retail Rocket (retailrocket.io) helps web shoppers make better shopping decisions by providing personalized real-time recommendations through multiple channels with over 100MM unique monthly users and 1000+ retail partners over the world.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of feature vectors belonging to 12,330 sessions. The dataset was formed so that each session would belong to a different user in a 1-year period to avoid any tendency to a specific campaign, special day, user profile, or period. Of the 12,330 sessions in the dataset, 84.5% (10,422) were negative class samples that did not end with shopping, and the rest (1,908) were positive class samples ending with shopping.
The dataset consists of 10 numerical and 8 categorical attributes. The 'Revenue' attribute can be used as the class label.
The dataset contains 18 columns, each representing specific attributes of online shopping behavior:
Administrative and Administrative_Duration: Number of pages visited and time spent on administrative pages.
Informational and Informational_Duration: Number of pages visited and time spent on informational pages.
ProductRelated and ProductRelated_Duration: Number of pages visited and time spent on product-related pages.
BounceRates and ExitRates: Metrics indicating user behavior during the session.
PageValues: Value of the page based on e-commerce metrics.
SpecialDay: Likelihood of shopping based on special days.
Month: Month of the session.
OperatingSystems, Browser, Region, TrafficType: Technical and geographical attributes.
VisitorType: Categorizes users as returning, new, or others.
Weekend: Indicates if the session occurred on a weekend.
Revenue: Target variable indicating whether a transaction was completed (True or False).
The original dataset was taken from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/468/online+shoppers+purchasing+intention+dataset
Additional Variable Information
"Administrative", "Administrative Duration", "Informational", "Informational Duration", "Product Related" and "Product Related Duration" represent the number of different types of pages visited by the visitor in that session and total time spent in each of these page categories. The values of these features are derived from the URL information of the pages visited by the user and updated in real time when a user takes an action, e.g. moving from one page to another. The "Bounce Rate", "Exit Rate" and "Page Value" features represent the metrics measured by "Google Analytics" for each page in the e-commerce site. The value of the "Bounce Rate" feature for a web page refers to the percentage of visitors who enter the site from that page and then leave ("bounce") without triggering any other requests to the analytics server during that session. The value of the "Exit Rate" feature for a specific web page is the percentage of all pageviews of that page that were the last in the session. The "Page Value" feature represents the average value for a web page that a user visited before completing an e-commerce transaction. The "Special Day" feature indicates the closeness of the site visiting time to a specific special day (e.g. Mother’s Day, Valentine's Day) in which the sessions are more likely to be finalized with a transaction. The value of this attribute is determined by considering the dynamics of e-commerce such as the duration between the order date and delivery date. For example, for Valentine’s Day, this value takes a nonzero value between February 2 and February 12, zero before and after this date unless it is close to another special day, and its maximum value of 1 on February 8. The dataset also includes operating system, browser, region, traffic type, visitor type as returning or new visitor, a Boolean value indicating whether the date of the visit is a weekend, and month of the year.
The Measurable AI Amazon Consumer Transaction Dataset is a leading source of email receipts and consumer transaction data, offering data collected directly from users via Proprietary Consumer Apps, with millions of opt-in users.
We source our email receipt consumer data panel via two consumer apps which garner the express consent of our end-users (GDPR compliant). We then aggregate and anonymize all the transactional data to produce raw and aggregate datasets for our clients.
Use Cases
Our clients leverage our datasets to produce actionable consumer insights such as:
- Market share analysis
- User behavioral traits (e.g. retention rates)
- Average order values
- Promotional strategies used by the key players.
Several of our clients also use our datasets for forecasting and understanding industry trends better.
Coverage
- Asia (Japan)
- EMEA (Spain, United Arab Emirates)
- Continental Europe
- USA
Granular Data
Itemized, high-definition data per transaction level with metrics such as
- Order value
- Items ordered
- No. of orders per user
- Delivery fee
- Service fee
- Promotions used
- Geolocation data and more
Aggregate Data
- Weekly/monthly order volume
- Revenue delivered in aggregate form, with historical data dating back to 2018.
All the transactional e-receipts are sent from app to users’ registered accounts.
Most of our clients are fast-growing Tech Companies, Financial Institutions, Buyside Firms, Market Research Agencies, Consultancies and Academia.
Our dataset is GDPR compliant, contains no PII and is aggregated & anonymized with user consent. Contact business@measurable.ai for a data dictionary and to find out our volume in each country.
S&P Global Marketplace: home of premium fundamental and alternative datasets you can pair with big data solutions designed to enhance your daily workflows
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains the Pagerank values and rankings of Bitcoin addresses and transaction IDs (TXID). It contains a total of 1.608.748.675 addresses or TXIDs. Part 1 is available at https://zenodo.org/record/6052811
File format
The dataset is compressed with bzip2. It can be uncompressed using the command bunzip2. The dataset is divided into multiple files because of its size. The files are space-delimited plain text files and have the following five fields:
Label: An alphanumeric Bitcoin address (e.g. 1DzTCMmWABEDM1rYFL1RgdLyE59jXMzEHV) or a 64-character hexadecimal transaction ID (e.g. 000000000fdf0c619cd8e0d512c7e2c0da5a5808e60f12f1e0d01522d2986a51). Type: String
Label type: Its value is 0 if the label is a transaction ID and 1 if the label is a Bitcoin address. Type: Integer
Rank: Unique Pagerank rank where the ties (addresses having the same Pagerank value) are resolved by sorting the addresses. Type: Integer
Rank with ties: Pagerank rank where the ties (addresses having the same Pagerank value) have the same rank. Type: Integer
Pagerank value: Pagerank of the address or transaction ID, calculated using the Pagerank algorithm. Type: Floating-point number
Sample lines:
000000000fdf0c619cd8e0d512c7e2c0da5a5808e60f12f1e0d01522d2986a51 0 427225664 266976712 0.979246
1DzTCMmWABEDM1rYFL1RgdLyE59jXMzEHV 1 1114666798 508037940 0.877961
Dataset Generation
The Bitcoin transactions between blocks 0 (mined on 03.01.2009) and 713.999 (mined on 13.12.2021) are extracted. A transaction graph is constructed, where Bitcoin addresses and transaction IDs are nodes of the graph and the transaction inputs and outputs are edges of the graph. Pagerank is applied on this transaction graph. This computation is performed using the system presented in the paper 'Parallel analysis of Ethereum blockchain transaction data using cluster computing'.
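The five space-delimited fields of the file format can be read with a few lines of Python. The sketch below parses the two sample lines; the record type PagerankRecord and the function parse_line are illustrative names, not part of the dataset.

```python
from typing import NamedTuple

SAMPLE_LINES = [
    "000000000fdf0c619cd8e0d512c7e2c0da5a5808e60f12f1e0d01522d2986a51 0 427225664 266976712 0.979246",
    "1DzTCMmWABEDM1rYFL1RgdLyE59jXMzEHV 1 1114666798 508037940 0.877961",
]


class PagerankRecord(NamedTuple):
    label: str           # Bitcoin address or 64-character transaction ID
    is_address: bool     # True for label type 1, False for label type 0
    rank: int            # unique rank (ties broken by sorting)
    rank_with_ties: int  # rank shared by equal Pagerank values
    pagerank: float


def parse_line(line):
    """Parse one space-delimited line into a PagerankRecord."""
    label, label_type, rank, rank_with_ties, value = line.split()
    if label_type not in ("0", "1"):
        raise ValueError(f"unexpected label type: {label_type}")
    return PagerankRecord(label, label_type == "1", int(rank), int(rank_with_ties), float(value))


for sample in SAMPLE_LINES:
    print(parse_line(sample))
```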
Note
If you use our dataset in your research, please cite our paper: https://link.springer.com/article/10.1007/s10586-021-03511-0
@article{kilic2022parallel,
  title={Parallel Analysis of Ethereum Blockchain Transaction Data using Cluster Computing},
  journal={Cluster Computing},
  author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and Sen, Alper},
  year={2022},
  month={Jan}
}
Other Datasets
If you are interested, please also check out our Pagerank Dataset for Ethereum Blockchain.
This dataset contains ether as well as popular ERC20 token transfer transactions extracted from the Ethereum Mainnet blockchain.
Only send ether, contract function call, contract deployment transactions are present in the dataset. Miner reward transactions are not currently included in the dataset.
Details of the datasets are given below:
FILENAME FORMAT:
The filenames have the following format:
eth-tx-
where
For example, the file eth-tx-1000000-1099999.txt.bz2 contains transactions from
block 1000000 to block 1099999 inclusive.
The files are compressed with bzip2. They can be uncompressed using the bunzip2 command.
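Instead of running bunzip2 first, a file can also be streamed line by line with Python's standard bz2 module. This is only a sketch: the helper name iter_lines is an assumption, and the fields within each line are not interpreted here.

```python
import bz2


def iter_lines(path):
    """Yield the text lines of a bzip2-compressed file without unpacking it to disk."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


# Example usage (the file must exist locally):
# for line in iter_lines("eth-tx-1000000-1099999.txt.bz2"):
#     fields = line.split()
```

Streaming keeps memory use roughly constant regardless of the file size, since only one line is held at a time.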
TRANSACTION FORMAT:
Each line in a file corresponds to a transaction. The transaction has the following format:
units. ERC20 token transfers (transfer and transferFrom function calls in an ERC20
contract) are indicated by the token symbol. For example, GUSD is the Gemini USD
stablecoin. The JSON file erc20tokens.json given below contains the details of the ERC20 tokens.
decoder-error.txt FILE:
Each line of this file lists a transaction (block no, tx no, tx hash) that produced
an error while decoding calldata. These transactions are not present in the data files.
erc20tokens.json FILE:
This file contains the list of popular ERC20 token contracts whose transfer/transferFrom
transactions appear in the data files.
-------------------------------------------------------------------------------------------
[
{
"address": "0xdac17f958d2ee523a2206206994597c13d831ec7",
"decdigits": 6,
"symbol": "USDT",
"name": "Tether-USD"
},
{
"address": "0xB8c77482e45F1F44dE1745F52C74426C631bDD52",
"decdigits": 18,
"symbol": "BNB",
"name": "Binance"
},
{
"address": "0x2af5d2ad76741191d15dfe7bf6ac92d4bd912ca3",
"decdigits": 18,
"symbol": "LEO",
"name": "Bitfinex-LEO"
},
{
"address": "0x514910771af9ca656af840dff83e8264ecf986ca",
"decdigits": 18,
"symbol": "LNK",
"name": "Chainlink"
},
{
"address": "0x6f259637dcd74c767781e37bc6133cd6a68aa161",
"decdigits": 18,
"symbol": "HT",
"name": "HuobiToken"
},
{
"address": "0xf1290473e210b2108a85237fbcd7b6eb42cc654f",
"decdigits": 18,
"symbol": "HEDG",
"name": "HedgeTrade"
},
{
"address": "0x9f8f72aa9304c8b593d555f12ef6589cc3a579a2",
"decdigits": 18,
"symbol": "MKR",
"name": "Maker"
},
{
"address": "0xa0b73e1ff0b80914ab6fe0444e65848c4c34450b",
"decdigits": 8,
"symbol": "CRO",
"name": "Crypto.com"
},
{
"address": "0xd850942ef8811f2a866692a623011bde52a462c1",
"decdigits": 18,
"symbol": "VEN",
"name": "VeChain"
},
{
"address": "0x0d8775f648430679a709e98d2b0cb6250d2887ef",
"decdigits": 18,
"symbol": "BAT",
"name": "Basic-Attention"
},
{
"address": "0xc9859fccc876e6b4b3c749c5d29ea04f48acb74f",
"decdigits": 0,
"symbol": "INO",
"name": "INO-Coin"
},
{
"address": "0x8e870d67f660d95d5be530380d0ec0bd388289e1",
"decdigits": 18,
"symbol": "PAX",
"name": "Paxos-Standard"
},
{
"address": "0x17aa18a4b64a55abed7fa543f2ba4e91f2dce482",
"decdigits": 18,
"symbol": "INB",
"name": "Insight-Chain"
},
{
"address": "0xc011a72400e58ecd99ee497cf89e3775d4bd732f",
"decdigits": 18,
"symbol": "SNX",
"name": "Synthetix-Network"
},
{
"address": "0x1985365e9f78359a9B6AD760e32412f4a445E862",
"decdigits": 18,
"symbol": "REP",
"name": "Reputation"
},
{
"address": "0x653430560be843c4a3d143d0110e896c2ab8ac0d",
"decdigits": 16,
"symbol": "MOF",
"name": "Molecular-Future"
},
{
"address": "0x0000000000085d4780B73119b644AE5ecd22b376",
"decdigits": 18,
"symbol": "TUSD",
"name": "True-USD"
},
{
"address": "0xe41d2489571d322189246dafa5ebde1f4699f498",
"decdigits": 18,
"symbol": "ZRX",
"name": "ZRX"
},
{
"address": "0x8ce9137d39326ad0cd6491fb5cc0cba0e089b6a9",
"decdigits": 18,
"symbol": "SXP",
"name": "Swipe"
},
{
"address": "0x75231f58b43240c9718dd58b4967c5114342a86c",
"decdigits": 18,
"symbol": "OKB",
"name": "Okex"
},
{
"address": "0xa974c709cfb4566686553a20790685a47aceaa33",
"decdigits": 18,
"symbol": "XIN",
"name": "Mixin"
},
{
"address": "0xd26114cd6EE289AccF82350c8d8487fedB8A0C07",
"decdigits": 18,
"symbol": "OMG",
"name": "OmiseGO"
},
{
"address": "0x89d24a6b4ccb1b6faa2625fe562bdd9a23260359",
"decdigits": 18,
"symbol": "SAI",
"name": "Sai Stablecoin v1.0"
},
{
"address": "0x6c6ee5e31d828de241282b9606c8e98ea48526e2",
"decdigits": 18,
"symbol": "HOT",
"name": "HoloToken"
},
{
"address": "0x6b175474e89094c44da98b954eedeac495271d0f",
"decdigits": 18,
"symbol": "DAI",
"name": "Dai Stablecoin"
},
{
"address": "0xdb25f211ab05b1c97d595516f45794528a807ad8",
"decdigits": 2,
"symbol": "EURS",
"name": "Statis-EURS"
},
{
"address": "0xa66daa57432024023db65477ba87d4e7f5f95213",
"decdigits": 18,
"symbol": "HPT",
"name": "HuobiPoolToken"
},
{
"address": "0x4fabb145d64652a948d72533023f6e7a623c7c53",
"decdigits": 18,
"symbol": "BUSD",
"name": "Binance-USD"
},
{
"address": "0x056fd409e1d7a124bd7017459dfea2f387b6d5cd",
"decdigits": 2,
"symbol": "GUSD",
"name": "Gemini-USD"
},
{
"address": "0x2c537e5624e4af88a7ae4060c022609376c8d0eb",
"decdigits": 6,
"symbol": "TRYB",
"name": "BiLira"
},
{
"address": "0x4922a015c4407f87432b179bb209e125432e4a2a",
"decdigits": 6,
"symbol": "XAUT",
"name": "Tether-Gold"
},
{
"address": "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48",
"decdigits": 6,
"symbol": "USDC",
"name": "USD-Coin"
},
{
"address": "0xa5b55e6448197db434b92a0595389562513336ff",
"decdigits": 16,
"symbol": "SUSD",
"name": "Santender"
},
{
"address": "0xffe8196bc259e8dedc544d935786aa4709ec3e64",
"decdigits": 18,
"symbol": "HDG",
"name": "HedgeTrade"
},
{
"address": "0x4a16baf414b8e637ed12019fad5dd705735db2e0",
"decdigits": 2,
"symbol": "QCAD",
"name": "QCAD"
}
]
-------------------------------------------------------------------------------------------
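The token list can be used to turn a raw integer amount into token units. The sketch below assumes that "decdigits" is the number of decimal places of the token and that an amount is given as a raw integer; the description does not state whether amounts in the data files are already scaled, so check this before applying it. Only three entries of the list are embedded here, and the function names are illustrative.

```python
import json
from decimal import Decimal, localcontext

# Three entries copied from erc20tokens.json above; load the full file in practice.
SAMPLE_JSON = """
[
  {"address": "0xdac17f958d2ee523a2206206994597c13d831ec7", "decdigits": 6, "symbol": "USDT", "name": "Tether-USD"},
  {"address": "0x056fd409e1d7a124bd7017459dfea2f387b6d5cd", "decdigits": 2, "symbol": "GUSD", "name": "Gemini-USD"},
  {"address": "0x6b175474e89094c44da98b954eedeac495271d0f", "decdigits": 18, "symbol": "DAI", "name": "Dai Stablecoin"}
]
"""


def load_tokens(text):
    """Index the token list by symbol."""
    return {entry["symbol"]: entry for entry in json.loads(text)}


def to_token_units(raw_amount, symbol, tokens):
    """Scale a raw integer amount by the token's decimal digits (assumed semantics)."""
    digits = tokens[symbol]["decdigits"]
    with localcontext() as ctx:
        ctx.prec = 80  # enough digits for any 256-bit integer
        return Decimal(raw_amount).scaleb(-digits)


TOKENS = load_tokens(SAMPLE_JSON)
print(to_token_units(1500000, "USDT", TOKENS))
print(to_token_units(12345, "GUSD", TOKENS))
```

Decimal is used rather than float so that 18-digit token amounts are not rounded.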
From 1 April 2018, Land Transaction Tax (LTT) replaced Stamp Duty Land Tax (SDLT) on residential and non-residential property and land interests purchased in Wales. The tax rates and tax bands for LTT vary depending on the type of transaction. Taxpayers must notify the WRA of all land transactions with a value above £40,000. There are also circumstances where certain lease transactions are not notifiable if they are less than 7 years in duration. When filing an LTT return, the organisation has 30 days after the effective date to submit the return and pay the tax. This dataset includes estimates of LTT notifiable transactions received by the WRA by the close of 17 February 2025. Care should be taken with any comparisons over time which involve data from spring 2020 to summer 2021. This is due to the coronavirus (COVID-19) pandemic and changes to LTT rates. A national lockdown on 23 March 2020 resulted in the housing market being mainly closed from this date until 22 June 2020 when it partially re-opened. The market was re-opened more fully on 27 July 2020, to coincide with a change in LTT rates effective until 30 June 2021. There is evidence some purchasers may have brought their transactions forward to June 2021 to benefit from the temporary tax reduction. There were some changes to LTT rates effective from 22 December 2020. Non-residential transactions and higher rates residential transactions were affected. The main residential rates and bands for Land Transaction Tax changed for transactions effective after 10 October 2022.
The dataset includes a breakdown by:
- transaction type: residential (including sub-categories for main rate and higher rate transactions), non-residential
- transaction value: values grouped into the different bands for which different rates are charged
- measure: number of transactions, tax due, value of property taxed
- effective date (month, quarter and year)
For some transactions we are unable to provide any detail other than the total tax due figure in the year, as there is a risk of revealing details of the individual transactions. Therefore, these transactions are omitted from all breakdowns and instead summarised here: https://statswales.gov.wales/Catalogue/Taxes-devolved-to-Wales/Land-Transaction-Tax/landtransactiontaxstatistics-tax-due-including-transactions-restricted-detail
Note that because the value bands differ for residential and non-residential transactions, there is no viable band breakdown for the total across the two. Note also that where a new non-residential lease is granted, there may be both a purchase element and a rental element. The rental elements are a different concept, shown separately, and not included in the non-residential or overall totals (which reflect the purchase elements only).
http://opendatacommons.org/licenses/dbcl/1.0/
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.
The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used, for example, in example-dependent cost-sensitive learning. Feature 'Class' is the response variable and it takes value 1 in case of fraud and 0 otherwise.
Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
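To see why plain accuracy is uninformative here, the sketch below computes the accuracy of a classifier that never predicts fraud, using the class counts quoted above, and gives a small pure-Python estimate of the area under the precision-recall curve (the step-wise average-precision form). The function name and the four-point toy scores are illustrative; ties between scores are not treated specially.

```python
N_TOTAL, N_FRAUD = 284_807, 492

# A classifier that labels every transaction as legitimate is almost always "right".
baseline_accuracy = (N_TOTAL - N_FRAUD) / N_TOTAL
print(f"always-negative accuracy: {baseline_accuracy:.5f}")


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each true positive, by descending score."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if labels[index] == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / total_positives


# Toy example with 2 positives among 4 samples.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(f"average precision: {average_precision(toy_labels, toy_scores):.4f}")
```

The always-negative baseline reaches about 99.8% accuracy while detecting no fraud at all, whereas a precision-recall based score only rewards models that rank the fraudulent transactions highly.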
A simulator for transaction data has been released as part of the practical handbook on Machine Learning for Credit Card Fraud Detection - https://fraud-detection-handbook.github.io/fraud-detection-handbook/Chapter_3_GettingStarted/SimulatedDataset.html. We invite all practitioners interested in fraud detection datasets to also check out this data simulator, and the methodologies for credit card fraud detection presented in the book.
The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project
Please cite the following works:
Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015
Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon
Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE
Dal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)
Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier
Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing
Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection, INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection Information Sciences, 2019
Yann-Aël Le Borgne, Gianluca Bontempi Reproducible machine Learning for Credit Card Fraud Detection - Practical Handbook
Bertrand Lebichot, Gianmarco Paldino, Wissam Siblini, Liyun He, Frederic Oblé, Gianluca Bontempi Incremental learning strategies for credit cards fraud detection, International Journal of Data Science and Analytics
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
3MEth Dataset Overview
Section 1: Token Transactions
This section provides 303 million transaction records from 3,880 tokens and 35 million users on the Ethereum blockchain. The data is stored in 3,880 CSV files, each representing a specific token. Each transaction includes the following information:
Sender and receiver wallet addresses: Enables network analysis and user behavior studies.
Token address: Links transactions to specific tokens for token-specific analysis.
Transaction value: Reflects the number of tokens transferred, essential for liquidity studies.
Blockchain timestamp: Captures transaction timing for temporal analysis.
Apart from the large dataset, we also provide a smaller CSV file containing 267,242 transaction records from 29,164 wallet addresses. This smaller dataset involves a total of 1,194 tokens, covering the time period September 2016 to November 2023. This detailed transaction data is critical for studying user behavior, liquidity patterns, and tasks such as link prediction and fraud detection.
Section 2: Token Information
This section offers metadata for 3,880 tokens, stored in corresponding CSV files. Each file contains:
Timestamp: Marks the time of data update.
Token price: Useful for price prediction and volatility studies.
Market capitalization: Reflects the token's market size and dominance.
24-hour trading volume: Indicates liquidity and trading activity.
Section 3: Global Market Indices
This section provides macro-level data to contextualize token transactions, stored in separate CSV files. Key indicators include:
Bitcoin dominance: Tracks Bitcoin's share of the cryptocurrency market.
Total market capitalization: Measures the overall market's value, with breakdowns by token type.
Stablecoin market capitalization: Highlights stablecoin liquidity and stability.
24-hour trading volume: A key measure of market activity.
These indices are essential for integrating global market trends into predictive models for volatility and risk-adjusted returns.
Section 4: Textual Indices
This section contains sentiment data from Reddit's Ethereum community, covering 7,800 top posts from 2014 to 2024. Each post includes:
Post score (net upvotes): Reflects engagement and sentiment strength.
Timestamp: Aligns sentiment with price movements.
Number of comments: Gauges sentiment intensity.
Sentiment indices: Sentiment scores computed using methods detailed in the data preprocessing section.
The full Reddit textual dataset is available upon request; please contact us for access. Alternatively, our open-source repository includes a tool to guide users in collecting Reddit data. Researchers are encouraged to apply for a Reddit API Key and adhere to Reddit's policies. This data is valuable for understanding social dynamics in the market and enhancing sentiment analysis models that can explain market movements and improve behavioral predictions.
Envestnet®| Yodlee®'s Bank Statement Data (Aggregate/Row) Panels consist of de-identified, near-real time (T+1) USA credit/debit/ACH transaction level data – offering a wide view of the consumer activity ecosystem. The underlying data is sourced from end users leveraging the aggregation portion of the Envestnet®| Yodlee®'s financial technology platform.
Envestnet | Yodlee Consumer Panels (Aggregate/Row) include data relating to millions of transactions, including ticket size and merchant location. The dataset includes de-identified credit/debit card and bank transactions (such as a payroll deposit, account transfer, or mortgage payment). Our coverage offers insights into areas such as consumer, TMT, energy, REITs, internet, utilities, ecommerce, MBS, CMBS, equities, credit, commodities, FX, and corporate activity. We apply rigorous data science practices to deliver key KPIs daily that are focused, relevant, and ready to put into production.
We offer free trials. Our team is available to provide support for loading, validation, sample scripts, or other services you may need to generate insights from our data.
Investors, corporate researchers, and corporates can use our data to answer some key business questions such as:
- How much are consumers spending with specific merchants/brands and how is that changing over time?
- Is the share of consumer spend at a specific merchant increasing or decreasing?
- How are consumers reacting to new products or services launched by merchants?
- For loyal customers, how is the share of spend changing over time?
- What is the company’s market share in a region for similar customers?
- Is the company’s loyal user base increasing or decreasing?
- Is the lifetime customer value increasing or decreasing?
Additional Use Cases:
- Use spending data to analyze sales/revenue broadly (sector-wide) or granularly (company-specific). Historically, our tracked consumer spend has correlated above 85% with company-reported data from thousands of firms. Users can sort and filter by many metrics and KPIs, such as sales and transaction growth rates and online or offline transactions, as well as view customer behavior within a geographic market at a state or city level.
- Reveal cohort consumer behavior to decipher long-term behavioral consumer spending shifts. Measure market share, wallet share, loyalty, consumer lifetime value, retention, demographics, and more.
- Study the effects of inflation rates via such metrics as increased total spend, ticket size, and number of transactions.
- Seek out alpha-generating signals or manage your business strategically with essential, aggregated transaction and spending data analytics.
Use Case Categories (our data supports innumerable use cases, and we look forward to working with new ones):
1. Market Research: Company Analysis, Company Valuation, Competitive Intelligence, Competitor Analysis, Competitor Analytics, Competitor Insights, Customer Data Enrichment, Customer Data Insights, Customer Data Intelligence, Demand Forecasting, Ecommerce Intelligence, Employee Pay Strategy, Employment Analytics, Job Income Analysis, Job Market Pricing, Marketing, Marketing Data Enrichment, Marketing Intelligence, Marketing Strategy, Payment History Analytics, Price Analysis, Pricing Analytics, Retail, Retail Analytics, Retail Intelligence, Retail POS Data Analysis, and Salary Benchmarking
2. Investment Research: Financial Services, Hedge Funds, Investing, Mergers & Acquisitions (M&A), Stock Picking, Venture Capital (VC)
3. Consumer Analysis: Consumer Data Enrichment, Consumer Intelligence
4. Market Data: Analytics, B2C Data Enrichment, Bank Data Enrichment, Behavioral Analytics, Benchmarking, Customer Insights, Customer Intelligence, Data Enhancement, Data Enrichment, Data Intelligence, Data Modeling, Ecommerce Analysis, Ecommerce Data Enrichment, Economic Analysis, Financial Data Enrichment, Financial Intelligence, Local Economic Forecasting, Location-based Analytics, Market Analysis, Market Analytics, Market Intelligence, Market Potential Analysis, Market Research, Market Share Analysis, Sales, Sales Data Enrichment, Sales Enablement, Sales Insights, Sales Intelligence, Spending Analytics, Stock Market Predictions, and Trend Analysis
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table contains information on the balance sheet of the general government sector. The information is limited to financial assets and liabilities. For each reporting period the opening and closing stocks, financial transactions and other changes are shown. Transactions are economic flows that are the result of agreements between units. Other changes are changes in the value of assets or liabilities that do not result from transactions such as revaluations or reclassifications. The figures are consolidated which means that flows between units that belong to the same sector are eliminated. As a result, assets and liabilities of subsectors do not add up to total assets or liabilities of general government. For example, loans of the State provided to social security funds are part of loans of the State. However, these are not included in the consolidated assets of general government, because it is an asset of a government unit with a government unit as debtor. Financial assets and liabilities in this table are presented at market value. The terms and definitions used are in accordance with the framework of the Dutch national accounts. National accounts are based on the international definitions of the European System of Accounts (ESA 2010). Small temporary differences with publications of the National Accounts may occur due to the fact that the government finance statistics are sometimes more up to date.
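The effect of consolidation described above can be illustrated with a small sketch. All figures and unit names other than "State" and "social security funds" are hypothetical and are not taken from the table; the point is only to show why subsector assets do not add up to the consolidated total.

```python
# Illustrative figures only (hypothetical, not taken from the table).
# Each claim is (creditor unit, debtor unit, amount).
loans = [
    ("State", "social security funds", 10.0),  # claim inside general government
    ("State", "households", 25.0),
    ("local government", "non-financial corporations", 5.0),
]

# Units that belong to the general government sector (assumed grouping).
GOVERNMENT_UNITS = {"State", "social security funds", "local government"}


def subsector_assets(claims, creditor):
    """Loans held as assets by one subsector, including intra-government claims."""
    return sum(amount for c, _, amount in claims if c == creditor)


def consolidated_assets(claims, units):
    """Loans of the whole sector, eliminating claims between units of the sector."""
    return sum(amount for c, d, amount in claims if c in units and d not in units)


state_total = subsector_assets(loans, "State")
unconsolidated = sum(subsector_assets(loans, u) for u in GOVERNMENT_UNITS)
consolidated = consolidated_assets(loans, GOVERNMENT_UNITS)

print(state_total, unconsolidated, consolidated)
```

In this sketch the loan from the State to the social security funds counts toward the State's own assets but drops out of the consolidated figure, so the sum over subsectors is larger than the consolidated total.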
Data available from: Yearly figures from 1995, quarterly figures from 1999.
Status of the figures: The figures for the period 1995-2022 are final. The figures for 2023 and 2024 are provisional.
Changes as of 24 December 2024: Figures on the third quarter of 2024 are available. The figures for the second quarter of 2024 have been adjusted.
When will new figures be published? Provisional quarterly figures are published three months after the end of the quarter. In September the figures on the first quarter may be revised, in December the figures on the second quarter may be revised and in March the first three quarters may be revised. Yearly figures are published for the first time three months after the end of the year concerned. Yearly figures are revised two times: 6 and 18 months after the end of the year. Please note that adjustments might take place at the end of March or September, in order to provide the European Commission with the most up-to-date figures. Revised yearly figures are published in June each year. Quarterly figures are aligned to the three revised years at the end of June. More information on the revision policy of Dutch national accounts and government finance statistics can be found under 'relevant articles' under paragraph 3.
https://creativecommons.org/publicdomain/zero/1.0/
Blockchain technology, first implemented by Satoshi Nakamoto in 2009 as a core component of Bitcoin, is a distributed, public ledger recording transactions. Its usage allows secure peer-to-peer communication by linking blocks containing hash pointers to a previous block, a timestamp, and transaction data. Bitcoin is a decentralized digital currency (cryptocurrency) which leverages the Blockchain to store transactions in a distributed manner in order to mitigate against flaws in the financial industry.
Nearly ten years after its inception, Bitcoin and other cryptocurrencies experienced an explosion in popular awareness. The value of Bitcoin, on the other hand, has experienced more volatility. Meanwhile, as use cases of Bitcoin and Blockchain grow, mature, and expand, hype and controversy have swirled.
In this dataset, you will have access to information about blockchain blocks and transactions. All historical data are in the bigquery-public-data:crypto_bitcoin dataset, which is updated every 10 minutes. The data can be joined with historical prices in kernels. See available similar datasets here: https://www.kaggle.com/datasets?search=bitcoin.
You can use the BigQuery Python client library to query tables in this dataset in Kernels. Note that methods available in Kernels are limited to querying data. Tables are at bigquery-public-data.crypto_bitcoin.[TABLENAME]. Fork this kernel to get started.
Allen Day, Google Cloud Developer Advocate, and Colin Bookman, Google Cloud Customer Engineer, retrieve data from the Bitcoin network using a custom client available on GitHub that they built with the bitcoinj Java library. Historical data from the origin block to 2018-01-31 were loaded in bulk to two BigQuery tables, blocks_raw and transactions. These tables contain fresh data, as they are now appended when new blocks are broadcast to the Bitcoin network. For additional information visit the Google Cloud Big Data and Machine Learning Blog post "Bitcoin in BigQuery: Blockchain analytics on public data".
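A minimal sketch of querying one of these tables with the BigQuery Python client library follows. It assumes the `google-cloud-bigquery` package is installed and that credentials are already configured in the environment; the helper names (`table_ref`, `count_rows_sql`, `run_query`) are hypothetical, and only the project, dataset and `transactions` table names come from the description above. The query is deliberately limited to a row count so that no column names have to be assumed.

```python
PROJECT = "bigquery-public-data"
DATASET = "crypto_bitcoin"


def table_ref(table_name: str) -> str:
    """Build the fully qualified name: bigquery-public-data.crypto_bitcoin.[TABLENAME]."""
    return f"{PROJECT}.{DATASET}.{table_name}"


def count_rows_sql(table_name: str) -> str:
    """Return a standard SQL statement that counts the rows of one table."""
    return f"SELECT COUNT(*) AS n FROM `{table_ref(table_name)}`"


def run_query(sql: str):
    """Run the query and return the result rows as a list.

    The import is done here so the helpers above work without the client
    library; calling this function requires google-cloud-bigquery and
    valid credentials (an assumption about the environment).
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return list(client.query(sql).result())


sql = count_rows_sql("transactions")
print(sql)
# rows = run_query(sql)  # uncomment in an environment with BigQuery access
```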
Photo by Andre Francois on Unsplash.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table provides an overview of the non-financial transactions of the institutional sectors of the Dutch economy. Non-financial transactions consist of current transactions and capital account transactions. Transactions are broken down by resources and uses. In addition, the balances of the sectors are also shown. Non-financial transactions are estimated for the main sectors of the economy: non-financial corporations, financial institutions, general government, households, non-profit institutions serving households and the rest of the world. In addition, the financial corporations and general government sectors are further broken down by subsectors. Sectors are presented both consolidated and non-consolidated.
Data available from: Annual data since 1995. Quarterly data from the first quarter of 1999 onwards.
Status of figures: Data from 1995 to 2020 are final. Data from 2021, 2022 and 2023 are provisional. Because this table has been discontinued, the provisional data will no longer be made final.
Changes as of 24 June 2024: None; this table has been discontinued. The Central Bureau of Statistics recently revised the national accounts. New sources, methods and concepts were introduced into the national accounts, so that the picture of the Dutch economy is optimally aligned with all underlying statistics, sources and international guidelines for compiling the national accounts. For more information see section 3.
When will there be new figures? No longer applicable.
https://dataful.in/terms-and-conditions
The dataset contains year-, month- and payment application-wise UPI app transaction statistics, such as Customer Initiated Transactions, B2C Transactions, B2B Transactions and On-us Transactions.
Note:
1) Unified Payments Interface (UPI) is an instant real-time payment system developed by the National Payments Corporation of India. The interface facilitates inter-bank peer-to-peer and person-to-merchant transactions.
2) From January 2021 onwards, "On-us Transactions" in UPI that are not processed and settled through the UPI Central System are shown under the "On-us Transactions" column.
3) Apps with a volume of less than 10,000 are included under "Other Apps".
4) App volume in the table is based on the Payer App logic, i.e. the financial transaction is attributed to the PSP in UPI on the payer's side.
5) BHIM volume is inclusive of *99# volume.
6) For WhatsApp, the maximum registered user base in UPI is one hundred (100) million.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains data on all Real Property parcels that have sold since 2013 in Allegheny County, PA.
Before doing any market analysis on property sales, check the sales validation codes. Many property "sales" are not considered a valid representation of the true market value of the property. For example, when multiple lots are together on one deed with one price they are generally coded as invalid ("H") because the sale price for each parcel ID number indicates the total price paid for a group of parcels, not just for one parcel. See the Sales Validation Codes Dictionary for a complete explanation of valid and invalid sale codes.
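As a rough sketch of that check, the snippet below drops rows whose validation code is in a caller-supplied set of invalid codes before any analysis. Only the code "H" comes from the text above; the column names (`PARID`, `PRICE`, `SALECODE`), the code "X1" and the sample rows are hypothetical placeholders, so consult the Sales Validation Codes Dictionary and the Data Dictionary for the real values.

```python
import csv
import io

# Hypothetical sample; column names and the code "X1" are placeholders.
SAMPLE = """PARID,PRICE,SALECODE
0001,150000,X1
0002,900000,H
0003,900000,H
"""


def drop_invalid_sales(rows, invalid_codes, code_field="SALECODE"):
    """Keep only rows whose validation code is not in invalid_codes."""
    return [row for row in rows if row[code_field] not in invalid_codes]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = drop_invalid_sales(rows, invalid_codes={"H"})

for row in kept:
    print(row["PARID"], row["PRICE"])
```

Passing the invalid codes as a parameter keeps the filter independent of any particular reading of the code dictionary, which is subject to change.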
Sales Transactions Disclaimer: Sales information is provided from the Allegheny County Department of Administrative Services, Real Estate Division. Content and validation codes are subject to change. Please review the Data Dictionary for details on included fields before each use. Property owners are not required by law to record a deed at the time of sale. Consequently the assessment system may not contain a complete sales history for every property and every sale. You may do a deed search at http://www.alleghenycounty.us/re/index.aspx directly for the most updated information. Note: Ordinance 3478-07 prohibits public access to search assessment records by owner name. It was signed by the Chief Executive in 2007.
ODC Public Domain Dedication and Licence (PDDL) v1.0http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This dataset contains current summary information for electronically filed FPPC campaign forms. The columns in this dataset correspond to the figures reported on the summary page of FPPC forms 450, 460, 461, and 465. Refer to the FPPC Forms represented in this dataset.
B. HOW THE DATASET IS CREATED Committees file campaign statements with the Ethics Commission on a periodic basis. Those statements are stored with the Commission's provider. Data is generally presented as-filed by committees. If a committee files an amendment, the data from that filing completely replaces the original and any prior amendments in the filing sequence.
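The replacement rule for amendments can be sketched as follows. The `Filing` record and its field names are hypothetical and do not reflect the dataset's actual columns; the sketch only shows the "latest filing in a sequence wins" logic described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filing:
    filing_id: str
    sequence_id: str   # hypothetical key grouping an original and its amendments
    amendment_no: int  # 0 for the original filing, 1, 2, ... for amendments
    total: float


def current_filings(filings):
    """Return one filing per sequence: the one with the highest amendment number."""
    latest = {}
    for filing in filings:
        kept = latest.get(filing.sequence_id)
        if kept is None or filing.amendment_no > kept.amendment_no:
            latest[filing.sequence_id] = filing
    return sorted(latest.values(), key=lambda f: f.sequence_id)


filings = [
    Filing("f1", "seq-A", 0, 1000.0),
    Filing("f2", "seq-A", 1, 1200.0),  # amendment replaces f1
    Filing("f3", "seq-B", 0, 500.0),
    Filing("f4", "seq-A", 2, 1150.0),  # later amendment replaces f1 and f2
]

current = current_filings(filings)
print([f.filing_id for f in current])
```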
C. UPDATE PROCESS Each night starting at midnight Pacific time, a script runs to check for new filings with the Commission's database and updates this dataset with transactions from new filings. The update process can take a variable amount of time to complete. Viewing or downloading this dataset while the update is running may result in incomplete data; it is therefore highly recommended to view or download this data before midnight or after 8am.
D. HOW TO USE THIS DATASET Transactions from rejected and superseded filings are not included in this dataset. Transactions from many different FPPC forms are combined in this dataset; refer to the column "Form Type" to differentiate transaction types.
A row with no value in the SyncFlag column indicates a paper filing amended an electronic filing. The SFEC is working on how to automatically deal with these cases.
Properties suffixed with "-nid" can be used to join the data between Filers, Filings, and Transaction datasets.
Refer to the Ethics Commission's webpage for more information.
These National Statistics provide monthly estimates of the number of residential and non-residential property transactions in the UK and its constituent countries. National Statistics are accredited official statistics (https://osr.statisticsauthority.gov.uk/accredited-official-statistics/).
England and Northern Ireland statistics are based on information submitted to the HM Revenue and Customs (HMRC) Stamp Duty Land Tax (SDLT) database by taxpayers on SDLT returns.
Land and Buildings Transaction Tax (LBTT) replaced SDLT in Scotland from 1 April 2015 and this data is provided to HMRC by Revenue Scotland (https://www.revenue.scot/) to continue the time series.
Land Transaction Tax (LTT) replaced SDLT in Wales from 1 April 2018. To continue the time series, the Welsh Revenue Authority (WRA) (https://gov.wales/welsh-revenue-authority) have provided HMRC with a monthly data feed of LTT transactions since July 2021.
LTT figures for the latest month are estimated using a grossing factor based on data for the most recent and complete financial year. Until June 2021, LTT transactions for the latest month were estimated by HMRC based upon year on year growth in line with other UK nations.
LTT transactions up to the penultimate month are aligned with LTT statistics.
Go to Stamp Duty Land Tax guidance for the latest rates and information.
Go to Stamp Duty Land Tax rates from 1 December 2003 to 22 September 2022 and Stamp Duty: rates on land transfers before December 2003 for historic rates.
Further details for this statistical release, including data suitability and coverage, are included within the ‘Monthly property transactions completed in the UK with value of £40,000 or above’ quality report.
The latest release was published 09:30 28 February 2025 and was updated with provisional data from completed transactions during January 2025.
The next release will be published 09:30 28 February 2025 and will be updated with provisional data from completed transactions during January 2025.
Archive versions of the Monthly property transactions completed in the UK with value of £40,000 or above are available via the UK Government Web Archive, from the National Archives: https://webarchive.nationalarchives.gov.uk/ukgwa/20240320184933/https://www.gov.uk/government/statistics/monthly-property-transactions-completed-in-the-uk-with-value-40000-or-above
Licence Ouverte / Open Licence 1.0https://www.etalab.gouv.fr/wp-content/uploads/2014/05/Open_Licence.pdf
License information was derived automatically
Since 2019, the Directorate-General for Public Finance (DGFIP) has made available to the general public the dataset “Requests for Land Values” (DVF). These data show the real estate transactions that took place over the last five years in metropolitan France and the DOM-TOM, with the exception of Alsace, Moselle and Mayotte. The data are derived from notarial deeds and cadastral information. CEREMA offers an open, cleaned and structured version of the file, called DVF+. This data is organised at the transaction level, making it easier to use. However, working with it remains complex.
Here we propose a version of the DVF+ data that aggregates information at the municipal level and handles the complexity of the information upstream by applying a number of filters. The objective is to propose stabilised and homogeneous data on clearly identified perimeters. Organising this information at the level of the municipality (2020 communes and communal districts) aims to make it easier to access without losing quality.
The dataset is accompanied by an attached document describing the method, which was developed collaboratively within a workshop of the LIFTI (Laboratoire d’Initiatives Foncières et Territoriales Innovantes). This workshop, led by the CDC, allowed the creation of the dataset, with the help of the company Modaal which supported the script. This document is supplemented by a dictionary that explains how the indicators are constructed and the set of filters applied to the data. In the dataset made available, the indicators are calculated as an annual average for two periods of 3 years, 2014 to 2016 and 2017 to 2019.
The DVF+ vintage used is that of April 2022: https://cerema.app.box.com/v/dvfplus-opendata
Communes with fewer than 5 mutations (over the year or over the period) are absent from the dataset.
For more information on the DVF source and best practices for this source, please refer to the Vademecum available at this link: HTTPS://WWW.GROUPE-DVF.FR/CATEGORY/VADEMECUM/
Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This data set is a subset of the "Records of foreign capital" ("Registros de capitais estrangeiros", RCE) published by the Central Bank of Brazil (CBB) on their website. The data set consists of three data files and three corresponding metadata files. All files are in openly accessible .csv or .txt formats. See detailed outline below for data contained in each. Data files contain transaction-specific data such as unique identifier, currency, cancelled status and amount. Metadata files outline variables in the corresponding data file.
- RCE_Unclean_full_dataset.csv - all transactions published to the Central Bank website from the four main categories outlined below
- Metadata_Unclean_full_dataset.csv
- RCE_Unclean_cancelled_dataset.csv - data extracted from the RCE_Unclean_full_dataset.csv where transactions were registered then cancelled
- Metadata_Unclean_cancelled_dataset.csv
- RCE_Clean_selection_dataset.csv - transaction data extracted from RCE_Unclean_full_dataset.csv and RCE_Unclean_cancelled_dataset.csv for the nine companies and criteria identified below
- Metadata_Clean_selection_dataset.csv
The data include the period between October 2000 and July 2011. This is the only time span for the data provided by the Central Bank of Brazil at this stage. The records were published monthly by the Central Bank of Brazil as required by Art. 66 in Decree nº 55.762 of 17 February 1965, modified by Decree nº 4.842 of 17 September 2003. The records were published on the bank’s website starting October 2000, as per communique nº 011489 of 7 October 2003. This remained the case until August 2011, after which the amount of each transaction was no longer disclosed (and publication of these stopped altogether after October 2011). The disclosure of the records was suspended in order to review their legal and technical aspects, and ensure their suitability to the requirements of the rules governing the confidentiality of the information (Law nº 12.527 of 18 November 2011 and Decree nº 7724 of May 2012) (pers. comm. Central Bank of Brazil, 2016. Name of contact available upon request to Authors).
The records track transfers of foreign capital made from abroad to companies domiciled in Brazil, with information on the foreign company (name and country) transferring the money, and on the company receiving the capital (name and federative unit). For the purpose of this study, we consider the four categories of foreign capital transactions which are published with their amount and currency in the Central Bank’s data, and which are all part of the “Register of financial transactions” (abbreviated RDE-ROF): loans, leasing, financed import and cash in advance (see below for a detailed description). Additional categories exist, such as foreign direct investment (RDE-IED) and External Investment in Portfolio (RDE-Portfólio), for which no amount is published and which are therefore not included.
We used the data posted online as PDFs on the bank’s website, and created a script to extract the data automatically from these four categories into the RCE_Unclean_full_dataset.csv file. This data set has not been double-checked manually and may contain errors. We used a similar script to extract rows from the "cancelled transactions" sections of the PDFs into the RCE_Unclean_cancelled_dataset.csv file. This is useful to identify transactions that have been registered to the Central Bank but later cancelled. This data set has not been double-checked manually and may contain errors.
From these raw data sets, we conducted the following selections and calculations in order to create the RCE_Clean_selection_dataset.csv file. This data set has been double-checked manually to ensure that no errors have been made in the extraction process.
We selected all transactions whose recipient company name corresponds to one of these nine companies, or to one of their known subsidiaries in Brazil, according to the list of subsidiaries recorded in the Orbis database, maintained by Bureau Van Dijk. Transactions are included if the recipient company name matches one of the following:
- the current or former name of one of the nine companies in our sample (former names are identified using Orbis, Bloomberg’s company profiles or the company website);
- the name of a known subsidiary of one of the nine companies, if and only if we find evidence (in Orbis, Bloomberg’s company profiles or on the company website) that this subsidiary was owned at some point during the period 2000-2011, and that it operated in a sector related to the soy or beef industry (including fertilizers and trading activities).
For each transaction, we extracted the name of the company sending capital and, when possible, attributed the transaction to the known ultimate owner.
The names of the countries of origin sometimes come with typos or different denominations: we harmonized them.
A manual check of all the selected data revealed that a few transactions (n=14) appear twice in the database while bearing the same unique identification number. According to the Central Bank of Brazil (pers. comm., November 2016), this is due to errors in their routine of data extraction. We therefore deleted duplicates in our database, keeping only the latest occurrence of each unique transaction. Six (6) transactions recorded with an amount of zero were also deleted.
Two (2) transactions registered in August 2003 with incoherent currencies (Deutsche Mark and Dutch guilder, which were demonetised in early 2002) were also deleted.
To ensure that the import of data from PDF to the database did not contain any systematic errors, for instance due to mistakes in coding, data were checked in two ways. First, because the script identifies the end of the row in the PDF using the amount of the transaction, which can sometimes fail if the amount is not entered correctly, we went through the extracted raw data (2798 rows) and cleaned all rows whose end had not been correctly identified by the script. Next, we manually double-checked the 486 largest transactions representing 90% of the total amount of capital inflows, as well as 140 randomly selected additional rows representing 5% of the total rows, compared the extracted data to the original PDFs, and found no mistakes.
Transfers recorded in the database have been made in different currencies, including US dollars, euros, Japanese yen, Brazilian reais, and more. The conversion to US dollars of all amounts denominated in other currencies was done using the average monthly exchange rate as published by the International Monetary Fund (International Financial Statistics: Exchange rates, national currency per US dollar, period average). Due to the limited time period, we have not corrected for inflation but aggregated nominal amounts in USD over the period 2000-2011.
The categories loans, cash in advance (anticipated payment for exports), financed import, and leasing/rental are those used by the Central Bank of Brazil in their published data. They are defined as follows:
- “Loans” (“emprestimos” in original source): includes all loans, either contracted directly with creditors or indirectly through the issuance of securities, brokered by foreign agents.
- “Anticipated payment for exports” (“pagamento/renovacao pagamento antecipado de exportacao” in original source): defined as a type of loan (used in trade finance).
- “Financed import” (“importacao financiada” in original source): comprises all import financing transactions either direct (contracted by the importer with a foreign bank or with a foreign supplier), or indirect (contracted by Brazilian banks with foreign banks on behalf of Brazilian importers). They must be declared to the Central Bank if their term of payment is longer than 360 days.
- “Leasing/rental” (“arrendamento mercantil, leasing e aluguel” in original source): concerns all types of external leasing operations consented by a Brazilian entity to a foreign one. They must be declared if the term of payment is longer than 360 days.
More information about the different categories can be found through the Central Bank online.
(Research Data Support provided by Springer Nature)
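The cleaning and conversion steps described above (keep the latest occurrence of each duplicated identifier, drop zero amounts, convert to US dollars with a monthly average rate expressed as national currency per US dollar) can be sketched as follows. This is an illustration only, not the authors' script: the field names, the sample rows and the exchange rate are hypothetical.

```python
def drop_duplicates_keep_latest(rows):
    """Keep only the latest occurrence of each unique transaction id."""
    latest = {}
    for row in rows:  # later rows overwrite earlier ones with the same id
        latest[row["id"]] = row
    return list(latest.values())


def drop_zero_amounts(rows):
    """Remove transactions recorded with an amount of zero."""
    return [row for row in rows if row["amount"] != 0]


def to_usd(amount, currency, month, rates):
    """Convert using a monthly average rate in national currency per US dollar."""
    if currency == "USD":
        return amount
    return amount / rates[(currency, month)]


# Hypothetical sample rows and rate, for illustration only.
rows = [
    {"id": "T1", "currency": "JPY", "month": "2005-03", "amount": 4000.0},
    {"id": "T1", "currency": "JPY", "month": "2005-03", "amount": 5000.0},
    {"id": "T2", "currency": "USD", "month": "2005-03", "amount": 0.0},
    {"id": "T3", "currency": "USD", "month": "2005-04", "amount": 70.0},
]
rates = {("JPY", "2005-03"): 100.0}

cleaned = drop_zero_amounts(drop_duplicates_keep_latest(rows))
usd = [to_usd(r["amount"], r["currency"], r["month"], rates) for r in cleaned]
total_usd = sum(usd)  # nominal total, not corrected for inflation
print(usd, total_usd)
```

Because the rate is quoted as national currency per US dollar, the conversion divides by the rate rather than multiplying.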
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains bitcoin transfer transactions extracted from the Bitcoin Mainnet blockchain.
Part1 is available at https://zenodo.org/deposit/7157356
Part3 is available at https://zenodo.org/deposit/7158133
Part4 is available at https://zenodo.org/deposit/7158328
Details of the datasets are given below:
FILENAME FORMAT:
The filenames have the following format:
btc-tx-
where
For example file btc-tx-100000-149999-aa.bz2 and the rest of the parts if any contain transactions from
block 100000 to block 149999 inclusive.
The files are compressed with bzip2. They can be uncompressed using command bunzip2.
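As a small aid for selecting files, the sketch below extracts the block range from a filename. The pattern is an assumption inferred from the single example name given above (start block, end block, a lowercase part suffix, then `.bz2`), since the placeholders of the format description are missing from this text; the function names are hypothetical.

```python
import re

# Assumed pattern, inferred from the example "btc-tx-100000-149999-aa.bz2".
FILENAME_RE = re.compile(r"^btc-tx-(\d+)-(\d+)-([a-z]+)\.bz2$")


def parse_filename(name):
    """Return (first_block, last_block, part_suffix) for a dataset filename."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    first, last, part = match.groups()
    return int(first), int(last), part


def contains_block(name, height):
    """True if the file's block range (inclusive on both ends) covers height."""
    first, last, _ = parse_filename(name)
    return first <= height <= last


info = parse_filename("btc-tx-100000-149999-aa.bz2")
print(info)
```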
TRANSACTION FORMAT:
Each line in a file corresponds to a transaction. The transaction has the following format:
BLOCK TIME FORMAT:
The block time file has the following format:
IMPORTANT NOTE:
Public Bitcoin Mainnet blockchain data is open and can be obtained by connecting as a node on the blockchain or by using block explorer web sites such as https://btcscan.org. The downloaders and users of this dataset accept full responsibility for using the data in a manner compliant with GDPR or any other applicable regulations. We provide the data as is and we cannot be held responsible for anything.
NOTE:
If you use this dataset, please do not forget to add the DOI number to the citation.
If you use our dataset in your research, please also cite our paper: https://link.springer.com/chapter/10.1007/978-3-030-94590-9_14
@incollection{kilicc2022analyzing, title={Analyzing Large-Scale Blockchain Transaction Graphs for Fraudulent Activities}, author={K{\i}l{\i}{\c{c}}, Baran and {\"O}zturan, Can and {\c{S}}en, Alper}, booktitle={Big Data and Artificial Intelligence in Digital Finance}, pages={253--267}, year={2022}, publisher={Springer, Cham} }