CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
See the article "Targeting model based on principal component analysis and extreme learning machine" for the meaning of the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Previously, we established two distinct progressive multiple sclerosis (MS) models by induction of experimental autoimmune encephalomyelitis (EAE) with myelin oligodendrocyte glycoprotein (MOG) in two mouse strains. A.SW mice develop ataxia with antibody deposition, but no T cell infiltration, in the central nervous system (CNS), while SJL/J mice develop paralysis with CNS T cell infiltration. In this study, we determined biomarkers contributing to the homogeneity and heterogeneity of the two models. Using the CNS and spleen microarray transcriptome and cytokine data, we conducted computational analyses. We identified up-regulation of immune-related genes, including immunoglobulins, in the CNS of both models. Pro-inflammatory cytokines, interferon (IFN)-γ and interleukin (IL)-17, were associated with disease progression in SJL/J mice, while the expression of both cytokines was detected only at EAE onset in A.SW mice. Principal component analysis (PCA) of CNS transcriptome data demonstrated that down-regulation of prolactin may reflect disease progression. Pattern matching analysis of the spleen transcriptome with CNS PCA identified 333 splenic surrogate markers, including Stfa2l1, which reflected the changes in the CNS. Among them, using data mining, we found that two genes (PER1/MIR6883 and FKBP5) and one gene (SLC16A1/MCT1) were also significantly up-regulated and down-regulated, respectively, in human MS peripheral blood.
http://opendatacommons.org/licenses/dbcl/1.0/
It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features that have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and takes value 1 in case of fraud and 0 otherwise.
Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
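The recommendation above can be illustrated with a minimal sketch using only the Python standard library. The counts 492 and 284,807 come from the dataset description. The label and score lists are tiny synthetic values made up for illustration (they are not taken from the dataset), and `average_precision` is a hypothetical helper that computes the step-wise area under the precision-recall curve. In practice one would more likely use an existing routine such as scikit-learn's `average_precision_score`.

```python
# Counts taken from the dataset description.
N_TOTAL = 284_807
N_FRAUD = 492

fraud_rate = N_FRAUD / N_TOTAL
# Accuracy of a trivial model that labels every transaction as legitimate.
baseline_accuracy = (N_TOTAL - N_FRAUD) / N_TOTAL


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Ranks examples by descending score and averages the precision
    observed at the rank of each positive example. Tied scores keep
    their input order (a simplification made for this sketch).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Synthetic illustration (NOT real data): 2 frauds among 10 transactions.
y_true = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
informative_scores = [0.1, 0.2, 0.9, 0.1, 0.3, 0.2, 0.1, 0.8, 0.2, 0.1]
constant_scores = [0.0] * len(y_true)  # a model that ranks nothing

ap_informative = average_precision(y_true, informative_scores)
ap_constant = average_precision(y_true, constant_scores)

print(f"fraud rate: {fraud_rate:.4%}")
print(f"accuracy of always predicting 'legitimate': {baseline_accuracy:.4%}")
print(f"average precision, informative scores: {ap_informative:.3f}")
print(f"average precision, constant scores: {ap_constant:.3f}")
```

The trivial model reaches an accuracy above 99.8% while detecting no fraud at all, which is why accuracy says little here. Average precision, by contrast, separates the informative scores from the constant ones.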
A simulator for transaction data has been released as part of the practical handbook on Machine Learning for Credit Card Fraud Detection - https://fraud-detection-handbook.github.io/fraud-detection-handbook/Chapter_3_GettingStarted/SimulatedDataset.html. We invite all practitioners interested in fraud detection datasets to also check out this data simulator, and the methodologies for credit card fraud detection presented in the book.
The dataset was collected and analysed during a research collaboration between Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available at https://www.researchgate.net/project/Fraud-detection-5 and on the page of the DefeatFraud project.
Please cite the following works:
Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015.
Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Systems with Applications, 41(10), 4915-4928, 2014, Pergamon.
Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Transactions on Neural Networks and Learning Systems, 29(8), 3784-3797, 2018, IEEE.
Dal Pozzolo, Andrea. Adaptive Machine learning for credit card fraud detection. ULB MLG PhD thesis (supervised by G. Bontempi).
Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark. Information Fusion, 41, 182-194, 2018, Elsevier.
Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5(4), 285-300, 2018, Springer International Publishing.
Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection. INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp. 78-88, 2019.
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019.
Yann-Aël Le Borgne, Gianluca Bontempi. Reproducible machine Learning for Credit Card Fraud Detection - Practical Handbook.
Bertrand Lebichot, Gianmarco Paldino, Wissam Siblini, Liyun He, Frederic Oblé, Gianluca Bontempi. Incremental learning strategies for credit cards fraud detection. International Journal of Data Science and Analytics.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 9: Table S9. PubChem high throughput screen results of 3-(1H-1,3-Benzadiol-2-yl)quinoline and 2-(4-Methoxyphenyl)-7-methylimidazo[1,2-a]pyridine.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
This study evaluates the generation of domestic solid waste in Peruvian households using statistical techniques together with the SEMMA data mining methodology and PCA. The objective is to explore how waste management, population and the Per Capita Generation (PCG) index influence the production of this waste in Peruvian departments. The sample was obtained from the database of annual reports submitted by district and provincial municipalities to MINAM through the Information System for Solid Waste Management (SIGERSOL), including data from the 24 departments of Peru, with a total of 14,852 records organized in 196 registration forms. Statistical techniques and an adaptation of the SEMMA methodology were applied together with Principal Component Analysis (PCA) to examine the impacts of the accumulation of household solid waste in Peru. This study showed that the first component accounts for 80.2% of the inertia. Together, the first two components account for 99.8% of the total variation, suggesting that most of the meaningful information can be retained using only two dimensions. Welch’s ANOVA showed significant differences in domestic solid waste generation among Peruvian departments [F (6, 94.310) = 790.444; p = 0.0, p
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 11. The 14 training sets used for the study, which are derived from AID 1721, a high-throughput screened, confirmatory bioassay dataset on the pyruvate kinase protein target of Leishmania mexicana. The training sets are given as ARFF files and have 179 molecular descriptors generated using PowerMV.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Additional file 7: Table S10. Weighted burden number descriptor values (PCAD) of FDA approved drugs and of the PubChem molecules listed in Table 3.