7 datasets found

Raw data of the ships of priority II in 2017
figshare.com
data.4tu.nl
+1 more
xlsx
Updated Jul 28, 2020
Cite
Jiaqi Mu (2020). Raw data of the ships of priority II in 2017 [Dataset]. http://doi.org/10.4121/uuid:92a6ce98-5001-4768-bec1-49881a367d36
Available download formats: xlsx
Unique identifier
https://doi.org/10.4121/uuid:92a6ce98-5001-4768-bec1-49881a367d36
Dataset updated
Jul 28, 2020
Dataset provided by
4TU.ResearchData
Authors
Jiaqi Mu
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
See the article "Targeting model based on principal component analysis and extreme learning machine" for the meaning of the data.
Data_Sheet_1_Bioinformatics Analyses Determined the Distinct CNS and...
figshare.com
frontiersin.figshare.com
docx
Updated May 31, 2023
Cite
Seiichi Omura; Fumitaka Sato; Nicholas E. Martinez; Ah-Mee Park; Mitsugu Fujita; Nikki J. Kennett; Urška Cvek; Alireza Minagar; J. Steven Alexander; Ikuo Tsunoda (2023). Data_Sheet_1_Bioinformatics Analyses Determined the Distinct CNS and Peripheral Surrogate Biomarker Candidates Between Two Mouse Models for Progressive Multiple Sclerosis.docx [Dataset]. http://doi.org/10.3389/fimmu.2019.00516.s001
Available download formats: docx
Unique identifier
https://doi.org/10.3389/fimmu.2019.00516.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Seiichi Omura; Fumitaka Sato; Nicholas E. Martinez; Ah-Mee Park; Mitsugu Fujita; Nikki J. Kennett; Urška Cvek; Alireza Minagar; J. Steven Alexander; Ikuo Tsunoda
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Previously, we have established two distinct progressive multiple sclerosis (MS) models by induction of experimental autoimmune encephalomyelitis (EAE) with myelin oligodendrocyte glycoprotein (MOG) in two mouse strains. A.SW mice develop ataxia with antibody deposition, but no T cell infiltration, in the central nervous system (CNS), while SJL/J mice develop paralysis with CNS T cell infiltration. In this study, we determined biomarkers contributing to the homogeneity and heterogeneity of two models. Using the CNS and spleen microarray transcriptome and cytokine data, we conducted computational analyses. We identified up-regulation of immune-related genes, including immunoglobulins, in the CNS of both models. Pro-inflammatory cytokines, interferon (IFN)-γ and interleukin (IL)-17, were associated with the disease progression in SJL/J mice, while the expression of both cytokines was detected only at the EAE onset in A.SW mice. Principal component analysis (PCA) of CNS transcriptome data demonstrated that down-regulation of prolactin may reflect disease progression. Pattern matching analysis of spleen transcriptome with CNS PCA identified 333 splenic surrogate markers, including Stfa2l1, which reflected the changes in the CNS. Among them, we found that two genes (PER1/MIR6883 and FKBP5) and one gene (SLC16A1/MCT1) were also significantly up-regulated and down-regulated, respectively, in human MS peripheral blood, using data mining.
Credit Card Fraud Detection
kaggle.com
zip
Updated Mar 23, 2018
+ more versions
Cite
Machine Learning Group - ULB (2018). Credit Card Fraud Detection [Dataset]. https://www.kaggle.com/mlg-ulb/creditcardfraud
Available download formats: zip (69155672 bytes)
Dataset updated
Mar 23, 2018
Dataset authored and provided by
Machine Learning Group - ULB
License
http://opendatacommons.org/licenses/dbcl/1.0/
Description
Context

It is important that credit card companies are able to recognize fraudulent credit card transactions so that customers are not charged for items that they did not purchase.

Content

The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, ... V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and it takes value 1 in case of fraud and 0 otherwise.

Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
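The recommendation above can be illustrated with a minimal sketch. It uses only the counts stated in the description (492 frauds out of 284,807 transactions) to show why plain accuracy is uninformative, and it computes average precision, one common estimator of the area under the precision-recall curve. The function name `average_precision` and the toy labels and scores are assumptions made for illustration; they are not part of the dataset or of any library.

```python
def average_precision(labels, scores):
    """Mean of the precision values measured at each true positive,
    with items ranked by descending score (ties are not handled specially)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_pos


# Counts taken from the dataset description.
n_total, n_fraud = 284_807, 492
fraud_rate = n_fraud / n_total

# A classifier that labels every transaction as legitimate is almost
# always "right", yet it detects no fraud at all.
all_legit_accuracy = 1 - fraud_rate

# Toy example (hypothetical labels and scores, not taken from the dataset).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)

print(f"fraud rate: {fraud_rate:.5f}")
print(f"accuracy of the all-legitimate classifier: {all_legit_accuracy:.5f}")
print(f"toy average precision: {toy_ap:.4f}")
```

Average precision rewards ranking the rare frauds ahead of legitimate transactions, so a model that ignores the positive class cannot score well on it, unlike accuracy.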

Update (03/05/2021)

A simulator for transaction data has been released as part of the practical handbook on Machine Learning for Credit Card Fraud Detection - https://fraud-detection-handbook.github.io/fraud-detection-handbook/Chapter_3_GettingStarted/SimulatedDataset.html. We invite all practitioners interested in fraud detection datasets to also check out this data simulator, and the methodologies for credit card fraud detection presented in the book.

Acknowledgements

The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project.

Please cite the following works:

Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015

Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon

Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE

Dal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)

Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-Aël; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier

Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing

Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He, Frederic Oblé, Gianluca Bontempi Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection, INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019

Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Frederic Oblé, Gianluca Bontempi Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection Information Sciences, 2019

Yann-Aël Le Borgne, Gianluca Bontempi Reproducible machine Learning for Credit Card Fraud Detection - Practical Handbook

Bertrand Lebichot, Gianmarco Paldino, Wissam Siblini, Liyun He, Frederic Oblé, Gianluca Bontempi Incremental learning strategies for credit cards fraud detection, International Journal of Data Science and Analytics
MOESM9 of Feature optimization in high dimensional chemical space:...
springernature.figshare.com
xlsx
Updated May 31, 2023
Cite
Jinuraj K. R.; Rakhila M.; Dhanalakshmi M.; Sajeev R.; Akshata Gad; Jayan K.; Muhammed Iqbal P.; Andrew Manuel; Abdul Jaleel U. C. (2023). MOESM9 of Feature optimization in high dimensional chemical space: statistical and data mining solutions [Dataset]. http://doi.org/10.6084/m9.figshare.6814010.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.6814010.v1
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Jinuraj K. R.; Rakhila M.; Dhanalakshmi M.; Sajeev R.; Akshata Gad; Jayan K.; Muhammed Iqbal P.; Andrew Manuel; Abdul Jaleel U. C.
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 9: Table S9. PubChem high throughput screen results of 3-(1H-1,3-Benzadiol-2-yl)quinoline and 2-(4-Methoxyphenyl)-7-methylimidazo[1,2-a]pyridine.
Data Sheet 3_Statistical approach for the evaluation of household solid...
frontiersin.figshare.com
csv
Updated Feb 27, 2025
+ more versions
Cite
Jorge L. Padilla-Vento; Juan J. Soria (2025). Data Sheet 3_Statistical approach for the evaluation of household solid waste generation in Peruvian households: 2014–2021.csv [Dataset]. http://doi.org/10.3389/fams.2025.1498513.s003
Available download formats: csv
Unique identifier
https://doi.org/10.3389/fams.2025.1498513.s003
Dataset updated
Feb 27, 2025
Dataset provided by
Frontiers
Authors
Jorge L. Padilla-Vento; Juan J. Soria
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This study addresses the evaluation of the generation of domestic solid waste in Peruvian households using statistical techniques together with the SEMMA data mining methodology and PCA. The objective is to explore how waste management, population and the per capita generation (PCG) index influence the production of this waste in Peruvian departments. The sample was obtained from the database of annual reports submitted by district and provincial municipalities to MINAM through the Information System for Solid Waste Management (SIGERSOL), including data from the 24 departments of Peru, with a total of 14,852 records organized in 196 registration forms. Statistical techniques and an adaptation of the SEMMA methodology were applied together with Principal Component Analysis (PCA) to examine the impacts of the accumulation of household solid waste in Peru. This study showed that the first component accounts for 80.2% of the inertia. Combining the first two components accounts for 99.8% of the total variation, suggesting that most of the meaningful information can be maintained using only two dimensions. Welch’s ANOVA showed significant differences in domestic solid waste generation among Peruvian departments [F (6, 94.310) = 790.444; p = 0.0, p 
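The explained-variance figures quoted in this description (80.2% for the first component, 99.8% for the first two) follow from the usual PCA bookkeeping, sketched below. The eigenvalues are hypothetical: they are chosen only so that the ratios reproduce the reported percentages, and the remaining 0.2% is lumped into a single assumed third component. They are not taken from the study's data.

```python
def explained_variance(eigenvalues):
    """Return per-component and cumulative shares of total variance."""
    total = sum(eigenvalues)
    ratios = [value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for ratio in ratios:
        running += ratio
        cumulative.append(running)
    return ratios, cumulative


# Hypothetical eigenvalues (assumption): scaled to match the reported shares.
eigenvalues = [80.2, 19.6, 0.2]
ratios, cumulative = explained_variance(eigenvalues)

for index, (ratio, total_so_far) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PC{index}: {ratio:.1%} of variance, {total_so_far:.1%} cumulative")
```

Under these assumed values the second component alone would carry 19.6% of the variance, which is the difference between the two percentages reported in the description.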
MOESM11 of Feature optimization in high dimensional chemical space:...
springernature.figshare.com
txt
Updated Jun 1, 2023
Cite
Jinuraj K. R.; Rakhila M.; Dhanalakshmi M.; Sajeev R.; Akshata Gad; Jayan K.; Muhammed Iqbal P.; Andrew Manuel; Abdul Jaleel U. C. (2023). MOESM11 of Feature optimization in high dimensional chemical space: statistical and data mining solutions [Dataset]. http://doi.org/10.6084/m9.figshare.6813827.v1
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.6813827.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Jinuraj K. R.; Rakhila M.; Dhanalakshmi M.; Sajeev R.; Akshata Gad; Jayan K.; Muhammed Iqbal P.; Andrew Manuel; Abdul Jaleel U. C.
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 11. The 14 training sets used for the study, which are derived from AID 1721, a high-throughput screened, confirmatory bioassay dataset on the pyruvate kinase protein target of Leishmania mexicana. Training sets are given as ARFF files and have 179 molecular descriptors generated using PowerMV.
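Because the training sets are distributed as ARFF files, a quick sanity check is to list the attributes declared in a file's header and compare the count with the 179 descriptors mentioned above. The sketch below is a minimal header reader written for illustration only; the function name `read_arff_attributes` and the sample header are assumptions, and the code does not handle every corner of the ARFF format (for example, escaped quotes inside attribute names).

```python
import re

# Matches "@attribute <name> ..." where the name may be quoted or bare.
ATTRIBUTE_PATTERN = re.compile(
    r"@attribute\s+(?:'([^']*)'|\"([^\"]*)\"|(\S+))", re.IGNORECASE
)


def read_arff_attributes(text):
    """Return the attribute names declared before the @data section."""
    names = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        if line.lower().startswith("@data"):
            break  # header ends where the data rows begin
        match = ATTRIBUTE_PATTERN.match(line)
        if match:
            names.append(next(g for g in match.groups() if g is not None))
    return names


# Hypothetical header, not copied from the dataset.
SAMPLE = """% hypothetical example
@relation demo
@attribute desc_1 numeric
@attribute 'desc 2' numeric
@attribute outcome {active,inactive}
@data
0.1,0.2,active
"""

names = read_arff_attributes(SAMPLE)
print(len(names), names)
```

When counting descriptors in a real file, keep in mind that the class (outcome) attribute is usually declared alongside the descriptors, so the attribute count may exceed the descriptor count by one.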
MOESM7 of Feature optimization in high dimensional chemical space:...
springernature.figshare.com
xlsx
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jinuraj K. R.; Rakhila M.; Dhanalakshmi M.; Sajeev R.; Akshata Gad; Jayan K.; Muhammed Iqbal P.; Andrew Manuel; Abdul Jaleel U. C. (2023). MOESM7 of Feature optimization in high dimensional chemical space: statistical and data mining solutions [Dataset]. http://doi.org/10.6084/m9.figshare.6813983.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.6813983.v1
Dataset updated
May 30, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Jinuraj K. R.; Rakhila M.; Dhanalakshmi M.; Sajeev R.; Akshata Gad; Jayan K.; Muhammed Iqbal P.; Andrew Manuel; Abdul Jaleel U. C.
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Additional file 7: Table S10. Weighted burden number descriptor values (PCAD) of FDA approved drugs and of the PubChem molecules listed in Table 3.