45 datasets found

f
Data from: Results obtained in a data mining process applied to a database...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
Explore at:
jpegAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20011798.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
E.M. Ruiz Lobaina; C. P. Romero Suárez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract The objective of this work is to improve the quality of the information that belongs to the database CubaCiencia, of the Institute of Scientific and Technological Information. This database has bibliographic information referring to four segments of science and is the main database of the Library Management System. The applied methodology was based on the Decision Trees, the Correlation Matrix, the 3D Scatter Plot, etc., which are techniques used by data mining, for the study of large volumes of information. The results achieved not only made it possible to improve the information in the database, but also provided truly useful patterns in the solution of the proposed objectives.
d
Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...
datadiscoverystudio.org
cloud.csiss.gmu.edu
+6more
Updated Sep 8, 2014
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2014). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/ee5b868c86f7498ab5e1473e8d908629/html
Explore at:
Dataset updated
Sep 8, 2014
Description
The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.
kdd cyberattack
kaggle.com
Updated Jul 28, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ziyad Mestour (2018). kdd cyberattack [Dataset]. https://www.kaggle.com/slashtea/kdd-cyberattack/code
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jul 28, 2018
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Ziyad Mestour
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Context

This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between bad'' connections, called intrusions or attacks, andgood'' normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

Content

For more information about the contents refer to this link http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

Acknowledgements

The dataset is shared on Kaggle on behalf of KDD's work.

Inspiration

Build a classifier capable of distinguishing between attacks, and normal connections
f
Table_1_The TargetMine Data Warehouse: Enhancement and Updates.xlsx
frontiersin.figshare.com
xlsx
Updated Jun 1, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi (2023). Table_1_The TargetMine Data Warehouse: Enhancement and Updates.xlsx [Dataset]. http://doi.org/10.3389/fgene.2019.00934.s004
Explore at:
xlsxAvailable download formats
Unique identifier
https://doi.org/10.3389/fgene.2019.00934.s004
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Biological data analysis is the key to new discoveries in disease biology and drug discovery. The rapid proliferation of high-throughput ‘omics’ data has necessitated a need for tools and platforms that allow the researchers to combine and analyse different types of biological data and obtain biologically relevant knowledge. We had previously developed TargetMine, an integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery. Here, we describe the newly modelled biological data types and the enhanced visual and analytical features of TargetMine. These enhancements have included: an enhanced coverage of gene–gene relations, small molecule metabolite to pathway mappings, an improved literature survey feature, and in silico prediction of gene functional associations such as protein–protein interactions and global gene co-expression. We have also described two usage examples on trans-omics data analysis and extraction of gene-disease associations using MeSH term descriptors. These examples have demonstrated how the newer enhancements in TargetMine have contributed to a more expansive coverage of the biological data space and can help interpret genotype–phenotype relations. TargetMine with its auxiliary toolkit is available at https://targetmine.mizuguchilab.org. The TargetMine source code is available at https://github.com/chenyian-nibio/targetmine-gradle.
d
Replication Data for: \"Unraveling spatial, structural, and social...
search.dataone.org
Updated Nov 9, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo (2023). Replication Data for: \"Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS\" [Dataset]. http://doi.org/10.7910/DVN/SFT3RT
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/SFT3RT
Dataset updated
Nov 9, 2023
Dataset provided by
Harvard Dataverse
Authors
PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo
Description
Data from the article "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS", by Agustin Pájaro, Ignacio J. Duran and Pablo Rodrigo, published in Revista DADOS, v. 65, n. 3, 2022.
i
Data from: KDD Cup 1999 Data
impactcybertrust.org
Updated Jan 19, 2019
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
External Data Source (2019). KDD Cup 1999 Data [Dataset]. http://doi.org/10.23721/100/1478801
Explore at:
Unique identifier
https://doi.org/10.23721/100/1478801
Dataset updated
Jan 19, 2019
Authors
External Data Source
Description
This is the data set used for intrusion detector learning task in the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, The Fifth International Conference on Knowledge Discovery and Data Mining. The intrusion detector learning task is to build a predictive model (i.e. a classifier) capable of distinguishing between bad'' connections, called intrusions or attacks, andgood'' normal connections.

The 1998 DARPA Intrusion Detection Evaluation Program was prepared and managed by MIT Lincoln Labs. The objective was to survey and evaluate research in intrusion detection. A standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment, was provided. The 1999 KDD intrusion detection contest uses a version of this dataset.

Lincoln Labs set up an environment to acquire nine weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. They operated the LAN as if it were a true Air Force environment, but peppered it with multiple attacks.

The raw training data was about four gigabytes of compressed binary TCP dump data from seven weeks of network traffic. This was processed into about five million connection records. Similarly, the two weeks of test data yielded around two million connection records. ; gcounsel@ics.uci.edu
m
Data for:A Real-Time Social Network-Based Knowledge Discovery System for...
data.mendeley.com
Updated Feb 18, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Asim Sinan Yuksel (2018). Data for:A Real-Time Social Network-Based Knowledge Discovery System for Decision Making [Dataset]. http://doi.org/10.17632/29tbvvwkdp.1
Explore at:
Unique identifier
https://doi.org/10.17632/29tbvvwkdp.1
Dataset updated
Feb 18, 2018
Authors
Asim Sinan Yuksel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
1-Turkish comments for 128 venues in Foursquare Social Network Platform (binary and ternary classified) 2-Turkish adjectives and polarities 3-Turkish food and drink names 4- All comments without tagging 5-Venues, liked meals/foods
d
Data from: Towards open data blockchain analytics: a Bitcoin perspective
search.dataone.org
datadryad.org
+1more
Updated Jun 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dan McGinn; Douglas McIlwraith; Yike Guo (2025). Towards open data blockchain analytics: a Bitcoin perspective [Dataset]. http://doi.org/10.5061/dryad.h9r0p65
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.h9r0p65
Dataset updated
Jun 12, 2025
Dataset provided by
Dryad Digital Repository
Authors
Dan McGinn; Douglas McIlwraith; Yike Guo
Time period covered
Jul 9, 2018
Description
Bitcoin is the first implementation of a technology that has become known as a 'public permissionless' blockchain. Such systems allow public read/write access to an append-only blockchain database without the need for any mediating central authority. Instead they guarantee access, security and protocol conformity through an elegant combination of cryptographic assurances and game theoretic economic incentives. Not until the advent of the Bitcoin blockchain has such a trusted, transparent, comprehensive and granular data set of digital economic behaviours been available for public network analysis. In this article, by translating the cumbersome binary data structure of the Bitcoin blockchain into a high fidelity graph model, we demonstrate through various analyses the often overlooked social and econometric benefits of employing such a novel open data architecture. Specifically we show (a) how repeated patterns of transaction behaviours can be revealed to link user activity across t...
T
kddcup99
tensorflow.org
Updated Jan 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). kddcup99 [Dataset]. https://www.tensorflow.org/datasets/catalog/kddcup99
Explore at:
Dataset updated
Jan 4, 2023
Description
This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

To use this dataset:

import tensorflow_datasets as tfds ds = tfds.load('kddcup99', split='train') for ex in ds.take(4): print(ex)

See the guide for more informations on tensorflow_datasets.
Additional file 1 of Learning from biomedical linked data to suggest valid...
springernature.figshare.com
txt
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kevin Dalleau; Yassine Marzougui; SĂŠbastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet (2023). Additional file 1 of Learning from biomedical linked data to suggest valid pharmacogenes [Dataset]. http://doi.org/10.6084/m9.figshare.c.3747806_D1.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3747806_D1.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Kevin Dalleau; Yassine Marzougui; SĂŠbastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SPARQL query example 1. This text file contains the SPARQL query we apply on our PGx linked data to obtain the data graph represented in Fig. 3. This query includes the definition of prefixes mentioned in Figs. 2 and 3. This query takes about 30 s on our https://pgxlod.loria.fr server. (TXT 2 kb)
S
Semantic Knowledge Discovery Software Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Insights Market (2025). Semantic Knowledge Discovery Software Report [Dataset]. https://www.datainsightsmarket.com/reports/semantic-knowledge-discovery-software-1949491
Explore at:
pdf, ppt, docAvailable download formats
Dataset updated
May 29, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policyhttps://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Semantic Knowledge Discovery Software market is experiencing robust growth, driven by the increasing need for organizations to extract actionable insights from complex and unstructured data. The market, estimated at $2 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated $6 billion by 2033. This growth is fueled by several key factors. The rising adoption of artificial intelligence (AI) and machine learning (ML) technologies across various industries is enabling more sophisticated semantic analysis, leading to improved decision-making. Furthermore, the proliferation of big data, coupled with the limitations of traditional data analysis methods, is driving the demand for solutions that can effectively uncover hidden patterns and relationships within vast datasets. The growing emphasis on data-driven decision-making across sectors like healthcare, finance, and research and development is also contributing significantly to market expansion. Major restraints to market growth include the high initial investment costs associated with implementing semantic knowledge discovery software, the complexity of integrating these solutions with existing IT infrastructure, and the scarcity of skilled professionals capable of managing and interpreting the results generated by these systems. However, these challenges are being addressed through the development of more user-friendly software, cloud-based deployment models that reduce upfront costs, and increased training and education programs focused on semantic technology. The market is segmented by deployment mode (cloud, on-premise), industry (healthcare, finance, manufacturing, etc.), and functionality (data integration, knowledge graph construction, semantic search). Key players like Expert System SpA, ChemAxon, Collexis (Elsevier), MAANA, OntoText, Cambridge Semantics, and Nervana (Intel) are actively shaping the market landscape through innovation and strategic partnerships. The North American market currently holds a significant share, but regions like Asia-Pacific are expected to witness rapid growth in the coming years.
m
Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands...
data.mendeley.com
Updated Feb 10, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Minxing Si (2021). Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands Database for Knowledge Discovery and Carbon Cost Analysis [Dataset]. http://doi.org/10.17632/8ngkgz69zb.4
Explore at:
Unique identifier
https://doi.org/10.17632/8ngkgz69zb.4
Dataset updated
Feb 10, 2021
Authors
Minxing Si
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Canada
Description
A better understanding of greenhouse gas (GHG) emissions resulting from oil sands (bitumen) extraction can help to meet global oil demands, identify potential mitigation measures, and design effective carbon policies. While several studies have attempted to model GHG emissions from oil sands extractions, these studies have encountered data availability challenges, particularly with respect to actual fuel use data, and have thus struggled to accurately quantify GHG emissions. This dataset contains actual operational data from 20 in-situ oil sands operations, including information for fuel gas, flare gas, vented gas, production, steam injection, gas injection, condensate injection, and C3 injection.
Imbalanced dataset for benchmarking
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira; Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira (2020). Imbalanced dataset for benchmarking [Dataset]. http://doi.org/10.5281/zenodo.61452
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.61452
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira; Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira
License
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Description
Imbalanced dataset for benchmarking
=======================

The different algorithms of the `imbalanced-learn` toolbox are evaluated on a set of common dataset, which are more or less balanced. These benchmark have been proposed in [1]. The following section presents the main characteristics of this benchmark.

Characteristics
-------------------

|ID |Name |Repository & Target |Ratio |# samples| # features |
|:---:|:----------------------:|--------------------------------------|:------:|:-------------:|:--------------:|
|1 |Ecoli |UCI, target: imU |8.6:1 |336 |7 |
|2 |Optical Digits |UCI, target: 8 |9.1:1 |5,620 |64 |
|3 |SatImage |UCI, target: 4 |9.3:1 |6,435 |36 |
|4 |Pen Digits |UCI, target: 5 |9.4:1 |10,992 |16 |
|5 |Abalone |UCI, target: 7 |9.7:1 |4,177 |8 |
|6 |Sick Euthyroid |UCI, target: sick euthyroid |9.8:1 |3,163 |25 |
|7 |Spectrometer |UCI, target: >=44 |11:1 |531 |93 |
|8 |Car_Eval_34 |UCI, target: good, v good |12:1 |1,728 |6 |
|9 |ISOLET |UCI, target: A, B |12:1 |7,797 |617 |
|10 |US Crime |UCI, target: >0.65 |12:1 |1,994 |122 |
|11 |Yeast_ML8 |LIBSVM, target: 8 |13:1 |2,417 |103 |
|12 |Scene |LIBSVM, target: >one label |13:1 |2,407 |294 |
|13 |Libras Move |UCI, target: 1 |14:1 |360 |90 |
|14 |Thyroid Sick |UCI, target: sick |15:1 |3,772 |28 |
|15 |Coil_2000 |KDD, CoIL, target: minority |16:1 |9,822 |85 |
|16 |Arrhythmia |UCI, target: 06 |17:1 |452 |279 |
|17 |Solar Flare M0 |UCI, target: M->0 |19:1 |1,389 |10 |
|18 |OIL |UCI, target: minority |22:1 |937 |49 |
|19 |Car_Eval_4 |UCI, target: vgood |26:1 |1,728 |6 |
|20 |Wine Quality |UCI, wine, target: <=4 |26:1 |4,898 |11 |
|21 |Letter Img |UCI, target: Z |26:1 |20,000 |16 |
|22 |Yeast _ME2 |UCI, target: ME2 |28:1 |1,484 |8 |
|23 |Webpage |LIBSVM, w7a, target: minority|33:1 |49,749 |300 |
|24 |Ozone Level |UCI, ozone, data |34:1 |2,536 |72 |
|25 |Mammography |UCI, target: minority |42:1 |11,183 |6 |
|26 |Protein homo. |KDD CUP 2004, minority |111:1|145,751 |74 |
|27 |Abalone_19 |UCI, target: 19 |130:1|4,177 |8 |

References
----------
[1] Ding, Zejin, "Diversified Ensemble Classifiers for H
ighly Imbalanced Data Learning and their Application in Bioinformatics." Dissertation, Georgia State University, (2011).

[2] Blake, Catherine, and Christopher J. Merz. "UCI Repository of machine learning databases." (1998).

[3] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.

[4] Caruana, Rich, Thorsten Joachims, and Lars Backstrom. "KDD-Cup 2004: results and analysis." ACM SIGKDD Explorations Newsletter 6.2 (2004): 95-108.
n
RefMED
neuinfo.org
dknet.org
Updated Jun 24, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2025). RefMED [Dataset]. http://identifiers.org/RRID:SCR_011871
Explore at:
Unique identifier
https://identifiers.org/RRID:SCR_011871
Dataset updated
Jun 24, 2025
Description
Relevance Feedback Search Engine for PubMed. When a user enters a keyword in the search box, the PubMed search results will be returned. The user then specifies on a sample of results how much they are relevant to what she intends to find, for example, by specifying whether each article is high relevant, somewhat relevant, or not relevant. Once the user clicks Push Feedback button, the system learns a relevance function from the feedback and returns the top articles ranked highly according to the relevance function. The user can repeat the process until she gets satisfying results.
Z
Datasets from the KDD 2021 article "A Semi-Personalized System for User Cold...
data.niaid.nih.gov
Updated Jul 23, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Walid Bendada (2021). Datasets from the KDD 2021 article "A Semi-Personalized System for User Cold Start Recommendation on Music Streaming Apps" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5121673
Explore at:
Dataset updated
Jul 23, 2021
Dataset provided by
Walid Bendada
Guillaume Salha-Galvan
Léa Briand
Viet-Anh Tran
Mathieu Morlon
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We publicly release the anonymized song_embeddings.parquet user_embeddings.parquet user_features_test.parquet user_features_train.parquet user_features_validation.parquet datasets, with each of the TT-SVD or UT-ALS versions of embeddings, from the music streaming platform Deezer, as described in the article "A Semi-Personalized System for User Cold Start Recommendation on Music Streaming Apps" published in the proceedings of the 27TH ACM SIGKDD conference on knowledge discovery and data mining (KDD 2021). The paper is available here.

These datasets are used in the GitHub repository deezer/semi_perso_user_cold_start to reproduce experiments from the article.

Please cite our paper if you use our code or data in your work.
c
Data from: A KNOWLEDGE DISCOVERY STRATEGY FOR RELATING SEA SURFACE...
s.cnmilf.com
datasets.ai
+4more
Updated Apr 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). A KNOWLEDGE DISCOVERY STRATEGY FOR RELATING SEA SURFACE TEMPERATURES TO FREQUENCIES OF TROPICAL STORMS AND GENERATING PREDICTIONS OF HURRICANES UNDER 21ST-CENTURY GLOBAL WARMING SCENARIOS [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/a-knowledge-discovery-strategy-for-relating-sea-surface-temperatures-to-frequencies-of-tro
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
A KNOWLEDGE DISCOVERY STRATEGY FOR RELATING SEA SURFACE TEMPERATURES TO FREQUENCIES OF TROPICAL STORMS AND GENERATING PREDICTIONS OF HURRICANES UNDER 21ST-CENTURY GLOBAL WARMING SCENARIOS CAITLIN RACE, MICHAEL STEINBACH, AUROOP GANGULY, FRED SEMAZZI, AND VIPIN KUMAR Abstract. The connections among greenhouse-gas emissions scenarios, global warming, and frequencies of hurricanes or tropical cyclones are among the least understood in climate science but among the most fiercely debated in the context of adaptation decisions or mitigation policies. Here we show that a knowledge discovery strategy, which leverages observations and climate model simulations, offers the promise of developing credible projections of tropical cyclones based on sea surface temperatures (SST) in a warming environment. While this study motivates the development of new methodologies in statistics and data mining, the ability to solve challenging climate science problems with innovative combinations of traditional and state-of-the-art methods is demonstrated. Here we develop new insights, albeit in a proof-of-concept sense, on the relationship between sea surface temperatures and hurricane frequencies, and generate the most likely projections with uncertainty bounds for storm counts in the 21st-century warming environment based in turn on the Intergovernmental Panel on Climate Change Special Report on Emissions Scenarios. Our preliminary insights point to the benefits that can be achieved for climate science and impacts analysis, as well as adaptation and mitigation policies, by a solution strategy that remains tailored to the climate _domain and complements physics-based climate model simulations with a combination of existing and new computational and data science approaches.
Additional file 2 of Learning from biomedical linked data to suggest valid...
springernature.figshare.com
txt
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Kevin Dalleau; Yassine Marzougui; SĂŠbastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet (2023). Additional file 2 of Learning from biomedical linked data to suggest valid pharmacogenes [Dataset]. http://doi.org/10.6084/m9.figshare.c.3747806_D2.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.c.3747806_D2.v1
Dataset updated
May 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Kevin Dalleau; Yassine Marzougui; SĂŠbastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SPARQL query example 2. This text file contains an example of SPARQL query that enable to explore the vicinity of an entity. This particular query returns the RDF graph surrounding, within a lenght of 4, the node pharmgkb:PA451906 that represents the warfarin, an anticoagulant drug. (TXT 392 bytes)
Data associated with "A collaborative filtering based approach to biomedical...
zenodo.org
data.niaid.nih.gov
application/gzip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jake Lever; Jake Lever (2020). Data associated with "A collaborative filtering based approach to biomedical knowledge discovery" [Dataset]. http://doi.org/10.5281/zenodo.1227313
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.1227313
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Jake Lever; Jake Lever
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the data set associated with the publication: "A collaborative filtering based approach to biomedical knowledge discovery" published in Bioinformatics.

The data are sets of cooccurrences of biomedical terms extracted from published abstracts and full text articles. The cooccurrences are then represented in sparse matrix form. There are three different splits of this data denoted by the prefix number on the files.

1. All - All cooccurrences combined in a single file

2. Training/Validation - All cooccurrences in publications before 2010 in training, all novel cooccurrences in publication in 2010 go in validation

3. Training+Validation/Test - All cooccurrences in publication upto and including 2010 in training+validation. All novel cooccurrences after 2010 in year by year increments and also all combined together

Furthermore there are subset files which are used in some experiments to deal with the computational cost of evaluating the full set. The associated cuids.txt file containing a link between the row/column in the matrix with the UMLS Metathesaurus CUIDs. Hence the first row of cuids.txt matches up to the 0th row/column in the matrix. Note that the matrix is square and symmetric. This work was done with UMLS Metathesaurus 2016AB.
d
Discovering Precursors to Aviation Safety Incidents: KDD 2010
catalog.data.gov
s.cnmilf.com
+2more
Updated Apr 11, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Discovering Precursors to Aviation Safety Incidents: KDD 2010 [Dataset]. https://catalog.data.gov/dataset/discovering-precursors-to-aviation-safety-incidents-kdd-2010
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Modern aircraft are producing data at an unprecedented rate with hundreds of parameters being recorded on a second by second basis. The data can be used for studying the condition of the hardware systems of the aircraft and also for studying the complex interactions between the pilot and the aircraft. NASA is developing novel data mining algorithms to detect precursors to aviation safety incidents from these data sources. This talk will cover the theoretical aspects of the algorithms and practical aspects of implementing these techniques to study one of the most complex dynamical systems in the world: the national airspace.
i
KDDCup99
ieee-dataport.org
Updated Jan 12, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Santhosh B J (2025). KDDCup99 [Dataset]. https://ieee-dataport.org/documents/kddcup99
Explore at:
Dataset updated
Jan 12, 2025
Authors
Santhosh B J
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
called intrusions or attacks

Facebook

Twitter

Click to copy link

Link copied

Cite

E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1

Data from: Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science.

Explore at:

jpegAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.20011798.v1

Dataset updated

Jun 4, 2023

Dataset provided by

SciELO journals

Authors

E.M. Ruiz Lobaina; C. P. Romero Suárez

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Abstract The objective of this work is to improve the quality of the information that belongs to the database CubaCiencia, of the Institute of Scientific and Technological Information. This database has bibliographic information referring to four segments of science and is the main database of the Library Management System. The applied methodology was based on the Decision Trees, the Correlation Matrix, the 3D Scatter Plot, etc., which are techniques used by data mining, for the study of large volumes of information. The results achieved not only made it possible to improve the information in the database, but also provided truly useful patterns in the solution of the proposed objectives.

Clear search

Close search

Google apps

Main menu

Data from: Results obtained in a data mining process applied to a database...

Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...

kdd cyberattack

Context

Content

Acknowledgements

Inspiration

Table_1_The TargetMine Data Warehouse: Enhancement and Updates.xlsx

Replication Data for: \"Unraveling spatial, structural, and social...

Data from: KDD Cup 1999 Data

Data for:A Real-Time Social Network-Based Knowledge Discovery System for...

Data from: Towards open data blockchain analytics: a Bitcoin perspective

kddcup99

Additional file 1 of Learning from biomedical linked data to suggest valid...

Semantic Knowledge Discovery Software Report

Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands...

Imbalanced dataset for benchmarking

RefMED

Datasets from the KDD 2021 article "A Semi-Personalized System for User Cold...

Data from: A KNOWLEDGE DISCOVERY STRATEGY FOR RELATING SEA SURFACE...

Additional file 2 of Learning from biomedical linked data to suggest valid...

Data associated with "A collaborative filtering based approach to biomedical...

Discovering Precursors to Aviation Safety Incidents: KDD 2010

KDDCup99

Data from: Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science.