45 datasets found
  1. Data from: Results obtained in a data mining process applied to a database...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Cite
    E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO journals
    Authors
    E.M. Ruiz Lobaina; C. P. Romero Suárez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology applied data mining techniques for studying large volumes of information, including decision trees, the correlation matrix, and the 3D scatter plot. The results not only made it possible to improve the information in the database, but also revealed useful patterns for meeting the proposed objectives.

  2. Discovering Anomalous Aviation Safety Events Using Scalable Data Mining...

    • datadiscoverystudio.org
    • cloud.csiss.gmu.edu
    • +6more
    Updated Sep 8, 2014
    + more versions
    Cite
    (2014). Discovering Anomalous Aviation Safety Events Using Scalable Data Mining Algorithms [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/ee5b868c86f7498ab5e1473e8d908629/html
    Explore at:
    Dataset updated
    Sep 8, 2014
    Description

    The worldwide civilian aviation system is one of the most complex dynamical systems created. Most modern commercial aircraft have onboard flight data recorders that record several hundred discrete and continuous parameters at approximately 1Hz for the entire duration of the flight. These data contain information about the flight control systems, actuators, engines, landing gear, avionics, and pilot commands. In this paper, recent advances in the development of a novel knowledge discovery process consisting of a suite of data mining techniques for identifying precursors to aviation safety incidents are discussed. The data mining techniques include scalable multiple-kernel learning for large-scale distributed anomaly detection. A novel multivariate time-series search algorithm is used to search for signatures of discovered anomalies on massive datasets. The process can identify operationally significant events due to environmental, mechanical, and human factors issues in the high-dimensional flight operations quality assurance data. All discovered anomalies are validated by a team of independent domain experts. This novel automated knowledge discovery process is aimed at complementing the state-of-the-art human-generated exceedance-based analysis that fails to discover previously unknown aviation safety incidents. In this paper, the discovery pipeline, the methods used, and some of the significant anomalies detected on real-world commercial aviation data are discussed.

  3. kdd cyberattack

    • kaggle.com
    Updated Jul 28, 2018
    Cite
    Ziyad Mestour (2018). kdd cyberattack [Dataset]. https://www.kaggle.com/slashtea/kdd-cyberattack/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 28, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ziyad Mestour
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Context

    This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

    Content

    For more information about the contents, see http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

    Acknowledgements

    The dataset is shared on Kaggle on behalf of KDD's work.

    Inspiration

    Build a classifier capable of distinguishing between attacks and normal connections.

  4. Table_1_The TargetMine Data Warehouse: Enhancement and Updates.xlsx

    • frontiersin.figshare.com
    xlsx
    Updated Jun 1, 2023
    + more versions
    Cite
    Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi (2023). Table_1_The TargetMine Data Warehouse: Enhancement and Updates.xlsx [Dataset]. http://doi.org/10.3389/fgene.2019.00934.s004
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers
    Authors
    Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biological data analysis is the key to new discoveries in disease biology and drug discovery. The rapid proliferation of high-throughput ‘omics’ data has necessitated a need for tools and platforms that allow the researchers to combine and analyse different types of biological data and obtain biologically relevant knowledge. We had previously developed TargetMine, an integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery. Here, we describe the newly modelled biological data types and the enhanced visual and analytical features of TargetMine. These enhancements have included: an enhanced coverage of gene–gene relations, small molecule metabolite to pathway mappings, an improved literature survey feature, and in silico prediction of gene functional associations such as protein–protein interactions and global gene co-expression. We have also described two usage examples on trans-omics data analysis and extraction of gene-disease associations using MeSH term descriptors. These examples have demonstrated how the newer enhancements in TargetMine have contributed to a more expansive coverage of the biological data space and can help interpret genotype–phenotype relations. TargetMine with its auxiliary toolkit is available at https://targetmine.mizuguchilab.org. The TargetMine source code is available at https://github.com/chenyian-nibio/targetmine-gradle.

  5. Replication Data for: "Unraveling spatial, structural, and social...

    • search.dataone.org
    Updated Nov 9, 2023
    Cite
    PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo (2023). Replication Data for: "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS" [Dataset]. http://doi.org/10.7910/DVN/SFT3RT
    Explore at:
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo
    Description

    Data from the article "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS", by Agustin Pájaro, Ignacio J. Duran and Pablo Rodrigo, published in Revista DADOS, v. 65, n. 3, 2022.

  6. Data from: KDD Cup 1999 Data

    • impactcybertrust.org
    Updated Jan 19, 2019
    + more versions
    Cite
    External Data Source (2019). KDD Cup 1999 Data [Dataset]. http://doi.org/10.23721/100/1478801
    Explore at:
    Dataset updated
    Jan 19, 2019
    Authors
    External Data Source
    Description

    This is the data set used for the intrusion detector learning task in the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The intrusion detector learning task is to build a predictive model (i.e. a classifier) capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections.

    The 1998 DARPA Intrusion Detection Evaluation Program was prepared and managed by MIT Lincoln Labs. The objective was to survey and evaluate research in intrusion detection. A standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment, was provided. The 1999 KDD intrusion detection contest uses a version of this dataset.

    Lincoln Labs set up an environment to acquire nine weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. They operated the LAN as if it were a true Air Force environment, but peppered it with multiple attacks.

    The raw training data was about four gigabytes of compressed binary TCP dump data from seven weeks of network traffic. This was processed into about five million connection records. Similarly, the two weeks of test data yielded around two million connection records. Contact: gcounsel@ics.uci.edu
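
The learning task described above reduces each connection record to a binary target. A minimal sketch, assuming the comma-separated layout of the original KDD Cup 1999 files, in which the last field holds the label and normal traffic is marked `normal.` with a trailing period. The helper name `to_binary_label` and the shortened sample records are hypothetical and only for illustration:

```python
def to_binary_label(record: str) -> int:
    """Return 0 for a normal connection and 1 for any attack type."""
    # Assumption: the label is the last comma-separated field of the record.
    label = record.strip().rsplit(",", 1)[-1]
    # Assumption: raw labels carry a trailing period, e.g. "normal." or "smurf.".
    return 0 if label.rstrip(".") == "normal" else 1


# Hypothetical, shortened records; real records carry many more feature fields.
sample_records = [
    "0,tcp,http,SF,181,5450,normal.",
    "0,icmp,ecr_i,SF,1032,0,smurf.",
]
labels = [to_binary_label(r) for r in sample_records]
print(labels)
```

Collapsing every attack type into one class matches the "bad" versus "good" framing of the task; a multi-class setup would keep the raw label instead.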

  7. Data for: A Real-Time Social Network-Based Knowledge Discovery System for...

    • data.mendeley.com
    Updated Feb 18, 2018
    Cite
    Asim Sinan Yuksel (2018). Data for: A Real-Time Social Network-Based Knowledge Discovery System for Decision Making [Dataset]. http://doi.org/10.17632/29tbvvwkdp.1
    Explore at:
    Dataset updated
    Feb 18, 2018
    Authors
    Asim Sinan Yuksel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    1. Turkish comments for 128 venues in the Foursquare social network platform (binary and ternary classified)
    2. Turkish adjectives and polarities
    3. Turkish food and drink names
    4. All comments without tagging
    5. Venues, liked meals/foods

  8. Data from: Towards open data blockchain analytics: a Bitcoin perspective

    • search.dataone.org
    • datadryad.org
    • +1more
    Updated Jun 12, 2025
    Cite
    Dan McGinn; Douglas McIlwraith; Yike Guo (2025). Towards open data blockchain analytics: a Bitcoin perspective [Dataset]. http://doi.org/10.5061/dryad.h9r0p65
    Explore at:
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Dan McGinn; Douglas McIlwraith; Yike Guo
    Time period covered
    Jul 9, 2018
    Description

    Bitcoin is the first implementation of a technology that has become known as a 'public permissionless' blockchain. Such systems allow public read/write access to an append-only blockchain database without the need for any mediating central authority. Instead they guarantee access, security and protocol conformity through an elegant combination of cryptographic assurances and game theoretic economic incentives. Not until the advent of the Bitcoin blockchain has such a trusted, transparent, comprehensive and granular data set of digital economic behaviours been available for public network analysis. In this article, by translating the cumbersome binary data structure of the Bitcoin blockchain into a high fidelity graph model, we demonstrate through various analyses the often overlooked social and econometric benefits of employing such a novel open data architecture. Specifically we show (a) how repeated patterns of transaction behaviours can be revealed to link user activity across t...

  9. kddcup99

    • tensorflow.org
    Updated Jan 4, 2023
    Cite
    (2023). kddcup99 [Dataset]. https://www.tensorflow.org/datasets/catalog/kddcup99
    Explore at:
    Dataset updated
    Jan 4, 2023
    Description

    This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('kddcup99', split='train')
    for ex in ds.take(4):
        print(ex)


    See the guide for more information on tensorflow_datasets.

  10. Additional file 1 of Learning from biomedical linked data to suggest valid...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet (2023). Additional file 1 of Learning from biomedical linked data to suggest valid pharmacogenes [Dataset]. http://doi.org/10.6084/m9.figshare.c.3747806_D1.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPARQL query example 1. This text file contains the SPARQL query we apply on our PGx linked data to obtain the data graph represented in Fig. 3. This query includes the definition of prefixes mentioned in Figs. 2 and 3. This query takes about 30 s on our https://pgxlod.loria.fr server. (TXT 2 kb)

  11. Semantic Knowledge Discovery Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated May 29, 2025
    Cite
    Data Insights Market (2025). Semantic Knowledge Discovery Software Report [Dataset]. https://www.datainsightsmarket.com/reports/semantic-knowledge-discovery-software-1949491
    Explore at:
    Available download formats: pdf, ppt, doc
    Dataset updated
    May 29, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Semantic Knowledge Discovery Software market is experiencing robust growth, driven by the increasing need for organizations to extract actionable insights from complex and unstructured data. The market, estimated at $2 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated $6 billion by 2033. This growth is fueled by several key factors. The rising adoption of artificial intelligence (AI) and machine learning (ML) technologies across various industries is enabling more sophisticated semantic analysis, leading to improved decision-making. Furthermore, the proliferation of big data, coupled with the limitations of traditional data analysis methods, is driving the demand for solutions that can effectively uncover hidden patterns and relationships within vast datasets. The growing emphasis on data-driven decision-making across sectors like healthcare, finance, and research and development is also contributing significantly to market expansion.

    Major restraints to market growth include the high initial investment costs associated with implementing semantic knowledge discovery software, the complexity of integrating these solutions with existing IT infrastructure, and the scarcity of skilled professionals capable of managing and interpreting the results generated by these systems. However, these challenges are being addressed through the development of more user-friendly software, cloud-based deployment models that reduce upfront costs, and increased training and education programs focused on semantic technology. The market is segmented by deployment mode (cloud, on-premise), industry (healthcare, finance, manufacturing, etc.), and functionality (data integration, knowledge graph construction, semantic search).

    Key players like Expert System SpA, ChemAxon, Collexis (Elsevier), MAANA, OntoText, Cambridge Semantics, and Nervana (Intel) are actively shaping the market landscape through innovation and strategic partnerships. The North American market currently holds a significant share, but regions like Asia-Pacific are expected to witness rapid growth in the coming years.
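
The projection quoted in this description can be checked with the standard compound-growth formula, future = present × (1 + rate)^years. A minimal sketch that uses only the figures given in the description ($2 billion in 2025, 15% CAGR, 2025 to 2033); the function name `project_value` is illustrative:

```python
def project_value(present: float, rate: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth rate."""
    return present * (1.0 + rate) ** years


# Figures from the description: $2 billion in 2025, 15% CAGR, 2025 to 2033.
projected_2033 = project_value(2.0, 0.15, 2033 - 2025)
print(f"Projected 2033 market size: ${projected_2033:.2f} billion")
```

Eight years of 15% growth give roughly $6.1 billion, which is consistent with the rounded $6 billion figure in the description.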

  12. Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands...

    • data.mendeley.com
    Updated Feb 10, 2021
    + more versions
    Cite
    Minxing Si (2021). Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands Database for Knowledge Discovery and Carbon Cost Analysis [Dataset]. http://doi.org/10.17632/8ngkgz69zb.4
    Explore at:
    Dataset updated
    Feb 10, 2021
    Authors
    Minxing Si
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Canada
    Description

    A better understanding of greenhouse gas (GHG) emissions resulting from oil sands (bitumen) extraction can help to meet global oil demands, identify potential mitigation measures, and design effective carbon policies. While several studies have attempted to model GHG emissions from oil sands extractions, these studies have encountered data availability challenges, particularly with respect to actual fuel use data, and have thus struggled to accurately quantify GHG emissions. This dataset contains actual operational data from 20 in-situ oil sands operations, including information for fuel gas, flare gas, vented gas, production, steam injection, gas injection, condensate injection, and C3 injection.

  13. Imbalanced dataset for benchmarking

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 24, 2020
    Cite
    Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira (2020). Imbalanced dataset for benchmarking [Dataset]. http://doi.org/10.5281/zenodo.61452
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas; Dayvid V. R. Oliveira
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Imbalanced dataset for benchmarking
    =======================

    The different algorithms of the `imbalanced-learn` toolbox are evaluated on a set of common datasets, which are more or less balanced. This benchmark was proposed in [1]. The following section presents its main characteristics.

    Characteristics
    -------------------

    | ID | Name           | Repository & Target           | Ratio | # samples | # features |
    |:--:|:--------------:|-------------------------------|:-----:|:---------:|:----------:|
    | 1  | Ecoli          | UCI, target: imU              | 8.6:1 | 336       | 7          |
    | 2  | Optical Digits | UCI, target: 8                | 9.1:1 | 5,620     | 64         |
    | 3  | SatImage       | UCI, target: 4                | 9.3:1 | 6,435     | 36         |
    | 4  | Pen Digits     | UCI, target: 5                | 9.4:1 | 10,992    | 16         |
    | 5  | Abalone        | UCI, target: 7                | 9.7:1 | 4,177     | 8          |
    | 6  | Sick Euthyroid | UCI, target: sick euthyroid   | 9.8:1 | 3,163     | 25         |
    | 7  | Spectrometer   | UCI, target: >=44             | 11:1  | 531       | 93         |
    | 8  | Car_Eval_34    | UCI, target: good, v good     | 12:1  | 1,728     | 6          |
    | 9  | ISOLET         | UCI, target: A, B             | 12:1  | 7,797     | 617        |
    | 10 | US Crime       | UCI, target: >0.65            | 12:1  | 1,994     | 122        |
    | 11 | Yeast_ML8      | LIBSVM, target: 8             | 13:1  | 2,417     | 103        |
    | 12 | Scene          | LIBSVM, target: >one label    | 13:1  | 2,407     | 294        |
    | 13 | Libras Move    | UCI, target: 1                | 14:1  | 360       | 90         |
    | 14 | Thyroid Sick   | UCI, target: sick             | 15:1  | 3,772     | 28         |
    | 15 | Coil_2000      | KDD, CoIL, target: minority   | 16:1  | 9,822     | 85         |
    | 16 | Arrhythmia     | UCI, target: 06               | 17:1  | 452       | 279        |
    | 17 | Solar Flare M0 | UCI, target: M->0             | 19:1  | 1,389     | 10         |
    | 18 | OIL            | UCI, target: minority         | 22:1  | 937       | 49         |
    | 19 | Car_Eval_4     | UCI, target: vgood            | 26:1  | 1,728     | 6          |
    | 20 | Wine Quality   | UCI, wine, target: <=4        | 26:1  | 4,898     | 11         |
    | 21 | Letter Img     | UCI, target: Z                | 26:1  | 20,000    | 16         |
    | 22 | Yeast_ME2      | UCI, target: ME2              | 28:1  | 1,484     | 8          |
    | 23 | Webpage        | LIBSVM, w7a, target: minority | 33:1  | 49,749    | 300        |
    | 24 | Ozone Level    | UCI, ozone, data              | 34:1  | 2,536     | 72         |
    | 25 | Mammography    | UCI, target: minority         | 42:1  | 11,183    | 6          |
    | 26 | Protein homo.  | KDD CUP 2004, minority        | 111:1 | 145,751   | 74         |
    | 27 | Abalone_19     | UCI, target: 19               | 130:1 | 4,177     | 8          |
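
The Ratio column is read here as the number of majority-class samples per minority-class sample; that reading is an assumption, since the description does not define the column. A minimal sketch of the computation for a binary target, with an illustrative function name and toy labels that are not part of the benchmark:

```python
from collections import Counter


def imbalance_ratio(labels) -> float:
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute a ratio")
    return max(counts.values()) / min(counts.values())


# Toy labels: 43 majority samples (0) and 5 minority samples (1).
toy_labels = [0] * 43 + [1] * 5
ratio = imbalance_ratio(toy_labels)
print(f"{ratio:.1f}:1")
```

For the toy labels this prints `8.6:1`, the same form as the table entries.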

    References
    ----------
    [1] Ding, Zejin, "Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics." Dissertation, Georgia State University, (2011).

    [2] Blake, Catherine, and Christopher J. Merz. "UCI Repository of machine learning databases." (1998).

    [3] Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology (TIST) 2.3 (2011): 27.

    [4] Caruana, Rich, Thorsten Joachims, and Lars Backstrom. "KDD-Cup 2004: results and analysis." ACM SIGKDD Explorations Newsletter 6.2 (2004): 95-108.

  14. RefMED

    • neuinfo.org
    • dknet.org
    Updated Jun 24, 2025
    Cite
    (2025). RefMED [Dataset]. http://identifiers.org/RRID:SCR_011871
    Explore at:
    Dataset updated
    Jun 24, 2025
    Description

    Relevance Feedback Search Engine for PubMed. When a user enters a keyword in the search box, the PubMed search results are returned. The user then rates a sample of the results by how relevant they are to what she intends to find, for example by marking each article as highly relevant, somewhat relevant, or not relevant. Once the user clicks the Push Feedback button, the system learns a relevance function from the feedback and returns the articles ranked highest by that function. The user can repeat the process until she gets satisfying results.

  15. Datasets from the KDD 2021 article "A Semi-Personalized System for User Cold...

    • data.niaid.nih.gov
    Updated Jul 23, 2021
    Cite
    Walid Bendada (2021). Datasets from the KDD 2021 article "A Semi-Personalized System for User Cold Start Recommendation on Music Streaming Apps" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5121673
    Explore at:
    Dataset updated
    Jul 23, 2021
    Dataset provided by
    Walid Bendada
    Guillaume Salha-Galvan
    Léa Briand
    Viet-Anh Tran
    Mathieu Morlon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We publicly release the anonymized song_embeddings.parquet, user_embeddings.parquet, user_features_test.parquet, user_features_train.parquet, and user_features_validation.parquet datasets, with each of the TT-SVD or UT-ALS versions of embeddings, from the music streaming platform Deezer, as described in the article "A Semi-Personalized System for User Cold Start Recommendation on Music Streaming Apps" published in the proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021). The paper is available here.

    These datasets are used in the GitHub repository deezer/semi_perso_user_cold_start to reproduce experiments from the article.

    Please cite our paper if you use our code or data in your work.

  16. Data from: A KNOWLEDGE DISCOVERY STRATEGY FOR RELATING SEA SURFACE...

    • s.cnmilf.com
    • datasets.ai
    • +4more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). A KNOWLEDGE DISCOVERY STRATEGY FOR RELATING SEA SURFACE TEMPERATURES TO FREQUENCIES OF TROPICAL STORMS AND GENERATING PREDICTIONS OF HURRICANES UNDER 21ST-CENTURY GLOBAL WARMING SCENARIOS [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/a-knowledge-discovery-strategy-for-relating-sea-surface-temperatures-to-frequencies-of-tro
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    A KNOWLEDGE DISCOVERY STRATEGY FOR RELATING SEA SURFACE TEMPERATURES TO FREQUENCIES OF TROPICAL STORMS AND GENERATING PREDICTIONS OF HURRICANES UNDER 21ST-CENTURY GLOBAL WARMING SCENARIOS

    CAITLIN RACE, MICHAEL STEINBACH, AUROOP GANGULY, FRED SEMAZZI, AND VIPIN KUMAR

    Abstract. The connections among greenhouse-gas emissions scenarios, global warming, and frequencies of hurricanes or tropical cyclones are among the least understood in climate science but among the most fiercely debated in the context of adaptation decisions or mitigation policies. Here we show that a knowledge discovery strategy, which leverages observations and climate model simulations, offers the promise of developing credible projections of tropical cyclones based on sea surface temperatures (SST) in a warming environment. While this study motivates the development of new methodologies in statistics and data mining, the ability to solve challenging climate science problems with innovative combinations of traditional and state-of-the-art methods is demonstrated. Here we develop new insights, albeit in a proof-of-concept sense, on the relationship between sea surface temperatures and hurricane frequencies, and generate the most likely projections with uncertainty bounds for storm counts in the 21st-century warming environment based in turn on the Intergovernmental Panel on Climate Change Special Report on Emissions Scenarios. Our preliminary insights point to the benefits that can be achieved for climate science and impacts analysis, as well as adaptation and mitigation policies, by a solution strategy that remains tailored to the climate domain and complements physics-based climate model simulations with a combination of existing and new computational and data science approaches.

  17. Additional file 2 of Learning from biomedical linked data to suggest valid...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet (2023). Additional file 2 of Learning from biomedical linked data to suggest valid pharmacogenes [Dataset]. http://doi.org/10.6084/m9.figshare.c.3747806_D2.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPARQL query example 2. This text file contains an example SPARQL query that explores the vicinity of an entity. This particular query returns the RDF graph surrounding, within a path length of 4, the node pharmgkb:PA451906, which represents warfarin, an anticoagulant drug. (TXT 392 bytes)
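
The idea of exploring the vicinity of a node can be illustrated outside SPARQL as a bounded breadth-first search. This is a minimal sketch in Python, not the query contained in the file; it assumes the graph is available as (subject, predicate, object) triples and follows edges in both directions. The function name and the toy triples are hypothetical:

```python
from collections import deque


def neighbourhood(triples, start, max_hops=4):
    """Return the triples reachable from `start` within `max_hops` edges."""
    # Index every triple under both endpoints so edges can be walked either way.
    by_node = {}
    for triple in triples:
        s, _, o = triple
        by_node.setdefault(s, []).append(triple)
        by_node.setdefault(o, []).append(triple)

    seen_nodes = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # nodes at the hop limit are not expanded further
        for triple in by_node.get(node, []):
            found.add(triple)
            s, _, o = triple
            other = o if s == node else s
            if other not in seen_nodes:
                seen_nodes.add(other)
                queue.append((other, depth + 1))
    return found


# Hypothetical chain n0 -> n1 -> ... -> n5 with a placeholder predicate "p".
chain = [(f"n{i}", "p", f"n{i + 1}") for i in range(5)]
result = neighbourhood(chain, "n0", max_hops=4)
print(len(result))
```

On the toy chain the search keeps the four edges nearest to `n0` and leaves out the fifth, which lies beyond the hop limit.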

  18. Data associated with "A collaborative filtering based approach to biomedical...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip
    Updated Jan 24, 2020
    Cite
    Jake Lever (2020). Data associated with "A collaborative filtering based approach to biomedical knowledge discovery" [Dataset]. http://doi.org/10.5281/zenodo.1227313
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Jake Lever
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the data set associated with the publication: "A collaborative filtering based approach to biomedical knowledge discovery" published in Bioinformatics.

    The data are sets of cooccurrences of biomedical terms extracted from published abstracts and full text articles. The cooccurrences are then represented in sparse matrix form. There are three different splits of this data denoted by the prefix number on the files.

    1. All - All cooccurrences combined in a single file

    2. Training/Validation - All cooccurrences in publications before 2010 go in training; all novel cooccurrences in publications in 2010 go in validation

    3. Training+Validation/Test - All cooccurrences in publications up to and including 2010 go in training+validation. All novel cooccurrences after 2010 are provided in year-by-year increments and also all combined together

    Furthermore, there are subset files, which are used in some experiments to deal with the computational cost of evaluating the full set. The associated cuids.txt file contains a link between each row/column in the matrix and the UMLS Metathesaurus CUIDs: the first row of cuids.txt matches the 0th row/column in the matrix. Note that the matrix is square and symmetric. This work was done with UMLS Metathesaurus 2016AB.
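
The layout described above (a square, symmetric sparse matrix whose row and column i correspond to line i of cuids.txt, counting from 0) can be sketched with plain dictionaries. The CUID strings, counts, and helper names below are made up for illustration, and the real on-disk format of the matrix files is not assumed here:

```python
def build_symmetric_counts(entries):
    """Store each co-occurrence count once per unordered index pair."""
    counts = {}
    for i, j, n in entries:
        # Symmetric matrix: (i, j) and (j, i) share one key.
        counts[(min(i, j), max(i, j))] = n
    return counts


def lookup(counts, cuids, cuid_a, cuid_b):
    """Return the co-occurrence count for two CUIDs (0 if never seen together)."""
    # Line 0 of cuids.txt maps to row/column 0 of the matrix, and so on.
    index = {cuid: row for row, cuid in enumerate(cuids)}
    i, j = index[cuid_a], index[cuid_b]
    return counts.get((min(i, j), max(i, j)), 0)


# Hypothetical contents of cuids.txt and two (row, column, count) entries.
cuids = ["C0000001", "C0000002", "C0000003"]
counts = build_symmetric_counts([(0, 1, 5), (2, 1, 1)])
print(lookup(counts, cuids, "C0000002", "C0000001"))
```

Keeping only one key per unordered pair halves the storage while still answering lookups in either order, which is what symmetry allows.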

  19. Discovering Precursors to Aviation Safety Incidents: KDD 2010

    • catalog.data.gov
    • s.cnmilf.com
    • +2more
    Updated Apr 11, 2025
    Cite
    Dashlink (2025). Discovering Precursors to Aviation Safety Incidents: KDD 2010 [Dataset]. https://catalog.data.gov/dataset/discovering-precursors-to-aviation-safety-incidents-kdd-2010
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Modern aircraft are producing data at an unprecedented rate with hundreds of parameters being recorded on a second by second basis. The data can be used for studying the condition of the hardware systems of the aircraft and also for studying the complex interactions between the pilot and the aircraft. NASA is developing novel data mining algorithms to detect precursors to aviation safety incidents from these data sources. This talk will cover the theoretical aspects of the algorithms and practical aspects of implementing these techniques to study one of the most complex dynamical systems in the world: the national airspace.

  20. KDDCup99

    • ieee-dataport.org
    Updated Jan 12, 2025
    Cite
    Santhosh B J (2025). KDDCup99 [Dataset]. https://ieee-dataport.org/documents/kddcup99
    Explore at:
    Dataset updated
    Jan 12, 2025
    Authors
    Santhosh B J
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    called intrusions or attacks
