60 datasets found
  1. Data from: Results obtained in a data mining process applied to a database...

    • scielo.figshare.com
    jpeg
    Updated Jun 4, 2023
    Cite
    E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    SciELO journals
    Authors
    E.M. Ruiz Lobaina; C. P. Romero Suárez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information on four segments of science and is the main database of the Library Management System. The methodology was based on decision trees, the correlation matrix, the 3D scatter plot, and other data mining techniques used to study large volumes of information. The results not only made it possible to improve the information in the database, but also provided truly useful patterns for meeting the proposed objectives.

  2. Replication Data for: "Unraveling spatial, structural, and social...

    • search.dataone.org
    Updated Nov 9, 2023
    Cite
    PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo (2023). Replication Data for: "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS" [Dataset]. http://doi.org/10.7910/DVN/SFT3RT
    Explore at:
    Dataset updated
    Nov 9, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    PÁJARO, Agustin; DURAN, Ignacio J.; RODRIGO, Pablo
    Description

    Data from the article "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS", by Agustin Pájaro, Ignacio J. Duran and Pablo Rodrigo, published in Revista DADOS, v. 65, n. 3, 2022.

  3. Additional file 1 of Learning from biomedical linked data to suggest valid...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Cite
    Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet (2023). Additional file 1 of Learning from biomedical linked data to suggest valid pharmacogenes [Dataset]. http://doi.org/10.6084/m9.figshare.c.3747806_D1.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPARQL query example 1. This text file contains the SPARQL query we apply on our PGx linked data to obtain the data graph represented in Fig. 3. This query includes the definition of prefixes mentioned in Figs. 2 and 3. This query takes about 30 s on our https://pgxlod.loria.fr server. (TXT 2 kb)

  4. Data from: Towards open data blockchain analytics: a Bitcoin perspective

    • search.dataone.org
    • datadryad.org
    • +1more
    Updated Jun 12, 2025
    Cite
    Dan McGinn; Douglas McIlwraith; Yike Guo (2025). Towards open data blockchain analytics: a Bitcoin perspective [Dataset]. http://doi.org/10.5061/dryad.h9r0p65
    Explore at:
    Dataset updated
    Jun 12, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Dan McGinn; Douglas McIlwraith; Yike Guo
    Time period covered
    Jul 9, 2018
    Description

    Bitcoin is the first implementation of a technology that has become known as a 'public permissionless' blockchain. Such systems allow public read/write access to an append-only blockchain database without the need for any mediating central authority. Instead they guarantee access, security and protocol conformity through an elegant combination of cryptographic assurances and game theoretic economic incentives. Not until the advent of the Bitcoin blockchain has such a trusted, transparent, comprehensive and granular data set of digital economic behaviours been available for public network analysis. In this article, by translating the cumbersome binary data structure of the Bitcoin blockchain into a high fidelity graph model, we demonstrate through various analyses the often overlooked social and econometric benefits of employing such a novel open data architecture. Specifically we show (a) how repeated patterns of transaction behaviours can be revealed to link user activity across t...

  5. Additional file 2 of Learning from biomedical linked data to suggest valid...

    • springernature.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet (2023). Additional file 2 of Learning from biomedical linked data to suggest valid pharmacogenes [Dataset]. http://doi.org/10.6084/m9.figshare.c.3747806_D2.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Kevin Dalleau; Yassine Marzougui; Sébastien Da Silva; Patrice Ringot; Ndeye Coumba Ndiaye; Adrien Coulet
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SPARQL query example 2. This text file contains an example of a SPARQL query that makes it possible to explore the vicinity of an entity. This particular query returns the RDF graph surrounding, within a length of 4, the node pharmgkb:PA451906, which represents warfarin, an anticoagulant drug. (TXT 392 bytes)
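The supplementary query file itself is not reproduced in this listing, so the following is only a minimal sketch of how a neighbourhood query of this kind could be assembled. It assumes that "within a length of 4" means up to four outgoing hops from the start node, and the helper name `neighbourhood_query` is hypothetical. The `pharmgkb:` prefix declaration is deliberately left out because its IRI is not given here; it would need to be prepended before running the query against an endpoint.

```python
def neighbourhood_query(node: str, depth: int) -> str:
    """Build a SPARQL SELECT that follows up to `depth` outgoing hops from `node`.

    Hypothetical helper for illustration; the authors' actual query may differ
    (for example, it may also follow incoming edges or use CONSTRUCT).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")

    body = ""
    # Build the innermost hop first, then wrap each outer hop around it.
    for hop in range(depth, 0, -1):
        subject = node if hop == 1 else f"?o{hop - 1}"
        triple = f"{subject} ?p{hop} ?o{hop} ."
        if body:
            # Deeper hops are optional so that shorter paths are still returned.
            triple += f" OPTIONAL {{ {body} }}"
        body = triple

    variables = " ".join(f"?p{h} ?o{h}" for h in range(1, depth + 1))
    return f"SELECT {variables} WHERE {{ {body} }}"


query = neighbourhood_query("pharmgkb:PA451906", 4)
print(query)
```

Nesting the `OPTIONAL` clauses keeps one-hop and two-hop neighbours in the result even when no longer path exists from them.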

  6. Data from: Knowledge Discovery in Biological Databases for Revealing...

    • ckan.grassroots.tools
    html, pdf
    Updated Aug 7, 2019
    Cite
    Rothamsted Research (2019). Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes [Dataset]. https://ckan.grassroots.tools/dataset/bf47bbcd-d26b-40a1-a86b-144f37570967
    Explore at:
    Available download formats: html, pdf
    Dataset updated
    Aug 7, 2019
    Dataset provided by
    Rothamsted Research
    License

    Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
    License information was derived automatically

    Description

    Abstract: Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.

  7. Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands...

    • data.mendeley.com
    Updated Feb 10, 2021
    + more versions
    Cite
    Minxing Si (2021). Data Mining and Unsupervised Machine Learning in Canadian In Situ Oil Sands Database for Knowledge Discovery and Carbon Cost Analysis [Dataset]. http://doi.org/10.17632/8ngkgz69zb.4
    Explore at:
    Dataset updated
    Feb 10, 2021
    Authors
    Minxing Si
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Canada
    Description

    A better understanding of greenhouse gas (GHG) emissions resulting from oil sands (bitumen) extraction can help to meet global oil demands, identify potential mitigation measures, and design effective carbon policies. While several studies have attempted to model GHG emissions from oil sands extractions, these studies have encountered data availability challenges, particularly with respect to actual fuel use data, and have thus struggled to accurately quantify GHG emissions. This dataset contains actual operational data from 20 in-situ oil sands operations, including information for fuel gas, flare gas, vented gas, production, steam injection, gas injection, condensate injection, and C3 injection.

  8. kdd cyberattack

    • kaggle.com
    Updated Jul 28, 2018
    Cite
    Ziyad Mestour (2018). kdd cyberattack [Dataset]. https://www.kaggle.com/slashtea/kdd-cyberattack/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 28, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ziyad Mestour
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Context

    This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

    Content

    For more information about the contents refer to this link http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

    Acknowledgements

    The dataset is shared on Kaggle on behalf of KDD's work.

    Inspiration

    Build a classifier capable of distinguishing between attacks and normal connections.
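A first step toward the binary task described above is collapsing the per-connection labels into two classes. The sketch below assumes labels arrive as strings or bytes such as `normal.` or `smurf.` (name plus trailing period), as in the raw KDD Cup 99 files; the helper name `to_binary_label` and the sample labels are illustrative, not taken from this Kaggle upload.

```python
from collections import Counter


def to_binary_label(raw_label) -> str:
    """Map a KDD Cup 99 connection label to 'normal' or 'attack'.

    Assumes labels look like 'normal.' or 'smurf.' (str or bytes).
    """
    if isinstance(raw_label, bytes):
        raw_label = raw_label.decode("utf-8")
    name = raw_label.strip().rstrip(".").lower()
    return "normal" if name == "normal" else "attack"


# Illustrative labels only; a real run would read them from the dataset file.
sample_labels = ["normal.", "smurf.", b"neptune.", "normal.", "back."]
binary_labels = [to_binary_label(label) for label in sample_labels]
class_counts = Counter(binary_labels)
print(class_counts)
```

Checking `class_counts` before training is worthwhile, since a strongly imbalanced split affects how accuracy figures should be read.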

  9. Data from: Evaluation of classification techniques for identifying fake...

    • scielo.figshare.com
    jpeg
    Updated May 30, 2023
    Cite
    Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda (2023). Evaluation of classification techniques for identifying fake reviews about products and services on the internet [Dataset]. http://doi.org/10.6084/m9.figshare.14283143.v1
    Explore at:
    Available download formats: jpeg
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO journals
    Authors
    Andrey Schmidt dos Santos; Luis Felipe Riehs Camargo; Daniel Pacheco Lacerda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: With the growth of e-commerce, more people are buying products over the internet. To increase customer satisfaction, merchants provide spaces for product and service reviews. Products with positive reviews attract customers, while products with negative reviews lose customers. Following this idea, some individuals and corporations write fake reviews to promote their products and services or defame their competitors. The difficulty in finding these reviews lies in the large amount of information available. One solution is to use data mining techniques and tools, such as the classification function. Exploring this situation, the present work evaluates classification techniques to identify fake reviews about products and services on the Internet. The research also presents a systematic literature review on fake reviews. The research used 8 classification algorithms. The algorithms were trained and tested with a hotel database. The CONCENSO algorithm presented the best result, with 88% in the precision indicator. After the first test, the algorithms classified reviews from another hotel database. To compare the results of this new classification, the Review Skeptic algorithm was used. The SVM and GLMNET algorithms presented the highest convergence with the Review Skeptic algorithm, classifying 83% of reviews with the same result. The research contributes by demonstrating the algorithms' ability to understand consumers’ real reviews of products and services on the Internet. Another contribution is that it pioneers the investigation of fake reviews in Brazil and in production engineering.

  10. KDD Cup 1999 Dataset

    • paperswithcode.com
    • opendatalab.com
    + more versions
    Cite
    KDD Cup 1999 Dataset [Dataset]. https://paperswithcode.com/dataset/kdd-cup-1999-data-data-set
    Explore at:
    Description

    This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

  11. Applying Knowledge Discovery in Databases to Canadian In Situ Oil Sands...

    • data.mendeley.com
    Updated Dec 21, 2019
    Cite
    Minxing Si (2019). Applying Knowledge Discovery in Databases to Canadian In Situ Oil Sands Extraction [Dataset]. http://doi.org/10.17632/8ngkgz69zb.1
    Explore at:
    Dataset updated
    Dec 21, 2019
    Authors
    Minxing Si
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Canada
    Description

    Alberta’s oil sands play a critical role in Canada meeting its commitment to the Paris Climate Change Agreement. However, few studies have published actual operation data for extraction operations (schemes), especially the fuel consumption data needed to accurately project greenhouse gas (GHG) emissions for the development and expansion of oil sands projects. In this study, we mined 2015–2018 operation data from over 29 million records in Petrinex via the knowledge discovery in databases (KDD) process, and described GHG and fuel consumption patterns for 20 in situ oil sands extraction schemes (representing > 80% of in situ extractions in 2018). The discovered patterns were interpreted using a range of performance indicators. From 2015 to 2018, GHG emission intensity (EI) for the schemes dropped by 7.5%, from 0.6193 t CO₂e/m³ bitumen (oil) to 0.5732 t CO₂e/m³ bitumen. On the four-year average, the in situ oil sands extractions used 3.8632 m³ of steam to produce 1 m³ of oil (3.8632 m³ steam/m³ oil), with a range of 1.8170 to 7.0628 m³ steam/m³ oil; consumed 0.0668 10³ m³ of steam generator fuel (SGF) to produce 1 m³ of steam (0.0668 10³ m³ SGF/m³ steam), with a range of 0.0288 to 0.0910 10³ m³ SGF/m³ steam; and consumed 0.2995 10³ m³ of stationary combustion fuel (SCF) to produce 1 m³ of bitumen (0.2955 10³ m³ SCF/m³ bitumen), with a range of 0.1224 to 0.6176 10³ m³ SCF/m³ bitumen. The Peace River region had the highest solution gas oil ratio. The region produced 0.0819 10³ m³ of solution gas from 1 m³ of bitumen produced (0.0819 10³ m³ solution gas/m³ bitumen). On average, the cyclic steam stimulation recovery method used 53.5% more steam to produce 1 m³ of bitumen and 11.1% more SGF to produce 1 m³ of steam, compared to the steam assisted gravity drainage recovery method. With the carbon price at C$30/t CO₂e and the Western Canadian Select (WCS) crude oil price at US$38.46/bbl, the GHG costs account for 0.33% to 8.81% of the WCS crude price using Alberta’s emission benchmark.

    The study provides methods to mine the public Petrinex database for studying GHG emissions, energy, and water consumption by the oil and gas industry in Canada. The results also provide more accurate energy and emission intensities, which can be used for GHG life cycle assessment and compared with other energy extraction methods on a life cycle basis.
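The reported drop in emission intensity can be checked with simple arithmetic. The sketch below uses only the two intensity values quoted in the description; the function name `percent_change` is illustrative. With these inputs the relative change comes out at roughly 7.4%, close to the 7.5% stated, and the small gap is presumably due to rounding in the published figures.

```python
def percent_change(start: float, end: float) -> float:
    """Relative change from `start` to `end`, in percent (negative means a drop)."""
    return (end - start) / start * 100.0


# Emission intensities quoted in the description, in t CO2e per m3 of bitumen.
ei_2015 = 0.6193
ei_2018 = 0.5732

ei_change = percent_change(ei_2015, ei_2018)
print(f"EI change 2015 to 2018: {ei_change:.2f}%")
```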

  12. kddcup99

    • tensorflow.org
    Updated Jan 4, 2023
    Cite
    (2023). kddcup99 [Dataset]. https://www.tensorflow.org/datasets/catalog/kddcup99
    Explore at:
    Dataset updated
    Jan 4, 2023
    Description

    This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99 The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split and print the first four examples.
    ds = tfds.load('kddcup99', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  13. International Journal of Engineering and Advanced Technology FAQ -...

    • researchhelpdesk.org
    Updated May 28, 2022
    + more versions
    Cite
    Research Help Desk (2022). International Journal of Engineering and Advanced Technology FAQ - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/faq/552/international-journal-of-engineering-and-advanced-technology
    Explore at:
    Dataset updated
    May 28, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Engineering and Advanced Technology FAQ - ResearchHelpDesk - The International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It aims to publish original, theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. All submitted papers will be reviewed by the board of committee of IJEAT. Aims of the IJEAT journal: disseminate original, scientific, theoretical or applied research in the field of Engineering and allied fields; provide a platform for publishing results and research with a strong empirical component; bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research; seek original and unpublished research papers based on theoretical or experimental works for publication globally; publish original, theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences; provide a platform for publishing results and research with a strong empirical component; create a bridge across the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research;
solicit original and unpublished research papers, based on theoretical or experimental works. Scope of IJEAT International Journal of Engineering and Advanced Technology (IJEAT) covers all topics of all engineering branches. Some of them are Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. The main topic includes but not limited to: 1. Smart Computing and Information Processing Signal and Speech Processing Image Processing and Pattern Recognition WSN Artificial Intelligence and machine learning Data mining and warehousing Data Analytics Deep learning Bioinformatics High Performance computing Advanced Computer networking Cloud Computing IoT Parallel Computing on GPU Human Computer Interactions 2. Recent Trends in Microelectronics and VLSI Design Process & Device Technologies Low-power design Nanometer-scale integrated circuits Application specific ICs (ASICs) FPGAs Nanotechnology Nano electronics and Quantum Computing 3. Challenges of Industry and their Solutions, Communications Advanced Manufacturing Technologies Artificial Intelligence Autonomous Robots Augmented Reality Big Data Analytics and Business Intelligence Cyber Physical Systems (CPS) Digital Clone or Simulation Industrial Internet of Things (IIoT) Manufacturing IOT Plant Cyber security Smart Solutions – Wearable Sensors and Smart Glasses System Integration Small Batch Manufacturing Visual Analytics Virtual Reality 3D Printing 4. 
Internet of Things (IoT) Internet of Things (IoT) & IoE & Edge Computing Distributed Mobile Applications Utilizing IoT Security, Privacy and Trust in IoT & IoE Standards for IoT Applications Ubiquitous Computing Block Chain-enabled IoT Device and Data Security and Privacy Application of WSN in IoT Cloud Resources Utilization in IoT Wireless Access Technologies for IoT Mobile Applications and Services for IoT Machine/ Deep Learning with IoT & IoE Smart Sensors and Internet of Things for Smart City Logic, Functional programming and Microcontrollers for IoT Sensor Networks, Actuators for Internet of Things Data Visualization using IoT IoT Application and Communication Protocol Big Data Analytics for Social Networking using IoT IoT Applications for Smart Cities Emulation and Simulation Methodologies for IoT IoT Applied for Digital Contents 5. Microwaves and Photonics Microwave filter Micro Strip antenna Microwave Link design Microwave oscillator Frequency selective surface Microwave Antenna Microwave Photonics Radio over fiber Optical communication Optical oscillator Optical Link design Optical phase lock loop Optical devices 6. Computation Intelligence and Analytics Soft Computing Advance Ubiquitous Computing Parallel Computing Distributed Computing Machine Learning Information Retrieval Expert Systems Data Mining Text Mining Data Warehousing Predictive Analysis Data Management Big Data Analytics Big Data Security 7. Energy Harvesting and Wireless Power Transmission Energy harvesting and transfer for wireless sensor networks Economics of energy harvesting communications Waveform optimization for wireless power transfer RF Energy Harvesting Wireless Power Transmission Microstrip Antenna design and application Wearable Textile Antenna Luminescence Rectenna 8. 
Advance Concept of Networking and Database Computer Network Mobile Adhoc Network Image Security Application Artificial Intelligence and machine learning in the Field of Network and Database Data Analytic High performance computing Pattern Recognition 9. Machine Learning (ML) and Knowledge Mining (KM) Regression and prediction Problem solving and planning Clustering Classification Neural information processing Vision and speech perception Heterogeneous and streaming data Natural language processing Probabilistic Models and Methods Reasoning and inference Marketing and social sciences Data mining Knowledge Discovery Web mining Information retrieval Design and diagnosis Game playing Streaming data Music Modelling and Analysis Robotics and control Multi-agent systems Bioinformatics Social sciences Industrial, financial and scientific applications of all kind 10. Advanced Computer networking Computational Intelligence Data Management, Exploration, and Mining Robotics Artificial Intelligence and Machine Learning Computer Architecture and VLSI Computer Graphics, Simulation, and Modelling Digital System and Logic Design Natural Language Processing and Machine Translation Parallel and Distributed Algorithms Pattern Recognition and Analysis Systems and Software Engineering Nature Inspired Computing Signal and Image Processing Reconfigurable Computing Cloud, Cluster, Grid and P2P Computing Biomedical Computing Advanced Bioinformatics Green Computing Mobile Computing Nano Ubiquitous Computing Context Awareness and Personalization, Autonomic and Trusted Computing Cryptography and Applied Mathematics Security, Trust and Privacy Digital Rights Management Networked-Driven Multicourse Chips Internet Computing Agricultural Informatics and Communication Community Information Systems Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS Disaster Management e-governance, e-Commerce, e-business, e-Learning Forest Genomics and Informatics Healthcare Informatics 
Information Ecology and Knowledge Management Irrigation Informatics Neuro-Informatics Open Source: Challenges and opportunities Web-Based Learning: Innovation and Challenges Soft computing Signal and Speech Processing Natural Language Processing 11. Communications Microstrip Antenna Microwave Radar and Satellite Smart Antenna MIMO Antenna Wireless Communication RFID Network and Applications 5G Communication 6G Communication 12. Algorithms and Complexity Sequential, Parallel And Distributed Algorithms And Data Structures Approximation And Randomized Algorithms Graph Algorithms And Graph Drawing On-Line And Streaming Algorithms Analysis Of Algorithms And Computational Complexity Algorithm Engineering Web Algorithms Exact And Parameterized Computation Algorithmic Game Theory Computational Biology Foundations Of Communication Networks Computational Geometry Discrete Optimization 13. Software Engineering and Knowledge Engineering Software Engineering Methodologies Agent-based software engineering Artificial intelligence approaches to software engineering Component-based software engineering Embedded and ubiquitous software engineering Aspect-based software engineering Empirical software engineering Search-Based Software engineering Automated software design and synthesis Computer-supported cooperative work Automated software specification Reverse engineering Software Engineering Techniques and Production Perspectives Requirements engineering Software analysis, design and modelling Software maintenance and evolution Software engineering tools and environments Software engineering decision support Software design patterns Software product lines Process and workflow management Reflection and metadata approaches Program understanding and system maintenance Software domain modelling and analysis Software economics Multimedia and hypermedia software engineering Software engineering case study and experience reports Enterprise software, middleware, and tools Artificial 
intelligent methods, models, techniques Artificial life and societies Swarm intelligence Smart Spaces Autonomic computing and agent-based systems Autonomic computing Adaptive Systems Agent architectures, ontologies, languages and protocols Multi-agent systems Agent-based learning and knowledge discovery Interface agents Agent-based auctions and marketplaces Secure mobile and multi-agent systems Mobile agents SOA and Service-Oriented Systems Service-centric software engineering Service oriented requirements engineering Service oriented architectures Middleware for service based systems Service discovery and composition Service level agreements (drafting,

  14. Data from: Historical Data Mining Deep Dive into Machine Learning-Aided 2D...

    • acs.figshare.com
    • figshare.com
    xlsx
    Updated Jun 23, 2025
    Cite
    Krittapong Deshsorn; Panwad Chavalekvirat; Somrudee Deepaisarn; Ho-Chiao Chuang; Pawin Iamprasertkun (2025). Historical Data Mining Deep Dive into Machine Learning-Aided 2D Materials Research in Electrochemical Applications [Dataset]. http://doi.org/10.1021/acsmaterialsau.5c00030.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 23, 2025
    Dataset provided by
    ACS Publications
    Authors
    Krittapong Deshsorn; Panwad Chavalekvirat; Somrudee Deepaisarn; Ho-Chiao Chuang; Pawin Iamprasertkun
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Machine learning transforms the landscape of 2D materials design, particularly in accelerating discovery, optimization, and screening processes. This review has delved into the historical and ongoing integration of machine learning in 2D materials for electrochemical energy applications, using the Knowledge Discovery in Databases (KDD) approach to guide the research through data mining from the Scopus database using analysis of citations, keywords, and trends. The topics will first focus on a “macro” scope, where hundreds of literature reports are computer analyzed for key insights, such as year analysis, publication origin, and word co-occurrence using heat maps and network graphs. Afterward, the focus will be narrowed down into a more specific “micro” scope obtained from the “macro” overview, which is intended to dive deep into machine learning usage. From the gathered insights, this work highlights how machine learning, density functional theory (DFT), and traditional experimentation are jointly advancing the field of materials science. Overall, the resulting review offers a comprehensive analysis, touching on essential applications such as batteries, fuel cells, supercapacitors, and synthesis processes while showcasing machine learning techniques that enhance the identification of critical material properties.

  15. Data and Code for "Comparing the Effects of Euclidean Distance Matching and...

    • zenodo.org
    bin, csv, png +1
    Updated Oct 9, 2024
    Cite
    Anonymous; Anonymous (2024). Data and Code for "Comparing the Effects of Euclidean Distance Matching and Dynamic Time Warping in the Clustering of COVID-19 Evolution" [Dataset]. http://doi.org/10.5281/zenodo.13905791
    Explore at:
    Available download formats: png, text/x-python, csv, bin
    Dataset updated
    Oct 9, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anonymous; Anonymous
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains the datasets and data sources, analysis code, and workflow associated with the manuscript "Comparing the Effects of Euclidean Distance Matching and Dynamic Time Warping in the Clustering of COVID-19 Evolution". The following resources are provided:

    • Data Files:

      • time_series_data.csv: A curated time series dataset with dates as rows and NUTS 2 regions as columns. Each column is labeled using a 4-letter abbreviation format "CC.RR", where "CC" represents the country code and "RR" represents the region code. This same abbreviation is also included in the accompanying GeoJSON file.
      • geometry_data.geojson: A GeoJSON file representing the spatial boundaries of the NUTS 2 regions, with the same 4-letter abbreviations used in the CSV file. EPSG:4326.
      • COVID19_data_sources.xlsx: This Excel file contains important metadata regarding the sources of COVID-19 data used in this study. It includes:
        • Source of the data for each country
        • Official website(s)
        • The agency responsible for the data
        • Description of the processing steps used to curate the data into the final time series.
    • Code:

      • analysis.py: A Python script used to process and analyze the data. It runs under Python 3.x, and the required libraries are listed in the first lines of the code. The code is organized into numbered sections (1), (2), ... and sub-sections (1a), (1b), ... Run the script one (sub-)section at a time so that the output stays manageable rather than arriving all at once.
    • Workflow:

      • workflow.png: A detailed workflow according to the Knowledge Discovery in Databases (KDD) process, outlining the steps involved in processing and analyzing the data, including the methods used. This workflow provides a comprehensive guide to reproducing the analysis presented in the paper.
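    The contrast named in the manuscript title can be illustrated with a minimal sketch. It assumes a CSV laid out like time_series_data.csv (dates as rows, "CC.RR" region codes as columns); the sample values, the region codes AA.01 and AA.02, and the helper names load_series, euclidean and dtw are hypothetical and are not taken from analysis.py. The DTW shown is the textbook dynamic-programming form with absolute difference as the local cost, which may differ from the variant used in the paper.

```python
import csv
import io
import math

# Hypothetical stand-in for time_series_data.csv:
# one row per date, one column per region labeled "CC.RR".
SAMPLE_CSV = """date,AA.01,AA.02
2020-03-01,0,0
2020-03-02,1,0
2020-03-03,3,1
2020-03-04,1,3
2020-03-05,0,1
"""


def load_series(text):
    """Return {region_code: [values...]} from a CSV with a leading date column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = {name: [] for name in header[1:]}
    for row in reader:
        for name, value in zip(header[1:], row[1:]):
            columns[name].append(float(value))
    return columns


def euclidean(a, b):
    """Point-by-point distance: compares values on the same date only."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Textbook dynamic time warping with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


series = load_series(SAMPLE_CSV)
first, second = series["AA.01"], series["AA.02"]
print("Euclidean:", euclidean(first, second))
print("DTW:", dtw(first, second))
```

    In this toy input the second region shows the same peak one day later. The Euclidean distance penalizes the shift at every date, while DTW aligns the two peaks and reports a much smaller distance, which is why the choice of measure can change the resulting clusters.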
  16. Crop trait regulating-genes knowledge graph dataset

    • scidb.cn
    Updated Jan 3, 2025
    Cite
    zhang dan dan (2025). Crop trait regulating-genes knowledge graph dataset [Dataset]. http://doi.org/10.57760/sciencedb.agriculture.00175
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 3, 2025
    Dataset provided by
    Science Data Bank
    Authors
    zhang dan dan
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    In crop breeding research, developing new varieties that combine multiple excellent traits has always been the goal of breeders. With the accelerating application of information technology in crop breeding, multi-dimensional scientific data related to crop breeding has grown exponentially. These semi-structured and structured data are distributed across scientific databases in different fields and lack cross-species association and fusion. This hinders the transfer and reuse of existing crop breeding knowledge, prevents the value of crop breeding data from being maximized, and makes knowledge discovery for crop trait-regulating genes more difficult. As a result, more and more crop breeding research relies on reorganizing, correlating, analyzing, and using existing breeding data to discover knowledge about crop trait-regulating genes.

    The crop trait-regulating gene knowledge graph dataset draws on the following data sources: the PubMed literature database; Phytozome (genomic information for 4 species); Ensembl Plants, from the European Molecular Biology Laboratory's European Bioinformatics Institute (genome information for 4 species); UniProt (Universal Protein; protein annotation information for 4 species); the Rice Genome Annotation Project (RGAP); STRING (protein interaction information for 4 species); Pfam (protein family analysis and modeling; protein family information for 4 species); KEGG (Kyoto Encyclopedia of Genes and Genomes; pathway annotation information for the 4 species); and the GO (Gene Ontology) domain scientific database. Entities and relationships were extracted from these multi-source scientific data in their different formats.

    For structured data, the extraction mainly uses mapping-based knowledge extraction. For XML semi-structured data, knowledge extraction based on Kettle data analysis is adopted. For FASTA semi-structured data, knowledge extraction based on the BLAST model is adopted. For unstructured text data, knowledge extraction based on a large language model is adopted. Building on this entity and relationship extraction, multi-source crop breeding knowledge was associated and fused through entity mapping and specific attribute association. The resulting crop trait-regulating gene knowledge graph dataset consists of 13 entity datasets and 16 entity relationship datasets. It provides a key semantic model and an important data basis for crop breeding knowledge discovery, such as discovery of excellent pleiotropic genes, cross-species gene function prediction, and discovery of potential pathway gene networks.

  17. MIMIC-IV v2.2 Dataset

    • paperswithcode.com
    Updated Feb 26, 2025
    + more versions
    Cite
    (2025). MIMIC-IV v2.2 Dataset [Dataset]. https://paperswithcode.com/dataset/mimic-iv-v2-2
    Dataset updated
    Feb 26, 2025
    Description

    Retrospectively collected medical data has the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.

  18. Tiselac

    • huggingface.co
    Updated Jan 22, 2025
    Cite
    Monash Scalable Time Series Evaluation Repository (2025). Tiselac [Dataset]. https://huggingface.co/datasets/monster-monash/Tiselac
    Dataset updated
    Jan 22, 2025
    Dataset authored and provided by
    Monash Scalable Time Series Evaluation Repository
    License

    https://choosealicense.com/licenses/other/

    Description

    Part of MONSTER: https://arxiv.org/abs/2502.15122.

    Tiselac

    Category: Satellite
    Num. Examples: 99,687
    Num. Channels: 10
    Length: 23
    Sampling Freq.: 16 days
    Num. Classes: 9
    License: Other
    Citations: [1] [2]

    TiSeLaC (Time Series Land Cover Classification) was created for the time series land cover classification challenge held in conjunction with the 2017 European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases [1]. It was… See the full description on the dataset page: https://huggingface.co/datasets/monster-monash/Tiselac.

  19. Data from: Is this bug severe? A text-cum-graph based model for bug severity...

    • zenodo.org
    csv, txt
    Updated Aug 26, 2023
    Cite
    Rima Hazra; Arpit Dwivedi; Animesh Mukherjee (2023). Is this bug severe? A text-cum-graph based model for bug severity prediction [Dataset]. http://doi.org/10.5281/zenodo.5554978
    Explore at:
    Available download formats: csv, txt
    Dataset updated
    Aug 26, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Rima Hazra; Arpit Dwivedi; Animesh Mukherjee
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A snapshot of the dataset has been updated. For the time being, we are publishing a snapshot of the dataset where the bugs were reported after 2017.

    Paper link: https://arxiv.org/abs/2207.00623 (ECML-PKDD 2022)

    Cite our paper:

    @InProceedings{10.1007/978-3-031-26422-1_15,
    author="Hazra, Rima
    and Dwivedi, Arpit
    and Mukherjee, Animesh",
    editor="Amini, Massih-Reza
    and Canu, St{\'e}phane
    and Fischer, Asja
    and Guns, Tias
    and Kralj Novak, Petra
    and Tsoumakas, Grigorios",
    title="Is This Bug Severe? A Text-Cum-Graph Based Model for Bug Severity Prediction",
    booktitle="Machine Learning and Knowledge Discovery in Databases",
    year="2023",
    publisher="Springer Nature Switzerland",
    address="Cham",
    pages="236--252",
    isbn="978-3-031-26422-1"
    }

    *** Please see the new version. (10.5281/zenodo.5554978)

    There is a total of six files.

    • bug_descriptions.csv: This file contains the bug id and its description.
    • bug_comments.csv: This file contains three columns. The columns are the bug ids, comments and timestamp of the comment.
    • bug_REPORTED_ON_details.csv: This file contains the bug id and the name of the package against which the bug was reported.
    • affect_dataset.csv: This file contains the bug id and the affected packages along with the affect timestamp.
    • bug_heat_2019.csv: This file contains the bug ids and their bug heat values, crawled in November 2019.
    • bug_heat_2020.csv: This file contains the bug ids and their bug heat values, crawled in November 2020.
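    Because the two bug heat files share a bug id key, the crawls can be compared per bug. The sketch below shows one way this might be done with the Python standard library. The column names bug_id and bug_heat, the sample rows, and the helpers read_heat and heat_change are hypothetical; the actual headers in the published CSV files may differ.

```python
import csv
import io

# Hypothetical stand-ins for bug_heat_2019.csv and bug_heat_2020.csv.
# The column names are an assumption, not taken from the dataset.
HEAT_2019 = "bug_id,bug_heat\n101,6\n102,14\n103,0\n"
HEAT_2020 = "bug_id,bug_heat\n101,10\n102,14\n104,8\n"


def read_heat(text):
    """Map each bug id to its heat value for one crawl."""
    return {row["bug_id"]: int(row["bug_heat"]) for row in csv.DictReader(io.StringIO(text))}


def heat_change(old, new):
    """Heat difference (new minus old) for bugs present in both crawls."""
    return {bug: new[bug] - old[bug] for bug in sorted(old.keys() & new.keys())}


changes = heat_change(read_heat(HEAT_2019), read_heat(HEAT_2020))
print(changes)
```

    Bugs that appear in only one crawl are left out of the comparison here; whether to keep them is a design choice that depends on the analysis.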
  20. datasheet1_Q-Finder: An Algorithm for Credible Subgroup Discovery in...

    • frontiersin.figshare.com
    pdf
    Updated Jun 4, 2023
    Cite
    Cyril Esnault; May-Line Gadonna; Maxence Queyrel; Alexandre Templier; Jean-Daniel Zucker (2023). datasheet1_Q-Finder: An Algorithm for Credible Subgroup Discovery in Clinical Data Analysis — An Application to the International Diabetes Management Practice Study.pdf [Dataset]. http://doi.org/10.3389/frai.2020.559927.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Frontiers
    Authors
    Cyril Esnault; May-Line Gadonna; Maxence Queyrel; Alexandre Templier; Jean-Daniel Zucker
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Addressing the heterogeneity of both the outcome of a disease and the treatment response to an intervention is a mandatory pathway for regulatory approval of medicines. In randomized clinical trials (RCTs), confirmatory subgroup analyses focus on the assessment of drugs in predefined subgroups, while exploratory ones allow the a posteriori identification of subsets of patients who respond differently. Within the latter area, the subgroup discovery (SD) data mining approach is widely used, particularly in precision medicine, to evaluate treatment effect across different groups of patients from various data sources (be it from clinical trials or real-world data). However, both the limited consideration by standard SD algorithms of recommended criteria to define credible subgroups and the lack of statistical power of the findings after correcting for multiple testing hinder the generation of hypotheses and their acceptance by healthcare authorities and practitioners. In this paper, we present the Q-Finder algorithm, which aims to generate statistically credible subgroups to answer clinical questions, such as finding drivers of natural disease progression or treatment response. It combines an exhaustive search with a cascade of filters based on metrics assessing key credibility criteria, including relative risk reduction assessment, adjustment for confounding factors, individual feature’s contribution to the subgroup’s effect, interaction tests for assessing between-subgroup treatment effect interactions, and test adjustment (multiple testing). This allows Q-Finder to directly target and assess subgroups on recommended credibility criteria. The top-k credible subgroups are then selected, while accounting for subgroups’ diversity and, possibly, clinical relevance. Those subgroups are tested on independent data to assess their consistency across databases, while preserving statistical power by limiting the number of tests.

    To illustrate this algorithm, we applied it to the database of the International Diabetes Management Practice Study (IDMPS) to better understand the drivers of improved glycemic control and rate of episodes of hypoglycemia in type 2 diabetic patients. We compared Q-Finder with state-of-the-art approaches from both the Subgroup Identification and Knowledge Discovery in Databases literature. The results demonstrate its ability to identify and support a short list of highly credible and diverse data-driven subgroups for both prognostic and predictive tasks.
