Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract: The objective of this work is to improve the quality of the information in the CubaCiencia database of the Institute of Scientific and Technological Information. This database holds bibliographic information covering four segments of science and is the main database of the Library Management System. The methodology applied was based on decision trees, the correlation matrix, 3D scatter plots, and other data mining techniques used to study large volumes of information. The results achieved not only improved the information in the database but also yielded genuinely useful patterns toward the proposed objectives.
Data from the article "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS", by Agustin Pájaro, Ignacio J. Duran and Pablo Rodrigo, published in Revista DADOS, v. 65, n. 3, 2022.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SPARQL query example 2. This text file contains an example of a SPARQL query that enables exploring the vicinity of an entity. This particular query returns the RDF graph surrounding, within a path length of 4, the node pharmgkb:PA451906, which represents warfarin, an anticoagulant drug. (TXT 392 bytes)
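The query file itself is what is described above; purely as an illustration of the idea, here is a hedged Python sketch of a bounded-neighbourhood CONSTRUCT query using SPARQLWrapper. The endpoint path, the pharmgkb prefix IRI, and the !<urn:ex:none> trick (a negated property set that matches any predicate, assuming urn:ex:none is unused) are assumptions for illustration, not the authors' actual query.

# Hedged sketch: fetch the RDF neighbourhood (up to 4 hops) of warfarin
# (pharmgkb:PA451906). Endpoint path and prefix IRI are assumptions.
from SPARQLWrapper import SPARQLWrapper, TURTLE

QUERY = """
PREFIX pharmgkb: <https://pharmgkb.org/>  # hypothetical prefix IRI
CONSTRUCT { ?s ?p ?o }
WHERE {
  { BIND(pharmgkb:PA451906 AS ?s) }                               # hop 0
  UNION { pharmgkb:PA451906 !<urn:ex:none> ?s . }                 # hop 1
  UNION { pharmgkb:PA451906 !<urn:ex:none>/!<urn:ex:none> ?s . }  # hop 2
  UNION { pharmgkb:PA451906 !<urn:ex:none>/!<urn:ex:none>/!<urn:ex:none> ?s . }  # hop 3
  ?s ?p ?o .  # one more edge from each reached node, i.e. within length 4
}
"""

sparql = SPARQLWrapper("https://pgxlod.loria.fr/sparql")  # assumed endpoint path
sparql.setQuery(QUERY)
sparql.setReturnFormat(TURTLE)
print(sparql.query().convert().decode()[:1000])  # Turtle serialization of the graph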
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SPARQL query example 1. This text file contains the SPARQL query we applied to our PGx linked data to obtain the data graph represented in Fig. 3. This query includes the definition of the prefixes mentioned in Figs. 2 and 3. It takes about 30 s on our https://pgxlod.loria.fr server. (TXT 2 kb)
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0): https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
Abstract: Genetics and “omics” studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Alberta’s oil sands play a critical role in Canada meeting its commitments under the Paris Climate Agreement. However, few studies have published actual operating data for extraction operations (schemes), especially fuel consumption data, that would allow greenhouse gas (GHG) emissions to be projected accurately for the development and expansion of oil sands projects. In this study, we mined 2015–2018 operating data from over 29 million records in Petrinex via the knowledge discovery in databases (KDD) process and described GHG and fuel consumption patterns for 20 in situ oil sands extraction schemes (representing >80% of in situ extraction in 2018). The discovered patterns were interpreted through a range of performance indicators. From 2015 to 2018, GHG emission intensity (EI) for the schemes dropped by 7.5%, from 0.6193 t CO₂e/m³ bitumen (oil) to 0.5732 t CO₂e/m³ bitumen. On the four-year average, the in situ oil sands extractions used 3.8632 m³ of steam to produce 1 m³ of oil (range 1.8170–7.0628 m³ steam/m³ oil); consumed 0.0668 × 10³ m³ of steam generator fuel (SGF) to produce 1 m³ of steam (range 0.0288–0.0910 × 10³ m³ SGF/m³ steam); and consumed 0.2995 × 10³ m³ of stationary combustion fuel (SCF) to produce 1 m³ of bitumen (range 0.1224–0.6176 × 10³ m³ SCF/m³ bitumen). The Peace River region had the highest solution gas-oil ratio, producing 0.0819 × 10³ m³ of solution gas per 1 m³ of bitumen produced. On average, the cyclic steam stimulation recovery method used 53.5% more steam to produce 1 m³ of bitumen and 11.1% more SGF to produce 1 m³ of steam than the steam-assisted gravity drainage recovery method. With the carbon price at C$30/t CO₂e and the Western Canadian Select (WCS) crude oil price at US$38.46/bbl, GHG costs account for 0.33% to 8.81% of the WCS crude price under Alberta’s emission benchmark. The study provides methods to mine the public Petrinex database for studying GHG, energy, and water consumption by the oil and gas industry in Canada. The results also provide more accurate energy and emission intensities, which can be used for GHG life cycle assessment and compared with other energy extraction methods on a life cycle basis.
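As a quick plausibility check on the cost-share figure, a back-of-the-envelope sketch in Python (the exchange rate is an assumption, and Alberta's benchmark prices only emissions above a facility benchmark, so pricing the full intensity gives an upper-bound illustration, not the paper's calculation):

# Rough illustration from the figures quoted above, not the paper's own code.
EI_2018 = 0.5732         # t CO2e per m3 bitumen (2018 average, from the text)
M3_PER_BBL = 0.158987    # cubic metres per barrel
CARBON_PRICE_CAD = 30.0  # C$ per t CO2e
WCS_USD = 38.46          # US$ per bbl
CAD_TO_USD = 0.75        # assumed exchange rate

emissions_per_bbl = EI_2018 * M3_PER_BBL         # ~0.091 t CO2e/bbl
cost_cad = emissions_per_bbl * CARBON_PRICE_CAD  # ~C$2.73/bbl
share = cost_cad * CAD_TO_USD / WCS_USD
print(f"GHG cost ~C${cost_cad:.2f}/bbl, ~{share:.1%} of the WCS price")
# -> roughly 5%, inside the 0.33%-8.81% span reported for the schemes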
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Bitcoin is the first implementation of a technology that has become known as a 'public permissionless' blockchain. Such systems allow public read/write access to an append-only blockchain database without the need for any mediating central authority. Instead they guarantee access, security and protocol conformity through an elegant combination of cryptographic assurances and game theoretic economic incentives. Not until the advent of the Bitcoin blockchain has such a trusted, transparent, comprehensive and granular data set of digital economic behaviours been available for public network analysis. In this article, by translating the cumbersome binary data structure of the Bitcoin blockchain into a high fidelity graph model, we demonstrate through various analyses the often overlooked social and econometric benefits of employing such a novel open data architecture. Specifically we show (a) how repeated patterns of transaction behaviours can be revealed to link user activity across the blockchain; (b) how newly mined bitcoin can be associated to demonstrate individual accumulations of wealth; (c) through application of the naïve quantity theory of money that Bitcoin's disinflationary properties can be revealed and measured; and (d) how the user community can develop coordinated defences against repeated denial of service attacks on the network. Such public analyses of this open data are exemplary benefits unavailable to the closed data models of the 'private permissioned' distributed ledger architectures currently dominating enterprise-level blockchain development due to existing issues of scalability, confidentiality and governance.
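The article's graph model is not reproduced here; purely to illustrate the idea of recasting transactions and addresses as a graph, a minimal networkx sketch (node and edge attributes are invented):

# Minimal sketch: transactions and addresses as nodes, value flows as edges.
# The article's actual model is far richer; attribute names are invented.
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("tx1", kind="transaction", coinbase=True)   # newly mined coins
g.add_node("addr_A", kind="address")
g.add_node("tx2", kind="transaction", coinbase=False)
g.add_node("addr_B", kind="address")

g.add_edge("tx1", "addr_A", value_btc=50.0)  # block-reward output
g.add_edge("addr_A", "tx2", value_btc=50.0)  # later spent as an input
g.add_edge("tx2", "addr_B", value_btc=49.9)  # output; 0.1 BTC fee implied

# Linking activity across the chain then becomes graph traversal:
for path in nx.all_simple_paths(g, "tx1", "addr_B"):
    print(" -> ".join(path))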
Data and descriptions are copied from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use these data, please remember to cite the following paper: Sorano, D., Carrara, F., Cintia, P., Falchi, F., Pappalardo, L. (2020) Automatic Pass Annotation from Soccer Video Streams Based on Object Detection and LSTM. In: Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020.
Tensors extracted from soccer video broadcasts. Each file is a zip of a folder and corresponds to a single half of a match. Each file in the folder (in .pickle format) corresponds to a frame of the video. This item contains the following files/matches:
- roma_juve_1H_tensors.zip: tensors/frames of the first half of Roma vs Juventus
- roma_juve_2H_tensors.zip: tensors/frames of the second half of Roma vs Juventus
- roma_lazio_1H_tensors.zip: tensors/frames of the first half of Roma vs Lazio
- sassuolo_inter_1H_tensors.zip: tensors/frames of the first half of Sassuolo vs Inter
- sassuolo_inter_2H_tensors.zip: tensors/frames of the second half of Sassuolo vs Inter
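To read the frames, something like the following should work, assuming each archive member unpickles to an array-like tensor (the exact tensor layout is not documented here):

# Hedged sketch: iterate over the pickled frame tensors in one archive.
import pickle
import zipfile

with zipfile.ZipFile("roma_juve_1H_tensors.zip") as archive:
    for name in sorted(archive.namelist()):
        if name.endswith(".pickle"):
            frame = pickle.loads(archive.read(name))
            print(name, getattr(frame, "shape", type(frame)))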
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A better understanding of greenhouse gas (GHG) emissions resulting from oil sands (bitumen) extraction can help to meet global oil demands, identify potential mitigation measures, and design effective carbon policies. While several studies have attempted to model GHG emissions from oil sands extractions, these studies have encountered data availability challenges, particularly with respect to actual fuel use data, and have thus struggled to accurately quantify GHG emissions. This dataset contains actual operational data from 20 in-situ oil sands operations, including information for fuel gas, flare gas, vented gas, production, steam injection, gas injection, condensate injection, and C3 injection.
International Journal of Engineering and Advanced Technology FAQ - ResearchHelpDesk - The International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It aims to publish original theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering, and all interdisciplinary streams of the engineering sciences. All submitted papers are reviewed by the IJEAT board of committee.
Aim of IJEAT: to disseminate original scientific, theoretical, or applied research in engineering and allied fields; to provide a platform for publishing results and research with a strong empirical component; to bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research; and to solicit original and unpublished research papers based on theoretical or experimental work, for publication globally.
Scope of IJEAT: IJEAT covers all topics of all engineering branches, including Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering, and all interdisciplinary streams of the engineering sciences. The main topics include, but are not limited to:
1. Smart Computing and Information Processing: signal and speech processing; image processing and pattern recognition; WSN; artificial intelligence and machine learning; data mining and warehousing; data analytics; deep learning; bioinformatics; high performance computing; advanced computer networking; cloud computing; IoT; parallel computing on GPU; human-computer interactions.
2. Recent Trends in Microelectronics and VLSI Design: process and device technologies; low-power design; nanometer-scale integrated circuits; application-specific ICs (ASICs); FPGAs; nanotechnology; nanoelectronics and quantum computing.
3. Challenges of Industry and their Solutions, Communications: advanced manufacturing technologies; artificial intelligence; autonomous robots; augmented reality; big data analytics and business intelligence; cyber-physical systems (CPS); digital clone or simulation; industrial Internet of Things (IIoT); manufacturing IoT; plant cybersecurity; smart solutions (wearable sensors and smart glasses); system integration; small batch manufacturing; visual analytics; virtual reality; 3D printing.
4. Internet of Things (IoT): IoT, IoE, and edge computing; distributed mobile applications utilizing IoT; security, privacy, and trust in IoT and IoE; standards for IoT applications; ubiquitous computing; blockchain-enabled IoT device and data security and privacy; application of WSN in IoT; cloud resources utilization in IoT; wireless access technologies for IoT; mobile applications and services for IoT; machine/deep learning with IoT and IoE; smart sensors and Internet of Things for smart cities; logic, functional programming, and microcontrollers for IoT; sensor networks and actuators for the Internet of Things; data visualization using IoT; IoT application and communication protocols; big data analytics for social networking using IoT; IoT applications for smart cities; emulation and simulation methodologies for IoT; IoT applied to digital content.
5. Microwaves and Photonics: microwave filters; microstrip antennas; microwave link design; microwave oscillators; frequency selective surfaces; microwave antennas; microwave photonics; radio over fiber; optical communication; optical oscillators; optical link design; optical phase lock loops; optical devices.
6. Computational Intelligence and Analytics: soft computing; advanced ubiquitous computing; parallel computing; distributed computing; machine learning; information retrieval; expert systems; data mining; text mining; data warehousing; predictive analysis; data management; big data analytics; big data security.
7. Energy Harvesting and Wireless Power Transmission: energy harvesting and transfer for wireless sensor networks; economics of energy harvesting communications; waveform optimization for wireless power transfer; RF energy harvesting; wireless power transmission; microstrip antenna design and applications; wearable textile antennas; luminescence; rectennas.
8. Advanced Concepts in Networking and Databases: computer networks; mobile ad hoc networks; image security applications; artificial intelligence and machine learning in the field of networks and databases; data analytics; high performance computing; pattern recognition.
9. Machine Learning (ML) and Knowledge Mining (KM): regression and prediction; problem solving and planning; clustering; classification; neural information processing; vision and speech perception; heterogeneous and streaming data; natural language processing; probabilistic models and methods; reasoning and inference; marketing and social sciences; data mining; knowledge discovery; web mining; information retrieval; design and diagnosis; game playing; streaming data; music modelling and analysis; robotics and control; multi-agent systems; bioinformatics; social sciences; industrial, financial, and scientific applications of all kinds.
10. Advanced Computer Networking: computational intelligence; data management, exploration, and mining; robotics; artificial intelligence and machine learning; computer architecture and VLSI; computer graphics, simulation, and modelling; digital systems and logic design; natural language processing and machine translation; parallel and distributed algorithms; pattern recognition and analysis; systems and software engineering; nature-inspired computing; signal and image processing; reconfigurable computing; cloud, cluster, grid, and P2P computing; biomedical computing; advanced bioinformatics; green computing; mobile computing; nano ubiquitous computing; context awareness and personalization; autonomic and trusted computing; cryptography and applied mathematics; security, trust, and privacy; digital rights management; network-driven multicore chips; Internet computing; agricultural informatics and communication; community information systems; computational economics; digital photogrammetric remote sensing; GIS and GPS; disaster management; e-governance, e-commerce, e-business, e-learning; forest genomics and informatics; healthcare informatics; information ecology and knowledge management; irrigation informatics; neuroinformatics; open source (challenges and opportunities); web-based learning (innovation and challenges); soft computing; signal and speech processing; natural language processing.
11. Communications: microstrip antennas; microwave, radar, and satellite; smart antennas; MIMO antennas; wireless communication; RFID networks and applications; 5G communication; 6G communication.
12. Algorithms and Complexity: sequential, parallel, and distributed algorithms and data structures; approximation and randomized algorithms; graph algorithms and graph drawing; on-line and streaming algorithms; analysis of algorithms and computational complexity; algorithm engineering; web algorithms; exact and parameterized computation; algorithmic game theory; computational biology; foundations of communication networks; computational geometry; discrete optimization.
13. Software Engineering and Knowledge Engineering: software engineering methodologies; agent-based software engineering; artificial intelligence approaches to software engineering; component-based software engineering; embedded and ubiquitous software engineering; aspect-based software engineering; empirical software engineering; search-based software engineering; automated software design and synthesis; computer-supported cooperative work; automated software specification; reverse engineering; software engineering techniques and production perspectives; requirements engineering; software analysis, design, and modelling; software maintenance and evolution; software engineering tools and environments; software engineering decision support; software design patterns; software product lines; process and workflow management; reflection and metadata approaches; program understanding and system maintenance; software domain modelling and analysis; software economics; multimedia and hypermedia software engineering; software engineering case studies and experience reports; enterprise software, middleware, and tools; artificial intelligence methods, models, and techniques; artificial life and societies; swarm intelligence; smart spaces; autonomic computing and agent-based systems; autonomic computing; adaptive systems; agent architectures, ontologies, languages, and protocols; multi-agent systems; agent-based learning and knowledge discovery; interface agents; agent-based auctions and marketplaces; secure mobile and multi-agent systems; mobile agents; SOA and service-oriented systems; service-centric software engineering; service-oriented requirements engineering; service-oriented architectures; middleware for service-based systems; service discovery and composition; service level agreements (drafting,
This is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, The Fifth International Conference on Knowledge Discovery and Data Mining. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between 'bad' connections, called intrusions or attacks, and 'good' normal connections. This database contains a standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('kddcup99', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DataSet for use in RapidMiner from the master's thesis OPEN DATA MINING: AN ANALYSIS OF THE USE OF BOTS IN THE ELECTRONIC TRADING FLOORS, a dissertation presented to the Graduate Program in Management in Learning Organizations (UFPB) in compliance with the requirements for completion of the Professional Master in Management in Learning Organizations. Brazil's federal government has sought to match procurement procedures to trends in information and communication technologies. The electronic reverse auction was one of the products of these efforts: a modality that presented structural solutions to improve the efficiency of purchases of common goods and services and that represents more than 94% of the bids that occur in the country. Despite the benefits of the electronic format, this environment brings challenges, such as dealing with the use of bots, which bid automatically. While there is no law prohibiting their use, judgments of the Federal Court of Auditors state that their use provides a competitive advantage to suppliers holding this technology over other bidders, an affront to the principle of isonomy. Also in the direction of modernizing public procurement is increasing transparency through open data policies, part of the context of Open Government and digital transformation. This study analyzes the use of bots in electronic reverse auctions through open data mining. Electronic reverse auctions held at the Ministry of Agriculture, Livestock and Supply in 2017 were analyzed. Data were obtained by request through the Electronic Information System for Citizen Information (e-SIC), with knowledge discovery in databases adopted as the methodology. The results indicate that bot use in electronic reverse auctions in 2017 represented a more than 5% advantage in successful bid items, observed for only 1.99% of the sample bidders, who were flagged as suspected users. The most relevant indicator for classifying bidders as suspects was a high number of bids issued relative to the behavior observed in the sample. The results are expected to foster discussion of the effects of bot use on e-trading and to highlight the need for open data policy development so that data mining becomes an increasingly effective means to assess anomalies and increase the integrity of bids made through the Federal Government Procurement Portal. Dissertation (in Portuguese): https://sig-arq.ufpb.br/arquivos/2019071230f6981803056bc243c9a4b41/Dissertao_-_Hugo_Medeiros_Souto_-_Minerao_de_Dados_Abertos_2.pdf
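The thesis's exact indicators are not reproduced here; purely to illustrate the kind of "abnormally high bid count" indicator described above (column names and the threshold are invented, and this is not the thesis's method), a toy pandas sketch:

# Toy illustration: flag bidders whose bid counts are extreme outliers.
import pandas as pd

bids = pd.DataFrame({
    "bidder": ["A", "A", "B", "C", "A", "A", "A", "B"],
    "item":   [1, 1, 1, 2, 2, 2, 2, 2],
})
counts = bids.groupby("bidder").size()
threshold = counts.mean() + 3 * counts.std()  # invented cut-off
print(counts[counts > threshold])  # empty on this toy sample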
https://choosealicense.com/licenses/other/
Part of MONSTER: https://arxiv.org/abs/2502.15122.
Tiselac
Category: Satellite
Num. Examples: 99,687
Num. Channels: 10
Length: 23
Sampling Freq.: 16 days
Num. Classes: 9
License: Other
Citations: [1] [2]
TiSeLaC (Time Series Land Cover Classification) was created for the time series land cover classification challenge held in conjunction with the 2017 European Conference on Machine Learning & Principles and Practice of Knowledge Discovery in Databases [1]. It was… See the full description on the dataset page: https://huggingface.co/datasets/monster-monash/Tiselac.
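A minimal loading sketch, assuming the Hugging Face datasets library and that the repository loads with its default configuration (both assumptions; check the dataset page above):

# Hedged sketch: load Tiselac from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("monster-monash/Tiselac", split="train")  # assumed split name
print(ds)     # features and number of rows
print(ds[0])  # one labelled series (10 channels, length 23)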
Dataset with annotated 12-lead ECG records. The exams were taken in 811 counties in the state of Minas Gerais, Brazil, by the Telehealth Network of Minas Gerais (TNMG) between 2010 and 2016, and organized by the CODE (Clinical Outcomes in Digital Electrocardiography) group.
Requesting access: Researchers affiliated with educational or research institutions may request access to this dataset. Requests will be analyzed on an individual basis and should contain: the name of the PI and host organisation; contact details (including your name and email); and the scientific purpose of the data access request. If approved, a data user agreement will be forwarded to the researcher who made the request (through the email that was provided). After the agreement has been signed (by the researcher or by the research institution), access to the dataset will be granted.
Openly available subset: A subset of this dataset (with 15% of the patients) is openly available. See: "CODE-15%: a large scale annotated dataset of 12-lead ECGs", https://doi.org/10.5281/zenodo.4916206.
Content: The folder contains a column-separated file with basic patient attributes, and the ECG waveforms in the wfdb format.
Additional references: The dataset is described in the paper "Automatic diagnosis of the 12-lead ECG using a deep neural network", https://www.nature.com/articles/s41467-020-15432-4. Related publications also using this dataset are:
- [1] G. Paixao et al., "Validation of a Deep Neural Network Electrocardiographic-Age as a Mortality Predictor: The CODE Study," Circulation, vol. 142, no. Suppl_3, pp. A16883–A16883, Nov. 2020, doi: 10.1161/circ.142.suppl_3.16883.
- [2] A. L. P. Ribeiro et al., "Tele-electrocardiography and big data: The CODE (Clinical Outcomes in Digital Electrocardiography) study," Journal of Electrocardiology, Sep. 2019, doi: 10/gf7pwg.
- [3] D. M. Oliveira, A. H. Ribeiro, J. A. O. Pedrosa, G. M. M. Paixao, A. L. P. Ribeiro, and W. Meira Jr, "Explaining end-to-end ECG automated diagnosis using contextual features," in Machine Learning and Knowledge Discovery in Databases. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Ghent, Belgium, Sep. 2020, vol. 12461, pp. 204–219, doi: 10.1007/978-3-030-67670-4_13.
- [4] D. M. Oliveira, A. H. Ribeiro, J. A. O. Pedrosa, G. M. M. Paixao, A. L. Ribeiro, and W. Meira Jr, "Explaining black-box automated electrocardiogram classification to cardiologists," in 2020 Computing in Cardiology (CinC), 2020, vol. 47, doi: 10.22489/CinC.2020.452.
- [5] G. M. M. Paixão et al., "Evaluation of mortality in bundle branch block patients from an electronic cohort: Clinical Outcomes in Digital Electrocardiography (CODE) study," Journal of Electrocardiology, Sep. 2019, doi: 10/dcgk.
- [6] G. M. M. Paixão et al., "Evaluation of Mortality in Atrial Fibrillation: Clinical Outcomes in Digital Electrocardiography (CODE) Study," Global Heart, vol. 15, no. 1, p. 48, Jul. 2020, doi: 10.5334/gh.772.
- [7] G. M. M. Paixão et al., "Electrocardiographic Predictors of Mortality: Data from a Primary Care Tele-Electrocardiography Cohort of Brazilian Patients," Hearts, vol. 2, no. 4, Art. no. 4, Dec. 2021, doi: 10.3390/hearts2040035.
- [8] G. M. Paixão et al., "ECG-Age from Artificial Intelligence: A New Predictor for Mortality? The CODE (Clinical Outcomes in Digital Electrocardiography) Study," Journal of the American College of Cardiology, vol. 75, no. 11, Supplement 1, p. 3672, 2020, doi: 10.1016/S0735-1097(20)34299-6.
- [9] E. M. Lima et al., "Deep neural network estimated electrocardiographic-age as a mortality predictor," Nature Communications, vol. 12, 2021, doi: 10.1038/s41467-021-25351-7.
- [10] W. Meira Jr, A. L. P. Ribeiro, D. M. Oliveira, and A. H. Ribeiro, "Contextualized Interpretable Machine Learning for Medical Diagnosis," Communications of the ACM, 2020, doi: 10.1145/3416965.
- [11] A. H. Ribeiro et al., "Automatic diagnosis of the 12-lead ECG using a deep neural network," Nature Communications, vol. 11, no. 1, p. 1760, 2020, doi: 10/drkd.
- [12] A. H. Ribeiro et al., "Automatic Diagnosis of Short-Duration 12-Lead ECG using a Deep Convolutional Network," Machine Learning for Health (ML4H) Workshop at NeurIPS, 2018.
- [13] A. H. Ribeiro et al., "Automatic 12-lead ECG classification using a convolutional network ensemble," 2020, doi: 10.22489/CinC.2020.130.
- [14] V. Sangha et al., "Automated Multilabel Diagnosis on Electrocardiographic Images and Signals," medRxiv, Sep. 2021, doi: 10.1101/2021.09.22.21263926.
- [15] S. Biton et al., "Atrial fibrillation risk prediction from the 12-lead ECG using digital biomarkers and deep representation learning," European Heart Journal - Digital Health, 2021, doi: 10.1093/ehjdh/ztab071.
Code: The following GitHub repositories perform analyses that use this dataset:
- https://github.com/antonior92/automatic-ecg-diagnosis
- https://github.com/antonior92/ecg-age-prediction
Related datasets:
- CODE-test: An annotated 12-lead ECG dataset (https://doi.org/10.5281/zenodo.3765780)
- CODE-15%: a large scale annotated dataset of 12-lead ECGs (https://doi.org/10.5281/zenodo.4916206)
- Sami-Trop: 12-lead ECG traces with age and mortality annotations (https://doi.org/10.5281/zenodo.4905618)
Ethics declarations: The CODE Study was approved by the Research Ethics Committee of the Universidade Federal de Minas Gerais, protocol 49368496317.7.0000.5149.
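Once access is granted, the waveforms should be readable with the wfdb Python package (the record path below is a hypothetical placeholder, not a real file name from the dataset):

# Hedged sketch: read one ECG record with the wfdb package.
import wfdb

record = wfdb.rdrecord("records/TNMG0000001")  # hypothetical record path
print(record.sig_name)        # lead names
print(record.fs)              # sampling frequency (Hz)
print(record.p_signal.shape)  # (samples, leads) physical signal array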
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Addressing the heterogeneity of both the outcome of a disease and the treatment response to an intervention is a mandatory pathway for regulatory approval of medicines. In randomized clinical trials (RCTs), confirmatory subgroup analyses focus on the assessment of drugs in predefined subgroups, while exploratory ones allow the a posteriori identification of subsets of patients who respond differently. Within the latter area, the subgroup discovery (SD) data mining approach is widely used, particularly in precision medicine, to evaluate treatment effect across different groups of patients from various data sources (be it from clinical trials or real-world data). However, both the limited consideration by standard SD algorithms of recommended criteria to define credible subgroups and the lack of statistical power of the findings after correcting for multiple testing hinder the generation of hypotheses and their acceptance by healthcare authorities and practitioners. In this paper, we present the Q-Finder algorithm, which aims to generate statistically credible subgroups to answer clinical questions, such as finding drivers of natural disease progression or treatment response. It combines an exhaustive search with a cascade of filters based on metrics assessing key credibility criteria, including relative risk reduction assessment, adjustment on confounding factors, each individual feature's contribution to the subgroup's effect, interaction tests for assessing between-subgroup treatment effect interactions, and tests adjustment (multiple testing). This allows Q-Finder to directly target and assess subgroups on recommended credibility criteria. The top-k credible subgroups are then selected, while accounting for subgroups' diversity and, possibly, clinical relevance. Those subgroups are tested on independent data to assess their consistency across databases, while preserving statistical power by limiting the number of tests. To illustrate this algorithm, we applied it to the database of the International Diabetes Management Practice Study (IDMPS) to better understand the drivers of improved glycemic control and the rate of episodes of hypoglycemia in patients with type 2 diabetes. We compared Q-Finder with state-of-the-art approaches from both the Subgroup Identification and the Knowledge Discovery in Databases literature. The results demonstrate its ability to identify and support a short list of highly credible and diverse data-driven subgroups for both prognostic and predictive tasks.
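Q-Finder itself is specified in the paper; purely to illustrate the generic "enumerate candidates, apply credibility filters, keep the top-k" shape of subgroup discovery described above (not the authors' code; data, metrics, and thresholds are invented), a toy sketch:

# Toy sketch of the generic subgroup-discovery pipeline shape. NOT Q-Finder.
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
X = rng.integers(0, 2, size=(n, 4))   # four binary patient features
y = rng.normal(X[:, 0] * 0.8, 1.0)    # outcome driven by feature 0

candidates = []
for k in (1, 2):                      # conjunctions of one or two features
    for cols in combinations(range(4), k):
        mask = X[:, list(cols)].all(axis=1)
        if mask.sum() < 30:           # minimum subgroup-size filter
            continue
        _, p = stats.ttest_ind(y[mask], y[~mask])
        candidates.append((p, cols, int(mask.sum())))

m = len(candidates)                   # Bonferroni-style multiplicity adjustment
for adj_p, cols, size in sorted((p * m, c, s) for p, c, s in candidates)[:3]:
    print(f"features {cols}: n={size}, adjusted p={adj_p:.3g}")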
This is the official repository for the paper TCRIP-MIM: Rapid Intensification Prediction for Tropical Cyclone by Combining Memory In Memory Network with Sequential Satellite Images. We use the publicly available dataset from Taiwan University (Bai et al., 2019) as experimental data, consisting of four channels of TC satellite images with a temporal resolution of 3 hours; its preprocessing method is also publicly available. We use infrared and passive microwave TC satellite image sequences for our experiments, each divided into 24-hour segments (8 infrared and 8 passive microwave satellite images, 16 in total), preprocessing and enhancing the data as noted above, so there is no experimental error due to differing data preprocessing methods. In this study, the 2003–2017 TC dataset from various global basins was divided into training (1097 TCs, 43528 events), validation (188 TCs, 7884 events), and test sets (94 TCs, 3196 events).
References:
- Bai, C. Y., Chen, B. F., & Lin, H. T. (2019, September). Attention-based Deep Tropical Cyclone Rapid Intensification Prediction. In MACLEAN@PKDD/ECML. doi: 10.48550/arXiv.1909.11616
- Bai, C. Y., Chen, B. F., & Lin, H. T. (2020, September). Benchmarking Tropical Cyclone Rapid Intensification with Satellite Images and Attention-Based Deep Models. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 497-512). Springer, Cham. doi: 10.1007/978-3-030-67667-4_30
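As a sketch of the input layout described above (image size and stacking order are assumptions, not the repository's actual loader):

# Hedged sketch: one 24-hour input segment of 8 infrared + 8 passive
# microwave frames at 3-hour resolution (16 images in total).
import numpy as np

H = W = 64                                   # assumed image size
ir = np.zeros((8, H, W), dtype=np.float32)   # 8 infrared frames
pmw = np.zeros((8, H, W), dtype=np.float32)  # 8 passive microwave frames
segment = np.stack([ir, pmw], axis=1)        # (time=8, channels=2, H, W)
print(segment.shape)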
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created using the Spotify developer API. It consists of user-created as well as Spotify-curated playlists. The dataset consists of 1 million playlists, 3 million unique tracks, 3 million unique albums, and 1.3 million artists. The data is stored in a SQL database, with the primary entities being songs, albums, artists, and playlists. Each of the aforementioned entities is represented by a unique ID (Spotify URI). Data is stored in the following tables:
album
artist
track
playlist
track_artist1
track_playlist1
album
| id | name | uri |
id: Album ID as provided by Spotify
name: Album Name as provided by Spotify
uri: Album URI as provided by Spotify
artist
| id | name | uri |
id: Artist ID as provided by Spotify
name: Artist Name as provided by Spotify
uri: Artist URI as provided by Spotify
track
| id | name | duration | popularity | explicit | preview_url | uri | album_id |
id: Track ID as provided by Spotify
name: Track Name as provided by Spotify
duration: Track Duration (in milliseconds) as provided by Spotify
popularity: Track Popularity as provided by Spotify
explicit: Whether the track has explicit lyrics or not (true or false)
preview_url: A link to a 30 second preview (MP3 format) of the track; can be null
uri: Track URI as provided by Spotify
album_id: Album ID to which the track belongs
playlist
| id | name | followers | uri | total_tracks |
id: Playlist ID as provided by Spotify
name: Playlist Name as provided by Spotify
followers: Playlist Followers as provided by Spotify
uri: Playlist URI as provided by Spotify
total_tracks: Total number of tracks in the playlist
track_artist1
| track_id | artist_id |
Track-Artist association table
track_playlist1
| track_id | playlist_id |
Track-Playlist association table
The data is in the form of a SQL dump. The download size is about 10 GB, and the database populated from it comes out to about 35 GB.
spotifydbdumpschemashare.sql contains the schema for the database (for reference); spotifydbdumpshare.sql is the actual data dump.
Setup steps:
1. Create the database.
2. mysql -u -p < spotifydbdumpshare.sql
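After loading, a quick sanity check can be run against the schema documented above. A minimal sketch, assuming the mysql-connector-python package; credentials, database name, and the playlist name are invented:

# Hedged sketch: ten most popular tracks in a playlist, joining through
# the track_playlist1 association table documented above.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="spotify"
)
cur = conn.cursor()
cur.execute(
    """
    SELECT t.name, t.popularity
    FROM track AS t
    JOIN track_playlist1 AS tp ON tp.track_id = t.id
    JOIN playlist AS p ON p.id = tp.playlist_id
    WHERE p.name = %s
    ORDER BY t.popularity DESC
    LIMIT 10
    """,
    ("Discover Weekly",),  # invented playlist name
)
for name, popularity in cur.fetchall():
    print(popularity, name)
conn.close()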
The description of this dataset can be found in the following paper:
Papreja P., Venkateswara H., Panchanathan S. (2020) Representation, Exploration and Recommendation of Playlists. In: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham
Retrospectively collected medical data offers the opportunity to improve patient care through knowledge discovery and algorithm development. Broad reuse of medical data is desirable for the greatest public good, but data sharing must be done in a manner which protects patient privacy. The Medical Information Mart for Intensive Care (MIMIC)-III database provided critical care data for over 40,000 patients admitted to intensive care units at the Beth Israel Deaconess Medical Center (BIDMC). Importantly, MIMIC-III was deidentified, and patient identifiers were removed according to the Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor provision. MIMIC-III has been integral in driving large amounts of research in clinical informatics, epidemiology, and machine learning. Here we present MIMIC-IV, an update to MIMIC-III, which incorporates contemporary data and improves on numerous aspects of MIMIC-III. MIMIC-IV adopts a modular approach to data organization, highlighting data provenance and facilitating both individual and combined use of disparate data sources. MIMIC-IV is intended to carry on the success of MIMIC-III and support a broad set of applications within healthcare.
It is becoming increasingly clear that the next generation of web search and advertising will rely on a deeper understanding of user intent and task modeling, and a correspondingly richer interpretation of content on the web. How we get there, in particular, how we understand web content in richer terms than bags of words and links, is a wide open and fascinating question. I will discuss some of the options here, and look closely at the role that information extraction can play.
Speaker Bio: Raghu Ramakrishnan is Chief Scientist for Audience and Cloud Computing at Yahoo!, and is a Research Fellow, heading the Community Systems area in Yahoo! Research. He was Professor of Computer Sciences at the University of Wisconsin-Madison, and was founder and CTO of QUIQ, a company that pioneered question-answering communities, powering Ask Jeeves' AnswerPoint as well as customer support for companies such as Compaq. His research has influenced query optimization in commercial database systems, and the design of window functions in SQL:1999. His paper on the Birch clustering algorithm received the SIGMOD 10-Year Test-of-Time award, and he has written the widely used text "Database Management Systems" (with Johannes Gehrke). He is Chair of ACM SIGMOD, on the Board of Directors of ACM SIGKDD and the Board of Trustees of the VLDB Endowment, and has served as editor-in-chief of the Journal of Data Mining and Knowledge Discovery, associate editor of ACM Transactions on Database Systems, and the Database area editor of the Journal of Logic Programming. Ramakrishnan is a Fellow of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE), and has received several awards, including a Distinguished Alumnus Award from IIT Madras, a Packard Foundation Fellowship in Science and Engineering, an NSF Presidential Young Investigator Award, and an ACM SIGMOD Contributions Award.