Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data, programs, results, and analysis software for the paper "Comparison of 14 different families of classification algorithms on 115 binary data sets" https://arxiv.org/abs/1606.00930
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT This paper presents the application of data mining techniques to identify patterns in meteorological variables and their correlation with the occurrence of intense rainfall. The data used were collected between 2008 and 2012 by the surface meteorological station of the Polytechnic Institute of Rio de Janeiro State University, located in Nova Friburgo - RJ, Brazil. The main objective is the automatic prediction of extreme precipitation events in the area surrounding the meteorological station one hour prior to their occurrence. Classification models were developed based on decision trees and artificial neural networks. The steps of consistency analysis, treatment and data conversion, as well as the computational models used, are described, and some metrics are compared in order to identify their effectiveness. The most accurate model achieved a hit rate of 82.9% in predicting rainfall equal to or greater than 10 mm h⁻¹ one hour prior to its occurrence. The results indicate the possibility of using this work to predict risk events in the study region.
https://www.datainsightsmarket.com/privacy-policy
The cryptocurrency mining platform market is experiencing robust growth, driven by the increasing adoption of cryptocurrencies and the ongoing evolution of mining technologies. The market, valued at approximately $2.5 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching an estimated market value exceeding $8 billion by 2033. This expansion is fueled by several key factors, including the increasing sophistication of mining hardware, the rise of cloud-based mining solutions offering accessibility to individual investors, and the ongoing development of more energy-efficient mining algorithms. The market is segmented by platform type (cloud-based, software-based, hardware-based), target users (individual miners, mining pools), and geographic region, with North America and Europe currently dominating market share. However, the market is not without its challenges. Regulatory uncertainties surrounding cryptocurrency mining in various jurisdictions pose a significant restraint on growth. Fluctuations in cryptocurrency prices also impact profitability, making it a volatile market for both miners and platform providers. Furthermore, the increasing energy consumption associated with cryptocurrency mining and the growing concerns about environmental sustainability are pushing for the adoption of more eco-friendly mining practices and technologies, thereby influencing platform development and adoption. The competitive landscape is intense, with a range of established players like NiceHash and newer entrants like Salad competing for market share. The success of these platforms hinges on factors such as ease of use, security features, profitability, and the ongoing support of the cryptocurrency ecosystem. The market will continue to evolve, influenced by technological advancements, regulatory developments, and the overall health of the cryptocurrency market.
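The growth figures quoted above can be checked with a short compound-growth calculation. This is only a sketch: the function name is made up for illustration, and the number of compounding steps is an assumption, since the text does not say whether 2025 counts as the base year. With eight steps the projection comes to roughly $7.6 billion, and with nine steps to roughly $8.8 billion, so the "exceeding $8 billion" figure appears to depend on the counting convention.

```python
def project_value(start_value: float, annual_rate: float, years: int) -> float:
    """Compound start_value at annual_rate for the given number of years."""
    return start_value * (1.0 + annual_rate) ** years


start_billion = 2.5   # 2025 valuation quoted above, in billions of USD
cagr = 0.15           # 15% compound annual growth rate quoted above

# 2025 -> 2033 is eight annual steps if 2025 is treated as the base year.
eight_steps = project_value(start_billion, cagr, 8)
# Nine steps if growth is also counted across the base year (an assumption).
nine_steps = project_value(start_billion, cagr, 9)

print(f"8 compounding steps: ${eight_steps:.2f}B")
print(f"9 compounding steps: ${nine_steps:.2f}B")
```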
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GDP FROM MINING reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
This dataset is intended to improve results for machine learning algorithms and other techniques used in data mining.
It comprises two columns: the first contains comparative reviews and the second contains their polarities.
I thank my supervisor, Dr Muhammad Zubair Asghar, Assistant Professor, ICIT, Gomal University (KPK), D.I. Khan. Without his guidance, I could not have accomplished this task.
Comparative opinion mining is becoming the most popular research area in the field of data mining. These three comparative review datasets will help researchers who work on opinion mining and sentiment analysis.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for MINING PRODUCTION reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.
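The idea of scoring sensor sequences with an HMM can be illustrated with a minimal sketch. It assumes a discrete two-state model whose parameters, symbol coding, and threshold are all hypothetical (none of them come from the work described above), and it flags a sequence as anomalous when its average per-step log-likelihood under the model is low. The sequence is assumed to be non-empty.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n_states = len(start)
    # Initialise with the start distribution and the first emission.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


# Hypothetical two-state model: state 0 = nominal, state 1 = degraded.
START = [0.9, 0.1]
TRANS = [[0.95, 0.05], [0.10, 0.90]]
EMIT = [[0.9, 0.1], [0.3, 0.7]]   # EMIT[state][symbol]; symbols are 0 and 1
THRESHOLD = -0.5                   # assumed per-step log-likelihood cutoff


def is_anomalous(obs):
    """Flag a sequence whose average per-step log-likelihood is below the cutoff."""
    return log_likelihood(obs, START, TRANS, EMIT) / len(obs) < THRESHOLD


steady = [0] * 10        # a switch that stays off
toggling = [0, 1] * 5    # a switch that flips at every step
print(is_anomalous(steady), is_anomalous(toggling))
```

An envelope detector would look only at whether each individual reading is within bounds; the sequence score above also reacts to unusual orderings of readings that are individually valid.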
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data on the impact on national parliaments resulting from the Covid-19 pandemic mined from country reports published by the Lex-Atlas: Covid-19 project and the Oxford University Press. For more information see https://lexatlas-c19.org
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Turkish comments for 128 venues in Foursquare Social Network Platform (binary and ternary classified)
2. Turkish adjectives and polarities
3. Turkish food and drink names
4. All comments without tagging
5. Venues, liked meals/foods
miners mines mining-contractors mining-data mining-fatalities mining-injuries mining-injury-rates mining-operators mining-safety-statistics mining-statistics mining-trends mining-yearly-comparisons msha msha-at-the-glance
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for EXPORT NATURAL HYDROCARBONS PRDS OF MINING ELE reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 4. Results of parameter selection.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GDP FROM MINING reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Metallic Mining Development Potential Index: This global dataset continuously ranks from >0 (minimal) to 1 (highest) all suitable lands at a 1-km resolution for metallic mining (e.g. gold, silver, copper). These data are derived from the weighted summary of four criteria maps: 1) proxy yield values based on deposit size and numbers, 2) distance to demand centers, 3) distance to major roads, and 4) distance to railways or ports. Included with these data are: a) the classified version of the continuous DPI, b) the corresponding classified uncertainty dataset, c) detailed sensitivity tables for all criteria used in the analysis, and d) a full description of the constraints and criteria used in the analysis with the Analytic Hierarchy Process (AHP) pairwise comparison matrix and resulting criteria weights derived from AHP.
Non-Metallic Mining Development Potential Index: This global dataset continuously ranks from >0 (minimal) to 1 (highest) all suitable lands at a 1-km resolution for non-metallic mining (e.g. sand and gravel mining). These data are derived from the weighted summary of four criteria maps: 1) proxy yield values based on deposit size and numbers, 2) distance to demand centers, 3) distance to major roads, and 4) distance to railways or ports. Included with these data are: a) the classified version of the continuous DPI, b) the corresponding classified uncertainty dataset, c) detailed sensitivity tables for all criteria used in the analysis, and d) a full description of the constraints and criteria used in the analysis with the Analytic Hierarchy Process (AHP) pairwise comparison matrix and resulting criteria weights derived from AHP.
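How a pairwise comparison matrix turns into criteria weights, and how those weights combine criteria maps into a single index, can be sketched as follows. The matrix values and the cell scores are hypothetical, and the row geometric-mean method is used here as a common approximation of the AHP priority vector; the dataset's own weights may have been derived differently (for example from the principal eigenvector).

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def development_potential(criteria, weights):
    """Weighted sum of criterion scores that are already scaled to [0, 1]."""
    return sum(w * c for w, c in zip(weights, criteria))


# Hypothetical reciprocal comparison matrix for the four criteria, in the order:
# yield proxy, distance to demand centers, distance to roads, distance to rail/ports.
PAIRWISE = [
    [1.0, 2.0, 3.0, 4.0],
    [1 / 2, 1.0, 2.0, 3.0],
    [1 / 3, 1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 3, 1 / 2, 1.0],
]

weights = ahp_weights(PAIRWISE)
# Hypothetical scaled scores for one 1-km cell (1 = most favourable).
cell_scores = [0.8, 0.5, 0.9, 0.3]
dpi = development_potential(cell_scores, weights)
print([round(w, 3) for w in weights], round(dpi, 3))
```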
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains more than 17,000 records of credit card holders, with 20 predictor variables and 1 binary target variable. The corresponding R code for comparing several proposed (density-based) and existing (SMOTE-based) synthetic oversampling methods is also provided.
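For readers unfamiliar with SMOTE-style oversampling, the core interpolation step can be sketched in a few lines. This is an illustration in Python rather than the R code shipped with the dataset, the function name and toy points are hypothetical, and it covers only the basic SMOTE idea, not the density-based methods the dataset's authors propose.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating towards a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class records with two numeric predictors.
minority_points = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_like(minority_points, n_new=6)
print(len(new_points))
```

Because each synthetic record lies on the segment between two existing minority records, the new points stay inside the region the minority class already occupies.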
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Key Characteristics of Algorithms' Dynamics Beyond Accuracy - Evaluation Tests conducted for the paper: What do anomaly scores actually mean? Key characteristics of algorithms' dynamics beyond accuracy by F. Iglesias, H. O. Marques, A. Zimek, T. Zseby
Context and methodology
Anomaly detection is intrinsic to a large number of data analysis applications today. Most of the algorithms used assign an outlierness score to each instance prior to establishing anomalies in a binary form. The experiments in this repository study how different algorithms generate different dynamics in the outlierness scores and react in very different ways to possible model perturbations that affect data. The study elaborated in the referred paper presents new indices and coefficients to assess the dynamics and explores the responses of the algorithms as a function of variations in these indices, revealing key aspects of the interdependence between algorithms, data geometries and the ability to discriminate anomalies.
Therefore, this repository reproduces the conducted experiments, which study eight algorithms (ABOD, HBOS, iForest, K-NN, LOF, OCSVM, SDO and GLOSH), subjected to seven perturbations related to: cardinality, dimensionality, outlier proportion, inlier-outlier density ratio, density layers, clusters and local outliers, and collects behavioural profiles with eleven measurements (Adjusted Average Precision, ROC-AUC, Perini's Confidence [1], Perini's Stability [2], S-curves, Discriminant Power, Robust Coefficients of Variations for Inliers and Outliers, Coherence, Bias and Robustness) under two types of normalization: linear and Gaussian, the latter aiming to standardize the outlierness scores issued by different algorithms [3].
This repository is framed within the research on the following domains: algorithm evaluation, outlier detection, anomaly detection, unsupervised learning, machine learning, data mining, data analysis.
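The two score normalizations mentioned above can be sketched as follows. The linear variant is plain min-max scaling; the Gaussian variant assumes the error-function scaling commonly associated with [3], and the repository's actual implementation may differ. The raw scores are hypothetical.

```python
import math
import statistics


def linear_normalize(scores):
    """Min-max scaling of raw outlierness scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def gaussian_normalize(scores):
    """Gaussian scaling: max(0, erf((s - mean) / (std * sqrt(2))))."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    if sigma == 0.0:
        return [0.0 for _ in scores]
    return [max(0.0, math.erf((s - mu) / (sigma * math.sqrt(2.0)))) for s in scores]


# Hypothetical raw scores from one detector; the last instance is the outlier.
raw_scores = [1.0, 1.1, 0.9, 1.2, 1.0, 5.0]
linear_scores = linear_normalize(raw_scores)
gaussian_scores = gaussian_normalize(raw_scores)
print([round(s, 3) for s in linear_scores])
print([round(s, 3) for s in gaussian_scores])
```

Under the Gaussian scaling, every score at or below the mean maps to 0, so only instances that stand out from the bulk receive a non-zero value, which is what makes scores from different algorithms easier to compare.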
Datasets and algorithms can be used for experiment replication and for further evaluation and comparison.
References
[1] Perini, L., Vercruyssen, V., Davis, J.: Quantifying the confidence of anomaly detectors in their example-wise predictions. In: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. Springer Verlag (2020).
[2] Perini, L., Galvin, C., Vercruyssen, V.: A Ranking Stability Measure for Quantifying the Robustness of Anomaly Detection Methods. In: 2nd Workshop on Evaluation and Experimental Design in Data Mining and Machine Learning @ ECML/PKDD (2020).
[3] Kriegel, H.-P., Kröger, P., Schubert, E., Zimek, A.: Interpreting and unifying outlier scores. In: Proceedings of the 2011 SIAM International Conference on Data Mining (SDM), pp. 13–24 (2011).
Technical details
Experiments are in Python 3. Provided scripts generate all data and results. We keep them in the repo for the sake of comparability and replicability. The file and folder structure is as follows:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Technical notes and documentation on the common data model of the project CONCEPT-DM2.
This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.
Aims of the CONCEPT-DM2 project:
General aim: To analyse chronic care effectiveness and the efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes, using real-world data (RWD) from five Spanish Regional Health Systems.
Main specific aims:
Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services using the electronic medical record systems including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and also other registers such as the mortality and the population register.
Cohort definition: All patients with a Type 2 Diabetes code in their clinical health records.
Files included in this publication:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The "Dataset_HIR" folder contains the data to reproduce the results of the data mining approach proposed in the manuscript titled "Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model".
More specifically, the folder contains the raw electronic structure calculation input data provided by the domain experts as well as the training and testing dataset with the extracted features.
The "Dataset_HIR" folder contains the following subfolders:
1. Electronic structure calculation input data: contains the electronic structure calculation input generated by the Gaussian program
1.1. Training data: contains the raw data of all training species (each is stored in a separate folder) used for extracting the dataset for the training and validation phase.
1.2. Testing data: contains the raw data of all testing species (each is stored in a separate folder) used for extracting data for the testing phase.
2. Dataset
2.1. Training dataset: used to produce the results in Tables 3 and 4 in the manuscript
+ datasetTrain_raw.csv: contains the features for all vibrational modes associated with corresponding labeled species to let the chemists select the Hindered Internal Rotor from the list easily for the training and validation steps.
+ datasetTrain.csv: refines the datasetTrain_raw.csv where the names of the species are all removed to transform the dataset into an appropriate form for the modeling and validation steps.
2.2. Testing dataset: used to produce the results of the data mining approach in Table 5 in the manuscript.
+ datasetTest_raw.csv: contains the features for all vibrational modes of each labeled species to let the chemists select the Hindered Internal Rotor from the list for the testing step.
+ datasetTest.csv: refines the datasetTest_raw.csv where the names of the species are all removed to transform the dataset into an appropriate form for the testing step.
Note on the Result feature in the dataset: 1 indicates a mode that needs to be treated as a Hindered Internal Rotor, and 0 otherwise.
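How a binary label such as Result feeds a multivariate logistic regression can be illustrated with a small sketch. Everything here is hypothetical: the two numeric features and their values are invented, and plain gradient descent stands in for whatever fitting procedure the manuscript used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (treat as Hindered Internal Rotor) or 0 (otherwise)."""
    return 1 if sigmoid(bias + sum(w * v for w, v in zip(weights, x))) >= 0.5 else 0


# Hypothetical feature vectors for six vibrational modes, with Result labels.
features = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9], [0.2, 0.4], [0.5, 0.1], [0.3, 0.6]]
result = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(features, result)
predictions = [predict(weights, bias, x) for x in features]
print(predictions)
```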