Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT The present study aimed at comparing the predictive performance of several data mining algorithms (CART, CHAID, Exhaustive CHAID, MARS, MLP, and RBF) on biometrical data of Mengali rams. To compare the predictive capability of the algorithms, body measurements (body length, withers height, and heart girth) and testicular measurements (testicular length, scrotal length, and scrotal circumference) of Mengali rams were used to predict live body weight, and the predictions were evaluated with several goodness-of-fit criteria. In addition, age was included as a continuous independent variable. In this context, the MARS data mining algorithm was used for the first time to predict body weight in two forms, without (MARS_1) and with (MARS_2) interaction terms. The order of predictive accuracy of the algorithms was found to be CART > CHAID ≈ Exhaustive CHAID > MARS_2 > MARS_1 > RBF > MLP. Moreover, all tested algorithms provided strong predictive accuracy for estimating body weight; however, MARS was the only algorithm that generated a prediction equation for body weight. It is therefore hoped that these results will make a valuable contribution to predicting body weight, to describing the relationship between body weight and body and testicular measurements, and to revealing breed standards and conserving indigenous gene resources in Mengali sheep breeding, making more profitable and productive sheep production possible. The use of data mining algorithms is useful for revealing the relationship between body weight and testicular traits when describing breed standards of Mengali sheep.
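As a rough illustration of this kind of algorithm comparison, the sketch below fits a CART-style regression tree and an MLP to predict body weight and reports two goodness-of-fit criteria. The file and column names are hypothetical, and CHAID, MARS and RBF models would need dedicated packages not shown here; this is not the original study's code.

```python
# Hypothetical sketch: comparing a tree-based and a neural regressor for body
# weight prediction, in the spirit of the study. Column names are assumptions,
# not taken from the original data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor          # CART-style tree
from sklearn.neural_network import MLPRegressor         # multilayer perceptron
from sklearn.metrics import r2_score, mean_squared_error

df = pd.read_csv("mengali_rams.csv")                     # hypothetical file name
X = df[["age", "body_length", "withers_height", "heart_girth",
        "testicular_length", "scrotal_length", "scrotal_circumference"]]
y = df["body_weight"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for name, model in [("CART", DecisionTreeRegressor(max_depth=5, random_state=42)),
                    ("MLP", MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                                         random_state=42))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, "R2=%.3f" % r2_score(y_te, pred),
          "RMSE=%.2f" % mean_squared_error(y_te, pred) ** 0.5)
```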
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As there was no large publicly available cross-domain dataset for comparative argument mining, we created one composed of sentences annotated with BETTER / WORSE markers (the first object is better / worse than the second object) or NONE (the sentence does not contain a comparison of the target objects). BETTER sentences stand for a pro-argument in favor of the first compared object; WORSE sentences represent a con-argument and favor the second object.

We aimed to minimize domain-specific biases in the dataset in order to capture the nature of comparison rather than the nature of particular domains, and therefore controlled the specificity of the domains through the selection of comparison targets. We hypothesized, and could confirm in preliminary experiments, that comparison targets usually have a common hypernym (i.e., are instances of the same class), which we exploited when selecting the pairs of compared objects.

The most specific domain we chose is computer science, with comparison targets such as programming languages, database products, and technology standards such as Bluetooth or Ethernet. Many computer science concepts can be compared objectively (e.g., on transmission speed or suitability for certain applications). The objects for this domain were manually extracted from 'List of ...' articles on Wikipedia. In the annotation process, annotators were asked to label sentences from this domain only if they had some basic knowledge of computer science. The second, broader domain is brands. It contains objects of different types (e.g., cars, electronics, and food). As brands are present in everyday life, anyone should be able to label the majority of sentences containing well-known brands such as Coca-Cola or Mercedes. Again, targets for this domain were manually extracted from 'List of ...' articles on Wikipedia. The third domain, random, is not restricted to any topic: for each of 24 randomly selected seed words, 10 similar words were collected using the distributional similarity API of JoBimText (http://www.jobimtext.org). The seed words were created using randomlists.com: book, car, carpenter, cellphone, Christmas, coffee, cork, Florida, hamster, hiking, Hoover, Metallica, NBC, Netflix, ninja, pencil, salad, soccer, Starbucks, sword, Tolkien, wine, wood, XBox, Yale.

Especially for brands and computer science, the resulting object lists were large (4,493 objects in brands and 1,339 in computer science). In a manual inspection, low-frequency and ambiguous objects were removed from all lists (e.g., RAID (a hardware concept) and Unity (a game engine) are also regularly used nouns). The remaining objects were combined into pairs: for each object type (seed Wikipedia list page or seed word), all possible combinations were created; a minimal sketch of this pairing step is shown below. These pairs were then used to find sentences containing both objects. This approach to selecting pairs of compared objects tends to minimize the inclusion of domain-specific data, but does not solve the problem entirely; we leave extending the dataset with more diverse object pairs, including abstract concepts, for future work.

For the sentence mining, we used the publicly available index of dependency-parsed sentences from the Common Crawl corpus, containing over 14 billion English sentences filtered for duplicates. This index was queried for sentences containing both objects of each pair.
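The pairing step described above can be sketched roughly as follows; the group contents are illustrative stand-ins, not the actual object lists.

```python
# Minimal sketch of the pair-building step: within each seed group (a Wikipedia
# "List of ..." page or a seed word plus its distributionally similar terms),
# all pairwise combinations are formed.
from itertools import combinations

object_groups = {
    "programming_languages": ["Python", "Java", "Ruby", "Perl"],
    "coffee": ["coffee", "espresso", "latte"],   # hypothetical similar words
}

pairs = [(a, b)
         for group in object_groups.values()
         for a, b in combinations(group, 2)]
print(len(pairs), pairs[:3])
```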
For 90% of the pairs, we also added comparative cue words (better, easier, faster, nicer, wiser, cooler, decent, safer, superior, solid, terrific, worse, harder, slower, poorly, uglier, poorer, lousy, nastier, inferior, mediocre) to the query in order to bias the selection towards comparisons, while at the same time admitting comparisons that do not contain any of the anticipated cues (a rough stand-in for this filtering step is sketched after this description). This was necessary because random sampling would have yielded only a very small fraction of comparisons. Note that even sentences containing a cue word do not necessarily express a comparison between the desired targets (dog vs. cat: "He's the best pet that you can get, better than a dog or cat."). It is thus especially crucial to enable a classifier to learn not to rely on the existence of cue words alone (which would be very likely with a random sample containing very few comparisons). For our corpus, we kept pairs with at least 100 retrieved sentences.

From all sentences of those pairs, 2,500 per category were randomly sampled as candidates for a crowdsourced annotation that we conducted on figure-eight.com in several small batches. Each sentence was annotated by at least five trusted workers. We ranked annotations by confidence, the figure-eight-internal measure combining annotator trust and voting, and discarded annotations with a confidence below 50%. Of all annotated items, 71% received unanimous votes, and for over 85% at least 4 out of 5 workers agreed, rendering the collection procedure, which was designed for ease of annotation, successful.

The final dataset contains 7,199 sentences with 271 distinct object pairs. The majority of sentences (over 72%) are non-comparative despite biasing the selection with cue words; in 70% of the comparative sentences, the favored target is named first.

You can browse through the data here: https://docs.google.com/spreadsheets/d/1U8i6EU9GUKmHdPnfwXEuBxi0h3aiRCLPRC-3c9ROiOE/edit?usp=sharing A full description of the dataset is available in the workshop paper at the ACL 2019 conference. Please cite this paper if you use the data: Franzek, Mirco, Alexander Panchenko, and Chris Biemann. "Categorization of Comparative Sentences for Argument Mining." arXiv preprint arXiv:1809.06152 (2018).

@inproceedings{franzek2018categorization, title={Categorization of Comparative Sentences for Argument Mining}, author={Panchenko, Alexander and Bondarenko, Alexander and Franzek, Mirco and Hagen, Matthias and Biemann, Chris}, booktitle={Proceedings of the 6th Workshop on Argument Mining at ACL 2019}, year={2019}, address={Florence, Italy}}
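The cue-word-biased candidate selection referred to above can be approximated locally as in the following sketch; the real pipeline queried a dependency-parsed Common Crawl index, so this is only a stand-in for that filtering logic.

```python
# Illustrative filter over candidate sentences: keep those mentioning both
# objects, optionally requiring one of the comparative cue words used to bias
# 90% of the queries.
import re

CUE_WORDS = {"better", "easier", "faster", "nicer", "wiser", "cooler", "decent",
             "safer", "superior", "solid", "terrific", "worse", "harder",
             "slower", "poorly", "uglier", "poorer", "lousy", "nastier",
             "inferior", "mediocre"}

def is_candidate(sentence, obj_a, obj_b, require_cue=True):
    tokens = set(re.findall(r"[A-Za-z']+", sentence.lower()))
    if obj_a.lower() not in tokens or obj_b.lower() not in tokens:
        return False
    return not require_cue or bool(tokens & CUE_WORDS)

print(is_candidate("Python is faster than Ruby for this task.", "Python", "Ruby"))
```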
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT This paper presents the application of data mining techniques for identifying patterns in meteorological variables and their correlation with the occurrence of intense rainfall. The data used were collected between 2008 and 2012 by the surface meteorological station of the Polytechnic Institute of Rio de Janeiro State University, located in Nova Friburgo - RJ, Brazil. The main objective is the automatic prediction of extreme precipitation events in the surroundings of the meteorological station one hour prior to their occurrence. Classification models were developed based on decision trees and artificial neural networks. The steps of consistency analysis, data treatment and conversion, as well as the computational models used, are described, and several metrics are compared in order to assess their effectiveness. The most accurate model achieved a hit rate of 82.9% for predicting rainfall equal to or greater than 10 mm h-1 one hour before its occurrence. The results indicate that this work can be used to predict risk events in the study region.
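A minimal sketch of the decision-tree variant of this setup is given below, assuming hypothetical feature and file names; it is not the authors' model, only an illustration of predicting whether rainfall in the next hour reaches 10 mm h-1.

```python
# Hypothetical sketch: predict heavy rain (>= 10 mm/h) one hour ahead from
# current meteorological readings. Feature names and file layout are assumed.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("station_hourly.csv")                   # hypothetical file
df["heavy_rain_next_hour"] = (df["rain_mm"].shift(-1) >= 10).astype(int)
df = df.dropna()

features = ["temperature", "relative_humidity", "pressure", "wind_speed"]
X_tr, X_te, y_tr, y_te = train_test_split(
    df[features], df["heavy_rain_next_hour"], test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_tr, y_tr)
print("hit rate: %.1f%%" % (100 * accuracy_score(y_te, clf.predict(X_te))))
```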
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data, programs, results, and analysis software for the paper "Comparison of 14 different families of classification algorithms on 115 binary data sets" https://arxiv.org/abs/1606.00930
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The graph shows the changes in the impact factor of ^ and its corresponding percentile, for comparison with the entire literature. The Impact Factor is the most common scientometric index; it is defined as the number of citations in a given year to papers published in the two preceding years, divided by the number of papers published in those two years.
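The verbal definition above corresponds to the standard two-year Impact Factor formula, written out here for reference:

```latex
% Two-year Impact Factor: citations received in year y to items published in
% the two preceding years, divided by the number of items published in those years.
\[
\mathrm{IF}_{y} \;=\; \frac{C_{y}(P_{y-1}) + C_{y}(P_{y-2})}{\lvert P_{y-1}\rvert + \lvert P_{y-2}\rvert}
\]
```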
Objective: Over the past decades, many studies have used data mining technology to predict the 5-year survival rate of colorectal cancer, but few reports have compared multiple data mining algorithms to the TNM classification of malignant tumors (TNM) staging system using a dataset in which the training and testing data were from different sources. Here we compared nine data mining algorithms to the TNM staging system for colorectal survival analysis. Methods: Two different datasets were used: 1) the National Cancer Institute's Surveillance, Epidemiology, and End Results dataset; and 2) a dataset from a single Chinese institution. An optimization and prediction system based on nine data mining algorithms as well as two variable selection methods was implemented. The TNM staging system was based on the 7th edition of the American Joint Committee on Cancer TNM staging system. Results: When the training and testing data were from the same source, all algorithms had slight advantages over the TNM staging system in predictive accuracy. When the data were from different sources, only four algorithms (logistic regression, general regression neural network, Bayesian networks, and Naïve Bayes) had slight advantages over the TNM staging system, and there were no significant differences among the algorithms (p>0.05). Conclusions: The TNM staging system is simple and practical, and data mining methods are not yet accurate enough to replace it for colorectal cancer survival prediction. Furthermore, there were no significant differences in the predictive accuracy of the algorithms when the data were from different sources. Building a larger dataset that includes more variables may be important for improving predictive accuracy.
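As an illustration of the cross-source evaluation described above, the sketch below trains two of the listed algorithms on one dataset and tests them on another; the file and column names are assumptions, not the study's actual variables.

```python
# Illustrative cross-source evaluation: train on one cohort, test on another.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

train = pd.read_csv("source_a_colorectal.csv")           # hypothetical files
test = pd.read_csv("source_b_colorectal.csv")
features = ["age", "tumor_size", "positive_nodes", "grade"]
target = "survived_5_years"

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("naive Bayes", GaussianNB())]:
    model.fit(train[features], train[target])
    acc = accuracy_score(test[target], model.predict(test[features]))
    print(f"{name}: accuracy {acc:.3f}")
```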
To support improved results with machine learning algorithms and other techniques used in data mining.
The dataset comprises two columns: the first column contains the comparative reviews and the second column contains their polarities.
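A minimal loading sketch for this layout, assuming a headerless CSV file with a hypothetical name:

```python
# Load the two-column layout: review text in the first column, polarity label
# in the second. File name and the absence of a header row are assumptions.
import pandas as pd

df = pd.read_csv("comparative_reviews.csv", header=None,
                 names=["review", "polarity"])
print(df["polarity"].value_counts())
print(df.head())
```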
I thank my supervisor, Dr Muhammad Zubair Asghar, Assistant Professor, ICIT, Gomal University (KPK), D.I. Khan. Without his guidance, I could not have accomplished this task.
Comparative opinion mining is becoming one of the most popular research areas in the field of data mining. These three comparative review datasets will help researchers working in the areas of opinion mining and sentiment analysis.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The graph shows the changes in the impact factor of ^ and its corresponding percentile, for comparison with the entire literature. The Impact Factor is the most common scientometric index; it is defined as the number of citations in a given year to papers published in the two preceding years, divided by the number of papers published in those two years.
We discuss a statistical framework that underlies envelope detection schemes as well as dynamical models based on Hidden Markov Models (HMM) that can encompass both discrete and continuous sensor measurements for use in Integrated System Health Management (ISHM) applications. The HMM allows for the rapid assimilation, analysis, and discovery of system anomalies. We motivate our work with a discussion of an aviation problem where the identification of anomalous sequences is essential for safety reasons. The data in this application are discrete and continuous sensor measurements and can be dealt with seamlessly using the methods described here to discover anomalous flights. We specifically treat the problem of discovering anomalous features in the time series that may be hidden from the sensor suite and compare those methods to standard envelope detection methods on test data designed to accentuate the differences between the two methods. Identification of these hidden anomalies is crucial to building stable, reusable, and cost-efficient systems. We also discuss a data mining framework for the analysis and discovery of anomalies in high-dimensional time series of sensor measurements that would be found in an ISHM system. We conclude with recommendations that describe the tradeoffs in building an integrated scalable platform for robust anomaly detection in ISHM applications.
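A rough sketch of the HMM-based idea is given below using the third-party hmmlearn package; the data shapes, number of hidden states and thresholding rule are illustrative assumptions, not the method's actual configuration.

```python
# Hypothetical sketch: fit a Gaussian HMM on sensor sequences from nominal
# flights, then flag flights whose per-sample log-likelihood under the model
# is unusually low.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
nominal_flights = [rng.normal(size=(200, 4)) for _ in range(20)]   # toy sensor data
test_flights = [rng.normal(size=(200, 4)) for _ in range(5)]

X = np.vstack(nominal_flights)
lengths = [len(f) for f in nominal_flights]
model = GaussianHMM(n_components=3, covariance_type="diag",
                    n_iter=50, random_state=0).fit(X, lengths)

scores = np.array([model.score(f) / len(f) for f in test_flights])
threshold = scores.mean() - 3 * scores.std()
print("anomalous flights:", np.where(scores < threshold)[0])
```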
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for MINING PRODUCTION reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Python notebooks/scripts in this repository preprocess the OULAD dataset and train classifiers to compare prediction results on social science (SS) vs. STEM courses. An R script is available to analyse the collected results.
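A rough sketch of what such a preprocessing and classification step might look like, assuming the standard OULAD studentInfo.csv layout; the feature choice and model are simplified stand-ins for what the notebooks actually do.

```python
# Hypothetical sketch: predict pass/fail from demographic and registration
# features in OULAD's studentInfo.csv. The feature set and model are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

info = pd.read_csv("studentInfo.csv")
info["pass"] = info["final_result"].isin(["Pass", "Distinction"]).astype(int)

features = pd.get_dummies(
    info[["gender", "highest_education", "age_band", "num_of_prev_attempts",
          "studied_credits", "disability"]])

scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                         features, info["pass"], cv=5)
print("mean CV accuracy: %.3f" % scores.mean())
```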
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 4. Results of parameter selection.
https://www.datainsightsmarket.com/privacy-policy
Discover the booming cryptocurrency mining platform market! This comprehensive analysis reveals a $2.5B (2025) market projected to reach over $8B by 2033, driven by increased crypto adoption and technological advancements. Learn about key players, growth drivers, and market trends.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
1. Turkish comments for 128 venues on the Foursquare social network platform (binary and ternary classified)
2. Turkish adjectives and polarities
3. Turkish food and drink names
4. All comments without tagging
5. Venues and liked meals/foods
https://www.wiseguyreports.com/pages/privacy-policy
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2023 |
| REGIONS COVERED | North America, Europe, APAC, South America, MEA |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2024 | 2.87 (USD Billion) |
| MARKET SIZE 2025 | 3.15 (USD Billion) |
| MARKET SIZE 2035 | 8.0 (USD Billion) |
| SEGMENTS COVERED | Application, Deployment Type, End Use, Size of Organization, Regional |
| COUNTRIES COVERED | US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA |
| KEY MARKET DYNAMICS | Increasing data volume, Rising demand for automation, Advancements in AI technologies, Growing e-commerce sector, Emphasis on data analysis |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | Octoparse, IBM, Bing, Moz, Oracle, Ahrefs, Diffbot, WebHarvy, DataMiner, Import.io, Microsoft, ParseHub, Scrapy, Amazon, Google, Yahoo |
| MARKET FORECAST PERIOD | 2025 - 2035 |
| KEY MARKET OPPORTUNITIES | Increased demand for data analytics, Growing emphasis on SEO strategies, Rising usage of AI technology, Expansion in e-commerce sector, Enhanced cloud-based solutions. |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 9.8% (2025 - 2035) |
Abstract: Long-term mining planning is a complex process which involves a large number of variables and uncertainties. Traditional discounted cash flow (DCF) analysis is usually used in the evaluation of mining projects; its indicators include net present value (NPV), internal rate of return (IRR), and profitability index (PI). A sensitivity analysis is usually carried out to evaluate the impact of the main variables on the project. Another way to measure uncertainties is through Monte Carlo simulation (MCS). The objective of this study is to evaluate and compare the DCF methods and to measure uncertainties through sensitivity analysis and MCS in the evaluation of mine sequencing. A case study of a phosphate mine project was used to carry out the comparative study. In the results, NPV together with uncertainty analysis through MCS proved more consistent.
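The two evaluation routes discussed in the abstract can be illustrated with the following sketch: a deterministic NPV from a projected cash flow, and a simple Monte Carlo simulation that propagates cash-flow uncertainty into an NPV distribution. All figures are invented for illustration.

```python
# Illustrative NPV and Monte Carlo simulation over uncertain cash flows.
import numpy as np

def npv(rate, cash_flows):
    """Cash flow at t=0 is the (negative) initial investment."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

discount_rate = 0.10
base_case = [-100.0, 30.0, 35.0, 40.0, 45.0]          # hypothetical, in $M
print("deterministic NPV: %.2f" % npv(discount_rate, base_case))

rng = np.random.default_rng(1)
n_sims = 10_000
# Perturb each operating year's cash flow by +/-20% (uniform); keep investment fixed.
noise = rng.uniform(0.8, 1.2, size=(n_sims, len(base_case) - 1))
sims = np.hstack([np.full((n_sims, 1), base_case[0]), noise * base_case[1:]])
npvs = np.array([npv(discount_rate, row) for row in sims])
print("MCS NPV: mean %.2f, 5th pct %.2f, P(NPV<0)=%.3f"
      % (npvs.mean(), np.percentile(npvs, 5), (npvs < 0).mean()))
```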
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
LScDC Word-Category RIG Matrix
April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

Getting Started
This file describes the Word-Category RIG Matrix for the Leicester Scientific Corpus (LSC) [1], the procedure to build the matrix, and introduces the Leicester Scientific Thesaurus (LScT) together with its construction process. The Word-Category RIG Matrix is a 103,998 by 252 matrix, where rows correspond to words of the Leicester Scientific Dictionary-Core (LScDC) [2] and columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry in the matrix corresponds to a pair (category, word); its value shows the Relative Information Gain (RIG) on the belonging of a text from the LSC to the category from observing the word in this text. The CSV file of the Word-Category RIG Matrix in the published archive includes two additional columns: the sum of RIGs over categories and the maximum of RIGs over categories (the last two columns of the matrix). The file 'Word-Category RIG Matrix.csv' therefore contains a total of 254 columns. This matrix is created to be used in future research on quantifying meaning in scientific texts, under the assumption that words have scientifically specific meanings in subject categories and that this meaning can be estimated by the information gained from a word about the categories.

LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English. The thesaurus includes a list of 5,000 words from the LScDC. We order the words of the LScDC by the sum of their RIGs over categories; that is, words are arranged by their informativeness in the scientific corpus LSC, and the meaningfulness of a word is evaluated by its average informativeness across the categories. We decided to include the 5,000 most informative words in the scientific thesaurus.

Words as a Vector of Frequencies in WoS Categories
Each word of the LScDC is represented as a vector of frequencies in WoS categories. Given the collection of LSC texts, each entry of the vector is the number of texts in the corresponding category that contain the word. It is noteworthy that texts in a corpus do not necessarily belong to a single category; they are likely to correspond to multidisciplinary studies, especially in a corpus of scientific texts, so categories are not exclusive. There are 252 WoS categories, and a text can be assigned to at least 1 and at most 6 categories in the LSC. Using a binary counting of occurrences, we record the presence of a word in a category and create a vector of frequencies for each word, whose dimensions are the categories in the corpus. The collection of vectors, for all words and categories in the entire corpus, can be shown as a table in which each entry corresponds to a pair (word, category). This table is built for the LScDC with 252 WoS categories and is presented in the published archive with this file. The value of each entry shows how many times a word of the LScDC appears in a WoS category; the occurrence of a word in a category is determined by counting the number of LSC texts in that category which contain the word.
Words as a Vector of Relative Information Gains Extracted for Categories
In this section, we introduce our approach to representing a word as a vector of relative information gains for categories, under the assumption that the meaning of a word can be quantified by the information it provides about the categories. For each category, a function is defined on texts that takes the value 1 if the text belongs to the category, and 0 otherwise. For each word, a function is defined on texts that takes the value 1 if the word appears in the text, and 0 otherwise. Consider the LSC as a probabilistic sample space (the space of equally probable elementary outcomes). For these Boolean random variables, the joint probability distribution, the entropy, and the information gains are defined. The information gain about the category from the word is the amount of information on the belonging of a text from the LSC to the category obtained from observing the word in the text [6]. We used the Relative Information Gain (RIG), which provides a normalised measure of the information gain and so allows information gains to be compared across categories. The calculations of entropy, Information Gain and Relative Information Gain can be found in the README file in the published archive; a small illustrative computation is also sketched below.

Given a word, we create a vector in which each component corresponds to a category, so each word is represented as a vector of relative information gains whose dimension is the number of categories. The set of these vectors forms the Word-Category RIG Matrix, in which each column corresponds to a category, each row corresponds to a word, and each component is the relative information gain from the word to the category. In the Word-Category RIG Matrix, a row vector represents the corresponding word as a vector of RIGs in categories, while a column vector contains the RIGs of all words for an individual category. For any chosen category, words can therefore be ordered by their RIGs from the most informative to the least informative for that category. Beyond ordering words within each category, words can also be ordered by two global criteria: the sum and the maximum of their RIGs over categories; the top n words in such a list can be considered the most informative words in scientific texts. For a given word, the sum and maximum of RIGs are calculated from the Word-Category RIG Matrix. RIGs for each word of the LScDC in the 252 categories are calculated and the vectors of words are formed; we then form the Word-Category RIG Matrix for the LSC. For each word, the sum (S) and maximum (M) of RIGs over categories are calculated and appended at the end of the matrix (the last two columns). The Word-Category RIG Matrix for the LScDC with 252 categories, together with the sum and maximum of RIGs, can be found in the database.

Leicester Scientific Thesaurus (LScT)
The Leicester Scientific Thesaurus (LScT) is a list of 5,000 words from the LScDC [2]. The words of the LScDC are sorted in descending order by the sum (S) of RIGs over categories, and the top 5,000 words are selected for inclusion in the LScT. We consider these 5,000 words to be the most meaningful words in the scientific corpus: the meaningfulness of a word is evaluated by its average informativeness across the categories, and the resulting list is treated as a 'thesaurus' for science. The LScT, with the value of the sum for each word, is provided as a CSV file in the published archive.
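The illustrative computation mentioned above: a sketch of the RIG for a single (word, category) pair from binary indicator vectors, using the standard entropy and information-gain definitions and normalising the gain by the category entropy; the toy vectors are not taken from the LSC.

```python
# Sketch of a RIG computation for one (word, category) pair from 0/1 indicators
# over the texts of a corpus. Toy data; normalisation by category entropy is
# one common choice consistent with the description above.
import numpy as np

def entropy(p):
    p = np.asarray([q for q in p if q > 0])
    return -np.sum(p * np.log2(p))

def relative_information_gain(in_category, contains_word):
    """Both arguments are 0/1 arrays over the texts of the corpus."""
    c, w = np.asarray(in_category), np.asarray(contains_word)
    h_c = entropy([c.mean(), 1 - c.mean()])
    # Conditional entropy H(C | W) over the two word outcomes.
    h_c_given_w = 0.0
    for val in (0, 1):
        mask = (w == val)
        if mask.any():
            p_c1 = c[mask].mean()
            h_c_given_w += mask.mean() * entropy([p_c1, 1 - p_c1])
    return (h_c - h_c_given_w) / h_c if h_c > 0 else 0.0

in_cat = [1, 1, 0, 0, 0, 1, 0, 0]
has_word = [1, 1, 0, 0, 0, 1, 1, 0]
print("RIG = %.3f" % relative_information_gain(in_cat, has_word))
```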
The published archive contains the following files:
1) Word_Category_RIG_Matrix.csv: a 103,998 by 254 matrix where the columns are the 252 WoS categories plus the sum (S) and the maximum (M) of RIGs over categories (last two columns), and the rows are the words of the LScDC. Each entry in the first 252 columns is the RIG from the word to the category. Words are ordered as in the LScDC.
2) Word_Category_Frequency_Matrix.csv: a 103,998 by 252 matrix where the columns are the 252 WoS categories and the rows are the words of the LScDC. Each entry is the number of texts in the corresponding category that contain the word. Words are ordered as in the LScDC.
3) LScT.csv: list of the words of the LScT with their sum (S) values.
4) Text_No_in_Cat.csv: the number of texts in each category.
5) Categories_in_Documents.csv: list of WoS categories for each document of the LSC.
6) README.txt: description of the Word-Category RIG Matrix, the Word-Category Frequency Matrix and the LScT, and the procedures used to form them.
7) README.pdf: same as 6, in PDF format.

References
[1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
[2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
[3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
[4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
[5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC - new large scientific dictionary. arXiv preprint arXiv:1912.06858.
[6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for GDP FROM MINING reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Technical notes and documentation on the common data model of the project CONCEPT-DM2.
This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.
Aims of the CONCEPT-DM2 project:
General aim: To analyse chronic care effectiveness and the efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes, using real-world data (RWD) from five Spanish Regional Health Systems.
Main specific aims:
Study Design: A population-based retrospective observational study covering all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all contacts of these patients with the health services recorded in the electronic medical record systems, including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data and Pharmacy Claims, as well as other registers such as the mortality and population registers.
Cohort definition: All patients with a Type 2 Diabetes code in their clinical health records.
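A hypothetical illustration of this cohort definition as a simple filter over an extracted diagnoses table; the file name, column names and ICD-10-style coding (E11*) are assumptions about the local data model, not part of the CDM specification.

```python
# Hypothetical cohort extraction: patients carrying a Type 2 Diabetes
# diagnosis code (ICD-10 E11*) in a flat export of the diagnoses table.
import pandas as pd

diagnoses = pd.read_csv("condition_occurrence.csv")       # hypothetical extract
t2d_patients = (diagnoses
                .loc[diagnoses["diagnosis_code"].str.startswith("E11", na=False),
                     "patient_id"]
                .drop_duplicates())
print(f"{len(t2d_patients)} patients in the T2D cohort")
```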
Files included in this publication: