100+ datasets found

s
Data from: Social Media Data Mining Becomes Ordinary
orda.shef.ac.uk
docx
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Helen Kennedy (2023). Social Media Data Mining Becomes Ordinary [Dataset]. http://doi.org/10.15131/shef.data.5195032.v1
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.15131/shef.data.5195032.v1
Dataset updated
May 30, 2023
Dataset provided by
The University of Sheffield
Authors
Helen Kennedy
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This research explored what happens when social media data mining becomes ordinary and is carried out by organisations that might be seen as the pillars of everyday life. The interviews on which the transcripts are based are discussed in Chapter 6 of the book. The referenced book contains a description of the methods. No other publications resulted from working with these transcripts.
d
Distributed Data Mining in Peer-to-Peer Networks
catalog.data.gov
data.nasa.gov
+1more
Updated Apr 11, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Distributed Data Mining in Peer-to-Peer Networks [Dataset]. https://catalog.data.gov/dataset/distributed-data-mining-in-peer-to-peer-networks
Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description
Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact,well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data,computing nodes,and users. This article offers an overview of DDM applications and algorithms for P2P environments,focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.
f
Additional file 1 of Novel methods of qualitative analysis for health policy...
springernature.figshare.com
txt
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mireya Martínez-García; Maite Vallejo; Enrique Hernández-Lemus; Jorge Alberto Álvarez-Díaz (2023). Additional file 1 of Novel methods of qualitative analysis for health policy research [Dataset]. http://doi.org/10.6084/m9.figshare.7587416.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.7587416.v1
Dataset updated
Jun 1, 2023
Dataset provided by
figshare
Authors
Mireya Martínez-García; Maite Vallejo; Enrique Hernández-Lemus; Jorge Alberto Álvarez-Díaz
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Interactive network files. Interactive network files with all statistical and topological analyses. This is a Cytoscape.cys session. In order to open/view/modify this file please use the freely available Cytoscape software platform, available at http://www.cytoscape.org/download.php . (SIF 3413 kb)
o
Data from: Identifying Missing Data Handling Methods with Text Mining
openicpsr.org
delimited
Updated Mar 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Krisztián Boros; Zoltán Kmetty (2023). Identifying Missing Data Handling Methods with Text Mining [Dataset]. http://doi.org/10.3886/E185961V1
Explore at:
delimitedAvailable download formats
Unique identifier
https://doi.org/10.3886/E185961V1
Dataset updated
Mar 8, 2023
Dataset provided by
Hungarian Academy of Sciences
Authors
Krisztián Boros; Zoltán Kmetty
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1999 - Dec 31, 2016
Description
Missing data is an inevitable aspect of every empirical research. Researchers developed several techniques to handle missing data to avoid information loss and biases. Over the past 50 years, these methods have become more and more efficient and also more complex. Building on previous review studies, this paper aims to analyze what kind of missing data handling methods are used among various scientific disciplines. For the analysis, we used nearly 50.000 scientific articles that were published between 1999 and 2016. JSTOR provided the data in text format. Furthermore, we utilized a text-mining approach to extract the necessary information from our corpus. Our results show that the usage of advanced missing data handling methods such as Multiple Imputation or Full Information Maximum Likelihood estimation is steadily growing in the examination period. Additionally, simpler methods, like listwise and pairwise deletion, are still in widespread use.
f
Data from: Enhancing the Human Health Status Prediction: The ATHLOS Project
tandf.figshare.com
xls
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos (2023). Enhancing the Human Health Status Prediction: The ATHLOS Project [Dataset]. http://doi.org/10.6084/m9.figshare.14798079.v1
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.14798079.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Taylor & Francis
Authors
P. Anagnostou; S. Tasoulis; A. G. Vrahatis; S. Georgakopoulos; M. Prina; J. L. Ayuso-Mateos; J. Bickenbach; I. Bayes-Marin; F. F. Caballero; L. Egea-Cortés; E. García-Esquinas; M. Leonardi; S. Scherbov; A. Tamosiunas; A. Galas; J. M. Haro; A. Sanchez-Niubo; V. Plagianakos; D. Panagiotakos
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Preventive healthcare is a crucial pillar of health as it contributes to staying healthy and having immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to significantly contribute to the improvement of preventive healthcare. Unfortunately, data originated from such studies are characterized by high complexity, huge volume, and a plethora of missing values. Machine Learning, Data Mining and Data Imputation models are utilized a part of solving these challenges, respectively. Toward this direction, we focus on the development of a complete methodology for the ATHLOS Project – funded by the European Union’s Horizon 2020 Research and Innovation Program, which aims to achieve a better interpretation of the impact of aging on health. The inherent complexity of the provided dataset lies in the fact that the project includes 15 independent European and international longitudinal studies of aging. In this work, we mainly focus on the HealthStatus (HS) score, an index that estimates the human status of health, aiming to examine the effect of various data imputation models to the prediction power of classification and regression models. Our results are promising, indicating the critical importance of data imputation in enhancing preventive medicine’s crucial role.
f
Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf
frontiersin.figshare.com
pdf
Updated Jun 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xin Qiao; Hong Jiao (2023). Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2018.02231.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fpsyg.2018.02231.s001
Dataset updated
Jun 7, 2023
Dataset provided by
Frontiers
Authors
Xin Qiao; Hong Jiao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Due to increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessment. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, including Classification and Regression Trees (CART), gradient boosting, random forest, support vector machine (SVM), and two unsupervised methods, Self-organizing Map (SOM) and k-means, fitted to one assessment data. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responding to problem-solving items is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability and the simplicity of the classifiers. Interpretations for the results from both supervised and unsupervised learning methods are provided.
r
Journal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk
researchhelpdesk.org
Updated Feb 23, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Research Help Desk (2022). Journal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/289/journal-of-big-data
Explore at:
Dataset updated
Feb 23, 2022
Dataset authored and provided by
Research Help Desk
Description
Journal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk - The Journal of Big Data publishes high-quality, scholarly research papers, methodologies and case studies covering a broad range of topics, from big data analytics to data-intensive computing and all applications of big data research. The journal examines the challenges facing big data today and going forward including, but not limited to: data capture and storage; search, sharing, and analytics; big data technologies; data visualization; architectures for massively parallel processing; data mining tools and techniques; machine learning algorithms for big data; cloud computing platforms; distributed file systems and databases; and scalable storage systems. Academic researchers and practitioners will find the Journal of Big Data to be a seminal source of innovative material. All articles published by the Journal of Big Data are made freely and permanently accessible online immediately upon publication, without subscription charges or registration barriers. As authors of articles published in the Journal of Big Data you are the copyright holders of your article and have granted to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate your article, according to the SpringerOpen copyright and license agreement. For those of you who are US government employees or are prevented from being copyright holders for similar reasons, SpringerOpen can accommodate non-standard copyright lines.
f
Data from: Results obtained in a data mining process applied to a database...
scielo.figshare.com
jpeg
Updated Jun 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
E.M. Ruiz Lobaina; C. P. Romero Suárez (2023). Results obtained in a data mining process applied to a database containing bibliographic information concerning four segments of science. [Dataset]. http://doi.org/10.6084/m9.figshare.20011798.v1
Explore at:
jpegAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.20011798.v1
Dataset updated
Jun 4, 2023
Dataset provided by
SciELO journals
Authors
E.M. Ruiz Lobaina; C. P. Romero Suárez
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract The objective of this work is to improve the quality of the information that belongs to the database CubaCiencia, of the Institute of Scientific and Technological Information. This database has bibliographic information referring to four segments of science and is the main database of the Library Management System. The applied methodology was based on the Decision Trees, the Correlation Matrix, the 3D Scatter Plot, etc., which are techniques used by data mining, for the study of large volumes of information. The results achieved not only made it possible to improve the information in the database, but also provided truly useful patterns in the solution of the proposed objectives.
m
COVID19FN
data.mendeley.com
Updated Jun 13, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Isha AGARWAL (2020). COVID19FN [Dataset]. http://doi.org/10.17632/b96v5hmfv6.1
Explore at:
Unique identifier
https://doi.org/10.17632/b96v5hmfv6.1
Dataset updated
Jun 13, 2020
Authors
Isha AGARWAL
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
COVID19FN is a dataset that comprises of labelled news articles of misinformation spread during the Infodemic. It contains approximately 2800 news articles, real and fake, scraped from Poynter and other fact-checking sites. It also contains information such as source URL, publish date and origin country of the news article. Potential applications of this dataset would be to explore various research areas such as classification of intention, study of spatial and temporal features, and linguistic indications that can provide further insight and help mitigate its effect as much as possible.
l
LSC (Leicester Scientific Corpus)
figshare.le.ac.uk
Updated Apr 15, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neslihan Suzen (2020). LSC (Leicester Scientific Corpus) [Dataset]. http://doi.org/10.25392/leicester.data.9449639.v2
Explore at:
Unique identifier
https://doi.org/10.25392/leicester.data.9449639.v2
Dataset updated
Apr 15, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Leicester
Description
The LSC (Leicester Scientific Corpus)

April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk) Supervised by Prof Alexander Gorban and Dr Evgeny MirkesThe data are extracted from the Web of Science [1]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.[Version 2] A further cleaning is applied in Data Processing for LSC Abstracts in Version 1*. Details of cleaning procedure are explained in Step 6.* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v1.Getting StartedThis text provides the information on the LSC (Leicester Scientific Corpus) and pre-processing steps on abstracts, and describes the structure of files to organise the corpus. This corpus is created to be used in future work on the quantification of the meaning of research texts and make it available for use in Natural Language Processing projects.LSC is a collection of abstracts of articles and proceeding papers published in 2014, and indexed by the Web of Science (WoS) database [1]. The corpus contains only documents in English. Each document in the corpus contains the following parts:1. Authors: The list of authors of the paper2. Title: The title of the paper 3. Abstract: The abstract of the paper 4. Categories: One or more category from the list of categories [2]. Full list of categories is presented in file ‘List_of _Categories.txt’. 5. Research Areas: One or more research area from the list of research areas [3]. Full list of research areas is presented in file ‘List_of_Research_Areas.txt’. 6. Total Times cited: The number of times the paper was cited by other items from all databases within Web of Science platform [4] 7. Times cited in Core Collection: The total number of times the paper was cited by other papers within the WoS Core Collection [4]The corpus was collected in July 2018 online and contains the number of citations from publication date to July 2018. We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,350.Data ProcessingStep 1: Downloading of the Data Online

The dataset is collected manually by exporting documents as Tab-delimitated files online. All documents are available online.Step 2: Importing the Dataset to R

The LSC was collected as TXT files. All documents are extracted to R.Step 3: Cleaning the Data from Documents with Empty Abstract or without CategoryAs our research is based on the analysis of abstracts and categories, all documents with empty abstracts and documents without categories are removed.Step 4: Identification and Correction of Concatenate Words in AbstractsEspecially medicine-related publications use ‘structured abstracts’. Such type of abstracts are divided into sections with distinct headings such as introduction, aim, objective, method, result, conclusion etc. Used tool for extracting abstracts leads concatenate words of section headings with the first word of the section. For instance, we observe words such as ConclusionHigher and ConclusionsRT etc. The detection and identification of such words is done by sampling of medicine-related publications with human intervention. Detected concatenate words are split into two words. For instance, the word ‘ConclusionHigher’ is split into ‘Conclusion’ and ‘Higher’.The section headings in such abstracts are listed below:

Background Method(s) Design Theoretical Measurement(s) Location Aim(s) Methodology Process Abstract Population Approach Objective(s) Purpose(s) Subject(s) Introduction Implication(s) Patient(s) Procedure(s) Hypothesis Measure(s) Setting(s) Limitation(s) Discussion Conclusion(s) Result(s) Finding(s) Material (s) Rationale(s) Implications for health and nursing policyStep 5: Extracting (Sub-setting) the Data Based on Lengths of AbstractsAfter correction, the lengths of abstracts are calculated. ‘Length’ indicates the total number of words in the text, calculated by the same rule as for Microsoft Word ‘word count’ [5].According to APA style manual [6], an abstract should contain between 150 to 250 words. In LSC, we decided to limit length of abstracts from 30 to 500 words in order to study documents with abstracts of typical length ranges and to avoid the effect of the length to the analysis.

Step 6: [Version 2] Cleaning Copyright Notices, Permission polices, Journal Names and Conference Names from LSC Abstracts in Version 1Publications can include a footer of copyright notice, permission policy, journal name, licence, author’s right or conference name below the text of abstract by conferences and journals. Used tool for extracting and processing abstracts in WoS database leads to attached such footers to the text. For example, our casual observation yields that copyright notices such as ‘Published by Elsevier ltd.’ is placed in many texts. To avoid abnormal appearances of words in further analysis of words such as bias in frequency calculation, we performed a cleaning procedure on such sentences and phrases in abstracts of LSC version 1. We removed copyright notices, names of conferences, names of journals, authors’ rights, licenses and permission policies identiﬁed by sampling of abstracts.Step 7: [Version 2] Re-extracting (Sub-setting) the Data Based on Lengths of AbstractsThe cleaning procedure described in previous step leaded to some abstracts having less than our minimum length criteria (30 words). 474 texts were removed.Step 8: Saving the Dataset into CSV FormatDocuments are saved into 34 CSV files. In CSV files, the information is organised with one record on each line and parts of abstract, title, list of authors, list of categories, list of research areas, and times cited is recorded in fields.To access the LSC for research purposes, please email to ns433@le.ac.uk.References[1]Web of Science. (15 July). Available: https://apps.webofknowledge.com/ [2]WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html [3]Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html [4]Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US [5]Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3 [6]A. P. Association, Publication manual. American Psychological Association Washington, DC, 1983.
s
Online Feature Selection and Its Applications
researchdata.smu.edu.sg
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
Explore at:
Unique identifier
https://doi.org/10.25440/smu.12062733.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
License
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
Description
Feature selection is an important technique for data mining before a machine learning algorithm is applied. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which an online learner is only allowed to maintain a classifier involved only a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate prediction using a small and fixed number of active features. This is in contrast to the classical setup of online learning where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: (1) learning with full input where an learner is allowed to access all the features to decide the subset of active features, and (2) learning with partial input where only a limited number of features is allowed to be accessed for each instance by the learner. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public datasets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.Related Publication: Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/ Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
r
Journal of Big Data FAQ - ResearchHelpDesk
researchhelpdesk.org
Updated May 25, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Research Help Desk (2022). Journal of Big Data FAQ - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/faq/289/journal-of-big-data
Explore at:
Dataset updated
May 25, 2022
Dataset authored and provided by
Research Help Desk
Description
Journal of Big Data FAQ - ResearchHelpDesk - The Journal of Big Data publishes high-quality, scholarly research papers, methodologies and case studies covering a broad range of topics, from big data analytics to data-intensive computing and all applications of big data research. The journal examines the challenges facing big data today and going forward including, but not limited to: data capture and storage; search, sharing, and analytics; big data technologies; data visualization; architectures for massively parallel processing; data mining tools and techniques; machine learning algorithms for big data; cloud computing platforms; distributed file systems and databases; and scalable storage systems. Academic researchers and practitioners will find the Journal of Big Data to be a seminal source of innovative material. All articles published by the Journal of Big Data are made freely and permanently accessible online immediately upon publication, without subscription charges or registration barriers. As authors of articles published in the Journal of Big Data you are the copyright holders of your article and have granted to any third party, in advance and in perpetuity, the right to use, reproduce or disseminate your article, according to the SpringerOpen copyright and license agreement. For those of you who are US government employees or are prevented from being copyright holders for similar reasons, SpringerOpen can accommodate non-standard copyright lines.
z
Data from: Publication FP7 Funding Acknowledgment - PLOS OpenAIRE
zenodo.org
explore.openaire.eu
+1more
zip
Updated Jan 24, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Najko Jahn; Martin Fenner; Harry Dimitropoulos; Jochen Schirrwagen; Najko Jahn; Martin Fenner; Harry Dimitropoulos; Jochen Schirrwagen (2020). Publication FP7 Funding Acknowledgment - PLOS OpenAIRE [Dataset]. http://doi.org/10.5281/zenodo.1239
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.1239
Dataset updated
Jan 24, 2020
Dataset provided by
DOI
Authors
Najko Jahn; Martin Fenner; Harry Dimitropoulos; Jochen Schirrwagen; Najko Jahn; Martin Fenner; Harry Dimitropoulos; Jochen Schirrwagen
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The dataset contains a sample of metadata describing papers published in PLOS and their identified grant agreement number of FP7 projects. A second file shows the frequency of FP7 grants. The sample was created in July 2012.
Data from: PTMTorrent: A Dataset for Mining Open-source Pre-trained Model...
figshare.com
pdf
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Wenxin Jiang; Nicholas Synovic; Purvish Jajal; Taylor R. Schorlemmer; Arav Tewari; Bhavesh Pareek; George K. Thiruvathukal; James C. Davis (2023). PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages [Dataset]. http://doi.org/10.6084/m9.figshare.22009880.v4
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.22009880.v4
Dataset updated
May 30, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Wenxin Jiang; Nicholas Synovic; Purvish Jajal; Taylor R. Schorlemmer; Arav Tewari; Bhavesh Pareek; George K. Thiruvathukal; James C. Davis
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Due to the cost of developing and training deep learning models from scratch, machine learning engineers have begun to reuse pre-trained models (PTMs) and fine-tune them for downstream tasks. PTM registries known as “model hubs” support engineers in distributing and reusing deep learning models. PTM packages include pre-trained weights, documentation, model architectures, datasets, and metadata. Mining the information in PTM packages will enable the discovery of engineering phenomena and tools to support software engineers. However, accessing this information is difficult — there are many PTM registries, and both the registries and the individual packages may have rate limiting for accessing the data.

We present an open-source dataset, PTMTorrent, to facilitate the evaluation and understanding of PTM packages. This paper describes the creation, structure, usage, and limitations of the dataset. The dataset includes a snapshot of 5 model hubs and a total of 15,913 PTM packages. These packages are represented in a uniform data schema for cross-hub mining. We describe prior uses of this data and suggest research opportunities for mining using our dataset.

We provide links to the PTM Dataset and PTM Torrent Source Code.
r
International Journal of Engineering and Advanced Technology Publication fee...
researchhelpdesk.org
Updated Jun 25, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Research Help Desk (2022). International Journal of Engineering and Advanced Technology Publication fee - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/publication-fee/552/international-journal-of-engineering-and-advanced-technology
Explore at:
Dataset updated
Jun 25, 2022
Dataset authored and provided by
Research Help Desk
Description
International Journal of Engineering and Advanced Technology Publication fee - ResearchHelpDesk - International Journal of Engineering and Advanced Technology (IJEAT) is having Online-ISSN 2249-8958, bi-monthly international journal, being published in the months of February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP) Bhopal (M.P.), India since the year 2011. It is academic, online, open access, double-blind, peer-reviewed international journal. It aims to publish original, theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. All submitted papers will be reviewed by the board of committee of IJEAT. Aim of IJEAT Journal disseminate original, scientific, theoretical or applied research in the field of Engineering and allied fields. dispense a platform for publishing results and research with a strong empirical component. aqueduct the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research. seek original and unpublished research papers based on theoretical or experimental works for the publication globally. publish original, theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. impart a platform for publishing results and research with a strong empirical component. create a bridge for a significant gap between research and practice by promoting the publication of original, novel, industry-relevant research. solicit original and unpublished research papers, based on theoretical or experimental works. Scope of IJEAT International Journal of Engineering and Advanced Technology (IJEAT) covers all topics of all engineering branches. Some of them are Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. The main topic includes but not limited to: 1. Smart Computing and Information Processing Signal and Speech Processing Image Processing and Pattern Recognition WSN Artificial Intelligence and machine learning Data mining and warehousing Data Analytics Deep learning Bioinformatics High Performance computing Advanced Computer networking Cloud Computing IoT Parallel Computing on GPU Human Computer Interactions 2. Recent Trends in Microelectronics and VLSI Design Process & Device Technologies Low-power design Nanometer-scale integrated circuits Application specific ICs (ASICs) FPGAs Nanotechnology Nano electronics and Quantum Computing 3. Challenges of Industry and their Solutions, Communications Advanced Manufacturing Technologies Artificial Intelligence Autonomous Robots Augmented Reality Big Data Analytics and Business Intelligence Cyber Physical Systems (CPS) Digital Clone or Simulation Industrial Internet of Things (IIoT) Manufacturing IOT Plant Cyber security Smart Solutions – Wearable Sensors and Smart Glasses System Integration Small Batch Manufacturing Visual Analytics Virtual Reality 3D Printing 4. Internet of Things (IoT) Internet of Things (IoT) & IoE & Edge Computing Distributed Mobile Applications Utilizing IoT Security, Privacy and Trust in IoT & IoE Standards for IoT Applications Ubiquitous Computing Block Chain-enabled IoT Device and Data Security and Privacy Application of WSN in IoT Cloud Resources Utilization in IoT Wireless Access Technologies for IoT Mobile Applications and Services for IoT Machine/ Deep Learning with IoT & IoE Smart Sensors and Internet of Things for Smart City Logic, Functional programming and Microcontrollers for IoT Sensor Networks, Actuators for Internet of Things Data Visualization using IoT IoT Application and Communication Protocol Big Data Analytics for Social Networking using IoT IoT Applications for Smart Cities Emulation and Simulation Methodologies for IoT IoT Applied for Digital Contents 5. Microwaves and Photonics Microwave filter Micro Strip antenna Microwave Link design Microwave oscillator Frequency selective surface Microwave Antenna Microwave Photonics Radio over fiber Optical communication Optical oscillator Optical Link design Optical phase lock loop Optical devices 6. Computation Intelligence and Analytics Soft Computing Advance Ubiquitous Computing Parallel Computing Distributed Computing Machine Learning Information Retrieval Expert Systems Data Mining Text Mining Data Warehousing Predictive Analysis Data Management Big Data Analytics Big Data Security 7. Energy Harvesting and Wireless Power Transmission Energy harvesting and transfer for wireless sensor networks Economics of energy harvesting communications Waveform optimization for wireless power transfer RF Energy Harvesting Wireless Power Transmission Microstrip Antenna design and application Wearable Textile Antenna Luminescence Rectenna 8. Advance Concept of Networking and Database Computer Network Mobile Adhoc Network Image Security Application Artificial Intelligence and machine learning in the Field of Network and Database Data Analytic High performance computing Pattern Recognition 9. Machine Learning (ML) and Knowledge Mining (KM) Regression and prediction Problem solving and planning Clustering Classification Neural information processing Vision and speech perception Heterogeneous and streaming data Natural language processing Probabilistic Models and Methods Reasoning and inference Marketing and social sciences Data mining Knowledge Discovery Web mining Information retrieval Design and diagnosis Game playing Streaming data Music Modelling and Analysis Robotics and control Multi-agent systems Bioinformatics Social sciences Industrial, financial and scientific applications of all kind 10. Advanced Computer networking Computational Intelligence Data Management, Exploration, and Mining Robotics Artificial Intelligence and Machine Learning Computer Architecture and VLSI Computer Graphics, Simulation, and Modelling Digital System and Logic Design Natural Language Processing and Machine Translation Parallel and Distributed Algorithms Pattern Recognition and Analysis Systems and Software Engineering Nature Inspired Computing Signal and Image Processing Reconfigurable Computing Cloud, Cluster, Grid and P2P Computing Biomedical Computing Advanced Bioinformatics Green Computing Mobile Computing Nano Ubiquitous Computing Context Awareness and Personalization, Autonomic and Trusted Computing Cryptography and Applied Mathematics Security, Trust and Privacy Digital Rights Management Networked-Driven Multicourse Chips Internet Computing Agricultural Informatics and Communication Community Information Systems Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS Disaster Management e-governance, e-Commerce, e-business, e-Learning Forest Genomics and Informatics Healthcare Informatics Information Ecology and Knowledge Management Irrigation Informatics Neuro-Informatics Open Source: Challenges and opportunities Web-Based Learning: Innovation and Challenges Soft computing Signal and Speech Processing Natural Language Processing 11. Communications Microstrip Antenna Microwave Radar and Satellite Smart Antenna MIMO Antenna Wireless Communication RFID Network and Applications 5G Communication 6G Communication 12. Algorithms and Complexity Sequential, Parallel And Distributed Algorithms And Data Structures Approximation And Randomized Algorithms Graph Algorithms And Graph Drawing On-Line And Streaming Algorithms Analysis Of Algorithms And Computational Complexity Algorithm Engineering Web Algorithms Exact And Parameterized Computation Algorithmic Game Theory Computational Biology Foundations Of Communication Networks Computational Geometry Discrete Optimization 13. Software Engineering and Knowledge Engineering Software Engineering Methodologies Agent-based software engineering Artificial intelligence approaches to software engineering Component-based software engineering Embedded and ubiquitous software engineering Aspect-based software engineering Empirical software engineering Search-Based Software engineering Automated software design and synthesis Computer-supported cooperative work Automated software specification Reverse engineering Software Engineering Techniques and Production Perspectives Requirements engineering Software analysis, design and modelling Software maintenance and evolution Software engineering tools and environments Software engineering decision support Software design patterns Software product lines Process and workflow management Reflection and metadata approaches Program understanding and system maintenance Software domain modelling and analysis Software economics Multimedia and hypermedia software engineering Software engineering case study and experience reports Enterprise software, middleware, and tools Artificial intelligent methods, models, techniques Artificial life and societies Swarm intelligence Smart Spaces Autonomic computing and agent-based systems Autonomic computing Adaptive Systems Agent architectures, ontologies, languages and protocols Multi-agent systems Agent-based learning and knowledge discovery Interface agents Agent-based auctions and marketplaces Secure mobile and multi-agent systems Mobile agents SOA and Service-Oriented Systems Service-centric software engineering Service oriented requirements engineering Service oriented architectures Middleware for service based systems Service discovery and composition Service level
f
Data from: Epistemological grouping of published articles on big data...
scielo.figshare.com
png
Updated Jun 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Patricia Kuzmenko FURLAN; Fernando José Barbin LAURINDO (2023). Epistemological grouping of published articles on big data analytics [Dataset]. http://doi.org/10.6084/m9.figshare.5885407.v1
Explore at:
pngAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.5885407.v1
Dataset updated
Jun 2, 2023
Dataset provided by
SciELO journals
Authors
Patricia Kuzmenko FURLAN; Fernando José Barbin LAURINDO
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract The era of big data is yet a reality for businesses and individuals. In recent year, the academic literature exploring this field has grown rapidly. This article aimed to identify the main fields and features of the published papers about big data analytics. The methodological approach considered was a bibliometric research at the ISI Web of Science platform, whose focus was given to the big data management issues. It was possible to identify five distinct groups within the published papers: evolution of big data; management, business and strategy; human behavior and the social and cultural aspects; data mining and knowledge generation; Internet of Things. It was possible to conclude that big data corresponds to an emerging theme, which is not yet consolidated. There is a wide variation in the terms used, which influences the bibliographic searches. Therefore, as a complimentary contribution of this research, the main keywords used in such articles were identified, which contributes for bibliometric research of future studies.
r
A predictive model for opal exploration in Australia from a data mining...
researchdata.edu.au
Updated May 1, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thomas Landgrebe; Thomas Landgrebe; Adriana Dutkiewicz; Dietmar Muller (2015). A predictive model for opal exploration in Australia from a data mining approach [Dataset]. http://doi.org/10.4227/11/5587A86C0FDF1
Explore at:
Unique identifier
https://doi.org/10.4227/11/5587A86C0FDF1
Dataset updated
May 1, 2015
Dataset provided by
The University of Sydney
Authors
Thomas Landgrebe; Thomas Landgrebe; Adriana Dutkiewicz; Dietmar Muller
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Area covered

Dataset funded by
Australian Research Council
Description
This data collection is associated with the publications: Merdith, A. S., Landgrebe, T. C. W., Dutkiewicz, A., & Müller, R. D. (2013). Towards a predictive model for opal exploration using a spatio-temporal data mining approach. Australian Journal of Earth Sciences, 60(2), 217-229. doi: 10.1080/08120099.2012.754793
and
Landgrebe, T. C. W., Merdith, A., Dutkiewicz, A., & Müller, R. D. (2013). Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach. Computers & Geosciences, 56(0), 76-82. doi: 10.1016/j.cageo.2013.02.002
Publication Abstract - Merdith et al. (2013)
Opal is Australia's national gemstone, however most significant opal discoveries were made in the early 1900's - more than 100 years ago - until recently. Currently there is no formal exploration model for opal, meaning there are no widely accepted concepts or methodologies available to suggest where new opal fields may be found. As a consequence opal mining in Australia is a cottage industry with the majority of opal exploration focused around old opal fields. The EarthByte Group has developed a new opal exploration methodology for the Great Artesian Basin. The work is based on the concept of applying “big data mining” approaches to data sets relevant for identifying regions that are prospective for opal. The group combined a multitude of geological and geophysical data sets that were jointly analysed to establish associations between particular features in the data with known opal mining sites. A “training set” of known opal localities (1036 opal mines) was assembled, using those localities, which were featured in published reports and on maps. The data used include rock types, soil type, regolith type, topography, radiometric data and a stack of digital palaeogeographic maps. The different data layers were analysed via spatio-temporal data mining combining the GPlates PaleoGIS software (www.gplates.org) with the Orange data mining software (orange.biolab.si) to produce the first opal prospectivity map for the Great Artesian Basin. One of the main results of the study is that the geological conditions favourable for opal were found to be related to a particular sequence of surface environments over geological time. These conditions involved alternating shallow seas and river systems followed by uplift and erosion. The approach reduces the entire area of the Great Artesian Basin to a mere 6% that is deemed to be prospective for opal exploration. The work is described in two companion papers in the Australian Journal of Earth Sciences and Computers and Geosciences.
Publication Abstract - Landgrebe et al. (2013)
Age-coded multi-layered geological datasets are becoming increasingly prevalent with the surge in open-access geodata, yet there are few methodologies for extracting geological information and knowledge from these data. We present a novel methodology, based on the open-source GPlates software in which age-coded digital palaeogeographic maps are used to “data-mine” spatio-temporal patterns related to the occurrence of Australian opal. Our aim is to test the concept that only a particular sequence of depositional/erosional environments may lead to conditions suitable for the formation of gem quality sedimentary opal. Time-varying geographic environment properties are extracted from a digital palaeogeographic dataset of the eastern Australian Great Artesian Basin (GAB) at 1036 opal localities. We obtain a total of 52 independent ordinal sequences sampling 19 time slices from the Early Cretaceous to the present-day. We find that 95% of the known opal deposits are tied to only 27 sequences all comprising fluvial and shallow marine depositional sequences followed by a prolonged phase of erosion. We then map the total area of the GAB that matches these 27 opal-specific sequences, resulting in an opal-prospective region of only about 10% of the total area of the basin. The key patterns underlying this association involve only a small number of key environmental transitions. We demonstrate that these key associations are generally absent at arbitrary locations in the basin. This new methodology allows for the simplification of a complex time-varying geological dataset into a single map view, enabling straightforward application for opal exploration and for future co-assessment with other datasets/geological criteria. This approach may help unravel the poorly understood opal formation process using an empirical spatio-temporal data-mining methodology and readily available datasets to aid hypothesis testing.
Authors and Institutions
Andrew Merdith - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-7564-8149
Thomas Landgrebe - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia
Adriana Dutkiewicz - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia
R. Dietmar Müller - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-3334-5764
Overview of Resources Contained
This collection contains geological data from Australia used for data mining in the publications Merdith et al. (2013) and Landgrebe et al. (2013). The resulting maps of opal prospectivity are also included.
List of Resources
Note: For details on the files included in this data collection, see “Description_of_Resources.txt”.
Note: For information on file formats and what programs to use to interact with various file formats, see “File_Formats_and_Recommended_Programs.txt”.
Map of Barfield region, Australia (.jpg, 270 KB)
Map overviewing the Great Artesian basins and main opal mining camps (.png, 82 KB)
Maps showing opal prospectivity data mining results for different geological datasets (.tif, 23.1 MB)
Map of opal prospectivity from palaeogeography data mining (.pdf, 2.6 MB)
Raster of palaeogeography target regions for viewing in Google Earth (.jpg, 418 KB)
Opal mine locations (.gpml, .txt, .kmz, .shp, total 15.6 MB)
Map of opal prospectivity from all data mining results as a Google Earth overlay (.kmz, 12 KB)
Map of probability of opal occurrence in prospective regions from all data mining results (.tif, 5.9 MB)
Paleogeography of Australia (.gpml, .txt, .shp, total 114.2 MB)
Radiometric data showing potassium concentration contrasts (.tif, .kmz, total 311.3 MB)
Regolith data (.gpml, .txt, .kml, .shp, total 7.1 MB)
Soil type data (.gpml, .txt, .kml, .shp, total 7.1 MB)
For more information on this data collection, and links to other datasets from the EarthByte Research Group please visit EarthByte
For more information about using GPlates, including tutorials and a user manual please visit GPlates or EarthByte
The top 24 clusters of well-being research named using the dominant theme...
plos.figshare.com
xls
Updated Jun 21, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Yousif Elsamani; Cristian Mejia; Yuya Kajikawa (2023). The top 24 clusters of well-being research named using the dominant theme with the most important quantitative data (number of articles, average publication year, top three journals, and number of articles in each journal) until 2021. [Dataset]. http://doi.org/10.1371/journal.pone.0280005.t002
Explore at:
xlsAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0280005.t002
Dataset updated
Jun 21, 2023
Dataset provided by
PLOShttp://plos.org/
Authors
Yousif Elsamani; Cristian Mejia; Yuya Kajikawa
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The top 24 clusters of well-being research named using the dominant theme with the most important quantitative data (number of articles, average publication year, top three journals, and number of articles in each journal) until 2021.
m
Source Code
bridges.monash.edu
researchdata.edu.au
zip
Updated Oct 15, 2017
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chang Wei Tan (2017). Source Code [Dataset]. http://doi.org/10.4225/03/59e33dfb920f1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.4225/03/59e33dfb920f1
Dataset updated
Oct 15, 2017
Dataset provided by
Monash University
Authors
Chang Wei Tan
License
https://www.gnu.org/licenses/gpl-3.0.htmlhttps://www.gnu.org/licenses/gpl-3.0.html
Description
This is the source code for the paper "Efficient search of the best warping window for Dynamic Time Warping".This work focused on fast learning/searching for the best warping window for Dynamic Time Warping and Time Series Classification.For more info, visit https://github.com/ChangWeiTan/FastWWSearch
s
Digital Data Analytics, Public Engagement and the Social Life of Methods
orda.shef.ac.uk
docx
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Helen Kennedy; Giles Moss; Stylianos Moshanas; Chris Birchall (2023). Digital Data Analytics, Public Engagement and the Social Life of Methods [Dataset]. http://doi.org/10.15131/shef.data.5194993.v1
Explore at:
docxAvailable download formats
Unique identifier
https://doi.org/10.15131/shef.data.5194993.v1
Dataset updated
May 30, 2023
Dataset provided by
The University of Sheffield
Authors
Helen Kennedy; Giles Moss; Stylianos Moshanas; Chris Birchall
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Interview and workshop transcripts from EPSRC Digital Transformations Communities and Cultures Network + (http://www.communitiesandculture.org/) project Digital Data Analytics, Public Engagement and the Social Life of Methods (http://www.communitiesandculture.org/projects/digital-data-analysis/). Methodology described in papers available at the above link.

Facebook

Twitter

Click to copy link

Link copied

Cite

Helen Kennedy (2023). Social Media Data Mining Becomes Ordinary [Dataset]. http://doi.org/10.15131/shef.data.5195032.v1

Data from: Social Media Data Mining Becomes Ordinary

Explore at:

docxAvailable download formats

Unique identifier

https://doi.org/10.15131/shef.data.5195032.v1

Dataset updated

May 30, 2023

Dataset provided by

The University of Sheffield

Authors

Helen Kennedy

License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This research explored what happens when social media data mining becomes ordinary and is carried out by organisations that might be seen as the pillars of everyday life. The interviews on which the transcripts are based are discussed in Chapter 6 of the book. The referenced book contains a description of the methods. No other publications resulted from working with these transcripts.

Clear search

Close search

Google apps

Main menu

Data from: Social Media Data Mining Becomes Ordinary

Distributed Data Mining in Peer-to-Peer Networks

Additional file 1 of Novel methods of qualitative analysis for health policy...

Data from: Identifying Missing Data Handling Methods with Text Mining

Data from: Enhancing the Human Health Status Prediction: The ATHLOS Project

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf

Journal of Big Data Impact Factor 2024-2025 - ResearchHelpDesk

Data from: Results obtained in a data mining process applied to a database...

COVID19FN

LSC (Leicester Scientific Corpus)

Online Feature Selection and Its Applications

Journal of Big Data FAQ - ResearchHelpDesk

Data from: Publication FP7 Funding Acknowledgment - PLOS OpenAIRE

Data from: PTMTorrent: A Dataset for Mining Open-source Pre-trained Model...

International Journal of Engineering and Advanced Technology Publication fee...

Data from: Epistemological grouping of published articles on big data...

A predictive model for opal exploration in Australia from a data mining...

and

Landgrebe, T. C. W., Merdith, A., Dutkiewicz, A., & Müller, R. D. (2013). Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach. Computers & Geosciences, 56(0), 76-82. doi: 10.1016/j.cageo.2013.02.002

Publication Abstract - Merdith et al. (2013)

Publication Abstract - Landgrebe et al. (2013)

Authors and Institutions

Overview of Resources Contained

List of Resources

The top 24 clusters of well-being research named using the dominant theme...

Source Code

Digital Data Analytics, Public Engagement and the Social Life of Methods

Data from: Social Media Data Mining Becomes Ordinary