22 datasets found
  1. LitMiner

    • neuinfo.org
    • scicrunch.org
    • +1more
    Updated Feb 27, 2025
    Cite
    (2025). LitMiner [Dataset]. http://identifiers.org/RRID:SCR_008200
    Description

    THIS RESOURCE IS NO LONGER IN SERVICE, documented August 23, 2016. The LitMiner software is a literature data-mining tool that facilitates the identification of major gene-regulation key players related to a user-defined field of interest in PubMed abstracts. The prediction of gene-regulatory relationships is based on co-occurrence analysis of key terms within the abstracts. LitMiner predicts relationships between key terms from the biomedical domain in four categories (genes, chemical compounds, diseases, and tissues). The usefulness of the LitMiner system has been demonstrated in a study that reconstructed disease-related regulatory networks by promoter modeling, initiated from a LitMiner-generated primary gene list. To overcome the limitations and to verify and improve the data, we developed WikiGene, a Wiki-based curation tool that allows revision of the data by expert users over the Internet. The approach is based on the annotation of key terms in article abstracts followed by statistical co-citation analysis of the annotated key terms in order to predict relationships. Key terms belonging to four different categories are used for the annotation process:
    • Genes: names of genes and gene products. Gene name recognition is based on Ensembl; synonyms and aliases are resolved.
    • Chemical compounds: names of chemical compounds and their respective aliases.
    • Diseases and phenotypes: names of diseases and phenotypes.
    • Tissues and organs: names of tissues and organs.
    LitMiner uses a database of disease and phenotype terms for literature annotation. Currently, there are 2225 diseases or phenotypes, 801 tissues and organs, and 10477 compounds in the database.
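The co-occurrence analysis described above can be sketched in a few lines of Python; the abstracts and term list below are illustrative stand-ins, not LitMiner's data or code.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(abstracts, terms):
    """Count, for each unordered term pair, the number of abstracts
    mentioning both terms (case-insensitive substring match)."""
    counts = Counter()
    for text in abstracts:
        lower = text.lower()
        present = [t for t in terms if t.lower() in lower]
        for a, b in combinations(sorted(present), 2):
            counts[(a, b)] += 1
    return counts

# Made-up abstracts and key terms for illustration only.
abstracts = [
    "TP53 mutations are frequent in breast cancer tissue.",
    "Expression of TP53 and BRCA1 in breast cancer.",
    "BRCA1 variants and ovarian cancer risk.",
]
terms = ["TP53", "BRCA1", "breast cancer"]
counts = cooccurrence_counts(abstracts, terms)
```

Pairs with high counts relative to the individual term frequencies are the candidate relationships such a tool would report.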

  2. Data from: Surface Materials Data from Breccia-Pipe Uranium Mine and...

    • catalog.data.gov
    • data.usgs.gov
    • +1more
    Updated Oct 30, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Surface Materials Data from Breccia-Pipe Uranium Mine and Reference Sites, Arizona, USA [Dataset]. https://catalog.data.gov/dataset/surface-materials-data-from-breccia-pipe-uranium-mine-and-reference-sites-arizona-usa
    Dataset provided by
    U.S. Geological Survey
    Area covered
    United States, Arizona
    Description

    This data release includes elemental analysis of soil samples collected at breccia-pipe uranium mines, at one undeveloped breccia-pipe uranium deposit, and at a reference site in northern Arizona. Samples were collected near the Arizona 1, Canyon, Kanab North, and Pinenut uranium mines, over the EZ2 breccia-pipe uranium deposit, and at the Little Robinson Tank reference site. Samples were collected around the Arizona 1 mine after active mining had ceased, during July 2015; around and within the mine yard at the Canyon mine during mine-development activity and before active mining occurred, in June 2013; around and within the mine yard at the Kanab North mine during reclamation and before reclamation was completed, in June 2016; around the Pinenut mine during active mining, in October 2014; directly over the EZ2 deposit before any development activity occurred, during November 2015; and at the Little Robinson Tank reference site during November 2015. This data release includes data for four different types of soil samples:
    • Type 1: incremental soil samples in which more than 30 equally spaced subsamples were collected and composited over a limited areal extent, termed a decision unit and depicted generally as a trapezoidal polygon mapped within a mine yard or surrounding a mine site.
    • Type 2: incremental soil samples in which more than 30 subsamples were collected and composited over a roughly two-dimensional linear or sinuous mapped pattern following roads, also termed a decision unit.
    • Type 3: discrete integrated soil samples (Bern and others, 2019, use the term "point" for these samples) in which more than 30 subsamples were collected within fenced exclosures (generally about 3 meters square) containing Big Springs Number Eight dust sampling equipment.
    • Type 4: integrated soil samples composed of at least 10 subsamples collected from underneath plywood cover boards used to collect herpetofauna.
Incremental samples (types 1 and 2) were collected in triplicate from the soil surface at 0-5 centimeters (cm) depth using a Multi-Incremental Sampling Tool (MIST), collecting approximately the same volume for each subsample, subject to slight variation due to variable soil conditions. The volume of soil represented by each type 1 and type 2 sample is termed a decision unit (DU), the areal extent of which is defined by a mapped polygonal, sinuous, or linear area, and the depth of which is the 5 cm sampled by the MIST. Each subsample of each triplicate incremental sample was passed through a 2-millimeter sieve and composited into a clean 19-liter bucket, and each completed triplicate sample was transferred to double zip-top bags for transfer to the laboratory. Integrated samples (types 3 and 4) were collected using a plastic soil scoop to collect soil from 0-5 cm depth and were composited into double zip-top plastic bags for transfer to the laboratory. Data are divided into two data tables based upon type: types 1 and 2 are in T1_DUSamples.csv; types 3 and 4 are in T2_BSNESamples.csv. The file DataDictionary_v1.csv defines all table headings and abbreviations. Sample preparation and analytical techniques are described in the metadata file. This data release also includes location information for the approximate center points of the incremental sample polygons and linear features (decision units) and for the discrete integrated samples. Note that locations for incremental samples of decision units (sample types 1 and 2) are the approximate center of the geographic area (polygon, linear, or sinuous feature) over which the sample was collected. As such, the elemental values represent average concentrations for the sample volume collected over the entire geographic area and 0-5 cm depth of each decision unit, and do not represent concentrations that would be measured in a discrete sample collected at that central location.
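The per-decision-unit averaging implied by the triplicate design can be sketched in pandas. The column names and values below are illustrative assumptions, not the release's actual headers (DataDictionary_v1.csv defines those).

```python
import pandas as pd

# Synthetic stand-in for T1_DUSamples.csv: three replicates per
# decision unit, one hypothetical element concentration column.
df = pd.DataFrame({
    "DecisionUnit": ["AZ1-01"] * 3 + ["AZ1-02"] * 3,
    "Replicate": [1, 2, 3, 1, 2, 3],
    "U_ppm": [12.0, 14.0, 13.0, 3.0, 4.0, 5.0],
})

# Each type 1/2 sample is a triplicate over one decision unit, so a
# per-DU mean summarizes the average concentration for that DU.
du_means = df.groupby("DecisionUnit")["U_ppm"].mean()
```

With the real file one would start from `pd.read_csv("T1_DUSamples.csv")` instead of the synthetic frame.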

  3. LongAlpaca-Yukang ML Instructional Outputs

    • kaggle.com
    zip
    Updated Nov 24, 2023
    Cite
    The Devastator (2023). LongAlpaca-Yukang ML Instructional Outputs [Dataset]. https://www.kaggle.com/datasets/thedevastator/longalpaca-yukang-ml-instructional-outputs
    Available download formats: zip (168273444 bytes)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    LongAlpaca-Yukang ML Instructional Outputs

    Unlocking the Power of AI

    By Huggingface Hub [source]

    About this dataset

    This dataset contains 12000 instructional outputs from the LongAlpaca-Yukang machine learning system, putting cutting-edge AI material in users' hands. With this data, researchers have an abundance of information for exploring how such systems work. The dataset includes the columns output, instruction, file, and input, which open up many possibilities for analysis, from the functioning of AI systems to their implications for everyday life.



    How to use the dataset

    Exploring the Dataset:

    The dataset contains 12000 rows of information, with four columns containing output, instruction, file and input data. You can use these columns to explore the workings of a machine learning system, examine different instructional outputs for different inputs or instructions, study training data for specific ML systems, or analyze files being used by a machine learning system.
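A minimal sketch of this kind of exploration in pandas, using made-up rows in place of the real train.csv:

```python
import pandas as pd

# Illustrative stand-in rows; in practice load the real file with
# pd.read_csv("train.csv").
df = pd.DataFrame({
    "instruction": ["Summarize the paper.", "Summarize the paper.", "Translate to French."],
    "input": ["long text A", "long text B", "bonjour"],
    "file": ["paper_a.txt", "paper_b.txt", "phrase.txt"],
    "output": ["Summary A", "Summary B", "Translation"],
})

# How many examples exist per instruction, and how long are the outputs?
per_instruction = df["instruction"].value_counts()
df["output_len"] = df["output"].str.len()
```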

    Visualizing Data:

    Using built-in plotting tools within your chosen toolkit (such as Python), you can create powerful visualizations. Plotting outputs versus input instructions will give you an overview of what your machine learning system is capable of doing and how it performs on different types of tasks or problems. You could also plot outputs alongside the files being used; this would help identify patterns in training data and areas that need improvement in your machine learning models.

    Analyzing Performance:

    Using statistical analysis techniques such as regressions or clustering algorithms, you can measure performance metrics such as accuracy and understand how they vary across instruction types. Experimenting with hyperparameter tuning may be helpful to see which settings yield better results for any given situation. Additionally, correlations between input samples and output measurements can be examined so that relationships, such as trends in accuracy over certain sets of instructions, can be identified.

    Drawing Conclusions:

    By leveraging big data mining tools, you can build comprehensive predictive models that project future outcomes based on past performance metrics from the various instruction types fed into the system, allowing you to determine whether certain changes improve outcomes over time for the AI model's capability and predictability.

    Research Ideas

    • Developing self-improving Artificial Intelligence algorithms by using the outputs and instructional data to identify correlations and feedback loop structures between instructions and output results.
    • Generating machine learning simulations using this dataset to optimize AI performance based on a given instruction set.
    • Using the instructions, input, and output data in the dataset to build AI systems for natural language processing, enabling comprehensive understanding of user queries and providing more accurate answers accordingly.

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: train.csv

    | Column name | Description                                            |
    |:------------|:-------------------------------------------------------|
    | output      | The output of the instruction given. (String)          |
    | file        | The file used when executing the instruction. (String) |
    | input       | Additional context for the instruction. (String)       |

    Acknowledgements

    If you use this dataset in your research, please credit the original authors and Huggingface Hub.

  4. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset containing a retailer's transaction data, which covers all the transactions that occurred over a period of time. The retailer will use the results to grow its business: by suggesting itemsets to customers, we can increase customer engagement, improve the customer experience, and identify customer behavior. I will approach this problem with association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rules are most useful when you want to discover associations between different objects in a set, and they work well for finding frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.

    An Example of Association Rules

    Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both.
    • Rule: bought computer mouse => bought mouse mat
    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(computer mouse) = 0.08 / 0.10 = 0.8
    • lift = confidence / P(mouse mat) = 0.8 / 0.09 ≈ 8.9
    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
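The toy numbers above can be checked directly in Python, computing confidence as support divided by the probability of the rule's antecedent:

```python
# Toy counts from the example: 100 customers, 10 bought a mouse,
# 9 a mouse mat, 8 both.
n = 100
mouse, mat, both = 10, 9, 8

support = both / n             # P(mouse & mat) = 0.08
confidence = both / mouse      # support / P(mouse) = 0.8
lift = confidence / (mat / n)  # 0.8 / 0.09, about 8.9
```

A lift well above 1 indicates the two items co-occur far more often than independence would predict.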

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data, so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Rows: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.

    (Image: https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png)

    Libraries in R

    First, we need to load the required libraries; each one is described briefly below.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; it also makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

    (Image: https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png)

    Data Pre-processing

    Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.

    (Images: https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png and https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png)

    Next, we clean the data frame by removing missing values.

    (Image: https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png)

    To apply association rule mining, we need to convert the dataframe into transaction data, so that all items bought together in one invoice will be in ...
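The transformation being described, grouping line items by invoice into baskets before mining itemsets, can be sketched in pure Python. The article does this in R with arules; the rows below are illustrative stand-ins for the (BillNo, Itemname) pairs in the spreadsheet.

```python
from collections import Counter
from itertools import combinations

# Illustrative (BillNo, Itemname) rows.
rows = [
    ("536365", "WHITE HANGING HEART"),
    ("536365", "WHITE METAL LANTERN"),
    ("536366", "WHITE HANGING HEART"),
    ("536366", "WHITE METAL LANTERN"),
    ("536367", "RED WOOLLY HOTTIE"),
]

# Group line items by invoice into baskets (the "transaction data").
baskets = {}
for bill_no, item in rows:
    baskets.setdefault(bill_no, set()).add(item)

# Count item pairs across baskets and convert counts to support.
pair_counts = Counter()
for items in baskets.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

n_baskets = len(baskets)
support = {p: c / n_baskets for p, c in pair_counts.items()}
```

This mirrors the first phase of what `apriori()` in arules does; the R package then extends frequent pairs to larger itemsets and derives rules.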

  5. KDD-99 Original dataset

    • kaggle.com
    zip
    Updated Aug 13, 2025
    Cite
    nagi (2025). KDD-99 Original dataset [Dataset]. https://www.kaggle.com/datasets/primus11/kdd-99-original-dataset
    Available download formats: zip (19081776 bytes)
    Authors
    nagi
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    KDD Cup 1999 Dataset

    The KDD Cup 1999 dataset is one of the earliest and most widely used benchmark datasets for network intrusion detection research.
    It was created for the Third International Knowledge Discovery and Data Mining Tools Competition, hosted by the UCI KDD Archive, using network traffic captured in a simulated military environment at the MIT Lincoln Laboratory. The dataset contains both normal and malicious traffic, with attacks grouped into four main categories: Denial of Service (DoS), Probe, Remote to Local (R2L), and User to Root (U2R).

    Key Characteristics

    • Simulated Traffic Environment: Network traffic was generated in a controlled environment to replicate a military network under attack.
    • Attack Categories:
      • DoS: e.g., smurf, neptune, teardrop
      • Probe: e.g., satan, nmap, ipsweep
      • R2L: e.g., guess_passwd, ftp_write, imap
      • U2R: e.g., buffer_overflow, rootkit, perl
    • Data Capture: Raw TCP dump data was processed into connection records.
    • Feature Extraction: Each record contains 41 features, including:
      • Basic features: Duration, protocol type, service, flag
      • Content features: Failed login counts, number of file creations
      • Traffic features: Connection counts within time windows, percentage of specific connections
    • Labeling: Each record is labeled as normal or as one of the specific attack types.
    • Data Volume: Around 4.9 million records in the full dataset; a 10% subset is also available.

    Advantages

    • Established as a historical benchmark in IDS research.
    • Covers multiple attack categories for classification tasks.
    • Suitable for binary classification (normal vs. attack) and multi-class classification (attack type identification).

    Limitations

    • Contains high redundancy (~78% repeated records) which can bias model performance.
    • Traffic patterns are outdated and may not reflect modern threats.
    • Imbalanced distribution of attack categories.

    Usage

    The KDD Cup 1999 dataset has been extensively used in academia for evaluating IDS algorithms due to its:
    • Large size and labeled structure
    • Multiple attack types
    • Historical significance in the development of intrusion detection systems
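Two preprocessing steps implied by the notes above, dropping the roughly 78% duplicate records and collapsing specific attack labels into the four categories plus normal, can be sketched in pandas. The rows below are a tiny illustrative excerpt, not the real 41-feature schema.

```python
import pandas as pd

# Map specific attack labels to the four categories (plus normal),
# using the example labels listed above.
attack_map = {
    "smurf": "DoS", "neptune": "DoS", "teardrop": "DoS",
    "satan": "Probe", "nmap": "Probe", "ipsweep": "Probe",
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "perl": "U2R",
    "normal": "normal",
}

# Illustrative rows; the real data has 41 features per record.
df = pd.DataFrame({
    "duration": [0, 0, 0, 5],
    "protocol_type": ["icmp", "icmp", "tcp", "tcp"],
    "label": ["smurf", "smurf", "normal", "buffer_overflow"],
})

df = df.drop_duplicates()                     # remove repeated records
df["category"] = df["label"].map(attack_map)  # 5-way class label
```

The same `category` column supports binary classification too, by testing `category != "normal"`.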

  6. International Journal of Engineering and Advanced Technology FAQ -...

    • researchhelpdesk.org
    Updated May 28, 2022
    + more versions
    Cite
    Research Help Desk (2022). International Journal of Engineering and Advanced Technology FAQ - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/faq/552/international-journal-of-engineering-and-advanced-technology
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Engineering and Advanced Technology FAQ - ResearchHelpDesk - International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in the months of February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It aims to publish original, theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. All submitted papers will be reviewed by the board of committee of IJEAT. The aims of the IJEAT journal are to:
    • disseminate original, scientific, theoretical or applied research in the field of Engineering and allied fields;
    • provide a platform for publishing results and research with a strong empirical component;
    • bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research;
    • seek original and unpublished research papers based on theoretical or experimental works for publication globally;
    • publish original, theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences;
    • solicit original and unpublished research papers, based on theoretical or experimental works.

    Scope of IJEAT

    International Journal of Engineering and Advanced Technology (IJEAT) covers all topics of all engineering branches. Some of them are Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. The main topics include, but are not limited to:

    1. Smart Computing and Information Processing: Signal and Speech Processing Image Processing and Pattern Recognition WSN Artificial Intelligence and machine learning Data mining and warehousing Data Analytics Deep learning Bioinformatics High Performance computing Advanced Computer networking Cloud Computing IoT Parallel Computing on GPU Human Computer Interactions
    2. Recent Trends in Microelectronics and VLSI Design: Process & Device Technologies Low-power design Nanometer-scale integrated circuits Application specific ICs (ASICs) FPGAs Nanotechnology Nano electronics and Quantum Computing
    3. Challenges of Industry and their Solutions, Communications: Advanced Manufacturing Technologies Artificial Intelligence Autonomous Robots Augmented Reality Big Data Analytics and Business Intelligence Cyber Physical Systems (CPS) Digital Clone or Simulation Industrial Internet of Things (IIoT) Manufacturing IOT Plant Cyber security Smart Solutions – Wearable Sensors and Smart Glasses System Integration Small Batch Manufacturing Visual Analytics Virtual Reality 3D Printing
    4. Internet of Things (IoT): Internet of Things (IoT) & IoE & Edge Computing Distributed Mobile Applications Utilizing IoT Security, Privacy and Trust in IoT & IoE Standards for IoT Applications Ubiquitous Computing Block Chain-enabled IoT Device and Data Security and Privacy Application of WSN in IoT Cloud Resources Utilization in IoT Wireless Access Technologies for IoT Mobile Applications and Services for IoT Machine/ Deep Learning with IoT & IoE Smart Sensors and Internet of Things for Smart City Logic, Functional programming and Microcontrollers for IoT Sensor Networks, Actuators for Internet of Things Data Visualization using IoT IoT Application and Communication Protocol Big Data Analytics for Social Networking using IoT IoT Applications for Smart Cities Emulation and Simulation Methodologies for IoT IoT Applied for Digital Contents
    5. Microwaves and Photonics: Microwave filter Micro Strip antenna Microwave Link design Microwave oscillator Frequency selective surface Microwave Antenna Microwave Photonics Radio over fiber Optical communication Optical oscillator Optical Link design Optical phase lock loop Optical devices
    6. Computation Intelligence and Analytics: Soft Computing Advance Ubiquitous Computing Parallel Computing Distributed Computing Machine Learning Information Retrieval Expert Systems Data Mining Text Mining Data Warehousing Predictive Analysis Data Management Big Data Analytics Big Data Security
    7. Energy Harvesting and Wireless Power Transmission: Energy harvesting and transfer for wireless sensor networks Economics of energy harvesting communications Waveform optimization for wireless power transfer RF Energy Harvesting Wireless Power Transmission Microstrip Antenna design and application Wearable Textile Antenna Luminescence Rectenna
    8. Advance Concept of Networking and Database: Computer Network Mobile Adhoc Network Image Security Application Artificial Intelligence and machine learning in the Field of Network and Database Data Analytic High performance computing Pattern Recognition
    9. Machine Learning (ML) and Knowledge Mining (KM): Regression and prediction Problem solving and planning Clustering Classification Neural information processing Vision and speech perception Heterogeneous and streaming data Natural language processing Probabilistic Models and Methods Reasoning and inference Marketing and social sciences Data mining Knowledge Discovery Web mining Information retrieval Design and diagnosis Game playing Streaming data Music Modelling and Analysis Robotics and control Multi-agent systems Bioinformatics Social sciences Industrial, financial and scientific applications of all kind
    10. Advanced Computer networking: Computational Intelligence Data Management, Exploration, and Mining Robotics Artificial Intelligence and Machine Learning Computer Architecture and VLSI Computer Graphics, Simulation, and Modelling Digital System and Logic Design Natural Language Processing and Machine Translation Parallel and Distributed Algorithms Pattern Recognition and Analysis Systems and Software Engineering Nature Inspired Computing Signal and Image Processing Reconfigurable Computing Cloud, Cluster, Grid and P2P Computing Biomedical Computing Advanced Bioinformatics Green Computing Mobile Computing Nano Ubiquitous Computing Context Awareness and Personalization, Autonomic and Trusted Computing Cryptography and Applied Mathematics Security, Trust and Privacy Digital Rights Management Networked-Driven Multicourse Chips Internet Computing Agricultural Informatics and Communication Community Information Systems Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS Disaster Management e-governance, e-Commerce, e-business, e-Learning Forest Genomics and Informatics Healthcare Informatics Information Ecology and Knowledge Management Irrigation Informatics Neuro-Informatics Open Source: Challenges and opportunities Web-Based Learning: Innovation and Challenges Soft computing Signal and Speech Processing Natural Language Processing
    11. Communications: Microstrip Antenna Microwave Radar and Satellite Smart Antenna MIMO Antenna Wireless Communication RFID Network and Applications 5G Communication 6G Communication
    12. Algorithms and Complexity: Sequential, Parallel And Distributed Algorithms And Data Structures Approximation And Randomized Algorithms Graph Algorithms And Graph Drawing On-Line And Streaming Algorithms Analysis Of Algorithms And Computational Complexity Algorithm Engineering Web Algorithms Exact And Parameterized Computation Algorithmic Game Theory Computational Biology Foundations Of Communication Networks Computational Geometry Discrete Optimization
    13. Software Engineering and Knowledge Engineering: Software Engineering Methodologies Agent-based software engineering Artificial intelligence approaches to software engineering Component-based software engineering Embedded and ubiquitous software engineering Aspect-based software engineering Empirical software engineering Search-Based Software engineering Automated software design and synthesis Computer-supported cooperative work Automated software specification Reverse engineering Software Engineering Techniques and Production Perspectives Requirements engineering Software analysis, design and modelling Software maintenance and evolution Software engineering tools and environments Software engineering decision support Software design patterns Software product lines Process and workflow management Reflection and metadata approaches Program understanding and system maintenance Software domain modelling and analysis Software economics Multimedia and hypermedia software engineering Software engineering case study and experience reports Enterprise software, middleware, and tools Artificial intelligent methods, models, techniques Artificial life and societies Swarm intelligence Smart Spaces Autonomic computing and agent-based systems Autonomic computing Adaptive Systems Agent architectures, ontologies, languages and protocols Multi-agent systems Agent-based learning and knowledge discovery Interface agents Agent-based auctions and marketplaces Secure mobile and multi-agent systems Mobile agents SOA and Service-Oriented Systems Service-centric software engineering Service oriented requirements engineering Service oriented architectures Middleware for service based systems Service discovery and composition Service level agreements (drafting,

  7. Data from: Unveiling the impacts of land use on the phylogeography of...

    • search.dataone.org
    • datasetcatalog.nlm.nih.gov
    • +1more
    Updated Jul 27, 2025
    Cite
    Gabriel Ernesto García Peña; André Víctor Rubio (2025). Unveiling the impacts of land use on the phylogeography of zoonotic New World Hantaviruses [Dataset]. http://doi.org/10.5061/dryad.rv15dv4fq
    Dataset provided by
    Dryad Digital Repository
    Authors
    Gabriel Ernesto García Peña; André Víctor Rubio
    Time period covered
    Jan 1, 2024
    Description

    Billions of genomic sequences are stored in public repositories (NCBI), as well as records of species occurrence (GBIF). By implementing analytical tools from different scientific disciplines, data mining of these databases can be a source of information to aid the global surveillance of zoonotic pathogens that circulate among wildlife. We illustrate this by investigating the hantavirus-rodent system in the Americas, i.e. New World Hantaviruses (NWH). First, we trace the circulation of pathogenic NWH among rodents by inferring the phylogenetic links among 278 genomic samples of the S segment (N protein) of NWH found in 55 species of Cricetidae rodents. Second, machine learning was used to assess the impact of land use on the probability of presence of the rodent species linked with reservoirs of pathogenic hantaviruses. Our results show that hosts are widely present across the Americas. Some hosts are present in the primary forest and agricultural land, but not in the secondary forest; ...

    Data analysis follows 4 main steps:

    Data Collection and Curation. GenBank accession numbers of Hantavirus sequences were obtained from a BLAST query; metadata was collected, taxonomic names were homogenized, and sequences found in wild animals were selected.

    Genetic Sequence Alignment and Phylogenetic Inference. Genetic data was aligned and used to infer the phylogenetic relationships among the samples.

    Phylogenetic Network analysis on the genetic links of Hantaviruses among hosts. A phylogenetic network was built from the phylogenetic tree of New World Hantaviruses.

    Geographic analysis on the habitat suitability of hosts linked in the phylogenetic network. Habitat suitability within the distribution area of each species was modeled with classification trees. Historical records of species presence were used to assess land use change at the time of sampling, and to train a model that predicts the presence of each species from 12 land use variables. These models were used to predict the a...

    ## Unveiling the Impacts of Land Use on the Phylogeography of Zoonotic New World Hantaviruses

    Gabriel E García-Peña and André V. Rubio. Ecography 2024. DOI: 10.1111/ecog.06996

    Supplementary Material

    Description of the data and file structure

    Analyses presented in the main article were performed using R (R Core Team 2022), MAFFT (Katoh 2005), and JModelTest2 (Darriba et al. 2012), following 4 main steps:

    1. Data Collection and Curation.

    BLAST_Nprot.csv: Accession numbers from the BLAST search for Hantavirus. With this list of accession numbers, it is possible to download the genetic sequences in R using the function read.GenBank() from the library ape. Metadata for these sequences can be accessed with the R code presented in the file fetch.metadata.R (see code section).
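The same retrieval step can be sketched outside of R. The short Python sketch below parses an accession list like BLAST_Nprot.csv and builds the corresponding NCBI E-utilities fetch request; the `accession` column name and the dummy accession strings are assumptions for illustration, not taken from the dataset's README.

```python
import csv
import io

def load_accessions(csv_text, column="accession"):
    """Parse accession numbers from a CSV like BLAST_Nprot.csv.
    The header name 'accession' is a guess; adjust to the real file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column].strip() for row in reader if row.get(column)]

def efetch_url(accessions, db="nucleotide"):
    """Build an NCBI E-utilities efetch URL for the accessions,
    the HTTP counterpart of ape::read.GenBank() in R."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    return f"{base}?db={db}&id={','.join(accessions)}&rettype=fasta&retmode=text"

# Dummy accession strings, purely illustrative.
sample = "accession\nAB123456.1\nAB123457.1\n"
accessions = load_accessions(sample)
print(accessions)            # ['AB123456.1', 'AB123457.1']
print(efetch_url(accessions))
```

Fetching the returned URL yields the sequences in FASTA format, ready for the alignment step that follows.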

    2. Genetic Sequence Alignment and Phylogenetic Inference.

    Nprot_MaxAlign.fas: Fasta file with Multiple sequence alignment of the genetic sequences. Fasta file can be read in R...

  8. Forecast revenue big data market worldwide 2011-2027

    • statista.com
    Updated Mar 15, 2018
    Cite
    Statista (2018). Forecast revenue big data market worldwide 2011-2027 [Dataset]. https://www.statista.com/statistics/254266/global-big-data-market-forecast/
    Explore at:
    Dataset updated
    Mar 15, 2018
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    Worldwide
    Description

    The global big data market is forecast to grow to 103 billion U.S. dollars by 2027, more than double its expected market size in 2018. With a share of 45 percent, the software segment would become the largest big data market segment by 2027.

    What is Big data? Big data is a term that refers to data sets that are too large or too complex for traditional data processing applications. It is defined as having one or more of the following characteristics: high volume, high velocity, or high variety. Fast-growing mobile data traffic, cloud computing traffic, and the rapid development of technologies such as artificial intelligence (AI) and the Internet of Things (IoT) all contribute to the increasing volume and complexity of data sets.

    Big data analytics. Advanced analytics tools, such as predictive analytics and data mining, help extract value from the data and generate new business insights. The global big data and business analytics market was valued at 169 billion U.S. dollars in 2018 and is expected to grow to 274 billion U.S. dollars in 2022. As of November 2018, 45 percent of professionals in the market research industry reportedly used big data analytics as a research method.

  9. International Journal of Engineering and Advanced Technology Acceptance Rate...

    • researchhelpdesk.org
    Updated May 1, 2022
    Cite
    Research Help Desk (2022). International Journal of Engineering and Advanced Technology Acceptance Rate - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/acceptance-rate/552/international-journal-of-engineering-and-advanced-technology
    Explore at:
    Dataset updated
    May 1, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Engineering and Advanced Technology Acceptance Rate - ResearchHelpDesk - The International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It publishes original, theoretical, and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering, and all interdisciplinary streams of the engineering sciences. All submitted papers are reviewed by the IJEAT review board.

    Aims of IJEAT: to disseminate original, scientific, theoretical, or applied research in engineering and allied fields; to provide a platform for publishing results and research with a strong empirical component; to bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research; and to solicit original and unpublished research papers, based on theoretical or experimental work, for publication globally.

    Scope of IJEAT: IJEAT covers all topics of all engineering branches, among them Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering, and all interdisciplinary streams of the engineering sciences. The main topics include, but are not limited to:

    1. Smart Computing and Information Processing Signal and Speech Processing Image Processing and Pattern Recognition WSN Artificial Intelligence and machine learning Data mining and warehousing Data Analytics Deep learning Bioinformatics High Performance computing Advanced Computer networking Cloud Computing IoT Parallel Computing on GPU Human Computer Interactions
    2. Recent Trends in Microelectronics and VLSI Design Process & Device Technologies Low-power design Nanometer-scale integrated circuits Application specific ICs (ASICs) FPGAs Nanotechnology Nano electronics and Quantum Computing
    3. Challenges of Industry and their Solutions, Communications Advanced Manufacturing Technologies Artificial Intelligence Autonomous Robots Augmented Reality Big Data Analytics and Business Intelligence Cyber Physical Systems (CPS) Digital Clone or Simulation Industrial Internet of Things (IIoT) Manufacturing IOT Plant Cyber security Smart Solutions – Wearable Sensors and Smart Glasses System Integration Small Batch Manufacturing Visual Analytics Virtual Reality 3D Printing
    4. Internet of Things (IoT) Internet of Things (IoT) & IoE & Edge Computing Distributed Mobile Applications Utilizing IoT Security, Privacy and Trust in IoT & IoE Standards for IoT Applications Ubiquitous Computing Block Chain-enabled IoT Device and Data Security and Privacy Application of WSN in IoT Cloud Resources Utilization in IoT Wireless Access Technologies for IoT Mobile Applications and Services for IoT Machine/ Deep Learning with IoT & IoE Smart Sensors and Internet of Things for Smart City Logic, Functional programming and Microcontrollers for IoT Sensor Networks, Actuators for Internet of Things Data Visualization using IoT IoT Application and Communication Protocol Big Data Analytics for Social Networking using IoT IoT Applications for Smart Cities Emulation and Simulation Methodologies for IoT IoT Applied for Digital Contents
    5. Microwaves and Photonics Microwave filter Micro Strip antenna Microwave Link design Microwave oscillator Frequency selective surface Microwave Antenna Microwave Photonics Radio over fiber Optical communication Optical oscillator Optical Link design Optical phase lock loop Optical devices
    6. Computation Intelligence and Analytics Soft Computing Advance Ubiquitous Computing Parallel Computing Distributed Computing Machine Learning Information Retrieval Expert Systems Data Mining Text Mining Data Warehousing Predictive Analysis Data Management Big Data Analytics Big Data Security
    7. Energy Harvesting and Wireless Power Transmission Energy harvesting and transfer for wireless sensor networks Economics of energy harvesting communications Waveform optimization for wireless power transfer RF Energy Harvesting Wireless Power Transmission Microstrip Antenna design and application Wearable Textile Antenna Luminescence Rectenna
    8. Advance Concept of Networking and Database Computer Network Mobile Adhoc Network Image Security Application Artificial Intelligence and machine learning in the Field of Network and Database Data Analytic High performance computing Pattern Recognition
    9. Machine Learning (ML) and Knowledge Mining (KM) Regression and prediction Problem solving and planning Clustering Classification Neural information processing Vision and speech perception Heterogeneous and streaming data Natural language processing Probabilistic Models and Methods Reasoning and inference Marketing and social sciences Data mining Knowledge Discovery Web mining Information retrieval Design and diagnosis Game playing Streaming data Music Modelling and Analysis Robotics and control Multi-agent systems Bioinformatics Social sciences Industrial, financial and scientific applications of all kind
    10. Advanced Computer networking Computational Intelligence Data Management, Exploration, and Mining Robotics Artificial Intelligence and Machine Learning Computer Architecture and VLSI Computer Graphics, Simulation, and Modelling Digital System and Logic Design Natural Language Processing and Machine Translation Parallel and Distributed Algorithms Pattern Recognition and Analysis Systems and Software Engineering Nature Inspired Computing Signal and Image Processing Reconfigurable Computing Cloud, Cluster, Grid and P2P Computing Biomedical Computing Advanced Bioinformatics Green Computing Mobile Computing Nano Ubiquitous Computing Context Awareness and Personalization, Autonomic and Trusted Computing Cryptography and Applied Mathematics Security, Trust and Privacy Digital Rights Management Networked-Driven Multicourse Chips Internet Computing Agricultural Informatics and Communication Community Information Systems Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS Disaster Management e-governance, e-Commerce, e-business, e-Learning Forest Genomics and Informatics Healthcare Informatics Information Ecology and Knowledge Management Irrigation Informatics Neuro-Informatics Open Source: Challenges and opportunities Web-Based Learning: Innovation and Challenges Soft computing Signal and Speech Processing Natural Language Processing
    11. Communications Microstrip Antenna Microwave Radar and Satellite Smart Antenna MIMO Antenna Wireless Communication RFID Network and Applications 5G Communication 6G Communication
    12. Algorithms and Complexity Sequential, Parallel And Distributed Algorithms And Data Structures Approximation And Randomized Algorithms Graph Algorithms And Graph Drawing On-Line And Streaming Algorithms Analysis Of Algorithms And Computational Complexity Algorithm Engineering Web Algorithms Exact And Parameterized Computation Algorithmic Game Theory Computational Biology Foundations Of Communication Networks Computational Geometry Discrete Optimization
    13. Software Engineering and Knowledge Engineering Software Engineering Methodologies Agent-based software engineering Artificial intelligence approaches to software engineering Component-based software engineering Embedded and ubiquitous software engineering Aspect-based software engineering Empirical software engineering Search-Based Software engineering Automated software design and synthesis Computer-supported cooperative work Automated software specification Reverse engineering Software Engineering Techniques and Production Perspectives Requirements engineering Software analysis, design and modelling Software maintenance and evolution Software engineering tools and environments Software engineering decision support Software design patterns Software product lines Process and workflow management Reflection and metadata approaches Program understanding and system maintenance Software domain modelling and analysis Software economics Multimedia and hypermedia software engineering Software engineering case study and experience reports Enterprise software, middleware, and tools Artificial intelligent methods, models, techniques Artificial life and societies Swarm intelligence Smart Spaces Autonomic computing and agent-based systems Autonomic computing Adaptive Systems Agent architectures, ontologies, languages and protocols Multi-agent systems Agent-based learning and knowledge discovery Interface agents Agent-based auctions and marketplaces Secure mobile and multi-agent systems Mobile agents SOA and Service-Oriented Systems Service-centric software engineering Service oriented requirements engineering Service oriented architectures Middleware for service based systems Service discovery and composition Service level

  10. Product data mining: entity classification&linking

    • kaggle.com
    zip
    Updated Jul 13, 2020
    Cite
    zzhang (2020). Product data mining: entity classification&linking [Dataset]. https://www.kaggle.com/ziqizhang/product-data-miningentity-classificationlinking
    Explore at:
    zip (10933 bytes). Available download formats
    Dataset updated
    Jul 13, 2020
    Authors
    zzhang
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    IMPORTANT: Round 1 results are now released, check our website for the leaderboard. We now open Round 2 submissions!

    1. Overview

    We release two datasets that are part of the Semantic Web Challenge on Mining the Web of HTML-embedded Product Data, co-located with the 19th International Semantic Web Conference (https://iswc2020.semanticweb.org/, 2-6 Nov 2020, Athens, Greece). The datasets belong to two shared tasks related to product data mining on the Web: (1) product matching (linking) and (2) product classification. This event is organised by The University of Sheffield, The University of Mannheim, and Amazon, and is open to anyone. Systems that beat the baseline of their respective task will be invited to write a paper describing their method and system, and to present it as a poster (and potentially also a short talk) at the ISWC2020 conference. Winners of each task will be awarded a 500-euro prize (partly sponsored by Peak Indicators, https://www.peakindicators.com/).

    2. Task and dataset brief

    The challenge organises two tasks, product matching and product categorisation.

    i) Product Matching deals with identifying product offers on different websites that refer to the same real-world product (e.g., the same iPhone X model offered under different names/offer titles as well as different descriptions on various websites). A multi-million product offer corpus (16M) containing product offer clusters is released for the generation of training data. A validation set containing 1.1K offer pairs and a test set of 600 offer pairs will also be released. The goal of this task is to classify whether the offer pairs in these datasets are a match (i.e., referring to the same product) or a non-match.

    ii) Product classification deals with assigning predefined product category labels (which can be multiple levels) to product instances (e.g., iPhone X is a ‘SmartPhone’, and also ‘Electronics’). A training dataset containing 10K product offers, a validation set of 3K product offers and a test set of 3K product offers will be released. Each dataset contains product offers with their metadata (e.g., name, description, URL) and three classification labels each corresponding to a level in the GS1 Global Product Classification taxonomy. The goal is to classify these product offers into the pre-defined category labels.

    All datasets are built based on structured data that was extracted from the Common Crawl (https://commoncrawl.org/) by the Web Data Commons project (http://webdatacommons.org/). Datasets can be found at: https://ir-ischool-uos.github.io/mwpd/
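To give a feel for what a matching system must decide, here is a deliberately naive baseline (not the challenge's official baseline) that labels an offer pair by the Jaccard overlap of their title tokens; the 0.5 threshold and the sample titles are arbitrary illustrations:

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two offer titles."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def is_match(title_a: str, title_b: str, threshold: float = 0.5) -> bool:
    """Classify an offer pair as match / non-match by thresholding
    the title similarity."""
    return jaccard(title_a, title_b) >= threshold

print(is_match("Apple iPhone X 64GB Silver",
               "iPhone X 64GB silver smartphone"))   # True: same product
print(is_match("Apple iPhone X 64GB Silver",
               "Samsung Galaxy S10 128GB"))          # False: different product
```

Real submissions typically replace this with learned matchers trained on the offer-cluster corpus, but the input/output contract (a pair of offers in, a match/non-match label out) is the same.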

    3. Resources and tools

    The challenge will also release utility code (in Python) for processing the above datasets and scoring system outputs, along with the following language resources for product-related data mining tasks: a text corpus of 150 million product offer descriptions, and word embeddings trained on that corpus.

    4. Challenge website

    For details of the challenge please visit https://ir-ischool-uos.github.io/mwpd/

    5. Organizing committee

    Dr Ziqi Zhang (Information School, The University of Sheffield) Prof. Christian Bizer (Institute of Computer Science and Business Informatics, The Mannheim University) Dr Haiping Lu (Department of Computer Science, The University of Sheffield) Dr Jun Ma (Amazon Inc. Seattle, US) Prof. Paul Clough (Information School, The University of Sheffield & Peak Indicators) Ms Anna Primpeli (Institute of Computer Science and Business Informatics, The Mannheim University) Mr Ralph Peeters (Institute of Computer Science and Business Informatics, The Mannheim University) Mr. Abdulkareem Alqusair (Information School, The University of Sheffield)

    6. Contact

    To contact the organising committee please use the Google discussion group https://groups.google.com/forum/#!forum/mwpd2020

  11. HTRU2

    • figshare.com
    zip
    Updated Apr 1, 2016
    Cite
    Robert Lyon (2016). HTRU2 [Dataset]. http://doi.org/10.6084/m9.figshare.3080389.v1
    Explore at:
    zip. Available download formats
    Dataset updated
    Apr 1, 2016
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Robert Lyon
    License

    GNU GPL v3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description
    1. Overview

    HTRU2 is a data set which describes a sample of pulsar candidates collected during the High Time Resolution Universe Survey (South) [1]. Pulsars are a rare type of neutron star that produce radio emission detectable here on Earth. They are of considerable scientific interest as probes of space-time, the inter-stellar medium, and states of matter (see [2] for more uses). As pulsars rotate, their emission beam sweeps across the sky, and when this crosses our line of sight it produces a detectable pattern of broadband radio emission. As pulsars rotate rapidly, this pattern repeats periodically. Thus pulsar search involves looking for periodic radio signals with large radio telescopes.

    Each pulsar produces a slightly different emission pattern, which varies slightly with each rotation (see [2] for an introduction to pulsar astrophysics to find out why). Thus a potential signal detection, known as a 'candidate', is averaged over many rotations of the pulsar, as determined by the length of an observation. In the absence of additional info, each candidate could potentially describe a real pulsar. However, in practice almost all detections are caused by radio frequency interference (RFI) and noise, making legitimate signals hard to find.

    Machine learning tools are now being used to automatically label pulsar candidates to facilitate rapid analysis. Classification systems in particular are being widely adopted (see [4,5,6,7,8,9]), which treat the candidate data sets as binary classification problems. Here the legitimate pulsar examples are a minority positive class, and spurious examples the majority negative class. At present multi-class labels are unavailable, given the costs associated with data annotation.

    The data set shared here contains 16,259 spurious examples caused by RFI/noise, and 1,639 real pulsar examples. These examples have all been checked by human annotators. Each candidate is described by 8 continuous variables. The first four are simple statistics obtained from the integrated pulse profile (folded profile). This is an array of continuous variables that describe a longitude-resolved version of the signal that has been averaged in both time and frequency (see [3] for more details). The remaining four variables are similarly obtained from the DM-SNR curve (again see [3] for more details). These are summarised below:

    1. Mean of the integrated profile.
    2. Standard deviation of the integrated profile.
    3. Excess kurtosis of the integrated profile.
    4. Skewness of the integrated profile.
    5. Mean of the DM-SNR curve.
    6. Standard deviation of the DM-SNR curve.
    7. Excess kurtosis of the DM-SNR curve.
    8. Skewness of the DM-SNR curve.

    HTRU 2 Summary: 17,898 total examples; 1,639 positive examples; 16,259 negative examples.

    The data is presented in two formats: CSV and ARFF (used by the WEKA data mining tool). Candidates are stored in both files in separate rows, one candidate per row. Each row lists the variables first, and the class label is the final entry. The class labels used are 0 (negative) and 1 (positive). Please note that the data contains no positional information or other astronomical details. It is simply feature data extracted from candidate files using the PulsarFeatureLab tool (see [10]).

    2. Citing our work

    If you use the dataset in your work please cite us using the DOI of the dataset, and the paper: R. J. Lyon, B. W. Stappers, S. Cooper, J. M. Brooke, J. D. Knowles, "Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach", MNRAS, 2016.

    3. Acknowledgements

    This data was obtained with the support of grant EP/I028099/1 for the University of Manchester Centre for Doctoral Training in Computer Science, from the UK Engineering and Physical Sciences Research Council (EPSRC). The raw observational data was collected by the High Time Resolution Universe Collaboration using the Parkes Observatory, funded by the Commonwealth of Australia and managed by the CSIRO.

    4. References

    [1] M. J. Keith et al., "The High Time Resolution Universe Pulsar Survey - I. System Configuration and Initial Discoveries", Monthly Notices of the Royal Astronomical Society, vol. 409, pp. 619-627, 2010. DOI: 10.1111/j.1365-2966.2010.17325.x
    [2] D. R. Lorimer and M. Kramer, "Handbook of Pulsar Astronomy", Cambridge University Press, 2005.
    [3] R. J. Lyon, "Why Are Pulsars Hard To Find?", PhD Thesis, University of Manchester, 2015.
    [4] R. J. Lyon et al., "Fifty Years of Pulsar Candidate Selection: From simple filters to a new principled real-time classification approach", Monthly Notices of the Royal Astronomical Society, submitted.
    [5] R. P. Eatough et al., "Selection of radio pulsar candidates using artificial neural networks", Monthly Notices of the Royal Astronomical Society, vol. 407, no. 4, pp. 2443-2450, 2010.
    [6] S. D. Bates et al., "The high time resolution universe pulsar survey vi. an artificial neural network and timing of 75 pulsars", Monthly Notices of the Royal Astronomical Society, vol. 427, no. 2, pp. 1052-1065, 2012.
    [7] D. Thornton, "The High Time Resolution Radio Sky", PhD thesis, University of Manchester, Jodrell Bank Centre for Astrophysics, School of Physics and Astronomy, 2013.
    [8] K. J. Lee et al., "PEACE: pulsar evaluation algorithm for candidate extraction - a software package for post-analysis processing of pulsar survey candidates", Monthly Notices of the Royal Astronomical Society, vol. 433, no. 1, pp. 688-694, 2013.
    [9] V. Morello et al., "SPINN: a straightforward machine learning solution to the pulsar candidate selection problem", Monthly Notices of the Royal Astronomical Society, vol. 443, no. 2, pp. 1651-1662, 2014.
    [10] R. J. Lyon, "PulsarFeatureLab", 2015, https://dx.doi.org/10.6084/m9.figshare.1536472.v1.
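The eight HTRU2 features reduce to one small routine applied to two arrays (the folded profile and the DM-SNR curve). A minimal NumPy sketch of the four statistics, assuming the population (biased) moment definitions; the toy profile is illustrative only:

```python
import numpy as np

def profile_stats(signal):
    """Mean, standard deviation, excess kurtosis and skewness of a 1-D
    array: the four statistics HTRU2 derives from the integrated pulse
    profile, and again from the DM-SNR curve."""
    x = np.asarray(signal, dtype=float)
    mu = x.mean()
    sigma = x.std()                            # population standard deviation
    z = (x - mu) / sigma                       # standardised deviations
    skewness = np.mean(z ** 3)                 # 0 for a symmetric signal
    excess_kurtosis = np.mean(z ** 4) - 3.0    # 0 for a Gaussian
    return mu, sigma, excess_kurtosis, skewness

toy_profile = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])  # toy folded profile
print(profile_stats(toy_profile))
```

Running this on both curves and appending the 0/1 class label reproduces the layout of one CSV row of the dataset (features first, label last).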
  12. Aggregated score derived from: Accuracy, F-measure, Matthews correlation...

    • figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Shib Sankar Bhowmick; Indrajit Saha; Debotosh Bhattacharjee; Loredana M. Genovese; Filippo Geraci (2023). Aggregated score derived from: Accuracy, F-measure, Matthews correlation coefficient (MCC) and Area under the curve (AUC) for SVM classification after different feature selection methods. [Dataset]. http://doi.org/10.1371/journal.pone.0200353.t004
    Explore at:
    xls. Available download formats
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Shib Sankar Bhowmick; Indrajit Saha; Debotosh Bhattacharjee; Loredana M. Genovese; Filippo Geraci
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The aggregated score (AS) is bounded in the range [0, 4].
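The entry does not spell out how the four metrics are combined. One combination consistent with the stated [0, 4] bound is a plain sum with MCC rescaled from [-1, 1] to [0, 1]; this is an assumption for illustration, not the paper's stated formula:

```python
def aggregated_score(accuracy, f_measure, mcc, auc):
    """Hypothetical aggregated score (AS): accuracy, F-measure and AUC
    each lie in [0, 1]; MCC lies in [-1, 1] and is rescaled to [0, 1],
    so the sum is bounded in [0, 4]. Assumed form, for illustration."""
    return accuracy + f_measure + (mcc + 1.0) / 2.0 + auc

print(aggregated_score(1.0, 1.0, 1.0, 1.0))    # 4.0: perfect classifier
print(aggregated_score(0.0, 0.0, -1.0, 0.0))   # 0.0: worst case
```

Whatever the exact weighting, a single bounded score lets the feature selection methods in the table be ranked on one axis instead of four.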

  13. The Orange workflow for observing collocation trends ColTrend 1.0

    • live.european-language-grid.eu
    Updated Oct 25, 2020
    Cite
    (2020). The Orange workflow for observing collocation trends ColTrend 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/20150
    Explore at:
    Dataset updated
    Oct 25, 2020
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    The Orange workflow for observing collocation trends ColTrend 1.0

    ColTrend is a workflow (.OWS file) for Orange Data Mining (an open-source machine learning and data visualization software: https://orangedatamining.com/) that allows the user to observe temporal collocation trends in corpora. The workflow consists of a series of Python scripts, data filters, and visualizers.

    As input, the workflow takes a .CSV file with data on collocations and their relative frequencies by year of publication extracted from a corpus. As output, it provides a .TSV file containing the same data (or a filtered selection thereof) enriched with four measures that indicate the collocation’s temporal trend in the corpus: (1) the slope (k) of a linear regression model fitted to the frequency data, which indicates whether the frequency of use of the collocation is increasing or declining; (2) the coefficient of determination (R2) of the linear regression model, indicating how linear the change in the collocation’s use is; (3) the ratio (m) of maximum relative frequency and average relative frequency, which indicates peaks in collocation usage; and (4) the coefficient of recent growth (t), which indicates an increased usage of the collocation in the last three years of the observed corpus data.
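The four measures above can be computed directly from a collocation's yearly relative frequencies. A NumPy sketch follows; the exact formula for the recent-growth coefficient t is an assumption here (mean of the last three years over the overall mean), so consult the README for ColTrend's actual definitions:

```python
import numpy as np

def coltrend_measures(years, rel_freq):
    """Trend measures for one collocation: slope k and R2 of a linear
    fit, peak ratio m, and a recent-growth coefficient t (assumed to be
    the mean of the last three years over the overall mean)."""
    years = np.asarray(years, dtype=float)
    f = np.asarray(rel_freq, dtype=float)
    k, intercept = np.polyfit(years, f, 1)        # (1) slope of linear fit
    fitted = k * years + intercept
    r2 = 1.0 - np.sum((f - fitted) ** 2) / np.sum((f - f.mean()) ** 2)  # (2)
    m = f.max() / f.mean()                        # (3) max / average frequency
    t = f[-3:].mean() / f.mean()                  # (4) recent growth (assumed)
    return k, r2, m, t

years = [2015, 2016, 2017, 2018, 2019, 2020]
freq = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]             # steadily rising collocation
print(coltrend_measures(years, freq))
```

For this perfectly linear toy series, k is positive and R2 is 1, i.e. a steadily growing collocation with no isolated usage peak.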

    The entry also contains three .CSV files that can be used to test the workflow. The files contain collocation candidates (along with their relative frequencies per year of publication) extracted from the Gigafida 2.0 Corpus of Written Slovene (https://viri.cjvt.si/gigafida/) with three different syntactic structures (as defined in http://hdl.handle.net/11356/1415): 1) p0-s0 (adjective + noun, e.g. rezervni sklad), 2) s0-s2 (noun + noun in the genitive case, e.g. ukinitev lastnine), and 3) gg-s4 (verb + noun in the accusative case, e.g. pripraviti besedilo).

    Note that only collocation candidates with an absolute frequency of 15 or above were extracted.

    Please note that the ColTrend workflow requires the installation of the Text Mining add-on for Orange. For installation instructions as well as a more detailed description of the different phases of the workflow and the measures used to observe the collocation trends, please consult the README file.

  14. International Journal of Engineering and Advanced Technology Impact Factor...

    • researchhelpdesk.org
    Updated Feb 23, 2022
    Cite
    Research Help Desk (2022). International Journal of Engineering and Advanced Technology Impact Factor 2024-2025 - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/impact-factor-if/552/international-journal-of-engineering-and-advanced-technology
    Explore at:
    Dataset updated
    Feb 23, 2022
    Dataset authored and provided by
    Research Help Desk
    Description

    International Journal of Engineering and Advanced Technology Impact Factor 2024-2025 - ResearchHelpDesk - The International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It publishes original, theoretical, and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering, and all interdisciplinary streams of the engineering sciences. All submitted papers are reviewed by the IJEAT review board.

    Aims of IJEAT: to disseminate original, scientific, theoretical, or applied research in engineering and allied fields; to provide a platform for publishing results and research with a strong empirical component; to bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research; and to solicit original and unpublished research papers, based on theoretical or experimental work, for publication globally.

    Scope of IJEAT: IJEAT covers all topics of all engineering branches, among them Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering, and all interdisciplinary streams of the engineering sciences. The main topics include, but are not limited to:

    1. Smart Computing and Information Processing Signal and Speech Processing Image Processing and Pattern Recognition WSN Artificial Intelligence and machine learning Data mining and warehousing Data Analytics Deep learning Bioinformatics High Performance computing Advanced Computer networking Cloud Computing IoT Parallel Computing on GPU Human Computer Interactions
    2. Recent Trends in Microelectronics and VLSI Design Process & Device Technologies Low-power design Nanometer-scale integrated circuits Application specific ICs (ASICs) FPGAs Nanotechnology Nano electronics and Quantum Computing
    3. Challenges of Industry and their Solutions, Communications Advanced Manufacturing Technologies Artificial Intelligence Autonomous Robots Augmented Reality Big Data Analytics and Business Intelligence Cyber Physical Systems (CPS) Digital Clone or Simulation Industrial Internet of Things (IIoT) Manufacturing IOT Plant Cyber security Smart Solutions – Wearable Sensors and Smart Glasses System Integration Small Batch Manufacturing Visual Analytics Virtual Reality 3D Printing
    4. Internet of Things (IoT) Internet of Things (IoT) & IoE & Edge Computing Distributed Mobile Applications Utilizing IoT Security, Privacy and Trust in IoT & IoE Standards for IoT Applications Ubiquitous Computing Block Chain-enabled IoT Device and Data Security and Privacy Application of WSN in IoT Cloud Resources Utilization in IoT Wireless Access Technologies for IoT Mobile Applications and Services for IoT Machine/ Deep Learning with IoT & IoE Smart Sensors and Internet of Things for Smart City Logic, Functional programming and Microcontrollers for IoT Sensor Networks, Actuators for Internet of Things Data Visualization using IoT IoT Application and Communication Protocol Big Data Analytics for Social Networking using IoT IoT Applications for Smart Cities Emulation and Simulation Methodologies for IoT IoT Applied for Digital Contents
    5. Microwaves and Photonics Microwave filter Micro Strip antenna Microwave Link design Microwave oscillator Frequency selective surface Microwave Antenna Microwave Photonics Radio over fiber Optical communication Optical oscillator Optical Link design Optical phase lock loop Optical devices
    6. Computation Intelligence and Analytics Soft Computing Advance Ubiquitous Computing Parallel Computing Distributed Computing Machine Learning Information Retrieval Expert Systems Data Mining Text Mining Data Warehousing Predictive Analysis Data Management Big Data Analytics Big Data Security
    7. Energy Harvesting and Wireless Power Transmission Energy harvesting and transfer for wireless sensor networks Economics of energy harvesting communications Waveform optimization for wireless power transfer RF Energy Harvesting Wireless Power Transmission Microstrip Antenna design and application Wearable Textile Antenna Luminescence Rectenna
    8. Advance Concept of Networking and Database Computer Network Mobile Adhoc Network Image Security Application Artificial Intelligence and machine learning in the Field of Network and Database Data Analytic High performance computing Pattern Recognition
    9. Machine Learning (ML) and Knowledge Mining (KM) Regression and prediction Problem solving and planning Clustering Classification Neural information processing Vision and speech perception Heterogeneous and streaming data Natural language processing Probabilistic Models and Methods Reasoning and inference Marketing and social sciences Data mining Knowledge Discovery Web mining Information retrieval Design and diagnosis Game playing Streaming data Music Modelling and Analysis Robotics and control Multi-agent systems Bioinformatics Social sciences Industrial, financial and scientific applications of all kind
    10. Advanced Computer networking Computational Intelligence Data Management, Exploration, and Mining Robotics Artificial Intelligence and Machine Learning Computer Architecture and VLSI Computer Graphics, Simulation, and Modelling Digital System and Logic Design Natural Language Processing and Machine Translation Parallel and Distributed Algorithms Pattern Recognition and Analysis Systems and Software Engineering Nature Inspired Computing Signal and Image Processing Reconfigurable Computing Cloud, Cluster, Grid and P2P Computing Biomedical Computing Advanced Bioinformatics Green Computing Mobile Computing Nano Ubiquitous Computing Context Awareness and Personalization, Autonomic and Trusted Computing Cryptography and Applied Mathematics Security, Trust and Privacy Digital Rights Management Networked-Driven Multicourse Chips Internet Computing Agricultural Informatics and Communication Community Information Systems Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS Disaster Management e-governance, e-Commerce, e-business, e-Learning Forest Genomics and Informatics Healthcare Informatics
Information Ecology and Knowledge Management Irrigation Informatics Neuro-Informatics Open Source: Challenges and opportunities Web-Based Learning: Innovation and Challenges Soft computing Signal and Speech Processing Natural Language Processing 11. Communications Microstrip Antenna Microwave Radar and Satellite Smart Antenna MIMO Antenna Wireless Communication RFID Network and Applications 5G Communication 6G Communication 12. Algorithms and Complexity Sequential, Parallel And Distributed Algorithms And Data Structures Approximation And Randomized Algorithms Graph Algorithms And Graph Drawing On-Line And Streaming Algorithms Analysis Of Algorithms And Computational Complexity Algorithm Engineering Web Algorithms Exact And Parameterized Computation Algorithmic Game Theory Computational Biology Foundations Of Communication Networks Computational Geometry Discrete Optimization 13. Software Engineering and Knowledge Engineering Software Engineering Methodologies Agent-based software engineering Artificial intelligence approaches to software engineering Component-based software engineering Embedded and ubiquitous software engineering Aspect-based software engineering Empirical software engineering Search-Based Software engineering Automated software design and synthesis Computer-supported cooperative work Automated software specification Reverse engineering Software Engineering Techniques and Production Perspectives Requirements engineering Software analysis, design and modelling Software maintenance and evolution Software engineering tools and environments Software engineering decision support Software design patterns Software product lines Process and workflow management Reflection and metadata approaches Program understanding and system maintenance Software domain modelling and analysis Software economics Multimedia and hypermedia software engineering Software engineering case study and experience reports Enterprise software, middleware, and tools Artificial 
intelligent methods, models, techniques Artificial life and societies Swarm intelligence Smart Spaces Autonomic computing and agent-based systems Autonomic computing Adaptive Systems Agent architectures, ontologies, languages and protocols Multi-agent systems Agent-based learning and knowledge discovery Interface agents Agent-based auctions and marketplaces Secure mobile and multi-agent systems Mobile agents SOA and Service-Oriented Systems Service-centric software engineering Service oriented requirements engineering Service oriented architectures Middleware for service based systems Service discovery and composition Service level

  15. Additional file 1 of MGS2AMR: a gene-centric mining of metagenomic...

    • springernature.figshare.com
    zip
    Updated Aug 16, 2024
    Pieter-Jan Van Camp; V. B. Surya Prasath; David B. Haslam; Aleksey Porollo (2024). Additional file 1 of MGS2AMR: a gene-centric mining of metagenomic sequencing data for pathogens and their antimicrobial resistance profile [Dataset]. http://doi.org/10.6084/m9.figshare.24309905.v1
    Explore at:
    zip
    Available download formats
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    figshare (http://figshare.com/)
    Authors
    Pieter-Jan Van Camp; V. B. Surya Prasath; David B. Haslam; Aleksey Porollo
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1: Fig. S1. Resolving shortest paths with loops in GFA. The green segment is the start and end of the loop. 1. Loop that begins and ends on different sides of the start-segment. Resolved by generating two paths, (A,B,C,D) and (A,D,C,B). Note that the sequence direction of A differs between the two paths. 2. Loop that begins and ends on the same end of the start-segment. Resolved similarly to Loop 1, but the direction of A is identical in both paths. 3. Hairpin loop with repeated segments A, B and C. Resolved by creating two paths, (A,B,C,D,E,F) and (A,B,C,F,E,D). 4. Hairpin loop with different start- (A) and end- (H) segments. Resolved by removing all path data (G and H) after the repeated segment (C), reducing the problem to the hairpin loop in example 3 with the same solutions: (A,B,C,D,E,F) and (A,B,C,F,E,D).
    Fig. S2. Example of the evaluation of homology matches. The seed segments of ARG1 and ARG2 both match a reference genome at the same position, indicating that they refer to the same ARG. The position of segment 4 in the reference genome does not align with the expected distance from the ARG as represented in the GFA of ARG1, suggesting it is likely a false positive match; it is therefore excluded from further analysis.
    Fig. S3. Bacteria associated with the 6 bacteria used in validation. This heatmap shows which bacterial sequences (genome or plasmid) also tend to score high when the known presence is one of the 6 used in validation. It reflects the uncertainty that comes with bacterial calling in metagenomics.
    Fig. S4. MGS2AMR run time and memory usage for 5 benchmarking samples. All tools were allowed to use up to 8 CPUs. The numbers 1 through 5 refer to the file IDs in Table S3. The four main pipeline steps are denoted as follows: A. MetaCherchant (existing tool). B. The MetaCherchant output pre-processing for BLAST (novel R scripts). C. BLAST+ (existing tool). D. ARG annotation (novel R scripts). Note that the large leap in memory for BLASTn is almost entirely explained by having to load the nucleotide database into memory (~150 GB).
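    The hairpin rule from Fig. S1 (example 3) amounts to emitting the loop body in both orientations. A minimal illustrative sketch follows; the helper name and list representation are invented here for illustration and are not MGS2AMR's actual R code, which operates on real GFA graphs:

    ```python
    # Illustrative toy (not MGS2AMR's implementation) of the hairpin-loop
    # rule from Fig. S1, example 3: when a path re-enters a repeated run
    # of segments, two candidate paths are emitted, one traversing the
    # ambiguous loop body forward and one in reverse.

    def resolve_hairpin(prefix, loop_body):
        """Return the two candidate paths for a hairpin loop.

        prefix    -- segments up to and including the repeated run (e.g. A,B,C)
        loop_body -- the ambiguous segments after the repeat (e.g. D,E,F)
        """
        forward = prefix + loop_body
        backward = prefix + list(reversed(loop_body))
        return [forward, backward]

    paths = resolve_hairpin(["A", "B", "C"], ["D", "E", "F"])
    print(paths)  # two paths: A,B,C,D,E,F and A,B,C,F,E,D
    ```

    Rules 1 and 2 in Fig. S1 differ only in whether the start-segment's orientation flips between the two emitted paths, and rule 4 reduces to this case after trimming the segments past the repeat.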

  16. Spreads Market Analysis, Size, and Forecast 2024-2028: Europe (France,...

    • technavio.com
    pdf
    Updated Oct 17, 2024
    Technavio (2024). Spreads Market Analysis, Size, and Forecast 2024-2028: Europe (France, Germany, Italy, Spain, UK), North America (Canada and Mexico), APAC (China, India, Japan, South Korea), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/spreads-market-industry-analysis
    Explore at:
    pdf
    Available download formats
    Dataset updated
    Oct 17, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    License

    https://www.technavio.com/content/privacy-notice

    Time period covered
    2024 - 2028
    Area covered
    France, Germany, United Kingdom, Italy, Japan, Spain, Canada, Brazil, Mexico
    Description

    Snapshot img

    Spreads Market Size 2024-2028

    The spreads market size is forecast to increase by USD 7.52 billion at a CAGR of 4.2% between 2023 and 2028.

    The market is experiencing significant growth, driven primarily by the increasing trend towards on-the-go consumption and the growing popularity of e-commerce channels. Consumers' busy lifestyles have led to a surge in demand for convenient and portable food options, including spreads and sandwiches. Moreover, the rise of e-commerce platforms has made it easier for consumers to access a wide range of spreads from various brands, further fueling market growth. However, the market also faces challenges. One major obstacle is the health concerns associated with spreads, particularly those high in sugar and saturated fats.
    As consumers become more health-conscious, there is a growing demand for healthier spread options. Another challenge is the intense competition in the market, with numerous players vying for market share. Companies must differentiate themselves by offering unique and innovative products to meet the evolving needs and preferences of consumers. To capitalize on opportunities and navigate challenges effectively, market participants must stay abreast of consumer trends and respond with agility and innovation.
    

    What will be the Size of the Spreads Market during the forecast period?

    Request Free Sample

    The market continues to evolve, with financial institutions increasingly relying on advanced data analysis techniques to gain insights and make informed decisions. Data quality is paramount, as enterprise solutions implement data warehousing and financial modeling to ensure accurate and reliable information. Data governance and marketing analysis employ machine learning and sales forecasting to identify trends and patterns in big data. Freemium models and artificial intelligence are transforming customer segmentation, enabling businesses to target their offerings more effectively. Cloud computing platforms and spreadsheet software offer user-friendly data dashboards for business process automation and user experience optimization.
    Predictive modeling and collaboration tools facilitate real-time data analysis and scenario planning for investment firms. Business intelligence software and data visualization tools provide valuable insights for business users, while risk management and operations optimization rely on prescriptive analytics and data analytics software. Portfolio management and investment analysis benefit from interactive reports and data integration, enabling advanced analytics and mobile accessibility. Data storytelling and user interfaces enhance the value of data, while data security remains a critical concern. Subscription models and project management tools enable data mining and workflow automation for power users. The continuous dynamism of the market underscores the importance of staying informed and adaptable to evolving trends and patterns.
    

    How is this Spreads Industry segmented?

    The spreads industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.

    Distribution Channel
    
      Offline
      Online
    
    
    End-User
    
      Households
      Food Service
      Industrial
    
    
    Product Type
    
      Jams & Jellies
      Nut Butters
      Cheese Spreads
      Savory Spreads
    
    
    Packaging
    
      Jars
      Tubes
      Packets
    
    
    Geography
    
      North America
    
        US
        Canada
        Mexico
    
    
      Europe
    
        France
        Germany
        Italy
        Spain
        UK
    
    
      Middle East and Africa
    
        UAE
    
    
      APAC
    
        China
        India
        Japan
        South Korea
    
    
      South America
    
        Brazil
    
    
      Rest of World (ROW)
    

    By Distribution Channel Insights

    The offline segment is estimated to witness significant growth during the forecast period.

    The market encompasses various retail sectors, including department stores, supermarkets, hypermarkets, convenience stores, and restaurants. Major retail chains, such as Tesco Plc (Tesco) and Walmart Inc. (Walmart), have dedicated sections for spreads, offering a diverse range of butter, fruit, and chocolate spreads. Companies employ marketing strategies, such as branding through signage and discounts on product packages, to attract consumers. Walmart and Walgreens are long-standing retailers of spreads. Operating in the organized retail sector, companies consider factors like geographical presence, ease of production and inventory management, and goods transportation. Businesses utilize enterprise solutions, such as data warehousing, financial modeling, and data governance, to manage their spreads offerings.

    Machine learning and predictive analytics enable sales forecasting and customer segmentation. Data visualization tools help in data storytelling and risk management. Cloud-based platforms facilitate business planning and col

  17. The Orange workflow for observing collocation clusters ColEmbed 1.0

    • live.european-language-grid.eu
    Updated Oct 25, 2020
    (2020). The Orange workflow for observing collocation clusters ColEmbed 1.0 [Dataset]. https://live.european-language-grid.eu/catalogue/tool-service/20151
    Explore at:
    Dataset updated
    Oct 25, 2020
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The Orange Workflow for Observing Collocation Clusters ColEmbed 1.0

    ColEmbed is a workflow (.OWS file) for Orange Data Mining (an open-source machine learning and data visualization software: https://orangedatamining.com/) that allows the user to observe clusters of collocation candidates extracted from corpora. The workflow consists of a series of data filters, embedding processors, and visualizers.

    As input, the workflow takes a tab-separated file (.TSV/.TAB) with data on collocations extracted from a corpus, along with their relative frequencies by year of publication and other optional values (such as information on temporal trends). The workflow allows the user to select the features that are then used to cluster collocation candidates, along with the embeddings generated from the selected lemmas. Either one lemma or both lemmas can be selected, depending on the clustering criteria; for instance, to cluster adjective+noun candidates based on the similarities of their noun components, only the second lemma is taken into account in embedding generation. The obtained embedding clusters can be visualized and further processed (e.g. by finding the closest neighbors of a reference collocation). The workflow is described in more detail in the accompanying README file.

    The entry also contains three .TAB files that can be used to test the workflow. The files contain collocation candidates (along with their relative frequencies per year of publication and four measures describing their temporal trends; see http://hdl.handle.net/11356/1424 for more details) extracted from the Gigafida 2.0 Corpus of Written Slovene (https://viri.cjvt.si/gigafida/) with three different syntactic structures (as defined in http://hdl.handle.net/11356/1415): 1) p0-s0 (adjective + noun, e.g. rezervni sklad), 2) s0-s2 (noun + noun in the genitive case, e.g. ukinitev lastnine), and 3) gg-s4 (verb + noun in the accusative case, e.g. pripraviti besedilo).

    It should be noted that only collocation candidates with absolute frequency of 15 and above were extracted.

    Please note that the ColEmbed workflow requires the installation of the Text Mining add-on for Orange. For installation instructions as well as a more detailed description of the different phases of the workflow and the measures used to observe the collocation trends, please consult the README file.
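    Outside Orange, the core idea of the workflow (embed the selected lemma of each collocation candidate, then inspect neighbors by similarity) can be sketched in plain Python. The vectors and the collocation "denarni sklad" below are invented for illustration; the real workflow uses corpus-derived embeddings inside the Orange GUI:

    ```python
    # Toy sketch of the ColEmbed idea: find the nearest neighbors of a
    # reference collocation by cosine similarity over embeddings keyed
    # by the noun component (clustering on the second lemma only).
    # The vectors and the candidate "denarni sklad" are made up.
    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    # Hypothetical embeddings for three collocation candidates.
    embeddings = {
        "rezervni sklad":      [0.9, 0.1, 0.0],
        "denarni sklad":       [0.8, 0.2, 0.1],
        "pripraviti besedilo": [0.1, 0.9, 0.3],
    }

    def nearest(reference, k=1):
        """Return the k candidates most similar to the reference."""
        others = [(c, cosine(embeddings[reference], v))
                  for c, v in embeddings.items() if c != reference]
        others.sort(key=lambda cv: cv[1], reverse=True)
        return [c for c, _ in others[:k]]

    print(nearest("rezervni sklad"))  # -> ['denarni sklad']
    ```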

  18. Common pre-diagnostic features in individuals with different rare diseases...

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jun 1, 2023
    Lorenz Grigull; Sandra Mehmecke; Ann-Katrin Rother; Susanne Blöß; Christian Klemann; Ulrike Schumacher; Urs Mücke; Xiaowei Kortum; Werner Lechner; Frank Klawonn (2023). Common pre-diagnostic features in individuals with different rare diseases represent a key for diagnostic support with computerized pattern recognition? [Dataset]. http://doi.org/10.1371/journal.pone.0222637
    Explore at:
    pdf
    Available download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Lorenz Grigull; Sandra Mehmecke; Ann-Katrin Rother; Susanne Blöß; Christian Klemann; Ulrike Schumacher; Urs Mücke; Xiaowei Kortum; Werner Lechner; Frank Klawonn
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background: Rare diseases (RD) result in a wide variety of clinical presentations, creating a significant diagnostic challenge for health care professionals. We hypothesized that there exists a set of consistent, shared phenomena among all individuals affected by (different) RD during the time before diagnosis is established.
    Objective: We aimed to identify commonalities between different RD and developed a machine learning diagnostic support tool for RD.
    Methods: 20 interviews with affected individuals with different RD, focusing on the time period before their diagnosis, were performed and qualitatively analyzed. From these pre-diagnostic experiences we distilled key phenomena and created a questionnaire, which was then distributed among individuals with an established diagnosis of i.) RD, ii.) other common non-rare diseases (NRO), iii.) common chronic diseases (CD), or iv.) psychosomatic/somatoform disorders (PSY). Finally, four single machine learning methods combined with a fusion algorithm were used to distinguish the different answer patterns of the questionnaires.
    Results: The questionnaire contained 53 questions. A total of 1763 questionnaires (758 RD, 149 CD, 48 PSY, 200 NRO, 34 healthy individuals and 574 not evaluable questionnaires) were collected. Based on 3 independent data sets, the 10-fold stratified cross-validation method for answer-pattern recognition yielded sensitivity values of 88.9% for detecting the answer pattern of a RD, 86.6% for NRO, 87.7% for CD and 84.2% for PSY.
    Conclusion: Despite the great diversity in presentation and pathogenesis of each RD, patients with RD share surprisingly similar pre-diagnosis experiences. Our questionnaire- and data-mining-based approach successfully detected unique patterns in groups of individuals affected by a broad range of different rare diseases. These results therefore indicate distinct patterns that may be used for diagnostic support in RD.
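    The fusion step can be illustrated with a toy majority vote over the four diagnostic groups. The paper's actual fusion algorithm and its base classifiers are more elaborate; the group labels follow the abstract, but the per-classifier predictions below are invented:

    ```python
    # Toy sketch of classifier fusion: several base classifiers each
    # predict a diagnostic group for a questionnaire, and a simple
    # majority vote fuses them (ties broken by the order of GROUPS).
    # The real study combines four machine learning methods with a
    # dedicated fusion algorithm; this is only an illustration.
    from collections import Counter

    GROUPS = ["RD", "NRO", "CD", "PSY"]

    def fuse(predictions):
        """Return the group predicted by the most base classifiers."""
        counts = Counter(predictions)
        return max(GROUPS, key=lambda g: (counts[g], -GROUPS.index(g)))

    # Three of four hypothetical base classifiers agree on "RD":
    print(fuse(["RD", "RD", "NRO", "RD"]))  # RD
    ```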

  19. Galatanet dataset

    • zenodo.org
    • data.niaid.nih.gov
    bin, csv, png, txt +1
    Updated Oct 1, 2024
    Vincent Labatut; Vincent Labatut; Jean-Michel Balasque; Jean-Michel Balasque (2024). Galatanet dataset [Dataset]. http://doi.org/10.5281/zenodo.6811542
    Explore at:
    bin, txt, csv, png, zip
    Available download formats
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Vincent Labatut; Vincent Labatut; Jean-Michel Balasque; Jean-Michel Balasque
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Description. This project contains the dataset from the Galatanet survey, conducted in 2009 and 2010 at Galatasaray University in Istanbul (Turkey). The goal of this survey was to retrieve information regarding the social relationships between students, their feelings regarding the university in general, and their purchase behavior. The survey was conducted in two phases: the first in 2009 and the second in 2010.

    The dataset includes two kinds of data. First, the answers to most of the questions are contained in a large table, available in both CSV and MS Excel formats. A description file explains the meaning of each field appearing in the table. Note that the survey form is also contained in the archive, for reference (it is in French and Turkish only, though). Second, the social network of students is available in both Pajek and GraphML formats. Having both individual (nodal attributes) and relational (links) information in the same dataset is, to our knowledge, rare and difficult to find in public sources, and this, in our opinion, makes the dataset interesting and valuable.

    All data are completely anonymous: students' names have been replaced by random numbers. Note that the survey is not exactly the same between the two phases: some small adjustments were made based on feedback from the first phase (the datasets have since been normalized). Also, the electronic form was much improved for the second phase, which explains why its answers are much more complete than those of the first phase.
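    A minimal sketch of combining the two kinds of data, using invented column names and student ids (see the dataset's description file for the real fields); reading the GraphML network itself would typically be done with a graph library such as networkx, which is omitted here:

    ```python
    # Sketch: attach nodal attributes from the CSV answers table to a
    # toy relational structure, so each link can be annotated with both
    # endpoints' data. Columns "year" and "phase" and the ids are
    # hypothetical, not the dataset's actual schema.
    import csv
    import io

    # Hypothetical excerpt of the answers table (students are numbers).
    table = io.StringIO("id,year,phase\n12,3,2009\n47,1,2010\n")
    attrs = {row["id"]: row for row in csv.DictReader(table)}

    # Toy friendship links between anonymized student ids.
    links = [("12", "47")]

    for a, b in links:
        print(a, attrs[a]["phase"], "--", b, attrs[b]["phase"])  # 12 2009 -- 47 2010
    ```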

    The data were used in our following publications:

    1. Labatut, V. & Balasque, J.-M. (2010). Business-oriented Analysis of a Social Network of University Students. In: International Conference on Advances in Social Network Analysis and Mining, 25-32. Odense, DK : IEEE. ⟨hal-00633643⟩ - DOI: 10.1109/ASONAM.2010.15
    2. An extended version of the original article: Labatut, V. & Balasque, J.-M. (2013). Informative Value of Individual and Relational Data Compared Through Business-Oriented Community Detection. Özyer, T.; Rokne, J.; Wagner, G. & Reuser, A. H. (Eds.), The Influence of Technology on Social Network Analysis and Mining, Springer, 2013, chap.6, 303-330. ⟨hal-00633650⟩ - DOI: 10.1007/978-3-7091-1346-2_13
    3. A more didactic article using some of these data just for illustration purposes: Labatut, V. & Balasque, J.-M. (2012). Detection and Interpretation of Communities in Complex Networks: Methods and Practical Application. Abraham, A. & Hassanien, A.-E. (Eds.), Computational Social Networks: Tools, Perspectives and Applications, Springer, chap.4, 81-113. ⟨hal-00633653⟩ - DOI: 10.1007/978-1-4471-4048-1_4

    Citation. If you use this data, please cite article [1] above:


    @InProceedings{Labatut2010,
    author = {Labatut, Vincent and Balasque, Jean-Michel},
    title = {Business-oriented Analysis of a Social Network of University Students},
    booktitle = {International Conference on Advances in Social Networks Analysis and Mining},
    year = {2010},
    pages = {25-32},
    address = {Odense, DK},
    publisher = {IEEE Publishing},
    doi = {10.1109/ASONAM.2010.15},
    }

    Contact. 2009-2010 by Jean-Michel Balasque (jmbalasque@gsu.edu.tr) & Vincent Labatut (vlabatut@gsu.edu.tr)

    License. This dataset is open data: you can redistribute it and/or use it under the terms of the Creative Commons Zero license (see `license.txt`).

  20. Underground Development Drill Rigs Report

    • promarketreports.com
    doc, pdf, ppt
    Updated May 10, 2025
    Pro Market Reports (2025). Underground Development Drill Rigs Report [Dataset]. https://www.promarketreports.com/reports/underground-development-drill-rigs-226507
    Explore at:
    ppt, doc, pdf
    Available download formats
    Dataset updated
    May 10, 2025
    Dataset authored and provided by
    Pro Market Reports
    License

    https://www.promarketreports.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global underground development drill rigs market is experiencing robust growth, driven by the increasing demand for efficient and safe excavation solutions in mining and tunnel construction projects worldwide. The market size in 2025 is estimated at $2.5 billion, with a projected Compound Annual Growth Rate (CAGR) of 6% from 2025 to 2033. This growth is fueled by several key factors, including rising global infrastructure development, particularly in emerging economies, the expanding mining sector's focus on deep-level resource extraction, and ongoing advancements in drilling technology leading to improved efficiency and reduced operational costs. The demand for larger, more powerful rigs capable of handling challenging geological conditions further contributes to market expansion. Different rig types, such as single-, double-, triple-, and four-boom drilling rigs, cater to varying project requirements and contribute to market segmentation. Mining currently holds the largest application share, owing to its significant dependence on efficient drilling for exploration and extraction. However, the market faces challenges such as the cyclical nature of the mining industry, fluctuations in commodity prices impacting investment decisions, and stringent environmental regulations governing mining and construction activities. Despite these headwinds, the long-term outlook remains positive, supported by continued growth in global infrastructure projects and the ongoing need for sustainable and cost-effective underground development solutions. Technological advancements in automation, remote operation, and data analytics are also expected to propel market growth in the coming years. Major players like Epiroc, Sandvik, and Komatsu are actively shaping the market through innovations and strategic partnerships, leading to intense competition and driving further market consolidation.
Regional variations in growth are expected, with Asia-Pacific projected as a key growth region due to its booming infrastructure development and mining activities. This report provides a detailed analysis of the global underground development drill rigs market, projected to reach $7.5 billion by 2030. It explores key market trends, regional dynamics, competitive landscapes, and emerging technologies impacting this crucial sector for mining and infrastructure development. The report leverages rigorous market research methodologies and incorporates data from leading industry players such as Epiroc, Sandvik, and Komatsu Mining Corp.
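    As a sanity check on the quoted figures, the 6% CAGR can be compounded directly from the 2025 base (a sketch using only the numbers above):

    ```python
    # Compound the estimated 2025 market size ($2.5 billion) at the
    # quoted 6% CAGR over the 8 years from 2025 to 2033.
    def project(value, cagr, years):
        return value * (1 + cagr) ** years

    v2033 = project(2.5, 0.06, 2033 - 2025)
    print(round(v2033, 2))  # 3.98 (USD billion)
    ```

    Note that these inputs imply roughly USD 4 billion by 2033, so the separately quoted USD 7.5 billion-by-2030 projection evidently rests on different assumptions.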

