100+ datasets found

Data Mining Market Size, Competitive Landscape, Growth Trends 2030
mordorintelligence.com
pdf,excel,csv,ppt
Updated Jun 23, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mordor Intelligence (2025). Data Mining Market Size, Competitive Landscape, Growth Trends 2030 [Dataset]. https://www.mordorintelligence.com/industry-reports/data-mining-market
Explore at:
pdf,excel,csv,pptAvailable download formats
Dataset updated
Jun 23, 2025
Dataset authored and provided by
Mordor Intelligence
License
https://www.mordorintelligence.com/privacy-policyhttps://www.mordorintelligence.com/privacy-policy
Time period covered
2019 - 2030
Area covered
Global
Description
The Data Mining Market is Segmented by Component (Tools [ETL and Data Preparation, Data-Mining Workbench, and More], Services [Professional Services, and More]), End-User Enterprise Size (Small and Medium Enterprises, Large Enterprises), Deployment (Cloud, On-Premise), End-User Industry (BFSI, IT and Telecom, Government and Defence, and More), and Geography. The Market Forecasts are Provided in Terms of Value (USD).
d
Data Mining in Systems Health Management
catalog.data.gov
s.cnmilf.com
+1more
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Data Mining in Systems Health Management [Dataset]. https://catalog.data.gov/dataset/data-mining-in-systems-health-management
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
This chapter presents theoretical and practical aspects associated to the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current esti- mate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of es- timating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the predic- tion step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows to estimate of the probability of failure at future time instants (RUL PDF) in real-time, providing information about time-to- failure (TTF) expectations, statistical confidence intervals, long-term predic- tions; using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for the improvement of the system reliability and cost-effective operation of critical assets, as it has been shown in a case study where feedback correction strategies (based on uncertainty measures) have been implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feed- back loop is implemented using simple linear relationships, it is helpful to provide a quick insight into the manner that the system reacts to changes on its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian pdf’s since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.
Data from: Peer-to-Peer Data Mining, Privacy Issues, and Games
data.nasa.gov
s.cnmilf.com
+2more
Updated Mar 31, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Peer-to-Peer Data Mining, Privacy Issues, and Games [Dataset]. https://data.nasa.gov/dataset/peer-to-peer-data-mining-privacy-issues-and-games
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
Peer-to-Peer (P2P) networks are gaining increasing popularity in many distributed applications such as file-sharing, network storage, web caching, sear- ching and indexing of relevant documents and P2P network-threat analysis. Many of these applications require scalable analysis of data over a P2P network. This paper starts by offering a brief overview of distributed data mining applications and algorithms for P2P environments. Next it discusses some of the privacy concerns with P2P data mining and points out the problems of existing privacy-preserving multi-party data mining techniques. It further points out that most of the nice assumptions of these existing privacy preserving techniques fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game and points out some recent results.
e
Overview of data mining and predictive analytics
paper.erudition.co.in
html
Updated Dec 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Einetic (2025). Overview of data mining and predictive analytics [Dataset]. https://paper.erudition.co.in/makaut/btech-in-computer-science-and-engineering-artificial-intelligence-and-machine-learning/6/data-mining
Explore at:
htmlAvailable download formats
Dataset updated
Dec 3, 2025
Dataset authored and provided by
Einetic
License
https://paper.erudition.co.in/termshttps://paper.erudition.co.in/terms
Description
Question Paper Solutions of chapter Overview of data mining and predictive analytics of Data Mining, 6th Semester , B.Tech in Computer Science & Engineering (Artificial Intelligence and Machine Learning)
Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf
frontiersin.figshare.com
pdf
Updated Jun 7, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Xin Qiao; Hong Jiao (2023). Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf [Dataset]. http://doi.org/10.3389/fpsyg.2018.02231.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fpsyg.2018.02231.s001
Dataset updated
Jun 7, 2023
Dataset provided by
Frontiers Mediahttp://www.frontiersin.org/
Authors
Xin Qiao; Hong Jiao
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Due to increasing use of technology-enhanced educational assessment, data mining methods have been explored to analyse process data in log files from such assessment. However, most studies were limited to one data mining technique under one specific scenario. The current study demonstrates the usage of four frequently used supervised techniques, including Classification and Regression Trees (CART), gradient boosting, random forest, support vector machine (SVM), and two unsupervised methods, Self-organizing Map (SOM) and k-means, fitted to one assessment data. The USA sample (N = 426) from the 2012 Program for International Student Assessment (PISA) responding to problem-solving items is extracted to demonstrate the methods. After concrete feature generation and feature selection, classifier development procedures are implemented using the illustrated techniques. Results show satisfactory classification accuracy for all the techniques. Suggestions for the selection of classifiers are presented based on the research questions, the interpretability and the simplicity of the classifiers. Interpretations for the results from both supervised and unsupervised learning methods are provided.
R
Data Mining Kel 11 Dataset
universe.roboflow.com
zip
Updated Oct 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Data Mining (2025). Data Mining Kel 11 Dataset [Dataset]. https://universe.roboflow.com/data-mining-mtwls/data-mining-kel-11-zp4xe
Explore at:
zipAvailable download formats
Dataset updated
Oct 29, 2025
Dataset authored and provided by
Data Mining
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Beras
Description
Data Mining Kel 11

## Overview Data Mining Kel 11 is a dataset for classification tasks - it contains Beras annotations for 59,785 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Wiley.Data.Mining.for.Business.Analytics.in.R.Full
kaggle.com
zip
Updated Oct 4, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
liyanonline (2021). Wiley.Data.Mining.for.Business.Analytics.in.R.Full [Dataset]. https://www.kaggle.com/datasets/liyanonline/wileydataminingforbusinessanalyticsinrfull
Explore at:
zip(6519556 bytes)Available download formats
Dataset updated
Oct 4, 2021
Authors
liyanonline
Description
Dataset

This dataset was created by liyanonline

Contents
Data supporting the Master thesis "Monitoring von Open Data Praktiken -...
zenodo.org
data.niaid.nih.gov
+1more
zip
Updated Nov 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Katharina Zinke; Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14196539
Dataset updated
Nov 21, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Katharina Zinke; Katharina Zinke
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Dresden
Description
Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

This ZIP-File contains the data the thesis is based on, interim exports of the results and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included as they are published open access.

The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analyses approach, please refer to the master's thesis (publication following soon).

## Data sources

Folder 01_SourceData/

- PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

- ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

- ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

- Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

## Automatic classification

Folder 02_AutomaticClassification/

- (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

- (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

- PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

- oddpub_results_wDOIs.csv (results file of the ODDPub classification)

- PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)

## Manual coding

Folder 03_ManualCheck/

- CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

- ManualCheck_2023-06-08.csv (Manual coding results file)

- PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

## Explorative analysis for the discoverability of open data

Folder04_FurtherAnalyses

Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

## R-Script

Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
dataset data mining
kaggle.com
zip
Updated May 21, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
IqbalPanre22021 (2025). dataset data mining [Dataset]. https://www.kaggle.com/datasets/iqbalpanre22021/dataset-data-mining
Explore at:
zip(17079 bytes)Available download formats
Dataset updated
May 21, 2025
Authors
IqbalPanre22021
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Dataset

This dataset was created by IqbalPanre22021

Released under Apache 2.0

Contents
Data Mining at NASA: From Theory to Applications - Dataset - NASA Open Data...
data.nasa.gov
Updated Mar 31, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Data Mining at NASA: From Theory to Applications - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/data-mining-at-nasa-from-theory-to-applications
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
NASA has some of the largest and most complex data sources in the world, with data sources ranging from the earth sciences, space sciences, and massive distributed engineering data sets from commercial aircraft and spacecraft. This talk will discuss some of the issues and algorithms developed to analyze and discover patterns in these data sets. We will also provide an overview of a large research program in Integrated Vehicle Health Management. The goal of this program is to develop advanced technologies to automatically detect, diagnose, predict, and mitigate adverse events during the flight of an aircraft. A case study will be presented on a recent data mining analysis performed to support the Flight Readiness Review of the Space Shuttle Mission STS-119.
d
Privacy Preserving Distributed Data Mining
catalog.data.gov
s.cnmilf.com
Updated Apr 10, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). Privacy Preserving Distributed Data Mining [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-distributed-data-mining
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:
Privacy Preserving Distributed Data Mining - Dataset - NASA Open Data Portal...
data.nasa.gov
Updated Mar 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Privacy Preserving Distributed Data Mining - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/privacy-preserving-distributed-data-mining
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:
Distributed Data Mining in Peer-to-Peer Networks - Dataset - NASA Open Data...
data.nasa.gov
Updated Mar 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Distributed Data Mining in Peer-to-Peer Networks - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/distributed-data-mining-in-peer-to-peer-networks
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact,well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data,computing nodes,and users. This article offers an overview of DDM applications and algorithms for P2P environments,focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.
R
Data Mining Test Dataset
universe.roboflow.com
zip
Updated Oct 20, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ons (2025). Data Mining Test Dataset [Dataset]. https://universe.roboflow.com/ons-eykpy/data-mining-test-fjlw4/dataset/1
Explore at:
zipAvailable download formats
Dataset updated
Oct 20, 2025
Dataset authored and provided by
ons
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Cars Damage Cars Bounding Boxes
Description
Data Mining Test

## Overview Data Mining Test is a dataset for object detection tasks - it contains Cars Damage Cars annotations for 382 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Logs and Mined Sequential Patterns of Programming Processes from...
figshare.com
txt
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Minji Kong; Lori Pollock (2023). Logs and Mined Sequential Patterns of Programming Processes from "Semi-Automatically Mining Students' Common Scratch Programming Behaviors" [Dataset]. http://doi.org/10.6084/m9.figshare.12100797.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12100797.v1
Dataset updated
Jun 3, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Minji Kong; Lori Pollock
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a ProgSnap2-based dataset containing anonymized logs of over 34,000 programming events exhibited by 81 programming students in Scratch, a visual programming environment, during our designed study as described in the paper "Semi-Automatically Mining Students' Common Scratch Programming Behaviors." We also include a list of approx. 3100 mined sequential patterns of programming processes that are performed by at least 10% of the 62 of the 81 students who are novice programmers, and represent maximal patterns generated by the MG-FSM algorithm while allowing a gap of one programming event. README.txt — overview of the dataset and its propertiesmainTable.csv — main event table of the dataset holding rows of programming eventscodeState.csv — table holding XML representations of code snapshots at the time of each programming eventdatasetMetadata.csv — describes features of the datasetScratch-SeqPatterns.txt — list of sequential patterns mined from the Main Event Table
Data Mining in Systems Health Management - Dataset - NASA Open Data Portal
data.nasa.gov
Updated Mar 31, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Data Mining in Systems Health Management - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/data-mining-in-systems-health-management
Explore at:
Dataset updated
Mar 31, 2025
Dataset provided by
NASAhttp://nasa.gov/
Description
This chapter presents theoretical and practical aspects associated to the implementation of a combined model-based/data-driven approach for failure prognostics based on particle filtering algorithms, in which the current esti- mate of the state PDF is used to determine the operating condition of the system and predict the progression of a fault indicator, given a dynamic state model and a set of process measurements. In this approach, the task of es- timating the current value of the fault indicator, as well as other important changing parameters in the environment, involves two basic steps: the predic- tion step, based on the process model, and an update step, which incorporates the new measurement into the a priori state estimate. This framework allows to estimate of the probability of failure at future time instants (RUL PDF) in real-time, providing information about time-to- failure (TTF) expectations, statistical confidence intervals, long-term predic- tions; using for this purpose empirical knowledge about critical conditions for the system (also referred to as the hazard zones). This information is of paramount significance for the improvement of the system reliability and cost-effective operation of critical assets, as it has been shown in a case study where feedback correction strategies (based on uncertainty measures) have been implemented to lengthen the RUL of a rotorcraft transmission system with propagating fatigue cracks on a critical component. Although the feed- back loop is implemented using simple linear relationships, it is helpful to provide a quick insight into the manner that the system reacts to changes on its input signals, in terms of its predicted RUL. The method is able to manage non-Gaussian pdf’s since it includes concepts such as nonlinear state estimation and confidence intervals in its formulation. Real data from a fault seeded test showed that the proposed framework was able to anticipate modifications on the system input to lengthen its RUL. Results of this test indicate that the method was able to successfully suggest the correction that the system required. In this sense, future work will be focused on the development and testing of similar strategies using different input-output uncertainty metrics.
Data from: Data Mining Project Dataset
kaggle.com
zip
Updated Dec 10, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mark Dobres (2020). Data Mining Project Dataset [Dataset]. https://www.kaggle.com/markdobres/data-mining-project-dataset
Explore at:
zip(1552418617 bytes)Available download formats
Dataset updated
Dec 10, 2020
Authors
Mark Dobres
Description
Dataset

This dataset was created by Mark Dobres

Contents
R
Data Mining Dataset
universe.roboflow.com
zip
Updated Aug 4, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
ilham project (2023). Data Mining Dataset [Dataset]. https://universe.roboflow.com/ilham-project/data-mining-n52lu/model/1
Explore at:
zipAvailable download formats
Dataset updated
Aug 4, 2023
Dataset authored and provided by
ilham project
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Variables measured
Uangrupiah Bounding Boxes
Description
Data Mining

## Overview Data Mining is a dataset for object detection tasks - it contains Uangrupiah annotations for 692 images. ## Getting Started You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model. ## License This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
Operational Mineral Resource Mines
kaggle.com
zip
Updated Dec 8, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Devastator (2023). Operational Mineral Resource Mines [Dataset]. https://www.kaggle.com/datasets/thedevastator/operational-mineral-resource-mines
Explore at:
zip(344875 bytes)Available download formats
Dataset updated
Dec 8, 2023
Authors
The Devastator
Description
Operational Mineral Resource Mines

Operational mineral resource mines with location and contact details

By Homeland Infrastructure Foundation [source]

About this dataset

The Mines and Mineral Resources dataset provides comprehensive information on operational mines for mineral resources, excluding sand and gravel quarries. It includes detailed data on the location, contact details, and specific characteristics of these mines.

One key aspect of this dataset is that it focuses exclusively on actively operating mines. This ensures that the information is up-to-date and relevant for various stakeholders, including first responders and law enforcement teams.

To facilitate easy access to these mines, each mine's location point has been strategically placed at the intersection between the main haul road within the mine site and the nearest public road. This point serves as a starting point from which it should be straightforward to navigate to the actual mining areas or pits.

The dataset covers an extensive range of attributes related to each mine. These include detailed descriptions of the industry classification using both NAICS (North American Industry Classification System) codes and SIC (Standard Industrial Classification) codes. Additionally, information on security classification highlights any potential security considerations associated with each mine.

Contact details for each mine are provided to ensure efficient communication in case of emergencies or inquiries. These include emergency contact phone numbers along with extension numbers, titles of emergency contact persons, vendor responsible for providing data services related to a particular mine, inspecting officer's name, as well as general phone numbers for contacting the respective mining companies or plants.

Address information comprises complete addresses including additional address details when applicable such as suite or floor numbers. City names are specified along with counties where these mines are located in order to identify their geographical locations more precisely. ZIP codes provide essential postal code references while FIPS (Federal Information Processing Standards) codes offer unique identifiers for each county.

Directions from nearby public roads guide individuals towards accessing specific mines efficiently. This ensures that first responders or visitors can easily reach their intended destinations within these expansive mining areas.

The dataset also includes valuable geospatial data such as latitude (Y) and longitude (X) coordinates, enabling accurate positioning of each mine on a map. The precision level of these geospatial data points is provided to understand the accuracy and reliability of this location information.

Thematic categorization based on the Homeland Security Infrastructure Program offers insights into the specific industry focus or purpose of each mine. This classification allows for better understanding and organization within the dataset.

Additionally, quality control and quality assurance information are provided to convey the reliability and validity of the data. This includes details on how geospatial data was determined (geohow) and recorded dates for both geospatial (geodate) and general contact information (contdate).

Overall,

How to use the dataset

Understand the Purpose:

This dataset includes information on mines believed to be operational, allowing first responders or law enforcement teams to easily locate and access these mines in case of emergencies or other purposes.

The dataset focuses on mines related to mineral resources, excluding sand and gravel quarries.

Column Descriptions:

FEATTYPE: The type of feature represented by the data point.

SECCLASS: The security classification of the mine.

NAME: The name of the mine.

AREA_: The area covered by the mine.

PHONE: The contact phone number for the mine.

ADDRESS: The address of the mine.

ADDRESS2: Additional address information for the mine.

CITY: The city where the mine is located.

STATE: The state where the mine is located.

ZIP/ZIPP4: The ZIP code/ZIP+4 code of the mine's location COUNTY and FIPS codes provide county-related information for better understanding DIRECTIONS can be useful for reaching a particular destination EMERGTITLE represents emergency contact person's title EMERGPHONE provides emergency contact phone number along with extension (EMERGEXT) CONTHOW specifies how to contact or communicate with a particular entity associated with a specific column value (e.g., MINE_TYPE) &g...
l
LSC (Leicester Scientific Corpus)
figshare.le.ac.uk
Updated Apr 15, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neslihan Suzen (2020). LSC (Leicester Scientific Corpus) [Dataset]. http://doi.org/10.25392/leicester.data.9449639.v2
Explore at:
Unique identifier
https://doi.org/10.25392/leicester.data.9449639.v2
Dataset updated
Apr 15, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Leicester
Description
The LSC (Leicester Scientific Corpus)

April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk) Supervised by Prof Alexander Gorban and Dr Evgeny MirkesThe data are extracted from the Web of Science [1]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.[Version 2] A further cleaning is applied in Data Processing for LSC Abstracts in Version 1*. Details of cleaning procedure are explained in Step 6.* Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v1.Getting StartedThis text provides the information on the LSC (Leicester Scientific Corpus) and pre-processing steps on abstracts, and describes the structure of files to organise the corpus. This corpus is created to be used in future work on the quantification of the meaning of research texts and make it available for use in Natural Language Processing projects.LSC is a collection of abstracts of articles and proceeding papers published in 2014, and indexed by the Web of Science (WoS) database [1]. The corpus contains only documents in English. Each document in the corpus contains the following parts:1. Authors: The list of authors of the paper2. Title: The title of the paper 3. Abstract: The abstract of the paper 4. Categories: One or more category from the list of categories [2]. Full list of categories is presented in file ‘List_of _Categories.txt’. 5. Research Areas: One or more research area from the list of research areas [3]. Full list of research areas is presented in file ‘List_of_Research_Areas.txt’. 6. Total Times cited: The number of times the paper was cited by other items from all databases within Web of Science platform [4] 7. Times cited in Core Collection: The total number of times the paper was cited by other papers within the WoS Core Collection [4]The corpus was collected in July 2018 online and contains the number of citations from publication date to July 2018. We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,350.Data ProcessingStep 1: Downloading of the Data Online

The dataset is collected manually by exporting documents as Tab-delimitated files online. All documents are available online.Step 2: Importing the Dataset to R

The LSC was collected as TXT files. All documents are extracted to R.Step 3: Cleaning the Data from Documents with Empty Abstract or without CategoryAs our research is based on the analysis of abstracts and categories, all documents with empty abstracts and documents without categories are removed.Step 4: Identification and Correction of Concatenate Words in AbstractsEspecially medicine-related publications use ‘structured abstracts’. Such type of abstracts are divided into sections with distinct headings such as introduction, aim, objective, method, result, conclusion etc. Used tool for extracting abstracts leads concatenate words of section headings with the first word of the section. For instance, we observe words such as ConclusionHigher and ConclusionsRT etc. The detection and identification of such words is done by sampling of medicine-related publications with human intervention. Detected concatenate words are split into two words. For instance, the word ‘ConclusionHigher’ is split into ‘Conclusion’ and ‘Higher’.The section headings in such abstracts are listed below:

Background Method(s) Design Theoretical Measurement(s) Location Aim(s) Methodology Process Abstract Population Approach Objective(s) Purpose(s) Subject(s) Introduction Implication(s) Patient(s) Procedure(s) Hypothesis Measure(s) Setting(s) Limitation(s) Discussion Conclusion(s) Result(s) Finding(s) Material (s) Rationale(s) Implications for health and nursing policyStep 5: Extracting (Sub-setting) the Data Based on Lengths of AbstractsAfter correction, the lengths of abstracts are calculated. ‘Length’ indicates the total number of words in the text, calculated by the same rule as for Microsoft Word ‘word count’ [5].According to APA style manual [6], an abstract should contain between 150 to 250 words. In LSC, we decided to limit length of abstracts from 30 to 500 words in order to study documents with abstracts of typical length ranges and to avoid the effect of the length to the analysis.

Step 6: [Version 2] Cleaning Copyright Notices, Permission polices, Journal Names and Conference Names from LSC Abstracts in Version 1Publications can include a footer of copyright notice, permission policy, journal name, licence, author’s right or conference name below the text of abstract by conferences and journals. Used tool for extracting and processing abstracts in WoS database leads to attached such footers to the text. For example, our casual observation yields that copyright notices such as ‘Published by Elsevier ltd.’ is placed in many texts. To avoid abnormal appearances of words in further analysis of words such as bias in frequency calculation, we performed a cleaning procedure on such sentences and phrases in abstracts of LSC version 1. We removed copyright notices, names of conferences, names of journals, authors’ rights, licenses and permission policies identiﬁed by sampling of abstracts.Step 7: [Version 2] Re-extracting (Sub-setting) the Data Based on Lengths of AbstractsThe cleaning procedure described in previous step leaded to some abstracts having less than our minimum length criteria (30 words). 474 texts were removed.Step 8: Saving the Dataset into CSV FormatDocuments are saved into 34 CSV files. In CSV files, the information is organised with one record on each line and parts of abstract, title, list of authors, list of categories, list of research areas, and times cited is recorded in fields.To access the LSC for research purposes, please email to ns433@le.ac.uk.References[1]Web of Science. (15 July). Available: https://apps.webofknowledge.com/ [2]WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html [3]Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html [4]Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US [5]Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3 [6]A. P. Association, Publication manual. American Psychological Association Washington, DC, 1983.

Facebook

Twitter

Click to copy link

Link copied

Cite

Mordor Intelligence (2025). Data Mining Market Size, Competitive Landscape, Growth Trends 2030 [Dataset]. https://www.mordorintelligence.com/industry-reports/data-mining-market

Data Mining Market Size, Competitive Landscape, Growth Trends 2030

Explore at:

pdf,excel,csv,pptAvailable download formats

Dataset updated

Jun 23, 2025

Dataset authored and provided by

Mordor Intelligence

License

https://www.mordorintelligence.com/privacy-policyhttps://www.mordorintelligence.com/privacy-policy

Time period covered

2019 - 2030

Area covered

Global

Description

The Data Mining Market is Segmented by Component (Tools [ETL and Data Preparation, Data-Mining Workbench, and More], Services [Professional Services, and More]), End-User Enterprise Size (Small and Medium Enterprises, Large Enterprises), Deployment (Cloud, On-Premise), End-User Industry (BFSI, IT and Telecom, Government and Defence, and More), and Geography. The Market Forecasts are Provided in Terms of Value (USD).

Clear search

Close search

Google apps

Main menu

Data Mining Market Size, Competitive Landscape, Growth Trends 2030

Data Mining in Systems Health Management

Data from: Peer-to-Peer Data Mining, Privacy Issues, and Games

Overview of data mining and predictive analytics

Table_1_Data Mining Techniques in Analyzing Process Data: A Didactic.pdf

Data Mining Kel 11 Dataset

Data Mining Kel 11

Wiley.Data.Mining.for.Business.Analytics.in.R.Full

Dataset

Contents

Data supporting the Master thesis "Monitoring von Open Data Praktiken -...

dataset data mining

Dataset

Contents

Data Mining at NASA: From Theory to Applications - Dataset - NASA Open Data...

Privacy Preserving Distributed Data Mining

Privacy Preserving Distributed Data Mining - Dataset - NASA Open Data Portal...

Distributed Data Mining in Peer-to-Peer Networks - Dataset - NASA Open Data...

Data Mining Test Dataset

Data Mining Test

Logs and Mined Sequential Patterns of Programming Processes from...

Data Mining in Systems Health Management - Dataset - NASA Open Data Portal

Data from: Data Mining Project Dataset

Dataset

Contents

Data Mining Dataset

Data Mining

Operational Mineral Resource Mines

Operational Mineral Resource Mines

Operational mineral resource mines with location and contact details

About this dataset

How to use the dataset

LSC (Leicester Scientific Corpus)

Data Mining Market Size, Competitive Landscape, Growth Trends 2030