100+ datasets found

Linked Data Mining Challenge RM Set
datacatalogue.cessda.eu
search.gesis.org
+2 more
Updated Mar 11, 2023
Cite
Schaible, Johann (2023). Linked Data Mining Challenge RM Set [Dataset]. http://doi.org/10.7802/78
Explore at:
Unique identifier
https://doi.org/10.7802/78
Dataset updated
Mar 11, 2023
Dataset provided by
GESIS - Leibniz Institute for the Social Sciences
Authors
Schaible, Johann
Description
RapidMiner process files and XML test set, including the predicted labels, for the Linked Data Mining Challenge 2015.
MSR Data Mining Challenge 2024
figshare.com
txt
Updated Dec 12, 2023
Cite
MANISHA MUKHERJEE (2023). MSR Data Mining Challenge 2024 [Dataset]. http://doi.org/10.6084/m9.figshare.24791403.v1
Explore at:
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.24791403.v1
Dataset updated
Dec 12, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
MANISHA MUKHERJEE
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Replication package for the paper.
Online Feature Selection and Its Applications
researchdata.smu.edu.sg
Updated May 31, 2023
Cite
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN (2023). Online Feature Selection and Its Applications [Dataset]. http://doi.org/10.25440/smu.12062733.v1
Explore at:
Unique identifier
https://doi.org/10.25440/smu.12062733.v1
Dataset updated
May 31, 2023
Dataset provided by
SMU Research Data Repository (RDR)
Authors
HOI Steven; Jialei WANG; Peilin ZHAO; Rong JIN
License
https://www.gnu.org/licenses/gpl-3.0.html
Description
Feature selection is an important technique for data mining before a machine learning algorithm is applied. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require access to all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or when it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS), in which an online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate predictions using a small and fixed number of active features. This is in contrast to the classical setup of online learning, where all the features can be used for prediction. We attempt to tackle this challenge by studying sparsity regularization and truncation techniques. Specifically, this article addresses two different tasks of online feature selection: (1) learning with full input, where a learner is allowed to access all the features to decide the subset of active features, and (2) learning with partial input, where only a limited number of features is allowed to be accessed for each instance by the learner. We present novel algorithms to solve each of the two problems and give their performance analysis. We evaluate the performance of the proposed algorithms for online feature selection on several public datasets, and demonstrate their applications to real-world problems including image classification in computer vision and microarray gene expression analysis in bioinformatics. The encouraging results of our experiments validate the efficacy and efficiency of the proposed techniques.

Related Publication:
Hoi, S. C., Wang, J., Zhao, P., & Jin, R. (2012). Online feature selection for mining big data. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (pp. 93-100). ACM. http://dx.doi.org/10.1145/2351316.2351329 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2402/
Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2014). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698-710. http://dx.doi.org/10.1109/TKDE.2013.32 Full text available in InK: http://ink.library.smu.edu.sg/sis_research/2277/
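The full-input setting described above can be illustrated with a minimal sketch. It assumes a perceptron-style update followed by truncation to the largest-magnitude weights; this is an illustration of the truncation idea only, not the authors' published algorithm. The names `truncate`, `online_feature_selection`, `budget`, and `eta` are hypothetical.

```python
import random


def truncate(w, budget):
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    if budget >= len(w):
        return list(w)
    ranked = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(ranked[:budget])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def online_feature_selection(stream, n_features, budget, eta=0.2):
    """Process (x, y) pairs one at a time, with y in {-1, +1}.

    Assumption: a plain perceptron update on mistakes; the learner never
    holds more than `budget` non-zero weights after an update.
    """
    w = [0.0] * n_features
    mistakes = 0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:  # wrong (or undecided) prediction
            mistakes += 1
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
            w = truncate(w, budget)
    return w, mistakes


def make_stream(n, n_features, seed=0):
    """Synthetic data: the label depends only on the first two features."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        y = 1 if x[0] + x[1] > 0 else -1
        yield x, y


if __name__ == "__main__":
    weights, errors = online_feature_selection(make_stream(500, 20), 20, budget=3)
    active = [i for i, v in enumerate(weights) if v != 0.0]
    print("active features:", active, "mistakes:", errors)
```

Truncating after every update keeps the classifier within the feature budget at all times, which is the constraint the description emphasizes; the partial-input setting would additionally restrict which entries of `x` the learner may read.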
Data supporting the Master thesis "Monitoring von Open Data Praktiken -...
zenodo.org
data.niaid.nih.gov
zip
Updated Nov 21, 2024
Cite
Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14196539
Dataset updated
Nov 21, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Katharina Zinke
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

This ZIP-File contains the data the thesis is based on, interim exports of the results and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included as they are published open access.

The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication following soon).

## Data sources

Folder 01_SourceData/

- PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

- ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

- ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

- Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

## Automatic classification

Folder 02_AutomaticClassification/

- (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

- (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

- PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

- oddpub_results_wDOIs.csv (results file of the ODDPub classification)

- PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)

## Manual coding

Folder 03_ManualCheck/

- CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

- ManualCheck_2023-06-08.csv (Manual coding results file)

- PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

## Explorative analysis for the discoverability of open data

Folder 04_FurtherAnalyses/

- Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

## R-Script

Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
Data Mining Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Mar 19, 2025
Cite
Market Research Forecast (2025). Data Mining Software Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-software-41235
Explore at:
Available download formats: ppt, pdf, doc
Dataset updated
Mar 19, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global Data Mining Software market is experiencing robust growth, driven by the increasing need for businesses to extract valuable insights from massive datasets. The market, estimated at $15 billion in 2025, is projected to witness a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033, reaching an estimated $45 billion by 2033. This expansion is fueled by several key factors. The burgeoning adoption of cloud-based solutions offers scalability and cost-effectiveness, attracting both large enterprises and SMEs. Furthermore, advancements in machine learning and artificial intelligence algorithms are enhancing the accuracy and efficiency of data mining processes, leading to better decision-making across various sectors like finance, healthcare, and marketing. The rise of big data analytics and the increasing availability of affordable, high-powered computing resources are also significant contributors to market growth. However, the market faces certain challenges. Data security and privacy concerns remain paramount, especially with the increasing volume of sensitive information being processed. The complexity of data mining software and the need for skilled professionals to operate and interpret the results present a barrier to entry for some businesses. The high initial investment cost associated with implementing sophisticated data mining solutions can also deter smaller organizations. Nevertheless, the ongoing technological advancements and the growing recognition of the strategic value of data-driven decision-making are expected to overcome these restraints and propel the market toward continued expansion. The market segmentation reveals a strong preference for cloud-based solutions, reflecting the industry's trend toward flexible and scalable IT infrastructure. Large enterprises currently dominate the market share, but SMEs are rapidly adopting data mining software, indicating promising future growth in this segment. 
Geographic analysis shows that North America and Europe are currently leading the market, but the Asia-Pacific region is poised for significant growth due to increasing digitalization and economic expansion in countries like China and India.
MSR Challenge Excel Figures
figshare.com
png
Updated Feb 20, 2016
Cite
Luke Soldano (2016). MSR Challenge Excel Figures [Dataset]. http://doi.org/10.6084/m9.figshare.2504173.v1
Explore at:
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.2504173.v1
Dataset updated
Feb 20, 2016
Dataset provided by
figshare
Authors
Luke Soldano
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Figures for the paper "The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub" submitted to the MSR 2016 Data Mining Challenge. These figures show the number of available Java projects with certain constraints applied. In particular, these constraints are number of contributors to the repository and number of commits to that repository.
Data from: Deep seabed mining: a rising environmental challenge
pacific-data.sprep.org
pdf
Updated Nov 5, 2022
Cite
Cuyvers, Luc (2022). Deep seabed mining: a rising environmental challenge [Dataset]. https://pacific-data.sprep.org/dataset/deep-seabed-mining-rising-environmental-challenge
Explore at:
Available download formats: pdf
Dataset updated
Nov 5, 2022
Dataset provided by
International Union for Conservation of Nature (http://iucn.org/)
Authors
Cuyvers, Luc
License
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Area covered
SPREP LIBRARY
Description
The sea below 200 meters depth accounts for 95% of the volume of the ocean, making it the largest habitat for life on Earth.
Available online
Call Number: [EL]
ISBN/ISSN: 9782831719214
Physical Description: 88 p
Data Mining and Modeling Report
datainsightsmarket.com
doc, pdf, ppt
Updated May 26, 2025
Cite
Data Insights Market (2025). Data Mining and Modeling Report [Dataset]. https://www.datainsightsmarket.com/reports/data-mining-and-modeling-1947982
Explore at:
Available download formats: pdf, doc, ppt
Dataset updated
May 26, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Data Mining and Modeling market is experiencing robust growth, driven by the exponential increase in data volume and the rising need for businesses to extract actionable insights for strategic decision-making. The market, estimated at $25 billion in 2025, is projected to expand at a Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $75 billion by 2033. This growth is fueled by several key factors, including the increasing adoption of cloud-based data mining solutions, the development of sophisticated analytical tools capable of handling big data, and the growing demand for predictive analytics across diverse sectors such as finance, healthcare, and retail. Furthermore, advancements in artificial intelligence (AI) and machine learning (ML) are significantly enhancing the capabilities of data mining and modeling tools, enabling more accurate predictions and deeper insights. The market is segmented by various deployment models (cloud, on-premise), analytical techniques (regression, classification, clustering), and industry verticals. The major restraints on market growth include the high cost of implementation and maintenance of data mining and modeling solutions, the scarcity of skilled professionals proficient in advanced analytical techniques, and concerns about data privacy and security. However, these challenges are being gradually addressed through the development of user-friendly tools, the emergence of specialized training programs, and the increasing adoption of robust security measures. The competitive landscape is characterized by a mix of established players like SAS and IBM, along with several specialized providers like Symbrium, Coheris, and Expert System. These companies are constantly innovating to enhance their offerings and cater to the evolving needs of businesses across various industries. 
The market's trajectory indicates a promising future driven by ongoing technological advancements and the increasing importance of data-driven decision-making in a rapidly evolving business environment.
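The growth figures quoted in this description follow the usual compound-growth arithmetic, which can be checked with a small sketch. The helper names `project_value` and `implied_cagr` are hypothetical, and the sketch assumes annual compounding over the eight years from 2025 to 2033.

```python
def project_value(start_value, cagr, years):
    """Value after `years` of compounding at the annual rate `cagr`."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value in `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Figures from the description: $25 billion in 2025, 15% CAGR, 2033 target.
    projected = project_value(25.0, 0.15, 2033 - 2025)
    print(f"projected 2033 market size: ${projected:.1f} billion")
    print(f"CAGR implied by $25B -> $75B: {implied_cagr(25.0, 75.0, 8):.1%}")
```

Under these assumptions the projection lands at roughly $76 billion, which is in line with the "approximately $75 billion" stated above.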
Mining Processes Data - Dataset - LDM
service.tib.eu
Updated Dec 2, 2024
Cite
(2024). Mining Processes Data - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/mining-processes-data
Explore at:
Dataset updated
Dec 2, 2024
Description
The dataset used in the paper is a real-world data challenge in the mining processes of a flotation plant.
BPI Challenge 2020: International Declarations
data.4tu.nl
application/gzip
Updated Jul 28, 2020
+ more versions
Cite
Boudewijn van Dongen (2020). BPI Challenge 2020: International Declarations [Dataset]. http://doi.org/10.4121/uuid:2bbf8f6a-fc50-48eb-aa9e-c4ea5ef7e8c5
Explore at:
Available download formats: application/gzip
Unique identifier
https://doi.org/10.4121/uuid:2bbf8f6a-fc50-48eb-aa9e-c4ea5ef7e8c5
Dataset updated
Jul 28, 2020
Dataset provided by
4TU.ResearchData
Authors
Boudewijn van Dongen
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
This file contains the events related to International Declarations: 6,449 cases, 72,151 events. Parent item: BPI Challenge 2020.

The dataset contains events pertaining to two years of travel expense claims. In 2017, events were collected for two departments, in 2018 for the entire university. The various permits and declaration documents (domestic and international declarations, pre-paid travel costs and requests for payment) all follow a similar process flow. After submission by the employee, the request is sent for approval to the travel administration. If approved, the request is then forwarded to the budget owner and after that to the supervisor. If the budget owner and supervisor are the same person, then only one of these steps is taken. In some cases, the director also needs to approve the request. The process finishes with either the trip taking place or a payment being requested and paid.

On a high level, we distinguish two types of trips, namely domestic and international. For domestic trips, no prior permission is needed, i.e. an employee can undertake these trips and ask for reimbursement of the costs afterwards. For international trips, permission is needed from the supervisor. This permission is obtained by filing a travel permit, and this travel permit should be approved before making any arrangements. To get the costs of a trip reimbursed, a claim is filed. This can be done as soon as costs are actually paid (for example for flights or conference registration fees), or within two months after the trip (for example hotel and food costs, which are usually paid on the spot).
Data from: Mining Rule Violations in JavaScript Code Snippets
explore.openaire.eu
data.niaid.nih.gov
Updated May 26, 2019
Cite
Uriel Ferreira Campos; Guilherme Smethurst; João Pedro Moraes; Rodrigo Bonifácio; Gustavo Pinto (2019). Mining Rule Violations in JavaScript Code Snippets [Dataset]. http://doi.org/10.5281/zenodo.2593818
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.2593818
Dataset updated
May 26, 2019
Authors
Uriel Ferreira Campos; Guilherme Smethurst; João Pedro Moraes; Rodrigo Bonifácio; Gustavo Pinto
Description
Content of this repository

This repository contains the scripts and dataset for the MSR 2019 Mining Challenge. GitHub repository with the software used: here.

DATASET

The dataset was retrieved using Google BigQuery and dumped to a CSV file for further processing. This original, untreated file is called jsanswers.csv and contains the following information:

1. The ID of the question (PostId)
2. The content (in this case, the code block)
3. The length of the code block
4. The line count of the code block
5. The score of the post
6. The title

A quick look at this file shows that a PostId can have multiple rows related to it; that is how multiple code blocks are saved in the database.

Filtered Dataset

Extracting code from CSV: We used a Python script called "ExtractCodeFromCSV.py" to extract the code from the original CSV and merge all the code blocks into their respective JavaScript file, with the PostId as the name. This resulted in 336 thousand files.

Running ESLint: Because ESLint is single-threaded, running it on 336 thousand files took a huge toll on the machine, so we created a script to run it. This script, named "ESlintRunnerScript.py", splits the files into 20 evenly distributed parts and runs 20 ESLint processes to generate the reports; as such, it generates 20 JSON files.

Number of violations per rule: This information was extracted using the script named "parser.py". It generated the file named "NumberofViolationsPerRule.csv", which contains the number of violations per rule used in the linter configuration in the dataset.

Number of violations per category: To produce relevant statistics for the dataset, we generated the number of violations per rule category as defined on the ESLint website. This information was extracted using the same "parser.py" script.

Individual reports: This information was extracted from the JSON reports. It is a CSV file with PostId and violations per rule.

Rules: The file "Rules with categories" contains all the rules used and their categories.
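The split-and-run step described above might look roughly like the sketch below. It is not the repository's "ESlintRunnerScript.py"; the function names and the `report_N.json` naming are hypothetical, and it assumes ESLint is invoked through `npx eslint` with its `--format json` and `--output-file` options.

```python
import subprocess


def split_evenly(items, parts):
    """Split `items` into `parts` chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def build_commands(chunks, report_prefix="report"):
    """Build one ESLint command line per non-empty chunk (hypothetical naming)."""
    commands = []
    for i, chunk in enumerate(chunks):
        if not chunk:
            continue
        commands.append(
            ["npx", "eslint", "--format", "json",
             "--output-file", f"{report_prefix}_{i}.json", *chunk]
        )
    return commands


def run_all(commands):
    """Start every command at once and wait for all of them to finish.

    ESLint exits with a non-zero status when it reports errors, so the
    return codes are collected rather than treated as failures.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    files = [f"snippets/{n}.js" for n in range(10)]  # placeholder paths
    for cmd in build_commands(split_evenly(files, 3)):
        print(" ".join(cmd))
```

With hundreds of thousands of files, passing every path on the command line can exceed operating-system argument limits, so a real runner would more likely hand each process a directory or a glob per part.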
SPHERE: Students' performance dataset of conceptual understanding,...
data.mendeley.com
Updated Jan 15, 2025
Cite
Purwoko Haryadi Santoso (2025). SPHERE: Students' performance dataset of conceptual understanding, scientific ability, and learning attitude in physics education research (PER) [Dataset]. http://doi.org/10.17632/88d7m2fv7p.2
Explore at:
Unique identifier
https://doi.org/10.17632/88d7m2fv7p.2
Dataset updated
Jan 15, 2025
Authors
Purwoko Haryadi Santoso
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SPHERE is a dataset of students' performance in physics education research. It is presented as a multi-domain learning dataset of students’ performance in physics that has been collected through several research-based assessments (RBAs) established by the physics education research (PER) community. A total of 497 eleventh-grade students were involved, from three large public high schools and one small public high school located in a suburban district of a highly populated province in Indonesia. Some variables related to demographics, accessibility to literature resources, and students’ physics identity are also investigated. The RBAs used in this dataset were selected based on concepts learned by the students in the Indonesian physics curriculum. We commenced the survey of students’ understanding of Newtonian mechanics at the end of the first semester using the Force Concept Inventory (FCI) and the Force and Motion Conceptual Evaluation (FMCE). In the second semester, we assessed the students’ scientific abilities and learning attitude through the Scientific Abilities Assessment Rubrics (SAAR) and the Colorado Learning Attitudes about Science Survey (CLASS), respectively. The conceptual assessments continued in the second semester with the Rotational and Rolling Motion Conceptual Survey (RRMCS), Fluid Mechanics Concept Inventory (FMCI), Mechanical Waves Conceptual Survey (MWCS), Thermal Concept Evaluation (TCE), and Survey of Thermodynamic Processes and First and Second Laws (STPFaSL). We expect SPHERE could be a valuable dataset for supporting the advancement of the PER field, particularly in quantitative studies. For example, there is a need to help advance research on using machine learning and data mining techniques in PER, which might face challenges because no dataset is available for the specific purpose of PER studies. SPHERE can be reused as a students’ performance dataset on physics specifically dedicated to PER scholars who might be willing to implement machine learning techniques in physics education.
ExploreSA: Gawler Challenge datasets
researchdata.edu.au
Updated Mar 12, 2020
+ more versions
Cite
Department for Energy and Mining (2020). ExploreSA: Gawler Challenge datasets [Dataset]. https://researchdata.edu.au/exploresa-gawler-challenge-datasets/2208438
Explore at:
Dataset updated
Mar 12, 2020
Dataset provided by
Government of South Australia (http://sa.gov.au/)
Authors
Department for Energy and Mining
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Gawler
Description
ExploreSA: The Gawler Challenge is a global online competition from the Government of South Australia. For quick access to South Australian open data, the SARIG product catalog has created an “ExploreSAGawlerChallenge” tag to filter data of interest for the challenge. The product catalog uses the metadata profile of ISO19115 and allows you to discover and query metadata, including links to associated resources such as web services, maps, publications or data downloads.
Mapping Mining to SDGs (Sustainable Development Goals)
cookislands-data.sprep.org
pacificdata.org
+14 more
pdf
Updated Feb 20, 2025
Cite
Secretariat of the Pacific Regional Environment Programme (2025). Mapping Mining to SDGs (Sustainable Development Goals) [Dataset]. https://cookislands-data.sprep.org/dataset/mapping-mining-sdgs-sustainable-development-goals
Explore at:
Available download formats: pdf (5977998)
Dataset updated
Feb 20, 2025
Dataset provided by
Pacific Regional Environment Programme (https://www.sprep.org/)
License
Public Domain Mark 1.0: https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
Area covered
Pacific Region
Description
The 2030 Agenda for Sustainable Development and the Sustainable Development Goals (SDGs) represent the world’s plan of action for social inclusion, environmental sustainability and economic development. The mining industry has an unprecedented opportunity to mobilize significant human, physical, technological and financial resources to advance the SDGs.

Mining is a global industry and is often located in remote, ecologically sensitive and less-developed areas that include many indigenous lands and territories. When managed appropriately, it can create jobs, spur innovation and bring investment and infrastructure at a game-changing scale over long time horizons. Yet, if managed poorly, mining can also lead to environmental degradation, displaced populations, inequality and increased conflict, among other challenges.

By mapping the linkages between mining and the SDGs, the aim of this Atlas is to encourage mining companies of all sizes to incorporate relevant SDGs into their business and operations, validate their current efforts and spark new ideas.
Customer information database.
plos.figshare.com
xls
Updated Jun 5, 2023
+ more versions
Cite
Huijun Chen (2023). Customer information database. [Dataset]. http://doi.org/10.1371/journal.pone.0285506.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0285506.t002
Dataset updated
Jun 5, 2023
Dataset provided by
PLOS ONE
Authors
Huijun Chen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The technological development in the new economic era has brought challenges to enterprises. Enterprises need to use massive and effective consumption information to provide customers with high-quality customized services. Big data technology has strong mining ability. The relevant theories of computer data mining technology are summarized to optimize the marketing strategy of enterprises. The application of data mining in precision marketing services is analyzed. Extreme Gradient Boosting (XGBoost) has shown strong advantages in machine learning algorithms. In order to help enterprises to analyze customer data quickly and accurately, the characteristics of XGBoost feedback are used to reverse the main factors that can affect customer activation cards, and effective analysis is carried out for these factors. The data obtained from the analysis points out the direction of effective marketing for potential customers to be activated. Finally, the performance of XGBoost is compared with the other three methods. The characteristics that affect the top 7 prediction results are tested for differences. The results show that: (1) the accuracy and recall rate of the proposed model are higher than other algorithms, and the performance is the best. (2) The significance p values of the features included in the test are all less than 0.001. The data shows that there is a very significant difference between the proposed features and the results of activation or not. The contributions of this paper are mainly reflected in two aspects. 1. Four precision marketing strategies based on big data mining are designed to provide scientific support for enterprise decision-making. 2. The improvement of the connection rate and stickiness between enterprises and customers has played a huge driving role in overall customer marketing.
LScDC (Leicester Scientific Dictionary-Core)
figshare.le.ac.uk
docx
Updated Apr 15, 2020
+ more versions
Cite
Neslihan Suzen (2020). LScDC (Leicester Scientific Dictionary-Core) [Dataset]. http://doi.org/10.25392/leicester.data.9896579.v3
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.25392/leicester.data.9896579.v3
Dataset updated
Apr 15, 2020
Dataset provided by
University of Leicester
Authors
Neslihan Suzen
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Leicester
Description
The LScDC (Leicester Scientific Dictionary-Core Dictionary)
April 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk/suzenneslihan@hotmail.com)
Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes

[Version 3]
The third version of LScDC (Leicester Scientific Dictionary-Core) is formed using the updated LScD (Leicester Scientific Dictionary) - Version 3*. All steps applied to build the new version of the core dictionary are the same as in Version 2** and can be found in the description of Version 2 below. We did not repeat the explanation. The files provided with this description are also the same as described for LScDC Version 2. The numbers of words in the 3rd versions of LScD and LScDC are summarized below.

Number of words:
LScD (v3): 972,060
LScDC (v3): 103,998

* Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v3
** Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v2

[Version 2]
Getting Started
This file describes a sorted and cleaned list of words from LScD (Leicester Scientific Dictionary), and explains the steps for sub-setting the LScD and basic statistics of words in the LSC (Leicester Scientific Corpus), to be found in [1, 2]. The LScDC (Leicester Scientific Dictionary-Core) is a list of words ordered by the number of documents containing the words, and is available in the published CSV file. There are 104,223 unique words (lemmas) in the LScDC. This dictionary is created to be used in future work on the quantification of the sense of research texts. The objective of sub-setting the LScD is to discard words which appear too rarely in the corpus. In text mining, using an enormous amount of text data challenges the performance and accuracy of data mining applications. The performance and accuracy of models depend heavily on the type of words (such as stop words and content words) and the number of words in the corpus. Rarely occurring words are not useful for discriminating texts in large corpora, as rare words are likely to be non-informative signals (or noise) and redundant in the collection of texts. The selection of relevant words also holds out the possibility of more effective and faster operation of text mining algorithms.

To build the LScDC, we decided on the following process for LScD: removing words that appear in no more than 10 documents (
Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North...
technavio.com
Updated Feb 15, 2025
Cite
Technavio (2025). Data Science Platform Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, UK), APAC (China, India, Japan), South America (Brazil), and Middle East and Africa (UAE) [Dataset]. https://www.technavio.com/report/data-science-platform-market-industry-analysis
Dataset updated
Feb 15, 2025
Dataset provided by
TechNavio
Authors
Technavio
Time period covered
2021 - 2025
Area covered
Global, Canada, United States
Description
Data Science Platform Market Size 2025-2029

The data science platform market size is forecast to increase by USD 763.9 million, at a CAGR of 40.2% between 2024 and 2029.

The market is experiencing significant growth, driven by the increasing integration of Artificial Intelligence (AI) and Machine Learning (ML) technologies. This fusion enables organizations to derive deeper insights from their data, fueling business innovation and decision-making. Another trend shaping the market is the emergence of containerization and microservices in data science platforms. This approach offers enhanced flexibility, scalability, and efficiency, making it an attractive choice for businesses seeking to streamline their data science operations. However, the market also faces challenges. Data privacy and security remain critical concerns, with the increasing volume and complexity of data posing significant risks. Ensuring robust data security and privacy measures is essential for companies to maintain customer trust and comply with regulatory requirements. Additionally, managing the complexity of data science platforms and ensuring seamless integration with existing systems can be a daunting task, requiring significant investment in resources and expertise. Companies must navigate these challenges effectively to capitalize on the market's opportunities and stay competitive in the rapidly evolving data landscape.

What will be the Size of the Data Science Platform Market during the forecast period?

Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
The market continues to evolve, driven by the increasing demand for advanced analytics and artificial intelligence solutions across various sectors. Real-time analytics and classification models are at the forefront of this evolution, with API integrations enabling seamless implementation. Deep learning and model deployment are crucial components, powering applications such as fraud detection and customer segmentation. Data science platforms provide essential tools for data cleaning and data transformation, ensuring data integrity for big data analytics. Feature engineering and data visualization facilitate model training and evaluation, while data security and data governance ensure data privacy and compliance. Machine learning algorithms, including regression models and clustering models, are integral to predictive modeling and anomaly detection. Statistical analysis and time series analysis provide valuable insights, while ETL processes streamline data integration. Cloud computing enables scalability and cost savings, while risk management and algorithm selection optimize model performance. Natural language processing and sentiment analysis offer new opportunities for data storytelling and computer vision. Supply chain optimization and recommendation engines are among the latest applications of data science platforms, demonstrating their versatility and continuous value proposition. Data mining and data warehousing provide the foundation for these advanced analytics capabilities.

How is this Data Science Platform Industry segmented?

The data science platform industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Deployment: On-premises, Cloud
Component: Platform, Services
End-user: BFSI, Retail and e-commerce, Manufacturing, Media and entertainment, Others
Sector: Large enterprises, SMEs
Application: Data Preparation, Data Visualization, Machine Learning, Predictive Analytics, Data Governance, Others
Geography: North America (US, Canada), Europe (France, Germany, UK), Middle East and Africa (UAE), APAC (China, India, Japan), South America (Brazil), Rest of World (ROW)

By Deployment Insights

The on-premises segment is estimated to witness significant growth during the forecast period. In this dynamic market, businesses increasingly adopt solutions to gain real-time insights from their data, enabling them to make informed decisions. Classification models and deep learning algorithms are integral parts of these platforms, providing capabilities for fraud detection, customer segmentation, and predictive modeling. API integrations facilitate seamless data exchange between systems, while data security measures ensure the protection of valuable business information. Big data analytics and feature engineering are essential for deriving meaningful insights from vast datasets. Data transformation, data mining, and statistical analysis are crucial processes in data preparation and discovery. Machine learning models, including regression and clustering, are employed for model training and evaluation. Time series analysis and natural language processing are valuable tools for understanding trends and customer sen
Datasets associated with "Mining of Consumer Product and Purchasing Data to...
catalog.data.gov
Updated Jul 26, 2021
Cite
U.S. EPA Office of Research and Development (ORD) (2021). Datasets associated with "Mining of Consumer Product and Purchasing Data to Identify Potential Chemical Co-exposures" [Dataset]. https://catalog.data.gov/dataset/datasets-associated-with-mining-of-consumer-product-and-purchasing-data-to-identify-potent
Dataset updated
Jul 26, 2021
Dataset provided by
United States Environmental Protection Agency (http://www.epa.gov/)
Description
Background: Chemicals in consumer products are a major contributor to human chemical co-exposures. Consumers purchase and use a wide variety of products containing potentially thousands of chemicals. There is a need to identify potential real-world chemical co-exposures in order to prioritize in vitro toxicity screening. However, due to the vast number of potential chemical combinations, this has been a major challenge.
Objectives: We aim to develop and implement a data-driven procedure for identifying prevalent chemical combinations to which humans are exposed through purchase and use of consumer products.
Methods: We applied frequent itemset mining on an integrated dataset linking consumer product chemical ingredient data with product purchasing data from sixty thousand households to identify chemical combinations resulting from co-use of consumer products.
Results: We identified co-occurrence patterns of chemicals over all households as well as those specific to demographic groups based on race/ethnicity, income, education, and family composition. We also identified chemicals with the highest potential for aggregate exposure by identifying chemicals occurring in multiple products used by the same household. Lastly, a case study of chemicals active in estrogen and androgen receptor in silico models revealed priority chemical combinations co-targeting receptors involved in important biological signaling pathways.
Discussion: Integration and comprehensive analysis of household purchasing data and product-chemical information provided a means to assess human near-field exposure and inform selection of chemical combinations for high-throughput screening in in vitro assays.
This dataset is associated with the following publication: Stanfield, Z., C. Addington, K. Dionisio, D. Lyons, R. Tornero-Velez, K. Phillips, T. Buckley, and K. Isaacs. Mining of consumer product and purchasing data to identify potential chemical co-exposures. ENVIRONMENTAL HEALTH PERSPECTIVES. National Institute of Environmental Health Sciences (NIEHS), Research Triangle Park, NC, USA, 129(6): N/A, (2021).
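The frequent itemset mining named in the Methods can be illustrated with a short Python sketch. It is a brute-force version under stated assumptions, not the procedure used in the publication: each household is treated as one transaction (the set of chemicals in the products it bought), the chemical names and the support threshold are hypothetical, and the candidate pruning that Apriori-style algorithms rely on for large data is omitted.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for every itemset of up to `max_size` items
    whose support (fraction of transactions containing it) is >= min_support."""
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # sorted so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            counts.update(combinations(unique, size))
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


# Hypothetical households, each listed with the chemicals in its purchased products.
households = [
    ["chem_a", "chem_b", "chem_c"],
    ["chem_a", "chem_b"],
    ["chem_a", "chem_c"],
    ["chem_b"],
]
frequent = frequent_itemsets(households, min_support=0.5, max_size=2)
for itemset, support in sorted(frequent.items()):
    print(itemset, support)
```

Enumerating every combination grows quickly with basket size, so a dataset on the scale of sixty thousand households would normally call for a dedicated implementation that prunes infrequent candidates.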
International Journal of Engineering and Advanced Technology Publication fee...
researchhelpdesk.org
Updated Jun 25, 2022
Cite
Research Help Desk (2022). International Journal of Engineering and Advanced Technology Publication fee - ResearchHelpDesk [Dataset]. https://www.researchhelpdesk.org/journal/publication-fee/552/international-journal-of-engineering-and-advanced-technology
Dataset updated
Jun 25, 2022
Dataset authored and provided by
Research Help Desk
Description
International Journal of Engineering and Advanced Technology Publication fee - ResearchHelpDesk - The International Journal of Engineering and Advanced Technology (IJEAT), Online ISSN 2249-8958, is a bi-monthly international journal published in February, April, June, August, October, and December by Blue Eyes Intelligence Engineering & Sciences Publication (BEIESP), Bhopal (M.P.), India, since 2011. It is an academic, online, open-access, double-blind, peer-reviewed international journal. It aims to publish original theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. All submitted papers will be reviewed by the board of committee of IJEAT.

Aim of IJEAT
The journal aims to:
disseminate original, scientific, theoretical or applied research in the field of Engineering and allied fields;
provide a platform for publishing results and research with a strong empirical component;
bridge the significant gap between research and practice by promoting the publication of original, novel, industry-relevant research;
solicit original and unpublished research papers, based on theoretical or experimental works, for publication globally;
publish original theoretical and practical advances in Computer Science & Engineering, Information Technology, Electrical and Electronics Engineering, Electronics and Telecommunication, Mechanical Engineering, Civil Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences.

Scope of IJEAT
The International Journal of Engineering and Advanced Technology (IJEAT) covers all topics of all engineering branches, among them Computer Science & Engineering, Information Technology, Electronics & Communication, Electrical and Electronics, Electronics and Telecommunication, Civil Engineering, Mechanical Engineering, Textile Engineering and all interdisciplinary streams of Engineering Sciences. The main topics include, but are not limited to:

1. Smart Computing and Information Processing: Signal and Speech Processing; Image Processing and Pattern Recognition; WSN; Artificial Intelligence and machine learning; Data mining and warehousing; Data Analytics; Deep learning; Bioinformatics; High Performance computing; Advanced Computer networking; Cloud Computing; IoT; Parallel Computing on GPU; Human Computer Interactions
2. Recent Trends in Microelectronics and VLSI Design: Process & Device Technologies; Low-power design; Nanometer-scale integrated circuits; Application specific ICs (ASICs); FPGAs; Nanotechnology; Nano electronics and Quantum Computing
3. Challenges of Industry and their Solutions, Communications: Advanced Manufacturing Technologies; Artificial Intelligence; Autonomous Robots; Augmented Reality; Big Data Analytics and Business Intelligence; Cyber Physical Systems (CPS); Digital Clone or Simulation; Industrial Internet of Things (IIoT); Manufacturing IOT; Plant Cyber security; Smart Solutions – Wearable Sensors and Smart Glasses; System Integration; Small Batch Manufacturing; Visual Analytics; Virtual Reality; 3D Printing
4. Internet of Things (IoT): Internet of Things (IoT) & IoE & Edge Computing; Distributed Mobile Applications Utilizing IoT; Security, Privacy and Trust in IoT & IoE; Standards for IoT Applications; Ubiquitous Computing; Block Chain-enabled IoT Device and Data Security and Privacy; Application of WSN in IoT; Cloud Resources Utilization in IoT; Wireless Access Technologies for IoT; Mobile Applications and Services for IoT; Machine/Deep Learning with IoT & IoE; Smart Sensors and Internet of Things for Smart City; Logic, Functional programming and Microcontrollers for IoT; Sensor Networks, Actuators for Internet of Things; Data Visualization using IoT; IoT Application and Communication Protocol; Big Data Analytics for Social Networking using IoT; IoT Applications for Smart Cities; Emulation and Simulation Methodologies for IoT; IoT Applied for Digital Contents
5. Microwaves and Photonics: Microwave filter; Micro Strip antenna; Microwave Link design; Microwave oscillator; Frequency selective surface; Microwave Antenna; Microwave Photonics; Radio over fiber; Optical communication; Optical oscillator; Optical Link design; Optical phase lock loop; Optical devices
6. Computation Intelligence and Analytics: Soft Computing; Advance Ubiquitous Computing; Parallel Computing; Distributed Computing; Machine Learning; Information Retrieval; Expert Systems; Data Mining; Text Mining; Data Warehousing; Predictive Analysis; Data Management; Big Data Analytics; Big Data Security
7. Energy Harvesting and Wireless Power Transmission: Energy harvesting and transfer for wireless sensor networks; Economics of energy harvesting communications; Waveform optimization for wireless power transfer; RF Energy Harvesting; Wireless Power Transmission; Microstrip Antenna design and application; Wearable Textile Antenna; Luminescence; Rectenna
8. Advance Concept of Networking and Database: Computer Network; Mobile Adhoc Network; Image Security Application; Artificial Intelligence and machine learning in the Field of Network and Database; Data Analytic; High performance computing; Pattern Recognition
9. Machine Learning (ML) and Knowledge Mining (KM): Regression and prediction; Problem solving and planning; Clustering; Classification; Neural information processing; Vision and speech perception; Heterogeneous and streaming data; Natural language processing; Probabilistic Models and Methods; Reasoning and inference; Marketing and social sciences; Data mining; Knowledge Discovery; Web mining; Information retrieval; Design and diagnosis; Game playing; Streaming data; Music Modelling and Analysis; Robotics and control; Multi-agent systems; Bioinformatics; Social sciences; Industrial, financial and scientific applications of all kind
10. Advanced Computer networking: Computational Intelligence; Data Management, Exploration, and Mining; Robotics; Artificial Intelligence and Machine Learning; Computer Architecture and VLSI; Computer Graphics, Simulation, and Modelling; Digital System and Logic Design; Natural Language Processing and Machine Translation; Parallel and Distributed Algorithms; Pattern Recognition and Analysis; Systems and Software Engineering; Nature Inspired Computing; Signal and Image Processing; Reconfigurable Computing; Cloud, Cluster, Grid and P2P Computing; Biomedical Computing; Advanced Bioinformatics; Green Computing; Mobile Computing; Nano Ubiquitous Computing; Context Awareness and Personalization, Autonomic and Trusted Computing; Cryptography and Applied Mathematics; Security, Trust and Privacy; Digital Rights Management; Networked-Driven Multicourse Chips; Internet Computing; Agricultural Informatics and Communication; Community Information Systems; Computational Economics, Digital Photogrammetric Remote Sensing, GIS and GPS; Disaster Management; e-governance, e-Commerce, e-business, e-Learning; Forest Genomics and Informatics; Healthcare Informatics; Information Ecology and Knowledge Management; Irrigation Informatics; Neuro-Informatics; Open Source: Challenges and opportunities; Web-Based Learning: Innovation and Challenges; Soft computing; Signal and Speech Processing; Natural Language Processing
11. Communications: Microstrip Antenna; Microwave; Radar and Satellite; Smart Antenna; MIMO Antenna; Wireless Communication; RFID Network and Applications; 5G Communication; 6G Communication
12. Algorithms and Complexity: Sequential, Parallel And Distributed Algorithms And Data Structures; Approximation And Randomized Algorithms; Graph Algorithms And Graph Drawing; On-Line And Streaming Algorithms; Analysis Of Algorithms And Computational Complexity; Algorithm Engineering; Web Algorithms; Exact And Parameterized Computation; Algorithmic Game Theory; Computational Biology; Foundations Of Communication Networks; Computational Geometry; Discrete Optimization
13. Software Engineering and Knowledge Engineering: Software Engineering Methodologies; Agent-based software engineering; Artificial intelligence approaches to software engineering; Component-based software engineering; Embedded and ubiquitous software engineering; Aspect-based software engineering; Empirical software engineering; Search-Based Software engineering; Automated software design and synthesis; Computer-supported cooperative work; Automated software specification; Reverse engineering; Software Engineering Techniques and Production Perspectives; Requirements engineering; Software analysis, design and modelling; Software maintenance and evolution; Software engineering tools and environments; Software engineering decision support; Software design patterns; Software product lines; Process and workflow management; Reflection and metadata approaches; Program understanding and system maintenance; Software domain modelling and analysis; Software economics; Multimedia and hypermedia software engineering; Software engineering case study and experience reports; Enterprise software, middleware, and tools; Artificial intelligent methods, models, techniques; Artificial life and societies; Swarm intelligence; Smart Spaces; Autonomic computing and agent-based systems; Autonomic computing; Adaptive Systems; Agent architectures, ontologies, languages and protocols; Multi-agent systems; Agent-based learning and knowledge discovery; Interface agents; Agent-based auctions and marketplaces; Secure mobile and multi-agent systems; Mobile agents; SOA and Service-Oriented Systems; Service-centric software engineering; Service oriented requirements engineering; Service oriented architectures; Middleware for service based systems; Service discovery and composition; Service level
Life Sciences Data Mining and Visualization Software Report
marketresearchforecast.com
doc, pdf, ppt
Updated Jun 16, 2025
Cite
Market Research Forecast (2025). Life Sciences Data Mining and Visualization Software Report [Dataset]. https://www.marketresearchforecast.com/reports/life-sciences-data-mining-and-visualization-software-542790
Available download formats: pdf, doc, ppt
Dataset updated
Jun 16, 2025
Dataset authored and provided by
Market Research Forecast
License
https://www.marketresearchforecast.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The Life Sciences Data Mining and Visualization Software market is experiencing robust growth, driven by the increasing volume of biological data generated through genomics, proteomics, and clinical trials. The market's expansion is fueled by the urgent need for efficient tools to analyze this complex data, enabling faster drug discovery, personalized medicine initiatives, and improved patient outcomes. Companies are increasingly investing in advanced analytics solutions to gain actionable insights from their data, leading to improved operational efficiency and reduced research and development costs. The integration of artificial intelligence (AI) and machine learning (ML) capabilities within these software solutions is a significant trend, enhancing the ability to identify patterns and make predictions from large datasets. This market is segmented by software type (e.g., data mining, visualization, integrated solutions), deployment mode (cloud, on-premise), and end-user (pharmaceutical companies, biotechnology firms, research institutions). Competition is fierce, with established players like IBM, Microsoft, and SAS competing with specialized life sciences focused companies and emerging innovative startups. While the market faces challenges such as the high cost of implementation and the need for specialized expertise, the long-term prospects remain positive. The continuous advancements in data generation technologies and the growing demand for data-driven decision-making in the life sciences sector will continue to fuel market growth. Furthermore, the increasing adoption of cloud-based solutions is expected to lower the barrier to entry for smaller companies and research institutions, further expanding the market. This makes the Life Sciences Data Mining and Visualization Software market a particularly attractive investment opportunity with high potential for both established players and new entrants. 
The market's estimated size in 2025 is $10 Billion, with a projected Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033.
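For readers who want to see what a compound annual growth rate implies, the following Python sketch applies the standard compounding formula, value = base × (1 + rate)^years, to the figures quoted above. It is an illustration only: the function name is hypothetical, and treating 2025 to 2033 as eight annual compounding periods is an assumption, since the report does not state its convention.

```python
def project_with_cagr(base_value, cagr, years):
    """Project a value forward by compounding `cagr` once per year."""
    return base_value * (1.0 + cagr) ** years


# Figures quoted in the description: $10 billion in 2025, 15% CAGR to 2033.
base_2025_billion = 10.0
cagr = 0.15
periods = 2033 - 2025  # assumption: eight annual compounding periods

projected_2033_billion = project_with_cagr(base_2025_billion, cagr, periods)
print(f"Implied 2033 market size: ${projected_2033_billion:.2f} billion")
```

The result is only as reliable as the base estimate and the growth rate that feed it; it is not a figure taken from the report.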