100+ datasets found
  1. d

    Privacy Preserving Distributed Data Mining

    • catalog.data.gov
    • s.cnmilf.com
    Updated Apr 10, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Privacy Preserving Distributed Data Mining [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-distributed-data-mining
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:

  2. Data Mining Tools Market - A Global and Regional Analysis

    • bisresearch.com
    csv, pdf
    Updated Nov 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bisresearch (2025). Data Mining Tools Market - A Global and Regional Analysis [Dataset]. https://bisresearch.com/industry-report/global-data-mining-tools-market.html
    Explore at:
    csv, pdfAvailable download formats
    Dataset updated
    Nov 30, 2025
    Dataset authored and provided by
    Bisresearch
    License

    https://bisresearch.com/privacy-policy-cookie-restriction-modehttps://bisresearch.com/privacy-policy-cookie-restriction-mode

    Time period covered
    2023 - 2033
    Area covered
    Worldwide
    Description

    The Data Mining Tools Market is expected to be valued at $1.24 billion in 2024, with an anticipated expansion at a CAGR of 11.63% to reach $3.73 billion by 2034.

  3. Z

    Data Analysis for the Systematic Literature Review of DL4SE

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    College of William and Mary
    Washington and Lee University
    Authors
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong; 2014). An Exploratory Data Analysis (EDA) comprises a set of statistical and data mining procedures to describe data. We ran EDA to provide statistical facts and inform conclusions. The mined facts allow attaining arguments that would influence the Systematic Literature Review of DL4SE.

    The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.

    Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad, et al; 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD involves five stages:

    Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organize the data into 35 features or attributes that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

    Preprocessing. The preprocessing applied was transforming the features into the correct type (nominal), removing outliers (papers that do not belong to the DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalize the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. Similarly, the same normalization was applied to other features like “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the paper by the data mining tasks or methods.

    Transformation. In this stage, we omitted to use any data transformation method except for the clustering analysis. We performed a Principal Component Analysis to reduce 35 features into 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibit the maximum reduction in variance. In other words, it helped us to identify the number of clusters to be used when tuning the explainable models.

    Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented to uncover hidden relationships on the extracted features (Correlations and Association Rules) and to categorize the DL4SE papers for a better segmentation of the state-of-the-art (Clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR od DL4SE”. 5.Interpretation/Evaluation. We used the Knowledge Discover to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produces an argument support analysis (see this link).

    We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

    Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

    Support = Number of occurrences this statement is true divided by the amount of statements Confidence = The support of the statement divided by the number of occurrences of the premise

  4. w

    Global Data Mining and Modeling Market Research Report: By Application...

    • wiseguyreports.com
    Updated Aug 23, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Global Data Mining and Modeling Market Research Report: By Application (Fraud Detection, Customer Segmentation, Risk Management, Market Basket Analysis), By Deployment Model (Cloud, On-Premises, Hybrid), By Technique (Predictive Analytics, Descriptive Analytics, Prescriptive Analytics, Text Mining), By End Use (Retail, Telecommunications, Banking and Financial Services, Healthcare) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/data-mining-and-modeling-market
    Explore at:
    Dataset updated
    Aug 23, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policyhttps://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 25, 2025
    Area covered
    Global
    Description
    BASE YEAR2024
    HISTORICAL DATA2019 - 2023
    REGIONS COVEREDNorth America, Europe, APAC, South America, MEA
    REPORT COVERAGERevenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 20247.87(USD Billion)
    MARKET SIZE 20258.37(USD Billion)
    MARKET SIZE 203515.4(USD Billion)
    SEGMENTS COVEREDApplication, Deployment Model, Technique, End Use, Regional
    COUNTRIES COVEREDUS, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICSGrowing demand for actionable insights, Increasing adoption of AI technologies, Rising need for predictive analytics, Expanding data sources and volume, Regulatory compliance and data privacy concerns
    MARKET FORECAST UNITSUSD Billion
    KEY COMPANIES PROFILEDInformatica, Tableau, Cloudera, Microsoft, Google, Alteryx, Oracle, SAP, SAS, DataRobot, Dell Technologies, Qlik, Teradata, TIBCO Software, Snowflake, IBM
    MARKET FORECAST PERIOD2025 - 2035
    KEY MARKET OPPORTUNITIESIncreased demand for predictive analytics, Growth in big data technologies, Rising need for data-driven decision-making, Adoption of AI and machine learning, Expansion in healthcare data analysis
    COMPOUND ANNUAL GROWTH RATE (CAGR) 6.3% (2025 - 2035)
  5. Data Mining Project - Boston

    • kaggle.com
    zip
    Updated Nov 25, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    SophieLiu (2019). Data Mining Project - Boston [Dataset]. https://www.kaggle.com/sliu65/data-mining-project-boston
    Explore at:
    zip(59313797 bytes)Available download formats
    Dataset updated
    Nov 25, 2019
    Authors
    SophieLiu
    Area covered
    Boston
    Description

    Context

    To make this a seamless process, I cleaned the data and delete many variables that I thought were not important to our dataset. I then uploaded all of those files to Kaggle for each of you to download. The rideshare_data has both lyft and uber but it is still a cleaned version from the dataset we downloaded from Kaggle.

    Use of Data Files

    You can easily subset the data into the car types that you will be modeling by first loading the csv into R, here is the code for how you do this:

    This loads the file into R

    df<-read.csv('uber.csv')

    The next codes is to subset the data into specific car types. The example below only has Uber 'Black' car types.

    df_black<-subset(uber_df, uber_df$name == 'Black')

    This next portion of code will be to load it into R. First, we must write this dataframe into a csv file on our computer in order to load it into R.

    write.csv(df_black, "nameofthefileyouwanttosaveas.csv")

    The file will appear in you working directory. If you are not familiar with your working directory. Run this code:

    getwd()

    The output will be the file path to your working directory. You will find the file you just created in that folder.

    Inspiration

    Your data will be in front of the world's largest data science community. What questions do you want to see answered?

  6. f

    DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi (2023). DataSheet_1_The TargetMine Data Warehouse: Enhancement and Updates.pdf [Dataset]. http://doi.org/10.3389/fgene.2019.00934.s001
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Yi-An Chen; Lokesh P. Tripathi; Takeshi Fujiwara; Tatsuya Kameyama; Mari N. Itoh; Kenji Mizuguchi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Biological data analysis is the key to new discoveries in disease biology and drug discovery. The rapid proliferation of high-throughput ‘omics’ data has necessitated a need for tools and platforms that allow the researchers to combine and analyse different types of biological data and obtain biologically relevant knowledge. We had previously developed TargetMine, an integrative data analysis platform for target prioritisation and broad-based biological knowledge discovery. Here, we describe the newly modelled biological data types and the enhanced visual and analytical features of TargetMine. These enhancements have included: an enhanced coverage of gene–gene relations, small molecule metabolite to pathway mappings, an improved literature survey feature, and in silico prediction of gene functional associations such as protein–protein interactions and global gene co-expression. We have also described two usage examples on trans-omics data analysis and extraction of gene-disease associations using MeSH term descriptors. These examples have demonstrated how the newer enhancements in TargetMine have contributed to a more expansive coverage of the biological data space and can help interpret genotype–phenotype relations. TargetMine with its auxiliary toolkit is available at https://targetmine.mizuguchilab.org. The TargetMine source code is available at https://github.com/chenyian-nibio/targetmine-gradle.

  7. Privacy Preserving Distributed Data Mining - Dataset - NASA Open Data Portal...

    • data.nasa.gov
    Updated Mar 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). Privacy Preserving Distributed Data Mining - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/privacy-preserving-distributed-data-mining
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:

  8. e

    U.S. Data Analysis Storage Management Market Research Report By Product Type...

    • exactitudeconsultancy.com
    Updated Mar 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Exactitude Consultancy (2025). U.S. Data Analysis Storage Management Market Research Report By Product Type (On-Premises, Cloud-Based), By Application (Data Warehousing, Data Mining, Big Data Analytics), By End User (Healthcare, BFSI, Retail, IT and Telecom), By Technology (Hadoop, SQL Databases, NoSQL Databases), By Distribution Channel (Direct Sales, Online Sales) – Forecast to 2034. [Dataset]. https://exactitudeconsultancy.com/reports/50774/u-s-data-analysis-storage-management-market
    Explore at:
    Dataset updated
    Mar 2025
    Dataset authored and provided by
    Exactitude Consultancy
    License

    https://exactitudeconsultancy.com/privacy-policyhttps://exactitudeconsultancy.com/privacy-policy

    Description

    The U.S. Data Analysis Storage Management market is projected to be valued at $10 billion in 2024, driven by factors such as increasing consumer awareness and the rising prevalence of industry-specific trends. The market is expected to grow at a CAGR of 12%, reaching approximately $31 billion by 2034.

  9. b

    Data from: Ontology of Core Data Mining Entities

    • bioregistry.io
    Updated Jul 5, 2014
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2014). Ontology of Core Data Mining Entities [Dataset]. https://bioregistry.io/ontodm
    Explore at:
    Dataset updated
    Jul 5, 2014
    Description

    OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising of a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. (from abstract)

  10. w

    Global Big Data Tool Market Research Report: By Deployment Model...

    • wiseguyreports.com
    Updated Sep 15, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Global Big Data Tool Market Research Report: By Deployment Model (On-Premises, Cloud-Based, Hybrid), By Tool Type (Data Processing Tools, Data Visualization Tools, Data Mining Tools, Data Storage Tools), By End User (BFSI, Healthcare, Retail, Manufacturing, IT and Telecommunications), By Application (Fraud Detection, Customer Analytics, Predictive Maintenance, Supply Chain Management) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/big-data-tool-market
    Explore at:
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policyhttps://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR2024
    HISTORICAL DATA2019 - 2023
    REGIONS COVEREDNorth America, Europe, APAC, South America, MEA
    REPORT COVERAGERevenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 202439.4(USD Billion)
    MARKET SIZE 202543.3(USD Billion)
    MARKET SIZE 2035110.5(USD Billion)
    SEGMENTS COVEREDDeployment Model, Tool Type, End User, Application, Regional
    COUNTRIES COVEREDUS, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICSIncreasing data volume, Demand for real-time analytics, Growing cloud adoption, Rising need for data security, Advancements in AI technologies
    MARKET FORECAST UNITSUSD Billion
    KEY COMPANIES PROFILEDInformatica, IBM, Amazon Web Services, Snowflake, DataRobot, Domo, Oracle, SAP, Microsoft, MongoDB, Cloudera, Google, SAS Institute, Teradata, Qlik, Hortonworks
    MARKET FORECAST PERIOD2025 - 2035
    KEY MARKET OPPORTUNITIESIncreased demand for analytics solutions, Growth in cloud-based big data tools, Rise of AI and machine learning integration, Expansion in real-time data processing, Enhanced data privacy and security needs
    COMPOUND ANNUAL GROWTH RATE (CAGR) 9.8% (2025 - 2035)
  11. DatasetofDatasets (DoD)

    • kaggle.com
    zip
    Updated Aug 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Konstantinos Malliaridis (2024). DatasetofDatasets (DoD) [Dataset]. https://www.kaggle.com/terminalgr/datasetofdatasets-124-1242024
    Explore at:
    zip(7583 bytes)Available download formats
    Dataset updated
    Aug 12, 2024
    Authors
    Konstantinos Malliaridis
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is essentially the metadata from 164 datasets. Each of its lines concerns a dataset from which 22 features have been extracted, which are used to classify each dataset into one of the categories 0-Unmanaged, 2-INV, 3-SI, 4-NOA (DatasetType).

    This Dataset consists of 164 Rows. Each row is the metadata of an other dataset. The target column is datasetType which has 4 values indicating the dataset type. These are:

    2 - Invoice detail (INV): This dataset type is a special report (usually called Detailed Sales Statement) produced by a Company Accounting or an Enterprise Resource Planning software (ERP). Using a INV-type dataset directly for ARM is extremely convenient for users as it relieves them from the tedious work of transforming data into another more suitable form. INV-type data input typically includes a header but, only two of its attributes are essential for data mining. The first attribute serves as the grouping identifier creating a unique transaction (e.g., Invoice ID, Order Number), while the second attribute contains the items utilized for data mining (e.g., Product Code, Product Name, Product ID).

    3 - Sparse Item (SI): This type is widespread in Association Rules Mining (ARM). It involves a header and a fixed number of columns. Each item corresponds to a column. Each row represents a transaction. The typical cell stores a value, usually one character in length, that depicts the presence or absence of the item in the corresponding transaction. The absence character must be identified or declared before the Association Rules Mining process takes place.

    4 - Nominal Attributes (NOA): This type is commonly used in Machine Learning and Data Mining tasks. It involves a fixed number of columns. Each column registers nominal/categorical values. The presence of a header row is optional. However, in cases where no header is provided, there is a risk of extracting incorrect rules if similar values exist in different attributes of the dataset. The potential values for each attribute can vary.

    0 - Unmanaged for ARM: On the other hand, not all datasets are suitable for extracting useful association rules or frequent item sets. For instance, datasets characterized predominantly by numerical features with arbitrary values, or datasets that involve fragmented or mixed types of data types. For such types of datasets, ARM processing becomes possible only by introducing a data discretization stage which in turn introduces information loss. Such types of datasets are not considered in the present treatise and they are termed (0) Unmanaged in the sequel.

    The dataset type is crucial to determine for ARM, and the current dataset is used to classify the dataset's type using a Supervised Machine Learning Model.

    There is and another dataset type named 1 - Market Basket List (MBL) where each dataset row is a transaction. A transaction involves a variable number of items. However, due to this characteristic, these datasets can be easily categorized using procedural programming and DoD does not include instances of them. For more details about Dataset Types please refer to article "WebApriori: a web application for association rules mining". https://link.springer.com/chapter/10.1007/978-3-030-49663-0_44

  12. w

    Global Data Market Research Report: By Data Type (Structured Data,...

    • wiseguyreports.com
    Updated Sep 15, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Global Data Market Research Report: By Data Type (Structured Data, Unstructured Data, Semi-Structured Data, Big Data), By Deployment Model (On-Premises, Cloud, Hybrid Cloud), By Application (Business Intelligence, Data Analytics, Data Warehousing, Data Mining), By End Use Industry (Healthcare, Retail, Finance, Telecommunications) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/data-market
    Explore at:
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policyhttps://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR2024
    HISTORICAL DATA2019 - 2023
    REGIONS COVEREDNorth America, Europe, APAC, South America, MEA
    REPORT COVERAGERevenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024153.8(USD Billion)
    MARKET SIZE 2025192.4(USD Billion)
    MARKET SIZE 20351800.0(USD Billion)
    SEGMENTS COVEREDData Type, Deployment Model, Application, End Use Industry, Regional
    COUNTRIES COVEREDUS, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICSData privacy regulations, Cloud computing adoption, Big data analytics growth, Artificial intelligence integration, Internet of Things expansion
    MARKET FORECAST UNITSUSD Billion
    KEY COMPANIES PROFILEDAccenture, IBM, Snowflake, Palantir Technologies, DataRobot, Oracle, Salesforce, Tencent, Alibaba, SAP, Microsoft, Intel, Cloudera, Amazon, Google, Cisco
    MARKET FORECAST PERIOD2025 - 2035
    KEY MARKET OPPORTUNITIESData-driven decision making, Cloud data storage expansion, AI and machine learning integration, Data privacy solutions demand, Real-time analytics and insights
    COMPOUND ANNUAL GROWTH RATE (CAGR) 25.1% (2025 - 2035)
  13. Data from: CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2...

    • zenodo.org
    bin, png, zip
    Updated Jul 12, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Berta Ibáñez-Beroiz; Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ignacio Oscoz-Villanueva; Ibai Tamayo; Ibai Tamayo; Julián Librero; Julián Librero; Mónica Enguita-Germán; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Francisco Estupiñán-Romero; Enrique Bernal-Delgado; Enrique Bernal-Delgado (2024). CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2 DIABETES [Dataset]. http://doi.org/10.5281/zenodo.7778291
    Explore at:
    bin, png, zipAvailable download formats
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Berta Ibáñez-Beroiz; Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ignacio Oscoz-Villanueva; Ibai Tamayo; Ibai Tamayo; Julián Librero; Julián Librero; Mónica Enguita-Germán; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Francisco Estupiñán-Romero; Enrique Bernal-Delgado; Enrique Bernal-Delgado
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Technical notes and documentation on the common data model of the project CONCEPT-DM2.

    This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.

    Aims of the CONCEPT-DM2 project:

    General aim: To analyse chronic care effectiveness and efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes using data from real life world (RWD) from five Spanish Regional Health Systems.

    Main specific aims:

    • To characterize the care pathways in patients with diabetes through the whole care system in terms of process indicators and pharmacologic recommendations
    • To compare these observed care pathways with the theoretical clinical pathways derived from the clinical practice guidelines
    • To assess if the adherence to clinical guidelines influence on important health outcomes, such as cardiovascular hospitalizations.
    • To compare the traditional analytical methods with process mining methods in terms of modeling quality, prediction performance and information provided.

    Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services using the electronic medical record systems including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and also other registers such as the mortality and the population register.

    Cohort definition: All patients with code of Type 2 Diabetes in the clinical health records

    • Inclusion criteria: patients that, at 01/01/2017 or during the follow-up from 01/01/2017 to 31/12/2022 had active health card (active TIS - tarjeta sanitaria activa) and code of type 2 diabetes (T2D, DM2 in spanish) in the clinical records of primary care (CIAP2 T90 in case of using CIAP code system)
    • Exclusion criteria:
      • patients with no contact with the health system from 01/01/2017 to 31/12/2022
      • patients that had a T1D (DM1) code opened after the T2D code during the follow-up.
    • Study period. From 01/01/2017 to 31/12/2022

    Files included in this publication:

    • Datamodel_CONCEPT_DM2_diagram.png
    • Common data model specification (Datamodel_CONCEPT_DM2_v.0.1.0.xlsx)
    • Synthetic datasets (Datamodel_CONCEPT_DM2_sample_data)
      • sample_data1_dm_patient.csv
      • sample_data2_dm_param.csv
      • sample_data3_dm_patient.csv
      • sample_data4_dm_param.csv
      • sample_data5_dm_patient.csv
      • sample_data6_dm_param.csv
      • sample_data7_dm_param.csv
      • sample_data8_dm_param.csv
    • Datamodel_CONCEPT_DM2_explanation.pptx
  14. h

    AI Clinical Data Mining Market to See Incredible Expansion

    • htfmarketinsights.com
    pdf & excel
    Updated Oct 6, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    HTF Market Intelligence (2025). AI Clinical Data Mining Market to See Incredible Expansion [Dataset]. https://htfmarketinsights.com/report/4373835-ai-clinical-data-mining-market
    Explore at:
    pdf & excelAvailable download formats
    Dataset updated
    Oct 6, 2025
    Dataset authored and provided by
    HTF Market Intelligence
    License

    https://www.htfmarketinsights.com/privacy-policyhttps://www.htfmarketinsights.com/privacy-policy

    Time period covered
    2019 - 2031
    Area covered
    Global
    Description

    Global AI Clinical Data Mining Market is segmented by Application (Healthcare_Pharmaceuticals_Biotechnology_IT_Research), Type (Data Mining Algorithms_Clinical Trial Data Analysis_EHR Data Mining_AI for Predictive Analytics_Medical Data Integration), and Geography (North America_ LATAM_ West Europe_Central & Eastern Europe_ Northern Europe_ Southern Europe_ East Asia_ Southeast Asia_ South Asia_ Central Asia_ Oceania_ MEA)

  15. w

    Global Big Data Analytics and Hadoop Market Research Report: By Deployment...

    • wiseguyreports.com
    Updated Oct 14, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Global Big Data Analytics and Hadoop Market Research Report: By Deployment Type (On-Premise, Cloud-Based), By Application (Customer Analytics, Operations Analytics, Risk Management, Fraud Detection, Predictive Maintenance), By End Use Industry (Retail, Healthcare, Banking, Telecommunications, Manufacturing), By Solution Type (Data Management, Data Integration, Data Visualization, Data Mining, Process Analytics) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/big-data-analytics-and-hadoop-market
    Explore at:
    Dataset updated
    Oct 14, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policyhttps://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Oct 25, 2025
    Area covered
    Global
    Description
    BASE YEAR2024
    HISTORICAL DATA2019 - 2023
    REGIONS COVEREDNorth America, Europe, APAC, South America, MEA
    REPORT COVERAGERevenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 202471.8(USD Billion)
    MARKET SIZE 202578.9(USD Billion)
    MARKET SIZE 2035200.0(USD Billion)
    SEGMENTS COVEREDDeployment Type, Application, End Use Industry, Solution Type, Regional
    COUNTRIES COVEREDUS, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICSIncreasing data volume, Growing demand for insights, Rising adoption of cloud services, Need for real-time analytics, Advancements in machine learning
    MARKET FORECAST UNITSUSD Billion
    KEY COMPANIES PROFILEDSAS Institute, Amazon, Hortonworks, Micro Focus, SAP, Teradata, Google, Dell Technologies, Microsoft, Snowflake, Palantir Technologies, Databricks, Cloudera, IBM, Oracle
    MARKET FORECAST PERIOD2025 - 2035
    KEY MARKET OPPORTUNITIESIncreased cloud adoption, Real-time data analytics demand, AI integration with Hadoop, Enhanced data privacy regulations, Growth in IoT data processing
    COMPOUND ANNUAL GROWTH RATE (CAGR) 9.8% (2025 - 2035)
  16. l

    LSC (Leicester Scientific Corpus)

    • figshare.le.ac.uk
    Updated Apr 15, 2020
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Neslihan Suzen (2020). LSC (Leicester Scientific Corpus) [Dataset]. http://doi.org/10.25392/leicester.data.9449639.v1
    Explore at:
    Dataset updated
    Apr 15, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Leicester
    Description

    The LSC (Leicester Scientific Corpus)August 2019 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk) Supervised by Prof Alexander Gorban and Dr Evgeny MirkesThe data is extracted from the Web of Science® [1] You may not copy or distribute this data in whole or in part without the written consent of Clarivate Analytics.Getting StartedThis text provides background information on the LSC (Leicester Scientific Corpus) and pre-processing steps on abstracts, and describes the structure of files to organise the corpus. This corpus is created to be used in future work on the quantification of the sense of research texts. One of the goal of publishing the data is to make it available for further analysis and use in Natural Language Processing projects.LSC is a collection of abstracts of articles and proceeding papers published in 2014, and indexed by the Web of Science (WoS) database [1]. Each document contains title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English.The corpus was collected in July 2018 online and contains the number of citations from publication date to July 2018.Each document in the corpus contains the following parts:1. Authors: The list of authors of the paper2. Title: The title of the paper3. Abstract: The abstract of the paper4. Categories: One or more category from the list of categories [2]. Full list of categories is presented in file ‘List_of _Categories.txt’.5. Research Areas: One or more research area from the list of research areas [3]. Full list of research areas is presented in file ‘List_of_Research_Areas.txt’.6. Total Times cited: The number of times the paper was cited by other items from all databases within Web of Science platform [4]7. Times cited in Core Collection: The total number of times the paper was cited by other papers within the WoS Core Collection [4]We describe a document as the collection of information (about a paper) listed above. The total number of documents in LSC is 1,673,824.All documents in LSC have nonempty abstract, title, categories, research areas and times cited in WoS databases. There are 119 documents with empty authors list, we did not exclude these documents.Data ProcessingThis section describes all steps in order for the LSC to be collected, clean and available to researchers. Processing the data consists of six main steps:Step 1: Downloading of the Data OnlineThis is the step of collecting the dataset online. This is done manually by exporting documents as Tab-delimitated files. All downloaded documents are available online.Step 2: Importing the Dataset to RThis is the process of converting the collection to RData format for processing the data. The LSC was collected as TXT files. All documents are extracted to R.Step 3: Cleaning the Data from Documents with Empty Abstract or without CategoryNot all papers have abstract and categories in the collection. As our research is based on the analysis of abstracts and categories, preliminary detecting and removing inaccurate documents were performed. All documents with empty abstracts and documents without categories are removed.Step 4: Identification and Correction of Concatenate Words in AbstractsTraditionally, abstracts are written in a format of executive summary with one paragraph of continuous writing, which is known as ‘unstructured abstract’. However, especially medicine-related publications use ‘structured abstracts’. Such type of abstracts are divided into sections with distinct headings such as introduction, aim, objective, method, result, conclusion etc.Used tool for extracting abstracts leads concatenate words of section headings with the first word of the section. As a result, some of structured abstracts in the LSC require additional process of correction to split such concatenate words. For instance, we observe words such as ConclusionHigher and ConclusionsRT etc. in the corpus. The detection and identification of concatenate words cannot be totally automated. Human intervention is needed in the identification of possible headings of sections. We note that we only consider concatenate words in headings of sections as it is not possible to detect all concatenate words without deep knowledge of research areas. Identification of such words is done by sampling of medicine-related publications. The section headings in such abstracts are listed in the List 1.List 1 Headings of sections identified in structured abstractsBackground Method(s) DesignTheoretical Measurement(s) LocationAim(s) Methodology ProcessAbstract Population ApproachObjective(s) Purpose(s) Subject(s)Introduction Implication(s) Patient(s)Procedure(s) Hypothesis Measure(s)Setting(s) Limitation(s) DiscussionConclusion(s) Result(s) Finding(s)Material (s) Rationale(s)Implications for health and nursing policyAll words including headings in the List 1 are detected in entire corpus, and then words are split into two words. For instance, the word ‘ConclusionHigher’ is split into ‘Conclusion’ and ‘Higher’.Step 5: Extracting (Sub-setting) the Data Based on Lengths of AbstractsAfter correction of concatenate words is completed, the lengths of abstracts are calculated. ‘Length’ indicates the totalnumber of words in the text, calculated by the same rule as for Microsoft Word ‘word count’ [5].According to APA style manual [6], an abstract should contain between 150 to 250 words. However, word limits vary from journal to journal. For instance, Journal of Vascular Surgery recommends that ‘Clinical and basic research studies must include a structured abstract of 400 words or less’[7].In LSC, the length of abstracts varies from 1 to 3805. We decided to limit length of abstracts from 30 to 500 words in order to study documents with abstracts of typical length ranges and to avoid the effect of the length to the analysis. Documents containing less than 30 and more than 500 words in abstracts are removed.Step 6: Saving the Dataset into CSV FormatCorrected and extracted documents are saved into 36 CSV files. The structure of files are described in the following section.The Structure of Fields in CSV FilesIn CSV files, the information is organised with one record on each line and parts of abstract, title, list of authors, list of categories, list of research areas, and times cited is recorded in separated fields.To access the LSC for research purposes, please email to ns433@le.ac.uk.References[1]Web of Science. (15 July). Available: https://apps.webofknowledge.com/[2]WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html[3]Research Areas in WoS. Available: https://images.webofknowledge.com/images/help/WOS/hp_research_areas_easca.html[4]Times Cited in WoS Core Collection. (15 July). Available: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-Times-Cited-accessibility-and-variation?language=en_US[5]Word Count. Available: https://support.office.com/en-us/article/show-word-count-3c9e6a11-a04d-43b4-977c-563a0e0d5da3[6]A. P. Association, Publication manual. American Psychological Association Washington, DC, 1983.[7]P. Gloviczki and P. F. Lawrence, "Information for authors," Journal of Vascular Surgery, vol. 65, no. 1, pp. A16-A22, 2017.

  17. w

    Global Data and Analytics DaaS for MID Market Research Report: By Service...

    • wiseguyreports.com
    Updated Oct 14, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2025). Global Data and Analytics DaaS for MID Market Research Report: By Service Model (Data Integration, Data Visualization, Data Mining, Predictive Analytics), By Deployment Type (Cloud-Based, On-Premises), By Industry (Healthcare, Retail, Manufacturing, Financial Services), By Data Type (Structured Data, Unstructured Data, Semi-Structured Data) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/data-and-analytics-d-a-service-for-mid-market
    Explore at:
    Dataset updated
    Oct 14, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policyhttps://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Oct 25, 2025
    Area covered
    Global
    Description
    BASE YEAR2024
    HISTORICAL DATA2019 - 2023
    REGIONS COVEREDNorth America, Europe, APAC, South America, MEA
    REPORT COVERAGERevenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 20249.03(USD Billion)
    MARKET SIZE 20259.73(USD Billion)
    MARKET SIZE 203520.5(USD Billion)
    SEGMENTS COVEREDService Model, Deployment Type, Industry, Data Type, Regional
    COUNTRIES COVEREDUS, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICSGrowing demand for data insights, Increasing adoption of cloud solutions, Rising importance of data security, Need for scalable analytics tools, Shortage of data skilled professionals
    MARKET FORECAST UNITSUSD Billion
    KEY COMPANIES PROFILEDTableau, Qlik, Domo, TIBCO, SAP, MicroStrategy, Google, Zoho, Microsoft, Salesforce, Infor, SAS, Looker, IBM, Sisense, Oracle
    MARKET FORECAST PERIOD2025 - 2035
    KEY MARKET OPPORTUNITIESCloud-based analytics solutions, Real-time data insights, AI-driven data management, Scalable DaaS platforms, Industry-specific analytics tools
    COMPOUND ANNUAL GROWTH RATE (CAGR) 7.8% (2025 - 2035)
  18. h

    Global Data Mining Tools Market - Global Outlook 2020-2033

    • htfmarketinsights.com
    pdf & excel
    Updated Oct 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    HTF Market Intelligence (2025). Global Data Mining Tools Market - Global Outlook 2020-2033 [Dataset]. https://www.htfmarketinsights.com/report/4364059-data-mining-tools-market
    Explore at:
    pdf & excelAvailable download formats
    Dataset updated
    Oct 15, 2025
    Dataset authored and provided by
    HTF Market Intelligence
    License

    https://www.htfmarketinsights.com/privacy-policyhttps://www.htfmarketinsights.com/privacy-policy

    Time period covered
    2019 - 2031
    Area covered
    Global
    Description

    Global Data Mining Tools Market is segmented by Application (Predictive analytics_Fraud detection_Marketing_Healthcare diagnostics_Manufacturing optimization), Type (Classification tools_Clustering tools_Regression tools_Association tools_Text mining tools), and Geography (North America_ LATAM_ West Europe_Central & Eastern Europe_ Northern Europe_ Southern Europe_ East Asia_ Southeast Asia_ South Asia_ Central Asia_ Oceania_ MEA)

  19. Sample data (five types of features of one participant)

    • figshare.com
    txt
    Updated Mar 29, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hua Liao (2022). Sample data (five types of features of one participant) [Dataset]. http://doi.org/10.6084/m9.figshare.19443503.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Mar 29, 2022
    Dataset provided by
    Figsharehttp://figshare.com/
    figshare
    Authors
    Hua Liao
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sample data (five types of features of one participant)

  20. e

    Open Source Intelligence (OSINT) Market Research Report By Product Type...

    • exactitudeconsultancy.com
    Updated May 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Exactitude Consultancy (2025). Open Source Intelligence (OSINT) Market Research Report By Product Type (Data Mining Tools, Analysis Software, Visualization Tools), By Application (Cybersecurity, Fraud Detection, Market Research), By End User (Government, Healthcare, Retail, Financial Services), By Technology (Machine Learning, Natural Language Processing, Cloud Computing), By Distribution Channel (Direct Sales, Online Sales) – Forecast to 2034 [Dataset]. https://exactitudeconsultancy.com/reports/64800/open-source-intelligence-osint-market
    Explore at:
    Dataset updated
    May 2025
    Dataset authored and provided by
    Exactitude Consultancy
    License

    https://exactitudeconsultancy.com/privacy-policyhttps://exactitudeconsultancy.com/privacy-policy

    Description

    The OSINT market is projected to be valued at $8.5 billion in 2024, driven by factors such as increasing consumer awareness and the rising prevalence of industry-specific trends. The market is expected to grow at a CAGR of 11.5%, reaching approximately $24 billion by 2034.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Dashlink (2025). Privacy Preserving Distributed Data Mining [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-distributed-data-mining

Privacy Preserving Distributed Data Mining

Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description

Distributed data mining from privacy-sensitive multi-party data is likely to play an important role in the next generation of integrated vehicle health monitoring systems. For example, consider an airline manufacturer [tex]$\mathcal{C}$[/tex] manufacturing an aircraft model [tex]$A$[/tex] and selling it to five different airline operating companies [tex]$\mathcal{V}_1 \dots \mathcal{V}_5$[/tex]. These aircrafts, during their operation, generate huge amount of data. Mining this data can reveal useful information regarding the health and operability of the aircraft which can be useful for disaster management and prediction of efficient operating regimes. Now if the manufacturer [tex]$\mathcal{C}$[/tex] wants to analyze the performance data collected from different aircrafts of model-type [tex]$A$[/tex] belonging to different airlines then central collection of data for subsequent analysis may not be an option. It should be noted that the result of this analysis may be statistically more significant if the data for aircraft model [tex]$A$[/tex] across all companies were available to [tex]$\mathcal{C}$[/tex]. The potential problems arising out of such a data mining scenario are:

Search
Clear search
Close search
Google apps
Main menu