100+ datasets found
  1. Data from: A Generic Local Algorithm for Mining Data Streams in Large...

    • catalog.data.gov
    • datasets.ai
    • +2more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems [Dataset]. https://catalog.data.gov/dataset/a-generic-local-algorithm-for-mining-data-streams-in-large-distributed-systems
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality, such as message routing, information retrieval, and load sharing, relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models (e.g., decision trees, k-means clustering) in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.

  2. Data Mining Tools Market - A Global and Regional Analysis

    • bisresearch.com
    csv, pdf
    Updated Nov 30, 2025
    Cite
    Bisresearch (2025). Data Mining Tools Market - A Global and Regional Analysis [Dataset]. https://bisresearch.com/industry-report/global-data-mining-tools-market.html
    Available download formats: csv, pdf
    Dataset updated
    Nov 30, 2025
    Dataset authored and provided by
    Bisresearch
    License

    https://bisresearch.com/privacy-policy-cookie-restriction-mode

    Time period covered
    2023 - 2033
    Area covered
    Worldwide
    Description

    The Data Mining Tools Market is expected to be valued at $1.24 billion in 2024, with an anticipated expansion at a CAGR of 11.63% to reach $3.73 billion by 2034.
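The growth figures in this description can be sanity-checked with the standard compound annual growth rate formula. The sketch below is only an illustration: the function names are hypothetical, and it assumes the 2024 value compounds once per year for ten years up to 2034.

```python
def project_value(start_value: float, cagr: float, years: int) -> float:
    """Compound start_value at a constant annual growth rate for `years` years."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the description (USD billion); the 10-year span is an assumption.
projected_2034 = project_value(1.24, 0.1163, 10)
rate = implied_cagr(1.24, 3.73, 10)

print(f"projected 2034 value: {projected_2034:.2f}")
print(f"implied CAGR: {rate:.4f}")
```

Under these assumptions the projection comes out close to the quoted $3.73 billion, so the three stated numbers are mutually consistent.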

  3. Table_4_Revealing Relationships Among Cognitive Functions Using Functional...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 31, 2023
    Cite
    Hiroki Kurashige; Jun Kaneko; Yuichi Yamashita; Rieko Osu; Yohei Otaka; Takashi Hanakawa; Manabu Honda; Hideaki Kawabata (2023). Table_4_Revealing Relationships Among Cognitive Functions Using Functional Connectivity and a Large-Scale Meta-Analysis Database.XLSX [Dataset]. http://doi.org/10.3389/fnhum.2019.00457.s017
    Available download formats: xlsx
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers
    Authors
    Hiroki Kurashige; Jun Kaneko; Yuichi Yamashita; Rieko Osu; Yohei Otaka; Takashi Hanakawa; Manabu Honda; Hideaki Kawabata
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To characterize each cognitive function per se and to understand the brain as an aggregate of those functions, it is vital to relate dozens of these functions to each other. Knowledge about the relationships among cognitive functions is informative not only for basic neuroscientific research but also for clinical applications and developments of brain-inspired artificial intelligence. In the present study, we propose an exhaustive data mining approach to reveal relationships among cognitive functions based on functional brain mapping and network analysis. We began our analysis with 109 pseudo-activation maps (cognitive function maps; CFM) that were reconstructed from a functional magnetic resonance imaging meta-analysis database, each of which corresponds to one of 109 cognitive functions such as ‘emotion,’ ‘attention,’ ‘episodic memory,’ etc. Based on the resting-state functional connectivity between the CFMs, we mapped the cognitive functions onto a two-dimensional space where the relevant functions were located close to each other, which provided a rough picture of the brain as an aggregate of cognitive functions. Then, we conducted so-called conceptual analysis of cognitive functions using clustering of voxels in each CFM connected to the other 108 CFMs with various strengths. As a result, a CFM for each cognitive function was subdivided into several parts, each of which is strongly associated with some CFMs for a subset of the other cognitive functions, which brought in sub-concepts (i.e., sub-functions) of the cognitive function. Moreover, we conducted network analysis for the network whose nodes were parcels derived from whole-brain parcellation based on the whole-brain voxel-to-CFM resting-state functional connectivities. Since each parcel is characterized by associations with the 109 cognitive functions, network analyses using them are expected to inform about relationships between cognitive and network characteristics. 
Indeed, we found that informational diversities of interaction between parcels and densities of local connectivity were dependent on the kinds of associated functions. In addition, we identified the homogeneous and inhomogeneous network communities about the associated functions. Altogether, we suggested the effectiveness of our approach in which we fused the large-scale meta-analysis of functional brain mapping with the methods of network neuroscience to investigate the relationships among cognitive functions.

  4. Data Mining Tools Market Size, Share, Growth, Forecast, By Component...

    • verifiedmarketresearch.com
    Updated Jun 13, 2025
    Cite
    VERIFIED MARKET RESEARCH (2025). Data Mining Tools Market Size, Share, Growth, Forecast, By Component (Software, Services), By Deployment Mode (On-Premise, Cloud-Based), By Function (Data Cleaning, Data Integration, Data Transformation, Data Visualization), By Application (Marketing, Fraud Detection & Risk Management, Cybersecurity, Customer Relationship Management (CRM)) [Dataset]. https://www.verifiedmarketresearch.com/product/data-mining-tools-market/
    Dataset updated
    Jun 13, 2025
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2026 - 2032
    Area covered
    Global
    Description

    Data Mining Tools Market size was valued at USD 915.42 Million in 2024 and is projected to reach USD 2171.21 Million by 2032, growing at a CAGR of 11.40% from 2026 to 2032.

    • Big Data Explosion: Exponential growth in data generation from IoT devices, social media, mobile applications, and digital transactions is creating massive datasets requiring advanced mining tools for analysis. Organizations need sophisticated solutions to extract meaningful insights from structured and unstructured data sources for competitive advantage.
    • Digital Transformation Initiatives: Accelerating digital transformation across industries is driving demand for data mining tools that enable data-driven decision making and business intelligence. Companies are investing in analytics capabilities to optimize operations, improve customer experiences, and develop new revenue streams through data monetization strategies.

  5. Global Life Science Data Mining and Visualization Software Market Research...

    • wiseguyreports.com
    Updated Sep 15, 2025
    Cite
    (2025). Global Life Science Data Mining and Visualization Software Market Research Report: By Application (Drug Discovery, Clinical Data Management, Genomic Research, Patient Data Analysis), By Deployment Type (On-Premises, Cloud-Based, Hybrid), By End User (Pharmaceutical Companies, Biotechnology Firms, Research Organizations, Academic Institutions), By Functionality (Data Mining, Data Visualization, Predictive Analytics, Statistical Analysis) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/life-science-data-mining-and-visualization-software-market
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 5.92 (USD Billion)
    MARKET SIZE 2025: 6.34 (USD Billion)
    MARKET SIZE 2035: 12.5 (USD Billion)
    SEGMENTS COVERED: Application, Deployment Type, End User, Functionality, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: Increasing data complexity, Growing demand for analytics, Rising need for regulatory compliance, Advancements in AI technologies, Enhanced data visualization techniques
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: RapidMiner, Elsevier, IBM, BioStat, Palantir Technologies, Oracle, Tableau, Altair Engineering, Biovia, Microsoft, Wolfram Research, Minitab, Cytel, TIBCO Software, SAS Institute, Qlik
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Growing demand for personalized medicine, Advancements in big data analytics, Increasing use of AI and ML technologies, Rising adoption of cloud-based solutions, Expanding regulatory compliance requirements
    COMPOUND ANNUAL GROWTH RATE (CAGR): 7.1% (2025 - 2035)
  6. Table_11_Revealing Relationships Among Cognitive Functions Using Functional...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated Jun 1, 2023
    Cite
    Hiroki Kurashige; Jun Kaneko; Yuichi Yamashita; Rieko Osu; Yohei Otaka; Takashi Hanakawa; Manabu Honda; Hideaki Kawabata (2023). Table_11_Revealing Relationships Among Cognitive Functions Using Functional Connectivity and a Large-Scale Meta-Analysis Database.XLSX [Dataset]. http://doi.org/10.3389/fnhum.2019.00457.s012
    Available download formats: xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Hiroki Kurashige; Jun Kaneko; Yuichi Yamashita; Rieko Osu; Yohei Otaka; Takashi Hanakawa; Manabu Honda; Hideaki Kawabata
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To characterize each cognitive function per se and to understand the brain as an aggregate of those functions, it is vital to relate dozens of these functions to each other. Knowledge about the relationships among cognitive functions is informative not only for basic neuroscientific research but also for clinical applications and developments of brain-inspired artificial intelligence. In the present study, we propose an exhaustive data mining approach to reveal relationships among cognitive functions based on functional brain mapping and network analysis. We began our analysis with 109 pseudo-activation maps (cognitive function maps; CFM) that were reconstructed from a functional magnetic resonance imaging meta-analysis database, each of which corresponds to one of 109 cognitive functions such as ‘emotion,’ ‘attention,’ ‘episodic memory,’ etc. Based on the resting-state functional connectivity between the CFMs, we mapped the cognitive functions onto a two-dimensional space where the relevant functions were located close to each other, which provided a rough picture of the brain as an aggregate of cognitive functions. Then, we conducted so-called conceptual analysis of cognitive functions using clustering of voxels in each CFM connected to the other 108 CFMs with various strengths. As a result, a CFM for each cognitive function was subdivided into several parts, each of which is strongly associated with some CFMs for a subset of the other cognitive functions, which brought in sub-concepts (i.e., sub-functions) of the cognitive function. Moreover, we conducted network analysis for the network whose nodes were parcels derived from whole-brain parcellation based on the whole-brain voxel-to-CFM resting-state functional connectivities. Since each parcel is characterized by associations with the 109 cognitive functions, network analyses using them are expected to inform about relationships between cognitive and network characteristics. 
Indeed, we found that informational diversities of interaction between parcels and densities of local connectivity were dependent on the kinds of associated functions. In addition, we identified the homogeneous and inhomogeneous network communities about the associated functions. Altogether, we suggested the effectiveness of our approach in which we fused the large-scale meta-analysis of functional brain mapping with the methods of network neuroscience to investigate the relationships among cognitive functions.

  7. Global Data Science Tool Market Research Report: By Application (Predictive...

    • wiseguyreports.com
    Updated Sep 15, 2025
    Cite
    (2025). Global Data Science Tool Market Research Report: By Application (Predictive Analytics, Data Mining, Machine Learning, Statistical Analysis), By Deployment Model (On-Premise, Cloud-Based, Hybrid), By End User (Retail, Healthcare, Finance, Manufacturing), By Functionality (Data Visualization, Data Preparation, Model Building, Model Deployment) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/data-science-tool-market
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 9.0 (USD Billion)
    MARKET SIZE 2025: 10.05 (USD Billion)
    MARKET SIZE 2035: 30.0 (USD Billion)
    SEGMENTS COVERED: Application, Deployment Model, End User, Functionality, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: Growing demand for data-driven insights, Increasing adoption of machine learning, Rising need for data visualization tools, Expanding use of big data analytics, Emergence of cloud-based solutions
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: RapidMiner, IBM, Snowflake, TIBCO Software, Datarobot, Oracle, Tableau, Teradata, MathWorks, Microsoft, Cloudera, Google, SAS Institute, Alteryx, Qlik, DataRobot
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Increased demand for AI solutions, Growing importance of big data analytics, Rising adoption of cloud-based tools, Integration of automation technologies, Expanding use cases across industries
    COMPOUND ANNUAL GROWTH RATE (CAGR): 11.6% (2025 - 2035)
  8. Global AI Driven Analytics Platform Market Research Report: By Deployment...

    • wiseguyreports.com
    Updated Aug 18, 2025
    Cite
    (2025). Global AI Driven Analytics Platform Market Research Report: By Deployment Type (Cloud-Based, On-Premises, Hybrid), By Component (Software, Services), By Industry (Healthcare, Retail, Finance, Manufacturing, Telecommunications), By Functionality (Data Mining, Real-Time Analytics, Predictive Analytics, Prescriptive Analytics) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/ai-driven-analytics-platform-market
    Dataset updated
    Aug 18, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Aug 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 7.26 (USD Billion)
    MARKET SIZE 2025: 8.14 (USD Billion)
    MARKET SIZE 2035: 25.5 (USD Billion)
    SEGMENTS COVERED: Deployment Type, Component, Industry, Functionality, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: Growing demand for data-driven insights, Increasing adoption of cloud technologies, Rise in automation across industries, Enhancements in machine learning algorithms, Increased focus on real-time analytics
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Tableau, Microsoft, Google, Alteryx, Oracle, Domo, TIBCO, SAP, SAS, Qlik, Salesforce, IBM
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Increased demand for predictive analytics, Growing adoption of cloud-based solutions, Integration with IoT devices, Expansion in emerging markets, Enhanced decision-making capabilities.
    COMPOUND ANNUAL GROWTH RATE (CAGR): 12.1% (2025 - 2035)
  9. Global Content Analytics Discovery Cognitive Software Market Research...

    • wiseguyreports.com
    Updated Oct 15, 2025
    Cite
    (2025). Global Content Analytics Discovery Cognitive Software Market Research Report: By Application (Sentiment Analysis, Data Mining, Predictive Analytics, Risk Management), By Deployment Type (On-Premises, Cloud-Based, Hybrid), By End User (BFSI, Healthcare, Retail, Telecommunications, Government), By Functionality (Text Analytics, Speech Analytics, Social Media Analytics, Web Analytics) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/content-analytics-discovery-cognitive-software-market
    Dataset updated
    Oct 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Oct 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 5.08 (USD Billion)
    MARKET SIZE 2025: 5.61 (USD Billion)
    MARKET SIZE 2035: 15.0 (USD Billion)
    SEGMENTS COVERED: Application, Deployment Type, End User, Functionality, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: Growing data volumes, Increasing AI adoption, Enhanced consumer insights, Competitive differentiation, Rising demand for automation
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Tableau, Qlik, SAS Institute, Domo, SAP, MicroStrategy, TIBCO Software, Palantir Technologies, Microsoft, Salesforce, Information Builders, Alteryx, IBM, Apache Software Foundation, Sisense, Oracle
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Increased demand for data-driven insights, Growing adoption of AI technologies, Expansion in e-commerce platforms, Rise in personalized marketing strategies, Enhanced need for regulatory compliance.
    COMPOUND ANNUAL GROWTH RATE (CAGR): 10.4% (2025 - 2035)
  10. SDOstreamclust: Stream Clustering Robust to Concept Drift - Evaluation Tests...

    • researchdata.tuwien.ac.at
    zip
    Updated Nov 25, 2025
    Cite
    Felix Iglesias Vazquez; Felix Iglesias Vazquez (2025). SDOstreamclust: Stream Clustering Robust to Concept Drift - Evaluation Tests [Dataset]. http://doi.org/10.48436/xh0w2-q5x18
    Available download formats: zip
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    TU Wien
    Authors
    Felix Iglesias Vazquez; Felix Iglesias Vazquez
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SDOstreamclust Evaluation Tests

    conducted for the paper: Stream Clustering Robust to Concept Drift. Please refer to:

    Iglesias Vazquez, F., Konzett, S., Zseby, T., & Bifet, A. (2025). Stream Clustering Robust to Concept Drift. In 2025 International Joint Conference on Neural Networks (IJCNN) (pp. 1–10). IEEE. https://doi.org/10.1109/IJCNN64981.2025.11227664

    Context and methodology

    SDOstreamclust is a stream clustering algorithm able to process data incrementally or in batches. It combines the earlier SDOstream (anomaly detection in data streams) and SDOclust (static clustering). SDOstreamclust retains the characteristics of SDO algorithms: lightweight, intuitive, self-adjusting, resistant to noise, capable of identifying non-convex clusters, and constructed upon robust parameters and interpretable models. Moreover, it shows excellent adaptation to concept drift.

    In this repository, SDOclust is evaluated with 165 datasets (both synthetic and real) and compared with CluStream, DBstream, DenStream, StreamKMeans.

    This repository is framed within the research on the following domains: algorithm evaluation, stream clustering, unsupervised learning, machine learning, data mining, streaming data analysis. Datasets and algorithms can be used for experiment replication and for further evaluation and comparison.

    Docker

    A Docker version is also available in: https://hub.docker.com/r/fiv5/sdostreamclust

    Technical details

    Experiments are conducted in Python v3.8.14. The file and folder structure is as follows:

    • [algorithms] contains a script with functions related to algorithm configurations.
    • [data] contains datasets in ARFF format.
    • [results] contains CSV files with algorithms' performances obtained from running the "run.sh" script (as shown in the paper).
    • "dependencies.sh" lists and installs Python dependencies.
    • "pysdoclust-stream-main.zip" contains the SDOstreamclust Python package.
    • "README.md" shows details and instructions to use this repository.
    • "run.sh" runs the complete experiments.
    • "run_comp.py" runs experiments specified by arguments.
    • "TSindex.py" implements functions for the Temporal Silhouette index.

    Note: if the code in SDOstreamclust is modified, the SWIG (v4.2.1) wrappers have to be rebuilt and SDOstreamclust reinstalled with pip.

    License

    The CC-BY license applies to all data generated with MDCgen. All distributed code is under the GPLv3+ license.

  11. Data from: Reconstruction of magnetospheric storm-time dynamics using...

    • data.niaid.nih.gov
    Updated Sep 19, 2020
    Cite
    Tsyganenko, Nikolai (2020). Reconstruction of magnetospheric storm-time dynamics using cylindrical basis functions and multi-mission data mining [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4036005
    Dataset updated
    Sep 19, 2020
    Dataset provided by
    Saint-Petersburg State University
    Authors
    Tsyganenko, Nikolai
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This zip file contains data used to create figures and tables, describing the results of the paper "Reconstruction of magnetospheric storm-time dynamics using cylindrical basis functions and multi-mission data mining", by N. A. Tsyganenko, V. A. Andreeva, and M. I. Sitnov.

  12. Data from: Local L2 Thresholding Based Data Mining in Peer-to-Peer Systems

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Local L2 Thresholding Based Data Mining in Peer-to-Peer Systems [Dataset]. https://catalog.data.gov/dataset/local-l2-thresholding-based-data-mining-in-peer-to-peer-systems
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    In a large network of computers, wireless sensors, or mobile devices, each of the components (henceforth, peers) has some data about the global status of the system. Many of the functions of the system, such as routing decisions, search strategies, data cleansing, and the assignment of mutual trust, depend on the global status. Therefore, it is essential that the system be able to detect, and react to, changes in its global status. Computing global predicates in such systems is usually very costly, mainly because of their scale and, in some cases (e.g., sensor networks), also because of the high cost of communication. The cost further increases when the data changes rapidly (due to state changes, node failure, etc.) and computation has to follow these changes. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which detects when the L2 norm of the average data surpasses a threshold. Then, we use this algorithm as a feedback loop for the monitoring of complex predicates on the data, such as the data's k-means clustering. The efficiency of the L2 algorithm guarantees that so long as the clustering results represent the data (i.e., the data is stationary) few resources are required. When the data undergoes an epoch change (a change in the underlying distribution) and the model no longer represents it, the feedback loop indicates this and the model is rebuilt. Furthermore, the existence of a feedback loop allows using approximate and “best-effort” methods for constructing the model; if an ill-fit model is built, the feedback loop would indicate so, and the model would be rebuilt.
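The threshold test and feedback loop described above can be illustrated with a small centralized sketch. This is not the paper's distributed local algorithm: here all peer data is gathered in one place, the "model" is simply the mean vector rather than a k-means clustering, and the names `exceeds_threshold`, `monitor`, and `epsilon` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def average(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def l2_norm(v: Vector) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def exceeds_threshold(peer_data: List[Vector], model: Vector, epsilon: float) -> bool:
    """True when the L2 norm of the average residual (data - model) is above epsilon."""
    residuals = [[x - m for x, m in zip(v, model)] for v in peer_data]
    return l2_norm(average(residuals)) > epsilon


def monitor(snapshots: List[List[Vector]], epsilon: float) -> List[int]:
    """Feedback loop: rebuild the model whenever the threshold test fires.

    Returns the indices of the snapshots at which the model was rebuilt.
    """
    model = average(snapshots[0])  # assumed stand-in model: the mean of the data
    rebuilds = []
    for t, peer_data in enumerate(snapshots[1:], start=1):
        if exceeds_threshold(peer_data, model, epsilon):
            model = average(peer_data)  # rebuild from the current data
            rebuilds.append(t)
    return rebuilds


# Two stationary snapshots followed by a change in the underlying distribution.
stationary = [[0.0, 0.0], [0.2, -0.2], [-0.2, 0.2]]
shifted = [[3.0, 4.0], [3.2, 3.8], [2.8, 4.2]]
rebuild_times = monitor([stationary, stationary, shifted, shifted], epsilon=1.0)
print(rebuild_times)
```

While the data stays stationary the test never fires and nothing is recomputed; the model is rebuilt only at the snapshot where the distribution shifts, which is the resource-saving behaviour the description attributes to the feedback loop.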

  13. LScDC Word-Category RIG Matrix

    • figshare.le.ac.uk
    pdf
    Updated Apr 28, 2020
    Cite
    Neslihan Suzen (2020). LScDC Word-Category RIG Matrix [Dataset]. http://doi.org/10.25392/leicester.data.12133431.v2
    Available download formats: pdf
    Dataset updated
    Apr 28, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    LScDC Word-Category RIG MatrixApril 2020 by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com)Supervised by Prof Alexander Gorban and Dr Evgeny MirkesGetting StartedThis file describes the Word-Category RIG Matrix for theLeicester Scientific Corpus (LSC) [1], the procedure to build the matrix and introduces the Leicester Scientific Thesaurus (LScT) with the construction process. The Word-Category RIG Matrix is a 103,998 by 252 matrix, where rows correspond to words of Leicester Scientific Dictionary-Core (LScDC) [2] and columns correspond to 252 Web of Science (WoS) categories [3, 4, 5]. Each entry in the matrix corresponds to a pair (category,word). Its value for the pair shows the Relative Information Gain (RIG) on the belonging of a text from the LSC to the category from observing the word in this text. The CSV file of Word-Category RIG Matrix in the published archive is presented with two additional columns of the sum of RIGs in categories and the maximum of RIGs over categories (last two columns of the matrix). So, the file ‘Word-Category RIG Matrix.csv’ contains a total of 254 columns.This matrix is created to be used in future research on quantifying of meaning in scientific texts under the assumption that words have scientifically specific meanings in subject categories and the meaning can be estimated by information gains from word to categories. LScT (Leicester Scientific Thesaurus) is a scientific thesaurus of English. The thesaurus includes a list of 5,000 words from the LScDC. We consider ordering the words of LScDC by the sum of their RIGs in categories. That is, words are arranged in their informativeness in the scientific corpus LSC. Therefore, meaningfulness of words evaluated by words’ average informativeness in the categories. We have decided to include the most informative 5,000 words in the scientific thesaurus. 
Words as a Vector of Frequencies in WoS CategoriesEach word of the LScDC is represented as a vector of frequencies in WoS categories. Given the collection of the LSC texts, each entry of the vector consists of the number of texts containing the word in the corresponding category.It is noteworthy that texts in a corpus do not necessarily belong to a single category, as they are likely to correspond to multidisciplinary studies, specifically in a corpus of scientific texts. In other words, categories may not be exclusive. There are 252 WoS categories and a text can be assigned to at least 1 and at most 6 categories in the LSC. Using the binary calculation of frequencies, we introduce the presence of a word in a category. We create a vector of frequencies for each word, where dimensions are categories in the corpus.The collection of vectors, with all words and categories in the entire corpus, can be shown in a table, where each entry corresponds to a pair (word,category). This table is build for the LScDC with 252 WoS categories and presented in published archive with this file. The value of each entry in the table shows how many times a word of LScDC appears in a WoS category. The occurrence of a word in a category is determined by counting the number of the LSC texts containing the word in a category. Words as a Vector of Relative Information Gains Extracted for CategoriesIn this section, we introduce our approach to representation of a word as a vector of relative information gains for categories under the assumption that meaning of a word can be quantified by their information gained for categories.For each category, a function is defined on texts that takes the value 1, if the text belongs to the category, and 0 otherwise. For each word, a function is defined on texts that takes the value 1 if the word belongs to the text, and 0 otherwise. Consider LSC as a probabilistic sample space (the space of equally probable elementary outcomes). 
    For the Boolean random variables, the joint probability distribution, the entropy and information gains are defined.

    The information gain about the category from the word is the amount of information on the belonging of a text from the LSC to the category from observing the word in the text [6]. We used the Relative Information Gain (RIG), which provides a normalised measure of the Information Gain. This makes it possible to compare information gains for different categories. The calculations of entropy, Information Gains and Relative Information Gains can be found in the README file in the published archive.

    Given a word, we created a vector where each component corresponds to a category. Therefore, each word is represented as a vector of relative information gains, and the dimension of the vector for each word is the number of categories. The set of vectors is used to form the Word-Category RIG Matrix, in which each column corresponds to a category, each row corresponds to a word and each component is the relative information gain from the word to the category. In the Word-Category RIG Matrix, a row vector represents the corresponding word as a vector of RIGs in categories, and a column vector represents the RIGs of all words in an individual category. If we choose an arbitrary category, words can be ordered by their RIGs from the most informative to the least informative for the category. As well as ordering words in each category, words can be ordered by two criteria: the sum and the maximum of RIGs in categories. The top n words in this list can be considered the most informative words in the scientific texts. For a given word, the sum and maximum of RIGs are calculated from the Word-Category RIG Matrix.

    RIGs for each word of the LScDC in 252 categories are calculated and vectors of words are formed. We then form the Word-Category RIG Matrix for the LSC.
    For each word, the sum (S) and maximum (M) of RIGs in categories are calculated and added at the end of the matrix (last two columns of the matrix). The Word-Category RIG Matrix for the LScDC with 252 categories, the sum of RIGs in categories and the maximum of RIGs over categories can be found in the database.

    Leicester Scientific Thesaurus (LScT)

    Leicester Scientific Thesaurus (LScT) is a list of 5,000 words from the LScDC [2]. Words of the LScDC are sorted in descending order by the sum (S) of RIGs in categories and the top 5,000 words are selected to be included in the LScT. We consider these 5,000 words as the most meaningful words in the scientific corpus. In other words, meaningfulness of words is evaluated by the words’ average informativeness in the categories, and the list of these words is considered a ‘thesaurus’ for science. The LScT with the value of the sum can be found as a CSV file in the published archive.

    The published archive contains the following files:
    1) Word_Category_RIG_Matrix.csv: A 103,998 by 254 matrix where columns are 252 WoS categories, the sum (S) and the maximum (M) of RIGs in categories (last two columns of the matrix), and rows are words of LScDC. Each entry in the first 252 columns is RIG from the word to the category. Words are ordered as in the LScDC.
    2) Word_Category_Frequency_Matrix.csv: A 103,998 by 252 matrix where columns are 252 WoS categories and rows are words of LScDC. Each entry of the matrix is the number of texts containing the word in the corresponding category. Words are ordered as in the LScDC.
    3) LScT.csv: List of words of LScT with sum (S) values.
    4) Text_No_in_Cat.csv: The number of texts in categories.
    5) Categories_in_Documents.csv: List of WoS categories for each document of the LSC.
    6) README.txt: Description of Word-Category RIG Matrix, Word-Category Frequency Matrix and LScT and forming procedures.
    7) README.pdf (same as 6 in PDF format)

    References
    [1] Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
    [2] Suzen, Neslihan (2019): LScDC (Leicester Scientific Dictionary-Core). figshare. Dataset. https://doi.org/10.25392/leicester.data.9896579.v3
    [3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
    [4] WoS Subject Categories. Available: https://images.webofknowledge.com/WOKRS56B5/help/WOS/hp_subject_category_terms_tasca.html
    [5] Suzen, N., Mirkes, E. M., & Gorban, A. N. (2019). LScDC-new large scientific dictionary. arXiv preprint arXiv:1912.06858.
    [6] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
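    The description above defines RIG only in words and defers the exact formulas to the README in the archive. As a minimal sketch, assuming the standard definitions (binary entropy in bits, information gain as entropy minus conditional entropy, and RIG as information gain divided by the category entropy), one matrix entry could be computed from four text counts. The function and parameter names are hypothetical and are not taken from the archive.

```python
from math import log2


def entropy(p: float) -> float:
    """Binary entropy in bits; taken as 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def relative_information_gain(n_texts: int, n_cat: int, n_word: int, n_both: int) -> float:
    """RIG of a category from a word, under the assumed standard definitions.

    n_texts: number of texts in the corpus
    n_cat:   texts assigned to the category
    n_word:  texts containing the word
    n_both:  texts in the category that contain the word
    """
    h_cat = entropy(n_cat / n_texts)
    if h_cat == 0.0:
        return 0.0  # the category carries no uncertainty, so nothing can be gained
    n_no_word = n_texts - n_word
    # Entropy of the category among texts with and without the word.
    h_with = entropy(n_both / n_word) if n_word else 0.0
    h_without = entropy((n_cat - n_both) / n_no_word) if n_no_word else 0.0
    h_cond = (n_word / n_texts) * h_with + (n_no_word / n_texts) * h_without
    return (h_cat - h_cond) / h_cat


def sum_and_max(rig_row: list[float]) -> tuple[float, float]:
    """The two extra columns S and M for one word's row of RIGs."""
    return sum(rig_row), max(rig_row)
```

    Under these assumptions, a word that appears in exactly the texts of a category gives a RIG of 1, and a word whose presence is statistically independent of the category gives 0.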

  14. Local L2 Thresholding Based Data Mining in Peer-to-Peer Systems - Dataset -...

    • data.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Local L2 Thresholding Based Data Mining in Peer-to-Peer Systems - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/local-l2-thresholding-based-data-mining-in-peer-to-peer-systems
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    In a large network of computers, wireless sensors, or mobile devices, each of the components (hence, peers) has some data about the global status of the system. Many of the functions of the system, such as routing decisions, search strategies, data cleansing, and the assignment of mutual trust, depend on the global status. Therefore, it is essential that the system be able to detect, and react to, changes in its global status. Computing global predicates in such systems is usually very costly, mainly because of their scale, and in some cases (e.g., sensor networks) also because of the high cost of communication. The cost further increases when the data changes rapidly (due to state changes, node failure, etc.) and computation has to follow these changes. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm which detects when the L2 norm of the average data surpasses a threshold. Then, we use this algorithm as a feedback loop for the monitoring of complex predicates on the data, such as the data’s k-means clustering. The efficiency of the L2 algorithm guarantees that so long as the clustering results represent the data (i.e., the data is stationary) few resources are required. When the data undergoes an epoch change (a change in the underlying distribution) and the model no longer represents it, the feedback loop indicates this and the model is rebuilt. Furthermore, the existence of a feedback loop allows using approximate and “best-effort” methods for constructing the model; if an ill-fit model is built the feedback loop would indicate so, and the model would be rebuilt.
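    To make the monitored predicate concrete, the sketch below evaluates it centrally: it averages the peers' data vectors and tests whether the L2 norm of that average exceeds a threshold. This is only the condition being monitored, not the local algorithm from the paper, whose purpose is to decide the same condition without gathering all vectors in one place. The function name and the list-of-lists input format are assumptions made for illustration.

```python
from math import sqrt


def l2_norm_of_average_exceeds(peer_vectors: list[list[float]], threshold: float) -> bool:
    """Centralized reference check: is ||mean of peer vectors||_2 > threshold?"""
    n = len(peer_vectors)
    dim = len(peer_vectors[0])
    # Component-wise average over all peers.
    average = [sum(v[i] for v in peer_vectors) / n for i in range(dim)]
    norm = sqrt(sum(x * x for x in average))
    return norm > threshold
```

    A centralized check like this needs every peer's vector on each evaluation, which is the communication cost the description says a local algorithm is meant to avoid.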

  15. Data from: Clinical significance and biological function of transcriptional...

    • datasetcatalog.nlm.nih.gov
    • tandf.figshare.com
    Updated Oct 12, 2020
    Cite
    Yang, Lihua; Liang, Liang; Ma, Jie; Lu, Huiping; Huang, Menglan; Qin, Xingan; Dang, Yiwu; Chen, Gang; Lv, Zili; Huang, Zhiguang; Wu, Hong (2020). Clinical significance and biological function of transcriptional repressor GATA binding 1 in gastric cancer: a study based on data mining, RT-qPCR, immunochemistry, and vitro experiment [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000515418
    Dataset updated
    Oct 12, 2020
    Authors
    Yang, Lihua; Liang, Liang; Ma, Jie; Lu, Huiping; Huang, Menglan; Qin, Xingan; Dang, Yiwu; Chen, Gang; Lv, Zili; Huang, Zhiguang; Wu, Hong
    Description

    Transcriptional repressor GATA binding 1 (TRPS1) is a newly discovered transcription factor, which has been reported in many tumors, except for gastric cancer (GC). In this study, we aimed to explore the clinical significance and biological function of TRPS1 in GC. TRPS1 expression in GC and its relationship with clinicopathological features were analyzed based on public databases, and verified by immunohistochemistry and RT-qPCR. Kaplan-Meier survival curves and Cox regression models were used to estimate the influence of TRPS1 on the univariate prognosis and multivariate survival risk factors of GC. The effects of TRPS1 on malignant biological behaviors of GC cells were studied by CCK8 cell proliferation, scratch test, and Transwell assay. The function of TRPS1 was further analyzed by signaling pathway analysis. TRPS1 mRNA expression in GC tissues was up-regulated and was of great significance in some prognostic factors. Protein expression of TRPS1 in tumor tissues was significantly higher than that in paracancerous tissues. Over-expression of TRPS1 was a poor prognostic indicator for GC patients. TRPS1 knockdown could inhibit the proliferation, migration, and invasion of GC cells. The important role of TRPS1 was in the extracellular matrix, and it was involved in actin binding and proteoglycan in cancer. The hub genes of TRPS1 (FN1, ITGB1) were defined. TRPS1 may be a tumor promoter and promote the development of GC by influencing the malignant biological behaviors of GC. TRPS1 is expected to be a key diagnostic and prognostic indicator for GC patients.

  16. Global Analytics Business Intelligence Platform Market Research Report: By...

    • wiseguyreports.com
    Updated Sep 15, 2025
    Cite
    (2025). Global Analytics Business Intelligence Platform Market Research Report: By Deployment Mode (Cloud-based, On-premises, Hybrid), By Functionality (Reporting, Data Mining, Online Analytical Processing, Dashboard, Data Visualization), By End User (BFSI, Healthcare, Retail, Manufacturing, Telecommunications), By Organization Size (Small Enterprises, Medium Enterprises, Large Enterprises) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/analytics-business-intelligence-platform-market
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 26.7 (USD Billion)
    MARKET SIZE 2025: 28.0 (USD Billion)
    MARKET SIZE 2035: 45.0 (USD Billion)
    SEGMENTS COVERED: Deployment Mode, Functionality, End User, Organization Size, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: growing demand for data visualization, increasing need for real-time analytics, rise in cloud-based solutions, emphasis on data-driven decision making, integration of AI and machine learning
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Sisense, IBM, Domo, BOARD International, Oracle, MicroStrategy, Infor, ThoughtSpot, SAP, Looker, Microsoft, Tableau Software, TIBCO Software, SAS Institute, Alteryx, Qlik, Zoho Corporation
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: Cloud-based analytics solutions expansion, Integration with IoT technologies, Demand for real-time data insights, Adoption of AI-driven analytics, Growth in mobile BI applications
    COMPOUND ANNUAL GROWTH RATE (CAGR): 4.9% (2025 - 2035)
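
    As a consistency check on the figures listed for this report, the growth rate can be recomputed from the 2025 and 2035 market sizes. The calculation assumes the usual compound-growth formula with ten compounding years between 2025 and 2035; the listing itself does not state how the rate was derived.

```latex
\mathrm{CAGR}
  = \left(\frac{V_{2035}}{V_{2025}}\right)^{1/10} - 1
  = \left(\frac{45.0}{28.0}\right)^{1/10} - 1
  \approx 1.0486 - 1
  \approx 4.9\%
```

    Under that assumption, the stated 4.9% agrees with the two market-size values.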
  17. Data Mining for IVHM using Sparse Binary Ensembles, Phase I

    • data.nasa.gov
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Data Mining for IVHM using Sparse Binary Ensembles, Phase I [Dataset]. https://data.nasa.gov/dataset/Data-Mining-for-IVHM-using-Sparse-Binary-Ensembles/qfus-evzq
    Available download formats: xml, tsv, csv, application/rssxml, application/rdfxml, json
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works (https://www.usa.gov/government-works)
    License information was derived automatically

    Description

    In response to NASA SBIR topic A1.05, "Data Mining for Integrated Vehicle Health Management", Michigan Aerospace Corporation (MAC) asserts that our unique SPADE (Sparse Processing Applied to Data Exploitation) technology meets a significant fraction of the stated criteria and has functionality that enables it to handle many applications within the aircraft lifecycle. SPADE distills input data into highly quantized features and uses MAC's novel techniques for constructing Ensembles of Decision Trees to develop extremely accurate diagnostic/prognostic models for classification, regression, clustering, anomaly detection and semi-supervised learning tasks. These techniques are currently being employed to do Threat Assessment for satellites in conjunction with researchers at the Air Force Research Lab. Significant advantages to this approach include: 1) completely data driven; 2) training and evaluation are faster than conventional methods; 3) operates effectively on huge datasets (> billion samples x > million features); 4) proven to be as accurate as state-of-the-art techniques in many significant real-world applications. The specific goals for Phase 1 will be to work with domain experts at NASA and with our partners Boeing, SpaceX and GMV Space Systems to delineate a subset of problems that are particularly well-suited to this approach and to determine requirements for deploying algorithms on platforms of opportunity.

  18. Examining the Capacity of Text Mining and Software Metrics in Vulnerability...

    • data.europa.eu
    unknown
    Updated Sep 21, 2023
    Cite
    Zenodo (2023). Examining the Capacity of Text Mining and Software Metrics in Vulnerability Prediction [dataset] [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-8369963?locale=fr
    Available download formats: unknown (79359120)
    Dataset updated
    Sep 21, 2023
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains the extension of a publicly available dataset that was published initially by Ferenc et al. in their paper: “Ferenc, R.; Hegedus, P.; Gyimesi, P.; Antal, G.; Bán, D.; Gyimóthy, T. Challenging machine learning algorithms in predicting vulnerable javascript functions. 2019 IEEE/ACM 7th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE). IEEE, 2019, pp. 8–14.” The dataset contained software metrics for source code functions written in the JavaScript (JS) programming language. Each function was labeled as vulnerable or clean. The authors gathered vulnerabilities from publicly available vulnerability databases. In our paper entitled “Examining the Capacity of Text Mining and Software Metrics in Vulnerability Prediction” and cited as “Kalouptsoglou I, Siavvas M, Kehagias D, Chatzigeorgiou A, Ampatzoglou A. Examining the Capacity of Text Mining and Software Metrics in Vulnerability Prediction. Entropy. 2022; 24(5):651. https://doi.org/10.3390/e24050651”, we presented an extended version of the dataset by extracting textual features for the labeled JS functions. In particular, we obtained the dataset provided by Ferenc et al. in CSV format and then gathered all the GitHub URLs of the dataset's functions (i.e., methods). Using these URLs, we collected the source code of the corresponding JS files from GitHub. Subsequently, by utilizing the start and end line information for every function, we cut out the code of the functions. Each function was then tokenized to construct a list of tokens per function. To extract text features, we used a text mining technique called sequences of tokens. As a result, we created a repository with all methods' source code, the token sequences of each method, and their labels. To boost the generalizability of type-specific tokens, all comments were eliminated, and all integers and strings were replaced with two unique IDs.

    The dataset contains 12,106 JavaScript functions, of which 1,493 are considered vulnerable. This dataset was created and utilized during the Vulnerability Prediction Task of the Horizon2020 IoTAC Project as training and evaluation data for the construction of vulnerability prediction models. The dataset is provided in the csv format. Each row of the csv file has the following parts:
    Label: Flag with values ‘1’ for vulnerable and ‘0’ for non-vulnerable methods
    Name: The name of the JavaScript method
    Longname: The longname of the JavaScript method
    Path: The path of the file of the method in the repository
    Full_repo_path: The GitHub URL of the file of the method
    TokenX: Each next row corresponds to each token included in the method
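
    The preprocessing described above (dropping comments and replacing integers and strings with two unique IDs before building token sequences) can be sketched as follows. This is an illustrative approximation, not the authors' tooling: the regular-expression tokenizer is a simplification that ignores JavaScript regex literals and template strings, and the placeholder names `<STR>` and `<NUM>` are hypothetical, since the description does not say which IDs were used.

```python
import re

# Hypothetical placeholder IDs; the description only says "two unique IDs".
STR_ID = "<STR>"
NUM_ID = "<NUM>"

# Alternatives are tried in order, so comments and strings win over punctuation.
TOKEN_RE = re.compile(
    r"(?P<comment>//[^\n]*|/\*.*?\*/)"
    r"|(?P<string>\"(?:\\.|[^\"\\])*\"|'(?:\\.|[^'\\])*')"
    r"|(?P<number>\d+)"
    r"|(?P<name>[A-Za-z_$][A-Za-z0-9_$]*)"
    r"|(?P<punct>[^\sA-Za-z0-9_$])",
    re.DOTALL,
)


def tokenize_js(source: str) -> list[str]:
    """Return the token sequence of a JS snippet with literals abstracted away."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind == "comment":
            continue  # comments are eliminated entirely
        if kind == "string":
            tokens.append(STR_ID)
        elif kind == "number":
            tokens.append(NUM_ID)
        else:
            tokens.append(match.group())  # identifiers, keywords, punctuation
    return tokens
```

    For example, `tokenize_js('var x = 42; // answer\nlog("hi");')` yields `['var', 'x', '=', '<NUM>', ';', 'log', '(', '<STR>', ')', ';']`.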

  19. Online Analytical Processing Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Aug 12, 2025
    Cite
    Data Insights Market (2025). Online Analytical Processing Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/online-analytical-processing-tools-1130556
    Available download formats: pdf, ppt, doc
    Dataset updated
    Aug 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Online Analytical Processing (OLAP) tools market is experiencing robust growth, driven by the increasing need for businesses to derive actionable insights from large and complex datasets. The market's expansion is fueled by the widespread adoption of cloud-based solutions, offering scalability and cost-effectiveness compared to on-premise deployments. Furthermore, the rising demand for real-time business intelligence (BI) and advanced analytics capabilities is pushing organizations to invest in sophisticated OLAP tools that enable faster decision-making. Key trends include the integration of artificial intelligence (AI) and machine learning (ML) algorithms into OLAP platforms to automate data analysis and generate predictive insights. The growing adoption of self-service BI tools is also empowering business users to access and analyze data independently, reducing reliance on IT departments. While data security and integration complexities pose challenges, the overall market outlook remains positive, with a projected Compound Annual Growth Rate (CAGR) of approximately 15% from 2025 to 2033. This growth is expected across various segments, including cloud-based OLAP, on-premise OLAP, and industry-specific solutions. The competitive landscape is characterized by a mix of established players like IBM and Infor, and agile emerging vendors such as AnswerDock and ClicData. The success of these vendors hinges on their ability to deliver innovative solutions that meet the evolving needs of businesses. This includes offering user-friendly interfaces, robust data visualization capabilities, and seamless integration with existing enterprise systems. The market is segmented by deployment type (cloud, on-premise), industry (finance, healthcare, retail), and functionality (reporting, data mining, forecasting). 
North America currently holds a significant market share, followed by Europe and Asia-Pacific, but growth is expected to be strong across all regions as businesses globally embrace data-driven decision-making. The continued focus on enhancing data security and improving data governance will be crucial for sustaining the market’s positive trajectory.

  20. Data from: Privacy Preserving Outlier Detection through Random Nonlinear...

    • catalog.data.gov
    • data.amerigeoss.org
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). Privacy Preserving Outlier Detection through Random Nonlinear Data Distortion [Dataset]. https://catalog.data.gov/dataset/privacy-preserving-outlier-detection-through-random-nonlinear-data-distortion
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Consider a scenario in which the data owner has some private/sensitive data and wants a data miner to access it for studying important patterns without revealing the sensitive information. Privacy preserving data mining aims to solve this problem by randomly transforming the data prior to its release to data miners. Previous work only considered the case of linear data perturbations — additive, multiplicative or a combination of both for studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformation and show how it can be useful for privacy preserving anomaly detection from sensitive datasets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is to allow a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that for specific cases it is distance preserving. A main contribution of this paper is the discussion between the invertibility of a transformation and privacy preservation and the application of these techniques to outlier detection. Experiments conducted on real-life datasets demonstrate the effectiveness of the approach.
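
    As a rough illustration of the kind of transformation the description mentions, the sketch below passes each record through a random linear map followed by a sigmoid before release. This is one simple instance chosen for illustration and is an assumption, not the transformation defined in the paper; the function names, the Gaussian weights, and the `seed` parameter are all hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def random_sigmoid_distortion(records: list[list[float]], out_dim: int, seed: int = 0) -> list[list[float]]:
    """Map each record x to sigmoid(W x), with W a random out_dim x in_dim matrix."""
    rng = random.Random(seed)  # the seed plays the role of the owner's secret
    in_dim = len(records[0])
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    return [
        [sigmoid(sum(w * x for w, x in zip(row, record))) for row in weights]
        for record in records
    ]
```

    The data owner would release only the distorted records; the accuracy and privacy properties analyzed in the paper do not follow from this sketch.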

