100+ datasets found
  1. d

    Distributed Data Mining in Peer-to-Peer Networks

    • catalog.data.gov
    Updated Apr 11, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Distributed Data Mining in Peer-to-Peer Networks [Dataset]. https://catalog.data.gov/dataset/distributed-data-mining-in-peer-to-peer-networks
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact,well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data,computing nodes,and users. This article offers an overview of DDM applications and algorithms for P2P environments,focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.

  2. Distributed Data Mining in Peer-to-Peer Networks - Dataset - NASA Open Data...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Mar 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). Distributed Data Mining in Peer-to-Peer Networks - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/distributed-data-mining-in-peer-to-peer-networks
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact,well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data,computing nodes,and users. This article offers an overview of DDM applications and algorithms for P2P environments,focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.

  3. d

    Data from: Peer-to-Peer Data Mining, Privacy Issues, and Games

    • catalog.data.gov
    • data.nasa.gov
    • +2more
    Updated Apr 10, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Peer-to-Peer Data Mining, Privacy Issues, and Games [Dataset]. https://catalog.data.gov/dataset/peer-to-peer-data-mining-privacy-issues-and-games
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    Peer-to-Peer (P2P) networks are gaining increasing popularity in many distributed applications such as file-sharing, network storage, web caching, sear- ching and indexing of relevant documents and P2P network-threat analysis. Many of these applications require scalable analysis of data over a P2P network. This paper starts by offering a brief overview of distributed data mining applications and algorithms for P2P environments. Next it discusses some of the privacy concerns with P2P data mining and points out the problems of existing privacy-preserving multi-party data mining techniques. It further points out that most of the nice assumptions of these existing privacy preserving techniques fall apart in real-life applications of privacy-preserving distributed data mining (PPDM). The paper offers a more realistic formulation of the PPDM problem as a multi-party game and points out some recent results.

  4. Enron Dynamic Network

    • figshare.com
    txt
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    günce keziban orman; Vincent Labatut (2023). Enron Dynamic Network [Dataset]. http://doi.org/10.6084/m9.figshare.3203410.v2
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    günce keziban orman; Vincent Labatut
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Enron is a well-known dataset in network science and text mining. It has been widely studied in academia. In network science, several different static networks appear in the literature. However, up to now, no dynamic network has been published, even though the email conversations have timestamps.We processed the original dataset to extract a dynamic network, In original dataset, It contains 158 nodes representing Enron employees between 1997 and 2002. All the addresses in the From and To fields of each email are considered, resulting in a network of 28802 nodes representing a distinct email addresses. A time span of one month is chosen for the time slices, generating 46 time slices. Two nodes are connected if the corresponding persons emailed each other during the given time slice. We did not make any distinction between sender and receiver, and thus produced an undirected dynamic network.

  5. Multi-objective optimization based privacy preserving distributed data...

    • data.nasa.gov
    Updated Mar 31, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/multi-objective-optimization-based-privacy-preserving-distributed-data-mining-in-peer-to-p
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    This paper proposes a scalable, local privacy preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions and it is highly scalable. It particularly deals with the distributed computation of the sum of a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation.

  6. d

    Data from: Multi-objective optimization based privacy preserving distributed...

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    • +1more
    Updated Apr 9, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Multi-objective optimization based privacy preserving distributed data mining in Peer-to-Peer networks [Dataset]. https://catalog.data.gov/dataset/multi-objective-optimization-based-privacy-preserving-distributed-data-mining-in-peer-to-p
    Explore at:
    Dataset updated
    Apr 9, 2025
    Dataset provided by
    Dashlink
    Description

    This paper proposes a scalable, local privacy preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions and it is highly scalable. It particularly deals with the distributed computation of the sum of a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation.

  7. D

    Data Mining Tools Market Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Feb 3, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Market Research Forecast (2025). Data Mining Tools Market Report [Dataset]. https://www.marketresearchforecast.com/reports/data-mining-tools-market-1722
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Feb 3, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policyhttps://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Mining Tools Market size was valued at USD 1.01 USD billion in 2023 and is projected to reach USD 1.99 USD billion by 2032, exhibiting a CAGR of 10.2 % during the forecast period. The growing adoption of data-driven decision-making and the increasing need for business intelligence are major factors driving market growth. Data mining refers to filtering, sorting, and classifying data from larger datasets to reveal subtle patterns and relationships, which helps enterprises identify and solve complex business problems through data analysis. Data mining software tools and techniques allow organizations to foresee future market trends and make business-critical decisions at crucial times. Data mining is an essential component of data science that employs advanced data analytics to derive insightful information from large volumes of data. Businesses rely heavily on data mining to undertake analytics initiatives in the organizational setup. The analyzed data sourced from data mining is used for varied analytics and business intelligence (BI) applications, which consider real-time data analysis along with some historical pieces of information. Recent developments include: May 2023 – WiMi Hologram Cloud Inc. introduced a new data interaction system developed by combining neural network technology and data mining. Using real-time interaction, the system can offer reliable and safe information transmission., May 2023 – U.S. Data Mining Group, Inc., operating in bitcoin mining site, announced a hosting contract to deploy 150,000 bitcoins in partnership with major companies such as TeslaWatt, Sphere 3D, Marathon Digital, and more. The company is offering industry turn-key solutions for curtailment, accounting, and customer relations., April 2023 – Artificial intelligence and single-cell biotech analytics firm, One Biosciences, launched a single cell data mining algorithm called ‘MAYA’. The algorithm is for cancer patients to detect therapeutic vulnerabilities., May 2022 – Europe-based Solarisbank, a banking-as-a-service provider, announced its partnership with Snowflake to boost its cloud data strategy. Using the advanced cloud infrastructure, the company can enhance data mining efficiency and strengthen its banking position.. Key drivers for this market are: Increasing Focus on Customer Satisfaction to Drive Market Growth. Potential restraints include: Requirement of Skilled Technical Resources Likely to Hamper Market Growth. Notable trends are: Incorporation of Data Mining and Machine Learning Solutions to Propel Market Growth.

  8. Local L2 Thresholding Based Data Mining in Peer-to-Peer Systems - Dataset -...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Mar 31, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    nasa.gov (2025). Local L2 Thresholding Based Data Mining in Peer-to-Peer Systems - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/local-l2-thresholding-based-data-mining-in-peer-to-peer-systems
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASAhttp://nasa.gov/
    Description

    In a large network of computers, wireless sensors, or mobile devices, each of the components (hence, peers) has some data about the global status of the system. Many of the functions of the system, such as routing decisions, search strategies, data cleansing, and the assignment of mutual trust, depend on the global status. Therefore, it is essential that the system be able to detect, and react to, changes in its global status. Computing global predicates in such systems is usually very costly. Mainly because of their scale, and in some cases (e.g., sensor networks) also because of the high cost of communication. The cost further increases when the data changes rapidly (due to state changes, node failure, etc.) and computation has to follow these changes. In this paper we describe a two step approach for dealing with these costs. First, we describe a highly efficient local algorithm which detect when the L2 norm of the average data surpasses a threshold. Then, we use this algorithm as a feedback loop for the monitoring of complex predicates on the data – such as the data’s k-means clustering. The efficiency of the L2 algorithm guarantees that so long as the clustering results represent the data (i.e., the data is stationary) few resources are required. When the data undergoes an epoch change – a change in the underlying distribution – and the model no longer represents it, the feedback loop indicates this and the model is rebuilt. Furthermore, the existence of a feedback loop allows using approximate and “best-effort ” methods for constructing the model; if an ill-fit model is built the feedback loop would indicate so, and the model would be rebuilt.

  9. MOESM1 of Integrating text mining, data mining, and network analysis for...

    • springernature.figshare.com
    txt
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gabriela Jurca; Omar Addam; Alper Aksac; Shang Gao; Tansel Ă–zyer; Douglas Demetrick; Reda Alhajj (2023). MOESM1 of Integrating text mining, data mining, and network analysis for identifying genetic breast cancer trends [Dataset]. http://doi.org/10.6084/m9.figshare.c.3642398_D1.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Gabriela Jurca; Omar Addam; Alper Aksac; Shang Gao; Tansel Ă–zyer; Douglas Demetrick; Reda Alhajj
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Additional file 1. Additional Tables.

  10. f

    Visual Data Mining of Biological Networks: One Size Does Not Fit All

    • plos.figshare.com
    xml
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Chiara Pastrello; David Otasek; Kristen Fortney; Giuseppe Agapito; Mario Cannataro; Elize Shirdel; Igor Jurisica (2023). Visual Data Mining of Biological Networks: One Size Does Not Fit All [Dataset]. http://doi.org/10.1371/journal.pcbi.1002833
    Explore at:
    xmlAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS Computational Biology
    Authors
    Chiara Pastrello; David Otasek; Kristen Fortney; Giuseppe Agapito; Mario Cannataro; Elize Shirdel; Igor Jurisica
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    High-throughput technologies produce massive amounts of data. However, individual methods yield data specific to the technique used and biological setup. The integration of such diverse data is necessary for the qualitative analysis of information relevant to hypotheses or discoveries. It is often useful to integrate these datasets using pathways and protein interaction networks to get a broader view of the experiment. The resulting network needs to be able to focus on either the large-scale picture or on the more detailed small-scale subsets, depending on the research question and goals. In this tutorial, we illustrate a workflow useful to integrate, analyze, and visualize data from different sources, and highlight important features of tools to support such analyses.

  11. m

    Arab Computational Propaganda on X (Twitter)

    • data.mendeley.com
    Updated Oct 2, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Bodor Almotairy (2023). Arab Computational Propaganda on X (Twitter) [Dataset]. http://doi.org/10.17632/58mttpbc7x.3
    Explore at:
    Dataset updated
    Oct 2, 2023
    Authors
    Bodor Almotairy
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The database includes three datasets. All of them were extracted from a dataset published by X (Twitter Transparency Websites) that includes tweets from malicious accounts trying to manipulate public opinion in the Kingdom of Saudi Arabia. Although the propagandist tweets were published by malicious accounts, as X (Twitter) stated, the tweets at their level were not classified as propaganda or not. Propagandists usually mix propaganda and non-propaganda tweets in an attempt to hide their identities. Therefore, it was necessary to classify their tweets as propaganda or not, based on the propaganda technique used. Since the datasets are very large, we annotated a sample of 2,100 tweets. The datasets are made up of 16,355,558 tweets from propagandist users focused on sports and banking topics.

  12. Additional file 1 of Novel methods of qualitative analysis for health policy...

    • springernature.figshare.com
    txt
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mireya Martínez-García; Maite Vallejo; Enrique Hernández-Lemus; Jorge Alberto Álvarez-Díaz (2023). Additional file 1 of Novel methods of qualitative analysis for health policy research [Dataset]. http://doi.org/10.6084/m9.figshare.7587416.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    figshare
    Authors
    Mireya Martínez-García; Maite Vallejo; Enrique Hernández-Lemus; Jorge Alberto Álvarez-Díaz
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Interactive network files. Interactive network files with all statistical and topological analyses. This is a Cytoscape.cys session. In order to open/view/modify this file please use the freely available Cytoscape software platform, available at http://www.cytoscape.org/download.php . (SIF 3413 kb)

  13. Knowledge Graph: tyrolean mining documents 15th and 16th century

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin
    Updated Sep 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gerald Hiebel; Gerald Hiebel; Elisabeth Gruber-Tokić; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Milena Peralta Friedburg; Brigit Danthine; Brigit Danthine (2024). Knowledge Graph: tyrolean mining documents 15th and 16th century [Dataset]. http://doi.org/10.5281/zenodo.6276586
    Explore at:
    binAvailable download formats
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Gerald Hiebel; Gerald Hiebel; Elisabeth Gruber-Tokić; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Milena Peralta Friedburg; Brigit Danthine; Brigit Danthine
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a Knowledge Graph (.nq file) of two historical mining documents: “Verleihbuch der Rattenberger Bergrichter” ( Hs. 37, 1460-1463) and “Schwazer Berglehenbuch” (Hs. 1587, approx. 1515) stored by the Tyrolean Regional Archive, Innsbruck (Austria). The user of the KG may explore the montanistic network and relations between people, claims and mines in the late medieval Tyrol. The core regions concern the districts Schwaz and Kufstein (Tyrol, Austria).

    The ontology used to represent the claims is CIDOC CRM, an ISO certified ontology for Cultural Heritage documentation. Supported by the Karma tool the KG is generated as RDF (Resource Description Framework). The generated RDF data is imported into a Triplestore, in this case GraphDB, and then displayed visually. This puts the data from the early mining texts into a semantically structured context and makes the mutual relationships between people, places and mines visible.

    Both documents and the Knowledge Graph were processed and generated by the research team of the project “Text Mining Medieval Mining Texts”. The research project (2019-2022) was carried out at the university of Innsbruck and funded by go!digital next generation programme of the Austrian Academy of Sciences.

    Citeable Transcripts of the historical documents are online available:
    Hs. 37 DOI: 10.5281/zenodo.6274562
    Hs. 1587 DOI: 10.5281/zenodo.6274928

  14. S

    Data from: AMiner: Search and Mining of Academic Social Networks

    • scidb.cn
    Updated Oct 15, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang (2020). AMiner: Search and Mining of Academic Social Networks [Dataset]. http://doi.org/10.11922/sciencedb.j00104.00021
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 15, 2020
    Dataset provided by
    Science Data Bank
    Authors
    Huaiyu Wan; Yutao Zhang; Jing Zhang; Jie Tang
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    8 figures of the paper. Figure 1 presents the architecture of AMiner. Figure 2 shows the schema of the researcher profile. Figure 3 is an example of researcher profile. Figure 4 is an overview of the name disambiguation framework in AMiner. Figure 5 is graphical representation of the three Author-Conference-Topic (ACT) models. Figure 6 shows an example result of experts found for “Data Mining”. Figure 7 is a model framework of DeepInf. Figure 8 shows an example of researcher ranking by sociability index.

  15. f

    Average number of communities in the network of sectors and average minimum...

    • plos.figshare.com
    xls
    Updated Jun 1, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Gérald Gurtner; Stefania Vitali; Marco Cipolla; Fabrizio Lillo; Rosario Nunzio Mantegna; Salvatore Miccichè; Simone Pozzi (2023). Average number of communities in the network of sectors and average minimum number of communities containing 90% of the nodes. [Dataset]. http://doi.org/10.1371/journal.pone.0094414.t003
    Explore at:
    xlsAvailable download formats
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Gérald Gurtner; Stefania Vitali; Marco Cipolla; Fabrizio Lillo; Rosario Nunzio Mantegna; Salvatore Miccichè; Simone Pozzi
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The top part refers to existing partitions and the bottom part to the unsupervised partitions. Results were obtained averaging over 28 days, and the errors bars are standard deviations. The number of ACCs is not stable because some of them are not used during the period.

  16. A pre-trained sound event detection neural network

    • figshare.com
    • search.datacite.org
    bin
    Updated Jun 2, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ian McLoughlin (2023). A pre-trained sound event detection neural network [Dataset]. http://doi.org/10.6084/m9.figshare.5245789.v1
    Explore at:
    binAvailable download formats
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    figshare
    Figsharehttp://figshare.com/
    Authors
    Ian McLoughlin
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a network trained earlier on clean data. It does not give very good results but is enough to show the system working. It can achieve above 90% on clean sounds and probably about 80% accuracy in 0dB SNR.The network was saved manually (using the MATLAB 'save' command) after running the training code, and before running the testing code.

  17. d

    Data from: Local L2 Thresholding Based Data Mining in Peer-to-Peer Systems

    • catalog.data.gov
    • datasets.ai
    • +1more
    Updated Apr 10, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Dashlink (2025). Local L2 Thresholding Based Data Mining in Peer-to-Peer Systems [Dataset]. https://catalog.data.gov/dataset/local-l2-thresholding-based-data-mining-in-peer-to-peer-systems
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    In a large network of computers, wireless sensors, or mobile devices, each of the components (hence, peers) has some data about the global status of the system. Many of the functions of the system, such as routing decisions, search strategies, data cleansing, and the assignment of mutual trust, depend on the global status. Therefore, it is essential that the system be able to detect, and react to, changes in its global status. Computing global predicates in such systems is usually very costly. Mainly because of their scale, and in some cases (e.g., sensor networks) also because of the high cost of communication. The cost further increases when the data changes rapidly (due to state changes, node failure, etc.) and computation has to follow these changes. In this paper we describe a two step approach for dealing with these costs. First, we describe a highly efficient local algorithm which detect when the L2 norm of the average data surpasses a threshold. Then, we use this algorithm as a feedback loop for the monitoring of complex predicates on the data – such as the data’s k-means clustering. The efficiency of the L2 algorithm guarantees that so long as the clustering results represent the data (i.e., the data is stationary) few resources are required. When the data undergoes an epoch change – a change in the underlying distribution – and the model no longer represents it, the feedback loop indicates this and the model is rebuilt. Furthermore, the existence of a feedback loop allows using approximate and “best-effort ” methods for constructing the model; if an ill-fit model is built the feedback loop would indicate so, and the model would be rebuilt.

  18. m

    Data from: Exploring Small-World Network with an Elite-Clique: Bringing...

    • data.mendeley.com
    Updated Jan 4, 2019
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jar-der Luo (2019). Exploring Small-World Network with an Elite-Clique: Bringing Embeddedness Theory into the Dynamic Evolution of a Venture Capital Network [Dataset]. http://doi.org/10.17632/yy8nyv3d6g.1
    Explore at:
    Dataset updated
    Jan 4, 2019
    Authors
    Jar-der Luo
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Inside the code is an algorithmic framework for modeling the dynamic process of how the Venture Capital network come into being. Usage: To simulate this model, run Launcher.m with Parameter ClassANumber = 75 (original VC number); ClassBNumber = 375(original firm number); step = 14 (iteration times); Count = 10 ( times during each iteration).

  19. m

    Data for: Corrosive Sulphur effect in power and distribution transformers...

    • data.mendeley.com
    Updated Dec 18, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ricardo Arias (2018). Data for: Corrosive Sulphur effect in power and distribution transformers failures and treatments [Dataset]. http://doi.org/10.17632/6ng342f32t.1
    Explore at:
    Dataset updated
    Dec 18, 2018
    Authors
    Ricardo Arias
    License

    Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
    License information was derived automatically

    Description

    Parameters for case study: Corrosive Sulphur in power and distribution transformers

  20. R

    Replication Data for: Detecting Rocks in Challenging Mining Environments...

    • datos.uchile.cl
    zip
    Updated Apr 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Patricio Loncomilla; Samtani, Pavan; Javier Ruiz-del-Solar; Patricio Loncomilla; Samtani, Pavan; Javier Ruiz-del-Solar (2023). Replication Data for: Detecting Rocks in Challenging Mining Environments using Convolutional Neural Networks and Ellipses as an alternative to Bounding Boxes [Dataset]. http://doi.org/10.34691/FK2/1GQBHK
    Explore at:
    zip(372432555), zip(86831735)Available download formats
    Dataset updated
    Apr 12, 2023
    Dataset provided by
    Repositorio de datos de investigación de la Universidad de Chile
    Authors
    Patricio Loncomilla; Samtani, Pavan; Javier Ruiz-del-Solar; Patricio Loncomilla; Samtani, Pavan; Javier Ruiz-del-Solar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The automation of heavy-duty machinery and vehicles used in underground mines is a growing tendency which requires addressing several challenges, such as the robust detection of rocks in the production areas of mines. For instance, human assistance must be requested when using autonomous LHD (Load-Haul-Dump) loaders in case rocks are too big to be loaded into the bucket. Also, in the case of autonomous rock breaking hammers, oversized rocks need to be identified and located, to then be broken in smaller sections. In this work, a novel approach called Rocky-CenterNet is proposed for detecting rocks. Unlike other object detectors, Rocky-CenterNet uses ellipses to enclose a rock’s bounds, enabling a better description of the shape of the rocks than the classical approach based on bounding boxes. The performance of Rocky-CenterNet is compared with the one of CenterNet and Mask R-CNN, which use bounding boxes and segmentation masks, respectively. The comparisons were performed on two datasets: the Hammer-Rocks dataset (introduced in this work) and the Scaled Front View dataset. The Hammer-Rocks dataset was captured in an underground ore pass, while a rock-breaking hammer was operating. This dataset includes challenging conditions such as the presence of dust in the air and occluded rocks. The metrics considered are related to the quality of the detections and the processing times involved. From the results, it is shown that ellipses provide a better approximation of the rocks shapes’ than bounding boxes. Moreover, when rocks are annotated using ellipses, Rocky-CenterNet offers the best performance while requiring shorter processing times than Mask-RCNN (4x faster). Thus, using ellipses to describe rocks is a reliable alternative. Both the datasets and the code are available for research purposes.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Dashlink (2025). Distributed Data Mining in Peer-to-Peer Networks [Dataset]. https://catalog.data.gov/dataset/distributed-data-mining-in-peer-to-peer-networks

Distributed Data Mining in Peer-to-Peer Networks

Explore at:
Dataset updated
Apr 11, 2025
Dataset provided by
Dashlink
Description

Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact,well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with distributed data,computing nodes,and users. This article offers an overview of DDM applications and algorithms for P2P environments,focusing particularly on local algorithms that perform data analysis by using computing primitives with limited communication overhead. The authors describe both exact and approximate local P2P data mining algorithms that work in a decentralized and communication-efficient manner.

Search
Clear search
Close search
Google apps
Main menu