7 datasets found
  1. Data from: A Generic Local Algorithm for Mining Data Streams in Large...

    • catalog.data.gov
    • datasets.ai
    • +3 more
    Updated Apr 10, 2025
    Cite
    Dashlink (2025). A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems [Dataset]. https://catalog.data.gov/dataset/a-generic-local-algorithm-for-mining-data-streams-in-large-distributed-systems
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dashlink
    Description

    In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality, such as message routing, information retrieval and load sharing, relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the model of the system. Since the state of the system is constantly changing, it is necessary to keep the models up to date. Computing global data mining models (e.g., decision trees or k-means clustering) in large distributed systems may be very costly because of the scale of the system and the potentially high communication cost. The cost increases further in a dynamic scenario in which the data change rapidly. In this paper we describe a two-step approach for dealing with these costs. First, we describe a highly efficient local algorithm that can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for monitoring complex functions of the data, such as its k-means clustering. The theoretical claims are corroborated by a thorough experimental analysis.
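
    The two-step idea in the abstract (a cheap, constantly running local check that triggers an expensive global recomputation) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the per-peer test (compare each peer's local cluster means with the current centroids against a hypothetical threshold `epsilon`), all function names, and the plain Lloyd's k-means used for recomputation are illustrative assumptions, and the peers here do not exchange messages with neighbours.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def assign(points: List[Point], centroids: List[Point]) -> Dict[int, List[Point]]:
    """Group points by the index of their nearest centroid."""
    groups: Dict[int, List[Point]] = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        groups[nearest].append(p)
    return groups


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0) -> List[Point]:
    """Plain Lloyd's k-means over pooled data: the expensive global step (assumed)."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = assign(points, centroids)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else centroids[i] for i, g in groups.items()]
    return centroids


def peer_alerts(local: List[Point], centroids: List[Point], epsilon: float) -> bool:
    """Hypothetical cheap local check: does any local cluster mean drift > epsilon?"""
    for i, group in assign(local, centroids).items():
        if group and dist(mean(group), centroids[i]) > epsilon:
            return True
    return False


def monitor(peers: List[List[Point]], centroids: List[Point], k: int,
            epsilon: float) -> Tuple[List[Point], bool]:
    """Feedback loop: rebuild the global model only when some peer raises an alert."""
    if any(peer_alerts(local, centroids, epsilon) for local in peers):
        pooled = [p for local in peers for p in local]
        return kmeans(pooled, k), True
    return centroids, False
```

    The trade-off being illustrated is that the inexpensive check runs all the time while k-means runs only on alerts. Letting any single peer trigger recomputation is a deliberately simple, conservative assumption of this sketch.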

  2. Application Research of Clustering on kmeans

    • kaggle.com
    zip
    Updated Feb 27, 2021
    Cite
    ddpr raju (2021). Application Research of Clustering on kmeans [Dataset]. https://www.kaggle.com/ddprraju/tirupati-compus-school
    Available download formats: zip (34,507 bytes)
    Dataset updated
    Feb 27, 2021
    Authors
    ddpr raju
    Description

    Dataset

    This dataset was created by ddpr raju

    Contents

    It contains the following files:

  3. Multidimensional Dataset Of Food Security And Nutrition In Cauca.

    • data.mendeley.com
    Updated Dec 6, 2021
    Cite
    David Santiago Restrepo (2021). Multidimensional Dataset Of Food Security And Nutrition In Cauca. [Dataset]. http://doi.org/10.17632/wsss65c885.1
    Dataset updated
    Dec 6, 2021
    Authors
    David Santiago Restrepo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Cauca
    Description

    This is a multidimensional dataset for the department of Cauca, built from public data sources. It integrates the four FAO food security dimensions: physical availability of food, economic and physical access to food, food utilization, and the sustainability of these three dimensions. It also allows different variables (nutritional, socioeconomic, climatic and sociodemographic, among others) to be analyzed with statistical or temporal-analysis techniques. The dataset can also be used to extract features from satellite images with computer vision techniques, or for multimodal machine learning that combines data of different kinds (images and tabular data).

    The dataset contains the following folders:

    • Multidimensional dataset of Cauca/: tabular data for the municipalities of the department of Cauca. The folder contains the files:
      1. dictionary(English).xlsx: the dictionary of the static variables for each municipality of Cauca, in English.
      2. dictionary(Español): the dictionary of the static variables for each municipality of Cauca, in Spanish.
      3. dictionary(English).xlsx: the dictionary of the static variables for each municipality of Cauca, in English.
      4. MultidimensionalDataset_AllMunicipalities.csv: nutritional, climatic, sociodemographic, socioeconomic and agricultural data for the 42 municipalities of the department of Cauca, with some null values where nutrition surveys lack data for certain municipalities.
    • Satellite Images Popayán/: monthly Landsat 8 satellite images of the municipality of Popayán in Cauca. The folder contains the folders:
      1. RGB/: RGB images of Popayán from April 2013 to December 2020 at a resolution of 15 m/px. Each image is titled year_month.png.
      2. 6 Band Images/: 6-band images of Popayán in TIF format, generated from Landsat 8 bands 1 to 8, from April 2013 to December 2020 at a resolution of 15 m/px. Each image is titled year_month.tif.
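
    The RGB and 6-band image folders are described as covering every month from April 2013 to December 2020, which is 93 months (9 in 2013 plus 12 in each of the seven years 2014 to 2020). A short Python sketch can check a downloaded copy for gaps. It assumes file names of the form `2013_04.png` (year, underscore, month), which is our reading of the year_month naming; the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Set, Tuple

YearMonth = Tuple[int, int]


def expected_months(start: YearMonth = (2013, 4),
                    end: YearMonth = (2020, 12)) -> List[YearMonth]:
    """All (year, month) pairs from start to end inclusive."""
    months: List[YearMonth] = []
    year, month = start
    while (year, month) <= end:
        months.append((year, month))
        # Roll over to January of the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def parse_year_month(name: str) -> YearMonth:
    """Parse an assumed 'YYYY_MM.ext' file name into (year, month)."""
    year, month = Path(name).stem.split("_")
    return int(year), int(month)


def missing_months(filenames: Iterable[str], suffix: str = ".png") -> List[YearMonth]:
    """Months in the expected range that have no file with the given suffix."""
    found: Set[YearMonth] = set()
    for name in filenames:
        if Path(name).suffix.lower() != suffix:
            continue
        try:
            found.add(parse_year_month(name))
        except ValueError:
            # Skip names that do not follow the assumed YYYY_MM pattern.
            continue
    return [ym for ym in expected_months() if ym not in found]
```

    For example, `missing_months(p.name for p in Path("RGB").iterdir())` would list the months with no RGB image, assuming the folder has been downloaded to the working directory; pass `suffix=".tif"` for the 6-band folder.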

  4. Dataset for Erasable Itemset Mining

    • opendata.pku.edu.cn
    Updated Nov 19, 2015
    Cite
    Peking University Open Research Data Platform (2015). Dataset for Erasable Itemset Mining [Dataset]. http://doi.org/10.18170/DVN/ISHFQX
    Available download formats: text/plain; charset=us-ascii (5336007), text/plain; charset=us-ascii (9764947), text/plain; charset=us-ascii (7000387)
    Dataset updated
    Nov 19, 2015
    Dataset provided by
    Peking University Open Research Data Platform
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These three artificial datasets are for mining erasable itemsets. The definition of an erasable itemset is given in the associated reference papers. All three datasets include 200 distinct items, but no profit values are provided for the items; users can generate them as required, for example from a normal or a random distribution.
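
    The definition itself is not reproduced in this listing. In the common formulation from the erasable-itemset literature, each product carries a profit, the gain of an itemset X is the total profit of the products that contain at least one item of X, and X is erasable when its gain is at most a threshold ratio ξ of the total profit. The Python sketch below assumes that formulation, assumes one product per line with whitespace-separated item IDs, and, since the files carry no profits, draws one profit per product (line) from a normal distribution. The file layout, function names and distribution parameters are all assumptions.

```python
import random
from typing import FrozenSet, Iterable, List

Product = FrozenSet[str]


def load_products(lines: Iterable[str]) -> List[Product]:
    """Assumed format: one product per non-empty line, items separated by whitespace."""
    return [frozenset(line.split()) for line in lines if line.strip()]


def random_profits(n: int, mean: float = 100.0, sd: float = 20.0,
                   seed: int = 0) -> List[float]:
    """Draw one profit per product from a normal distribution, clipped at zero."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, sd)) for _ in range(n)]


def gain(itemset: Product, products: List[Product], profits: List[float]) -> float:
    """Total profit of the products that use at least one item of `itemset`."""
    return sum(p for items, p in zip(products, profits) if items & itemset)


def is_erasable(itemset: Product, products: List[Product],
                profits: List[float], xi: float) -> bool:
    """X is erasable if gain(X) <= xi * total profit."""
    return gain(itemset, products, profits) <= xi * sum(profits)


def erasable_items(products: List[Product], profits: List[float],
                   xi: float) -> List[str]:
    """All single items that are erasable on their own, sorted by ID."""
    items = set().union(*products)
    return sorted(i for i in items
                  if is_erasable(frozenset([i]), products, profits, xi))
```

    With a real file, `products = load_products(open(path))` followed by `profits = random_profits(len(products))` would prepare the inputs; larger erasable itemsets would then be grown level by level from `erasable_items`.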

  5. A predictive model for opal exploration in Australia from a data mining...

    • researchdata.edu.au
    Updated May 1, 2015
    Cite
    Thomas Landgrebe; Adriana Dutkiewicz; Dietmar Muller (2015). A predictive model for opal exploration in Australia from a data mining approach [Dataset]. http://doi.org/10.4227/11/5587A86C0FDF1
    Dataset updated
    May 1, 2015
    Dataset provided by
    The University of Sydney
    Authors
    Thomas Landgrebe; Adriana Dutkiewicz; Dietmar Muller
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Dataset funded by
    Australian Research Council
    Description

    This data collection is associated with the publications: Merdith, A. S., Landgrebe, T. C. W., Dutkiewicz, A., & Müller, R. D. (2013). Towards a predictive model for opal exploration using a spatio-temporal data mining approach. Australian Journal of Earth Sciences, 60(2), 217-229. doi: 10.1080/08120099.2012.754793

    and

    Landgrebe, T. C. W., Merdith, A., Dutkiewicz, A., & Müller, R. D. (2013). Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach. Computers & Geosciences, 56(0), 76-82. doi: 10.1016/j.cageo.2013.02.002

    Publication Abstract - Merdith et al. (2013)

    Opal is Australia's national gemstone; however, until recently, most significant opal discoveries were made in the early 1900s, more than 100 years ago. Currently there is no formal exploration model for opal, meaning there are no widely accepted concepts or methodologies to suggest where new opal fields may be found. As a consequence, opal mining in Australia is a cottage industry, with most opal exploration focused around old opal fields. The EarthByte Group has developed a new opal exploration methodology for the Great Artesian Basin. The work is based on applying “big data mining” approaches to data sets relevant for identifying regions that are prospective for opal. The group combined a multitude of geological and geophysical data sets, which were jointly analysed to establish associations between particular features in the data and known opal mining sites. A “training set” of known opal localities (1036 opal mines) was assembled from localities featured in published reports and on maps. The data used include rock types, soil type, regolith type, topography, radiometric data and a stack of digital palaeogeographic maps. The different data layers were analysed via spatio-temporal data mining, combining the GPlates PaleoGIS software (www.gplates.org) with the Orange data mining software (orange.biolab.si), to produce the first opal prospectivity map for the Great Artesian Basin. One of the main results of the study is that the geological conditions favourable for opal were found to be related to a particular sequence of surface environments over geological time: alternating shallow seas and river systems followed by uplift and erosion. The approach reduces the entire area of the Great Artesian Basin to a mere 6% that is deemed prospective for opal exploration. The work is described in two companion papers in the Australian Journal of Earth Sciences and Computers & Geosciences.

    Publication Abstract - Landgrebe et al. (2013)

    Age-coded multi-layered geological datasets are becoming increasingly prevalent with the surge in open-access geodata, yet there are few methodologies for extracting geological information and knowledge from these data. We present a novel methodology, based on the open-source GPlates software in which age-coded digital palaeogeographic maps are used to “data-mine” spatio-temporal patterns related to the occurrence of Australian opal. Our aim is to test the concept that only a particular sequence of depositional/erosional environments may lead to conditions suitable for the formation of gem quality sedimentary opal. Time-varying geographic environment properties are extracted from a digital palaeogeographic dataset of the eastern Australian Great Artesian Basin (GAB) at 1036 opal localities. We obtain a total of 52 independent ordinal sequences sampling 19 time slices from the Early Cretaceous to the present-day. We find that 95% of the known opal deposits are tied to only 27 sequences all comprising fluvial and shallow marine depositional sequences followed by a prolonged phase of erosion. We then map the total area of the GAB that matches these 27 opal-specific sequences, resulting in an opal-prospective region of only about 10% of the total area of the basin. The key patterns underlying this association involve only a small number of key environmental transitions. We demonstrate that these key associations are generally absent at arbitrary locations in the basin. This new methodology allows for the simplification of a complex time-varying geological dataset into a single map view, enabling straightforward application for opal exploration and for future co-assessment with other datasets/geological criteria. This approach may help unravel the poorly understood opal formation process using an empirical spatio-temporal data-mining methodology and readily available datasets to aid hypothesis testing.
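
    The finding that 95% of known opal deposits are tied to 27 of the 52 sequences is, at its core, a coverage calculation over per-locality environment sequences. A rough Python sketch of such a calculation follows. It assumes each locality is represented as a tuple of environment labels (one per time slice) and that sequences are taken greedily from most to least frequent; the greedy ranking, the function name and the toy labels are assumptions, not the authors' GPlates/Orange workflow.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

EnvSequence = Tuple[Hashable, ...]


def covering_sequences(locality_sequences: Sequence[EnvSequence],
                       coverage: float = 0.95) -> Tuple[List[EnvSequence], float]:
    """Fewest distinct sequences whose localities reach the requested coverage.

    Taking sequences in order of decreasing frequency is optimal here: for any
    number of sequences, the most frequent ones cover the most localities.
    """
    total = len(locality_sequences)
    if total == 0:
        return [], 0.0
    chosen: List[EnvSequence] = []
    covered = 0
    for seq, n in Counter(locality_sequences).most_common():
        if covered / total >= coverage:
            break
        chosen.append(seq)
        covered += n
    return chosen, covered / total
```

    With toy labels such as "M" (shallow marine), "F" (fluvial) and "E" (erosion), six localities with ("M", "F", "E"), three with ("F", "M", "E") and one with ("F", "F", "F") need two sequences to reach 90% coverage and all three to reach 95%.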

    Authors and Institutions

    Andrew Merdith - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-7564-8149

    Thomas Landgrebe - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia

    Adriana Dutkiewicz - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia

    R. Dietmar Müller - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-3334-5764

    Overview of Resources Contained

    This collection contains geological data from Australia used for data mining in the publications Merdith et al. (2013) and Landgrebe et al. (2013). The resulting maps of opal prospectivity are also included.

    List of Resources

    Note: For details on the files included in this data collection, see “Description_of_Resources.txt”.

    Note: For information on file formats and what programs to use to interact with various file formats, see “File_Formats_and_Recommended_Programs.txt”.

    • Map of Barfield region, Australia (.jpg, 270 KB)
    • Map overviewing the Great Artesian basins and main opal mining camps (.png, 82 KB)
    • Maps showing opal prospectivity data mining results for different geological datasets (.tif, 23.1 MB)
    • Map of opal prospectivity from palaeogeography data mining (.pdf, 2.6 MB)
    • Raster of palaeogeography target regions for viewing in Google Earth (.jpg, 418 KB)
    • Opal mine locations (.gpml, .txt, .kmz, .shp, total 15.6 MB)
    • Map of opal prospectivity from all data mining results as a Google Earth overlay (.kmz, 12 KB)
    • Map of probability of opal occurrence in prospective regions from all data mining results (.tif, 5.9 MB)
    • Paleogeography of Australia (.gpml, .txt, .shp, total 114.2 MB)
    • Radiometric data showing potassium concentration contrasts (.tif, .kmz, total 311.3 MB)
    • Regolith data (.gpml, .txt, .kml, .shp, total 7.1 MB)
    • Soil type data (.gpml, .txt, .kml, .shp, total 7.1 MB)

    For more information on this data collection, and links to other datasets from the EarthByte Research Group, please visit EarthByte.

    For more information about using GPlates, including tutorials and a user manual, please visit GPlates or EarthByte.

  6. Data release [made at SA Director of Mines' discretion] : Langhorne Creek,...

    • pid.sarig.sa.gov.au
    Updated Oct 1, 2003
    Cite
    (2003). Data release [made at SA Director of Mines' discretion] : Langhorne Creek, Bremer, Currency Creek and Strathalbyn (the Fleurieu Project). Joint annual reports, for the period 1/10/2003 to 30/9/2008. [Dataset]. https://pid.sarig.sa.gov.au/dataset/mesac25071
    Dataset updated
    Oct 1, 2003
    Area covered
    Strathalbyn, Langhorne Creek, Fleurieu Peninsula, Currency Creek
    Description

    The Fleurieu Project, incorporating ELs 2839 (Bremer), 3128 (Currency Creek) and 2677 (Langhorne Creek), is located approximately 60 km southeast of the city of Adelaide, centred on the township of Strathalbyn. The Fleurieu Project began life in 1991 with the granting of the original Bremer exploration licence to Aberfoyle Resources Limited. Aberfoyle discovered the Angas prospect in the same year. The ELs were taken up to encompass Kanmantoo Group meta-greywackes that are known to host economic stratiform zinc and lead mineralisation occurring as massive sulphides. During the first year of joint annual reporting for the ‘Fleurieu Project’, 32 NQ2 diamond holes for 7,067 m (including 1,370.75 m of mostly rotary mud pre-collars) were drilled at the Angas project, with the aim of better defining and extending the known resource prior to undertaking a scoping and mining pre-feasibility study. An updated 3D model of the resource and a new resource estimate were in progress at the time of reporting. A MIMDAS induced polarisation (IP)/magnetotellurics (MT) survey was completed over Angas and along its potential strike extensions; a total of 21 lines of pole-dipole IP and MT data were read. The survey and subsequent processing and modelling were undertaken by Geophysical Resources and Services Pty. Ltd. Three-dimensional modelling of the IP and resistivity data was also undertaken. The block models from the survey identify a conductive zone corresponding to the Rankine shoot, plus two additional zones that were drill-tested; however, this drilling did not demonstrate the existence of sulphides that would explain the anomalies.
A review of DHEM (downhole electromagnetic) data suggested that re-evaluating some of the old data sets with the latest computer-based inversion interpretation methodology could provide new and improved targets for ongoing exploration. Consultant Stephen Toteff was contracted to review the Strathburn prospect, which is located immediately north of the Angas project area, between the Strathalbyn mine and the EL boundary, just south of the old Breadalbane copper mine. From the 2004 drilling program, Joseph Ogierman prepared a detailed report on the relationship between structure and mineralisation at Angas. No exploration work was undertaken on EL 3128 “Currency Creek” and EL 3310 “Langhorne Creek”. During the second year of joint annual reporting, exploration focused on the Angas Zinc Project, EL 2839 “Bremer”, where a Total Indicated Resource of over 3 Mt of base and precious metals has been defined. During the period, 44 NQ2 diamond-cored holes for 11,577.5 m (including 2,342.7 m of pre-collar) were drilled to better define and extend the mineralisation prior to undertaking scoping and pre-feasibility studies. EM geophysics and a limited soil sampling survey were also undertaken. Within the Angas deposit, mineralisation is enveloped by pyrrhotite and is particularly conductive, so it can be targeted by EM geophysical methods. A downhole EM survey was undertaken on a single hole (AN084), and J. Silic interpreted the data from this and previous DHEM surveys. The soil sampling survey followed up a previously identified Pb soil anomaly that extended several hundred metres north of the Angas deposit and remained open to the north. Due to land access issues, the survey was limited to 38 samples collected at 50 m intervals along 3 NE-SW lines, north-east of the main anomaly, which remains untested. Several samples returned elevated zinc (>100 ppm).
A resource estimate announced on 14/11/2005 included a Total Indicated Resource of 3.04 Mt grading 8% Zn, 3.1% Pb, 34 g/t Ag, 0.5 g/t Au and 0.3% Cu. No exploration work was undertaken on EL 3128 “Currency Creek” and EL 3310 “Langhorne Creek”. Mining Lease 6229 was granted over the Angas area on 17 August 2006. During the third year of joint reporting, activities undertaken by Terramin included exploration drilling, geophysics, geotechnical work, environmental programs and rehabilitation. Exploration during the period was again focused on the Angas area and included the drilling of 33 NQ2 diamond holes totalling 11,154.2 m (including pre-collars), with the aim of continuing to better define and extend the known resource. Selected samples of drill core were sent for petrographic analysis. Downhole electromagnetic (DHEM) geophysics and a small surface moving loop EM (MLEM) survey were also undertaken, and old DHEM data were reviewed. Several target areas were defined and better delineated. Geotechnical work included the drilling of one PQ and 16 HQ diamond holes for 1,909.6 m (including pre-collars) in the vicinity of the proposed underground developments, as well as an extensive geotechnical survey of ground conditions around the future tailings storage. At selected locations around the future Angas mine site, and more distally, 12 dust monitoring sites, five surface water sites and, as part of a regional groundwater monitoring program, nine piezometers at six sites were established during the reporting period. An additional four NQ2 diamond holes for 311.9 m were drilled as part of a groundwater evaluation program in the vicinity of the planned boxcut. During the period, Outer-Rim Exploration Services (ORES) was contracted to undertake DHEM surveys on several deep holes using a Crone 3D pulse system with a time-domain EM method. A limited surface moving loop EM survey was also undertaken over an area known as the Dawson anomaly.
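
The 14/11/2005 Total Indicated Resource (3.04 Mt at 8% Zn, 3.1% Pb, 34 g/t Ag, 0.5 g/t Au and 0.3% Cu) can be translated into contained metal with a simple tonnage × grade calculation. The Python sketch below does only that arithmetic; it is an illustration, not a figure from the reports, and it applies no mining dilution or metallurgical recovery. The constant and function names are hypothetical.

```python
# Tonnage and grades of the 14/11/2005 Total Indicated Resource.
RESOURCE_T = 3.04e6  # tonnes of ore

PERCENT_GRADES = {"Zn": 8.0, "Pb": 3.1, "Cu": 0.3}  # % by mass
GPT_GRADES = {"Ag": 34.0, "Au": 0.5}                 # grams per tonne


def contained_metal(tonnes, percent_grades, gpt_grades):
    """In-situ contained metal in tonnes (no dilution or recovery applied)."""
    metal = {m: tonnes * g / 100.0 for m, g in percent_grades.items()}
    # Grade in g/t times tonnes of ore gives grams; divide by 1e6 for tonnes.
    metal.update({m: tonnes * g / 1e6 for m, g in gpt_grades.items()})
    return metal


if __name__ == "__main__":
    for name, t in contained_metal(RESOURCE_T, PERCENT_GRADES, GPT_GRADES).items():
        print(f"{name}: {t:,.2f} t")
```

The result is roughly 243,200 t Zn, 94,240 t Pb, 9,120 t Cu, 103 t Ag and 1.5 t Au contained in the indicated resource.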
Geophysical consultant Llew Wynn was contracted to interpret old EM data and EM data collected in 2006, as well as EM data flown by the Bureau of Rural Sciences as part of the federal government’s National Action Plan for Salinity. Geophysical consultant David McInnes was also retained to review and reinterpret old data from previous gravity and MIMDAS surveys. Included within the report is a summary of the 2006 Angas Zinc Project Feasibility Study. Terramin was successful in obtaining funding for subsidised drilling through the PACE Initiative Year 2 round (DPY2-28). In May 2004, drillhole AN084, under the first PACE program, was designed to test for down-plunge extensions of Rankine mineralisation and for possible parallel shoots within the Jettner Zone. The second PACE-subsidised diamond drillhole, AN109 (drilled September 2005), is the deepest drilled to date at the Angas prospect; it was designed to intersect the recently recognised Jettner Deeps downhole EM geophysical anomaly and was sited approximately 300 m south of AN084. [See ENV11159 CNO:2025674]. During the fourth year of joint reporting on the Fleurieu Project, development of the Angas Zinc Mine (ML 6229) began in June 2007. Exploration efforts during the period were again focused on the Angas area within the ML. Outside the mining lease, NQ2 diamond drilling was completed in two areas: the Gemmell zone near Angas, and Harriett Hill, 20 km east of Angas. Thirteen samples of drill core were submitted for petrographic description. At the Gemmell zone, 5 inclined holes were drilled for 850.5 m to explore for any northern extensions of the Angas resource. Holes AN162–164 intersected fine- to medium-grained disseminated sulphides and stringer veins of pyrrhotite-pyrite(-sphalerite-galena-chalcopyrite) in the Host Unit. The drilling confirmed the presence of sulphides, but only as narrow zones with stringer veins of pyrrhotite, pyrite and sphalerite, and negligible galena.
Terramin acquired low-level airborne EM and magnetics survey data that were flown by the Bureau of Rural Sciences as part of the National Action Plan for Salinity and covered portions of the project area. Terramin contracted Fugro Airborne Surveys Pty Ltd to reprocess selected lines of the data over the Angas deposit. Consultant geophysicist Llew Wynn then used the data to characterise an “Angas signature”, which was used to identify areas within the project with Angas-like signatures, notably coincident elevated EM and magnetic responses. Moving loop and fixed loop electromagnetic (MLEM and FLEM) surveys were undertaken at Harriett Hill prior to drilling to better define airborne EM and magnetic anomalies. A total of 21.5 line-km of MLEM and 6.1 line-km of FLEM data were collected, and the collars for the three diamond holes were chosen from these surveys. At Harriett Hill, three holes (888.7 m) were drilled to test geophysical targets. These holes were supported by a PACE grant (Project no. DPY4-24) [ENV11558 CNO:2026148]. Two water bores (totalling 360 m) were also drilled to find water for the diamond drilling. Downhole electromagnetic (DHEM) geophysics surveys were undertaken on each of the three diamond holes at Harriett Hill and on the two water bores. DHEM was also undertaken on hole AN150, which was collared south of Angas and drilled in the previous reporting period. During the fifth year of joint reporting for the Fleurieu Project, the first ore from the Angas Zinc mine was produced in April 2008, and production began in July 2008. Exploration during the reporting period shifted to more regional targets, with three NQ2 diamond holes (totalling 748.9 m, including pre-collars) drilled at the Brinkley prospect, 12 km SW of Murray Bridge. Assay results were also returned for the three drillholes completed at Harriett Hill in the previous reporting period.
Moving loop and fixed loop electromagnetic (MLEM and FLEM) surveys were undertaken at Brinkley prior to the drilling to better define airborne EM and magnetic anomalies. An MLEM survey was also undertaken at Navarino as a follow-up to a

  7. Definitions of common notations.

    • plos.figshare.com
    xls
    Updated Jun 3, 2023
    Cite
    Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang (2023). Definitions of common notations. [Dataset]. http://doi.org/10.1371/journal.pone.0267908.t002
    Available download formats: xls
    Dataset updated
    Jun 3, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Yong-Ki Kim; Hyeong-Jin Kim; Hyunjo Lee; Jae-Woo Chang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Definitions of common notations.
