https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Module III of Data Warehousing and Data Mining, 7th Semester, Computer Science and Engineering
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Opal is Australia's national gemstone; however, until recently, the most significant opal discoveries were those made in the early 1900s, more than 100 years ago. Currently there is no formal exploration model for opal, meaning there are no widely accepted concepts or methodologies available to suggest where new opal fields may be found. As a consequence, opal mining in Australia is a cottage industry, with the majority of opal exploration focused around old opal fields. The EarthByte Group has developed a new opal exploration methodology for the Great Artesian Basin. The work is based on the concept of applying “big data mining” approaches to data sets relevant for identifying regions that are prospective for opal. The group combined a multitude of geological and geophysical data sets and analysed them jointly to establish associations between particular features in the data and known opal mining sites. A “training set” of known opal localities (1036 opal mines) was assembled from those localities that were featured in published reports and on maps. The data used include rock types, soil type, regolith type, topography, radiometric data and a stack of digital palaeogeographic maps. The different data layers were analysed via spatio-temporal data mining, combining the GPlates PaleoGIS software (www.gplates.org) with the Orange data mining software (orange.biolab.si), to produce the first opal prospectivity map for the Great Artesian Basin. One of the main results of the study is that the geological conditions favourable for opal were found to be related to a particular sequence of surface environments over geological time. These conditions involved alternating shallow seas and river systems followed by uplift and erosion. The approach reduces the entire area of the Great Artesian Basin to a mere 6% that is deemed to be prospective for opal exploration. The work is described in two companion papers, in the Australian Journal of Earth Sciences and in Computers and Geosciences.
Age-coded multi-layered geological datasets are becoming increasingly prevalent with the surge in open-access geodata, yet there are few methodologies for extracting geological information and knowledge from these data. We present a novel methodology, based on the open-source GPlates software in which age-coded digital palaeogeographic maps are used to “data-mine” spatio-temporal patterns related to the occurrence of Australian opal. Our aim is to test the concept that only a particular sequence of depositional/erosional environments may lead to conditions suitable for the formation of gem quality sedimentary opal. Time-varying geographic environment properties are extracted from a digital palaeogeographic dataset of the eastern Australian Great Artesian Basin (GAB) at 1036 opal localities. We obtain a total of 52 independent ordinal sequences sampling 19 time slices from the Early Cretaceous to the present-day. We find that 95% of the known opal deposits are tied to only 27 sequences all comprising fluvial and shallow marine depositional sequences followed by a prolonged phase of erosion. We then map the total area of the GAB that matches these 27 opal-specific sequences, resulting in an opal-prospective region of only about 10% of the total area of the basin. The key patterns underlying this association involve only a small number of key environmental transitions. We demonstrate that these key associations are generally absent at arbitrary locations in the basin. This new methodology allows for the simplification of a complex time-varying geological dataset into a single map view, enabling straightforward application for opal exploration and for future co-assessment with other datasets/geological criteria. This approach may help unravel the poorly understood opal formation process using an empirical spatio-temporal data-mining methodology and readily available datasets to aid hypothesis testing.
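The sequence-matching idea in the abstract above can be illustrated with a small Python sketch. It is not the authors' implementation: the single-letter environment codes, the function names, and the rule of picking the most frequent sequences until a coverage target is reached are all assumptions made for illustration.

```python
from collections import Counter

def select_opal_sequences(opal_site_sequences, coverage=0.95):
    """Pick the most frequent environment sequences until they account for
    at least `coverage` of the known opal sites (assumed selection rule)."""
    counts = Counter(opal_site_sequences)
    total = len(opal_site_sequences)
    selected, covered = set(), 0
    for sequence, n_sites in counts.most_common():
        if covered >= coverage * total:
            break
        selected.add(sequence)
        covered += n_sites
    return selected

def prospective_fraction(grid_sequences, selected):
    """Fraction of basin grid cells whose sequence matches a selected one."""
    matches = sum(1 for sequence in grid_sequences if sequence in selected)
    return matches / len(grid_sequences)

# Hypothetical codes: F = fluvial, M = shallow marine, E = erosion.
opal_sites = [("F", "M", "E")] * 19 + [("M", "M", "M")]
basin_grid = [("F", "M", "E"), ("M", "M", "M"), ("E", "E", "E"), ("F", "M", "E")]

opal_specific = select_opal_sequences(opal_sites)
fraction = prospective_fraction(basin_grid, opal_specific)
```

With real data, each tuple would hold one ordinal environment value per time slice (19 in the study), sampled at an opal locality or at a basin grid cell.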
Andrew Merdith - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-7564-8149
Thomas Landgrebe - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia
Adriana Dutkiewicz - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia
R. Dietmar Müller - EarthByte Research Group, School of Geosciences, The University of Sydney, Australia. ORCID: 0000-0002-3334-5764
This collection contains geological data from Australia used for data mining in the publications Merdith et al. (2013) and Landgrebe et al. (2013). The resulting maps of opal prospectivity are also included.
Note: For details on the files included in this data collection, see “Description_of_Resources.txt”.
Note: For information on file formats and what programs to use to interact with various file formats, see “File_Formats_and_Recommended_Programs.txt”.
For more information on this data collection, and links to other datasets from the EarthByte Research Group please visit EarthByte
For more information about using GPlates, including tutorials and a user manual please visit GPlates or EarthByte
https://paper.erudition.co.in/terms
Question Paper Solutions of Data Warehousing and Data Mining (Old), 7th Semester, Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology
https://paper.erudition.co.in/terms
Question Paper Solutions of chapter Module IV of Data Warehousing and Data Mining, 7th Semester, Computer Science and Engineering
http://rightsstatements.org/vocab/InC/1.0/
Dataset available only to University of Arizona affiliates. To obtain access, you must log in to ReDATA with your NetID. Data is for research use by each individual downloader only. Sharing and/or redistribution of any portion of this dataset is prohibited. The UA Libraries have acquired XML and PDF files of two newspapers from the ProQuest Historical Newspapers collection: The New York Times (1851-1936) and The Washington Post (1877-1934). These files may be downloaded and used for text and data mining. For questions about news resources or text mining, contact Mary Feeney (mfeeney@arizona.edu), Librarian in the Research & Learning Department. NOTE: The uncompressed datasets are very large. Detailed file descriptions and MD5 hash values for each file can be found in the README.txt file. For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
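Because the description mentions MD5 hash values and very large files, a download can be checked against its published hash. A minimal Python sketch follows, assuming the expected hash has been copied by hand from README.txt; the function names are hypothetical, and the file is read in chunks so that a large archive is not loaded into memory at once.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected_md5):
    """Compare a file's MD5 with the value listed for it (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before any text mining is run on it.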
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the data used in the experiments of our paper: N. Arinik, R. Figueiredo, V. Labatut (2020), Multiplicity and Diversity: Analyzing the Optimal Solution Space of the Correlation Clustering Problem on Complete Signed Graphs, Journal of Complex Networks, DOI: 10.1093/comnet/cnaa025. The source code is accessible here: https://github.com/CompNet/Sosocc

This dataset contains:
* Plot files used in the article
* Input signed networks
* All optimal solutions (i.e. optimal solution space) of the corresponding networks
* Evaluation files

# PLOT FILES
* Figure1.zip: Figures showing that there might be many distinct optimal solutions of a small-sized network.
* Figure2.zip: Figures showing that distinct optimal solutions of a given network might be partition-wise very similar or different.
* Figure4: All Results.zip: For space reasons, Figure 4 in the article contains only a few of the result plots. This zip file contains all plots, and it is organized by the values of l0. In each l0 folder, the results are shown from three different perspectives:
--- Detected Imbalance Percentage vs Graph Order (i.e. number of vertices)
--- Prop mispl vs Graph order
--- Graph order vs Prop mispl
* workflow.pdf: The workflow of the methodology used in the article.
* Syrian network With All Solutions.pdf: Syrian network (on top) with core part information through node colors, and its optimal solutions in which node colors represent partition information (on bottom).

# NETWORKS
All networks are in Input Signed Networks.tar.gz. Networks are generated through a simple random model (available at https://github.com/CompNet/SignedBenchmark) designed to produce complete (or incomplete) unweighted networks with built-in modular structure. There are 3 parameters used for the generation:
- number of nodes (n)
- initial number of modules (l0)
- proportion of misplaced links, i.e. proportion of frustrated links (qm)

Inside Input Signed Networks.tar.gz:
NETWORKS
|_n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
....|_propMispl=PROP_MISPL
........|_propNeg=PROP_NEG
............|_network=NETWORK_NO

- The first hierarchy => the folders are named as follows: n=NB-NODE_l0=INIT-NB-MODULE_dens=1.0000. The number of nodes, the initial number of modules and the network density are given. The network density is always 1, since we treat only complete signed networks.
- The second hierarchy => the folders are named as follows: propMispl=PROP_MISPL. The proportion of misplaced links is given.
- The third hierarchy => the folders are named as follows: propNeg=PROP_NEG. The proportion of negative links (qn) is specified. qn changes depending on n and l0. Since only complete signed networks are studied, this parameter is automatically computed from the other input parameters.
- The fourth hierarchy => the folders are named as follows: network=NETWORK_NO. The network numbers are shown.

At the bottom level, there are three file formats describing the same network content: GraphML (.graphml), Pajek NET (.net) or .G format.

# PARTITIONS
All partition results are in Partition Results.tar.gz. Note that all optimal partitions of a signed network are obtained through an exact partitioning method. The source code is accessible here: https://github.com/arinik9/ExCC

Inside Partition Results.tar.gz:
PARTITIONS
|_n=NB-NODE_l0=INIT_NB_MODULE_dens=1.0000
....|_propMispl=PROP_MISPL
........|_propNeg=PROP_NEG
............|_network=NETWORK_NO
................|_"ExCC-all"
....................|_"signed-unweighted"

- The first hierarchy => the folders are named as follows: n=NB-NODE_l0=INIT-NB-MODULE_dens=1.0000
- The second hierarchy => the folders are named as follows: propMispl=PROP_MISPL
- The third hierarchy => the folders are named as follows: propNeg=PROP_NEG
- The fourth hierarchy => the folders are named as follows: network=NETWORK_NO
- The fifth hierarchy => the folders are named as follows: "ExCC-all". The name of the partitioning method is shown. Since an exact partitioning method is used to obtain all distinct optimal solutions, it is named "ExCC-all".
- The sixth hierarchy => the folders are named as follows: "signed-unweighted". The type of signed network is shown: signed and unweighted.

At the bottom level are the partition results, in files named as follows: membership.txt. Note that the first partition result number starts from zero.

# EVALUATIONS
Evaluation results related to our plots are in Evaluation Results.tar.gz. Note that the hierarchy of this folder is the same as that of 'Partitions'.

Inside Evaluation Results.tar.gz:
- Best-k-for-kmedoids.csv: It contains three columns: 1) the number of solution classes via kmedoids, 2) the best Silhouette score, 3) the best clustering in terms of Silhouette score, which represents solution classes.
- class-core-part-size-tresh=1.00.csv: It indicates the proportion of core part size for each solution class.
- exec-time.csv: It indicates the execution time in seconds.
- imbalance.csv: It contains the imbalance information as 1) count and 2) percentage.
- nb-solution.csv: It indicates the total number of solutions.
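The folder hierarchy described above encodes the generation parameters in directory names, so they can be recovered from a path. The Python sketch below assumes the placeholders are filled with plain numbers, as in the hypothetical path `n=20_l0=3_dens=1.0000/propMispl=0.2000/propNeg=0.7000/network=1`; the exact number formatting used in the archives is an assumption, and `parse_network_path` is an illustrative name rather than part of the released code.

```python
import re
from pathlib import PurePosixPath

# Top-level folder pattern: n=<nodes>_l0=<modules>_dens=<density>
_TOP_LEVEL = re.compile(r"n=(\d+)_l0=(\d+)_dens=([0-9.]+)")

def parse_network_path(path):
    """Extract generation parameters from a folder path in the archive."""
    params = {}
    for part in PurePosixPath(path).parts:
        top = _TOP_LEVEL.fullmatch(part)
        if top:
            params["n"] = int(top.group(1))        # number of nodes
            params["l0"] = int(top.group(2))       # initial number of modules
            params["dens"] = float(top.group(3))   # network density
            continue
        key, sep, value = part.partition("=")
        if sep and key in ("propMispl", "propNeg"):
            params[key] = float(value)             # proportions (qm, qn)
        elif sep and key == "network":
            params[key] = int(value)               # network number
    return params

# Hypothetical example path; folders such as "ExCC-all" are simply ignored.
example = parse_network_path(
    "n=20_l0=3_dens=1.0000/propMispl=0.2000/propNeg=0.7000/network=1/ExCC-all"
)
```

Since the evaluation archive is described as sharing the same hierarchy, the same parser could be used to join evaluation files back to the networks they describe.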
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The tectonic earthquake data are primarily from a Uniform Moment Magnitude Earthquake Catalog developed for Utah and its surrounding region by Arabasz and others (2016) for the time period 1850 through September 2012. For the map, we extended the catalog through December 2016 and expanded it to include earthquakes smaller than magnitude 2.9. Mining-induced seismicity (MIS) was excluded from the compilation of Arabasz and others (2016) but has been added to the map to show its significance in east-central Utah. Data for the seismic events plotted on the map are listed in two separate catalogs, each in the form of an ArcGIS feature class within a file geodatabase. The catalog files are available in the Utah Geospatial Resource Center (UGRC) State Geographic Information Database (SGID, https://gis.utah.gov/data/geoscience/) and at https://ugspub.nr.utah.gov/publications/open_file_reports/ofr-667/ofr-667.zip. The primary catalog used for the map, termed the Earthquake Catalog (EQ Catalog, Utah_EQcat_1850_2016), comprises tectonic earthquakes located within the “Utah Region” (lat. 36.75° to 42.50° N, long. 108.75° to 114.25° W) from 1850 through 2016. This region is the standard region used by the University of Utah Seismograph Stations (UUSS) for the compilation and reporting of earthquakes within and surrounding Utah. Note that the map covers most, but not all, of the Utah Region. The map delineates two areas in east-central Utah that are characterized by predominantly (more than 90%) MIS. All seismic events (including both MIS and tectonic earthquakes) located in these two areas are listed in a separate catalog, termed the Coal-Mining-Region Catalog (CMR Catalog, Utah_CMRcat_1928_2016), which extends from 1928 (the year of the first located event) through 2016. The EQ and CMR catalogs are mutually exclusive: the EQ Catalog does not include tectonic earthquakes located within the two delineated areas of predominantly MIS.
More information about the earthquake epicenter data is contained in UGS OFR 667 (https://ugspub.nr.utah.gov/publications/open_file_reports/ofr-667/ofr-667.pdf).
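The “Utah Region” bounds quoted above translate directly into a bounding-box test. The Python sketch below assumes epicenters are given in decimal degrees with west longitudes negative; the function name and the sample coordinates are illustrative and are not taken from the catalogs.

```python
# Utah Region bounds from the description: lat. 36.75 to 42.50 N,
# long. 108.75 to 114.25 W (west longitudes expressed as negative values).
LAT_MIN, LAT_MAX = 36.75, 42.50
LON_MIN, LON_MAX = -114.25, -108.75

def in_utah_region(lat, lon):
    """Return True if an epicenter falls inside the Utah Region box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

# Approximate, illustrative coordinates (not catalog entries).
salt_lake_city = in_utah_region(40.76, -111.89)
denver = in_utah_region(39.74, -104.99)
```

Note that this test alone does not reproduce the EQ Catalog: events inside the two delineated coal-mining areas belong to the CMR Catalog, and separating them requires the area polygons from the geodatabase.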