18 datasets found
  1. Web Mining for Collaborative Food Delivery

    • kaggle.com
    zip
    Updated Aug 26, 2023
    Cite
    Jocelyn Dumlao (2023). Web Mining for Collaborative Food Delivery [Dataset]. https://www.kaggle.com/datasets/jocelyndumlao/web-mining-for-collaborative-food-delivery
    Explore at:
    zip (396,903 bytes)
    Dataset updated
    Aug 26, 2023
    Authors
    Jocelyn Dumlao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is the main dataset built for the work titled "A Web Mining Approach to Collaborative Consumption of Food Delivery Services", an official institutional research project of Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz.

    Categories

    Urban Transportation, Consumer, e-Commerce Retail

    Acknowledgements & Source

    Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz


  2. Web Mining Dataset

    • universe.roboflow.com
    zip
    Updated May 17, 2023
    Cite
    Web Mining Project (2023). Web Mining Dataset [Dataset]. https://universe.roboflow.com/web-mining-project/web-mining/model/1
    Explore at:
    zip
    Dataset updated
    May 17, 2023
    Dataset authored and provided by
    Web Mining Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Zebra Cross
    Description

    Web Mining

    ## Overview
    
    Web Mining is a dataset for classification tasks - it contains Zebra Cross annotations for 1,494 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Knowledge Graph: tyrolean mining documents 15th and 16th century

    • zenodo.org
    • data-staging.niaid.nih.gov
    • +1more
    bin
    Updated Sep 26, 2024
    Cite
    Gerald Hiebel; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Brigit Danthine (2024). Knowledge Graph: tyrolean mining documents 15th and 16th century [Dataset]. http://doi.org/10.5281/zenodo.6276586
    Explore at:
    bin
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gerald Hiebel; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Brigit Danthine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a Knowledge Graph (.nq file) of two historical mining documents: “Verleihbuch der Rattenberger Bergrichter” (Hs. 37, 1460-1463) and “Schwazer Berglehenbuch” (Hs. 1587, approx. 1515), stored by the Tyrolean Regional Archive, Innsbruck (Austria). Users of the KG may explore the montanistic network and the relations between people, claims and mines in late medieval Tyrol. The core regions concern the districts Schwaz and Kufstein (Tyrol, Austria).

    The ontology used to represent the claims is CIDOC CRM, an ISO-certified ontology for cultural heritage documentation. Supported by the Karma tool, the KG is generated as RDF (Resource Description Framework). The generated RDF data is imported into a triplestore, in this case GraphDB, and then displayed visually. This puts the data from the early mining texts into a semantically structured context and makes the mutual relationships between people, places and mines visible.

    Both documents and the Knowledge Graph were processed and generated by the research team of the project “Text Mining Medieval Mining Texts”. The research project (2019-2022) was carried out at the University of Innsbruck and funded by the go!digital next generation programme of the Austrian Academy of Sciences.

    Citable transcripts of the historical documents are available online:
    Hs. 37 DOI: 10.5281/zenodo.6274562
    Hs. 1587 DOI: 10.5281/zenodo.6274928

  4. model web mining project akhir

    • kaggle.com
    zip
    Updated May 14, 2023
    Cite
    Cornelius Justin Satryo Hadi (2023). model web mining project akhir [Dataset]. https://www.kaggle.com/datasets/corneliusjustin/model-web-mining-project-akhir
    Explore at:
    zip (128,427,024 bytes)
    Dataset updated
    May 14, 2023
    Authors
    Cornelius Justin Satryo Hadi
    Description

    Dataset

    This dataset was created by Cornelius Justin Satryo Hadi


  5. Simulated supermarket transaction data

    • researchdatafinder.qut.edu.au
    • researchdata.edu.au
    Updated May 31, 2010
    Cite
    Yuefeng Li (2010). Simulated supermarket transaction data [Dataset]. https://researchdatafinder.qut.edu.au/individual/q44
    Explore at:
    Dataset updated
    May 31, 2010
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Yuefeng Li
    Description

    A database of de-identified supermarket customer transactions. This large simulated dataset was created based on a real data sample.

  6. Cw Detection Dataset

    • universe.roboflow.com
    zip
    Updated May 17, 2023
    Cite
    Web Mining Project (2023). Cw Detection Dataset [Dataset]. https://universe.roboflow.com/web-mining-project/cw-detection/model/1
    Explore at:
    zip
    Dataset updated
    May 17, 2023
    Dataset authored and provided by
    Web Mining Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Crosswalks Bounding Boxes
    Description

    CW Detection

    ## Overview
    
    CW Detection is a dataset for object detection tasks - it contains Crosswalks annotations for 2,512 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  7. Sustaining growth for innovative new enterprises: UK firm data

    • datacatalogue.cessda.eu
    Updated Sep 26, 2025
    Cite
    Sensier, M; Gök, A; Shapira, P (2025). Sustaining growth for innovative new enterprises: UK firm data [Dataset]. http://doi.org/10.5255/UKDA-SN-851779
    Explore at:
    Dataset updated
    Sep 26, 2025
    Dataset provided by
    University of Manchester
    Authors
    Sensier, M; Gök, A; Shapira, P
    Time period covered
    Jan 1, 2012 - Dec 31, 2014
    Area covered
    United Kingdom
    Variables measured
    Organization
    Measurement technique
    We collected the financial information on the UK firms by downloading Companies House data from the FAME database, available through the University of Manchester Library (see http://www.library.manchester.ac.uk/searchresources/databases/f/). Grant information on companies came from the Technology Strategy Board. Patent information was from the Derwent database and publication information was from the Web of Science. The Consumer Price Index was from the Office for National Statistics (http://www.ons.gov.uk/ons/rel/cpi/consumer-price-indices/index.html). The Human Resources in Science and Technology variable was from the Eurostat database (http://ec.europa.eu/eurostat/data/database). Unstructured data was mined from firms' websites. The UK Intellectual Property Office has clarified that the data mining we are doing, and the way we are doing it, is permissible. See: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375954/Research.pdf
    Description

    To select the group of UK firms, we initially searched the FAME database (available from the University of Manchester Library) with keywords relating to the green goods sector; please see Shapira et al. (2014, Technological Forecasting & Social Change, vol. 85, pp. 93-104) for further details on the keywords. This database contains anonymized firm data from a sample of UK firms in the green goods production industry. We combine data from structured sources (the FAME database, patents and publications) with unstructured data mined from firms' websites, saving keywords found in the text and summing their counts to create additional explanatory variables for firm growth. The data form a panel from 2003-2012, with some observations missing for some firms. We collect historical data from firms' websites available in an archive from the Wayback Machine.
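    The keyword-count approach described above can be sketched as follows. This is only an illustration of the general technique, not the project's actual code; the keyword list and the `keyword_counts` helper are hypothetical.

```python
import re

# Hypothetical green-goods keywords; the real list follows Shapira et al. (2014).
KEYWORDS = ["solar", "recycling", "wind turbine", "biomass"]

def keyword_counts(page_text, keywords=KEYWORDS):
    """Count whole-word occurrences of each keyword in a firm's web page text."""
    text = page_text.lower()
    return {kw: len(re.findall(r"\b" + re.escape(kw) + r"\b", text))
            for kw in keywords}

page = "Our solar panels and solar heaters use recycling-friendly biomass."
counts = keyword_counts(page)
total = sum(counts.values())  # summed count used as an explanatory variable
```

    Summing the per-keyword counts per firm-year yields the kind of additional explanatory variable for firm growth that the description mentions.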

    This project probes the growth strategies of innovative small and medium-size enterprises (SMEs). Our research focuses on emerging green goods industries that manufacture outputs which benefit the environment or conserve natural resources, with an international comparative element involving the UK, the US, and China.

    The project investigates the contributions of strategy, resources and relationships to how innovative British, American, and Chinese SMEs achieve significant growth. The targeted technology-oriented green goods sectors are strategically important to environmental rebalancing and have significant potential (in the UK) for export growth. The research examines the diverse pathways to innovation and growth across different regions. We use a mix of methodologies, including analyses of structured and unstructured data on SME business and technology performance and strategies, case studies, and modelling. Novel approaches using web mining are pioneered to gain timely information about enterprise developmental pathways. Findings from the project will be used to inform management and policy development at enterprise, regional and national levels.

    The project is led by the Manchester Institute of Innovation Research at the University of Manchester, in collaboration with Georgia Institute of Technology, US; Beijing Institute of Technology, China, and Experian, UK.

  8. Product data mining: entity classification&linking

    • kaggle.com
    zip
    Updated Jul 13, 2020
    Cite
    zzhang (2020). Product data mining: entity classification&linking [Dataset]. https://www.kaggle.com/ziqizhang/product-data-miningentity-classificationlinking
    Explore at:
    zip (10,933 bytes)
    Dataset updated
    Jul 13, 2020
    Authors
    zzhang
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    IMPORTANT: Round 1 results are now released; check our website for the leaderboard. Round 2 submissions are now open!

    1. Overview

    We release two datasets that are part of the Semantic Web Challenge on Mining the Web of HTML-embedded Product Data, co-located with the 19th International Semantic Web Conference (https://iswc2020.semanticweb.org/, 2-6 Nov 2020, Athens, Greece). The datasets belong to two shared tasks related to product data mining on the Web: (1) product matching (linking) and (2) product classification. This event is organised by The University of Sheffield, The University of Mannheim and Amazon, and is open to anyone. Systems that successfully beat the baseline of the respective task will be invited to write a paper describing their method and system, and to present the method as a poster (and potentially also a short talk) at the ISWC2020 conference. Winners of each task will be awarded a 500 euro prize (partly sponsored by Peak Indicators, https://www.peakindicators.com/).

    2. Task and dataset brief

    The challenge organises two tasks, product matching and product categorisation.

    i) Product Matching deals with identifying product offers on different websites that refer to the same real-world product (e.g., the same iPhone X model offered under different names/offer titles, with different descriptions, on various websites). A multi-million product offer corpus (16M) containing product offer clusters is released for the generation of training data. A validation set containing 1.1K offer pairs and a test set of 600 offer pairs will also be released. The goal of this task is to classify whether the offer pairs in these datasets are a match (i.e., referring to the same product) or a non-match.

    ii) Product classification deals with assigning predefined product category labels (which can be multiple levels) to product instances (e.g., iPhone X is a ‘SmartPhone’, and also ‘Electronics’). A training dataset containing 10K product offers, a validation set of 3K product offers and a test set of 3K product offers will be released. Each dataset contains product offers with their metadata (e.g., name, description, URL) and three classification labels each corresponding to a level in the GS1 Global Product Classification taxonomy. The goal is to classify these product offers into the pre-defined category labels.

    All datasets are built based on structured data that was extracted from the Common Crawl (https://commoncrawl.org/) by the Web Data Commons project (http://webdatacommons.org/). Datasets can be found at: https://ir-ischool-uos.github.io/mwpd/

    3. Resources and tools

    The challenge will also release utility code (in Python) for processing the above datasets and scoring the system outputs. In addition, the following language resources for product-related data mining tasks will be released: a text corpus of 150 million product offer descriptions, and word embeddings trained on this corpus.

    4. Challenge website

    For details of the challenge please visit https://ir-ischool-uos.github.io/mwpd/

    5. Organizing committee

    Dr Ziqi Zhang (Information School, The University of Sheffield)
    Prof. Christian Bizer (Institute of Computer Science and Business Informatics, University of Mannheim)
    Dr Haiping Lu (Department of Computer Science, The University of Sheffield)
    Dr Jun Ma (Amazon Inc., Seattle, US)
    Prof. Paul Clough (Information School, The University of Sheffield & Peak Indicators)
    Ms Anna Primpeli (Institute of Computer Science and Business Informatics, University of Mannheim)
    Mr Ralph Peeters (Institute of Computer Science and Business Informatics, University of Mannheim)
    Mr Abdulkareem Alqusair (Information School, The University of Sheffield)

    6. Contact

    To contact the organising committee please use the Google discussion group https://groups.google.com/forum/#!forum/mwpd2020

  9. LScD (Leicester Scientific Dictionary)

    • figshare.le.ac.uk
    docx
    Updated Apr 15, 2020
    + more versions
    Cite
    Neslihan Suzen (2020). LScD (Leicester Scientific Dictionary) [Dataset]. http://doi.org/10.25392/leicester.data.9746900.v3
    Explore at:
    docx
    Dataset updated
    Apr 15, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Leicester
    Description

    LScD (Leicester Scientific Dictionary), April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

    [Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus), Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in the description of Version 2 below; we do not repeat the explanation. After pre-processing, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are the same as described for LScD Version 2 below.

    * Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
    ** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2

    [Version 2] Getting Started

    This document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). This dictionary was created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from the LSC, together with instructions for using the code, is available in [2]. The code can also be used for lists of texts from other sources; amendments to the code may be required.

    LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.

    LScD is an ordered list of words from the texts of abstracts in LSC. The dictionary stores 974,238 unique words, sorted by the number of documents containing the word in descending order. All words in the LScD are in stemmed form. The LScD contains the following information:
    1. Unique words in abstracts
    2. Number of documents containing each word
    3. Number of appearances of a word in the entire corpus

    Processing the LSC

    Step 1. Downloading the LSC Online: Use of the LSC is subject to acceptance of a request for the link by email. To access the LSC for research purposes, please email ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

    Step 2. Importing the Corpus to R: The full R code for processing the corpus can be found on GitHub [2]. All following steps can be applied to an arbitrary list of texts from any source, with changes of parameters. The structure of the corpus, such as file format and the names (and positions) of fields, should be taken into account when applying our code. The organisation of the CSV files of LSC is described in the README file for LSC [1].

    Step 3. Extracting Abstracts and Saving Metadata: Metadata, i.e. all fields in a document excluding the abstract, are separated from the abstracts and saved as MetaData.R. Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.

    Step 4. Text Pre-processing Steps on the Collection of Abstracts: In this section we present our approaches to pre-processing the abstracts of the LSC.
    1. Removing punctuation and special characters: all non-alphanumeric characters are replaced by spaces. We did not substitute the character “-” in this step, because we need to keep words like “z-score”, “non-payment” and “pre-processing” in order not to lose their actual meaning. Uniting prefixes with words is performed in later steps of pre-processing.
    2. Lowercasing the text data: lowercasing is performed to avoid treating words like “Corpus”, “corpus” and “CORPUS” differently. The entire collection of texts is converted to lowercase.
    3. Uniting prefixes of words: words containing prefixes joined with the character “-” are united into one word. The prefixes united for this research are listed in the file “list_of_prefixes.csv”. Most of the prefixes are extracted from [4]. We also added commonly used prefixes: ‘e’, ‘extra’, ‘per’, ‘self’ and ‘ultra’.
    4. Substitution of words: some words joined with “-” in the abstracts of the LSC require an additional substitution step to avoid losing their meaning before removing the character “-”. Examples of such words are “z-test”, “well-known” and “chi-square”, which have been substituted by “ztest”, “wellknown” and “chisquare”. Identification of such words was done by sampling abstracts from LSC. The full list of such words and the substitution decisions are presented in the file “list_of_substitution.csv”.
    5. Removing the character “-”: all remaining “-” characters are replaced by spaces.
    6. Removing numbers: all digits not included in a word are replaced by spaces. All words that contain both digits and letters are kept, because alphanumeric tokens such as chemical formulas might be important for our analysis; examples are “co2”, “h2o” and “21st”.
    7. Stemming: stemming is the process of converting inflected words into their word stem. This step unites several forms of words with similar meaning into one form, and also saves memory space and time [5]. All words in the LScD are stemmed to their word stem.
    8. Stop word removal: stop words are words that are extremely common but provide little value in a language; some common stop words in English are ‘I’, ‘the’, ‘a’, etc. We used the ‘tm’ package in R to remove stop words [6]; there are 174 English stop words listed in the package.

    Step 5. Writing the LScD into CSV Format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written to the file “LScD.csv”.

    The Organisation of the LScD

    The total number of words in the file “LScD.csv” is 974,238. Each field is described below:
    Word: unique words from the corpus, in lowercase and stemmed form. The field is sorted by the number of documents containing the word, in descending order.
    Number of Documents Containing the Word: a binary count is used: if a word exists in an abstract, it counts as 1, even if it occurs more than once in that document. The total number of documents containing the word is the sum of these 1s over the entire corpus.
    Number of Appearances in Corpus: how many times a word occurs in the corpus when the corpus is considered as one large document.

    Instructions for R Code

    LScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as an RData file and in CSV format. Outputs of the code are:
    Metadata File: all fields in a document excluding abstracts. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
    File of Abstracts: all abstracts after the pre-processing steps defined in Step 4.
    DTM: the Document Term Matrix constructed from the LSC [6]. Each entry of the matrix is the number of times a word occurs in the corresponding document.
    LScD: an ordered list of words from LSC, as defined in the previous section.

    The code can be used as follows:
    1. Download the folder ‘LSC’, ‘list_of_prefixes.csv’ and ‘list_of_substitution.csv’.
    2. Open the LScD_Creation.R script.
    3. Change the parameters in the script: replace with the full path of the directory with source files and the full path of the directory to write output files.
    4. Run the full code.

    References
    [1] N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1
    [2] N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION
    [3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
    [4] A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.
    [5] C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved Porter's stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.
    [6] I. Feinerer, "Introduction to the tm Package: Text Mining in R," available online: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
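    The two per-word counts stored in LScD (number of documents containing the word, and total appearances in the corpus) can be sketched as follows. The original pipeline is the R code in [2]; this Python sketch with a hypothetical `build_dictionary` helper only illustrates the counting logic on already pre-processed abstracts.

```python
from collections import Counter

def build_dictionary(abstracts):
    """Build an LScD-style word list from pre-processed abstract texts.

    Returns (word, n_docs, n_total) tuples sorted by the number of
    documents containing the word, in descending order."""
    doc_freq = Counter()     # binary count: 1 per document containing the word
    corpus_freq = Counter()  # total occurrences across the whole corpus
    for text in abstracts:
        tokens = text.split()
        corpus_freq.update(tokens)
        doc_freq.update(set(tokens))  # each word counted at most once per document
    return sorted(
        ((w, doc_freq[w], corpus_freq[w]) for w in doc_freq),
        key=lambda row: row[1],
        reverse=True,
    )

rows = build_dictionary(["corpus mining corpus", "mining text", "text corpus"])
# e.g. ("corpus", 2, 3): appears in 2 documents, 3 occurrences in total
```

    The `set(tokens)` step implements the binary per-document count described above: a word occurring several times in one abstract still contributes only 1 to its document count.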

  10. Link structures of collections of academic web sites

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Mike Thelwall (2023). Link structures of collections of academic web sites [Dataset]. http://doi.org/10.6084/m9.figshare.785776.v4
    Explore at:
    zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mike Thelwall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Databases of academic web links, 2000-2006. This project was created for research into web links, including web link mining and the creation of link metrics. It aims to provide the raw data and software for researchers to analyse link structures without having to rely upon commercial search engines, and without having to run their own web crawler. You may use all of the resources on this site for non-commercial purposes, but please notify us if you publish an academic paper or book that uses the data in any way (so that we know the site is getting good use).

  11. MIN4EU LGRB-BW: near-surface mineral raw material occurrences - harmonized...

    • daten-bw.de
    • ckan.mobidatalab.eu
    • +4more
    Updated Nov 18, 2025
    + more versions
    Cite
    Geoportal Baden-Württemberg (2025). MIN4EU LGRB-BW: near-surface mineral raw material occurrences - harmonized dataset [Dataset]. https://www.daten-bw.de/de/web/guest/suchen/-/details/min4eu-lgrb-bw-near-surface-mineral-raw-material-occurrences-harmonized-dataset
    Explore at:
    WFS service (http://publications.europa.eu/resource/authority/file-type/wfs_srvc)
    Dataset updated
    Nov 18, 2025
    Dataset provided by
    Regierungspraesidium Freiburg - Dept. 9 State Authority for Geology, Mineral Resources and Mining, Ref. 96 state raw material geology
    Authors
    Geoportal Baden-Württemberg
    License

    http://dcat-ap.de/def/licenses/other-closed

    Description

    Since 1999, the Geological Survey of Baden-Württemberg has published a statewide geological map series at 1 : 50 000, the "Karte der mineralischen Rohstoffe 1 : 50 000 (KMR 50)". It shows the distribution of near-surface mineral raw material prospects and occurrences (mainly) and deposits (subordinately). This continuously completed and updated map currently covers around 60% of the federal state. It is the basis for the regional associations' task of mineral planning.

    The prospects and occurrences are classified according to different raw material groups (e.g. raw material for crushed stone (limestone, igneous rocks, metamorphic rocks, sand and gravel), raw materials for cement, dimension stone, high purity limestone, gypsum ...). Their spatial delineation is based on various group-specific criteria such as minimum workable thickness, minimum resources, ratio overburden/workable thickness, and so on. It is assumed that they contain deposits as a whole or in parts. In the vast majority of cases, the data is not sufficient for the immediate planning of mining projects, but it does facilitate the selection of exploration areas.

    The name of each area (e.g. L 6926-3) consists of three parts: L = Roman numeral for 50, 6926 = sheet number of the topographic map 1 : 50 000, and 3 = number of the area/mineral occurrence shown on this sheet.

    Co-occurring land-use conflicts, e.g. water protection areas and nature conservation areas, forestry and agriculture, are not taken into account in the processing of KMR 50. Their assessment is the task of land use planning, the licensing authorities and the companies interested in mining.

    The data is stored in the statewide raw material area database "olan-db" of the LGRB.

  12. Data from: Text mining for neuroanatomy using WhiteText with an updated...

    • borealisdata.ca
    • search.dataone.org
    Updated Mar 11, 2019
    Cite
    Leon French; Paul Pavlidis (2019). Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application [Dataset]. http://doi.org/10.5683/SP2/4J5NHT
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 11, 2019
    Dataset provided by
    Borealis
    Authors
    Leon French; Paul Pavlidis
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    NSERC, NIH
    Description

    We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/.
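    The precision and recall figures quoted above can be read as follows; a minimal sketch of the standard definitions applied to extracted connectivity statements (the region pairs and the `precision_recall` helper are hypothetical, not from the corpus):

```python
def precision_recall(predicted, gold):
    """Precision and recall of extracted connectivity statements.

    predicted, gold: sets of (region_a, region_b) pairs."""
    tp = len(predicted & gold)  # statements both extracted and in the gold corpus
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {("thalamus", "cortex"), ("amygdala", "hippocampus"), ("pons", "medulla")}
pred = {("thalamus", "cortex"), ("amygdala", "cortex")}
p, r = precision_recall(pred, gold)  # p = 0.5, r ≈ 0.33
```

    So "recalling 67% of connectivity statements at 51% precision" means roughly two thirds of the curated statements were found, and about half of the extracted statements were correct.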

  13. Sentiment Analysis of the Presidential US Election 2016

    • repository.lboro.ac.uk
    html
    Updated May 31, 2023
    Cite
    Martin Sykora; Thomas Jackson; Suzanne Elayan (2023). Sentiment Analysis of the Presidential US Election 2016 [Dataset]. http://doi.org/10.17028/rd.lboro.4040589.v1
    Explore at:
    Available download formats: html
    Dataset updated
    May 31, 2023
    Dataset provided by
    Loughborough University
    Authors
    Martin Sykora; Thomas Jackson; Suzanne Elayan
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    This project provides a web-based interactive set of visualisations generated from advanced sentiment analysis of live social media (Twitter) data covering the 2016 US presidential election.
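At its simplest, sentiment analysis of tweets can be sketched as lexicon-based scoring; the word lists and scoring rule below are illustrative assumptions only and do not reflect this project's actual, more advanced sentiment pipeline.

```python
import re

# Illustrative word lists; a real lexicon would be far larger and
# would handle negation, intensifiers, emoji, etc.
POSITIVE = {"great", "win", "hope", "strong"}
NEGATIVE = {"bad", "lose", "fear", "weak"}

def sentiment(tweet: str) -> int:
    """Score one tweet as (#positive words) - (#negative words)."""
    words = re.findall(r"[a-z]+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("Great rally, strong turnout tonight"))  # 2
print(sentiment("Bad debate, fear of losing"))           # -2
```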

  14. Signaling Pathways Project

    • neuinfo.org
    • dknet.org
    • +1more
    Updated Jan 29, 2022
    Cite
    (2022). Signaling Pathways Project [Dataset]. http://identifiers.org/RRID:SCR_018412
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Web multi-omics knowledgebase based upon public, manually curated transcriptomic and cistromic datasets involving genetic and small molecule manipulations of cellular receptors, enzymes and transcription factors. Integrated omics knowledgebase for mammalian cellular signaling pathways. The web browser interface was designed to accommodate numerous routine data mining strategies. Datasets are biocurated versions of publicly archived datasets, are formatted according to the recommendations of the FORCE11 Joint Declaration on Data Citation Principles, and are made available under a Creative Commons CC 3.0 BY license. Original datasets are available.

  15. Metals And Mining Liability Insurance Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    + more versions
    Cite
    Dataintelo (2025). Metals And Mining Liability Insurance Market Research Report 2033 [Dataset]. https://dataintelo.com/report/metals-and-mining-liability-insurance-market
    Explore at:
    Available download formats: pptx, csv, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Metals and Mining Liability Insurance Market Outlook

    According to our latest research, the global metals and mining liability insurance market size reached USD 11.2 billion in 2024, driven by increasing regulatory scrutiny and risk management needs across the mining sector. The market is expected to grow at a robust CAGR of 6.7% from 2025 to 2033, reaching an anticipated USD 19.4 billion by 2033. This growth is primarily fueled by heightened environmental concerns, stricter compliance mandates, and the expansion of mining operations globally, all of which are compelling industry stakeholders to prioritize comprehensive liability coverage.

    One of the most significant growth factors driving the metals and mining liability insurance market is the increasing complexity of regulatory frameworks governing mining operations worldwide. Governments and international bodies are continually updating environmental and safety standards, which has led to a surge in demand for specialized liability insurance products. Mining companies face an intricate web of risks, including environmental damage, third-party injuries, and property loss, which can result in substantial financial liabilities. As a result, there is a marked shift towards more comprehensive and customizable insurance solutions that address these evolving risks. The growing awareness among mining operators regarding the importance of risk transfer mechanisms is further propelling the adoption of liability insurance across the sector.

    Another pivotal growth driver is the rapid expansion of mining activities in emerging economies, particularly in Asia Pacific and Latin America. These regions are witnessing significant investments in both surface and underground mining projects, spurred by rising demand for metals such as copper, lithium, and rare earth elements. As mining operations become more extensive and technologically advanced, the exposure to environmental hazards, worker safety incidents, and operational disruptions also escalates. This has led to a corresponding increase in the uptake of liability insurance policies, as stakeholders seek to mitigate the financial and reputational risks associated with large-scale mining ventures. Insurance providers are responding by developing tailored products that address the unique risk profiles of different mining activities and geographies.

    Technological advancements and digital transformation within the metals and mining sector are also contributing to market growth. The integration of automation, IoT devices, and data analytics in mining operations has enhanced operational efficiency but introduced new liabilities related to cyber threats, equipment malfunctions, and data breaches. Insurers are increasingly offering specialized coverage for these emerging risks, which is attracting a broader range of clients. Additionally, the adoption of digital platforms for policy management, claims processing, and risk assessment is streamlining the insurance procurement process, making it easier for mining companies and contractors to access and manage their liability coverage. This digital shift is expected to further accelerate market growth in the coming years.

    From a regional perspective, Asia Pacific is emerging as the dominant market for metals and mining liability insurance, accounting for a significant share of global premiums in 2024. The region’s rapid industrialization, coupled with ongoing investments in mining infrastructure, has created a fertile ground for insurance providers. North America and Europe also represent substantial markets, driven by stringent regulatory environments and the presence of major mining conglomerates. Meanwhile, Latin America and the Middle East & Africa are witnessing steady growth, supported by new project developments and increasing awareness of liability risks. The regional outlook for the metals and mining liability insurance market remains positive, with all key regions expected to contribute to the overall expansion of the industry.

    Coverage Type Analysis

    The metals and mining liability insurance market is segmented by coverage type into general liability, environmental liability, professional liability, workers’ compensation, product liability, and others. General liability insurance remains the cornerstone of risk management for mining companies, offering protection against third-party claims for bodily injury, property damage, and a

  16. Data from: Meiofauna abundance and distribution predicted with random forest...

    • service.tib.eu
    • doi.pangaea.de
    Updated Nov 30, 2024
    + more versions
    Cite
    (2024). Meiofauna abundance and distribution predicted with random forest regression in the German exploration area for polymetallic nodule mining, Clarion Clipperton Fracture Zone, Pacific [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-912217
    Explore at:
    Dataset updated
    Nov 30, 2024
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pacific Ocean, Clipperton Island
    Description

    The dataset contains counts of meiofauna organisms at a high taxonomic level, together with predicted distributions computed for overall meiofauna abundance, diversity (Simpson's Index D and Evenness E), richness (ntax) and individual taxa using random forest regressions. Furthermore, a habitat map is provided, dividing the area based on k-means clustering of the combined predicted distributions, bathymetry and backscatter. The spatial layers are saved as grid-files, the standard format of the R package "raster" (https://cran.r-project.org/web/packages/raster/index.html). The study area is an area allocated to the German Federal Institute for Geosciences and Natural Resources for the exploration of polymetallic nodule mining. Deep-sea mining severely endangers benthic communities; hence defining preservation zones, not only for conservation but also to enable the re-settlement of mined areas, is highly important. These datasets on the spatial distribution of meiofauna have been used in a modelling approach to find areas with similar environmental conditions and similar benthic communities.
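The habitat-map step described above (k-means clustering of per-cell feature vectors combining predicted distributions, bathymetry and backscatter) can be sketched as follows. The feature columns, cell count and number of clusters are illustrative assumptions, and a tiny k-means is written out in NumPy rather than reproducing the authors' R workflow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed per-cell features: predicted abundance, bathymetry, backscatter
# (standardised), for 300 hypothetical raster cells
features = rng.normal(size=(300, 3))

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: returns one cluster label per row of X."""
    gen = np.random.default_rng(seed)
    centers = X[gen.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every cell to its nearest cluster centre
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # move each centre to the mean of its assigned cells
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

habitat = kmeans(features, k=4)  # one habitat class per cell
print(habitat.shape)
```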

  17. Distribution of meiofauna abundance predicted in a potential deep-sea mining...

    • doi.pangaea.de
    html, tsv
    Updated Jan 29, 2021
    Cite
    Katja Uhlenkott; Annemiek Vink; Thomas Kuhn; Benjamin Gillard; Pedro Martínez Arbizu (2021). Distribution of meiofauna abundance predicted in a potential deep-sea mining area in the Clarion Clipperton Fracture Zone (CCZ) [Dataset]. http://doi.org/10.1594/PANGAEA.927217
    Explore at:
    Available download formats: html, tsv
    Dataset updated
    Jan 29, 2021
    Dataset provided by
    PANGAEA
    Authors
    Katja Uhlenkott; Annemiek Vink; Thomas Kuhn; Benjamin Gillard; Pedro Martínez Arbizu
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Variables measured
    Binary Object, Binary Object (File Size)
    Description

    The dataset contains predicted distributions of meiofauna at a high taxonomic level, computed for overall meiofauna abundance, diversity (Simpson's Index D and Evenness E), richness (ntax) and individual taxa using random forest regressions. The study area is a prospective mining operation area within the German License Area for the Exploration of polymetallic nodules in the Clarion Clipperton Fracture Zone (CCZ). To investigate the influence of different environmental predictors, predictions are based on backscatter value and bathymetric variables only, on spatially predicted sediment and polymetallic nodule parameters, and on all of these environmental variables combined. To investigate inter-annual differences, predictions are based on samples obtained solely in 2013, 2014 and 2016, respectively. The spatial layers are saved as grid-files, the standard format of the R package "raster" (https://cran.r-project.org/web/packages/raster/index.html).

  18. Megafauna distribution predicted with random forest classification in the...

    • doi.pangaea.de
    zip
    Updated Aug 3, 2022
    Cite
    Katja Uhlenkott; Erik Simon-Lledó; Annemiek Vink; Pedro Martínez Arbizu (2022). Megafauna distribution predicted with random forest classification in the German contract area for polymetallic nodule mining, Clarion Clipperton Fracture Zone, Pacific [Dataset]. http://doi.org/10.1594/PANGAEA.946804
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 3, 2022
    Dataset provided by
    PANGAEA
    Authors
    Katja Uhlenkott; Erik Simon-Lledó; Annemiek Vink; Pedro Martínez Arbizu
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    The dataset contains grid-files of the predicted probability of occurrence and non-occurrence for the 68 morphotypes across the German contract area allocated to the German Federal Institute for Geosciences and Natural Resources (BGR) for the exploration of polymetallic nodules in the Clarion Clipperton Fracture Zone, Pacific, predicted with random forest classification. All grid-files are saved in the standard format of the R package "raster" (https://cran.r-project.org/web/packages/raster/index.html) and contain 2 bands, the first holding the predicted probability of absence and the second the predicted probability of presence of the specific morphotype.
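The two probability bands described above map naturally onto the class-probability output of a random forest classifier. The sketch below uses scikit-learn on synthetic covariates and presence/absence labels; the covariates, data and parameters are all assumptions for illustration (the original work used R), but the two-column probability output mirrors the absence/presence band pair.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic covariates and presence/absence labels for one hypothetical morphotype
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# predict_proba returns one column per class, matching the two raster
# bands: column 0 = P(absence), column 1 = P(presence)
proba = clf.predict_proba(rng.normal(size=(10, 3)))
print(proba.shape)  # (10, 2)
```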
