18 datasets found
  1. Web Mining for Collaborative Food Delivery

    • kaggle.com
    zip
    Updated Aug 26, 2023
    Cite
    Jocelyn Dumlao (2023). Web Mining for Collaborative Food Delivery [Dataset]. https://www.kaggle.com/datasets/jocelyndumlao/web-mining-for-collaborative-food-delivery
    Explore at:
    zip (396,903 bytes)
    Dataset updated
    Aug 26, 2023
    Authors
    Jocelyn Dumlao
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This is the main dataset built for the work titled "A Web Mining Approach to Collaborative Consumption of Food Delivery Services", an official institutional research project of Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz.

    Categories

    Urban Transportation, Consumer, e-Commerce Retail

    Acknowledgements & Source

    Professor Juan C. Correa at Fundación Universitaria Konrad Lorenz


  2. Web Mining Dataset

    • universe.roboflow.com
    zip
    Updated May 17, 2023
    Cite
    Web Mining Project (2023). Web Mining Dataset [Dataset]. https://universe.roboflow.com/web-mining-project/web-mining/model/1
    Explore at:
    zip
    Dataset updated
    May 17, 2023
    Dataset authored and provided by
    Web Mining Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Zebra Cross
    Description

    Web Mining

    ## Overview
    
    Web Mining is a dataset for classification tasks - it contains Zebra Cross annotations for 1,494 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  3. Knowledge Graph: tyrolean mining documents 15th and 16th century

    • zenodo.org
    • data-staging.niaid.nih.gov
    • +1more
    bin
    Updated Sep 26, 2024
    Cite
    Gerald Hiebel; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Brigit Danthine (2024). Knowledge Graph: tyrolean mining documents 15th and 16th century [Dataset]. http://doi.org/10.5281/zenodo.6276586
    Explore at:
    bin
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gerald Hiebel; Elisabeth Gruber-Tokić; Milena Peralta Friedburg; Brigit Danthine
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a Knowledge Graph (.nq file) of two historical mining documents: “Verleihbuch der Rattenberger Bergrichter” (Hs. 37, 1460-1463) and “Schwazer Berglehenbuch” (Hs. 1587, approx. 1515), stored by the Tyrolean Regional Archive, Innsbruck (Austria). Users of the KG may explore the montanistic network and the relations between people, claims and mines in late medieval Tyrol. The core regions concern the districts Schwaz and Kufstein (Tyrol, Austria).

    The ontology used to represent the claims is CIDOC CRM, an ISO-certified ontology for cultural heritage documentation. Supported by the Karma tool, the KG is generated as RDF (Resource Description Framework). The generated RDF data is imported into a triplestore, in this case GraphDB, and then displayed visually. This puts the data from the early mining texts into a semantically structured context and makes the mutual relationships between people, places and mines visible.

    Both documents and the Knowledge Graph were processed and generated by the research team of the project “Text Mining Medieval Mining Texts”. The research project (2019-2022) was carried out at the University of Innsbruck and funded by the go!digital next generation programme of the Austrian Academy of Sciences.

    Citable transcripts of the historical documents are available online:
    Hs. 37 DOI: 10.5281/zenodo.6274562
    Hs. 1587 DOI: 10.5281/zenodo.6274928

  4. model web mining project akhir

    • kaggle.com
    zip
    Updated May 14, 2023
    Cite
    Cornelius Justin Satryo Hadi (2023). model web mining project akhir [Dataset]. https://www.kaggle.com/datasets/corneliusjustin/model-web-mining-project-akhir
    Explore at:
    zip (128,427,024 bytes)
    Dataset updated
    May 14, 2023
    Authors
    Cornelius Justin Satryo Hadi
    Description

    Dataset

    This dataset was created by Cornelius Justin Satryo Hadi


  5. Simulated supermarket transaction data

    • researchdatafinder.qut.edu.au
    • researchdata.edu.au
    Updated May 31, 2010
    Cite
    Yuefeng Li (2010). Simulated supermarket transaction data [Dataset]. https://researchdatafinder.qut.edu.au/individual/q44
    Explore at:
    Dataset updated
    May 31, 2010
    Dataset provided by
    Queensland University of Technology (QUT)
    Authors
    Yuefeng Li
    Description

    A database of de-identified supermarket customer transactions. This large simulated dataset was created based on a real data sample.

  6. Cw Detection Dataset

    • universe.roboflow.com
    zip
    Updated May 17, 2023
    Cite
    Web Mining Project (2023). Cw Detection Dataset [Dataset]. https://universe.roboflow.com/web-mining-project/cw-detection/model/1
    Explore at:
    zip
    Dataset updated
    May 17, 2023
    Dataset authored and provided by
    Web Mining Project
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Crosswalks Bounding Boxes
    Description

    CW Detection

    ## Overview
    
    CW Detection is a dataset for object detection tasks - it contains Crosswalks annotations for 2,512 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License

    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  7. Sustaining growth for innovative new enterprises: UK firm data

    • datacatalogue.cessda.eu
    Updated Sep 26, 2025
    Cite
    Sensier, M; Gök, A; Shapira, P (2025). Sustaining growth for innovative new enterprises: UK firm data [Dataset]. http://doi.org/10.5255/UKDA-SN-851779
    Explore at:
    Dataset updated
    Sep 26, 2025
    Dataset provided by
    University of Manchester
    Authors
    Sensier, M; Gök, A; Shapira, P
    Time period covered
    Jan 1, 2012 - Dec 31, 2014
    Area covered
    United Kingdom
    Variables measured
    Organization
    Measurement technique
    We collected the financial information on the UK firms by downloading Companies House data from the FAME database, available through the University of Manchester Library (see http://www.library.manchester.ac.uk/searchresources/databases/f/). Grant information on companies came from the Technology Strategy Board. Patent information was from the Derwent database and publication information was from the Web of Science. The Consumer Price Index was from the Office for National Statistics (http://www.ons.gov.uk/ons/rel/cpi/consumer-price-indices/index.html). The Human Resources in Science and Technology variable was from the Eurostat database (http://ec.europa.eu/eurostat/data/database). Unstructured data was mined from firms' websites. The UK Intellectual Property Office has clarified that the data mining we are doing, and the way we are doing it, is permissible. See: https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/375954/Research.pdf
    Description

    To select the group of UK firms, we initially searched the FAME database (available from the University of Manchester Library) with keywords relating to the green goods sector; please see Shapira et al. (2014, Technological Forecasting & Social Change, vol. 85, pp. 93-104) for further details on the keywords. This database contains anonymized firm data from a sample of UK firms in the green goods production industry. We combine data from structured sources (the FAME database, patents and publications) with unstructured data mined from firms' websites, saving keywords found in the text and summing their counts to create additional explanatory variables for firm growth. The data form a panel from 2003-2012, with some observations missing for some firms. We collect historical data from firms' websites available in an archive from the Wayback Machine.
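    The keyword-count approach described above can be sketched as follows. This is only an illustration of the general technique, not the project's actual code; the keyword list and the `keyword_counts` helper are hypothetical.

```python
import re

# Hypothetical green-goods keywords; the real list follows Shapira et al. (2014).
KEYWORDS = ["solar", "recycling", "wind turbine", "biomass"]

def keyword_counts(page_text, keywords=KEYWORDS):
    """Count whole-word occurrences of each keyword in a firm's web page text."""
    text = page_text.lower()
    return {kw: len(re.findall(r"\b" + re.escape(kw) + r"\b", text))
            for kw in keywords}

page = "Our solar panels and solar heaters use recycling-friendly biomass."
counts = keyword_counts(page)
total = sum(counts.values())  # summed count used as an explanatory variable
```

    Summing the per-keyword counts per firm-year yields the kind of additional explanatory variable for firm growth that the description mentions.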

    This project probes the growth strategies of innovative small and medium-size enterprises (SMEs). Our research focuses on emerging green goods industries that manufacture outputs which benefit the environment or conserve natural resources, with an international comparative element involving the UK, the US, and China.

    The project investigates the contributions of strategy, resources and relationships to how innovative British, American, and Chinese SMEs achieve significant growth. The targeted technology-oriented green goods sectors are strategically important to environmental rebalancing and have significant potential (in the UK) for export growth. The research examines the diverse pathways to innovation and growth across different regions. We use a mix of methodologies, including analyses of structured and unstructured data on SME business and technology performance and strategies, case studies, and modelling. Novel approaches using web mining are pioneered to gain timely information about enterprise developmental pathways. Findings from the project will be used to inform management and policy development at enterprise, regional and national levels.

    The project is led by the Manchester Institute of Innovation Research at the University of Manchester, in collaboration with Georgia Institute of Technology, US; Beijing Institute of Technology, China, and Experian, UK.

  8. Product data mining: entity classification&linking

    • kaggle.com
    zip
    Updated Jul 13, 2020
    Cite
    zzhang (2020). Product data mining: entity classification&linking [Dataset]. https://www.kaggle.com/ziqizhang/product-data-miningentity-classificationlinking
    Explore at:
    zip (10,933 bytes)
    Dataset updated
    Jul 13, 2020
    Authors
    zzhang
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    IMPORTANT: Round 1 results are now released; check our website for the leaderboard. Round 2 submissions are now open!

    1. Overview

    We release two datasets that are part of the Semantic Web Challenge on Mining the Web of HTML-embedded Product Data, co-located with the 19th International Semantic Web Conference (https://iswc2020.semanticweb.org/, 2-6 Nov 2020, Athens, Greece). The datasets belong to two shared tasks related to product data mining on the Web: (1) product matching (linking) and (2) product classification. This event is organised by The University of Sheffield, The University of Mannheim and Amazon, and is open to anyone. Systems that successfully beat the baseline of the respective task will be invited to write a paper describing their method and system, and to present the method as a poster (and potentially also a short talk) at the ISWC2020 conference. Winners of each task will be awarded a 500 euro prize (partly sponsored by Peak Indicators, https://www.peakindicators.com/).

    2. Task and dataset brief

    The challenge organises two tasks, product matching and product categorisation.

    i) Product Matching deals with identifying product offers on different websites that refer to the same real-world product (e.g., the same iPhone X model offered under different names/offer titles, with different descriptions, on various websites). A multi-million product offer corpus (16M) containing product offer clusters is released for the generation of training data. A validation set containing 1.1K offer pairs and a test set of 600 offer pairs will also be released. The goal of this task is to classify whether the offer pairs in these datasets are a match (i.e., referring to the same product) or a non-match.

    ii) Product classification deals with assigning predefined product category labels (which can be multiple levels) to product instances (e.g., iPhone X is a ‘SmartPhone’, and also ‘Electronics’). A training dataset containing 10K product offers, a validation set of 3K product offers and a test set of 3K product offers will be released. Each dataset contains product offers with their metadata (e.g., name, description, URL) and three classification labels each corresponding to a level in the GS1 Global Product Classification taxonomy. The goal is to classify these product offers into the pre-defined category labels.

    All datasets are built based on structured data that was extracted from the Common Crawl (https://commoncrawl.org/) by the Web Data Commons project (http://webdatacommons.org/). Datasets can be found at: https://ir-ischool-uos.github.io/mwpd/

    3. Resources and tools

    The challenge will also release utility code (in Python) for processing the above datasets and scoring the system outputs. In addition, the following language resources for product-related data mining tasks will be released: a text corpus of 150 million product offer descriptions, and word embeddings trained on this corpus.

    4. Challenge website

    For details of the challenge please visit https://ir-ischool-uos.github.io/mwpd/

    5. Organizing committee

    Dr Ziqi Zhang (Information School, The University of Sheffield)
    Prof. Christian Bizer (Institute of Computer Science and Business Informatics, University of Mannheim)
    Dr Haiping Lu (Department of Computer Science, The University of Sheffield)
    Dr Jun Ma (Amazon Inc., Seattle, US)
    Prof. Paul Clough (Information School, The University of Sheffield & Peak Indicators)
    Ms Anna Primpeli (Institute of Computer Science and Business Informatics, University of Mannheim)
    Mr Ralph Peeters (Institute of Computer Science and Business Informatics, University of Mannheim)
    Mr Abdulkareem Alqusair (Information School, The University of Sheffield)

    6. Contact

    To contact the organising committee please use the Google discussion group https://groups.google.com/forum/#!forum/mwpd2020

  9. LScD (Leicester Scientific Dictionary)

    • figshare.le.ac.uk
    docx
    Updated Apr 15, 2020
    + more versions
    Cite
    Neslihan Suzen (2020). LScD (Leicester Scientific Dictionary) [Dataset]. http://doi.org/10.25392/leicester.data.9746900.v3
    Explore at:
    docx
    Dataset updated
    Apr 15, 2020
    Dataset provided by
    University of Leicester
    Authors
    Neslihan Suzen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Leicester
    Description

    LScD (Leicester Scientific Dictionary), April 2020, by Neslihan Suzen, PhD student at the University of Leicester (ns433@leicester.ac.uk / suzenneslihan@hotmail.com). Supervised by Prof Alexander Gorban and Dr Evgeny Mirkes.

    [Version 3] The third version of LScD (Leicester Scientific Dictionary) is created from the updated LSC (Leicester Scientific Corpus), Version 2*. All pre-processing steps applied to build the new version of the dictionary are the same as in Version 2** and can be found in the description of Version 2 below; we do not repeat the explanation. After pre-processing, the total number of unique words in the new version of the dictionary is 972,060. The files provided with this description are the same as described for LScD Version 2 below.

    * Suzen, Neslihan (2019): LSC (Leicester Scientific Corpus). figshare. Dataset. https://doi.org/10.25392/leicester.data.9449639.v2
    ** Suzen, Neslihan (2019): LScD (Leicester Scientific Dictionary). figshare. Dataset. https://doi.org/10.25392/leicester.data.9746900.v2

    [Version 2] Getting Started

    This document provides the pre-processing steps for creating an ordered list of words from the LSC (Leicester Scientific Corpus) [1] and the description of LScD (Leicester Scientific Dictionary). This dictionary was created to be used in future work on the quantification of the meaning of research texts. R code for producing the dictionary from the LSC, together with instructions for using the code, is available in [2]. The code can also be used for lists of texts from other sources; amendments to the code may be required.

    LSC is a collection of abstracts of articles and proceedings papers published in 2014 and indexed by the Web of Science (WoS) database [3]. Each document contains title, list of authors, list of categories, list of research areas, and times cited. The corpus contains only documents in English. The corpus was collected in July 2018 and contains the number of citations from publication date to July 2018. The total number of documents in LSC is 1,673,824.

    LScD is an ordered list of words from the texts of abstracts in LSC. The dictionary stores 974,238 unique words, sorted by the number of documents containing the word in descending order. All words in the LScD are in stemmed form. The LScD contains the following information:
    1. Unique words in abstracts
    2. Number of documents containing each word
    3. Number of appearances of a word in the entire corpus

    Processing the LSC

    Step 1. Downloading the LSC Online: Use of the LSC is subject to acceptance of a request for the link by email. To access the LSC for research purposes, please email ns433@le.ac.uk. The data are extracted from Web of Science [3]. You may not copy or distribute these data in whole or in part without the written consent of Clarivate Analytics.

    Step 2. Importing the Corpus to R: The full R code for processing the corpus can be found on GitHub [2]. All following steps can be applied to an arbitrary list of texts from any source, with changes of parameters. The structure of the corpus, such as file format and the names (and positions) of fields, should be taken into account when applying our code. The organisation of the CSV files of LSC is described in the README file for LSC [1].

    Step 3. Extracting Abstracts and Saving Metadata: Metadata, i.e. all fields in a document excluding the abstract, are separated from the abstracts and saved as MetaData.R. Fields of metadata are: List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.

    Step 4. Text Pre-processing Steps on the Collection of Abstracts: In this section we present our approaches to pre-processing the abstracts of the LSC.
    1. Removing punctuation and special characters: all non-alphanumeric characters are replaced by spaces. We did not substitute the character “-” in this step, because we need to keep words like “z-score”, “non-payment” and “pre-processing” in order not to lose their actual meaning. Uniting prefixes with words is performed in later steps of pre-processing.
    2. Lowercasing the text data: lowercasing is performed to avoid treating words like “Corpus”, “corpus” and “CORPUS” differently. The entire collection of texts is converted to lowercase.
    3. Uniting prefixes of words: words containing prefixes joined with the character “-” are united into one word. The prefixes united for this research are listed in the file “list_of_prefixes.csv”. Most of the prefixes are extracted from [4]. We also added commonly used prefixes: ‘e’, ‘extra’, ‘per’, ‘self’ and ‘ultra’.
    4. Substitution of words: some words joined with “-” in the abstracts of the LSC require an additional substitution step to avoid losing their meaning before removing the character “-”. Examples of such words are “z-test”, “well-known” and “chi-square”, which have been substituted by “ztest”, “wellknown” and “chisquare”. Identification of such words was done by sampling abstracts from LSC. The full list of such words and the substitution decisions are presented in the file “list_of_substitution.csv”.
    5. Removing the character “-”: all remaining “-” characters are replaced by spaces.
    6. Removing numbers: all digits not included in a word are replaced by spaces. All words that contain both digits and letters are kept, because alphanumeric tokens such as chemical formulas might be important for our analysis; examples are “co2”, “h2o” and “21st”.
    7. Stemming: stemming is the process of converting inflected words into their word stem. This step unites several forms of words with similar meaning into one form, and also saves memory space and time [5]. All words in the LScD are stemmed to their word stem.
    8. Stop word removal: stop words are words that are extremely common but provide little value in a language; some common stop words in English are ‘I’, ‘the’, ‘a’, etc. We used the ‘tm’ package in R to remove stop words [6]; there are 174 English stop words listed in the package.

    Step 5. Writing the LScD into CSV Format: There are 1,673,824 plain processed texts for further analysis. All unique words in the corpus are extracted and written to the file “LScD.csv”.

    The Organisation of the LScD

    The total number of words in the file “LScD.csv” is 974,238. Each field is described below:
    Word: unique words from the corpus, in lowercase and stemmed form. The field is sorted by the number of documents containing the word, in descending order.
    Number of Documents Containing the Word: a binary count is used: if a word exists in an abstract, it counts as 1, even if it occurs more than once in that document. The total number of documents containing the word is the sum of these 1s over the entire corpus.
    Number of Appearances in Corpus: how many times a word occurs in the corpus when the corpus is considered as one large document.

    Instructions for R Code

    LScD_Creation.R is an R script for processing the LSC to create an ordered list of words from the corpus [2]. Outputs of the code are saved as an RData file and in CSV format. Outputs of the code are:
    Metadata File: all fields in a document excluding abstracts. Fields are List_of_Authors, Title, Categories, Research_Areas, Total_Times_Cited and Times_cited_in_Core_Collection.
    File of Abstracts: all abstracts after the pre-processing steps defined in Step 4.
    DTM: the Document Term Matrix constructed from the LSC [6]. Each entry of the matrix is the number of times a word occurs in the corresponding document.
    LScD: an ordered list of words from LSC, as defined in the previous section.

    The code can be used as follows:
    1. Download the folder ‘LSC’, ‘list_of_prefixes.csv’ and ‘list_of_substitution.csv’.
    2. Open the LScD_Creation.R script.
    3. Change the parameters in the script: replace with the full path of the directory with source files and the full path of the directory to write output files.
    4. Run the full code.

    References
    [1] N. Suzen. (2019). LSC (Leicester Scientific Corpus) [Dataset]. Available: https://doi.org/10.25392/leicester.data.9449639.v1
    [2] N. Suzen. (2019). LScD-LEICESTER SCIENTIFIC DICTIONARY CREATION. Available: https://github.com/neslihansuzen/LScD-LEICESTER-SCIENTIFIC-DICTIONARY-CREATION
    [3] Web of Science. (15 July). Available: https://apps.webofknowledge.com/
    [4] A. Thomas, "Common Prefixes, Suffixes and Roots," Center for Development and Learning, 2013.
    [5] C. Ramasubramanian and R. Ramya, "Effective pre-processing activities in text mining using improved Porter's stemming algorithm," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 12, pp. 4536-4538, 2013.
    [6] I. Feinerer, "Introduction to the tm Package: Text Mining in R," available online: https://cran.r-project.org/web/packages/tm/vignettes/tm.pdf, 2013.
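    The two per-word counts stored in LScD (number of documents containing the word, and total appearances in the corpus) can be sketched as follows. The original pipeline is the R code in [2]; this Python sketch with a hypothetical `build_dictionary` helper only illustrates the counting logic on already pre-processed abstracts.

```python
from collections import Counter

def build_dictionary(abstracts):
    """Build an LScD-style word list from pre-processed abstract texts.

    Returns (word, n_docs, n_total) tuples sorted by the number of
    documents containing the word, in descending order."""
    doc_freq = Counter()     # binary count: 1 per document containing the word
    corpus_freq = Counter()  # total occurrences across the whole corpus
    for text in abstracts:
        tokens = text.split()
        corpus_freq.update(tokens)
        doc_freq.update(set(tokens))  # each word counted at most once per document
    return sorted(
        ((w, doc_freq[w], corpus_freq[w]) for w in doc_freq),
        key=lambda row: row[1],
        reverse=True,
    )

rows = build_dictionary(["corpus mining corpus", "mining text", "text corpus"])
# e.g. ("corpus", 2, 3): appears in 2 documents, 3 occurrences in total
```

    The `set(tokens)` step implements the binary per-document count described above: a word occurring several times in one abstract still contributes only 1 to its document count.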

  10. Link structures of collections of academic web sites

    • figshare.com
    zip
    Updated May 31, 2023
    Cite
    Mike Thelwall (2023). Link structures of collections of academic web sites [Dataset]. http://doi.org/10.6084/m9.figshare.785776.v4
    Explore at:
    zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Mike Thelwall
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Databases of academic web links, 2000-2006. This project was created for research into web links, including web link mining and the creation of link metrics. It aims to provide the raw data and software for researchers to analyse link structures without having to rely upon commercial search engines, and without having to run their own web crawler. You may use all of the resources on this site for non-commercial purposes, but please notify us if you publish an academic paper or book that uses the data in any way (so that we know the site is getting good use).

  11. MIN4EU LGRB-BW: near-surface mineral raw material occurrences - harmonized...

    • daten-bw.de
    • ckan.mobidatalab.eu
    • +4more
    Updated Nov 18, 2025
    + more versions
    Cite
    Geoportal Baden-Württemberg (2025). MIN4EU LGRB-BW: near-surface mineral raw material occurrences - harmonized dataset [Dataset]. https://www.daten-bw.de/de/web/guest/suchen/-/details/min4eu-lgrb-bw-near-surface-mineral-raw-material-occurrences-harmonized-dataset
    Explore at:
    WFS service (http://publications.europa.eu/resource/authority/file-type/wfs_srvc)
    Dataset updated
    Nov 18, 2025
    Dataset provided by
    Regierungspraesidium Freiburg - Dept. 9 State Authority for Geology, Mineral Resources and Mining, Ref. 96 state raw material geology
    Authors
    Geoportal Baden-Württemberg
    License

    http://dcat-ap.de/def/licenses/other-closed

    Description

    Since 1999, the Geological Survey of Baden-Württemberg has published a statewide geological map series at 1 : 50 000, the "Karte der mineralischen Rohstoffe 1 : 50 000 (KMR 50)". It shows the distribution of near-surface mineral raw material prospects and occurrences (mainly) and deposits (subordinately). This continuously completed and updated map currently covers around 60% of the federal state. It is the basis for the regional associations' task of mineral planning.

    The prospects and occurrences are classified according to different raw material groups (e.g. raw material for crushed stone (limestone, igneous rocks, metamorphic rocks, sand and gravel), raw materials for cement, dimension stone, high purity limestone, gypsum ...). Their spatial delineation is based on various group-specific criteria such as minimum workable thickness, minimum resources, ratio overburden/workable thickness, and so on. It is assumed that they contain deposits as a whole or in parts. In the vast majority of cases, the data is not sufficient for the immediate planning of mining projects, but it does facilitate the selection of exploration areas.

    The name of each area (e.g. L 6926-3) consists of three parts: L = Roman numeral for 50, 6926 = sheet number of the topographic map 1 : 50 000, and 3 = number of the area/mineral occurrence shown on this sheet.

    Co-occurring land-use conflicts, e.g. water protection areas and nature conservation areas, forestry and agriculture, are not taken into account in the processing of KMR 50. Their assessment is the task of land use planning, the licensing authorities and the companies interested in mining.

    The data is stored in the statewide raw material area database "olan-db" of the LGRB.

  12. Data from: Text mining for neuroanatomy using WhiteText with an updated...

    • borealisdata.ca
    • search.dataone.org
    Updated Mar 11, 2019
    Cite
    Leon French; Paul Pavlidis (2019). Text mining for neuroanatomy using WhiteText with an updated corpus and a new web application [Dataset]. http://doi.org/10.5683/SP2/4J5NHT
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 11, 2019
    Dataset provided by
    Borealis
    Authors
    Leon French; Paul Pavlidis
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Dataset funded by
    NSERC, NIH
    Description

    We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/.
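    The precision and recall figures quoted above can be read as follows; a minimal sketch of the standard definitions applied to extracted connectivity statements (the region pairs and the `precision_recall` helper are hypothetical, not from the corpus):

```python
def precision_recall(predicted, gold):
    """Precision and recall of extracted connectivity statements.

    predicted, gold: sets of (region_a, region_b) pairs."""
    tp = len(predicted & gold)  # statements both extracted and in the gold corpus
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {("thalamus", "cortex"), ("amygdala", "hippocampus"), ("pons", "medulla")}
pred = {("thalamus", "cortex"), ("amygdala", "cortex")}
p, r = precision_recall(pred, gold)  # p = 0.5, r ≈ 0.33
```

    So "recalling 67% of connectivity statements at 51% precision" means roughly two thirds of the curated statements were found, and about half of the extracted statements were correct.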

  13. Sentiment Analysis of the Presidential US Election 2016

    • repository.lboro.ac.uk
    html
    Updated May 31, 2023
    Cite
    Martin Sykora; Thomas Jackson; Suzanne Elayan (2023). Sentiment Analysis of the Presidential US Election 2016 [Dataset]. http://doi.org/10.17028/rd.lboro.4040589.v1
    Explore at:
    Available download formats: html
    Dataset updated
    May 31, 2023
    Dataset provided by
    Loughborough University
    Authors
    Martin Sykora; Thomas Jackson; Suzanne Elayan
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    This project provides a web-based interactive set of visualisations generated from advanced sentiment analysis of live social media (Twitter) data covering the 2016 US presidential election.
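At its simplest, sentiment analysis of tweets can be sketched as lexicon-based scoring; the word lists and scoring rule below are illustrative assumptions only and do not reflect this project's actual, more advanced sentiment pipeline.

```python
import re

# Illustrative word lists; a real lexicon would be far larger and
# would handle negation, intensifiers, emoji, etc.
POSITIVE = {"great", "win", "hope", "strong"}
NEGATIVE = {"bad", "lose", "fear", "weak"}

def sentiment(tweet: str) -> int:
    """Score one tweet as (#positive words) - (#negative words)."""
    words = re.findall(r"[a-z]+", tweet.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment("Great rally, strong turnout tonight"))  # 2
print(sentiment("Bad debate, fear of losing"))           # -2
```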

  14. Signaling Pathways Project

    • neuinfo.org
    • dknet.org
    • +1more
    Updated Jan 29, 2022
    Cite
    (2022). Signaling Pathways Project [Dataset]. http://identifiers.org/RRID:SCR_018412
    Explore at:
    Dataset updated
    Jan 29, 2022
    Description

    Web multi-omics knowledgebase based upon public, manually curated transcriptomic and cistromic datasets involving genetic and small molecule manipulations of cellular receptors, enzymes and transcription factors. Integrated omics knowledgebase for mammalian cellular signaling pathways. The web browser interface was designed to accommodate numerous routine data mining strategies. Datasets are biocurated versions of publicly archived datasets, are formatted according to the recommendations of the FORCE11 Joint Declaration on Data Citation Principles, and are made available under a Creative Commons CC 3.0 BY license. Original datasets are available.

  15. Metals And Mining Liability Insurance Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    + more versions
    Cite
    Dataintelo (2025). Metals And Mining Liability Insurance Market Research Report 2033 [Dataset]. https://dataintelo.com/report/metals-and-mining-liability-insurance-market
    Explore at:
    Available download formats: pptx, csv, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Metals and Mining Liability Insurance Market Outlook

    According to our latest research, the global metals and mining liability insurance market size reached USD 11.2 billion in 2024, driven by increasing regulatory scrutiny and risk management needs across the mining sector. The market is expected to grow at a robust CAGR of 6.7% from 2025 to 2033, reaching an anticipated USD 19.4 billion by 2033. This growth is primarily fueled by heightened environmental concerns, stricter compliance mandates, and the expansion of mining operations globally, all of which are compelling industry stakeholders to prioritize comprehensive liability coverage.

    One of the most significant growth factors driving the metals and mining liability insurance market is the increasing complexity of regulatory frameworks governing mining operations worldwide. Governments and international bodies are continually updating environmental and safety standards, which has led to a surge in demand for specialized liability insurance products. Mining companies face an intricate web of risks, including environmental damage, third-party injuries, and property loss, which can result in substantial financial liabilities. As a result, there is a marked shift towards more comprehensive and customizable insurance solutions that address these evolving risks. The growing awareness among mining operators regarding the importance of risk transfer mechanisms is further propelling the adoption of liability insurance across the sector.

    Another pivotal growth driver is the rapid expansion of mining activities in emerging economies, particularly in Asia Pacific and Latin America. These regions are witnessing significant investments in both surface and underground mining projects, spurred by rising demand for metals such as copper, lithium, and rare earth elements. As mining operations become more extensive and technologically advanced, the exposure to environmental hazards, worker safety incidents, and operational disruptions also escalates. This has led to a corresponding increase in the uptake of liability insurance policies, as stakeholders seek to mitigate the financial and reputational risks associated with large-scale mining ventures. Insurance providers are responding by developing tailored products that address the unique risk profiles of different mining activities and geographies.

    Technological advancements and digital transformation within the metals and mining sector are also contributing to market growth. The integration of automation, IoT devices, and data analytics in mining operations has enhanced operational efficiency but introduced new liabilities related to cyber threats, equipment malfunctions, and data breaches. Insurers are increasingly offering specialized coverage for these emerging risks, which is attracting a broader range of clients. Additionally, the adoption of digital platforms for policy management, claims processing, and risk assessment is streamlining the insurance procurement process, making it easier for mining companies and contractors to access and manage their liability coverage. This digital shift is expected to further accelerate market growth in the coming years.

    From a regional perspective, Asia Pacific is emerging as the dominant market for metals and mining liability insurance, accounting for a significant share of global premiums in 2024. The region’s rapid industrialization, coupled with ongoing investments in mining infrastructure, has created a fertile ground for insurance providers. North America and Europe also represent substantial markets, driven by stringent regulatory environments and the presence of major mining conglomerates. Meanwhile, Latin America and the Middle East & Africa are witnessing steady growth, supported by new project developments and increasing awareness of liability risks. The regional outlook for the metals and mining liability insurance market remains positive, with all key regions expected to contribute to the overall expansion of the industry.

    Coverage Type Analysis

    The metals and mining liability insurance market is segmented by coverage type into general liability, environmental liability, professional liability, workers’ compensation, product liability, and others. General liability insurance remains the cornerstone of risk management for mining companies, offering protection against third-party claims for bodily injury, property damage, and a

  16. Data from: Meiofauna abundance and distribution predicted with random forest...

    • service.tib.eu
    • doi.pangaea.de
    Updated Nov 30, 2024
    + more versions
    Cite
    (2024). Meiofauna abundance and distribution predicted with random forest regression in the German exploration area for polymetallic nodule mining, Clarion Clipperton Fracture Zone, Pacific [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-912217
    Explore at:
    Dataset updated
    Nov 30, 2024
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Pacific Ocean, Clipperton Island
    Description

    The dataset contains counts of meiofauna organisms at a high taxonomic level, together with predicted distributions computed for overall meiofauna abundance, diversity (Simpson's Index D and Evenness E), richness (ntax) and individual taxa using random forest regressions. Furthermore, a habitat map is provided, dividing the area based on k-means clustering of the combined predicted distributions, bathymetry and backscatter. The spatial layers are saved as grid-files, the standard format of the R package "raster" (https://cran.r-project.org/web/packages/raster/index.html). The study area is an area allocated to the German Federal Institute for Geosciences and Natural Resources for the exploration of polymetallic nodule mining. Deep-sea mining severely endangers benthic communities; hence defining preservation zones, not only for conservation but also to enable the re-settlement of mined areas, is highly important. These datasets on the spatial distribution of meiofauna have been used in a modelling approach to find areas with similar environmental conditions and similar benthic communities.
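The habitat-map step described above (k-means clustering of per-cell feature vectors combining predicted distributions, bathymetry and backscatter) can be sketched as follows. The feature columns, cell count and number of clusters are illustrative assumptions, and a tiny k-means is written out in NumPy rather than reproducing the authors' R workflow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed per-cell features: predicted abundance, bathymetry, backscatter
# (standardised), for 300 hypothetical raster cells
features = rng.normal(size=(300, 3))

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: returns one cluster label per row of X."""
    gen = np.random.default_rng(seed)
    centers = X[gen.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign every cell to its nearest cluster centre
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # move each centre to the mean of its assigned cells
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

habitat = kmeans(features, k=4)  # one habitat class per cell
print(habitat.shape)
```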

  17. Distribution of meiofauna abundance predicted in a potential deep-sea mining...

    • doi.pangaea.de
    html, tsv
    Updated Jan 29, 2021
    Cite
    Katja Uhlenkott; Annemiek Vink; Thomas Kuhn; Benjamin Gillard; Pedro Martínez Arbizu (2021). Distribution of meiofauna abundance predicted in a potential deep-sea mining area in the Clarion Clipperton Fracture Zone (CCZ) [Dataset]. http://doi.org/10.1594/PANGAEA.927217
    Explore at:
    Available download formats: html, tsv
    Dataset updated
    Jan 29, 2021
    Dataset provided by
    PANGAEA
    Authors
    Katja Uhlenkott; Annemiek Vink; Thomas Kuhn; Benjamin Gillard; Pedro Martínez Arbizu
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Variables measured
    Binary Object, Binary Object (File Size)
    Description

    The dataset contains predicted distributions of meiofauna at a high taxonomic level, computed for overall meiofauna abundance, diversity (Simpson's Index D and Evenness E), richness (ntax) and individual taxa using random forest regressions. The study area is a prospective mining operation area within the German License Area for the Exploration of polymetallic nodules in the Clarion Clipperton Fracture Zone (CCZ). To investigate the influence of different environmental predictors, predictions are based on backscatter value and bathymetric variables only, on spatially predicted sediment and polymetallic nodule parameters, and on all of these environmental variables combined. To investigate inter-annual differences, predictions are based on samples obtained solely in 2013, 2014 and 2016, respectively. The spatial layers are saved as grid-files, the standard format of the R package "raster" (https://cran.r-project.org/web/packages/raster/index.html).

  18. Megafauna distribution predicted with random forest classification in the...

    • doi.pangaea.de
    zip
    Updated Aug 3, 2022
    Cite
    Katja Uhlenkott; Erik Simon-Lledó; Annemiek Vink; Pedro Martínez Arbizu (2022). Megafauna distribution predicted with random forest classification in the German contract area for polymetallic nodule mining, Clarion Clipperton Fracture Zone, Pacific [Dataset]. http://doi.org/10.1594/PANGAEA.946804
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 3, 2022
    Dataset provided by
    PANGAEA
    Authors
    Katja Uhlenkott; Erik Simon-Lledó; Annemiek Vink; Pedro Martínez Arbizu
    License

    Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    The dataset contains grid-files of the predicted probability of occurrence and non-occurrence for the 68 morphotypes across the German contract area allocated to the German Federal Institute for Geosciences and Natural Resources (BGR) for the exploration of polymetallic nodules in the Clarion Clipperton Fracture Zone, Pacific, predicted with random forest classification. All grid-files are saved in the standard format of the R package "raster" (https://cran.r-project.org/web/packages/raster/index.html) and contain 2 bands, the first holding the predicted probability of absence and the second the predicted probability of presence of the specific morphotype.
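The two probability bands described above map naturally onto the class-probability output of a random forest classifier. The sketch below uses scikit-learn on synthetic covariates and presence/absence labels; the covariates, data and parameters are all assumptions for illustration (the original work used R), but the two-column probability output mirrors the absence/presence band pair.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Synthetic covariates and presence/absence labels for one hypothetical morphotype
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# predict_proba returns one column per class, matching the two raster
# bands: column 0 = P(absence), column 1 = P(presence)
proba = clf.predict_proba(rng.normal(size=(10, 3)))
print(proba.shape)  # (10, 2)
```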
