100+ datasets found

h
miners-detection
huggingface.co
Updated Sep 20, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Training Data (2023). miners-detection [Dataset]. https://huggingface.co/datasets/TrainingDataPro/miners-detection
Explore at:
Dataset updated
Sep 20, 2023
Authors
Training Data
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
The dataset consists of of photos captured within various mines, focusing on miners engaged in their work. Each photo is annotated with bounding box detection of the miners, an attribute highlights whether each miner is sitting or standing in the photo.

The dataset's diverse applications such as computer vision, safety assessment and others make it a valuable resource for researchers, employers, and policymakers in the mining industry.
Logs and Mined Sequential Patterns of Programming Processes from...
figshare.com
txt
Updated Jun 3, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Minji Kong; Lori Pollock (2023). Logs and Mined Sequential Patterns of Programming Processes from "Semi-Automatically Mining Students' Common Scratch Programming Behaviors" [Dataset]. http://doi.org/10.6084/m9.figshare.12100797.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.12100797.v1
Dataset updated
Jun 3, 2023
Dataset provided by
figshare
Authors
Minji Kong; Lori Pollock
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
We present a ProgSnap2-based dataset containing anonymized logs of over 34,000 programming events exhibited by 81 programming students in Scratch, a visual programming environment, during our designed study as described in the paper "Semi-Automatically Mining Students' Common Scratch Programming Behaviors." We also include a list of approx. 3100 mined sequential patterns of programming processes that are performed by at least 10% of the 62 of the 81 students who are novice programmers, and represent maximal patterns generated by the MG-FSM algorithm while allowing a gap of one programming event. README.txt — overview of the dataset and its propertiesmainTable.csv — main event table of the dataset holding rows of programming eventscodeState.csv — table holding XML representations of code snapshots at the time of each programming eventdatasetMetadata.csv — describes features of the datasetScratch-SeqPatterns.txt — list of sequential patterns mined from the Main Event Table
u
Human-Computer Interaction Logs
indigo.uic.edu
zip
Updated May 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Julian Theis; Houshang Darabi (2023). Human-Computer Interaction Logs [Dataset]. http://doi.org/10.25417/uic.11923386.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.25417/uic.11923386.v1
Dataset updated
May 30, 2023
Dataset provided by
University of Illinois Chicago
Authors
Julian Theis; Houshang Darabi
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset comprises ten human-computer interaction logs of real participants who solved a given task in a Windows environment. The participants were allowed to use the standard notepad, calculator, and file explorer. All recordings are anonymized and do not contain any private information.Simple:Each of the five log files in the folder simple contains Human-Computer Interaction recordings of a participant solving a simple task. Participants were provided 30 raw text files where each one contained data about the revenue and expenses of a single product for a given time period. In total 15 summaries were asked to be created by summarizing the data of two files and calculating the combined revenue, expenses, and profit. Complex:Each of the five log files in the folder complex contains Human-Computer Interaction recordings of a participant solving a more advanced task. In particular, participants were given a folder of text documents and were asked to create summary documents that contain the total revenue and expenses of the quarter, profit, and, where applicable, profit improvement compared to the previous quarter and the same quarter of the previous year. Each quarter’s data comprised multiple text files.The logging application that has been used is the one described inJulian Theis and Houshang Darabi. 2019. Behavioral Petri Net Mining and Automated Analysis for Human-Computer Interaction Recommendations in Multi-Application Environments. Proc. ACM Hum.-Comput. Interact. 3, EICS, Article 13 (June 2019), 16 pages. DOI: https://doi.org/10.1145/3331155Please refer to Table 1 and Table 2 of this publication regarding the structure of the log files. The first column corresponds to the timestamp in milliseconds, the second column represents the event key, and the third column contains additional event-specific information.
u
Public benchmark dataset for Conformance Checking in Process Mining
figshare.unimelb.edu.au
melbourne.figshare.com
xml
Updated Jan 30, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daniel Reissner (2022). Public benchmark dataset for Conformance Checking in Process Mining [Dataset]. http://doi.org/10.26188/5cd91d0d3adaa
Explore at:
xmlAvailable download formats
Unique identifier
https://doi.org/10.26188/5cd91d0d3adaa
Dataset updated
Jan 30, 2022
Dataset provided by
The University of Melbourne
Authors
Daniel Reissner
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset contains a variety of publicly available real-life event logs. We derived two types of Petri nets for each event log with two state-of-the-art process miners : Inductive Miner (IM) and Split Miner (SM). Each event log-Petri net pair is intended for evaluating the scalability of existing conformance checking techniques.We used this data-set to evaluate the scalability of the S-Component approach for measuring fitness. The dataset contains tables of descriptive statistics of both process models and event logs. In addition, this dataset includes the results in terms of time performance measured in milliseconds for several approaches for both multi-threaded and single-threaded executions. Last, the dataset contains a cost-comparison of different approaches and reports on the degree of over-approximation of the S-Components approach. The description of the compared conformance checking techniques can be found here: https://arxiv.org/abs/1910.09767. Update:The dataset has been extended with the event logs of the BPIC18 and BPIC19 logs. BPIC19 is actually a collection of four different processes and thus was split into four event logs. For each of the additional five event logs, again, two process models have been mined with inductive and split miner. We used the extended dataset to test the scalability of our tandem repeats approach for measuring fitness. The dataset now contains updated tables of log and model statistics as well as tables of the conducted experiments measuring execution time and raw fitness cost of various fitness approaches. The description of the compared conformance checking techniques can be found here: https://arxiv.org/abs/2004.01781.Update: The dataset has also been used to measure the scalability of a new Generalization measure based on concurrent and repetitive patterns. : A concurrency oracle is used in tandem with partial orders to identify concurrent patterns in the log that are tested against parallel blocks in the process model. Tandem repeats are used with various trace reduction and extensions to define repetitive patterns in the log that are tested against loops in the process model. Each pattern is assigned a partial fulfillment. The generalization is then the average of pattern fulfillments weighted by the trace counts for which the patterns have been observed. The dataset no includes the time results and a breakdown of Generalization values for the dataset.
Market Basket Analysis
kaggle.com
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: . xlsx

Number of Row: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
f
The Maven Dependency Dataset
figshare.com
data.4tu.nl
txt
Updated Jul 23, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Steven Raemaekers; A. (Arie) van Deursen; Joost Visser (2020). The Maven Dependency Dataset [Dataset]. http://doi.org/10.4121/uuid:68a0e837-4fda-407a-949e-a159546e67b6
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.4121/uuid:68a0e837-4fda-407a-949e-a159546e67b6
Dataset updated
Jul 23, 2020
Dataset provided by
4TU.ResearchData
Authors
Steven Raemaekers; A. (Arie) van Deursen; Joost Visser
License
https://doi.org/10.4121/resource:terms_of_usehttps://doi.org/10.4121/resource:terms_of_use
Description
The Maven Dependency Dataset contains the data as described in the paper "Mining Metrics, Changes and Dependencies from the Maven Dependency Dataset". NOTE: See the README.TXT file for more information on the data in this dataset. The dataset consists of multiple parts: A snapshot of the Maven repository dated July 30, 2011 (maven.tar.gz), a MySQL database (complete.tar.gz) containing information on individual methods, classes and packages of different library versions, a Berkeley DB database (berkeley.tar.gz) containing metrics on all methods, classes and packages in the repository, a Neo4j graph database (graphdb.tar.gz) containing a call graph of the entire repository, scripts and analysis files (scriptsAndData.tar.gz), Source code and a binary package of the analysis software (fullmaven.jar and fullmaven-sources.jar), and text dumps of data in these databases (graphdump.tar.gz, processed.tar.gz, calls.tar.gz and units.tar.gz).
d
Data from: A Generic Local Algorithm for Mining Data Streams in Large...
catalog.data.gov
datasets.ai
+3more
Updated Apr 10, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dashlink (2025). A Generic Local Algorithm for Mining Data Streams in Large Distributed Systems [Dataset]. https://catalog.data.gov/dataset/a-generic-local-algorithm-for-mining-data-streams-in-large-distributed-systems
Explore at:
Dataset updated
Apr 10, 2025
Dataset provided by
Dashlink
Description
In a large network of computers or wireless sensors, each of the components (henceforth, peers) has some data about the global state of the system. Much of the system's functionality such as message routing, information retrieval and load sharing relies on modeling the global state. We refer to the outcome of the function (e.g., the load experienced by each peer) as the emph{model} of the system. Since the state of the system is constantly changing, it is necessary to keep the models up-to-date. Computing global data mining models e.g. decision trees, k-means clustering in large distributed systems may be very costly due to the scale of the system and due to communication cost, which may be high. The cost further increases in a dynamic scenario when the data changes rapidly. In this paper we describe a two step approach for dealing with these costs. First, we describe a highly efficient emph{local} algorithm which can be used to monitor a wide class of data mining models. Then, we use this algorithm as a feedback loop for the monitoring of complex functions of the data such as its k-means clustering. The theoretical claims are corroborated with a thorough experimental analysis.
Global ML-ready dataset for mining areas in satellite images
zenodo.org
zip
Updated Nov 21, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Simon Jasansky; Simon Jasansky; Victor Maus; Mirela Popa; Anna Wilbik; Anna Wilbik; Victor Maus; Mirela Popa (2024). Global ML-ready dataset for mining areas in satellite images [Dataset]. http://doi.org/10.5281/zenodo.14195737
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14195737
Dataset updated
Nov 21, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Simon Jasansky; Simon Jasansky; Victor Maus; Mirela Popa; Anna Wilbik; Anna Wilbik; Victor Maus; Mirela Popa
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset is a global resource for machine learning applications in mining area detection and semantic segmentation on satellite imagery. It contains Sentinel-2 satellite images and corresponding mining area masks + bounding boxes for 1,210 sites worldwide. Ground-truth masks are derived from Maus et al. (2022) and Tang et al. (2023), and validated through manual verification to ensure accurate alignment with Sentinel-2 imagery from specific timestamps.

The dataset includes three mask variants:

Masks exclusively from Maus et al. (n=1,090)

Masks exclusively from Tang et al. (n=817)

A preferred mask selected from either Maus or Tang based on alignment quality determined during manual review (n=1,210).

Each tile corresponds to a 2048x2048 pixel Sentinel-2 image, with metadata on mine type (surface, placer, underground, brine & evaporation) and scale (artisanal, industrial). For convenience, the preferred mask dataset is already split into training (75%), validation (15%), and test (10%) sets.

Furthermore, dataset quality was validated by re-validating test set tiles manually and correcting any mismatches between mining polygons and visually observed true mining area in the images, resulting in the following estimated quality metrics:

Combined Maus Tang
Accuracy 99.78 99.74 99.83
Precision 99.22 99.20 99.24
Recall 95.71 96.34 95.10

Note that the dataset does not contain the Sentinel-2 images themselves but contains a reference to specific Sentinel-2 images. Thus, for any ML applications, the images must be persisted first. For example, Sentinel-2 imagery is available from Microsoft's Planetary Computer and filterable via STAC API: https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a. Additionally, the temporal specificity of the data allows integration with other imagery sources from the indicated timestamp, such as Landsat or other high-resolution imagery.

Source code used to generate this dataset and to use it for ML model training is available at https://github.com/SimonJasansky/mine-segmentation. It includes useful Python scripts, e.g. to download Sentinel-2 images via STAC API, or to divide tile images (2048x2048px) into smaller chips (e.g. 512x512px).

A database schema, a schematic depiction of the dataset generation process, and a map of the global distribution of tiles are provided in the accompanying images.
d
Data-Mining-Final-Project-Data
search.dataone.org
Updated Sep 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anderson, Ty Julian (2024). Data-Mining-Final-Project-Data [Dataset]. http://doi.org/10.7910/DVN/8ETVW9
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/8ETVW9
Dataset updated
Sep 24, 2024
Dataset provided by
Harvard Dataverse
Authors
Anderson, Ty Julian
Description
Financial News Headlines. Visit https://dataone.org/datasets/sha256%3Ade01b1cf5318d53f0296b475ff28734d90acd6240a76f1eee1df39fefda07ef0 for complete metadata about this dataset.
R
Cv Cbi Mining Safety Dataset
universe.roboflow.com
zip
Updated May 29, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MININGSAFETYCBI (2024). Cv Cbi Mining Safety Dataset [Dataset]. https://universe.roboflow.com/miningsafetycbi/cv-cbi-mining-safety/dataset/3
Explore at:
zipAvailable download formats
Dataset updated
May 29, 2024
Dataset authored and provided by
MININGSAFETYCBI
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Variables measured
Boots Helms Vests Boots Bounding Boxes
Description
Project Description for Roboflow: Mining Safety - PPE Detection

Project Name: Mining Safety - PPE Detection

Overview:

The Mining Safety - PPE Detection project aims to enhance safety protocols in mining environments by leveraging computer vision technology to detect Personal Protective Equipment (PPE). This project focuses on the detection of various PPE items and the absence of mandatory safety gear to ensure that workers adhere to safety regulations, thereby minimizing the risk of accidents and injuries.

Objective:

To develop a robust object detection model capable of accurately identifying 13 different classes of PPE in real-time using a dataset sourced from Roboflow Universe. The ultimate goal is to integrate this model into a monitoring system that can alert supervisors about non-compliance with PPE requirements in mining sites.

PPE Classes (Labels):

Goggles

Helmet

Mask

No-Boots

No-Gloves

No-Helmet

No-Mask

No-Vest

Undefined

Vest

Boots

Ear-Protection

Gloves

Dataset:

Total Images: 7444

Source: Roboflow Universe

Annotations: Each image is annotated with bounding boxes corresponding to one or more of the 13 PPE classes.

Image Variety: The images come from various mining sites with different lighting conditions, camera angles, and worker positions to ensure diversity and robustness of the model.

Project Steps:

Data Collection and Annotation:

Import and utilize the dataset from Roboflow Universe, ensuring it covers diverse conditions and scenarios.

Verify and, if necessary, re-annotate images to match the 13 PPE classes accurately using the Roboflow platform.

Data Preprocessing:

Perform data augmentation techniques such as rotation, scaling, and cropping to increase the variability and size of the dataset.

Split the dataset into training, validation, and test sets (e.g., 80% training, 10% validation, 10% test).

Model Selection and Training:

Use a pre-trained YOLO (You Only Look Once) model due to its efficiency and accuracy in real-time object detection tasks.

Fine-tune the model on the annotated dataset using transfer learning to adapt it specifically to the mining safety PPE detection task.

Model Evaluation:

Evaluate the model's performance using metrics such as precision, recall, F1-score, and mean Average Precision (mAP).

Conduct error analysis to identify common misclassifications and refine the model accordingly.

Deployment:

Integrate the trained model into a real-time monitoring system.

Develop a user interface that displays video feeds and highlights detected PPE and any non-compliance issues.

Implement alert mechanisms to notify supervisors of any detected safety violations.

Continuous Improvement:

Collect feedback from the deployment to continuously improve the model.

Regularly update the dataset with new images and retrain the model to maintain high accuracy.

Expected Outcomes:

A high-accuracy object detection model capable of identifying and differentiating between 13 classes of PPE.

Enhanced safety monitoring system for mining sites, reducing the likelihood of accidents due to non-compliance with PPE regulations.

A scalable solution that can be adapted to other industrial environments requiring PPE detection.

Tools and Technologies:

Annotation Tool: Roboflow

Object Detection Model: YOLO (preferably YOLOv8 or YOLOv9 for efficiency)

Programming Language: Python

Frameworks: PyTorch or TensorFlow for model training and inference

Deployment Platform: Docker for containerization and deployment on edge devices or cloud platforms

Monitoring and Alert System: Custom-built using Flask/Django (for web interface) and integrated with real-time notification services (e.g., Slack, email, SMS)

This project will significantly contribute to improving the safety standards in mining operations by ensuring that all workers are consistently wearing the required protective gear.
u
The TAWOS dataset
rdr.ucl.ac.uk
txt
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Vali Tawosi; Afnan Abdulaziz A Alsubaihin; Rebecca Moussa; Federica Sarro (2023). The TAWOS dataset [Dataset]. http://doi.org/10.5522/04/19085834.v1
Explore at:
txtAvailable download formats
Unique identifier
https://doi.org/10.5522/04/19085834.v1
Dataset updated
May 31, 2023
Dataset provided by
University College London
Authors
Vali Tawosi; Afnan Abdulaziz A Alsubaihin; Rebecca Moussa; Federica Sarro
License
https://www.apache.org/licenses/LICENSE-2.0.htmlhttps://www.apache.org/licenses/LICENSE-2.0.html
Description
TAWOS (Peacock in Farsi and Arabic) is a dataset of agile open-source software project issues mined from Jira repositories including many descriptive features (raw and derived). The dataset aims to be all-inclusive, making it well-suited to several research avenues, and cross-analyses therein. This dataset is described and presented in the paper "A Versatile Dataset of Agile Open Source Software Projects" authored by Vali Tawosi, Afnan Al-Subaihin, Rebecca Moussa and Federica Sarro. The paper is accepted at the 2022 Mining Software Repositories (MSR) conference. Citation information will be available soon. For further information please refer to "https://github.com/SOLAR-group/TAWOS".
Grocery Store dataset for data mining
kaggle.com
zip
Updated Mar 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Honey Patel (2021). Grocery Store dataset for data mining [Dataset]. https://www.kaggle.com/honeypatel2158/grocery-store-dataset-for-data-mining
Explore at:
zip(7990 bytes)Available download formats
Dataset updated
Mar 9, 2021
Authors
Honey Patel
Description
Dataset

This dataset was created by Honey Patel

Contents
f
MSR 2019 Mining Challenge Dataset
figshare.com
zip
Updated Mar 19, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Akond Rahman (2019). MSR 2019 Mining Challenge Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.6943304.v3
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.6943304.v3
Dataset updated
Mar 19, 2019
Dataset provided by
figshare
Authors
Akond Rahman
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset used for paper: Snakes in Paradise?: Insecure Python-related Coding Practices in Stack Overflow
m
Multidimensional Dataset Of Food Security And Nutrition In Cauca.
data.mendeley.com
Updated Dec 6, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Santiago Restrepo (2021). Multidimensional Dataset Of Food Security And Nutrition In Cauca. [Dataset]. http://doi.org/10.17632/wsss65c885.1
Explore at:
Unique identifier
https://doi.org/10.17632/wsss65c885.1
Dataset updated
Dec 6, 2021
Authors
David Santiago Restrepo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Cauca
Description
A multidimensional dataset created for the department of Cauca based on public data sources is published. The dataset integrates the 4 FAO food security dimensions: physical availability of food, economic and physical access to food, food utilization, and the sustainability of the dimensions mentioned above. It also allows analysis of different variables such as nutritional, socioeconomic, climatic, sociodemographic, among others with statistical techniques or temporal analysis. The dataset can also be used for analysis and extraction of characteristics with computer vision techniques from satellite images, or multimodal machine learning with data of a different nature (images and tabular data).

The dataset Contains the folders: - Multidimensional dataset of Cauca/: Here are the tabular data of the municipalities of the department of Cauca. The folder contains the files: 1. dictionary(English).xlsx: The dictionary of the static variables for each municipality of Cauca in english. 2. dictionary(Español): The dictionary of the static variables for each municipality of Cauca in spanish. 3. dictionary(English).xlsx: The dictionary of the static variables for each municipality of Cauca in english. 4. MultidimensionalDataset_AllMunicipalities.csv: Nutritional, climatic, sociodemographic, socioeconomic and agricultural data of the 42 municipalities of the department of Cauca, although with some null values due to the lack of data in nutrition surveys of some municipalities. - Satellite Images Popayán/: Here are the monthly Landsat 8 satellite images of the municipality of Popayán in Cauca. The folder contains the folders: 1. RGB/: Contains the RGB images of the municipality of Popayán in the department of Cauca. It contains RGB images of Popayán from April 2013 to December 2020 in a resolution of 15 m / px. The title of each image is image year_month.png. 1. 6 Band Images/: Contains images of Landsat 8 using bands 1 to 8 to generate images of the municipality of Popayán in the department of Cauca. It contains 6 band images in a tif format of Popayán from April 2013 to December 2020 in a resolution of 15 m / px. The title of each image is image year_month.tif.
P
Dataset for Erasable Itemset Mining
opendata.pku.edu.cn
Updated Nov 19, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Peking University Open Research Data Platform (2015). Dataset for Erasable Itemset Mining [Dataset]. http://doi.org/10.18170/DVN/ISHFQX
Explore at:
text/plain; charset=us-ascii(5336007), text/plain; charset=us-ascii(9764947), text/plain; charset=us-ascii(7000387)Available download formats
Unique identifier
https://doi.org/10.18170/DVN/ISHFQX
Dataset updated
Nov 19, 2015
Dataset provided by
Peking University Open Research Data Platform
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
These three artificial datasets are for mining erasable itemset. The definition of erasable itemset is in the following reference papers. Note that the three data sets all include 200 different items. But for each item, we did not give the profit value of it. Users can generate as they require, with normal or randomly distribution.
m
Data Buffalo Toraja
data.mendeley.com
Updated May 16, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Abdul Rachman Manga (2025). Data Buffalo Toraja [Dataset]. http://doi.org/10.17632/kbft73pdkw.2
Explore at:
Unique identifier
https://doi.org/10.17632/kbft73pdkw.2
Dataset updated
May 16, 2025
Authors
Abdul Rachman Manga
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
This data was taken directly in the Toraja area using a digital camera, a minimum shooting distance of 3 m in video form, the results of the shooting are divided into frames
Datasets(Original, Mean, Median, Most Frequent).zip
figshare.com
zip
Updated May 31, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Omar Elzeki (2023). Datasets(Original, Mean, Median, Most Frequent).zip [Dataset]. http://doi.org/10.6084/m9.figshare.8118710.v1
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.8118710.v1
Dataset updated
May 31, 2023
Dataset provided by
Figsharehttp://figshare.com/
Authors
Omar Elzeki
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset is transformed into Matlab format. They are designed to be in cell formats. Each cell is a matrix which consists of a column representing the gene and row for the subject.Each dataset is organized in a separate directory. The directory contains four versions: a) Original dataset, b) Imputed dataset by MEAN,c) Imputed dataset by MEDIAN,d) Imputed dataset by Most Frequent,
m
Balinese Story Texts Dataset - Characters, Aliases, and their Classification...
data.mendeley.com
Updated Mar 15, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
I Made Satria Bimantara (2024). Balinese Story Texts Dataset - Characters, Aliases, and their Classification [Dataset]. http://doi.org/10.17632/h2tf5ymcp9.2
Explore at:
Unique identifier
https://doi.org/10.17632/h2tf5ymcp9.2
Dataset updated
Mar 15, 2024
Authors
I Made Satria Bimantara
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists of 120 Balinese story texts (as known as Satua Bali) which have been annotated for character analysis purposes, including character identification, alias clustering, and character classification into protagonist or antagonist. The labeling involved two Balinese native speakers who were fluent in understanding Balinese story texts. One of them is an expert in the fields of sociolinguistics and macrolinguistics. Reliability and level of agreement in the dataset are measured by Cohen's kappa coefficient, Jaccard similarity coefficient, and F1-score and all of them show almost perfect agreement values (>0,81). There are four main folders, each used for different character analysis purposes: 1. First Dataset (charsNamedEntity): 89,917 tokens annotated with five character named entity labels (ANM, ADJ, PNAME, GODS, OBJ) for character named entity recognition purpose 2. Second Dataset (charsExtraction): 6,634 annotated sentences for the purpose of character identification at the sentence level 3. Third Dataset (charsAliasClustering): 930 lists of character groups from 120 story texts for the purpose of alias clustering 4. Fourth Dataset (charsClassification): 848 lists of character groups that have been filtered into two groups (Protagonist and Antagonist)
m
Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks
data.mendeley.com
Updated Oct 13, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hyunggu Jung (2023). Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks [Dataset]. http://doi.org/10.17632/rnyrpzyw3h.1
Explore at:
Unique identifier
https://doi.org/10.17632/rnyrpzyw3h.1
Dataset updated
Oct 13, 2023
Authors
Hyunggu Jung
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists of reviews collected from restaurants on a Korean delivery app platform running a review event. A total of 128,668 reviews were collected from 136 restaurants by crawling reviews using the Selenium library in Python. The 136 chosen restaurants run review events which demand customers to write reviews with 5 stars and photos. So the annotation of data was done by considering 1) whether the review gives five-star ratings, and 2) whether the review contains photo(s).
m
OpenScience Slovenia document metadata dataset
data.mendeley.com
narcis.nl
Updated Nov 5, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mladen Borovič (2019). OpenScience Slovenia document metadata dataset [Dataset]. http://doi.org/10.17632/7wh9xvvmgk.1
Explore at:
Unique identifier
https://doi.org/10.17632/7wh9xvvmgk.1
Dataset updated
Nov 5, 2019
Authors
Mladen Borovič
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0)https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Area covered
Slovenia
Description
The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public domain academic documents which include undergraduate and postgraduate theses, research and professional articles, along with other academic document types. The data within the dataset was collected as a part of the establishment of the Slovenian Open-Access Infrastructure which defined a unified document collection process and cataloguing for universities in Slovenia within the infrastructure repositories. The data was collected from several already established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. It consists of text and numerical fields, representing attributes that describe documents. These attributes include document titles, keywords, abstracts, typologies, authors, issue years and other identifiers such as URL and UDC. The potential of this dataset lies especially in text mining and text classification tasks and can also be used in development or benchmarking of content-based recommender systems on real-world data.

Facebook

Twitter

Click to copy link

Link copied

Cite

Training Data (2023). miners-detection [Dataset]. https://huggingface.co/datasets/TrainingDataPro/miners-detection

miners-detection

TrainingDataPro/miners-detection

Explore at:

98 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Sep 20, 2023

Authors

Training Data

License

Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically

Description

The dataset consists of of photos captured within various mines, focusing on miners engaged in their work. Each photo is annotated with bounding box detection of the miners, an attribute highlights whether each miner is sitting or standing in the photo.

The dataset's diverse applications such as computer vision, safety assessment and others make it a valuable resource for researchers, employers, and policymakers in the mining industry.

Clear search

Close search

Google apps

Main menu

	Combined	Maus	Tang
Accuracy	99.78	99.74	99.83
Precision	99.22	99.20	99.24
Recall	95.71	96.34	95.10

miners-detection

Logs and Mined Sequential Patterns of Programming Processes from...

Human-Computer Interaction Logs

Public benchmark dataset for Conformance Checking in Process Mining

Market Basket Analysis

Market Basket Analysis

Introduction

An Example of Association Rules

Strategy

Dataset Description

Libraries in R

Data Pre-processing

The Maven Dependency Dataset

Data from: A Generic Local Algorithm for Mining Data Streams in Large...

Global ML-ready dataset for mining areas in satellite images

Data-Mining-Final-Project-Data

Cv Cbi Mining Safety Dataset

Project Description for Roboflow: Mining Safety - PPE Detection

Project Name: Mining Safety - PPE Detection

Overview:

Objective:

PPE Classes (Labels):

Dataset:

Project Steps:

Expected Outcomes:

Tools and Technologies:

The TAWOS dataset

Grocery Store dataset for data mining

Dataset

Contents

MSR 2019 Mining Challenge Dataset

Multidimensional Dataset Of Food Security And Nutrition In Cauca.

Dataset for Erasable Itemset Mining

Data Buffalo Toraja

Datasets(Original, Mean, Median, Most Frequent).zip

Balinese Story Texts Dataset - Characters, Aliases, and their Classification...

Bias-Free Dataset of Food Delivery App Reviews with Data Poisoning Attacks

OpenScience Slovenia document metadata dataset

miners-detectionSee More Versions

TrainingDataPro/miners-detection

miners-detection